Paperman Search Server

A lightweight HTTP server for searching and listing files in a Paperman paper repository.

Overview

The Paperman Search Server provides a REST API to search for and list document files (.max, .pdf, .jpg, .jpeg, .tiff, .tif) in a Paperman repository. It is designed for external applications that need to query the paper repository without direct filesystem access.

Building

The server requires Qt 5 (or Qt 4 with reduced functionality) and is built using qmake:

qmake paperman-server.pro
make

This will produce the paperman-server executable.

Running the Server

Basic Usage

./paperman-server <repository-path>

Example:

./paperman-server /home/user/Documents/papers

Options

-p, --port <port> - Port to listen on (default: 8080)
-h, --help - Show help message

Example with custom port:

./paperman-server -p 9000 /home/user/Documents/papers

API Endpoints

All endpoints return JSON responses.

GET /status

Get server status and repository information.

Response:

{
  "status": "running",
  "repository": "/home/user/Documents/papers"
}
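
A client may want to confirm that the server is up and pointed at the expected repository before sending queries. The following is a minimal sketch using only the Python standard library. The helper names fetch_status and serves_repository are illustrative and not part of Paperman, and the payload shape is assumed to match the example response above.

```python
import json
import urllib.request


def fetch_status(base_url="http://localhost:8080", timeout=5.0):
    """Return the decoded JSON body of GET /status."""
    with urllib.request.urlopen(f"{base_url}/status", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def serves_repository(status, expected_path):
    """True if the payload reports a running server for expected_path.

    Trailing slashes are ignored when comparing the two paths.
    """
    return (
        status.get("status") == "running"
        and status.get("repository", "").rstrip("/") == expected_path.rstrip("/")
    )
```

With a server running locally, serves_repository(fetch_status(), "/home/user/Documents/papers") would be expected to return True.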

GET /search

Search for files matching a pattern.

Query Parameters:
- q (required) - Search pattern (case-insensitive substring match)
- path (optional) - Subdirectory to search in (relative to repository root)
- recursive (optional) - Search subdirectories (default: true)

Example:

curl "http://localhost:8080/search?q=invoice"
curl "http://localhost:8080/search?q=2024&path=archive&recursive=true"

Response:

{
  "success": true,
  "count": 2,
  "results": [
    {
      "path": "invoice-2024.max",
      "name": "invoice-2024.max",
      "size": 293568,
      "modified": "2024-01-15T10:29:22"
    },
    {
      "path": "archive/invoice-2023.pdf",
      "name": "invoice-2023.pdf",
      "size": 150234,
      "modified": "2023-12-31T15:30:00"
    }
  ]
}
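
Search patterns and paths may contain spaces or other characters that need percent-encoding. The sketch below builds a /search URL from the documented parameters and returns the results array. The function names are hypothetical, and sending the literal "false" for recursive is an assumption, since only recursive=true appears in the examples above.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode


def build_search_url(query, path=None, recursive=None,
                     base_url="http://localhost:8080"):
    """Build a /search URL, percent-encoding every parameter value."""
    params = {"q": query}
    if path is not None:
        params["path"] = path
    if recursive is not None:
        # Assumption: the server accepts the literals "true" and "false".
        params["recursive"] = "true" if recursive else "false"
    # quote_via=quote encodes spaces as %20 rather than "+".
    return f"{base_url}/search?{urlencode(params, quote_via=quote)}"


def search(query, **kwargs):
    """Run a search and return the list of result dictionaries."""
    url = build_search_url(query, **kwargs)
    with urllib.request.urlopen(url, timeout=10) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    return payload["results"]
```

Keeping URL construction separate from the network call makes the encoding easy to check without a running server.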

GET /list

List all files in a directory.

Query Parameters:
- path (optional) - Directory to list (relative to repository root; default: root)

Example:

curl "http://localhost:8080/list"
curl "http://localhost:8080/list?path=2024/invoices"

Response:

{
  "success": true,
  "path": "",
  "count": 1,
  "files": [
    {
      "name": "document1.max",
      "path": "document1.max",
      "size": 293568,
      "modified": "2024-01-15T10:29:22"
    }
  ]
}
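
The modified field in the examples looks like an ISO 8601 timestamp without a timezone. Assuming that format holds for every entry, a client can sort a listing (or search results) by modification time as sketched here. The function name newest_first is illustrative.

```python
from datetime import datetime


def newest_first(files):
    """Sort file entries by their 'modified' timestamp, newest first."""
    return sorted(
        files,
        # Assumption: timestamps look like "2024-01-15T10:29:22".
        key=lambda entry: datetime.fromisoformat(entry["modified"]),
        reverse=True,
    )
```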

Page Delivery

When an individual page is requested (/file?path=...&page=N), the server converts it to a single-page PDF. The compression strategy depends on the page content:

Greyscale/colour pages (8 or 24 bpp) use JPEG compression (DCTDecode) at quality 80. This gives a 3–5x size reduction for greyscale and up to 13x for colour pages that are really greyscale with scanner noise.
Monochrome pages (1 bpp) keep FlateDecode (zlib). JPEG is unsuitable for hard black/white edges and FlateDecode already compresses 1-bit data very well (~11 KB per page).

Scanner-produced “colour” pages whose RGB channels differ by no more than 10 levels are automatically detected as greyscale and converted before JPEG encoding.
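
The selection rules above can be sketched in a few lines of Python. This is an illustration only, not the server's code: the per-pixel test (maximum channel minus minimum channel), the channel averaging used for conversion, and all function names are assumptions.

```python
def is_effectively_grey(pixels, tolerance=10):
    """True if every (r, g, b) pixel has channels within `tolerance` levels."""
    return all(max(p) - min(p) <= tolerance for p in pixels)


def to_grey(pixels):
    """Convert RGB pixels to 8-bit grey by averaging channels (an assumption)."""
    return [sum(p) // 3 for p in pixels]


def choose_filter(bits_per_pixel):
    """Pick the PDF stream filter described above for a given bit depth."""
    if bits_per_pixel == 1:
        return "FlateDecode"   # lossless, suits hard black/white edges
    if bits_per_pixel in (8, 24):
        return "DCTDecode"     # JPEG
    raise ValueError(f"unsupported bit depth: {bits_per_pixel}")
```

A 24 bpp page for which is_effectively_grey returns True would be passed through to_grey first and then JPEG-encoded as an 8 bpp image.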

Supported File Types

The server searches for and lists the following file types:
- .max - Paperman/Maxview native format
- .pdf - PDF documents
- .jpg, .jpeg - JPEG images
- .tiff, .tif - TIFF images
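
A client that wants to predict whether a file will appear in results can apply the same extension filter locally. This sketch assumes extensions are compared case-insensitively, which this README does not state. The names SUPPORTED_EXTENSIONS and is_supported are illustrative.

```python
from pathlib import PurePosixPath

SUPPORTED_EXTENSIONS = {".max", ".pdf", ".jpg", ".jpeg", ".tiff", ".tif"}


def is_supported(filename):
    """True if the file's extension is one of the supported types."""
    # Assumption: the comparison ignores case, so "scan.TIF" matches ".tif".
    return PurePosixPath(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```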

CORS Support

The server includes CORS headers (Access-Control-Allow-Origin: *) to allow access from web applications.

Error Handling

Errors are returned with appropriate HTTP status codes and JSON error messages:

{
  "success": false,
  "error": "Directory does not exist"
}

Common HTTP status codes:
- 200 OK - Request successful
- 400 Bad Request - Missing or invalid parameters
- 404 Not Found - Endpoint not found
- 405 Method Not Allowed - Only GET requests are supported
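
On the client side, the error body can be surfaced as an exception instead of being discarded. The sketch below uses only the standard library. PapermanError, error_message, and get_json are hypothetical names, and the error body is assumed to follow the JSON shape shown above.

```python
import json
import urllib.error
import urllib.request


class PapermanError(Exception):
    """Raised when the server answers with a non-2xx status."""

    def __init__(self, status, message):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message


def error_message(body):
    """Extract the 'error' field from a JSON error body, with fallbacks."""
    try:
        payload = json.loads(body)
    except ValueError:
        return "unparseable error response"
    if isinstance(payload, dict):
        return payload.get("error", "unknown error")
    return "unknown error"


def get_json(url, timeout=10):
    """GET a URL and return its JSON body, raising PapermanError on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as exc:
        body = exc.read().decode("utf-8", "replace")
        raise PapermanError(exc.code, error_message(body)) from exc
```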

Security Notes

The server provides read-only access to the repository.
All file paths are interpreted relative to the repository root to prevent directory traversal.
No authentication is currently implemented; use firewall rules or a reverse proxy for access control.
Consider running behind a reverse proxy (nginx, Apache) for production use.
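
For readers building their own front end or proxy filter, the usual way to confine a client-supplied path to a root directory is to resolve it and check that the result still lies under the root. The Python sketch below illustrates that general technique. It is not the server's own implementation, and resolve_inside is a hypothetical name.

```python
import os


def resolve_inside(root, relative_path):
    """Resolve relative_path under root; raise ValueError if it escapes."""
    root_real = os.path.realpath(root)
    # realpath collapses ".." segments and follows symlinks.
    candidate = os.path.realpath(os.path.join(root_real, relative_path))
    if os.path.commonpath([root_real, candidate]) != root_real:
        raise ValueError(f"path escapes repository root: {relative_path!r}")
    return candidate
```

An absolute relative_path such as "/etc/passwd" is rejected as well, because os.path.join discards the root when its second argument is absolute.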

Integration Examples

Using curl

# Search for files containing "invoice"
curl "http://localhost:8080/search?q=invoice"

# List files in root directory
curl "http://localhost:8080/list"

Using Python

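import requests

# Search for files (requires the third-party "requests" package)
response = requests.get(
    'http://localhost:8080/search',
    params={'q': 'invoice'},
    timeout=10,
)
response.raise_for_status()  # Raise on 4xx/5xx instead of parsing an error body
results = response.json()
print(f"Found {results['count']} files")
for file in results['results']:
    print(f"  {file['name']} - {file['size']} bytes")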

Using JavaScript (browser or Node.js 18+)

// Search for files
fetch('http://localhost:8080/search?q=invoice')
  .then(response => response.json())
  .then(data => {
    console.log(`Found ${data.count} files`);
    data.results.forEach(file => {
      console.log(`  ${file.name} - ${file.size} bytes`);
    });
  });

Troubleshooting

Server won’t start:
- Check if the port is already in use: netstat -ln | grep 8080
- Verify the repository path exists and is accessible
- Check file permissions
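
Where netstat is not available, the port check can be approximated from Python by trying to bind to the port. This is a rough sketch: port_is_free is an illustrative name, and the check only covers the address given in host.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use" or a permissions problem.
            return False
    return True
```

Calling port_is_free(8080) before starting the server on the default port gives a quick indication of a conflict.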

No results returned:
- Verify files exist in the repository
- Check file extensions match supported types
- Try a broader search pattern

Connection refused:
- Ensure the server is running
- Check firewall settings
- Verify you’re connecting to the correct host and port

License

GPL-2 (same as Paperman)