Paperman Search Server

A lightweight HTTP server for searching and listing files in a Paperman paper repository.

Overview

The Paperman Search Server provides a REST API to search for and list document files (.max, .pdf, .jpg, .jpeg, .tiff, .tif) in a Paperman repository. It’s designed for external applications that need to query the paper repository without direct filesystem access.

Building

The server requires Qt 5 (or Qt 4 with reduced functionality) and is built using qmake:

qmake paperman-server.pro
make

This will produce the paperman-server executable.

Running the Server

Basic Usage

./paperman-server <repository-path>

Example:

./paperman-server /home/user/Documents/papers

Options

  • -p, --port <port> - Port to listen on (default: 8080)

  • -h, --help - Show help message

Example with custom port:

./paperman-server -p 9000 /home/user/Documents/papers

API Endpoints

All endpoints return JSON responses.

GET /status

Get server status and repository information.

Response:

{
  "status": "running",
  "repository": "/home/user/Documents/papers"
}

GET /list

List all files in a directory.

Query Parameters:

  • path (optional) - Directory to list (relative to repository root, default: root)

Example:

curl "http://localhost:8080/list"
curl "http://localhost:8080/list?path=2024/invoices"

Response:

{
  "success": true,
  "path": "",
  "count": 20,
  "files": [
    {
      "name": "document1.max",
      "path": "document1.max",
      "size": 293568,
      "modified": "2024-01-15T10:29:22"
    }
  ]
}
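
For scripted access, the request and response shown above can be handled with a few lines of Python. This is a minimal sketch using only the standard library; build_list_url and summarise_listing are hypothetical helper names, not part of Paperman, and the field names are taken from the example response above.

```python
import json
from urllib.parse import urlencode


def build_list_url(base_url, path=""):
    """Return the /list URL, adding ?path=... only for a sub-directory."""
    url = base_url.rstrip("/") + "/list"
    if path:
        # Keep "/" unescaped so the URL matches the curl example above.
        url += "?" + urlencode({"path": path}, safe="/")
    return url


def summarise_listing(body):
    """Parse a /list JSON body and return (count, [(name, size), ...])."""
    data = json.loads(body)
    if not data.get("success"):
        # Assumption: failures carry an "error" string, as shown under
        # Error Handling.
        raise RuntimeError(data.get("error", "unknown error"))
    files = [(f["name"], f["size"]) for f in data.get("files", [])]
    return data.get("count", len(files)), files
```

To query a running server, pass the result of build_list_url to urllib.request.urlopen and hand the decoded response body to summarise_listing.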

Page Delivery

When an individual page is requested (/file?path=...&page=N), the server converts it to a single-page PDF. The compression strategy depends on the page content:

  • Greyscale/colour pages (8 or 24 bpp) use JPEG compression (DCTDecode) at quality 80. This gives a 3–5x size reduction for greyscale and up to 13x for colour pages that are really greyscale with scanner noise.

  • Monochrome pages (1 bpp) keep FlateDecode (zlib). JPEG is unsuitable for hard black/white edges and FlateDecode already compresses 1-bit data very well (~11 KB per page).

Scanner-produced “colour” pages whose RGB channels differ by no more than 10 levels are automatically detected as greyscale and converted before JPEG encoding.
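
The selection rules above can be sketched in Python as follows. This is an illustration of the described behaviour, not the server's actual code: the function names are hypothetical, pixels are assumed to be (r, g, b) tuples, and rejecting other bit depths is an assumption, since the text only covers 1, 8 and 24 bpp.

```python
def choose_pdf_filter(bits_per_pixel):
    """Map a page's bit depth to the PDF stream filter described above."""
    if bits_per_pixel == 1:
        return "FlateDecode"   # monochrome: zlib keeps hard edges intact
    if bits_per_pixel in (8, 24):
        return "DCTDecode"     # greyscale/colour: JPEG at quality 80
    # Assumption: other depths are not handled by this sketch.
    raise ValueError(f"unsupported bit depth: {bits_per_pixel}")


def is_effectively_greyscale(pixels, tolerance=10):
    """True when no pixel's RGB channels differ by more than `tolerance`."""
    return all(max(p) - min(p) <= tolerance for p in pixels)
```

A 24 bpp page for which is_effectively_greyscale returns True would be converted to greyscale before JPEG encoding, which is where the larger size reduction quoted above comes from.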

Supported File Types

The server searches for and lists the following file types:

  • .max - Paperman/Maxview native format

  • .pdf - PDF documents

  • .jpg, .jpeg - JPEG images

  • .tiff, .tif - TIFF images

CORS Support

The server includes CORS headers (Access-Control-Allow-Origin: *) to allow access from web applications.

Error Handling

Errors are returned with appropriate HTTP status codes and JSON error messages:

{
  "success": false,
  "error": "Directory does not exist"
}

Common HTTP status codes:

  • 200 OK - Request successful

  • 400 Bad Request - Missing or invalid parameters

  • 404 Not Found - Endpoint not found

  • 405 Method Not Allowed - Only GET requests are supported
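
A client can turn these error responses into exceptions. The sketch below is one possible approach rather than an official client; PapermanError and check_response are hypothetical names, and the code assumes error bodies follow the JSON shape shown above.

```python
import json


class PapermanError(Exception):
    """Raised when the server reports a failure (hypothetical helper)."""

    def __init__(self, status_code, message):
        super().__init__(f"HTTP {status_code}: {message}")
        self.status_code = status_code
        self.message = message


def check_response(status_code, body):
    """Return the decoded JSON object, or raise PapermanError on failure."""
    try:
        data = json.loads(body)
    except ValueError:
        data = {}  # body was not valid JSON
    if not isinstance(data, dict):
        data = {}
    # /status carries no "success" key, so only an explicit false counts.
    if status_code != 200 or data.get("success") is False:
        raise PapermanError(status_code,
                            data.get("error", "no error message supplied"))
    return data
```

Checking both the status code and the "success" field keeps the helper usable for every endpoint, whichever of the two signals a given failure sets.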

Security Notes

  1. The server only provides read-only access to the repository

  2. All file paths are relative to the repository root to prevent directory traversal

  3. No authentication is currently implemented - use firewall rules or a reverse proxy for access control

  4. Consider running behind a reverse proxy (nginx, Apache) for production use
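
To illustrate what confining paths to the repository root involves (note 2), here is a generic containment check in Python. It is not taken from the server's source; resolve_in_repository is a hypothetical name, and the technique shown (resolve the path, then compare it with the root) is just one common way to reject "../" tricks.

```python
import os


def resolve_in_repository(repo_root, relative_path):
    """Resolve relative_path under repo_root, rejecting anything outside it."""
    root = os.path.realpath(repo_root)
    # realpath collapses ".." components and follows symlinks; an absolute
    # relative_path replaces root in the join and is caught below.
    candidate = os.path.realpath(os.path.join(root, relative_path))
    if os.path.commonpath([root, candidate]) != root:
        raise ValueError("path escapes the repository root")
    return candidate
```

A wrapper or proxy placed in front of the server could apply a check like this before forwarding a request.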

Integration Examples

Using curl

# Search for files containing "invoice"
curl "http://localhost:8080/search?q=invoice"

# List files in root directory
curl "http://localhost:8080/list"

Using Python

import requests

# Search for files
response = requests.get('http://localhost:8080/search', params={'q': 'invoice'})
results = response.json()
print(f"Found {results['count']} files")
for file in results['results']:
    print(f"  {file['name']} - {file['size']} bytes")

Using JavaScript (browser or Node.js)

// Search for files
fetch('http://localhost:8080/search?q=invoice')
  .then(response => response.json())
  .then(data => {
    console.log(`Found ${data.count} files`);
    data.results.forEach(file => {
      console.log(`  ${file.name} - ${file.size} bytes`);
    });
  });

Troubleshooting

Server won’t start:

  • Check if the port is already in use: netstat -ln | grep 8080

  • Verify the repository path exists and is accessible

  • Check file permissions

No results returned:

  • Verify files exist in the repository

  • Check file extensions match supported types

  • Try a broader search pattern

Connection refused:

  • Ensure the server is running

  • Check firewall settings

  • Verify you’re connecting to the correct host and port

License

GPL-2 (same as Paperman)

Copyright (C) 2009 Simon Glass