Paperman Search Server
======================

A lightweight HTTP server for searching and listing files in a Paperman
paper repository.

Overview
--------

The Paperman Search Server provides a REST API to search for and list
document files (.max, .pdf, .jpg, .jpeg, .tiff, .tif) in a Paperman
repository. It’s designed to be used by external applications that need
to query the paper repository without direct filesystem access.

Building
--------

The server requires Qt 5 (or Qt 4 with reduced functionality) and is
built using qmake:

.. code:: bash

   qmake paperman-server.pro
   make

This will produce the ``paperman-server`` executable.

Running the Server
------------------

Basic Usage
~~~~~~~~~~~

.. code:: bash

   ./paperman-server <repository-path>

Example:

.. code:: bash

   ./paperman-server /home/user/Documents/papers

Options
~~~~~~~

- ``-p, --port <port>`` - Port to listen on (default: 8080)
- ``-h, --help`` - Show help message

Example with custom port:

.. code:: bash

   ./paperman-server -p 9000 /home/user/Documents/papers

API Endpoints
-------------

All endpoints return JSON responses.

GET /status
~~~~~~~~~~~

Get server status and repository information.

**Response:**

.. code:: json

   {
     "status": "running",
     "repository": "/home/user/Documents/papers"
   }

GET /search
~~~~~~~~~~~

Search for files matching a pattern.

**Query Parameters:**

- ``q`` (required) - Search pattern (case-insensitive substring match)
- ``path`` (optional) - Subdirectory to search in (relative to
  repository root)
- ``recursive`` (optional) - Search subdirectories (default: true)

**Example:**

.. code:: bash

   curl "http://localhost:8080/search?q=invoice"
   curl "http://localhost:8080/search?q=2024&path=archive&recursive=true"

**Response:**

.. code:: json

   {
     "success": true,
     "count": 2,
     "results": [
       {
         "path": "invoice-2024.max",
         "name": "invoice-2024.max",
         "size": 293568,
         "modified": "2024-01-15T10:29:22"
       },
       {
         "path": "archive/invoice-2023.pdf",
         "name": "invoice-2023.pdf",
         "size": 150234,
         "modified": "2023-12-31T15:30:00"
       }
     ]
   }

GET /list
~~~~~~~~~

List all files in a directory.

**Query Parameters:**

- ``path`` (optional) - Directory to list (relative to repository root,
  default: root)

**Example:**

.. code:: bash

   curl "http://localhost:8080/list"
   curl "http://localhost:8080/list?path=2024/invoices"

**Response:**

.. code:: json

   {
     "success": true,
     "path": "",
     "count": 20,
     "files": [
       {
         "name": "document1.max",
         "path": "document1.max",
         "size": 293568,
         "modified": "2024-01-15T10:29:22"
       }
     ]
   }

Page Delivery
-------------

When an individual page is requested (``/file?path=...&page=N``), the
server converts it to a single-page PDF. The compression strategy
depends on the page content:

- **Greyscale/colour pages** (8 or 24 bpp) use JPEG compression
  (DCTDecode) at quality 80. This gives a 3--5x size reduction for
  greyscale and up to 13x for colour pages that are really greyscale
  with scanner noise.
- **Monochrome pages** (1 bpp) keep FlateDecode (zlib). JPEG is
  unsuitable for hard black/white edges and FlateDecode already
  compresses 1-bit data very well (~11 KB per page).

Scanner-produced "colour" pages whose RGB channels differ by no more
than 10 levels are automatically detected as greyscale and converted
before JPEG encoding.

Supported File Types
--------------------

The server searches for and lists the following file types:

- ``.max`` - Paperman/Maxview native format
- ``.pdf`` - PDF documents
- ``.jpg``, ``.jpeg`` - JPEG images
- ``.tiff``, ``.tif`` - TIFF images

CORS Support
------------

The server includes CORS headers (``Access-Control-Allow-Origin: *``)
to allow access from web applications.

Error Handling
--------------

Errors are returned with appropriate HTTP status codes and JSON error
messages:

.. code:: json

   {
     "success": false,
     "error": "Directory does not exist"
   }

Common HTTP status codes:

- ``200 OK`` - Request successful
- ``400 Bad Request`` - Missing or invalid parameters
- ``404 Not Found`` - Endpoint not found
- ``405 Method Not Allowed`` - Only GET requests are supported

Security Notes
--------------

1. The server provides read-only access to the repository.
2. All file paths are interpreted relative to the repository root to
   prevent directory traversal.
3. No authentication is currently implemented. Use firewall rules or a
   reverse proxy for access control.
4. Consider running behind a reverse proxy (nginx, Apache) for
   production use.

Integration Examples
--------------------

Using curl
~~~~~~~~~~

.. code:: bash

   # Search for files containing "invoice"
   curl "http://localhost:8080/search?q=invoice"

   # List files in root directory
   curl "http://localhost:8080/list"

Using Python
~~~~~~~~~~~~

.. code:: python

   import requests

   # Search for files
   response = requests.get('http://localhost:8080/search',
                           params={'q': 'invoice'})
   response.raise_for_status()  # fail early on HTTP errors (400, 404, ...)
   results = response.json()

   print(f"Found {results['count']} files")
   for file in results['results']:
       print(f"  {file['name']} - {file['size']} bytes")

Using JavaScript (browser or Node.js)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: javascript

   // Search for files
   fetch('http://localhost:8080/search?q=invoice')
     .then(response => response.json())
     .then(data => {
       console.log(`Found ${data.count} files`);
       data.results.forEach(file => {
         console.log(`  ${file.name} - ${file.size} bytes`);
       });
     });

Troubleshooting
---------------

**Server won’t start:**

- Check if the port is already in use: ``netstat -ln | grep 8080``
- Verify the repository path exists and is accessible
- Check file permissions

**No results returned:**

- Verify files exist in the repository
- Check file extensions match supported types
- Try a broader search pattern

**Connection refused:**

- Ensure the server is running
- Check firewall settings
- Verify you’re connecting to the correct host and port

License
-------

GPL-2 (same as Paperman)

Copyright (C) 2009 Simon Glass
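
Appendix: Greyscale Detection Sketch
------------------------------------

The compression rules in the Page Delivery section can be illustrated with a short sketch. This is not the server's code (the server is a Qt application); it is a hedged Python illustration only. The names ``is_effectively_grey`` and ``choose_pdf_filter`` are hypothetical, and the sketch assumes that "differ by no more than 10 levels" means the spread between the largest and smallest of a pixel's R, G and B values, checked for every pixel. The actual server may sample pixels or measure the difference another way.

```python
from typing import Iterable, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0..255

# Tolerance quoted in the Page Delivery text; the per-pixel
# max-minus-min interpretation is an assumption of this sketch.
GREY_TOLERANCE = 10


def is_effectively_grey(pixels: Iterable[Pixel],
                        tolerance: int = GREY_TOLERANCE) -> bool:
    """Return True if, for every pixel, R, G and B lie within `tolerance`."""
    return all(max(p) - min(p) <= tolerance for p in pixels)


def choose_pdf_filter(bits_per_pixel: int) -> str:
    """Map a page's bit depth to the PDF stream filter named in the text."""
    if bits_per_pixel == 1:
        return "FlateDecode"   # monochrome: lossless zlib
    if bits_per_pixel in (8, 24):
        return "DCTDecode"     # greyscale/colour: JPEG
    raise ValueError(f"unsupported bit depth: {bits_per_pixel}")


if __name__ == "__main__":
    noisy_scan = [(120, 124, 118), (30, 33, 29), (250, 255, 248)]
    print(is_effectively_grey(noisy_scan))  # channel spread stays within 10
    print(choose_pdf_filter(24))
```

Under these assumptions, a 24 bpp page for which ``is_effectively_grey`` returns true would be reduced to 8 bpp before JPEG encoding, which is where the larger size reduction quoted for noisy "colour" scans would come from.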