Paperman Search Server API Documentation
Overview
The Paperman Search Server provides a REST API for searching, listing, and retrieving files from paper repositories. It supports on-the-fly PDF conversion from various formats.
Base URL: http://localhost:8080
Version: 1.3
All endpoints use the GET HTTP method and return JSON responses
(except file downloads, which return the file content).
Authentication
The server supports optional API key authentication via the
X-API-Key header.
Enabling Authentication
Set the PAPERMAN_API_KEY environment variable when starting the
server:
export PAPERMAN_API_KEY="your-secret-key-here"
./paperman-server /path/to/repository
Or with systemd:
# Edit /etc/systemd/system/paperman-server.service
[Service]
Environment="PAPERMAN_API_KEY=your-secret-key-here"
Using Authentication
Once enabled, all endpoints (except /status) require the API key:
# Without API key - fails
curl "http://localhost:8080/search?q=test"
# Response: {"error":"Invalid or missing API key...","success":false}
# With API key - works
curl -H "X-API-Key: your-secret-key-here" "http://localhost:8080/search?q=test"
Authentication Behavior
- Disabled by default: If PAPERMAN_API_KEY is not set, no authentication is required
- Status endpoint exempt: /status always works without authentication (for health checks)
- All other endpoints protected: When enabled, /search, /list, /file, /repos require a valid API key
- 401 Unauthorized: An invalid or missing API key returns HTTP 401 with a JSON error
Security Note: Always use HTTPS (SSL/TLS) when accessing the server over a network to prevent API key interception.
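As a minimal sketch of the client side of this scheme, the helper below adds the X-API-Key header only when a key is configured. The function name `auth_headers` and the idea of reading the key from the client's own PAPERMAN_API_KEY variable are assumptions made for illustration; the server does not require either.

```python
import os
from typing import Dict, Optional


def auth_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return HTTP headers for a request to the server.

    If api_key is None, fall back to the PAPERMAN_API_KEY environment
    variable (assumption: the client reuses the server's variable name).
    An empty or missing key yields no X-API-Key header, which suits a
    server running with authentication disabled.
    """
    if api_key is None:
        api_key = os.environ.get("PAPERMAN_API_KEY", "")
    return {"X-API-Key": api_key} if api_key else {}
```

The returned dictionary can be passed as the headers argument of whichever HTTP client is in use.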
Common Response Format
Success Response
{
"success": true,
"data": "...",
"count": 0
}
Error Response
{
"success": false,
"error": "Error message description"
}
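A client can turn this convention into an exception, as in the sketch below. The names `PapermanError` and `parse_response` are hypothetical. Only an explicit "success": false is treated as a failure, since the /status response documented below carries no "success" field.

```python
import json
from typing import Any, Dict


class PapermanError(Exception):
    """Raised when a response body reports success=false (hypothetical name)."""


def parse_response(body: str) -> Dict[str, Any]:
    """Decode a JSON response body and raise if it is an error response."""
    data = json.loads(body)
    # Only an explicit false counts as failure; a missing field does not.
    if data.get("success") is False:
        raise PapermanError(data.get("error", "unknown error"))
    return data
```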
Endpoints
1. Server Status
Get the current server status and repository information.
Endpoint: GET /status
Parameters: None
Response:
{
"status": "running",
"repository": "/path/to/repository"
}
Example:
curl http://localhost:8080/status
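For a health check, a monitoring script only needs to inspect the "status" field. The sketch below assumes the response shape shown above; `is_running` is a hypothetical helper name, and any body that is not valid JSON is treated as unhealthy.

```python
import json


def is_running(status_body: str) -> bool:
    """Return True if a /status response body reports status "running"."""
    try:
        data = json.loads(status_body)
    except json.JSONDecodeError:
        # A non-JSON body (e.g. a proxy error page) counts as not running.
        return False
    return isinstance(data, dict) and data.get("status") == "running"
```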
2. List Repositories
Get a list of all configured repositories.
Endpoint: GET /repos
Parameters: None
Response:
{
"success": true,
"count": 2,
"repositories": [
{
"path": "/home/user/papers",
"name": "papers",
"exists": true
},
{
"path": "/home/user/archive",
"name": "archive",
"exists": true
}
]
}
Example:
curl http://localhost:8080/repos
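The "exists" flag lets a client skip repositories whose directory is missing before passing a name as the repo parameter of other endpoints. A small sketch, assuming the response shape shown above (`usable_repo_names` is a hypothetical name):

```python
from typing import Any, Dict, List


def usable_repo_names(repos_response: Dict[str, Any]) -> List[str]:
    """Return the names of repositories whose "exists" flag is true."""
    return [
        repo["name"]
        for repo in repos_response.get("repositories", [])
        if repo.get("exists")
    ]
```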
3. Search Files
Search for files matching a pattern in the repository.
Endpoint: GET /search
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `q` | string | Yes | | Search pattern (partial filename match) |
| `repo` | string | No | First | Repository name to search in |
| `path` | string | No | Root | Directory path to search in (relative to root) |
| `recursive` | boolean | No | false | Search subdirectories |
Response:
{
"success": true,
"pattern": "invoice",
"path": "/home/user/papers",
"count": 3,
"files": [
{
"name": "invoice-2023-01.pdf",
"path": "invoices/invoice-2023-01.pdf",
"size": 45632,
"modified": "2023-01-15T10:30:00"
},
{
"name": "invoice-2023-02.pdf",
"path": "invoices/invoice-2023-02.pdf",
"size": 52441,
"modified": "2023-02-12T14:22:00"
}
]
}
Examples:
# Basic search
curl "http://localhost:8080/search?q=invoice"
# Search in specific repository
curl "http://localhost:8080/search?q=invoice&repo=papers"
# Search in subdirectory
curl "http://localhost:8080/search?q=report&path=2023"
# Recursive search
curl "http://localhost:8080/search?q=contract&recursive=true"
Notes:
- Pattern matching is case-insensitive
- Searches for partial filename matches
- Only returns files with supported extensions (.max, .pdf, .jpg, .tiff)
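The matching rules in these notes can be approximated locally, for example to predict or post-filter results. This is a sketch of the documented behavior, not the server's actual code: `matches` is a hypothetical function, and the extension set is limited to the four extensions the notes name.

```python
import os

# Extensions named in the notes; the server may accept others.
SUPPORTED_EXTENSIONS = {".max", ".pdf", ".jpg", ".tiff"}


def matches(filename: str, pattern: str) -> bool:
    """Case-insensitive partial filename match, limited to supported extensions."""
    extension = os.path.splitext(filename)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        return False
    return pattern.lower() in filename.lower()
```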
4. List Directory Contents
List all files in a specific directory.
Endpoint: GET /list
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `path` | string | No | Root | Directory path (relative to repository) |
| `repo` | string | No | First | Repository name |
Response:
{
"success": true,
"path": "invoices",
"count": 5,
"files": [
{
"name": "invoice-2023-01.pdf",
"path": "invoices/invoice-2023-01.pdf",
"size": 45632,
"modified": "2023-01-15T10:30:00"
},
{
"name": "invoice-2023-02.pdf",
"path": "invoices/invoice-2023-02.pdf",
"size": 52441,
"modified": "2023-02-12T14:22:00"
}
]
}
Examples:
# List root directory
curl "http://localhost:8080/list"
# List subdirectory
curl "http://localhost:8080/list?path=invoices"
# List in specific repository
curl "http://localhost:8080/list?path=2023&repo=archive"
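Each entry in "files" carries a size in bytes and an ISO 8601 timestamp without a timezone, so a listing can be summarized with the standard library alone. The sketch below assumes the response shape shown above; `summarize_listing` is a hypothetical name.

```python
from datetime import datetime
from typing import Any, Dict, Optional, Tuple


def summarize_listing(response: Dict[str, Any]) -> Tuple[int, Optional[datetime]]:
    """Return (total size in bytes, newest modification time) for a listing."""
    files = response.get("files", [])
    total_size = sum(entry["size"] for entry in files)
    # "modified" looks like "2023-01-15T10:30:00" (no timezone suffix).
    times = [datetime.fromisoformat(entry["modified"]) for entry in files]
    newest = max(times) if times else None
    return total_size, newest
```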
5. Get File Content
Retrieve a file’s content, optionally converting it to PDF.
Endpoint: GET /file
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `path` | string | Yes | | File path (relative to repository) |
| `repo` | string | No | First | Repository name |
| `type` | string | No | | Output type: `pdf` converts the file to PDF |
| `page` | int | No | 0 | Extract a single page from a PDF (1-based). Returns a standalone single-page PDF. |
| `pages` | string | No | | Set to `true` to return the page count as JSON |
Response:
- Success: Binary file content with appropriate Content-Type header
- Error: JSON error response
When pages=true is given, the response is JSON:
{
"success": true,
"pages": 5
}
When page=N is given, the response is a single-page PDF
(application/pdf). Extracted pages are cached in
/tmp/paperman-pages/ with the same 7-day expiry as thumbnails.
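A viewer that wants a fast first paint can ask for the page count and then request pages one at a time. The sketch below only builds the URLs for that workflow. `BASE_URL` is taken from the overview, and the function names are hypothetical.

```python
from typing import List
from urllib.parse import urlencode

BASE_URL = "http://localhost:8080"  # base URL from the overview


def page_count_url(path: str) -> str:
    """URL whose JSON response reports the number of pages in a document."""
    query = urlencode({"path": path, "pages": "true"})
    return f"{BASE_URL}/file?{query}"


def page_urls(path: str, page_count: int) -> List[str]:
    """URLs for each single-page PDF, using 1-based page numbers."""
    urls = []
    for page in range(1, page_count + 1):
        query = urlencode({"path": path, "page": page})
        urls.append(f"{BASE_URL}/file?{query}")
    return urls
```

Note that `urlencode` percent-encodes characters such as "/" in the path, which assumes the server decodes query parameters in the usual way.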
Content-Type Headers:
- .pdf → application/pdf
- .jpg, .jpeg → image/jpeg
- .tif, .tiff → image/tiff
- .max → application/octet-stream
- PDF conversion → application/pdf
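A client can use the same mapping to check that a download has the expected type before saving it. The sketch below mirrors the list above; `expected_content_type` is a hypothetical helper, and it returns None for extensions the list does not cover rather than guessing.

```python
import os
from typing import Optional

CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
    ".max": "application/octet-stream",
}


def expected_content_type(path: str, convert_to_pdf: bool = False) -> Optional[str]:
    """Return the Content-Type the server is documented to send for a path."""
    if convert_to_pdf:
        # Any successful conversion is served as a PDF.
        return "application/pdf"
    extension = os.path.splitext(path)[1].lower()
    return CONTENT_TYPES.get(extension)
```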
Examples:
# Download original file
curl "http://localhost:8080/file?path=invoice.pdf" -o invoice.pdf
# Download from specific repository
curl "http://localhost:8080/file?path=document.pdf&repo=archive" -o document.pdf
# Convert JPEG to PDF on-the-fly
curl "http://localhost:8080/file?path=scan.jpg&type=pdf" -o scan.pdf
# Convert .max file to PDF
curl "http://localhost:8080/file?path=document.max&type=pdf" -o document.pdf
# Get page count for a PDF
curl "http://localhost:8080/file?path=document.pdf&pages=true"
# Download just page 1 (for fast initial display)
curl "http://localhost:8080/file?path=document.pdf&page=1" -o page1.pdf
PDF Conversion:
- Supports: .max, .jpg, .jpeg, .tif, .tiff
- Conversion timeout: 30 seconds
- Uses paperman’s built-in conversion engine
- Maintains image quality and metadata
Error Responses:
// File not found
{
"success": false,
"error": "File not found"
}
// Invalid path (directory traversal attempt)
{
"success": false,
"error": "Invalid file path"
}
// Conversion failed
{
"success": false,
"error": "PDF conversion failed: <error details>"
}
// Conversion timeout
{
"success": false,
"error": "PDF conversion timed out (30s limit)"
}
Supported File Types
The server handles the following file types:
| Extension | Description | PDF Conversion | Direct View |
|---|---|---|---|
| `.max` | Paperman format | ✅ | ❌ |
| `.pdf` | PDF document | N/A | ✅ |
| `.jpg`, `.jpeg` | JPEG image | ✅ | ✅ |
| `.tif`, `.tiff` | TIFF image | ✅ | ✅ |
Error Codes
| HTTP Code | Description | Common Causes |
|---|---|---|
| 200 | OK | Request successful |
| 400 | Bad Request | Invalid path, missing parameters |
| 401 | Unauthorized | Invalid or missing API key |
| 404 | Not Found | File/endpoint not found |
| 405 | Method Not Allowed | Non-GET request |
| 500 | Internal Server Error | Conversion failed, file read error |
| 501 | Not Implemented | Unsupported conversion (deprecated) |
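For logging or user-facing messages, a client can carry this table as a lookup. The sketch below copies the rows above verbatim; `describe_status` is a hypothetical name, and codes outside the table are reported as unknown rather than interpreted.

```python
from typing import Dict, Tuple

STATUS_TABLE: Dict[int, Tuple[str, str]] = {
    200: ("OK", "Request successful"),
    400: ("Bad Request", "Invalid path, missing parameters"),
    401: ("Unauthorized", "Invalid or missing API key"),
    404: ("Not Found", "File/endpoint not found"),
    405: ("Method Not Allowed", "Non-GET request"),
    500: ("Internal Server Error", "Conversion failed, file read error"),
    501: ("Not Implemented", "Unsupported conversion (deprecated)"),
}


def describe_status(code: int) -> str:
    """Format an HTTP status code with its documented name and common causes."""
    name, causes = STATUS_TABLE.get(code, ("Unknown", "Not listed in the table"))
    return f"{code} {name}: {causes}"
```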
CORS
All endpoints include CORS headers:
Access-Control-Allow-Origin: *
This allows web applications from any origin to access the API.
Rate Limiting
Currently, no rate limiting is implemented. The server is designed for trusted local or network use.
Examples
JavaScript/Fetch API
const API_KEY = 'your-secret-key-here'; // Set if authentication is enabled
// Search for files
fetch('http://localhost:8080/search?q=invoice', {
headers: {
'X-API-Key': API_KEY // Include if auth enabled
}
})
.then(response => response.json())
.then(data => {
console.log(`Found ${data.count} files`);
data.files.forEach(file => {
console.log(`- ${file.name} (${file.size} bytes)`);
});
});
// Download a file
fetch('http://localhost:8080/file?path=document.pdf', {
headers: {
'X-API-Key': API_KEY
}
})
.then(response => response.blob())
.then(blob => {
const url = URL.createObjectURL(blob);
const a = document.createElement('a');
a.href = url;
a.download = 'document.pdf';
a.click();
});
// Convert to PDF
fetch('http://localhost:8080/file?path=scan.jpg&type=pdf', {
headers: {
'X-API-Key': API_KEY
}
})
.then(response => response.blob())
.then(blob => {
const url = URL.createObjectURL(blob);
window.open(url, '_blank');
});
Python
import requests
API_KEY = 'your-secret-key-here' # Set if authentication is enabled
headers = {'X-API-Key': API_KEY} # Include if auth enabled
# Search for files
response = requests.get('http://localhost:8080/search',
params={'q': 'invoice'},
headers=headers)
data = response.json()
print(f"Found {data['count']} files")
# Download a file
response = requests.get('http://localhost:8080/file',
params={'path': 'document.pdf'},
headers=headers)
with open('document.pdf', 'wb') as f:
f.write(response.content)
# Convert to PDF
response = requests.get('http://localhost:8080/file',
params={'path': 'scan.jpg', 'type': 'pdf'},
headers=headers)
with open('scan.pdf', 'wb') as f:
f.write(response.content)
cURL
# Set API key if authentication is enabled
API_KEY="your-secret-key-here"
# Get server status (no auth required)
curl http://localhost:8080/status
# Search files (with auth)
curl -H "X-API-Key: $API_KEY" "http://localhost:8080/search?q=invoice" | jq
# List directory (with auth)
curl -H "X-API-Key: $API_KEY" "http://localhost:8080/list?path=2023" | jq
# Download file (with auth)
curl -H "X-API-Key: $API_KEY" "http://localhost:8080/file?path=document.pdf" -o document.pdf
# Convert to PDF (with auth)
curl -H "X-API-Key: $API_KEY" "http://localhost:8080/file?path=scan.jpg&type=pdf" -o scan.pdf
# Pretty print JSON response (with auth)
curl -s -H "X-API-Key: $API_KEY" http://localhost:8080/repos | jq .
Security Considerations
Path Traversal Prevention
The server prevents directory traversal attacks:
- Paths containing .. are rejected
- Absolute paths starting with / are rejected
- All paths are resolved relative to the repository root
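Clients can apply the same rules before sending a request, which avoids a round trip that would end in "Invalid file path". The check below takes the wording above literally (any occurrence of "..", any leading "/"); it is a sketch of the documented rules, not the server's implementation, and `is_allowed_path` is a hypothetical name.

```python
def is_allowed_path(path: str) -> bool:
    """Return True if a path passes the documented traversal rules."""
    if path.startswith("/"):
        # Absolute paths are rejected.
        return False
    if ".." in path:
        # Any ".." is rejected, even inside a longer name.
        return False
    return True
```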
Network Security
For production use, consider:
1. Firewall: Restrict access to trusted IPs
2. Reverse Proxy: Use nginx/apache with SSL/TLS
3. Authentication: Add authentication layer via reverse proxy
4. Private Network: Run on private network only
File Access
- Server runs with limited user permissions
- Only configured repository paths are accessible
- No write operations are supported (read-only API)
Performance
Response Times
Typical response times on local network:
| Endpoint | Response Time | Notes |
|---|---|---|
| `/status` | < 1ms | Cached information |
| `/repos` | < 5ms | Directory metadata |
| `/search` | 10-100ms | Depends on directory size |
| `/list` | 5-50ms | Depends on directory size |
| `/file` | 10-500ms | Depends on file size |
| PDF convert | 1-30s | Depends on file size/complexity |
Caching
Three disk caches are maintained under /tmp/, all keyed by an MD5
hash of the file path and modification time. Entries expire after 7
days and are cleaned on server start.
- /tmp/paperman-thumbnails/: JPEG thumbnails generated by pdftocairo.
- /tmp/paperman-pages/: Single-page PDFs extracted from multi-page documents via page=N.
- /tmp/paperman-converted/: Full-document PDFs converted from non-PDF formats (e.g. .max) via type=pdf. Conversion uses the File class directly, so no external binary is needed. Page images are extracted sequentially, then compressed in parallel across all available CPU cores using QtConcurrent, then merged into the final PDF. If the requesting client disconnects mid-extraction, the partial file is removed.
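Keying on both path and modification time means an edited file gets a new cache entry automatically, while the stale one simply ages out. The sketch below illustrates the idea only. How the server joins the two values before hashing and which timestamp it measures expiry from are not documented here, so the ":" separator and the `entry_created` argument are assumptions.

```python
import hashlib

CACHE_TTL_SECONDS = 7 * 24 * 60 * 60  # 7-day expiry


def cache_key(file_path: str, mtime: int) -> str:
    """MD5 hex digest of path and modification time (separator is assumed)."""
    return hashlib.md5(f"{file_path}:{mtime}".encode("utf-8")).hexdigest()


def is_expired(entry_created: float, now: float) -> bool:
    """Return True if a cache entry is older than the 7-day expiry."""
    return now - entry_created > CACHE_TTL_SECONDS
```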
Troubleshooting
PDF Conversion Issues
Problem: Conversion returns error
- Solution: Check journalctl logs for detailed error messages:
sudo journalctl -u paperman-server -f
Problem: Conversion is slow for large files
- Solution: The first request converts and caches the result; subsequent requests are served from /tmp/paperman-converted/. Compression runs in parallel across all CPU cores. If the client disconnects before conversion finishes, the server aborts and cleans up.
File Access Issues
Problem: “File not found” but file exists
- Solution: Check that the file path is relative to the repository root, not absolute
Problem: “Invalid file path” error
- Solution: The path contains .. or starts with /. Use relative paths only.
Changelog
Version 1.3 (Current)
- Parallel PDF compression using QtConcurrent across all CPU cores
- Streamed file responses (512 KB chunks with flow control)
- Conversion progress reporting via progress=true
- Optional local URL for fast LAN downloads (app)
Version 1.2
- PDF conversion uses the File class directly instead of spawning a paperman subprocess
- Conversion cache (/tmp/paperman-converted/) with 7-day expiry
- Server aborts conversion when the client disconnects
- Return 500 error instead of raw file on conversion failure
Version 1.1
- Single-page PDF extraction via page=N parameter
- Page count query via pages=true parameter
- Page cache with 7-day expiry (/tmp/paperman-pages/)
Version 1.0
- Initial release
- Basic search, list, and file retrieval
- Multi-repository support
- On-the-fly PDF conversion
- Binary file download support
- Security: Path traversal prevention
- CORS enabled for web applications
Support
For issues, feature requests, or contributions:
- GitHub: https://github.com/sjg20/paperman
- Email: sjg@chromium.org
License
GPL-2 - See LICENSE file for details