Paperman Search Server API Documentation

Overview

The Paperman Search Server provides a REST API for searching, listing, and retrieving files from paper repositories. It supports on-the-fly PDF conversion from various formats.

Base URL: http://localhost:8080

Version: 1.3

All endpoints use the HTTP GET method and return JSON responses, except file downloads, which return the file content.

Authentication

The server supports optional API key authentication via the X-API-Key header.

Enabling Authentication

Set the PAPERMAN_API_KEY environment variable when starting the server:

export PAPERMAN_API_KEY="your-secret-key-here"
./paperman-server /path/to/repository

Or with systemd:

# Edit /etc/systemd/system/paperman-server.service
[Service]
Environment="PAPERMAN_API_KEY=your-secret-key-here"

Using Authentication

Once enabled, all endpoints (except /status) require the API key:

# Without API key - fails
curl "http://localhost:8080/search?q=test"
# Response: {"error":"Invalid or missing API key...","success":false}

# With API key - works
curl -H "X-API-Key: your-secret-key-here" "http://localhost:8080/search?q=test"

Authentication Behavior

  • Disabled by default: If PAPERMAN_API_KEY is not set, no authentication is required

  • Status endpoint exempt: /status always works without authentication (for health checks)

  • All other endpoints protected: When enabled, /search, /list, /file, and /repos require a valid API key

  • 401 Unauthorized: An invalid or missing API key returns HTTP 401 with a JSON error

Security Note: Always use HTTPS (SSL/TLS) when accessing the server over a network to prevent API key interception.
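
A client can mirror the authentication rules above by sending the header only when a key is configured. The following is a minimal sketch, not part of the server: auth_headers and needs_api_key are hypothetical helper names, and reading PAPERMAN_API_KEY on the client side is an assumption made for convenience.

```python
import os

# Endpoints that the documentation lists as exempt from authentication.
EXEMPT_ENDPOINTS = {"/status"}


def auth_headers(api_key=None):
    """Return request headers, adding X-API-Key only when a key is given."""
    return {"X-API-Key": api_key} if api_key else {}


def needs_api_key(endpoint, auth_enabled):
    """Hypothetical helper: True if a request to `endpoint` must carry a key."""
    return auth_enabled and endpoint not in EXEMPT_ENDPOINTS


# Assumption: the client keeps its copy of the key in the same variable name.
# An unset variable simply means that no header is sent.
headers = auth_headers(os.environ.get("PAPERMAN_API_KEY"))
```

The resulting dictionary can be passed unchanged as the headers argument of an HTTP client call.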

Common Response Format

Success Response

{
  "success": true,
  "data": "...",
  "count": 0
}

Error Response

{
  "success": false,
  "error": "Error message description"
}
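
Because every JSON error carries "success": false and an "error" string, a client can turn the envelope into an exception in one place. The sketch below illustrates this under stated assumptions: PapermanError and unwrap are hypothetical names, and a body without a "success" field (such as the /status response) is treated as successful.

```python
import json


class PapermanError(Exception):
    """Raised when a response carries "success": false (hypothetical name)."""


def unwrap(body):
    """Parse a JSON response body and return it as a dict.

    Raises PapermanError with the server's message when success is false.
    Bodies without a "success" field are returned unchanged.
    """
    payload = json.loads(body)
    if payload.get("success") is False:
        raise PapermanError(payload.get("error", "unknown error"))
    return payload
```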

Endpoints

1. Server Status

Get the current server status and repository information.

Endpoint: GET /status

Parameters: None

Response:

{
  "status": "running",
  "repository": "/path/to/repository"
}

Example:

curl http://localhost:8080/status

2. List Repositories

Get a list of all configured repositories.

Endpoint: GET /repos

Parameters: None

Response:

{
  "success": true,
  "count": 2,
  "repositories": [
    {
      "path": "/home/user/papers",
      "name": "papers",
      "exists": true
    },
    {
      "path": "/home/user/archive",
      "name": "archive",
      "exists": true
    }
  ]
}

Example:

curl http://localhost:8080/repos
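
The "exists" flag lets a client skip repositories whose directory is missing before passing a name as the repo parameter. A small sketch, assuming the response has already been parsed into a dict (available_repos is a hypothetical helper):

```python
def available_repos(payload):
    """Return the names of repositories whose "exists" flag is true."""
    return [
        repo["name"]
        for repo in payload.get("repositories", [])
        if repo.get("exists")
    ]


# Sample payload in the shape shown above; the second entry is marked
# as missing purely for illustration.
sample = {
    "success": True,
    "count": 2,
    "repositories": [
        {"path": "/home/user/papers", "name": "papers", "exists": True},
        {"path": "/home/user/archive", "name": "archive", "exists": False},
    ],
}
usable = available_repos(sample)
```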

3. Search Files

Search for files matching a pattern in the repository.

Endpoint: GET /search

Parameters:

Parameter  Type     Required  Default  Description
q          string   Yes       -        Search pattern (partial filename match)
repo       string   No        First    Repository name to search in
path       string   No        Root     Directory path to search in (relative to root)
recursive  boolean  No        false    Search subdirectories

Response:

{
  "success": true,
  "pattern": "invoice",
  "path": "/home/user/papers",
  "count": 2,
  "files": [
    {
      "name": "invoice-2023-01.pdf",
      "path": "invoices/invoice-2023-01.pdf",
      "size": 45632,
      "modified": "2023-01-15T10:30:00"
    },
    {
      "name": "invoice-2023-02.pdf",
      "path": "invoices/invoice-2023-02.pdf",
      "size": 52441,
      "modified": "2023-02-12T14:22:00"
    }
  ]
}

Examples:

# Basic search
curl "http://localhost:8080/search?q=invoice"

# Search in specific repository
curl "http://localhost:8080/search?q=invoice&repo=papers"

# Search in subdirectory
curl "http://localhost:8080/search?q=report&path=2023"

# Recursive search
curl "http://localhost:8080/search?q=contract&recursive=true"

Notes:

  • Pattern matching is case-insensitive

  • Searches for partial filename matches

  • Only returns files with supported extensions (.max, .pdf, .jpg, .tiff)
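
Search patterns and paths may contain spaces or slashes, so it is safer to build the query string with a URL encoder than by hand. The sketch below uses only the Python standard library; search_url is a hypothetical helper, and percent-encoding spaces as %20 is a choice made here, not something the server is documented to require.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8080"


def search_url(q, repo=None, path=None, recursive=False, base=BASE_URL):
    """Build a /search URL, omitting optional parameters left at their defaults."""
    params = {"q": q}
    if repo:
        params["repo"] = repo
    if path:
        params["path"] = path
    if recursive:
        params["recursive"] = "true"
    # quote_via=quote encodes spaces as %20 rather than "+".
    return f"{base}/search?{urlencode(params, quote_via=quote)}"
```

For example, search_url("invoice", repo="papers") yields the same URL as the second curl example above.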


4. List Directory Contents

List all files in a specific directory.

Endpoint: GET /list

Parameters:

Parameter  Type    Required  Default  Description
path       string  No        Root     Directory path (relative to repository)
repo       string  No        First    Repository name

Response:

{
  "success": true,
  "path": "invoices",
  "count": 2,
  "files": [
    {
      "name": "invoice-2023-01.pdf",
      "path": "invoices/invoice-2023-01.pdf",
      "size": 45632,
      "modified": "2023-01-15T10:30:00"
    },
    {
      "name": "invoice-2023-02.pdf",
      "path": "invoices/invoice-2023-02.pdf",
      "size": 52441,
      "modified": "2023-02-12T14:22:00"
    }
  ]
}

Examples:

# List root directory
curl "http://localhost:8080/list"

# List subdirectory
curl "http://localhost:8080/list?path=invoices"

# List in specific repository
curl "http://localhost:8080/list?path=2023&repo=archive"
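
The "size" and "modified" fields in each file entry are enough to summarise a listing on the client. A minimal sketch, assuming the timestamps are always in the ISO 8601 form shown in the sample response (summarize is a hypothetical helper):

```python
from datetime import datetime


def summarize(payload):
    """Return (total_bytes, newest_name) for a /list or /search response."""
    files = payload.get("files", [])
    total = sum(entry["size"] for entry in files)
    # Compare parsed timestamps rather than raw strings.
    newest = max(
        files,
        key=lambda entry: datetime.fromisoformat(entry["modified"]),
        default=None,
    )
    return total, (newest["name"] if newest else None)


sample = {
    "files": [
        {"name": "invoice-2023-01.pdf", "size": 45632,
         "modified": "2023-01-15T10:30:00"},
        {"name": "invoice-2023-02.pdf", "size": 52441,
         "modified": "2023-02-12T14:22:00"},
    ]
}
total_bytes, newest_name = summarize(sample)
```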

5. Get File Content

Retrieve a file’s content, optionally converting it to PDF.

Endpoint: GET /file

Parameters:

Parameter  Type    Required  Default   Description
path       string  Yes       -         File path (relative to repository)
repo       string  No        First     Repository name
type       string  No        original  Output type: original or pdf
page       int     No        0         Extract a single page from a PDF (1-based). Returns a standalone single-page PDF.
pages      string  No        -         Set to true to return the page count as JSON instead of file content. PDF files only.

Response:

  • Success: Binary file content with appropriate Content-Type header

  • Error: JSON error response

When pages=true is given, the response is JSON:

{
  "success": true,
  "pages": 5
}

When page=N is given, the response is a single-page PDF (application/pdf). Extracted pages are cached in /tmp/paperman-pages/ with the same 7-day expiry as thumbnails.
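
A viewer that wants to load a document page by page can first request pages=true and then fetch each page separately. The sketch below only builds the URLs; page_urls is a hypothetical helper, and the page count is assumed to come from the "pages" field of the JSON response shown above.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8080"


def page_urls(path, page_count, base=BASE_URL):
    """Return one /file URL per page, numbered from 1 as the API expects."""
    urls = []
    for number in range(1, page_count + 1):
        query = urlencode({"path": path, "page": number}, quote_via=quote)
        urls.append(f"{base}/file?{query}")
    return urls
```

Fetching the first URL before the rest gives the fast initial display mentioned in the examples.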

Content-Type Headers:

  • .pdf → application/pdf

  • .jpg, .jpeg → image/jpeg

  • .tif, .tiff → image/tiff

  • .max → application/octet-stream

  • PDF conversion → application/pdf
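
A client can use the mapping above to check that a download returned the expected kind of content rather than a JSON error. A minimal sketch: expected_content_type is a hypothetical helper, and matching extensions case-insensitively is an assumption about the server.

```python
import os

# Extension to Content-Type mapping, as listed in the documentation.
CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
    ".max": "application/octet-stream",
}


def expected_content_type(path, output_type="original"):
    """Return the Content-Type expected for a /file request, or None."""
    if output_type == "pdf":
        return "application/pdf"
    extension = os.path.splitext(path)[1].lower()
    return CONTENT_TYPES.get(extension)
```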

Examples:

# Download original file
curl "http://localhost:8080/file?path=invoice.pdf" -o invoice.pdf

# Download from specific repository
curl "http://localhost:8080/file?path=document.pdf&repo=archive" -o document.pdf

# Convert JPEG to PDF on-the-fly
curl "http://localhost:8080/file?path=scan.jpg&type=pdf" -o scan.pdf

# Convert .max file to PDF
curl "http://localhost:8080/file?path=document.max&type=pdf" -o document.pdf

# Get page count for a PDF
curl "http://localhost:8080/file?path=document.pdf&pages=true"

# Download just page 1 (for fast initial display)
curl "http://localhost:8080/file?path=document.pdf&page=1" -o page1.pdf

PDF Conversion:

  • Supports: .max, .jpg, .jpeg, .tif, .tiff

  • Conversion timeout: 30 seconds

  • Uses paperman’s built-in conversion engine

  • Maintains image quality and metadata
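
Since conversion can take up to 30 seconds, a client may want to check the extension first and allow a longer timeout for type=pdf requests. The sketch below is one possible policy, not a documented requirement: can_convert and client_timeout are hypothetical helpers, and the 5-second margin and 10-second default are arbitrary illustrative values.

```python
import os

# Extensions the documentation lists as convertible to PDF.
CONVERTIBLE = {".max", ".jpg", ".jpeg", ".tif", ".tiff"}

# Server-side conversion limit in seconds, taken from the documentation.
SERVER_CONVERSION_LIMIT = 30


def can_convert(path):
    """True if the file extension is one that type=pdf is documented to accept."""
    return os.path.splitext(path)[1].lower() in CONVERTIBLE


def client_timeout(output_type="original", margin=5, default=10):
    """Assumed policy: wait slightly longer than the server's own limit."""
    if output_type == "pdf":
        return SERVER_CONVERSION_LIMIT + margin
    return default
```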

Error Responses:

// File not found
{
  "success": false,
  "error": "File not found"
}

// Invalid path (directory traversal attempt)
{
  "success": false,
  "error": "Invalid file path"
}

// Conversion failed
{
  "success": false,
  "error": "PDF conversion failed: <error details>"
}

// Conversion timeout
{
  "success": false,
  "error": "PDF conversion timed out (30s limit)"
}

Supported File Types

The server handles the following file types:

Extension    Description      PDF Conversion  Direct View
.max         Paperman format  Yes
.pdf         PDF document     N/A
.jpg, .jpeg  JPEG image       Yes
.tif, .tiff  TIFF image       Yes


Error Codes

HTTP Code  Description            Common Causes
200        OK                     Request successful
400        Bad Request            Invalid path, missing parameters
401        Unauthorized           Invalid or missing API key
404        Not Found              File/endpoint not found
405        Method Not Allowed     Non-GET request
500        Internal Server Error  Conversion failed, file read error
501        Not Implemented        Unsupported conversion (deprecated)
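
The status codes above can drive a simple client-side decision about whether repeating a request is worthwhile. The sketch below shows one assumed policy (retry only server-side failures other than 501); the server documentation does not prescribe any retry behaviour, and should_retry is a hypothetical helper.

```python
# Status codes and descriptions, as listed in the documentation.
STATUS_MEANINGS = {
    200: "OK",
    400: "Bad Request",
    401: "Unauthorized",
    404: "Not Found",
    405: "Method Not Allowed",
    500: "Internal Server Error",
    501: "Not Implemented",
}


def should_retry(code):
    """Assumed policy: 4xx errors need a changed request, so never retry them;
    5xx errors may be transient, except 501, which cannot succeed on retry."""
    return code >= 500 and code != 501


def describe(code):
    """Return the documented description for a status code, if any."""
    return STATUS_MEANINGS.get(code, "Unknown status")
```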


CORS

All endpoints include CORS headers:

Access-Control-Allow-Origin: *

This allows web applications from any origin to access the API.


Rate Limiting

Currently, no rate limiting is implemented. The server is designed for trusted local or network use.


Examples

JavaScript/Fetch API

const API_KEY = 'your-secret-key-here';  // Set if authentication is enabled

// Search for files
fetch('http://localhost:8080/search?q=invoice', {
  headers: {
    'X-API-Key': API_KEY  // Include if auth enabled
  }
})
  .then(response => response.json())
  .then(data => {
    console.log(`Found ${data.count} files`);
    data.files.forEach(file => {
      console.log(`- ${file.name} (${file.size} bytes)`);
    });
  });

// Download a file
fetch('http://localhost:8080/file?path=document.pdf', {
  headers: {
    'X-API-Key': API_KEY
  }
})
  .then(response => response.blob())
  .then(blob => {
    const url = URL.createObjectURL(blob);
    const a = document.createElement('a');
    a.href = url;
    a.download = 'document.pdf';
    a.click();
  });

// Convert to PDF
fetch('http://localhost:8080/file?path=scan.jpg&type=pdf', {
  headers: {
    'X-API-Key': API_KEY
  }
})
  .then(response => response.blob())
  .then(blob => {
    const url = URL.createObjectURL(blob);
    window.open(url, '_blank');
  });

Python

import requests

API_KEY = 'your-secret-key-here'  # Set if authentication is enabled
headers = {'X-API-Key': API_KEY}  # Include if auth enabled

# Search for files
response = requests.get('http://localhost:8080/search',
                       params={'q': 'invoice'},
                       headers=headers)
response.raise_for_status()  # Stop on HTTP errors such as 401
data = response.json()
print(f"Found {data['count']} files")

# Download a file
response = requests.get('http://localhost:8080/file',
                       params={'path': 'document.pdf'},
                       headers=headers)
response.raise_for_status()  # Avoid saving a JSON error body as the file
with open('document.pdf', 'wb') as f:
    f.write(response.content)

# Convert to PDF
response = requests.get('http://localhost:8080/file',
                       params={'path': 'scan.jpg', 'type': 'pdf'},
                       headers=headers)
response.raise_for_status()  # Conversion failures return an error status
with open('scan.pdf', 'wb') as f:
    f.write(response.content)

cURL

# Set API key if authentication is enabled
API_KEY="your-secret-key-here"

# Get server status (no auth required)
curl http://localhost:8080/status

# Search files (with auth)
curl -H "X-API-Key: $API_KEY" "http://localhost:8080/search?q=invoice" | jq

# List directory (with auth)
curl -H "X-API-Key: $API_KEY" "http://localhost:8080/list?path=2023" | jq

# Download file (with auth)
curl -H "X-API-Key: $API_KEY" "http://localhost:8080/file?path=document.pdf" -o document.pdf

# Convert to PDF (with auth)
curl -H "X-API-Key: $API_KEY" "http://localhost:8080/file?path=scan.jpg&type=pdf" -o scan.pdf

# Pretty print JSON response (with auth)
curl -s -H "X-API-Key: $API_KEY" http://localhost:8080/repos | jq .

Security Considerations

Path Traversal Prevention

The server prevents directory traversal attacks:

  • Paths containing .. are rejected

  • Absolute paths starting with / are rejected

  • All paths are resolved relative to the repository root
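
A client can apply the same two rules before sending a request, which avoids a round trip that would end in "Invalid file path". This is a sketch of the documented rules only, not the server's code: is_acceptable_path is a hypothetical helper, and treating any ".." substring as invalid is the strictest reading of "containing ..".

```python
def is_acceptable_path(path):
    """Return True if `path` passes both documented checks.

    Rejects absolute paths (leading "/") and any path containing "..".
    """
    return not path.startswith("/") and ".." not in path
```

A path such as invoices/invoice-2023-01.pdf passes, while ../etc/passwd and /etc/passwd do not.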

Network Security

For production use, consider:

  1. Firewall: Restrict access to trusted IPs

  2. Reverse Proxy: Use nginx/apache with SSL/TLS

  3. Authentication: Add authentication layer via reverse proxy

  4. Private Network: Run on private network only

File Access

  • Server runs with limited user permissions

  • Only configured repository paths are accessible

  • No write operations are supported (read-only API)


Performance

Response Times

Typical response times on local network:

Endpoint     Response Time  Notes
/status      < 1ms          Cached information
/repos       < 5ms          Directory metadata
/search      10-100ms       Depends on directory size
/list        5-50ms         Depends on directory size
/file        10-500ms       Depends on file size
PDF convert  1-30s          Depends on file size/complexity

Caching

Three disk caches are maintained under /tmp/, all keyed by an MD5 hash of the file path and modification time. Entries expire after 7 days and are cleaned on server start.

/tmp/paperman-thumbnails/

JPEG thumbnails generated by pdftocairo.

/tmp/paperman-pages/

Single-page PDFs extracted from multi-page documents via page=N.

/tmp/paperman-converted/

Full-document PDFs converted from non-PDF formats (e.g. .max) via type=pdf. Conversion uses the File class directly, so no external binary is needed. Page images are extracted sequentially, then compressed in parallel across all available CPU cores using QtConcurrent, then merged into the final PDF. If the requesting client disconnects mid-extraction the partial file is removed.
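
The keying scheme means that a modified file gets a new cache entry automatically, because its modification time changes the hash. The sketch below illustrates the idea only: the exact string that the server hashes is not documented, so the "path:mtime" layout is an assumption, as is measuring expiry from the entry's creation time. Both function names are hypothetical.

```python
import hashlib
import time

# Seven days, as stated for all three caches.
EXPIRY_SECONDS = 7 * 24 * 60 * 60


def cache_key(path, mtime):
    """MD5 over the file path and modification time.

    Assumption: the two values are joined as "path:mtime"; the server's
    real layout may differ.
    """
    return hashlib.md5(f"{path}:{int(mtime)}".encode("utf-8")).hexdigest()


def is_expired(created_at, now=None):
    """True if an entry created at `created_at` is older than 7 days."""
    now = time.time() if now is None else now
    return now - created_at > EXPIRY_SECONDS
```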


Troubleshooting

PDF Conversion Issues

Problem: Conversion returns error

Solution: Check journalctl logs for detailed error messages:

sudo journalctl -u paperman-server -f

Problem: Conversion is slow for large files

Solution: The first request converts and caches the result; subsequent requests are served from /tmp/paperman-converted/. Compression runs in parallel across all CPU cores. If the client disconnects before conversion finishes, the server aborts and cleans up.

File Access Issues

Problem: “File not found” but the file exists

Solution: Check that the file path is relative to the repository root, not absolute

Problem: “Invalid file path” error

Solution: The path contains .. or starts with /. Use relative paths only.


Changelog

Version 1.3 (Current)

  • Parallel PDF compression using QtConcurrent across all CPU cores

  • Streamed file responses (512 KB chunks with flow control)

  • Conversion progress reporting via progress=true

  • Optional local URL for fast LAN downloads (app)

Version 1.2

  • PDF conversion uses the File class directly instead of spawning a paperman subprocess

  • Conversion cache (/tmp/paperman-converted/) with 7-day expiry

  • Server aborts conversion when the client disconnects

  • Return 500 error instead of raw file on conversion failure

Version 1.1

  • Single-page PDF extraction via page=N parameter

  • Page count query via pages=true parameter

  • Page cache with 7-day expiry (/tmp/paperman-pages/)

Version 1.0

  • Initial release

  • Basic search, list, and file retrieval

  • Multi-repository support

  • On-the-fly PDF conversion

  • Binary file download support

  • Security: Path traversal prevention

  • CORS enabled for web applications


Support

For issues, feature requests, or contributions:

  • GitHub: https://github.com/sjg20/paperman

  • Email: sjg@chromium.org


License

GPL-2 - See LICENSE file for details