Command-line Interface
Paperman provides command-line tools for converting between file formats and performing batch operations without the GUI.
Basic Usage
paperman [options] <file-or-directory>
Conversion Options
-m, --max <file>: Convert the given file to .max format. For PDFs with 8 or more pages, the conversion runs in parallel using multiple processes for faster rendering.
-p, --pdf <file>: Convert the given file to .pdf format.
-j, --jpeg <file>: Convert the given file to .jpg format.
--output <file>: Write the converted output to the specified path instead of the current directory. Works with -m, -p and -j.
--page-range S:E: Convert only pages S to E (1-based, inclusive). For example, --page-range 1:10 converts the first ten pages.
--jobs N: Set the number of parallel worker processes for -m conversion. The default (0) auto-detects the CPU count, limited so that each worker gets at least 10 pages. Use --jobs 1 to force single-process conversion.
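To make the 1-based, inclusive meaning of --page-range concrete, here is a small Python sketch. The helper names parse_page_range and page_count are hypothetical and are not part of Paperman; they only show how an S:E value maps to a number of pages, and Paperman's own parsing may differ.

```python
from typing import Tuple


def parse_page_range(spec: str) -> Tuple[int, int]:
    """Parse an 'S:E' string into (start, end), both 1-based and inclusive.

    Hypothetical helper for illustration only.
    """
    start_text, sep, end_text = spec.partition(":")
    if not sep:
        raise ValueError(f"expected S:E, got {spec!r}")
    start, end = int(start_text), int(end_text)
    if start < 1 or end < start:
        raise ValueError(f"invalid page range {spec!r}")
    return start, end


def page_count(start: int, end: int) -> int:
    # Both ends are included, so 1:10 covers ten pages and 20:30 covers eleven.
    return end - start + 1
```

Under this reading, --page-range 20:30 selects eleven pages, not ten.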
Other Options
-s, --sum <dir>: Compute MD5 checksums for all files in a directory.
-o, --ocr <dir>: Run OCR on all .max files in a directory (recursive).
-q, --search <query>: Search the OCR index for a query string.
-t, --test: Run the built-in unit tests.
-h, --help: Display usage information.
Parallel PDF-to-max Conversion
Converting large PDFs to .max format is CPU-intensive because each page must
be rendered via Poppler’s renderToImage(), which takes roughly 55 ms per
page. Since Poppler serialises rendering within a single process, threads do
not help.
Paperman works around this by spawning multiple child processes, each with its
own Poppler Document instance, so rendering happens truly in parallel.
How it works
When paperman -m big.pdf detects 8 or more pages:
1. The parent determines the worker count: min(cpu_count, page_count / 10), overridable with --jobs
2. Pages are split into roughly equal ranges across N workers
3. Each worker runs as a separate process: paperman -m big.pdf --page-range S:E --output /tmp/.../partN.max
4. Workers run with QT_QPA_PLATFORM=offscreen so no display is needed
5. Once all workers finish, the parent merges the partial .max files using stackStack(), which copies compressed chunks directly, with no decompression or re-encoding
6. Temporary files are cleaned up automatically
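The planning steps above can be sketched in Python. This is an illustration under stated assumptions, not Paperman's actual implementation: all function names are hypothetical, the division is taken to be integer division with a floor of one worker, the first ranges absorb any leftover pages, and part files are numbered from 0. Only the documented flags and the QT_QPA_PLATFORM variable come from the text.

```python
import os
from typing import Dict, List, Optional, Tuple


def worker_count(pages: int, jobs: int = 0, cpus: Optional[int] = None) -> int:
    """min(cpu_count, pages / 10) unless --jobs overrides it.

    Assumptions: integer division, and never fewer than one worker.
    """
    if jobs > 0:
        return max(1, min(jobs, pages))
    if cpus is None:
        cpus = os.cpu_count() or 1
    return max(1, min(cpus, pages // 10))


def split_pages(pages: int, workers: int) -> List[Tuple[int, int]]:
    """Split pages 1..pages into contiguous, roughly equal inclusive ranges."""
    base, extra = divmod(pages, workers)
    ranges = []
    start = 1
    for i in range(workers):
        size = base + (1 if i < extra else 0)  # first `extra` ranges take one more page
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def worker_commands(pdf: str, ranges: List[Tuple[int, int]], tmp_dir: str) -> List[List[str]]:
    """Build one command line per worker from the documented flags."""
    return [
        ["paperman", "-m", pdf, "--page-range", f"{s}:{e}",
         "--output", f"{tmp_dir}/part{i}.max"]  # numbering from 0 is an assumption
        for i, (s, e) in enumerate(ranges)
    ]


def worker_env() -> Dict[str, str]:
    # Workers render without a display, so copy the environment and set the Qt platform.
    return dict(os.environ, QT_QPA_PLATFORM="offscreen")
```

For a 100-page PDF on a 16-core machine, worker_count returns 10 and split_pages yields ten ranges of ten pages each, from 1:10 through 91:100.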
Each worker gets at least 10 pages so that process-spawn overhead does not dominate on high-core machines. At ~55 ms per page, a 100-page PDF uses up to 10 workers of 10 pages each (about 0.55 seconds of rendering per worker) and completes in under a second of wall time, rather than ~5.5 seconds sequentially.
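The arithmetic behind that estimate can be checked with a short Python sketch. It models rendering time only, using the approximate 55 ms figure quoted above; process-spawn and merge overhead are not quantified here and are left out, and the function name is hypothetical.

```python
import math

MS_PER_PAGE = 55  # approximate per-page render time quoted in the text


def estimated_render_seconds(pages: int, workers: int) -> float:
    """Rendering time only: the slowest worker handles ceil(pages / workers) pages."""
    slowest = math.ceil(pages / workers)
    return slowest * MS_PER_PAGE / 1000
```

With these numbers, 100 pages on one worker come to 5.5 seconds, and on 10 workers to 0.55 seconds per worker, which leaves headroom for overhead within the one-second figure.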
Examples
Convert a PDF using automatic parallelism:
paperman -m document.pdf
Force sequential (single-process) conversion:
paperman -m document.pdf --jobs 1
Convert with 4 workers, output to a specific path:
paperman -m document.pdf --jobs 4 --output /path/to/output.max
Convert only pages 20-30:
paperman -m document.pdf --page-range 20:30 --output pages20-30.max
Convert a .max file to PDF at a specific path:
paperman -p document.max --output /path/to/output.pdf
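The same flags can be driven from a script when many files need converting. The Python sketch below only builds the command lines, one per PDF, using the documented -m and --output options; the function name and the directory layout are assumptions made for illustration.

```python
from pathlib import Path
from typing import List


def batch_commands(src_dir: str, out_dir: str) -> List[List[str]]:
    """Build one 'paperman -m' command per PDF in src_dir, targeting out_dir."""
    commands = []
    for pdf in sorted(Path(src_dir).glob("*.pdf")):
        target = Path(out_dir) / (pdf.stem + ".max")
        commands.append(["paperman", "-m", str(pdf), "--output", str(target)])
    return commands
```

Each returned list can be passed to subprocess.run(cmd, check=True) to perform the conversion, assuming paperman is on the PATH.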
Testing
# Run integration tests
make test-parallel
# Run unit tests (includes testImageDepth)
paperman -t