
Your AI Can't Read Documents. This Fixes That.

The AI memory system that unlocks infinite context. One command. 154 tools. Every PDF, contract, and report -- searchable, comparable, and verifiable. Runs 100% on your hardware. Ships with 1,150 Alex Hormozi transcripts and books, ready to search.

$ npx -y ocr-provenance-mcp install

No API keys. No cloud. Your data never leaves your machine.

Works with Claude Code, Claude Desktop, Cursor, and Windsurf -- automatically.

GPU-accelerated with NVIDIA. CPU mode for .md and .txt files -- free processing, no GPU needed.

154 MCP Tools
5 AI Models
100% Local
GPU Accelerated
3,700+ Tests Passing

You Have Hundreds of Documents. Your AI Can't Touch Any of Them.

  • You copy-paste text from PDFs into chat windows
  • You manually search through folders to find what you need
  • You can't prove where extracted data came from
  • You have no way to search semantically across your document corpus
  • You lose hours doing work that should take seconds
"Every hour you spend doing this manually is an hour you're not spending on work that actually grows your business."

One Install. Your AI Gets 154 Tools.

  • Copy-paste text from PDFs --> "Search my 200 contracts for non-compete clauses" -- done in 200ms
  • Manually compare document versions --> Structured diff with every change flagged by significance
  • Can't prove data origin --> SHA-256 cryptographic provenance chain, W3C PROV export
  • No semantic search --> Hybrid search: BM25 + 768-dim vectors + cross-encoder reranking
  • Hours of manual work --> Full pipeline: 1,150 docs processed in ~3 minutes
  • Send documents to cloud APIs --> 100% local. Your hardware. Your data.
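The hybrid search item above combines a keyword ranking, a vector ranking, and a reranker. The page does not say how the BM25 and vector results are merged, so this is a minimal sketch assuming reciprocal rank fusion (RRF, with the commonly used k = 60) followed by a reranking pass. `rrf_fuse` and `hybrid_search` are hypothetical names, and the `rerank` callable is a placeholder standing in for the real cross-encoder (ms-marco-MiniLM-L-12-v2).

```python
from typing import Callable


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: each ranking adds 1 / (k + rank) to a doc's score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def hybrid_search(
    bm25_ranking: list[str],
    vector_ranking: list[str],
    rerank: Callable[[str], float],
    top_n: int = 3,
) -> list[str]:
    """Fuse keyword and vector rankings, then rerank a short candidate list."""
    candidates = rrf_fuse([bm25_ranking, vector_ranking])[: top_n * 2]
    # Placeholder for a cross-encoder score: higher means more relevant.
    return sorted(candidates, key=rerank, reverse=True)[:top_n]


if __name__ == "__main__":
    bm25 = ["c3", "c1", "c7"]        # chunk ids ranked by keyword match
    vectors = ["c1", "c9", "c3"]     # chunk ids ranked by embedding similarity
    toy_scores = {"c1": 0.9, "c3": 0.4, "c9": 0.7, "c7": 0.1}
    print(hybrid_search(bm25, vectors, rerank=lambda d: toy_scores.get(d, 0.0)))
    # ['c1', 'c9', 'c3']
```

RRF only needs ranks, not raw scores, which sidesteps the problem that BM25 scores and cosine similarities live on different scales.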


See Exactly What It Does

Included: Alex Hormozi Business Strategy Super Skill -- Every install ships with a pre-processed, fully searchable database of 1,147 Alex Hormozi YouTube video transcripts (his most recent year of content) plus all 3 of his books ($100M Offers, $100M Leads, Gym Launch Secrets), 1,150 documents in total. That's 2.6+ million tokens of business strategy context, more than fits in any context window. This is what we call a Super Skill: a skill that requires more context than a single context window can hold. Point your AI at this database and ask "What would Hormozi do?" about pricing, offers, lead gen, copywriting, or anything else. It is ready to search the moment you install, and no credits are needed.

  • OCR Speed: ~2-5 sec/page
  • Embedding: ~12ms/chunk
  • Semantic Search: <100ms
  • Hybrid Search: <200ms
  • Full Pipeline (1,150 docs): ~3 min
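These figures translate into rough time budgets with simple arithmetic. A quick sketch, assuming per-page OCR time stays within the quoted ~2-5 s range and pages are processed one after another (the page does not describe batching or parallelism); the helper names are illustrative only.

```python
def ocr_minutes(pages: int, low_s: float = 2.0, high_s: float = 5.0) -> tuple[float, float]:
    """Best/worst-case OCR wall time in minutes, assuming pages run sequentially."""
    return pages * low_s / 60, pages * high_s / 60


def pipeline_docs_per_second(docs: int, minutes: float) -> float:
    """Average throughput implied by a quoted end-to-end pipeline time."""
    return docs / (minutes * 60)


if __name__ == "__main__":
    best, worst = ocr_minutes(300)  # e.g. a 300-page batch of scanned contracts
    print(f"300 pages: {best:.0f}-{worst:.0f} min")            # 300 pages: 10-25 min
    print(f"{pipeline_docs_per_second(1150, 3.0):.1f} docs/s")  # 6.4 docs/s
```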


154 Tools Across 18 Categories. All Included.

  • Document Ingestion (7 tools): Ingest files, directories. Process, retry, reprocess.
  • Search (7 tools): Keyword, semantic, hybrid. Cross-database. RAG context.
  • Document Management (10 tools): List, view, delete, deduplicate, version history, structure.
  • Provenance Tracking (6 tools): Full chain-of-custody. SHA-256. W3C PROV export.
  • Vision AI / VLM (3 tools): Describe images, charts, diagrams -- local Chandra VLM.
  • Image Processing (9 tools): Extract, search, reanalyze, stats.
  • Embeddings (4 tools): 768-dim vectors with nomic-embed-text-v1.5.
  • Document Comparison (6 tools): Side-by-side diff. Batch compare. Similarity matrix.
  • Clustering (7 tools): Auto-cluster by similarity. No parameter tuning needed.
  • Contract Lifecycle (9 tools): Clauses, obligations, calendar, playbooks, summaries.
  • Compliance & Audit (5 tools): SOC 2, HIPAA, SOX. Full audit trail.
  • Collaboration (11 tools): Annotations, locking, alerts, review workflows.
  • Workflow & Approvals (8 tools): Multi-step chains. Assignment. Queue management.
  • Database Management (20 tools): Multi-DB. Backup, restore, clone, merge, snapshot, share.
  • Tags & Organization (6 tools): Create, apply, search tags across everything.
  • Reports & Analytics (8 tools): Quality, cost, performance, error, trend analysis.
  • Intelligence (5 tools): Interactive guide. Table extraction. Smart recommendations.
  • System (33 tools): Health, config, maintenance, license, dashboard, webhooks.
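The Document Comparison tools are described as a structured diff with each change flagged by significance. The page does not say how significance is scored, so this sketch assumes a simple, hypothetical rule (any change touching a number, such as an amount, date, or term length, is "high") on top of Python's standard `difflib.SequenceMatcher`. `structured_diff`, `classify`, and `Change` are illustrative names, not the tool's API.

```python
import difflib
from dataclasses import dataclass


@dataclass
class Change:
    kind: str            # difflib opcode: "replace", "delete", or "insert"
    old: list[str]       # affected lines in the earlier version
    new: list[str]       # affected lines in the later version
    significance: str    # "high", "medium", or "low" (hypothetical scale)


def classify(old: list[str], new: list[str]) -> str:
    """Hypothetical rule: edits touching digits (amounts, dates, terms) rank highest."""
    if any(ch.isdigit() for line in old + new for ch in line):
        return "high"
    return "medium" if len(old) + len(new) > 2 else "low"


def structured_diff(old_text: str, new_text: str) -> list[Change]:
    """Line-level diff of two versions, one Change per non-equal opcode."""
    old_lines, new_lines = old_text.splitlines(), new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        old, new = old_lines[i1:i2], new_lines[j1:j2]
        changes.append(Change(tag, old, new, classify(old, new)))
    return changes


if __name__ == "__main__":
    v1 = "Term: 12 months\nGoverning law: Delaware\nNotices by email"
    v2 = "Term: 24 months\nGoverning law: Delaware\nNotices by courier"
    for change in structured_diff(v1, v2):
        print(change.significance, change.old, "->", change.new)
```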

"Other document tools give you OCR. We give you OCR + semantic search + vision AI + provenance + compliance + clustering + contract lifecycle + collaboration + workflow + analytics. All local. All in one install."

How It Works

Watch the full walkthrough (15-minute demo).
Model                    | Purpose                                | VRAM
Marker-pdf v1.10.2       | Document OCR with layout preservation  | 8-10 GB
Chandra v0.1.8           | Vision AI -- images, charts, diagrams  | ~18 GB
nomic-embed-text-v1.5    | 768-dim semantic embeddings            | 2-3 GB
HDBSCAN                  | Auto-clustering by similarity          | None (runs on CPU)
ms-marco-MiniLM-L-12-v2  | Cross-encoder reranking                | ~1 GB
DOCUMENT --> OCR_RESULT --> CHUNK --> EMBEDDING
                       --> IMAGE --> VLM_DESC --> EMBEDDING
     ^ SHA-256 hash at every node
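The flow above notes a SHA-256 hash at every node. One common way to build such a chain, sketched here as an assumption rather than the project's actual schema, is to hash each node together with its parent's hash, so any change to the source document invalidates everything derived from it. `ProvNode`, `node_hash`, and `verify_chain` are hypothetical names.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field
from typing import Optional


def node_hash(kind: str, parent_digest: str, content: bytes) -> str:
    """SHA-256 over the node type, the parent's digest, and the node's own bytes."""
    h = hashlib.sha256()
    for part in (kind.encode(), parent_digest.encode(), content):
        h.update(len(part).to_bytes(8, "big"))  # length prefix keeps fields unambiguous
        h.update(part)
    return h.hexdigest()


@dataclass
class ProvNode:
    kind: str                            # e.g. "DOCUMENT", "OCR_RESULT", "CHUNK"
    content: bytes
    parent: Optional[ProvNode] = None
    digest: str = field(init=False)

    def __post_init__(self) -> None:
        parent_digest = self.parent.digest if self.parent else ""
        self.digest = node_hash(self.kind, parent_digest, self.content)


def verify_chain(node: Optional[ProvNode]) -> bool:
    """Walk back to the source document, recomputing every stored digest."""
    while node is not None:
        parent_digest = node.parent.digest if node.parent else ""
        if node_hash(node.kind, parent_digest, node.content) != node.digest:
            return False
        node = node.parent
    return True


if __name__ == "__main__":
    doc = ProvNode("DOCUMENT", b"%PDF-1.7 ...")
    ocr = ProvNode("OCR_RESULT", b"Term: 12 months", parent=doc)
    chunk = ProvNode("CHUNK", b"Term: 12 months", parent=ocr)
    print(verify_chain(chunk))   # True
    doc.content = b"tampered"
    print(verify_chain(chunk))   # False: the source no longer matches its hash
```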

Your Data Never Leaves Your Machine. Ever.

  • 100% local processing -- all inference on YOUR hardware
  • Ed25519-signed license tokens -- cryptographic offline verification
  • SHA-256 provenance chains -- every extraction linked to its source
  • HMAC-signed balances -- tamper detection on all billing
  • Container hardening -- cap-drop=ALL, no-new-privileges, non-root
  • Secret isolation -- signing keys stripped from the dashboard process
  • Zod schema validation on all 154 tool inputs
  • Directory traversal prevention on all file operations
  • Zero telemetry -- no analytics, no tracking, no phone-home
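The HMAC-signed balances item describes tamper detection on billing records. As an illustration only (the page does not document the record format or how keys are stored), this sketch signs a balance with Python's standard `hmac` module and rejects any record whose amount was edited without the key. The field names, `sign_balance`, `verify_balance`, and the demo key are hypothetical.

```python
import hashlib
import hmac
import json


def _canonical(payload: dict) -> bytes:
    # Sorted keys and no whitespace so identical data always produces identical bytes.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()


def sign_balance(secret: bytes, account: str, cents: int) -> dict:
    """Return a balance record with an HMAC-SHA256 tag over its fields."""
    record = {"account": account, "balance_cents": cents}  # integer cents, no float rounding
    record["mac"] = hmac.new(secret, _canonical(record), hashlib.sha256).hexdigest()
    return record


def verify_balance(secret: bytes, record: dict) -> bool:
    """Recompute the tag over every field except 'mac' and compare in constant time."""
    unsigned = {k: v for k, v in record.items() if k != "mac"}
    expected = hmac.new(secret, _canonical(unsigned), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record.get("mac", ""))


if __name__ == "__main__":
    key = b"demo-key-not-for-production"
    rec = sign_balance(key, "acct_1", 10_000)
    print(verify_balance(key, rec))                                 # True
    print(verify_balance(key, dict(rec, balance_cents=1_000_000)))  # False
```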

"We don't need to guarantee your data stays private. There's no cloud to send it to. The processing happens on your GPU, in a container, on your machine. That's not a promise -- it's the architecture."

Install Free. Pay Only When You Process.

The first 100 customers who spend $100 get $10,000 credited to their account -- over 333,000 files of processing at $0.03/file.

Pay-Per-Use

$0.03 / file

Pay for what you use

  • Full install with all 154 tools -- free
  • .md and .txt files process for free (no OCR needed, embedding only, works great on CPU)
  • Buy credits via Stripe for PDF, DOCX, PPTX, and other OCR-requiring formats
  • All search, management, compliance, and collaboration tools -- always free
  • No monthly fee, no subscription
  • Credits never expire
  • Add funds: $5, $10, $25, $50, or custom
  • Includes bundled Alex Hormozi business strategy dataset -- ready to search immediately

Search and management tools are always free. Credits are consumed only when running OCR, VLM, or embedding on formats that require OCR (PDF, DOCX, PPTX, and similar); .md and .txt files never use credits.
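Because .md and .txt are free and other formats are billed per file, a pre-flight cost estimate is simple arithmetic. This sketch is hypothetical and not part of the product: it assumes every supported format other than .md and .txt is billed at $0.03 (the page names PDF, DOCX, and PPTX explicitly but does not say how CSV or HTML are billed), and it works in integer cents to avoid rounding errors.

```python
from pathlib import Path

PRICE_PER_FILE_CENTS = 3  # $0.03 per billable file
FREE_EXTENSIONS = {".md", ".txt"}
# The 20 supported types listed on this page; billing of .csv/.html is an assumption.
SUPPORTED_EXTENSIONS = FREE_EXTENSIONS | {
    ".pdf", ".docx", ".doc", ".pptx", ".ppt", ".xlsx", ".xls", ".epub",
    ".png", ".jpg", ".jpeg", ".tiff", ".tif", ".bmp", ".gif", ".webp",
    ".csv", ".html",
}


def estimate_cost_cents(filenames: list[str]) -> int:
    """Sum the per-file price for every billable file; reject unsupported types."""
    total = 0
    for name in filenames:
        ext = Path(name).suffix.lower()
        if ext not in SUPPORTED_EXTENSIONS:
            raise ValueError(f"unsupported file type: {name}")
        if ext not in FREE_EXTENSIONS:
            total += PRICE_PER_FILE_CENTS
    return total


def files_covered(dollars: int) -> int:
    """How many billable files a credit balance pays for."""
    return dollars * 100 // PRICE_PER_FILE_CENTS


if __name__ == "__main__":
    batch = ["notes.md", "readme.TXT", "contract.pdf", "deck.pptx", "scan.tiff"]
    print(estimate_cost_cents(batch))  # 9 (three billable files at 3 cents each)
    print(files_covered(10_000))       # 333333
```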

Custom

For production use

  • Everything in Pay-Per-Use
  • Commercial license for production environments
  • Priority support
  • Volume pricing on credits
  • Custom terms
"Most document intelligence tools charge $0.01-0.05 per page via cloud APIs. And your data leaves your machine every time. OCR Provenance processes locally on your hardware at $0.03/file -- and your data never leaves. Markdown and text files? Completely free."

What You Need

Supported Formats

PDF, DOCX, DOC, PPTX, PPT, XLSX, XLS, EPUB, PNG, JPG, JPEG, TIFF, TIF, BMP, GIF, WEBP, TXT (free), MD (free), CSV, HTML

20 file types supported. .md and .txt files process free -- no OCR models needed, just embedding. Works great on CPU.

System Requirements

Component | Minimum                            | Recommended
Docker    | Docker Engine 20+                  | Docker Desktop (latest)
Node.js   | 20+                                | 22+ LTS
RAM       | 8 GB                               | 16+ GB
Disk      | 30 GB                              | 50+ GB
GPU       | Optional (CPU works for .md/.txt)  | NVIDIA GPU with 16+ GB VRAM (e.g., RTX 3090/4090 with 24 GB)
OS        | Windows with WSL2, or Linux        | Windows with WSL2 or Linux, plus NVIDIA GPU

Full GPU processing (OCR + VLM + Embeddings): Windows (WSL2) or Linux with an NVIDIA GPU. Minimum 16 GB VRAM for VLM (Chandra). Recommended: 24 GB (RTX 3090/4090).

CPU-only mode: Works on Windows for .md/.txt embedding and all search/management tools. No GPU required.

macOS: Bare metal release coming soon. The Docker container does not currently support Mac GPU passthrough.

Linux: Supported with NVIDIA GPU via Docker.

Works With Your AI Client. Automatically.

Claude Code

claude mcp add ocr-provenance-mcp -s user -- npx -y ocr-provenance-mcp

Claude Desktop

Add to claude_desktop_config.json

Cursor

Add to ~/.cursor/mcp.json

Windsurf

Standard MCP configuration
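For manual registration, Claude Desktop and Cursor typically expect a server block under a top-level `mcpServers` key in their JSON config. This sketch assumes that layout and mirrors the `npx -y ocr-provenance-mcp` command from the Claude Code line above; it is not the installer's own logic, the helper names are hypothetical, and it does not locate client config files for you (paths vary by OS and client).

```python
import json
from pathlib import Path

SERVER_NAME = "ocr-provenance-mcp"


def server_entry() -> dict:
    """Launch command for the server, matching the npx invocation above."""
    return {"command": "npx", "args": ["-y", "ocr-provenance-mcp"]}


def add_server(config: dict) -> dict:
    """Return a copy of an MCP client config with the server registered."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers.setdefault(SERVER_NAME, server_entry())  # leave an existing entry untouched
    updated["mcpServers"] = servers
    return updated


def add_server_to_file(path: Path) -> None:
    """Merge the entry into a JSON config file, creating the file if needed."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_server(config), indent=2) + "\n")


if __name__ == "__main__":
    print(json.dumps(add_server({}), indent=2))
```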
"The installer auto-detects and registers with every supported client. You probably don't need to do any of this."

Stop Copying and Pasting. Give Your AI the Tools to Do the Work.

"Every document you process manually today is time you're not getting back. The install takes 60 seconds."

$ npx -y ocr-provenance-mcp install

Start processing documents in under 60 seconds