Instinct AI — Verified Benchmark Results

Corrections to prior claims: Previous materials cited "8,829 invoices/sec" and "100% accuracy." Those numbers came from a synthetic benchmark that skipped OCR, field extraction, and database operations. The measured numbers are below. Every benchmark figure on this page comes from running the actual pipeline code on the test inputs described in each section.

At a Glance

Metric	Result
Digital PDFs/sec (full pipeline)	178
Core field accuracy (amount, date, inv#, vendor)	100%
All-field accuracy (12 types, 10 invoices)	74.2%
HTTP upload + process (new document)	151ms
Scanned images/sec (EasyOCR, CPU-only)	0.41
ROI math sanity checks	5/5

Test 1: Digital PDF Full Pipeline

PASS

Read file → SHA-256 hash → PyMuPDF text extract → 14 regex field extractors → normalize output

Metric	Result
Files tested	12 PDFs (70 pages)
Avg per file	5.6ms
Throughput	178 invoices/sec
50-page PDF	27.2ms (1,840 pages/sec)
Text extraction success	12/12 (100%)

invoice_1.pdf   -> 7.5ms  | 965 chars  | 11/14 fields
invoice_7.pdf   -> 3.1ms  | 980 chars  | 12/14 fields
multi_page_50   -> 27.2ms | 47132 chars | 13/14 fields
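The stages listed for this test can be sketched as a single timed function. This is a minimal illustration, not the code in benchmark_pipeline.py: the two regex patterns, the `process_document` and `per_second` names, and the injected `extract_text` callable (standing in for the PyMuPDF step so the sketch runs on the standard library alone) are all assumptions.

```python
import hashlib
import re
import time
from typing import Callable, Dict, Optional

# Hypothetical patterns for two of the 14 field types (illustration only).
EXTRACTORS: Dict[str, "re.Pattern[str]"] = {
    "invoice_number": re.compile(r"Invoice\s*(?:#|No\.?|Number)\s*:?\s*([A-Z0-9-]+)", re.I),
    "amount": re.compile(r"Total\s*:?\s*\$?([\d,]+\.\d{2})", re.I),
}


def process_document(data: bytes, extract_text: Callable[[bytes], str]) -> dict:
    """Hash the bytes, extract text, run the regex extractors, normalize, and time it."""
    start = time.perf_counter()
    digest = hashlib.sha256(data).hexdigest()
    text = extract_text(data)  # the report uses PyMuPDF here; a stub is injected below
    fields: Dict[str, Optional[str]] = {}
    for name, pattern in EXTRACTORS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    if fields["amount"] is not None:
        fields["amount"] = fields["amount"].replace(",", "")  # normalize "1,234.56"
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return {"sha256": digest, "chars": len(text), "fields": fields, "elapsed_ms": elapsed_ms}


def per_second(avg_ms: float, units: int = 1) -> float:
    """Convert an average duration in milliseconds into units processed per second."""
    return units * 1000.0 / avg_ms


# Stub extractor: treat the bytes as UTF-8 text instead of parsing a real PDF.
result = process_document(b"Invoice #: INV-1001\nTotal: $1,234.56", lambda b: b.decode("utf-8"))
print(result["fields"])
print(int(per_second(5.6)))          # files/sec from the 5.6ms average
print(round(per_second(27.2, 50)))   # pages/sec for the 50-page PDF
```

The same conversion explains the table: 1000 / 5.6ms is roughly 178 files per second, and 50 pages in 27.2ms is roughly 1,840 pages per second.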

Test 2: Field Extraction Accuracy (Real Invoices)

74.2% overall

14 regex extractors run on text from 10 real invoice PDFs — 12 field types per invoice

Field Type	Found	Total	Rate
Amount ($)	10	10	100%
Date	10	10	100%
Invoice Number	10	10	100%
Vendor Name	10	10	100%
Store ID	10	10	100%
Payment Terms	10	10	100%
Tax Rate	10	10	100%
Priority	10	10	100%
Work Order #	8	10	80%
Trade	1	10	10%
NTE	0	10	0%
PO Number	0	10	0%

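Context: NTE and PO# did not appear in the text of any test invoice, so 0% extraction is correct behavior for those fields: the extractor returns nothing when the field doesn't exist (no false positives). Trade was found in only 1 of 10 invoices. All of these still count toward the 120-field total, which is why the all-field rate is 74.2% (89 of 120). Core business fields (amount, date, invoice#, vendor) are 100%.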
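The headline percentages follow directly from the table. The short calculation below reproduces them from the Found and Total columns; the variable and function names are illustrative.

```python
# (found, total) per field type, copied from the Test 2 table.
RESULTS = {
    "Amount ($)": (10, 10),
    "Date": (10, 10),
    "Invoice Number": (10, 10),
    "Vendor Name": (10, 10),
    "Store ID": (10, 10),
    "Payment Terms": (10, 10),
    "Tax Rate": (10, 10),
    "Priority": (10, 10),
    "Work Order #": (8, 10),
    "Trade": (1, 10),
    "NTE": (0, 10),
    "PO Number": (0, 10),
}
CORE = ("Amount ($)", "Date", "Invoice Number", "Vendor Name")


def rate(field_names) -> float:
    """Percentage of expected field slots that were extracted for the given fields."""
    found = sum(RESULTS[name][0] for name in field_names)
    total = sum(RESULTS[name][1] for name in field_names)
    return 100.0 * found / total


overall = rate(RESULTS)  # all 12 field types: 89 found out of 120
core = rate(CORE)        # the four core business fields: 40 out of 40
print(f"all-field: {overall:.1f}%  core: {core:.0f}%")
```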

Test 3: Field Extraction (Synthetic Inputs)

19/19 PASS

19 hand-crafted test strings with known expected outputs across 14 field types

Metric	Result
Tests run	19
Passed	19 (100%)
Avg extraction time	0.093ms
Total time	1.8ms

PASS: amount     | Total: $1,234.56          | got: 1234.56
PASS: date       | Due: 2026-03-22           | got: 2026-03-22
PASS: wo#        | Work Order: WO-12345      | got: WO-12345
PASS: vendor     | Vendor: ABC Hauling Corp  | got: ABC Hauling Corp
PASS: trade      | Trade: HVAC Service       | got: HVAC
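A harness of this kind can be sketched in a few lines. The five cases are the ones shown in the sample output above; the regex patterns, the trade keyword list, and the `extract` helper are assumptions made for illustration and are not the project's actual extractors.

```python
import re

# Hypothetical patterns, one per field type shown in the sample output.
PATTERNS = {
    "amount": re.compile(r"Total:\s*\$([\d,]+\.\d{2})"),
    "date": re.compile(r"Due:\s*(\d{4}-\d{2}-\d{2})"),
    "wo#": re.compile(r"Work Order:\s*(WO-\d+)"),
    "vendor": re.compile(r"Vendor:\s*(.+)"),
    # Assumed keyword list; only "HVAC" appears in the report.
    "trade": re.compile(r"Trade:\s*(HVAC|Plumbing|Electrical)"),
}

# (field, input string, expected output) taken from the sample output above.
CASES = [
    ("amount", "Total: $1,234.56", "1234.56"),
    ("date", "Due: 2026-03-22", "2026-03-22"),
    ("wo#", "Work Order: WO-12345", "WO-12345"),
    ("vendor", "Vendor: ABC Hauling Corp", "ABC Hauling Corp"),
    ("trade", "Trade: HVAC Service", "HVAC"),
]


def extract(field: str, text: str):
    """Return the normalized value for a field, or None when nothing matches."""
    match = PATTERNS[field].search(text)
    if match is None:
        return None
    value = match.group(1).strip()
    if field == "amount":
        value = value.replace(",", "")  # drop thousands separators
    return value


passed = 0
for field, text, expected in CASES:
    got = extract(field, text)
    ok = got == expected
    passed += ok
    print(f"{'PASS' if ok else 'FAIL'}: {field:<10} | {text:<25} | got: {got}")
print(f"{passed}/{len(CASES)} passed")
```

Returning None on a miss, rather than a guess, is what keeps absent fields from producing false positives.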

Test 4: Scanned Image OCR (Degraded Quality)

50% usable

30 images across 10 quality categories • EasyOCR • CPU-only (no GPU acceleration)

Image Category	Usable	Avg Confidence	Avg Time
Clean scan	3/3	0.92	4.1s
Low noise	3/3	0.90	4.4s
High noise	3/3	0.84	4.5s
Rotated 2 degrees	3/3	0.87	4.9s
Rotated 5 degrees	3/3	0.77	4.1s
Blur	0/3	0.09	2.0s
Heavy blur	0/3	0.30	1.3s
Low DPI (150)	0/3	0.12	6.0s
Low DPI (75)	0/3	0.03	1.5s
Combined degradation	0/3	0.22	5.5s

Honest assessment: OCR handles clean, noisy, and slightly rotated (2 to 5 degrees) scans well. It fails on blur and low DPI: every blurred, 150 DPI, 75 DPI, and combined-degradation image was unusable. For production use, scanned invoices should be at least 200 DPI and reasonably focused. Blurry phone photos will not process reliably.
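The table can be tallied to show where the 50% figure comes from and how far apart the two groups sit on confidence. The report does not state how "usable" was decided, so reading the gap as a possible confidence cutoff is an assumption, not the method used in the benchmark.

```python
# (category, usable images out of 3, average confidence) from the Test 4 table.
ROWS = [
    ("Clean scan", 3, 0.92),
    ("Low noise", 3, 0.90),
    ("High noise", 3, 0.84),
    ("Rotated 2 degrees", 3, 0.87),
    ("Rotated 5 degrees", 3, 0.77),
    ("Blur", 0, 0.09),
    ("Heavy blur", 0, 0.30),
    ("Low DPI (150)", 0, 0.12),
    ("Low DPI (75)", 0, 0.03),
    ("Combined degradation", 0, 0.22),
]

usable = sum(count for _, count, _ in ROWS)
total = 3 * len(ROWS)
passing = [conf for _, count, conf in ROWS if count == 3]
failing = [conf for _, count, conf in ROWS if count == 0]

print(f"usable: {usable}/{total} ({100 * usable / total:.0f}%)")
# Highest average confidence among failing categories vs lowest among passing ones.
print(f"confidence gap: {max(failing):.2f} (failing) to {min(passing):.2f} (passing)")
```

In this data set no category falls between 0.30 and 0.77, so a cutoff anywhere in that range would separate the usable categories from the unusable ones; whether that holds on other scans is untested.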

Test 5: HTTP API Latency

PASS

10 requests per endpoint, round-trip measured

Endpoint	Avg	P50	Min	Max
/api/stats	21ms	17ms	9ms	40ms
/api/docs (10 results)	28ms	28ms	15ms	78ms
/api/vendors	13ms	14ms	4ms	27ms
/api/work-orders	193ms	42ms	30ms	1,562ms
/api/locations	16ms	21ms	4ms	28ms

File Upload	Time	Rate
New document (64KB PDF, full processing)	151ms	6.6 docs/sec
Duplicate detection (same file)	15ms	67 docs/sec

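Note: Work orders endpoint spiked to 1.56s on cold first call (SQLite page cache miss on 55KB response), then settled to ~42ms, which is why its average (193ms) is far above its P50. All other endpoints averaged under 30ms, with a worst single request of 78ms on /api/docs.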
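The gap between the average and the P50 for the work orders endpoint is what a single cold call does to ten samples. The sketch below shows this with the standard library; the sample list is hypothetical, chosen only to match the reported avg, P50, min, and max, and is not the raw measurement data.

```python
import statistics


def summarize(samples_ms):
    """Return the same four columns the latency table reports."""
    return {
        "avg": statistics.mean(samples_ms),
        "p50": statistics.median(samples_ms),
        "min": min(samples_ms),
        "max": max(samples_ms),
    }


# Hypothetical samples: one cold call followed by nine warm calls.
samples = [1562, 30, 35, 40, 41, 42, 42, 44, 45, 49]
stats = summarize(samples)
print(stats)

# Dropping the cold call shows the steady-state behavior.
warm = summarize(samples[1:])
print(warm)
```

With only ten requests per endpoint, one outlier moves the mean by an order of magnitude while the median barely changes, so the P50 column is the better guide to warm performance.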

Test 6: ROI Calculator Math Verification

5/5 CHECKS

Default scenario: 75 trucks, 120 routes/day, 55 mi/route, $3.85/gal diesel, 5.0 MPG, 15% AI optimization

Annual miles:      1,716,000 = 120 routes x 55 mi x 260 days
Gallons used:      343,200 = 1,716,000 / 5.0 MPG
Annual fuel cost:  $1,321,320 = 343,200 gal x $3.85
Miles eliminated:  257,400 = 1,716,000 x 15%
Fuel saved:        $198,198 = $1,321,320 x 15%
Maint saved:       $48,906 = 257,400 mi x $0.19/mi
TOTAL SAVED:       $247,104/year

Sanity Check	Value	Expected Range	Result
Fuel cost per truck per year	$17,618	$8K - $35K	PASS
Miles per truck per year	22,880	15K - 100K	PASS
Savings % matches input	15.0%	15%	PASS
Total under 30% of revenue	$247,104	< $18M	PASS
Diesel in EIA forecast range	$3.85	$3.40 - $5.00	PASS
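The calculation and the five sanity checks can be reproduced in a few lines. All inputs and ranges come from the default scenario and the table above; the variable names are illustrative, and this is a re-derivation rather than the calculator's own code.

```python
# Inputs from the default scenario.
TRUCKS = 75
ROUTES_PER_DAY = 120
MILES_PER_ROUTE = 55
WORK_DAYS = 260          # operating days per year used in the annual-miles line
DIESEL_PER_GAL = 3.85
MPG = 5.0
OPTIMIZATION = 0.15      # adjustable input, not a guaranteed outcome
MAINT_PER_MILE = 0.19

annual_miles = ROUTES_PER_DAY * MILES_PER_ROUTE * WORK_DAYS
gallons = annual_miles / MPG
fuel_cost = gallons * DIESEL_PER_GAL
miles_eliminated = annual_miles * OPTIMIZATION
fuel_saved = fuel_cost * OPTIMIZATION
maint_saved = miles_eliminated * MAINT_PER_MILE
total_saved = fuel_saved + maint_saved

# The five sanity checks, using the expected ranges from the table.
checks = {
    "fuel cost per truck in $8K-$35K": 8_000 <= fuel_cost / TRUCKS <= 35_000,
    "miles per truck in 15K-100K": 15_000 <= annual_miles / TRUCKS <= 100_000,
    "savings % matches input": abs(fuel_saved / fuel_cost - OPTIMIZATION) < 1e-9,
    "total under $18M": total_saved < 18_000_000,
    "diesel in $3.40-$5.00": 3.40 <= DIESEL_PER_GAL <= 5.00,
}

print(f"TOTAL SAVED: ${total_saved:,.0f}/year")
for name, ok in checks.items():
    print(f"{'PASS' if ok else 'FAIL'}: {name}")
```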

Note: The 15% route optimization savings percentage is an adjustable input, not a guaranteed outcome. Published case studies (Casella 21%, Virginia Beach, Barcelona 30%) suggest 10-30% is achievable, but we have not run Flood Brothers routes ourselves.

Test 7: Email Parsing & Matching

PASS

182 ServiceChannel emails parsed for work order numbers and vendor identification

Metric	Result
Emails parsed	182
Speed	13,555 emails/sec
Work order # detected	178/182 (97.8%)
Vendor detected	181/182 (99.5%)
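One way such parsing might look is sketched below using Python's standard `email` module. The report does not describe how work orders or vendors are detected, so the `WO-` number pattern (borrowed from the Test 3 sample), the sender-domain vendor lookup, the example addresses, and the function names are all assumptions.

```python
import re
from email import message_from_string
from email.utils import parseaddr

WO_PATTERN = re.compile(r"\bWO-\d+\b")  # assumed format, as in the Test 3 sample
# Hypothetical lookup from sender domain to vendor name.
VENDORS = {"abchauling.example": "ABC Hauling Corp"}


def parse_email(raw: str) -> dict:
    """Pull a work order number and a vendor out of one raw email message."""
    msg = message_from_string(raw)
    body = msg.get_payload() if not msg.is_multipart() else ""
    text = f"{msg.get('Subject', '')}\n{body}"
    wo = WO_PATTERN.search(text)
    _, address = parseaddr(msg.get("From", ""))
    domain = address.rpartition("@")[2].lower()
    return {"work_order": wo.group(0) if wo else None, "vendor": VENDORS.get(domain)}


RAW_EMAILS = [
    "From: Dispatch <dispatch@abchauling.example>\n"
    "Subject: Work Order WO-12345 update\n\nPlease see the attached invoice.\n",
    "From: someone@unknown.example\nSubject: Hello\n\nNo number in this one.\n",
]

parsed = [parse_email(raw) for raw in RAW_EMAILS]
wo_hits = sum(p["work_order"] is not None for p in parsed)
vendor_hits = sum(p["vendor"] is not None for p in parsed)
print(f"Work order # detected: {wo_hits}/{len(parsed)}")
print(f"Vendor detected: {vendor_hits}/{len(parsed)}")
```

The detection rates in the table are the same ratio computed over 182 messages, for example 178 / 182 = 97.8%.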

Test 8: Deduplication Speed

PASS

SHA-256 hash of 482 files across demo batch and inbox

Metric	Result
Files hashed	482
Speed	27,867 hashes/sec
Duplicates found	0
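Hash-based deduplication of this kind groups files by their SHA-256 digest and reports any digest seen more than once. The sketch below works on in-memory bytes so it runs without test files; the `find_duplicates` name and the sample file names are hypothetical.

```python
import hashlib
from typing import Dict, List


def find_duplicates(files: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Group file names by SHA-256 digest and keep only digests seen more than once."""
    by_digest: Dict[str, List[str]] = {}
    for name, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        by_digest.setdefault(digest, []).append(name)
    return {digest: names for digest, names in by_digest.items() if len(names) > 1}


# Hypothetical batch: two byte-identical files and one distinct file.
batch = {
    "inbox/invoice_a.pdf": b"%PDF-1.7 sample A",
    "demo/invoice_a_copy.pdf": b"%PDF-1.7 sample A",
    "demo/invoice_b.pdf": b"%PDF-1.7 sample B",
}

duplicates = find_duplicates(batch)
for digest, names in duplicates.items():
    print(digest[:12], names)
print(f"Duplicate groups found: {len(duplicates)}")
```

This only catches byte-identical files; a rescanned or re-exported copy of the same invoice produces a different hash and would not be flagged.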

What We Cannot Prove From These Tests

Claim	Status	Why
Manual AP baseline (4 invoices/hr)	Unverified	IOFM industry report — we didn't time manual clerks ourselves
Route optimization saves 15-21%	Unverified	Casella/Virginia Beach case studies — we haven't run Flood Brothers' actual routes
Valuation multiple increase (3x to 6x)	Projection	Based on industry benchmarks and MiroFish simulation, not audited financials
Flood Brothers revenue ($60.6M)	Third-party estimate	Growjo/ZoomInfo estimate, not confirmed by Flood Brothers
GPU-accelerated OCR speed	Not tested	All OCR tests were CPU-only. GPU acceleration is expected to be faster, but we have not measured it.

Benchmark suite: benchmark_pipeline.py + benchmark_all.py
Results saved to: benchmark_results.json + benchmark_full_results.json
All tests reproducible by running the scripts in the doc-rag project directory.

Prepared by OZ3 Automation • March 22, 2026