Shyrley P.
Admin
Data Specialist
Join date: Jan 8, 2026
Posts (6)
Apr 5, 2026 ∙ 5 min read
From Scanned PDF to Structured Database: Digitizing 19th-Century US Legal Codes
What automated OCR misses when the source material is 150 years old, and how we handled it.
A legal history researcher sent us four PDFs, each containing a different US state's civil code from the 1870s: California, North Dakota, South Dakota, and Montana. The goal was a structured Excel database in which every section of law had its own row, with the section number and full text cleanly separated, consistently formatted, and ready for database import and comparative analysis.
Jan 25, 2026 ∙ 3 min read
From Sentences to Spreadsheets: Historical Directory Digitization for Unstructured Narrative Data
Data isn't always in tables. See how our historical directory digitization extracted Railroads, Industry, and Crops from unstructured paragraphs in a 1949 directory.
Jan 18, 2026 ∙ 3 min read
The "Perfect Scan" Paradox: Tabular Data Extraction for Dense Election Returns
Standard OCR often confuses tightly spaced columns, leading to data errors. See how our tabular data extraction ensured 100% accuracy for a complex 1958 election project.