🏛️ Success Story: The Provincial Archive of Valdemora Digitizes and Publishes Over 25,000 Handwritten Pages with After OCR
- Leo Barrios

- May 21
- 2 min read
Summary: The Historical Archive of the Province of Valdemora digitized, transcribed, and published online more than 25,000 handwritten pages, mostly from the 19th century, using After OCR's end-to-end services. The project preserved key documents of the region’s administrative history and made them publicly accessible through a modern, searchable web platform that connects to other institutions' systems via an API.
🎯 The Challenge
The archive holds council minutes and official correspondence from 1824 to 1911: handwritten records, mostly in historical cursive script, with marginal annotations, variations in ink, and physical degradation.
The main challenges included:
Highly variable and often illegible handwriting.
Inconsistent document structure (some with headings, others without).
A need to transform this heritage into usable data for researchers, genealogists, and citizens.
The archive staff had limited technical experience and resources to carry out a complex HTR (Handwritten Text Recognition) project.
🧠 After OCR’s Solution
After OCR proposed an end-to-end solution in five phases:
Digitization and Preprocessing
High-resolution scanning (300–400 DPI) of council books.
Automatic deskewing, contrast enhancement, and page segmentation.
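The case study does not say which deskewing method After OCR uses. As an illustration of one common technique, projection-profile skew estimation, the sketch below assumes the page has already been binarized into a NumPy boolean array (ink pixels are True). The function name `estimate_skew`, the ±5° search range, and the 0.1° step are illustrative choices, not details from the project.

```python
import numpy as np


def estimate_skew(binary: np.ndarray, max_angle: float = 5.0, step: float = 0.1) -> float:
    """Estimate page skew in degrees with a projection-profile search.

    Assumes `binary` is a 2-D array in which ink pixels are nonzero
    (i.e. the page was already binarized). A text line tilted by angle a
    satisfies y = y0 + x * tan(a); projecting every ink pixel onto
    y*cos(t) - x*sin(t) packs each such line into a narrow band of bins
    when t == a, so the sharpest histogram marks the skew angle.
    """
    ys, xs = np.nonzero(binary)
    if ys.size == 0:
        return 0.0  # blank page: nothing to align
    best_angle, best_score = 0.0, -1.0
    for angle in np.arange(-max_angle, max_angle + step / 2, step):
        theta = np.deg2rad(angle)
        # Project each ink pixel onto the axis perpendicular to the candidate line direction
        proj = np.round(ys * np.cos(theta) - xs * np.sin(theta)).astype(np.int64)
        counts = np.bincount(proj - proj.min()).astype(np.float64)
        score = float(np.sum(counts ** 2))  # peaky profiles score higher
        if score > best_score:
            best_angle, best_score = float(angle), score
    return best_angle


if __name__ == "__main__":
    # Synthetic page: thin "text lines" tilted by 2 degrees
    page = np.zeros((600, 500), dtype=bool)
    slope = np.tan(np.deg2rad(2.0))
    for y0 in range(50, 500, 60):
        for x in range(20, 480):
            page[int(round(y0 + x * slope)), x] = True
    print(estimate_skew(page))  # should print a value close to 2.0
```

Once the angle is known, the page image can be rotated to cancel it with any imaging library. Scoring by the sum of squared bin counts rewards projections in which each text line collapses into a few narrow rows.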
Transcription for Training
Manual transcription of 50,000 words by After OCR's validation team.
Structured annotation using XML/TEI tags for headers, named entities, and dates.
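The case study does not publish its annotation schema. As a hedged sketch, the snippet below builds a TEI fragment with Python's standard `xml.etree.ElementTree`, using the TEI elements `<head>` for a heading, `<persName>` and `<placeName>` for named entities, and `<date when="YYYY-MM-DD">` for normalized dates. The helper names (`tei`, `annotate_entry`), the segment format, and the sample sentence are invented for illustration; a complete TEI file would also need a `<teiHeader>` and a `<text>` wrapper.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def tei(tag: str) -> str:
    """Qualify a tag name with the TEI namespace (ElementTree's {uri}tag form)."""
    return f"{{{TEI_NS}}}{tag}"


def annotate_entry(heading: str, segments: list) -> ET.Element:
    """Build a TEI <div> for one (hypothetical) minutes entry.

    `segments` is a list of (text, element, attrs) tuples: element is None for
    plain running text, or a TEI element name such as "persName",
    "placeName" or "date" whose attributes are given in `attrs`.
    """
    div = ET.Element(tei("div"), {"type": "entry"})
    ET.SubElement(div, tei("head")).text = heading
    p = ET.SubElement(div, tei("p"))
    last = None  # most recent inline element; following plain text becomes its tail
    for text, element, attrs in segments:
        if element is None:
            if last is None:
                p.text = (p.text or "") + text
            else:
                last.tail = (last.tail or "") + text
        else:
            last = ET.SubElement(p, tei(element), attrs or {})
            last.text = text
    return div


if __name__ == "__main__":
    # Invented sample sentence, not taken from the Valdemora minutes
    entry = annotate_entry(
        "Ordinary session",
        [
            ("On ", None, None),
            ("3 March 1851", "date", {"when": "1851-03-03"}),
            (", the mayor ", None, None),
            ("Juan Pérez", "persName", None),
            (" reported on the road to ", None, None),
            ("San Isidro", "placeName", None),
            (".", None, None),
        ],
    )
    print(ET.tostring(entry, encoding="unicode"))
```

Keeping the running text in `.text` and `.tail` preserves the original transcription exactly, so stripping the tags still yields the plain reading text.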
HTR Model Training
Training using proprietary technology based on PyLaia and Transformers.
Continuous model evaluation, reaching a final Character Error Rate (CER) of 3.4%.
Fine-tuning for abbreviations and local 19th-century terminology.
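CER is conventionally computed as (S + D + I) / N, where S, D and I are the character substitutions, deletions and insertions in the best alignment of the model's output against a reference transcription, and N is the number of reference characters. The case study does not say how its 3.4% figure was aggregated; this minimal sketch assumes the common corpus-level convention (total edits over total reference characters) and uses a plain-Python Levenshtein distance. The function names are illustrative.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, deletions and insertions
    turning `ref` into `hyp` (classic dynamic programming, one row at a time)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: reference char missing from output
                curr[j - 1] + 1,         # insertion: extra char in output
                prev[j - 1] + (r != h),  # substitution, free when chars match
            )
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate for a single line: edit distance / reference length."""
    if not reference:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


def corpus_cer(pairs) -> float:
    """Aggregate CER over (reference, hypothesis) pairs: total edits / total reference chars."""
    edits = sum(levenshtein(ref, hyp) for ref, hyp in pairs)
    chars = sum(len(ref) for ref, _ in pairs)
    if chars == 0:
        raise ValueError("no reference characters")
    return edits / chars


if __name__ == "__main__":
    print(cer("Ayuntamiento", "Ayuntamento"))  # one deletion over 12 characters: 1/12
```

At the corpus level, a CER of 3.4% corresponds to roughly 34 character edits per 1,000 reference characters. Pooling edits across lines weights long lines more heavily than averaging per-line CERs would, which is why the aggregation method matters when comparing reported figures.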
Validation and Quality Control
Manual correction of 10% of the pages to verify accuracy.
Peer-review workflow with full traceability of edits.
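The source gives only the 10% figure and says edits are fully traceable; it does not describe how pages were chosen or how reviews were logged. A minimal sketch under stated assumptions: pages have string IDs, the review sample is drawn at random with a fixed seed so it can be re-drawn for an audit, and each correction goes into an append-only log that requires a reviewer different from the editor. All names here (`sample_pages_for_review`, `Edit`, `ReviewLog`, the seed value) are hypothetical.

```python
import math
import random
from dataclasses import dataclass, field
from datetime import datetime, timezone


def sample_pages_for_review(page_ids, fraction=0.10, seed=2025):
    """Draw a reproducible random sample of pages (rounded up, at least one page)."""
    ids = sorted(page_ids)
    k = min(len(ids), max(1, math.ceil(len(ids) * fraction)))
    rng = random.Random(seed)  # fixed seed: the same sample can be re-drawn later
    return sorted(rng.sample(ids, k))


@dataclass(frozen=True)
class Edit:
    page_id: str
    editor: str
    reviewer: str
    before: str
    after: str
    timestamp: str  # UTC, ISO 8601


@dataclass
class ReviewLog:
    """Correction history; this class only ever appends entries."""
    edits: list = field(default_factory=list)

    def record(self, page_id, editor, reviewer, before, after):
        # Assumed peer-review rule: a second person must approve each correction
        if reviewer == editor:
            raise ValueError("a correction must be approved by a different reviewer")
        edit = Edit(page_id, editor, reviewer, before, after,
                    datetime.now(timezone.utc).isoformat())
        self.edits.append(edit)
        return edit

    def history(self, page_id):
        """All recorded corrections for one page, oldest first."""
        return [e for e in self.edits if e.page_id == page_id]
```

For the project's 25,783 pages, a 10% sample rounded up is 2,579 pages. Storing the text before and after each correction, rather than overwriting it, is what makes every change traceable.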
Publication and Access
Deployment of a custom responsive website with advanced search features.
Integration with ORCID, Wikidata, and civil registries.
REST API for university and public history data platforms.
🌍 Results
25,783 pages digitized and transcribed with image-text alignment.
120,000+ named entities automatically indexed (people, roles, places).
Public launch of the platform at archivohistorico.valdemora.gov (fictitious URL) in April 2025.
Downloads available in PDF, plain text, TEI/XML, and JSON via API.
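The case study reports 120,000+ indexed entities and JSON downloads but does not describe the index or its schema. One simple way to get both, sketched here under loud assumptions, is an inverted index from normalized entity names to page IDs, serialized with the standard `json` module. The input format, the whitespace-and-case normalization, the function names, and the `type`/`name`/`pages` JSON fields are all hypothetical.

```python
import json
from collections import defaultdict


def build_entity_index(pages):
    """Map (entity type, normalized name) -> sorted list of page IDs.

    `pages` is assumed to map a page ID to a list of (entity_type, name)
    pairs, e.g. as extracted from TEI <persName>/<placeName> elements.
    """
    index = defaultdict(set)
    for page_id, entities in pages.items():
        for etype, name in entities:
            key = (etype, " ".join(name.split()).casefold())  # collapse spaces, ignore case
            index[key].add(page_id)
    return {key: sorted(ids) for key, ids in index.items()}


def index_to_json(index):
    """Serialize the index as a JSON list of records (hypothetical field names)."""
    records = [
        {"type": etype, "name": name, "pages": page_ids}
        for (etype, name), page_ids in sorted(index.items())
    ]
    return json.dumps(records, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    # Invented sample data, not taken from the Valdemora archive
    sample = {
        "p1": [("person", "Juan  Pérez"), ("place", "San Isidro")],
        "p2": [("person", "juan pérez"), ("role", "alcalde")],
    }
    print(index_to_json(build_entity_index(sample)))
```

Case-folding merges names that differ only in capitalization or spacing; matching historical names with abbreviations or variant spellings would need more than this.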
📈 Impact
Since launch, the platform has achieved:
80,000+ unique visits in the first 6 months.
Cited in 14 academic research papers.
Integrated into the regional educational program “Valdemora Through Time.”
Council minutes linked to civil death records, censuses, and registries via After OCR’s API.
🗣️ Testimonial
“After OCR helped us go beyond digitization: they enabled us to transform our archive into a living, indexed, open resource for scholars and citizens alike.”
- María Eugenia Ríos, General Coordinator of the Valdemora Archive
📌 After OCR Services Used
Historical document digitization
Assisted transcription and linguistic annotation
Custom HTR model training
Human validation and quality control
Web publication with advanced search
API for external access and system integration
🧭 Next Steps
Expanding the model to parish and notarial records (18th–20th centuries)
Adding automatic multilingual translation
Developing online historical analysis tools (interactive maps, timelines, person network visualizations)
Do you manage an archive full of handwritten history waiting to be rediscovered? After OCR can guide you through every step, from the first scans to searchable, public historical data that remains accessible for the long term.