| layout | title | rank |
|---|---|---|
| page | Tools and Web services | 6 |
The project has developed a semi-automatic text annotation tool. It takes PDF documents as input and processes them automatically in the following three steps:
- layout detection,
- optical character recognition (with PERO OCR),
- named entity recognition (fine-tuned CamemBERT model).
Users can then check and manually correct each automatically detected and processed text section.
The trade directories from the 19th century are a challenging dataset, with very heterogeneous layouts, fonts, and contents. Source: gallica.bnf.fr / Bibliothèque nationale de France

SODUCO text annotation tool
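The three automatic steps and the manual review can be pictured as a chain of stages that operates on detected text sections. The sketch below illustrates that idea under stated assumptions and is not the tool's actual code. `TextSection`, `process_page` and `correct` are hypothetical names. The three stages are passed in as plain callables: in practice they would wrap the layout model, PERO OCR and the fine-tuned CamemBERT model, whose real APIs are not shown here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

BBox = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in page pixels (assumed convention)
Entity = Tuple[str, str]          # (text span, entity label)


@dataclass
class TextSection:
    """One text region of a page, as produced by the automatic steps (hypothetical model)."""
    bbox: BBox
    text: str = ""
    entities: List[Entity] = field(default_factory=list)
    validated: bool = False  # becomes True once a user has checked the section


def process_page(
    page_image: bytes,
    detect_layout: Callable[[bytes], List[BBox]],
    ocr: Callable[[bytes, BBox], str],
    ner: Callable[[str], List[Entity]],
) -> List[TextSection]:
    """Apply layout detection, then OCR, then NER, to a single page image."""
    sections = []
    for bbox in detect_layout(page_image):  # step 1: find text regions
        text = ocr(page_image, bbox)         # step 2: read the region
        sections.append(TextSection(bbox, text, ner(text)))  # step 3: tag entities
    return sections


def correct(
    section: TextSection,
    text: Optional[str] = None,
    entities: Optional[List[Entity]] = None,
) -> None:
    """Store a user's manual correction (if any) and mark the section as checked."""
    if text is not None:
        section.text = text
    if entities is not None:
        section.entities = entities
    section.validated = True


# Usage with stub stages standing in for the real models; the OCR stub makes a typo.
sections = process_page(
    b"<page image bytes>",
    detect_layout=lambda img: [(0, 0, 400, 30)],
    ocr=lambda img, bbox: "Dupomt, rue de la Paix, 3",
    ner=lambda text: [("Dupomt", "PER"), ("rue de la Paix", "LOC")],
)
correct(
    sections[0],
    text="Dupont, rue de la Paix, 3",
    entities=[("Dupont", "PER"), ("rue de la Paix", "LOC")],
)
```

The entity labels `PER` and `LOC` are illustrative only and are not necessarily the label set used by the project. Keeping each stage behind a callable means any single model can be swapped without touching the review step.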
The historical geocoder takes both addresses and dates into account
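One way to let dates influence geocoding is to attach a validity period to every gazetteer entry. For a dated address, the geocoder then picks the entry whose period contains that date, or failing that, the entry whose period is closest to it. The sketch below illustrates this idea and is not the project's geocoder. The `GazetteerEntry` structure, the naive `normalise` function and the closest-period fallback are all assumptions, and the coordinates are made up.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class GazetteerEntry:
    """A historical address with the period during which it was valid (hypothetical model)."""
    street: str                 # normalised street name
    number: int                 # house number
    valid_from: int             # first year the address is attested
    valid_to: int               # last year the address is attested
    point: Tuple[float, float]  # (x, y) coordinates


def normalise(name: str) -> str:
    """Very naive normalisation: lower-case and collapse whitespace."""
    return " ".join(name.lower().split())


def temporal_distance(entry: GazetteerEntry, year: int) -> int:
    """0 if the year falls inside the entry's validity period, else the gap in years."""
    if year < entry.valid_from:
        return entry.valid_from - year
    if year > entry.valid_to:
        return year - entry.valid_to
    return 0


def geocode(street: str, number: int, year: int,
            gazetteer: List[GazetteerEntry]) -> Optional[Tuple[float, float]]:
    """Locate an address as it stood in `year`, preferring the temporally closest match."""
    key = normalise(street)
    candidates = [e for e in gazetteer if e.street == key and e.number == number]
    if not candidates:
        return None
    best = min(candidates, key=lambda e: temporal_distance(e, year))
    return best.point


# Toy gazetteer with made-up values: the same address located elsewhere after a renumbering.
toy = [
    GazetteerEntry("rue de la paix", 3, 1800, 1850, (1.0, 2.0)),
    GazetteerEntry("rue de la paix", 3, 1851, 1900, (1.5, 2.0)),
]
print(geocode("Rue de la  Paix", 3, 1820, toy))  # (1.0, 2.0)
print(geocode("Rue de la  Paix", 3, 1870, toy))  # (1.5, 2.0)
```

A real historical geocoder would also need fuzzy matching of street names and would have to handle renamed streets, both of which this sketch ignores.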
A sample vectorisation output
A collaborative tool for validating and editing geospatial data has been developed to improve data quality through human validation of any type of geospatial data. Users can create, remove, modify, or validate any feature (both its geometry and its attributes).
- General view of the tool with uploaded data
- Edit mode: creating the geometry of a new feature
- Edit mode: changing the attributes of an existing feature
- Status mode: viewing which features were created, removed, or modified
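The create, remove, modify and validate operations, together with the status view, suggest a small per-feature state model. The sketch below shows one possible model, assumed for illustration rather than taken from the tool. `Status`, `Feature` and `EditSession` are hypothetical names, and the choice to keep removed features (so they can still be listed) is a design assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Iterable, Optional


class Status(Enum):
    ORIGINAL = "original"
    CREATED = "created"
    MODIFIED = "modified"
    REMOVED = "removed"


@dataclass
class Feature:
    """A geospatial feature: geometry plus attributes (hypothetical model)."""
    fid: int
    geometry: Any  # e.g. a WKT string; the geometry format is an assumption
    attributes: Dict[str, Any] = field(default_factory=dict)
    status: Status = Status.ORIGINAL
    validated: bool = False  # set when a user confirms the feature is correct


class EditSession:
    """Tracks create / remove / modify / validate operations on uploaded data."""

    def __init__(self, features: Iterable[Feature]) -> None:
        self.features: Dict[int, Feature] = {f.fid: f for f in features}
        self._next_id = max(self.features, default=0) + 1

    def create(self, geometry: Any, attributes: Optional[Dict[str, Any]] = None) -> Feature:
        feature = Feature(self._next_id, geometry, dict(attributes or {}), Status.CREATED)
        self.features[feature.fid] = feature
        self._next_id += 1
        return feature

    def modify(self, fid: int, geometry: Any = None,
               attributes: Optional[Dict[str, Any]] = None) -> None:
        feature = self.features[fid]
        if geometry is not None:
            feature.geometry = geometry
        if attributes:
            feature.attributes.update(attributes)
        if feature.status is Status.ORIGINAL:  # a newly created feature stays "created"
            feature.status = Status.MODIFIED

    def remove(self, fid: int) -> None:
        # Keep the feature so that the status view can still list it as removed.
        self.features[fid].status = Status.REMOVED

    def validate(self, fid: int) -> None:
        self.features[fid].validated = True

    def summary(self) -> Dict[str, int]:
        """Count features per status, as a 'status mode' view might display."""
        counts = {s.value: 0 for s in Status}
        for feature in self.features.values():
            counts[feature.status.value] += 1
        return counts
```

Tracking status and validation separately lets a feature be both "modified" and "validated", which matches the distinction between the edit mode and the status mode described above.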
A catalog has been developed to store, reference, and retrieve the archival records and digital data used and produced throughout the project.
SODUCO catalog
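Storing, referencing and retrieving records can be illustrated with a small in-memory catalog. It keeps records by identifier, indexes them by keyword, and records links between items, for example between a digital dataset and the archival record it was derived from. This is a minimal sketch, not the SODUCO catalog's implementation. `Record`, `Catalog`, the identifier scheme and the field names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Record:
    """A catalog entry for an archival record or a digital dataset (hypothetical model)."""
    identifier: str  # hypothetical identifier scheme
    title: str
    kind: str        # e.g. "archival" or "digital"
    keywords: Set[str] = field(default_factory=set)
    related: List[str] = field(default_factory=list)  # identifiers of linked records


class Catalog:
    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}
        self._index: Dict[str, Set[str]] = {}  # lower-cased keyword -> record identifiers

    def add(self, record: Record) -> None:
        """Store a record and index its keywords case-insensitively."""
        if record.identifier in self._records:
            raise ValueError(f"duplicate identifier: {record.identifier}")
        self._records[record.identifier] = record
        for kw in record.keywords:
            self._index.setdefault(kw.lower(), set()).add(record.identifier)

    def get(self, identifier: str) -> Record:
        return self._records[identifier]

    def search(self, keyword: str) -> List[Record]:
        """Return records tagged with `keyword`, sorted by identifier."""
        ids = self._index.get(keyword.lower(), set())
        return sorted((self._records[i] for i in ids), key=lambda r: r.identifier)

    def related_to(self, identifier: str) -> List[Record]:
        """Follow the links stored on a record, skipping identifiers not in the catalog."""
        return [self._records[i] for i in self.get(identifier).related if i in self._records]
```

A production catalog would rely on a persistent store and a standard metadata schema; the in-memory dictionaries here only show the store / reference / retrieve roles.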