Operational Overview
Last updated
The Document Understanding Subnet uses a decentralized, multi-step process to detect checkboxes and their associated text in document images. Work is split across a Validator-Miner structure: the miner processes each image, and the validator checks the miner's output for accuracy.
The validator acts as the quality assurance component of the system. It draws from a dataset in which every image has corresponding ground truth data, that is, the correct checkbox-text associations for that document. For each round, the validator selects an image and sends it to the miner, keeping the ground truth to itself so that it can compare it against the miner’s results once processing is complete.
The miner is responsible for analyzing the image and extracting checkbox-text associations through a series of specialized models and processors:
Vision Model: Detects checkboxes in the document image, mapping their exact coordinates for precise localization.
OCR Engine and Preprocessor: Extracts text from the document image, organizing it into lines while capturing the bounding coordinates for each text line. This structured text data is essential for associating the correct text with corresponding checkboxes.
Post-Processor: Merges the checkbox coordinates from the Vision Model with the line coordinates from the OCR output. This final step aligns checkboxes with their relevant text, enabling the subnet to create accurate checkbox-text pairs.
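The merge step performed by the Post-Processor can be sketched as follows. This is a minimal illustration rather than the subnet's actual implementation: the `Box` and `TextLine` types, the `pair_checkboxes` function, and the pairing rule (pick the closest text line on the same row, to the right of the checkbox) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in image pixels (hypothetical type)."""
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass(frozen=True)
class TextLine:
    """One OCR line with its bounding box (hypothetical type)."""
    text: str
    box: Box


def vertical_overlap(a: Box, b: Box) -> float:
    """Overlap height divided by the shorter box's height, in [0.0, 1.0]."""
    overlap = min(a.y1, b.y1) - max(a.y0, b.y0)
    shorter = min(a.y1 - a.y0, b.y1 - b.y0)
    return max(0.0, overlap) / shorter if shorter > 0 else 0.0


def pair_checkboxes(
    checkboxes: List[Box],
    lines: List[TextLine],
    min_overlap: float = 0.5,  # assumed threshold, not taken from the subnet
) -> List[Tuple[Box, Optional[str]]]:
    """Pair each checkbox with the nearest text line on the same row.

    Assumed rule: a line is a candidate if it overlaps the checkbox
    vertically and extends past the checkbox's right edge. A checkbox
    with no candidate is paired with None.
    """
    pairs = []
    for cb in checkboxes:
        candidates = [
            ln for ln in lines
            if vertical_overlap(cb, ln.box) >= min_overlap and ln.box.x1 > cb.x1
        ]
        # Choose the candidate whose left edge is closest to the checkbox.
        best = min(candidates, key=lambda ln: abs(ln.box.x0 - cb.x1), default=None)
        pairs.append((cb, best.text if best else None))
    return pairs
```

A left-to-right rule like this only suits forms where the label follows the checkbox. Layouts with labels above or to the left of the box would need a different distance measure.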
After processing, the miner sends its checkbox-text pairs back to the validator, which evaluates them against the ground truth it retained.
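The validator's side of this round trip, selecting a sample and later comparing the miner's answer with the retained ground truth, might look like the sketch below. The dataset layout (`image` and `ground_truth` keys), the function names, and the F1-over-text metric are hypothetical; the source does not specify how the subnet scores responses, and the example reduces each checkbox-text association to its text for simplicity.

```python
import random
from typing import Dict, List, Tuple


def select_sample(dataset: List[Dict], rng: random.Random) -> Tuple[object, List[str]]:
    """Pick one record; only the image is sent on, the ground truth stays local."""
    sample = rng.choice(dataset)
    return sample["image"], sample["ground_truth"]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial OCR differences do not count."""
    return " ".join(text.lower().split())


def score_response(predicted: List[str], ground_truth: List[str]) -> float:
    """F1 over normalized checkbox texts (illustrative metric, an assumption)."""
    pred = {normalize(t) for t in predicted}
    truth = {normalize(t) for t in ground_truth}
    if not pred and not truth:
        return 1.0  # nothing to find and nothing reported
    hits = len(pred & truth)
    if hits == 0:
        return 0.0
    precision = hits / len(pred)
    recall = hits / len(truth)
    return 2 * precision * recall / (precision + recall)
```

A set-based F1 penalizes both missed checkboxes and spurious ones. A production scorer would likely also compare box coordinates, which this sketch ignores.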