Optical Character Recognition Project
The Louisville Gas and Electric company needed the ability to search the text on over three hundred thousand blueprint drawings to quickly find generation equipment schematics.
The project started as a proof of concept written in Python. The process downloaded each blueprint PDF from the third-party vendor system, extracted each page to an image, and then ran text recognition on each image. It was designed with specific checkpoints to audit the progress and timing of each file. The solution was then redesigned to run entirely in memory to optimize performance.
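The download, page extraction, and recognition stages with per-file timing checkpoints can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `FileAudit`, `process_drawing`, and the three stage callables (`download`, `split_pages`, `recognize`) are hypothetical names, and the stages are passed in as functions so that everything stays in memory as bytes and image objects rather than files on disk.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class FileAudit:
    """Audit record for one drawing: seconds spent in each stage (hypothetical)."""
    file_id: str
    timings: Dict[str, float] = field(default_factory=dict)
    pages: int = 0


def process_drawing(
    file_id: str,
    download: Callable[[str], bytes],
    split_pages: Callable[[bytes], List[Any]],
    recognize: Callable[[Any], str],
) -> Tuple[List[str], FileAudit]:
    """Run one drawing through the three stages, timing each as a checkpoint."""
    audit = FileAudit(file_id)

    def checkpoint(stage: str, func: Callable, *args):
        # Record wall-clock time for the stage under its name.
        start = time.perf_counter()
        result = func(*args)
        audit.timings[stage] = time.perf_counter() - start
        return result

    # Each stage hands its result to the next one in memory.
    pdf_bytes = checkpoint("download", download, file_id)
    images = checkpoint("split_pages", split_pages, pdf_bytes)
    texts = checkpoint(
        "recognize", lambda imgs: [recognize(img) for img in imgs], images
    )
    audit.pages = len(images)
    return texts, audit
```

In a real setup the `split_pages` callable might wrap something like `pdf2image.convert_from_bytes` and `recognize` might wrap `pytesseract.image_to_string`; those choices are assumptions here, since the text names only Python and Tesseract.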
The result was a distributed process, able to run across four servers, that could process the entire data set within seventy-two hours. Work to improve text quality and application speed is ongoing.
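The text does not say how work was divided among the four servers. One simple possibility, shown here purely as an assumption, is to assign each drawing to a node with a stable hash of its identifier. The sketch also works out the average time budget per drawing implied by the figures above. The function names are hypothetical.

```python
import zlib
from typing import Iterable, List


def assign_node(file_id: str, node_count: int) -> int:
    """Map a drawing to a node index using a stable CRC32 hash."""
    return zlib.crc32(file_id.encode("utf-8")) % node_count


def partition(file_ids: Iterable[str], node_count: int) -> List[List[str]]:
    """Group drawing IDs into one work list per node."""
    buckets: List[List[str]] = [[] for _ in range(node_count)]
    for file_id in file_ids:
        buckets[assign_node(file_id, node_count)].append(file_id)
    return buckets


def seconds_per_drawing(total: int, hours: float, node_count: int) -> float:
    """Average time each node may spend per drawing to finish within `hours`."""
    return hours * 3600 * node_count / total
```

With 300,000 drawings, 72 hours, and four nodes, `seconds_per_drawing(300_000, 72, 4)` gives about 3.5 seconds per drawing per node, assuming an even split and all nodes running for the full window. Note that plain modulo hashing reassigns most drawings when the node count changes; a scheme that must absorb new nodes cheaply would need something like consistent hashing or a shared work queue.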
Challenges
- Manual, time-consuming search for drawings based on file name
- Access to blueprint drawings was limited to a single file and a single page at a time
- The third-party software was not able to provide text search within a drawing
- No way to identify the company or contractor that manufactured the equipment
Solution
- Processes the data set and extracts text from the drawings
- Tunes and processes new drawings as they are loaded
- Features a Python-based interface to simplify access to vendor software
- Capable of on-demand job-based processing
- Segments text and processes images via a convolutional neural network and Tesseract
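On-demand, job-based processing of the kind listed above could look something like the sketch below, where a job is a batch of drawing identifiers handed to a worker pool. This is an illustrative assumption rather than the project's design: `run_job` and the `process` callable are hypothetical, and `process` stands in for whatever segmentation and recognition step is applied to a single drawing.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Tuple


def run_job(
    file_ids: Iterable[str],
    process: Callable[[str], str],
    max_workers: int = 4,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Process one job and return (results, errors), both keyed by file ID."""
    results: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every drawing in the job, then collect outcomes per file.
        futures = {file_id: pool.submit(process, file_id) for file_id in file_ids}
        for file_id, future in futures.items():
            try:
                results[file_id] = future.result()
            except Exception as exc:
                # A failed drawing is recorded and does not stop the job.
                errors[file_id] = repr(exc)
    return results, errors
```

Keeping failures in a separate mapping lets a job finish even when individual drawings cannot be read, and leaves a record that can feed an audit trail.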
Benefits
- Able to easily and quickly process new drawings
- OCR process improved and fully audited, with metrics
- Designed to easily include another node
- Managed and supported by the KiZAN analytics team