OCR Infrastructure Requirements
Scalable Parallel Data Processing
Grooper’s shared computing infrastructure makes parallel processing on a large scale easy. Leverage hundreds of processing cores spread across multiple servers and workstations.
Perform in minutes what takes hours or even days using traditional processing methods.
The Power of Parallel Processing
The performance benefits of parallel processing are significant, particularly for low-latency, CPU-intensive activities. Consider the job below, executed on a 56-core thread pool. CPU-intensive activities such as image cleanup and OCR see a massive speed increase: up to 50x!
Activity | Description | Serial | Parallel | Speedup |
Import | Loaded 46K TIF & PDF documents from CMIS | 13 hours | 3.2 hours | 4x |
Split | Split documents into 426K pages | 11.7 hours | 27 minutes | 26x |
Enhance | Applied image cleanup to 426K pages | 4.2 days | 2.0 hours | 50x |
OCR | Performed full text OCR on 426K pages | 1.9 days | 1.3 hours | 35x |
Extract | Extracted additional fields from 46K documents | 3.7 hours | 9.3 minutes | 25x |
Merge | Merged 426K pages back into multipage files | 8.5 hours | 24 minutes | 21x |
Export | Exported 46K documents to CMIS | 18.0 hours | 3.5 hours | 5x |
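The last column of the table is simply serial time divided by parallel time. As a rough check, the sketch below recomputes it from the rounded durations shown above, along with parallel efficiency on the 56-core pool. The names `TIMINGS`, `speedup`, `efficiency`, and `summarize` are illustrative and not part of Grooper.

```python
# Durations transcribed from the table above, converted to minutes.
# These are the rounded published figures, so results are approximate.
HOUR = 60.0
DAY = 24 * HOUR
CORES = 56  # size of the thread pool quoted in the text

TIMINGS = {
    # activity: (serial minutes, parallel minutes)
    "Import": (13 * HOUR, 3.2 * HOUR),
    "Split": (11.7 * HOUR, 27.0),
    "Enhance": (4.2 * DAY, 2.0 * HOUR),
    "OCR": (1.9 * DAY, 1.3 * HOUR),
    "Extract": (3.7 * HOUR, 9.3),
    "Merge": (8.5 * HOUR, 24.0),
    "Export": (18.0 * HOUR, 3.5 * HOUR),
}


def speedup(serial, parallel):
    """Speedup factor: how many times faster the parallel run was."""
    return serial / parallel


def efficiency(serial, parallel, cores=CORES):
    """Fraction of ideal linear scaling achieved on the given core count."""
    return speedup(serial, parallel) / cores


def summarize(timings=TIMINGS):
    """Per-activity speedup plus the end-to-end speedup for the whole job."""
    rows = {name: speedup(s, p) for name, (s, p) in timings.items()}
    total_serial = sum(s for s, _ in timings.values())
    total_parallel = sum(p for _, p in timings.values())
    rows["Total"] = total_serial / total_parallel
    return rows


if __name__ == "__main__":
    for name, factor in summarize().items():
        print(f"{name:8s} {factor:5.1f}x")
```

Computed this way, the CPU-bound Enhance step reaches about 90% of ideal scaling on 56 cores, while the I/O-bound Import and Export steps gain only 4x to 5x and make up most of the parallel run time. The whole job therefore speeds up by roughly 18x end to end. The Extract row works out to about 24x from the rounded durations; the published 25x was presumably calculated from unrounded timings.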
For Grooper to act as a true AI accelerator, parallel processing was essential. Conditioning documents for complete data capture requires a tremendous number of computational cycles for image processing, text collection, and data analysis.
Parallel processing was the right architecture for keeping processing times reasonable without compromising our ability to capture every piece of data we wanted.
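The general pattern behind this kind of speedup is to treat each page as an independent task and fan the tasks out to a pool of workers. The sketch below illustrates that pattern with Python's standard library only. It is not Grooper's implementation, and `enhance_page` and `process_pages` are hypothetical names with a placeholder computation standing in for real image cleanup or OCR.

```python
from concurrent.futures import ThreadPoolExecutor


def enhance_page(page_id):
    """Hypothetical stand-in for a CPU-heavy per-page step (cleanup, OCR)."""
    # A cheap deterministic computation so the sketch runs anywhere.
    checksum = sum(i * i for i in range(1000)) % 97
    return page_id, checksum


def process_pages(page_ids, worker=enhance_page, max_workers=8,
                  executor_cls=ThreadPoolExecutor):
    """Fan independent page tasks out to a pool and collect results by page."""
    with executor_cls(max_workers=max_workers) as pool:
        # map() yields results in input order, so they line up with page_ids.
        return dict(pool.map(worker, page_ids))


if __name__ == "__main__":
    results = process_pages(range(20))
    print(f"processed {len(results)} pages")
```

In CPython, threads do not run CPU-bound Python code in parallel, so for real per-page computation one would pass `executor_cls=ProcessPoolExecutor` (also from `concurrent.futures`) or distribute tasks across machines. The fan-out structure stays the same.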
Performance Monitoring
See all the workstations and servers that are connected to your Grooper environment and centrally monitor the memory, processors, and storage of each.
If any machines are being overworked, simply disable some of the automation services to free up resources.
Global Service Management
Centrally start and stop document capture or document classification automation services in bulk, even when they are spread across your entire collection of machines. There is no need to log in to each machine to handle these tasks.
One-Click Software Upgrades
Upgrade one machine, complete unit testing, then push the new version to all machines in a single click. Stay up to date and escape the mindset of multi-week upgrade project planning and 2-to-4-year upgrade cycles.
Minimum System Requirements
Grooper service components include Microsoft Windows Server and Microsoft SQL Server. Detailed system requirements can be found on Grooper xChange.