OCR Infrastructure Requirements

Scalable Parallel Data Processing

Grooper’s shared computing infrastructure makes parallel processing on a large scale easy. Leverage hundreds of processing cores spread across multiple servers and workstations.

Perform in minutes what takes hours or even days using traditional processing methods.

The Power of Parallel Processing

The performance benefits of parallel processing are significant, particularly for low-latency, CPU-intensive activities. In the job below, executed on a 56-core thread pool, CPU-intensive activities ran up to 50x faster than they did serially.

Activity   Description                                       Serial       Parallel      Speedup
Import     Loaded 46K TIF & PDF documents from CMIS          13 hours     3.2 hours     4x
Split      Split documents into 426K pages                   11.7 hours   27 minutes    26x
Enhance    Applied image cleanup to 426K pages               4.2 days     2.0 hours     50x
OCR        Performed full text OCR on 426K pages             1.9 days     1.3 hours     35x
Extract    Extracted additional fields from 46K documents    3.7 hours    9.3 minutes   25x
Merge      Merged 426K pages back into multipage files       8.5 hours    24 minutes    21x
Export     Exported 46K documents to CMIS                    18.0 hours   3.5 hours     5x
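The speedup figures in the job above are the serial time divided by the parallel time. As a rough check, the sketch below recomputes them from the rounded durations published in the table and adds a parallel-efficiency figure (speedup divided by the 56 cores). The names `JOB`, `speedup`, `efficiency`, and `totals` are illustrative and not part of Grooper. Because the inputs are rounded, a recomputed value can differ from the published one by a point.

```python
# Serial and parallel durations from the table above, converted to hours.
# Inputs are the rounded, published figures, so results are approximate.
JOB = {
    "Import":  (13.0, 3.2),
    "Split":   (11.7, 27 / 60),
    "Enhance": (4.2 * 24, 2.0),
    "OCR":     (1.9 * 24, 1.3),
    "Extract": (3.7, 9.3 / 60),
    "Merge":   (8.5, 24 / 60),
    "Export":  (18.0, 3.5),
}
CORES = 56  # size of the thread pool quoted for this job


def speedup(serial_hours, parallel_hours):
    """How many times faster the parallel run was."""
    return serial_hours / parallel_hours


def efficiency(serial_hours, parallel_hours, cores=CORES):
    """Fraction of the ideal (linear) speedup actually achieved."""
    return speedup(serial_hours, parallel_hours) / cores


def totals(job=JOB):
    """Total serial and parallel hours across all activities."""
    serial = sum(s for s, _ in job.values())
    parallel = sum(p for _, p in job.values())
    return serial, parallel


if __name__ == "__main__":
    for name, (s, p) in JOB.items():
        print(f"{name:8s} {speedup(s, p):5.1f}x  {efficiency(s, p):.0%} of ideal")
    serial, parallel = totals()
    print(f"Total    {serial:.1f} h serial vs {parallel:.1f} h parallel")
```

With these rounded inputs, the whole job comes to roughly 201 hours serially against roughly 11 hours in parallel, or about 18x end to end. The I/O-bound Import and Export steps account for most of the remaining parallel time.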

To enable Grooper to act as a true AI accelerator, we realized the importance of parallel processing. Conditioning documents for complete data capture requires a tremendous number of compute cycles for image processing, text collection, and data analysis.

Parallel processing proved to be the right architecture for keeping processing times reasonable without compromising our ability to capture every piece of data we wanted.
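To illustrate the general idea of page-level parallelism, here is a minimal sketch that fans per-page work out to a worker pool and collects the results in order. It is not Grooper's implementation. `enhance_page` and `process_pages` are hypothetical names, and the per-page work is a placeholder. The pool is Python's standard `concurrent.futures.ThreadPoolExecutor`. For genuinely CPU-bound work in Python, a process pool would normally take its place.

```python
from concurrent.futures import ThreadPoolExecutor


def enhance_page(page_number: int) -> str:
    """Hypothetical stand-in for per-page work (image cleanup, OCR, ...)."""
    return f"page-{page_number:06d}: done"


def process_pages(page_numbers, workers=8):
    """Fan pages out to a worker pool and collect results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order of the inputs, so pages
        # can be merged back into multipage documents without re-sorting.
        return list(pool.map(enhance_page, page_numbers))


if __name__ == "__main__":
    results = process_pages(range(1, 11), workers=4)
    print(len(results), "pages processed")
```

Because each page is processed independently, adding workers shortens the wall-clock time until another resource, such as storage or network I/O, becomes the bottleneck.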

Performance Monitoring

See all the workstations and servers that are connected to your Grooper environment and centrally monitor the memory, processors, and storage of each.

If any machines are being overworked, simply disable some of the automation services to free up resources.

Global Service Management

Centrally start and stop document capture or document classification automation services in bulk, even when they are spread across your entire collection of machines. There is no need to log in to each machine to handle these tasks.

One-Click Software Upgrades

Upgrade one machine, complete unit testing, then push the new version to all machines in a single click. Stay up to date and escape the mindset of multi-week upgrade project planning and 2- to 4-year upgrade cycles.

Minimum System Requirements

Grooper service components require Microsoft Windows Server and Microsoft SQL Server. Detailed system requirements can be found on Grooper xChange.