Overcoming the Real-World Pitfalls of Google Document AI

Oleksandr Tserkovnyi

Authors

Oleksandr Tserkovnyi TrialBase Inc., Principal Engineer Dominican Republic, Punta Cana

Keywords:

Google Document AI, LegalTech, document analysis automation, PDF processing, RAG, LLM, microservice architecture, asynchronous processing, ProcessorPool

Abstract

This paper discusses the practical and feature gaps that were encountered with Google Document AI in building the AI product at TrialBase platform (ai.trialbase.com), which automates legal document analysis. Results matter because there is an explosion of electronic legal documents that require fast and reliable parsing, which is essential for systems based on LLMs and retrieval-augmented generation. Standard Document AIs seldom work well in practice, even if there are no damaged PDFs, and if a large dataset is being used, wherein the API quota is not hit, and processing costs do not matter. The architecture proposed in this paper is robust, efficient at transforming various documents into structured data. Event-driven microservice architecture with message queues and a PDF sanitization pipeline solves real-world problems by enabling ProcessorPool (multiple processors using synchronous Document AI API to go beyond quota limitation concurrently drastically reducing processing times). Pre-sanitization, coupled with asynchronous batch processing and a custom load balancer, got a tenfold speed increase with enhanced reliability over real-world legal documents. The article is meant to help LegalTech researchers and practitioners, workflow developers, and engineers working on high-performance, reliable Google Cloud-based projects.

References

Appalaraju, S., Jasani, B., Kota, B. U., Xie, Y., & Manmatha, R. (2021). DocFormer: End-to-End Transformer for Document Understanding. 2021 IEEE/CVF International Conference on Computer Vision (ICCV). https://doi.org/10.1109/iccv48922.2021.00103

He, S., & Schomaker, L. (2017). Beyond OCR: Multi-faceted understanding of handwritten document characteristics. Pattern Recognition, 63, 321–333. https://doi.org/10.1016/j.patcog.2016.09.017

Li, Z., Guo, L., Cheng, J., Chen, Q., He, B., & Guo, M. (2022). The Serverless Computing Survey: A Technical Primer for Design Architecture. ACM Computing Surveys, 54(10s), 1-34. https://doi.org/10.1145/3508360

Powalski, R., Borchmann, Ł., Jurkiewicz, D., Dwojak, T., Pietruszka, M., & Pałka, G. (2021). Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer. Arxiv. https://doi.org/10.48550/arxiv.2102.09550

Xu, Y., Xu, Y., Lv, T., Cui, L., Wei, F., Wang, G., Lu, Y., Florencio, D., Zhang, C., Che, W., Zhang, M., & Zhou, L. (2021). LayoutLMv2: Multi-modal Pre-training for Visually-rich Document Understanding. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2579–2591. https://doi.org/10.18653/v1/2021.acl-long.201

Overcoming the Real-World Pitfalls of Google Document AI

Authors

Keywords:

Abstract

References

Downloads

Published

Issue

Section

License