Overcoming the Real-World Pitfalls of Google Document AI
Keywords:
Google Document AI, LegalTech, document analysis automation, PDF processing, RAG, LLM, microservice architecture, asynchronous processing, ProcessorPool
Abstract
This paper discusses the practical and feature gaps encountered with Google Document AI while building the AI product on the TrialBase platform (ai.trialbase.com), which automates legal document analysis. The results matter because the volume of electronic legal documents is growing rapidly, and fast, reliable parsing is essential for systems based on LLMs and retrieval-augmented generation (RAG). Standard Document AI workflows work well in practice only under ideal conditions: undamaged PDFs, a dataset that does not hit the API quota, and processing costs that do not matter. The architecture proposed in this paper is robust and efficient at transforming varied documents into structured data. An event-driven microservice architecture with message queues and a PDF sanitization pipeline addresses these real-world problems and enables ProcessorPool, in which multiple processors call the synchronous Document AI API concurrently to go beyond quota limits and drastically reduce processing times. Pre-sanitization, coupled with asynchronous batch processing and a custom load balancer, yielded a tenfold speed increase and improved reliability on real-world legal documents. The article is intended for LegalTech researchers and practitioners, workflow developers, and engineers building high-performance, reliable Google Cloud-based projects.
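The ProcessorPool idea described above can be illustrated with a minimal sketch. It is not the TrialBase implementation: the class layout, the round-robin policy, the processor IDs, and the `fake_process` stand-in (used here instead of a real synchronous Document AI request) are all assumptions made for illustration. The sketch only shows how a simple load balancer might spread documents across several processors and run the calls concurrently.

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


class ProcessorPool:
    """Round-robin dispatcher over several processor IDs (illustrative only)."""

    def __init__(
        self,
        processor_ids: List[str],
        process_fn: Callable[[str, bytes], str],
        workers_per_processor: int = 2,
    ) -> None:
        if not processor_ids:
            raise ValueError("at least one processor ID is required")
        self._cycle = itertools.cycle(list(processor_ids))
        self._lock = threading.Lock()
        self._process_fn = process_fn
        # One shared thread pool; the synchronous calls run concurrently.
        self._executor = ThreadPoolExecutor(
            max_workers=len(processor_ids) * workers_per_processor
        )
        # Per-processor counters, useful for checking the spread of load.
        self.calls: Dict[str, int] = {pid: 0 for pid in processor_ids}

    def _next_processor(self) -> str:
        # The lock keeps the rotation and the counters consistent
        # if several threads submit work at the same time.
        with self._lock:
            pid = next(self._cycle)
            self.calls[pid] += 1
            return pid

    def process_all(self, documents: List[bytes]) -> List[str]:
        # Processors are assigned at submit time, in input order.
        futures = [
            self._executor.submit(self._process_fn, self._next_processor(), doc)
            for doc in documents
        ]
        # Results are returned in the same order as the input documents.
        return [f.result() for f in futures]

    def close(self) -> None:
        self._executor.shutdown(wait=True)


def fake_process(processor_id: str, pdf_bytes: bytes) -> str:
    """Hypothetical stand-in for one synchronous Document AI request."""
    return f"{processor_id}:{len(pdf_bytes)}"


pool = ProcessorPool(["proc-a", "proc-b", "proc-c"], fake_process)
results = pool.process_all([b"x" * n for n in range(1, 7)])
pool.close()
```

In a real deployment `fake_process` would be replaced by the actual synchronous request, and the round-robin policy could be swapped for one that tracks in-flight requests or quota errors per processor.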
License
Copyright (c) 2025 Oleksandr Tserkovnyi

This work is licensed under a Creative Commons Attribution 4.0 International License.