Architectural Strategies for Reprocessing Historical Data in Real-Time Systems
Keywords:
Kappa architecture, Apache Kafka, real-time stream processing, historical data reprocessing, time-travel replay, Kafka Streams, Apache Flink, stateful microservices, event sourcing, data streaming architecture
Abstract
This study examines architectural strategies for reprocessing historical data in real-time systems built around the Kappa architecture and Apache Kafka–based microservices. The research addresses the growing need to recompute derived state, machine-learning features and aggregates without interrupting continuous processing or violating correctness guarantees. The work systematises approaches to “time-travel” over event logs, including full-topic replay, snapshot-plus-log reconstruction and isolated backfill pipelines. Special attention is given to the interaction between Kafka, stateful stream processors such as Apache Flink and Kafka Streams, and microservice-oriented designs that rely on local or external state stores. The goal is to formulate practical design guidelines for architecting reprocessing workflows under strict latency, availability and consistency requirements. The article presents an analytical comparison of modern stream-processing platforms and real-world case studies from the financial and fraud detection domains. In conclusion, the study formulates recommendations on choosing between local and external state, structuring replay traffic, and integrating reprocessing pipelines into production Kappa-style systems without global downtime.
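As an illustration of the snapshot-plus-log reconstruction named above, the following is a minimal sketch in plain Python, not the article's implementation. It assumes an in-memory list standing in for a single Kafka partition, and all names (`Event`, `Snapshot`, `apply_event`, `take_snapshot`, `restore`, `full_replay`) are hypothetical. The idea it shows is that state restored from a snapshot and then advanced by the log suffix should equal the state obtained by replaying the whole log.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    offset: int   # position in the log, analogous to a Kafka partition offset
    account: str
    amount: int


@dataclass(frozen=True)
class Snapshot:
    last_offset: int        # offset of the last event folded into `state`
    state: Dict[str, int]   # derived state: running balance per account


def apply_event(state: Dict[str, int], event: Event) -> None:
    """Fold one event into the derived state (a simple running sum)."""
    state[event.account] = state.get(event.account, 0) + event.amount


def full_replay(log: List[Event]) -> Dict[str, int]:
    """Rebuild state by replaying the entire log from the first offset."""
    state: Dict[str, int] = {}
    for event in log:
        apply_event(state, event)
    return state


def take_snapshot(log: List[Event], upto_offset: int) -> Snapshot:
    """Capture the state after all events with offset <= upto_offset."""
    state: Dict[str, int] = {}
    for event in log:
        if event.offset > upto_offset:
            break
        apply_event(state, event)
    return Snapshot(last_offset=upto_offset, state=dict(state))


def restore(snapshot: Snapshot, log: List[Event]) -> Dict[str, int]:
    """Start from the snapshot and replay only events after its offset."""
    state = dict(snapshot.state)  # copy so the snapshot stays immutable in practice
    for event in log:
        if event.offset > snapshot.last_offset:
            apply_event(state, event)
    return state


# Hypothetical sample log used only for this illustration.
LOG = [
    Event(0, "a", 100),
    Event(1, "b", 50),
    Event(2, "a", -30),
    Event(3, "b", 20),
    Event(4, "c", 5),
]

if __name__ == "__main__":
    snap = take_snapshot(LOG, upto_offset=2)
    print(restore(snap, LOG) == full_replay(LOG))
```

In a real deployment the snapshot would typically be a processor checkpoint or a state-store backup and the suffix would be read from the broker starting at the stored offset; the sketch only demonstrates the equivalence that makes the approach cheaper than a full-topic replay.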
License
Copyright (c) 2026 Rybchanka Aliaksandr

This work is licensed under a Creative Commons Attribution 4.0 International License.