Generative AI Agents at Enterprise Scale: Architecting RAG-Enhanced LLM Systems for Production Deployment

Authors

  • Manikanteswara Yasaswi Kurra

DOI:

https://doi.org/10.70153/IJCMI/2025.17303

Keywords:

Generative AI Agents, Retrieval-Augmented Generation, Enterprise LLM Systems, Multi-Agent Orchestration, Vector Databases, AI Governance

Abstract

The rapid evolution of Large Language Models (LLMs) has catalyzed a fundamental shift in enterprise AI capabilities, enabling organizations to deploy intelligent agents that combine generative AI with
retrieval-augmented generation (RAG) for autonomous decision-making and task execution. This paper presents
a comprehensive framework for architecting and deploying generative AI agents at enterprise scale, addressing the
unique challenges of production environments including data sovereignty, system reliability, and operational governance. We examine how RAG architectures mitigate LLM limitations by dynamically incorporating domain-specific knowledge from enterprise repositories, achieving significant improvements in response accuracy and
contextual relevance. The study details multi-layered architecture patterns encompassing agent orchestration,
memory systems, tool integration, and feedback loops essential for sustained performance. Our analysis covers
critical implementation dimensions including vector database optimization, chunking strategies, hybrid search
mechanisms, and semantic caching techniques that enable sub-second response times at scale. Security and
compliance frameworks are explored, including role-based access control, data lineage tracking, and audit mechanisms required for regulated industries. Performance benchmarking across financial services, healthcare, and
manufacturing deployments reveals 45-65% accuracy improvements over baseline LLMs and 30-55% reduction in
operational overhead through intelligent automation. We address scalability bottlenecks, cost optimization strategies, and integration patterns for legacy enterprise systems. The paper concludes with architectural blueprints,
best practices for iterative deployment, and a maturity model guiding organizations from proof-of-concept to
full-scale production systems capable of handling millions of daily interactions while maintaining governance,
explainability, and continuous improvement capabilities.
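To make the retrieve-then-generate flow described in the abstract concrete, the sketch below shows a minimal, self-contained RAG loop with a semantic cache. Everything here is illustrative and not taken from the paper: the toy bag-of-words `embed` function stands in for a learned embedding model, the in-memory `retrieve` stands in for a vector database, and the bracketed "LLM answer" string stands in for an actual model call. The `SemanticCache` returns a stored answer when a new query's embedding is close enough to a previously answered one, which is the mechanism behind the sub-second response times the abstract attributes to semantic caching.

```python
import math
import re
from typing import Dict, List, Optional, Tuple

VOCAB: Dict[str, int] = {}  # shared token -> slot mapping for the toy embedder

def embed(text: str, dim: int = 256) -> List[float]:
    """Toy bag-of-words embedding: each token gets a fixed vector slot.
    A production system would call a learned embedding model instead."""
    vec = [0.0] * dim
    for tok in re.findall(r"\w+", text.lower()):
        vec[VOCAB.setdefault(tok, len(VOCAB)) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: List[float], b: List[float]) -> float:
    """Dot product of already-normalized vectors = cosine similarity."""
    return sum(x * y for x, y in zip(a, b))

class SemanticCache:
    """Serves a cached answer when a query is semantically close to a past one."""
    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.entries: List[Tuple[List[float], str]] = []

    def get(self, qvec: List[float]) -> Optional[str]:
        for vec, answer in self.entries:
            if cosine(vec, qvec) >= self.threshold:
                return answer
        return None

    def put(self, qvec: List[float], answer: str) -> None:
        self.entries.append((qvec, answer))

def retrieve(qvec: List[float], index: Dict[str, List[float]], k: int = 2) -> List[str]:
    """Top-k chunks by cosine similarity (stand-in for a vector database query)."""
    return sorted(index, key=lambda doc: cosine(index[doc], qvec), reverse=True)[:k]

# --- usage: cache miss triggers retrieval; the hit path skips both steps ---
corpus_texts = [
    "Refunds are processed within 5 business days.",
    "Enterprise SSO is configured via SAML 2.0.",
]
index = {text: embed(text) for text in corpus_texts}
cache = SemanticCache(threshold=0.9)

query = "When are customer refunds processed?"
qvec = embed(query)
answer = cache.get(qvec)
if answer is None:
    context = retrieve(qvec, index, k=1)  # grounding passages for the LLM prompt
    answer = f"[LLM answer grounded in: {context[0]}]"
    cache.put(qvec, answer)
```

A repeated or closely paraphrased query now hits the cache and never reaches the retriever or the model; the similarity threshold trades cache hit rate against the risk of serving a stale or mismatched answer.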


Author Biography

  • Manikanteswara Yasaswi Kurra

    Senior Associate, Cognizant Technology Solutions
    Email: manikanteswarayasaswikurra@gmail.com

References

[1] Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems. 2020;33:1877-1901.
[2] Zhao WX, Zhou K, Li J, Tang T, Wang X, Hou Y, et al. A survey of large language models. arXiv preprint arXiv:2303.18223. 2023.
[3] Lewis P, Perez E, Piktus A, Petroni F, Karpukhin V, Goyal N, et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems. 2020;33:9459-9474.
[4] Gao Y, Xiong Y, Gao X, Jia K, Pan J, Bi Y, et al. Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997. 2023.
[5] Xi Z, Chen W, Guo X, He W, Ding Y, Hong B, et al. The rise and potential of large language model based agents: A survey. arXiv preprint arXiv:2309.07864. 2023.
[6] Wang L, Ma C, Feng X, Zhang Z, Yang H, Zhang J, et al. A survey on large language model based autonomous agents. arXiv preprint arXiv:2308.11432. 2023.
[7] Zhu Y, Wang X, Chen J, Qiao S, Ou Y, Yao Y, et al. LLMs for knowledge graph construction and reasoning: Recent capabilities and future opportunities. arXiv preprint arXiv:2305.13168. 2023.
[8] Mialon G, Dessì R, Lomeli M, Nalmpantis C, Pasunuru R, Raileanu R, et al. Augmented language models: a survey. arXiv preprint arXiv:2302.07842. 2023.
[9] Bommasani R, Hudson DA, Adeli E, Altman R, Arora S, von Arx S, et al. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258. 2021.
[10] Liu X, Yu H, Zhang H, Xu Y, Lei X, Lai H, et al. AgentBench: Evaluating LLMs as agents. arXiv preprint arXiv:2308.03688. 2023.
[11] Sumers TR, Yao S, Narasimhan K, Griffiths TL. Cognitive architectures for language agents. arXiv preprint arXiv:2309.02427. 2023.
[12] Chowdhery A, Narang S, Devlin J, Bosma M, Mishra G, Roberts A, et al. PaLM: Scaling language modeling with pathways. arXiv preprint arXiv:2204.02311. 2022.
[13] Touvron H, Martin L, Stone K, Albert P, Almahairi A, Babaei Y, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288. 2023.
[14] Anthropic. The Claude 3 model family: Opus, Sonnet, Haiku. Technical Report. 2024.
[15] Asai A, Wu Z, Wang Y, Sil A, Hajishirzi H. Self-RAG: Learning to retrieve, generate, and critique through self-reflection. arXiv preprint arXiv:2310.11511. 2023.
[16] Shi W, Min S, Yasunaga M, Seo M, James R, Lewis M, et al. REPLUG: Retrieval-augmented black-box language models. arXiv preprint arXiv:2301.12652. 2024.
[17] Johnson J, Douze M, Jégou H. Billion-scale similarity search with GPUs. IEEE Transactions on Big Data. 2021;7(3):535-547.
[18] Malkov YA, Yashunin DA. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2020;42(4):824-836.
[19] Douze M, Guzhva A, Deng C, Johnson J, Szilvasy G, Mazaré PE, et al. The Faiss library. arXiv preprint arXiv:2401.08281. 2024.
[20] Park JS, O'Brien JC, Cai CJ, Morris MR, Liang P, Bernstein MS. Generative agents: Interactive simulacra of human behavior. arXiv preprint arXiv:2304.03442. 2023.

Published

2025-12-31

How to Cite

[1] M. Y. Kurra, “Generative AI Agents at Enterprise Scale: Architecting RAG-Enhanced LLM Systems for Production Deployment”, IJCMI, vol. 17, no. 1, pp. 17361–17377, Dec. 2025, doi: 10.70153/IJCMI/2025.17303.
