The article discusses the practical considerations for choosing large language models (LLMs) to build generative AI agents. It introduces the PACT framework, which emphasizes privacy, accuracy, cost, and time efficiency. The framework guides enterprise leaders in selecting the most suitable LLM for their projects, considering factors such as data sensitivity, accuracy requirements, cost-effectiveness, and hardware capabilities. It also recommends techniques like Retrieval Augmented Generation and fine-tuning to enhance performance.
Artificial Intelligence has brought us to the age of generative AI agents, capable of creating entirely new forms of data, from captivating poems to intricate computer code.
Generative AI agents are a type of artificial intelligence (AI) solution that can create and understand unstructured data such as text, code, images, video or music. They inherit intelligence from underlying large language models trained on massive amounts of data, learning to identify patterns in that data. This allows them to generate content and perform tasks in applications, emulating human users.
Choose a large language model: There are many generative models available, such as the OpenAI GPT series, Anthropic Claude, Google Gemini and open-source models like Llama, Gemma and Mistral, each with its own strengths and weaknesses. Popular production choices include GPT-3.5/Turbo, Gemini, Mistral and Llama derivatives. But with a plethora of large language models (LLMs) vying for your attention, selecting the ideal one for your project can be a daunting task.
PACT is a no-nonsense decision framework that helps enterprise business and technology leaders start their organisation's generative AI journey and deploy production-grade solutions in weeks, if not days.
Privacy
Take a privacy-first approach. Does the use case involve personally identifiable or financially sensitive data? If so, choose a private or open-source LLM as the first choice and leverage Retrieval Augmented Generation (RAG) and fine-tuning (FT) techniques. If data privacy is not a major concern (for example, an informational marketing chatbot), then your first choice should be one of the public LLMs, such as Azure OpenAI, Anthropic Claude, Google Gemini or another language-specific model.
Accuracy
Focus on accuracy and reduce hallucinations by using Retrieval Augmented Generation and fine-tuning to ground models in your own data, with proper citations. This approach goes a long way towards mitigating hallucinations and improving effectiveness.
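A minimal sketch of the grounding idea behind RAG: retrieve the most relevant organisational documents for a query, then assemble a prompt that instructs the model to answer only from those sources and cite them. The retrieval here is a simple keyword-overlap score for illustration; production systems typically use vector embeddings and a vector store, and the document names below are hypothetical.

```python
# Minimal RAG sketch: retrieve relevant documents, then build a grounded
# prompt with citation ids. Keyword overlap stands in for real embeddings.

def retrieve(query: str, documents: dict, top_k: int = 2) -> list:
    """Rank document ids by word overlap with the query."""
    query_words = set(query.lower().split())
    scores = {
        doc_id: len(query_words & set(text.lower().split()))
        for doc_id, text in documents.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

def build_grounded_prompt(query: str, documents: dict) -> str:
    """Assemble a prompt that grounds the LLM in retrieved text with citations."""
    context_ids = retrieve(query, documents)
    context = "\n".join(f"[{doc_id}] {documents[doc_id]}" for doc_id in context_ids)
    return (
        "Answer using ONLY the sources below and cite them by id.\n"
        f"Sources:\n{context}\n"
        f"Question: {query}"
    )

docs = {
    "policy-01": "Refunds are processed within 14 days of a return request.",
    "policy-02": "International shipping takes 7 to 10 business days.",
}
prompt = build_grounded_prompt("How long do refunds take to process?", docs)
print(prompt)
```

Because the model is told to cite source ids, its answers can be checked back against the retrieved passages, which is what keeps hallucinations in check.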
Cost
Arrive at the total cost of ownership at production scale. For a few use cases, leveraging public LLMs may be cost-effective, provided the above two criteria of privacy and accuracy are satisfied. For many others, however, simply using an open-source LLM/SLM can reduce costs by 70 to 90%.
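A back-of-the-envelope comparison makes the cost trade-off concrete. All the figures below (per-token price, GPU hourly rate, traffic volume) are illustrative placeholders, not actual vendor rates; substitute your own quotes before drawing conclusions.

```python
# Illustrative total-cost comparison: pay-per-token public LLM API vs a
# self-hosted open-source model. All prices are placeholder assumptions.

def api_monthly_cost(requests_per_month: int, tokens_per_request: int,
                     price_per_1k_tokens: float) -> float:
    total_tokens = requests_per_month * tokens_per_request
    return total_tokens / 1000 * price_per_1k_tokens

def self_hosted_monthly_cost(gpu_hourly_rate: float, hours: float = 730) -> float:
    # One always-on GPU instance; real deployments add ops overhead.
    return gpu_hourly_rate * hours

api_cost = api_monthly_cost(1_000_000, 1_500, 0.002)   # 1M requests/month
hosted_cost = self_hosted_monthly_cost(1.20)           # assumed $1.20/hr GPU
savings = 1 - hosted_cost / api_cost
print(f"API: ${api_cost:,.0f}  Self-hosted: ${hosted_cost:,.0f}  Savings: {savings:.0%}")
```

Under these assumed numbers the self-hosted option lands in the savings range quoted above, but the crossover point depends heavily on traffic volume: at low volumes, pay-per-token APIs usually win.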
Time
Last but not least, choose a model that delivers the right end-user experience, based on query performance on simple commodity hardware. Choose a GPU-based architecture based on your cost and performance requirements.
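Measuring query performance on your own hardware can be as simple as timing repeated calls and reporting percentiles. The sketch below uses a stub `call_model` function (a hypothetical stand-in for a real API request or local inference call); swap in your actual client to compare candidate models.

```python
import time
import statistics

# Simple latency benchmark harness for comparing candidate models on your
# own hardware. `call_model` is a stub; replace it with a real inference call.

def call_model(prompt: str) -> str:
    time.sleep(0.01)  # placeholder for real inference latency
    return "stub response"

def benchmark(prompt: str, runs: int = 20) -> dict:
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        call_model(prompt)
        latencies.append(time.perf_counter() - start)
    return {
        "p50_ms": statistics.median(latencies) * 1000,
        "p95_ms": sorted(latencies)[int(0.95 * runs) - 1] * 1000,
    }

results = benchmark("Summarise our refund policy.")
print(results)
```

Reporting p95 alongside the median matters because end-user experience is usually judged by the slowest responses, not the average.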
The prime technique for building generative AI agents and solutions is to ground LLMs in organisational data. Choose Retrieval Augmented Generation primarily for retrieval use cases, and a fine-tuning approach for generative and reasoning use cases.
PACT is derived from hands-on experience deploying enterprise-grade generative AI solutions in production at scale, cutting through the noise around privacy, cost, hallucinations and GPU availability. However, be sure to analyse your business problem or use case and adjust the weighting of the PACT framework components for your industry and organisation.
Enjoy building and deploying Generative AI solutions and do share any other approaches or additions.
-By Vijay Navaluri, Co-Founder & Chief Customer Officer, Supervity AI