Who is this for?
CDOs or CIOs exploring large language models (LLMs) in their organisation. If your goal is to develop new products or automate processes to reduce costs, having your own LLM may not be necessary!
In the history of AI, we now have two distinct eras: pre-GPT and post-GPT.
We now live in the post-GPT era of AI. In just six months since ChatGPT’s release, AI developments have dominated headlines. Companies like Anthropic and DeepMind have introduced new large language models (LLMs), and Meta and Google have revealed their own. This has sparked interest in obtaining a customised LLM or ChatGPT. However, you may not need one to achieve your goals.
The Allure and Challenges of a Custom LLM
Many express interest in having their own ChatGPT or fine-tuned large language model (LLM). But what does this really entail? Creating an LLM from scratch is hugely ambitious, requiring massive data, research teams, and computing power. Even OpenAI struggles with hardware needs. Realistically, most companies cannot develop a unique LLM.
Fine-tuning is more feasible but still faces hurdles. You need huge datasets relevant to your use case, which are difficult to obtain. Fine-tuning costs can reach $700,000. More importantly, fine-tuned models struggle with steerability. They can easily generate inappropriate or incorrect content despite your data.
The Role of Prompt Engineering in Making LLMs More Contextual
Prompt engineering plays a crucial role in maximising the effectiveness of large language models (LLMs). It’s a user-driven, hands-on process with both advantages and challenges.
Advantages:
- Relevance: Prompt engineering ensures AI responses align closely with the user’s query or task. Users provide valuable “few-shot” data with each prompt, enabling precise responses.
- Customisation: Users can tailor AI responses to their unique needs, fostering personalised and context-aware interactions.
- Performance Boost: High-quality user data enhances AI model performance, resulting in more accurate and insightful answers.

Challenges:
- Token Limitations: Prompt engineering faces token size constraints, requiring responses to fit within limits. This can be challenging for complex queries and lengthy explanations.
- Cost Concerns: Larger prompts and responses may incur higher generation and deployment costs. Striking a balance between cost-effectiveness and detailed responses is crucial.
- Usability: Prompt engineering demands a specialised skill set, limiting its accessibility to a broader user base.
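The “few-shot” idea above can be sketched in a few lines: the user packs labelled examples into the prompt itself, and the model infers the pattern. This is a minimal, illustrative sketch; the task, examples, and `build_prompt` helper are hypothetical, and no real LLM call is made.

```python
# Minimal sketch of few-shot prompt engineering: labelled examples
# are assembled into the prompt text so the model can infer the task.
# build_prompt is a hypothetical helper; it only builds the string.

def build_prompt(examples, query):
    """Assemble a few-shot prompt from (input, output) example pairs."""
    lines = ["Classify the sentiment of each review as Positive or Negative.", ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The final, unanswered item is what the model completes.
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)

examples = [
    ("Great battery life and screen.", "Positive"),
    ("Stopped working after two days.", "Negative"),
]
prompt = build_prompt(examples, "Fast delivery, works perfectly.")
print(prompt)
```

The token and cost constraints listed above bite here directly: every example added to the prompt consumes context-window space on every single request.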
RAG (Retrieval-Augmented Generation) Models
Imagine AI as your research assistant, seamlessly retrieving information from its vast database and providing highly informative and relevant responses—like having a personal librarian at your disposal!
Benefits:
- More factual responses: The LLM’s responses are grounded in the retrieved information, so the model is less likely to “hallucinate” incorrect or misleading content.
- Consistency: The same question is more likely to receive the same answer.
- Cost: Building a RAG pipeline is less expensive than fine-tuning; when information changes, you update the database instead of training a new model.
- Currency: Responses can be based on up-to-date data.
- Accessible sources: Users can cross-check the cited source. The large language model (LLM) acts as a helper while the source remains the ground truth.

Challenges:
- Retrieval Quality: RAG’s effectiveness relies on accurate initial retrieval; inaccuracies propagate into the generated responses.
- Developer Dependency: Developers must maintain the vector database, and implementing RAG can be computationally intensive and complex.
- Scalability: Scaling RAG to extensive data and query volumes while maintaining responsiveness requires efficient retrieval and tuning.
- Integration: Integrating RAG into applications may demand additional development effort for seamless interactions.
In summary, RAG models combine retrieval and generation for informative responses, offering user-friendliness but requiring attention to retrieval quality, complexity, scalability, and integration.
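The retrieve-then-generate flow can be sketched end to end. This is a deliberately minimal illustration under simplifying assumptions: retrieval uses naive word overlap rather than embeddings and a vector database, the document snippets are invented, and the LLM call itself is omitted.

```python
# Minimal RAG sketch: retrieve the most relevant document, then splice
# it into the prompt so the model answers from that context. A real
# pipeline would use embeddings and a vector DB; retrieval here is a
# toy word-overlap score, and the LLM call is left out.

def retrieve(query, documents):
    """Return the document sharing the most words with the query."""
    query_words = set(query.lower().split())
    return max(documents, key=lambda d: len(query_words & set(d.lower().split())))

def build_rag_prompt(query, documents):
    context = retrieve(query, documents)
    return (
        "Answer using only the context below.\n"
        f"Context: {context}\n"
        f"Question: {query}\n"
        "Answer:"
    )

docs = [
    "Junior analysts may download up to 5 ESG reports per month.",
    "The firm subscribes to 30 ESG reports monthly.",
]
print(build_rag_prompt("How many reports can a junior analyst download?", docs))
```

Notice that the “Retrieval Quality” challenge above is visible even in this toy: if `retrieve` picks the wrong snippet, the generation step faithfully answers from the wrong context.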
Choosing the Right Approach
Deciding between prompt engineering, RAG (retrieval-augmented generation), and fine-tuning depends on your unique needs. Here’s a concise guide:
Prompt Engineering
- Best for: Providing specific AI instructions when you have a clear idea of desired outcomes.
- Use Cases: Custom content, precise Q&A, structured data extraction
RAG (retrieval-augmented generation)
- Best for: Incorporating information from extensive knowledge sources into AI responses, especially for knowledge-rich NLP tasks.
- Use Cases: Summarization, recommendations, research assistance
Fine-Tuning
- Best for: Tailoring pre-trained language models to specific tasks or domains.
- Use Cases: Sentiment analysis, text classification, translation, chatbots
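The guide above can be condensed into a small decision helper. This is a hypothetical simplification of the three bullets, not an official rule; the function name and criteria are illustrative.

```python
# Hypothetical helper condensing the decision guide into code.
# The rules are a simplification of the three bullets above.

def recommend_approach(needs_external_knowledge: bool,
                       has_domain_dataset: bool,
                       has_clear_output_examples: bool) -> str:
    """Suggest a starting approach from the criteria in the guide."""
    if needs_external_knowledge:
        return "RAG"                  # knowledge-rich tasks: summaries, research
    if has_domain_dataset:
        return "Fine-tuning"          # specialisation backed by enough data
    if has_clear_output_examples:
        return "Prompt engineering"   # clear desired outcomes, no training
    return "Prompt engineering"       # cheapest, fastest place to start

print(recommend_approach(True, False, False))
```

In practice the approaches combine rather than compete, as the conclusion below notes, so treat this as a first filter rather than a final answer.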
Meet Darsh, an ambitious young investment analyst at a major asset management firm in Europe. Like many in his field, Darsh relies heavily on ESG reports to make sound investing decisions for his clients.
On Monday morning, Darsh logs into the company’s virtual assistant chatbot to ask a routine question: “How many ESG report titles am I allowed to download this month?”
Behind the scenes, the chatbot springs into action. First, it identifies Darsh from his login credentials and pulls up his context: He’s an associate with junior analyst privileges.
Next, the chatbot cross-references Darsh’s role with the company knowledge base. Bingo! Junior analysts are allotted up to 10 reports per month.
Armed with this contextual data, the chatbot politely responds: “Your firm subscribes to 30 reports monthly. Given your junior analyst status, you may download up to 10 reports.”
But wait—something seems off. Darsh is only granted 5 reports, not 10! Sensing the mistake, the chatbot’s verification module kicks in. It asks a secondary AI model to double-check the facts.
Aha, the original response was wrong. The chatbot quickly self-corrects and replies, “My apologies; you are permitted up to five reports per month.”
Thanks to its intelligent prompt architecture, the chatbot delivered the right information to Darsh, avoiding a costly mistake. It’s just another day’s work for a virtual assistant that adapts on the fly!
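The verify-then-correct flow in the story can be sketched as follows. Everything here is illustrative: both “models” are stubbed with plain Python lookups, and the roles and figures are invented to mirror the story, not drawn from any real system.

```python
# Sketch of the story's verification loop: a primary answer is checked
# against the knowledge base by a second step and corrected on mismatch.
# Both model calls are stubs; roles and figures are illustrative.

KNOWLEDGE_BASE = {"junior analyst": 5, "senior analyst": 15}

def primary_answer(role):
    # Stand-in for the first LLM; deliberately returns a stale figure,
    # like the chatbot's initial "up to 10" reply in the story.
    return 10

def verify_and_correct(role, draft):
    # Stand-in for the secondary checker model: compare the draft
    # answer against the authoritative knowledge base.
    truth = KNOWLEDGE_BASE[role]
    if draft != truth:
        return truth, f"My apologies; you are permitted up to {truth} reports per month."
    return draft, f"You may download up to {draft} reports per month."

draft = primary_answer("junior analyst")
final, reply = verify_and_correct("junior analyst", draft)
print(reply)
```

The key design choice the story illustrates is separating generation from verification: the checker only needs to compare a draft against a trusted source, which is a much easier task than generating the answer in the first place.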
Creating a unique large language model (LLM) is challenging and unnecessary for most. Fine-tuning also has significant downsides. In practice, combining these approaches can be powerful. Start with prompt engineering for initial guidance, fine-tune for domain-specificity, and utilise RAG to fetch additional context when needed.
The choice depends on your objectives, task complexity, data availability, and resource constraints. Experimentation helps determine the best strategy for your use case.
Prasad Prabhakaran, Practice Lead, esynergy