Manjit Singh
6 min read · May 8, 2024

Retrieval Augmented Generation (RAG) with Amazon Bedrock Code Sample

In the world of artificial intelligence and machine learning, the quest to make models more efficient, accurate, and context-aware is ongoing. One such advancement is Retrieval Augmented Generation (RAG). Services like Amazon Bedrock, with their hosted models, simplify harnessing RAG to elevate the capabilities of large language models (LLMs). But what exactly is RAG, and how does it help?

This is a continuation of my previous post on Foundation Models:

Understanding RAG

At its core, RAG is a process designed to enhance text generation in LLMs by retrieving information relevant to a query and using it to augment the generation process. It's a straightforward yet powerful idea that leverages the depth of neural network-based models. These models work with an internal mathematical vector representation of words, or tokens; producing that representation is a process known as text embedding.

Text embedding models transform words and tokens into vector representations. Embeddings are not limited to text; multimodal embeddings handle images as well, although we'll focus on text embeddings for now.

The Magic of Text Embedding

The embedding process converts text into an N-dimensional vector, which can then be used in various applications. This vectorized representation of text documents opens up a variety of use cases, including but not limited to search (where results are ranked by relevance to a query string), clustering (grouping text strings by similarity), recommendations, and anomaly detection.
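To build some intuition, here is a minimal sketch of vector similarity using made-up 3-dimensional vectors (real embedding models produce vectors with hundreds or thousands of dimensions):

import numpy as np

# Toy 3-dimensional "embeddings" (illustrative values only)
dog = np.array([0.9, 0.1, 0.0])
puppy = np.array([0.8, 0.2, 0.1])
stock = np.array([0.0, 0.1, 0.9])

def cosine(a, b):
    # Cosine similarity: close to 1 means similar direction, i.e. similar meaning
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(dog, puppy))  # high (~0.98): related concepts
print(cosine(dog, stock))  # low (~0.01): unrelated concepts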

The RAG Process in Action

Say we have text documents covering current political and sporting news, financial markets, or just some corporate matters. Base models would not know about these. Through the RAG process, the vectors for these documents can add context to a query, helping the LLM generate a more precise answer.

What sets RAG apart is its ability to enhance LLM prompts with additional context from the most similar or relevant document, found via vector similarity. This creates a customized version of the LLM prompt that behaves as if it were fine-tuned to the specific document context. It's an alternative to custom modeling that is not only cost-effective but also flexible, with no need to host custom models.

RAG Process

Overcoming Limitations of Traditional LLMs

Traditional LLMs might struggle with queries about very recent events not included in their training set or proprietary data like internal corporate documentation. This is where RAG shines. By adding context to the query via a document search conducted with text embeddings, RAG bridges the gap between the LLM’s knowledge base and the specificity of the query.

Such an approach can substitute for fine-tuning a model since it’s far less expensive, and you can store the vector embeddings locally, making it a highly practical solution for businesses and developers alike.
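As a rough illustration of storing embeddings locally (a minimal sketch; the file name and values are just examples), vectors computed once can be written to disk and reloaded later without calling the model again:

import numpy as np

# Suppose `embeddings` holds vectors returned by an embedding model
embeddings = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # placeholder values

# Persist locally and reload in a later session
np.save("embeddings.npy", np.array(embeddings))
restored = np.load("embeddings.npy")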

To invoke models, we will use the Amazon bedrock-runtime service:

import boto3

bedrock = boto3.client(region_name="us-east-1", service_name="bedrock-runtime")
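If you are unsure which models your account can access in the region, the bedrock control-plane client (as opposed to bedrock-runtime) can list them; a quick check might look like this:

# The "bedrock" client is the control plane; "bedrock-runtime" only invokes models
bedrock_control = boto3.client(region_name="us-east-1", service_name="bedrock")
for model in bedrock_control.list_foundation_models()["modelSummaries"]:
    print(model["modelId"])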

Let us use the amazon.titan-embed-text-v1 model to convert text to vector embeddings:

import json

def embed_text_from_amazon(content):
    # Convert the input text into a JSON-formatted string
    json_request = {"inputText": content}
    body = json.dumps(json_request)

    try:
        # Invoke the model with the prepared request body
        response = bedrock.invoke_model(body=body, modelId="amazon.titan-embed-text-v1")

        # Parse the response to get the embedding
        response_body = response.get('body').read()
        embedding = json.loads(response_body)['embedding']
        return embedding
    except Exception as e:
        print(f"An error occurred: {e}")
        return None
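As a quick sanity check (assuming AWS credentials with Bedrock model access are configured; amazon.titan-embed-text-v1 returns 1,536-dimensional vectors):

vector = embed_text_from_amazon("Hello, Bedrock!")
if vector is not None:
    print(len(vector))  # expect 1536 for amazon.titan-embed-text-v1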

Prepare some text files with recent events

I have prepared a few small text files that capture some very recent events that LLMs deployed in the cloud are unlikely to know about yet:

  • Kentucky Derby results for 2024 (May)
  • UK Local elections held in May 2024
  • NASA confirming debris from ISS falling in Florida in April 2024

Loop through these files and get their vector embeddings using the Amazon Titan model defined earlier.

NOTE: Ideally these would be stored in a persistent vector database, but for testing purposes we just store them in a pandas DataFrame.

import os
import pandas as pd

# Directory containing the text files
directory = 'recent-events-2024'

# List to hold file data and embeddings
data = []

# Loop through each file in the directory
for filename in os.listdir(directory):
    if filename.endswith(".txt"):
        # Construct full file path
        path = os.path.join(directory, filename)
        # Read the content of the file
        with open(path, 'r', encoding='utf-8') as file:
            content = file.read()
        # Generate embedding for the content
        embedding = embed_text_from_amazon(content)
        # Append a tuple of filename, content, and embedding to the list
        data.append((filename, content, embedding))

# Create a DataFrame
df = pd.DataFrame(data, columns=['Filename', 'Content', 'Embedding'])

print(df.head())
Filename                                            Content  \
0 bernard-hill .txt Actor Bernard Hill, best known for roles in Ti...
1 kentucky-derby-2024.txt Saturday's 150th Kentucky Derby (G1) delivered...
2 space-debris-2024.txt Monday, April 22, 2024\nNASA confirmed on Mond...
3 uk-local-elections.txt The 2024 United Kingdom local elections took p...

Embedding
0 [-0.0034637451, -0.014587402, 0.140625, -0.224...
1 [-0.095703125, -0.045654297, -0.0013885498, -0...
2 [0.79296875, -0.034179688, -0.15625, 0.5234375...
3 [0.59375, 0.012817383, 0.12597656, -0.02160644...
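Since every rerun would otherwise repeat one Bedrock call per document, a simple option while prototyping is to cache the DataFrame on disk (the file name here is just an example):

# Cache the embeddings locally so reruns skip the Bedrock calls
df.to_pickle("recent_events_embeddings.pkl")

# A later session can reload without re-embedding
df = pd.read_pickle("recent_events_embeddings.pkl")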

Find the most similar text using cosine similarity

import json
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# This in-memory approach is just for testing; a vector database would handle this at scale
def find_most_similar_text(df, query_embedding):
    # Convert the DataFrame embeddings into a 2-D array
    embeddings = np.array(df['Embedding'].tolist())

    # Calculate cosine similarity between the query and all embeddings
    similarities = cosine_similarity([query_embedding], embeddings)

    # Find the index of the highest similarity score
    most_similar_idx = np.argmax(similarities)

    # Return the content of the most similar document
    return df.iloc[most_similar_idx]['Content']
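For example (assuming the df built above), a query about horse racing should surface the Kentucky Derby document:

query_embedding = embed_text_from_amazon("horse racing results")
print(find_most_similar_text(df, query_embedding)[:100])  # preview the best match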

LLM with RAG

Convert the question to embeddings, run a local similarity search, and query the LLM:

def answer_question_with_context(df, question_text, max_tokens=512):
    # Get the question embedding
    question_embedding = embed_text_from_amazon(question_text)

    # Perform a similarity search to find the most relevant text
    similar_text = find_most_similar_text(df, question_embedding)

    # Compose the prompt by combining the question with the retrieved context
    prompt = f"Answer the following question: {question_text} in the provided context only. Here is the reference text:\n{similar_text}"

    # Prepare the request body for the model invocation
    body = json.dumps({
        "inputText": prompt,
        "textGenerationConfig": {
            "temperature": 0,
            "topP": 0.01,
            "maxTokenCount": max_tokens
        }
    })

    try:
        # Invoke the Bedrock model with the prepared prompt
        response = bedrock.invoke_model(body=body, modelId="amazon.titan-text-express-v1")
        response_body = json.loads(response.get('body').read())
        print(response_body["results"][0]["outputText"])
    except Exception as e:
        print(f"An error occurred while invoking the model: {e}")

question_text = "Who won the Kentucky Derby in 2024?"
answer_question_with_context(df, question_text)
Mystik Dan
# Let us ask a bit more about the race
question_text = "Can you tell me about the Kentucky Derby in 2024?"
answer_question_with_context(df, question_text)
The 145th running of the Kentucky Derby was held on May 6, 2024, at Churchill Downs in Louisville, Kentucky. The race was won by the 18-1 shot Mystik Dan, ridden by jockey Brian Hernandez Jr. and trained by Kenny McPeek. Mystik Dan's victory marked the first time in the race's history that a horse with odds of 18-1 or higher won. The margin of victory was only a head, with Sierra Leone and Forever Young finishing second and third, respectively. The Kentucky Derby is one of the most prestigious horse races in the world, and its 150th running was a historic event.

Let us ask one more question:

question_text = "Can you tell me about UK Local Elections in 2024?"
answer_question_with_context(df, question_text)
The 2024 United Kingdom local elections took place on 2 May 2024 to choose around 2,600 councillors on 107 councils in England, 11 directly elected mayors in England, the 25 members of the London Assembly, and 37 police and crime commissioners in England and Wales. The 2024 Blackpool South parliamentary by-election was held on the same day.

The results were a strong showing for the Labour Party, who finished first on the expense of the Conservative Party, who finished third. The Liberal Democrats finished second for the first time since 2009.

Summary

  • For testing, we avoided using an actual persistent vector database. For a large text corpus, we might first run a batch process to generate the text embeddings
  • Our text documents were distinct from one another, so we did not have to tune the similarity search
  • We were able to run sample queries using Retrieval Augmented Generation and got the expected results