Using a Knowledge Base to Connect Amazon Bedrock to your Custom Data

Here’s a step-by-step process for using Knowledge Bases with Amazon Bedrock to easily customize Bedrock with your own data. The code and commands used can be found in this GitHub repository: https://github.com/fayekins/bedrock-rag-demo/

The architecture looks like this:

Architecture Components

A SageMaker notebook is used as the IDE (Integrated Development Environment) to run all the commands and code that interact with Bedrock.
The custom data is uploaded to an S3 bucket. You can manually upload any text-based data that you want to use.
Amazon OpenSearch Serverless (AOSS) is used to create a vector index/data store from the S3 data.
The Bedrock Knowledge Base is configured to use the AOSS vector index as its data store, and will answer prompts based on the provided data.

Prerequisites

1) Do everything in the us-west-2 region.
2) In your AWS account, request access to the Bedrock models that you would like to use. You’ll find this in the Bedrock console, under Model access. (For this, I enabled all the Amazon Titan and Claude models.)

3) Create a SageMaker notebook that we’ll use to run the commands from.

4) The IAM role associated with your notebook instance will need permissions for a few services, so after the instance was created, I updated the associated IAM role with the following:

S3 full access
Access to make API calls to Bedrock
IAM access to create IAM roles
Lambda full access
Amazon OpenSearch Serverless full access

Custom policy enabling Amazon OpenSearch Serverless access:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "aoss:*",
      "Resource": "*"
    }
  ]
}

After everything is set up, the permissions can be tightened up.
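If you’d rather attach that custom policy from code than from the console, a minimal boto3 sketch could look like this (the role and policy names are hypothetical; substitute your notebook’s role):

import json
import boto3

iam = boto3.client("iam")

aoss_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "aoss:*", "Resource": "*"}
    ],
}

# Attach the AOSS policy inline to the notebook's IAM role.
iam.put_role_policy(
    RoleName="MySageMakerNotebookRole",  # hypothetical role name
    PolicyName="aoss-full-access",
    PolicyDocument=json.dumps(aoss_policy),
)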

5) After the notebook is ready, select the notebook instance and select Open JupyterLab. Then clone the following Git repository to download the .ipynb file with all the commands to use: https://github.com/fayekins/bedrock-rag-demo/

6) From the cloned repository, open the following file: bedrock_rag_demo.ipynb

7) Run all the cells contained in the .ipynb file, which at a high level will do the following:

Install the required libraries: boto3, the AWS SDK for Python that interacts with Bedrock, and opensearch-py, the Python client used to interact with OpenSearch.
Create an S3 bucket to store our custom data, then manually upload the custom data; I just used a PDF containing some text. (See the first sketch after this list.)
Create the OpenSearch Serverless collection, which is a container for OpenSearch indexes. (See the second sketch below.)
Create the OpenSearch Serverless vector index. This will contain the vector embeddings, numerical representations of your data, so that the LLM can make sense of your data and understand the meaning it contains.
Configure the Bedrock Knowledge Base using the OpenSearch Serverless vector index, with S3 as the data source. (See the third sketch below.)
Ingest the data into the Knowledge Base.
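For reference, here’s a minimal sketch of the setup and S3 steps using boto3 (the bucket and file names are hypothetical placeholders):

import boto3

REGION = "us-west-2"
BUCKET = "my-bedrock-kb-data"  # hypothetical bucket name

# Create the S3 bucket that will hold the custom data.
s3 = boto3.client("s3", region_name=REGION)
s3.create_bucket(
    Bucket=BUCKET,
    CreateBucketConfiguration={"LocationConstraint": REGION},
)

# Upload the custom data (here, a placeholder PDF).
s3.upload_file("my-custom-data.pdf", BUCKET, "my-custom-data.pdf")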
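Creating the collection and vector index could look roughly like the following. Note that AOSS also needs network and data-access policies (omitted here for brevity), the collection name, index name, and field names are assumptions, and the dimension of 1536 matches the Amazon Titan Embeddings G1 - Text model:

import json
import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth

REGION = "us-west-2"
COLLECTION = "bedrock-kb-collection"  # hypothetical name

aoss = boto3.client("opensearchserverless", region_name=REGION)

# AOSS requires an encryption policy before a collection can be created.
aoss.create_security_policy(
    name="bedrock-kb-encryption",
    type="encryption",
    policy=json.dumps({
        "Rules": [{"ResourceType": "collection",
                   "Resource": [f"collection/{COLLECTION}"]}],
        "AWSOwnedKey": True,
    }),
)

# Create the vector search collection.
aoss.create_collection(name=COLLECTION, type="VECTORSEARCH")

# Once the collection is ACTIVE, create the vector index with opensearch-py.
credentials = boto3.Session().get_credentials()
auth = AWSV4SignerAuth(credentials, REGION, "aoss")
host = "<collection-endpoint>"  # e.g. abc123.us-west-2.aoss.amazonaws.com

client = OpenSearch(
    hosts=[{"host": host, "port": 443}],
    http_auth=auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
)
client.indices.create(
    index="bedrock-kb-index",
    body={
        "settings": {"index": {"knn": True}},
        "mappings": {"properties": {
            "vector": {"type": "knn_vector", "dimension": 1536,
                       "method": {"name": "hnsw", "engine": "faiss"}},
            "text": {"type": "text"},
            "metadata": {"type": "text"},
        }},
    },
)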
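And a sketch of the Knowledge Base configuration and ingestion via the bedrock-agent API (the role ARN, collection ARN, and resource names are placeholders to substitute with your own):

import boto3

bedrock_agent = boto3.client("bedrock-agent", region_name="us-west-2")

# Create the Knowledge Base backed by the AOSS vector index.
kb = bedrock_agent.create_knowledge_base(
    name="my-custom-kb",  # hypothetical name
    roleArn="<knowledge-base-role-arn>",
    knowledgeBaseConfiguration={
        "type": "VECTOR",
        "vectorKnowledgeBaseConfiguration": {
            "embeddingModelArn": "arn:aws:bedrock:us-west-2::"
                                 "foundation-model/amazon.titan-embed-text-v1",
        },
    },
    storageConfiguration={
        "type": "OPENSEARCH_SERVERLESS",
        "opensearchServerlessConfiguration": {
            "collectionArn": "<collection-arn>",
            "vectorIndexName": "bedrock-kb-index",
            "fieldMapping": {
                "vectorField": "vector",
                "textField": "text",
                "metadataField": "metadata",
            },
        },
    },
)
kb_id = kb["knowledgeBase"]["knowledgeBaseId"]

# Point the Knowledge Base at the S3 bucket, then ingest the data.
ds = bedrock_agent.create_data_source(
    knowledgeBaseId=kb_id,
    name="s3-data-source",
    dataSourceConfiguration={
        "type": "S3",
        "s3Configuration": {"bucketArn": "arn:aws:s3:::my-bedrock-kb-data"},
    },
)
bedrock_agent.start_ingestion_job(
    knowledgeBaseId=kb_id,
    dataSourceId=ds["dataSource"]["dataSourceId"],
)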

Testing

8) Run some prompts to test that the LLM is using the Knowledge Base to answer. Try a prompt whose answer we know it won’t find in the custom data; if everything is working properly, the model should respond that it doesn’t know.
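A quick way to test from the notebook is the retrieve_and_generate API; the Knowledge Base ID below is a placeholder, and any enabled text model can be used in place of the Claude ARN shown:

import boto3

runtime = boto3.client("bedrock-agent-runtime", region_name="us-west-2")

response = runtime.retrieve_and_generate(
    input={"text": "What does the document say about <some topic>?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "<knowledge-base-id>",
            "modelArn": "arn:aws:bedrock:us-west-2::"
                        "foundation-model/anthropic.claude-v2",
        },
    },
)
print(response["output"]["text"])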

So this is a great way to avoid the dreaded hallucinations, to help improve accuracy, and to control the data that is being used by the model!

The code I used can be found at https://github.com/fayekins/bedrock-rag-demo/.
