Chat with Multiple PDFs Using an OpenAI LLM and Get Accurate Results

Introduction

Hi folks,
In today's article we are going to solve a specific problem: chatting with an OpenAI LLM over multiple PDF documents and getting accurate responses.

I spent the last 15 days searching for a solution to this problem. I found and tried many approaches, but none of them solved it exactly, so I put together my own solution by reading multiple articles and watching videos.

Problem Statement

Create an OpenAI-based question-answering tool that can answer queries over multiple PDF documents.

Tech Stack

We are going to use Python as the programming language, along with some useful libraries:

OpenAI
Langchain
FastAPI
PyPDF2
python-dotenv
langchain_community
FAISS (faiss-cpu)
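
All of these can be installed with pip. The package names below assume recent LangChain releases (FAISS is installed as faiss-cpu; uvicorn is added so we can serve the FastAPI app later):

pip install fastapi uvicorn langchain langchain-community langchain-openai PyPDF2 python-dotenv faiss-cpu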

Code

Create a directory for the FastAPI app, and inside it create another directory where the PDF files will be stored:

main_dir/
    docs/
    main.py
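
You will also need a .env file inside main_dir so that python-dotenv can load your OpenAI key. The value below is a placeholder, not a real key:

OPENAI_API_KEY=your-openai-api-key-here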

main.py

import os
from fastapi import FastAPI, HTTPException
from dotenv import load_dotenv
from PyPDF2 import PdfReader
from langchain.memory.buffer import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.vectorstores.faiss import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

load_dotenv()

os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY")
# FAISS can crash with a duplicate OpenMP runtime error on some machines; this works around it
os.environ["KMP_DUPLICATE_LIB_OK"] = "True"

app = FastAPI(debug=True, title="Bot API", version="0.0.1")

text_folder = "docs"

embedding_function = OpenAIEmbeddings()

# only pick up PDF files, in case the docs folder contains anything else
pdf_docs = [os.path.join(text_folder, fn) for fn in os.listdir(text_folder) if fn.lower().endswith(".pdf")]

def get_pdf_text(pdf_docs):
    text = ""
    for pdf in pdf_docs:
        pdf_reader = PdfReader(pdf)
        for page in pdf_reader.pages:
            # extract_text() can return None for pages with no extractable text
            text += page.extract_text() or ""
    return text

def get_text_chunks(text):
    text_splitter = CharacterTextSplitter(
        separator="\n",
        chunk_size=1000,
        chunk_overlap=200,
        length_function=len
    )
    chunks = text_splitter.split_text(text)
    return chunks

def get_vectorstore(text_chunks):
    embeddings = OpenAIEmbeddings()
    vectorstore = FAISS.from_texts(texts=text_chunks, embedding=embeddings)
    return vectorstore

def get_qa_chain(vectorstore):
    llm = ChatOpenAI()
    memory = ConversationBufferMemory(
        memory_key="chat_history", return_messages=True)
    conversation_chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=vectorstore.as_retriever(),
        memory=memory
    )
    return conversation_chain, memory

text = get_pdf_text(pdf_docs)
text_chunks = get_text_chunks(text)
vectorstore = get_vectorstore(text_chunks)
qa_chain, memory = get_qa_chain(vectorstore)

@app.get("/ask-query")
async def query(query: str):
    # ConversationalRetrievalChain expects its input under the "question" key
    resp = qa_chain.invoke({"question": query})

    # reset the conversation buffer so it does not grow without bound
    if len(resp["chat_history"]) >= 6:
        memory.clear()

    return {"response": resp}
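
To try it out, run the app with uvicorn and query the endpoint. The port and the sample question below are just illustrative:

uvicorn main:app --reload
curl "http://127.0.0.1:8000/ask-query?query=What%20is%20this%20document%20about"

The response contains the chain's full output, including the answer and the running chat history.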

Conclusion:
This is a basic implementation that does the job for my requirements. Install all the requirements, create a .env file, and add your OpenAI API key so the FastAPI app can load it.
If you still face any issues, let's discuss them in the comment section.
