Chat with Multiple PDFs Using an OpenAI LLM and Get Accurate Results

Introduction

Hi folks,
In today's article we are going to solve a specific problem: chatting with an OpenAI LLM over multiple PDF documents and getting accurate responses.

I spent the last 15 days searching for a solution to this problem. I found and tried many approaches, but none of them solved it exactly, so I put together my own solution by reading multiple articles and watching videos.

Problem Statement

Create an OpenAI-based question-answering tool that can answer queries over multiple PDF documents.

Tech Stack

We are going to use Python as the programming language, along with some useful libraries:

OpenAI
Langchain
FastAPI
PyPDF2
python-dotenv
langchain_community
FAISS (faiss-cpu)
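
All of these can be installed with pip. The package names below assume recent LangChain releases (FAISS is installed as faiss-cpu; uvicorn is added so we can serve the FastAPI app later):

pip install fastapi uvicorn langchain langchain-community langchain-openai PyPDF2 python-dotenv faiss-cpu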

Code

Create a directory for the FastAPI app, and inside it create another directory where the PDF files will be stored:

main_dir/
    docs/
    main.py
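
You will also need a .env file inside main_dir so that python-dotenv can load your OpenAI key. The value below is a placeholder, not a real key:

OPENAI_API_KEY=your-openai-api-key-here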

main.py

import os
from fastapi import FastAPI, HTTPException
from dotenv import load_dotenv
from PyPDF2 import PdfReader
from langchain.memory.buffer import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.vectorstores.faiss import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

load_dotenv()

os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY")
# FAISS can crash with a duplicate OpenMP runtime error on some machines; this works around it
os.environ["KMP_DUPLICATE_LIB_OK"] = "True"

app = FastAPI(debug=True, title="Bot API", version="0.0.1")

text_folder = "docs"

embedding_function = OpenAIEmbeddings()

# only pick up PDF files, in case the docs folder contains anything else
pdf_docs = [os.path.join(text_folder, fn) for fn in os.listdir(text_folder) if fn.lower().endswith(".pdf")]

def get_pdf_text(pdf_docs):
    text = ""
    for pdf in pdf_docs:
        pdf_reader = PdfReader(pdf)
        for page in pdf_reader.pages:
            # extract_text() can return None for pages with no extractable text
            text += page.extract_text() or ""
    return text

def get_text_chunks(text):
    text_splitter = CharacterTextSplitter(
        separator="\n",
        chunk_size=1000,
        chunk_overlap=200,
        length_function=len
    )
    chunks = text_splitter.split_text(text)
    return chunks

def get_vectorstore(text_chunks):
    embeddings = OpenAIEmbeddings()
    vectorstore = FAISS.from_texts(texts=text_chunks, embedding=embeddings)
    return vectorstore

def get_qa_chain(vectorstore):
    llm = ChatOpenAI()
    memory = ConversationBufferMemory(
        memory_key="chat_history", return_messages=True)
    conversation_chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=vectorstore.as_retriever(),
        memory=memory
    )
    return conversation_chain, memory

text = get_pdf_text(pdf_docs)
text_chunks = get_text_chunks(text)
vectorstore = get_vectorstore(text_chunks)
qa_chain, memory = get_qa_chain(vectorstore)

@app.get("/ask-query")
async def query(query: str):
    # ConversationalRetrievalChain expects its input under the "question" key
    resp = qa_chain.invoke({"question": query})

    # reset the conversation buffer so it does not grow without bound
    if len(resp["chat_history"]) >= 6:
        memory.clear()

    return {"response": resp}
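
To try it out, run the app with uvicorn and query the endpoint. The port and the sample question below are just illustrative:

uvicorn main:app --reload
curl "http://127.0.0.1:8000/ask-query?query=What%20is%20this%20document%20about"

The response contains the chain's full output, including the answer and the running chat history.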

Conclusion:
This is a basic implementation that does the job for my requirements. Install all the requirements, create a .env file, and add your OpenAI API key so the FastAPI app can load it.
If you still face any issues, let's discuss them in the comment section.
