This article demonstrates the query results of a DocQA application built with LangChain and OpenAI.
There’s been a lot of buzz lately about LLMs and how they can be used for
different applications with OpenAI. In this article, we’ll give you a quick
rundown on how to set up a cool DocQA application using LangChain and OpenAI,
and we’ll even show you some examples of the query results. So, let’s dive in!
The high-level idea is to:
1. Load PDFs from a directory.
2. Read all the text and split it into chunks.
3. Convert each chunk into an embedding.
4. Save the embeddings into ChromaDB.
5. Convert the user’s query input into an embedding.
6. Find the most relevant chunk based on embedding similarity.
7. Output the selected chunk.
The code is pretty simple:
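Below is a minimal sketch of one way to wire these steps together, assuming the classic `langchain` 0.0.x API with `pypdf` and `chromadb` installed and `OPENAI_API_KEY` set in the environment. The `./pdfs` directory, chunk sizes, persistence path, and sample question are illustrative placeholders:

```python
# A rough sketch of the pipeline, assuming the classic langchain 0.0.x API.
from langchain.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA

# Steps 1-2: load every PDF in the directory and split the text into chunks.
documents = PyPDFDirectoryLoader("./pdfs").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100
).split_documents(documents)

# Steps 3-4: embed each chunk with OpenAI and persist the vectors in ChromaDB.
vectordb = Chroma.from_documents(
    chunks, OpenAIEmbeddings(), persist_directory="./chroma_db"
)

# Steps 5-7: embed the query, retrieve the most similar chunks, and let the
# LLM compose an answer from them.
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(temperature=0),
    chain_type="stuff",
    retriever=vectordb.as_retriever(),
)
print(qa.run("Where is the company domiciled and incorporated?"))
```

If you only want to print the most relevant chunk itself (step 7) rather than a composed answer, `vectordb.similarity_search(query, k=1)` returns the top-matching chunk directly.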
Below is the terminal output:
Below are screenshot images that include the answers from the target pages:
Q1-review: The model was unable to extract the correct value from the cash-flow
table. I believe this is because the text in the table is jumbled together.
Q2-review: The model answered that the company is domiciled and incorporated in
Singapore, which is correct. However, I was expecting the model to give me the
full address as the answer.
Q3-review: We got the correct answer this time. There was no dividend for the periods ended 30-Sep-2022 and 30-Sep-2021.
Q4-review: Got the correct value 1.09 from the table.
Q5-review: Got the correct value 1.16 from the table. I think we just got lucky
here, since, as we have seen, text extracted from tables is jumbled together and
does not form proper sentences, unlike the Q3 dividend case.
Q6-review: The model answered correctly, and personally I like this
question-answer pair the most, as the model rephrased the answer instead of
directly extracting or copying the paragraph from the page.