Using a Multimodal Document ML Model to Query Your Documents | by Eivind Kjosbakken | Apr, 2024

Leverage the power of the mPLUG-Owl document understanding model to ask questions about your documents

This article discusses Alibaba's document understanding model, which was recently released together with its model weights and datasets. It is a powerful model capable of tasks such as document question answering, information extraction, and document embedding, making it a helpful tool when working with documents. I will run the model locally and test it on different tasks to give my opinion on its performance and usefulness.
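To make the goal concrete, here is a minimal sketch of the kind of interaction the article builds toward: handing a document image and a natural-language question to a multimodal model. The pipeline task and placeholder model id below are illustrative assumptions, not this model's confirmed interface; the model covered here ships its own loading and inference code, which the "Running the model locally" section walks through.

```python
# Illustrative sketch only: document question answering via a generic
# Hugging Face visual-question-answering pipeline. The model id is a
# placeholder (assumption), not the checkpoint discussed in this article.
from transformers import pipeline
from PIL import Image

vqa = pipeline(
    "visual-question-answering",
    model="your-org/your-document-vqa-model",  # placeholder checkpoint
)

image = Image.open("receipt.png")  # a scanned receipt or any document image
result = vqa(image=image, question="What is the total amount on this receipt?")
print(result)
```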

This article discusses one of the latest models in document understanding. Image by ChatGPT. OpenAI. (2024). ChatGPT (4) [Large language model]. https://chat.openai.com

· Motivation
· Tasks
· Running the model locally
· Testing of the model
∘ Data
∘ Testing the first, leftmost receipt
∘ Testing the second, rightmost receipt
∘ Testing the first, leftmost lecture note
∘ Testing the second, rightmost lecture note
· My thoughts on the model
· Conclusion

My motivation for this article is to test out the latest machine-learning models that are publicly available. This model caught my attention since I have worked, and am still working, on machine learning applied to documents. I have also previously written an article about my work with a similar model called Donut, which performs OCR-free document understanding. I find the concept of taking a document and asking visual and textual questions about it compelling, so I spend time working with document understanding models and testing their performance. This is the second article in my series on testing the latest machine-learning models; you can read my first article, on time series forecasting with Chronos, below:
