# IMP301 - Vietnamese Handwritten Image Q&A

This project aims to improve chatbots by enabling them to ingest knowledge from images in addition to text sources.

## What To Do?

1. Build an OCR pipeline to extract text from images
    1. Criteria
        - The OCR pipeline should accept a Vietnamese handwritten image and return the corresponding formatted text string
        - Submission includes a model weights file (.pt, .bin, etc.) and the code that reads an image from disk and returns a string
    2. Dataset
        - Cinnamon: https://drive.google.com/drive/folders/1Qa2YA6w6V5MaNV-qxqhsHHoYFRK5JB39
        - VNonDB
    3. Resources
        - Image processing methods for handwritten images: https://medium.com/lumenore/build-an-ocr-system-from-scratch-in-python-69c08e78de2
        - https://pbcquoc.github.io/vietnamese-ocr
    4. Notes
        - Images should be pre-processed with image processing methods: sharpening, thresholding, alignment
2. Build an embedding system to embed extracted text into vectors
    1. Criteria
        - The embedding system should accept a text string and return the corresponding embedding vector
        - Submission includes a model weights file (.pt, .bin, etc.) and the code that accepts a string and returns a vector
    2. Resources
    3. Notes
        - Need to understand the different training methods
        - Can use pre-trained models such as BERT, or PhoBERT for Vietnamese
3. Build a vector similarity search system to find similar vectors
    1. Criteria
        - The vector similarity search system should accept a vector and return the top_k most similar vectors
    2. Resources
        - https://labelbox.com/blog/how-vector-similarity-search-works/
    3. Notes
        - Can use a pre-built DB (Qdrant, Chroma, etc.), but need to understand how it works, e.g. indexing algorithms, search algorithms, etc.
        - Implement a custom DB if possible
4. Build a QA system with LLMs to interact with the user
    1. Criteria
        - The QA system should accept a query and return an answer
        - When a question is asked, embed it into a vector, find the most similar vector(s), then feed the text associated with them into the LLM
    2. Resources
        - https://medium.com/@murtuza753/using-llama-2-0-faiss-and-langchain-for-question-answering-on-your-own-data-682241488476
5. Build a simple web app
    1. Criteria
        - The user should be able to upload an image, enter a question, and receive an answer
    2. Resources
        - Streamlit/Gradio

## Timeline

### 14 Sep - 17 Sep

1. Learn/revise OCR and basic image processing techniques
    - How to manipulate images in Python, e.g. read/write images, convert them to NumPy arrays, convert PNG to JPEG and vice versa, change brightness, contrast, etc.?
    - Which techniques can enhance image quality for DL models, e.g. sharpening, thresholding, text alignment, etc.?
    - Which types of DL models can be applied to the OCR problem, e.g. what architecture, how many layers, and why?
2. Learn/revise embeddings in NLP
    - What is an embedding, and why do we need embeddings?
    - What methods exist to embed a string into a vector?
3. Learn/revise vector similarity search
    - What is vector similarity search? Why use vector search, and how does it compare with other kinds of search, e.g. full-text search?
    - What are the related algorithms?
    - How should the vectors be stored?
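To make the vector similarity search questions above concrete, here is a minimal sketch of exact (brute-force) top_k search in plain Python. It assumes cosine similarity as the metric, which is a common choice but not one this project prescribes, and the names `cosine_similarity` and `top_k` are hypothetical. Every stored vector is compared against the query, so each search costs O(N·d) for N vectors of dimension d. Pre-built DBs such as Qdrant avoid this full scan with approximate nearest-neighbour indexes such as HNSW.

```python
import heapq
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], vectors: list[list[float]], k: int) -> list[tuple[int, float]]:
    """Exact search: score every stored vector, return (index, score) for the k best."""
    scores = [cosine_similarity(query, v) for v in vectors]
    # nlargest keeps a size-k heap, so selection is O(N log k) after scoring.
    best = heapq.nlargest(k, range(len(vectors)), key=scores.__getitem__)
    return [(i, scores[i]) for i in best]


if __name__ == "__main__":
    stored = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(top_k([1.0, 0.1], stored, k=2))
```

In the QA system, the returned indices would map back to the OCR text chunks whose embeddings were stored, and that text is what gets passed to the LLM.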
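For the thresholding step listed in the OCR notes, one standard global technique is Otsu's method. It picks the threshold that maximizes the between-class variance of the grayscale histogram. The sketch below is in plain Python on an image stored as a list of rows of 0-255 ints. The choice of Otsu's method and the helper names `otsu_threshold` and `binarize` are assumptions for illustration. With OpenCV, the usual equivalent is `cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`.

```python
def otsu_threshold(gray: list[list[int]]) -> int:
    """Return the threshold t (pixels <= t form class 0) that maximizes
    the between-class variance of the 0-255 grayscale histogram."""
    hist = [0] * 256
    for row in gray:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # pixel count in class 0 (values <= t)
    sum0 = 0    # sum of pixel values in class 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so the split is meaningless
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to the constant factor 1 / total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray: list[list[int]], t: int) -> list[list[int]]:
    """Dark ink (<= t) becomes 0, light paper (> t) becomes 255."""
    return [[255 if p > t else 0 for p in row] for row in gray]


if __name__ == "__main__":
    img = [[12, 15, 200, 210], [10, 220, 205, 14]]
    t = otsu_threshold(img)
    print(t, binarize(img, t))
```

A single global threshold can struggle with uneven lighting on photographed handwriting. In that case, OpenCV's `cv2.adaptiveThreshold`, which computes a threshold per local neighbourhood, is worth comparing.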