# IMP301 - Vietnamese Handwritten Image Q&A
This project aims to improve chatbots by letting them ingest knowledge from images in addition to text sources.
## What To Do?
1. Build an OCR pipeline to extract text from images
   1. Criteria
      - The OCR pipeline should accept an image of Vietnamese handwriting and return the corresponding formatted text string
      - The submission includes a model weights file (.pt, .bin, etc.) and the code that reads an image from disk and returns a string
   2. Dataset
      - Cinnamon: https://drive.google.com/drive/folders/1Qa2YA6w6V5MaNV-qxqhsHHoYFRK5JB39
      - VNonDB
   3. Resources
      - Image processing methods for handwritten images: https://medium.com/lumenore/build-an-ocr-system-from-scratch-in-python-69c08e78de2
      - https://pbcquoc.github.io/vietnamese-ocr
   4. Notes
      - Images should be pre-processed with image processing methods: sharpening, thresholding, alignment
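The thresholding and sharpening steps named in the notes can be sketched in plain Python. This is a minimal illustration, not a prescribed implementation: it assumes the image is already a grayscale grid of 0..255 integers, uses Otsu's method as one possible way to pick the threshold, and uses a common 3x3 sharpening kernel. The names `otsu_threshold`, `binarize`, and `sharpen` are hypothetical, and alignment (deskewing) is left out. In practice a library such as OpenCV or Pillow would do this work.

```python
from typing import List

Gray = List[List[int]]  # grayscale image as rows of 0..255 intensities


def otsu_threshold(img: Gray) -> int:
    """Return the threshold that maximises between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]  # number of pixels with intensity <= t
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(img: Gray) -> Gray:
    """Map pixels above the Otsu threshold to 255 and all others to 0."""
    t = otsu_threshold(img)
    return [[255 if p > t else 0 for p in row] for row in img]


def sharpen(img: Gray) -> Gray:
    """Apply the kernel [[0,-1,0],[-1,5,-1],[0,-1,0]]; border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = (5 * img[y][x] - img[y - 1][x] - img[y + 1][x]
                 - img[y][x - 1] - img[y][x + 1])
            out[y][x] = max(0, min(255, v))  # clamp to the valid intensity range
    return out
```

A typical order would be `binarize(sharpen(img))`, though which steps help depends on the dataset and should be checked against OCR accuracy.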
2. Build an embedding system to embed the extracted text into vectors
   1. Criteria
      - The embedding system should accept a text string and return the corresponding embedding vector
      - The submission includes a model weights file (.pt, .bin, etc.) and the code that accepts a string and returns a vector
   2. Resources
   3. Notes
      - Need to understand the different methods for training an embedding model
      - Pre-trained models such as BERT, or PhoBERT for Vietnamese, can be used
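The string-in, vector-out interface from the criteria can be sketched as below. Note the substitution: this uses feature hashing over whitespace tokens as a stand-in, not a trained model such as PhoBERT, so it captures word overlap only and no semantics. The function name `embed` and the default `dim=64` are assumptions chosen for illustration.

```python
import hashlib
import math
from typing import List


def embed(text: str, dim: int = 64) -> List[float]:
    """Map a string to a fixed-length, L2-normalised vector via feature hashing."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dim  # which slot the token lands in
        sign = 1.0 if digest[4] % 2 == 0 else -1.0       # signed hashing reduces collision bias
        vec[index] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    # An empty string (or fully cancelling tokens) yields the zero vector.
    return [v / norm for v in vec] if norm > 0 else vec
```

Keeping this signature stable makes it possible to swap in a pre-trained model later without changing the search or QA code that consumes the vectors.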
3. Build a vector similarity search system to find similar vectors
   1. Criteria
      - The vector similarity search system should accept a vector and return the top_k most similar vectors
   2. Resources
      - https://labelbox.com/blog/how-vector-similarity-search-works/
   3. Notes
      - A pre-built database (qdrant, chroma, etc.) can be used, but you need to understand how it works, e.g. its indexing and search algorithms
      - Implement a custom database if possible
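As a starting point for a custom database, the sketch below shows exact (brute-force) top_k search with cosine similarity. It is a baseline under stated assumptions, not how qdrant or chroma work internally: it scans every stored vector, which costs O(n) per query. The class name `BruteForceIndex` and its methods are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class BruteForceIndex:
    """Stores (key, vector) pairs and ranks all of them for every query."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, List[float]]] = []

    def add(self, key: str, vector: Sequence[float]) -> None:
        self._items.append((key, list(vector)))

    def search(self, query: Sequence[float], top_k: int = 3) -> List[Tuple[str, float]]:
        """Return up to top_k (key, similarity) pairs, most similar first."""
        scored = [(key, cosine(query, vec)) for key, vec in self._items]
        scored.sort(key=lambda kv: kv[1], reverse=True)
        return scored[:top_k]
```

An exact baseline like this is also useful for measuring the recall of any approximate index built later.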
4. Build a QA system with LLMs to interact with the user
   1. Criteria
      - The QA system should accept a query and return an answer
      - When a question is asked, embed it into a vector, find the most similar stored vector, and pass the text behind that vector to the LLM as input
   2. Resources
      - https://medium.com/@murtuza753/using-llama-2-0-faiss-and-langchain-for-question-answering-on-your-own-data-682241488476
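The retrieval flow in the criteria (embed the question, find the closest stored text, hand it to the LLM) can be sketched as follows. Everything here is an assumption for illustration: `embed_fn` and `llm_fn` are hypothetical callables supplied by the caller, the prompt wording is arbitrary, and similarity is a plain dot product, which matches cosine similarity only when the vectors are normalised.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def build_prompt(context: str, question: str) -> str:
    """Combine the retrieved text and the user's question into one LLM prompt."""
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer(
    question: str,
    passages: List[Tuple[str, Vector]],   # (text, embedding) pairs; must be non-empty
    embed_fn: Callable[[str], Vector],    # hypothetical: string -> vector
    llm_fn: Callable[[str], str],         # hypothetical: prompt -> generated answer
) -> str:
    q_vec = embed_fn(question)
    # Pick the single passage whose embedding is most similar to the question.
    best_text, _ = max(passages, key=lambda p: dot(q_vec, p[1]))
    return llm_fn(build_prompt(best_text, question))
```

Retrieving several passages instead of one is a natural extension once the single-passage version works.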
5. Build a simple web app
   1. Criteria
      - The user should be able to upload an image, enter a question, and receive an answer
   2. Resources
      - Streamlit/Gradio
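The upload, ask, answer flow could look roughly like the sketch below, which uses Gradio as one of the two listed options. The helpers `handle_request` and `build_app`, and the `ocr_fn` and `qa_fn` callables they receive, are hypothetical placeholders for the OCR pipeline and QA system built in the earlier steps. Gradio is imported inside `build_app` so the request logic can be tested without it installed.

```python
from typing import Callable, Optional


def handle_request(
    image_path: Optional[str],
    question: str,
    ocr_fn: Callable[[str], str],        # hypothetical: image path -> extracted text
    qa_fn: Callable[[str, str], str],    # hypothetical: (text, question) -> answer
) -> str:
    """Run OCR on the uploaded image, then answer the question from its text."""
    if not image_path:
        return "Please upload an image."
    if not question.strip():
        return "Please enter a question."
    text = ocr_fn(image_path)
    return qa_fn(text, question)


def build_app(ocr_fn: Callable[[str], str], qa_fn: Callable[[str, str], str]):
    """Wrap handle_request in a Gradio interface with an image and a text input."""
    import gradio as gr  # lazy import: only needed when the UI is actually built

    return gr.Interface(
        fn=lambda image_path, question: handle_request(image_path, question, ocr_fn, qa_fn),
        inputs=[
            gr.Image(type="filepath", label="Handwritten image"),
            gr.Textbox(label="Question"),
        ],
        outputs=gr.Textbox(label="Answer"),
    )
```

Calling `build_app(my_ocr, my_qa).launch()` would then start a local server, where `my_ocr` and `my_qa` stand for the project's own functions.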
## Timeline
### 14 Sep - 17 Sep
1. Learn/revise OCR and basic image processing techniques
   - How do you manipulate images in Python, e.g. read/write an image, convert it to a numpy array, convert PNG to JPEG and vice versa, change brightness and contrast?
   - Which techniques can enhance image quality for DL models, e.g. sharpening, thresholding, text alignment?
   - Which types of DL models can solve the OCR problem, e.g. which architecture, how many layers, and why?
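For the brightness and contrast question above, the underlying arithmetic is a per-pixel linear transform. The sketch below shows one common formulation, scaling around mid-gray (128) and then adding an offset. It assumes a grayscale image stored as nested lists of 0..255 integers, and the function name `adjust` is hypothetical. Libraries such as Pillow or OpenCV provide equivalent operations on real image files.

```python
from typing import List

Gray = List[List[int]]  # grayscale image as rows of 0..255 intensities


def adjust(img: Gray, contrast: float = 1.0, brightness: int = 0) -> Gray:
    """Scale each pixel around mid-gray by `contrast`, then add `brightness`."""

    def clamp(v: float) -> int:
        # Keep results inside the valid 8-bit range.
        return max(0, min(255, int(round(v))))

    return [
        [clamp(contrast * (p - 128) + 128 + brightness) for p in row]
        for row in img
    ]
```

With `contrast > 1`, values move away from mid-gray, which can make faint pen strokes stand out before thresholding.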
2. Learn/revise embeddings in NLP
   - What is an embedding, and why is it needed?
   - What are the methods for embedding a string into a vector?
3. Learn/revise vector similarity search
   - What is vector similarity search? Why use it? How does it compare with other kinds of search, e.g. full-text search?
   - What are the related algorithms?
   - How are the vectors stored?
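As one example of a related algorithm, the sketch below shows random-hyperplane locality-sensitive hashing (LSH), an approximate indexing technique for cosine similarity. It is an illustration chosen here, not something the project requires: each vector is hashed to a bit signature by the side of each random hyperplane it falls on, and vectors with the same signature share a bucket. The class name `RandomHyperplaneLSH` and its parameters are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


class RandomHyperplaneLSH:
    """Buckets vectors by the sign pattern of their projections onto random hyperplanes."""

    def __init__(self, dim: int, n_planes: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)  # seeded so signatures are reproducible
        self.planes = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)
        ]
        self.buckets: Dict[Tuple[int, ...], List[str]] = defaultdict(list)

    def signature(self, vec: Sequence[float]) -> Tuple[int, ...]:
        """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
        return tuple(
            1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
            for plane in self.planes
        )

    def add(self, key: str, vec: Sequence[float]) -> None:
        self.buckets[self.signature(vec)].append(key)

    def candidates(self, query: Sequence[float]) -> List[str]:
        """Keys whose vectors share the query's bucket; may miss true neighbours."""
        return list(self.buckets.get(self.signature(query), []))
```

Only the returned candidates then need an exact similarity comparison, which trades some recall for speed. More hyperplanes give smaller buckets and faster queries but more missed neighbours.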