---
tags: computer science, large language model, rag
---
<center>
# Chatbots Lab
</center>
<div style="background-color:rgba(0, 0, 0, 0.10); text-align: center">
Location: Room 223A, 德田館<br>
Time: 19:00 ~ 21:00, Mon. and Thu.
</div>
<br>
<div style="text-align: right">``AI is not here to replace us; <br>it’s here to amplify us.''<br>-- Mustafa Suleyman<br>
</div>
<br>
<div style="text-align: right">``Those who guard their lips preserve their lives,<br>but those who speak rashly will come to ruin.''
<br>
-- Proverbs 13:3 (NIV)
</div>
<br>
## Course Information
### Instructor
:::info
- Name: 盧政良 (Arthur)
- Email: arthurzllu@gmail.com
:::
### Objectives
- Understand the fundamentals of Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG).
- Build QA chatbots leveraging RAG to answer questions based on specialized knowledge bases.
- Deploy the QA chatbot to a cloud server (e.g., GCP) with a simple web UI.
- Develop evaluation mechanisms to assess the chatbot's performance and optimize its responses.
### Prerequisites
- Proficiency in Python programming.
- You may refer to my crash course for Python: [APCS Python 101](https://hackmd.io/@arthurzllu/HJNXq84SO).
### Tech Stack
- Python fundamentals
- Basic operations on [Google Colab](https://colab.research.google.com/) for developing chatbots
- LLM APIs
- Vector database: ChromaDB
- LangChain framework: RetrievalQA
- Web app using Gradio, Streamlit, Flask
- Evaluation frameworks: LangSmith, RAGAS
- Deployment as cloud service: GCP (or Azure, AWS)
- Virtualization using Docker
- LLMOps
### Recording Classroom Lectures Policy
:::warning
Recording of classroom lectures is <font color="red">**prohibited**</font> unless advance written permission is obtained from the class instructor and any guest presenter(s).
:::
## Syllabus
### Day 0 Introduction <font size =-1><a href = "https://docs.google.com/presentation/d/1EsdL3TMqfRMKqmZ5b4Iasq6UTC-ULCMd0Wc4ej_L4_U/edit?usp=sharing">slides</a></font>
- Overview of LLMs and RAG
    - What are LLMs? How do they work?
    - Introduction to RAG: Retrieval + Generation synergy
    - Use cases for QA chatbots
- Setting Up the Development Environment
    - Configure Google Colab and a local Python environment
    - Install and test libraries
    - Convert your knowledge base (e.g., PDF) into text for processing
- Hands-on: Run a pre-trained LLM to generate text (see the sketch below)
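A minimal sketch of this hands-on, assuming the Hugging Face `transformers` library is installed in Colab (the model name `gpt2` is only an illustrative lightweight choice):

```python
# Generate text with a small pre-trained causal LM (runs in Google Colab).
# Assumes: pip install transformers torch
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # any small LM works

result = generator(
    "Retrieval-Augmented Generation is",
    max_new_tokens=40,       # cap the length of the continuation
    num_return_sequences=1,
)
print(result[0]["generated_text"])
```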
### Day 1 Building the RAG System <font size =-1><a href = "https://docs.google.com/presentation/d/1aU5h6D7b7Tl7PqEv_MpJoYvZ1Tjabw-pzW6O9E8pRH0/edit?usp=sharing">slides</a></font>
- Vectorizing the Knowledge Base
    - Convert text to embedding vectors using embedding models
    - Set up a vector database (e.g., ChromaDB) for retrieval
    - Hands-on: Retrieve relevant documents for sample queries
- Integrating LLM for Answer Generation
    - Combine retrieved documents with LLM to generate answers
    - Hands-on: Build a simple RAG pipeline (see the sketch below)
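To make the pipeline concrete, a minimal end-to-end sketch using ChromaDB's built-in default embedding model; the collection name, toy documents, and prompt template are illustrative placeholders:

```python
# Tiny RAG pipeline: index -> retrieve -> assemble a grounded prompt.
# Assumes: pip install chromadb
import chromadb

client = chromadb.Client()  # in-memory; use PersistentClient(...) to keep data
collection = client.create_collection("course_kb")  # illustrative name

# Index a toy knowledge base; Chroma embeds documents with its default model.
collection.add(
    documents=[
        "The lab meets in Room 223A from 19:00 to 21:00.",
        "Day 3 covers deploying the chatbot to GCP with Docker.",
    ],
    ids=["doc1", "doc2"],
)

# Retrieve the most relevant chunk for a user question.
question = "Where does the lab meet?"
hits = collection.query(query_texts=[question], n_results=1)
context = hits["documents"][0][0]

# Combine the retrieved context with the question into a prompt for the LLM.
prompt = f"Answer using only the context.\nContext: {context}\nQuestion: {question}"
print(prompt)  # feed this prompt to the LLM of your choice
```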
### Day 2 Evaluation and Optimization <font size =-1><a href = "https://docs.google.com/presentation/d/1thU5fzYlp3B9-G9L-jtd3Sd4uqv9RhQ5FmmDaGzieig/edit?usp=sharing">slides</a></font>
- Evaluating Chatbot Performance
    - Automated metrics: BLEU, ROUGE, Precision/Recall
    - Manual evaluation: Design a scoring rubric (accuracy, fluency, relevance)
    - Hands-on: Evaluate your chatbot with a test set (see the metric sketch after this list)
- Optimizing the RAG System
    - Improve retrieval (e.g., tweak the embedding model)
    - Enhance generation (e.g., refine prompts)
    - Hands-on: Optimize based on evaluation results
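As one concrete automated metric, a sketch using the `rouge-score` package (the reference and candidate answers are made up for illustration):

```python
# Score a chatbot answer against a reference answer with ROUGE.
# Assumes: pip install rouge-score
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)

reference = "The lab meets in Room 223A at 19:00."                  # ground truth
candidate = "Lab sessions are held in Room 223A starting at 7 pm."  # bot output

scores = scorer.score(reference, candidate)
for name, s in scores.items():
    print(f"{name}: P={s.precision:.2f} R={s.recall:.2f} F1={s.fmeasure:.2f}")
```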
### Day 3 Deploying to the Cloud <font size =-1><a href = "https://docs.google.com/presentation/d/1pVBALToH_idaUwZcI43he5LFBmJMdbkU6REYhg8cQlY/edit?usp=sharing">slides</a></font>
- Google Cloud Platform (GCP)
    - Create a new project with a virtual machine (and add a firewall rule for the service)
    - Bash: basic commands (ssh, cp, mkdir, mv, apt, and so on)
- Docker Techniques
    - Package the app with a Dockerfile: build an image and run a container
- Deploying the Chatbot
    - Hands-on: Test your chatbot with sample queries (see the sketch below)
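To tie the deployment steps together, a minimal Gradio app sketch that can be packaged into a container; binding to 0.0.0.0 on a fixed port is what lets the GCP firewall rule expose it (the answering logic is a placeholder for your RAG pipeline):

```python
# app.py -- minimal web UI for the chatbot, ready to be containerized.
# Assumes: pip install gradio
# Illustrative container workflow:
#   docker build -t chatbot .
#   docker run -p 7860:7860 chatbot
import gradio as gr

def respond(message, history):
    # Placeholder: call your RAG pipeline here and return its answer.
    return f"You asked: {message}"

demo = gr.ChatInterface(fn=respond)

# 0.0.0.0 exposes the app outside the container/VM; match the firewall port.
demo.launch(server_name="0.0.0.0", server_port=7860)
```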
### Day 4 Demonstration <font size =-1><a href = "https://docs.google.com/presentation/d/1UI2T5h4MjKzHJKeJU0nsmRNzlWdJYboJFyqHKie1NRA/edit?usp=sharing">slides</a></font>
- Workshop for students' chatbots
- Future work
## References
### Courses
- Hung-yi Lee, [Introduction to Generative AI](https://speech.ee.ntu.edu.tw/~hylee/genai/2024-spring.php), Department of Electrical Engineering, National Taiwan University, Spring 2024
- Jerry Liu and Anupam Datta, [Building and Evaluating Advanced RAG Applications](https://www.deeplearning.ai/short-courses/building-evaluating-advanced-rag/), [Deeplearning.AI](https://www.deeplearning.ai/)
### LLM
- https://ollama.com/
- https://huggingface.co/models?other=LLM
### Frameworks
- [LlamaIndex](https://docs.llamaindex.ai/en/stable/)
- [LangChain](https://python.langchain.com/docs/introduction/)
- [LangSmith](https://docs.smith.langchain.com/)
- [LangGraph](https://www.langchain.com/langgraph)
- [RAGAS](https://docs.ragas.io/en/stable/)
### Embedding Models
- https://huggingface.co/models?library=sentence-transformers
### Vector Databases
- [Chroma](https://www.trychroma.com/)
### Cloud Services
- [Google Cloud Documentation](https://cloud.google.com/docs)
### MISC
- [Setting up your own cloud server to give students space to build chatbots](https://grok.com/share/bGVnYWN5_fff8a0cc-5438-4aa2-aa52-0117b6d7a27a)
- [ChatGPT API Pricing Explained: How to Choose a Model and Optimize Costs](https://www.newspiggy.com/post/chatgpt-api-pricing)
- https://www.youtube.com/watch?v=EWvNQjAaOHw
## Gradebook
<iframe src="https://docs.google.com/spreadsheets/d/e/2PACX-1vTXBtOmF8YfJXR0xndH2yJxTDQqeu4QqzwjbC1iNgIxuMQIC84G_6g8c-NlPxJmESUTexG3snysPhWz/pubhtml?gid=384028235&single=true&widget=true&headers=false" height="400"></iframe>
<iframe src="https://docs.google.com/spreadsheets/d/e/2PACX-1vTXBtOmF8YfJXR0xndH2yJxTDQqeu4QqzwjbC1iNgIxuMQIC84G_6g8c-NlPxJmESUTexG3snysPhWz/pubhtml?gid=0&single=true&widget=true&headers=false" height="400"></iframe>