# Building an AI-Powered Technical Question Answering System with Python
## A Promise Fulfilled
Some weeks ago, I started the GenAI program sponsored by Andela and tutored by Ed Donner, and I promised to share everything I learned along the way. Well, I'm here to keep that promise!
This article is the first in a series where I'll be sharing practical projects from my journey in Generative AI. Today, we're building an AI-powered assistant called `🧞 Genie` that answers coding questions right in your terminal or web browser.
Let's dive in! 🚀
## What We're Building
A smart AI assistant that:
- 🤖 Answers technical programming questions with code examples
- 🌊 Streams responses in real-time (like ChatGPT)
- 🎨 Displays beautiful markdown formatting in the terminal
- 🌐 Has an optional web interface
- 🔄 Works with OpenAI's GPT models OR local Ollama models
**Two Ways to Use:**
1. **Terminal Interface**: Fast for developers
2. **Web Interface**: User-friendly for everyone
## Project Setup
### Installing UV
`uv` is a blazing-fast Python package manager. It's faster than pip and handles virtual environments automatically.
**Windows (PowerShell):**
```powershell
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
```
**macOS/Linux:**
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
Verify:
```bash
uv --version
```
### Creating Your Project
```bash
mkdir llm-answer-tech-ques
cd llm-answer-tech-ques
uv init
```
### Installing Dependencies
Create `requirements.txt`:
```txt
python-dotenv
openai
gradio
rich
```
**What each package does:**
- **python-dotenv**: Loads API keys securely from `.env` file
- **openai**: Official SDK for OpenAI (also works with Ollama)
- **gradio**: Creates web interfaces with minimal code
- **rich**: Beautiful terminal output with markdown rendering
Install all dependencies with `uv`:
```bash
uv add -r requirements.txt
```
This command adds the packages to your `pyproject.toml`, automatically creates a virtual environment, and installs everything!
### Environment Variables
Create `.env` file:
```bash
OPENAI_API_KEY=your_api_key_here
```
In your `.gitignore` (create it if it doesn't exist), add:
```txt
.env
*.pyc
```
## The Complete Code
Before we break it down, here's the complete `main.py` file so you can see everything together:
```python
import os
import sys
from dotenv import load_dotenv
from rich.console import Console
from rich.live import Live
from rich.markdown import Markdown
from rich.text import Text
from openai import OpenAI
import gradio as gr

load_dotenv(override=True)
api_key: str = os.getenv('OPENAI_API_KEY')
console = Console()

# Command-line flags: --ollama switches to a local model, --gradio launches the web UI
USE_OLLAMA = "--ollama" in sys.argv
USE_GRADIO = "--gradio" in sys.argv


class SolveTechnicalQuestions:
    _system_prompt = ""

    def __init__(self, model: str = "gpt-4o-mini") -> None:
        self.openai_client = OpenAI()
        self._MODEL = model

    def get_user_technical_question_prompt(self, question: str):
        prompt = f"""
Answer this technical question comprehensively:

Provide:
1. A clear, accurate answer
2. Code examples if relevant
3. Best practices and recommendations
4. Potential pitfalls or considerations
5. Additional resources or references if helpful

Format your response in a structured, easy-to-read manner.

Question: {question}
"""
        return prompt

    def set_system_prompt(self, system_prompt: str) -> None:
        self._system_prompt = system_prompt

    def set_endpoint(self, endpoint: str, api_key: str = "ollama") -> None:
        self.openai_client = OpenAI(base_url=endpoint, api_key=api_key)

    def set_model(self, model: str) -> None:
        self._MODEL = model

    def start(self, stream=False):
        try:
            while True:
                question = input(">>> ")

                if question.strip().lower() in ['quit', 'exit', 'q']:
                    print("Goodbye!")
                    break

                if not question.strip():
                    print("Please enter a question.")
                    continue

                message = self.get_user_technical_question_prompt(question.strip())

                response = self.openai_client.chat.completions.create(
                    model=self._MODEL,
                    messages=[
                        {"role": "system", "content": self._system_prompt},
                        {"role": "user", "content": message},
                    ],
                    stream=stream
                )

                if stream:
                    full_response = ""
                    with Live(Text("🤔 Thinking...", style="italic yellow"),
                              console=console, refresh_per_second=10) as live:
                        for chunk in response:
                            if chunk.choices[0].delta.content:
                                full_response += chunk.choices[0].delta.content
                                live.update(Markdown(full_response, style="bold cyan"))
                        full_response += "\n"
                        live.update(Markdown(full_response, style="bold cyan"))
                else:
                    full_response = response.choices[0].message.content
                    console.print(Markdown(full_response, style="bold cyan"))
        except KeyboardInterrupt:
            print("\nGoodbye!")
        except Exception as e:
            print(f"Error: {e}")

    def start_with_gradio(self, question: str, stream=False):
        if not question.strip():
            # This method is a generator, so yield the message (a bare return wouldn't display it)
            yield "Please enter a question."
            return

        message = self.get_user_technical_question_prompt(question.strip())

        response = self.openai_client.chat.completions.create(
            model=self._MODEL,
            messages=[
                {"role": "system", "content": self._system_prompt},
                {"role": "user", "content": message},
            ],
            stream=stream
        )

        if stream:
            full_response = ""
            for chunk in response:
                if chunk.choices[0].delta.content:
                    full_response += chunk.choices[0].delta.content
                    yield full_response
            full_response += "\n"
            yield full_response
        else:
            yield response.choices[0].message.content


# Create the system prompt that defines the AI's expertise
technical_system_prompt = """
You are an expert technical assistant with deep knowledge in:

PROGRAMMING & DEVELOPMENT:
- Python, JavaScript, Java, C++, Go, Rust, TypeScript
- Web development (React, Vue, Angular, Node.js)
- Mobile development (iOS, Android, Flutter)
- DevOps (Docker, Kubernetes, CI/CD, AWS, Azure, GCP)
- Database systems (SQL, NoSQL, PostgreSQL, MongoDB)
- Software architecture patterns and best practices

SYSTEMS & INFRASTRUCTURE:
- Operating systems (Linux, Windows, macOS)
- Networking protocols and security
- Cloud computing and distributed systems
- Monitoring, logging, and observability
- Performance optimization and scaling

AI & MACHINE LEARNING:
- Machine learning algorithms and frameworks
- Deep learning (TensorFlow, PyTorch)
- Natural language processing
- Computer vision and image processing
- MLOps and model deployment

RESPONSE GUIDELINES:
1. Provide accurate, up-to-date technical information
2. Include code examples when relevant
3. Explain complex concepts clearly
4. Suggest best practices and alternatives
5. Warn about potential pitfalls or security issues
6. Reference official documentation when appropriate

Always prioritize accuracy and practical applicability in your technical responses.
"""

# Initialize the chatbot
Chat = SolveTechnicalQuestions()
Chat.set_system_prompt(technical_system_prompt)

# Check if using Ollama
if USE_OLLAMA:
    console.print(Text("Using Ollama (local)", style="bold green"))
    Chat.set_endpoint("http://localhost:11434/v1")
    Chat.set_model("llama3.2")

if __name__ == "__main__":
    if USE_GRADIO:
        gr_output = gr.Markdown(label="Response")
        stream_input = gr.Checkbox(label='Stream', value=False)
        question_input = gr.Textbox(
            label="Question",
            info="Ask it any technical question",
            lines=1
        )

        interface = gr.Interface(
            fn=Chat.start_with_gradio,
            title=" 🧞 Genie",
            inputs=[question_input, stream_input],
            outputs=[gr_output],
            flagging_mode="never"
        )
        interface.launch(inbrowser=True)
    else:
        Chat.start(stream=True)
```
Now let's break down each part so you understand how it works!
## Understanding the Code Step by Step
### Step 1: Imports - What Are We Bringing In?
```python
import os
import sys
from dotenv import load_dotenv
from rich.console import Console
from rich.live import Live
from rich.markdown import Markdown
from rich.text import Text
from openai import OpenAI
import gradio as gr
```
**What does each import do?**
| Import | What It Does |
|--------|--------------|
| `os` | Lets us access environment variables (like your API key) |
| `sys` | Lets us read command-line arguments (like `--ollama`) |
| `load_dotenv` | Reads your `.env` file and loads your secrets |
| `Console` | Creates a fancy terminal that can display colors and formatting |
| `Live` | Allows real-time updates in the terminal (for streaming text) |
| `Markdown` | Renders markdown formatting (headers, code blocks, lists) |
| `Text` | Creates styled text with colors |
| `OpenAI` | The client that talks to AI models (works with both OpenAI and Ollama!) |
| `gradio` | Creates web interfaces easily |
### Step 2: Initial Setup
```python
load_dotenv(override=True)
api_key: str = os.getenv('OPENAI_API_KEY')
console = Console()
USE_OLLAMA = "--ollama" in sys.argv
USE_GRADIO = "--gradio" in sys.argv
```
**Line by line explanation:**
1. **`load_dotenv(override=True)`**
- Reads your `.env` file
- Makes `OPENAI_API_KEY` available to your code
- `override=True` means values from `.env` overwrite any matching environment variables that are already set
2. **`api_key: str = os.getenv('OPENAI_API_KEY')`**
- Gets your API key from the environment
- Stores it in the `api_key` variable
- `: str` is a "type hint" telling Python this should be text
- The `OpenAI()` client also reads `OPENAI_API_KEY` from the environment on its own, which is why we never pass `api_key` to it explicitly
3. **`console = Console()`**
- Creates a Rich console for beautiful terminal output
- We'll use this to display formatted responses
4. **`USE_OLLAMA = "--ollama" in sys.argv`**
- `sys.argv` contains command-line arguments
- If you run `uv run main.py --ollama`, this becomes `True`
- If you run `uv run main.py`, this stays `False`
5. **`USE_GRADIO = "--gradio" in sys.argv`**
- Checks if you ran the script with `--gradio` flag
- If you run `uv run main.py --gradio`, this becomes `True`
- If you run `uv run main.py`, this stays `False`
### Step 3: The Main Class - Our AI Assistant Blueprint
**What is a Class?**
Think of a class like a blueprint for building something. Just like an architect's blueprint describes how to build a house, our `SolveTechnicalQuestions` class describes how to build an AI assistant.
```python
class SolveTechnicalQuestions:
    _system_prompt = ""

    def __init__(self, model: str = "gpt-4o-mini") -> None:
        self.openai_client = OpenAI()
        self._MODEL = model
```
**Breaking it down:**
- **`class SolveTechnicalQuestions:`** - We're creating a new blueprint called "SolveTechnicalQuestions"
- **`_system_prompt = ""`** - A "class variable" that stores instructions for the AI. The underscore `_` means "this is internal, don't touch it directly"
- **`def __init__(self, model: str = "gpt-4o-mini") -> None:`**
- This is the "constructor" - it runs when we create a new assistant
- `self` refers to the assistant we're creating
- `model = "gpt-4o-mini"` means "use gpt-4o-mini by default, but let me change it"
- **`self.openai_client = OpenAI()`** - Creates a connection to the AI service
- **`self._MODEL = model`** - Saves which model we want to use
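For example, once the class exists you can create assistants with different models (the second model name below is just an illustration):
```python
# Uses the default model, gpt-4o-mini
assistant = SolveTechnicalQuestions()

# Or pass another model name explicitly
stronger_assistant = SolveTechnicalQuestions(model="gpt-4o")
```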
### Step 4: The Prompt Template - Telling AI What We Want
```python
def get_user_technical_question_prompt(self, question: str):
    prompt = f"""
Answer this technical question comprehensively:

Provide:
1. A clear, accurate answer
2. Code examples if relevant
3. Best practices and recommendations
4. Potential pitfalls or considerations
5. Additional resources or references if helpful

Format your response in a structured, easy-to-read manner.

Question: {question}
"""
    return prompt
```
**Why do we need this?**
Without this template, if you ask "What is async?", the AI might just say "Async means asynchronous." That's not very helpful!
With our template, the same question gets transformed into:
> "Answer this technical question comprehensively. Provide: 1. A clear answer, 2. Code examples... Question: What is async?"
Now the AI knows to give you a full explanation with examples!
**The `f"""..."""` syntax:**
- `f` means "f-string" - Python will replace `{question}` with your actual question
- Triple quotes `"""` let us write multiple lines (see the small example below)
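Here's a tiny standalone illustration of the substitution (the variable values are just for this example):
```python
question = "What is async?"
prompt = f"Answer this technical question comprehensively. Question: {question}"
print(prompt)  # Answer this technical question comprehensively. Question: What is async?
```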
### Step 5: Configuration Methods - Customizing Our Assistant
```python
def set_system_prompt(self, system_prompt: str) -> None:
    self._system_prompt = system_prompt

def set_endpoint(self, endpoint: str, api_key: str = "ollama") -> None:
    self.openai_client = OpenAI(base_url=endpoint, api_key=api_key)

def set_model(self, model: str) -> None:
    self._MODEL = model
```
**What each method does:**
| Method | Purpose | Example |
|--------|---------|---------|
| `set_system_prompt()` | Tell the AI what kind of expert it should be | "You are a Python expert..." |
| `set_endpoint()` | Switch between OpenAI and Ollama | `"http://localhost:11434/v1"` for Ollama |
| `set_model()` | Change which AI model to use | `"llama3.2"` or `"gpt-4"` |
**Why is `set_endpoint()` powerful?**
The OpenAI SDK can talk to ANY service that speaks the same "language" (API format). Ollama mimics OpenAI's format, so we just point to a different address and it works!
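To make that concrete, here's a minimal sketch of the two configurations (the Ollama line assumes you have Ollama running locally, as set up later in this article):
```python
from openai import OpenAI

# Default: talks to OpenAI and reads OPENAI_API_KEY from the environment
openai_client = OpenAI()

# Same SDK, different address: talks to a local Ollama server instead
# (Ollama doesn't check the key, but the client requires a non-empty value)
ollama_client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
```
Everything else in the class stays the same; only the client object changes.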
### Step 6: Terminal Interface - The Heart of Our App
This is where the magic happens! Let's break it down piece by piece:
```python
def start(self, stream=False):
    try:
        while True:
            question = input(">>> ")
```
**The Loop:**
- `while True:` creates an infinite loop - keeps asking questions until you exit
- `input(">>> ")` shows `>>>` and waits for you to type something
- Your question gets stored in the `question` variable
```python
if question.strip().lower() in ['quit', 'exit', 'q']:
    print("Goodbye!")
    break

if not question.strip():
    print("Please enter a question.")
    continue
```
**Input Validation:**
- `question.strip()` removes extra spaces from the beginning and end
- `.lower()` converts to lowercase so "QUIT" and "quit" both work
- `break` exits the loop
- `continue` skips to the next iteration (asks for a new question)
```python
message = self.get_user_technical_question_prompt(question.strip())

response = self.openai_client.chat.completions.create(
    model=self._MODEL,
    messages=[
        {"role": "system", "content": self._system_prompt},
        {"role": "user", "content": message},
    ],
    stream=stream
)
```
**Calling the AI:**
This is how we talk to the OpenAI API (or Ollama)! The `messages` list contains:
| Role | What It Means | Example |
|------|---------------|---------|
| `system` | Instructions for the AI's behavior | "You are an expert in Python..." |
| `user` | The question being asked | "How do I read a CSV file?" |
Think of it like this:
- **System message** = Hiring instructions for an employee
- **User message** = The task you're asking them to do
**What is Streaming?**
Imagine ordering food at a restaurant:
- **Without streaming**: You wait 20 minutes, then get all your food at once
- **With streaming**: You get appetizers first, then main course, then dessert as they're ready
With AI:
- **Without streaming (`stream=False`)**: Wait for entire response, then display it
- **With streaming (`stream=True`)**: See words appear as the AI "thinks" them
```python
if stream:
    full_response = ""
    with Live(Text("🤔 Thinking...", style="italic yellow"),
              console=console, refresh_per_second=10) as live:
        for chunk in response:
            if chunk.choices[0].delta.content:
                full_response += chunk.choices[0].delta.content
                live.update(Markdown(full_response, style="bold cyan"))
        full_response += "\n"
        live.update(Markdown(full_response, style="bold cyan"))
```
**Streaming Explained:**
1. `Live(...)` creates a "live display" that can update in real-time
2. Shows "🤔 Thinking..." while waiting for the first words
3. `for chunk in response:` - AI sends response in small pieces (chunks)
4. We add each chunk to `full_response` and update the display
5. The user sees text appearing word by word - just like ChatGPT!
```python
else:
    full_response = response.choices[0].message.content
    console.print(Markdown(full_response, style="bold cyan"))
```
**Non-Streaming:**
- `response.choices[0].message.content` gets the complete response
- Display it all at once with pretty markdown formatting
```python
except KeyboardInterrupt:
    print("\nGoodbye!")
except Exception as e:
    print(f"Error: {e}")
```
**Error Handling:**
- `KeyboardInterrupt` catches Ctrl+C so we exit gracefully
- `Exception` catches any other errors and shows what went wrong
### Step 7: Web Interface Method - For Gradio
```python
def start_with_gradio(self, question: str, stream=False):
    if not question.strip():
        # This method is a generator, so yield the message (a bare return wouldn't display it)
        yield "Please enter a question."
        return

    message = self.get_user_technical_question_prompt(question.strip())

    response = self.openai_client.chat.completions.create(
        model=self._MODEL,
        messages=[
            {"role": "system", "content": self._system_prompt},
            {"role": "user", "content": message},
        ],
        stream=stream
    )

    if stream:
        full_response = ""
        for chunk in response:
            if chunk.choices[0].delta.content:
                full_response += chunk.choices[0].delta.content
                yield full_response
        full_response += "\n"
        yield full_response
    else:
        yield response.choices[0].message.content
```
**What is `yield`?**
This is a common point of confusion for beginners, so let's break it down:
- **`return`** = "Here's your answer, I'm done!" (function stops)
- **`yield`** = "Here's a partial answer, I'll give you more soon!" (function pauses)
**Why use `yield` for Gradio?**
Gradio needs to update the webpage as new text arrives. With `yield`:
1. First chunk arrives → yield "Hello" → Gradio displays "Hello"
2. Second chunk arrives → yield "Hello, how" → Gradio displays "Hello, how"
3. Third chunk arrives → yield "Hello, how are you?" → Gradio displays "Hello, how are you?"
This creates the streaming effect in the web browser!
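If generators are new to you, here's a minimal example, unrelated to the app, purely to show the mechanics:
```python
def count_up():
    # Each yield hands one value to the caller, then the function pauses here
    yield 1
    yield 2
    yield 3

for number in count_up():
    print(number)  # prints 1, then 2, then 3 - one value per resume
```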
**Difference from Terminal Method:**
- Terminal: Uses `Live()` to update the display
- Gradio: Uses `yield` to send updates to the webpage
### Step 8: The System Prompt - Giving Our AI a Job Description
**What is a System Prompt?**
Think of it like hiring an employee. Before they start working, you tell them:
- What their job is
- What they should be good at
- How they should respond to customers
That's exactly what a system prompt does for AI!
```python
technical_system_prompt = """
You are an expert technical assistant with deep knowledge in:
PROGRAMMING & DEVELOPMENT:
- Python, JavaScript, Java, C++, Go, Rust, TypeScript
- Web development (React, Vue, Angular, Node.js)
...
"""
```
**Why is this important?**
| Without System Prompt | With System Prompt |
|----------------------|-------------------|
| AI gives generic, short answers | AI gives detailed, technical answers |
| May not include code examples | Always includes code when relevant |
| Might miss important warnings | Warns about pitfalls and security |
**You can customize this!** Want a Python-only assistant? Change the prompt:
```python
technical_system_prompt = """
You are a Python expert. Only answer Python questions.
Always provide working code examples with comments.
"""
```
### Step 9: Creating and Configuring the Assistant
```python
Chat = SolveTechnicalQuestions()
Chat.set_system_prompt(technical_system_prompt)
if USE_OLLAMA:
console.print(Text("Using Ollama (local)", style="bold green"))
Chat.set_endpoint("http://localhost:11434/v1")
Chat.set_model("llama3.2")
```
**Line by line:**
1. **`Chat = SolveTechnicalQuestions()`**
- Creates a new assistant using our blueprint
- Like building a house from our architectural plans
- `Chat` is now a fully functional AI assistant object
2. **`Chat.set_system_prompt(technical_system_prompt)`**
- Gives our assistant its "job description"
- Now it knows to be a technical expert
3. **`if USE_OLLAMA:`**
- Remember, `USE_OLLAMA` is `True` if you ran `uv run main.py --ollama`
- This block only runs when using Ollama
4. **`Chat.set_endpoint("http://localhost:11434/v1")`**
- Points to Ollama running on your computer
- `localhost` means "this computer"
- `11434` is the port Ollama listens on
5. **`Chat.set_model("llama3.2")`**
- Uses the Llama 3.2 model (we'll download it in the Ollama setup section below)
### Step 10: Main Execution - Starting Our App
```python
if __name__ == "__main__":
```
**What does this mean?**
This is Python's way of saying "only run this code if I'm the main file being executed."
- If you run `uv run main.py` → This code runs ✅
- If another file imports `main.py` → This code doesn't run ❌
This is a common Python pattern you'll see everywhere!
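A quick illustration of the pattern (the file name is hypothetical):
```python
# greet.py
def greet():
    print("Hello from greet.py")

if __name__ == "__main__":
    # Runs when you execute `python greet.py` directly,
    # but NOT when another file does `import greet`
    greet()
```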
```python
if USE_GRADIO:
    gr_output = gr.Markdown(label="Response")
    stream_input = gr.Checkbox(label='Stream', value=False)
    question_input = gr.Textbox(
        label="Question",
        info="Ask it any technical question",
        lines=1
    )
```
**Creating Gradio Components:**
| Component | What It Creates | Purpose |
|-----------|----------------|---------|
| `gr.Markdown` | A text area that renders markdown | Shows AI's response |
| `gr.Checkbox` | A checkbox | Toggle streaming on/off |
| `gr.Textbox` | A text input field | Where you type questions |
```python
interface = gr.Interface(
    fn=Chat.start_with_gradio,
    title=" 🧞 Genie",
    inputs=[question_input, stream_input],
    outputs=[gr_output],
    flagging_mode="never"
)
interface.launch(inbrowser=True)
```
**Building the Interface:**
- `fn=Chat.start_with_gradio` → When user submits, call this function
- `title=...` → The title shown at the top of the page
- `inputs=[...]` → What the user can input (question + stream checkbox)
- `outputs=[...]` → Where to show the response
- `flagging_mode="never"` → Disables the "flag" button (not needed here)
- `launch(inbrowser=True)` → Start the server and open browser automatically
```python
else:
    Chat.start(stream=True)
```
**The Simple Path:**
If you didn't use the `--gradio` flag, just start the terminal interface with streaming enabled!
## Setting Up With OpenAI
### Get Your API Key
1. Go to [https://platform.openai.com](https://platform.openai.com)
2. Create account and add payment method
3. Navigate to Settings → API Keys
4. Create new key and copy it
5. Add to `.env` file:
```bash
OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxxxxxxxxxx
```
### Running with OpenAI
No code changes needed! Just run:
```bash
uv run main.py
```
Your code uses OpenAI by default with the `gpt-4o-mini` model.
**Try other models:**
```python
Chat.set_model("gpt-4") # Better quality, more expensive
Chat.set_model("gpt-3.5-turbo") # Faster, cheaper
```
## Setting Up With Ollama
### What is Ollama?
Ollama runs AI models locally on your computer:
- ✅ Completely free
- ✅ Works offline
- ✅ Private (data never leaves your computer)
### Installing Ollama
1. **Download:** Visit [https://ollama.ai](https://ollama.ai)
- Windows/macOS: Download and run installer
- Linux: `curl -fsSL https://ollama.ai/install.sh | sh`
2. **Verify:**
```bash
ollama --version
```
3. **Download Model:**
```bash
ollama pull llama3.2
```
4. **Test:**
```bash
ollama run llama3.2
```
Type a question to verify it works, then `/bye` to exit.
### Running with Ollama
Use the `--ollama` flag:
```bash
uv run main.py --ollama
```
You'll see: **"Using Ollama (local)"** in green text.
The `USE_OLLAMA` variable in the code detects this flag and automatically:
- Sets endpoint to `http://localhost:11434/v1`
- Changes model to `llama3.2`
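If you want to sanity-check the local endpoint outside the app, a minimal script like this should get a reply (assuming Ollama is running and `llama3.2` has been pulled):
```python
from openai import OpenAI

# Point the OpenAI SDK at the local Ollama server; any non-empty api_key works
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
reply = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Say hello in five words."}],
)
print(reply.choices[0].message.content)
```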
## Running in the Terminal
Simply run without the `--gradio` flag:
```bash
# With OpenAI:
uv run main.py
# With Ollama:
uv run main.py --ollama
```
### Using the Interface
```
>>> How do I read a CSV file in Python?
```
You'll see "🤔 Thinking..." then the response streams in with:
- Syntax-highlighted code blocks
- Formatted markdown
- Bold headers and lists

### Example Questions
```
>>> What is a Python decorator?
>>> Explain async/await with examples
>>> How do I handle exceptions in async code?
>>> What are React hooks?
```
**Exit:** Type `quit`, `exit`, `q`, or press `Ctrl+C`
## Running With Gradio (Web Interface)
Use the `--gradio` flag:
```bash
# With OpenAI:
uv run main.py --gradio
# With Ollama:
uv run main.py --gradio --ollama
```
Your browser opens automatically to `http://127.0.0.1:7860`

### Using the Web Interface
1. Type your question in the text box
2. Check/uncheck "Stream" for real-time updates
3. Click Submit
4. See markdown-formatted response
### Sharing Your Interface
Add `share=True`:
```python
interface.launch(inbrowser=True, share=True)
```
Gradio creates a temporary public URL that anyone can access (links typically expire after about 72 hours)!
## Conclusion
You've built a fully functional AI assistant with:
- ✅ Two interfaces (terminal & web)
- ✅ Two AI providers (OpenAI & Ollama)
- ✅ Beautiful markdown output
- ✅ Streaming responses
- ✅ Easy switching with command-line flags
### Enhancement Ideas
- Add conversation memory (see the sketch after this list)
- Connect to your documentation (RAG)
- Save conversations to file
- Deploy to a server
- Add voice interface
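As a starting point for the first idea, here's a rough sketch of conversation memory: keep a running `messages` list and send it on every turn (the class and method names here are hypothetical, not part of the code above):
```python
class ChatWithMemory:
    def __init__(self, client, model: str, system_prompt: str):
        self.client = client
        self.model = model
        # History starts with the system prompt and grows with every exchange
        self.history = [{"role": "system", "content": system_prompt}]

    def ask(self, question: str) -> str:
        self.history.append({"role": "user", "content": question})
        response = self.client.chat.completions.create(
            model=self.model,
            messages=self.history,
        )
        answer = response.choices[0].message.content
        # Remember the assistant's reply so follow-up questions have context
        self.history.append({"role": "assistant", "content": answer})
        return answer
```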
This is just the first article from my Andela GenAI journey. Stay tuned for more practical AI projects!
### Resources
- [OpenAI API Docs](https://platform.openai.com/docs)
- [Ollama](https://ollama.ai)
- [Gradio Docs](https://www.gradio.app/docs)
- [Rich Library](https://rich.readthedocs.io/)
---
*If this helped you, share it with others! Watch for my next article where I'll dive deeper into GenAI applications.*