# Database Demo
**Recommended Leanring Resources**
Materials provided by Prof Chiu
**Outline**
1. Database Background
2. Raw Data Overview
3. Normalize Database Using Python
4. Normalization Check
5. Connect to SQLite
6. Check SQLite Connection
**Color Notation**
:::warning
π€ **Think**
:::
:::danger
βΌοΈ **Important Concept**
:::
:::info
π **Case Study, Scenarios**
:::
:::success
π€ **AI Integration**
:::
---
## 01 Database Background
**The board** asked the executive leaders to propose methods to improve hospital profit by increasing revenue or reducing the cost.
**The executive leaders** of the hospital follow a data-driven approach and decide to find out the proposals through the hospital management system.
**As a data analyst**, you receive the following tables and you notice that the tables are not normalized. Thus, you need to first normaliz the database and provide a dashboard that could support the decision-making.
**Repo:** https://github.com/PoYaSharonLin/PDDS/tree/Dash-Demo-Dev
## 02 Raw Data Overview
**Data:**
https://github.com/PoYaSharonLin/PDDS/blob/Dash-Demo-Dev/src/system_data.xlsx
**Patient Table**
```
DataFrame from sheet: patient
PatientID PatientName Age Gender BloodType PrimaryDiagnosis AssignedDoctors TotalExaminations LastVisitDate HospitalStayDays ContactNumber
0 P001 John Smith 45 Male O+ Hypertension D001, D003 5 2025-06-15 3 555-123-4567
1 P002 Mary Johnson 32 Female A- Diabetes D002 3 2025-07-01 1 555-987-6543
2 P003 Alex Brown 60 Male B+ Heart Disease D001, D004 8 2025-07-08 5 555-456-7890
3 P004 Sarah Davis 28 Female AB+ Asthma D003 2 2025-05-20 0 555-321-0987
4 P005 Emma Wilson 50 Female O- Arthritis D002, D004 4 2025-06-30 2 555-654-3210
```
**Doctor Table**
```
DataFrame from sheet: doctor
DoctorID DoctorName Specialty YearsExperience AssignedPatients ExaminationsOrdered AvgPatientAge Department ContactEmail
0 D001 Dr. Alice Lee Cardiology 12 P001, P003 10 52.5 Heart alice.lee@hospital.com
1 D002 Dr. Bob Carter Endocrynology 8 P002, P005 6 41.0 General bob.carter@hospital.com
2 D003 Dr. Clara Wong Pulmonology 15 P001, P004 4 36.5 Lung clara.wong@hospital.com
3 D004 Dr. David Kim Orthopedics 10 P003, P005 5 55.0 Bones david.kim@hospital.com
```
**Examination Table**
```
DataFrame from sheet: examination
ExamID ExamType PatientIDs PatientNames DoctorID DoctorName ExamDate ExamCost ResultStatus DurationMinutes
0 E001 Blood Test P001, P002 John Smith, Mary Johnson D001 Dr. Alice Lee 2025-06-10 100 Normal, Normal 30
1 E002 ECG P003 Alex Brown D001 Dr. Alice Lee 2025-07-05 150 Abnormal 45
2 E003 X-Ray P004, P001 Sarah Davis, John Smith D003 Dr. Clara Wong 2025-05-18 200 Normal, Normal 20
3 E004 Blood Test P002, P005 Mary Johnson, Emma Wilson D002 Dr. Bob Carter 2025-06-28 100 Abnormal, Pending 30
4 E005 MRI P005, P003 Emma Wilson, Alex Brown D004 Dr. David Kim 2025-06-29 600 Pending, Pending 60
5 E006 Ultrasound P001 John Smith D003 Dr. Clara Wong 2025-06-12 120 Normal 40
6 E007 Blood Test P003, P004, P005 Alex Brown, Sarah Davis, Emma Wilson D001 Dr. Alice Lee 2025-07-07 150 Normal, Normal, Normal 30
```
**Table Relationship**
- 1 patient can have many doctors, and 1 doctor can have many patients (many-to-many)
- 1 patient can have many examinations, and 1 examination can involve many patients (many-to-many)
- 1 doctor can order many examinations, and 1 examination can be ordered by many doctors (many-to-many).
:::warning
π€ **What are some issues of the current tables?**
:::: spoiler **π₯Έ Ans**
1. Many-to-many relation needs a separate table
2. Should be in attoic value e.g. PatientName, Date etc.
:::
## 03 Normalize Tables Using Python
**Scipt:**
```bash
python3 preprocessing/normalization.py
```
**Code:**
https://github.com/PoYaSharonLin/PDDS/blob/Dash-Demo-Dev/preprocessing/normalization.py
### 1. Many-to-many relation needs a separate table
| Table Name | Primary Key (PK) |
| -------- | -------- |
| patient | patientID |
| patient_doctor | patientID & doctorID |
**Code**
```python=
patient = sheets["patient"].drop(columns=['AssignedDoctors']).copy()
patient_doctor = sheets["patient"][["PatientID", "AssignedDoctors"]].copy()
patient_doctor["AssignedDoctors"] = patient_doctor["AssignedDoctors"].str.split(",").apply(lambda id_list: [id.strip() for id in id_list]) # noqa: E501
# .str.split(",")
# input: 'P001, P003'
# output: ['P001', ' P003']
# apply(lambda id_list: [id.strip() for id in id_list])
# apply a function to id_list, what the function does is defined in []
# input: ['P001', ' P003']
# output: ['P001', 'P003']
patient_doctor = patient_doctor.explode("AssignedDoctors")
# input: D001 with ['P001', 'P003']
# output: D001, P001 and D001, P003
patient_doctor = patient_doctor.reset_index(drop=True)
```
**Output**
```
print(patient_doctor)
PatientID AssignedDoctors
0 P001 D001
1 P001 D003
2 P002 D002
3 P003 D001
4 P003 D004
5 P004 D003
6 P005 D002
7 P005 D004
```
**ER Model**

::::spoiler ER Model Code
```mermaid=
erDiagram
Patient ||--o{ Patient_Doctor : has
Doctor ||--o{ Patient_Doctor : assigned_to
Patient ||--o{ Patient_Examination : undergoes
Examination ||--o{ Patient_Examination : includes
Doctor ||--o{ Doctor_Examination : conducts
Examination ||--o{ Doctor_Examination : performed_by
Patient {
string PatientID PK
string PatientName
int Age
string Gender
string BloodType
string PrimaryDiagnosis
int TotalExaminations
date LastVisitDate
int HospitalStayDays
string ContactNumber
}
Doctor {
string DoctorID PK
string DoctorName
string Specialty
int YearsExperience
float AvgPatientAge
string Department
string ContactEmail
}
Examination {
string ExamID PK
string ExamType
date ExamDate
int ExamCost
int DurationMinutes
}
Patient_Doctor {
string PatientID FK
string AssignedDoctors FK
}
Patient_Examination {
string ExamID FK
string PatientIDs FK
string ResultStatus
}
Doctor_Examination {
string ExamID FK
string DoctorID FK
}
```
::::
### 2. Should be in attoic value e.g. PatientName, Date etc.
**Code**
```python=
patient[['FirstName', 'LastName']] = patient['PatientName'].str.split(' ', expand=True) # noqa: E501
patient['LastVisitDate'] = pd.to_datetime(patient['LastVisitDate'])
patient['Year'] = patient['LastVisitDate'].dt.year
patient['Month'] = patient['LastVisitDate'].dt.month
patient['Day'] = patient['LastVisitDate'].dt.day
patient = patient.drop(['PatientName', 'LastVisitDate'], axis=1)
print(patient)
```
**Output**
```python=
PatientID Age Gender BloodType PrimaryDiagnosis TotalExaminations HospitalStayDays ContactNumber FirstName LastName Year Month Day
0 P001 45 Male O+ Hypertension 5 3 555-123-4567 John Smith 2025 6 15
1 P002 32 Female A- Diabetes 3 1 555-987-6543 Mary Johnson 2025 7 1
2 P003 60 Male B+ Heart Disease 8 5 555-456-7890 Alex Brown 2025 7 8
3 P004 28 Female AB+ Asthma 2 0 555-321-0987 Sarah Davis 2025 5 20
4 P005 50 Female O- Arthritis 4 2 555-654-3210 Emma Wilson 2025 6 30
```
**ER Model**

::::spoiler ER Model Code
```
erDiagram
Patient ||--o{ Patient_Doctor : has
Doctor ||--o{ Patient_Doctor : assigned_to
Patient ||--o{ Patient_Examination : undergoes
Examination ||--o{ Patient_Examination : includes
Doctor ||--o{ Doctor_Examination : conducts
Examination ||--o{ Doctor_Examination : performed_by
Patient {
string PatientID PK
string FirstName
string LastName
int Age
string Gender
string BloodType
string PrimaryDiagnosis
int TotalExaminations
int HospitalStayDays
string ContactNumber
int Year
int Month
int Day
}
Doctor {
string DoctorID PK
string Title
string FirstName
string LastName
string Specialty
int YearsExperience
float AvgPatientAge
string Department
string ContactEmail
}
Examination {
string ExamID PK
string ExamType
int ExamCost
int DurationMinutes
int Year
int Month
int Day
}
Patient_Doctor {
string PatientID FK
string AssignedDoctors FK
}
Patient_Examination {
string ExamID FK
string PatientID FK
string ResultStatus
}
Doctor_Examination {
string ExamID FK
string DoctorID FK
}
```
::::
## 04 Normalization Check
:::success
**π€ Prompt**
does this ER model pass 1NF, 2NF, 3NF, and BCNF? If so, please provide the checklist
:::
**Tables Check**
| Table | 1NF | 2NF | 3NF | BCNF | Notes |
|----------------------|-----|-----|-----|------|----------------------------------------------------------------------|
| Patient | β | β | β | β | Atomic values, single PK (PatientID), no partial or transitive dependencies. |
| Doctor | β | β | β | β | Atomic values, single PK (DoctorID), no partial or transitive dependencies. |
| Examination | β | β | β | β | Atomic values, single PK (ExamID), no partial or transitive dependencies. |
| Patient_Doctor | β | β | β | β | Junction table, composite PK, no non-key attributes. |
| Patient_Examination | β | β | β | β | Junction table, composite PK, ResultStatus depends on full key. |
| Doctor_Examination | β | β | β | β | Junction table, composite PK, no non-key attributes. |
**Example**
Patient Table
Attributes: PatientID (PK), FirstName, LastName, Age, Gender, BloodType, PrimaryDiagnosis, TotalExaminations, HospitalStayDays, ContactNumber, Year, Month, Day
- **1NF**
- Atomic values: All attributes (e.g., FirstName, Age, BloodType) are atomic. The original PatientName and LastVisitDate were split into atomic components (FirstName, LastName, Year, Month, Day) in the provided code.
- No repeating groups: No multi-valued attributes or lists (e.g., AssignedDoctors was moved to the Patient_Doctor table).
- Primary key: PatientID is the primary key.
- **Result: Satisfies 1NF.**
- **2NF**
- The primary key is a single attribute (PatientID), so there are no composite keys, and thus no partial dependencies are possible.
- All non-key attributes (FirstName, LastName, Age, etc.) depend fully on PatientID.
- **Result: Satisfies 2NF.**
- **3NF**
- Check for transitive dependencies: No non-key attribute depends on another non-key attribute.
- For example, FirstName, LastName, Age, etc., depend only on PatientID.
Year, Month, and Day (from LastVisitDate) describe the last visit date for the patient and depend only on PatientID, not on other non-key attributes.
TotalExaminations might seem like it could depend on examinations, but itβs a summary attribute specific to the patient (e.g., a count of exams), so it depends on PatientID.
- **Result: Satisfies 3NF, assuming no hidden dependencies (e.g., ContactNumber doesnβt depend on Age).**
- **BCNF**
- All functional dependencies must have a superkey on the left-hand side.
Dependencies: PatientID β {FirstName, LastName, Age, Gender, BloodType, PrimaryDiagnosis, TotalExaminations, HospitalStayDays, ContactNumber, Year, Month, Day}.
- Since PatientID is the primary key (and thus a superkey), all dependencies satisfy BCNF.
- **Result: Satisfies BCNF.**
## 05 Connect to SQLite
**Scipt:**
```
python3 preprocessing/DBconnection.py
```
**Code:**
https://github.com/PoYaSharonLin/PDDS/blob/Dash-Demo-Dev/preprocessing/DBconnection.py
```python
conn = sqlite3.connect('hospital_database.db')
cursor = conn.cursor()
# Create tables in SQLite
cursor.execute('''
CREATE TABLE IF NOT EXISTS patient (
PatientID INTEGER PRIMARY KEY,
FirstName TEXT,
LastName TEXT,
Year INTEGER,
Month INTEGER,
Day INTEGER
)
''')
cursor.execute('''
CREATE TABLE IF NOT EXISTS doctor (
DoctorID INTEGER PRIMARY KEY,
Title TEXT,
FirstName TEXT,
LastName TEXT
)
''')
cursor.execute('''
CREATE TABLE IF NOT EXISTS examination (
ExamID INTEGER PRIMARY KEY,
ExamType TEXT,
Year INTEGER,
Month INTEGER,
Day INTEGER
)
''')
cursor.execute('''
CREATE TABLE IF NOT EXISTS patient_doctor (
PatientID INTEGER,
DoctorID INTEGER,
PRIMARY KEY (PatientID, DoctorID),
FOREIGN KEY (PatientID) REFERENCES patient(PatientID),
FOREIGN KEY (DoctorID) REFERENCES doctor(DoctorID)
)
''')
cursor.execute('''
CREATE TABLE IF NOT EXISTS patient_examination (
ExamID INTEGER,
PatientID INTEGER,
ResultStatus TEXT,
PRIMARY KEY (ExamID, PatientID),
FOREIGN KEY (ExamID) REFERENCES examination(ExamID),
FOREIGN KEY (PatientID) REFERENCES patient(PatientID)
)
''')
cursor.execute('''
CREATE TABLE IF NOT EXISTS doctor_examination (
ExamID INTEGER,
DoctorID INTEGER,
PRIMARY KEY (ExamID, DoctorID),
FOREIGN KEY (ExamID) REFERENCES examination(ExamID),
FOREIGN KEY (DoctorID) REFERENCES doctor(DoctorID)
)
''')
# Export DataFrames to SQLite
patient.to_sql('patient', conn, if_exists='replace', index=False)
doctor.to_sql('doctor', conn, if_exists='replace', index=False)
examination.to_sql('examination', conn, if_exists='replace', index=False)
patient_doctor.rename(columns={'AssignedDoctors': 'DoctorID'}).to_sql('patient_doctor', conn, if_exists='replace', index=False) # noqa: E501
patient_examination.rename(columns={'PatientIDs': 'PatientID'}).to_sql('patient_examination', conn, if_exists='replace', index=False) # noqa: E501
doctor_examination.to_sql('doctor_examination', conn, if_exists='replace', index=False) # noqa: E501
# Commit changes and close connection
conn.commit()
conn.close()
```
## 06 Check SQLite Connection using command line
- Go to ./src
- Run `sqlite3 hospital_database.db`

- Check SQLite3 connection
- Run`.tables` should return all tables
- Run `SELECT * FROM patient LIMIT 5;`

- Run other example queries

**SQL**
https://github.com/PoYaSharonLin/PDDS/blob/Dash-Demo-Dev/query.sql
```sql=
-- Insert Patient data
BEGIN;
INSERT INTO patient (FirstName, LastName, Year, Month, Day)
VALUES ('John', 'Doe', 1985, 6, 15);
COMMIT;
SELECT * FROM patient;
-- Retrieve all patients assigned to a specific doctor
SELECT p.FirstName, p.LastName
FROM patient p
JOIN patient_doctor pd ON p.PatientID = pd.PatientID
WHERE pd.DoctorID = "D001";
-- Count the number of examinations per patient
SELECT p.FirstName, p.LastName, COUNT(pe.ExamID) AS ExamCount
FROM patient p
LEFT JOIN patient_examination pe ON p.PatientID = pe.PatientID
GROUP BY p.PatientID, p.FirstName, p.LastName;
-- Find all doctors who performed 'Blood Test'
SELECT d.Title, d.FirstName, d.LastName
FROM doctor d
JOIN doctor_examination de ON d.DoctorID = de.DoctorID
JOIN examination e ON de.ExamID = e.ExamID
WHERE e.ExamType = 'Blood Test';
```
----
## 06 Create DB and Check SQLite Connection using DB Browser
**SQL:**
https://github.com/PoYaSharonLin/PDDS/blob/database-demo/query.sql
**1. Create new database**

**2. Import CSV as data table**

**3. Set table name & import CSV**

**4. Browse Data & Write Changes**

**5. Try out SQL Queries**
Click "Execute SQL" and paste the code
- INSERT
```SQL=
INSERT INTO patient (FirstName, LastName, Year, Month, Day)
VALUES ('John', 'Doe', 1985, 6, 15);
SELECT * FROM patient;
```
Click the "Execute" button the execute the SQL. After executing the SQL, you should see that the new row is inserted


- SELECT
To select the patient who are under the same doctor, we can use the SQL below.
```SQL=
SELECT p.FirstName, p.LastName
FROM patient p
JOIN patient_doctor pd ON p.PatientID = pd.PatientID
WHERE pd.AssignedDoctors = 'D001';
```

:::warning
**Verification**
To verify whether the SQL is correct, let's first take a look at the `patient_doctor` table.
**STEP 1**
The patients whose doctor is D001 are P001 and P003

**STEP 2**
To see whether P001 and P003 are John and Alex, let's check the patient table.

Indeed, both Alex and John are under the same doctor D001, Alice Lee.
:::