# Assignment 4
### Assignment specifications
Deadline: 2024/06/02 23:59
Office hours: 2024/05/31 16:30 - 18:30 (65410)
Presentation format: ppt
> The list of oral presentations for 2024/06/06 will be announced on Moodle on 2024/06/05.
### Objective: Statistical and Decoding Analysis on Motor Imagery EEG Dataset
### Dataset: BCI Competition IV 2a
The dataset contains data on motor imagery tasks involving the imagination of movement of the **left hand (class 1)**, **right hand (class 2)**, **both feet (class 3)**, and **tongue (class 4)**.
> [Reference paper](https://www.bbci.de/competition/iv/desc_2a.pdf)
### Download dataset
> [NAS](https://140.116.246.33:5001)
> User account: Neuroimaging
> Password: baisp65410
> Folder: Assignment 4
>
> You will see three files named `subject1.npz`, `subject2.npz`, and `subject3.npz` in the `Assignment 4` folder, which were selected from the BCI Competition IV 2a dataset.
### Preprocessing
#### Load data
Below is a Python example where you can use `numpy` to open a `.npz` file.
```python
import numpy as np
data = np.load('./subject1.npz')
```
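Before going further, it can help to check which arrays a file contains and what shapes they have. The sketch below uses a hypothetical helper, `describe_npz`, and a small synthetic file so that it runs without the dataset. The shapes in the synthetic file are made up for illustration; with the real data you would pass `'./subject1.npz'` instead.

```python
import os
import tempfile

import numpy as np


def describe_npz(path):
    """Return {key: (shape, dtype)} for every array stored in a .npz file."""
    with np.load(path) as data:
        return {key: (data[key].shape, str(data[key].dtype)) for key in data.files}


# Synthetic stand-in for subject1.npz (shapes are illustrative, not the real ones).
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo_subject.npz")
np.savez(
    demo_path,
    s=np.zeros((1000, 25)),
    etyp=np.array([[768], [769]]),
    epos=np.array([[100], [600]]),
    edur=np.array([[1875], [313]]),
)

summary = describe_npz(demo_path)
for key, (shape, dtype) in summary.items():
    print(f"{key}: shape={shape}, dtype={dtype}")
```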
___
#### Split data
The naming convention used in each data file is as follows:
**'s'**: contains all the raw data as a NumPy array.
If you want to extract EEG signals from specific channels, please refer to the example code below.
```python
EEG_signal = data['s']
channel_C3 = EEG_signal[:, 7]
```
> Index 7 corresponds to channel C3. For information on each channel, please read the reference paper.
> 
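Since Task 1 also asks for the Cz and C4 channels, a small name-to-index lookup can make channel selection less error-prone. The sketch below is only an illustration: `CHANNEL_INDEX` and `pick_channels` are hypothetical names, and the indices for Cz and C4 are assumptions based on the electrode numbering in the dataset description, so please verify them against the reference paper before relying on them.

```python
import numpy as np

# Assumed 0-based column indices, following the electrode numbering (1-22)
# in the dataset description; only C3 = 7 is confirmed by the text above.
CHANNEL_INDEX = {"C3": 7, "Cz": 9, "C4": 11}


def pick_channels(eeg, names):
    """Return the columns of `eeg` (n_samples, n_channels) for the given names."""
    return eeg[:, [CHANNEL_INDEX[name] for name in names]]


# Tiny synthetic array standing in for data['s'].
eeg = np.arange(50).reshape(2, 25)
picked = pick_channels(eeg, ["C3", "Cz", "C4"])
print(picked)
```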
The data contains the following three items for all events:
**'etyp'**: types of all events.
**'epos'**: sample points of onset times of all events.
**'edur'**: durations of all events.
In other words, the *i*-th elements of `data['etyp']`, `data['epos']`, and `data['edur']` indicate the type, onset sample point, and duration of the *i*-th event.
```python
event_type = data['etyp']
event_time = data['epos']
event_dur = data['edur']
combined = np.stack((event_type[6:10], event_time[6:10], event_dur[6:10]), axis=-1)
print(combined)
```
```shell
[[[ 32766 96160 0]]
[[ 768 96510 1875]]
[[ 769 97010 313]]
[[ 768 98513 1875]]]
```
To understand the meaning of the values in the first column, please see **Table 2**.

In the second row, the values are `[768, 96510, 1875]`, indicating that a start-of-trial event occurs at sample point `96510` and lasts for `1875` sample points. In the third row, the values are `[769, 97010, 313]`, indicating that a left-hand cue onset occurs at sample point `97010` and lasts for `313` sample points. Thus, the interval between the onsets of these two events is `500` sample points. Since the sampling rate is 250 Hz, the time between sample points `96510` and `97010` is `2` seconds.
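The conversion between sample points and seconds can be checked with a few lines of Python. This is a minimal sketch that assumes the `edur` values, like `epos`, are expressed in sample points; `samples_to_seconds` is a hypothetical helper name.

```python
FS = 250  # sampling rate in Hz, as stated above


def samples_to_seconds(n_samples, fs=FS):
    """Convert a number of sample points to seconds."""
    return n_samples / fs


trial_start, cue_onset = 96510, 97010
print(samples_to_seconds(cue_onset - trial_start))  # 2.0 (fixation period)
print(samples_to_seconds(1875))  # 7.5 (duration of the trial event)
print(samples_to_seconds(313))   # 1.252 (duration of the cue event)
```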
Now, let's take a look at **Figure 2**, which shows the experimental paradigm.

The beep marks the start of a trial, and its onset is at 0 s. The yellow box indicates the duration of a cue, which starts at 2 s. The fixation cross is shown from 0 to 2 s. Thus, the time between sample points `96510` and `97010` corresponds to the Fixation Cross box in **Figure 2**.
Since our purpose is to classify the type of motor imagery from EEG data, we need to epoch the EEG data from the blue part (Motor Imagery). However, we don't know exactly when a participant started the motor imagery, because participants were free to explore. In this research, the 'Cue' only tells the subject which category of motor imagery to focus on. The delay between `receiving the cue` and `concentrating on imagining the specified category of movement` may differ across trials and/or across subjects.
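One possible way to cut epochs is to take a fixed-length window after each cue onset. The sketch below is an illustration under stated assumptions, not the required method: `extract_epochs` is a hypothetical helper, the event codes 769 to 772 are taken to mean left hand, right hand, both feet, and tongue, and the window (starting at the cue onset and lasting 4 s) is an arbitrary choice that you may want to shift, given the uncertainty described above. It runs on synthetic data so that it does not need the dataset.

```python
import numpy as np

FS = 250  # sampling rate in Hz
CUE_CODES = (769, 770, 771, 772)  # assumed: left hand, right hand, feet, tongue


def extract_epochs(signal, etyp, epos, start_s=0.0, length_s=4.0, fs=FS):
    """Cut a fixed-length window after each cue onset.

    signal: (n_samples, n_channels); etyp/epos: event types and onsets.
    Returns (epochs, labels), with epochs shaped (n_epochs, n_window, n_channels).
    """
    etyp = np.asarray(etyp).ravel()
    epos = np.asarray(epos).ravel()
    offset = int(round(start_s * fs))
    n_window = int(round(length_s * fs))
    epochs, labels = [], []
    for code, onset in zip(etyp, epos):
        if code not in CUE_CODES:
            continue  # keep cue events only
        begin = int(onset) + offset
        end = begin + n_window
        if end > signal.shape[0]:
            continue  # skip windows that run past the recording
        epochs.append(signal[begin:end])
        labels.append(int(code))
    if not epochs:
        return np.empty((0, n_window, signal.shape[1])), np.array(labels)
    return np.stack(epochs), np.array(labels)


# Synthetic recording with two trials (a left-hand cue and a tongue cue).
rng = np.random.default_rng(0)
signal = rng.standard_normal((5000, 25))
etyp = np.array([[768], [769], [768], [772]])
epos = np.array([[0], [500], [2000], [2500]])

epochs, labels = extract_epochs(signal, etyp, epos)
print(epochs.shape, labels)  # (2, 1000, 25) [769 772]
```

With the real files, you would pass `data['s']`, `data['etyp']`, and `data['epos']`, and then group the epochs by label.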
___
### Data Analysis Tasks
After you complete the previous steps, you should have four classes of data for each subject and each class has **72** epochs.
##### **Task 1: Statistical analysis**
For each subject and each of the four classes, please calculate the mean power in the $\delta$ band (0-4 Hz), $\theta$ band (4-7 Hz), and $\alpha$ band (8-12 Hz). Additionally, analyze whether there are significant differences between the classes.



The three images above show the mean power and standard error of each class in the different bands, using the **C3 channel** for illustration. Please also practice with the **Cz** and **C4** channels.
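As a starting point for Task 1, the sketch below estimates band power with Welch's method and compares the classes with a one-way ANOVA. Both choices are assumptions made for illustration: the assignment does not prescribe a spectral estimator or a statistical test, `band_power` is a hypothetical helper, and "power" here means the mean power spectral density within the band. The data are synthetic, with a 10 Hz rhythm injected into one class so that the alpha band differs.

```python
import numpy as np
from scipy import signal, stats

FS = 250  # sampling rate in Hz
BANDS = {"delta": (0, 4), "theta": (4, 7), "alpha": (8, 12)}


def band_power(epochs, band, fs=FS):
    """Mean Welch PSD within `band` for each single-channel epoch.

    epochs: (n_epochs, n_samples). Returns an array of shape (n_epochs,).
    """
    freqs, psd = signal.welch(epochs, fs=fs, nperseg=fs, axis=-1)
    low, high = band
    mask = (freqs >= low) & (freqs <= high)
    return psd[:, mask].mean(axis=1)


# Synthetic single-channel epochs: 4 classes x 72 epochs x 4 s.
rng = np.random.default_rng(0)
t = np.arange(4 * FS) / FS
classes = {}
for code in (769, 770, 771, 772):
    x = rng.standard_normal((72, t.size))
    if code == 769:
        x += 2.0 * np.sin(2 * np.pi * 10 * t)  # inject a 10 Hz rhythm
    classes[code] = x

# Mean power and standard error per class in the alpha band.
alpha = {code: band_power(x, BANDS["alpha"]) for code, x in classes.items()}
for code, power in alpha.items():
    print(code, power.mean(), stats.sem(power))

# One-way ANOVA across the four classes (one possible test among several).
f_stat, p_value = stats.f_oneway(*alpha.values())
print(f"alpha band: F={f_stat:.1f}, p={p_value:.3g}")
```

If the ANOVA assumptions look doubtful for your data, a non-parametric alternative such as `scipy.stats.kruskal` takes the same inputs.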
___
##### **Task 2: Decoding analysis**
Previously, you obtained epochs of motor imagery EEG data. Next, divide each epoch into multiple segments of 50 ms each. (Assuming your motor imagery period lasts **4 seconds**, there will be **80 segments**.) For the *i*-th segment of all epochs, calculate the 5-fold cross-validation (CV) accuracy. After computing the 5-fold CV accuracy for each of the 80 segments, you will be able to determine which time segments within the 4-second motor imagery period have a significant impact on classification accuracy. The following example illustrates the analysis results.

The above results were obtained using an LSTM, with training data selected from the raw EEG signals of the C3 channel.
In this example, no preprocessing was applied. You are free to add any preprocessing steps and observe whether they improve accuracy.
In this example, raw signals were used as input to the LSTM model. You can also calculate features from the epochs and use them as input to machine learning models. For example, you can select the features that show significant differences in **Task 1** and then perform 5-fold CV on the selected features with any type of classifier, such as a support vector machine, a fully connected neural network, or XGBoost. You can examine whether the feature at a specific time segment exhibits EEG changes that are easier to classify.
In summary, in Task 2, you need to **present the 5-fold cross-validation accuracy of each segment in your report** (Note: the example only shows the results of 8 segments). Don't forget to tell us your analysis methods and what you observe from your analysis results. Then, try your best to explain your observations.
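The segment-wise procedure can be sketched as follows. This is an illustration under assumptions, not the method used for the example figure: it uses an SVM from scikit-learn instead of an LSTM, takes single-channel epochs as input, and `segment_cv_accuracy` is a hypothetical helper. Note that 50 ms at 250 Hz is 12.5 samples, so the sketch uses `np.array_split`, which produces segments of 13 or 12 samples. The data are synthetic, with class information injected into one segment only.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def segment_cv_accuracy(epochs, labels, n_segments=80, n_folds=5, seed=0):
    """5-fold CV accuracy for each time segment of single-channel epochs.

    epochs: (n_epochs, n_samples); labels: (n_epochs,).
    Returns an array of shape (n_segments,).
    """
    # 1000 samples / 80 segments = 12.5, so segments hold 13 or 12 samples.
    bounds = np.array_split(np.arange(epochs.shape[1]), n_segments)
    cv = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    scores = []
    for idx in bounds:
        model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
        fold_acc = cross_val_score(model, epochs[:, idx], labels, cv=cv)
        scores.append(fold_acc.mean())
    return np.array(scores)


# Synthetic data: 4 classes x 72 epochs, 1000 samples (4 s at 250 Hz).
rng = np.random.default_rng(0)
labels = np.repeat([769, 770, 771, 772], 72)
epochs = rng.standard_normal((labels.size, 1000))
# Make only segment 40 (samples 520-531) informative with a class-dependent offset.
epochs[:, 520:532] += 2.0 * (labels - 769)[:, None]

accuracy = segment_cv_accuracy(epochs, labels)
print(accuracy.shape, accuracy.argmax())
```

Plotting `accuracy` against segment index (or time) gives a curve comparable in spirit to the example, and the same loop works with features instead of raw samples.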
___
##### **Task 3**
In Tasks 1 and 2, you already compared the differences between classes for each individual subject. In Task 3, please conduct an analysis across subjects. Refer to your findings from Task 1 and Task 2, and discuss the relevant observations in your presentation. In this task, statistical methods are not required. Good visualization approaches may help you compare results. For example, you can overlay or align the results of different participants.
___
### Bonus
In the `Assignment 4` folder, you will see another folder named `bonus`. In this folder, there are also three data files named `subject1_bonus.npz`, `subject2_bonus.npz`, and `subject3_bonus.npz`. Their names correspond one-to-one with the previous files. The difference is that the event types (769, 770, 771, 772) in these files have all been changed to **783**.
You can use the results of your previous statistical analysis to determine which of the classes (769, 770, 771, 772) each 783 event truly corresponds to. Alternatively, you can use machine learning (ML) or deep learning (DL) methods to build a classification model for each subject. Extract each 783 event and input it into the model for prediction.
If you take on the bonus challenge, please be sure to include the **following table** in your presentation and replace each **?** with the actual classes (**769, 770, 771, 772**).
**Table**
| | subject1_bonus | subject2_bonus | subject3_bonus |
| --------- | -------------- | -------------- | -------------- |
| 1st 783 | 769 | 769 | 769 |
| 2nd 783 | 770 | 770 | 770 |
| 3rd 783 | 770 | 770 | 770 |
| 4th 783 | 769 | 769 | 769 |
| 5th 783 | 770 | 770 | 770 |
### Grading Criteria:
* Report quality (50%)
* Analysis results (50%)
* Bonus (30%) - 2 points for each correct answer
If you encounter any difficulties with this assignment, don't hesitate to ask questions on Moodle or seek assistance from the TA.
TA email: P76121152@gs.ncku.edu.tw (陳柏儒)