# Understanding the Dataset
Data link: https://drive.google.com/drive/folders/1O2S2Ej15L-szub_ZHdm86VTlnnok1n79 (provided by TA)
The folder contains 8 datasets, one per subject: subj01 to subj08.
Each dataset is organized as follows:
- training split
    - fMRI
    - images
- test split
    - fMRI
- ROI mask
    - space
    - mapping
The sections below describe each file and show how to read it.
Start by adding the following code:
```python=
from google.colab import drive  # mount Google Drive so Colab can access it
drive.mount('/content/drive')
import numpy as np
import csv
```
All file paths below are shortened, so copying them as-is will fail.
The repeated prefix is replaced with '..'. For example:
```python=
np.load('../subj01/training_split/training_fmri/lh_training_fmri.npy')
```
may need to be changed to
```python=
np.load('/content/drive/MyDrive/2023-Machine-Learning-Dataset/subj01/training_split/training_fmri/lh_training_fmri.npy')
```
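To avoid retyping the prefix, one option is to keep the dataset root in a single variable and join the shortened part onto it. This is only a minimal sketch: `DATA_ROOT` and `full_path` are hypothetical names, and the root shown is the example path above, which may differ in your own Drive.

```python
import os

# Assumed location of the dataset folder in your Drive; adjust as needed.
DATA_ROOT = '/content/drive/MyDrive/2023-Machine-Learning-Dataset'

def full_path(short_path):
    """Turn a shortened '../...' path from this note into a full path."""
    # Drop the leading '../' placeholder, then join onto the dataset root.
    relative = short_path[3:] if short_path.startswith('../') else short_path
    return os.path.join(DATA_ROOT, relative)

example = full_path('../subj01/training_split/training_fmri/lh_training_fmri.npy')
print(example)
```

With this helper, the shortened paths in the rest of this note can be passed through `full_path(...)` before calling `np.load`.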
## fMRI
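.npy files, one per hemisphere: lh and rh (left and right hemispheres)
Each is a 2-D array
Rows are images: 5000 in the training set, 150 in the test set
Columns are voxels: 19004 for lh, 20544 for rh
e.g. training lh (5000, 19004), test rh (150, 20544)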
```python=
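# Left-hemisphere training fMRI: one row per image, one column per voxel
lh_train_fmri = np.load('../subj01/training_split/training_fmri/lh_training_fmri.npy')
print(lh_train_fmri.shape)
print(lh_train_fmri)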
```
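If a model should predict both hemispheres at once, the two arrays can be joined along the voxel axis. The sketch below assumes that the lh and rh files list the images in the same row order, and it uses small random stand-in arrays instead of the real files, so the shapes are illustrative only.

```python
import numpy as np

# Stand-ins for the lh and rh training fMRI arrays.
# Real shapes would be (5000, 19004) and (5000, 20544); these are shrunk.
rng = np.random.default_rng(0)
lh_fmri = rng.standard_normal((5, 19)).astype(np.float32)
rh_fmri = rng.standard_normal((5, 20)).astype(np.float32)

# Rows are images, so both arrays must have the same number of rows.
assert lh_fmri.shape[0] == rh_fmri.shape[0]

# Concatenate along axis 1 (voxels): lh columns first, then rh columns.
both_fmri = np.concatenate([lh_fmri, rh_fmri], axis=1)
print(both_fmri.shape)  # (5, 39)

# Split back using the lh voxel count as the boundary.
n_lh = lh_fmri.shape[1]
lh_again, rh_again = both_fmri[:, :n_lh], both_fmri[:, n_lh:]
```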
## ROI mask
1-D arrays whose values identify the region each element belongs to (see mapping)
The following files are included:
> all-vertices_fsaverage_space
floc-bodies_fsaverage_space
floc-bodies_space
floc-faces_fsaverage_space
floc-faces_space
floc-places_fsaverage_space
floc-places_space
floc-words_fsaverage_space
floc-words_space
prf-visualrois_fsaverage_space
prf-visualrois_space
streams_fsaverage_space
streams_space
```python=
# Left-hemisphere all-vertices mask in fsaverage space
roi_mask = np.load('../subj01/roi_masks/lh.all-vertices_fsaverage_space.npy')
print(roi_mask.size)
print(roi_mask)
```
size:
with fsaverage (163842)
lh without fsaverage (19004)
rh without fsaverage (20544)
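
The masks without fsaverage have the same length as the voxel columns of the fMRI array for that hemisphere, which suggests they can index those columns directly. The following is a sketch under that assumption, with tiny made-up arrays standing in for the real fMRI data and for a mask such as streams_space.

```python
import numpy as np

# Stand-in for an lh fMRI array: 4 images x 6 voxels (real: 5000 x 19004).
fmri = np.arange(24, dtype=np.float32).reshape(4, 6)

# Stand-in for a mask without fsaverage: one integer label per voxel,
# where 0 means 'Unknown' in the mapping files.
roi_mask = np.array([0, 1, 1, 5, 0, 5])

# Keep only the voxel columns labelled with region 1.
region_label = 1
region_fmri = fmri[:, roi_mask == region_label]
print(region_fmri.shape)  # (4, 2)

# Count how many voxels each label covers.
labels, counts = np.unique(roi_mask, return_counts=True)
voxels_per_label = dict(zip(labels.tolist(), counts.tolist()))
print(voxels_per_label)  # {0: 2, 1: 2, 5: 2}
```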
### mapping
6 files, each corresponding to a different ROI mask
They describe which region each value in the ROI mask stands for
```python=
# allow_pickle=True is required because the file stores a Python dict
mapping = np.load('../subj01/roi_masks/mapping_streams.npy', allow_pickle=True)
print(mapping)
```
> mapping_floc-bodies
{0: 'Unknown', 1: 'EBA', 2: 'FBA-1', 3: 'FBA-2', 4: 'mTL-bodies'}
>
> mapping_floc-faces
{0: 'Unknown', 1: 'OFA', 2: 'FFA-1', 3: 'FFA-2', 4: 'mTL-faces', 5: 'aTL-faces'}
>
> mapping_floc-places
{0: 'Unknown', 1: 'OPA', 2: 'PPA', 3: 'RSC'}
>
> mapping_floc-words
{0: 'Unknown', 1: 'OWFA', 2: 'VWFA-1', 3: 'VWFA-2', 4: 'mfs-words', 5: 'mTL-words'}
>
> mapping_prf-visualrois
{0: 'Unknown', 1: 'V1v', 2: 'V1d', 3: 'V2v', 4: 'V2d', 5: 'V3v', 6: 'V3d', 7: 'hV4'}
>
> mapping_streams
{0: 'Unknown', 1: 'early', 2: 'midventral', 3: 'midlateral', 4: 'midparietal', 5: 'ventral', 6: 'lateral', 7: 'parietal'}
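
When a dict is saved with `np.save`, `np.load(..., allow_pickle=True)` returns it wrapped in a 0-dimensional object array, and `.item()` unwraps it into a regular dict. The sketch below assumes the mapping files were saved that way (the need for `allow_pickle=True` suggests so). To stay runnable without the dataset, it first writes a copy of mapping_streams to a temporary file.

```python
import os
import tempfile
import numpy as np

# Recreate mapping_streams locally so the example runs without the dataset.
streams = {0: 'Unknown', 1: 'early', 2: 'midventral', 3: 'midlateral',
           4: 'midparietal', 5: 'ventral', 6: 'lateral', 7: 'parietal'}
tmp_path = os.path.join(tempfile.mkdtemp(), 'mapping_streams.npy')
np.save(tmp_path, streams)

# A pickled dict comes back as a 0-d object array; .item() unwraps it.
loaded = np.load(tmp_path, allow_pickle=True)
mapping = loaded.item()

# Invert the mapping to look up a label by region name.
name_to_label = {name: label for label, name in mapping.items()}
print(loaded.shape, name_to_label['ventral'])  # () 5
```

The label found this way can then be compared against the values of the matching ROI mask to pick out one region.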
## image infos
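Provided separately by the TA, independent of the 8 datasets
csv file
2-D array, shape (5001, 135)
Rows correspond to the 5000 training images; the one extra row is presumably the header
Columns are features, e.g. 'person', 'bicycle' ...
Values are 0 or 1, indicating whether the image has that feature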
```python=
dataroot = '../image_infos/subj01_infos_train.csv'
# csv.reader yields lists of strings, so the resulting array has a string dtype
with open(dataroot, newline='') as csvfile:
    infos = np.array(list(csv.reader(csvfile)))
print(infos.shape)
print(infos)
```
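Because `csv.reader` returns every cell as a string, picking out the images that have a given feature means separating the header first and comparing against `'1'`. A rough sketch with a tiny in-memory CSV; the values are made up, and treating the first row as column names is an assumption about the real file.

```python
import csv
import io
import numpy as np

# Tiny in-memory stand-in for subj01_infos_train.csv (made-up values).
fake_csv = io.StringIO("person,bicycle\n1,0\n0,0\n1,1\n")
rows = np.array(list(csv.reader(fake_csv)))

# Assumption: the first row holds the column names.
header, body = rows[0], rows[1:]

# csv.reader yields strings, so compare against '1' rather than 1.
col = header.tolist().index('person')
has_person = body[:, col] == '1'
print(body.shape, int(has_person.sum()))  # (3, 2) 2

# Row indices (0-based, header excluded) of images with a person.
person_rows = np.flatnonzero(has_person)
print(person_rows)  # [0 2]
```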