# fmri2text
- mind retreat
- https://gitlab.inria.fr/parietal/non_continuous_decoding
- design matrix of LM features, BOLD encoding
- generative model; missing predicted / actual BOLD comparison

## 24 October 2022
- init GitHub repo
    - [x] refactor utils file @athual
    - [x] fix FIR util @ccaucheteux
    - [x] run decoding on multiple TRs @ccaucheteux
- setup narratives on dragostore
    - [x] get mean subject, left and right
    - [ ] script datalad configuration
        - certificate error preventing download
- project narrative subjects on FUGW barycenter
    - alignment data
        - can it be the same as the encoding data?
    - idea
        - PCA on LM features, trained on text that is independent from our transcripts
        - align subjects using beta coefficients of an encoding model fitted on PCs
        - this would allow aligning subjects who listened to different stories?
    - modules
        - barycenter
            - inputs
                - list of subjects
                    - v0: who were scanned listening to the same story
                    - v1: who have been scanned listening to different stories
                - story name
            - outputs
                - OT plan from each subject to the barycentric subject
                - subject's features transported to the barycentric subject
                - PCA transformer fitted on LM features
        - alignment
            - inputs
                - subject who was scanned listening to the story used to compute the barycenter
                - barycentric subject (BOLD)
                - PCA transformer fitted on LM features
            - outputs
                - OT plan from subject to barycentric subject
                - subject's BOLD transported to the barycentric subject
        - encoding
            - inputs
                - list of aligned BOLD matrices (n_tr, n_voxels)
                - transcript for each matrix
            - outputs
                - encoding transformer
- Alexis:
    - avg (exp 1): {story: bold}, bold of shape `[tr, voxels]`
    - single (exp 3-5): `Fugw().transform()`: `[tr, voxels] -> [tr, voxels]`
- Charlotte:
    - debug features in decode/encode (notebook)
    - move everything to scripts
        - `utils.py` (`def build_design_matrix()`)
        - `encode.py`
        - `decode.py` (decode, decode_one_step)
        - `0_run_exp_avg_subject.py`
            - encoder trained on the voxel-wise average of multiple subjects, eval on a new story
        - `1_run_exp_fugw_avg_subject.py`
            - encoder trained on the voxel-wise average of multiple subjects after they have been aligned with FUGW, eval on a new story
        - `2_run_exp_one_subject.py`
            - encoder trained on one subject, eval on the same or a different subject
        - `3_run_exp_one_subject_fugw.py`
            - encoder trained on one subject, eval on a different subject aligned with FUGW
        - `4_run_exp_concat_subject.py`
            - encoder trained on concatenated subjects, eval on a left-out subject
        - `5_run_exp_concat_fugw_subject.py`
            - encoder trained on concatenated subjects after they have been aligned with FUGW, eval on a left-out subject
    - multiple stories
        - out-of-story generalisation
    - other generative LM
- Tuesday 25 October / Wednesday:
    - discuss `paths.py`
    - `eval.py`

```python
# eval.py
def eval(true_texts, decoded_texts):
    """
    true_texts: list of str of shape [n]
    decoded_texts: list of str of shape [n]
    """
    return bleu, meteor, wer, bert
```

## 25/26 October 2022 (Charlotte)

**Questions for the 26th:**
- define the problem statement / contribution + experiments + outputs (e.g. plots)
- what exactly do we want to evaluate on (how many TRs)?
- Questions on the Huth paper:
    - are we sure that their model is not multi-subject?
    - how many TRs for the eval?
    - how many candidates K?
    - which generative language model?

TODO:
- [x] start FIR at TR=1 or 2
- [x] add WER in eval
- [x] avg, out of story. Test on a ~1,000-word story.
- [x] Fix eval (predictions should be a long string, or at least several TRs; see the metric sketch below)
- [x] single subject
- [x] concat subject
- [ ] code beam decoder
- [ ] add encoder eval
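The `eval.py` signature above and the "add WER in eval" item call for concrete metrics. A minimal sketch with off-the-shelf scorers, assuming `nltk` (with the `wordnet` data downloaded) and `jiwer` are installed; the name `eval_texts` is illustrative and `bert_score` is omitted for brevity:

```python
# Sketch only: BLEU / METEOR / WER over per-sample strings
# (each string should span several TRs, per the "Fix eval" item above).
import numpy as np
from jiwer import wer
from nltk.translate.bleu_score import sentence_bleu
from nltk.translate.meteor_score import meteor_score


def eval_texts(true_texts, decoded_texts):
    """true_texts, decoded_texts: lists of str of shape [n]."""
    scores = {"bleu": [], "meteor": [], "wer": []}
    for ref, hyp in zip(true_texts, decoded_texts):
        ref_tok, hyp_tok = ref.split(), hyp.split()
        scores["bleu"].append(sentence_bleu([ref_tok], hyp_tok))
        # recent nltk versions expect pre-tokenized inputs for METEOR
        scores["meteor"].append(meteor_score([ref_tok], hyp_tok))
        scores["wer"].append(wer(ref, hyp))
    return {k: float(np.mean(v)) for k, v in scores.items()}
```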
## Huth paper

**Important points to discuss:**
- code/data availability soon
- cross-subjects story perception. They *do* align subjects (supp. table 4).
- eval:
    - they show short segments, but say they evaluate by generating 1,800 words in a row??
    - empty starting point + no context at all? Strange, no?
    - written: *"The language model is provided with the words that occur in last 8 seconds of the candidate"*

**Eval setups (always out of story):**
- story perception
- story imagination
- movie watching
- cross-subjects story perception: supp. table 4

**Methodological points to include:**
- [x] starting at t=2s (start_tr=1 or 2)
- [x] add WER in eval
- [x] add beam in decoder

## 28 October 2022 (Charlotte)

Handling of infinite/NaN values in decoding:
- how to proceed if NaN / infinite / negative values appear in the scores: how to score? how to normalize?
- z-score / softmax

## 29 October (Charlotte)

TODO:
- [x] check NLL in generate / check infinite values in NLL / decide how to normalize (softmax for NLL, min-max for brain score)
- [x] Multi-story
- [ ] Check sklearn's partial_fit
- [ ] Organise repo:
    - `models/`
        - `feature_extractor.py`: from array of text to design matrix. `.fit()`, `.transform()`
        - `bayse_decoder.py`: from (context, bold) to text
        - `end_to_end_decoder.py`: from (context, bold) to text
    - `align/`
        - `fugws.py`
        - `baseline_aligners.py`: from X, y, subjects to X, y. `.fit()`, `.transform()`
    - `data/`
        - `narratives.py`: Dataset with `__getitem__(session)` and `.sessions`, `.metadata`, `.subjects`, `.stories` / `.tasks`
        - `loader.py`
        - `concat_datasets.py`: from per-session dataset to multi-session, multi-subject datasets
    - `experiments/`

### 31 October (Charlotte)

TODO:
- [x] Look at the avg_sub results
- [x] Launch single_sub
- [ ] Launch multi-sub. Check how far off we are in RAM for multi-subject in a single model
- [ ] Clean pipeline? (with a `models/`)
- [ ] In models, introduce a partial_fit estimator
- [x] Article for La Recherche
- [ ] NeurIPS poster

### 3 November (Charlotte)

- [x] launch with start_tr=2 to have a fair baseline model
- [x] Add error bars in eval to select the best predicted sentences
- [ ] Launch on all subjects with SGDRegressor (RAM out of memory => need to switch to partial_fit and a loader)
- models: add a subject embedding layer
    - sklearn-based with a custom SGDRegressor? OK for SGDRegressor
    - torch-based
- [ ] TorchDataset and concat_collator
- [ ] Factorize models
- [ ] Add linear torch model
- [ ] Hydra

### 7 November

**scale exp**
- [ ] torch dataset and batch loader
- [ ] partial_fit sklearn estimator
- [ ] nn.Linear model with PyTorch Lightning
- [ ] subject embedding layer

**check current**
See what is best as a metric to score language generations (one possibility is sketched below):
- what voxel mask / weights? (var mask, R mask, R² mask)
- MSE or something else? What could be used?
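One possible answer to the mask/metric question above, as referenced there: restrict the MSE to well-encoded voxels (an R mask) and weight them by their encoding R. The `r_scores` input, the 0.1 threshold, and the weighting scheme are all assumptions, not the repo's settled choice:

```python
# Sketch only: masked, R-weighted (negative) MSE between recorded and
# predicted BOLD; higher score = better match.
import numpy as np


def brain_score(bold_true, bold_pred, r_scores, r_threshold=0.1):
    """bold_true, bold_pred: [n_tr, n_voxels]; r_scores: per-voxel encoding R."""
    mask = r_scores > r_threshold  # R mask; a var or R2 mask would also work
    err = (bold_true[:, mask] - bold_pred[:, mask]) ** 2
    # weight reliable voxels more heavily
    return -float((err * r_scores[mask]).mean())
```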
## 31 October (Common)

**Summary**

Next meeting: Thursday the 10th

Next steps for the 10th:
- Alexis + Alex: alignment (avg subject, then beta values)
- Alex:
    - [ ] integrate IBC LPP into data
    - [ ] check that the data are clean
- Charlotte:
    - [x] *pipeline* without alignment: avg_sub + single_sub (a single subject)
    - [ ] *results* without alignment: avg_sub + single_sub
    - [x] look at how to adapt to a large *multisub* dataset (torch / sklearn's partial_fit) (Wednesday)
    - [x] add *Huth data* (Thursday)
    - [ ] *clean* pipeline with `models/` and `data/` in a new branch (Friday)
    - [ ] Thursday: look at the results + debug the Huth dataset (with Alexis)
    - [ ] Friday: clean pipeline with `models/`

**First results on average subjects**

**Methods**
- For each story, average BOLDs across subjects
- Fit on n-1 stories, test on the last story
- Evaluate encoding and decoding on the left-out story
- Encoding eval: R score for each voxel
- Decoding procedure (sketched in code after the figures below):
    - start with an empty context
    - given the context, generate n_gen=30 possible continuations using GPT-2 (for now, we use the *true* number of words to generate)
    - compute the brain score of each possible generation (MSE this time, because we want one measure per sample)
    - keep the n_beam=10 best in terms of brain score ("lm_and_brain", blue) or perplexity ("lm_only", red)
    - update the context with the segments kept
- Decoding eval: bleu / rouge / bertscore / meteor

**Encoding results**

<img src="https://i.imgur.com/4sDM4YE.png" alt="drawing" width="200"/>

![](https://i.imgur.com/EdA4x5X.png)

**Decoding results**

![](https://i.imgur.com/wtQD9hy.png)

**Decoding grid**

default params: `{'n_beam': 10, 'n_gen': 30, 'start_tr': 0, 'decode': 'beam', 'n_steps': 100}`

![](https://i.imgur.com/9X11E3l.png)
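A minimal sketch of one decoding step from the procedure above. `feature_extractor.transform` and `encoder.predict` stand in for the repo's design-matrix builder and fitted ridge encoder (their exact signatures are assumptions); the generation parameters mirror the defaults listed above.

```python
# Sketch only: generate n_gen continuations per candidate context,
# score each by MSE against the recorded BOLD, keep the n_beam best.
import numpy as np
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")


def decode_step(contexts, bold_tr, feature_extractor, encoder,
                n_gen=30, n_beam=10, n_words=5):
    """Extend each candidate context; keep the n_beam best by brain score."""
    candidates = []
    for ctx in contexts:
        # empty contexts are seeded with the BOS token
        inputs = tokenizer(ctx if ctx else tokenizer.bos_token,
                           return_tensors="pt")
        gen = lm.generate(**inputs, do_sample=True,
                          num_return_sequences=n_gen,
                          max_new_tokens=2 * n_words,
                          pad_token_id=tokenizer.eos_token_id)
        candidates += [tokenizer.decode(g, skip_special_tokens=True)
                       for g in gen]
    # one MSE per candidate: predicted vs recorded BOLD for this TR
    feats = feature_extractor.transform(candidates)   # [n_cand, dim]
    pred = encoder.predict(feats)                     # [n_cand, n_voxels]
    mse = ((pred - bold_tr[None]) ** 2).mean(axis=1)
    best = np.argsort(mse)[:n_beam]
    return [candidates[i] for i in best]
```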
### Experiment example
```python=
def run_exp():
    # Data
    X, y, groups = NarrativeDataset().dataset  # X=bolds, y=texts, groups=subjects/stories
    train, test = train_test_split(X, y=y, groups=groups)

    # Model
    model = Model(**model_params)

    # -- Train --
    model.fit(X[train], y[train], groups[train], **fit_params)

    # -- Eval --
    # eval fitting metrics
    run_eval_encoder(model.encoder, X[test], y[test])
    # eval decoded texts
    predicted_texts = model.predict(X[test])
    metrics, df = run_eval_texts(predicted_texts, y[test])
    return model, metrics, df
```

### Feature extractor class
```python=
class FeatureExtractor(BaseTransformer):
    def __init__(self):
        self.fir = FirTransformer()
        self.nlp_tokenizer = AutoTokenizer.from_pretrained(conf.nlp_model_name)
        self.nlp_model = AutoModelForCausalLM.from_pretrained(conf.nlp_model_name)

    def fit(self, X, y=None):
        return self

    def transform(self, texts, y=None):
        features = build_design_matrix(
            texts, model=self.nlp_model, tokenizer=self.nlp_tokenizer, **params
        )
        return features

    def fit_transform(self, X, y=None):
        self.fit(X, y=y)
        return self.transform(X, y=y)
```

### Model class
```python=
class SklearnEncoder(object):
    def __init__(self, conf):
        self.y_pipe = self.build_y_pipeline(conf)
        self.pipe = self.build_pipeline(conf)

    def build_pipeline(self, conf):
        steps = [
            ("scaler", StandardScaler()),
            ("ridge", RidgeCV(np.logspace(-3, 6, 10))),
        ]
        return Pipeline(steps)

    def build_y_pipeline(self, conf):
        y_steps = [
            ("scaler", RobustScaler(quantile_range=(0.1, 99.9))),
        ]
        return Pipeline(y_steps)

    def fit(self, X, y):
        y = self.y_pipe.fit_transform(y)
        self.pipe.fit(X, y)
        return self

    def predict(self, X):
        y_pred = self.pipe.predict(X)
        y_pred = self.y_pipe.inverse_transform(y_pred)
        return y_pred


class BayseModel(object):
    def __init__(self, conf):
        self.conf = conf
        # Feature extractor (text -> design matrix)
        self.feature_extractor = FeatureExtractor(**conf.feature_params, **conf.fir_params)
        # Encoder
        self.encoder = SklearnEncoder(**conf.encoder_params)
        # Decoder
        self.decoder = BayseDecoder(**conf.decoder_params)

    def fit(self, bolds, texts):
        # Learn encoding matrix
        features = self.feature_extractor.fit_transform(texts)
        self.encoder.fit(features, bolds)

    def decode(self, bolds, prev_texts, **decoder_kwargs):
        """Generate new text given a series of bolds and one context"""
        decoded = _decode(self.encoder, bolds, prev_texts, **decoder_kwargs)
        return decoded

    def evaluate(self, bolds, texts, start_tr=1):
        """
        Eval encoder (R)
        Generate sequences given start_tr
        Eval generations
        """
        encoder_metrics = _run_eval_encoder(self.encoder, bolds, texts)
        decoded_texts = self.decode(bolds[start_tr:], texts[:start_tr])
        decoder_metrics = _run_eval_texts(decoded_texts, texts[start_tr:])
        decoder_metrics["context"] = texts[:start_tr]
        return decoder_metrics

    def save(self):
        # Only need to save the sklearn encoder params and the nlp conf
        save_model(self.encoder)
        save_conf(self.conf)

    def load(self, file):
        # rebuild the model from the saved encoder params and conf
        ...
```

## Code base
`utils.py` can be split into:
- `features.py` (get_features, generate_continuations)
- `fir.py` (design_matrix, apply_fir)
- `data.py` (get_data)

`fmri/narratives.py` should be renamed into `data/narratives.py`. Other files in `fmri/` can be deleted except exclude_scans. Eventually, `fmri/narratives.py` should disappear and be merged with some data utils.

## Datasets
Huth: https://openneuro.org/datasets/ds003020/versions/1.0.2

## From events to design_matrix @athual
```python
import pandas as pd
import numpy as np


def get_texts_from_events(events, extra_trs=5, text_col="word_raw", tr=2):
    """
    events: a dataframe of shape [n_words], with columns:
    - onset
    - word_raw (or text_col)
    """
    # Drop rows without text
    events = events.dropna(subset=[text_col])

    # Aggregate words by fMRI scan
    events["scan"] = (events["onset"].interpolate().astype(float) // tr).astype(
        int
    )
    assert not events["scan"].isna().any()
    events = events.groupby("scan")[text_col].agg(lambda x: " ".join(x).strip())

    # Subselect non-empty scans
    min_tr = int(events.index.min())
    max_tr = int(events.index.max() + extra_trs)
    events = events.loc[min_tr:max_tr]

    # Re-align scans and text
    text = np.empty(max_tr, dtype=f"<U{events.apply(len).max() + 1}")
    for time_frame in events.index:
        text[time_frame] = events.loc[time_frame]
    text = text[min_tr:]
    return text


if __name__ == "__main__":
    from transformers import AutoTokenizer, AutoModelForCausalLM
    from src.utils.utils import get_features, build_design_matrix

    events = pd.DataFrame(
        {"onset": np.arange(0, 100, 0.5), "word_raw": np.arange(200).astype(str)}
    )

    # Aggregate text by TR
    texts = get_texts_from_events(events)

    # Load model
    model_name = "gpt2"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Get GPT-2 features
    features = build_design_matrix(texts, model=model, tokenizer=tokenizer)
    print(features.shape)
```
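The design matrix built by `build_design_matrix` stacks FIR delays of the LM features (cf. the weight shape `[voxels, dim_gpt x n_fir_delays]` printed in the next example). A minimal sketch of such a lag-and-stack FIR step; the `apply_fir` name and the delay values are illustrative, not necessarily the repo's:

```python
# Sketch only: lagged copies of the feature matrix so the ridge encoder
# can fit a voxel-wise haemodynamic response.
import numpy as np


def apply_fir(features, delays=(1, 2, 3, 4)):
    """features: [n_tr, dim] -> design matrix [n_tr, dim * n_fir_delays]."""
    n_tr, dim = features.shape
    design = np.zeros((n_tr, dim * len(delays)))
    for i, d in enumerate(delays):
        # column block i holds the features shifted forward by d TRs
        design[d:, i * dim:(i + 1) * dim] = features[: n_tr - d]
    return design


if __name__ == "__main__":
    feats = np.random.randn(100, 768)  # e.g. GPT-2 features per TR
    print(apply_fir(feats).shape)      # (100, 3072)
```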
print(f"Loading {model_name} model") tokenizer = AutoTokenizer.from_pretrained(model_name) model = AutoModelForCausalLM.from_pretrained(model_name) if torch.cuda.is_available(): model.to(device) # Encoding print( f"Fitting encoder on {len(train)} samples," f" {len(np.unique(stories[train]))} stories." ) encoder = encode( texts[train], bolds[train], nlp_model=model, nlp_tokenizer=tokenizer, bold_pca=0, ) # Extract weights weights = encoder["encoding_model"]["ridge"].coef_ return weights if __name__ == "__main__": weights = get_beta_coef("sub-004") print("Weights of shape [voxels, dim_gpt x n_fir_delays]: ", weights.shape) ``` ## FUWGS - SRM to get avg subject - beta to get single subject ## Useful links @athual - LM Generation https://huggingface.co/blog/how-to-generate - Eternal Terminal https://eternalterminal.dev/usermanual/ ## Results (02/11/2022) ### Average subject **Encoding** <img src="https://i.imgur.com/6WImHha.png" alt="drawing" width="300"/> ![](https://i.imgur.com/rAh4AH9.png) **Decoding** ![](https://i.imgur.com/CHmpjZU.png) ### Single-subject **Encoding** <img src="https://i.imgur.com/GJFNkn8.png" alt="drawing" width="300"/> **Decoding** ![](https://i.imgur.com/UkugMxo.png) ### Multi-subject (train), left-out subject same story (test) **Encoding** <img src="https://i.imgur.com/h9PP6QB.png" alt="drawing" width="300"/> <img src="https://i.imgur.com/CMHeKVA.png" alt="drawing" width="300"/> **Decoding** ![](https://i.imgur.com/H9aGr9i.png) **Conclusion** On Narratives: * *Encoding* good pour average et multi. Relatively bad for single. Rmk: scores relatively bad for PrettyMouth et Lucy * *Decoding* issue. Seems to always yiels the same results. Error in code? Wordlen fixed? On Lebel: issue with alignment/raw data projected. # Decoding end-to-end ## Options * **Option 1 - Replace** + Linearly predict `text_embeddings` given fMRI using sklearn + Replace activations with the predicted `text_embeddings` + Issues: How to predict different words? Shouldn't we predict directly sentence embeddings? * **Option 2 - CrossAttention** + Add a cross-attention layer to a generative model. The model has to be generative. + e.g. with GPT-2. Start with context. * **Option 3 - Guidance** + Optimize activations to best predict the condition. 
## Option 2: Cross Attention
```python
class BoldSpatialReducer(nn.Module):
    """
    Project fMRI into smaller dimensionality (voxels -> dim)
    """

    def __init__(self,
                 bold_dim_in=40962,
                 bold_dim_out=768,
                 bold_dim_hidden=64):
        super().__init__()
        self.bold_dim_in = bold_dim_in
        self.bold_dim_out = bold_dim_out
        self.spatial_reducer = nn.Sequential(
            nn.Linear(bold_dim_in, bold_dim_hidden),
            nn.Linear(bold_dim_hidden, bold_dim_out),
        )

    def forward(self, bold):
        """from [T, V] to [T, D]"""
        out = self.spatial_reducer(bold)  # [TR, V] -> [TR, D]
        return out


class fMRIConditionalGPTDecoder(ConditionalGPTDecoder):
    def __init__(self, config,
                 cross_attention_layers=(),
                 freeze_mlp=True,
                 add_fmri=False,
                 bold_dim=40962,
                 bold_dim_hidden=64,
                 init_fmri_std=None,
                 max_fmri_position_embeddings=40,
                 add_fmri_position_embedding=True,
                 sinusoidal_fmri_embeddings=True):
        super().__init__(config,
                         cross_attention_layers=cross_attention_layers,
                         freeze_mlp=freeze_mlp)

        # Add fMRI layer
        self.add_fmri = add_fmri
        if self.add_fmri:
            self.bold_dim = bold_dim
            self.fmri_layer = BoldSpatialReducer(
                bold_dim_in=self.bold_dim,
                bold_dim_out=self.config.n_embd,
                bold_dim_hidden=bold_dim_hidden,
            )
            init_weights(self.fmri_layer, std=self.config.initializer_range)

        # Add positional embedding for fMRI
        self.add_fmri_position_embedding = add_fmri_position_embedding
        if self.add_fmri_position_embedding:
            if sinusoidal_fmri_embeddings:
                # Set gradient to False
                self.fmri_pe = SinusoidalEmbedding(self.config.n_embd)
            else:
                self.fmri_pe = nn.Embedding(max_fmri_position_embeddings,
                                            self.config.n_embd)
                init_weights(self.fmri_pe, std=self.config.initializer_range)

    def forward(self, *args, fmri=None, fmri_positions=None, **kwargs):
        assert ("encoder_hidden_states" not in kwargs
                or kwargs["encoder_hidden_states"] is None), \
            "Encoder hidden states should be None"
        if fmri is not None:
            assert self.add_fmri

        if self.add_fmri and (fmri is not None):
            # Project fMRI onto GPT space
            B, T, V = fmri.shape
            assert V == self.bold_dim
            fmri_embeds = self.fmri_layer(fmri.reshape(B * T, V))
            fmri_embeds = fmri_embeds.reshape(B, T, self.config.n_embd)
            if self.add_fmri_position_embedding:
                if fmri_positions is None:
                    fmri_positions = torch.arange(T).long().to(fmri_embeds.device)
                fmri_embeds += self.fmri_pe(fmri_positions)[None]  # [B, T, D]
        else:
            fmri_embeds = None

        kwargs["encoder_hidden_states"] = fmri_embeds
        out = super().forward(*args, **kwargs)
        return out

    def prepare_inputs_for_generation(self, *args, fmri=None,
                                      fmri_positions=None, **kwargs):
        """This is needed for generation"""
        output = super().prepare_inputs_for_generation(*args, **kwargs)
        output.update({"fmri": fmri, "fmri_positions": fmri_positions})
        return output
```
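`SinusoidalEmbedding` is used above but not defined in these notes. A minimal sketch of a standard fixed sinusoidal table consistent with the call sites (`self.fmri_pe(fmri_positions)` returning `[T, D]`, no gradient, per the comment above); the repo's actual implementation may differ:

```python
# Sketch only: classic sin/cos position table stored as a buffer.
import math
import torch
import torch.nn as nn


class SinusoidalEmbedding(nn.Module):
    """Fixed sinusoidal position table (gradient-free)."""

    def __init__(self, dim, max_len=512):
        super().__init__()
        pos = torch.arange(max_len, dtype=torch.float32)[:, None]
        freq = torch.exp(torch.arange(0, dim, 2, dtype=torch.float32)
                         * (-math.log(10000.0) / dim))
        pe = torch.zeros(max_len, dim)
        pe[:, 0::2] = torch.sin(pos * freq)
        pe[:, 1::2] = torch.cos(pos * freq)
        self.register_buffer("pe", pe)  # buffer => no gradient

    def forward(self, positions):
        # positions: LongTensor [T] -> embeddings [T, dim]
        return self.pe[positions]
```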