# NMA project
:::: info
**Team**

**Meeting link**: [Zoom here](https://zoom.us/j/7257005725?pwd=eFd2YVNUaHUybWgvaUVhNURwaERNdz09#success)
**Presentation**: [Google Slides link](https://docs.google.com/presentation/d/1MU72KmShql9Y-8keIqsJoZnjTlHHiq1BxwFFtpIrWEU/edit?usp=sharing)
## Notebooks
[Mishka Nemes](https://colab.research.google.com/drive/19u5784S0lr3xOTCxfxgMdQqIV0Kp4qhn#scrollTo=m85TRSrDrcnb)
[Aslan Satary Dizaji](https://colab.research.google.com/drive/1wPfLitOUPl1KRi1QF56rWWz-qSkrH2lW?usp=sharing)
::::
:::success
### Papers
[HCP manual](https://www.humanconnectome.org/storage/app/media/documentation/s1200/HCP_S1200_Release_Reference_Manual.pdf)
[Function in the Human Connectome: Task-fMRI and Individual Differences in Behavior](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4011498/)
[A rapid fMRI task battery for mapping of visual, motor, cognitive, and emotional function](https://www.sciencedirect.com/science/article/abs/pii/S1053811905025498)
:::
## TA meeting 13 August 2021
- train the agent on all the data we have, then test it on synthetic data (see the synthetic-block sketch below)
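A minimal sketch of how such a synthetic test block might be generated, assuming the block structure from the 6 August notes (10 trials per block, 2 of them 2-back matches). The number of distinct stimuli per block (`n_stimuli`) is a placeholder, and within a block stimuli would come from a single HCP category (body parts, faces, places, or tools); here they are just abstract integer IDs.

```python
import numpy as np


def synthetic_2back_block(n_trials=10, n_targets=2, n_stimuli=8, rng=None):
    """Generate one synthetic 2-back block: stimulus indices and match labels."""
    if rng is None:
        rng = np.random.default_rng()
    stimuli = np.empty(n_trials, dtype=int)
    is_match = np.zeros(n_trials, dtype=bool)
    # pick which trial positions (index >= 2) are forced to be 2-back matches
    targets = set(rng.choice(np.arange(2, n_trials), size=n_targets, replace=False))
    for t in range(n_trials):
        if t in targets:
            stimuli[t] = stimuli[t - 2]  # repeat the stimulus from two trials back
            is_match[t] = True
        else:
            # guarantee a non-match by excluding the 2-back stimulus
            options = [s for s in range(n_stimuli) if t < 2 or s != stimuli[t - 2]]
            stimuli[t] = rng.choice(options)
    return stimuli, is_match


stimuli, is_match = synthetic_2back_block()
print(stimuli, is_match.astype(int))  # 8 non-match trials and 2 match trials
```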
## TA meeting 11 August 2021
- how to implement the Acme agents?
- size of dataset: is it sufficient? Yes, depending on the data quality and the model we're trying to fit
- 4 conditions (body parts, faces, places, tools) x 2 runs x 10 trials per block x 100 subjects = 8000 data entries
- next meeting: multiple inputs/observations on the agent
1. divide the data into training and test sets
2. play with the number of stimuli
3. change the DQN agent - yes, for example the learning rate (see the agent sketch below)
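As a starting point for the Acme question above, a minimal sketch of wiring up Acme's TensorFlow DQN agent, following the pattern in the public Acme examples. The environment spec is a hand-written placeholder (an 8-dimensional stimulus observation and a binary match/non-match action), not the project's actual spec, and the learning rate and epsilon values are illustrative only.

```python
import numpy as np
import sonnet as snt
from acme import specs
from acme.agents.tf import dqn

# dataset size check: 4 conditions x 2 runs x 10 trials per block x 100 subjects
n_entries = 4 * 2 * 10 * 100  # = 8000 data entries

# Placeholder spec for the 2-back task (assumed shapes, not the real pipeline):
# observation = encoded current stimulus, actions = {0: non-match, 1: match}.
environment_spec = specs.EnvironmentSpec(
    observations=specs.Array(shape=(8,), dtype=np.float32, name="stimulus"),
    actions=specs.DiscreteArray(num_values=2, name="response"),
    rewards=specs.Array(shape=(), dtype=np.float32, name="reward"),
    discounts=specs.BoundedArray(shape=(), dtype=np.float32,
                                 minimum=0.0, maximum=1.0, name="discount"),
)

# Small MLP Q-network; the architecture is one of the things to experiment with.
network = snt.Sequential([
    snt.Flatten(),
    snt.nets.MLP([64, 64, environment_spec.actions.num_values]),
])

agent = dqn.DQN(
    environment_spec=environment_spec,
    network=network,
    learning_rate=1e-4,  # the "lr" hyper-parameter mentioned in point 3
    epsilon=0.1,         # exploration rate, another knob to play with
)
```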
## TA meeting 6 August 2021
- concern that the n-back task may not suit the RL setup
- generating synthetic data could be a solution
- the environment is determined by the state, action, immediate feedback/reward, and evolution to the next state (see the environment sketch after this list)
- separate training networks for 0-back and 2-back? Yes, because the setup (i.e. the states) differs between them
- still missing an answer: how do we get the data into the correct shape?
- look at the difference between the optimal policy and how humans actually behave
- the gambling task is difficult to model in the RL framework because we lack information on the true state / what the correct number was
- for every subject and every block there should be 8 non-match and 2 match trials
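Since the environment definition keeps coming up, here is a minimal sketch of the 2-back block wrapped as a `dm_env` environment, where one episode is one 10-trial block (8 non-match, 2 match). This is an assumption-laden sketch: the reward scheme (+1 for a correct response) and the observation (which exposes the 2-back stimulus directly, side-stepping the memory component) are placeholders, and deciding what the agent should actually observe is part of the open "data in the correct shape" question.

```python
import dm_env
import numpy as np
from dm_env import specs


class TwoBackEnvironment(dm_env.Environment):
    """Hypothetical 2-back environment: one episode = one 10-trial block."""

    def __init__(self, stimuli, is_match):
        self._stimuli = np.asarray(stimuli)    # e.g. 8 non-match and 2 match trials
        self._is_match = np.asarray(is_match)  # ground-truth match label per trial
        self._t = 0

    def reset(self):
        self._t = 0
        return dm_env.restart(self._observation())

    def step(self, action):
        # reward: +1 for a correct match/non-match response, 0 otherwise
        reward = float(int(action) == int(self._is_match[self._t]))
        self._t += 1
        if self._t >= len(self._stimuli):
            return dm_env.termination(reward, self._observation())
        return dm_env.transition(reward, self._observation())

    def _observation(self):
        t = min(self._t, len(self._stimuli) - 1)
        two_back = self._stimuli[t - 2] if t >= 2 else -1
        return np.array([self._stimuli[t], two_back], dtype=np.float32)

    def observation_spec(self):
        return specs.Array(shape=(2,), dtype=np.float32, name="observation")

    def action_spec(self):
        return specs.DiscreteArray(num_values=2, name="action")  # 0 = non-match, 1 = match
```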
## Project plan week 2
- [x] **Monday**: work collaboratively (pair programming) to set up the HCP data in a format similar to the mock data
- [ ] **Tuesday**: run both (mock and human data) in parallel and identify/calculate the key outputs; make the results more interpretable with data visualisation
- [ ] **Wednesday**: discuss with project TA the results and establish what the following steps should be
- [ ] **Thursday**: TBC
- [ ] **Friday**: TBC
## Project plan week 3
- [ ] **Monday**: long project day; write abstract
- [ ] **Tuesday**: TBC
- [ ] **Wednesday**: wrap up experiments
- [ ] **Thursday**: design presentation and record
- [ ] **Friday**: project presentation!
## Abstract
One major goal of the cognitive sciences is to understand how humans make decisions across a diverse range of cognitive tasks. Decision-making processes are commonly assumed to follow a Markovian framework, which makes reinforcement learning (RL) a powerful method for analysing them. This study investigates how well an RL agent can imitate human behaviour on an “N-back” working-memory task, what policy the agent develops, and how that policy compares to human performance. Our hypothesis is that an RL agent trained on human-inspired “2-back” data will adopt a policy close to the human one, rather than an optimal policy that maximises task performance. The model used to investigate this is a DeepMind Acme DQN agent, for which we compared and tested multiple network architectures, reward functions, and input state configurations. The model is trained on human-inspired “2-back” behavioural data derived from the available data in the HCP2021 repository. A major part of the project was data wrangling and curation to fit the required RL framework. We then applied the DQN model to the target data and observed that the agent is capable of imitating human behaviour to a degree. The model’s usefulness is limited by the quality and quantity of the available data; however, given enough data in an appropriate format, we expect that intriguing comparative analyses of both policies and architectures between the RL agent and the actual human brain should be possible.
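To illustrate the data-wrangling step mentioned in the abstract, a hypothetical sketch of turning per-trial behavioural records into RL transitions. The column names (`trial_index`, `stimulus`, `two_back_stimulus`, `response_is_match`, `response_is_correct`) are placeholders for illustration, not actual HCP field names.

```python
import pandas as pd


def trials_to_transitions(df: pd.DataFrame):
    """Convert one block of 2-back trials into (state, action, reward, next_state) tuples."""
    rows = df.sort_values("trial_index").to_dict("records")
    transitions = []
    for current, nxt in zip(rows, rows[1:] + [None]):
        state = (current["stimulus"], current["two_back_stimulus"])
        action = int(current["response_is_match"])       # the subject's actual response
        reward = float(current["response_is_correct"])   # 1 if the response was correct
        next_state = None if nxt is None else (nxt["stimulus"], nxt["two_back_stimulus"])
        transitions.append((state, action, reward, next_state))
    return transitions
```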
**A. What is the phenomenon? Here, summarize the part of the phenomenon which your modeling addresses.**
**B. What is the key scientific question? Clearly articulate the question which your modeling tries to answer.**
**C. What was our hypothesis? Explain the key relationships which we relied on to simulate the phenomenon.**
**D. How did your modeling work? Give an overview of the model, its main components, and how the modeling works. “Here we …”**
**E. What did you find? Did the modeling work? Explain the key outcomes of your modeling evaluation.**
**F. What can you conclude? Conclude as much as you can with reference to the hypothesis, within the limits of the modeling.**
**G. What are the limitations and future directions? What is left to be learned? Briefly argue the plausibility of the approach and/or what you think is essential that may have been left out.**