# Neurobagel query tool AI
## Project Setup
- clone the repo
`git clone https://github.com/neurobagel/query-tool-ai.git`
- create and activate a virtual environment
`python3 -m venv venv`
`source venv/bin/activate`
- install the dependencies
`pip install -r requirements.txt`
- set up the pre-commit hooks (flake8, black, mypy)
`pre-commit install`
## Milestone 1 - Parsing the user prompt
This task is tackled by leveraging LLMs.
One major issue with LLMs is **hallucination**: the model invents parameter values that were never mentioned in the prompt.
- #### google/flan-t5-xxl
```python
from langchain.chains import LLMChain
from langchain_community.llms import HuggingFaceHub

# Near-zero temperature to keep the extraction as deterministic as possible.
chain = LLMChain(
    llm=HuggingFaceHub(repo_id="google/flan-t5-xxl", model_kwargs={"temperature": 0.01}),
    prompt=prompt,
)
```
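The snippet above assumes a `prompt` defined elsewhere. A minimal runnable sketch of the single-shot extraction, with an illustrative template (the wording here is an assumption, not the exact prompt used in the experiments):

```python
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_community.llms import HuggingFaceHub

# Illustrative template -- the real prompt used in the project may differ.
prompt = PromptTemplate(
    input_variables=["user_query"],
    template=(
        "Extract the following parameters from the query, if present: "
        "min_age, max_age, sex, diagnosis, is_control, "
        "min_num_imaging_sessions, min_num_phenotypic_sessions, "
        "assessment, image_modal.\n"
        "Query: {user_query}\n"
        "Parameters:"
    ),
)
chain = LLMChain(
    llm=HuggingFaceHub(repo_id="google/flan-t5-xxl", model_kwargs={"temperature": 0.01}),
    prompt=prompt,
)
print(chain.run("How many female subjects older than 50 with a Parkinson's diagnosis?"))
```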
1. Extracting all parameters together -
prompt: *How many female subjects older than 50 with a Parkinson's diagnosis?*

**Issue** - The model hallucinates values that are not mentioned in the prompt, such as `imaging_sessions` and `phenotypic_sessions`.
2. Extracting values one by one, with a dedicated script per parameter (a sketch of one such extractor follows this list) -
- extract_age.py

- extract_sex.py

- extract_sessions.py

**Issue** - The model extracts the correct values here, but the google/flan-t5-xxl model on the Hugging Face Inference API is rate-limited.
It also performed poorly on categorical values such as diagnosis, assessment tool, healthy control, and image modality.
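A minimal sketch of what one of these per-value extractors could look like (the file name matches the list above, but the prompt wording and return format are assumptions, not the exact project code):

```python
# extract_age.py -- illustrative sketch of a single-parameter extractor.
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_community.llms import HuggingFaceHub

age_prompt = PromptTemplate(
    input_variables=["user_query"],
    template=(
        "Extract the minimum and maximum age from the query. "
        "Answer 'None' for any bound that is not mentioned.\n"
        "Query: {user_query}\n"
        "Ages:"
    ),
)


def extract_age(user_query: str) -> str:
    """Return the raw age bounds extracted by the model."""
    chain = LLMChain(
        llm=HuggingFaceHub(repo_id="google/flan-t5-xxl", model_kwargs={"temperature": 0.01}),
        prompt=age_prompt,
    )
    return chain.run(user_query)
```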
- #### llama-2 - Although it is a larger model, it was not able to produce accurate extractions.
```python
from langchain.chains import LLMChain
from langchain_community.chat_models import ChatOllama


def extract_parameters(user_query: str):
    llm = ChatOllama(model="llama2")
    chain = LLMChain(llm=llm, prompt=prompt)
    output = chain.run(user_query)
    return extract_output(output)  # project helper that parses the raw response
```
**Example of LLM Response -**

- #### gemma - This LLM is somewhat better than llama-2, but the hallucination issue still persists to some extent.
```python
from langchain.chains import LLMChain
from langchain_community.chat_models import ChatOllama


def extract_parameters(user_query: str):
    llm = ChatOllama(model="gemma")
    chain = LLMChain(llm=llm, prompt=prompt)
    output = chain.run(user_query)
    return extract_output(output)  # project helper that parses the raw response
```
**Example of LLM Response -**

- #### mistral - So far, this LLM has proved to be the best at extracting all the parameter values from the user query at once.
- A Pydantic model defines the schema for the information extraction:
```python
from typing import Optional

from pydantic import BaseModel, Field


class Parameters(BaseModel):
    """Parameters for information extraction."""

    max_age: Optional[str] = Field(description="maximum age if specified", default=None)
    min_age: Optional[str] = Field(description="minimum age if specified", default=None)
    sex: Optional[str] = Field(description="sex", default=None)
    diagnosis: Optional[str] = Field(description="diagnosis", default=None)
    is_control: Optional[bool] = Field(description="healthy control subjects", default=None)
    min_num_imaging_sessions: Optional[str] = Field(description="minimum number of imaging sessions", default=None)
    min_num_phenotypic_sessions: Optional[str] = Field(description="minimum number of phenotypic sessions", default=None)
    assessment: Optional[str] = Field(description="assessment tool used or assessed with", default=None)
    image_modal: Optional[str] = Field(description="imaging modality", default=None)
```
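A minimal sketch of how this schema could be wired to mistral through a `PydanticOutputParser` (the chain wiring here is an assumption for illustration; the project's actual code may differ):

```python
from langchain.chains import LLMChain
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from langchain_community.chat_models import ChatOllama

# Embed the schema's format instructions in the prompt so the model
# returns JSON that can be validated against the Parameters model.
parser = PydanticOutputParser(pydantic_object=Parameters)
prompt = PromptTemplate(
    template="Extract the query parameters.\n{format_instructions}\nQuery: {user_query}\n",
    input_variables=["user_query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)
chain = LLMChain(llm=ChatOllama(model="mistral", temperature=0), prompt=prompt)
raw = chain.run("How many female subjects older than 50 with a Parkinson's diagnosis?")
params = parser.parse(raw)  # a validated Parameters instance
```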
**Examples of LLM Response** -
