
# The Neurobagel Annotation Tool

Neurobagel's annotation tool takes BIDS-style phenotypic data (.tsv files) and corresponding data dictionary files (.json files). It lets users annotate their data using configurable data models, which prepares the modeled data for injection into Neurobagel's graph database for [federated querying](https://github.com/neurobagel/query-tool).

## Annotation workflow summary

1. **Upload**: Select a configuration, upload a data table (.tsv) and optionally upload a data dictionary (.json)
2. **Column Annotation**: Annotate individual columns with descriptions, data types, and standardized variables
   - 2.1 **Multi-Column Measures** (conditional): Annotate measures that span multiple columns, such as assessment tools
3. **Value Annotation**: Annotate the values within categorized columns with standardized terms and formats
4. **Download**: Download the completed, annotated BIDS-style data dictionary

## Step Instructions

### Upload Step

The Upload step is where you select your annotation configuration, upload your data files, and preview them before proceeding to annotation.

#### Selecting a Configuration

- **Configuration Selection**: Choose from the available configurations (e.g., "Neurobagel", "ENIGMA") using the dropdown menu. Each configuration defines the standardized variables and vocabularies available for annotation. If you don't select a configuration, the tool loads the default Neurobagel configuration for you.
- The tool loads the selected configuration, which determines which standardized variables you can map your columns to.

#### Uploading a Data Table (.tsv file)

- **Data Table Upload**: Click the file upload area or drag your file into it to upload your BIDS-style phenotypic data file (typically named `participants.tsv`).
- **File Preview**: Once the file is uploaded, you can toggle the preview to examine its contents and make sure you've uploaded the correct file.
- **Pagination**: If your data table is large, use the pagination controls to navigate through the preview.

#### Uploading a Data Dictionary (Optional .json file)

- **Data Dictionary Upload**: After uploading a data table, you can optionally upload a corresponding BIDS-style data dictionary file.
- **File Preview**: Toggle the preview to examine the JSON structure and verify its contents.
- **Continuing Previous Work**:
  - **BIDS-only data dictionary**: If you upload a basic BIDS data dictionary (with only `Description`, `Levels`, and `Units` fields), the tool automatically reads and loads those BIDS-specific fields and lets you use them as a starting point for new annotations.
  - **Previously annotated data dictionary**: If you upload a previously annotated data dictionary (containing Neurobagel annotations), the tool automatically restores your previous work:
    - Column descriptions and data types are pre-populated
    - Standardized variable mappings are restored
    - Multi-column measure selections and mappings are preserved
    - Value annotations (formats, term mappings, missing values) are loaded
- **Incremental Updates**: You can modify any existing annotations or add new ones to partially completed work.

### Column Annotation Step

The Column Annotation step is where you annotate the individual columns of your data table. Each column in your TSV file is displayed in its own card, with options to edit its description, select its data type, and map it to a standardized variable.
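The column cards are built from the header row and the values in your table. The Python sketch below reads a tab-separated table and collects the unique values in each column, which is the raw material that descriptions, data types, and value annotations attach to. It assumes a well-formed TSV with a single header row. `summarize_columns` and the sample data are hypothetical and are not part of the tool.

```python
import csv
import io
from typing import Dict, List, Set, TextIO


def summarize_columns(tsv_file: TextIO) -> Dict[str, List[str]]:
    """Map each column name (in header order) to its sorted unique values."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    unique: Dict[str, Set[str]] = {name: set() for name in reader.fieldnames or []}
    for row in reader:
        for name in unique:
            value = row.get(name)
            # Short rows yield None for absent cells; skip them.
            if value is not None:
                unique[name].add(value)
    return {name: sorted(values) for name, values in unique.items()}


if __name__ == "__main__":
    # Hypothetical participants.tsv content.
    sample = io.StringIO(
        "participant_id\tage\tsex\n"
        "sub-01\t34\tF\n"
        "sub-02\t29\tM\n"
        "sub-03\tn/a\tF\n"
    )
    for column, values in summarize_columns(sample).items():
        print(column, values)
```

Each printed line corresponds roughly to one card. A value such as `n/a` shows up alongside real values, and you would later mark it as missing during value annotation.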
#### Annotating Individual Columns

Each column card contains several annotation options:

- **Column Description**:
  - Click the "Edit" button to add or modify the column description
  - This description will appear in your final data dictionary
  - If you uploaded a data dictionary, existing descriptions will be pre-populated
- **Data Type Selection**:
  - Choose "Categorical" or "Continuous", or leave the data type empty if neither applies
  - **Automatic selection**: If you map the column to a standardized variable, the tool selects the data type for you, based on the data type specified for that standardized variable in the selected configuration
  - **Categorical**: For discrete values, such as sex or diagnosis
  - **Continuous**: For numerical measurements, such as age, scores, or physical measurements
  - **Not applicable**: For identifier columns (e.g., participant ID, session ID) and multi-column measures
- **Standardized Variable Mapping**:
  - Use the dropdown to map your column to a standardized variable from your selected configuration
  - The available options depend on your chosen configuration (e.g., Neurobagel, ENIGMA)
  - Common standardized variables include "Participant ID", "Age", "Sex", "Diagnosis", and "Assessment Tool"

#### Multi-Column Measures (Conditional)

If you have columns mapped to multi-column standardized variables, such as assessment tools, you'll proceed to a Multi-Column Measures step within the Column Annotation workflow. This step lets you group related columns that belong to the same measurement.
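You can think of the Multi-Column Measures step as a mapping from each selected tool to the columns assigned on its card. Under Configuring Assessment Tools, the rule is that a column belongs to at most one tool. The Python sketch below illustrates only that constraint. The function `find_conflicts`, the column names, and the dictionary layout are hypothetical and do not describe how the tool itself is implemented.

```python
from typing import Dict, List


def find_conflicts(tool_columns: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return columns assigned to more than one tool.

    `tool_columns` maps a tool label to the columns assigned on its card.
    The result maps each conflicting column to every tool it appears under.
    """
    owners: Dict[str, List[str]] = {}
    for tool, columns in tool_columns.items():
        for column in columns:
            tools = owners.setdefault(column, [])
            if tool not in tools:  # ignore a column listed twice on one card
                tools.append(tool)
    return {column: tools for column, tools in owners.items() if len(tools) > 1}


if __name__ == "__main__":
    # Hypothetical cards: "moca_recall" is wrongly assigned to two tools.
    cards = {
        "Montreal Cognitive Assessment": ["moca_total", "moca_recall"],
        "Beck Depression Inventory": ["bdi_total", "moca_recall"],
    }
    print(find_conflicts(cards))
```

An empty result means every column has a single owner. That is the state the step is designed to produce.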
##### Configuring Assessment Tools

- **Measurement Selection**:
  - Each card represents a potential multi-column measure
  - Use the dropdown to select the specific tool from the list of standardized terms (e.g., "Montreal Cognitive Assessment" or "Beck Depression Inventory" for assessment tools)
  - The tool name and description are displayed once selected
- **Column Assignment**:
  - Use the column dropdown to assign specific columns to each tool
  - Multiple columns can be assigned to the same tool; however, each column can only be assigned to one tool
  - The interface shows how many columns are currently assigned
- **Adding Multiple Tools**:
  - Use the "+" button to add more assessment tool cards if your dataset contains multiple tools
  - Each tool can have its own set of assigned columns

### Value Annotation Step

The Value Annotation step is where you annotate the actual values within your categorized columns. The interface uses sidebar navigation to organize columns by their standardized variable types.

#### Sidebar Navigation

The sidebar organizes your columns into categories:

- **Annotated**: Columns that have been mapped to standardized variables
- **Unannotated**: Columns that haven't been mapped yet
- **Continuous**: Columns with continuous data (Age, scores, measurements)
- **Categorical**: Columns with discrete categories (Sex, Diagnosis, groups)
- **Other**: Columns with no data type selected

#### Annotating Continuous Values

For continuous columns (like Age):

- **Format Selection**: Choose the appropriate format from the dropdown (e.g., "float", "euro", "int"). This option is available only for continuous columns that have been mapped to a standardized variable.
- **Units Description**: Add or edit the units description (e.g., "years", "points", "milliseconds")
- **Missing Values**: Mark specific values as missing using the "Mark as missing" button

#### Annotating Categorical Values

For categorical columns (like Sex or Diagnosis):

- **Term Mapping**: Map each value to a standardized term from the controlled vocabulary. This option is available only for categorical columns that have been mapped to a standardized variable.
- **Level Descriptions**: Add or edit descriptions for each unique value in your column
- **Missing Values**: Mark values as missing or not missing as needed

#### Annotating Multi-Column Measure Values

For multi-column measures:

- **Tool Navigation**: Use the sidebar to select different tools under the mapped standardized variable
- **Grouped Display**: All columns belonging to the same tool are visually grouped together
- **Column Tabs**: Switch between different columns within the same tool
- **Annotation**: Depending on each column's data type, you can either add or edit descriptions for each unique value (categorical columns) or select a format and add or edit a units description (continuous columns)

#### Managing Missing Values

Any value in any column can be marked as missing:

- Click "Mark as missing" to exclude a value from annotation
- Click "Mark as not missing" to include it again
- Missing values are tracked separately and included in the final data dictionary

### Download Step

The Download step is the final step, where you can review your completed annotations and download the annotated data dictionary.

#### Completion Status

The page displays an alert indicating whether your annotations are complete or whether any required annotations are missing. You can download the dictionary even if not all annotations are complete, but you'll see warnings about incomplete sections.
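This guide does not define exactly which annotations count as required. The Python sketch below shows how a completeness check could work under two assumptions: a mapped continuous column needs a format, and a mapped categorical column needs a term for every value not marked as missing. `ColumnAnnotation`, `incomplete_annotations`, and the placeholder terms are hypothetical names, not part of the tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class ColumnAnnotation:
    """Hypothetical in-memory record of one column's annotations."""

    data_type: Optional[str] = None  # "Categorical", "Continuous", or None
    standardized_variable: Optional[str] = None  # e.g. "Age"; None if unmapped
    value_format: Optional[str] = None  # e.g. "float"; continuous columns only
    term_mappings: Dict[str, str] = field(default_factory=dict)  # value -> term
    values: Set[str] = field(default_factory=set)  # unique values in the column
    missing_values: Set[str] = field(default_factory=set)  # marked as missing


def incomplete_annotations(columns: Dict[str, ColumnAnnotation]) -> List[str]:
    """Return a warning for every annotation that still looks incomplete."""
    warnings: List[str] = []
    for name, col in columns.items():
        if col.standardized_variable is None:
            # Assumption: unmapped columns have no required value annotations.
            continue
        if col.data_type == "Continuous" and col.value_format is None:
            warnings.append(f"{name}: no format selected")
        elif col.data_type == "Categorical":
            # Values marked as missing are excluded from annotation.
            pending = col.values - col.missing_values - set(col.term_mappings)
            for value in sorted(pending):
                warnings.append(f"{name}: value '{value}' has no term mapping")
    return warnings


if __name__ == "__main__":
    columns = {
        "age": ColumnAnnotation("Continuous", "Age"),
        "sex": ColumnAnnotation(
            "Categorical",
            "Sex",
            term_mappings={"F": "Female"},  # placeholder, not a real term URL
            values={"F", "M", "n/a"},
            missing_values={"n/a"},
        ),
    }
    for warning in incomplete_annotations(columns):
        print(warning)
```

Marking `n/a` as missing removes it from the pending set. That mirrors how "Mark as missing" excludes a value from annotation.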
#### Data Dictionary Preview

- **Preview Toggle**: Click the preview button to show or hide the JSON structure of your annotated data dictionary
- **JSON Structure**: Review the complete structure, including:
  - Column descriptions
  - Data types and formats
  - Standardized variable mappings
  - Level descriptions and term mappings
  - Missing value specifications
  - Assessment tool groupings

#### Download Options

- **Download Dictionary**: Click the "Download Annotated Data Dictionary" button to save your completed annotations as a JSON file
- **File Naming**: The downloaded file will be named `[original_filename]_annotated.json`
- **Annotate New Dataset**: After downloading, you can click "Annotate New Dataset" to start over with a new dataset while keeping your current configuration

#### What's Included in the Download

The downloaded data dictionary includes:

- **BIDS Compatibility**: The standard BIDS data dictionary structure, with `Description`, `Levels`, and `Units` fields
- **Neurobagel Annotations**: Enhanced metadata, including:
  - Standardized variable mappings (`IsAbout` fields)
  - Term URLs from controlled vocabularies
  - Format specifications for continuous variables
  - Assessment tool groupings (`IsPartOf` fields)
  - Missing value specifications

The resulting file can be used with BIDS datasets and is ready for ingestion into Neurobagel's graph database for federated querying.

## Shortened version

# The Neurobagel Annotation Tool

The Neurobagel annotation tool converts your data tables into standardized, machine-readable data dictionaries using FAIR vocabularies. It is well suited to BIDS datasets and research data harmonization.

**Workflow summary**: Upload table → Annotate columns → Annotate values → Download dictionary

### 1. Upload

- **Upload your data table** (.tsv file), typically `participants.tsv` from a BIDS dataset
- **Optional**: Upload a data dictionary (.json file) for extra context
  - Use `participants.json` from BIDS datasets
  - Or continue previous Neurobagel annotation work

In the following steps, you annotate your table by first describing the columns and then the values within these columns.

### 2. Column Annotation

Look at each column from your table. For each column, you can:

- **Add a description**: Click "Edit" to describe what the column contains
- **Select the data type**: Choose "Categorical" for discrete values or "Continuous" for numerical measurements
- **Indicate if the column matches a standardized variable**: Select from the Neurobagel standardized variables
  - Neurobagel understands a set of predefined variables (e.g., "Age", "Sex", "Diagnosis", "Assessment tool")
  - Your task is to look through the columns in your table and identify those that contain information about these variables
  - Once a standardized variable is selected, the tool automatically infers the data type
- **Multi-column Measures**: If you have multiple columns for the same measure, such as an assessment tool, group them together

### 3. Value Annotation

Use the sidebar to navigate between your columns and annotate their values.

**For continuous columns** (Age, scores):

- Select a format (float, int, etc.)
- Add a units description ("years", "points")

**For categorical columns** (Sex, Diagnosis):

- Map each value to a standardized term
- Add descriptions for each unique value

**Handle missing or unmappable values**: Use "Mark as missing" for values that are absent, not applicable, or don't match any of the standardized term options.

### 4. Download

- **Preview** your annotated data dictionary
- **Download** your completed file
- **Start over** with "Annotate New Dataset" if needed

Your downloaded file is BIDS-compatible and ready for Neurobagel's graph database.
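The Python sketch below makes the downloaded file more concrete. It shows how a single column entry could combine the BIDS fields (`Description`, `Levels`, `Units`) with Neurobagel-style annotations such as `IsAbout` and `IsPartOf`. It also shows how the `[original_filename]_annotated.json` name could be derived. Several details are assumptions for illustration:

- The `Annotations` key name
- The `TermURL`/`Label` pairs
- The `MissingValues` list
- The `example:` term identifiers

Check the Neurobagel data dictionary documentation for the authoritative schema.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Optional


def annotated_filename(original: str) -> str:
    """Build `[original_filename]_annotated.json` from an uploaded file name."""
    # Assumption: the original extension (e.g. ".tsv") is dropped first.
    return f"{Path(original).stem}_annotated.json"


def build_entry(
    description: str,
    is_about: Dict[str, str],
    levels: Optional[Dict[str, str]] = None,
    units: Optional[str] = None,
    is_part_of: Optional[Dict[str, str]] = None,
    missing_values: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Combine BIDS fields with an assumed Neurobagel-style "Annotations" block."""
    entry: Dict[str, Any] = {"Description": description}  # BIDS field
    if levels:
        entry["Levels"] = levels  # BIDS field: value -> description
    if units:
        entry["Units"] = units  # BIDS field
    annotations: Dict[str, Any] = {"IsAbout": is_about}
    if is_part_of:
        annotations["IsPartOf"] = is_part_of  # assessment tool grouping
    annotations["MissingValues"] = list(missing_values or [])
    entry["Annotations"] = annotations  # assumed key name
    return entry


if __name__ == "__main__":
    data_dictionary = {
        "age": build_entry(
            "Age of the participant",
            {"TermURL": "example:Age", "Label": "Age"},  # placeholder term
            units="years",
            missing_values=["n/a"],
        )
    }
    print(annotated_filename("participants.tsv"))
    print(json.dumps(data_dictionary, indent=2))
```

The BIDS fields sit at the top level of each entry and the annotations are nested under their own key. This layout lets BIDS-only tooling read the file while ignoring the extra metadata.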