# Exploring the API of 4TU.ResearchData for RDM support staff workshop (TU/e, WUR, UT)
<img src="https://data.4tu.nl/static/images/logo.png" width="200" />
## Workshop Overview
The goal of this workshop is to familiarize participants with the main endpoints of the Web API for requesting, downloading, searching, and uploading datasets and software in the 4TU.ResearchData repository. The main target audience is the RDM support staff of the Dutch technical universities.
The proposed schedule for a 3-hour session is:
- Introduction (10 minutes)
- Hands-on practice with the endpoints in the terminal (up to 2 hours, including one break)
- Discussion and troubleshooting: time for questions and for addressing projects, issues, or challenges participants may already have within their faculties (1 hour)
## Schedule
Main topics:
- Fetching datasets, software, collections
- Searching and fetching data from authors
- Searching accounts within your institution
- Uploading
- Metadata uploading
- File uploading
- Submit for review
- Image preview
## Prerequisites
Before attending the workshop, please ensure you have:
- (Required) Two private tokens: one in data.4tu.nl (main environment) and one in next.data.4tu.nl (test environment)
- (Required) `yq`, a command-line YAML processor (with syntax similar to `jq`).
- Linux
```bash
sudo apt-get update
sudo apt install yq
```
- macOS
```bash
brew install yq
```
- Windows (in PowerShell)
- Install scoop
```bash=
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
Invoke-RestMethod -Uri https://get.scoop.sh | Invoke-Expression
```
```bash=
scoop install yq
```
or
```bash
choco install yq
```
- (Optional but nice to have) The `jq` program to render JSON output nicely:
- Linux
```bash
sudo apt-get update
sudo apt-get install -y jq
```
- macOS
```bash
brew install jq
```
- Windows (in PowerShell)
- Install scoop
```bash=
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
Invoke-RestMethod -Uri https://get.scoop.sh | Invoke-Expression
```
```bash=
scoop install main/jq
```
- Check your installation in Git Bash
```bash=
yq --version
jq --version
```
- If you're using Git Bash or WSL, use the Linux instructions above.
- Otherwise, download the executable from the jq releases page (https://jqlang.org/download/) and add it to your PATH.
## Introductory presentation
Presentation link : https://zenodo.org/records/17520532
## 4TU API v2 documentation
- Note: this documentation is still incomplete.
- https://djehuty.4tu.nl/#x1-630006
- We have developed the v3 with new features related to:
- Git repositories
- Data upload
- Image preview and modification via IIIF
- Group IDs for specific institutions
- RO-CRATE (Research object crate)
## Figshare documentation
Version 2 of the 4TU API is backward compatible with the Figshare API: https://docs.figshare.com/#articles_list
Thus the same endpoints, parameters, and response schema apply to both.
## Why is it important to use Web APIs for research?
Web APIs help automate access to research data and metadata. This enables reproducibility, automation of data pipelines, and programmatic interaction with repositories like 4TU.ResearchData.
## REST APIs in a nutshell
A REST API is a web service that uses HTTP methods (GET, POST, etc.) to allow communication between clients and servers. Responses are usually in JSON format, making them easy to parse and reuse.
## List of commands
## Reuse: Search and Download Datasets
### Get datasets or software deposited in 4TU (via `curl`)
```bash
curl -X GET "https://data.4tu.nl/v2/articles" | jq
```
## What is curl?
curl stands for **Client URL**.
It's a command-line tool that allows you to transfer data to or from a server using various internet protocols, most commonly HTTP and HTTPS.
It is especially useful for making API requests: you can send GET, POST, PUT, DELETE requests, upload or download files, send headers or authentication tokens, and more.
## Why curl works for APIs
REST APIs are based on the HTTP protocol, just like websites. When you visit a webpage, your browser sends a GET request and displays the HTML it gets back. When you use curl, you do the same thing, but in your terminal. For example:
`curl https://data.4tu.nl/v2/articles` This sends an HTTP GET request to the 4TU.ResearchData API.
## Key reasons why curl is used:
- It's built into most Linux/macOS systems and easily installable on Windows.
- Scriptable: usable in bash scripts, notebooks, automation.
- Supports headers, query parameters, tokens, POST data, etc.
- Can output to files (`>`, `-o`, `-O`) or pipe to processors like jq.
### Add parameters to the same endpoint to filter results
- Open the documentation: https://djehuty.4tu.nl/ (apologies: the documentation is still incomplete, and we are working on improving it)
```bash
curl "https://data.4tu.nl/v2/articles?limit=2&published_since=2025-05-01" > data.json
```
```bash
curl "https://data.4tu.nl/v2/articles?limit=2&published_since=2025-05-01" | jq
```
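As the number of filters grows, typing the query string by hand gets error-prone. The sketch below shows one way to assemble the URL from `key=value` pairs. The function name `articles_url` and the `BASE_URL` variable are hypothetical helpers, not part of the 4TU API, and values are not URL-encoded, so it only suits simple values like the ones used above.

```bash
# Hypothetical helper: build a /v2/articles URL from key=value pairs.
# Assumption: values need no URL-encoding (no spaces or special characters).
BASE_URL="https://data.4tu.nl/v2"

articles_url() {
  local query="" pair
  for pair in "$@"; do
    # Join the pairs with "&"; the first pair gets no separator.
    query+="${query:+&}${pair}"
  done
  # Append "?query" only when at least one pair was given.
  printf '%s\n' "${BASE_URL}/articles${query:+?${query}}"
}
```

For example, `curl -s "$(articles_url limit=2 published_since=2025-05-01)" | jq` should send the same request as the command above.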
### Exercise: request **10** **datasets** published since **January 1st 2025** and show them on the screen
```bash
curl "https://data.4tu.nl/v2/articles?item_type=3&limit=10&published_since=2025-01-01" | jq
```
Tip: v2 of the 4TU.ResearchData API is based on the Figshare API. In practice, if you don't find what you are looking for in the current documentation (https://djehuty.4tu.nl/#x1-640006.1), you can look in https://docs.figshare.com/#articles_list
### Get 10 software records published after 01-01-2025 (via `curl`)
```bash
curl "https://data.4tu.nl/v2/articles?item_type=9&limit=10&published_since=2025-01-01" | jq
```
### Get information per dataset ID
```bash
curl "https://data.4tu.nl/v2/articles/03c249d6-674c-47cf-918f-1ef9bdafe749" | jq # /v2/articles/uuid
```
### Get all the files per dataset ID
```bash
curl "https://data.4tu.nl/v2/articles/03c249d6-674c-47cf-918f-1ef9bdafe749/files" | jq # /v2/articles/uuid/files
# Open this link in the browser to check the UUID of a file to download (the README, the last file)
```
### How to download a specific file
```bash
# print the README file on the screen
curl "https://data.4tu.nl/file/03c249d6-674c-47cf-918f-1ef9bdafe749/20382d28-0ed9-4f9b-918a-936a2c6f8f76" # /file/article-uuid/file-uuid
#| Command | Behavior |
#| ---------------------- | -------------------------------------- |
#| `curl URL` | Prints file to screen (no saving) |
#| `curl -O URL` | Downloads and saves with original name |
#| `curl -o filename URL` | Downloads and saves with custom name |
#| `curl -L -O URL` | Follows redirects and saves file |
#| `curl -C - -O URL` | Resumes an interrupted download |
```
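The options in the table can be combined into a small reusable function. This is a minimal sketch that assumes the `/file/article-uuid/file-uuid` pattern shown above; `file_url` and `download_file` are hypothetical names chosen for illustration.

```bash
# Hypothetical helpers built on the /file/article-uuid/file-uuid pattern.
file_url() {
  # $1: dataset (article) UUID, $2: file UUID
  local article_uuid="$1" file_uuid="$2"
  printf 'https://data.4tu.nl/file/%s/%s\n' "$article_uuid" "$file_uuid"
}

download_file() {
  # $1: dataset UUID, $2: file UUID, $3: name to save the file under.
  # -f fails on HTTP errors, -sS hides progress but keeps error messages,
  # -L follows redirects, -o sets the output file name.
  curl -fsSL -o "$3" "$(file_url "$1" "$2")"
}
```

Calling `download_file 03c249d6-674c-47cf-918f-1ef9bdafe749 20382d28-0ed9-4f9b-918a-936a2c6f8f76 README.txt` would save the README locally instead of printing it; the output name `README.txt` is only an example.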
## Collections
### Fetching all collections
```bash
curl "https://data.4tu.nl/v2/collections" | jq
```
### Fetching collections with parameters
```bash
curl "https://data.4tu.nl/v2/collections?limit=2&published_since=2025-01-01" | jq
```
### Fetching information of a specific collection
```bash
curl "https://data.4tu.nl/v2/collections/a72aa7ae-7fd2-450b-a1c4-1fa093d15438" | jq
```
### Fetching information of the datasets of a collection
```bash
curl "https://data.4tu.nl/v2/collections/a72aa7ae-7fd2-450b-a1c4-1fa093d15438/articles" | jq
```
## Search Datasets by Keyword
```bash
curl --request POST --header "Content-Type: application/json" --data '{ "search_for": "aerospace" }' https://data.4tu.nl/v2/articles/search | jq
```
```bash
curl --request POST --header "Content-Type: application/json" --data '{ "search_for": "architecture" }' https://data.4tu.nl/v2/articles/search | jq
```
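When the search term comes from a variable, quotes inside it can break the hand-written JSON body. The sketch below builds the body with basic escaping in plain bash. `search_body` is a hypothetical helper; it only escapes backslashes and double quotes, so terms containing newlines or other control characters are out of scope.

```bash
# Hypothetical helper: build the JSON body for /v2/articles/search.
# Assumption: the term contains no newlines or other control characters.
search_body() {
  local term="$1"
  term=${term//\\/\\\\}   # escape backslashes first
  term=${term//\"/\\\"}   # then escape double quotes
  printf '{ "search_for": "%s" }\n' "$term"
}
```

It can be used as `curl -s --request POST --header "Content-Type: application/json" --data "$(search_body aerospace)" https://data.4tu.nl/v2/articles/search | jq`. If `jq` is installed, `jq -n --arg q "aerospace" '{search_for: $q}'` is a more robust way to produce the same body.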
## Using a Token to Access Author Info (via `curl`)
#### Create the .env file and copy your private tokens there
`echo 'API_TOKEN="your_token_here"' > .env`
`echo 'API_TOKEN_NEXT="your_test_token_here"' >> .env`
`source .env`
`echo "Token loaded: ${API_TOKEN:0:5}..."`
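
Because the token is a secret, it helps to load it in a way that fails loudly when the file is missing and never prints the full value. The following is a minimal sketch; `load_env` and `mask_token` are hypothetical helper names, and the `.env` layout is assumed to be plain `NAME="value"` lines as created above.

```bash
# Hypothetical helpers for working with the .env file.
load_env() {
  # $1: path to the env file (defaults to .env in the current directory)
  local env_file="${1:-.env}"
  [ -f "$env_file" ] || { echo "Missing $env_file" >&2; return 1; }
  set -a            # export every variable assigned while sourcing
  . "$env_file"
  set +a
}

mask_token() {
  # Print only the first five characters of a secret.
  local token="$1"
  if [ -z "$token" ]; then
    echo "Token is empty or not loaded" >&2
    return 1
  fi
  printf '%s...\n' "${token:0:5}"
}
```

After `load_env`, running `mask_token "$API_TOKEN"` confirms the token is available without exposing it. Restricting the file with `chmod 600 .env` and keeping it out of version control are sensible extra precautions.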
### Searching authors by name
```bash
# Requires a token set in a sourced .env file
curl --request POST https://data.4tu.nl/v2/account/authors/search --header "Authorization: token ${API_TOKEN}" --header "Content-Type: application/json" --data '{ "search": "Leila" }' | jq
```
### Searching accounts within your institution
- "https://data.4tu.nl/v3/groups" (GET): this endpoint lists the group IDs of the institutions, which you can use to filter the output of v2/articles/
```bash
curl -X GET "https://data.4tu.nl/v3/groups" | jq
```
## Upload Datasets (POST Requests)
### Basic Upload of metadata to a draft dataset
```bash
curl -X POST https://next.data.4tu.nl/v2/account/articles --header "Authorization: token ${API_TOKEN_NEXT}" --header "Content-Type: application/json" --data '{ "title": "Dataset RDM session", "authors": [{ "first_name": "Leila", "full_name": "Leila Inigo", "last_name": "Inigo", "orcid_id": "0000-0003-4324-5350" }] }' | jq
```
### Adding an author to the draft dataset
- First, copy the UUID of the draft dataset created in the previous step from the next.data.4tu.nl website and use it in place of `UUID`
```bash
curl -X POST "https://next.data.4tu.nl/v2/account/articles/UUID/authors" --header "Authorization: token ${API_TOKEN_NEXT}" --header "Content-Type: application/json" --data '{ "authors": [{ "first_name": "John", "full_name": "John Doe", "last_name": "Doe", "orcid_id": "0000-0303-4524-5350" }] }' | jq
```
### Upload Using YAML Metadata
- Download the example_metadata.yaml file
`curl -o example_metadata.yaml https://raw.githubusercontent.com/4TUResearchData-Carpentries/WebAPI4RDM/refs/heads/main/Lesson_development/example_metadata.yaml`
#### Upload to next server
```bash
yq '.' example_metadata.yaml | curl -X POST https://next.data.4tu.nl/v2/account/articles -H "Authorization: token ${API_TOKEN_NEXT}" -H "Content-Type: application/json" -d @-
```
#### Upload to the production server
```bash
yq '.' example_metadata.yaml | curl -X POST https://data.4tu.nl/v2/account/articles -H "Authorization: token ${API_TOKEN}" -H "Content-Type: application/json" -d @-
```
#### Command explanation:
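`yq '.' example_metadata.yaml` : Converts example_metadata.yaml into JSON
- yq is a command-line tool to read/manipulate YAML (like jq is for JSON).
- `'.'` means "read the full YAML structure as-is".
- Note: this relies on the Python `yq` (the jq wrapper that `apt` installs), which prints JSON by default. The Go `yq` that `brew`, `scoop`, and `choco` typically install prints YAML by default; with that version, use `yq -o=json '.' example_metadata.yaml` instead.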
`-d @-`
- `-d` sends data in the body of the POST request.
- `@-` means: read the request body from stdin (standard input), i.e., the piped-in JSON from yq.
##### Now try to submit it and notice that you need at least one file to submit for review
### File upload
```bash
curl -X POST "https://next.data.4tu.nl/v3/datasets/dataset-id/upload" --header "Authorization: token ${API_TOKEN_NEXT}" --header "Content-Type: multipart/form-data" -F "file=@absolute-path-to-the-file"
```
#### Now let's take the UUID of the draft created in the previous example and put it in the endpoint
- For the data, first download the file from GitHub using curl
`curl -O "https://raw.githubusercontent.com/4TUResearchData-Carpentries/WebAPI4RDM/refs/heads/main/Lesson_development/data_files/test_a.csv"`
```bash
curl -X POST "https://next.data.4tu.nl/v3/datasets/UUID/upload" --header "Authorization: token ${API_TOKEN_NEXT}" --header "Content-Type: multipart/form-data" -F "file=@ABSOLUTE_PATH2FILE"
```
#### File upload with strict check for empty files and duplicates
```bash
MD5SUM=$(md5sum "ABSOLUTE_PATH2FILE" | awk '{print $1}')
```
```bash
curl -X POST "https://next.data.4tu.nl/v3/datasets/UUID/upload?strict_check=1&md5=${MD5SUM}" --header "Authorization: token ${API_TOKEN_NEXT}" --header "Content-Type: multipart/form-data" -F "file=@ABSOLUTE_PATH2FILE"
```
If the same file was already uploaded in the previous step, the response says that the resource is already available, and the upload stops there.
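
The checksum step can be wrapped so that it also works where `md5sum` is not installed (some macOS setups ship `md5` instead). The sketch below computes the checksum and builds the strict-check URL using the `strict_check` and `md5` parameters from the command above. `file_md5` and `strict_upload_url` are hypothetical helper names; refusing empty files locally is an extra convenience added here, not something the API requires of the client.

```bash
# Hypothetical helpers for the strict-check upload.
file_md5() {
  # Print the MD5 checksum of the file given in $1.
  local path="$1"
  if command -v md5sum >/dev/null 2>&1; then
    md5sum "$path" | awk '{print $1}'
  else
    md5 -q "$path"   # fallback for systems that only provide `md5`
  fi
}

strict_upload_url() {
  # $1: draft dataset UUID, $2: path of the file to upload.
  local dataset_uuid="$1" path="$2"
  if [ ! -s "$path" ]; then
    echo "Refusing empty or missing file: $path" >&2
    return 1
  fi
  printf 'https://next.data.4tu.nl/v3/datasets/%s/upload?strict_check=1&md5=%s\n' \
    "$dataset_uuid" "$(file_md5 "$path")"
}
```

The result can be passed straight to curl, for example `curl -X POST "$(strict_upload_url UUID test_a.csv)" --header "Authorization: token ${API_TOKEN_NEXT}" -F "file=@test_a.csv"`, with `UUID` replaced by the UUID of your draft.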
### Submit for review
```bash
yq '.' example_metadata.yaml | curl -X PUT "https://next.data.4tu.nl/v3/datasets/UUID/submit-for-review" --header "Authorization: token ${API_TOKEN_NEXT}" --header "Content-Type: application/json" --data @-
```
## Image preview with IIIF (International Image Interoperability Framework)
IIIF is a standard for delivering high-resolution images over the web. It allows users to zoom, pan, and interact with images in a web browser.
### Open the browser and insert this endpoint:
https://data.4tu.nl/iiif/v3/c3eee5e4-1651-4541-8fb4-f240fbd1c4ba/full/1024,1024/0/default.jpg
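
The URL above follows the IIIF Image API pattern `{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The sketch below builds such URLs so the individual parts are easier to vary. `iiif_image_url` is a hypothetical helper, its defaults simply mirror the example URL, and which regions, sizes, rotations, or formats the 4TU server actually accepts is not confirmed here.

```bash
# Hypothetical helper: build an IIIF image URL for a file UUID.
# Defaults reproduce the example: full/1024,1024/0/default.jpg
iiif_image_url() {
  local file_uuid="$1"
  local region="${2:-full}" size="${3:-1024,1024}" rotation="${4:-0}"
  local quality="${5:-default}" format="${6:-jpg}"
  printf 'https://data.4tu.nl/iiif/v3/%s/%s/%s/%s/%s.%s\n' \
    "$file_uuid" "$region" "$size" "$rotation" "$quality" "$format"
}
```

For instance, `iiif_image_url c3eee5e4-1651-4541-8fb4-f240fbd1c4ba` prints the URL shown above, and passing `full 512,` as extra arguments requests a 512-pixel-wide version, assuming the server supports that size form.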
### Explore the manifest
- example: https://data.4tu.nl/iiif/v3/c3eee5e4-1651-4541-8fb4-f240fbd1c4ba/full/1024,1024/0/default.jpg
- dataset of the example : https://data.4tu.nl/datasets/8289a903-7ccf-401b-af66-f5b3c9abe4b6/1
- "https://data.4tu.nl/iiif/v3/<file_uuid>" (GET, context of the image)
- example: https://data.4tu.nl/iiif/v3/312f1d4a-2b83-491c-b906-a9d5497f6c9d
- "https://data.4tu.nl/iiif/v3/c2a8d5ce-c4ea-46ed-bcdc-e35033e908a8/1/manifest"
- To open it in an editor: https://manifest-editor.digirati.services/?tab=recent
## Motivation for Using Bash
Use case: Imagine a researcher is interested in getting the descriptions and categories of datasets uploaded in April 2025.
Challenge: The description and categories are only exposed when a specific dataset is queried.
```bash
curl -s "https://data.4tu.nl/v2/articles/fb26fd3f-ba3c-4cf0-8926-14768a256933" | jq
```
### Get the description and categories of one dataset
```bash
curl -s "https://data.4tu.nl/v2/articles/fb26fd3f-ba3c-4cf0-8926-14768a256933" | jq -r '"Description: " + .description + "\nCategories: " + (.categories | map(.title) | join(", "))' > datasets_description_categories.md
```
### Bash Script: Loop Through UUIDs to Collect Metadata
```bash
# 1) Collect the UUIDs of up to 10 datasets published since 1 April 2025
curl -s "https://data.4tu.nl/v2/articles?published_since=2025-04-01&item_type=3&limit=10" | jq '.[] | {uuid: .uuid}' > article_ids.json
# 2) Fetch each dataset and append its description and categories
jq -r '.uuid' article_ids.json | while read -r uuid; do
  curl -s "https://data.4tu.nl/v2/articles/$uuid" | jq -r '"Description: " + .description + "\nCategories: " + (.categories | map(.title) | join(", "))' >> articles_full_metadata.md
done
```
### Limitations of Bash Scripts
- Harder to debug or extend
- Tricky to structure or merge data
- Not ideal for large-scale automation
## Bonus: Using `connect4tu` bash Package
You can also use the [connect4tu](https://github.com/leilaicruz/connect4tu) package for a cleaner bash interface to the 4TU API.
## Questions
### How do we get more than 10 responses?
By using the `limit` parameter in the GET request to https://data.4tu.nl/v2/articles.
Liz: I tried this command:
curl https://data.4tu.nl/v2/articles | jq > data_all.json
and I also got only 10 datasets in the file.
Fixed it by running this (thanks to Nami): curl -X GET https://data.4tu.nl/v2/articles\?limit\=100 | jq
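
For larger result sets, one option is to request the records in chunks. The sketch below generates one URL per chunk using `limit` together with `offset`. `offset` is a Figshare paging parameter that is assumed, not verified here, to behave the same way on the 4TU v2 API, and `page_urls` is a hypothetical helper name.

```bash
# Hypothetical helper: print one /v2/articles URL per chunk of results.
# Assumption: the API honours the Figshare-style `offset` parameter.
page_urls() {
  # $1: total number of records wanted, $2: records per request
  local total="$1" per_request="$2" offset=0 limit
  [ "$per_request" -gt 0 ] || return 1   # avoid an endless loop
  while [ "$offset" -lt "$total" ]; do
    # The last chunk may be smaller than the others.
    limit=$(( total - offset < per_request ? total - offset : per_request ))
    printf 'https://data.4tu.nl/v2/articles?limit=%d&offset=%d\n' "$limit" "$offset"
    offset=$(( offset + limit ))
  done
}
```

The URLs can then be fetched in a loop and merged into a single array, for example `page_urls 250 100 | while read -r url; do curl -s "$url"; done | jq -s 'add' > data_all.json`.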
### Next time: How to add authors that are already in the system?
* If we use the author information string when creating a dataset, we end up creating new author entries.
* This is not optimal because we end up with duplicate authors.
* A better approach is to use the UUID of an author when working on dataset metadata (perhaps by querying for the author UUID first, and using that when creating the dataset).
