# RDS course advertisement
15 - 25 Nov 2021
Monday - Thursday 13:00 - 17:00
Online training course
[Link: Apply Here]
## Introduction
The Alan Turing Institute is pleased to announce that registration is now open for the *Research Data Science* online training course (15 - 25 November 2021), organised by the Institute's [Research Engineering Group (REG)](https://www.turing.ac.uk/research-engineering).
The course consists of four modules, each involving a half-day taught session and a half-day hands-on session. It requires part-time (4 hours per day) remote attendance, Monday to Thursday, over two weeks in UK time (GMT); see details below.
The first half of each module will be recorded and made available (along with all other materials) for asynchronous self-study at any other time.
## About
Data science methods and tools have become commonplace in research projects across academia, government and industry. Researchers increasingly need to collaborate with multi-disciplinary teams of data scientists, software engineers and other stakeholders.
This course is designed for researchers interested in understanding and using data science methods in their work. The course will help learners move beyond data science principles, to learn how to tackle real, complex and sometimes vaguely defined research data science projects. They will learn how to do this in a collaborative environment, with an emphasis on practical techniques and technologies and with an overarching awareness of ethics and diversity issues. This is an intensive, hands-on course, informed by REG's experience with research data science projects and aiming to bring learners in touch with day-to-day research data science practices.
The course consists of:
- Taught modules that will introduce learners to key concepts, methodology and ways of solving problems.
- Hands-on modules where learners will work in teams to tackle a real research data science problem, including scoping it, discussing it from an equality, diversity and inclusion (EDI) point of view and coding collaboratively to produce a data science solution.
This course complements the Turing's Research Software Engineering with Python course (found [here](https://alan-turing-institute.github.io/rsd-engineeringcourse/)).
## Key objectives and learning outcomes
The main objectives of the course are the following:
- Teach attendees how to use research data science (RDS) methods in an interdisciplinary research environment.
- Move beyond core principles and methodology, towards a hands-on, practical understanding, focused on collaboration, reproducibility and openness.
- Provide exposure to a real-world RDS project and demonstrate the decision-making process used to choose the right method and tools for each setting and in each project step.
- Embed data ethics, diversity and inclusion awareness into the learners' approach to all stages of an RDS project, providing multiple examples.
The learning outcomes are the following:
- Attendees will understand fundamental RDS methods (e.g., data wrangling, visualisation, exploration, modeling) and know when/how to apply them to their research in order to draw data-driven insights or create data-driven tools.
- Attendees will be familiar with the stages of a collaborative RDS project, from scoping and data exploration to visualisation and modeling, and will be aware of the challenges of tackling real-world problems.
- Attendees will be able to recognise power imbalances, bias and diversity issues in their technical work and in their ways of working with others and challenge them.
## Who should apply
The course is open to postgraduate students, early career researchers, practitioners (e.g. data analysts/scientists) and researchers interested in pursuing a research software engineering (RSE) career. Turing PhD students and researchers are particularly encouraged to apply.
## Prerequisites
Participants are expected to:
- Be comfortable with basic Python, either through working on a project or through attending a training course. As a guide, they should be comfortable with the concepts covered in the ["Introduction to Python" module](https://alan-turing-institute.github.io/rsd-engineeringcourse/html/ch00python/index.html) from the Turing's Research Software Engineering with Python course. The Software Carpentry lesson [Programming with Python](https://swcarpentry.github.io/python-novice-inflammation/) also covers some of these concepts. Familiarity with Matplotlib, NumPy and Pandas is beneficial but not required.
- Have some basic knowledge of Git (setting up repositories, commits) through using it in projects or by attending training, e.g. Software Carpentry's [Version Control with Git](https://swcarpentry.github.io/git-novice/) (Sections 1 to 4 and 7 to 9).
- Have read the first two sections of the Turing Way's [Guide for Collaboration](https://the-turing-way.netlify.app/collaboration/collaboration.html) ("Getting Started in GitHub" and "Maintainers and Reviewers in GitHub").
## Timeline
28 September - Applications open
11 October - Applications close
22 October - Offers sent out
15 - 25 November - Online course takes place
## Syllabus
Module 1: Intro to Data Science
- Taught session (15 November):
  - What data science and research data science are, with an overview of the variety of cultures within them.
  - Stages in a data science project and common issues when scoping a project.
  - Intro to EDI for data science.
  - How to work collaboratively in data science projects.
- Hands-on session (16 November):
  - Scope a research data science project using a real-world, individual-level survey dataset, including discussing the research question and EDI issues and setting up a collaborative GitHub repo.
Module 2: Handling data
- Taught session (17 November):
  - Data wrangling, cleaning and provenance.
  - Handling missing data.
  - Data access: SQL, APIs.
  - Data privacy and security.
- Hands-on session (18 November):
  - Explore, pre-process and clean the dataset from Module 1. Discuss and decode various complexities (e.g. missing/ambiguous values, bias in data collection, data privacy and sensitivity).
Module 3: Data visualisation
- Taught session (22 November):
  - Figures gone wrong.
  - Rules of the data visualisation game.
  - Atlas of visualisations.
  - Storytelling with data visualisation.
  - Data visualisation for data exploration.
- Hands-on session (23 November):
  - Build visualisations to understand the dataset from Modules 1 and 2 using material from the taught sessions, and explore the relationships between variables and their importance.
Module 4: Modeling
- Taught session (24 November):
  - The what and why of Statistical Modeling.
  - Inside a Model.
  - Building a Model.
  - Evaluating and Validating Models.
- Hands-on session (25 November):
  - Build your own model based on the knowledge acquired so far about the dataset and the techniques taught in this module. Improve upon a baseline, interpret results to answer the research questions, and discuss limitations and alternative approaches.
# Application page (form completed by applicants)
## Contact information
- ...
- ...
## Basic EDI monitoring info
- ...
- ...
## Why do you want to attend this course? (up to 250 words)
## What is your experience with Python? (up to 250 words)
## How far do you agree with the following statements (on a scale from Strongly agree to Strongly disagree):
- I feel comfortable developing code in Python
- I feel comfortable using Git
## Have you attended (synchronously or asynchronously) the Turing's [Research Software Engineering with Python course](https://alan-turing-institute.github.io/rsd-engineeringcourse/)? (this is not a prerequisite)
## Have you read or do you plan to read the first two sections of the Turing Way's [Guide for Collaboration](https://the-turing-way.netlify.app/collaboration/collaboration.html)?