
2021-10-18
SWD1a: Introduction to Python

Welcome to the hack pad for the SWD1a course from Research Computing at the University of Leeds!

You can edit this document using Markdown syntax.

Contents

  1. Links to resource
  2. Agenda Day 1
  3. Agenda Day 2
  4. Agenda Day 3
  5. Agenda Day 4
  6. What's your name and where do you come from?
  7. Further reading
  8. Extra code snippets
  9. Challenges

Agenda Day 1

Time Agenda
1300 Intro, using Google Colab, What is Python?
1350 Break
1400 Python basics, Handling data
1450 Break
1500 Python packages and Pandas
1550 Questions
1600 Close

Agenda Day 2

Time Agenda
1300 Mount Google Drive onto Colab, using pandas
1350 Break
1400 Indexing and subsetting, Data types and formats
1450 Break
1500 Combining dataframes
1550 Questions
1600 Close

Agenda Day 3

Time Agenda
1300 Indexing and subsetting, Data types and formats
1350 Break
1400 Merging dataframes
1450 Break
1500 Loops and functions
1550 Questions
1600 Close

Agenda Day 4

Time Agenda
1300 Finishing off merging, loops and functions
1350 Break
1400 Loops and functions cont., plotting with plotnine
1450 Break
1500 Bringing it all together
1550 Questions
1600 Close

What's your name and where do you come from?

  • Add your own entry below using the edit mode
  • Ed Keavney - 2nd year Earth and Environment PhD researcher - help with data analysis
  • Ollie Clark - Research Software Engineer in Research Computing. (Databases, web, and software engineering. A bit of Azure)
  • John Hodrien - Research Software Engineering (Linuxy, parallel, graphics, cloud), never used Python really in anger
  • I am Smail Kechidi - Postdoctoral researcher at the School of Civil Engineering. I am keen to learn to code in Python for software development.
  • Josefa Sepulveda, Earth and Environmental. PhD student. I will develop some numerical models
  • Kenrick Ho - PGR in Music. I've worked with some music programming software before and am looking to learn new programming skills.
  • Harrison Tan - PhD student, Biology. Here for fun
  • Hi I'm Matt Beasley. I'm a Clinical Doctoral Research Fellow working in Radiotherapy research at Leeds. I'm hoping on using python to help with a couple of projects as part of my PhD over the next few years.
  • Hi, Sylwia Orynek. 1st year PhD student in School of Design. Researching 3D hand weaving and parametric software
  • Hi, I'm Andreia Miranda. I'm Portuguese and a PhD student in embryology. I want to learn Python to help me in my research
  • Peter Pittaway - PGR in Chemical Engineering, interested in automated discovery platforms for polymer synthesis, which requires some programming knowledge!
  • Saul Castaneda, PhD in Mechanical Engineering, Interested in programming in general.
  • Rowan Taylor, final year PhD in Molecular and Cellular Biology, FBS, want to learn Python for Bioinformatics
  • Nikesh Patel, Research Fellow in FBS, partly here for fun, mostly have a lot of data to analyse and graphs to plot.
  • Natalie Chaddock - originally from Manchester - 3rd year PhD in Genetic Epidemiology/Bioinformatics - interested in learning Python for data scientist roles following PhD, and to branch out from constantly using R
  • Ruben Mujica-Mota, Assoc Prof in Health economics, learn Python to apply Machine Learning for causal inference research
  • Jim Zhong - Clinical Research Fellow in Leeds, background is radiology trainee and using python for PhD research looking at imaging and clinical datasets
  • Mohamed Derar - PhD Medicine - 3rd year - would like to use Python to write scripts to analyse genomic data
  • Daliya Kaskirbayeva - Research Fellow, Academic Unit of Health Economics, data preparation
  • Prasetyaning Diah Lestari, Indonesia, 1st year PhD student at ITS
  • Albin Mejzini - London - 3rd year PhD Molecular Biology
  • Tarun Kakkar, Research Fellow, LICAMM
  • Hi, I'm Yue, a first-year PhD student in the School of Chemistry

Further reading

Misc code snippets

Mounting google drive

Mount your Google Drive in your Colab notebook. You will need to run this every time you start a new notebook.

# Connect my Google Drive to Google Colab
from google.colab import drive
drive.mount('/content/gdrive')

Change directory to the training folder. To find the path, click the three dots next to the directory in the file tree on the left (click the Files button if the tree isn't visible), select Copy path, then paste it into the box with Ctrl-V.

%cd /content/gdrive/MyDrive/Colab Notebooks/arc_training/swd1a-python-2021-10

Check your working directory for the notebook

! pwd

Download data:

# But first we need to download it
!wget -O data.zip https://arctraining.github.io/python-2021-04/data/portal-teachingdb-master.zip

Ceiling division in Python

Python has no dedicated operator for ceiling division. The usual approach is to divide normally and round the result up with math.ceil from the built-in math module.

import math
math.ceil(7 / 3)
3
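Because `7 / 3` produces a float before rounding, very large integers can lose precision along the way. A minimal sketch of an integer-only alternative is shown below; `ceil_div` is a hypothetical helper name used for illustration, not something provided by the standard library.

```python
import math


def ceil_div(a, b):
    """Ceiling of a / b using only integer arithmetic (hypothetical helper)."""
    # Floor division rounds down, so negating twice rounds up instead.
    return -(-a // b)


print(ceil_div(7, 3))    # 3, same as math.ceil(7 / 3)
print(ceil_div(6, 3))    # 2, exact division is unchanged
print(ceil_div(-7, 3))   # -2, rounds towards positive infinity
```

For everyday numbers both approaches give the same answer, so `math.ceil` remains the more readable choice.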

Extending the base list class

This isn't recommended, but the implementation below creates a child class of list with an extra method, add_function. The method adds together the item at the index passed and the item at the next index.
Maybe you can spot the potential error cases?

class newList(list):
    def add_function(self, key):
        return super(newList, self).__getitem__(key) \
            + super(newList, self).__getitem__(key + 1)

val = [1, 2]
val = newList(val)
val.add_function(0)
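Since subclassing list is discouraged here, one alternative worth considering is a plain function that takes the sequence as an argument. This is only a sketch: `add_adjacent` is a made-up name, and the function has the same edge cases as the method above.

```python
def add_adjacent(items, index):
    """Return items[index] + items[index + 1] (hypothetical helper)."""
    return items[index] + items[index + 1]


val = [1, 2]
print(add_adjacent(val, 0))   # 3
```

A plain function like this works on any indexable sequence, such as a tuple, without changing the list type itself.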

Challenges

Challenge 1: A Fibonacci sequence function

  1. Write a function that computes the Fibonacci sequence to a given length.

E.g.

fib(10)
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

Hints

To solve this you may want to read about another Python sequence type called range. range(10) creates an immutable sequence of numbers that is useful for iterating over. However, a range is neither a list nor a generator: it computes its values on demand instead of storing them, so it prints as range(0, 10) until you iterate over it or convert it with list().

>>> range(10)
range(0, 10)
>>> list(range(10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> for x in range(10):
...     print(x)
...
0
1
2
3
4
5
6
7
8
9
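A short sketch of how a range object behaves in practice: it is a sequence, so it supports len(), indexing, slicing and membership tests, and it can be iterated over more than once. The variable names are just for illustration.

```python
r = range(10)

print(len(r))     # 10
print(r[3])       # 3
print(r[2:5])     # range(2, 5)
print(7 in r)     # True

# Unlike a generator, a range can be iterated more than once.
first_pass = list(r)
second_pass = list(r)
print(first_pass == second_pass)   # True
```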

Challenge 2: summary data

  1. How many recorded individuals are female (F) and how many are male (M)?
  2. What happens when you group by two columns using the following syntax and then calculate mean values?
  • grouped_data2 = surveys_df.groupby(['plot_id', 'sex'])
  • grouped_data2.mean()
  3. Summarise weight values for each site in your data. HINT: you can use the following syntax to only create summary statistics for one column in your data: by_site['weight'].describe()
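The hint above uses a by_site variable that is not defined on this page. The sketch below shows one way it could be created, assuming that plot_id identifies a site. It uses a tiny invented DataFrame (toy_df) rather than the real surveys_df, so the numbers are made up and only the syntax carries over.

```python
import pandas as pd

# Toy stand-in for surveys_df; the values are invented for illustration.
toy_df = pd.DataFrame({
    "plot_id": [1, 1, 2, 2, 2],
    "sex": ["F", "M", "F", "F", "M"],
    "weight": [40.0, 48.0, 35.0, 45.0, 52.0],
})

# Group the rows by site (assumed here to be the plot_id column).
by_site = toy_df.groupby("plot_id")

# Summary statistics for the weight column only, one row per site.
summary = by_site["weight"].describe()
print(summary)
```

Selecting the column before calling describe() keeps the output to a single set of statistics per group.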