Software Carpentry Intro to Python Workshop - Day 2

UCSD Biomedical Library Building, Classroom 4 - 9:00am - 4:00pm

November 19-20, 2019


This HackMD: https://bit.ly/2XxSArl

Sign in here

name, affiliation, dept/lab
Reid Otsuji, librarian, library
Ryan Johnson, librarian, library
Anand Saran, Postdoc, UCSD, Zarrinpar lab
LiYun Hsu, UCSD, School of Pharmacy MS of DDPM
Alexandra Akscyn, UCSD, School of Pharmacy MS of DDPM
Anjanei Dhayalan, UCSD, School of Pharmacy MS of DDPM
Thania Bejarano, UCSD, Urban Studies and Planning
Viona Deconinck, UCSD, Visual Arts
Amulya Lingaraju, UCSD, Postdoc, Zarrinpar Lab
Meng-Ping Hsieh, UCSD, MS in DDPM

Getting help after the workshop

Day 2 Git Notes

git is the version control program that runs locally (i.e., on your machine). GitHub is the online place to host repositories and collaborate.

Line endings

Macs (and Linux):


git config --global core.autocrlf input


Windows:

git config --global core.autocrlf true
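
To see what these settings normalize, here is a minimal Python sketch of the two line-ending conventions. The sample text is made up, and the byte replacement only approximates the conversion git performs; it is not git's actual implementation.

```python
# Windows ends each line with CR+LF (b"\r\n"); macOS and Linux use LF (b"\n").
windows_text = b"first line\r\nsecond line\r\n"

# Roughly what git does on commit when core.autocrlf is input or true:
# CRLF is converted to LF before the content is stored.
normalized = windows_text.replace(b"\r\n", b"\n")

print(windows_text)
print(normalized)
```

With the value true, git also converts LF back to CRLF when it writes files to your working folder; with input, it leaves them as LF.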

Other settings

Text editor (sets text editor as nano):

git config --global core.editor "nano -w"

Checking global settings:

git config --list

What else can you configure?

git config --help

Working with git

Initialize a git repository in your current working directory

git init

This creates a hidden directory named .git, where git stores the repository's history.

To find out what's going on with git in your folder:

git status

Note: don't nest version controlled folders - it gets very complicated very fast!

Git notices every change made inside your repository folder. Once you make changes, run git status again to see what has changed.

Tell git to track a certain file using:

git add filename.ext

Record the changes to this file using:

git commit -m "[message of commit]"

Note: every commit needs a message. If you leave out -m, git opens your configured text editor so you can write one.

Important: the order of operations is git add, then git commit.

These 3 commands - git status, git add, and git commit - are the majority of what you'll probably do with git!

See history of commits (most recent commit listed first):

git log

If your history gets long, see an abbreviated history using:

git log --oneline

See the changes between the current state of a file and its last commit:

git diff

(This shows the diff for every file that has changed; to limit it to one file, give the file name after git diff.)
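
The output of git diff uses the unified diff format. As a rough illustration, Python's standard difflib module can produce the same style of output for two versions of a text. The file contents below are invented for the example, and this is not how git computes its diffs internally.

```python
import difflib

# Two hypothetical versions of the same file: last commit vs. working copy.
committed = ["Cold and dry, but everything is my favorite color\n"]
working = [
    "Cold and dry, but everything is my favorite color\n",
    "The two moons may be a problem for Wolfman\n",
]

diff_lines = list(
    difflib.unified_diff(committed, working,
                         fromfile="a/mars.txt", tofile="b/mars.txt")
)
print("".join(diff_lines))
```

Lines starting with + exist only in the newer version, and lines starting with - exist only in the older one.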

To see changes that are staged (you've done git add but not git commit):
git diff --staged

Other git log options: use git log --patch to see the full diff introduced by each commit.

To list just the names of the files changed in each commit: git log --name-only --oneline

Note: There are a lot of option 'flags' for these commands (the parts that start with --). The full documentation is long, but it is useful once you have an idea of what you're looking for. For instance, for git log: https://www.git-scm.com/docs/git-log

How can you compare against older versions, not just the most recent commit? To see the diff against the commit before the most recent one:

git diff HEAD~1 filename.txt

To see the diff against the commit two back:

git diff HEAD~2 filename.txt

and so on.

You can also use the alphanumeric identifier of a commit. For example,

git diff 07c1c262 mars.txt

shows the difference between the current mars.txt and its state at commit 07c1c262.

We've seen how to view differences between versions of a file, but how can we actually roll a file back to undo changes?

This is what git checkout does. For example, to roll a file back to a previous state:

git checkout 07c1c262 mars.txt

To get back to the most recent version:
git checkout master filename.txt

Working with GitHub

GitHub: https://github.com/

You can connect a "remote" repository (i.e., one on GitHub) to the local repository in your folder. From within your local git repository:

git remote add origin [enter your https://githubURLrepo.git URL]

See what remote repos you have using

git remote -v

To send changes from local to remote GitHub, use git push:

git push origin master

"origin" here is the GitHub repo and "master" is master branch of local repo

To pull changes from remote GitHub to local, use git pull:

git pull origin master

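You may get a pop-up asking you to log in to GitHub when pushing or pulling changes. You can store your credentials so you are not asked each time (note that the store helper saves them unencrypted on disk, and that GitHub now expects a personal access token in place of your account password):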

git config credential.helper store
git push [your URL for repo: https://github.com/owner/repo.git]

Username for 'https://github.com': <USERNAME>
Password for 'https://USERNAME@github.com': <PASSWORD>

You can also use the following to "cache" your credentials on a Windows machine:
git config --global credential.helper wincred

The .gitignore file is useful when there are files you don't want git to track, such as a subfolder you want to keep in your local repository but not push to GitHub. You can create a .gitignore file and list the files, file extensions, or folders that git should ignore. For example, run nano .gitignore to create and open the file, then add the line *.csv, and git will ignore all files with the extension .csv.
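
A minimal sketch of how a pattern such as *.csv selects files, using Python's standard fnmatch module. This only approximates git's matching (real .gitignore rules also cover directories, negation with !, and more), and the file names are hypothetical.

```python
from fnmatch import fnmatch

ignore_patterns = ["*.csv"]  # the contents of a hypothetical .gitignore
filenames = ["inflammation-01.csv", "analysis.py", "notes.txt", "results.csv"]

# A file is ignored if it matches at least one pattern.
ignored = [f for f in filenames if any(fnmatch(f, p) for p in ignore_patterns)]
tracked = [f for f in filenames if f not in ignored]

print("ignored:", ignored)
print("still visible to git:", tracked)
```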

Python Day 2

f = 'inflammation-01.csv'
import matplotlib.pyplot as plt

plt.plot
import numpy as np
data = np.loadtxt(f, delimiter=',')

data

Exercise

In [44]: name = 'Newton'

In [45]: for i in range(4):
    ...:     print(i, name)
    ...:
0 Newton
1 Newton
2 Newton
3 Newton

for letter in name:
    print(letter)

Day 2 - afternoon - Python continued

Exercise: Print out the letters in the string 'Newton' using a for loop.

name = 'Newton'
list(name)
for i in name:
    print(i)

Exercise: Use a for loop to reverse a string.

name = 'Newton'
newstring = ''   # start with an empty string
for i in name:
    newstring = i + newstring   # put each letter in front
    print(newstring)
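
The same loop can be wrapped in a small function so the accumulator always starts empty. The name reverse_string is just a label chosen for this sketch. The last line compares the result with Python's slice notation, which reverses a string in one step.

```python
def reverse_string(text):
    """Return text with its characters in reverse order."""
    reversed_text = ''
    for letter in text:
        # Put each new letter in front of what has been collected so far.
        reversed_text = letter + reversed_text
    return reversed_text

print(reverse_string('Newton'))
print(reverse_string('Newton') == 'Newton'[::-1])
```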

Exercise: Write a for loop to sum positive and negative numbers separately.

positive_sum = 0
negative_sum = 0
test_list = [3, 4, 6, 1, -1, -5, 0, 7, -8]

for i in test_list: 
    if i>0: 
        positive_sum = positive_sum + i 
    else: 
        negative_sum = negative_sum + i
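
One possible variation, sketched here as a function (the name sum_by_sign is made up for this example), skips zeros explicitly and returns both totals so they can be checked.

```python
def sum_by_sign(numbers):
    """Return (sum of positive values, sum of negative values)."""
    positive_sum = 0
    negative_sum = 0
    for value in numbers:
        if value > 0:
            positive_sum += value
        elif value < 0:
            negative_sum += value
        # Zeros fall through and change neither total.
    return positive_sum, negative_sum

test_list = [3, 4, 6, 1, -1, -5, 0, 7, -8]
print(sum_by_sign(test_list))
```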

Exercise: Write a for loop to sort file types into buckets.

from glob import glob

data_files = []
image_files = []
other_things = []

filenames = glob('*')
for filename in filenames:
   if 'inflammation' in filename and '.csv' in filename: # This is a data file.
      data_files.append(filename)
   elif '.png' in filename: # This is an image file.
      image_files.append(filename)
   else:                    # Neither a data nor an image file.
      other_things.append(filename)
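
A substring test like '.csv' in filename also matches names such as data.csv.bak. A slightly stricter sketch checks how the name starts and ends instead. The file names below are hypothetical stand-ins for what glob('*') would return.

```python
# Hypothetical directory listing, standing in for glob('*').
filenames = ['inflammation-01.csv', 'inflammation-01.png', 'small-01.csv',
             'inflammation-01.csv.bak', 'script.py']

data_files = []
image_files = []
other_things = []

for filename in filenames:
    if filename.startswith('inflammation') and filename.endswith('.csv'):
        data_files.append(filename)      # inflammation data file
    elif filename.endswith('.png'):
        image_files.append(filename)     # image file
    else:
        other_things.append(filename)    # everything else

print(data_files, image_files, other_things)
```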

Documenting a function.

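import matplotlib.pyplot
import numpy

def make_plots(filename):
  """
  Function to make plots of patient data.

  Example: fig = make_plots(data_filename)
  """
  fig = matplotlib.pyplot.figure(figsize=(10,3))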
  
  data = numpy.loadtxt(filename, delimiter=',')
  ax1 = fig.add_subplot(1, 3, 1)
  ax2 = fig.add_subplot(1, 3, 2)
  ax3 = fig.add_subplot(1, 3, 3)
  
  ax1.set_ylabel('Average')
  ax2.set_ylabel('Max')
  ax3.set_ylabel('Min')
  
  ax1.plot(numpy.mean(data, axis=0))
  ax2.plot(numpy.max(data, axis=0))
  ax3.plot(numpy.min(data, axis=0))
  
  return fig
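
A docstring is a string placed as the first statement of a function. Python stores it on the function, and help() or IPython's ? displays it. A minimal sketch with a made-up function (fahrenheit_to_celsius is not part of the workshop code):

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius.

    Example: fahrenheit_to_celsius(212) returns 100.0
    """
    return (temp_f - 32) * 5 / 9

# The docstring is available at runtime through the __doc__ attribute.
print(fahrenheit_to_celsius.__doc__)
print(fahrenheit_to_celsius(212))
```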

Looking at Errors

In [1]: def favorite_ice_cream():
   ...:     ice_cream = ["chocolate","vanilla","strawberry"]
   ...:     print(ice_cream[3])
   ...:

In [2]: favorite_ice_cream
Out[2]: <function __main__.favorite_ice_cream>

In [3]: favorite_ice_cream()

IndexError                                Traceback (most recent call last)
<ipython-input-3-2abb2966448f> in <module>()
----> 1 favorite_ice_cream()

<ipython-input-1-5c427c5d4c54> in favorite_ice_cream()
      1 def favorite_ice_cream():
      2     ice_cream = ["chocolate","vanilla","strawberry"]
----> 3     print(ice_cream[3])
      4

IndexError: list index out of range
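
The list has three items, so the valid indices are 0, 1, and 2; asking for index 3 is what raises the error. A short sketch of one way to guard the lookup (the helper name get_flavor is hypothetical):

```python
ice_creams = ["chocolate", "vanilla", "strawberry"]

def get_flavor(index):
    """Return the flavor at index, or None if the index is out of range."""
    try:
        return ice_creams[index]
    except IndexError:
        return None

print(len(ice_creams))   # number of items; the last valid index is one less
print(get_flavor(2))
print(get_flavor(3))
```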


def some_function():
return 1

File "<ipython-input-8-0ebe323d3e9c>", line 2
return 1
^
IndentationError: expected an indented block
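
The fix is to indent the function body. As a sketch, the built-in compile() can check both versions without running them. The source strings below reproduce the example above.

```python
broken_source = "def some_function():\nreturn 1\n"
fixed_source = "def some_function():\n    return 1\n"

def compiles(source):
    """Return True if source is syntactically valid Python."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False

print(compiles(broken_source))
print(compiles(fixed_source))
```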

History for day 2

ls
f = 'inflammation-01.csv'
import matplotlib.pyplot as plt
plt.plot
import numpy as np
data = np.loadtxt(f, delimiter=',')
data
np.diff?
pwd
a = [0, 2, 5, 9, 14]
np.array(a)
a_array = np.array(a)
np.diff(a_array)
datadiff = np.diff(data, axis=1)
datadiff
matplotlib.pyplot.imshow(data)
plt.imshow(data)
plt.figure()
plt.imshow(datadiff)
plt.colorbar()
plt.figure()
plt.imshow(datadiff, cmap=plt.cm.bwr)
plt.colorbar()
history
plt.xlabel('Days')
plt.xlabel('Days')
plt.xlabel('Days')
plt.gcf().set_xlabel('Days')
plt.gca().set_xlabel('Days')
5**2
5*2
5**2
result = 1
for i in range(3):
    result = result*num
    
num = 5
for i in range(3):
    result = result*num
    
result
5**3
result = 1
for i in range(3):
    result = result*num
    print(result)
    
for i in range(3):
    result = result*num
    print(i,result)
    
result = 1
for i in range(3):
    result = result*num
    print(i,result)
    
animals = ['cat', 'dog', 'fish']
for i in animals:
    print(i)
    
range?
for animal in animals:
    print(animal)
    
    
name = 'Newton'
name = 'Newton'
list(name)
for i in name:
    print(i)
    
name
name = 'Newton'
list(name)
for i in name:
    print(i)
    
name
list(name)
name
name = 'Newton'
name = list(name)
for i in name:
    print(i)
    
name
name = 'Newton'
for i in name:
    print(i)
    
    
name
name = 'Newton'
for i in range(len(name)):
    print(name[i])
    
    
    
len(name)
name = 'Newton'
for i in range(len(name)):
    print(i,name[i])
    
    
    
name
newstring = ''
'New' + 'ton'
'New' + 'ton'
newstring
for i in range(len(name)):
    newstring = newstring + name[i]
    
newstring
newstring = ''
for i in range(len(name)):
    newstring = newstring + name[i]
    print(newstring)
    
for i in range(len(name)):
    newstring = newstring + name[i]
    print(i,newstring)
    
newstring = ''
for i in range(len(name)):
    newstring = newstring + name[i]
    print(i,newstring)
    
for i in name:
    newstring = newstring + i
    print(i,newstring)
    
newstring = ''
for i in name:
    newstring = newstring + i
    print(i,newstring)
    
newstring = ''
for i in name:
    newstring = i + newstring 
    print(newstring)
    
newstring = ''
newstring = ''
for i in name:
    newstring = i + newstring 
    print(i,newstring)
    
newstring = ''
for i, letter in enumerate(name):
    newstring = i + newstring 
    print(i,newstring)
    
    
newstring = ''
for i, letter in enumerate(name): 
    print(i,newstring)
    
    
    
newstring = ''
for i, letter in enumerate(name): 
    print(i,letter)
    
    
    
    
enumerate?
x = 5
coefficients = [2, 4, 3]
2*x**0 + 4*x**1 + 3*x**2
y = 2*x**0 + 4*x**1 + 3*x**2
for i, cc in enumerate(coefficients):
    print(i,cc)
    
y = 0
for i, cc in enumerate(coefficients):
    y = y + cc*x**i
    
y
for animal in animals:
    print(animal)
    
n = 0
for animal in animals:
    print(animal)
    n = n + 1
    print(n)
    
for i, animal in enumerate(animals):
    print(i,animal)
    
    
history
names = ['Curie', 'Newton', 'Turing']
names = ['Curie', 'Newtong', 'Turing']
names[1] = 'Newton'
names
name = 'Darwin'
name[0] = 'd'
names
name[1] = 'Darwin'
names = 'Darwin'
names = ['Curie', 'Newtong', 'Turing']
names[1] = 'Darwin'
names
name
name = 'darwin'
x = [1, 'Darwin', 3.14]
x
x = [['eggs', 'flour', 'sugar'],]
x = [['eggs', 'flour', 'sugar']]
x
x = [['eggs', 'flour', 'sugar'], ['onions', 'cucumbers', 'pepper']]
x
x[0]
x[0][0]
x[-1]
my_list = []
for baking_supply in x[0]:
    my_list.append(baking_supply)
    
my_list
num = 53
if num > 0:
    print('number is positive')
elif num==0:
    print('number is zero')
else:
    print('number is negative')
    
if num>50 and num<100:
    print('number is between 50 and 100')
elif num==0:
    print('number is zero')
else:
    print('number is negative')
    
if num<50 or  num>100:
    print('number is smaller than 50 or greater than 100.')
else:
    print('number is between 50 and 100.')
    
    
history
list(range(5))
x = 1
counter = 1
for i in range(10):
    counter = counter + 1
    print(counter)
    
for i in range(10):
    counter += 1
    print(counter)
    
counter = 1
for i in range(10):
    counter = counter*i
    print(counter)
    
    
for i in range(1,10):
    counter = counter*i
    print(counter)
    
    
counter = 1
for i in range(1,10):
    counter = counter*i
    print(counter)
    
    
counter = 1
for i in range(1,10):
    counter *= counter
    print(counter)
    
    
for i in range(1,10):
    counter *= i
    print(counter)
    
    
for i in range(1,10):
    counter = counter*i
    print(counter)
    
    
counter = 1
for i in range(1,10):
    counter -= i
    print(counter)
    
    
    
positive_sum = 0
negative_sum = 0
test_list = [3, 4, 6, 1, -1, -5, 0, 7, -8]
for i in test_list: 
    if i>0: 
        positive_sum = positive_sum + i 
    else: 
        negative_sum = negative_sum + i
        
positive_sum
negative_sum
for i in test_list: 
    if i>0:
        positive_sum = positive_sum + i 
    elif i==0:
        pass # Do nothing.
    else:
        negative_sum = negative_sum + i
        
ls
large_files = []
small_files = []
other_things = []
from glob import glob
filenames = glob("*")
filenames
string = "I'm hungry"
string = 'I'm hungry'
filenames
large_files
small_files
other_things
data_files = []
image_files = []
other_things = []
string
if 'hun' in string:
    print("'hun' is a substring of "+string)
    
'inflammation-' in 'inflammation-01.csv'
filenames
other_things
other_things.append('script.py')
other_things
from glob import glob

data_files = []
image_files = []
other_things = []

filenames = glob('*')
for filename in filenames:
   if '.csv' in filename:   # This is a data file.
      data_files.append(filename)
   elif '.png' in filename: # This is an image file.
      image_files.append(filename)
   else:                    # Neither a data nor an image file.
      other_things.append(filename)
      
data_files
image_files
other_things
data_files
from glob import glob

data_files = []
image_files = []
other_things = []

filenames = glob('*')
for filename in filenames:
   if 'inflammation' in filename and '.csv' in filename:   # This is a data file.
      data_files.append(filename)
   elif '.png' in filename: # This is an image file.
      image_files.append(filename)
   else:                    # Neither a data nor an image file.
      other_things.append(filename)
      
data_files
from glob import glob

data_files = []
image_files = []
other_things = []

filenames = glob('*')
for filename in filenames:
   if 'inflammation' in filename:   # This is a data file.
      data_files.append(filename)
   elif '.png' in filename: # This is an image file.
      image_files.append(filename)
   else:                    # Neither a data nor an image file.
      other_things.append(filename)
      
data_files
from glob import glob

data_files = []
image_files = []
other_things = []

filenames = glob('*')
for filename in filenames:
   if 'inflammation' in filename and 'csv' in filename:   # This is a data file.
      data_files.append(filename)
   elif '.png' in filename: # This is an image file.
      image_files.append(filename)
   else:                    # Neither a data nor an image file.
      other_things.append(filename)
      
data_files
ls
ls *png
glob('*.png')
def make_plots(filename):
  fig = plt.figure(figsize=(10,3))
  
  data = np.loadtxt(filename, delimiter=',')
  ax1 = fig.add_subplot(1, 3, 1)
  ax2 = fig.add_subplot(1, 3, 2)
  ax3 = fig.add_subplot(1, 3, 3)
  
  ax1.set_ylabel('Average')
  ax2.set_ylabel('Max')
  ax3.set_ylabel('Min')
  
  ax1.plot(np.mean(data, axis=0))
  ax2.plot(np.max(data, axis=0))
  ax3.plot(np.min(data, axis=0))
  
  return fig
  
make_plots?
def make_plots(filename):
  "Function to make plots of patient data."
  fig = plt.figure(figsize=(10,3))
  
  data = np.loadtxt(filename, delimiter=',')
  ax1 = fig.add_subplot(1, 3, 1)
  ax2 = fig.add_subplot(1, 3, 2)
  ax3 = fig.add_subplot(1, 3, 3)
  
  ax1.set_ylabel('Average')
  ax2.set_ylabel('Max')
  ax3.set_ylabel('Min')
  
  ax1.plot(np.mean(data, axis=0))
  ax2.plot(np.max(data, axis=0))
  ax3.plot(np.min(data, axis=0))
  
  return fig
  
make_plots?
def make_plots(filename):
  """
  Function to make plots of patient data.
  
  Example: fig = make_plots(data_filename)
  """
  fig = plt.figure(figsize=(10,3))
  
  data = np.loadtxt(filename, delimiter=',')
  ax1 = fig.add_subplot(1, 3, 1)
  ax2 = fig.add_subplot(1, 3, 2)
  ax3 = fig.add_subplot(1, 3, 3)
  
  ax1.set_ylabel('Average')
  ax2.set_ylabel('Max')
  ax3.set_ylabel('Min')
  
  ax1.plot(np.mean(data, axis=0))
  ax2.plot(np.max(data, axis=0))
  ax3.plot(np.min(data, axis=0))
  
  return fig
  
  
make_plots?
np.mean?
def favorite_ice_cream():
    ice_creams = ['chocolate', 'vanilla', 'strawberry']
    print(ice_creams[3])
    
favorite_ice_cream()
def some_function()
def some_function():
return 1
print(a)
print(b)
count = 1
Count = 1
count is Count
count == Count
file_handle = open('myfile.txt', 'r')
file_handle = open('inflammtion-01.csv', 'r')
file_handle = open('inflammation-01.csv', 'r')
clear
numbers = [1.5, 2.3, 0.7, -0.001, 4.4]
total = 0
for n in numbers:
    total += n
    
total
total = 0
for n in numbers:
    assert n>0
    total += n
    
for n in numbers:
    assert n>0, "Data should only contain positive values."
    total += n
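
The assertion in the loop above stops at the first value that fails the check. A self-contained sketch of the same idea, with the failure caught so the message can be inspected (the function name positive_total is made up for this example):

```python
def positive_total(numbers):
    """Sum numbers, insisting that every value is positive."""
    total = 0
    for n in numbers:
        assert n > 0, "Data should only contain positive values."
        total += n
    return total

numbers = [1.5, 2.3, 0.7, -0.001, 4.4]
try:
    positive_total(numbers)
except AssertionError as error:
    print("Assertion failed:", error)
```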
    