# Software Carpentry Intro to Python Workshop
### UCSD Biomedical Library Building, Classroom 4 - 9:00am - 4:00pm
### November 19-20, 2019
---
### This HackMD: https://bit.ly/2XduD8q
Workshop page: https://ucsdlib.github.io/2019-11-19-UCSD
**Name, Affiliation, Dept. or lab**
Reid Otsuji, librarian, Library
Ryan Johnson, librarian, Library
Tamara Bozich, Library
Julie Cakici, UCSD, FMPH, JDP Bloss Lab
LiYun Hsu, UCSD, School of Pharmacy MS of DDPM
Justin Shaffer, UCSD, Pediatrics (Knight Lab)
Viona Deconinck, UCSD, Visual Arts
Anjanei Dhayalan, UCSD, MS-DDPM
Alexandra Akscyn, UCSD, School of Pharmacy, Drug Dev MS
Kendra Scheer, UCSD, Undergraduate, Paleoethnobotony Lab
Joy Kumagai, UCSD, SIO (Aburto Lab)
Anand Saran, UCSD, Postdoc, Zarrinpar lab
Erica Maissy, UCSD, Biomedical Sciences (Zarrinpar Lab)
Amulya Lingaraju, UCSD, Postdoc, Zarrinpar Lab
Etran Chane McComic, UCSD,MS-DDPM
Meng-Ping Hsieh, UCSD, MS in DDPM
kaiyang tan,ucsd, undergraduate economics
Thania Bejarano, UCSD, Urban Studies and Planning
Razvan Amironesei, UCSD, Philosophy
# Day 1 - Morning - Shell
# Link to stuff
https://tinyurl.com/yx3kf4ay
https://drive.google.com/open?id=1Mm22A7Fk6ajO53Nha48jkZuPAupuyrX1
## Add notes here:
##### Notes on terminal
'ls' lists the contents of the current directory ('pwd' tells you where you are)
'man ls' opens the manual [quit using 'q']; on Windows 'man ls' doesn't work, use 'ls --help' instead
a dash followed by a letter (e.g. '-F') is a flag
q is short for quit
'nano' opens a text editor; save the file as name.txt
'-F' displays a slash immediately after each pathname that is a directory
~ = home directory (the root directory is /)
'ls -a' shows hidden files
Exit nano = Control X (also on Mac)
mkdir = make directory
cp = copy
Rename = mv (move)
cd = change directory
cd .. = goes up one directory
autocomplete with tab
ls = lists
cat = concatenate (prints the contents of a file)
? = single character wild card
Control C = cancels the command you are typing or running
head = shows the first 10 lines of a file
echo = prints text; combined with > or >> the text goes into a file
>> = appends output to the end of a file
> = writes output to a file (overwrites what was there)
| = pipe: sends the output of one command into the next command
loop = repeats commands for each file in a list (e.g. renaming many files)
##### Notes on project
Download the data-shell file from the Google Drive.
```
ls #this grey box in the HackMD is called a code chunk
```
```
pwd
```
```
man ls
# for windows [command] --help -will bring up the manual
# man [command] -will bring up manual in mac terminal
```
### Challenge 1
```
You can also use two options at the same time. What does the command ls do when used with the -l option? What about if you use both the -l and the -h option?
Some of its output is about properties that we do not cover in this lesson (such as file permissions and ownership), but the rest should be useful nevertheless.
```
## Navigating files
```
ls Desktop
```
```
cd
cd .. #up one directory level
cd ~
cd [path]
cd -
```
```
cd data-shell #move into the folder
```
```
#combining flags - can run them together
ls -l -a
ls -la
```
## working with files
```
nano #built-in text editor
#in nano to save a file use ^O for WriteOut, name the file then use ^X to exit
```
```
mkdir #make directory
```
```
touch [filename.txt] #will create an empty txt file
```
```
mv #move command
```
```
cp
cp -r thesis/ data/
```
```
* #wildcard
? #single character wildcard
```
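The same wildcard rules can be tried from Python with the standard-library `fnmatch` module. This is only a sketch: the file names in `names` are made up for illustration, and the shell has a few extra rules (hidden files, for example) that are not shown here.

```python
from fnmatch import fnmatch

# Hypothetical file names, for illustration only
names = ['ethane.pdb', 'methane.pdb', 'octane.pdb', 'notes.txt']

# '*' matches any run of characters (including none); '?' matches exactly one
pdb_files = [name for name in names if fnmatch(name, '*.pdb')]
t_ne_files = [name for name in names if fnmatch(name, '*t??ne.pdb')]

print(pdb_files)   # ['ethane.pdb', 'methane.pdb', 'octane.pdb']
print(t_ne_files)  # ['ethane.pdb', 'methane.pdb']
```

`octane.pdb` is left out of the second list because only one character sits between the `t` and `ne`, and `??` needs exactly two.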
### Pipes and Filters
```
wc #word count
wc -l #line count
wc -c #character (byte) count
```
```
wc *.pdb | sort #basic pipe with 2 commands
```
```
ctrl + c #will stop the command that is currently running
```
```
sort
```
```
head #view the first 10 lines of a file
head -n [number of lines to view]
tail #view the last 10 lines of a file
```
```
echo #print display
echo $(date) # display timestamp
```
### pipes
```
| #this is a pipe
```
```
sort -n lengths.txt | head -n 1
```
```
sort -n lengths.txt | tail -n 1
```
```
wc -l *.pdb | sort -n
```
```
wc -l *.pdb | sort -n | head -n 1
```
```
cat animals.txt | head -n 5 | tail -n 3 | sort -r > final.txt
```
follow north-pacific-gyre examples in lesson
## loops
```
# loop format example
for thing in list_of_things
do
    operation_using $thing #indent code here with spaces
done
```
```
for filename in basilisk.dat minotaur.dat unicorn.dat
do
    head -n 2 $filename | tail -n 1
done
```
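Why `head -n 2 $filename | tail -n 1` prints only the second line can be seen by doing the same two steps in Python. A minimal sketch, assuming made-up file contents held in a list called `lines` (only the CLASSIFICATION line comes from these notes):

```python
# Hypothetical file contents, one string per line
lines = ['COMMON NAME: basilisk',
         'CLASSIFICATION: basiliscus vulgaris',
         'UPDATED: 1745-05-02',
         'CCCCGGTTACCGACA']

first_two = lines[:2]        # what `head -n 2` passes along the pipe
second_line = first_two[-1]  # what `tail -n 1` keeps from that

print(second_line)  # CLASSIFICATION: basiliscus vulgaris
```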
If a loop is run before you meant to, ctrl + c will kill the loop
### Shell notes
https://swcarpentry.github.io/shell-novice/reference/
`ls` - lists what is in the directory
`pwd` - prints the path of the current working directory
`man ls` - opens manual
`q` - quit from the manual's interface
`ls -F` - adds a slash (/) after directory names (a trailing forward slash indicates a folder)
`ls -l` - lists file details in long format (`ls -long` also runs, but only because it is read as the separate flags `-l -o -n -g`)
bash profile is a hidden file to customize terminal to desired effects (ex font color)
`ls -h` makes file size human readable
Navigate to google drive folder; download data folder to desktop
Find the folder on the desktop using terminal
`cd Desktop` - moves to the Desktop
`cd ..` - moves you up one directory
`cd ~` - moves you back to your home directory
up arrow can toggle through last commands
`cd` and `ls` command will only look in current path unless you specify the full path
`ls -la` - uses both the `-l` (long) and the `-a` flags (`-a` shows hidden files)
`.bash_profile` in data-shell is where you can customize font color etc.
#### Nano
`nano` opens a blank text editor; saving in nano saves to the current directory by default
`nano draft.txt` - will open nano with a file named "draft.txt"
`^` - means control, example ctrl+X in nano = exit
'File Name to Write: path/to/where/you/want/flowers.txt'
typing tab after will autofill, if there is multiple it will list options
#### Manipulating directories and files
`mkdir` - makes directory
`mkdir thesis` - makes directory in current path called thesis
`touch delp.txt` - creates a new blank file without opening
strive for useful simple file names
data_trim_removedAdapters (camelCase = capital letter at the start of each new word)
use accepted abbreviations in field, make sure they align with norms
`mv` move a file between directories
`mv draft.txt ../` moves the file up a directory
`mv path/draft.txt ./`
`pwd` - returns full path of the present working directory
`mv textfilename newtextfilename` changes the name of the file
`cp -r thesis/ data/`
to navigate down one directory, type `cd t` and press tab (to be lazy); it should autofill to `thesis`
`rm` short for remove
`touch quotations.txt` adds new (empty) txt file
`rm -i quotations.txt` asks you before removing (maybe add to alias in future)
`source filename` runs the commands in that file in the current shell (e.g. reloads a settings file after you edit it)
`rm -r thesis` removes the thesis directory
`ls molecules/` what's in the molecules
#### Wildcards
example `ls *t??ne.pdb` will return both 'ethane' and 'methane'
`wc` is word count
`*` is wildcard
`wc *.pdb` returns data on all .pdb file types in folder
`wc -l *.pdb > lengths.txt`
`sort lengths.txt`
`wc` output columns appear in the order lines, words, characters
`sort -n lengths.txt > lengths_sorted.txt` (sorts numerically from smallest to largest and creates a new file)
`cat lengths_sorted.txt` output text of the file to terminal
`echo time` echo is a print function
`sort -n lengths.txt | head -n 1` returns the first line (lowest value)
`sort -r lengths.txt` - returns the lines in reverse (descending) order
`cat animals.txt | head -n 5 | tail -n 3 | sort -r > final.txt`
`ls *[AB].txt` - brackets match any one of the listed characters (an "or")
`ls *A.txt; ls *B.txt` semi-colon runs one command after the other (an "and")
`head -n 5 basilisk.dat minotaur.dat unicorn.dat` in creatures directory will return the data
beginning a loop, adds a > when you hit enter
```
for filename in basilisk.dat minotaur.dat unicorn.dat
> do
> head -n 2 $filename | tail -n 1
> done
```
Outputs:
CLASSIFICATION: basiliscus vulgaris
CLASSIFICATION: bos hominus
CLASSIFICATION: equus monoceros
# Day 1 - Afternoon - Intro Python
## Setup
apaloczy@ucsd.edu -
Install basic python libraries if you don't have them already. Miniconda is a good option:
https://docs.conda.io/en/latest/miniconda.html
Then open a terminal and install three more libraries:
```
conda install ipython numpy matplotlib
```
Download the dataset we'll be working with:
http://swcarpentry.github.io/python-novice-inflammation/data/python-novice-inflammation-data.zip
Then unzip it and cd to the directory 'data'.
To start an interactive Python session, type "ipython" in the terminal
Import Library 'numpy':
```
import numpy
```
"==" checks to see if one variable is the same as the other. Returns a True or False.
Indexing starts at 0, not 1... so to access the second row and second column you have to do
`data[1,1]` not `data[2,2]`
`data[0,-1] # '-' counts from the last`
`data[row, column]` (row index first, then column index)
":" is a slicer (on its own it selects everything along that axis)
```
import time
time.ctime() # returns the current time as a string
```
```
numpy.max(data[0,:])
numpy.max(data, axis=0)
numpy.max(data, axis=1)
```
Axis = 1 -> horizontal: "sweeps" across the columns, giving one value per row
Axis = 0 -> vertical: "sweeps" down the rows, giving one value per column
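To make the indexing, slicing, and axis notes above concrete, here is a minimal sketch on a tiny made-up array. The name `small` and its values are assumptions for illustration and are not the workshop data.

```python
import numpy

# Hypothetical data: 2 patients (rows) x 3 days (columns)
small = numpy.array([[1, 5, 2],
                     [4, 0, 6]])

second_row_second_col = small[1, 1]   # indexing starts at 0
last_in_first_row = small[0, -1]      # a negative index counts from the end
first_row = small[0, :]               # ':' takes everything along that axis

max_per_day = numpy.max(small, axis=0)      # one value per column
max_per_patient = numpy.max(small, axis=1)  # one value per row

print(second_row_second_col, last_in_first_row)  # 0 2
print(max_per_day)      # [4 5 6]
print(max_per_patient)  # [5 6]
```

With the real `data` array, the same calls return one value per day for `axis=0` and one value per patient for `axis=1`.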
```
numpy.mean() # gets the average
numpy.min() # gets the minimum value
```
```
# import so we can make plots
import matplotlib.pyplot
```
```
matplotlib.pyplot.imshow(data)
```
```
matplotlib.pyplot.show() #run this if your figure doesn't pop up
```
```
# adding x label
matplotlib.pyplot.xlabel('Days', fontsize = 20) # adding a label and adjusting fontsize
# adding y label
matplotlib.pyplot.ylabel('Patient ID', fontsize = 20) # adding a label and adjusting fontsize
```
```
# open new figure
fig = matplotlib.pyplot.figure(figsize=(10,3)) # this creates a figure object
fig.clf() # clear figure
axis1 = fig.add_subplot(1,3,1)
axis2 = fig.add_subplot(1,3,2)
axis3 = fig.add_subplot(1,3,3)
#axis labels
axis1.set_ylabel('Average', fontsize=15)
axis2.set_ylabel('max', fontsize=15)
axis3.set_ylabel('min', fontsize=15)
#plot
axis1.plot(numpy.mean(data, axis=0))
axis2.plot(numpy.max(data, axis=0))
axis3.plot(numpy.min(data, axis=0))
axis1.cla() # run this only if you want to clear an axis (removes its plot and labels)
```
### Loops
```
for n in range(1,9):
    file = 'inflammation-0' + str(n) + '.csv'
    data = numpy.loadtxt(file, delimiter=',')
```
make sure you check the indentations in the loop
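Why the indentation matters: a small sketch with made-up names (`total`, `steps`, `summary`) showing which lines repeat and which run only once.

```python
# Lines indented under the 'for' run once per item;
# the first unindented line runs once, after the loop ends.
total = 0
steps = []
for n in range(1, 4):
    total = total + n       # inside the loop: runs 3 times
    steps.append(total)     # inside the loop
summary = 'final total: ' + str(total)  # outside the loop: runs once

print(steps)    # [1, 3, 6]
print(summary)  # final total: 6
```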
```
for n in range(1,9):
    file = 'inflammation-0' + str(n) + '.csv'
    if n==1:
        data = numpy.loadtxt(file, delimiter=',')
    else:
        data = data + numpy.loadtxt(file, delimiter=',')
data
```
```
from glob import glob
filenames = glob('inflammation*.csv')
filenames.sort()
for filename in filenames:
    print(filename)
    fig = matplotlib.pyplot.figure(figsize=(10,3))
    data = numpy.loadtxt(filename, delimiter=',')
    ax1 = fig.add_subplot(1, 3, 1)
    ax2 = fig.add_subplot(1, 3, 2)
    ax3 = fig.add_subplot(1, 3, 3)
    ax1.set_ylabel('Average')
    ax2.set_ylabel('Max')
    ax3.set_ylabel('Min')
    ax1.plot(numpy.mean(data, axis=0))
    ax2.plot(numpy.max(data, axis=0))
    ax3.plot(numpy.min(data, axis=0))
    figname = filename[:-3] + 'png'
    fig.savefig(figname)
```
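The `filename[:-3] + 'png'` step in the loop above builds the name of the output image. Here is a short sketch of that slicing next to two alternatives; the variable names `by_slicing`, `by_replace`, and `by_splitext` are made up for illustration, and `os.path.splitext` was not part of the session.

```python
import os

filename = 'inflammation-01.csv'   # example name following the dataset's pattern

by_slicing = filename[:-3] + 'png'                     # drop the last 3 characters, add 'png'
by_replace = filename.replace('csv', 'png')            # swaps every 'csv' found in the name
by_splitext = os.path.splitext(filename)[0] + '.png'   # splits off the extension, whatever its length

print(by_slicing)   # inflammation-01.png
print(by_slicing == by_replace == by_splitext)  # True
```

Slicing assumes a three-letter extension, and `replace` would also change a 'csv' elsewhere in the name, so `splitext` may be the safer choice for less regular file names.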
```
matplotlib.pyplot.close('all') # close all open plot windows.
```
## functions
```
def fahrenheit2celsius(temp_in_fahrenheit):
    temp_in_celsius = (temp_in_fahrenheit - 32)*5/9
    return temp_in_celsius
```
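One quick way to check a function like this is to call it with values whose answers are already known. A minimal sketch, repeating the definition so it runs on its own; the names `freezing` and `boiling` are made up for illustration.

```python
def fahrenheit2celsius(temp_in_fahrenheit):
    temp_in_celsius = (temp_in_fahrenheit - 32) * 5 / 9
    return temp_in_celsius

# Water freezes at 32 F (0 C) and boils at 212 F (100 C)
freezing = fahrenheit2celsius(32)
boiling = fahrenheit2celsius(212)

print(freezing, boiling)  # 0.0 100.0
```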
## Paste Exercise code below:
```
temperatures_fahr = [32, 72, 90, 451]
for i in temperatures_fahr:
    print(fahrenheit2celsius(i))
```
```
def make_plots(filename):
    fig = matplotlib.pyplot.figure(figsize=(10,3))
    data = numpy.loadtxt(filename, delimiter=',')
    ax1 = fig.add_subplot(1, 3, 1)
    ax2 = fig.add_subplot(1, 3, 2)
    ax3 = fig.add_subplot(1, 3, 3)
    ax1.set_ylabel('Average')
    ax2.set_ylabel('Max')
    ax3.set_ylabel('Min')
    ax1.plot(numpy.mean(data, axis=0))
    ax2.plot(numpy.max(data, axis=0))
    ax3.plot(numpy.min(data, axis=0))
    return fig
```
#### save figure function
```
def save_figures(filename, fig):
    figname = filename[:-3] + 'png'
    fig.savefig(figname)
```
#### Cleaned up loop for plotting and saving figures.
```
for filename in filenames:
    print(filename)
    fig = make_plots(filename)
    save_figures(filename, fig)
```
# Save IPython history (within IPython)
```
%save -r python_session_history 1-9999999
```
# IPython history for this session
```
numpy.max(data, axis=0)
import numpy
numpy.max(data, axis=0)
data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
numpy.max(data, axis=0)
numpy.max(data)
numpy.max(data, axis=1)
numpy.mean(data, axis=1)
numpy.min(data, axis=1)
numpy.min(data, axis=1)
data
import matplotlib.pyplot
matplotlib.pyplot.imshow(data)
matplotlib.pyplot.colorbar()
matplotlib.pyplot.clf()
matplotlib.pyplot.imshow(data)
matplotlib.pyplot.colorbar()
matplotlib.pyplot.colorbar()
matplotlib.pyplot.colorbar()
matplotlib.pyplot.colorbar()
matplotlib.pyplot.clf()
show()
matplotlib.pyplot
matplotlib.pyplot.imshow(data)
matplotlib.pyplot.colorbar()
matplotlib.pyplot.set_xlabel('Days')
matplotlib.pyplot.xlabel('Days')
matplotlib.pyplot.xlabel('Days', fontsize=20)
matplotlib.pyplot.ylabel('Patient ID', fontsize=20)
average_inflammation = numpy.mean(data, axis=0)
average_inflammation
matplotlib.pyplot.figure()
matplotlib.pyplot.plot(average_inflammation)
fig2 = matplotlib.pyplot.figure()
fig1 = matplotlib.pyplot.figure()
matplotlib.pyplot.plot(average_inflammation, figure=fig1)
matplotlib.pyplot.plot(average_inflammation, figure=fig2)
matplotlib.pyplot.plot(average_inflammation, figure=fig1)
matplotlib.pyplot.plot(average_inflammation, figure=fig2)
matplotlib.pyplot.xlabel('Days', fontsize=25)
matplotlib.pyplot.ylabel('Average inflammation', fontsize=25)
data
fig = matplotlib.pyplot.figure(fingsize=(10, 3))
fig = matplotlib.pyplot.figure(figsize=(10, 3))
fig = matplotlib.pyplot.figure(fingsize=(10, 3))
fig = matplotlib.pyplot.figure(figsize=(10, 3))
fig.add_subplots(1, 3, 1)
fig.add_subplot(1, 3, 1)
fig.clf()
axis1 = fig.add_subplot(1, 3, 1)
axis2 = fig.add_subplot(1, 3, 2)
axis3 = fig.add_subplot(1, 3, 3)
axis1.set_ylabel('Average', fontsize=15)
axis2.set_ylabel('Max', fontsize=15)
axis3.set_ylabel('Min', fontsize=15)
axis1.plot(numpy.mean(data, axis=0))
axis2.plot(numpy.max(data, axis=0))
axis3.plot(numpy.min(data, axis=0))
axis1.cla()
axis1.plot(numpy.mean(data, axis=0))
axis1.set_ylabel('Average', fontsize=15)
matplotlib.pyplot.figure()
matplotlib.pyplot.plot(numpy.std(data, axis=0))
get_ipython().run_line_magic('ls', '')
for n in range(9):
    file = 'inflammation-0' + str(n) + '.csv'
    print(n, str(n))
'string' + 1
'string' + str(1)
'string' + '1'
for n in range(9):
    file = 'inflammation-0' + str(n) + '.csv'
    print(file)
for n in range(9):
    file = 'inflammation-0' + str(n) + '.csv'
    numpy.loadtxt(file)
for n in range(1, 9):
    file = 'inflammation-0' + str(n) + '.csv'
    data = numpy.loadtxt(file, delimiter=',')
for n in range(1, 9):
    file = 'inflammation-0' + str(n) + '.csv'
    data = numpy.loadtxt(file, delimiter=',')
number = 3
if number>0:
    print('number is positive.')
elif number==0:
    print('number is zero')
else:
    print('number is negative')
number = 0
if number>0:
    print('number is positive.')
elif number==0:
    print('number is zero')
else:
    print('number is negative')
for n in range(1, 9):
    file = 'inflammation-0' + str(n) + '.csv'
    if n==1:
        data = numpy.loadtxt(file, delimiter=',')
    else:
        data = data + nump.loadtxt(file, delimiter=',')
for n in range(1, 9):
    file = 'inflammation-0' + str(n) + '.csv'
    if n==1:
        data = numpy.loadtxt(file, delimiter=',')
    else:
        data = data + numpy.loadtxt(file, delimiter=',')
data
from glob import glob
glob('inflammation-*.csv')
filenames = glob('inflammation-*.csv')
filenames
filenames.sort()
filenames
filenames
patient = 'Laura'
list(patient)
filenames = glob('inflammation-*.csv')
filenames
filenames
filenames.sort()
filenames
from glob import glob
filenames = glob('inflammation*.csv')
filenames.sort()
for filename in filenames:
    print(filename)
    fig = matplotlib.pyplot.figure(figsize=(10,3))
    data = numpy.loadtxt(filename, delimiter=',')
    ax1 = fig.add_subplot(1, 3, 1)
    ax2 = fig.add_subplot(1, 3, 2)
    ax3 = fig.add_subplot(1, 3, 3)
    ax1.set_ylabel('Average')
    ax2.set_ylabel('Max')
    ax3.set_ylabel('Min')
    ax1.plot(numpy.mean(data, axis=0))
    ax2.plot(numpy.max(data, axis=0))
    ax3.plot(numpy.min(data, axis=0))
matplotlib.pyplot.close()
matplotlib.pyplot.close('all')
from glob import glob
filenames = glob('inflammation*.csv')
filenames.sort()
for filename in filenames:
    print(filename)
    fig = matplotlib.pyplot.figure(figsize=(10,3))
    data = numpy.loadtxt(filename, delimiter=',')
    ax1 = fig.add_subplot(1, 3, 1)
    ax2 = fig.add_subplot(1, 3, 2)
    ax3 = fig.add_subplot(1, 3, 3)
    ax1.set_ylabel('Average')
    ax2.set_ylabel('Max')
    ax3.set_ylabel('Min')
    ax1.plot(numpy.mean(data, axis=0))
    ax2.plot(numpy.max(data, axis=0))
    ax3.plot(numpy.min(data, axis=0))
matplotlib.pyplot.close('all')
from glob import glob
filenames = glob('inflammation*.csv')
filenames.sort()
for filename in filenames:
    print(filename)
    fig = matplotlib.pyplot.figure(figsize=(10,3))
    data = numpy.loadtxt(filename, delimiter=',')
    ax1 = fig.add_subplot(1, 3, 1)
    ax2 = fig.add_subplot(1, 3, 2)
    ax3 = fig.add_subplot(1, 3, 3)
    ax1.set_ylabel('Average')
    ax2.set_ylabel('Max')
    ax3.set_ylabel('Min')
    ax1.plot(numpy.mean(data, axis=0))
    ax2.plot(numpy.max(data, axis=0))
    ax3.plot(numpy.min(data, axis=0))
close('all')
matplotlib.pyplot.close('all')
filename
filename[3]
filename[0]
filename[-3:]
filename[:-3]
filename[:-3] + 'png'
filename.replace('i', 'a')
filename.replace('csv', 'png')
get_ipython().run_line_magic('ls', '')
from glob import glob
filenames = glob('inflammation*.csv')
filenames.sort()
for filename in filenames:
    print(filename)
    fig = matplotlib.pyplot.figure(figsize=(10,3))
    data = numpy.loadtxt(filename, delimiter=',')
    ax1 = fig.add_subplot(1, 3, 1)
    ax2 = fig.add_subplot(1, 3, 2)
    ax3 = fig.add_subplot(1, 3, 3)
    ax1.set_ylabel('Average')
    ax2.set_ylabel('Max')
    ax3.set_ylabel('Min')
    ax1.plot(numpy.mean(data, axis=0))
    ax2.plot(numpy.max(data, axis=0))
    ax3.plot(numpy.min(data, axis=0))
    figname = filename[:-3] + 'png'
    fig.savefig(figname)
get_ipython().run_line_magic('ls', '')
matplotlib.pyplot.close('all')
temp = 72
(temp - 32)*5/9
def fahrenheit2celsius(temp_in_fahrenheit):
    temp_in_celsius = (temp_in_fahrenheit - 32)*5/9
    return temp_in_celsius
fahrenheit2celsius
fahrenheit2celsius(72)
fahrenheit2celsius(51)
fahrenheit2celsius(451)
temperature_fahr = [32, 72, 90, 451]
fahrenheit2celsius(51)
animals = ['cat', 'dog', 'fish']
for i in animals:
    print(i)
get_ipython().run_line_magic('paste', '')
filenames
for filename in filenames:
    print(filename)
    make_plots(filename)
for filename in filenames:
    print(filename)
    make_plots(filename)
plt.close('all')
matplotlib.pyplot.close('all')
def save_figures(filename, fig):
    figname = filename[:-3] + 'png'
    fig.savefig(figname)
for filename in filenames:
    print(filename)
    make_plots(filename)
    save_figures(filename, fig)
get_ipython().run_line_magic('ls', '-lthr')
get_ipython().run_line_magic('paste', '')
for filename in filenames:
    print(filename)
    fig = make_plots(filename)
    save_figures(filename, fig)
matplotlib.pyplot.close('all')
for filename in filenames:
    print(filename)
    fig = make_plots(filename)
    save_figures(filename, fig)
matplotlib.pyplot.close('all')
get_ipython().run_line_magic('paste', '')
get_ipython().run_line_magic('history', '')
get_ipython().run_line_magic('save', "'python_session_history'")
get_ipython().run_line_magic('save', 'python_session_history')
get_ipython().run_line_magic('save', 'python_session_history 1-100')
get_ipython().run_line_magic('ls', '-lthr')
get_ipython().system('gedit python_session_history.py')
get_ipython().run_line_magic('save', 'python_session_history 1-9999')
```