Shell lesson
---
**Shell quick-reference** -- http://swcarpentry.github.io/shell-novice/reference/
**Setup**
1. Download the shell lesson data: http://swcarpentry.github.io/shell-novice/data/shell-novice-data.zip
2. Unzip it on your desktop
**Shell challenge #1**
1. I want to use `ls` to list all of the files in all the directories under this one. This is called "recursion". Figure out which flag tells `ls` to list directories recursively.
```
ls -R
```
2. I want to use a shell command to download a file from a webserver. What command should I use?
```
curl
```
```
wget
```
**Shell challenge #2**
Starting in `~/Desktop/data-shell/north-pacific-gyre/2012-07-03`, give four different commands that change the working directory back to the home directory. *Tip: you can test your commands, then get back to the data directory with* **cd -**
**Shell challenge #3**
1. We're going to need the file `~/Desktop/data-shell/data/amino-acids.txt` for our analysis. Copy it into `~/Desktop/data-shell/north-pacific-gyre/2012-07-03`.
```
cp ../../data/amino-acids.txt .
```
2. Make a new directory called `scripts`. Move `goodiff` and `goostats` into it.
```
mkdir scripts
mv goodiff goostats scripts/
```
3. Check your computer's `Trash Can` or `Recycling` for the `README.txt` file that you removed. Is it there?
> No! These files are gone forever!
4. Make a directory called `backup`, then run the command `cp *.txt backup/`. Look in the `backup` directory. Can you tell what the `*` character means?
> The asterisk is a wildcard that matches zero or more characters, so `*.txt` matches every file whose name ends in `.txt`.
**Shell challenge #4**
1. Create a file named `hydrocarbons.txt` that contains the length of each file ending in `ane.pdb`.
```
wc -l *ane.pdb > hydrocarbons.txt
```
2. Sort the file **so that the longest pdb file is at the top.** *Hint: how would you learn about more flags to the sort command?*
```
sort -r -n hydrocarbons.txt
```
3. Show the contents of `ammonia.pdb`, and the contents of `methane.pdb`. Now, try typing `cat ammonia.pdb methane.pdb`. Why is `cat` short for "concatenate"?
> `cat` concatenates, or joins, file contents.
**Shell challenge #5**
1. Return to `~/Desktop/data-shell/north-pacific-gyre/2012-07-03`. Are all the files the same length?
> No
```
wc *
```
```
wc * | sort -n | less
```
2. How many total measurements were taken by the analyzer? (i.e., how many lines are in ALL the files?)
> 5040
```
wc NEN*
```
```
cat NEN* | wc
```
3. The `echo` command takes whatever you give it on the command line and sends it out its standard output. Try it out by typing `echo hello workshop!`
> `echo` repeats (echoes) back whatever you type after it.
4. Show the contents of your `README` file to remind yourself what is in there. Now say `echo 2017-07-08 Found a short file >> README` and check its contents again. What just happened? What did the `>>` operator do to the output of the `echo` command?
> A single `>` writes to the file, overwriting what was there; a double `>>` appends to the end of the file.
**Shell challenge #6**
1. Return to `~/Desktop/data-shell/north-pacific-gyre/2012-07-03`. Create a file named `chemical-1.dat` that contains the *first line* of every data file in the directory.
```
for filename in N*.txt
do
    head -n 1 "$filename" >> chemical-1.dat
done
```
2. Create a file named `chemical-3.dat` that contains the *third line* of every data file in the directory.
3. Remove `chemical-1.dat` and `chemical-3.dat`. With a single command, can you figure out how to make similar files for each of the first 10 lines? *Hint: you can nest loops!*
**Shell challenge #7**
1. Return to `~/Desktop/data-shell/north-pacific-gyre/2012-07-03`. Run the shell script `scripts/goostats` on every data file in the directory. The `goostats` script takes two arguments, an input file and an output file. Write the output files to the `analysis` directory.
```
mkdir -p analysis  # make sure the output directory exists
for filename in NEN*
do
    bash scripts/goostats "$filename" "analysis/output-$filename"
done
```
2. Create a shell script that takes a list of file names and returns the shortest file.
```
wc "$@" | sort -n | head -n 1
```
3. Run `goostats` specifying an input file but no output file. The error it gives you is somewhat ... cryptic. Run `bash -x scripts/goostats NENE02040Z.txt` -- how does the `-x` flag help you debug the script?
> `-x` turns on execution tracing (`xtrace`): bash prints each command, after expansion, just before running it, so you can see which step an error came from.
Python lesson
---
**Python resources**
* Thinking Like a Computer Scientist: http://openbookproject.net/thinkcs/python/english3e/index.html
* NumPy documentation: http://www.numpy.org/
* Astropy tutorials: http://www.astropy.org/astropy-tutorials/
* pandas: http://pandas.pydata.org/
* Data Carpentry lessons using pandas: http://www.datacarpentry.org/python-ecology-lesson/
* scikit-learn: http://scikit-learn.org/stable/
* PEP-8 (Style Guide for Python Code): https://www.python.org/dev/peps/pep-0008/
* PEP-257 (Python Docstring Conventions): https://www.python.org/dev/peps/pep-0257/
* `import this`
* `import antigravity`
**Transitioning from IDL to Python**
IDL to numpy syntax translations here: http://mathesaurus.sourceforge.net/idl-numpy.html
**Courses, tools, and code for astronomers learning Python**
- [AstroPy: a community Python library for astronomy](http://www.astropy.org/)
- [Using Python for astronomical data analysis (STScI workshop from 2016)](https://github.com/spacetelescope/AAS2016)
- [Astronomical Python: tools and packages collected by the University of Washington](http://staff.washington.edu/rowen/AstroPy.html)
- [Coursera: data-driven astronomy](https://www.coursera.org/learn/data-driven-astronomy)
- [Practical Python for astronomers](https://python4astronomers.github.io/)
**Conferences & Communities**
- [PyCon](https://www.pycon.org/)
- [SciPy](http://conference.scipy.org/)
- [NumFOCUS](https://www.numfocus.org/)
**PyLadies**
- main group: http://www.pyladies.com/
- DC chapter: https://www.meetup.com/dc-pyladies/
- [Learning to Code in DC](https://technical.ly/dc/2015/12/23/women-learn-to-code-in-the-new-year/)
**Questions from the group**
- Python style guide (PEP-8)
- How to write reusable "scripts" instead of a Jupyter notebook?
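One common pattern for the reusable-script question is to put the logic in functions and guard the command-line entry point, so the same file can be imported or run directly. This is only a minimal sketch: the file name `mean_script.py` and the functions `mean` and `main` are hypothetical examples, not part of the workshop material.

```python
import sys


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def main(args):
    """Print the mean of the numbers given as strings; return 0 on success."""
    try:
        numbers = [float(arg) for arg in args]
    except ValueError:
        numbers = []  # treat non-numeric input like missing input
    if not numbers:
        # mean_script.py is a hypothetical file name used for illustration
        print('usage: python mean_script.py NUMBER [NUMBER ...]')
        return 1
    print(mean(numbers))
    return 0


if __name__ == '__main__':
    # Runs only when the file is executed as a script, not when it is imported.
    main(sys.argv[1:])
```

Saved as `mean_script.py`, this could be run with `python mean_script.py 1 2 3`, while a notebook or another script could still `import mean_script` and call `mean_script.mean(...)` without triggering the command-line code.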
**Setup**
* Make a new folder on your Desktop called python-novice-inflammation.
* Download python-novice-inflammation-data.zip and move the file to this folder. Link: http://swcarpentry.github.io/python-novice-inflammation/data/python-novice-inflammation-data.zip
* Also download python-novice-inflammation-code.zip and move it to the same folder. Link: http://swcarpentry.github.io/python-novice-inflammation/code/python-novice-inflammation-code.zip
* Extract these files to their own folders.
**Exercise**
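Create a matplotlib chart to show the maximum value by day
```
import numpy
import matplotlib.pyplot
data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')  # assumes the first lesson data file
max_daily_inflammation = data.max(axis=0)  # maximum over patients, one value per day
max_daily = matplotlib.pyplot.plot(max_daily_inflammation)
matplotlib.pyplot.show()
```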
Create a matplotlib chart to show the minimum value for each patient
```
min_plot = matplotlib.pyplot.plot(data.min(axis=1))  # minimum over days, one value per patient
matplotlib.pyplot.show()
```
**Exercise 4: https://swcarpentry.github.io/python-novice-inflammation/04-files/**
Loop through inflammation data and plot average, max, min
**Exercise 4a**
Plot the difference between the average of the first dataset and the average of the second dataset, i.e., the difference between the leftmost plot of the first two figures.
**Exercise 4b: Use each of the files once to generate a dataset containing values averaged over all patients:**
```
import glob
import numpy

filenames = glob.glob('inflammation*.csv')
composite_data = numpy.zeros((60, 40))
for f in filenames:
    # add each new file's data to the running sum as it is read
    composite_data += numpy.loadtxt(fname=f, delimiter=',')
# then divide the summed data by the number of files to get the average
composite_data /= len(filenames)
```
**Exercise 5: Check your understanding of conditionals in Python**
https://swcarpentry.github.io/python-novice-inflammation/05-cond/
1. Write some conditions that print True if the variable a is within 10% of the variable b and False otherwise. Compare your implementation with your partner’s: do you get the same answer for all possible pairs of numbers?
2. Write some code that sums the positive and negative numbers in a list separately, using in-place operators. Do you think the result is more or less readable than writing the same without in-place operators?
```
x = 1 # original value
x += 1 # add one to x, assigning result back to x
x *= 3 # multiply x by 3
print(x)
```
3. Sorting into buckets: The folder containing our data files has large data sets whose names start with “inflammation-”, small ones whose names start with “small-”, and possibly other files whose sizes we don’t know. Our goal is to sort those files into three lists called `large_files`, `small_files`, and `other_files` respectively. Add code to the template below to do this. Note that the string method `startswith` returns True if and only if the string it is called on starts with the string passed as an argument.
```
files = ['inflammation-01.csv', 'myscript.py', 'inflammation-02.csv', 'small-01.csv', 'small-02.csv']
large_files = []
small_files = []
other_files = []
```
Your solution should:
- loop over the names of the files
- figure out which group each filename belongs to
- append the filename to that list
In the end the three lists should be:
```
large_files = ['inflammation-01.csv', 'inflammation-02.csv']
small_files = ['small-01.csv', 'small-02.csv']
other_files = ['myscript.py']
```
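One possible set of solutions to the three exercises above, as a minimal sketch rather than the official answers. The function names `within_ten_percent`, `sum_by_sign`, and `sort_into_buckets` are hypothetical, and "within 10%" is assumed here to mean 10% of `b`, which is only one reasonable reading of the question.

```python
def within_ten_percent(a, b):
    """Return True if a is within 10% of b (assumption: 10% is relative to b)."""
    return abs(a - b) <= 0.1 * abs(b)


def sum_by_sign(numbers):
    """Sum positive and negative numbers separately using in-place operators."""
    positive_sum = 0
    negative_sum = 0
    for num in numbers:
        if num > 0:
            positive_sum += num
        elif num < 0:
            negative_sum += num
    return positive_sum, negative_sum


def sort_into_buckets(files):
    """Split filenames into large, small, and other lists by their prefix."""
    large_files = []
    small_files = []
    other_files = []
    for filename in files:
        if filename.startswith('inflammation-'):
            large_files.append(filename)
        elif filename.startswith('small-'):
            small_files.append(filename)
        else:
            other_files.append(filename)
    return large_files, small_files, other_files


files = ['inflammation-01.csv', 'myscript.py', 'inflammation-02.csv',
         'small-01.csv', 'small-02.csv']
large_files, small_files, other_files = sort_into_buckets(files)

print(within_ten_percent(95, 100))   # True
print(sum_by_sign([3, -2, 7, -5]))   # (10, -7)
print(large_files)                   # ['inflammation-01.csv', 'inflammation-02.csv']
print(small_files)                   # ['small-01.csv', 'small-02.csv']
print(other_files)                   # ['myscript.py']
```

Measuring the 10% relative to `b` makes the check asymmetric in `a` and `b`, which is one reason two partners can get different answers for the same pair of numbers.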
**Exercise 6: writing functions!**
https://swcarpentry.github.io/python-novice-inflammation/06-func/
1. combining strings
2. selecting characters from strings
3. rescaling an array
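The three function exercises could be sketched as follows. The names `fence`, `outer`, and `rescale` are assumptions based on the linked lesson page, not something recorded in these notes, and `rescale` is written here as a plain-Python stand-in that returns a list so the example runs without NumPy.

```python
def fence(original, wrapper):
    """Combining strings: return original with wrapper added at both ends."""
    return wrapper + original + wrapper


def outer(input_string):
    """Selecting characters: return the first and last characters of a non-empty string."""
    return input_string[0] + input_string[-1]


def rescale(values):
    """Rescale values so the smallest maps to 0.0 and the largest to 1.0.

    Plain-Python stand-in that returns a list. Assumes the values are not
    all equal; otherwise the range is zero and the division fails.
    """
    low = min(values)
    high = max(values)
    return [(value - low) / (high - low) for value in values]


print(fence('name', '*'))    # *name*
print(outer('helium'))       # hm
print(rescale([0, 5, 10]))   # [0.0, 0.5, 1.0]
```

With a NumPy array `a`, the same rescaling can be written without a loop as `(a - a.min()) / (a.max() - a.min())`.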
## Homework!
- **Explore fruitful functions and test-driven function writing from "Thinking Like a Computer Scientist"**
http://openbookproject.net/thinkcs/python/english3e/fruitful_functions.html
- **Push the code you wrote in this workshop to a repository on your GitHub account**
Git exercises
-------------
**Pro Git (book)**
https://git-scm.com/book/en/v2
**Git Flight Rules** https://github.com/k88hudson/git-flight-rules
**Git workflow tips** https://sandofsky.com/blog/git-workflow.html
**How to undo (almost) anything in git** https://github.com/blog/2019-how-to-undo-almost-anything-with-git
**Understanding git rebase** https://medium.freecodecamp.org/git-rebase-and-the-golden-rule-explained-70715eccc372
![](https://media.giphy.com/media/1M2BVyXFvMbYY/giphy.gif)
**git rebase vs. git merge** https://medium.com/@porteneuve/getting-solid-at-git-rebase-vs-merge-4fa1a48c53aa
**astropy git workflow** http://docs.astropy.org/en/stable/development/workflow/development_workflow.html#new-to-git
**Exercise 1**
1. Make a new file about another planet in the solar system. Add a fact or two, then commit that file to the repository as well.
2. Here in the hack pad, type one attribute of a good log message.
- Succinct and easy to understand
- https://xkcd.com/1296/ (beginning is good, went bad at the end)
- Be specific about alterations and additions.
- Describe the intent of the commit
- Add information about Neptune.
- Contains relevant information
**Exercise 2**
1. Delete the file about another planet that you created in our last exercise. (Remember, you can do so with the `rm` command.) What happens when you say `git status`? Then, recover the file with git checkout.
2. Go to http://github.com. If you have an account already, log in to it; if not, sign up for an account.
**Exercise 3**
On your GitHub project page, click "XXX Commits". Explore this interface for a few minutes; click on a commit, or on the buttons to the side of the commits. Think about the information this is giving you, and how you would get the same information from the git command line. Post anything interesting you find in the hack pad.
- The initial interface is similar to 'git log'
- clicking on a file is similar to 'git diff'
**Exercise 4**
1. Switch roles. If you were the Collaborator, now act as the Owner and give your colleague access to your repo. If you are now the Collaborator, clone the repo, practice changing it, and then push your changes back to GitHub.
2. Play with GitHub some more. In particular, have a look at the commit viewer. Add a comment to your colleague's commit. Why might comments like this be useful in a collaborative environment? If you discover anything cool, add it to the Etherpad.
**Exercise 5**
Reverse roles with your partner. The person who resolved the conflict, make a new commit and push it to GitHub. The other partner, make a conflicting commit, then pull from GitHub and resolve the conflict.