Renato Alves & Toby Hodges
EMBL Bio-IT Project
23 & 24 March 2020
EMBL Bio-IT
Sponsored by de.NBI
https://hackmd.io/nFqIR8nqQ86FeFNY-Em8nA?both
You can access the course material at https://github.com/tobyhodges/ITPP
`itpp-master` folder to Desktop
Type 'x' below when you've downloaded and unzipped the folder to your Desktop
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxXxxxxxxxXxXXx
It's important to keep learning and playing with your new skills and to start applying them to some small projects! Otherwise, what you have learned in this course will be lost soon…
More python:
Scientific computing / data science in python:
Coding Exercises:
Sessions marked with a * will take place with the whole group.
Self-directed sessions will take place in breakout rooms with
support from the instructors/helpers.
We will do our best to ensure breaks happen as listed below.
Length of discussion & demo sessions may vary from those listed,
depending on number of questions received, etc.
Day 1 | |
---|---|
09:30 | Introduction & Installation Troubleshooting * |
10:00 | Self-Paced Work with Support |
11:00 | Morning Break * |
11:15 | Working with Lists (Discussion, Demo & Exercise Walkthroughs) * |
11:30 | Self-Paced Work with Support |
12:30 | Lunch Break * |
13:30 | Debugging (Discussion, Demo & Exercise Walkthroughs) * |
14:00 | Self-Paced Work with Support |
15:00 | Afternoon Break * |
16:00 | Discussion, Demo, Feedback & Wrap-up * |
17:00 | End |
Day 2 | |
---|---|
09:30 | Recap & Discussion |
10:00 | Self-Paced Work with Support |
11:00 | Morning Break * |
11:15 | Nested Data Structures (Discussion, Demo & Exercise Walkthroughs) * |
11:30 | Self-Paced Work with Support |
12:30 | Lunch Break * |
13:30 | Reading Data from a File (Discussion, Demo & Exercise Walkthroughs) * |
14:00 | Self-Paced Work with Support |
15:00 | Afternoon Break * |
16:00 | Plotting Exercise Walkthrough & Wrap-up * |
17:00 | End |
write your questions here - we will review them in the discussion sessions
Q: What does `[*]` before a cell mean?
A: `[*]` means the cell is executing. It should turn into a number once done. If it doesn't, the Python process might be stuck. In the menu above the notebook you should see "Kernel"; in that menu select "Interrupt Kernel" or, if this doesn't work, "Restart kernel". If you restart the kernel you will lose any defined variables.
Q: Is there a shortcut to delete a cell?
A: `X` or `dd`. See also the Jupyter essentials section further down in this document.
Q: What does this operation `^` mean?
A: Exponentiation in Python is written `**` (e.g. `3**3 == 27`). `^` is a bitwise operation (XOR). See this stackoverflow question for more information.
Q: Is there a difference between typing "10 + 3" and "10+3" without spaces in the command? I've seen that the given answer is the same, but maybe this distinction is important in further uses.
Q: What is the difference between an object and a variable?
A: In `number=10`, the variable is `number` and the object is the number `10`. Every entity in Python is an object.
Q: I couldn't get the command keys to work.
A: Press `Esc` to exit insert mode. The blue box around the cell should disappear. At this point the command keys should work.
Q: I don't understand where the error is in the final exercise ("Debugging exercise"). Could you please explain what kind of error it is?
Q: When using `.pop()`, it doesn't let me choose which mayonnaise I want to remove, telling me it can only take one string or command at a time: no numbers then to define which mayonnaise I want gone… How do I tell the program which mayonnaise in my list I want gone?
A: Pass a position to `.pop()`: for example, use `.pop(0)` instead of `.pop(mayonnaise)` if you want to delete the mayonnaise in position 0 of the list.
A: With `.pop(mayonnaise)`, the last mayonnaise is being removed. If you do `.pop(number)`, the word at that position (not necessarily mayonnaise) is being removed. That's what I experienced. :D
A: `help(shopping.pop)` should help here. GR is correct, `.pop()` takes a number representing a position in the list. `.pop()` doesn't accept strings so `.pop("mayonnaise")` doesn't work. Note as well that `.pop(mayonnaise)` is referring to a variable named `mayonnaise`. If you want to remove one of the `"mayonnaise"` entries you can use `.remove()`.
A: `.remove()` will remove the first occurrence in the list. You will need to call `.remove()` multiple times to remove all occurrences.
A: To remove a specific occurrence, use `.pop(position)`. You can use `.index()` to find the first occurrence of `mayonnaise`. `help(shopping.index)` will tell you how to start looking for words after a given position, which will allow you to find subsequent occurrences.
Q: Can `.sort()` sort by something other than alphabetical order?
A: Try `help(variable.sort)` and you will find the `key=` argument. `key=` is supposed to be a function that defines how to sort elements. See here for more examples.
Q: Probably not important, but I've noticed that once you append mayonnaise to the list, the list is displayed with every word on a different line; however, when you remove all the "mayonnaise"s added and execute, the list appears again on a single line like in the very beginning. Is there any specific reason for that?
A: If you use `print()` the result should be consistent.
Q: What is a "syntax error EOF"?
A: `EOF` stands for End of File. This usually happens if Python is looking for a character but reached the end of the file before finding it. One case where this would happen is if you open a quote for a string but never close it (e.g. `name = "This string is missing the end quote`).
Q: Before exercise 2.4 it is mentioned that even in cases in which indirect loops are needed, there are ways to do it that are more efficient than using "range" objects. Could you provide some examples of this? For example, another option that came to mind was using something like `for i, j in zip(list_A, list_B):`. Would this be a more efficient option or a less efficient one?
A: Where possible, loop directly over the elements with `for element in my_list`. Using `zip()` is efficient if you need to loop over pairs of elements or list elements that are linked. In Python 3 `zip()` is a very efficient function and doesn't create a duplicate of the lists being zipped.
Q: How do you get to use the "Markdown essentials", or how are they relevant? I'm assuming it's not possible to use them in JupyterLab since I haven't found a way to do so (not even using `print()`). So… in which situation/where would you use them to see the expected visual result? Is it mentioned only so that we can use it here when asking questions and such?
A: Set the cell type to `Markdown` instead of `Code`. Select it in the drop-down menu at the top of the notebook page.
Q: In Jupyter, there are different cells where we run different functions… now in Spyder, there is this main code that we run in the left-side window, and if we want to run different things, we open different files. Is that correct?
A: You can run only the selected lines with `F9` or the corresponding button on the toolbar.
Q: I tried running only some selected lines. Did not work :( Spyder runs the whole file.
A: Use the `F9` key to run selected lines in Spyder. There is also a button for this: find the green arrow in the menu at the top and then move 3 buttons to the right; there you go. The symbol looks like a square followed by a vertical line followed by an arrow.
Q: When formatting text: how can I include formatting such as padding and aligning (example: `'{:>10}'.format('test')`) when more than one element is displayed (for example in exercise 2.5)?
A: Each element gets its own `{}` and you can add padding instructions to each individually (e.g. `"{:>10}{:>5}".format("one", "two")`).
Q: How can I solve Exercise 1.3?
A: `words` already contains a list of words. You can start by obtaining the fourth word (`fourth = words[3]`) and then the third letter (`fourth[2]`), or do it without an intermediate variable (`words[3][2]`).
Q: How do I run the Spyder interface? Does it come with the .zip folder?
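Going back to the Exercise 1.3 answer above, here is a minimal sketch of the two indexing steps. The contents of `words` are made up; in the exercise the list is already defined in the notebook.

```python
# Hypothetical list standing in for the one provided in the exercise
words = ["alpha", "bravo", "charlie", "delta", "echo"]

fourth = words[3]           # positions start at 0, so index 3 is the fourth word
third_letter = fourth[2]    # index 2 is the third letter of that word
same_letter = words[3][2]   # both steps chained, without the intermediate variable

print(fourth, third_letter, same_letter)   # delta l l
```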
Q: What exactly is `elif:`?
A: `elif` is a shorthand for "else if" - you should use it to provide alternative tests to run after the initial `if` statement. For example:
```python
if colour == "red":
    print("you lose!")
elif colour == "black":
    print("you win!")
elif colour == "green":
    print("Everybody wins! :)")
```
will print "you lose!" if the variable `colour` has the value "red" but, if it doesn't have that value, the next test is performed. This next test checks whether `colour` has the value "black": if it does, then "you win!" is printed; if not, then the next test (whether `colour` has the value "green") is run. Please let us know below if this is still unclear!
Q: (Exercise 3.2) The cheat sheet says that `myDict.keys()` returns the list of keys of the dictionary. But list operations like `myDict.keys().sort()` do not work. Is there an easy way to make the statement's return item a list?
A: In Python 2, `myDict.keys()` used to return a list. In Python 3, this is no longer the case; it returns a view instead, which is faster in some applications. This is a mistake in the cheat sheet. You can use `list(myDict.keys())` to get the list and then sort it, i.e. `list(myDict.keys()).sort()`.
Q: `str` items and `myList.sort()` returns `None` in my environment.
A: `myList.sort()` always returns `None`. It sorts the list "in place", which indeed doesn't work here because the list of keys wasn't assigned to any variable. To have the sorted list returned, use `sorted(myList)`. See the following examples:
```python
myList = [2,1,3]
print(myList)
# >> [2, 1, 3]
myList.sort()
print(myList)
# >> [1, 2, 3]
print([2,1,3].sort())
# >> None
print(sorted([2,1,3]))
# >> [1, 2, 3]
```
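A further sketch along the same lines, using made-up strings: `sorted()` (like `.sort()`) accepts the optional `key=` and `reverse=` arguments, so the order does not have to be alphabetical, and the original list is left untouched.

```python
fruits = ["kiwi", "Banana", "apple", "fig"]   # hypothetical example data

alphabetical = sorted(fruits)                  # uppercase letters sort before lowercase ones
ignoring_case = sorted(fruits, key=str.lower)  # compare the lowercased version of each item
by_length = sorted(fruits, key=len)            # compare the length of each item
longest_first = sorted(fruits, key=len, reverse=True)

print(alphabetical)    # ['Banana', 'apple', 'fig', 'kiwi']
print(ignoring_case)   # ['apple', 'Banana', 'fig', 'kiwi']
print(by_length)       # ['fig', 'kiwi', 'apple', 'Banana']
print(longest_first)   # ['Banana', 'apple', 'kiwi', 'fig']
print(fruits)          # ['kiwi', 'Banana', 'apple', 'fig']
```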
Q: However, the behavior seems different if `myList` contains only `str` items?
`sortedKeyList`, it does. Will think about it. :-) Thank you.
A: Correct! `myList.sort()` modifies an existing list. `sorted(myList)` creates and returns a sorted copy of the input list.
```python
theDict = {
    'A':{},
    'C':{},
    'B':{},
}
keyList = list(theDict.keys())
print(keyList)
# >> ['A', 'C', 'B']
sortedKeyList = keyList.sort() # keyList.sort() sorts keyList in place and returns None, not a list!
print(sortedKeyList)
# >> None
print(sorted(keyList))
# >> ['A', 'B', 'C']
```
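One possible shortcut, sketched here with the same example dictionary: `sorted()` accepts the dictionary (or its keys view) directly and returns a new list, so the intermediate `list(...)` call is not needed.

```python
theDict = {'A': {}, 'C': {}, 'B': {}}

# sorted() iterates over the keys of the dictionary and returns them as a new, sorted list
sortedKeyList = sorted(theDict)
print(sortedKeyList)   # ['A', 'B', 'C']

# Looping over the keys in sorted order; the dictionary itself keeps its original order
for key in sorted(theDict.keys()):
    print(key, theDict[key])
```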
Q: What is a cell (in Spyder)? and a Kernel?
Q: For exercise 2.7 the following code works:
```python
def hypot(a, b):
    return (a**2 + b**2)**0.5
hypot(3, 4)
```
I don't quite understand the missing sqrt part, I keep getting `name math is not defined`.
A: The `sqrt` function is not directly available in Python. It lives in the `math` library and as such it needs to be imported. If you use `import math` you can then use `math.sqrt()`.
Q: For exercise 3.2 - what is the best way to sort the names in the group of groups? I did a sort function for each subgroup. Not very elegant:
```python
GroupA.sort()
GroupB.sort()
GroupC.sort()
for group in AllGroups.keys():
    print(group)
    for student in AllGroups[group]:
        print(student)
```
A: When you repeat the same call several times (here `.sort()`), it's a good idea to think about how to move this into a loop to remove the repetition. Also, have a look at `sorted(list)` as an alternative to `list.sort()`; it could be helpful here.
Q: (Chapter 4 Getting Data From Files) I don't get any output (in Spyder) if I try to print every line from `datafile` directly. Only the creation of `lines` and then printing every line from `lines` yields the desired output, which defeats the whole point of trying to save memory. :-/
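```python
# Option 1: read all lines into a list first
dataFile = open('speciesDistribution.txt', 'r')
allDataLines = dataFile.readlines()
for line in allDataLines:
    line = line.strip()
    print(line)
dataFile.close()
```
```python
# Option 2: loop over the file directly, one line at a time
dataFile = open('speciesDistribution.txt', 'r')
for line in dataFile:
    line = line.strip()
    print(line)
dataFile.close()
```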
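A self-contained sketch of why combining the two approaches can print nothing, assuming a small made-up file written to a temporary folder: a file object can only be read through once, so a loop placed after `readlines()` finds no lines left.

```python
import os
import tempfile

# Create a small stand-in data file (the contents are made up for this sketch)
path = os.path.join(tempfile.mkdtemp(), 'speciesDistribution.txt')
with open(path, 'w') as dataFile:
    dataFile.write('species_a\t3\nspecies_b\t5\n')

with open(path, 'r') as dataFile:         # `with` closes the file automatically
    allDataLines = dataFile.readlines()   # consumes every line in the file
    leftover = []
    for line in dataFile:                 # nothing is left, so this loop body never runs
        leftover.append(line.strip())

print(len(allDataLines))   # 2
print(leftover)            # []

stripped = []
with open(path, 'r') as dataFile:         # reopening starts from the first line again
    for line in dataFile:
        stripped.append(line.strip())
print(stripped)            # ['species_a\t3', 'species_b\t5']
```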
Q: It works for me now, too. I hadn't removed/commented out the `allDataLines = dataFile.readlines()` line. As soon as I comment out this line everything works as intended.
A: Once a line has been read from the file (by `dataFile.readlines()` or by `for line in dataFile`), that line "disappears".
Q: Can you explain the difference between `()`, `[]`, `{}` in the command? Thnx!
A: `[]` is roughly equivalent to `list()`, `{}` is roughly equivalent to `dict()` and `()` can refer to `tuple()` when used with a comma (e.g. `(1,2,4)` or `(1,)`), to calling a function as in `list()`, or simply to mathematical precedence as in `2 * (3 + 1)`. There are additional contexts where these symbols have different meanings. Some of the keywords to follow up on for that are: list comprehension, dictionary comprehension, generators, sets.
Q: I still can't figure out how to do this (previous question referring to exercise 3.2). If I move `group.sort()` into the loop it doesn't work.
A: Have a look at what `group` is inside the loop. Can you make it be the list you want to sort?
Q: When talking about reading files, r+ mode is mentioned as being reading + write. Does this "write" function work like "w" or more like "append" mode? I mean, will "r+" clear the text in the file?
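A small sketch for the `r+` question above, using a made-up file in a temporary folder. Opening with `r+` does not clear the file; writing starts at the current position, so it overwrites from the start unless the file has been read (or the position moved) first.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'example.txt')   # hypothetical file name

with open(path, 'w') as f:     # 'w' creates the file (or empties an existing one)
    f.write('first line\nsecond line\n')

with open(path, 'r+') as f:    # 'r+' keeps the existing content
    before = f.read()          # reading moves the position to the end of the file
    f.write('third line\n')    # so this write lands after the old content

with open(path, 'r+') as f:
    f.write('FIRST')           # position starts at 0: the first 5 characters are overwritten

with open(path, 'r') as f:
    after = f.read()

print(after)
```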
Q: I've realised that in Exercise 2.4, both making use of:
```python
for i in range(len(shopping)):
    print(shopping[i], amounts[i])
```
and
```python
for i in range(len(amounts)):
    print(shopping[i], amounts[i])
```
work fine and return what you expect. I don't quite understand why, though. I would have expected (and lost quite some time trying that…) that I had to use the `for` statement and the `range(len())` function on both "shopping" and "amounts", so that it knows that I want it to go through all indices inside both lists. How does a loop that only uses the indices of ONE of the lists manage to go through both lists and return all items?
A: Once we call `len()` we will effectively be working with one number. So the code is doing something like: `len(...) -> 7` and then `range(7)`. As long as both `shopping` and `amounts` have 7 elements, we will get the same result. If you want to see a different behavior between the two cases you pasted, try adding an additional item to the shopping list with `shopping.append(item)` and repeat the first case.
Q: Without the `range(len())` function, your list components behave as items and not indices (items meaning parts in a list and indices meaning positions in a given range), so what's confusing to me is that you don't have to specify that you will treat something as an index in a given range rather than an item on a list. I hope this is not too confusing, I don't know how to ask it in a clearer way…
A: What determines how the `for` loop behaves is what we place in front of it. Perhaps compare and run the following code:
```python
print(shopping)
for item in shopping:
    print(item)
indices = list(range(len(shopping)))
print(indices)
for index in indices:
    print(index)
# Notice that indices was derived from shopping but effectively is just a list with numbers
```
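If both the position and the item are needed, `enumerate()` and `zip()` are two built-in alternatives to `range(len(...))`. A minimal sketch, with made-up stand-ins for the exercise lists:

```python
shopping = ["bread", "milk", "mayonnaise"]   # hypothetical stand-ins for the exercise lists
amounts = [1, 2, 3]

# enumerate() yields (index, item) pairs, so both are available in the loop
for index, item in enumerate(shopping):
    print(index, item)

# zip() pairs up the items of both lists; it stops at the end of the shorter list
for item, amount in zip(shopping, amounts):
    print(item, amount)

pairs = list(zip(shopping, amounts))
print(pairs)   # [('bread', 1), ('milk', 2), ('mayonnaise', 3)]
```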
A: `gridplot` is now no longer in `io` but instead in `layouts`. So try `from bokeh.layouts import gridplot`.
Q: I get an error with `group_scores = list(AllGroupResults[group].values())`. The error says `TypeError: 'list' object is not callable`. If I delete 'list' and the surrounding brackets it works.
A: What do you get from `print(list)`? It should show `<class 'list'>`.
A: You may have defined a variable called `list` somewhere in the notebook. If you have something like `list=[1,2,3]` then the built-in function `list()` is "gone".
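A minimal sketch of how this error arises and how the built-in can be brought back, assuming the code runs at the top level of a notebook or script (the values are made up):

```python
list = [1, 2, 3]      # the name `list` now refers to this list, hiding the built-in

try:
    list("abc")       # calling a list object fails
except TypeError as error:
    message = str(error)
    print(message)    # 'list' object is not callable

del list              # remove our variable; the built-in becomes visible again
letters = list("abc")
print(letters)        # ['a', 'b', 'c']
```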
A: Restarting the kernel brings it back (after removing the code where the `list` function is being overwritten). Alternatively you can also use `del list` to delete the variable `list`, returning to the original `list()` behavior.
A: The `gridplot` command takes a list of lists of figures as first (and only provided) argument. Plotting does work as intended when `layout` is changed to:
```python
layout = gridplot([[fig1, fig2],
                   [fig3, None]])
```
This section has been moved to the end of the current document
`A` ≠ `a`
- `shift + enter`: execute cell and move to the next cell
- `ctrl + enter`: execute cell and stay on it
- `alt + enter`: execute cell and open a new code cell immediately below
- `esc`: exit edit mode -> command mode

When in command mode, the following keyboard shortcuts can be used:
key | effect |
---|---|
C | copy selected cell |
V | paste copied cell below selected cell |
X | cut (copy+remove) selected cell |
A | insert new, blank cell above selected cell |
B | insert new, blank cell below selected cell |
H | display help sheet |
- links with `[` and `(` brackets
- `code` (i.e. monospaced font) with backticks
```
and multiple
lines
of code
with triple backticks
on their own lines
```
```python
import numpy
def print_explanation():
    explanation = '''
    you can even turn on syntax highlighting for
    most languages by naming the language immediately
    after the opening three backticks
    '''
    print(explanation)
```
`ipython`. I'm confused.
A: `ipython` is another interface to Python. For the present exercises you should be able to use the Jupyter Notebook or Jupyter Lab interface. Check above for the Jupyter essentials section, which includes keyboard shortcuts to execute and edit a notebook cell.
A: You will learn how to `import` Python modules on day two. You have also installed anaconda, which includes the `conda` package manager with which you can install additional Python packages.
`numbers = "0123456789"`
Q: What does `/` mean in the help page of a function?
Name / Affiliation / Operating system e.g. Toby Hodges / Zeller Team, EMBL Heidelberg, MacOS
Advice
If you type `pr` and press tab it will get completed to `print`; you may get a dropdown menu if there are different options.
Please type 'x' next to the chapter that you were working on at the end of day 1:
Can someone clarify what are chapters 5 to 7 in the list above - There are only 4 notebooks?
Put your exercise solution requests below (e.g. 1.2; 2.3). put an 'X' next to the question number to "upvote" other people's requests
Please fill out this survey at the end of day 2:
https://de.surveymonkey.com/r/denbi-course?sc=hdhub&id=000260
```python
for group in AllGroupResults:
    print()
    print('Results for {}'.format(group))
    for student in AllGroupResults[group]:
        print('{}:\t{}'.format(student, AllGroupResults[group][student]))
    group_scores = list(AllGroupResults[group].values())
    mean_score = sum(group_scores) / len(group_scores)
    print('Mean score for {}: {}'.format(group, mean_score))
```
```python
my_name = 'Toby'
def get_feedback_message(score):
    if score < 60:
        return 'You must try harder next time. Are you taking this course seriously?'
    elif 60 <= score <= 79:
        return "Well done, that's a good score."
    else:
        return "Congratulations! That's an excellent score!"
template = '''Dear {},
I have finished marking the assessment for your seminar group, {}.
You scored {}.
{}
Kind regards,
{}'''
for group in AllGroupResults:
    for student in AllGroupResults[group]:
        score = AllGroupResults[group][student]
        print(template.format(student,
                              group,
                              score,
                              get_feedback_message(score),
                              my_name))
        print()
```
Clarifying what keys/values are and how the inner dictionaries are values of the outer dictionary:
```python
for group in AllGroupResults:
    group_dictionary = AllGroupResults[group]
    print(group_dictionary)
    #for student in AllGroupResults[group]:
    for student in group_dictionary:
        print(student)
        # FOR X IN DICTIONARY:
        #   X -> key of that DICTIONARY
        #   Y = DICTIONARY[X] ; Y -> value corresponding to key X
        score = AllGroupResults[group][student]
```
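An alternative sketch of the same idea using `.items()`, which yields each key together with its value. The dictionary `example_results` and its contents are made up here and only mimic the shape of `AllGroupResults`.

```python
# Made-up data in the same nested shape as AllGroupResults
example_results = {
    'Group A': {'Alice': 72, 'Bob': 58},
    'Group B': {'Carol': 91},
}

rows = []
# Outer loop: each key is a group name, each value is the inner dictionary for that group
for group, group_dictionary in example_results.items():
    # Inner loop: each key is a student name, each value is that student's score
    for student, score in group_dictionary.items():
        print(group, student, score)
        rows.append((group, student, score))
```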