Introduction to Python programming with applications in bioinformatics 2022

### NB: You might have to force reload the Canvas page with the HackMD for it to update

## Resources

Course website: https://uppsala.instructure.com/courses/71521
HackMD: https://hackmd.io/b-RP-xOdSQysr-wO-eY7kg

#### Teachers and teaching assistants:

Nina Norgren Dimitris Bampalikis Nanjiang Shu Jon Ander Novella Jeanette Tångrot Kristina Benevides Pär Larsson Ville R Pedro Allison Churcher

## Basic instructions for HackMD

- Start a new question with the `-` symbol
- You can add code using:

````
```python
<everything_in_here_is_code>
```
````

## Questions?

Any questions you have, feel free to write them here, and the TAs will try to answer them as soon as possible.

### Day 1

- Would it fix the issue with 3.14 + 2 resulting in an inaccurate number if you make 2 into a float as well (e.g. 2.0)?
  - Not really. The problem is that (all) floats are represented as base 2 (binary) fractions, so the result will be the same.
  - You can work around this using the `decimal` module as follows:
    ```python
    >>> from decimal import Decimal, ROUND_DOWN
    >>> Decimal(3.14 + 2).quantize(Decimal('.01'), rounding=ROUND_DOWN)
    Decimal('5.14')
    ```
- Can booleans in Python also be read as 0 and 1 (False and True) like in R?
  - Yes, you can use them like in R. Here is a way to test that:
    ```python
    True == 1
    ```
    This should return `True`. There is an extended explanation of this topic in the Day 4 slides.

### Day 2

- How do you do a pairwise sum of two lists of unequal length?
  ```python=
  import itertools

  # zip_longest pads the shorter list with fillvalue, giving [2, 4, 3]
  [x + y for x, y in itertools.zip_longest([1, 2], [1, 2, 3], fillvalue=0)]
  ```
- Is there a way to count with len() instead of the following solution?
  ```python=
  electronics_count = 0
  for line in blocket:
      cls = line.strip().split('\t')
      if cls[1] == 'electronics':
          electronics_count += 1
  print(electronics_count)
  ```
  - There is a way, but it's not recommended:
    ```python=
    fh = open('blocket_listings.txt', 'r', encoding = 'utf-8')
    el_list = []
    for line in fh:
        cls = line.strip().split('\t')
        if cls[1] == 'electronics':
            # Store the category name only so that it can be counted later
            el_list.append(cls[1])
    print(len(el_list))
    fh.close()
    ```
    This solution is bad because we store the word electronics in a list only so that we can call the `len` function on it afterwards. However, a list could be useful if we wanted to store the prices of all electronics, such as here:
    ```python=
    fh = open('downloads/blocket_listings.txt', 'r', encoding = 'utf-8')
    el_list = []
    for line in fh:
        cls = line.strip().split('\t')
        if cls[1] == 'electronics':
            # Store the price (column 3) as an integer
            el_list.append(int(cls[3]))
    print(len(el_list))  # number of electronics listings
    print(sum(el_list))  # total price of all electronics
    fh.close()
    ```
- Since many people asked, I am posting a sample solution for a variation of the IMDB exercise, which stores all the movies with the best rating. In the original dataset, there is only one movie with rating 9.3, but assuming there were two, here is a sample solution for finding them both:
  - (Please keep in mind that the rating of the movie `Unforgiven` was changed in order to show the outcome of the code.)
```python=
# Go through the list of the movies and store the best at each time
fh = open('downloads/250.imdb', 'r', encoding = 'utf-8')

# List to store the "best" ratings while looping through the dataset
best = []
# Current best rating
best_rating = 0

for line in fh:
    if not line.startswith('#'):
        cols = line.strip().split('|')
        rating = float(cols[1].strip())
        # if the rating is higher or equal to the previous highest, append it to the list of best movies
        if rating >= best_rating:
            best_rating = rating
            best.append([rating,cols[6]])
fh.close()

print(best)
# This would print a list containing all the best movies at the time
# [[8.5, 'Paths of Glory'], [8.6, 'Léon: The Professional'], [8.6, 'La La Land'], [9.3, 'Unforgiven'], [9.3, 'The Shawshank Redemption']]
# This result is obviously wrong, so we need to go through this list and select only the actual best movies

# Go through the short list above and pick the movies that have rating equal to the best rating
real_best = []
for result in best:
    if best_rating == result[0]:
        real_best.append(result)

print(real_best)
# This would print the actual list with the best movies, which contains only rating equal to 9.3
# [[9.3, 'Unforgiven'], [9.3, 'The Shawshank Redemption']]
```

### Day 3

-

### Day 4

-

### Day 5

-

### Project

-

## Project solution presented on Friday afternoon

## Feedback

If you have any feedback during the course, feel free to add it here:

-

## Post-course questions

If you have any questions regarding exercises etc. during next week (week 42), you can write them here. We will not be as fast at answering them as during this week, but we will keep an eye on it.
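
A follow-up to the Day 2 IMDB sample solution: the two passes (collect candidates, then filter) can probably be folded into a single loop by starting a fresh list whenever a strictly higher rating shows up. The sketch below is one possible way to do that, not the official course solution. The function name `best_movies` and the rows in `sample_lines` are hypothetical and made up for illustration (the `0` and `-` fields are placeholders, not real data). Only the `|` separator and the positions of the rating (column 1) and the title (column 6) follow the sample solution.

```python
def best_movies(lines):
    """Return (best_rating, titles) for all movies sharing the top rating."""
    best_rating = None
    best_titles = []
    for line in lines:
        # Skip comment lines and empty lines
        if line.startswith('#') or not line.strip():
            continue
        cols = line.strip().split('|')
        rating = float(cols[1].strip())
        title = cols[6].strip()
        if best_rating is None or rating > best_rating:
            # Strictly better rating: forget the earlier candidates
            best_rating = rating
            best_titles = [title]
        elif rating == best_rating:
            # Tie with the current best: keep both
            best_titles.append(title)
    return best_rating, best_titles


# Hypothetical rows for illustration; only columns 1 and 6 matter here
sample_lines = [
    "# placeholder header line",
    "0|8.5|0|0|-|-|Paths of Glory",
    "0|9.3|0|0|-|-|Unforgiven",
    "0|8.6|0|0|-|-|La La Land",
    "0|9.3|0|0|-|-|The Shawshank Redemption",
]

rating, titles = best_movies(sample_lines)
print(rating, titles)
# 9.3 ['Unforgiven', 'The Shawshank Redemption']
```

To run it on the real file, you could pass an open file handle instead of `sample_lines`, since the function only needs something it can loop over line by line.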