
# Python Data Analysis Basics

![](https://i.imgur.com/3i7HsJF.png)

![](https://i.imgur.com/XWYZBxa.png)

```python=
# Calculate each artist's age at the time the artwork was created.
ages = []
for row in moma:
    date = row[6]    # year the artwork was created
    birth = row[3]   # artist's birth year
    if type(birth) == int:
        age = date - birth
    else:
        age = 0      # placeholder when the birth year is not an integer
    ages.append(age)

# Treat ages of 20 or under (including the 0 placeholder) as "Unknown".
final_ages = []
for a in ages:
    if a > 20:
        final_age = a
    else:
        final_age = "Unknown"
    final_ages.append(final_age)
```

To group the ages into decades, we can take advantage of the fact that Python strings behave like sequences of characters, which lets us slice them in the same way we would a list. Let's look at a simple example:

![](https://i.imgur.com/hP2nXE1.png)

In order to use this technique with our ages, we'll need to:

- Convert the integer value to a string.
- Use slicing to keep all but the last character.

![](https://i.imgur.com/WbamCbf.png)

> - Create an empty list, `decades`, to store the artist decade data.
> - Iterate over the values in `ages`, and in each iteration:
>     - If `age` is `"Unknown"`, assign it to the variable `decade`.
>     - If `age` isn't `"Unknown"`:
>         - Convert the integer value to a string, assigning it to the variable `decade`.
>         - Use list slicing to remove the final character of `decade`.
>         - Use the `+` operator to add the substring `"0s"` to the end of the string `decade`.
>     - Append `decade` to the `decades` list.

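The individual steps listed above can be tried on a single value before writing the full loop. This is only a minimal sketch: the names `age_str` and `tens` and the sample age of 32 are illustrative assumptions, not values taken from the data set.

```python
# Hypothetical sample value, chosen only for illustration.
age = 32

# Convert the integer to a string so that it can be sliced.
age_str = str(age)

# Keep every character except the last one.
tens = age_str[:-1]

# Add the "0s" suffix to label the decade.
decade = tens + "0s"
print(decade)  # prints 30s
```

Applied inside a loop, the same three operations produce a decade label for every age in the list.
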
```python=
# `ages` is assumed to hold the cleaned values here (integers or "Unknown").
decades = []
for age in ages:
    if age == "Unknown":
        decade = age
    else:
        decade = str(age)         # convert the integer to a string
        decade = decade[:-1]      # drop the final character
        decade = decade + "0s"    # add the decade suffix
    decades.append(decade)
```

![](https://i.imgur.com/gTcRVPj.png)

A frequency table shows us how many of each item we have in our list:

![](https://i.imgur.com/LwGwkHv.png)

The logical steps we'll need to follow are shown in the diagram below:

![](https://i.imgur.com/nqcKSfu.png)

```python=
decade_frequency = {}
for d in decades:
    if d not in decade_frequency:
        decade_frequency[d] = 1    # first time we see this decade
    else:
        decade_frequency[d] += 1   # seen before: increment its count
```

#### [The `str.format()` method is a powerful tool that helps us write easy-to-read code while combining strings with other variables.](https://docs.python.org/3/library/stdtypes.html#str.format)

```python=
artist = "Pablo Picasso"
birth_year = 1881

# Named placeholders are filled from the keyword arguments of format().
template = "{name}'s birth year is {year}"
output = template.format(name=artist, year=birth_year)
print(output)
```

The code block below might look a bit long and intimidating at first glance, but everything in it is a concept you've seen before. We've added comments so you can follow along with the logic.

![](https://i.imgur.com/jTVHdcJ.png)

![](https://i.imgur.com/YwZyqzb.png)

```python=
# `artist_freq` is assumed to be a frequency table (dictionary)
# mapping each artist's name to their number of artworks.
def artist_summary(artist):
    num_artworks = artist_freq[artist]
    template = "There are {num} artworks by {name} in the data set"
    output = template.format(name=artist, num=num_artworks)
    print(output)

artist_summary("Henri Matisse")
```

![](https://i.imgur.com/btRizzt.png)

The output of this code is below:

`Your bank balance is $12,345.68`

```python=
pop_millions = [
    ["China", 1379.302771],
    ["India", 1281.935991],
    ["USA", 326.625791],
    ["Indonesia", 260.580739],
    ["Brazil", 207.353391],
]

# {:,.2f} adds a thousands separator and rounds to two decimal places.
template = "The population of {} is {:,.2f} million"

for country in pop_millions:
    name = country[0]
    pop = country[1]
    output = template.format(name, pop)
    print(output)
```

We use the [dict.items() method](https://docs.python.org/3/tutorial/datastructures.html#looping-techniques), which lets us iterate over the key-value pairs of a dictionary one at a time. This helps us loop over dictionaries more easily. We can assign both the key and the value (in that order) when we define our loop:

```python=
# Build a frequency table of the gender column.
gender_freq = {}
for row in moma:
    gender = row[5]
    if gender not in gender_freq:
        gender_freq[gender] = 1
    else:
        gender_freq[gender] += 1

# Unpack each key-value pair into `gender` and `num`.
for gender, num in gender_freq.items():
    template = "There are {n:,} artworks by {g} artists"
    print(template.format(g=gender, n=num))
```

- [Python Documentation: Format Specifications](https://docs.python.org/3/library/string.html#formatspec)
- [PyFormat: Python String Formatting Reference](https://pyformat.info/)

###### tags: `python`
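
As a quick recap of the format specifications used in these notes, the sketch below applies them to individual numbers. The variables `balance` and `artworks` and their values are assumptions: `balance` was picked so that the first line reproduces the bank balance output quoted earlier, since the code that produced that output is not reproduced in the text.

```python
# Hypothetical values, chosen only for illustration.
balance = 12345.678
artworks = 1234567

# "," adds a thousands separator; ".2f" rounds to two decimal places.
with_both = "Your bank balance is ${:,.2f}".format(balance)
print(with_both)       # prints Your bank balance is $12,345.68

# Thousands separator only.
separator_only = "{:,}".format(artworks)
print(separator_only)  # prints 1,234,567

# Precision only: one decimal place, no separator.
precision_only = "{:.1f}".format(balance)
print(precision_only)  # prints 12345.7
```

The part after the colon inside the braces is the format specification, so it can be combined with a positional or named placeholder, as in `{n:,}` above.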