# Python Data Analysis Basics


```python=
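ages = []
for row in moma:
    date = row[6]
    birth = row[3]
    # Only calculate an age when the birth year is an integer
    if type(birth) == int:
        age = date - birth
    else:
        age = 0
    ages.append(age)

# Ages of 20 or below are treated as unreliable and labeled "Unknown"
final_ages = []
for a in ages:
    if a > 20:
        final_age = a
    else:
        final_age = "Unknown"
    final_ages.append(final_age)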
```
To turn each age into a decade, we can take advantage of the fact that Python strings are sequences, much like lists, which lets us slice them in the same way we would a list.
Let's look at a simple example:
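
A minimal sketch of string slicing follows. The sample values are hypothetical and chosen only for illustration; they are not taken from the data set.

```python
# Hypothetical sample string, used only for illustration
word = "Picasso"

first_three = word[:3]     # characters at index 0, 1 and 2
all_but_last = word[:-1]   # everything except the final character

print(first_three)   # Pic
print(all_but_last)  # Picass

# The same slice works on a number once it is converted to a string
age_string = str(34)
print(age_string[:-1])  # 3
```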

To use this technique with our ages, we'll need to:

1. Convert the integer value to a string.
2. Use slicing to keep all but the last character.

> 1. Create an empty list, `decades`, to store the artist decade data.
> 2. Iterate over the values in `ages`, and in each iteration:
>     - If `age` is `"Unknown"`, assign it to the variable `decade`.
>     - If `age` isn't `"Unknown"`:
>         - Convert the integer value to a string, assigning it to the variable `decade`.
>         - Use string slicing to remove the final character of `decade`.
>         - Use the `+` operator to add the substring `"0s"` to the end of the string `decade`.
>     - Append `decade` to the `decades` list.

```python=
decades = []
for age in ages:
    if age == "Unknown":
        decade = age
    else:
        # For example: 34 -> "34" -> "3" -> "30s"
        decade = str(age)
        decade = decade[:-1]
        decade = decade + "0s"
    decades.append(decade)
```

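A frequency table shows how many times each item appears in a list.

To build one, we start with an empty dictionary and loop over the list. If an item isn't yet a key in the dictionary, we add it with a count of 1; otherwise, we add 1 to its existing count.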

```python=
decade_frequency = {}
for d in decades:
    # First time we see this decade: start its count at 1
    if d not in decade_frequency:
        decade_frequency[d] = 1
    else:
        decade_frequency[d] += 1
```
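
To see the result of this pattern without the full data set, here is a small self-contained sketch. The list `sample_decades` is a made-up stand-in for the real `decades` list, so the counts are purely illustrative.

```python
# Hypothetical sample of decade labels, standing in for the real decades list
sample_decades = ["30s", "20s", "30s", "Unknown", "40s", "30s"]

sample_frequency = {}
for d in sample_decades:
    if d not in sample_frequency:
        sample_frequency[d] = 1
    else:
        sample_frequency[d] += 1

print(sample_frequency)
# {'30s': 3, '20s': 1, 'Unknown': 1, '40s': 1}
```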

The [str.format() method](https://docs.python.org/3/library/stdtypes.html#str.format) is a powerful tool that helps us write easy-to-read code while combining strings with other variables.
```python=
artist = "Pablo Picasso"
birth_year = 1881
template = "{name}'s birth year is {year}"
output = template.format(name=artist, year=birth_year)
print(output)
```
The code block below might look a bit long and intimidating at first glance, but everything in it is a concept you've seen before. We've added comments so you can follow along with the logic.


```python=
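# artist_freq is assumed to be a dictionary, defined earlier, that maps
# each artist's name to their number of artworks.
def artist_summary(artist):
    # Look up how many artworks the artist has
    num_artworks = artist_freq[artist]
    template = "There are {num} artworks by {name} in the data set"
    # Keyword arguments fill the named fields in the template
    output = template.format(name=artist, num=num_artworks)
    print(output)

artist_summary("Henri Matisse")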
```

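Format specifications let us control how numbers are displayed. For example, `{:,.2f}` adds a thousands separator and rounds to two decimal places, producing output such as:

`Your bank balance is $12,345.68`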
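
A line like the one above can be produced with a format specification. The sketch below is an assumption about the code behind it: the variable name `balance` and the value `12345.678` are hypothetical, chosen only so that the result matches the output shown.

```python
# Hypothetical value; the original number behind the output is not shown
balance = 12345.678

# "," adds a thousands separator and ".2f" rounds to two decimal places
template = "Your bank balance is ${:,.2f}"
output = template.format(balance)
print(output)
```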
```python=
pop_millions = [
    ["China", 1379.302771],
    ["India", 1281.935991],
    ["USA", 326.625791],
    ["Indonesia", 260.580739],
    ["Brazil", 207.353391],
]
# {} takes the country name; {:,.2f} formats the population with
# a thousands separator and two decimal places
template = "The population of {} is {:,.2f} million"
for country in pop_millions:
    name = country[0]
    pop = country[1]
    output = template.format(name, pop)
    print(output)
```
We use the [dict.items() method](https://docs.python.org/3/tutorial/datastructures.html#looping-techniques), which returns each of the key-value pairs from our dictionary one at a time. This helps us loop over dictionaries more easily. We can assign both the key and the value (in that order) when we define our loop:
```python=
gender_freq = {}
for row in moma:
    gender = row[5]
    if gender not in gender_freq:
        gender_freq[gender] = 1
    else:
        gender_freq[gender] += 1

# items() yields (key, value) pairs, unpacked here into gender and num
for gender, num in gender_freq.items():
    template = "There are {n:,} artworks by {g} artists"
    print(template.format(g=gender, n=num))
```
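
Because the loop above depends on the `moma` data, here is a small self-contained sketch of the same `dict.items()` pattern. The dictionary `sample_freq`, its keys, and its counts are made up for illustration and do not come from the data set.

```python
# Hypothetical frequency table, used only for illustration
sample_freq = {"Painting": 1200, "Sculpture": 35, "Drawing": 15250}

lines = []
# Each iteration unpacks one (key, value) pair into category and num
for category, num in sample_freq.items():
    template = "There are {n:,} artworks in the {c} category"
    lines.append(template.format(c=category, n=num))

for line in lines:
    print(line)
```

Collecting the formatted strings in a list before printing is only a convenience for inspecting the result; printing inside the first loop would work just as well.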

- [Python Documentation: Format Specifications](https://docs.python.org/3/library/string.html#formatspec)
- [PyFormat: Python String Formatting Reference](https://pyformat.info/)

###### tags: `python`