# Cleaning and Preparing Data in Python (Basics)
The MoMA data is in a CSV file called artworks.csv. Here's how to read it into a list of lists and drop the header row:

```python=
# import the reader function from the csv module
from csv import reader
# use Python's built-in open() function
# to open the artworks.csv file
opened_file = open('artworks.csv')
# use csv.reader() to parse the data from
# the opened file
read_file = reader(opened_file)
# use list() to convert the read file
# into a list of lists format
moma = list(read_file)
# remove the first row of the data, which
# contains the column names
moma = moma[1:]
```
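To try the same steps without the real file, here is a minimal sketch that feeds csv.reader() a small in-memory CSV through io.StringIO. The column names and rows are made up for illustration and are not the real contents of artworks.csv. It also uses a with block, which closes the file automatically (the snippet above leaves opened_file open).

```python
import io
from csv import reader

# hypothetical sample standing in for artworks.csv (not real MoMA data)
sample_csv = 'Title,Artist,Date\n"Untitled, No. 1",Jane Doe,c. 1915\nStudy,John Roe,(1951)\n'

# io.StringIO makes the string behave like an opened file
with io.StringIO(sample_csv) as opened_file:
    rows = list(reader(opened_file))

header = rows[0]  # the column names
data = rows[1:]   # every row after the header

print(header)  # ['Title', 'Artist', 'Date']
print(data)    # [['Untitled, No. 1', 'Jane Doe', 'c. 1915'], ['Study', 'John Roe', '(1951)']]
```

Note that the quoted field "Untitled, No. 1" stays a single value even though it contains a comma; that is the main reason to use csv.reader() instead of splitting each line on commas. For the real file, the same pattern would be `with open('artworks.csv', newline='') as opened_file:`, following the csv module documentation's recommendation to pass newline=''.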
---
### str.replace()
> The str.replace() method is like a "find and replace" tool for strings. Suppose we want to change the substring "red" in a string to "blue". The individual steps are:
>
> - Find all instances of the old substring, "red".
> - Replace each of those instances with the new substring, "blue".
>
> To achieve this using str.replace(), we need to provide two arguments:
>
> - old: The substring we want to find and replace.
> - new: The substring we want to replace old with.
>
> Both are positional-only arguments, so we pass them in order, old first, without naming them: string.replace(old, new).
>
> We might decide to simply replace the substring "r" with "R". Because "r" also appears inside the words favorite and color, those letters get replaced too, giving us "favoRite" and "coloR". Be careful when a substring might be hidden inside other words; if this happens, use a longer substring:
> 
> ```python=
> age1 = "I am thirty-one years old"
> age2 = age1.replace("one", "two")
> print(age2)  # I am thirty-two years old
> ```
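A short sketch of the behaviour described above. The sentence "My favorite color is red" is a made-up stand-in for the example string in the original lesson. It also shows that str.replace() never modifies the original string; it returns a new one, because Python strings are immutable.

```python
sentence = "My favorite color is red"  # hypothetical example string

# replace every occurrence of "red" with "blue"
recolored = sentence.replace("red", "blue")
print(recolored)  # My favorite color is blue

# a one-character substring also matches inside other words
too_broad = sentence.replace("r", "R")
print(too_broad)  # My favoRite coloR is Red

# a longer substring only matches the word we meant
precise = sentence.replace("red", "Red")
print(precise)    # My favorite color is Red

# the original string is unchanged
print(sentence)   # My favorite color is red
```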
### [str.title()](https://docs.python.org/3/library/stdtypes.html#str.title)
> The str.title() method returns a copy of the string in which the first letter of each word is uppercase and the remaining letters are lowercase (also known as title case).
> ```python=
> my_string = "The cool thing about this string is that it has a CoMbInAtIoN of UPPERCASE and lowercase letters!"
> my_string_title = my_string.title()
> print(my_string_title)
> # The Cool Thing About This String Is That It Has A Combination Of Uppercase And Lowercase Letters!
> ```
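One caveat worth knowing: str.title() starts a new "word" at any letter that follows a non-letter, so apostrophes and digits also count as word boundaries. The example strings below are made up. string.capwords() from the standard library is a swapped-in alternative, not something these notes use; it splits on whitespace only and capitalizes each piece.

```python
from string import capwords

words = "they're bill's friends"  # hypothetical example
decade = "1970s art"              # hypothetical example

# title() capitalizes the letter after an apostrophe...
print(words.title())   # They'Re Bill'S Friends
# ...and the letter after a digit
print(decade.title())  # 1970S Art

# capwords() only treats whitespace as a word boundary
print(capwords(words))   # They're Bill's Friends
print(capwords(decade))  # 1970s Art
```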
> - Instructions
>   - Create a function called strip_characters(), which accepts a string argument and:
>     - iterates over the bad_chars list, using str.replace() to remove each character;
>     - returns the cleaned string.
>   - Create an empty list, stripped_test_data.
>   - Iterate over the strings in test_data, and on each iteration:
>     - use the function you created earlier to clean the string;
>     - append the cleaned string to the stripped_test_data list.
```python=
test_data = ["1912", "1929", "1913-1923",
             "(1951)", "1994", "1934",
             "c. 1915", "1995", "c. 1912",
             "(1988)", "2002", "1957-1959",
             "c. 1955.", "c. 1970's",
             "C. 1990-1999"]

# characters to strip out of each date string
bad_chars = ["(", ")", "c", "C", ".", "s", "'", " "]

def strip_characters(string):
    # remove every bad character, one at a time
    for char in bad_chars:
        string = string.replace(char, "")
    return string

stripped_test_data = []
for s in test_data:
    test_str = strip_characters(s)
    stripped_test_data.append(test_str)

print(stripped_test_data)
```
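As an alternative to calling str.replace() once per character, str.translate() can delete every bad character in a single pass. This is a swapped-in technique, not the one the exercise asks for, and it only works here because every entry in bad_chars is a single character. strip_characters_translate() is a hypothetical name. The sketch repeats the test data so it runs on its own and checks that both approaches agree.

```python
test_data = ["1912", "1929", "1913-1923",
             "(1951)", "1994", "1934",
             "c. 1915", "1995", "c. 1912",
             "(1988)", "2002", "1957-1959",
             "c. 1955.", "c. 1970's",
             "C. 1990-1999"]
bad_chars = ["(", ")", "c", "C", ".", "s", "'", " "]

def strip_characters(string):
    # the exercise's approach: one str.replace() call per bad character
    for char in bad_chars:
        string = string.replace(char, "")
    return string

# str.maketrans(x, y, z): the third argument lists characters to delete
delete_table = str.maketrans("", "", "".join(bad_chars))

def strip_characters_translate(string):
    # hypothetical helper: removes all bad characters in one call
    return string.translate(delete_table)

via_replace = [strip_characters(s) for s in test_data]
via_translate = [strip_characters_translate(s) for s in test_data]

print(via_translate)
print(via_replace == via_translate)  # True
```

Both versions give the same list; the replace loop is easier to read for beginners, while translate() avoids building an intermediate string for each character.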
```python=
# strip_characters() and moma come from the earlier cells;
# stripped_test_data is the output of the previous exercise
stripped_test_data = ['1912', '1929', '1913-1923',
                      '1951', '1994', '1934',
                      '1915', '1995', '1912',
                      '1988', '2002', '1957-1959',
                      '1955', '1970', '1990-1999']

def process_date(date):
    if "-" in date:
        # a range like "1913-1923" becomes the midpoint of the two years
        split_date = date.split("-")
        date = round((int(split_date[0]) + int(split_date[1])) / 2)
    else:
        # otherwise the string is a single year
        date = int(date)
    return date

processed_test_data = []
for d in stripped_test_data:
    date = process_date(d)
    processed_test_data.append(date)

# apply the same cleaning to the date stored at index 6 of every row
for row in moma:
    date = row[6]
    date = strip_characters(date)
    date = process_date(date)
    row[6] = date
```
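To see what process_date() produces, here is a self-contained sketch run on the stripped test data. One detail worth knowing: Python 3's round() rounds halves to the nearest even integer, so the range 1990-1999 (midpoint 1994.5) becomes 1994, not 1995.

```python
def process_date(date):
    if "-" in date:
        # a range like "1913-1923" becomes the midpoint of the two years
        split_date = date.split("-")
        date = round((int(split_date[0]) + int(split_date[1])) / 2)
    else:
        # otherwise the string is a single year
        date = int(date)
    return date

stripped_test_data = ['1912', '1929', '1913-1923',
                      '1951', '1994', '1934',
                      '1915', '1995', '1912',
                      '1988', '2002', '1957-1959',
                      '1955', '1970', '1990-1999']

processed_test_data = [process_date(d) for d in stripped_test_data]
print(processed_test_data)
# [1912, 1929, 1918, 1951, 1994, 1934, 1915, 1995, 1912, 1988, 2002, 1958, 1955, 1970, 1994]

# round() uses round-half-to-even ("banker's rounding")
print(round(1994.5), round(1995.5))  # 1994 1996
```

For a year estimate, the difference of one year is usually harmless, but it explains why a midpoint ending in .5 does not always round up.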
###### tags: `python`