---
title: 'Homework 1'
label: 'homework'
layout: 'post'
geometry: margin=2cm
tags: homework
---
# CS 100 Homework #1
### The Name Game
##### Due: October 4, 2022 at 10:00 pm
### Instructions
Please submit your completed spreadsheet to Gradescope.
Be sure to follow the CS 100 course collaboration policy as you work on this and all CS 100 assignments.
### Objectives
Students will learn the basics of spreadsheets. They will also practice exploratory data analysis, as they search for interesting stories hidden in data.
### Data
In this assignment, you'll be exploring a century of baby names in the U.S. Download [babyNames.ods](https://cs.brown.edu/courses/cs100/homeworks/data/1/babyNames.ods). The data set provided includes the annual counts of a few select baby names since 1900, and all names during a few select years. These data were drawn from this more [complete data set](https://raw.githubusercontent.com/jcbain/celeb_baby_names/master/data/NationalNames.csv).
Open `babyNames.ods` in Sheets. You will find multiple data sets, which include data about baby names, primarily from the last century:
* the name Hillary over the years
* the name Barack over the years
* the names of all babies born in 1900, 1910, 1920, etc., up to and including 1990
* the names of Amy's two daughters, Ella and Carmen, over the years
### Problems
#### 1. Names of Famous Politicians
##### A. When the name Hillary was popular…
A.1. In which year(s) was Hillary most common as a *male* name? How many male babies were named Hillary in those year(s)?
A.2. How many male babies in total were named Hillary in the 1900s (i.e., from 1900 to 1999)? How many female babies?
A.3. Create a chart (the type of chart is up to you—think about what would be best for these data) that depicts the year vs. the count of females named Hillary. When was there a sharp spike in the counts? What might account for this? Be sure to use informative names for your axes, your legend, and your title.
A.4. Likewise, create a chart that depicts the year vs. the count of males named Hillary. Where is the spike in this chart?
A.5. Create a new column in which you compute the total of both males and females named Hillary each year. How many total babies were named Hillary in 1963? 1975? 1992? 2002?
##### B. And when the name Barack came on the scene…
B.1. Sort the "Barack" data set in descending order based on "Male Count". In what year were the most American babies named Barack? And in what year were the fewest?
B.2. Why do you think we only have 8 years' worth of data for Barack?
#### 2. British Royalty
2.1. Create a new sheet called "British Royalty". Label the first few rows in this sheet with at least eight names that are common among British royalty (e.g., Anne, Elizabeth, Margaret, Victoria), and label the columns with the years 1900 through 1990, counting by 10s.
2.2. Populate the entries in this table with the total number of *male* babies given each of those names in each of those years. Then, create *another table* with the same column and row labels and populate it with the total number of *female* babies given those names in each of those years. *Hint*: use VLOOKUP.
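
For readers who have not used VLOOKUP before, the sketch below imitates what an exact-match lookup does: it scans the first column of a table for a key and returns the value from a chosen column of the matching row. It is written in Python only for illustration. The function `vlookup`, the table `ROWS`, its column layout, and its counts are all hypothetical and are not taken from `babyNames.ods`.

```python
# Toy rows laid out like a sheet: name, male count, female count.
# The layout and the numbers are made up for illustration only.
ROWS = [
    ["Anne", 3, 250],
    ["Elizabeth", 5, 900],
    ["Margaret", 4, 700],
]


def vlookup(search_key, rows, index):
    """Return the value in column `index` (1-based) of the first row whose
    first column equals `search_key`, roughly like an exact-match VLOOKUP."""
    for row in rows:
        if row[0] == search_key:
            return row[index - 1]
    return None  # a spreadsheet would report a "not found" error instead


# Female count for Elizabeth in the toy table (third column).
print(vlookup("Elizabeth", ROWS, 3))
```

In a spreadsheet, the same idea is expressed as a formula in each cell of your table, with the name as the search key and the relevant year's data as the range.
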
2.3. Compute the proportion of male babies named after British royalty, the proportion of female babies named after British royalty, and the total proportion of all babies named after British royalty in each of those decades. *Hint:* To represent British royalty, we suggest using the female names Anne, Elizabeth, Margaret, and Victoria, and the male names Phillip, George, William, and Charles.
2.4. Recompute the male, female, and total proportions under the assumption that all the seemingly mistaken entries (e.g., males named Elizabeth or females named Charles) have been corrected (i.e., the males named Elizabeth have been relabeled as females, and the females named Charles have been relabeled as males). How do the new proportions differ from the originals?
*Hint*: Do not clean the data manually. That would be too time-consuming! Just alter your formulas so that they look up not only the females named Elizabeth, but the males as well; then sum these two values to arrive at the numerator. To correct the denominator, you must update both the male and female counts by adding and subtracting mistakenly gendered names, as necessary.
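
The arithmetic in this hint can be sketched with a tiny example. The Python below is one possible reading of the correction, using a single year, one royal female name, and one royal male name. Every count, total, and variable name in it is made up for illustration and does not come from `babyNames.ods`.

```python
# Made-up counts for a single year; not taken from babyNames.ods.
female_counts = {"Elizabeth": 50, "Charles": 2}  # babies recorded as female
male_counts = {"Charles": 40, "Elizabeth": 4}    # babies recorded as male
female_total = 1000  # all babies recorded as female that year (toy value)
male_total = 1000    # all babies recorded as male that year (toy value)

royal_female_names = ["Elizabeth"]
royal_male_names = ["Charles"]


def corrected_proportions():
    # Babies recorded under the other sex for each royal name.
    males_to_move = sum(male_counts.get(n, 0) for n in royal_female_names)
    females_to_move = sum(female_counts.get(n, 0) for n in royal_male_names)

    # Numerators: look up both sexes for each name and sum the two values.
    female_num = sum(female_counts.get(n, 0) for n in royal_female_names) + males_to_move
    male_num = sum(male_counts.get(n, 0) for n in royal_male_names) + females_to_move

    # Denominators: relabeled babies join one total and leave the other.
    female_den = female_total + males_to_move - females_to_move
    male_den = male_total + females_to_move - males_to_move

    return female_num / female_den, male_num / male_den


female_prop, male_prop = corrected_proportions()
print(round(female_prop, 4), round(male_prop, 4))
```

With these toy numbers, the corrected female proportion is 54/1002 and the corrected male proportion is 42/998, compared with 50/1000 and 40/1000 before the correction.
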
2.5. Create a chart that depicts the proportion of the US population named for British royalty in the 20th century.
#### 3. A Few Very Popular Names
In a children's book called *Lizard Music*, a large proportion of the lizards on the island are called Raymond. In other words, the single most popular name applies to, say, 75% of the population. Likewise, if many parents in the US in a given year were to give their newborns the same few names, it is conceivable that the 10 most popular names (say) could account for more than 50% (say) of the population.
3.1. Among the most popular names in 1990, how few names does it take to cover 10% of all babies born that year? What about 25%?
3.2. How many names make up the top half of the distribution of female names? What about the bottom half? What about male names (both top and bottom)? How big are the differences? Do you have a theory about what might account for the differences?
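
The notion of a few names "covering" a share of the population can be made concrete as follows: sort the counts from largest to smallest, keep a running total, and stop as soon as the running total reaches the target share of the overall total. The Python sketch below shows this on four made-up counts. The function name `names_needed` and the data are hypothetical, and in a spreadsheet the same steps would be a sort followed by a running-sum column.

```python
def names_needed(counts, fraction):
    """Return how many of the largest counts are needed for their running
    total to reach `fraction` of the overall total."""
    total = sum(counts)
    running = 0
    # Walk through the counts from most popular to least popular.
    for used, count in enumerate(sorted(counts, reverse=True), start=1):
        running += count
        if running >= fraction * total:
            return used
    return len(counts)


toy_counts = [10, 40, 20, 30]  # made-up counts for four names (total 100)
print(names_needed(toy_counts, 0.25))  # the top name alone (40) covers 25%
print(names_needed(toy_counts, 0.50))  # the top two names (40 + 30) cover 50%
```
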
#### 4. Exploration
Amy's two daughters are named Ella and Carmen. Their names have varied in popularity over the past 150 years. For this final question, you need not investigate Amy's two daughters' names. Feel free to select any name from this [data set](https://raw.githubusercontent.com/jcbain/celeb_baby_names/master/data/NationalNames.csv). *N.B.* As this data set is large, you should not include all of it in your final submission. If needed, ask the TAs for help filtering these data to include only *two* names of interest, one per sheet, like Ella and Carmen.
4.1. Plot the popularity of two names over time. Choose names whose popularity varied, as Ella's and Carmen's did.
4.2. Speculate about why these names rose in popularity when they did.