Homework 6: Python Basics

Due: Monday, November 18 at 11:59 pm

Setup

You should have VSCode set up from the first week of the course. If you don't, refer to the VSCode Installation and Setup Guide.

Create a Python file called hw6_code.py to hold your code.
Create a Python file called test_hw6.py and put the following lines at the top of that file:

import pytest
from hw6_code import *

All of the tests that you write should be written in test_hw6.py as assert expressions in functions that start with test_.
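As a minimal sketch of what such a test function looks like: the function double below is hypothetical and is not part of this assignment. It is defined inline only so the example runs on its own; in your files, the function under test would live in hw6_code.py and the test in test_hw6.py.

```python
# Hypothetical function, used only for illustration (not part of the homework).
def double(n):
    return n * 2


# pytest looks for functions whose names start with test_ and runs them.
# The test passes when every assert inside it holds.
def test_double():
    assert double(2) == 4
    assert double(0) == 0
    assert double(-3) == -6
```

If any one of the assert expressions is false, pytest reports that test function as failed.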

Remember your resources!

Big Picture

The goal of this assignment is to give you practice with the basics of writing list-processing programs in Python and using dictionaries. It mostly follows what we've done in class, while also showing a couple of useful Python built-in functions.

The Assignment – Election Time

Make sure to use only Python concepts we've talked about in lecture when solving these problems. If you know other Python techniques, don't use them here if you want full credit.

In particular, don't use the built-in Python count function. That defeats the point of the problems.

Also, make sure that your inputs for each function are in the order that they are mentioned in the directions. Even if your functions are correct, if your inputs are not in the correct order, it will fail the autograder.

The Autograder we use for Python assignments gives slightly different error output than the Autograder we used for Pyret. If you encounter an error, double-check your function names and inputs. Also make sure that you can run your code and your testing code in VSCode without error. If that doesn't resolve the issue, make a post on Ed with a link to your submission.

CS111 is conducting a vote for the best bee character. We're going to write some programs to tally votes. We'll represent a vote as a single string containing the name of the bee that got voted for. We'll represent the collection of votes recorded by a single voting machine as a list of votes (list of string).

For every function we ask you to write, you should have a corresponding testing function in test_hw6.py. Name each test function the same as the original function but with test_ prepended. In other words, the tests for a function check_votes would be in a function called test_check_votes.

Part 1: Basic Vote Counting

Task 1: Write a function called name_matches. It should take in a vote (string) and a name (string). It should return a boolean indicating whether the vote is for the given name, regardless of how either the vote or the name has been capitalized.

Note: You can convert a string s to lowercase in Python by writing s.lower(). As in Pyret, you have the Boolean operators and, or, and not. Equality is checked using == (and inequality using !=).
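A short sketch of the operations from the note, using a made-up string (the variable name greeting is only for illustration):

```python
greeting = "HeLLo"

# lower() returns a new lowercase string; greeting itself is unchanged.
print(greeting.lower())             # hello
print(greeting)                     # HeLLo

# == and != compare values and produce Booleans.
print(greeting == "hello")          # False
print(greeting.lower() == "hello")  # True
print(greeting != "hello")          # True

# and, or, and not combine Booleans.
print(not (greeting == "hello") and len(greeting) == 5)  # True
```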

Task 2: In hw6_code.py, write a function called any_votes_for that takes in a name and a list of votes and returns a boolean indicating whether any of the votes were for the given name (case insensitive). Your solution should use a for loop.

Task 3: In hw6_code.py, write a function count_votes_for that takes in a name and a list of votes. It should return the number of votes that were for the given name (case insensitive). Your solution should use a for loop.

Task 4: In hw6_code.py, write a function called got_more_votes that determines which of two names got more votes. Your function should take in two names and a list of votes. It should return the name that got more votes (case insensitive). If there is a tie, return the string "tie".

Part 2: Election Integrity

Voting is a topic for which people are reasonably concerned about representation, access, and accuracy (among other things). Not surprisingly, there are many arguments and proposals regarding how to properly use computing technology as part of elections: should we vote on computers so they can count ballots? allow voting online? use digital scanners to process votes marked on paper? And so on. Each option offers benefits and risks around participation in, and accuracy of, elections.

Task 5: Read this brief article on the use of internet voting in Estonia. The article has a quote on "paper, plus audits," as a strategy for keeping elections secure. VerifiedVoting.org is an organization of experts on election security and election technology. Take a look at their issues page and read the brief summaries of the issues. (Nothing to write for this task.)

With people concerned about the validity of the election results, the decision has been made to run some checks against the votes to see if they appear plausible.

One issue is that write-in votes can be ambiguous. Imagine an election in which two people with the first name "Ruth" have been trying to get elected. Some votes have come through for "Ruth Achebe", some for "Ruth Flynn", and some for just "Ruth". The "Ruth" votes need to get handled for accurate results.

Someone has proposed counting only the votes that include a last name, using the presence of a space in the name string to indicate whether a vote has both a first and a last name.

Task 6: Write a function clean_votes that takes a list of votes and returns a new list of votes containing only the votes from the original list that contain at least one space (represented by the string " "). Your solution should use a for loop. Hint: in class, we saw a for loop being used to update a running sum variable. How can you use a for loop to build out a new list to return?
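To make the hint concrete, here is a sketch of the two accumulation patterns side by side, on a made-up list of numbers rather than votes. It assumes you have seen append; writing doubled = doubled + [n * 2] would work as well.

```python
numbers = [3, 1, 4]

# Pattern 1: a running sum. Start at 0 and update inside the loop.
total = 0
for n in numbers:
    total = total + n
print(total)    # 8

# Pattern 2: a running list. Start with an empty list and grow it inside the loop.
doubled = []
for n in numbers:
    doubled.append(n * 2)
print(doubled)  # [6, 2, 8]
```

In both patterns the variable is created before the loop and the finished value is available after it. Note that the original list numbers is not changed.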

Task 7: Write a test_clean_votes function (in the test_hw6.py file) that explores various scenarios in which this specific approach would and wouldn't behave as expected. If you think there are situations that the "check for space" approach would handle poorly for a goal of proper vote counting, include a test case that illustrates the undesirable behavior (write your test to pass with what clean_votes actually does, but include a comment about why the result is undesirable). In general, pay particular attention to the set of strings and vote lists that you include in your tests. We'll be looking to see how well you explore the space of names and vote lists when grading this question.

Even with removing the single-name votes, people are still concerned that there might be weird issues in the votes. Given prior trends in the district where the votes have been cast, there are expectations on what percentage of the vote specific candidates might get (within a margin of error). We'll write a program to check whether a candidate's votes are within such an expected range.

Task 8: In hw6_code.py, write a function check_percent that takes the name of a candidate, a list of votes, an expected percentage (a decimal between 0.0 and 1.0, inclusive), and an error tolerance (decimal). The function returns one of "higher", "in-range", or "lower". It returns "in-range" if the percentage of votes that the candidate received is within the error tolerance (inclusive) of the expected percentage. Otherwise, "higher" or "lower" is returned based on where the actual percentage lands relative to the allowed range. For example, if the expected percentage is 0.4 and the error tolerance is 0.03, the function would return "in-range" if and only if the candidate's fraction of the vote is between 0.37 and 0.43 (inclusive).
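The arithmetic behind the 0.4 and 0.03 example can be checked with a few lines of Python. This is only a sketch of the range computation, not the function itself, and the vote counts (41 out of 100, and so on) are made up for illustration.

```python
expected = 0.4
tolerance = 0.03

# The allowed range runs from expected - tolerance to expected + tolerance.
low = expected - tolerance    # about 0.37
high = expected + tolerance   # about 0.43

# Hypothetical tally: 41 of 100 votes went to the candidate.
actual = 41 / 100
print(low <= actual and actual <= high)   # True

# Hypothetical tallies that fall outside the range.
print(45 / 100 > high)   # True: above the range
print(30 / 100 < low)    # True: below the range
```

Because decimals are stored approximately, a value that sits exactly on a boundary (such as 0.37 here) may not compare the way you expect, so it is safer to pick test values that are clearly inside or outside the range.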

Hint: This is an excellent problem on which to practice planning. We're not having you develop the plan explicitly, but we strongly suggest that you develop one on paper.

This task does a statistical, computational spot-check based on expected results. Another option is to have a human-led spot-check on election results.

Task 9: One of the electronic voting best-practices that VerifiedVoting recommends is a Risk-Limiting Audit (RLA). In an RLA, people take a random sample (subset) of the votes that were cast and count the votes. They check this work against the answer a computer returned on this sample. If the people get a different answer than the computer, that indicates an error with the computer-based counting that is cheaper to detect than a manual recount. Rhode Island conducted a statewide RLA in June 2020!

When doing an RLA, care must be taken that the random sample is statistically sound. We'll explore the idea of taking a sample here.

Write the following two functions, which try to sample one third of a given list:

sample_third that takes in a list of Booleans (yes/no votes) and returns the first 1/3 of the list. Hint: there are a few ways to approach the problem, but list slicing is probably the most straightforward. Slicing is described in more detail for strings, but the same principles apply for lists (using item indices instead of character positions). Slice indices need to be ints: if you get an error related to this, you can convert a number to an int using int(), e.g. int(6/3) will be 2.
sample_every_3 that takes in a list of Booleans and returns every 3rd element of the list, starting with index 0 (so for a 9-element list, this function would return the elements at indices 0, 3, and 6). Hint: you might want to define a variable that helps you keep track of the position in the list as you're looping through it. To check if a number is divisible by another number, you can take the remainder using the % (modulo) operator: 8 % 4 is 0 and 9 % 4 is 1.

You can assume that the length of the input lists is a multiple of 3.
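The building blocks mentioned in the two hints can be tried out on a small made-up list. This sketch only demonstrates slicing, int(), the % operator, and a position variable; the list letters and the divisor 4 are chosen for illustration and are not the ones the tasks need.

```python
letters = ["a", "b", "c", "d", "e", "f"]

# Slicing: items from the start index up to (not including) the end index.
print(letters[0:2])   # ['a', 'b']
print(letters[2:])    # ['c', 'd', 'e', 'f']

# Division with / produces a float, which cannot be used as a slice index.
print(6 / 3)                    # 2.0
print(int(6 / 3))               # 2
print(letters[0:int(6 / 3)])    # ['a', 'b']

# The % operator gives the remainder of a division.
print(8 % 4)   # 0
print(9 % 4)   # 1

# Tracking the position while looping: update a counter on every iteration.
position = 0
for letter in letters:
    print(position, letter, position % 4)
    position = position + 1
```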

Task 10: When testing sample_third and sample_every_3, one thing we might check is that the samples are representative of the population, that is, the fraction of yes-votes is approximately the same in the input list and the result. In class, we learned that pytest is fairly powerful in that it allows us to run any Boolean expression as a test. Come up with a Boolean expression that evaluates to true if and only if the difference between the fraction of yes-votes (True values) in the input list and the fraction of yes-votes in the output list is less than 0.1. Use this expression in the following test functions:

test_sample_third_representative, which tests the expression for some input list on the sample_third function. You will have to play around with input lists here to make sure your tests pass. In a comment at the bottom of this function, give an example of an input list where the test did not pass.
test_sample_every_3_representative, which does the same thing for sample_every_3. You might have to use a different input list to make this test pass. In a comment at the bottom of this function, give an example of an input list where the test did not pass.
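As a reminder of how an arbitrary Boolean expression can serve as a test, here is a sketch that checks whether two numbers are close to each other. The fractions are hard-coded, made-up values; computing the real fractions from your lists is up to you. It assumes the built-in abs function, which returns the absolute value of a number.

```python
# Hypothetical fractions, chosen only for illustration.
fraction_a = 0.50
fraction_b = 0.45


def test_fractions_close():
    # Passes: the two values differ by about 0.05.
    assert abs(fraction_a - fraction_b) < 0.1
    # The expression is False for values that are far apart, so we negate it here.
    assert not (abs(0.9 - 0.5) < 0.1)
```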

Task 11: Often, votes come in district-by-district, so voting results will start with all of the votes from district A, then all of the votes from district B, and so on. In a comment under your code for Task 9, answer these questions:

Which of the two functions, sample_third or sample_every_3, would you choose to help ensure a fair sample?
What advantages does random sampling have over either the sample_third method or the sample_every_3 method?
Besides statistical soundness of sampling, what other technical challenges do you think RLA poses?

Part 3: Electronic voting access

In some of the questions above, you’ve thought about how to write code that might increase trust in electronic voting systems through data cleaning, statistical checks, and testing. However, this is not the whole story. Please submit your answers to the following questions in a pdf called hw6_src.pdf.

Task 12: Read the executive summary from this report by Brookings about broadband access.

Task 13: After reading the article, do you think that the issues VerifiedVoting.org focuses on are sufficient to address concerns about access? Why or why not?

Task 14: In what ways might this inequity in broadband access compound existing inequalities? Give one or two specific examples.

Part 4: Using dictionaries

In class, we learned about a new data type (the dictionary) that optimizes lookup operations. Let's rewrite our code from Part 1 so that it uses dictionaries instead. The keys of the dictionary will be names of candidates, and the values will be the number of votes each candidate got. For example, if "bumblebee" got 10 votes and "honeybee" got 13 votes, the dictionary would look like {"bumblebee": 10, "honeybee": 13}.
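A quick sketch of looking things up in the example dictionary above (the variable name tally is only for illustration):

```python
tally = {"bumblebee": 10, "honeybee": 13}

# Look up the value stored under a key.
print(tally["honeybee"])     # 13

# Check whether a key is present. Keys are matched exactly, including case.
print("bumblebee" in tally)  # True
print("Bumblebee" in tally)  # False
print("wasp" in tally)       # False

# Looking up a key that is not present, e.g. tally["wasp"], raises a KeyError.
```

Neither the lookup nor the in check needs a loop over the dictionary.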

For testing the functions below, it might help to create some example dictionaries as global variables in test_hw6.py, so that you can reference them in the testing functions without redefining them.

Task 15: In hw6_code.py, write a function called dict_any_votes_for that takes in a name and a dictionary of votes and returns a boolean indicating whether any of the votes were for the given name (case sensitive – clarified on 11/16/24 at 4:42 pm. Our autograder tests will not check for variations on case). Your solution should not use a for loop.

Task 16: In hw6_code.py, write a function dict_count_votes_for that takes in a name and a dictionary of votes. It should return the number of votes that were for the given name (case sensitive, that is, an exact match). Your solution should not use a for loop.

Task 17: In hw6_code.py, write a function called votes_list_to_dict that takes in a list of votes (like those from Part 1) and returns a dictionary of those votes. The conversion should be case-insensitive and turn all of the candidates into lowercase strings. Your solution should use a for loop.

Hint

You will have to handle both adding a candidate who isn't yet in the dictionary, and incrementing a vote for a candidate who appears multiple times in the list.
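The two dictionary operations the hint refers to look like this in isolation. This is a sketch on a made-up dictionary of letter counts; deciding which operation to apply for each vote is the part left to you.

```python
counts = {"b": 1}

# Adding a key that isn't in the dictionary yet.
counts["u"] = 1

# Incrementing the value of a key that is already there.
counts["b"] = counts["b"] + 1

print(counts)   # {'b': 2, 'u': 1}
```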

Check Block (Autograder Compatibility)

Make sure that your inputs for each function are in the order that they are mentioned in the directions.
Paste the following code at the end of your hw6_code.py file and run it. If "All required functions exist!" prints, you're good to go! If "At least one of the required functions is missing or named incorrectly" prints, you are either missing a function or have named one incorrectly.

try:
    assert "name_matches" in dir()
    assert "any_votes_for" in dir()
    assert "count_votes_for" in dir()
    assert "got_more_votes" in dir()
    assert "clean_votes" in dir()
    assert "check_percent" in dir()
    assert "sample_third" in dir()
    assert "sample_every_3" in dir()
    assert "dict_any_votes_for" in dir()
    assert "dict_count_votes_for" in dir()
    assert "votes_list_to_dict" in dir()
    print("All required functions exist!")
except AssertionError:
    print("At least one of the required functions is missing or named incorrectly")

Double Check You Have Completed All Tasks!

Remember to submit all of your work on Gradescope! The files you should be submitting are:

hw6_code.py

Tasks to be submitted in hw6_code.py:

Part 1

Task 1: name_matches function
Task 2: any_votes_for function
Task 3: count_votes_for function
Task 4: got_more_votes function

Part 2

Task 6: clean_votes function
Task 8: check_percent function
Task 9: sample_third and sample_every_3 functions
Task 11: comment with answers

Part 4

Task 15: dict_any_votes_for function
Task 16: dict_count_votes_for function
Task 17: votes_list_to_dict function

test_hw6.py

Tasks to be submitted in test_hw6.py:

Part 1

Task 1: test_name_matches function
Task 2: test_any_votes_for function
Task 3: test_count_votes_for function
Task 4: test_got_more_votes function

Part 2

Task 7: test_clean_votes function
Task 8: test_check_percent function
Task 9: test_sample_third and test_sample_every_3 functions
Task 10: test_sample_third_representative and test_sample_every_3_representative functions

Part 4

Task 15: test_dict_any_votes_for function
Task 16: test_dict_count_votes_for function
Task 17: test_votes_list_to_dict function

hw6_src.pdf

Tasks to be submitted in hw6_src.pdf:

Part 3

Answers to tasks 13 and 14

Theme Song

The Bees Go Buzzing

Brown University CSCI 0111 (Fall 2024)