---
title: 'CS 35L Review'
---
CS 35L W24 Week 9 Review
===
Topics Covered: Python, Regex, Git
## Python
### Why Python?
Python recently has become extremely popular and for good reason. A few of these being:
- Large community, maintains and creates new libraries
- Particularly useful in Machine Learning and Data Science
- numpy, pandas, PyTorch, statsmodels, etc.
- Built-in package manager `pip`
- Rapid prototyping in development
- Simple language, high level constructs allow programmer to focus on abstractions
- Interpreted language -- cross platform
### Basic Data Structures
Python has a few data structures out of the box which cover most functionality you really need. These include `list`, `set`, `tuple`, `dict`, `str` and any others you might need once in a while (*cough cough leetcode*) can be found in the [collections](https://docs.python.org/3/library/collections.html) library. You should also know how to write your own class. Python data structures are also heterogeneous, meaning they don't need to have the same type for all elements. Only restrictions are for hashed structures (`dict`, `set`) the keys need to be hashable (no lists as dictionary keys).
**List examples:**
```python=
# list creation (try some of these out)
x = list(range(10))
y = [i ** 2 for i in x] # list comprehension
z = [0] * len(x)
# let's merge x and y into a sorted list
s = []
i, j = 0, 0
while i < len(x) and j < len(y):
if x[i] <= y[j]:
s.append(x[i])
i += 1
else:
s.append(y[j])
j += 1
s += x[i:]
s += y[j:]
# what if we want to loop over a list
for element in x:
print(element)
for idx, element in enumerate(x):
print(idx, element)
```
Notes:
- Python lists are similar to C++ vectors
- This means easy append (push_back) in O(1) time, but inserting at the front is **very** costly
- Slicing creates a copy, not a reference
- Indexing with negative numbers goes from the end of the list
- `arr[-1]` is last element, `arr[-n:]` is last `n` elements
- `enumerate()` returns an iterable of 2-tuples which we unpack
**Set examples:**
```python=
# check if two elements of arr sum to target
# return the first two you see
def two_sum(arr, target):
seen = set()
for x in arr:
if target - x in seen:
return target - x, x
else:
seen.add(x)
return None, None
x = [1, 2, 2, 3, 4, 4, 7]
print(set(x))
```
Notes:
- Checking inclusion is O(1) with `in`
- Uses a hash function under the hood
- Better than dictionary due to less memory usage
- Sets can also be created using `seen = {}`
- Python uses [duck typing](https://en.wikipedia.org/wiki/Duck_typing)
- So your variable will be in dictionary-set quantum superposition until you call a dictionary/set specific function on it
**Tuple examples:**
```python=
# there isn't much to show here ngl
x = (1, "hello", True)
num, string, boolean = x # unpacking x into some variables
y = [1, 2, 3, 4, 5]
z = tuple(y) # what does this do
a, b, *c = z # what does c contain
print(a, b, c)
```
Notes:
- tuples are just immutable lists
- there's nothing to do with them other than pass them around as data
- they do use less memory so are generally more performant than lists if you want immutability
- can be sliced like lists
**Dictionary examples:**
```python=
d = {
'a': list(range(5, 0, -1)),
1: True,
'c': {
'z': list(range(10))
}
}
y = {
'a': 5,
'b': 10
'c': 15
}
# iterate over key-value pairs
# if just want keys, use .keys() instead of .items()
# likewise for .values()
for key, value in d.items():
print(f"{key}: {value}")
# dictionary comprehension
less_than_12 = {i: j for i, j in y.items() if j < 12}
y['d'] = 20
y.get('d', 0) # why use this and not y['d']
y['y'] = y # is this legal
```
Notes:
- if key `x` does not exist in dictionary, `dict[x]` will throw an exception
- keys should be hashable
- if you are doing linked list traversal in leetcode and want to mess with it you can hash the reference using `id()` which returns a string
- this is not good practice whatsoever
- since Python3.7, dictionary insertion is **ordered** by default
**String examples:**
```python=
x = 'abc'
y = "def"
z = """why would you do this
to yourself"""
print(z)
x = x + "'d'"
y = "q" + y
print(x)
sentence = "I love Python"
print(sentence.split())
```
Notes:
- Just like tuple, strings are immutable!
- Can also be sliced just like lists
- Iterated over as well
- Individual elements are also strings, not chars
- char type doesn't exist in Python
### Object-Oriented Programming
Python is more or less object oriented, and some programmers prefer to use it as such. (I'm not one of them, Java is objectively better if you run into a situation where you need OOP)
So let's write some basic object oriented code
```python=
#!/usr/bin/python3
from abc import ABC, abstractmethod
class Shape(ABC):
@abstractmethod
def get_name(self):
raise NotImplementedError()
class Polygon(Shape):
def __init__(self, name, number_sides):
self.name = name
self.number_sides = number_sides
super().__init__()
def get_name(self):
return self.name
def get_num_sides(self):
return self.number_sides
class Square(Polygon):
def __init__(self, side_length):
self.side_length = side_length
super().__init__("Square", 4)
def area(self):
return self.side_length ** 2
def perimeter(self):
return 4 * self.side_length
if __name__ == '__main__':
# shape = Shape()
poly = Polygon("Square", 4)
print(f"This polygon is a {poly.get_name()}")
square = Square()
print(f"I am a {square.get_name()}")
print(f"I have {square.get_num_sides()} sides")
```
Let's break this one down.
- `Shape` is an abstract class (ABC means Abstract Base Class)
- This is kind of like an Java/C# interface, but instead of `implements` you need to inherit from it
- `@abstractmethod` is a decorator that indicates that the method needs to be implemented in the inheriting class
- Try creating an instance of `Shape`. What happens?
- `Polygon` then inherits from `Shape` and implements the `get_name()` method.
- You can create instances of Polygon
- `Square` inherits from `Polygon` and adds some more functionality, as well as removing the need to pass in a name and number of sides, since a square has fixed values for those!
- See why OOP is used?
- You should know how to write your own class for the exam, if it asks you to.
Random Python quirks just in case Eggert tests you on them:
- `is` checks for reference equality, not value
- Frequently used like `if variable is None`
- functions pass parameters by "assignment"
- more or less pass integers by value, data structures by reference
- [here are some people on stackoverflow arguing with each other about it](https://stackoverflow.com/questions/63417488/passing-by-assignment-python)
- CPython (most common Python interpreter) garbage collection (freeing up memory) is done by [reference counting](https://devguide.python.org/internals/garbage-collector/index.html)
- sometimes Eggert 131 concepts like to bleed into 35L
- think to yourself why Python doesn't have a function like `free()` in C or `delete` in C++
- `range()`, `enumerate()`, `zip()` in Python are implemented similar to [generators](https://realpython.com/introduction-to-python-generators/)
- you don't need to know how to make your own necessarily, but it's good to know how they work
- they take advantage of **lazy evaluation**
- they create the next value in your iteration on demand, rather than all at once in the beginning
- when are these useful? why is `range()` implemented like this by default?
- imagine a scenario where you want an index-based loop over a list of one million elements...
```python=
x = range(4)
print(x) # this should look nothing like [0, 1, 2, 3]
y = iter(x)
i = None
while i != x.stop:
i = y.__next__() # generate the next value dynamically
print(i)
```
- Python classes support multiple inheritance
- What if a method is overridden in both parent classes?
- The first one inherited from is what gets called
- Think about why this is awful (and why it's disallowed in Java)
- **[NOT NEEDED FOR THIS CLASS]**: Python does some *weird* things with objects.
- In Python, classes themselves are also objects of type `type`
- You can dynamically create classes
- Which themselves can dynamically create objects of that type
- [Read more about this mess here](https://docs.python.org/3/reference/datamodel.html#metaclasses)
## Regular Expressions
These are generally used for searching for patterns in text. The implementation is done through something called an NFA (Nondeterministic finite automata). In 35L you just need to know how to use them at a basic level within shell scripts and Python scripts. For more information on how to use these in Python take a look at [this reference](https://docs.python.org/3/library/re.html).
For now, we will focus on just how to read and create regular expressions.
**Example 1:**
Let's try matching some simple words. Let's say we want to match string that start with between 3 and 5 of the letters `a-f` or `i-k` and end with some unlimited amount of `o`'s (could be infinite or zero).
```bash=
REGEXP='[a-fi-k]{3,5}o*'
```
What does each segment do?
- `a-f` and `i-k` look for characters in those ranges
- `{3,5}` says preceding between 3 and 5 times, inclusive
- `o*` says zero or more `o`
- How can we make the `+` operator from `*`?
- `x+` <-> `xx*`
- Then we can probably safely assume that `+` is extended regex and `*` is basic, so if we do not enable that `-E` flag on `grep` we would need to escape `+`
**Example 2:**
Let's write a regular expression that will match simple email addresses. These can be in the form `[series of characters]@[domain].[domain extension]`.
```bash=
echo 'pserafimescu9@ucla.edu' | grep -E '^[^@]+@[^@]+\.[^@]+$'
```
What does each component of the regular expression do?
- `^` denotes the start of the line
- `[^@]` is a character set that includes anything except `@`
- `+` means one or more of preceding
- `@` matches literal `@`
- `\.` matches literal `.` (why do we need to escape it?)
- `$` matches end of the line
What if we remove the `^` and `$` at the start and end of the line, respectively?
**Example 3:**
```bash=
REGEXP='//[^\r\n]*[\r\n]'
```
What does this do?
- `[^\r\n]` matches any character except newline or carriage return
- `*` matches zero or more of previous
- `[\r\n]` matches newline character or carriage return
- `//` matches literal string `//`
This looks for C or Java style single line comments
**Example 4:**
```bash=
REGEXP='^[0-9]+(\.[0-9]{2})?'
```
What about this one?
- `^` matches start of the line
- `[0-9]` matches a numeric digit
- `+` means one or more of preceding
- `\.` is literal `.`
- `{2}` means exactly two of preceding
- `?` means zero or one of preceding
Notice the lack of `$` at the end. This will match either an integer portion of a decimal if it has one or zero decimal places, or a decimal up to the second if it has two or more places.
## Git
There's not a lot to cover here beyond what you've done in assignments 4 and 5. I'll leave out the internals since you're all probably sick of those (and for final exam preparation assignment 5 is more than enough).
### Some basic questions to know:
**Q**: What's the difference between rebase and merge?
**A**: Rebase will create a linear git commit history. Imagine this as taking the diverging branch and adding it to the end of the branch you are merging into, rather than creating a merge commit and maintaining a nonlinear history. Think about why you might want one versus the other.

**Q**: Why use Git?
**A**: Git is decentralized, open-source, keeps track of development history, and difficult to corrupt (SHA-1 hash algorithm). Using Git makes distributed development easier and provides fallbacks in case of engineer mistakes or buggy releases. You can also integrate CI/CD (Continuous Integration and Delivery) pipelines into Git easily, allowing you to test each deployment.
**Q**: What are some commands I should know?
**A**: Print out a cheat sheet. But those I would make sure you know are:
- `git add`
- `git commit`
- `git log`
- `git push`
- `git pull` (and `git merge` + `git fetch`)
- `git bisect` (?)
- `git reset`
- `git reset --soft` will reset to given commit, but leaves all changes staged
- `git reset --hard` will discard all changes and reset
- `git reset --mixed` (default option) will unstage and reset, but keep working directory changes
- `git branch`
- `git checkout`
- and any others you have used in assignments
**Q**: How do you fix a merge conflict?
**A**: First edit the files Git has flagged, then run `git add` followed by `git commit` and Git will remember you are in the process of resolving a merge conflict.
### For the Final
Review your two Git assignments. Those are fair game. And print out a cheatsheet of what Git commands do. Understand what Git is and **why we use it in software development**.
## Exam FAQs and Tips
- How do I best prepare?
- In my 100% honest opinion (not representative of course staff), there are two ways to prepare for this exam.
- Print out anything and [everything](https://github.com/vinlin24/cs35l-notebooks)
- Regex, Emacs, Python reference docs, etc.
- Credit btw to Vincent Lin
- Don't stress and study too much
- The exam is *hard* and there is no way to be completely prepared without printing out and reviewing a LOT of stuff
- Review homework and lecture concepts (not details) to build intuition, and use that intuition to get as many points on each question as possible
- Pick whichever one of these best suites what you aim to get from this course and exam. A grade doesn't define you as a student or Computer Scientist!
- Professor Eggert has stated that Python will have **increased** focus in CS 35L
- Expect a little less Lisp then, but still be prepared with Emacs cheat-sheet and basic Lisp scripting knowledge
- Python questions tend to be around design principles and (somewhat niche) data structure behavior
- Print out or take notes on these heavily