# More trees, introduction to objects
## Table of Contents
0. [Comments on the drill](#comments)
1. [What else is tree-shaped?](#whatelse)
2. [Example 3](#ex3)
3. [Introduction to Objects](#objects)
4. [The `__init__` method](#init)
## Comments on the drill <A name="comments"></A>
Recall this example from Monday's class:

In this visualization of the tree, every `HTMLTree` object is a _node_, and the edges between nodes represent the contents of the list in each node's _children_ field. A _leaf_ of the tree is a node without any children (here, the two text nodes).
It's both common and reasonable to ask: "Why did you name the dataclass `HTMLTree`, when it only contains the data for one node of the tree?" It's true that other names would have worked. We chose to include the word "`Tree`" in the name to reinforce the idea of delegating work for each child _tree_ to a new computation for each child _object_.
## What else is tree-shaped? <A name="whatelse"></A>
Tree-shaped data appears in more than just HTML documents. In fact, you've been working with tree-shaped data all semester without thinking much about it.
What else, besides HTML, is tree-shaped?
<details>
<summary><B>Think, then click!</B></summary>
Elm trees, oak trees, ...
Ok, no, seriously: here are three tree-shaped formats you may be familiar with:
* algebraic expressions;
* Python code; and
* (with some caveats) the English language.
Expressions you might enter into your calculator are all tree-shaped. Consider something like $(1+2)*(17/4)$. The operators are intermediate nodes in the tree, and the numbers are the leaf nodes at the bottom. The same goes for more complicated algebraic expressions.
Every time you run a Python program, the `python3` executable first turns your program text into something called an _abstract syntax tree_, which it then tries to optimize and run. You can look at any of the programs you've written, and see if you can break it up into a tree: functions and classes at the top level, etc. In fact, Python's indentation reinforces the idea of the syntax tree.
If you've ever diagrammed a sentence in English class, you've done manual computation over tree-shaped data.
</details>
## Example 3 <A name="ex3"></A>
Let's build one last function on trees
#### Challenge
Write a function `replace_text` that:
* accepts as input: an `HTMLTree`, a text string to search for, and a replacement string; and
* _changes_ the input tree so that instances of the search string have been replaced with the replacement string.
This function is a little bit more complicated than those we wrote on Wednesday, so let's definitely write some example tests first. And while we do so, it's worth asking: what might somebody writing this function get _wrong_?
<details>
<summary><B>Think, then click!</B></summary>
Here are some starter tests I came up with:
```python
def test_replace_text():
# Wrapping long lines outside parens:
# backslash (watch out for any blank space after the backslash)
# don't replace in tags
assert replace_text(parse('<html></html>'), 'html', 'foo') == \
parse('<html></html>')
# replace at nested depths
assert replace_text(parse('<html><p>hello</p><p><strong>world</strong></p></html>'), 'hello', 'greetings') == \
parse('<html><p>greetings</p><p><strong>world</strong></p></html>')
assert replace_text(parse('<html><p>greetings</p><p><strong>world</strong></p></html>'), 'world', 'fellow students') == \
parse('<html><p>greetings</p><p><strong>fellow students</strong></p></html>')
# replace partial words
# (you don't NEED the backslash, but I find having it on one line is less readable)
assert replace_text(parse('<html>chocolate cake</html>'), 'chocolate', 'dog-safe') == parse('<html>dog-safe cake</html>')
```
</details>
We still can start in the same way as before: if we only had _one_ node in the tree, what would the function look like?
Probably something like this:
```python
def replace_text(tree: HTMLTree, find: str, replace: str):
if tree.tag == 'text':
if tree.text == find:
tree.text = replace
```
But that's not entirely right. One of the above tests is failing (why?)
<details>
<summary><B>Think, then click!</B></summary>
That function would only work if the entire text field were the search string. It fails on our "chocolate cake" to "dog-safe cake" example. Instead, let's use a built-in Python function:
```python
def replace_text(tree: HTMLTree, find: str, replace: str):
if tree.tag == 'text':
tree.text = tree.text.replace(find, replace)
```
</details>
Where should we go from here? Just like before, we need to handle the children somehow. And since we don't know how deep the tree will go, we'll use recursion.
```python
def replace_text(tree: HTMLTree, find: str, replace: str):
for child in tree.children:
replace_text(child, find, replace)
if tree.tag == "text":
tree.text = tree.text.replace(find, replace)
```
What does this function return?
<details>
<summary><B>Think, then click!</B></summary>
Just `None`. We've built this function to modify the existing tree, rather than manufacturing any new objects. We can do this because `HTMLTree` isn't `frozen`, but as a result we won't be able to use `HTMLTree` as a key in sets or dictionaries.
</details>
#### I wonder...
How would we change this function so that, instead of _changing_ the input tree, it returned an entirely new tree structure with the needed changes?
## Introduction to Objects <A name="objects"></A>
Imagine you’re a DJ at a radio station. You only play songs that your listeners call in and request. In addition, every thousandth listener who calls in gets a prize! You want to keep track of the queue of songs you've been asked to play, as well as enough information to give out prizes.
To do this, we need a list of songs and a counter that says how many callers we've had so far. (Why can't we get by with just the list?)
We could implement this with a custom data type like this:
```python
@dataclass
class DJData:
num_callers: int
queue: list
```
We can implement a function to update our data and to figure out what we’re going to say to a listener:
```python
def request(data: DJData, caller: str, song: str) -> str:
data.queue.append(song)
data.num_callers += 1
if data.num_callers % 1000 == 0:
return "Congrats, " + caller + "! You get a prize!"
else:
return "Cool, " + caller
```
So here we’ve got a datatype and a function that reads and modifies that datatype’s contents. We can see how it works:
```
the_dj = DJData(0, [])
request(the_dj, "Tim", "French Fries w/Pepper")
"Cool, Tim"
```
We could have written this slightly differently:
```
@dataclass
class DJData:
num_callers: int
queue: list
def request(self, caller: str, song: str) -> str:
self.queue.append(song)
self.num_callers += 1
if self.num_callers % 1000 == 0:
return "Congrats, " + caller + "! You get a prize!"
else:
return "Cool, " + caller
```
To do this, we put the request function inside the definition of `DJData`. We’ve also modified the method a bit: instead of taking a data argument, we’ve called the argument `self` and left off the type annotation. This function is now a _method_ on the DJData class. Which means we can call it like we've been calling methods of other things like lists or dictionaries:
```
the_dj = DJData(0, [])
the_dj.request("Tim", "Rope on Fire")
"Cool, Tim"
```
We call methods by writing the name of an object (`the_dj`, in this case), then a dot, then the method arguments, excluding `self`. Since we’re not passing `self` in, how does Python know which object to call the method on?
We’ll keep learning about classes, objects and methods. I want to re-emphasize, though, that you’ve seen this before. We’ve called methods on lists, sets, etc. For instance, `l.append(2)`. What we’re seeing now is how to add methods to our custom objects!
## The `__init__` method <A name="init"></A>
Up until now we’ve been using the dataclasses library to make our custom datatypes work more like Pyret's. But from this point on, we will generally not use it unless we need to, so that we can see how Python’s objects work.
One of the features that `@dataclass` gives us is easy initialization: e.g., because `HTMLTree` is a dataclass, we can easily create `HTMLTree` objects by writing `HTMLTree(...)` where the `...` gives a tag, a child list, and an optional text field. The `HTMLField` function is called a _constructor_ or an _initializer_.
Non-dataclass objects need you to define their constructor yourself! To do this, we’ll define `__init__()` methods to initialize data on objects:
```python
class DJData:
def __init__(self):
self.queue = []
self.num_callers = 0
def request(self, caller: str, song: str) -> str:
self.queue.append(song)
self.num_callers += 1
if self.num_callers % 1000 == 0:
return "Congrats, " + caller + "! You get a prize!"
else:
return "Cool, " + caller
```
Python calls the `__init__` method in order to initialize an object’s fields when it is created. We can construct instances of the DJData class like this:
```python
> the_dj = DJData()
> the_dj.request("Tim", "Paper Bag")
"Cool, Tim"
```
Note that this gives us a bit more control over how fields are initialized. We could do anything we'd like inside `__init()__`: print things, run algorithms automatically, send email messages, etc.