**[&laquo; Back to the main CSCI1680 website](https://cs.brown.edu/courses/csci1680/f23/)**

# Final project :email:

<!--
:::info
:warning: **Note**: This is a **pre-release** version of the project handout, designed to give you an overview of the project scope and to help you start thinking about topics. Until the assignment is released, details and requirements are subject to change--please refresh this page for updates before posting on Ed to ask for clarifications.
:::
-->

<!--
## Introduction

In this project, you will have an opportunity to use these skills by further exploring some component of networking that interests you---whether it's exploring something we've already covered more deeply, or investigating something new. At the end of this document, we have provided a list of potential project ideas with resources to get started. You can either pick one of these ideas, or use your own.

This is an open-ended project, and is intended to be lighter than IP or TCP. We have about two weeks left of the semester: in that time, you will propose a brief project idea, and spend about a week working on it. After that, you will submit your code and a brief writeup about your work. Our hope is to create something that's fun and interesting, but doesn't create a lot of stress.
-->

## Overview

We have reached the end of the course, congratulations! Take a deep breath. You've made it. :grin:

In this course, we have discussed some of the core protocols and concepts that power the Internet. Yet, there are many topics we have not had time to cover. Though some of the core protocols will be around forever, networking is a fast-moving field of CS, with new protocols, new ways to build applications, and new performance and security concerns evolving every day. Thus, our goal is to give you the tools you need to tackle new networking challenges you encounter.
In this project, you will have an opportunity to implement a protocol or concept we have discussed in class but did not get to build in our other projects. Examples could include building a client/server for a protocol we've discussed, or implementing a networked application in some interesting way. Your project could also extend a project we have already completed, so long as you propose a significant-enough extension.

See the [Sample topics](#Sample-project-topics) section for a list of possible topics. You are welcome to use any of these, modify them, or suggest your own! Any ideas you suggest do not need to fit into these two categories---you can work on any topic you want, so long as we approve your idea.

## Logistics and Timeline

### Teams

You `SHOULD` work on the project in a team of 2. You `MAY` keep the same team as for IP/TCP, or you may form a new team. Working solo is permitted, but we don't recommend it unless you have extenuating circumstances or a very narrow project topic (ask Nick if you're unsure). If you worked as a group of 3 for IP/TCP, or if you had permission to work solo, you may continue to do so for this project.

:::warning
:warning: **Note**: Regardless of your team situation, **you `MUST` fill out the [team preference form](https://forms.gle/3rVghQRL5kdHNV5J8)** to register your team (even if it's a team of size 1), or ask to be matched to a team, by **Wednesday, November 29 at 5pm EST.** All team members must fill out the form--only mutual requests will be honored. If you do not submit the form on time, we may not be able to match you to a team, requiring you to work solo.
:::

### Timeline

Your project has two deadlines:

- A brief **project proposal** due by **Friday, December 1 at 11:59pm**. **No late days** may be used on this part, since we need to review your proposals and provide feedback.
- Your final submission, including your implementation and a brief writeup, is due by **Thursday, December 14 at 11:59pm EST**.

### Repository

Once your team has been formed, you will receive a GitHub Classroom link to create a repository. This repository is completely blank. Since this is an open-ended project, there is no starter code or reference implementation---this repository is just a place to keep your work and collaborate!

### Languages

You can work on the project using any language(s) you want---whatever you think will help you accomplish the project most easily. You are NOT restricted to Go/C/C++/Rust: Python or other scripting languages are fine.

As with previous projects, you may also use any software libraries to help, so long as they do not trivialize the project you have proposed. For example, if your project is to build a DNS resolver, it's fine to use a library to build/parse DNS packets, so long as you write the actual logic to decide what to query and interpret the responses yourself.

<!-- The sample projects have links to some software libraries for common languages that you may find useful---you are welcome to use these, or find other libraries for other languages you might prefer. -->

## Project proposal

To ensure your project has a suitable scope, you must write a short project proposal and submit it via Gradescope on or before **Friday, December 1 at 11:59pm EST**. **No late days may be used** on this part, since we need to review your work and provide feedback--any late submissions will incur a penalty.

Your proposal should be short (no more than 1--2 pages) and should include the following:

- An outline of the project you want to implement--what do you want to achieve?
- Any stretch goals you think may be difficult but nice to have
- Any tools, libraries, or language(s) you intend to use (doesn't need to be a final list)
- Any open questions you'd like us to help you answer

If you're not sure about all your project's details--that's okay!
Let us know what you'd like to learn and how we can help.

### Final implementation and writeup

When you are done, you will submit your work by pushing all code to your repository and submitting a final report and demo video that describe your overall results. The requirements for each part are described below.

**Writeup**: There is no official length requirement, but a reasonable estimate is on the order of 3--4 pages of text/figures. In general, your writeup should contain at least the following components:

- **Introduction**: What were your overall project goals? What (briefly) did you achieve?
- **Design/Implementation**: What did you build, and how does it work? For this part, give an overview of the major components of your system design and how they work, similar to what you might write in a README.
- **Discussion/Results**: Describe any results you have, what you have learned, and any challenges you faced along the way. For this part, please include any relevant logs/screenshots of your program operating (and/or reference your demo video).
- **Conclusions/Future work**: Overall, what have you learned? How did you feel about this project overall? If you could keep working on this project, what would you do next?

**Demo video**: In addition to your code and writeup, your final submission should include a *short* (no more than 5 minutes) demo video to demonstrate your work. This can be as simple as a screen recording while you run your code, or a more involved presentation where you also describe your project and how it works. Basically, this is just a way to supplement your writeup in a video format--we will look at both when grading. If your video is too large to upload to your repo, please upload it to Google Drive and include a shareable link in your document.

### Final Deadline

Your final submission (code, writeup, demo video) is due by **Thursday, December 14 at 11:59pm EST**.
:::warning
**Warning**: If this deadline would be problematic for you, please contact the instructor ASAP so that we can make a plan together. **Late days may not be used without prior permission (or an excused extension)**, since this deadline approaches the official University grading deadline. Depending on your individual circumstances (graduation date, final grade logistics), some extensions may not be possible. If you have concerns, you should contact Nick sooner rather than later!
:::

<!--
## What "open-ended" means

Once again, this project is meant to be a chance for you to get started exploring some more network concepts in the very short time we have left in the semester. You are **not** being asked or expected to build a fully-fledged system or research project: just pick an idea, spend about a week on it---with a number of hours that you'd consider reasonable for a single class---and submit your work. If you envision your project as having steps 1--5, and step 1 takes you a week, **that's okay**: just submit what you have and tell me what you did and what you learned.

So take this time to explore something that interests you, and have fun!
-->

# Sample project topics

The following pages contain some sample project ideas. These are meant to be a starting point to think about your own project--you can use one of these, or pick your own!

:::info
**Note**: We'll be adding some more resource links soon!
:::

<!--
Each project idea also lists a guess for a suitable development environment: due to the way in which certain network measurements may be performed, not all can fully run within the course container environment.

If you have questions about using any tools, development environments, etc, please feel free to ask us for help on Ed or during office hours. However, note that we have not tested these projects before, so we won't have all the answers---but we're happy to help you figure out how to approach the problem and point you in the right direction!
-->

## Simple HTTP server

Relevant lectures: Lectures 19--21

Implement a simple webserver to serve static pages from a directory using HTTP 1.0 or 1.1. A basic implementation would involve implementing the `GET` method to request pages. From there, you could consider adding support for generating content dynamically, uploading content via the `POST` method, or measuring your webserver's performance with a benchmarking tool like [wrk](https://github.com/wg/wrk).

## DNS resolver

Relevant lectures: Lectures 18--19

Implement a DNS resolver that can perform recursive and iterative queries (i.e., by starting with a root nameserver). Some extensions could include adding support for caching, or querying different record types (`A`, `AAAA`, `TXT`, ...).

<details><summary>Some implementation details </summary>

- You are not required to serialize DNS packets yourself. There are many good libraries that can do this for you. For Python, a good one is [`dnspython`](https://www.dnspython.org/).
- For testing recursive queries, https://public-dns.info/ curates a list of public DNS servers around the world you can query.
- When sending DNS queries, don't send a huge number of queries to the same server in rapid succession---otherwise you might get blocked! Instead, wait >=100ms in between queries.

</details>

## Build an application with RPCs, eg. a better Snowcast

:::info
**Note**: We'll talk more about RPCs and Web APIs in class on Tuesday, November 28. For now, here's some more background on what this means.
:::

Relevant lectures: Lecture 23

When we built Snowcast, you wrote code to manually compose messages in the Snowcast protocol format and send them over TCP sockets. This is a great exercise in implementing a protocol. However, modern applications often leverage frameworks to help build network APIs more quickly.
One such framework is [gRPC](https://grpc.io): users can define their API and message formats, and the gRPC framework automatically generates code for establishing connections, authentication, serializing messages, and more, in your language of choice.

To explore these tools, you could implement part of Snowcast (or some other application of your choice) in [gRPC](https://grpc.io), or some other framework that provides similar functionality. A good starting point might be to build a client and server that connect and exchange Snowcast's `Hello`/`Welcome` messages, and then continue with selecting stations and streaming data.

<details><summary>Some implementation details </summary>

- If you choose to implement Snowcast, note that you can modify the Snowcast protocol as much as you like--you don't need to stick to the same message formats, as long as your protocol achieves the same goals.
- Some parts of Snowcast per our specification may not map well onto gRPC--one example is streaming via UDP. For these cases, it's up to you to decide how to handle it. Is there something similar you could do with gRPC, or do you need to make your own custom solution using plain sockets? What are the tradeoffs? Whatever you decide, document your decisions in your writeup.

</details>

## HTTP API: ActivityPub

:::info
**Note**: We'll talk more about Web APIs in class on Tuesday, November 28.
:::

Relevant lectures: Lecture 23

Mastodon, the open-source, decentralized Twitter alternative, is built on the ActivityPub protocol ([Overview](https://en.wikipedia.org/wiki/ActivityPub), [Full Specification](https://www.w3.org/TR/2018/REC-activitypub-20180123/)), which is an HTTP API for exchanging messages between ActivityPub servers and clients. While ActivityPub has a lot of features, the mechanics essentially boil down to exchanging JSON messages via HTTP.

For this project, you could implement a basic ActivityPub client and server that support some basic methods.
To do this, you can use any web programming libraries you like to serve HTTP endpoints. A good starting point would be to implement some methods from ActivityPub's "Social API", which specifies communication between clients and the server (posting and fetching messages). As a stretch goal, you could consider parts of its "Federation API", for communicating between your own servers to build a larger social network.

<!--
From there, the protocol defines two "layers", a "Social API" for clients to read and post messages, and a "Federation Protocol" for servers to

The main specification provides two layers: a server to server protocol (the “Federation Protocol”) and a client to server protocol (the “Social API”). For this project, implement an ActivityPub conformant Client and an ActivityPub conformant Server, i.e. show us a client and server that can talk to each other with the protocol. This may seem daunting but note that once you implement one side, the other is fairly easy to create.

You can use any of the following resources to help you implement the protocol:

TODO: list the resources above and mention HTTP libraries

TODO: maybe include more specificity here because there is a lot of functionality a server/client can have while what’s needed to get a base implementation working isn’t a lot.
-->

## Build your own traffic analyzer

Wireshark and similar tools are great for viewing and analyzing network traffic, but you can also build your own custom packet analysis tool to answer very specific questions. Using a packet capture library like [scapy](https://scapy.net/) (Python), [pcap](https://pkg.go.dev/github.com/google/gopacket/pcap) (Go), or `libpcap` (C/C++), implement your own traffic analyzer that can either watch for packets on a live network interface or read a capture file, and perform some specific analyses on your own traffic.

What should you analyze? You decide!
Examples could include:

- Extracting files and images from HTTP (*not* HTTPS) traffic
- Logging your DNS traffic, and outputting a list of all domains queried from your system, how often you query them, etc.
- Measuring average latency of TCP connections, or drawing your own [TCP stream graphs](https://www.packetsafari.com/blog/2021/10/31/wireshark-tcp-graphs/) to examine congestion control performance

<!--
### Environment

You should be able to implement this using our course's container environment.

**Notes/Resources**

- [Wikipedia's page on HTTP](https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol) has a good overview of the basic elements of the protocol, and a list of RFCs. For a quick experiment to measure performance, you can probably get away with just implementing the `GET` command (similar to what was shown in lecture)
- [wrk](https://github.com/wg/wrk), an HTTP benchmarking tool
-->

<!--
## Measurement: Investigating Zoom traffic

As an old person, I don't understand how Zoom works---or, at least, in terms of how it uses the network. I want you to explain it to me. Capture some traffic while you use Zoom and report on what happens while using various features (hosting calls, joining calls, screen sharing, etc.). Can you tell what protocols are used? How much bandwidth does the call use? Does everything happen over one connection, or multiple? What IPs are involved, and where are they located? Does the video data get sent to Zoom's servers, or directly to other users on the connection? How does this differ from other videoconferencing applications (Hangouts, Messenger, FaceTime, etc.)?

You can start this by using Wireshark and looking at the traffic. For a more detailed analysis, one option is to write a script that parses a capture file and outputs some useful statistics.
Note: I don't expect you to understand everything about how Zoom works (and indeed, much of the traffic may be encrypted), but I'm quite curious what can be learned from a surface-level analysis!

### Environment

To capture traffic from Zoom calls, you would need to install Wireshark on your own machine, rather than inside the container. For more detailed analysis, you can save the data captured by Wireshark to a file (called a "capture file" or "pcap file") and process the data in your container environment (or anywhere else). Capture files are a standardized format that can be processed by various tools and libraries.

**Notes/Resources**

- Zoom has [a whitepaper](https://explore.zoom.us/docs/doc/Zoom Connection Process Whitepaper.pdf) about its connection process. You might consider comparing what you observe against this (or potentially other resources you find online) to see if you can replicate their results
- [`tshark`](https://www.wireshark.org/docs/man-pages/tshark.html), Wireshark's terminal-based counterpart, has some good options for generating summaries of capture files that might be useful
- [scapy](https://scapy.net/) and [PyPCAPKit](https://pypi.org/project/pypcapkit/) are Python libraries for parsing packet capture (PCAP) files

## Measurement: Investigating CDNs

In class, we discussed how CDNs use DNS to direct users to nearby servers. We saw this by querying a domain name from multiple DNS servers at different geographic locations. One way to explore this further is to query a domain from many vantage points and examine the IPs that are returned. For example, let's say you ask 100 DNS servers around the world to resolve `randomsite.com`. How many different IPs do you learn? Where are they located? Do they all belong to the same content provider?

To investigate this, you could write a script that takes in a domain name, queries it against a list of DNS servers around the world, and examines the results.
From here, you could potentially ask other questions of different domains: Do all the DNS responses use similar TTL values? Are the records signed with DNSSEC?

### Environment

You should be able to implement this from our course's container environment.

**Notes/Resources**

- <https://public-dns.info/> curates a list of public DNS servers around the world you can query.
- When sending DNS queries, don't send a huge number of queries to the same server in rapid succession---otherwise you might get blocked! Instead, wait >=100ms in between queries.
- You can map IP addresses to coarse physical locations using a GeoIP database. For example, you can do this in Python using [`python-geoip`](https://pythonhosted.org/python-geoip/), which reads an IP-to-location database stored on your system. You may need to install the database separately---this link should lead you to instructions.
- You can make DNS queries from a script by using a DNS library (a good Python one is [`dnspython`](https://www.dnspython.org/)), or you can simply run shell commands from a script and parse the output (fast, but can get ugly)
--->
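
To make the DNS-logging idea from the traffic analyzer topic more concrete, here is a minimal sketch of its counting step in plain Python. It assumes you have already pulled the raw UDP payloads of port 53 packets out of a capture (for example with one of the capture libraries mentioned above), and it only reads the first, uncompressed question name of each message. The helper names `parse_dns_question_name`, `build_query`, and `tally_queries` are made up for this illustration, and `build_query` exists only to produce sample input.

```python
import struct
from collections import Counter


def parse_dns_question_name(payload: bytes) -> str:
    """Return the first question name found in a raw DNS message."""
    if len(payload) < 12:
        raise ValueError("DNS header is 12 bytes; payload too short")
    # Header fields: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT (16-bit, big-endian).
    _id, _flags, qdcount, _an, _ns, _ar = struct.unpack("!6H", payload[:12])
    if qdcount == 0:
        raise ValueError("message has no question section")

    labels = []
    offset = 12  # the question section starts right after the header
    while True:
        if offset >= len(payload):
            raise ValueError("truncated question name")
        length = payload[offset]
        offset += 1
        if length == 0:  # a zero-length label terminates the name
            break
        if length & 0xC0:
            # Compression pointers (and reserved label types) are not handled here.
            raise ValueError("unsupported label type in question name")
        label = payload[offset:offset + length]
        if len(label) != length:
            raise ValueError("truncated label")
        labels.append(label.decode("ascii", errors="replace"))
        offset += length
    # DNS names are case-insensitive, so normalize before counting.
    return ".".join(labels).lower() or "."


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query (sample input for this sketch only)."""
    header = struct.pack("!6H", query_id, 0x0100, 1, 0, 0, 0)  # RD set, one question
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.split(".") if label
    ) + b"\x00"
    return header + qname + struct.pack("!2H", qtype, 1)  # QTYPE, QCLASS=IN


def tally_queries(payloads) -> Counter:
    """Count how often each domain appears, skipping payloads that fail to parse."""
    counts = Counter()
    for payload in payloads:
        try:
            counts[parse_dns_question_name(payload)] += 1
        except ValueError:
            continue
    return counts


if __name__ == "__main__":
    sample = [build_query("example.com"), build_query("Example.COM"),
              build_query("cs.brown.edu"), b"\x00"]
    for domain, count in tally_queries(sample).most_common():
        print(f"{count:4d}  {domain}")
```

In a real analyzer, the list passed to `tally_queries` would come from your capture library rather than from `build_query`, and you would likely also want to record timestamps or record types alongside each name.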