Open Data Day 2018
==================
[Registration](https://www.eventbrite.com/e/code-for-san-francisco-open-data-day-2018-tickets-42826162204)
[Code of Conduct](http://c4sf.me/code-of-conduct)
**Wifi:** MSGuest
**Wifi code:** microsoft06ib
This year, the global themes are: Open Research Data, Tracking Public Money Flows, Open Mapping, and Data for Equal Development, but we also encourage thinking about issues that are important to San Francisco.
## Open Data Day
Below are types of projects or activities that are good for Open Data Day.
1. **Exploratory data analysis.** Interested in understanding a dataset better? Dig into it using your favorite exploratory tools.
2. **Making maps.** Take existing open data and create a nice map that tells a story, helps people query data, or visualizes insights.
3. **Contributing open data.** Is there data locked in a PDF or document? Unlock it and turn it into data. Use that to encourage the release of reports as data.
4. **Contributing to the community.** Interested in moving open data forward? Have a conversation about what could be better, then capture and share those thoughts.
5. **And more.** If you can leverage open data, go for it! If you're starting something new, we recommend scoping today's work around discovery or prototyping.
## Schedule
- **8:30 AM** Registration and breakfast
- **9:00 AM** Opening presentation, keynote by [Sandra Zuniga](https://www.linkedin.com/in/sandra-zuniga-a64b732a/), [Fix-It Director](http://sfmayor.org/neighborhoods/fix-it-team), and lightning talks
- **9:50 AM** Existing project pitches, unconference logistics
- **11:00 AM** First unconference session, projects hacking
- **12:30 PM** Lunch
- **1:30 PM** Second unconference session, projects hacking
- **3:00 PM** Third unconference session, projects hacking
- **4:30 PM** Closing
- **5:00 PM** Optional (unsponsored) happy hour around the corner at [ThirstyBear](http://thirstybear.com/)
## Unconference Sessions
These will be filled out on the day. You can put ideas below under Challenges, Ideas, and Discussions.
* Notes from these sessions will be captured in [this Google Drive folder](https://drive.google.com/open?id=1LZFTHlj47fG_2RBP7FJc96xP0wLSIYT1)
* Session leaders should create a [copy of the notes template](https://drive.google.com/open?id=1bUaTu3uVJAkzjhWkD0rMun9IJMhlGv110FxdSQ3Vln8) in the same folder and rename it after their session
* Please assign a scribe to record notes from your session
Once the agenda is set, we ask session leaders to add the title, room, leader name, and Slack handle below.
**Notes from unconference sessions:** https://drive.google.com/drive/u/1/folders/1LZFTHlj47fG_2RBP7FJc96xP0wLSIYT1
#### Template: Open Data Census Sprint, let's help SF and CA get on the board
**Session Description:** (Optional; only use this if you absolutely have to. The name should be concise and descriptive.)
**Organizer:** Jason Lally @jasonlally
**Room:** Room A
### 11:00 AM
#### Data Visualization in 3D & VR
**Organizer:** Sony Green
**Room:** Room D
#### Leveraging Open Data in SF, Open Data 101
**Organizer:** Jason Lally
**Room:** D
### 1:30 PM
#### Datasette deep dive
**Organizer:** Simon Willison
**Room:** Room A
#### How to contribute to Open Street Map
**Organizer:** Sam Estabrook
**Room:** C
#### Open Data Census Sprint, let's help SF and CA get on the board
**Organizer:** Jason Lally @jasonlally
**Room:** Room D
### 3:00 PM
#### OpenTransit
**Description:** Building insights from transit data for better operations and planning
**Organizer:** Josh
**Room:** C
#### Campaign Finance
**Room:** D
---
## Challenges, Ideas, and Discussions
### The Data Science Hippocratic Oath
One extremely intriguing idea that came up was forming some kind of data science "Hippocratic oath". Just as doctors take this oath to commit to carrying out their duties to the highest ethical standard, we, as data scientists, should take the proper steps to uphold our ethical standards and promote transparency in our work. Furthermore, we should devote our work to helping the People and not be swayed by special interests.
Some other organizations, including [Bloomberg](https://www.bloomberg.com/company/d4gx/) and [Microsoft](https://msblob.blob.core.windows.net/ncmedia/2018/01/The-Future_Computed_1.26.18.pdf), have also thought about and participated in this conversation. We could and should consider connecting our efforts with the broader community.
## Resources
### Tracking Public Money Flows
Where does the money go? Nobody tracks this better in SF than the Controller’s Office and the Ethics Commission.
The Controller’s Office manages the City and County budget datasets. Below are some resources published by the Controller’s Office:
- [Budget](https://data.sfgov.org/City-Management-and-Ethics/Budget/xdgd-c79v): This covers the budgets of the various departments.
- [Budget - FTE](https://data.sfgov.org/City-Management-and-Ethics/Budget-FTE/4zfx-f2ts): This dataset provides salary information for each department and associated department programs.
One of the key mandates of the Ethics Commission is educating the public about campaign and lobbyist spending. While the number of [datasets the Ethics Commission publishes](https://data.sfgov.org/browse?Department-Metrics_Publishing-Department=Ethics+Commission&category=City+Management+and+Ethics&limitTo=datasets) can be overwhelming, here are some choice selections:
- [Lobbyist Activity - Contacts of Public Officials](https://data.sfgov.org/City-Management-and-Ethics/Lobbyist-Activity-Contacts-of-Public-Officials/hr5m-xnxc): A dataset of each lobby visit. A treasure trove containing who was doing the lobbying, what official they talked to, and why.
- [Lobbyist Activity - Payments Promised by Clients](https://data.sfgov.org/City-Management-and-Ethics/Lobbyist-Activity-Payments-Promised-By-Clients/s2fy-y3my): A dataset of payments that clients promised to lobbyists. Contains lobbyist, firm, payment amount, and client.
- [Lobbyist Activity - Political Contributions](https://data.sfgov.org/City-Management-and-Ethics/Lobbyist-Activity-Political-Contributions/sa8r-purn): All political contributions of $100 or more by lobbyists.
- [Campaign Consultants - Client Payment](https://data.sfgov.org/City-Management-and-Ethics/Campaign-Consultants-Client-Payments/tc9q-72uj): Record of payments to campaign consultants.
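data.sfgov.org is a Socrata portal, so each dataset above can also be queried through the SODA API at `https://data.sfgov.org/resource/<dataset-id>.json`, where the ID is the `xxxx-xxxx` code at the end of each link. Below is a minimal sketch that builds a SoQL query URL for the political contributions dataset (`sa8r-purn`). The column names `lobbyist` and `amount` are hypothetical placeholders, and `soql_url` is an illustrative helper, not part of any library; check the dataset's column list before running it.

```python
from urllib.parse import urlencode

# Socrata resource endpoint pattern used by data.sfgov.org.
RESOURCE_URL = "https://data.sfgov.org/resource/{dataset_id}.json"


def soql_url(dataset_id, **params):
    """Build a SODA query URL; each keyword becomes a $-prefixed SoQL parameter."""
    query = urlencode({f"${name}": value for name, value in params.items()})
    return RESOURCE_URL.format(dataset_id=dataset_id) + "?" + query


# "sa8r-purn" is the Lobbyist Activity - Political Contributions dataset ID.
# HYPOTHETICAL column names (lobbyist, amount): verify them on the dataset page.
url = soql_url(
    "sa8r-purn",
    select="lobbyist, sum(amount) AS total",
    group="lobbyist",
    order="total DESC",
    limit=10,
)
print(url)
```

Fetching the URL (for example with `json.load(urllib.request.urlopen(url))`) should return a JSON array with one object per result row.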
### Additional
* [Open Data Day Global Resources](http://opendataday.org/#resources)
* [City of San Francisco Open Data Portal](https://datasf.org/opendata/)
---
## Notes from talks
### Datasette lightning talk (Simon Willison)
Datasette: https://github.com/simonw/datasette
Datasette Publish: https://publish.datasettes.com/
Here's the [Street Tree List](https://data.sfgov.org/City-Infrastructure/Street-Tree-List/tkzw-k3nq) on data.sfgov.org, here it is [published using Datasette](https://san-francisco.datasettes.com/sf-trees-ebc2ad9/Street_Tree_List), and here's the [San Francisco Tree Search](https://sf-tree-search.now.sh/) demo ([and the underlying source code](https://github.com/simonw/sf-tree-search)).
I used this command to create the final database, with separate tables for many of the columns and full-text search enabled across some of them as well:
```shell
csvs-to-sqlite Street_Tree_List.csv sf-trees.db \
    -c qLegalStatus -c qSpecies -c qSiteInfo \
    -c PlantType -c qCaretaker -c qCareAssistant \
    -f qLegalStatus -f qSpecies -f qAddress \
    -f qSiteInfo -f PlantType -f qCaretaker \
    -f qCareAssistant -f PermitNotes
```
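The `-c` options in the command above are what produce the separate tables: each repeated text value is stored once in a lookup table and the main table keeps an integer reference to it. Here is a minimal sketch of that idea with Python's built-in `sqlite3` module. It is not csvs-to-sqlite's own code; the `id`/`value` lookup layout is an assumption (inspect the generated schema to confirm), `load_with_lookup` is a hypothetical helper, the rows are placeholders, and the `-f` full-text indexes are left out.

```python
import sqlite3


def load_with_lookup(rows):
    """Load (tree_id, legal_status) pairs, storing each distinct status once."""
    conn = sqlite3.connect(":memory:")
    # Lookup table: one row per distinct qLegalStatus value.
    conn.execute(
        "CREATE TABLE qLegalStatus (id INTEGER PRIMARY KEY, value TEXT UNIQUE)"
    )
    # Main table keeps an integer reference instead of repeating the text.
    conn.execute(
        "CREATE TABLE Street_Tree_List ("
        "tree_id INTEGER, qLegalStatus INTEGER REFERENCES qLegalStatus(id))"
    )
    for tree_id, status in rows:
        # Add the value to the lookup table only the first time it appears.
        conn.execute(
            "INSERT OR IGNORE INTO qLegalStatus (value) VALUES (?)", (status,)
        )
        (status_id,) = conn.execute(
            "SELECT id FROM qLegalStatus WHERE value = ?", (status,)
        ).fetchone()
        conn.execute(
            "INSERT INTO Street_Tree_List VALUES (?, ?)", (tree_id, status_id)
        )
    return conn


# Hypothetical placeholder rows, not real Street Tree List records.
conn = load_with_lookup([(1, "Status A"), (2, "Status B"), (3, "Status A")])

# Join back to the text values when querying.
counts = conn.execute(
    "SELECT qLegalStatus.value, count(*) FROM Street_Tree_List "
    "JOIN qLegalStatus ON Street_Tree_List.qLegalStatus = qLegalStatus.id "
    "GROUP BY qLegalStatus.value ORDER BY qLegalStatus.value"
).fetchall()
print(counts)
```

The trade-off is a join at query time in exchange for a smaller file and a ready-made list of distinct values to facet on.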
Here's [the query showing the number of trees planted per year](https://san-francisco.datasettes.com/sf-trees-ebc2ad9?sql=SELECT+substr%28PlantDate%2C+7%2C+4%29+plant_year%2C+count%28*%29+FROM+Street_Tree_List+WHERE+PlantDate+%21%3D+%27%27+group+by+plant_year+order+by+plant_year+desc%3B), and here's [the Google Sheets visualization](https://docs.google.com/spreadsheets/d/1UJb8ISUv_b0uffxrD_iZv2VQIyrZ_r2gBdeXl-Q-VHk/edit#gid=0) I made with the results.
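URL-decoded, the query behind that link is the SQL in `TREES_PER_YEAR` below. This sketch runs it with Python's `sqlite3` against a throwaway table to show how `substr(PlantDate, 7, 4)` pulls out the year. It assumes `PlantDate` values start with `MM/DD/YYYY`, which is what those offsets imply; the dates passed in are hypothetical, and `trees_per_year` is an illustrative helper.

```python
import sqlite3

# The query behind the "trees planted per year" link, URL-decoded.
TREES_PER_YEAR = (
    "SELECT substr(PlantDate, 7, 4) plant_year, count(*) FROM Street_Tree_List "
    "WHERE PlantDate != '' group by plant_year order by plant_year desc;"
)


def trees_per_year(plant_dates):
    """Run the query against an in-memory table that only has PlantDate."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Street_Tree_List (PlantDate TEXT)")
    conn.executemany(
        "INSERT INTO Street_Tree_List VALUES (?)", [(d,) for d in plant_dates]
    )
    return conn.execute(TREES_PER_YEAR).fetchall()


# Hypothetical dates in MM/DD/YYYY form; characters 7-10 are the year.
# The empty string is dropped by the WHERE clause.
print(trees_per_year([
    "03/01/2010 12:00:00 AM",
    "11/20/2008 12:00:00 AM",
    "05/05/2010 12:00:00 AM",
    "",
]))
```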
I took the [2016 Housing Inventory](https://data.sfgov.org/Housing-and-Buildings/2016-Housing-Inventory/mudq-s8bt) CSV file and uploaded it to [Datasette Publish](https://publish.datasettes.com/). Here's the result: https://datasette-pqqzoahaun.now.sh/ - and here's [an example query](https://datasette-pqqzoahaun.now.sh/csv-data-446fa9a?sql=select+rowid%2C+%2A+from+%5B2016_Housing_Inventory%5D+order+by+rowid+limit+101) that adds up the number of units in each planning district.
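Adding up units per planning district is a `GROUP BY` with `sum()`. A minimal sketch with Python's `sqlite3`, assuming hypothetical column names `planning_district` and `units` (look up the real ones on the table page) and hypothetical rows; `units_by_district` is an illustrative helper. The table name starts with a digit, so it needs the square brackets used in the linked query.

```python
import sqlite3

# HYPOTHETICAL column names: check the real ones on the Datasette table page.
# CAST guards against unit counts that were stored as text.
UNITS_BY_DISTRICT = """
SELECT planning_district, sum(CAST(units AS INTEGER)) AS total_units
FROM [2016_Housing_Inventory]
GROUP BY planning_district
ORDER BY total_units DESC
"""


def units_by_district(rows):
    """Load (planning_district, units) pairs and total the units per district."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE [2016_Housing_Inventory] (planning_district TEXT, units TEXT)"
    )
    conn.executemany("INSERT INTO [2016_Housing_Inventory] VALUES (?, ?)", rows)
    return conn.execute(UNITS_BY_DISTRICT).fetchall()


# Hypothetical rows, not figures from the 2016 Housing Inventory.
print(units_by_district([("District 1", "10"), ("District 2", "4"), ("District 1", "5")]))
```

The same SQL, with the real column names, can be pasted into the SQL box of the published Datasette.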
### Full session on Datasette
Some Datasette tutorials I have written:
* [Datasette: instantly create and publish an API for your SQLite databases](https://simonwillison.net/2017/Nov/13/datasette/) - introducing Datasette, and some demos using FiveThirtyEight data
* [Building a location to time zone API with SpatiaLite, OpenStreetMap and Datasette](https://simonwillison.net/2017/Dec/12/location-time-zone-api/) which includes notes on using custom templates and GeoJSON to show [a map visualization for each timezone](https://timezones-api.now.sh/timezones-4fbc08f/timezones/12)
* [Analyzing my Twitter followers with Datasette](https://simonwillison.net/2018/Jan/28/analyzing-my-twitter-followers/)
* Everything [tagged Datasette on my blog](https://simonwillison.net/tags/datasette/)
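As the first tutorial's title says, publishing with Datasette also gives you an API: adding `.json` to a database URL together with a `sql` parameter returns the query result as JSON instead of HTML. A minimal sketch, using the tree database URL from earlier in these notes (the instance may no longer be online); `sql_json_url` and `fetch_rows` are illustrative helpers, and `fetch_rows` assumes the response carries `columns` and `rows` keys, as Datasette's default JSON shape did.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Database URL from the Street Tree List example; it may no longer be online.
DATABASE_URL = "https://san-francisco.datasettes.com/sf-trees-ebc2ad9"


def sql_json_url(database_url, sql):
    """URL asking Datasette to run `sql` and answer in JSON instead of HTML."""
    return f"{database_url}.json?{urlencode({'sql': sql})}"


def fetch_rows(url):
    """Fetch a result, ASSUMING the JSON has 'columns' and 'rows' keys."""
    with urlopen(url) as response:
        payload = json.load(response)
    # Pair each row's values with the column names.
    return [dict(zip(payload["columns"], row)) for row in payload["rows"]]


url = sql_json_url(DATABASE_URL, "select count(*) from Street_Tree_List")
print(url)
```

Calling `fetch_rows(url)` needs network access and a live instance, so treat it as a starting point rather than a tested client.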
Here's [a list of other Datasettes that have been published](https://github.com/simonw/datasette/wiki/Datasettes).
Here's the article on [SQLite as an application file format](https://www.sqlite.org/appfileformat.html).
The visualization library I'm thinking about including is [Vega](https://vega.github.io/vega/examples/)