# Question 2
**Sol 1:**
~~~~sql
SELECT 'Open' AS TP, COUNT(1) AS QTY
FROM `bigquery-public-data.san_francisco_311.311_service_requests`
WHERE created_date >= '2020-01-01' AND closed_date IS NULL;
~~~~
<br> **Sol 2:**
~~~~sql
SELECT neighborhood, QTY, QTY_CLOSED, QTY - QTY_CLOSED AS QTY_OPEN, oldest_open_date
FROM (
  SELECT neighborhood,
         COUNT(1) AS QTY,
         -- A NULL closed_date means the request is still open, so closed = NOT NULL
         SUM(IF(closed_date IS NULL, 0, 1)) AS QTY_CLOSED,
         MIN(IF(closed_date IS NULL, created_date, NULL)) AS oldest_open_date
  FROM `bigquery-public-data.san_francisco_311.311_service_requests`
  GROUP BY 1
) AS t
ORDER BY 1, 2;
~~~~
<br> **Sol 3:**
~~~~sql
SELECT neighborhood, date, COUNT(1) AS qty_total, 1 AS min, COUNT(1) AS max,
       COUNT(DISTINCT source) AS n_sources
FROM (
  SELECT *, FORMAT_TIMESTAMP('%F %H', created_date, 'UTC') AS date
  FROM `bigquery-public-data.san_francisco_311.311_service_requests`
  WHERE neighborhood IS NOT NULL
) AS t
GROUP BY 1, 2
HAVING qty_total > 4
ORDER BY 1, 2 DESC
~~~~
<br> **Sol 4:**
~~~~sql
SELECT neighborhood, police_district, category,
       COUNT(unique_key) AS QTY,
       MIN(CASE WHEN closed_date IS NULL THEN created_date END) AS oldest_open_date,
       MIN(created_date) AS MIN_created_date,
       MAX(created_date) AS MAX_created_date
FROM `bigquery-public-data.san_francisco_311.311_service_requests`
WHERE police_district IS NOT NULL AND neighborhood IS NOT NULL
GROUP BY 1, 2, 3
ORDER BY 1, 3 DESC
~~~~
<br> **Sol 5:**
~~~~sql
SELECT source, category,
       COUNT(unique_key) AS QTY,
       -- A NULL closed_date means the request is still open
       SUM(IF(closed_date IS NULL, 0, 1)) AS QTY_CLOSED,
       SUM(IF(closed_date IS NULL, 1, 0)) AS QTY_OPEN,
       MIN(CASE WHEN closed_date IS NULL THEN created_date END) AS oldest_open_date,
       MIN(created_date) AS MIN_created_date,
       MAX(created_date) AS MAX_created_date
FROM `bigquery-public-data.san_francisco_311.311_service_requests`
WHERE police_district IS NOT NULL
  AND neighborhood IS NOT NULL
GROUP BY 1, 2
ORDER BY 1, 3 DESC;
~~~~
<br> **Sol 6:**
The full export is too large to download directly (6.2 GB).
Metadata: https://data.sfgov.org/City-Infrastructure/311-Cases/vw6y-z8j6
After processing, I generated a dataset of reasonable size (201.2 MB) containing the time difference between consecutive occurrences. To estimate the lag distribution I extracted percentiles, keeping only the neighborhood/category combinations with more than 10 observations.
~~~~sql
SELECT category, neighborhood,
       APPROX_QUANTILES(hours_between_occ, 100) AS percentiles,
       COUNT(1) AS n_obs
FROM (
  -- HOUR (not SECOND) so the unit matches the hours_between_occ alias
  SELECT *, TIMESTAMP_DIFF(created_date, created_date_lag, HOUR) AS hours_between_occ
  FROM (
    SELECT category, neighborhood, created_date,
           LAG(created_date) OVER (PARTITION BY category, neighborhood ORDER BY created_date) AS created_date_lag
    FROM `bigquery-public-data.san_francisco_311.311_service_requests`
    WHERE neighborhood IS NOT NULL AND category IS NOT NULL
  ) AS t_lag
  WHERE created_date_lag IS NOT NULL
) AS t_quants
GROUP BY 1, 2
HAVING n_obs > 9
ORDER BY 1, 2
~~~~
This brings the result down to 3,954 rows, which is small enough to download. Because the percentiles column is an ARRAY, the export format must be JSON rather than CSV.
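
The same lag-and-percentiles idea can be sketched locally in Python (a minimal sketch over synthetic rows, not the BigQuery dataset; the group keys and the `hours_between_occ` name mirror the query, and `statistics.quantiles` stands in for `APPROX_QUANTILES`):

```python
from datetime import datetime, timedelta
from itertools import groupby
from statistics import quantiles

# Synthetic stand-in for (category, neighborhood, created_date) rows.
rows = [
    ("Graffiti", "Mission", datetime(2020, 1, 1, 0)),
    ("Graffiti", "Mission", datetime(2020, 1, 1, 6)),
    ("Graffiti", "Mission", datetime(2020, 1, 1, 18)),
    ("Graffiti", "Mission", datetime(2020, 1, 2, 6)),
]

def lag_percentiles(rows, n=100):
    """Per (category, neighborhood) group: hours between consecutive
    created_date values, then n-quantile cut points."""
    out = {}
    key = lambda r: (r[0], r[1])
    for grp_key, grp in groupby(sorted(rows, key=lambda r: (r[0], r[1], r[2])), key):
        ts = [r[2] for r in grp]
        # Lag within the group: difference between consecutive timestamps, in hours
        lags = [(b - a) / timedelta(hours=1) for a, b in zip(ts, ts[1:])]
        if len(lags) >= 2:  # quantiles() needs at least two data points
            out[grp_key] = quantiles(lags, n=n, method="inclusive")
    return out

result = lag_percentiles(rows, n=4)  # quartiles for the tiny example
print(result)  # {('Graffiti', 'Mission'): [9.0, 12.0, 12.0]}
```

As in the SQL, groups with too few observations are dropped rather than producing unreliable percentiles.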