# Question 2
**Sol 1:**
~~~~sql
SELECT 'Open' AS TP, COUNT(1) AS QTY
FROM `bigquery-public-data.san_francisco_311.311_service_requests`
WHERE created_date >= '2020-01-01' AND closed_date IS NULL;
~~~~
<br> **Sol 2:**
~~~~sql
SELECT neighborhood, QTY, QTY_CLOSED, QTY - QTY_CLOSED AS QTY_OPEN, oldest_open_date
FROM (
  SELECT neighborhood,
         COUNT(1) AS QTY,
         -- A NULL closed_date means the request is still open, so closed = NOT NULL
         SUM(IF(closed_date IS NULL, 0, 1)) AS QTY_CLOSED,
         MIN(IF(closed_date IS NULL, created_date, NULL)) AS oldest_open_date
  FROM `bigquery-public-data.san_francisco_311.311_service_requests`
  GROUP BY 1
) AS t
ORDER BY 1, 2;
~~~~
<br> **Sol 3:**
~~~~sql
SELECT neighborhood, date, COUNT(1) AS qty_total, 1 AS min, COUNT(1) AS max,
       COUNT(DISTINCT source) AS n_sources
FROM (
  SELECT *, FORMAT_TIMESTAMP('%F %H', created_date, 'UTC') AS date
  FROM `bigquery-public-data.san_francisco_311.311_service_requests`
  WHERE neighborhood IS NOT NULL
) AS t
GROUP BY 1, 2
HAVING qty_total > 4
ORDER BY 1, 2 DESC
~~~~
<br> **Sol 4:**
~~~~sql
SELECT neighborhood, police_district, category,
       COUNT(unique_key) AS QTY,
       MIN(CASE WHEN closed_date IS NULL THEN created_date END) AS oldest_open_date,
       MIN(created_date) AS MIN_created_date,
       MAX(created_date) AS MAX_created_date
FROM `bigquery-public-data.san_francisco_311.311_service_requests`
WHERE police_district IS NOT NULL AND neighborhood IS NOT NULL
GROUP BY 1, 2, 3
ORDER BY 1, 3 DESC
~~~~
<br> **Sol 5:**
~~~~sql
SELECT source, category,
       COUNT(unique_key) AS QTY,
       -- A NULL closed_date means the request is still open
       SUM(IF(closed_date IS NULL, 0, 1)) AS QTY_CLOSED,
       SUM(IF(closed_date IS NULL, 1, 0)) AS QTY_OPEN,
       MIN(CASE WHEN closed_date IS NULL THEN created_date END) AS oldest_open_date,
       MIN(created_date) AS MIN_created_date,
       MAX(created_date) AS MAX_created_date
FROM `bigquery-public-data.san_francisco_311.311_service_requests`
WHERE police_district IS NOT NULL
  AND neighborhood IS NOT NULL
GROUP BY 1, 2
ORDER BY 1, 3 DESC;
~~~~
<br> **Sol 6:**
The full export is too large to download directly (6.2 GB).
Metadata: https://data.sfgov.org/City-Infrastructure/311-Cases/vw6y-z8j6
After processing, I generated a dataset of reasonable size (201.2 MB) containing the time difference between consecutive occurrences. To estimate the lag distribution I extracted percentiles, keeping only the neighborhood/category combinations with more than 10 observations.
~~~~sql
SELECT category, neighborhood,
       APPROX_QUANTILES(hours_between_occ, 100) AS percentiles,
       COUNT(1) AS n_obs
FROM (
  -- HOUR (not SECOND) so the unit matches the hours_between_occ alias
  SELECT *, TIMESTAMP_DIFF(created_date, created_date_lag, HOUR) AS hours_between_occ
  FROM (
    SELECT category, neighborhood, created_date,
           LAG(created_date) OVER (PARTITION BY category, neighborhood ORDER BY created_date) AS created_date_lag
    FROM `bigquery-public-data.san_francisco_311.311_service_requests`
    WHERE neighborhood IS NOT NULL AND category IS NOT NULL
  ) AS t_lag
  WHERE created_date_lag IS NOT NULL
) AS t_quants
GROUP BY 1, 2
HAVING n_obs > 9
ORDER BY 1, 2
~~~~
This brings the result down to 3,954 rows, which is small enough to download. Because the percentiles column is an ARRAY, the export format must be JSON rather than CSV.
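
The same lag-and-percentiles idea can be sketched locally in Python (a minimal sketch over synthetic rows, not the BigQuery dataset; the group keys and the `hours_between_occ` name mirror the query, and `statistics.quantiles` stands in for `APPROX_QUANTILES`):

```python
from datetime import datetime, timedelta
from itertools import groupby
from statistics import quantiles

# Synthetic stand-in for (category, neighborhood, created_date) rows.
rows = [
    ("Graffiti", "Mission", datetime(2020, 1, 1, 0)),
    ("Graffiti", "Mission", datetime(2020, 1, 1, 6)),
    ("Graffiti", "Mission", datetime(2020, 1, 1, 18)),
    ("Graffiti", "Mission", datetime(2020, 1, 2, 6)),
]

def lag_percentiles(rows, n=100):
    """Per (category, neighborhood) group: hours between consecutive
    created_date values, then n-quantile cut points."""
    out = {}
    key = lambda r: (r[0], r[1])
    for grp_key, grp in groupby(sorted(rows, key=lambda r: (r[0], r[1], r[2])), key):
        ts = [r[2] for r in grp]
        # Lag within the group: difference between consecutive timestamps, in hours
        lags = [(b - a) / timedelta(hours=1) for a, b in zip(ts, ts[1:])]
        if len(lags) >= 2:  # quantiles() needs at least two data points
            out[grp_key] = quantiles(lags, n=n, method="inclusive")
    return out

result = lag_percentiles(rows, n=4)  # quartiles for the tiny example
print(result)  # {('Graffiti', 'Mission'): [9.0, 12.0, 12.0]}
```

As in the SQL, groups with too few observations are dropped rather than producing unreliable percentiles.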