owned this note
owned this note
Published
Linked with GitHub
# PA220 HA3
Adam Grygar, Miroslav Bezak
## TODO
- [x] Zeptat se zda tam musime davat ten cely shit z explain analyze ke kazde picovine
- [ ] Pouzit jejich dotazy
- [ ] Dat explain analyze na prvni a posledni dotaz
## Uvodni poznamky
- Puvodni casy originalnich dotazu se v prubehu dni postupne menily. Vzdycky jsme se snazili overit cas puvodniho proti novemu dotazu v danem case.
- Pro minimalizovani dochylky casu behu, jsme pousteli jak puvodni tak optimalizovany dotaz nekolikrat. Nakonec jsme vybrali nejlepe reprezentativni casy.
- U dotazu Q4 a Q5 je odchylka casu pri kazdem jednom behu dotazu velka. Neda se proto poradne merit zrychleni. Tak jako tak zrychleni tam nemuzeme moc najit a maximalne se jedna o par sekund v par bezich.
## Materialized View FactsWithDateTime
- dopredne spojenie faktov s datumom a casom, tieto dve dimenzie sa tam najcastejsie joinuju
```sql=
SELECT "ReportFacts".id,
"ReportFacts".program_ver_id,
"ReportFacts".device_id,
"DateDimension".day,
"DateDimension".month,
"DateDimension".year,
"DateDimension".quarter,
"DateDimension".day_of_week,
"TimeDimension".hour,
"TimeDimension".minute,
"ReportFacts".tracking_mode_id,
"ReportFacts".gsmnet_id,
"ReportFacts".method_id,
"ReportFacts".car_id,
"ReportFacts".app_run_time,
"ReportFacts".pda_run_time,
"ReportFacts".battery_level,
"ReportFacts".sim_imsi,
"ReportFacts".pda_imei
FROM "ReportFacts"
LEFT JOIN "DateDimension" ON "ReportFacts".date_id = "DateDimension".id
LEFT JOIN "TimeDimension" ON "ReportFacts".time_id = "TimeDimension".id;
```
## Materialized View FactsWithDeviceAndProgramVersion
- spojeni faktu s program version a device
```sql=
SELECT "ReportFacts".id,
"ReportFacts".tracking_mode_id,
"ReportFacts".gsmnet_id,
"ReportFacts".method_id,
"ReportFacts".car_id,
"ReportFacts".app_run_time,
"ReportFacts".pda_run_time,
"ReportFacts".battery_level,
"ReportFacts".sim_imsi,
"ReportFacts".pda_imei,
"ReportFacts".date_id,
"ReportFacts".time_id,
"ProgramVersionDimension".program_ver,
"ProgramVersionDimension".platform,
"DeviceDimension".device
FROM xbezak."ReportFacts"
LEFT JOIN xbezak."ProgramVersionDimension" ON "ReportFacts".program_ver_id = "ProgramVersionDimension".id
LEFT JOIN xbezak."DeviceDimension" ON "ReportFacts".device_id = "DeviceDimension".id;
```
## Materialized View FactsWithDeviceProgramDateTime
- spojeni faktu s program version, device, date, time
```sql=
SELECT "ReportFacts".id,
"DateDimension".day,
"DateDimension".month,
"DateDimension".year,
"DateDimension".quarter,
"DateDimension".day_of_week,
"TimeDimension".hour,
"TimeDimension".minute,
"ReportFacts".tracking_mode_id,
"ReportFacts".gsmnet_id,
"ReportFacts".method_id,
"ReportFacts".car_id,
"ReportFacts".app_run_time,
"ReportFacts".pda_run_time,
"ReportFacts".battery_level,
"ReportFacts".sim_imsi,
"ReportFacts".pda_imei,
"ProgramVersionDimension".program_ver,
"ProgramVersionDimension".platform,
"DeviceDimension".device
FROM xbezak."ReportFacts"
LEFT JOIN xbezak."DateDimension" ON "ReportFacts".date_id = "DateDimension".id
LEFT JOIN xbezak."TimeDimension" ON "ReportFacts".time_id = "TimeDimension".id
LEFT JOIN xbezak."ProgramVersionDimension" ON "ReportFacts".program_ver_id = "ProgramVersionDimension".id
LEFT JOIN xbezak."DeviceDimension" ON "ReportFacts".device_id = "DeviceDimension".id;
```
## Indexacia
- najefektivnejsim sposobom ako zrychlit dotazy nad databazou je indexacia
- index je vzdy nad primarnym klucom explicitne
- je dobrou praktikou indexovat aj na cudzich klucoch, casto sa podla nich vyhladava and / or spaja - tieto indexy sme pridali do tabulky faktov
- dalej sa je treba sa zamysliet nad indexami nad dalsimi stlpcami, chceme sa vyhnut sekvencemu prechadziu tabuliek pri nasich dotazoch (hladanie konkretnej hodnoty alebo intervalu)
- v tomto nam mozu exekucne plany ktore sa tvoria pred kazdym dotazom, v PgAdmin vieme zobrazit pekne pomocou explain analyze tlacidla alebo priamo v textovom vypise prikazu EXPLAIN ANALYZE
- cielom bolo pridat indexy tak, ze sa zbavime vsetkych sekvenych skenov, zostanu nam iba indexove a ine rychle skeny - indexy v databazach zvyknu mat velky branching faktor, takze na niekolko urovni hlbky sa vieme dostat ku vsetkym zaznamom aj pri milionoch riadkoch
- to bola ale teoreticka idylka, ktora potom narazila na krutu realitu, po pridani indexov ich databazovy engine vobec nepouzil k vyhodnoteniu dotazov, nedoslo teda k zrychleniu
- napadlo nam, ze engine proste spravil chybu vo vytvoreni execution planu a ked ho donutime tieto indexy pouzit, tak sa dotazy zrychlia
- bohuzial sme sa ale mylili a po vynuteni indexov pomocou `SET enable_seqscan = OFF;` sa dotazy nezrychlili, rychlost bola rovnaka alebo dokonca horsia, takze chyba nebola v planovani enginu
- nie sme si uplne 100% isty, preco doslo k takejto situacii ale dovodov moze byt viacero
- indexy najvacsi rozdiel spravia ked hladame konkretnu hodnotu alebo interval vo velkej tabulke, taketo poziadavky sa ale v nasich dotazoch moc neobjavuju
- pri spajani a groupovani sa v nasom pripade viac oplati prejst raz celu tabulku a najst vsetky hladane hodnoty ako sa stale pozerat do indexu; `O(n) vs O(n * log n)`
## Q1 List connection count per app version ad day during october 2020
#### Original
- povodny cas: 2040.211 ms
```sql=
SELECT day, program_ver, COUNT(*) connection_count FROM xbezak."ReportFacts"
LEFT OUTER JOIN xbezak."DateDimension" ON date_id="DateDimension".id
LEFT OUTER JOIN xbezak."ProgramVersionDimension" ON program_ver_id="ProgramVersionDimension".id
WHERE "DateDimension".month=10
GROUP BY ROLLUP(day, program_ver)
ORDER BY day DESC, program_ver DESC
```
```
"Sort (cost=131633.41..131635.50 rows=838 width=44) (actual time=2961.060..2961.086 rows=496 loops=1)"
" Sort Key: ""DateDimension"".day DESC, ""ProgramVersionDimension"".program_ver DESC"
" Sort Method: quicksort Memory: 62kB"
" -> MixedAggregate (cost=1070.05..131592.72 rows=838 width=44) (actual time=2960.449..2960.553 rows=496 loops=1)"
" Hash Key: ""DateDimension"".day, ""ProgramVersionDimension"".program_ver"
" Hash Key: ""DateDimension"".day"
" Group Key: ()"
" Batches: 1 Memory Usage: 113kB"
" -> Hash Left Join (cost=1070.05..126596.29 rows=332537 width=36) (actual time=29.362..2250.775 rows=1956395 loops=1)"
" Hash Cond: (""ReportFacts"".program_ver_id = ""ProgramVersionDimension"".id)"
" -> Hash Join (cost=1068.47..125579.50 rows=332537 width=16) (actual time=28.874..1800.842 rows=1956395 loops=1)"
" Hash Cond: (""ReportFacts"".date_id = ""DateDimension"".id)"
" -> Seq Scan on ""ReportFacts"" (cost=0.00..114240.68 rows=3912168 width=16) (actual time=1.193..1064.311 rows=3912168 loops=1)"
" -> Hash (cost=1018.02..1018.02 rows=4036 width=16) (actual time=26.984..26.986 rows=4030 loops=1)"
" Buckets: 4096 Batches: 1 Memory Usage: 221kB"
" -> Bitmap Heap Scan on ""DateDimension"" (cost=79.57..1018.02 rows=4036 width=16) (actual time=1.953..25.921 rows=4030 loops=1)"
" Recheck Cond: (month = '10'::double precision)"
" Heap Blocks: exact=167"
" -> Bitmap Index Scan on ""MonthIndex"" (cost=0.00..78.56 rows=4036 width=0) (actual time=1.708..1.708 rows=4030 loops=1)"
" Index Cond: (month = '10'::double precision)"
" -> Hash (cost=1.26..1.26 rows=26 width=36) (actual time=0.464..0.465 rows=26 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 10kB"
" -> Seq Scan on ""ProgramVersionDimension"" (cost=0.00..1.26 rows=26 width=36) (actual time=0.449..0.453 rows=26 loops=1)"
"Planning Time: 8.789 ms"
"Execution Time: 2961.244 ms"
```
#### Notes
- Zmena poradi joinu nevedla ke zvyseni vykonu
#### Zrychlenie s Materialized View FactsWithDateTime
- zlepsenie casu na 1551.123 ms
```sql=
SELECT day, program_ver, COUNT(*) connection_count FROM xbezak."FactsWithDateTime"
LEFT OUTER JOIN xbezak."ProgramVersionDimension" ON program_ver_id="ProgramVersionDimension".id
WHERE "FactsWithDateTime".month=10
GROUP BY ROLLUP(day, program_ver)
ORDER BY day DESC, program_ver DESC
```
#### Pouziti IN predicate
- Pouziti In predicate nemelo zadny vliv na zrychleni.
#### Zrychleni s Materialized View
- Zrychleni na 1516.949 ms z originalniho dotazu, ktery momentalne beha za 2373.259 ms
```sql=
SELECT day, program_ver, COUNT(*) connection_count
FROM xgrygar."FactsWithDeviceProgramDateTime"
WHERE "FactsWithDeviceProgramDateTime".month=10
GROUP BY ROLLUP(day, program_ver)
ORDER BY day DESC, program_ver DESC
```
```
"Sort (cost=174677.76..174679.08 rows=528 width=20) (actual time=2163.157..2163.181 rows=496 loops=1)"
" Sort Key: day DESC, program_ver DESC"
" Sort Method: quicksort Memory: 62kB"
" -> MixedAggregate (cost=0.00..174653.88 rows=528 width=20) (actual time=2162.622..2162.721 rows=496 loops=1)"
" Hash Key: day, program_ver"
" Hash Key: day"
" Group Key: ()"
" Batches: 1 Memory Usage: 113kB"
" -> Seq Scan on ""FactsWithDeviceProgramDateTime"" (cost=0.00..145174.34 rows=1964951 width=12) (actual time=2.022..1276.961 rows=1956395 loops=1)"
" Filter: (month = '10'::double precision)"
" Rows Removed by Filter: 1955773"
"Planning Time: 0.756 ms"
"Execution Time: 2163.273 ms"
```
## Q2 List the connections counts in individual operator networks per weekday and month.
#### Original
- Puvodni cas: 9417.357 ms
```sql=
SELECT day, month, network, country, COUNT(*) connection_count FROM xbezak."ReportFacts"
LEFT OUTER JOIN xbezak."DateDimension" ON date_id="DateDimension".id
LEFT OUTER JOIN xbezak."GsmNetDimension" ON "GsmNetDimension".id=gsmnet_id
GROUP BY ROLLUP(network, day, month, country)
ORDER BY month ASC, day ASC, connection_count DESC
```
```
"Sort (cost=951729.35..953777.99 rows=819457 width=48) (actual time=8947.804..8948.095 rows=5820 loops=1)"
" Sort Key: ""DateDimension"".month, ""DateDimension"".day, (count(*)) DESC"
" Sort Method: quicksort Memory: 694kB"
" -> GroupAggregate (cost=779153.43..846030.52 rows=819457 width=48) (actual time=6759.411..8941.431 rows=5820 loops=1)"
" Group Key: ""GsmNetDimension"".network, ""DateDimension"".day, ""DateDimension"".month, ""GsmNetDimension"".country"
" Group Key: ""GsmNetDimension"".network, ""DateDimension"".day, ""DateDimension"".month"
" Group Key: ""GsmNetDimension"".network, ""DateDimension"".day"
" Group Key: ""GsmNetDimension"".network"
" Group Key: ()"
" -> Sort (cost=779153.43..788933.85 rows=3912168 width=40) (actual time=6757.729..8028.101 rows=3912168 loops=1)"
" Sort Key: ""GsmNetDimension"".network, ""DateDimension"".day, ""DateDimension"".month, ""GsmNetDimension"".country"
" Sort Method: external merge Disk: 163112kB"
" -> Hash Left Join (cost=2026.81..136832.09 rows=3912168 width=40) (actual time=31.979..2494.531 rows=3912168 loops=1)"
" Hash Cond: (""ReportFacts"".gsmnet_id = ""GsmNetDimension"".id)"
" -> Hash Left Join (cost=1956.34..126467.38 rows=3912168 width=24) (actual time=30.373..1639.844 rows=3912168 loops=1)"
" Hash Cond: (""ReportFacts"".date_id = ""DateDimension"".id)"
" -> Seq Scan on ""ReportFacts"" (cost=0.00..114240.68 rows=3912168 width=16) (actual time=0.051..574.183 rows=3912168 loops=1)"
" -> Hash (cost=1362.82..1362.82 rows=47482 width=24) (actual time=29.727..29.728 rows=47482 loops=1)"
" Buckets: 65536 Batches: 1 Memory Usage: 3109kB"
" -> Seq Scan on ""DateDimension"" (cost=0.00..1362.82 rows=47482 width=24) (actual time=7.436..17.498 rows=47482 loops=1)"
" -> Hash (cost=48.54..48.54 rows=1754 width=32) (actual time=1.580..1.582 rows=1754 loops=1)"
" Buckets: 2048 Batches: 1 Memory Usage: 134kB"
" -> Seq Scan on ""GsmNetDimension"" (cost=0.00..48.54 rows=1754 width=32) (actual time=0.784..1.134 rows=1754 loops=1)"
"Planning Time: 6.835 ms"
"Execution Time: 8981.153 ms"
```
#### Notes
- Zmena poradi joinu vedla ke zlepseni casu prumerne o sekundu a to na 8539.284 ms
#### Zrychlenie s Materialized View FactsWithDateTime
- zlepsenie casu na 7598.753 ms
```sql=
SELECT day, month, network, country, COUNT(*) connection_count FROM xbezak."FactsWithDateTime"
LEFT OUTER JOIN xbezak."GsmNetDimension" ON "GsmNetDimension".id=gsmnet_id
GROUP BY ROLLUP(network, day, month, country)
ORDER BY month ASC, day ASC, connection_count DESC
```
```
"Sort (cost=875357.01..875932.53 rows=230209 width=48) (actual time=8407.352..8407.614 rows=5820 loops=1)"
" Sort Key: ""FactsWithDateTime"".month, ""FactsWithDateTime"".day, (count(*)) DESC"
" Sort Method: quicksort Memory: 694kB"
" -> GroupAggregate (cost=786780.99..847769.92 rows=230209 width=48) (actual time=6247.404..8401.379 rows=5820 loops=1)"
" Group Key: ""GsmNetDimension"".network, ""FactsWithDateTime"".day, ""FactsWithDateTime"".month, ""GsmNetDimension"".country"
" Group Key: ""GsmNetDimension"".network, ""FactsWithDateTime"".day, ""FactsWithDateTime"".month"
" Group Key: ""GsmNetDimension"".network, ""FactsWithDateTime"".day"
" Group Key: ""GsmNetDimension"".network"
" Group Key: ()"
" -> Sort (cost=786780.99..796562.13 rows=3912456 width=40) (actual time=6245.732..7488.417 rows=3912168 loops=1)"
" Sort Key: ""GsmNetDimension"".network, ""FactsWithDateTime"".day, ""FactsWithDateTime"".month, ""GsmNetDimension"".country"
" Sort Method: external merge Disk: 163112kB"
" -> Hash Left Join (cost=70.47..144405.04 rows=3912456 width=40) (actual time=1.281..2193.020 rows=3912168 loops=1)"
" Hash Cond: (""FactsWithDateTime"".gsmnet_id = ""GsmNetDimension"".id)"
" -> Seq Scan on ""FactsWithDateTime"" (cost=0.00..134039.56 rows=3912456 width=24) (actual time=0.555..1191.550 rows=3912168 loops=1)"
" -> Hash (cost=48.54..48.54 rows=1754 width=32) (actual time=0.711..0.713 rows=1754 loops=1)"
" Buckets: 2048 Batches: 1 Memory Usage: 134kB"
" -> Seq Scan on ""GsmNetDimension"" (cost=0.00..48.54 rows=1754 width=32) (actual time=0.034..0.333 rows=1754 loops=1)"
"Planning Time: 0.744 ms"
"Execution Time: 8437.540 ms"
```
## Q3 Report on oldest and newest app version for all devices
#### Original
- Puvodni cas: 1339.753 ms (Momentalne originalni dotaz beha za 4142.505 ms, viz. Notes)
```sql=
SELECT device, MIN(program_ver) min_version, MAX(program_ver) max_version FROM xbezak."ReportFacts"
LEFT OUTER JOIN xbezak."DeviceDimension" ON "DeviceDimension".id=device_id
LEFT OUTER JOIN xbezak."ProgramVersionDimension" ON "ProgramVersionDimension".id=program_ver_id
GROUP BY device
```
```
"HashAggregate (cost=166028.08..166030.01 rows=193 width=74) (actual time=4398.415..4398.452 rows=137 loops=1)"
" Group Key: ""DeviceDimension"".device"
" Batches: 1 Memory Usage: 64kB"
" -> Hash Left Join (cost=8.95..136686.82 rows=3912168 width=38) (actual time=0.233..2698.644 rows=3912168 loops=1)"
" Hash Cond: (""ReportFacts"".program_ver_id = ""ProgramVersionDimension"".id)"
" -> Hash Left Join (cost=7.37..124741.83 rows=3912168 width=18) (actual time=0.198..1717.027 rows=3912168 loops=1)"
" Hash Cond: (""ReportFacts"".device_id = ""DeviceDimension"".id)"
" -> Seq Scan on ""ReportFacts"" (cost=0.00..114240.68 rows=3912168 width=16) (actual time=0.043..597.448 rows=3912168 loops=1)"
" -> Hash (cost=4.94..4.94 rows=194 width=18) (actual time=0.141..0.142 rows=194 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 19kB"
" -> Seq Scan on ""DeviceDimension"" (cost=0.00..4.94 rows=194 width=18) (actual time=0.027..0.073 rows=194 loops=1)"
" -> Hash (cost=1.26..1.26 rows=26 width=36) (actual time=0.024..0.024 rows=26 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 10kB"
" -> Seq Scan on ""ProgramVersionDimension"" (cost=0.00..1.26 rows=26 width=36) (actual time=0.009..0.015 rows=26 loops=1)"
"Planning Time: 0.948 ms"
"Execution Time: 4398.564 ms"
```
#### Notes:
- Kdyz jsme puvodne delali HA2, tento dotaz zbehl za 1.3s. Momentalne puvodni dotaz beha za priblizne 4s. Nevime pricinu teto zmeny. Moznostou je vyssie vytazenie databazoveho serveru.
- Zmena poradi Joinu nema vliv na rychlost.
#### Zrychleni s Materialized view FactsWithDeviceAndProgramVersion
- Zrychleni na 2581.111 ms (zrychleni o cca 2s origninal momentalne beha za 4443.858 ms)
```sql=
SELECT device, MIN(program_ver) min_version, MAX(program_ver) max_version FROM xgrygar."FactsWithDeviceAndProgramVersion"
GROUP BY device
```
```
"HashAggregate (cost=148441.07..148442.36 rows=130 width=74) (actual time=2990.686..2990.725 rows=137 loops=1)"
" Group Key: device"
" Batches: 1 Memory Usage: 64kB"
" -> Seq Scan on ""FactsWithDeviceAndProgramVersion"" (cost=0.00..119097.18 rows=3912518 width=14) (actual time=0.460..1004.487 rows=3912168 loops=1)"
"Planning Time: 0.154 ms"
"Execution Time: 2990.773 ms"
```
## Q4 Report on the reliability of devices – number of app restarts without device restart
#### Original
- Puvodni cas: 58350.471 ms (Momentalne originalni dotaz beha za 71688.824 ms)
- Dalsi zpomaleni originalniho dotazu na 85734.944 ms
```sql=
SELECT "DeviceDimension".device, program_ver, COUNT(*) number_of_crashes
FROM (SELECT
pda_imei, device_id, program_ver_id,
CASE
WHEN pda_run_time_previous < pda_run_time AND app_run_time_previous > app_run_time THEN 1
ELSE 0
END AS CRASH
FROM (SELECT device_id, program_ver_id, pda_imei, app_run_time, pda_run_time, LAG(pda_run_time, 1) OVER (PARTITION BY pda_imei ORDER BY year ASC, month, day, hour, minute) AS pda_run_time_previous,
LAG(app_run_time, 1) OVER (PARTITION BY pda_imei ORDER BY year ASC, month, day, hour, minute) AS app_run_time_previous
FROM xbezak."ReportFacts"
LEFT OUTER JOIN xbezak."DateDimension" ON date_id="DateDimension".id
LEFT OUTER JOIN xbezak."TimeDimension" ON time_id="TimeDimension".id
ORDER BY year ASC, month, day, hour, minute) tb)
AS crashes
LEFT OUTER JOIN xbezak."DeviceDimension" ON device_id="DeviceDimension".id
LEFT OUTER JOIN xbezak."ProgramVersionDimension" ON program_ver_id="ProgramVersionDimension".id
WHERE crashes.crash=1
GROUP BY device, program_ver
ORDER BY number_of_crashes DESC
```
```
"Sort (cost=2163789.57..2163802.12 rows=5018 width=46) (actual time=73061.356..73061.373 rows=143 loops=1)"
" Sort Key: (count(*)) DESC"
" Sort Method: quicksort Memory: 36kB"
" -> GroupAggregate (cost=2163235.35..2163481.14 rows=5018 width=46) (actual time=73059.274..73061.321 rows=143 loops=1)"
" Group Key: ""DeviceDimension"".device, ""ProgramVersionDimension"".program_ver"
" -> Sort (cost=2163235.35..2163284.26 rows=19561 width=38) (actual time=73059.262..73059.736 rows=10340 loops=1)"
" Sort Key: ""DeviceDimension"".device, ""ProgramVersionDimension"".program_ver"
" Sort Method: quicksort Memory: 894kB"
" -> Hash Left Join (cost=2083485.53..2161841.08 rows=19561 width=38) (actual time=66179.661..73053.343 rows=10340 loops=1)"
" Hash Cond: (tb.program_ver_id = ""ProgramVersionDimension"".id)"
" -> Hash Left Join (cost=2083483.95..2161779.78 rows=19561 width=18) (actual time=66179.630..73050.501 rows=10340 loops=1)"
" Hash Cond: (tb.device_id = ""DeviceDimension"".id)"
" -> Subquery Scan on tb (cost=2083476.58..2161719.94 rows=19561 width=16) (actual time=66179.516..73047.342 rows=10340 loops=1)"
" Filter: (CASE WHEN ((tb.pda_run_time_previous < tb.pda_run_time) AND (tb.app_run_time_previous > tb.app_run_time)) THEN 1 ELSE 0 END = 1)"
" Rows Removed by Filter: 3901828"
" -> Sort (cost=2083476.58..2093257.00 rows=3912168 width=148) (actual time=66179.474..72093.395 rows=3912168 loops=1)"
" Sort Key: ""DateDimension"".year, ""DateDimension"".month, ""DateDimension"".day, ""TimeDimension"".hour, ""TimeDimension"".minute"
" Sort Method: external merge Disk: 434704kB"
" -> WindowAgg (cost=939600.78..1066746.24 rows=3912168 width=148) (actual time=20449.676..26067.202 rows=3912168 loops=1)"
" -> Sort (cost=939600.78..949381.20 rows=3912168 width=84) (actual time=20449.655..22342.228 rows=3912168 loops=1)"
" Sort Key: ""ReportFacts"".pda_imei, ""DateDimension"".year, ""DateDimension"".month, ""DateDimension"".day, ""TimeDimension"".hour, ""TimeDimension"".minute"
" Sort Method: external merge Disk: 367048kB"
" -> Hash Left Join (cost=2007.74..136818.45 rows=3912168 width=84) (actual time=28.761..2878.526 rows=3912168 loops=1)"
" Hash Cond: (""ReportFacts"".time_id = ""TimeDimension"".id)"
" -> Hash Left Join (cost=1956.34..126467.38 rows=3912168 width=76) (actual time=28.161..1912.974 rows=3912168 loops=1)"
" Hash Cond: (""ReportFacts"".date_id = ""DateDimension"".id)"
" -> Seq Scan on ""ReportFacts"" (cost=0.00..114240.68 rows=3912168 width=60) (actual time=0.033..549.695 rows=3912168 loops=1)"
" -> Hash (cost=1362.82..1362.82 rows=47482 width=32) (actual time=27.877..27.878 rows=47482 loops=1)"
" Buckets: 65536 Batches: 1 Memory Usage: 3480kB"
" -> Seq Scan on ""DateDimension"" (cost=0.00..1362.82 rows=47482 width=32) (actual time=0.797..11.902 rows=47482 loops=1)"
" -> Hash (cost=33.40..33.40 rows=1440 width=24) (actual time=0.579..0.580 rows=1440 loops=1)"
" Buckets: 2048 Batches: 1 Memory Usage: 95kB"
" -> Seq Scan on ""TimeDimension"" (cost=0.00..33.40 rows=1440 width=24) (actual time=0.028..0.256 rows=1440 loops=1)"
" -> Hash (cost=4.94..4.94 rows=194 width=18) (actual time=0.090..0.090 rows=194 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 19kB"
" -> Seq Scan on ""DeviceDimension"" (cost=0.00..4.94 rows=194 width=18) (actual time=0.021..0.054 rows=194 loops=1)"
" -> Hash (cost=1.26..1.26 rows=26 width=36) (actual time=0.021..0.022 rows=26 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 10kB"
" -> Seq Scan on ""ProgramVersionDimension"" (cost=0.00..1.26 rows=26 width=36) (actual time=0.011..0.015 rows=26 loops=1)"
"Planning Time: 1.600 ms"
"Execution Time: 73183.007 ms"
```
Notes:
- Zmena poradi Joinu nema zasadni vliv na rychlost
#### Zrychleni s materialized view FactsWithDateTime
- Zrychleni na 70449.287 ms (priblizne o 15s oproti momentalni delce origninalniho dotazu (85205.838 ms))
```sql=
SELECT "DeviceDimension".device, program_ver, COUNT(*) number_of_crashes
FROM (SELECT
pda_imei, device_id, program_ver_id,
CASE
WHEN pda_run_time_previous < pda_run_time AND app_run_time_previous > app_run_time THEN 1
ELSE 0
END AS CRASH
FROM (SELECT device_id, program_ver_id, pda_imei, app_run_time, pda_run_time, LAG(pda_run_time, 1) OVER (PARTITION BY pda_imei ORDER BY year ASC, month, day, hour, minute) AS pda_run_time_previous,
LAG(app_run_time, 1) OVER (PARTITION BY pda_imei ORDER BY year ASC, month, day, hour, minute) AS app_run_time_previous
FROM xbezak."FactsWithDateTime"
ORDER BY year ASC, month, day, hour, minute) tb)
AS crashes
LEFT OUTER JOIN xbezak."DeviceDimension" ON device_id="DeviceDimension".id
LEFT OUTER JOIN xbezak."ProgramVersionDimension" ON program_ver_id="ProgramVersionDimension".id
WHERE crashes.crash=1
GROUP BY device, program_ver
ORDER BY number_of_crashes DESC
```
#### Pouziti IN predicate
- Pouziti In predicate nemelo zadny vliv na zrychleni.
#### Zrychleni s materialized view FactsWithDateTime a FactsWithDeviceAndProgramVersion
- Zrychleni na 75215.919 ms oproti originalnimu dotazu momentalne okolo 80738.224 ms. Cili cca o 5s oproti originalu.
```sql=
SELECT "FactsWithDeviceAndProgramVersion".device, "FactsWithDeviceAndProgramVersion".program_ver, COUNT(*) number_of_crashes
FROM (SELECT
pda_imei, tb.id,
CASE
WHEN pda_run_time_previous < pda_run_time AND app_run_time_previous > app_run_time THEN 1
ELSE 0
END AS CRASH
FROM (SELECT id, pda_imei, app_run_time, pda_run_time, LAG(pda_run_time, 1) OVER (PARTITION BY pda_imei ORDER BY year ASC, month, day, hour, minute) AS pda_run_time_previous,
LAG(app_run_time, 1) OVER (PARTITION BY pda_imei ORDER BY year ASC, month, day, hour, minute) AS app_run_time_previous
FROM xbezak."FactsWithDateTime"
ORDER BY year ASC, month, day, hour, minute) tb)
AS crashes
LEFT OUTER JOIN xgrygar."FactsWithDeviceAndProgramVersion" ON "FactsWithDeviceAndProgramVersion".id="crashes".id
WHERE crashes.crash=1
GROUP BY device, program_ver
ORDER BY number_of_crashes DESC
```
```
"Sort (cost=2241524.70..2241529.58 rows=1950 width=22) (actual time=71068.725..71068.736 rows=143 loops=1)"
" Sort Key: (count(*)) DESC"
" Sort Method: quicksort Memory: 36kB"
" -> GroupAggregate (cost=2241203.02..2241418.14 rows=1950 width=22) (actual time=71066.718..71068.658 rows=143 loops=1)"
" Group Key: ""FactsWithDeviceAndProgramVersion"".device, ""FactsWithDeviceAndProgramVersion"".program_ver"
" -> Sort (cost=2241203.02..2241251.93 rows=19562 width=14) (actual time=71066.706..71067.134 rows=10340 loops=1)"
" Sort Key: ""FactsWithDeviceAndProgramVersion"".device, ""FactsWithDeviceAndProgramVersion"".program_ver"
" Sort Method: quicksort Memory: 894kB"
" -> Hash Right Join (cost=2105843.93..2239808.67 rows=19562 width=14) (actual time=69868.560..71061.576 rows=10340 loops=1)"
" Hash Cond: (""FactsWithDeviceAndProgramVersion"".id = tb.id)"
" -> Seq Scan on ""FactsWithDeviceAndProgramVersion"" (cost=0.00..119097.18 rows=3912518 width=22) (actual time=0.061..827.484 rows=3912168 loops=1)"
" -> Hash (cost=2105599.40..2105599.40 rows=19562 width=8) (actual time=69867.314..69867.317 rows=10340 loops=1)"
" Buckets: 32768 Batches: 1 Memory Usage: 660kB"
" -> Subquery Scan on tb (cost=2027350.28..2105599.40 rows=19562 width=8) (actual time=63304.436..69863.720 rows=10340 loops=1)"
" Filter: (CASE WHEN ((tb.pda_run_time_previous < tb.pda_run_time) AND (tb.app_run_time_previous > tb.app_run_time)) THEN 1 ELSE 0 END = 1)"
" Rows Removed by Filter: 3901828"
" -> Sort (cost=2027350.28..2037131.42 rows=3912456 width=140) (actual time=63304.397..68925.391 rows=3912168 loops=1)"
" Sort Key: ""FactsWithDateTime"".year, ""FactsWithDateTime"".month, ""FactsWithDateTime"".day, ""FactsWithDateTime"".hour, ""FactsWithDateTime"".minute"
" Sort Method: external merge Disk: 404056kB"
" -> WindowAgg (cost=910136.51..1037291.33 rows=3912456 width=140) (actual time=19286.160..24882.051 rows=3912168 loops=1)"
" -> Sort (cost=910136.51..919917.65 rows=3912456 width=76) (actual time=19286.138..21202.214 rows=3912168 loops=1)"
" Sort Key: ""FactsWithDateTime"".pda_imei, ""FactsWithDateTime"".year, ""FactsWithDateTime"".month, ""FactsWithDateTime"".day, ""FactsWithDateTime"".hour, ""FactsWithDateTime"".minute"
" Sort Method: external merge Disk: 336400kB"
" -> Seq Scan on ""FactsWithDateTime"" (cost=0.00..134039.56 rows=3912456 width=76) (actual time=0.024..1560.209 rows=3912168 loops=1)"
"Planning Time: 0.908 ms"
"Execution Time: 71188.588 ms"
```
#### Zrychleni s materialized view FactsWithDeviceProgramDateTime
- **Zpomaleni** na 84698.290 ms z momentalni rychlosti originalniho dotazu 72757.651 ms
```sql=
SELECT device, program_ver, COUNT(*) number_of_crashes
FROM (SELECT
pda_imei, device, program_ver,
CASE
WHEN pda_run_time_previous < pda_run_time AND app_run_time_previous > app_run_time THEN 1
ELSE 0
END AS CRASH
FROM (SELECT device, program_ver, pda_imei, app_run_time, pda_run_time, LAG(pda_run_time, 1) OVER (PARTITION BY pda_imei ORDER BY year ASC, month, day, hour, minute) AS pda_run_time_previous,
LAG(app_run_time, 1) OVER (PARTITION BY pda_imei ORDER BY year ASC, month, day, hour, minute) AS app_run_time_previous
FROM xgrygar."FactsWithDeviceProgramDateTime"
ORDER BY year ASC, month, day, hour, minute) tb)
AS crashes
WHERE crashes.crash=1
GROUP BY device, program_ver
ORDER BY number_of_crashes DESC
```
#### Redukce poctu LAG operaci
- Pri zkouseni zredukovani poctu operaci LAG (spojeni do 1) v dotazu se cas nijak nezmenil.
#### Nove materialized view CrashView jen pro Q4
- Ano jsme si vedomi, ze tato cast neodpovida zadani, kde musime materialized view pouzit minimalne pro 2 dotazy.
- Duvod proc jsem to i tak chteli zkusit je, ze nasledujici view by mohlo doopravdy zrychlit tento dotaz, protoze predchozi pokusy byly prakticky marne.
```sql=
SELECT "FactsWithDeviceProgramDateTime".device,
"FactsWithDeviceProgramDateTime".program_ver,
"FactsWithDeviceProgramDateTime".pda_imei,
"FactsWithDeviceProgramDateTime".app_run_time,
"FactsWithDeviceProgramDateTime".pda_run_time,
lag("FactsWithDeviceProgramDateTime".pda_run_time, 1) OVER (PARTITION BY "FactsWithDeviceProgramDateTime".pda_imei ORDER BY "FactsWithDeviceProgramDateTime".year, "FactsWithDeviceProgramDateTime".month, "FactsWithDeviceProgramDateTime".day, "FactsWithDeviceProgramDateTime".hour, "FactsWithDeviceProgramDateTime".minute) AS pda_run_time_previous,
lag("FactsWithDeviceProgramDateTime".app_run_time, 1) OVER (PARTITION BY "FactsWithDeviceProgramDateTime".pda_imei ORDER BY "FactsWithDeviceProgramDateTime".year, "FactsWithDeviceProgramDateTime".month, "FactsWithDeviceProgramDateTime".day, "FactsWithDeviceProgramDateTime".hour, "FactsWithDeviceProgramDateTime".minute) AS app_run_time_previous
FROM "FactsWithDeviceProgramDateTime"
ORDER BY "FactsWithDeviceProgramDateTime".year, "FactsWithDeviceProgramDateTime".month, "FactsWithDeviceProgramDateTime".day, "FactsWithDeviceProgramDateTime".hour, "FactsWithDeviceProgramDateTime".minute;
```
**Pouziti View**
- Zrychleni na 1509.594 ms z momentalne originalnich 81689.375 ms
```sql=
SELECT device, program_ver, COUNT(*) number_of_crashes
FROM (SELECT
pda_imei, device, program_ver,
CASE
WHEN pda_run_time_previous < pda_run_time AND app_run_time_previous > app_run_time THEN 1
ELSE 0
END AS CRASH
FROM xgrygar."CrashView" )
AS crashes
WHERE crashes.crash=1
GROUP BY device, program_ver
ORDER BY number_of_crashes DESC
```
```
"Sort (cost=110815.91..110821.11 rows=2080 width=22) (actual time=1145.793..1145.800 rows=143 loops=1)"
" Sort Key: (count(*)) DESC"
" Sort Method: quicksort Memory: 36kB"
" -> HashAggregate (cost=110680.47..110701.27 rows=2080 width=22) (actual time=1145.702..1145.750 rows=143 loops=1)"
" Group Key: ""CrashView"".device, ""CrashView"".program_ver"
" Batches: 1 Memory Usage: 129kB"
" -> Seq Scan on ""CrashView"" (cost=0.00..110533.76 rows=19561 width=14) (actual time=0.018..1142.272 rows=10340 loops=1)"
" Filter: (CASE WHEN ((pda_run_time_previous < pda_run_time) AND (app_run_time_previous > app_run_time)) THEN 1 ELSE 0 END = 1)"
" Rows Removed by Filter: 3901828"
"Planning Time: 0.191 ms"
"Execution Time: 1145.888 ms"
```
#### Uprava dotazu
- poslednou moznostou by bolo jednodusit resp. zefektivnit samotny prikaz
- problemom je, ze pri nasej implementacii sme isli ovela viac do hlbky ako napriklad vzorove riesenie a kontorlujeme vyslovene predchadzajuci run_time a ten akutalny, co nam pridava vela komplexity do samotneho dotazu (najma pouzitie window functions - o ktorych casovej narocnosti sme sa presvecili mat. view z odstavca vyssie)
- pri uprave query bez window functions dochadza k totalnej zmene vysledkov a podla nasho nazoru prichadzame o detekciu niektorych crashov
- rozhodli sme sa preto dotaz dalej neupravovat
##### vysledek dotazu ze vzoroveho reseni
| app ver | device | crashes |
| -------- | -------- | -------- |
|"A56" | "KSA-LX9"| 288|
|"A56"| "DRA-L21" |285|
|"A56"| "HUAWEI CUN-L21"| 153|
|"A56"| "MYA-L41"| 130|
|"A56"| "Redmi Go" |79|
|"A56"| "Lenovo A6010"| 74|
|"A53"| "DRA-L21"| 74|
|"A53"| "HUAWEI CUN-L21"| 67|
|"A46" |"HUAWEI CUN-L21" |63|
|"A56"| "Mi Note 10"| 55|
##### vysledek naseho dotazu
| app ver | device | crashes |
| -------- | -------- | -------- |
|"A56"| "YAL-L41" |1574|
|"A56" |"Mi Note 10" |1569|
|"A56"| "Pixel 4" |1320|
|"A56"| "Nokia 2.3"| 1294|
|"A56"| "MYA-L41" |577|
|"A56"| "JSN-L21" |574|
|"A56"| "MRD-LX1" |406|
|"A56" |"KSA-LX9" |362|
|"A56" |"MAR-LX1A"| 265|
|"A56" |"AMN-LX9" |175|
## Q5 Identify “connectivity” problems –an app instance connecting too often (i.e., next connection in much shorter time than 5 mins)
#### Original
- Puvodni cas: 51468.216 ms (Momentalne originalni dotaz beha za 65122.524 ms)
- Pri dalsim behu dotazu beha originalni dotaz priblizne za 73931.437 ms
```sql=
SELECT pda_imei, device, program_ver, COUNT(*) number_of_disconnects
FROM
(SELECT
pda_imei, device, program_ver, hour, minute, prev_hour, prev_minute,
CASE
WHEN hour=prev_hour THEN
CASE WHEN minute-prev_minute<5 THEN 1
ELSE 0
END
ELSE
-- prechod cez hodinu
CASE WHEN (minute+60)-prev_minute<5 THEN 1
ELSE 0
END
END AS disconnent
FROM
(SELECT pda_imei,
device,
program_ver,
hour, minute,
LAG(hour, 1) OVER (PARTITION BY pda_imei ORDER BY year ASC, month, day, hour, minute) AS prev_hour,
LAG(minute, 1) OVER (PARTITION BY pda_imei ORDER BY year ASC, month, day, hour, minute) AS prev_minute
FROM xbezak."ReportFacts"
LEFT OUTER JOIN xbezak."DateDimension" ON date_id="DateDimension".id
LEFT OUTER JOIN xbezak."TimeDimension" ON time_id="TimeDimension".id
LEFT OUTER JOIN xbezak."DeviceDimension" ON device_id="DeviceDimension".id
LEFT OUTER JOIN xbezak."ProgramVersionDimension" ON program_ver_id=xbezak."ProgramVersionDimension".id
ORDER BY year ASC, month, day, hour, minute) lag_table) disconnect_table
WHERE disconnect_table.disconnent=1
GROUP BY ROLLUP(pda_imei, device, program_ver)
ORDER BY number_of_disconnects DESC, pda_imei
```
```
"Sort (cost=2062716.12..2062815.63 rows=39803 width=62) (actual time=74520.930..74521.041 rows=2218 loops=1)"
" Sort Key: (count(*)) DESC, lag_table.pda_imei"
" Sort Method: quicksort Memory: 270kB"
" -> GroupAggregate (cost=2059032.51..2059675.06 rows=39803 width=62) (actual time=74206.517..74519.072 rows=2218 loops=1)"
" Group Key: lag_table.pda_imei, lag_table.device, lag_table.program_ver"
" Group Key: lag_table.pda_imei, lag_table.device"
" Group Key: lag_table.pda_imei"
" Group Key: ()"
" -> Sort (cost=2059032.51..2059081.42 rows=19561 width=54) (actual time=74206.500..74405.809 rows=404312 loops=1)"
" Sort Key: lag_table.pda_imei, lag_table.device, lag_table.program_ver"
" Sort Method: external merge Disk: 15992kB"
" -> Subquery Scan on lag_table (cost=1998955.72..2057638.24 rows=19561 width=54) (actual time=67989.149..73677.453 rows=404312 loops=1)"
" Filter: (CASE WHEN (lag_table.hour = lag_table.prev_hour) THEN CASE WHEN ((lag_table.minute - lag_table.prev_minute) < '5'::double precision) THEN 1 ELSE 0 END ELSE CASE WHEN (((lag_table.minute + '60'::double precision) - lag_table.prev_minute) < '5'::double precision) THEN 1 ELSE 0 END END = 1)"
" Rows Removed by Filter: 3507856"
" -> Sort (cost=1998955.72..2008736.14 rows=3912168 width=110) (actual time=67989.108..73059.034 rows=3912168 loops=1)"
" Sort Key: ""DateDimension"".year, ""DateDimension"".month, ""DateDimension"".day, ""TimeDimension"".hour, ""TimeDimension"".minute"
" Sort Method: external merge Disk: 384296kB"
" -> WindowAgg (cost=988793.92..1115939.38 rows=3912168 width=110) (actual time=26354.206..32081.488 rows=3912168 loops=1)"
" -> Sort (cost=988793.92..998574.34 rows=3912168 width=94) (actual time=26354.180..28649.286 rows=3912168 loops=1)"
" Sort Key: ""ReportFacts"".pda_imei, ""DateDimension"".year, ""DateDimension"".month, ""DateDimension"".day, ""TimeDimension"".hour, ""TimeDimension"".minute"
" Sort Method: external merge Disk: 307968kB"
" -> Hash Left Join (cost=2016.69..159264.58 rows=3912168 width=94) (actual time=24.711..5139.610 rows=3912168 loops=1)"
" Hash Cond: (""ReportFacts"".program_ver_id = ""ProgramVersionDimension"".id)"
" -> Hash Left Join (cost=2015.11..147319.60 rows=3912168 width=74) (actual time=24.667..4281.327 rows=3912168 loops=1)"
" Hash Cond: (""ReportFacts"".device_id = ""DeviceDimension"".id)"
" -> Hash Left Join (cost=2007.74..136818.45 rows=3912168 width=72) (actual time=24.549..3460.432 rows=3912168 loops=1)"
" Hash Cond: (""ReportFacts"".time_id = ""TimeDimension"".id)"
" -> Hash Left Join (cost=1956.34..126467.38 rows=3912168 width=64) (actual time=23.903..2441.535 rows=3912168 loops=1)"
" Hash Cond: (""ReportFacts"".date_id = ""DateDimension"".id)"
" -> Seq Scan on ""ReportFacts"" (cost=0.00..114240.68 rows=3912168 width=48) (actual time=0.155..992.262 rows=3912168 loops=1)"
" -> Hash (cost=1362.82..1362.82 rows=47482 width=32) (actual time=23.431..23.432 rows=47482 loops=1)"
" Buckets: 65536 Batches: 1 Memory Usage: 3480kB"
" -> Seq Scan on ""DateDimension"" (cost=0.00..1362.82 rows=47482 width=32) (actual time=2.936..11.969 rows=47482 loops=1)"
" -> Hash (cost=33.40..33.40 rows=1440 width=24) (actual time=0.624..0.624 rows=1440 loops=1)"
" Buckets: 2048 Batches: 1 Memory Usage: 95kB"
" -> Seq Scan on ""TimeDimension"" (cost=0.00..33.40 rows=1440 width=24) (actual time=0.078..0.307 rows=1440 loops=1)"
" -> Hash (cost=4.94..4.94 rows=194 width=18) (actual time=0.098..0.098 rows=194 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 19kB"
" -> Seq Scan on ""DeviceDimension"" (cost=0.00..4.94 rows=194 width=18) (actual time=0.023..0.052 rows=194 loops=1)"
" -> Hash (cost=1.26..1.26 rows=26 width=36) (actual time=0.024..0.025 rows=26 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 10kB"
" -> Seq Scan on ""ProgramVersionDimension"" (cost=0.00..1.26 rows=26 width=36) (actual time=0.013..0.016 rows=26 loops=1)"
"Planning Time: 2.954 ms"
"Execution Time: 74635.285 ms"
```
#### Zrychleni s materaiized view FactsWithDateTime
- 64462.546 ms (cca 4s momentalne original beha za 68001.142 ms)
```sql=
SELECT pda_imei, device, program_ver, COUNT(*) number_of_disconnects
FROM
(SELECT
pda_imei, device, program_ver, hour, minute, prev_hour, prev_minute,
CASE
WHEN hour=prev_hour THEN
CASE WHEN minute-prev_minute<5 THEN 1
ELSE 0
END
ELSE
-- prechod cez hodinu
CASE WHEN (minute+60)-prev_minute<5 THEN 1
ELSE 0
END
END AS disconnent
FROM
(SELECT pda_imei,
device,
program_ver,
hour, minute,
LAG(hour, 1) OVER (PARTITION BY pda_imei ORDER BY year ASC, month, day, hour, minute) AS prev_hour,
LAG(minute, 1) OVER (PARTITION BY pda_imei ORDER BY year ASC, month, day, hour, minute) AS prev_minute
FROM xbezak."FactsWithDateTime"
LEFT OUTER JOIN xbezak."DeviceDimension" ON device_id="DeviceDimension".id
LEFT OUTER JOIN xbezak."ProgramVersionDimension" ON program_ver_id=xbezak."ProgramVersionDimension".id
ORDER BY year ASC, month, day, hour, minute) lag_table) disconnect_table
WHERE disconnect_table.disconnent=1
GROUP BY ROLLUP(pda_imei, device, program_ver)
ORDER BY number_of_disconnects DESC, pda_imei
```
#### Pouziti IN predicate
- Pouziti In predicate nemelo zadny vliv na zrychleni.
#### Zrychleni s materialized view FactsWithDateTime a FactsWithDeviceAndProgramVersion
- **Zpomaleni** o 8s na 82280.650 ms z originalniho dotazu ktery momentalne behal cca za 74193.155 ms
```sql=
SELECT pda_imei, device, program_ver, COUNT(*) number_of_disconnects
FROM
(SELECT
pda_imei, device, program_ver, hour, minute, prev_hour, prev_minute,
CASE
WHEN hour=prev_hour THEN
CASE WHEN minute-prev_minute<5 THEN 1
ELSE 0
END
ELSE
-- prechod cez hodinu
CASE WHEN (minute+60)-prev_minute<5 THEN 1
ELSE 0
END
END AS disconnent
FROM
(SELECT "FactsWithDateTime".pda_imei,
device,
program_ver,
hour, minute,
LAG(hour, 1) OVER (PARTITION BY "FactsWithDateTime".pda_imei ORDER BY year ASC, month, day, hour, minute) AS prev_hour,
LAG(minute, 1) OVER (PARTITION BY "FactsWithDateTime".pda_imei ORDER BY year ASC, month, day, hour, minute) AS prev_minute
FROM xbezak."FactsWithDateTime"
LEFT OUTER JOIN xgrygar."FactsWithDeviceAndProgramVersion" ON "FactsWithDeviceAndProgramVersion".id="FactsWithDateTime".id
ORDER BY year ASC, month, day, hour, minute) lag_table) disconnect_table
WHERE disconnect_table.disconnent=1
GROUP BY ROLLUP(pda_imei, device, program_ver)
ORDER BY number_of_disconnects DESC, pda_imei
```
#### Zrychleni s materialized view
- zrychleni na 70477.253 ms z momentalni rychlosti 73916.948 ms originalniho dotazu
```sql=
SELECT pda_imei, device, program_ver, COUNT(*) number_of_disconnects
FROM
(SELECT
pda_imei, device, program_ver, hour, minute, prev_hour, prev_minute,
CASE
WHEN hour=prev_hour THEN
CASE WHEN minute-prev_minute<5 THEN 1
ELSE 0
END
ELSE
-- prechod cez hodinu
CASE WHEN (minute+60)-prev_minute<5 THEN 1
ELSE 0
END
END AS disconnent
FROM
(SELECT pda_imei,
device,
program_ver,
hour, minute,
LAG(hour, 1) OVER (PARTITION BY pda_imei ORDER BY year ASC, month, day, hour, minute) AS prev_hour,
LAG(minute, 1) OVER (PARTITION BY pda_imei ORDER BY year ASC, month, day, hour, minute) AS prev_minute
FROM xgrygar."FactsWithDeviceProgramDateTime"
ORDER BY year ASC, month, day, hour, minute) lag_table) disconnect_table
WHERE disconnect_table.disconnent=1
GROUP BY ROLLUP(pda_imei, device, program_ver)
ORDER BY number_of_disconnects DESC, pda_imei
```
```
"Sort (cost=1878261.47..1878354.63 rows=37263 width=38) (actual time=71012.038..71012.146 rows=2218 loops=1)"
" Sort Key: (count(*)) DESC, lag_table.pda_imei"
" Sort Method: quicksort Memory: 270kB"
" -> GroupAggregate (cost=1874815.04..1875432.20 rows=37263 width=38) (actual time=70694.219..71010.163 rows=2218 loops=1)"
" Group Key: lag_table.pda_imei, lag_table.device, lag_table.program_ver"
" Group Key: lag_table.pda_imei, lag_table.device"
" Group Key: lag_table.pda_imei"
" Group Key: ()"
" -> Sort (cost=1874815.04..1874863.95 rows=19562 width=30) (actual time=70694.200..70895.503 rows=404312 loops=1)"
" Sort Key: lag_table.pda_imei, lag_table.device, lag_table.program_ver"
" Sort Method: external merge Disk: 15992kB"
" -> Subquery Scan on lag_table (cost=1814734.28..1873420.68 rows=19562 width=30) (actual time=64332.861..70141.544 rows=404312 loops=1)"
" Filter: (CASE WHEN (lag_table.hour = lag_table.prev_hour) THEN CASE WHEN ((lag_table.minute - lag_table.prev_minute) < '5'::double precision) THEN 1 ELSE 0 END ELSE CASE WHEN (((lag_table.minute + '60'::double precision) - lag_table.prev_minute) < '5'::double precision) THEN 1 ELSE 0 END END = 1)"
" Rows Removed by Filter: 3507856"
" -> Sort (cost=1814734.28..1824515.35 rows=3912427 width=86) (actual time=64332.821..69510.462 rows=3912168 loops=1)"
" Sort Key: ""FactsWithDeviceProgramDateTime"".year, ""FactsWithDeviceProgramDateTime"".month, ""FactsWithDeviceProgramDateTime"".day, ""FactsWithDeviceProgramDateTime"".hour, ""FactsWithDeviceProgramDateTime"".minute"
" Sort Method: external merge Disk: 384296kB"
" -> WindowAgg (cost=884739.84..1011893.71 rows=3912427 width=86) (actual time=22855.551..28579.985 rows=3912168 loops=1)"
" -> Sort (cost=884739.84..894520.90 rows=3912427 width=70) (actual time=22855.530..25154.008 rows=3912168 loops=1)"
" Sort Key: ""FactsWithDeviceProgramDateTime"".pda_imei, ""FactsWithDeviceProgramDateTime"".year, ""FactsWithDeviceProgramDateTime"".month, ""FactsWithDeviceProgramDateTime"".day, ""FactsWithDeviceProgramDateTime"".hour, ""FactsWithDeviceProgramDateTime"".minute"
" Sort Method: external merge Disk: 307968kB"
" -> Seq Scan on ""FactsWithDeviceProgramDateTime"" (cost=0.00..135393.27 rows=3912427 width=70) (actual time=0.026..1756.121 rows=3912168 loops=1)"
"Planning Time: 0.476 ms"
"Execution Time: 71127.691 ms"
```
#### Nove materialized view DisconnectView jen pro Q5
- Ano jsme si vedomi, ze tato cast neodpovida zadani, kde musime materialized view pouzit minimalne pro 2 dotazy.
- Duvod proc jsem to i tak chteli zkusit je, ze nasledujici view by mohlo doopravdy zrychlit tento dotaz, protoze predchozi pokusy byly prakticky marne.
```sql=
SELECT "FactsWithDeviceProgramDateTime".pda_imei,
"FactsWithDeviceProgramDateTime".device,
"FactsWithDeviceProgramDateTime".program_ver,
"FactsWithDeviceProgramDateTime".hour,
"FactsWithDeviceProgramDateTime".minute,
lag("FactsWithDeviceProgramDateTime".hour, 1) OVER (PARTITION BY "FactsWithDeviceProgramDateTime".pda_imei ORDER BY "FactsWithDeviceProgramDateTime".year, "FactsWithDeviceProgramDateTime".month, "FactsWithDeviceProgramDateTime".day, "FactsWithDeviceProgramDateTime".hour, "FactsWithDeviceProgramDateTime".minute) AS prev_hour,
lag("FactsWithDeviceProgramDateTime".minute, 1) OVER (PARTITION BY "FactsWithDeviceProgramDateTime".pda_imei ORDER BY "FactsWithDeviceProgramDateTime".year, "FactsWithDeviceProgramDateTime".month, "FactsWithDeviceProgramDateTime".day, "FactsWithDeviceProgramDateTime".hour, "FactsWithDeviceProgramDateTime".minute) AS prev_minute
FROM "FactsWithDeviceProgramDateTime"
ORDER BY "FactsWithDeviceProgramDateTime".year, "FactsWithDeviceProgramDateTime".month, "FactsWithDeviceProgramDateTime".day, "FactsWithDeviceProgramDateTime".hour, "FactsWithDeviceProgramDateTime".minute;
```
**Pouziti View**
- Zrychleni na 1862.180 ms z momentalne originalnich 64767.946 ms
```sql=
SELECT pda_imei, device, program_ver, COUNT(*) number_of_disconnects
FROM
(SELECT
pda_imei, device, program_ver, hour, minute, prev_hour, prev_minute,
CASE
WHEN hour=prev_hour THEN
CASE WHEN minute-prev_minute<5 THEN 1
ELSE 0
END
ELSE
-- prechod cez hodinu
CASE WHEN (minute+60)-prev_minute<5 THEN 1
ELSE 0
END
END AS disconnent
FROM
xgrygar."DisconnectView") disconnect_table
WHERE disconnect_table.disconnent=1
GROUP BY ROLLUP(pda_imei, device, program_ver)
ORDER BY number_of_disconnects DESC, pda_imei
```
```
"Sort (cost=99218.39..99311.88 rows=37396 width=38) (actual time=1711.340..1711.441 rows=2218 loops=1)"
" Sort Key: (count(*)) DESC, ""DisconnectView"".pda_imei"
" Sort Method: quicksort Memory: 270kB"
" -> GroupAggregate (cost=95759.58..96378.05 rows=37396 width=38) (actual time=1398.075..1709.552 rows=2218 loops=1)"
" Group Key: ""DisconnectView"".pda_imei, ""DisconnectView"".device, ""DisconnectView"".program_ver"
" Group Key: ""DisconnectView"".pda_imei, ""DisconnectView"".device"
" Group Key: ""DisconnectView"".pda_imei"
" Group Key: ()"
" -> Sort (cost=95759.58..95808.48 rows=19561 width=30) (actual time=1398.057..1596.517 rows=404312 loops=1)"
" Sort Key: ""DisconnectView"".pda_imei, ""DisconnectView"".device, ""DisconnectView"".program_ver"
" Sort Method: external merge Disk: 15992kB"
" -> Seq Scan on ""DisconnectView"" (cost=0.00..94365.30 rows=19561 width=30) (actual time=0.021..868.988 rows=404312 loops=1)"
" Filter: (CASE WHEN (hour = prev_hour) THEN CASE WHEN ((minute - prev_minute) < '5'::double precision) THEN 1 ELSE 0 END ELSE CASE WHEN (((minute + '60'::double precision) - prev_minute) < '5'::double precision) THEN 1 ELSE 0 END END = 1)"
" Rows Removed by Filter: 3507856"
"Planning Time: 0.244 ms"
"Execution Time: 1714.540 ms"
```
#### Uprava dotazu
- podobne ako pri predchadzajucom query, tak nas povodny dotaz bol ponaty ovela detailnejsie, tu uz sice aj v pripade vzoroveho riesenia dochadza k pouzitiu window function, nas dotaz ale pocita aj pocet disconnectov, co je podla nas velmi dolezita informacia
- vysledky su oproti vzorovemu dotazu tiez ine, takze logika bola pouzita tiez ina
- kedze nam tento dotaz pride informativnejsi a nenasli sme nejaky sposob, akym rovnaku informaciu vytiahnut z databazy jednuduchsie (plati aj pre Q4), rozhodli sme sa ho dalej neupravovat
##### vysledek dotazu ze vzoroveho reseni
```
"860556046015392" "AGS2-L09" "A53"
"AGS2-L09" "A53"
"860540044404873" "AMN-LX9" "A53"
"860540044405193" "AMN-LX9" "A53"
"860540044443665" "AMN-LX9" "A53"
"860540044484016" "AMN-LX9" "A53"
"860540044908667" "AMN-LX9" "A53"
"861366047251965" "AMN-LX9" "A53"
"861945048272581" "AMN-LX9" "A53"
"861945048289874" "AMN-LX9" "A53"
```
##### vysledek naseho dotazu
```
221170
"0123456789ABCDEF" 55546
"0123456789ABCDEF" "SM-G975F" 8427
"0123456789ABCDEF" "SM-G975F" "A56" 8427
"0123456789ABCDEF" "Pixel 4" 8107
"0123456789ABCDEF" "Pixel 4" "A56" 8107
"0123456789ABCDEF" "Mi Note 10" 6425
"0123456789ABCDEF" "Mi Note 10" "A56" 6425
"861945048322394" "AMN-LX9" 5818
"861945048322394" 5818
"861945048322394" "AMN-LX9" "A54" 5818
"861366047251965" "AMN-LX9" 5816
```
### Zaver
- co sa tyka dodrzania zadania vseobecne sa nam dotazy podarilo zrychlit, ziadny ale na polovicu bez pouzitia velmi specializovanych views
- ohladom ceny vytvorenia nasich pomocnych struktur tak je velkou vyhodou datoveho skladu ze pocas prace s nim sa data nemenia, v OLTP ak by sme pouzili priliz vela indexov, tak by sa nam spomalili aktualizacia tabuliek kedze by sa museli akutalizovat struktury
- v nasom datovom sklade sa ale tieto aktualizacie nedeju a indexy resp. materialized views sa vytvoria pri samotnom naplneni
- vzdy ale zaberaju nejake miesto v pamati databazy a toto miesto nie je zadarmo, podla nas je toto ale vynahradene zrychlenim dotazov, pri views kde dopredu pocitame joiny je zrychlenie mensie (max 40%), tieto su ale vseobecne a su pouzite v minimalne dvoch queries
- k radikalnemu zrychleniu sa nam podarilo dospiet pri az pri poslednych dvoch dotazoch, kde sme si predpocitali narocnu operaciu LAGu, toto materialized view sme sice pouzili potom iba v 1 dotaze, cena by sa ale mohla vratit kebyze robime podobne dotazy, kde ho znovu vyuzijeme, alebo ho pustame opakovane
- Co se tyce prvnich 3 dotazu (Q1 - Q3), ty se nam podarilo znatelne zrychlit v ramci zadani. Sice ne o 50 %, ale jeden se tomu velmi blizil. Nejvic pomohli materialized view, ktere jsou pouzite u vice dotazu --> tedy v ramci zadani.
- K dotazum Q4 a Q5, tam materialized view a ostatni techniky moc nepomohli. Nejen ze odchylka casu byla velka, tedy casy s pouzitim nespecializovanych materialized view a ostatnich technik byly, da se rict, v ramci teto odchylky. Realne meritelne zlepseni prineslo az pouziti specializovanych materialized view, ktere nesplnuji zdani. V tu chvili se oba dotazy dokazali radove (≈50x) zlepsit.