Adamek
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights
    • Engagement control
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Versions and GitHub Sync Note Insights Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       owned this note    owned this note      
    Published Linked with GitHub
    Subscribed
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    Subscribe
    # PA220 HA3 Adam Grygar, Miroslav Bezak ## TODO - [x] Zeptat se zda tam musime davat ten cely shit z explain analyze ke kazde picovine - [ ] Pouzit jejich dotazy - [ ] Dat explain analyze na prvni a posledni dotaz ## Uvodni poznamky - Puvodni casy originalnich dotazu se v prubehu dni postupne menily. Vzdycky jsme se snazili overit cas puvodniho proti novemu dotazu v danem case. - Pro minimalizovani dochylky casu behu, jsme pousteli jak puvodni tak optimalizovany dotaz nekolikrat. Nakonec jsme vybrali nejlepe reprezentativni casy. - U dotazu Q4 a Q5 je odchylka casu pri kazdem jednom behu dotazu velka. Neda se proto poradne merit zrychleni. Tak jako tak zrychleni tam nemuzeme moc najit a maximalne se jedna o par sekund v par bezich. ## Materialized View FactsWithDateTime - dopredne spojenie faktov s datumom a casom, tieto dve dimenzie sa tam najcastejsie joinuju ```sql= SELECT "ReportFacts".id, "ReportFacts".program_ver_id, "ReportFacts".device_id, "DateDimension".day, "DateDimension".month, "DateDimension".year, "DateDimension".quarter, "DateDimension".day_of_week, "TimeDimension".hour, "TimeDimension".minute, "ReportFacts".tracking_mode_id, "ReportFacts".gsmnet_id, "ReportFacts".method_id, "ReportFacts".car_id, "ReportFacts".app_run_time, "ReportFacts".pda_run_time, "ReportFacts".battery_level, "ReportFacts".sim_imsi, "ReportFacts".pda_imei FROM "ReportFacts" LEFT JOIN "DateDimension" ON "ReportFacts".date_id = "DateDimension".id LEFT JOIN "TimeDimension" ON "ReportFacts".time_id = "TimeDimension".id; ``` ## Materialized View FactsWithDeviceAndProgramVersion - spojeni faktu s program version a device ```sql= SELECT "ReportFacts".id, "ReportFacts".tracking_mode_id, "ReportFacts".gsmnet_id, "ReportFacts".method_id, "ReportFacts".car_id, "ReportFacts".app_run_time, "ReportFacts".pda_run_time, "ReportFacts".battery_level, "ReportFacts".sim_imsi, "ReportFacts".pda_imei, "ReportFacts".date_id, "ReportFacts".time_id, "ProgramVersionDimension".program_ver, "ProgramVersionDimension".platform, "DeviceDimension".device FROM xbezak."ReportFacts" LEFT JOIN xbezak."ProgramVersionDimension" ON "ReportFacts".program_ver_id = "ProgramVersionDimension".id LEFT JOIN xbezak."DeviceDimension" ON "ReportFacts".device_id = "DeviceDimension".id; ``` ## Materialized View FactsWithDeviceProgramDateTime - spojeni faktu s program version, device, date, time ```sql= SELECT "ReportFacts".id, "DateDimension".day, "DateDimension".month, "DateDimension".year, "DateDimension".quarter, "DateDimension".day_of_week, "TimeDimension".hour, "TimeDimension".minute, "ReportFacts".tracking_mode_id, "ReportFacts".gsmnet_id, "ReportFacts".method_id, "ReportFacts".car_id, "ReportFacts".app_run_time, "ReportFacts".pda_run_time, "ReportFacts".battery_level, "ReportFacts".sim_imsi, "ReportFacts".pda_imei, "ProgramVersionDimension".program_ver, "ProgramVersionDimension".platform, "DeviceDimension".device FROM xbezak."ReportFacts" LEFT JOIN xbezak."DateDimension" ON "ReportFacts".date_id = "DateDimension".id LEFT JOIN xbezak."TimeDimension" ON "ReportFacts".time_id = "TimeDimension".id LEFT JOIN xbezak."ProgramVersionDimension" ON "ReportFacts".program_ver_id = "ProgramVersionDimension".id LEFT JOIN xbezak."DeviceDimension" ON "ReportFacts".device_id = "DeviceDimension".id; ``` ## Indexacia - najefektivnejsim sposobom ako zrychlit dotazy nad databazou je indexacia - index je vzdy nad primarnym klucom explicitne - je dobrou praktikou indexovat aj na cudzich klucoch, casto sa podla nich vyhladava and / or spaja - tieto indexy sme pridali do tabulky faktov - dalej sa je treba sa zamysliet nad indexami nad dalsimi stlpcami, chceme sa vyhnut sekvencemu prechadziu tabuliek pri nasich dotazoch (hladanie konkretnej hodnoty alebo intervalu) - v tomto nam mozu exekucne plany ktore sa tvoria pred kazdym dotazom, v PgAdmin vieme zobrazit pekne pomocou explain analyze tlacidla alebo priamo v textovom vypise prikazu EXPLAIN ANALYZE - cielom bolo pridat indexy tak, ze sa zbavime vsetkych sekvenych skenov, zostanu nam iba indexove a ine rychle skeny - indexy v databazach zvyknu mat velky branching faktor, takze na niekolko urovni hlbky sa vieme dostat ku vsetkym zaznamom aj pri milionoch riadkoch - to bola ale teoreticka idylka, ktora potom narazila na krutu realitu, po pridani indexov ich databazovy engine vobec nepouzil k vyhodnoteniu dotazov, nedoslo teda k zrychleniu - napadlo nam, ze engine proste spravil chybu vo vytvoreni execution planu a ked ho donutime tieto indexy pouzit, tak sa dotazy zrychlia - bohuzial sme sa ale mylili a po vynuteni indexov pomocou `SET enable_seqscan = OFF;` sa dotazy nezrychlili, rychlost bola rovnaka alebo dokonca horsia, takze chyba nebola v planovani enginu - nie sme si uplne 100% isty, preco doslo k takejto situacii ale dovodov moze byt viacero - indexy najvacsi rozdiel spravia ked hladame konkretnu hodnotu alebo interval vo velkej tabulke, taketo poziadavky sa ale v nasich dotazoch moc neobjavuju - pri spajani a groupovani sa v nasom pripade viac oplati prejst raz celu tabulku a najst vsetky hladane hodnoty ako sa stale pozerat do indexu; `O(n) vs O(n * log n)` ## Q1 List connection count per app version ad day during october 2020 #### Original - povodny cas: 2040.211 ms ```sql= SELECT day, program_ver, COUNT(*) connection_count FROM xbezak."ReportFacts" LEFT OUTER JOIN xbezak."DateDimension" ON date_id="DateDimension".id LEFT OUTER JOIN xbezak."ProgramVersionDimension" ON program_ver_id="ProgramVersionDimension".id WHERE "DateDimension".month=10 GROUP BY ROLLUP(day, program_ver) ORDER BY day DESC, program_ver DESC ``` ``` "Sort (cost=131633.41..131635.50 rows=838 width=44) (actual time=2961.060..2961.086 rows=496 loops=1)" " Sort Key: ""DateDimension"".day DESC, ""ProgramVersionDimension"".program_ver DESC" " Sort Method: quicksort Memory: 62kB" " -> MixedAggregate (cost=1070.05..131592.72 rows=838 width=44) (actual time=2960.449..2960.553 rows=496 loops=1)" " Hash Key: ""DateDimension"".day, ""ProgramVersionDimension"".program_ver" " Hash Key: ""DateDimension"".day" " Group Key: ()" " Batches: 1 Memory Usage: 113kB" " -> Hash Left Join (cost=1070.05..126596.29 rows=332537 width=36) (actual time=29.362..2250.775 rows=1956395 loops=1)" " Hash Cond: (""ReportFacts"".program_ver_id = ""ProgramVersionDimension"".id)" " -> Hash Join (cost=1068.47..125579.50 rows=332537 width=16) (actual time=28.874..1800.842 rows=1956395 loops=1)" " Hash Cond: (""ReportFacts"".date_id = ""DateDimension"".id)" " -> Seq Scan on ""ReportFacts"" (cost=0.00..114240.68 rows=3912168 width=16) (actual time=1.193..1064.311 rows=3912168 loops=1)" " -> Hash (cost=1018.02..1018.02 rows=4036 width=16) (actual time=26.984..26.986 rows=4030 loops=1)" " Buckets: 4096 Batches: 1 Memory Usage: 221kB" " -> Bitmap Heap Scan on ""DateDimension"" (cost=79.57..1018.02 rows=4036 width=16) (actual time=1.953..25.921 rows=4030 loops=1)" " Recheck Cond: (month = '10'::double precision)" " Heap Blocks: exact=167" " -> Bitmap Index Scan on ""MonthIndex"" (cost=0.00..78.56 rows=4036 width=0) (actual time=1.708..1.708 rows=4030 loops=1)" " Index Cond: (month = '10'::double precision)" " -> Hash (cost=1.26..1.26 rows=26 width=36) (actual time=0.464..0.465 rows=26 loops=1)" " Buckets: 1024 Batches: 1 Memory Usage: 10kB" " -> Seq Scan on ""ProgramVersionDimension"" (cost=0.00..1.26 rows=26 width=36) (actual time=0.449..0.453 rows=26 loops=1)" "Planning Time: 8.789 ms" "Execution Time: 2961.244 ms" ``` #### Notes - Zmena poradi joinu nevedla ke zvyseni vykonu #### Zrychlenie s Materialized View FactsWithDateTime - zlepsenie casu na 1551.123 ms ```sql= SELECT day, program_ver, COUNT(*) connection_count FROM xbezak."FactsWithDateTime" LEFT OUTER JOIN xbezak."ProgramVersionDimension" ON program_ver_id="ProgramVersionDimension".id WHERE "FactsWithDateTime".month=10 GROUP BY ROLLUP(day, program_ver) ORDER BY day DESC, program_ver DESC ``` #### Pouziti IN predicate - Pouziti In predicate nemelo zadny vliv na zrychleni. #### Zrychleni s Materialized View - Zrychleni na 1516.949 ms z originalniho dotazu, ktery momentalne beha za 2373.259 ms ```sql= SELECT day, program_ver, COUNT(*) connection_count FROM xgrygar."FactsWithDeviceProgramDateTime" WHERE "FactsWithDeviceProgramDateTime".month=10 GROUP BY ROLLUP(day, program_ver) ORDER BY day DESC, program_ver DESC ``` ``` "Sort (cost=174677.76..174679.08 rows=528 width=20) (actual time=2163.157..2163.181 rows=496 loops=1)" " Sort Key: day DESC, program_ver DESC" " Sort Method: quicksort Memory: 62kB" " -> MixedAggregate (cost=0.00..174653.88 rows=528 width=20) (actual time=2162.622..2162.721 rows=496 loops=1)" " Hash Key: day, program_ver" " Hash Key: day" " Group Key: ()" " Batches: 1 Memory Usage: 113kB" " -> Seq Scan on ""FactsWithDeviceProgramDateTime"" (cost=0.00..145174.34 rows=1964951 width=12) (actual time=2.022..1276.961 rows=1956395 loops=1)" " Filter: (month = '10'::double precision)" " Rows Removed by Filter: 1955773" "Planning Time: 0.756 ms" "Execution Time: 2163.273 ms" ``` ## Q2 List the connections counts in individual operator networks per weekday and month. #### Original - Puvodni cas: 9417.357 ms ```sql= SELECT day, month, network, country, COUNT(*) connection_count FROM xbezak."ReportFacts" LEFT OUTER JOIN xbezak."DateDimension" ON date_id="DateDimension".id LEFT OUTER JOIN xbezak."GsmNetDimension" ON "GsmNetDimension".id=gsmnet_id GROUP BY ROLLUP(network, day, month, country) ORDER BY month ASC, day ASC, connection_count DESC ``` ``` "Sort (cost=951729.35..953777.99 rows=819457 width=48) (actual time=8947.804..8948.095 rows=5820 loops=1)" " Sort Key: ""DateDimension"".month, ""DateDimension"".day, (count(*)) DESC" " Sort Method: quicksort Memory: 694kB" " -> GroupAggregate (cost=779153.43..846030.52 rows=819457 width=48) (actual time=6759.411..8941.431 rows=5820 loops=1)" " Group Key: ""GsmNetDimension"".network, ""DateDimension"".day, ""DateDimension"".month, ""GsmNetDimension"".country" " Group Key: ""GsmNetDimension"".network, ""DateDimension"".day, ""DateDimension"".month" " Group Key: ""GsmNetDimension"".network, ""DateDimension"".day" " Group Key: ""GsmNetDimension"".network" " Group Key: ()" " -> Sort (cost=779153.43..788933.85 rows=3912168 width=40) (actual time=6757.729..8028.101 rows=3912168 loops=1)" " Sort Key: ""GsmNetDimension"".network, ""DateDimension"".day, ""DateDimension"".month, ""GsmNetDimension"".country" " Sort Method: external merge Disk: 163112kB" " -> Hash Left Join (cost=2026.81..136832.09 rows=3912168 width=40) (actual time=31.979..2494.531 rows=3912168 loops=1)" " Hash Cond: (""ReportFacts"".gsmnet_id = ""GsmNetDimension"".id)" " -> Hash Left Join (cost=1956.34..126467.38 rows=3912168 width=24) (actual time=30.373..1639.844 rows=3912168 loops=1)" " Hash Cond: (""ReportFacts"".date_id = ""DateDimension"".id)" " -> Seq Scan on ""ReportFacts"" (cost=0.00..114240.68 rows=3912168 width=16) (actual time=0.051..574.183 rows=3912168 loops=1)" " -> Hash (cost=1362.82..1362.82 rows=47482 width=24) (actual time=29.727..29.728 rows=47482 loops=1)" " Buckets: 65536 Batches: 1 Memory Usage: 3109kB" " -> Seq Scan on ""DateDimension"" (cost=0.00..1362.82 rows=47482 width=24) (actual time=7.436..17.498 rows=47482 loops=1)" " -> Hash (cost=48.54..48.54 rows=1754 width=32) (actual time=1.580..1.582 rows=1754 loops=1)" " Buckets: 2048 Batches: 1 Memory Usage: 134kB" " -> Seq Scan on ""GsmNetDimension"" (cost=0.00..48.54 rows=1754 width=32) (actual time=0.784..1.134 rows=1754 loops=1)" "Planning Time: 6.835 ms" "Execution Time: 8981.153 ms" ``` #### Notes - Zmena poradi joinu vedla ke zlepseni casu prumerne o sekundu a to na 8539.284 ms #### Zrychlenie s Materialized View FactsWithDateTime - zlepsenie casu na 7598.753 ms ```sql= SELECT day, month, network, country, COUNT(*) connection_count FROM xbezak."FactsWithDateTime" LEFT OUTER JOIN xbezak."GsmNetDimension" ON "GsmNetDimension".id=gsmnet_id GROUP BY ROLLUP(network, day, month, country) ORDER BY month ASC, day ASC, connection_count DESC ``` ``` "Sort (cost=875357.01..875932.53 rows=230209 width=48) (actual time=8407.352..8407.614 rows=5820 loops=1)" " Sort Key: ""FactsWithDateTime"".month, ""FactsWithDateTime"".day, (count(*)) DESC" " Sort Method: quicksort Memory: 694kB" " -> GroupAggregate (cost=786780.99..847769.92 rows=230209 width=48) (actual time=6247.404..8401.379 rows=5820 loops=1)" " Group Key: ""GsmNetDimension"".network, ""FactsWithDateTime"".day, ""FactsWithDateTime"".month, ""GsmNetDimension"".country" " Group Key: ""GsmNetDimension"".network, ""FactsWithDateTime"".day, ""FactsWithDateTime"".month" " Group Key: ""GsmNetDimension"".network, ""FactsWithDateTime"".day" " Group Key: ""GsmNetDimension"".network" " Group Key: ()" " -> Sort (cost=786780.99..796562.13 rows=3912456 width=40) (actual time=6245.732..7488.417 rows=3912168 loops=1)" " Sort Key: ""GsmNetDimension"".network, ""FactsWithDateTime"".day, ""FactsWithDateTime"".month, ""GsmNetDimension"".country" " Sort Method: external merge Disk: 163112kB" " -> Hash Left Join (cost=70.47..144405.04 rows=3912456 width=40) (actual time=1.281..2193.020 rows=3912168 loops=1)" " Hash Cond: (""FactsWithDateTime"".gsmnet_id = ""GsmNetDimension"".id)" " -> Seq Scan on ""FactsWithDateTime"" (cost=0.00..134039.56 rows=3912456 width=24) (actual time=0.555..1191.550 rows=3912168 loops=1)" " -> Hash (cost=48.54..48.54 rows=1754 width=32) (actual time=0.711..0.713 rows=1754 loops=1)" " Buckets: 2048 Batches: 1 Memory Usage: 134kB" " -> Seq Scan on ""GsmNetDimension"" (cost=0.00..48.54 rows=1754 width=32) (actual time=0.034..0.333 rows=1754 loops=1)" "Planning Time: 0.744 ms" "Execution Time: 8437.540 ms" ``` ## Q3 Report on oldest and newest app version for all devices #### Original - Puvodni cas: 1339.753 ms (Momentalne originalni dotaz beha za 4142.505 ms, viz. Notes) ```sql= SELECT device, MIN(program_ver) min_version, MAX(program_ver) max_version FROM xbezak."ReportFacts" LEFT OUTER JOIN xbezak."DeviceDimension" ON "DeviceDimension".id=device_id LEFT OUTER JOIN xbezak."ProgramVersionDimension" ON "ProgramVersionDimension".id=program_ver_id GROUP BY device ``` ``` "HashAggregate (cost=166028.08..166030.01 rows=193 width=74) (actual time=4398.415..4398.452 rows=137 loops=1)" " Group Key: ""DeviceDimension"".device" " Batches: 1 Memory Usage: 64kB" " -> Hash Left Join (cost=8.95..136686.82 rows=3912168 width=38) (actual time=0.233..2698.644 rows=3912168 loops=1)" " Hash Cond: (""ReportFacts"".program_ver_id = ""ProgramVersionDimension"".id)" " -> Hash Left Join (cost=7.37..124741.83 rows=3912168 width=18) (actual time=0.198..1717.027 rows=3912168 loops=1)" " Hash Cond: (""ReportFacts"".device_id = ""DeviceDimension"".id)" " -> Seq Scan on ""ReportFacts"" (cost=0.00..114240.68 rows=3912168 width=16) (actual time=0.043..597.448 rows=3912168 loops=1)" " -> Hash (cost=4.94..4.94 rows=194 width=18) (actual time=0.141..0.142 rows=194 loops=1)" " Buckets: 1024 Batches: 1 Memory Usage: 19kB" " -> Seq Scan on ""DeviceDimension"" (cost=0.00..4.94 rows=194 width=18) (actual time=0.027..0.073 rows=194 loops=1)" " -> Hash (cost=1.26..1.26 rows=26 width=36) (actual time=0.024..0.024 rows=26 loops=1)" " Buckets: 1024 Batches: 1 Memory Usage: 10kB" " -> Seq Scan on ""ProgramVersionDimension"" (cost=0.00..1.26 rows=26 width=36) (actual time=0.009..0.015 rows=26 loops=1)" "Planning Time: 0.948 ms" "Execution Time: 4398.564 ms" ``` #### Notes: - Kdyz jsme puvodne delali HA2, tento dotaz zbehl za 1.3s. Momentalne puvodni dotaz beha za priblizne 4s. Nevime pricinu teto zmeny. Moznostou je vyssie vytazenie databazoveho serveru. - Zmena poradi Joinu nema vliv na rychlost. #### Zrychleni s Materialized view FactsWithDeviceAndProgramVersion - Zrychleni na 2581.111 ms (zrychleni o cca 2s origninal momentalne beha za 4443.858 ms) ```sql= SELECT device, MIN(program_ver) min_version, MAX(program_ver) max_version FROM xgrygar."FactsWithDeviceAndProgramVersion" GROUP BY device ``` ``` "HashAggregate (cost=148441.07..148442.36 rows=130 width=74) (actual time=2990.686..2990.725 rows=137 loops=1)" " Group Key: device" " Batches: 1 Memory Usage: 64kB" " -> Seq Scan on ""FactsWithDeviceAndProgramVersion"" (cost=0.00..119097.18 rows=3912518 width=14) (actual time=0.460..1004.487 rows=3912168 loops=1)" "Planning Time: 0.154 ms" "Execution Time: 2990.773 ms" ``` ## Q4 Report on the reliability of devices – number of app restarts without device restart #### Original - Puvodni cas: 58350.471 ms (Momentalne originalni dotaz beha za 71688.824 ms) - Dalsi zpomaleni originalniho dotazu na 85734.944 ms ```sql= SELECT "DeviceDimension".device, program_ver, COUNT(*) number_of_crashes FROM (SELECT pda_imei, device_id, program_ver_id, CASE WHEN pda_run_time_previous < pda_run_time AND app_run_time_previous > app_run_time THEN 1 ELSE 0 END AS CRASH FROM (SELECT device_id, program_ver_id, pda_imei, app_run_time, pda_run_time, LAG(pda_run_time, 1) OVER (PARTITION BY pda_imei ORDER BY year ASC, month, day, hour, minute) AS pda_run_time_previous, LAG(app_run_time, 1) OVER (PARTITION BY pda_imei ORDER BY year ASC, month, day, hour, minute) AS app_run_time_previous FROM xbezak."ReportFacts" LEFT OUTER JOIN xbezak."DateDimension" ON date_id="DateDimension".id LEFT OUTER JOIN xbezak."TimeDimension" ON time_id="TimeDimension".id ORDER BY year ASC, month, day, hour, minute) tb) AS crashes LEFT OUTER JOIN xbezak."DeviceDimension" ON device_id="DeviceDimension".id LEFT OUTER JOIN xbezak."ProgramVersionDimension" ON program_ver_id="ProgramVersionDimension".id WHERE crashes.crash=1 GROUP BY device, program_ver ORDER BY number_of_crashes DESC ``` ``` "Sort (cost=2163789.57..2163802.12 rows=5018 width=46) (actual time=73061.356..73061.373 rows=143 loops=1)" " Sort Key: (count(*)) DESC" " Sort Method: quicksort Memory: 36kB" " -> GroupAggregate (cost=2163235.35..2163481.14 rows=5018 width=46) (actual time=73059.274..73061.321 rows=143 loops=1)" " Group Key: ""DeviceDimension"".device, ""ProgramVersionDimension"".program_ver" " -> Sort (cost=2163235.35..2163284.26 rows=19561 width=38) (actual time=73059.262..73059.736 rows=10340 loops=1)" " Sort Key: ""DeviceDimension"".device, ""ProgramVersionDimension"".program_ver" " Sort Method: quicksort Memory: 894kB" " -> Hash Left Join (cost=2083485.53..2161841.08 rows=19561 width=38) (actual time=66179.661..73053.343 rows=10340 loops=1)" " Hash Cond: (tb.program_ver_id = ""ProgramVersionDimension"".id)" " -> Hash Left Join (cost=2083483.95..2161779.78 rows=19561 width=18) (actual time=66179.630..73050.501 rows=10340 loops=1)" " Hash Cond: (tb.device_id = ""DeviceDimension"".id)" " -> Subquery Scan on tb (cost=2083476.58..2161719.94 rows=19561 width=16) (actual time=66179.516..73047.342 rows=10340 loops=1)" " Filter: (CASE WHEN ((tb.pda_run_time_previous < tb.pda_run_time) AND (tb.app_run_time_previous > tb.app_run_time)) THEN 1 ELSE 0 END = 1)" " Rows Removed by Filter: 3901828" " -> Sort (cost=2083476.58..2093257.00 rows=3912168 width=148) (actual time=66179.474..72093.395 rows=3912168 loops=1)" " Sort Key: ""DateDimension"".year, ""DateDimension"".month, ""DateDimension"".day, ""TimeDimension"".hour, ""TimeDimension"".minute" " Sort Method: external merge Disk: 434704kB" " -> WindowAgg (cost=939600.78..1066746.24 rows=3912168 width=148) (actual time=20449.676..26067.202 rows=3912168 loops=1)" " -> Sort (cost=939600.78..949381.20 rows=3912168 width=84) (actual time=20449.655..22342.228 rows=3912168 loops=1)" " Sort Key: ""ReportFacts"".pda_imei, ""DateDimension"".year, ""DateDimension"".month, ""DateDimension"".day, ""TimeDimension"".hour, ""TimeDimension"".minute" " Sort Method: external merge Disk: 367048kB" " -> Hash Left Join (cost=2007.74..136818.45 rows=3912168 width=84) (actual time=28.761..2878.526 rows=3912168 loops=1)" " Hash Cond: (""ReportFacts"".time_id = ""TimeDimension"".id)" " -> Hash Left Join (cost=1956.34..126467.38 rows=3912168 width=76) (actual time=28.161..1912.974 rows=3912168 loops=1)" " Hash Cond: (""ReportFacts"".date_id = ""DateDimension"".id)" " -> Seq Scan on ""ReportFacts"" (cost=0.00..114240.68 rows=3912168 width=60) (actual time=0.033..549.695 rows=3912168 loops=1)" " -> Hash (cost=1362.82..1362.82 rows=47482 width=32) (actual time=27.877..27.878 rows=47482 loops=1)" " Buckets: 65536 Batches: 1 Memory Usage: 3480kB" " -> Seq Scan on ""DateDimension"" (cost=0.00..1362.82 rows=47482 width=32) (actual time=0.797..11.902 rows=47482 loops=1)" " -> Hash (cost=33.40..33.40 rows=1440 width=24) (actual time=0.579..0.580 rows=1440 loops=1)" " Buckets: 2048 Batches: 1 Memory Usage: 95kB" " -> Seq Scan on ""TimeDimension"" (cost=0.00..33.40 rows=1440 width=24) (actual time=0.028..0.256 rows=1440 loops=1)" " -> Hash (cost=4.94..4.94 rows=194 width=18) (actual time=0.090..0.090 rows=194 loops=1)" " Buckets: 1024 Batches: 1 Memory Usage: 19kB" " -> Seq Scan on ""DeviceDimension"" (cost=0.00..4.94 rows=194 width=18) (actual time=0.021..0.054 rows=194 loops=1)" " -> Hash (cost=1.26..1.26 rows=26 width=36) (actual time=0.021..0.022 rows=26 loops=1)" " Buckets: 1024 Batches: 1 Memory Usage: 10kB" " -> Seq Scan on ""ProgramVersionDimension"" (cost=0.00..1.26 rows=26 width=36) (actual time=0.011..0.015 rows=26 loops=1)" "Planning Time: 1.600 ms" "Execution Time: 73183.007 ms" ``` Notes: - Zmena poradi Joinu nema zasadni vliv na rychlost #### Zrychleni s materialized view FactsWithDateTime - Zrychleni na 70449.287 ms (priblizne o 15s oproti momentalni delce origninalniho dotazu (85205.838 ms)) ```sql= SELECT "DeviceDimension".device, program_ver, COUNT(*) number_of_crashes FROM (SELECT pda_imei, device_id, program_ver_id, CASE WHEN pda_run_time_previous < pda_run_time AND app_run_time_previous > app_run_time THEN 1 ELSE 0 END AS CRASH FROM (SELECT device_id, program_ver_id, pda_imei, app_run_time, pda_run_time, LAG(pda_run_time, 1) OVER (PARTITION BY pda_imei ORDER BY year ASC, month, day, hour, minute) AS pda_run_time_previous, LAG(app_run_time, 1) OVER (PARTITION BY pda_imei ORDER BY year ASC, month, day, hour, minute) AS app_run_time_previous FROM xbezak."FactsWithDateTime" ORDER BY year ASC, month, day, hour, minute) tb) AS crashes LEFT OUTER JOIN xbezak."DeviceDimension" ON device_id="DeviceDimension".id LEFT OUTER JOIN xbezak."ProgramVersionDimension" ON program_ver_id="ProgramVersionDimension".id WHERE crashes.crash=1 GROUP BY device, program_ver ORDER BY number_of_crashes DESC ``` #### Pouziti IN predicate - Pouziti In predicate nemelo zadny vliv na zrychleni. #### Zrychleni s materialized view FactsWithDateTime a FactsWithDeviceAndProgramVersion - Zrychleni na 75215.919 ms oproti originalnimu dotazu momentalne okolo 80738.224 ms. Cili cca o 5s oproti originalu. ```sql= SELECT "FactsWithDeviceAndProgramVersion".device, "FactsWithDeviceAndProgramVersion".program_ver, COUNT(*) number_of_crashes FROM (SELECT pda_imei, tb.id, CASE WHEN pda_run_time_previous < pda_run_time AND app_run_time_previous > app_run_time THEN 1 ELSE 0 END AS CRASH FROM (SELECT id, pda_imei, app_run_time, pda_run_time, LAG(pda_run_time, 1) OVER (PARTITION BY pda_imei ORDER BY year ASC, month, day, hour, minute) AS pda_run_time_previous, LAG(app_run_time, 1) OVER (PARTITION BY pda_imei ORDER BY year ASC, month, day, hour, minute) AS app_run_time_previous FROM xbezak."FactsWithDateTime" ORDER BY year ASC, month, day, hour, minute) tb) AS crashes LEFT OUTER JOIN xgrygar."FactsWithDeviceAndProgramVersion" ON "FactsWithDeviceAndProgramVersion".id="crashes".id WHERE crashes.crash=1 GROUP BY device, program_ver ORDER BY number_of_crashes DESC ``` ``` "Sort (cost=2241524.70..2241529.58 rows=1950 width=22) (actual time=71068.725..71068.736 rows=143 loops=1)" " Sort Key: (count(*)) DESC" " Sort Method: quicksort Memory: 36kB" " -> GroupAggregate (cost=2241203.02..2241418.14 rows=1950 width=22) (actual time=71066.718..71068.658 rows=143 loops=1)" " Group Key: ""FactsWithDeviceAndProgramVersion"".device, ""FactsWithDeviceAndProgramVersion"".program_ver" " -> Sort (cost=2241203.02..2241251.93 rows=19562 width=14) (actual time=71066.706..71067.134 rows=10340 loops=1)" " Sort Key: ""FactsWithDeviceAndProgramVersion"".device, ""FactsWithDeviceAndProgramVersion"".program_ver" " Sort Method: quicksort Memory: 894kB" " -> Hash Right Join (cost=2105843.93..2239808.67 rows=19562 width=14) (actual time=69868.560..71061.576 rows=10340 loops=1)" " Hash Cond: (""FactsWithDeviceAndProgramVersion"".id = tb.id)" " -> Seq Scan on ""FactsWithDeviceAndProgramVersion"" (cost=0.00..119097.18 rows=3912518 width=22) (actual time=0.061..827.484 rows=3912168 loops=1)" " -> Hash (cost=2105599.40..2105599.40 rows=19562 width=8) (actual time=69867.314..69867.317 rows=10340 loops=1)" " Buckets: 32768 Batches: 1 Memory Usage: 660kB" " -> Subquery Scan on tb (cost=2027350.28..2105599.40 rows=19562 width=8) (actual time=63304.436..69863.720 rows=10340 loops=1)" " Filter: (CASE WHEN ((tb.pda_run_time_previous < tb.pda_run_time) AND (tb.app_run_time_previous > tb.app_run_time)) THEN 1 ELSE 0 END = 1)" " Rows Removed by Filter: 3901828" " -> Sort (cost=2027350.28..2037131.42 rows=3912456 width=140) (actual time=63304.397..68925.391 rows=3912168 loops=1)" " Sort Key: ""FactsWithDateTime"".year, ""FactsWithDateTime"".month, ""FactsWithDateTime"".day, ""FactsWithDateTime"".hour, ""FactsWithDateTime"".minute" " Sort Method: external merge Disk: 404056kB" " -> WindowAgg (cost=910136.51..1037291.33 rows=3912456 width=140) (actual time=19286.160..24882.051 rows=3912168 loops=1)" " -> Sort (cost=910136.51..919917.65 rows=3912456 width=76) (actual time=19286.138..21202.214 rows=3912168 loops=1)" " Sort Key: ""FactsWithDateTime"".pda_imei, ""FactsWithDateTime"".year, ""FactsWithDateTime"".month, ""FactsWithDateTime"".day, ""FactsWithDateTime"".hour, ""FactsWithDateTime"".minute" " Sort Method: external merge Disk: 336400kB" " -> Seq Scan on ""FactsWithDateTime"" (cost=0.00..134039.56 rows=3912456 width=76) (actual time=0.024..1560.209 rows=3912168 loops=1)" "Planning Time: 0.908 ms" "Execution Time: 71188.588 ms" ``` #### Zrychleni s materialized view FactsWithDeviceProgramDateTime - **Zpomaleni** na 84698.290 ms z momentalni rychlosti originalniho dotazu 72757.651 ms ```sql= SELECT device, program_ver, COUNT(*) number_of_crashes FROM (SELECT pda_imei, device, program_ver, CASE WHEN pda_run_time_previous < pda_run_time AND app_run_time_previous > app_run_time THEN 1 ELSE 0 END AS CRASH FROM (SELECT device, program_ver, pda_imei, app_run_time, pda_run_time, LAG(pda_run_time, 1) OVER (PARTITION BY pda_imei ORDER BY year ASC, month, day, hour, minute) AS pda_run_time_previous, LAG(app_run_time, 1) OVER (PARTITION BY pda_imei ORDER BY year ASC, month, day, hour, minute) AS app_run_time_previous FROM xgrygar."FactsWithDeviceProgramDateTime" ORDER BY year ASC, month, day, hour, minute) tb) AS crashes WHERE crashes.crash=1 GROUP BY device, program_ver ORDER BY number_of_crashes DESC ``` #### Redukce poctu LAG operaci - Pri zkouseni zredukovani poctu operaci LAG (spojeni do 1) v dotazu se cas nijak nezmenil. #### Nove materialized view CrashView jen pro Q4 - Ano jsme si vedomi, ze tato cast neodpovida zadani, kde musime materialized view pouzit minimalne pro 2 dotazy. - Duvod proc jsem to i tak chteli zkusit je, ze nasledujici view by mohlo doopravdy zrychlit tento dotaz, protoze predchozi pokusy byly prakticky marne. ```sql= SELECT "FactsWithDeviceProgramDateTime".device, "FactsWithDeviceProgramDateTime".program_ver, "FactsWithDeviceProgramDateTime".pda_imei, "FactsWithDeviceProgramDateTime".app_run_time, "FactsWithDeviceProgramDateTime".pda_run_time, lag("FactsWithDeviceProgramDateTime".pda_run_time, 1) OVER (PARTITION BY "FactsWithDeviceProgramDateTime".pda_imei ORDER BY "FactsWithDeviceProgramDateTime".year, "FactsWithDeviceProgramDateTime".month, "FactsWithDeviceProgramDateTime".day, "FactsWithDeviceProgramDateTime".hour, "FactsWithDeviceProgramDateTime".minute) AS pda_run_time_previous, lag("FactsWithDeviceProgramDateTime".app_run_time, 1) OVER (PARTITION BY "FactsWithDeviceProgramDateTime".pda_imei ORDER BY "FactsWithDeviceProgramDateTime".year, "FactsWithDeviceProgramDateTime".month, "FactsWithDeviceProgramDateTime".day, "FactsWithDeviceProgramDateTime".hour, "FactsWithDeviceProgramDateTime".minute) AS app_run_time_previous FROM "FactsWithDeviceProgramDateTime" ORDER BY "FactsWithDeviceProgramDateTime".year, "FactsWithDeviceProgramDateTime".month, "FactsWithDeviceProgramDateTime".day, "FactsWithDeviceProgramDateTime".hour, "FactsWithDeviceProgramDateTime".minute; ``` **Pouziti View** - Zrychleni na 1509.594 ms z momentalne originalnich 81689.375 ms ```sql= SELECT device, program_ver, COUNT(*) number_of_crashes FROM (SELECT pda_imei, device, program_ver, CASE WHEN pda_run_time_previous < pda_run_time AND app_run_time_previous > app_run_time THEN 1 ELSE 0 END AS CRASH FROM xgrygar."CrashView" ) AS crashes WHERE crashes.crash=1 GROUP BY device, program_ver ORDER BY number_of_crashes DESC ``` ``` "Sort (cost=110815.91..110821.11 rows=2080 width=22) (actual time=1145.793..1145.800 rows=143 loops=1)" " Sort Key: (count(*)) DESC" " Sort Method: quicksort Memory: 36kB" " -> HashAggregate (cost=110680.47..110701.27 rows=2080 width=22) (actual time=1145.702..1145.750 rows=143 loops=1)" " Group Key: ""CrashView"".device, ""CrashView"".program_ver" " Batches: 1 Memory Usage: 129kB" " -> Seq Scan on ""CrashView"" (cost=0.00..110533.76 rows=19561 width=14) (actual time=0.018..1142.272 rows=10340 loops=1)" " Filter: (CASE WHEN ((pda_run_time_previous < pda_run_time) AND (app_run_time_previous > app_run_time)) THEN 1 ELSE 0 END = 1)" " Rows Removed by Filter: 3901828" "Planning Time: 0.191 ms" "Execution Time: 1145.888 ms" ``` #### Uprava dotazu - poslednou moznostou by bolo jednodusit resp. zefektivnit samotny prikaz - problemom je, ze pri nasej implementacii sme isli ovela viac do hlbky ako napriklad vzorove riesenie a kontorlujeme vyslovene predchadzajuci run_time a ten akutalny, co nam pridava vela komplexity do samotneho dotazu (najma pouzitie window functions - o ktorych casovej narocnosti sme sa presvecili mat. view z odstavca vyssie) - pri uprave query bez window functions dochadza k totalnej zmene vysledkov a podla nasho nazoru prichadzame o detekciu niektorych crashov - rozhodli sme sa preto dotaz dalej neupravovat ##### vysledek dotazu ze vzoroveho reseni | app ver | device | crashes | | -------- | -------- | -------- | |"A56" | "KSA-LX9"| 288| |"A56"| "DRA-L21" |285| |"A56"| "HUAWEI CUN-L21"| 153| |"A56"| "MYA-L41"| 130| |"A56"| "Redmi Go" |79| |"A56"| "Lenovo A6010"| 74| |"A53"| "DRA-L21"| 74| |"A53"| "HUAWEI CUN-L21"| 67| |"A46" |"HUAWEI CUN-L21" |63| |"A56"| "Mi Note 10"| 55| ##### vysledek naseho dotazu | app ver | device | crashes | | -------- | -------- | -------- | |"A56"| "YAL-L41" |1574| |"A56" |"Mi Note 10" |1569| |"A56"| "Pixel 4" |1320| |"A56"| "Nokia 2.3"| 1294| |"A56"| "MYA-L41" |577| |"A56"| "JSN-L21" |574| |"A56"| "MRD-LX1" |406| |"A56" |"KSA-LX9" |362| |"A56" |"MAR-LX1A"| 265| |"A56" |"AMN-LX9" |175| ## Q5 Identify “connectivity” problems –an app instance connecting too often (i.e., next connection in much shorter time than 5 mins) #### Original - Puvodni cas: 51468.216 ms (Momentalne originalni dotaz beha za 65122.524 ms) - Pri dalsim behu dotazu beha originalni dotaz priblizne za 73931.437 ms ```sql= SELECT pda_imei, device, program_ver, COUNT(*) number_of_disconnects FROM (SELECT pda_imei, device, program_ver, hour, minute, prev_hour, prev_minute, CASE WHEN hour=prev_hour THEN CASE WHEN minute-prev_minute<5 THEN 1 ELSE 0 END ELSE -- prechod cez hodinu CASE WHEN (minute+60)-prev_minute<5 THEN 1 ELSE 0 END END AS disconnent FROM (SELECT pda_imei, device, program_ver, hour, minute, LAG(hour, 1) OVER (PARTITION BY pda_imei ORDER BY year ASC, month, day, hour, minute) AS prev_hour, LAG(minute, 1) OVER (PARTITION BY pda_imei ORDER BY year ASC, month, day, hour, minute) AS prev_minute FROM xbezak."ReportFacts" LEFT OUTER JOIN xbezak."DateDimension" ON date_id="DateDimension".id LEFT OUTER JOIN xbezak."TimeDimension" ON time_id="TimeDimension".id LEFT OUTER JOIN xbezak."DeviceDimension" ON device_id="DeviceDimension".id LEFT OUTER JOIN xbezak."ProgramVersionDimension" ON program_ver_id=xbezak."ProgramVersionDimension".id ORDER BY year ASC, month, day, hour, minute) lag_table) disconnect_table WHERE disconnect_table.disconnent=1 GROUP BY ROLLUP(pda_imei, device, program_ver) ORDER BY number_of_disconnects DESC, pda_imei ``` ``` "Sort (cost=2062716.12..2062815.63 rows=39803 width=62) (actual time=74520.930..74521.041 rows=2218 loops=1)" " Sort Key: (count(*)) DESC, lag_table.pda_imei" " Sort Method: quicksort Memory: 270kB" " -> GroupAggregate (cost=2059032.51..2059675.06 rows=39803 width=62) (actual time=74206.517..74519.072 rows=2218 loops=1)" " Group Key: lag_table.pda_imei, lag_table.device, lag_table.program_ver" " Group Key: lag_table.pda_imei, lag_table.device" " Group Key: lag_table.pda_imei" " Group Key: ()" " -> Sort (cost=2059032.51..2059081.42 rows=19561 width=54) (actual time=74206.500..74405.809 rows=404312 loops=1)" " Sort Key: lag_table.pda_imei, lag_table.device, lag_table.program_ver" " Sort Method: external merge Disk: 15992kB" " -> Subquery Scan on lag_table (cost=1998955.72..2057638.24 rows=19561 width=54) (actual time=67989.149..73677.453 rows=404312 loops=1)" " Filter: (CASE WHEN (lag_table.hour = lag_table.prev_hour) THEN CASE WHEN ((lag_table.minute - lag_table.prev_minute) < '5'::double precision) THEN 1 ELSE 0 END ELSE CASE WHEN (((lag_table.minute + '60'::double precision) - lag_table.prev_minute) < '5'::double precision) THEN 1 ELSE 0 END END = 1)" " Rows Removed by Filter: 3507856" " -> Sort (cost=1998955.72..2008736.14 rows=3912168 width=110) (actual time=67989.108..73059.034 rows=3912168 loops=1)" " Sort Key: ""DateDimension"".year, ""DateDimension"".month, ""DateDimension"".day, ""TimeDimension"".hour, ""TimeDimension"".minute" " Sort Method: external merge Disk: 384296kB" " -> WindowAgg (cost=988793.92..1115939.38 rows=3912168 width=110) (actual time=26354.206..32081.488 rows=3912168 loops=1)" " -> Sort (cost=988793.92..998574.34 rows=3912168 width=94) (actual time=26354.180..28649.286 rows=3912168 loops=1)" " Sort Key: ""ReportFacts"".pda_imei, ""DateDimension"".year, ""DateDimension"".month, ""DateDimension"".day, ""TimeDimension"".hour, ""TimeDimension"".minute" " Sort Method: external merge Disk: 307968kB" " -> Hash Left Join (cost=2016.69..159264.58 rows=3912168 width=94) (actual time=24.711..5139.610 rows=3912168 loops=1)" " Hash Cond: (""ReportFacts"".program_ver_id = ""ProgramVersionDimension"".id)" " -> Hash Left Join (cost=2015.11..147319.60 rows=3912168 width=74) (actual time=24.667..4281.327 rows=3912168 loops=1)" " Hash Cond: (""ReportFacts"".device_id = ""DeviceDimension"".id)" " -> Hash Left Join (cost=2007.74..136818.45 rows=3912168 width=72) (actual time=24.549..3460.432 rows=3912168 loops=1)" " Hash Cond: (""ReportFacts"".time_id = ""TimeDimension"".id)" " -> Hash Left Join (cost=1956.34..126467.38 rows=3912168 width=64) (actual time=23.903..2441.535 rows=3912168 loops=1)" " Hash Cond: (""ReportFacts"".date_id = ""DateDimension"".id)" " -> Seq Scan on ""ReportFacts"" (cost=0.00..114240.68 rows=3912168 width=48) (actual time=0.155..992.262 rows=3912168 loops=1)" " -> Hash (cost=1362.82..1362.82 rows=47482 width=32) (actual time=23.431..23.432 rows=47482 loops=1)" " Buckets: 65536 Batches: 1 Memory Usage: 3480kB" " -> Seq Scan on ""DateDimension"" (cost=0.00..1362.82 rows=47482 width=32) (actual time=2.936..11.969 rows=47482 loops=1)" " -> Hash (cost=33.40..33.40 rows=1440 width=24) (actual time=0.624..0.624 rows=1440 loops=1)" " Buckets: 2048 Batches: 1 Memory Usage: 95kB" " -> Seq Scan on ""TimeDimension"" (cost=0.00..33.40 rows=1440 width=24) (actual time=0.078..0.307 rows=1440 loops=1)" " -> Hash (cost=4.94..4.94 rows=194 width=18) (actual time=0.098..0.098 rows=194 loops=1)" " Buckets: 1024 Batches: 1 Memory Usage: 19kB" " -> Seq Scan on ""DeviceDimension"" (cost=0.00..4.94 rows=194 width=18) (actual time=0.023..0.052 rows=194 loops=1)" " -> Hash (cost=1.26..1.26 rows=26 width=36) (actual time=0.024..0.025 rows=26 loops=1)" " Buckets: 1024 Batches: 1 Memory Usage: 10kB" " -> Seq Scan on ""ProgramVersionDimension"" (cost=0.00..1.26 rows=26 width=36) (actual time=0.013..0.016 rows=26 loops=1)" "Planning Time: 2.954 ms" "Execution Time: 74635.285 ms" ``` #### Zrychleni s materaiized view FactsWithDateTime - 64462.546 ms (cca 4s momentalne original beha za 68001.142 ms) ```sql= SELECT pda_imei, device, program_ver, COUNT(*) number_of_disconnects FROM (SELECT pda_imei, device, program_ver, hour, minute, prev_hour, prev_minute, CASE WHEN hour=prev_hour THEN CASE WHEN minute-prev_minute<5 THEN 1 ELSE 0 END ELSE -- prechod cez hodinu CASE WHEN (minute+60)-prev_minute<5 THEN 1 ELSE 0 END END AS disconnent FROM (SELECT pda_imei, device, program_ver, hour, minute, LAG(hour, 1) OVER (PARTITION BY pda_imei ORDER BY year ASC, month, day, hour, minute) AS prev_hour, LAG(minute, 1) OVER (PARTITION BY pda_imei ORDER BY year ASC, month, day, hour, minute) AS prev_minute FROM xbezak."FactsWithDateTime" LEFT OUTER JOIN xbezak."DeviceDimension" ON device_id="DeviceDimension".id LEFT OUTER JOIN xbezak."ProgramVersionDimension" ON program_ver_id=xbezak."ProgramVersionDimension".id ORDER BY year ASC, month, day, hour, minute) lag_table) disconnect_table WHERE disconnect_table.disconnent=1 GROUP BY ROLLUP(pda_imei, device, program_ver) ORDER BY number_of_disconnects DESC, pda_imei ``` #### Pouziti IN predicate - Pouziti In predicate nemelo zadny vliv na zrychleni. #### Zrychleni s materialized view FactsWithDateTime a FactsWithDeviceAndProgramVersion - **Zpomaleni** o 8s na 82280.650 ms z originalniho dotazu ktery momentalne behal cca za 74193.155 ms ```sql= SELECT pda_imei, device, program_ver, COUNT(*) number_of_disconnects FROM (SELECT pda_imei, device, program_ver, hour, minute, prev_hour, prev_minute, CASE WHEN hour=prev_hour THEN CASE WHEN minute-prev_minute<5 THEN 1 ELSE 0 END ELSE -- prechod cez hodinu CASE WHEN (minute+60)-prev_minute<5 THEN 1 ELSE 0 END END AS disconnent FROM (SELECT "FactsWithDateTime".pda_imei, device, program_ver, hour, minute, LAG(hour, 1) OVER (PARTITION BY "FactsWithDateTime".pda_imei ORDER BY year ASC, month, day, hour, minute) AS prev_hour, LAG(minute, 1) OVER (PARTITION BY "FactsWithDateTime".pda_imei ORDER BY year ASC, month, day, hour, minute) AS prev_minute FROM xbezak."FactsWithDateTime" LEFT OUTER JOIN xgrygar."FactsWithDeviceAndProgramVersion" ON "FactsWithDeviceAndProgramVersion".id="FactsWithDateTime".id ORDER BY year ASC, month, day, hour, minute) lag_table) disconnect_table WHERE disconnect_table.disconnent=1 GROUP BY ROLLUP(pda_imei, device, program_ver) ORDER BY number_of_disconnects DESC, pda_imei ``` #### Zrychleni s materialized view - zrychleni na 70477.253 ms z momentalni rychlosti 73916.948 ms originalniho dotazu ```sql= SELECT pda_imei, device, program_ver, COUNT(*) number_of_disconnects FROM (SELECT pda_imei, device, program_ver, hour, minute, prev_hour, prev_minute, CASE WHEN hour=prev_hour THEN CASE WHEN minute-prev_minute<5 THEN 1 ELSE 0 END ELSE -- prechod cez hodinu CASE WHEN (minute+60)-prev_minute<5 THEN 1 ELSE 0 END END AS disconnent FROM (SELECT pda_imei, device, program_ver, hour, minute, LAG(hour, 1) OVER (PARTITION BY pda_imei ORDER BY year ASC, month, day, hour, minute) AS prev_hour, LAG(minute, 1) OVER (PARTITION BY pda_imei ORDER BY year ASC, month, day, hour, minute) AS prev_minute FROM xgrygar."FactsWithDeviceProgramDateTime" ORDER BY year ASC, month, day, hour, minute) lag_table) disconnect_table WHERE disconnect_table.disconnent=1 GROUP BY ROLLUP(pda_imei, device, program_ver) ORDER BY number_of_disconnects DESC, pda_imei ``` ``` "Sort (cost=1878261.47..1878354.63 rows=37263 width=38) (actual time=71012.038..71012.146 rows=2218 loops=1)" " Sort Key: (count(*)) DESC, lag_table.pda_imei" " Sort Method: quicksort Memory: 270kB" " -> GroupAggregate (cost=1874815.04..1875432.20 rows=37263 width=38) (actual time=70694.219..71010.163 rows=2218 loops=1)" " Group Key: lag_table.pda_imei, lag_table.device, lag_table.program_ver" " Group Key: lag_table.pda_imei, lag_table.device" " Group Key: lag_table.pda_imei" " Group Key: ()" " -> Sort (cost=1874815.04..1874863.95 rows=19562 width=30) (actual time=70694.200..70895.503 rows=404312 loops=1)" " Sort Key: lag_table.pda_imei, lag_table.device, lag_table.program_ver" " Sort Method: external merge Disk: 15992kB" " -> Subquery Scan on lag_table (cost=1814734.28..1873420.68 rows=19562 width=30) (actual time=64332.861..70141.544 rows=404312 loops=1)" " Filter: (CASE WHEN (lag_table.hour = lag_table.prev_hour) THEN CASE WHEN ((lag_table.minute - lag_table.prev_minute) < '5'::double precision) THEN 1 ELSE 0 END ELSE CASE WHEN (((lag_table.minute + '60'::double precision) - lag_table.prev_minute) < '5'::double precision) THEN 1 ELSE 0 END END = 1)" " Rows Removed by Filter: 3507856" " -> Sort (cost=1814734.28..1824515.35 rows=3912427 width=86) (actual time=64332.821..69510.462 rows=3912168 loops=1)" " Sort Key: ""FactsWithDeviceProgramDateTime"".year, ""FactsWithDeviceProgramDateTime"".month, ""FactsWithDeviceProgramDateTime"".day, ""FactsWithDeviceProgramDateTime"".hour, ""FactsWithDeviceProgramDateTime"".minute" " Sort Method: external merge Disk: 384296kB" " -> WindowAgg (cost=884739.84..1011893.71 rows=3912427 width=86) (actual time=22855.551..28579.985 rows=3912168 loops=1)" " -> Sort (cost=884739.84..894520.90 rows=3912427 width=70) (actual time=22855.530..25154.008 rows=3912168 loops=1)" " Sort Key: ""FactsWithDeviceProgramDateTime"".pda_imei, ""FactsWithDeviceProgramDateTime"".year, ""FactsWithDeviceProgramDateTime"".month, ""FactsWithDeviceProgramDateTime"".day, ""FactsWithDeviceProgramDateTime"".hour, ""FactsWithDeviceProgramDateTime"".minute" " Sort Method: external merge Disk: 307968kB" " -> Seq Scan on ""FactsWithDeviceProgramDateTime"" (cost=0.00..135393.27 rows=3912427 width=70) (actual time=0.026..1756.121 rows=3912168 loops=1)" "Planning Time: 0.476 ms" "Execution Time: 71127.691 ms" ``` #### Nove materialized view DisconnectView jen pro Q5 - Ano jsme si vedomi, ze tato cast neodpovida zadani, kde musime materialized view pouzit minimalne pro 2 dotazy. - Duvod proc jsem to i tak chteli zkusit je, ze nasledujici view by mohlo doopravdy zrychlit tento dotaz, protoze predchozi pokusy byly prakticky marne. ```sql= SELECT "FactsWithDeviceProgramDateTime".pda_imei, "FactsWithDeviceProgramDateTime".device, "FactsWithDeviceProgramDateTime".program_ver, "FactsWithDeviceProgramDateTime".hour, "FactsWithDeviceProgramDateTime".minute, lag("FactsWithDeviceProgramDateTime".hour, 1) OVER (PARTITION BY "FactsWithDeviceProgramDateTime".pda_imei ORDER BY "FactsWithDeviceProgramDateTime".year, "FactsWithDeviceProgramDateTime".month, "FactsWithDeviceProgramDateTime".day, "FactsWithDeviceProgramDateTime".hour, "FactsWithDeviceProgramDateTime".minute) AS prev_hour, lag("FactsWithDeviceProgramDateTime".minute, 1) OVER (PARTITION BY "FactsWithDeviceProgramDateTime".pda_imei ORDER BY "FactsWithDeviceProgramDateTime".year, "FactsWithDeviceProgramDateTime".month, "FactsWithDeviceProgramDateTime".day, "FactsWithDeviceProgramDateTime".hour, "FactsWithDeviceProgramDateTime".minute) AS prev_minute FROM "FactsWithDeviceProgramDateTime" ORDER BY "FactsWithDeviceProgramDateTime".year, "FactsWithDeviceProgramDateTime".month, "FactsWithDeviceProgramDateTime".day, "FactsWithDeviceProgramDateTime".hour, "FactsWithDeviceProgramDateTime".minute; ``` **Pouziti View** - Zrychleni na 1862.180 ms z momentalne originalnich 64767.946 ms ```sql= SELECT pda_imei, device, program_ver, COUNT(*) number_of_disconnects FROM (SELECT pda_imei, device, program_ver, hour, minute, prev_hour, prev_minute, CASE WHEN hour=prev_hour THEN CASE WHEN minute-prev_minute<5 THEN 1 ELSE 0 END ELSE -- prechod cez hodinu CASE WHEN (minute+60)-prev_minute<5 THEN 1 ELSE 0 END END AS disconnent FROM xgrygar."DisconnectView") disconnect_table WHERE disconnect_table.disconnent=1 GROUP BY ROLLUP(pda_imei, device, program_ver) ORDER BY number_of_disconnects DESC, pda_imei ``` ``` "Sort (cost=99218.39..99311.88 rows=37396 width=38) (actual time=1711.340..1711.441 rows=2218 loops=1)" " Sort Key: (count(*)) DESC, ""DisconnectView"".pda_imei" " Sort Method: quicksort Memory: 270kB" " -> GroupAggregate (cost=95759.58..96378.05 rows=37396 width=38) (actual time=1398.075..1709.552 rows=2218 loops=1)" " Group Key: ""DisconnectView"".pda_imei, ""DisconnectView"".device, ""DisconnectView"".program_ver" " Group Key: ""DisconnectView"".pda_imei, ""DisconnectView"".device" " Group Key: ""DisconnectView"".pda_imei" " Group Key: ()" " -> Sort (cost=95759.58..95808.48 rows=19561 width=30) (actual time=1398.057..1596.517 rows=404312 loops=1)" " Sort Key: ""DisconnectView"".pda_imei, ""DisconnectView"".device, ""DisconnectView"".program_ver" " Sort Method: external merge Disk: 15992kB" " -> Seq Scan on ""DisconnectView"" (cost=0.00..94365.30 rows=19561 width=30) (actual time=0.021..868.988 rows=404312 loops=1)" " Filter: (CASE WHEN (hour = prev_hour) THEN CASE WHEN ((minute - prev_minute) < '5'::double precision) THEN 1 ELSE 0 END ELSE CASE WHEN (((minute + '60'::double precision) - prev_minute) < '5'::double precision) THEN 1 ELSE 0 END END = 1)" " Rows Removed by Filter: 3507856" "Planning Time: 0.244 ms" "Execution Time: 1714.540 ms" ``` #### Uprava dotazu - podobne ako pri predchadzajucom query, tak nas povodny dotaz bol ponaty ovela detailnejsie, tu uz sice aj v pripade vzoroveho riesenia dochadza k pouzitiu window function, nas dotaz ale pocita aj pocet disconnectov, co je podla nas velmi dolezita informacia - vysledky su oproti vzorovemu dotazu tiez ine, takze logika bola pouzita tiez ina - kedze nam tento dotaz pride informativnejsi a nenasli sme nejaky sposob, akym rovnaku informaciu vytiahnut z databazy jednuduchsie (plati aj pre Q4), rozhodli sme sa ho dalej neupravovat ##### vysledek dotazu ze vzoroveho reseni ``` "860556046015392" "AGS2-L09" "A53" "AGS2-L09" "A53" "860540044404873" "AMN-LX9" "A53" "860540044405193" "AMN-LX9" "A53" "860540044443665" "AMN-LX9" "A53" "860540044484016" "AMN-LX9" "A53" "860540044908667" "AMN-LX9" "A53" "861366047251965" "AMN-LX9" "A53" "861945048272581" "AMN-LX9" "A53" "861945048289874" "AMN-LX9" "A53" ``` ##### vysledek naseho dotazu ``` 221170 "0123456789ABCDEF" 55546 "0123456789ABCDEF" "SM-G975F" 8427 "0123456789ABCDEF" "SM-G975F" "A56" 8427 "0123456789ABCDEF" "Pixel 4" 8107 "0123456789ABCDEF" "Pixel 4" "A56" 8107 "0123456789ABCDEF" "Mi Note 10" 6425 "0123456789ABCDEF" "Mi Note 10" "A56" 6425 "861945048322394" "AMN-LX9" 5818 "861945048322394" 5818 "861945048322394" "AMN-LX9" "A54" 5818 "861366047251965" "AMN-LX9" 5816 ``` ### Zaver - co sa tyka dodrzania zadania vseobecne sa nam dotazy podarilo zrychlit, ziadny ale na polovicu bez pouzitia velmi specializovanych views - ohladom ceny vytvorenia nasich pomocnych struktur tak je velkou vyhodou datoveho skladu ze pocas prace s nim sa data nemenia, v OLTP ak by sme pouzili priliz vela indexov, tak by sa nam spomalili aktualizacia tabuliek kedze by sa museli akutalizovat struktury - v nasom datovom sklade sa ale tieto aktualizacie nedeju a indexy resp. materialized views sa vytvoria pri samotnom naplneni - vzdy ale zaberaju nejake miesto v pamati databazy a toto miesto nie je zadarmo, podla nas je toto ale vynahradene zrychlenim dotazov, pri views kde dopredu pocitame joiny je zrychlenie mensie (max 40%), tieto su ale vseobecne a su pouzite v minimalne dvoch queries - k radikalnemu zrychleniu sa nam podarilo dospiet pri az pri poslednych dvoch dotazoch, kde sme si predpocitali narocnu operaciu LAGu, toto materialized view sme sice pouzili potom iba v 1 dotaze, cena by sa ale mohla vratit kebyze robime podobne dotazy, kde ho znovu vyuzijeme, alebo ho pustame opakovane - Co se tyce prvnich 3 dotazu (Q1 - Q3), ty se nam podarilo znatelne zrychlit v ramci zadani. Sice ne o 50 %, ale jeden se tomu velmi blizil. Nejvic pomohli materialized view, ktere jsou pouzite u vice dotazu --> tedy v ramci zadani. - K dotazum Q4 a Q5, tam materialized view a ostatni techniky moc nepomohli. Nejen ze odchylka casu byla velka, tedy casy s pouzitim nespecializovanych materialized view a ostatnich technik byly, da se rict, v ramci teto odchylky. Realne meritelne zlepseni prineslo az pouziti specializovanych materialized view, ktere nesplnuji zdani. V tu chvili se oba dotazy dokazali radove (≈50x) zlepsit.

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully