# Postgres large tips ## Large data count ### hll This Postgres module introduces a new data type hll which is a HyperLogLog data structure. HyperLogLog is a fixed-size, set-like structure used for distinct value counting with tunable precision. For example, in 1280 bytes hll can estimate the count of tens of billions of distinct values with only a few percent err. Simple example ```sql --- Make a dummy table CREATE TABLE helloworld ( id integer, des text, set hll ); --- Insert an empty HLL INSERT INTO helloworld(id, set) VALUES (1, hll_empty()); --- Add a hashed integer to the HLL UPDATE helloworld SET set = hll_add(set, hll_hash_integer(12345)) WHERE id = 1; --- Or add a hashed string to the HLL UPDATE helloworld SET set = hll_add(set, hll_hash_text('hello world')) WHERE id = 1; --- Get the cardinality of the HLL SELECT hll_cardinality(set) FROM helloworld WHERE id = 1; --- check set hash select des, set || hll_hash_integer('12345') = set from helloworld ``` Extends example ```sql INSERT INTO helloworld(id ,des, set) SELECT 2,'all user count', hll_add_agg(hll_hash_text(id::text)) FROM account GROUP BY 1; SELECT hll_cardinality(set) FROM helloworld WHERE id = 2; INSERT INTO helloworld(id ,des, set) SELECT 3,'2021 user count', hll_add_agg(hll_hash_text(id::text)) FROM account where create_time > '2021-01-01' and create_time < '2022-01-01' GROUP BY 1; INSERT INTO helloworld(id ,des, set) SELECT 4,'2020 user count', hll_add_agg(hll_hash_text(id::text)) FROM account where create_time > '2020-01-01' and create_time < '2021-01-01' GROUP BY 1; SELECT hll_cardinality(set) FROM helloworld WHERE id = 3; SELECT des, hll_cardinality(set) FROM helloworld; select des, set || hll_hash_text(((select id::text from account where create_time>'2020-01-01' order by create_time asc limit 1 ))) = set from helloworld --- select des, set || hll_hash_integer('12345') = set from helloworld -- check union hash select -- hll_cardinality( hll_union(coalesce(t_all,hll_empty()), coalesce(t_2020,hll_empty())) ) hll_cardinality( t_all||t_2020||t_2021 ) from ( select (select set from helloworld where id =2) as t_all, (select set from helloworld where id =3) as t_2021, (select set from helloworld where id =4) as t_2020 ) as temp ``` ![](https://i.imgur.com/PiaHrqf.png) more example ```SQL INSERT INTO helloworld(id ,des, set) SELECT 5,'2021 login user count', hll_add_agg(hll_hash_text(id::text)) from (select distinct id from log.web_login_log where create_time > '2021-01-01' and create_time < '2022-01-01') as temp; INSERT INTO helloworld(id ,des, set) SELECT 6,'2022 login user count', hll_add_agg(hll_hash_text(id::text)) from (select distinct id from log.web_login_log where create_time > '2020-01-01' and create_time < '2021-01-01') as temp; -- select count(distinct id) from log.web_login_log where create_time > '2020-01-01' and create_time < '2022-01-01' select hll_cardinality(t_2021|| t_2020) from ( select (select set from helloworld where id =5) as t_2021, (select set from helloworld where id =6) as t_2020 ) as temp ``` category|value ---|--- (real) |411420 (hash union) |412954.08134970645 (two value add) |428936.78133179368 ```SQL ---hash seed INSERT INTO helloworld(id ,des, set) SELECT 2,'all user count', hll_add_agg(hll_hash_text(id::text,10)) FROM account GROUP BY 1; ``` ![](https://i.imgur.com/inwcZKu.png) ### reference [github](https://github.com/citusdata/postgresql-hll) [usage example - from huawei cloud](https://support.huaweicloud.com/intl/en-us/sqlreference-dws/dws_06_0021.html) [practice - from ali cloud](https://help.aliyun.com/document_detail/183124.html)