# Postgres large tips
## Large data count
### hll
This Postgres module introduces a new data type hll which is a HyperLogLog data structure. HyperLogLog is a fixed-size, set-like structure used for distinct value counting with tunable precision. For example, in 1280 bytes hll can estimate the count of tens of billions of distinct values with only a few percent err.
Simple example
```sql
--- Make a dummy table
CREATE TABLE helloworld (
id integer,
des text,
set hll
);
--- Insert an empty HLL
INSERT INTO helloworld(id, set) VALUES (1, hll_empty());
--- Add a hashed integer to the HLL
UPDATE helloworld SET set = hll_add(set, hll_hash_integer(12345)) WHERE id = 1;
--- Or add a hashed string to the HLL
UPDATE helloworld SET set = hll_add(set, hll_hash_text('hello world')) WHERE id = 1;
--- Get the cardinality of the HLL
SELECT hll_cardinality(set) FROM helloworld WHERE id = 1;
--- check set hash
select des, set || hll_hash_integer('12345') = set from helloworld
```
Extends example
```sql
INSERT INTO helloworld(id ,des, set)
SELECT 2,'all user count', hll_add_agg(hll_hash_text(id::text))
FROM account
GROUP BY 1;
SELECT hll_cardinality(set) FROM helloworld WHERE id = 2;
INSERT INTO helloworld(id ,des, set)
SELECT 3,'2021 user count', hll_add_agg(hll_hash_text(id::text))
FROM account where create_time > '2021-01-01' and create_time < '2022-01-01'
GROUP BY 1;
INSERT INTO helloworld(id ,des, set)
SELECT 4,'2020 user count', hll_add_agg(hll_hash_text(id::text))
FROM account where create_time > '2020-01-01' and create_time < '2021-01-01'
GROUP BY 1;
SELECT hll_cardinality(set) FROM helloworld WHERE id = 3;
SELECT des, hll_cardinality(set) FROM helloworld;
select des, set || hll_hash_text(((select id::text from account where create_time>'2020-01-01' order by create_time asc limit 1 ))) = set from helloworld
---
select des, set || hll_hash_integer('12345') = set from helloworld
-- check union hash
select
-- hll_cardinality( hll_union(coalesce(t_all,hll_empty()), coalesce(t_2020,hll_empty())) )
hll_cardinality( t_all||t_2020||t_2021 )
from (
select
(select set from helloworld where id =2) as t_all,
(select set from helloworld where id =3) as t_2021,
(select set from helloworld where id =4) as t_2020
) as temp
```

more example
```SQL
INSERT INTO helloworld(id ,des, set)
SELECT 5,'2021 login user count', hll_add_agg(hll_hash_text(id::text)) from
(select distinct id from log.web_login_log where create_time > '2021-01-01' and create_time < '2022-01-01') as temp;
INSERT INTO helloworld(id ,des, set)
SELECT 6,'2022 login user count', hll_add_agg(hll_hash_text(id::text)) from
(select distinct id from log.web_login_log where create_time > '2020-01-01' and create_time < '2021-01-01') as temp;
-- select count(distinct id) from log.web_login_log where create_time > '2020-01-01' and create_time < '2022-01-01'
select
hll_cardinality(t_2021|| t_2020)
from (
select
(select set from helloworld where id =5) as t_2021,
(select set from helloworld where id =6) as t_2020
) as temp
```
category|value
---|---
(real) |411420
(hash union) |412954.08134970645
(two value add) |428936.78133179368
```SQL
---hash seed
INSERT INTO helloworld(id ,des, set)
SELECT 2,'all user count', hll_add_agg(hll_hash_text(id::text,10))
FROM account
GROUP BY 1;
```

### reference
[github](https://github.com/citusdata/postgresql-hll)
[usage example - from huawei cloud](https://support.huaweicloud.com/intl/en-us/sqlreference-dws/dws_06_0021.html)
[practice - from ali cloud](https://help.aliyun.com/document_detail/183124.html)