Trino (formerly PrestoSQL)
===
## What is Trino
- Distributed SQL query engine for big data analytics
- MPP (massively parallel processing) architecture
- Difference from Presto (Facebook)
  - Trino is a fork of Presto, much as MariaDB (Trino) is a fork of MySQL (Presto)
- Background
  - Performance
  - Multiple data sources
- Features
  - Connects to many sources, like an ETL tool
    - https://trino.io/docs/current/connector.html
    - https://trino.io/blog/2023/07/12/trino-fest-2023-let-it-snow-recap.html
  - Analyzes data across different sources using standard SQL
    - Pushdown: filters and other operations are delegated to the source system where the connector supports it
    - The remaining processing is done in memory
  - ETL/EL between different data sources
## Demo
- Catalog (one per data source, e.g. a database)
  - Schema
    - Tables/Views
- Define a catalog
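```shell
# Open a shell inside the Trino container (container ID from this demo)
docker exec -it 453158df942d /bin/bash
# Inside the container: list the catalog files and view the MongoDB one
ls /etc/trino/catalog/
cat /etc/trino/catalog/mongo.properties
# Back on the host: start the Trino CLI in the container named "trino"
docker exec -it trino trino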
```
```sql
SHOW CATALOGS;
```
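For reference, a catalog file such as `mongo.properties` is a short list of key-value pairs. The following is a minimal sketch, not the contents of the demo file: `connector.name` and `mongodb.connection-url` are the MongoDB connector's property names, while the user, password, and host are placeholders to fill in.

```properties
# Hypothetical /etc/trino/catalog/mongo.properties; all values are placeholders
connector.name=mongodb
mongodb.connection-url=mongodb://<user>:<password>@<host>:27017/
```

The file name without the `.properties` extension becomes the catalog name, which is why this catalog is addressed as `mongo` in the queries.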
- Explore data
- MongoDB
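```shell
# Open a shell inside the MongoDB container (container ID from this demo)
docker exec -it 648cc1ab7882 bash
# Connect with mongosh, then run the two commands below at its prompt
mongosh "mongodb://admin:nttcom00330033@0.0.0.0:27017/database?authSource=admin"
use cdrdb
db.cdrsampletopic.find()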
```
- MongoDB in Trino
```sql
-- From the host shell, start the CLI first: docker exec -it trino trino --catalog mongo
SHOW SCHEMAS FROM mongo;
SHOW TABLES FROM mongo.cdrdb;
SELECT * FROM mongo.cdrdb.cdrsampletopic;
```
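MongoDB collections have no fixed schema, so the connector infers column names and types from the stored documents. A quick way to check what Trino inferred for the demo collection is `DESCRIBE`; the result depends on the data, so no output is shown here.

```sql
-- List the columns and types Trino inferred for the collection
DESCRIBE mongo.cdrdb.cdrsampletopic;
```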
- Join different data sources
- SQL Server in Trino
```sql
select * from sqlserver.demo.staff;
```
- Join
```sql
SELECT
    staff.*,
    cdrsampletopic.description
FROM sqlserver.demo.staff
INNER JOIN mongo.cdrdb.cdrsampletopic
    ON staff.name = cdrsampletopic.name;
```
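To see how Trino splits a cross-source query like this one, it can help to look at the query plan. The sketch below adds a filter to the same join and asks for the plan with `EXPLAIN`; the value `'Alice'` is a made-up example, and the table aliases `s` and `c` are introduced here only for readability.

```sql
-- Show the plan instead of running the query
EXPLAIN
SELECT s.name, c.description
FROM sqlserver.demo.staff AS s
INNER JOIN mongo.cdrdb.cdrsampletopic AS c
    ON s.name = c.name
WHERE s.name = 'Alice';  -- hypothetical filter value
```

When a predicate is pushed down, it usually shows up as part of the table scan for that source rather than as a separate filter step above it. Whether that happens depends on the connector and the column type, so the plan is worth checking rather than assuming.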
- Load a table from MongoDB into SQL Server
```sql
show tables from sqlserver.demo;
```
```sql
CREATE TABLE sqlserver.demo.staff_from_mongo
AS
SELECT * FROM mongo.cdrdb.cdrsampletopic;
```
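A simple sanity check after the load is to compare row counts on both sides. This is only a sketch using the demo's table names; the counts should match as long as the source collection did not change between the two queries.

```sql
-- Row count in the source MongoDB collection
SELECT count(*) FROM mongo.cdrdb.cdrsampletopic;
-- Row count in the newly created SQL Server table
SELECT count(*) FROM sqlserver.demo.staff_from_mongo;
```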
## Thoughts
- Combining Hadoop, Hive, and Trino
  - Trino plays a role similar to the data warehouse layer in Snowflake
  - Compared to Snowflake, Trino can query multiple sources
- Use in ETL
  - Compared to Airbyte
    - Everything is managed in standard SQL, so anyone who can write SQL can do ETL
    - Troubleshooting is easier (it feels more comfortable)
- Skill set
  - Python: Airflow
  - SQL: Trino
  - Java: Talend...
  - No-code: Airbyte
  - YAML: Meltano