# Snowflake
###### tags: `snowflake`
## Load data into Snowflake
* Google BigQuery
* Amazon S3
* Parquet files from an Amazon S3 bucket
* when additional batches of files are added into S3, or there are just a lot of groups of files
* raw files
### Uploading files
1. Create a schema: `create schema [schema name]`
2. Create a stage: a pointer to the location of the files to import
(Snowflake stages can be internal or external; an internal-stage example follows the external one below)
* show the available file formats: `show file formats`
* create a file format to use with the stage, then describe it (inspection commands follow the definitions below):
```
create or replace file format parquet
  type = PARQUET
  trim_space = false
  null_if = ()
  binary_as_text = true
  compression = AUTO;
```
```
create or replace file format gz_format
  type = CSV
  field_delimiter = '"'
  skip_header = 1
  null_if = ('NULL', 'null')
  empty_field_as_null = true
  compression = 'GZIP';
```
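To check what was created (both are standard Snowflake commands):
```
-- list all file formats visible in the current schema
show file formats;
-- show the options set on a specific format
describe file format gz_format;
```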
### Create stage
```
create or replace stage mprod_cm_demo_stage url='s3://gn5456-prodanalytics-snowflake-us-west-2/music/db-delph-trgt-1/db_replica_full/MPROD_CM/'
credentials=(aws_key_id='<your_aws_key_id>' aws_secret_key='<your_aws_secret_key>')
file_format= gz_format;
```
* list the stages created: `show stages`
* list the files in the S3 bucket: `list @mprod_cm_demo_stage`
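`mprod_cm_demo_stage` above is an external stage (it points at S3). For comparison, a minimal internal-stage sketch; the stage name and file path are hypothetical, and `put` has to be run from a client such as SnowSQL rather than the worksheet:
```
-- internal (Snowflake-managed) stage; the name is hypothetical
create or replace stage my_internal_stage
  file_format = (format_name = 'gz_format');

-- upload a local file into the internal stage (run from SnowSQL, not the web UI)
-- put file:///tmp/mprod_cm_sample.csv.gz @my_internal_stage;
```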

* create table in snowflake:
```
create table ddhar_test.stg.mprod_cm
(
MPROD_CM_ID bigint,
ROOT_OBJECT_ID bigint,
OBJECT_ID bigint,
MPROD_ID string,
MPROD_OWNER string,
MPROD_TABLE string,
MPROD_PARENT_ID string,
MPROD_PARENT_TABLE string,
NAME string,
ABREVIATED_NAME string,
DESCRIPTION string,
LANGUAGE_CONTEXT_ID bigint,
SORT bigint,
TYPE string,
SOURCE_ID bigint,
STATUS_ID bigint,
DESTINATION_ID bigint,
CDATE datetime,
UDATE datetime,
SESSION_ID bigint
)
```
* Copy the files already sitting in the S3 bucket into the table, casting each column:
* Note that you can't use this form when creating pipes; there you need to write a simpler select statement:
```
copy into DDHAR_TEST.STG.MPROD_CM from
( select
t.$1::bigint,
t.$2::bigint,
t.$3::bigint,
t.$4::string,
t.$5::string,
t.$6::string,
t.$7::string,
t.$8::string,
t.$9::string,
t.$10::string,
t.$11::string,
t.$12::bigint,
t.$13::bigint,
t.$14::string,
t.$15::bigint,
t.$16::bigint,
t.$17::bigint,
t.$18::datetime,
t.$19::datetime,
t.$20::bigint
from @mprod_cm_demo_stage (file_format => 'gz_format') t
);
```
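After the copy finishes, a quick sanity check on the target table (standard queries, nothing dataset-specific):
```
-- confirm rows landed and eyeball a few of them
select count(*) from ddhar_test.stg.mprod_cm;
select * from ddhar_test.stg.mprod_cm limit 10;
```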
## Create Pipe
* automated ingestion of data into Snowflake
* you point it at a bucket and enable continuous reading of a folder
* say you want to automate a pipeline for certain files from your S3 bucket: use the stage pointing at the source to pipe the files into a table in Snowflake
* unfortunately I haven't figured out a way for this to take specific columns :(
* show the pipes already created: `show pipes`
```
create pipe if not exists my_mprod_cm_pipe
  auto_ingest = true as
  copy into ddhar_test.stg.mprod_cm
  from @mprod_cm_demo_stage/
  file_format = (format_name = 'gz_format');
```
* you do this in order to set data types for all rows
* you can also make a pipe and a table to copy the files into:
```
create pipe if not exists my_mprod_cm_pipe
  auto_ingest = true as
  copy into ddhar_test.stg.mprod_cm
  from @mprod_cm_demo_stage/
  file_format = (format_name = 'parquet');
```
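Once a pipe exists, you can check that it's running and, for `auto_ingest`, grab the SQS notification channel that the S3 bucket's event notifications need to point at (all standard Snowflake commands; the pipe name matches the one above):
```
-- current state of the pipe (executionState, pendingFileCount, notification channel, ...)
select system$pipe_status('my_mprod_cm_pipe');

-- the notification_channel column is the SQS ARN to configure on the S3 bucket
show pipes;

-- manually queue files already sitting in the stage
alter pipe my_mprod_cm_pipe refresh;
```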
## Querying files from S3 in Amazon Athena
1. Build a crawler that captures metadata about a table
* it samples the data (roughly 5-10%) and captures metadata about the table
* once you create it, it works out what types of files you're dealing with, etc.
2. Build a database where the crawler stores the information it gathers about the tables (an example query is below)
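Once the crawler has run and populated the catalog, the table can be queried from Athena with plain SQL; the database and table names below are hypothetical:
```
-- Athena reads the files in S3 using the metadata the crawler captured
select *
from my_glue_database.mprod_cm
limit 10;
```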
#### Schema migration script
* when you have a lot of tables to copy! (a sketch of the idea is below)
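A minimal sketch of the idea, assuming the target tables already exist in the `STG` schema and each table has a matching folder under the stage (both are assumptions, not something covered above): generate one `copy into` statement per table from the information schema, then run the generated statements.
```
-- generate a copy statement per table (illustrative only; assumes one stage folder per table)
select 'copy into ddhar_test.stg.' || table_name ||
       ' from @mprod_cm_demo_stage/' || table_name || '/' ||
       ' file_format = (format_name = ''gz_format'');' as copy_stmt
from ddhar_test.information_schema.tables
where table_schema = 'STG';
```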
#### Importing multiple schemas into Snowflake
1. Oracle Data Modeler
2. Create a logical model
* logical : relational :: entity : table (an entity in the logical model becomes a table in the relational model)
3. Convert the data types correctly by using domains: logical concepts that map to native types. You can map a native type to a domain, then map the domain to another native type on the target side, etc.
* to get the mapping into Snowflake: there is no difference between the different types of VARCHAR
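For example, the two columns below behave the same in Snowflake apart from the declared maximum length; length doesn't affect storage or performance, so the domain-to-VARCHAR mapping can stay simple (the table name is hypothetical):
```
-- VARCHAR with no length defaults to the maximum (16,777,216 characters)
create or replace table varchar_demo (
  name_unbounded varchar,
  name_bounded   varchar(100)
);
```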