# Snowflake and AWS S3 Integration: 2 Methods for Connection
There is high demand for storing and analyzing large volumes of data, and numerous data storage and data warehouse providers compete to satisfy it. In this article, we talk about two of the most popular services: Amazon S3 and Snowflake. We touch on why people choose them, highlight their benefits, and look at their native connectivity options and possible alternatives.
## Table of Contents
* [What Is Amazon S3?](#what-is-amazon-s3)
  * [Amazon S3 Benefits](#amazon-s3-benefits)
* [What Is Snowflake?](#what-is-snowflake)
  * [Snowflake Benefits](#snowflake-benefits)
* [Amazon S3 Snowflake Integration](#amazon-s3-snowflake-integration)
  * [Manual Connection of Amazon S3 with Snowflake](#manual-connection-of-amazon-s3-with-snowflake)
  * [Integration of Amazon S3 with Snowflake Using Skyvia](#integration-of-amazon-s3-with-snowflake-using-skyvia)
* [Conclusion](#conclusion)
## What Is Amazon S3?
Amazon S3 is one of the most popular cloud storage services. It stores data as objects within buckets. S3 earned its popularity due to its extremely high scalability and availability.
### Amazon S3 Benefits
Amazon S3 is a complex system that provides numerous benefits. To learn about them in detail, we recommend checking the [S3 official documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html). For the purposes of this overview, we highlight the most important ones.
#### Scalability
In Amazon S3, you can store an almost unlimited number and volume of objects. It allows you to scale storage services on the go according to your current needs.
#### Availability
Amazon [states](https://aws.amazon.com/s3/faqs/) that S3 Standard is designed for 99.99% availability of the data you store. That's an extremely high number, and judging by our own usage experience and overall feedback across the internet, it is pretty close to reality.
#### Cost Efficiency
With Amazon S3, you don't need to guess how much storage you're going to need in the future. You can store as much data as you want and access it when needed. You only pay for the storage you use.
#### Security
Amazon S3 ensures that you don't lose valuable data by automatically making copies of your objects on multiple devices across multiple facilities. It also lets you preserve, retrieve, and restore every version of every object in a bucket, so you can easily recover data that is accidentally deleted by a user or lost to an application failure.
Bucket policies, AWS Identity and Access Management (IAM) policies, access control lists (ACLs), and S3 Access Points help you precisely control the access to your objects and buckets.
## What Is Snowflake?
Snowflake is a cloud-based data warehouse. Like all data warehouses, Snowflake collects and aggregates data from one or many sources so data can be used to produce business insights.
Complex analytical queries can put a heavy load on a production database while they run, which is not always acceptable. That's why businesses use Snowflake as a repository for aggregating data and executing queries on it, separately from the main database.
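For example, a heavy reporting query like the one below can run in Snowflake against the aggregated data instead of the production database (the table and column names are hypothetical):
```
-- Hypothetical reporting query: monthly revenue per region.
-- Running it in Snowflake keeps the load off the production database.
SELECT region,
       DATE_TRUNC('month', order_date) AS order_month,
       SUM(amount) AS revenue
FROM orders
GROUP BY region, order_month
ORDER BY order_month, region;
```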
### Snowflake Benefits
Snowflake is mostly known for becoming the [biggest software IPO in history](https://edition.cnn.com/2020/09/16/investing/snowflake-ipo/index.html) back in 2020. And for good reason: the company managed to clearly communicate the benefits Snowflake provides to a wide audience. Here are the main ones.
#### Effective Data Sharing
Snowflake’s architecture enables smooth data sharing between users. You can even provide access to your data in Snowflake to consumers who aren't Snowflake customers by creating so-called reader accounts for them.
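As a minimal sketch, a reader account can be created with a single SQL statement (the account name and credentials below are placeholders, not real values):
```
-- Hypothetical reader account for a data consumer without a Snowflake subscription.
CREATE MANAGED ACCOUNT partner_reader
  ADMIN_NAME = partner_admin,
  ADMIN_PASSWORD = 'Choose-a-Strong-Password-1',
  TYPE = READER;
```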
#### Accessibility and Performance
There are no concurrency-related delays or failures: Snowflake's multi-cluster architecture makes sure all your concurrent queries run smoothly.
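For illustration, here is a sketch of a multi-cluster warehouse (an Enterprise Edition feature; the name and sizing are placeholder assumptions) that scales out automatically under concurrent load:
```
-- Hypothetical multi-cluster warehouse: Snowflake adds up to 3 clusters
-- when concurrent queries start to queue, and suspends the warehouse when idle.
CREATE WAREHOUSE reporting_wh
  WAREHOUSE_SIZE = 'XSMALL'
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 3
  AUTO_SUSPEND = 300
  AUTO_RESUME = TRUE;
```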
#### Security
Snowflake is [compliant](https://www.snowflake.com/product/security-and-trust-center/) with HIPAA, HITRUST CSF, PCI DSS, FedRAMP, and IRAP (Protected), and is trusted by many government, healthcare, and banking institutions.
#### Simplicity
It requires no on-premise installation or management and provides a number of built-in features such as speed optimization, security, easy data restore, and others.
## Amazon S3 Snowflake Integration
Amazon S3 stores huge volumes of data. You can turn this data into valuable insights by loading it into a purpose-built cloud data warehouse such as Snowflake. To integrate Snowflake and Amazon S3, you can use native methods or third-party solutions. Let's check both options in detail so you can decide what fits your needs best.
## Manual Connection of Amazon S3 with Snowflake
You can set up a manual connection between Snowflake and Amazon S3 using Snowflake's built-in tools and then access the staged S3 data with the `LIST` command. The process consists of six steps.
> Completing the instructions in this section requires AWS permissions to create and manage IAM policies and roles.
### Step 1. Create an IAM Policy
Snowflake needs the following permissions on an S3 bucket to access files in the folder (and sub-folders):
* `s3:GetBucketLocation`
* `s3:GetObject`
* `s3:GetObjectVersion`
* `s3:ListBucket`
Snowflake suggests creating an IAM policy as the best solution for this. Afterwards, you can attach the policy to a role and use the credentials generated by AWS.
* Log into the **AWS Management Console**.
* Go to **Dashboard** > **Identity & Access Management**.
* Open **Account settings** on the left.
* Activate your AWS region by expanding the **Security Token Service Regions** list and choosing **Activate** next to your region.
* Open **Policies** on the left.
* Click **Create Policy**.
* Click the **JSON** tab.
Add a policy document that allows Snowflake to access the S3 bucket and folder.
Copy and paste the following text into the policy editor:
> Make sure to replace `<bucket>` and `<prefix>` with your actual bucket name and folder path. Note that this sample policy also grants write and delete permissions (`s3:PutObject`, `s3:DeleteObject`, `s3:DeleteObjectVersion`), which Snowflake needs only if you plan to unload data to the bucket or purge staged files.
```
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:GetObjectVersion",
"s3:DeleteObject",
"s3:DeleteObjectVersion"
],
"Resource": "arn:aws:s3:::<bucket>/<prefix>/*"
},
{
"Effect": "Allow",
"Action": [
"s3:ListBucket",
"s3:GetBucketLocation"
],
"Resource": "arn:aws:s3:::<bucket>",
"Condition": {
"StringLike": {
"s3:prefix": [
"<prefix>/*"
]
}
}
}
]
}
```
* Click **Review policy**.
* Enter the policy name and click **Create Policy**.
### Step 2. Create the IAM Role in AWS
* While in the **AWS Management Console**, choose **Identity & Access Management (IAM)**.
* Choose **Roles** on the left.
* Click **Create Role**.
* Select **Another AWS Account** as the trusted entity type.
* In the **Account ID** field, enter your own AWS account ID. You will modify this value later.
* Select **Require External ID**. Enter any value as your ID. You will modify it later.
* Click **Next**.
* Select the policy you created in Step 1.
* Click **Next**.
* Enter a name and description for the role, and click **Create Role**.
Save the Role ARN value from the role summary page. In the next step, you will create a Snowflake integration that references this role.
### Step 3. Create a Cloud Storage Integration in Snowflake
Create a storage integration using the [CREATE STORAGE INTEGRATION](https://docs.snowflake.com/en/sql-reference/sql/create-storage-integration.html) command.
```
CREATE STORAGE INTEGRATION <integration_name>
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = 'S3'
ENABLED = TRUE
STORAGE_AWS_ROLE_ARN = '<iam_role>'
STORAGE_ALLOWED_LOCATIONS = ('s3://<bucket>/<path>/', 's3://<bucket>/<path>/')
[ STORAGE_BLOCKED_LOCATIONS = ('s3://<bucket>/<path>/', 's3://<bucket>/<path>/') ]
```
Replace the following placeholders with your values (a complete example follows the list):
* `integration_name`
* `iam_role`
* `bucket`
* `path`
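For instance, a filled-in statement might look like this (the role ARN and account ID are hypothetical; the integration name `s3_int` and location `s3://bucket1/path1/` match the stage example in Step 6):
```
-- Sketch of a filled-in storage integration; replace the ARN and bucket
-- path with your own values.
CREATE STORAGE INTEGRATION s3_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-access-role'
  STORAGE_ALLOWED_LOCATIONS = ('s3://bucket1/path1/');
```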
### Step 4. Retrieve the AWS IAM User for your Snowflake Account
Execute the [DESCRIBE INTEGRATION](https://docs.snowflake.com/en/sql-reference/sql/desc-integration.html) command to retrieve the ARN for the AWS IAM user that was created automatically for your Snowflake account:
`DESC INTEGRATION <integration_name>;`
Replace `<integration_name>` with the name of the integration you created in Step 3.
Save the following values for the next step:
`STORAGE_AWS_IAM_USER_ARN`
`STORAGE_AWS_EXTERNAL_ID`
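For example, with the integration from the sketch above (the returned values here are purely illustrative, not real):
```
DESC INTEGRATION s3_int;

-- Relevant rows of the output (illustrative values):
-- STORAGE_AWS_IAM_USER_ARN   arn:aws:iam::123456789001:user/abc1-b-self1234
-- STORAGE_AWS_EXTERNAL_ID    MYACCOUNT_SFCRole=2_a1b2c3d4e5f6g7h8i9j0
```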
### Step 5. Grant the IAM User Permissions to Access Bucket Objects
* Log into the **AWS Management Console**.
* Choose **Identity & Access Management** (IAM).
* Choose **Roles** from the left-hand navigation pane.
* Click on the role you created in Step 2.
* Click on the **Trust Relationships** tab.
* Click the **Edit Trust Relationship** button.
* Modify the policy document with the **DESC STORAGE INTEGRATION** output values you recorded in Step 4: replace `<snowflake_user_arn>` with the `STORAGE_AWS_IAM_USER_ARN` value and `<snowflake_external_id>` with the `STORAGE_AWS_EXTERNAL_ID` value.
```
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "",
"Effect": "Allow",
"Principal": {
"AWS": "<snowflake_user_arn>"
},
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": {
"sts:ExternalId": "<snowflake_external_id>"
}
}
}
]
}
```
* Click **Update Trust Policy**.
### Step 6. Create an External Stage
Create the stage using the [CREATE STAGE](https://docs.snowflake.com/en/sql-reference/sql/create-stage.html) command.
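The stage definition below references a named file format, `my_csv_format`, which must already exist. If you don't have one, here is a minimal sketch for comma-separated files with a header row:
```
-- Minimal CSV file format assumed by the stage example below.
CREATE FILE FORMAT my_csv_format
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1;
```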
```
create stage my_s3_stage
storage_integration = s3_int
url = 's3://bucket1/path1'
file_format = my_csv_format;
```
Finally, you can access your S3 data with the `LIST @<stage_name>;` command, where `<stage_name>` is the name of the stage you created.
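From here, loading the staged files into a table takes one more command, COPY INTO. A sketch, assuming a target table `my_table` already exists with columns matching the CSV files:
```
-- List the staged files, then load them into the target table.
LIST @my_s3_stage;

COPY INTO my_table
  FROM @my_s3_stage
  FILE_FORMAT = (FORMAT_NAME = my_csv_format);
```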
## Integration of Amazon S3 with Snowflake Using Skyvia
If you don't want to spend time creating and configuring a manual connection between Amazon S3 and Snowflake, you can use Skyvia to create packages and run them manually or on a schedule, depending on your needs. To do so, follow three simple steps:
* Create Amazon S3 Connection.
* Create Snowflake Connection.
* Create and run a package of choice. We create an Import package as an example of importing a CSV file from Amazon S3 to Snowflake.
The instructions below assume that you have already created a Skyvia account. If you haven't, you can sign up for free on the [Skyvia](https://app.skyvia.com) app page.
### Create Amazon S3 Connection
To create an Amazon S3 connection, go to **+New > Connection**, select Amazon S3, and do the following:
![](https://i.imgur.com/PsgJdLg.png)
1. Enter **Access Key ID** --- the first part of your Amazon Web Services access key.
2. Enter **Secret Key** --- the second part of your Amazon Web Services access key.
3. Enter the **Security Token** (optional) --- a session token used with temporary security credentials.
4. Set the **Region** --- the AWS region where your S3 storage is hosted.
5. Enter the **Bucket Name** --- the name of your S3 bucket to load CSV files from or to.
6. Click **Create Connection**.
### Create Snowflake Connection
To create a Snowflake connection, go to **+New > Connection**, select Snowflake, and do the following:
![](https://i.imgur.com/NHdI9b6.png)
1. Enter your Snowflake **Domain** name.
2. Enter your Snowflake **User** name.
3. Enter the user **Password**.
4. Enter the **Database** name.
5. Click **Create Connection**.
### Create Integration Package
Once connections to Snowflake and Amazon S3 are created, you can create a package of choice. For this example, we create a CSV import package:
1. Go to **+New > Import**.
2. Choose the **CSV from storage service** option.
![](https://i.imgur.com/Cb1sZUU.png)
3. Choose Amazon S3 as the **Source** and Snowflake as the **Target**, then click **Add New**.
4. Select your CSV file and click **Next Step**.
5. Choose the table you want to import data to and the type of import action.
![](https://i.imgur.com/1ZqlScV.png)
6. Map columns if needed and click **Save**.
The Import package is ready to use. You can run it manually or on schedule.
## Conclusion
Amazon S3 and Snowflake are great cloud tools for storing and working with data. Integrating them can give you a deeper understanding of the data you store in S3. You can use both built-in and third-party solutions to make Amazon S3 and Snowflake share data. The integration process with Skyvia requires little more than knowing your credentials, and you can additionally combine data imported from S3 with data from other sources. However, if you prefer the classical manual integration, it's still an option.