# AWS Athena Walkthrough - Goal: Build an Athena table on top of HTTPD access logs 🏁 - Sample data: https://github.com/elastic/examples/blob/master/Common%20Data%20Formats/apache_logs/apache_logs ## Steps Replace instances of "mmaddox" below with your username. 1. Upload your data ```bash= aws s3 cp <mylogfile> s3://<athena query results bucket>/mmaddox/access_logs/<mylogfile> ``` 2. Create your DB. ```sql= CREATE DATABASE mmaddox; ``` 3. Build a table What does lines 30-35 do? ```sql= CREATE EXTERNAL TABLE mmaddox.access_logs ( log_line string ) ROW FORMAT SERDE 'com.amazonaws.glue.serde.GrokSerDe' WITH SERDEPROPERTIES ( 'input.format'='%{GREEDYDATA:log_line}' ) STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 's3://<athena query results bucket>/mmaddox/access_logs/'; ``` 4. Query your table ```sql= SELECT * FROM mmaddox.access_logs LIMIT 10; ``` 5. Build out a rich grok expression Here is what chatGPT spat out: %{IP:client_ip} - - \[%{HTTPDATE:timestamp}\] \"%{WORD:method} %{URIPATHPARAM:request} HTTP/%{NUMBER:http_version}\" %{NUMBER:response_code} %{NUMBER:response_size} \"%{URIPATHPARAM:referrer}\" \"%{DATA:user_agent}\" Humans FTW - %{IP:client_ip} - - \[%{HTTPDATE:timestamp}\] \"%{WORD:method} %{URIPATHPARAM:request} HTTP/%{NUMBER:http_version}\" %{NUMBER:response_code} %{NUMBER:response_size} \"%{GREEDYDATA:url}\" \"%{DATA:user_agent}\" - Navigate to https://grokdebugger.com/ - Drop a few log lines into it - Create a grok expression that matches the line 6. The result ```sql= CREATE EXTERNAL TABLE justmiles.access_logs ( client_ip string, timestamp string, method string, request string, http_version string, response_code string, response_size string, url string, user_agent string ) ROW FORMAT SERDE 'com.amazonaws.glue.serde.GrokSerDe' WITH SERDEPROPERTIES ( 'input.format'='%{IP:client_ip} - - \\[%{HTTPDATE:timestamp}\\] \\"%{WORD:method} %{URIPATHPARAM:request} HTTP/%{NUMBER:http_version}\\" %{NUMBER:response_code} %{NUMBER:response_size} \\"%{GREEDYDATA:url}\\" \\"%{DATA:user_agent}\\"' ) STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 's3://aws-sftp-test-bucket-local/rbirnbaum/access_logs/'; ``` 7. Clean Up ```sql= DROP TABLE IF EXISTS mmaddox.access_logs; DELETE DATABASE mmaddox; ```
×
Sign in
Email
Password
Forgot password
or
By clicking below, you agree to our
terms of service
.
Sign in via Facebook
Sign in via Twitter
Sign in via GitHub
Sign in via Dropbox
Sign in with Wallet
Wallet (
)
Connect another wallet
New to HackMD?
Sign up