---
title: CloudQuery
type: slide
tags: geek-drink
slideOptions:
  theme: black
  transition: 'fade'
  parallaxBackgroundImage: 'https://go.justmiles.io/f/c82df4ba?download'
---

<!-- .slide: data-background="https://go.justmiles.io/f/686f794e?download"-->

# CloudQuery

Sync any source to any destination.

---

## What is this thing?

![](https://hackmd.io/_uploads/SJjw_UeO2.png)

cloudquery.io

Note:
 - Like many software projects, it's capable of more than it's useful for.
 - We're not going to rewrite all our ETL to be used with cloudquery.

----

- software that helps simplify data access 
- particularly useful for extracting resources from APIs

Note:
 - Take an API, any API, iterate over the resources, store the results in a DB

---

## Why use CloudQuery?

----

**Asset Inventory**

- Ingest AWS resources - anything with an ARN, huck it into Postgres <!-- .element: class="fragment" -->
- Ingest CrowdStrike resources (vulnerabilities, devices, etc.) <!-- .element: class="fragment" -->
- JumpCloud? Sophos?  <!-- .element: class="fragment" -->
- Unified view across multiple resources <!-- .element: class="fragment" -->
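
----

Once everything lands in one database, a unified view is mostly a join. A minimal sketch using Python's `sqlite3`; the tables `aws_instances` and `crowdstrike_devices` and all their columns are made up for illustration, since real names depend on each plugin's schema. It lists instances with no matching CrowdStrike device.

```python
import sqlite3

# Hypothetical stand-ins for two synced tables. Table and column names are
# assumptions for this sketch, not the plugins' real schemas.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE aws_instances (instance_id TEXT, hostname TEXT, account TEXT);
CREATE TABLE crowdstrike_devices (device_id TEXT, hostname TEXT, agent_version TEXT);
INSERT INTO aws_instances VALUES
  ('i-001', 'web-1', 'prod'),
  ('i-002', 'web-2', 'prod'),
  ('i-003', 'batch-1', 'dev');
INSERT INTO crowdstrike_devices VALUES
  ('d-100', 'web-1', '7.1'),
  ('d-101', 'batch-1', '7.0');
""")

# LEFT JOIN keeps every instance; a NULL device_id means no device matched.
unprotected = conn.execute("""
SELECT a.instance_id, a.hostname
FROM aws_instances AS a
LEFT JOIN crowdstrike_devices AS d ON d.hostname = a.hostname
WHERE d.device_id IS NULL
ORDER BY a.instance_id
""").fetchall()

print(unprotected)  # [('i-002', 'web-2')]
```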

----

**Compliance reporting**

- Unencrypted EBS volumes? <!-- .element: class="fragment" -->
- S3 buckets with public access? <!-- .element: class="fragment" -->
- Standard cloud best practices in one view <!-- .element: class="fragment" -->
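
----

A compliance check then becomes a filter plus a count. A rough sketch with Python's `sqlite3`; the `ebs_volumes` table and its columns are hypothetical placeholders for whatever the AWS plugin actually syncs.

```python
import sqlite3

# Toy data standing in for a synced EBS volume table. The table name and
# columns are assumptions for this sketch, not the real plugin schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ebs_volumes (volume_id TEXT, region TEXT, encrypted INTEGER);
INSERT INTO ebs_volumes VALUES
  ('vol-a', 'us-east-1', 1),
  ('vol-b', 'us-east-1', 0),
  ('vol-c', 'us-west-2', 0);
""")

# Count unencrypted volumes per region; encrypted is stored as 0 or 1.
unencrypted_by_region = conn.execute("""
SELECT region, COUNT(*)
FROM ebs_volumes
WHERE encrypted = 0
GROUP BY region
ORDER BY region
""").fetchall()

print(unencrypted_by_region)  # [('us-east-1', 1), ('us-west-2', 1)]
```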

---

## Sources

----

![](https://hackmd.io/_uploads/rJLhcLe_2.png)

----

![](https://hackmd.io/_uploads/HyJkjIlO3.png)

----

![](https://hackmd.io/_uploads/r1I-iUl_h.png)

----

![](https://hackmd.io/_uploads/SkwXoUl_n.png)

----

![](https://hackmd.io/_uploads/r1dVC8xdh.png)

---

## Destinations

----

![](https://hackmd.io/_uploads/B1GajLxuh.png)

----

![](https://hackmd.io/_uploads/Hk-m38gun.png)

---

## How CloudQuery Works

- Connects to a source through one of a variety of API integrations
- Extracts to a destination and handles the schema for you
- Written in Go, with a plugin architecture for sources and destinations
- When writing a plugin, you only need to translate each resource into the SDK's struct
- Simple YAML configs
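
----

The flow above, sketched in plain Python rather than the real Go SDK. Everything here is hypothetical: `Device`, `fetch_pages`, `to_device`, and `sync` are illustrative names, and the "API" is a hard-coded generator. The point is only the shape: page through resources, map each onto a struct, let the struct drive the table schema.

```python
import sqlite3
from dataclasses import astuple, dataclass, fields


@dataclass
class Device:
    """Flat record a hypothetical plugin emits for each API resource."""
    device_id: str
    hostname: str
    os: str


def fetch_pages():
    """Stand-in for a paginated API client; yields one page of raw dicts at a time."""
    yield [{"id": "d-1", "host": {"name": "web-1"}, "platform": "Linux"}]
    yield [{"id": "d-2", "host": {"name": "mac-7"}, "platform": "Mac"}]


def to_device(raw: dict) -> Device:
    """The plugin author's job: map one raw resource onto the struct."""
    return Device(device_id=raw["id"], hostname=raw["host"]["name"], os=raw["platform"])


def sync(conn: sqlite3.Connection) -> int:
    """Derive the table from the struct's fields, then load every resource."""
    columns = [f.name for f in fields(Device)]
    column_defs = ", ".join(name + " TEXT" for name in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f"CREATE TABLE IF NOT EXISTS devices ({column_defs})")
    count = 0
    for page in fetch_pages():
        rows = [astuple(to_device(raw)) for raw in page]
        conn.executemany(f"INSERT INTO devices VALUES ({placeholders})", rows)
        count += len(rows)
    conn.commit()
    return count


conn = sqlite3.connect(":memory:")
synced = sync(conn)
rows = conn.execute(
    "SELECT device_id, hostname, os FROM devices ORDER BY device_id"
).fetchall()
print(synced, rows)
```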

----

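```yaml
# Source: which plugin to run and where to send its data
kind: source
spec:
  name: "crowdstrike"
  registry: "github"            # plugin is fetched from GitHub
  path: "justmiles/crowdstrike"
  version: "v0.0.3"
  destinations: ["sqlite"]      # must match a destination name below
---
# Destination: where the synced tables are written
kind: destination
spec:
  name: "sqlite"
  path: "cloudquery/sqlite"
  version: "v2.2.0"
  spec:
    connection_string: ./db.sql  # path to the SQLite database file
```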

---

## CloudQuery DEMO

- metabase.ops.gofigg.net

---

## Honorable Mention

----

Steampipe - https://steampipe.io

```sql
select 
  runtime,
  count(*)
from 
  aws_lambda_function
group by 
  runtime;
 +------------+-------+
 |  runtime   | count |
 +------------+-------+
 | nodejs12.x |     1 |
 | python3.7  |     1 |
 | python3.8  |     2 |
 +------------+-------+
```

----

- Steampipe exposes APIs as PostgreSQL tables (via a foreign data wrapper)
- Instead of storing the data, it pulls it in real time, via SQL!
- Why not use this?
    - API rate limits
    - Ability to join across data sources
    - CloudQuery provides historical context

---

### Challenge

1. Install Steampipe:
    https://steampipe.io/downloads

2. Install the AWS plugin:
    ```sh
    steampipe plugin install aws
    ```
3. Launch the query editor:
    ```sh
    steampipe query
    ```
4. Solve the challenge, then Slack me the answer and the query you used. First correct answer wins.

----

How many ECR images do we have in the FI account?

Hint: `aws_ecr_?`

----

Which ECR repository has the most images?


---

## Challenge Answers

----

How many ECR images do we have in the FI account?

![](https://hackmd.io/_uploads/HySCEdxO2.png)

----

Which ECR repository has the most images?

![](https://hackmd.io/_uploads/SkuP4deOh.png)

---


## Resources

- https://www.cloudquery.io/docs/developers/creating-new-plugin
