# Xendit Reliability ETL
This project organizes the data we need to analyze how reliably we operate as an engineering organization. The current data sources we look at are:
1. Incidents
2. PagerDuty
3. Sentry
A primer on the above can be found in our [RFC documents](https://drive.google.com/drive/u/1/folders/1ltxy0JbLY53PX7mtwow7Depc2FqflfJ-), so it will not be covered in this document.
## Extension
There are a few areas you need to understand before you can contribute to this project:
1. Source Data
2. Data Schemas
### Source Data
As mentioned above, the data sources that currently inform our reliability picture are (this list will expand):
1. Incidents
2. PagerDuty
3. Sentry
4. Github
5. DataDog
#### Incidents
Our incidents data comes from [postmortem docs](https://drive.google.com/drive/u/1/folders/0ANGugt3gnJvqUk9PVA). We have automated this. Follow the steps below to perform ETL on the incidents data:
##### Steps
- Set `google_client_id`, `google_client_secret`, and `google_client_redirect_uri` as environment variables.
- Run `node scripts/extractors/postmortem.js`. This script downloads all postmortem docs from the drive and stores them as HTML files in `raw_data/postmortems`.
- Run `node scripts/extractors/incidents.js`. This script runs the extract stage and writes the parsed incident data to `clean_data/incidents.json`.
##### Structures
```
[
  ...
  {
    "incident_name": "<INCIDENT_NAME>",
    "postmortem_link": "<POSTMORTEM_LINK>",
    "statuspage_link": "<STATUSPAGE_LINK>",
    "incident_date": "<INCIDENT_DATE>",
    "affected_entities": [
      {
        "entity_name": "<AFFECTED_ENTITY_NAME>",
        "products": [
          "<AFFECTED_PRODUCT>",
          ...
        ]
      },
      ...
    ],
    "root_causes": [
      "<ROOT_CAUSE>",
      ...
    ],
    "stats": {
      "time_of_first_trigger": "<TIME_OF_FIRST_TRIGGER>",
      "time_of_customer_detect": "<TIME_OF_CUSTOMER_DETECT>",
      "time_of_internal_detect": "<TIME_OF_INTERNAL_DETECT>",
      "time_of_recovery": "<TIME_OF_RECOVERY>",
      "time_of_reconcile": "<TIME_OF_RECONCILE>",
      "time_of_rca": "<TIME_OF_RCA>",
      "number_of_impacted_customers": "<NUMBER_OF_IMPACTED_CUSTOMERS>",
      "number_of_failed_requests": "<NUMBER_OF_FAILED_REQUESTS>",
      "severity_level": "<SEVERITY_LEVEL>"
    },
    "reliability_gaps": "<RELIABILITY_GAPS>"
  }
  ...
]
```
#### PagerDuty
The process for getting performance data from PagerDuty is as follows:
1. Run the extractor script with `PAGERDUTY_SECRET=<YOUR_PAGERDUTY_SECRET> node scripts/extractors/pagerduty.js`
2. Make sure the raw data gets written to:
1. `/raw_data/pagerduty_services/`
2. `/raw_data/pagerduty_incidents/`
3. `/raw_data/pagerduty_log_entries/`
4. `/raw_data/pagerduty_teams/`
5. `/raw_data/pagerduty_users/`
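The PagerDuty REST API paginates its list endpoints with `offset`/`limit` parameters and a `more` flag, so an extractor like the one above typically loops until `more` is false. A minimal sketch of that loop, where `fetchPage` is a stand-in for the real HTTP call (which would send the `Authorization: Token token=<YOUR_PAGERDUTY_SECRET>` header):

```javascript
// Hypothetical pagination loop for PagerDuty's offset-based pagination.
// `fetchPage(offset, limit)` stands in for the real HTTP request and must
// resolve to an object of the shape { resources: [...], more: boolean }.
async function fetchAll(fetchPage, limit = 100) {
  const results = [];
  let offset = 0;
  for (;;) {
    const page = await fetchPage(offset, limit);
    results.push(...page.resources);
    // The API sets more=false on the final page.
    if (!page.more) break;
    offset += limit;
  }
  return results;
}
```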
#### Sentry
Sentry offers us an API to read the information about our usage and statistics. In order to pull this data, you must have access to a Sentry API secret key (we're currently using personal tokens to avoid having to go through the OAuth flow).
1. Run the Sentry extractor with `SENTRY_API_KEY=<YOUR_API_KEY> node scripts/extractors/sentry.js`
2. Check to make sure that the data has been written to:
1. `/raw_data/sentry_projects/`
2. `/raw_data/sentry_issues/`
3. `/raw_data/sentry_teams/`
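One common question the raw Sentry data can answer is which projects are generating the most issues. The helper below is a hypothetical aggregation over the dumped issue objects, assuming each carries a `project` object with a `slug` (as Sentry issue payloads do):

```javascript
// Hypothetical aggregation: count raw Sentry issues per project so the
// load stage can compare error volume across services.
function issuesPerProject(issues) {
  const counts = {};
  for (const issue of issues) {
    const slug = issue.project && issue.project.slug;
    // Skip malformed records with no project reference.
    if (!slug) continue;
    counts[slug] = (counts[slug] || 0) + 1;
  }
  return counts;
}
```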
#### Github
Our Github data lets us check whether we've implemented our RFCs correctly. To pull the Github data, run the following:
1. Run the extractor by calling `GITHUB_API_KEY=<YOUR_API_KEY> node scripts/extractors/github.js`
2. Check to make sure the data has been written to:
1. `/raw_data/github_teams/`
2. `/raw_data/github_repos/`
3. `/raw_data/github_repo_teams/`
4. `/raw_data/github_repo_hooks/`
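One example of an RFC-compliance check this data enables: flagging repos that have no webhook configured. The sketch below is hypothetical (not one of the extractor scripts) and assumes `repoHooks` maps a repo's `full_name` to its array of hook objects from `/raw_data/github_repo_hooks/`:

```javascript
// Hypothetical check: list repos with no webhook configured, a likely
// signal that an RFC-mandated integration was never set up.
function reposMissingHooks(repos, repoHooks) {
  return repos
    .filter((repo) => !(repoHooks[repo.full_name] || []).length)
    .map((repo) => repo.full_name);
}
```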
#### DataDog
To pull DataDog data, run the following:
1. Run the extractor by calling `DATADOG_API_KEY=<YOUR_API_KEY> DATADOG_APPLICATION_KEY=<YOUR_APP_KEY> node scripts/extractors/datadog.js`
2. Check to make sure the data has been written to:
1. `/raw_data/datadog_monitors/`
2. `/raw_data/datadog_synthetics/`
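A simple summary one might build from the raw monitor dumps is a tally by monitor `type` (DataDog monitor objects carry a `type` field such as `"metric alert"`). This is a hypothetical helper, not part of the extractor:

```javascript
// Hypothetical summary: tally DataDog monitors by their `type` field
// from the raw monitor dumps in /raw_data/datadog_monitors/.
function monitorsByType(monitors) {
  return monitors.reduce((tally, monitor) => {
    tally[monitor.type] = (tally[monitor.type] || 0) + 1;
    return tally;
  }, {});
}
```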
### Data Schemas
Below is an outline of the data schemas that the source data is loaded into, which lets us run complex queries more efficiently.
#### Incidents

#### PagerDuty

#### Sentry

#### Github

#### DataDog

#### Developer Survey

1. Incidents
   1. Incidents
   2. AffectedProducts
   3. RootCauses
2. PagerDuty
   1. PagerDutyTeams
   2. PagerDutyUsers
   3. PagerDutyUserTeams
   4. PagerDutyUserContactMethods
   5. PagerDutyServices
   6. PagerDutyIncidents
   7. PagerDutyLogEntries
3. Sentry
   1. SentryProjects
   2. SentryTeams
   3. SentryIssues
   4. SentryIssueEvents
4. Github
   1. GithubRepos
   2. GithubTeams
   3. GithubRepoTeams
   4. GithubRepoHooks
5. DataDog
   1. DataDogMonitors
   2. DataDogSynthetics