Data Source Course

# Data Source Course

# Introduction

## Who is this course for?

This course is designed for anyone interested in learning how to create and deploy a data source on BandChain. It is recommended that those attempting this course are able to code in [Python](https://www.python.org/) and have completed the BandChain course (add-link-here).

## Recommended reading material

While not explicitly required, it is recommended to read up on the following before starting the course:

- [yoda](https://docs.bandchain.org/technical-specifications/yoda.html)

## Course structure

### Chapter 1 - Data source overview

In this section, you'll learn what a data source is and how it works.

### Chapter 2 - How to create a data source

This section will teach you the skills needed to create your own data source.

### Chapter 3 - How to deploy/edit a data source

You'll learn how to deploy or edit a data source on BandChain in this section.

## Course objective

At the end of this course, you'll be able to:

- Understand how a data source works
- Create a working data source
- Deploy and edit a data source on BandChain

# Chapter 1 - Data source overview

## What is a data source?

A data source is the most fundamental unit in BandChain's oracle system. It describes a procedure to retrieve and process a raw data point from an external source.

## How does a data source work?

When a `Request` event is received on BandChain, the associated data source IDs and calldata are collated along with the request verification info and sent to Yoda's executor environment to await data source execution. The results are then collected and constructed into a list of `MsgReportData`, which is submitted as a transaction using the reporter key.

```mermaid
flowchart TB
    A["Send MsgRequestData"] --> B["Get data source IDs and associated calldata"]
    B --> C["Collect data sources, calldata and verify with reporter key"]
    C --> D["Submit collected details to Yoda's executor environment"]
    D --> E["Collect all results and construct MsgReportData"]
    E --> F["Submit all MsgReportData"]
```

# Chapter 2 - How to create a data source

## Writing the data source

[//]: # (ds can be any type of executable ... but have support for python runtime and this course will focus on that)

While a data source can be any type of executable, in this course we'll focus on creating a data source as a Python script.

As the data source will be executed in Yoda's Python runtime, the Python packages that a data source can use are limited to those available in that runtime. The available packages can be found [here](https://github.com/bandprotocol/data-source-runtime/blob/master/requirements.txt).

### Data source workflow

The execution of a data source can be divided into three main stages:

- Input
- Computation
- Output

#### Input

As Yoda treats each data source as an executable, a [shebang line](https://en.wikipedia.org/wiki/Shebang_(Unix)) is required at the top of the Python script so that Yoda knows which interpreter to use when executing it.

Any inputs to the script are passed as arguments following the executable call. We can therefore use the `argv` list from Python's [`sys`](https://docs.python.org/3/library/sys.html) module to read the command line arguments that were passed to the Python script.

The example below shows a data source that takes two inputs, `symbol` and `multiplier`, of type `str` and `int` respectively:

```python
#!/usr/bin/env python3

import sys


def main(symbol: str, multiplier: int):
    pass


if __name__ == "__main__":
    symbol = sys.argv[1]  # The symbol input
    multiplier = int(sys.argv[2])  # The multiplier input
    print(main(symbol, multiplier))
```

#### Computation

During the computation stage, the requested data should be retrieved and parsed into the format our oracle script expects to receive.

For example, suppose our oracle script requires the price of a particular asset and the endpoint we've queried returns a message in the following format:

```json
{
  "bitcoin": {
    "usd": 19965.3
  }
}
```

We'd need to parse the data down to just `19965.3`.

Below is an example of a data source that queries an endpoint for the Bitcoin Price Index (BPI) and parses the response into just the BPI rate.

```python
#!/usr/bin/env python3

import requests
from typing import Literal


def main(quote_currency: Literal["USD", "GBP", "EUR"]):
    # Checks if the quote currency is supported.
    # If not, raises an exception.
    if quote_currency not in {"USD", "GBP", "EUR"}:
        raise Exception(f"Quote currency {quote_currency} is not supported")

    # Sends an HTTP GET request to the given URL.
    r = requests.get("https://api.coindesk.com/v1/bpi/currentprice.json")

    # If the response status code indicates an error, raises an exception.
    r.raise_for_status()

    # Attempts to get a specific value from the response. If the response
    # format is not as expected, raises an exception.
    try:
        return r.json()["bpi"][quote_currency]["rate_float"]
    except Exception as e:
        raise e
```

#### Output

Now that we're able to get the data that the oracle script requires, we need a way to send it back to BandChain. To do so, we print our output so that Yoda can read it and send the data back to BandChain.

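To make the role of the printed output concrete, here is a minimal sketch of a local test harness that runs a data source the way an executor might: it invokes the script as a subprocess with its inputs as command line arguments, then reads standard output and the process exit code. This is an illustration only and not Yoda's actual implementation; `run_data_source` and the stand-in script `DEMO_SOURCE` are hypothetical names introduced here.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_data_source(script_path: str, *args: str, timeout: float = 10.0):
    """Run a data source script and return (exit_code, stdout, stderr).

    Hypothetical helper for local testing; it is not part of Yoda.
    """
    completed = subprocess.run(
        [sys.executable, script_path, *args],  # inputs are passed as arguments
        capture_output=True,                   # capture stdout and stderr
        text=True,                             # decode bytes to str
        timeout=timeout,
    )
    return completed.returncode, completed.stdout.strip(), completed.stderr.strip()


# A stand-in data source (assumption for this demo): multiplies two integers
# and prints the result to standard output.
DEMO_SOURCE = """#!/usr/bin/env python3
import sys
print(int(sys.argv[1]) * int(sys.argv[2]))
"""

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "demo_source.py"
        script.write_text(DEMO_SOURCE)
        code, out, err = run_data_source(str(script), "21", "2")
        print(code, out)  # prints: 0 42
```

A harness like this is handy for checking a data source before deployment: whatever the script prints is what the caller sees as the result, and a non-zero exit code signals failure.
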
If the data source does not execute as expected, we can call `sys.exit(exit_code)` with an exit code of `1` to indicate that the data source was not executed successfully.

Below is an example implementation of a data source's input, computation and output:

```python
#!/usr/bin/env python3

## THIS CODE IS STRICTLY TO BE USED AS AN EXAMPLE AND IS NOT
## MEANT FOR PRODUCTION USE

import sys
import requests
from typing import Literal

URL = "https://api.coindesk.com/v1/bpi/currentprice.json"


def main(quote_currency: Literal["USD", "GBP", "EUR"]):
    # Checks if the quote currency is supported.
    # If not, raises an exception.
    if quote_currency not in {"USD", "GBP", "EUR"}:
        raise Exception(f"Quote currency {quote_currency} is not supported")

    # Sends an HTTP GET request to the given URL.
    r = requests.get(URL)

    # If the response status code indicates an error, raises an exception.
    r.raise_for_status()

    # Attempts to get a specific value from the response. If the response
    # format is not as expected, raises an exception.
    try:
        return r.json()["bpi"][quote_currency]["rate_float"]
    except Exception as e:
        raise e


if __name__ == "__main__":
    try:
        # Takes the first command line argument passed to the Python script
        # and uses it as the parameter for main().
        # The result of main() is then printed.
        print(main(sys.argv[1]))
    except Exception as e:
        # Prints out any exception that the script comes across
        print(str(e), file=sys.stderr)
        # Exits with exit status 1
        sys.exit(1)
```

### Challenge

Here's a challenge to help test your understanding of the knowledge acquired in the past three sections.

Given the endpoint https://api.binance.com/api/v3/klines, with the following [documentation](https://binance-docs.github.io/apidocs/spot/en/#kline-candlestick-data), where the endpoint's parameters are as follows:

| Name        | Type     | Mandatory | Description                                                                                                      |
| ----------- | -------- | --------- | ---------------------------------------------------------------------------------------------------------------- |
| `symbol`    | `STRING` | Yes       | Asset pair to query. e.g. `BTCUSDT`                                                                              |
| `interval`  | `ENUM`   | Yes       | Candlestick interval. e.g. `1d`                                                                                  |
| `startTime` | `LONG`   | No        | Candlestick start time as a Unix timestamp in milliseconds. Default is the latest candle. e.g. `1632398400000`   |
| `endTime`   | `LONG`   | No        | Candlestick end time as a Unix timestamp in milliseconds. Default is the latest candle. e.g. `1632398400000`     |
| `limit`     | `INT`    | No        | Limit of results returned. Default is 500. e.g. `494`                                                            |

Create a data source that queries the API for a **daily** candlestick's closing price, given the candlestick's opening timestamp and the candlestick pair. The code should return an error along with exit status `1` if the pair **does not** exist or **does not** have a candle that opens at the specified timestamp.

The data source should take the following inputs:

| Field     | Type | Description                                                  |
| --------- | ---- | ------------------------------------------------------------ |
| pair      | str  | The candlestick pair to query                                |
| timestamp | int  | The candlestick's start time as a Unix timestamp in seconds  |

Example inputs and expected outputs for testing can be found below:

Input:

| pair     | timestamp  |
| -------- | ---------- |
| BTCUSDT  | 1502928000 |
| ETHUSDT  | 1662508800 |
| BANDUSDT | 1618531200 |

Output:

| price         |
| ------------- |
| 4285.08000000 |
| 1630.00000000 |
| 20.96470000   |

# Chapter 3 - How to deploy and edit a data source

When interacting with BandChain, we need to send it instructions that tell it what to do. We send these instructions by creating messages and broadcasting them to BandChain.

## Broadcasting Messages

To broadcast these messages, our online IDE, [Band Builder](https://band-builder.netlify.app/), can be used to both write the data source and deploy it. Alternatively, our SDKs, [bandchain.js](https://github.com/bandprotocol/bandchain.js) and [pyband](https://github.com/bandprotocol/pyband), can also be used.

Examples of how to form a transaction to send messages with pyband can be found [here](https://docs.bandchain.org/client-library/pyband/transaction.html#example-use-case), and an example for bandchain.js can be found [here](https://docs.bandchain.org/client-library/bandchain.js/transaction.html#gettxdata-signature-publickey-signmode).

## Message Definitions

Two messages are used to deploy and edit a data source: `MsgCreateDataSource` and `MsgEditDataSource`.

### MsgCreateDataSource

`MsgCreateDataSource` is used to create a data source on BandChain and contains the following parameters:

| Parameter   | Type   | Description                                                 |
|-------------|--------|-------------------------------------------------------------|
| Name        | string | The name of the data source to be deployed                  |
| Description | string | A description of the data source                            |
| Executable  | bytes  | The content of the data source executable                   |
| Fee         | Coin   | The fee to be paid per query when this data source is used  |
| Treasury    | string | The treasury address to which the fee is paid               |
| Owner       | string | The address of the owner of the data source to be deployed  |
| Sender      | string | The address of the message sender                           |

### MsgEditDataSource

`MsgEditDataSource` is used to modify the parameters of an existing data source on BandChain and contains the following parameters:

| Parameter    | Type   | Description                                                 |
|--------------|--------|-------------------------------------------------------------|
| DataSourceID | int64  | The data source identifier                                  |
| Name         | string | The name of the data source                                 |
| Description  | string | A description of the data source                            |
| Executable   | bytes  | The content of the data source executable                   |
| Fee          | Coin   | The fee to be paid per query when this data source is used  |
| Treasury     | string | The treasury address to which the fee is paid               |
| Owner        | string | The address of the owner of the data source                 |
| Sender       | string | The address of the message sender                           |

The following fields can be set to `[do-not-modify]`, in their respective types, if no change from the existing value on BandChain is required:

- `name`: `"[do-not-modify]"`
- `description`: `"[do-not-modify]"`
- `executable`: `b"[do-not-modify]"`

# Chapter 4 - Conclusion

Congratulations on finishing the primer course on BandChain's data sources. With your newly acquired knowledge, you should be able to create your very own data source!

Head over to our oracle script course to learn how to use your data sources in an oracle script to request data on BandChain!