TEP: Summary/Aggregation functions for Tekton Results

# Draft: Summary/Aggregation functions for Tekton Results

## Current Status

The data stored in Tekton Results can be retrieved using REST, gRPC, or the tkn-results CLI tool. The following data can currently be retrieved:

* List of the Results for a namespace - `parent=<namespace>`
* List of Results for all namespaces - `parent=-`
* A single Result - `name=<namespace>/results/<parent-run-uuid>`
* List of the Records under a particular Result - `parent=<namespace>/results/<parent-run-uuid>/records`
* List of all the Records under a particular namespace - `parent=<namespace>/results/-/records`
* List of all the Records in all namespaces - `parent=-/results/-/records`
* A single Record - `name=<namespace>/results/<parent-run-uuid>/records/<child-run-uuid>`
* List of Logs under a particular Result - `parent=<namespace>/results/<parent-run-uuid>/logs`
* List of all Logs in a namespace - `parent=<namespace>/results/-/logs`
* List of all Logs in all namespaces - `parent=-/results/-/logs`
* A single Log - `name=<namespace>/results/<parent-run-uuid>/logs/<child-run-uuid>`

List responses can be filtered with CEL expressions to reduce the number of items returned. Filters can operate on the metadata of a Result, Record, or Log object as well as on the data it contains. For example, Results can be filtered on whether the PipelineRun (PLR) or TaskRun (TR) succeeded; similarly, Records can be filtered on the type of run they represent, its duration, its timestamps, and its metadata.

## Missing Functions

Currently, the Tekton Results API has no aggregation functions that report the number of matching responses or other summary information. The following aggregation and summary functions are desired. The list assumes that the desired Results and Records can already be filtered; any missing filtering function is outside the scope of this TEP.

* Given a list of Results (after filtering), the summary may contain
  * Total number of Results
  * Number of TaskRuns and PipelineRuns
  * Number of Results in each status, i.e. Succeeded, Failed, Pending, etc.
  * Total duration of all Results
  * Average duration of all Results
  * Max/Min duration of Results
* Given a single Result, the summary may contain
  * Total number of TaskRuns
  * Number of TaskRuns in each status
  * Total duration of the Result
  * Average duration of TaskRuns
  * Max/Min duration of TaskRuns
* Given a list of Records (after filtering), the summary may contain
  * Total number of Records
  * Number of TaskRuns/PipelineRuns
  * Total duration of all the Records
  * Average duration of all the Records
  * Min/Max duration of Records
* Group the summary by a group-by key

## Proposal

The proposal is to provide a new endpoint and/or path that returns an aggregated summary of the responses. This is useful for dashboards that need an aggregate summary of the data. Currently, this aggregation is performed on the client side, which requires multiple API calls and wastes bandwidth and resources.

### Functions

This section describes the requirements for the functions needed.
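As a rough illustration of the aggregates listed under Missing Functions, the sketch below computes them on the client side over an already-filtered list of Results, which is the work the proposal would move to the server. The helper names (`parse_time`, `summarize_results`), the output keys, and the sample data are assumptions made for this example and are not part of the Tekton Results API.

```python
from collections import Counter
from datetime import datetime


def parse_time(value):
    """Parse an RFC 3339 timestamp like 2023-03-02T07:26:54Z; None stays None."""
    if value is None:
        return None
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_results(results):
    """Aggregate an already-filtered list of Results (hypothetical helper)."""
    summaries = [r["summary"] for r in results]
    # Only Results with both timestamps contribute to the duration figures.
    durations = []
    for s in summaries:
        start, end = parse_time(s["startTime"]), parse_time(s["endTime"])
        if start is not None and end is not None:
            durations.append((end - start).total_seconds())
    return {
        "total": len(results),
        "by_type": dict(Counter(s["type"] for s in summaries)),
        "by_status": dict(Counter(s["status"] for s in summaries)),
        "total_duration_s": sum(durations),
        "avg_duration_s": sum(durations) / len(durations) if durations else None,
        "min_duration_s": min(durations) if durations else None,
        "max_duration_s": max(durations) if durations else None,
    }


# Illustrative data only; the values are made up for this sketch.
SAMPLE_RESULTS = [
    {"summary": {"type": "tekton.dev/v1beta1.TaskRun", "status": "SUCCESS",
                 "startTime": "2023-03-02T07:26:48Z", "endTime": "2023-03-02T07:26:54Z"}},
    {"summary": {"type": "tekton.dev/v1beta1.PipelineRun", "status": "FAILURE",
                 "startTime": "2023-03-02T07:28:00Z", "endTime": "2023-03-02T07:28:10Z"}},
    {"summary": {"type": "tekton.dev/v1beta1.TaskRun", "status": "SUCCESS",
                 "startTime": None, "endTime": "2023-03-02T07:30:00Z"}},
]

summary = summarize_results(SAMPLE_RESULTS)
print(summary)
```

A Result without a start time is still counted in the totals but is left out of the duration figures; how the real API should treat such entries is an open design question.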
#### Results

Here is a general example of a list of Results:

```json
{
  "results": [
    {
      "name": "default/results/640d1af3-9c75-4167-8167-4d8e4f39d403",
      "id": "338481c9-3bc6-472f-9d1b-0f7705e6cb8c",
      "uid": "338481c9-3bc6-472f-9d1b-0f7705e6cb8c",
      "createdTime": "2023-03-02T07:26:48.972907Z",
      "createTime": "2023-03-02T07:26:48.972907Z",
      "updatedTime": "2023-03-02T07:26:54.191114Z",
      "updateTime": "2023-03-02T07:26:54.191114Z",
      "annotations": {},
      "etag": "338481c9-3bc6-472f-9d1b-0f7705e6cb8c-1677742014191114634",
      "summary": {
        "record": "default/results/640d1af3-9c75-4167-8167-4d8e4f39d403/records/640d1af3-9c75-4167-8167-4d8e4f39d403",
        "type": "tekton.dev/v1beta1.TaskRun",
        "startTime": null,
        "endTime": "2023-03-02T07:26:54Z",
        "status": "SUCCESS",
        "annotations": {}
      }
    },
    {
      "name": "default/results/c360def0-d77e-4a3f-a1b0-5b0753e7d5af",
      "id": "9514f318-9329-485b-871c-77a4a6904891",
      "uid": "9514f318-9329-485b-871c-77a4a6904891",
      "createdTime": "2023-03-02T07:28:05.535047Z",
      "createTime": "2023-03-02T07:28:05.535047Z",
      "updatedTime": "2023-03-02T07:28:10.308632Z",
      "updateTime": "2023-03-02T07:28:10.308632Z",
      "annotations": {},
      "etag": "9514f318-9329-485b-871c-77a4a6904891-1677742090308632274",
      "summary": {
        "record": "default/results/c360def0-d77e-4a3f-a1b0-5b0753e7d5af/records/c360def0-d77e-4a3f-a1b0-5b0753e7d5af",
        "type": "tekton.dev/v1beta1.TaskRun",
        "startTime": null,
        "endTime": "2023-03-02T07:28:10Z",
        "status": "SUCCESS",
        "annotations": {}
      }
    }
  ],
  "nextPageToken": ""
}
```

The useful aggregation functions can be:

For a list of Results -

* **size** - number of results in the response with the given filter
* **average duration** - average duration of results
* **number of records** - total number of records in the results with the given filter

For a single Result -

* **number of taskruns/pipelineruns**
* **number of records**
* **total duration of run**
* **number of runs in different statuses** - pass, fail, pending, etc.
* <add-more-here>

Users should be able to filter these aggregations
based on CEL, as with the current APIs.

#### Records

Here is an example of a list of Records:

```json
{
  "records": [
    {
      "name": "default/results/640d1af3-9c75-4167-8167-4d8e4f39d403/records/640d1af3-9c75-4167-8167-4d8e4f39d403",
      "id": "df3904b8-a6b8-468a-9e3f-8b9386bf3673",
      "uid": "df3904b8-a6b8-468a-9e3f-8b9386bf3673",
      "data": {
        "type": "tekton.dev/v1beta1.TaskRun",
        "value": "VGhpcyBpcyBhbiBleG1hcGxlIG9mIHJlY29yZCBkYXRhCg=="
      },
      "etag": "df3904b8-a6b8-468a-9e3f-8b9386bf3673-1677742019012643389",
      "createdTime": "2023-03-02T07:26:48.997424Z",
      "createTime": "2023-03-02T07:26:48.997424Z",
      "updatedTime": "2023-03-02T07:26:59.012643Z",
      "updateTime": "2023-03-02T07:26:59.012643Z"
    },
    {
      "name": "default/results/640d1af3-9c75-4167-8167-4d8e4f39d403/records/77add742-5361-3b14-a1d3-2dae7e4977b2",
      "id": "62e52c4d-9a61-4cf0-8f88-e816fcb0f84a",
      "uid": "62e52c4d-9a61-4cf0-8f88-e816fcb0f84a",
      "data": {
        "type": "results.tekton.dev/v1alpha2.Log",
        "value": "VGhpcyBpcyBhbiBleG1hcGxlIG9mIHJlY29yZCBkYXRhCg=="
      },
      "etag": "62e52c4d-9a61-4cf0-8f88-e816fcb0f84a-1677742014245938484",
      "createdTime": "2023-03-02T07:26:54.220068Z",
      "createTime": "2023-03-02T07:26:54.220068Z",
      "updatedTime": "2023-03-02T07:26:54.245938Z",
      "updateTime": "2023-03-02T07:26:54.245938Z"
    }
  ],
  "nextPageToken": ""
}
```

The useful aggregation functions can be:

For the list of Records:

* **size** - number of records for a given set of filters
* **average duration** - average duration of runs for a given set of filters
* <add-more-here>

## Grouped Aggregation

Grouped aggregation is especially useful for finding trends in data, and it avoids multiple API calls for similar queries. For example, one might want a summary of all PipelineRuns that succeeded on a particular day, grouped by hour. Normally, this would require around 24 API calls, one per hour. Grouped aggregation reduces this to a single API call.
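The hourly example above can be sketched as a single pass that buckets runs by the hour in which they completed. This is only an illustration of the grouping idea: the `group_by_hour` helper, the flat `completionTime`/`status` record shape, and the sample values are assumptions for this sketch, not the format returned by Tekton Results.

```python
from collections import defaultdict
from datetime import datetime


def group_by_hour(runs):
    """Bucket runs by completion hour and count outcomes per bucket (hypothetical helper)."""
    groups = defaultdict(lambda: {"total": 0, "success": 0, "failed": 0})
    for run in runs:
        completed = datetime.fromisoformat(run["completionTime"].replace("Z", "+00:00"))
        # Truncate the timestamp to the hour to form the group key.
        bucket = completed.strftime("%Y-%m-%dT%H:00Z")
        group = groups[bucket]
        group["total"] += 1
        if run["status"] == "SUCCESS":
            group["success"] += 1
        elif run["status"] == "FAILURE":
            group["failed"] += 1
    # Sort by key so the groups come back in chronological order.
    return dict(sorted(groups.items()))


# Illustrative data only; the values are made up for this sketch.
SAMPLE_RUNS = [
    {"completionTime": "2023-03-02T07:26:54Z", "status": "SUCCESS"},
    {"completionTime": "2023-03-02T07:28:10Z", "status": "FAILURE"},
    {"completionTime": "2023-03-02T09:05:00Z", "status": "SUCCESS"},
]

hourly = group_by_hour(SAMPLE_RUNS)
for hour, counts in hourly.items():
    print(hour, counts)
```

Done on the server, the same grouping would return all buckets in one response instead of one filtered list call per hour.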
Here is the list of possible grouped aggregations:

* group by time - by hour, week, month, year, etc.
* group by parent - by namespace
* group by status - by succeeded, pending, failed, etc.
* group by duration - by time quantum
* group by type - type of record or result

## Implementation

### A `summary` path in existing endpoints

We can add a `/summary` path (or any other suitable keyword) to the existing list APIs to get the aggregation results for a given response. Here is an example of an API call for listing records:

```bash
# The URL is quoted as a whole so the shell does not interpret ?, & or ().
curl --insecure -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Accept: application/json" \
  "https://localhost:8080/apis/results.tekton.dev/v1alpha2/parents/-/results/-/records?filter=data.status.completionTime.getDate()==7"
```

The response is the list of records that match the given filters. Adding a summary path to this endpoint:

```bash
https://localhost:8080/apis/results.tekton.dev/v1alpha2/parents/-/results/-/records/summary?filter=data.status.completionTime.getDate()==7&summary=total,success,failed,...&group_by=namespace
```

This will return the summary data. The exact format of the summary response is subject to discussion, but a raw format could look like this:

```json
{
  "query": "query used to generate this summary",
  "summary": {
    "total": "number of total resources",
    "avg-duration": "average duration of runs",
    "success": "number of successful runs",
    "failed": "number of failed runs",
    "pending": "number of pending runs"
    ... <more fields here>
  },
  <more fields here>
}
```

When `group_by` is given, the summary becomes a list with one entry per group:

```json
{
  "query": "query used to generate this summary",
  "group_by": "group",
  "summary": [
    {"group-one": {<summary>}},
    {"group-two": {}}
  ]
}
```

#### How does this work?

All the provided filters are passed to the path that precedes `summary`, which in this case is `<url>/apis/results.tekton.dev/v1alpha2/parents/-/results/-/records`. Then the fields passed in the `summary` parameter are evaluated, and the requested data is calculated. The fields are defined below.
* The `summary` field in the JSON is a map with string keys and string values. It contains all the requested aggregate fields.
* The `query` field works as a unique identifier for a summary response. Repeating the same query should reproduce a similar summary. It may not be identical, since some values can change between calls. For example, if you requested the number of pending runs, the count may differ on your next call because some pending runs have since changed status.

### A CEL transformer query for aggregation

Here we send a CEL transformer query along with a filter query. Here is an example of an API call for summary aggregation using the list API of records:

```bash
# The URL is quoted as a whole so the shell does not interpret ?, & or ().
curl --insecure -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Accept: application/json" \
  "https://localhost:8080/apis/results.tekton.dev/v1alpha2/parents/-/results/-/records?filter=data.status.completionTime.getDate()==7&:transformer=average_duration=average_time(path-to-create-time_in_data,path-to-complete-time_in_data),success_pipelineruns=success_res(path_to_success_condition),failed_pipelineruns=failed_res(path_to_success_condition),pending_pipelineruns=pending_res(path_to_success_condition),total(record)"
```

CEL functions like `failed_res`, `pending_res`, and `success_res` will be implemented. This will return the list data along with the transformed data field. The exact format of the transformed data response is subject to discussion, but a raw format could look like this:

```json
{
  "transformed_data": {
    "total": "number of total resources",
    "average_duration": "average duration of runs",
    "success_pipelineruns": "number of successful runs",
    "failed_pipelineruns": "number of failed runs",
    "pending_pipelineruns": "number of pending runs"
    ... <more fields here>
  },
  <more fields here>
}
```
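Both request shapes proposed here carry CEL expressions and comma-separated lists in the query string, so a client has to URL-encode them before sending. The sketch below builds a request URL for the proposed `/summary` path with Python's standard `urllib.parse`. The `summary_url` helper is hypothetical, and the `summary` and `group_by` parameters exist only in this proposal, not in the current Tekton Results API.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Base URL taken from the examples in this document.
BASE = "https://localhost:8080/apis/results.tekton.dev/v1alpha2"


def summary_url(parent, result, cel_filter, fields, group_by=None):
    """Build a URL for the proposed records summary path (hypothetical helper)."""
    path = f"{BASE}/parents/{parent}/results/{result}/records/summary"
    # `summary` and `group_by` are the parameters proposed in this TEP.
    params = {"filter": cel_filter, "summary": ",".join(fields)}
    if group_by:
        params["group_by"] = group_by
    # urlencode percent-encodes characters such as "(", ")", "=" and ",".
    return f"{path}?{urlencode(params)}"


url = summary_url(
    "-",
    "-",
    "data.status.completionTime.getDate()==7",
    ["total", "success", "failed"],
    group_by="namespace",
)
print(url)

# Decoding the query string recovers the original values.
query = parse_qs(urlsplit(url).query)
print(query)
```

Encoding the parameters this way also avoids the shell-quoting pitfalls of assembling the URL by hand in a `curl` command.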