Refactoring readMany API

# Refactoring readMany API This document is supposed to be an implementation sketch for a refactoring of the Zendro `readMany` functions to not use a count. See also github [issue #142](https://github.com/Zendro-dev/graphql-server-model-codegen/issues/142). ## General Design Ideas: * Resolvers enforce limits, model layer does not! * In the resolvers: We will use limit, after, or before to reduce the record-limit in the context. If the respective argument is not set, we initialize it to the global default value (Record-Limit) * In model: Do no count ever in the two readAll and readAllCursor functions. * In model: Use default global limit, if limit not set, provide no limit-offset arguments if limit-offset based pagination is not requested, i.e. the respective args are not provided * If no limit has been provided, no count needs to be made to provide a limit arg! The model layer does not care about the limits. * computation of count is inefficient and complex, especially in DDM or just remote case. -> Use upper limit, i.e. cursor- or limit-offset-based pagination args to reduce record limit * record limit will be reduced by the maximum if no pagination is provided. This has the consequence that running multiple queries in the same graphql statement won't work. ## Design sketch (old) see [Implementation-Spec-WIP](#Implementation-Spec-WIP) ### `readAll` ```javascript= // RESOLVER accessions: async function({search, order, pagination}, context) { if (await checkAuthorization(context, 'Accession', 'read') === true) { // helper to get record_limit let record_limit = helper.getCurrentRecordLimit(pagination, context) // reduce the record limit helper.reduceRecordLimit(record_limit, context); let benignErrorReporter = new errorHelper.BenignErrorReporter(context); return await accession.readAll(search, order, pagination, benignErrorReporter); } else { throw new Error("You don't have authorization to perform this action"); } } // MODEL - sql ``` ### `readAllCursor` ```javascript= ``` ### helpers ```javascript= getCurrentRecordLimit(pagination, context){ //something like let record_limit = pagination.limit === undefined ? context.record_limit : pagination.limit return record_limit; } reduceRecordLimit(){ // acutally reduce the record limit. // Check, if zero, and through benign error: No record-limit left } ``` ## Thought experiment on `has<Next|Previous>Page` without count (05 AUG 2020) * Cursor based pagination: hasNextPage | hasPreviousPage * Idea: Do it with an "extended" select * Theory: Assume alphabetically sorted Letter-Records. * User wants three records starting from cursor I in forward direction Expected Result: J,K,L Zendro does: I, J,K,L, M - hasPreviousPage: Does the cursor I exist, i.e. has it been returned? - hasNextPage: Is there a latter after the expected last record, i.e. L? Concrete: Forward and backward, if the cursor is null, there naturally is no previous or next page, respectivelly. Given the [Relay spec](https://relay.dev/graphql/connections.htm#sec-undefined.PageInfo) on Connections and Cursor-Based-Pagiantion (CBP), three solutions to avoid counts can be taken: 1) In the Spec's paragraph 5.1 in case we hit the '2.' conditions we return true (against spec). Less liked, because false information can be given to the user. 2) Ignore the Spec and say, existence of the cursor actually means that a previous or next page at least returns a single record. Thus including the cursor should be sufficient. Would actually go agains the spec. 3) Extend the search (WHERE conditions) to also look for one and exactly one record befor or after the cursor. To Do: Generate a software design in Pseudo Code and focus on SQL Queries. Forward, 1st page, page in the middle (nth), including and not including cursor Backward, 1st page, page in the middle (nth), including and not including the cursor Use UNION and base it on the following example: ```sql= ((SELECT id, asv_id FROM microbiome_asvs WHERE asv_id > 'ASVF_1' OR asv_id = 'ASVF_1' AND id >= 609734 ORDER BY asv_id ASC, id ASC LIMIT 12) UNION (SELECT id, asv_id FROM microbiome_asvs WHERE asv_id < 'ASVF_1' and id < 609734 ORDER BY asv_id DESC, id DESC LIMIT 1) ORDER BY asv_id ASC, id ASC); ``` ## Thoughts * build the sql raw statement from the `options` object used in `findAll` in the `readAllCursor` case to build the raw sequelize query that does the union. * Add 1 to the `LIMIT` to see if `hasNextPage` is true (forward) * `UNION` query with the backwards query `LIMIT` 1 to see if `hasPreviousPage`o is true (forward) * backward pagination should be analog ## has<Previous|Next>Page Thought Experiment 07. AUG 2020: ### Connection and Count Meeting minutes: * cursor B * dir forward * page-size 3 #### Single complex "union-query" strategy: - paginate forward including cursor and (set-size + 2) - union with: - paginate backward with page-size=1 returns `A B C D E F` *Note*: The above is different for the "first page", i.e. where cursor is NULL **Resolve questions:** - Has next page? - Has previous page? **Algorithm-Sketch:** In memory (Zendro) identify the position of the cursor in the result array Case "cursor still exists": A **B** C D E F (cursor B) Check length of items after cursor position. Case - is equal to or less page-size -> No next page - is > page-size -> Next page exists Check length of items before cursor position. Case - Item exist -> Has previous page - Item don't exist -> Has NO previous page **Detail on algorithm to find cursor position:** *Lemma*: Cursor must be at index 0 or 1 (counting from 0!) *Proof*: Consider possible cases: cursor still exists `1 1 0 0` previous page exists `1 0 1 0` cursor position `1 0 - -` *Note "-" indicates that the cursor is not in the result array.* *Also note that the result Array MUST be sorted*. Algorithm: Check sorted result array's first two positions. Evaluate at each of those, whether the item comes before the cursor or is the cursor in the current sort order. Depending on the following cases infer has-previous-page (P) and after-cursor-start-index (A) cases: 1. 0 before cursor and 1 is cursor -> P: true, A: 2 2. 0 before cursor and 1 is not cursor (after cursor) -> P: true, A: 1 3. 0 is cursor and 1 is not cursor (after cursor) -> P: false, A: 1 4. 0 is after cursor (not cursor and not before) AND 1 is after cursor -> P: false, A: 0 Suggested encoding of the above four cases in terms of three bits pos 0: item 0 before cursor? pos 1: item 0 is cursor? pos 2: item 1 is cursor? Remains to evaluate has-next-page from index A and result-array-size N: *Lemma*: (N - A) > page-size *Proof*: Consider above cases 1 to 4 using the initial example 1. (6-2) > 3 -> TRUE 2. (5-1) > 3 -> TRUE 3. (5-1) > 3 -> TRUE 4. (4-0) > 3 -> TRUE Assume above example, but with no existing next page 1. (5-2) > 3 -> FALSE 2. (4-1) > 3 -> FALSE 3. (4-1) > 3 -> FALSE 4. (3-0) > 3 -> FALSE Formal proof sketch: page-size p, total set-size T, result-array N. Should do the above four checks for all combinations of: `<=> (p,T,N)`, and next-page exists T|F Last problem: How to efficiently evaluate the questions - are you the cursor? -> trivial with equals id-attribute - are you before the cursor given the current sort order argument? input: order-arg array of elements of form ['attribute', 'sort-dir'] where sort-dir e (ASC, DESC) Above cases can thus be encoded: 1 -> `101` 2 -> `100` 3 -> `010` 4 -> `000` We generate two arrays: firstTwoCursor `[T|F, T|F]` firstTwoBeforeCursor `[T|F, T|F]` Using above "bit-approach" you can thus initialize our index A **NOTE:** * The second query in the UNION does NOT need the sorting! * Relay Connection Spec does NOT include 'totalCount', however learn.graphql.org does, but refers to Relay's spec. * If we wanted to include totalCount, we'd do it like this: ```javascript= var myConnection = { edges: [1,2,3], pageInfo: {foo: 1, bar: 2}, totalCount: async () => { return await count(origArgs) } } ``` **Conclusion (so far)** * use two single queries. * the second query should be fast because: * we don't need to sort * It just needs to find 1 record < searchArgs * worst case there is no record. * by doing two searches we do not need an additional sort of the result Set (`ORDER BY asv_id ASC, id ASC` in example UNION query) ## Implementation Spec (WIP) ### resolver ```javascript= //LIMIT-OFFSET-BASED readMany individuals: async function ({ search, order, pagination }, context) { if (await checkAuthorization(context, 'individual', 'read') === true) { await checkCountAndReduceRecordsLimit({ search, pagination }, context, "individuals"); let recordCount = pagination && pagination.limit !== undefined ? pagination.limit : context.recordsLimit; helper.checkCountAndReduceRecordLimitHelper(recordCount, context, "individualsConnection") let benignErrorReporter = new errorHelper.BenignErrorReporter(context); return await individual.readAll(search, order, pagination, benignErrorReporter); } else { throw new Error("You don't have authorization to perform this action"); } }, // CURSOR-BASED readMany individualsConnection: async function ({ search, order, pagination }, context) { if (await checkAuthorization(context, 'individual', 'read') === true) { let recordCount = pagination && pagination.first ? pagination.first : pagination && pagination.last ? pagination.last : context.recordsLimit; helper.checkCountAndReduceRecordLimitHelper(recordCount, context, "individualsConnection") let benignErrorReporter = new errorHelper.BenignErrorReporter(context); return await individual.readAllCursor(search, order, pagination, benignErrorReporter); } else { throw new Error("You don't have authorization to perform this action"); } } ``` ### model ```javascript= // benignErrorReporter not mentioned //LIMIT-OFFSET-BASED readMany static readAll(search, order, pagination, benignErrorReporter) { let options = helper.buildLimitOffsetSequelizeOptions(search, order, pagination, idAttribute); //use default BenignErrorReporter if no BenignErrorReporter defined benignErrorReporter = errorHelper.getDefaultBenignErrorReporterIfUndef(benignErrorReporter); let records = await super.findAll(options); return validatorUtil.bulkValidateData('validateAfterRead', this, records, benignErrorReporter); } // CURSOR-BASED readMany async readAllCursor(search, order, pagination){ // check valid paginationArguments helper.checkCursorBasedPaginationArgument(pagination); // build the sequelize options object for cursor-based pagination let options = buildCursorBasedSequelizeOptions(search, order, pagination, idAttribute); let records = await super.findAll(options); records = await validatorUtil.bulkValidateData('validateAfterRead', this, records, benignErrorReporter); // get the first record (if exists) in the opposite direction to determine pageInfo. // if no cursor was given there is no need for an extra query as the results will start at page "1". let oppRecords = []; if (pagination && (pagination.after || pagination.before){ let oppOptions = buildOppositeSearch(search, order, pagination, idAttribute) oppRecords = await super.findAll(oppOptions); } let edges = buildEdgeObject(records) let pageInfo = buildPageInfo(edges, oppRecords, pagination); return {edges, pageInfo} } ``` ### helpers ```javascript= // Translate searchConditions to sequelize // This helper can be reused in readAll and countRecords functions searchConditionsToSequelize(search){ let whereOptions = {}; if(search !== undefined && search !== null){ //check if(typeof search !== 'object') throw new Error('Illegal "search" argument type, it must be an object.'); let arg = new searchArg(search); whereOptions = arg.toSequelize(); } return whereOptions; } // Translate orderConditions to sequelize readable Object orderConditionsToSequelize(order, idAttribute){ let orderOptions = []; if (order !== undefined) { orderOptions = order.map((orderItem) => { return [orderItem.field, orderItem.order]; }); } if (!orderOptions.map(orderItem => { return orderItem[0] }).includes(idAttribute)) { orderOptions = [...orderOptions, ...[ [idAttribute, "ASC"] ]]; } return orderOptions; } // cursorPaginationArgumentsToSequelize(pagination, sequelizeOptions, idAttribute) { let isForwardPagination = !pagination || !(pagination.last != undefined); if (pagination) { //forward if (isForwardPagination) { if (pagination.after) { // base64Decode(cursor) currently a static model function. Should be a helper let decoded_cursor = JSON.parse(helper.base64Decode(pagination.after)); sequelizeOptions['where'] = { ...sequelizeOptions['where'], ...helper.parseOrderCursor(sequelizeOptions['order'], decoded_cursor, idAttribute, pagination.includeCursor) }; } // set the LIMIT to pagination first + 1 to see if hasNextPage is true if (pagination.first) { sequelizeOptions['limit'] = pagination.first + 1; } } else { //backward if (pagination.before) { let decoded_cursor = JSON.parse(helper.base64Decode(pagination.before)); sequelizeOptions['where'] = { ...sequelizeOptions['where'], ...helper.parseOrderCursorBefore(sequelizeOptions['order'], decoded_cursor, idAttribute, pagination.includeCursor) }; } if (pagination.last) { sequelizeOptions['limit'] = pagination.last; // Do we need offset in CBP? // options['offset'] = Math.max((countB - pagination.last), 0); } } } } buildCursorBasedSequelizeOptions(search, order, pagination, idAttribute){ let options = {}; // build the sequelize options object. options['where'] = searchConditionsToSequelize(search); options['order'] = orderConditionsToSequelize(order, idAttribute); // extend the where options for the given order and cursor cursorPaginationArgumentsToSequelize(pagination, options, idAttribute); return options; } buildOppositeSearch(pagination, options, idAttribute){ // IMPORTANT: COPY pagination and options, they are still needed. let isForwardPagination = !pagination || !(pagination.last != undefined); if(isForwardPagination){ pagination.before = pagination.after delete pagination.after; delete pagination.first; } else { pagination.after = pagination.before delete pagination.before; delete pagination.last; } cursorPaginationArgumentsToSequelize(paginationCopy, optionsCopy, idAttribute); return optionsCopy; } buildPageInfo(edges, oppRecords, pagination){ if (isForwardPagination) { let hasNextPage = (edges.length > pagination.first) ? true : false; // pop last edge. It is only used to determine if a next page exists. if (hasNextPage){edges.pop()}; pageInfo = { hasPreviousPage: (oppRecords.length > 0) ? true: false, hasNextPage: hasNextPage, startCursor: (edges.length > 0) ? edges[0].cursor : null, endCursor: (edges.length > 0) ? edges[edges.length - 1].cursor : null } } else { //backward let hasPreviousPage = (edges.length > pagination.last) ? true : false; // I think we need to shift instead of pop if backward pagination if (hasPreviousPage){edges.shift()}; pageInfo = { hasPreviousPage: hasPreviousPage, hasNextPage: (oppRecords.length > 0) ? true: false, startCursor: (edges.length > 0) ? edges[0].cursor : null, endCursor: (edges.length > 0) ? edges[edges.length - 1].cursor : null } } } buildEdgeObject(records){ if (records.length > 0) { edges = records.map(record => { return { node: record, cursor: record.base64Enconde() } }); } } ``` ## Backward pagination OFFSET Problem: In case of backward pagination we need to calculate the needed `OFFSET` to get the correct result. **EXAMPLE:** ```sql= SELECT * FROM individuals WHERE id < 7 ORDER BY id ASC LIMIT 2 OFFSET 4; id | createdAt | updatedAt | name ----+----------------------------+----------------------------+------- 5 | 2020-08-06 07:39:17.959+00 | 2020-08-06 07:39:17.959+00 | ind_5 6 | 2020-08-06 07:39:17.963+00 | 2020-08-06 07:39:17.963+00 | ind_6 ``` Note the `OFFSET 4` which is currently calculated using the **count** of all records with `id < 7`. ```javascript= options['offset'] = Math.max((countB - pagination.last), 0); ``` Consider also the following Example for an overview of all combinations `FORWARD/BACKWARD/ASC/DESC`: ``` Data: [1,2,3,4,5,6,7] Cursor(^): 4, Limit: 2 (--> marks result set) FORWARD ASC: Op: > =========== 1 2 3 4 5 6 7 ^ --> FORWARD DESC: Op: < ============ 7 6 5 4 3 2 1 ^ --> BACKWARD ASC: Op: < ============ 1 2 3 4 5 6 7 ..--> ^ BACKWARD DESC: Op: > ============= 7 6 5 4 3 2 1 ..--> ^ ``` **Conclusion:** `OFFSET 1` (`..`), calculation by `#records [Op] 4 - LIMIT = 3 - 2 = 1` is needed in case of `BACKWARD` pagination: ### Solution Reverse the sql sort `ORDER` and sort again after fetching the results with the opposite `ORDER`, either by using sql subqueries or postprocessing. Consider the above Example: ``` Solution without OFFSET: BACKWARD ASC: Op: < Order: DESC 7 6 5 4 3 2 1 ^ --> postprocess sort ASC: 2,3 BACKWARD DESC: Op: > 1 2 3 4 5 6 7 ^ --> postprocess sort DESC: 6,5 ``` #### Problem: Sort on multiple fields ##### Solution 1 For the postprocessing sort we will need to immitate the correct `ORDER` arguments depending on the given query. An easy solution would be to use lodash `orderBy`. See: https://lodash.com/docs/4.17.15#orderBy This should allow us to order on multiple fields and specific orders. We also use this to postprocess results in the distributed-data-model. ##### Solution 2 (efficient) Actually we can just reverse the returned records, the sort should already be correct just in the incorrect order. ## Changelog ### codegen * pagination is mandatory!. * **limit-offset**: `paginationInput` and `limit` are required by the schema. * **cbp**: `paginationCursorInput` required by schema. Either `first` or `last` has to be given. This is checked in the resolver. Error if invalid. * resolver checks record limit, model-layer does not care * `checkAndAdjustRecordLimit` resolver function removed, pagination arguments used instead. If exceeded error is thrown. This is now also the case in DDMs * in case of `to_one` end of an association pagination with `LIMIT 1` (either `limit` or `first`) is given to the root-resolver call. * in case of `one_to_one` end of assoc without the foreignKey pagination with `LIMIT 2` is used in the root resolver, to check if a record is associated to more than 1 record. Appropriate `benignError` is added. * `readAll` and `readAllCursor` refactored: * neither does need to count. * neither will use a limit, if no limit is given. (e.g. when requiring a model manually inside `node`) * `readAllCursor` validity check of paginiation args moved to resolver. * To avoid calculating the `OFFSET` needed for backward-pagination, instead the resulting records will be reversed post-fetching to get the desired order. * To calculate the pageInfo (`hasNextPage | hasPreviousPage`) the query limit (if needed) will be increased by 1 (Extra check needed for the special case someone uses `pagination:{first: -1}`), to get information about possible succeeding records. * Additionaly a second query with reversed Operators (<,<=,>,>=) with no `ORDER` and `LIMIT 1` will be run to get possible preceding records. * Functionalities outsourced to helper functions. * changed error message for invalid response from remote zendro-server. * Tests * Integration Test "`21. Limit check`" refactored * added Tests for DDM and Zendro-webservice * added integration test for generic `readAllCursor` ### skeleton * add Helpers for `readAll` & `readAllCursor`: ```javascript= // SQL searchConditionsToSequelize(search) orderConditionsToSequelize(order, idAttribute) orderConditionsToSequelizeBefore(order, idAttribute) cursorPaginationArgumentsToSequelize(pagination, sequelizeOptions, idAttribute) buildLimitOffsetSequelizeOptions(search, order, pagination, idAttribute) buildCursorBasedSequelizeOption(search, order, pagination, idAttribute) reverse buildOppositeSearchSequelize(search, order, pagination, idAttribute) buildPageInfo(edges, oppRecords, pagination) buildEdgeObject(records) reverseOrderConditions(order) // Generic cursorPaginationArgumentsToGeneric(search, pagination, orderOptions, idAttribute) buildCursorBasedGenericOptions(search, order, pagination, idAttribute) buildOppositeSearchGeneric(search, order, pagination, idAttribute) ``` * rename `checkCountAndReduceRecordLimitHelper` to `checkCountAndReduceRecordsLimit` ### documentation * added documentation on cursor-based-pagination argument * added mandatory pagination argument to all places in the docu where needed ## To Do * `readAllCursor` Implementation in generec models: DONE