Apollo-be Content Integration
===

# Definitions #

* content documents (pr-envelops, pci-ng-envelops)
* original documents
* master documents

## Content documents ##

A *content document* is the built/modeled document in the publish-ready database. It’s created from the *original documents*/data. Every content document can be identified by its **content-id** (e.g. mnco54620, ln11, rn478, VS300241).

## Original documents ##

Documents and other kinds of data/resources that are necessary to build the *content documents*. These are identified by a unique URI (e.g. `/orig/cls/classif/VS300123`).

## Master documents ##

A *master document* is an *original document* that contains all the information needed to build the *content document* or to identify its dependencies. It can be identified by a unique id, the *content-id*. For example:

* VS300123 identifies master-document `/orig/cls/meta/VS300123`
* ln123 identifies master-document `/orig/brons/wet/123`
* rn300123 => `/orig/brons/jur/300123`
* mnco54620 => `/orig/monkey/documentinfo/mnco54620`

# Integration flows #

For several reasons we need more than one flow for updating the content in MarkLogic. These reasons include performance, specific content selection and manual updates (failed incrementals, technical changes, ...).

* User-invoked update flow
* Incremental update flow

## User-invoked update flow ##

Targets mostly *content documents* (mnco172734, ln123, VS300123). For every targeted *content document*, all dependent *original documents* will be updated if modified.

```sequence
user -> service : updateDocument by endDocumentId
Note right of service : get master-document by endDocumentId
service -> ml : write master-document
service -> ml : get-dependencies (by endDocumentId)
ml --> service: returns all dependent original document uris
Note right of service: update dependencies
```

## Incremental update flow ##

Targets *original documents* only.
No dependency resolution is necessary here, because all possible dependencies are *original documents* as well. The rebuilding of the related *content documents* is done by another process. That process does NO dependency (out-of-date) checking!

```sequence
job1 -> cls.service : incrementalUpdate
cls.service -> cls.oracle : query new/updated data
cls.service -> ml : push original data
```

```sequence
job2 -> monkey.service : incrementalUpdate
monkey.service -> monkey.mssql: query new/updated data
monkey.service -> ml : push original data
```

```sequence
job3 -> service : remodelDocuments
service -> ml : remodelDocuments
Note right of ml : build envelopes\nwith original data
```

# Remodeling (building of envelopes) #

The remodeling process expects all dependencies to be available. If they are not, the remodeling of the envelope should be aborted.

## Selection of documents to remodel ##

The selection is done by a query on a property flag of the *master documents*. This flag is set when:

* a master-document is updated
* a new master-document is added
* a known dependency is updated (todo)

# Rules #

## 1. When the dependency of an end-document has changed ##

We have to make sure that:

* it will automatically be reflected to the corresponding master-document by the cms system

**OR**

* the trigger property is set on the master-document in MarkLogic by the integration-service (by origdefinitions property?)

## 2. Dependencies that are only discoverable by parsing the content need special treatment (inline-only dependencies) ##

The reason is that we want to avoid parsing content at the service. So parsing and discovery of dependencies should be done at ML during transform. (For now, not necessary.) (See "[Inline-only-dependencies](#Inline-only-dependencies)")

# FAQ #

### Why don’t we just do incremental updates with the dependency resolving system? ###

Because it’s too slow.

### Why no dependency checking when remodeling? ###

It is slow and not necessary (see rule 1).

# Inline-only-dependencies #

> These are dependencies that are only known by parsing content files. They are not indexed or registered in a cms's database. For example the backoffice images.

**How to handle these dependencies?**

The easiest solution is to regularly just sync the entire image folder to ML. We could generate a blacklist for not-needed images/files. See the backoffice.service example below.

```sequence
job -> backoffice.service : backoffice.incrementalUpdate
Note right of backoffice.service : get full folder\n listing with ts
backoffice.service -> Marklogic : get todo
Marklogic --> backoffice.service :
Note right of backoffice.service : upload changed to S3
backoffice.service -> Marklogic : push placeholders
```
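
The "changed files" step of the folder sync can be illustrated with a minimal Python sketch. It only shows the comparison logic (timestamped listing versus what is already known, minus a blacklist). The function name `select_changed_files`, the dictionary shapes and the glob-style blacklist are assumptions made for illustration; the diagram suggests the todo list may actually be computed in MarkLogic, so treat this as a description of the idea and not of the real service.

```python
from fnmatch import fnmatch
from typing import Dict, Iterable, List


def select_changed_files(
    listing: Dict[str, float],
    known: Dict[str, float],
    blacklist: Iterable[str] = (),
) -> List[str]:
    """Return the paths that are new, or newer than the timestamp already known.

    listing:   path -> last-modified timestamp from the full folder listing
    known:     path -> timestamp already registered in ML (hypothetical shape)
    blacklist: glob patterns for files that should never be synced (assumption)
    """
    patterns = list(blacklist)
    changed = []
    for path, modified in sorted(listing.items()):
        if any(fnmatch(path, pattern) for pattern in patterns):
            continue  # explicitly excluded from the sync
        if path not in known or modified > known[path]:
            changed.append(path)
    return changed


# Hypothetical example data: one updated file, one unchanged file,
# one blacklisted scratch file and one new file.
listing = {
    "img/logo.png": 200.0,
    "img/old.png": 100.0,
    "img/tmp/scratch.psd": 300.0,
    "img/new.png": 150.0,
}
known = {"img/logo.png": 150.0, "img/old.png": 100.0}

print(select_changed_files(listing, known, blacklist=["img/tmp/*"]))
# prints ['img/logo.png', 'img/new.png']
```

Only the returned paths would then be uploaded and registered as placeholders; unchanged and blacklisted files are skipped, which keeps a full-folder sync cheap even when it runs regularly.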