---
title: Container sync pipeline refactor
tags: Container plugin
---

# Container sync pipeline refactor

## Content types (current state)

* ML (ManifestList)
* CB (ConfigBlob)
* M (Manifest) ---> ConfigBlob and ManifestList
    * -> ManifestBlob and ManifestListManifest
* B (Blob) ---> Manifest DC
* T (Tag) ---> Manifest DC
* MS (ManifestSignature) ---> Manifest DC (provisional)

## Content types (new state)

Idea: As long as a DC (DeclarativeContent) is in the first stage, its `extra_data` can refer to other DCs. Before we put a DC into the pipeline, everything in its `extra_data` must be resolved.

* blobs will be sent into the pipeline first
* manifests will keep info on their config and regular blobs in `extra_data`
* manifest lists will keep info about their manifests in `extra_data`

* B (Blobs and Config Blobs)
* M (Manifest) ---> ConfigBlob [extra_data] and Blobs [extra_data]
    * -> `manifest_dc.content.config_blob = await manifest_dc.extra_data.pop("config_blob_dc").resolution()`
    * -> `manifest_dc.extra_data["blobs"] = [await blob.resolution() for blob in manifest_dc.extra_data.pop("blobs_dc")]`
    * -> `self.put(manifest_dc)`
    * -> ManifestBlob --> created from `manifest_dc.extra_data["blobs"]` in the `_post_save` hook of the manifest
* ML (ManifestList) ---> Manifests [extra_data]
    * -> `manifest_list_dc.extra_data["manifests"] = [await manifest.resolution() for manifest in manifest_list_dc.extra_data.pop("manifests_dc")]`
        * note: bring over the extra information between ML and M (arch, platform, etc.)
    * -> `self.put(manifest_list_dc)`
    * -> ManifestListManifest --> created in the `_post_save` hook
* T (Tag) ---> Manifest DC [extra_data]
    * -> `tag_dc.content.tagged_manifest = await tag_dc.extra_data.pop("tagged_manifest_dc").resolution()`
    * -> `self.put(tag_dc)`

## Batching

:::info
To avoid trouble with batching, `await resolution()` should only be used in the first stage.
:::

```python
# Pending DCs, collected per content type until the batch is flushed.
man_dcs = []
list_dcs = []
tag_dcs = []
sig_dcs = []
BATCH_SIZE = xxx

for tag_dc in tag_list_from_upstream:
    if tag_dc == manifest_list_type:
        create_list_dc
        parse_list_dc_manifest_json
        discover_children_manifest
        create_children_manifest_dcs
        to_download_children.append(downloader.run(....))

        signatures_from_ml = []
        children_ml = []
        missing_signatures = False
        for download_children_manifest in asyncio.as_completed(to_download_children):
            # child_man_dc: the child manifest DC whose download just completed
            signatures = find_sigs(child_man_dc)
            if signatures:
                signatures_from_ml.extend(signatures)
                children_ml.append(child_man_dc)
            else:
                missing_signatures = True

        # Keep the manifest list only if every child manifest is signed.
        if not missing_signatures:
            for manifest in children_ml:
                handle_blobs(manifest)
            man_dcs.extend(children_ml)
            sig_dcs.extend(signatures_from_ml)
            list_dcs.append(list_dc)
            tag_dcs.append(tag_dc)

    elif tag_dc == image_manifest_type:
        man_dc = create_tagged_manifest(...)
        signatures = find_sigs(man_dc)
        if signatures:
            handle_blobs(man_dc)
            sig_dcs.extend(signatures)
            man_dcs.append(man_dc)
            tag_dcs.append(tag_dc)

    # Flush in dependency order: manifests, then lists, tags and signatures.
    if len(sig_dcs) + len(tag_dcs) + len(list_dcs) + len(man_dcs) > BATCH_SIZE:
        resolve_flush(man_dcs)
        resolve_flush(list_dcs)
        resolve_flush(tag_dcs)
        resolve_flush(sig_dcs)

# Flush whatever is left after the last tag.
resolve_flush(man_dcs)
resolve_flush(list_dcs)
resolve_flush(tag_dcs)
resolve_flush(sig_dcs)
```
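
The pseudocode above leaves `resolve_flush` undefined. The following is a minimal, runnable sketch of what such a helper might do, under loud assumptions: `SketchDC` and `save_stage` are hypothetical stand-ins (not the pulpcore `DeclarativeContent` class or a real pipeline stage), and every resolved reference is stored back into `extra_data` under the key without the `_dc` suffix, which simplifies the `content.config_blob` / `content.tagged_manifest` assignments described earlier. It only illustrates why blobs must enter the pipeline first and why batches are flushed in dependency order.

```python
import asyncio


class SketchDC:
    """Hypothetical stand-in for a declarative content unit."""

    def __init__(self, name, **extra_data):
        self.name = name
        self.extra_data = extra_data
        self._saved = asyncio.Event()

    def mark_saved(self):
        # A later pipeline stage would call this once the content is saved.
        self._saved.set()

    async def resolution(self):
        # Block until the content has been saved, then return it.
        await self._saved.wait()
        return self.name


async def save_stage(queue, saved):
    """Hypothetical downstream stage: saves each DC it receives."""
    while True:
        dc = await queue.get()
        if dc is None:  # sentinel: no more DCs
            break
        saved.append(dc.name)
        dc.mark_saved()


async def resolve_flush(batch, queue):
    """Resolve every "<name>_dc" reference, emit the DCs, empty the batch."""
    for dc in batch:
        for key in [k for k in dc.extra_data if k.endswith("_dc")]:
            ref = dc.extra_data.pop(key)
            if isinstance(ref, list):
                resolved = [await r.resolution() for r in ref]
            else:
                resolved = await ref.resolution()
            dc.extra_data[key[: -len("_dc")]] = resolved
        await queue.put(dc)
    batch.clear()


async def main():
    queue = asyncio.Queue()
    saved = []
    consumer = asyncio.create_task(save_stage(queue, saved))

    blob = SketchDC("blob")
    config = SketchDC("config")
    manifest = SketchDC("manifest", config_blob_dc=config, blobs_dc=[blob])
    tag = SketchDC("tag", tagged_manifest_dc=manifest)

    # Blobs go in first, so the resolution() calls below can complete.
    await queue.put(blob)
    await queue.put(config)

    man_dcs, tag_dcs = [manifest], [tag]
    await resolve_flush(man_dcs, queue)  # manifests before tags
    await resolve_flush(tag_dcs, queue)

    await queue.put(None)
    await consumer
    return saved, manifest.extra_data, tag.extra_data


if __name__ == "__main__":
    print(asyncio.run(main())[0])  # ['blob', 'config', 'manifest', 'tag']
```

If the two `resolve_flush` calls were swapped, the tag would wait forever on a manifest that has not been put into the queue yet, which is the kind of batching trouble the note above warns about.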