# CDO Article Critique

https://arxiv.org/pdf/2409.19653v2

In the article "DATA-CENTRIC DESIGN: INTRODUCING AN INFORMATICS DOMAIN MODEL AND CORE DATA ONTOLOGY FOR COMPUTATIONAL SYSTEMS", the authors propose an Informatics Domain Model (IDM) along with the Core Data Ontology (CDO) and introduce a framework that enhances data security, semantic interoperability, and scalability across distributed data ecosystems. The article discusses the applicability of IDM and CDO in selected use cases: as enablers for organizing and categorizing digital information, as enhancements for reasoning systems, and as a way to equip machine learning model training processes with consent receipts.

The article introduces the model's four pillars (domains) and later demonstrates its mechanics. The [external resource](https://zenodo.org/records/13729820) defines additional terms within each domain. However, their purpose and applicability within the model are not defined.

The `rationale` behind the article's proposals, especially the Informatics Domain Model (IDM), remains unclear. The article's `PROBLEM STATEMENT` section argues that novel concepts are needed because of limitations in existing informatics models, yet it doesn't say what these existing models are (the lack of exhaustive references is a separate problem in the article). The problem statement then focuses on a lack of data semantics, presumably in existing solutions, which it attributes to their focus on lower-level mechanical problems such as securing IP addresses. While the OSI model is inherently the backbone of communication in the digital space, the purpose of comparing the Transport Layer (layer 4) with the Application Layer (layer 7) to explain the problem statement is unclear. The article identifies the nonfunctional requirements "integrity" and "authenticity" as the significant properties for achieving data accuracy but doesn't explain the purpose of introducing them into the problem scope.
Presuming the authors perceive data accuracy as an essential concept, the article doesn't explain the relationship between the IDM and data accuracy, specifically how (or whether at all) the model makes it a first-class citizen.

Later, the article compares data-centric and node-centric models (without explanation or references) in the context of data semantics and concludes that, if the priority is data management enhanced with semantic meaning, only the data-centric approach is the way to go. Presuming node-centric means `data-silo-centric`, the authors again compare OSI model layers to express the problem of missing data semantics. The article then presents the data-centric approach as the enabler for data authorization and role-based access. However, it does not explain how it enables them. It's also unclear what purpose the witnessing concept serves in the example and how it relates to the problem statement. It's worth adding that the [external resource](https://zenodo.org/records/13729820) clarifies the difference between data-centric and IP-centric, but the clarification doesn't resolve the points above.

In the next section, `OBJECTIVES`, the article discusses how the IDM redefines security by categorizing data into one of four domains, but it doesn't explain what the actual security redefinitions are or what the advancements in information security are.

In the `SCALABILITY AND FUTURE DIRECTIONS` section, the article makes the following statement:

```
The meticulous design of the Informatics domain model and its accompanying ontology provide a scalable and adaptable foundation for computational systems. The accuracy and precision of the definitions, sourced primarily from the Oxford Dictionary, ensure a solid basis for the model's quadrimodal structure.
```

It is clear that the model went through iterations with the dictionary.
However, the article doesn't explain how this impacts the model's foundation and its applicability to computational systems.

**Summary**

The lack of proper references in the first part of the article makes the authors' intent hard to understand and weakens the article's message. The `rationale` behind the IDM foundations points to the lack of semantic design in existing models. However, this alone does not explain the need for a new model. The article discusses differences between data-centric and node-centric models and favors data-centricity in all comparisons. It does not state the comparison methodology or which concrete models it compares.

The four domains of IDM introduce a clear separation, each with a distinct purpose. The mechanics of the model are reasonable, although the purpose of each pillar's six additional terms in the context of the proposed framework is unknown.

The article discusses IDM's applicability to scalability requirements and how it enhances distributed data ecosystems. However, it doesn't provide any actual metrics or measurements that express IDM's contribution to scalability. Since the article discusses IDM scalability specifically in the context of distributed data ecosystems, a comparison to existing approaches is needed. One such approach is Domain-Driven Design ([DDD](https://en.wikipedia.org/wiki/Domain-driven_design)), along with the many concepts it introduces and the aspects they address. While `DDD` doesn't explicitly address scalability, it is an enabler for achieving it. It is furthermore synonymous with `IDM` in the `actions` and `objects` pillars, and when enhanced with Event Sourcing ([ES](https://en.wikipedia.org/wiki/Domain-driven_design#Event_sourcing)) it also addresses the `events` domain.
Furthermore, equipping `DDD` with Command Query Responsibility Segregation ([CQRS](https://en.wikipedia.org/wiki/Command_Query_Responsibility_Segregation)) greatly supports `IDM` domain segregation at the architectural level. However, due to the missing `rationale`, the applicability of `IDM` remains unanswered, except for the use cases discussed in the article where some of the model pillars are in use. The use cases employ some model concepts, e.g., consent receipts or a verifiable chain of data custody. However, the question arises whether these could serve as distinct pieces outside `IDM`, effectively making them more broadly adaptable.

The article may unnecessarily tackle and compare technical aspects with non-technical concepts that share no common denominator. Redirecting it purely toward ontologies, or information science in general, would remove many of the discussed issues.

A valuable addition to the article would be the following topics:

- an estimate of the actual costs of migrating towards IDM in the mentioned use cases, as well as the long-term costs of CDO maintenance and how the CDO evolves (if at all) over time.
- applicability in centralized ecosystems.
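
To make the `DDD`, `ES`, and `CQRS` comparison from the summary more concrete, here is a minimal sketch of how the three combine. It is an illustration only and is not taken from the article: the consent scenario is loosely borrowed from its use cases, and every name (`ConsentRecord`, `EventStore`, `grant_consent`, and so on) is hypothetical. The labels in the comments reflect this critique's reading of the `objects`, `events`, and `actions` pillars, not the article's own definitions.

```python
from dataclasses import dataclass, field


# Events: immutable facts that have already happened (Event Sourcing).
@dataclass(frozen=True)
class ConsentGranted:
    subject_id: str
    purpose: str


@dataclass(frozen=True)
class ConsentRevoked:
    subject_id: str
    purpose: str


# Object: a DDD aggregate whose state is derived only from events.
@dataclass
class ConsentRecord:
    subject_id: str
    purposes: set = field(default_factory=set)

    def apply(self, event):
        if isinstance(event, ConsentGranted):
            self.purposes.add(event.purpose)
        elif isinstance(event, ConsentRevoked):
            self.purposes.discard(event.purpose)


class EventStore:
    """Append-only, in-memory event log (hypothetical, for illustration)."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def for_subject(self, subject_id):
        return [e for e in self._events if e.subject_id == subject_id]


def load(store, subject_id):
    """Rebuild the aggregate by replaying its events in order."""
    record = ConsentRecord(subject_id)
    for event in store.for_subject(subject_id):
        record.apply(event)
    return record


# Actions: the command side validates intent, then records an event.
def grant_consent(store, subject_id, purpose):
    if purpose in load(store, subject_id).purposes:
        raise ValueError("consent already granted")
    store.append(ConsentGranted(subject_id, purpose))


def revoke_consent(store, subject_id, purpose):
    if purpose not in load(store, subject_id).purposes:
        raise ValueError("no consent to revoke")
    store.append(ConsentRevoked(subject_id, purpose))


# Query side (CQRS): read-only view that never writes to the store.
def active_purposes(store, subject_id):
    return sorted(load(store, subject_id).purposes)


store = EventStore()
grant_consent(store, "alice", "model-training")
grant_consent(store, "alice", "analytics")
revoke_consent(store, "alice", "analytics")
print(active_purposes(store, "alice"))  # ['model-training']
```

In this sketch the event log doubles as an auditable history of who consented to what, which is the kind of concern the article's use cases raise. Whether `IDM` offers anything beyond this established combination is the open question raised above.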