Michał Pietrus

@bYQK_qO_RLa70okz8n7TQg

Joined on Oct 2, 2020

  • https://arxiv.org/pdf/2409.19653v2 In the article "DATA-CENTRIC DESIGN: INTRODUCING AN INFORMATICS DOMAIN MODEL AND CORE DATA ONTOLOGY FOR COMPUTATIONAL SYSTEMS", the authors propose an Informatics Domain Model (IDM) along with the Core Data Ontology (CDO) and introduce a framework meant to enhance data security, semantic interoperability, and scalability across distributed data ecosystems. The article discusses the applicability of the IDM and CDO in certain use cases: as enablers for organizing and categorizing digital information, as enhancements for reasoning systems, or as a way to equip machine learning model training processes with consent receipts. The article introduces the model's four pillars (domains) and later demonstrates its mechanics. An external resource defines additional terms within each domain; however, their purpose and applicability within the model are not defined. The rationale behind the article's proposals, especially the Informatics Domain Model (IDM), remains unclear. The PROBLEM STATEMENT section argues for novel concepts due to limitations in existing informatics models, yet doesn't discuss what these existing models are (the lack of exhaustive references is a separate problem in the article). The problem statement later focuses on a lack of data semantics, presumably in existing solutions, attributed to their focus on lower-level mechanical problems such as securing IP addresses. While the OSI model is inherently the backbone of communication in digital space, the purpose of comparing the Transport Layer (layer 4) and the Application Layer (layer 7) to explain the problem statement is unclear. The article identifies the "integrity" and "authenticity" nonfunctional requirements as significant properties in achieving data accuracy but doesn't explain the purpose of introducing them into the problem scope. Presuming the authors perceive data accuracy as an essential concept, the article doesn't explain the relationship between the IDM and data accuracy, specifically how (or whether) the model makes it a first-class citizen. Later, the article introduces a comparison of data-centric and node-centric models (without explanation or references) in the context of data semantics and concludes that if data management is to be enriched with semantic meaning, only the data-centric approach is viable. Presuming node-centric means data-silo-centric, the authors again compare OSI model layers to express the lack-of-data-semantics problem. Later still, the article presents the data-centric approach as the enabler for data authorization and role-based access, yet does not explain how it enables them. It also remains unclear what the purpose of introducing the witnessing concept in the example is, and how it relates to the problem statement.
  • Gossip-ing KERI-based networks do not require a globally ordered, total-consensus ledger, but network state convergence is not addressed in any existing implementation. This is not a problem of the KERI protocol itself but of disseminating "KERI products" (KERI events). Current implementations merely enable queries on demand. Event dissemination is a use-case-specific problem, and not all use cases require network convergence. There are three cases where different network nodes aim for convergence: Witness <-> Witness, Witness <-> Watcher, and Watcher <-> Watcher. Current implementations address the first case (Witness <-> Witness) by applying round-robin dissemination from the Controller side: the Controller is responsible for disseminating events to Witnesses and is incentivized to do so. The second case (Witness <-> Watcher) is addressed via querying on demand, which is reactive and runs only when requested, making it inefficient. The third case (Watcher <-> Watcher) remains unaddressed.
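A minimal sketch of the first case, Controller-side round-robin dissemination to Witnesses; every type and method name below is hypothetical and does not come from keriox or any existing implementation:

```rust
// Hypothetical types; nothing here comes from keriox. The sketch only
// illustrates Controller-driven, round-robin dissemination to Witnesses.

#[derive(Clone)]
struct KeriEvent {
    said: String,  // self-addressing identifier of the event
    data: Vec<u8>, // serialized event body
}

struct Witness {
    id: String,
    received: Vec<KeriEvent>,
}

impl Witness {
    fn receipt(&mut self, event: &KeriEvent) {
        // A real witness would verify the event and return a signed receipt;
        // here we only record that it arrived.
        println!("{} received {} ({} bytes)", self.id, event.said, event.data.len());
        self.received.push(event.clone());
    }
}

struct Controller {
    witnesses: Vec<Witness>,
    next: usize, // round-robin cursor
}

impl Controller {
    /// Push the event to every witness, rotating which witness is contacted
    /// first, so the dissemination load is spread evenly.
    fn disseminate(&mut self, event: &KeriEvent) {
        let n = self.witnesses.len();
        if n == 0 {
            return;
        }
        for i in 0..n {
            let idx = (self.next + i) % n;
            self.witnesses[idx].receipt(event);
        }
        self.next = (self.next + 1) % n;
    }
}

fn main() {
    let mut controller = Controller {
        witnesses: vec![
            Witness { id: "w1".into(), received: vec![] },
            Witness { id: "w2".into(), received: vec![] },
        ],
        next: 0,
    };
    let icp = KeriEvent { said: "E_icp_example".into(), data: b"inception".to_vec() };
    controller.disseminate(&icp);
    for w in &controller.witnesses {
        println!("{} holds {} event(s)", w.id, w.received.len());
    }
}
```

The cursor only rotates which witness is contacted first; the Controller still pushes every event to every witness, which is what lets the Witness <-> Witness case converge without gossip.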
  • Assumptions: A form might have a list of steps to be executed in order to complete the form. This is the form workflow. Invariants may define additional business logic that impacts form execution, applicable either independently of the form workflow or within a certain step. When a form needs to mimic roles, i.e., actor A fills in and signs part X, actor B part Y, and actor C part Z, the form workflow imposes the same step repeated three times. Each step contains the pages that are shown within this step. Workflow concept: DF has a set of supported steps that have logical consequences within the app. Any customized workflow defined for a form must adhere to these steps. List of supported steps: a. pending – not started; the default starting step; form schema modifications possible
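A minimal sketch of the role-mimicking case above, with hypothetical Step/Workflow types (none of these names come from the actual form engine):

```rust
// Hypothetical workflow types; the same "fill_and_sign" step is repeated
// once per role, as described above.

struct Step {
    name: String,
    assigned_actor: String,
    pages: Vec<String>, // pages shown within this step
}

struct Workflow {
    steps: Vec<Step>,
}

fn main() {
    let workflow = Workflow {
        steps: vec![
            Step { name: "fill_and_sign".into(), assigned_actor: "actor A".into(), pages: vec!["part X".into()] },
            Step { name: "fill_and_sign".into(), assigned_actor: "actor B".into(), pages: vec!["part Y".into()] },
            Step { name: "fill_and_sign".into(), assigned_actor: "actor C".into(), pages: vec!["part Z".into()] },
        ],
    };
    for step in &workflow.steps {
        println!("{}: {} -> {:?}", step.name, step.assigned_actor, step.pages);
    }
}
```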
  • Enforcing invariants on the domain model may involve complex mechanisms (in terms of resources and time), including even external services, not to mention sophisticated validation functions like: each store has a unique ID number; the check digit meets an identification number; an email is a proper email; device type X has ABC additionally built in, whereas Y does not; conditionally applicable rules. On the other hand, invariants may also take more straightforward roles, i.e.:
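A sketch of how a few of the simpler invariants listed above could be expressed as plain validation functions; the concrete rules (a mod-10 check digit, a deliberately naive e-mail shape) are assumptions made only for illustration:

```rust
use std::collections::HashSet;

// Illustrative invariant checks; the concrete rules are assumptions.

/// Each store has a unique ID number within the registry.
fn store_id_is_unique(id: u64, registry: &HashSet<u64>) -> bool {
    !registry.contains(&id)
}

/// The check digit (last digit) meets the identification number,
/// here assumed to be a simple mod-10 sum of the preceding digits.
fn check_digit_matches(identification_number: &str) -> bool {
    let digits: Vec<u32> = identification_number
        .chars()
        .filter_map(|c| c.to_digit(10))
        .collect();
    match digits.split_last() {
        Some((check, rest)) if !rest.is_empty() => rest.iter().sum::<u32>() % 10 == *check,
        _ => false,
    }
}

/// E-mail is "proper" in a deliberately naive sense: one '@' with a
/// non-empty local part and a dot in the domain.
fn email_is_proper(email: &str) -> bool {
    match email.split_once('@') {
        Some((local, domain)) => !local.is_empty() && domain.contains('.'),
        None => false,
    }
}

fn main() {
    let registry: HashSet<u64> = [1001, 1002].into_iter().collect();
    assert!(store_id_is_unique(1003, &registry));
    assert!(check_digit_matches("12340")); // 1+2+3+4 = 10, 10 % 10 = 0
    assert!(email_is_proper("jane.doe@example.com"));
    println!("all invariants hold");
}
```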
  • Domain invariants are all the business rules that keep the consistency and/or integrity of the domain model in place. Examples: each store has a unique ID number; the check digit meets an identification number; device type X has ABC additionally built in, whereas Y does not. Such rules keep the data captured in the domain model fit to support the business. Common sense suggests not to blend OCA, the integrity (schema) layer, with the higher-level layers that impose and enforce invariants. Let's first discuss the current OCA status. It consists of a capture base and 10+ overlays that mainly serve as additional metadata providers. One of the primary reasons it is organized this way is governance, likely distributed, with no central authority deciding upon every aspect (overlay). By having such high granularity, we enable – distributed – governance to have highly granular reputation and business processes that rely on it. The highly granular OCA overlays do not constitute any higher-level subset that would organize them into more coherent, purpose-based groups. Perspective plays a crucial role in judging this a good or bad design choice. Architecturally, such a design can be perceived as weak because it is not cohesive, that is, the responsibilities are vague. However, when we consider the support of distributed governance, where we delegate to authorities the decision of what to "support" and who does it, such a flat set of overlays gains a different perspective. At the end of the day, reputation from governance outbids technical considerations.
  • What presentation needs to provide: allow defining how a given attribute needs to be presented, e.g. Binary as a signature vs. a file upload, Text as a short string vs. a multiline text area; support sections on the page? Do we really need it? If we had a device and a customer and both have a name, it needs to be clearly visible which name corresponds to which object. Sometimes grouping them on separate pages would make sense; sometimes it could be overkill. Presentation base:
{
  "bd": "...", # bundle digest
  "d": "...",
  "o": { # "pages", renamed from "containers"
    "pageY": [ list of attrs1 ], # order is relevant
  • Presentation, i.e. the spatial orientation of elements on the screen, has always been challenging. Until now, no common standard has emerged to address it, and the various providers serving user interfaces approach it in their own way. Portability in this field does not exist. Presentation is closely related to cognition, that is, the "mental action or process of acquiring knowledge and understanding through thought, experience, and the senses"[wiki]. Presentation comprises activities that are solely human-related: in essence, how a human, using a device with a screen, can benefit from digitalized information or enhance it. Here, presentation branches into: the ability to add new digital information, and the ability to read digital information. In 2023, for adding new digital information, we still use forms to capture it from humans. Forms, while still doing their job, are pretty primitive in terms of how machines could assist humans in digitalizing intent expressed in words. The process of form filling expects clear, structured, and often categorized answers from the form filler. Its mechanics also depend on how the form elements are arranged within the rectangular area of the device that displays them on the screen. In other words, the arrangement of form elements and their spatial orientation within the available area is often device-dependent or screen-size-dependent. No common standard defines how form elements should be displayed on the screen. Depending on the form's complexity and structure (i.e., potential relationships between elements), a columnar approach is widely used, where each column serves as the rectangular area for a cohesive part of the form. In many cases, simple and medium-complexity forms use a one-column approach, where form elements are displayed one after another vertically.
  • An OCA Bundle, a blend of a capture base (head) and overlays (heavy tail), includes metadata that is more or less valuable for search purposes, especially human-driven search. Metadata useful for search purposes: capture base attributes, the meta overlay, the label overlay, and the information overlay. This is how an individual can express a search intent by entering words that the search mechanism uses for full text search (FTS). This is search based on words that are part of a given OCA Bundle.
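A sketch of collecting exactly those metadata sources into a single document for an FTS index; the overlay structs below are simplified stand-ins, not the actual OCA types:

```rust
use std::collections::BTreeMap;

// Stand-in structures; the real OCA Bundle / overlay types differ.
struct CaptureBase {
    attributes: Vec<String>, // attribute names
}

struct MetaOverlay {
    name: String,
    description: String,
}

struct LabelOverlay {
    labels: BTreeMap<String, String>, // attribute name -> human label
}

struct InformationOverlay {
    information: BTreeMap<String, String>, // attribute name -> description
}

/// Flatten the search-relevant parts of a bundle into one text document
/// that a full text search engine can index.
fn fts_document(
    cb: &CaptureBase,
    meta: &MetaOverlay,
    labels: &LabelOverlay,
    info: &InformationOverlay,
) -> String {
    let mut words: Vec<&str> = Vec::new();
    words.extend(cb.attributes.iter().map(String::as_str));
    words.push(&meta.name);
    words.push(&meta.description);
    words.extend(labels.labels.values().map(String::as_str));
    words.extend(info.information.values().map(String::as_str));
    words.join(" ")
}

fn main() {
    let cb = CaptureBase { attributes: vec!["firstName".into(), "birthDate".into()] };
    let meta = MetaOverlay { name: "Entrance credential".into(), description: "Data for entrance".into() };
    let labels = LabelOverlay { labels: BTreeMap::from([("firstName".into(), "First name".into())]) };
    let info = InformationOverlay { information: BTreeMap::from([("firstName".into(), "Given name of the person".into())]) };
    println!("{}", fts_document(&cb, &meta, &labels, &info));
}
```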
  • Overlays Capture Architecture consists of a capture base (head) and overlays (heavy tail). The terms head and tail come from the concept of object destructuring (e.g. https://elixir-lang.org/getting-started/pattern-matching.html) and essentially mean there is one head and a potentially infinite number of elements in the tail. The former is a list of human-meaningful attribute names that express properties or characteristics of observables. The latter brings context by enriching them with additional metadata. The ingredients of the heavy tail (overlays) are bound to their head by using the unique identifier of the head (its digest), the product of a one-way hash function. Hash functions are deterministic, which means that for a given input they always return the same output. For example, for the input "abc" and using the blake2 hash function, the output is always 5qnr9eFS5vFAd5gFaHzyrV5nQI9b/du99O6F1cLgn54= (note that echo appends a trailing newline, so the hashed bytes are "abc\n"):
$ echo "abc" | openssl dgst -binary -blake2s256 | openssl base64 -A
5qnr9eFS5vFAd5gFaHzyrV5nQI9b/du99O6F1cLgn54=
This characteristic enables using such products as identifiers. To give an example, suppose there is a parent<->child relationship between two objects and the identifier of the parent is digest(parent) = FDqGuc96GMxgREtwY+0QxrkxUl0idXTww4PZCRN0KMY=. The child is in possession of the identifier of the parent and is therefore able to uniquely identify the parent by checking whether digest(parent) matches what she knows. In terms of OCA, any object that participates in the system, whether it is a capture base or an overlay, is uniquely identified by calculating its digest. Furthermore, all overlays additionally include the digest of the capture base, because they constitute a unidirectional relationship where the capture base is the "parent". Any set of one capture base and multiple overlays constitutes a whole that is identifiable as digest(whole). For this "whole" we use the term OCA Bundle. Mechanically, these properties make the whole system deterministic, which means that for any digest(whole) we can uniquely identify the list of ingredients included in it. The operation is irreversible, that is, we cannot "deconstruct" the digest and find out the list of ingredients solely from digest(whole). However, what we can do is confirm which ingredients were used by performing the assertion "0oY0+xYIgTeCvV5rJVpqRKrVmVkmT6vM+y0HI19bohk=" === digest(ing1+ing2+ing3). If such an assertion passes, we say that the integrity of the object ing1+ing2+ing3 has not been tampered with. This is the determinism mechanics. Ingredients can somehow be thought of as physical, tangible objects that have weight, dimensions, color, etc., effectively making them uniquely identifiable by the human senses. In the digital space, the equivalent of unique identification is the digest of a piece of digital information. Digests, however, care solely about the bytes of the information, not what that information represents. To give an example, digest(www.example.com) = siOoRRdDu6CGAXAf6ZP+YxJAmnsZr82CvudIgpL8oM0=, but that doesn't mean such a digest uniquely identifies the whole content of www.example.com. The digest is solely the product of the hash function and therefore relies only on the bytes of the string www.example.com, and that's it.
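The shell example above can be reproduced programmatically; a sketch assuming the blake2 and base64 crates (note the trailing newline, which echo adds to the hashed input):

```rust
// Sketch assuming the `blake2` (0.10) and `base64` (0.22) crates.
use base64::{engine::general_purpose::STANDARD, Engine as _};
use blake2::{Blake2s256, Digest};

fn digest_b64(input: &[u8]) -> String {
    let mut hasher = Blake2s256::new();
    hasher.update(input);
    STANDARD.encode(hasher.finalize())
}

fn main() {
    // `echo "abc"` emits "abc\n", hence the trailing newline here.
    let d1 = digest_b64(b"abc\n");
    let d2 = digest_b64(b"abc\n");
    // Determinism: the same input always yields the same digest.
    assert_eq!(d1, d2);
    // Expected, per the shell example above:
    // 5qnr9eFS5vFAd5gFaHzyrV5nQI9b/du99O6F1cLgn54=
    println!("{d1}");
}
```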
  • Mutable information is information that can change over time without any preemptive notification. OCA structures rely on the immutability of information, which allows integrity control to be added easily because, by nature, the information doesn't change. Immutability therefore becomes a significant characteristic for any system that relies on what has been said, because it gives a guarantee to the end consumer, assuring her the information did not change. Consider a telephone game where players pass information along, from the first player up to the last. Whispering (as the game assumes) may introduce mistakes, not only mutating the information but also changing its meaning. However, if players also pass the unique identifier of the information, which is used to verify its integrity, any mistake made by any player is immediately disclosed. Since everything in the core of OCA is immutable, adding potentially mutable information must be very well thought out. An anti-corruption layer (ACL) against mutable information must protect the immutable core. There are some lines of defense to protect immutable information against contamination by mutable information: [convenient for the OCA ecosystem, but difficult to achieve] make mutable information immutable. The most convenient way to achieve this is to make a snapshot of the mutable information, effectively making it immutable. Technically, most mutable information that can be beneficial to use with immutable OCA data structures is tree-based (take an ontology as an example). All the nodes can be identified within a tree via an object id (OID), that is, the path to reach a node from the tree's root through all its predecessors. The OCA 1.0 spec brings OCA code tables that can be used precisely for this purpose. An OCA code table that is a snapshot of mutable information is, however, immutable. Creating OCA code tables is a transition phase from mutable to immutable data structures. [more flexible, yet protective] make sure mutable information doesn't affect immutable information. In other words, isolate what's mutable and ensure it doesn't change the context created from immutable information. Protect the immutable OCA core and treat mutable information as additional, third-party metadata, i.e., via the meta overlay. Such metadata then becomes contextual, ecosystem-dependent information that doesn't need to be resolved by all consumers. The resolution process refers to potential discovery and/or machine readability (e.g., the ability to resolve information from a third-party WEB service) of given metadata (e.g., a URI of a WEB service), effectively narrowing consumability only to parties that are really interested in it and have the capability to interact with such a WEB service.
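A sketch of the first line of defense: snapshotting a mutable, tree-based source into an immutable code table pinned by its digest. The types, the OID-to-term layout, and the digest construction are assumptions for illustration, not the OCA code table format from the spec:

```rust
// Sketch assuming the `blake2` and `base64` crates; the snapshot layout is
// illustrative, not the OCA code table format from the spec.
use base64::{engine::general_purpose::STANDARD, Engine as _};
use blake2::{Blake2s256, Digest};
use std::collections::BTreeMap;

/// A snapshot of a mutable, tree-based vocabulary (e.g. an ontology).
/// OIDs are the paths from the root through all predecessors.
struct CodeTableSnapshot {
    entries: BTreeMap<String, String>, // OID -> term at snapshot time
}

impl CodeTableSnapshot {
    /// Digest over a canonical (sorted) serialization of the snapshot.
    /// Once published, this digest pins the exact content forever.
    fn digest(&self) -> String {
        let mut hasher = Blake2s256::new();
        for (oid, term) in &self.entries {
            hasher.update(oid.as_bytes());
            hasher.update(b"=");
            hasher.update(term.as_bytes());
            hasher.update(b"\n");
        }
        STANDARD.encode(hasher.finalize())
    }
}

fn main() {
    let snapshot = CodeTableSnapshot {
        entries: BTreeMap::from([
            ("1".into(), "unit".into()),
            ("1.1".into(), "mass".into()),
            ("1.1.1".into(), "kilogram".into()),
        ]),
    };
    // The immutable core can reference the snapshot by this digest; later
    // changes in the source ontology no longer affect it.
    println!("code table digest: {}", snapshot.digest());
}
```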
  • Below you can find the OCAfile PEG rules. The rules start with the BEGIN OF RULES comment. They describe the grammar for defining OCAfiles. OCAfiles enable defining data schemas in an expressive way. To add more context, when you start defining an OCAfile, you use one of the four basic commands defined in the commands PEG rule. With ADD ATTRIBUTE you start with an attribute definition. For example, ADD ATTRIBUTE firstName = Text starts with the firstName attribute definition. Text is the attribute type, and all available types are defined in the attr_type PEG rule. If you consider an attribute to be "personally identifiable information" (PII), you flag such an attribute using ADD FLAGGED_ATTRIBUTES, for example add flagged_attributes firstName. Note that flagged_attributes is a list of attributes. When you want to add an additional description to an attribute, you use ADD INFORMATION. Note that this rule expects a lang. For example ADD INFORMATION EN ATTR firstName "The person first name". With ADD LABEL you can add a label to an attribute using natural language. Note that it also expects a lang.
  • Rewrite to Rust, too many deps
in Rust, entities (OCA CB, OCA Bundle, Overlays) versioned and controlled via microledgers: https://github.com/THCLab/microledger https://github.com/the-human-colossus-foundation/microledger-spec/blob/main/microledger.md
Currently microledger serialization/deserialization to any representation is missing. Consider CESR for ML serialization.
search engine (Meta overlay, OCA CB attributes, label overlay) along with i18n support
REST API
  • The OCA essence is data defined via the capture base and metadata via overlays. Metadata is by its nature optional, thus all overlays tend to be optional. The overlays defined so far are universal across jurisdictions and use cases. They decorate raw bytes of data with additional meaning. They classify distinct characteristics that are part of, let's call it, the enriched data types ontology. In other words, they constitute an ontology/classification about raw bytes. Some of the current overlays are borrowed from other ontologies, e.g., unit. Some can be inferred, e.g., character encoding. Inference works much like type inference in some strongly typed programming languages, where a type annotation is not explicitly required; it is deduced. The metadata defined through existing overlays, while universally applicable to any use case, is at the same time heavily influenced by external factors. The most significant is the need to present data in various ways using user interfaces. The demand for a proper foundational ontology is a prerequisite for the next challenge: ontology-based data integration. Before going deeper into this topic, let's define what actually makes the case for even starting such a challenge.
  • {
      attributes: {
        "name": {
          format: "[A-Za-z]",
          character_encoding: "utf8",
          labels: { "en": "name", "pl": "imię", ... }
  • The OCA Bundle, a self-sufficient aggregation of an OCA Capture Base and overlays, was primarily designed to be bundled into a ZIP file. The ZIP file format was chosen to keep related OCA objects bound together, potentially with external assets. However, the lack of need to support external assets in the bundle and, more importantly, the evolution of protocols[CESR][CESR-proof], where cryptographic material is no longer a part of the data model but rather an attachment to it, influenced a novel format of the Bundle. Draft: the canonical form of the Bundle is defined using a JSON representation:
{
  "version": "OCAB10000023_",
  "said": SAID of the bundle,
  "capture_base": {
    "said": SAID of the CB,
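A sketch of how the draft canonical form could be mirrored in Rust, assuming serde/serde_json with the derive feature; the struct names and placeholder SAID values are illustrative, only the field names follow the draft above:

```rust
// Sketch assuming serde and serde_json (with the derive feature); struct
// names and placeholder SAID values are illustrative.
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug)]
struct CaptureBase {
    said: String, // SAID of the capture base
    // remaining capture base fields elided, as in the draft above
}

#[derive(Serialize, Deserialize, Debug)]
struct Bundle {
    version: String, // e.g. "OCAB10000023_"
    said: String,    // SAID of the bundle
    capture_base: CaptureBase,
}

fn main() -> Result<(), serde_json::Error> {
    let bundle = Bundle {
        version: "OCAB10000023_".into(),
        said: "E_placeholder_bundle_said".into(),
        capture_base: CaptureBase { said: "E_placeholder_capture_base_said".into() },
    };
    println!("{}", serde_json::to_string_pretty(&bundle)?);
    Ok(())
}
```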
  • Briefly looked into the Apple/Google Wallet proposal for keeping various types of creds. While I personally would never care to have creds in a visual form, we at this point cannot propose a counter-offer (where everything happens automatically, thus a credential is solely a record in a database -- the TDA takes care of everything else). The world is inevitably going to have digital credentials mimicking real-world ones, especially visually. Whether it is good or bad is a secondary topic. We can utilize Apple/Google Wallet, use them as a part of our ecosystem, and turn them to our benefit. There are also obvious benefits for the end user: no need to have a gazillion additional wallet-like apps just for the sake of keeping creds somewhere. The end-user UX is reasonable: creds are stored as a part of the OS. This at the same time means even more coupling with the big guys and their tech, but in this case it might be beneficial. Technology development and adoption must be primarily reasonable. Wallet-ism is not reasonable. Even if we provide our own app and call it TDA-like something, it's still an app. Other providers will do exactly the same. We then end up in a messy world, same as we have right now. Pleasant and seamless end-user interactions must overcome that, thus what has been proposed by the big players should at least be seriously considered as the adoption vector in all cases where service-oriented credentials are considered (tickets, boarding passes, etc.). On jurisdiction/gov-level credentials, like passports, birth attestations, and IDs, I have no strong opinion yet.
  • As a potential consumer of our work, you deserve full transparency about how the EUPL works and whether it is an option to consume our work in your codebase. We have chosen the EUPL because we want to protect ourselves (see our IPR approach) from exclusive appropriation. TL;DR: you can consume our work within your codebase, no matter whether it is licensed under one of the OSS options or is proprietary. The EUPL's crucial characteristics in terms of software reusability: it is compatible and business friendly, allowing the code to be reused in a great number of other projects, even ones licensed differently[2]; it is not viral: according to the provisions of European law (Directive EC 2009/24, recitals 10 & 15), the consumer can utilize static and dynamic linking with other programs without barriers or conditions.[2][3]
  • A Code Table for Unit mappings provides a mapping of input units to output units for quantitative data. The unit conversion process consists of the following steps: read the source unit, read the target unit, convert the source unit to the target unit. Conversion between units is defined as follows: target unit value = source unit value * conversion factor + offset
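A sketch of that conversion rule driven by a code-table-like mapping; the table layout is an assumption, while the kilometre and Celsius/Fahrenheit factors are standard values:

```rust
use std::collections::HashMap;

/// Target unit value = source unit value * conversion factor + offset.
fn convert(value: f64, factor: f64, offset: f64) -> f64 {
    value * factor + offset
}

fn main() {
    // Code-table-like mapping: (source unit, target unit) -> (factor, offset).
    let mut code_table: HashMap<(&str, &str), (f64, f64)> = HashMap::new();
    code_table.insert(("km", "m"), (1000.0, 0.0));
    code_table.insert(("degC", "degF"), (1.8, 32.0));

    // 1. read source unit, 2. read target unit, 3. convert
    let (factor, offset) = code_table[&("degC", "degF")];
    let fahrenheit = convert(25.0, factor, offset);
    println!("25 degC = {fahrenheit} degF"); // 25 * 1.8 + 32 = 77
}
```

For example, 25 in degC converts to 25 * 1.8 + 32 = 77 in degF.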
  • actor "Actor 1" as a1
    participant "Witness 1" as w1
    box "TDA" #LightBlue
    participant "App" as tda2
    participant "KERIOX" as k1
    end box
    actor "Actor 2" as a2
    a1 -> w1 : Sends delegation request\nto mailbox of Actor 2
    w1 --> a1: ACK