Data Discovery in Solid

# Problem

* Data Discovery, Data Validation, Shapes, Footprints, and Authorization are highly interrelated.
* Meaningful applications that are truly interoperable cannot be created until these things are established.
* We must create something that is safe, secure, and solid that works now, while at the same time fostering something that allows us to do more advanced data retrieval (reshaping, query, etc). We have to be able to build upon what we have, or at least not paint ourselves into a corner.

### Discovery

An application needs a way to find the data that it needs to operate.

### Footprints

An application needs some blueprint that tells it where to store and how to link new shape data.

Similarly, we need footprints to inform how we should structure permissions and shape validations on these resource locations.

### Authorization

Users need to understand what kind of access to grant to an application, based on what it needs access to.

This needs to be extremely intuitive - much more so than the status quo (e.g. Google, Dropbox, etc), because we’re going to be dealing with much more diverse and complex information. However, we can also leverage the self-identifying nature of linked data and shapes to make this more intuitive.

# Use Cases

## Alice and a Third Party Service

Alice would like to allow a service called FooTest to store a diagnostic test result (blood test) into her Solid pod. This test result is considered within the logical scope of her personal health record, and very sensitive. This will be the first test result she stores in her pod, and the first entry of any kind related to a personal health record.

Note that this use case begins with FooTest being unauthorized, and also having no knowledge of whether or not Alice already has a place in her pod where test results are stored (i.e. as part of her personal health record).

**Alice goes to https://footest.example and orders a diagnostic test kit to be delivered to her home.**

The test kit allows her to send a blood sample back to FooTest for processing. FooTest is unique because it is premised upon storing test results in your Solid pod, rather than in a database the user doesn't control. This means that Alice must also authorize FooTest to store the diagnostic results into her Solid pod when they are ready.

**Alice is brought to an authorization page that she controls, bringing with her the AppID of FooTest.**

- FooTest has information about the kind of shapes it needs in its AppID.
- Her authorization interface can use that information to determine whether Alice already has data of that type in her pod. If she does, it can ask whether she wants to authorize access to that existing data, or whether she would still rather store the data from FooTest in a different place.
- Regardless, if she doesn't already have a place to store this data, an area in her pod will be prepared to store it, and FooTest will then be authorized to access it there. A footprint will be leveraged for these purposes.

**FooTest receives the test kit, processes it, and goes to store the result in her pod.**

1. Initiates a request to store a resource document into the pod.
1. Uses discovery to determine where the test result should be stored. Discovery was updated at authorization time. It points to the container, which has an associated footprint and shape validation for a diagnostic result.
1. The footprint informs how it should be stored.
1. The input matches the diagnostic shape, so validation passes.

**Alice signs up for TotalHealth, which is an application that manages an entire personal health record.**

- It realizes that one of its shapes - the diagnostic test - already exists in the pod, and needs to figure out how to incorporate it cleanly without moving it around.

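The flow in this use case can be made concrete with a small in-memory sketch. It is only an illustration of how discovery, footprints, shape validation, and authorization might fit together, not a proposal for how a Solid server or any spec actually implements them. Everything named here is a hypothetical stand-in: the `Pod` class, the `DiagnosticResult` shape and its fields, the `/health/diagnostics/` container path, and both AppID strings. The sketch also simplifies the authorization step by always reusing an existing container, where the use case would ask Alice first.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    """Toy in-memory pod (hypothetical, for illustration only)."""
    # shape name -> container path; stands in for discovery data
    registry: dict = field(default_factory=dict)
    # container path -> list of stored resources
    containers: dict = field(default_factory=dict)
    # (app_id, container path) pairs the user has authorized
    grants: set = field(default_factory=set)


# Hypothetical shape: the fields a diagnostic result must carry.
SHAPES = {"DiagnosticResult": {"test_type": str, "value": float, "unit": str}}

# Hypothetical footprint: where data of a shape goes when no place exists yet.
FOOTPRINTS = {"DiagnosticResult": "/health/diagnostics/"}


def authorize(pod, app_id, needed_shapes):
    """Authorization time: reuse or prepare a container per shape, then grant access."""
    for shape in needed_shapes:
        if shape not in pod.registry:
            # No existing place for this shape: prepare one from the footprint
            # and record it so later discovery can find it.
            container = FOOTPRINTS[shape]
            pod.containers.setdefault(container, [])
            pod.registry[shape] = container
        pod.grants.add((app_id, pod.registry[shape]))


def conforms(resource, shape):
    """Rough stand-in for shape validation: exact field set with matching types."""
    spec = SHAPES[shape]
    return resource.keys() == spec.keys() and all(
        isinstance(resource[name], expected) for name, expected in spec.items()
    )


def store(pod, app_id, shape, resource):
    """Write time: discover the container, check the grant, validate, then store."""
    container = pod.registry[shape]  # discovery lookup
    if (app_id, container) not in pod.grants:
        raise PermissionError(f"{app_id} is not authorized for {container}")
    if not conforms(resource, shape):
        raise ValueError(f"resource does not match shape {shape}")
    pod.containers[container].append(resource)
    return container


FOOTEST = "https://footest.example/#app"          # hypothetical AppID
TOTALHEALTH = "https://totalhealth.example/#app"  # hypothetical AppID

pod = Pod()
authorize(pod, FOOTEST, ["DiagnosticResult"])
location = store(
    pod,
    FOOTEST,
    "DiagnosticResult",
    {"test_type": "blood", "value": 5.4, "unit": "mmol/L"},
)

# TotalHealth needs the same shape; the existing container is reused, not moved.
authorize(pod, TOTALHEALTH, ["DiagnosticResult"])
```

Because both applications resolve the shape through the same registry entry, TotalHealth ends up with access to the container FooTest already wrote to, which is one way to read "incorporate it cleanly without moving it around".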
## E-Sports Game Center with Tic Tac Toe, Chess & Foosball Ladder Ranking

- Brackets are running for both Tic Tac Toe and Chess, with data coming from each app individually.
- Each app stores its own unique game data / shapes, but uses common shapes to report scoring and participate in overall brackets.
- Foosball data is generated offline and entered manually by the user (i.e. via a generic form model for any kind of sports data bracket).
- A new game registers, finds that there is a place for it to report its scores, can do so seamlessly, and can find a list of friends to play with.

## Discussion Forum for Solid

- Someone acting as an administrator for a new discussion forum will bootstrap a workspace for that storage. Many non-admin apps will expect already bootstrapped workspaces.
- Users can interact with it using any compatible app of their choice.

This bootstrapping step would very likely involve:

- setting up shapes used for validation and discovery
- setting up footprints
- setting up initial ACLs

Users would store their posts in their own pod, and the forum would reference these (or at least maybe there’s a copy/sync).

## Alice with Photos and Contacts

* Alice storing photos and contacts in her data pod
* Authenticating as Alice (not as a unique application)

Questions:

- How will we format that data?
  - Which shapes do we use?
- Where will we put it?
  - Where things are stored and how they are wired up is covered by footprints
- Who is (typically) responsible for footprint definition?
  - Tough to say. When are footprints going to be relevant?
  - They may also be relevant when reading
  - Have to consider re-shaping
- Could look at a footprint as a database index - any well-built app can leverage this?
- App arrives - In order to work, I need these kinds of shapes (photo shape, contact shape).
  - Authorization question to Alice - "I see you have photos and contacts in these places, do you want to give me access to these?"
  - This may not work with her footprints
    - She may organize contacts by group, etc.
    - She may organize her photos by location
    - She wants to give access to everything from 2019 and nothing from before.
  - Footprints and permissions become tied together here.
  - The way around this is to use virtual footprints and virtual trees.
- Apps should not have hard-coded assumptions of how things are spread across files - it's not their business.

# Questions

* What are the different organizing patterns for data in the pod? (e.g. Deep Footprints, Flat Hierarchy)
* How do we ensure that:
  * Data is securely compartmentalized (which could be a virtual compartmentalization)
  * Applications that are dealing with complex data can find the information that they need
  * Applications can access what they need quickly and efficiently
  * We do not end up creating really complicated storage hierarchies that are difficult to work with now, and even more complex to deal with later as things evolve and mature?
* Does broad search and query fall under the umbrella of data discovery?
  * What is the relationship with footprints?
* Which end users care about how and where files are stored?
  * Which end users do we care about?
* Who are footprints really for?
  * Are they for people?
  * Are they for applications?
  * If you access as a filesystem
* When we write apps, they should use queries
  * Should only use footprints when query isn't available?

# Brainstorm

- Get rid of file-based access / everything is a triple store
  - Do we still need footprints?
- Authorization
  - We want these to be orthogonal things