Jenny Yu, Bob Zeleznik
October 20, 2023
Dash is a collaborative, browser-based hypermedia system designed to help knowledge workers build and navigate complex relationships between content in different media. Users can create, attach, link, and group multimedia content, such as text, images, videos, and PDFs, on different types of visual layouts. They can also weave this content into hypertrails for interactive exploration and presentation.
This documentation provides an overview of the system architecture and design principles that guide the development of Dash. It also documents some problems that the team has run into and how we solved, or are working to solve, them. We hope it offers some insight into Dash's data and system model and serves as an introductory reference for future development work.
If you find any of the concepts interesting and inspiring, or have suggestions for how we can improve, we would love your contribution!
Traditional documents enforce rigid separations between media types (e.g., .doc, .jpg, .mp4). Correspondingly, applications have been designed around the capabilities of a specific medium, which limits users' ability to work with unstructured, multimedia data. Hypermedia scholars want to access, view, and organize their documents in ways beyond those predetermined by application developers. As Rosemary Simpson, a data organization expert and a valued member of the Dash research group, put it,
I have a lot of data. I don't know what they are all about. I want an application where I am able to throw them all in, add metadata to it, and figure out how to represent or relate them over time.
Dash thus evolved with the goal of breaking down the barriers between media types and empowering users to customize their own data environment. It strives to seamlessly integrate any data format and give users complete freedom over how they display, organize, or use their information.
As you delve deeper into this document, you'll discover how Dash, in pursuit of this ambitious objective, has designed its data and architecture around the principle of flexibility. Although the team is still working on balancing customizability and usability, Dash's flexible design makes it a platform on which researchers and developers can continue to explore novel approaches for accessing, linking, and presenting data. Over the years, many student developers have added new components, views, and support for additional media types and use cases.
As you learn about the architecture, we encourage you to look at it through both an appreciative and critical lens: What are some tradeoffs of the design? Given its power and limitations, how can we best leverage it to empower users to create truly dynamic and personalized data interaction experiences?
In Dash, anything that is intended to be persistent or shareable is a Dash document (Doc for short): for example, a piece of text, a button, an image, or a PDF. A Doc can also be a composite of, or hold references to, other Docs.
Imagine embedding audio recordings and photos on your travel map, annotating your meeting recording with links to other PDFs, or recording videos on your presentation slides. This flexible and generalized notion of 'Doc' allows different media to be seamlessly integrated with each other and gives users full flexibility over what they create and how they create it.
Dash represents all Docs with Fobs (flexible objects). A Fob is defined by the following properties:
On the client and server, Fobs are just JavaScript objects. In the database, every Fob maps directly to a MongoDB document. (MongoDB is a NoSQL DBMS that stores data as documents, each made up of field-value pairs. Each field in a Fob is a field in the MongoDB document.)
Below are some naive examples of what Fobs may look like:
Dash Fobs are not restricted to a single "type" (i.e., we do not need to define the type explicitly by putting fields such as "type: text" or "type: PDF" on the Fob). Instead, they have different facets: sets of fields that enable certain behaviors or capabilities. For example, a PDF or image Fob has a data field that stores a link to the PDF/image, and a button Fob may have onClick and onHover fields.
Different React components (e.g., ImageBox, TextBox) render specific facets and look for the fields they need. If a required field is not present, the component falls back to the default behavior specified by the developer. For example, if the onClick behavior is not defined for a button, then the button simply gets selected when a user clicks on it.
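To make the facet idea concrete, here is a minimal sketch in TypeScript. It is not Dash's actual code: the `Fob` type, the example field values, and the helpers `hasFacet` and `clickButton` are hypothetical names introduced only for illustration.

```typescript
// A Fob is modeled here as a plain bag of fields (hypothetical shape).
type FieldValue = string | number | (() => string) | undefined;
type Fob = { [field: string]: FieldValue };

// Hypothetical example Fobs; note that none of them carries a "type" field.
const pdfFob: Fob = { title: "paper", data: "https://example.com/paper.pdf" };
const plainButton: Fob = { title: "Click me" };
const scriptedButton: Fob = { title: "Open", onClick: () => "opened" };

// A facet counts as present when every field it needs exists on the Fob.
function hasFacet(fob: Fob, fields: string[]): boolean {
  return fields.every((f) => fob[f] !== undefined);
}

// A button-like component: use the onClick field if it is defined,
// otherwise fall back to the default behavior (selecting the button).
function clickButton(fob: Fob): string {
  const handler = fob["onClick"];
  return typeof handler === "function" ? handler() : "selected";
}

console.log(hasFacet(pdfFob, ["data"]), clickButton(plainButton), clickButton(scriptedButton));
```

Because the component only checks for fields, the same `clickButton` logic works on any Fob, whether or not it was created as a button.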
This design offers a few benefits:
Reusable Components: Rather than restricting each React component to certain "types" of Fob, any Fob can be passed to any component as long as it has the required facets. For example, a Fob can have an annotations field that stores a list of annotations, which are individual Fobs with specified layout information.
Extensibility: Developers can easily create new components that render Fobs differently based on existing or new facets, enabling creative ways of representing data.
Flexibility: There is no need to manage a complex type hierarchy. Depending on the facets present, Fobs can share features and capabilities without any inheritance relationship. They can also gain new facets and capabilities by adding fields or combining old and new fields, without needing to change types.
Dash distinguishes between a Doc's intrinsic content (e.g., text in a textbox, URL to a PDF) and its contextual layout information for rendering (e.g., x/y position, width, height). The layout information is represented using a set of fields that constitute the "layout facet", which, in most cases, is encapsulated in a separate Fob called the Layout Fob. The Layout Fob contains a proto field, which stores a reference to the underlying Content Fob that represents the actual data.
With this separation, multiple Layout Fobs can reference the same Content Fob, allowing the same content to be viewed and modified in different visual contexts. For example, an image Content Fob can have different Layout Fobs that crop or resize it differently. All the Layout Fobs would point to the one source Content Fob and therefore display the same image.
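The separation can be sketched as follows. This is a simplified model, not Dash's implementation: the `fields` wrapper and the `getField` helper are hypothetical, and the rule that a lookup falls back from the Layout Fob to the Content Fob through proto is an assumption made for illustration.

```typescript
type Value = string | number;
interface ContentFob { fields: Record<string, Value> }
interface LayoutFob { fields: Record<string, Value>; proto: ContentFob }

// Assumed lookup rule: check the Layout Fob first, then its Content Fob.
function getField(fob: LayoutFob, key: string): Value | undefined {
  return key in fob.fields ? fob.fields[key] : fob.proto.fields[key];
}

// One image Content Fob viewed through two Layout Fobs of different sizes.
const imageContent: ContentFob = { fields: { data: "https://example.com/cat.jpg" } };
const thumbnail: LayoutFob = { fields: { x: 0, y: 0, width: 100 }, proto: imageContent };
const poster: LayoutFob = { fields: { x: 400, y: 50, width: 800 }, proto: imageContent };

// Changing the shared content is visible through both layouts.
imageContent.fields["data"] = "https://example.com/dog.jpg";

console.log(getField(thumbnail, "data"), getField(poster, "data"));
```

Each layout keeps its own position and size, while both resolve `data` to the same underlying value.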
If any Fob can be passed to any React component, which component do we actually use to render a Fob?
In addition to position and size information, Layout Fobs also have a layout field, which stores a JSX string that can be parsed at runtime to instantiate a specific React component. When a piece of data is created or imported into Dash, this field is automatically generated based on the file type or creation method, but it may be changed if a user chooses to open the same Doc in a different way.
"Doc" is a user-facing, abstract concept that refers to the items that users see and interact with. Under the hood, all Docs are represented by one or more Fobs. The client, server, and database don't know about the idea of "Docs" at all. All they see is the representation of Docs in the form of Fobs: the server asks the database for the Fobs and passes them to the client, which then renders them to become the "Docs" that users see and interact with.
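As a rough illustration of how a layout string could select a component, consider the sketch below. The string format (`<ImageBox fieldKey='data' />`), the `registry` object, and the `instantiate` function are assumptions for this example; Dash uses a real JSX parser and React components, whereas this sketch uses a regular expression and functions that return strings.

```typescript
type Renderer = (fieldKey: string) => string;

// Hypothetical registry mapping component names to render functions.
const registry: Record<string, Renderer> = {
  ImageBox: (key) => `image from field "${key}"`,
  PDFBox: (key) => `pdf from field "${key}"`,
};

// Parse a tiny subset of JSX: one self-closing tag with a fieldKey attribute.
function instantiate(layout: string): string {
  const match = /^<(\w+)\s+fieldKey=['"](\w+)['"]\s*\/>$/.exec(layout.trim());
  if (!match) throw new Error(`unsupported layout string: ${layout}`);
  const component = match[1] ?? "";
  const fieldKey = match[2] ?? "";
  const renderer = registry[component];
  if (!renderer) throw new Error(`unknown component: ${component}`);
  return renderer(fieldKey);
}

console.log(instantiate("<ImageBox fieldKey='data' />"));
```

Opening the same Doc "in a different way" then amounts to storing a different string in the layout field.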
Dash uses the React.js framework to construct its rich hypermedia displays. Each Doc that the user sees and interacts with on the screen is a media component wrapped inside a container component called DocView.
DocView provides the UI elements and general hypermedia features that are shared by all components, regardless of media type. Examples include a title bar; the ability to be enlarged, shrunk, or dragged on the screen; and common hypermedia features such as annotating, linking, and highlighting.
Inside this container, Dash uses different React components to describe how Docs of different media types should be rendered (e.g., ImageBox for images, VideoBox for videos). Some of these components use external libraries, such as ProseMirror for rendering rich text and PDF.js for rendering PDFs. Depending on their purpose, some media components, such as FreeformLayout, may contain other DocView components.
For example, in the following image, we are seeing a FreeformLayout, which contains a piece of text and an image.
When rendered, the freeform view is a FreeformLayout component nested inside a DocView component. Inside the FreeformLayout, the image and the text are an ImageBox and a TextBox component, each nested inside its own DocView.
When rendering a Doc at runtime, Dash first creates a DocView component. In addition to rendering the shared UI elements, the DocView component takes a Layout Fob as a prop and parses its layout field to instantiate a specific media component. Note that since every Layout Fob holds a reference to its Content Fob through the proto field, the DocView component also has access to the content information of a Doc.
The diagram below illustrates how Dash creates different nested media components based on data stored in the Layout Fob.
In the simplified diagram, the DocView component on the left takes a layoutFob as a prop and renders an HTML div element where common UI, such as the title, can be placed. Inside the div, DocView uses a JSX parser to create a specific media component from the layoutFob's layout field, passing down the layoutFob as a prop.
Each media component then queries the layoutFob for the facet that it expects. For example, a PDFBox component looks for the URL of the PDF in layoutFob.data and passes the URL to PDF.js to handle the rendering. A FreeformLayout component looks for the list of Fobs that should be rendered on the freeform view in layoutFob.data, iterates over the list, and creates a DocView component for each of these child layoutFobs at the specified location on the screen.
What is fieldKey?
Instead of hardcoding the name of the field that a component expects (e.g., data in the above examples), the name is passed as a prop; components are designed to query facets based on the fieldKey passed in. This allows components to be reused even if the expected facet is stored in different fields across Fobs. For example, when being used to render a list of annotations, the FreeformLayout component could look for the list of Fobs in layoutFob.annotations instead of layoutFob.data.
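The nesting and the role of fieldKey can be sketched without React. In this hypothetical model, `SketchFob`, its `component` field (standing in for the parsed layout field), and `renderDocView` are illustrative names, and rendering produces a string rather than UI.

```typescript
interface SketchFob {
  component: "TextBox" | "FreeformLayout"; // stands in for the parsed layout field
  fieldKey: string;                         // which field the component should read
  fields: Record<string, string | SketchFob[]>;
}

// Every Fob is wrapped in a DocView; a FreeformLayout recursively creates
// one DocView per child Fob found in the field named by fieldKey.
function renderDocView(fob: SketchFob): string {
  const value = fob.fields[fob.fieldKey];
  let inner: string;
  if (fob.component === "FreeformLayout" && Array.isArray(value)) {
    inner = `FreeformLayout[${value.map((child) => renderDocView(child)).join(", ")}]`;
  } else {
    inner = `TextBox("${typeof value === "string" ? value : ""}")`;
  }
  return `DocView(${inner})`;
}

const note: SketchFob = { component: "TextBox", fieldKey: "data", fields: { data: "hello" } };

// The same FreeformLayout logic reads children from "data" in one Fob...
const canvas: SketchFob = { component: "FreeformLayout", fieldKey: "data", fields: { data: [note] } };
// ...and from "annotations" in another, whose "data" holds a PDF URL instead.
const annotatedPdf: SketchFob = {
  component: "FreeformLayout",
  fieldKey: "annotations",
  fields: { data: "https://example.com/a.pdf", annotations: [note] },
};

console.log(renderDocView(canvas));
console.log(renderDocView(annotatedPdf));
```

Only the fieldKey differs between the two calls, which is what lets a single component serve both cases.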
The Dash server hosts various API endpoints for actions such as user sign up, log in, log out, uploading a file, etc. However, most of the requests that a client sends to the server are sent using WebSocket, which is a bidirectional protocol for client-server communication. When a user logs in and enters the main Dashboard editing page, a websocket is created between the client and the server.
Compared to REST APIs, WebSocket is a stateful protocol that uses a single TCP connection for data exchange. The connection stays alive until it is terminated by either end, enabling low overhead per message and making it suitable for low-latency, high-frequency communication scenarios.
Dash uses an optimistic approach when writing to the remote server database, giving users immediate feedback on their actions. Whenever a user adds, deletes, or modifies a field in a Doc, the component state updates immediately (i.e., the change is shown in the UI). At the same time, a message is sent to the server to register the change in the database. If the server is unavailable, this message will not result in a database write and, unfortunately, Dash, as currently instrumented, will not reverse the change. This means that a Dash client can get out of sync with the database, and when the user refreshes the app, the out-of-sync changes will be lost. (Read more about optimistic vs. pessimistic rendering)
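The behavior described above, including the failure case, can be modeled in a few lines. `FakeServer` and `OptimisticClient` are hypothetical stand-ins written for this sketch; real Dash sends messages over a WebSocket rather than calling the server directly.

```typescript
type Fields = Record<string, string>;

// Stand-in for the server plus database.
class FakeServer {
  available = true;
  database: Fields = {};
  // Returns false when the write is dropped because the server is down.
  write(key: string, value: string): boolean {
    if (!this.available) return false;
    this.database[key] = value;
    return true;
  }
}

class OptimisticClient {
  local: Fields = {};
  server: FakeServer;
  constructor(server: FakeServer) { this.server = server; }

  setField(key: string, value: string): void {
    this.local[key] = value;       // 1. update local state (and UI) immediately
    this.server.write(key, value); // 2. notify the server; a failure is not rolled back
  }

  // Reloading the app discards local state and re-reads the database.
  refresh(): void {
    this.local = { ...this.server.database };
  }
}

const demoServer = new FakeServer();
const demoClient = new OptimisticClient(demoServer);
demoClient.setField("title", "Draft");
demoServer.available = false;          // simulate an outage
demoClient.setField("title", "Final"); // shown locally, never persisted
const beforeRefresh = demoClient.local["title"];
demoClient.refresh();
const afterRefresh = demoClient.local["title"];
console.log(beforeRefresh, afterRefresh);
```

The last two values differ, which is exactly the out-of-sync situation the text warns about.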
To limit loss of work, the Dash client continually pings the server to check that it is alive. If an outage is detected, an alert is presented to the user. The user is free to continue exploring, with the understanding that changes made during the outage will be lost. In many commercial systems (e.g., Google Suite applications), the client keeps track of changes while the server is disconnected and attempts to sync them with the database when the connection is restored. This is on the Dash future development wish list.
Server-based systems introduce performance considerations due to the time and latency involved in transferring data from the server to the client. There are two primary approaches to request data from the server:
Both approaches have their trade-offs, and Dash uses a balance of the two: it retrieves a user's current working set of Docs (i.e., all Docs that were open in the user's previous session) on startup, and retrieves other Docs (e.g., closed Docs, newly shared Docs) on demand. For each user, Dash stores on the server and in the database a list of Doc IDs that correspond to all the Docs in the user's last viewed dashboard, including Docs they have authored and Docs that have been shared with them. When Dash is first loaded, the client retrieves the list of Doc IDs from the server. The client then periodically updates this list and sends it back to the server.
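A minimal sketch of this hybrid loading strategy follows. `SketchServer`, `SketchClient`, the sample Docs, and the synchronous calls are all assumptions made for readability; actual retrieval in Dash is asynchronous.

```typescript
type DocId = string;

// Stand-in for the server and database; records which Docs were fetched.
class SketchServer {
  fetchLog: DocId[] = [];
  docs: Record<DocId, string> = { a: "Trip notes", b: "Map", c: "Old report" };
  workingSet: DocId[] = ["a", "b"]; // Doc IDs saved from the previous session

  fetchDoc(id: DocId): string {
    this.fetchLog.push(id);
    const doc = this.docs[id];
    if (doc === undefined) throw new Error(`unknown Doc: ${id}`);
    return doc;
  }
}

class SketchClient {
  cache = new Map<DocId, string>();
  server: SketchServer;
  constructor(server: SketchServer) { this.server = server; }

  // Startup: eagerly load only the saved working set.
  start(): void {
    for (const id of this.server.workingSet) this.cache.set(id, this.server.fetchDoc(id));
  }

  // On demand: fetch a Doc the first time it is opened, then reuse the cache.
  open(id: DocId): string {
    const cached = this.cache.get(id);
    if (cached !== undefined) return cached;
    const doc = this.server.fetchDoc(id);
    this.cache.set(id, doc);
    return doc;
  }

  // Would run periodically: send the current working set back to the server.
  saveWorkingSet(): void {
    this.server.workingSet = Array.from(this.cache.keys());
  }
}

const loadServer = new SketchServer();
const loadClient = new SketchClient(loadServer);
loadClient.start();
loadClient.open("a"); // already cached, no fetch
loadClient.open("c"); // fetched on demand
loadClient.saveWorkingSet();
console.log(loadServer.fetchLog.join(","), loadServer.workingSet.join(","));
```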
Not all data is directly stored as field values. For instance, media information, such as videos, images, and PDFs, is stored in the form of string URLs instead of big binary blobs. Therefore, only the URLs are retrieved at startup, and the full binary data is fetched when needed for rendering.
For native HTML types like <img> and <video>, the browser can stream the bytes pointed to by the URL and render them on-demand. As for non-native types, such as PDFs, Dash asynchronously queries the PDF bytes and utilizes an external library called PDF.js to render them. However, RTF text content is directly stored as a string inside the Fob for the text Doc and is always retrieved on startup, regardless of whether the Doc is rendered or not.
The diagram below illustrates, at a high level, how the client, server, and database interact:
(Steps 12 and 13, which relate to syncing changes across multiple clients, are explained in the next section.)
Dash supports real-time collaboration. In collaborative applications, the goal is to eventually attain the property of WYSIWIS (what you see is what I see). That is, if multiple users make changes to a document at the same time, their views of the document might look different temporarily, but should eventually converge to the same state. In this section, we describe how the server and client work together in Dash to achieve this.
In short: for most concurrent edits, Dash simply uses LILO, so when edits are simultaneous, one user's edits are discarded. For lists, however, Dash uses a naive form of OT (operational transformation, described below) to try to produce a valid merge of both users' changes.
When Dash clients make updates to a Fob, they send messages to the server to register the change. For most Fobs, Dash adopts a "last in, last out" (LILO) approach, a policy more commonly known as last-write-wins. That is, if conflicting changes to a Fob field occur, the server resolves the conflict by favoring the most recent change sent to it, overwriting all previous changes. This is not the ideal user experience: if two people are editing different parts of a text document, users would expect both changes to be kept.
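The consequence of this policy is easy to see in a small sketch. `FieldUpdate` and `applyInArrivalOrder` are hypothetical names; the point is only that each update replaces the whole field value.

```typescript
interface FieldUpdate { fobId: string; field: string; value: string; from: string }

// The server applies updates in the order they arrive; a later write to the
// same field simply replaces the earlier one.
function applyInArrivalOrder(updates: FieldUpdate[]): Record<string, string> {
  const db: Record<string, string> = {};
  for (const u of updates) db[`${u.fobId}.${u.field}`] = u.value;
  return db;
}

// Both users start from "Hello world" and edit different parts of the text.
const resolved = applyInArrivalOrder([
  { fobId: "note", field: "text", value: "Hello brave world", from: "userA" },
  { fobId: "note", field: "text", value: "Hello world!", from: "userB" },
]);

console.log(resolved["note.text"]);
```

User A's inserted word is gone from the result even though the two edits did not touch the same characters.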
One method for merging concurrent edits, rather than discarding one of them, is Operational Transformation (OT). This is Wikipedia's explanation of OT:
Collaboration systems utilizing Operational Transformations typically use replicated document storage, where each client has their own copy of the document; clients operate on their local copies in a lock-free, non-blocking manner, and the changes are then propagated to the rest of the clients. When a client receives the changes propagated from another client, it typically transforms the changes before executing them; the transformation ensures that application-dependent consistency criteria (invariants) are maintained by all sites.
In simple terms, in OT, when multiple clients are making changes to the same Fob, the server would look at the changes and intelligently "transform" or merge them.
Another approach that evolved later is the Conflict-free Replicated Data Type (CRDT). The technical details of CRDTs are beyond the scope of this documentation. At a high level, OT relies on an active server connection to coordinate and guarantee that all clients operate correctly, whereas CRDTs are capable of working peer-to-peer with end-to-end encryption and are more resilient to transient network connections.
For list/array-type information in Fobs, Dash supports merging conflicting changes and operates in a way more similar to OT. (Without merging, if two users are modifying a collection at the same time, then the document added by one user would disappear after the server reports the LILO change). Supporting change merging for non-list data types is an area of future work.
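The difference between overwriting a list and merging changes to it can be sketched as follows. This is a deliberate simplification and not Dash's algorithm: it models each change as an add or remove of a single item and ignores positions, so it is not full operational transformation. `ListOp`, `overwrite`, and `applyOp` are hypothetical names.

```typescript
type ListOp = { kind: "add" | "remove"; item: string };

// Overwrite semantics: each client sends its whole list; the last one wins.
function overwrite(_current: string[], incoming: string[]): string[] {
  return incoming.slice();
}

// Merge semantics: each client sends only what it changed.
function applyOp(current: string[], op: ListOp): string[] {
  if (op.kind === "add") {
    return current.indexOf(op.item) >= 0 ? current : current.concat(op.item);
  }
  return current.filter((x) => x !== op.item);
}

const initialList = ["map"];

// Two users each add one Doc to the same collection at the same time.
const wholeLists = [["map", "photo"], ["map", "audio"]];
const overwritten = wholeLists.reduce(overwrite, initialList);

const listOps: ListOp[] = [{ kind: "add", item: "photo" }, { kind: "add", item: "audio" }];
const mergedList = listOps.reduce(applyOp, initialList);

console.log(overwritten.join(","), "vs", mergedList.join(","));
```

With whole-list overwrites the photo disappears; with per-item operations both additions survive.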
After the server registers the change to the database, it broadcasts the change to all connected clients via the websockets established (except for the client who published the change).
Every Dash client maintains a local cache of the MongoDB documents that are relevant to it (in the form of Fobs). Clients keep track of the Fob properties as observable state, monitor changes in the Fobs, and respond to them accordingly.
To keep track of state, Dash uses a state management library called MobX instead of plain React state (i.e., you will not see React methods such as setState or hooks such as useState, but rather MobX annotations such as @observable, @action, and @computed). Similar to React, MobX allows creating derivations: computations or side effects that automatically respond to state changes. However, whereas a change in plain React state re-renders the owning component and, by default, its entire subtree, MobX only updates the derivations that are affected by the changing state, avoiding unnecessary re-rendering and improving performance.
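To show what "only the affected derivations update" means, here is a toy dependency tracker. It is not MobX and does not use the MobX API: `ObservableBox` and this `autorun` are simplified re-implementations written for this sketch (the name `autorun` is borrowed from MobX for familiarity).

```typescript
type Derivation = () => void;
let activeDerivation: Derivation | null = null;

// A single observable value that remembers which derivations read it.
class ObservableBox<T> {
  private observers = new Set<Derivation>();
  private value: T;
  constructor(value: T) { this.value = value; }

  get(): T {
    const current = activeDerivation;
    if (current !== null) this.observers.add(current); // record the dependency
    return this.value;
  }

  set(next: T): void {
    this.value = next;
    this.observers.forEach((run) => run()); // re-run only dependents of this box
  }
}

// Run an effect once, tracking every box it reads so it can be re-run later.
function autorun(effect: () => void): void {
  const derivation: Derivation = () => {
    const previous = activeDerivation;
    activeDerivation = derivation;
    try { effect(); } finally { activeDerivation = previous; }
  };
  derivation();
}

const titleBox = new ObservableBox("Untitled");
const widthBox = new ObservableBox(100);
const runs = { title: 0, width: 0 };

autorun(() => { titleBox.get(); runs.title++; }); // "renders" the title
autorun(() => { widthBox.get(); runs.width++; }); // "renders" the width

titleBox.set("Notes"); // only the title derivation runs again
console.log(runs);
```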
When a client receives an update to any Fob from the server, it first checks whether the change is relevant to it (i.e., whether the Fob is locally cached). If it isn't, the message is ignored. If it is, the client updates the Fob field in its local cache. Because MobX observes these values, it triggers any React component (or other derivation) that depends on them to recompute or re-render. As such, whenever a client modifies a field in a Fob, any data or UI that depends on this field is automatically updated on other clients.
The diagram below illustrates an example of syncing changes across different clients:
In this diagram, user A modifies the title of a text Doc, which updates the title field of the Fob (1). On client A, MobX sees the change in the Fob and immediately triggers any React component that has referenced the field to re-render (2). In addition, client A sends a message to the server notifying it of the change (2). The server unpacks the message and updates the Fob in the database (3). If the update is successful, the server broadcasts a message to all other connected clients via the established websockets (4). Each of the other clients, upon receiving the message, checks whether the target Fob is relevant to it (5). If it isn't, the message is ignored. If it is, the client updates its local Fob. MobX sees the change and triggers components that have referenced the field to re-render (6).
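The message flow in the steps above can be sketched with in-memory objects standing in for the websockets. `SyncServer`, `SyncClient`, and `FieldChange` are hypothetical names, and the MobX-triggered re-render is reduced to a comment.

```typescript
interface FieldChange { fobId: string; field: string; value: string }
type FobCache = Record<string, Record<string, string>>;

class SyncClient {
  cache: FobCache = {};
  received = 0;

  // Steps 5 and 6: ignore changes to Fobs that are not cached locally.
  onServerMessage(change: FieldChange): boolean {
    this.received++;
    const fob = this.cache[change.fobId];
    if (!fob) return false;
    fob[change.field] = change.value; // observers of this field would re-render here
    return true;
  }
}

class SyncServer {
  database: FobCache = {};
  clients: SyncClient[] = [];

  // Steps 3 and 4: persist the change, then broadcast to everyone but the sender.
  handleChange(sender: SyncClient, change: FieldChange): void {
    let fob = this.database[change.fobId];
    if (!fob) { fob = {}; this.database[change.fobId] = fob; }
    fob[change.field] = change.value;
    for (const client of this.clients) {
      if (client !== sender) client.onServerMessage(change);
    }
  }
}

const syncServer = new SyncServer();
const clientA = new SyncClient();
const clientB = new SyncClient();
const clientC = new SyncClient();
syncServer.clients = [clientA, clientB, clientC];
clientB.cache["doc1"] = { title: "Old" }; // only B has this Fob cached

syncServer.handleChange(clientA, { fobId: "doc1", field: "title", value: "New" });
console.log(clientB.cache["doc1"], clientC.cache["doc1"]);
```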
Note: Dash is designed in a way such that when any Fob field changes, the client automatically notifies the server. Developers do not make explicit calls to the backend to register the change.
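One way such automatic notification could be built is with a JavaScript Proxy that intercepts field writes. This is an assumption for illustration only; the text does not say how Dash implements it, and `makeTrackedFob` and `FieldMessage` are hypothetical names.

```typescript
type FieldMessage = { fobId: string; field: string; value: unknown };

// Wrap a Fob so that every field assignment also emits a message,
// without the caller making any explicit backend call.
function makeTrackedFob(fobId: string, send: (m: FieldMessage) => void): Record<string, unknown> {
  return new Proxy<Record<string, unknown>>({}, {
    set(target, field, value) {
      if (typeof field === "string") {
        target[field] = value;
        send({ fobId, field, value });
      }
      return true;
    },
  });
}

const outbox: FieldMessage[] = [];
const trackedFob = makeTrackedFob("doc1", (m) => outbox.push(m));

trackedFob["title"] = "Hello"; // a plain assignment from the developer's point of view
console.log(outbox);
```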
Dash supports the creation of links: bidirectional connections between two anchors, each of which can be an entire document, a phrase within a long text document, an annotation on a PDF, etc.
A link is represented by a Fob with the link facet, which consists of anchor fields, each containing a reference to another Fob, along with additional fields (e.g., showPath) and directions for what to do when the link is followed.
On a high level, this is how a link is represented:
Note that the anchors are typically references to Layout Fobs, rather than Content Fobs. This allows users to follow links to reach specific rendering/instances of a Doc. If a link points to one Layout Fob, all other Layout Fobs that share the same underlying Content Fob will be able to access the link (For example, a PDF may be linked to some note that summarizes the PDF. If the PDF is opened in another view (i.e. has another Layout Fob), the user can still see this link and follow it to the summary note). To achieve this, for every Layout Fob, Dash asks the underlying Content Fob for all Layout Fobs associated with it, and then asks these "sibling" Layout Fobs for their links.
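The sibling lookup described above can be sketched as follows. The `layouts` back-reference on the Content Fob and the `linksFor` helper are assumptions introduced for this example; the text only states that Dash can ask a Content Fob for its associated Layout Fobs.

```typescript
interface ContentFob { id: string; layouts: LayoutFob[] }
interface LayoutFob { id: string; proto: ContentFob }
interface LinkFob { id: string; anchor_1: LayoutFob; anchor_2: LayoutFob }

// Collect links anchored on this Layout Fob or on any sibling Layout Fob
// that shares the same Content Fob.
function linksFor(layout: LayoutFob, allLinks: LinkFob[]): LinkFob[] {
  const siblings = layout.proto.layouts; // includes `layout` itself
  return allLinks.filter(
    (link) => siblings.indexOf(link.anchor_1) >= 0 || siblings.indexOf(link.anchor_2) >= 0,
  );
}

// A PDF opened in two views, and a note that summarizes it.
const pdfContent: ContentFob = { id: "pdf", layouts: [] };
const pdfViewA: LayoutFob = { id: "pdf-view-a", proto: pdfContent };
const pdfViewB: LayoutFob = { id: "pdf-view-b", proto: pdfContent };
pdfContent.layouts.push(pdfViewA, pdfViewB);

const noteContent: ContentFob = { id: "note", layouts: [] };
const summaryNote: LayoutFob = { id: "note-view", proto: noteContent };
noteContent.layouts.push(summaryNote);

// The link was created from view A only.
const summaryLink: LinkFob = { id: "link-1", anchor_1: pdfViewA, anchor_2: summaryNote };

console.log(linksFor(pdfViewB, [summaryLink]).map((link) => link.id));
```

View B finds the link even though it was anchored on view A, because both views share the same Content Fob.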
In addition to storing the individual link Fobs in the database, Dash also keeps a linkCollection Fob for each user that contains all the links created by the user. This collection of links is retrieved from the database when Dash is loaded.
In the UI, there is a visual indicator for the number of links associated with each Doc. When the client is starting, it computes this number for each Doc by going through all link collections and counting the number of links that reference this Doc as one of the anchors. Although this computation is linear in the number of links created, it does not empirically present a performance issue since the number of links in total is much less than the number of Docs that exist.
The animation below illustrates the steps that Dash goes through when a user selects a portion of a text and links it to a selection on an image:
(1) A Fob representing the text selection is added to the annotations field in the original text Fob.
(2) A Fob representing the image selection is added to the annotations field in the original image Fob.
(3) A link Fob is created that references the two selection Fobs in anchor_1 and anchor_2, respectively.
If, instead of linking between selected portions of a text and an image, we were to create a link directly between the entire piece of text and the entire image, then Dash would omit steps (1) and (2) above and directly create a link Fob pointing to the Fobs representing the entire text and image:
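Both cases can be sketched in code. This is a simplified model under stated assumptions: `SimpleFob`, `makeFob`, `linkSelections`, and `linkWholeDocs` are hypothetical names, and a selection is represented as a bare Fob stored in the parent's annotations list.

```typescript
interface SimpleFob { id: string; annotations: SimpleFob[] }
interface SimpleLink { anchor_1: SimpleFob; anchor_2: SimpleFob }

let nextFobId = 0;
function makeFob(label: string): SimpleFob {
  return { id: `${label}-${nextFobId++}`, annotations: [] };
}

// Linking two selections: create a Fob per selection, register each in its
// parent's annotations list, then point the link at the two selection Fobs.
function linkSelections(textDoc: SimpleFob, imageDoc: SimpleFob): SimpleLink {
  const textSelection = makeFob("text-selection");
  textDoc.annotations.push(textSelection);
  const imageSelection = makeFob("image-selection");
  imageDoc.annotations.push(imageSelection);
  return { anchor_1: textSelection, anchor_2: imageSelection };
}

// Linking whole Docs: no selection Fobs are needed.
function linkWholeDocs(a: SimpleFob, b: SimpleFob): SimpleLink {
  return { anchor_1: a, anchor_2: b };
}

const essay = makeFob("text");
const photo = makeFob("image");
const selectionLink = linkSelections(essay, photo);
const wholeLink = linkWholeDocs(essay, photo);
console.log(selectionLink.anchor_1.id, wholeLink.anchor_1.id);
```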
In the process of creating links, the propagation of the newly added Fobs to the server and to all other users is handled in the same way as any other document change.
One capability supported by links in Dash is changing the state of anchors when a link is followed. In the following example, the user has created two links: one between button 1 and the text box on the right, and one between button 2 and the same text box. When each link was created, Dash saved the state (including the content and layout) of the text box so that the state can be restored when the link is followed.
This functionality is achieved with the use of Config Fobs, which specify the state that needs to be changed or restored on a Fob when the link is followed. Instead of pointing directly to the Layout Fob of the text box in the anchor_2 field, the link points to the Config Fob, which in turn points to the Layout Fob of the text box. When the link is followed, every config_<field> on the Config Fob is copied onto the corresponding <field> on the Layout Fob, thereby changing the state of the anchor. In the example below, the x/y coordinates and the data information are copied, moving the text box to the specified location and changing its content.
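The copy step can be sketched in a few lines. `applyConfig` and the sample field values are hypothetical; the only behavior taken from the text is that each config_<field> value is copied onto <field> of the target Layout Fob.

```typescript
type FieldMap = Record<string, string | number>;

// Copy every config_<field> on the Config Fob onto <field> of the Layout Fob.
function applyConfig(config: FieldMap, layout: FieldMap): void {
  const prefix = "config_";
  for (const key of Object.keys(config)) {
    if (key.indexOf(prefix) !== 0) continue; // skip fields that are not saved state
    const value = config[key];
    if (value !== undefined) layout[key.slice(prefix.length)] = value;
  }
}

const textBoxLayout: FieldMap = { x: 0, y: 0, data: "current text" };
const savedState: FieldMap = { title: "state for button 1", config_x: 200, config_y: 50, config_data: "restored text" };

applyConfig(savedState, textBoxLayout); // what following the link would do
console.log(textBoxLayout);
```

Fields on the Config Fob without the prefix, such as its own title, are left alone and never reach the Layout Fob.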
At Dash, we try to bridge the gap between different media forms and give users full control over how they create, organize, and share information. We are actively exploring novel ways to achieve this and have also been working to provide better documentation, tutorials, and examples to guide users in designing a workspace that is most suited to their needs. If you have any suggestions on how to make it better, we'd love to hear your thoughts!