# Narrative Principles for Data Explorers

Summary
---

**Background of method**: Many open data portals focus on targeted search. However, not everyone knows what data they are looking for, or which types of data will provide insight towards solving a problem. Data recommenders that interpret a query more 'loosely', returning a range of data for browsing that is related to the topic but does not directly answer the question, may therefore be more useful for supporting exploratory data search. Narrative principles, which identify overlap between datasets based on the potential stories that can be told across them, are one way to find connections between data. They focus on identifying overlap in the setting (time and place) of datasets, in the characters (attributes) of the data, and finally in the theme. Higher-level themes can often be derived from the tags given to datasets for thematic filtering. More nuanced themes might be found by applying natural language processing (NLP) methods to dataset descriptions and then computing the similarity of those descriptions.

Description
---

The following offers some examples of how narrative principles could be applied for finding and exploring data:

**Describe:** A data set is described by four elements: time, place, characters and themes. Key questions relate to the level of granularity of the description for all of these elements, but especially time and place. Temporal descriptions need to include when data collection started, when it ended or whether it is ongoing, and how frequently data is collected: whether it is one-off or periodic and, if periodic, what the update period is. Geographical descriptions may identify many very specific locations (e.g. locations of sensors from which data is collected periodically, specific buildings, landmarks or trees) or may refer broadly to regions without pinpointing any specific entity that the data was derived from (e.g.
demographic data that is published on a regional level, such as population density, or number of people in energy poverty). Most metadata schemes define approaches that can be used for such descriptions.

**Merge:** This refers to the act of merging the narrative descriptions of two or more data sets to create a single query. These could be data sets that belong to a story, or that have been returned by a search query and validated in some way by a user as being potentially useful.

**Query:** A search query comprises four narrative aspects: *time, place, characters* and *themes*. A user might construct a query directly, for example using keywords, filters, or even natural language. They could also use an existing data story as a starting point for a new search, if they are looking to expand on that story. In this case, the content of the story could act in a similar way to a natural language query or, if the story is directly linked to narratively described data, those individual or merged data set descriptions can themselves be a query. Similarly, a recommender could construct a query in the process of finding data sets to recommend. There are NLP methods for extracting entities and time periods from text and categorising them. The same approach could be used to broaden the search beyond data sets alone, providing additional background information as extra context for understanding the returned data, something that has been suggested could be useful.

**Search:** Searching is the process of using the query to retrieve a set of search results. The search mechanism might use the percentage of overlap across all four elements (time, place, characters and theme), which could be weighted in different ways, for example to place more emphasis on thematic similarity and less on geographical similarity.

**Overlap:** This describes the amount of narrative overlap between two data sets across the four elements of time, place, characters and theme.
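
One way the describe, search and overlap ideas above could fit together is sketched below in Python. Everything in it is an illustrative assumption rather than part of the method: the `NarrativeDescription` class and its field names are hypothetical, temporal overlap is taken as the shared fraction of the shorter collection period, place, characters and themes are compared as simple sets using the Jaccard index, and the default weights are equal.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class NarrativeDescription:
    """Hypothetical narrative description of a data set (not a published schema)."""
    start: date                  # first day of data collection
    end: date                    # last day of data collection
    places: frozenset[str]       # e.g. region names
    characters: frozenset[str]   # attributes of the data
    themes: frozenset[str]       # e.g. portal tags


def temporal_overlap(a, b):
    """Fraction of the shorter collection period that both data sets share."""
    shared_days = (min(a.end, b.end) - max(a.start, b.start)).days + 1
    if shared_days <= 0:
        return 0.0
    shorter = min((a.end - a.start).days, (b.end - b.start).days) + 1
    return shared_days / shorter


def set_overlap(x, y):
    """Jaccard index: shared items divided by all distinct items."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def narrative_overlap(a, b, weights=None):
    """Weighted mean of the four per-element overlaps, between 0 and 1."""
    weights = weights or {"time": 1.0, "place": 1.0, "characters": 1.0, "themes": 1.0}
    scores = {
        "time": temporal_overlap(a, b),
        "place": set_overlap(a.places, b.places),
        "characters": set_overlap(a.characters, b.characters),
        "themes": set_overlap(a.themes, b.themes),
    }
    return sum(weights[k] * scores[k] for k in scores) / sum(weights.values())


# Made-up example: a query description compared with one candidate data set.
query = NarrativeDescription(
    date(2020, 1, 1), date(2020, 12, 31),
    frozenset({"Region A"}), frozenset({"household"}), frozenset({"energy"}),
)
candidate = NarrativeDescription(
    date(2020, 7, 1), date(2021, 6, 30),
    frozenset({"Region A", "Region B"}), frozenset({"household", "income"}),
    frozenset({"energy", "poverty"}),
)
print(narrative_overlap(query, candidate))
```

Passing, for example, `weights={"time": 1, "place": 1, "characters": 1, "themes": 3}` gives thematic similarity more influence on the score than geography, which is the kind of weighting the Search step describes.
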
There are different methods available for measuring the spatial and temporal overlap of data, and the most appropriate may depend on the context and the types of data available. The descriptions of characters and theme can be compared using simpler measures such as cosine similarity of words.

**Proximity:** In addition to identifying data that overlaps with a search query, it can be useful to identify and recommend data sets that are proximal to the query but not directly covered by it, for example data from a neighbouring region or from just outside the time period, with no overlap. This supports the expansion of a data narrative, which could lead to new characters, areas or time periods being included in a new data story.

**Recommend**: In addition to returning data that directly matches a query, a recommender may suggest data based on its proximity to a search result. Beyond proximity of settings, characters and themes, this could also include data that is proximal by prior use, for example data sets that have commonly been used together. This is where creating additional data events based on re-use, not just raw data collection, may be useful.

These principles can be combined with the [Data Explorer Tool](/HJ2OQWGPj).

**Method originators:** Annika Wolff

**Further Reading**

[Open Data Inclusion through Narrative Approaches](https://lutpub.lut.fi/bitstream/handle/10024/164483/wolff_et_al_open_data_inclusion_aam.pdf?sequence=1&isAllowed=y)

###### tags: `datascape toolkit` `toolkit` `method`