This working paper proposes the development of a repository to collect references and documentation about datasets curated and created by Data Design + Art students, faculty and staff.
DATAFUGE (from data + refuge; a working title) will be a next-generation Data Portfolio developed at the HSLU. Offering a behind-the-scenes view of our data skills accelerator programs, it will be the new home of data collection efforts and experimental prototypes, complementing the visual and coding work of our students.
Our data portfolio builds on and connects to existing websites: the 2022 📘 DD+A DATA FOLIES WordPress site, which documents the final projects of students; the 📘 DataDesignandArt.ch online showreel (shown above); the Instagram social media channel; and other 📘 public references.
It should connect well with students' own websites and with modern third-party design portfolios, and be useful in next-generation Web, Metaverse, and IRL design publications. Built on global principles of data journalism and activism, DATAFUGE should support our students in running their own data gathering and exploration efforts.
The inspiration for the data portfolio comes from efforts to collect threatened and undervalued data on the Internet for humanitarian and activist purposes, such as Data Refuge, which became famous at the beginning of the Trump administration, or more recently SUCHO, in the wake of the war in Ukraine.
Data Refuge (or DataRescue) is a public and collaborative project designed to address concerns about federal climate and environmental data that is in danger of being lost.
A good data refuge initiative uses collaborative tools and easy-to-use repositories, designed to automatically track and generate data about the whole process. But how do we get this started?
A low barrier to entry, combined with hands-on activities, makes it possible to quickly fill out a 'data space' for the project, categorizing data sources according to topic, process, administrative level, etc. This is typically done in a series of short "sprints", inspired by hackathons.
Using DATAFUGE, students should be empowered to participate in Data Expeditions and hackathons, and learn to set up their own. More than just a 📘 trending way to 📘 get recruited, these public laboratories for experimentation in a social setting leverage open collaboration akin to focus groups, enable civic engagement through ethical hacking, stimulate more diversity in tech (see preprint Hevenstone et al. 2023), and use pop-up events for corporate social responsibility.
The use of Hackdays or Hackathons for social impact (rather than for performance & prizes) and to promote data activism has led to many "bootstraps": tools and data resources, capabilities to track progress of teams, documentation of results in real time from across online repositories like CKAN, GitHub and GitLab.
It is the pride and responsibility of professional data designers and artists to include references to their sources, or "facts", that are at the foundation of their generative process.
Screenshot from portfolio of Kirell Benzi
We will develop a storybook, design system, and/or set of code components, that can be used by students to reference their works online - with a reference that can be pasted onto a blog, webpage, online profile, etc. - and offline, via (short)link or QR code. The way that datasets are linked today from scientific publications can serve as a basic reference.
Screenshot of a dataset on Zenodo
These components will have a consistent visual identity and set of metadata. They need to be easy to use, and easy to read. The information must be reliable, and back the work up with authenticity and reproducibility. A good example is the set of Creative Commons badges, which have been adopted by millions of content creators around the world.
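To make the idea of a reusable reference component more concrete, here is a minimal sketch in Python. It is not an existing DATAFUGE or dribdat component: the `DatasetReference` class, its fields, and the `datafuge-ref` CSS class names are assumptions chosen for illustration. It shows how one set of metadata could yield both a plain-text reference and a snippet to paste into a webpage.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class DatasetReference:
    """Hypothetical minimal metadata for one dataset reference."""
    title: str
    creator: str
    year: int
    license: str
    url: str

    def as_citation(self) -> str:
        """Plain-text reference, e.g. for a caption or a print layout."""
        return f"{self.creator} ({self.year}). {self.title}. {self.license}. {self.url}"

    def as_html(self) -> str:
        """HTML snippet that can be pasted into a blog or profile page."""
        return (
            f'<a class="datafuge-ref" href="{escape(self.url, quote=True)}">'
            f"{escape(self.title)}</a> "
            f'<span class="datafuge-meta">{escape(self.creator)}, {self.year}, '
            f"{escape(self.license)}</span>"
        )


# Placeholder values, not a real dataset.
ref = DatasetReference(
    title="Example dataset",
    creator="A. Student",
    year=2023,
    license="CC BY 4.0",
    url="https://example.org/dataset/1",
)
print(ref.as_citation())
print(ref.as_html())
```

Keeping the metadata in one place and deriving every output format from it is what would allow the online and offline versions of a reference to stay consistent.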
Closer to home, we can build on the work of opendata.swiss, which has provided a legal and technical basis, and visual vocabulary, for discovering Open Government Data in Switzerland since 2016.
Screenshot of opendata.swiss search
This portal is based on CKAN, an open source platform developed by the open data community to assist with data publication efforts, used by thousands of institutions around the world. We are already helping to deploy CKAN for the 📘 NADIT project, and will use this opportunity to test it for our needs and contrast with alternatives.
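As a rough illustration of what testing CKAN could involve, the sketch below builds a search request for CKAN's Action API (`package_search`) and summarizes a reply. The portal address `https://ckan.example.org` and the sample reply are placeholders we made up; a real portal may return additional or differently shaped fields, so this is a starting point rather than a finished client.

```python
import json
from urllib.parse import urlencode


def package_search_url(portal: str, query: str, rows: int = 5) -> str:
    """Build a CKAN Action API search URL for a portal base URL."""
    params = urlencode({"q": query, "rows": rows})
    return f"{portal.rstrip('/')}/api/3/action/package_search?{params}"


def summarize(response_text: str) -> list:
    """Return (dataset name, sorted resource formats) pairs from a reply."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    return [
        (pkg["name"], sorted({res.get("format", "") for res in pkg.get("resources", [])}))
        for pkg in payload["result"]["results"]
    ]


# Hand-written sample reply (an assumption, not real portal output).
sample = json.dumps({
    "success": True,
    "result": {
        "count": 1,
        "results": [
            {
                "name": "air-quality-2022",
                "resources": [{"format": "CSV"}, {"format": "JSON"}, {"format": "CSV"}],
            }
        ],
    },
})

print(package_search_url("https://ckan.example.org/", "air quality"))
print(summarize(sample))
```

The example stays offline on purpose: fetching the URL with any HTTP client and passing the response body to `summarize` would be the next step in a real test.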
Inspired by Data Refuges, we aim to propose a set of tools for use as a Data Portfolio, with a focus on these five concerns:
Filesystems inevitably become messy over time. A single search engine based on the context of use will help us to quickly retrieve our data references and links when we need them.
Keeping track of sources and ownership, as well as knowing who (for example, through a student project) has experience with a particular dataset or type of data will be extremely valuable down the road.
In addition to basic metadata, we will expand on factors - from financial and legal to technical and ethical - that present barriers to the use of certain data sources. These notes record our experience and serve as a reminder to revisit and test our assumptions in the future.
Being able to see how data is being used across domains, and quickly reference and expand on these in a knowledge base, will allow us to map the landscape of data usage across topics, sources, and collection methods.
We are interested not only in how we use data ourselves; we also want to keep track of other interesting projects across academic, industrial and citizen science communities. These may be local to our region or halfway across the world. We would like to document some of the similarities and differences in our respective contexts of use.
These powerful instruments, layered on methods like Data Pipelines that structure the conversion of 'raw' data discoveries into maintainable data artifacts, make for a great learning environment for budding data wranglers. We therefore propose to build DATAFUGE on the foundation of the open source tool dribdat, which has been running in alpha form at DDA.schoolofdata.ch within DD+A since 2021 and is familiar to our students.
DRIBDAT (from "driven by data") is an open source application, inspired by the design sharing site Dribbble, and developed within the Swiss open data community. It is maintained by @loleg, DD+A lecturer and one of the top GitHub committers in Switzerland. See more history in the Whitepaper.
To better understand the context of DATAFUGE, we can look at our students' semester and final projects to see how data is referenced in their notes. We can survey them to find out which tools (Dropbox Paper, Notion, etc.) they use and which sites (Behance, Instagram, etc.) are the most popular destinations for getting inspired by other data artists. We will ask for their perspective on the topic of a data portfolio, and involve them in the R&D process.
Two Frictionless Data Packages attached to notes on dribdat.
Students of DD+A 20, 21 & 22 have already used the dribdat platform to share the progress of their work and to embed open data and Data Packages in their notes. They have provided interesting feedback and had an impact on the development of the project.
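For readers unfamiliar with Data Packages, the following sketch assembles a small descriptor of the kind stored in a `datapackage.json` file, using only the Python standard library. The helper `make_descriptor`, the file path, and all example values are assumptions for illustration; the property names (`name`, `title`, `licenses`, `sources`, `resources`) follow the Frictionless Data Package specification as we understand it.

```python
import json
import re


def make_descriptor(name, title, csv_path, source_title, source_url, license_name):
    """Assemble a minimal Data Package descriptor as a plain dictionary."""
    # Package names are expected to be lowercase and URL-friendly.
    if not re.fullmatch(r"[a-z0-9._-]+", name):
        raise ValueError(f"invalid package name: {name!r}")
    return {
        "name": name,
        "title": title,
        "licenses": [{"name": license_name}],
        "sources": [{"title": source_title, "path": source_url}],
        "resources": [{"name": f"{name}-data", "path": csv_path, "format": "csv"}],
    }


# Placeholder values, not a real student project.
descriptor = make_descriptor(
    name="tree-census",
    title="Tree census (example)",
    csv_path="data/observations.csv",
    source_title="Example city open data",
    source_url="https://example.org/trees",
    license_name="CC-BY-4.0",
)
print(json.dumps(descriptor, indent=2))
```

Saved next to the data files, such a descriptor keeps the source and licence attached to the data wherever it travels.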
Screenshots of a Data Pipeline showing class progress, and an individual student profile on dribdat.
Students need to be able to experiment with digital publication in a safe setting. They should be able to attach glimpses into their visual exploration, concepts, design and production process. It should be possible to see the data analysis go through stages. These are represented with colours and progress bars.
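The stage display can be thought of as a simple mapping from a named stage to a share of the whole process. The sketch below is an assumption-laden illustration: the stage names in `STAGES` are placeholders rather than the stages dribdat actually uses, and the text bar stands in for the coloured progress bars of the real interface.

```python
# Placeholder stage names (an assumption), listed in the order they are passed through.
STAGES = ["explore", "collect", "clean", "analyse", "present"]


def progress_percent(stage: str) -> int:
    """Share of the process completed once the given stage is reached."""
    index = STAGES.index(stage)  # raises ValueError for an unknown stage
    return round(100 * (index + 1) / len(STAGES))


def progress_bar(stage: str, width: int = 20) -> str:
    """Render a text progress bar for a stage."""
    percent = progress_percent(stage)
    filled = width * percent // 100
    return f"[{'#' * filled}{'.' * (width - filled)}] {percent:3d}% {stage}"


for stage in STAGES:
    print(progress_bar(stage))
```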
In this sense, the School of Data Pipeline as built into dribdat has now been evaluated in an activist setting and adopted in the classroom. We will consult DD+A faculty and lecturers on which methodological and pedagogical aspects are relevant to this project. By having direct access to the development of open source software, we can create a custom-tailored solution for our department.
Students tend to complain about, avoid, and procrastinate on their process logs. If the environment in which these logs are written were smarter, and actually enhanced their work in some way, students might form a better relationship with it.
Recent discussions such as 📘 Race against the machine with Prof. Dr. Peter Kels & Kai Dröge point to the urgency with which machine learning technologies are adapted to 'human' problems. What about tools for note-taking and documentation?
We asked A.I. for help with our task, and here is the suggestion:
Screenshot of OpenAI / ChatGPT
A strong data art portfolio should include a variety of projects that demonstrate the artist's technical skills, creative vision, and ability to use data in a meaningful and thought-provoking way. Some key elements that can help make a portfolio stand out include:
To what extent the originality, wit and provocativeness of a data project can be expressed in this way still needs to be defined. Questions about the ethics of the use and application of A.I. can in this way be directly addressed in the design of the platform.
In a later stage of our research, we will look into ways of complementing data publications with deep searching, made possible today using the Web of Linked Data and Deep Learning. A major present-day challenge is how to render A.I. explainable, rather than having a black-box solution.
On the one hand, this is a question of having an open source approach: working with modular components that are transparent to the user, where well-attributed open data sources are a "no brainer", so that every part of the system can be reproduced and modified. On the other hand, this means responding to a set of ethical constraints around how data is collected, whom it does (or does not) represent, and how a healthy level of privacy is respected.
In that sense, our DATAFUGE should have a kind of "Nutri-Score" that represents a subjective and objective evaluation of the constituent qualities of data ingredients. There are strong methodologies and compelling projects in this area, such as Open Data Badges and the Open Science Framework, that we can leverage. It will also be interesting to study how these indicators correlate with 📘 digital marketing data.
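One way to picture such a score is as a weighted average of a few ratings, mapped to a letter from A to E. The sketch below is purely illustrative: the criteria, weights, and thresholds are assumptions of ours, not an established methodology, and would need to be defined together with faculty and students.

```python
# Hypothetical criteria and weights (assumptions for illustration only).
WEIGHTS = {
    "license_clarity": 0.3,
    "documentation": 0.3,
    "provenance": 0.2,
    "privacy": 0.2,
}


def data_score(ratings: dict) -> float:
    """Weighted average of ratings, each given on a scale from 0.0 to 1.0."""
    missing = set(WEIGHTS) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings: {sorted(missing)}")
    for key in WEIGHTS:
        if not 0.0 <= ratings[key] <= 1.0:
            raise ValueError(f"rating for {key} must be between 0 and 1")
    return sum(WEIGHTS[key] * ratings[key] for key in WEIGHTS)


def grade(score: float) -> str:
    """Map a score to a letter, loosely in the style of a Nutri-Score label."""
    for threshold, letter in [(0.8, "A"), (0.6, "B"), (0.4, "C"), (0.2, "D")]:
        if score >= threshold:
            return letter
    return "E"


# Made-up ratings for a fictional dataset.
example = {"license_clarity": 1.0, "documentation": 0.5, "provenance": 1.0, "privacy": 0.5}
print(grade(data_score(example)))
```

Separating the numeric score from the letter grade keeps the subjective part (the ratings) visible, while the label stays easy to read at a glance.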
Screenshot of an Altmetric score
An initial prototype of this is the dribdat check it worksheet created as a classroom support at DD+A last year.
Such self-assessments could form an integral activity, and provide the basic content for the DATAFUGE.
To launch the project we define in the next section
Specific user groups to be detailed as personas:
(1) Content
(2) Interface
(3) Integration
(4) Magic
Total: 12 - 25 working days