D A T A F U G E

DATA PORTFOLIO FOR DD+A

This working paper proposes the development of a repository to collect references and documentation about datasets curated and created by Data Design + Art students, faculty and staff.

Introduction

DATAFUGE (from data + refuge; a working title) will be a next-generation Data Portfolio developed at HSLU. Behind the scenes of our data-skills programs, it will be the new home of data collection efforts and experimental prototypes, complementing the visual and coding work of our students.

Our data portfolio builds on and connects to existing websites such as the 2022 📘 DD+A DATA FOLIES WordPress site, which documents the final projects of students, the 📘 DataDesignandArt.ch online showreel, the Instagram social media channel, and other 📘 public references.

It should connect well with students' own websites and with modern third-party design portfolios, and remain useful in next-generation Web, Metaverse, and IRL design publications. Built on global principles of data journalism and activism, DATAFUGE should support our students in running their own data gathering and exploration efforts.

What is (a) data refuge?

Our inspiration for the data portfolio comes from efforts to collect threatened and undervalued data on the Internet for humanitarian and activist purposes, such as Data Refuge, which became famous at the beginning of the Trump administration, or more recently SUCHO, in the wake of the war in Ukraine.

Data Refuge (or DataRescue) is a public and collaborative project designed to address concerns about federal climate and environmental data that is in danger of being lost.

Wikipedia

Project-based exploration

A good data refuge initiative uses collaborative tools and easy-to-use repositories, designed to automatically track and generate data about the whole process. But how do we get this started?

A low barrier to entry, combined with hands-on activities in short sprints inspired by hackathons, makes it possible to quickly fill out a 'data space' for the project, categorizing data sources according to topic, process, administrative level, and so on.

Using DATAFUGE, students should be empowered to participate in Data Expeditions and hackathons, and learn to set up their own. More than just a 📘 trending way to 📘 get recruited, these public laboratories for experimentation in a social setting leverage open collaboration akin to focus groups, enable civic engagement through ethical hacking, stimulate more diversity in tech (see Hevenstone et al. 2023, preprint), and use pop-up events for corporate social responsibility.

The use of Hackdays or Hackathons for social impact (rather than for performance and prizes) and to promote data activism has produced many "bootstraps": tools and data resources, capabilities for tracking the progress of teams, and real-time documentation of results across online repositories like CKAN, GitHub and GitLab.

What is (a) data portfolio?

It is the pride and responsibility of professional data designers and artists to include references to their sources, or "facts", that are at the foundation of their generative process.

Screenshot from portfolio of Kirell Benzi

We will develop a storybook, design system, and/or set of code components that students can use to reference their works online - with a reference that can be pasted into a blog, webpage, or online profile - and offline, via (short)link or QR code. The way datasets are linked today from scientific publications can serve as a basic reference.
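As a rough sketch of what such a pasteable component could emit - the field names, markup, and example URL here are illustrative assumptions, not a finalized DATAFUGE schema - a reference could be rendered from a handful of metadata fields:

```python
# Illustrative sketch: render a pasteable HTML reference for a dataset.
# The CSS class, fields, and example URL are assumptions for this sketch.

def render_reference(title: str, author: str, year: int,
                     url: str, license: str = "CC BY 4.0") -> str:
    """Return an HTML snippet that can be pasted into a blog or profile."""
    return (
        f'<blockquote class="data-reference">'
        f'<a href="{url}">{title}</a> '
        f'({author}, {year}) &middot; {license}'
        f'</blockquote>'
    )

snippet = render_reference(
    "Lucerne Air Quality 2022", "DD+A Student", 2022,
    "https://example.org/dataset/lucerne-air-2022")
print(snippet)
```

The same metadata could just as well be rendered as a plain-text citation for offline use, with the URL replaced by a shortlink or QR code.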

Screenshot of a dataset on Zenodo

These components will have a consistent visual identity and set of metadata. They need to be easy to use and easy to read. The information must be reliable, backing the work up with authenticity and reproducibility. A good example is the set of Creative Commons badges, which have been adopted by millions of content creators around the world.

Closer to home, we can build on the work of opendata.swiss, which has provided a legal and technical basis, and visual vocabulary, for discovering Open Government Data in Switzerland since 2016.

Screenshot of opendata.swiss search

This portal is based on CKAN, an open source platform developed by the open data community to assist with data publication efforts, used by thousands of institutions around the world. We are already helping to deploy CKAN for the 📘 NADIT project, and will use this opportunity to test it for our needs and compare it with alternatives.
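For reference, CKAN exposes a JSON Action API, and the search behind the portal's search box can be reached at `/api/3/action/package_search`. A minimal sketch of building such a query and reading its result - the portal URL is only an example, and the response shown is a trimmed-down sample:

```python
# Sketch of querying a CKAN portal via its Action API (package_search).
# The endpoint path is standard CKAN; the portal URL is an example.
from urllib.parse import urlencode

def package_search_url(portal: str, query: str, rows: int = 10) -> str:
    """Build a CKAN Action API search URL."""
    params = urlencode({"q": query, "rows": rows})
    return f"{portal}/api/3/action/package_search?{params}"

def dataset_titles(response: dict) -> list:
    """Extract dataset titles from a package_search JSON response."""
    return [pkg["title"] for pkg in response["result"]["results"]]

url = package_search_url("https://opendata.swiss", "luzern")
# A trimmed-down example of the JSON such a call returns:
sample = {"success": True,
          "result": {"count": 1,
                     "results": [{"name": "luzern-demo",
                                  "title": "Luzern demo"}]}}
print(url)
print(dataset_titles(sample))
```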

What concerns us

Inspired by Data Refuges, we aim to propose a set of tools for use as a Data Portfolio, with a focus on these five concerns:

1. How to make sure we don't lose our data?

Filesystems inevitably become messy over time. A single search engine based on the context of use will help us to quickly retrieve our data references and links when we need them.
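As a minimal illustration of the idea - not a proposed implementation - even a tiny inverted index over dataset descriptions makes references retrievable by keyword:

```python
# Minimal sketch of keyword retrieval over dataset references:
# an inverted index from lower-cased words to dataset identifiers.
# The records below are made-up examples.
from collections import defaultdict

def build_index(records: dict) -> dict:
    """Map each word in a record's description to the set of record ids."""
    index = defaultdict(set)
    for rec_id, description in records.items():
        for word in description.lower().split():
            index[word].add(rec_id)
    return index

records = {
    "air-2022": "Lucerne air quality measurements",
    "bike-2021": "Shared bike trips in Lucerne",
}
index = build_index(records)
print(sorted(index["lucerne"]))
```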

2. Who is who in collecting, managing, applying data?

Keeping track of sources and ownership, as well as knowing who (for example, through a student project) has experience with a particular dataset or type of data will be extremely valuable down the road.

3. What factors impact the accessibility of data?

In addition to basic metadata, we will expand on factors - from financial and legal to technical and ethical - that present barriers to the use of certain data sources, capturing our experience and serving as a reminder to revisit and test our assumptions in the future.

4. Which projects and topics are linked to what data?

Being able to see how data is being used across domains, and quickly reference and expand on these in a knowledge base, will allow us to map the landscape of data usage across topics, sources, and collection methods.

5. Which datasets are of value to other communities, and why?

It is not only how we use data that interests us: we also want to keep track of other interesting projects across academic, industrial and citizen science communities. These may be local to our region or halfway across the world. We would like to document some of the similarities and differences in our respective contexts of use.

Hello, dribdat

These powerful instruments, on top of methods like Data Pipelines that structure the conversion of 'raw' data discoveries into maintainable data artifacts, make for a great learning environment for budding data wranglers. We therefore propose to build DATAFUGE on the foundation of the open source tool dribdat, which has been running in alpha form at DDA.schoolofdata.ch since 2021 within DD+A and is familiar to our students.

DRIBDAT (from "driven by data") is an open source application, inspired by the design sharing site Dribbble, and developed within the Swiss open data community. It is maintained by @loleg, DD+A lecturer and one of the top GitHub committers in Switzerland. See more history in the Whitepaper.

Initial experiments

To better understand the context of DATAFUGE, we can look at our students' semester and final projects to see how data is referenced in their notes. We can survey them to find out which tools (Dropbox Paper, Notion, etc.) and sites (Behance, Instagram, etc.) are the most popular destinations for getting inspired by other data artists. We will ask for their perspective on the topic of a data portfolio, and involve them in the R&D process.

Two Frictionless Data Packages attached to notes on dribdat.

Students of DD+A 20, 21 & 22 have already used the dribdat platform to share the progress of their work and to embed open data and Data Packages in their notes. They have provided interesting feedback and had an impact on the development of the project.

Screenshots of a Data Pipeline showing class progress, and an individual student profile on dribdat.

Students need to be able to experiment with digital publication in a safe setting. They should be able to attach glimpses into their visual exploration, concepts, design and production process. It should be possible to see the data analysis go through stages. These are represented with colours and progress bars.
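As an illustration of how analysis stages can map to colours and progress bars - the stage names, percentages, and colours here are made up for the sketch and are not dribdat's actual configuration:

```python
# Sketch: map hypothetical analysis stages to a progress value and colour,
# and render a simple text progress bar. All names and values are
# illustrative assumptions, not dribdat's real stage rules.
STAGES = [
    ("Idea",      10, "grey"),
    ("Research",  30, "blue"),
    ("Sketch",    50, "yellow"),
    ("Prototype", 80, "orange"),
    ("Launch",   100, "green"),
]

def progress_bar(stage_name: str, width: int = 20) -> str:
    """Render a text progress bar for the given stage."""
    for name, percent, colour in STAGES:
        if name == stage_name:
            filled = percent * width // 100
            return f"[{'#' * filled}{'-' * (width - filled)}] {percent}% ({colour})"
    raise ValueError(f"unknown stage: {stage_name}")

print(progress_bar("Sketch"))
```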

In this sense, the School of Data Pipeline built into dribdat has now been evaluated in both an activist setting and in the classroom. We will discuss with DD+A faculty and lecturers to get their opinions on which methodological and pedagogical aspects are relevant to this project. By having direct access to the development of the open source software, we can create a custom-tailored solution for our department.

Are you for Real?

Students tend to complain about, avoid, and procrastinate on their process logs. If the environment in which these logs are developed were smarter - if it actually enhanced their work in some way - students might form a better relationship with it.

Recent discussions such as 📘 Race against the machine with Prof. Dr. Peter Kels & Kai Dröge point to the urgency with which machine learning technologies are being adapted to 'human' problems. What about tools for note-taking and documentation?

  • You could say that Dropbox Paper is usable.
  • You could say that Notion is flexible.
  • But what tool today gives you superpowers?

We asked A.I. for help with our task, and here is the suggestion:

Screenshot of OpenAI / ChatGPT

A strong data art portfolio should include a variety of projects that demonstrate the artist's technical skills, creative vision, and ability to use data in a meaningful and thought-provoking way. Some key elements that can help make a portfolio stand out include:

  • Diverse projects: Showcase a range of projects that demonstrate your ability to work with different types of data and create a variety of data-driven visualizations and interactive experiences.
  • Technical skills: Show that you have a strong understanding of the technical tools and methods used to create data art, such as programming languages, data visualization libraries, and machine learning techniques.
  • Creativity: Showcase your ability to use data in a way that is both visually interesting and meaningful. Highlight projects that explore unique perspectives and push the boundaries of what's possible with data.
  • Attention to detail: Make sure that your portfolio is well-organized and easy to navigate, with clear explanations of the data and techniques used in each project.
  • Ethical considerations: Be transparent about any biases or limitations in your data, and highlight projects that explore ethical and social issues related to the use of data.
  • Storytelling: Showcase your ability to tell a story with your data and how it can be used to convey a message or a meaning.

To what extent the originality, wit and provocativeness of a data project can be expressed in this way still needs to be defined. Questions about the ethics of the use and application of A.I. can thus be directly addressed in the design of the platform.

X-A.I.

In a later stage of our research, we will look into ways of complementing data publications with deep searching, made possible today using the Web of Linked Data and Deep Learning. A major present-day challenge is how to render A.I. explainable, rather than having a black-box solution.

IBM XAI Toolkit

On one hand, this is a question of having an open source approach: working with modular components that are transparent to the user - with well-attributed open data sources as a "no-brainer" - so that every part of the system can be reproduced and modified. On the other hand, it means responding to a set of ethical constraints around how data is collected, whom it does (or does not) represent, and how a healthy level of privacy is respected.

In that sense, our DATAFUGE should have a kind of "Nutri-Score" that represents a subjective and objective evaluation of the constituent qualities of data ingredients. There are strong methodologies and compelling projects in this area, such as Open Data Badges and the Open Science Framework, that we can leverage. It will also be interesting to study how these indicators correlate with 📘 digital marketing data.

Screenshot of an Altmetric score

An initial prototype of this is the dribdat check it worksheet created as a classroom support at DD+A last year.

Such self-assessments could form an integral activity, and provide the basic content for DATAFUGE.
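A minimal sketch of how such a self-assessment could be rolled up into a "Nutri-Score"-style grade; the four indicators and their equal weighting are illustrative assumptions, not an established methodology:

```python
# Hedged sketch: grade a dataset A-E from four yes/no quality indicators.
# The choice of indicators and the equal weighting are assumptions made
# for this sketch only.

def data_score(has_license: bool, has_source: bool,
               machine_readable: bool, documented: bool) -> str:
    """Return a letter grade: 0 indicators met -> E, all 4 met -> A."""
    points = sum([has_license, has_source, machine_readable, documented])
    return "EDCBA"[points]

print(data_score(True, True, True, True))    # all indicators met
print(data_score(True, False, False, True))  # partially documented
```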

Resourcing

To launch the project, we define in the following sections:

  • Target audience
  • Milestones and Time budget
  • Cloud and Tech budget
  • Deliverables

Target audience

Specific user groups to be detailed as personas:

  • Students
  • Teachers
  • Staff
  • Visitors
  • Public

Milestones

(1) Content

  • Create a DD+A theme for the platform
  • Set up project template for our students
  • Improve and expand the resource library
  • Customize and tweak the stage rules

(2) Interface

  • Simplify updating of project logs with drag and drop
  • Use notifications to push advice and reminders
  • Implement gamification elements and progress reports
  • Wrap into a mobile app for easier access by students

(3) Integration

  • Create an easy access for staff to update the site
  • Write documentation for fellow lecturers and staff
  • Record screencast, run a training session
  • Install with official IT support

(4) Magic

  • Run a hackathon to develop further improvements in-situ
  • Evaluate project logs with NLP for structural analysis
  • Extrapolate content with predictive AI like ChatGPT
  • Futuristic portrayal of a data portfolio in the year 2050

Time budget

  1. Content 3 - 5 days
  2. Interface 4 - 7 days
  3. Integration 2 - 5 days
  4. Magic 3 - 8 days

Total: 12 - 25 working days

Cloud and tech budget

  • Hosting a WordPress website on HSLU infrastructure is free, but we may be limited in the kinds of plugins and themes we can install.
  • A typical small website and domain hosting package is around 100 CHF per year.
  • Dribdat requires a small server and set of cloud services at about 500 CHF per year for a commercial service. It remains to be defined if this could be deployed on HSLU infrastructure, and at what cost.
  • Deploying a CKAN instance requires commercial server infrastructure for at least 1000 CHF per year. Lower cost alternatives like JKAN or Datacentral should be considered.
  • A cloud data science environment for training GPT models could have a high hourly cost and needs to be appropriately budgeted. Offline hardware (graphics workstations) or university infrastructure (datascience.ch) can be used instead.
  • For other collaboration needs free open source code projects and planning boards will be sufficient.

Further reading

Inspiration

CC BY 4.0