# (0) Introduction

###### tags: `session-2-chunkified`

## Welcome back to Exploring the World with Maps and Data!

#### In this session, we will
* dive deeper into different types of data
* introduce projections
* go over the concept of joining spatial and non-spatial data
* review the different file types you might encounter when making a map
* have a closer look at the LMEC Public Data Portal and learn about metadata

In the last session, we
* introduced the basics of maps and data
* discussed the differences between maps and the data that underlie them
* discussed the conscious and unconscious biases ingrained in the maps we view and create

## Human bias in data

* What gets lost in translation between spreadsheet and map?
* Distortion actually begins with data collection (and lack thereof)
* Some helpful vocabulary for identifying this distortion:
    * **Response bias**
    * **Non-response bias**
    * **Sample size**
    * **Survey design (e.g. leading questions)**
    * **Missing data**
        * i.e. the information that we fail to record (more to come in session 3)

Any mode of data collection is imperfect; it is an attempt to record phenomena in our world. Let's learn more about methods of data collection.

<aside>

Example: Covid-19 testing capacity

Early in the pandemic, a lack of testing infrastructure clouded our understanding of community spread and transmission. Fluctuations in testing capacity and infrastructure affect the accuracy of the data.

</aside>

<Hideable title = 'On your own time'>

## Human bias in data

In this session, we will dive deeper into different types of geospatial data and review the different file types you might encounter when making a map. We’ll also have a closer look at the LMEC Public Data Portal and learn how to recognize changes that can be made to datasets. Last session, we discussed the conscious and unconscious biases ingrained in the maps we view and create.
Somewhere between data spreadsheet and beautiful cartographic result, something gets lost in translation. This type of distortion is not limited to the mapping process; it starts with data collection (and lack thereof). While a spreadsheet detailing statistics for different parts of a city might look unassuming, there is a robust, possibly seedy, story behind the numbers themselves.

An example we know (and don't love) is Covid-19. Early in the pandemic, a lack of testing infrastructure clouded our understanding of community spread and transmission. While this problem has not disappeared, conditions have certainly improved. Fluctuations in testing capacity and infrastructure, though, affect the accuracy of the data and can make it appear that community spread has changed when, in reality, the change reflects how much testing was done.

While this is a particularly relevant example of bias, a journey into statistics can equip us with a toolkit for identifying potential issues with data. **Response bias**, **non-response bias**, and **small sample size** are among the potential statistical pitfalls in our data collection journey. Survey design can also influence data. If a researcher phrases questions in a leading manner or fails to ask about certain information, the results of a study can be substantially altered. In cases like these, the data collected do not accurately represent the reality on the ground.

We should also stay aware of **missing data**, or the information that has no designated recording channel. Yet another form of power imposition comes with what we fail to document, and these omissions are often used to perpetuate existing channels of oppression. What do surveyors not even think to record? How is this lack of awareness created by their ingrained biases? We will talk more about missing data in session three.

Let's dive into some of the technical aspects of data collection and storage to understand more about these influences.
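
To make the testing example above a little more concrete, here is a minimal Python sketch. Every number in it is made up for illustration (it is not real Covid-19 data), and `reported_cases` is a hypothetical helper. It also assumes, as a deliberate simplification, that the share of tests coming back positive stays the same from week to week while testing capacity grows.

```python
# Hypothetical numbers for illustration only; not real Covid-19 data.

def reported_cases(tests_run, positivity_rate):
    """Expected number of positive results from a given number of tests."""
    return round(tests_run * positivity_rate)

# Assumption: the share of tests that come back positive does not change.
positivity_rate = 0.05

# Assumption: testing capacity doubles each week.
weekly_tests = [1_000, 2_000, 4_000]

weekly_cases = [reported_cases(tests, positivity_rate) for tests in weekly_tests]

for week, (tests, cases) in enumerate(zip(weekly_tests, weekly_cases), start=1):
    print(f"Week {week}: {tests} tests run, {cases} cases reported")
```

In this sketch the reported case count grows each week even though the positivity rate never changes. A map built only from the case counts could suggest rising community spread, when the difference comes from how many tests were run.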
</Hideable>