---
tags: math615
robots: noindex, nofollow
---

# Data Architecture

By the end of this lecture students will understand that

* Data is recorded in different manners
* Spreadsheets are a common method of recording data
* Codebooks are an essential piece in learning about the data
* There is a difference between human readable and computer readable data formats

---

## Prior knowledge

:question: **What is data?**

* Gunner - any meaningful information that exists or is collected/recorded. Used to understand a topic(s)
* Matthew - Relevant information that is used to aid in answering a question
* Meghan - The resulting information from variables that have been measured or surveyed.
* Abbey - Information in different forms such as numbers that can be used to answer questions
* Evan: Data is information and information is power
* Jake - some amount of information that can be classified
* Ryan - Recorded values
* Sean - sets of information and variables which can be used to more deeply explore topics and questions
* Jay - bits of information that can be collected, stored, and manipulated to tell a story
* Eden - A collection of numbers and information that applies to them.
* Kenji - information one obtains
* Sruthi - Record values, classify, variables and better understand the data

:question: **In what manner/forms is data found?**

* Gunner - letters, numbers, ordinal, categorical, numerical, discrete, continuous
* Jake - Data can be categorical, ordinal, or integer, discrete or continuous
* Sean - data can be measurements, observations, or forms of information
* Evan: almost everything everywhere
* Jay - digital storage centers, biological senses, all sorts of types
* Abbey - numerical, categorical
* Ryan - Discrete, continuous, categorical, numerical
* Matthew - numbers, letters, observations
* Meghan - numbers or categories
* Kenji - numbers, samples, graphs
* Eden - Graphs, tables, spreadsheets
* Sruthi - data sets, categories, transcripts

:question: **How / where do you store data?**

* Jay - I have it stored on every computer that I use at this time. I also store it in Google Drive
* Kenji - Paper, hard drive, cloud
* Abbey - Files on my computer, Google Drive
* Meghan - Databases and files (.csv, .tsv, .txt) that are either locally or remotely stored on a hard drive.
* Gunner - your brain, digitally, cloud, analog
* Matthew - file types, hard drive, cloud, Google Drive
* Ryan - Specific folders on my hard drive, Google Drive
* Evan: It's all up here (taps on head)
* Jake - data can be stored in lists, spreadsheets, text, and the placement of real-world objects.
* Sean - I store data on files on my hard drive, on Google Drive, Box, Dropbox
* Eden - Excel, Google Drive, thumb drive, computer hard drive, Box
* Raquel - spreadsheet
* Sruthi - desktop folders, Drive, Box

-------

:question: **What do we use data for?**

* Kenji - support/refute hypotheses, further a topic, argument
* Raquel - to find patterns
* Evan: mostly for marketing things
* Sean - To further understand a question, to agree or disagree with hypotheses, to more deeply understand differences and similarities in variables
* Sruthi - Drive, Box, SPSS
* Jake - Seeing the world and making decisions.
* Ryan - To examine variables, their relationships, and inform questions regarding these data
* Matthew - Answer questions, prove or disprove arguments
* Gunner - To live life, answer questions, prove, disprove, categorize, average
* Abbey - To understand something or find trends / patterns. To make visualizations.
* Eden - To make charts and graphs, which show trends in the data.

# Working with data in Spreadsheets

:question: **What kind of tasks do you do in spreadsheets?**

* Jake - I use spreadsheets to record data in my fieldbook
* Evan: raw data entry and organization
* Raquel - organize information
* Kenji - Graph, plot, organize
* Abbey - organize information, make plots
* Jay - Math, organization of information
* Meghan - Conditional formatting to look for trends, sorting, summarizing data, formatting changes...
* Sean - graphing, plotting, organizing
* Matthew - Organize, plot, conditional formatting, sort
* Ryan - Organize data by value, category, etc.
* Sruthi - Organizing, uploading and saving values for different categories
* Eden - To do calculations, store data, organize data

:question: **Which tasks do you think spreadsheets are good for?**

* Jake - Recording data in an easily conceptualized space.
* Jay - Creating a space where one piece of information affects other bits of information and the changes are automatic based on the spreadsheet
* Gunner - accounting, financial data, numerical data, organizing different types of data, having nice looking columns
* Abbey - organizing data and performing calculations. Graphs and plots
* Meghan - It can be helpful for sorting data by a specific variable and using conditional formatting to visually look for trends in the data.
* Sean - organizing and cleaning up raw data
* Kenji - Plotting a graph to put in research papers
* Ryan - Visualizing raw data, sorting data, a good format to pull data from for analysis
* Evan: inputting data, basic calculations and organizing
* Sruthi - To look up information quickly for various categories
* Matthew - Organizing data, calculating raw data
* Eden - Doing calculations, organizing data, making charts and graphs.

:question: **Spreadsheet frustrations (Pain points)**
_(What have you accidentally done that made you frustrated or sad)_

* Jake - Skipped a row during data entry.
* Sruthi - Not saving the data and going back without saving
* Abbey - Graph making
* Jay - I didn't store my data and lost everything when my computer turned into a brick
* Gunner - boring, need lots of formatting, columns (sigh), Microsoft's bane to our existence, necessary for organization
* Sean - forgetting to save, not keeping organized
* Kenji - Axes flipped, points off by a decimal, wrong formula
* Evan: switching between Excel, Google Sheets, and LibreOffice Calc is painful
* Meghan - Graphing! I find graphing in Excel super frustrating and it never does what I expect it to do. It also runs super slow with large data sets...
* Ryan - Accidentally delete a whole column and go "what the heck just happened". Stare at it because I don't know what to do. Not saving.
* Matthew - Calculating lots of data is time consuming
* Eden - Glitches or has small errors that cause big problems.

## :books: Example - livestock data

Consider a study of agricultural practices among farmers in two countries in eastern sub-Saharan Africa (Mozambique and Tanzania). Researchers conducted interviews with farmers in these countries to collect data on household statistics (e.g. number of household members, number of meals eaten per day, availability of water), farming practices (e.g. water usage), and assets (e.g. number of farm plots, number of livestock). They also recorded the dates and locations of each interview.
If they were to keep track of the data like this:

![](https://datacarpentry.org/spreadsheets-socialsci/fig/multiple-info.png)

:question: **What are some of the problems with this?**

* Jake - The different livestock types could be their own variables, with each cell describing the number of animals.
* Sean - There are several repeated data points which are confusingly displayed. It isn't clear what the table is trying to say, i.e. do three people own "oxen and cows", and are they counted in just "oxen" or "cows", etc. Zero organization.
* Gunner - No descending order, title is confusing, no description of data, are numbers totals or individual amounts? Who knows? General confusion
* Kenji - There are points where oxen overlaps with another animal, a possible indication of repeated/neglected data. Not useful for analyzing individual species.
* Abbey - Data is not organized or displayed in a way that is easy to understand. Title does not make sense. It is difficult to understand what is being conveyed
* Eden - The data does not specify one specific number for one specific type of livestock. There is too much room to misunderstand how many of what animal is there.
* Ryan - Livestock appear to be collected under the same numerical value, rather than a sum. Any analysis done regarding quantities of specific livestock types won't be possible. No way to know if each line is a specific farmer, configuration of livestock, etc. I.e. are there 2 farmers that have a certain amount of oxen and goats, and 3 with a different amount of oxen and goats? Does one farmer have 2 animals and one have 3?
* Meghan - The number of each type of animal cannot be summarized well since the values such as "10 (oxens, cows, goats, poultry)" do not specify how many of which animal were surveyed.
* Matthew - The numbers and livestock are within the same cell. Lots of information is missing. For multiple species the total number is not separated.
* Jay - Each unique bit of information should be in its own column.
It is difficult to determine what information this table is trying to convey.
* Evan: Multiple variables are mixed in the same column, so segregating them is necessary prior to analyzing in software. Moreover, info is lost due to recording only the total livestock and not the count of each animal.
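
:bulb: **Sketch: one column per livestock type**

Several of the answers suggest giving each livestock type its own column, with one row per household. The Python sketch below shows one way that layout could look and why it is easier for a computer to read. The column names, the `household_id` field, and all the counts are made up for illustration; they are not taken from the study data.

```python
import csv
import io

# Hypothetical data: one row per household, one column per livestock type.
# The counts are invented for illustration and are not from the study.
TIDY_CSV = """household_id,oxen,cows,goats,poultry
1,2,0,3,0
2,0,1,0,5
3,1,1,2,4
"""


def read_households(text):
    """Parse the CSV text into a list of dicts with integer values."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({key: int(value) for key, value in row.items()})
    return rows


def total_per_animal(rows):
    """Sum each livestock column across all households."""
    animals = [key for key in rows[0] if key != "household_id"]
    return {animal: sum(row[animal] for row in rows) for animal in animals}


def households_owning(rows, animal):
    """Count the households that own at least one of the given animal."""
    return sum(1 for row in rows if row[animal] > 0)


households = read_households(TIDY_CSV)
totals = total_per_animal(households)
print(totals)
print(households_owning(households, "oxen"))
```

With this layout, questions such as "how many goats in total?" or "how many households own oxen?" each become a single pass over one column. Neither question can be answered from a cell like "10 (oxens, cows, goats, poultry)".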