# <center><i class="fa fa-edit"></i> Understanding the Water Dispenser Project </center>

###### tags: `Internship`

:::info
**Goal:** To gain a basic understanding of the water dispenser project and previous data analysis
- [x] Understand the Water Dispenser Project
- [x] Understand how to navigate Robo3T for raw data
- [x] Understand previous work on data analysis

**Resources:**
- [IoT裝置與維修說明書](https://hackmd.io/@RenJhang/r16FW6ZKt) (IoT device and maintenance manual)
- [Smart Dispenser Project](https://hackmd.io/@RayCheng/HJk_YpRou)
- [新興國中開始畫圖數據觀察](https://hackmd.io/@RenJhang/SJpi2dsy9) (Xinxing Junior High School: initial plots and data observations)
- [TakeWaterRawData](https://docs.google.com/spreadsheets/d/1tPkpihnQi1YP5FaADZE6aRzB-PKrUyho0BZTwjH3MN0/edit#gid=0)
:::

## Water Dispenser Project
### Overview
- IoT-enabled water dispenser system
- Purpose: to learn about users' behavior and how the dispensers are operated
- Manages, collects data from, and analyzes different types of water dispensers
- Large-scale system that adjusts parameters to optimize power consumption

### Navigating Robo3T
![](https://i.imgur.com/azJmhdU.png)
![](https://i.imgur.com/Igm8ZfT.png)

:::success
**To Sort Data in Time Order**
>`db.getCollection('raw_data').find({}).sort({"Timming":1})`

This returns every document in the `raw_data` collection sorted by the `Timming` field in ascending order (oldest first); use `-1` instead of `1` to sort newest first.
:::

### Previous Data Analysis
**General Overview**
- Graphs
    - Cumulative distribution function (CDF) plots on linear axes, with the data overlaid and confidence limits. A CDF shows how a data set accumulates as the x-axis value increases (e.g., probability vs. size, or cumulative intake over time).
    - Water quantities are measured in mL
- Graphs based on:
    - Different grade levels
    - Different weights (based on BMI)
    - Gender
    - Differences among people (?)
        - This was very vague

**GOALS**
- The effect of water temperature and dispenser location on students' drinking-water choices
- Are the water dispensers sufficient to provide students with what they need?
- Average wait time for students to collect water
- Time periods allowed for student water collection:
    * 07:30-07:45
    * 08:20-08:30
    * 09:10-09:25
    * 10:05-10:30
    * 11:10-11:25
    * 12:05-12:40
    * 13:10-13:25
    * 14:05-14:20
    * 15:00-15:25
    * 16:05-16:20
    * 17:00-17:20

**Previous Types of Graphs Drawn**

Type 1 (2021-11-09): Total Water Intake
- Water intake occurs mostly between 06:00 and 18:00

![](https://i.imgur.com/Pp8pU3Q.png)

- Location of the water dispensers:
    - The users of xinxing03, 04, and 05 are mainly students (these dispensers are near classrooms)
        - They preferred ice water on 2021-11-09
    - The users of xinxing01, 02, and 06 are mainly staff members (near the principal's office, the library and staff room, and the health center, respectively)

![](https://i.imgur.com/sTgycx7.png)

:::warning
### Questions for Graph 1
1. Does water intake time coincide exactly with school hours? If so, what is the point of this analysis? Shouldn't the analysis be more specific about which of the allowed collection periods listed above students prefer for getting water?
2. Students preferred ice water on 2021-11-09. Can this be made more specific? By gender, perhaps? Seasonal changes? Weather? Did they have PE that day? Confounding variables are not listed, and the sample size is not stated.
3. Can any generalizations be made about teachers and staff members?
:::

Type 2 (2021-11-09): Amount of Water Taken Each Time
- *Answers Q1 and Q3*
- When water intake is high, a queue is more likely to form in front of the water dispenser
- xinxing01 (Principal's Office)
    - Usually used only by the principal
    - The principal's habit is to fill up, in the morning, the amount of water he needs for the whole day. He does not return to the dispenser later in the day.
- xinxing02 (in front of the library; the staff room is to the left of the dispenser and a classroom is to the right)
    - Because of this location, its water intake times are more scattered than those of xinxing03, xinxing04, and xinxing05
- xinxing03 (classroom area, 3F, side A), xinxing04 (classroom area, 4F, including the instructor's office), xinxing05 (classroom area, 3F, side B)
    - Because these three dispensers are located in the classroom area, their water intake times are clearly concentrated around the class schedule.
    - xinxing04's area includes the instructor's office, so its water intake times are more scattered than those of xinxing03 and xinxing05.
- xinxing06 (Health Center)
    - The health center staff usually make tea around 08:30 and 14:30 and exercise after 17:00, so water intake is noticeably more frequent in these three periods.
- Time period chart
    - 13:10-13:25: because 12:35-13:10 is the school's nap time, water is collected noticeably less often than in other periods.
    - 17:00-17:20: because this is when students leave school, water is collected noticeably less often than in other periods.

:::warning
### Questions for Graph 2
1. Sample size? Still unknown.
2. Is there a way to determine who is using the dispenser other than asking about people's habits and generalizing? Users have cards, so distinguishing between students and staff might help.
3. What are after-school hours usually like?
4. Could we obtain a schedule of major school events (e.g., ceremonies, plays, performances) to help prepare for surges in water usage?
5. Has any data analysis been requested by teachers, students, staff, or the principal?
6. Have other data analysis methods been considered besides bar charts, histograms, and CDFs?
    - ANOVA
    - t-tests
    - Standard error bars
    - Line of best fit (per person, perhaps?)
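
One possible starting point for question 3 (after-school usage), and for checking which allowed period students actually use, is to bucket every draw by the collection periods listed earlier. The following is a minimal Python sketch under stated assumptions: the `Timming` values are assumed to be already parsed into `datetime` objects, a period is assumed to include its end minute, and anything after 17:20 is assumed to count as "after school". The names `COLLECTION_PERIODS`, `label_period`, and `count_by_period` are hypothetical and not part of the existing analysis.

```python
from collections import Counter
from datetime import datetime, time

# Allowed student water-collection periods (start, end), copied from the schedule in these notes.
COLLECTION_PERIODS = [
    (time(7, 30), time(7, 45)),
    (time(8, 20), time(8, 30)),
    (time(9, 10), time(9, 25)),
    (time(10, 5), time(10, 30)),
    (time(11, 10), time(11, 25)),
    (time(12, 5), time(12, 40)),
    (time(13, 10), time(13, 25)),
    (time(14, 5), time(14, 20)),
    (time(15, 0), time(15, 25)),
    (time(16, 5), time(16, 20)),
    (time(17, 0), time(17, 20)),
]


def label_period(ts: datetime) -> str:
    """Return the collection period a draw falls in (hypothetical helper)."""
    # Compare at minute resolution so that, e.g., 07:45:30 still counts as 07:30-07:45.
    t = ts.time().replace(second=0, microsecond=0)
    for start, end in COLLECTION_PERIODS:
        if start <= t <= end:
            return f"{start:%H:%M}-{end:%H:%M}"
    if t < COLLECTION_PERIODS[0][0]:
        return "before school"
    # Assumption: anything after the last period ends counts as after school.
    if t > COLLECTION_PERIODS[-1][1]:
        return "after school"
    return "outside periods"


def count_by_period(timestamps):
    """Count water draws per collection period."""
    return Counter(label_period(ts) for ts in timestamps)


if __name__ == "__main__":
    # Made-up timestamps for illustration only, not real dispenser data.
    sample = [
        datetime(2021, 11, 9, 7, 35),
        datetime(2021, 11, 9, 7, 45, 30),
        datetime(2021, 11, 9, 12, 50),
        datetime(2021, 11, 9, 18, 0),
    ]
    print(count_by_period(sample))
```

Counting by named period rather than by clock hour keeps the chart aligned with the school's actual break schedule, and the separate "after school" bucket gives a direct look at who stays late.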
:::

Type 3 (2021-11-09): Each Student's Water Intake on the Day (Based on Teacher Request)
**Version 1.0**
- *Answers Q5*
- Only registered users are counted as valid data points
- BMI categories used:

| | Category | BMI Range |
| --- | -------- | ---------- |
| 1. | Underweight | BMI < 18.5 |
| 2. | Healthy Weight | 18.5 ≦ BMI < 24 |
| 3. | Overweight | 24 ≦ BMI < 27 |
| 4. | Mildly Obese | 27 ≦ BMI < 30 |
| 5. | Moderately Obese | 30 ≦ BMI < 35 |
| 6. | Severely Obese | BMI ≧ 35 |

- CDF graphs for each category, one line per person
    - Slope = 0: no water taken
    - Slope ≠ 0: water taken
- Averages
    - 3 underweight people: about 300 mL, 400 mL, and 550 mL of water at a time, 1 to 3 times a day
    - 16 people with healthy weight: 4 take 500 mL a day, 2 take 1,000 mL a day, and 2 take 1,800 mL a day; 1 to 5 times a day
    - 2 mildly obese people: about 800 mL at a time, 1 to 2 times a day
    - 1 moderately obese person: about 800 mL at a time, once a day
- Among the female students, all except one (whose BMI was too low) were in the healthy range: they took water 1 to 5 times a day, with total daily intakes of 500, 1,200, and 2,800 mL

Type 3 (2021-11-09): Each Student's Water Intake on the Day (Based on Teacher Request)
**Version 2.0**
- Problem: too little data in the overweight and obese categories
- Solution: collapsed the 6 categories into 3:
    1. Underweight
    2. Healthy Weight
    3. Overweight
- Box-and-whisker plots

![](https://i.imgur.com/sPegXMG.png)
![](https://i.imgur.com/rLmGRgL.png)
![](https://i.imgur.com/U3CIX9H.png)
![](https://i.imgur.com/jwupaR2.png)

:::danger
### Overall Unanswered Questions
1. The sample size of some graphs is unclear: sometimes it is included in the description, other times it is missing.
2. Students preferred ice water on 2021-11-09. Can this be made more specific? By gender, perhaps? Seasonal changes? Weather? Did they have PE that day? Confounding variables are not listed, and the sample size is not stated.
3. Has any data after 2021-11-09 been analyzed?
4. Is there a way to determine who is using the dispenser other than asking about people's habits and generalizing? The raw data currently shows "CardID: Anyone". Do the cards distinguish between students and teaching staff? (This may be helpful.)
5. What are after-school hours usually like? There are significantly fewer people, so perhaps we could create a graph specifically for those who tend to stay after school and use the dispenser?
6. Could we obtain a schedule of major school events (e.g., ceremonies, plays, performances) to help prepare for surges in water usage?
7. Has any data been requested by teachers, students, staff, or the principal (other than Graph 3 on weight categories measured by BMI)?
8. How is it decided which data analyses are conducted? Is the process streamlined?
9. Have other data analysis methods been considered besides bar charts, histograms, and CDFs?

**Proposals for Future Data Analysis Methods**

If we were to conduct experiments:
- Methods for testing statistical significance, to verify the integrity of the data analysis
    - ANOVA (for comparing 3 or more groups)
    - t-tests (for comparing 2 groups)
    - Standard error bars
    - Confidence intervals?
    - Has any null hypothesis been established? Was it rejected or not rejected?

Testing cause and effect
- Line of best fit (simple linear regression, multiple linear regression, logistic regression)
    - Per person, perhaps? Or by specific requested categories (e.g., BMI, gender, grade level, staff vs. student body)

Testing correlation
- Pearson's r
:::

:::danger
### Chinese Version of Questions (English Translation)
Overall unanswered questions
1. The sample size of some graphs is unclear. Sometimes it is included in the description; sometimes it is missing.
2. Students preferred ice water on 2021-11-09. Can this be made more specific? Based on gender, perhaps? Seasonal changes? Weather? Did they have PE class, a physical fitness test, a sports day, or a similar activity that day? Confounding variables are not listed, and the sample size is not stated.
3. Has the data after 2021-11-09 been analyzed? The only data analysis I found is at https://hackmd.io/@RenJhang/SJpi2dsy9. If there is more, please let me know. Thank you!
4. Apart from asking about certain habits and generalizing, is there another way to determine who is using the water dispenser? The current raw data shows "CardID: Anyone". Do we distinguish between students and faculty/staff on the cards? (This might help.)
5. What are after-school hours usually like? There are noticeably fewer people, so perhaps we could create a graph specifically for those who tend to stay after school and use the dispenser?
6. Perhaps we could obtain a schedule of major events (performances, evening galas, ceremonies, etc.) from the school to help prepare for surges in water usage?
7. Has any data been requested by teachers? Students? Staff? The principal? (Other than Graph 3 on weight measured by BMI.)
8. What is the method for deciding how data analysis is conducted? Is there a streamlined process?
9. Besides bar charts, histograms, and CDFs, have any other data analysis methods been considered?

**Suggestions for future data analysis methods**

If we conduct experiments:
- Methods for testing statistical significance, to verify the integrity of the data analysis
    - ANOVA (for 3 or more groups)
    - t-tests (for 2 groups)
    - Standard error bars
    - Confidence intervals?
    - Has any null hypothesis been established? Was it rejected or not rejected?

Testing cause and effect
- Line of best fit (simple linear regression, multiple linear regression, logistic regression)
    - Per person, perhaps? Or by specific requested categories (e.g., BMI, gender, grade level, faculty/staff vs. student body)

Testing correlation
- Pearson's r
:::