# **CUHK-STAT1013-Proj-B**
---
>name: Una, Shengyuan Zhong
>
>student ID: 1155156303
>
>email : <1155156303@link.cuhk.edu.hk>
---
## Table of contents
1. [Introduction](#Introduction)
2. [Graphs and Descriptive Statistics](#Graphs-and-Descriptive-Statistics)
3. [Verifying Necessary Data Conditions](#Verifying-Necessary-Data-Conditions)
4. [Hypothesis Test](#Hypothesis-Test)
5. [Conclusion and Summary](#Conclusion-and-Summary)
## Introduction
- **How did you come up with the idea?**
>I already worked with this dataset in Proj-A, so I understand it fairly well, and I am very curious about the opening prices in some markets.
- **What are your hypotheses?**
>- **Sample size: 251**
>- **Feature documentation:**
> 
>- **Hypotheses: The data cover February 2022 to February 2023, with a total of 251 observations recorded. The average opening price is $\bar{x}$ = 120.37 with sample standard deviation $\hat{\sigma}$ = 23.27. Do these statistics contradict the belief that the average opening price is 121?**
>
>- ==**Based on the 251 observations, with $\bar{x}$ = 120.37 and $\hat{\sigma}$ = 23.27, the null and alternative hypotheses are:**==
> ==**H~0~: μ = 121**==
> ==**H~1~: μ ≠ 121**==
- **What is the reason for your hypotheses?**
> Because $\bar{x}$ = 120.37 and $\hat{\sigma}$ = 23.27, a hypothesized value of 121, which is close to the sample mean, seems reasonable.
- **How did you gather your data?**
>The data were downloaded from [Yahoo! Finance](https://finance.yahoo.com/).
## Graphs-and-Descriptive-Statistics
- **two appropriate graphs**
>**Boxplot**
>
>
>**Violinplot**
>
- **summary statistics**
>The minimum is around 80, the maximum is around 180, and most values fall between roughly 100 and 140.
- **Similarities and differences between the samples are discussed**
>There is only one sample, so no comparison between samples is possible. For this single sample, the overall distribution looks reasonable.
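
The rough minimum, maximum, and central range described above correspond to the five-number summary that a boxplot draws. The following is a minimal sketch of how such a summary could be computed with the Python standard library. The function name `five_number_summary` and the values in `example_open` are hypothetical and made up for illustration; they are not the project data.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    ordered = sorted(values)
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


# Hypothetical opening prices, made up for illustration only.
example_open = [80.0, 100.0, 110.0, 120.0, 130.0, 140.0, 180.0]
print(five_number_summary(example_open))
```

For the real data, the same function could be applied to the `Open` column after loading the CSV file, for example with `five_number_summary(df['Open'].tolist())`.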
## Verifying-Necessary-Data-Conditions
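>The test statistic for a one-sample t-test is $T = \dfrac{\bar{x} - \mu_0}{\hat{\sigma} / \sqrt{n}}$, that is, (sample mean - hypothesized value) / (standard error of the sample mean). With n = 251 observations the sample is large, so the sample mean is approximately normally distributed and the t-test is reasonable.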
## Hypothesis-Test
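>- test statistic is: -0.429
>- Python command
>```python=
>import numpy as np
>import pandas as pd
>
># Load the price data; the 'Open' column holds the daily opening prices.
>df = pd.read_csv('una-stat1013 data 2.csv')
>
>print('sample mean of Open')
>print(df['Open'].mean())
>
>print('---')
>
>print('sample median of Open')
>print(df['Open'].median())
>
>print('---')
>
>print('sample std of Open')
>print(df['Open'].std())
>
># One-sample t statistic: (sample mean - hypothesized mean) / (s / sqrt(n)),
># using the rounded summary statistics and the sample size n = 251.
>n = 251
>t_value = (120.37 - 121) / (23.27 / np.sqrt(n))
>print('test statistic is: %.3f' % t_value)
>```
>- p-value is approximately 0.668 (two-sided, 250 degrees of freedom)
>- A p-value of about 0.668 means that, if H~0~ were true, a test statistic at least this extreme would occur about 67% of the time. This is far above common significance levels such as 0.05, so the data give no evidence against the null hypothesis.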
>- Fail to reject H~0~. The result is not statistically significant, so the data do not contradict the belief that the average opening price is 121. The result also has limited practical significance, because financial markets change in response to external events, such as the epidemic, that statistics cannot predict.
>- Possible error: Type II Error (failing to reject H~0~ when it is actually false), since H~0~ was not rejected.
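
The test statistic and p-value in this section can be cross-checked from the summary statistics alone. The following is a minimal sketch, not the command used for the report. The function name `one_sample_test` is hypothetical, and the sketch replaces the t distribution with a standard normal approximation, an assumption that is reasonable here only because n = 251 is large.

```python
import math
from statistics import NormalDist


def one_sample_test(sample_mean, hypothesized_mean, sample_sd, n):
    """Return (test statistic, approximate two-sided p-value)."""
    standard_error = sample_sd / math.sqrt(n)
    statistic = (sample_mean - hypothesized_mean) / standard_error
    # Assumption: normal approximation to the t distribution (large n).
    p_value = 2 * (1 - NormalDist().cdf(abs(statistic)))
    return statistic, p_value


# Summary statistics taken from the report: mean, hypothesized mean, sd, n.
statistic, p_value = one_sample_test(120.37, 121, 23.27, 251)
print('test statistic is: %.3f' % statistic)
print('approximate two-sided p-value is: %.3f' % p_value)
```

The normal approximation gives a slightly smaller p-value than the exact t distribution with 250 degrees of freedom, but the decision not to reject H~0~ is the same either way.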
## Conclusion-and-Summary
>First of all, I think my example is somewhat limited. It covers only a single year of data, which is too short to be very meaningful, and not everyone plays this game. I chose this example because I enjoy playing online games a lot, so I wanted to know the opening price of Nintendo and how it developed afterwards. Collecting the data took a long time, because several websites I found during my search did not offer any free data. During the investigation and analysis I found that the calculations and assumptions are rather repetitive, and that the risk of error is fairly high because the amount of data is relatively small. If the topic of this project could be changed and its resources were unrestricted, I would like to compare how two specific goods developed during the epidemic.