# Machine Learning Dataset

## Kaggle Competitions

Kaggle is currently the largest competition platform for artificial intelligence and data science. Google acquired it in 2017.

| Title | Task | Article | Solution | Notes |
|---|---|---|---|---|
| [Bosch Production Line Performance](https://www.kaggle.com/c/bosch-production-line-performance) | | | | Improve production line performance |
| [Mercedes-Benz Greener Manufacturing](https://www.kaggle.com/c/mercedes-benz-greener-manufacturing/overview) | | | | Speed up the Mercedes-Benz testing process |
| [PAKDD 2014 - ASUS Malfunctional Components Prediction](https://www.kaggle.com/c/pakdd-cup-2014) | | | | Predict malfunctioning components in ASUS laptops |
| [wsdm - kkbox music recommendation challenge](https://www.kaggle.com/c/kkbox-music-recommendation-challenge/overview) | | | | KKBox music recommendation |
| [Santa Gift Matching Challenge](https://www.kaggle.com/c/santa-gift-matching) | | | | Santa gift matching |
| [Camera Model Identification](https://www.kaggle.com/c/sp-society-camera-model-identification) | | | | Camera model identification |
| [Passenger Screening Algorithm Challenge](https://www.kaggle.com/c/passenger-screening-algorithm-challenge) | | | | Airport passenger screening competition |
| [RSNA Pneumonia Detection Challenge](https://www.kaggle.com/c/rsna-pneumonia-detection-challenge) | | | | Pneumonia detection competition |
| [Udemy Course Enrollment Information](https://www.kaggle.com/datasets/songseungwon/2020-udemy-courses-dataset) | | | | |

## Images

| Title | Article | Solution | Instances | Label | Notes |
|---|---|---|---|---|---|
| [Casting manufacturing](https://www.kaggle.com/ravirajsinh45/real-life-industrial-dataset-of-casting-product) | | | 7348 | 2 | 512x512 grayscale |
| [Maize leaf disease](https://github.com/dtunnicliffe/Maize-Leaf-Disease_CNN) | | [Kaggle](https://www.kaggle.com/datasets/smaranjitghose/corn-or-maize-leaf-disease-dataset) | 4118 | 3 | 1000x585 RGB |

## Machine Learning Repository

- [UC Irvine Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php)
- [OpenML](https://www.openml.org/home)

| Title | Article | Solution | Task | Instances | Attributes | Notes |
|---|---|---|---|---|---|---|
| [Car fuel efficiency prediction](https://archive.ics.uci.edu/ml/datasets/Auto+MPG) | | | Regression | 398 | 8 | [Car fuel efficiency prediction](https://tellyouwhat.cn/p/machine-learning-tensorflow-keras-predicts-car-fuel-efficiency-the-regression-problem/) |
| [Bank term deposit subscription prediction](https://archive.ics.uci.edu/ml/datasets/bank+marketing) | | | Classification | 45211 | 20 | Bank marketing dataset: whether a client will subscribe to the bank's product (a term deposit). |
| [Bank customer churn prediction](https://www.kaggle.com/competitions/playground-series-s4e1/overview) | | | Classification | 165000 | 13 | Predict whether a customer is about to churn |
| [Salary vs. work experience](https://github.com/SteffiPeTaffy/machineLearningAZ/blob/master/Machine%20Learning%20A-Z%20Template%20Folder/Part%202%20-%20Regression/Section%204%20-%20Simple%20Linear%20Regression/Salary_Data.csv) | [Article](https://www.kaggle.com/datasets/harsh45/random-salary-data-of-employes-age-wise) | | Regression | 30 | 1 | Relationship between salary and work experience |
| [Salary Prediction dataset](https://www.kaggle.com/datasets/rkiattisak/salaly-prediction-for-beginer/data) | | | Regression | 373 | 4 | |
| [Company revenue prediction](https://github.com/SteffiPeTaffy/machineLearningAZ/blob/master/Machine%20Learning%20A-Z%20Template%20Folder/Part%202%20-%20Regression/Section%205%20-%20Multiple%20Linear%20Regression/50_Startups.csv) | [Startups profit prediction](https://www.analyticsvidhya.com/blog/2021/11/startups-profit-prediction-using-multiple-linear-regression/) | [Kaggle](https://www.kaggle.com/datasets/karthickveerakumar/startup-logistic-regression/code) | Regression | 50 | 4 | Predict startup revenue |
| [Bike sharing demand prediction](https://www.kaggle.com/c/bike-sharing-demand) | | | Regression | 10886 | 11 | |
| [wine classification (sklearn)](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_wine.html) | | | Classification | 178 | 13 | 3 classes |
| [Wine Quality](https://archive.ics.uci.edu/dataset/186/wine+quality) | | | Regression | 4898 | 11 | Quality score from 1 to 10 |
| [Diabetes progression prediction (sklearn)](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html) | | | Regression | 442 | 10 | Disease progression in diabetes patients one year later |
| [Diabetes onset prediction](https://www.kaggle.com/datasets/mathchi/diabetes-data-set) | | | Classification | 768 | 8 | Binary classification |
| [CDC Diabetes Health Indicators](https://archive.ics.uci.edu/dataset/891/cdc+diabetes+health+indicators) | | | Classification | 253680 | 21 | Binary classification |
| [Breast cancer prediction](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html#sklearn.datasets.load_breast_cancer) | | | Classification | 569 | 30 | |
| [Breast cancer prediction, OVA Breast](https://www.openml.org/search?type=data&sort=runs&id=1128) | | | Classification | 1159 | 10934 | Whether the sample is breast cancer. Breast cancer: 258; no breast cancer: 901 |
| [Forest cover type prediction](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_covtype.html#sklearn.datasets.fetch_covtype) | | | Classification | 581012 | 54 | 7 classes |
| [California housing price prediction](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_california_housing.html#sklearn.datasets.fetch_california_housing) | | | Regression | 20640 | 8 | |
| [Diamond price prediction](https://www.kaggle.com/datasets/shivam2503/diamonds) | [Article](https://medium.com/@a0922335447/%E9%91%BD%E7%9F%B3%E7%9A%84%E5%83%B9%E6%A0%BC%E9%A0%90%E6%B8%AC-b38e7bfbda00) | | Regression | 53940 | 10 | |
| [Sales forecasting (Rossmann stores)](https://www.kaggle.com/c/rossmann-store-sales) | [Article](https://www.showmeai.tech/article-detail/206) | | | 1017209 | 27 | |
| [Student Marks](https://www.kaggle.com/datasets/yasserh/student-marks-dataset) | [Article (SVM)](https://medium.com/@niousha.rf/support-vector-regressor-theory-and-coding-exercise-in-python-ca6a7dfda927) | | Regression | 100 | 2 | Predict marks from study time; well suited to SVR |
| [Higher Education Students Performance Evaluation](https://archive.ics.uci.edu/dataset/856/higher+education+students+performance+evaluation) | | | Classification | 145 | 31 | Predict one of 8 grade levels |
| [Exam Scores](http://roycekimmons.com/tools/generated_data/exams) | [Article](https://www.kaggle.com/code/tejpal123/eda-students-performance-in-exams) | | | | | Good for EDA |
| [Glass Identification](https://archive.ics.uci.edu/dataset/42/glass+identification) | | | Classification | 214 | 9 | 7 classes |
| [Fetal health classification](https://github.com/dtunnicliffe/fetal-health-classification) | | [Kaggle](https://www.kaggle.com/andrewmvd/fetal-health-classification) | Classification | 2126 | 21 | Predict fetal health outcome from CTG data; 3 classes |
| [Infrared thermography temperature](https://archive.ics.uci.edu/dataset/925/infrared+thermography+temperature+dataset) | | | Regression | 1020 | 33 | Predict oral temperature from environmental information and thermal image readings. |
| [Parkinson's disease assessment prediction](https://archive.ics.uci.edu/dataset/189/parkinsons+telemonitoring) | | | | | | |
| [Parkinson's disease detection](https://www.kaggle.com/datasets/vikasukani/parkinsons-disease-data-set) | | | Classification | 195 | 24 | |
| [BMI prediction](https://www.kaggle.com/datasets/freego1/bmi-data) | | [Kaggle](https://www.kaggle.com/code/sivantm/prediction-of-bmi-using-regression) | Regression | 2500 | 4 | |
| [Coronary heart disease prediction](https://www.kaggle.com/datasets/dileep070/heart-disease-prediction-using-logistic-regression/data) | | [Kaggle](https://www.kaggle.com/code/adithyabshetty100/coronary-heart-disease-prediction/notebook) | Classification | 4238 | 15 | Predict whether a patient is at risk of coronary heart disease (CHD) within the next 10 years. Binary classification |
| [Cardiovascular disease prediction](https://www.kaggle.com/datasets/sulianova/cardiovascular-disease-dataset) | | | Classification | 70000 | 11 | Determine whether cardiovascular disease is present. Binary classification |
| [Chronic kidney disease prediction](https://archive.ics.uci.edu/dataset/336/chronic+kidney+disease) | | | Classification | 400 | 24 | Determine whether kidney disease is present. Binary classification |
| [Cirrhosis outcome prediction](https://www.kaggle.com/competitions/playground-series-s3e26/overview) | | | Classification | 7904 | 17 | Multi-class outcome prediction |
| [Obesity risk prediction](https://www.kaggle.com/competitions/playground-series-s4e2/overview) | | | Classification | 20800 | 16 | 7 body-status classes |
| [Cancer prediction](https://www.kaggle.com/competitions/widsdatathon2024-challenge1/overview) | | | Classification | 1000k | 82 | Binary classification |
| [Newborn birth weight prediction](https://www.kaggle.com/competitions/prediction-interval-competition-i-birth-weight/overview) | | | Regression | 108000 | 37 | |
| Mercedes-Benz production line virtual metrology | | | Regression | 4209 | 384 | Time a car needs to pass testing |
| [Insurance claim prediction](https://github.com/brunocampos01/allstate-claims-severity) | | | Regression | 188318 | 132 | |
| [Pump remaining life prediction](https://www.kaggle.com/datasets/anseldsouza/water-pump-sensor-data) | | | Regression | 44807 | 52 | Predict remaining useful life (hours) |
| [Steel industry energy consumption prediction](https://www.kaggle.com/datasets/csafrit2/steel-industry-energy-consumption) | | | Classification | 35040 | 10 | Predict low, medium, or high energy consumption |
| [Steel plate defect detection](https://archive.ics.uci.edu/dataset/198/steel+plates+faults) | [Full dataset](https://www.kaggle.com/competitions/playground-series-s4e3/data) | | Classification | 1941 | 27 | 7 defect classes |

## Time Series

- [Manufacturing-Data-Science-with-Python](https://github.com/tvhahn/Manufacturing-Data-Science-with-Python/tree/master)

| Title | Article | Solution | Task | Instances | Notes |
|---|---|---|---|---|---|
| [Industrial component degradation](https://www.kaggle.com/datasets/inIT-OWL/one-year-industrial-component-degradation?select=01-04T184835_002_mode1.csv) | | | | | |
| [Shrink-wrapper component degradation](https://www.kaggle.com/datasets/inIT-OWL/vega-shrinkwrapper-runtofailure-data) | | | | | |
| [milling-data](https://colab.research.google.com/github/tvhahn/Manufacturing-Data-Science-with-Python/blob/master/Metal%20Machining/1.A_milling-data-exploration.ipynb#scrollTo=o5sjphTg1ZqX) | [Article](https://towardsdatascience.com/anomaly-detection-in-manufacturing-part-1-an-introduction-8c29f70fc68b) | | | | |
| [ECG (electrocardiogram)](https://www.kaggle.com/c/aia-rnn-ecg-predict/overview/description) | | | | | |

## Sound

- [Unsupervised Detection of Anomalous Sounds for Machine Condition Monitoring](https://dcase.community/challenge2020/task-unsupervised-detection-of-anomalous-sounds-results)

| Title | Article | Solution | Task | Instances | Notes |
|---|---|---|---|---|---|
| [sound-anomaly-detection]() | | | Classification | | |
| [MIMII Pump Sound](https://www.kaggle.com/datasets/senaca/mimii-pump-sound-dataset/data) | [Anomaly Detection using Autoencoder](https://www.kaggle.com/code/jaison14/anomaly-detection-using-autoencoder) | | | abnormal: 138<br>normal: 381 | 10-second audio clips, 8 channels |

## PDM

- [Abnormal machine vibration (needs cleanup)](https://www.kaggle.com/code/arnabbiswas1/predictive-maintenance-exploratory-data-analysis/notebook)
- [Machine anomaly binary classification and failure type classification](https://www.kaggle.com/datasets/shivamb/machine-predictive-maintenance-classification/data)
- [PredictiveMaintenance-and-Vibration-Resources](https://github.com/Charlie5DH/PredictiveMaintenance-and-Vibration-Resources)
- [Accelerometer anomaly classification (3 classes)](https://archive.ics.uci.edu/dataset/846/accelerometer)
- [phm-dataset](https://ykkim.gitbook.io/wiki/industrial-ai/phm-dataset)
- [BigMart Sales Data](https://www.kaggle.com/brijbhushannanda1979/bigmart-sales-data)
- [極市 (cvmart) datasets](https://www.cvmart.net/dataSets)
- [Kaggle-Tabular-Playground-Series](https://github.com/andy6804tw/Kaggle-Tabular-Playground-Series)
- [Industrial defect dataset roundup](https://codingnote.cc/zh-tw/p/192608/)
- [大白智能](https://www.jiangdabai.com/downloads)
- [Datasets used with AutoGluon AutoML (updated irregularly)](https://github.com/autogluon/autogluon/blob/master/AWESOME.md)
- [dataset](https://www.cs.toronto.edu/~delve/data/datasets.html)
- [Dataset](https://github.com/pravinknr/DataScience_R_Codes/tree/master/2.%20Implemetation%20of%20the%20Algorithms%20on%20Datasets/Supervised%20Machine%20Learning%20Techniques/Support%20Vector%20Machine/Salary%20Data)
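
Three of the scikit-learn entries in the Machine Learning Repository table (wine, diabetes, breast cancer) ship with the library itself, so their Instances and Attributes counts can be checked locally without a download. The following is a minimal sketch, assuming scikit-learn is installed; the helper name `bundled_shapes` is made up for illustration and is not part of any library.

```python
from sklearn.datasets import load_breast_cancer, load_diabetes, load_wine


def bundled_shapes():
    """Return {name: (instances, attributes)} for datasets bundled with scikit-learn.

    `bundled_shapes` is a hypothetical helper written for this list.
    """
    loaders = {
        "wine": load_wine,
        "diabetes": load_diabetes,
        "breast_cancer": load_breast_cancer,
    }
    shapes = {}
    for name, loader in loaders.items():
        # return_X_y=True yields the feature matrix and the target vector
        X, y = loader(return_X_y=True)
        shapes[name] = X.shape
    return shapes


if __name__ == "__main__":
    for name, (n_instances, n_attributes) in bundled_shapes().items():
        print(f"{name}: {n_instances} instances, {n_attributes} attributes")
```

The `fetch_covtype` and `fetch_california_housing` entries follow the same pattern, but they download data on first use, so they are left out of this offline sketch.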