# Determine the sample size required for ANN

*There is no "golden rule" for determining the minimum sample size in machine learning.* However, there are several rules of thumb[^second], and *Alwosheel et al.*[^first] conducted extensive Monte Carlo analyses and concluded that a *"minimum sample size of **fifty times** the number of weights in the ANN"* is advised.

### Number of weights in Study 3

- The number of weights can be calculated with:

$$
N_w = (I+1) H_1 + (H_1+1) H_2 + \dots + (H_{n-1}+1) H_n + (H_n+1) O
$$

where $N_w$ is the number of weights, $I$ is the dimension of the input, $H_n$ is the dimension of hidden layer $n$, and $O$ is the dimension of the output. The $+1$ in each term counts the bias feeding the next layer.

According to the rule of thumb by *Heaton*[^third], starting with one hidden layer whose size $H_1$ equals two-thirds of $I$ is recommended.

> Number of inputs ($N=6$):
> age, gender, years holding a driving license, car brand, years owning the car, frequency of using ADS (L2)

With $I = 6$, $H_1 = 4$ (two-thirds of 6) and, assuming a single output node, $O = 1$, the formula gives $(6+1) \times 4 + (4+1) \times 1 = 33$. Hence, the number of weights would be **33**.

### Required sample size

Applying the fifty-times rule to 33 weights, the sample size needed for ANN training is **1650** ($50 \times 33$).

As described in the literature[^first], 70% of the sample would be used for training, while the remaining 30% would be used for validation and testing. As a result, the full sample size needed for this study would be $1650 / 0.7 \approx 2357$, or around **2400** data points.

### Tutorial for applying ANN with Python

- https://www.mltut.com/implementation-of-artificial-neural-network-in-python/

[^first]: Alwosheel, A., van Cranenburgh, S., & Chorus, C. G. (2018). Is your dataset big enough? Sample size requirements when using artificial neural networks for discrete choice analysis. Journal of Choice Modelling, 28, 167-182.
[^second]: Haykin, S. (2009). Neural networks and learning machines, 3/E. Pearson Education India.
[^third]: Heaton, J. (2008). Introduction to neural networks with Java. Heaton Research, Inc.
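### Checking the calculation in Python

The weight count and sample-size arithmetic in this note can be reproduced with a short script. This is a minimal sketch, not part of the cited methods: the function names `count_weights` and `required_sample_size` are hypothetical, and the single output node (`n_outputs = 1`) is an assumption inferred from the reported total of 33 weights.

```python
import math


def count_weights(n_inputs, hidden_sizes, n_outputs):
    """Count weights (one bias per receiving unit included) in a fully connected network."""
    layer_sizes = [n_inputs, *hidden_sizes, n_outputs]
    # Each pair of consecutive layers contributes (fan_in + 1) * fan_out weights.
    return sum(
        (fan_in + 1) * fan_out
        for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:])
    )


def required_sample_size(n_weights, factor=50, train_fraction=0.7):
    """Return (training size, total size) under the fifty-times rule of thumb."""
    training = factor * n_weights
    # Scale up so that the training share alone reaches the required size.
    total = math.ceil(training / train_fraction)
    return training, total


n_inputs = 6                          # the six inputs listed in this note
hidden = [round(2 * n_inputs / 3)]    # one hidden layer, two-thirds of the inputs -> 4
n_outputs = 1                         # assumption: single output node

n_weights = count_weights(n_inputs, hidden, n_outputs)
training_size, total_size = required_sample_size(n_weights)
print(n_weights, training_size, total_size)  # 33 1650 2358
```

The exact total of 2358 is what the note rounds up to about 2400. Changing `hidden` or `n_outputs` shows how quickly the required sample grows with network size.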