FDA Final Exam Review

5-fold cross validation

  1. The original dataset is randomly divided into five subsets, each containing an equal number of samples.

  2. The model is trained and evaluated five times. In each iteration, one of the folds is held out as the validation set, while the remaining four folds are used for training.

  3. The model is trained on the four training folds and then tested on the held-out fold. This process is repeated five times, with each fold serving as the validation set once.

  4. The performance metrics, such as accuracy or error rate, are recorded for each iteration.

  5. The performance metrics obtained from the five iterations are averaged to provide an overall assessment of the model's performance.

Split the dataset into five equal parts. In each training round, four parts are used as the training dataset and the remaining one as the validation dataset. Record the performance of each round, and finally report the average over the five rounds as the overall assessment of the model.
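
A minimal sketch of the procedure with scikit-learn's `cross_val_score`; the synthetic dataset and logistic-regression model are placeholders, not part of the original notes:

```python
# A minimal 5-fold CV sketch. The synthetic dataset and the
# logistic-regression model are placeholders for illustration.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)
model = LogisticRegression(max_iter=1000)

# cv=5: each fold serves as the validation set exactly once.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print("per-fold accuracy:", scores)
print("mean accuracy:", scores.mean())  # averaged overall assessment
```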

AUC

  • AUC stands for Area Under the ROC Curve. It is a commonly used evaluation metric in machine learning, particularly in binary classification tasks. The ROC curve (Receiver Operating Characteristic curve) is a graphical representation of the performance of a binary classification model as its discrimination threshold is varied.
  1. The model's predictions are sorted based on their predicted probabilities or scores.

  2. The ROC curve is created by plotting the true positive rate (TPR) on the y-axis against the false positive rate (FPR) on the x-axis at various threshold values.

  3. AUC represents the area under this ROC curve. It ranges between 0 and 1, where a higher value indicates better performance.

AUC is the area under the ROC curve, which plots the true positive rate on the y-axis against the false positive rate on the x-axis. AUC ranges from 0 to 1; the closer it is to 1, the better the model's discriminative performance.
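
A toy computation with scikit-learn (the labels and scores below are made up for illustration):

```python
# roc_curve traces the (FPR, TPR) points as the threshold varies;
# roc_auc_score integrates the area under them.
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 1, 1, 0, 1]                # ground-truth labels
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]  # predicted probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("AUC =", roc_auc_score(y_true, y_score))  # in [0, 1]; higher is better
```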

false negative

  • True Positive (TP): The model correctly predicts the positive class when the true class is positive.
  • True Negative (TN): The model correctly predicts the negative class when the true class is negative.
  • False Positive (FP): The model incorrectly predicts the positive class when the true class is negative.
  • False Negative (FN): The model incorrectly predicts the negative class when the true class is positive.

When the model predicts negative but the actual class is positive, the case is called a false negative.
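
A small sketch reading the four outcomes off a confusion matrix (toy labels, for illustration only):

```python
# With labels=[0, 1] (0 = negative, 1 = positive), scikit-learn's
# confusion matrix is laid out as [[TN, FP], [FN, TP]].
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 0, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
# FN: predicted negative while the true class is positive.
```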

method for missing value imputation

  • mean imputation
  • Let's say you have a dataset containing information about houses, including variables such as size, number of bedrooms, and price. In this dataset, the "number of bedrooms" variable has some missing values that need to be imputed.
  1. First, calculate the mean value of the "number of bedrooms" variable using the available non-missing values.

  2. Next, identify the instances with missing values in the "number of bedrooms" variable.

  3. Replace the missing values in the "number of bedrooms" variable with the calculated mean value.

Fill every missing value with the mean of the variable computed over the whole dataset.
The advantage is that it is quick to compute; the disadvantage is that it ignores the uncertainty present in the dataset.
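
A minimal sketch with scikit-learn's `SimpleImputer`; the "number of bedrooms" column is a hypothetical stand-in for the house-price example above:

```python
# Mean imputation on a toy single-column array with missing values.
import numpy as np
from sklearn.impute import SimpleImputer

bedrooms = np.array([[3.0], [np.nan], [2.0], [4.0], [np.nan]])

imputer = SimpleImputer(strategy="mean")   # mean of the non-missing values
filled = imputer.fit_transform(bedrooms)
print(filled.ravel())  # NaNs replaced by (3 + 2 + 4) / 3 = 3.0
```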

method for imbalanced data training

  • oversampling

  • Let's say you have a dataset with two classes: "positive" and "negative." The positive class is the minority class, and the negative class is the majority class.

  1. Split the original dataset into a training set and a separate validation or test set. It's important to ensure that the class distribution is maintained in both sets.

  2. Apply an oversampling technique to the minority class in the training set. Oversampling involves creating additional synthetic examples of the minority class to balance the class distribution. One common oversampling technique is called "Random Oversampling." This technique randomly selects instances from the minority class and duplicates them until the class distribution is more balanced.

  3. Train your machine learning model on the balanced training set. You can use any appropriate algorithm or model that suits your problem.

  4. Evaluate the trained model on the validation or test set to assess its performance. This will give you an indication of how well the model generalizes to unseen data.

For the class with fewer labels (the minority class), oversampling can be used to bring the data distribution back to balanced.
The advantage is that it does not introduce severe bias into the dataset; the disadvantage is that it is prone to overfitting.
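
A plain-NumPy sketch of random oversampling under the assumptions above (the tiny arrays are placeholders; imbalanced-learn's `RandomOverSampler` implements the same idea):

```python
# Minority rows are sampled with replacement until the classes are the
# same size. Apply this to the training split only, never to the
# validation/test set (step 1 above).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))
y = np.array([0] * 8 + [1] * 2)            # 8 majority, 2 minority samples

minority = np.where(y == 1)[0]
n_extra = (y == 0).sum() - minority.size   # duplicates needed to balance
extra = rng.choice(minority, size=n_extra, replace=True)

X_bal = np.vstack([X, X[extra]])
y_bal = np.concatenate([y, y[extra]])
print(np.bincount(y_bal))                  # [8 8] -- balanced
```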

Overfitting

  • Overfitting: Overfitting occurs when a model learns the training data too well, to the point where it starts to memorize the noise or random fluctuations in the data rather than capturing the underlying patterns or relationships. As a result, the overfitted model performs very well on the training data but fails to generalize to new, unseen data. Some characteristics of an overfitted model include:
  1. Low training error: The model achieves a low error or high accuracy on the training data.
  2. High validation/test error: When evaluated on validation or test data, the model's performance significantly deteriorates compared to the training data.
  3. Complex or intricate model: Overfitting often happens when the model is excessively complex, with too many parameters or features relative to the available training data.
  4. Excessive sensitivity to training data: The model may be overly sensitive to small changes in the training data, resulting in poor generalization.

Overfitting occurs when accuracy on the training data is high, but accuracy on the validation or testing data is low.

  • Overfitting and Variance: Overfitting is often associated with high variance. When a model overfits the training data, it means that it has learned the noise or random fluctuations in the data, resulting in high sensitivity to the training examples. Consequently, the model's performance on unseen data (validation or test data) tends to be worse than on the training data due to the high variance. Overfitting occurs when the model is too complex or has too many parameters relative to the available training data.

Overfitting usually corresponds to high variance.

Underfitting

  • Underfitting: Underfitting occurs when a model is too simple or lacks the capacity to capture the underlying patterns in the data. In this case, the model's performance is suboptimal both on the training data and new, unseen data. Key characteristics of an underfitted model include:
  1. High training error: The model struggles to fit the training data well, resulting in high error or low accuracy.
  2. High validation/test error: Similar to overfitting, an underfitted model also performs poorly on validation or test data.
  3. Insufficient complexity: The model may lack the necessary complexity or capacity to learn the underlying patterns in the data.
  4. High bias: Underfitting is often associated with high bias, meaning the model has strong assumptions or limitations that prevent it from capturing the true complexity of the data.

Underfitting occurs when accuracy on the training data, validation data, and testing data is all low.

  • Underfitting and Bias: Underfitting is often associated with high bias. When a model underfits the training data, it means that it fails to capture the underlying patterns or relationships adequately. The model is too simple or lacks the necessary complexity to represent the true data distribution. Underfitting results in high training and validation/test errors, indicating that the model is biased and unable to capture the complexity of the problem.

Underfitting usually corresponds to high bias.
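
The contrast between the two regimes can be seen directly by varying model complexity. A sketch using decision trees of different depths on synthetic data (the exact scores are illustrative):

```python
# depth=1 underfits; unlimited depth overfits; a moderate depth balances.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_informative=8, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

for depth in (1, 4, None):  # too shallow, moderate, unlimited
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_tr, y_tr)
    print(f"depth={depth}: train={tree.score(X_tr, y_tr):.2f}, "
          f"val={tree.score(X_val, y_val):.2f}")
# depth=1:    low train AND low val accuracy  -> underfitting (high bias)
# depth=None: train ~1.0, clearly lower val   -> overfitting (high variance)
```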

Reduce overfitting/underfitting in DNN

  • Reducing Overfitting in DNNs:
  1. Increase the size of the training dataset: Having more diverse and representative data can help the DNN generalize better and reduce overfitting.
  2. Apply regularization techniques: Regularization methods such as L1 or L2 regularization (also known as weight decay) can help prevent overfitting. These techniques add a penalty term to the loss function, discouraging the model from relying too heavily on any single feature or parameter.
  3. Use dropout: Dropout is a technique where randomly selected neurons are temporarily "dropped out" during training. This helps to prevent the network from relying too heavily on specific connections and encourages the model to learn more robust features (a combined code sketch follows this list).
  4. Early stopping: Monitor the model's performance on a validation set during training and stop training when the validation error starts to increase. This prevents the model from over-optimizing on the training data.
  5. Model architecture: Adjust the complexity of the DNN by adding or removing layers and adjusting the number of neurons. A simpler model with fewer parameters may help reduce overfitting.
  6. Data augmentation: Apply techniques such as rotation, scaling, or flipping to artificially increase the diversity of the training data. This can help the model generalize better to unseen data.
  • Reducing Underfitting in DNNs:
  1. Increase model complexity: If the DNN is underfitting, it may lack the necessary capacity to capture the underlying patterns in the data. Increasing the model's complexity by adding more layers or neurons can help it learn more intricate relationships.
  2. Adjust learning rate: The learning rate determines the step size during model training. If the learning rate is too low, the model may converge slowly and struggle to fit the data. Conversely, if the learning rate is too high, the model may not converge at all. Experiment with different learning rates to find an optimal value.
  3. Feature engineering: Analyze the input features and consider adding new features or transforming existing ones to help the model capture the underlying patterns better.
  4. Increase training time: Sometimes, the model may need more iterations or epochs to learn the patterns in the data adequately. Increase the training time and monitor the model's performance on validation or test data to determine the appropriate stopping point.
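
A minimal Keras sketch combining three of the anti-overfitting ideas above (L2 weight decay, dropout, and early stopping); the random data exists purely so the snippet runs and should be replaced with a real dataset:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers, regularizers

X = np.random.rand(1000, 20).astype("float32")  # placeholder features
y = np.random.randint(0, 2, size=(1000,))       # placeholder labels

model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),  # L2 weight decay
    layers.Dropout(0.5),                                     # dropout
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Early stopping: halt once validation loss stops improving.
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True)

model.fit(X, y, validation_split=0.2, epochs=100,
          callbacks=[early_stop], verbose=0)
```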

Reduce overfitting/underfitting in Decision Tree

  1. Pruning: Apply pruning techniques to restrict the growth of the decision tree and prevent overfitting. Pruning removes unnecessary branches or nodes that do not contribute significantly to the overall predictive accuracy (see the sketch after this list).
  2. Adjust tree depth: Limit the depth of the decision tree to control its complexity. A shallow tree may underfit the data, while an excessively deep tree may overfit. Experiment with different depths to find an optimal balance.
  3. Increase minimum samples per leaf: Specify a higher value for the minimum number of samples required in a leaf node. This reduces the likelihood of creating leaf nodes with very few instances, which can lead to overfitting.
  4. Ensemble methods: Utilize ensemble methods like Random Forests or Gradient Boosted Trees that combine multiple decision trees to improve generalization and reduce overfitting.
  5. Feature selection: Analyze and select relevant features that have a stronger predictive power. Removing irrelevant or noisy features can help reduce overfitting.
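
A sketch of how items 1-4 map onto scikit-learn parameters; the specific values are illustrative, not tuned:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier(
    max_depth=5,           # item 2: cap depth to control complexity
    min_samples_leaf=10,   # item 3: avoid tiny, noise-fitting leaves
    ccp_alpha=0.01,        # item 1: cost-complexity pruning strength
    random_state=0,
)
# tree.fit(X_train, y_train)  # X_train / y_train stand in for your own data

# Item 4: an ensemble of many trees averages out individual-tree variance.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
```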

Naive Bayes

  1. Bayes' Theorem:
    At the core of Naive Bayes is Bayes' theorem, which provides a way to calculate conditional probabilities. It states that the probability of an event A given an event B equals the probability of B given A, multiplied by the prior probability of A, divided by the probability of B. Mathematically, it can be expressed as:

    $P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)}$

  2. Feature Independence Assumption:
    Naive Bayes assumes that the features (or attributes) used for classification are conditionally independent given the class. This means that the presence or absence of one feature does not affect the presence or absence of other features. Although this assumption rarely holds true in real-world scenarios, Naive Bayes can still perform well in practice.

  3. Training Phase:
    During the training phase, Naive Bayes estimates the prior probabilities and conditional probabilities of each feature given the class labels. It builds a probability model based on the training data.

  4. Classification Phase:
    When presented with a new instance for classification, Naive Bayes calculates the posterior probability of each class label given the observed features. It selects the class label with the highest posterior probability as the predicted class.

  5. Calculating Probabilities:
    To calculate the posterior probabilities, Naive Bayes utilizes the prior probabilities (P(class)) and the conditional probabilities (P(feature|class)). The prior probabilities represent the probability of each class occurring in the training data, while the conditional probabilities represent the likelihood of observing a particular feature given a class.

  6. Handling Continuous and Categorical Features:
    Naive Bayes can handle both continuous and categorical features. For continuous features, it assumes a probability distribution (usually Gaussian) and estimates the mean and variance for each class. For categorical features, it calculates the probability of observing a specific category given each class.

  7. Laplace Smoothing:
    To handle situations where a particular feature value has not been observed in the training data, Laplace smoothing (or add-one smoothing) is often applied. It adds a small value to the counts of feature occurrences to avoid zero probabilities.
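
A minimal scikit-learn sketch of the training and classification phases; the synthetic dataset is a placeholder:

```python
# GaussianNB assumes a Gaussian per feature per class (item 6). For
# count features, MultinomialNB(alpha=1.0) applies the add-one
# (Laplace) smoothing described in item 7.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

gnb = GaussianNB().fit(X_tr, y_tr)         # training: priors + per-class mean/var
print("class priors:", gnb.class_prior_)   # P(class) from the training data
print("accuracy:", gnb.score(X_te, y_te))  # predicts the argmax-posterior class
```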

tags: 1112_courses FDA