# /dev/log for leafaffliction
A simple computer vision pipeline to detect leaf diseases
## Installation and usage
Requirements: `python 3.10.12, uv`
Run analysis: `uv run -m src.Distribution -id id1`
Run augmentation: `uv run -m src.Augment`
Transform pipeline example: `uv run -m src.Transform -id id1`
Run training: `uv run -m src.Train -id id1`
Run prediction: `uv run -m src.Predict -id id1`
## Pipeline architecture

> https://excalidraw.com/#json=PmsJz27_wdpsv55bFj0oH,b6TG0WYvzSmgmc3pKZPhtQ
## Preprocessing
Some analysis is done on the dataset to ensure fundamental data integrity; it turns out that the provided dataset suffers from label imbalance.

This imbalance will bias the model towards the majority classes during training. That may be acceptable in scenarios where the class distribution at deployment is similarly imbalanced, but that is simply not the case for this problem, or at least not specified; hence corrective action must be taken to balance out the labels.
Both undersampling and oversampling were tried. The undersampling method removes extra images from each class until the number of images in every class matches the minority class; a minimal sketch of this is shown below. Undersampling yielded mediocre results in training, as the reduced dataset led to overfitting on the training data.
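As an illustration, undersampling could be implemented along these lines, assuming one directory per class; the helper name `undersample` and the file layout are assumptions, not the project's actual code.
```python
import random
from pathlib import Path

def undersample(dataset_dir: str, seed: int = 42) -> None:
    """Trim every class folder down to the size of the smallest class."""
    random.seed(seed)
    class_dirs = [d for d in Path(dataset_dir).iterdir() if d.is_dir()]
    images = {d: [f for f in d.iterdir() if f.is_file()] for d in class_dirs}
    minority = min(len(files) for files in images.values())
    for class_dir, files in images.items():
        for extra in random.sample(files, len(files) - minority):
            extra.unlink()  # delete the surplus image
```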

The oversampling method instead generates 6 transformations from images in the minority classes (see the sketch after this list). The transformations are:
- flip
- rotate
- skew / shear
- zoom
- contrast
- distortion
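A minimal sketch of what these six transformations could look like with OpenCV; the angles, gains, and warp parameters are illustrative assumptions, and the actual `src.Augment` implementation may differ.
```python
import cv2
import numpy as np

def augment(img: np.ndarray) -> dict[str, np.ndarray]:
    """Generate the six augmentations for one BGR image."""
    h, w = img.shape[:2]
    # flip: mirror horizontally
    flip = cv2.flip(img, 1)
    # rotate: 30 degrees around the centre
    M_rot = cv2.getRotationMatrix2D((w / 2, h / 2), 30, 1.0)
    rotate = cv2.warpAffine(img, M_rot, (w, h))
    # skew / shear: horizontal affine skew
    M_shear = np.float32([[1, 0.3, 0], [0, 1, 0]])
    shear = cv2.warpAffine(img, M_shear, (w, h))
    # zoom: crop the central region and resize back up
    zoom = cv2.resize(img[h // 8 : -h // 8, w // 8 : -w // 8], (w, h))
    # contrast: simple linear gain on pixel values
    contrast = cv2.convertScaleAbs(img, alpha=1.5, beta=0)
    # distortion: sinusoidal warp of the pixel grid
    xx, yy = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    distortion = cv2.remap(img, xx + 5 * np.sin(yy / 20), yy,
                           cv2.INTER_LINEAR)
    return {"flip": flip, "rotate": rotate, "shear": shear,
            "zoom": zoom, "contrast": contrast, "distortion": distortion}
```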
Our new distribution after balancing is like so:

We also observed improvements in training metrics:

## Image transformation
Although the labels are balanced, the raw images we supply aren't suitable for classification directly. The images in the dataset are not normalized enough (in terms of their "features"), and no feature engineering has been done yet to bring out the features that distinguish the different labels.
There are two major stages in an image processing pipeline:
1. Object segmentation - detect and isolate the target object from the background and other noise
2. Object analysis - feature extraction
For object segmentation and the rest of the pipeline, it is important to normalize all images using methods like white balancing to ensure low variation between images due to lighting changes.
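As an example, a simple gray-world white balance could look like this sketch; the gray-world assumption and the function name are illustrative, and the pipeline may normalize differently.
```python
import numpy as np

def gray_world_balance(img: np.ndarray) -> np.ndarray:
    """Scale each channel so its mean matches the overall mean (gray-world)."""
    img = img.astype(np.float32)
    mean_per_channel = img.reshape(-1, 3).mean(axis=0)
    gain = mean_per_channel.mean() / mean_per_channel
    return np.clip(img * gain, 0, 255).astype(np.uint8)
```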
Image thresholding for object segmentation is done by selecting a color channel of the image and limiting that channel to a specific range of values. This can help with background segmentation, provided a channel with high contrast between the background and the target object is selected. Multiple thresholding steps are usually needed, along with extra noise filtering.
An ML model can also be used for object segmentation by labelling the regions of interest in the image. For noise reduction, filters such as bilateral blur and Gaussian blur can be used.
The preprocessing in our pipeline starts with a simple grayscale via the saturation channel, followed by binary thresholding to separate the target object from the background, plus some Gaussian blur; a sketch of these steps follows the images below. The objective here is to single out the parts of the image with high contrast and to smooth out random noise artifacts for further preprocessing.

> before

> grayscale

> binary threshold

> after blur
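Put together, the steps above could look roughly like this with OpenCV; the threshold value and kernel size are assumptions, and the actual pipeline may use different parameters or tooling.
```python
import cv2
import numpy as np

def initial_mask(img_bgr: np.ndarray) -> np.ndarray:
    """Saturation grayscale -> binary threshold -> Gaussian blur."""
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
    saturation = hsv[:, :, 1]                      # grayscale via saturation
    _, thresh = cv2.threshold(saturation, 60, 255, cv2.THRESH_BINARY)
    blurred = cv2.GaussianBlur(thresh, (5, 5), 0)  # smooth noise artifacts
    return blurred
```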
After that, we apply the mask obtained from binary thresholding the blurred image to the original image, keeping the pixels where the mask is white and discarding the ones where it is not; a sketch of this masking step follows the image below.

> after initial mask
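Applying the mask is a single bitwise operation; a sketch:
```python
import cv2
import numpy as np

def apply_mask(img_bgr: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep only the pixels where the mask is non-zero; black out the rest."""
    return cv2.bitwise_and(img_bgr, img_bgr, mask=mask)
```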
There are still some remaining artifacts (the shadow and some background), so further preprocessing is needed. We do the same thing again, but grayscale using other channels ('green-magenta' and 'blue-yellow') instead.

> green-magenta

> blue-yellow
And as we can see, by doing so we have successfully made the shadow less obvious in the current grayscale. The binary threshold process is repeated, and we get a mask which does not contain too much of the shadow for this particular example.

> mask on green-magenta | blue-yellow
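The 'green-magenta' and 'blue-yellow' grayscales correspond to the a* and b* channels of the CIELAB color space. A sketch of how these channel masks could be produced; the threshold values are assumptions:
```python
import cv2
import numpy as np

def lab_channel_masks(img_bgr: np.ndarray,
                      a_thresh: int = 120,
                      b_thresh: int = 135) -> tuple[np.ndarray, np.ndarray]:
    """Threshold the LAB a* (green-magenta) and b* (blue-yellow) channels."""
    lab = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2LAB)
    a_channel, b_channel = lab[:, :, 1], lab[:, :, 2]
    # leaves are green, i.e. low a* values -> use an inverted threshold
    _, mask_a = cv2.threshold(a_channel, a_thresh, 255, cv2.THRESH_BINARY_INV)
    # leaves lean yellow, i.e. high b* values -> plain threshold
    _, mask_b = cv2.threshold(b_channel, b_thresh, 255, cv2.THRESH_BINARY)
    return mask_a, mask_b
```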
We combine this mask with the one generated from the saturation channel, and do a filling operation to fill out any remaining artifacts. This leaves us with the final threshold mask:

> no more shadows after fill
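How the masks are combined and filled is sketched below; the bitwise AND and the morphological closing standing in for the fill operation are assumptions about the implementation.
```python
import cv2
import numpy as np

def combine_and_fill(mask_sat: np.ndarray, mask_lab: np.ndarray) -> np.ndarray:
    """Intersect the saturation and LAB masks, then close remaining holes."""
    combined = cv2.bitwise_and(mask_sat, mask_lab)
    kernel = np.ones((7, 7), np.uint8)
    # morphological closing acts as a simple fill for small holes/artifacts
    return cv2.morphologyEx(combined, cv2.MORPH_CLOSE, kernel)
```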
Once we re-apply the mask to the input, we get the original picture with the background and shadow removed:

> final
There are a couple of ways to help validate the whole procedure, such as pseudolandmarks, size analysis, and the color histogram.

> Before processing, the red channel is highly left-skewed, which will cause trouble for the classifier

> After processing, the color distributions are more balanced, which helps improve classification metrics
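A sketch of how such per-channel histograms could be computed and plotted, assuming OpenCV and matplotlib:
```python
import cv2
import numpy as np
from matplotlib import pyplot as plt

def plot_color_histogram(img_bgr: np.ndarray, mask: np.ndarray = None) -> None:
    """Per-channel histogram, optionally restricted to the masked region."""
    for i, color in enumerate(("b", "g", "r")):
        hist = cv2.calcHist([img_bgr], [i], mask, [256], [0, 256])
        plt.plot(hist, color=color, label=color)
    plt.legend()
    plt.show()
```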

> Pseudolandmarks for edge detection: looks like they accurately trace the edges of our target object after preprocessing.

> If the preprocessing were not done correctly, the pseudolandmarks would be inaccurate.

> Size analysis: the borders and shape of the object are well defined.

> If the preprocessing were not done correctly, the size analysis would be inaccurate.
All of this is done to remove irrelevant pixels that may degrade the classifier's performance.
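For instance, a contour-based size analysis over the final mask could be sketched as follows; the function name and the choice of metrics are illustrative:
```python
import cv2
import numpy as np

def size_analysis(mask: np.ndarray) -> dict[str, float]:
    """Area and bounding shape of the largest contour in the final mask."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    leaf = max(contours, key=cv2.contourArea)  # assume the leaf is largest
    x, y, w, h = cv2.boundingRect(leaf)
    return {"area": cv2.contourArea(leaf),
            "perimeter": cv2.arcLength(leaf, closed=True),
            "width": w, "height": h}
```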
> Note: optimization of the image processing parameters could be considered in the future; perhaps based on color channel distribution differences across the classes?
## Modelling and prediction
For training, a deep learning model with the following architecture is used:
```
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type) ┃ Output Shape ┃ Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ rescaling (Rescaling) │ (None, 256, 256, 3) │ 0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d (Conv2D) │ (None, 254, 254, 32) │ 896 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d (MaxPooling2D) │ (None, 127, 127, 32) │ 0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout (Dropout) │ (None, 127, 127, 32) │ 0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_1 (Conv2D) │ (None, 125, 125, 64) │ 18,496 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d_1 (MaxPooling2D) │ (None, 62, 62, 64) │ 0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_2 (Conv2D) │ (None, 60, 60, 128) │ 73,856 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d_2 (MaxPooling2D) │ (None, 30, 30, 128) │ 0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ flatten (Flatten) │ (None, 115200) │ 0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense (Dense) │ (None, 128) │ 14,745,728 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense) │ (None, 5) │ 645 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
```
> Note: hyperparameter tuning is not done here due to hardware limitations
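For reference, a Keras definition that reproduces the summary above could look like this sketch; the kernel sizes follow from the output shapes, while the activations and dropout rate are assumptions.
```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(256, 256, 3)),
    layers.Rescaling(1.0 / 255),               # map pixel values to [0, 1]
    layers.Conv2D(32, 3, activation="relu"),   # -> (254, 254, 32)
    layers.MaxPooling2D(),                     # -> (127, 127, 32)
    layers.Dropout(0.2),                       # rate is an assumption
    layers.Conv2D(64, 3, activation="relu"),   # -> (125, 125, 64)
    layers.MaxPooling2D(),                     # -> (62, 62, 64)
    layers.Conv2D(128, 3, activation="relu"),  # -> (60, 60, 128)
    layers.MaxPooling2D(),                     # -> (30, 30, 128)
    layers.Flatten(),                          # -> 115200
    layers.Dense(128, activation="relu"),
    layers.Dense(5),                           # one logit per class
])
```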
Each domain (apple, grape) has its own model, and their final performance on the validation set is like so:

> apple

> grape
An image viewer was also made to make inspecting the transformations and predictions easier.
