## Courses
### Production ML Systems
![https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/ML-Systems.PNG](https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/ML-Systems.PNG)
[https://developers.google.com/machine-learning/crash-course/production-ml-systems](https://developers.google.com/machine-learning/crash-course/production-ml-systems)
### Static vs Dynamic Training
![https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/Static-Dynamic.PNG](https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/Static-Dynamic.PNG)
[https://developers.google.com/machine-learning/crash-course/static-vs-dynamic-training](https://developers.google.com/machine-learning/crash-course/static-vs-dynamic-training)
### Static vs Dynamic Inference
![https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/Static-Dynamic-Inference.PNG](https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/Static-Dynamic-Inference.PNG)
[https://developers.google.com/machine-learning/crash-course/static-vs-dynamic-inference/](https://developers.google.com/machine-learning/crash-course/static-vs-dynamic-inference/)
![https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/Static-Dynamic-Online-Inference.PNG](https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/Static-Dynamic-Online-Inference.PNG)
### Data Dependencies
![https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/Input-Data.PNG](https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/Input-Data.PNG)
[https://developers.google.com/machine-learning/crash-course/data-dependencies](https://developers.google.com/machine-learning/crash-course/data-dependencies)
### Fairness
![https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/Fairness.PNG](https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/Fairness.PNG)
[https://developers.google.com/machine-learning/crash-course/fairness/](https://developers.google.com/machine-learning/crash-course/fairness/)
#### Fairness: Types of Bias
- **Reporting Bias**
- **Automation Bias**
_@Google Definition:_
**Automation Bias** is a tendency to favor results generated by automated systems over those generated by non-automated systems, irrespective of the error rates of each.
_Original Video_
- Shows object detection of vehicles on roads
![https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/far_entrance.0.gif](https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/far_entrance.0.gif)
![https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/output_annotated_330.gif](https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/output_annotated_330.gif)
**The video above shows optical flow detection inside the bounding boxes. Detected points are filtered against a threshold, and the points that pass are sent through an analysis pipeline. The analysed data is then used to determine the statistical parameters of the video.**
**So why does Automation Bias exist here? It arises when a single piece of software, built on one original model, is used to determine the statistical properties that drive optimization of the video characteristics, so its output is accepted without a non-automated result to compare against.**
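The threshold filtering described for the video can be sketched roughly as follows. This is a minimal NumPy illustration, not the pipeline used to produce the video: the function names `filter_flow_points` and `count_in_box`, the sample points, and the threshold value are all hypothetical.

```python
import numpy as np

def filter_flow_points(points, flow, min_magnitude):
    """Keep the tracked points whose flow vector is longer than min_magnitude."""
    points = np.asarray(points, dtype=float)   # shape (N, 2): x, y
    flow = np.asarray(flow, dtype=float)       # shape (N, 2): dx, dy
    magnitude = np.linalg.norm(flow, axis=1)
    keep = magnitude > min_magnitude
    return points[keep], magnitude[keep]

def count_in_box(points, box):
    """Count the points that fall inside box = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = box
    inside = (
        (points[:, 0] >= xmin) & (points[:, 0] <= xmax)
        & (points[:, 1] >= ymin) & (points[:, 1] <= ymax)
    )
    return int(inside.sum())

# Hypothetical data: four tracked points and their flow vectors.
points = [(10, 10), (12, 11), (50, 50), (11, 14)]
flow = [(3.0, 4.0), (0.1, 0.0), (6.0, 8.0), (0.0, 2.0)]

kept, mags = filter_flow_points(points, flow, min_magnitude=1.5)
n_in_box = count_in_box(kept, box=(0, 0, 20, 20))
```

Only the points that pass the threshold are counted per box, which is the quantity that would feed the later statistics.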
- **Selection Bias**
**Selection Bias** occurs if a data set's examples are chosen in a way that is not reflective of their real-world distribution. Selection Bias can take many different forms:
- Coverage Bias
- Non-response Bias
- Sampling Bias
**In one example, vehicles were detected by a deep learning Vehicle Detection Model and bounding boxes were drawn around them. The vehicles inside those bounding boxes form the traffic, which is then graphed using proximity considerations.**
**Coverage bias**: Data is not selected in a representative fashion.
**The particle count inside each bounding box is measured with phase-based optical flow methods. The collected data is not a representative sample, but it is transformed into usable statistics.**
```python
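import pandas as pd

# nodes: per-frame vehicle ids; threshold: per-frame counts (both assumed to be defined earlier)
rows = []
for frame, (n, t) in enumerate(zip(nodes, threshold), start=1):
    map_n = "-".join(str(k) for k in n)
    for i, j in zip(n, t):
        # DataFrame.append was removed in pandas 2.0, so collect the rows first
        rows.append({'vehicle': i, 'count': j, 'frame': frame, 'nodes': map_n})
viz = pd.DataFrame(rows, columns=['vehicle', 'count', 'frame', 'nodes'])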
```
![https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/Vehicle-Viz-Table.PNG](https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/Vehicle-Viz-Table.PNG)
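One way to make "not representative" measurable is to compare the share of each category in the collected sample with a reference share. The sketch below is only an illustration: `coverage_gap` is a hypothetical helper, and the vehicle classes and reference shares are made-up numbers, not measurements from the video.

```python
from collections import Counter

def coverage_gap(sample_labels, reference_share):
    """Return sample share minus reference share for every reference label."""
    counts = Counter(sample_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample_labels is empty")
    return {
        label: counts.get(label, 0) / total - share
        for label, share in reference_share.items()
    }

# Hypothetical sample and reference distribution.
sample = ["car"] * 8 + ["truck"] * 2
reference = {"car": 0.6, "truck": 0.3, "bus": 0.1}

gaps = coverage_gap(sample, reference)
```

A positive gap means the category is over-represented in the sample; a negative gap means it is under-covered (here, buses never appear at all).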
**Non-response bias** (or participation bias): Data ends up being unrepresentative due to participation gaps in the data-collection process.
**Participation gaps can occur in vehicle detection for two reasons. First, some bounding boxes may go undetected because of a deep learning error, although this is very unlikely. Second, when vehicles move in sequence, the confidence scores of their bounding boxes shift relative to one another and to the confidence threshold, so the boxes change their order of appearance in the detection output. This introduces participation bias when following a particular vehicle within a set of vehicles.**
__Code to Explain the Bounding Box Detection:__
```python
import cv2

def draw_boxes(out_write_npy, zone, frame, result, args, width, height):
    for box in result[0][0]:  # Output shape is 1x1x100x7
        conf = box[2]
        # keep only detections at or above the confidence threshold
        if conf >= args.pt:
            # box coordinates are normalized, so scale them to pixels
            xmin = int(box[3] * width)
            ymin = int(box[4] * height)
            xmax = int(box[5] * width)
            ymax = int(box[6] * height)
            # draw the bounding box rectangle with colour args.c and thickness args.th
            cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), args.c, args.th)
    return frame
```
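The thresholding step can also be tried without OpenCV or a video frame. The following is a small NumPy sketch under the same layout assumption as the code above (confidence at index 2, normalized corners at indices 3 to 6; the first two fields are not used here). `boxes_above_threshold` and the sample detections are hypothetical.

```python
import numpy as np

def boxes_above_threshold(result, conf_threshold, width, height):
    """Return pixel boxes (xmin, ymin, xmax, ymax) for confident detections."""
    detections = np.asarray(result, dtype=float)[0, 0]      # shape (N, 7)
    kept = detections[detections[:, 2] >= conf_threshold]
    scale = np.array([width, height, width, height], dtype=float)
    # astype(int) truncates, matching int(...) in draw_boxes
    return (kept[:, 3:7] * scale).astype(int)

# Hypothetical detector output with three candidate boxes.
result = np.zeros((1, 1, 3, 7))
result[0, 0, 0] = [0, 1, 0.9, 0.25, 0.5, 0.75, 1.0]
result[0, 0, 1] = [0, 1, 0.3, 0.0, 0.0, 0.25, 0.25]   # below threshold, dropped
result[0, 0, 2] = [0, 1, 0.6, 0.0, 0.25, 0.5, 0.5]

boxes = boxes_above_threshold(result, conf_threshold=0.5, width=200, height=100)
```

Raising `conf_threshold` drops more boxes, which is one way a vehicle can silently disappear from a frame.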
__Code to show participation bias of bounding boxes:__
![https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/Bounding-Boxes-With-Participation-Bias.png](https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/Bounding-Boxes-With-Participation-Bias.png)
```python
import numpy as np

# _input is assumed to hold, for each frame, a list of bounding boxes (4 values each)
# distance arrays to collect distances
dist = []
node_length = len(_input[0])  # initial node length referring to frame 0
nodes = [list(range(node_length))]  # collected nodes, starts from node_length of first frame
# iterate up to frame (n - 1)
for i in range(len(_input)-1):
    d = []
    # loop over every pair of (next-frame box k, current-frame box j)
    for k in range(len(_input[i+1])):
        for j in range(len(_input[i])):
            # root-mean-square distance between the two bounding box rectangles
            d.append(np.sqrt(np.sum(np.square(np.array(_input[i][j]) - np.array(_input[i+1][k])))/4))
    dist.append(d)
    n = np.zeros(len(_input[i+1])).astype(np.int64)  # initialising the nodes to zeros to explore findability
    for k, x in enumerate(d):
        # a comparison of distance to its threshold, which is assumed to be PI
        if (x <= np.pi):
            # close enough: inherit the node id of the matching current-frame box
            n[int(k/len(_input[i]))] = int(nodes[i][k%len(_input[i])])
        else:
            if (int(k/len(_input[i])) >= len(_input[i])):
                # increment the node_length to enable detection
                node_length += 1
                n[int(k/len(_input[i]))] = node_length - 1
    nodes.append(n.tolist())
```
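The distance used in the loop above is the root-mean-square difference between two 4-value boxes. As a hedged sketch, the same quantity can be computed for all box pairs at once with NumPy broadcasting; `pairwise_box_distance` and the sample boxes are hypothetical and are not part of the original code.

```python
import numpy as np

def pairwise_box_distance(prev_boxes, next_boxes):
    """dist[k, j] = RMS difference between next_boxes[k] and prev_boxes[j]."""
    prev_arr = np.asarray(prev_boxes, dtype=float)        # shape (P, 4)
    next_arr = np.asarray(next_boxes, dtype=float)        # shape (Q, 4)
    diff = next_arr[:, None, :] - prev_arr[None, :, :]    # shape (Q, P, 4)
    return np.sqrt(np.sum(diff ** 2, axis=2) / 4)

# Hypothetical boxes for two consecutive frames.
prev_boxes = [[0, 0, 10, 10], [50, 50, 60, 60]]
next_boxes = [[2, 2, 12, 12], [50, 50, 60, 60], [100, 100, 110, 110]]

dist_matrix = pairwise_box_distance(prev_boxes, next_boxes)
close_enough = dist_matrix <= np.pi   # same threshold as in the loop above
```

Row `k`, column `j` of `dist_matrix` corresponds to position `k * len(prev_boxes) + j` in the flat list `d` built by the loop.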
**The code above produces the nodes collected for the traffic. Such a bias can be avoided by adding a Custom Layer to the deep learning model that yields a softmax detection. The model must be validated continually during training, because softmax layers may need vectors with at least 2 dimensions.**
__Code that fixes the participation bias:__
**Solution 1: Custom Layer**
```python
import numpy as np
import cv2 as cv

# Assumed to be a method of an OpenCV dnn test class: printParams, normAssert and
# self.dnnBackendsAndTargets are helpers from OpenCV's Python dnn tests.
def test_custom_layer(self):
    class Softmax(object):
        def __init__(self, params, blobs):
            self.xstart = 0
            self.xend = 0
            self.ystart = 0
            self.yend = 0

        # Our layer receives two inputs. We need to crop the first input blob
        # to match a shape of the second one (keeping batch size and number of channels)
        def getMemoryShapes(self, inputs):
            inputShape, targetShape = inputs[0], inputs[1]
            batchSize, numChannels = inputShape[0], inputShape[1]
            height, width = targetShape[2], targetShape[3]
            self.ystart = (inputShape[2] - targetShape[2]) // 2
            self.xstart = (inputShape[3] - targetShape[3]) // 2
            self.yend = self.ystart + height
            self.xend = self.xstart + width
            return [[batchSize, numChannels, height, width]]

        def forward(self, inputs):
            # crop to the target region, then apply cosh element-wise
            # (note: cosh is neither a logarithm nor a softmax)
            return [np.cosh(inputs[0][:,:,self.ystart:self.yend,self.xstart:self.xend])]

    cv.dnn_registerLayer('SoftmaxCaffe', Softmax)
    # layer proto definition
    proto = '''
    name: "LogSoftmax"
    input: "input"
    input_shape
    {
        dim: 1
        dim: 2
        dim: 5
        dim: 5
    }
    input: "roi"
    input_shape
    {
        dim: 1
        dim: 2
        dim: 3
        dim: 3
    }
    layer {
      name: "Softmax"
      type: "SoftmaxCaffe"
      bottom: "input"
      bottom: "roi"
      top: "conv_prob"
    }'''

    net = cv.dnn.readNetFromCaffe(bytearray(proto.encode()))
    for backend, target in self.dnnBackendsAndTargets:
        # custom Python layers only run on the default OpenCV backend
        if backend != cv.dnn.DNN_BACKEND_OPENCV:
            continue
        printParams(backend, target)
        net.setPreferableBackend(backend)
        net.setPreferableTarget(target)
        src_shape = [1, 2, 5, 5]
        dst_shape = [1, 2, 3, 3]
        inp = np.arange(0, np.prod(src_shape), dtype=np.float32).reshape(src_shape)
        roi = np.empty(dst_shape, dtype=np.float32)
        net.setInput(inp, "input")
        net.setInput(roi, "roi")
        out = net.forward()
        # the reference must apply the same cosh as forward(), otherwise the check fails
        ref = np.cosh(inp[:, :, 1:4, 1:4])
        normAssert(self, out, ref)

    cv.dnn_unregisterLayer('SoftmaxCaffe')
```
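For reference, softmax itself turns a vector of scores into probabilities that sum to 1. The sketch below is a plain NumPy version with the usual max-subtraction for numerical stability; it is an illustration of the function only and is not the custom layer registered above. The sample scores are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = np.asarray(x, dtype=float)
    # subtracting the max leaves the result unchanged but avoids overflow in exp
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

# Hypothetical class scores for two detections (rows) over three classes (columns).
scores = np.array([[2.0, 1.0, 0.1],
                   [1000.0, 1000.0, 1000.0]])

probs = softmax(scores, axis=1)
```

The input here is 2-dimensional (detections by classes), and the normalization runs along the class axis.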
**Solution 2: Eliminate the participation bias by connecting the nodes to the bounding boxes**
Redraw the bounding boxes with each node matched to its bounding box.
![https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/Bounding-Boxes-No-Participation-Bias.png](https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/Bounding-Boxes-No-Participation-Bias.png)
![https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/output_video_annotated_330_tagged.gif](https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/output_video_annotated_330_tagged.gif)
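One way to connect nodes to bounding boxes is to let each box inherit the id of the nearest unclaimed box in the previous frame, and to open a new id when nothing is close enough. The sketch below assumes this nearest-match strategy and reuses the RMS box distance with a threshold of π; `assign_node_ids` and the sample frames are hypothetical, and this is not necessarily how the tagged video was produced.

```python
import numpy as np

def assign_node_ids(frames, max_distance=np.pi):
    """Give each box a node id that follows it from frame to frame."""
    next_id = len(frames[0])
    ids = [list(range(next_id))]
    for prev_boxes, cur_boxes in zip(frames, frames[1:]):
        prev_arr = np.asarray(prev_boxes, dtype=float).reshape(-1, 4)
        prev_ids = ids[-1]
        taken = set()      # ids already claimed in the current frame
        cur_ids = []
        for box in cur_boxes:
            match = None
            if len(prev_arr):
                diff = prev_arr - np.asarray(box, dtype=float)
                d = np.sqrt(np.sum(diff ** 2, axis=1) / 4)
                # try previous boxes from nearest to farthest
                for j in np.argsort(d):
                    if d[j] > max_distance:
                        break
                    if prev_ids[j] not in taken:
                        match = prev_ids[j]
                        break
            if match is None:
                # no previous box is close enough: treat it as a new vehicle
                match = next_id
                next_id += 1
            taken.add(match)
            cur_ids.append(match)
        ids.append(cur_ids)
    return ids

# Hypothetical boxes for three frames.
frames = [
    [[0, 0, 10, 10], [50, 50, 60, 60]],
    [[51, 51, 61, 61], [1, 1, 11, 11]],        # same vehicles, detection order swapped
    [[2, 2, 12, 12], [100, 100, 110, 110]],    # one vehicle gone, one new vehicle
]

ids = assign_node_ids(frames)
```

In the second frame the ids come out as `[1, 0]`: the labels follow the vehicles even though the detector listed them in the opposite order.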
**Sampling bias**: Proper randomization is not used during data collection.
**In the experiment shown above, we consider samples of data from the bounding boxes rather than the entire data, which is why we apply statistical tests to the data.**
![https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/Vehicle-Viz-Table.PNG](https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/Vehicle-Viz-Table.PNG)
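Randomized selection can be sketched as drawing frame indices uniformly at random without replacement, with a fixed seed so the draw is reproducible. The helper `sample_frames`, the frame count of 330 and the sample size are assumptions for illustration only.

```python
import numpy as np

def sample_frames(n_frames, n_samples, seed=0):
    """Pick n_samples distinct frame indices uniformly at random."""
    rng = np.random.default_rng(seed)
    picked = rng.choice(n_frames, size=n_samples, replace=False)
    return np.sort(picked)

# Hypothetical video length and sample size.
picked = sample_frames(n_frames=330, n_samples=30, seed=42)
```

Taking the first 30 frames instead would over-represent whatever traffic happened to be present at the start of the recording.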
- **Group Attribution Bias**
- In-group Bias
- Out-group Homogeneity Bias
- **Implicit Bias**
- Confirmation Bias
- Experimenter's Bias
#### Fairness: Identifying Bias
[https://developers.google.com/machine-learning/crash-course/fairness/identifying-bias](https://developers.google.com/machine-learning/crash-course/fairness/identifying-bias)
#### Fairness: Evaluating for Bias
[https://developers.google.com/machine-learning/crash-course/fairness/evaluating-for-bias](https://developers.google.com/machine-learning/crash-course/fairness/evaluating-for-bias)
[https://colab.research.google.com/github/google/eng-edu/blob/main/ml/cc/exercises/intro_to_ml_fairness.ipynb](https://colab.research.google.com/github/google/eng-edu/blob/main/ml/cc/exercises/intro_to_ml_fairness.ipynb)