Courses

Production ML Systems

https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/ML-Systems.PNG https://developers.google.com/machine-learning/crash-course/production-ml-systems

Static vs Dynamic Training

https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/Static-Dynamic.PNG https://developers.google.com/machine-learning/crash-course/static-vs-dynamic-training

Static vs Dynamic Inference

https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/Static-Dynamic-Inference.PNG https://developers.google.com/machine-learning/crash-course/static-vs-dynamic-inference/

https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/Static-Dynamic-Online-Inference.PNG

Data Dependencies

https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/Input-Data.PNG https://developers.google.com/machine-learning/crash-course/data-dependencies

Fairness

https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/Fairness.PNG https://developers.google.com/machine-learning/crash-course/fairness/

Fairness: Types of Bias

  • Reporting Bias

  • Automation Bias

@Google Definition:

Automation Bias is a tendency to favor results generated by automated systems over those generated by non-automated systems, irrespective of the error rates of each.

Original Video

  • Shows object detection of vehicles on roads

https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/far_entrance.0.gif

https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/output_annotated_330.gif

The video above demonstrates optical flow detection inside the bounding boxes. The detected points are filtered against a threshold, and the points that pass are sent through an analysis pipeline. The analysed data is then used to estimate the statistical parameters of the video.
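The threshold filtering described above can be sketched as follows. This is a minimal illustration, not the project's pipeline: the function name `count_moving_points`, the array layout, and the threshold value are assumptions made for the example.

```python
import numpy as np

def count_moving_points(points, flow, box, threshold=1.0):
    """Count points inside `box` whose flow magnitude exceeds `threshold`.

    points: (N, 2) positions (x, y); flow: (N, 2) displacements (dx, dy)
    box:    (xmin, ymin, xmax, ymax) in pixels (hypothetical layout)
    """
    points = np.asarray(points, dtype=float)
    flow = np.asarray(flow, dtype=float)
    xmin, ymin, xmax, ymax = box
    # boolean mask of the points that fall inside the bounding box
    inside = (
        (points[:, 0] >= xmin) & (points[:, 0] <= xmax)
        & (points[:, 1] >= ymin) & (points[:, 1] <= ymax)
    )
    # per-point displacement length between the two frames
    magnitude = np.linalg.norm(flow, axis=1)
    return int(np.count_nonzero(inside & (magnitude > threshold)))

points = [(10, 10), (12, 14), (50, 50), (11, 11)]
flow = [(2.0, 0.0), (0.1, 0.1), (3.0, 3.0), (0.0, 1.5)]
moving = count_moving_points(points, flow, box=(0, 0, 20, 20), threshold=1.0)
```

Only the points that are both inside the box and moving faster than the threshold are counted, so the count depends directly on the chosen threshold.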

So why does Automation Bias exist? It arises when a piece of software built on a single original model is relied on to determine the statistical properties that are then used to optimize the video characteristics.

  • Selection Bias

Selection Bias occurs if a data set's examples are chosen in a way that is not reflective of their real-world distribution. Selection Bias can take many different forms:

- Coverage Bias
- Non-response Bias
- Sampling Bias

In one example, a deep learning Vehicle Detection Model detected the vehicles, and bounding boxes were drawn around them. The vehicles in those bounding boxes form the traffic, which is then graphed based on proximity.

Coverage bias: Data is not selected in a representative fashion.

The particle count inside each bounding box is measured using phase-based optical flow methods. The data collected is not a representative sample, but it is transformed into usable statistics.


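import pandas as pd

# nodes and threshold are the per-frame lists produced by the tracking step
rows = []
for frame, (n, t) in enumerate(zip(nodes, threshold), start=1):
    node_key = "-".join(map(str, n))
    for i, j in zip(n, t):
        rows.append({'vehicle': i, 'count': j, 'frame': frame, 'nodes': node_key})
# DataFrame.append was removed in pandas 2.0, so build the table from the collected rows
viz = pd.DataFrame(rows, columns=['vehicle', 'count', 'frame', 'nodes'])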

https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/Vehicle-Viz-Table.PNG
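One way to check whether a sample is representative, in the sense of the coverage bias definition above, is to compare its class proportions with a reference distribution. The sketch below is only an illustration: the labels and the reference proportions are made-up values, and `proportion_gap` is a hypothetical helper.

```python
from collections import Counter

def proportion_gap(sample_labels, reference_proportions):
    """Return, per class, the sample proportion minus the reference proportion."""
    counts = Counter(sample_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample_labels must not be empty")
    return {
        label: counts.get(label, 0) / total - expected
        for label, expected in reference_proportions.items()
    }

# hypothetical sample: 8 cars and 2 trucks, no buses at all
sample = ["car"] * 8 + ["truck"] * 2
# hypothetical real-world distribution of vehicle types
reference = {"car": 0.5, "truck": 0.3, "bus": 0.2}
gaps = proportion_gap(sample, reference)
```

A large positive gap means the class is over-represented in the sample, and a negative gap means it is under-represented or missing.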

Non-response Bias (or participation bias): Data ends up being unrepresentative due to participation gaps in the data-collection process.

Participation gaps can occur in vehicle detection for two reasons. First, some bounding boxes may go undetected because of a deep learning error, which is very unlikely. Second, as the vehicles move from frame to frame, their confidence scores shift relative to the confidence threshold, so the bounding boxes change their order of appearance in the detection output. This introduces participation bias in the detection of a particular vehicle from a set of vehicles.
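The reordering effect described above can be reproduced with a small sketch. The confidence values and the threshold are invented for illustration, and both helper functions are hypothetical.

```python
import numpy as np

def detection_order(confidences):
    """Indices of the detections sorted by descending confidence."""
    return np.argsort(-np.asarray(confidences), kind="stable").tolist()

def above_threshold(confidences, pt):
    """Indices of the detections whose confidence is at least `pt`."""
    return [i for i, c in enumerate(confidences) if c >= pt]

# hypothetical confidences for the same three vehicles in two consecutive frames
frame_a = [0.91, 0.88, 0.60]
frame_b = [0.85, 0.90, 0.62]

order_a = detection_order(frame_a)
order_b = detection_order(frame_b)
swapped = order_a != order_b                # the first two vehicles trade places
visible_b = above_threshold(frame_b, 0.7)   # the third vehicle drops out
```

Because the position in the output is not tied to a vehicle, the same index can refer to different vehicles in consecutive frames.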

Code to Explain the Bounding Box Detection:


import cv2

def draw_boxes(out_write_npy, zone, frame, result, args, width, height):
    # out_write_npy and zone are not used in this excerpt
    for box in result[0][0]:  # output shape is 1x1x100x7
        conf = box[2]
        # keep only detections at or above the confidence threshold args.pt
        if conf >= args.pt:
            # box coordinates are normalised, so scale them to pixels
            xmin = int(box[3] * width)
            ymin = int(box[4] * height)
            xmax = int(box[5] * width)
            ymax = int(box[6] * height)
            # draw the bounding box with colour args.c and thickness args.th
            cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), args.c, args.th)

    return frame

Code to show participation bias of bounding boxes:

https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/Bounding-Boxes-With-Participation-Bias.png


import numpy as np

# _input holds, for each frame, the list of bounding boxes (4 coordinates each)
dist = []  # distances collected for each pair of consecutive frames
node_length = len(_input[0])  # number of nodes in frame 0
nodes = [list(range(node_length))]  # node IDs per frame, starting with frame 0
# iterate up to frame (n - 1)
for i in range(len(_input) - 1):
    d = []
    n_curr = len(_input[i])
    # compare every box of the next frame with every box of the current frame
    for k in range(len(_input[i+1])):
        for j in range(n_curr):
            # root-mean-square difference between the coordinates of the two boxes
            d.append(np.sqrt(np.sum(np.square(np.array(_input[i][j]) - np.array(_input[i+1][k])))/4))
    dist.append(d)
    n = np.zeros(len(_input[i+1])).astype(np.int64)  # node IDs for the next frame, initialised to zero
    for k, x in enumerate(d):
        # the distance threshold is assumed to be pi
        if x <= np.pi:
            # close enough: the next-frame box inherits the current-frame node ID
            n[k // n_curr] = int(nodes[i][k % n_curr])
        else:
            if k // n_curr >= n_curr:
                # box index beyond the current frame's count: allocate a new node ID
                node_length += 1
                n[k // n_curr] = node_length - 1

    nodes.append(n.tolist())

The code above produces the nodes collected for the traffic. This bias can be avoided by adding a Custom Layer to the deep learning model so that detection ends in a softmax. The model must be validated continuously during training, because the softmax layer may need input vectors with at least 2 dimensions.
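For reference, a softmax over a 2-dimensional score array can be sketched in NumPy as below. This is a generic, numerically stable formulation shown under the assumption that scores are laid out as (detections, classes). It is not the custom layer used in this project.

```python
import numpy as np

def softmax(scores, axis=-1):
    """Numerically stable softmax; a 1-D input is promoted to 2-D first."""
    scores = np.atleast_2d(np.asarray(scores, dtype=np.float64))
    # subtracting the row maximum avoids overflow in exp without changing the result
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

# hypothetical scores: two detections, three classes each
probs = softmax([[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]])
```

Each row of `probs` sums to 1, so the rows can be read as per-detection class probabilities.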

Code that fixes the participation bias:

Solution 1: Custom Layer


import numpy as np
import cv2 as cv

def test_custom_layer(self):
        class Softmax(object):
            def __init__(self, params, blobs):
                self.xstart = 0
                self.xend = 0
                self.ystart = 0
                self.yend = 0
            # Our layer receives two inputs. We need to crop the first input blob
            # to match a shape of the second one (keeping batch size and number of channels)
            def getMemoryShapes(self, inputs):
                inputShape, targetShape = inputs[0], inputs[1]
                batchSize, numChannels = inputShape[0], inputShape[1]
                height, width = targetShape[2], targetShape[3]
                self.ystart = (inputShape[2] - targetShape[2]) // 2
                self.xstart = (inputShape[3] - targetShape[3]) // 2
                self.yend = self.ystart + height
                self.xend = self.xstart + width
                return [[batchSize, numChannels, height, width]]
            def forward(self, inputs):
                # applies cosh element-wise to the cropped region of the first input (cosh is not a softmax or a log)
                return [np.cosh(inputs[0][:,:,self.ystart:self.yend,self.xstart:self.xend])]

        cv.dnn_registerLayer('SoftmaxCaffe', Softmax)
        # layer proto definition
        proto = '''
        name: "LogSoftmax"
        input: "input"
        input_shape
        {
            dim: 1
            dim: 2
            dim: 5
            dim: 5
        }
        input: "roi"
        input_shape
        {
            dim: 1
            dim: 2
            dim: 3
            dim: 3
        }
        layer {
          name: "Softmax"
          type: "SoftmaxCaffe"
          bottom: "input"
          bottom: "roi"
          top: "conv_prob"
        }'''

        net = cv.dnn.readNetFromCaffe(bytearray(proto.encode()))
        for backend, target in self.dnnBackendsAndTargets:
            if backend != cv.dnn.DNN_BACKEND_OPENCV:
                continue

            printParams(backend, target)

            net.setPreferableBackend(backend)
            net.setPreferableTarget(target)
            src_shape = [1, 2, 5, 5]
            dst_shape = [1, 2, 3, 3]
            inp = np.arange(0, np.prod(src_shape), dtype=np.float32).reshape(src_shape)
            roi = np.empty(dst_shape, dtype=np.float32)
            net.setInput(inp, "input")
            net.setInput(roi, "roi")
            out = net.forward()
            ref = np.cosh(inp[:, :, 1:4, 1:4])  # forward() applies cosh to the cropped region
            normAssert(self, out, ref)

        cv.dnn_unregisterLayer('SoftmaxCaffe')

Solution 2: Eliminate the participation bias by connecting the nodes to the bounding boxes

Redraw the bounding boxes with each node matched to its bounding box.

https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/Bounding-Boxes-No-Participation-Bias.png

https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/output_video_annotated_330_tagged.gif
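Matching nodes to bounding boxes, as in Solution 2, can be sketched with a greedy nearest-centre assignment between consecutive frames. This is an illustrative approach and not necessarily the one used to produce the images above: `box_centres`, `assign_ids`, and `max_dist` are hypothetical names and parameters.

```python
import numpy as np

def box_centres(boxes):
    """Centres of (xmin, ymin, xmax, ymax) boxes as an (N, 2) array."""
    b = np.asarray(boxes, dtype=float).reshape(-1, 4)
    return np.column_stack(((b[:, 0] + b[:, 2]) / 2, (b[:, 1] + b[:, 3]) / 2))

def assign_ids(prev_boxes, prev_ids, next_boxes, max_dist, next_free_id):
    """Give each next-frame box the ID of the nearest unused previous box.

    Boxes farther than `max_dist` from every unused previous box get a fresh ID.
    Returns (ids_for_next_boxes, updated_next_free_id).
    """
    prev_c, next_c = box_centres(prev_boxes), box_centres(next_boxes)
    ids, used = [], set()
    for c in next_c:
        best, best_d = None, max_dist
        for j, p in enumerate(prev_c):
            d = float(np.linalg.norm(c - p))
            if j not in used and d <= best_d:
                best, best_d = j, d
        if best is None:
            # no previous box is close enough: treat this as a new vehicle
            ids.append(next_free_id)
            next_free_id += 1
        else:
            used.add(best)
            ids.append(prev_ids[best])
    return ids, next_free_id

prev = [(0, 0, 10, 10), (100, 0, 110, 10)]
# the detector reports the same two vehicles in swapped order, plus a new one
nxt = [(102, 0, 112, 10), (1, 0, 11, 10), (300, 300, 310, 310)]
ids, next_free = assign_ids(prev, [0, 1], nxt, max_dist=20.0, next_free_id=2)
```

With IDs assigned by position rather than by output order, a vehicle keeps its node even when the detector lists it at a different index.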

Sampling bias: Proper randomization is not used during data collection.

In the experiment shown above, we consider only samples of data taken from the bounding boxes, not the entire data, which is why we apply statistical tests to the data.
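As a contrast to non-randomized collection, the sketch below draws frame indices uniformly at random instead of taking the first few frames. The frame count, sample size, and seed are arbitrary values chosen for the example, and `sample_frames` is a hypothetical helper.

```python
import random

def sample_frames(num_frames, k, seed=0):
    """Pick k distinct frame indices uniformly at random, sorted for readability."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    return sorted(rng.sample(range(num_frames), k))

# taking the first k frames only covers the start of the video
first_k = list(range(5))
# a random sample can come from anywhere in the video
random_k = sample_frames(300, 5, seed=42)
```

A fixed seed keeps the sample reproducible while still spreading it over the whole video.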

https://github.com/nscalo/ai-in-business/raw/main/Courses/Production-ML-Systems/Vehicle-Viz-Table.PNG

  • Group Attribution Bias

    • In-group Bias
    • Out-group Homogeneity Bias
  • Implicit Bias

    • Confirmation Bias
    • Experimenter's Bias

Fairness: Identifying Bias

https://developers.google.com/machine-learning/crash-course/fairness/identifying-bias

Fairness: Evaluating for Bias

https://developers.google.com/machine-learning/crash-course/fairness/evaluating-for-bias

https://colab.research.google.com/github/google/eng-edu/blob/main/ml/cc/exercises/intro_to_ml_fairness.ipynb
