# Understanding TensorFlow quantization API

###### tags: small_tpu

[TOC]

## GitHub repo

[link](https://github.com/WeiCheng14159/Small_TPU)

## TensorFlow Quantization Flow

```graphviz
digraph G {
  graph [rankdir="TB"];
  node [color=red fontsize=10 fontname="Verdana"];
  "Build NN model \n e.g. tf.keras.models.Sequential()"
  -> "Clone NN layers conditionally \n e.g. tf.keras.models.clone_model(model, clone_function=xx)"
  -> "clone_function decides which layers (e.g. Dense) get annotated for quantization \n e.g. tfmot.quantization.keras.quantize_annotate_layer(layer)"
  -> "quantize_apply applies the quantization config to the annotated model \n e.g. q_model = quantize_apply(annotated_model)"
  -> "The quantized model must be compiled \n e.g. q_model.compile(...)"
  -> "A quantization config class overrides 6 member functions \n e.g. tfmot.quantization.keras.QuantizeConfig"
  -> "Each member function works with quantizers \n e.g. tfmot.quantization.keras.quantizers.Quantizer"
  -> "Each Quantizer implements: \n 1. __init__ 2. build 3. __call__ 4. get_config";
}
```

### TF Quantization Python APIs

#### tfmot.quantization.keras.QuantizeConfig

[Documentation](https://www.tensorflow.org/model_optimization/api_docs/python/tfmot/quantization/keras/QuantizeConfig) [Source Code](https://github.com/tensorflow/model-optimization/blob/da9cca770e6a1abb55f6e38f9a9d47cc731dd6a9/tensorflow_model_optimization/python/core/quantization/keras/quantize_config.py#L24-L202)

A `QuantizeConfig` subclass must implement the following 6 member functions:

* `def get_weights_and_quantizers(self, layer):` returns the `(weight, quantizer)` pairs to quantize
* `def get_activations_and_quantizers(self, layer):` returns the `(activation, quantizer)` pairs to quantize
* `def set_quantize_weights(self, layer, quantize_weights):` replaces the layer's weights with the quantized ones
* `def set_quantize_activations(self, layer, quantize_activations):` replaces the layer's activations with the quantized ones
* `def get_output_quantizers(self, layer):` returns the quantizers applied to the layer's outputs
* `def get_config(self):` returns the config used to serialize the `QuantizeConfig`

Apart from `get_config`, these member functions revolve around quantizers ([Doc](https://www.tensorflow.org/model_optimization/api_docs/python/tfmot/quantization/keras/quantizers)), e.g.:
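* [AllValuesQuantizer](https://www.tensorflow.org/model_optimization/api_docs/python/tfmot/quantization/keras/quantizers/AllValuesQuantizer): range from the min/max of all values seen across batches
* [FixedQuantizer](https://www.tensorflow.org/model_optimization/api_docs/python/tfmot/quantization/keras/quantizers/FixedQuantizer): fixed range, independent of the data
* [LastValueQuantizer](https://www.tensorflow.org/model_optimization/api_docs/python/tfmot/quantization/keras/quantizers/LastValueQuantizer): range from the min/max of the last batch
* [MovingAverageQuantizer](https://www.tensorflow.org/model_optimization/api_docs/python/tfmot/quantization/keras/quantizers/MovingAverageQuantizer): range from a moving average of the per-batch min/max

These quantizers are built on TF fake-quantization ops such as [tf.quantization.fake_quant_with_min_max_args](https://www.tensorflow.org/api_docs/python/tf/quantization/fake_quant_with_min_max_args), which takes a fixed `min`/`max`; the data-driven quantizers, which keep their range in variables, go through the variant `tf.quantization.fake_quant_with_min_max_vars`, which takes `min`/`max` as tensors.

#### fake_quant_with_min_max_args

[Docs](https://www.tensorflow.org/api_docs/python/tf/quantization/fake_quant_with_min_max_args)

The quantization is "fake" because the output is still a float tensor: inputs are clamped to `[min, max]` (defaults `-6` and `6`), mapped onto `2^num_bits` integer levels (`num_bits` defaults to 8; `narrow_range=True` drops the lowest level), and mapped back to float. Before that, `min` and `max` are slightly adjusted ("nudged") so that `0.0` is exactly representable.

### TF Quantization C++ APIs

The op's kernel lives in `tensorflow/core/kernels/fake_quant_ops.cc`:

[Docs](https://www.tensorflow.org/api_docs/python/tf/quantization/fake_quant_with_min_max_args) [Source Code](https://github.com/tensorflow/tensorflow/blob/ac74e1746a28b364230072d4dac5a45077326dc2/tensorflow/core/kernels/fake_quant_ops.cc#L63-L98)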
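The fake-quant kernel first nudges `[min, max]` so that `0.0` lands exactly on an integer level, then clamps, rounds to the nearest level and maps back to float. The pure-Python sketch below reproduces that arithmetic as read from the kernel source. `nudge` and `fake_quant` are hypothetical helper names, and the sketch computes in Python double precision rather than the kernel's float32, so results may differ from TF in the last bits.

```python
import math


def nudge(min_val, max_val, num_bits=8, narrow_range=False):
    """Shift [min_val, max_val] so that 0.0 maps exactly onto an integer level.

    Returns (nudged_min, nudged_max, scale). Sketch of the kernel's nudging
    step, computed in double precision instead of float32.
    """
    if not min_val < max_val:
        raise ValueError("min_val must be smaller than max_val")
    quant_min = 1 if narrow_range else 0
    quant_max = (1 << num_bits) - 1
    scale = (max_val - min_val) / (quant_max - quant_min)
    zero_point_from_min = quant_min - min_val / scale
    if zero_point_from_min < quant_min:
        zero_point = quant_min  # whole range is positive: 0.0 becomes the minimum
    elif zero_point_from_min > quant_max:
        zero_point = quant_max  # whole range is negative: 0.0 becomes the maximum
    else:
        # Round to nearest; the value is non-negative here, so floor(x + 0.5) works.
        zero_point = math.floor(zero_point_from_min + 0.5)
    nudged_min = (quant_min - zero_point) * scale
    nudged_max = (quant_max - zero_point) * scale
    return nudged_min, nudged_max, scale


def fake_quant(x, min_val=-6.0, max_val=6.0, num_bits=8, narrow_range=False):
    """Quantize a float onto the nudged grid and return it as a float again."""
    nudged_min, nudged_max, scale = nudge(min_val, max_val, num_bits, narrow_range)
    clamped = min(max(x, nudged_min), nudged_max)
    level = math.floor((clamped - nudged_min) / scale + 0.5)  # integer level
    return level * scale + nudged_min


print(nudge(-10.25, 244.75))            # (-10.0, 245.0, 1.0)
print(fake_quant(3.6, -10.25, 244.75))  # 4.0
```

With `min=-10.25` and `max=244.75` the scale is exactly 1.0, which makes the nudge easy to see: the range shifts to `[-10.0, 245.0]` so that `0.0` falls on integer level 10.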
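The four built-in quantizers listed under the Python APIs differ mainly in how they choose the `[min, max]` range that is handed to a fake-quant op (FixedQuantizer simply uses a range chosen up front). Below is a conceptual pure-Python sketch of the three data-driven strategies as the tfmot docs describe them. The helper names are hypothetical, the moving average starts from the first batch's range, and `decay=0.5` is chosen only to keep the numbers readable. The real quantizers store min/max in non-trainable layer variables, update them only when `training=True`, and support per-axis and symmetric modes, all of which this sketch ignores.

```python
import math

# Hypothetical helpers: conceptual sketch of how the tfmot quantizers pick
# their (min, max) range from a stream of batches. Not the tfmot code.


def all_values_range(batches):
    """AllValuesQuantizer idea: min/max over every value in every batch."""
    lo, hi = math.inf, -math.inf
    for batch in batches:
        lo = min(lo, min(batch))
        hi = max(hi, max(batch))
    return lo, hi


def last_value_range(batches):
    """LastValueQuantizer idea: only the most recent batch counts."""
    last = batches[-1]
    return min(last), max(last)


def moving_average_range(batches, decay=0.5):
    """MovingAverageQuantizer idea: exponential moving average of per-batch min/max.

    Simplification: the average starts from the first batch's range.
    """
    ema_min, ema_max = min(batches[0]), max(batches[0])
    for batch in batches[1:]:
        ema_min = decay * ema_min + (1.0 - decay) * min(batch)
        ema_max = decay * ema_max + (1.0 - decay) * max(batch)
    return ema_min, ema_max


batches = [[-1.0, 2.0], [-4.0, 0.5], [-0.5, 1.0]]
print(all_values_range(batches))      # (-4.0, 2.0)
print(last_value_range(batches))      # (-0.5, 1.0)
print(moving_average_range(batches))  # (-1.5, 1.125)
```

A single outlier batch (here the `-4.0`) widens the all-values range permanently, is forgotten by the last-value strategy after one batch, and is only partially absorbed by the moving average.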