ReLeQ: An Automatic Reinforcement Learning Approach for Deep Quantization of Neural Networks

Authors: Amir Yazdanbakhsh∗, Ahmed T. Elthakeb∗, Prannoy Pilligundla, FatemehSadat Mireshghallah, Hadi Esmaeilzadeh (ACT Lab, UCSD; Google Brain)

This paper presents a reinforcement learning based approach to make deep neural networks computationally cheaper and less memory-hungry. Reducing the bitwidth of the weights (known as deep quantization) normally requires manual effort, hyper-parameter tuning, and re-training. The authors propose an end-to-end framework, called ReLeQ, that automates the deep quantization process without compromising the classification accuracy of the deep neural network.

Intuition of the addressed research problem:
Quantizing all the layers to the same bitwidth results in sub-optimal accuracy. The intuition the authors provide is that each layer in a neural network plays a different role and therefore has unique properties in terms of its weight distribution. Over-quantizing a sensitive layer to match the rest can lead to sub-optimal accuracy as well as longer re-training and fine-tuning times.

Main Points:
  • This paper aims to minimize the average bitwidth of the neural network layers while keeping the classification accuracy close to the full-precision accuracy.
  • The quantization bitwidth of each layer depends on the previous layers' quantization at each step. At the end of each step, the reward that the ReLeQ agent receives is determined by both the inference accuracy at the current quantization levels and the average of the quantized bitwidths (see the sketch after this list).
  • An LSTM network is used to capture the dependency between layers; its state includes the previous layers' bitwidths, layer indices, layer sizes, and other statistics of the weight distributions such as the standard deviation.
  • The ReLeQ agent chooses quantization bitwidths from a discrete set (the authors use 1, 2, 3, 4, and 5 bits), and the reward is computed for the resulting configuration.
  • ReLeQ was evaluated on several neural networks, quantizing them to an average bitwidth of 2.25 on MNIST, 5 on CIFAR10, and 4 on SVHN.
  • Across many experiments, it was observed that the bitwidth probability profiles are not the same across all layers. Hence, the authors argue that the ReLeQ agent is able to distinguish between layers by learning how sensitive each layer is with respect to accuracy, and to choose bitwidths accordingly.
  • The authors add a quantization-friendly regularization term to the optimization objective. The goal is to find the solution closest to the desired accuracy while the weight values end up nearly matching the desired quantization levels (a minimal sketch of such a regularizer also follows this list).
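
As a rough illustration of how the pieces above fit together, here is a minimal Python sketch of one agent step. The function names, feature scaling, and the 0.5/0.5 reward weighting are assumptions made for this sketch, not the paper's exact formulation:

```python
# Illustrative sketch of one ReLeQ-style step (assumed names and weights, not the
# paper's exact equations): per-layer state features, a discrete bitwidth action,
# and a reward that balances accuracy retention against average bitwidth.
import random
import statistics

BITWIDTHS = [1, 2, 3, 4, 5]  # discrete action set mentioned above

def layer_state(layer_index, num_layers, layer_size, weights, previous_bits):
    """Per-layer features fed to the policy (here just a plain list)."""
    return [
        layer_index / num_layers,                         # normalized layer index
        layer_size,                                       # number of weights in the layer
        statistics.stdev(weights),                        # spread of the weight distribution
        sum(previous_bits) / max(len(previous_bits), 1),  # average bitwidth chosen so far
    ]

def reward(quantized_acc, full_precision_acc, chosen_bits, max_bits=max(BITWIDTHS)):
    """Higher reward for staying near full-precision accuracy with fewer bits."""
    accuracy_ratio = quantized_acc / full_precision_acc
    quantization_gain = 1.0 - (sum(chosen_bits) / len(chosen_bits)) / max_bits
    return 0.5 * accuracy_ratio + 0.5 * quantization_gain  # the 0.5/0.5 split is arbitrary

# Example: pick an action for layer 2 of a 4-layer network and score the episode so far.
bits_so_far = [2, 3]
action = random.choice(BITWIDTHS)
state = layer_state(2, 4, 1200, [0.1, -0.2, 0.05, 0.3], bits_so_far)
print(state, action, round(reward(0.97, 0.99, bits_so_far + [action]), 3))
```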
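
The quantization-friendly regularization term can be pictured as an extra loss that pulls each weight toward the nearest level of its layer's quantization grid. The following PyTorch sketch works under that assumption; it is not the paper's exact term, and the helper name and the uniform symmetric grid are choices made purely for illustration:

```python
import torch

def quantization_friendly_penalty(weights: torch.Tensor, bits: int) -> torch.Tensor:
    """Penalize each weight's squared distance to the nearest level of a uniform
    symmetric k-bit grid (a sketch of a quantization-friendly regularizer)."""
    scale = weights.abs().max().clamp(min=1e-8)
    step = 2 * scale / (2 ** bits - 1)   # spacing between adjacent quantization levels
    nearest = torch.round(weights / step) * step
    return ((weights - nearest) ** 2).mean()

# Hypothetical usage inside a training step:
#   loss = task_loss + lambda_q * sum(
#       quantization_friendly_penalty(p, b) for p, b in zip(model.parameters(), layer_bits))
```

Weighting this term against the task loss is what allows the accuracy/quantization trade-off mentioned in the pros below to be controlled.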


Pros:
  • The research problem this paper targets is broad and important: one of the major challenges of deep learning today (apart from the explainability of neural networks) is the huge amount of time networks take to train, which is closely tied to the amount of memory they consume.
  • The paper provides a novel, automated reinforcement learning approach for selecting bitwidths when quantizing the weights of neural networks.
  • ReLeQ's reward formulation accounts for both accuracy and quantization, enabling the agent to deeply quantize the network with minimal loss in accuracy.
  • Since the optimization objective includes a quantization-friendly regularization term, it can be tuned to control the trade-off between accuracy and quantization.


Cons:
  • The ReLeQ agent must be given a discrete set of quantization bitwidths from which it chooses in order to obtain a reward. However, it is not obvious how to decide which quantization levels are plausible options for the agent while maintaining good accuracy for a given neural network.
  • The authors set a threshold to prevent unnecessary or undesirable exploration and speed up the agent's convergence. However, they do not explain how a region of the design space is determined to be unnecessary. For a non-convex objective function, a better optimum may lie in a region beyond the restricted design space that the agent never even considers.
  • To speed up the learning procedure, the authors perform a shortened re-training process after every action at a given step and use the resulting validation accuracy to formulate the reward. This approach may settle on a suboptimal local optimum and miss better ones.
  • In the evaluation, the authors report that the LSTM enables the ReLeQ agent to converge 1.33x faster than without it. However, they do not report how long the LSTM itself takes to train, so it is not clear that the overall framework is faster.
  • The authors conclude that, among all the available state-embedding features, a layer's size and the standard deviation of its weights are the most important. However, this conclusion is based only on ReLeQ's convergence behavior on the MNIST dataset with LeNet (convolution layer 1); they do not analyze the importance of the different state-embedding features on other neural networks and datasets.

- Akash Kulkarni 


Comments

  1. Very thorough analysis! You raised some excellent points. I think the search for more global minima might compromise the performance/power objectives. On this front, I would have liked to see how the resulting system improved training time and/or power consumption on a real device.
