Coordinated Management of Multiple Interacting Resources in Chip Multiprocessors: A Machine Learning Approach

Authors: Ramazan Bitirgen, Engin Ipek, Jose F. Martinez


Motivation:
Sharing system resources efficiently on chip multiprocessor (CMP) platforms is key to achieving high utilization and enforcing system-level performance objectives. Unrestricted sharing of microarchitectural resources can result in destructive interference and unacceptable performance. However, while several proposals address the management of a single microarchitectural resource, work on coordinated management of multiple interacting resources on CMPs is scarce. To fill this gap, this paper proposes a machine-learning-based framework that manages multiple shared CMP resources in a coordinated fashion to enforce higher-level performance objectives. The mechanism consists of per-application hardware performance models and a global resource manager.


Positive Points:
1. The global resource manager aggregates each application's predicted performance into a global, system-level performance prediction, which lets it allocate resources more accurately and efficiently.
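As a rough illustration of this aggregation step (my own simplification, not the paper's exact formulation), per-application predicted IPCs under a candidate allocation could be combined into a single system-level objective such as weighted speedup:

```python
def weighted_speedup(predicted_ipcs, alone_ipcs):
    """Aggregate per-application predicted IPCs into one system-level
    metric (weighted speedup). Hypothetical sketch; the paper's global
    manager can target other performance objectives as well."""
    return sum(p / a for p, a in zip(predicted_ipcs, alone_ipcs))

# Example: two apps predicted to run at 80% and 50% of their
# standalone IPC under a candidate resource allocation.
score = weighted_speedup([0.8, 1.0], [1.0, 2.0])  # 0.8 + 0.5 = 1.3
```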

2. It employs the coefficient of variation (CoV) to estimate prediction error, and shows an empirical prediction-error cutoff of 9%, which can serve as useful guidance or a calibration starting point.
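For reference, the CoV is just the standard deviation normalized by the mean. A minimal sketch of screening an ensemble's predictions against the paper's empirical 9% cutoff (the variable names and the discard policy here are my own assumptions):

```python
import statistics

def cov(predictions):
    """Coefficient of variation of an ensemble's predictions."""
    mean = statistics.mean(predictions)
    return statistics.pstdev(predictions) / mean

# Down-weight or discard a prediction whose ensemble members disagree
# by more than the empirical 9% cutoff reported in the paper.
CUTOFF = 0.09
ensemble = [1.02, 0.98, 1.05, 0.95]
trustworthy = cov(ensemble) < CUTOFF
```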

3. Determining a near-optimal distribution in a large allocation space requires an efficient search mechanism. The paper provides a modified version of stochastic hill climbing, which avoids exhaustively searching all possible combinations of resource allocations.
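A minimal sketch of stochastic hill climbing over a discrete allocation space (the allocation encoding, neighbor move, and toy scoring function are my own simplifications, not the paper's modified algorithm):

```python
import math
import random

def stochastic_hill_climb(score, n_apps, total_units, iters=200, seed=0):
    """Search for a good split of `total_units` resource units among
    `n_apps` applications: repeatedly move one unit between two randomly
    chosen applications and keep the move only if the score improves."""
    rng = random.Random(seed)
    # Start from an even split, giving any remainder to the first app.
    alloc = [total_units // n_apps] * n_apps
    alloc[0] += total_units - sum(alloc)
    best = score(alloc)
    for _ in range(iters):
        src, dst = rng.sample(range(n_apps), 2)
        if alloc[src] == 0:
            continue
        candidate = alloc[:]
        candidate[src] -= 1
        candidate[dst] += 1
        s = score(candidate)
        if s > best:  # greedy: keep only improving moves
            alloc, best = candidate, s
    return alloc, best

# Toy objective with diminishing returns per application.
alloc, best = stochastic_hill_climb(
    lambda a: sum(math.sqrt(x) for x in a), n_apps=4, total_units=16)
```

Because only randomly generated neighbors are scored, the number of evaluations grows with the iteration budget rather than with the size of the full allocation space.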

4. It provides comprehensive background and compares its mechanism with related work on cache partitioning, bandwidth management, and power management.


Negative Points:
1. The artificial neural network model uses four hidden units, but the paper does not explain why four hidden units work best, nor does it clarify the number of nodes in each hidden layer. I therefore think it would be hard to reproduce the experiments from the paper alone.

2. The paper indicates that a longer decision-making interval amortizes system reconfiguration overhead and enables more sophisticated allocation algorithms, while a shorter interval permits faster reaction to dynamic changes. However, the paper does not state what decision-making interval it uses, or what mechanism could be used to find the best interval.

3. The models need to be periodically retrained on fresh samples at runtime, and the paper collects sampled training data for this purpose. I wonder whether an online training algorithm could solve this problem instead.


Name --- Taihui Li

Comments

  1. Good discussion. I think ANN design is a black art and highly empirical -- though this may be changing (can someone post papers that learn the network itself?). The online nature of the work is exciting, though one wonders what kinds of applications/time-scales make sense for this. The paper is written for desktop environments, though it would seem more feasible for longer-running applications (e.g., server workloads)?

  2. This is one approach to learning the network itself:
    https://ai.googleblog.com/2017/05/using-machine-learning-to-explore.html
    and link to paper https://arxiv.org/abs/1611.01578

