A Machine Learning Approach to Mapping Streaming Workloads to Dynamic Multicore Processors

February 18, 2019

Authors: Paul-Jules Micolet, Aaron Smith, Christophe Dubach

Motivation:

No accurate, automated solution exists for optimizing the performance of streaming applications at both the hardware and software levels. This paper presents a machine learning technique that achieves near-optimal performance by tuning software-level threads and hardware-level cores. It analyzes how the number of threads and the number of cores affect an application's performance, then uses static code features of the application to predict an appropriate thread count and core composition for a Dynamic Multicore Processor.

Positive Points:

[1] The paper analyzes the impact of thread partitioning on the performance of various StreamIt applications on both individual cores and composed cores. It concludes that an application's performance follows the same trend regardless of core composition. This reduces the problem of jointly finding the thread count and the core composition to two separate problems that can be modeled individually.

[2] It uses a k-nearest neighbors (KNN) model to predict the optimal number of threads and linear regression to predict the optimal core composition for a thread. It extracts over 50 features from StreamIt applications and keeps only those that are highly correlated with the optimal number of threads, which keeps the model simple. It also synthesizes benchmarks to obtain enough data points for training.

[3] It unrolls loops in the program code, which facilitates Thread Level Parallelism (TLP) and Instruction Level Parallelism (ILP), and it estimates the optimal partitioning of the program by statically analyzing the code.
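The two-stage idea in positive point [2] can be sketched in plain Python. This is only an illustration of the general technique (correlation-based feature selection, a KNN vote for the thread count, and a one-variable least-squares fit for cores), not the authors' implementation. The feature values, the 0.5 threshold, k=3, and every function name below are assumptions made up for the example.

```python
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_features(rows, targets, threshold):
    """Indices of feature columns whose correlation with the target exceeds threshold."""
    return [j for j in range(len(rows[0]))
            if pearson([r[j] for r in rows], targets) > threshold]


def knn_predict(train_rows, train_targets, query, cols, k=3):
    """Majority vote among the k training rows closest to query (Euclidean, selected columns only)."""
    def dist(row):
        return sqrt(sum((row[j] - query[j]) ** 2 for j in cols))
    nearest = sorted(range(len(train_rows)), key=lambda i: dist(train_rows[i]))[:k]
    votes = Counter(train_targets[i] for i in nearest)
    return votes.most_common(1)[0][0]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


# Hypothetical training data: three static features per program, plus the
# best thread count found by exhaustive search. All numbers are invented.
rows = [[4, 10, 7], [6, 12, 1], [12, 30, 5], [14, 28, 2], [20, 50, 6], [22, 55, 3]]
threads = [2, 2, 4, 4, 8, 8]

cols = select_features(rows, threads, threshold=0.5)      # drops the weakly correlated column
predicted_threads = knn_predict(rows, threads, [13, 29, 0], cols, k=3)

# Hypothetical per-thread work estimate versus best core count.
slope, intercept = fit_line([10, 20, 30, 40], [1, 2, 3, 4])
```

Splitting the prediction into two independent models mirrors the observation in point [1] that thread count and core composition can be treated as separate problems.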

Negative Points:

[1] The paper selects features that are highly correlated with the optimal number of threads, but it considers only positive correlation and ignores negatively correlated features for the KNN model. It also assumes that the features have similar ranges, and therefore uses Euclidean distance for KNN without normalizing the features.

[2] The model predicts the exact optimal thread count only 33% of the time. Accuracy improves to 67% when predictions up to two threads away from the optimum are counted as correct, but such predictions perform 10-20% below the optimum.

[3] As future work, these methods could be extended to handle multiple applications running concurrently, optimizing the overall number of threads and the core composition using preemptive scheduling.

[4] It uses the default partitioning scheme of the StreamIt compiler. This partitioning could be improved using [1], which could lead to better overall performance.

[5] Will the method work if the rates of the pipes in programs are dynamic? Do we need runtime analysis of the program?
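Negative points [1] and [2] can be made concrete with a short sketch. The first helper shows min-max scaling, one common way to keep a wide-ranged feature from dominating Euclidean distance; the second computes accuracy under a tolerance, the kind of metric behind the "within two threads" figure. The helper names and all numbers are invented for illustration and are not the paper's data.

```python
def min_max_scale(rows):
    """Rescale every feature column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)]
            for row in rows]


def accuracy_within(predicted, optimal, tolerance):
    """Fraction of predictions at most `tolerance` threads away from the optimum."""
    hits = sum(abs(p - o) <= tolerance for p, o in zip(predicted, optimal))
    return hits / len(optimal)


# Two hypothetical features with very different ranges.
raw = [[1, 1000], [2, 3000], [3, 2000]]
scaled = min_max_scale(raw)

# Made-up predictions versus made-up optimal thread counts.
predicted = [2, 4, 6, 8, 3, 5]
optimal = [2, 5, 4, 8, 6, 12]
exact = accuracy_within(predicted, optimal, tolerance=0)
within_two = accuracy_within(predicted, optimal, tolerance=2)
```

Without scaling, the second column in `raw` would contribute differences in the thousands and effectively decide every nearest-neighbor lookup on its own.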

- Saurabh Gupta

Comments

Unknown, February 19, 2019 at 6:44 AM
Good analysis. You're correct that the paper doesn't consider dynamic stream rates; in fact, very little analysis of the applications themselves is presented. One question is what types of streaming applications this methodology is likely to be effective for. Also, feature extraction seemed to be a manual process. Would the same features be predictive for other applications? How general is the StreamIt application suite?

CSci 8980 Machine Learning in Computer Systems
