Neural Network Meets DCN: Traffic-driven Topology Adaptation with Deep Learning


Authors: Mowei Wang, Yong Cui, Shihan Xiao, Xin Wang, Dan Yang, Kai Chen, Jun Zhu


Motivation:
Adding new network components (e.g., optical circuit switches or wireless radios) to data center networks (DCNs) has recently become a common way to improve DCN performance. However, finding the optimal (or near-optimal) topology configuration to support dynamic traffic demands is a key challenge. To address it, this paper proposes xWeaver, which searches for the global topology that best meets the overall traffic demands in a practical DCN. xWeaver is a traffic-driven deep learning system with three key design features: an expressive learning framework, data-driven feature extraction, and traffic-topology mapping learning. Experiments show that xWeaver outperforms other solutions such as Weight-matching and Sample, and that it can update its model parameters smoothly for new traffic without retraining everything from scratch.


Positive Points:
1. The parameters of xWeaver can be trained offline on historical traffic traces. This reduces the model complexity, improves training efficiency, and means the relatively long time needed to generate traffic-topology samples can be tolerated. Moreover, xWeaver can easily be applied to online topology configuration, because topology inference through the neural network is fast and the parameters can be updated smoothly when new data becomes available.

2. In its scoring module, it uses two separate multi-layer convolutional neural networks, T-SCNN and P-SCNN, to perform feature extraction independently. This reduces the model complexity and makes it possible to train the neural network model within a reasonable time and with a limited number of data samples.

3. It uses a CRF (Conditional Random Field) module to embed prior human knowledge, such as the link relationships and link constraints of the target DCN, into the FPNN neural network to further improve FPNN's performance.

4. It provides a rich set of simulation scenarios. For example, to analyze the performance of its scoring module, it compares four different neural network solutions: GNN, SCNN, FNN, and SFNN. To illustrate the topology performance of xWeaver, it compares against three reference solutions: Weight-matching, Sample, and Optimal. It also evaluates the learning performance over different traffic patterns, including application traffic patterns, hot-spot traffic patterns, and a hybrid traffic trace.

5. It evaluates several performance metrics, optimization objectives, and score definitions, such as minimizing the completion time of the demands (CTD), minimizing the flow completion time (FCT) for small flows while maximizing the throughput for large flows, and the job completion time of the benchmark Hadoop Terasort application.
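
To make positive point 2 more concrete, here is a minimal sketch of a two-branch scoring model: one branch extracts features from the traffic matrix, the other from the topology matrix, and a joint head combines them into a single score. This is not the paper's architecture. The class name `TwoBranchScorer`, the single 2x2 kernel per branch, the linear head, and the random (untrained) weights are all assumptions made for illustration, and the mapping of the two branches onto T-SCNN and P-SCNN is only a guess at the general idea.

```python
import random

def conv2d_valid(x, k):
    """Valid 2-D cross-correlation of matrix x with kernel k, followed by ReLU."""
    n, m = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    out = []
    for i in range(n - kh + 1):
        row = []
        for j in range(m - kw + 1):
            s = sum(x[i + a][j + b] * k[a][b]
                    for a in range(kh) for b in range(kw))
            row.append(max(0.0, s))
        out.append(row)
    return out

def flatten(mat):
    return [v for row in mat for v in row]

def random_kernel(rng, size=2):
    return [[rng.uniform(-1, 1) for _ in range(size)] for _ in range(size)]

class TwoBranchScorer:
    """Hypothetical scorer: separate feature extractors, one shared head."""

    def __init__(self, n, seed=0):
        rng = random.Random(seed)
        # Each branch has its own kernel, so features are extracted independently.
        self.traffic_kernel = random_kernel(rng)
        self.topology_kernel = random_kernel(rng)
        # A 2x2 kernel on an n x n input yields (n - 1)^2 features per branch.
        self.head = [rng.uniform(-1, 1) for _ in range(2 * (n - 1) ** 2)]

    def score(self, traffic, topology):
        t_feat = flatten(conv2d_valid(traffic, self.traffic_kernel))
        p_feat = flatten(conv2d_valid(topology, self.topology_kernel))
        joint = t_feat + p_feat  # concatenate the two feature vectors
        return sum(w * f for w, f in zip(self.head, joint))

# Toy 3-node example: traffic demands and a candidate topology (adjacency matrix).
traffic = [[0, 5, 1], [2, 0, 0], [0, 3, 0]]
topology = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
scorer = TwoBranchScorer(n=3)
s = scorer.score(traffic, topology)
```

Because the weights are untrained, the score itself is meaningless here; the point is only the structure, where each input gets its own small feature extractor instead of one large network over the concatenated inputs.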


Negative Points:
1. Training the scoring neural network requires a large number of traffic-topology-score (TTS) samples. The paper uses a flow-level simulator to generate those samples, but it does not describe the detailed simulator settings (e.g., the network topology and the number of switches).

2. The paper says it uses SCNN to extract the critical features in traffic and topology that are related to the self-defined optimization objective, and FPNN to capture the essential interactions between traffic and topology configurations. However, it does not specify the detailed settings of those neural networks, such as the number of layers, the number of cells in each layer, and the activation functions. The source code has not been released either. It is therefore very hard for others to reproduce the experiments, and hard to know whether the chosen settings are optimal.
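
For readers unfamiliar with the idea, the following is a minimal sketch of what generating TTS samples could look like. Since the paper does not give its simulator settings, everything here is assumed: the `TTSSample` record, the rule that each node has at most one configurable link, and above all `toy_score`, which is a trivial stand-in for a flow-level simulator and not the paper's scoring method.

```python
import random
from dataclasses import dataclass

@dataclass
class TTSSample:
    traffic: list    # n x n demand matrix
    topology: list   # n x n symmetric 0/1 adjacency matrix
    score: float     # quality of the topology for this traffic

def random_traffic(rng, n, max_demand=10):
    """Random demand matrix with a zero diagonal."""
    return [[0 if i == j else rng.randint(0, max_demand) for j in range(n)]
            for i in range(n)]

def random_matching(rng, n):
    """Random topology in which each node has at most one link (assumed constraint)."""
    nodes = list(range(n))
    rng.shuffle(nodes)
    topo = [[0] * n for _ in range(n)]
    for a, b in zip(nodes[0::2], nodes[1::2]):
        topo[a][b] = topo[b][a] = 1
    return topo

def toy_score(traffic, topology):
    """Stand-in for a simulator: fraction of total demand carried on direct links."""
    n = len(traffic)
    total = sum(sum(row) for row in traffic)
    served = sum(traffic[i][j] for i in range(n) for j in range(n)
                 if topology[i][j])
    return served / total if total else 0.0

def generate_samples(count, n, seed=0):
    rng = random.Random(seed)
    samples = []
    for _ in range(count):
        traffic = random_traffic(rng, n)
        topology = random_matching(rng, n)
        samples.append(TTSSample(traffic, topology, toy_score(traffic, topology)))
    return samples

samples = generate_samples(count=5, n=4)
```

Even in this toy form, the number of nodes, the demand distribution, and the link constraint all shape the resulting dataset, which is why the missing simulator settings matter for reproducibility.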



Comments

  1. I agree the paper is very interesting but negative point #2 was an issue. The paper needed to give concrete examples of what the transferred represented looked like. It was also unclear if this was completely independent of the supplied topology.
