Scalability study of Deep Learning algorithms in high performance computer infrastructures
Advisor / Supervisor
Student
Sastre Cabot, Francesc
Document type
Official master's final project
Date
2017
Rights
Open access
Publisher
Universitat Politècnica de Catalunya
Abstract
Deep learning algorithms base their success on building high-capacity models with millions of parameters that are tuned in a data-driven fashion. These models are trained by processing millions of examples, so the development of more accurate algorithms is usually limited by the throughput of the computing devices on which they are trained.
This project shows how the training of a state-of-the-art neural network for computer vision can be parallelized on a distributed GPU cluster, the Minotauro GPU cluster at the Barcelona Supercomputing Center, using the TensorFlow framework.
In this project, two approaches to distributed training are used: synchronous and mixed-asynchronous. The effect of distributing the training process is examined from two points of view. First, the scalability of the task and its performance in the distributed setting are analyzed. Second, the impact of the distributed training methods on the final accuracy of the models is studied.
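The abstract does not include implementation details; as a minimal illustrative sketch of synchronous data-parallel training of the kind described, the following Python snippet uses TensorFlow's tf.distribute.MirroredStrategy to place one model replica per GPU on a node and average gradients across replicas at each step. The strategy API, the ResNet50 model, and the hyperparameters are assumptions for illustration (the project itself used the distributed TensorFlow mechanisms available in 2017), not the thesis code.

# Sketch of synchronous data-parallel training on the GPUs of one node.
# Illustrative only: model, optimizer settings, and input pipeline are
# placeholders, not the configuration used in the thesis.
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # one replica per visible GPU
print("Replicas in sync:", strategy.num_replicas_in_sync)

# Variables must be created inside the strategy scope so that each step
# the per-replica gradients are averaged and applied identically on all
# replicas (synchronous SGD).
with strategy.scope():
    model = tf.keras.applications.ResNet50(weights=None, classes=1000)
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.1, momentum=0.9),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )

# `dataset` stands in for an ImageNet-style tf.data input pipeline; the
# global batch is split evenly across the replicas.
# model.fit(dataset, epochs=90)

Because every replica applies the same averaged gradient each step, a synchronous scheme behaves like large-batch single-node training, which is consistent with the accuracy result reported below; a mixed-asynchronous scheme relaxes some of this synchronization (typically across nodes) to trade gradient staleness for throughput.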
The results show improvements in both areas. On the one hand, the experiments show promising speedups: training time decreases from 106 hours to 16 hours with mixed-asynchronous training and to 12 hours with synchronous training. On the other hand, increasing the number of GPUs in a single node raises the throughput (images per second) in a near-linear way. Moreover, with the synchronous methods the accuracy of single-node training is maintained.
Collaborating institution
Barcelona Supercomputing Center
