Implementation of the nDES evolutionary strategy on multiple graphics processing units
Defense Date:
Classical gradient-based optimization methods used for training neural networks limit the class of architectures that can be used and are the source of several issues, including the exploding and vanishing gradient problems. One approach to overcoming them is to use methods that do not rely on the gradient. This thesis focuses on the nDES evolutionary strategy, which belongs to this group of gradient-free methods. The original implementation assumes computation on a single graphics processing unit; however, the algorithm is scalable, and the thesis introduces an implementation parallelized across multiple graphics cards. The thesis also reviews the literature and describes several popular neural network architectures, classical optimization methods, and methods rooted in metaheuristics (including evolution strategies). The nDES algorithm is described in detail, with pseudocode included, and the details of the parallelized implementation are presented. Moreover, the thesis describes the conducted experiments, which show a speed-up over the original single-GPU method and compare nDES with classical gradient-based methods. Finally, prospective directions for further development are discussed.
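To illustrate the kind of multi-GPU data-parallel split referred to above, the following is a minimal sketch, assuming a PyTorch-style setting; the helper fitness_fn and the chunk-per-device scheme are hypothetical and are not the thesis's actual implementation.

import torch

def evaluate_population(population, fitness_fn, devices=None):
    # Illustrative only: split fitness evaluations of a population
    # (one candidate solution per row) across available CUDA devices.
    # fitness_fn is a hypothetical function mapping a batch of candidates
    # on a given device to a 1-D tensor of fitness values.
    if devices is None:
        n_gpus = torch.cuda.device_count()
        devices = [torch.device(f"cuda:{i}") for i in range(n_gpus)] or [torch.device("cpu")]
    chunks = population.chunk(len(devices), dim=0)
    results = []
    for chunk, dev in zip(chunks, devices):
        # Move each slice of the population to its device and evaluate it there.
        results.append(fitness_fn(chunk.to(dev, non_blocking=True)))
    # Gather the partial fitness vectors back on the first device.
    return torch.cat([r.to(devices[0]) for r in results], dim=0)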
