Training set-processing neural models

Defense Date:

The primary objective of this thesis is to describe and evaluate the performance of models that take sets as inputs, the challenges they face in comparison with other models, and the problems they were designed to solve. The main focus is the Set Transformer, which relies on the attention mechanism. The thesis addresses the question of whether adding complexity to the attention mechanism enhances the model's performance. Several experiments were carried out: the Set Transformer was tested on image classification and on generating movie recommendations, with a DeepSets model included for comparison. The models were implemented in Python using the TensorFlow and Keras libraries, and the experiments were run on the Google Colab platform, which allows Python code to be written and executed directly in the browser and provides access to GPUs. Additional neural networks were added to the attention implementation, and the modified mechanism was used either in the input layer only or in all layers of the model. These networks varied in size and complexity in order to find the setup that gave the best results. Modifying the input layer did indeed enhance performance on tabular datasets; when images were used as input, however, the results were not significantly better.
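
To illustrate the kind of modification described above, the following is a minimal sketch, in the thesis's own TensorFlow/Keras stack, of a Set Transformer style multihead attention block with an extra feed-forward network inserted before the attention call. The class name `ModifiedMAB`, the layer sizes, and the placement of the extra network are assumptions made for illustration, not the thesis's actual implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers

class ModifiedMAB(layers.Layer):
    """Sketch of a multihead attention block (MAB) with an added network.

    The extra MLP applied to the queries stands in for the "additional
    neural networks" mentioned in the abstract; its size and position
    are illustrative assumptions.
    """

    def __init__(self, dim, num_heads, extra_units=64, **kwargs):
        super().__init__(**kwargs)
        # Hypothetical extra network applied to the queries before attention.
        self.extra_mlp = tf.keras.Sequential([
            layers.Dense(extra_units, activation="relu"),
            layers.Dense(dim),
        ])
        self.attn = layers.MultiHeadAttention(num_heads=num_heads, key_dim=dim)
        self.norm1 = layers.LayerNormalization()
        self.norm2 = layers.LayerNormalization()
        # Row-wise feed-forward network, as in the original Set Transformer block.
        self.rff = layers.Dense(dim, activation="relu")

    def call(self, x, y):
        q = self.extra_mlp(x)                    # assumed modification point
        h = self.norm1(x + self.attn(q, y, y))   # attention + residual
        return self.norm2(h + self.rff(h))       # feed-forward + residual

# Example: self-attention over a batch of 8 sets, each with 10 elements of
# dimension 32; the output keeps the same shape, so the block can be stacked
# in every layer or used only at the input, as the experiments compare.
mab = ModifiedMAB(dim=32, num_heads=4)
x = tf.random.normal((8, 10, 32))
out = mab(x, x)  # shape (8, 10, 32)
```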