PRINCIPLE OF LONG-AND-CLOSE-RANGE ACTION IN STRUCTURIZATION PROBLEMS AND TRAINING OF ARTIFICIAL NEURAL NETWORKS
DOI: https://doi.org/10.31891/2307-5732-2024-337-3-54

Keywords: Artificial Neural Networks, long-and-close-range action principle, unsupervised learning, nonlinear convolutional networks, parametric sigmoid, transition matrices

Abstract
Classical artificial neural networks in the general case require learning a significant number of transition-matrix parameters between adjacent layers. The main idea of this work is to set these matrices once and rigidly according to certain "reasonable considerations", so that only the neurons of the network themselves need to be trained. Here the principle of long-and-close-range action plays the role of such a "reasonable consideration". The essence of this principle is that the closer two layers are to each other, the stronger the influence between their neurons. A radial topology is proposed for the geometric arrangement of the layers. This arrangement ensures the balance condition of the network: the total effect of the neurons of the previous layer on each neuron of the adjacent layer is constant, regardless of the neurons' ordinal numbers.
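The fixed, distance-dependent transition matrix described above can be sketched as follows. This is only an illustration of the idea, not the article's actual construction: the exponential decay law, the placement of neurons at equal angles on concentric circles, and the row normalization that enforces the balance condition are all assumptions made here for concreteness.

```python
import numpy as np

def fixed_transition_matrix(n_prev, n_next, decay=1.0):
    """Hypothetical distance-based transition matrix (a sketch).

    Neurons of each layer are placed at equidistant angles on
    concentric circles (a radial topology); the connection weight
    decays exponentially with the angular distance between neurons,
    so closer neurons influence each other more strongly.
    """
    # Angular positions of neurons on the two circles.
    a_prev = 2 * np.pi * np.arange(n_prev) / n_prev
    a_next = 2 * np.pi * np.arange(n_next) / n_next
    # Pairwise angular distance, wrapped into [0, pi].
    d = np.abs(a_next[:, None] - a_prev[None, :])
    d = np.minimum(d, 2 * np.pi - d)
    w = np.exp(-decay * d)  # closer => stronger influence
    # Normalize each row so the total incoming effect is the same
    # for every neuron of the next layer (the "balance" condition).
    return w / w.sum(axis=1, keepdims=True)
```

Because the matrix depends only on layer geometry, it is computed once and never trained, which is the point of the approach.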
In mathematical terms, the proposed artificial neural networks based on the long-and-close-range action principle can be classified as a specific subclass of nonlinear convolutional networks. The nonlinear convolutions are implemented by means of kernel-based discrete transforms, in which the transition matrices of connections between adjacent layers serve as the transform kernels.
As activation functions, parametric sigmoids are considered, which have only one free parameter: the nonlinearity coefficient.
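A one-parameter sigmoid of this kind might look as follows. The specific logistic form chosen here is an assumption; the abstract states only that the activation has a single free parameter, the nonlinearity coefficient (called `alpha` below).

```python
import numpy as np

def parametric_sigmoid(x, alpha=1.0):
    """One-parameter sigmoid; `alpha` is the nonlinearity coefficient.

    Larger alpha makes the transition around x = 0 steeper;
    as alpha -> 0 the function flattens toward the constant 0.5.
    """
    return 1.0 / (1.0 + np.exp(-alpha * np.asarray(x, dtype=float)))
```

With a single scalar per activation, the trainable state of the whole network stays small once the transition matrices are fixed.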
The developed algorithms and programs are applied to a problem of unsupervised learning, namely clustering. The well-known MNIST set of handwritten digits was chosen as the test data set. The problem was solved on an ordinary computer using only the CPU (no GPU was used).
Validation of the obtained distribution of 50,000 MNIST samples over 1000 clusters yielded very encouraging results: the combined time for the learning and pure clustering tasks is under 10 minutes, and the accuracy of correct assignment to clusters at the validation stage reaches 97%.