SUPER LEARNING METHOD FOR DETERMINING MOLECULAR AFFINITY
DOI:
https://doi.org/10.31891/2307-5732-2022-307-2-14-24
Keywords:
Super Learning, machine learning, ensemble methods, molecular affinity, transformers, boosting, stacking, inhibition coefficient
Abstract
This paper applies the Super Learning principle to predict the molecular affinity between a receptor (a large biomolecule) and ligands (small organic molecules). Meta-models learn the optimal combination of individual base models in two consecutive ensembles, one for classification and one for regression. Each ensemble contains six machine learning models combined by stacking. The base models include the support vector machine, random forest, gradient boosting, graph neural networks, feed-forward neural networks, and transformers. The first ensemble predicts the binding probability and classifies all candidate molecules for the selected receptor as active or inactive. Ligands classified as active by the first ensemble are passed to the second ensemble, which predicts their degree of affinity for the receptor in the form of the inhibition constant (Ki). A distinctive feature of the method is that it does not use the atomic coordinates of individual molecules or their complexes. This eliminates experimental errors in sample preparation and in the measurement of atomic coordinates, and allows the method to determine the affinity of biomolecules with unknown spatial configurations. It is shown that meta-learning increases the recall of the classification ensemble by 34.9% and the coefficient of determination (R2) of the regression ensemble by 21% compared to the average values. The paper shows that an ensemble with meta-stacking is an asymptotically optimal learning system. The defining feature of Super Learning is the use of k-fold cross-validation to form first-level predictions, which are used to train second-level models (meta-models) that combine the first-level models optimally. The ability of six machine learning models to predict molecular affinity is studied, as is the performance gain obtained by combining the models into an ensemble by stacking. The models combined into the two consecutive ensembles are presented.
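The stacking step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the three toy one-dimensional regressors (`fit_mean`, `fit_line`, `fit_nearest`) are hypothetical stand-ins for the six base models, the data are synthetic, and the meta-model is assumed to be a convex weighting chosen by grid search, which the abstract does not specify. What the sketch does show is the core idea: k-fold cross-validation produces out-of-fold first-level predictions, and the second-level model is fitted on those predictions only.

```python
import math
import random
from itertools import product

def avg(values):
    return sum(values) / len(values)

# Hypothetical toy base learners; each fit_* returns a predict(x) function.
def fit_mean(xs, ys):
    m = avg(ys)
    return lambda x: m

def fit_line(xs, ys):
    mx, my = avg(xs), avg(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)

def fit_nearest(xs, ys):
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def out_of_fold(fit, xs, ys, k):
    """First-level predictions: each sample is predicted by a model
    trained on the k-1 folds that do not contain it."""
    n = len(xs)
    preds = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, n, k):
            preds[i] = model(xs[i])
    return preds

def mse(preds, ys):
    return avg([(p - y) ** 2 for p, y in zip(preds, ys)])

def blend(weights, columns):
    """Weighted sum of the base-model prediction columns, per sample."""
    return [sum(w * c for w, c in zip(weights, row)) for row in zip(*columns)]

def simplex_grid(m, steps):
    """All weight vectors of length m on a grid, non-negative, summing to 1."""
    for counts in product(range(steps + 1), repeat=m):
        if sum(counts) == steps:
            yield [c / steps for c in counts]

def super_learn(fits, xs, ys, k=5, steps=20):
    """Assumed meta-model: the convex combination of base learners
    with the lowest error on the out-of-fold predictions."""
    level1 = [out_of_fold(f, xs, ys, k) for f in fits]
    weights = min(simplex_grid(len(fits), steps),
                  key=lambda w: mse(blend(w, level1), ys))
    models = [f(xs, ys) for f in fits]  # refit base learners on all data
    predict = lambda x: sum(w * m(x) for w, m in zip(weights, models))
    return weights, predict, mse(blend(weights, level1), ys)

# Synthetic data, for illustration only.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(60)]
ys = [0.5 * x + math.sin(x) + rng.gauss(0, 0.2) for x in xs]

weights, predict, oof_mse = super_learn([fit_mean, fit_line, fit_nearest], xs, ys)
print("weights:", weights, "out-of-fold MSE:", oof_mse)
```

Because the grid includes the weight vectors that select a single base learner, the blended out-of-fold error in this sketch can never exceed that of the best individual learner. In the two-ensemble setup of the abstract, the same procedure would be run once with classifiers and once with regressors, with the second ensemble applied only to ligands the first one labels active.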
License
Copyright (c) 2022 О. ГУРБИЧ (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.