COMPARATIVE ANALYSIS OF TRAINING MODELS ON HARD LABELS VERSUS KNOWLEDGE DISTILLATION FOR IMAGE CLASSIFICATION TASKS UNDER LIMITED COMPUTATIONAL RESOURCES

Authors

Grinenko, O., & Sukhovyi, O.

DOI:

https://doi.org/10.31891/

Keywords:

machine learning, neural networks, knowledge distillation, model optimization, limited computational resources

Abstract

This paper presents a comparative analysis of two approaches to training artificial neural network models for image classification tasks: traditional training on hard labels and knowledge distillation, which leverages the soft probabilistic outputs of a pretrained teacher model to guide the learning process of a smaller student network. The study aims to evaluate the relative efficiency and practical impact of these methods in constrained computational environments, where reducing model size and inference time without sacrificing predictive performance is essential. Within the experimental framework, several network architectures were developed and tested. Two knowledge transfer schemes were implemented: distillation from a single teacher and distillation from an ensemble of ten independently trained models, representing individual and collective knowledge sources, respectively. All experiments were conducted on the MNIST dataset, a standard benchmark for handwritten digit recognition, using accuracy and error rate as the primary evaluation metrics. Each configuration was trained and validated ten times to ensure statistical reliability and mitigate random fluctuations arising from model initialization and optimization. The results demonstrate that, under the tested conditions, knowledge distillation did not provide a measurable improvement in student model accuracy and, in some cases, led to a moderate decrease compared to classical training on hard labels. These findings indicate that, while knowledge distillation remains a powerful concept in large-scale deep learning, its benefits may be limited for simple fully connected architectures or small datasets with low input variability. Consequently, for tasks constrained by hardware or energy efficiency, direct training on hard labels remains a more reliable and computationally efficient strategy.
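
For readers unfamiliar with the mechanics of the two compared approaches, the sketch below illustrates a typical distillation objective of the kind described in the abstract: a weighted combination of cross-entropy on hard labels and a temperature-softened KL-divergence term against the teacher's outputs, in the standard Hinton-style formulation. The framework (PyTorch), the hyperparameters T and alpha, and the logit-averaging scheme for the ensemble are illustrative assumptions, not values or choices taken from the paper.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, hard_labels, T=4.0, alpha=0.5):
    # Soften both output distributions with temperature T (assumed hyperparameter).
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    log_soft_student = F.log_softmax(student_logits / T, dim=1)
    # KL divergence between the softened distributions, scaled by T^2 as in the
    # standard knowledge-distillation formulation.
    kd_term = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T * T)
    # Ordinary cross-entropy of the student against the hard labels.
    ce_term = F.cross_entropy(student_logits, hard_labels)
    return alpha * kd_term + (1.0 - alpha) * ce_term

def ensemble_teacher_logits(teacher_logits_list):
    # For the ensemble scheme, one simple aggregation (an assumption, not the paper's
    # stated method) is to average the logits of the independently trained teachers.
    return torch.stack(teacher_logits_list).mean(dim=0)

In classical training on hard labels, only the ce_term is used; distillation adds the kd_term, which is what the study compares against the baseline.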

Published

2025-12-11

How to Cite

GRINENKO, O., & SUKHOVYI, O. (2025). COMPARATIVE ANALYSIS OF TRAINING MODELS ON HARD LABELS VERSUS KNOWLEDGE DISTILLATION FOR IMAGE CLASSIFICATION TASKS UNDER LIMITED COMPUTATIONAL RESOURCES. Herald of Khmelnytskyi National University. Technical Sciences, 359(6.1), 471-475. https://doi.org/10.31891/