AUDIO AUGMENTATION METHODS IN MACHINE LEARNING MODELS

АНДРІЙ ПІРКО; ІРИНА БОРЕЦЬКА

doi:10.31891/2307-5732-2024-341-5-54

Authors

ANDRII PIRKO, Ukrainian National Forestry University, https://orcid.org/0009-0007-9056-0413
IRYNA BORETSKA, Ukrainian National Forestry University, https://orcid.org/0000-0002-6767-104X

Keywords:

audio augmentation, machine learning, neural networks, audio signal processing

Abstract

This paper explores how audio augmentation techniques can improve the classification of guitar chords with machine learning. In audio analysis, and in chord classification in particular, it is often difficult to obtain a training dataset that is both large and diverse. Audio augmentation addresses this by synthetically increasing the size and diversity of the training data, which helps models generalize to unseen recordings. Modifying audio signals in controlled ways, such as adding noise, changing playback speed, applying reverb, and shifting the signal in time, produces varied versions of each original clip. These versions simulate real-world conditions in which the input is distorted by environmental noise, limitations of the recording equipment, or differences in how the instrument is played.
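
The four transformations named above can be sketched with NumPy alone. This is a minimal illustration under our own assumptions, not the authors' implementation: the function names, the white-noise-at-a-target-SNR formulation, linear-interpolation resampling for the speed change, and the synthetic exponentially decaying impulse response used for reverb are all choices made for this example. Note that resampling-based speed change also shifts pitch (a rate of 1.02 raises it by roughly a third of a semitone), so only small rates keep the chord recognizable; a pitch-preserving time stretch would be needed to change tempo alone.

```python
import numpy as np

def add_noise(x: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Add white Gaussian noise scaled to give the requested signal-to-noise ratio (dB)."""
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)

def change_speed(x: np.ndarray, rate: float) -> np.ndarray:
    """Play the clip `rate` times faster via linear-interpolation resampling.
    Like speeding up tape, this also raises pitch; rate > 1 shortens the clip."""
    n_out = int(round(len(x) / rate))
    new_positions = np.arange(n_out) * rate  # where each output sample reads from
    return np.interp(new_positions, np.arange(len(x)), x)

def add_reverb(x: np.ndarray, sr: int, decay_s: float = 0.3, wet: float = 0.3,
               rng=None) -> np.ndarray:
    """Mix in a convolution with a synthetic room response (decaying white noise)."""
    rng = rng if rng is not None else np.random.default_rng()
    n = int(decay_s * sr)
    t = np.arange(n) / sr
    # Amplitude falls by ~60 dB (a factor of e^-6.9) over decay_s seconds.
    ir = rng.normal(size=n) * np.exp(-t * 6.9 / decay_s)
    ir /= np.sqrt(np.sum(ir ** 2))  # normalize the impulse response to unit energy
    reverberated = np.convolve(x, ir)[: len(x)]  # truncate the tail to keep length
    return (1.0 - wet) * x + wet * reverberated

def time_shift(x: np.ndarray, shift: int) -> np.ndarray:
    """Delay (shift > 0) or advance (shift < 0) by `shift` samples, zero-filling the gap."""
    out = np.zeros_like(x)
    if shift > 0:
        out[shift:] = x[:-shift]
    elif shift < 0:
        out[:shift] = x[-shift:]
    else:
        out[:] = x
    return out

# Usage on a 1 s synthetic tone standing in for a recorded chord (assumption).
rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
clip = 0.5 * np.sin(2 * np.pi * 196.0 * t)
noisy = add_noise(clip, snr_db=20, rng=rng)
faster = change_speed(clip, rate=1.02)
reverberant = add_reverb(clip, sr, rng=rng)
shifted = time_shift(clip, int(0.05 * sr))  # 50 ms delay
```

Each function returns a new array and leaves the input untouched, so several variants can be generated from the same source clip.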

The study uses a convolutional neural network (CNN) for classification, a choice motivated by the ability of CNNs to learn hierarchies of local patterns, which is what recognizing features in audio spectrograms requires. The guitar-chord dataset, initially limited in size, was augmented with several techniques, each chosen to mimic a type of distortion or variation that a chord signal may encounter in practice: noise addition simulates interference and background sound, speed modification accounts for variations in tempo, reverb mimics different acoustic environments, and time shifting introduces the small timing variations typical of live recordings.
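
A CNN of this kind consumes a spectrogram as a 2-D image with frequency on one axis and time on the other. The sketch below shows one common way to build that input, a log-magnitude short-time Fourier transform. The abstract does not specify the front end, so the window length (1024), hop (256), Hann window, log scaling, and the synthetic G-major triad used as a stand-in chord are assumptions.

```python
import numpy as np

def log_spectrogram(x: np.ndarray, n_fft: int = 1024, hop: int = 256,
                    eps: float = 1e-10) -> np.ndarray:
    """Log-magnitude STFT of a mono signal, shaped (frequency bins, time frames)."""
    if len(x) < n_fft:  # pad very short clips so at least one frame exists
        x = np.pad(x, (0, n_fft - len(x)))
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    # Slice overlapping frames and taper each one with the Hann window.
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_fft // 2 + 1)
    return np.log(mag + eps).T  # transpose so frequency is the row axis

# Usage: a synthetic G-major triad (G3, B3, D4) standing in for a recorded chord.
sr = 16000
t = np.arange(sr) / sr
chord = sum(np.sin(2 * np.pi * f * t) for f in (196.00, 246.94, 293.66)) / 3
spec = log_spectrogram(chord)
cnn_input = spec[np.newaxis, np.newaxis]  # (batch, channels, freq, time) layout
print(spec.shape)  # (513, 59)
```

Because a time shift of the waveform moves the spectrogram along its time axis, and convolution responds to a pattern wherever it appears, time-shift augmentation pairs naturally with a convolutional model.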

These transformations expand the dataset and expose the model to a broad range of variations, improving its ability to generalize to new audio samples. The CNN trained on the augmented dataset achieved significantly higher classification accuracy than models trained on the original, non-augmented dataset. This result underscores the importance of data diversity when training machine learning models, particularly for audio classification, where real-world recordings often contain unpredictable variations.
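
Expanding the dataset amounts to keeping each original clip and appending several randomly augmented copies that inherit its chord label. The abstract does not say how many variants were generated per clip or how the transformations were combined, so the sketch below is hypothetical throughout: `expand_dataset` is an illustrative helper, the two lambda augmenters are simple stand-ins for noise addition and time shifting, and random arrays replace real recordings.

```python
import numpy as np
from typing import Callable, List, Sequence, Tuple

# An augmenter maps (clip, rng) -> a new clip of the same chord.
Augmenter = Callable[[np.ndarray, np.random.Generator], np.ndarray]

def expand_dataset(signals: Sequence[np.ndarray], labels: Sequence[str],
                   augmenters: Sequence[Augmenter], copies_per_clip: int,
                   rng: np.random.Generator) -> Tuple[List[np.ndarray], List[str]]:
    """Keep every original clip and append `copies_per_clip` augmented variants.

    Each variant applies one randomly chosen augmenter and inherits the
    original label, on the assumption that the transformation does not
    change which chord is being played.
    """
    out_x: List[np.ndarray] = []
    out_y: List[str] = []
    for x, y in zip(signals, labels):
        out_x.append(x)
        out_y.append(y)
        for _ in range(copies_per_clip):
            augment = augmenters[rng.integers(len(augmenters))]
            out_x.append(augment(x, rng))
            out_y.append(y)
    return out_x, out_y

# Usage with two stand-in augmenters and random placeholder clips.
rng = np.random.default_rng(0)
augmenters = [
    lambda x, r: x + r.normal(0.0, 0.01, size=x.shape),   # additive white noise
    lambda x, r: np.roll(x, int(r.integers(-800, 801))),  # circular time shift
]
clips = [0.1 * rng.normal(size=16000) for _ in range(3)]
X, y = expand_dataset(clips, ["C", "G", "Am"], augmenters, copies_per_clip=4, rng=rng)
print(len(X), y[:6])  # 15 ['C', 'C', 'C', 'C', 'C', 'G']
```

Augmentation should be applied only to the training split after the train/test split is made; otherwise near-copies of test clips end up in the training data and inflate the measured accuracy.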
