EFFICIENCY ANALYSIS FOR DCT-BASED DENOISING OF SPEECH SIGNALS
DOI:
https://doi.org/10.31891/2307-5732-2025-355-61
Keywords:
additive white Gaussian noise, DCT-based denoising, efficiency analysis, block size
Abstract
This paper investigates noise suppression in speech signals corrupted by additive white Gaussian noise (AWGN). The key challenge is to reduce noise effectively without introducing audible artifacts that degrade the perceptual quality of the speech. We employ a denoising method based on the Discrete Cosine Transform (DCT), applied to fully overlapping signal blocks of fixed size (16, 32, or 64 samples). The approach is evaluated using both objective and perceptual criteria: the improvement in Signal-to-Noise Ratio at the output relative to the input (ISNR) serves as the objective measure, and the standard Perceptual Evaluation of Speech Quality (PESQ) metric assesses the perceptual quality of the processed speech. We investigate how performance depends on the input SNR, the processing block size, the type of threshold applied (hard or combined), and the parameter β used in the threshold calculation. The analysis, conducted on a set of standard Harvard sentence test signals, yielded highly consistent results and revealed the following tendencies: 1) a block size of N=64 consistently provides better denoising efficiency than N=16 and N=32 according to both metrics; 2) the greatest ISNR gain is observed at low input SNR values, which is particularly important for highly noisy signals; 3) the optimal value of β depends strongly on both the input SNR (for ISNR optimization it generally decreases as the input SNR increases) and the chosen evaluation metric; 4) the combined threshold outperforms the hard threshold according to PESQ, provided β is selected appropriately, whereas the ISNR values of the two thresholds are approximately the same at their respective optimal β.
Ultimately, this study underscores the need for an adaptive approach to parameter selection, tailored both to the specific noise conditions and to the performance metric, objective or perceptual, that matters most for the target application. Moreover, the computational complexity of the DCT-based method remains manageable, making it suitable for real-time applications. Examples of signal processing are presented and discussed.
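The processing chain summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the threshold has the form T = β·σ with a known noise standard deviation σ, that overlapping block estimates are combined by plain averaging, and that β = 2.7 is only a placeholder value. The function names `dct_denoise_hard`, `snr_db`, and `demo` are hypothetical, and a sum of sinusoids stands in for a speech signal. Only the hard threshold is shown, since the abstract does not give the form of the combined threshold; PESQ is not computed here.

```python
import numpy as np
from scipy.fft import dct, idct


def dct_denoise_hard(noisy, sigma, block_size=64, beta=2.7):
    """Hard-threshold DCT denoising over fully overlapping blocks (hop of 1 sample).

    Assumptions (not taken from the paper): threshold T = beta * sigma,
    orthonormal DCT-II, and averaging of all overlapping estimates.
    """
    noisy = np.asarray(noisy, dtype=float)
    n = noisy.size
    if n < block_size:
        raise ValueError("signal is shorter than one block")

    # Every block of length block_size, shifted by one sample:
    # shape (n - block_size + 1, block_size), blocks[i, j] == noisy[i + j].
    blocks = np.lib.stride_tricks.sliding_window_view(noisy, block_size)

    # Transform each block, zero the coefficients at or below the threshold,
    # and transform back.
    coeffs = dct(blocks, type=2, norm="ortho", axis=1)
    kept = np.where(np.abs(coeffs) > beta * sigma, coeffs, 0.0)
    filtered = idct(kept, type=2, norm="ortho", axis=1)

    # Average the estimates that different blocks give for the same sample.
    acc = np.zeros(n)
    cnt = np.zeros(n)
    n_blocks = filtered.shape[0]
    for offset in range(block_size):
        acc[offset:offset + n_blocks] += filtered[:, offset]
        cnt[offset:offset + n_blocks] += 1.0
    return acc / cnt


def snr_db(clean, estimate):
    """SNR in dB of `estimate` with respect to the reference `clean`."""
    clean = np.asarray(clean, dtype=float)
    err = clean - np.asarray(estimate, dtype=float)
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(err ** 2))


def demo(sigma=0.3, block_size=64, beta=2.7, seed=0):
    """Return ISNR (dB) for a synthetic two-tone signal in AWGN."""
    rng = np.random.default_rng(seed)
    t = np.arange(8000) / 8000.0  # 1 s at an assumed 8 kHz sampling rate
    clean = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 450 * t)
    noisy = clean + rng.normal(0.0, sigma, clean.size)
    denoised = dct_denoise_hard(noisy, sigma, block_size, beta)
    # ISNR: output SNR minus input SNR.
    return snr_db(clean, denoised) - snr_db(clean, noisy)


if __name__ == "__main__":
    for n_block in (16, 32, 64):
        print(f"N={n_block}: ISNR = {demo(block_size=n_block):.2f} dB")
```

Sweeping `beta`, `block_size`, and `sigma` in `demo` gives a rough feel for the parameter dependence the abstract describes, although results on a synthetic tone are not expected to match those obtained on the Harvard sentence recordings.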
License
Copyright (c) 2025 ПЕТРО БРИСІН, ВОЛОДИМИР ЛУКІН (Authors)

This work is licensed under a Creative Commons Attribution 4.0 International License.