ENSEMBLE APPROACH IN MULTIMODAL DATA PROCESSING BASED ON GOOGLE API

Authors

O. Basystiuk, N. Melnykova, I. Dymun, A. Dymun

DOI:

https://doi.org/10.31891/2307-5732-2024-339-4-37

Keywords:

multimodal data, speech recognition, sequence-to-sequence, machine learning, artificial intelligence

Abstract

This work examines current trends in multimodal data processing with Google resources and surveys the main directions in the development of modern data integration methods. It analyzes how effective it is to combine models through communication lines under two ensembling strategies. Developing interfaces for multimodal data processing is an important step toward simplifying the analysis and interpretation of complex datasets. The results can be applied to build speech-to-text models for various industries, improving speech translation tasks and saving workers' time.

Context. Development of an interface for processing multimodal data using machine learning and an ensemble approach.

Objective. To propose a system architecture for a multimodal data processing interface that leverages modern ensemble approaches and machine learning techniques.

Methods. The proposed methodology integrates multiple models through communication lines to ensure fast, high-quality ensembling of data from different modalities. The architecture employs two main ensembling strategies: the A/B branching strategy and the sequential strategy.
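To make the two strategies concrete, the sketch below composes models as plain Python callables. The abstract does not specify how an A/B branch chooses between its branches, so this sketch assumes that both branches process the same input and a scoring function keeps the better output. The recognizers `asr_a` and `asr_b`, the `translate_stub` model, and the `known_tokens` score are hypothetical placeholders for real models (for example, Google speech and translation services); they are not taken from the paper.

```python
from typing import Callable, Sequence

# Any model is treated as a callable from one representation to another
# (plain strings here, to keep the sketch self-contained).
Model = Callable[[str], str]


def sequential(stages: Sequence[Model]) -> Model:
    """Sequential strategy: each stage consumes the previous stage's output."""
    def run(x: str) -> str:
        for stage in stages:
            x = stage(x)
        return x
    return run


def ab_branch(branch_a: Model, branch_b: Model,
              score: Callable[[str], float]) -> Model:
    """A/B branching strategy (assumed semantics): both branches process the
    same input, and the output with the higher score is kept (ties go to A)."""
    def run(x: str) -> str:
        out_a, out_b = branch_a(x), branch_b(x)
        return out_a if score(out_a) >= score(out_b) else out_b
    return run


# --- Hypothetical stand-ins for real models (not from the paper) ---
def asr_a(audio_id: str) -> str:
    return "hello <unk> world"  # recognizer A leaves one word unrecognized


def asr_b(audio_id: str) -> str:
    return "hello big world"    # recognizer B returns a full transcript


def translate_stub(text: str) -> str:
    return text.upper()         # placeholder for a translation model


def known_tokens(text: str) -> float:
    """Hypothetical quality score: number of recognized (non-<unk>) tokens."""
    return float(sum(tok != "<unk>" for tok in text.split()))


# Branch between two recognizers, then pass the winner down a sequential line.
pipeline = sequential([ab_branch(asr_a, asr_b, known_tokens), translate_stub])
print(pipeline("clip-001"))  # HELLO BIG WORLD
```

Because both strategies return a `Model`, they nest: a branch can appear as one stage of a sequential line, and a sequential line can serve as one branch. This is one possible reading of the flexible combination of strategies described in the results, not the authors' implementation.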

Results. The proposed system architecture offers several advantages:

  1. Improved performance compared with the traditional statistical machine translation systems developed over the past two decades.
  2. Independence from predefined language rules: the algorithms are self-learning and are regularly updated with new data.
  3. Efficient processing of multimodal data through a flexible combination of ensemble strategies that integrates data from different modalities.

Conclusions. The proposed architecture, which combines branching and sequential communication lines, provides a robust framework for integrating various models and supports high-quality data analysis. The approach is promising for multimodal data processing and opens directions for further research and development in the field.

Published

2024-08-30

How to Cite

BASYSTIUK, O., MELNYKOVA, N., DYMUN, I., & DYMUN, A. (2024). ENSEMBLE APPROACH IN MULTIMODAL DATA PROCESSING BASED ON GOOGLE API. Herald of Khmelnytskyi National University. Technical Sciences, 339(4), 235-238. https://doi.org/10.31891/2307-5732-2024-339-4-37