USING A MICROSERVICE APPROACH IN THE PROCESS OF WEB SCRAPING OF LARGE VOLUMES OF DATA FOR WEBSITES WITH DYNAMIC CONTENT

OREST SUSHYNSKYI; VOLODYMYR KOTSUN; OLENA SKLIARENKO; LEONID LYTVYNENKO

doi:10.31891/2307-5732-2023-327-5-243-248

Authors

OREST SUSHYNSKYI, Private Higher Education Institution “European University”, https://orcid.org/0000-0002-2661-6458
VOLODYMYR KOTSUN, Private Higher Education Institution “European University”, https://orcid.org/0000-0003-2363-8157
OLENA SKLIARENKO, Private Higher Education Institution “European University”, https://orcid.org/0000-0001-6555-1223
LEONID LYTVYNENKO, Private Higher Education Institution “European University”, https://orcid.org/0000-0002-0828-383X

DOI:

https://doi.org/10.31891/2307-5732-2023-327-5-243-248

Keywords:

microservice, web scraping, data

Abstract

One of the main challenges of web scraping is handling dynamic content. Modern websites often use technologies such as AJAX and JavaScript to update content without reloading the page, and pages built this way are increasingly complex. This complicates data collection, because standard HTTP requests cannot retrieve the full content of a page whose content is generated by JavaScript. A microservice architecture can address this problem because it allows tasks to be distributed among small, independent services. An analysis of existing research and publications shows that commonly used web scraping techniques can be time-consuming when scanning large amounts of data, and that various approaches, such as the fast XPath selector engine, are used to mitigate this.

The article aims to study the features of the microservice approach in web scraping and to consider the main advantages of a microservice architecture. It also examines different approaches to accessing website elements, with particular attention to CSS selectors, Regex, and XPath.

The study found that a microservice architecture can improve system performance but can lead to longer turnaround times: microservice processing provides higher reliability and resilience when parsing large amounts of data, at the cost of increased execution time. With XPath, on average 96% of requests succeed, rising to 98% when microservices are used, which is higher reliability and resilience than the other methods provide. The CSS selector method uses the least bandwidth of the methods compared. Performance measurements showed that the Regex method has the lowest CPU and memory usage.
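The two ideas in the abstract, element access by Regex or XPath and distribution of scraping tasks among independent workers, can be illustrated with a minimal sketch. It is not the authors' implementation. It makes several assumptions: the page fragments in `PAGES` are hypothetical, already rendered, and well-formed; the XPath part uses the limited XPath subset of Python's standard-library `xml.etree.ElementTree`; and a thread pool merely stands in for separate services, which in a real deployment would be independent processes communicating over a network or message queue. CSS selectors are omitted because the standard library has no CSS selector engine.

```python
import re
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

# Hypothetical, already-rendered page fragments. They are well-formed XHTML
# so that the standard-library XML parser can read them; real pages with
# dynamic content would first have to be rendered (assumption).
PAGES = {
    "page-1": '<html><body><div class="item"><span class="price">10.50</span></div></body></html>',
    "page-2": '<html><body><div class="item"><span class="price">7.25</span></div></body></html>',
}

# Regex approach: match the raw markup directly, without building a tree.
PRICE_RE = re.compile(r'<span class="price">([^<]+)</span>')


def extract_with_regex(html: str) -> list[str]:
    """Return all price strings found by pattern matching on the markup."""
    return PRICE_RE.findall(html)


def extract_with_xpath(html: str) -> list[str]:
    """Return all price strings found via an XPath-style query on the parsed tree."""
    root = ET.fromstring(html)
    # ElementTree supports only a subset of XPath; this query stays inside it.
    return [el.text or "" for el in root.findall(".//span[@class='price']")]


def scrape(page_id: str) -> tuple[str, list[str], list[str]]:
    """One unit of work: what a single scraping worker would do for one page."""
    html = PAGES[page_id]
    return page_id, extract_with_regex(html), extract_with_xpath(html)


def run(page_ids: list[str]) -> list[tuple[str, list[str], list[str]]]:
    """Distribute pages across workers; results come back in input order."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        return list(pool.map(scrape, page_ids))


if __name__ == "__main__":
    for page_id, by_regex, by_xpath in run(list(PAGES)):
        print(page_id, by_regex, by_xpath)
```

In this sketch both methods return the same values for each page. The Regex variant avoids building a document tree, while the XPath variant depends on the document structure rather than on the exact text of the markup.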
