
Analysis of Frameworks and Technologies for Solving Big Data Storage and Processing Problems in Distributed Systems

Olga Blinova Oleg Kuprikov Mais Farkhadov


2023 7th International Conference on Information, Control, and Communication Technologies (ICCT) | 979-8-3503-4094-5/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICCT58878.2023.10347069

lab.17, Trapeznikov Institute of Control Sciences of the Russian Academy of Sciences,
Moscow, Russia
[email protected]

Abstract—The article discusses the main metrics for evaluating the performance of big data warehouses and compares the characteristics of the most popular modern frameworks. When considering distributed storage of large amounts of information, it is impossible to use the same indicators of speed and efficiency that are used for local databases. Qualitative and quantitative evaluation is complicated by the inability to reproduce the same queries under the same conditions in distributed storage, which is caused by the constantly changing state of the transmission environment and computing resources. When choosing a framework, technology stack, file system, or distributed database architecture for a specific task, it is necessary to take into account a large number of factors, as well as the set and frequency of operations that will be required in the system. The features and advantages of using containers and various file systems for storing data in distributed systems are presented, as well as other technologies that increase the speed and efficiency of processing large arrays of heterogeneous data. At the moment, there is no single solution that provides the best way to organize the storage of large volumes of heterogeneous data, but there are trends in the development of technologies for processing it. Many of them are related to the ability to apply to big data the same methods that are used in relational databases, while taking specific features into account.*

* This study was conducted within the framework of the scientific program of the National Center for Physics and Mathematics, section №9 «Artificial intelligence and big data in technical, industrial, natural and social systems».

Keywords—big data, control system architectures, heterogeneous data, metrics of distributed systems, distributed systems

I. INTRODUCTION

Technologies and distributed big data storage systems are rapidly developing and gaining popularity, as they are optimally suited for storing data under a huge flow of new, most often poorly structured, information. New technologies for organizing data storage and access, new processing methods, technologies for ensuring and maintaining data integrity, and support for remote access and multi-user work are solutions dictated by the realities of the modern world. There are a large number of big data storage solutions on the market; most often these are complex technology stacks, ranging from the file system to the user interface. How to determine which technologies will be optimal for solving a specific task? The old evaluation methods no longer work. All modern distributed systems are in a state of dynamic equilibrium: data can move between different machines, and user requests are redirected depending on the load in the system. The same query made to the same database can be executed in a different time, with a different result, use different data channels, and run on different hardware. How then to compare the proposed solutions and choose the best one for the current task? What do you need to know about the specifics of the task before choosing a technology stack? How to evaluate the effectiveness of an information system? Let us consider the basic principles of building distributed information systems for storing big data and evaluating their effectiveness.

II. BASIC PRINCIPLES OF THE ORGANIZATION OF BIG DATA SYSTEMS

A. The Concept of Big Data and Common Frameworks

Before proceeding to evaluating the effectiveness of systems, it is necessary to determine what allows a system to be attributed to big data. Despite the great prevalence of the term "big data", it does not have an unambiguous definition. In 2013 the term was added to the Oxford Dictionary, where it is defined as follows: "sets of information that are too large or too complex to handle, analyze or use with standard methods" [1]. Wikipedia provides the following definition: big data is the designation of structured and unstructured data of huge volume and significant diversity, efficiently processed by horizontally scalable software tools that appeared in the late 2000s as an alternative to traditional database management systems and Business Intelligence solutions [2, 3]. That is, big data includes arrays of information that cannot be decomposed and processed on a single computer, so special hardware and software solutions are used to place the data on multiple devices while preserving its integrity and availability [4, 5]. Most often, such storages are built on the basis of cloud technologies, since it is cloud technologies that provide the necessary virtualization layer, which allows working with distributed data storage as a whole. This means that when evaluating big data solutions, the experience of evaluating cloud data warehouses can be used.

Of the most widely used frameworks and platforms, a huge ecosystem of technologies developed by Apache should


Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY GUWAHATI. Downloaded on September 05,2024 at 10:33:18 UTC from IEEE Xplore. Restrictions apply.
be highlighted. Apache Hadoop is a platform for parallel processing and distributed data storage; Apache Spark is a general-purpose distributed data processing environment; Apache Kafka is a streaming processing platform; Apache Cassandra is a distributed NoSQL database management system. Storm is another prominent solution, focused on working with a large real-time data flow, as are Flink, Heron, Presto, Samza and Kudu [6, 7]. Another of the most popular frameworks is Greenplum, an open source massively parallel relational DBMS for data warehouses with flexible horizontal scalability and columnar data storage based on PostgreSQL [8]. MongoDB cannot be called a framework, but it is a flexible and scalable solution based on a document-oriented NoSQL DBMS.

B. Big Data File Systems

Special file systems are needed to store data and maintain its integrity in distributed systems. HDFS (Hadoop Distributed File System) is a distributed file system designed to store large volumes of data divided block by block between the nodes of a computing cluster. Other file formats are built on top of HDFS. The main features of HDFS are its replication mechanism, aggregation of data into blocks, hierarchical file system, and pipeline replication. Replication is necessary to maintain data integrity so that when one node fails, data is not permanently lost; that is, all data is necessarily duplicated. When new data is written, a list of nodes is determined on which copies of the blocks will be stored, and the data is then transmitted from one node to another in a pipeline. Hierarchical file organization is used as in most file systems: the user can create file directories in the same way as on a local machine.

HDFS is a write-once system: a record cannot be updated, only deleted completely. HDFS allows storing completely unstructured binary files, but they are inconvenient for users and database management systems to work with, so there are a number of file formats based on HDFS that allow partially structuring information. Some of them, such as Avro, Parquet and ORC, allow working with the data using SQL queries (a special technology, Spark SQL, is used for this [9]).

All the variety of big data file formats (Avro, Sequence, Parquet, ORC, RCFile) can be divided into two categories: row-oriented and column-oriented. Row formats are stored more compactly on the hard disk, but they are harder to process: to find the right value, the entire row must be read. Column formats take up more space due to the storage of indexes, but are much faster to process, since the search goes directly to the desired column [10]. When choosing a data format, it is necessary to pay attention to the complexity and frequency of data processing requests, as well as the degree of structuring of the information.

S3 (Simple Storage Service) is an object storage and data transfer protocol developed by Amazon. Its distinctive feature is storing a huge amount of data in its original format without hierarchy or splitting into separate directories, as well as the ability to modify data, which is impossible in HDFS. S3 storage has no scaling restrictions, and the most modern frameworks are starting to use S3 as a replacement for HDFS.

III. BIG DATA STORAGE PERFORMANCE

When working with any framework and any data source, three main stages can be distinguished:

• Data collection. Data can be collected from the network for certain requests or received through other channels, for example, video surveillance cameras. The data source can be local or global. It is worth estimating the expected database volumes and the speed of collection from the sources.

• Data integration. Special systems convert the data into a format suitable for storage and processing; they can also continuously monitor it for certain requests. At this stage, when choosing technologies and data formats, a compromise is necessary: either spend more resources on structuring and transforming the data and storing meta-information, or store poorly structured information and pay a large overhead during processing.

• Processing and analysis. There may be real-time processing, or preparation of data for processing that will be performed in the future. Popular methods of data analysis include association rule learning, classification, cluster and regression analysis, mixing and integration of data, machine learning, pattern recognition and others. At this stage, it is important to understand the amount of information being processed and the frequency of requests.

The basic principles of organizing distributed repositories of large amounts of information include, firstly, extensibility. For distributed storage this means horizontal extensibility, that is, the ability to increase the required storage volumes flexibly, without having to rebuild the entire system. Secondly, fault tolerance: any device can fail, yet the system must continue to work smoothly. This is usually ensured by redistributing the load to other machines instead of the failed one, and by organizing the data so that all information is duplicated in parts on different machines; then, when one of the devices drops out due to a failure, the information is not lost. Thirdly, localization: information should be processed on the same dedicated servers where it is stored. This reduces the load on data channels and the system overhead.

Two main metrics are used to numerically evaluate the performance of big data systems: memory consumption and CPU seconds during a test query to the database. It should be understood that in distributed systems it is not possible to get the same result when repeating a test query at another time, so averages are usually used. In addition to queries, report preparation, analysis, data collection, etc. can be tested. For these purposes, it is worth choosing test tasks that are as close to real ones as possible.

There are several other indicators worth paying attention to: the frequency of data collection, the time it takes for data to become available for analysis, the time it takes to transform the data into KPIs, etc.

IV. CRITERIA FOR CHOOSING A FRAMEWORK FOR THE TASK OF PROCESSING BIG DATA

When choosing among frameworks and ready-made solutions, it is necessary to pre-evaluate the signs of big data and study the features of the selected task. Big data is often described using a set of "V" attributes:

Volume. The data sets must be large. The best criterion is the need to distribute the data for storage and processing, but both 100 TB and 10 000 TB can be attributed to big data. It is necessary to assess the current volume of data and its growth prospects.

Variety. Heterogeneity is considered one of the key features of big data, but heterogeneity can be physical or semantic and can differ in severity. It is worth evaluating the properties of the incoming data stream as well as the complexity of its processing and transformation.

Velocity is the rate at which information enters the system. The incoming flow of information can be huge, or it can be relatively scarce with a large amount of stored information. Another important point is the uniformity of the flow: for example, there may be a huge flow of data twice a year, while the rest of the time the flow is quite scarce.

Veracity. This property is important when processing the data. For example, if you plan to train a neural network on the data and it turns out that the data is unreliable, additional verification steps may be needed.

Viability. This can be understood as both content obsolescence and technical obsolescence. When evaluating this property, pay attention to whether the number of requests depends on how long the data has been stored in the database.

Value should be taken into account when organizing access to the data and when evaluating frameworks by their level of security.

Variability. In many big data solutions, overwriting data can be either difficult or impossible. If there is a need to overwrite data frequently, this must be taken into account when choosing a DBMS.

Visualization. Different solutions have different tools for data visualization; it is important that the chosen framework makes it possible to use these tools.

In addition to the listed "V" properties, there are several more, with names starting with other letters. Exhausting: the presence or absence of pre-filtering of the data. Fine-grained and uniquely lexical: detail and lexical uniqueness, showing to what extent an element and its characteristics can be correctly indexed or identified. Relational: whether the collected data contains common fields and whether they overlap with each other or with previously obtained data, which would allow combining them or carrying out a meta-analysis of different data sets. Extensional: not the number of new records, but the ability of each specific record to be supplemented and increased in volume. Scalability: how quickly the amount of data stored in a particular system can increase.

After a detailed analysis of the information planned for storage has been carried out, attention should be paid to the following properties of the solutions: the file system, the data model used, additional means of improving performance, adequacy to the task, cost, etc.

V. TECHNOLOGIES TO IMPROVE PRODUCTIVITY

A. In-Memory Technologies

Large data arrays require not only a lot of free disk space but also a large amount of RAM for processing. Computer memory is organized hierarchically, and operations in RAM are several times faster than accesses to the hard disk. However, big data processing is most often performed iteratively, with intermediate results written to the hard disk, and considerable time is lost as a result. In-memory technologies are used to speed up computing operations [11]. With in-memory technology, data compression algorithms such as DWARF are used. These technologies made it possible to create databases that use RAM as the main storage: the In-Memory Data Base (IMDB) and the In-Memory Data Grid (IMDG). An IMDB is closer in architecture to traditional relational databases, although it uses generational storage. An IMDG is an object storage more similar to a multithreaded hash table; its main advantage is the ability to work with objects from a relational data model. Among the most popular in-memory database offerings are SAP with the relational IMDB HANA, Oracle with the IMDB TimesTen, as well as the IMDB from MemSQL and the IMDG from GridGain.

B. Containerization

When cloud and big data technologies are used, the virtualization layer is most often deployed using virtual machines. On the basis of a physical server, a virtual environment is created that works as a separate computer but uses a predetermined amount of resources. The virtual machine runs on an isolated sector of the server's hard disk, with its own operating system and all necessary applications installed on it [12]. Virtual machines have many advantages, but also a serious drawback: it takes quite a long time to install and configure an additional virtual machine. This is inconvenient for systems with a very uneven load: either the installed and configured virtual machines will sit idle, or there will be insufficient computing power during peak hours. A good solution in this situation may be the use of containers.

Containerization is a technology in which program code is packaged into a single executable package together with its libraries and dependencies to ensure that it launches correctly. The result is an analog of a virtual machine, but one that uses the resources of the operating system on which it is deployed. The most well-known container platforms are Docker, Kubernetes, and Porto; the first two are available to a wide range of users. To choose the right container system for a project, one needs to understand the main differences between Docker and Kubernetes beyond the surface definition. Docker creates containers, and its management capabilities are relatively modest. Kubernetes cannot create a container by itself; it is in fact an orchestrator that allows a large number of containers to be conveniently managed through a single interface, but it requires a third-party tool to create them. Docker clusters are more difficult to create and manage than Kubernetes clusters, but they are more robust and much more stable. Kubernetes is designed for automatic scaling of Docker containers. There are system containers and application containers. System containers combine all the necessary applications and are practically

variants of virtual machines. Application containers are designed to run a separate application in isolation [13, 14].

VI. CONCLUSION

Big data technologies offer users a wide range of solutions, including both full technology stacks and many variations within these stacks. When choosing a technological solution for a specific project, it is necessary first of all to evaluate not the solution itself, but the features of the data to be stored and processed in the system under consideration: the most common tasks, processing speed requirements, data structuring and variability, etc. After evaluating the stored data, it is possible to consider candidate solutions, taking into account not only the metrics listed above, but also the cost of the entire solution, the necessary information transformations, the required processing speed, etc. Highlighting the key qualities of a specific practical task is the key to understanding the requirements for the system. For example, archived information that is accessed extremely rarely can be stored in unstructured form. If there is a need for frequent processing of the information, especially with automated queries or strict requirements on response time, more structured databases, such as key-value stores, will be suitable.

REFERENCES

[1] M. S. Kornev, "The history of the concept of 'Big data': dictionaries, scientific and business periodicals", Bulletin of the Russian State University. Series: Literary Studies. Linguistics. Cultural Studies, 2018, no. 1 (34). (in Russian)
[2] https://ru.wikipedia.org/
[3] M. Chen, S. Mao, Y. Zhang, V. C. M. Leung, Big Data: Related Technologies, Challenges, and Future Prospects, Springer, 2014, 100 p.
[4] S. V. Mkrtychev, "Big data: approaches to definition and classification", Information technologies in modeling and management: approaches, methods, solutions, 2021, pp. 253-258. (in Russian)
[5] A. O. Shcherbinina, "The study of architecture and the main criteria for choosing a DBMS focused on big data processing", I International Scientific and Technical Conference "Topical issues of the use of data analysis technologies and artificial intelligence", 2018, pp. 196-200. (in Russian)
[6] V. I. Khakhanov, V. I. Obrizan, A. S. Mishchenko, B. A. Tamer, "Metric for big data analysis", Radioelectronics and Informatics, 2014, no. 2 (65). (in Russian)
[7] https://jelvix.com/blog/top-5-big-data-frameorks
[8] Z. Lyu, "Greenplum: a hybrid database for transactional and analytical workloads", Proceedings of the 2021 International Conference on Management of Data, 2021, pp. 2530-2542.
[9] V. S. Yakovlev, "Big data", Technique and technology: role in the development of modern society, 2015, no. 6, pp. 83-90. (in Russian)
[10] E. A. Artyushina, I. I. Salnikov, "In-memory technologies for storing, processing and analyzing large volumes of structured and weakly structured data", XXI century: results of the past and problems of the present plus, 2018, vol. 7, no. 4, pp. 147-152. (in Russian)
[11] A. V. Gordeev, "Virtual machines and networks", Information and Control Systems, 2006, no. 2, pp. 21-26. (in Russian)
[12] M. K. Gupta, V. Verma, M. S. Verma, "In-memory database systems - a paradigm shift", arXiv preprint arXiv:1402.1258, 2014.
[13] D. A. Kozintsev, A. A. Shiyan, "Containerization for big data analysis by the example of Kubernetes and Docker", Actual problems of infotelecommunications in science and education (APINO 2020), 2020, pp. 393-396. (in Russian)

