
Available online at www.sciencedirect.com

ScienceDirect

Procedia Computer Science 160 (2019) 561–566

www.elsevier.com/locate/procedia

The 6th International Symposium on Emerging Information, Communication and Networks (EICN 2019)
November 4-7, 2019, Coimbra, Portugal
Big Data Processing Technologies in Distributed Information Systems
Nataliya Shakhovska a,*, Nataliya Boyko a, Yevgen Zasoba a, Eleonora Benova b

a Lviv Polytechnic National University, 12 Bandera Street, 79000, Lviv, Ukraine
b Faculty of Management, Comenius University in Bratislava, Odbojárov 10, Bratislava, Slovak Republic

Abstract

This paper analyzes Big data technologies and provides an example of applying the MapReduce paradigm: uploading large volumes of data, processing and analyzing unstructured information, and distributing it into a clustered database. The article summarizes the concept of "big data" and gives examples of methods for working with arrays of unstructured data. A parallel system based on Resilient Distributed Datasets (RDD) is organized. A class of basic database operations is implemented: database connection, table creation, retrieving a line by id, returning all elements of the database, and updating, deleting, and creating a line.
© 2019 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Peer-review under responsibility of the Conference Program Chairs.
Keywords: Big data; Web application; Modeling; Processing; Analytics.

1. Introduction

The information technology (IT) field is a promising area of research. Until recently, big systems consisted of several servers and terabytes of information. Nowadays, systems use a cloud cluster model, which includes thousands of multicore processors and petabytes of data. This is why a new research area, Big data, was created. This paradigm is already reflected in academic programs. Examples of the Big data branch are the structured and

* Corresponding author. Tel.: +380322582404; fax: +380322582404.
E-mail address: [email protected]

1877-0509 © 2019 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Peer-review under responsibility of the Conference Program Chairs.
10.1016/j.procs.2019.11.047

unstructured data, media, or random processes, as they practically cannot be processed by traditional means. Traditional monolithic systems are being replaced with new asynchronous and parallel solutions. These new solutions provide the ability to work with Big data [1].
Big Data information technology is a set of methods and means for processing different types of structured (databases) and unstructured (text, stream) dynamic large amounts of data for analysis and use in decision support. This technology is an alternative to traditional database management systems and the Business Intelligence class of solutions. Besides, Big data technology can be used for parallel (distributed) data processing [2, 3]. The system consists of several independent blocks that efficiently process information under conditions of continuous growth and distribution throughout multiple cluster nodes. In such systems, the volume of information increases exponentially, and unstructured data makes up the most significant part of the whole. Therefore, the issues of proper interpretation of data flow in systems of this type become more and more urgent [1].
The subject of research is the methods and tools for building, editing and adapting the information flow in
distributed information systems.

2. State of the art

In [1], the concept of Big data and the criteria for its classification are given. The paper [3] considered Big data as a revolutionary technology of innovation, competition, and productivity of the economy, and a new resource for business. The architecture, informational value for business, and the impact of Big Data are given in [4]. The possibilities of involving innovative Big Data in developing a business strategy are analyzed in [5].
The analysis of data consolidation methods is given in [6]. In [7] the authenticity, integration, scalability, and confidentiality of "open" structured (databases) and unstructured (text) data from social networks are described. The technical aspects of Big data realization are given in [8]. The method of intelligent data analysis is described in [9]. The analysis of the possibility of implementing Big data in medicine is given in [10, 11]. The information model of a cloud data warehouse and the possibility of implementing it as part of Big data technology is provided in [12 – 14]. Big data usage for information analysis in a social network is given in [14]. Methods of deep learning and machine learning can process Big data consisting of different sources such as images, video, and audio [13 – 17]. Business and e-libraries are examples of Big data technologies usage too [18 – 23].
Thus, the partly solved tasks in Big data processing are the following: the biggest part of the sources is unstructured data, and there are time-complexity requirements, so parallel data processing should be used.

3. Problem statement

The research task is to develop the model of Big data and information technology of distributed unstructured data
processing. To gain such a result, the following tasks must be solved in the paper:
1. To analyze the methods and principles of Big data processing;
2. To analyze existing technologies of Big data processing;
3. To carry out a comparative analysis of productivity of Hadoop and Spark platforms for unstructured data
processing;
4. To test the parallelized system in Scala.

4. Materials and Methods

Clustering is one of the ways to decrease the time complexity of Big data processing. Two variants of scaling, i.e., horizontal and vertical scaling, should be taken into account.
Horizontal scaling divides the data set and distributes the data over multiple servers, or shards. So, one can create ten instances, each with a 1 TB database. Each shard is an independent database, and collectively, the shards make up a single logical database. The system should rely on asynchronous message communication to delimit the components. The controlling of the loads, flows, and message queues should be provided in the system [2 – 4].

When systems of this type are used, several problems arise with the interworking of all the clustered system nodes. For example, different applications require data access from different nodes. This makes clustered system operation more complicated, but vertical data scaling provides access to the data of all system nodes.
Hinchcliffe divides the approaches to Big data into three groups depending on the volume:
VolBD = { VolFD, VolBA, VolDI }, (1)
where VolFD is Fast Data, whose volume is measured in terabytes; VolBA is Big Analytics, petabytes of data; VolDI is Deep Insight, measured in exabytes and zettabytes.
The groups differ among themselves not only in the operating volumes of data but also in the quality of their processing solutions. Processing information from sources of different expressive power, namely structured, semi-structured, and unstructured, is necessary for Big data technology. A set of information products is divided into three blocks:
Ip = { St, SemS, UnS }, (2)

where St = { DB, DW } is structured data (databases, data warehouses); SemS = { Wb, Tb } is semi-structured data (XML, electronic worksheets); UnS = { Nd } is unstructured data (text) [10, 14].
The following technologies are used for Big data processing:
TBD = { TNoSQL, TSQL, THadoop, TV }, (3)
where TNoSQL is the technology of NoSQL databases; THadoop is the technology that ensures massively parallel processing; TSQL is the technology of structured data processing (SQL databases); TV is the technology of Big data visualization [8, 11].
The main technologies of Big data processing are: NoSQL; MapReduce; Apache Hadoop; Apache Spark.
The problem of increasing information volume cannot be solved using classical relational architectures. The followers of the NoSQL concept emphasize that it is not a complete negation of SQL and the relational model; rather, the project proceeds from the fact that SQL is an essential and handy tool that nevertheless cannot be considered universal. One problem point for a classical relational database is dealing with massive data and projects under a high load. The first objective approach is to extend the database where SQL is flexible enough, and not to displace it wherever it performs its tasks. Also, the relational approach does not support both types of scaling (vertical and horizontal).
There are classical approaches and paradigms for the development of data processing facilities. The MapReduce paradigm is one of them [5]. This model of distributed data processing was suggested by Google to process significant volumes of data on computing clusters. A cluster is several independent computers used together and working as a single system.
MapReduce provides for organizing data in the form of lists that pass through three stages of processing:
1. Map stage. At this stage, the data are processed with the help of the map() function defined by the user. The operation is similar to the map() method in functional programming languages. The map function accepts a list at the input and returns several key-value pairs.
2. Shuffle stage. At this stage, the output of the map function is divided into "buckets", where each bucket corresponds to one key from the map stage. These buckets then serve as input for the reduce() function.
3. Reduce stage. The reduce function computes the result for each separate "bucket".
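The three stages can be illustrated with a plain-Scala word count, in which standard collections stand in for the distributed lists. This is a sketch of the paradigm, not the paper's implementation.

```scala
// Map stage: each input line yields (word, 1) key-value pairs.
def mapStage(lines: Seq[String]): Seq[(String, Int)] =
  lines.flatMap(_.split("\\s+")).filter(_.nonEmpty).map(w => (w, 1))

// Shuffle stage: pairs are grouped into "buckets", one bucket per key.
def shuffleStage(pairs: Seq[(String, Int)]): Map[String, Seq[Int]] =
  pairs.groupBy(_._1).map { case (k, vs) => (k, vs.map(_._2)) }

// Reduce stage: each bucket is folded into a single result per key.
def reduceStage(buckets: Map[String, Seq[Int]]): Map[String, Int] =
  buckets.map { case (k, vs) => (k, vs.sum) }

val counts = reduceStage(shuffleStage(mapStage(Seq("big data", "big systems"))))
// counts: Map(big -> 2, data -> 1, systems -> 1)
```

In a real cluster the three functions run on different nodes and the shuffle moves data over the network; the composition of the stages, however, is exactly the one shown.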

At present, the Apache Hadoop MapReduce and Apache Spark technologies are the leaders in the use of the MapReduce paradigm and in the creation of software platforms for arranging the distributed processing of large data volumes [8, 16 – 18].
Apache Hadoop MapReduce is a free platform for arranging the processing of large data volumes (measured in petabytes) using the MapReduce paradigm. This paradigm makes it possible to distribute the processing into separate fragments, each of which can be run on a separate cluster node. Hadoop includes an implementation of the distributed Hadoop HDFS file system, which automatically provides data backup and is optimized for work with MapReduce. To simplify access to the data in the Hadoop store, the SQL-like Hive language, a kind of SQL for MapReduce, was developed. Requests in this language can be parallelized and processed by several Hadoop platforms.

Compared to Hadoop MapReduce, Spark provides up to 100 times higher performance when the data is processed in memory and up to 10 times higher performance when the data is located on disks. This mechanism runs at Hadoop cluster nodes with the help of Hadoop YARN or in a standalone mode. It supports data processing in HDFS, Cassandra [13], and Hive [11] stores and in any Hadoop input format [6, 8].
The main difference between Spark and Hadoop MapReduce is that Spark stores information in computer memory, providing in such a way higher platform productivity, while Hadoop stores it on disk, providing a higher security level [18 – 19]. In addition to the traditional features of Apache Hadoop MapReduce, namely the processing of unstructured data, the Apache Spark platform includes Spark Streaming for working with asynchronous streams, the MLlib library for machine learning, and GraphX for graph processing.

5. Experiment

Let us provide a comparative analysis of the productivity of both platforms as the ratio of execution time to the number of iterations (Fig. 1).

Fig. 1. Comparative analysis of productivity of Hadoop and Spark platforms

Spark provides an API (Application Program Interface) in the Scala, Java, Python, and R programming languages. At first, a Spark program creates the SparkContext object, which represents Spark's method of access to the cluster. A SparkConf object with information about the application should be built to create the SparkContext.
The concept of Resilient Distributed Datasets (RDD) is the basis of Spark. An RDD is a fault-tolerant collection (list) of elements that is processed in parallel. There are two ways to create an RDD: parallelization of a collection (list) transmitted in the program, or a reference to an external file system, such as HDFS (Hadoop Distributed File System), or any other data source in Hadoop [5].
Let us divide the service structure into two parts. The first one is a Web page including the UI (User Interface)
with a form for document transmittal to the server and interfaces with data analysis after receiving the processed
data from the server. The second one is the API (Application Program Interface) of our system that will represent a
library of methods for acceptance, processing, analysis, and transmittal of data to the client.
We focus attention on the API of the system when Apache Spark is used. The example is provided in the Scala language. To begin with, we set the cluster configuration and create the SparkContext. In the master code, the URL is a cluster configuration setting: setMaster("local[*]") means running Spark locally with the number of worker threads determined by the number of cores on the given computer, while setMaster("spark://HOST:PORT") is the configuration for connecting to an external cluster.
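The configuration step above can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes the Spark dependency is on the classpath, and the application name and file path are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Cluster configuration: application name plus the master URL.
val conf = new SparkConf()
  .setAppName("BigDataService")  // hypothetical application name
  .setMaster("local[*]")         // run locally, one thread per core;
                                 // use "spark://HOST:PORT" for an external cluster
val sc = new SparkContext(conf)

// The two ways of creating an RDD described earlier:
val listRdd = sc.parallelize(Seq(1, 2, 3, 4))       // from an in-program collection
val fileRdd = sc.textFile("hdfs:///data/input.csv") // from HDFS or another Hadoop source
```

Only one SparkContext may be active per JVM, so in a service this object is usually created once at startup and shared.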
We develop a method for receiving a file from the client and checking the file type (csv or xlsx). If the type matches, the file is uploaded to the server and its name is transmitted to the method parseAttachment(inputFile: String). Otherwise, the method returns a warning.
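A minimal sketch of such a check might look as follows; checkAttachment and the Either-based result are hypothetical names assumed for illustration, not the paper's API.

```scala
// Accept only csv/xlsx uploads; anything else produces a warning message.
def checkAttachment(fileName: String): Either[String, String] = {
  val allowed = Set("csv", "xlsx")
  val ext = fileName.split('.').lastOption.map(_.toLowerCase).getOrElse("")
  if (allowed.contains(ext)) Right(fileName)  // ok: pass the name on for parsing
  else Left(s"Unsupported file type: '$ext' (expected csv or xlsx)")
}
```

Returning Either keeps the happy path (the file name forwarded to the parser) and the warning in one typed result, which composes well with the pipeline that follows.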
During the next step, each file element is transmitted to the CSVReader constructor, parsed, and the raw content is returned as a Spark RDD. This process allows the parallelization of data processing. After that, the resulting collection (list) is transmitted to the toTransactions(data) constructor, which returns a collection of transactions. After completion of this process, each element of the collection is

transmitted to DAO.create(), i.e. it is stored in the database.
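The whole step can be imitated in plain Scala. CSVReader, Spark RDDs, and DAO.create are replaced here by standard collections, and all names are hypothetical; the sketch only mirrors the parse–convert–store shape of the pipeline.

```scala
// Parse csv rows into transaction tuples: (id, account, desc, code, amount).
// Rows that do not have exactly five fields are silently skipped by collect.
def toTransactions(rows: Seq[Array[String]]): Seq[(Int, String, String, String, Double)] =
  rows.collect { case Array(id, account, desc, code, amount) =>
    (id.toInt, account, desc, code, amount.toDouble)
  }

// In-memory stand-in for DAO.create(): each parsed transaction is appended.
val store = scala.collection.mutable.ListBuffer.empty[(Int, String, String, String, Double)]
val rawLines = Seq("1,ACC-1,coffee,11,4.50", "2,ACC-2,books,12,30.00")
toTransactions(rawLines.map(_.split(','))).foreach(store += _)
```

In the real system the map over rows runs in parallel on the RDD's partitions; the per-element logic is the same.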


Let us provide the following Scala object for basic operations with the database: connect to the database, create a table, delete a table, return a line by its id, return all elements of the database, update a line, remove a line, and create a line.
At the next stage, we create the transaction class, which includes five fields: id, account, description (desc), code, and amount. At the last step, an actor is created for asynchronous messaging to decouple the components.
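An in-memory sketch of such an object and transaction class might look as follows. The paper's version talks to a real database; this mock only mirrors the operation set, and the field names follow the five fields listed above.

```scala
// The transaction class with its five fields.
case class Transaction(id: Int, account: String, desc: String, code: String, amount: Double)

// Singleton object mirroring the basic database operations; a mutable map
// stands in for the real table.
object DAO {
  private val table = scala.collection.mutable.Map.empty[Int, Transaction]

  def create(t: Transaction): Unit = table += (t.id -> t)  // create the line
  def get(id: Int): Option[Transaction] = table.get(id)    // return a line by id
  def all: Seq[Transaction] = table.values.toSeq           // all elements
  def update(t: Transaction): Unit = table += (t.id -> t)  // update the line
  def delete(id: Int): Unit = table -= id                  // remove the line
  def dropTable(): Unit = table.clear()                    // delete the table
}
```

A Scala object is a singleton, so every component sees the same store, which is the property the paper relies on when the actor system delegates writes to it.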

6. Results

At the last stage, all the elements are united to create the main class that runs the application. The Scala library spray is used; it is necessary to run the server and deploy the application. Our task is to create the configuration, combine it with the database, create the service and the actor system, and run the HTTP server. The application's operating interface is shown in Fig. 2; the content of the database after uploading a csv document with data to the server is shown on the left and right sides.

Fig. 2. Application operating results

7. Discussion

The parallel method for receiving a file from the client and checking its type (csv or xlsx) is developed. Each file element is transmitted to the CSVReader constructor, and the raw content after parsing is returned as a Spark RDD. A Scala object for basic operations with the database is developed. It guarantees a loosely coupled interface, isolation, and location transparency, and provides means of delegating errors and messages.

8. Conclusions

The information technology for Big data parallel processing is developed. The analysis of the methods and principles of Big data processing is given. A comparative analysis of the productivity of the Hadoop and Spark platforms for unstructured data processing is provided. An example of the application of the MapReduce paradigm, loading large volumes of data, processing and analysis of unstructured information, and its distribution into a cluster database is given. Examples of methods for working with unstructured data arrays are given. A parallel RDD system is organized. A working Scala object for basic database operations is proposed, covering database connection, table creation, reading a line by id, returning all database elements, and updating, deleting, and creating a line.
The parallelized system in Scala is developed and tested. This information technology allows the processing of

structured, semi-structured, and unstructured data, combining vertical and horizontal data scaling.

References

[1] Janssen, M., van der Voort, H., & Wahyudi, A. (2017). “Factors influencing big data decision-making quality”. Journal of Business Research,
70: 338-345.
[2] Shaw, J. (2014). “Why Big Data is a big deal”. Harvard Magazine, 3: 30-35.
[3] Daas, P. J., Puts, M. J., Buelens, B., & van den Hurk, P. A. (2015). “Big data as a source for official statistics”. Journal of Official Statistics,
31(2): 249-262.
[4] Shakhovska, N., Vovk, O., Hasko, R., Kryvenchuk, Y. (2018). “The Method of Big Data Processing for Distance Educational System”. In:
Shakhovska N., Stepashko V. (eds) Advances in Intelligent Systems and Computing II. 689: 461-473.
[5] De Mauro, A., Greco, M., & Grimaldi, M. (2016). “A formal definition of Big Data based on its essential features”. Library Review, 65(3):
122-135.
[6] Melnykova, N., Marikutsa, U., Kryvenchuk, U. (2018). “The New Approaches of Heterogeneous Data Consolidation”. Proceedings of the
13th International Scientific and Technical Conference on Computer Sciences and Information Technologies (CSIT), Lviv, September 2018
(1): 408-411.
[7] Ediger, D., Jiang, K., Riedy, J., Bader, D. A., Corley, C., Farber, R., Reynolds, W. N. (2010). “Massive social network analysis: Mining
twitter for social good”. Proceedings of the 39th International Conference on Parallel Processing (2010, September): 583-593.
[8] Chen, H., Chiang, R. H., Storey, V. C. (2012). “Business intelligence and analytics: from big data to big impact”. MIS quarterly: 1165-1188.
[9] Boyko, N. (2016). “A look trough methods of intellectual data analysis and their applying in informational systems”. Proceedings of the XIth
International Scientific and Technical Conference “Computer Sciences and Information Technologies (CSIT), Lviv, September 2016: 183-185.
[10] Das, N., Das, L., Rautaray, S. S., Pandey, M. (2018). “Big Data Analytics for Medical Applications”. International Journal of Modern
Education and Computer Science, 10(2): 35
[11] Thusoo, A., Sarma, J. S., Jain, N., Shao, Z., Chakka, P., Anthony, S., ... & Murthy, R. (2009). “Hive: a warehousing solution over a map-
reduce framework”. Proceedings of the VLDB Endowment, 2(2): 1626-1629.
[12] Wang, C., Ren, K., Lou, W., & Li, J. (2010). “Toward publicly auditable secure cloud data storage services”. IEEE network, 24(4).
[13] Fedushko S., Shakhovska N., Syerov Yu. (2018) “Verifying the medical specialty from user profile of online community for health-related
advices”. Proceedings of the 1st International workshop on informatics & Data-driven medicine (IDDM 2018) Lviv, November 28–30, 2018.
2255: 301–310.
[14] Maass, W., Natschläger, T., & Markram, H. (2002). “Real-time computing without stable states: A new framework for neural computation
based on perturbations”. Neural computation, 14(11): 2531-2560
[15] Vitynskyi, P., Tkachenko, R., Izonin, I., Kutucu H. (2018) “Hybridization of the SGTM Neural-like Structure through Inputs Polynomial
Extension”. In Proceedings of the Second International Conference on Data Stream Mining Processing (DSMP), 386-391.
[16] Wang, G., & Tang, J. (2012, August). “The nosql principles and basic application of cassandra model”. In Proceedings of the 2012
International Conference Computer Science & Service System (CSSS), 1332-1335.
[17] Zaharia, M., Xin, R. S., Wendell, P., Das, T., Armbrust, M., Dave, A., & Ghodsi, A. (2016). “Apache spark: a unified engine for big data
processing”. Communications of the ACM, 59(11): 56-65.
[18] Molnár, E., Molnár, R., Kryvinska, N., Greguš M. (2014) “Web Intelligence in practice”. The Society of Service Science, Journal of Service
Science Research, Springer, 6(1):149-172.
[19] Kryvinska, N. (2012) “Building Consistent Formal Specification for the Service Enterprise Agility Foundation”. The Society of Service
Science, Journal of Service Science Research, Springer, Vol. 4, No. 2, 2012, pp. 235-269.
[20] Gregus, M. Kryvinska, N. (2015) “Service Orientation of Enterprises - Aspects, Dimensions, Technologies”. Comenius University in
Bratislava, ISBN: 9788022339780.
[21] Kaczor, S., Kryvinska, N. (2013) “It is all about Services - Fundamentals, Drivers, and Business Models”. The Society of Service Science,
Journal of Service Science Research, Springer, 5(2): 125-154.
[22] Kryvinska, N., Gregus, M. (2014) “SOA and it's Business Value in Requirements, Features, Practices and Methodologies”. Comenius
University in Bratislava, ISBN: 9788022337649.
[23] Rusyn, B., Vysotska, V., Pohreliuk, L. (2018). “Model and architecture for virtual library information system”. Proceedings of the 13th International Scientific and Technical Conference on Computer Sciences and Information Technologies (CSIT), Lviv, September 2018 (1): 37-41.
