BIG DATA
UNIT - 1 NOTES
BIG DATA
Big data is high-volume, high-velocity, and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
Big data refers to datasets whose size is typically beyond the storage capacity of, and too complex for, traditional database software tools.
Big data is anything beyond the human and technical infrastructure needed to support its storage, processing, and analysis.
Variety: Data can be structured, semi-structured, or unstructured. Data stored in a database is an example of structured data. HTML data, XML data, email data, and CSV files are examples of semi-structured data. PowerPoint presentations, images, videos, research papers, white papers, the body of an email, etc., are examples of unstructured data.
Velocity: Velocity essentially refers to the speed at which data is being created in real time. We have moved from simple desktop applications like payroll applications to real-time processing applications.
Volume: Volume can be in terabytes, petabytes, or even zettabytes.
Gartner Glossary: Big data is high-volume, high-velocity, and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight and decision making.
Data generates information, and from information we can draw valuable insight. Digital data can be broadly classified into structured, semi-structured, and unstructured data.
1. Unstructured data: This is data that does not conform to a data model or is not in a form that can be used easily by a computer program. About 80% of an organization's data is in this format; for example, memos, chat-room transcripts, PowerPoint presentations, images, videos, letters, research papers, white papers, the body of an email, etc.
2. Semi-structured data: This is data that does not conform to a strict data model but carries tags or markers that give it some structure; as noted above, HTML, XML, email, and CSV files are examples.
3. Structured data: This is data in an organized form (for example, rows and columns) that can be used easily by a computer program; relationships exist between entities of the data, such as classes and their objects. About 10% of an organization's data is in this format. Data stored in databases is an example of structured data.
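To make the three categories concrete, here is a small, purely illustrative Python sketch; the record contents and field names are invented for demonstration and are not taken from any particular system.

# Purely illustrative Python records for the three forms of digital data
# (field names and values below are made up for demonstration).

# Structured: fits a fixed schema of rows and columns.
structured_record = {"student_id": 101, "name": "Asha", "marks": 87}

# Semi-structured: self-describing tags give some structure, but no rigid schema.
semi_structured_record = "<student><name>Asha</name><marks>87</marks></student>"

# Unstructured: free text with no data model at all.
unstructured_record = ("Asha scored well this semester and plans to take the "
                       "advanced analytics elective next year.")

for kind, value in [("structured", structured_record),
                    ("semi-structured", semi_structured_record),
                    ("unstructured", unstructured_record)]:
    print(kind, "->", value)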
The "Internet of Things" and its widely ultra-connected nature are leading to a burgeoning rise in
big data. There is no dearth of data for today's enterprise. On the contrary, they are mired in data
and quite deep at that. That brings us to a key question: data is widely available, but what is scarce is the ability to draw valuable insight from it.
Some examples of Big Data Analytics in different areas such as retail, IT infrastructure, and social media:
• Retail: As mentioned earlier, Big Data presents many opportunities to improve sales and
marketing analytics.
• An example of this is the U.S. retailer Target. After analyzing consumer purchasing behavior,
Target's statisticians determined that the retailer made a great deal of money from three main
life-event situations.
• Marriage, when people tend to buy many new products
• Divorce, when people buy new products and change their spending habits
• Pregnancy, when people have many new things to buy and an urgency to buy them.
This analysis helped Target manage its inventory, knowing that there would be demand for specific products and that it would likely vary by month over the coming nine- to ten-month cycles.
• IT infrastructure: The MapReduce paradigm is an ideal technical framework for many Big Data projects, which rely on large data sets with unconventional data structures.
• One of the main benefits of Hadoop is that it employs a distributed file system, meaning it can
use a distributed cluster of servers and commodity hardware to process large amounts of data.
Some of the most common examples of Hadoop implementations are in the social media space,
where Hadoop can manage transactions, give textual updates, and develop social graphs among
millions of users.
Twitter and Facebook generate massive amounts of unstructured data and use Hadoop and its
ecosystem of tools to manage this high volume.
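To make the MapReduce paradigm concrete, here is a minimal single-machine sketch in plain Python (not actual Hadoop code): a map step emits (word, 1) pairs and a reduce step sums the counts per word, which is the same division of work that Hadoop distributes across a cluster, each node processing its local blocks of the input.

from collections import defaultdict

def map_phase(documents):
    """Map: emit (word, 1) for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def reduce_phase(pairs):
    """Reduce: group the pairs by key (word) and sum the values."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

docs = [
    "big data needs distributed processing",
    "hadoop brings processing to the data",
]
print(reduce_phase(map_phase(docs)))
# e.g. {'big': 1, 'data': 2, 'processing': 2, ...}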
CHARACTERISTICS OF DATA
1. Composition: The composition of data deals with the structure of data, that is, the sources of
data, the granularity, the types, and the nature of data as to whether it is static or real-time
streaming.
2. Condition: The condition of data deals with the state of data, that is, "Can one use this data as
is for analysis?" or "Does it require cleansing for further enhancement and enrichment?"
3. Context: The context of data deals with "Where has this data been generated?", "Why was this data generated?", "How sensitive is this data?",
"What are the events associated with this data?" and so on. Small data (data as it existed prior to
the big data revolution) is about certainty. It is about known data sources; it is about no major
changes to the composition or context of data.
Most often we have answers to queries like why this data was generated, where and when it was
generated, exactly how we would like to use it, what questions will this data be able to answer,
and so on. Big data is about complexity. Complexity in terms of multiple and unknown datasets,
in terms of exploding volume, in terms of speed at which the data is being generated and the
speed at which it needs to be processed and in terms of the variety of data (internal or external,
behavioral or social) that is being generated.
1970s and before was the era of mainframes. The data was essentially primitive and structured.
Relational databases evolved in the 1980s and 1990s; this was the era of data-intensive applications. The World Wide Web (WWW) and the Internet of Things (IoT) have since led to an onslaught of structured, unstructured, and multimedia data.
CHALLENGES WITH BIG DATA
Data volume: Data today is growing at an exponential rate, and this high tide of data will continue to rise. The key questions are: "Will all this data be useful for analysis?", "Do we work with all of this data or only a subset of it?", "How will we separate the knowledge from the noise?", and so on.
Storage: Cloud computing is the answer to managing infrastructure for big data as far as cost-efficiency, elasticity, and easy upgrading/downgrading are concerned. However, it further complicates the decision to host big data solutions outside the enterprise.
Data retention: How long should one retain this data? Some data may be required for long-term decisions, but other data may quickly become irrelevant and obsolete.
Skilled professionals: In order to develop, manage and run those applications that generate
insights, organizations need professionals who possess a high-level proficiency in data sciences.
Other challenges: Other challenges of big data are with respect to capture, storage, search,
analysis, transfer and security of big data.
Visualization: Big data refers to datasets whose size is typically beyond the storage capacity of
traditional database software tools. There is no explicit definition of how big the data set should
be for it to be considered big data. Data visualization (computer graphics) is becoming popular as
a separate discipline. There are very few data visualization experts.
The more data we have for analysis, the greater will be the analytical accuracy and the greater
would be the confidence in our decisions based on these analytical findings. The analytical
accuracy will lead to a greater positive impact in terms of enhancing operational efficiencies, reducing cost and time, developing new products and new services, and optimizing existing services.
Operational, transactional, or day-to-day business data is gathered from Enterprise Resource Planning (ERP) systems, Customer Relationship Management (CRM) systems, legacy systems, and several third-party applications. The data from these sources may differ in format.
This data is then integrated, cleaned up, transformed, and standardized through the process of
Extraction, Transformation, and Loading (ETL).
The transformed data is then loaded into the enterprise data warehouse (available at the
enterprise level) or data marts (available at the business unit/ functional unit or business process
level).
Business intelligence and analytics tools are then used to enable decision making through ad-hoc queries, SQL, enterprise dashboards, data mining, Online Analytical Processing (OLAP), etc.
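As a rough illustration of the ETL flow described above, the sketch below uses only the Python standard library; the file name, field names, and table name are hypothetical, and a real enterprise load would use a dedicated ETL tool and a proper data warehouse rather than SQLite.

import csv
import sqlite3

# Extract: read operational data exported from a source system (hypothetical file).
with open("crm_customers.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Transform: clean and standardize (trim names, normalize country codes, drop bad rows).
cleaned = [
    {"name": r["name"].strip().title(), "country": r["country"].strip().upper()}
    for r in rows
    if r.get("name") and r.get("country")
]

# Load: write the standardized records into a warehouse-style table.
conn = sqlite3.connect("warehouse.db")
conn.execute("CREATE TABLE IF NOT EXISTS dim_customer (name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO dim_customer (name, country) VALUES (:name, :country)", cleaned
)
conn.commit()
conn.close()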
Following are the differences that one encounters when dealing with traditional BI and big data.
In a traditional BI environment, all the enterprise's data is housed in a central server, whereas in a big data environment data resides in a distributed file system. The distributed file system scales horizontally by scaling out (adding nodes) or scaling in (removing nodes), as compared to a typical database server that scales vertically.
In traditional BI, data is generally analyzed in an offline mode whereas in big data, it is analyzed
in both real-time streaming as well as in offline mode.
Traditional BI is about structured data, and it is here that data is taken to the processing functions (move data to code), whereas big data is about variety: structured, semi-structured, and unstructured data, and here the processing functions are taken to the data (move code to data).
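The "move code to data" idea can be sketched with PySpark, assuming Spark is installed and the input path is a hypothetical location in a distributed file system: the filter and aggregation logic is shipped to the nodes holding each partition, and only the small aggregated result returns to the driver. Contrast this with a traditional BI setup, where the full dataset would first be moved to the server that runs the query.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("move-code-to-data").getOrCreate()

# The file lives in a distributed file system; each node reads its own partitions.
events = spark.read.json("hdfs:///logs/clickstream/")   # hypothetical path

# This filter + aggregation is serialized and executed where the data resides;
# only the aggregated result is brought back to the driver.
daily_counts = (
    events.filter(events.event_type == "purchase")
          .groupBy("event_date")
          .count()
)
daily_counts.show()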
Big data technologies are widely associated with many other technologies, such as deep learning, machine learning, artificial intelligence (AI), and the Internet of Things (IoT), all of which they massively augment. In combination with these technologies, big data technologies focus on analyzing and handling large amounts of real-time data and batch data.
Before we start with the list of big data technologies, let us first discuss their broad classification. Big Data technology is primarily classified into the following two types:
Operational Big Data Technologies
This type of big data technology mainly includes the basic, day-to-day data that people used to process. Typically, operational big data includes data such as online transactions, social media activity, and data from any particular organization or firm, which is usually needed for analysis using software based on big data technologies. This data can also be referred to as raw data, used as the input for several Analytical Big Data Technologies.
Some specific examples of Operational Big Data Technologies are listed below:
○ Online ticket booking system, e.g., buses, trains, flights, and movies, etc.
○ Online trading or shopping from e-commerce websites like Amazon, Flipkart, Walmart,
etc.
○ Online data on social media sites, such as Facebook, Instagram, WhatsApp, etc.
Analytical Big Data Technologies
Analytical Big Data is commonly referred to as an improved version of Big Data Technologies. This type of big data technology is a bit more complicated than operational big data. Analytical big data is mainly used when performance criteria are involved, and important real-time business decisions are made based on reports created by analyzing real operational data. This means that the actual investigation of big data that is important for business decisions falls under this type of big data technology.
Some common examples of Analytical Big Data Technologies are listed below:
○ Medical health records where doctors can personally monitor the health status of an
individual
○ Space mission databases, where every piece of information about a mission is very important
We can categorize the leading big data technologies into the following four sections:
○ Data Storage
○ Data Mining
○ Data Analytics
○ Data Visualization
Big data infrastructure is what it sounds like: The IT infrastructure that hosts your “big data.”
(Keep in mind that what constitutes big data depends on a lot of factors; the data need not be
enormous in size to qualify as “big.”)
More specifically, big data infrastructure entails the tools and agents that collect data, the
software systems and physical storage media that store it, the network that transfers it, the
application environments that host the analytics tools that analyze it and the backup or archive
infrastructure that backs it up after analysis is complete.
Lots of things can go wrong with these various components. Below are the most common
problems you may experience that delay or prevent you from transforming big data into value.
Disk I/O bottlenecks
Disk I/O bottlenecks are one common source of delays in data processing. Fortunately, there are some tricks that you can use to minimize their impact.
One solution is to upgrade your data infrastructure to solid-state disks (SSDs), which typically run faster. Alternatively, you could use in-memory data processing, which is much faster than relying on conventional storage.
SSDs and in-memory storage are more costly, of course, especially when you use them at scale.
But that does not mean you can’t take advantage of them strategically in a cost-effective way:
Consider deploying SSDs or in-memory data processing for workloads that require the highest
speed, but sticking with conventional storage where the benefits of faster I/O won’t outweigh the
costs.
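As a rough sketch of this tiering idea using Spark (the dataset paths are hypothetical), a hot, frequently queried dataset can be pinned in memory while a colder one is allowed to spill to disk:

from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("tiered-storage").getOrCreate()

hot = spark.read.parquet("s3a://analytics/last_7_days/")    # hypothetical paths
cold = spark.read.parquet("s3a://analytics/last_5_years/")

# Hot data: keep in memory because many queries reuse it and latency matters.
hot.persist(StorageLevel.MEMORY_ONLY)

# Cold data: allow spilling to disk; slower I/O is acceptable for occasional scans.
cold.persist(StorageLevel.MEMORY_AND_DISK)

print(hot.count(), cold.count())   # the first actions materialize the cached data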
Lack of scalability
If your data infrastructure can’t increase in size as your data needs grow, it will undercut your
ability to turn data into value.
At the same time, of course, you don’t want to maintain substantially more big data infrastructure
than you need today just so that it’s there for the future. Otherwise, you will be paying for
infrastructure you’re not currently using, which is not a good use of money.
One way to help address this challenge is to deploy big data workloads in the cloud, where you
can increase the size of your infrastructure virtually instantaneously when you need it, without
paying for it when you don’t. If you prefer not to shift all of your big data workloads to the
cloud, you might also consider keeping most workloads on-premise, but having a cloud
infrastructure set up and ready to handle “spillover” workloads when they arise—at least until
you can create a new on-premise infrastructure to handle them permanently.
Network bottlenecks
If your data is large in size, transferring it across the network can take time—especially if
network transfers require using the public internet, where bandwidth tends to be much more
limited than it is on internal company networks.
Paying for more bandwidth is one way to mitigate this problem, but that will only get you so far
(and it will cost you). A better approach is to architect your big data infrastructure in a way that
minimizes the amount of data transfer that needs to occur over the network. You could do this by,
for example, using cloud-based analytics tools to analyze data that is collected in the cloud,
rather than downloading that data to an on-premise location first. (The same logic applies in
reverse: If your data is born or collected on-premise, analyze it there.)
Data transformation challenges
Getting data from the format in which it is born into the format that you need to analyze it or
share it with others can be very tricky. Most applications structure data in ways that work best for
them, with little consideration of how well those structures work for other applications or
contexts.
This is why data transformation is so important. Data transformation allows you to convert data
from one format to another.
When done incorrectly—which means manually and in ways that do not control for data
quality—data transformation can quickly cause more trouble than it is worth. But when you
automate data transformation and ensure the quality of the resulting data, you maximize your
data infrastructure’s ability to meet your big data needs, no matter how your infrastructure is
constructed.
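Below is a minimal sketch of automated transformation with a simple quality gate; the event format, field names, and rules are hypothetical. Semi-structured JSON events are flattened into uniform rows, and malformed records are set aside rather than silently corrupting the output.

import csv
import json

def transform(record):
    """Flatten one raw JSON event into the target row format, or return None if invalid."""
    try:
        return {
            "user_id": int(record["user"]["id"]),
            "event": record["event"].lower(),
            "amount": round(float(record.get("amount", 0)), 2),
        }
    except (KeyError, TypeError, ValueError):
        return None   # quality control: reject malformed records

raw_lines = [
    '{"user": {"id": 42}, "event": "PURCHASE", "amount": "19.99"}',
    '{"user": {}, "event": "purchase"}',          # missing id -> rejected
]

rows, rejected = [], []
for line in raw_lines:
    row = transform(json.loads(line))
    (rows if row else rejected).append(row or line)

with open("events_clean.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["user_id", "event", "amount"])
    writer.writeheader()
    writer.writerows(rows)

print(f"loaded {len(rows)} rows, rejected {len(rejected)}")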
Big data analytics is the often complex process of examining large and varied data sets - or big data - that have been generated by various sources such as eCommerce, mobile devices, social media and the Internet of Things (IoT). It involves integrating different data sources, transforming unstructured data into structured data, and generating insights from the data using specialized tools and techniques that spread data processing out over an entire network.

The amount of digital data that exists is growing at a fast pace, doubling every two years. Big data analytics is the solution that came with a different approach for managing and analyzing all of these data sources. While the principles of traditional data analytics generally still apply, the scale and complexity of big data analytics required the development of new ways to store and process the petabytes of structured and unstructured data involved.

The demand for faster speeds and greater storage capacities created a technological vacuum that was soon filled by new storage methods, such as data warehouses and data lakes, and nonrelational databases like NoSQL, as well as data processing and data management technologies and frameworks, such as open source Apache Hadoop, Spark, and Hive. Big data analytics takes advantage of advanced analytic techniques to analyze really big data sets that include structured, semi-structured and unstructured data, from various sources, and in different sizes from terabytes to zettabytes.
The Most Common Data Types Involved in Big Data Analytics Include:
● Web data. Customer level web behavior data such as visits, page views, searches,
purchases, etc.
● Text data. Data generated from sources of text including email, news articles, Facebook
feeds, Word documents, and more is one of the biggest and most widely used types of
unstructured data.
● Time and location, or geospatial data. GPS and cell phones, as well as Wi-Fi
connections, make time and location information a growing source of interesting data.
This can also include geographic data related to roads, buildings, lakes, addresses,
people, workplaces, and transportation routes, which have been generated from
geographic information systems.
● Real-time media. Real-time data sources can include real-time streaming or event-based
data.
● Smart grid and sensor data. Sensor data from cars, oil pipelines, windmill turbines, and
other sensors is often collected at extremely high frequency.
● Social network data. Unstructured text (comments, likes, etc.) from social network sites
like Facebook, LinkedIn, Instagram, etc. is growing. It is even possible to do link
analysis to uncover the network of a given user.
● Linked data. This type of data has been collected using standard Web technologies like HTTP, RDF, SPARQL, and URLs.
● Network data. Data related to very large social networks, like Facebook and Twitter, or
technological networks such as the Internet, telephone and transportation networks.
Big data analytics helps organizations harness their data and use advanced data science techniques and methods, such as natural language processing, deep learning, and machine learning, to uncover hidden patterns, unknown correlations, market trends, and customer preferences, identify new opportunities, and make more informed business decisions.
● Cost reduction. Cloud computing and storage technologies, such as Amazon Web
Services (AWS) and Microsoft Azure, as well as Apache Hadoop, Spark, and Hive can
help companies decrease their expenses when it comes to storing and processing large
data sets.
● Improved decision making. With the speed of Spark and in-memory analytics, combined
with the ability to quickly analyze new sources of data, businesses can generate
immediate and actionable insights needed to make decisions in real time.
● New products and services. With the help of big data analytics tools, companies can
more precisely analyze customer needs, making it easier to give customers what they
want in terms of products and services.
● Fraud detection. Big data analytics is also used to prevent fraud, mainly in the financial
services industry, but it is gaining importance and usage across all verticals.
The properties you should strive for in Big Data systems are as much about complexity as they
are about scalability. Not only must a Big Data system perform well and be resource-efficient, it
must be easy to reason about as well. Let’s go over each property one by one.
Robustness and fault tolerance
Building systems that “do the right thing” is difficult in the face of the challenges of distributed
systems. Systems need to behave correctly despite machines going down randomly, the complex
semantics of consistency in distributed databases, duplicated data, concurrency, and more. These
challenges make it difficult even to reason about what a system is doing. Part of making a Big
Data system robust is avoiding these complexities so that you can easily reason about the system.
Low latency reads and updates
The vast majority of applications require reads to be satisfied with very low latency, typically
between a few milliseconds to a few hundred milliseconds. On the other hand, the update latency
requirements vary a great deal between applications. Some applications require updates to
propagate immediately, but in other applications a latency of a few hours is fine. Regardless, you
need to be able to achieve low latency updates when you need them in your Big Data systems.
More importantly, you need to be able to achieve low latency reads and updates without
compromising the robustness of the system.
Scalability
Scalability is the ability to maintain performance in the face of increasing data or load by adding
resources to the system. The Lambda Architecture is horizontally scalable across all layers of the
system stack: scaling is accomplished by adding more machines.
Generalization
A general system can support a wide range of applications. Because the Lambda Architecture is
based on functions of all data, it generalizes to all applications, whether financial management
systems, social media analytics, scientific applications, social networking, or anything else.
Extensibility
You don’t want to have to reinvent the wheel each time you add a related feature or make a
change to how your system works. Extensible systems allow functionality to be added with a
minimal development cost.
Often a new feature or a change to an existing feature requires a migration of old data into a new
format. Part of making a system extensible is making it easy to do large-scale migrations. Being
able to do big migrations quickly and easily is core to the approach you’ll learn.
Ad hoc queries
Being able to do ad hoc queries on your data is extremely important. Nearly every large dataset
has unanticipated value within it. Being able to mine a dataset arbitrarily gives opportunities for
business optimization and new applications. Ultimately, you can’t discover interesting things to
do with your data unless you can ask arbitrary questions of it.
Minimal maintenance
Maintenance is a tax on developers. Maintenance is the work required to keep a system running
smoothly. This includes anticipating when to add machines to scale, keeping processes up and
running, and debugging anything that goes wrong in production.
The Lambda Architecture achieves this in part by pushing complexity out of the core components and into pieces of the system whose outputs are discardable after a few hours; the most complex components used, like read/write distributed databases, are in this layer, where outputs are eventually discardable.
Debuggability
A Big Data system must provide the information necessary to debug the system when things go
wrong. The key is to be able to trace, for each value in the system, exactly what caused it to have
that value.
“Debuggability” is accomplished in the Lambda Architecture through the functional nature of the
batch layer and by preferring to use recomputation algorithms when possible.
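As a tiny sketch of what the functional batch layer means in practice (the event records below are invented), the batch view is computed as a pure function of the immutable master dataset, so any suspicious value can be reproduced and traced simply by recomputing the view over the raw events.

from collections import Counter

# Immutable master dataset: raw events are only ever appended, never updated in place.
master_dataset = [
    {"user": "alice", "action": "login"},
    {"user": "bob",   "action": "login"},
    {"user": "alice", "action": "purchase"},
]

def batch_view(all_events):
    """Batch view = pure function of ALL data: logins per user, recomputable at any time."""
    return Counter(e["user"] for e in all_events if e["action"] == "login")

view = batch_view(master_dataset)
print(view)   # Counter({'alice': 1, 'bob': 1})

# Debugging: if a count looks wrong, recompute from the raw events and inspect them;
# the same inputs always produce the same output.
assert batch_view(master_dataset) == view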