Big Data 101
1. The Story of Big Data
In the old days, people travelled from one village to another on a horse-drawn cart. As time passed, villages became towns, people spread out, and the distance from one town to the next grew, so travelling between towns with luggage became a problem. Out of the blue, one smart fellow suggested that we should groom and feed a horse more to solve this problem. When I look at this solution, it is not that bad, but do you think a horse can become an elephant? I don't think so. Another smart guy said that instead of one horse pulling the cart, we should have four horses pull the same cart. What do you think of this solution? I think it is a fantastic one: now people can cover large distances in less time and even carry more luggage.
The same concept applies to Big Data. Until recently, we were fine storing data on our servers because the volume was fairly limited and the time needed to process it was acceptable. In today's technological world, however, data is growing too fast and people rely on it constantly; at the rate it is growing, it is becoming impossible to store and process the data on any single server.
Through this Big Data tutorial, let us explore the sources of Big Data, which traditional systems are failing to store and process.
2. Big Data Driving Factors
The quantity of data on planet Earth is growing exponentially for many reasons. Various sources and our day-to-day activities generate lots of data. With the advent of the web, the whole world has gone online, and every single thing we do leaves a digital trace. With smart objects going online, the data growth rate has increased rapidly. The major sources of Big Data are social media sites, sensor networks, digital images and videos, cell phones, purchase transaction records, web logs, medical records, archives, military surveillance, eCommerce, complex scientific research and so on. Together these sources produce quintillions of bytes of data every day. By 2020, data volumes were expected to reach around 40 zettabytes, which is equivalent to every single grain of sand on the planet multiplied by seventy-five.
3. What is Big Data?
Big Data is a term used for collections of data sets so large and complex that they are difficult to store and process using available database management tools or traditional data processing applications. The challenge includes capturing, curating, storing, searching, sharing, transferring, analyzing and visualizing this data.
4. Big Data Characteristics
The five characteristics that define Big Data are Volume, Velocity, Variety, Veracity, and Value.
4.1 Volume:
Volume refers to the amount of data, which is growing day by day at a very fast pace. The size of data generated by humans, machines and their interactions on social media alone is massive. The name Big Data itself refers to this enormous size: vast volumes of data generated daily from many sources, such as business processes, machines, social media platforms, networks, human interactions and many more.
Facebook alone generates approximately a billion messages, records around 4.5 billion clicks of the "Like" button, and receives more than 350 million new posts each day. Big Data technologies are designed to handle data at this scale.
4.2 Velocity:
Velocity is defined as the pace at which different sources generate data every day. This flow of data is massive and continuous. There are 1.03 billion daily active users (Facebook DAU) on mobile as of now, an increase of 22% year-over-year. This shows how fast the number of users on social media is growing and how fast data is being generated daily. If you can handle the velocity, you can generate insights and make decisions based on real-time data.
Velocity also covers the dynamics of incoming data: the speed of the incoming streams, their rate of change, and bursts of activity. A key requirement of Big Data systems is to make this fast-moving data available for analysis quickly.
Big Data velocity deals with the speed at which data flows in from sources such as application logs, business processes, networks, social media sites, sensors and mobile devices.
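To make the idea of handling velocity concrete, here is a minimal Python sketch of a sliding-window event counter, the kind of primitive a real-time pipeline uses to track how fast data is arriving. The event source is simulated; a real system would read from a log stream or message queue.

    import time
    from collections import deque

    window = deque()  # timestamps of recently seen events

    def record_event(now, window_s=60.0):
        """Record one event; return how many arrived in the last window_s seconds."""
        window.append(now)
        while window and window[0] < now - window_s:
            window.popleft()  # drop events older than the window
        return len(window)

    # Simulate a small burst of incoming events.
    for _ in range(5):
        print(record_event(time.time()))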
4.3 Variety:
As there are many sources contributing to Big Data, the types of data they generate differ. Data can be structured, semi-structured or unstructured, so a wide variety of data is generated every day. Earlier, we used to get data from spreadsheets and databases; now the data comes in the form of images, audio, videos, sensor readings and so on. This variety of largely unstructured data creates problems in capturing, storing, mining and analyzing the data.
Structured data: follows a fixed schema with all the required columns and is stored in tabular form, typically in a relational database management system.
Quasi-structured data: textual data with inconsistent formats that can be structured with some effort, time and the right tools (web clickstream data is a common example).
4.4 Veracity:
Veracity refers to data in doubt, i.e. the uncertainty of available data due to its inconsistency and incompleteness. A data set may have missing values, and some values may be hard to accept, for example a minimum value of 15,000 in a field where that is physically impossible. This inconsistency and incompleteness is veracity.
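As a minimal sketch of how such veracity problems are caught in practice, the following Python/pandas snippet checks a small, hypothetical table for missing values and physically implausible entries; the column names and thresholds are illustrative assumptions, not from the original tutorial.

    import pandas as pd

    # Hypothetical sensor readings with two quality problems:
    # a missing value and an implausible minimum temperature.
    df = pd.DataFrame({
        "device_id": [101, 102, 103, 104],
        "min_temp_c": [12.0, None, 15000.0, 9.5],
    })

    # Incompleteness: count missing values per column.
    print(df.isna().sum())

    # Inconsistency: flag rows outside a plausible physical range.
    suspect = df[(df["min_temp_c"] < -90) | (df["min_temp_c"] > 60)]
    print(suspect)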
Available data can sometimes get messy and may be difficult to trust. With many forms of Big Data, quality and accuracy are difficult to control; think of Twitter posts with hashtags, abbreviations, typos and colloquial speech. Volume is often the reason behind the lack of quality and accuracy in the data.
Because of this uncertainty of data, 1 in 3 business leaders don't trust the information they use to make decisions.
It was found in a survey that 27% of respondents were unsure of how much of their data was
inaccurate.
Poor data quality costs the US economy around $3.1 trillion a year.
4.5 Value:
After discussing Volume, Velocity, Variety and Veracity, there is one more V to take into account when looking at Big Data: Value. It is all well and good to have access to Big Data, but unless we can turn it into value it is useless. Turning it into value means asking: is it adding to the benefit of the organizations analyzing it? Is the organization working on Big Data achieving a high return on investment (ROI)? Unless it adds to their profits, working on Big Data is useless.
5. Types of Big Data
As discussed under Variety, different types of data are generated every day. These fall into three broad types:
Structured
Semi-Structured
Unstructured
5.1 Structured
The data that can be stored and processed in a fixed format is called structured data. Data stored in a relational database management system (RDBMS) is one example of structured data. It is easy to process structured data because it has a fixed schema. Structured Query Language (SQL) is often used to manage this kind of data.
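As a small illustration of how a fixed schema makes structured data easy to query, here is a minimal Python sketch using the standard-library sqlite3 module; the table and rows are made up for the example, and an in-memory database stands in for a production RDBMS.

    import sqlite3

    # An in-memory database stands in for a production RDBMS.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    conn.execute("INSERT INTO customers (name, city) VALUES ('Alice', 'Pune')")
    conn.execute("INSERT INTO customers (name, city) VALUES ('Bob', 'Delhi')")

    # The fixed schema makes declarative querying straightforward.
    for row in conn.execute("SELECT name FROM customers WHERE city = 'Pune'"):
        print(row)  # ('Alice',)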
5.2 Semi-Structured
Semi-structured data is data that does not have the formal structure of a data model (i.e. a table definition in a relational DBMS) but nevertheless has some organizational properties, such as tags and other markers that separate semantic elements and make it easier to analyze. XML files and JSON documents are examples of semi-structured data.
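A brief sketch of the idea in Python: the JSON document below has no table schema, yet its keys act as the tags and markers just described, so the standard json module can parse it directly. The document itself is invented for the example.

    import json

    # No fixed table schema, but keys mark the semantic elements.
    doc = '{"user": "alice", "tags": ["big-data", "tutorial"], "age": 30}'
    record = json.loads(doc)

    print(record["user"])       # alice
    print(len(record["tags"]))  # 2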
5.3 Unstructured
Data whose form is unknown, which cannot be stored in an RDBMS and cannot be analyzed unless it is transformed into a structured format, is called unstructured data. Text files and multimedia content such as images, audio and video are examples of unstructured data. Unstructured data is growing quicker than the other types; experts say that around 80 percent of the data in an organization is unstructured.
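To illustrate the "transform into a structured format" step on a tiny scale, here is a minimal Python sketch that imposes a simple bag-of-words structure on raw free text; the text is invented, and real pipelines would of course work at far larger scale.

    from collections import Counter

    # Raw free text has no schema; impose one by counting words.
    text = "big data is big and data is everywhere"
    counts = Counter(text.split())

    print(counts.most_common(3))  # [('big', 2), ('data', 2), ('is', 2)]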
6. The Evolution of Big Data
The evolution of Big Data can roughly be subdivided into three main phases. Each phase was driven by technological advancements and has its own characteristics and capabilities. To understand the context of Big Data today, it is important to understand how each of these phases contributed to the modern meaning of Big Data.
Data analysis, data analytics and Big Data originate from the longstanding domain of database management, which relies heavily on the storage, extraction and optimization techniques common to Relational Database Management Systems (RDBMS). The techniques used in these systems, such as Structured Query Language (SQL) and the extraction, transformation and loading (ETL) of data, started to professionalize in the 1970s.
Database management and data warehousing systems are still fundamental components of modern-day Big Data solutions. The ability to quickly store and retrieve data from databases, or to find information in large data sets, is still a core requirement for the analysis of Big Data. Relational database management technology and other data processing technologies developed during this phase are still strongly embedded in the Big Data solutions of leading IT vendors such as Microsoft, Google and Amazon. A number of core technologies and characteristics of this first phase in the evolution of Big Data are outlined in figure 3.
From the early 2000s, the internet and corresponding web applications started to generate
tremendous amounts of data. In addition to the data that these web applications stored in relational
databases, IP-specific search and interaction logs started to generate web based unstructured data.
These unstructured data sources provided organizations with a new form of knowledge: insights into
the needs and behaviours of internet users. With the expansion of web traffic and online stores,
companies such as Yahoo, Amazon and eBay started to analyse customer behaviour by analysing
click-rates, IP-specific location data and search logs, opening a whole new world of possibilities.
From a technical point of view, HTTP-based web traffic introduced a massive increase in semi-
structured and unstructured data (further discussed in chapter 1.6). Besides the standard structured
data types, organizations now needed to find new approaches and storage solutions to deal with
these new data types in order to analyse them effectively. The arrival and growth of social media data greatly increased the need for tools, technologies and analytics techniques that could extract meaningful information from this unstructured data. New technologies, such as network analysis, web mining and spatial-temporal analysis, were developed specifically to analyse these large quantities of web-based unstructured data effectively.
The third and current phase in the evolution of Big Data is driven by the rapid adoption of mobile
technology and devices, and the data they generate. The number of mobile devices and tablets
surpassed the number of laptops and PCs for the first time in 2011. By 2020, an estimated 10 billion devices were connected to the internet, and all of these devices generate data every single second of the day.
Mobile devices not only give the possibility to analyse behavioural data (such as clicks and search
queries), but they also provide the opportunity to store and analyse location-based GPS data.
Through these mobile devices and tablets, it is possible to track movement, analyse physical
behaviour and even health-related data (for example the number of steps you take per day). And
because these devices are connected to the internet almost every single moment, the data they generate provides a real-time and unprecedented picture of people's behaviour.
Simultaneously, the rise of sensor-based internet-enabled devices is increasing the creation of data
to even greater volumes. Famously coined the ‘Internet of Things’ (IoT), millions of new TVs,
thermostats, wearables and even refrigerators are connected to the internet every single day,
providing massive additional data sets. Since this development is not expected to stop anytime soon,
it could be stated that the race to extract meaningful and valuable information out of these new
data sources has only just begun. A summary of the evolution of Big Data and its key characteristics
per phase is outlined in figure 3.
7. Big Data Facts and Figures
According to a study conducted by IBM, 90% of all data in the world was generated in just the last two years.
Recent surveys by Gartner found that 89% of companies are investing in Big Data to gain a
competitive edge.
The National Small Business Association found that 63% of small businesses use Big Data to
improve operations.
Big Data also includes unstructured data sets: 80% of all data is unstructured and still requires analysis.
McKinsey Co. found that Big Data can lead to a 2-3% increase in productivity and a 20-25%
reduction in costs.
More than 5 billion people are calling, texting, tweeting and browsing on mobile phones
worldwide.
YouTube users upload 48 hours of new video every minute of the day.
Amazon handles 15 million customer clickstream records per day to recommend products.
294 billion emails are sent every day. Email services analyze this data to filter out spam.
Modern cars have close to 100 sensors monitoring fuel level, tire pressure and so on; each vehicle generates a lot of sensor data.
8. Big Data Applications
We cannot talk about data without talking about the people who benefit from Big Data applications. Almost every industry today leverages Big Data applications in one way or another.
Smarter Healthcare: Making use of the petabytes of patient data, organizations can extract meaningful information and build applications that predict a patient's deteriorating condition in advance.
Telecom: The telecom sector collects information, analyzes it and provides solutions to different problems. Using Big Data applications, telecom companies have been able to significantly reduce data packet loss, which occurs when networks are overloaded, and thus provide a seamless connection to their customers.
Retail: Retail has some of the tightest margins and is one of the greatest beneficiaries of Big Data. The beauty of using Big Data in retail is understanding consumer behavior: Amazon's recommendation engine, for example, provides suggestions based on the consumer's browsing history.
Traffic control: Traffic congestion is a major challenge for many cities globally. Effective use
of data and sensors will be key to managing traffic better as cities become increasingly
densely populated.
Manufacturing: Analyzing big data in the manufacturing industry can reduce component
defects, improve product quality, increase efficiency, and save time and money.
Search Quality: Every time we are extracting information from Google, we are
simultaneously generating data for it. Google stores this data and uses it to improve its
search quality.
9. Big Data Challenges
Someone has rightly said: "Not everything in the garden is rosy!" So far in this Big Data tutorial, I have only shown you the rosy picture of Big Data. But if it were so easy to leverage Big Data, don't you think every organization would invest in it? Let me tell you up front, that is not the case. Several challenges come along when you are working with Big Data.
Now that you are familiar with Big Data and its various features, this section of the Big Data tutorial will shed some light on the major challenges of working with Big Data.
1. Data Quality — The problem here is the fourth V, Veracity. The data is often messy, inconsistent and incomplete. Dirty data costs companies in the United States $600 billion every year.
2. Discovery — Finding insights in Big Data is like finding a needle in a haystack. Analyzing petabytes of data with extremely powerful algorithms to find patterns and insights is very difficult.
3. Storage — The more data an organization has, the more complex the problems of managing
it can become. The question that arises here is “Where to store it?”. We need a storage
system which can easily scale up or down on-demand.
4. Analytics — In the case of Big Data, most of the time we are unaware of the kind of data we
are dealing with, so analyzing that data is even more difficult.
5. Security — Since the data is huge in size, keeping it secure is another challenge. It includes
user authentication, restricting access based on a user, recording data access histories,
proper use of data encryption etc.
6. Lack of Talent — There are plenty of Big Data projects in major organizations, but building a sophisticated team of developers, data scientists and analysts who also have a sufficient amount of domain knowledge is still a challenge.
10. Hadoop as a Solution
We have a savior to deal with Big Data challenges: it's Hadoop. Hadoop is an open-source, Java-based framework that supports the storage and processing of extremely large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation.
With its distributed processing, Hadoop handles large volumes of structured and unstructured data more efficiently than a traditional enterprise data warehouse. Hadoop makes it possible to run applications on systems with thousands of commodity hardware nodes and to handle thousands of terabytes of data. Organizations are adopting Hadoop because it is open-source software and can run on commodity hardware (even your personal computer). The initial cost savings are dramatic, as commodity hardware is very cheap, and as organizational data grows you can add more commodity hardware on the fly to store it; hence, Hadoop proves to be economical.
Additionally, Hadoop has a robust Apache community behind it that continues to contribute to its
advancement.
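To give a feel for Hadoop's MapReduce model, here is the classic word-count example written as a pair of Hadoop Streaming scripts in Python. This is a minimal sketch: Hadoop Streaming lets any program that reads stdin and writes stdout act as a mapper or reducer, and Hadoop itself distributes the work and sorts the mapper output by key before it reaches the reducer.

    #!/usr/bin/env python3
    # mapper.py — emit a (word, 1) pair for every word on stdin.
    import sys

    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

    #!/usr/bin/env python3
    # reducer.py — sum the counts per word; input arrives sorted by word.
    import sys

    current_word, current_count = None, 0
    for line in sys.stdin:
        word, count = line.split("\t", 1)
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                print(f"{current_word}\t{current_count}")
            current_word, current_count = word, int(count)
    if current_word is not None:
        print(f"{current_word}\t{current_count}")

The pair can be tested locally with an ordinary shell pipeline (cat input.txt | python3 mapper.py | sort | python3 reducer.py) before being submitted to a cluster via the Hadoop Streaming JAR.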
11. Advantages of Big Data
Big Data has several advantages that make it a valuable asset for organizations in various industries. Some of the advantages of Big Data include:
1. Improved decision-making: Big Data provides organizations with access to vast amounts of
data, allowing them to make more informed and data-driven decisions. By analyzing Big
Data, organizations can identify trends, patterns, and insights that would be difficult or
impossible to discern from smaller datasets.
2. Increased efficiency and productivity: Big Data technologies enable organizations to process
and analyze data more quickly and accurately. This can help organizations to optimize their
operations, reduce waste and inefficiencies, and increase productivity.
3. Better customer insights: Big Data can provide organizations with a more complete and
detailed understanding of their customers' behaviors, preferences, and needs. This can help
organizations to improve their marketing and customer engagement strategies, leading to
higher customer satisfaction and loyalty.
4. Enhanced product and service innovation: Big Data can provide organizations with insights
into emerging trends, consumer preferences, and market opportunities, which can help to
drive product and service innovation. By leveraging Big Data, organizations can develop
products and services that better meet customer needs and preferences.
5. Cost savings: By improving efficiency and productivity, Big Data can help organizations to
reduce costs and increase profitability. For example, Big Data can be used to optimize supply
chain operations, reduce inventory costs, and improve resource allocation.
Overall, the advantages of Big Data can be significant, and organizations that effectively manage and
analyze their data assets can gain a competitive advantage in their respective industries. However, it
is important to note that working with Big Data also presents significant challenges, including the
need for specialized expertise, tools, and infrastructure to manage and analyze large datasets.
12. Big Data Tools
There are many tools available for managing and analyzing Big Data, each with its own strengths and
weaknesses. Some popular Big Data tools include:
1. Apache Hadoop: Apache Hadoop is an open-source software framework that is widely used
for distributed storage and processing of large datasets. It provides a scalable and fault-
tolerant system for storing and processing data, and it includes several tools for data
processing and analysis, such as Hadoop Distributed File System (HDFS) and MapReduce.
2. Apache Spark: Apache Spark is an open-source data processing engine that is designed for high-speed data processing and analytics. It provides a unified analytics engine for data processing, machine learning, and graph processing, and it supports multiple programming languages, including Java, Python, and Scala (see the first sketch after this list).
3. NoSQL databases: NoSQL databases are a category of databases that are designed for handling unstructured and semi-structured data. They provide a flexible and scalable system for storing and retrieving data, and the category includes several popular databases such as MongoDB, Couchbase, and Apache CouchDB (see the second sketch after this list).
4. Data visualization tools: Data visualization tools are used for creating visual representations of data, such as charts, graphs, and maps. They provide an effective way to communicate insights and trends to stakeholders and decision-makers, and they include popular tools such as Tableau, D3.js, and QlikView (see the third sketch after this list).
5. Machine learning libraries: Machine learning libraries are used for developing and deploying machine learning models that can be used for a variety of applications, such as predictive analytics, natural language processing, and computer vision. Popular machine learning libraries include TensorFlow, Scikit-learn, and Keras (see the fourth sketch after this list).
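First, a minimal PySpark word-count sketch for item 2 (Apache Spark). It runs a local session; the input path is hypothetical, and a real deployment would point the session at a cluster.

    from pyspark.sql import SparkSession

    # A local Spark session; in production this would target a cluster.
    spark = SparkSession.builder.appName("wordcount").master("local[*]").getOrCreate()

    # Count words in a text file (the path is hypothetical).
    lines = spark.read.text("logs.txt")
    words = lines.rdd.flatMap(lambda row: row.value.split())
    counts = words.map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)

    print(counts.take(5))
    spark.stop()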
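Second, a sketch for item 3 (NoSQL databases) using MongoDB through the pymongo driver. It assumes a MongoDB server is running locally; the database, collection and documents are invented for the example. Note how the two documents need not share a schema.

    from pymongo import MongoClient

    # Assumes a local MongoDB server; connection details are illustrative.
    client = MongoClient("mongodb://localhost:27017")
    db = client["tutorial"]

    # Documents in the same collection need not share a schema.
    db.events.insert_one({"user": "alice", "action": "click", "tags": ["promo"]})
    db.events.insert_one({"user": "bob", "ip": "10.0.0.7"})

    print(db.events.find_one({"user": "alice"}))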
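Third, a sketch for item 4 (data visualization). The tools named in that item are interactive products, so as a stand-in this example uses matplotlib, a common Python plotting library, with made-up daily event counts.

    import matplotlib.pyplot as plt

    # Made-up daily event counts.
    days = ["Mon", "Tue", "Wed", "Thu", "Fri"]
    events = [120, 340, 290, 410, 380]

    plt.bar(days, events)
    plt.xlabel("Day")
    plt.ylabel("Events")
    plt.title("Daily event volume")
    plt.show()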
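Finally, a sketch for item 5 (machine learning libraries) using scikit-learn. Synthetic data stands in for a real feature set; the point is only the fit/score workflow these libraries share.

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Synthetic data stands in for a real Big Data feature set.
    X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print(f"test accuracy: {model.score(X_test, y_test):.2f}")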
These are just a few examples of the many Big Data tools available today. Choosing the right tool for
a given use case depends on several factors, such as the size and complexity of the data, the desired
analysis or processing capabilities, and the available resources and expertise.
13. Big Data Job Types
There are various job types related to Big Data, depending on the specific skills and expertise required. Some of the common Big Data job types include:
1. Data Scientist: This job involves analyzing and interpreting complex data sets to identify
patterns and insights, and using them to develop predictive models and machine learning
algorithms.
2. Data Analyst: This job involves collecting, cleaning, and processing large data sets to derive
insights and trends, and presenting them in an understandable format to business
stakeholders.
3. Big Data Engineer: This job involves designing and building scalable data architectures and
pipelines that can process and manage large volumes of data from various sources.
4. Data Architect: This job involves designing and maintaining the overall data architecture of
an organization, including data models, schemas, and metadata.
5. Business Intelligence Analyst: This job involves designing and developing dashboards and
reports that help businesses make data-driven decisions.
6. Database Administrator: This job involves managing and maintaining databases, ensuring
their reliability, security, and scalability.
7. Machine Learning Engineer: This job involves designing and building machine learning
models and systems that can learn and improve over time.
8. Data Warehouse Developer: This job involves designing and building data warehouses,
which are central repositories of data used for reporting and analysis.
9. Data Mining Engineer: This job involves using machine learning and statistical techniques to
extract insights and patterns from large data sets.
10. Data Visualization Specialist: This job involves designing and creating visual representations
of data, such as charts and graphs, to help stakeholders understand complex data sets.