Big Data Analytics Lesson 1

Lesson 1: Overview of Big Data Analysis

1.1. Introduction
1.2. Definition of terms
1.2.1. Big Data
1.2.2. Data Science
1.2.3. Data Analysis
1.2.4. Machine Learning
1.2.5. Database Management Systems
1.2.6. Data warehouse
1.3. Characteristics of Big Data
1.3.1. Volume
1.3.2. Velocity
1.3.3. Variety
1.3.4. Veracity
1.4. Types of Big data analytics
1.4.1. Diagnostic analytics
1.4.2. Descriptive analytics
1.4.3. Prescriptive analytics
1.4.4. Predictive analytics
1.5. Tools used in Big Data analysis
1.5.1. Apache Hadoop
1.5.2. MapReduce
1.5.3. HDFS
1.5.4. Hive
1.5.5. Pig
1.6. Traditional versus Big data business Approach
1.6.1. Relational Databases (SQL)
1.6.2. Schemaless and Column-oriented Databases (NoSQL)
1.7. Opportunities in Big data Analysis
Lesson 1: Review Questions

Compiled by: Karari E.K email: [email protected] 1

1.1. Introduction
The volume of data that organizations have to deal with has exploded over the past decade, while the
price of data storage has steadily fallen. Private companies and research institutions capture
terabytes of data about their users' interactions, business, and social media activity, as well as from
sensors in devices such as mobile phones and automobiles. The challenge of this era is to make sense
of this sea of data. This is where big data analytics comes into the picture.

Big Data Analytics largely involves collecting data from different sources, merging it so that it can
be consumed by analysts, and finally delivering data products that are useful to the organization's
business.

The process of converting large amounts of unstructured raw data, retrieved from different sources, into
a data product useful to organizations forms the core of Big Data Analytics.

1.2. Definition of terms

This unit uses a number of terms that we need to be familiar with. Some of them are described below:

1.2.1. Big Data
Big data is the collective name for the large volume of recorded digital data and its continuing
growth. The aim is to convert this stream of data into valuable information for the company.

1.2.2. Data Science
Data science is the process of deriving knowledge and insights from a huge and diverse set of data
through organizing, processing and analyzing the data. It involves many different disciplines, such as
mathematical and statistical modelling, extracting data from its source and applying data visualization
techniques.

1.2.3. Data Analysis
Data Analysis is a process of inspecting, cleaning, transforming and modeling data with the goal of
discovering useful information, suggesting conclusions and supporting decision-making.

1.2.4. Machine Learning
Machine Learning (ML) is the field of computer science that enables computer systems to make sense of
data in much the same way as human beings do. In simple words, ML is a type of artificial intelligence
that extracts patterns from raw data by using an algorithm or method. The main focus of ML is to allow
computer systems to learn from experience without being explicitly programmed and without human
intervention.

1.2.5. Database Management Systems

A Database Management System (DBMS) is defined as the software system that allows users to define,
create, maintain and control access to the database. A DBMS makes it possible for end users to create,
read, update and delete data in a database. It is a layer between programs and data.

1.2.6. Data warehouse
A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data.
This data helps analysts to make informed decisions in an organization.

1.3. Characteristics of Big Data

Big data is characterized by the four V’s. These V’s stand for the four dimensions of Big Data: Volume,
Velocity, Variety and Veracity.

1.3.1. Volume
Big Data is large in volume. One widely quoted estimate puts the amount of data created every day at
about 2.5 quintillion bytes (roughly 2.5 billion gigabytes), and that figure will only increase. As the
volume grows so rapidly, so does the need for new database management systems and IT employees.
Millions of new IT jobs are expected to be created in the next few years to accommodate the Big Data
flow.

1.3.2. Velocity
Velocity refers to the enormous speed with which data is generated and processed. Until a few years
ago, it took a while to process the right data and to surface the right information. Today, data is
available in real time. This is a consequence not only of the speed of the internet, but also of the
presence of Big Data itself: the more data we create, the more methods are needed to monitor it, and
the more data is monitored. This creates a self-reinforcing cycle.

1.3.3. Variety
Variety refers to the many forms that data takes, and it goes hand in hand with high speed and
considerable volume. Smart IT solutions are available today for all sectors, from the medical world to
construction and business. Consider, for example, the electronic patient records in healthcare, which
contribute many trillions of gigabytes of data. As internet access reaches more parts of the world,
both volume and variety will only increase.

1.3.4. Veracity
How truthful Big Data is remains a difficult point. Data quickly becomes outdated, and the information
shared via the internet and social media is not necessarily correct. Many managers and directors in
the business community do not dare to make decisions based on Big Data. Data scientists and IT
professionals have their hands full organizing and accessing the right data, and it is very important
that they find a good way to do this. If Big Data is organized and used in the right way, it can be of
great value in our lives, from predicting business trends to preventing disease and crime.

1.4. Types of Big data analytics

There are four main types of big data analytics: diagnostic, descriptive, prescriptive, and predictive
analytics. They use various tools for processes such as data mining, cleaning, integration and
visualization to improve the analysis of data and to ensure that the company benefits from the data it
gathers.

1.4.1. Diagnostic analytics
Diagnostic analytics is one of the more advanced types of big data analytics that you can use to
investigate data and content. Through this type of analytics, you use the insight gained to answer the
question, “Why did it happen?” By analyzing data, you can understand the reasons for certain behaviors
and events related to the company you work for, its customers, employees, products, and more.

Let’s say there has been a drastic change in a product’s sales even though you have not made any
marketing changes to it. You would use diagnostic analytics to identify this anomaly and find the causal
relationship behind the change. Some tools and techniques used for such a task include searching for
patterns in the data sets, filtering the data, using probability theory, regression analysis, and more.
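
The sales scenario above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the weekly sales figures, the out-of-stock counts, the z-score threshold of 2.0 and the function names are all hypothetical and are not part of the lesson. The sketch first flags the unusual week and then checks how strongly a candidate explanation moves together with sales.

```python
from statistics import mean, pstdev

# Hypothetical weekly unit sales for one product; the sixth week holds the sudden drop.
weekly_sales = [120, 118, 125, 122, 119, 64, 121, 123]
# Hypothetical candidate explanation: days the product was out of stock in each week.
stockout_days = [0, 0, 0, 0, 0, 4, 0, 0]


def find_anomalies(values, threshold=2.0):
    """Return the indexes whose z-score magnitude exceeds the threshold."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


anomalous_weeks = find_anomalies(weekly_sales)       # which weeks look unusual
correlation = pearson(weekly_sales, stockout_days)   # does the candidate factor track sales?

print("Anomalous week indexes:", anomalous_weeks)
print("Correlation between sales and stock-outs:", round(correlation, 3))
```

A strong correlation only points to a candidate cause. Confirming a causal relationship still needs domain knowledge or further analysis.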

1.4.2. Descriptive analytics
Descriptive analytics is one of the most common forms of analytics that companies use to stay updated
on current trends and the company’s operational performance. It is one of the first steps in analyzing
raw data: it performs simple mathematical operations and produces statements about samples and
measurements. After you identify trends and insights with descriptive analytics, you can use the other
types of analytics to learn more about what causes those trends.

You will need to use descriptive analytics when dealing with finance, production, and sales. Some tasks
that require this type of analytics include the production of financial reports and metrics, surveys, social
media initiatives, and other business-related assignments.
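
As a minimal sketch of the "simple mathematical operations" that descriptive analytics relies on, the Python snippet below summarizes a small set of figures using the standard library. The monthly sales numbers and the `describe` helper are hypothetical and serve only as an illustration.

```python
from statistics import mean, median, stdev


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "total": sum(values),
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
        "stdev": stdev(values),  # sample standard deviation
    }


# Hypothetical monthly sales figures.
monthly_sales = [1200, 1350, 1280, 1500, 1420, 1610]
summary = describe(monthly_sales)

for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```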

1.4.3. Prescriptive analytics
Prescriptive analytics takes the results from descriptive and predictive analysis and finds solutions for
optimizing business practices through various simulations and techniques. It uses the insight from data
to suggest what the best step forward would be for the company.

Google is one of the many companies that use this type of analytics. They made use of it when designing
their self-driving cars. These cars analyze data in real-time and make decisions based on prescriptive
analytics.
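
One simple way to picture "suggesting the best step forward" is to score a handful of candidate actions against forecast scenarios and recommend the one with the highest expected payoff. The sketch below assumes a hypothetical stock-ordering decision; the demand scenarios, probabilities, prices and function names are invented for illustration and are not drawn from the lesson.

```python
# Hypothetical output of a predictive step: demand in units -> probability.
demand_scenarios = {80: 0.2, 100: 0.5, 120: 0.3}

UNIT_COST = 6.0    # assumed cost of buying one unit
UNIT_PRICE = 10.0  # assumed selling price of one unit


def expected_profit(order_qty, scenarios):
    """Average profit of ordering order_qty units over all demand scenarios."""
    total = 0.0
    for demand, probability in scenarios.items():
        sold = min(order_qty, demand)  # cannot sell more than was ordered
        total += probability * (sold * UNIT_PRICE - order_qty * UNIT_COST)
    return total


def recommend_order(candidates, scenarios):
    """Pick the candidate order quantity with the highest expected profit."""
    return max(candidates, key=lambda qty: expected_profit(qty, scenarios))


best_order = recommend_order([80, 100, 120], demand_scenarios)
print("Recommended order quantity:", best_order)
```

Real prescriptive systems usually search far larger action spaces with optimization or simulation techniques, but the principle of ranking actions by expected outcome is the same.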

1.4.4. Predictive analytics
As the name suggests, this type of data analytics is all about making predictions about future outcomes
based on insight from data. In order to get the best results, it uses many sophisticated predictive tools
and models such as machine learning and statistical modeling.

Predictive analytics is one of the most widely used types of analytics today.
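
A very small example of statistical modeling for prediction is fitting a straight-line trend to past values and extending it one step ahead. The sketch below uses ordinary least squares written out by hand. The sales figures and the `fit_line` helper are hypothetical, and a real project would normally use a statistics or machine learning library instead.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical sales for months 1 to 5.
months = [1, 2, 3, 4, 5]
sales = [102, 108, 121, 129, 140]

slope, intercept = fit_line(months, sales)
forecast_month_6 = slope * 6 + intercept  # extend the trend to the next month

print(f"Trend: {slope:.1f} units per month, forecast for month 6: {forecast_month_6:.1f}")
```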

1.5. Tools used in Big Data analysis

In this section, we discuss various tools used in big data analysis.

1.5.1. Apache Hadoop
Apache Hadoop is one of the main supporting elements of Big Data technologies. It simplifies the
processing of large amounts of structured or unstructured data at low cost. Hadoop is an open source
project from Apache that has been continuously improved over the years. "Hadoop is basically a set of
software libraries and frameworks to manage and process large amounts of data from a single server to
thousands of machines. It provides an efficient and powerful error detection mechanism based on the
application layer rather than relying upon hardware."

1.5.2. MapReduce
MapReduce was introduced by Google, originally to build large web search indexes. It is a framework for
writing applications that process large amounts of structured or unstructured data across a cluster of
machines. MapReduce takes the query and breaks it into parts that run on multiple nodes. Distributing
the processing in this way makes it easier to handle large amounts of data, because the data is divided
among several different machines. Hadoop MapReduce is a software framework for easily writing
applications that process large data sets in a highly fault-tolerant manner. More tutorials and a
getting started guide can be found in the Apache documentation.
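
The map, shuffle and reduce steps can be illustrated with the classic word-count example. The sketch below runs in a single Python process and only mimics the programming model; it does not use Hadoop, and the function names and sample documents are hypothetical. On a real cluster, each phase would be distributed across many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence over a list of documents."""
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

Because each document is mapped independently and each key is reduced independently, both phases can run in parallel on different machines, which is what makes the model suitable for very large data sets.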

1.5.3. HDFS
HDFS (Hadoop Distributed File System) is a Java-based file system that is used to store structured or
unstructured data across large clusters of distributed servers. HDFS places no restrictions or rules on
the data it stores: the data can be either fully unstructured or purely structured. In HDFS, the work
of making sense of the data is left entirely to the developer's code. HDFS provides a highly
fault-tolerant environment and can be deployed on low-cost hardware. HDFS is now part of the Apache
Hadoop project; more information and an installation guide can be found in the Apache HDFS
documentation.

1.5.4. Hive
Hive was originally developed by Facebook and has been open source for some time. Hive works as a
bridge between SQL and Hadoop: it is used to run SQL-style queries on Hadoop clusters. Apache Hive is a
data warehouse system that provides ad hoc queries, data summarization and analysis of huge data sets
stored in Hadoop-compatible file systems. It provides a SQL-like query language called HiveQL for
querying the large amounts of data stored in Hadoop clusters. In January 2013 Apache released Hive
0.10.0; more information and an installation guide can be found in the Apache Hive documentation.
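
To give a feel for the kind of summarization query Hive is used for, the sketch below runs a SQL aggregation from Python. It is an illustration only and makes a clear assumption: it uses the SQLite engine from the Python standard library as a stand-in, not Hive itself, and the `page_views` table and its rows are hypothetical. A similar `SELECT ... GROUP BY` statement could be written in HiveQL against data stored in a Hadoop cluster.

```python
import sqlite3

# SQLite in-memory database used as a stand-in for a Hive table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (country TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [("KE", 120), ("UG", 45), ("KE", 80), ("TZ", 60)],  # hypothetical rows
)

# Data summarization: total views per country, largest first.
query = """
    SELECT country, SUM(views) AS total_views
    FROM page_views
    GROUP BY country
    ORDER BY total_views DESC
"""
rows = conn.execute(query).fetchall()
conn.close()

for country, total_views in rows:
    print(country, total_views)
```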

1.5.5. Pig
Pig was introduced by Yahoo and was later made fully open source. Like Hive, it provides a bridge for
querying data on Hadoop clusters, but unlike Hive it uses a scripting language (Pig Latin) to make
Hadoop data accessible to developers and business users. Apache Pig provides a high-level programming
platform for developers to process and analyze Big Data using user-defined functions and custom
programming. In January 2013 Apache released Pig 0.10.1, which is defined for use with Hadoop 0.10.1 or
later releases. More information and an installation guide can be found in the Apache Pig Getting
Started documentation.

1.6. Traditional versus Big data business Approach

In a basic sense, measuring learning using a big data approach isn’t too dissimilar from utilizing
traditional approaches like the long-established Kirkpatrick, Phillips or Kaufman’s models. When using
these approaches, you start by generating a hypothesis that a change you are going to make to your
workforce’s learning will affect your organization’s performance. You then measure a baseline, make the
change and measure again to see how your baseline data has changed.

1.6.1. Relational Databases (SQL)

A relational schema is a set of relational tables and associated items that are related to one another. All
of the base tables, views, indexes, domains, user roles, stored modules, and other items that a user
creates to fulfill the data needs of a particular enterprise or set of applications belong to one schema.
SQL provides a statement to define a schema.
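
The sketch below shows a few of the schema items listed above (base tables, an index and a view) being defined with SQL statements. It assumes the SQLite engine from the Python standard library, and the table, index and view names are hypothetical. SQLite has no `CREATE SCHEMA` statement, so the example defines the individual objects directly and then shows the database rejecting a row that violates the declared schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Define two related base tables, an index and a view (hypothetical names).
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount      REAL NOT NULL
    );
    CREATE INDEX idx_orders_customer ON orders(customer_id);
    CREATE VIEW customer_totals AS
        SELECT c.name AS name, SUM(o.amount) AS total
        FROM customers c JOIN orders o ON o.customer_id = c.id
        GROUP BY c.id;
""")

# List the objects that now make up the schema.
schema_objects = conn.execute("SELECT type, name FROM sqlite_master").fetchall()
print(schema_objects)

# The schema is enforced: a customer without a name is rejected.
try:
    conn.execute("INSERT INTO customers (id, name) VALUES (1, NULL)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("Row violating the schema rejected:", rejected)

conn.close()
```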

1.6.2. Schemaless and Column-oriented Databases (NoSQL)

Table- and row-based relational databases have been in use for many years, and they work well for
online transactions and quick updates. When large amounts of unstructured data come into the picture,
we need databases that are not tied to a hard-coded schema. A number of databases fit into this
category; they can store unstructured, semi-structured or even fully structured data.

Apart from other benefits, the best thing about schemaless databases is that they make data migration
very easy. MongoDB is a very popular and widely used NoSQL database. NoSQL and schemaless databases are
used when the primary concern is to store a huge amount of data rather than to maintain relationships
between elements. NoSQL ("not only SQL") is a type of database that does not primarily rely on a
schema-based structure and does not necessarily use SQL for data processing.
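
The idea of storing records without a fixed schema can be sketched with plain Python dictionaries. The example below is an in-memory stand-in, not MongoDB or any other real NoSQL product, and the `insert` and `find` helpers, field names and sample records are hypothetical. Each record carries its own set of fields, so a new field can appear in later records without migrating the earlier ones.

```python
import json

# In-memory stand-in for a document collection (assumption, not a real database).
collection = []


def insert(document):
    """Store a document as-is; no schema is checked."""
    collection.append(document)


def find(field, value):
    """Return all documents whose given field equals value."""
    return [doc for doc in collection if doc.get(field) == value]


# Two records with different shapes live side by side.
insert({"name": "Alice", "email": "alice@example.com"})
insert({"name": "Bob", "phones": ["+254700000000"], "address": {"city": "Nairobi"}})

matches = find("name", "Bob")
print(json.dumps(matches, indent=2))
```

The trade-off is that the application code, rather than the database, becomes responsible for coping with missing or differently shaped fields.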

1.7. Opportunities in Big data Analysis

In rapidly evolving industries, big data enables businesses to solve today’s manufacturing challenges and
to gain a competitive edge. With big data and analytics, companies can make better real-time decisions
about asset usage and operations scheduling. The roles below are among the most in demand:

• Data Scientist
• Data Architect
• Business Intelligence developer
• Data Engineer
• Data Analyst
• Decision Scientist

Lesson 1: Review Questions

1. Discuss the Data Mining Life Cycle.
2. Discuss characteristics of big data.
3. Explain the benefits of various types of big data analytics.
4. Differentiate between traditional and big data business approach.
