What Is Big Data Analytics

Uploaded by

charleenmandibvira

What is big data analytics?

Big data analytics is the use of advanced analytic techniques on very large, diverse data sets
that include structured, semi-structured and unstructured data, drawn from different sources and
ranging in size from terabytes to zettabytes.

What is big data exactly? It can be defined as data sets whose size or type is beyond the ability of
traditional relational databases to capture, manage and process the data with low latency.
Characteristics of big data include high volume, high velocity and high variety. Sources of data
are becoming more complex than those for traditional data because they are being driven by
artificial intelligence (AI), mobile devices, social media and the Internet of Things (IoT). For
example, the different types of data originate from sensors, devices, video/audio, networks, log
files, transactional applications, web and social media, much of it generated in real time and at
a very large scale.

With big data analytics, you can ultimately fuel better and faster decision-making, modelling and
predicting of future outcomes and enhanced business intelligence. As you build your big data
solution, consider open source software such as Apache Hadoop, Apache Spark and the entire
Hadoop ecosystem as cost-effective, flexible data processing and storage tools designed to
handle the volume of data being generated today.

Data Preprocessing in Data Mining

Preprocessing in Data Mining:

Data preprocessing is a data mining technique used to transform raw data into a useful and
efficient format.

Steps Involved in Data Preprocessing:

1. Data Cleaning:
Raw data can contain many irrelevant and missing parts. Data cleaning is done to handle these
problems. It involves handling missing data, noisy data etc.

 (a). Missing Data:

This situation arises when some values are absent from the data set. It can be handled in
various ways.
Some of them are:
1. Ignore the tuples:
This approach is suitable only when the dataset is quite large and
multiple values are missing within a tuple.

2. Fill the Missing values:

There are various ways to do this task. You can fill the missing values
manually, with the attribute mean, or with the most probable value.
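
The two strategies above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the toy records, the attribute names and the helper names `drop_sparse_rows` and `fill_with_mean` are all assumptions made for the example, and `None` is assumed to mark a missing value.

```python
from statistics import mean

# Hypothetical toy records; None marks a missing value.
rows = [
    {"age": 25, "income": 50000},
    {"age": None, "income": 62000},
    {"age": 40, "income": None},
    {"age": None, "income": None},
    {"age": 35, "income": 58000},
]

def drop_sparse_rows(records, max_missing=1):
    """Ignore tuples that have more than max_missing missing values."""
    return [r for r in records
            if sum(v is None for v in r.values()) <= max_missing]

def fill_with_mean(records, attribute):
    """Fill missing values of one attribute with the mean of the observed ones."""
    observed = [r[attribute] for r in records if r[attribute] is not None]
    fill = mean(observed)
    return [{**r, attribute: fill if r[attribute] is None else r[attribute]}
            for r in records]

# Step 1: ignore the tuple in which every value is missing.
kept = drop_sparse_rows(rows)
# Step 2: fill the remaining gaps with the attribute mean.
filled = fill_with_mean(fill_with_mean(kept, "age"), "income")
```

Dropping tuples first keeps a row that carries almost no information from influencing the attribute means used for filling.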

 (b). Noisy Data:

Noisy data is meaningless data that cannot be interpreted by machines. It can be generated
by faulty data collection, data entry errors etc. It can be handled in the following ways:
1. Binning Method:
This method works on sorted data in order to smooth it. The data is divided
into segments (bins) of equal size, and each segment is handled separately.
One can replace all values in a segment with the segment mean, or replace each
value with the nearest segment boundary value.

2. Regression:
Here data can be made smooth by fitting it to a regression function. The regression
used may be linear (having one independent variable) or multiple (having
multiple independent variables).

3. Clustering:
This approach groups similar data into clusters. Outliers either go undetected
or fall outside the clusters.
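
As a rough sketch of the binning method, the Python below sorts a small list, splits it into equal-size bins and smooths each bin by its mean and by its boundaries. The sample values and the function names are assumptions chosen for illustration, and ties in boundary smoothing are resolved toward the lower boundary, which is one possible convention.

```python
from statistics import mean

def equal_size_bins(values, bin_size):
    """Sort the values and split them into consecutive bins of bin_size items."""
    ordered = sorted(values)
    return [ordered[i:i + bin_size] for i in range(0, len(ordered), bin_size)]

def smooth_by_means(bins):
    """Replace every value in a bin with that bin's mean."""
    return [[mean(b)] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    """Replace every value with the closer of the bin's minimum or maximum."""
    smoothed = []
    for b in bins:
        low, high = b[0], b[-1]  # bins are sorted, so these are min and max
        smoothed.append([low if v - low <= high - v else high for v in b])
    return smoothed

# Hypothetical noisy measurements.
prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]
bins = equal_size_bins(prices, 4)
by_means = smooth_by_means(bins)
by_boundaries = smooth_by_boundaries(bins)
```

With these values the three bins are `[4, 8, 9, 15]`, `[21, 21, 24, 25]` and `[26, 28, 29, 34]`; smoothing by means replaces them with 9, 22.75 and 29.25 respectively.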

2. Data Transformation:
This step transforms the data into forms suitable for the mining process.
It involves the following ways:

1. Normalization:
It is done in order to scale the data values into a specified range (such as -1.0 to 1.0 or 0.0 to 1.0).

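2. Attribute Construction:
In this strategy, new attributes are constructed from the given set of attributes to help the
mining process. (Some notes label this step "attribute selection", but selecting a subset of
existing attributes is a data reduction technique.)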

3. Discretization:
This is done to replace the raw values of a numeric attribute with interval labels or
conceptual labels.

4. Concept Hierarchy Generation:

Here attributes are converted from a lower level to a higher level in the hierarchy. For example,
the attribute “city” can be converted to “country”.
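
The transformations listed above can be illustrated with a short Python sketch. It assumes min-max scaling as the normalization method (the notes do not name one), and the interval edges, labels and city-to-country mapping are hypothetical values picked for the example.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into the range [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so there is no spread to rescale.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]

def discretize(value, edges, labels):
    """Return the label of the first interval whose upper edge exceeds value."""
    for upper, label in zip(edges, labels):
        if value < upper:
            return label
    return labels[-1]  # value is at or above the last edge

# Hypothetical concept hierarchy: city (lower level) -> country (higher level).
CITY_TO_COUNTRY = {"Mumbai": "India", "Pune": "India", "Lyon": "France"}

ages = [18, 30, 42, 66]
unit = min_max_normalize(ages)               # scaled to 0.0 .. 1.0
signed = min_max_normalize(ages, -1.0, 1.0)  # scaled to -1.0 .. 1.0
age_labels = [discretize(a, [30, 60], ["young", "middle-aged", "senior"])
              for a in ages]
countries = [CITY_TO_COUNTRY[c] for c in ["Pune", "Lyon"]]
```

Here `unit` comes out as 0.0, 0.25, 0.5 and 1.0, and the ages map to the conceptual labels young, middle-aged, middle-aged and senior.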
3. Data Reduction:
Data mining is a technique used to handle huge amounts of data, and analysis becomes harder as
the volume of data grows. To deal with this, we use data reduction techniques. Data reduction
aims to increase storage efficiency and reduce data storage and analysis costs.

The various steps to data reduction are:

1. Data Cube Aggregation:

Aggregation operation is applied to data for the construction of the data cube.

2. Attribute Subset Selection:

The highly relevant attributes should be used, and the rest can be discarded. For performing
attribute selection, one can use the level of significance and the p-value of each attribute. An
attribute having a p-value greater than the significance level can be discarded.

3. Numerosity Reduction:
This makes it possible to store a model of the data instead of the whole data, for example
regression models.

4. Dimensionality Reduction:
This reduces the size of data by encoding mechanisms. It can be lossy or lossless. If the
original data can be retrieved after reconstruction from the compressed data, the reduction is
called lossless; otherwise it is called lossy. Two effective methods of
dimensionality reduction are wavelet transforms and PCA (Principal Component
Analysis).
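
Two of the reduction steps above lend themselves to a small Python sketch: data cube aggregation (rolling quarterly figures up to yearly totals) and numerosity reduction (keeping only the parameters of a fitted regression line). The sales figures, the sample points and the helper names are assumptions made for illustration; real data would rarely fit a line exactly, so storing only the model is normally a lossy reduction.

```python
from collections import defaultdict

# Hypothetical quarterly sales records: (year, quarter, amount).
sales = [
    (2022, "Q1", 200), (2022, "Q2", 300), (2022, "Q3", 250), (2022, "Q4", 350),
    (2023, "Q1", 220), (2023, "Q2", 330), (2023, "Q3", 270), (2023, "Q4", 380),
]

def aggregate_by_year(records):
    """Roll quarterly amounts up to one total per year."""
    totals = defaultdict(int)
    for year, _quarter, amount in records:
        totals[year] += amount
    return dict(totals)

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Aggregation: eight quarterly rows become two yearly rows.
yearly = aggregate_by_year(sales)

# Numerosity reduction: keep two parameters instead of every (x, y) pair.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_line(xs, ys)
approx = [slope * x + intercept for x in xs]  # values rebuilt from the model
```

For this toy data the yearly totals are 1100 and 1200, and the fitted line has slope 2 and intercept 1, so the five points are recovered from just two stored numbers.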
