Unit - I - Types of Digital Data

Prepared by: Dr. H. E. Khodke
What is Data?
• Data is any set of characters that has been gathered and
translated for some purpose, usually analysis.
• It can take any form, including text, numbers, pictures,
sound, or video.
What is Digital Data?
• Digital data are discrete, discontinuous representations of
information or work.
• Digital data is represented in binary form, i.e. as sequences of 0s and 1s.
Types of Digital Data
1. Unstructured Data
2. Semi-Structured Data
3. Structured Data
Structured Data
• Refers to any data that resides in a fixed field within a record or file.
• Typically stored in relational databases that support ACID properties.
• Structured data has the advantage of being easily entered, stored,
queried and analyzed.
• Structured data is often estimated to represent only 5 to 10% of all data.
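To make the fixed-field idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in relational database. The `student` table, its columns and its rows are hypothetical. It shows that every record has the same fields, that querying is a single SQL statement, and that a failed transaction is rolled back as a whole (the atomicity part of ACID).

```python
import sqlite3

# In-memory SQLite database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL, marks INTEGER)"
)
# Every record has exactly the same fields, in the same order.
conn.executemany(
    "INSERT INTO student (roll_no, name, marks) VALUES (?, ?, ?)",
    [(1, "Asha", 78), (2, "Ravi", 91), (3, "Meena", 85)],
)
conn.commit()

# Because the structure is fixed, querying is a one-line SQL statement.
top = conn.execute(
    "SELECT name FROM student WHERE marks > ? ORDER BY marks DESC", (80,)
).fetchall()
print(top)  # [('Ravi',), ('Meena',)]

# Atomicity: the connection context manager commits on success and
# rolls back on error, so the first insert below is undone too.
try:
    with conn:
        conn.execute("INSERT INTO student VALUES (4, 'Kiran', 70)")
        conn.execute("INSERT INTO student VALUES (1, 'Dup', 50)")  # duplicate key
except sqlite3.IntegrityError:
    pass

count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(count)  # 3
```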
Unstructured Data
• Unstructured data is data that cannot readily be classified and fitted
into a predefined data model; examples include free text, images, audio and video.
• Unstructured data is often estimated to represent around 80% of all data.
• Techniques for analyzing it include data mining (e.g. association rules),
regression analysis, text mining, and NLP.
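As a small illustration of text mining on unstructured data, the sketch below tokenizes a few free-text reviews and counts term frequencies, one of the simplest text-mining steps. The reviews and the stop-word list are hypothetical, and real NLP pipelines use far richer tokenization.

```python
import re
from collections import Counter

# Hypothetical free-text customer reviews: no fields, no schema.
reviews = [
    "Delivery was late but the product quality is great.",
    "Great price, great quality!",
    "The delivery was late again.",
]

# A tiny, hypothetical stop-word list; real lists are much longer.
STOP_WORDS = {"the", "was", "but", "is", "again"}

def term_frequencies(docs):
    """Tokenize free text into lowercase words and count non-stop-words."""
    counts = Counter()
    for doc in docs:
        tokens = re.findall(r"[a-z]+", doc.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts

freq = term_frequencies(reviews)
print(freq.most_common(3))
```

Even this crude count surfaces a pattern ("great", "delivery", "late") that is not visible in any fixed field.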
Semi Structured Data
• Semi-structured data is a cross between the two. It is a form of
structured data, but it lacks a strict data model.
• Semi-structured data is information that doesn't reside in a
relational database but does have some organizational properties
(such as tags or markers) that make it easier to analyze.
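JSON is a common example of semi-structured data. In the sketch below (the records are hypothetical), each record carries its own keys as tags, yet the records do not share one fixed schema; the tags still make it easy to pull out a field, using a default where it is missing.

```python
import json

# Hypothetical JSON records: self-describing, but with varying fields.
raw = """
[
  {"id": 1, "name": "Asha", "email": "asha@example.com"},
  {"id": 2, "name": "Ravi", "phones": ["98450", "98451"]},
  {"id": 3, "name": "Meena", "address": {"city": "Pune"}}
]
"""

records = json.loads(raw)

# The union of keys shows that the fields vary from record to record.
all_keys = sorted({key for rec in records for key in rec})
print(all_keys)  # ['address', 'email', 'id', 'name', 'phones']

# Tags still make analysis easy: missing fields get a default value.
cities = [rec.get("address", {}).get("city", "unknown") for rec in records]
print(cities)  # ['unknown', 'unknown', 'Pune']
```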
Characteristics of Data
• Composition: What is the structure, type, and nature of the
data?
• Condition: Can the data be used as it is, or does it need to be
cleansed?
• Context: Where was this data generated? Why? How sensitive is
this data? What events are associated with it?
What is Big Data?
• A collection of data sets so large and complex that they become
difficult to process using on-hand database management tools
or traditional data processing applications.
What is Big Data? Cont..
• The data is too big, moves too fast, or doesn’t fit the structures
of your database architectures
• The scale, diversity, and complexity of the data require new
architecture, techniques, algorithms, and analytics to manage it
and extract value and hidden knowledge from it
• Big data is the realization of greater business intelligence by
storing, processing, and analyzing data that was previously
ignored due to the limitations of traditional data management
technologies.
Why Big Data? What Makes Big Data?
• Key enablers for the growth of “Big Data” are:
◦ Increased storage capacity
◦ Increased processing power
◦ Availability of data

• By a widely cited estimate, every day we create 2.5 quintillion bytes of data.

• By the same widely cited estimate, 90% of the data in the world had been
created in the preceding two years.
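To put "2.5 quintillion bytes" in familiar units, the short calculation below assumes the short-scale meaning of quintillion (10^18) and decimal SI units (1 EB = 10^18 bytes, 1 TB = 10^12 bytes).

```python
# "2.5 quintillion" in the short scale is 2.5 x 10**18.
BYTES_PER_DAY = 25 * 10**17
SECONDS_PER_DAY = 24 * 60 * 60

exabytes_per_day = BYTES_PER_DAY / 10**18          # decimal (SI) exabytes
terabytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / 10**12

print(f"{exabytes_per_day} EB per day")             # 2.5 EB per day
print(f"{terabytes_per_second:.1f} TB per second")  # 28.9 TB per second
```

That is roughly 29 terabytes of new data every second.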
Where does data come from?
Data comes from many sources:
• Science: medical imaging, sensor data, genome
sequencing, weather data, satellite feeds
• Industry: financial, pharmaceutical, manufacturing,
insurance, online, retail
• Legacy: sales data, customer behavior, product
databases, accounting data, etc.
• System data: log files, status feeds, activity streams,
network messages, spam filters
Characteristics of Big Data
• The 5 V's: Volume, Velocity, Variety, Veracity and
Variability
Challenges
• More data = more storage space
• Data arriving faster
• Need to handle varied data structures
• Agile business requirements
• Securing big data
• Data consistency and quality
What is the importance of Big Data?
• The importance of big data lies in how you utilize the data you
own. Data can be fetched from any source and analyzed to find
answers that enable:
1) Cost reductions
2) Time reductions
3) New product development and optimized offerings
4) Smart decision making
What is the importance of Big Data? Cont..
• By combining big data with high-powered analytics, you can
have a great impact on your business strategy, for example:
1) Finding the root causes of failures, issues and defects in
real-time operations.
2) Generating coupons at the point of sale based on the customer's
buying habits.
3) Recalculating entire risk portfolios in just minutes.
4) Detecting fraudulent behavior before it affects your
organization.
Who Uses Big Data Technology?
• Banking
• Government
• Education
• Health Care
• Manufacturing
• Retail
Storing Big Data
• Analyzing your data characteristics
◦ Selecting data sources for analysis
◦ Eliminating redundant data
◦ Establishing the role of NoSQL
• Overview of Big Data stores
◦ Data models: key-value, graph, document, column-family
◦ Hadoop Distributed File System
◦ HBase
◦ Hive
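The four NoSQL data models listed above differ mainly in how a record is laid out and looked up. The plain Python dictionaries below are only a sketch of how one hypothetical order could be shaped under each model. Real stores such as HBase (column-family) add persistence, distribution and indexing that these dicts do not model.

```python
import json

# Key-value: the value is opaque to the store; lookup is by key only.
kv_store = {"order:1001": json.dumps({"customer": "C7", "total": 450})}

# Document: a nested, self-describing record whose fields can be queried.
doc_store = {"1001": {"customer": "C7", "total": 450, "items": [{"sku": "P1", "qty": 2}]}}

# Column-family: row key -> column family -> column -> value.
column_store = {"1001": {"info": {"customer": "C7", "total": 450}, "items": {"P1": 2}}}

# Graph: entities are nodes; relationships are labeled edges.
graph_edges = [("C7", "PLACED", "1001"), ("1001", "CONTAINS", "P1")]

# The same fact ("who placed order 1001?") retrieved from each model.
from_kv = json.loads(kv_store["order:1001"])["customer"]  # app must decode value
from_doc = doc_store["1001"]["customer"]
from_columns = column_store["1001"]["info"]["customer"]
from_graph = next(src for src, rel, dst in graph_edges if rel == "PLACED" and dst == "1001")
print(from_kv, from_doc, from_columns, from_graph)  # C7 C7 C7 C7
```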
Big Data Analytics
• It is the process of examining big data to uncover patterns,
unearth trends, and find unknown correlations and other useful
information to make faster and better decisions.
Why is big data analytics important?
• Big data analytics helps organizations harness their data and
use it to identify new opportunities. That, in turn, leads to
smarter business moves, more efficient operations, higher
profits and happier customers.
Types of Analytics
• Business Intelligence
• Descriptive Analysis
• Predictive Analysis
Business intelligence (BI)
• It is a technology-driven process for analyzing data and presenting
actionable information to help executives, managers and other
corporate end users make informed business decisions.
Descriptive Analysis
• Descriptive statistics is the term given to the analysis of data that helps
describe, show or summarize data in a meaningful way such that, for
example, patterns might emerge from the data.
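A minimal sketch of descriptive analysis using Python's standard `statistics` module: it summarizes a hypothetical set of exam scores with the usual measures of central tendency and spread, without making any prediction.

```python
import statistics

# Hypothetical exam scores for one class.
scores = [56, 72, 72, 81, 64, 90, 77]

summary = {
    "count": len(scores),
    "mean": statistics.mean(scores),       # average score
    "median": statistics.median(scores),   # middle value when sorted
    "mode": statistics.mode(scores),       # most frequent score
    "stdev": round(statistics.stdev(scores), 2),  # sample standard deviation
    "range": max(scores) - min(scores),
}
print(summary)
```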
Predictive Analysis
• Predictive analytics is the branch of data mining concerned with the
prediction of future probabilities and trends.
• The central element of predictive analytics is the predictor, a variable that
can be measured for an individual or other entity to predict future behavior.
Predictive Analysis Cont..
• There are two types of predictive analytics:
◦ Supervised
Supervised analytics applies when we know the true outcomes
for past cases (the historical data is labeled).
Example: We have historical weather data: temperature,
humidity, cloud density and weather type (rain, cloudy, or sunny). From it we can
predict today's weather based on today's temperature, humidity, and cloud density.
◦ Unsupervised
Unsupervised analytics applies when we don't know the true outcomes
for past cases. The result is a set of segments that we need to interpret.
Example: We want to segment students
based on their historical exam scores, attendance, and late history.
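The two examples above can be sketched in a few lines of Python. All data is hypothetical, and the chosen techniques are assumptions for illustration: k-nearest neighbours stands in for supervised prediction (the labeled weather history votes on today's weather) and plain k-means stands in for unsupervised segmentation (students are grouped with no labels, and we interpret the groups afterwards). Features are left unscaled for simplicity.

```python
import math

# --- Supervised: labeled history -> predict the label of a new case ---
# Hypothetical history: (temperature C, humidity %, cloud density %) -> weather
history = [
    ((30, 40, 10), "sunny"),
    ((31, 35, 5), "sunny"),
    ((24, 85, 90), "rain"),
    ((22, 90, 95), "rain"),
    ((26, 60, 70), "cloudy"),
    ((27, 65, 60), "cloudy"),
]

def predict_weather(today, k=3):
    """k-nearest-neighbour vote over the labeled history (features unscaled)."""
    nearest = sorted(history, key=lambda rec: math.dist(rec[0], today))[:k]
    labels = [label for _, label in nearest]
    return max(labels, key=labels.count)  # majority label among the k nearest

# --- Unsupervised: no labels -> find segments, then interpret them ---
# Hypothetical students: (average exam score, attendance %, times late)
students = [(85, 95, 1), (90, 92, 0), (88, 97, 2),
            (45, 60, 12), (50, 55, 15), (40, 65, 10)]

def kmeans(points, k, iters=10):
    """Plain k-means with evenly spaced initial centroids."""
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return clusters

print(predict_weather((23, 88, 92)))  # rain
segments = kmeans(students, k=2)
print(segments)
```

The supervised model returns a known label; the unsupervised one only returns groups, and it is up to us to read them as, say, "regular, high-scoring" versus "irregular, low-scoring" students.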
Tools used in Big Data
• Where is processing hosted?
Distributed Servers / Cloud (e.g. Amazon EC2)
• Where is data stored?
Distributed Storage (e.g. Amazon S3)
• What is the programming model?
Distributed Processing (e.g. MapReduce)
• How is data stored and indexed?
High-performance schema-free databases (e.g. MongoDB)
• What operations are performed on data?
Analytic / Semantic Processing
Top Big Data Technologies
1. Apache Hadoop
• Apache Hadoop is a Java-based, free software framework that can
effectively store large amounts of data in a cluster.
• The Hadoop Distributed File System (HDFS) is Hadoop's storage system;
it splits big data and distributes it across many nodes in a cluster.
• HDFS also replicates data across the cluster, thus providing high
availability. Hadoop uses the MapReduce programming model for processing.
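The MapReduce model splits a job into a map phase, a shuffle (group-by-key) phase and a reduce phase. The single-process Python sketch below simulates those three phases for the classic word-count example. It is not Hadoop's API; the input lines are hypothetical stand-ins for data blocks that HDFS would spread across nodes, and on a real cluster each phase would run in parallel on many machines.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input line."""
    return [(word, 1) for word in line.lower().split()]

def shuffle_phase(pairs):
    """Shuffle/sort: group all values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(key, values):
    """Reducer: combine the grouped values for one key."""
    return key, sum(values)

# Hypothetical input lines, standing in for blocks spread across nodes.
lines = ["big data is big", "data moves fast", "big clusters store data"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
result = dict(reduce_phase(k, v) for k, v in shuffle_phase(mapped).items())
print(result)
```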
Top Big Data Technologies Cont..
2. NoSQL
• NoSQL (Not Only SQL) databases are used to handle unstructured and
semi-structured data.
• NoSQL databases store data without requiring a fixed schema.
• NoSQL databases can give better performance when storing massive amounts
of data. There are many open-source NoSQL databases available for analyzing big data.
Top Big Data Technologies Cont..
3. Apache Spark
• Apache Spark is part of the Hadoop ecosystem, but its use has
become so widespread that it deserves a category of its own.
• It is an engine for processing big data that can run within Hadoop, and
for some in-memory workloads it can be up to one hundred times faster than
Hadoop's standard engine, MapReduce.
Top Big Data Technologies Cont..
4. R
• R, another open-source project, is a programming language
and software environment designed for statistical computing.
• Many popular integrated development environments (IDEs),
including Eclipse and Visual Studio, support the language.
Applications for Big Data Analytics
Data Scientist
• Data scientist/analyst is one of the trending, emerging jobs
in the market.
Terminology of Big Data
• a. In-Memory Analytics
• b. In-Database processing
• c. Symmetric Multi-processor system
• d. Massively parallel processing
• e. Shared nothing architecture
• f. CAP Theorem
In-memory Analytics
• Data access from non-volatile storage such as
hard disk is a slow process. In-memory analytics
addresses this problem: all the relevant data is stored
in random access memory (RAM), i.e. primary storage,
thus eliminating the need to access the data from
hard disk. The advantages are faster access, rapid
deployment, better insights, and minimal IT
involvement.
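A minimal sketch of the in-memory idea: the data source is read once and every row is kept in RAM, so later analytical queries never touch the disk again. The `io.StringIO` object is an assumption standing in for a file on disk, and the sales data is hypothetical.

```python
import csv
import io

# Stand-in for a file on disk (hypothetical sales data).
SALES_CSV = """region,product,amount
North,P1,120
South,P2,80
North,P2,200
East,P1,50
South,P1,70
"""

def load_into_memory(text):
    """Read the data source once and keep every row in RAM."""
    return list(csv.DictReader(io.StringIO(text)))

rows = load_into_memory(SALES_CSV)  # one-time load from "disk"

def total_by(rows, field):
    """Aggregate amounts by a field, reading only the in-memory rows."""
    totals = {}
    for row in rows:
        totals[row[field]] = totals.get(row[field], 0) + int(row["amount"])
    return totals

print(total_by(rows, "region"))   # {'North': 320, 'South': 150, 'East': 50}
print(total_by(rows, "product"))  # {'P1': 240, 'P2': 280}
```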
In-Database Processing
• In-database processing is also called in-database
analytics. It works by fusing data warehouses with
analytical systems. Typically, data from various
enterprise OLTP systems, after being cleaned up through
the ETL process, is stored in the enterprise data
warehouse or data marts. Traditionally, these huge data
sets are then exported to analytical programs for complex
and extensive computations. With in-database processing,
the database itself runs the computations, eliminating the
need for this export and saving time.
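The contrast between exporting data and computing inside the database can be sketched with Python's built-in `sqlite3` module, which here is only a stand-in for an enterprise data warehouse. The `sales` table is hypothetical. Both approaches produce the same totals, but in the in-database version only the small result set leaves the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a data warehouse
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 120), ("South", 80), ("North", 200), ("East", 50), ("South", 70)],
)

# Traditional approach: export every row to the analytics program, then compute.
exported = conn.execute("SELECT region, amount FROM sales").fetchall()
outside = {}
for region, amount in exported:
    outside[region] = outside.get(region, 0) + amount

# In-database approach: the database runs the computation; only results move.
inside = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()
)

assert outside == inside
print(len(exported), "rows exported vs", len(inside), "result rows")
```

With five rows the saving is trivial; with billions of rows, moving only the aggregated result is what makes the in-database approach attractive.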
Symmetric Multi-Processor System
• In an SMP system, a single common main memory
is shared by two or more identical
processors. The processors have full access to
all I/O devices and are controlled by a single
operating system instance.
• SMP systems are tightly coupled multiprocessor
systems. Each processor has its own high-speed
memory, called cache memory, and the processors
are connected by a system bus.
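The shared-memory programming model that SMP hardware supports can be sketched with Python threads: all workers read and write one shared structure, and a lock coordinates access to it. This illustrates the model only, not SMP performance (CPython's global interpreter lock limits true parallelism for pure-Python code). The worker and batch names are hypothetical.

```python
import threading

# One shared "main memory" visible to every worker, as in SMP.
shared_totals = {"events": 0}
lock = threading.Lock()  # coordinates access to the shared state

def worker(batch):
    """Each worker processes its own batch but writes to shared memory."""
    for _ in batch:
        with lock:
            shared_totals["events"] += 1

batches = [range(1000)] * 4  # four hypothetical batches of 1000 events
threads = [threading.Thread(target=worker, args=(b,)) for b in batches]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared_totals["events"])  # 4000
```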
Massively Parallel Processing
• Massively parallel processing (MPP) refers to the
coordinated processing of a program by a number
of processors working in parallel. The processors
each have their own OS and dedicated memory.
They work on different parts of the same
program. The MPP processors communicate
using some sort of messaging interface.
• MPP differs from symmetric multiprocessing
in that SMP processors share the same OS and
the same memory. SMP is also referred to as
tightly coupled multiprocessing.
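By contrast with SMP, MPP nodes never share memory: each works on its own part of the job and reports back through messages. The sketch below simulates this inside one Python process, so the `Node` class, the threads and the `queue.Queue` "messaging interface" are all hypothetical stand-ins for separate machines and a real interconnect.

```python
import queue
import threading

class Node:
    """Hypothetical MPP node: private memory, talks only via messages."""

    def __init__(self, node_id, partition, outbox):
        self.node_id = node_id
        self._memory = list(partition)  # dedicated memory, never shared
        self.outbox = outbox

    def run(self):
        partial = sum(self._memory)               # its part of the same job
        self.outbox.put((self.node_id, partial))  # message to the coordinator

data = list(range(1, 101))  # the whole job: sum 1..100
inbox = queue.Queue()       # stand-in for the messaging interface
nodes = [Node(i, data[i::4], inbox) for i in range(4)]
threads = [threading.Thread(target=n.run) for n in nodes]
for t in threads:
    t.start()
for t in threads:
    t.join()

messages = [inbox.get() for _ in nodes]
total = sum(partial for _, partial in messages)
print(total)  # 5050
```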
Shared-Nothing Architecture
• The three most common types of architecture for multiprocessor
systems are:
• 1. Shared memory
• 2. Shared disk
• 3. Shared nothing

• In shared memory architecture, a common central memory is
shared by multiple processors.
• In shared disk architecture, multiple processors share a common
collection of disks while having their own private memory.
• In shared nothing architecture, neither memory nor disk is shared
among multiple processors.
Shared-Nothing Architecture Cont..
• Advantages of shared-nothing architecture:
◦ Fault isolation: A shared-nothing architecture isolates
faults. A fault in a single node is contained and confined
to that node exclusively, and is exposed only through
messages (or the lack of them).
◦ Scalability: If the disk is a shared resource, then the
controller and the disk bandwidth are also shared.
Synchronization must be implemented to maintain a
consistent shared state, which means that different nodes
have to take turns accessing the critical data. This limits
how many nodes can be added to a distributed shared-disk
system, compromising scalability. A shared-nothing system
avoids this contention because its nodes share no disk.
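A minimal sketch of both advantages: keys are hash-partitioned so each node privately owns its share (no shared state to synchronize), and when one node fails, only the keys it owns become unavailable. The node count, keys and `lookup` function are hypothetical. `zlib.crc32` is used instead of Python's built-in `hash()` because string hashing is randomized per process, while CRC-32 gives a stable partitioning.

```python
import zlib

NUM_NODES = 3

def owner(key):
    """Route each key to exactly one node via stable hash partitioning."""
    return zlib.crc32(key.encode()) % NUM_NODES

# Each node has its own private store; no memory or disk is shared.
nodes = [dict() for _ in range(NUM_NODES)]
for i in range(30):
    key = f"user{i}"
    nodes[owner(key)][key] = {"id": i}

FAILED_NODE = 1  # simulate a fault on one node

def lookup(key):
    """Serve a key from its owning node, or fail if that node is down."""
    node = owner(key)
    if node == FAILED_NODE:
        raise ConnectionError(f"node {node} unavailable")
    return nodes[node][key]

served = 0
for i in range(30):
    try:
        lookup(f"user{i}")
        served += 1
    except ConnectionError:
        pass  # only keys owned by the failed node are affected

print(f"{served} of 30 keys still served after node {FAILED_NODE} failed")
```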
CAP Theorem
• The CAP theorem is also called Brewer’s theorem. It states that in a distributed
computing environment, it is impossible to provide all three of the following guarantees
simultaneously. At best you can have two of the three; one must be sacrificed.
• 1. Consistency
• 2. Availability
• 3. Partition tolerance

• 1. Consistency implies that every read fetches the last write. Consistency means
that all nodes see the same data at the same time. If there are multiple replicas
and there is an update being processed, all users see the update go live at the
same time even if they are reading from different replicas.
• 2. Availability implies that reads and writes always succeed. Availability is a
guarantee that every request receives a response about whether it was successful
or failed.
• 3. Partition tolerance implies that the system will continue to function when
a network partition occurs. It means that the system continues to operate despite
arbitrary message loss or failure of part of the system.
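The trade-off can be sketched with a toy two-replica store (the class and mode names are hypothetical, and the model is deliberately simplified, e.g. reads are always allowed). When the replicas are partitioned, a "CP" store refuses writes to stay consistent, sacrificing availability, while an "AP" store accepts the write on one replica, staying available but letting the replicas diverge.

```python
class Replica:
    def __init__(self):
        self.value = None

class TwoReplicaStore:
    """Hypothetical two-replica store used to illustrate the CAP trade-off."""

    def __init__(self, mode):
        self.mode = mode           # "CP" or "AP"
        self.a, self.b = Replica(), Replica()
        self.partitioned = False   # True when a and b cannot communicate

    def write(self, value):
        if not self.partitioned:
            self.a.value = self.b.value = value  # replicate to both
            return "ok"
        if self.mode == "CP":
            return "error: unavailable during partition"  # give up availability
        self.a.value = value       # AP: accept locally; replicas now diverge
        return "ok"

    def read_from_b(self):
        return self.b.value

for mode in ("CP", "AP"):
    store = TwoReplicaStore(mode)
    store.write("v1")
    store.partitioned = True
    status = store.write("v2")
    print(mode, status, "| replica b reads:", store.read_from_b())
```

In CP mode both replicas still hold "v1" (consistent, but the write failed); in AP mode the write succeeded, but a reader of replica b sees the stale "v1".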
Thank You
