BD Unit 1


Unit 1: Introduction to Big Data

Big data is a collection of data that is huge in volume and growing exponentially with time. It is data of such large size and complexity that no traditional data management tool can store or process it efficiently.
Big data is also data, but of huge size.
Types of Big Data

1. Structured
Any data that can be stored, accessed and processed in a fixed format is termed structured data. In structured data the format is well known in advance.
E.g. an RDBMS is an example of structured data.
2. Unstructured
Any data whose form or structure is unknown is classified as unstructured data. In addition to its huge size, unstructured data poses multiple challenges in terms of processing it to derive value out of it.
E.g. the output returned by Google Search, videos, PDFs.

3. Semi-structured
Semi-structured data can contain both forms of data. It can be seen as structured in form, but it is not actually defined by a fixed structure.
E.g. data stored in an XML file.
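To illustrate why XML counts as semi-structured, here is a minimal Python sketch. The `<customers>` document and its field names are hypothetical, made up for this illustration: every record is tagged (so it has some structure), but the records need not share the same fields, unlike rows in an RDBMS table.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document (an assumption for illustration): the tags give
# structure, but the two records do not carry the same set of fields.
XML_TEXT = """
<customers>
  <customer><name>Asha</name><age>34</age></customer>
  <customer><name>Ravi</name><email>ravi@example.com</email></customer>
</customers>
"""

def parse_customers(xml_text):
    """Return one dict per <customer>, keeping whatever fields it has."""
    root = ET.fromstring(xml_text.strip())
    return [
        {child.tag: child.text for child in customer}
        for customer in root.findall("customer")
    ]

records = parse_customers(XML_TEXT)
for record in records:
    print(record)
```

Each record comes back with a different set of keys, which is the point: the schema is carried by the data itself rather than fixed in advance.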

Characteristics
There are several V's used to describe the characteristics of big data. The main four are:

1. Volume
Big data is a vast volume of data generated from many sources, that is, a huge amount of data.


2. Variety
Data arrives in varied formats from different sources. Structured, unstructured and semi-structured data can all be collected from different sources. Nowadays big data contains PDFs, emails, audio, videos, etc.
3. Velocity
Velocity is an important characteristic of big data. It is the speed at which the data is created in real time. It covers the linking of incoming data set speeds, the rate of change and activity bursts. The primary aspect of big data is to provide data rapidly.
Big data velocity deals with the speed at which data flows from sources such as application logs, business processes, etc.


4. Veracity
Veracity means how reliable the data is. There are many ways to filter or translate the data. Veracity is the process of being able to handle and manage the data efficiently.
Other V's:
- viscosity
- virality
- visualization
- validity
- venue
- value
- vocabulary
- vagueness
- volatility
Challenges in Big Data
- storing
- privacy and security
- sharing and accessing data
- analytical challenges
- technical challenges:
  - quality of data
  - fault tolerance
  - scalability
- visualization
- lack of proper understanding of big data

Sources of Big Data
1. New York Stock Exchange
2. Social media: Facebook, Instagram, Twitter

Tools to Handle Big Data
1. Hadoop
2. Cloudera
e h n v ife
3. AWS (Amazon)
MapR
6 MS 9eig h
Data values:
- Quantitative: numerical values (whole numbers etc.), e.g. age.
- Qualitative: a description of the state or features of the data.

Analytics: It is a process of examining and interpreting the data to make effective decisions.

Types of Analytics
There are 4 types of analytics in big data.

1. Diagnostic analytics
It gives a diagnosis of a problem: a detailed, in-depth insight into the root cause of the problem. Techniques used in diagnostic analytics include drill-down and data mining.

2. Descriptive analytics
It can be considered a useful technique for uncovering patterns within a certain segment of customers. It simplifies the data and summarizes past data into a readable form. It provides insight into what has happened, e.g. clustering, or association rules in market basket analysis.
3. Predictive analytics
This type of analytics makes use of historical and present data to predict future events. It forecasts what might happen in the future, because predictive analytics is probabilistic in nature. It uses techniques such as data mining and machine learning to analyze current data and forecast what might happen in specific scenarios.
4. Prescriptive analytics
It is the most valuable yet underused form of analytics. It is the next step after predictive analytics. Prescriptive analytics explores several possible actions and suggests actions depending on the results of descriptive and predictive analytics.


The basic unit of analytics is the customer. The role of the customer changes across different scenarios.
Outlier Detection and Treatment
Outliers are extreme observations that are very dissimilar to the rest of the population.

Types of outliers: there are two types.
- Valid observations (e.g. the salary of the boss is $1 million)
- Invalid observations (e.g. age is 300 years)
Both are univariate outliers in the sense that they are outlying in one dimension. However, outliers can be hidden in a unidimensional view of the data.
* Multivariate outliers are observations that are outlying in multiple dimensions.
Two important steps in dealing with outliers: detection and treatment.

Types of outliers: there are 3 types.
1) Global outliers (point anomalies)
2) Contextual (conditional) outliers
3) Collective outliers

1) Global outliers
A data point is considered a global outlier if its value is far outside the entirety of the dataset in which it is found. These are the simplest kind of outliers: points that deviate from the majority of the other data in the dataset.
E.g. in credit card fraud detection, given the daily transactions, a transaction of a very high amount is considered a global outlier.
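The credit card example can be sketched with a simple z-score check. This is only one common way to flag global outliers, not a method prescribed by these notes; the function name `global_outliers`, the threshold of 2.5 and the transaction amounts are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev

def global_outliers(values, threshold=2.5):
    """Return the values whose z-score magnitude exceeds `threshold`.

    The z-score measures how many (population) standard deviations
    a value lies from the mean of the whole dataset.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values are identical, so nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical daily card transaction amounts; one is far larger than the rest.
amounts = [25, 40, 32, 18, 55, 47, 30, 22, 38, 5000]
flagged = global_outliers(amounts)
print(flagged)
```

On very small samples a single extreme value also inflates the standard deviation, so the threshold has to be chosen with the sample size in mind.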
2) Contextual outliers
If a data instance is anomalous in a specific context, it could be called a contextual outlier. Contextual outliers may represent a small group of outliers having some similar features.
3) Collective outliers
When a subset of observations in a data set deviates significantly from the entire data set, it is called a collective outlier.
* It is not necessary that each instance within the collective outlier is also an outlier.
When selecting an outlier detection approach, it is important to also keep the context in mind, because sometimes a point is called a contextual outlier only when the context is considered.


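Visualization Tools

Histograms
The circled areas are the outliers.
[Figure: histogram with the outlying bars circled]

Box plots
A box plot shows three key quartiles of the data:
- first quartile, Q1 (25th percentile): 25 percent of the observations have a lower value
- median (50th percentile)
- third quartile, Q3 (75th percentile)
The interquartile range is IQR = Q3 - Q1. The whiskers extend to the minimum (Q1 - 1.5 * IQR) and the maximum (Q3 + 1.5 * IQR); points beyond them are plotted as outliers.
[Figure: box plot labelled with minimum, Q1, median, Q3, maximum and outliers]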
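The box plot limits Q1 - 1.5 * IQR and Q3 + 1.5 * IQR can be computed directly. The sketch below is a minimal illustration using Python's standard `statistics.quantiles`; the helper names and the sample ages are assumptions, and the "inclusive" quartile method is just one of several conventions, so other tools may give slightly different quartiles.

```python
from statistics import quantiles

def iqr_fences(values, k=1.5):
    """Return (lower, upper) limits: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def iqr_outliers(values, k=1.5):
    """Return the values that fall outside the box plot limits."""
    low, high = iqr_fences(values, k)
    return [v for v in values if v < low or v > high]

# Hypothetical ages; 95 sits far away from the rest of the sample.
ages = [20, 22, 23, 25, 26, 28, 30, 95]
low, high = iqr_fences(ages)
print(low, high)
print(iqr_outliers(ages))
```

Unlike a rule based on the mean, the quartiles are barely affected by the extreme value itself, which is why this rule works well even on small samples.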
Missing Values
Missing values can occur for various reasons:
- The information can be non-applicable.
- The information can also be undisclosed, e.g. a customer decided not to disclose his or her income because of privacy.
- Missing data can also originate because of an error during merging (e.g. typos in names or ID).
Some analytical techniques (e.g. decision trees) can deal directly with missing values. Other techniques need some additional preprocessing. The following are the most popular schemes to deal with missing values:
- Replace (impute): This implies replacing the missing value with a known value, e.g. one could impute the missing credit bureau scores with the average or median of the known values.
- Delete: This is the most straightforward option and consists of deleting observations or variables with lots of missing values. This assumes that the information is missing at random.
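The replace and delete schemes can be sketched in a few lines of Python. This is a minimal illustration, assuming missing values are represented as `None`; the function names and the credit bureau scores are made up for the example.

```python
from statistics import median

def impute_with_median(values):
    """Replace: fill each missing value (None) with the median of the known values.

    Raises statistics.StatisticsError if every value is missing.
    """
    known = [v for v in values if v is not None]
    fill = median(known)
    return [fill if v is None else v for v in values]

def delete_missing(values):
    """Delete: drop the observations whose value is missing."""
    return [v for v in values if v is not None]

# Hypothetical credit bureau scores with two missing entries.
scores = [620, None, 700, 680, None, 710]
imputed = impute_with_median(scores)
kept = delete_missing(scores)
print(imputed)
print(kept)
```

Imputing keeps the number of observations unchanged, while deleting shrinks the dataset, which is only reasonable if the values are missing at random.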
