File 11

Nasir Qureshi

Unsupervised learning: data forecasting..... unlabeled data

# build upon your Python and SQL skills: big data

Module delivery:

1. Big Data introduction


2. Installation
# Oracle VM VirtualBox
# Cloudera VM
# file transfer software: FileZilla

# basics of SQL also

# Spark...

# Python knowledge: to cover major advanced data analysis

# implementation of PySpark

# Data analytics with the help of a Spark-based environment

# Hive-based data analysis using Cloudera

# 6 sessions of 2 hours
# 2 sessions of 1 hour: doubt-clearing sessions

# Install required software:

# install !!!?

# software:
# red---
# 6 GB: Cloudera

# installation:

# Oracle VM VirtualBox: https://www.virtualbox.org/wiki/Downloads

# Cloudera VM
https://downloads.cloudera.com/demo_vm/virtualbox/cloudera-quickstart-vm-5.13.0-0-virtualbox.zip
# FileZilla
# Workbench (SQL)

Infrastructure of the data pipeline:

# virtual world:
# how and where to save all this ongoing & generated data

# data :
# Structured # Unstructured
# tabular format file
# row and columns
# dictionary:
# json:
# key: value
# column: [32432,46,6,34]
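
The structured formats listed above (rows and columns, dictionary, JSON key: value) can be sketched in a few lines of Python. This is only an illustration: the column names `order_id` and `amount` are made up, and only the list `[32432, 46, 6, 34]` comes from the notes.

```python
import json

# Hypothetical column-oriented table: each key is a column name,
# each value is the list of entries in that column.
table = {
    "order_id": [101, 102, 103, 104],   # assumed column, not from the notes
    "amount": [32432, 46, 6, 34],       # sample values taken from the notes
}

# Row-oriented view: one dictionary (key: value pairs) per row.
rows = [dict(zip(table, values)) for values in zip(*table.values())]

# The same structure serialized as JSON text and parsed back.
as_json = json.dumps(table)
restored = json.loads(as_json)
```

Here `rows` holds one dictionary per table row, and `restored` is the dictionary recovered from the JSON string.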

# 17% of real-world data (structured):

# solutions to real-world problems are more complex

# social tech data:

# emojis
# photos
# videos
# snaps
# TikTok content
# daily status
# tweets
# reviews.... text ...
# GIF files
# sensor data
# hashtags
# voice recordings...
# reels
# YouTube....
# share market.... daily news...

# 83% share of the data world (unstructured)

# tools to store it:


# with this much data, can you really create wealth? data as a resource

# Data is the most valuable resource in our world: Value (83% unstructured data)
# Oil

# A lot of DATA: Big DATA


# 3 Characteristics:
# Volume: 83% .....
# Variety: big data: Humans:
# Velocity:

# Rate at which data is generated = Rate at which we make decisions out of it

# Technical:
#
# Flipkart Big Billion Days sale: 12:00 AM, Oct 11
# Amazon: AWS Platform ????

# 3 days: millions of users: platform must stay smooth:

# reviews....
# average usage for any platform:
# 3 million users
# during the 3 days: 9 million users daily

# buy more machines to store the information:
# costs very much
# outsourcing:?????
# Pay as you go: service:
# cloud platforms

# recommendation: using the KMeans algorithm:
# cluster: of what was bought together by other customers... forecasting
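
As a rough illustration of the KMeans idea mentioned above, here is a minimal pure-Python sketch of the algorithm. In practice a library implementation (for example in scikit-learn or Spark MLlib) would normally be used. The customer data, the two features, and the choice of starting centroids are all assumptions made for this example.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of the points assigned to it."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: mean of each cluster (an empty cluster keeps its centroid).
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids

# Hypothetical customers: (phones bought, accessories bought).
customers = [(1, 0), (1, 1), (0, 1), (5, 6), (6, 5), (6, 6)]
labels, centroids = kmeans(customers, centroids=[customers[0], customers[3]])
```

Customers that end up with the same label form one cluster, and a recommender could then suggest what other members of that cluster bought.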

# mobile phone: recommendation..... using your model:
# resources: Jupyter notebook

# where actual data will be saved:


# keep my storage location very close to my consumer base:
# data fetching and saving speed will be high

# data storage is becoming cheaper because of cloud platforms: rental service

# Petabytes.... (PB)

# product
# customers
# reviews
# sales
# banking... card, OTP

# XYZ product: arrange a stock of 100 units:

# steep discount: 75%
# 1 min: pan India:
# 99 orders in 1 min: 12:00:59 am

# server:
# updating the available product in their warehouse:
# 98th order is booked: system must show only 2 products left
# 1 sec of time: 10 people try a transaction:
# 1 order:
# 1st order allocated to the 1st person

# 5-6 years ago: With Paytm:


# 100% 5-6: 200:
# server:
# updating the available product in their warehouse:
# product is out of stock:
# 9 orders will be pending:
# cancel
# Velocity:

# Rate at which data is generated = Rate at which decisions are made
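
The flash-sale scenario above (100 units, 99 orders in the first minute, then 10 buyers competing for the last unit) can be sketched with a lock-protected counter. This is a single-machine toy under stated assumptions: the `Inventory` class and its `try_order` method are hypothetical names, and a real platform would rely on a database or a distributed store rather than an in-process lock.

```python
import threading

class Inventory:
    """Hypothetical in-memory stock counter guarded by a lock."""

    def __init__(self, units):
        self.units = units
        self._lock = threading.Lock()

    def try_order(self):
        # Check and decrement happen under one lock, so two buyers
        # can never both claim the same last unit.
        with self._lock:
            if self.units > 0:
                self.units -= 1
                return True
            return False

inventory = Inventory(units=100)

# 99 orders arrive in the first minute, leaving one unit.
for _ in range(99):
    inventory.try_order()

# 10 buyers attempt the last unit at the same moment.
results = []

def buyer():
    results.append(inventory.try_order())

threads = [threading.Thread(target=buyer) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

confirmed = results.count(True)   # orders that actually got the unit
rejected = results.count(False)   # orders to cancel or keep pending
```

Only one buyer is confirmed, and the other nine are the orders that the notes describe as pending or cancelled.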

# Hot streaming of data: real-time streaming:

# platform will go down:
# data scientist and data engineer: optimum and efficient

# unsupervised learning at the same time on the platform:
# show a cluster of items bought together....
# earphone
# cover
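
One simple way to surface "bought together" items is to count how often pairs of products share a basket. Note that this is plain co-occurrence counting, used here as a simpler stand-in for clustering. The baskets and the helper name `bought_with` are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical order baskets; the item names are illustrative only.
baskets = [
    {"phone", "earphone", "cover"},
    {"phone", "cover"},
    {"phone", "earphone"},
    {"laptop", "mouse"},
]

# Count how often each pair of items appears in the same basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

def bought_with(item, top=2):
    """Items most often bought together with `item`, most frequent first."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    return [name for name, _ in scores.most_common(top)]
```

Calling `bought_with("phone")` returns the earphone and the cover, which matches the example in the notes.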

# big data tools and technologies:


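# 2002-03: Hadoop's roots: the Apache Nutch project (2002), Google File System paper (2003)
# MapReduce:
# 2004-05: MapReduce: Google published the research paper (2004)
# 2006-07: Hadoop with MapReduce (Hadoop split out of Nutch in 2006)

# velocity issue: process information as fast as possible: big data tech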

# cloud platforms: which provide services to store data digitally:


# 2006-2007: AWS was launched (S3 and EC2 in 2006):

# certification....
# physical data capturing machines....
# laptop: hard disk....
# space:
# demand keeps fluctuating for data capturing

# Hadoop:
# Room in a building:
# bed, washing machine, fan, laptop......

# Cluster in Cloud platform:


# thousands/millions of clusters on every platform nowadays

# save our database, processing of incoming data,

# process data out at lightning-fast speed

# machine learning ????


# platforms were added to the Hadoop ecosystem during 2005-2009:

# search something: data is processed out:

# Disk Input/Output: another disk location

# Application name: YARN: Yet Another Resource Negotiator

# facebook: created one cloud account:


# sending all users information....
# resources in cloud platforms
# processing power?????
# HDFS with MapReduce

# Machine learning: tools


# data analysis tools
# fetching filtered data: join queries, aggregate queries:
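
To make "join queries, aggregate queries" concrete, here is a small sketch using Python's built-in `sqlite3` module as a stand-in for the course tools. The tables, columns, and values are invented for the example. A Hive query would likely have a similar SELECT / JOIN / GROUP BY shape, but that is an assumption about the course setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE sales (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Delhi'), (2, 'Mumbai');
    INSERT INTO sales VALUES (1, 1, 250.0), (2, 1, 100.0), (3, 2, 400.0), (4, 2, 50.0);
""")

# Join the two tables, filter the rows, then aggregate per city.
query = """
    SELECT c.city, COUNT(*) AS orders, SUM(s.amount) AS revenue
    FROM sales AS s
    JOIN customers AS c ON c.id = s.customer_id
    WHERE s.amount >= 100
    GROUP BY c.city
    ORDER BY c.city
"""
result = conn.execute(query).fetchall()
conn.close()
```

The `WHERE` clause does the filtering, the `JOIN` links each sale to its customer, and `GROUP BY` with `COUNT` and `SUM` does the aggregation.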

# up to 100x faster data processing:

# HDFS: MapReduce: processing the data from disk
# saving the data back to disk at another location
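
The disk round trip described above can be sketched as a toy word count in the MapReduce style. The map step writes its intermediate pairs to a file, and the reduce step reads them back. This is a single-process illustration only: the function names and input lines are hypothetical, and real Hadoop distributes both phases across many machines.

```python
import json
import os
import tempfile
from collections import defaultdict

def map_phase(lines, out_path):
    """Map: emit a (word, 1) pair per word and write the pairs to disk."""
    with open(out_path, "w", encoding="utf-8") as out:
        for line in lines:
            for word in line.lower().split():
                out.write(json.dumps([word, 1]) + "\n")

def reduce_phase(in_path):
    """Reduce: read the pairs back from disk and sum the counts per word."""
    totals = defaultdict(int)
    with open(in_path, encoding="utf-8") as src:
        for record in src:
            word, count = json.loads(record)
            totals[word] += count
    return dict(totals)

# Hypothetical input lines standing in for a file stored on HDFS.
lines = ["big data big tools", "data tools data"]

with tempfile.TemporaryDirectory() as workdir:
    intermediate = os.path.join(workdir, "mapped.jsonl")
    map_phase(lines, intermediate)            # stage 1 writes to disk
    word_counts = reduce_phase(intermediate)  # stage 2 reads from disk
```

Every stage boundary costs a write and a read, which is the overhead that keeping intermediate data in memory avoids.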

Spark: more memory: keeps information in memory itself:

# RAM: Random Access Memory
