File 11

Uploaded by

AMITKUMAR RAUTRAY

Nasir Qureshi

Unsupervised learning: finding patterns in unlabeled data, e.g. clustering

# Build upon your Python and SQL skills: big data

Module delivery:

1. Big Data introduction

2. Installation
# Oracle VM VirtualBox
# Cloudera VM
# File transfer software: FileZilla

# Basics of SQL as well

# Spark...

# Python knowledge: to cover major advanced data analysis

# Implementation of PySpark

# Data analytics with the help of a Spark-based environment

# Hive-based data analysis using Cloudera

# 6 sessions of 2 hours
# 2 sessions of 1 hour: doubt-clearing sessions

# Install the required software:

# install !!!?

# software:
# red---
# 6 GB: Cloudera

# installation:

# Oracle VM VirtualBox: https://www.virtualbox.org/wiki/Downloads

# Cloudera VM:
https://downloads.cloudera.com/demo_vm/virtualbox/cloudera-quickstart-vm-5.13.0-0-virtualbox.zip
# FileZilla
# Workbench SQL

Infrastructure of the data pipeline:

# virtual world:
# how and where to save all this ongoing and generated data

# data:
# Structured vs. unstructured
# Structured: tabular format file
# rows and columns
# dictionary:
# JSON:
# key: value
# column: [32432,46,6,34]
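The "column: [values]" idea above can be sketched in Python. This is a minimal illustration, not part of the original notes: the column names, the amounts, and the helper `to_rows` are made up for the example. It shows one table held as a dictionary of columns, turned into rows, and round-tripped through JSON.

```python
import json

# Hypothetical column-oriented table: each key is a column name,
# each value is the list of entries in that column.
table = {
    "order_id": [32432, 46, 6, 34],
    "amount": [250.0, 99.5, 10.0, 42.0],
}

def to_rows(columns):
    """Turn {column: [values]} into a list of row dictionaries."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]

rows = to_rows(table)
as_json = json.dumps(rows)      # key: value pairs serialized as JSON text
restored = json.loads(as_json)  # parse the text back into Python objects
```

The same data can be viewed column-wise (a dictionary of lists) or row-wise (a list of dictionaries); both are structured because every record has the same keys.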

# 17% of real-world data is structured

# Solutions to real-world problems are more complex

# social tech data:

# emojis
# photos
# videos
# snaps
# TikTok content
# daily status
# tweets
# reviews (text)
# GIF files
# sensor data
# hashtags
# voice recordings
# reels
# YouTube
# share market, daily news

# 83% share of the data world (unstructured)

# tools to store it:

# With this much data, can you really create wealth? Data as a resource

# Data is the most valuable resource in our world: value (83% unstructured data)
# like oil

# A lot of data: Big Data

# 3 characteristics:
# Volume: 83% .....
# Variety: big data: humans:
# Velocity:

# Rate at which data is generated = rate at which we make decisions from it

# Technical:
#
# Flipkart Big Billion Day sale: 12:00 AM, Oct 11
# Amazon: AWS platform ????

# 3 days: millions of users: the platform must run smoothly

# review....
# average usage for any platform:
# 3 million
# 3 days # 9 million users daily

# buy more machines to store the information:
# costs a lot
# outsourcing: ?????
# pay-as-you-go service:
# cloud platforms

# Recommendation using the KMeans algorithm:
# cluster of what was bought together by other customers... forecasting

# mobile phone: recommendation using your model
# resources: Jupyter notebook
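A rough sketch of the KMeans idea in plain Python, written for illustration only: the purchase counts, the two product groups, and the fixed starting centroids are all assumptions, and a real project would more likely use a library implementation. It groups customers whose purchases look alike, which is the basis for "customers like you also bought" suggestions.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of its assigned points."""
    centroids = [list(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: mean of the points in each cluster.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Hypothetical purchase counts per customer: [phone accessories, groceries].
customers = [[5, 0], [4, 1], [6, 0], [0, 5], [1, 4], [0, 6]]
# Assumed starting centroids (the first and fourth customer), fixed so the
# run is repeatable; libraries usually choose the starting points for you.
labels, centroids = kmeans(customers, [customers[0], customers[3]])
```

Customers that end up with the same label form one cluster; items popular inside a cluster are candidates to recommend to its other members.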

# where the actual data will be saved:

# keep the storage location very close to the consumer base:
# data fetching and saving speed will be high

# data storage is becoming cheaper because of cloud platforms: rental service

# Petabytes (PB)

# product
# customers
# reviews
# sales
# banking: card, OTP

# XYZ product: arrange a stock of 100 units:

# steep discount: 75%
# 1 min: pan-India:
# 99 orders in 1 min: 12:00:59 AM

# server:
# updating the available products in their warehouse:
# 98th order is booked: system must show only 2 products left
# 1 sec of time: 10 people try a transaction:
# 1 order:
# 1st order is allocated to the 1st person
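The last-unit situation above (10 buyers, 1 unit left) can be sketched with threads. This is an assumed, simplified model rather than how any real shopping platform works: the `Inventory` class and the `buyer` function are hypothetical names. The point it shows is that checking and updating the stock must happen as one atomic step, so exactly one order is booked.

```python
import threading

class Inventory:
    """Hypothetical stock counter; the lock makes check-and-decrement atomic."""

    def __init__(self, units):
        self.units = units
        self._lock = threading.Lock()

    def try_order(self):
        # Only one thread at a time may check and update the stock.
        with self._lock:
            if self.units > 0:
                self.units -= 1
                return True   # order booked
            return False      # out of stock: the order must be rejected

stock = Inventory(units=1)    # 99 of the 100 units are already sold
results = []                  # list.append is thread-safe in CPython

def buyer():
    results.append(stock.try_order())

threads = [threading.Thread(target=buyer) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

booked = results.count(True)      # buyers who got the unit
rejected = results.count(False)   # buyers who did not
```

Without the lock, two buyers could both see one unit left and both be confirmed, which is how oversold orders end up pending and then cancelled.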

# 5-6 years ago: with Paytm:

# 100% 5-6: 200:
# server:
# updating the available products in their warehouse:
# product is out of stock:
# 9 orders will be pending:
# cancel
# Velocity:

# Rate at which data is generated = rate at which decisions are made

# Hot streaming of data: real-time streaming:

# platform will go down:
# data scientist and data engineer: optimum and efficient

# unsupervised learning at the same time on the platform:
# show a cluster of items bought together....
# earphones
# cover

# Big data tools and technologies:

# 2002-03: Hadoop's origins (the Apache Nutch project):
# MapReduce:
# 2004: MapReduce: Google published the research paper
# 2007: Hadoop with MapReduce
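To make the MapReduce idea concrete, here is a single-machine word-count sketch in plain Python. It only imitates the three phases (map, shuffle, reduce) and does not use Hadoop; the function names and the two input lines are made up for the example.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)

# Hypothetical input: each string stands in for one block of a large file.
lines = ["big data big tools", "data tools data"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

On a real cluster the map calls run in parallel on different machines, and the shuffle moves each key's values to the machine that reduces it.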

# velocity issue: process information as fast as possible: big data tech

# cloud platforms: provide services to store data digitally:

# 2006: AWS was launched:

# certification....
# physical data-capturing machines....
# laptop: hard disk....
# space:
# demand keeps fluctuating for data capturing

# Hadoop:
# Room in a building:
# bed, washing machine, fan, laptop......

# Cluster in a cloud platform:

# thousands or millions of clusters on every platform nowadays

# save our database, process incoming data,

# process data out at lightning-fast speed

# machine learning ????

# platforms were added to the Hadoop system during 2005-2009:

# search something: data is processed out:

# disk input/output: another disk location

# application name: YARN: Yet Another Resource Negotiator

# Facebook: created one cloud account:

# sending all users' information....
# resources in cloud platforms
# processing power?????
# HDFS with MapReduce

# Machine learning: tools

# data analysis tools
# fetching filtered data: join queries, aggregate queries
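A small sketch of a join query combined with an aggregate query, using Python's built-in `sqlite3` module as a stand-in for a SQL engine such as Hive. The tables, cities, and amounts are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Pune'), (2, 'Delhi');
    INSERT INTO orders VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 70.0), (4, 2, 5.0);
""")

# Join orders to customers, keep orders of at least 50, and total them per city.
query = """
    SELECT c.city, COUNT(*) AS n_orders, SUM(o.amount) AS total
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    WHERE o.amount >= 50
    GROUP BY c.city
    ORDER BY c.city
"""
result = conn.execute(query).fetchall()
conn.close()
```

The `WHERE` clause does the filtering, the `JOIN` brings in the customer's city, and `GROUP BY` with `COUNT` and `SUM` does the aggregation.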

# Up to 100x faster data processing:

# HDFS: MapReduce: processes the data from disk
# and saves the data back to disk at another location

# Spark: more memory: keeps information in memory itself:

# RAM: random access memory
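The disk-versus-memory difference can be imitated on one machine. The sketch below is only an analogy under stated assumptions: it uses neither Hadoop nor Spark, and the two-stage pipeline (`clean`, then `total`) is a made-up example. One version writes the intermediate result to a temporary file and reads it back, as a disk-based job does between stages; the other passes it along in memory.

```python
import json
import os
import tempfile

def clean(records):
    """Stage 1: keep only positive amounts."""
    return [r for r in records if r > 0]

def total(records):
    """Stage 2: aggregate."""
    return sum(records)

def disk_pipeline(records):
    """MapReduce-style: stage 1 writes its result to disk, stage 2 reads it back."""
    fd, path = tempfile.mkstemp(suffix=".json")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(clean(records), f)   # intermediate result goes to disk
        with open(path) as f:
            return total(json.load(f))     # next stage reloads it from disk
    finally:
        os.remove(path)

def memory_pipeline(records):
    """Spark-style: the intermediate result stays in memory between stages."""
    return total(clean(records))

data = [10, -3, 25, 0, 7]
```

Both pipelines give the same answer; the in-memory one simply skips the write and the read, and that saving grows with the number of stages.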
