BIG DATA
Introduction
Lecturer: Lucrezia Noli
Lesson 1
LUCREZIA NOLI
▪ Università Commerciale Luigi Bocconi
▪ Bachelor in Finance
▪ Master in Economics of Innovation & Technology
▪ Master Thesis: «Machine Learning Techniques to
Investigate the ALS Disease»
▪ Won second prize in the PRISLA competition
▪ Current roles
▪ Big Data Scientist at Dataskills
▪ Lecturer SEDIN - Università Bocconi
▪ Lecturer Overnet
▪ Previous roles
▪ Business Development Manager at Metail London
The Data Science process
Data
• Quality
• Integration
• Transformation & creation

Model
• Supervised vs unsupervised
• Rule-based or black box

Predictions
• Parameter optimization
• Model performance
• Reports

[Diagram: OLAP data marts feed the Data → Model → Predictions pipeline]
The foundation of Data Analytics
Business Intelligence is central for any company wanting to use data in an innovative way. This is because BI certifies that the data to be analyzed are correct.

[Diagram: the BI stack, from Data Sources through ETL and the Data Warehouse up to BI Tools & Reporting, governed by Business Rules, Analysis, and Data Quality]
What is Business Intelligence?
DATA SOURCES → CLEANSING & STRUCTURE → DATA WAREHOUSE → CLIENT TOOLS

• Data sources: internal (sensors/PLC, CRM, ERP, production DB), external, big data
• Cleansing & structure: staging area, ETL, master data management, data quality
• Data warehouse: OLAP, data marts
• Client tools: analysis, reporting tools, KPIs
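As a concrete illustration of the ETL step in the pipeline above, here is a minimal sketch in Python, assuming pandas is available; the table name, columns, and cleaning rules are illustrative assumptions, not part of the original architecture.

```python
# A minimal ETL sketch; "sales" and its columns are invented for illustration.
import sqlite3

import pandas as pd

# Extract: raw operational data (inlined here to keep the example self-contained)
raw = pd.DataFrame({
    "order_id": [1, 2, 2, 3],
    "amount": ["10.5", "20.0", "20.0", None],
})

# Transform: apply business rules and data-quality checks
clean = (raw.drop_duplicates(subset="order_id")   # remove duplicate orders
            .dropna(subset=["amount"]))           # discard incomplete rows
clean["amount"] = clean["amount"].astype(float)   # enforce types

# Load: write the certified data into the warehouse (SQLite as a stand-in)
with sqlite3.connect("warehouse.db") as conn:
    clean.to_sql("sales", conn, if_exists="replace", index=False)
```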
Machine learning
GENERALITY: through Machine Learning a computer can solve various tasks without being given all the parameters that characterize each of them specifically.

MACHINE LEARNING: the process through which a machine learns how to complete a task without being specifically programmed to do so.

LEARNING: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E."
Tom M. Mitchell
MACHINE LEARNING - Algorithms
In 2012 DeepMind created an artificial intelligence called the Deep Q Learner that can play any game in the Atari suite, famous in the '70s, through ML techniques called «Deep Reinforcement Learning».
The algorithm doesn't specify the characteristics of the game being played; it simply enables the machine to learn how to play by itself.
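The deep network version is out of scope here, but the underlying idea can be shown with a tiny tabular Q-learning sketch; the corridor environment, rewards, and hyperparameters below are illustrative assumptions, not DeepMind's setup.

```python
# Tabular Q-learning: the agent learns to walk right along a 5-cell corridor
# to reach a reward. Environment details are invented for illustration.
import random

n_states, n_actions = 5, 2          # actions: 0 = left, 1 = right
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1

for episode in range(500):
    s = 0                            # start at the leftmost cell
    while s != n_states - 1:         # rightmost cell is the goal
        # epsilon-greedy choice: mostly exploit, sometimes explore
        a = random.randrange(n_actions) if random.random() < epsilon \
            else max(range(n_actions), key=lambda act: Q[s][act])
        s2 = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s2 == n_states - 1 else 0.0
        # Update: move Q[s][a] toward reward + discounted best future value
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

print([max(q) for q in Q])           # learned state values grow toward the goal
```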
LEARNING HOW TO PLAY FLAPPY BIRD
PREDICTIVE ANALYTICS
Learning from the data
TRAINING: through machine learning, the INPUT DATA and the TARGET are used to build a MODEL.

Years | Fuel   | Doors | Anti-theft | Price ($)  ← TARGET
  1   | Diesel |   5   |    Yes     |  14,000
  1   | Petrol |   5   |    Yes     |  12,500
  2   | Petrol |   3   |    No      |  11,000
  2   | Diesel |   5   |    Yes     |  13,000
  3   | Petrol |   5   |    No      |   9,000
  4   | Petrol |   3   |    No      |   8,500

New input data:
  3   | Diesel |   5   |    No      |     ?      → PREDICTION
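A minimal sketch of this training-and-prediction loop, assuming scikit-learn is available; the numeric encoding of the fuel and anti-theft columns is an illustrative choice.

```python
# Fit a linear model on the table above and predict the unknown price.
from sklearn.linear_model import LinearRegression

# Input data: [years, fuel (1 = diesel, 0 = petrol), doors, anti-theft (1 = yes)]
X = [[1, 1, 5, 1], [1, 0, 5, 1], [2, 0, 3, 0],
     [2, 1, 5, 1], [3, 0, 5, 0], [4, 0, 3, 0]]
y = [14000, 12500, 11000, 13000, 9000, 8500]    # target: price ($)

model = LinearRegression().fit(X, y)            # training
new_input = [[3, 1, 5, 0]]                      # the row with the unknown price
print(model.predict(new_input))                 # prediction
```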
Predictive Analytics
PREDICTIVE ANALYTICS: extraction of information and knowledge from data, making use of Machine Learning techniques.

• EXISTENCE OF HISTORICAL DATA, which are studied in order to understand which interactions between input variables have generated a specific output
• AIM TO PREDICT the evolution of the data in the future
Predictive Analytics
DATABASE → PREDICTIVE ENGINE → PREDICTION & BUSINESS INSIGHT
Predictive Analysis - steps
1. REPRESENTING THE PROBLEM
   • Supervised
   • Unsupervised
   • Hybrid

2. EVALUATING THE PERFORMANCE
   • Comparing real & expected values
   • Trial & error

3. OPTIMIZING THE PARAMETERS
   • Minimization of a cost function
   • Search for an optimum
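The three steps can be sketched on synthetic data as follows, assuming scikit-learn is available; the ridge model and its parameter grid are illustrative choices, not the course's prescribed method.

```python
# The three steps of a predictive analysis on synthetic regression data.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

# 1. Representing the problem: a supervised regression task on historical data
X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 3. Optimizing the parameters: grid search minimizes the cost function
search = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

# 2. Evaluating the performance: compare real and expected values on unseen data
print(mean_squared_error(y_test, search.predict(X_test)))
```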
Predictive Analytics – representing the problem
Predictive Analytics - applications
Predictive Analytics – energy demand forecast
WHAT WILL THE ELECTRICAL
EXPENSE BE IN THE NEXT
HOUR?
▪ Time series of electric
consumption
▪ Exogenous data (weather forecast)
➢ Better estimation of
needs and costs
➢ Ability to fix energy price
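A minimal sketch of such a forecast, with synthetic consumption and temperature series standing in for real data (NumPy and scikit-learn assumed available); the lag-plus-weather feature design is an illustrative assumption.

```python
# Next-hour electric consumption from the previous hour's load + weather.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
temp = rng.normal(20, 5, 200)                    # exogenous data: temperature
load = 100 + 2 * temp + rng.normal(0, 3, 200)    # electric consumption series

# Features: last hour's consumption (lag) plus the weather forecast
X = np.column_stack([load[:-1], temp[1:]])
y = load[1:]                                     # consumption one hour ahead

model = LinearRegression().fit(X[:-1], y[:-1])
print(model.predict(X[-1:]))                     # next-hour estimate
```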
PREDICTIVE ANALYTICS – Advanced Client Segmentation
CAN WE IDENTIFY
HOMOGENEOUS GROUPS
WITHIN OUR CUSTOMER
BASE?
▪ Purchasing behavior
▪ Demographic information
➢ Ad-hoc marketing
➢ Price-setting strategies
➢ New products & services
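A sketch of such segmentation with k-means clustering, assuming scikit-learn is available; the two features (annual spend, age) and the cluster count are illustrative assumptions.

```python
# Group customers into homogeneous segments by purchasing/demographic features.
from sklearn.cluster import KMeans

customers = [[200, 25], [220, 30], [1500, 45],
             [1400, 50], [600, 35], [650, 38]]   # [annual spend, age]

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
print(kmeans.labels_)   # homogeneous group assigned to each customer
```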
PREDICTIVE ANALYTICS – Churn Analysis
WHICH OF OUR CLIENTS ARE
LIKELY TO LEAVE US FOR
OUR COMPETITORS?
▪ Time series data of
demand
▪ Exogenous data (e.g. promotions, sales, holiday seasons)
➢ Ad-hoc Marketing
➢ Promo activities
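A churn-scoring sketch along these lines, assuming scikit-learn is available; the features (monthly spend, support calls, tenure in months) and labels are invented for illustration.

```python
# Estimate the probability that a client leaves for a competitor.
from sklearn.linear_model import LogisticRegression

X = [[50, 0, 24], [20, 5, 3], [80, 1, 36],
     [15, 4, 2], [60, 0, 30], [25, 6, 5]]
y = [0, 1, 0, 1, 0, 1]                  # 1 = client left for a competitor

model = LogisticRegression().fit(X, y)
# Churn probability for a new client with the given behavior
print(model.predict_proba([[22, 3, 4]])[0][1])
```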
PREDICTIVE ANALYTICS – Sentiment Analysis
WHAT KIND OF FEEDBACK DO
PEOPLE LEAVE ONLINE
ABOUT OUR FIRM?
▪ Social media posts
▪ Customer reviews
➢ Various aims:
recommendation
engines, advanced
clustering, propensity
analysis
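A tiny sentiment-classification sketch, assuming scikit-learn is available; the labeled reviews are invented for illustration.

```python
# Classify text feedback as positive or negative with TF-IDF features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reviews = ["great service, very happy", "terrible support, never again",
           "excellent product", "awful experience", "love it", "hate it"]
labels = [1, 0, 1, 0, 1, 0]             # 1 = positive feedback

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(reviews, labels)
print(model.predict(["the support team was great"]))
```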
PREDICTIVE ANALYTICS – Propensity Analysis
HOW LIKELY IS IT THAT A
PROSPECT WILL BUY?
▪ Cross-referencing of purchase
data & data on marketing &
adv campaigns
▪ Analysis of online consumer
behavior
➢ Ad-hoc Marketing
➢ Price-setting strategies
➢ Promo activities
PREDICTIVE ANALYTICS – Price Prediction
CAN WE PREDICT HOW PRICES
WILL EVOLVE IN OUR MARKETS
OF INTEREST?
▪ Time series of prices
▪ Time series of exogenous
data (e.g. sales, promotions,
holiday seasons)
➢ Competitors’ strategy
➢ Promotions & offers
PREDICTIVE ANALYTICS – Demand Forecast
HOW MUCH DEMAND WILL
WE HAVE FOR OUR GOOD &
SERVICE IN THE NEXT
HOUR/DAY?
▪ Time series of demand
▪ Exogenous data (e.g.
promotions, sales, holiday
seasons)
➢ Inventory
➢ Resource allocation
➢ Ad-hoc marketing
➢ Promotions
PREDICTIVE ANALYTICS – Fraud Detection
CAN WE IDENTIFY
FRAUDULENT
TRANSACTIONS BEFORE
THEY ARE CARRIED OUT?
▪ Time series of transactions
with «fraud tag»
➢ Fraud detection
➢ Fraud classification
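The slide suggests supervised learning from the «fraud tag»; as a complementary sketch, here is an unsupervised anomaly detector, assuming scikit-learn is available, with invented transaction amounts.

```python
# Flag transactions whose amounts deviate from the usual pattern.
from sklearn.ensemble import IsolationForest

transactions = [[50], [60], [55], [52], [58], [5000]]   # amounts ($)

detector = IsolationForest(contamination=0.2, random_state=0).fit(transactions)
print(detector.predict(transactions))   # -1 marks a suspicious transaction
```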
PREDICTIVE ANALYTICS – Resource Allocation
CAN WE PREDICT GEOGRAPHIC
AREAS WITH HIGHER DEMAND
FOR OUR SERVICE?
▪ Time series of demand
▪ Exogenous data (e.g. fairs,
events, strikes)
➢ Optimal resource
allocation
➢ Demand/revenue
forecasting
PREDICTIVE ANALYTICS – Document Classification
CAN WE AUTOMATICALLY
CLASSIFY OUR DOCUMENTS
BASED ON THEIR CONTENT?
▪ Sample of documents to
classify
➢ Automatic document
classification
➢ Error identification
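A document-classification sketch, assuming scikit-learn is available; the categories and sample documents are invented for illustration.

```python
# Route documents into categories based on their textual content.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = ["invoice number and total amount due",
        "contract signed by both parties",
        "payment received for invoice",
        "terms and conditions of the contract"]
labels = ["invoice", "contract", "invoice", "contract"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(docs, labels)
print(model.predict(["please find the attached invoice"]))
```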
Data Sources
“Operational” sources capturing a firm’s daily activities include:
• Production-related devices
• Sales-related devices
• Tools to track orders and deliveries
• Accounting tools
• HR tools
• Client-management tools
• Back-office tools
• The product
• Production line (production plant) – data from machine sensors
• Orders & deliveries
• Inventory
• Supplier data
• Customers’ feedback: call center, emails, returns data
Data types
Structured vs. unstructured data:
• Structured data → table-like data with columns and rows → e.g.
transactions: each row is a transaction, each column is a
characteristic of the transaction: when it was made, by whom, the
amount, etc.
• Unstructured (semi-structured) data → images, videos, emails, any
feedback from social networks…
Introducing Big data
Big data - definitions
1997 – Cox & Ellsworth (NASA): “…data sets are generally quite large, taxing the capacities of main memory, local disk, and even remote disk. We call this the problem of big data.”

2001 – Doug Laney (Gartner): Big Data described by the 3Vs: Volume, Velocity, Variety.

2011 – McKinsey: “datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.”

2013 – Mayer-Schönberger & Cukier: “…things one can do at a large scale that cannot be done at a smaller one, to extract new insights or create new forms of value.”

Oxford English Dictionary: “Data of a very large size, typically to the extent that its manipulation and management present significant logistical challenges.”

Wikipedia: “An all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process using on-hand data management tools or traditional data processing applications.”
3 Vs definition
• Volume
• Huge amounts of data
• Variety
• Variety of structures, data sources, types of data
• Complexity of structures
• Unstructured data
• Velocity
• Rapidity with which data is produced
3 Vs + 2
• Value
• Ability to extract value from this huge amount of data
• Veracity
• Not all data we have at hand are actually of “good quality”
BUT…
• Big data are not JUST data in big volume (they need to have
other characteristics too in order to be defined as big
data)
• They come from both new sources (e.g. social networks)
and traditional ones! (think about data coming from a
sensor put on a machine in a production plant)
• In many statistics published on the web or shared on TV,
the numbers are either exaggerated or imprecise → most of
the time, the data we’ll actually have to analyze are much
less than the initial number
So there’s an alternative definition
Big data are also:
• Data which cannot be analyzed by a single machine, and shouldn’t
be analyzed with traditional hardware or software technologies.
• They might require particularly sophisticated analytical tools, but not
necessarily.
• Even when we face unstructured data, these too have to be turned
into structured data before they can be analyzed.
What does this all mean?
It means that when dealing with big data we still use the tools, and apply
the main concepts, that we also encounter when dealing with traditional
BI and Predictive Analysis.
The difference is in the use of specific technologies required for bigger and
unstructured/non-traditional data, such as:
• Hardware and Software devices to extract data directly from sensors placed on
smart objects/devices
• Tools to extract data in semi-real time
• Tools for parallel computing
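As a tiny illustration of the last point, here is a parallel-computing sketch using only Python's standard library; splitting a sum across four worker processes is an illustrative stand-in for real distributed engines.

```python
# Each worker processes its own chunk, mimicking a shared-nothing split.
from multiprocessing import Pool

def partial_sum(chunk):
    return sum(chunk)                            # work done independently per node

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::4] for i in range(4)]      # distribute data to 4 workers
    with Pool(4) as pool:
        print(sum(pool.map(partial_sum, chunks)))  # combine partial results
```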
Why do we have so much data?
• Web 2.0 -> user generated content
• Facebook
• Twitter
• Instagram
• YouTube
• Blogs
• Videos
• Photos
• Posts
• …
• IoT (Internet of Things)
Big data - chart
[Chart: examples of big data arranged from LOW to MID to HIGH COMPLEXITY]
Big data - chart
[Chart: data classified by SOURCE (man vs. machine) and DATA STRUCTURE (structured vs. unstructured)]
Big data - cases
Case                                   | Characteristic    | Example
Sensors & DCS                          | Velocity & volume | Predictive maintenance
Radio Frequency Identification (RFID)  | Velocity & volume | Analysis of the path of consumers within a shop, or goods delivered within a geographic area
Stock markets                          | Velocity & volume | Yield analysis, optimal portfolio analysis, risk analysis
Scientific instruments data            | Velocity & volume | Pattern recognition & simulations
Weather forecast info                  | Volume            | Weather data
Healthcare info                        | Volume & variety  | Monitoring of diseases
Fiscal & bank data                     | Volume            | Information on accounts & transactions – information cross-checking
Big data - cases
Case                | Characteristic             | Example
Social networks     | Volume, velocity, variety  | Sentiment analysis
Blogs, forums       | Volume, velocity, variety  | Sentiment analysis
Web server logs     | Volume                     | Web server traffic and users’ behavior
Router traffic logs | Volume, velocity           | Usage analysis by providers
Surveillance        | Volume, velocity, variety  | Anomaly identification
Documents           | Volume, variety            | Automatic document classification
Geographic data     | Volume, velocity           | Resource allocation (e.g. car sharing)
How to generate value from Big Data?
• We receive information at very high frequency
• We can store & analyze more detailed information (because we have
the hw & sw to do so)
• Enable more advanced analysis
• Micro-segmentation & Ad-hoc offering
This leads to …
• More sophisticated analysis leading to more
efficient decision-making
• Possibility to create new products/services
Software tools
• Data ingestion
• Data storing
• Data organization
• Computation/Analysis
• Integration/Enrichment
Hardware architectures
• Symmetric multiprocessing (SMP):
• Two or more processors connected to a single, shared RAM.
• Each processor has full access to I/O devices. Only one instance of the
operating system runs
• Massively Parallel Processing (MPP)
• Shared nothing architecture: each processor has its own RAM and I/O
devices
• No resource is shared
• An efficient communication layer enables collaboration between nodes
SMP Architecture
[Diagram: CPUs 1–4 connected by a shared BUS to a single RAM and shared disks]
MPP Architecture
[Diagram: four nodes, each with its own CPU, RAM, and disks, connected by a communication layer]
Big data Hardware
• MPP is obviously more suited to working with huge amounts of data
• The limit of this architecture is in the number of nodes we can add to
the system, and their cost!
• Examples:
• Oracle Exadata – max 18 racks (× 672 TB) = 11 PB
• Microsoft APS – max 7 nodes = 6.2 PB
• Teradata – max 4,096 nodes = 186 PB!
Now let’s set up KNIME