Module 1 Ppt1

This document provides an overview of a data mining and analysis course. The objectives are to learn data mining methods, business intelligence, predictive analytics, and knowledge discovery. The course covers topics such as association rules, classification, clustering, performance evaluation, and time series forecasting. Data mining is presented as a process to extract useful patterns from large amounts of data and turn raw data into useful business information.

Uploaded by Rashmi Sehgal

Data Mining and

Analysis
MCA2001
Objectives:

• To learn data mining methods and their importance.


• To learn about business intelligence, predictive
analytics, and decision making.
Expected Outcomes
• Implement the appropriate data mining methods like
classification, clustering or association mining on large
data sets.
• Apply analytics and intelligence to solve practical
problems.
• Apply Data mining for knowledge discovery.
Module 1
Introduction: Data Mining (DM) – origin – rapid growth – Core Ideas
in Data Mining – Supervised and Unsupervised Learning – Steps in
Data Mining – Data Warehousing.
Dimension reduction: Data Summaries – Correlation Analysis –
Reducing the Number of Categories in Categorical Variables –
Converting a Categorical Variable to a Numerical Variable –
Principal Components Analysis.
Module 2
Associative Prediction:
Frequent pattern Mining, Utility itemset mining, Association Rules –
Association Algorithms
Classifications:
Classification methods – Decision Tree – Naïve Bayes – K-Nearest
Neighbors – classification and regression trees –
logistic regression models.
Module 3
Cluster analysis:
Introduction – distance between two records –
measuring distance between two clusters –
hierarchical clustering – non-hierarchical
clustering – k-means algorithm.
Module 4
Performance Evaluation:
Evaluating classification performance -
Introduction - Evaluating Goodness of fit - logistic regression for
more than two classes. Predictive Performance - Judging
Classification Performance - Evaluating Predictive Performance –
Prediction - Multiple linear regression- Explanatory vs predictive
modelling – Estimating the regression equation and prediction
variable selection in linear regression.
Module 5
Forecasting time series :
Introduction to time series - Explanatory versus Predictive
Modelling - Popular Forecasting Methods in Business - Time
Series Components - Data Partitioning - Regression-Based
Forecasting - Model with Trend - Model with Seasonality –
Model with Trend and Seasonality - Autocorrelation and
ARIMA Models - Smoothing Methods.
Introduction to Data Mining(DM)
• Data mining is a process used by companies to
turn raw data into useful information.
• By using software to look for patterns in large
batches of data, businesses can learn more
about their customers to develop more effective
marketing strategies, increase sales and
decrease costs.
• Data mining depends on effective data
collection, warehousing, and computer
processing.
Why Data Mining ?
• Credit ratings/targeted marketing:
• Given a database of 100,000 names, which persons are the least likely to
default on their credit cards?
• Identify likely responders to sales promotions
• Fraud detection
• Which types of transactions are likely to be fraudulent, given the demographics
and transactional history of a particular customer?
• Customer relationship management:
• Which of my customers are likely to be the most loyal, and which are most likely
to leave for a competitor?

Data Mining helps extract such information


Cont…

• The Explosive Growth of Data: from terabytes (1000^4 bytes) to yottabytes (1000^8 bytes)


• Data collection and data availability
• Automated data collection tools, database systems, web
• Major sources of abundant data
• Business: Web, e-commerce, transactions, stocks, …
• Science: bioinformatics, scientific simulation, medical research …
• Society and everyone: news, digital cameras, …
• Data rich but information poor!
• What do those data mean?
• How to analyze data?
• Data mining — Automated analysis of massive data sets
Data mining

• Process of semi-automatically analyzing large


databases to find patterns that are:
• valid: hold on new data with some certainty
• novel: non-obvious to the system
• useful: should be possible to act on the item
• understandable: humans should be able to interpret
the pattern
• Also known as Knowledge Discovery in
Databases (KDD)
What Is Data Mining?
• Data mining (knowledge discovery from data)
• Extraction of interesting (non-trivial, implicit, previously
unknown and potentially useful) patterns or knowledge from
huge amount of data
• Data mining: a misnomer?
• Alternative names
• Knowledge discovery (mining) in databases (KDD), knowledge
extraction, data/pattern analysis, data archeology, data
dredging, information harvesting, business intelligence, etc.

Data Mining: Concepts and Techniques


Potential Applications
• Data analysis and decision support
• Market analysis and management
• Target marketing, customer relationship management (CRM), market basket analysis,
cross selling, market segmentation
• Risk analysis and management
• Forecasting, customer retention, improved underwriting, quality control, competitive
analysis
• Fraud detection and detection of unusual patterns (outliers)
• Other Applications
• Text mining (news group, email, documents) and Web mining
• Stream data mining
• Bioinformatics and bio-data analysis
Applications (continued)
• Medicine: disease outcome, effectiveness of treatments
• analyze patient disease history: find relationship between diseases
• Molecular/Pharmaceutical: identify new drugs
• Scientific data analysis:
• identify new galaxies by searching for sub clusters
• Web site/store design and promotion:
• find affinity of visitor to pages and modify layout
Ex.: Market Analysis and Management
• Where does the data come from?—Credit card transactions, loyalty cards,
discount coupons, customer complaint calls, surveys …
• Target marketing
• Find clusters of “model” customers who share the same characteristics: interest,
income level, spending habits, etc.,
• E.g. Most customers with income level 60k – 80k with food expenses $600 -
$800 a month live in that area
• Determine customer purchasing patterns over time
• E.g. Customers who are between 20 and 29 years old, with income of 20k – 29k
usually buy this type of CD player
• Cross-market analysis—Find associations/co-relations between product sales, &
predict based on such association
• E.g. Customers who buy computer A usually buy software B
Ex.: Market Analysis and Management (2)
• Customer requirement analysis
• Identify the best products for different customers
• Predict what factors will attract new customers
• Provision of summary information
• Multidimensional summary reports
• E.g. Summarize all transactions of the first quarter from three different branches
Summarize all transactions of last year from a particular branch
Summarize all transactions of a particular product
• Statistical summary information
• E.g. What is the average age for customers who buy product A?
• Fraud detection
• Find outliers of unusual transactions
• Financial planning
• Summarize and compare the resources and spending
Knowledge Discovery (KDD) Process



KDD Process: Several Key Steps
• Learning the application domain
• relevant prior knowledge and goals of application

• Identifying a target data set: data selection


• Data processing
• Data cleaning (remove noise and inconsistent data)
• Data integration (multiple data sources maybe combined)
• Data selection (data relevant to the analysis task are retrieved from database)
• Data transformation (data transformed or consolidated into forms appropriate for mining) (Done with data
preprocessing)
• Data mining (an essential process where intelligent methods are applied to extract data patterns)
• Pattern evaluation (identify the truly interesting patterns)
• Knowledge presentation (mined knowledge is presented to the user with visualization or representation
techniques)

• Use of discovered knowledge
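The KDD steps above can be sketched as a minimal pipeline. A hedged sketch only: the function names and the toy cleaning/selection rules are illustrative assumptions, not part of any standard library or of the textbook's own code.

```python
# Minimal sketch of the KDD pipeline steps (cleaning, integration,
# selection, mining). All names and toy rules are illustrative assumptions.

def clean(records):
    """Data cleaning: drop records with missing values (noise/inconsistency)."""
    return [r for r in records if None not in r.values()]

def integrate(*sources):
    """Data integration: combine multiple data sources into one set."""
    merged = []
    for src in sources:
        merged.extend(src)
    return merged

def select(records, attrs):
    """Data selection: keep only attributes relevant to the analysis task."""
    return [{a: r[a] for a in attrs} for r in records]

def mine(records):
    """Data mining: here, a trivial 'pattern' -- count attribute values."""
    counts = {}
    for r in records:
        for k, v in r.items():
            counts[(k, v)] = counts.get((k, v), 0) + 1
    return counts

# Two hypothetical data sources, one with a noisy record.
source_a = [{"item": "milk", "qty": 1}, {"item": None, "qty": 2}]
source_b = [{"item": "milk", "qty": 3}, {"item": "bread", "qty": 1}]

data = select(clean(integrate(source_a, source_b)), ["item"])
patterns = mine(data)
print(patterns)  # {('item', 'milk'): 2, ('item', 'bread'): 1}
```

The real steps (pattern evaluation, knowledge presentation) would follow the `mine` call; they are omitted here to keep the sketch short.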


Data Mining and Business Intelligence

Layers, from data sources at the bottom to decision making at the top, with increasing potential to support business decisions:

• Decision Making (End User)
• Data Presentation: visualization techniques (Business Analyst)
• Data Mining: information discovery (Data Analyst)
• Data Exploration: statistical summary, querying, and reporting (Data Analyst)
• Data Preprocessing/Integration, Data Warehouses (DBA)
• Data Sources: paper, files, Web documents, scientific experiments, database systems
A typical DM System Architecture

• Database, data warehouse, WWW or other information repository (store data)


• Database or data warehouse server (fetch and combine data)
• Knowledge base (turn data into meaningful groups according to domain
knowledge)
• Data mining engine (perform mining tasks)
• Pattern evaluation module (find interesting patterns)
• User interface (interact with the user)
A Typical DM System Architecture
Architecture of a typical data mining system:
• Database, data warehouse, World Wide Web, or other information repository:
This is one or a set of databases, data warehouses, spreadsheets, or other
kinds of information repositories. Data cleaning and data integration techniques
may be performed on the data.
• Database or data warehouse server: The database or data warehouse server is
responsible for fetching the relevant data, based on the user’s data mining
request.
• Knowledge base: This is the domain knowledge that is used to guide the search
or evaluate the interestingness of resulting patterns. Such knowledge can
include concept hierarchies, used to organize attributes or attribute values into
different levels of abstraction. Knowledge such as user beliefs, which can be
used to assess a pattern’s interestingness based on its unexpectedness, may
also be included. Other examples of domain knowledge are additional
interestingness constraints or thresholds, and metadata (e.g., describing data
from multiple heterogeneous sources).
• Data mining engine: Consists of a set of functional modules for tasks such as
characterization, association and correlation analysis, classification, prediction,
cluster analysis, outlier analysis, and evolution analysis.
• Pattern evaluation module: This component typically employs interestingness
measures and interacts with the data mining modules so as to focus the search
toward interesting patterns. It may use interestingness thresholds to filter out
discovered patterns. Alternatively, the pattern evaluation module may be
integrated with the mining module.
• User interface: This module communicates between users and the data mining
system, allowing the user to interact with the system by specifying a data mining
query or task, providing information to help focus the search, and performing
exploratory data mining based on the intermediate data mining results. This
component allows the user to browse database and data warehouse schemas
or data structures, evaluate mined patterns, and visualize the patterns in
different forms.
Motivating Challenges
• Scalability:
• Datasets with sizes of gigabytes, terabytes or even petabytes
• Massive datasets cannot fit into main memory
• Need to develop scalable data mining algorithms to mine massive datasets
• Scalability can also be improved by using sampling or developing parallel and
distributed algorithms.
• High Dimensionality:
• Data sets with hundreds or thousands of attributes.
• Example: a dataset that contains measurements of temperature at various locations
• Traditional data analysis techniques were developed for low-dimensional data.
• Need to develop data mining algorithms to handle high dimensionality.
• Heterogeneous and Complex Data:
• Traditional data analysis methods deal with datasets containing attributes of the same
type (continuous or categorical).
• Complex data sets contain images, video, text, etc.
• Need to develop mining methods to handle complex datasets
• Data Ownership and Distribution:
• Data is not stored in one location or owned by one organization.
• Data is geographically distributed among resources belonging to multiple entities.
• Need to develop distributed data mining algorithms to handle distributed datasets.
• Key challenges:
• How to reduce the amount of communication needed for distributed data.
• How to effectively consolidate the data mining results from multiple sources
• How to address data security issues.
• Non-Traditional Analysis:
• Traditional statistical approach is based on a hypothesize-and-test
paradigm.
• A hypothesis is proposed, an experiment is designed to gather the
data, and then data is analyzed with respect to the hypothesis.
• This process is extremely labor-intensive.
• Need to develop mining methods to automate the process of
hypothesis generation and evaluation.
On What Kinds of Data?

• Database-oriented data sets and applications


• Relational database, data warehouse, transactional database
• Advanced data sets and advanced applications
• Object-Relational Databases
• Temporal Databases, Sequence Databases, Time-Series databases
• Spatial Databases and Spatiotemporal Databases
• Text databases and Multimedia databases
• Heterogeneous Databases and Legacy Databases
• Data Streams
• The World-Wide Web

Relational Databases
• DBMS – database management system, contains a collection of
interrelated databases
e.g. Faculty database, student database, publications database
• Each database contains a collection of tables and functions to
manage and access the data.
e.g. student_bio, student_graduation, student_parking
• Each table contains columns and rows, with columns as attributes of data and
rows as records.
• Tables can be used to represent the relationships between or among multiple
tables.
Relational Databases

• With a relational query language, e.g. SQL, we will be able to find


answers to questions such as:
• How many items were sold last year?
• Who has earned commissions higher than 10%?
• What were last month's total sales of Dell laptops?
• When data mining is applied to relational databases, we can search for
trends or data patterns.
• Relational databases are one of the most commonly available and
rich information repositories, and thus are a major data form in our
study.
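Queries like those above can be tried directly with Python's built-in sqlite3 module. A sketch under stated assumptions: the `sales` table, its columns, and its rows are hypothetical illustration data, not a schema from the course.

```python
# The relational-query idea above, sketched with Python's built-in sqlite3.
# The sales table and its rows are hypothetical illustration data.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item TEXT, qty INTEGER, year INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Dell laptop", 2, 2023), ("mouse", 5, 2023), ("Dell laptop", 1, 2022)],
)

# "How many items were sold last year?" (assuming last year = 2023)
(total,) = conn.execute("SELECT SUM(qty) FROM sales WHERE year = 2023").fetchone()
print(total)  # 7
```

Data mining goes beyond such one-off queries: instead of answering a fixed question, it searches the same tables for trends and patterns.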
Data Warehouses

• A repository of information
collected from multiple
sources, stored under a
unified schema, and that
usually resides at a single
site.
• Constructed via a process of
data cleaning, data
integration, data
transformation, data loading
and periodic data refreshing.
Data Warehouses (2)

• Data are organized around major subjects, e.g. customer, item, supplier and activity.
• Provide information from a historical perspective (e.g. from the past 5 – 10 years)
• Typically summarized to a higher level (e.g. a summary of the
transactions per item type for each store)
• User can perform drill-down or roll-up operation to view the data at different degrees of
summarization

• OLAP (Online Analytical Processing) is the technology behind many Business


Intelligence (BI) applications. OLAP is a powerful technology for data discovery,
including capabilities for limitless report viewing, complex analytical calculations,
and predictive “what if” scenario (budget, forecast) planning.
Transactional Databases
• Consists of a file where each record represents a transaction
• A transaction typically includes a unique transaction ID and a list of the items making
up the transaction.

• Either stored in a flat file or unfolded into relational tables


• Easy to identify items that are frequently sold together
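"Frequently sold together" falls out of a transactional file almost directly, by counting item pairs per transaction. A minimal sketch, assuming made-up transaction data:

```python
# Counting item pairs across transactions to spot items frequently
# sold together. The transactions below are hypothetical.
from collections import Counter
from itertools import combinations

transactions = {
    "T100": ["bread", "milk", "butter"],
    "T200": ["bread", "milk"],
    "T300": ["milk", "eggs"],
}

pair_counts = Counter()
for items in transactions.values():
    # sort so ('bread', 'milk') and ('milk', 'bread') count as one pair
    for pair in combinations(sorted(items), 2):
        pair_counts[pair] += 1

print(pair_counts.most_common(1))  # [(('bread', 'milk'), 2)]
```

Real association-rule algorithms (covered in Module 2) do this at scale without enumerating every pair, but the underlying signal is the same co-occurrence count.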
Relationship with other fields

• Overlaps with machine learning, statistics, artificial intelligence,
databases, and visualization, but with more stress on:
  • scalability in the number of features and instances
  • algorithms and architectures (the foundations of methods and
  formulations are provided by statistics and machine learning)
  • automation for handling large, heterogeneous data

The Origins of Data Mining
• Data mining draws ideas from several fields:
  • Sampling, estimation, and hypothesis testing
  from statistics.
  • Search algorithms, modeling techniques, and
  learning theories from artificial intelligence,
  machine learning, and pattern recognition.
• Database systems provide support for efficient
storage, indexing, and query processing.
• Techniques from parallel computing address
the massive size of some datasets.
• Distributed computing techniques are used to
gather information from different locations.
Confluence of Multiple Disciplines

Data mining sits at the confluence of database technology, statistics,
machine learning, information science, visualization, and other disciplines.

• Not all "data mining systems" perform true data mining
  • machine learning systems, statistical analysis (small amounts of data)
  • database systems (information retrieval, deductive querying, ...)
DATA WAREHOUSE
A producer wants to know:
• Which are our lowest/highest margin customers?
• Who are my customers and what products are they buying?
• What is the most effective distribution channel?
• What product promotions have the biggest impact on revenue?
• Which customers are most likely to go to the competition?
• What impact will new products/services have on revenue and margins?
Data, Data Everywhere, Yet ...
• I can't find the data I need
  • data is scattered over the network
  • many versions, subtle differences
• I can't get the data I need
  • need an expert to get the data
• I can't understand the data I found
  • available data poorly documented
• I can't use the data I found
  • results are unexpected
  • data needs to be transformed from one form to another
What is a Data Warehouse?
"A single, complete and consistent store of data obtained from a
variety of different sources, made available to end users in a way
they can understand and use in a business context."

[Barry Devlin]
What are the users saying...
• Data should be integrated across the
enterprise
• Summary data has a real value to the
organization
• Historical data holds the key to
understanding data over time
• What-if capabilities are required

What is Data Warehousing?
"A process of transforming data into information and making it
available to users in a timely enough manner to make a difference."

[Forrester Research, April 1996]
Evolution
• 60’s: Batch reports
• hard to find and analyze information
• inflexible and expensive, reprogram every new request
• 70’s: Terminal-based DSS and EIS (executive information systems)
• still inflexible, not integrated with desktop tools
• 80’s: Desktop data access and analysis tools
• query tools, spreadsheets, GUIs
• easier to use, but only access operational databases
• 90’s: Data warehousing with integrated OLAP engines and tools

Very Large Data Bases
• Terabytes (10^12 bytes): Walmart, 24 terabytes
• Petabytes (10^15 bytes): geographic information systems
• Exabytes (10^18 bytes): national medical records
• Zettabytes (10^21 bytes): weather images
• Yottabytes (10^24 bytes): intelligence agency videos
Data Warehousing -- It Is a Process
• A technique for assembling and managing data from various
sources for the purpose of answering business questions, thus
enabling decisions that were not previously possible
• A decision support database maintained separately from the
organization's operational database
Data Warehouse
• A data warehouse is a
• subject-oriented
• integrated
• time-varying
• non-volatile

collection of data that is used primarily in organizational decision


making.
-- Bill Inmon, Building the Data Warehouse 1996

Explorers, Farmers and Tourists
• Explorers: seek out the unknown and previously unsuspected
rewards hiding in the detailed data
• Farmers: harvest information from known access paths
• Tourists: browse information harvested by farmers
Data Mining Works with Warehouse Data
• Data warehousing provides the enterprise with a memory
• Data mining provides the enterprise with intelligence
Supervised learning vs. unsupervised learning

• Supervised learning: discover patterns in the data that relate
data attributes with a target (class) attribute.
  • These patterns are then utilized to predict the values of the target
  attribute in future data instances.
• Unsupervised learning: the data have no target attribute.
  • We want to explore the data to find some intrinsic structures in
  them.
Supervised learning:
• The computer is presented with example inputs and their desired outputs,
given by a “teacher”, and the goal is to learn a general rule that maps inputs
to outputs.
• The training process continues until the model achieves the desired level of
accuracy on the training data.
• Some real-life examples are:
• Image Classification: You train with images/labels. Then in the future you give a new
image expecting that the computer will recognize the new object.
• Market Prediction/Regression: You train the computer with historical market data
and ask the computer to predict the new price in the future.
Supervised Machine Learning
• Supervised learning is where you have input variables (x) and an
output variable (Y) and you use an algorithm to learn the mapping
function from the input to the output.
Y = f(X)
• The goal is to approximate the mapping function so well that when
you have new input data (x) that you can predict the output variables
(Y) for that data.
• It is called supervised learning because the process of an
algorithm learning from the training dataset can be thought of
as a teacher supervising the learning process.
• We know the correct answers, the algorithm iteratively makes
predictions on the training data and is corrected by the
teacher.
• Learning stops when the algorithm achieves an acceptable
level of performance.
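The mapping Y = f(X) can be made concrete with a tiny one-variable least-squares fit, where the "teacher" is the set of labeled training pairs. The data points below are made up for illustration:

```python
# Learning Y = f(X) from labeled examples: a one-variable least-squares fit.
# The training pairs are made-up illustration data (here, exactly y = 2x).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # the "teacher's" desired outputs

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

def f(x):
    """The learned mapping, used to predict Y for new inputs."""
    return slope * x + intercept

print(f(5.0))  # 10.0
```

The training loop described above (predict, get corrected, repeat) is collapsed here into a closed-form fit; iterative algorithms such as gradient descent follow the predict-and-correct cycle literally.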
We have four types of fruits: apple, banana, grape and cherry.

NO.  SIZE   COLOR  SHAPE                                        FRUIT NAME
1    Big    Red    Rounded shape with a depression at the top   Apple
2    Small  Red    Heart-shaped to nearly globular              Cherry
3    Big    Green  Long curving cylinder                        Banana
4    Small  Green  Round to oval, bunch shape, cylindrical      Grape

• Suppose you take a new fruit from the basket; you observe the
size, color and shape of that particular fruit.
• If the size is big, the color is red, and the shape is rounded with a
depression at the top, you confirm the fruit name as apple and put
it in the apple group.
• Likewise for the other fruits.
• The job of grouping fruits is done.
• Observe in the table that one column is labeled "FRUIT NAME";
this is called the response variable.
• If you learn from the training data and then apply that knowledge
to the test data (the new fruit), this type of learning is called
supervised learning.
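The fruit-grouping example can be written as a few lines of code: learn a lookup from the training table, then classify a new fruit. A sketch only; the shape attribute is omitted for brevity, so (size, color) alone distinguishes the four fruits here:

```python
# The fruit-grouping example as code: learn (size, color) -> fruit name
# from the training table, then classify a new fruit.
# Shapes are omitted for brevity (size and color suffice for these four).
training = [
    ("Big", "Red", "Apple"),
    ("Small", "Red", "Cherry"),
    ("Big", "Green", "Banana"),
    ("Small", "Green", "Grape"),
]

# "Learning": memorize the attribute-to-label mapping from the training data.
model = {(size, color): fruit for size, color, fruit in training}

new_fruit = ("Big", "Red")  # a new fruit taken from the basket
print(model[new_fruit])     # Apple
```

A pure lookup only works when every test case matches a training case exactly; real classifiers (decision trees, k-nearest neighbors, etc.) generalize to unseen attribute combinations.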
• Supervised learning problems can be further grouped into
regression and classification problems.
• Classification: A classification problem is when the output variable
is a category, such as “red” or “blue” or “disease” and “no disease”.
• Regression: A regression problem is when the output variable is a
real value, such as “dollars” or “weight”.
Unsupervised learning
• No labels are given to the learning algorithm, leaving it on its own to
find structure in its input.
• It is used for clustering a population into different groups.
• Unsupervised learning can be a goal in itself (discovering hidden
patterns in data).
• Clustering: You ask the computer to separate similar data into clusters, this
is essential in research and science.
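The clustering idea can be sketched with a minimal k-means loop (k = 2, one-dimensional points). The data and starting centroids below are arbitrary illustrations, not a prescribed initialization scheme:

```python
# A minimal k-means sketch: k = 2, one-dimensional points.
# The points and starting centroids are arbitrary illustration values.
points = [1.0, 1.5, 2.0, 10.0, 11.0, 12.0]
centroids = [1.0, 10.0]

for _ in range(10):  # a fixed number of refinement passes
    clusters = [[], []]
    for p in points:  # assign each point to its nearest centroid
        nearest = min(range(2), key=lambda i: abs(p - centroids[i]))
        clusters[nearest].append(p)
    # recompute each centroid as the mean of its assigned points
    centroids = [sum(c) / len(c) for c in clusters]

print(centroids)  # [1.5, 11.0]
```

No labels appear anywhere in the loop; the two groups emerge from the data alone, which is exactly what makes this unsupervised.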
Steps in Data Mining
• Develop an understanding of the purpose of the data mining project
• Obtain the dataset to be used in the analysis.
• Explore, clean, and preprocess the data.
• Reduce the data, if necessary, and (where supervised training is involved)
separate them into training, validation, and test datasets.
• Determine the data mining task (classification, prediction, clustering, etc.).
• Choose the data mining techniques to be used (regression, neural nets,
hierarchical clustering, etc.).
• Use algorithms to perform the task
• Interpret the results of the algorithms
• Deploy the model
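The partitioning step above can be sketched in a few lines: shuffle the records, then cut them into training, validation, and test sets. The 60/20/20 split and the stand-in dataset are assumptions for illustration, not a prescribed ratio:

```python
# Sketch of the data-partitioning step: shuffle, then split into
# training, validation, and test sets (60/20/20 is an assumed ratio).
import random

records = list(range(100))  # stand-in for a cleaned, preprocessed dataset
random.seed(42)             # fixed seed so the shuffle is reproducible
random.shuffle(records)

n = len(records)
train = records[: int(0.6 * n)]
valid = records[int(0.6 * n): int(0.8 * n)]
test  = records[int(0.8 * n):]

print(len(train), len(valid), len(test))  # 60 20 20
```

The model is fit on the training set, tuned against the validation set, and only evaluated once on the held-out test set, so the reported performance reflects unseen data.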
