Module 1 Ppt1
Analysis
MCA2001
Objectives:
[Figure: pyramid labeled "Increasing potential to support business decisions". Surviving labels, from the top: End User, Decision Making; Data Exploration (Statistical Summary, Querying, and Reporting).]
Relational Databases
• DBMS – database management system, contains a collection of
interrelated databases
e.g. Faculty database, student database, publications database
• Each database contains a collection of tables and functions to
manage and access the data.
e.g. student_bio, student_graduation, student_parking
• Each table contains columns and rows, with columns as attributes of data and
rows as records.
• Tables can be used to represent the relationships between or among multiple
tables.
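As a minimal sketch of the ideas above, the following uses Python's built-in sqlite3 module. The table names student_bio and student_parking come from the slide; every column name and row is an assumption made up for illustration.

```python
import sqlite3

# In-memory database standing in for the "student database" on the slide.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Table names are from the slide; all columns and rows are hypothetical.
cur.execute("CREATE TABLE student_bio (student_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute(
    "CREATE TABLE student_parking ("
    "permit_id INTEGER PRIMARY KEY, "
    "student_id INTEGER REFERENCES student_bio(student_id), "
    "lot TEXT)"
)

# Each tuple is one row (record); each position is one column (attribute).
cur.executemany("INSERT INTO student_bio VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
cur.executemany(
    "INSERT INTO student_parking VALUES (?, ?, ?)",
    [(10, 1, "Lot A"), (11, 2, "Lot C")],
)

# The shared student_id column is what relates the two tables.
rows = cur.execute(
    "SELECT b.name, p.lot FROM student_bio AS b "
    "JOIN student_parking AS p ON p.student_id = b.student_id "
    "ORDER BY b.student_id"
).fetchall()
print(rows)
conn.close()
```

Here student_parking acts as the table that records the relationship between a student and a parking lot.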
Data Warehouses
• A repository of information collected from multiple sources, stored under a
unified schema, and usually residing at a single site.
• Constructed via a process of data cleaning, data integration, data
transformation, data loading and periodic data refreshing.
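The construction steps named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source systems, their field names, and the in-memory list standing in for the warehouse are all hypothetical.

```python
# Two hypothetical source systems with different field names and formats.
store_a = [{"cust": " Alice ", "amount": "120.50"}, {"cust": "Bob", "amount": ""}]
store_b = [{"customer_name": "carol", "total_cents": 9900}]

def clean(records):
    """Data cleaning: drop records whose amount is missing."""
    return [r for r in records if r["amount"] != ""]

def transform_a(r):
    """Data transformation: map source A onto the unified schema."""
    return {"customer": r["cust"].strip().title(), "amount": float(r["amount"])}

def transform_b(r):
    """Data transformation: map source B (cents) onto the unified schema."""
    return {"customer": r["customer_name"].strip().title(), "amount": r["total_cents"] / 100}

warehouse = []  # loading target; a plain list stands in for the warehouse

def refresh():
    """Data integration and loading; re-run periodically to refresh the data."""
    warehouse.clear()
    warehouse.extend(transform_a(r) for r in clean(store_a))
    warehouse.extend(transform_b(r) for r in store_b)

refresh()
print(warehouse)
```

After refresh() runs, records from both sources share one schema (customer, amount), which is the "unified schema" the definition refers to.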
Data Warehouses (2)
• Data are organized around major subjects, e.g. customer, item, supplier and activity.
• Provide information from a historical perspective (e.g. from the past 5 – 10 years)
• Typically summarized to a higher level (e.g. a summary of the
transactions per item type for each store)
• Users can perform drill-down or roll-up operations to view the data at different degrees of
summarization
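Roll-up and drill-down can be pictured as the same aggregation run at coarser or finer grouping levels. The sketch below is plain Python, not a real OLAP engine; the transactions and the summarize helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical transactions: (store, item_type, item, amount)
transactions = [
    ("Store 1", "Dairy", "Milk", 30),
    ("Store 1", "Dairy", "Cheese", 50),
    ("Store 1", "Bakery", "Bread", 20),
    ("Store 2", "Dairy", "Milk", 40),
]

def summarize(rows, key_indexes):
    """Total the amount column, grouped by the chosen columns."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in key_indexes)
        totals[key] += row[3]
    return dict(totals)

# The slide's example level: transactions per item type for each store.
by_store_and_type = summarize(transactions, (0, 1))
# Roll-up: fewer grouping columns, a higher level of summarization.
by_store = summarize(transactions, (0,))
# Drill-down: more grouping columns, a finer level of detail.
by_item = summarize(transactions, (0, 1, 2))

print(by_store_and_type)
print(by_store)
print(by_item)
```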
[Figure: Data Mining at the center, surrounded by Database Technology, Statistics, Information Science, Machine Learning, Visualization, and Other Disciplines.]
[Barry Devlin]
What are the users saying...
• Data should be integrated across the
enterprise
• Summary data has a real value to the
organization
• Historical data holds the key to
understanding data over time
• What-if capabilities are required
What is Data Warehousing?
A process of transforming data into information and making it available to
users in a timely enough manner to make a difference
Evolution
• 60’s: Batch reports
• hard to find and analyze information
• inflexible and expensive, reprogram every new request
• 70’s: Terminal-based DSS and EIS (executive information systems)
• still inflexible, not integrated with desktop tools
• 80’s: Desktop data access and analysis tools
• query tools, spreadsheets, GUIs
• easier to use, but only access operational databases
• 90’s: Data warehousing with integrated OLAP engines and tools
Very Large Data Bases
• Terabytes -- 10^12 bytes: Walmart -- 24 Terabytes
Data Warehousing -- It is a process
• Technique for assembling and managing data from various sources for the
purpose of answering business questions, thus enabling decisions that were
not previously possible
• A decision support database maintained separately from the organization’s
operational database
Data Warehouse
• A data warehouse is a collection of data that is
• subject-oriented
• integrated
• time-varying
• non-volatile
Explorers, Farmers and Tourists
Tourists: Browse information harvested by
farmers
Data Mining works with Warehouse Data
Supervised learning:
• The computer is presented with example inputs and their desired outputs,
given by a “teacher”, and the goal is to learn a general rule that maps inputs
to outputs.
• The training process continues until the model achieves the desired level of
accuracy on the training data.
• Some real-life examples are:
• Image Classification: You train with images/labels. Then in the future you give a new
image expecting that the computer will recognize the new object.
• Market Prediction/Regression: You train the computer with historical market data
and ask the computer to predict the new price in the future.
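The classification case can be sketched with a 1-nearest-neighbor rule, which is one simple technique chosen here for illustration and not something the slides prescribe. The two-number feature vectors stand in for images, and the labels "small" and "large" are hypothetical.

```python
# Labeled training examples supplied by the "teacher": (features, label).
# The features are hypothetical (width_cm, height_cm) measurements.
training = [
    ((1.0, 1.0), "small"),
    ((1.2, 0.9), "small"),
    ((5.0, 5.5), "large"),
    ((6.0, 5.0), "large"),
]

def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict(features):
    """Return the label of the closest training example."""
    nearest = min(training, key=lambda ex: squared_distance(ex[0], features))
    return nearest[1]

# New, unlabeled inputs the model has not seen before.
print(predict((1.1, 1.1)))
print(predict((5.5, 5.2)))
```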
Supervised Machine Learning
• Supervised learning is where you have input variables (X) and an
output variable (Y) and you use an algorithm to learn the mapping
function from the input to the output.
Y = f(X)
• The goal is to approximate the mapping function so well that when
you have new input data (X), you can predict the output variable
(Y) for that data.
• It is called supervised learning because the process of an
algorithm learning from the training dataset can be thought of
as a teacher supervising the learning process.
• We know the correct answers; the algorithm iteratively makes
predictions on the training data and is corrected by the
teacher.
• Learning stops when the algorithm achieves an acceptable
level of performance.
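The loop described above (predict, get corrected, stop at an acceptable level of performance) can be sketched with gradient descent on a straight-line model. Everything specific here is an assumption for illustration: the linear form f(x) = w*x + b, the training data generated from y = 2x + 1, the learning rate, and the error threshold.

```python
# Training data with known correct answers, generated from the
# hypothetical rule y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = 0.0, 0.0          # parameters of the candidate mapping f(x) = w*x + b
learning_rate = 0.05     # assumed step size
acceptable_mse = 1e-6    # assumed "acceptable level of performance"
max_epochs = 100_000     # safety limit so the loop always terminates

for epoch in range(max_epochs):
    # The algorithm makes predictions on the training data.
    predictions = [w * x + b for x in xs]
    # The "teacher" supplies the correction: prediction minus correct answer.
    errors = [p - y for p, y in zip(predictions, ys)]
    mse = sum(e * e for e in errors) / len(xs)
    # Learning stops once performance is acceptable.
    if mse < acceptable_mse:
        break
    # Gradient of the mean squared error with respect to w and b.
    grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / len(xs)
    grad_b = 2 * sum(errors) / len(xs)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

def f(x):
    """The learned approximation of the mapping function."""
    return w * x + b

print(round(w, 2), round(b, 2), round(f(10.0), 2))
```

The learned f can then be applied to an input that was not in the training data, such as x = 10.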
We have four types of fruits. They are: apple, banana, grape and
cherry.
NO.  SIZE  COLOR  SHAPE  FRUIT NAME