UFE Lecture-1 Overview Data

The document is the syllabus for an introductory lecture on artificial intelligence. It provides information about the course, which will cover basic machine learning theories and algorithms through lectures and exercises. It will also introduce Python programming through seminars and apply machine learning techniques to real-world cases. The lecture will define artificial intelligence and machine learning, describe their fields of study and algorithms, and provide a reference for further reading.

Uploaded by

b20fa1751


Introduction to Artificial Intelligence

UFE – AIF321
Spring semester
Lecture 1 – Overview

2024.01.25

01/27/2021 Introduction to Data Mining, 2nd Edition
Tan, Steinbach, Karpatne, Kumar
Information
About course:

○ Lecture & Seminar: Thursday 17:00 ~ 19:40

○ Lecture:
■ Basic theories of Machine Learning (ML) Algorithms
■ Exercises
○ Seminar:
■ Introduction to Python
■ Application of ML algorithms to real-life cases

Section 1: Introduction

What is Artificial Intelligence?

Fields

Skill sets and tools

Open Data Science Conference 2023

Classification of AI

Classification of ML

ML Algorithms

Reference
Stanford online course:
https://online.stanford.edu/courses/stats216-introduction-statistical-learning

Section 2: Data

Outline

● Attributes and Objects

● Types of Data

● Data Quality

● Data Preprocessing

What is Data?

● Collection of data objects and their attributes
● An attribute is a property or characteristic of an object
– Examples: eye color of a person, temperature, etc.
– Attribute is also known as variable, field, characteristic, dimension, or feature
● A collection of attributes describes an object
– Object is also known as record, point, case, sample, entity, or instance
[Figure labels: Attributes, Objects]
Attribute Values

● Attribute values are numbers or symbols assigned to an attribute for a particular object
● Distinction between attributes and attribute values
– Same attribute can be mapped to different attribute values
◆ Example: height can be measured in feet or meters
– Different attributes can be mapped to the same set of values
◆ Example: Attribute values for ID and age are integers
– But properties of an attribute can be different from the properties of the values used to represent the attribute
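A minimal Python sketch of the distinction above. The records, field names, and numbers are made up for illustration; the point is only that ID and age share the same value type (integers) while supporting different operations, and that one attribute (height) can map to different values depending on the unit.

```python
from statistics import mean

# Hypothetical records: both attributes are stored as integers.
people = [
    {"id": 1001, "age": 23},
    {"id": 1002, "age": 35},
    {"id": 1003, "age": 41},
]

# Age supports arithmetic, so its mean is meaningful.
mean_age = mean(p["age"] for p in people)

# ID is only a label: equality tests are meaningful, but a mean of
# IDs is not, even though Python would compute one without complaint.
ids_are_distinct = len({p["id"] for p in people}) == len(people)

# The same attribute (height) mapped to different attribute values.
height_m = 1.80
height_ft = height_m / 0.3048  # one foot is defined as 0.3048 m

print(mean_age, ids_are_distinct, round(height_ft, 2))
```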
Measurement of Length
● The way you measure an attribute may not match the attribute's properties.
[Figure: two measurement scales for length]
– One scale preserves only the ordering property of length.
– The other scale preserves the ordering and additivity properties of length.
Types of Attributes

● There are different types of attributes

– Nominal
◆ Examples: ID numbers, eye color, zip codes
– Ordinal
◆ Examples: rankings (e.g., taste of potato chips on a scale from 1-10), grades, height {tall, medium, short}
– Interval
◆ Examples: calendar dates, temperatures in Celsius or Fahrenheit
– Ratio
◆ Examples: temperature in Kelvin, length, counts, elapsed time (e.g., time to run a race)
Properties of Attribute Values

● The type of an attribute depends on which of the following properties/operations it possesses:
– Distinctness: = ≠
– Order: < >
– Differences are meaningful: + -
– Ratios are meaningful: * /
– Nominal attribute: distinctness
– Ordinal attribute: distinctness & order
– Interval attribute: distinctness, order & meaningful differences
– Ratio attribute: all 4 properties/operations
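The cumulative structure of this list can be sketched in a few lines of Python. The names `PROPERTIES`, `LEVELS`, `properties_of`, and `supports` are hypothetical helpers introduced only for this illustration.

```python
# Properties in the order the slide lists them. Each attribute type
# possesses its own property plus all the properties before it.
PROPERTIES = ["distinctness", "order", "differences", "ratios"]
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

def properties_of(attribute_type):
    """Return the properties that an attribute type possesses."""
    level = LEVELS.index(attribute_type)
    return PROPERTIES[: level + 1]

def supports(attribute_type, prop):
    """True if the property is meaningful for this attribute type."""
    return prop in properties_of(attribute_type)

print(properties_of("interval"))
print(supports("ordinal", "differences"))
```

For example, an ordinal attribute such as a grade supports comparison with < and >, but subtracting two grades is not meaningful.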

Difference Between Ratio and Interval

● Is it physically meaningful to say that a temperature of 10° is twice that of 5° on
– the Celsius scale?
– the Fahrenheit scale?
– the Kelvin scale?

● Consider measuring the height above average
– If Bill’s height is three inches above average and Bob’s height is six inches above average, then would we say that Bob is twice as tall as Bill?
– Is this situation analogous to that of temperature?
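One way to explore these questions is to convert each reading to Kelvin, whose zero is absolute, and compare the resulting ratios. The sketch below uses the standard conversion formulas; the average height of 70 inches is an assumption made only for the example.

```python
def celsius_to_kelvin(c):
    return c + 273.15

def fahrenheit_to_kelvin(f):
    return (f - 32.0) * 5.0 / 9.0 + 273.15

# Ratio of the underlying physical temperatures for "10 degrees vs 5 degrees".
ratio_celsius = celsius_to_kelvin(10) / celsius_to_kelvin(5)
ratio_fahrenheit = fahrenheit_to_kelvin(10) / fahrenheit_to_kelvin(5)
ratio_kelvin = 10 / 5

# Height above average: the zero point is the average, not "no height".
average_height = 70.0  # inches, assumed for illustration
ratio_height = (average_height + 6) / (average_height + 3)

print(ratio_celsius, ratio_fahrenheit, ratio_kelvin, ratio_height)
```

Only on the Kelvin scale is the ratio exactly 2; on the Celsius and Fahrenheit scales the physical ratio is roughly 1.02 and 1.01. The height-above-average case behaves the same way, which suggests it is analogous to an interval scale.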

This categorization of attributes is due to S. S. Stevens
Discrete and Continuous Attributes

● Discrete Attribute
– Has only a finite or countably infinite set of values
– Examples: zip codes, counts, or the set of words in a
collection of documents
– Often represented as integer variables.
– Note: binary attributes are a special case of discrete
attributes
● Continuous Attribute
– Has real numbers as attribute values
– Examples: temperature, height, or weight.
– Practically, real values can only be measured and
represented using a finite number of digits.
– Continuous attributes are typically represented as
floating-point variables.
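A short Python illustration of the practical difference, using only the standard library. Counts are discrete and compare exactly, while floating-point values carry a finite number of digits, so exact equality tests on continuous attributes can fail.

```python
import math

# Discrete attribute: a count is an integer and compares exactly.
word_count = 3 + 4

# Continuous attribute: values are stored as binary floating-point
# numbers with finite precision, so exact equality can fail.
total = 0.1 + 0.2
exactly_equal = (total == 0.3)
close_enough = math.isclose(total, 0.3)

print(word_count == 7, exactly_equal, close_enough)
```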
Critiques of the attribute categorization

● Real data is approximate and noisy

– This can complicate recognition of the proper attribute type
– Treating one attribute type as another may be approximately
correct

Key Messages for Attribute Types

● The types of operations you choose should be “meaningful” for the type of data you have
– Distinctness, order, meaningful intervals, and meaningful ratios are only four (among many possible) properties of data
– The data type you see – often numbers or strings – may not capture all the properties or may suggest properties that are not present
– Analysis may depend on these other properties of the data
◆ Many statistical analyses depend only on the distribution
– In the end, what is meaningful can be specific to domain

Important Characteristics of Data

– Dimensionality (number of attributes)

◆ High dimensional data brings a number of challenges

– Sparsity
◆ Only presence counts

– Resolution
◆ Patterns depend on the scale

– Size
◆ Type of analysis may depend on size of data

Types of data sets
● Record
– Data Matrix
– Document Data
– Transaction Data
● Graph
– World Wide Web
– Molecular Structures
● Ordered
– Spatial Data
– Temporal Data
– Sequential Data
– Genetic Sequence Data

Record Data

● Data that consists of a collection of records, each of which consists of a fixed set of attributes

Data Matrix

● If data objects have the same fixed set of numeric attributes, then the data objects can be thought of as points in a multi-dimensional space, where each dimension represents a distinct attribute
● Such a data set can be represented by an m by n matrix, where there are m rows, one for each object, and n columns, one for each attribute
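A minimal sketch of an m by n data matrix in plain Python. The values and the two attributes (height in cm, weight in kg) are hypothetical; in practice a library such as NumPy would usually hold the matrix, but nested lists are enough to show the row/column roles.

```python
# Hypothetical data set: m = 3 objects, n = 2 numeric attributes
# (say, height in cm and weight in kg).
data = [
    [170.0, 65.0],
    [182.0, 80.0],
    [158.0, 52.0],
]

m = len(data)      # number of rows (objects)
n = len(data[0])   # number of columns (attributes)

# Column j holds attribute j for every object.
columns = [[row[j] for row in data] for j in range(n)]
column_means = [sum(col) / m for col in columns]

print(m, n, column_means)
```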

Document Data

● Each document becomes a ‘term’ vector

– Each term is a component (attribute) of the vector
– The value of each component is the number of times the corresponding term occurs in the document.
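The term-vector idea can be sketched with the standard library's `collections.Counter`. The two toy documents are invented for the example, and splitting on whitespace is a simplifying assumption (real text would need proper tokenization).

```python
from collections import Counter

# Hypothetical toy documents.
docs = [
    "the team won the game",
    "the coach lost the ball",
]

# Vocabulary: one vector component per distinct term.
vocabulary = sorted({term for doc in docs for term in doc.split()})

def term_vector(doc):
    """Count how often each vocabulary term occurs in the document."""
    counts = Counter(doc.split())
    return [counts[term] for term in vocabulary]

vectors = [term_vector(doc) for doc in docs]
print(vocabulary)
print(vectors)
```

Even in this tiny example several components are zero; with a realistic vocabulary most are, which is why document data is typically sparse.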

Transaction Data

● A special type of data, where
– Each transaction involves a set of items.
– For example, consider a grocery store. The set of products purchased by a customer during one shopping trip constitutes a transaction, while the individual products that were purchased are the items.
– Can represent transaction data as record data
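One possible way to represent transactions as record data is a binary table with one column per item. The baskets below are hypothetical, and the 0/1 encoding is one common choice rather than the only one.

```python
# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers"},
]

# All distinct items, in a fixed order, become the attributes.
items = sorted(set().union(*transactions))

# One record per transaction, one binary attribute per item.
records = [[1 if item in t else 0 for item in items] for t in transactions]

print(items)
for record in records:
    print(record)
```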

Graph Data

● Examples: Generic graph, a molecule, and webpages

Benzene Molecule: C6H6

Ordered Data

● Sequences of transactions
[Figure labels: Items/Events; An element of the sequence]
Ordered Data

● Genomic sequence data

Ordered Data

● Spatio-Temporal Data

[Figure: Average Monthly Temperature of land and ocean]

Data Quality

● Poor data quality negatively affects many data processing efforts
● Data mining example: a classification model for detecting people who are loan risks is built using poor data
– Some credit-worthy candidates are denied loans
– More loans are given to individuals that default

Data Quality …

● What kinds of data quality problems?

● How can we detect problems with the data?
● What can we do about these problems?

● Examples of data quality problems:

– Noise and outliers
– Wrong data
– Fake data
– Missing values
– Duplicate data
Noise

● For objects, noise is an extraneous object

● For attributes, noise refers to modification of original values
– Examples: distortion of a person’s voice when talking on a poor phone and “snow” on a television screen
– The slide's figures show two sine waves of the same magnitude and different frequencies, the waves combined, and the two sine waves with random noise
◆ The magnitude and shape of the original signal are distorted
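The sine-wave illustration can be approximated in Python with the standard library. The frequencies (1 Hz and 3 Hz), the sampling step, and the Gaussian noise with standard deviation 0.5 are all assumptions chosen for this sketch; the slide does not specify them.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable

def clean_signal(t):
    # Two sine waves of the same magnitude and different frequencies
    # (1 Hz and 3 Hz are arbitrary choices), added together.
    return math.sin(2 * math.pi * 1 * t) + math.sin(2 * math.pi * 3 * t)

times = [i / 100 for i in range(100)]
clean = [clean_signal(t) for t in times]

# Additive Gaussian noise; the standard deviation 0.5 is an assumption.
noisy = [x + random.gauss(0, 0.5) for x in clean]

# Largest pointwise change that the noise made to the signal.
max_distortion = max(abs(a - b) for a, b in zip(clean, noisy))
print(len(noisy), max_distortion > 0)
```

Plotting `clean` and `noisy` against `times` (for instance with a plotting library) shows how the noise distorts the shape of the combined wave.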

Outliers

● Outliers are data objects with characteristics that are considerably different than most of the other data objects in the data set
– Case 1: Outliers are noise that interferes with data analysis
– Case 2: Outliers are the goal of our analysis
◆ Credit card fraud
◆ Intrusion detection

● Causes?
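As a rough sketch of how outliers might be flagged, the code below marks values that lie far from the median, measured in units of the median absolute deviation (MAD). This rule is not from the slides; it is one simple technique among many, and the amounts and the threshold of 3 are assumptions for illustration.

```python
from statistics import median

# Hypothetical card transaction amounts; one is very unusual.
amounts = [12, 15, 14, 13, 16, 15, 14, 500]

center = median(amounts)
# Median absolute deviation: a spread estimate that the outlier
# itself barely influences.
mad = median(abs(x - center) for x in amounts)

THRESHOLD = 3  # assumed cut-off, in units of MAD

outliers = [
    x for x in amounts
    if mad > 0 and abs(x - center) / mad > THRESHOLD
]
print(center, mad, outliers)
```

Whether the flagged value is noise to discard (Case 1) or the fraud we are looking for (Case 2) depends on the goal of the analysis, not on the detection rule.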
Missing Values

● Reasons for missing values

– Information is not collected
(e.g., people decline to give their age and weight)
– Attributes may not be applicable to all cases
(e.g., annual income is not applicable to children)

● Handling missing values

– Eliminate data objects or variables
– Estimate missing values
◆ Example: time series of temperature
◆ Example: census results
– Ignore the missing value during analysis
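The three handling strategies can be sketched on a small temperature time series. The readings are hypothetical, `None` stands in for a missing value, and averaging the two neighbours is just one simple way to estimate a gap.

```python
from statistics import mean

# Hypothetical hourly temperatures; None marks a missing value.
temps = [20.0, 21.0, None, 23.0, 24.0]

# Option 1: eliminate the data objects that have missing values.
dropped = [t for t in temps if t is not None]

# Option 2: estimate the missing value, here by averaging the two
# neighbours (assumes isolated gaps that are not at either end).
estimated = list(temps)
for i, t in enumerate(temps):
    if t is None:
        estimated[i] = (temps[i - 1] + temps[i + 1]) / 2

# Option 3: ignore missing values during the analysis itself.
mean_ignoring_missing = mean(t for t in temps if t is not None)

print(dropped, estimated, mean_ignoring_missing)
```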

Duplicate Data

● Data set may include data objects that are duplicates, or almost duplicates of one another
– Major issue when merging data from heterogeneous sources

● Examples:
– Same person with multiple email addresses

● Data cleaning
– Process of dealing with duplicate data issues

● When should duplicate data not be removed?
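A hedged sketch of finding candidate duplicates for the "same person with multiple email addresses" case. The records are invented, and matching on a normalized name is an assumption made for the example; it only produces candidates, because two different people can share a name, which is one situation where apparent duplicates should not be removed.

```python
from collections import defaultdict

# Hypothetical customer records merged from two sources.
records = [
    {"name": "Ana Bold", "email": "ana@example.com"},
    {"name": "ana  bold", "email": "a.bold@example.org"},
    {"name": "Bat Erdene", "email": "bat@example.com"},
]

def normalize(name):
    # Lower-case and collapse repeated whitespace.
    return " ".join(name.lower().split())

# Group email addresses under the normalized name.
groups = defaultdict(list)
for record in records:
    groups[normalize(record["name"])].append(record["email"])

# Candidate duplicates: names that appear with more than one email.
candidates = {name: emails for name, emails in groups.items() if len(emails) > 1}
print(candidates)
```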

Selecting the Right Proximity Measure

● Choice of the right proximity measure depends on the domain
● What is the correct choice of proximity measure for the following situations?
– Comparing documents using the frequencies of words
◆ Documents are considered similar if the word frequencies are similar
– Comparing the temperature in Celsius of two locations
◆ Two locations are considered similar if the temperatures are similar in magnitude
– Comparing two time series of temperature measured in Celsius
◆ Two time series are considered similar if their “shape” is similar, i.e., they vary in the same way over time, achieving minimums and maximums at similar times, etc.
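The slide leaves these questions open. One commonly suggested set of answers, offered here as an assumption rather than as the lecture's own, is cosine similarity for word-frequency vectors, a magnitude-based distance for single temperatures, and correlation for the shape of time series. A minimal sketch with made-up data:

```python
import math

def cosine_similarity(x, y):
    """Depends on the direction of the vectors, not their length."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def absolute_difference(a, b):
    """Distance between two single readings; smaller means closer."""
    return abs(a - b)

def pearson_correlation(x, y):
    """Compares shape: unaffected by a constant offset or scaling."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: doc_b repeats doc_a's word mix twice over,
# and series_b is series_a shifted up by 10 degrees.
doc_a, doc_b = [3, 0, 1], [6, 0, 2]
series_a = [10.0, 12.0, 15.0, 11.0]
series_b = [20.0, 22.0, 25.0, 21.0]

print(cosine_similarity(doc_a, doc_b))
print(absolute_difference(21.5, 23.0))
print(pearson_correlation(series_a, series_b))
```

The two series differ by 10 degrees at every point, so a magnitude-based measure calls them dissimilar while correlation treats them as having the same shape.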
