1 - Data Mining and Analysis

The document provides an overview of data science and artificial intelligence, detailing key concepts such as machine learning, deep learning, and data mining. It discusses the roles of various experts in the field and highlights the differences between AI and augmented intelligence, as well as the importance of reinforcement learning. Additionally, it covers data analytics, data matrices, and the probabilistic view of data, emphasizing the significance of understanding data attributes and their classifications.

Uploaded by

contactsachinjorwal


Data Science

Dr. Teena Sharma

DAI 101
Ph.D., University of Quebec at Chicoutimi, Canada
© IIT Roorkee India
Expert Talks: Foreign and India
1. Prof. Rajasen Gupta (Professor, McGill University, Montreal, Canada)
2. Prof. Abdellah Chehri (Royal Military College, Kingston, Canada)
3. Prof. Issouf Fofana (University of Quebec at Chicoutimi, Quebec, Canada)
4. Dr. Benoit Duglas (Thales, Canada)
5. Dr. Roshan Jain (Startup in AI, Waterloo, Ontario, Canada)
6. Mr. Abhishek (Manager, software developer, Accenture, Quebec, Canada)
7. Ms. Dhavni Sharma (International Air Transport Association (IATA), Montreal, Canada)
8. Mrs. Aakansha Chawla, MBA (Business analyst, IATA, Montreal, Canada)
9. Prof. Hitesh Upreti (Professor, Shiv Nadar University, Greater Noida)

AI fundamentals

Artificial Intelligence: Simulation of human intelligence in machines, enabling them to perform tasks that typically require human thinking. An early example is the chatbot ELIZA, developed in the mid-1960s, which could mimic human-like conversation to an extent. AI is a very broad term encompassing several techniques.

AI's ability to learn and adapt has the potential to transform entire industries, create new innovations and ultimately benefit society as a whole.

Cont.

Machine Learning: A subfield of AI focusing on developing algorithms that allow computers to learn from data and make decisions based on it, rather than being explicitly programmed to perform a specific task. These algorithms use statistical techniques to learn patterns in data and make predictions or decisions without human intervention. It is again a broad term, covering both traditional statistical methods and complex neural networks. Categories: supervised learning (SL), unsupervised learning (UL) and reinforcement learning (RL).

Cont.
Deep Learning: Artificial neural networks with multiple layers of nodes and connections. Traditional ML can extract simpler patterns in data, while DL excels at handling vast amounts of unstructured data such as images or natural language.

Foundation models: A term popularized in 2021 by researchers at Stanford University. These models are large-scale neural networks pretrained on vast amounts of data, and they serve as a base for a multitude of applications, providing more generalized and scalable AI solutions. Instead of training a model from scratch for each specific task, you can take a pretrained foundation model and fine-tune it for a particular application, saving both resources and time. They can perform tasks ranging from language translation to content generation to image recognition, and can handle many types of input: image, audio and text.

Cont.
Large Language Model: A type of foundation model trained on large amounts of text data. The first L stands for large: the scale of the model, with millions or billions of parameters, and of the massive datasets it is trained on. The second L stands for language: the model is designed to understand and interact using human languages. LLMs are used in NLP tasks such as understanding context, answering questions, generating text and even translation.

Vision models: can "see" (in quotes), interpret and generate images.
Scientific models: used, for example, in biology, where there are models for predicting how proteins fold into 3D shapes.
Audio models: for generating human-sounding speech or composing the next fake Drake hit song.

Generative AI: Models and algorithms specifically crafted to generate new content. Foundation models provide the underlying structure and understanding; generative AI is about harnessing that knowledge to produce something new. It is a broad field of AI that uses algorithms to create new content such as text, images, videos, audio, code and simulations.

Example?
AI and Augmented Intelligence

AI: the ability of computers or machines to mimic the problem-solving and decision-making capabilities of the human mind. It can perform tasks and make decisions that normally require human intelligence, such as reasoning, natural communication and problem solving. In this view, AI replaces the need for humans.

Augmented intelligence: machines and humans work together, enhancing each other's efforts when completing tasks. It augments human abilities, for example through a screen reader for the blind, voice navigation, an in-car collision avoidance system or a blind spot detection system. Such systems complement our own capabilities.
So, AI or augmented intelligence?

Reinforcement learning in AI

Reinforcement learning in AI is when machines learn to make better decisions by trying things out and getting feedback. For example, it can be used to teach a robot how to navigate a room. When the robot performs an action, such as stopping, turning around or moving forward, it receives a reward or penalty based on how well it did. The robot uses this feedback to learn and improve its decision-making abilities, and over time it gets better at navigating the room.

Use cases: Robotics, gaming, autonomous vehicles and recommendation systems use reinforcement learning to improve performance. The ability to learn from mistakes and get better over time makes reinforcement learning a critical tool in AI.

Reinforcement learning

Reinforcement learning (RL) is a machine learning (ML) technique that trains software to make decisions that achieve the best possible results. It mimics the trial-and-error learning process that humans use to achieve their goals: through a feedback system, the agent learns from its environment and optimizes its behavior.

During training, the model perceives and interprets its environment, takes actions and learns through trial and error. Examples of such agents include a feature in a video game, a robot in an industrial setting and a recommendation system.
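
The reward-and-penalty loop described above can be sketched in a few lines. This is a minimal illustration, not something taken from the slides: it assumes a made-up corridor of five cells with the goal at the right end, and it uses tabular Q-learning, which is one common RL algorithm among many. The function names, the reward values and the hyperparameters (`alpha`, `gamma`, `epsilon`) are all hypothetical choices for the example.

```python
import random


def train_corridor(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                   epsilon=0.2, seed=0):
    """Tabular Q-learning on a toy 1-D corridor (hypothetical environment).

    The agent starts in cell 0; the episode ends at the rightmost cell.
    """
    rng = random.Random(seed)
    moves = (-1, +1)  # action 0 = step left, action 1 = step right
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Trial and error: explore with probability epsilon, else exploit.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            nxt = min(max(state + moves[action], 0), goal)
            # Feedback: reward for reaching the goal, small penalty otherwise.
            reward = 1.0 if nxt == goal else -0.01
            # Q-learning update toward reward + discounted best future value.
            target = reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


def greedy_policy(q):
    """Best known action for every non-goal cell."""
    return ["left" if left > right else "right" for left, right in q[:-1]]


q_table = train_corridor()
print(greedy_policy(q_table))
```

After training, the greedy policy should step right in every cell, which is the behaviour the rewards were designed to encourage.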

Reinforcement Learning

Data Analytics and Data Science

Data analytics involves examining data to extract meaningful insights, while data science encompasses a wider scope, including data collection, cleaning, analysis, and machine learning modeling for predictive insights and decision-making.

Data analytics focuses more on analyzing past or historical data (explaining the past) in order to predict or forecast future outcomes and support decision-making. E.g., Amazon product sales or temperature prediction.

Introduction to Data Mining and Analysis

Data Mining
• Data mining is the process of discovering insightful, interesting, and novel patterns, as well as deriving descriptive, understandable, and predictive models from large-scale data.

• At the heart of data mining is data itself.

• We begin this course by looking at basic properties of data modeled as a data matrix.

Data Matrix
• Data can often be represented or abstracted as an n x d data matrix, with n rows and d columns, where rows correspond to entities in the dataset, and columns represent attributes or properties of interest.

Data Matrix
• Rows: Also called instances, examples, records, transactions, objects, points, feature-vectors, etc. Each row is given as a d-tuple.

• Columns: Also called attributes, properties, features, dimensions, variables, fields, etc. Each column is given as an n-tuple.
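
To make the row and column views concrete, here is a small sketch using plain Python lists. The three rows are illustrative flower measurements in the style of the Iris dataset, and the attribute names and the helper functions `row` and `column` are hypothetical names chosen for this example.

```python
# Toy n x d data matrix: n = 3 entities (rows), d = 4 attributes (columns).
# Values are illustrative measurements in centimeters.
attributes = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
D = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
]
n, d = len(D), len(D[0])


def row(D, i):
    """Row i as a d-tuple: one entity described by all d attributes."""
    return tuple(D[i])


def column(D, j):
    """Column j as an n-tuple: one attribute observed on all n entities."""
    return tuple(r[j] for r in D)


print(n, d)                       # shape of the data matrix
print(row(D, 0))                  # first entity
print(attributes[0], column(D, 0))  # first attribute across all entities
```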

Attribute Classification
Discrete Attribute
Has a finite or countably infinite set of values
Examples: Zip codes, click counts, set of words in a collection of documents (often represented as integer values)
Binary attribute is a special case of discrete attribute

Continuous Attribute
Has real numbers as attribute values
Examples: temperature, height, or weight
Continuous attributes are typically represented as floating-point variables

Attributes
Attributes may be classified into two main types
• Numeric Attributes: real-valued or integer-valued domain
  • Interval-scaled: only differences are meaningful, e.g., temperature
  • Ratio-scaled: differences and ratios are meaningful, e.g., Age
• Categorical Attributes: set-valued domain composed of a set of symbols
  • Nominal: only equality is meaningful, e.g., domain(Sex) = {M, F}
  • Ordinal: both equality (are two values the same?) and inequality (is one value less than another?) are meaningful, e.g., domain(Education) = {High School, BS, MS, PhD}
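
The classification above is about which operations are meaningful on an attribute. The sketch below illustrates this under a few assumptions: the helper names are hypothetical, the Education domain is taken to be ordered as listed, and temperature is assumed to be measured in degrees Celsius (converted to kelvin only to show what happens to differences and ratios).

```python
# Ordinal attribute: the position of a symbol in the domain carries meaning.
EDUCATION = ["High School", "BS", "MS", "PhD"]  # assumed lowest to highest


def ordinal_less(a, b, domain=EDUCATION):
    """True if value a ranks strictly below value b in the ordered domain."""
    return domain.index(a) < domain.index(b)


def nominal_equal(a, b):
    """Nominal attribute: equality is the only meaningful comparison."""
    return a == b


def celsius_to_kelvin(c):
    return c + 273.15


# Interval-scaled attribute: differences survive a change of zero point,
# ratios do not.
t1_c, t2_c = 10.0, 20.0
diff_c = t2_c - t1_c
diff_k = celsius_to_kelvin(t2_c) - celsius_to_kelvin(t1_c)
ratio_c = t2_c / t1_c
ratio_k = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)

print(ordinal_less("BS", "PhD"), nominal_equal("M", "F"))
print(diff_c, diff_k)    # same difference on both scales (up to rounding)
print(ratio_c, ratio_k)  # the ratio changes with the scale
```

The ratio of 2 on the Celsius scale does not mean "twice as hot", which is why only differences are treated as meaningful for interval-scaled attributes.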

Iris Dataset Extract

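Data: Algebraic and Geometric View
• For numeric data matrix D, each row is a d-dimensional data point (i.e., a vector with d attributes):

$$\mathbf{x}_i = (x_{i1}, x_{i2}, \ldots, x_{id})^T \in \mathbb{R}^d$$

whereas each column is an n-dimensional attribute vector (i.e., a vector with n data points):

$$X_j = (x_{1j}, x_{2j}, \ldots, x_{nj})^T \in \mathbb{R}^n$$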

Data: Algebraic and Geometric View

[Figure: Scatterplot of the 2D Iris dataset, sepal length versus sepal width.]

What about more than two attributes?

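Numeric Data Matrix
• If all attributes are numeric, then the data matrix D is an n x d matrix, or equivalently a set of n row vectors $\mathbf{x}_i^T \in \mathbb{R}^d$ or a set of d column vectors $X_j \in \mathbb{R}^n$.

• The mean of the data matrix D is the average of all the points:

$$\mathrm{mean}(D) = \boldsymbol{\mu} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i$$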

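Numeric Data Matrix
• The centered data matrix is obtained by subtracting the mean from all the points:

$$Z = D - \mathbf{1} \cdot \boldsymbol{\mu}^T$$

where $\mathbf{1} \in \mathbb{R}^n$ is the vector of all ones and $\boldsymbol{\mu}$ is the mean of D.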
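
The mean and the centering step can be checked on a tiny example. This is a minimal sketch in plain Python; the function names `mean_vector` and `center` and the 2 x 2 matrix are assumptions made for illustration.

```python
def mean_vector(D):
    """Average of all the points: one mean per column (attribute)."""
    n = len(D)
    return [sum(col) / n for col in zip(*D)]


def center(D):
    """Subtract the mean vector from every row (point) of D."""
    mu = mean_vector(D)
    return [[x - m for x, m in zip(point, mu)] for point in D]


D = [
    [1.0, 2.0],
    [3.0, 6.0],
]
mu = mean_vector(D)
Z = center(D)
print(mu)              # mean of the two points
print(Z)               # centered data matrix
print(mean_vector(Z))  # the centered data has zero mean
```

Centering moves the mean of the data to the origin without changing the distances between points.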

Norm, Distance and Angle

Orthogonal Projection

Data: Probabilistic View
• The probabilistic view of the data assumes that each numeric attribute X is a random variable, defined as a function that assigns a real number to each outcome of an experiment.

• Formally, X is a function X : O → R, where O, the domain of X, is the set of all possible outcomes of the experiment, also called the sample space, and R, the range of X, is the set of real numbers.

• If the outcomes are numeric, and represent the observed values of the random variable, then X : O → O is simply the identity function: X(v) = v for all v ∈ O.

Data: Probabilistic View
• The distinction between the outcomes and the value of the random variable is important, as we may want to treat the observed values differently depending on the context.

• A random variable X is called a discrete random variable if it takes on only a finite or countably infinite number of values in its range, whereas X is called a continuous random variable if it can take on any value in its range.

Example
• Consider the sepal length attribute (X1) for the Iris dataset in.
• All n = 150 values of this attribute lie in the range [4.3,7.9], with
centimeters as the unit of measurement.
• Let us assume that these constitute the set of all possible outcomes
O.

• By default, we can consider the attribute X1 to be a continuous random variable, given as the identity function X1(v) = v, because the outcomes (sepal length values) are all numeric.

Example Cont.
• On the other hand, if we want to distinguish between Iris flowers with short and long sepal lengths, with long being, say, a length of 7 cm or more, we can define a discrete random variable A as follows:

$$A(v) = \begin{cases} 0 & \text{if } v < 7 \\ 1 & \text{if } v \ge 7 \end{cases}$$

• In this case the domain of A is [4.3,7.9], and its range is {0,1}.
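
As a quick illustration, the short/long indicator can be written as a one-line function. This is a sketch only: the 7 cm threshold comes from the example above, while the function signature and the default argument are assumptions.

```python
def A(v, threshold=7.0):
    """Discrete random variable: 1 for a long sepal (v >= threshold), else 0."""
    return 1 if v >= threshold else 0


# The endpoints of the observed range [4.3, 7.9] and the threshold itself.
for v in (4.3, 6.9, 7.0, 7.9):
    print(v, A(v))
```

The same outcomes (sepal lengths) are mapped to a different range, {0, 1}, which is what makes A a different random variable from the identity function X1.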

Probability Mass Function
• If X is discrete, the probability mass function of X is defined as

$$f(x) = P(X = x) \quad \text{for all } x \in \mathbb{R}$$

• Intuitively, for a discrete variable X, the probability is concentrated or massed at only discrete values in the range of X, and is zero for all other values.
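
In practice the probability mass function is usually not known and is estimated from observed values by relative frequency. The sketch below shows that estimate; the function name `empirical_pmf` and the sample values are made up for illustration and are not taken from the Iris data.

```python
from collections import Counter


def empirical_pmf(values):
    """Estimate f(x) = P(X = x) as the relative frequency of x in the sample."""
    counts = Counter(values)
    n = len(values)
    return {x: c / n for x, c in counts.items()}


# Hypothetical observations of a 0/1 variable (e.g. short vs. long sepal).
sample = [0, 0, 0, 1]
pmf = empirical_pmf(sample)
print(pmf)
print(pmf.get(2, 0.0))    # mass is zero at values that never occur
print(sum(pmf.values()))  # the masses add up to 1
```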

Next…

Data Exploration
