Machine Learning Mid Note
Query Data: after training on a dataset, being able to recognize something shown from outside that dataset
Supervised learning: the information about which class each instance (described by its attributes) belongs to
is available, i.e. labeled data is given
Classification: there are multiple classes, and each instance belongs to one of them.
When a new instance arrives, the process of deciding which class it belongs to is the classification
problem
Regression: there is no classification here; how one variable changes with respect to another variable
is regression.
Or: estimating the unknown value of one variable from the known value of another variable
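A minimal sketch of the second definition: fit a straight line y = a + b*x by ordinary least squares, then use it to estimate y for an unseen x. The function name `fit_line` and the sample numbers are illustrative assumptions, not from the lecture.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: known x values and observed y values (exactly y = 2x + 1).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_line(xs, ys)
estimate = intercept + slope * 5.0  # estimated y at the unseen x = 5
```

The output is a number rather than a class, which is what separates regression from classification.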
Classification Algorithms:
Decision tree, Naive Bayes classifier, Neural network, SVM, KNN
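As an illustration of one algorithm from this list, here is a minimal KNN sketch: a new instance gets the majority label of its k nearest training instances. The function name `knn_predict`, the Euclidean distance choice, and the toy data are assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """train is a list of (feature_tuple, label) pairs.

    Returns the most common label among the k training points
    closest to query (Euclidean distance).
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled data: two well-separated classes.
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]

label = knn_predict(train, (1.1, 1.0), k=3)
```

The labeled `train` list is the supervised part: every training instance already carries its class.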
Multivariate: when a decision depends on many variables
23 January
Data Mining Concepts and techniques
Data mining is done by applying machine learning techniques.
Searching is not data mining; simple search and query processing are not data mining.
If it is already known what is stored where, finding it is not data mining.
30 January Part-1
Data mining functionalities
- multidimensional concept description
- frequent pattern, association, correlation vs causality
- classification & prediction
- cluster analysis
- outlier analysis
- trend & evolution analysis
- pattern directed or statistical analysis
06 February
Chapter 2 - Getting to know your data
term frequency vector - how many times each word occurs in a document
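A small sketch of building a term frequency vector, assuming a fixed vocabulary and simple whitespace tokenization (both are simplifications chosen for the example; `term_frequency_vector` is a hypothetical helper name).

```python
from collections import Counter


def term_frequency_vector(document, vocabulary):
    """Count how often each vocabulary term occurs in the document."""
    counts = Counter(document.lower().split())
    # A Counter returns 0 for terms that never appear.
    return [counts[term] for term in vocabulary]


vocabulary = ["data", "mining", "learning"]
doc = "Data mining uses machine learning and data mining finds patterns in data"

vector = term_frequency_vector(doc, vocabulary)
```

Each document becomes one row of counts, so a collection of documents forms a (usually sparse) object-by-attribute table.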
Types of data sets
- records
- graph and network
- ordered
- spatial, image and multimedia
important characteristics of structured data
- dimensionality
- sparsity - some attributes might be missing
- resolution
- distribution
data objects are also called samples, examples, instances, data points, objects, tuples
rows -> data objects, columns -> attributes
Attribute types -
1) Nominal - categories, states or name of things
marital status, id number, zip code, occupation
2) Binary
a) symmetric binary: both outcomes are equally important, e.g. gender
b) asymmetric binary: outcomes are not equally important, e.g. positive/negative
3) Ordinal - values have a meaningful order (ranking)
size = {small, medium, large}
4) Numeric - two types
a) interval - no true zero point
b) ratio - inherent zero point
13 February
Measuring data similarity and dissimilarity
Cluster - a collection of data objects such that the objects within a cluster are similar to one another and
dissimilar to the objects in other clusters.
Data matrix (or object-by-attribute structure): This structure stores the n data objects in the form of a relational
table, or n-by-p matrix (n objects ×p attributes)
Measures of similarity can often be expressed as a function of measures of dissimilarity. For nominal data:
sim(i, j) = 1 - d(i, j)
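A minimal sketch of the nominal case. It assumes the common simple-matching ratio d(i, j) = (p - m) / p, where p is the number of attributes and m the number of attributes on which the two objects match; the notes only state the sim = 1 - d relation, so the ratio, the function names, and the sample objects are assumptions.

```python
def nominal_dissimilarity(obj_i, obj_j):
    """Fraction of attributes on which the two objects differ: (p - m) / p."""
    p = len(obj_i)
    m = sum(1 for a, b in zip(obj_i, obj_j) if a == b)
    return (p - m) / p


def nominal_similarity(obj_i, obj_j):
    """Similarity derived from dissimilarity: sim(i, j) = 1 - d(i, j)."""
    return 1 - nominal_dissimilarity(obj_i, obj_j)


# Hypothetical objects with four nominal attributes each.
a = ("single", "engineer", "red", "dhaka")
b = ("single", "teacher", "red", "khulna")

d = nominal_dissimilarity(a, b)  # 2 of the 4 attributes differ
s = nominal_similarity(a, b)
```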
===========================================
13 February
Dissimilarity for MIXED attributes
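The notes give only the heading here, so the following is a hedged sketch of one common textbook formulation, not necessarily the one used in class: compute a per-attribute dissimilarity in [0, 1] and average over the attributes that are present in both objects. Nominal attributes contribute 0 or 1, numeric attributes are scaled by their range, and ordinal ranks 1..M are mapped to [0, 1] before comparison. The function name, the `kinds`/`scales` parameters, and the sample objects are all assumptions.

```python
def mixed_dissimilarity(obj_i, obj_j, kinds, scales):
    """Average per-attribute dissimilarity over attributes present in both objects.

    kinds[f]  : 'nominal', 'numeric', or 'ordinal'
    scales[f] : (max - min) for numeric attributes,
                number of levels M for ordinal attributes (ranks 1..M),
                ignored for nominal attributes
    """
    total, used = 0.0, 0
    for x, y, kind, scale in zip(obj_i, obj_j, kinds, scales):
        if x is None or y is None:
            continue  # missing value: this attribute is skipped
        if kind == "nominal":
            d = 0.0 if x == y else 1.0
        elif kind == "numeric":
            d = abs(x - y) / scale
        elif kind == "ordinal":
            # Same as |z_i - z_j| with z = (rank - 1) / (M - 1)
            d = abs(x - y) / (scale - 1)
        else:
            raise ValueError(f"unknown attribute kind: {kind}")
        total += d
        used += 1
    return total / used if used else 0.0


kinds = ("nominal", "ordinal", "numeric")
scales = (None, 3, 50.0)  # ordinal has 3 levels; numeric range is 50

a = ("code-A", 3, 45.0)
b = ("code-B", 1, 20.0)
d_ab = mixed_dissimilarity(a, b, kinds, scales)

# With a missing ordinal value, only the other two attributes are averaged.
a_missing = ("code-A", None, 45.0)
d_missing = mixed_dissimilarity(a_missing, b, kinds, scales)
```

Scaling every attribute to [0, 1] first is what lets different attribute types be combined in a single average.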
20 February (From Previous Semester Record)
Data preprocessing
- data cleaning
- data integration
- data reduction
- data transformation & discretization
Data reduction - obtain a reduced representation of the data set that is much smaller in volume, yet closely
maintains the integrity of the original data.
- dimensionality reduction (wavelet transform, PCA, supervised & nonlinear techniques)
- numerosity reduction
- data compression
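To illustrate dimensionality reduction with PCA, here is a minimal NumPy sketch: center the data, take the singular value decomposition, and project onto the top-k principal directions. The function name `pca_reduce` and the synthetic data are assumptions for the example; a real pipeline would normally also scale the attributes first.

```python
import numpy as np


def pca_reduce(data, k):
    """Project the rows of data (n x p) onto the top-k principal components."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T


# Hypothetical data: three attributes that are almost perfectly correlated,
# so one component should capture nearly all of the variance.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 1))
data = np.hstack([
    x,
    2 * x + 0.01 * rng.normal(size=(100, 1)),
    -x + 0.01 * rng.normal(size=(100, 1)),
])

reduced = pca_reduce(data, 1)
retained = reduced.var(axis=0).sum() / data.var(axis=0).sum()
```

Here `reduced` has one column instead of three, and `retained` is the fraction of the original variance that the single component keeps, which is the sense in which the smaller representation "closely maintains the integrity of the original data".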
23 February