Lecture 2
Data: Part 1
Mohammed Brahimi & Sami Belkacem
Outline
1. What is a Dataset?
2. Types of Datasets
3. Types of Attributes
4. Data Preprocessing
5. Similarity and Dissimilarity Measures
1- What is a Dataset?
Definition of Dataset
● Dataset: a collection of objects and their attributes.
● Object: a collection of attributes; also known as a record, point, case, sample, entity, or instance.
[Figure: example dataset with objects as rows and attributes as columns]
Important Characteristics of Datasets
● Size: The type of analysis often depends on the size of the data.
2- Types of Datasets
Types of Datasets
● Record Data: records with fixed attributes
○ Relational records
○ Data matrix …
○ Transaction Data
● Spatial Data
○ RGB Images
○ Satellite images
3- Types of Attributes
Types of Attributes
● Nominal (Unordered Categories)
○ Examples: Gender, eye color, types of fruit (e.g., apple, orange), etc.
Types of Operations on Attributes
● Nominal
○ Distinctness ( =, ≠ )
● Ordinal
○ Distinctness ( =, ≠ )
○ Order ( <, > )
● Interval
○ Distinctness ( =, ≠ )
○ Order ( <, > )
○ Meaningful Differences ( +, - )
● Ratio
○ Distinctness ( =, ≠ )
○ Order ( <, > )
○ Meaningful Differences ( +, - )
○ Meaningful Ratios ( *, / )
Discrete vs. Continuous Attributes
● Discrete Attribute: takes values from a finite or countable set.
○ Examples: gender, eye color, swimming level.
○ Typically represented as integer variables.
○ Binary attributes are a special case of discrete attributes.
● Continuous Attribute: takes real-number values.
○ Examples: temperature, height, weight.
○ Typically represented as floating-point variables.
4- Data Preprocessing
Major Tasks of Data Preprocessing
Data integration (covered in the Advanced Databases course)
Data cleaning
● Handle duplicates and missing values, identify/remove outliers, smooth noisy data, etc.
Data transformation
● Sampling, encoding, normalization, discretization, etc.
Data Cleaning
● Poor data quality can negatively impact modeling efforts.
○ E.g., in bank loan prediction, poor data can lead to incorrect loan decisions:
– Some creditworthy candidates are denied loans.
– More loans are given to individuals who are unlikely to be creditworthy.
● What types of data quality issues exist, and how can we identify and handle them?
Missing Values
● Reasons for missing values:
○ Information is not collected (e.g., people decline to give their age and weight).
○ Attributes may not be applicable to all cases (e.g., annual income is not applicable to children).
How to Handle Missing Values?
● Delete records: drop records if there is enough data and few missing values.
● Keep missing data: keep values as NaN if, for example, missing values make up ≥ 60% of the observations.
● Imputation-based techniques (see the sketch below):
○ Average: fill in with the mean or median for numerical data, and the mode for categorical data.
○ Nearest neighbor: fill in using the most similar data points (nearest neighbors) in the dataset.
○ Interpolation: train a prediction model on the dataset to predict the missing value.
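As an illustration, a minimal sketch of average imputation in Python with pandas (the dataset and column names are hypothetical):

```python
import pandas as pd

# Hypothetical dataset with missing values
df = pd.DataFrame({
    "age":    [25, None, 40, 35, None],
    "income": [30000, 45000, None, 52000, 41000],
    "city":   ["Algiers", "Oran", None, "Algiers", "Oran"],
})

# Average imputation: median/mean for numerical data, mode for categorical data
df["age"] = df["age"].fillna(df["age"].median())
df["income"] = df["income"].fillna(df["income"].mean())
df["city"] = df["city"].fillna(df["city"].mode()[0])
```

For nearest-neighbor imputation, scikit-learn's KNNImputer offers a ready-made implementation.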
Outliers
● Data objects with characteristics significantly different from the majority in the dataset.
● Determining Causes:
○ Explore the reasons behind the presence of outliers.
How to Handle Outliers?
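As one common way to identify outliers in a numerical attribute, a minimal sketch using the interquartile range (IQR) rule (the data is hypothetical); flagged points can then be removed, capped, or investigated:

```python
import pandas as pd

x = pd.Series([10, 12, 12, 13, 12, 11, 14, 13, 15, 102])

# IQR rule: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = x.quantile(0.25), x.quantile(0.75)
iqr = q3 - q1
outliers = x[(x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)]
print(outliers)  # only 102 is flagged
```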
Noise
● Noise in Objects: irrelevant elements affecting data integrity.
● Noise in Attributes: modification of original attribute values.
● Examples:
○ Erroneous values caused by data entry errors.
○ Distorted voice on a poor phone line.
○ "Snow" on a television screen.
○ Etc.
[Figure: noisy data in signal processing]
How to Handle Noise?
● Binning: Group data into bins and smooth it using means, medians, or defined boundaries.
● Clustering: Apply clustering to separate out noise points that do not fit well within any cluster.
● Semi-supervised method: Combine automated noise detection tools with human inspection
Note: Incorporating noise into data can sometimes enhance the robustness of data mining models
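As an illustration of smoothing by binning, a minimal sketch in Python with pandas (the values are hypothetical):

```python
import pandas as pd

values = pd.Series([4, 8, 15, 21, 21, 24, 25, 28, 34])

# Equal-frequency binning into 3 bins, then smoothing by bin means
bins = pd.qcut(values, q=3, labels=False)
smoothed = values.groupby(bins).transform("mean")
print(smoothed.tolist())  # [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```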
Data Transformation
Convert data into a format that is suitable for analysis.
Sampling
Select a subset of the dataset to make it more manageable for analysis
● Challenges:
○ Ensure the sample is representative of the population.
○ Address potential bias in the sampling process.
Sampling methods
Simple Random Sampling
● Every item has an equal chance of being selected (could be with or without replacement)
Systematic Sampling
● Select individuals at regular intervals from a list or group.
Stratified Sampling
● Divide the population into groups (strata) based on a characteristic, then take random samples from each group.
Cluster Sampling
● Divide the population into clusters (often geographic), then randomly select entire clusters for sampling.
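A minimal sketch of the first three methods in Python with pandas (the dataset and the "stratum" column are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"stratum": ["A"] * 80 + ["B"] * 20,
                   "value": range(100)})

# Simple random sampling: 10 records, without replacement
simple = df.sample(n=10, random_state=0)

# Systematic sampling: every 10th record
systematic = df.iloc[::10]

# Stratified sampling: 10% from each stratum
stratified = df.groupby("stratum").sample(frac=0.1, random_state=0)
```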
Encoding
Convert categorical variables into numerical format for data mining algorithms
[Figure: one-hot encoding (suitable) vs. label encoding (not suitable) for a nominal attribute]
Encoding methods
Label Encoding
● Converts categories into numerical labels.
● Each category gets a unique integer.
● Can create ordinal relationships, even if not intended (e.g. France (0) < Spain (1))
● Suitable for ordinal attributes
One-Hot Encoding
● Creates a binary column for each category.
● Each category is represented by 1 in its column, 0 elsewhere.
● No ordinal relationships are implied.
● Suitable for nominal attributes
● Increases the dimensionality of the data, which can be a concern with many categories.
More advanced encoding techniques exist for nominal attributes, such as “word embeddings”.
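A minimal sketch of both encodings in Python with pandas (the column and categories are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"country": ["France", "Spain", "France", "Germany"]})

# Label encoding: one integer per category; codes are assigned
# alphabetically here, implying France (0) < Germany (1) < Spain (2)
df["country_label"] = df["country"].astype("category").cat.codes

# One-hot encoding: one binary column per category, no order implied
one_hot = pd.get_dummies(df["country"], prefix="country")
```

scikit-learn's LabelEncoder and OneHotEncoder provide the same transformations for use in pipelines.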
Normalization
Scale numerical data to a standard range to ensure attributes contribute equally to the analysis
● Normalization prevents attributes with larger ranges from dominating those with smaller ranges.
● Normalization is crucial for the convergence of many data mining algorithms.
Normalization methods
● Min-max normalization:
○ Attributes will have the exact same scale.
○ Does not handle outliers well.
● Z-score normalization:
○ More robust to outliers.
○ Does not produce normalized data with the exact same scale.
○ Still sensitive to extreme outliers.
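A minimal sketch of both methods in Python with pandas (the values are hypothetical; note how the outlier squeezes the min-max result):

```python
import pandas as pd

x = pd.Series([10.0, 20.0, 30.0, 40.0, 500.0])  # 500 is an outlier

# Min-max normalization: rescale to the exact range [0, 1]
min_max = (x - x.min()) / (x.max() - x.min())

# Z-score normalization: zero mean, unit standard deviation
z_score = (x - x.mean()) / x.std()
```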
Normalization methods
Min-Max Normalization is suitable when:
● The chosen data mining algorithm is sensitive to the distribution of the data
Discretization
Transforming continuous data into discrete intervals (bins)
● The goal is to improve the quality and usability of data for analysis and modeling.
Discretization methods
Discretization methods can be classified into two categories:
1. Unsupervised Discretization
No class label is used during the discretization process.
2. Supervised Discretization
Class labels are used to guide the discretization, optimizing it for classification tasks.
Unsupervised Discretization Methods
Equal Width
Equal Frequency
K-means
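A minimal sketch of these three methods in Python (the ages are hypothetical); scikit-learn's KBinsDiscretizer covers all three via its strategy parameter:

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

ages = np.array([[22], [25], [27], [31], [35], [41], [52], [58], [64]])

# "uniform" -> equal width, "quantile" -> equal frequency, "kmeans" -> k-means
for strategy in ("uniform", "quantile", "kmeans"):
    disc = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy=strategy)
    print(strategy, disc.fit_transform(ages).ravel())
```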
Supervised Discretization Methods
Top-down (Splitting)
Bottom-up (Merging)
5- Similarity and Dissimilarity Measures
Similarity and Dissimilarity Measures
Similarity between objects or attributes reveals valuable data relationships for pattern recognition, clustering, and classification.
● Similarity Measure:
○ Quantifies data object likeness.
○ Higher values indicate greater similarity.
○ Typically within the range [0,1].
● Dissimilarity Measure:
○ Can also be referred to as Distance Measure.
○ Quantifies data object differences.
○ Lower values indicate greater similarity.
○ Often starts at 0 and varies in the upper limit.
● Proximity:
○ Refers to either similarity or dissimilarity.
Similarity and Dissimilarity Measures
1. Properties of Similarity
2. Properties of Distance
Properties of Similarity
● Identity:
○ s(x, y) = 1 (or maximum similarity) only if x = y.
○ Note: This property may not always hold, e.g., cosine similarity.
● Symmetry:
○ s(x, y) = s(y, x) for all x and y.
○ Symmetry ensures that the order of comparison does not affect the similarity score.
Properties of Distance
● Non-Negativity:
○ d(x, y) ≥ 0 for all x and y.
○ d(x, y) = 0 if and only if x = y.
● Symmetry:
○ d(x, y) = d(y, x) for all x and y.
● Triangle Inequality:
○ d(x, z) ≤ d(x, y) + d(y, z) for all x, y, and z.
Euclidean Distance
d(x, y) = √( Σₖ₌₁ⁿ (xₖ − yₖ)² )
● n: the number of attributes.
● xₖ, yₖ: the kth attributes of objects x and y, respectively.
Minkowski Distance
d(x, y) = ( Σₖ₌₁ⁿ |xₖ − yₖ|ʳ )^(1/r), where the parameter r selects the distance.
Special Cases of Minkowski Distance
● r = 1:
○ Called L1 norm or Manhattan distance.
○ Ideal for measuring distances in grid-like paths.
○ Binary vector example: Hamming distance counts differing bits.
● r = 2:
○ Called L2 norm or Euclidean distance.
○ The most commonly used distance metric.
○ Ideal for measuring the straight-line distance in Euclidean space.
● r → ∞:
○ Called Lmax norm or Chebyshev distance.
○ Calculates the maximum difference between any component of vectors.
○ Ideal when movement is unrestricted in any direction, e.g. king movement in chess
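A minimal sketch of the three special cases in Python with NumPy (the vectors are hypothetical):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, 3.0])

l1   = np.sum(np.abs(x - y))          # Manhattan: 3 + 2 + 0 = 5
l2   = np.sqrt(np.sum((x - y) ** 2))  # Euclidean: sqrt(9 + 4 + 0) ≈ 3.61
linf = np.max(np.abs(x - y))          # Chebyshev: max(3, 2, 0) = 3
```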
Cosine Similarity
cos(x, y) = (x · y) / (‖x‖ ‖y‖)
● x · y: the dot product of vectors x and y.
● ‖x‖: the length (Euclidean norm) of vector x.
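A minimal sketch in Python with NumPy (the vectors are hypothetical):

```python
import numpy as np

x = np.array([3.0, 2.0, 0.0, 5.0])
y = np.array([1.0, 0.0, 0.0, 0.0])

# cos(x, y) = (x . y) / (||x|| ||y||)
cos_sim = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
print(cos_sim)  # 3 / (6.16 * 1) ≈ 0.49
```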
Simple Matching Coefficient (SMC) and Jaccard Coefficient
● SMC: the number of matches divided by the total number of attributes; designed for symmetric binary attributes.
● For asymmetric binary attributes, the Jaccard coefficient (J) is used instead: the number of 1-1 matches divided by the number of attributes where at least one object has a 1.
Example: two persons' purchases in a market represented by binary vectors x = [0, 0, 1] and y = [0, 1, 1].
Each element represents an asymmetric attribute: whether a person bought an item in the market.
J = 1/2 = 0.5 (one 1-1 match out of the two positions that are non-zero in at least one vector), while SMC = 2/3.
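A minimal sketch computing both coefficients for the example vectors:

```python
import numpy as np

x = np.array([0, 0, 1])
y = np.array([0, 1, 1])

m11 = np.sum((x == 1) & (y == 1))  # both bought the item: 1
m00 = np.sum((x == 0) & (y == 0))  # neither bought the item: 1
n = len(x)

smc = (m11 + m00) / n        # (1 + 1) / 3 ≈ 0.67
jaccard = m11 / (n - m00)    # 1 / 2 = 0.5
```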
Similarity, Distance, and Attribute Type
How to Choose the Similarity/Distance Measure?
The choice of the right measure depends on the domain. For example, cosine similarity suits sparse document vectors, while Euclidean distance suits dense numerical data.