Module 1 ML Chapter 2
CHAPTER 2
UNDERSTANDING DATA – 1
Contents
• Introduction.
• Big Data Analysis Framework.
• Descriptive Statistics.
• Univariate Data Analysis and Visualization.
What is data?
• Data are facts
• Facts can be in the form of numbers, audio, video, and images
• Data must be analyzed to support decision-making
• Organizations store vast amounts of data (GB, TB, PB, EB).
• Data can be human-interpretable or computer-readable.
• Operational and Non-Operational Data
• Operational Data: Data encountered in normal day-to-day business procedures and processes.
• Non-Operational Data: Used for decision-making.
• Processed data is meaningful and used for analysis.
Elements of Big Data
• Big data is characterized by:
• Volume: Large amounts of data (PB, EB).
• Velocity: Fast data arrival speeds.
• Variety: Different forms, functions, and sources of data.
• Veracity: Truthfulness and accuracy of data.
• Validity: Correctness for decision-making.
• Value: Importance of extracted insights for business decisions.
Types of Data
• Structured Data
• Stored in an organized manner (e.g., databases, SQL tables).
• Types include:
• Record Data: Organized as tables with rows and columns.
• Data Matrix: Record data with only numeric attributes, so each object can be viewed as a point in multidimensional space.
• Graph Data: Represents relationships between objects (e.g., web pages and hyperlinks).
• Ordered Data: Data whose attributes or records have an inherent ordering (e.g., temporal or sequence data).
• Unstructured Data
• Includes images, video, audio, blogs, and textual documents.
• It is estimated that about 80% of all data is unstructured.
• Semi-Structured Data
• Combines elements of structured and unstructured data.
• Examples: XML, JSON, RSS feeds, hierarchical data.
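To make the semi-structured idea concrete, here is a minimal Python sketch (the record contents are made up, not from the source) that parses a JSON document whose fields need not follow a rigid table schema:

```python
import json

# A hypothetical semi-structured record: nested and optional fields are allowed,
# unlike a fixed-schema SQL table.
raw = '{"id": 1, "name": "Alice", "contacts": {"email": "alice@example.com"}, "tags": ["ml", "data"]}'

record = json.loads(raw)              # parse the JSON text into Python objects
print(record["name"])                 # -> Alice
print(record.get("phone", "absent"))  # missing fields are simply absent -> absent
```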
Data Storage and Representation
• Data stored in structures for analysis.
• Types:
• Flat Files
• CSV (Comma-Separated Values)
• Values are separated by commas (","). Used in spreadsheets, databases, and data analysis tools.
• TSV (Tab-Separated Values)
• Values are separated by tabs (\t) instead of commas. Also used in spreadsheets, databases, and data exchange between applications (a pandas sketch for reading both formats follows).
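A minimal pandas sketch for loading both flat-file formats (the file names data.csv and data.tsv are hypothetical):

```python
import pandas as pd

# CSV: values separated by commas (hypothetical file name)
df_csv = pd.read_csv("data.csv")

# TSV: values separated by tabs; the separator is passed explicitly
df_tsv = pd.read_csv("data.tsv", sep="\t")

print(df_csv.head())   # inspect the first few rows
print(df_tsv.shape)    # (number of rows, number of columns)
```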
Data Storage and Representation
• A DBMS (Database Management System) manages data efficiently.
• Types of databases:
• Transactional Database
• Time-Series Database
• Spatial Database
• World Wide Web (WWW)
• XML (eXtensible Markup Language)
• Data Stream
• RSS (Really Simple Syndication)
• JSON (JavaScript Object Notation)
Big Data Analytics and Types of Analytics
• Big data analytics helps businesses make decisions by analyzing data.
• It generates useful information and insights.
• Data analytics covers data collection, preprocessing, and analysis.
• It deals with the complete cycle of data management.
• Types of Data Analytics
1. Descriptive Analytics
2. Diagnostic Analytics
3. Predictive Analytics
4. Prescriptive Analytics
Types of Analytics
• Descriptive Analytics
• Describes the main features of the data.
• Deals with collected data and quantifies it.
• Focuses on descriptive statistics rather than inference.
• Diagnostic Analytics
• Answers the question: 'Why did something happen?'
• Finds cause-and-effect relationships in data.
• Example: If a product is not selling, diagnostic analytics identifies reasons.
Types of Analytics
• Predictive Analytics
• Answers the question: 'What will happen in the future?'
• Uses algorithms to predict future trends.
• Machine learning heavily relies on predictive analytics.
• Prescriptive Analytics
• Recommends the best course of action.
• Goes beyond prediction and aids decision-making.
• Helps organizations plan for the future and mitigate risks.
Big Data Analysis Framework
• Big data frameworks use a layered architecture for flexibility and scalability.
• This architecture simplifies data processing and management.
• The framework consists of four primary layers:
1. Data Connection Layer
2. Data Management Layer
3. Data Analytics Layer
4. Presentation Layer
Big Data Analysis Framework
• Data Connection Layer
• Ingests raw data into appropriate structures.
• Supports Extract, Transform, and Load (ETL) operations.
• Connects data from various sources for analysis.
• Data Management Layer
• Preprocesses data for analysis.
• Executes read, write, and management tasks.
• Enables parallel query execution and data warehousing.
Big Data Analysis Framework
• Data Analytics Layer
• Performs statistical tests and machine learning model construction.
• Supports various analytical functions for insights.
• Validates models to ensure data integrity.
• Presentation Layer
• Displays results through dashboards and reports.
• Provides insights using machine learning models.
• Facilitates interpretation and visualization for better decision-making.
Types of Processing
• Cloud Computing
• Cloud computing provides shared resources over the internet.
• Services include:
• SaaS (Software as a Service) – Allows users to access software applications over the internet without needing to install them on their devices. Example: Google Docs, Microsoft 365.
• PaaS (Platform as a Service) – Provides a platform for developers to build, test, and deploy applications. Example: Google App Engine, Microsoft Azure.
• IaaS (Infrastructure as a Service) – Offers virtualized computing resources like servers, storage, and networking. Example: Amazon Web Services (AWS), Google Cloud Platform.
Types of Processing
• Cloud Service Deployment Models
• Public Cloud – Managed by third-party providers and accessible to the general public. Example: Google Cloud, AWS.
• Private Cloud – Used exclusively by a single organization, providing greater security and control.
• Community Cloud – Shared infrastructure owned and used by multiple organizations with common concerns (e.g., government institutions).
• Hybrid Cloud – A combination of two or more cloud models to balance security, performance, and cost.
Types of Processing
• Characteristics of Cloud Computing
• Shared Infrastructure – Computing resources are shared across multiple users.
• Dynamic Provisioning – Resources are allocated based on demand.
• Dynamic Scaling – Services can expand or shrink according to user needs.
• Network Access – Cloud resources are accessed over the internet.
• Utility-Based Metering – Users are charged based on resource consumption.
• Multitenancy – Multiple users share cloud resources securely.
• Reliability – Ensures continuous and reliable services.
Types of Processing
• Grid Computing:
• Uses distributed networks for complex tasks.
• Connects multiple computers to act as a single supercomputer.
• Distributes tasks across nodes for parallel processing.
• Ideal for high-performance, large-scale applications.
• HPC (High-Performance Computing):
• Aggregates resources to solve complex problems quickly.
• Utilizes parallel processing across compute, network, and storage components.
• Enhances performance for scientific and engineering tasks.
Data Collection
• Good Data Characteristics
• Timeliness: Relevant and up-to-date.
• Relevancy: Relevant to, and ready for, the machine learning task at hand.
• Knowledge: Understandable and interpretable.
• Data Source Types:
• 1. Open/Public Data (e.g., digital libraries, healthcare databases)
• 2. Social Media Data (e.g., Twitter, YouTube)
• 3. Multimodal Data (e.g., text, audio, video)
Data preprocessing
• In the real world, data is often 'dirty'. Dirty data includes:
• Incomplete data: Missing values in the dataset.
• Outlier data: Values that deviate markedly from the rest of the data.
• Data with inconsistent values: Contradictory or logically incorrect data entries.
• Inaccurate data: Errors in the recorded data.
• Data with missing values: Attributes or records with missing information.
• Duplicate data: Repeated entries that can skew analysis.
• Data preprocessing improves the quality of the data and, in turn, the results of data mining techniques. Raw data must be preprocessed to produce accurate results. This process involves data cleaning and wrangling to make the data usable for machine learning.
Data preprocessing
• Examples of Bad Data
• Consider the following examples of bad data:
• Missing Salary values
• Age recorded as '5' but Date of Birth indicates otherwise
• Age of '136', likely a typographical error
• Negative salary values, e.g., '-1500'
• Data Cleaning Process involves (a pandas sketch follows the list):
• Identifying and correcting errors
• Removing duplicate or irrelevant data
• Filling in missing values
• Correcting inconsistent data formats
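A hedged pandas sketch of these cleaning steps on a small made-up table (the column names Age and Salary and all values are illustrative, not from the source):

```python
import numpy as np
import pandas as pd

# Made-up records containing the kinds of bad data listed above
df = pd.DataFrame({
    "Age":    [25.0, 136.0, 5.0, 25.0],              # 136 is likely a typo
    "Salary": [50000.0, 60000.0, -1500.0, 50000.0],  # -1500 is an invalid negative salary
})

df = df.drop_duplicates()                           # remove repeated entries
df.loc[df["Salary"] < 0, "Salary"] = np.nan         # mark invalid salaries as missing
df.loc[~df["Age"].between(0, 120), "Age"] = np.nan  # mark implausible ages as missing
print(df)
```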
Missing Data Analysis
• The primary data cleaning process is missing data analysis.
• Data cleaning routines attempt to fill in missing values, smooth out noise, identify outliers, and correct data inconsistencies.
• This helps data mining models avoid overfitting to noisy data.
• Methods for Handling Missing Data
• Ignore the tuple
• Fill in values manually
• Use a global constant
• Attribute value substitution
• Class mean
• Predicted value
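A minimal pandas sketch of a few of these options (the Class and Salary columns and their values are hypothetical):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Class":  ["A", "A", "B", "B"],
    "Salary": [50000.0, np.nan, 40000.0, np.nan],
})

dropped  = df.dropna()                               # ignore the tuple
constant = df.fillna({"Salary": 0})                  # use a global constant
by_mean  = df["Salary"].fillna(df["Salary"].mean())  # attribute mean substitution
by_class = df.groupby("Class")["Salary"].transform(lambda s: s.fillna(s.mean()))  # class mean

print(by_class)
```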
Missing Data Analysis
• Ignore the tuple:
• Ignore records with missing data, especially class labels.
• Effective only when missing data is minimal.
• Smoothing noisy data by binning (a sketch follows below):
• Smoothing by bin means: {15, 15, 15}, {24, 24, 24}, {30.3, 30.3, 30.3}
• Smoothing by bin boundaries: {12, 12, 19}, {22, 22, 26}, {28, 28, 34}
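A hedged sketch of binning-based smoothing. The original values are not given on the slide, so the input list below is an assumption, chosen so that the output reproduces the bins shown above:

```python
# Equal-frequency binning with smoothing by bin means and by bin boundaries.
# The input values are assumed; only the smoothed bins appear on the slide.
data = [12, 14, 19, 22, 24, 26, 28, 29, 34]
bins = [data[i:i + 3] for i in range(0, len(data), 3)]

# Smoothing by means: every value in a bin is replaced by the bin mean
by_means = [[round(sum(b) / len(b), 1)] * len(b) for b in bins]

# Smoothing by boundaries: each value is replaced by the nearer bin boundary
by_bounds = [[min(b) if v - min(b) <= max(b) - v else max(b) for v in b] for b in bins]

print(by_means)   # [[15.0, 15.0, 15.0], [24.0, 24.0, 24.0], [30.3, 30.3, 30.3]]
print(by_bounds)  # [[12, 12, 19], [22, 22, 26], [28, 28, 34]]
```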
Data Integration and Data transformation
• Data integration merges data from multiple sources into a single source, which
may lead to redundant data.
• Detect and remove redundancies arising from data integration.
• These operations (like normalization) enhance data mining algorithm performance
by transforming data into a processable format.
• Normalization:
• A preliminary stage of data conditioning.
• Scales attribute values to a range (e.g., 0 to 1) for better algorithm performance.
• Commonly used in neural networks.
• Normalization Procedures:
• Min-Max
• z-Score
Data Normalization
• Min-Max normalization
• Transforms data to the range 0 to 1
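A hedged sketch of both normalization procedures on made-up attribute values; Min-Max uses v' = (v - min) / (max - min):

```python
import numpy as np

x = np.array([20.0, 30.0, 50.0, 100.0])   # made-up attribute values

# Min-Max: v' = (v - min) / (max - min), mapping the attribute into [0, 1]
min_max = (x - x.min()) / (x.max() - x.min())

# z-score: v' = (v - mean) / std, centring the attribute at 0 with unit spread
z_score = (x - x.mean()) / x.std()

print(min_max)  # [0.    0.125 0.375 1.   ]
print(z_score)
```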