AIML Unit 2 Understanding Data
Machine Learning
S. Sridhar and M. Vijayalakshmi
Chapter 2: Understanding of Data
What is Data?
• Data are facts.
• Facts can be in the form of numbers, audio, video, and images.
• Data needs to be analyzed for taking decisions.
• Other forms of data include XML/JSON objects, RSS feeds, and hierarchical records.
Data Storage
❖ Flat Files
I. The simplest and most commonly available data source.
II. Data is stored in plain ASCII or EBCDIC format.
III. Minor changes to the data in flat files can affect the results of data mining algorithms.
IV. Suitable only for storing small datasets; not desirable as the dataset becomes larger.
• CSV files – CSV stands for comma-separated values: files in which the values are separated by commas. They are used by spreadsheet and database applications. The first row may hold the attribute names, and the remaining rows hold the data.
• TSV files – TSV stands for tab-separated values: files in which the values are separated by tabs. Both CSV and TSV files are generic in nature and can be shared easily. There are many tools, such as Google Sheets and Microsoft Excel, to process these files.
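The sketch below shows one way to read such files in Python using only the standard library csv module; the file names students.csv and students.tsv are hypothetical and assume a header row of attribute names.

import csv

# Read a CSV file whose first row holds the attribute names.
with open("students.csv", newline="") as f:
    reader = csv.DictReader(f)          # maps each row to {attribute: value}
    for row in reader:
        print(row)

# The same module handles TSV files by changing the delimiter:
with open("students.tsv", newline="") as f:
    reader = csv.DictReader(f, delimiter="\t")
    for row in reader:
        print(row)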
Data Storage
❖ Database System
I. Consists of database files and a database management system (DBMS).
II. The database files contain the original data and metadata.
III. The DBMS provides support facilities such as a database administrator, query processing, and a transaction manager.
Considerable time is spent on the collection of good quality data. 'Good data' is data that has the following properties:
1. Timeliness – the data should be current and not stale or obsolete.
2. Relevancy – the data should be relevant to the task at hand.
3. Knowledge about the data – the data should be understandable and interpretable.
1. Data Collection
The main sources of data are open/public data, social media data, and multimodal data.
1. Open or public data source – data made freely available, such as government and scientific datasets.
2. Social media – data generated by social media platforms.
3. Multimodal data – data that combines multiple forms, such as text, images, audio, and video.
2. Data Preprocessing
In the real world, the available data is 'dirty'. Dirty data includes:
• Incomplete data
• Inaccurate data
• Outlier data
• Data with missing values
• Data with inconsistent values
• Duplicate data
✓ Data preprocessing improves the quality of the data and, consequently, of the data mining results.
✓ Raw data must be preprocessed to give accurate results.
✓ The process of detection and removal of errors in data is called data cleaning.
✓ Making the data processable for ML algorithms is called data wrangling.
• Salary = ' ' is an example of incomplete data.
• The DoB of patients John, Andre, and Raju being absent is missing data.
• The age of David is '5' but his DoB is 10/10/1980 → inconsistent data.
Outliers are data objects whose characteristics are different from the rest of the data and that have very unusual values. An outlier might be a typographical error, and it is often required to distinguish between noise and outlier data.
Missing Data Analysis
1. Ignore the tuple.
2. Fill in the values manually.
3. Use a global constant, such as 'Unknown' or 'Infinity', to fill in the missing attribute values.
4. Fill in the missing value with the attribute mean.
5. Use the attribute mean of all samples belonging to the same class.
6. Use the most probable value to fill in the missing value.
Some of these strategies are sketched below.
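A minimal sketch of strategies 1, 3 and 4, assuming the pandas library is available; the patient table and its values are hypothetical.

import pandas as pd

# Hypothetical patient records with missing values (None becomes NaN).
df = pd.DataFrame({
    "Name":   ["John", "Andre", "Raju", "David"],
    "Age":    [34, None, 29, 41],
    "Salary": [50000, 62000, None, 58000],
})

# Strategy 1: ignore (drop) tuples that contain missing values.
dropped = df.dropna()

# Strategy 3: fill missing values with a global constant.
constant_filled = df.fillna("Unknown")

# Strategy 4: fill a numeric attribute with the attribute mean.
df["Age"] = df["Age"].fillna(df["Age"].mean())
df["Salary"] = df["Salary"].fillna(df["Salary"].mean())
print(df)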
Removal of Noisy or Outlier Data
1. Noise is a random error or variance in a measured value.
2. Noise can be removed using binning: the given data values are sorted and distributed into equal-frequency bins, also called buckets.
3. The binning method then uses the neighboring values to smooth the noisy data. Smoothing can be done in three ways:
❑ 'Smoothing by bin means' – the mean of the bin replaces all the values of the bin.
❑ 'Smoothing by bin medians' – the bin median replaces the bin values.
❑ 'Smoothing by bin boundaries' – each bin value is replaced by the closest bin boundary. The maximum and minimum values of a bin are called its bin boundaries.
Example
Example 2.1: Consider the following set: S = {12, 14, 19, 22, 24, 26, 28, 31, 32}. Apply the various binning techniques and show the result.
Solution: By the equal-frequency bin method, the data is distributed across bins. Assuming bins of size 3, the above data is distributed across the bins as shown below:
Bin 1 : 12, 14, 19
Bin 2 : 22, 24, 26
Bin 3 : 28, 31, 32
By the smoothing by bin means method, the values of each bin are replaced by the bin mean:
Bin 1 : 15, 15, 15
Bin 2 : 24, 24, 24
Bin 3 : 30.3, 30.3, 30.3
Using the smoothing by bin boundaries method, the bin values become:
Bin 1 : 12, 12, 19
Bin 2 : 22, 22, 26
Bin 3 : 28, 32, 32
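The same binning steps can be sketched in Python with the standard library alone; the code below reproduces the bin means and bin boundaries of Example 2.1 (ties between boundaries are resolved toward the lower boundary, matching the solution above).

# Equal-frequency binning with smoothing by means and by boundaries.
S = [12, 14, 19, 22, 24, 26, 28, 31, 32]
bin_size = 3

S_sorted = sorted(S)
bins = [S_sorted[i:i + bin_size] for i in range(0, len(S_sorted), bin_size)]

for i, b in enumerate(bins, start=1):
    mean = sum(b) / len(b)
    by_means = [round(mean, 1)] * len(b)
    # Smoothing by bin boundaries: replace each value by the closest boundary.
    lo, hi = b[0], b[-1]
    by_bounds = [lo if v - lo <= hi - v else hi for v in b]
    print(f"Bin {i}: {b} -> means {by_means}, boundaries {by_bounds}")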
Data Integration and Data Transformations
1. Data integration merges data from multiple sources into a single data source. This may lead to redundant data.
2. The goal of data integration is to detect and remove redundancies that arise from integration.
3. In normalization, the attribute values are scaled to fit into a range (say 0–1) to improve the performance of the data mining algorithm. Two common procedures are:
4. Min-Max normalization
5. z-Score normalization
Min-Max Procedure
1. Each value v of a variable V is normalized by taking its difference from the minimum value and dividing by the range, then scaling to a new range, say 0–1:
v' = ((v − min) / (max − min)) × (new_max − new_min) + new_min
Here, min and max are the minimum and maximum of the given data, and new_min and new_max are the minimum and maximum of the target range, say 0 and 1.
Consider the set: V = {88, 90, 92, 94}. Apply the Min-Max procedure and map the marks to the new range 0–1.
Solution: The minimum of the list V is 88 and the maximum is 94. The new_min and new_max are 0 and 1, respectively. The marks {88, 90, 92, 94} are mapped to the new range {0, 0.33, 0.67, 1}. Thus, the Min-Max normalization range is between 0 and 1.
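A minimal sketch of the Min-Max procedure in Python; min_max_normalize is a hypothetical helper name, applied here to the marks of the example above.

# Min-Max normalization to a target range (default 0-1).
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min
            for v in values]

marks = [88, 90, 92, 94]
print([round(v, 2) for v in min_max_normalize(marks)])
# [0.0, 0.33, 0.67, 1.0]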
z-Score Normalization
In z-score normalization, the difference between the field value and the mean value is scaled by the standard deviation of the attribute:
z = (x − μ) / σ
For example, the marks {10, 20, 30} have mean 20 and sample standard deviation 10. Hence, the z-scores of the marks 10, 20, 30 are −1, 0 and 1, respectively.
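A corresponding sketch of z-score normalization, using the sample standard deviation from the standard library statistics module:

import statistics

# z-score normalization: subtract the mean, divide by the standard deviation.
def z_score_normalize(values):
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)   # sample standard deviation (N - 1)
    return [(v - mu) / sigma for v in values]

print(z_score_normalize([10, 20, 30]))  # [-1.0, 0.0, 1.0]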
2.4 DESCRIPTIVE STATISTICS
1. Descriptive statistics is used to summarize and describe data.
Dataset and Data Types
1. A dataset can be viewed as a collection of data objects.
2. The data objects may be records, points, vectors, patterns, events, cases, samples or observations.
3. These records contain many attributes.
4. An attribute can be defined as a property or characteristic of an object.
1. Every attribute should be associated with a value. This process of assigning values is called measurement.
2. The type of attribute determines the data type, often referred to as the measurement scale type.
Data can broadly be classified as:
1. Categorical or qualitative data
2. Numerical or quantitative data
Categorical or Qualitative Data
• Nominal Data – e.g., patient ID.
1. Nominal data can be categorized but has no numerical value.
2. The nominal data type provides only labels and has no ordering among the data.
3. Only operations like (=, ≠) are meaningful for these data.
• Ordinal Data –
1. It provides enough information and has a natural order.
2. Fever = {Low, Medium, High} is ordinal data: certainly, low is less than medium and medium is less than high, irrespective of the underlying values.
3. Any order-preserving transformation can be applied to these data to get a new value.
Numeric or Quantitative Data
• Interval Data –
1. Interval data is numeric data for which the differences between values are meaningful.
2. For example, the difference between 30 degrees and 40 degrees is meaningful.
3. The only permissible operations are + and −.
• Ratio Data –
1. For ratio data, both differences and ratios are meaningful.
Another way of classifying the data is to classify it as:
1. Discrete Data – This kind of data is recorded as integers.
2. Continuous Data – This kind of data can be fitted into a range and may include a decimal point. For example, age is continuous data: though age appears to be discrete, one may be 12.5 years old and it makes sense. Patient height and weight are also continuous data.
It can be observed that the number of students with 22 marks is 2. The total number of students is 10. So, 2/10 × 100 = 20% of the space in a pie of 100% is allotted to the mark 22.
Histogram – shows frequency distributions. The histogram for students' marks {45, 60, 60, 80, 85} in the group ranges 0-25, 26-50, 51-75 and 76-100 is given in Figure 2.5. One can visually inspect from Figure 2.5 that the number of students in the range 76-100 is 2. A sketch of such a histogram follows below.
Dot Plots – less cluttered as compared to bar charts. The dot plot of English marks for five students with IDs {1, 2, 3, 4, 5} and marks {45, 60, 60, 80, 85} is given. The advantage is that, by visual inspection, one can find out who got more marks.
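A minimal sketch of the histogram described above, assuming the matplotlib library is available; it recreates the grouping of the marks into the four ranges (the original figure is not reproduced here).

import matplotlib.pyplot as plt

# Students' marks grouped into the ranges 0-25, 26-50, 51-75, 76-100.
marks = [45, 60, 60, 80, 85]
bin_edges = [0, 25, 50, 75, 100]

plt.hist(marks, bins=bin_edges, edgecolor="black")
plt.xlabel("Marks range")
plt.ylabel("Number of students")
plt.title("Frequency distribution of students' marks")
plt.show()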
2.5.2 Central Tendency
1. Mean – The arithmetic average (or mean) is a measure of central tendency that represents the 'center' of the dataset. Mathematically, the average of all the values in the sample (population) is denoted as x̄. Let x1, x2, …, xN be a set of 'N' values or observations; then the arithmetic mean is given as:
x̄ = (x1 + x2 + … + xN) / N
2. Median – For grouped data, the median class is the class where the (N/2)th item is present. The median is then given as:
Median = L1 + ((N/2 − cf) / f) × i
Here, i is the class interval of the median class, L1 is the lower limit of the median class, f is the frequency of the median class, and cf is the cumulative frequency of all classes preceding the median class.
3. Mode – The mode is the value that occurs most frequently in the dataset; the value that has the highest frequency is called the mode.
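All three measures can be computed with the standard library statistics module, as in this minimal sketch with the hypothetical marks used earlier:

import statistics

marks = [45, 60, 60, 80, 85]

print(statistics.mean(marks))    # arithmetic mean: 66
print(statistics.median(marks))  # middle value:    60
print(statistics.mode(marks))    # most frequent:   60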
2.5.3 Dispersion
The spread of a dataset around the central tendency (mean, median or mode) is called dispersion. Dispersion is represented in various ways, such as range, variance, standard deviation, and standard error. These are second-order measures.
1. Range – the difference between the maximum and minimum values of the given list of data.
2. Standard Deviation – the average distance from the mean of the dataset to each point. The formula for the standard deviation of a population is:
σ = sqrt( Σ (xi − μ)² / N )
Here, N is the size of the population, xi is an observation or value from the population, and μ is the population mean. Often, N − 1 is used instead of N in the denominator of Eq. (2.8); this gives the sample standard deviation, as sketched below.
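A minimal sketch contrasting the two denominators (N versus N − 1) with the standard library statistics module, using hypothetical marks:

import statistics

marks = [10, 20, 30]

print(statistics.pstdev(marks))  # population std (N in denominator):  ~8.16
print(statistics.stdev(marks))   # sample std (N - 1 in denominator):  10.0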
Quartiles and Inter Quartile Range
1. Percentiles describe data in terms of the percentage of the total values that lie at or below a given value.
2. The kth percentile Xi has the property that k% of the data lies at or below Xi.
3. For example, the median is the 50th percentile and can be denoted as Q0.50. The 25th percentile is called the first quartile (Q1) and the 75th percentile is called the third quartile (Q3).
4. The Inter Quartile Range (IQR) is the difference between Q3 and Q1, that is, IQR = Q0.75 − Q0.25.
5. Outliers are normally the values falling at least 1.5 × IQR above the third quartile or below the first quartile.
Example 2.4: For the patients' age list {12, 14, 19, 22, 24, 26, 28, 31, 34}, find the IQR.
Solution: The median is in the fifth position; here, 24 is the median. The first quartile is the median of the scores below the median, i.e., of {12, 14, 19, 22}. In this case, it is the average of the second and third values, that is, Q0.25 = (14 + 19)/2 = 16.5.
Similarly, the third quartile is the median of the values above the median, that is, of {26, 28, 31, 34}. So, Q0.75 is the average of the seventh and eighth scores of the full list, that is, (28 + 31)/2 = 59/2 = 29.5.
Hence, the IQR using Eq. (2.10) is:
IQR = Q0.75 − Q0.25 = 29.5 − 16.5 = 13
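A minimal sketch reproducing Example 2.4 with the standard library; statistics.quantiles with its default (exclusive) method yields exactly these quartiles.

import statistics

ages = [12, 14, 19, 22, 24, 26, 28, 31, 34]

q1, q2, q3 = statistics.quantiles(ages, n=4)  # quartile cut points
iqr = q3 - q1

print(q1, q2, q3)  # 16.5 24.0 29.5
print(iqr)         # 13.0

# Values beyond 1.5 * IQR from the quartiles are flagged as outliers.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in ages if x < lower or x > upper]
print(outliers)    # [] - no outliers in this list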
Five-point Summary and Box Plots – The median, the quartiles Q1 and Q3, and the minimum and maximum, written in the order <Minimum, Q1, Median, Q3, Maximum>, are known as the five-point summary.
Example 2.5: Find the 5-point summary of the list {13, 11, 2, 3, 4, 8, 9}.
Solution: The minimum is 2 and the maximum is 13. Q1, Q2 and Q3 are 3, 8 and 11, respectively. Hence, the 5-point summary is {2, 3, 8, 11, 13}, that is, {minimum, Q1, median, Q3, maximum}. Box plots are useful for visualizing the 5-point summary. The box plot for the set is given in Figure 2.7.
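A minimal sketch computing the five-point summary of Example 2.5; the optional box plot at the end assumes matplotlib is available.

import statistics

data = [13, 11, 2, 3, 4, 8, 9]

q1, median, q3 = statistics.quantiles(data, n=4)
summary = (min(data), q1, median, q3, max(data))
print(summary)  # (2, 3.0, 8.0, 11.0, 13)

# A box plot visualizes the same five numbers:
import matplotlib.pyplot as plt
plt.boxplot(data)
plt.show()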
2.5.4 Shape
Skewness and kurtosis (called moments) indicate the symmetry/asymmetry and peak location of the dataset.
Skewness
The measures of the direction and degree of symmetry are called measures of third order. Ideally, skewness should be zero, as in an ideal normal distribution. More often, the given dataset may not have perfect symmetry (consider Figure 2.8).
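A minimal sketch, assuming SciPy is available, showing how skewness and kurtosis behave on a symmetric versus a right-skewed hypothetical dataset:

from scipy.stats import skew, kurtosis

symmetric = [10, 20, 30, 40, 50]
right_skewed = [10, 11, 12, 13, 50]

print(skew(symmetric))      # 0.0 - perfectly symmetric data
print(skew(right_skewed))   # > 0 - long tail on the right
print(kurtosis(symmetric))  # excess kurtosis relative to a normal peak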
Mean Absolute Deviation (MAD) and Coefficient of Variation (CV)