Lecture2
3rd Year
Spring 2025
➢ Transaction data
TID  Items
1    Bread, Coke, Milk
2    Beer, Bread
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Diaper, Milk
➢ Document data (term counts per document)
            team  coach  play  ball  score  game  win  lost  timeout  season
Document 1     3      0     5     0      2     6    0     2        0       2
Document 2     0      7     0     2      1     0    0     3        0       0
Document 3     0      1     0     0      1     2    2     0        3       0
➢ Molecular Structures
➢ Molecular structures refer to the arrangement of atoms within
a molecule, including the bonding patterns, spatial
positioning, and interactions that define the molecule's shape
and properties.
Data Objects
➢Data sets are made up of data objects; a data object represents an entity.
➢Examples:
➢sales database: customers, store items, sales
➢medical database: patients, treatments
➢university database: students, professors, courses
➢Also called samples, examples, instances, data points, objects, or tuples
➢Data objects are described by attributes
➢Database rows → data objects; columns → attributes
Attributes
➢Attribute (also called dimension, feature, or variable)
➢A data field representing a characteristic or feature of a data object.
➢E.g., customer_ID, name, address
➢Types:
➢Nominal (e.g., colors (red, blue), name, …)
➢Binary (e.g., {true, false})
➢Ordinal (e.g., {freshman, sophomore, junior, senior})
➢Numeric: quantitative (represented as numbers)
➢Length, weight, …
➢Interval-scaled: measured on a scale of equal-sized units
(e.g., students' scores, which can then be banded into grades: 0 to 59: F, 60 to 70: D, …)
➢Ratio-scaled: has a true zero point, so ratios between values are meaningful
(e.g., the share of students who got an A: 5/50 = 10%)
➢Discrete attributes (values cannot be divided into parts) and continuous attributes
(values can be divided)
Attribute Types
String variable: letters and numbers
Central Tendency:
➢Try it yourself:
Two dice were thrown 10 times. For each throw, the two scores were added together
and recorded:
7, 5, 2, 7, 6, 12, 10, 4, 8, 9
Find the mean of the recorded scores.
Mean = (7 + 5 + 2 + 7 + 6 + 12 + 10 + 4 + 8 + 9) / 10 = 70 / 10 = 7
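The same calculation can be checked in a few lines of Python. This is only a minimal sketch: the variable names are illustrative, and `statistics.mean` from the standard library is used as a cross-check rather than as the method prescribed by the lecture.

```python
from statistics import mean

# Recorded dice totals from the exercise above.
scores = [7, 5, 2, 7, 6, 12, 10, 4, 8, 9]

# Mean = sum of the values divided by how many values there are.
manual_mean = sum(scores) / len(scores)  # 70 / 10

# Cross-check against the standard library.
library_mean = mean(scores)

print(manual_mean)                   # 7.0
print(manual_mean == library_mean)   # True
```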
Measuring the Central Tendency: Weighted Mean
➢Weighted arithmetic mean:
➢Sometimes each value x_i in a set may be associated with a weight w_i, for i = 1, …, n:
\[ \bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} \]
➢ The weights reflect the significance, importance, or occurrence frequency attached to their
respective values.
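As a rough illustration of the weighted mean, the sketch below multiplies each value by its weight, sums the products, and divides by the total weight. The function name `weighted_mean` and the sample numbers are hypothetical and chosen only for illustration.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Hypothetical data: the value 3 carries twice the weight of the others.
print(weighted_mean([1, 2, 3], [1, 1, 2]))  # (1 + 2 + 6) / 4 = 2.25
```

When all weights are equal, the result reduces to the ordinary arithmetic mean.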
➢Try it yourself:
Find the weighted mean for the following data set: w = {2, 5, 6, 8, 9}, x = {4, 3, 7, 5, 6}
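One way to work this exercise, pairing each weight with the value in the same position (w = {2, 5, 6, 8, 9}, x = {4, 3, 7, 5, 6}):

\[
\bar{x} = \frac{\sum_{i=1}^{5} w_i x_i}{\sum_{i=1}^{5} w_i}
        = \frac{2 \cdot 4 + 5 \cdot 3 + 6 \cdot 7 + 8 \cdot 5 + 9 \cdot 6}{2 + 5 + 6 + 8 + 9}
        = \frac{8 + 15 + 42 + 40 + 54}{30}
        = \frac{159}{30} = 5.3
\]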
Measuring the Central Tendency: Trimmed Mean
➢Trimmed mean:
➢The mean computed after chopping off the extreme values at the low and high ends of the sorted data
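A minimal sketch of a trimmed mean follows. It assumes the same fraction of values is dropped from each end of the sorted data; the function name `trimmed_mean`, the `proportion` parameter, and the salary figures are hypothetical and not taken from the lecture.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the sorted values from each end."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in the range [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)       # how many values to drop per end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Hypothetical salaries (in thousands) with one extreme value.
salaries = [30, 32, 35, 36, 38, 40, 41, 43, 45, 400]

print(sum(salaries) / len(salaries))   # ordinary mean: 740 / 10 = 74.0
print(trimmed_mean(salaries, 0.10))    # drops 30 and 400: 310 / 8 = 38.75
```

The single extreme value pulls the ordinary mean far above most of the data, while the trimmed mean stays close to the typical values.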
Measuring the Central Tendency: (2) Median
➢Median:
➢After sorting the values in the data set, the median is the middle value if the data set has an odd
number of values, or the average of the two middle values otherwise
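The rule above can be sketched directly in Python. The function name `median` is illustrative; the first call reuses the dice totals from earlier in this lecture, and the second uses a small made-up list to show the odd case.

```python
def median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average of two


# Even number of values: sorted data is 2, 4, 5, 6, 7, 7, 8, 9, 10, 12.
print(median([7, 5, 2, 7, 6, 12, 10, 4, 8, 9]))  # (7 + 7) / 2 = 7.0

# Odd number of values (made-up list).
print(median([3, 1, 2]))  # 2
```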
Measuring the Central Tendency: Median of large dataset
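Calculate the mean and median household size (number of family members) for 30 families.
➢There is an even number of families (30), so the median is the average of the 15th and 16th values in sorted order.
➢Relative frequency (%) = (frequency / total frequency) × 100, e.g., (3/30) × 100 = 10.0 and (4/30) × 100 = 13.3
➢Cumulative frequency is the running total of the frequencies: 3, then 3 + 4 = 7 (size 2 is included),
then 3 + 4 + 10 = 17 (sizes 2 and 3 are included), then 3 + 4 + 10 + 4 = 21, and so on.
➢The 15th and 16th families both fall within the cumulative frequency of 17, so the median household size is 3.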
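The cumulative-frequency approach can be sketched as follows. The frequency table in this example is entirely hypothetical (20 families, not the 30 from the slide), and the function name is illustrative.

```python
def mean_and_median_from_frequencies(freq):
    """freq maps each distinct value to the number of times it occurs."""
    items = sorted(freq.items())
    total = sum(count for _, count in items)
    if total == 0:
        raise ValueError("the table contains no observations")

    mean = sum(value * count for value, count in items) / total

    def value_at(position):
        """Value at a 1-based position in the sorted data, via running totals."""
        running = 0
        for value, count in items:
            running += count              # cumulative frequency so far
            if position <= running:
                return value

    if total % 2 == 1:
        median = value_at((total + 1) // 2)
    else:
        median = (value_at(total // 2) + value_at(total // 2 + 1)) / 2
    return mean, median


# Hypothetical table: household size -> number of families (20 families in total).
household_sizes = {1: 2, 2: 5, 3: 8, 4: 3, 5: 2}

print(mean_and_median_from_frequencies(household_sizes))  # (2.9, 3.0)
```

Here the cumulative frequencies are 2, 7, 15, 18, 20, so the 10th and 11th families both have size 3.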
To get the median of grouped data, compute the cumulative frequencies and locate the middle position among all
employees. In this example there are 120 employees, and 120 / 2 = 60, so the middle position is the 60th employee.
The 60th employee falls in the class whose cumulative frequency is 77, so the median interval is 40$ to 60$.
Width of the median interval = 60 - 40 = 20
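Once the median interval is known, a common way to estimate the median itself is linear interpolation inside that interval: median ≈ L + ((N/2 − CF) / f) × width, where L is the lower boundary of the median interval, N the total number of observations, CF the cumulative frequency before the interval, and f the frequency of the interval. The sketch below assumes this standard interpolation formula is the one intended; the function name and the class frequencies are hypothetical, not the data from the slide.

```python
def grouped_median(classes):
    """Estimate the median of grouped data by linear interpolation.

    classes: list of (lower, upper, frequency) tuples sorted by lower bound.
    """
    total = sum(freq for _, _, freq in classes)
    half = total / 2
    cumulative = 0  # cumulative frequency of all classes before the current one
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= half:
            width = upper - lower
            return lower + (half - cumulative) / freq * width
        cumulative += freq
    raise ValueError("the table contains no observations")


# Hypothetical salary classes: (lower, upper, number of employees).
salary_classes = [(0, 20, 10), (20, 40, 20), (40, 60, 40), (60, 80, 30)]

# N = 100, N/2 = 50, cumulative frequencies are 10, 30, 70, 100,
# so the median interval is 40 to 60 with width 20.
print(grouped_median(salary_classes))  # 40 + (50 - 30) / 40 * 20 = 50.0
```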
Measuring the Central Tendency: Median of grouped data
Try it yourself
The following data come from a survey of the heights (in cm) of 51 girls of Class X. Find the median height.
Thank You