Lec 7 Data Visualization Basic Statistics Updated 21102024 122008pm
Example 1
• Create a new feature called "Family Size" by combining the
SibSp (siblings/spouses aboard) and Parch (parents/children aboard) columns.
How does family size affect the survival rate? Use a bar plot to visualize
the survival rates for different family sizes.
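import pandas as pd
import matplotlib.pyplot as plt
# Load the data (replace the path with your actual dataset)
df = pd.read_csv('your_dataset.csv')
# Step 1: Create the "Family Size" feature (+1 counts the passenger themself)
df['Family Size'] = df['SibSp'] + df['Parch'] + 1
# Display how many passengers have each family size
family_size_counts = df['Family Size'].value_counts().sort_index()
print(family_size_counts)
# Step 2: Count the passengers in each family size group
# Note: .count() returns the number of non-null 'Survived' entries per group,
# i.e. the group sizes, NOT a survival rate (use .mean() for the rate)
family_passenger_count = df.groupby('Family Size')['Survived'].count()
print(family_passenger_count)
# Step 3: Plot the number of passengers for each family size
plt.figure(figsize=(10, 6))
plt.bar(family_passenger_count.index, family_passenger_count.values, color='skyblue')
plt.title('Number of Passengers by Family Size')
plt.xlabel('Family Size')
plt.ylabel('Number of Passengers')
plt.xticks(family_passenger_count.index)  # To show each family size as a tick
plt.show()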
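To make Step 1 concrete, here is a minimal sketch on a tiny hypothetical DataFrame. The four rows are invented for illustration (they are not Titanic records); only the column names SibSp, Parch and Survived match the dataset.

```python
import pandas as pd

# Hypothetical toy rows that reuse the Titanic column names
df = pd.DataFrame({
    'SibSp': [1, 0, 3, 0],
    'Parch': [0, 0, 2, 1],
    'Survived': [1, 0, 0, 1],
})

# Family size = siblings/spouses + parents/children + the passenger themself
df['Family Size'] = df['SibSp'] + df['Parch'] + 1
print(df['Family Size'].tolist())

# Number of passengers for each family size, ordered by family size
family_size_counts = df['Family Size'].value_counts().sort_index()
print(family_size_counts.to_dict())
```

A passenger travelling alone gets a family size of 1, which is why the formula adds 1.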
import pandas as pd
import matplotlib.pyplot as plt
# Load the data (replace the path with your actual dataset)
df = pd.read_csv('your_dataset.csv')
# Step 1: Create the "Family Size" feature
df['Family Size'] = df['SibSp'] + df['Parch'] + 1
# Step 2: Calculate the survival rate for each family size
family_survival_rate = df.groupby('Family Size')['Survived'].mean()
# Step 3: Plotting the survival rates for different family sizes
plt.figure(figsize=(10, 6))
plt.bar(family_survival_rate.index, family_survival_rate.values, color='skyblue')
plt.title('Survival Rate Based on Family Size')
plt.xlabel('Family Size')
plt.ylabel('Survival Rate')
plt.xticks(family_survival_rate.index)  # To show each family size as a tick
plt.show()
.mean() calculates the mean (average) of the "Survived" column for each group. Since the values in "Survived" are either 0 or 1, the mean is the proportion of survivors for each family size.
By contrast, df.groupby('Family Size')['Survived'].count() counts the number of non-null entries in the "Survived" column for each group (i.e., each family size). That is the number of passengers with that family size, not their survival rate.
Output: bar plot "Survival Rate Based on Family Size"
November 2, 2024 22
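The difference between .count() and .mean() is easiest to see on a small example. This sketch uses a hypothetical six-row DataFrame (invented values, not Titanic data) with a 0/1 Survived column:

```python
import pandas as pd

# Hypothetical toy data: 'Survived' is 0/1, as in the Titanic dataset
df = pd.DataFrame({
    'Family Size': [1, 1, 1, 2, 2, 4],
    'Survived':    [1, 0, 0, 1, 1, 0],
})

grouped = df.groupby('Family Size')['Survived']

# .count(): number of non-null 'Survived' entries per family size (group size)
counts = grouped.count()
# .mean(): average of the 0/1 values = proportion of survivors per family size
rates = grouped.mean()

print(counts.to_dict())
print(rates.round(3).to_dict())
```

Here .count() reports how many passengers fall into each family size, while .mean() reports the share of them who survived (for family size 1, one survivor out of three passengers).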
Mean
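The slide content for the mean did not survive extraction. As a sketch of the standard arithmetic mean (the sum of the observations divided by their number), assuming the data is a plain Python list; arithmetic_mean is a hypothetical helper name, cross-checked against the standard library's statistics.mean:

```python
from statistics import mean

def arithmetic_mean(values):
    """Sum of the observations divided by how many there are."""
    if not values:
        raise ValueError("mean of empty data is undefined")
    return sum(values) / len(values)

data = [2, 4, 6, 6]  # same data as the skewness example in this lecture
print(arithmetic_mean(data))  # 4.5
print(mean(data))             # 4.5 (standard-library check)
```

In pandas, the same value for a column is given by df['column'].mean().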
Median
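A minimal sketch of the median, assuming a plain Python list as input: sort the data, take the middle value when n is odd, and average the two middle values when n is even. median_of is a hypothetical helper name; statistics.median gives the same result.

```python
from statistics import median

def median_of(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    if not values:
        raise ValueError("median of empty data is undefined")
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

print(median_of([7, 1, 3]))     # odd n: sorted [1, 3, 7] -> 3
print(median_of([2, 4, 6, 6]))  # even n: (4 + 6) / 2 -> 5.0
print(median([2, 4, 6, 6]))     # standard-library check -> 5.0
```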
Example
Mode
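The mode is the most frequent value, and a data set can have more than one. A minimal sketch, assuming hashable values in a list; modes is a hypothetical helper name, and statistics.multimode (Python 3.8+) returns the same list:

```python
from collections import Counter
from statistics import multimode

def modes(values):
    """All values that occur with the highest frequency (data may be multimodal)."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

print(modes([2, 4, 6, 6]))         # [6]
print(modes([1, 1, 2, 2, 3]))      # [1, 2] (bimodal)
print(multimode([1, 1, 2, 2, 3]))  # [1, 2]
```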
Example: Mode of Grouped Data
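The worked example on this slide was lost. A sketch of the commonly used formula for the mode of grouped data, Mode = L + (f1 - f0) / (2*f1 - f0 - f2) * h, where L is the lower boundary of the modal class, f1 its frequency, f0 and f2 the frequencies of the classes just before and after it, and h the class width. The frequency table is hypothetical, and grouped_mode (a hypothetical helper) assumes equal-width, contiguous classes and takes the first class with the highest frequency as the modal class:

```python
def grouped_mode(lower_bounds, width, freqs):
    """Mode of grouped data: L + (f1 - f0) / (2*f1 - f0 - f2) * h."""
    k = max(range(len(freqs)), key=lambda i: freqs[i])   # index of the modal class
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0                    # class before (0 if none)
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0       # class after (0 if none)
    L = lower_bounds[k]
    denom = 2 * f1 - f0 - f2
    if denom == 0:
        raise ValueError("formula undefined: 2*f1 - f0 - f2 is zero")
    return L + (f1 - f0) / denom * width

# Hypothetical frequency table: classes 0-10, 10-20, 20-30, 30-40, 40-50
lower = [0, 10, 20, 30, 40]
freq = [5, 8, 12, 7, 3]
print(round(grouped_mode(lower, 10, freq), 2))  # modal class 20-30 -> 24.44
```

Here the modal class is 20-30 (frequency 12), so Mode = 20 + (12 - 8) / (24 - 8 - 7) * 10 = 20 + 40/9.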
Midrange
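The midrange is the average of the smallest and largest observations. A minimal sketch (midrange is a hypothetical helper name):

```python
def midrange(values):
    """Average of the smallest and largest observations."""
    if not values:
        raise ValueError("midrange of empty data is undefined")
    return (min(values) + max(values)) / 2

print(midrange([2, 4, 6, 6]))    # (2 + 6) / 2 -> 4.0
print(midrange([2, 4, 6, 100]))  # one outlier pulls it to 51.0
```

Because it uses only the two extreme values, a single outlier moves the midrange a lot, which is why it is less robust than the median.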
November 2, 2024 33
Example
November 2, 2024 34
Symmetric Data
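For symmetric data the mean and the median coincide, and for symmetric unimodal data the mode is the same value as well. A quick check on a hypothetical data set that is symmetric around 3:

```python
from statistics import mean, median, mode

# Hypothetical data, symmetric around 3
data = [1, 2, 3, 3, 3, 4, 5]
print(mean(data), median(data), mode(data))  # all three equal 3
```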
November 2, 2024 35
Karl Pearson’s Co-efficient of Skewness
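Karl Pearson's coefficient of skewness (the median-based form used in the example below) is
$$Sk = \frac{3(\text{Mean} - \text{Median})}{\text{S.D.}}$$
A positive value means the data are skewed to the right, a negative value means they are skewed to the left, and a value of 0 indicates symmetric data.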
Example 1: Find the skewness of the data (2, 4, 6, 6).
Solution:
Mean of Data = (2 + 4 + 6 + 6) / 4
= 18 / 4
= 4.5
Median of Data = [4+6]/2
= 10/2=5
S.D. = $\sqrt{\dfrac{(4.5-2)^2 + (4.5-4)^2 + (4.5-6)^2 + (4.5-6)^2}{4}}$ (population S.D., dividing by n = 4)
= $\sqrt{\dfrac{6.25 + 0.25 + 2.25 + 2.25}{4}}$
= $\sqrt{2.75}$
≈ 1.658
Skewness = 3(Mean – Median)/S.D.
Applying the skewness formula:
Skewness = 3(4.5 – 5)/1.658
= 3(-0.5)/1.658
= -1.5/1.658
≈ -0.905
So the skewness of these data is negative: the data are skewed to the left (the mean, 4.5, is below the median, 5).
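The worked example can be checked numerically. A sketch using Python's statistics module, where pstdev is the population standard deviation (dividing by n), matching the hand calculation above; pearson_skewness is a hypothetical helper name:

```python
from statistics import mean, median, pstdev

def pearson_skewness(values):
    """Karl Pearson's median-based coefficient: 3 * (mean - median) / S.D."""
    sd = pstdev(values)  # population S.D. (divide by n), as in the worked example
    if sd == 0:
        raise ValueError("skewness undefined when all values are equal")
    return 3 * (mean(values) - median(values)) / sd

data = [2, 4, 6, 6]
print(mean(data), median(data))          # 4.5 5.0
print(round(pstdev(data), 3))            # 1.658
print(round(pearson_skewness(data), 3))  # -0.905
```

If the sample standard deviation (statistics.stdev, dividing by n - 1) were used instead, S.D. would be about 1.915 and the coefficient about -0.783; the sign, and therefore the conclusion, stays the same.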
Solve Example