Unit 4
Male 80 20 100
Female 70 30 100
Applications in Development:
• Bug Analysis: Understand patterns in bug occurrence, such as which severity levels remain
unresolved.
• Feature Usage: Analyze how frequently features are used across different user demographics or
time periods.
• Error Reporting: Categorize errors by type and frequency across modules or teams.
• A/B Testing: Compare user behavior under different conditions.
• Example Analysis:
• From the table above, we see that:
– High-severity bugs are less frequent, but a larger share of them remain unresolved (10/17 = 58.8% remain open).
– Medium-severity bugs have more resolved cases (10/30 = 33.3%).
– Low-severity bugs are mostly resolved or closed, suggesting they are easier to handle.
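import pandas as pd

# Example data: severity and status of a few bug reports
data = {
    'Severity': ['High', 'High', 'Medium', 'Low', 'Low', 'Medium', 'High'],
    'Status': ['Open', 'Resolved', 'Open', 'Closed', 'Resolved', 'Open', 'Closed']
}

# Create a DataFrame
df = pd.DataFrame(data)

# Build the contingency table: counts of each Severity/Status combination
contingency_table = pd.crosstab(df['Severity'], df['Status'])
print(contingency_table)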
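Percentages like those quoted above are row proportions: each cell divided by its row total. Below is a minimal pandas sketch of that calculation. It reuses the same seven illustrative bug records, which are not the data behind the quoted percentages. `pd.crosstab` accepts `normalize='index'` to scale each row to sum to 1, and `margins=True` to append row and column totals labeled `All`.

```python
import pandas as pd

# Same illustrative bug records as in the example above
data = {
    'Severity': ['High', 'High', 'Medium', 'Low', 'Low', 'Medium', 'High'],
    'Status': ['Open', 'Resolved', 'Open', 'Closed', 'Resolved', 'Open', 'Closed'],
}
df = pd.DataFrame(data)

# Raw counts, with row and column totals added as "All"
counts = pd.crosstab(df['Severity'], df['Status'], margins=True)

# Row percentages: share of each severity level falling into each status
row_pct = pd.crosstab(df['Severity'], df['Status'], normalize='index') * 100

print(counts)
print(row_pct.round(1))
```

In this toy sample, both medium-severity bugs are still open, so the `Medium`/`Open` cell of `row_pct` is 100%.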
Scatter Plots and Resistant Lines
• Scatter Plot
• A scatter plot is a graphical representation of the relationship between two
variables. Each point on the plot represents an observation, with its position
determined by the values of the two variables.
• X-axis: Represents the independent variable.
• Y-axis: Represents the dependent variable.
• Points: Represent observations, with coordinates (x, y).
• Scatter plots are often used to:
• Visualize relationships: Identify patterns, correlations, or clusters.
• Detect outliers: Spot points that deviate significantly from the general trend.
• Assess trends: Help determine whether a relationship is linear or non-linear (for example, quadratic).
• Example:
• If you are studying the relationship between hours studied (X) and exam
scores (Y), a scatter plot could help determine if more study hours generally
lead to better scores.
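Here is a minimal matplotlib sketch of such a plot. The hours and scores are made-up illustrative values, not real data, and the output filename `study_scatter.png` is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a file, not a window
import matplotlib.pyplot as plt

# Hypothetical observations: hours studied (X) and exam score (Y)
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 64, 70, 72, 78, 83]

fig, ax = plt.subplots()
ax.scatter(hours, scores)            # one point per student at (hours, score)
ax.set_xlabel("Hours studied (X)")   # independent variable
ax.set_ylabel("Exam score (Y)")      # dependent variable
ax.set_title("Hours studied vs. exam score")
fig.savefig("study_scatter.png")
plt.close(fig)
```

An upward-sloping cloud of points like this one would suggest that more study hours tend to go with higher scores. The plot alone does not establish causation.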
Resistant Line
• A resistant line is a robust line fitted to a scatter plot that is less affected by outliers than the ordinary least-squares regression line. It summarizes the central trend in the data.
• Characteristics:
• Resistant to Outliers: Unlike the least-squares regression line (which minimizes the sum of squared vertical deviations of the points from the line), resistant lines are not overly influenced by extreme points.
• Approximation of Trends: Offers a more realistic summary of the data when outliers or non-uniform variance are present.
• Simpler Computation: Often calculated using medians or other resistant
measures.
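The sketch below shows why resistance matters. It fits an ordinary least-squares line with NumPy's `np.polyfit` to made-up data lying exactly on y = 2x + 1. It then refits after replacing a single point with an extreme outlier. Changing one point out of ten moves the slope from 2 to roughly 6.3.

```python
import numpy as np

# Hypothetical data lying exactly on the line y = 2x + 1
x = np.arange(1, 11, dtype=float)
y = 2 * x + 1

# Least-squares fit on the clean data recovers slope 2, intercept 1
slope_clean, intercept_clean = np.polyfit(x, y, deg=1)

# Replace the last y-value with an extreme outlier and refit
y_outlier = y.copy()
y_outlier[-1] = 100.0
slope_out, intercept_out = np.polyfit(x, y_outlier, deg=1)

print(f"clean:        slope={slope_clean:.2f}, intercept={intercept_clean:.2f}")
print(f"with outlier: slope={slope_out:.2f}, intercept={intercept_out:.2f}")
```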
• Construction:
• A common approach to constructing a resistant line is Median-Median Line
Fitting:
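• Sort the data by x and divide it into three groups of as nearly equal size as possible (low, middle, and high x-values).
• For each group, compute the median x-value and the median y-value separately, giving three summary points (x_L, y_L), (x_M, y_M), (x_R, y_R).
• Take the slope from the two outer summary points, b = (y_R - y_L) / (x_R - x_L), and the intercept as the average of the intercepts through all three summary points, a = [(y_L - b·x_L) + (y_M - b·x_M) + (y_R - b·x_R)] / 3, forming the resistant line y = a + b·x.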
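The median-median steps above can be sketched in Python as follows. This is one common variant: the slope comes from the outer two groups, and the intercept is averaged over all three summary points. The function name `median_median_line` is hypothetical. The rule for group sizes when n is not divisible by 3 is one common convention, and textbooks differ slightly on it.

```python
import numpy as np

def median_median_line(x, y):
    """Fit a median-median (resistant) line; return (slope, intercept).

    Assumed group-size convention: if n = 3k + 1 the middle group gets the
    extra point; if n = 3k + 2 each outer group gets one extra point.
    Ties in x are not treated specially.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points")

    # Sort observations by x so the groups are low, middle, high
    order = np.argsort(x, kind="stable")
    x, y = x[order], y[order]

    k, r = divmod(n, 3)
    outer = k + (1 if r == 2 else 0)  # size of each outer group

    # Summary point of each group: (median x, median y), taken separately
    xl, yl = np.median(x[:outer]), np.median(y[:outer])
    xm, ym = np.median(x[outer:n - outer]), np.median(y[outer:n - outer])
    xr, yr = np.median(x[n - outer:]), np.median(y[n - outer:])

    # Slope from the two outer summary points
    slope = (yr - yl) / (xr - xl)
    # Intercept: average of the intercepts implied by all three summary points
    intercept = ((yl - slope * xl) + (ym - slope * xm) + (yr - slope * xr)) / 3
    return slope, intercept

# Hypothetical data on y = 2x + 1, with one extreme outlier at x = 10
x = np.arange(1, 11, dtype=float)
y = 2 * x + 1
y[-1] = 100.0

slope, intercept = median_median_line(x, y)
print(f"median-median line: y = {slope:.2f}x + {intercept:.2f}")
# prints "median-median line: y = 2.00x + 1.00"
```

Unlike the least-squares fit, this line is unaffected by the outlier at x = 10. The outlier only enters the high group, whose median y stays at 19, so the line y = 2x + 1 is recovered exactly.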
Transformations in Bivariate Analysis