Meeting 5 - Data Analysis

Uploaded by

tastudies22
Copyright
© © All Rights Reserved
DATA ANALYSIS

MEETING 5
CORRELATION
COURSE CONTENT
 Introducing correlation
 Types of Correlation
 Methods of Measuring Correlation
 Significance Test in Correlation Analysis
 Steps in Correlation Analysis
 Case Study Correlation Analysis : Lung Cancer
Dataset
COURSE LEARNING
OBJECTIVES
After attending this course, students will be able to :
 Explain the basic concepts of correlation analysis.
 Explain how to measure the strength and direction
of the relationship between variables using the
correlation coefficient.
 Recognize the types of correlation and when to
use each method.
 Interpret correlation analysis results in the context
of relevant problems.
WHAT IS CORRELATION?
 Correlation is a statistical method used to measure
the strength and direction of the relationship
between two or more variables.
 Correlation only demonstrates how closely two
variables are related to one another; it does not
establish a cause and effect relationship.
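The point about cause and effect can be illustrated with a small simulation. This is a minimal sketch on synthetic data: the confounder `z` and the helper `residuals` are hypothetical names introduced here, not part of the course material. Both `x` and `y` depend on `z` but not on each other, yet they correlate strongly until the effect of `z` is removed.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical confounder z drives both x and y; x never influences y.
z = rng.normal(size=n)
x = z + 0.5 * rng.normal(size=n)
y = z + 0.5 * rng.normal(size=n)

# Raw correlation between x and y
r_xy = np.corrcoef(x, y)[0, 1]


def residuals(v, z):
    """Remove the best-fit linear effect of z from v."""
    slope, intercept = np.polyfit(z, v, 1)
    return v - (slope * z + intercept)


# Correlation that remains once z has been accounted for
r_partial = np.corrcoef(residuals(x, z), residuals(y, z))[0, 1]

print(f"corr(x, y)             = {r_xy:.3f}")
print(f"corr(x, y) given z     = {r_partial:.3f}")
```

The first value is large and the second is close to zero, so the apparent relationship between `x` and `y` comes entirely from their shared dependence on `z`.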
THE OBJECTIVE OF
CORRELATION ANALYSIS
 Find patterns of relationships between variables.
 Understand the strength and direction of the
relationship.
 Guide decision-making.
TYPES OF CORRELATION
 Positive Correlation
• Both variables move in the same direction.
• Example: As temperature increases, ice cream sales
increase.
 Negative Correlation
• Variables move in opposite directions.
• Example: As price increases, the number of buyers
decreases.
 No Correlation
• No apparent relationship between variables.
• Example: the relationship between temperature in New
York and the price of tea in China.
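The three types can be reproduced with synthetic data. The sketch below uses made-up variables (`positive`, `negative`, `unrelated`) generated with NumPy; it is an illustration only and does not use any dataset from the course.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000
x = rng.normal(size=n)
noise = rng.normal(size=n)

positive = 2.0 * x + noise       # rises as x rises
negative = -2.0 * x + noise      # falls as x rises
unrelated = rng.normal(size=n)   # generated independently of x

for name, y in [("positive", positive), ("negative", negative), ("none", unrelated)]:
    r = np.corrcoef(x, y)[0, 1]
    print(f"{name:>8}: r = {r:+.2f}")
```

The sign of `r` gives the direction, and a value near zero corresponds to the "no correlation" case.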
METHODS OF
MEASURING
CORRELATION
 Pearson Correlation Coefficient
 Spearman and Kendall Correlation
PEARSON CORRELATION
COEFFICIENT
 Suited to continuous variables with a normal or nearly
normal distribution.
 Measures the strength and direction of the linear
relationship between two variables.
 Range:
• From -1 to 1.
• 1 indicates a perfect positive linear correlation, -1
indicates a perfect negative linear correlation, and 0
indicates no linear correlation.
CALCULATE PEARSON
CORRELATION COEFFICIENT
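For paired observations (x_i, y_i), i = 1, ..., n, the standard sample Pearson coefficient can be written as below. The notation is the usual textbook one and is an assumption here, since the slides do not define symbols.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,\quad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
```

The numerator is the sum of co-deviations from the means; dividing by the two root sums of squares scales the result into the range from -1 to 1.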
SPEARMAN AND
KENDALL CORRELATION
 Spearman Correlation:
• Suited to ordinal variables or continuous variables
with a non-normal distribution.
• Measures the strength and direction of
monotonic relationships (not necessarily linear).
 Kendall Correlation:
• Suited to ordinal variables or continuous variables
with a non-normal distribution.
• Measures the similarity of the orderings of the
data points in two variables.
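The difference between a linear and a monotonic relationship shows up clearly on a curve that always increases but is not a straight line. The following sketch uses made-up data and the SciPy functions `pearsonr`, `spearmanr`, and `kendalltau`; the exponential shape is an arbitrary choice for illustration.

```python
import numpy as np
from scipy.stats import kendalltau, pearsonr, spearmanr

x = np.arange(1, 21, dtype=float)
y = np.exp(x / 3.0)  # strictly increasing, but far from a straight line

r_pearson, _ = pearsonr(x, y)
rho_spearman, _ = spearmanr(x, y)
tau_kendall, _ = kendalltau(x, y)

print(f"Pearson  r   = {r_pearson:.3f}")
print(f"Spearman rho = {rho_spearman:.3f}")
print(f"Kendall  tau = {tau_kendall:.3f}")
```

Because the ordering of `y` matches the ordering of `x` exactly, the two rank-based coefficients reach their maximum, while Pearson's coefficient is pulled down by the curvature.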
CALCULATE SPEARMAN AND
KENDALL CORRELATION
COEFFICIENT
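As a sketch, the common textbook forms of both coefficients are given below, assuming no tied values. Here d_i is the difference between the ranks of x_i and y_i, C is the number of concordant pairs, and D is the number of discordant pairs; these symbols are assumptions, not taken from the slides.

```latex
\rho_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^{2}}{n\,(n^{2} - 1)},
\qquad
\tau = \frac{C - D}{\tfrac{1}{2}\, n\,(n - 1)}
```

A pair of observations is concordant when both variables order the pair the same way and discordant when they order it oppositely. With ties present, software typically applies a corrected variant rather than these simple forms.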
INTERPRETING
CORRELATION
 The labels below apply to the absolute value of the
coefficient and describe strength, not statistical significance.
 Strong (0.7 - 1.0): Strong correlation.
 Moderate (0.4 - 0.7): Moderate correlation.
 Weak (0.1 - 0.4): Weak correlation.
 Very Weak (0 - 0.1): Almost no correlation.
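The thresholds can be turned into a small lookup. The function name `correlation_strength` is hypothetical, and assigning each boundary value (0.1, 0.4, 0.7) to the stronger category is an assumption, since the ranges on the slide overlap at their edges.

```python
def correlation_strength(r: float) -> str:
    """Map a correlation coefficient to a strength label using |r|."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    magnitude = abs(r)  # the sign only carries direction
    if magnitude >= 0.7:
        return "strong"
    if magnitude >= 0.4:
        return "moderate"
    if magnitude >= 0.1:
        return "weak"
    return "very weak"


for value in (-0.85, 0.5, 0.29, 0.05):
    print(f"r = {value:+.2f} -> {correlation_strength(value)}")
```

Taking the absolute value first means a coefficient of -0.85 is rated the same strength as +0.85.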
SIGNIFICANCE TEST IN
CORRELATION ANALYSIS
 In correlation analysis, significance tests can be
performed to determine whether the correlation
coefficient calculated from the sample data is
statistically significant.
 The most common test used for this purpose is the
t-test for correlation coefficients, which helps
assess whether the observed correlation reflects a
real relationship or could have occurred by chance.
 Example : Is there a significant relationship
between advertising spending and sales?
PERFORMING
HYPOTHESIS TEST
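A minimal sketch of the t-test for a Pearson coefficient follows. Under the null hypothesis of zero population correlation, the statistic t = r * sqrt(n - 2) / sqrt(1 - r^2) follows a t distribution with n - 2 degrees of freedom. The advertising and sales figures are invented for illustration, and the manual result is compared with the p-value that `scipy.stats.pearsonr` reports.

```python
import numpy as np
from scipy import stats

# Hypothetical data: advertising spend and sales for 12 periods (made-up numbers).
advertising = np.array([10, 12, 15, 17, 20, 22, 25, 27, 30, 32, 35, 38], dtype=float)
sales = np.array([25, 24, 31, 30, 38, 35, 44, 40, 47, 52, 50, 58], dtype=float)

n = len(advertising)
r, p_scipy = stats.pearsonr(advertising, sales)

# t statistic for H0: population correlation = 0, with n - 2 degrees of freedom
t_stat = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
p_manual = 2 * stats.t.sf(abs(t_stat), df=n - 2)  # two-sided p-value

alpha = 0.05
print(f"r = {r:.3f}, t = {t_stat:.2f}, p = {p_manual:.2e}")
print("Reject H0" if p_manual < alpha else "Fail to reject H0")
```

The manually computed p-value should agree with the one from `pearsonr`, which is why the case study can rely on that function for both the coefficient and the test.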
STEPS IN CORRELATION
ANALYSIS
1. Understanding the Data
 Identify Variables: Determine which variables you want
to analyze for correlation.
 Data Type: Understand the data types of the variables
(numeric, categorical, etc.).
 Data Distribution: Examine the data distribution using
visualizations and descriptive statistics.
2. Setting the Analysis Goal
 Hypotheses: Plan the hypotheses you want to test. Do
you expect a positive, negative, or no correlation?
 Objective: Determine what you want to learn from the
correlation analysis, such as the relationship between
variable A and variable B.
3. Selecting the Correlation Method
 Pearson Correlation: Used to measure the
linear relationship between two numerical
variables.
 Spearman Correlation: Used to measure the
monotonic relationship (not necessarily linear)
between numerical or ordinal variables.
 Kendall Correlation: Used to measure the
similarity of the orderings of data points
between two variables.
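One possible way to support the choice of method is a normality check. The sketch below is only a heuristic under stated assumptions: `suggest_method` is a hypothetical helper, the Shapiro-Wilk test with alpha = 0.05 is one of several reasonable checks, and the data are synthetic. A scatter plot should still be inspected for linearity, and ordinal data call for a rank-based method regardless of the test.

```python
import numpy as np
from scipy.stats import shapiro


def suggest_method(x, y, alpha=0.05):
    """Hypothetical helper: suggest 'pearson' only if both samples pass Shapiro-Wilk."""
    _, p_x = shapiro(x)
    _, p_y = shapiro(y)
    # A small p-value is evidence against normality, so fall back to ranks.
    return "pearson" if min(p_x, p_y) > alpha else "spearman"


rng = np.random.default_rng(1)
skewed = rng.exponential(size=200)      # clearly non-normal
roughly_normal = rng.normal(size=200)

print(suggest_method(skewed, roughly_normal))
```

Because one of the two samples is strongly skewed, the helper suggests the Spearman method for this pair.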
4. Calculating Correlation
 Calculate Correlation Coefficient: Compute the
correlation coefficient using the appropriate
formula based on the chosen method.
 Interpret the Coefficient: Determine the
direction (positive/negative) and strength of the
correlation based on the coefficient value
(between -1 and 1).
5. Data Visualization
 Scatter Plot: Plot a scatter plot to visualize the relationship
between two variables.
 Heatmap: If you're calculating correlations for multiple
variables, visualize all correlation coefficients at once using a
heatmap.
6. Significance Testing (Optional):
 Hypothesis Testing: Conduct a significance test (such as a t-test)
to determine whether the observed correlation is statistically
significant.
 Choose Significance Level: Select an alpha value (usually 0.05) as
the threshold for deciding whether the correlation is
significant.
7. Interpreting the Results:
 Correlation Direction: Is it positive, negative, or no
correlation?
 Correlation Strength: How strong is the relationship based on
the coefficient value?
 Significance: If you performed significance testing, check if
the correlation is statistically significant.
8. Conclusion and Reporting:
 Summarize Conclusion: Provide a brief summary of your
findings.
 Report: If this analysis is part of a larger report, contextualize
the results and provide in-depth interpretations.
CASE STUDY : LUNG
CANCER DATASET
Step 1: Understanding the Data
import pandas as pd
# Read the dataset
data = pd.read_csv('lung-cancer.csv')
# Display first few rows of the dataset
print(data.head())
# Check data types of variables
print(data.dtypes)
# Transform the categorical target (strings or objects) into integers
from sklearn.preprocessing import LabelEncoder

lung_encoder = LabelEncoder()
data = data.copy()
data['LUNG_CANCER'] = lung_encoder.fit_transform(data['LUNG_CANCER'])
# Compute descriptive statistics
print(data.describe())
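`LabelEncoder` assigns integers in sorted label order, so the meaning of each code is implicit. As an alternative sketch, an explicit mapping makes the coding visible, and a missing-value count guards the later correlation step. The toy frame and the 'YES'/'NO' labels are assumptions standing in for the real file.

```python
import pandas as pd

# Toy frame standing in for the real file; the 'YES'/'NO' labels are an assumption.
toy = pd.DataFrame({"LUNG_CANCER": ["YES", "NO", "YES", "YES", "NO"]})

# Explicit mapping: the meaning of 0 and 1 is visible in the code.
toy["LUNG_CANCER"] = toy["LUNG_CANCER"].map({"NO": 0, "YES": 1})

# Labels missing from the mapping become NaN, so count missing values per column.
print(toy.isna().sum())
print(toy["LUNG_CANCER"].tolist())
```

With `map`, any label not listed in the dictionary turns into a missing value, which the `isna` count would reveal immediately.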
Step 2: Setting the Analysis Goal in Correlation Analysis
In this step, you define the specific correlation you want
to explore between variables. Let's consider a scenario
where the goal is to investigate the correlation between
alcohol consumption and lung cancer status.
2.1. Define the Hypothesis:
 Null Hypothesis (H0): There is no significant
correlation between alcohol consumption and lung
cancer status.
 Alternative Hypothesis (H1): There is a significant
correlation between alcohol consumption and lung
cancer status.
Step 3 : Selecting the Correlation Method
In this step, we use the Pearson method to calculate
the correlation coefficient between alcohol
consumption and lung cancer status.
Step 4: Calculating Correlation
from scipy.stats import pearsonr

# Perform the Pearson correlation hypothesis test
corr_coefficient, p_value = pearsonr(data['ALCOHOL CONSUMING'], data['LUNG_CANCER'])

# Determine the direction of the correlation from the sign of the coefficient
if corr_coefficient < 0:
    pearson_cor = "negative"
elif corr_coefficient == 0:
    pearson_cor = "no correlation"
else:
    pearson_cor = "positive"

print(f"The Pearson correlation between alcohol consumption and lung cancer patients is: {pearson_cor}")
# Check for significance (considering alpha = 0.05)
if p_value < 0.05:
    pvalue_status = "Correlation is significant."
else:
    pvalue_status = "No significant correlation."

print(f"Pearson Correlation Coefficient: {corr_coefficient}")
print(f"The Pearson correlation between alcohol consumption and lung cancer patients is: {pvalue_status}")
Step 5: Data Visualization
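Heatmap to see the correlation between features
import matplotlib.pyplot as plt
import seaborn as sns

# Calculate the Pearson correlation matrix; numeric_only=True skips any
# remaining text columns (assumes pandas 1.5 or newer)
correlation_matrix = data.corr(numeric_only=True)

# Create a heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f")
plt.title('Pearson Correlation Coefficient Heatmap')
plt.show()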
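A heatmap shows every pair at once; when the interest is in a single target, the matching column of the correlation matrix can be ranked instead. The sketch below uses a made-up miniature dataset: the column names mirror the case study, but the values are invented and the result says nothing about the real data.

```python
import pandas as pd

# Made-up miniature dataset; column names mirror the case study, values do not.
toy = pd.DataFrame({
    "AGE": [45, 60, 52, 70, 38, 66, 58, 72],
    "ALCOHOL CONSUMING": [1, 2, 2, 2, 1, 2, 1, 2],
    "LUNG_CANCER": [0, 1, 1, 1, 0, 1, 0, 0],
})

corr = toy.corr(numeric_only=True)

# Take the target's column, drop its self-correlation, and sort by |r|.
ranking = corr["LUNG_CANCER"].drop("LUNG_CANCER").sort_values(key=abs, ascending=False)
print(ranking)
```

Sorting by absolute value keeps strong negative correlations near the top alongside strong positive ones.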
Step 6: Interpreting the Results
 Correlation Direction: The Pearson correlation between
alcohol consumption and lung cancer patients is positive.
 Correlation Strength: 0.28853280309173124, which is a weak
correlation on the scale given earlier (0.1 - 0.4).
 Significance: The correlation is statistically significant
at alpha = 0.05.
COURSE ASSIGNMENT
1. Data Preparation
 Read the dataset using Python (use pandas).
 Understand the structure of the dataset, check for missing values,
and perform data cleaning if necessary.
2. Exploratory Data Analysis (EDA)
 Conduct data exploration to understand the distribution of
variables.
 Create data visualizations such as histograms, box plots, or scatter
plots for each variable.
3. Correlation Analysis
 Calculate the correlation coefficient between variables X, Y, and Z.
Determine the type of correlation (positive, negative, or no
correlation) based on the correlation coefficient value.
 Visualize correlations using scatter plots or heatmaps.
4. Significance Test
 Perform a significance test (use a t-test or another
correlation test) to check whether the correlation found is
statistically significant.
6. Presentation of Results
 Present your findings in the form of a report or
presentation.
 Use visualizations and graphs to explain the correlations
found.
 Make conclusions and recommendations based on the
correlation analysis you have conducted.
