
2-Scatterplots and Correlation

This document discusses scatterplots and correlation. It explains that scatterplots show the relationship between two numeric variables by plotting each data point based on its values for each variable. It provides examples of creating scatterplots in matplotlib and seaborn that show negative and positive correlations. It also discusses fitting a regression line to the scatterplot and transforming the data when the relationship is not linear.

Uploaded by

Lionel Yepdieu

Scatterplots and Correlation

classroom.udacity.com/nanodegrees/nd089/parts/8de94dee-7635-43b3-9d11-5e4583f22ce3/modules/c7f6b93a-
28e1-46b3-97c2-997a7eeddbf3/lessons/0491d74e-dcd8-4700-a971-a7f1b0a26ddb/concepts/9d1316b3-f339-4d52-
b63f-91994aefdd40

Watch Video At: https://youtu.be/wqMwTDVT9_Y

Watch Video At: https://youtu.be/wBDC5AmYgyg

Scatterplots

If we want to inspect the relationship between two numeric variables, the standard choice
of plot is the scatterplot. In a scatterplot, each data point is plotted individually as a
point, its x-position corresponding to one feature value and its y-position corresponding
to the second.

matplotlib.pyplot.scatter()
One basic way of creating a scatterplot is through Matplotlib's scatter function:

Example 1 a. Scatter plot showing negative correlation between two variables

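# Necessary imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb

# Read the CSV file and preview the first 10 rows
fuel_econ = pd.read_csv('fuel_econ.csv')
fuel_econ.head(10)

# Scatter plot of combined fuel efficiency against engine displacement
plt.scatter(data = fuel_econ, x = 'displ', y = 'comb');
plt.xlabel('Displacement (l)')
plt.ylabel('Combined Fuel Eff. (mpg)')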

In the example above, the relationship between the two variables is negative: as the
values of the x-axis variable increase, the values of the variable plotted on the
y-axis decrease.
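
The direction of a relationship like the one described above can also be summarized numerically with the Pearson correlation coefficient. The sketch below is only an illustration: it uses synthetic data (the variable names mirror the 'displ' and 'comb' columns, but the values are made up and are not read from fuel_econ.csv), and it assumes NumPy is available.

```python
import numpy as np

# Synthetic stand-in for displacement and combined fuel efficiency.
# These values are hypothetical and NOT taken from fuel_econ.csv.
rng = np.random.default_rng(0)
displ = rng.uniform(1.0, 6.0, size=200)
comb = 45.0 - 5.0 * displ + rng.normal(0.0, 2.0, size=200)

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal
# entry is the Pearson correlation between the two variables.
r = np.corrcoef(displ, comb)[0, 1]
print(f"Pearson r = {r:.2f}")  # a negative value indicates a negative relationship
```

A coefficient near -1 corresponds to a tight downward-sloping cloud of points, a value near +1 to a tight upward-sloping one, and a value near 0 to no clear linear trend.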

Alternative Approach - seaborn.regplot()


Seaborn's regplot() function combines scatterplot creation with regression function
fitting:

Example 1 b. Scatter plot showing negative correlation between two variables

# Scatter plot with a fitted regression line
sb.regplot(data = fuel_econ, x = 'displ', y = 'comb');
plt.xlabel('Displacement (l)')
plt.ylabel('Combined Fuel Eff. (mpg)')

The basic function parameters "data", "x", and "y" are the same for regplot as they are
for Matplotlib's scatter.

The regression line in a scatter plot showing a negative correlation between the two
variables.
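
To make the idea of a fitted regression line concrete, here is a minimal sketch of an ordinary least-squares straight-line fit using NumPy's polyfit. It is not a description of how regplot is implemented internally; it only illustrates the kind of line being drawn. The data are synthetic and hypothetical, not taken from fuel_econ.csv.

```python
import numpy as np

# Synthetic data with a known downward trend (hypothetical values).
rng = np.random.default_rng(1)
x = rng.uniform(1.0, 6.0, size=200)
y = 45.0 - 5.0 * x + rng.normal(0.0, 2.0, size=200)

# Fit a degree-1 polynomial (a straight line) by least squares.
# polyfit returns coefficients from highest degree to lowest.
slope, intercept = np.polyfit(x, y, deg=1)

# Evaluate the fitted line at the two ends of the x range.
x_line = np.array([x.min(), x.max()])
y_line = slope * x_line + intercept
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

A negative slope here plays the same role as the downward-sloping regression line in the plot: higher x values are associated with lower y values.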

Example 2. Scatter plot showing a positive correlation between two variables

Let's consider another plot shown below that shows a positive correlation between two
variables.

The regression line in a scatter plot showing a positive correlation between the two
variables.

In the scatter plot above, by default, the regression function is linear and includes a
shaded confidence region for the regression estimate. In this case, since the trend looks
like a log(y) ∝ x relationship (that is, linear increases in the value of x are associated
with linear increases in the log of y), plotting the regression line on the raw units is
not appropriate. If we don't care about the regression line, then we could set
fit_reg = False in the regplot function call.
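
One way to check whether a log(y) ∝ x description fits better than a straight line on the raw units is to compare how linear the relationship looks before and after taking the log of y. The sketch below does this with synthetic, hypothetical data (not the dataset behind the plot above) and uses the Pearson correlation as a rough measure of linearity.

```python
import numpy as np

# Synthetic data in which log10(y) is linear in x, plus noise
# (hypothetical values chosen only for illustration).
rng = np.random.default_rng(2)
x = rng.uniform(0.0, 5.0, size=300)
log_y = 0.5 + 0.4 * x + rng.normal(0.0, 0.1, size=300)
y = 10.0 ** log_y

# Correlation on the raw units versus on the log-transformed y.
r_raw = np.corrcoef(x, y)[0, 1]
r_log = np.corrcoef(x, np.log10(y))[0, 1]
print(f"raw units: r = {r_raw:.3f}")
print(f"log10(y):  r = {r_log:.3f}")
```

When the log-transformed correlation is clearly closer to 1 than the raw one, a straight line describes the transformed data better, which is the motivation for fitting the regression line after the transformation.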

You can even plot the regression line on the transformed data as shown in the example
below. For transformation, use a similar approach as you've learned in the last lesson.

Example 3. Plot the regression line on the transformed data

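def log_trans(x, inverse = False):
    """Apply a base-10 log transform, or its inverse when inverse = True."""
    if not inverse:
        return np.log10(x)
    else:
        return np.power(10, x)

# Pass x and y by keyword; recent seaborn versions reject positional x/y
sb.regplot(x = fuel_econ['displ'], y = fuel_econ['comb'].apply(log_trans))
tick_locs = [10, 20, 50, 100]
# Place ticks at the log-scaled positions, but label them in raw units
plt.yticks(log_trans(tick_locs), tick_locs);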

Note - In this example, the x- and y- values sent to regplot are set directly as Series,
extracted from the dataframe.

Regression line on a scatter plot based on the log-transformed data
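
The tick handling in Example 3 relies on log_trans and its inverse being a matched pair. The short sketch below restates the same function and shows where the raw-unit tick values land on the log-scaled axis; it needs only NumPy and no plotting.

```python
import numpy as np

def log_trans(x, inverse=False):
    """Base-10 log transform, or its inverse when inverse=True."""
    if not inverse:
        return np.log10(x)
    else:
        return np.power(10, x)

# Tick labels in raw units, as in the example above.
tick_locs = np.array([10, 20, 50, 100])

# Positions of those ticks on the log-scaled axis.
tick_pos = log_trans(tick_locs)
print(tick_pos)  # approximately [1.0, 1.301, 1.699, 2.0]

# Applying the inverse recovers the raw-unit values.
recovered = log_trans(tick_pos, inverse=True)
print(recovered)
```

This is why plt.yticks receives the transformed positions as its first argument and the untransformed values as labels: the axis is drawn in log units, but readers see the familiar raw units.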

Supporting Materials

fuel_econ.csv
