Lab 04 Handout
INTRODUCTION TO STATISTICS IN PYTHON
Maggie Matsui
Content Developer, DataCamp
What can statistics do?
What is statistics?
The field of statistics - the practice and study of collecting and analyzing data
How many occupants will your hotel have? How can you optimize occupancy?
How many sizes of jeans need to be manufactured so they can fit 95% of the population?
Should the same number of each size be produced?
import numpy as np
np.mean(car_speeds['speed_mph'])
40.09062
single 188
married 143
divorced 124
dtype: int64
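Counts like the ones above are what a frequency table of a categorical column looks like. The following is a minimal sketch of how such a table might be produced with pandas. The `demographics` DataFrame and its `marriage_status` column are hypothetical names that do not appear in the handout, and the data is synthetic, built only to reproduce the three counts shown.

```python
import pandas as pd

# Hypothetical data: a column whose category counts match the output above.
demographics = pd.DataFrame({
    "marriage_status": ["single"] * 188 + ["married"] * 143 + ["divorced"] * 124
})

# value_counts() tallies each category, sorted from most to least frequent.
counts = demographics["marriage_status"].value_counts()
print(counts)
```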
Relationships between two variables
x = explanatory/independent variable
y = response/dependent variable
msleep['sleep_rem'].corr(msleep['sleep_total'])
0.751755
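The correlation coefficient does not depend on which variable is treated as x and which as y. A small sketch of that symmetry, assuming a hypothetical `msleep_demo` DataFrame with made-up values standing in for the real `msleep` data:

```python
import pandas as pd

# Hypothetical stand-in for the msleep data: hours of total and REM sleep.
msleep_demo = pd.DataFrame({
    "sleep_total": [12.1, 17.0, 14.4, 14.9, 4.0, 14.4, 8.7, 10.1],
    "sleep_rem": [2.0, 1.8, 2.4, 2.3, 0.7, 2.2, 1.4, 2.9],
})

r_xy = msleep_demo["sleep_total"].corr(msleep_demo["sleep_rem"])
r_yx = msleep_demo["sleep_rem"].corr(msleep_demo["sleep_total"])

# Correlation is symmetric: swapping x and y gives the same coefficient.
print(r_xy, r_yx)
```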
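x̄ = mean of x, ȳ = mean of y
σx = standard deviation of x, σy = standard deviation of y (sample standard deviations, which divide by n − 1)

$$r = \frac{1}{n - 1} \sum_{i=1}^{n} \frac{(x_i - \bar{x})(y_i - \bar{y})}{\sigma_x \times \sigma_y}$$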
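A minimal sketch of computing r by hand from the deviations and standard deviations defined above, then comparing it with the pandas `.corr()` result. It assumes σ denotes the sample standard deviation, so the sum of products is also divided by n − 1 (the pandas default convention). The `x` and `y` values are hypothetical.

```python
import pandas as pd

# Hypothetical values, for illustration only.
x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = pd.Series([2.1, 3.9, 6.2, 7.8, 10.1, 11.7])

n = len(x)

# Sample standard deviations (pandas uses ddof=1 by default).
sigma_x = x.std()
sigma_y = y.std()

# Sum of products of deviations, scaled by (n - 1) and both standard deviations.
r_manual = ((x - x.mean()) * (y - y.mean())).sum() / ((n - 1) * sigma_x * sigma_y)

# Built-in Pearson correlation for comparison.
r_pandas = x.corr(y)
print(r_manual, r_pandas)
```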
Non-linear relationships
r = 0.18
df['x'].corr(df['y'])
0.081094
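A correlation near zero does not mean the variables are unrelated, only that there is little linear relationship. As a hedged illustration with synthetic data (not the dataset used above), y below is completely determined by x, yet the coefficient is essentially zero:

```python
import numpy as np
import pandas as pd

# Hypothetical data: y depends on x exactly, but through a parabola.
x = np.linspace(-3, 3, 61)
df_demo = pd.DataFrame({"x": x, "y": x ** 2})

# x is symmetric around 0, so the positive and negative linear trends cancel.
r = df_demo["x"].corr(df_demo["y"])
print(r)
```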
0.3119801
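import matplotlib.pyplot as plt
import seaborn as sns

# Scatter plot with a fitted trend line; ci=None hides the confidence band
sns.lmplot(x='log_bodywt',
           y='awake',
           data=msleep,
           ci=None)
plt.show()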
msleep['log_bodywt'].corr(msleep['awake'])
0.5687943
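The `log_bodywt` column used above has to be created before plotting. The sketch below shows one way this might be done, assuming the untransformed column is called `bodywt` (a guess based on the name `log_bodywt`) and using synthetic data in a hypothetical `msleep_demo` DataFrame rather than the real `msleep` values. It also compares the correlation before and after the transformation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in for msleep: body weights spanning several orders of
# magnitude, with awake time rising roughly linearly in log(body weight).
log_wt = rng.uniform(-5, 8, size=80)
msleep_demo = pd.DataFrame({"bodywt": np.exp(log_wt)})
msleep_demo["awake"] = 12 + 1.0 * log_wt + rng.normal(0, 1, size=80)

# Log transformation of the highly skewed explanatory variable.
msleep_demo["log_bodywt"] = np.log(msleep_demo["bodywt"])

r_raw = msleep_demo["bodywt"].corr(msleep_demo["awake"])
r_log = msleep_demo["log_bodywt"].corr(msleep_demo["awake"])
print(r_raw, r_log)
```

With skewed data of this kind, the relationship is much closer to linear on the log scale, so the coefficient for the transformed variable is the larger of the two.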
Reciprocal transformation (1 / x)
sqrt(x) and 1 / y
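Transformations can also be applied to both variables at once. A minimal sketch of the sqrt(x) and 1 / y combination, using hypothetical data deliberately constructed so that 1 / y is exactly linear in sqrt(x); the column names `sqrt_x` and `recip_y` are illustrative only:

```python
import numpy as np
import pandas as pd

# Hypothetical data constructed so that 1 / y is exactly linear in sqrt(x).
x = np.arange(1.0, 51.0)
df_demo = pd.DataFrame({"x": x, "y": 1.0 / (2.0 + 0.5 * np.sqrt(x))})

# Apply one transformation to each variable.
df_demo["sqrt_x"] = np.sqrt(df_demo["x"])
df_demo["recip_y"] = 1.0 / df_demo["y"]

r_raw = df_demo["x"].corr(df_demo["y"])
r_transformed = df_demo["sqrt_x"].corr(df_demo["recip_y"])
print(r_raw, r_transformed)
```

Which transformation to use depends on the data and on how interpretable the transformed variables need to be.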
Linear regression