
Statistic - With Python PDF

The document provides details on conducting a statistical analysis of housing price data. Descriptive statistics are calculated for the overall SalePrice variable including measures of central tendency, dispersion, outliers and distribution. Descriptive statistics are also calculated separately for each year between 2006-2010. Hypothesis testing is performed to compare the mean SalePrice between 2008 and 2009. An additional section outlines a proposed A/B test experiment for Snapchat to test the impact of a new design layout on various engagement metrics.
Copyright © All Rights Reserved

Statistics Homework

In [12]:

import pandas as pd
import numpy as np
import scipy.stats as st
import statsmodels.stats.proportion as sp

## Visualization libraries
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

# please import as much as you need

In [13]:

# read your data

df = pd.read_csv('train.csv')

Basic

Show the descriptive statistics for the house price variable (SalePrice)


In [14]:

# your code goes here!

## Measures of central tendency
summary = df['SalePrice'].describe()  # count, mean, std, min, quartiles, max
plt.hist(df['SalePrice'])
plt.show()

### Define the column
x = df['SalePrice']

### Mean and median
print('Mean :', x.mean())
print('Median :', x.median())

### Mode (the most frequent value)
mode = x.mode()[0]
print('Mode :', mode)

### Percentile
percentile_90 = x.quantile(0.9)
print('90th percentile :', percentile_90)

### Quartiles: Q1, Q2 (the median), and Q3
Q1 = x.quantile(0.25)
Q2 = x.quantile(0.5)
Q3 = x.quantile(0.75)
print('Quartile 1 :', Q1)
print('Quartile 2 :', Q2)
print('Quartile 3 :', Q3)

### Variance (sample variance, ddof=1)
variance = x.var()
print('Variance :', variance)

### Standard deviation (sample, ddof=1)
std = x.std()
print('Standard deviation :', std)

### Range
value_range = x.max() - x.min()
print('Range :', value_range)

### Interquartile range (IQR)
IQR = Q3 - Q1
print('Interquartile range (IQR) :', IQR)

### Boxplot
plt.figure(figsize=(5, 7))
sns.boxplot(data=x)
Mean : 180921.19589041095
Median : 163000.0
Mode : 140000
90th percentile : 278000.0
Quartile 1 : 129975.0
Quartile 2 : 163000.0
Quartile 3 : 214000.0
Variance : 6311111264.297451
Standard deviation : 79442.50288288663
Range : 720100
Interquartile range (IQR) : 84025.0

Out[14]:

<matplotlib.axes._subplots.AxesSubplot at 0x1c007666370>
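The boxplot flags outliers with Tukey's fences: points below Q1 − 1.5·IQR or above Q3 + 1.5·IQR are drawn individually (1.5 is the default whisker length in matplotlib and seaborn). A minimal sketch of that rule, assuming a hypothetical helper `tukey_outliers` and a small made-up price series rather than the `train.csv` data:

```python
import pandas as pd

def tukey_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values of s that fall outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return s[(s < lower) | (s > upper)]

# Hypothetical prices (not from train.csv) with one extreme value
prices = pd.Series([120_000, 135_000, 150_000, 160_000, 175_000, 190_000, 700_000])
print(tukey_outliers(prices))
```

Applied to the quartiles printed above, the fences are 129975 − 1.5 × 84025 = 3937.5 and 214000 + 1.5 × 84025 = 340037.5, so any SalePrice above 340037.5 would appear as an outlier point on the boxplot.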

Descriptive statistics for house prices in each year (the result labels are in Indonesian: rata-rata = mean, modus = mode, Jangkauan = range)


In [15]:

# your code goes here!

def statistik_deskriptif(tahun):
    """Return the descriptive statistics of a SalePrice Series as a dict."""
    ## compute the quartiles once and reuse them
    Q1 = tahun.quantile(0.25)
    Q2 = tahun.quantile(0.5)
    Q3 = tahun.quantile(0.75)
    # container for the results
    result = {
        'rata-rata': tahun.mean(),
        'median': tahun.median(),
        'modus': tahun.mode()[0],
        'Percentile_90': tahun.quantile(0.9),
        'Quartile_1': Q1,
        'Quartile_2': Q2,
        'Quartile_3': Q3,
        'variance': tahun.var(),
        'Standard Deviation': tahun.std(),
        'Jangkauan': tahun.max() - tahun.min(),
        'Inter Quartile Range': Q3 - Q1,
    }
    # return the results
    return result

## select the data for each year
tahun_2006 = df[df['YrSold'] == 2006]
tahun_2007 = df[df['YrSold'] == 2007]
tahun_2008 = df[df['YrSold'] == 2008]
tahun_2009 = df[df['YrSold'] == 2009]
tahun_2010 = df[df['YrSold'] == 2010]

In [16]:

Statistik_2006 = pd.DataFrame(statistik_deskriptif(tahun_2006['SalePrice']), index=[0])
Statistik_2006

Out[16]:

rata-rata median modus Percentile_90 Quartile_1 Quartile_2 Quartile_3 variance …
0 182549.458599 163995.0 140000 275000.0 131375.0 163995.0 218782.5 6.308…


In [17]:

Statistik_2007 = pd.DataFrame(statistik_deskriptif(tahun_2007['SalePrice']), index=[0])
Statistik_2007

Out[17]:

rata-rata median modus Percentile_90 Quartile_1 Quartile_2 Quartile_3 variance …
0 186063.151976 167000.0 129000 290000.0 129900.0 167000.0 219500.0 7.35617…

In [18]:

Statistik_2008 = pd.DataFrame(statistik_deskriptif(tahun_2008['SalePrice']), index=[0])
Statistik_2008

Out[18]:

rata-rata median modus Percentile_90 Quartile_1 Quartile_2 Quartile_3 variance …
0 177360.838816 164000.0 140000 271000.0 131250.0 164000.0 207000.0 4.863…

In [19]:

Statistik_2009 = pd.DataFrame(statistik_deskriptif(tahun_2009['SalePrice']), index=[0])
Statistik_2009

Out[19]:

rata-rata median modus Percentile_90 Quartile_1 Quartile_2 Quartile_3 variance …
0 179432.10355 162000.0 110000 275900.0 125250.0 162000.0 212750.0 6.5414…

In [20]:

Statistik_2010 = pd.DataFrame(statistik_deskriptif(tahun_2010['SalePrice']), index=[0])
Statistik_2010

Out[20]:

rata-rata median modus Percentile_90 Quartile_1 Quartile_2 Quartile_3 variance …
0 177393.674286 155000.0 128000 264900.0 128100.0 155000.0 213250.0 6.472…
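The five per-year cells repeat the same filtering and call. As an alternative sketch (not the author's method), pandas can build one table with a row per year via `groupby(...).agg(...)`. The mini-dataset and the helper `iqr` below are hypothetical stand-ins for `train.csv`; the mode is left out because `agg` has no built-in mode reducer.

```python
import pandas as pd

# Hypothetical mini-dataset standing in for train.csv (columns YrSold, SalePrice)
df = pd.DataFrame({
    "YrSold":    [2006, 2006, 2006, 2007, 2007, 2007],
    "SalePrice": [100_000, 150_000, 200_000, 120_000, 130_000, 170_000],
})

def iqr(s: pd.Series) -> float:
    """Interquartile range Q3 - Q1 (linear interpolation, the pandas default)."""
    return s.quantile(0.75) - s.quantile(0.25)

# One row per year, one column per statistic; the callable's name becomes its column label
per_year = df.groupby("YrSold")["SalePrice"].agg(
    ["mean", "median", "var", "std", "min", "max", iqr]
)
per_year["range"] = per_year["max"] - per_year["min"]
print(per_year)
```

With the real data, the same call on `df.groupby('YrSold')['SalePrice']` would replace the `tahun_2006` … `tahun_2010` subsets.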

Advanced
Explanation of the house price distribution
Hint: this can be shown through the distribution of the data
In [21]:

# your code goes here!

print('Mean :', df['SalePrice'].mean())
print('Median :', df['SalePrice'].median())

# Histogram with a KDE curve (sns.distplot is deprecated; histplot replaces it)
plt.figure(figsize=(7, 4))
sns.histplot(df['SalePrice'], kde=True)
plt.tight_layout()  # call before show(); afterwards it would apply to a new, empty figure
plt.show()

plt.figure(figsize=(5, 7))
sns.boxplot(data=df['SalePrice'])
plt.tight_layout()
plt.show()
Mean : 180921.19589041095
Median : 163000.0

The plots above show a positively (right-) skewed distribution: the median (163,000) is smaller than the mean (about 180,921), because a long tail of expensive houses pulls the mean upward. For this reason, the median is a more representative measure of central tendency for this data than the mean.
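The direction of skew can also be checked numerically: pandas' `Series.skew()` returns the adjusted sample skewness, which is positive for a right-skewed distribution. A minimal sketch on a hypothetical lognormal sample standing in for `SalePrice` (on the real data the call would be `df['SalePrice'].skew()`; no value is claimed here):

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed sample (lognormal), not the actual SalePrice column
rng = np.random.default_rng(42)
prices = pd.Series(rng.lognormal(mean=12, sigma=0.4, size=5_000))

skewness = prices.skew()  # adjusted Fisher-Pearson sample skewness
print(f"mean={prices.mean():.0f}  median={prices.median():.0f}  skew={skewness:.2f}")
# skew > 0 together with mean > median indicates a right (positive) skew
```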

Hypothesis Testing on the Mean House Price in 2008 and 2009

$\textbf{Defining H0 and H1}$


Is the mean house price in 2008 equal to the mean house price in 2009?
H0 : The mean house price in 2008 is equal to the mean house price in 2009 (μ_2008 = μ_2009)
H1 : The mean house price in 2008 is not equal to the mean house price in 2009 (μ_2008 ≠ μ_2009)
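For reference, `st.ttest_ind` with its default `equal_var=True` computes Student's two-sample t statistic with a pooled variance. Written with this section's years as subscripts (sample means $\bar{x}$, sample standard deviations $s$, sample sizes $n$):

```latex
t = \frac{\bar{x}_{2008} - \bar{x}_{2009}}
         {s_p \sqrt{\frac{1}{n_{2008}} + \frac{1}{n_{2009}}}},
\qquad
s_p^2 = \frac{(n_{2008} - 1)\, s_{2008}^2 + (n_{2009} - 1)\, s_{2009}^2}
             {n_{2008} + n_{2009} - 2}
```

The two-sided p-value comes from a t distribution with $n_{2008} + n_{2009} - 2$ degrees of freedom, and H0 is rejected when it falls below the significance level (0.05 here).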

In [22]:

## Step 1: select the data for each year
tahun_2008 = df[df['YrSold'] == 2008]
tahun_2009 = df[df['YrSold'] == 2009]

### Step 2: compare the sample means
print('Mean 2008 :', tahun_2008['SalePrice'].mean())
print('Mean 2009 :', tahun_2009['SalePrice'].mean())

### Step 3: two-sample t-test (Student's t, equal variances assumed by default)
ttest = st.ttest_ind(a=tahun_2008['SalePrice'], b=tahun_2009['SalePrice'])
p_value = ttest.pvalue
print('P-Value :', p_value)

alpha = 0.05
if p_value >= alpha:
    print('Fail to reject H0: not enough evidence that the mean house price in 2008 '
          'differs from the mean house price in 2009')
else:
    print('Reject H0: the mean house price in 2008 differs from the mean house price in 2009')

Mean 2008 : 177360.83881578947
Mean 2009 : 179432.10355029587
P-Value : 0.7297119988122992
Fail to reject H0: not enough evidence that the mean house price in 2008 differs from the mean house price in 2009
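To see what `ttest_ind` does, the pooled-variance statistic can be computed by hand and compared with SciPy's result. The sketch below uses hypothetical normal samples in place of the 2008 and 2009 prices, and also runs Welch's variant (`equal_var=False`), which drops the equal-variance assumption and is often preferred when the two groups' spreads differ.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical samples standing in for the 2008 and 2009 SalePrice columns
a = rng.normal(177_000, 70_000, size=300)
b = rng.normal(179_000, 80_000, size=330)

# Student's t-test with pooled variance, computed by hand
n1, n2 = len(a), len(b)
sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
t_manual = (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
p_manual = 2 * stats.t.sf(abs(t_manual), df=n1 + n2 - 2)  # two-sided p-value

student = stats.ttest_ind(a, b)                 # equal_var=True (default)
welch = stats.ttest_ind(a, b, equal_var=False)  # Welch's t-test

print(t_manual, student.statistic)
print(p_manual, student.pvalue, welch.pvalue)
```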

Extension Problem (Additional)

A/B Test of Snapchat's Interface

Define an Experiment
Experiment Name
Testing Snapchat's new interface design

Defining the Hypothesis
The new Snapchat interface design increases how often users open the app

Participants
All Snapchat users

Variable to Be Tested
Old Design & New Design
Define Metrics
Macroconversions
Login rate / number of app visits

Microconversions
Profile photo uploads

Vanity Metrics
Chat intensity / time spent in the app
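The macroconversion (login rate) is a proportion, so besides the t-test planned in the post test, a two-proportion z-test is a common way to compare it between designs; this is a swapped-in technique, not the author's stated method. A minimal sketch with a hypothetical helper `two_proportion_ztest` and made-up counts, testing the one-sided H1 that the new design's login rate is higher (statsmodels, imported above as `sp`, also provides `proportions_ztest` for this):

```python
import math
from scipy.stats import norm

def two_proportion_ztest(x_new, n_new, x_old, n_old):
    """One-sided z-test of H1: p_new > p_old, using the pooled proportion."""
    p_new, p_old = x_new / n_new, x_old / n_old
    p_pool = (x_new + x_old) / (n_new + n_old)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_new + 1 / n_old))
    z = (p_new - p_old) / se
    return z, norm.sf(z)  # norm.sf(z) = P(Z > z)

# Hypothetical counts: users who logged in / users exposed, per design
z, p = two_proportion_ztest(x_new=120, n_new=1000, x_old=100, n_old=1000)
print(f"z = {z:.3f}, one-sided p = {p:.4f}")
```

Here a 12% vs 10% login rate on 1,000 users per arm gives a one-sided p of roughly 0.08, which would not be significant at the 0.05 level.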

Define The Duration


Sample Size
Determined using Slovin's formula applied to the Snapchat user population
Seasonal Effect
Run for 1 week to capture seasonal effects in user behavior across weekends and weekdays
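Slovin's formula gives the sample size as n = N / (1 + N·e²), where N is the population size and e the margin of error. A minimal sketch with a hypothetical helper `slovin_sample_size`; the population of 1,000,000 and e = 0.05 are illustrative assumptions, not Snapchat figures:

```python
import math

def slovin_sample_size(population: int, margin_of_error: float) -> int:
    """Slovin's formula n = N / (1 + N * e^2), rounded up to a whole user."""
    return math.ceil(population / (1 + population * margin_of_error ** 2))

print(slovin_sample_size(1_000_000, 0.05))  # → 400
```

Note that Slovin's formula does not take the expected effect size or the desired statistical power into account; a power analysis is the usual alternative when those are known.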

Staged Analysis
The analysis is done in stages:
80:20 (the new design is rolled out to 20% of users), and its effect is observed;
then
50:50 (the new design is rolled out to 50% of users), and its effect is observed again
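One way to implement the 80:20 then 50:50 rollout is deterministic hash bucketing, so a user who saw the new design at 20% keeps seeing it at 50%. This is a sketch under assumptions: the helper `in_new_design`, the salt string, and the 10,000-bucket granularity are hypothetical choices, not part of the original plan.

```python
import hashlib

def in_new_design(user_id: str, rollout_fraction: float,
                  salt: str = "snapchat-layout-test") -> bool:
    """Deterministically assign a user to the new design.

    The salted user id is hashed into one of 10,000 buckets; users whose bucket
    is below rollout_fraction * 10,000 see the new design. Raising the fraction
    (e.g. 0.2 -> 0.5) keeps every earlier new-design user in the new design.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 10_000
    return bucket < rollout_fraction * 10_000

users = [f"user{i}" for i in range(10_000)]
stage1 = {u for u in users if in_new_design(u, 0.20)}  # 80:20 stage
stage2 = {u for u in users if in_new_design(u, 0.50)}  # 50:50 stage
print(len(stage1) / len(users), len(stage2) / len(users), stage1 <= stage2)
```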

Post Test
A t-test is run to check whether the new design increases how often users visit the app
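Because the hypothesis is directional ("increases"), a one-sided test fits the post test. A minimal sketch using SciPy's `ttest_ind` with `equal_var=False` (Welch) and `alternative="greater"` (available since SciPy 1.6); the Poisson-distributed daily app opens below are hypothetical data, not Snapchat measurements:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical daily app opens per user under each design (not real data)
old_design = rng.poisson(lam=5.0, size=2_000)
new_design = rng.poisson(lam=5.4, size=2_000)

# One-sided Welch t-test, H1: mean(new_design) > mean(old_design)
result = stats.ttest_ind(new_design, old_design, equal_var=False, alternative="greater")
print(f"t = {result.statistic:.2f}, one-sided p = {result.pvalue:.2e}")
```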
