Statistics: Python Assignment 1

The document discusses analyzing financial data from Taiwanese communications companies. It imports necessary libraries, mounts Google Drive, changes directories, reads an Excel file, and samples 20 rows of data randomly with a seed of 25.

Uploaded by

hamburgerhenry13

Python Assignment 1

Group 25

In [ ]: from google.colab import drive


drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).

In [ ]: !pip install numpy pandas matplotlib scipy

Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (1.25.2)


Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (2.0.3)
Requirement already satisfied: matplotlib in /usr/local/lib/python3.10/dist-packages (3.7.1)
Requirement already satisfied: scipy in /usr/local/lib/python3.10/dist-packages (1.11.4)
Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.10/dist-packages (from pandas) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas) (2023.4)
Requirement already satisfied: tzdata>=2022.1 in /usr/local/lib/python3.10/dist-packages (from pandas) (2024.1)
Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib) (1.2.0)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib) (4.50.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib) (1.4.5)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib) (24.0)
Requirement already satisfied: pillow>=6.2.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib) (9.4.0)
Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib) (3.1.2)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.2->pandas) (1.16.0)

In [ ]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import ttest_1samp

In [ ]: %cd drive/MyDrive/for_colab
[Errno 2] No such file or directory: 'drive/MyDrive/for_colab'
/content

In [ ]: import os

# Get the current working directory
current_directory = os.getcwd()

# List every entry in the current directory
files = os.listdir(current_directory)

# No extension filter is applied here, so all entries are kept
xlsx_files = [file for file in files]

# Print the collected entries
print("Excel files under the current directory:")
for xlsx_file in xlsx_files:
    print(xlsx_file)

Excel files under the current directory:


.config
drive
sample_data
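The listing above prints every directory entry, not only Excel files. A minimal sketch of an extension filter follows; the helper name `list_xlsx` and the demo file names are hypothetical and not part of the original notebook.

```python
import os
import tempfile


def list_xlsx(directory):
    """Return the sorted names of entries in `directory` ending in .xlsx."""
    return sorted(
        name for name in os.listdir(directory)
        if name.lower().endswith(".xlsx")  # case-insensitive extension match
    )


# Demonstration on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("data.xlsx", "notes.txt", "REPORT.XLSX"):
        open(os.path.join(tmp, name), "w").close()
    demo_files = list_xlsx(tmp)
    print(demo_files)
```

In the notebook this would be called as `list_xlsx(os.getcwd())` after changing into the folder that holds the workbook.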

In [ ]: import os

from google.colab import drive


drive.mount('/content/drive')

os.chdir('/content/drive/My Drive')
files = os.listdir()

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).

In [ ]: df = pd.read_excel('12. 通訊網路業(23E,23K1K2).xlsx')
df.head()
Out[ ]:
   公司  年度  流動資產  非流動資產  資產總額  流動負債  非流動負債  負債總額  股本  營業收入淨額  ...  營業毛利  營業費用  營業利益  本期綜合損益總額  每股盈餘  營業毛利率  營業利益率  稅後淨利率  季底股
0  2314 台揚  2022  4501789  2039261  6541050  3325624  1137822  4463446  2380283  4482301  ...  629308  1048047  -418739  -411957  -2.06  14.04  -9.34  -10.85  93
1  2332 友訊  2022  11103868  4317673  15421541  4953414  906671  5860085  5998365  17077888  ...  4314830  3762322  552508  755788  0.18  25.27  3.24  1.52  89
2  2345 智邦  2022  41882761  4178048  46060809  21567677  4064971  25632648  5601399  77205223  ...  16518262  6885654  9632608  8218603  14.64  21.40  12.48  10.58  1313
3  2419 仲琦  2022  9562665  2675430  12238095  6058063  120155  6178218  3213172  12318229  ...  2711167  1913970  797197  863234  1.50  22.01  6.47  4.94  76
4  2444 兆勁  2022  1428243  682555  2110798  626711  490722  1117433  1055576  1802375  ...  318814  201596  117218  121085  1.18  17.69  6.50  6.66  17

5 rows × 21 columns

 

In [ ]: df.describe()
Out[ ]:
年度 流動資產 非流動資產 資產總額 流動負債 非流動負債 負債總額 股本 營業收入淨額 營業成本 營業毛利 營

count 210.000000 2.100000e+02 2.100000e+02 2.100000e+02 2.100000e+02 2.100000e+02 2.100000e+02 2.100000e+02 2.100000e+02 2.100000e+02 2.100000e+02 2.10000

mean 2017.500000 9.052694e+06 2.642744e+06 1.169544e+07 5.466099e+06 6.001561e+05 6.066255e+06 2.681074e+06 1.519220e+07 1.273901e+07 2.453188e+06 1.94491

std 2.879145 8.678743e+06 2.367828e+06 1.062071e+07 6.013315e+06 1.011919e+06 6.722186e+06 1.648993e+06 1.650684e+07 1.403928e+07 2.708213e+06 1.89811

min 2013.000000 6.197480e+05 4.181100e+04 9.104460e+05 9.970500e+04 0.000000e+00 1.476310e+05 1.929200e+05 0.000000e+00 0.000000e+00 0.000000e+00 0.00000

25% 2015.000000 2.917963e+06 7.235212e+05 4.207700e+06 1.556856e+06 6.848775e+04 1.761672e+06 1.307526e+06 3.965642e+06 3.371364e+06 5.107795e+05 5.36462

50% 2017.500000 5.550054e+06 2.119766e+06 7.886796e+06 3.121683e+06 2.266640e+05 3.887001e+06 2.409562e+06 8.110573e+06 7.018242e+06 1.266440e+06 1.16562

75% 2020.000000 1.249885e+07 3.528458e+06 1.671675e+07 7.363838e+06 6.996238e+05 7.900389e+06 3.486676e+06 2.212473e+07 1.760470e+07 4.002777e+06 3.05184

max 2022.000000 5.240555e+07 1.347240e+07 6.587795e+07 3.960358e+07 7.802910e+06 4.552176e+07 6.769961e+06 9.525745e+07 8.366274e+07 1.651826e+07 8.35708

 

1. What is the mean of "ROE(A)-稅後" (after-tax ROE)?

As the table above shows, the mean of "ROE(A)-稅後" is 5.912048.
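The mean quoted above can also be read off a single column directly. The sketch below uses a three-row toy DataFrame with made-up values (the real workbook is not reproduced here) to show that `Series.mean()` and the `mean` row of `describe()` agree.

```python
import pandas as pd

# Toy stand-in for the real data; these three values are made up.
toy_df = pd.DataFrame({"ROE(A)-稅後": [2.0, 4.0, 9.0]})

# Arithmetic mean of one column.
roe_mean = toy_df["ROE(A)-稅後"].mean()

# describe() reports the same number in its "mean" row.
described_mean = toy_df["ROE(A)-稅後"].describe()["mean"]

print(roe_mean, described_mean)
```

On the notebook's own `df`, the equivalent call would be `df['ROE(A)-稅後'].mean()`.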

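2. Draw a sample of size 20 from this population, store it in a dataframe, and print it. (Set the random seed to your group number; for example, group 1 uses a random seed of 1.)

Use the df.sample method with n set to 20 and random_state set to the group number, 25.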

In [ ]: sample_df = df.sample(n=20, random_state=25)


sample_df.head(20)
Out[ ]:
     公司  年度  流動資產  非流動資產  資產總額  流動負債  非流動負債  負債總額  股本  營業收入淨額  ...  營業毛利  營業費用  營業利益  本期綜合損益總額  每股盈餘  營業毛利率  營業利益率  稅後淨利率  季底股
19   6416 瑞祺電通  2022  4537955  761511  5299466  1646584  108109  1754693  731889  4982672  ...  1122041  619472  502569  464750  5.85  22.52  10.09  9.08  637
77   5388 中磊  2019  21424452  4752628  26177080  15656189  2844892  18501081  2490548  31797130  ...  5079038  4092136  986902  924612  4.21  15.97  3.10  3.25  1935
119  5388 中磊  2017  20457851  4309595  24767446  17304180  236970  17541150  2456538  38600003  ...  5027843  3493639  1534204  715006  5.38  13.03  3.97  3.34  2080
40   6416 瑞祺電通  2021  4357648  598297  4955945  1446133  283411  1729544  731889  4673944  ...  924866  619119  305747  216489  3.00  19.79  6.54  5.14  725
205  6152 百一  2013  5897048  1720143  7617191  3804606  496702  4301308  1724005  11828464  ...  1633963  1349695  284268  447124  2.03  13.81  2.40  2.94  387
203  5388 中磊  2013  9471368  3921531  13392899  7887024  778180  8665204  2110586  19076628  ...  3070040  2197849  872191  983866  4.19  16.09  4.57  4.43  1078
73   3596 智易  2019  22052835  3478150  25530985  13044806  1145245  14190051  2085350  32897900  ...  4352375  2624863  1727512  1303283  6.85  13.23  5.25  4.12  1962
41   6674 鋐寶科技  2021  2920676  318828  3239504  1544934  69853  1614787  684704  2906921  ...  572635  488802  83833  31670  0.49  19.70  2.88  1.13  209
125  6674 鋐寶科技  2017  4156167  276256  4432423  2884239  71  2884310  603513  6817502  ...  738679  555031  183648  182269  3.03  10.84  2.69  2.67  301
169  2332 友訊  2014  18102947  7307163  25410110  11495830  712648  12208478  6477557  30305802  ...  8273704  7966679  307025  196032  0.10  27.30  1.01  0.37  1204
117  3704 合勤  2017  11638883  2770391  14409274  6201089  224818  6425907  4411773  19141444  ...  4276668  4807005  -530337  -579185  -1.32  22.34  -2.77  -2.98  672
102  6285 啟碁  2018  25793877  7370275  33164152  17094483  220458  17314941  3894121  56049676  ...  7112667  5096594  2016073  1718720  5.21  12.69  3.60  3.44  3111
37   6152 百一  2021  2702078  687360  3389438  1878650  126485  2005135  1677385  3793240  ...  561468  606623  -45155  -14576  0.05  14.80  -1.19  0.30  206
43   2332 友訊  2020  12551922  3398308  15950230  5346116  863759  6209875  6519961  15179443  ...  4775295  4695069  80226  1075407  1.90  31.46  0.53  8.63  1871
135  3380 明泰  2016  11383804  2877620  14261424  5297486  445282  5742768  4344697  21830730  ...  3183471  2534479  648992  324698  1.40  14.58  2.97  2.79  851
121  6152 百一  2017  4735030  1225849  5960879  3880389  123201  4003590  1724005  8158197  ...  880944  1078862  -197918  -227246  -1.16  10.80  -2.43  -2.43  146
154  3047 訊舟  2015  3409675  1168273  4577948  1998042  59881  2057923  1988890  5513650  ...  1448345  1301369  146976  147191  0.77  26.27  2.67  2.53  213
91   3047 訊舟  2018  4336393  2631929  6968322  3075211  1472491  4547702  1864916  6873561  ...  1653759  1441677  212082  127256  -0.15  24.06  3.09  1.77  175
195  3025 星通  2013  670292  317474  987766  107664  84086  191750  709206  603841  ...  265992  238617  27375  43400  0.57  44.05  4.53  6.39  80
152  2485 兆赫  2015  7958563  2310990  10269553  2579247  211213  2790460  3176890  11256992  ...  1749776  1010636  739140  792446  2.66  15.54  6.57  7.31  1620

20 rows × 21 columns
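Fixing `random_state` is what makes the draw above repeatable. The sketch below illustrates this on a toy 50-row population; the column name `value` is made up for the illustration.

```python
import pandas as pd

# Toy population of 50 rows; the column name "value" is made up.
toy_df = pd.DataFrame({"value": range(50)})

# Two draws with the same seed.
first = toy_df.sample(n=20, random_state=25)
second = toy_df.sample(n=20, random_state=25)

# Same seed -> identical rows in identical order.
same_rows = first.equals(second)

# sample() draws without replacement by default, so no row repeats.
unique_rows = first.index.is_unique

print(same_rows, unique_rows, len(first))
```

Because the default is sampling without replacement, `n` cannot exceed the number of rows unless `replace=True` is passed.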

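3. Using the sample drawn in question 2, run a t-test of whether the population mean of "ROE(A)-稅後" is greater than 0. What are the test statistic and the p-value, and what is the conclusion? (Use a significance level of 0.05.)

The hypotheses are:

$H_0$: the population mean of "ROE(A)-稅後" is not greater than 0

$H_1$: the population mean of "ROE(A)-稅後" is greater than 0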

In [ ]: import scipy.stats as st

res = st.ttest_1samp(sample_df['ROE(A)-稅後'], popmean=0, alternative="greater")


print('test statistics:', res.statistic)
print('p-value:', res.pvalue)
print('degree of freedom:', res.df)

test statistics: 4.550743540959721


p-value: 0.00010926623393065556
degree of freedom: 19

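Since the p-value is approximately 0.0001, which is below the significance level of 0.05, we reject $H_0$ and conclude that the population mean of "ROE(A)-稅後" is greater than 0.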
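For reference, the quantity reported as the test statistic is the standard one-sample t statistic. The symbols $\bar{x}$ (sample mean), $s$ (sample standard deviation), and $\mu_0$ (hypothesized mean) are notation introduced here, not taken from the notebook; with $n = 20$ and $\mu_0 = 0$:

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{\bar{x}}{s / \sqrt{20}},
\qquad \mathrm{df} = n - 1 = 19,
\qquad p = P\left(T_{19} \ge t\right)
```

At the 0.05 level the one-sided critical value is $t_{0.05,\,19} \approx 1.729$, so the observed statistic of about 4.55 falls well inside the rejection region, in agreement with the p-value.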

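4. Let the sample size range from 5 to 30 and repeat steps 2 and 3 1000 times for each size. Based on the result of question 1, plot the simulated power curve of the test, as in the figure below. (Set the random seed to the repetition index i plus the group number plus the sample size of that loop; for example, group 1 uses a seed of 1+3+5 for the third repetition at sample size 5.)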
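Hint:

Power = P(Reject $H_0$ | $H_0$ is false)

Use a nested loop, with the outer loop over sample sizes and the inner loop over repetitions. Sample and test in each iteration, then compute the proportion of the 1000 repetitions that reject $H_0$ at each sample size.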
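The seed rule stated in the question is plain addition. A small sketch, with the hypothetical helper name `seed_for`, reproduces the question's own example and shows the same rule for group 25:

```python
def seed_for(group, repetition, sample_size):
    """Seed rule from the question: repetition index + group + sample size."""
    return repetition + group + sample_size


# The question's own example: group 1, third repetition, sample size 5.
example_seed = seed_for(group=1, repetition=3, sample_size=5)

# The same rule for group 25, first repetition, sample size 5.
group25_seed = seed_for(group=25, repetition=1, sample_size=5)

print(example_seed, group25_seed)
```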

In [ ]: sizes = range(5, 31)
group = 25  # group number, part of the seed rule in the question
h0_rejected_prob = []

for n in sizes:
    h0_rejected_test = []
    for i in range(1, 1001):
        # Seed = repetition index + group number + sample size
        sample = df.sample(n, random_state=i + group + n)
        t_stat, p_value = ttest_1samp(sample['ROE(A)-稅後'], popmean=0, alternative="greater")
        if p_value <= 0.05:
            h0_rejected_test.append(1)

    # Proportion of the 1000 repetitions that rejected H0 at this sample size
    h0_rejected_prob.append(sum(h0_rejected_test)/1000)

plt.plot(sizes, h0_rejected_prob)
plt.xlabel('Sample Size')
plt.ylabel('Power')
plt.title('Power Curve')
plt.show()
The resulting power curve is shown in the figure.

Note: division of work

Name    Responsibility
黃少凱  Questions 1 and 2
郭柏亨  Question 3
張哲綸  Question 4