Assignment 1 - Introduction To Machine Learning: Version 1.0 of This Notebook. To Download
Assignment 1 - Introduction to Machine Learning
For this assignment, you will be using the Breast Cancer Wisconsin
(Diagnostic) Database to create a classifier that can help diagnose
patients. First, read through the description of the dataset (below).
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()
Breast Cancer Wisconsin (Diagnostic) Database
=============================================

Notes
-----
:Attribute Information:
    - perimeter
    - area
    - symmetry
    - class:
        - WDBC-Malignant
        - WDBC-Benign
:Summary Statistics:

===================================== ====== ======
                                       Min    Max
===================================== ====== ======
radius (mean):                        6.981  28.11
texture (mean):                       9.71   39.28
perimeter (mean):                     43.79  188.5
area (mean):                          143.5  2501.0
smoothness (mean):                    0.053  0.163
compactness (mean):                   0.019  0.345
concavity (mean):                     0.0    0.427
symmetry (mean):                      0.106  0.304
radius (worst):                       7.93   36.04
texture (worst):                      12.02  49.54
perimeter (worst):                    50.41  251.2
area (worst):                         185.2  4254.0
smoothness (worst):                   0.071  0.223
compactness (worst):                  0.027  1.058
concavity (worst):                    0.0    1.252
symmetry (worst):                     0.156  0.664
===================================== ====== ======

https://github.com/amirkeren/applied-machine-learning-in-python/blob/master/Assignment%2B1.ipynb 3/31
16/10/2021, 10:42 applied-machine-learning-in-python/Assignment+1.ipynb at master · amirkeren/applied-machine-learning-in-python · GitHub
https://goo.gl/U2Uwz2
ftp ftp.cs.wisc.edu
cd math-prog/cpo-dataset/machine-learn/WDBC/
In [2]: cancer.keys()
Question 0 (Example)
How many features does the breast cancer dataset have?
def answer_zero():
    # One feature name per feature column
    return len(cancer['feature_names'])

answer_zero()
Out[4]: 30
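As a quick sanity check on the answer above, the loaded object can be inspected directly. This is a minimal sketch, assuming scikit-learn is installed; the variable names `n_samples`, `n_features`, `feature_count`, and `class_names` are illustrative and not part of the assignment.

```python
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()

# The returned Bunch behaves like a dict; `data` is the feature matrix
n_samples, n_features = cancer.data.shape

# One name per feature column, and one name per class label
feature_count = len(cancer.feature_names)
class_names = [str(name) for name in cancer.target_names]

print(n_samples, n_features)
print(class_names)
```

The column count of `cancer.data` and the length of `cancer.feature_names` should agree, which is why either can be used to count features.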
Question 1
Scikit-learn works with lists, NumPy arrays, SciPy sparse matrices, and
pandas DataFrames, so converting the dataset to a DataFrame is not
necessary for training this model. Using a DataFrame does, however,
make many things easier, such as munging data, so let's practice
creating a classifier with a pandas DataFrame.
The returned DataFrame should have
columns = list(cancer.feature_names) + ['target']
and index = the default integer range.

def answer_one():
    # Stack the 30 feature columns and the target column side by side
    return pd.DataFrame(data=np.c_[cancer.data, cancer.target],
                        columns=list(cancer.feature_names) + ['target'])
Out[48]:
mean radius  mean texture  mean perimeter  mean area  mean smoothness
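An alternative way to build the same kind of frame is sketched below: create the DataFrame from the feature matrix first, then attach the labels as a new column. This is only an illustration, not the assignment's reference solution. The cast to `float` is an assumption made here so the `target` column matches the dtype that `np.c_` produces when it stacks the integer labels next to the float features.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()

# Build the frame from the 30 feature columns
cancerdf = pd.DataFrame(cancer.data, columns=cancer.feature_names)

# Attach the labels; float matches what np.c_ would yield after upcasting
cancerdf['target'] = cancer.target.astype(float)

frame_shape = cancerdf.shape
last_column = cancerdf.columns[-1]
print(frame_shape, last_column)
```

Because no index is passed, pandas assigns the default integer range index, one entry per row.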
Question 2
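def answer_two():
    cancerdf = answer_one()
    # value_counts() sorts by frequency, so the larger class (target 1.0, benign) comes first
    df = cancerdf['target'].value_counts()
    df.index = ['benign', 'malignant']
    return df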
malignant 212
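Relabeling the index by position works here only because benign is the larger class. A more defensive variant, sketched below under the assumption that `cancer.target_names` lists the class names in label order (0, then 1), maps each label to its name before counting. The names `code_to_name`, `labels`, and `counts` are illustrative.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()
target = pd.Series(cancer.target, name='target')

# Map each integer code to its class name before counting,
# so the labels cannot be swapped if the class sizes change
code_to_name = {code: str(name) for code, name in enumerate(cancer.target_names)}
labels = target.map(code_to_name)

# Fix the output order explicitly instead of relying on frequency order
counts = labels.value_counts().reindex(['malignant', 'benign'])
print(counts)
```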
Question 3
Split the DataFrame into X (the data) and y (the labels).
def answer_three():
    cancerdf = answer_one()
    X = cancerdf.iloc[:, 0:30]   # the 30 feature columns
    y = cancerdf['target']       # the label column
    return X, y
    mean radius  mean texture  mean perimeter  mean area  mean smoothness
0        17.990         10.38          122.80     1001.0          0.11840
1        20.570         17.77          132.90     1326.0          0.08474
2        19.690         21.25          130.00     1203.0          0.10960
3        11.420         20.38           77.58      386.1          0.14250
4        20.290         14.34          135.10     1297.0          0.10030
5        12.450         15.70           82.57      477.1          0.12780
6        18.250         19.98          119.60     1040.0          0.09463
7        13.710         20.83           90.20      577.9          0.11890
8        13.000         21.82           87.50      519.8          0.12730
9        12.460         24.04           83.97      475.9          0.11860
10       16.020         23.24          102.70      797.8          0.08206
11       15.780         17.89          103.60      781.0          0.09710
12       19.170         24.80          132.40     1123.0          0.09740
13       15.850         23.95          103.70      782.7          0.08401
14       13.730         22.61           93.60      578.3          0.11310
15       14.540         27.54           96.73      658.8          0.11390
16       14.680         20.13           94.74      684.5          0.09867
17       16.130         20.68          108.10      798.8          0.11700
18       19.810         22.15          130.00     1260.0          0.09831
19       13.540         14.36           87.46      566.3          0.09779
20       13.080         15.71           85.63      520.0          0.10750
21        9.504         12.44           60.34      273.9          0.10240
22       15.340         14.26          102.50      704.4          0.10730
23       21.160         23.04          137.20     1404.0          0.09428
24       16.650         21.38          110.00      904.6          0.11210
25       17.140         16.40          116.00      912.7          0.11860
26       14.580         21.53           97.41      644.8          0.10540
27       18.610         20.25          122.10     1094.0          0.09440
28       15.300         25.27          102.40      732.4          0.10820
29       17.570         15.05          115.00      955.1          0.09847
..          ...           ...             ...        ...              ...
    mean compactness  mean concavity  mean concave points  mean symmetry
0            0.27760        0.300100             0.147100         0.2419
1            0.07864        0.086900             0.070170         0.1812
2            0.15990        0.197400             0.127900         0.2069
3            0.28390        0.241400             0.105200         0.2597
4            0.13280        0.198000             0.104300         0.1809
5            0.17000        0.157800             0.080890         0.2087
6            0.10900        0.112700             0.074000         0.1794
7            0.16450        0.093660             0.059850         0.2196
8            0.19320        0.185900             0.093530         0.2350
9            0.23960        0.227300             0.085430         0.2030
10           0.06669        0.032990             0.033230         0.1528
11           0.12920        0.099540             0.066060         0.1842
12           0.24580        0.206500             0.111800         0.2397
13           0.10020        0.099380             0.053640         0.1847
14           0.22930        0.212800             0.080250         0.2069
15           0.15950        0.163900             0.073640         0.2303
16           0.07200        0.073950             0.052590         0.1586
17           0.20220        0.172200             0.102800         0.2164
18           0.10270        0.147900             0.094980         0.1582
19           0.08129        0.066640             0.047810         0.1885
20           0.12700        0.045680             0.031100         0.1967
21           0.06492        0.029560             0.020760         0.1815
22           0.21350        0.207700             0.097560         0.2521
23           0.10220        0.109700             0.086320         0.1769
24           0.14570        0.152500             0.091700         0.1995
25           0.22760        0.222900             0.140100         0.3040
26           0.18680        0.142500             0.087830         0.2252
27           0.10660        0.149000             0.077310         0.1697
28           0.16970        0.168300             0.087510         0.1926
29           0.11570        0.098750             0.079530         0.1739
..               ...             ...                  ...            ...
     mean fractal dimension  ...  worst radius
0                   0.07871  ...        25.380
1                   0.05667  ...        24.990
2                   0.05999  ...        23.570
3                   0.09744  ...        14.910
4                   0.05883  ...        22.540
5                   0.07613  ...        15.470
6                   0.05742  ...        22.880
7                   0.07451  ...        17.060
8                   0.07389  ...        15.490
9                   0.08243  ...        15.090
10                  0.05697  ...        19.190
11                  0.06082  ...        20.420
12                  0.07800  ...        20.960
13                  0.05338  ...        16.840
14                  0.07682  ...        15.030
15                  0.07077  ...        17.460
16                  0.05922  ...        19.070
17                  0.07356  ...        20.960
18                  0.05395  ...        27.320
19                  0.05766  ...        15.110
20                  0.06811  ...        14.500
21                  0.06905  ...        10.230
22                  0.07032  ...        18.070
23                  0.05278  ...        29.170
24                  0.06330  ...        26.460
25                  0.07413  ...        22.250
26                  0.06924  ...        17.620
27                  0.05699  ...        21.310
28                  0.06540  ...        20.270
29                  0.06149  ...        20.010
..                      ...  ...           ...
539                 0.07751  ...         8.678
540                 0.06782  ...        12.260
541                 0.06341  ...        16.220
542                 0.05680  ...        16.510
543                 0.05781  ...        14.370
544                 0.06688  ...        15.050
545                 0.05801  ...        15.350
546                 0.06201  ...        11.250
547                 0.06714  ...        10.830
548                 0.06235  ...        10.930
549                 0.06328  ...        13.030
550                 0.05948  ...        11.660
551                 0.06552  ...        12.020
552                 0.05637  ...        13.870
553                 0.06576  ...         9.845
554                 0.05708  ...        13.890
555                 0.06127  ...        10.840
556                 0.06331  ...        10.650
557                 0.06059  ...        10.490
558                 0.06147  ...        15.480
559                 0.06570  ...        12.480
560                 0.06171  ...        15.300
561                 0.05502  ...        11.920
562                 0.07152  ...        17.520
563                 0.06879  ...        24.290
564                 0.05623  ...        25.450
565                 0.05533  ...        23.690
566                 0.05648  ...        18.980
567                 0.07016  ...        25.740
568                 0.05884  ...         9.456
     worst texture  worst perimeter  worst area  worst smoothness
0            17.33           184.60      2019.0           0.16220
1            23.41           158.80      1956.0           0.12380
2            25.53           152.50      1709.0           0.14440
3            26.50            98.87       567.7           0.20980
4            16.67           152.20      1575.0           0.13740
5            23.75           103.40       741.6           0.17910
6            27.66           153.20      1606.0           0.14420
7            28.14           110.60       897.0           0.16540
8            30.73           106.20       739.3           0.17030
9            40.68            97.65       711.4           0.18530
10           33.88           123.80      1150.0           0.11810
11           27.28           136.50      1299.0           0.13960
12           29.94           151.70      1332.0           0.10370
13           27.66           112.00       876.5           0.11310
14           32.01           108.80       697.7           0.16510
15           37.13           124.10       943.2           0.16780
16           30.88           123.40      1138.0           0.14640
17           31.48           136.80      1315.0           0.17890
18           30.88           186.80      2398.0           0.15120
19           19.26            99.70       711.2           0.14400
20           20.49            96.09       630.5           0.13120
21           15.66            65.13       314.9           0.13240
22           19.08           125.10       980.9           0.13900
23           35.59           188.00      2615.0           0.14010
24           31.56           177.00      2215.0           0.18050
25           21.40           152.40      1461.0           0.15450
26           33.21           122.40       896.9           0.15250
27           27.26           139.90      1403.0           0.13380
28           36.71           149.30      1269.0           0.16410
29           19.52           134.90      1227.0           0.12550
..             ...              ...         ...               ...
549          31.45            83.90       505.6           0.12040
    worst compactness  worst concavity  worst concave points  worst symmetry
0             0.66560          0.71190               0.26540          0.4601
1             0.18660          0.24160               0.18600          0.2750
2             0.42450          0.45040               0.24300          0.3613
3             0.86630          0.68690               0.25750          0.6638
4             0.20500          0.40000               0.16250          0.2364
5             0.52490          0.53550               0.17410          0.3985
6             0.25760          0.37840               0.19320          0.3063
7             0.36820          0.26780               0.15560          0.3196
8             0.54010          0.53900               0.20600          0.4378
9             1.05800          1.10500               0.22100          0.4366
10            0.15510          0.14590               0.09975          0.2948
11            0.56090          0.39650               0.18100          0.3792
12            0.39030          0.36390               0.17670          0.3176
13            0.19240          0.23220               0.11190          0.2809
14            0.77250          0.69430               0.22080          0.3596
15            0.65770          0.70260               0.17120          0.4218
16            0.18710          0.29140               0.16090          0.3029
17            0.42330          0.47840               0.20730          0.3706
18            0.31500          0.53720               0.23880          0.2768
19            0.17730          0.23900               0.12880          0.2977
20            0.27760          0.18900               0.07283          0.3184
21            0.11480          0.08867               0.06227          0.2450
22            0.59540          0.63050               0.23930          0.4667
23            0.26000          0.31550               0.20090          0.2822
24            0.35780          0.46950               0.20950          0.3613
25            0.39490          0.38530               0.25500          0.4066
26            0.66430          0.55390               0.27010          0.4264
27            0.21170          0.34460               0.14900          0.2341
28            0.61100          0.63350               0.20240          0.4027
29            0.28120          0.24890               0.14560          0.2756
..                ...              ...                   ...             ...
0 0.11890
1 0.08902
2 0.08758
3 0.17300
4 0.07678
5 0.12440
6 0.08368
7 0.11510
8 0.10720
9 0.20750
10 0.08452
11 0.10480
12 0.10230
13 0.06287
14 0.14310
15 0.13410
16 0.08216
17 0.11420
18 0.07615
19 0.07259
20 0.08183
21 0.07773
22 0.09946
23 0.07526
24 0.09564
25 0.10590
26 0.12750
27 0.07421
28 0.09876
29 0.07919
.. ...
539 0.10660
540 0.08134
541 0.10230
542 0.06956
543 0.06443
544 0.08492
545 0.06953
546 0.07399
547 0.09479
548 0.07920
549 0.07626
550 0.06592
551 0.08032
552 0.06484
553 0.07393
554 0.07242
555 0.08283
556 0.06742
557 0.06969
558 0.08004
559 0.08732
560 0.08321
561 0.05905
562 0.14090
563 0.09873
564 0.07115
565 0.06637
566 0.07820
567 0.12400
568 0.07039
1 0.0
2 0.0
3 0.0
4 0.0
5 0.0
6 0.0
7 0.0
8 0.0
9 0.0
10 0.0
11 0.0
12 0.0
13 0.0
14 0.0
15 0.0
16 0.0
17 0.0
18 0.0
19 1.0
20 1.0
21 1.0
22 0.0
23 0.0
24 0.0
25 0.0
26 0.0
27 0.0
28 0.0
29 0.0
...
539 1.0
540 1.0
541 1.0
542 1.0
543 1.0
544 1.0
545 1.0
546 1.0
547 1.0
548 1.0
549 1.0
550 1.0
551 1.0
552 1.0
553 1.0
554 1.0
555 1.0
556 1.0
557 1.0
558 1.0
559 1.0
560 1.0
561 1.0
562 0.0
563 0.0
564 0.0
565 0.0
566 0.0
567 0.0
568 1.0
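Selecting the features by position, as `iloc[:, 0:30]` does, depends on `target` being the last column. A name-based split is sketched below as an alternative. It rebuilds the frame the same way `answer_one` does so the snippet runs on its own; using `drop` is a choice made here for illustration, not something the assignment prescribes.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()
cancerdf = pd.DataFrame(np.c_[cancer.data, cancer.target],
                        columns=list(cancer.feature_names) + ['target'])

# Everything except the label column is a feature, wherever the label sits
X = cancerdf.drop(columns='target')
y = cancerdf['target']

print(X.shape, y.shape)
```

Both approaches give the same `X` here; the name-based form simply keeps working if the column order changes.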
Question 4
Using train_test_split, split X and y into training and test sets
(X_train, X_test, y_train, and y_test).
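from sklearn.model_selection import train_test_split
def answer_four():
    X, y = answer_three()
    return train_test_split(X, y, random_state=0)  # X_train, X_test, y_train, y_test; the seed is an assumption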
     mean radius  mean texture  mean perimeter  mean area  mean smoothness
68         9.029         17.33           58.79      250.5          0.10660
65        14.780         23.94           97.40      668.3          0.11720
7         13.710         20.83           90.20      577.9          0.11890
..           ...           ...             ...        ...              ...
99        14.420         19.77           94.48      642.5          0.09752
72        17.200         24.52          114.20      929.4          0.10710
87        19.020         24.59          122.00     1076.0          0.09029
472       14.920         14.93           96.45      686.9          0.08098
70        18.940         21.31          123.60     1130.0          0.09009
9         12.460         24.04           83.97      475.9          0.11860
    mean compactness  mean concavity  mean concave points  mean symmetry
68           0.14130        0.313000             0.043750         0.2111
65           0.14790        0.126700             0.090290         0.1953
7            0.16450        0.093660             0.059850         0.2196
..               ...             ...                  ...            ...
99           0.11410        0.093880             0.058390         0.1879
72           0.18300        0.169200             0.079440         0.1927
87           0.12060        0.146800             0.082710         0.1953
70           0.10290        0.108000             0.079510         0.1582
9            0.23960        0.227300             0.085430         0.2030
     mean fractal dimension  ...  worst radius
293                 0.05715  ...        13.060
332                 0.06028  ...        11.980
565                 0.05533  ...        23.690
278                 0.05520  ...        15.500
489                 0.05325  ...        19.180
346                 0.06048  ...        13.640
357                 0.05883  ...        15.110
355                 0.06184  ...        13.370
112                 0.07769  ...        15.300
68                  0.08046  ...        10.310
526                 0.06317  ...        15.350
206                 0.06285  ...        10.420
65                  0.06654  ...        17.310
437                 0.05898  ...        15.660
126                 0.06130  ...        16.890
429                 0.05544  ...        13.820
392                 0.06744  ...        21.200
343                 0.05715  ...        22.750
334                 0.05945  ...        13.350
440                 0.06640  ...        12.360
441                 0.05407  ...        20.380
137                 0.05865  ...        12.320
230                 0.06325  ...        19.590
7                   0.07451  ...        17.060
408                 0.06069  ...        21.080
523                 0.06843  ...        15.110
361                 0.05696  ...        14.200
553                 0.06576  ...         9.845
478                 0.06574  ...        12.400
303                 0.06600  ...        11.060
..                      ...  ...           ...
459                 0.05952  ...        10.670
510                 0.06758  ...        12.450
151                 0.08261  ...         9.092
244                 0.06000  ...        21.650
543                 0.05781  ...        14.370
544                 0.06688  ...        15.050
265                 0.05674  ...        32.490
288                 0.06233  ...        11.860
423                 0.06181  ...        15.140
147                 0.06493  ...        16.250
177                 0.06323  ...        17.790
99                  0.06390  ...        16.330
448                 0.05746  ...        16.300
431                 0.07102  ...        12.880
115                 0.06194  ...        13.670
72                  0.06487  ...        23.320
537                 0.07405  ...        12.980
174                 0.05975  ...        11.540
87                  0.05629  ...        24.560
551                 0.06552  ...        12.020
486                 0.05355  ...        16.460
314                 0.07359  ...         8.952
396                 0.06079  ...        14.800
472                 0.05669  ...        17.180
70                  0.05461  ...        24.860
277                 0.04996  ...        19.960
9                   0.08243  ...        15.090
359                 0.06959  ...        12.020
192                 0.06447  ...         9.968
559                 0.06570  ...        12.480
    worst texture  worst perimeter  worst area  worst smoothness
68          22.65            65.50       324.7           0.14820
65          33.39           114.60       925.1           0.16480
7           28.14           110.60       897.0           0.16540
..            ...              ...         ...               ...
99          30.86           109.50       826.4           0.14310
72          33.82           151.60      1681.0           0.15850
87          30.41           152.90      1623.0           0.12490
70          26.58           165.90      1866.0           0.11930
9           40.68            97.65       711.4           0.18530
    worst compactness  worst concavity  worst concave points  worst symmetry
68            0.43650          1.25200               0.17500          0.4228
65            0.34160          0.30240               0.16140          0.3321
7             0.36820          0.26780               0.15560          0.3196
..                ...              ...                   ...             ...
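The shuffled row order in the output above comes from `train_test_split`, which permutes the rows before splitting. The sketch below shows the split sizes and the effect of fixing the seed. It assumes the default test fraction (25%) and `random_state=0`, neither of which is stated in the question text, and it loads the arrays directly with `return_X_y=True` so the snippet runs on its own.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Plain NumPy arrays are enough to illustrate the split itself
X, y = load_breast_cancer(return_X_y=True)

# Default test_size holds out 25% of the rows; the seed fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Repeating the call with the same seed reproduces the same partition
X_train_again, _, _, _ = train_test_split(X, y, random_state=0)

print(X_train.shape, X_test.shape)
```

Fixing `random_state` matters for graded work like this, because later questions that score a classifier depend on exactly which rows end up in the test set.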