@canard0328
http://nbviewer.ipython.org/gist/canard0328/6f44229365f53b7bd30f/
http://nbviewer.ipython.org/gist/canard0328/a5911ee5b4bf1a07fbcb/
https://gist.github.com/canard0328/07a65584c134a2700725
https://gist.github.com/canard0328/b2f8aec2b9c286f53400
SEMMA
Sample
Explore
Modify
Model
Assess
CRISP-DM (CRoss-Industry Standard Process for Data Mining)
Business Understanding
Data Understanding
Data Preparation
Modeling
Evaluation
Deployment
KDD (Knowledge Discovery in Databases)
Selection
Preprocessing
Transformation
Data Mining
Interpretation/Evaluation
KKD (Keiken, Kan and Dokyo: experience, intuition, and guts)
http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic3.csv
(Data obtained from http://biostat.mc.vanderbilt.edu/DataSets)
> data = read.csv("titanic3.csv",
+ stringsAsFactors=F, na.strings=c("","NA"))
>>> import pandas as pd
>>> data = pd.read_csv('titanic3.csv')
1-of-K encoding
A variable with N categories is represented by N-1 dummy variables.
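A minimal sketch of this kind of encoding in plain Python. The function name `dummy_encode`, the `drop_first` flag, and the sample values are illustrative assumptions; dropping the first category is one common way to end up with N-1 columns.

```python
def dummy_encode(values, drop_first=True):
    """Encode categorical values as 0/1 indicator columns.

    With drop_first=True the first category (in sorted order) is the
    baseline and gets no column, so N categories yield N-1 columns.
    """
    categories = sorted(set(values))
    kept = categories[1:] if drop_first else categories
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return kept, rows


# Hypothetical sample: three categories, so two dummy columns remain.
columns, rows = dummy_encode(["S", "C", "Q", "S"])
print(columns)  # ['Q', 'S']
print(rows)     # [[0, 1], [0, 0], [1, 0], [0, 1]]
```

The all-zero row stands for the dropped baseline category ("C" here).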
Feature hashing / Hashing trick
x := new vector[N]
for f in features:
    h := hash(f)
    x[h mod N] += 1
http://en.wikipedia.org/wiki/Feature_hashing
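The pseudocode above could be written in Python roughly as follows. This is a sketch, not a reference implementation: the function name `hash_features`, the bucket count, and the example feature strings are assumptions, and MD5 is used only because Python's built-in `hash()` for strings changes between interpreter runs.

```python
import hashlib


def hash_features(features, n_buckets):
    """Map a list of string features to a fixed-length count vector."""
    x = [0] * n_buckets
    for f in features:
        # Stable hash of the feature string (built-in hash() is salted per run).
        h = int(hashlib.md5(f.encode("utf-8")).hexdigest(), 16)
        x[h % n_buckets] += 1
    return x


# Hypothetical features; different features may collide in the same bucket.
vec = hash_features(["sex=female", "pclass=1", "embarked=S"], n_buckets=8)
print(len(vec), sum(vec))  # 8 3
```

The vector length stays fixed at `n_buckets` no matter how many distinct features appear, at the cost of occasional collisions.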
Curse of dimensionality
Standardization
$$z = \frac{x - \mu}{\sigma}$$
where $\mu$ is the mean of $x$ and $\sigma$ is its standard deviation.
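As a small worked example of standardization, z = (x - µ) / σ, the sketch below uses only the standard library. The function name `standardize` and the sample values are assumptions, and the population standard deviation is used here; a sample standard deviation would be an equally reasonable choice.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores: subtract the mean, divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - mu) / sigma for v in values]


# Hypothetical feature values.
ages = [22.0, 38.0, 26.0, 35.0, 54.0]
z = standardize(ages)
print(round(mean(z), 6), round(pstdev(z), 6))
```

After the transformation the feature has mean 0 and standard deviation 1, up to floating-point error.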
Feature selection
Forward stepwise selection
Backward stepwise selection
Ugly duckling theorem
“Machine learning is the science of getting computers to act without being explicitly programmed.”
Andrew Ng
supervised learning
• classification
• regression
unsupervised learning
•
•
• outlier detection
• semi-supervised learning
• reinforcement learning
• active learning
• online learning
• transfer learning
• K-means
•
• Apriori
• One-class SVM
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_i x_i + \varepsilon$$
Generalized linear model
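For the single-predictor case of the linear model (y = β0 + β1·x + ε), the coefficients can be estimated by ordinary least squares. The sketch below is an illustration only: the function name `fit_simple_linear` and the data are assumptions, and the multi-predictor case would normally be solved with a linear algebra library.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = beta0 + beta1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta1 = sxy / sxx                 # slope
    beta0 = mean_y - beta1 * mean_x   # intercept
    return beta0, beta1


# Hypothetical noise-free data generated from y = 1 + 2x.
beta0, beta1 = fit_simple_linear([1, 2, 3, 4], [3, 5, 7, 9])
print(beta0, beta1)  # 1.0 2.0
```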
K-means
Gaussian mixture model
Mean absolute error
Mean square(d) error
Root mean square(d) error
R² (coefficient of determination)
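The four regression metrics named above can be computed directly from true and predicted values. This is a sketch with made-up numbers; the function name `regression_metrics` and the data are assumptions.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R^2 for paired true/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = sqrt(mse)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot  # undefined if y_true is constant
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Hypothetical values: absolute errors are 1, 0, 2, 1.
metrics = regression_metrics([3.0, 5.0, 2.0, 7.0], [2.0, 5.0, 4.0, 8.0])
print(metrics["mae"], metrics["mse"])  # 1.0 1.5
```

MAE, MSE and RMSE are better when smaller, while R² is better when closer to 1.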
Accuracy
Error rate = 1 - Accuracy
With 100 positive cases among 10,000 samples, predicting every sample as negative still gives 99% accuracy.
Confusion matrix
                    Predicted positive     Predicted negative
Actual positive     True positive (TP)     False negative (FN)
Actual negative     False positive (FP)    True negative (TN)
Precision
TP / (TP + FP)
Recall
TP / (TP + FN)
F-measure (F1 score)
2 × (Precision × Recall) / (Precision + Recall)
True Positive Rate
TP / (TP + FN)
False Positive Rate
FP / (FP + TN)
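Example: 100 positive cases among 10,000 samples, with every sample predicted as negative
                    Predicted positive    Predicted negative
Actual positive     0                     100
Actual negative     0                     9900
Accuracy: 0.99
Precision: 0
Recall: 0
F-measure: 0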
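The figures for this imbalanced example (0, 100, 0, 9900) can be reproduced from the confusion-matrix counts. This is a sketch: the function name `classification_metrics` is an assumption, and returning 0 when a denominator is zero is a convention chosen here, since precision is strictly undefined when nothing is predicted positive.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""

    def safe_div(num, den):
        # Assumed convention: report 0.0 when the denominator is zero.
        return num / den if den else 0.0

    accuracy = safe_div(tp + tn, tp + fn + fp + tn)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# All 10,000 samples predicted negative; 100 of them are actually positive.
m = classification_metrics(tp=0, fn=100, fp=0, tn=9900)
print(m)
```

Accuracy looks excellent even though no positive case is found, which is why precision, recall and the F-measure are reported alongside it.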
SMOTE
ROC curve
AUC (area under the ROC curve; 1.0 at best)
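One way to compute AUC without drawing the ROC curve uses an equivalent definition: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below assumes labels coded as 1 and 0; the function name `roc_auc` and the scores are illustrative.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical scores: 3 of the 4 positive/negative pairs are ordered correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)  # 0.75
```

The pairwise loop is quadratic in the number of samples, so this form is meant for illustration rather than large datasets.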
> from sklearn.svm import SVC
> clf = SVC().fit(X, y)
> clf = SVC(kernel='rbf', C=1.0, gamma=0.1).fit(X, y)
Candidate parameter values: $(10^{-2}, 10^{-1}, 10^{0}, 10^{1}, 10^{2})$
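Candidate values on an exponential scale like these are commonly tried in every combination (a grid search). The sketch below shows the idea in plain Python, assuming the parameters are `C` and `gamma` as in the SVC call earlier. `grid_search` and `toy_score` are hypothetical names, and `toy_score` is only a stand-in for a real cross-validated score, chosen to peak at C=10, gamma=0.1.

```python
from itertools import product
from math import log10


def grid_search(score_fn, grid):
    """Evaluate every parameter combination; return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(C, gamma):
    # Hypothetical score surface, NOT a real model evaluation.
    return -((log10(C) - 1.0) ** 2 + (log10(gamma) + 1.0) ** 2)


candidates = [10.0 ** k for k in range(-2, 3)]  # 10^-2 ... 10^2
best_params, best_score = grid_search(toy_score, {"C": candidates, "gamma": candidates})
print(best_params)
```

In practice `score_fn` would train the model with the given parameters and return a validation score, so the 5 × 5 grid here means 25 training runs.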
Overfitting
Regularization
Lasso, SVM
Underfitting
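Cross validation
Split the data into five parts, A to E.
1. Train on B-E, evaluate on A
2. Train on A and C-E, evaluate on B
3. Train on A, B, D and E, evaluate on C
4. Train on A-C and E, evaluate on D
5. Train on A-D, evaluate on E
6. Average the 5 results
With 5 splits this is 5-fold cross validation.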
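The splitting scheme in the steps above can be sketched as an index generator. This is an illustration in plain Python: the function name `k_fold_indices` is an assumption, and the data are split in their given order, whereas in practice the samples are usually shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k-fold cross validation."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# 10 samples, 5 folds: each sample is used for evaluation exactly once.
folds = list(k_fold_indices(10, 5))
for train_idx, test_idx in folds:
    print(test_idx)
```

A model would be trained on `train_idx` and scored on `test_idx` in each round, with the five scores averaged at the end.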
Leave-one-out cross validation
Stratified cross validation
$\varepsilon \sim N(0, \sigma^2)$
Expected squared error = $\sigma^2 + \mathrm{Bias}^2 + \mathrm{Variance}$
Bias
Variance
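For reference, the decomposition σ² + Bias² + Variance can be written out as below. The notation is an assumption, not taken from the slides: the data are taken to follow $y = f(x) + \varepsilon$ with noise of mean 0 and variance $\sigma^2$, $\hat{f}$ is a model fitted on a randomly drawn training set, and the expectation is over training sets and noise.

```latex
\begin{aligned}
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  &= \underbrace{\sigma^2}_{\text{noise}}
   + \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\mathrm{Bias}^2}
   + \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\mathrm{Variance}}
\end{aligned}
```

The noise term cannot be reduced by any model. Bias measures how far the average prediction is from the true function, and variance measures how much the prediction changes from one training set to another.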
Overfitting
Underfitting
Ensemble learning
• Stacking, Bagging, Boosting
Deep learning
• Neural networks
https://www.linkedin.com/pulse/inconvenient-truth-data-science-kamil-bartocha
MALSS (Machine Learning Support System)
Python
MALSS
> pip install -U malss
> from malss import MALSS
> clf = MALSS('classification', lang='jp')
> clf.fit(X, y, 'report_output_dir')
> clf.make_sample_code('sample_code.py')
F. Provost
Coursera: Machine Learning
Andrew Ng https://www.coursera.org/course/ml
scikit-learn Tutorials
http://scikit-learn.org/stable/tutorial/
Tutorial: Machine Learning for Astronomy with Scikit-learn
http://www.astroml.org/sklearn_tutorial/
MALSS (Machine Learning Support System)
https://pypi.python.org/pypi/malss/
https://github.com/canard0328/malss
Python MALSS
Qiita http://qiita.com/canard0328/items/fe1ccd5721d59d76cc77
Python MALSS
Qiita http://qiita.com/canard0328/items/5da95ff4f2e1611f87e1
Python MALSS
Qiita http://qiita.com/canard0328/items/3713d6758fe9c045a19d
SEMMA, CRISP-DM, KDD, KKD

Data Analysis with Machine Learning: Practical Edition