
DDT: Distributed Decision Tree
Paper-18

A. Desai
PhD Scholar, School of Engineering & Applied Sciences, Ahmedabad University

Co-author:
S. Chaudhary
Professor, School of Engineering & Applied Sciences, Ahmedabad University

ACM Compute, October 2016
Outline

- Introduction
  - Need for Research
- Distributed Decision Tree
  - Related Work
  - Working of Distributed Decision Tree
- Results and Discussion
  - Results
  - Discussion
- Summary
Introduction

- What is Machine Learning (ML)?
  - Decision Tree, Support Vector Machines, Neural Networks, etc.
- Types of problems in ML
  - Classification, regression and clustering
- Problem domains with respect to architecture (single node v/s cluster)
  - Machine Learning and Statistics (single node, memory only), Classical Data Mining (single node, memory + disk) and Distributed Environment (cluster architecture)
- Why Boosting?
- Why MapReduce?

Note: DDT uses Decision Tree to solve the classification problem using an ensemble of trees and a MapReduce approach.
Need for Research

- Data analytics approaches with respect to memory requirement:
  - Machine Learning and Statistics
  - Classical Data Mining
  - Data Mining in a Distributed Environment
Related Work
PLANET [12]

- Scalable construction of classification and regression trees
- Core: decide which attribute to split on, using MapReduce (a sketch of this step follows the slide)
- Dataset: AdCorpus (314 million records)
- Parameters: scalability, training time versus data size, running time versus depth of tree, and number of trees versus error reduction
- Limitation: 2-class problems only
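A minimal sketch of the split-selection step that PLANET-style builders distribute over MapReduce: mappers emit per-(attribute, value) class counts, reducers aggregate them, and a controller picks the attribute with the highest information gain. The row layout, function names and the toy in-process driver are illustrative assumptions, not PLANET's actual code.

from collections import Counter, defaultdict
from math import log2

def map_phase(rows):
    # Each mapper emits ((attribute_index, attribute_value), class_label)
    # for every attribute of every row in its input split.
    for *attrs, label in rows:
        for i, v in enumerate(attrs):
            yield (i, v), label

def reduce_phase(pairs):
    # Reducers aggregate a class histogram per (attribute, value) key.
    counts = defaultdict(Counter)
    for key, label in pairs:
        counts[key][label] += 1
    return counts

def entropy(counter):
    n = sum(counter.values())
    return -sum(c / n * log2(c / n) for c in counter.values())

def best_split(rows):
    # Controller: pick the attribute with the highest information gain
    # from the aggregated histograms.
    counts = reduce_phase(map_phase(rows))
    class_counts = Counter(label for *_, label in rows)
    n = len(rows)
    gain = defaultdict(lambda: entropy(class_counts))
    for (attr, _), cnt in counts.items():
        gain[attr] -= sum(cnt.values()) / n * entropy(cnt)
    return max(gain, key=gain.get)

rows = [("sunny", "hot", "no"), ("rain", "mild", "yes"),
        ("overcast", "hot", "yes"), ("rain", "cool", "yes")]
print(best_split(rows))  # -> 0: the first attribute gives the purest split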
Related Work
MReC4.5 [10]

- Bagging ensemble of C4.5 trees using MapReduce (a sketch of the bagging step follows the slide)
- Dataset: Breast Cancer dataset
- Parameters: partition time, map time, reduce time, total time, number of base classifiers and number of nodes
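A toy sketch of the bagging idea summarized above, assuming an in-process driver: each "map task" draws a bootstrap sample and trains one base classifier, and the collected list plays the role of the reduced ensemble. The base_learner parameter and majority_vote helper are illustrative stand-ins, not the paper's C4.5 implementation.

import random
from collections import Counter

def mapper(data, base_learner, seed):
    # One map task: draw a bootstrap sample (with replacement) and
    # train a single base classifier on it.
    rng = random.Random(seed)
    sample = [rng.choice(data) for _ in data]
    return base_learner(sample)

def bagging(data, base_learner, n_models=10):
    # "Reduce": collect the per-mapper models into the ensemble.
    return [mapper(data, base_learner, seed) for seed in range(n_models)]

def majority_vote(models, x):
    # Classify a record by majority vote over the ensemble's predictions.
    return Counter(m(x) for m in models).most_common(1)[0][0]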
Related Work
MRC4.5 (MapReduce C4.5) [7]

- Time-efficient and scalable implementation of Decision Tree using MapReduce
- Dataset: synthetic dataset, 1.5-3 million records
- Parameters: execution time versus number of instances, and execution time versus number of nodes
Related Work
MR-Tree [11]

- Extension of a Decision Tree algorithm (ID3) using MapReduce
- Scalable with the size of data
- Core: decide on the split attribute; uses pruning
- Dataset: US Census Bureau dataset (100 GB)
- Parameter: data size versus running time

Note: in all of the work surveyed, the authors have not compared their implementation with any other similar algorithm.
Working of Distributed Decision Tree
MapReduce of DDT

Figure: MapReduce of DDT
Working of Distributed Decision Tree
Phase 1: Model Construction

Figure: Phase 1 - model construction
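A minimal sketch of how Phase 1 might look, under stated assumptions: the training data arrives pre-split into chunks, each map task builds one tree on its chunk, and the reduce side simply collects the trees into an ensemble. The one-level "stump" learner below stands in for the real tree induction and is an illustrative assumption, not DDT's implementation.

from collections import Counter

def train_stump(chunk):
    # Map task: fit a one-level tree on this chunk. Rows are tuples of
    # attribute values with the class label last.
    default = Counter(row[-1] for row in chunk).most_common(1)[0][0]
    best = None
    for i in range(len(chunk[0]) - 1):
        by_value = {}
        for *attrs, label in chunk:
            by_value.setdefault(attrs[i], Counter())[label] += 1
        # Training rows classified correctly when each value predicts
        # its majority class.
        hits = sum(c.most_common(1)[0][1] for c in by_value.values())
        if best is None or hits > best[0]:
            rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
            best = (hits, (i, rule, default))
    return best[1]

def phase1(chunks):
    # "Reduce": gather one tree per chunk into the ensemble.
    return [train_stump(chunk) for chunk in chunks]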
Working of Distributed Decision Tree
Phase 2: Model Evaluation

Figure: Phase 2 - model evaluation
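A matching sketch of Phase 2, assuming the stump representation from the Phase 1 sketch above: each map task applies every tree in the ensemble to its share of the test records, and the reduce side tallies a majority vote per record and then the overall accuracy.

from collections import Counter

def predict(trees, attrs):
    # Majority vote over the ensemble built in Phase 1.
    votes = Counter()
    for i, rule, default in trees:
        votes[rule.get(attrs[i], default)] += 1
    return votes.most_common(1)[0][0]

def phase2(trees, test_rows):
    # Fraction of test rows whose ensemble vote matches the true label.
    correct = sum(predict(trees, attrs) == label for *attrs, label in test_rows)
    return correct / len(test_rows)

For example, phase2(phase1(chunks), held_out_rows) would reproduce the two-phase flow of the figures end to end.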
Results
Main Table

           Characteristics          Accuracy                    Size of Tree                Number of Leaves
Data-set   #C  #I       #N  #No     DT     BT     DDT    ST     DT     BT     DDT    ST     DT     BT     DDT    ST
bcw        2   699      9   1       98.14  100    96.99  95.85  27     38     3.6    12.5   14     19.5   2.3    6.75
bupa       2   345      6   1       84.64  97.97  65.8   72.75  51     25.75  5.4    13.5   26     13.37  3.2    7.25
crx        2   690      6   10      90.72  100    85.51  86.09  41     102.4  5.6    11.5   29     68.7   3.5    7.5
echo       2   132      11  2       97.3   100    85.13  93.24  3      5.5    3      3      2      3.25   2      2
h-d        5   303      13  1       78.55  100    66.67  68.08  67     89.4   10.4   25.5   34     45.2   5.7    13.25
hv-84      2   435      12  5       97.24  99.08  92.87  95.63  11     19.22  5.4    5      6      10.11  3.2    3
hypo       2   3163     7   19      99.43  99.97  97.88  99.11  13     53.2   6.6    8      7      27.1   3.8    4.5
krkp       2   3196     33  4       99.66  100    92.4   98.97  59     85.6   42.7   40     31     44.8   23.7   21.5
pima       2   768      8   1       84.11  100    77.43  78.26  39     103.8  11     18.5   20     52.4   6      9.75
sonar      2   208      60  1       98.08  100    76.92  81.73  35     22     5.6    9.5    18     11.5   3.3    5.25
Yahoo!     2   1155124  10  1       88.22  89.77  96.47  88.19  45     3130   108.33 26.4   23     6259   54.66  13.7
Average*   N.A.                     92.79  99.70  83.76  86.93  34.60  54.49  9.93   17.97  18.70  29.59  5.67   10.25

Note: #C = number of classes, #I = number of instances, #N = number of numeric attributes and #No = number of nominal attributes; * = average is over the first 10 datasets only.

Table: Accuracy, size of tree and number of leaves of Decision Tree (DT), Boosting Trees (BT), Distributed Decision Tree (DDT) and ST over the selected datasets
Results
Accuracy

Figure: Accuracy v/s Dataset

Results
Size of Tree

Figure: Size of Tree v/s Algorithm

Results
Number of Leaves

Figure: Number of Leaves v/s Algorithm
Discussion
Accuracy

- BT is almost perfectly accurate in its predictions (avg: 99.7)
- DT is close to the values of DDT and ST, except on bupa, h-d and pima
  - bupa and pima have only a few attributes (7 and 9 respectively), with only one attribute of nominal type (the class attribute itself); all other attributes are numeric
  - On h-d, no algorithm is accurate, because it has a high number of classes (five)
Discussion
Size of Tree and Number of Leaves

- DDT and ST outperformed DT and BT
  - The implementations of DDT and ST consider a small number of examples from each chunk
- Average size of tree: 34.60, 54.49, 9.93 and 17.97 for DT, BT, DDT and ST respectively
- Average number of leaves: 18.70, 29.59, 5.67 and 10.25 for DT, BT, DDT and ST respectively
- echo and sonar produce smaller trees with fewer leaves
  - This is due to the small number of samples in the original datasets
Discussion
Large Dataset (Yahoo!)

- BT takes a very long time to build the model, with an accuracy improvement of just 1% over DT
- Comparing sizes of trees, there is a huge difference between BT and DT
  - In both cases, DT wins
- DDT and ST results:
  - DDT improves accuracy drastically, with an increase in learning time
  - ST takes advantage of Spark: it builds the model in just a few seconds even with such a large dataset; its accuracy is comparable to DT and BT, and somewhat lower than DDT
  - At the same time, the size of tree and number of leaves are comparatively lower for ST
Conclusion

- DDT and ST outperformed DT and BT in terms of size of tree and number of leaves, with acceptable classification accuracy
- Average accuracies of DT, BT, DDT and ST over the ten selected datasets are 92.79, 99.70, 83.76 and 86.93 respectively

Table: % reduction of size of tree (sot) and number of leaves (nol) in DDT and ST with respect to BT and DT

         DDT           ST
       BT    DT      BT    DT
sot    82%   71%     67%   48%
nol    81%   70%     65%   45%

- Even with the large dataset, DDT and ST are shown to produce accurate results, with a trade-off in learning time
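A worked check of the reduction table above, using the averages from the main results table (sot: DT 34.60, BT 54.49, DDT 9.93, ST 17.97):

    reduction(DDT w.r.t. BT) = 1 - 9.93/54.49  ≈ 0.82 (82%)
    reduction(DDT w.r.t. DT) = 1 - 9.93/34.60  ≈ 0.71 (71%)
    reduction(ST w.r.t. BT)  = 1 - 17.97/54.49 ≈ 0.67 (67%)
    reduction(ST w.r.t. DT)  = 1 - 17.97/34.60 ≈ 0.48 (48%)

The nol row follows identically from the averages 18.70, 29.59, 5.67 and 10.25.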
References

[1] Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: simplified data processing on large clusters. Commun. ACM 51, 1 (January 2008), 107-113. DOI=http://dx.doi.org/10.1145/1327452.1327492
[2] Schapire, R. E. (1990). The strength of weak learnability. Machine Learning, 5(2), 197-227.
[3] Rajaraman, A., & Ullman, J. D. (2012). Mining of massive datasets (Vol. 77). Cambridge: Cambridge University Press.
[4] Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, Ian H. Witten (2009). The WEKA Data Mining Software: An Update. SIGKDD Explorations, Volume 11, Issue 1.
[5] Fan, W., Stolfo, S. J., & Zhang, J. (1999, August). The application of AdaBoost for distributed, scalable and on-line learning. In Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 362-366). ACM.
[6] Cooper, J., & Reyzin, L. Improved algorithms for distributed boosting.
[7] Dai, W., & Ji, W. (2014). A MapReduce implementation of C4.5 decision tree algorithm. International Journal of Database Theory and Application, 7(1), 49-60.
[8] Lazarevic, A., & Obradovic, Z. (2002). Boosting algorithms for parallel and distributed learning. Distributed and Parallel Databases, 11(2), 203-229.
[9] Abualkibash, M., ElSayed, A., & Mahmood, A. (2013). Highly scalable, parallel and distributed AdaBoost algorithm using light weight threads and web services on a network of multi-core machines. arXiv preprint arXiv:1306.1467.
[10] Wu, G., Li, H., Hu, X., Bi, Y., Zhang, J., & Wu, X. (2009, August). MReC4.5: C4.5 ensemble classification with MapReduce. In ChinaGrid Annual Conference, 2009 (ChinaGrid '09), Fourth (pp. 249-255). IEEE.
[11] Purdila, V., & Pentiuc, S. G. (2014). MR-Tree - a scalable MapReduce algorithm for building decision trees. Journal of Applied Computer Science & Mathematics, (16), 8.
[12] Biswanath Panda, Joshua S. Herbach, Sugato Basu, and Roberto J. Bayardo. 2009. PLANET: massively parallel learning of tree ensembles with MapReduce. Proc. VLDB Endow. 2, 2 (August 2009), 1426-1437. DOI=http://dx.doi.org/10.14778/1687553.1687569
[13] Palit, I., & Reddy, C. K. (2012). Scalable and parallel boosting with MapReduce. IEEE Transactions on Knowledge and Data Engineering, 24(10), 1904-1916. doi: 10.1109/TKDE.2011.208
[14] Jerry Ye, Jyh-Herng Chow, Jiang Chen, and Zhaohui Zheng. 2009. Stochastic gradient boosted distributed decision trees. In Proceedings of the 18th ACM conference on Information and knowledge management (CIKM '09). ACM, New York, NY, USA, 2061-2064. DOI=http://dx.doi.org/10.1145/1645953.1646301
[15] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. 2003. The Google file system. In Proceedings of the nineteenth ACM symposium on Operating systems principles (SOSP '03). ACM, New York, NY, USA, 29-43. DOI=http://dx.doi.org/10.1145/945445.945450
[16] Quinlan, J. R. (1987). Simplifying decision trees. International Journal of Man-Machine Studies, 27(3), 221-234.
[17] Han, J., Kamber, M., & Pei, J. (2011). Data mining: concepts and techniques. Elsevier.
[18] Witten, I. H., & Frank, E. (2005). Data mining: practical machine learning tools and techniques. Morgan Kaufmann.
[19] Berry, M. J., & Linoff, G. (1997). Data mining techniques: for marketing, sales, and customer support. John Wiley & Sons, Inc.
[20] Quinlan, J. R. (1990). Decision trees and decision-making. IEEE Transactions on Systems, Man and Cybernetics, 20(2), 339-346.
[21] Quinlan, J. R. (1996). Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research, 4, 77-90.
[22] Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81-106.
[23] Bowyer, K. W., Hall, L. O., Moore, T., Chawla, N., & Kegelmeyer, W. P. (2000, October). A parallel decision tree builder for mining very large visualization datasets. In Systems, Man, and Cybernetics, 2000 IEEE International Conference on (Vol. 3, pp. 1888-1893). IEEE.
[24] Shafer, J., Agrawal, R., & Mehta, M. (1996, September). SPRINT: a scalable parallel classifier for data mining. In Proc. 1996 Int. Conf. Very Large Data Bases (pp. 544-555).
[25] White, T. (2012). Hadoop: the definitive guide. O'Reilly Media, Inc.
[26] Yahoo! Webscope dataset ydata-frontpage-todaymodule-clicks-v1_0 [http://labs.yahoo.com/Academic_Relations].
Thank you for your attention!

- Any questions?
Contact Details

- Ankit Desai
  email: [email protected]
- Sanjay Chaudhary
  email: [email protected]
