Data Mining Based On Cloud-Computing Technology
Abstract. Traditional data-mining systems suffer from performance bottlenecks and scalability problems when
used in cloud computing. In this paper, we present a data-mining platform based on cloud computing.
Compared with a traditional data-mining system, this platform is highly scalable, has massive data processing
capacity, is service-oriented, and has a low hardware cost. The platform can support the design and
application of a wide range of distributed data-mining systems.
© The Authors, published by EDP Sciences. This is an open access article distributed under the terms of the Creative Commons Attribution
License 4.0 (http://creativecommons.org/licenses/by/4.0/).
MATEC Web of Conferences 61, 07015 (2016) DOI: 10.1051/matecconf/20166107015
APOP2016
data mining because of its huge storage capacity and elastically scalable computing capability.

The Hadoop cloud computing framework is a widely used open-source distributed system architecture [12].
With it, users can easily build a private cloud platform. Because users do not need to understand the
underlying details of distributed application development, they can make full use of the cluster's
computing power and high-speed storage.

Current cloud computing data analysis and processing widely uses distributed development frameworks
similar to MapReduce. Such a framework can execute massive data collection and analysis tasks in parallel
on a large number of PCs. The model abstracts the complex operations of large-scale cluster parallel
computing into two functions: Map and Reduce [12].

In the Map stage, the Map/Reduce framework splits the input data into a large number of data segments,
and each segment is assigned to a Map task. Each Map task processes the Key-Value pairs assigned to it
and generates intermediate results; all intermediate Values that share the same Key are then passed to
the Reduce function.

In the Reduce stage, each Reduce task takes Key-Value two-tuples as input. For each tuple, the Reduce
function is called to merge the Values into a smaller set of Values, and each Reduce function call
outputs only 0 or 1 value. Every stage of task execution supports fault tolerance: if one or more nodes
fail during the computation, their tasks are automatically reallocated to other nodes.

This paper designs a data mining system based on cloud computing technology; its overall structure is
shown in Figure 3. The nodes in the system are divided into two categories: MainCtrlNode and WorkerNode.
The MainCtrlNode consists of the NameNode, the data warehouse, the JobTracker, the SecondaryNameNode,
and the data mining algorithms library. Each WorkerNode consists of a TaskTracker and a DataNode and is
responsible for the actual storage and computation work. The NameNode manages the file system metadata;
it is the main server of the distributed file system and implements file system operations such as open,
close, and rename. The DataNode handles client read and write requests, stores the actual data, and,
following NameNode commands, copies, deletes, and creates data blocks. When the data set to be mined is
uploaded to the data warehouse, the NameNode automatically splits the files into blocks and stores them
redundantly across the DataNodes. The SecondaryNameNode assists the NameNode in processing image files
and transaction logs.
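
The Map and Reduce stages and the reallocation of tasks away from failed nodes, as described in this section, can be illustrated with a small single-process sketch. It is not Hadoop code and not the authors' implementation: the word-count job, the function names, the worker names, and the reassignment policy are all assumptions chosen for illustration.

```python
from collections import defaultdict


def split_input(records, num_segments):
    """Split the input records into segments, one per Map task."""
    return [records[i::num_segments] for i in range(num_segments)]


def map_task(segment):
    """Map: emit an intermediate (key, value) pair for every word."""
    return [(word.lower(), 1) for line in segment for word in line.split()]


def shuffle(pairs):
    """Group all intermediate values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_task(key, values):
    """Reduce: merge the values of one key into a single output value."""
    return key, sum(values)


def run_job(records, nodes, failed_nodes=frozenset()):
    """Run Map, shuffle, and Reduce in-process; return results and task placement."""
    healthy = [n for n in nodes if n not in failed_nodes]
    if not healthy:
        raise RuntimeError("no healthy worker node available")
    intermediate, assignments = [], {}
    for task_id, segment in enumerate(split_input(records, len(nodes))):
        node = nodes[task_id]
        if node in failed_nodes:
            # Hypothetical policy: reassign the task to a healthy node.
            node = healthy[task_id % len(healthy)]
        assignments[task_id] = node
        intermediate.extend(map_task(segment))
    results = dict(reduce_task(k, v) for k, v in shuffle(intermediate).items())
    return results, assignments


lines = ["cloud data mining", "data mining platform", "cloud platform"]
workers = ["worker1", "worker2", "worker3"]
word_counts, placement = run_job(lines, workers, failed_nodes={"worker2"})
```

Here the task originally bound for the failed `worker2` is moved to another worker, while the final counts are unaffected. A real framework would run the tasks on separate machines and detect failures at run time.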
[Figure 3 (diagram): a MainCtrlNode containing the data mining algorithms library, data warehouse,
NameNode, SecondaryNameNode, and JobTracker; and WorkerNode 1, WorkerNode 2, …, WorkerNode n, each
containing a TaskTracker and a DataNode.]
Fig. 3 The overall architecture of the data mining system based on cloud computing technology
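
As a rough illustration of how a NameNode might split an uploaded file into blocks and store each block redundantly on several DataNodes, the sketch below uses a simple round-robin placement. The block size, the replication factor, the node names, and the round-robin policy are assumptions made for this example; real HDFS uses much larger blocks and a rack-aware placement policy.

```python
def split_into_blocks(data, block_size):
    """Cut the uploaded bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, data_nodes, replication=2):
    """Assign each block to `replication` distinct DataNodes, round-robin."""
    if replication > len(data_nodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            data_nodes[(block_id + r) % len(data_nodes)]
            for r in range(replication)
        ]
    return placement


def nodes_for_read(placement, block_id, failed_nodes=frozenset()):
    """Return the DataNodes that can still serve a block after some nodes fail."""
    return [n for n in placement[block_id] if n not in failed_nodes]


data = b"x" * 10  # stand-in for an uploaded data set
blocks = split_into_blocks(data, block_size=4)
data_nodes = ["datanode1", "datanode2", "datanode3"]
block_placement = place_replicas(len(blocks), data_nodes, replication=2)
```

With two replicas per block, `nodes_for_read(block_placement, 0, {"datanode1"})` still returns a surviving DataNode, which is the point of redundant storage.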
6 Conclusion
The massive data and the powerful computing and data
processing capabilities of cloud computing provide
strong support for data mining. Based on an analysis
of data mining and cloud computing technology,
this paper proposes an architecture for a data mining
platform based on cloud computing, which offers a good
solution for the data mining tasks of enterprises and
individual users.
References
1. J Han, M Kamber. Data mining: concepts and
techniques[M]. Third Edition. San Francisco,
CA, USA: Morgan Kaufmann Publishers, 2012.
2. Shao Feng-jing, Yu Zhong-qing. Principle and
algorithm of data mining[M]. Beijing: Science
Press, 2009.
3. Shang Lin, Luo Bin. A data mining system based on
Data Warehouse Framework[J]. Application
Research of computers, 2000, 17(9): 63-65.
4. Yang Yong, Dong Zhen-jiang, Lu Ping. With the
characteristics of cloud computing service delivery
platform and its key technology research[J]. ZTE
Communications, 2011, 17(5): 55-57.
5. Wu Zhu-hua. The analysis of the core technology of
cloud computing[M]. Beijing: People's Posts and
Telecommunications Press, 2011.
6. Mell P, Grance T. The NIST Definition of Cloud
Computing[R]. Gaithersburg, MD: National
Institute of Standards and Technology, 2011.
7. Zhang Jian-xun, Gu Zhi-ming, Zheng Chao. Review on
research progress of cloud computing.
2010, 27(2): 429-433.
8. Chen Quan, Deng Qian-ni. Cloud computing and its
key technology[J]. The computer
applications, 2009, 29(9): 2562-2567.
9. Li Jian-jiang, Cui Jian, Wang Pin. MapReduce parallel
programming model of review[J]. Chinese Journal of
Electronics, 2011(11): 2635-2642.
10. Wang Yi-jie, Sun Wei-dong, Zhou Song. The key
technology of distributed storage in cloud computing
environment[J]. Journal of software, 2012, 23(4): 962.
11. Wang Cong, Wang Cui-rong, Wang Xing-wei. The
design of data center network architecture for Cloud
Computing[J]. Research and development of
computer, 2012, 49(2): 286-293.
12. Hang He, Yi Xiao-dong, Li Shan-shan. Realization and
evaluation of massive data processing platform for
high performance computer[J]. Research and
development of computer, 2012, 49: 357-361.