MATEC Web of Conferences 61, 07015 (2016) DOI: 10.1051/matecconf/20166107015


APOP2016

Data Mining Based on Cloud-Computing Technology

Ren Ying1, Lv Hong1, Li Hua-wei2, Zhou Li-jun1, Wang Li-na1

1 Naval Aeronautical and Astronautical University, Yantai 264000, China
2 Shandong Business Institute, Yantai, Shandong 264001, China

Abstract. Traditional data-mining systems suffer from performance bottlenecks and scalability problems when they are used in cloud computing environments. In this paper, we present a data-mining platform based on cloud computing. Compared with a traditional data-mining system, this platform is highly scalable, has massive data-processing capacity, is service-oriented, and has a low hardware cost. The platform can support the design and application of a wide range of distributed data-mining systems.

1 Introduction


With the rapid development of the mobile Internet and the Internet of Things, huge amounts of data are produced every minute. Data has penetrated every field of industry and every business function. In the age of big data, if we want to extract implicit, useful information from incomplete, massive, noisy, and random data, we must improve the efficiency of data mining algorithms. Cloud computing provides a computing platform with dynamic resources, virtualization, and high availability. Applying cloud computing to data mining can solve the efficiency problem of mining massive data.

[Figure 1 diagram: Software as a Service (SaaS): software license and provision of the service; Platform as a Service (PaaS): IT application requirements; Infrastructure as a Service (IaaS): the IT application infrastructure]

Figure 1. The service model of cloud computing


2 Data mining and cloud computing technology

Data mining[1-3] is the technology of finding valuable information in large amounts of data by analyzing the data. The data mining process based on cloud computing technology is basically consistent with the traditional process, which consists of three stages: data preparation, data mining, and evaluation and interpretation of the results. As the information age produces ever "bigger" data, data mining tasks keep evolving, with an emphasis on efficient and scalable data mining technology for large databases.

Cloud computing[4-5] distributes tasks across a resource pool made up of a large number of computers, so that all applications can obtain computing power, storage space, and information services according to their needs. At the same time, cloud computing is a development of distributed computing, grid computing, and parallel computing[6]. Cloud computing usually consists of the following three levels of service: SaaS, IaaS, and PaaS. The service model is shown in Figure 1.

Cloud computing technology requires compute nodes and storage nodes to be placed together. Task scheduling assigns each task, as far as possible, to the machine that stores the corresponding input file block. In this way most parallel tasks read their input data from the local machine, which effectively reduces network data traffic[7].

Distributed computing is one of the effective means of handling massive data mining tasks and improving the efficiency of massive data mining[8]. A cloud computing platform provides distributed file storage and parallel computing capabilities, which address the two levels, distributed storage and parallel computing, that distributed computing involves[9]. A good framework for building a cloud computing data mining platform therefore needs the core support of a distributed file system and distributed parallel computing[10].

Popular distributed file systems include the Google File System (GFS), the Hadoop Distributed File System (HDFS), and the Kosmos File System (KFS), which can effectively solve the problem of massive data storage.
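
The locality-aware task scheduling described in this section can be illustrated with a minimal sketch. It is not the scheduler of any particular platform: the function name `schedule_tasks`, the node names, and the slot counts are all hypothetical, and the greedy strategy is only one simple way to prefer nodes that already store a task's input block.

```python
def schedule_tasks(block_locations, free_slots):
    """Assign each input block to a node, preferring nodes that store it.

    block_locations: dict block_id -> list of node names holding a replica
    free_slots: dict node name -> number of tasks the node can still accept
    Returns dict block_id -> (node, is_local).
    """
    slots = dict(free_slots)  # work on a copy so the caller's dict is untouched
    assignment = {}
    for block_id, replicas in block_locations.items():
        # First choice: a node that already stores the block (no network read).
        local = [n for n in replicas if slots.get(n, 0) > 0]
        if local:
            node = max(local, key=lambda n: slots[n])
            is_local = True
        else:
            # Fallback: any node with capacity; the block is read over the network.
            candidates = [n for n, s in slots.items() if s > 0]
            if not candidates:
                raise RuntimeError("no free task slots left")
            node = max(candidates, key=lambda n: slots[n])
            is_local = False
        slots[node] -= 1
        assignment[block_id] = (node, is_local)
    return assignment


# Hypothetical cluster: three blocks, four nodes with different free capacity.
blocks = {"b0": ["node1", "node2"], "b1": ["node1"], "b2": ["node3"]}
plan = schedule_tasks(blocks, {"node1": 2, "node2": 1, "node3": 0, "node4": 1})
```

In this example two of the three tasks run on a node that holds their input block; only `b2` must be read remotely, because its single replica sits on a node with no free slot.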

© The Authors, published by EDP Sciences. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0 (http://creativecommons.org/licenses/by/4.0/).

3 Cloud based data mining algorithm

The algorithm is the soul of data mining: only an efficient data mining algorithm can accomplish a data mining task well. However, there are many kinds of data mining algorithms and many data types, and different data types place different requirements on the algorithm. The most commonly used data mining algorithms fall into the following categories.

3.1 Classification algorithm

The main purpose of a classification algorithm is to analyze an existing data set and derive from it the principle by which the data are classified. This principle can then be used to classify data that are added later. Classification algorithms are suitable for relational data consisting of tuples.

[Figure 2 flowchart: Begin -> Create a mining task -> Set the data header files and algorithm parameters -> Start mining task -> Query and display the results -> End]

Fig 2 General execution process of parallel data mining algorithms
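
As a small illustration of classification on relational tuples (Section 3.1), the sketch below labels a new tuple from the labels of existing ones. The paper does not name a specific classifier; k-nearest neighbours is used here only as an assumed example, and the function name `knn_classify` and the sample data are hypothetical.

```python
from collections import Counter
import math


def knn_classify(labelled, query, k=3):
    """Predict the class of `query` from the k closest labelled tuples."""
    # Sort existing tuples by Euclidean distance to the new tuple.
    nearest = sorted(labelled, key=lambda row: math.dist(row[0], query))[:k]
    # The most frequent label among the k neighbours is the prediction.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Existing data set: (attribute tuple, class label). Values are made up.
training = [
    ((1.0, 1.2), "low"),
    ((0.8, 1.0), "low"),
    ((1.1, 0.9), "low"),
    ((5.0, 5.2), "high"),
    ((4.8, 5.1), "high"),
    ((5.3, 4.9), "high"),
]
prediction = knn_classify(training, (4.9, 5.0))
```

Here the existing tuples play the role of the analyzed data set, and the call on `(4.9, 5.0)` corresponds to classifying a tuple added afterwards.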

3.2 Cluster analysis

The main purpose of cluster analysis is to find new, meaningful data distribution patterns in the data. No grouping rules are specified in advance; instead, the existing data are divided into different groups according to the characteristics of the data themselves. Cluster analysis is also used for relational data consisting of tuples.

3.3 Association rules

The main purpose of association rules is to find interesting associations or correlations between sets of items in large amounts of data. Association rules apply to a relatively wide range of data types, mainly transaction data and relational data. They are best suited to Boolean and numerical variables.

Parallel mining algorithms are one of the key technologies for making effective use of the basic capabilities provided by a cloud computing platform. The general process of a parallel data mining algorithm is shown in Figure 2.

4 Data mining system based on cloud computing

The data mining system based on cloud computing is built on the "cloud". It transparently provides interface services to a variety of terminal users and provides an open interface for programs developed on top of the system. Users can use the various services indirectly by calling the open interface provided by the system. Users do not need to know how the system is implemented or worry about its computing and storage capacity. They only need to select the appropriate algorithm to process the data and submit the task to the system, which executes it in its deployment area and returns the data mining results[11].

The data mining system based on cloud computing can adopt an on-demand payment model. Enterprises or individuals can obtain a service directly through the platform without having to buy expensive software. Since the arrival of the cloud era, data are mostly stored in storage clouds, which makes data mining tools built on a cloud computing platform feasible.
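
The way a user might drive such a system through its open interface, following the steps of Figure 2 (create a task, set the data and algorithm parameters, start the task, query the results), can be sketched as follows. The paper does not specify the interface, so `MiningService`, its method names, and the `item_frequency` algorithm are hypothetical, and the service is an in-memory stand-in rather than a real cloud endpoint.

```python
from collections import Counter

# Hypothetical algorithm library; the paper does not define these names.
ALGORITHMS = {
    "item_frequency": lambda rows, min_count=1: {
        item: n
        for item, n in Counter(i for row in rows for i in row).items()
        if n >= min_count
    },
}


class MiningService:
    """In-memory stand-in for the platform's open interface."""

    def __init__(self):
        self._tasks = {}

    def create_task(self, name):
        # Step 1: create a mining task and hand back its identifier.
        task_id = len(self._tasks) + 1
        self._tasks[task_id] = {"name": name, "state": "created", "result": None}
        return task_id

    def configure(self, task_id, data, algorithm, **params):
        # Step 2: attach the data set and the algorithm parameters.
        self._tasks[task_id].update(data=data, algorithm=algorithm, params=params)

    def start(self, task_id):
        # Step 3: run the selected algorithm; the caller never sees how.
        task = self._tasks[task_id]
        run = ALGORITHMS[task["algorithm"]]
        task["result"] = run(task["data"], **task["params"])
        task["state"] = "finished"

    def result(self, task_id):
        # Step 4: query the mining result.
        return self._tasks[task_id]["result"]


service = MiningService()
task = service.create_task("basket counts")
service.configure(task, [["milk", "bread"], ["milk"], ["bread", "milk"]],
                  "item_frequency", min_count=2)
service.start(task)
counts = service.result(task)
```

The caller only chooses an algorithm and supplies data; storage and computation stay hidden behind the interface, which is the separation the section describes.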

5 Cloud computing system architecture based on Data Mining

The cloud is a computational model based on the Internet and public participation. Its computational resources, including computing power and storage capacity, are scalable and virtualized, and are provided to users as services. As massive data keep growing and the demand for diversified and personalized data mining becomes stronger, traditional centralized data mining methods can no longer adapt. Cloud computing has become an efficient way to solve the problem of massive
data mining because of its huge storage capacity and elastically scalable computing ability.

The Hadoop cloud computing framework is a widely used open source distributed system architecture[12]. Users can easily build a private cloud platform with it. Without needing to understand the underlying details of distributed application development, users can make full use of the computing power and high-speed storage of a cluster.

Data analysis and processing in cloud computing currently makes wide use of distributed development frameworks similar to MapReduce, which can execute massive data collection and analysis tasks in parallel on a large number of PCs. This model abstracts the complex operations of parallel computing on a large-scale cluster into two functions: Map and Reduce[12].

In the Map stage, the Map/Reduce framework splits the input data into a large number of data segments, and each segment is assigned to a Map task. Each Map task processes the Key-Value pairs assigned to it and generates intermediate results. All intermediate Values that share the same Key are then passed to the Reduce function.

In the Reduce stage, each Reduce task takes Key-Value pairs as input. The Reduce function is called on these pairs to merge the Values into a smaller set of Values, and each call to the Reduce function outputs only zero or one value. Every stage of task execution supports fault tolerance: if one or more nodes fail during the computation, their tasks are automatically reassigned to other nodes.
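
The Map and Reduce stages described in this section can be sketched in a few lines. This is a single-process illustration of the programming model only, not Hadoop code: the function names, the item-counting job, and the way the input is split into segments are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(segment):
    """Map task: emit an (item, 1) pair for every item in one data segment."""
    return [(item, 1) for transaction in segment for item in transaction]


def shuffle(pairs):
    """Group intermediate Values by Key before they reach the Reduce tasks."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce task: merge all Values for one Key into a single smaller Value."""
    return key, sum(values)


def run_job(transactions, n_segments=2):
    # Split the input into segments; each segment would go to one Map task.
    segments = [transactions[i::n_segments] for i in range(n_segments)]
    intermediate = [pair for seg in segments for pair in map_phase(seg)]
    # One Reduce call per Key, each producing exactly one output value.
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


totals = run_job([["a", "b"], ["a", "c"], ["a", "b", "c"]])
```

On a real cluster the Map tasks and Reduce calls would run on different nodes and failed tasks would be rescheduled; here they run sequentially so that the data flow stays visible.
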
[Figure 3 diagram: MainCtrlNode containing the data mining algorithms library, data warehouse, NameNode, SecondaryNameNode, and JobTracker; WorkerNode1, WorkerNode2, ..., WorkerNoden, each containing a TaskTracker and a DataNode]

Fig 3 The overall architecture of the data mining system based on cloud computing technology
This paper designs a data mining system based on cloud computing technology. The overall structure is shown in Figure 3. The nodes in the system are divided into two categories: MainCtrlNode and WorkerNode. The MainCtrlNode consists of the NameNode, the data warehouse, the JobTracker, the SecondaryNameNode, and the data mining algorithms library. Each WorkerNode consists of a TaskTracker and a DataNode and is responsible for the actual storage and computational work. The NameNode manages the file system metadata; it is the main server of the distributed file system and implements file system operations such as open, close, and rename. The DataNode handles client read and write requests, stores the actual data, and creates, deletes, and copies data blocks according to the NameNode's commands. When a data set to be mined is uploaded to the data warehouse, the NameNode automatically splits the file into blocks and stores them redundantly across the DataNodes. The SecondaryNameNode assists the NameNode in processing the image file and the transaction log.
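
The block splitting and redundant storage that the NameNode performs can be illustrated with a simplified sketch. The round-robin placement below is an assumption chosen for brevity and is not the placement policy of HDFS, which also takes rack topology into account; the function name `place_blocks`, the sizes, and the DataNode names are hypothetical.

```python
import itertools


def place_blocks(file_size, block_size, data_nodes, replication=3):
    """Split a file into blocks and pick `replication` distinct DataNodes per block."""
    if replication > len(data_nodes):
        raise ValueError("not enough DataNodes for the requested replication")
    n_blocks = -(-file_size // block_size)  # ceiling division
    ring = itertools.cycle(data_nodes)      # simple round-robin over the nodes
    placement = {}
    for block_id in range(n_blocks):
        # Consecutive picks from the ring are distinct while replication <= node count.
        placement[block_id] = [next(ring) for _ in range(replication)]
    return placement


# Illustrative numbers only: a 300-unit file, 128-unit blocks, two replicas each.
layout = place_blocks(file_size=300, block_size=128,
                      data_nodes=["dn1", "dn2", "dn3", "dn4"], replication=2)
```

The resulting mapping from block number to DataNodes is the kind of metadata the NameNode keeps, while the DataNodes hold the block contents themselves.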


6 Conclusion
The massive data and the powerful computing and data processing capabilities of cloud computing provide strong support for data mining. Based on an analysis of data mining and cloud computing technology, this paper proposes an architecture for a data mining platform based on cloud computing, which provides a good solution for the data mining tasks of enterprises and individual users.

References
1. J Han, M Kamber. Data Mining: Concepts and Techniques[M]. Third Edition. San Francisco, CA, USA: Morgan Kaufmann Publishers, 2012.
2. Shao feng-jing, Yu zhong-qing. Principle and algorithm of data mining[M]. Beijing: Science Press, 2009.
3. Shang Lin, Luo Bin. A data mining system based on Data Warehouse Framework[J]. Application Research of Computers, 2000, 17(9): 63-65.
4. Yang Yong, Dong zhen-jiang, Lu Ping. With the characteristics of cloud computing service delivery platform and its key technology research[J]. ZTE Communications, 2011, 17(5): 55-57.
5. Wu zhu-hua. The analysis of the core technology of cloud computing[M]. Beijing: People's Posts and Telecommunications Press, 2011.
6. Mell P, Grance T. The NIST Definition of Cloud Computing[R]. Gaithersburg, MD: National Institute of Standards and Technology, 2011.
7. Zhang jian-xun, Gu zhi-ming, Zheng chao. Review on research progress of cloud computing. 2010, 27(2): 429-433.
8. Chen Quan, Deng qian-ni. Cloud computing and its key technology[J]. The computer applications, 2009, 29(9): 2562-2567.
9. Li jian-jiang, Cui jian, Wang pin. MapReduce parallel programming model of review[J]. Chinese Journal of Electronics, 2011(11): 2635-2642.
10. Wang yi-jie, Sun wei-dong, Zhou Song. The key technology of distributed storage in cloud computing environment[J]. Journal of software, 2012, 23(4): 962.
11. Wang Cong, Wang cui-rong, Wang xing-wei. The design of data center network architecture for Cloud Computing[J]. Research and development of computer, 2012, 49(2): 286-293.
12. Hang He, Yi xiao-dong, Li shan-shan. Realization and evaluation of massive data processing platform for high performance computer[J]. Research and development of computer, 2012, 49: 357-361.
