
Data Sets Preparing For Data Mining Analysis by SQL Horizontal Aggregation

The document discusses using horizontal aggregation to prepare data sets for data mining analysis by transforming rows into columns using SQL queries. It proposes three horizontal aggregation operators - SPJ, PIVOT and CASE - to automatically generate SQL code for obtaining data in a horizontal format suitable for data mining. This approach aims to reduce the manual effort required compared to traditional aggregation functions.
ISSN (Online): 2349-7084

GLOBAL IMPACT FACTOR 0.238


INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING IN RESEARCH TRENDS VOLUME 1, ISSUE 4, OCTOBER
2014, PP 225-229

Data sets preparing for Data mining analysis by SQL Horizontal Aggregation

V. Nikitha¹, P. Jhansi², K. Neelima³, D. Anusha⁴
Department of IT, G. Pullaiah College of Engineering and Technology, Kurnool
JNTU Anantapur, Andhra Pradesh, India

Abstract: - Preparing data sets is an essential step in data mining analysis, but it is the most time-consuming one and requires a great deal of manual effort. Data mining is a widely used domain for obtaining patterns from historical or stored data, and considerable effort is needed to prepare the data sets that serve as input to a data mining algorithm. Existing aggregate functions such as MAX, MIN, SUM, COUNT and AVG are not efficient for building data sets for data mining analysis, because they return a single value per aggregated group in a table. When data mining analysis needs data in a horizontal layout, producing it takes arduous effort. We are therefore developing a simple yet powerful tool that generates SQL code returning aggregated columns in a horizontal layout, that is, a set of numbers rather than one number per row. This new class of functions is called horizontal aggregation. From these queries we obtain output data suitable for various data mining operations. This paper provides horizontal aggregation using constructs built on SQL queries. The method takes three parameters: the grouping column, the horizontal column and the aggregation column. The user provides these as input and obtains output suitable for data mining analysis.

Keywords:- PaaS, Private Cloud, Middleware, load balancing, resumption of work, E-learning.

1. INTRODUCTION

Data mining is a tool used to extract useful information from data sets. In a database, data is stored in normalized form, so a large amount of effort is required to prepare summarized data sets as input for a data mining algorithm. Most algorithms require data sets in a horizontal layout, which is not how the available data is stored. This is a problem for models such as clustering, classification, regression and various other algorithms. Different research areas use different terms to describe data sets. This paper presents a new class of aggregate functions that users can employ to build data sets in horizontal format. It helps automate the writing of SQL code and extends existing SQL capabilities. A data mining algorithm requires its input in the form of a table, and extra effort is needed to make a relational database return the data in summarized form. To obtain the details of a specific application for further analysis, data is required in denormalized format. Using standard SQL queries, users can apply various aggregate functions to tables and obtain the output in vertical or horizontal format [6].

This paper describes three horizontal aggregation operators: SPJ, PIVOT and CASE. SPJ aggregation uses the standard SQL constructs of selection, projection and join. PIVOT is a built-in operator in some relational DBMSs and is used to transform rows into columns. The CASE method is performed by combining GROUP BY with the CASE statement [9], through which we supply the condition. We thus extend the traditional functionality of CASE, PIVOT and SPJ to obtain results in a horizontal layout. We observed that evaluating horizontal aggregations is complex and that preparing data sets this way is not efficient, which makes it a challenging problem. We therefore introduce different strategies for evaluating them efficiently. They are helpful for preparing data sets in horizontal layout format.
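The difference between the vertical and horizontal layouts discussed in this section can be sketched with a small example. This is a minimal illustration and not the authors' implementation: the `sales` table, its columns and its rows are hypothetical, and SQLite (through Python's `sqlite3` module) stands in for the DBMS. The horizontal query uses GROUP BY combined with CASE.

```python
import sqlite3

# Hypothetical input table: one row per sale (assumed data, not from the paper).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, quarter TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("A", "Q1", 10.0), ("A", "Q1", 5.0), ("A", "Q2", 7.0), ("B", "Q1", 3.0)],
)

# Vertical aggregation: one row (and one value) per (store, quarter) group.
vertical = conn.execute(
    "SELECT store, quarter, SUM(amount) FROM sales "
    "GROUP BY store, quarter ORDER BY store, quarter"
).fetchall()

# Horizontal aggregation with GROUP BY + CASE: one row per store and one
# column per quarter. CASE without ELSE yields NULL, and SUM skips NULLs,
# so a store with no rows for a quarter ends up with NULL in that column.
horizontal = conn.execute(
    "SELECT store, "
    "SUM(CASE WHEN quarter = 'Q1' THEN amount END) AS q1, "
    "SUM(CASE WHEN quarter = 'Q2' THEN amount END) AS q2 "
    "FROM sales GROUP BY store ORDER BY store"
).fetchall()

print(vertical)    # three rows, one per group
print(horizontal)  # two rows, one per store
```

The vertical result has one row per group, while the horizontal result has one row per store with the quarters spread across columns, which is the tabular shape most data mining algorithms expect.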

2. MOTIVATION
Horizontal aggregation can produce data sets that are useful in various real-time applications, but it takes more time to deliver the output in the required format. The task needs large and complicated SQL code that is very difficult to remember, and it needs a large manual effort. Two important techniques are used in SQL code: joins and aggregations. Aggregation is generally used to obtain data sets in summarized form, so we introduce aggregation directly [2]. Aggregation is defined as a collection or gathering of things together, considered as a whole. Oracle provides a number of predefined aggregate functions such as MAX, MIN, SUM, AVG and COUNT for performing operations on data, and among these the SUM function is the most commonly used. MAX returns the maximum value, MIN returns the minimum value, AVG returns the average of the values, and COUNT counts the number of rows.

There are certain limitations in preparing data sets for data mining analysis using aggregate functions. Data sets are commonly stored in an online database and come from real-time or online transaction processing systems, where the tables are highly normalized. But various data mining, machine learning and statistical algorithms require data in summarized format. When the user needs data in a horizontal tabular format, a large amount of effort is required using the functions currently available in SQL, and the user does not directly get the output needed by a data mining algorithm. This effort is due to the large amount of SQL code and its complexity. There are other difficulties in obtaining aggregate functions in horizontal layout. Some OLAP tools are used to transpose the result; this is generally called PIVOT. PIVOT is more useful if it can provide both aggregation and the transposition of rows into columns together. It is very difficult to obtain the data sets when there is a large number of rows in the database. Considering all these limitations, we introduce a new class of aggregate functions that aggregate the numeric values of a given expression and transpose rows into columns to produce output in horizontal format. Horizontal aggregation is a form of extension to existing SQL aggregation. Traditional aggregation returns a single value per row, whereas horizontal aggregation returns a set of values [8].

Fig 1. Data integration models Architecture

Fig 2. PIVOT architecture

3. AGGREGATION
A database is a collection of a large amount of data. Structured Query Language (SQL) is used to extract relevant information from various kinds of sources, and it is mainly used for the aggregation of large amounts of data. Aggregation is used to combine or aggregate rows over a number of columns. Various aggregate functions are used to obtain information in summarized form. Put simply, aggregation is a collection of many things grouped together and considered as a whole. In database management, an aggregate function is a function where the values of multiple rows are grouped together as input on certain criteria to form a single value of more significant meaning or measurement, such as a set, a bag or a list.

A. Vertical Aggregation
Standard SQL aggregation is vertical aggregation. In vertical aggregation the results are projected in a vertical layout, and the result contains a larger number of rows.

B. Horizontal Aggregation
In horizontal aggregation the result is produced in a horizontal layout. To represent output in horizontal

format, small syntax extensions to aggregate functions are needed. In contrast, we call standard SQL aggregations vertical aggregations, since they produce tables with a vertical layout [6]. A problem with horizontal aggregation is that the number of columns may exceed the number of columns the DBMS allows. This means reaching the maximum number of columns or the maximum column-name length.

4. LITERATURE SURVEY

A. SPJ Method
Left outer join queries are used to join all the projected tables. The SPJ method can produce tables in horizontal layout, and an optimized SPJ method can produce a more efficient result. The performance of the SPJ approach is very low when there is a large number of rows. It performs aggregation with basic SQL queries, which makes it easy to support on any database. The SPJ method is interesting from a theoretical point of view because it is based on relational operators only. The main idea is to create one table with a vertical aggregation for each result column and then join all those tables to produce the horizontal table [6]. We aggregate the table using select, project, join and aggregation queries, so the SPJ method relies on standard relational algebra operators. We can use left outer join, right outer join and inner join [9].

B. CASE Method
This method uses the CASE programming construct available in SQL. CASE returns a value selected from a set of values based on Boolean expressions. The CASE statement sets the result to NULL when no matching row is found. This method also produces the result table in horizontal layout. There are two basic strategies to compute the horizontal result table FH [6], in a similar way to SPJ: the first aggregates directly from the input table F, and the second computes the vertical aggregation in a temporary table FV and then computes the horizontal aggregations from FV. The CASE method is performed by combining GROUP BY with the CASE statement. It is more efficient and has wide applicability. We present the direct aggregation method through horizontal aggregation queries. To overcome the problems of the existing system, we propose horizontal aggregations, which provide several unique features and benefits. They give us a pattern for generating SQL code: the SQL code is produced without writing it by hand, which reduces the effort of writing it, minimizing it and testing whether it is correct [9].

C. PIVOT Method
PIVOT is a built-in operator in some DBMSs. It transforms rows into columns, which is known as transposition, and thereby helps produce the output in horizontal form. The PIVOT method mainly needs to determine how many columns are required to store the transposed table, and it is used together with the GROUP BY clause. A single PIVOT operator cannot always be used; in that case we have to use the CASE and SPJ methods. The PIVOT operator is used with a standard SELECT statement through a small syntax extension. It performs well even if the data set is very large. Its most important advantage is that it can overcome the upper-limit restriction of the DBMS.

5. EXISTING SYSTEM
The existing system contains the SPJ, CASE and PIVOT operators. Using SPJ, CASE and PIVOT we can obtain results in horizontal layout, but the user cannot use SPJ or CASE alone; the PIVOT operator is needed for transposition. The code for PIVOT is long and laborious, so it is not efficient for data mining algorithms and is a time-consuming task. In the existing system, creating a data set for analysis usually takes the most time in a data mining project: it needs many complex SQL queries, joining tables and aggregating columns, so it becomes a very time-consuming task. Existing SQL aggregations have limitations for preparing data sets for data mining because they return just one column per aggregated group. With existing SQL aggregations, significant manual effort is needed to build data sets where a horizontal layout is required.
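The SPJ method surveyed in Section 4, and the idea of generating the SQL from a grouping column, a horizontal column and an aggregation column, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' tool: `spj_query` and `pivot_query` are hypothetical helpers, the `sales` table and its rows are invented, and SQLite stands in for the DBMS. The PIVOT text follows the SQL Server style of the operator and is only printed, because SQLite has no PIVOT operator.

```python
import sqlite3

def spj_query(table, group_col, horiz_col, agg_col, values):
    """Hypothetical helper: build an SPJ-style horizontal SUM query.

    Names and values are interpolated straight into the SQL text, so this
    sketch is only suitable for trusted identifiers and values.
    """
    select_cols = [f"f0.{group_col}"]
    joins = []
    for i, value in enumerate(values, start=1):
        # One vertical aggregation per result column ...
        sub = (
            f"(SELECT {group_col}, SUM({agg_col}) AS s FROM {table} "
            f"WHERE {horiz_col} = '{value}' GROUP BY {group_col}) AS f{i}"
        )
        # ... left-outer-joined to F0 so groups without matching rows stay (as NULL).
        joins.append(f"LEFT OUTER JOIN {sub} ON f0.{group_col} = f{i}.{group_col}")
        select_cols.append(f"f{i}.s AS {horiz_col}_{value}")
    select_list = ", ".join(select_cols)
    join_list = " ".join(joins)
    return (
        f"SELECT {select_list} "
        f"FROM (SELECT DISTINCT {group_col} FROM {table}) AS f0 "
        f"{join_list} ORDER BY f0.{group_col}"
    )

def pivot_query(table, group_col, horiz_col, agg_col, values):
    """Hypothetical helper: SQL Server-style PIVOT text for the same result."""
    cols = ", ".join(f"[{v}]" for v in values)
    return (
        f"SELECT {group_col}, {cols} "
        f"FROM (SELECT {group_col}, {horiz_col}, {agg_col} FROM {table}) AS src "
        f"PIVOT (SUM({agg_col}) FOR {horiz_col} IN ({cols})) AS p"
    )

# Assumed sample data, not taken from the paper.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, quarter TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("A", "Q1", 10.0), ("A", "Q1", 5.0), ("A", "Q2", 7.0), ("B", "Q1", 3.0)],
)

params = ("sales", "store", "quarter", "amount", ["Q1", "Q2"])
spj_rows = conn.execute(spj_query(*params)).fetchall()
print(spj_rows)
print(pivot_query(*params))
```

Each extra value of the horizontal column adds another subquery and another join to the SPJ query, which is consistent with the survey's remark that SPJ performs poorly on large inputs.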

Fig3: Existing System

Suppose we have relations R1, ..., Rk. Using CASE, SPJ and PIVOT we can compute a vertical tabular form. Using CASE and SPJ alone we cannot easily obtain a table in horizontal format; the PIVOT operator is needed for transposition.

Disadvantages
1. Existing SQL aggregations have limitations for preparing data sets.
2. They return one column per aggregated group.
3. Manual effort is required to build data sets.
4. Vertical aggregation increases the number of rows and columns, and thus the complexity.

6. PROPOSED SYSTEM
To overcome these problems in the existing system, we propose horizontal aggregations, which provide several unique features and benefits. They give a pattern for generating SQL code from this method: we obtain the SQL code without writing it by hand, which reduces the effort of writing it, minimizing it and testing whether it is correct.

Fig.2: Proposed System

7. CONCLUSION
We proposed a new class of extended aggregate functions, an extension to standard aggregate functions called horizontal aggregations, which provide an efficient means of preparing data sets that can be given as input to various data mining algorithms. An output table with horizontal layout is more efficient for building the data sets that data mining analysis mostly requires. This approach returns a set of numbers rather than one number for each group. We analyzed three query evaluation methods. The first, SPJ, is based on standard relational operators. The second, CASE, is based on the SQL CASE construct. The third, PIVOT, is a built-in operator that is present only in some commercial DBMSs and is not widely available. The SPJ method consists of selection, projection and join queries. The CASE method combines GROUP BY and CASE statements. We verified that all three methods give the same result. The proposed horizontal aggregations can be used as a database method to automatically generate efficient SQL queries from three sets of parameters: grouping columns, horizontal columns and the aggregation column. The data obtained from horizontal aggregation is analyzed with the help of the aggregation column, grouping columns and horizontal columns to generate the output. This paper presented horizontal aggregation through the SPJ, CASE and PIVOT methods.

REFERENCES
[1]. Pradeep Kumar, Dr. R. V. Krishnaiah, IEEE, "Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis", vol. 2, ISSN: 2278-0661, ISBN: 2278-8727, Volume 6, pp. 36-41, Nov - Dec. 2012.
[2]. Mohd Abdul Samad, Md. Riazur Rahman, Syed Zahed, Mohd Abdul Fattah, International Journal of Computer Applications in Engineering Sciences, "Creation of Datasets for Data Mining Analysis by Using Horizontal Aggregation in SQL", Vol. III, ISSN: 2231-4946, pp. 46-51, March 2013.
[3]. Karana Hanirex. D, Durka. C, International Journal of Advanced Research in Computer Science and Software Engineering, "An Efficient Approach for Building Dataset in Data Mining", Volume 3, Issue 3, ISSN: 2277 128X, pp. 156-160, March 2013.
[4]. Carlos Ordonez, Zhibo Chen, University of Houston, "Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis", pp. 1-14.

[5]. Carlos Ordonez and Zhibo Chen, IEEE, "Horizontal Aggregations in SQL to Prepare Datasets for Data Mining Analysis", Vol. 24, No. 4, pp. 678-691, April 2012.
[6]. Mr. Prasanna M. Rathod, Prof. Mrs. Karuna G. Bagde, IJARCET, "Workload Optimization by Horizontal Aggregation in SQL for Data Mining Analysis", Volume 1, Issue 8, pp. 144-147, October 2012.
[7]. Mrs Krishna Veni, Mr Ranjith Kumar K, Int. J. Computer Technology Applications, PREPARE DATASETS FOR DATA MINING ANALYSIS BY USING
