Building a Data Warehouse to Support Active Student Management: Analysis and Design
Claresta Vasthi
Information Systems Department,
School of Information Systems
Bina Nusantara University
Jakarta, Indonesia 11480
[email protected]
Abstract— Data analysis of the number of active students at a university is very important. Universities must comply with Regulation Number 32 of 2016 of the Ministry of Research, Technology, and Higher Education of the Republic of Indonesia on the accreditation of study programs and universities. However, there are many difficulties in analyzing active student reports, producing additional and ad hoc reports, and meeting the need for business intelligence and data mining development. The purpose of this research is to analyze and design a data warehouse that integrates the various operational databases needed to provide information about active students at XYZ University. The analysis method consists of an analysis of the running system and an analysis of its weaknesses. The data warehouse design method uses the four stages (Four-Step Methodology) used by Ralph Kimball in designing a data warehouse: selecting the business process, declaring the grain, identifying the dimensions, and identifying the facts. The results achieved are the design of a data warehouse and a dashboard that provide relevant and integrated information about active students that can be viewed from different angles. The designed data warehouse is needed to help the organization analyze information as needed and to help management make strategic decisions. The conclusion is that building a data warehouse to support active student management can help the university analyze active students and make decisions in the student area.

Keyword— data warehouse, dashboard, four-step methodology, strategic

I. INTRODUCTION

Currently, information is urgently needed in every aspect of life. Information has become an important need for future development. However, the increased need for information is not accompanied by timely and accurate presentation of information; often the information still has to be traced in depth from a large amount of data. Information technology plays an important role in aligning strategies, processes, and technologies that can enhance an organization’s competitiveness.

XYZ University is a private university spread across several locations, one of which is in DKI Jakarta. Each year XYZ University produces graduates and increases its number of students. As a result, the amount of data it holds increases, which affects the analysis of important information. Until now, to analyze student data, especially the total number of active students, top management has spent a lot of time obtaining structured, accurate, and complete information. This is because the data sources used to generate the information come from several different databases. Active student data is needed to support the strategic decision-making process.

To overcome these problems, it is necessary to invest in technology that is capable of managing large amounts of data and performing effective analysis, so that data in the organization can be processed into valuable information for competitive advantage and can support long-term information needs [1]. This technology is called a data warehouse. A data warehouse is an analytical database that can support the decision-making process. Various databases within the organization can be integrated into a data warehouse, which makes it convenient for users to perform data analysis. Data that has been integrated in the data warehouse can be used to present information that can be reviewed from various dimensions and at an adjustable level of detail [2]. In addition, the data warehouse is also a source for Business Intelligence and data mining.

In previous research, the student retention rate is one of the biggest concerns of universities. To address this problem, some universities implement data mining to determine the variables correlated with student retention. These universities build a data warehouse to aggregate the data needed to overcome the challenge [3][4][5][6].

To enhance decision-making capabilities, business organizations have implemented data warehouses. As a prerequisite, a data warehouse analysis and design framework is needed, specifically for data mining and business reporting
978-1-5386-5821-5/18/$31.00 ©2018 IEEE 3-5 September 2018, Bina Nusantara University, Jakarta, Indonesia
2018 International Conference on Information Management and Technology (ICIMTech)
Page 460
purposes, for a better and more thorough data analysis of the university in the future [7][8][9].

The research questions are how to analyze and design metadata from three operational databases and how to build a data warehouse to support active student management.

The scope of this research is building a data warehouse to support active student management. The purpose of this research is to build a data warehouse for analyzing active students. The benefits of this research include producing analytic data and a dashboard for decision making.

This paper consists of four sections: introduction, study literature, result and discussion, and conclusions. The first section covers the background, previous research, research questions, scope, objectives, and benefits. The second and third sections cover the theories that support the design of a data warehouse for active students.

II. STUDY LITERATURE

A. Data Warehousing

A data warehouse is used to collect and integrate databases so that they can be used as a decision support system. A data warehouse focuses on providing information to support a company’s decision making. A data warehouse usually provides high-performance storage media, performs calculation and data aggregation operations, and provides an interface that allows users to query data [11][12][13]. Compared with an operational database, which performs Create, Read, Update, and Delete (CRUD) processes, a data warehouse mostly performs Read processes to analyze data based on dimensions that have been normalized and illustrated in the star schema [14].

The star schema is designed to produce relevant information. The relationships between the tables in the star schema are managed using surrogate keys and table indexes to facilitate the query process [10].

B. Extract, Transform, and Loading

ETL (Extract, Transform, and Load) includes everything between the operational source systems and the data warehouse presentation area or Business Intelligence [15]. ETL is the main activity in the data warehouse. Extraction refers to reading and understanding the data source and copying the data needed from the heterogeneous sources into the ETL system for further manipulation. After the data has been extracted into the ETL system, the next process is transformation, in which data is cleaned and data from multiple sources is combined. Finally, the transformed data is loaded into the data warehouse. When an organization uses a data warehouse from the database, the amount of ETL processing is minimized significantly [16].

ETL models are divided into three types: (1) conceptual modeling, which aims to create conceptual models for ETL processes that describe the mapping of attributes from a data source to data warehouse attributes; (2) logical modeling, which focuses on the flow of data from the data source to the data warehouse, ending at the data store; and (3) physical modeling, which maps the transformation result into tables of data [17].

III. RESULT AND DISCUSSION

To obtain an overview of the current conditions and the problems facing the organization, appropriate data and information are needed for the analysis and design of the data warehouse [18]. Therefore, several research activities were conducted to obtain information related to active students at XYZ University: a literature study related to data warehouses, interviews with specific users, observation and analysis of reports generated from the operational databases, analysis of the running systems, analysis of the weaknesses of the running systems, and problem-solving analysis.

In designing the data warehouse at XYZ University, the data warehouse design method used is based on the Kimball methodology. This method consists of four stages [19]: select the business process, declare the grain, identify the dimensions, and identify the facts.

Select the Business Process. At this stage we have to determine the subject from the problem faced. Based on the analysis that has been done, there are several important processes related to operational student activities at XYZ University: Student Registration, Student Assessment, Student Leave Request, Student Withdrawal, and Graduation.

Declare Grain. This stage determines the balance between business needs and available data, and specifies what kind of data can be shown in the fact tables. The grain in the design of this data warehouse includes the following. First, the analysis that can be done on the Student Registration process includes the quantity of student registrations, the quantity of active students, and the quantity of students who took a thesis. Second, the analysis that can be done on the Student Assessment process includes the quantity of students based on Student Activity Transcript (SAT) points and social work hours, and the quantity of students whose achievement index is not eligible for graduation. Third, the analysis that can be done on the Student Leave Request process includes the quantity of students who apply for leave and the total reasons for frequent leave. Fourth, the analysis that can be done on the Student Withdrawal process includes the quantity of students who resigned, the total reasons for frequent resignations submitted, and the quantity of students who moved to different majors. Last, the analysis that can be done on the Graduation process includes the quantity of students who are eligible for and attend the graduation ceremony, and the average achievement index of graduating students.

Identify the Dimensions. At this stage, the dimensions related to each fact table are identified. Here is one example:

TABLE I. IDENTIFY DIMENSION FOR STUDENT LEAVE TRANSACTION FACT
Dimension        Grain: Quantity of students     Grain: Total reasons
                 who apply for leave             for frequent leave
Student          X
Program          X                               X
Lecture Period   X                               X
Time             X                               X

Identify Facts. At this stage, the facts that will be used in the data warehouse are identified based on the subjects that have been determined. The facts contained in the data warehouse are Registration Fact, Assessment Fact, Leave Request Fact, Withdrawal Fact, and Graduation Fact.

Here is an example of the designed star schema of the active student data warehouse. Factcuti is related to dimWaktu, dimMasaPerkuliahan, dimProgramAkademik, and dimMahasiswa.

Pentaho Data Integration (PDI) is the open source software that we used for integrating the databases. The advantages of PDI are that it (1) has a large collection of transformation steps, (2) has modules that are easy to use in data warehouse design, (3) has good performance and scalability, and (4) can be extended with various additional plugins [20].

Step                     Category    Description
Execute SQL script       Scripting   Runs a Structured Query Language (SQL) script
Data Grid                Input       Inserts static data rows
If field value is null   Utility     Sets a specified value if the data is null
Insert / Update          Output      Updates or inserts data in the database
Select values            Transform   Adjusts specific attributes or values and sets attribute metadata
Database lookup          Lookup      Looks up a specific value in the database based on attribute values
Stream lookup            Lookup      Looks up a value that comes from another source in the transformation process
Table input              Input       Reads data from a database table
Table output             Output      Inserts data into a database table
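
The star schema and the lookup-style steps described here can be sketched in a few lines of Python with the standard sqlite3 module. This is only an illustrative sketch, not the authors' PDI transformation: the table names and the sk* surrogate key names come from the paper, while the natural-key columns (emplid in dimMahasiswa, kode_program, kode_periode, tanggal), the sample values, the helper lookup_sk, and the choice of storing one fact row per leave request are assumptions made for the example.

```python
import sqlite3

# Table and sk* names follow the paper; the natural-key columns are assumed.
SCHEMA = """
CREATE TABLE dimMahasiswa (skmahasiswa INTEGER PRIMARY KEY, emplid TEXT UNIQUE);
CREATE TABLE dimProgramAkademik (skprogramakademik INTEGER PRIMARY KEY, kode_program TEXT UNIQUE);
CREATE TABLE dimMasaPerkuliahan (skmasaperkuliahan INTEGER PRIMARY KEY, kode_periode TEXT UNIQUE);
CREATE TABLE dimWaktu (skwaktu INTEGER PRIMARY KEY, tanggal TEXT UNIQUE);
CREATE TABLE factcuti (
    skwaktu INTEGER REFERENCES dimWaktu(skwaktu),
    skmahasiswa INTEGER REFERENCES dimMahasiswa(skmahasiswa),
    skprogramakademik INTEGER REFERENCES dimProgramAkademik(skprogramakademik),
    skmasaperkuliahan INTEGER REFERENCES dimMasaPerkuliahan(skmasaperkuliahan),
    totalMahasiswaCuti INTEGER
);
"""


def lookup_sk(conn, table, sk_col, key_col, key, default=0):
    """Return the surrogate key for a natural key, or `default` if not found.

    Roughly mirrors a 'Database lookup' step followed by 'If field value is null'.
    Identifiers are interpolated directly, so they must come from trusted code.
    """
    row = conn.execute(
        f"SELECT {sk_col} FROM {table} WHERE {key_col} = ?", (key,)
    ).fetchone()
    return row[0] if row is not None else default


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Populate each dimension with one hypothetical member.
conn.execute("INSERT INTO dimMahasiswa (emplid) VALUES (?)", ("S001",))
conn.execute("INSERT INTO dimProgramAkademik (kode_program) VALUES (?)", ("IS",))
conn.execute("INSERT INTO dimMasaPerkuliahan (kode_periode) VALUES (?)", ("2017-1",))
conn.execute("INSERT INTO dimWaktu (tanggal) VALUES (?)", ("2017-09-04",))

# One hypothetical leave request as it might arrive from a source system.
source_row = {"emplid": "S001", "program": "IS", "period": "2017-1", "date": "2017-09-04"}

# Replace natural keys with surrogate keys, then write the fact row
# (comparable to a 'Table output' step).
fact_row = (
    lookup_sk(conn, "dimWaktu", "skwaktu", "tanggal", source_row["date"]),
    lookup_sk(conn, "dimMahasiswa", "skmahasiswa", "emplid", source_row["emplid"]),
    lookup_sk(conn, "dimProgramAkademik", "skprogramakademik", "kode_program", source_row["program"]),
    lookup_sk(conn, "dimMasaPerkuliahan", "skmasaperkuliahan", "kode_periode", source_row["period"]),
    1,
)
conn.execute("INSERT INTO factcuti VALUES (?, ?, ?, ?, ?)", fact_row)

# A student missing from the dimension falls back to the default key.
unknown_sk = lookup_sk(conn, "dimMahasiswa", "skmahasiswa", "emplid", "S999")
stored = conn.execute("SELECT * FROM factcuti").fetchall()
```

Falling back to a default key for unmatched rows is one common way to keep fact rows loadable when a dimension member is missing; whether the original design does this is not stated in the paper.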
4    totalMahasiswaCuti    int    Total Mahasiswa Cuti (total students on leave)    Transform: Count (emplid)
Dashboard design. Dashboard design aims to visualize the data contained in the data warehouse so that it is easy to understand and helps management explore the data.
Source
Database               Table                                      Attribute
Student_Active_Olap    dimwaktu                                   skwaktu
Student_Active_Olap    dimmahasiswa                               skmahasiswa
Student_Active_Olap    dimprogramakademik                         skprogramakademik
Student_Active_Olap    dimmasaperkuliahan                         skmasaperkuliahan
Bcs                    ps_prog_rsn_tbl                            leave_type
Bcs, Legacy            ps_prog_rsn_tbl, transaksi_cuti_resmi      descry, alasan
Bcs, Legacy            ps_bn_ofcleav_dtl, transaksi_cuti_resmi    emplid
The metadata of Leave_Request_Fact consists of source tables and target tables. The source tables come from the student_active_olap, bcs, and legacy databases.
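
To make the Count (emplid) transformation concrete, the following minimal Python sketch combines leave rows from two sources and counts them per program and lecture period. It is an assumption-laden illustration rather than the paper's actual ETL job: the row layout, the program and period fields, the sample values, and the function total_mahasiswa_cuti are hypothetical, and only emplid, the source table names ps_bn_ofcleav_dtl and transaksi_cuti_resmi, and the measure name totalMahasiswaCuti come from the metadata.

```python
from collections import Counter

# Hypothetical rows as they might be read from bcs.ps_bn_ofcleav_dtl.
bcs_rows = [
    {"emplid": "S001", "program": "IS", "period": "2017-1"},
    {"emplid": "S002", "program": "IS", "period": "2017-1"},
    {"emplid": "S003", "program": "CS", "period": "2017-1"},
]

# Hypothetical rows as they might be read from legacy.transaksi_cuti_resmi.
legacy_rows = [
    {"emplid": "S004", "program": "IS", "period": "2017-1"},
    {"emplid": None, "program": "CS", "period": "2017-1"},  # missing student id
]


def total_mahasiswa_cuti(*sources):
    """Count rows with a non-null emplid per (program, period) across all sources."""
    counts = Counter()
    for rows in sources:
        for row in rows:
            if row.get("emplid") is not None:
                counts[(row["program"], row["period"])] += 1
    return dict(counts)


totals = total_mahasiswa_cuti(bcs_rows, legacy_rows)
```

Rows without an emplid are skipped here, which matches how a count over a column usually ignores nulls; a real job would also need to decide how to treat the same student appearing in both sources.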
building a data mining, so the analysis can be more in-depth and
also helps in decision making.
REFERENCES
[19] T. M. Connolly and C. E. Begg, “Database systems. A practical approach to design implementation and management. global ed,” Harlow, Pearson Educ., 2015.
[20] R. J. Salaki, J. Waworuntu, and I. Tangkawarow, “Extract transformation loading from OLTP to OLAP data using pentaho data integration,” in IOP Conference Series: Materials Science and Engineering, 2016, vol. 128, no. 1, p. 12020.