Development of National Health Data Warehouse For Data Mining

This document discusses the development of a national health data warehouse in Bangladesh to integrate data from disparate healthcare sources for data mining and knowledge discovery. It describes how healthcare data comes from various sources like hospitals, departments and laboratories in different formats like text, numbers, images and videos. Integrating this fragmented data can help researchers, providers and patients utilize the stored knowledge better. The document proposes developing a healthcare data warehouse, which would standardize data, improve analysis and allow data sharing across organizations. It discusses the challenges in building such a warehouse, which involves designing the warehouse, extracting data from sources and regularly updating it from operational databases. The overall aim is to identify obstacles to healthcare data integration and propose a model to integrate data in Bangladesh


Development of National Health Data Warehouse for Data Mining

Article · July 2015


All content following this page was uploaded by Shahidul Islam Khan on 06 March 2016.



Database Systems Journal vol. VI, no. 1/2015 3

Development of National Health Data Warehouse for Data Mining

Shahidul Islam Khan, Abu Sayed Md. Latiful Hoque


Dept. of Computer Science and Engineering (CSE),
Bangladesh University of Engineering & Technology (BUET), Dhaka, Bangladesh
[email protected] , [email protected]

Health informatics is currently one of the main focuses of computer science researchers.
Availability of timely and accurate data is essential for medical decision making. Health care
organizations face a common problem with the large amount of data they hold in numerous
systems. Researchers, health care providers and patients will not be able to utilize the knowledge
stored in different repositories unless the information from disparate sources is amalgamated.
This problem can be solved by data warehousing. Data warehousing techniques share a
common set of tasks, including requirements analysis, data design, architectural design,
implementation and deployment. Developing a health data warehouse is complex and time
consuming but is also essential to deliver quality health services. This paper depicts the prospects
and complexities of health data warehousing and mining and illustrates a data-warehousing
model suitable for integrating data from different health care sources to discover effective
knowledge.

Keywords: Data Mining, Data Warehouse, Health Informatics, Clinical Database, Data
Preprocessing

1. Introduction
Health informatics, or healthcare informatics, is an intersection of computer science and health care services. It deals with the resources and methods needed to optimize the acquisition, storage, retrieval and use of information in medical research, and is applied to the areas of health care management, diagnosis, clinical care, pharmacy, nursing and public health [1, 2]. Knowledge discovery from data (KDD) is the non-trivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in data. Data mining, a major part of KDD, consists of applying data analysis and learning algorithms to produce potentially interesting patterns over the data [3, 4, 5].
Health data refers to any information that is contained in a patient's medical record. This information may be acquired from notes derived from a hospital admission, a doctor's visit or a diagnostic report. This data comes in various forms such as text or numbers (patient identification, demographics, history, laboratory data, etc.), analog or digital signals (ECG, EEG, EMG, ENG, etc.), images (histological, radiological, ultrasound, etc.), and videos. Further complicating the storage of this data is the fact that patient identification information cannot be publicly used, so such identifiers must be removed from the other clinical parameters. Another difficulty in storing this type of data is that each disease and species can only be effectively described using greatly different vocabularies and data elements [2, 6, 7, 8].
One of the major Information Technology challenges in medical practice is how to integrate several disparate, isolated information repositories into a single logical repository to create consistent information for all users. A massive amount of health records, related documents and medical images created by clinical diagnostic equipment is generated daily. These valuable data are stored in various medical information systems such as HIS (Hospital Information System), PACS (Picture
Archiving and Communications System) and RIS (Radiology Information System) in various hospitals, departments and diagnostic laboratories. Data required to make informed medical decisions are trapped within fragmented, disparate, and heterogeneous clinical and administrative systems that are not properly integrated. As a result, health care suffers because medical practitioners and health care providers are unable to access this information to perform activities such as diagnostics and treatment optimization to improve patient care [1, 6, 7].
Successful healthcare data management is an important factor in developing support systems for the clinical decision-making process. A traditional operational database system does not satisfy the requirements of the critical data analysis tasks of clinical decision makers. It contains detailed data but does not include important historical data, and since it is highly normalized, it performs poorly for complex queries that need to join many relational tables or to aggregate large volumes of data in order to generate various clinical reports. A health data warehouse is a data store that is separate from the hospital's operational databases. It can be used for the analysis of consolidated historical data [7, 8].
According to Inmon [9], a data warehouse (DW) is a subject-oriented, integrated, non-volatile, and time-variant collection of data in support of management's decisions.
Subject-oriented: the warehouse is organized around the major subjects of the enterprise (such as customers, products, and sales).
Integrated: the DW is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, etc.
Time-variant: data in the warehouse is only accurate and valid at some point in time or over some time interval.
Non-volatile: the data is not updated in real time but is refreshed on a regular basis from different data sources.
The advantages and disadvantages of DW are given below [9, 10, 11]:
Advantages of DW:
1. Standardize data across the organization
2. Improve turnaround time for analysis and reporting
3. Easy sharing of data
4. Remove informational processing load from the operational database
5. Enhance data quality and consistency
6. Provide historical intelligence and reduce the cost of accessing historical data
7. Integrate data from multiple sources into a single repository
8. Improve data quality by fixing noisy data
9. Restructure the data so that it delivers excellent query performance
10. Make decision-support queries easier to write
Disadvantages of DW:
1. Long initial development time and associated high cost
2. Data owners lose control over data, raising ownership and privacy issues
Implementing a Health DW is a complex task containing two major phases. Firstly, in the configuration phase, a conceptual view of the warehouse is specified according to user requirements (DW design), and the related data sources and the Extract-Transform-Load (ETL) process (data acquisition) are determined. Secondly, after the initial load, during the operation phase, warehouse data must be regularly refreshed; that is, modifications of operational data since the last DW refreshment must be propagated into the warehouse so that the data stored in the data warehouse reflect the state of the underlying operational systems [5, 8, 12].
The main aim of this research is to identify the obstacles to healthcare data integration and to propose a data-warehousing model suitable for integrating fragmented data with respect to Bangladesh as well as anywhere else. The result will contribute to the advancement of knowledge in the field of medical informatics. In this paper, "Health", "Clinical", "Pathological" and "Medical":
these terms are used with similar meaning. The rest of this paper is organized as follows. In Section 2 we present selected literature reviews on DW, Health DW and KDD techniques. Section 3 briefly describes some design issues of the National Health DW. In Section 4 we show the calculation of the approximate size of our DW. Some preprocessing techniques that we have used are illustrated in Section 5. Section 6 gives readers an idea of how our DW will be used for knowledge discovery and mining. Finally, Section 7 concludes the paper.

2. Literature Review
A DW unifies the data scattered throughout an organization into a single centralized data structure. It is a repository of integrated information available for querying and analysis. A DW may be considered a proactive approach to information integration, as compared to the more traditional query-driven approaches where processing and integration start when a query arrives [5, 6]. A health data warehouse is a repository where healthcare providers can gain access to medical data gathered in the patient care process. Extracting medical domain information to a data warehouse can facilitate efficient storage, enhance timely analysis and increase the quality of real-time decision-making processes. Today's healthcare organizations require not only quality and effectiveness of their treatment, but also reduction of waste and unnecessary costs. In order to construct an operational and effective DW it is essential to combine process work, domain expertise and high quality database design [7, 8].
Electronic Health Records (EHRs), which describe the diseases and treatments of patients, are normally stored in the hospitals or clinics where they are created. Patients may be treated in different hospitals and clinics; therefore, there is a need for integrating health records from different hospitals to enable any hospital to obtain a total overview of a patient's health history. Different heterogeneity problems have to be solved in order to integrate EHR systems from different hospitals and health service providers in a consistent way. The first problem is that different hospitals normally do not use the same DBMS and, therefore, the traditional ACID properties of databases are missing across the different hospital locations. This may cause performance, autonomy, and consistency problems. Another heterogeneity problem is that there are several incompatible standards for EHR entries [12].
The trend of adopting data warehouses for health systems is presented in [13], where the design experience in the University of Virginia Health System is reported. Here the data warehouse is used to provide clinicians and researchers with direct, rapid access to desired patients' data. In addition, they use the DW for educational and research aims, as it serves to face informatics issues, such as data capture, and to perform exploratory analyses of healthcare problems.
The medical domain has certain unique data requirements such as high volumes of unstructured data and data confidentiality. There are huge constraints and issues that limit the way data mining is performed on medical datasets. Some of these issues are the way the data is collected, the accuracy of the data, and the ethical, privacy and social issues that come with patients' records [2].
Research has also been done on the impact of missing values and of noise, and on how these can influence the output. Zhu et al. classified noise into class noise and attribute noise. Attribute noise includes incorrect attribute values, missing or don't-know attribute values, and incomplete attributes or don't-care values [14].
Several studies have focused on techniques that have built-in mechanisms to handle noise and missing values and which are more appropriate for medical applications. A few techniques that have been applied and are more suited to medical data sets are studied in [15, 16]. For example
decision trees, logic programs, K-nearest neighbour, and Bayesian classifiers. Lee et al. suggested that Bayesian networks and decision trees are the primary techniques applied in medical information systems [17]. Obenshain claimed that neural networks performed better than logistic regression, but that the decision tree did better in identifying the active compounds most likely to have biological activity [18]. Wang and Wang discussed that most process models do not focus on gaining new knowledge. Medical data mining applications should follow a five-stage data mining development cycle: planning tasks, developing data mining hypotheses, preparing data, selecting data mining tools, and evaluating data mining results [19].
Handling missing data in pathology databases using the multiple imputation technique is discussed in [20]. Optimizing public health data collection for KDD using feature selection is studied in [21]. Cubillas et al. proposed a model for improving appointment scheduling in health care centers [22]. Hoque et al. discussed the present structure of pathological data, the requirements to formulate efficient models and the necessity to reform the present structure for predictive data mining in [23]. Kumari and Singh used a neural network for the diagnosis of diabetes [24]. Yilmaz et al. proposed a modified K-means algorithm based data preparation method for the diagnosis of heart and diabetes diseases [25]. Herland et al. presented recent research using Big Data tools and approaches for the analysis of Health Informatics [26].

3. Design Issues of National Health DW
The architecture of the national health DW model is illustrated in Fig. 1. Health data from different government and private sources such as hospitals, clinics, diagnostic centers and research centers will be collected. Using an ETL process, the data will be integrated into a temporary data repository [27].
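The integration step described above can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions: the two source layouts, the field names (`pid`, `test`, `value`, `patient`, `exam`, `res`) and the `StagedTest` record are hypothetical and do not come from the paper or from Fig. 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StagedTest:
    """One row of the temporary (staging) repository; layout is hypothetical."""
    source: str
    patient_ref: str
    test_name: str
    result: float


def from_hospital(row):
    # Hypothetical hospital export: string values, keys pid/test/value.
    return StagedTest("hospital", row["pid"], row["test"].strip().upper(), float(row["value"]))


def from_diagnostic_center(row):
    # Hypothetical diagnostic-center export: different key names and types.
    return StagedTest("diagnostic_center", row["patient"], row["exam"].strip().upper(), float(row["res"]))


def run_etl(hospital_rows, center_rows):
    """Extract rows from both sources, transform them to one layout, load into staging."""
    staging = []
    staging.extend(from_hospital(r) for r in hospital_rows)
    staging.extend(from_diagnostic_center(r) for r in center_rows)
    return staging


staged = run_etl(
    [{"pid": "H-001", "test": "pcv ", "value": "39.9"}],
    [{"patient": "C-017", "exam": "PCV", "res": 37}],
)
```

Both sources end up in the same record layout, which is the property the later cleaning and normalization steps rely on.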

Fig. 1 Brief Architecture of Health Data Warehouse

Cleaning, noise reduction and normalization techniques will be applied next. After that, the data will be loaded into the DW. Online Analytical Processing (OLAP) queries and mining operations can then be easily performed over the pathological DW. The 4-D health data cube used for national health DW development is shown in Fig. 2. Here the 0-D (apex) cuboid will provide the highest level of summarization of national health data. Partial materialization is used rather than full materialization of cuboids to reduce the huge space requirements [9, 10].
Logical design of a DW involves the definition of structures that enable efficient access to information. There are many logical models such as flat schema, star schema, star cluster schema, snowflake schema, fact constellation schema, etc. Among them, star schema, snowflake schema and fact constellation schema are the most widely used commercially. Efficiency is the most important factor in DW modeling because many queries access large amounts of data and may involve multiple join operations. The most suitable logical data warehousing model is the star schema [9, 12, 13]. We have used the star schema in our design, illustrated in Fig. 3.
Using the building blocks of the fact table and the various dimension tables, one has thousands of ways to aggregate the data. For clinical analysis purposes, frequently needed aggregated datasets should be created in advance for the users. Having data readily and easily available is a major tenet of data warehousing. For our DW, some aggregated datasets could be:
• Patient count by Diagnosis, Gender, Age, Date
• Count of Procedures by Provider and Date
• Billing and discount information
• Count of retesting
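As a rough illustration of the first aggregated dataset listed above, the following sketch builds a tiny star schema in an in-memory SQLite database and counts patients by diagnosis and gender. The table and column names (`fact_test`, `dim_patient`, `dim_diagnosis`) and the sample rows are assumptions made for this example; the actual fact and dimension tables are those of Fig. 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical dimension tables
CREATE TABLE dim_patient (patient_key INTEGER PRIMARY KEY, gender TEXT, age INTEGER);
CREATE TABLE dim_diagnosis (diagnosis_key INTEGER PRIMARY KEY, name TEXT);
-- Hypothetical fact table referencing the dimensions
CREATE TABLE fact_test (
    patient_key INTEGER REFERENCES dim_patient(patient_key),
    diagnosis_key INTEGER REFERENCES dim_diagnosis(diagnosis_key),
    test_date TEXT,
    result REAL
);
INSERT INTO dim_patient VALUES (1, 'F', 34), (2, 'M', 51), (3, 'F', 29);
INSERT INTO dim_diagnosis VALUES (1, 'Anemia'), (2, 'Diabetes');
INSERT INTO fact_test VALUES
    (1, 1, '2015-01-05', 10.2),
    (1, 1, '2015-02-05', 10.9),
    (2, 2, '2015-01-07', 7.8),
    (3, 1, '2015-01-09', 9.6);
""")

# Patient count by diagnosis and gender: one join per dimension, then GROUP BY.
patient_counts = conn.execute("""
    SELECT d.name, p.gender, COUNT(DISTINCT f.patient_key) AS patients
    FROM fact_test AS f
    JOIN dim_patient AS p ON p.patient_key = f.patient_key
    JOIN dim_diagnosis AS d ON d.diagnosis_key = f.diagnosis_key
    GROUP BY d.name, p.gender
    ORDER BY d.name, p.gender
""").fetchall()
```

Patient 1 has two anemia tests but is counted once because of `COUNT(DISTINCT ...)`; storing the result of such a query in advance is what creating an aggregated dataset amounts to.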

Fig. 2 Health Data Cube



Fig. 3 Fact Table and Dimension Tables of National Health DW

4. DW Size Analysis
Let the number of tests for a single patient be $t_p$ and the total number of patients in class $i$ be $p_i$. The total number of tests for the $p_i$ patients is then

$$\sum_{p=1}^{p_i} t_p$$

Let the total number of reports in class $i$ be $r_i$. The number of tests in the $r_i$ reports is

$$\sum_{j=1}^{r_i} t_{rj}$$

where $t_{rj}$ is the number of tests in report $j$.
We have derived the following equation to count the total number of test records (tuples in the fact table) generated by the different healthcare organizations such as hospitals and diagnostic centers. The total number of tests $T$ is given by

$$T = \sum_{i=1}^{n} l_i \, r_i \, t$$

where
n = number of classes of laboratories
l_i = number of laboratories in class i
r_i = number of test reports produced by each laboratory in class i
t = number of tests per report

According to the Directorate General of Health Services (DGHS) under the Ministry of Health and Family Welfare (MoHFW), the total number of government hospitals under DGHS is 592 and the number of government hospitals of secondary and tertiary levels under DGHS is 125 [28], [29]. According to the Bangladesh Private Clinic and Diagnostic Owners Association (BPCDOA), 8,000 diagnostic centers of the country had DGHS approval as of December 2014 [30], [31]. So the minimum number of places where pathological tests are performed is 8,717. If, for simplicity of calculation, we assume that on average 500 patient reports are produced at each place every day and that each report consists of 15 test attributes, then putting the values into the above equation we get:
T = 65,377,500 records (tuples/test attributes) per day.
So more than 65 million records will be added to the fact tables of the National Health DW every day. Assuming that one record takes 0.2 KB of memory, the DW will consume about 12.50 GB of storage per day. This certainly falls under the big data category, and the Bangladesh Government should opt for cloud storage and services for this [26, 32]. For fragmentation of a big database there are several techniques, such as the CRUD matrix based fragmentation proposed by the authors of this paper [33, 34].
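The arithmetic of this section can be checked with a few lines of Python. The figures are the ones quoted above; the dictionary labels and variable names are illustrative, and the conversion to gigabytes assumes binary units (1 GB = 1024 x 1024 KB), which is the reading that reproduces the stated figure of about 12.5 GB.

```python
# Figures quoted in the text; labels and names are illustrative only.
sites = {
    "government hospitals": 592,
    "secondary/tertiary government hospitals": 125,
    "approved diagnostic centers": 8000,
}
reports_per_site_per_day = 500
tests_per_report = 15
kb_per_record = 0.2

total_sites = sum(sites.values())
records_per_day = total_sites * reports_per_site_per_day * tests_per_report

kb_per_day = records_per_day * kb_per_record
gb_per_day_binary = kb_per_day / 1024 ** 2   # assumes 1 GB = 1024 * 1024 KB
gb_per_day_decimal = kb_per_day / 1000 ** 2  # decimal units, for comparison

print(total_sites, records_per_day, round(gb_per_day_binary, 2), round(gb_per_day_decimal, 2))
```

With decimal units the same volume comes to roughly 13.1 GB per day, so the order of magnitude does not depend on the unit convention.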
5. Data Preprocessing
Data preprocessing is one of the major tasks in developing a DW from heterogeneous sources. It includes data cleaning, missing value imputation, normalization, transformation, etc. As the data for the National Health DW come from different public and private hospitals, diagnostic centers and other sources, different preprocessing steps have been performed on the data. The following are some of the data preprocessing steps that we have performed. Tables 1 and 2 present a few of the PCV (Hct) Red Cell Indices test data. Here the reference values are 36-46 for females and 40-50 for males. The full dataset for this particular test contains 13,296 records, where the minimum and maximum results are 0.1 and 64 respectively. The results and the reference values are normalized using the Min-Max and Z-Score normalization techniques. Missing data are replaced by the class mean method. Table 3 shows partial metadata for the same test dataset.
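Class mean imputation, mentioned above, replaces a missing value with the mean of the observed values in the same class. The sketch below is one possible reading, not the authors' implementation: the paper does not say which attribute defines the class, so using gender is an assumption, and the rows are made-up sample data.

```python
from collections import defaultdict
from statistics import mean


def impute_class_mean(rows, class_key, value_key):
    """Return copies of rows with missing values replaced by the mean of their class.

    Raises KeyError if a class has no observed value at all.
    """
    observed = defaultdict(list)
    for row in rows:
        if row[value_key] is not None:
            observed[row[class_key]].append(row[value_key])
    class_means = {label: mean(values) for label, values in observed.items()}
    return [
        {**row, value_key: class_means[row[class_key]]} if row[value_key] is None else dict(row)
        for row in rows
    ]


# Made-up sample; "gender" as the class attribute is an assumption.
rows = [
    {"gender": "F", "pcv": 39.0},
    {"gender": "F", "pcv": None},
    {"gender": "F", "pcv": 37.0},
    {"gender": "M", "pcv": 43.0},
    {"gender": "M", "pcv": None},
    {"gender": "M", "pcv": 46.0},
]
filled = impute_class_mean(rows, "gender", "pcv")
```

The input rows are left untouched, so the imputed and original datasets can be compared afterwards.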

Table 1 Attribute subset selection and normalization of numeric data

TESTRESULT_ID     RESULT   Z-Score normalized result   Min-Max normalized result
114080000000002   39.9      0.58                       0.6228
114080000000204   39        0.41                       0.6088
114080000000283   0.1      -6.91                       0
114080000000609   0.2      -6.89                       0.0016
114080000000755   0.1      -6.91                       0
114080000000834   28.3     -1.6                        0.4413
114080000000913   43        1.16                       0.6714
114080000001138   29.7     -1.34                       0.4632
114080000001279   37.8      0.18                       0.59
114080000001436   37        0.03                       0.5775
114080000001650   39        0.41                       0.6088
114080000002071   39        0.41                       0.6088
114080000002248   35       -0.34                       0.5462
114080000003618   42        0.97                       0.6557
114080000003766   41        0.79                       0.6401
114080000003900   46        1.73                       0.7183

Table 2 Reference values and their normalization

Reference value     Min-Max Normalization   Z-Score Normalization
Female Lower: 36    0.561815336             -0.154685736
Female Upper: 46    0.718309859              1.727135868
Male Lower: 40      0.624413146              0.598042906
Male Upper: 50      0.780907668              2.479864509

Table 3 Metadata for the test result

Type            Value
Average         36.812
Maximum         64
Minimum         0.1
Standard Dev.   5.3070
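The two normalizations can be reproduced from the metadata in Table 3 with the standard formulas, (x - min) / (max - min) for Min-Max and (x - mean) / std for Z-Score. The sketch below uses the rounded metadata values, so Z-Scores may differ from the tabulated ones in the last printed digit; the function and variable names are illustrative.

```python
def min_max(x, lo, hi):
    """Scale x linearly so that lo maps to 0 and hi maps to 1."""
    return (x - lo) / (hi - lo)


def z_score(x, mean, std):
    """Express x as a number of standard deviations from the mean."""
    return (x - mean) / std


# Rounded metadata from Table 3.
META = {"mean": 36.812, "max": 64.0, "min": 0.1, "std": 5.3070}

normalized = {
    result: (
        round(z_score(result, META["mean"], META["std"]), 2),
        round(min_max(result, META["min"], META["max"]), 4),
    )
    for result in (39.9, 28.3, 46.0)
}
```

For the result 39.9, for example, this gives a Min-Max value of 0.6228 and a Z-Score of about 0.58, in line with the first row of Table 1.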

Table 4 presents the normalization technique for nominal data, where the metadata of Table 5 are used to replace the result data for Urine colour diagnosis.

Table 4 Preprocessing of nominal data

TESTRESULT_ID     Result Type_ID   Min-Max Normalization
114080000002446   0                0.0000
114080000013324   1                0.1111
114080000098717   4                0.4444
114080000487792   6                0.6667
114080000743386   2                0.2222
114090000792554   3                0.3333
114090000822074   3                0.3333
114090000902763   7                0.7778
314080000143669   8                0.8889
314080000652184   5                0.5556
414080000046596   9                1.0000

Table 5 Metadata for Type_ID generation from Urine colour

Colour            Sample_Count   Type_ID
Straw             9439           0
Yellow            58             1
Reddish           32             2
L. Yellow         29             3
Milky white       2              4
D. Yellow         2              5
Yellowish white   1              6
Reddish black     1              7
L. Reddish        1              8
Hazy              1              9
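The mapping in Tables 4 and 5 can be sketched as follows. The paper does not state how the Type_IDs were assigned; ordering the colours by descending sample count is an inference from Table 5, and breaking ties by the listed order is an assumption of this example.

```python
# Colour counts as listed in Table 5.
colour_counts = [
    ("Straw", 9439), ("Yellow", 58), ("Reddish", 32), ("L. Yellow", 29),
    ("Milky white", 2), ("D. Yellow", 2), ("Yellowish white", 1),
    ("Reddish black", 1), ("L. Reddish", 1), ("Hazy", 1),
]

# Rank colours by descending count; Python's sort is stable, so ties keep listed order.
ranked = sorted(colour_counts, key=lambda item: item[1], reverse=True)
type_id = {colour: rank for rank, (colour, _count) in enumerate(ranked)}

# Min-Max normalization of the Type_ID onto [0, 1].
max_id = max(type_id.values())
normalized = {colour: round(tid / max_id, 4) for colour, tid in type_id.items()}
```

A nominal result such as "Yellowish white" is thus stored as Type_ID 6 and normalized to 0.6667, as in Table 4.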

6. Data Mining from National Health DW
The National Health DW can be used in many ways to improve the national health standard, to provide better and prompt services to patients and to facilitate health-related research among doctors, clinical researchers, etc. In this section we describe the use of the National Health DW with two examples.
Example 1: National Reference level threshold finding

Table 6 WHO's Hemoglobin thresholds to define anemia

Age/Gender group                 Hb threshold (g/dl)
Children (0.5-5 yrs)             11.0
Children (5-12 yrs)              11.5
Teens (12-15 yrs)                12.0
Women, non-pregnant (>15 yrs)    12.0
Women, pregnant                  11.0
Men (>15 yrs)                    13.0
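The thresholds of Table 6 can be encoded as a simple lookup, with anemia indicated when the hemoglobin value is below the threshold of the group. The group keys and the function name are hypothetical labels chosen for this sketch.

```python
# Hb thresholds (g/dl) from Table 6; the keys are illustrative labels.
WHO_HB_THRESHOLD_G_DL = {
    "children_0.5_5": 11.0,
    "children_5_12": 11.5,
    "teens_12_15": 12.0,
    "women_non_pregnant": 12.0,
    "women_pregnant": 11.0,
    "men": 13.0,
}


def is_anemic(group, hb_g_dl):
    """Return True when the hemoglobin value is below the WHO threshold for the group."""
    return hb_g_dl < WHO_HB_THRESHOLD_G_DL[group]


# A non-pregnant woman with Hb 11.5 g/dl is below the WHO threshold of 12.0.
example = is_anemic("women_non_pregnant", 11.5)
```

The same value of 11.5 g/dl counts as good under the local rule of thumb described in Example 1, which is the kind of gap a nationally derived reference level is meant to examine.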
In Bangladesh, the rule of thumb for a non-pregnant woman over 15 years is: Hb > 11 is good; Hb >= 10.5 is OK; and if Hb < 10 (g/dl), medication for the patient is needed. This is slightly different from the WHO threshold [35]. Using the National Health DW, national reference levels for different clinical values can be found by data mining.

Example 2: Fraud Testing Awareness
If, for a costly test T1:
Age(X, <30) ^ (Gender = 'M') => Negative(X, T1)
[Support = 85%, Confidence = 98%]
From the confidence value it can be clearly seen that this test T1 has almost no impact on disease diagnosis for this group. National awareness can be developed not to perform the test at the initial level for young males.
In this way many other interesting patterns can be derived from the National Health DW by using various data mining algorithms such as association or clustering.

7. Conclusions
This paper presented the developmental stages of the National Health DW platform for the management, processing and analysis of large-scale health data modeled for an e-health system. In this paper, widely accepted conceptual and logical design approaches in DW design are discussed. Considering the quality factors and the information requirements, the star schema was chosen as the most suitable logical model for the purpose. Establishing a data warehouse gathering huge data from existing health databases should give easier and better access to interesting data for researchers, health service providers and government authorities. In order to get the maximum benefit from the model presented in this research, the conditions mentioned below should be satisfied.
1. There should be a document which clearly defines the structure of the data tables currently used by the concerned pathological centers.
2. Stakeholders should clearly know what data retrieval operations are going to be executed using the data warehouse.
3. There should be a strong cooperative mindset among different health service providers to help the governmental bodies with successful implementation.

Acknowledgment
This research is funded by ICT Division, Ministry of Posts, Telecommunications and Information Technology, Government of the People's Republic of Bangladesh. The authors are grateful to Mr. Mamunur Rashid and Muhammed Dastagir Husain of the Dept. of CSE, BUET for their valuable discussions.

References
[1] Roddick JF, Fule P, Graco WJ (2003) Exploratory medical knowledge discovery: experiences and issues. SIGKDD Explor. Newsletter, 5(1): 94-99
[2] Cios K (2002) Uniqueness of medical data mining. Artificial intelligence in medicine. 26:1-24
[3] Fayyad UM, Shapiro GP, Smyth P (1996) From Data Mining to Knowledge Discovery: An Overview. Advances in Knowledge Discovery and Data Mining 1–36
[4] Khosla R, Dillon T (1997) Knowledge Discovery, Data Mining and Hybrid Systems. Engineering Intelligent Hybrid Multi-Agent Systems, Kluwer Academic Publishers 143–177
[5] Inmon WH (1992) EIS and the data warehouse: a simple approach to building an effective foundation for EIS. Database Programming and Design, 5(11): 70-73
[6] Stolba N, Banek M, Tjoa AM (2006) The Security Issue of Federated Data Warehouses in the Area of Evidence-Based Medicine. First International Conference on Availability, Reliability and Security (ARES'06, IEEE)
[7] Sahama TR, Croll PR (2007) A Data Warehouse Architecture for Clinical Data Warehousing. Australasian Workshop on Health Knowledge Management and Discovery (HKMD 2007)
[8] Lyman JA, Scully K, Harrison JH (2008) The development of health care data warehouses to support data mining. Clin Lab Med. 28(1):55-71
[9] Inmon W (2005) Building the Data Warehouse, 4th edition, Wiley, New York.
[10] Jiawei H, Micheline K, Jian P (2012) Data Mining Concepts and Techniques 3rd Edition, Elsevier
[11] Kimball R, Ross M (2013) The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling 3rd Edition, Wiley
[12] Nugawela S (2013) Data Warehousing Model For Integrating Fragmented Electronic Health Records From Disparate And Heterogeneous Clinical Data Stores, M.Sc. Thesis, Queensland University of Technology
[13] Mullins M, Siadaty MS, Lyman J et al (2006) Data mining and clinical data repositories: Insights from a 667,000 patient data set. Comput. Biol. Med. 36: 1351–1377
[14] Zhu X, Khoshgoftaar T, Davidson I, Zhang S (2007) Special issue on mining low-quality data, Knowledge and Information Systems, 11:131-136
[15] Brown ML, Kros JF (2003) Data mining and the impact of missing data. Industrial Management & Data Systems 103: 611-621
[16] Lavrač N (1999) Selected techniques for data mining in medicine. Artificial intelligence in medicine 16(1): 3-23
[17] Lee IN, Liao SC, Embrechts M (2000) Data mining techniques applied to medical information. Medical Informatics & the Internet in Medicine 25(2): 81-102
[18] Obenshain MK (2004) Application of Data Mining Techniques to Healthcare Data. Infection Control and Hospital Epidemiology 25(8): 690-695
[19] Wang H, Wang S (2008) Medical knowledge acquisition through data mining. IEEE International Symposium ITME.
[20] Fu S (2011) Missing Data in Pathology Databases. MSc Thesis, Australian National University.
[21] Partington SN, Papakroni V, Menzies T (2014) Optimizing data collection for public health decisions: a data mining approach. BMC Public Health 14: 593-598
[22] Cubillas JJ, Ramos MI, Feito FR, Ureña T (2014) An improvement in the appointment scheduling in primary health care centers using data mining. J. Med. Syst., Springer 38: 89
[23] Hoque ASML, Galib S, Tasnim M (2013) Mining Pathological Data to Support Medical Diagnostics. Workshop on Advances on Data Management: Applications and Algorithms, Department of Computer Science and Engineering, BUET, Dhaka, 71-74
[24] Kumari S, Singh A (2013) A data mining approach for the diagnosis of diabetes mellitus. IEEE 7th International Conference on Intelligent Systems and Control
[25] Yilmaz N, Inan O, Uzer MS (2014) A New Data Preparation Method Based on Clustering Algorithms for Diagnosis Systems of Heart and Diabetes Diseases. J. Med. Syst., Springer, 38
[26] Herland M, Khoshgoftaar TM, Wald R (2014) A review of data mining using big data in health informatics. J. Big Data, Springer 1: 2
[27] Khan SI, Hoque ASML (2015) Towards Development of Health Data Warehouse: Bangladesh Perspective. Accepted in 2nd International Conference on Electrical Engineering
and Information & Communication Technology (iCEEiCT 2015).
[28] Health Bulletin 2014, 2nd Edition, DGHS, Ministry of Health and Family Welfare, Government of the People's Republic of Bangladesh
[29] http://www.dghs.gov.bd/index.php/en/health-program-progress/hpnsdp-2011-16/84-english-root/ehealth-eservice/497-hpnsdp-2011-16-brief. Accessed 20 Feb 2015
[30] http://www.bpcdoa.com/clinics_and_diagnostics.html. Accessed 22 Feb 2015
[31] http://www.thefinancialexpress-bd.com/2014/12/15/71077/print. Accessed 22 Feb 2015
[32] Liang Z, Sherif S, Anna L, Athman B (2014) Cloud Data Management. Springer Switzerland
[33] Khan SI, Hoque ASML (2010) A New Technique for Database Fragmentation in Distributed Systems, International Journal of Computer Applications 5 (9), 20-24.
[34] Khan SI, Hoque ASML (2012) Scalability and Performance Analysis of CRUD Matrix Based Fragmentation Technique for Distributed Database, in Proceedings of 15th International Conference on Computer and Information Technology (ICCIT), 562-567.
[35] World Health Organization (2008). Worldwide prevalence of anaemia 1993–2005. Geneva: World Health Organization. ISBN 978-92-4-159665-7

Shahidul Islam Khan obtained his B.Sc. and M.Sc. Engineering degrees in
Computer Science and Engineering (CSE) from Ahsanullah University of
Science & Technology (AUST) and Bangladesh University of Engineering
& Technology (BUET), Dhaka, Bangladesh in 2003 and 2011 respectively.
He is now a doctoral student in the Department of CSE, BUET, which is the
highest ranked technical university of Bangladesh. His current field of
research is data mining and health informatics. He has more than 10
published papers in international conferences and journals. He is also an Assistant Professor
(currently on study leave) in the Dept. of CSE, International Islamic University Chittagong
(IIUC), Bangladesh.

Abu Sayed Md. Latiful Hoque graduated from the Dept. of Electrical and
Electronic Engineering (EEE), Bangladesh University of Engineering &
Technology (BUET), Dhaka, Bangladesh in 1986. He obtained a Post
Graduate Diploma in 1992 from the Asian Institute of Technology (AIT),
Thailand and a Ph.D. in CS from the University of Strathclyde, Glasgow, UK in
2003. He is a professor in the Dept. of CSE, BUET and a prominent
international researcher in the fields of Database, Data Mining and E-Learning.
He has about 50 published papers in reputed international journals and conferences. He is
also the author of a book on Database Systems which is taught in universities.
