BIA Model Test Paper - 2014


MODEL TEST PAPER

BUSINESS INTELLIGENCE & ITS APPLICATIONS

Ques1: Differentiate between Decision Support System and Group Decision Support System.
Ques2: Explain the term Groupware. How is it linked to the term group decision support system?
Ques3. Which information system is suitable for top level management? Why?
Ques 4. a) Difference between database and knowledge base?
b) Difference between OLAP and OLTP
Ques 5. Explain the term Knowledge Discovery in Databases?
Ques 6. Consider the table
Employee (Empid, Ename, Job, salary, Hiredate, Deptno)
Write the SQL for the following:
i) Find the average salary of each Deptno
ii) Display the employees who work in Deptno 10 and 20
iii) Display the employee names which were hired in the year 2007
iv) Add a new record to the employee table
Ques7. Explain the architecture of Data Warehouse in HealthCare Sector.
Ques8. How are data warehousing, data mining, expert system technologies
associated with knowledge management?
Model Test Paper (2014)
BUSINESS INTELLIGENCE & ITS APPLICATIONS

Ques1: Differentiate between Decision Support System and Group Decision Support System.

Answer: Decision Support System and Group Decision Support System

GDSS and DSS are computer-based information systems that assist decision-making processes
within a group, company or office. By using a GDSS or a DSS, a company can speed up decision
making, allowing employees more time to focus on particular issues. These systems can also
promote learning and training.

Group Decision Support System: GDSS, or Group Decision Support System, is a subclass of DSS.
It is defined as a computer-based information system built to support and promote effective
group decision making. A GDSS has three important components: software, which includes a
database with management capabilities for group decision making; hardware; and people, that is,
the participants in the decision making.
A group decision support system is a decision support system that facilitates decision making by a
team of decision makers working as a group. The importance of collective decisions is increasingly
recognized today. To resolve major issues, brainstorming sessions are carried out, and the collective
pool of ideas and opinions gives final shape to a decision.
A GDSS is an interactive, computer-based system that facilitates the solution of unstructured problems
by a set of decision makers working together as a group. A GDSS is often considered superior to a
DSS for such problems because decisions are taken by a group of decision makers rather than by one
individual, so the decision can draw on a wider pool of ideas and opinions.

Characteristics of GDSS: The main features of a GDSS are as follows:

1. A GDSS is goal oriented: it is designed with the goal of supporting groups of decision makers
in their work.
2. A GDSS is a specially designed information system.
3. A GDSS is easy to learn and to use.
4. A GDSS is designed to encourage activities such as idea generation, conflict resolution and
freedom of expression.

Decision Support System: A DSS, or Decision Support System, by contrast, is meant to support how
individuals make decisions. Through the use of a DSS, both human capabilities and computer
capacities are used to the full to reach a better decision. The system assists the human decision
maker; it is not the sole decision maker. A DSS also allows its programs, particularly the
decision-making capabilities, to be customized to suit individual needs.
In other words, a Decision Support System (DSS) is a computer-based information system that
supports business or organizational decision-making activities. DSSs serve the management,
operations, and planning levels of an organization (usually middle and senior management) and help
to make decisions that may be rapidly changing and not easily specified in advance (unstructured
and semi-structured decision problems). Decision support systems can be fully computerized,
human-powered, or a combination of both.

Characteristics of DSS: The characteristics of a DSS are as follows:

1. A DSS focuses on helping to analyze situations rather than on providing the right information
in the form of various types of reports.
2. A DSS is individual specific. Each decision maker can incorporate his or her own perceptions
about the problem and analyze their effect.
3. A DSS incorporates various mathematical, statistical and operations research models.
4. These systems support complex, non-routine decisions.
5. Its primary purpose is to process data into information.
6. DSSs are typically employed by tactical-level management, whose decisions and what-if
analyses are less structured.
7. A DSS not only presents the results but also expands the information with alternatives.
8. A DSS is only supportive in nature; human decision makers retain their supremacy. It does not
force its outcomes on the decision maker.
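The what-if analysis mentioned in point 6 can be illustrated with a small Python sketch. The profit model, the linear demand curve and every figure in it are hypothetical, chosen only to show the idea: the system evaluates each alternative the decision maker proposes and lays out the results, while the choice stays with the human, as point 8 says.

```python
# Hypothetical profit model for a what-if analysis; all figures are illustrative.
def demand(price: float) -> float:
    """Assumed linear demand curve: units sold fall as the price rises."""
    return max(0.0, 10_000 - 400 * price)

def profit(price: float, unit_cost: float = 5.0, fixed_cost: float = 20_000.0) -> float:
    """Contribution per unit times units sold, minus fixed cost."""
    return (price - unit_cost) * demand(price) - fixed_cost

def what_if(prices: list[float]) -> list[tuple[float, float]]:
    """Evaluate each candidate price and return (price, profit) pairs."""
    return [(p, profit(p)) for p in prices]

if __name__ == "__main__":
    # The DSS only lays out the alternatives; the manager decides.
    for price, result in what_if([8.0, 10.0, 12.0, 15.0]):
        print(f"price={price:5.2f}  profit={result:10.2f}")
```

In this made-up model the highest profit in the list occurs at a price of 15 (profit 20,000); a manager might still prefer another price for reasons the model does not capture.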

Difference between GDSS and DSS


GDSS is a computer-based information system that focuses on the group, while DSS focuses on an
individual, for instance the manager or the supervisor. GDSS and DSS may have similar hardware
and software components; however, a GDSS includes networking technology suited to group
discussion and communication, whereas a DSS has technologies focused on a single user.
Maintaining a GDSS requires higher system reliability and more comprehensive multi-user access
than a DSS, because a system failure in a GDSS affects many individuals.
Through these computer-based information systems, the decision-making capacity of a company or
an individual is enhanced and accelerated. This supports not only good communication but also
positive outcomes within a department, group, or company.
Ques2: Explain the term Groupware. How is it linked to the term group decision support system?

Groupware: The term "groupware" refers to specialized software applications that enable group
members to share and synchronize information and to communicate with each other more easily. It is
a class of software that helps groups of colleagues (workgroups) attached to a local-area network
organize their activities. Typically, groupware supports the following operations:

scheduling meetings and allocating resources
e-mail
password protection for documents
telephone utilities

Benefits

Groupware can allow both geographically dispersed team members and a company's on-site
workers to collaborate with each other through the use of computer networking technologies (i.e.,
via the Internet or over an internal network/intranet). As such, groupware is especially important for
remote workers and professionals on the go, since they can collaborate with other team members
virtually.

Forces Driving Groupware Development: Some of the major factors include:

Increased productivity

Reduced number of meetings

Increased automation of routine workflow

Need for better global coordination

Availability of widespread networks

Features

Some common features provided in groupware solutions include:

A centralized repository for documents and files that users can access and save to
Document version management and change management
Shared calendars and task management
Web conferencing, instant messaging, message boards, and/or whiteboards

Current market leaders are Lotus Notes and Domino, Microsoft Exchange, Novell GroupWise and
Oracle Office.

Individual tools inside the Lotus software suite include a meeting manager (Lotus Sametime) and
message exchange (Lotus Notes Mail).

A classification system based on type of support it provides:

1. Messaging systems

2. Conferencing systems
3. Collaborative authoring systems, e.g. Google Docs, MediaWiki

4. Group DSS

5. Coordination systems

6. Intelligent agent systems, e.g. IBM Web Browser Intelligence

Groupware is linked to the group decision support system in following aspects:-

Groupware exists to facilitate the movement of messages or documents so as to enhance the quality
of communication among individuals in remote locations. It provides access to shared databases,
document handling, electronic messaging, workflow management, and conferencing. In fact,
groupware can be thought of as a development environment in which cooperative applications,
including decision-making applications, can be built. Groupware achieves this through the
integration of eight distinct technologies: messaging, conferencing, group document handling,
workflow, utilities/development tools, frameworks, services, and vertical market applications.
Hence, it provides the foundation for the easy exchange of data and information among individuals
located far apart, and a GDSS can be built on this foundation, which is why Group DSS appears as one
category in the classification of groupware above. However, no currently available product offers a
complete, integrated set of these capabilities.

Ques3. Which information system is suitable for top level management? Why?

Ans. The information systems suitable for top level management are the Expert System and the
Executive Information System (EIS).

An expert system is a computer system that emulates the decision-making ability of a human
expert. Expert systems are designed to solve complex problems by reasoning about knowledge,
represented primarily as IF-THEN rules rather than through conventional procedural code. The first
expert systems were created in the 1970s and then proliferated in the 1980s. Expert systems were
among the first truly successful forms of AI software.

It is an artificial-intelligence-based system that converts the knowledge of an expert in a specific
subject into software code. This code can be merged with other such code (based on the knowledge
of other experts) and used for answering questions (queries) submitted through a computer. Expert
systems typically consist of three parts: (1) a knowledge base, which contains the information
acquired by interviewing experts and the logic rules that govern how that information is applied;
(2) an inference engine, which interprets the submitted problem against the rules and logic of the
information stored in the knowledge base; and (3) an interface, which allows the user to express the
problem in a human language such as English.
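The three parts described above can be sketched in a few lines of Python. This is a minimal forward-chaining illustration, not any particular expert system shell: the knowledge base is a list of hypothetical IF-THEN rules for a loan decision, the inference engine keeps firing rules whose conditions are satisfied by the known facts until nothing new can be derived, and a simple function stands in for the user interface.

```python
# Knowledge base: hypothetical IF-THEN rules, each (set of conditions, conclusion).
RULES = [
    ({"high_income", "low_debt"}, "good_credit"),
    ({"good_credit", "stable_job"}, "approve_loan"),
    ({"missed_payments"}, "poor_credit"),
    ({"poor_credit"}, "reject_loan"),
]

def infer(facts: set[str], rules=RULES) -> set[str]:
    """Inference engine: forward chaining until no rule adds a new fact."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # Fire the rule if all its conditions are known and it adds something new.
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

def consult(facts: set[str]) -> str:
    """Stand-in for the user interface: report the recommendation, if any."""
    derived = infer(facts)
    for decision in ("approve_loan", "reject_loan"):
        if decision in derived:
            return decision
    return "no recommendation"

if __name__ == "__main__":
    print(consult({"high_income", "low_debt", "stable_job"}))  # approve_loan
    print(consult({"missed_payments"}))                          # reject_loan
```

Keeping the rules as data, separate from the engine, mirrors the split between knowledge base and inference engine: new expertise is added by adding rules, not by rewriting code.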

Expert systems are computer applications which embody some non-algorithmic expertise for
solving certain types of problems. For example, expert systems are used in diagnostic applications
servicing both people and machinery. They also play chess, make financial planning decisions,
configure computers, monitor real time systems, underwrite insurance policies, and perform many
other services which previously required human expertise. Expert systems generate decisions that
an expert would make: they can recommend solutions to nursing problems which mimic the clinical
judgment of a nurse expert. These systems are developed to facilitate and enhance the clinical
judgment of nurses, not to replace them. Like decision support systems, expert systems provide
information to help health professionals to make informed judgments when assessing the validity of
data, information, diagnoses, and choices for treatment and care.

THE APPLICATIONS OF EXPERT SYSTEMS

The spectrum of applications of expert systems technology to industrial and commercial problems
is so wide as to defy easy characterization. The applications find their way into most areas of
knowledge work. They are as varied as helping salespersons sell modular factory-built homes to
helping NASA plan the maintenance of a space shuttle in preparation for its next flight.

Applications tend to cluster into seven major classes.

Diagnosis and Troubleshooting of Devices and Systems of All Kinds

This class comprises systems that deduce faults and suggest corrective actions for a malfunctioning
device or process. Medical diagnosis was one of the first knowledge areas to which ES technology
was applied (for example, see Shortliffe 1976), but diagnosis of engineered systems quickly
surpassed medical diagnosis. There are probably more diagnostic applications of ES than any other
type. The diagnostic problem can be stated in the abstract as: given the evidence presenting itself,
what is the underlying problem/reason/cause?
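The diagnostic question ("given the evidence, what is the cause?") is often answered by goal-driven, or backward, chaining: the system takes each candidate cause as a hypothesis and works back through the rules to check whether the observed evidence supports it. A minimal sketch in Python, assuming a hypothetical rule set for a car that will not start; real diagnostic systems also weigh uncertain evidence, which this sketch ignores.

```python
# Hypothetical diagnostic rules: a conclusion holds if ALL its listed conditions hold.
RULES = {
    "dead_battery": ["engine_does_not_crank", "lights_dim"],
    "empty_tank": ["engine_cranks", "fuel_gauge_empty"],
    "engine_cranks": ["starter_turns"],
}
HYPOTHESES = ["dead_battery", "empty_tank"]

def prove(goal: str, evidence: set[str], seen: frozenset = frozenset()) -> bool:
    """Backward chaining: a goal holds if it is observed evidence, or if the rule
    that concludes it has conditions that can all be proved in turn."""
    if goal in evidence:
        return True
    if goal in seen or goal not in RULES:  # avoid cycles; goals with no rule fail
        return False
    return all(prove(cond, evidence, seen | {goal}) for cond in RULES[goal])

def diagnose(evidence: set[str]) -> list[str]:
    """Return every candidate cause supported by the evidence."""
    return [h for h in HYPOTHESES if prove(h, evidence)]

if __name__ == "__main__":
    print(diagnose({"starter_turns", "fuel_gauge_empty"}))  # ['empty_tank']
```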

Planning and Scheduling


Systems that fall into this class analyze a set of one or more potentially complex and interacting
goals in order to determine a set of actions to achieve those goals, and/or provide a detailed
temporal ordering of those actions, taking into account personnel, materiel, and other constraints.
This class has great commercial potential, which has been recognized. Examples involve airline
scheduling of flights, personnel, and gates; manufacturing job-shop scheduling; and manufacturing
process planning.

Configuration of Manufactured Objects from Subassemblies

Configuration, whereby a solution to a problem is synthesized from a given set of elements related
by a set of constraints, is historically one of the most important of expert system applications.
Configuration applications were pioneered by computer companies as a means of facilitating the
manufacture of semi-custom minicomputers (McDermott 1981). The technique has found its way
into use in many different industries, for example, modular home building, manufacturing, and
other problems involving complex engineering design and manufacturing.

Financial Decision Making

The financial services industry has been a vigorous user of expert system techniques. Advisory
programs have been created to assist bankers in determining whether to make loans to businesses
and individuals. Insurance companies have used expert systems to assess the risk presented by the
customer and to determine a price for the insurance. A typical application in the financial markets is
in foreign exchange trading.

Knowledge Publishing

This is a relatively new, but also potentially explosive area. The primary function of the expert
system is to deliver knowledge that is relevant to the user's problem, in the context of the user's
problem. The two most widely distributed expert systems in the world are in this category. The first
is an advisor which counsels a user on appropriate grammatical usage in a text. The second is a tax
advisor that accompanies a tax preparation program and advises the user on tax strategy, tactics, and
individual tax policy.

Process Monitoring and Control


Systems falling in this class analyze real-time data from physical devices with the goal of noticing
anomalies, predicting trends, and controlling for both optimality and failure correction. Examples of
real-time systems that actively monitor processes can be found in the steel making and oil refining
industries.

Design and Manufacturing

These systems assist in the design of physical devices and processes, ranging from high-level
conceptual design of abstract entities all the way to factory floor configuration of manufacturing
processes.

Ques 4 a) Difference between database and knowledge base?

Ans. Database

A database is an organized collection of data. The data are typically organized to model relevant
aspects of reality in a way that supports processes requiring this information. For example,
modeling the availability of rooms in hotels in a way that supports finding a hotel with vacancies.

Knowledge Base

A knowledge base (KB) is a technology used to store complex structured and unstructured
information used by a computer system. The initial use of the term was in connection with expert
systems which were the first knowledge-based systems.

The original use of the term knowledge-base was to describe one of the two sub-systems of a
knowledge-based system. A knowledge-based system consists of a knowledge-base that represents
facts about the world and an inference engine that can reason about those facts and use rules and
other forms of logic to deduce new facts or highlight inconsistencies.

1. A database stores data, for example personnel data or sales data. Raw data as such are of little
practical value unless they can be transformed into information. For example, you may be able to
analyze the sales data and discover purchase patterns, so that your company can turn that
information into profit. The data has now become information.
Now the question to ask is: what kind of "expertise" did you apply in transforming the raw data
into useful information? Can you store that "expertise", that "how to", somewhere, so that
somebody else, or perhaps some automated process, can use that "stored knowledge" for future
analysis? That is your knowledge base.

2. A knowledge base is a collection of information based on experience or test cases, compiled into
documents. It relates more to a functional domain, and its content is more subjective.

3. A database is a related collection of information organized into rows and columns. The data stored
is objective in nature, and it is used in IT-related applications. Well-known database management
systems include Oracle, Microsoft SQL Server and Microsoft Access.

4. A knowledge base is a compilation of information that is based on fact, research, and life
experience.
A database is a compilation of information that is just that: data, random information about
anything at all. Pick a subject and compile information about it, whether true or false.

Knowledge: if one puts a hand in fire, it will burn (knowledge gained from experience).
Data: there are 100 ways to build a fire, and the list is compiled from data collected from others'
experience.

Ques 4 b) Difference between OLAP and OLTP


OLAP (On-line Analytical Processing) deals with historical or archival data and is characterized by
a relatively low volume of transactions. Queries in these systems are often very complex and
involve aggregations, and for OLAP systems the response time of such queries is the measure of
effectiveness.

Example: If we collect the last 10 years of flight reservation data, it can give us meaningful
information such as trends in reservations. This may reveal useful information such as the peak
time of travel and what kinds of people travel in the various classes available
(Economy/Business).
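The kind of aggregation query OLAP relies on can be sketched with Python's built-in sqlite3 module. The table name, its columns and the handful of rows are hypothetical stand-ins for ten years of reservation history; the point is the shape of the queries (grouping and summarizing many historical rows), not the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reservation (year INTEGER, month INTEGER, travel_class TEXT, seats INTEGER)"
)
# Hypothetical historical rows; a real warehouse would hold years of data.
conn.executemany(
    "INSERT INTO reservation VALUES (?, ?, ?, ?)",
    [
        (2012, 5, "Economy", 120), (2012, 12, "Business", 15),
        (2013, 5, "Economy", 140), (2013, 12, "Economy", 160),
        (2013, 12, "Business", 22),
    ],
)

# Trend: total seats per year and travel class.
trend = conn.execute(
    """SELECT year, travel_class, SUM(seats)
       FROM reservation
       GROUP BY year, travel_class
       ORDER BY year, travel_class"""
).fetchall()

# Peak travel month across all years.
peak_month = conn.execute(
    "SELECT month FROM reservation GROUP BY month ORDER BY SUM(seats) DESC LIMIT 1"
).fetchone()[0]
```

Both queries read many rows and return a small summary, which is the typical OLAP pattern; neither modifies any data.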

OLTP (On-line Transaction Processing) deals with operational data, which is data involved in the
operation of a particular system, and it is characterized by a large number of short online
transactions (INSERT, UPDATE, and DELETE). The main emphasis in OLTP systems is on very fast
query processing and on maintaining data integrity in multi-access environments, and effectiveness
is measured by the number of transactions per second. In addition, in an OLTP system the data is
frequently updated and queried, so to prevent data redundancy and update anomalies the database
tables are normalized, which makes write operations on the tables more efficient.

Example: In a banking system, you withdraw money through an ATM. The account number, ATM PIN
number, amount being withdrawn and balance in the account are operational data elements.
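By contrast, an OLTP operation is a short transaction that must keep the data consistent. A minimal sketch of the ATM withdrawal using Python's sqlite3 module; the account table, the account number and the withdraw function are hypothetical, and PIN checking is left out. The debit happens inside one transaction and only if the balance covers the amount, so the account is either debited completely or not at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_no TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO account VALUES ('ACC-001', 5000)")
conn.commit()

def withdraw(conn: sqlite3.Connection, account_no: str, amount: int) -> bool:
    """Debit the account in one transaction; refuse if funds are insufficient."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE account SET balance = balance - ? "
            "WHERE account_no = ? AND balance >= ?",
            (amount, account_no, amount),
        )
        return cur.rowcount == 1  # 0 rows updated means the withdrawal was refused

def balance(conn: sqlite3.Connection, account_no: str) -> int:
    return conn.execute(
        "SELECT balance FROM account WHERE account_no = ?", (account_no,)
    ).fetchone()[0]
```

Putting the balance check in the UPDATE's WHERE clause makes check and debit a single statement, so two concurrent withdrawals cannot both pass the check on the same balance.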

The following table summarizes the major differences between OLTP and OLAP system
design.

Source of data
OLTP: Operational data; OLTP systems are the original source of the data.
OLAP: Consolidated data; OLAP data comes from the various OLTP databases.

Purpose of data
OLTP: To control and run fundamental business tasks.
OLAP: To help with planning, problem solving, and decision support.

What the data reveals
OLTP: A snapshot of ongoing business processes.
OLAP: Multi-dimensional views of various kinds of business activities.

Inserts and updates
OLTP: Short and fast inserts and updates initiated by end users.
OLAP: Periodic long-running batch jobs refresh the data.

Queries
OLTP: Relatively standardized and simple queries returning relatively few records.
OLAP: Often complex queries involving aggregations.

Processing speed
OLTP: Typically very fast.
OLAP: Depends on the amount of data involved; batch data refreshes and complex queries may take
many hours; query speed can be improved by creating indexes.

Space requirements
OLTP: Can be relatively small if historical data is archived.
OLAP: Larger due to the existence of aggregation structures and history data; requires more indexes
than OLTP.

Database design
OLTP: Highly normalized with many tables.
OLAP: Typically de-normalized with fewer tables; uses star and/or snowflake schemas.

Backup and recovery
OLTP: Back up religiously; operational data is critical to run the business, and data loss is likely to
entail significant monetary loss and legal liability.
OLAP: Instead of regular backups, some environments may consider simply reloading the OLTP data
as a recovery method.
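To make the "star schema" entry concrete, here is a minimal sketch using Python's sqlite3 module: one fact table of numeric measures, each row pointing to de-normalized dimension tables, queried by joining the facts to the dimensions and aggregating. All table names, columns and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive attributes, one row per member.
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    -- Fact table: numeric measures plus a foreign key to each dimension.
    CREATE TABLE fact_sales (
        date_id INTEGER REFERENCES dim_date(date_id),
        product_id INTEGER REFERENCES dim_product(product_id),
        amount INTEGER
    );
    INSERT INTO dim_date VALUES (1, 2013, 1), (2, 2013, 2), (3, 2014, 1);
    INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
    INSERT INTO fact_sales VALUES (1, 1, 900), (2, 1, 1100), (2, 2, 300), (3, 2, 450);
    """
)

# Typical OLAP query: join the facts to their dimensions, then aggregate.
rows = conn.execute(
    """SELECT d.year, p.category, SUM(f.amount)
       FROM fact_sales f
       JOIN dim_date d ON f.date_id = d.date_id
       JOIN dim_product p ON f.product_id = p.product_id
       GROUP BY d.year, p.category
       ORDER BY d.year, p.category"""
).fetchall()
```

Only a few joins are needed (one per dimension), which is why the design favors analytical reads over the many-table normalization of OLTP.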
Ques 5. Explain the term Knowledge Discovery in Databases?
Ans. Knowledge Discovery in Databases (KDD) is the automatic, exploratory analysis and modeling
of large data repositories. KDD is the organized process of identifying valid, novel, useful, and
understandable patterns from large and complex data sets. Data Mining (DM) is the core of the KDD
process, involving algorithms that explore the data, develop the model and discover previously
unknown patterns. The model is used for understanding phenomena from the data, for analysis and
for prediction.
The accessibility and abundance of data today make knowledge discovery and Data Mining a
matter of considerable importance and necessity. Given the recent growth of the field, it is not
surprising that a wide variety of methods is now available to researchers and practitioners. No
one method is superior to the others in all cases. The handbook of Data Mining and Knowledge
Discovery from Data aims to organize all significant methods developed in the field into a coherent
and unified catalog, presents performance evaluation approaches and techniques, and explains, with
cases and software tools, the use of the different methods.
The goals of this introductory chapter are to explain the KDD process and to position DM within
the information technology tiers. Research and development challenges for the next generation of
the science of KDD and DM are also defined. The rationale, reasoning and organization of the
handbook are presented in this chapter. The chapter has six sections, followed by a brief reference
primer listing leading papers, books, conferences and journals in the field:
1. The KDD Process
2. Taxonomy of Data Mining Methods
3. Data Mining within the Complete Decision Support System
4. KDD & DM Research Opportunities and Challenges
5. KDD & DM Trends
6. The Organization of the Handbook

The special recent aspect of data availability that is promoting the rapid development of KDD and
DM is the electronic readiness of data (though of different types and reliability). The fast
development of the internet and intranets in particular promotes data accessibility. Methods that
were developed before the Internet revolution considered smaller amounts of data with less
variability in data types and reliability.
Since the start of the information age, accumulating data has become easier and storing it
inexpensive. It has been estimated that the amount of stored information doubles every twenty months.
Unfortunately, as the amount of electronically stored information increases, the ability to
understand and make use of it does not keep pace with its growth. Data Mining is a term coined to
describe the process of sifting through large databases for interesting patterns and relationships.
Studies today aim at evidence-based modeling and analysis, which is the leading practice in
medicine, finance and many other fields.
Data availability is increasing exponentially, while the human processing level is almost
constant. Thus the gap increases exponentially. This gap is the opportunity for the KDD and DM
field, which therefore becomes increasingly important and necessary.
The KDD Process
The knowledge discovery process (Figure 1.1) is iterative and interactive, consisting of nine steps.
Note that the process is iterative at each step, meaning that moving back to previous steps may be
required. The process has many artistic aspects in the sense that one cannot present one formula
or make a complete taxonomy for the right choices for each step and application type. Thus it is
required to understand the process and the different needs and possibilities in each step. Taxonomy
is appropriate for the Data Mining methods and is presented in the next section.
Figure 1.1: The knowledge discovery (KDD) process

The process starts with determining the KDD goals and ends with the implementation of the
discovered knowledge. Then the loop is closed: the Active Data Mining part starts (which is beyond
the scope of this book and the process defined here). As a result, changes would have to be made in
the application domain (such as offering different features to mobile phone users in order to reduce
churn). This closes the loop, and the effects are then measured on the new data repositories, and the
KDD process is launched again.
Following is a brief description of the nine-step KDD process, starting with a managerial step:
1. Developing an understanding of the application domain: This is the initial preparatory step. It
prepares the scene for understanding what should be done with the many decisions (about
transformation, algorithms, representation, etc.). The people in charge of a KDD project need to
understand and define the goals of the end user and the environment in which the knowledge
discovery process will take place (including relevant prior knowledge). As the KDD process
proceeds, this step may even be revised.
Having understood the KDD goals, the preprocessing of the data starts, defined in the next three
steps (note that some of the methods here are similar to Data Mining algorithms, but are used in the
preprocessing context):

2. Selecting and creating a data set on which discovery will be performed.

Having defined the goals, the data that will be used for knowledge discovery should be
determined. This includes finding out what data is available, obtaining additional necessary data,
and then integrating all the data for the knowledge discovery into one data set, including the
attributes that will be considered for the process. This step is very important because Data Mining
learns and discovers from the available data; this is the evidence base for constructing the models.
If some important attributes are missing, the entire study may fail.
In this respect, the more attributes are considered, the better. On the other hand, collecting,
organizing and operating complex data repositories is expensive, so there is a tradeoff against the
opportunity for best understanding the phenomena. This tradeoff is where the interactive and
iterative nature of KDD comes into play: one starts with the best available data set and later
expands it, observing the effect in terms of knowledge discovery and modeling.

3. Preprocessing and cleansing. In this stage, data reliability is enhanced. It includes data cleaning,
such as handling missing values and removing noise or outliers. The handbook explains many
methods; depending on the project, this step ranges from doing nothing to being the major part (in
terms of time consumed) of a KDD project. It may involve complex statistical methods or the use of
a Data Mining algorithm in this context. For example, if one suspects that a certain attribute is of
insufficient reliability or has much missing data, then this attribute could become the target of a
supervised data mining algorithm. A prediction model for this attribute is developed, and then the
missing data can be predicted. The extent to which one pays attention to this level depends on
many factors.
In any case, studying these aspects is important and often revealing in itself regarding enterprise
information systems.
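Two of the cleansing tasks named above, handling missing values and removing outliers, can be sketched in plain Python. This shows one simple choice among the many methods the step allows, not the handbook's method: missing values (represented here as None) are replaced by the mean of the observed values, and outliers are detected with Tukey's interquartile-range rule (values more than 1.5 times the IQR below the first or above the third quartile). The function names and sample data are hypothetical.

```python
from statistics import mean, quantiles

def impute_mean(values: list) -> list:
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def remove_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

if __name__ == "__main__":
    print(impute_mean([10, None, 20]))                       # [10, 15, 20]
    print(remove_outliers([23, 25, 27, 24, 26, 250, 25]))    # 250 is dropped
```

In practice the order matters: an extreme value such as 250 would pull a mean upward, so outliers are usually examined before any means are computed for imputation.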

4. Data transformation. In this stage, better data for the data mining is generated and prepared.
Methods here include dimension reduction (such as feature selection and extraction, and record
sampling) and attribute transformation (such as discretization of numerical attributes and
functional transformation). This step can be crucial for the success of the entire KDD project, and
it is usually very project-specific.
For example, in medical examinations, the quotient of attributes may often be the most important
factor, and not each attribute by itself. In marketing, we may need to consider effects beyond our
control as well as efforts and temporal issues (such as studying the effect of advertising
accumulation). However, even if we do not use the right transformation at the beginning, we may
obtain a surprising effect that hints at the transformation needed (in the next iteration). Thus the
KDD process reflects upon itself and leads to an understanding of the transformation needed.
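Two of the transformations mentioned in this step, discretizing a numerical attribute and deriving a quotient of two attributes, can be sketched as follows. Equal-width binning is only one discretization method, and the medical attributes used for the quotient (waist and hip circumference, giving a waist-to-hip ratio) and all sample values are hypothetical illustrations.

```python
def equal_width_bins(values: list[float], k: int) -> list[int]:
    """Discretize values into k equal-width intervals; returns a bin index 0..k-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)  # all values identical: a single bin
    # min(..., k - 1) puts the maximum value into the last bin instead of bin k.
    return [min(int((v - lo) / width), k - 1) for v in values]

def ratio_attribute(numerator: list[float], denominator: list[float]) -> list[float]:
    """Derive a new attribute as the quotient of two existing ones."""
    return [n / d for n, d in zip(numerator, denominator)]

if __name__ == "__main__":
    print(equal_width_bins([0, 10, 20, 30, 40], k=4))   # [0, 1, 2, 3, 3]
    waist_cm, hip_cm = [80, 90], [100, 100]
    print(ratio_attribute(waist_cm, hip_cm))            # [0.8, 0.9]
```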
Having completed the above four steps, the following four steps are related to the Data Mining part,
where the focus is on the algorithmic aspects employed for each project:

5. Choosing the appropriate Data Mining task. We are now ready to decide on which type of
Data Mining to use, for example, classification, regression, or clustering. This mostly depends on
the KDD goals, and also on the previous steps. There are two major goals in Data Mining:
prediction and description. Prediction is often referred to as supervised Data Mining, while
descriptive Data Mining includes the unsupervised and visualization aspects of Data Mining. Most
data mining techniques are based on inductive learning, where a model is constructed explicitly or
implicitly by generalizing from a sufficient number of training examples.
The underlying assumption of the inductive approach is that the trained model is applicable to
future cases. The strategy also takes into account the level of meta-learning for the particular set of
available data.

6. Choosing the Data Mining algorithm. Having the strategy, we now decide on the tactics. This
stage includes selecting the specific method to be used for searching patterns (including multiple
inducers). For example, in considering precision versus understandability, the former is better with
neural networks, while the latter is better with decision trees.
For each strategy of meta-learning there are several possibilities of how it can be accomplished.
Meta-learning focuses on explaining what causes a Data Mining algorithm to be successful or not in
a particular problem. Thus, this approach attempts to understand the conditions under which a Data
Mining algorithm is most appropriate. Each algorithm has parameters and tactics of learning (such
as ten-fold cross-validation or another division for training and testing).
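Ten-fold cross-validation, mentioned above as one learning tactic, divides the records into ten folds and uses each fold once as the test set while the other nine form the training set. A minimal sketch of the index splitting in plain Python; the function name is hypothetical, and in practice a library routine would normally be used.

```python
import random

def k_fold_indices(n: int, k: int = 10, seed: int = 0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Records are shuffled once, then split into k folds of near-equal size;
    each fold serves exactly once as the test set.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # The first n % k folds get one extra record so that all n are used.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size
```

A model is trained on each `train` list and scored on the matching `test` list; the ten scores are then averaged to estimate how well the algorithm generalizes.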

7. Employing the Data Mining algorithm. Finally, the implementation of the Data Mining algorithm
is reached. In this step we might need to employ the algorithm several times until a satisfactory
result is obtained, for instance by tuning the algorithm's control parameters, such as the minimum
number of instances in a single leaf of a decision tree.
8. Evaluation. In this stage we evaluate and interpret the mined patterns (rules, reliability etc.),
with respect to the goals defined in the first step. Here we consider the preprocessing steps with
respect to their effect on the Data Mining algorithm results (for example, adding features in Step 4
and repeating from there). This step focuses on the comprehensibility and usefulness of the induced
model. In this step the discovered knowledge is also documented for further usage.
The last step is the usage and overall feedback on the patterns and discovery results obtained by the
Data Mining:

9. Using the discovered knowledge. We are now ready to incorporate the knowledge into another
system for further action. The knowledge becomes active in the sense that we may make changes to
the system and measure the effects. Actually, the success of this step determines the effectiveness of
the entire KDD process. There are many challenges in this step, such as losing the laboratory
conditions under which we have operated. For instance, the knowledge was discovered from a
certain static snapshot (usually sample) of the data, but now the data becomes dynamic. Data
structures may change (certain attributes become unavailable), and the data domain may be
modified (such as, an attribute may have a value that was not assumed before).
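Steps 6 to 8 can be illustrated together with a deliberately tiny sketch. The tree learner below is a hypothetical toy written from scratch (one numeric feature, majority-vote leaves), not the algorithm of any particular library. Its min_leaf argument plays the role of the "minimum number of instances in a single leaf" control parameter, and ten-fold cross-validation is used as the evaluation tactic, so that the algorithm is employed once per candidate setting and the best setting is kept. The folds are contiguous, so the records are assumed to be shuffled beforehand.

```python
from collections import Counter


def errors(points):
    """Misclassifications if every (x, label) point received the majority label."""
    counts = Counter(y for _, y in points)
    return len(points) - counts.most_common(1)[0][1]


def build_tree(points, min_leaf):
    """Toy tree on one numeric feature; a node is a label or (threshold, left, right)."""
    if len({y for _, y in points}) == 1:
        return points[0][1]
    best = None
    xs = sorted({x for x, _ in points})
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [p for p in points if p[0] <= t]
        right = [p for p in points if p[0] > t]
        # Control parameter: skip splits that leave fewer than min_leaf instances in a leaf.
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        cost = errors(left) + errors(right)
        if best is None or cost < best[0]:
            best = (cost, t, left, right)
    if best is None:  # no allowed split: make a majority-label leaf
        return Counter(y for _, y in points).most_common(1)[0][0]
    _, t, left, right = best
    return (t, build_tree(left, min_leaf), build_tree(right, min_leaf))


def predict(tree, x):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if x <= t else right
    return tree


def k_folds(n, k=10):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of records")
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(range(start, start + size))
        start += size
    return folds


def cv_accuracy(points, min_leaf, k=10):
    """Mean held-out accuracy: train on k-1 folds, test on the remaining fold, k times."""
    scores = []
    for fold in k_folds(len(points), k):
        held_out = set(fold)
        train = [p for i, p in enumerate(points) if i not in held_out]
        test = [points[i] for i in fold]
        tree = build_tree(train, min_leaf)
        scores.append(sum(predict(tree, x) == y for x, y in test) / len(test))
    return sum(scores) / len(scores)


def tune_min_leaf(points, candidates, k=10):
    """Employ the algorithm once per candidate setting and keep the best-scoring one."""
    return max(candidates, key=lambda m: cv_accuracy(points, m, k))
```

Raising min_leaf yields smaller trees that are easier to read but may fit less closely, which is the precision versus understandability trade-off mentioned in step 6.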

Ques 6 Consider the table


Employee (Empid, Ename, Job, salary, Hiredate, Deptno)

Write the SQL for the following:

i) Find the average salary of each Deptno

ii) Display the employees who work in Deptno 10 and 20

iii) Display the employee names which were hired in the year 2007

iv) Add a new record to the employee table

Ans. CREATE TABLE Employee
(
Empid VARCHAR(20) NOT NULL UNIQUE,
Ename VARCHAR(20),
Job VARCHAR(20),
Salary INT NOT NULL,
Hiredate DATE,
Deptno INT
);

i) SELECT Deptno, AVG(Salary) AS AvgSalary
FROM Employee
GROUP BY Deptno;

ii) -- "10 and 20" means either department, so IN (or OR) is needed, not AND
SELECT *
FROM Employee
WHERE Deptno IN (10, 20);

iii) -- the upper bound is exclusive so that times on 31 December 2007 are still included
SELECT Ename
FROM Employee
WHERE Hiredate >= '2007-01-01'
AND Hiredate < '2008-01-01';

iv) INSERT INTO Employee (Empid, Ename, Job, Salary, Hiredate, Deptno)
VALUES ('20', 'Varun', 'Manager', 200000, '2007-01-01', 123);
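To check answers of this kind, the statements can be run against a throwaway in-memory SQLite database through Python's standard sqlite3 module. This is a sketch only: the four sample rows are invented for illustration, and SQLite keeps the dates as ISO-8601 text ('YYYY-MM-DD'), which is why the year filter can compare strings. Other database systems may expect a typed literal such as DATE '2007-01-01'.

```python
import sqlite3

# Hypothetical sample rows (Empid, Ename, Job, Salary, Hiredate, Deptno), for illustration only.
SAMPLE_ROWS = [
    ("1", "Asha", "Clerk", 30000, "2006-05-10", 10),
    ("2", "Ravi", "Analyst", 50000, "2007-03-15", 20),
    ("3", "Meena", "Clerk", 40000, "2007-11-30", 10),
    ("4", "John", "Manager", 90000, "2008-01-02", 30),
]


def run_answers():
    """Create the table, run answers iv), i), ii) and iii), and return the query results."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE Employee (
            Empid    VARCHAR(20) NOT NULL UNIQUE,
            Ename    VARCHAR(20),
            Job      VARCHAR(20),
            Salary   INT NOT NULL,
            Hiredate DATE,
            Deptno   INT
        )""")
    conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?, ?, ?)", SAMPLE_ROWS)
    # iv) add a new record, naming the columns explicitly
    conn.execute("INSERT INTO Employee (Empid, Ename, Job, Salary, Hiredate, Deptno) "
                 "VALUES ('20', 'Varun', 'Manager', 200000, '2007-01-01', 123)")
    # i) average salary of each department
    averages = conn.execute(
        "SELECT Deptno, AVG(Salary) FROM Employee GROUP BY Deptno ORDER BY Deptno").fetchall()
    # ii) employees in department 10 or 20
    in_10_or_20 = conn.execute(
        "SELECT Ename FROM Employee WHERE Deptno IN (10, 20) ORDER BY Empid").fetchall()
    # iii) names of employees hired during 2007 (half-open date range)
    hired_2007 = conn.execute(
        "SELECT Ename FROM Employee WHERE Hiredate >= '2007-01-01' "
        "AND Hiredate < '2008-01-01' ORDER BY Ename").fetchall()
    conn.close()
    return averages, in_10_or_20, hired_2007
```

With these sample rows, department 10 averages 35000, and Ravi, Meena and Varun are the employees hired in 2007.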

Ques7. Explain the architecture of Data Warehouse in HealthCare Sector.

Definition of Data warehouse

A data warehouse (DW, DWH), or an enterprise data warehouse (EDW), is a database used
for reporting and data analysis. Integrating data from one or more disparate sources creates a
central repository of data, the data warehouse. Data warehouses store current and historical
data and are used to create trend reports for senior management, such as annual and
quarterly comparisons.

Characteristics of a Data warehouse

Subject Oriented: In the Data Warehouse, data is stored by business subjects, not by applications.
Business subjects differ from enterprise to enterprise. These are the subjects critical for the
enterprise. For a manufacturing company, sales, shipment, and inventory are critical
business subjects.

Integrated: Integration is closely related to subject orientation. Data warehouses must put
data from disparate sources into a consistent format. They must resolve such problems as
naming conflicts and inconsistencies among units of measure. When they achieve this, they
are said to be integrated.

Nonvolatile: Nonvolatile means that, once entered into the warehouse, data should not
change. This is logical because the purpose of a warehouse is to enable you to analyze what
has occurred.

Time Variant: In order to discover trends in business, analysts need large amounts of data.
This is very much in contrast to online transaction processing (OLTP) systems, where
performance requirements demand that historical data be moved to an archive. A data
warehouse's focus on change over time is what is meant by the term time variant.
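The "Integrated" characteristic (resolving naming conflicts and inconsistent units of measure) can be shown with a small sketch. The two source systems, their field names and the mapping tables are all hypothetical; the point is only that records from different sources end up with the same names and units.

```python
# Hypothetical mapping from each source system's field names to the warehouse's names.
FIELD_MAP = {
    "lab_system": {"pat_no": "patient_id", "wt_lb": "weight"},
    "ward_system": {"patient_id": "patient_id", "weight_kg": "weight"},
}

LB_TO_KG = 0.45359237

# Conversions into the warehouse unit of measure (kilograms), keyed by source field.
UNIT_CONVERSIONS = {"wt_lb": lambda v: round(v * LB_TO_KG, 1)}


def integrate(source, record):
    """Rename fields to warehouse names and convert values to warehouse units."""
    out = {}
    for field, value in record.items():
        target = FIELD_MAP[source].get(field)
        if target is None:
            continue  # field is not carried into the warehouse
        convert = UNIT_CONVERSIONS.get(field)
        out[target] = convert(value) if convert else value
    return out
```

For example, integrate("lab_system", {"pat_no": "P1", "wt_lb": 154}) and integrate("ward_system", {"patient_id": "P1", "weight_kg": 69.9}) both produce a record with patient_id "P1" and a weight of 69.9 kg.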

Data warehousing in healthcare facilitates the measurement of the effectiveness of treatment, of
relationships between causality and treatment protocols for systemic diseases, etc. It is also
essential to realise that today's healthcare organizations are being evaluated not only on the
quality and effectiveness of their treatment, but also on waste and unnecessary cost. By effectively
leveraging enterprise-wide data on labour expenditures, supply utilization, procedures, medications
prescribed, and other costs associated with patient care, healthcare professionals can identify and
correct wasteful expenditures. These changes benefit the bottom line and can also be used to
differentiate the healthcare organization from its competition.

ARCHITECTURE/COMPONENTS OF DATA WAREHOUSE IN HEALTHCARE:

1. SOURCE DATA COMPONENT: Healthcare organizations maintain data about each patient's
admission, diagnosis, discharge and/or transfer, length of stay, demographics, severity of
illness, etc. The organization keeps this data in private spreadsheets, documents, patient
profiles, and sometimes even departmental databases. This is internal data, parts of which
could be useful in a data warehouse.
2. DATA STAGING COMPONENT: Organizations hold large volumes of information across various
departments, laboratories and related administrative processes, and accessing and integrating
these separately and reliably is time-consuming and laborious. The data extracted from these
disparate sources therefore needs to be changed, converted, and made ready in a format that is
suitable for storage, querying and analysis.
Three major functions need to be performed to get the data ready for storing in the data
warehouse: the data must be extracted, transformed, and then loaded into the data warehouse
storage.

Data Extraction: This function has to deal with numerous data sources, and an appropriate
technique has to be employed for each one. Source data may come from different source machines in
diverse data formats. Part of the source data may be in relational database systems, some may be
in legacy network and hierarchical data models, and much may still be in flat files. Data
extraction can therefore become quite complex, so organizations either develop in-house programs
to extract the data or use outside tools. Most frequently, data warehouse implementation teams
extract the source data into a separate physical environment, from which moving the data into the
data warehouse is easier. In that separate environment, the source data may be extracted into a
group of flat files, a data-staging relational database, or a combination of both.

Data Transformation: A number of individual tasks may be performed as part of data transformation.
First, the data is cleaned. Cleaning may simply correct misspellings, or it may provide default
values for missing data elements or eliminate duplicates. Standardization also forms a large part
of data transformation.

Data Loading: Two distinct groups of tasks form the data loading function. When the design and
construction of the data warehouse are complete and it goes live for the first time, the initial
load of data into the data warehouse storage is performed. The initial load moves large volumes of
data and takes a substantial amount of time. Once the data warehouse is functioning, changes to the
source data continue to be extracted, the data revisions are transformed, and the incremental
revisions are fed in on an ongoing basis.

3. Data Storage Component: The data storage for the data warehouse is a separate repository,
because the operational systems of the organization support only the day-to-day operations.
When analysts use the data in the data warehouse for analysis, they need to know that the data
is stable and that it represents snapshots at specified periods. While they are working with the
data, the data storage must not be in a state of continual updating. For this reason, data
warehouses are read-only data repositories. The data may be refreshed yearly, monthly, etc.
Many data warehouses also employ multidimensional database management.
4. Information Delivery Component: It includes different methods of information delivery for
the healthcare organization. Ad hoc reports are predefined reports primarily meant for novice
and casual users. Provision for complex queries, multidimensional (MD) analysis, and
statistical analysis caters to the needs of analysts and power users. EIS (executive
information systems) are meant for senior doctors and high-level managers. Data mining
applications are knowledge discovery systems in which the mining algorithms help discover trends
and patterns from the usage of data. This helps healthcare organizations manage clinical
inventory, such as surgical tools and medicines, and monitor the ratio of occupied to vacant beds.
5. Meta Data Component: Metadata in a data warehouse is similar to the data dictionary or the
data catalog in database management. It keeps information about the logical data structures,
the files and addresses, the indexes, and so on. For example, if a data mart is created for the
number and names of patients in the general ward for the year 2012, the metadata records the
structure of that data mart and where its data comes from.
6. Management and Control Component: The management and control component coordinates
the services and activities within the data warehouse. It controls the data transformation
and the data transfer into the data warehouse storage, and it moderates the information
delivery to the users. The management and control component interacts with the metadata
component to perform its function.
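The extraction, transformation and loading functions described under the data staging component can be sketched end to end. Everything here is a hypothetical illustration: a CSV string stands in for a flat-file export, an in-memory SQLite table called admissions stands in for a departmental database, and each record is assumed to carry an ISO-formatted modified date that serves as the watermark for incremental loads. The spelling table and default value are likewise invented.

```python
import csv
import io
import sqlite3

SPELLING_FIXES = {"genral": "general", "maternty": "maternity"}  # hypothetical lookup table
DEFAULT_WARD = "unknown"                                          # hypothetical default value


def extract(flat_file_text, dept_db):
    """Pull rows from a CSV export and from a departmental database into one staging list."""
    staged = list(csv.DictReader(io.StringIO(flat_file_text)))
    cur = dept_db.execute("SELECT patient_id, ward, modified FROM admissions")
    cols = [d[0] for d in cur.description]
    staged += [dict(zip(cols, row)) for row in cur.fetchall()]
    return staged


def transform(staged):
    """Supply defaults for missing values, fix misspellings, standardize case, drop duplicates."""
    cleaned, seen = [], set()
    for rec in staged:
        ward = (rec.get("ward") or DEFAULT_WARD).strip().lower()
        ward = SPELLING_FIXES.get(ward, ward)
        row = (rec["patient_id"].strip().upper(), ward, rec["modified"])
        if row not in seen:
            seen.add(row)
            cleaned.append(row)
    return cleaned


def load(warehouse, rows, watermark=""):
    """Append rows newer than the watermark; the empty default watermark is the initial load.

    Rows are only appended, never updated, so earlier snapshots stay unchanged.
    Returns the new watermark to pass to the next incremental load.
    """
    warehouse.execute("CREATE TABLE IF NOT EXISTS admissions_fact "
                      "(patient_id TEXT, ward TEXT, modified TEXT)")
    new_rows = [r for r in rows if r[2] > watermark]
    warehouse.executemany("INSERT INTO admissions_fact VALUES (?, ?, ?)", new_rows)
    warehouse.commit()
    return max((r[2] for r in new_rows), default=watermark)
```

A first call such as load(warehouse, transform(extract(csv_text, dept_db))) performs the initial load; later calls pass the returned watermark so that only revisions are fed in. A production pipeline would usually push the watermark filter into the extraction queries instead of re-reading every source row.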

Ques8. How are data warehousing, data mining, expert system technologies associated with
knowledge management?
Knowledge management refers to the critical issues of organizational adaptation,
survival and competence in the face of radical, discontinuous environmental change.
Essentially, it embodies organizational processes that seek a synergistic combination of the
data and information processing capacity of information technologies and the creative and
innovative capacity of human beings.
Knowledge Management (KM) refers to a multi-disciplined approach to achieving
organizational objectives by making the best use of knowledge. KM focuses on
processes such as acquiring, creating and sharing knowledge and the cultural and
technical foundations that support them.
Knowledge management is a surprising mix of strategies, tools, and techniques, some of
which are nothing new under the sun. Storytelling, peer-to-peer mentoring, and learning
from mistakes, for example, all have precedents in education, training, and artificial
intelligence practices. Knowledge management makes use of a mixture of techniques
from knowledge-based system design, such as structured knowledge acquisition
strategies from subject matter experts, and from educational technology (e.g., task and job
analysis to design and develop task support systems). This makes it both easy and
difficult to define what KM is.
At one extreme, KM encompasses everything to do with knowledge. At the other
extreme, it is narrowly defined as an information technology system that dispenses
organizational know-how. KM is in fact both of these and many more. One of the few
areas of consensus in the field is that KM is a highly multidisciplinary field.
Management and coordination of the diverse technology architectures, data architectures
and system architectures pose knowledge management challenges. Such challenges
result from the need to integrate diverse technologies, computer programs, and data
sources across internal business processes. These challenges are compounded manifold
by the concurrent need to adapt enterprise architectures to keep up with changes in the
external business environment. For most high-risk and high-return strategic decisions,
timely information is often unavailable, as more and more of such information is external
in nature. Also, internal information may often be hopelessly out of date with respect to
evolving strategic needs.
This is where data warehousing, data mining and expert system technologies play a
role in the area of knowledge management.
A data warehouse is a relational database that is designed for query and analysis rather
than for transaction processing. It usually contains historical data derived from
transaction data, but it can include data from other sources. It separates analysis
workload from transaction workload and enables an organization to consolidate data
from several sources.
In addition to a relational database, a data warehouse environment includes an
extraction, transportation, transformation, and loading (ETL) solution, an online
analytical processing (OLAP) engine, client analysis tools, and other applications that
manage the process of gathering data and delivering it to business users.
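As a rough illustration of the summarization an OLAP engine performs for analysis workloads, the sketch below rolls up a measure over chosen dimensions in plain Python. The fact rows and the roll_up helper are hypothetical; a real warehouse would do this with SQL GROUP BY or a dedicated OLAP server.

```python
from collections import defaultdict


def roll_up(facts, dimensions, measure):
    """Sum `measure` for every combination of values of the chosen dimensions."""
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(sorted(totals.items()))


# Hypothetical fact rows: one row per patient-care cost record.
FACTS = [
    {"year": 2012, "ward": "general", "cost": 100.0},
    {"year": 2012, "ward": "icu", "cost": 250.0},
    {"year": 2013, "ward": "general", "cost": 50.0},
]
```

Calling roll_up(FACTS, ["year"], "cost") gives yearly totals, while roll_up(FACTS, ["year", "ward"], "cost") drills down to year and ward, which is the kind of annual comparison mentioned in the data warehouse definition.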
Data mining, or knowledge discovery, is the computer-assisted process of digging
through and analyzing enormous sets of data and then extracting the meaning of the
data. Data mining tools predict behaviours and future trends, allowing businesses to
make proactive, knowledge-driven decisions. Data mining tools can answer business
questions that traditionally were too time consuming to resolve. They scour databases
for hidden patterns, finding predictive information that experts may miss because it lies
outside their expectations.
Data mining derives its name from the similarities between searching for valuable
information in a large database and mining a mountain for a vein of valuable ore. Both
processes require either sifting through an immense amount of material, or intelligently
probing it to find where the value resides.
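One common kind of hidden pattern is an association rule ("patients who receive X also receive Y"), usually judged by its support (how often X and Y occur together) and confidence (how often Y occurs when X does). The sketch below computes both and keeps only rules above chosen thresholds. The transactions and threshold values are hypothetical, and a real miner such as Apriori would also generate the candidate rules rather than receive them.

```python
def rule_stats(transactions, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent."""
    a = set(antecedent)
    both = a | set(consequent)
    n_a = sum(1 for t in transactions if a <= set(t))
    n_both = sum(1 for t in transactions if both <= set(t))
    support = n_both / len(transactions)
    confidence = n_both / n_a if n_a else 0.0
    return support, confidence


def keep_useful_rules(transactions, rules, min_support, min_confidence):
    """Keep only the rules whose support and confidence reach the chosen thresholds."""
    kept = []
    for antecedent, consequent in rules:
        support, confidence = rule_stats(transactions, antecedent, consequent)
        if support >= min_support and confidence >= min_confidence:
            kept.append((antecedent, consequent, support, confidence))
    return kept
```

In four transactions where bread appears three times and bread with milk twice, the rule bread -> milk has support 0.5 and confidence of about 0.67.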
An expert system is an artificial-intelligence-based system that converts the
knowledge of an expert in a specific subject into software code. This code can be
merged with other such codes (based on the knowledge of other experts) and used for
answering questions (queries) submitted through a computer. Expert systems typically
consist of three parts: (1) a knowledge base, which contains the information acquired by
interviewing experts and the logic rules that govern how that information is applied; (2) an
inference engine, which interprets the submitted problem against the rules and logic of the
information stored in the knowledge base; and (3) an interface, which allows the user to
express the problem in a human language such as English. Expert system technology
has found application only in areas where information can be reduced to a set of
computational rules, such as insurance underwriting.
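The three parts can be illustrated with a minimal forward-chaining sketch. The underwriting rules are invented for illustration and are not taken from any real insurer. The knowledge base is a list of if-then rules, the inference engine keeps firing rules whose conditions all hold until nothing new can be concluded, and the "interface" is reduced to reading comma-separated facts.

```python
# Knowledge base: hypothetical underwriting rules as (conditions, conclusion) pairs.
RULES = [
    ({"smoker", "age_over_60"}, "high_risk"),
    ({"high_risk"}, "refer_to_underwriter"),
    ({"non_smoker", "no_claims_5_years"}, "low_risk"),
    ({"low_risk"}, "offer_discount"),
]


def infer(facts, rules=RULES):
    """Inference engine (forward chaining): return every conclusion derivable from facts."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known - set(facts)


def consult(answer_text):
    """Very small interface: the user types facts as comma-separated words."""
    facts = {word.strip().lower() for word in answer_text.split(",") if word.strip()}
    return sorted(infer(facts))
```

For instance, consult("Smoker, age_over_60") first concludes high_risk and then, from that conclusion, refer_to_underwriter; this chaining of rules is what the inference engine adds beyond a plain lookup.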
