
Executive summary

This report discusses the design and modeling issues associated with a data warehouse for the University of
Florida, as developed by the office of the Chief Information Officer (CIO). The data warehouse is
designed and implemented on a mainframe system using a highly de-normalized DB2 repository that holds
detailed transaction data and feeds data to heterogeneous data models owned by different
administrative units. Furthermore, this report discusses other aspects of the data warehouse,
along with a stakeholder analysis and the current and future states.
Needs
A data warehouse is needed to generate reports, feed data to Business Intelligence (BI)
tools, forecast trends, and train machine learning models. A data warehouse stores data from multiple
sources, such as APIs, databases, and cloud storage, using the ETL (Extract, Transform, Load) process.
The concept of the data warehouse has existed since the 1980s, when it was developed to help
transition data from merely powering operations to fueling decision support systems that reveal
business intelligence.
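The extract, transform, load flow described above can be sketched in miniature. The source records, field names, and cleaning rules below are hypothetical illustrations, not the actual UF feeds:

```python
import sqlite3

def extract(rows):
    """Extract: pull raw transaction records from a source (here, an in-memory
    list standing in for an API, database export, or cloud storage file)."""
    return list(rows)

def transform(rows):
    """Transform: clean and reshape records before loading
    (normalize unit names, drop incomplete transactions)."""
    cleaned = []
    for r in rows:
        if r.get("amount") is None:
            continue  # drop incomplete transactions
        cleaned.append({"unit": r["unit"].strip().upper(),
                        "amount": float(r["amount"])})
    return cleaned

def load(rows, conn):
    """Load: write the cleaned records into a warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS transactions (unit TEXT, amount REAL)")
    conn.executemany("INSERT INTO transactions VALUES (:unit, :amount)", rows)
    conn.commit()

source = [{"unit": " registrar ", "amount": "125.50"},
          {"unit": "housing", "amount": None}]
conn = sqlite3.connect(":memory:")
load(transform(extract(source)), conn)
print(conn.execute("SELECT unit, amount FROM transactions").fetchall())
```

In a real warehouse each stage would be far more elaborate, but the three-step shape is the same.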
Expected outcomes
The data warehouse must provide data services to numerous administrative units on campus. It is
both a parallel and serial processing environment that executes different tasks and requests
concurrently. Implementation goals include:
• Dramatic performance gains for as many categories of user queries as possible,
• A reasonable amount of extra data storage added to the warehouse,
• Complete transparency to end users and to application designers,
• Direct benefit to all users, regardless of which query tool they use,
• Impact the cost of the data system as little as possible, and
• Impact the DBA’s administrative responsibilities as little as possible.
Scope
A data warehouse is a central server system that permits the storage, analysis, and interpretation of
data to aid in decision-making. It houses structured data (database tables, Excel
sheets) as well as semi-structured data (XML files, web pages) for tracking and reporting,
making it an essential part of the organization.
In scope
The world now moves at a fast pace, so it is important for financial and
business corporations to have data warehouses: a warehouse makes the movement of data easy and
therefore saves users' time.
Out of scope
Small corporations do not need data warehouses, because they can manage with a single
system; the traffic their business generates is not high. Similarly, small financial systems
also do not need data warehouses.

Stakeholder analysis

CEO
The CEO is the main stakeholder in our case because the CEO manages the institution's overall operations. This
may include delegating and directing agendas, driving profitability, managing organizational structure
and strategy, and communicating with the board.
Department managers
Department managers are the other stakeholders because they manage the daily activities of
the team responsible for the design, implementation, maintenance, and support of data
warehouse systems and projects. They oversee data design and the creation of database architecture and
data repositories.

Assumptions
The system will not shut down at any moment, so end users can access it whenever they want.
The required site will open in one click, without any delay.
No one will be able to hack the data because of its high level of security.
Constraint
The system uses a simple set of XML-like tags to integrate HTML or XML with data from dynamic queries. It
does not require any user-written CICS programs for data access.
End users view the data through dynamic Web pages that access predefined canned queries produced
by Eagle Server Pages (ESP).
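The actual ESP tag syntax is proprietary and not reproduced in this report. As a rough analogy only, the "canned query" idea can be sketched in Python: the query text is predefined by the warehouse team, and end users trigger it by name rather than writing SQL. All table and query names here are hypothetical:

```python
# Hypothetical analogy to ESP-style canned queries: users select a predefined
# query by name; they never submit arbitrary SQL of their own.
CANNED_QUERIES = {
    "enrollment_by_college": "SELECT college, COUNT(*) FROM students GROUP BY college",
}

def render_page(query_name, conn):
    """Run a predefined query and embed its rows in a simple HTML table,
    mimicking a dynamic Web page backed by a canned query."""
    sql = CANNED_QUERIES[query_name]  # lookup, not user-supplied SQL
    rows = conn.execute(sql).fetchall()
    body = "".join(f"<tr><td>{c}</td><td>{n}</td></tr>" for c, n in rows)
    return f"<table>{body}</table>"
```

The benefit mirrors the ESP design: access is constrained to vetted queries, so no user-written data-access programs are needed.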
Dependencies
• The choice of technology,
• Checking its suitability,
• The installation of hardware and software, and
• The development of guidelines for performant use.
Current state diagram
The current state of the data warehouse is:
 Extracting data from legacy systems and other data sources,
 Cleansing, scrubbing, and preparing data for decision support,
 Maintaining consistent data in appropriate data storage,
 Ensuring and protecting information assets at minimum cost,
 Accessing and analyzing data using a variety of end user tools,
 Mining data for significant relationships, and
 Providing both summarized data as well as extremely fine-grained data.
(Source: Proceedings of the 2001 American Society for Engineering Education Annual Conference & Exposition, © 2001, American Society for Engineering Education.)

Future state of warehouse

The future state of the data warehouse will be:

• Determine the data types, primary keys, foreign keys, and how data will be passed between tables.
• Define and determine the parameters of the table storage.
• Estimate the size of the table storage and the entire data warehouse: the length of each attribute,
the number of rows for the initial prototype, the full historical load, and the incremental rows per load
when in production.
• Develop the initial indexing plan and define the indexes, reviewing the indexes and query strategies
to optimize the database in the process. Many sets of indexes are built so that queries retrieve data
more efficiently, and query analyzers are used to view query results, optimize the queries, and
improve the indexes they use.
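The sizing step above amounts to a back-of-the-envelope calculation: sum the attribute byte lengths per row, multiply by the row count, and allow extra room for indexes. The attribute widths, row counts, and index overhead below are made-up illustrative figures, not actual UF numbers:

```python
def estimate_table_bytes(attr_lengths, row_count, index_overhead=0.3):
    """Estimate table storage: bytes per row (sum of attribute lengths) times
    the number of rows, plus a rough fractional allowance for indexes."""
    row_bytes = sum(attr_lengths)
    data_bytes = row_bytes * row_count
    return int(data_bytes * (1 + index_overhead))

# Hypothetical fact table: two 4-byte keys, 8-byte amount,
# 10-byte date, 30-byte description -> 56 bytes per row.
attrs = [4, 4, 8, 10, 30]
initial = estimate_table_bytes(attrs, row_count=100_000)        # prototype load
historical = estimate_table_bytes(attrs, row_count=50_000_000)  # full history
print(initial, historical)
```

Repeating the same arithmetic with the expected incremental rows per load gives the growth rate once the warehouse is in production.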

Business requirements
Platform Functions
These features establish a baseline for the system to operate on. Interactivity refers to the
communication between human users and the software, and how easy the system is to use.
Customization and white labeling allow users to remake the software to their preferences and needs.
This has the double benefit of a seamless experience with other software systems you might use and
the assurance that your employees will actually use it.
Scalability
Scalability is one of the most vital differentiators for a data warehouse solution. A robust solution
scales rapidly to terabytes or even petabytes of data and concurrent users without downtime or
disruption. Elasticity refers to scaling up and down instantly to meet demand: scale up rapidly to
handle unexpected workloads, and scale down just as quickly to reduce resources and expenses.
Performance Requirements
At the end of the day, your data warehouse should be able to handle huge workloads efficiently,
utilize finite resources to deliver the best performance, and process multiple queries, users, and
processes in parallel, enhancing analytics and business decisions. An ideal solution lets you stream
data in real time while sustaining ACID properties for transactions. Workload separation is essential
for parallel processing; it refers to the proper balancing and prioritization of processes and users.
Increasing data load throughput enables faster ETL processing, while lower latency leads to faster querying.
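The concurrent execution described above can be sketched with a thread pool. The query names and timings are illustrative stand-ins, with sleeps simulating I/O-bound warehouse queries:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def run_query(name, seconds):
    """Stand-in for a warehouse query; sleeping simulates I/O-bound work."""
    time.sleep(seconds)
    return f"{name}: done"

queries = [("enrollment report", 0.2),
           ("budget rollup", 0.2),
           ("trend forecast", 0.2)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(lambda q: run_query(*q), queries))
elapsed = time.perf_counter() - start

# With three workers the queries overlap, so the total elapsed time is close
# to the longest single query (about 0.2 s) rather than the sum of all three.
print(results, round(elapsed, 2))
```

Workload separation would refine this sketch by giving high-priority users their own pool so a long-running report cannot starve interactive queries.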
Data Visualization
Once data is organized in a data warehouse, it is ready to be visualized. This involves the system
discovering trends and patterns in data sets and generating graphs, charts, scattergrams and other
visual depictions. Visualization makes complex statistical relations easy to interpret for users. Did you
know that when we sit down to read a website, we only read an average of 28 percent of the words
on the page? We skim, make assumptions and extrapolate based on the words we do read to glean
information. That’s one reason visual depictions are so much more effective at delivering information
to our brains. Data visualization helps bridge that gap and offer information that sticks.
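The step from a summary table to a visual depiction can be illustrated with a toy text-based chart; a real warehouse would feed a BI charting tool instead, and the enrollment figures below are invented:

```python
def bar_chart(data, width=20):
    """Render a label -> value mapping as a horizontal text bar chart,
    scaled so the largest value fills the full width."""
    peak = max(data.values())
    lines = []
    for label, value in data.items():
        bar = "#" * round(width * value / peak)
        lines.append(f"{label:<14}{bar} {value}")
    return "\n".join(lines)

# Invented summary data of the kind a warehouse query might return.
enrollment = {"Engineering": 9200, "Liberal Arts": 13400, "Medicine": 5100}
print(bar_chart(enrollment))
```

Even this crude rendering makes the relative sizes visible at a glance, which is the point of the visualization layer.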
Integrations
While some BI tools restrict their users to proprietary architecture, more and more are offering a
range of integrations with other kinds of software systems and data sources. For example, service-
centered organizations need to be able to draw data directly from their CRM to generate reports and
visualizations on that information. Extract, transform, load (ETL) is also a crucial integration. ETL
combines three database functions into a single tool in order to transfer data from one database to
another.
