Project Two: Gap Analysis & Proposal: Presented by John Miere
◦ Format the Kansas City data to match the format used in the MSQL warehouse, such as the numbering of the months. Kansas City months start at “2”, and this needs to be corrected to be consistent with the other facilities' data.
◦ Load the formatted and corrected data into the MSQL warehouse alongside the other facilities' data.
◦ Ensure all future data from the Kansas City store flows into the MSQL warehouse.
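Before any renumbering is applied, a quick check can confirm where a facility's month numbering starts and which months are absent. This is a minimal sketch that assumes each record carries an integer month number from 1 to 12; the function name `month_coverage` and the sample values are hypothetical, not taken from the actual KC data.

```python
def month_coverage(months):
    """Return the first month number seen and any months 1-12 that are absent.

    `months` is an iterable of integer month numbers (hypothetical input,
    e.g. the month column of the Kansas City extract).
    """
    present = set(months)
    first = min(present) if present else None
    missing = [m for m in range(1, 13) if m not in present]
    return first, missing


# Hypothetical sample: numbering starts at 2 and month 1 never appears.
kc_months = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
print(month_coverage(kc_months))  # (2, [1])
```

Running the same check on another facility's data shows whether the KC numbering is simply offset or whether a month is genuinely missing, which matters before deciding how to renumber.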
Visualize the Gap Analysis
Current State | Desired State
Database is incompatible with the MSQL warehouse | Extract to an Excel spreadsheet
KC data is being loaded into the defunct AS400 database | Future data will be loaded into the MSQL warehouse
Data stored in an Excel spreadsheet, separate from other facilities | Data will be moved into the MSQL warehouse with the other facilities' data
Data stored in Excel with null and error values | Data corrected, free of null and error values, and formatted to match the other facilities' data
Summarize the Gap Analysis
◦ The analysis shows that, to operate efficiently as a business, a warehouse storing the data of ALL facilities is crucial for accurate and timely reporting and data analysis.
◦ Creating an ETL process would be beneficial because it makes the data easier to access and analyze, so that informed decisions can be made more quickly and efficiently.
ETL (Extract, Transform, and Load) Process
◦ Extract
◦ Extracting the KC data, in its current form, from the AS400 database where it currently resides into an Excel spreadsheet, where it can be reviewed and corrected as necessary.
◦ Transform
◦ Correcting any null and error values with the aid of the supplemental Word document, and verifying the accuracy of the results.
◦ Load
◦ All KC data will be loaded accurately and consistently into the MSQL warehouse that houses
the data of the other facilities.
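The three steps above can be sketched end to end as follows. This is an illustration under stated assumptions, not the project's actual pipeline: the KC export is assumed to be CSV text with hypothetical `facility`, `month`, and `sales` columns, the set of error markers is assumed, and an in-memory SQLite database from Python's standard library stands in for the MSQL warehouse. Rows with null or error values are set aside for manual correction (in the project, against the supplemental Word document) rather than guessed at.

```python
import csv
import io
import sqlite3

# Hypothetical export of the KC data; column names and values are assumptions.
KC_EXPORT = """facility,month,sales
KC,2,1500
KC,3,
KC,4,#VALUE!
KC,5,1725
"""

ERROR_TOKENS = {"", "NULL", "#VALUE!", "#N/A", "#REF!"}  # assumed markers of bad values


def extract(text):
    """Extract: read the exported rows into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows, month_offset=0):
    """Transform: split clean rows from rows needing manual correction.

    month_offset is a hypothetical parameter for renumbering months once the
    correct mapping has been confirmed; 0 leaves the numbering unchanged.
    """
    clean, needs_review = [], []
    for row in rows:
        if row["sales"].strip().upper() in ERROR_TOKENS:
            needs_review.append(row)  # correct by hand before loading
            continue
        clean.append((row["facility"], int(row["month"]) + month_offset, float(row["sales"])))
    return clean, needs_review


def load(conn, clean_rows):
    """Load: insert clean rows into the shared warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS facility_sales (facility TEXT, month INTEGER, sales REAL)"
    )
    conn.executemany("INSERT INTO facility_sales VALUES (?, ?, ?)", clean_rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the MSQL warehouse
clean, needs_review = transform(extract(KC_EXPORT))
load(conn, clean)
print(conn.execute("SELECT COUNT(*) FROM facility_sales").fetchone()[0])  # 2
print(len(needs_review))  # 2
```

Keeping each step in its own function makes it possible to test the transform on its own, and to rerun the load once the flagged rows have been corrected.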
Sensitive or Confidential Information Handling
◦ The data involved is not sensitive, but access should be limited to the internal processes and employees who need to use it, such as those who currently have access to the corresponding data at the other facilities.
PROPOSAL TO DATA STEWARD
Issues Identified in Project One
◦ Kansas City data is stored in a defunct AS400 database that is incompatible with the MSQL warehouse.
◦ KC data initially shows null and error values.
◦ KC data is missing one month.
◦ Months may need to be renumbered, since KC month numbering starts at “2”.
◦ Values loaded into Excel appear in scientific notation.
◦ The data needs to be reformatted to match the other facilities.
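For the scientific-notation issue listed above, values exported as text such as `1.5E+03` can be rewritten in plain decimal form before loading. This minimal sketch uses Python's `decimal` module so the conversion does not introduce floating-point rounding; the function name `from_scientific` is hypothetical. It restores the plain form of the value as exported, but it cannot recover digits that were already lost before export (Excel keeps at most 15 significant digits).

```python
from decimal import Decimal


def from_scientific(text):
    """Convert a value such as '1.5E+03' into plain decimal text ('1500')."""
    return format(Decimal(text.strip()), "f")


print(from_scientific("1.5E+03"))   # 1500
print(from_scientific("2.75E-02"))  # 0.0275
print(from_scientific("42"))        # 42
```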
Production and Testing Environments
◦ Sandbox environments help preserve data integrity and should be used for formatting and loading new data.
◦ A sandbox allows data to be manipulated without the risk of introducing errors into the production data.
◦ A master copy of the data should be secured and kept separate as a backup to prevent data loss during the ETL process.
◦ The ETL process should be tested in the sandbox, then moved to the production environment once it has proven successful.
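The backup, sandbox, and promotion steps above can be illustrated with a small sketch. It assumes in-memory SQLite databases as stand-ins for both the production warehouse and the sandbox, uses the standard library's `sqlite3.Connection.backup` to make copies, and applies a hypothetical `validate` check (no null sales, months within 1 to 12). A batch reaches production only if it passes validation in the sandbox first.

```python
import sqlite3


def copy_database(source):
    """Return an in-memory copy of `source` (used for the backup and the sandbox)."""
    target = sqlite3.connect(":memory:")
    source.backup(target)
    return target


def load_kc_rows(conn, rows):
    """Hypothetical ETL load step: insert KC rows into the shared table."""
    conn.executemany("INSERT INTO facility_sales VALUES (?, ?, ?)", rows)
    conn.commit()


def validate(conn):
    """Example checks: no NULL sales values and all months within 1-12."""
    bad = conn.execute(
        "SELECT COUNT(*) FROM facility_sales "
        "WHERE sales IS NULL OR month NOT BETWEEN 1 AND 12"
    ).fetchone()[0]
    return bad == 0


def run_in_sandbox_then_promote(production, rows):
    """Try the load on a sandbox copy; touch production only if it validates."""
    sandbox = copy_database(production)
    load_kc_rows(sandbox, rows)
    if not validate(sandbox):
        return False  # production is left unchanged
    load_kc_rows(production, rows)
    return True


# Stand-in for the production warehouse holding another facility's data.
production = sqlite3.connect(":memory:")
production.execute("CREATE TABLE facility_sales (facility TEXT, month INTEGER, sales REAL)")
production.execute("INSERT INTO facility_sales VALUES ('Other', 1, 900.0)")
production.commit()

master_backup = copy_database(production)  # secured master copy kept aside

bad_batch = [("KC", 1, 1500.0), ("KC", 2, None)]     # hypothetical batch with a null
good_batch = [("KC", 1, 1500.0), ("KC", 2, 1620.0)]  # hypothetical clean batch
print(run_in_sandbox_then_promote(production, bad_batch))   # False
print(run_in_sandbox_then_promote(production, good_batch))  # True
print(production.execute("SELECT COUNT(*) FROM facility_sales").fetchone()[0])  # 3
```

The master backup is taken once before any ETL runs, so it still holds the original data even if a later promotion turns out to be wrong.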
Additional Data Resources
◦ Additional resources that may be needed for this project include ETL tools such as Talend or Stitch.