Incorporating Data Warehouse Using SSIS
Abstract
Information is key to the success of many businesses today. The better a company manages
information about its customers, the better the business decisions it can make. As databases
continue to grow in size and complexity, there is a significant need to manage them effectively
and extract vital information from the data.
Many AI techniques and tools are being deployed to find hidden patterns and trends in data
and gain a competitive advantage. This project discusses some of these business intelligence
(BI) techniques and their implications for the business.
Introduction
This report justifies how a data warehouse implementation could enable business decision
making in the Quality Food retail store.
Task 1
As we can see, the store has four branches which store their data in a distributed environment,
using a Microsoft Access database as well as flat files.
This is the high-level architecture diagram of the current system:
The complete table schema of the existing OLTP system, with entities and attributes, is shown
below:
Sales (invoice): ProductID, Quantity, Total_Amount, DateTime
Customer: Customer_Name, Customer_Address, DOB, Gender
Product: ProductID, Product_Name, Product_Description, Product_category
Staff: StaffID, Staff_Name, Staff_Designation, Gender, Received_Training, Training_Date, Date_Join
Store: StoreID, StaffID, StoreName, StoreLocation, StoreSales
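For illustration, the existing OLTP tables could be declared in T-SQL roughly as follows. This is
a minimal sketch: the table and column names come from the ERD above, but the data types
are assumptions, since the original Access database does not specify them here.

    CREATE TABLE Product (
        ProductID            INT PRIMARY KEY,
        Product_Name         VARCHAR(100),
        Product_Description  VARCHAR(255),
        Product_category     VARCHAR(50)
    );

    -- Sales/invoice records reference products by ID.
    CREATE TABLE Sales (
        ProductID     INT REFERENCES Product(ProductID),
        Quantity      INT,
        Total_Amount  DECIMAL(10, 2),
        [DateTime]    DATETIME  -- bracketed because the column name clashes with the type name
    );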
As we can see from the ERD, the existing design of the OLTP database system has many
limitations which prevent it from offering valuable insights about sales. It does not show any
particular relationship between the customers, products, sales, staff or store demographics.
We will now improve on these aspects and design a data warehouse system that offers deep
insight into the relationships between customers, products, sales and staff.
Task 2.A
Approach
The proposed solution is the implementation of a data warehouse (DWH) system, which will
facilitate data analysis and data mining.
We will use the entity-relationship concept to design the database for the grocery store, and
then use that design to build the data warehouse and load the point-of-sale records into it.
By mining demographic data about customers, Quality Foods could develop products and
promotions that appeal to specific customer segments.
We will follow this flow to incorporate data warehouse methodologies into the Quality Food
grocery store. The first stage in designing a data warehouse is to design the logical model of
the warehouse schema based on the initial findings.
There are various database management systems from vendors such as Oracle, Microsoft
and Teradata. We will use Microsoft SQL Server: since the old data was stored in flat files and
an MS Access database, a Microsoft product will be the most suitable for a smooth ETL
process.
Task 2.B
We chose a snowflake schema for the design of the data warehouse system. In this schema
we store the sales data in a fact table that links the dimension tables, which in turn have
relationships among each other.
The data model illustrated in the following diagram shows the tables and relationships that
are to be used in the data warehouse:
[Data model diagram: FactSales at the centre, linked to DimProduct, DimCustomer,
DimProductSubcategory and DimGeography]
DWH Schema
The planned data warehouse logical schema consists of the following entities and attributes:
DimCustomer: CustomerKey, GeographyKey, CustomerLabel, Title, FirstName, MiddleName,
LastName, NameStyle, BirthDate, MaritalStatus, Suffix, Gender, EmailAddress, YearlyIncome,
TotalChildren, NumberChildrenAtHome, Education, Occupation, HouseOwnerFlag,
NumberCarsOwned, AddressLine1, AddressLine2, Phone, DateFirstPurchase, CustomerType,
CompanyName, ETLLoadID, LoadDate, UpdateDate

DimDate: DateKey, FullDateLabel, DateDescription, CalendarYear, CalendarYearLabel,
CalendarHalfYear, CalendarHalfYearLabel, CalendarQuarter, CalendarQuarterLabel,
CalendarMonth, CalendarMonthLabel, CalendarWeek, CalendarWeekLabel,
CalendarDayOfWeek, CalendarDayOfWeekLabel, FiscalYear, FiscalYearLabel, FiscalHalfYear,
FiscalHalfYearLabel, FiscalQuarter, FiscalQuarterLabel, FiscalMonth, FiscalMonthLabel,
IsWorkDay, IsHoliday, HolidayName, EuropeSeason, NorthAmericaSeason, AsiaSeason

DimGeography: GeographyKey, GeographyType, ContinentName, CityName,
StateProvinceName, RegionCountryName, Geometry, ETLLoadID, LoadDate, UpdateDate

DimProduct: ProductKey, ProductLabel, ProductName, ProductDescription,
ProductSubcategoryKey, Manufacturer, BrandName, ClassID, ClassName, StyleID, StyleName,
ColorID, ColorName, Size, SizeRange, UpdateDate

FactSales: SalesKey, CustomerKey, LoadDate, UnitsSold, Markup, Profit,
PurchaseCostPerUnit, PurchaseCost, UpdateDate
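As a hedged illustration of how this schema could be created in SQL Server, the following
T-SQL sketch defines the geography and customer dimensions and the fact table. The column
lists are abbreviated, and the data types and constraints are assumptions rather than part of
the original design:

    CREATE TABLE DimGeography (
        GeographyKey       INT PRIMARY KEY,
        GeographyType      NVARCHAR(50),
        ContinentName      NVARCHAR(50),
        CityName           NVARCHAR(100),
        StateProvinceName  NVARCHAR(100),
        RegionCountryName  NVARCHAR(100)
    );

    CREATE TABLE DimCustomer (
        CustomerKey   INT PRIMARY KEY,
        -- Snowflake: the customer's address is normalised out into DimGeography.
        GeographyKey  INT REFERENCES DimGeography(GeographyKey),
        FirstName     NVARCHAR(50),
        LastName      NVARCHAR(50),
        BirthDate     DATE,
        YearlyIncome  DECIMAL(12, 2)
    );

    CREATE TABLE FactSales (
        SalesKey             INT PRIMARY KEY,
        CustomerKey          INT REFERENCES DimCustomer(CustomerKey),
        UnitsSold            INT,
        PurchaseCostPerUnit  DECIMAL(10, 2),
        PurchaseCost         DECIMAL(10, 2),
        Markup               DECIMAL(10, 2),
        Profit               DECIMAL(10, 2)
    );

Because DimGeography hangs off DimCustomer rather than off the fact table directly, the
dimension tables form the branching structure that makes this a snowflake rather than a star
schema.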
There are various assumptions and reasons behind choosing the snowflake schema for the
grocery store data warehouse. Some of these are mentioned below:
1. Sales data is crucial for the data analysis. Since the sales data is directly correlated with
the products and customers, we decided to make the sales table the fact table, incorporating
all the measures important for data mining.
However, the transactional source system from which the data is extracted stores the
customer's address within the customer table. We decided to store that information separately
in a geography table, so that customers' geographical data can be analysed at a later stage.
This design is what makes our DWH schema a snowflake schema.
2. The transactional data stored the timestamp of each transaction in the invoice table. For
our data warehousing, the date and time of each transaction is a very important signal, so we
decided to create a separate dimension table for date and time: the DimDate table.
3. We decided to analyse customer transactions with respect to age and salary; for that we
needed the customer information alongside the invoice details, which contain information
about the purchased products.
Task 2.C
The Extraction, Transformation and Loading (ETL) process begins by extracting data from
the source database. The destination database is then populated on a database system such
as Oracle or SQL Server, which hosts the data warehouse.
Many vendors have produced their own ETL tools, such as Microsoft SQL Server Integration
Services, IBM Cognos, Informatica PowerCenter and SAS Data Integration Studio, to perform
the ETL tasks.
We will use an ETL tool that is well suited to communicating with different relational databases
and different file formats.
For this project we will use Microsoft SQL Server Integration Services (SSIS), which is part of
SQL Server 2012 and is developed in Visual Studio 2013.
6) At this stage, if you see an error on the OLE DB Destination component, it means there is
a data type mismatch between the source and destination tables.
To fix this, we use a Data Conversion component, which lets us convert the source columns
to the data types the destination expects.
7) Once we have completed these steps for all the tables that we need to bring into our
destination data source, we can start the transformation process.
8) We run the transformation process, and at this point we should see successful results;
the kind of source query involved is sketched below.
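Here the table and column names are assumed from the OLTP schema in Task 1 rather than
taken from the actual package. The OLE DB Source might extract the sales rows with a query
like this, with any remaining type mismatch handled by the Data Conversion component or an
explicit CAST:

    -- Extract sales rows from the OLTP source for loading into the warehouse.
    -- The CAST aligns the money column with the destination's DECIMAL type,
    -- avoiding the OLE DB Destination type-mismatch error from step 6.
    SELECT
        s.ProductID,
        s.Quantity                              AS UnitsSold,
        CAST(s.Total_Amount AS DECIMAL(10, 2))  AS PurchaseCost,
        s.[DateTime]                            AS TransactionDate
    FROM Sales AS s;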
Task 3
The entity-relationship (ER) model is a detailed, logical representation of the data. It eases
the development and implementation of relational databases.
Entity-relationship concepts are rules for interpreting, specifying and documenting logical data
requirements.
The initial model of the database was non-relational, so it was difficult to establish any
relationship between the customer and sales data.
That is why the ER concept was used in the design of this project. The database management
system used is SQL Server, a relational database management system (RDBMS) based on
the relational model introduced by E. F. Codd.
Task 4
A DWH is a methodology for efficiently storing large amounts of data, to which data mining
and business intelligence techniques can then be applied.
The legacy OLTP system was impractical for the new business requirements of the Quality
Food Store. They needed a system on which they could produce reports and analyse trends
and patterns to enhance their sales and store performance.
The legacy system could not store data for analytics: it was not capable of holding large
amounts of data, nor was it suited to querying large data sets and producing efficient search
results. The old system could not predict any short-term future trends. It was based on flat
files and an Access database, which are dated technologies whose limitations modern
database management software overcomes.
There was no historical data storage facility in the legacy system, meaning that data was
overwritten after a certain period of time. This was a huge loss of important data.
There was no way to capture user trends or buying and shopping preferences. The data was
highly normalised, which meant discarding the fine-grained detail that is critical in data
analysis.
A data warehouse implementation facilitates all of this. OLAP tools tend to work over historical
data that has accumulated over a long period of time. For the new data warehouse system,
redundant or "de-normalised" data would likely facilitate business intelligence applications, as
the example below illustrates.
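For instance, here is a hedged sketch of the kind of historical trend query the warehouse
enables. The names come from the schema proposed in Task 2.B; a DateKey column on
FactSales is assumed, since the DimDate rationale implies the fact table references it:

    -- Total units sold per continent and calendar year: a query the
    -- legacy OLTP system could not answer from its overwritten data.
    SELECT
        g.ContinentName,
        d.CalendarYear,
        SUM(f.UnitsSold) AS TotalUnitsSold
    FROM FactSales AS f
    JOIN DimCustomer  AS c ON c.CustomerKey  = f.CustomerKey
    JOIN DimGeography AS g ON g.GeographyKey = c.GeographyKey
    JOIN DimDate      AS d ON d.DateKey      = f.DateKey  -- assumed key
    GROUP BY g.ContinentName, d.CalendarYear
    ORDER BY g.ContinentName, d.CalendarYear;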
The schema recommended for the data warehouse takes these details of the new business
requirements of the Quality Food Store into account.
Task 5
For this task we extract information from the data using data mining tools. To do that, we first
have to analyse which signals of information we need to extract from the data.
We will analyse the data from various angles to find the problems specified in the case
study. Some signals of information could be:
For this assignment we will be making use of the Weka machine learning tool, which is a
collection of machine learning algorithms for data mining.
Note that it is the data to which we apply machine learning algorithms (using Weka in this
case) to discover valuable knowledge.
Now that we have the data from our data warehouse, we can start analysing it to identify
trends and patterns that will aid management in making promotion plans.
We will use the linear regression model in Weka to forecast the markup of a product based
on its units sold and purchase cost, which in turn gives us the estimated profit for that product.
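As a sketch of the general form Weka fits here (the coefficients w0 to w3 are learned from the
training data; the choice of predictors follows the description above):

    markUp = w0 + w1 * unitsSold + w2 * purchaseCost + w3 * purchaseCostPerUnit

A coefficient close to zero indicates that the corresponding attribute contributes little to the
markup; as the regression output below shows, that is what happens here for every predictor
except purchaseCostPerUnit.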
This model can then be used to calculate the markup for other products as well. This will give
us some insight into whether we should put a product on promotion or not.
We will pull the required data from the data warehouse and use it for analysis in WEKA.
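A hedged example of such an extract (the column names come from the FactSales table
defined earlier; whether to restrict the rows to a product sample is a choice, so no filter is
shown):

    -- Pull the training attributes for the regression from the warehouse.
    SELECT
        UnitsSold,
        PurchaseCost,
        PurchaseCostPerUnit,
        Markup
    FROM FactSales;

The result set can then be exported (for example as CSV) and converted to the ARFF format
that Weka reads.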
STEP 1 (Training)
Weka works by first training the algorithm on training data. Once the system is trained on the
initial training data, we can use test data to predict results. We have the initial training data in
an .arff file, as sketched below.
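Such a training file could look like the following; the relation and attribute names mirror the
warehouse columns, and the data rows are hypothetical illustrations, not actual figures from
the case study:

    @relation productSales
    @attribute unitsSold            numeric
    @attribute purchaseCost         numeric
    @attribute purchaseCostPerUnit  numeric
    @attribute markUp               numeric
    @data
    120, 480.0,  4.0, 20.0
    75,  300.0,  4.0, 20.0
    200, 1600.0, 8.0, 40.0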
We take a sample of 7 different products and use the units sold, purchase cost and purchase
cost per unit to find the markup for each product, so that we can then predict its profit.
We will use the product sales figures as a sample file to train the system, using the linear
regression model in Weka.
Now that the data has been chosen, we have to train the algorithm to build the model with
this data. Select the 'Use training set' option; this tells Weka to build our desired model using
this data as the training set.
Next, in the Classify tab, choose LinearRegression under the classifier functions.
Choose the dependent variable (the column we are looking to predict). We know this should
be the Markup, since that's what we're trying to determine for the product.
Right below the test options, there is a combo box which lets you choose the dependent
variable. The Markup column should be selected by default; if it is not, select it.
Now we are ready to create our model.
Click Start to begin the machine learning process and train the system with the data provided.
The output should look like the following.
Regression output:
markUp = 5 * purchaseCostPerUnit + 0
As we can see, Weka has calculated the formula for the markup for us. We can use this
information to predict the markup for any other product.
STEP 3 (Prediction)
Now we can use that model to predict the profit of other products. For a single product we
can simple use the above formula and calculate it's markup.
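For instance, for a hypothetical product with a purchase cost per unit of 4.0, the fitted model
gives:

    markUp = 5 * purchaseCostPerUnit + 0
           = 5 * 4.0
           = 20.0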
Results
Correlation coefficient 1
Mean absolute error 0
Root mean squared error 0
Total Number of Instances 7
Ignored Class Unknown Instances 1
Conclusion
Based on this estimated value, the sales price will be 80000 (1 x 80000) and therefore the
foreseeable net profit will be 64000 (80000 - 16000).
We also concluded that data mining strives to turn simple data into useful information by
creating models and rules. Our goal was to use these models and rules to predict future
behaviour, to improve the business, and to explain things which we might not otherwise be
able to explain with conventional analysis.