Homework

The document describes setting up a data warehouse database with dimension and fact tables linked by foreign keys and partitioned by country. It also covers loading the data from source databases into the data warehouse using SQL Server Integration Services (SSIS) data flows and creating views for analysis. Machine learning models are built in Weka on a predictive view to compare the performance of decision tree and Bayesian classifiers.

Uploaded by

fatima alhaji

- database:

///

CREATE DATABASE [2S23SalesDWH];  -- brackets are required: an identifier cannot start with a digit
USE [2S23SalesDWH];

///

- DimProd table:

///

CREATE TABLE DimProd (
    ProdID INT IDENTITY(1,1) PRIMARY KEY,
    Size VARCHAR(50),
    Category VARCHAR(50)
);

///

- DimCust table:

///

CREATE TABLE DimCust (
    CustID INT IDENTITY(1,1) PRIMARY KEY,
    Age INT,
    Gender VARCHAR(10),
    AnnualIncome DECIMAL(15,2),
    NumChildren INT
);

///
- DimAdrs table:

///

CREATE TABLE DimAdrs (
    AdrsID INT IDENTITY(1,1) PRIMARY KEY,
    CountryRegion VARCHAR(50)
);

///

- basic fact table linked with the dimension tables, with sales amount as the measure:

///

CREATE TABLE FactSale (
    SaleID INT IDENTITY(1,1) PRIMARY KEY,
    ProdID INT,
    CustID INT,
    AdrsID INT,
    SaleAmount DECIMAL(15,2),
    FOREIGN KEY (ProdID) REFERENCES DimProd (ProdID),
    FOREIGN KEY (CustID) REFERENCES DimCust (CustID),
    FOREIGN KEY (AdrsID) REFERENCES DimAdrs (AdrsID)
);

///

You might need to alter the `FactSale` table later according to part 2 instructions. Also, adjust the field
types and sizes according to your actual data.
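To illustrate what this star schema is for, here is a small Python sketch (illustrative only, with made-up rows, not part of the assignment) of the kind of query the design supports: resolving a fact row's foreign key against a dimension and aggregating the SaleAmount measure by CountryRegion:

```python
# Tiny in-memory model of the star schema: a dimension keyed by surrogate ID,
# fact rows carrying foreign keys plus the SaleAmount measure. All data is made up.
dim_adrs = {1: 'USA', 2: 'Canada', 3: 'Mexico'}  # AdrsID -> CountryRegion

fact_sale = [  # (ProdID, CustID, AdrsID, SaleAmount)
    (10, 100, 1, 250.00),
    (11, 101, 1, 120.50),
    (10, 102, 2, 300.00),
    (12, 103, 3, 80.25),
]

# Star join: look up each fact row's AdrsID, then aggregate the measure.
totals = {}
for prod_id, cust_id, adrs_id, amount in fact_sale:
    country = dim_adrs[adrs_id]
    totals[country] = totals.get(country, 0.0) + amount

print(totals)  # {'USA': 370.5, 'Canada': 300.0, 'Mexico': 80.25}
```

In SQL this is simply `SELECT ... SUM(SaleAmount) ... GROUP BY` over a join of FactSale and DimAdrs; the sketch only makes the key-lookup mechanics explicit.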
2.

- filegroups (one ADD FILEGROUP clause is allowed per ALTER DATABASE statement):

///

ALTER DATABASE [2S23SalesDWH] ADD FILEGROUP USA;
ALTER DATABASE [2S23SalesDWH] ADD FILEGROUP Canada;
ALTER DATABASE [2S23SalesDWH] ADD FILEGROUP Mexico;

///

- add files to the filegroups. Each file specification must be parenthesized, and each statement can target only one filegroup:

///

ALTER DATABASE [2S23SalesDWH]
ADD FILE (
    NAME = 'USAData',
    FILENAME = 'C:\path\USAData.ndf',
    SIZE = 5MB,
    MAXSIZE = 100MB,
    FILEGROWTH = 5MB
) TO FILEGROUP USA;

ALTER DATABASE [2S23SalesDWH]
ADD FILE (
    NAME = 'CanadaData',
    FILENAME = 'C:\path\CanadaData.ndf',
    SIZE = 5MB,
    MAXSIZE = 100MB,
    FILEGROWTH = 5MB
) TO FILEGROUP Canada;

ALTER DATABASE [2S23SalesDWH]
ADD FILE (
    NAME = 'MexicoData',
    FILENAME = 'C:\path\MexicoData.ndf',
    SIZE = 5MB,
    MAXSIZE = 100MB,
    FILEGROWTH = 5MB
) TO FILEGROUP Mexico;

///

- the partition function. Note that you need to decide on the boundaries for partitioning, and that boundary values must be listed in ascending order:

///

CREATE PARTITION FUNCTION CountryRegionPF (VARCHAR(50))
AS RANGE LEFT FOR VALUES ('Canada', 'Mexico', 'USA');

///

- the partition scheme. The filegroup order must match the boundary order, with one extra filegroup for values above the last boundary:

///

CREATE PARTITION SCHEME CountryRegionPS
AS PARTITION CountryRegionPF
TO (Canada, Mexico, USA, [PRIMARY]);

///
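As a sanity check (not part of the assignment itself), the `RANGE LEFT` mapping can be modelled in Python with `bisect`: under `RANGE LEFT`, each boundary value belongs to the partition on its left, so a value maps to the first boundary that is greater than or equal to it:

```python
from bisect import bisect_left

# Model of the CountryRegionPF / CountryRegionPS mapping above.
boundaries = ['Canada', 'Mexico', 'USA']              # ascending, as required
filegroups = ['Canada', 'Mexico', 'USA', 'PRIMARY']   # one extra for values above 'USA'

def target_filegroup(country_region: str) -> str:
    """Return the filegroup a CountryRegion value would land in."""
    return filegroups[bisect_left(boundaries, country_region)]

print(target_filegroup('Canada'))     # Canada
print(target_filegroup('USA'))        # USA
print(target_filegroup('Venezuela'))  # PRIMARY  ('Venezuela' sorts after 'USA')
```

One caveat this makes visible: any CountryRegion value that is not one of the three expected strings still lands somewhere (for example, a value sorting before 'Canada' falls into the Canada partition), so the load should validate the column.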
- recreate the `FactSale` table on the partition scheme (drop it first if it already exists). On a partitioned table the primary key must include the partitioning column, so the key here is (SaleID, CountryRegion):

///

DROP TABLE IF EXISTS FactSale;

CREATE TABLE FactSale (
    SaleID INT IDENTITY(1,1),
    ProdID INT,
    CustID INT,
    AdrsID INT,
    SaleAmount DECIMAL(15,2),
    CountryRegion VARCHAR(50) NOT NULL,
    PRIMARY KEY (SaleID, CountryRegion),
    FOREIGN KEY (ProdID) REFERENCES DimProd (ProdID),
    FOREIGN KEY (CustID) REFERENCES DimCust (CustID),
    FOREIGN KEY (AdrsID) REFERENCES DimAdrs (AdrsID)
) ON CountryRegionPS (CountryRegion);

///
3.

The first part of the task, creating the views, is done in SQL. DataFlow tasks are typically created in SQL Server Integration Services (SSIS), which is a graphical tool, so they can't be represented as code.

- the views:

///

-- Create view for product information
CREATE VIEW vw_Products AS
SELECT ProdID, Size, Category
FROM AdventureWork.Product;

-- Create view for place of sale information
CREATE VIEW vw_Address AS
SELECT AdrsID, CountryRegion
FROM AdventureWork.Address;

-- Create view for sales transaction information
CREATE VIEW vw_Sales AS
SELECT SaleID, ProdID, CustID, AdrsID, SaleAmount
FROM AdventureWork.Sales;

///
You would create the DataFlow tasks in SSIS as follows:

1. Open SQL Server Data Tools (SSDT) and create a new Integration Services Project.

2. In the Control Flow tab, drag a Data Flow Task from the Toolbox onto the design surface.

3. Double-click the Data Flow Task to go to the Data Flow tab.

4. Drag a Source Assistant from the Toolbox onto the design surface.

5. Double-click the Source Assistant and configure it to use the AdventureWork connection and select
the appropriate view.

6. Drag a Destination Assistant from the Toolbox onto the design surface.

7. Connect the Source Assistant to the Destination Assistant.

8. Double-click the Destination Assistant and configure it to use the 2S23SalesDWH connection and
select the appropriate table.

9. Repeat steps 2 to 8 for each view.

To extract the customer data from an Excel file, you would use the Excel Source in SSIS and configure it to
use the Excel connection manager and select the appropriate sheet.

If you need to do transformations on the data, you would add Transformations between the Source and
Destination. For example, you might add a Lookup Transformation to match up identifiers between the
AdventureWork database and the 2S23SalesDWH database.
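The Lookup Transformation described above can be sketched in plain Python (SSIS does this graphically; the product codes and lookup table here are hypothetical): each incoming source row is matched to a warehouse surrogate key, and unmatched rows are diverted the way the SSIS no-match output would divert them:

```python
# Hypothetical lookup table: source-system product codes -> warehouse ProdID.
lookup = {'BK-M68B': 1, 'BK-R50R': 2}

source_rows = [
    {'ProductCode': 'BK-M68B', 'SaleAmount': 250.00},
    {'ProductCode': 'BK-R50R', 'SaleAmount': 120.50},
    {'ProductCode': 'BK-XXXX', 'SaleAmount': 99.99},  # no match in the warehouse
]

matched, no_match = [], []
for row in source_rows:
    prod_id = lookup.get(row['ProductCode'])
    if prod_id is None:
        no_match.append(row)  # SSIS would route these to the no-match output
    else:
        matched.append({'ProdID': prod_id, 'SaleAmount': row['SaleAmount']})

print(len(matched), len(no_match))  # 2 1
```

In SSIS the no-match rows would typically be redirected to an error destination or trigger an insert of the missing dimension member, depending on how the Lookup is configured.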

Once your DataFlow tasks are set up, you can run the package to load the data into the 2S23SalesDWH
database.
4.

- create the view in SQL Server. Note that `BuyBike` is the class label for the model; add it to `FactSale` with an `ALTER TABLE` if it is not there yet, as anticipated in part 1:

///

-- Create view for the predictive model
CREATE VIEW vw_PredictiveModel AS
SELECT dp.Size AS ProductSize,
       da.CountryRegion AS SalesCountry,
       dc.Age AS CustomerAge,
       dc.Gender, dc.AnnualIncome, dc.NumChildren,
       fs.BuyBike
FROM FactSale fs
JOIN DimProd dp ON fs.ProdID = dp.ProdID
JOIN DimCust dc ON fs.CustID = dc.CustID
JOIN DimAdrs da ON fs.AdrsID = da.AdrsID;

///

Then create a DataFlow task in SSIS to export the data from this view to an Excel file. As previously stated, this is a graphical process and can't be represented as code. Follow the same steps as for the other DataFlow tasks, but the Source would be the vw_PredictiveModel view and the Destination would be an Excel connection manager.

The rest of the steps involve using Weka, which is a graphical tool for machine learning and data mining.
Here are the steps you would follow, although they can't be represented as code:

1. Open Weka and choose the Explorer.

2. Click Open file and select the Excel file you created (you may need to convert it to CSV first).

3. Choose the `Classify` tab.

4. Click on `Choose` and select `trees.J48` for the DecisionTree model.

5. In the `Test options` section, choose `Cross-validation` and enter `10` for the number of folds.

6. Click `Start`.
After the DecisionTree model has been built, you can view the tree by clicking on the `Visualize tree`
button.
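Step 2 above notes that the exported file may need converting before Weka can open it. Besides CSV, Weka's native format is ARFF; a minimal sketch of writing rows out as ARFF using only the Python standard library (the attribute names follow the vw_PredictiveModel view, and the data rows are made up):

```python
# Write made-up vw_PredictiveModel rows to a Weka ARFF file.
rows = [
    ('M', 'USA', 34, 'Male', 60000.0, 2, 'Yes'),
    ('L', 'Canada', 45, 'Female', 85000.0, 0, 'No'),
]

header = """@relation PredictiveModel
@attribute ProductSize {S,M,L,XL}
@attribute SalesCountry {USA,Canada,Mexico}
@attribute CustomerAge numeric
@attribute Gender {Male,Female}
@attribute AnnualIncome numeric
@attribute NumChildren numeric
@attribute BuyBike {Yes,No}
@data
"""

with open('predictive_model.arff', 'w') as f:
    f.write(header)
    for r in rows:
        f.write(','.join(str(v) for v in r) + '\n')
```

The nominal value sets in braces are assumptions about the data; adjust them to the distinct values actually present in your view before loading the file into Weka.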

To compare this with a Bayesian model:

1. Click on `Choose` again and select `bayes.NaiveBayes`.

2. Click `Start` again.

Now you can compare the results of the DecisionTree and Bayesian models by comparing the output in
the classifier output area. You would typically look at measures like accuracy, precision, recall, and the F-
measure to compare the performance of the two models.
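Weka reports these measures directly, but they can also be computed from each classifier's confusion matrix. A small sketch, using made-up counts rather than actual Weka output:

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F-measure from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure

# Hypothetical counts for the two classifiers on the BuyBike = Yes class:
j48 = metrics(tp=80, fp=10, fn=20, tn=90)
nb = metrics(tp=70, fp=5, fn=30, tn=95)
print(j48)  # accuracy 0.85, precision ~0.889, recall 0.8, F ~0.842
```

With numbers like these, the comparison is a trade-off: the Bayesian model would have higher precision but lower recall, and the F-measure summarizes the balance between the two.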
