Lab Terminal Data Warehousing and Data Mining: Part-I (CLO-C1, C2, C3)

The document describes setting up a star schema in SQL Server for a data warehouse containing sales data. It includes instructions to: 1. Create dimension tables for products, customers, and employees and a fact table for sales. 2. Insert sample data into the tables. 3. Write SQL queries to analyze the data, such as finding the best selling product and sales by a specific customer. 4. Create a view for a frequent query and add a clustered index to improve performance. It also describes using linear regression to predict housing prices based on house characteristics.

Uploaded by

Ar. Raja

Lab Terminal

Data Warehousing and Data Mining

Part-I [CLO-C1, C2, C3]

Suppose we have built a star schema for sales-related data to be loaded into the data warehouse. The
schema contains the following three dimension tables (Products, Customers, Employees) and a fact table
(Sales):

• Products (ProductID, Name, Price)
• Customers (CustomerID, FirstName, MiddleInitial, LastName)
• Employees (EmployeeID, FirstName, MiddleInitial, LastName)
• Sales (SalesID, EmployeeID, CustomerID, ProductID, Quantity)

You have also been provided with the .sql files containing data to be inserted in each of the above tables.

Now follow the steps below and paste the screenshots as instructed:

Step-1: Using SQL Server Management Studio, create a database [1]. The database name must be your
registration number without dashes.

[1] https://datatofish.com/database-sql-server/
<Paste here the screenshot showing that the database has been created>

Step-2: Using SQL Server Management Studio, create the three dimension tables and the fact table
within the database created in Step-1 [2].

<Paste here the screenshot showing that the four tables have been created>

Employees

Customers

Products

[2] https://datatofish.com/table-sql-server/
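
The table definitions for Step-2 can be sketched as below. This is only an illustration, assuming Python's built-in sqlite3 module as a stand-in for SQL Server; the column types and constraints are assumptions, since the assignment lists only the column names.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the SQL Server database.
conn = sqlite3.connect(":memory:")

# Column types and keys are assumed; the assignment gives only column names.
conn.executescript("""
CREATE TABLE Products  (ProductID INTEGER PRIMARY KEY, Name TEXT NOT NULL, Price REAL);
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, FirstName TEXT,
                        MiddleInitial TEXT, LastName TEXT);
CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, FirstName TEXT,
                        MiddleInitial TEXT, LastName TEXT);
-- Fact table: one row per sale, pointing at the three dimension tables.
CREATE TABLE Sales (
    SalesID    INTEGER PRIMARY KEY,
    EmployeeID INTEGER NOT NULL REFERENCES Employees(EmployeeID),
    CustomerID INTEGER NOT NULL REFERENCES Customers(CustomerID),
    ProductID  INTEGER NOT NULL REFERENCES Products(ProductID),
    Quantity   INTEGER NOT NULL
);
""")

# List the tables that now exist, in alphabetical order.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

The fact table carries one foreign key per dimension, which is what gives the schema its star shape.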

Step-3: Insert the provided data to the corresponding tables of your database.

<Paste here the four screenshots, each showing the number of rows in one table>
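
The row counts asked for in Step-3 come from one `SELECT COUNT(*)` per table. A minimal sketch, again assuming sqlite3 as a stand-in for SQL Server; the inserted rows are hypothetical and merely take the place of the provided .sql files.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products  (ProductID INTEGER PRIMARY KEY, Name TEXT, Price REAL);
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, FirstName TEXT,
                        MiddleInitial TEXT, LastName TEXT);
CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, FirstName TEXT,
                        MiddleInitial TEXT, LastName TEXT);
CREATE TABLE Sales (SalesID INTEGER PRIMARY KEY, EmployeeID INTEGER,
                    CustomerID INTEGER, ProductID INTEGER, Quantity INTEGER);
""")

# Hypothetical rows standing in for the data in the provided .sql files.
conn.executemany("INSERT INTO Products VALUES (?, ?, ?)",
                 [(1, "Widget", 2.5), (2, "Gadget", 4.0)])
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?, ?)",
                 [(1, "Trevor", "C", "Coleman")])
conn.executemany("INSERT INTO Employees VALUES (?, ?, ?, ?)",
                 [(1, "Ann", "B", "Lee"), (2, "Raj", "K", "Khan")])
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 1, 1, 3), (2, 2, 1, 2, 5), (3, 1, 1, 1, 4)])

# One COUNT(*) per table; the table names are fixed, not user input.
counts = {}
for table in ("Products", "Customers", "Employees", "Sales"):
    counts[table] = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
print(counts)
```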
Step-4: Answer the following queries over the star schema:

• Q1: Which product has been sold the most?


<Write your sql query here>

-- "Sold the most" is read here as the largest total quantity over all sales.
SELECT TOP (1) P.Name, SUM(S.Quantity) AS TotalQuantity
FROM Sales S
JOIN Products P ON P.ProductID = S.ProductID
GROUP BY P.ProductID, P.Name
ORDER BY TotalQuantity DESC;

<Paste here the screenshot showing that the result of the query>

Sorry sir, no light (laptop battery dead)
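
Since no result screenshot is available, the logic of Q1 can be checked on a tiny data set. The sketch below assumes sqlite3 as a stand-in for SQL Server (so `LIMIT 1` replaces `TOP (1)`) and uses hypothetical rows. It reads "sold the most" as the largest total quantity, which differs from the largest single sale: here Gadget has the biggest single sale, but Widget has the biggest total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, Name TEXT, Price REAL);
CREATE TABLE Sales (SalesID INTEGER PRIMARY KEY, EmployeeID INTEGER,
                    CustomerID INTEGER, ProductID INTEGER, Quantity INTEGER);
""")

# Hypothetical data: Widget totals 3 + 4 = 7, Gadget 5, Gizmo 1.
conn.executemany("INSERT INTO Products VALUES (?, ?, ?)",
                 [(1, "Widget", 2.5), (2, "Gadget", 4.0), (3, "Gizmo", 9.9)])
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 1, 1, 3), (2, 1, 1, 2, 5), (3, 2, 1, 1, 4), (4, 2, 2, 3, 1)])

# Sum the quantity per product, sort descending, keep the first row.
best_seller = conn.execute("""
    SELECT p.Name, SUM(s.Quantity) AS TotalQuantity
    FROM Sales s
    JOIN Products p ON p.ProductID = s.ProductID
    GROUP BY p.ProductID, p.Name
    ORDER BY TotalQuantity DESC
    LIMIT 1
""").fetchone()
print(best_seller)
```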

Q2: Product-wise count of all the products that the customer Trevor C Coleman bought.

<Write your sql query here>

-- Number of sales rows per product for the customer Trevor C Coleman.
SELECT COUNT(Prod.Name) AS ProductCount, Prod.Name
FROM Customers Cs, Products Prod, Sales Sel
WHERE Cs.FirstName = 'Trevor' AND Cs.MiddleInitial = 'C'
  AND Cs.LastName = 'Coleman'
  AND Cs.CustomerID = Sel.CustomerID
  AND Prod.ProductID = Sel.ProductID
GROUP BY Prod.Name;

<Paste here the screenshot showing that the result of the query>
Sorry sir, no light (laptop battery dead)
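
The Q2 join can be exercised the same way. This is a sketch under the same assumptions (sqlite3 instead of SQL Server, hypothetical rows); the second customer is there only to show that the filter on the name excludes other buyers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products  (ProductID INTEGER PRIMARY KEY, Name TEXT, Price REAL);
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, FirstName TEXT,
                        MiddleInitial TEXT, LastName TEXT);
CREATE TABLE Sales (SalesID INTEGER PRIMARY KEY, EmployeeID INTEGER,
                    CustomerID INTEGER, ProductID INTEGER, Quantity INTEGER);
""")

# Hypothetical data: Trevor buys Widget twice and Gadget once.
conn.executemany("INSERT INTO Products VALUES (?, ?, ?)",
                 [(1, "Widget", 2.5), (2, "Gadget", 4.0)])
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?, ?)",
                 [(1, "Trevor", "C", "Coleman"), (2, "Maria", "L", "Santos")])
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 1, 1, 3), (2, 1, 1, 1, 2), (3, 1, 1, 2, 1), (4, 1, 2, 2, 9)])

# Count sales rows per product for one customer, identified by full name.
per_product = conn.execute("""
    SELECT p.Name, COUNT(*) AS ProductCount
    FROM Sales s
    JOIN Customers c ON c.CustomerID = s.CustomerID
    JOIN Products  p ON p.ProductID  = s.ProductID
    WHERE c.FirstName = ? AND c.MiddleInitial = ? AND c.LastName = ?
    GROUP BY p.Name
    ORDER BY p.Name
""", ("Trevor", "C", "Coleman")).fetchall()
print(per_product)
```

The query counts sales rows, not units; summing `Quantity` instead would answer a different reading of "count".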

• Q3: Which employee has made the least number of sales?


<Write your sql query here>
<Paste here the screenshot showing that the result of the query>
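
No answer was written for Q3. One possible approach, shown here only as a sketch with sqlite3 standing in for SQL Server and with hypothetical rows, is to count sales rows per employee and keep the smallest count. It assumes "number of sales" means the number of rows in Sales rather than the quantity sold, and it uses a LEFT JOIN so that an employee with no sales would be counted as zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, FirstName TEXT,
                        MiddleInitial TEXT, LastName TEXT);
CREATE TABLE Sales (SalesID INTEGER PRIMARY KEY, EmployeeID INTEGER,
                    CustomerID INTEGER, ProductID INTEGER, Quantity INTEGER);
""")

# Hypothetical data: employee 2 has a single sale, although a large one.
conn.executemany("INSERT INTO Employees VALUES (?, ?, ?, ?)",
                 [(1, "Ann", "B", "Lee"), (2, "Raj", "K", "Khan"), (3, "Li", "M", "Chen")])
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 1, 1, 3), (2, 1, 1, 1, 2), (3, 1, 1, 1, 1),
                  (4, 2, 1, 1, 50), (5, 3, 1, 1, 4), (6, 3, 1, 1, 4)])

# COUNT(s.SalesID) ignores the NULLs a LEFT JOIN yields for employees without sales.
fewest_sales = conn.execute("""
    SELECT e.EmployeeID, e.FirstName, e.LastName, COUNT(s.SalesID) AS SalesCount
    FROM Employees e
    LEFT JOIN Sales s ON s.EmployeeID = e.EmployeeID
    GROUP BY e.EmployeeID, e.FirstName, e.LastName
    ORDER BY SalesCount ASC
    LIMIT 1
""").fetchone()
print(fewest_sales)
```

If several employees tie for the smallest count, `LIMIT 1` returns only one of them; in SQL Server, `TOP (1) WITH TIES` would return them all.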
Step-5: Suppose that Trevor C Coleman is a VIP customer and the query Q2 in Step-4 is run frequently
in the data warehouse.

<Write your sql script here>

Create a simple view in SQL

CREATE VIEW PriorityCustomer
AS
-- Every column of a view needs a name, so the count gets an alias.
SELECT COUNT(Prod.Name) AS ProductCount, Prod.Name
FROM Customers Cust, Products Prod, Sales Sal
WHERE Cust.FirstName = 'Trevor' AND Cust.MiddleInitial = 'C' AND Cust.LastName = 'Coleman'
  AND Cust.CustomerID = Sal.CustomerID AND Prod.ProductID = Sal.ProductID
GROUP BY Prod.Name;

• Write an SQL script that adds a clustered index to the view on the attributes ProductID and
CustomerID.

<Write your sql script here>

ALTER VIEW PriorityCustomer
WITH SCHEMABINDING
AS
-- An indexed view with GROUP BY must include COUNT_BIG(*).
SELECT Prod.Name, Prod.ProductID, Cust.CustomerID,
       COUNT_BIG(*) AS ProductCount
FROM dbo.Customers Cust, dbo.Products Prod, dbo.Sales Sal
WHERE Cust.FirstName = 'Trevor' AND Cust.MiddleInitial = 'C' AND Cust.LastName = 'Coleman'
  AND Cust.CustomerID = Sal.CustomerID AND Prod.ProductID = Sal.ProductID
GROUP BY Prod.Name, Prod.ProductID, Cust.CustomerID;
GO

-- ALTER VIEW must be the only statement in its batch, hence the GO above.
CREATE UNIQUE CLUSTERED INDEX IX_PriorityCustomer
ON PriorityCustomer (ProductID, CustomerID);


• Now run Q2 again and report the difference in the time taken to execute it.
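
In SQL Server Management Studio, `SET STATISTICS TIME ON` reports the elapsed time of each statement, which is one way to make this comparison. The idea behind it can also be sketched in Python. SQLite has no indexed views, so the example below is only an analogy: it stores the aggregate in an ordinary table with a unique index (a manual stand-in for the indexed view) and times both queries with `time.perf_counter`. The generated rows are hypothetical, and the measured times depend on the machine.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products  (ProductID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, FirstName TEXT,
                        MiddleInitial TEXT, LastName TEXT);
CREATE TABLE Sales (SalesID INTEGER PRIMARY KEY, EmployeeID INTEGER,
                    CustomerID INTEGER, ProductID INTEGER, Quantity INTEGER);
""")

# Hypothetical data: 5 products, 2 customers, 20,000 generated sales rows.
conn.executemany("INSERT INTO Products VALUES (?, ?)",
                 [(i, f"Product {i}") for i in range(1, 6)])
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?, ?)",
                 [(1, "Trevor", "C", "Coleman"), (2, "Maria", "L", "Santos")])
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?, ?)",
                 [(i, 1, 1 + i % 2, 1 + i % 5, 1) for i in range(1, 20001)])

FILTER = "c.FirstName = 'Trevor' AND c.MiddleInitial = 'C' AND c.LastName = 'Coleman'"
JOINS = """FROM Sales s
           JOIN Customers c ON c.CustomerID = s.CustomerID
           JOIN Products  p ON p.ProductID  = s.ProductID"""

def timed(sql):
    """Run a query and return its rows with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return rows, time.perf_counter() - start

# Q2 against the base tables: joins and aggregates on every run.
direct_rows, direct_secs = timed(
    f"SELECT p.Name, COUNT(*) {JOINS} WHERE {FILTER} GROUP BY p.Name ORDER BY p.Name")

# Stand-in for the indexed view: the aggregate is computed once and stored.
conn.executescript(f"""
CREATE TABLE PriorityCustomer AS
    SELECT p.ProductID, c.CustomerID, p.Name, COUNT(*) AS ProductCount
    {JOINS} WHERE {FILTER}
    GROUP BY p.ProductID, c.CustomerID, p.Name;
CREATE UNIQUE INDEX IX_PriorityCustomer ON PriorityCustomer (ProductID, CustomerID);
""")
stored_rows, stored_secs = timed(
    "SELECT Name, ProductCount FROM PriorityCustomer ORDER BY Name")

print(f"base tables: {direct_secs:.6f} s, stored aggregate: {stored_secs:.6f} s")
```

Both queries return the same rows; only the amount of work per run differs. Unlike a real indexed view, the stored table here is not kept up to date when Sales changes.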

Part-II [CLO-C4, C5]

You are provided with the housing prices dataset, which has the following six attributes:
• Age of the house
• Distance to the nearest metro station
• Number of convenience stores in the locality
• Latitude
• Longitude
• Unit area house price

The dataset contains data of prices of 414 houses. Your task is to fit a linear regression model to it and
once fitted, use it to predict the price for the following house:

Age     Distance    Number of stores    Latitude    Longitude
14.7    1717.193    2                   24.96447    121.5165

For this part, you also need to provide your code and the linear regression model (i.e., the equation) that’s
been fitted to the dataset.

Code
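import pandas as pd
import matplotlib.pyplot as plt
from sklearn import linear_model

# Load the dataset and name its six columns.
Givendata = pd.read_csv("Downloads\\housingprices.csv")
Givendata.columns = ['Age', 'Distance', 'Stores', 'Latitude', 'Longitude', 'Price']

features = ['Age', 'Distance', 'Stores', 'Latitude', 'Longitude']

# Fit the price as a linear function of the five features.
Reg = linear_model.LinearRegression()
Reg.fit(Givendata[features], Givendata.Price)

# Predict the price of the house given in the task.
new_house = pd.DataFrame([[14.7, 1717.193, 2, 24.96447, 121.5165]], columns=features)
print("The Price: ", Reg.predict(new_house))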
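
The task also asks for the fitted equation, which the code above does not print. With scikit-learn, the slopes are in `Reg.coef_` and the constant term is in `Reg.intercept_` once the model is fitted. To show what those numbers mean, here is a minimal pure-Python sketch of ordinary least squares via the normal equations. The helper `fit_linear`, the two-feature layout, and the data points are all hypothetical and synthetic; they are not the housing dataset and say nothing about its actual coefficients.

```python
def fit_linear(rows, targets):
    """Ordinary least squares: solve (X^T X) b = X^T y; returns [intercept, slopes...]."""
    X = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept column
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

# Synthetic points generated from price = 40 - 0.5 * age + 2 * stores (not real data).
rows = [(0, 0), (10, 0), (0, 5), (10, 5), (20, 2)]
prices = [40.0, 35.0, 50.0, 45.0, 34.0]

intercept, b_age, b_stores = fit_linear(rows, prices)
print(f"price = {intercept:.2f} + ({b_age:.2f}) * age + ({b_stores:.2f}) * stores")

# A prediction is the equation evaluated at the new point.
predicted = intercept + b_age * 14.7 + b_stores * 2
```

For the real model the equation has five slope terms, one per feature, in the same order as the feature columns passed to `fit`.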
