Final BI Lab Manual
DEPARTMENT OF COMPUTER ENGINEERING
SEMESTER-II [A.Y. 2023-2024]
INSTITUTE VISION: To impart value-added technological education through the pursuit of academic excellence, research and an entrepreneurial attitude.
INSTITUTE MISSION:
M1: To achieve academic excellence through an innovative teaching and learning process.
M2: To imbibe the research culture for addressing industry and societal needs.
M3: To produce competent and socially responsible professionals with core human values.
M4: To incorporate social and ethical awareness among the students to make them conscientious professionals.
PEO1: To impart fundamentals in science, mathematics and engineering to cater to the needs of society and industries.
PEO2: To encourage graduates to engage in research and higher studies, and/or to become entrepreneurs.
PEO3: To work effectively as individuals and as team members in a multidisciplinary environment, with high ethical values, for the benefit of society.
Course Objectives:
Sr. No.  Course Objective Statement
1        To introduce the concepts and components of Business Intelligence (BI)
Course Outcomes:
Sr. No.  CO Statement
1        CO1: Differentiate the concepts of Decision Support System & Business Intelligence
List of Assignments
410253(C) : Business Intelligence
1 Import the legacy datasets using the Power BI tool
2 Perform the Extraction, Transformation and Loading (ETL) process to construct the database in SQL Server
3 Create the cube with suitable dimension and fact tables based on the ROLAP, MOLAP and HOLAP models
4 Import the data warehouse data in Microsoft Excel and create the Pivot table and Pivot Chart
5 Perform the data classification using a classification algorithm, or perform the data clustering using a clustering algorithm
Group 2
6 Mini Project: Each group of at most 4 students is assigned one case study; a BI report must be prepared outlining the following steps:
a) Problem definition, identifying which data mining task is needed.
b) Identify and use a standard data mining dataset available for the problem.
Group A
Assignment No:1
Title: Import the legacy datasets using the Power BI tool.
Prerequisite:
1. Basics of dataset extensions.
2. Concept of data import
Legacy data, according to Business Dictionary, is "information maintained in an old or out-of-date format
or computer system that is consequently challenging to access or handle."
Where does legacy data come from? Virtually everywhere. Figure 1 indicates that there are many sources from which you may obtain legacy data. These include existing databases, often relational, although non-relational databases such as hierarchical, network, object, XML, object/relational, and NoSQL databases are also encountered.
Files, such as XML documents or "flat files” such as configuration files and comma-delimited text files,
are also common sources of legacy data. Software, including legacy applications that have been wrapped
(perhaps via CORBA) and legacy services such as web services or CICS transactions, can also provide
access to existing information. The point to be made is that there is often far more to gaining access to
legacy data than simply writing an SQL query against an existing relational database.
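As a rough programmatic counterpart to the Power BI import that follows, the sketch below pulls the same kinds of legacy sources (a flat file, an Excel workbook, and a relational database) into pandas DataFrames. The file names, table name, and connection string are hypothetical placeholders, and the database read assumes the SQLAlchemy and pyodbc drivers are installed.

import pandas as pd
from sqlalchemy import create_engine

# Flat file: a comma-delimited text export (hypothetical file name)
orders_csv = pd.read_csv("legacy_orders.csv")

# Excel workbook, as used in the Power BI import steps below
# (requires the openpyxl reader)
orders_xlsx = pd.read_excel("legacy_orders.xlsx", sheet_name="Sales")

# Existing relational database exposed through an SQL query
# (hypothetical connection string; requires the pyodbc driver)
engine = create_engine("mssql+pyodbc://user:password@legacy_server/LegacyDB"
                       "?driver=ODBC+Driver+17+for+SQL+Server")
orders_db = pd.read_sql("SELECT * FROM dbo.Orders", engine)

print(orders_csv.head())
print(orders_xlsx.head())
print(orders_db.head())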
Step 2: Click on Get Data; the following list will be displayed → select Excel.
Step 3: Select the required file and click on Open; the Navigator screen appears.
Conclusion: In this way, we import legacy datasets using the Power BI tool.
Date:
Marks obtained:
Sign of course coordinator:
Name of course Coordinator :
Assignment No:2
Title: Perform the Extraction, Transformation and Loading (ETL) process to construct the database in SQL Server.
Objective of the Assignment: To introduce the concepts and components of Business Intelligence (BI)
Prerequisite:
1. Basics of ETL Tools.
2. Concept of SQL Server.
Theory:
ETL (Extract, Transform and Load)
ETL is a process in data warehousing; it stands for Extract, Transform and Load.
In this process, an ETL tool extracts the data from various data source systems, transforms it in the staging area, and then finally loads it into the data warehouse system.
Extraction
1. Identify the Data Sources: The first step in the ETL process is to identify the data
sources. This may include files, databases, or other data repositories.
2. Extract the Data: Once the data sources are identified, we need to extract the data
from them. This may involve writing queries to extract the relevant data or using tools
such as SSIS to extract data from files or databases.
3. Validate the Data: After extracting the data, it's important to validate it to ensure
that it's accurate and complete. This may involve performing data profiling or
data quality checks.
Transformation
1. Clean and Transform the Data: The next step in the ETL process is to clean and transform the data. This may involve removing duplicates, fixing invalid data, or converting data types. We can use tools such as SSIS or SQL scripts to perform these transformations.
2. Map the Data: Once the data is cleaned and transformed, we need to map the data to the appropriate tables and columns in the database. This may involve creating a data mapping document or using a tool such as SSIS to perform the mapping.
Loading
1. Create the Database: Before loading the data, we need to create the database and the
appropriate tables. This can be done using SQL Server Management Studio or a SQL
script.
2. Load the Data: Once the database and tables are created, we can load the data into the database. This may involve using tools such as SSIS or writing SQL scripts to insert the data into the appropriate tables.
3. Validate the Data: After loading the data, it's important to validate it to ensure that it was loaded correctly. This may involve performing data profiling or data quality checks to ensure that the data is accurate and complete. A consolidated sketch of these stages is given below.
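The Extraction, Transformation and Loading stages above can be sketched end-to-end in Python with pandas and SQLAlchemy; an SSIS-based pipeline would follow the same pattern. The source file, target connection string, and column names (OrderID, OrderDate, Amount) are hypothetical placeholders.

import pandas as pd
from sqlalchemy import create_engine

# Extract: read the raw data from a source file (hypothetical file name)
raw = pd.read_csv("sales_export.csv")

# Validate the extraction: basic profiling / data quality checks
assert not raw.empty, "extraction returned no rows"
print(raw.isna().sum())  # missing values per column

# Transform: remove duplicates, fix invalid data, convert data types
clean = (raw.drop_duplicates()
            .dropna(subset=["OrderID"])  # drop rows missing the key
            .assign(OrderDate=lambda d: pd.to_datetime(d["OrderDate"]),
                    Amount=lambda d: d["Amount"].astype(float)))

# Load: append the transformed rows into a SQL Server table
# (hypothetical connection string; requires the pyodbc driver)
engine = create_engine("mssql+pyodbc://user:password@localhost/SalesDW"
                       "?driver=ODBC+Driver+17+for+SQL+Server")
clean.to_sql("FactSales", engine, schema="dbo", if_exists="append", index=False)

# Validate the load by counting the rows that arrived
print(pd.read_sql("SELECT COUNT(*) AS n FROM dbo.FactSales", engine))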
Perform the Extraction, Transformation and Loading (ETL) process to construct the database in SQL Server:
Step 6: Click OK, and in the Select backup devices window, add both files of AdventureWorks.
Step 10: The Configure OLE DB Connection Manager window appears; click on New.
Step 11: Select the Server name (as per your machine) from the drop-down, select the database name, and click on Test Connection.
If the test connection succeeded, click on OK.
Step 14: Drag OLE DB Source from Other Sources and drop it into the Data Flow tab.
Step 15: Double-click on OLE DB Source → the OLE DB Source Editor appears → click on New to add a connection manager.
Select the [Sales].[Store] table from the drop-down and click OK.
Step 16: Drag OLE DB Destination into the Data Flow tab and connect both.
Click on OK.
USE [AdventureWorks2012]
GO
SELECT [BusinessEntityID]
      ,[Name]
      ,[SalesPersonID]
      ,[Demographics]
      ,[rowguid]
      ,[ModifiedDate]
FROM [dbo].[OLE DB Destination]
GO
Conclusion: In this way, we can perform the ETL process to construct a database in SQL Server.
Date:
Marks obtained:
Sign of course coordinator:
Name of course Coordinator :
Assignment No:3
Title of the Assignment : Create the cube with suitable dimension and fact tables based
on ROLAP, MOLAP and HOLAP model.
Prerequisite:
1. Basics of OLAP.
2. Concept of Multi Dimensional Cube.
Theory :
In Business Intelligence (BI), a Fact Table is a table that stores quantitative data or facts about a business process or activity. It is a central table in a data warehouse that provides a snapshot of a business at a specific point in time.
For example - A Fact Table in a retail business might contain sales data for each transaction,
with dimensions such as date, product, store, and customer. Analysts can use the Fact Table to
analyze trends and patterns in sales, such as which products are selling the most, which stores
are performing well, and which customers are buying the most.
ROLAP, MOLAP, and HOLAP are three types of models used in Business Intelligence (BI)
for organizing and analyzing data:
1. ROLAP (Relational Online Analytical Processing):
In this model, data is stored in a relational database, and the analysis is performed by
joining multiple tables. ROLAP allows for complex queries and is good for handling large
amounts of data, but it may be slower due to the need for frequent joins.
2. MOLAP (Multidimensional Online Analytical Processing):
In this model, data is stored in a multidimensional database, which is optimized for fast query
performance. MOLAP is good for analyzing data in multiple dimensions, such as time,
geography, and product, but may be limited in its ability to handle large amounts of data.
3. HOLAP (Hybrid Online Analytical Processing):
This model combines elements of both ROLAP and MOLAP. It stores data in both a relational
and multidimensional database, allowing for efficient analysis of both large amounts of data and
complex queries. HOLAP is a good compromise between the other two models, offering both speed and flexibility.
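The fact/dimension structure these models share can be imitated in a small sketch: one fact table keyed to two dimension tables, rolled up ROLAP-style by joining and grouping with pandas. All table contents and column names below are illustrative only.

import pandas as pd

# Dimension tables (illustrative contents)
dim_product = pd.DataFrame({"ProductKey": [1, 2],
                            "ProductName": ["Chai", "Chang"]})
dim_store = pd.DataFrame({"StoreKey": [10, 20],
                          "City": ["Pune", "Mumbai"]})

# Fact table: one row per sale, keyed to the dimensions
fact_sales = pd.DataFrame({"ProductKey": [1, 1, 2, 2],
                           "StoreKey": [10, 20, 10, 20],
                           "SalesAmount": [250.0, 120.0, 90.0, 310.0]})

# ROLAP-style roll-up: join the fact table to its dimensions, then group
cube_slice = (fact_sales
              .merge(dim_product, on="ProductKey")
              .merge(dim_store, on="StoreKey")
              .groupby(["ProductName", "City"])["SalesAmount"]
              .sum())
print(cube_slice)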
Create the cube with suitable dimension and fact tables based on OLAP:
3. Click on Execute or press F5, selecting the queries one by one, or directly click on Execute.
4. After the execution completes, save and close SQL Server Management Studio and reopen it.
Step 2: Start the SSDT environment and create a New Data Source. Go to SQL Server Data Tools → right-click and Run as administrator.
Click on New.
Select the Server Name → select Use SQL Server Authentication → select or enter a database name (Sales_DW).
Note: Password for sa: admin123 (as given during installation of the SQL Server 2012 full version).
Click Next
Click Next
Click Next
Select FactProductSales (dbo) from Available objects and move it to Included objects by clicking the arrow button
Click Next
Click Finish
Click on Finish
Sales_DW.cube is created
Drag and drop Product Name from the table in the Data Source View and add it to the Attributes pane at the left side.
Deployment successful
Click run
Date:
Marks obtained:
Sign of course coordinator:
Name of course Coordinator :
Assignment No:4
Title of the Assignment: Import the data warehouse data in Microsoft Excel and create the Pivot table
and Pivot Chart.
Objective of the Assignment: To introduce the concepts and components of Business Intelligence (BI)
Prerequisite:
Theory:
Pivot tables and pivot charts are especially useful when dealing with large amounts of data, as they can help identify patterns and trends that might not be immediately obvious from the raw data.
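As a programmatic counterpart to the Excel steps, the sketch below builds a pivot table (and a bar chart standing in for the Pivot Chart) with pandas. The sample data is illustrative only, and the chart assumes matplotlib is installed.

import pandas as pd

# Illustrative sales data standing in for the imported warehouse extract
sales = pd.DataFrame({
    "Region": ["East", "East", "West", "West", "East", "West"],
    "Product": ["Bikes", "Helmets", "Bikes", "Helmets", "Bikes", "Bikes"],
    "Amount": [1200, 150, 900, 175, 1100, 950],
})

# Pivot table: rows = Region, columns = Product, values = sum of Amount
pivot = pd.pivot_table(sales, index="Region", columns="Product",
                       values="Amount", aggfunc="sum", fill_value=0)
print(pivot)

# A bar chart of the pivot plays the role of the Pivot Chart
# (requires matplotlib to be installed)
pivot.plot(kind="bar", title="Sales by Region and Product")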
Conclusion: In this way, we create a pivot table and pivot chart using Google Sheets or Excel.
Date:
Marks obtained:
Sign of course coordinator:
Name of course Coordinator :
Assignment No:5
Title of the Assignment: Perform the data classification using a classification algorithm, or perform the data clustering using a clustering algorithm.
Objective of the Assignment: To introduce the concepts and components of Business Intelligence (BI)
2. Clustering in Tableau:
3. Classification in Tableau:
Theory:
Clustering in Tableau:
1. Connect to the data: Connect to the data set that you want to cluster in Tableau.
2. Drag and drop the data fields: Drag and drop the data fields into the view, and select the data points
that you want to cluster.
3. Choose a clustering algorithm: Select clustering from the Analytics pane in Tableau. Tableau provides built-in K-Means clustering (a standalone K-Means sketch using scikit-learn appears after these steps).
4. Define the number of clusters: Define the number of clusters that you want to create. You
can do this manually or let Tableau automatically determine the optimal number of clusters.
5. Analyze the clusters: Visualize the clusters and analyze them using Tableau's built-in
visualizations and tools.
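The same K-Means idea can be reproduced outside Tableau. The sketch below uses scikit-learn on small illustrative data, with the number of clusters chosen manually as in step 4.

import numpy as np
from sklearn.cluster import KMeans

# Toy two-dimensional data: two loose groups of points (illustrative only)
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=[0, 0], scale=0.5, size=(50, 2)),
               rng.normal(loc=[5, 5], scale=0.5, size=(50, 2))])

# Fit K-Means with a manually chosen number of clusters (here, 2)
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print("cluster centers:\n", model.cluster_centers_)
print("first ten labels:", model.labels_[:10])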
Classification in Tableau:
1. Connect to the data: Connect to the data set that you want to classify in Tableau.
2. Drag and drop the data fields: Drag and drop the data fields into the view, and select
the target variable that you want to predict.
3. Choose a classification model: Select the type of model to train, such as a decision tree or a random forest (a comparable scikit-learn workflow is sketched after these steps).
4. Define the model parameters: Define the model parameters, such as the maximum tree depth or the number of trees to use in the forest.
5. Train the model: Train the model on a subset of the data using Tableau's built-in cross-
validation functionality.
6. Evaluate the model: Evaluate the accuracy of the model using Tableau's built-in metrics,
such as confusion matrix, precision, recall, and F1 score.
7. Predict the target variable: Use the trained model to predict the target variable for new data.
8. Visualize the results: Create visualizations to communicate the results of the classification
analysis using Tableau's built-in visualization tools.
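A comparable classification workflow can be run outside Tableau with scikit-learn. The sketch below trains a random forest on the built-in Iris dataset and reports a confusion matrix, precision, recall, and F1 score; the parameter values are illustrative choices.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Load a standard dataset and hold out a test split for evaluation
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

# Define the model parameters (number of trees, maximum depth) and train
clf = RandomForestClassifier(n_estimators=100, max_depth=4, random_state=0)
clf.fit(X_train, y_train)

# Evaluate with a confusion matrix, precision, recall and F1 score
y_pred = clf.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))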
Date:
Marks obtained:
Sign of course coordinator:
Name of course Coordinator :