BI - Lab Manual
Import the legacy data from different sources (such as Excel, SQL Server, Oracle, etc.)
and load it into the target system. (You can download sample databases such as
AdventureWorks, Northwind, FoodMart, etc.)
Objective of the Assignment :
To introduce the concepts and components of Business Intelligence (BI)
Prerequisite:
1. Basics of dataset extensions.
2. Concept of data import.
Theory :
Legacy Data :
Legacy data, according to BusinessDictionary, is "information maintained in an old or out-
of-date format or computer system that is consequently challenging to access or handle."
Sources of Legacy Data
Where does legacy data come from? Virtually everywhere. Figure 1 indicates that there
are many sources from which you may obtain legacy data. These include existing
databases, most often relational, but also non-relational databases such as hierarchical,
network, object, XML, object/relational, and NoSQL databases. Files, such as XML documents
or "flat files" like configuration files and comma-delimited text files, are also
common sources of legacy data. Software, including legacy applications that have been
wrapped (perhaps via CORBA) and legacy services such as web services or CICS
transactions, can also provide access to existing information. The point to be made is that
there is often far more to gaining access to legacy data than simply writing an SQL query
against an existing relational database.
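To make this concrete, here is a small, hedged Python sketch of pulling records from two typical legacy sources: a comma-delimited flat file and an existing relational database. The file name, table, and column names are illustrative assumptions, not part of any specific system.
```python
import csv
import sqlite3  # stands in here for any relational source reachable from Python

# Legacy source 1: a comma-delimited "flat file" (hypothetical customers.csv)
with open("customers.csv", newline="", encoding="utf-8") as f:
    flat_file_rows = [row for row in csv.DictReader(f)]

# Legacy source 2: an existing relational database (hypothetical legacy.db)
conn = sqlite3.connect("legacy.db")
cursor = conn.execute("SELECT customer_id, name, city FROM customers")
columns = [c[0] for c in cursor.description]
db_rows = [dict(zip(columns, row)) for row in cursor]
conn.close()

# The two sources arrive with different shapes and conventions,
# so they still have to be reconciled before they can be combined.
print(len(flat_file_rows), "rows from the flat file,", len(db_rows), "rows from the database")
```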
Step 1: Open Power BI Desktop.
Step 2: Click on Get Data; the following list will be displayed → select Excel.
Step 3: Select the required file and click on Open; the Navigator screen appears.
Step 4: Select the file and click on Edit.
Step 5: The Power Query Editor appears.
Step 6: Again, go to Get Data and select OData feed.
Step 7: Paste the URL http://services.odata.org/V3/Northwind/Northwind.svc/ and click on
OK.
Step 8: Select the Orders table and click on Edit to view the table.
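Outside Power BI, the same Northwind OData feed can be inspected with a short Python script, which helps confirm what the Orders table contains. This is a minimal sketch; it assumes the requests package is installed and that the public demo service is reachable.
```python
import requests

# The public Northwind OData demo service used in Step 7
BASE_URL = "http://services.odata.org/V3/Northwind/Northwind.svc/"

# Request the first few rows of the Orders table as JSON
response = requests.get(
    BASE_URL + "Orders",
    params={"$format": "json", "$top": "3"},
    timeout=30,
)
response.raise_for_status()

# Print a few familiar Northwind columns from each returned order
for order in response.json()["value"]:
    print(order["OrderID"], order["CustomerID"], order["OrderDate"])
```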
Conclusion : In this way we imported the legacy datasets using the Power BI tool.
Group No: 1
Assignment No: 2
Perform the Extraction, Transformation and Loading (ETL) process to construct the
database in SQL Server.
Prerequisite:
1. Basics of dataset extensions.
2. Concept of data import.
Theory:
Extraction Transformation and Loading (ETL) :-
ETL, which stands for extract, transform and load, is a data integration process that combines
data from multiple data sources into a single, consistent data store that is loaded into a data
warehouse or other target system.
As the databases grew in popularity in the 1970s, ETL was introduced as a process for
integrating and loading data for computation and analysis, eventually becoming the primary
method to process data for data warehousing projects.
ETL provides the foundation for data analytics and machine learning workstreams. Through a
series of business rules, ETL cleanses and organizes data in a way which addresses specific
business intelligence needs, like monthly reporting, but it can also tackle more advanced
analytics, which can improve back-end processes or end user experiences. ETL is often used by
an organization to:
Extract data from legacy systems
Cleanse the data to improve data quality and establish consistency
Load data into a target database
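A minimal, hedged sketch of these three steps in Python follows. It assumes pandas and SQLAlchemy are available, that a hypothetical sales.csv source file with order_id, order_date, and amount columns exists, and that the SQL Server connection string is adjusted to your own environment.
```python
import pandas as pd
from sqlalchemy import create_engine

# --- Extract: copy raw data from a source (hypothetical sales.csv) into memory ---
raw = pd.read_csv("sales.csv")

# --- Transform: cleanse and organize the data according to simple business rules ---
clean = (
    raw.dropna(subset=["order_id", "amount"])                          # drop incomplete rows
       .assign(order_date=lambda df: pd.to_datetime(df["order_date"]))  # normalize dates
       .assign(amount=lambda df: df["amount"].round(2))                 # consistent precision
       .drop_duplicates(subset=["order_id"])                            # establish consistency
)

# --- Load: write the result into a SQL Server target (connection string is an assumption) ---
engine = create_engine(
    "mssql+pyodbc://user:password@server/DataWarehouse?driver=ODBC+Driver+17+for+SQL+Server"
)
clean.to_sql("fact_sales", engine, if_exists="append", index=False)
```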
Extract
During data extraction, raw data is copied or exported from source locations to a staging area.
Data management teams can extract data from a variety of data sources, which can be structured
or unstructured.
Since data coming from multiple sources has different schemas, every dataset must be
transformed differently before it can be used for BI and analytics. For instance, if you are compiling data
from source systems like SQL Server and Google Analytics, these two sources will need to be
treated individually throughout the ETL process. The importance of this process has increased
since big data analysis has become a necessary part of every organization.
Well-designed ETL tools typically provide the following capabilities:
Comprehensive automation and ease of use: Leading ETL tools automate the entire data
flow, from data sources to the target data warehouse. Many tools recommend rules for
extracting, transforming and loading the data.
A visual, drag-and-drop interface: This functionality can be used for specifying rules
and data flows.
Support for complex data management: This includes assistance with complex
calculations, data integrations, and string manipulations.
Security and compliance: The best ETL tools encrypt data both in motion and at rest and
are certified compliant with industry or government regulations, like HIPAA and GDPR.
Conclusion :-
In this way we performed the Extraction, Transformation and Loading (ETL) process to construct
the database in SQL Server.
Group No: 1
Assignment No: 3
Create the cube with suitable dimension and fact tables based on the ROLAP, MOLAP and HOLAP
models.
Objective of the Assignment :
To introduce the concepts and components of the ROLAP, MOLAP and HOLAP models.
Prerequisite:
1. Basics of dataset extensions.
2. Concept of data import.
Theory:
What is OLAP?
OLAP was introduced into the business intelligence (BI) space over 20 years ago, at a time when
computer hardware and software technology weren’t nearly as powerful as they are today. OLAP
introduced a way for users (typically analysts) to easily perform multidimensional analysis of
large volumes of business data.
Aggregating, grouping, and joining data are the most difficult types of queries for a relational
database to process. The magic behind OLAP derives from its ability to pre-calculate and pre-
aggregate data. Otherwise, end users would be spending most of their time waiting for query results
to be returned by the database. However, this is also what causes OLAP-based solutions to be
extremely rigid and IT-intensive.
Changes to an OLAP cube require a full update of the cube, which is a lengthy process.
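The idea of pre-calculating aggregates can be sketched with pandas: the dimension values are grouped once up front, and later queries read the small summary instead of re-scanning the detail rows. The table and column names below are illustrative, not from any particular cube.
```python
import pandas as pd

# Hypothetical fact table: one row per sale, with two dimensions and one measure
fact_sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "West"],
    "product": ["A", "B", "A", "A", "B"],
    "amount":  [100, 150, 200, 50, 75],
})

# Pre-aggregate once, the way an OLAP cube does at processing time
cube = fact_sales.groupby(["region", "product"], as_index=False)["amount"].sum()

# Later "queries" simply read the pre-computed cells instead of scanning the detail rows
print(cube[(cube["region"] == "West") & (cube["product"] == "A")])
```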
What is ROLAP?
ROLAP stands for Relational Online Analytical Processing. ROLAP stores data in columns and
rows (also known as relational tables) and retrieves the information on demand through user
submitted queries. A ROLAP database can be accessed through complex SQL queries to
calculate information. ROLAP can handle large data volumes, but the larger the data, the slower
the processing times.
Because queries are made on demand, ROLAP does not require the storage and pre-computation
of information. However, the disadvantage of ROLAP implementations is the potential
performance constraints and scalability limitations that result from large and inefficient join
operations between large tables. Examples of popular ROLAP products include Metacube by
Stanford Technology Group, Red Brick Warehouse by Red Brick Systems, and AXSYS Suite by
Information Advantage.
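In ROLAP, such a calculation is pushed down to the relational engine as SQL at query time. Here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the relational store; the table and column names are illustrative.
```python
import sqlite3

# Build a tiny in-memory relational store standing in for the warehouse
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [("East", "A", 100), ("East", "B", 150), ("West", "A", 200), ("West", "B", 75)],
)

# ROLAP style: the aggregation is computed on demand by the relational engine
query = """
    SELECT region, product, SUM(amount) AS total_amount
    FROM fact_sales
    GROUP BY region, product
"""
for row in conn.execute(query):
    print(row)
```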
What is MOLAP?
MOLAP stands for Multidimensional Online Analytical Processing. MOLAP stores data in an
optimized multidimensional cube rather than in relational tables, with values pre-computed and
pre-aggregated at processing time.
Its simple interface makes MOLAP easy to use, even for inexperienced users. Its speedy data
retrieval makes it the best for “slicing and dicing” operations. One major disadvantage of MOLAP is
that it is less scalable than ROLAP, as it can handle a limited amount of data.
What is HOLAP?
HOLAP stands for Hybrid Online Analytical Processing. As the name suggests, the HOLAP
storage mode connects attributes of both MOLAP and ROLAP. Since HOLAP involves storing
part of your data in a ROLAP store and another part in a MOLAP store, developers get the
benefits of both.
With this use of the two OLAPs, the data is stored in both multidimensional databases and
relational databases. The decision to access one of the databases depends on which is most
appropriate for the requested processing application or type. This setup allows much more
flexibility for handling data. Aggregated, summary-level data is stored in the multidimensional
database for fast query response, while detailed data is kept in the relational database to handle
large volumes. Microsoft Analysis Services and SAP's BI Accelerator are products that support HOLAP.
Conclusion :-
In this way we created the cube with suitable dimension and fact tables based on the ROLAP,
MOLAP and HOLAP models.
Group No: 1
Assignment No: 4
Import the data warehouse data in Microsoft Excel and create the PivotTable and PivotChart.
Objective of the Assignment :
To introduce the concepts of importing data warehouse data into Microsoft Excel and creating
the PivotTable and PivotChart.
Prerequisite:
1. Basics of dataset extensions.
2. Concept of data import.
Theory:
In these tutorials you learn how to import and explore data in Excel, build and refine
a data model using Power Pivot, and create interactive reports with Power View that
you can publish, protect, and share.
At the end of this tutorial is a quiz you can take to test your learning.
This tutorial series uses data describing Olympic Medals, hosting countries, and
various Olympic sporting events. We suggest you go through each tutorial in order.
Also, these tutorials use Excel 2013 with the Power Pivot add-in enabled.
We start this tutorial with a blank workbook. The goal in this section is to connect to
an external data source, and import that data into Excel for further analysis.
Let’s start by downloading some data from the Internet. The data describes Olympic
Medals, and is a Microsoft Access database.
1. Download the files we use during this tutorial series. Download each of the four
files to a location that’s easily accessible, such as Downloads or My Documents, or
to a new folder you create.
Select the PivotTable Report option, which imports the tables into Excel
and prepares a PivotTable for analyzing the imported tables, and click OK.
6. Once the data is imported, a PivotTable is created using the imported tables.
With the data imported into Excel, and the Data Model automatically created, you’re
ready to explore the data.
Exploring imported data is easy using a PivotTable. In a PivotTable, you drag fields
(similar to columns in Excel) from tables (like the tables you just imported from the
Access database) into different areas of the PivotTable to adjust how it presents your
data. A PivotTable has four areas: FILTERS, COLUMNS, ROWS, and VALUES.
It might take some experimenting to determine which area a field should be dragged
to. You can drag as many or as few fields from your tables as you like, until the
PivotTable presents your data how you want to see it. Feel free to explore by
dragging fields into different areas of the PivotTable; the underlying data is not
affected when you arrange fields in a PivotTable.
Let’s explore the Olympic Medals data in the PivotTable, starting with Olympic
medalists organized by discipline, medal type, and the athlete’s country or region.
1. In PivotTable Fields, expand the Medals table by clicking the arrow
beside it. Find the NOC_CountryRegion field in the
expanded Medals table, and drag it to the COLUMNS area. NOC stands
for National Olympic Committees, which is the organizational unit for a
country or region.
2. Next, from the Disciplines table, drag Discipline to the ROWS area.
3. Let’s filter Disciplines to display only five sports: Archery, Diving,
Fencing, Figure Skating, and Speed Skating. You can do this from within
the PivotTable Fields area, or from the Row Labels filter in the PivotTable
itself.
Or, in the Row Labels section of the PivotTable, click the dropdown next to Row
Labels in the PivotTable, click (Select All) to remove all selections, then scroll
down and select Archery, Diving, Fencing, Figure Skating, and Speed Skating.
Click OK.
In PivotTable Fields, from the Medals table, drag Medal to the VALUES area.
Since Values must be numeric, Excel automatically changes Medal to Count of
Medal.
From the Medals table, select Medal again and drag it into the FILTERS area.
Let’s filter the PivotTable to display only those countries or regions with more than
90 total medals. Here’s how.
With little effort, you now have a basic PivotTable that includes fields from three
different tables. What made this task so simple were the pre-existing relationships
among the tables. Because table relationships existed in the source database, and
because you imported all the tables in a single operation, Excel could recreate those
table relationships in its Data Model.
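The same summary the PivotTable produces can be sketched in Python with pandas, which makes the join and the grouping explicit. This is only an analogue of the Excel steps; the file names and column names below mirror the tutorial's tables and should be treated as assumptions.
```python
import pandas as pd

# Hypothetical frames standing in for the imported Medals and Disciplines tables
medals = pd.read_csv("medals.csv")            # assumed columns: DisciplineID, NOC_CountryRegion, Medal, ...
disciplines = pd.read_csv("disciplines.csv")  # assumed columns: DisciplineID, Discipline, ...

# Join the tables (the "relationship"), then pivot: disciplines as rows,
# countries/regions as columns, counting medals in the values area
joined = medals.merge(disciplines, on="DisciplineID")
pivot = pd.pivot_table(
    joined,
    index="Discipline",
    columns="NOC_CountryRegion",
    values="Medal",
    aggfunc="count",
    fill_value=0,
)

# Keep only the five disciplines used in the tutorial, and only
# countries or regions with more than 90 medals in this view
pivot = pivot.loc[["Archery", "Diving", "Fencing", "Figure Skating", "Speed Skating"]]
pivot = pivot.loc[:, pivot.sum() > 90]
print(pivot)
```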
But what if your data originates from different sources, or is imported at a later time?
Typically, you can create relationships with new data based on matching columns. In
the next step, you import additional tables, and learn how to create new relationships.
Import data from a spreadsheet
Now let’s import data from another source, this time from an existing workbook,
then specify the relationships between our existing data and the new data.
Relationships let you analyze collections of data in Excel, and create interesting and
immersive visualizations from the data you import.
Let’s start by creating a blank worksheet, then import data from an Excel workbook.
Formatting the data as a table has many advantages. You can assign a name
to a table, which makes it easy to identify. You can also establish
relationships between tables, enabling exploration and analysis in
PivotTables, Power Pivot, and Power View.
6. Name the table. In TABLE TOOLS > DESIGN > Properties, locate
the Table Name field and type Sports. The workbook looks like the
following screen.
Now that we’ve imported data from an Excel workbook, let’s import data from a
table we find on a web page, or any other source from which we can copy and paste
into Excel. In the following steps, you add the Olympic host cities from a table.
Conclusion :-
In this way we imported the data warehouse data into Microsoft Excel and created the PivotTable
and PivotChart.
Prerequisite:
1. Basics of dataset extensions.
2. Concept of data import.
Theory:
I’ll be using the MNIST digits dataset that comes with scikit-learn, which is a
collection of labelled handwritten digits, and use KMeans to find clusters
within the dataset and test how useful the cluster labels are as a feature.
I have created a class named clust for this purpose, which, when initialized,
takes in a sklearn dataset and divides it into train and test datasets.
The function KMeans applies KMeans clustering to the train data, with the
number of classes as the number of clusters to be made, and creates labels
both for the train and test data. The parameter output controls how we want
to use these new labels: ‘add’ will add the labels as a feature in the dataset,
and ‘replace’ will use the labels instead of the train and test data to train
our classification model.
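Below is a hedged reconstruction of the described clust class, assuming scikit-learn is available. The method names, the added classify helper, and the exact behavior are illustrative of the description above rather than the author's original code.
```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


class clust:
    """Wrap a sklearn dataset and use KMeans cluster labels as an extra feature."""

    def __init__(self, dataset, test_size=0.3, random_state=0):
        # Split the sklearn dataset into train and test portions
        self.X_train, self.X_test, self.y_train, self.y_test = train_test_split(
            dataset.data, dataset.target,
            test_size=test_size, random_state=random_state,
        )

    def KMeans(self, output="add"):
        # Number of clusters = number of classes in the training labels
        n_clusters = len(np.unique(self.y_train))
        km = KMeans(n_clusters=n_clusters, random_state=0, n_init=10)
        train_labels = km.fit_predict(self.X_train)   # cluster labels for the train data
        test_labels = km.predict(self.X_test)         # cluster labels for the test data

        if output == "add":        # add the labels as an extra feature column
            self.X_train = np.column_stack([self.X_train, train_labels])
            self.X_test = np.column_stack([self.X_test, test_labels])
        elif output == "replace":  # use the labels instead of the original features
            self.X_train = train_labels.reshape(-1, 1)
            self.X_test = test_labels.reshape(-1, 1)

    def classify(self):
        # Hypothetical helper: train a classifier and report test accuracy
        model = RandomForestClassifier(random_state=0)
        model.fit(self.X_train, self.y_train)
        return model.score(self.X_test, self.y_test)


# Example: cluster the digits dataset and test the cluster label as a feature
c = clust(load_digits())
c.KMeans(output="add")
print("accuracy with cluster label added:", c.classify())
```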
Conclusion :-
In this way we performed data clustering using the KMeans clustering algorithm.