BI Notes QA
(AUTONOMOUS)
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
Course Name : 20CS765 - Business Intelligence
Class : IV CSE
Syllabus
UNIT – I
Basics of Business Intelligence
Introduction to Digital Data and its types – Structured, Semi Structured and Unstructured –
Introduction to OLTP and OLAP, OLAP Architectures – Data Models. BI Definitions and
Concepts – Business Intelligence Applications – BI Framework – BI Process – BI
Technology – BI Roles and Responsibilities – BI Best Practices.
UNIT – II
Data Integration
Data Warehouse – Need and Goals of Data Warehouse – Data Integration – Need and
Advantages of Data Integration – Common Data Integration Approaches – Data Integration
Technologies – Data Quality – Data Profiling Concepts and Applications – Introduction to
ETL using SSIS.
UNIT – III
Data Flow and Transformations
Introduction to SSIS Architecture – Introduction to ETL using SSIS – Integration Services
Objects – Data Flow Components – Sources, Transformations and Destinations – Working
with Transformations, Containers, Tasks, Precedence Constraints and Event Handlers.
UNIT – IV
Multidimensional Data Modeling
Introduction to Data and Dimension Modeling –Types of Data Model – Data Modeling
Techniques – Fact Table – Dimension Table – Typical Dimensional Models – Dimensional
Model Life Cycle – Introduction to Business Metrics and KPIs – Creating Cubes using
SSAS.
UNIT – V
ENTERPRISE REPORTING
Introduction to Enterprise Reporting – Reporting Perspectives Common to all Levels of
Enterprise – Report Standardization and Presentation Practices – Enterprise Reporting
Characteristics in OLAP – Concepts of Balanced Scorecards, Dashboards – Create
Dashboards – Scorecards vs Dashboards – Introduction to SSRS Architecture – Enterprise
Reporting using SSRS.
Text Books:
1. R N Prasad, Seema Acharya, Fundamentals of Business Analytics, John Wiley
India Pvt. Ltd, US, Second Edition, 2016.
2. David Loshin, Business Intelligence - The Savvy Manager's Guide, Morgan
Kaufmann Publishers, United States, Second Edition, 2012.
UNIT – I
Basics of Business Intelligence
Introduction to Digital Data and its types – Structured, Semi Structured and Unstructured –
Introduction to OLTP and OLAP, OLAP Architectures – Data Models. BI Definitions and
Concepts – Business Intelligence Applications – BI Framework – BI Process – BI
Technology – BI Roles and Responsibilities – BI Best Practices.
Part – A (2 Marks)
1. What is Business Intelligence?
Business Intelligence (BI) refers to the strategies, technologies, and tools used by
businesses to collect, analyze, and present data to support better decision-making.
6. Define OLTP
• On-Line Transaction Processing (OLTP) refers to a class of systems that manage
transaction-oriented applications.
• These applications are mainly concerned with the entry, storage and retrieval of
data.
7. Define OLAP
• OLAP is designed for enabling businesses to derive insights from vast datasets
through multidimensional analysis.
• Online Analytical Processing (OLAP) refers to software tools used for the analysis
of data in business decision-making processes.
• It works based on a multidimensional data model.
OLAP vs OLTP
Definition: OLAP is well-known as an online database query management system, whereas
OLTP is well-known as an online database modifying system.
Method used: OLAP makes use of a data warehouse, whereas OLTP makes use of a standard
database management system (DBMS).
11. List out some of the BI best practices adapted from TDWI's Flash Point
e-newsletter.
• Practice "User First" Design
• Create New Value
• Attend to Human Impacts
• Focus on Information and Analytics
• Manage BI as a long-term investment
Structured data
Data that is organized in a fixed schema (rows and columns) and is easily stored, accessed,
and processed using traditional databases.
Characteristics:
Predefined data model
Easily searchable
Stored in relational databases (RDBMS)
Examples:
Excel spreadsheets
SQL databases
Employee records (ID, name, salary)
Online transaction records
Semi-Structured Data
Data that does not reside in a traditional table format but still contains tags or markers to
separate elements.
Characteristics:
No fixed schema, but has structure through tags or keys
Can be parsed and stored using NoSQL databases
Examples:
JSON files
XML documents
HTML pages
Email (To, From, Subject, Body)
How to manage semi-structured data
Schemas
Graph-based data models and XML
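As a rough illustration (record layout and field names are assumed, not from the notes), the following Python sketch parses a small JSON document, showing how tags/keys give semi-structured data enough structure to query even without a fixed relational schema:

import json

# A hypothetical semi-structured employee record: fields may vary per record.
raw = '''
{
  "empId": "E1",
  "name": "John Kumar",
  "skills": ["SQL", "SSIS"],
  "address": {"city": "Chennai", "pin": "600001"}
}
'''

record = json.loads(raw)             # tags/keys give the data its structure
print(record["name"])                # direct key-based access
print(record["address"]["city"])     # nested elements, no fixed table schema
for skill in record.get("skills", []):   # a field that may be absent in other records
    print(skill)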
Unstructured Data
Data that has no predefined format or structure, making it hard to process using traditional
tools.
Characteristics:
No specific schema
Large in volume
Requires advanced tools like AI/ML for analysis
Examples:
Images, audio, video
Social media posts
Word documents or PDFs
Customer feedback or call recordings
MOLAP (Multidimensional OLAP)
Key Features:
Data is stored in proprietary, optimized cube structures.
Very fast query performance due to pre-computed aggregates.
Supports complex calculations and slicing/dicing.
Best for smaller to medium datasets that fit in memory.
Advantages:
Excellent performance for analytical queries.
Efficient storage through compression and aggregation.
Easy multidimensional navigation (drill-down, roll-up).
Disadvantages:
Limited scalability (not ideal for big data).
Cube reprocessing needed when data updates.
Proprietary formats; limited flexibility.
ROLAP (Relational OLAP)
Key Features:
Works directly with large relational databases.
Uses star/snowflake schemas in a data warehouse.
Aggregations are calculated on-the-fly using SQL.
Advantages:
Can handle huge volumes of data (scalable).
No need to pre-aggregate or reprocess entire cube.
Leverages existing relational database systems.
Disadvantages:
Slower query performance compared to MOLAP.
Complex SQL generation for multidimensional queries.
Performance highly depends on indexing and database tuning.
HOLAP (Hybrid OLAP)
Key Features:
Frequently used aggregations are stored in MOLAP cubes.
Detailed or less-used data stays in ROLAP relational tables.
Queries access both sources as needed.
Advantages:
Balanced performance and scalability.
Faster access to summaries with drill-down capability.
Efficient storage for detailed historical data.
Disadvantages:
More complex architecture.
Might require more tuning and management.
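To make the MOLAP vs ROLAP trade-off concrete, here is a small Python sketch (data values are illustrative only): MOLAP-style access pre-computes aggregates into a cube-like lookup, while ROLAP-style access computes the same aggregate on the fly from detail rows.

from collections import defaultdict

# Illustrative detail rows: (product, region, sales_amount)
rows = [
    ("Camera", "North", 100), ("Camera", "West", 150),
    ("Mobile", "North", 200), ("Mobile", "West", 250),
]

# MOLAP-style: pre-compute the aggregate once and store it (fast queries,
# but the "cube" must be reprocessed when the detail data changes).
cube = defaultdict(int)
for product, region, amount in rows:
    cube[(product, region)] += amount

def molap_query(product, region):
    return cube[(product, region)]          # lookup of a pre-computed value

# ROLAP-style: aggregate on the fly from the detail rows (no reprocessing,
# but each query does more work).
def rolap_query(product, region):
    return sum(a for p, r, a in rows if p == product and r == region)

print(molap_query("Camera", "North"))   # 100
print(rolap_query("Camera", "North"))   # 100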
ER model
Entity - An object that is stored as data, such as Student, Course or Company.
Attribute - Properties that describe an entity, such as StudentID, CourseName
Relationship - A connection between entities such as "a Student enrolls in a Course".
Entities are
• Employee
• EmployeeAddress
• EmployeePayHistory
Relationships
• 1:M cardinality between Employee and EmployeeAddress entities.
• 1:M cardinality between Employee and EmployeePayHistory entities
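The 1:M relationships above can be sketched in Python (attribute names are assumed for illustration) to show how one Employee entity relates to many EmployeeAddress and EmployeePayHistory records:

from dataclasses import dataclass, field
from typing import List

@dataclass
class EmployeeAddress:
    address_id: int
    city: str

@dataclass
class EmployeePayHistory:
    effective_date: str
    salary: float

@dataclass
class Employee:
    employee_id: int
    name: str
    # 1:M cardinality: one employee, many addresses / pay-history rows
    addresses: List[EmployeeAddress] = field(default_factory=list)
    pay_history: List[EmployeePayHistory] = field(default_factory=list)

emp = Employee(1, "John Kumar")
emp.addresses.append(EmployeeAddress(10, "Chennai"))
emp.pay_history.append(EmployeePayHistory("2024-01-01", 25000.0))
print(len(emp.addresses), len(emp.pay_history))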
Fact Constellation/Galaxy schema - It is a schema design that integrates multiple fact
tables sharing common dimensions, often referred to as a "Galaxy schema."
This approach allows businesses to conduct multi-dimensional analysis across complex
datasets.
4. Discuss briefly three major layers of BI component framework.
BI Component Framework
BI component framework has three major Layers:
• Business Layer
• Administration and operation layer
• Implementation layer
Business layer
This layer consists of four components
[Diagram: the Business Layer and its four components - Business Requirements, Business
Value, Program Management and Development]
1. Business requirements
• Business drivers - changing workforce, changing labor laws, changing
technology
• Business Goals - increased productivity, improved market share, improved
customer satisfaction
• Business Strategies - outsourcing, partnerships, customer retention programs,
employee retention programs
2. Business Value
The business value can be measured in terms of ROI, ROA, TCO, TVO
• Return on Investment
ROI = ((Gain from Investment − Cost of Investment) / Cost of Investment) × 100
• Return on Assets
ROA = (Net Income / Total Assets) × 100
• Total Cost of Ownership - the complete cost of acquiring, operating, and
maintaining a product
TCO = Initial Cost + Operational Costs + Maintenance Costs + Other Hidden Costs
• Total Value of Ownership
TVO = Total Value (Benefits) − Total Cost (TCO)
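A small Python sketch of these business-value formulas; the numbers in the sample calls are made up purely for illustration:

def roi(gain, cost):
    # ROI = ((Gain from Investment - Cost of Investment) / Cost of Investment) * 100
    return (gain - cost) / cost * 100

def roa(net_income, total_assets):
    # ROA = (Net Income / Total Assets) * 100
    return net_income / total_assets * 100

def tco(initial, operational, maintenance, hidden=0):
    # TCO = Initial Cost + Operational Costs + Maintenance Costs + Other Hidden Costs
    return initial + operational + maintenance + hidden

def tvo(total_benefits, total_cost):
    # TVO = Total Value (Benefits) - Total Cost (TCO)
    return total_benefits - total_cost

print(roi(150000, 100000))        # 50.0 (%)
print(roa(20000, 400000))         # 5.0 (%)
print(tco(50000, 20000, 10000))   # 80000
print(tvo(120000, 80000))         # 40000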
3. Program Management
This component of the business layer ensures:
• Business priorities
• Mission and goals
• Strategies and risks
• Multiple projects
• Dependencies
• Cost and value
• Business rules
• Infrastructure
4. Development
The process of development consists of
• database/data-warehouse development (consisting of ETL, data profiling, data
cleansing and database tools),
• data integration system development (consists of data integration tools and data
quality tools)
1. BI Architecture
• Data - Metadata
• Integration - according to business semantics and rules
• Information - derived from data
• Technology - must be accessible
• Organization - different roles and responsibilities like management, development
2. BI and DW Operations
• Backup and restore
• Security
• Configuration and Management
• Database Management
3. Information Services
• Information Delivery
• Business Analytics
BI Data architect
• Ensure proper definition, storage, distribution, management of data
• Owns accountability for the enterprise's data.
BI ETL architect
• Training project ETL specialist on data acquisition, transformation and
loading
BI Technical architect
• Interface with operations staff
• Interface with technical staff
• Interface with DBA staff
• Evaluate and select BI tools
• Assess current technical architecture
• Define strategy for data backup and recovery
Metadata manager
• Responsible for structure of data
• Level of details of data
• When was ETL job performed?
• When was the data warehouse updated?
• Who accesses the data and when?
• What is the frequency of access?
BI administrator
• Design and architect the entire BI environment
• Architect the metadata layer
• Manage the security of BI environment
• Monitor and tune the performance of the entire BI environment
• Maintain the version control of all objects in the BI environment
BI Business specialist
• Identify the suitable data usage and structure for the business functional area
BI Project manager
• Understand existing business processes
• Understand subject matter
• Anticipate and judge what users may want
• Manage expectations of the project
• Scope and increment
• Develop project plan
• Motivate team members
• Evaluate team members
• Coordinate with other project managers
• Communicate with all other team members
BI designer
• Create a subject area model
• Interpret requirements
• Create a logical staging area model
• Create a structural staging area model
• Create a physical staging area model
• Create a logical distribution model
• Create a structural distribution model
• Create a physical distribution model
• Create a logical relational model
• Create a structural relational model
• Create a physical relational model
ETL specialist
• Understand both the source and the target BI systems
• Identify data sources
• Assess data sources
• Create source mapping
• Apply business rules as transformation
• Coordinate with the program level ETL architect
Database administrator
• Design, implement and tune database schemas
• Conduct regular performance testing and tuning
• Manage storage space and memory
• Conduct capacity planning
• Analyze user patterns and downtime
• Administer tables and triggers
UNIT – II
DATA INTEGRATION
Data Warehouse – Need and Goals of Data Warehouse – Data Integration – Need and
Advantages of Data Integration – Common Data Integration Approaches – Data Integration
Technologies – Data Quality – Data Profiling Concepts and Applications – Introduction to
ETL using SSIS.
Part – A (2 Marks)
1. Define data warehouse.
• A data warehouse is a subject-oriented, integrated, time-variant and non-volatile
collection of data in support of management's decision-making process.
• It provides a multidimensional view of data
10. Write the steps to convert ER diagram into set of dimensional models
• Separate out the various business processes and represent each as a separate
dimensional model.
• Identify all many-to-many relationships in the ER diagram and convert them into fact
tables.
• De-normalize all the remaining tables into single-part key tables.
Data warehousing
The various primary concepts used in data warehousing would be:
• ETL (Extract Transform Load)
• Component-based (Data Mart)
• Dimensional Models and Schemas
• Metadata driven
Schema integration
• Multiple data sources may provide data on the same entity type.
• The main goal is to allow applications to transparently view and query this data as one
uniform data source, and this is done using various mapping rules to handle structural
differences.
• "Schema integration is developing a unified representation of semantically similar
information, structured and stored differently in the individual databases".
Example:
Instance integration
• Data integration from multiple heterogeneous data sources has become a high-priority
task in many large enterprises.
• Hence to obtain the accurate semantic information on the data content, the
information is being retrieved directly from the data.
• It identifies and integrates all the instances of the data items that represent the same
real-world entity, as distinct from schema integration.
Example:
Database 1 :
Empno empname noofleave
E1 John kumar 4
Database 2 :
Empno empname noofpresent
E1 J. kumar 25
Database 3 :
Empno empname salary
E1 John.k 25000
Solution: use Empno as the common attribute and replace the values in empname with one
consistent value, such as John kumar, in all places.
Database 1 :
Empno empname noofleave
E1 John kumar 4
Database 2 :
Empno empname noofpresent
E1 John kumar 25
Database 3 :
Empno empname salary
E1 John kumar 25000
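A minimal Python sketch of this instance integration, assuming the three source records share Empno as the common key: the variant empname spellings are replaced with one consistent value before the records are merged.

# Records for the same employee from three hypothetical source databases.
db1 = {"Empno": "E1", "empname": "John kumar", "noofleave": 4}
db2 = {"Empno": "E1", "empname": "J. kumar", "noofpresent": 25}
db3 = {"Empno": "E1", "empname": "John.k", "salary": 25000}

canonical_names = {"E1": "John kumar"}   # one consistent value per entity

def integrate(*records):
    merged = {}
    for rec in records:
        rec = dict(rec)
        # Replace the name variant with the canonical value for that Empno.
        rec["empname"] = canonical_names.get(rec["Empno"], rec["empname"])
        merged.update(rec)
    return merged

print(integrate(db1, db2, db3))
# {'Empno': 'E1', 'empname': 'John kumar', 'noofleave': 4, 'noofpresent': 25, 'salary': 25000}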
(iii) Modeling techniques
• Entity-Relational Modeling
• Dimensional Modeling
Entity-Relational Modeling
It is a logical design technique. The main focus is to present data in a standard format for end
user consumption.
Star schema
Snowflake schema
Every dimensional model is composed of one fact table and a number of dimensional tables.
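A minimal star-schema sketch using Python's built-in sqlite3 module (table and column names are illustrative, not taken from the notes): one fact table joined to its dimension tables and aggregated along a dimension.

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE DimProduct (ProductID INTEGER PRIMARY KEY, ProductName TEXT);
CREATE TABLE DimDate    (DateID INTEGER PRIMARY KEY, CalendarDate TEXT);
CREATE TABLE SalesFact  (DateID INTEGER, ProductID INTEGER, SalesAmount REAL);

INSERT INTO DimProduct VALUES (1, 'Camera'), (2, 'Mobile');
INSERT INTO DimDate    VALUES (20240101, '2024-01-01');
INSERT INTO SalesFact  VALUES (20240101, 1, 100.0), (20240101, 2, 250.0);
""")

-- not SQL: the query below is a typical dimensional query (aggregate the fact along a dimension)
for row in con.execute("""
    SELECT p.ProductName, SUM(f.SalesAmount)
    FROM SalesFact f JOIN DimProduct p ON f.ProductID = p.ProductID
    GROUP BY p.ProductName"""):
    print(row)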
3. Illustrate the way of maintaining data quality with its key dimensions, along with how to
ensure quality.
• Data Quality refers to the accuracy, completeness, reliability, and relevance of data
for its intended use.
• High-quality data is essential for effective decision-making, analytics, and operations
in any data-driven environment.
• Improves decision-making
• Enhances customer satisfaction
• Reduces operational costs
• Supports compliance and reporting
• Strengthens analytics and business intelligence
Data integrity
• Data Integrity refers to the accuracy, consistency, and reliability of data throughout its
lifecycle — from creation to storage, processing, and retrieval.
• It ensures that data remains unchanged, valid, and trustworthy unless deliberately
modified through authorized processes.
Data Integrity vs Data Quality
• Definition: Data integrity ensures data is accurate, consistent, and secure over its entire
lifecycle; data quality measures how well data is fit for its intended purpose.
• Focus: Integrity covers correct structure, relationships, and protection from corruption;
quality covers content accuracy, completeness, timeliness, and relevance.
• Scope: Integrity is more technical and database-focused; quality has a broader business
and analytical context.
• Maintained By: Integrity is maintained by database administrators, system architects and
data engineers; quality is maintained by data stewards and business analysts.
Data quality profiling – Analyzing the data from data source against certain specified rules or
requirements.
Database profiling – Analysis of a database with respect to its schema, relationships between
tables, columns used, column data types, and keys of the tables.
Type – Description
Structure Discovery – Analyzes data types, lengths, formats, and schema adherence.
Content Discovery – Examines actual data values (e.g., min, max, mean, frequency
distribution).
Relationship Discovery – Detects relationships across columns/tables (e.g., foreign key
candidates).
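An illustrative Python sketch of the three profiling types on a toy dataset (the column names and reference key set are assumed): structure discovery inspects types, content discovery computes value statistics, and relationship discovery checks whether one column's values all exist in another table's key column.

rows = [
    {"Empno": "E1", "salary": 25000},
    {"Empno": "E2", "salary": 30000},
    {"Empno": "E3", "salary": None},
]
dept_keys = {"E1", "E2"}   # hypothetical key column of another table

# Structure discovery: data types observed per column.
for col in ("Empno", "salary"):
    types = {type(r[col]).__name__ for r in rows}
    print(col, "types:", types)

# Content discovery: min, max, and completeness of a numeric column.
values = [r["salary"] for r in rows if r["salary"] is not None]
print("salary min/max:", min(values), max(values),
      "null %:", 100 * (len(rows) - len(values)) / len(rows))

# Relationship discovery: is Empno a foreign-key candidate into dept_keys?
orphans = [r["Empno"] for r in rows if r["Empno"] not in dept_keys]
print("orphan keys:", orphans)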
Datiris Profiler
• Compatible with other applications
• Domain validation
• Command line interface
• Pattern analysis
• Real time data viewing
OpenRefine
5. Demonstrate the way of implementing ETL using SSIS components
ETL
Extract - Retrieve data from source systems (databases, files, APIs, etc.)
Transform - Clean, format, and convert data (e.g., remove duplicates, change data types)
Load - Store the transformed data into the target (e.g., SQL Server, data warehouse)
SQL Server Integration Services (SSIS) is a Microsoft tool used for ETL operations.
SSIS enables ETL to be implemented as follows:
1. Create a New SSIS Project in SQL Server Data Tools (Visual Studio).
2. Add Data Flow Task in Control Flow.
3. Inside Data Flow:
• Use a Source (like OLE DB Source) to extract data.
• Apply Transformations (like Data Conversion, Conditional Split).
• Use a Destination (like SQL Server, Flat File, Excel) to load data.
4. Run or Schedule the Package using SQL Server Agent or command-line tools.
SSIS Package
An SSIS Package is a collection of control flow and data flow tasks saved as a .dtsx file. It
can be deployed to the SSIS catalog for reuse, monitoring, and scheduling.
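SSIS packages are built graphically, but the extract-transform-load pattern they implement can be sketched in plain Python (the file names, columns and cleansing rules below are hypothetical):

import csv

def extract(path):
    # Extract: read rows from a source file.
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    # Transform: clean and convert (remove duplicates, change data types).
    seen, out = set(), []
    for r in rows:
        key = r["Empno"]
        if key in seen:          # conditional-split style duplicate removal
            continue
        seen.add(key)
        r["salary"] = float(r["salary"])     # data conversion
        out.append(r)
    return out

def load(rows, target):
    # Load: write the transformed rows to the target (here, another file).
    with open(target, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["Empno", "salary"])
        writer.writeheader()
        writer.writerows(rows)

# Example run (assumes an employees.csv with Empno and salary columns exists):
# load(transform(extract("employees.csv")), "employees_clean.csv")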
Part – A (2 Marks)
SQL Server Integration Services (SSIS) is a core component of the Microsoft SQL
Server ecosystem, designed to facilitate the automation of data migration, ingestion, and
transformation tasks.
10. Is it possible to combine constraints and event handler in SSIS? Justify your
answer.
Yes, we can combine precedence constraints and event handlers in SSIS.
A robust SSIS package typically uses both:
Main Workflow:
• Uses precedence constraints for normal flow
• Handles the "happy path" scenario
Event Handlers:
• Handle exceptional conditions
• Manage cross-cutting concerns like logging
Packages
• A package is a collection of control flow elements, data flow elements, variables,
event handlers, and connection managers.
• Initially, when you create a package, it is an empty object that does nothing.
• After you create the basic package, you can add advanced features such as log
providers and variables to extend package functionality.
Control flow
• A control flow contains one or more containers and tasks, and they execute when the
package runs.
Data flow
• A data flow contains the source and destination which are used to modify and extend
data, extract and load data, and the paths that link sources, transformations, and
destinations.
• A data flow task is executable within the SSIS package that creates, runs, and orders
the data flow.
Event Handlers
• The event handler is a workflow that runs in response to the run-time events raised by
a package, container, or task.
Variables
• Variables are used to evaluate an expression at the runtime.
1. System variable – A system variable provides useful information about the package at
run time.
Precedence Constraints
• Precedence constraints are the arrows in a Control flow of a package component that
direct tasks to execute in a predefined order and manage the order in which the tasks
will execute.
Transformations
• Transformations are the key components within the Data Flow that allow changes to
the data within the data pipeline.
Containers
• Container is the core unit in the SSIS architecture for grouping tasks together
logically into units of work. It allows us to declare variables and event handlers.
• Sequence Container
• For loop Container
• Foreach loop container
Destinations
• SSIS destination is used to load data into a variety of database tables/views/SQL
commands. The destination editor provides an ability to create a new table.
2. Illustrate data flow implementation in SSIS using its data flow components.
SQL Server Integration Services provides three different types of data flow components:
• Sources
• Transformations
• Destinations.
Sources - extract data from data stores such as tables and views in relational databases, files,
and Analysis Services databases.
• Adding one or more sources to extract data from files and databases, and adding
connection managers to connect to the sources.
• Adding the transformations that meet the business requirements of the package. A
data flow is not required to include transformations.
• Some transformations require a connection manager. For example, the Lookup
transformation uses a connection manager to connect to the database that contains the
lookup data.
• Connecting data flow components by connecting the output of sources and
transformations to the input of transformations and destinations.
• Adding one or more destinations to load data into data stores such as files and
databases, and adding connection managers to connect to the data sources.
• Configuring error outputs on components to handle problems.
Sources
The following sources have properties that can be updated by property expressions:
Example:
• OLE DB Source
• Flat File Source
• Excel Source
• ADO.NET Source
• XML Source
• Raw File Source
• Oracle Source
• SAP BI Source
• Teradata Source
OLE DB Source: Reads from relational databases (e.g., SQL Server, Oracle).
Flat File Source: Reads from .csv, .txt, or other delimited files.
Transformations
• Modify, clean, merge, or reshape data during flow.
• The capabilities of transformations vary broadly.
• Transformations can perform tasks such as updating, summarizing, cleaning, merging,
and distributing data.
• Modify values in columns, look up values in tables, clean data, and aggregate column
values.
• The inputs and outputs of a transformation define the columns of incoming and
outgoing data.
• Depending on the operation performed on the data, some transformations have a
single input and multiple outputs, while other transformations have multiple inputs
and a single output.
• Transformations can also include error outputs, which provide information about the
error that occurred, together with the data that failed:
• For example, string data that could not be converted to an integer data type.
• The Integration Services object model does not restrict the number of inputs, regular
outputs, and error outputs that transformations can contain.
• Create custom transformations that implement any combination of multiple inputs,
regular outputs, and error outputs.
• The input of a transformation is defined as one or more input columns.
• Some Integration Services transformations can also refer to external columns as input.
Example, the input to the OLE DB Command transformation includes external
columns.
• An output column is a column that the transformation adds to the data flow.
• Both regular outputs and error outputs contain output columns. These output columns
in turn act as input columns to the next component in the data flow, either another
transformation or a destination.
Destinations
Transformations modify data within a Data Flow task, enabling tasks like cleaning, merging,
sorting, joining, and distributing data.
Examples - Derived Column, Data Conversion, Conditional Split, Merge, etc.
How they work - Transformations read data from a source, apply the specified
transformations, and then pass the transformed data to the destination.
• Aggregate: Performs calculations like SUM, AVG, COUNT, etc. across groups of
data
• Sort: Sorts data and optionally removes duplicates
• Pivot/Unpivot: Converts rows to columns (Pivot) or columns to rows (Unpivot)
• Percentage Sampling/Row Sampling: Extracts a sample of rows from the input
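The Aggregate, Sort and Conditional Split behaviours listed above can be imitated in a few lines of Python (rows and thresholds are illustrative only), which may help clarify what each transformation does to the data pipeline:

from itertools import groupby

rows = [
    {"region": "North", "amount": 100},
    {"region": "West",  "amount": 250},
    {"region": "North", "amount": 200},
]

# Sort then Aggregate: SUM of amount per region (like Sort + Aggregate transformations).
rows.sort(key=lambda r: r["region"])
totals = {region: sum(r["amount"] for r in grp)
          for region, grp in groupby(rows, key=lambda r: r["region"])}
print(totals)                                        # {'North': 300, 'West': 250}

# Conditional Split: route rows down different outputs based on a condition.
high = [r for r in rows if r["amount"] >= 200]
low  = [r for r in rows if r["amount"] < 200]
print(len(high), len(low))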
Memory Management
• Be cautious with blocking transformations (Sort, Aggregate, etc.) that require all rows
before processing
• Consider using SSIS buffer tuning properties for large datasets
Error Handling
Performance Tips
Debugging Transformations
SQL Server Integration Services (SSIS) containers are fundamental control flow elements
that help organize packages, manage scope, and implement complex workflow logic.
Types of Containers in SSIS
Sequence Container
Common Uses:
• Grouping related tasks
• Creating transaction blocks
• Limiting variable scope
• Enabling/disabling sections of packages
For Loop Container
Common Uses:
• Processing files with sequential names
• Executing a task a specific number of times
• Implementing counters
Foreach Loop Container
Common Uses:
• Processing multiple files in a directory
• Executing tasks for each row in a dataset
• Handling collections of server names or other objects
5. Explain different types of Precedence Constraints and Event Handlers in SSIS with
example.
Precedence constraints are the "connectors" that control workflow between tasks and
containers in SSIS control flows, determining the execution order based on conditions.
1. OnError
• Triggers when an error occurs
• Most commonly used event handler
• Example: Send email alert on task failure
2. OnExecStatusChanged
• Triggers when execution status changes
• Example: Log status changes to audit table
3. OnPreExecute/OnPostExecute
• Triggers before/after a task executes
• Example: Set variables before task runs or clean up after
4. OnWarning
• Triggers when a warning occurs
• Example: Log warnings to monitoring system
5. OnTaskFailed
• Triggers when a task fails
• Similar to OnError but more specific
Implementing Event Handlers
• At run time, containers and tasks raise events. Create custom event handlers that
respond to these events by running a workflow when the event is raised.
• For example: Create an event handler that sends an e-mail message when a task fails.
• An event handler is similar to a package. Like a package, an event handler can
provide scope for variables, and includes a control flow and optional data flows.
• Create event handlers by using the design surface of the Event Handlers tab in SSIS
Designer.
• In SQL Server Data Tools (SSDT), open the Integration Services project that contains
the package you want.
• In Solution Explorer, double-click the package to open it.
• Click the Event Handlers tab.
• In the Executable list, select the executable for which you want to create an event
handler.
• In the Event handler list, select the event handler you want to build.
• Click the link on the design surface of the Event Handler tab.
• Add control flow items to the event handler, and connect items using a precedence
constraint by dragging the constraint from one control flow item to another.
• Optionally, add a Data Flow task, and on the design surface of the Data Flow tab,
create a data flow for the event handler.
• On the File menu, click Save Selected Items to save the package.
Set the properties of an event handler
Set properties in the Properties window of SQL Server Data Tools (SSDT) or
programmatically.
UNIT – IV
MULTIDIMENSIONAL DATA MODELING
Introduction to Data and Dimension Modeling –Types of Data Model – Data Modeling
Techniques – Fact Table – Dimension Table – Typical Dimensional Models – Dimensional
Model Life Cycle – Introduction to Business Metrics and KPIs – Creating Cubes using
SSAS.
Part – A (2 Marks)
1. What is data model? Why is there a need for data model?
• A data model is a diagrammatic representation of the data and the relationship
between its different entities.
• It assists in identifying how the entities are related through a visual representation of
their relationships and thus helps reduce possible errors in the database design
2. How does the conceptual data model differ from the logical data model?
Level of Detail: the conceptual model shows entities and relationships only, while the logical
model adds attributes, keys and data types.
3. How is the logical data model different from the physical data model?
Level of Detail: the logical model covers attributes, keys and data types, while the physical
model covers tables, columns, indexes and storage.
6. What is snowflaking?
• The snowflake design is the result of further expansion and normalization of the
dimension table.
• Dimension table is said to be snowflaked if the low-cardinality attributes of the
dimensions have been divided into separate normalized tables.
• These tables are then joined to dimension table with referential constraints (foreign
key constraints).
7. When do we snowflake?
The dimensional model is snowflaked under the following two conditions:
• The dimension table consists of two or more sets of attributes which define
information at different grains.
• The sets of attributes of the same dimension table are being populated by different
source systems.
KPIs vs Metrics
Purpose: KPIs are used for strategic performance measurement, whereas metrics are used
for general tracking.
Scope: KPIs are focused (few critical ones), whereas metrics are broad (many tracked).
The conceptual data model is designed by identifying the various entities and the highest-
level relationships between them as per the given requirements.
For example, 'many to many' tables may exist in a logical or physical data model but they are
just shown as a relationship with no cardinality under the conceptual data model.
Logical Data Model
The logical data model is used to describe data in as much detail as possible.
While describing the data, no consideration is given to the physical implementation aspect.
Normalization:
• 1NF
• 2NF
• 3NF and so on
Physical data model is a representation of how the model will be built in the database.
It exhibits all the table structures, including column names, column data types, constraints,
primary keys, foreign keys, and relationships between tables.
Features of physical data model
• Specification of all tables and columns.
• Foreign keys are used to identify relationships between tables.
• While logical data model is about normalization, physical data model may support de-
normalization based on user requirements.
• Physical considerations (implementation concerns) may cause the physical data model
to be quite different from the logical data model.
• Physical data model will be different for different RDBMS. For example, data type
for a column may be different for MySQL, DB2, Oracle, SQL Server, etc.
An industry service provider, "InfoMechanists", has several Business Units (BUs) such as
• Financial Services(FS)
• Insurance Services (IS)
• Life Science Services (LSS)
• Communication Services (CS)
• Testing Services (TS)
Each BU has
• a Head as a manager
• Many employees reporting to him. Each employee has a current residential address.
There are cases where a couple (both husband and wife) are employed either in the same BU
or a different one.
For example, in an insurance project, the development and maintenance work is with
Insurance Services (IS) and the testing task is with Testing Services (TS). Each BU usually
works on several projects at a time.
Given the specifications mentioned, design an ER model by using the following steps.
Identify the relationships among the entities along with cardinality and participation
type (total/partial participation).
• One to one
• Many to one
The company sells its products in north, north-west, and western regions of India.
They have sales units at Mumbai, Pune, Ahmedabad, Delhi, and Punjab.
The President of the company wants the latest sales information to measure the sales
performance and to take corrective actions if required.
[Tables: sample sales figures for Digital Camera, Mobile Phones and Pen Drives across the
sales regions, shown in several representations; Representation 4 gives the number of items
sold in each region for each product over time]
• Dimensional modeling is the first step towards building a dimensional database, i.e. a
data warehouse.
• Dimensional modeling divides the database into two parts:
(a) Measurement and (b) Context.
• These measurements are usually numeric values called facts.
• Facts are enclosed by various contexts that are true at the moment the facts are
recorded. These contexts are intuitively divided into independent logical clumps
called dimensions.
3. What constitutes a fact table? What are the various types of facts? Explain using
examples.
• A fact table is the central table in a star schema or snowflake schema data warehouse
design.
• It contains the quantitative measurements or metrics of a business process.
Types of Fact
Additive facts
• These are the facts that can be summed up/aggregated across all dimensions in a fact
table.
• For example, discrete numerical measures of activity — quantity sold, dollars sold,
etc.
• Consider a scenario where a retail store "Northwind Traders" wants to analyze the
revenue generated.
• It can be in terms of any combination of multiple dimensions. Products, time, region,
and employee are the dimensions in this case.
• The revenue, which is a fact, can be aggregated along any of the above dimensions to
give the total revenue along that dimension.
• Such scenarios where the fact can be aggregated along all the dimensions make the
fact a fully additive or just an additive fact.
• Here revenue is the additive fact.
• Figure depicts the "SalesFact" fact table along with its corresponding dimension tables.
• This fact table has one measure, "SalesAmount", and three dimension keys,
"DateID", "ProductID", and "StoreID".
• The purpose of the "SalesFact" table is to record the sales amount for each product in
each store on a daily basis.
• In this table, "SalesAmount" is an additive fact because we can sum up this fact
along any of the three dimensions.
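A short Python sketch (rows are illustrative) showing why SalesAmount is additive: summing it along any dimension of the SalesFact rows still yields a meaningful total.

sales_fact = [
    {"DateID": 1, "ProductID": "P1", "StoreID": "S1", "SalesAmount": 100},
    {"DateID": 1, "ProductID": "P2", "StoreID": "S1", "SalesAmount": 150},
    {"DateID": 2, "ProductID": "P1", "StoreID": "S2", "SalesAmount": 200},
]

def total(rows, **filters):
    # Sum the additive fact over any combination of dimension filters.
    return sum(r["SalesAmount"] for r in rows
               if all(r[k] == v for k, v in filters.items()))

print(total(sales_fact))                    # grand total: 450
print(total(sales_fact, ProductID="P1"))    # along the product dimension: 300
print(total(sales_fact, DateID=1))          # along the time dimension: 250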
Semi Additive facts
• These are the facts that can be summed up for some dimensions in the fact table, but
not all.
• For example, account balances, inventory level, distinct counts etc.
• Figure depicts the "AccountsFact" fact table along with its corresponding
dimension tables.
• The "AccountsFact" fact table has two measures: "CurrentBalance" and
"ProfitMargin".
• It has two dimension keys: "DateID" and "AccountID".
• "CurrentBalance" is a semi-additive fact.
• It makes sense to add up current balances for all accounts to get the information on
"what's the total current balance for all accounts in the bank?"
• However, it does not make sense to add up current balances through time.
• It does not make sense to add up the current balances of a given account for each day
of the month.
• Similarly, "ProfitMargin" is another non-additive fact, as it does not make sense to
add profit margins at the account level or at the day level.
Non-Additive facts
• These are the facts that cannot be summed up for any of the dimensions present in the fact
table.
• For example, measurement of room temperature, percentages, ratios, factless facts,
etc.
• Non-additive facts are facts where SUM operator cannot be used to produce any
meaningful results.
• The following illustration will help you understand why room temperature is a non-
additive fact.
[Table: sample dates with their corresponding room temperature readings]
• Textual facts: Adding textual facts does not result in any number. However, counting
textual facts may result in a sensible number.
• Per-unit prices: Adding unit prices does not produce any meaningful number. For
example: the unit sales price or unit cost is strictly non-additive.
• Percentages and ratios: A ratio, such as gross margin, is non-additive. Non-additive
facts are usually the result of ratio or other calculations, such as percentages.
• Measures of intensity: Measures of intensity such as the room temperature are non-
additive across all dimensions.
• Summing the room temperature across different times of the day produces a totally
non-meaningful number.
• Averages: Facts based on averages are non-additive.
• For example, average sales price is non-additive. Adding all the average unit prices
produces a meaningless number.
• Factless facts (event-based fact tables): Event fact tables are tables that record
events.
• For example, event fact tables are used to record events such as Webpage clicks and
employee or student attendance.
• In an attendance recording scenario, attendance can be recorded in terms of "yes" or
"no", or with pseudo-facts like "1" or "0".
• In such scenarios, we can count the values but adding them will give invalid values.
Slowly Changing Dimension
• In a dimension model, dimension attributes are not fixed as their values can change
slowly over a period of time.
• Here comes the role of a slowly changing dimension.
• A slowly changing dimension is a dimension whose attribute/attributes for a record
(row) change slowly over time, rather than change on a regularly timely basis.
• Let us assume a company sells car-related accessories.
• The company decides to assign a new sales territory, Los Angeles, to its sales
representative, Bret Watson, who earlier operated from Chicago.
• How can you record the change without making it appear that Watson earlier held
Chicago?
• Let us take a look at the original record of Bret Watson. Now the original record has
to be changed as Bret Watson has been assigned "Los Angeles" as his sales territory,
effective May 1, 2011.
• This would be done through a slowly changing dimension.
Given below are the approaches for handling a slowly changing dimension:
Type-I SCD
• In this approach, the existing dimension attribute is overwritten with new data, and
hence no history is preserved.
Type-II SCD
• A new row is added into the dimension table with a new primary key every time a
change occurs to any of the attributes in the dimension table.
• Therefore, both the original values as well as the newly updated values are captured.
• This approach is used when it is compulsory for the data warehouse to track history
and when these changes will happen only a finite number of times.
Type-III SCD
• Type-III SCDs do not increase the size of the table as compared to the Type-II SCDs,
since old information is updated by adding new information.
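A minimal Python sketch of the Type-II approach for the Bret Watson example (column names and surrogate keys are assumed): a new row with a new key is added when the sales territory changes, so history is preserved.

sales_rep_dim = [
    {"RepKey": 1, "RepName": "Bret Watson", "Territory": "Chicago",
     "EffectiveFrom": "2009-01-01", "Current": True},
]

def scd_type2_update(dim, rep_name, new_territory, effective_from):
    # Expire the current row, then add a new row with a new surrogate key.
    for row in dim:
        if row["RepName"] == rep_name and row["Current"]:
            row["Current"] = False
    dim.append({"RepKey": max(r["RepKey"] for r in dim) + 1,
                "RepName": rep_name, "Territory": new_territory,
                "EffectiveFrom": effective_from, "Current": True})

scd_type2_update(sales_rep_dim, "Bret Watson", "Los Angeles", "2011-05-01")
for row in sales_rep_dim:
    print(row)   # both the Chicago and Los Angeles rows are retained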
Examples of fast changing dimension attributes include:
• Age
• Income
• Test score
• Rating
• Credit history score
• Customer account status
• Weight
• One method of handling fast changing dimensions is to break off a fast changing
dimension into one or more separate dimensions known as mini-dimensions.
• The fact table would then have two separate foreign keys — one for the primary
dimension table and another for the fast changing attribute.
• A single dimension that is expressed differently in a fact table with the usage of views
is called a role-playing dimension.
• Consider an on-line transaction involving the purchase of a laptop.
• The moment an order is placed, an order date and a delivery date will be generated.
• It should be observed that both the dates are the attributes of the same time dimension.
• Whenever two separate analyses of the sales performance are required, one in terms of
the order date and the other in terms of the delivery date, two views of the same time
dimension will be created to perform the analyses.
• In this scenario, the time dimension is called the role-playing dimension as it is
playing the role of both the order and delivery dates.
Internal/external
• Sometimes the consumers of reports may be external to the enterprise.
• We are very familiar with the annual reports of organizations.
• Correctness as well as attractive presentation of the report is of paramount importance.
Role-based
• Today we are witnessing massive information overload.
• The trend is to provide standard format of report to similar roles across the enterprise,
as they are likely to make similar decisions.
• For example, a sales executive responsible for strategic accounts will need
similar information/facts for decision making irrespective of the country/ products
he/she handles.
Strategic/operational
• Reports could also be classified based on the nature of the purpose they serve.
• Strategic reports inform the alignment with the goals, whereas operational reports
present transaction facts.
• The quarterly revenue report indicates variance with regard to meeting targets,
whereas the daily cash flow summary indicates summary of day's business
transactions.
• When consolidated across several locations, regions, products/services, even this
report will be of strategic importance.
Summary/detail
• As the name suggests, summary reports do not provide transaction-level information.
• Several summaries could be aggregated to track enterprise-level performance.
Standard/ad hoc
• Departments tend to generate periodic reports, say, weekly, monthly, or quarterly
reports in standard formats.
• Executives many times need ad hoc or on-demand reports for critical business decision making.
Purpose
• Enterprises classify reports as statutory when they focus on business transparency and
need to be shared with regulatory bodies.
• For example, a bank reporting to the Reserve Bank stipulated parameters of its
operations.
• Analytical reports look into a particular area of operation like sales, production, and
procurement, and they find patterns in historical data.
• These reports typically represent large data interpretations in the form of graphs.
• Scorecards are used in modern enterprises to objectively capture the key
performances against set targets and deviation with reasons.
Technology platform-centric
Matrix report
• A matrix (cross-tab) report aggregates data along the x-axis and y-axis of a grid to form a
summarized table
• Matrix report columns are not static but are based on the group values.
List reports
• A list report has a single, rectangular detail area that repeats for every record or group
value in the underlying data set.
• The main purpose is to contain other related data regions and report items and to
repeat them for a group of values.
Chart reports
• Chart reports provide a visual context for a lot of different kinds of data.
• There are several chart forms that can be used in the chart report, such as bar chart,
line graph, column chart, etc.
Gauge reports
• A gauge report is a data visualization tool that displays a single key performance
indicator (KPI) or metric in a format resembling a speedometer or dial gauge.
The tabular representation of the objective, measures, target and initiative concerning
the airline example:
4. Explain the process of creating dashboards and describe the different types of
dashboards with the help of a neat diagram.
Steps for creating dashboards
First step
Identify the data that will go into an Enterprise Dashboard.
Enterprise Dashboards can contain either/both of the below mentioned data:
• Quantitative data
• Non-Quantitative data
Second step
Decide on the timeframe.
E.g.: The various timeframes can be:
• This month to date
• This quarter to date
• This year to date
• Today so far
Third step
Decide on the comparative measures
Fourth step
Decide on the evaluation mechanisms
• E.g.: the evaluation can be performed as follows:
• Using visual objects e.g. traffic lights
• Using visual attributes e.g. red color for the measure to alert a serious
condition
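A small Python sketch of the traffic-light evaluation mentioned above (the thresholds and KPI values are hypothetical): each measure is compared to its target and mapped to a colour.

def traffic_light(actual, target):
    # Map a measure's attainment against its target to a dashboard colour.
    ratio = actual / target
    if ratio >= 1.0:
        return "green"      # on or above target
    if ratio >= 0.8:
        return "amber"      # within 80% of target
    return "red"            # serious condition, needs attention

kpis = {"Sales revenue": (95, 100), "Customer satisfaction": (70, 100)}
for name, (actual, target) in kpis.items():
    print(name, traffic_light(actual, target))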
Types of dashboard
Enterprise performance Dashboards
• Provide an overall view of the entire enterprise, rather than specific business functions.
• Typical portlets in an Enterprise Performance Dashboard include:
• Corporate financials
• Sales revenue
• Business Unit KPIs [Key Performance Indicators]
• Supply chain information
• Compliance or regulatory data
• Balanced scorecard information
Customer Support Dashboards
• Organizations provide such a dashboard to their customers as a value-add service.
• They provide the customer with their personal account information as pertaining to
the business relationship, such as:
• Online Trading
• Utility Services
• Entertainment
• B2B SLA Monitoring
Divisional Dashboards
5. Explain the architecture of SQL Server Reporting Services (SSRS) by identifying its
key components and illustrating their interactions with a well-structured diagram.
1. Report Server
2. Report Manager
3. Report Designer
• Tool for creating reports (available in Visual Studio/SQL Server Data Tools)
• Supports drag-and-drop functionality
• Provides a preview mode for testing reports
4. Report Builder
5. Report Server Database
• It stores metadata, resources, report definitions, security settings, delivery data and so
on.
6. Data sources
• Reporting services retrieves data from data sources like relational and
multidimensional data sources.
Authoring Phase - Reports are created using Report Designer or Report Builder (RDL files)
Processing Phase
Rendering Phase - Report is converted to the requested output format (PDF, Excel, HTML,
etc.)
Delivery Phase - Report is delivered to the user via browser, email (subscription), or file
share
[Diagram: processing flow of SQL Server Reporting Services]