BI Notes QA

The document outlines the syllabus for the Business Intelligence course at K.S.R. College of Engineering, covering topics such as digital data types, data integration, ETL processes using SSIS, multidimensional data modeling, and enterprise reporting. It includes definitions, comparisons of OLTP and OLAP systems, and details on BI roles and responsibilities. Additionally, it provides insights into various data models and architectures relevant to business intelligence applications.

K.S.R. COLLEGE OF ENGINEERING – TIRUCHENGODE 637 215.

(AUTONOMOUS)
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
Course Name : 20CS765 -Business Intelligence
Class : IV CSE
Syllabus
UNIT – I
Basics of Business Intelligence
Introduction to Digital Data and its types – Structured, Semi Structured and Unstructured –
Introduction to OLTP and OLAP, OLAP Architectures – Data Models. BI Definitions and
Concepts – Business Intelligence Applications – BI Framework – BI Process – BI
Technology – BI Roles and Responsibilities – BI Best Practices.
UNIT – II
Data Integration
Data Warehouse – Need and Goals of Data Warehouse – Data Integration – Need and
Advantages of Data Integration – Common Data Integration Approaches – Data Integration
Technologies – Data Quality – Data Profiling Concepts and Applications – Introduction to
ETL using SSIS.
UNIT – III
Data Flow and Transformations
Introduction to SSIS Architecture – Introduction to ETL using SSIS – Integration Services
Objects – Data Flow Components – Sources, Transformations and Destinations – Working
with Transformations, Containers, Tasks, Precedence Constraints and Event Handlers.

UNIT – IV
Multidimensional Data Modeling
Introduction to Data and Dimension Modeling –Types of Data Model – Data Modeling
Techniques – Fact Table – Dimension Table – Typical Dimensional Models – Dimensional
Model Life Cycle – Introduction to Business Metrics and KPIs – Creating Cubes using
SSAS.
UNIT – V
ENTERPRISE REPORTING
Introduction to Enterprise Reporting – Reporting Perspectives Common to all Levels of
Enterprise – Report Standardization and Presentation Practices – Enterprise Reporting
Characteristics in OLAP – Concepts of Balanced Scorecards, Dashboards – Create
Dashboards – Scorecards vs Dashboards – Introduction to SSRS Architecture – Enterprise
Reporting using SSRS.

Text Books:
1. R N Prasad, Seema Acharya, Fundamentals of Business Analytics, John Wiley
India Pvt. Ltd, US, Second Edition, 2016.
2. David Loshin, Business Intelligence - The Savvy Manager's Guide, Morgan
Kaufmann Publishers, United States, Second Edition, 2012.
UNIT – I
Basics of Business Intelligence
Introduction to Digital Data and its types – Structured, Semi Structured and Unstructured –
Introduction to OLTP and OLAP, OLAP Architectures – Data Models. BI Definitions and
Concepts – Business Intelligence Applications – BI Framework – BI Process – BI
Technology – BI Roles and Responsibilities – BI Best Practices.

Part – A (2 Marks)
1. What is Business Intelligence?
Business Intelligence (BI) refers to the strategies, technologies, and tools used by
businesses to collect, analyze, and present data to support better decision-making.

2. Which are the Components of Business Intelligence?


• Data Collection
• Data Warehousing
• Data Analysis
• Reporting
• Decision Support

3. List out BI Tools


• Microsoft Power BI
• Tableau
• QlikView
• SAP BusinessObjects
• Looker

4. Categorize the various types of digital data.


• Structured
• Semi Structured
• Unstructured

5. Compare Structured, Semi Structured and Unstructured data.
Property: Technology
• Structured data – based on relational database tables
• Semi-structured data – based on XML/RDF (Resource Description Framework)
• Unstructured data – based on character and binary data

Property: Version management
• Structured data – versioning over tuples, rows and tables
• Semi-structured data – versioning over tuples or graphs is possible
• Unstructured data – versioned as a whole

Property: Flexibility
• Structured data – schema dependent and less flexible
• Semi-structured data – more flexible than structured data but less flexible than unstructured data
• Unstructured data – more flexible; there is an absence of schema

Property: Scalability
• Structured data – very difficult to scale the DB schema
• Semi-structured data – scaling is simpler than structured data
• Unstructured data – more scalable

6. Define OLTP
• On-Line Transaction Processing (OLTP) refers to a class of systems that manage
transaction-oriented applications.
• These applications are mainly concerned with the entry, storage and retrieval of
data.

7. Define OLAP
• OLAP is designed for enabling businesses to derive insights from vast datasets
through multidimensional analysis.
• Online Analytical Processing (OLAP) refers to software tools used for the analysis
of data in business decision-making processes.
• It works based on a multidimensional data model.

8. Differentiate OLTP and OLAP


Category: Definition
• OLAP (Online Analytical Processing) – It is well-known as an online database query management system.
• OLTP (Online Transaction Processing) – It is well-known as an online database modifying system.

Category: Data source
• OLAP – Consists of historical data from various databases.
• OLTP – Consists of only operational current data.

Category: Method used
• OLAP – It makes use of a data warehouse.
• OLTP – It makes use of a standard database management system (DBMS).

Category: Application
• OLAP – It is subject-oriented. Used for Data Mining, Analytics, Decision making, etc.
• OLTP – It is application-oriented. Used for business tasks.

9. How does Business Intelligence work?


• Requirements Gathering
• Data Collection
• Data Preparation
• Data Analysis
• Data Visualization
• Decision Making
10. List out the major business intelligence applications
Technology solutions
• DSS - Decision Support System.
• EIS - Executive Information Systems.
• OLAP - handling multidimensional data
Business Solutions
• Performance Analysis - employee and business
• Customer Analysis - captures information about customer behaviour
• Market Place Analysis - understanding customers, competitors and products

11. List out some of the BI best practices adopted from an article TDWI’s Flash Point
e-newsletter.
• Practice "User First" Design
• Create New Value
• Attend to Human Impacts
• Focus on Information and Analytics
• Manage BI as a long-term investment

Part – B(Big Questions)


1. Discuss elaborately various types of digital data with its characteristics and example.

Structured data
Data that is organized in a fixed schema (rows and columns) and is easily stored, accessed,
and processed using traditional databases.

Characteristics:
 Predefined data model
 Easily searchable
 Stored in relational databases (RDBMS)
Examples:
 Excel spreadsheets
 SQL databases
 Employee records (ID, name, salary)
 Online transaction records

Semi-Structured Data
Data that does not reside in a traditional table format but still contains tags or markers to
separate elements.

Characteristics:
 No fixed schema, but has structure through tags or keys
 Can be parsed and stored using NoSQL databases
Examples:
 JSON files
 XML documents
 HTML pages
 Email (To, From, Subject, Body)
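As a small illustration of how tagged, semi-structured data can be brought into a structured form, here is a minimal Python sketch (the record and field names are hypothetical) that flattens a JSON document into rows suitable for a relational table:

```python
import json

# A hypothetical semi-structured record: tags/keys give structure,
# but the schema is not fixed (e.g. an "orders" entry may be missing).
raw = '''
{
  "custid": "C1",
  "name": "John Kumar",
  "orders": [
    {"transid": "T1", "prodid": "P1", "unit": 10},
    {"transid": "T2", "prodid": "P2", "unit": 20}
  ]
}
'''

record = json.loads(raw)

# Flatten the nested structure into fixed columns (a structured form).
rows = [
    (record["custid"], o["transid"], o["prodid"], o["unit"])
    for o in record.get("orders", [])
]
print(rows)   # [('C1', 'T1', 'P1', 10), ('C1', 'T2', 'P2', 20)]
```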
How to manage semi structured data
 Schemas
 Graph based data models and XML

How to store semi structured data


Challenges
 Storage cost
 Irregular and partial structure
 Distinction between schema and data
 Implicit structure
Solutions
 XML
 RDBMS
 Special purpose DBMS
 Data can be stored in the form of a graph (OEM – Object Exchange Model) – it is a
model for storing and exchanging semi-structured data.
 It structures the data in the form of graphs.

How to extract information from semi structured data


Challenges
 Flat files
 Heterogeneous sources
 Incomplete irregular structure
Solutions
 Indexing
 OEM
 XML
 Mining tools

Unstructured Data
Data that has no predefined format or structure, making it hard to process using traditional
tools.
Characteristics:
 No specific schema
 Large in volume
 Requires advanced tools like AI/ML for analysis
Examples:
 Images, audio, video
 Social media posts
 Word documents or PDFs
 Customer feedback or call recordings

How to manage unstructured data


 Indexing
 Tags/Metadata
 Classification /Taxonomy

How to store unstructured data


 Changing format
 Developing new hardware
 Storing in RDBMS which support BLOB(Binary Large Objects)
 Storing in xml format
 Content addressable storage

How to extract information from stored unstructured data


Challenges
 Interpretation
 Indexing
 Deriving meaning
 File format
Solutions
 Tags
 Text mining
 Applications platforms like XOLAP

2. Explain the various OLAP architectures with neat diagram.


(i) Multidimensional On-Line Analytical Processing (MOLAP)
MOLAP stores data on disks in a specialized multidimensional array structure.
Stores data in a multidimensional cube format instead of relational tables.

Key Features:
 Data is stored in proprietary, optimized cube structures.
 Very fast query performance due to pre-computed aggregates.
 Supports complex calculations and slicing/dicing.
 Best for smaller to medium datasets that fit in memory.
Advantages:
 Excellent performance for analytical queries.
 Efficient storage through compression and aggregation.
 Easy multidimensional navigation (drill-down, roll-up).
Disadvantages:
 Limited scalability (not ideal for big data).
 Cube reprocessing needed when data updates.
 Proprietary formats; limited flexibility.

Example Tools: Microsoft SSAS (Multidimensional mode), IBM Cognos TM1


(ii) Relational On-Line Analytical Processing (ROLAP)
Uses relational databases to store data.
Multidimensional views are created by writing SQL queries on top of the relational schema.(
Star Schema Based)

Key Features:
 Works directly with large relational databases.
 Uses star/snowflake schemas in a data warehouse.
 Aggregations are calculated on-the-fly using SQL.

Advantages:
 Can handle huge volumes of data (scalable).
 No need to pre-aggregate or reprocess entire cube.
 Leverages existing relational database systems.

Disadvantages:
 Slower query performance compared to MOLAP.
 Complex SQL generation for multidimensional queries.
 Performance highly depends on indexing and database tuning.

Example Tools: MicroStrategy, SAP BW (in ROLAP mode), Oracle OLAP


(iii) Hybrid On-Line Analytical Processing (HOLAP)
Combines the speed of MOLAP with the scalability of ROLAP by storing summary data in
cubes and detailed data in relational databases.

Key Features:
 Frequently used aggregations are stored in MOLAP cubes.
 Detailed or less-used data stays in ROLAP relational tables.
 Queries access both sources as needed.

Advantages:
 Balanced performance and scalability.
 Faster access to summaries with drill-down capability.
 Efficient storage for detailed historical data.

Disadvantages:
 More complex architecture.
 Might require more tuning and management.

Example Tools: Microsoft SSAS (Hybrid mode), SAP BW (Hybrid configurations)

3. Illustrate the data models used in OLTP and OLAP

Data Model for OLTP


OLTP system adopts an Entity Relationship (ER) data model.

ER model
Entity - An object that is stored as data, such as Student, Course or Company.
Attribute - Properties that describe an entity, such as StudentID, CourseName
Relationship - A connection between entities such as "a Student enrolls in a Course".
Entities are
• Employee
• EmployeeAddress
• EmployeePayHistory
Relationships
• 1:M cardinality between Employee and EmployeeAddress entities.
• 1:M cardinality between Employee and EmployeePayHistory entities

Data Model for OLAP


OLAP system adopts either star or snowflake model.
Fact table - Central table that contains quantitative data for analysis. It contains numeric
values.
Dimension table - Table that contains descriptive attributes (context) related to fact data.
Non-numeric values are used for filtering, grouping, and labeling facts.
Star schema- data is organized into a central fact table that contains the measures of
interest, surrounded by dimension tables that describe the attributes of the measures.
Snowflake Schema - is similar to the star schema. In the snowflake schema, dimensions
are normalized into multiple related tables.
In a star schema, dimensions are denormalized, with each dimension represented by a single
table.

Fact Constellation/Galaxy schema -It is a schema design that integrates multiple fact
tables sharing common dimensions, often referred to as a "Galaxy schema."
This approach allows businesses to conduct multi-dimensional analysis across complex
datasets.
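A minimal sketch in Python/pandas (with made-up table and column names) of how a star schema is queried: the fact table holds the numeric measures, and joins to the dimension tables supply the attributes used for grouping and filtering:

```python
import pandas as pd

# Hypothetical star schema: one fact table, two dimension tables.
fact_sales = pd.DataFrame({
    "date_key":    [20240101, 20240101, 20240102],
    "product_key": [1, 2, 1],
    "units_sold":  [10, 5, 8],          # numeric measures
    "revenue":     [100.0, 75.0, 80.0],
})
dim_product = pd.DataFrame({
    "product_key": [1, 2],
    "category":    ["Books", "Toys"],   # descriptive attributes
})
dim_date = pd.DataFrame({
    "date_key": [20240101, 20240102],
    "month":    ["Jan", "Jan"],
})

# Join the facts to their dimensions, then group by dimension attributes.
cube = (fact_sales
        .merge(dim_product, on="product_key")
        .merge(dim_date, on="date_key"))
print(cube.groupby(["month", "category"])["revenue"].sum())
```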
4. Discuss briefly three major layers of BI component framework.
BI Component Framework
BI component framework has three major Layers:
• Business Layer
• Administration and operation layer
• Implementation layer
Business layer
This layer consists of four components

Diagram: the Business Layer surrounded by its four components – Business Requirements, Business Value, Program Management, and Development.
1. Business requirements
• Business drivers - changing workforce, changing labor laws, changing
technology
• Business Goals - increased productivity, improved market share, improved
customer satisfaction
• Business Strategies - outsourcing, partnerships, customer retention programs,
employee retention programs
2. Business Value
The business value can be measured in terms of ROI, ROA, TCO and TVO.
• Return on Investment
ROI = ((Gain from Investment − Cost of Investment) / Cost of Investment) × 100
• Return on Assets
ROA = (Net Income / Total Assets) × 100
• Total Cost of Ownership - the complete cost of acquiring, operating, and
maintaining a product
TCO = Initial Cost + Operational Costs + Maintenance Costs + Other Hidden Costs
• Total Value of Ownership
TVO = Total Value (Benefits) − Total Cost (TCO)
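A short worked example of the four formulas above, using made-up figures, in Python:

```python
# Hypothetical figures for a BI programme.
gain_from_investment = 500_000
cost_of_investment   = 200_000
net_income           = 150_000
total_assets         = 1_000_000

roi = (gain_from_investment - cost_of_investment) / cost_of_investment * 100
roa = net_income / total_assets * 100

# Total Cost of Ownership: initial + operational + maintenance + hidden costs.
tco = 200_000 + 50_000 + 30_000 + 10_000
# Total Value of Ownership: total benefits minus TCO.
tvo = gain_from_investment - tco

print(f"ROI = {roi:.1f}%")   # ROI = 150.0%
print(f"ROA = {roa:.1f}%")   # ROA = 15.0%
print(f"TCO = {tco}")        # TCO = 290000
print(f"TVO = {tvo}")        # TVO = 210000
```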
3. Program Management
This component of the business layer ensures:
• Business priorities
• Mission and goals
• Strategies and risks
• Multiple projects
• Dependencies
• Cost and value
• Business rules
• Infrastructure

4. Development
The process of development consists of
• database/data-warehouse development (consisting of ETL, data profiling, data
cleansing and database tools),
• data integration system development (consists of data integration tools and data
quality tools)

Administration and Operation Layer


This layer consists of four components

1. BI Architecture
• Data - Metadata
• Integration - according to business semantics and rules
• Information - derived from data
• Technology - must be accessible
• Organization - different roles and responsibilities like management, development

2. BI and DW Operations
• Backup and restore
• Security
• Configuration and Management
• Database Management

3. Data Resource Administration


a. Data Governance - is a technique for controlling data quality, which is used to
assess, improve, manage and maintain information.
• Data ownership
• Data stewardship - ensures that data is accurate, consistent, and properly
defined
• Data custodianship - is responsible for the technical management and
protection of data
b. Metadata management - data about data
Metadata can be divided into four groups:
• Business metadata
• Process metadata
• Technical metadata
• Application metadata

4. Business Applications - generation of information or intelligence from data assets


like data warehouses/data marts.
Implementation Layer
The implementation layer of the BI component framework consists of the technical components
required for data capture, transformation and cleaning, and for turning data into information.
1. Data Warehousing
1. Data Sources
2. Data Acquisition, Cleaning, and Integration
3. Data Stores
Data warehousing must play the following five distinct roles:
• Intake
• Integration
• Distribution
• Delivery
• Access

2. Information Services
• Information Delivery
• Business Analytics

5. Explain the different business intelligence roles and responsibilities in detail.


BI roles can be broadly classified into two categories – Program roles and Project roles.

BI Program Team Roles


BI Program Manager – responsible for several projects.
• has clear understanding of organizations objectives
• plan and budget the projects
• identify the measures of success
• measures success/ROI

BI Data architect
• Ensure proper definition, storage, distribution, management of data
• Owns accountability for the enterprise's data.

BI ETL architect
• Training project ETL specialist on data acquisition, transformation and
loading
BI Technical architect
• Interface with operations staff
• Interface with technical staff
• Interface with DBA staff
• Evaluate and select BI tools
• Assess current technical architecture
• Define strategy for data backup and recovery

Metadata manager
• Responsible for structure of data
• Level of details of data
• When was ETL job performed?
• When was the data warehouse updated?
• Who accesses the data and when?
• What is the frequency of access?

BI administrator
• Design and architect the entire BI environment
• Architect the metadata layer
• Manage the security of BI environment
• Monitor and tune the performance of the entire BI environment
• Maintain the version control of all objects in the BI environment

BI Project Team Roles


Business manager
• Monitoring the activities of project team
• Addressing the business issues identified by the project manager

BI Business specialist
• Identify the suitable data usage and structure for the business functional area

BI Project manager
• Understand existing business processes
• Understand subject matter
• Anticipate and judge what users may want
• Manage expectations of the project
• Scope and increment
• Develop project plan
• Motivate team members
• Evaluate team members
• Coordinate with other project managers
• Communicate with all other team members

Business requirement analyst


• Work with architects to transform requirements into technical specifications
• Identify and assess potential data sources
• Recommend appropriate scope of requirements and priorities
• Coordinate prototype reviews
• Gather prototype feedback
Decision support analyst
• Educate users on warehousing capabilities
• Analyze business information requirements
• Design training infrastructures
• Discover business transformation rules
• Create state transformation models
• Discover dimension hierarchies
• Plan acceptance test
• Train BI users
• Implement support plan

BI designer
• Create a subject area model
• Interpret requirements
• Create a logical staging area model
• Create a structural staging area model
• Create a physical staging area model
• Create a logical distribution model
• Create a structural distribution model
• Create a physical distribution model
• Create a logical relational model
• Create a structural relational model
• Create a physical relational model

ETL specialist
• Understand both the source and the target BI systems
• Identify data sources
• Assess data sources
• Create source mapping
• Apply business rules as transformation
• Coordinate with the program level ETL architect

Database administrator
• Design, implement and tune database schemas
• Conduct regular performance testing and tuning
• Manage storage space and memory
• Conduct capacity planning
• Analyze user patterns and downtime
• Administrate tables, triggers
UNIT – II
DATA INTEGRATION
Data Warehouse – Need and Goals of Data Warehouse – Data Integration – Need and
Advantages of Data Integration – Common Data Integration Approaches – Data Integration
Technologies – Data Quality – Data Profiling Concepts and Applications – Introduction to
ETL using SSIS.
Part – A (2 Marks)
1. Define data warehouse.
• A data warehouse is a subject-oriented, integrated, time-variant and non-volatile
collection of data in support of management's decision-making process.
• It provides a multidimensional view of data

2. What is data mart?


A Data Mart is a subset of a data warehouse that is focused on a specific business line
or department, such as sales, marketing, or finance.

3. What are the needs for data warehouse?


• Lack of Information Sharing
• Lack of information credibility
• Reports take a longer time to be prepared
• Little or no scope for ad hoc querying or queries that require historical data.

4. What are the goals of data warehouse?


• Information accessibility
• Information credibility
• Flexible to change
• Support for the data security
• Information consistency

5. List out the limitations and advantages of data warehouse.


Advantages
• Integration at the lowest level, eliminating need for integration queries.
• Runtime schematic cleaning is not needed – performed at the data staging
environment
• Independent of original data source
• Query optimization is possible.
Limitations
• Process would take a considerable amount of time and effort
• Requires an understanding of the domain
• More scalable when accompanied with a metadata repository – increased load.
• Tightly coupled architecture

6. Define data integration


The process of merging data from various data sources and presenting a
cohesive/consolidated view to the user.
7. What is the need of data integration?
• It is done for providing data in a specific view as requested by users,
applications
• Increases with the need for data sharing

8. Which are the advantages of using data integration?


• Benefit to decision-makers, who have access to important information from
past studies
• Reduces cost, overlaps and redundancies; reduces exposure to risks
• Helps to monitor key variables like trends and consumer behaviour.

9. Differentiate ER modelling and dimensional modelling.


ER Modelling:
• Eliminates redundant data
• Highly normalized
• Useful for transactional systems
• It is split as per the entities

Dimensional Modelling:
• Does not eliminate redundant data
• Aggregates most of the attributes and hierarchies of a dimension into a single entity
• Useful for analytical systems
• It is split as per the dimensions and facts

10. Write the steps to convert ER diagram into set of dimensional models
• Separate out the various business processes and represent each as a separate
dimensional model.
• Identify all many-to-many relationships in ER diagram and construct them into fact
table.
• De-normalize all the remaining tables into single part key tables.

Part – B (Big Questions)


1. Explain the following data integration approaches along with its comparison.
a. Federated databases
b. Memory-mapped data structure
c. Data warehousing

Federated database (virtual database)


• Type of meta-database management system which transparently integrates multiple
autonomous databases into a single federated database
• The constituent databases are interconnected via a computer network, geographically
decentralized.
• The federated database is the fully integrated, logical composite of all constituent
databases in a federated database management system.
• The federated database system was defined by McLeod and Heimbigner.
Memory-mapped data structure:
• Useful when needed to do in-memory data manipulation and data structure is large.
• It is mainly used on the .NET platform and is performed with C# or VB.NET.
• It is a much faster way of accessing the data than using a MemoryStream.

Data warehousing
The various primary concepts used in data warehousing would be:
• ETL (Extract Transform Load)
• Component-based (Data Mart)
• Dimensional Models and Schemas
• Metadata driven

Comparison between federated database and datawarehouse


Federated database:
• Preferred when the databases are present across various locations over a large area
• Data would be present in various servers
• Requires a high-speed network connection
• It is easier to create as compared to a data warehouse
• Requires a network expert to set up the network connection

Data warehouse:
• Preferred when the source information can be taken from one location
• The entire data warehouse would be present in one server
• Requires no network connection
• Its creation is not as easy as that of the federated database
• Requires database experts
2. Elaborately describe the various data integration technologies with example.

Integration is divided into two main approaches:

Schema integration

• Multiple data sources may provide data on the same entity type.
• The main goal is to allow applications to transparently view and query this data as one
uniform data source, and this is done using various mapping rules to handle structural
differences.
• "Schema integration is developing a unified representation of semantically similar
information, structured and stored differently in the individual databases".

Example:

Branch 1: custid, transid, prodid, unit
C1 T1 P1 10

Branch 2: customerid, transid, prodid, unit
C2 T2 P2 20

Map the two databases using metadata information. Then the output will be:

Branch 1,2: custid, transid, prodid, unit
C1 T1 P1 10
C2 T2 P2 20
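The schema-integration example above can be sketched in pandas as follows; the column names follow the example, and the metadata mapping (customerid means custid) is an assumption applied before the branches are combined into one view:

```python
import pandas as pd

branch1 = pd.DataFrame({"custid": ["C1"], "transid": ["T1"],
                        "prodid": ["P1"], "unit": [10]})
branch2 = pd.DataFrame({"customerid": ["C2"], "transid": ["T2"],
                        "prodid": ["P2"], "unit": [20]})

# Metadata-driven mapping: customerid in branch 2 means custid.
branch2 = branch2.rename(columns={"customerid": "custid"})

# Unified view over both branches.
combined = pd.concat([branch1, branch2], ignore_index=True)
print(combined)
```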

Instance integration

• Data integration from multiple heterogeneous data sources has become a high-priority
task in many large enterprises.
• Hence to obtain the accurate semantic information on the data content, the
information is being retrieved directly from the data.
• It identifies and integrates all the instances of the data items that represent the real-
world entity, distinct from schema integration.

Example:
Database 1 :
Empno empname noofleave
E1 John kumar 4

Database 2 :
Empno empname noofpresent
E1 J. kumar 25

Database 3 :
Empno empname salary
E1 John.k 25000
Solution: use the common attribute Empno and replace the value in empname with one
consistent value, such as John kumar, in all places.

Database 1 :
Empno empname noofleave
E1 John kumar 4

Database 2 :
Empno empname noofpresent
E1 John kumar 25

Database 3 :
Empno empname salary
E1 John kumar 25000
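The same instance-integration step can be sketched in pandas, assuming Empno is the reliable common key: one canonical empname is chosen and propagated to every record that refers to the same real-world employee:

```python
import pandas as pd

leave   = pd.DataFrame({"Empno": ["E1"], "empname": ["John kumar"], "noofleave": [4]})
present = pd.DataFrame({"Empno": ["E1"], "empname": ["J. kumar"], "noofpresent": [25]})
salary  = pd.DataFrame({"Empno": ["E1"], "empname": ["John.k"], "salary": [25000]})

# Pick one canonical name per Empno (here simply taken from the first source).
canonical = leave[["Empno", "empname"]]

# Replace the name in every source with the canonical value.
present = present.drop(columns="empname").merge(canonical, on="Empno")
salary  = salary.drop(columns="empname").merge(canonical, on="Empno")
print(present)
print(salary)
```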

Data Integration Technologies

(i) Electronic Data Interchange (EDI):

• It refers to the structured transmission of data between organizations by electronic


means.
• It is used to transfer electronic documents from one computer system to another.
• It is more than mere E-mail.

(ii)Object Brokering/Object Request Broker (ORB):

• An ORB is a piece of middleware software that allows programmers to make
program calls from one computer to another via a network.
• It handles the transformation of in-process data structure to and from the byte
sequence.

(iii)Modeling techniques

• Entity-Relational Modeling
• Dimensional Modeling

Entity-Relational Modeling

• The Entity-Relationship Model (ER Model) is a conceptual model for designing
databases.
• This model represents the logical structure of a database, including entities, their
attributes and relationships between them.

Steps to drawing an ER model:


• Identify entities
• Identify relationships between various entities
• Identify the key attribute
• Identify the other relevant attributes for the entities
• Draw the ER diagram
• Review the ER diagram with business users.
Dimensional Modeling

It is a logical design technique. The main focus is to present data in a standard format for end
user consumption.
 Star schema
 Snowflake scheme
Every dimensional model is composed of one fact table and a number of dimensional tables.
3. Illustrate the way of maintaining data quality with its key dimensions, along with how to
ensure the quality.
• Data Quality refers to the accuracy, completeness, reliability, and relevance of data
for its intended use.
• High-quality data is essential for effective decision-making, analytics, and operations
in any data-driven environment.

Key dimensions of data quality


• Correctness
• Consistency
• Completeness
• Timeliness
• Uniqueness
• Integrity

Why Data Quality is Important:

• Improves decision-making
• Enhances customer satisfaction
• Reduces operational costs
• Supports compliance and reporting
• Strengthens analytics and business intelligence

How to Ensure/maintain Data Quality:

• Data profiling and auditing


• Validation rules and constraints
• Regular cleansing and deduplication
• Master Data Management (MDM)
• Data governance policies

Data integrity
• Data Integrity refers to the accuracy, consistency, and reliability of data throughout its
lifecycle — from creation to storage, processing, and retrieval.
• It ensures that data remains unchanged, valid, and trustworthy unless deliberately
modified through authorized processes.

Comparison between data integrity and data quality

Definition:
• Data Integrity – ensures data is accurate, consistent, and secure over its entire lifecycle
• Data Quality – measures how well data is fit for its intended purpose

Focus:
• Data Integrity – correct structure, relationships, and protection from corruption
• Data Quality – content accuracy, completeness, timeliness, and relevance

Scope:
• Data Integrity – more technical and database-focused
• Data Quality – broader business and analytical context

Maintained by:
• Data Integrity – database administrators, system architects, data engineers
• Data Quality – data stewards, business analysts

4. Describe the data profiling concepts and applications

• Data profiling is the process of examining, analyzing, and summarizing data to


understand its structure, content, and quality.
• It helps to assess the data's accuracy, completeness, consistency, and uniqueness,
which are essential for tasks like data cleaning, data integration, and data quality
improvement.

Example : Phone no (9727)483491

Objectives of Data Profiling:

• Understand the Data:


• Assess Data Quality:
• Support Data Management Tasks:

Data profiling can be either data quality profiling or database profiling

Data quality profiling – Analyzing the data from data source against certain specified rules or
requirements.

Database profiling – Analyze of database with respect to its scheme, relationship between
tables, columns used , data type of column, keys of the table.

Types of Data Profiling Techniques:

• Structure Discovery – Analyzes data types, lengths, formats, and schema adherence.
• Content Discovery – Examines actual data values (e.g., min, max, mean, frequency distribution).
• Relationship Discovery – Detects relationships across columns/tables (e.g., foreign key candidates).

When to conduct data profiling?

• At the requirement gathering phase


• Just before the dimensional modelling process
• During ETL package design
How to conduct data profiling?

• Data quality – analyze quality of data at data source


• Null values – look out for null values in an attribute
• Candidate keys – analyze the candidate keys
• Primary key selection
• Empty string values
• String length
• Numeric length and type
• Identification of cardinality
• Data format
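Several of these checks can be sketched with pandas (one of the tools listed later) against a small, hypothetical customer table:

```python
import pandas as pd

customers = pd.DataFrame({
    "custid": ["C1", "C2", "C3", "C3"],
    "name":   ["John", "Mary", None, "Ravi"],
    "phone":  ["(9727)483491", "9727483492", "", "9727483493"],
})

print(customers.isnull().sum())                 # null values per column
print(customers.nunique())                      # cardinality per column
print(customers["custid"].is_unique)            # candidate-key check (False: C3 repeats)
print(customers["phone"].str.len().describe())  # string length profile
print((customers["phone"] == "").sum())         # empty string values
```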

Applications / Data profiling software / Tools

Trillium Enterprise Data quality


• It scans all data systems
• Remove duplicate records
• Generate statistical reports about the data

Datiris Profiler
• Compatible with other applications
• Domain validation
• Command line interface
• Pattern analysis
• Real time data viewing

Talend Data Profiler


• Free open source software
• Good for small business and non-profit organizations

IBM InfoSphere information analyzer


• Perform deep scan of the system
• Scanning scheduler
• Reports
• Rules analysis
• Source system profiling and analysis

Oracle warehouse builder


• It is not strictly a data profiling software tool
• Provide necessary functionalities to clean data

SSIS data profiling task (Microsoft SQL Server Data Tools)


• It is not an independent software tool
• It is integrated into the ETL software SQL Server Integration Services (SSIS) provided by
Microsoft.

Pandas (in Python)

OpenRefine
5. Demonstrate the way of implementing ETL using SSIS components

ETL
Extract - Retrieve data from source systems (databases, files, APIs, etc.)
Transform - Clean, format, and convert data (e.g., remove duplicates, change data types)
Load - Store the transformed data into the target (e.g., SQL Server, data warehouse)

SQL Server Integration Services (SSIS) is a Microsoft tool used for ETL operations.

It is part of the Microsoft SQL Server Data Tools (SSDT) suite.

SSIS enables:

• Data migration and transformation


• Workflow creation and automation
• Integration from multiple sources (SQL, Excel, XML, etc.)

Implementing ETL using SSIS Components


• The ETL acronym stands for Extraction, Transformation, and Loading. This process
involves extracting, transforming, and loading data into the final repository.
• ETL is the process of loading data into the warehouse from the source system.
ETL process step-by-step.

• Extraction (E): Collection of data from different sources.


• Transformation (T): A different form of data obtained from different sources and
converted according to business needs.
• Loading (L): The Data warehouse contains the loaded data.
ETL is facilitated by SSIS components, which are listed below:
• Control Flow (for storing containers and tasks)
• Data Flow (Source, destination, and transformations)
• Event Handler (for managing messages and e-mails)
• Package Explorer (for offering an all-in-one view)
• Parameters (for fostering user interaction)

ETL Process in SSIS:

1. Create a New SSIS Project in SQL Server Data Tools (Visual Studio).
2. Add Data Flow Task in Control Flow.
3. Inside Data Flow:
• Use a Source (like OLE DB Source) to extract data.
• Apply Transformations (like Data Conversion, Conditional Split).
• Use a Destination (like SQL Server, Flat File, Excel) to load data.
4. Run or Schedule the Package using SQL Server Agent or command-line tools.
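SSIS itself is configured visually, but the same Extract-Transform-Load steps can be sketched in plain Python to make the flow concrete (the source file, column names and target table here are assumptions, not part of SSIS):

```python
import csv
import sqlite3

# Extract: read rows from a hypothetical source file.
with open("sales.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Transform: convert types and drop duplicate orders
# (comparable to Data Conversion plus de-duplication).
seen, cleaned = set(), []
for r in rows:
    key = r["order_id"]
    if key in seen:
        continue
    seen.add(key)
    cleaned.append((key, r["product"], int(r["quantity"]), float(r["price"])))

# Load: write the transformed rows into the target table.
con = sqlite3.connect("warehouse.db")
con.execute("CREATE TABLE IF NOT EXISTS sales "
            "(order_id TEXT, product TEXT, quantity INTEGER, price REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", cleaned)
con.commit()
con.close()
```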

SSIS Package

An SSIS Package is a collection of control flow and data flow tasks saved as a .dtsx file. It
can be deployed to the SSIS catalog for reuse, monitoring, and scheduling.

Benefits of SSIS for ETL

• High performance for large data volumes


• Drag-and-drop GUI for rapid development
• Built-in connectors for various sources and destinations
• Error handling and logging support
• Easy integration with other SQL Server services
UNIT – III
DATA FLOW AND TRANSFORMATIONS
Introduction to SSIS Architecture – Introduction to ETL using SSIS – Integration Services
Objects – Data Flow Components – Sources, Transformations and Destinations – Working
with Transformations, Containers, Tasks, Precedence Constraints and Event Handlers.

Part – A (2 Marks)

1. What is SQL Server Integration Services?

SQL Server Integration Services (SSIS) is a core component of the Microsoft SQL
Server ecosystem, designed to facilitate the automation of data migration, ingestion, and
transformation tasks.

2. Why should we use SSIS in the data flow and transformation process?

Key features of SQL Server Integration Services include:

• Integration with a variety of data sources


• Built-in error handling and logging
• Parallel processing
• Data transformations

3. What Is a SQL Server Integration Services Package?


• An SSIS package is a reusable collection of tasks executed in a sequential manner
to combine different datasets into one single dataset.
• It then loads the resulting merged dataset into the destination target table in the
same step.

4. Which are the key components of an SSIS Package?


• Tasks
• Containers
• Precedence Constraints

5. List out the data flow components in SSIS


• Sources
• Transformations
• Destinations.

6. Write the steps for creating data flow.


• Adding one or more sources to extract data from files and databases, and add
connection managers to connect to the sources.
• Adding the transformations that meet the business requirements of the package.
• Connecting data flow components by connecting the output of sources and
transformations to the input of transformations and destinations.
• Adding one or more destinations to load data into data stores such as files and
databases, and adding connection managers to connect to the data sources.
• Configuring error outputs on components to handle problems.

7. Write the procedures for debugging containers


• Breakpoints - Set breakpoints on container start/end events
• Logging - Enable container-specific logging
• Progress Reporting - Watch container execution in Progress tab
• Variables Window - Monitor variable changes during container execution

8. What is the use of data flow task


• Data Flow Task is a key component of SSIS that is responsible for moving and
processing data within ETL processes.
• The component identifies what information should be extracted, modified, and added
to target systems.

9. List out the various data preparation tasks


• File System Task
• FTP Task
• Web Service Task
• XML Task
• Data Profiling Task

10. Is it possible to combine constraints and event handler in SSIS? Justify your
answer.
Yes we can combine constraints and event handler in SSIS.
A robust SSIS package typically uses both:
Main Workflow:
• Uses precedence constraints for normal flow
• Handles the "happy path" scenario
Event Handlers:
• Handle exceptional conditions
• Manage cross-cutting concerns like logging

Part – B (Big Questions)

1. Elaborately describe SSIS architecture with neat diagram.

SSIS Architecture consists of the following parts:

Packages
• A package is a collection of control flow elements, data flow elements, variables,
event handlers, and connection managers.
• Initially, when you create a package, it is an empty object that does nothing.
• After you create the basic package, you can add advanced features such as log
providers and variables to extend its functionality.
Control flow
• A control flow contains one or more containers and tasks, and they execute when the
package runs.

Data flow
• A data flow contains the source and destination which are used to modify and extend
data, extract and load data, and the paths that link sources, transformations, and
destinations.
• A data flow task is executable within the SSIS package that creates, runs, and orders
the data flow.

Connection managers (connections)


• A connection manager is a link between the package and the data source.
• It defines the connection string for accessing the data. The package includes at least
one connection manager.

Event Handlers
• The event handler is a workflow that runs in response to the run-time events raised by
a package, container, or task.

Log Providers and logging


• The log is a collection of information about the package that is collected when the
package runs.

Variables
• Variables are used to evaluate an expression at the runtime.

Integration Services supports the two types of variables:

1. System variable – A system variable provides useful information about the package at
run time.

2. User-defined variable – A user-defined variable supports a custom scenario in the


package.
Tasks
• A task can be explained as an individual unit of the work.
• Write custom tasks using the programming language that supports COM, such as
Visual Basic, C#, or a .NET programming language.

Precedence Constraints
• Precedence constraints are the arrows in a Control flow of a package component that
direct tasks to execute in a predefined order and manage the order in which the tasks
will execute.
Transformations
• Transformations are the key components within the Data Flow that allow changes to
the data within the data pipeline.

Containers
• Container is the core unit in the SSIS architecture for grouping tasks together
logically into units of work. It allows us to declare variables and event handlers.

There are the following types of containers in SSIS:

• Sequence Container
• For loop Container
• Foreach loop container

Destinations
• SSIS destination is used to load data into a variety of database tables/views/SQL
commands. The destination editor provides an ability to create a new table.

2. Illustrate data flow implementation in SSIS using its data flow components.

SQL Server Integration Services provides three different types of data flow components:

• Sources
• Transformations
• Destinations.

Sources - extract data from data stores such as tables and views in relational databases, files,
and Analysis Services databases.

Transformations - modify, summarize, and clean data.

Destinations - load data into data stores or create in-memory datasets.

Data Flow Implementation


• Adding a Data Flow task to the control flow of a package is the first step in
implementing a data flow in a package.
• A package can include multiple Data Flow tasks, each with its own data flow.
• For example, if a package requires that data flows be run in a specified sequence, or
that other tasks be performed between the data flows, must use a separate Data Flow
task for each data flow.
Creating a data flow includes the following steps:

• Adding one or more sources to extract data from files and databases, and add
connection managers to connect to the sources.
• Adding the transformations that meet the business requirements of the package. A
data flow is not required to include transformations.
• Some transformations require a connection manager. For example, the Lookup
transformation uses a connection manager to connect to the database that contains the
lookup data.
• Connecting data flow components by connecting the output of sources and
transformations to the input of transformations and destinations.
• Adding one or more destinations to load data into data stores such as files and
databases, and adding connection managers to connect to the data sources.
• Configuring error outputs on components to handle problems.

Sources

• Extract data from external sources.


• In Integration Services, a source is the data flow component that makes data from
different external data sources available to the other components in the data flow.
• You can extract data from flat files, XML files, Microsoft Excel workbooks, and files
that contain raw data.
• You can also extract data by accessing tables and views in databases and by running
queries.
• A data flow can include a single source or multiple sources.
• The source for a data flow typically has one regular output. The regular output
contains output columns, which are columns the source adds to the data flow.

The following sources have properties that can be updated by property expressions:

Example:

• OLE DB Source
• Flat File Source
• Excel Source
• ADO.NET Source
• XML Source
• Raw File Source
• Oracle Source
• SAP BI Source
• Teradata Source

OLE DB Source: Reads from relational databases (e.g., SQL Server, Oracle).

Flat File Source: Reads from .csv, .txt, or other delimited files.

Excel Source: Reads data from Excel worksheets.

Transformations
• Modify, clean, merge, or reshape data during flow.
• The capabilities of transformations vary broadly.
• Transformations can perform tasks such as updating, summarizing, cleaning, merging,
and distributing data.
• Modify values in columns, look up values in tables, clean data, and aggregate column
values.
• The inputs and outputs of a transformation define the columns of incoming and
outgoing data.
• Depending on the operation performed on the data, some transformations have a
single input and multiple outputs, while other transformations have multiple inputs
and a single output.
• Transformations can also include error outputs, which provide information about the
error that occurred, together with the data that failed:
• For example, string data that could not be converted to an integer data type.
• The Integration Services object model does not restrict the number of inputs, regular
outputs, and error outputs that transformations can contain.
• Create custom transformations that implement any combination of multiple inputs,
regular outputs, and error outputs.
• The input of a transformation is defined as one or more input columns.
• Some Integration Services transformations can also refer to external columns as input.
Example, the input to the OLE DB Command transformation includes external
columns.
• An output column is a column that the transformation adds to the data flow.
• Both regular outputs and error outputs contain output columns. These output columns
in turn act as input columns to the next component in the data flow, either another
transformation or a destination.

Example:

• Derived Column: Add or modify columns using expressions.


• Data Conversion: Convert data types (e.g., string to integer).
• Conditional Split: Route rows based on conditions (like an IF statement).
• Lookup: Match and join data from another source.
• Merge Join: Join two sorted datasets (INNER, LEFT OUTER).
• Multicast: Duplicate data flow into multiple outputs.
• Union All: Combine data from multiple flows into one.
• Aggregate: Perform operations like SUM, AVG, COUNT.
• Script Component: Use custom .NET code to transform data.

Destinations

• Load data into target systems or files.


• A destination is the data flow component that writes the data from a data flow to a
specific data store, or creates an in-memory dataset.
• Load data into flat files, process analytic objects, and provide data to other processes.
Also load data by accessing tables and views in databases and by running queries.
• A data flow can include multiple destinations that load data into different data stores.
• An Integration Services destination must have at least one input. The input contains
input columns, which come from another data flow component.
• The input columns are mapped to columns in the destination.
• Many destinations also have one error output.
• The error output for a destination contains output columns, which typically contain
information about errors that occur when writing data to the destination data store.
• Errors occur for many different reasons. For example, a column may contain a null
value, whereas the destination column cannot be set to null.
• The Integration Services object model does not restrict the number of regular inputs
and error outputs.

Example:

• OLE DB Destination - Load into databases like SQL Server.


• Flat File Destination -Export to text files.
• Excel Destination - Write to Excel spreadsheets.
• Raw File Destination - Save in a raw SSIS-compatible format.

3. Describe the various types of SSIS transformation with implementation example.

Transformations modify data within a Data Flow task, enabling tasks like cleaning, merging,
sorting, joining, and distributing data.
Examples - Derived Column, Data Conversion, Conditional Split, Merge, etc.
How they work - Transformations read data from a source, apply the specified
transformations, and then pass the transformed data to the destination.

Common SSIS Transformations

(i). Row Transformations

• Derived Column: Creates new column values or replaces existing columns by


applying expressions
• Data Conversion: Converts data from one type to another
• Copy Column: Copies input columns to new output columns
• Character Map: Performs string operations like uppercase, lowercase, etc.

(ii). Rowset Transformations

• Aggregate: Performs calculations like SUM, AVG, COUNT, etc. across groups of
data
• Sort: Sorts data and optionally removes duplicates
• Pivot/Unpivot: Converts rows to columns (Pivot) or columns to rows (Unpivot)
• Percentage Sampling/Row Sampling: Extracts a sample of rows from the input

(iii). Split and Join Transformations

• Conditional Split: Routes rows to different outputs based on conditions


• Multicast: Sends each row to all outputs (1-to-many)
• Union All: Combines multiple inputs into one output (many-to-1)
• Merge: Combines two sorted inputs into one output
• Merge Join: Performs joins (inner, left outer, full outer) between two sorted inputs

(iv). Lookup Transformations

• Lookup: Performs exact match lookups against a reference dataset


• Fuzzy Lookup: Performs approximate matching using fuzzy logic
• Fuzzy Grouping: Identifies potential duplicate rows in data

(v). Advanced Transformations

• Slowly Changing Dimension (SCD): Implements Type 1 and Type 2 dimension


changes
• Term Extraction: Extracts terms from text columns
• Term Lookup: Looks up extracted terms against a reference table
• Data Mining Query: Executes data mining prediction queries
Best Methods for Using Transformations

Optimize Data Flow

• Place filters early in the data flow to reduce rows processed


• Use the most efficient transformation for your needs
• Avoid unnecessary transformations

Memory Management

• Be cautious with blocking transformations (Sort, Aggregate, etc.) that require all rows
before processing
• Consider using SSIS buffer tuning properties for large datasets

Error Handling

• Configure error outputs on transformations


• Implement robust logging
• Use event handlers for transformation errors

Performance Tips

• Use SQL operations where possible instead of SSIS transformations


• Consider temporary tables for complex transformations
• Parallelize data flows when possible

Example - Implementing a Common Transformation Pattern

A typical ETL pattern might use:

• Data Conversion to standardize data types


• Derived Column to add calculated fields
• Conditional Split to route data based on business rules
• Lookup to enrich data with reference information
• Aggregate to summarize data before loading
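For illustration only, the same pattern expressed as a pandas sketch (the data and the business rule are made up); each step mirrors one of the SSIS transformations named above:

```python
import pandas as pd

orders = pd.DataFrame({"order_id": ["1", "2", "3"],
                       "cust_id": ["C1", "C2", "C1"],
                       "amount": ["100.5", "20.0", "300.0"]})
customers = pd.DataFrame({"cust_id": ["C1", "C2"],
                          "region": ["South", "North"]})

# Data Conversion: standardize data types.
orders["amount"] = orders["amount"].astype(float)

# Derived Column: add a calculated field.
orders["amount_with_tax"] = orders["amount"] * 1.18

# Conditional Split: route rows based on a business rule.
large = orders[orders["amount"] >= 100]
small = orders[orders["amount"] < 100]

# Lookup: enrich with reference information.
enriched = large.merge(customers, on="cust_id", how="left")

# Aggregate: summarize before loading.
print(enriched.groupby("region")["amount_with_tax"].sum())
```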

Debugging Transformations

• Use data viewers to inspect data between transformations


• Implement row counts to verify expected record counts
• Use breakpoints on data flow components
• Check transformation metadata to ensure proper column mappings

4. Demonstrate working with different types of containers in SSIS with example

Working with Containers in SSIS

SQL Server Integration Services (SSIS) containers are fundamental control flow elements
that help organize packages, manage scope, and implement complex workflow logic.
Types of Containers in SSIS

(i). Sequence Container

• Groups tasks into logical units


• Executes tasks sequentially (unless constrained by precedence constraints)
• Provides transaction scope for all contained tasks
• Allows variable scoping (variables defined within are only visible inside)

Common Uses:
• Grouping related tasks
• Creating transaction blocks
• Limiting variable scope
• Enabling/disabling sections of packages

(ii). For Loop Container

• Implements a traditional programming loop structure


• Contains initialization expression, evaluation expression, and iteration expression
• Continues executing until evaluation expression returns false

Common Uses:
• Processing files with sequential names
• Executing a task a specific number of times
• Implementing counters

(iii). Foreach Loop Container

• Iterates through collections of objects


• Supports multiple enumerator types:

• Foreach File Enumerator: Files in a folder


• Foreach Item Enumerator: Custom list of items
• Foreach ADO Enumerator: Rows in ADO recordset
• Foreach ADO.NET Enumerator: Rows in ADO.NET dataset
• Foreach From Variable Enumerator: Objects in a variable
• Foreach NodeList Enumerator: XML nodes
• Foreach SMO Enumerator: SQL Server Management Objects

Common Uses:
• Processing multiple files in a directory
• Executing tasks for each row in a dataset
• Handling collections of server names or other objects

(iv). Task Host Container

• Implicit container that encapsulates individual tasks


• Not visible in the SSIS designer but present in the object model
• Provides properties and services to the task it contains
Container Properties and Features
1. Transaction Support
• Containers can participate in transactions (Supported, Required, Not
Supported)
• MSDTC must be configured for distributed transactions
2. Variable Scoping
• Variables defined within a container are only visible to that container and its
children
• Child containers can access parent variables (unless redefined)
3. Logging
• Containers can have their own logging configuration
• Useful for tracking container-level events
4. Expressions
• Containers support property expressions that can dynamically modify
properties at runtime
5. Event Handlers
• Containers can have their own event handlers (OnError, OnWarning, etc.)
Practical Examples
Example 1: File Processing with Foreach Loop
Foreach Loop Container (Foreach File Enumerator)
├─ Get all .csv files from C:\Data\Inbound
└─ Data Flow Task
   ├─ Flat File Source (uses current file from loop)
   └─ OLE DB Destination
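A rough Python analogue of this Foreach File pattern (the folder path follows the example; the staging database and table are assumptions), looping over every .csv file and loading each row into one destination table:

```python
import csv
import glob
import sqlite3

con = sqlite3.connect("staging.db")
con.execute("CREATE TABLE IF NOT EXISTS inbound (file TEXT, line TEXT)")

# Enumerate every .csv file in the inbound folder (like the Foreach File Enumerator).
for path in glob.glob(r"C:\Data\Inbound\*.csv"):
    with open(path, newline="") as f:
        for row in csv.reader(f):
            # Minimal "data flow": each row is written to the destination table.
            con.execute("INSERT INTO inbound VALUES (?, ?)", (path, ",".join(row)))

con.commit()
con.close()
```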

Example 2: Batch Processing with Sequence Container

Sequence Container (Transaction=Required)
├─ Execute SQL Task (Log batch start)
├─ Data Flow Task (Load data)
└─ Execute SQL Task (Log batch completion)
Example 3: Controlled Iteration with For Loop

For Loop Container (Counter from 1 to 10)
└─ Execute SQL Task
   └─ Executes stored procedure with current counter value

5. Explain different types of Precedence Constraints and Event Handlers in SSIS with
example.

Precedence constraints are the "connectors" that control workflow between tasks and
containers in SSIS control flows, determining the execution order based on conditions.

Types of Precedence Constraints

1. Success Constraint (green arrow)


• Next task executes only if previous task succeeds
• Most common constraint type
• Example: Load data only if file exists and was processed successfully

2. Failure Constraint (red arrow)


• Next task executes only if previous task fails
• Used for error handling workflows
• Example: Send notification email if data load fails
3. Completion Constraint (blue arrow)
• Next task executes regardless of previous task's outcome
• Used when you want to continue workflow no matter what
• Example: Log package execution details whether it succeeded or failed
Event Handlers
Event handlers are workflows that execute in response to specific events during package
execution.

Common Event Types

1. OnError
• Triggers when an error occurs
• Most commonly used event handler
• Example: Send email alert on task failure
2. OnExecStatusChanged
• Triggers when execution status changes
• Example: Log status changes to audit table
3. OnPreExecute/OnPostExecute
• Triggers before/after a task executes
• Example: Set variables before task runs or clean up after
4. OnWarning
• Triggers when a warning occurs
• Example: Log warnings to monitoring system
5. OnTaskFailed
• Triggers when a task fails
• Similar to OnError but more specific
Implementing Event Handlers

Add an event handler to a package

• At run time, containers and tasks raise events. Create custom event handlers that
respond to these events by running a workflow when the event is raised.
• For example: Create an event handler that sends an e-mail message when a task fails.
• An event handler is similar to a package. Like a package, an event handler can
provide scope for variables, and includes a control flow and optional data flows.
• Create event handlers by using the design surface of the Event Handlers tab in SSIS
Designer.

Add an event handler on the Event Handlers tab

• In SQL Server Data Tools (SSDT), open the Integration Services project that contains
the package you want.
• In Solution Explorer, double-click the package to open it.
• Click the Event Handlers tab.
• In the Executable list, select the executable for which you want to create an event
handler.
• In the Event handler list, select the event handler you want to build.
• Click the link on the design surface of the Event Handler tab.
• Add control flow items to the event handler, and connect items using a precedence
constraint by dragging the constraint from one control flow item to another.
• Optionally, add a Data Flow task, and on the design surface of the Data Flow tab,
create a data flow for the event handler.
• On the File menu, click Save Selected Items to save the package.

Set the properties of an event handler

Set properties in the Properties window of SQL Server Data Tools (SSDT) or
programmatically.
UNIT – IV
MULTIDIMENSIONAL DATA MODELING

Introduction to Data and Dimension Modeling –Types of Data Model – Data Modeling
Techniques – Fact Table – Dimension Table – Typical Dimensional Models – Dimensional
Model Life Cycle – Introduction to Business Metrics and KPIs – Creating Cubes using
SSAS.
Part – A (2 Marks)
1. What is data model? Why is there a need for data model?
• A data model is a diagrammatic representation of the data and the relationship
between its different entities.
• It assists in identifying how the entities are related through a visual representation of
their relationships and thus helps reduce possible errors in the database design

2. How does the conceptual data model differ from the logical data model?

Characteristic Conceptual Data Model Logical Data Model

Purpose High-level business view Detailed business requirements

Audience Business stakeholders Business analysts & data architects

Level of Detail Entities & relationships only Attributes, keys, data types

Normalization Not applicable Typically normalized (3NF)

Example Content Customer places Order Customer(cust_id, name, address)

3. How is the logical data model different from physical data model?

Characteristic Logical Data Model Physical Data Model

Purpose Detailed business requirements Technical implementation

Audience Business analysts & data architects DBAs & developers

Level of Detail Attributes, keys, data types Tables, columns, indexes, storage

Normalization Typically normalized (3NF) May be denormalized for performance

Example Content Customer(cust_id, name, address) CUSTOMERS table with VARCHAR(50)
4. Define Factless Fact.
Event fact tables, i.e. tables that record events, are called factless fact tables.
• For example, in an attendance recording scenario, attendance can be recorded in terms
of "yes" or "no" OR with pseudo facts like "1" or "0".
• In such scenarios, we can count the values, but adding them will give invalid values.

5. What are the benefits of dimensional modeling?


• Comprehensibility:
• Data is presented more subjectively, as compared with the objective nature of
data in a relational model.
• Data is arranged in coherent categories or dimensions to enable better
comprehension.
• Improved query performance:
• The structure is optimized for data analysis scenarios.

6. What is snowflaking?
• The snowflake design is the result of further expansion and normalization of the
dimension table.
• Dimension table is said to be snowflaked if the low-cardinality attributes of the
dimensions have been divided into separate normalized tables.
• These tables are then joined to dimension table with referential constraints (foreign
key constraints).
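A minimal T-SQL sketch of snowflaking, with assumed table and column names: the low-cardinality category attributes are moved out of the product dimension into a separate normalized table, which is joined back through a foreign key (referential) constraint.

-- Snowflaked dimension: Category is split out of the Product dimension (assumed names).
CREATE TABLE dbo.DimCategory
(
    CategoryID   INT PRIMARY KEY,
    CategoryName NVARCHAR(100) NOT NULL
);

CREATE TABLE dbo.DimProduct
(
    ProductID   INT PRIMARY KEY,
    ProductName NVARCHAR(100) NOT NULL,
    CategoryID  INT NOT NULL
        REFERENCES dbo.DimCategory (CategoryID)  -- referential (foreign key) constraint
);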

7. When do we snowflake?
The dimensional model is snowflaked under the following two conditions:

• The dimension table consists of two or more sets of attributes which define
information at different grains.
• The sets of attributes of the same dimension table are being populated by different
source systems.

8. Draw the dimensional model life cycle


9. Compare metrics and KPIs.

Aspect Metrics KPIs

Purpose General tracking Strategic performance measurement

Scope Broad (many tracked) Focused (few critical ones)

Impact Operational insights Tied to business success

10. What is a level of granularity of a fact table?


• The level of granularity is the level of detail that is placed into the fact table in a
data warehouse.
• It implies the level of detail that one is willing to record for each transactional fact.

Part – B (Big Questions)


1. Describe the different types of data model with neat diagram.
Conceptual data model

The conceptual data model is designed by identifying the various entities and the highest-
level relationships between them as per the given requirements.

Features of a conceptual data model

• It identifies the most important entities.

• It identifies relationships between different entities.

• It does not support the specification of attributes.

• It does not support the specification of the primary key.

A conceptual model is developed to present an overall picture of the system by recognizing
the business objects involved.

It defines what entities exist, NOT which tables.

For example, 'many-to-many' tables may exist in a logical or physical data model, but they are
just shown as a relationship with no cardinality under the conceptual data model.
Logical Data Model

The logical data model is used to describe data in as much detail as possible.

While describing the data, no consideration is given to the physical implementation aspect.

Features of a logical data model:


• It identifies all entities and the relationships among them.
• It identifies all the attributes for each entity.
• It specifies the primary key for each entity.
• It specifies the foreign keys (keys identifying the relationship between different
entities).
• Normalization of entities is performed at this stage.

Normalization:
• 1NF
• 2NF
• 3NF and so on

Physical Data Model

The physical data model is a representation of how the model will be built in the database.

It exhibits all the table structures, including column names, column data types, constraints,
primary keys, foreign keys, and the relationships between tables.
Features of physical data model
• Specification of all tables and columns.
• Foreign keys are used to identify relationships between tables.
• While logical data model is about normalization, physical data model may support de-
normalization based on user requirements.
• Physical considerations (implementation concerns) may cause the physical data model
to be quite different from the logical data model.
• Physical data model will be different for different RDBMS. For example, data type
for a column may be different for MySQL, DB2, Oracle, SQL Server, etc.

The steps for designing a physical data model are as follows:


• Convert entities into tables/relation.
• Convert relationships into foreign keys.
• Convert attributes into columns/fields.
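A small hedged example of these three steps in T-SQL (the entity, table and column names are assumed for illustration): the Customer and Order entities become tables, the "Customer places Order" relationship becomes a foreign key, and the attributes become typed columns.

-- Physical model sketch: entities -> tables, relationship -> foreign key, attributes -> columns.
CREATE TABLE dbo.CUSTOMERS
(
    cust_id INT          PRIMARY KEY,
    name    VARCHAR(50)  NOT NULL,
    address VARCHAR(100) NULL
);

CREATE TABLE dbo.ORDERS
(
    order_id   INT  PRIMARY KEY,
    cust_id    INT  NOT NULL REFERENCES dbo.CUSTOMERS (cust_id),  -- "Customer places Order"
    order_date DATE NOT NULL
);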

2. Illustrate the following data modeling techniques with example: (i) Normalization
modeling (ii) Dimensional modeling

(i)Normalization (Entity relationship) Modeling

Construct the ER diagram

An industry service provider, "InfoMechanists", has several Business Units (BUs) such as

• Financial Services(FS)
• Insurance Services (IS)
• Life Science Services (LSS)
• Communication Services (CS)
• Testing Services (TS)
Each BU has

• a Head as a manager
• Many employees reporting to him. Each employee has a current residential address.

There are cases where a couple (both husband and wife) are employed either in the same BU
or a different one.

In such a case, they (the couple) have same address.

For example, in an insurance project, the development and maintenance work is with
Insurance Services (IS) and the testing task is with Testing Services (TS). Each BU usually
works on several projects at a time.

Given the specifications mentioned above, design an ER model by using the following steps.

Identify all the entities.


• Business units
• BU head
• employee

Identify the relationships among the entities along with cardinality and participation
type(total/partial participation).
• One to one
• Many to one

Identify the key attribute or attributes.


• BU name
• EmpID
• ProjectID
• ClientID

Identify all other relevant attributes.


• EmpName, EmailID, PhoneNo
• ProjectName, StartDate, EndDate
• Street, City, State, Country

Plot the ER diagram with all attributes including key attribute(s).

The ER diagram is then reviewed with the business users.


(ii)Dimensional Modelling

An electronic gadget distributor company "ElectronicsForAll" is based out of Delhi, India.

The company sells its products in north, north-west, and western regions of India.

They have sales units at Mumbai, Pune, Ahmedabad, Delhi, and Punjab.

The President of the company wants the latest sales information to measure the sales
performance and to take corrective actions if required.

He has requested this information from his business analysts.

Sales Report of "ElectronicsForAll"

Representation 1 : The number of units sold = 113


Representation 2: No of units sold over time

January February March April

14 41 33 25

Representation 3: No of items sold for each product over time

Products January February March April

Digital Camera 6 17

Mobile Phones 6 16 6 8

Pen Drives 8 25 21

Representation 4: No of items sold in each region for each product over time

Region Products January February March April

Mumbai Digital Camera 3 10

Mobile Phones 3 16 6

Pen Drives 4 16 6

Pune Digital Camera 3 7

Mobile Phones 3 8

Pen Drives 4 9 15

• Dimensional modeling is the first step towards building a dimensional database, i.e. a
data warehouse.
• Dimensional modeling divides the database into two parts:
(a) Measurement and (b) Context.
• These measurements are usually numeric values called facts.
• Facts are enclosed by various contexts that are true at the moment the facts are
recorded. These contexts are intuitively divided into independent logical clumps
called dimensions.
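To make the measurement/context split concrete, the sales data of "ElectronicsForAll" could be modeled as one fact table surrounded by dimension tables. The T-SQL below is only a sketch; the table and column names are assumed.

-- Star-schema sketch for the sales example (assumed names).
CREATE TABLE dbo.DimProduct (ProductID INT PRIMARY KEY, ProductName NVARCHAR(100) NOT NULL);
CREATE TABLE dbo.DimRegion  (RegionID  INT PRIMARY KEY, City        NVARCHAR(50)  NOT NULL);
CREATE TABLE dbo.DimDate    (DateID    INT PRIMARY KEY, MonthName   NVARCHAR(20)  NOT NULL, [Year] INT NOT NULL);

CREATE TABLE dbo.FactSales
(
    DateID    INT NOT NULL REFERENCES dbo.DimDate    (DateID),
    ProductID INT NOT NULL REFERENCES dbo.DimProduct (ProductID),
    RegionID  INT NOT NULL REFERENCES dbo.DimRegion  (RegionID),
    UnitsSold INT NOT NULL  -- the measurement (fact); the keys supply the context (dimensions)
);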
3. What constitutes a fact table? What are the various types of facts? Explain using
examples.
• A fact table is the central table in a star schema or snowflake schema data warehouse
design.
• It contains the quantitative measurements or metrics of a business process.

Types of Fact

Additive facts

• These are the facts that can be summed up/aggregated across all dimensions in a fact
table.
• For example, discrete numerical measures of activity — quantity sold, dollars sold,
etc.
• Consider a scenario where a retail store "Northwind Traders" wants to analyze the
revenue generated.
• It can be in terms of any combination of multiple dimensions. Products, time, region,
and employee are the dimensions in this case.
• The revenue, which is a fact, can be aggregated along any of the above dimensions to
give the total revenue along that dimension.
• Such scenarios where the fact can be aggregated along all the dimensions make the
fact a fully additive or just an additive fact.
• Here revenue is the additive fact.

• The figure depicts the "SalesFact" fact table along with its corresponding dimension tables.
• This fact table has one measure, "SalesAmount", and three dimension keys,
"DateID", "ProductID", and "StoreID".
• The purpose of the "SalesFact" table is to record the sales amount for each product in
each store on a daily basis.
• In this table, "SalesAmount" is an additive fact because we can sum up this fact
along any of the three dimensions.
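As a hedged illustration of additivity (assuming a SalesFact table shaped as described above), SalesAmount can be summed along any dimension and every such query returns a meaningful total.

-- Total revenue by product (summing across Date and Store).
SELECT ProductID, SUM(SalesAmount) AS TotalSales
FROM   dbo.SalesFact
GROUP  BY ProductID;

-- Total revenue by store (summing across Date and Product).
SELECT StoreID, SUM(SalesAmount) AS TotalSales
FROM   dbo.SalesFact
GROUP  BY StoreID;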
Semi Additive facts

• These are the facts that can be summed up for some dimensions in the fact table, but
not all.
• For example, account balances, inventory level, distinct counts etc.
• The figure depicts the "AccountsFact" fact table along with its corresponding
dimension tables.
• The "AccountsFact" fact table has two measures: "CurrentBalance" and
"ProfitMargin".
• It has two dimension keys: "DateID" and "AccountID".
• "CurrentBalance" is a semi-additive fact.
• It makes sense to add up current balances for all accounts to get the information on
"what's the total current balance for all accounts in the bank?"
• However, it does not make sense to add up current balances through time.
• It does not make sense to add up the current balance of a given account
for each day of the month.
• Similarly, "ProfitMargin" is another non-additive fact, as it does not make sense to
add profit margins at the account level or at the day level.
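A sketch against an AccountsFact table shaped as described above (names assumed): summing CurrentBalance across accounts on a single day is meaningful, whereas across time an average (or the closing balance) is used instead of a sum.

-- Meaningful: total current balance of all accounts on one day.
SELECT DateID, SUM(CurrentBalance) AS TotalBalance
FROM   dbo.AccountsFact
WHERE  DateID = 20240531            -- assumed YYYYMMDD surrogate date key
GROUP  BY DateID;

-- Summing over time is not meaningful; an average over the period is used instead.
SELECT AccountID, AVG(CurrentBalance) AS AvgBalance
FROM   dbo.AccountsFact
GROUP  BY AccountID;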

Non Additive facts

• These are the facts that cannot be summed up for some dimensions present in the fact
table.
• For example, measurement of room temperature, percentages, ratios, factless facts,
etc.
• Non-additive facts are facts where SUM operator cannot be used to produce any
meaningful results.
• The following illustration will help you understand why room temperature is a non-
additive fact.

Date Temperature

5th May (7AM) 27

5th May (12 AM) 33

5th May (5 PM) 10

Sum 70 (Non-Meaningful result)

Average 23.3 (Meaningful result)


Examples of non-additive facts:

• Textual facts: Adding textual facts does not result in any number. However, counting
textual facts may result in a sensible number.
• Per-unit prices: Adding unit prices does not produce any meaningful number. For
example, the unit sales price or unit cost is strictly non-additive.
• Percentages and ratios: A ratio, such as gross margin, is non-additive. Non-additive
facts are usually the result of ratio or other calculations, such as percentages.
• Measures of intensity: Measures of intensity such as the room temperature are non-
additive across all dimensions.
• Summing the room temperature across different times of the day produces a totally
non-meaningful number.
• Averages: Facts based on averages are non-additive.
• For example, average sales price is non-additive. Adding all the average unit prices
produces a meaningless number.
• Factless facts (event-based fact tables): Event fact tables are tables that record
events.
• For example, event fact tables are used to record events such as Webpage clicks and
employee or student attendance.
• In an attendance recording scenario, attendance can be recorded in terms of "yes" or
"no" OR with pseudo facts like "1" or "0".
• In such scenarios, we can count the values, but adding them will give invalid values.
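For instance, a factless fact table for attendance is queried with COUNT rather than SUM; the sketch below uses assumed table and column names.

-- Factless fact table: one row per student per day attended (assumed names).
CREATE TABLE dbo.AttendanceFact
(
    DateID    INT NOT NULL,
    StudentID INT NOT NULL
);

-- Counting rows is meaningful; summing a pseudo fact of 1/0 would give invalid values.
SELECT DateID, COUNT(*) AS StudentsPresent
FROM   dbo.AttendanceFact
GROUP  BY DateID;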

4. Explain different types of dimensional tables with example.


• A dimension table is a companion table to a fact table in a data warehouse.
• It stores descriptive attributes (textual or categorical data) that provide context for the
numerical measures in the fact table.

Types of Dimension Tables

Degenerate Dimension

• A degenerate dimension is data that is dimensional in nature but is present in the
fact table.
• It is a dimension without any attributes. Usually, a degenerate dimension is a
transaction-based number.
• There can be more than one degenerate dimension in a fact table.
• They act as dimension keys in fact tables; however, they are not joined to
corresponding dimensions in other dimension tables as all their attributes are already
present in other dimension tables.
• Degenerate dimensions can also be called textual facts, but they are not facts as the
primary key for the fact table is often a combination of dimensional foreign keys and
degenerate dimensions.
• For example, an insurance claim line fact table typically includes both claim and
policy numbers as degenerate dimensions.
• A manufacturer can include degenerate dimensions for the quote, order, and bill of
lading numbers in the shipments fact table.
• The figure shows the "PointOfSaleFact" table along with other dimension tables. The
"PointOfSaleFact" table has two measures: AmountTransacted and QuantitySold.
• It has the following dimension keys: DateKey that links the "PointOfSaleFact" to
"DimDate", ProductID that links the "PointOfSaleFact" to "DimProduct", and
"StoreID" that links the "PointOfSaleFact" to "DimStore".
• Here, TransactionNo is a degenerate dimension as it is a dimension key without a
corresponding dimension table.
• All information/details pertaining to the transaction are extracted and stored in the
"PointOfSaleFact" table itself.
• There is no need to have a separate dimension table to store the attributes of the
transaction.
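A brief sketch of the PointOfSaleFact structure described above (column data types are assumed): TransactionNo sits directly in the fact table as a degenerate dimension, with no corresponding dimension table.

CREATE TABLE dbo.PointOfSaleFact
(
    DateKey          INT           NOT NULL,  -- links to DimDate
    ProductID        INT           NOT NULL,  -- links to DimProduct
    StoreID          INT           NOT NULL,  -- links to DimStore
    TransactionNo    NVARCHAR(20)  NOT NULL,  -- degenerate dimension: no DimTransaction table exists
    AmountTransacted DECIMAL(12,2) NOT NULL,
    QuantitySold     INT           NOT NULL
);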

Slowly Changing Dimension (SCD)

• In a dimension model, dimension attributes are not fixed as their values can change
slowly over a period of time.
• Here comes the role of a slowly changing dimension.
• A slowly changing dimension is a dimension whose attribute/attributes for a record
(row) change slowly over time, rather than changing on a regular schedule.
• Let us assume a company sells car-related accessories.
• The company decides to assign a new sales territory, Los Angeles, to its sales
representative, Bret Watson, who earlier operated from Chicago.
• How can you record the change without making it appear that Watson earlier held
Chicago?
• Let us take a look at the original record of Bret Watson. Now the original record has
to be changed as Bret Watson has been assigned "Los Angeles" as his sales territory,
effective May 1, 2011.
• This would be done through a slowly changing dimension.

Given below are the approaches for handling a slowly changing dimension:

Type-I (Overwriting the History)

• In this approach, the existing dimension attribute is overwritten with new data, and
hence no history is preserved.

Type-ll (Preserving the History)

• A new row is added into the dimension table with a new primary key every time a
change occurs to any of the attributes in the dimension table.
• Therefore, both the original values as well as the newly updated values are captured.

Type-III(Preserving One or more Versions of History)

• This approach is used when it is compulsory for the data warehouse to track historical
changes, and when these changes will happen only a finite number of times.
• Type-III SCDs do not increase the size of the table as compared to Type-II SCDs,
since old information is updated by adding the new information in additional columns.
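The three approaches can be sketched in T-SQL as follows, using the Bret Watson example for context; the DimSalesRep table and its columns are assumed for illustration.

-- Assumed table: DimSalesRep(SalesRepKey IDENTITY, SalesRepName, SalesTerritory,
--                            PreviousTerritory, EffectiveDate, RowStartDate, RowEndDate, IsCurrent)

-- Type I: overwrite the history (no trace of Chicago remains).
UPDATE dbo.DimSalesRep
SET    SalesTerritory = 'Los Angeles'
WHERE  SalesRepName = 'Bret Watson';

-- Type II: preserve the history; expire the current row and insert a new row
-- (a new surrogate key is generated by the IDENTITY column).
UPDATE dbo.DimSalesRep
SET    RowEndDate = '2011-04-30', IsCurrent = 0
WHERE  SalesRepName = 'Bret Watson' AND IsCurrent = 1;

INSERT INTO dbo.DimSalesRep (SalesRepName, SalesTerritory, RowStartDate, RowEndDate, IsCurrent)
VALUES ('Bret Watson', 'Los Angeles', '2011-05-01', NULL, 1);

-- Type III: preserve one version of history in an extra column of the same row.
UPDATE dbo.DimSalesRep
SET    PreviousTerritory = SalesTerritory,
       SalesTerritory    = 'Los Angeles',
       EffectiveDate     = '2011-05-01'
WHERE  SalesRepName = 'Bret Watson';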

Rapidly Changing Dimension (RCD)

• A dimension is considered to be a fast changing dimension, also called a rapidly
changing dimension, if its attributes change frequently.
• For example, consider a customer table having 1,00,000 rows.
• Assuming that on an average 10 changes occur in a dimension every year, then in one
year the number of rows will increase to 1,00,000 x 10 = 10,00,000.
• To identify a fast changing dimension, look for attributes having continuously
variable values.
• Some of the fast changing dimension attributes have been identified as:

• Age
• Income
• Test score
• Rating
• Credit history score
• Customer account status
• Weight

• One method of handling fast changing dimensions is to break off a fast changing
dimension into one or more separate dimensions known as mini-dimensions.
• The fact table would then have two separate foreign keys — one for the primary
dimension table and another for the fast changing attribute.

Role Playing Dimension (RPD)

• A single dimension that is expressed differently in a fact table with the usage of views
is called a role-playing dimension.
• Consider an on-line transaction involving the purchase of a laptop.
• The moment an order is placed, an order date and a delivery date will be generated.
• It should be observed that both the dates are the attributes of the same time dimension.
• Whenever two separate analyses of the sales performance are required, one in terms of the
order date and the other in terms of the delivery date,
two views of the same time dimension will be created to perform the analyses.
• In this scenario, the time dimension is called the role-playing dimension as it is
playing the role of both the order and delivery dates.
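A minimal sketch of the two views (table, view and column names assumed): the same physical DimDate table plays both the order-date role and the delivery-date role.

-- Assumed base time dimension.
CREATE TABLE dbo.DimDate (DateID INT PRIMARY KEY, FullDate DATE NOT NULL);
GO
-- The time dimension playing the "order date" role.
CREATE VIEW dbo.DimOrderDate AS
SELECT DateID AS OrderDateID, FullDate AS OrderDate FROM dbo.DimDate;
GO
-- The same time dimension playing the "delivery date" role.
CREATE VIEW dbo.DimDeliveryDate AS
SELECT DateID AS DeliveryDateID, FullDate AS DeliveryDate FROM dbo.DimDate;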

Junk Garbage Dimension (JGD)

• The garbage dimension is a dimension that contains low-cardinality


columns/attributes such as indicators, codes, and status flags.
• The garbage dimension is also known as junk dimension.
• The attributes in a garbage dimension are not associated with any hierarchy.
• Let us look at the following example from the healthcare domain. There are two
source tables and a fact table:
• For example, each of the source tables [CaseType (CaseTypeID, CaseTypeDescription)
and TreatmentLevel (TreatmentTypeID, TreatmentTypeDescription)] has only two
attributes each. The cardinality of each attribute is also low.
• One way to build the junk dimension will be to perform a cross-join of the source
tables.
• This will create all possible combinations of attributes, even if they do not or might
never exist in the real world.
• The other way is to build the junk dimension based on the actual attribute
combinations found in the source tables for the fact table.
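A hedged sketch of the first approach, cross-joining the two low-cardinality source tables to form the junk dimension; the target table name is assumed, while the source names follow the CaseType/TreatmentLevel example above.

-- Build the junk dimension as all possible combinations of the low-cardinality attributes.
SELECT  ROW_NUMBER() OVER (ORDER BY c.CaseTypeID, t.TreatmentTypeID) AS JunkDimID,
        c.CaseTypeDescription,
        t.TreatmentTypeDescription
INTO    dbo.DimCaseTreatmentJunk
FROM    dbo.CaseType AS c
CROSS JOIN dbo.TreatmentLevel AS t;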

5. Demonstrate the way of creating cubes using SSAS with procedures.


• SQL Server Analysis Services (SSAS) is a Microsoft OLAP (Online Analytical
Processing) tool that enables the creation of multidimensional cubes for advanced
business analytics.
• Cubes allow users to perform fast, interactive analysis of large datasets.

1. Prerequisites for Creating SSAS Cubes


 SQL Server with SSAS installed (Multidimensional mode)
 SQL Server Data Tools (SSDT) or Visual Studio with SSAS project template
 A prepared dimensional model (star/snowflake schema)
 Proper permissions to access data sources

2. Step-by-Step Cube Creation Process

Step 1: Create a New SSAS Project


 Open Visual Studio (with SSDT)
 Go to File → New → Project
 Select Analysis Services Multidimensional and Data Mining Project
 Name your project and click OK
Step 2: Define Data Sources
 In Solution Explorer, right-click Data Sources → New Data Source
 Click New to create a connection to your SQL Server database
 Select authentication method (Windows or SQL auth) and Test connection

Step 3: Create a Data Source View (DSV)


 Right-click Data Source Views → New Data Source View
 Select your data source
 Choose fact and dimension tables from your schema
 Define relationships between tables if not auto-detected
 Click Finish
Step 4: Build Dimensions
 Right-click Dimensions → New Dimension
 Use the Dimension Wizard
• Select creation method: "Use existing table"
• Choose the dimension table (e.g., Dim_Product)
• Select key and name columns
• Define hierarchies (e.g., Category → Subcategory → Product)
 Repeat for all dimensions (Customer, Date, etc.)

Step 5: Create the Cube


 Right-click Cubes → New Cube
 Use the Cube Wizard
• Select fact table (e.g., Fact_Sales)
• Choose measures (e.g., SalesAmount, Quantity)
• Select related dimensions
 Click Finish

Step 6: Configure Cube Properties


 Measures: Organize into measure groups
 Calculations: Add MDX calculations (e.g., Profit = Sales - Cost)
 KPIs: Define Key Performance Indicators
 Perspectives: Create simplified views of the cube
 Partitions: Optimize large cubes by splitting fact data

Step 7: Deploy and Process the Cube


 Right-click project → Properties
• Set Server (SSAS instance name)
• Set Database name
 Right-click project → Deploy (builds and deploys to SSAS)
 Process the cube to load data:
• Right-click cube → Process
• Choose Full Process (or incremental)

Step 8: Browse and Validate


 Go to the Browser tab in cube designer
 Drag and drop measures/dimensions to analyze data
 Test drill-down, slicing, and dicing operations
UNIT – V
ENTERPRISE REPORTING
Introduction to Enterprise Reporting – Reporting Perspectives Common to all Levels of
Enterprise – Report Standardization and Presentation Practices – Enterprise Reporting
Characteristics in OLAP – Concepts of Balanced Scorecards, Dashboards – Create
Dashboards – Scorecards vs Dashboards – Introduction to SSRS Architecture – Enterprise
Reporting using SSRS.
Part – A (2 Marks)
1. What is enterprise reporting?
• Enterprise reporting refers to the process of collecting, organizing, and distributing
business data across an organization to support decision-making.
• It involves structured reporting systems that provide insights into key business
metrics, operational performance, and financial health.

2. Compare enterprise reporting with traditional reporting.

Feature Enterprise Reporting Traditional Reporting

Scope Organization-wide Department-specific

Automation High (scheduled refreshes) Manual (Excel-based)

Data Source Multiple integrated systems Single database/Excel

Audience Executives, managers, analysts Individual teams

3. List out the types of enterprise reports.


• Operational Reports
• Analytical Reports
• Strategic Reports
• Regulatory Reports
4. Which are the report delivery formats?
• Printed Reports
• Secure Soft Copy
• Email attachment
• Link to reports
• Worksheet, PowerPoint Presentation, text, eBook
5. Write the enterprise reporting characteristics in OLAP.
• Single version of truth
• Role-based delivery
• Anywhere/anytime/any-device access
• Personalization
• Security
• Alerts
• Reports repository

6. What are the long term benefits of enterprise reporting?


• Enhanced collaboration
• Objective communication
• Reduced cost of audits/reviews
• Reduced decision cycle time
• Better predictability and ability to influence goals

7. Show the benefits of balanced scorecard.


• Translating the organization's strategies into measurable parameters
• Communicating the strategies to all the individuals in the organization
• Alignment of individual goals with the organization's strategic objectives

8. What are dashboards?


• A dashboard is a graphical user interface that organizes and presents information in a way
that is easy to read.
• It provides at-a-glance insight into what is actually happening in an organization.

9. Why do enterprises need dashboards?


• Faster decision-making – No manual data gathering
• Identify trends – Spot opportunities or risks early
• Improve transparency – Shared access to metrics
• Reduce reporting workload – Automated vs. static report
10. Distinguish between dashboards and scorecards.

Aspect Balanced Scorecard Dashboards

Business Use Performance Measure Monitor Operations

Users Senior Executives Operations Managers

Used by Corporate/Unit Corporate/Department

Data Summary Detail

Refresh Monthly/Quarterly/Annual Intra-day

Part – B(Big Questions)


1. Discuss the reporting perspectives common to all levels of enterprises.
• Enterprises have headquarters and several regional centres.
• Each geographic location may have "revenue generating customer
facing units" and "support units".
• There could be regional or corporate level support functions as well.
• It is natural to expect IT enabled reporting to occur at local, regional, or
corporate levels.
Common perspectives of reporting that apply at all levels of the enterprise.
Function level
• Reports being generated at the function level may be consumed by users within a
department or geographic location or region or by decision makers at the corporate
level.
• One needs to keep in mind the target audience for the reports.
• The requirements will vary based on the target audience.
• Reports could be produced in many languages to meet global user needs.

Internal/external
• Sometimes the consumers of reports may be external to the enterprise.
• We are very familiar with the annual reports of organizations.
• Correctness as well as attractive presentation of the report is of paramount importance.
Role-based
• Today we are witnessing massive information overload.
• The trend is to provide standard format of report to similar roles across the enterprise,
as they are likely to make similar decisions.
• For example, a sales executive responsible for strategic accounts will need
similar information/facts for decision making irrespective of the country/ products
he/she handles.
Strategic/operational
• Reports could also be classified based on the nature of the purpose they serve.
• Strategic reports inform the alignment with the goals, whereas operational reports
present transaction facts.
• The quarterly revenue report indicates variance with regard to meeting targets,
whereas the daily cash flow summary indicates summary of day's business
transactions.
• When consolidated across several locations, regions, products/services, even this
report will be of strategic importance.

Summary/detail
• As the name suggests, summary reports do not provide transaction-level information.
• Several summaries could be aggregated to track enterprise-level performance.

Standard/ad hoc
• Departments tend to generate periodic reports, say, weekly, monthly, or quarterly
reports in standard formats.
• Executives many times need ad hoc or on-demand reports for critical business decision making.

Purpose
• Enterprises classify as statutory those reports that focus on business transparency and
need to be shared with regulatory bodies.
• For example, a bank reporting stipulated parameters of its operations to the Reserve
Bank.
• Analytical reports look into a particular area of operation like sales, production, and
procurement, and they find patterns in historical data.
• These reports typically represent large data interpretations in the form of graphs.
• Scorecards are used in modern enterprises to objectively capture the key
performances against set targets and deviation with reasons.
Technology platform-centric

• Reporting in today's context need not use paper at all.


• Dashboards could be delivered on smartphones and tablets.
• Reports could be published in un-editable (secure) form with watermarks.
• Reports could be protected to be used by a specific person, during specific hours from
specific device.
• Reports could be delivered to the target user in user-preferred formats, through an email link as
well.
• Security of data is a constant concern in large enterprises, as reports represent the
secret recipe of the business.
• Several tools have emerged in the marketplace to meet the reporting
requirements of the enterprise.

2. Illustrate the various report standardization and presentation practices in
enterprises.
Enterprises tend to standardize reporting from several perspectives.
(i) Report standardization perspectives :
Data standardization
• This standardization perspective enables enterprise users to receive common,
pre-determined data.
Content standardization
• Enterprises focus on content standardization.
• This is tightly tied to the name of the report.
Presentation standardization
• Enterprises set standards on naming conventions, date formats, color
standards, use of logos, fonts, page formats, and so on.
Metrics standardization
• Enterprises find the metrics that reflect the performance of the project.
Reporting tools standardization
• Enterprises deploy specific classes of reporting tools for different requirements,
locations, and audiences.
(ii)Common Report Layout Types
Tabular reports
• Tabular reports have a finite number of columns, representing the fields in a database.
• It has header and footer, and repeating detail rows.
• Data can be grouped on various fields.
• Each group can have its own header, footer, breaks, and subtotal.
• It is used for logging detailed transactions.

Matrix report
• A matrix (cross-tab) report aggregates data along the x-axis and y-axis of a grid to form a
summarized table.
• Matrix report columns are not static but are based on the group values.
List reports
• A list report has a single, rectangular detail area that repeats for every record or group
value in the underlying data set.
• The main purpose is to contain other related data regions and report items and to
repeat them for a group of values.
Chart reports
• Chart reports provide a visual context for many different kinds of data.
• There are several chart forms that can be used in a chart report, such as bar chart,
line graph, column chart, etc.
Gauge reports
• A gauge report is a data visualization tool that displays a single key performance
indicator (KPI) or metric in a format resembling a speedometer or dial gauge.
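As a rough, illustrative comparison (not from the text), the difference between the tabular and the matrix (cross-tab) layouts above corresponds roughly to the difference between a plain GROUP BY and a PIVOT query over an assumed Sales table.

-- Tabular layout: one summary row per group value.
SELECT Region, Product, SUM(SalesAmount) AS TotalSales
FROM   dbo.Sales
GROUP  BY Region, Product;

-- Matrix (cross-tab) layout: the month values become columns.
SELECT Region, [January], [February], [March], [April]
FROM   (SELECT Region, MonthName, SalesAmount FROM dbo.Sales) AS s
PIVOT  (SUM(SalesAmount) FOR MonthName IN ([January], [February], [March], [April])) AS p;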

3. Explain the balanced scorecard in terms of strategy map, measurement
system and management system.

• A Balanced Scorecard (BSC) is a strategic planning and management system that
aligns business activities with organizational vision and strategy.
• It provides a balanced view by tracking financial and non-financial metrics across
four key perspectives:
• Financial
• Customer
• Internal Processes
• Learning & Growth
• Developed by Robert Kaplan and David Norton in the 1990s, it helps organizations
measure performance holistically, not just financially.
(i) Balanced scorecard as strategy map
• The Balanced Scorecard Strategy Map Describes How the Company Intends to
Create Value for Shareholders and Customers
Each of the four balanced scorecard perspectives can be described in terms of the
following parameters:
• Objectives
• Measurement
• Target
• Initiative
(ii) Measurement system
• The measurement system interconnects the objectives and the measures in the various
perspectives so that they can be validated and managed.
• Example: an airline company wishes to reduce its operating costs. It has decided to
reduce the number of planes and increase the frequency of the flights.
• To ensure on-time departure of flights, the airline company ensures that the turnaround
time is less.
• The airline company achieves this by training and improving the skill sets of the crew.
The cause-and-effect relationship among the four balanced scorecard perspectives in the
case of the airline example:

The tabular representation of the objective, measures, target and initiative concerning
the airline example:

Objective: Fast Ground Turnaround
Measurement: On-Ground Time; On-Time Departure
Target: 30 mins; 90%
Initiative: Cycle Time Optimization
(iii) Balanced scorecard as a management system
The balanced scorecard translates an organization's mission and strategies into tangible
objectives and measures.
Steps for designing the balance scorecard
• Clarify and translate vision and strategy
• Communicate and link strategic objectives and measures
• Plan, set targets, and align strategic initiatives
• Enhance strategic feedback and learning

4. Explain the process of creating dashboards and describe the different types of
dashboards with the help of a neat diagram.
Steps for creating dashboards
First step
Identify the data that will go into an Enterprise Dashboard.
Enterprise Dashboards can contain either or both of the following types of data:
• Quantitative data
• Non-Quantitative data
Second step
Decide on the timeframe.
E.g.: The various timeframes can be:
• This month to date
• This quarter to date
• This year to date
• Today so far

Third step
Decide on the comparative measures

• E.g.: the comparative measures can be:


• The same measure at the same point in time in the past
• The same measure at some other point in time in the past

Fourth step
Decide on the evaluation mechanisms
• E.g.: the evaluation can be performed as follows:
• Using visual objects e.g. traffic lights
• Using visual attributes e.g. red color for the measure to alert a serious
condition

Tips for creating dashboard


• Don't make your dashboard a data repository
• Avoid fancy formatting
• Limit each dashboard to one printable page

Types of dashboard
Enterprise performance Dashboards
• Provide an overall view of the entire enterprise, rather than specific business functions.
• Typical portlets in an Enterprise Performance Dashboard include:
• Corporate financials
• Sales revenue
• Business Unit KPIs [Key Performance Indicators]
• Supply chain information
• Compliance or regulatory data
• Balanced scorecard information
Customer Support Dashboards
• Organizations provide such a dashboard to its customers as a value-add service.
• They provide the customer with their personal account information as pertaining to
the business relationship, such as:
• Online Trading
• Utility Services
• Entertainment
• B2B SLA Monitoring
Divisional Dashboards

• One of the most popular dashboards, used to provide at-a-glance actionable
information to division heads, operational managers and department managers.
• Each division has its own set of KPIs which can be visually displayed on the
enterprise dashboard.
• Typical Divisional Dashboards include:
• Purchasing Dashboards
• Supply Chain Dashboards
• Operations Dashboards
• Manufacturing Dashboards
• Quality Control Dashboards
• Marketing Dashboards
• Sales Dashboards
• Finance Dashboards
• Human Resources Dashboards

5. Explain the architecture of SQL Server Reporting Services (SSRS) by identifying its
key components and illustrating their interactions with a well-structured diagram.

SQL Server Reporting Services (SSRS) is a comprehensive, server-based reporting
platform that provides a full range of ready-to-use tools and services to create, deploy, and
manage reports for your organization.

Key Components of SSRS Architecture

1. Report Server

The core component that forms the heart of SSRS architecture:

• Processes report requests


• Delivers reports in various formats
• Handles scheduling and subscriptions
• Manages security and authentication

2. Report Manager

• Web-based interface for managing reports and report server content


• Provides access controls and configuration settings
• Allows users to view and navigate reports

3. Report Designer

• Tool for creating reports (available in Visual Studio/SQL Server Data Tools)
• Supports drag-and-drop functionality
• Provides a preview mode for testing reports

4. Report Builder

• Standalone, user-friendly report authoring tool


• Designed for business users (less technical than Report Designer)
• Supports ad-hoc report creation

5. Report Server Database

• It stores metadata, resources, report definitions, security settings, delivery data, and so
on.

6. Data Sources

• Reporting Services retrieves data from data sources such as relational and
multidimensional data sources.

SSRS Processing Architecture

Authoring Phase - Reports are created using Report Designer or Report Builder (RDL files)

Management Phase - Reports are published to the Report Server

Processing Phase

• Report Processor retrieves the report definition


• Data Processing Extension connects to data sources and retrieves data
• Report Processor combines data with layout to produce the report

Rendering Phase - Report is converted to the requested output format (PDF, Excel, HTML,
etc.)

Delivery Phase - Report is delivered to the user via browser, email (subscription), or file
share
Processing of SQL Server Reporting Services

This diagram shows the processing of SQL Server Reporting Services.
