BI Mod 3
Q 3] Explain multidimensional database design ?
Ans : Multidimensional database : Multidimensional database design refers to the organization and
structuring of data in a way that allows for efficient and intuitive analysis of information from
multiple dimensions. In simpler terms, it is a method of arranging data that enables users to view it
from various perspectives simultaneously. Multidimensional database design is crucial for effective
data warehousing and OLAP (Online Analytical Processing) systems.
1. Star Schema:
o Fact Tables: Central tables containing quantitative data (measures) like sales amounts, quantities,
etc.
o Dimension Tables: Surrounding tables that contain descriptive attributes related to the fact
table, like time, product, or location.
o Design: Fact tables are connected to dimension tables through foreign keys, creating a star-like
structure.
2. Snowflake Schema:
o Normalization: Dimension tables are normalized into multiple related tables, reducing
redundancy and improving data integrity.
o Design: A variation of the star schema where dimension tables are split into hierarchical layers,
creating a snowflake-like structure.
3. Fact Constellation (Galaxy) Schema:
o Multiple Fact Tables: Contains multiple fact tables that share dimension tables.
o Design: Useful for complex databases where multiple fact tables relate to the same dimensions
but are used for different analytical purposes.
4. Normalized (Relational) Schema:
o Design: Useful in scenarios where OLAP operations are not as critical and transactional data
processing is more important.
5. OLAP Cubes:
o Cube Design: Defines measures, dimensions, and hierarchies to allow for multidimensional
analysis.
o Design: Cubes store aggregated data and allow for quick querying and analysis across different
dimensions.
These techniques help structure and organize data for efficient querying and analysis,
supporting various business intelligence and reporting needs.
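For illustration, here is a minimal star schema sketch in standard SQL. The sales_fact, dim_date, and
dim_product names and columns are assumptions chosen for the example, not taken from any particular system:

    -- Dimension tables: descriptive attributes (time and product)
    CREATE TABLE dim_date (
        date_key   INTEGER PRIMARY KEY,
        full_date  DATE,
        month      INTEGER,
        year       INTEGER
    );

    CREATE TABLE dim_product (
        product_key   INTEGER PRIMARY KEY,
        product_name  VARCHAR(100),
        category      VARCHAR(50)
    );

    -- Fact table: quantitative measures, linked to each dimension by a foreign key
    CREATE TABLE sales_fact (
        date_key      INTEGER REFERENCES dim_date (date_key),
        product_key   INTEGER REFERENCES dim_product (product_key),
        sales_amount  DECIMAL(12,2),
        quantity      INTEGER
    );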
Q 4] Write a note on physical database design ?
Ans : Physical database design is a critical phase in the database development process that
translates the logical data model into a physical structure that is optimized for performance, storage
efficiency, and data integrity. It involves making decisions about how data will be stored, accessed,
and managed on physical storage media.
1. Partitioning :-
* Partitioning allows the data of one "logical" table to be spread across multiple physical datasets.
* The physical data distribution is based on a partitioning column, which is most commonly a date.
* The partitioning column cannot be a derived column, and it cannot contain NULL values (see the
sketch below).
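A minimal sketch of date-based partitioning, written in PostgreSQL-flavored SQL; the sales_fact table
and the yearly ranges are assumptions, and the exact syntax varies by DBMS:

    CREATE TABLE sales_fact (
        sale_date     DATE NOT NULL,   -- partitioning column: cannot be NULL
        product_key   INTEGER,
        sales_amount  DECIMAL(12,2)
    ) PARTITION BY RANGE (sale_date);

    -- Each partition is a separate physical dataset holding one year of data
    CREATE TABLE sales_fact_2023 PARTITION OF sales_fact
        FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
    CREATE TABLE sales_fact_2024 PARTITION OF sales_fact
        FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');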
2. Clustering :-
* Clustering is a very useful technique for sequential access of large amounts of data; rows that are
frequently retrieved together are stored physically close to each other on disk.
* Note that the term "database clustering" is also used for connecting more than one database instance
or server to a system; that is a separate concept from the physical clustering of table rows.
3. Indexing :-
* Indexing is a data structure technique used to locate and quickly access data in a database.
* There are two extreme indexing strategies: one is to index everything, and the other is to index
nothing.
* The objective of indexing is to organize and categorize information in a way that makes it easier to
retrieve and access (see the sketch below).
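A small indexing sketch in standard SQL, continuing the hypothetical sales_fact example; the column
choices are assumptions:

    -- Index the columns that common queries filter or join on
    CREATE INDEX idx_sales_fact_date ON sales_fact (sale_date);
    CREATE INDEX idx_sales_fact_product ON sales_fact (product_key);

    -- A query like this can now use idx_sales_fact_date instead of scanning the whole table
    SELECT SUM(sales_amount)
    FROM sales_fact
    WHERE sale_date = DATE '2024-06-01';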
4. Backup and Recovery :-
* Backup refers to creating copies of important documents and data that are stored on your
computer. This process includes backing up your database, videos, and other media.
* Recovery is the process of recovering deleted or damaged data from backups. If you accidentally
delete something or data gets corrupted, you can restore it from your backup.
5. Parallel Query Execution :-
* To improve the performance of a query, break down a single query into components to be run
concurrently.
* Performance is greatly increased when multiple portions of one query run in parallel on multiple
processors.
* Parallel processing is a very important concept for BI applications and should be considered
whenever possible (see the sketch below).
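A brief sketch of encouraging parallel query execution, in PostgreSQL-flavored SQL; the setting name is
PostgreSQL-specific and the table is the hypothetical sales_fact from above, so treat this as an
assumption-laden example:

    -- Allow up to four worker processes to cooperate on one query step
    SET max_parallel_workers_per_gather = 4;

    -- The planner can now scan sales_fact in parallel and combine partial sums
    EXPLAIN
    SELECT product_key, SUM(sales_amount)
    FROM sales_fact
    GROUP BY product_key;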
Q 7] List and define the deliverables resulting from the database design
activities ?
ANS : i] Physical data model :- The physical data model, also known as the logical database design, is a
diagram of the physical database structures that will contain the BI data. Depending on the selected
database design schema, this diagram can be an entity-relationship diagram, a star schema diagram,
or a snowflake diagram. It shows tables, columns, primary keys, foreign keys, cardinality, referential
integrity rules, and indices.
ii] Physical design of the BI target databases :- The physical database design components include
dataset placement, index placement, partitioning, clustering, and indexing. These physical database
components must be defined to the DBMS when the BI target databases are created.
iii] Data definition language :- The DDL is a set of SQL instructions that tells the DBMS what types of
physical database structures to create, such as databases, tablespaces, tables, columns, and indices.
iv] Data control language :- The DCL is a set of SQL instructions that tells the DBMS what types of
CRUD access to grant to people, groups, programs, and tools (see the sketch after this list).
v] Physical BI target databases :- Running (executing) the DDL and DCL statements builds the actual BI
target databases.
vi] Database maintenance procedures :- These procedures describe the time and frequency
allocated for performing ongoing database maintenance activities, such as database backups,
recovery (including disaster recovery), and database reorganizations. The procedures should also
specify the process for and the frequency of performance-monitoring activities.
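A minimal DDL and DCL sketch in standard SQL; the sales_fact table and the bi_readers group are
hypothetical names chosen for the example:

    -- DDL: create a physical database structure
    CREATE TABLE sales_fact (
        date_key      INTEGER,
        product_key   INTEGER,
        sales_amount  DECIMAL(12,2)
    );

    -- DDL: create an index on it
    CREATE INDEX idx_sales_fact_date ON sales_fact (date_key);

    -- DCL: grant read-only access to a group of business users
    CREATE ROLE bi_readers;
    GRANT SELECT ON sales_fact TO bi_readers;

    -- DCL: take access away again if it should not have been granted
    REVOKE SELECT ON sales_fact FROM bi_readers;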
* Data administrator :- The data administrator should provide the logical data model and the meta
data to the database administrator. The logical data model and the meta data will be helpful to the
database administrator when he or she designs the BI target databases.
* Database administrator :- The database administrator has the primary responsibility for database
design. He or she needs to know the access paths, weigh the projected data volumes and growth
factors, and understand the platform limitations. He or she must create and run the DDL and DCL to
build the physical databases. In addition, he or she is responsible for choosing the most appropriate
implementation options.
* ETL lead developer :- The ETL process is dependent on the database design. The ETL lead
developer should be involved in the database design activities in order to stay informed about any
database design changes that will affect the ETL process or the ETL programming specifications.
Q 9] What is the importance of incremental rollout in BI ?
ANS : When planning the implementation, use the same iterative approach used when developing
the BI application and the meta data repository. The iterative approach, or incremental rollout, works
well because it reduces the risk of exposing potential defects in the BI application to the entire
organization. In addition, it gives you the opportunity to informally demonstrate the BI concepts and
the BI tool features to the business people who were not directly involved in the BI project. Here are
some suggestions.
* Start with a small group of business people :- This small group should consist of not only "power
users" but also some less technology-savvy knowledge workers and business analysts, as well as the
primary business representative who was involved in the development work as a member of the core
team.
* Take the opportunity to test your implementation approach. You may consider adjusting your
implementation approach or modifying the BI application prior to the full rollout (e.g., change
cumbersome logon procedures).
* It may be necessary to duplicate implementation activities at multiple sites. Adding these sites
slowly over time is easier than launching them all at the same time.
* Organizations that have strong security umbrellas on their mainframes are more likely to pay
attention to security measures for their BI applications on multi-tier platforms.
* Organizations may unintentionally expose themselves to security breaches, especially if they plan
to deliver information from the BI target databases over the Web.
* Some security requirements are very granular. For example, a business rule may state that each
distributor is allowed to see only its own data in the BI target databases. No off-the-shelf umbrella
security solutions can impose this kind of security. This security requirement would have to be
implemented through the various security features of the database management system (DBMS) and
of the access and analysis tools used by the BI application.
* The solution of imposing security at a table level may not be granular enough.
* One possible way to achieve this type of security is to partition the tables either physically or
logically (through VIEWs).
* Partitioning will restrict access solely to the appropriate distributor as long as both the fact tables
and the dimension tables are partitioned.
* An alternative may be to enhance the meta data with definitions of data parameters, which could
control access to the data. This form of security would be implemented with appropriate program
logic that tells the meta data repository the distributor's identity, allowing the application to return
the appropriate data for that distributor only. This type of security measure will be only as good as
the program controlling it (a VIEW-based sketch follows below).
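A minimal sketch of logical partitioning through VIEWs, in standard SQL; the sales_fact table, the
distributor_id column, and the distributor_42 account are all hypothetical names for the example:

    -- One VIEW per distributor restricts access to that distributor's rows
    CREATE VIEW sales_fact_dist42 AS
        SELECT *
        FROM sales_fact
        WHERE distributor_id = 42;

    -- Grant the distributor access to its VIEW only, never to the base table
    GRANT SELECT ON sales_fact_dist42 TO distributor_42;
    REVOKE ALL ON sales_fact FROM distributor_42;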
* Internet security refers to security designed to protect systems and the activities of employees and
other users while connected to the internet, web browsers, web apps, websites, and networks.
* Internet security solutions protect users and corporate assets from cybersecurity attacks and
threats.
* Authentication is the process of identifying a person, usually based on a logon ID and password.
This process is meant to ensure that the person is who he or she claims to be.
* Encryption is the "translation" of data into a secret code. It is the most effective way to achieve
data security. To read an encrypted file, you must have access to a secret key or password that
enables you to decrypt it. Unencrypted data is usually referred to as plain text, while encrypted data
is usually referred to as cipher text.
BACKUP : A backup is a copy of data that can be used to restore the original after data loss or
corruption. Several backup types are relevant for BI target databases.
> Incremental backup :- An incremental backup is a backup type that only copies data that has been
changed or created since the previous backup activity was conducted.
> High-speed mainframe backup :- Another possibility is to use the mainframe transfer utilities to
pass BI data back to the mainframe for a high-speed backup, which is supported only on the
mainframe.
* Mainframe backup serves as a safety net, ensuring that in the event of data loss or corruption,
critical information can be restored to its original state.
> Partial backup :- A partial backup resembles a full database backup, but a partial backup does not
contain all the filegroups.
* While one partition is being backed up in a partial backup, the other partitions can remain available.
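A small sketch of differential (incremental-style) and partial backups in SQL Server's T-SQL, since
"filegroups" is SQL Server terminology; the SalesDW database and the file paths are assumptions:

    -- Differential backup: copies only data changed since the last full backup
    BACKUP DATABASE SalesDW
        TO DISK = 'D:\backups\salesdw_diff.bak'
        WITH DIFFERENTIAL;

    -- Partial backup: backs up the read-write filegroups, not the whole database
    BACKUP DATABASE SalesDW
        READ_WRITE_FILEGROUPS
        TO DISK = 'D:\backups\salesdw_partial.bak';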
RECOVERY : Data recovery is the process of restoring data that has been lost, unintentionally
deleted, corrupted, or made inaccessible.
> Physical data recovery :- Physical data recovery means recovering lost data after it has been
damaged by physical causes such as broken hardware or failed storage media.
> Instant data recovery :- In this data recovery methodology, whenever data is lost the user is
automatically redirected to a backup server, where they can continue working on their workload while
IT takes care of the restoration process in the background.
> Formatted drive recovery :- Data retrieval from hard disks that have been formatted or initialized
is achieved through the formatted hard disk recovery process.
> Logical data recovery :- Logical data recovery is a process of recovering data that has been lost due
to logical failures.
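To close the loop on backup and recovery, here is a minimal restore sketch in T-SQL, continuing the
assumed SalesDW example; names and paths are assumptions:

    -- Restore the last full backup, leaving the database ready for further restores
    RESTORE DATABASE SalesDW
        FROM DISK = 'D:\backups\salesdw_full.bak'
        WITH NORECOVERY;

    -- Apply the differential backup and bring the database online
    RESTORE DATABASE SalesDW
        FROM DISK = 'D:\backups\salesdw_diff.bak'
        WITH RECOVERY;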