
MODULE 3

MANAGING DATA RESOURCES


Data administration is responsible for the specific policies and procedures
through which data can be managed as an organizational resource. Responsibilities
include developing information policy, planning for data, overseeing logical
database design and data dictionary development, and monitoring how information
systems specialists and end-user groups use data. Large organizations often require
a formal data administration function.

Data governance deals with the policies and processes for managing the
availability, usability, integrity, and security of the data employed in an enterprise,
with special emphasis on promoting privacy, security, data quality, and compliance
with government regulations.

A large organization will also have a database design and management group that
is responsible for defining and organizing the structure and content of the database,
and maintaining the database. The functions it performs are called database
administration.

In managing data, steps must be taken to ensure that the data in organizational
databases are accurate and remain reliable. Data that are inaccurate, untimely, or
inconsistent with other sources of information lead to incorrect decisions, product
recalls, and even financial losses.

A good database design also includes efforts to maximize data quality and
eliminate error. Some data quality problems result from redundant and inconsistent
data, but most stem from errors in data input. Organizations need to identify and
correct faulty data and establish better routines for input and editing.

A data quality audit can be performed by surveying entire data files, surveying samples of data files, or surveying end users' impressions of data quality. Data cleansing (or data scrubbing) techniques can be used to correct data and enforce consistency among different sets of data.
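
To make the idea of data cleansing concrete, here is a minimal sketch in Python; the field names and normalization rules are invented for illustration and are not taken from any particular system.

    # A minimal data-cleansing sketch: standardize formats and drop duplicates.
    # The field names and normalization rules are illustrative assumptions only.
    records = [
        {"name": " Ramesh ", "city": "ahmedabad", "phone": "079-12345"},
        {"name": "RAMESH",   "city": "Ahmedabad", "phone": "079 12345"},
        {"name": "Khilan",   "city": "Delhi",     "phone": "011-67890"},
    ]

    def clean(record):
        """Trim whitespace, normalize case, and keep only digits in phone numbers."""
        return {
            "name": record["name"].strip().title(),
            "city": record["city"].strip().title(),
            "phone": "".join(ch for ch in record["phone"] if ch.isdigit()),
        }

    seen = set()
    cleaned = []
    for rec in records:
        rec = clean(rec)
        key = (rec["name"], rec["phone"])      # duplicate-detection key
        if key not in seen:
            seen.add(key)
            cleaned.append(rec)

    print(cleaned)   # the first two rows collapse into one consistent record
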
What Is Data Management?
Data management is the development and execution of processes, architectures,
policies, practices and procedures in order to manage the information generated by
an organization.
The effective management of data within any organization has grown in importance in recent years as organizations are subject to an increasing number of compliance regulations, large increases in information storage capacity, and the sheer amount of data and documents being generated. This rate of growth is not expected to slow down; IDC predicts the amount of information generated will increase 29-fold by 2020. These large volumes of data from ERP systems, CRM systems and general business documents are often referred to as big data.

Why Is Data Management Important?


Data management is important because the data your organization creates is a very valuable resource. The last thing you want to do is spend time and resources collecting data and business intelligence, only to lose or misplace that information. In that case, you would have to spend time and resources again to regain the same business intelligence you already had.
93% of companies that lost their data center for 10 days or more due to a disaster filed for bankruptcy within one year of the disaster, and 50% of businesses that found themselves without data management for this same period filed for bankruptcy immediately (National Archives & Records Administration in Washington). As you can see, having a strong data management plan is very important to the success of your company. Below are a few other benefits of a strong data management plan.
Productivity
Good data management will make your organization more productive. On the flip
side, poor data management will lead to your organization being very inefficient.
Good data management makes it easier for employees to find and understand
information that they need to do their job. In addition, it allows them to easily
validate results or conclusions they may have. It also provides the structure for
information to be easily shared with others and to be stored for future reference
and easy retrieval.
Cost Efficiency
Another benefit of proper data management is that it allows your organization to avoid unnecessary duplication. By storing all data so that it is easily referable, it ensures you never have employees conducting the same research, analysis or work that has already been completed by another employee.
Operational Nimbleness
In business, the speed at which a company can make decisions and change direction is a key factor in determining how successful it can be. If a company takes too long to react to the market or its competitors, it can spell disaster. A good data management system allows employees to access information and be notified of market or competitor changes faster. As a result, the company can make decisions and take action significantly faster than companies with poor data management and data sharing systems.
Security Risks
In addition, there are multiple risks if your data is not managed properly and your information falls into the wrong hands. For example, electronics giant Sony fell prey to computer attacks that led to the theft of over 77 million PlayStation users' bank details. A strong data management system will greatly reduce the risk of this happening to your organization.
Reduced Instances Of Data Loss
With a data management system and plan in place that all your employees know and follow, you can greatly reduce the risk of losing vital information. With a data management plan, measures will be put in place to ensure that important information is backed up and retrievable from a secondary source if the primary source ever becomes inaccessible.
More Accurate Decisions
Many organizations use different sources of information for planning, trend analysis, and managing performance. Within an organization, different employees may even use different sources of information to perform the same task if there is no data management process and they are unaware of the correct source to use. A data management process designates the authoritative sources, so decisions across the organization are based on the same accurate data.

Data management challenges and how to overcome them
1. Sheer volume of data
Every day, it’s estimated that 2.5 quintillion bytes of data are created. This leaves
organisations continuing to face the challenge of aggregating, managing and
creating value from data. The sheer amount of data being created and the numerous
collection channels make good data management an important, yet elusive goal.

2. Taking a reactive approach to data management


One of the biggest problems we often see is that firms don't realise they have a problem with their data. This means many organisations take a reactive approach to data management and will wait until there are specific issues that need fixing.

3. Lack of processes and systems


When data is extracted from disparate databases, the inevitable result is data
inconsistencies, and nobody trusts the numbers. A lack of processes, data
management systems and inadequate data strategies contribute towards inaccurate
data.

4. Fragmented data ownership


A lack of data ownership is one of the key shortfalls for most organisations we
speak to. Data ownership is still predominantly fragmented, with the management
of data quality driven by multiple stakeholders and frequently measured at a
department-by-department level, rather than across the business as a whole.

5. Driving a data culture


Many organisations cannot generate enough support to improve their data culture. This may be because organisations often lack the knowledge or skills around data management and the resources required to manage data properly.

A new dawn for data management


As organisations shift towards becoming more data-centric, they recognise both the importance of quality data and of having a more sophisticated approach to managing it. From building better customer relationships to overcoming internal and external data management challenges, organisations will need to overhaul and evolve their data management practices.

In addition, as organisations shift towards a centralised data management strategy, they will be able to take on more sophisticated data projects. The ability to use high-quality data to make critical business decisions and improve your bottom line should be a huge focus in the coming months.

Data Independence
Data independence is the type of data transparency that matters for a
centralized DBMS. It refers to the immunity of user applications to changes made
in the definition and organization of data.
Physical data independence deals with hiding the details of the storage structure from user applications. The application should not be concerned with these details, since the operations carried out against the data are the same regardless of how the data is stored.

A database system normally contains a lot of data in addition to users' data. For example, it stores data about data, known as metadata, to locate and retrieve data easily. It is rather difficult to modify or update a set of metadata once it is stored in the database. But as a DBMS expands, it needs to change over time to satisfy the requirements of its users. If all the data were directly dependent on this metadata, every such change would become a tedious and highly complex job.

Metadata itself follows a layered architecture, so that when we change data at one layer, it does not affect the data at another level. These layers are independent of, but mapped to, each other.
Logical Data Independence
Logical data is data about the database; that is, it stores information about how the data inside is managed. For example, a table (relation) stored in the database, together with all the constraints applied to that relation, is logical data.
Logical data independence is the mechanism that insulates this logical level from the actual data stored on the disk. If we make changes to the table format, it should not change the data residing on the disk.
Physical Data Independence
All the schemas are logical, and the actual data is stored in bit format on the disk.
Physical data independence is the power to change the physical data without
impacting the schema or logical data.
For example, in case we want to change or upgrade the storage system itself −
suppose we want to replace hard-disks with SSD − it should not have any impact
on the logical data or schemas.
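
To make logical data independence concrete, a minimal sketch using Python's built-in sqlite3 module is shown below; the table, view and column names are hypothetical. The application queries a view, so reorganizing the underlying table does not disturb it.

    import sqlite3

    # Sketch: a view insulates the application from changes to the base table,
    # illustrating logical data independence. Names here are illustrative only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    conn.execute("CREATE VIEW customer_v AS SELECT id, name FROM customer")
    conn.execute("INSERT INTO customer VALUES (1, 'Ramesh', 'Ahmedabad')")

    # The application only ever queries the view.
    print(conn.execute("SELECT * FROM customer_v").fetchall())

    # The base table is reorganized (a new column is added); the view, and
    # therefore the application code, is unaffected.
    conn.execute("ALTER TABLE customer ADD COLUMN salary REAL")
    print(conn.execute("SELECT * FROM customer_v").fetchall())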

Data Redundancy Defined


Data redundancy is a data organization issue that allows the unnecessary duplication of data within your Microsoft Access database. A change or modification to redundant data requires that you make changes to multiple fields of a database. While this is the expected behaviour for flat file database designs and spreadsheets, it defeats the purpose of relational database designs. The data relationships inherent in a relational database should allow you to maintain a single data field at one location, and make the database's relational model responsible for propagating any changes to that data field across the database. Redundant data wastes valuable space and creates troubling database maintenance problems.
To eliminate redundant data from your Microsoft Access database, you must take
special care to organize the data in your data tables. Normalization is a method of
organizing your data to prevent redundancy. Normalization involves establishing
and maintaining the integrity of your data tables as well as eliminating inconsistent
data dependencies.
Establishing and maintaining integrity requires that you follow the Access-prescribed rules for maintaining parent-child table relationships. Eliminating inconsistent data dependencies involves ensuring that data is housed in the appropriate Access database table. An appropriate table is a table in which the data has some relation to or dependence on the table.
Normalization requires that you adhere to rules, established by the database
community, to ensure that data is organized efficiently. These rules are called
normal form rules. Normalization may require that you include additional data
tables in your Access database. Normal form rules number from one to three, for
most applications. The rules are cumulative such that the rules of the 2nd normal
form are inclusive of the rules in the 1st normal form. The rules of the 3rd normal
form are inclusive of the rules in the 1st and 2nd normal forms, etc.

The rules are defined as follows:


1st normal form: Avoid storing similar data in multiple table fields.
▪ Eliminate repeating groups in individual tables.
▪ Create a separate table for each set of related data.
▪ Identify each set of related data with a primary key.
2nd normal form: Records should be dependent only upon a table's primary key(s).
▪ Create separate tables for sets of values that apply to multiple records.
▪ Relate these tables with a foreign key.
3rd normal form: Record fields should depend only on the record's key.
▪ Eliminate fields that do not depend on the key.
The 3rd normal form suggests that fields that apply to more than one record should be placed in a separate table. However, this may not be a practical solution, particularly for small databases. The inclusion of additional tables may degrade database performance by opening more files than memory space allows. To overcome this limitation of the third normal form, you may want to apply it only to data that is expected to change frequently.
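
As a rough illustration of these guidelines (not a prescription from the text above), the sketch below uses Python's sqlite3 module to contrast a flat table containing repeating groups with a normalized set of tables related by foreign keys; all table and column names are invented.

    import sqlite3

    # Sketch of the 1NF-3NF guidelines above; the schema is invented for
    # illustration and is not part of the original text.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    -- Un-normalized: repeating groups (item1, item2) and customer details
    -- duplicated on every order row.
    CREATE TABLE orders_flat (
        order_id  INTEGER PRIMARY KEY,
        cust_name TEXT,
        cust_city TEXT,
        item1     TEXT,
        item2     TEXT
    );

    -- Normalized: each set of related data gets its own table (1NF),
    -- non-key facts depend on their own table's key (2NF/3NF),
    -- and the tables are related through foreign keys.
    CREATE TABLE customers (
        cust_id   INTEGER PRIMARY KEY,
        cust_name TEXT,
        cust_city TEXT
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        cust_id  INTEGER REFERENCES customers(cust_id)
    );
    CREATE TABLE order_items (
        order_id INTEGER REFERENCES orders(order_id),
        item     TEXT,
        PRIMARY KEY (order_id, item)
    );
    """)
    print("normalized schema created")
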
Two more advanced normal forms have been established for more complex applications. Failure to conform to the rules of these normal forms results in a less perfectly designed database, but the functionality of your database is not affected by avoiding them.
The advanced normal forms are as follows:
4th normal form: Boyce Codd Normal Form (BCNF)
▪ Eliminate relations with multi-valued dependencies.
5th normal form:
▪ Create relations that cannot be further decomposed.

DATA CONSISTENCY
Consistency in database systems refers to the requirement that any given database
transaction must change affected data only in allowed ways. Any data written to
the database must be valid according to all defined rules,
including constraints, cascades, triggers, and any combination thereof. This does
not guarantee correctness of the transaction in all ways the application programmer
might have wanted (that is the responsibility of application-level code) but merely
that any programming errors cannot result in the violation of any defined database
constraints.[1]

Consistency, in the context of databases, states that data cannot be written that
would violate the database’s own rules for valid data. If a certain transaction
occurs that attempts to introduce inconsistent data, the entire transaction is rolled
back and an error returned to the user.

A simple rule of consistency may state that the ‘Gender’ column of a database may only have the values ‘Male’, ‘Female’ or ‘Unknown’. If a user attempts to enter something else, say ‘Hermaphrodite’, then a database consistency rule kicks in and disallows the entry of such a value.
Consistency rules can get quite elaborate. For example, a bank account number must follow a specific pattern: it must begin with a ‘C’ for a checking account or an ‘S’ for a savings account, followed by 14 digits that are picked from the date and time, in the format YYYYMMDDHHMISS.
Database consistency does not only occur at the single-record level. In our bank
example above, another consistency rule may state that the ‘Customer Name’ field
cannot be empty when creating a customer.
Consistency rules are vitally important when creating databases, as they are the embodiment of the business rules for which the database is being created. They also serve another important function: they make the application developers' work easier, since it is usually much easier to define consistency rules at the database level than to define them in every application that connects to the database.
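
A minimal sketch of such database-level consistency rules, assuming SQLite-style CHECK constraints and invented table and column names, might look like this:

    import sqlite3

    # Sketch of the consistency rules described above, expressed as
    # database-level constraints. Table and column names are assumptions.
    conn = sqlite3.connect(":memory:")
    conn.execute("""
    CREATE TABLE customer_account (
        customer_name TEXT NOT NULL,                    -- name may not be empty
        gender        TEXT CHECK (gender IN ('Male', 'Female', 'Unknown')),
        account_no    TEXT CHECK (substr(account_no, 1, 1) IN ('C', 'S')
                                  AND length(account_no) = 15)
    )
    """)

    conn.execute("INSERT INTO customer_account VALUES ('Ramesh', 'Male', 'C20170704123015')")

    try:
        # Violates the gender rule, so the whole statement is rejected.
        conn.execute("INSERT INTO customer_account VALUES ('Khilan', 'Other', 'S20170704123015')")
    except sqlite3.IntegrityError as err:
        print("rejected by consistency rule:", err)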

Data Access
Definition - What does Data Access mean?
Data access refers to a user's ability to access or retrieve data stored within a
database or other repository. Users who have data access can store, retrieve, move
or manipulate stored data, which can be stored on a wide range of hard drives and
external devices.

Techopedia explains Data Access


There are two ways to access stored data: random access and sequential access.
The sequential method requires information to be moved within the disk using a
seek operation until the data is located. Each segment of data has to be read one
after another until the requested data is found. Reading data randomly allows users
to store or retrieve data anywhere on the disk, and the data is accessed in constant
time.

Oftentimes when using random access, the data is split into multiple parts or pieces
and located anywhere randomly on a disk. Sequential files are usually faster to
load and retrieve because they require fewer seek operations.
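
The difference between the two access methods can be sketched with a small Python example over a file of fixed-length records; the record layout is an assumption made for illustration.

    # Sketch contrasting sequential and random (direct) access on a file of
    # fixed-length records; the record layout is an illustrative assumption.
    RECORD_SIZE = 16

    with open("records.dat", "wb") as f:
        for i in range(100):
            f.write(f"record-{i:03d}".ljust(RECORD_SIZE).encode())

    with open("records.dat", "rb") as f:
        # Sequential access: read every record in order until the target is found.
        target = b"record-042"
        while True:
            rec = f.read(RECORD_SIZE)
            if not rec or rec.startswith(target):
                break

        # Random (direct) access: seek straight to record 42 and read it.
        f.seek(42 * RECORD_SIZE)
        print(f.read(RECORD_SIZE).decode().strip())
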

DATA ADMINISTRATION
A database administrator (DBA) directs or performs all activities related to
maintaining a successful database environment. Responsibilities include designing,
implementing, and maintaining the database system; establishing policies and
procedures pertaining to the management, security, maintenance, and use of
the database management system; and training employees in database management
and use. A DBA is expected to stay abreast of emerging technologies and new
design approaches. Typically, a DBA has either a degree in Computer Science and
some on-the-job training with a particular database product or more extensive
experience with a range of database products. A DBA is usually expected to have experience with one or more of the major database management products, such as Microsoft SQL Server, SAP, and Oracle-based database management software.

Database administration refers to the whole set of activities performed by a database administrator to ensure that a database is always available as needed. Other closely related tasks and roles are database security, database monitoring and troubleshooting, and planning for future growth.

Database administration is an important function in any organization that is dependent on one or more databases.
The database administrator (DBA) is usually a dedicated role in the IT department
for large organizations. However, many smaller companies that cannot afford a
full-time DBA usually outsource or contract the role to a specialized vendor, or
merge the role with another in the ICT department so that both are performed by
one person.

The primary role of database administration is to ensure maximum uptime for the database so that it is always available when needed. This will typically involve proactive periodic monitoring and troubleshooting, which in turn entails some technical skill on the part of the DBA. In addition to in-depth knowledge of the database in question, the DBA will also need knowledge of, and perhaps training in, the platform (database engine and operating system) on which the database runs.

A DBA is typically also responsible for other secondary, but still critically
important, tasks and roles. Some of these include:

• Database Security: Ensuring that only authorized users have access to the
database and fortifying it against any external, unauthorized access.
• Database Tuning: Tweaking any of several parameters to optimize
performance, such as server memory allocation, file fragmentation and disk
usage.
• Backup and Recovery: It is a DBA's role to ensure that the database has
adequate backup and recovery procedures in place to recover from any
accidental or deliberate loss of data.
• Producing Reports from Queries: DBAs are frequently called upon to
generate reports by writing queries, which are then run against the database.

It is clear from all the above that the database administration function requires
technical training and years of experience. Some companies that offer commercial
database products, such as Oracle DB and Microsoft's SQL Server, also offer
certifications for their specific products. These industry certifications, such as
Oracle Certified Professional (OCP) and Microsoft Certified Database
Administrator (MCDBA), go a long way toward assuring organizations that a DBA
is indeed thoroughly trained on the product in question. Because most relational database products today use the SQL language, knowledge of SQL commands and syntax is also a valuable asset for today's DBAs.

MANAGING CONCURRENCY
In a multiprogramming environment where multiple transactions can be executed
simultaneously, it is highly important to control the concurrency of transactions.
We have concurrency control protocols to ensure atomicity, isolation, and
serializability of concurrent transactions. Concurrency control protocols can be
broadly divided into two categories −
• Lock based protocols
• Time stamp based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which
any transaction cannot read or write data until it acquires an appropriate lock on it.
Locks are of two kinds −
• Binary Locks − A lock on a data item can be in two states; it is either locked
or unlocked.
• Shared/exclusive − This type of locking mechanism differentiates the locks
based on their uses. If a lock is acquired on a data item to perform a write
operation, it is an exclusive lock. Allowing more than one transaction to write
on the same data item would lead the database into an inconsistent state. Read
locks are shared because no data value is being changed.
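
A minimal sketch of a shared/exclusive lock of the kind a lock-based protocol relies on is shown below in Python; it illustrates the idea and is not the internal API of any particular DBMS. Real systems add lock tables, two-phase locking and deadlock detection on top of this basic mechanism.

    import threading

    # Minimal sketch of a shared/exclusive (read/write) lock.
    class SharedExclusiveLock:
        def __init__(self):
            self._cond = threading.Condition()
            self._readers = 0        # transactions holding a shared lock
            self._writer = False     # is an exclusive lock held?

        def acquire_shared(self):
            with self._cond:
                while self._writer:                  # readers wait only for a writer
                    self._cond.wait()
                self._readers += 1

        def release_shared(self):
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()

        def acquire_exclusive(self):
            with self._cond:
                while self._writer or self._readers:  # writers wait for everyone
                    self._cond.wait()
                self._writer = True

        def release_exclusive(self):
            with self._cond:
                self._writer = False
                self._cond.notify_all()

    lock = SharedExclusiveLock()
    lock.acquire_shared()      # many readers may hold this at once
    lock.release_shared()
    lock.acquire_exclusive()   # only one writer, and no readers, at a time
    lock.release_exclusive()
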

Database Security
Definition - What does Database Security mean?
Database security refers to the collective measures used to protect and secure a
database or database management software from illegitimate use and malicious
threats and attacks.
It is a broad term that includes a multitude of processes, tools and methodologies
that ensure security within a database environment.

Techopedia explains Database Security


Database security covers and enforces security on all aspects and components of
databases. This includes:

• Data stored in database


• Database server
• Database management system (DBMS)
• Other database workflow applications

Database security is generally planned, implemented and maintained by a database administrator and/or other information security professional.
Some of the ways database security is analyzed and implemented include:
• Restricting unauthorized access and use by implementing strong and multifactor access and data management controls
• Load/stress testing and capacity testing of a database to ensure it does not crash in a distributed denial of service (DDoS) attack or user overload
• Physical security of the database server and backup equipment from theft and natural disasters
• Reviewing the existing system for any known or unknown vulnerabilities and defining and implementing a road map/plan to mitigate them

Crash Recovery
DBMS is a highly complex system with hundreds of transactions being executed
every second. The durability and robustness of a DBMS depends on its complex
architecture and its underlying hardware and system software. If it fails or crashes
amid transactions, it is expected that the system would follow some sort of
algorithm or techniques to recover lost data.
Failure Classification
To see where the problem has occurred, we generalize a failure into various
categories, as follows −
Transaction failure
A transaction has to abort when it fails to execute or when it reaches a point from where it can't go any further. This is called a transaction failure, where only a few transactions or processes are affected.
Reasons for a transaction failure could be −
• Logical errors − Where a transaction cannot complete because it has some
code error or any internal error condition.
• System errors − Where the database system itself terminates an active
transaction because the DBMS is not able to execute it, or it has to stop
because of some system condition. For example, in case of deadlock or
resource unavailability, the system aborts an active transaction.
System Crash
There are problems − external to the system − that may cause the system to stop
abruptly and cause the system to crash. For example, interruptions in power supply
may cause the failure of underlying hardware or software failure.
Examples may include operating system errors.
Disk Failure
In the early days of technology evolution, it was a common problem that hard-disk drives or storage drives failed frequently.
Disk failures include the formation of bad sectors, unreachability of the disk, a disk head crash, or any other failure that destroys all or part of the disk storage.
Storage Structure
We have already described the storage system. In brief, the storage structure can be
divided into two categories −
• Volatile storage − As the name suggests, a volatile storage cannot survive
system crashes. Volatile storage devices are placed very close to the CPU;
normally they are embedded onto the chipset itself. For example, main
memory and cache memory are examples of volatile storage. They are fast
but can store only a small amount of information.
• Non-volatile storage − These memories are made to survive system crashes.
They are huge in data storage capacity, but slower in accessibility. Examples
may include hard-disks, magnetic tapes, flash memory, and non-volatile
(battery backed up) RAM.
Recovery and Atomicity
When a system crashes, it may have several transactions being executed and various
files opened for them to modify the data items. Transactions are made of various
operations, which are atomic in nature. But according to ACID properties of
DBMS, atomicity of transactions as a whole must be maintained, that is, either all
the operations are executed or none.
When a DBMS recovers from a crash, it should maintain the following −
• It should check the states of all the transactions, which were being executed.
• A transaction may be in the middle of some operation; the DBMS must ensure
the atomicity of the transaction in this case.
• It should check whether the transaction can be completed now or it needs to
be rolled back.
• No transactions would be allowed to leave the DBMS in an inconsistent state.
There are two types of techniques, which can help a DBMS in recovering as well
as maintaining the atomicity of a transaction −
• Maintaining the logs of each transaction, and writing them onto some stable
storage before actually modifying the database.
• Maintaining shadow paging, where the changes are done on a volatile
memory, and later, the actual database is updated.
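
A minimal sketch of the first technique, writing a log record to stable storage before modifying the database, is shown below in Python; the file name and record format are assumptions made for illustration, and a real log also records commits and supports undo.

    import json, os

    # Sketch of log-based recovery: the change is written to stable storage
    # (the log) before the database itself is modified.
    LOG_FILE = "txn.log"
    database = {"A": 100, "B": 200}      # stand-in for the real database

    def write_and_log(txn_id, key, new_value):
        record = {"txn": txn_id, "key": key,
                  "before": database[key], "after": new_value}
        with open(LOG_FILE, "a") as log:
            log.write(json.dumps(record) + "\n")
            log.flush()
            os.fsync(log.fileno())       # force the log record to stable storage
        database[key] = new_value        # only now touch the database

    def recover():
        """Replay the log after a crash so logged changes are not lost."""
        if not os.path.exists(LOG_FILE):
            return
        with open(LOG_FILE) as log:
            for line in log:
                record = json.loads(line)
                database[record["key"]] = record["after"]

    write_and_log("T1", "A", 150)
    recover()
    print(database)   # {'A': 150, 'B': 200}
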
What is RDBMS?
RDBMS stands for Relational Database Management System. RDBMS is the basis
for SQL, and for all modern database systems like MS SQL Server, IBM DB2,
Oracle, MySQL, and Microsoft Access.
A Relational database management system (RDBMS) is a database management
system (DBMS) that is based on the relational model as introduced by E. F. Codd.
What is a table?
The data in an RDBMS is stored in database objects which are called tables. A table is basically a collection of related data entries and it consists of numerous columns and rows.
Remember, a table is the most common and simplest form of data storage in a relational database. The following is an example of a CUSTOMERS table:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
What is a field?
Every table is broken up into smaller entities called fields. The fields in the
CUSTOMERS table consist of ID, NAME, AGE, ADDRESS and SALARY.
A field is a column in a table that is designed to maintain specific information about
every record in the table.
What is a Record or a Row?
A record, also called a row of data, is each individual entry that exists in a table. For example, there are 7 records in the above CUSTOMERS table. Following is a single row of data, or record, in the CUSTOMERS table −

+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
+----+----------+-----+-----------+----------+
A record is a horizontal entity in a table.
What is a column?
A column is a vertical entity in a table that contains all information associated with
a specific field in a table.
For example, a column in the CUSTOMERS table is ADDRESS, which represents
location description and would be as shown below −

+-----------+
| ADDRESS   |
+-----------+
| Ahmedabad |
| Delhi     |
| Kota      |
| Mumbai    |
| Bhopal    |
| MP        |
| Indore    |
+-----------+

What is a NULL value?


A NULL value in a table is a value in a field that appears to be blank, which means a field with a NULL value is a field with no value.
It is very important to understand that a NULL value is different from a zero value or a field that contains spaces. A field with a NULL value is one that has been left blank during record creation.

Data Integrity
The following categories of data integrity exist with each RDBMS −
• Entity Integrity − There are no duplicate rows in a table.
• Domain Integrity − Enforces valid entries for a given column by restricting
the type, the format, or the range of values.
• Referential integrity − Rows that are used by other records cannot be deleted.
• User-Defined Integrity − Enforces some specific business rules that do not
fall into entity, domain or referential integrity.
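
These categories can be mapped onto ordinary SQL constraints. The sketch below uses Python's sqlite3 module with an invented schema; it is an illustration of the mapping, not a definitive implementation.

    import sqlite3

    # Sketch mapping the four integrity categories above onto SQL constraints.
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")          # enforce referential integrity
    conn.executescript("""
    CREATE TABLE customers (
        id     INTEGER PRIMARY KEY,                   -- entity integrity: no duplicate rows
        name   TEXT NOT NULL,
        age    INTEGER CHECK (age BETWEEN 18 AND 120),-- domain integrity
        salary REAL CHECK (salary >= 0)               -- user-defined business rule
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        cust_id  INTEGER NOT NULL REFERENCES customers(id)   -- referential integrity
    );
    """)
    conn.execute("INSERT INTO customers VALUES (1, 'Ramesh', 32, 2000.00)")
    conn.execute("INSERT INTO orders VALUES (10, 1)")

    try:
        # The customer row is used by an order, so the delete is rejected.
        conn.execute("DELETE FROM customers WHERE id = 1")
    except sqlite3.IntegrityError as err:
        print("blocked:", err)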

Database Normalization
Database normalization is the process of efficiently organizing data in a database. There are two reasons for this normalization process −
• Eliminating redundant data, for example, storing the same data in more than one table.
• Ensuring data dependencies make sense.
Both of these are worthy goals, as they reduce the amount of space a database consumes and ensure that data is logically stored. Normalization consists of a series of guidelines that help you create a good database structure.
Normalization guidelines are divided into normal forms; think of a form as the
format or the way a database structure is laid out. The aim of normal forms is to
organize the database structure, so that it complies with the rules of first normal
form, then second normal form and finally the third normal form.
It is your choice to take it further and go to the fourth normal form, fifth normal
form and so on, but in general, the third normal form is more than enough.

• First Normal Form (1NF)


• Second Normal Form (2NF)
• Third Normal Form (3NF)
Understanding DBMS Architecture
A Database Management System is not always directly available for users and applications to access and store data in. A Database Management System can be centralised (all the data stored at one location), decentralised (multiple copies of the database at different locations) or hierarchical, depending upon its architecture.
A 1-tier DBMS architecture also exists; this is when the database is directly available to the user for storing data. Generally, such a setup is used for local application development, where programmers communicate directly with the database for quick response.
Database Architecture is logically of two types:

1. 2-tier DBMS architecture


2. 3-tier DBMS architecture

2-tier DBMS Architecture


2-tier DBMS architecture includes an Application layer between the user and the DBMS, which is responsible for communicating the user's request to the database management system and then sending the response from the DBMS back to the user.
An application interface known as ODBC (Open Database Connectivity) provides an API that allows client-side programs to call the DBMS. Most DBMS vendors provide ODBC drivers for their DBMS.
Such an architecture provides the DBMS extra security, as it is not exposed to the end user directly. Security can be improved further by adding security and authentication checks in the Application layer as well.
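
As a hedged sketch of a 2-tier client calling the DBMS through ODBC, the example below uses the third-party pyodbc package; the DSN name, credentials, table and column names are placeholders, not real values.

    import pyodbc   # a widely used Python ODBC bridge (third-party package)

    # Sketch of a 2-tier client calling the DBMS through an ODBC driver.
    # The DSN, credentials and schema are placeholders for illustration only.
    conn = pyodbc.connect("DSN=SalesDB;UID=app_user;PWD=secret")
    cursor = conn.cursor()
    cursor.execute("SELECT id, name FROM customers WHERE city = ?", "Delhi")
    for row in cursor.fetchall():
        print(row.id, row.name)
    conn.close()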

3-tier DBMS Architecture


3-tier DBMS architecture is the most commonly used architecture for web
applications.
It is an extension of the 2-tier architecture. In the 2-tier architecture, we have an application layer which can be accessed programmatically to perform various operations on the DBMS. The application generally understands the database access language and processes end users' requests to the DBMS.
In the 3-tier architecture, an additional Presentation or GUI layer is added, which provides a graphical user interface for the end user to interact with the DBMS. For the end user, the GUI layer is the database system, and the end user has no idea about the application layer or the DBMS.
If you have used MySQL, then you have probably seen phpMyAdmin; it is a good example of a 3-tier DBMS architecture.

data warehouse
A data warehouse is a federated repository for all the data collected by an
enterprise's various operational systems, be they physical or logical. Data
warehousing emphasizes the capture of data from diverse sources for access and
analysis rather than for transaction processing.
Typically, a data warehouse is a relational database housed on an
enterprise mainframe server or, increasingly, in the cloud. Data from various
online transaction processing (OLTP) applications and other sources are selectively
extracted for business intelligence activities, decision support and to answer user
inquiries.

Basic components of a data warehouse

A data warehouse stores data that is extracted from data stores and external sources. The data records within the warehouse must contain enough detail to make them searchable and useful to business users. Taken together, there are three main components of data warehousing:

• data sources from operational systems, such as Excel, ERP, CRM or financial
applications;
• a data staging area where data is cleaned and ordered; and
• a presentation area where data is warehoused.

Data analysis tools, such as business intelligence software, access the data within
the warehouse. Data warehouses can also feed data marts, which are decentralized
systems in which data from the warehouse is organized and made available to
specific business groups, such as sales or inventory teams.
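
A minimal sketch of these three components, using Python with an in-memory SQLite database standing in for the presentation area and invented source data, might look like this:

    import sqlite3

    # Sketch: extract rows from an operational source, clean them in a staging
    # step, and load them into a presentation (warehouse) table. All names are
    # illustrative assumptions.
    source_rows = [                      # extract: data from an operational system
        {"customer": " ramesh ", "region": "west",  "amount": "2000.00"},
        {"customer": "Khilan",   "region": "NORTH", "amount": "1500.00"},
    ]

    def stage(row):                      # staging: clean and order the data
        return (row["customer"].strip().title(),
                row["region"].strip().title(),
                float(row["amount"]))

    warehouse = sqlite3.connect(":memory:")     # presentation area
    warehouse.execute("CREATE TABLE sales (customer TEXT, region TEXT, amount REAL)")
    warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                          [stage(r) for r in source_rows])
    print(warehouse.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall())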

In addition, Hadoop has become an important extension of data warehouses for many enterprises because the data processing platform can improve components of the data warehouse architecture -- from data ingestion to analytics processing to data archiving.

Data warehouse benefits and options

Data warehouses can benefit organizations from both an IT and a business perspective. Separating the analytical processes from the operational processes can enhance the operational systems and enable business users to access and query relevant data faster from multiple sources. In addition, data warehouses can offer enhanced data quality and consistency, thereby improving business intelligence.

Beyond basic data warehouses

Businesses can choose on-premises, cloud-based or data-warehouse-as-a-service systems. On-premises data warehouses from IBM, Oracle and Teradata offer flexibility and security so IT teams can maintain control over their data warehouse management and configuration.

Cloud-based data warehouses such as Amazon Redshift, Google BigQuery, Microsoft Azure SQL Data Warehouse and Snowflake enable companies to quickly scale while eliminating the initial infrastructure investments and ongoing maintenance requirements.

Data Mining
Definition - What does Data Mining mean?
Data mining is the process of analyzing data from different perspectives to uncover hidden patterns and turn them into useful information. The data is collected and assembled in common areas, such as data warehouses, where data mining algorithms can analyze it efficiently to facilitate business decision making and other information requirements, ultimately cutting costs and increasing revenue.
Data mining is also known as data discovery and knowledge discovery.

Techopedia explains Data Mining


The major steps involved in a data mining process are:

• Extract, transform and load data into a data warehouse
• Store and manage data in a multidimensional database
• Provide data access to business analysts using application software
• Present analyzed data in easily understandable forms, such as graphs
The first step in data mining is gathering relevant data critical for business. Company data is either transactional, non-operational or metadata. Transactional data deals with day-to-day operations like sales, inventory and cost. Non-operational data is normally forecast data, while metadata is concerned with logical database design. Patterns and relationships among data elements render relevant information, which may increase organizational revenue. Organizations with a strong consumer focus use data mining techniques to provide clear pictures of products sold, price, competition and customer demographics.
For instance, the retail giant Wal-Mart transmits all its relevant information to a data warehouse with terabytes of data. This data can easily be accessed by suppliers, enabling them to identify customer buying patterns. They can generate patterns on shopping habits, the most-shopped days, the most sought-after products and other data utilizing data mining techniques.
The second step in data mining is selecting a suitable algorithm - a mechanism producing a data mining model. The general working of the algorithm involves identifying trends in a set of data and using the output for parameter definition. The most popular algorithms used for data mining are classification algorithms and regression algorithms, which are used to identify relationships among data elements. Major database vendors like Oracle and Microsoft (SQL Server) incorporate data mining algorithms, such as clustering and regression trees, to meet the demand for data mining.
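
As a rough sketch of this algorithm-selection step, the example below uses scikit-learn's k-means clustering (assumed to be installed) on a few invented customer records to discover segments:

    from sklearn.cluster import KMeans   # assumes scikit-learn is installed

    # Sketch: k-means clustering groups customers by age and spend.
    # The data points are invented for illustration.
    customers = [[32, 2000], [25, 1500], [23, 2000],
                 [25, 6500], [27, 8500], [24, 10000]]

    model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
    print(model.labels_)           # which cluster each customer falls into
    print(model.cluster_centers_)  # the "profile" of each discovered segment
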

OLAP (online analytical processing)


OLAP (online analytical processing) is a computing method that enables users to
easily and selectively extract and query data in order to analyze it from different
points of view. OLAP business intelligence queries often aid in trends analysis,
financial reporting, sales forecasting, budgeting and other planning purposes.

For example, a user can request that data be analyzed to display a spreadsheet
showing all of a company's beach ball products sold in Florida in the month of
July, compare revenue figures with those for the same products in September and
then see a comparison of other product sales in Florida in the same time period.
How OLAP systems work
To facilitate this kind of analysis, data is collected from multiple data sources, stored in data warehouses and then cleansed and organized into data cubes.
Each OLAP cube contains data categorized by dimensions (such as customers, geographic sales region and time period) derived from dimension tables in the data warehouses. Dimensions are then populated by members (such as customer names, countries and months) that are organized hierarchically. OLAP cubes are often pre-summarized across dimensions to drastically improve query time over relational databases.

Analysts can then perform five types of OLAP analytical operations against
these multidimensional databases:

• Roll-up. Also known as consolidation, or drill-up, this operation summarizes the data along a dimension.
• Drill-down. This allows analysts to navigate deeper among the dimensions of
data, for example drilling down from "time period" to "years" and "months" to
chart sales growth for a product.
• Slice. This enables an analyst to take one level of information for display, such
as "sales in 2017."
• Dice. This allows an analyst to select data from multiple dimensions to analyze,
such as "sales of blue beach balls in Iowa in 2017."
• Pivot. Analysts can gain a new view of data by rotating the data axes of the
cube.
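
These operations can be sketched with the pandas library (assumed to be available) on a tiny, invented sales dataset:

    import pandas as pd   # assumes pandas is installed

    # Sketch of the OLAP operations above on invented data.
    sales = pd.DataFrame({
        "product": ["beach ball", "beach ball", "umbrella", "umbrella"],
        "state":   ["Florida", "Iowa", "Florida", "Iowa"],
        "month":   ["Jul", "Jul", "Sep", "Sep"],
        "revenue": [100, 40, 80, 60],
    })

    # Roll-up: summarize along the product dimension.
    print(sales.groupby("product")["revenue"].sum())

    # Slice: take one level of the state dimension.
    print(sales[sales["state"] == "Florida"])

    # Dice: select across several dimensions at once.
    print(sales[(sales["state"] == "Iowa") & (sales["month"] == "Jul")])

    # Pivot: rotate the axes to view revenue by state and month.
    print(sales.pivot_table(values="revenue", index="state",
                            columns="month", aggfunc="sum"))
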
Uses of OLAP
OLAP can be used for data mining or the discovery of previously undiscerned
relationships between data items. An OLAP database does not need to be as large
as a data warehouse, since not all transactional data is needed for trend analysis.
Using Open Database Connectivity (ODBC), data can be imported from existing
relational databases to create a multidimensional database for OLAP.

OLAP products include IBM Cognos, Oracle OLAP and Oracle Essbase. OLAP features are also included in tools such as Microsoft Excel and Microsoft SQL Server's Analysis Services. OLAP products are typically designed for multiple-user environments, with the cost of the software based on the number of users.
OLTP (online transaction processing)
OLTP (online transaction processing) is a class of software programs capable of
supporting transaction-oriented applications on the Internet.

Typically, OLTP systems are used for order entry, financial transactions, customer
relationship management (CRM) and retail sales. Such systems have a large
number of users who conduct short transactions. Database queries are usually
simple, require sub-second response times and return relatively few records.

An important attribute of an OLTP system is its ability to maintain concurrency. To avoid single points of failure, OLTP systems are often decentralized.
IBM's CICS (Customer Information Control System) is a well-known OLTP product.
