Itm Mod 3
Data governance deals with the policies and processes for managing the
availability, usability, integrity, and security of the data employed in an enterprise,
with special emphasis on promoting privacy, security, data quality, and compliance
with government regulations.
A large organization will also have a database design and management group that
is responsible for defining and organizing the structure and content of the database,
and maintaining the database. The functions it performs are called database
administration.
In managing data, steps must be taken to ensure that the data in organizational
databases are accurate and remain reliable. Data that are inaccurate, untimely, or
inconsistent with other sources of information lead to incorrect decisions, product
recalls, and even financial losses.
A good database design also includes efforts to maximize data quality and
eliminate error. Some data quality problems result from redundant and inconsistent
data, but most stem from errors in data input. Organizations need to identify and
correct faulty data and establish better routines for input and editing.
A data quality audit can be performed by surveying entire data files, surveying samples of data files, or surveying end users' impressions of data quality. Data cleansing (or data scrubbing) techniques can be used to correct data and enforce consistency among different sets of data.
What Is Data Management?
Data management is the development and execution of processes, architectures,
policies, practices and procedures in order to manage the information generated by
an organization.
The effective management of data within any organization has grown in importance in recent years as organizations are subject to an increasing number of compliance regulations, large increases in information storage capacity, and the sheer amount of data and documents being generated. This rate of growth is not expected to slow down, as IDC predicts the amount of information generated will increase 29-fold by 2020. These large volumes of data from ERP systems, CRM systems and general business documents are often referred to as big data.
Data Independence
Data independence is the type of data transparency that matters for a
centralized DBMS. It refers to the immunity of user applications to changes made
in the definition and organization of data.
Physical data independence deals with hiding the details of the storage structure
from user applications. The application should not be involved with these issues,
since there is no difference in the operation carried out against the data.
A database system normally contains a lot of data in addition to users' data. For example, it stores data about data, known as metadata, to locate and retrieve data easily. It is rather difficult to modify or update a set of metadata once it is stored in the database. But as a DBMS expands, it needs to change over time to satisfy the requirements of its users. If all of the data were interdependent, changing any of it would become a tedious and highly complex job.
Metadata itself follows a layered architecture, so that when we change data at one
layer, it does not affect the data at another level. This data is independent but
mapped to each other.
Logical Data Independence
Logical data is data about the database; that is, it records how the data is managed inside the database: for example, a table (relation) stored in the database and all the constraints applied to that relation.
Logical data independence is a mechanism that insulates this logical level from the actual data stored on the disk. If we make changes to the table format, it should not change the data residing on the disk.
Physical Data Independence
All the schemas are logical, and the actual data is stored in bit format on the disk.
Physical data independence is the power to change the physical data without
impacting the schema or logical data.
For example, in case we want to change or upgrade the storage system itself −
suppose we want to replace hard-disks with SSD − it should not have any impact
on the logical data or schemas.
DATA CONSISTENCY
Consistency in database systems refers to the requirement that any given database
transaction must change affected data only in allowed ways. Any data written to
the database must be valid according to all defined rules,
including constraints, cascades, triggers, and any combination thereof. This does
not guarantee correctness of the transaction in all ways the application programmer
might have wanted (that is the responsibility of application-level code) but merely
that any programming errors cannot result in the violation of any defined database
constraints.[1]
Consistency, in the context of databases, states that data cannot be written that
would violate the database’s own rules for valid data. If a certain transaction
occurs that attempts to introduce inconsistent data, the entire transaction is rolled
back and an error returned to the user.
A simple consistency rule may state that the 'Gender' column of a database may only contain the values 'Male', 'Female' or 'Unknown'. If a user attempts to enter any other value, the database consistency rule kicks in and disallows the entry of such a value.
Consistency rules can get quite elaborate. For example, a bank account number must follow a specific pattern: it must begin with a 'C' for a checking account or an 'S' for a savings account, followed by 14 digits taken from the date and time, in the format YYYYMMDDHHMISS.
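The account-number rule above can be sketched as a validation check. This is a minimal illustration, not part of the original text; the function name and the use of Python's re module are assumptions.

```python
import re
from datetime import datetime

# Pattern from the rule above: 'C' (checking) or 'S' (savings),
# followed by 14 digits in YYYYMMDDHHMISS order.
ACCOUNT_PATTERN = re.compile(r"^[CS]\d{14}$")

def is_valid_account_number(account_number: str) -> bool:
    """Return True if the account number matches the stated format."""
    if not ACCOUNT_PATTERN.match(account_number):
        return False
    # The 14 digits must also form a real date and time.
    try:
        datetime.strptime(account_number[1:], "%Y%m%d%H%M%S")
    except ValueError:
        return False
    return True

print(is_valid_account_number("C20240315143059"))  # True
print(is_valid_account_number("A20240315143059"))  # False: bad prefix
```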
Database consistency does not only occur at the single-record level. In our bank
example above, another consistency rule may state that the ‘Customer Name’ field
cannot be empty when creating a customer.
Consistency rules are vitally important when creating databases, as they are the embodiment of the business rules for which the database is being created. They also serve another important function: they make application developers' work easier, since it is usually much easier to define consistency rules at the database level than in the application that connects to the database.
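Defining such rules at the database level can be sketched with SQLite's CHECK constraints; the table and column names here are illustrative, not from the original text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A domain rule for Gender and a not-empty rule for the customer name,
# both enforced by the database itself via CHECK constraints.
conn.execute("""
    CREATE TABLE customer (
        name   TEXT NOT NULL CHECK (length(name) > 0),
        gender TEXT CHECK (gender IN ('Male', 'Female', 'Unknown'))
    )
""")

conn.execute("INSERT INTO customer VALUES ('Ramesh', 'Male')")  # accepted

try:
    # Violates the Gender domain rule: the consistency rule kicks in.
    conn.execute("INSERT INTO customer VALUES ('Khilan', 'Other')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The application never has to re-implement the rule; every client that connects to this database gets the same enforcement.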
Data Access
Definition - What does Data Access mean?
Data access refers to a user's ability to access or retrieve data stored within a
database or other repository. Users who have data access can store, retrieve, move
or manipulate stored data, which can be stored on a wide range of hard drives and
external devices.
Data can be accessed randomly or sequentially. When using random access, the data is often split into multiple parts or pieces located anywhere on a disk. Sequential files are usually faster to load and retrieve because they require fewer seek operations.
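The contrast between sequential and random access can be sketched with fixed-width records in a plain file; the record format and file name are illustrative assumptions.

```python
import os
import tempfile

# Write 5 fixed-width records (10 bytes each) so that any record can be
# located by computing its byte offset: the basis of random access.
RECORD_SIZE = 10
path = os.path.join(tempfile.mkdtemp(), "records.dat")
with open(path, "wb") as f:
    for i in range(5):
        f.write(f"rec{i:02d}".ljust(RECORD_SIZE).encode())

with open(path, "rb") as f:
    # Sequential access: read records in storage order, no seek needed.
    first = f.read(RECORD_SIZE).decode().strip()
    # Random access: jump straight to record 3 by its offset.
    f.seek(3 * RECORD_SIZE)
    third = f.read(RECORD_SIZE).decode().strip()

print(first, third)  # rec00 rec03
```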
DATA ADMINISTRATION
A database administrator (DBA) directs or performs all activities related to
maintaining a successful database environment. Responsibilities include designing,
implementing, and maintaining the database system; establishing policies and
procedures pertaining to the management, security, maintenance, and use of
the database management system; and training employees in database management
and use. A DBA is expected to stay abreast of emerging technologies and new
design approaches. Typically, a DBA has either a degree in Computer Science and
some on-the-job training with a particular database product or more extensive
experience with a range of database products. A DBA is usually expected to have experience with one or more of the major database management products, such as Microsoft SQL Server, SAP, and Oracle-based database management software.
The primary role of database administration is to ensure maximum up time for the
database so that it is always available when needed. This will typically involve
proactive periodic monitoring and troubleshooting. This in turn entails some
technical skills on the part of the DBA. In addition to in-depth knowledge of the
database in question, the DBA will also need knowledge and perhaps training in
the platform (database engine and operating system) on which the database runs.
A DBA is typically also responsible for other secondary, but still critically
important, tasks and roles. Some of these include:
• Database Security: Ensuring that only authorized users have access to the
database and fortifying it against any external, unauthorized access.
• Database Tuning: Tweaking any of several parameters to optimize
performance, such as server memory allocation, file fragmentation and disk
usage.
• Backup and Recovery: It is a DBA's role to ensure that the database has
adequate backup and recovery procedures in place to recover from any
accidental or deliberate loss of data.
• Producing Reports from Queries: DBAs are frequently called upon to
generate reports by writing queries, which are then run against the database.
It is clear from all the above that the database administration function requires
technical training and years of experience. Some companies that offer commercial
database products, such as Oracle DB and Microsoft's SQL Server, also offer
certifications for their specific products. These industry certifications, such as
Oracle Certified Professional (OCP) and Microsoft Certified Database
Administrator (MCDBA), go a long way toward assuring organizations that a DBA
is indeed thoroughly trained on the product in question. Because most relational database products today use the SQL language, knowledge of SQL commands and syntax is also a valuable asset for today's DBAs.
MANAGING CONCURRENCY
In a multiprogramming environment where multiple transactions can be executed
simultaneously, it is highly important to control the concurrency of transactions.
We have concurrency control protocols to ensure atomicity, isolation, and
serializability of concurrent transactions. Concurrency control protocols can be
broadly divided into two categories −
• Lock based protocols
• Time stamp based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which
any transaction cannot read or write data until it acquires an appropriate lock on it.
Locks are of two kinds −
• Binary Locks − A lock on a data item can be in two states; it is either locked
or unlocked.
• Shared/exclusive − This type of locking mechanism differentiates the locks
based on their uses. If a lock is acquired on a data item to perform a write
operation, it is an exclusive lock. Allowing more than one transaction to write
on the same data item would lead the database into an inconsistent state. Read
locks are shared because no data value is being changed.
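The shared/exclusive mechanism above can be sketched as a small lock class; the class and method names are illustrative, and a real DBMS lock manager is far more elaborate.

```python
import threading

class SharedExclusiveLock:
    """Sketch of a shared/exclusive lock: many readers OR one writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0      # transactions holding the shared lock
        self._writer = False   # is an exclusive lock held?

    def acquire_shared(self):
        with self._cond:
            while self._writer:          # readers wait out any writer
                self._cond.wait()
            self._readers += 1

    def release_shared(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_exclusive(self):
        with self._cond:
            # A writer needs the data item entirely to itself.
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_exclusive(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

Two reading transactions can hold the shared lock at once, but an exclusive acquire blocks until every shared holder releases, which is exactly why concurrent writes to the same item cannot interleave.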
Database Security
Definition - What does Database Security mean?
Database security refers to the collective measures used to protect and secure a
database or database management software from illegitimate use and malicious
threats and attacks.
It is a broad term that includes a multitude of processes, tools and methodologies that ensure security within a database environment. These measures include, for example:
• Physical security of the database server and backup equipment from theft
and natural disasters
Crash Recovery
DBMS is a highly complex system with hundreds of transactions being executed
every second. The durability and robustness of a DBMS depends on its complex
architecture and its underlying hardware and system software. If it fails or crashes
amid transactions, it is expected that the system would follow some sort of
algorithm or techniques to recover lost data.
Failure Classification
To see where the problem has occurred, we generalize a failure into various
categories, as follows −
Transaction failure
A transaction has to abort when it fails to execute or when it reaches a point from
where it can’t go any further. This is called transaction failure where only a few
transactions or processes are hurt.
Reasons for a transaction failure could be −
• Logical errors − Where a transaction cannot complete because it has some
code error or any internal error condition.
• System errors − Where the database system itself terminates an active
transaction because the DBMS is not able to execute it, or it has to stop
because of some system condition. For example, in case of deadlock or
resource unavailability, the system aborts an active transaction.
System Crash
There are problems, external to the system, that may cause the system to stop abruptly and crash. For example, an interruption in the power supply may cause the underlying hardware or software to fail. Operating system errors are another example.
Disk Failure
In the early days of technology evolution, it was a common problem that hard-disk drives or storage drives failed frequently.
Disk failures include the formation of bad sectors, unreachability of the disk, a disk head crash, or any other failure that destroys all or part of the disk storage.
Storage Structure
We have already described the storage system. In brief, the storage structure can be
divided into two categories −
• Volatile storage − As the name suggests, a volatile storage cannot survive
system crashes. Volatile storage devices are placed very close to the CPU;
normally they are embedded onto the chipset itself. For example, main
memory and cache memory are examples of volatile storage. They are fast
but can store only a small amount of information.
• Non-volatile storage − These memories are made to survive system crashes.
They are huge in data storage capacity, but slower in accessibility. Examples
may include hard-disks, magnetic tapes, flash memory, and non-volatile
(battery backed up) RAM.
Recovery and Atomicity
When a system crashes, it may have several transactions being executed and various
files opened for them to modify the data items. Transactions are made of various
operations, which are atomic in nature. But according to ACID properties of
DBMS, atomicity of transactions as a whole must be maintained, that is, either all
the operations are executed or none.
When a DBMS recovers from a crash, it should maintain the following −
• It should check the states of all the transactions, which were being executed.
• A transaction may be in the middle of some operation; the DBMS must ensure
the atomicity of the transaction in this case.
• It should check whether the transaction can be completed now or it needs to
be rolled back.
• No transactions would be allowed to leave the DBMS in an inconsistent state.
There are two types of techniques, which can help a DBMS in recovering as well
as maintaining the atomicity of a transaction −
• Maintaining the logs of each transaction, and writing them onto some stable
storage before actually modifying the database.
• Maintaining shadow paging, where the changes are done on a volatile
memory, and later, the actual database is updated.
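The first technique, log-based recovery, can be sketched as a toy write-ahead log. The in-memory dictionary and the log record format here are illustrative stand-ins for real database pages and a stable-storage log file.

```python
# Toy write-ahead log: every change is appended to the log before the
# database itself is modified, so recovery can undo uncommitted work.
database = {"A": 100, "B": 200}
log = []   # stands in for the stable-storage log file

def write(txn_id, key, new_value):
    # Log the old and new values first, then modify the database.
    log.append((txn_id, key, database[key], new_value))
    database[key] = new_value

def commit(txn_id):
    log.append((txn_id, "COMMIT", None, None))

def recover():
    committed = {rec[0] for rec in log if rec[1] == "COMMIT"}
    # Undo, in reverse order, every change by an uncommitted transaction,
    # restoring atomicity: all of a transaction's effects survive, or none.
    for txn_id, key, old, _ in reversed(log):
        if key != "COMMIT" and txn_id not in committed:
            database[key] = old

write("T1", "A", 150)
commit("T1")
write("T2", "B", 999)   # the "crash" happens before T2 commits
recover()
print(database)  # {'A': 150, 'B': 200}
```

After recovery, the committed transaction T1 keeps its effect while the in-flight T2 is rolled back, which is the guarantee the bullet list above describes.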
What is RDBMS?
RDBMS stands for Relational Database Management System. RDBMS is the basis
for SQL, and for all modern database systems like MS SQL Server, IBM DB2,
Oracle, MySQL, and Microsoft Access.
A Relational database management system (RDBMS) is a database management
system (DBMS) that is based on the relational model as introduced by E. F. Codd.
What is a table?
The data in an RDBMS is stored in database objects called tables. A table is basically a collection of related data entries and consists of numerous columns and rows.
Remember, a table is the most common and simplest form of data storage in a
relational database. The following program is an example of a CUSTOMERS table
−
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
What is a field?
Every table is broken up into smaller entities called fields. The fields in the
CUSTOMERS table consist of ID, NAME, AGE, ADDRESS and SALARY.
A field is a column in a table that is designed to maintain specific information about
every record in the table.
What is a Record or a Row?
A record, also called a row of data, is each individual entry that exists in a table. For example, there are 7 records in the above CUSTOMERS table. Following is a single row of data, or record, in the CUSTOMERS table −
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
+----+----------+-----+-----------+----------+
A record is a horizontal entity in a table.
What is a column?
A column is a vertical entity in a table that contains all information associated with
a specific field in a table.
For example, a column in the CUSTOMERS table is ADDRESS, which represents
location description and would be as shown below −
+-----------+
| ADDRESS   |
+-----------+
| Ahmedabad |
| Delhi     |
| Kota      |
| Mumbai    |
| Bhopal    |
| MP        |
| Indore    |
+-----------+
Data Integrity
The following categories of data integrity exist with each RDBMS −
• Entity Integrity − There are no duplicate rows in a table.
• Domain Integrity − Enforces valid entries for a given column by restricting
the type, the format, or the range of values.
• Referential integrity − Rows that are referenced by other records cannot be
deleted.
• User-Defined Integrity − Enforces some specific business rules that do not
fall into entity, domain or referential integrity.
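Referential integrity in particular can be sketched with SQLite's foreign keys; the table names are illustrative, and note that SQLite enforces foreign keys only when the pragma below is set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only on request
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute("""
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id)
    )
""")
conn.execute("INSERT INTO customers VALUES (1, 'Ramesh')")
conn.execute("INSERT INTO orders VALUES (10, 1)")

# Referential integrity: the customer row cannot be deleted while an
# order row still references it.
try:
    conn.execute("DELETE FROM customers WHERE id = 1")
except sqlite3.IntegrityError as e:
    print("blocked:", e)
```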
Database Normalization
Database normalization is the process of efficiently organizing data in a database.
There are two reasons for this normalization process −
• Eliminating redundant data, for example, storing the same data in more than
one table.
• Ensuring data dependencies make sense.
Both of these are worthy goals, as they reduce the amount of space a database consumes and ensure that data is stored logically. Normalization consists of a series of guidelines that help you create a good database structure.
Normalization guidelines are divided into normal forms; think of a form as the
format or the way a database structure is laid out. The aim of normal forms is to
organize the database structure, so that it complies with the rules of first normal
form, then second normal form and finally the third normal form.
It is your choice to take it further and go to the fourth normal form, fifth normal
form and so on, but in general, the third normal form is more than enough.
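The redundancy-elimination goal can be sketched with two small tables; the schema and sample rows are illustrative assumptions, not from the original text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Unnormalized, a customer's city would be repeated on every order row,
# so a city change would have to be made in many places. Normalized,
# the customer data lives in its own table and each order carries only
# the customer's key.
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders    (id INTEGER PRIMARY KEY,
                            customer_id INTEGER REFERENCES customers(id),
                            amount REAL);
    INSERT INTO customers VALUES (1, 'Ramesh', 'Ahmedabad');
    INSERT INTO orders VALUES (10, 1, 2000.00), (11, 1, 350.00);
    -- One update now fixes the city for every order Ramesh placed.
    UPDATE customers SET city = 'Delhi' WHERE id = 1;
""")

row = conn.execute("""
    SELECT c.city, COUNT(o.id)
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id
""").fetchone()
print(row)  # ('Delhi', 2)
```

A single UPDATE keeps every order consistent, which is exactly the dependency hygiene the normal forms aim for.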
Data Warehouse
A data warehouse is a federated repository for all the data collected by an
enterprise's various operational systems, be they physical or logical. Data
warehousing emphasizes the capture of data from diverse sources for access and
analysis rather than for transaction processing.
Typically, a data warehouse is a relational database housed on an
enterprise mainframe server or, increasingly, in the cloud. Data from various
online transaction processing (OLTP) applications and other sources are selectively
extracted for business intelligence activities, decision support and to answer user
inquiries.
A data warehouse stores data that is extracted from operational data stores and external sources. The data records within the warehouse must contain enough detail to make them searchable and useful to business users. Taken together, there are three main components of data warehousing:
• data sources from operational systems, such as Excel, ERP, CRM or financial
applications;
• a data staging area where data is cleaned and ordered; and
• a presentation area where data is warehoused.
Data analysis tools, such as business intelligence software, access the data within
the warehouse. Data warehouses can also feed data marts, which are decentralized
systems in which data from the warehouse is organized and made available to
specific business groups, such as sales or inventory teams.
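The three components above can be sketched as a minimal extract-clean-present pipeline; the source records and the cleaning rules are illustrative assumptions.

```python
# Extract: raw records as an operational system might export them.
sources = [
    {"name": " Ramesh ", "sales": "2000"},
    {"name": "Khilan",   "sales": "1500"},
]

# Staging area: clean (trim names, convert types) and order the data.
staged = sorted(
    ({"name": r["name"].strip(), "sales": float(r["sales"])} for r in sources),
    key=lambda r: r["name"],
)

# Presentation area: the warehoused form that analysis tools query.
warehouse = {r["name"]: r["sales"] for r in staged}
print(warehouse)  # {'Khilan': 1500.0, 'Ramesh': 2000.0}
```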
Data Mining
Definition - What does Data Mining mean?
Data mining is the process of analyzing data from different perspectives to uncover hidden patterns and turn them into useful information. The data is collected and assembled in common areas, such as data warehouses, where data mining algorithms can analyze it efficiently, facilitating business decision making and other information requirements to ultimately cut costs and increase revenue.
Data mining is also known as data discovery and knowledge discovery.
For example, a user can request that data be analyzed to display a spreadsheet
showing all of a company's beach ball products sold in Florida in the month of
July, compare revenue figures with those for the same products in September and
then see a comparison of other product sales in Florida in the same time period.
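The beach-ball query above can be sketched as a filter-and-sum over sale records; the records and revenue figures are made up for illustration.

```python
# Illustrative sale records; product names and figures are invented.
sales = [
    {"product": "beach ball", "state": "FL", "month": "Jul", "revenue": 1200},
    {"product": "beach ball", "state": "FL", "month": "Sep", "revenue": 700},
    {"product": "umbrella",   "state": "FL", "month": "Jul", "revenue": 400},
    {"product": "beach ball", "state": "TX", "month": "Jul", "revenue": 900},
]

def revenue(product, state, month):
    """Total revenue for one product, state and month: one 'perspective'."""
    return sum(r["revenue"] for r in sales
               if r["product"] == product
               and r["state"] == state
               and r["month"] == month)

july = revenue("beach ball", "FL", "Jul")
september = revenue("beach ball", "FL", "Sep")
print(july, september)  # 1200 700
```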
How OLAP systems work
To facilitate this kind of analysis, data is collected from multiple data sources, stored in data warehouses, and then cleansed and organized into data cubes.
Each OLAP cube contains data categorized by dimensions (such as customers, geographic sales region and time period) derived from dimension tables in the data warehouse. Dimensions are then populated by members (such as customer names, countries and months) that are organized hierarchically. OLAP cubes are often pre-summarized across dimensions to drastically improve query time over relational databases.
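Pre-summarization across a dimension can be sketched with a dictionary keyed by the remaining dimensions; the fact rows and dimension names are illustrative assumptions.

```python
from collections import defaultdict

# Fact rows: (customer, region, month, amount); the values are invented.
facts = [
    ("Ramesh", "West", "Jan", 100),
    ("Khilan", "West", "Jan", 250),
    ("Ramesh", "East", "Feb", 300),
]

# Pre-summarize across the customer dimension: the cube stores one total
# per (region, month) cell, so a query is a single lookup, not a scan.
cube = defaultdict(float)
for customer, region, month, amount in facts:
    cube[(region, month)] += amount

print(cube[("West", "Jan")])  # 350.0
```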
Analysts can then perform five types of OLAP analytical operations against these multidimensional databases: roll-up, drill-down, slice, dice, and pivot.
OLAP products include IBM Cognos, Oracle OLAP and Oracle Essbase. OLAP features are also included in tools such as Microsoft Excel and Microsoft SQL Server's Analysis Services. OLAP products are typically designed for multiple-user environments, with the cost of the software based on the number of users.
OLTP (online transaction processing)
OLTP (online transaction processing) is a class of software programs capable of
supporting transaction-oriented applications on the Internet.
Typically, OLTP systems are used for order entry, financial transactions, customer
relationship management (CRM) and retail sales. Such systems have a large
number of users who conduct short transactions. Database queries are usually
simple, require sub-second response times and return relatively few records.
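A typical short OLTP transaction can be sketched with SQLite; the account table and the transfer amount are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [(1, 500.0), (2, 100.0)])

# A short OLTP transaction: transfer 50 from account 1 to account 2.
# The context manager commits on success and rolls back on an exception,
# so both updates take effect or neither does.
with conn:
    conn.execute("UPDATE accounts SET balance = balance - 50 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 50 WHERE id = 2")

balances = conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall()
print(balances)  # [(450.0,), (150.0,)]
```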