The Need of Data Analysis

A data warehouse is a database used for reporting and analysis. It stores data from operational systems and may pass through an operational data store for additional processing before being used in the data warehouse. A data warehouse maintains data in three layers: staging for raw data storage, integration for data transformation and abstraction, and access for user data retrieval. It provides cleaned, transformed and cataloged data to managers and analysts for tasks like data mining, online analytical processing, and decision support.

Uploaded by

Venkat Reddy A
Copyright
© Attribution Non-Commercial (BY-NC)

A data warehouse (DW) is a database used for reporting. Data is offloaded from the operational
systems and may pass through an operational data store for additional operations before it is used
in the DW for reporting.

A data warehouse maintains its functions in three layers: staging, integration, and access. Staging is used
to store raw data for use by developers (analysis and support). The integration layer is used to integrate
data and to have a level of abstraction from users. The access layer is for getting data out for users.
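
As a minimal sketch of the three layers described above, the following Python fragment passes a few
rows through staging, integration and access. The field names, the sample rows and the cleaning rules
are illustrative assumptions, not part of any particular warehouse product.

```python
# Hypothetical three-layer flow: staging -> integration -> access.
# All names and sample rows are illustrative assumptions.

# Staging layer: raw rows exactly as offloaded from an operational system.
staging = [
    {"store": " north ", "amount": "120.50"},
    {"store": "South", "amount": "80"},
    {"store": "north", "amount": "29.50"},
]


def integrate(raw_rows):
    """Integration layer: clean and standardize the raw rows."""
    cleaned = []
    for row in raw_rows:
        cleaned.append({
            "store": row["store"].strip().upper(),  # one spelling per store
            "amount": float(row["amount"]),         # text -> number
        })
    return cleaned


def access_totals(rows):
    """Access layer: expose a simple per-store total to end users."""
    totals = {}
    for row in rows:
        totals[row["store"]] = totals.get(row["store"], 0.0) + row["amount"]
    return totals


integrated = integrate(staging)
report = access_totals(integrated)
print(report)
```

Users only see the result of `access_totals`; the raw staging rows and the cleaning rules stay hidden
behind the integration layer.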

This definition of the data warehouse focuses on data storage. The main source of the data is cleaned,
transformed, catalogued and made available for use by managers and other business professionals for
data mining, online analytical processing, market research and decision support (Marakas & O'Brien
2009). However, the means to retrieve and analyze data, to extract, transform and load data, and to
manage the data dictionary are also considered essential components of a data warehousing system.
Many references to data warehousing use this broader context. Thus, an expanded definition for data
warehousing includes business intelligence tools, tools to extract, transform and load data into the
repository, and tools to manage and retrieve metadata.

THE NEED OF DATA ANALYSIS

Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of
highlighting useful information, suggesting conclusions, and supporting decision making. Data analysis
has multiple facets and approaches, encompassing diverse techniques under a variety of names, in
different business, science, and social science domains.

Data mining is a particular data analysis technique that focuses on modeling and knowledge discovery for
predictive rather than purely descriptive purposes. Business intelligence covers data analysis that relies
heavily on aggregation, focusing on business information. In statistical applications, some people divide
data analysis into descriptive statistics, exploratory data analysis (EDA), and confirmatory data analysis
(CDA). EDA focuses on discovering new features in the data and CDA on confirming or falsifying existing
hypotheses. Predictive analytics focuses on the application of statistical or structural models for
predictive forecasting or classification, while text analytics applies statistical, linguistic, and
structural techniques to extract and classify information from textual sources, a species of unstructured
data. All are varieties of data analysis.

Data integration is a precursor to data analysis, and data analysis is closely linked to data
visualization and data dissemination. The term data analysis is sometimes used as a synonym for data
modeling, which is unrelated to the subject of this document.

DECISION SUPPORT SYSTEMS:

A decision support system (DSS) is a computer-based information system that supports business or
organizational decision-making activities. DSSs serve the management, operations, and planning levels
of an organization and help to make decisions, which may be rapidly changing and not easily specified in
advance.

DSSs include knowledge-based systems. A properly designed DSS is an interactive software-based
system intended to help decision makers compile useful information from a combination of raw data,
documents, personal knowledge, or business models to identify and solve problems and make decisions.

Typical information that a decision support application might gather and present includes:

• inventories of information assets (including legacy and relational data sources, cubes, data
warehouses, and data marts),
• comparative sales figures between one period and the next,
• projected revenue figures based on product sales assumptions.

ONLINE ANALYTICAL PROCESSING:

In computing, online analytical processing, or OLAP, is an approach to swiftly answering
multi-dimensional analytical queries. OLAP is part of the broader category of business intelligence,
which also encompasses relational reporting and data mining. Typical applications of OLAP include
business reporting for sales, marketing, management reporting, business process management (BPM),
budgeting and forecasting, financial reporting and similar areas, with new applications coming up, such
as agriculture. The term OLAP was created as a slight modification of the traditional database
term OLTP (Online Transaction Processing).

Databases configured for OLAP use a multidimensional data model, allowing for complex analytical
and ad-hoc queries with a rapid execution time. They borrow aspects of navigational databases and
hierarchical databases that are faster than relational databases.

The output of an OLAP query is typically displayed in a matrix (or pivot) format. The dimensions form the
rows and columns of the matrix; the measures form the values.
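
The matrix layout can be illustrated with a small Python sketch. It is not an OLAP engine; it only
shows how two dimensions (a hypothetical region and quarter) become rows and columns while a measure
(hypothetical sales) fills the cells. All sample figures are made up for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, quarter, sales).
# Region and quarter are the dimensions; sales is the measure.
facts = [
    ("East", "Q1", 100),
    ("East", "Q2", 150),
    ("West", "Q1", 80),
    ("West", "Q2", 120),
    ("East", "Q1", 50),
]


def pivot(rows):
    """Sum the measure into a {row_dimension: {column_dimension: total}} matrix."""
    matrix = defaultdict(lambda: defaultdict(int))
    for region, quarter, sales in rows:
        matrix[region][quarter] += sales
    return {region: dict(cols) for region, cols in matrix.items()}


table = pivot(facts)
quarters = sorted({quarter for _, quarter, _ in facts})

# Print the matrix: one row per region, one column per quarter.
print("Region".ljust(8) + "".join(q.rjust(6) for q in quarters))
for region in sorted(table):
    cells = "".join(str(table[region].get(q, 0)).rjust(6) for q in quarters)
    print(region.ljust(8) + cells)
```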

STAR SCHEMAS:

The star schema (also called star-join schema, data cube, or multi-dimensional schema) is the simplest
style of data warehouse schema. The star schema consists of one or more fact tables referencing any
number of dimension tables. The star schema is considered an important special case of the snowflake
schema, and is more effective for handling simpler queries.

EXAMPLE:

Consider a database of sales, perhaps from a store chain, classified by date, store and product. The
image of the schema to the right is a star schema version of the sample schema provided in
the snowflake schema article.
Fact.Sales is the fact table and there are three dimension
tables: Dim.Date, Dim.Store and Dim.Product.

Each dimension table has a primary key on its PK column, relating to one of the columns (viewed as rows
in the example schema) of the Fact.Sales table's three-column (compound) primary key
(Date_FK, Store_FK, Product_FK). The non-primary key [Units Sold] column of the fact table in
this example represents a measure or metric that can be used in calculations and analysis. The
non-primary key columns of the dimension tables represent additional attributes of the dimensions
(such as the Year of the Dim.Date dimension).

Using schema descriptors with dot-notation, combined with simple suffix decorations for column
differentiation, makes it easier to write the SQL for Star Schema queries. This is because fewer
underscores are required and table aliasing is minimized.

Most SQL database engines allow schema descriptors and also permit decoration suffixes on
surrogate key columns. Square brackets, which are physically easier to type on the keyboard (no
shift key needed), are not intrusive and make the code easier to read.
For example, the following query extracts how many TV sets have been sold, for each brand and country,
in 1997:
-- Units sold per brand and country for TV sets in 1997
SELECT Brand, Country, SUM([Units Sold])
FROM Fact.Sales
-- Join each dimension through its foreign key in the fact table
JOIN Dim.Date
  ON Date_FK = Date_PK
JOIN Dim.Store
  ON Store_FK = Store_PK
JOIN Dim.Product
  ON Product_FK = Product_PK
WHERE [Year] = 1997
  AND [Product Category] = 'tv'
GROUP BY Brand, Country
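
To try the query without a full database server, the sketch below builds a tiny version of the schema
in an in-memory SQLite database from Python. Several things here are assumptions made for the sake of
a runnable example: the tables are named Fact_Sales, Dim_Date, Dim_Store and Dim_Product (underscores
instead of schema-dot notation), the column types are guesses, the brands and figures are made up, and
an ORDER BY is added so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical miniature star schema; names and types are assumptions.
conn.executescript("""
CREATE TABLE Dim_Date    (Date_PK INTEGER PRIMARY KEY, [Year] INTEGER);
CREATE TABLE Dim_Store   (Store_PK INTEGER PRIMARY KEY, Country TEXT);
CREATE TABLE Dim_Product (Product_PK INTEGER PRIMARY KEY, Brand TEXT,
                          [Product Category] TEXT);
CREATE TABLE Fact_Sales (
    Date_FK    INTEGER REFERENCES Dim_Date(Date_PK),
    Store_FK   INTEGER REFERENCES Dim_Store(Store_PK),
    Product_FK INTEGER REFERENCES Dim_Product(Product_PK),
    [Units Sold] INTEGER,
    PRIMARY KEY (Date_FK, Store_FK, Product_FK)
);

-- Made-up sample data
INSERT INTO Dim_Date    VALUES (1, 1997), (2, 1998);
INSERT INTO Dim_Store   VALUES (1, 'USA'), (2, 'Canada');
INSERT INTO Dim_Product VALUES (1, 'Acme', 'tv'), (2, 'Acme', 'radio'),
                               (3, 'Zenit', 'tv');
INSERT INTO Fact_Sales  VALUES (1, 1, 1, 10), (1, 2, 1, 4), (1, 1, 3, 7),
                               (1, 1, 2, 99), (2, 1, 1, 50);
""")

# Same shape as the query in the text, adapted to the table names above.
rows = conn.execute("""
SELECT Brand, Country, SUM([Units Sold]) AS Units
FROM Fact_Sales
JOIN Dim_Date    ON Date_FK = Date_PK
JOIN Dim_Store   ON Store_FK = Store_PK
JOIN Dim_Product ON Product_FK = Product_PK
WHERE [Year] = 1997
  AND [Product Category] = 'tv'
GROUP BY Brand, Country
ORDER BY Brand, Country
""").fetchall()

for brand, country, units in rows:
    print(brand, country, units)
```

The radio sale and the 1998 sale are filtered out by the WHERE clause, so only the three TV rows for
1997 contribute to the totals.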

DATA MINING:

Data mining, a branch of computer science and artificial intelligence, is the process of extracting patterns
from data. Data mining is seen as an increasingly important tool by modern business to transform data
into business intelligence giving an informational advantage. It is currently used in a wide range
of profiling practices, such as marketing, surveillance, fraud detection, and scientific discovery.

The related terms data dredging, data fishing and data snooping refer to the use of data mining
techniques to sample portions of the larger population data set that are (or may be) too small for reliable
statistical inferences to be made about the validity of any patterns discovered (see also data-snooping
bias). These techniques can, however, be used in the creation of new hypotheses to test against the
larger data populations.

Data mining commonly involves four classes of tasks:

• Clustering - the task of discovering groups and structures in the data that are in some way or
another "similar", without using known structures in the data.
• Classification - the task of generalizing known structure to apply to new data. For example, an
email program might attempt to classify an email as legitimate or spam. Common algorithms include
decision tree learning, nearest neighbor, naive Bayesian classification, neural networks and support
vector machines.
• Regression - attempts to find a function which models the data with the least error.
• Association rule learning - searches for relationships between variables. For example, a
supermarket might gather data on customer purchasing habits. Using association rule learning, the
supermarket can determine which products are frequently bought together and use this information
for marketing purposes. This is sometimes referred to as market basket analysis.
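
The market basket idea can be sketched in a few lines of Python. This is only a pair-counting
illustration, not a full association rule algorithm such as Apriori; the baskets are made up, and the
`support` and `confidence` helpers are hypothetical names introduced here.

```python
from collections import Counter
from itertools import combinations

# Made-up shopping baskets, one set of products per customer visit.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
    {"bread", "milk"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    # Sort so that each pair is always counted under the same key.
    pair_counts.update(combinations(sorted(basket), 2))


def support(pair):
    """Fraction of all baskets that contain both items of the pair."""
    return pair_counts[tuple(sorted(pair))] / len(baskets)


def confidence(antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]


print(support(("bread", "butter")))
print(confidence("bread", "butter"))
print(confidence("butter", "bread"))
```

A pair with high support and high confidence, such as bread and butter in this toy data, is the kind
of relationship a supermarket might act on.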

DATABASE ADMINISTRATION:

Database administration is the function of managing and maintaining database management
systems (DBMS) software. Mainstream DBMS software such as Oracle, IBM DB2 and Microsoft SQL
Server need ongoing management. As such, corporations that use DBMS software often hire specialized
IT (Information Technology) personnel called Database Administrators or DBAs.

DBA Responsibilities

• Install, configure and upgrade database server software and related products.
• Evaluate database features and database-related products.
• Establish and maintain sound backup and recovery policies and procedures.
• Take care of database design and implementation.
• Implement and maintain database security (create and maintain users and roles, assign
privileges).
• Tune the database and monitor its performance.
• Tune applications and monitor their performance.
• Set up and maintain documentation and standards.
• Plan growth and changes (capacity planning).
• Work as part of a team and provide 24x7 support when required.
• Do general technical troubleshooting and give consultation to development teams.
• Interface with the DBMS vendor (for example Oracle or Microsoft) for technical support.

Role of the Database in an Organization

An organization is traditionally viewed as a three-level pyramid: operational activities at the
bottom, management planning and control activities in the middle, and strategic planning and policy
making at the top. The corporate database contains data relating to the organization, its operations,
its plans and its environment.

The needs of organizations and management are changeable, diverse and often ill-defined, yet they must
be met. Added to these are outside pressures from federal taxing authorities, federal securities agencies
and legislators making privacy laws. Both internal and external forces demand that organizations exercise
control over their data resources. 

Decisions and actions in the organization are based upon the image contained in the corporate database.
Managerial decisions direct the actions at the operational level and produce plans and expectations which
are formally captured and stored in the corporate database. Transactions record actual results of
organizational activities and environmental changes and update the database to maintain a current image.

Data - A Valuable Corporate Asset

To avoid data inaccuracies and the potential for disasters, there must be a corporate-wide awareness of
data quality and a recognition of the importance of corporate data.

There are three critical success factors that each company needs to identify before moving
forward with the issue of data quality:

1. Commitment by senior management to the quality of corporate data
2. Definition of data quality
3. Quality assurance of data

Senior management commitment to maintaining the quality of corporate data can be achieved by
instituting a data administration department that oversees data management standards, policies,
procedures, and guidelines.

Defining Data Quality

Quality data is data that is complete, timely, accurate, valid, and consistent. The
definition of data quality must describe the degree of quality required for each element loaded
into the data warehouse.
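
A per-element quality definition can be turned into simple automated checks. The Python sketch below
covers only three of the properties named above (completeness, validity and a basic accuracy test).
The field names, the list of valid countries and the `quality_issues` helper are all hypothetical and
would be replaced by whatever an organization actually agrees on.

```python
from datetime import date

# Hypothetical quality rules for one kind of record.
REQUIRED_FIELDS = ("customer_id", "country", "signup_date")
VALID_COUNTRIES = {"US", "CA", "MX"}


def quality_issues(record, today):
    """Return a list of human-readable quality problems found in the record."""
    issues = []

    # Completeness: every required field must have a value.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            issues.append(f"incomplete: {field} is missing")

    # Validity: the country must come from the agreed reference list.
    country = record.get("country")
    if country and country not in VALID_COUNTRIES:
        issues.append(f"invalid: unknown country {country!r}")

    # Accuracy (basic plausibility): a signup date cannot lie in the future.
    signup = record.get("signup_date")
    if signup and signup > today:
        issues.append("inaccurate: signup_date is in the future")

    return issues


today = date(2024, 1, 1)
good = {"customer_id": 1, "country": "US", "signup_date": date(2020, 1, 5)}
bad = {"customer_id": 2, "country": "ZZ", "signup_date": None}

print(quality_issues(good, today))
print(quality_issues(bad, today))
```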

The quality assurance of data refers to the verification of the accuracy and correction of the data,
if necessary, and this may involve cleansing of existing data. Since no company is able to rectify
all of its unclean data, procedures have to be put in place to ensure data quality at the source.

This task can only be achieved by modifying business processes and designing data quality into
the system. In identifying every data item and its usefulness to the ultimate users of this data,
data quality requirements can be established. Increasing the quality of data as an after-the-fact
task is five to ten times more costly than capturing it correctly at the source.

If companies want to use a data warehouse for competitive advantage and reap its benefits, data
quality is extremely important. Only when data quality is recognized as a corporate asset by
every member of the organization will the benefits of data warehousing and CRM initiatives be
realized.

Unreliable and inaccurate data in the data warehouse causes numerous problems. First and
foremost, the confidence of the users in the validity and reliability of this technology will be
seriously impaired. Furthermore, if the data is used for strategic decision making, unreliable data
affects the entire organization, and will affect senior management’s view of the data warehousing
project.

Erroneous Data

An excellent example of the damage that can be caused by erroneous data occurred in the early
1980s, when the banks had incorrect risk exposure data on Texas-based businesses. When the oil
market slumped those banks that had many Texas accounts encountered major losses. In other
cases, manufacturing firms scaled down their operations and took actions to eliminate excess
inventory. Because they had inaccurate market data, they had overestimated the inventory and
sold off critical business equipment.

Erroneous data should be captured and corrected before it enters the warehouse. Capture and
correction are handled programmatically in the process of transforming data from one system to
the data warehouse. An example might be a field that was in lowercase that needs to be stored in
uppercase. A final means of handling errors is to replace erroneous data with a default value. If,
for example, the date February 29 of a non-leap year is defaulted to February 28, there is no loss
in data integrity.
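
The two corrections mentioned above (uppercasing a field and defaulting an impossible date) might look
like this in a transformation step. This is a sketch under assumptions: the function names are
hypothetical, and the rule of falling back to the last valid day of the month is one possible
defaulting policy that reproduces the February 28 example.

```python
import calendar
from datetime import date


def clean_name(value):
    """Store text fields trimmed and in uppercase."""
    return value.strip().upper()


def clean_date(year, month, day):
    """Return a valid date, defaulting an out-of-range day to the month's last day."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day, last_day))


print(clean_name("  houston "))
print(clean_date(2023, 2, 29))  # 2023 is not a leap year
print(clean_date(2024, 2, 29))  # 2024 is a leap year, so the date is kept
```

In practice it may also be worth logging each defaulted value so the source system can be corrected.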

Consistency of Data

Data quality is of extreme importance in a data warehousing project, and the challenge for data
managers is to ensure the consistency of data entering the system. In some organizations, data is
stored in so-called 'flat' files and a variety of relational databases. In addition, different systems
designed for different functions contain the same terms but with different meanings.

If care is not taken to clean up this terminology during data warehouse construction, misleading
management information results. The logical consequence is that management has to agree on the data
definitions for elements in the warehouse.
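
One way to enforce agreed definitions during loading is an explicit mapping from each source system's
local term to the warehouse term. The sketch below is purely illustrative: the system names, the
status codes and their meanings are invented to show how the same code can mean different things in
different systems.

```python
# Hypothetical agreed definitions: (source system, local code) -> warehouse term.
CANONICAL_STATUS = {
    ("billing", "A"): "ACTIVE",
    ("billing", "C"): "CLOSED",
    ("crm", "A"): "ARCHIVED",  # same code as billing, different meaning
    ("crm", "O"): "ACTIVE",
}


def to_warehouse_status(system, code):
    """Translate a source-system code into the agreed warehouse term."""
    try:
        return CANONICAL_STATUS[(system, code)]
    except KeyError:
        # Refuse to guess: an unmapped code needs a management decision first.
        raise ValueError(f"no agreed definition for {system!r} code {code!r}") from None


print(to_warehouse_status("billing", "A"))
print(to_warehouse_status("crm", "A"))
```

Raising an error for unmapped codes, rather than passing them through, keeps undefined terms out of
the warehouse until a definition has been agreed.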

Important Role Of Databases In An Organization Or Company

A database is a collection of information stored in a computer in a systematic way so that a
computer program can query it to obtain information. The software used to manage and query the
database is called a database management system (DBMS). Database systems are studied in
information science.

The term database originated in computer science, although its meaning has since broadened to
include things outside the field of electronics. Note that records similar to a database already
existed before the industrial revolution, in the form of ledgers, receipts and collections of
business data.

The basic concept of a database is a collection of records, or pieces of knowledge. A structured
database has a description of the types of facts stored in it; this description is called a
schema. The schema describes the objects represented in the database and the relationships
between them. There are many ways to organize a schema, that is, to model the database
structure; these are known as database models or data models.

The database has a crucial role in an organization and is used for a number of objectives that
support the organization's main goals. The primary roles of the database are as follows:

• Availability: The database must be organized so that data is always available when needed.
The physical data files do not have to be stored at one location; with computer network
technology they remain logically available to users.
• Speed and convenience: The database must ensure that data can be accessed easily and
quickly when necessary.
• Completeness: The data stored in the database must be complete enough to serve the needs
of all its users. Because completeness is relative to each user's needs, the database should
make it easy to add data and to modify data structures, for example by adding data files.
• Accuracy: The data in the database files is organized so as to suppress errors at the time
of entry and in storage.
• Security: A good database system must provide facilities to secure data so that it cannot
be accessed, modified or removed by people who have not been given the rights to do so. The
database system should be able to determine who is and who is not allowed to access the data.
• Shared use: The database is designed to be used by various work units and is not limited
to one user, one location or one application.
• Efficiency of storage: The database is organized so as to avoid duplication of data,
because duplication increases the storage space required. Coding systems and data
relationships applied to the database can also save storage space.

In an enterprise, the role of the database is essential. Information can be obtained quickly because
the underlying data has been stored in the database. For example, the mechanism for withdrawing
money at an ATM (automated teller machine) relies on decisions based on the database. First, the
system validates the legitimacy of the cardholder by checking the password provided by that
person: the password entered is matched against the password in the database.

If they match, the next step is to check the balance recorded in the database against the amount
to be withdrawn. If the balance is sufficient, the machine issues the money. The database also
makes applications such as online KRS possible, which enable students to register for courses via
the Internet.
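
The ATM checks described above can be sketched with an in-memory SQLite database. Everything here is
a simplified assumption: the `accounts` table, its columns, the sample card and the `withdraw`
function are hypothetical, and the password is stored in plain text only to keep the example short
(a real system would never do that).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical accounts table with one made-up account.
conn.execute(
    "CREATE TABLE accounts (card_no TEXT PRIMARY KEY, password TEXT, balance INTEGER)"
)
conn.execute("INSERT INTO accounts VALUES ('1234', '4321', 500)")
conn.commit()


def withdraw(conn, card_no, password, amount):
    """Validate the cardholder, check the balance, then record the withdrawal."""
    row = conn.execute(
        "SELECT password, balance FROM accounts WHERE card_no = ?", (card_no,)
    ).fetchone()

    # Step 1: the password entered must match the one in the database.
    if row is None or row[0] != password:
        return "rejected: invalid card or password"

    # Step 2: the recorded balance must cover the requested amount.
    if amount <= 0:
        return "rejected: invalid amount"
    if amount > row[1]:
        return "rejected: insufficient balance"

    # Step 3: update the balance; 'with conn' commits the change on success.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE card_no = ?",
            (amount, card_no),
        )
    return "dispensed"


print(withdraw(conn, "1234", "0000", 100))
print(withdraw(conn, "1234", "4321", 200))
print(withdraw(conn, "1234", "4321", 400))
```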

I think the database is useful not only in an organization or company but also for personal
purposes. With database software such as Microsoft Access, one can manage personal data, such
as telephone numbers and monthly spending, and obtain the information easily and quickly when
needed.

The functional areas of an organization or company in which database systems are generally
applied for the sake of efficiency, security, accuracy, speed and ease of data management
include:

• Human resources (personnel)
• Warehousing (inventory)
• Accounting (financial)
• Reservation (booking tickets, hotel rooms, etc.)
• Customer service
• Sales (point of sale in supermarkets)
• etc.

Various organizations have implemented databases in their information systems and managed
to improve their performance, among others:

• Banking
• Insurance
• Education / schools
• Supermarkets
• Hospitals
• Travel agencies
• Industrial / manufacturing
• Telecommunications
• etc.

These are some of my descriptions and opinions about the important role of databases in an
organization or company.

Oracle Database Administration Tools

Aqua Data Studio provides database administration tools for the Oracle database. To access the Oracle
DBA tools, use the application menu under DBA Tools->Oracle->[Tool]. You may also open a tool from
the context popup menu in the schema browser by selecting DBA Tools->[Name].

There are eight tools for managing every aspect of an Oracle database:

• Instance Manager: Manages the Oracle instance, allowing the user to view and modify server
parameters and to monitor and back up the Oracle controlfile.
• Storage Manager: Manages Oracle tablespaces and datafiles, allowing the user to visualize and
maintain storage, including object and file I/O statistics.
• Rollback Manager: Monitors and maintains rollback segments, including current statements,
transactions and execution plans.
• Log Manager: Manages redo logs and archive logs, allowing users to create and manage redo logs
and to monitor archive logs.
• Security Manager: Manages users, roles and profiles, allowing the user to manage permissions,
roles and security of the Oracle database.
• Session Manager: Manages database sessions, including user and system locks, allowing the user
to kill or disconnect sessions, start traces, and monitor open cursors and user queries with
execution plans.
• SGA Manager: Manages the Oracle SGA, including SQL Area, Lib Cache, Lib Cache Stats and a
summary of the SGA, and allows users to pin and unpin code.
• Server Statistics: Provides a summary of statistics for the Oracle instance, waits and latches.
