IT0009 Reviewer
What is a Cursor?
• A cursor is a pointer to a private SQL work area (context area) that holds the rows returned by a SELECT statement (the active set) and keeps track of the row currently being processed.
Now that you have a conceptual understanding of cursors, review the steps to use them (a minimal PL/SQL example follows this list):
• DECLARE the cursor in the declarative section by naming it and defining the SQL SELECT
statement to be associated with it.
• OPEN the cursor.
– This will populate the cursor's active set with the results of the SELECT statement in the
cursor's definition.
– The OPEN statement also positions the cursor pointer at the
first row.
• FETCH each row from the active set and load the data into
variables.
– After each FETCH, an EXIT WHEN condition checks whether the FETCH reached the end of the active set (the cursor's %NOTFOUND attribute becomes TRUE).
– If the end of the active set was reached, the LOOP is exited.
• CLOSE the cursor.
– The CLOSE statement releases the active set of rows.
– It is now possible to reopen the cursor to establish a fresh
active set using a new OPEN statement.
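A minimal PL/SQL sketch of the four steps above; the employees table and its employee_id and last_name columns are assumed for illustration:

DECLARE
  -- Step 1: DECLARE the cursor by naming it and defining its SELECT statement
  CURSOR emp_cursor IS
    SELECT employee_id, last_name
      FROM employees;
  v_empid    employees.employee_id%TYPE;
  v_lastname employees.last_name%TYPE;
BEGIN
  OPEN emp_cursor;                              -- Step 2: populate the active set and point at the first row
  LOOP
    FETCH emp_cursor INTO v_empid, v_lastname;  -- Step 3: load the current row into variables
    EXIT WHEN emp_cursor%NOTFOUND;              -- leave the loop once the active set is exhausted
    DBMS_OUTPUT.PUT_LINE(v_empid || ' ' || v_lastname);
  END LOOP;
  CLOSE emp_cursor;                             -- Step 4: release the active set
END;
/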
What Is a Procedure?
Differences Between Procedures and Functions
Procedures
• You create a procedure to store a series of actions for later
execution.
• A procedure does not have to return a value.
• A procedure can call a function to assist with its actions.
• Note: A procedure containing a single OUT parameter
might be better rewritten as a function returning the
value.
Functions
• You create a function when you want to compute a value
that must be returned to the calling environment.
• Functions return only a single value, and the value is
returned through a RETURN statement.
• Functions used in SQL statements cannot use OUT or IN OUT parameter modes.
• Although a function using OUT can be invoked from a
PL/SQL procedure or anonymous block, it cannot be used
in SQL statements.
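A minimal PL/SQL sketch of the difference; the names raise_salary and get_annual_salary and the employees table columns are assumptions for illustration:

-- A procedure stores a series of actions; it does not have to return a value.
CREATE OR REPLACE PROCEDURE raise_salary (p_empid IN NUMBER, p_pct IN NUMBER) IS
BEGIN
  UPDATE employees
     SET salary = salary * (1 + p_pct / 100)
   WHERE employee_id = p_empid;
END raise_salary;
/

-- A function computes a single value and returns it through a RETURN statement.
CREATE OR REPLACE FUNCTION get_annual_salary (p_empid IN NUMBER) RETURN NUMBER IS
  v_salary employees.salary%TYPE;
BEGIN
  SELECT salary INTO v_salary FROM employees WHERE employee_id = p_empid;
  RETURN v_salary * 12;
END get_annual_salary;
/

-- Because it uses only IN parameters, the function can also be called from SQL:
-- SELECT get_annual_salary(100) FROM dual;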
Removing Packages
• Use DROP PACKAGE package_name to remove both the package specification and the body, or DROP PACKAGE BODY package_name to remove only the package body.
What Is a Trigger?
• A database trigger:
– Is a PL/SQL block associated with a specific action (an event), such as a successful logon by a user, or an action taken on a database object such as a table or view
– Executes automatically whenever the associated action occurs
– Is stored in the database
• For example, a trigger can be associated with this action: UPDATE OF salary ON employees (a minimal example appears after the next list)
Types of Triggers
Triggers can be either row-level or statement-level.
• A row-level trigger fires once for each row affected by the triggering statement.
• A statement-level trigger fires once for the whole statement.
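A minimal sketch of a row-level trigger for the action mentioned above (UPDATE OF salary ON employees); the trigger name and the salary_audit table are assumptions for illustration:

-- Row-level trigger: fires once for each row affected by the triggering UPDATE
CREATE OR REPLACE TRIGGER audit_salary_change
  AFTER UPDATE OF salary ON employees
  FOR EACH ROW          -- omit this clause to make it a statement-level trigger
BEGIN
  INSERT INTO salary_audit (employee_id, old_salary, new_salary, changed_on)
  VALUES (:OLD.employee_id, :OLD.salary, :NEW.salary, SYSDATE);
END;
/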
Data Warehouse Architectures
• Simple. All data warehouses share a basic design in which metadata, summary data,
and raw data are stored within the central repository of the warehouse. The
repository is fed by data sources on one end and accessed by end users for analysis,
reporting, and mining on the other end.
• Simple with a staging area. Operational data must be cleaned and processed before
being put in the warehouse. Although this can be done programmatically, many data
warehouses add a staging area for data before it enters the warehouse, to simplify
data preparation.
• Hub and spoke. Adding data marts between the central repository and end users
allows an organization to customize its data warehouse to serve various lines of
business. When the data is ready for use, it is moved to the appropriate data mart.
• Sandboxes. Sandboxes are private, secure, safe areas that allow companies to quickly
and informally explore new datasets or ways of analyzing data without having to
conform to or comply with the formal rules and protocol of the data warehouse.
Middle Tier
The Middle Tier consists of the OLAP servers.
OLAP stands for Online Analytical Processing.
OLAP servers are used to provide information to business analysts and managers.
Bottom Tier
The Bottom Tier mainly consists of the Data Sources, ETL Tool, and Data Warehouse.
2. Warehouse Manager
• Warehouse manager performs operations associated with the
management of the data in the warehouse.
• It performs operations such as analysis of data to ensure consistency, creation of indexes and views, generation of denormalizations and aggregations, transformation and merging of source data, and archiving and backing up of data.
3. Query Manager
• The query manager is also known as the backend component.
• It performs all the operations related to the management of user queries.
• The operations performed by this data warehouse component include directing queries to the appropriate tables and scheduling the execution of queries.
4. End-user Access Tools
• End-user access tools are categorized into five groups:
o Data reporting tools
o Query tools
o Application development tools
o EIS (Executive Information System) tools
o OLAP tools and data mining tools
Data Mart
• A data mart is a subset of the data warehouse. It is specially designed for a particular line of business, such as sales or finance. In an independent data mart, data can be collected directly from the sources.
• A data mart is a scaled-down version of a data warehouse aimed at
meeting the information needs of a homogeneous small group of end
users such as a department or business unit (marketing, finance,
logistics, or human resources). It typically contains some form of
aggregated data and is used as the primary source for report
generation and analysis by this end user group.
Schema
• A schema is a logical description of the entire database.
• In a data warehouse, the schema includes the names and descriptions of records.
• It covers all data items and also the different aggregates associated with the data.
Star Schema
• It is known as star schema as its structure resembles a star.
• The star schema is the simplest type of Data Warehouse schema.
• It is also known as Star Join Schema and is optimized for querying large data sets.
• The center of the star holds one fact table, surrounded by a number of associated dimension tables.
Fact Table
• A Fact table in a Data Warehouse system is nothing but the table that contains
all the facts or the business information, which can be subjected to analysis
and reporting activities when required.
• These tables hold fields that represent the direct facts (measures), as well as the foreign key fields used to connect the fact table with the dimension tables in the Data Warehouse system.
• A Data Warehouse system can have one or more fact tables, depending on the
model type used to design the Data Warehouse.
Dimension Table
• A dimension is a collection of reference information about a measurable event (a fact) in the fact table.
• The primary key column of the dimension table uniquely identifies each dimension record (row).
• Dimension tables are organized into descriptive attributes. For example, a customer dimension's attributes could include first and last name, birth date, gender, qualification, address, etc.
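A minimal SQL sketch of a star schema; the sales fact table, the dimension tables, and all column names below are hypothetical:

-- Dimension tables: descriptive attributes, one row per member
CREATE TABLE dim_customer (
  customer_key  NUMBER PRIMARY KEY,
  first_name    VARCHAR2(50),
  last_name     VARCHAR2(50),
  birth_date    DATE,
  gender        VARCHAR2(1)
);

CREATE TABLE dim_date (
  date_key      NUMBER PRIMARY KEY,
  calendar_date DATE,
  month_name    VARCHAR2(10),
  year_no       NUMBER
);

-- Fact table at the center of the star: measures plus foreign keys to the dimensions
CREATE TABLE fact_sales (
  date_key      NUMBER REFERENCES dim_date (date_key),
  customer_key  NUMBER REFERENCES dim_customer (customer_key),
  quantity_sold NUMBER,
  sales_amount  NUMBER(12,2)
);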
Snowflake Schema
• A snowflake schema is an extension of the star schema in which the dimension tables are connected to one or more additional dimension tables.
• The dimension tables are normalized, which splits the data into additional tables.
• The snowflake schema keeps the same fact table structure as the star schema.
• A dimension can have multiple levels organized into multiple hierarchies, and any one level from each hierarchy can be attached to the fact table.
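A minimal sketch of snowflaking, reusing the hypothetical customer dimension from the star schema sketch above: the city details are normalized out into their own table.

-- Normalized (snowflaked) dimension: city attributes split out of the customer dimension
CREATE TABLE dim_city (
  city_key   NUMBER PRIMARY KEY,
  city_name  VARCHAR2(50),
  country    VARCHAR2(50)
);

CREATE TABLE dim_customer_sf (
  customer_key NUMBER PRIMARY KEY,
  first_name   VARCHAR2(50),
  last_name    VARCHAR2(50),
  city_key     NUMBER REFERENCES dim_city (city_key)  -- link to the next level of the hierarchy
);
-- The fact table keeps the same structure as in the star schema.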
Galaxy Schema
• A galaxy schema (also known as a fact constellation schema) contains multiple fact tables that share common dimension tables.
OLAP Cube
An OLAP Cube is a data structure that allows fast analysis of data according to the multiple
Dimensions that define a business problem.
Drill-down – Perform the analysis at deeper levels among the dimensions of the data. For example, drilling down from "time period" to "years" and "months" and to "days", and so on, to plot sales growth for a product.
Slice – Take one level of information along a single dimension for display, such as "sales in 2019."
Dice – Select data from multiple dimensions to analyze, such as "sales of Laptop in Region 4 in 2019."
Pivot – Gain a new view of the data by rotating the data axes of the cube.
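Relational sketches of a slice and a dice over the hypothetical fact_sales star schema above; dim_product and dim_region (and the corresponding keys on the fact table) are additional assumed dimensions:

-- Slice: fix one dimension at a single value ("sales in 2019")
SELECT SUM(f.sales_amount)
  FROM fact_sales f
  JOIN dim_date d ON d.date_key = f.date_key
 WHERE d.year_no = 2019;

-- Dice: select values across several dimensions ("sales of Laptop in Region 4 in 2019")
SELECT SUM(f.sales_amount)
  FROM fact_sales f
  JOIN dim_date    d ON d.date_key    = f.date_key
  JOIN dim_product p ON p.product_key = f.product_key   -- assumed dimension
  JOIN dim_region  r ON r.region_key  = f.region_key    -- assumed dimension
 WHERE d.year_no = 2019
   AND p.product_name = 'Laptop'
   AND r.region_id = 4;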
Types of OLAP System
• WOLAP – Web OLAP: an OLAP system accessible via a web browser. WOLAP has a three-tiered architecture consisting of three components: client, middleware, and a database server.
• Mobile OLAP – Helps users access and analyze OLAP data using their mobile devices.
• SOLAP – Spatial OLAP: created to facilitate the management of both spatial and non-spatial data in a Geographic Information System (GIS).
What is ETL?
• ETL stands for Extract, Transform and Load.
• It is a process in data warehousing used to extract data from databases or source systems and, after transforming it, place the data into the data warehouse. It is a combination of three database functions: Extract, Transform, and Load.
ETL Process
Step 1 - Extraction
All the required data from various source systems such as databases, applications, and flat files is identified and extracted. Data extraction can be completed by running jobs during non-business hours.
Step 2 - Transformation
Most of the extracted data can’t be directly loaded into the target system.
Based on the business rules, some transformations can be done before
loading the data.
The transformation process also corrects the data, removes any incorrect
data and fixes any errors in the data before loading it.
Step 3 - Loading
All the gathered information is loaded into the target Data Warehouse
tables.
Types of Loading:
Initial Load — populating all the Data Warehouse tables
Incremental Load — applying ongoing changes periodically, as needed.
Full Refresh —erasing the contents of one or more tables and reloading with
fresh data.
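A minimal sketch of an incremental load using a SQL MERGE; the staging table stg_customer and the target table dim_customer are assumptions for illustration:

-- Incremental load: apply only new or changed rows from staging to the warehouse table
MERGE INTO dim_customer tgt
USING stg_customer src
   ON (tgt.customer_key = src.customer_key)
WHEN MATCHED THEN
  UPDATE SET tgt.first_name = src.first_name,
             tgt.last_name  = src.last_name
WHEN NOT MATCHED THEN
  INSERT (customer_key, first_name, last_name)
  VALUES (src.customer_key, src.first_name, src.last_name);

-- A full refresh, by contrast, would erase the target table and reload it:
-- TRUNCATE TABLE dim_customer;
-- INSERT INTO dim_customer SELECT customer_key, first_name, last_name FROM stg_customer;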
What is Data Mining?
It is basically the extraction of vital information/knowledge from a large set of data.
Fundamentally, data mining is about processing data and identifying patterns and trends in that information so that you can make decisions or judgments.
Data mining is also called knowledge discovery, knowledge extraction, data/pattern analysis, information harvesting, etc.
Anomaly Detection.
It is used to determine when something is noticeably different from the regular pattern, and to eliminate any database inconsistencies or anomalies at the source.
Regression Analysis.
This technique is used to make predictions based on relationships within the data set.
Classification.
This deals with items that already have labels. Note that in cluster detection the items had no labels and data mining had to label them and form them into clusters, but in classification there is existing label information, so items can be easily classified using an algorithm.
Associative Learning.
It is used to analyze which things tend to occur together either in pairs or larger groups.
C4.5 constructs a classifier in the form of a decision tree. In order to do this, C4.5 is
given a set of data representing things that are already classified.
k-means creates k groups from a set of objects so that the members of a group are more similar to each other than to members of other groups. It's a popular cluster analysis technique for exploring a dataset.
Support vector machine (SVM) learns a hyperplane to classify data into 2 classes. At a high level, SVM performs a task similar to C4.5, except that SVM doesn't use decision trees at all.
The Apriori algorithm learns association rules and is applied to a database containing a
large number of transactions.
CART stands for classification and regression trees. It is a decision tree learning technique
that outputs either classification or regression trees. Like C4.5, CART is a classifier.
CRISP-DM
• CRISP-DM stands for Cross Industry Standard Process for Data Mining.
• It is a methodology created in 1996 to structure Data Mining projects. It consists of 6 steps to carry out a Data Mining project, and the steps can be iterated in cycles according to the developers' needs.
Phases of CRISP-DM
1. Business Understanding
Focuses on understanding the project objectives and requirements from a
business perspective, and then converting this knowledge into a data mining
problem definition and a preliminary plan.
2. Data Understanding
Starts with an initial data collection and proceeds with activities in order to get
familiar with the data, to identify data quality problems, to discover first
insights into the data, or to detect interesting subsets to form hypotheses for
hidden information.
3. Data Preparation
The data preparation phase covers all activities to construct the final dataset
from the initial raw data.
4. Modeling
Modeling techniques are selected and applied. Since some techniques, like neural nets, have specific requirements regarding the form of the data, there can be a loop back to the data preparation phase.
5. Evaluation
Once one or more models have been built that appear to have high quality based on
whichever loss functions have been selected, these need to be tested to ensure they
generalize against unseen data and that all key business issues have been sufficiently
considered. The end result is the selection of the champion model(s).
6. Deployment
Consists of presenting the results in a useful and understandable manner, and by
achieving this, the project should achieve its goals. It is the only step not belonging to
a cycle.