
Chapter 2

A data warehouse is a centralized system for storing and managing large volumes of structured and unstructured data from various sources, designed to support business decision-making through historical data analysis. It categorizes data into structured and unstructured types, and further into quantitative and categorical variables, with analysis types including univariate, bivariate, and multivariate data. Additionally, it discusses measurement scales such as nominal, ordinal, interval, and ratio, each with specific properties for data classification.

Uploaded by

iqra
Copyright
© All Rights Reserved
Data Warehouse:

A data warehouse is a centralized system used for storing and managing large volumes of data
from various sources. It is designed to help businesses analyze historical data and make informed
decisions. Data from different operational systems is collected, cleaned, and stored in a structured
way, enabling efficient querying and reporting.
A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data
in support of management's decision making process.
Subject-Oriented: A data warehouse can be used to analyze a particular subject area. For example,
"sales" can be a particular subject.
Integrated: A data warehouse integrates data from multiple data sources. For example, source A
and source B may have different ways of identifying a product, but in a data warehouse, there will
be only a single way of identifying a product.
Time-Variant: Historical data is kept in a data warehouse. For example, one can retrieve data
from 3 months, 6 months, or 12 months ago, or even older data, from a data warehouse. By contrast, a
transaction system may hold only the most recent address of a customer, whereas a data warehouse can
hold all addresses associated with that customer.
Non-volatile: Once data is in the data warehouse, it will not change. So, historical data in a data
warehouse should never be altered.
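
The time-variant and non-volatile properties can be illustrated with a minimal sketch. The names `operational`, `warehouse`, `record_address`, and `address_history`, as well as the sample customer and addresses, are hypothetical and chosen only for illustration; a real warehouse would use database tables rather than in-memory Python structures.

```python
from datetime import date

# Operational (transaction) system: keeps only the current address per customer.
operational = {}

# Warehouse: append-only list of (customer_id, address, load_date) rows.
# Existing rows are never updated or deleted (non-volatile).
warehouse = []

def record_address(customer_id, address, load_date):
    """Overwrite the operational value, but append a new warehouse row."""
    operational[customer_id] = address
    warehouse.append((customer_id, address, load_date))

def address_history(customer_id):
    """Return every address loaded for a customer, in load order."""
    return [(addr, d) for cid, addr, d in warehouse if cid == customer_id]

# Hypothetical sample data.
record_address("C1", "12 Oak Street", date(2023, 1, 10))
record_address("C1", "7 Lake Road", date(2024, 6, 2))

print(operational["C1"])      # only the latest address survives here
print(address_history("C1"))  # the warehouse still holds both addresses
```

The operational dictionary loses the old address on update, while the append-only list keeps the full history, which is the distinction the Time-Variant and Non-volatile points describe.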

Why is data important?

• Data helps in making better decisions.
• Data helps in solving problems by finding the reasons for underperformance.
• Data helps one evaluate performance.
• Data helps one improve processes.

REPRESENTATION OF RAW DATA

Categories of Data
Data can be categorized into two main types:
• Structured Data: This type of data is organized in a specific format, making it easy to
search, analyze, and process. Structured data is typically found in relational databases and includes
information such as numbers, dates, and categories.
• Unstructured Data: Unstructured data does not conform to a specific structure or format. It
may include text documents, images, videos, and other data that is not easily organized
or analyzed without additional processing.

Quantitative Variable

1. Numerical Data: Numerical data can be further classified into two categories:
• Discrete Data: Discrete data takes distinct, countable numerical values, for example
number of children or age in completed years.
• Continuous Data: Continuous data can take any numerical value within a range,
for example weight, voltage, or height.
Numeric values include real-value variables or integer variables such as age, speed, or length. A
feature with numeric values has two important properties: its values have an order relation (2 < 5
and 5 < 7) and a distance relation (d(2.3, 4.2) = 1.9).

2. Categorical Data: Categorical data takes its values from a defined set of categories, for
example:
• Marital Status
• Political Party
• Eye colour
• Country of Citizenship

Categorical (often called symbolic) variables have neither of these two relations. Two values
of a categorical variable can only be equal or not equal: they support only an equality relation
(Blue = Blue or Red ≠ Black).
A categorical variable with two values can be converted, in principle, to a numeric binary variable
with two values: 0 or 1. A categorical variable with N values can be converted into N binary
numeric variables, namely, one binary variable for each categorical value. These coded categorical
variables are known as “dummy variables” in statistics. For example, if the variable eye color has
four values, namely, black, blue, green, and brown, each value can be coded with four binary digits,
exactly one of which is 1.
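
The dummy-variable coding of the eye color example can be sketched as follows. The helper name `one_hot` and the ordering of the categories are assumptions made for illustration; the notes do not prescribe a particular order of the binary digits.

```python
def one_hot(value, categories):
    """Return one 0/1 indicator per category; exactly one indicator is 1."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]

# Category order is an arbitrary choice made for this sketch.
EYE_COLORS = ["black", "blue", "green", "brown"]

for color in EYE_COLORS:
    print(color, one_hot(color, EYE_COLORS))
```

Each eye color maps to four binary variables, so the codes only support the equality relation: two codes are either identical or different, with no implied order or distance.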

Univariate data:
Univariate data refers to a type of data in which each observation or data point corresponds to a
single variable. In other words, it involves the measurement or observation of a single
characteristic or attribute for each individual or item in the dataset.

Analyzing univariate data is the simplest form of analysis in statistics.


Heights (in cm):   164   167.3   170   174.2   178   180   186

Suppose that the heights of seven students in a class are recorded (table above). There is only one
variable, height, and the analysis does not deal with any cause or relationship.
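
A univariate analysis of the seven heights can be sketched with Python's standard `statistics` module. The choice of mean, median, and range as summary measures is an assumption for illustration; the notes do not name specific statistics.

```python
import statistics

# The seven recorded heights, in cm.
heights_cm = [164, 167.3, 170, 174.2, 178, 180, 186]

mean_height = statistics.mean(heights_cm)
median_height = statistics.median(heights_cm)
height_range = max(heights_cm) - min(heights_cm)

print(f"mean   = {mean_height:.1f} cm")
print(f"median = {median_height} cm")
print(f"range  = {height_range} cm")
```

All three numbers describe the single variable height on its own; no second variable is involved.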
Bivariate data
Bivariate data involves two different variables, and the analysis of this type of data focuses on
understanding the relationship or association between these two variables. Example of bivariate
data can be temperature and ice cream sales in summer season.
Temperature   Ice Cream Sales
20            2000
25            2500
35            5000

Suppose temperature and ice cream sales are the two variables of a bivariate dataset. The table
shows that the two variables are positively related: as the temperature increases, the sales also
increase. The relationship is not strictly proportional, however, since sales double from 2500 to
5000 while the temperature rises only from 25 to 35.
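
The strength of the relationship between temperature and sales can be checked with a short sketch that computes the Pearson correlation coefficient. Using Pearson's r is an assumption made here for illustration; the notes only state that the variables are related. The helper name `pearson_r` is hypothetical.

```python
import math

# Values taken from the temperature / ice cream sales table.
temperature = [20, 25, 35]
sales = [2000, 2500, 5000]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

r = pearson_r(temperature, sales)
print(f"r = {r:.3f}")
```

A value of r close to +1 indicates a strong positive linear association. With only three observations this is an illustration, not a reliable estimate.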
Multivariate data
Multivariate data refers to datasets where each observation or sample point consists of multiple
variables or features. These variables can represent different aspects, characteristics, or
measurements related to the observed phenomenon. When dealing with three or more variables,
the data is specifically categorized as multivariate.
For example, suppose an advertiser wants to compare the popularity of four
advertisements on a website.

Advertisement   Gender   Click rate
Ad1             Male     80
Ad3             Female   55
Ad2             Female   123
Ad1             Male     66
Ad3             Male     35

The click rates could be measured for both men and women, and the relationships between the
variables can then be examined. Multivariate data is similar to bivariate data but contains more
than one dependent variable.
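
One way to examine the relationships in the advertisement table is to average the click rate by each of the other variables. The sketch below uses only the five rows listed in the table; the helper name `mean_by` is hypothetical, and averaging is just one possible way to compare groups.

```python
from collections import defaultdict

# Rows from the advertisement table: (advertisement, gender, click rate).
rows = [
    ("Ad1", "Male", 80),
    ("Ad3", "Female", 55),
    ("Ad2", "Female", 123),
    ("Ad1", "Male", 66),
    ("Ad3", "Male", 35),
]

def mean_by(rows, key_index):
    """Average the click rate (column 2) grouped by the chosen column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[2])
    return {key: sum(values) / len(values) for key, values in groups.items()}

by_ad = mean_by(rows, 0)      # average click rate per advertisement
by_gender = mean_by(rows, 1)  # average click rate per gender

print(by_ad)
print(by_gender)
```

Grouping by advertisement and by gender looks at the same click rates from two angles, which is what distinguishes this from a bivariate comparison of only two variables.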

Univariate                  Bivariate                   Multivariate

Summarizes a single         Summarizes two              Summarizes more than
variable at a time.         variables.                  two variables.

Does not contain any        Contains only one           Similar to bivariate, but
dependent variable.         dependent variable.         contains more than two
                                                        variables.

Time-Dependent Data
Time-dependent data, also known as temporal data, is data that changes over time or has a specific
time reference. For example, a customer's address, a product's price, or a stock's value are all
temporal data.

A special class of discrete variables is periodic variables. A periodic variable is a feature for
which the distance relation exists but there is no fixed order relation. Examples are days of the
week, days of the month, or months of the year. Monday and Tuesday, as the values of a feature, are
closer than Monday and Thursday, but Monday can come before or after Friday.
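
The distance relation of a periodic variable can be sketched as a cyclic distance, where values wrap around after the end of the period. The function name `cyclic_distance` and the choice to start the week on Monday are assumptions for illustration.

```python
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

def cyclic_distance(a, b, period=len(DAYS)):
    """Shortest number of steps between two days going around the weekly cycle."""
    diff = abs(DAYS.index(a) - DAYS.index(b))
    return min(diff, period - diff)

print(cyclic_distance("Monday", "Tuesday"))   # adjacent days
print(cyclic_distance("Monday", "Thursday"))  # three days apart
print(cyclic_distance("Monday", "Sunday"))    # adjacent across the week boundary
```

The distance is well defined, but no ordering is implied: Monday is one step from Sunday whether it is counted as the day before or the day after.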
Scale of Measurement
A scale is a device or an object used to measure or quantify an event or another object.
Variables are defined and categorised using different scales of measurement.
Each level of measurement has specific properties that determine which statistical analyses
can be applied.

Nominal Scale
A nominal scale is the 1st level of measurement, in which numbers serve only as “tags” or
“labels” to classify or identify objects. A nominal scale usually deals with non-numeric
variables. It is an orderless scale that uses different symbols, characters, and
numbers to represent the different states (values) of the variable being measured. These values can
be coded alphabetically as A, B, and C or numerically as 1, 2, and 3.

Example:

Some situations where a nominal measurement scale can be used are given below:

• A study to find the country of birth of people in a town
• Collecting data on the eye color of people
• Classifying people into categories such as male/female, working-class population/unemployed,
vaccinated/unvaccinated, etc.
• Gender
• Marital Status
• College Major

Some properties of the nominal scale of measurement are given below:
• It can categorize variables but does not put them in any order (no ranking).
• Its values carry no numerical meaning, even when they are coded as numbers.
• It is used for qualitative data.
Ordinal Scale
The ordinal scale is the 2nd level of measurement. It reports the ordering and ranking of data
without establishing the degree of variation between the values. “Ordinal” refers to “order.”
Ordinal data is qualitative (categorical) data that can be grouped, named, and also ranked.

Examples:

• AGE (with values young, middle-aged, and old)
• INCOME (with values low, middle-class, upper-middle-class, and rich)
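
The ranking property of an ordinal scale can be sketched by mapping each AGE level to its position in the order. The names `AGE_LEVELS`, `rank`, and the sample list `people` are hypothetical and used only for illustration.

```python
# Levels listed from lowest to highest; the position defines the rank.
AGE_LEVELS = ["young", "middle-aged", "old"]
rank = {level: position for position, level in enumerate(AGE_LEVELS)}

# Hypothetical observations.
people = ["old", "young", "middle-aged", "young"]

# Sorting by rank is meaningful on an ordinal scale.
sorted_people = sorted(people, key=rank.__getitem__)
print(sorted_people)

# Comparisons are meaningful too: "young" ranks below "old".
print(rank["young"] < rank["old"])
```

The rank numbers only encode order. Subtracting them (for example, treating "old" minus "young" as 2) would assume equal spacing between levels, which an ordinal scale does not provide.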

Interval Scale
The interval scale is the 3rd level of measurement. It is defined as a quantitative
measurement scale in which the difference between two values is meaningful.
The zero point on an interval scale is placed arbitrarily (it is not a true, meaningful zero point),
and thus it does not indicate the complete absence of whatever is being measured.
Example
The classic example of an interval scale is Celsius temperature, because the difference between
adjacent values is always the same. For example, the difference between 60 and 50 degrees is a
measurable 10 degrees, as is the difference between 80 and 70 degrees.
The same holds for the Fahrenheit scale, where 0 °F does not mean a total
absence of temperature.
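
The consequence of an arbitrary zero can be sketched numerically: differences stay consistent when the unit changes, but ratios do not. The helper name `c_to_f` is hypothetical; the conversion formula F = C × 9/5 + 32 is the standard one.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

# Equal differences in Celsius remain equal differences in Fahrenheit.
diff_c_1 = 60 - 50
diff_c_2 = 80 - 70
diff_f_1 = c_to_f(60) - c_to_f(50)
diff_f_2 = c_to_f(80) - c_to_f(70)
print(diff_c_1, diff_c_2, diff_f_1, diff_f_2)

# Ratios depend on the arbitrary zero, so they change with the unit.
ratio_c = 20 / 10
ratio_f = c_to_f(20) / c_to_f(10)
print(ratio_c, ratio_f)
```

Because the ratio of the same two temperatures differs between Celsius and Fahrenheit, a statement such as "20 °C is twice as hot as 10 °C" is not meaningful on an interval scale.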

Ratio Scale

The ratio scale is the most comprehensive of the four scales. It includes the properties of all
three scales above. The unique feature of the ratio scale of measurement is that it
has a true (absolute) zero, which the interval scale lacks. For example, when we measure
the height of people, 0 inches or 0 cm means a complete absence of height.

Examples: Quantities such as height, length, salary, and expenses are measured on this type of scale.
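
The effect of a true zero can be sketched with heights: because zero means the same thing in every unit, ratios are preserved under a change of unit. The helper name `cm_to_inches` and the sample heights are hypothetical; the factor 2.54 cm per inch is the standard definition.

```python
import math

CM_PER_INCH = 2.54

def cm_to_inches(cm):
    """Convert a length in centimetres to inches."""
    return cm / CM_PER_INCH

# Hypothetical heights: one is twice the other.
ratio_cm = 180 / 90
ratio_in = cm_to_inches(180) / cm_to_inches(90)

print(ratio_cm)
print(math.isclose(ratio_cm, ratio_in))  # the ratio is the same in both units
print(cm_to_inches(0))                   # zero stays zero in any unit
```

On a ratio scale it is therefore meaningful to say that one value is twice another, which is not the case on an interval scale.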
