Data Management
Introduction
Data management is a process by which information is acquired and processed to
ensure the accessibility and reliability of the data for its users. One of the most important tools
in processing and managing such information is statistics. Statistics is used in most areas of
human endeavor: in education, research, business, agriculture, and other
fields, and even in everyday activities.
Data, or pieces of information, may be collected by conducting a survey, interview,
observation, or experiment. The data gathered can be organized and presented
graphically by a line graph, bar graph, or pictograph, or in tabular form with the aid of a statistical table known
as a frequency distribution table (FDT). A concise and meaningful conclusion is obtained from the
analysis and interpretation of data. Relevant information can be deduced from the analysis of
numerical descriptions, and predictions about a whole population may be made from a small group drawn from it.
Because statistics covers such a wide range of tasks, it is
subdivided into two branches: descriptive statistics and inferential statistics.
Statistics is a science which deals with the collection, organization, presentation, analysis, and
interpretation of data so as to give more meaningful information.
Descriptive statistics refers to the collection, organization, summary, and presentation of data,
while inferential statistics deals with the analysis and interpretation of data in which conclusions about a population are
drawn from a subset of that population.
In descriptive statistics, a set of data is simply described without drawing any inferences
or implications. The data is merely summarized and discussed in a clear, concise, and
informative manner. In inferential statistics, information or inferences concerning a large group,
known as the population, are obtained from the study of a representative group of selected
members of the population, known as the sample. Calculating the average rating of a
class of 40 students in Math 01 illustrates descriptive statistics, while estimating the
performance of the same class from the performance of 10 randomly selected members of
the class illustrates inferential statistics.
BASIC TERMS
Some of the basic terms and notations used in statistics are the following:
a. Population - a collection or set of things or objects under consideration
b. Sample - a subset or representative group of the population
c. Data - refers to the information gathered in research. Statistical data are classified
according to their sources, namely: primary data or secondary data.
Primary data – information gathered from respondents directly by the researcher.
Secondary data – information obtained from published materials or data gathered by
other individuals or agencies. These are the data which are transcribed from original
sources.
d. Array – a listing of observations arranged in order of increasing or decreasing
magnitude
e. Parameter - a value which is computed from a population
f. Statistic – a value which is computed from a sample
g. Variable – a characteristic of interest that has been observed or measured on every
member of the population or sample. A variable may be quantitative or qualitative,
and quantitative variables are further classified as discrete or continuous.
i. Quantitative/Numerical variable – describes the amount or number of an
element of a sample or population
Discrete – takes on a countable number of values (usually expressed as whole
numbers). Example: number of books owned by a student
Continuous – measured on a continuous scale (it can take any value within a
range or interval). Example: height of the students (in feet)
ii. Qualitative/Categorical variable – describes the quality, category, or character
of an element of a population or sample. Examples: gender (male or female); hair
color (black, brown, blonde); level of satisfaction of a student with his or her grade (highly
satisfied, satisfied, not satisfied)
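To make the terms array, parameter, and statistic concrete, here is a minimal Python sketch. The population values below are hypothetical (they do not come from this module); the sketch only uses Python's standard `sorted`, `statistics.mean`, and `random.Random.sample`. The parameter is computed from every member of the population, while the statistic is computed from a randomly drawn subset.

```python
import random
import statistics

# Hypothetical quiz scores of a whole class (the population); illustrative only.
population = [12, 15, 9, 20, 17, 11, 14, 18, 10, 16]

# Array: the observations arranged in increasing (or decreasing) magnitude.
array_ascending = sorted(population)
array_descending = sorted(population, reverse=True)

# Parameter: a value computed from the entire population.
population_mean = statistics.mean(population)

# Statistic: the same kind of value computed from a sample (a subset).
rng = random.Random(1)  # fixed seed so the sample is reproducible
sample = rng.sample(population, 4)
sample_mean = statistics.mean(sample)

print("Array (ascending):", array_ascending)
print("Parameter (population mean):", population_mean)
print("Sample:", sample)
print("Statistic (sample mean):", sample_mean)
```

The sample mean will generally differ from the population mean; inferential statistics is concerned with how well such a statistic estimates the parameter.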
Levels of Measurement
A more detailed distinction, termed the levels of measurement, is used by some researchers
in examining the information that is collected. The levels are classified as follows:
1. Nominal Measurement - numbers or symbols are used to code or classify each element
in the population. Note that the assigned numbers have no numerical meaning.
Examples: gender, educational background, employment status
2. Ordinal Measurement – uses numerical categories that express a meaningful order.
There is no indication of the distance between positions. The numbers become meaningful
because they reveal whether one class or category is more or less than another.
Categories are ranked according to the order of their value on the property, like first,
second, third; or oldest, next oldest, youngest. Example: rank in a beauty contest
3. Interval Measurement – has equal intervals, so the distance
between any two values is meaningful. It tells us by how much one unit differs in the
property from another unit. It has no absolute zero. Examples: aptitude test scores,
temperature in degrees Celsius
4. Ratio Measurement – A variable measured at this level includes not only the concepts
of order and interval but also the idea of 'nothingness', or absolute zero.
Examples: height, weight, age
Remark: The scale of measurement depends mainly on the method of measurement and not
on the property being measured. For instance, the weight of a pack of milk measured in
kilograms is on a ratio scale, but if the packs are labelled only as small, medium, or large,
the weight is measured on an ordinal scale.
Measures of Central Tendency
One way of summarizing data is to describe the data set using descriptive
measures. Among the most commonly used and most important descriptive measures are the
measures of central tendency and the measures of dispersion.
A measure of central tendency (or central location) is a single value that is used to identify the
“center” of the data set or set of observations.
The three most common measures of central tendency are the mean, the median, and the mode.
The mean, also known as the arithmetic average, is the sum of all the observed values divided by
the number of observations in the data set. It can be computed as
$\mu = \frac{\sum_{i=1}^{n} x_i}{n}$
where $x_i$ is the $i$th observation and $n$ is the number of observations in the data set.
The population mean is symbolized by the lowercase Greek letter mu, μ,
while the sample mean is represented by x̄ (read "x-bar").
Example 1:
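The scores of five randomly selected students in a Math 01 class are 44,
37, 41, 35, and 32. Find their average score.
Solution: Since the scores come from a sample, $\bar{x} = \frac{44 + 37 + 41 + 35 + 32}{5} = \frac{189}{5} = 37.8$. The average score is 37.8.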
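The mean formula can be checked with a short Python sketch. `arithmetic_mean` is a hypothetical helper name used only for illustration; Python's built-in `statistics.mean` would give the same result.

```python
def arithmetic_mean(values):
    """Sum of all observations divided by the number of observations."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

# Scores of the five randomly selected students in Example 1.
scores = [44, 37, 41, 35, 32]
x_bar = arithmetic_mean(scores)  # the scores are a sample, so this is x-bar, not mu
print(f"Mean score: {x_bar:.1f}")  # Mean score: 37.8
```

The same function serves for a population; only the symbol for the result (μ instead of x̄) changes.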
Example 2: If the final examination in a statistics class is given weight 2, the average of the
quizzes weight 3, and a project report weight 1, what is the mean grade of a
student who got grades of 90, 85, and 87, respectively?
Solution: $\bar{x} = \frac{2(90) + 3(85) + 1(87)}{2 + 3 + 1} = \frac{180 + 255 + 87}{6} = \frac{522}{6} = 87$. The student's mean grade is 87.
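A weighted mean multiplies each value by its weight and divides the total by the sum of the weights. The sketch below uses a hypothetical helper, `weighted_mean`; it is one straightforward way to carry out the computation, not the only one.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights w_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Example 2: final exam (weight 2), average of quizzes (weight 3), project (weight 1).
grades = [90, 85, 87]
weights = [2, 3, 1]
mean_grade = weighted_mean(grades, weights)
print(f"Weighted mean grade: {mean_grade:.2f}")  # Weighted mean grade: 87.00
```

When every weight equals 1, the weighted mean reduces to the ordinary arithmetic mean.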
In terms of central tendency, the students perform equally, since they have the same
average rating of 80%. However, in terms of the variability of their ratings, Student A has the
largest range of the three. This shows that Student A's scores are more dispersed than those of the others:
Student A's ratings fluctuate, while Student B's are more consistent. Student C, on the other hand, has a range of zero, so
all of his ratings are equal to the mean, indicating that the distribution has no spread.
Example 2.
The average daily allowances (in pesos) of 12 college students studying at University Y are 112,
127, 118, 147.5, 165.5, 99.75, 150, 145, 145, 102, 136.25 and 113. Find the range.
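Solution: The highest allowance is 165.50 and the lowest is 99.75, so the range is $R = 165.50 - 99.75 = 65.75$ pesos.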
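The range needs only the two extreme observations, as this minimal Python sketch shows. `data_range` is a hypothetical helper name; the data are the twelve allowances from the example.

```python
def data_range(values):
    """Range = highest observation minus lowest observation."""
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)

# Average daily allowances (in pesos) of the 12 students at University Y.
allowances = [112, 127, 118, 147.5, 165.5, 99.75, 150, 145, 145, 102, 136.25, 113]
r = data_range(allowances)
print(f"Range: {r:.2f} pesos")  # Range: 65.75 pesos
```

Because only `max` and `min` enter the result, changing any of the ten middle values leaves the range unchanged, which is the weakness noted in the remarks.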
Remarks:
1. The larger the value of the range, the more dispersed the observations are.
2. The range considers only the extreme values or observations in the data set.
A more reliable measure for describing the spread of a set of observations is the standard
deviation, which most research uses in the treatment of data. Unlike the range, its computation
includes all the values in the data set.
The standard deviation is the positive square root of the variance. The variance is the average
of the squared deviations of the observations from the mean (for a sample, the sum of the squared deviations is divided by n - 1 instead of n).
The standard deviation and variance can be obtained from either a population or a sample, but most
applications use a sample, because computing them for a population requires complete enumeration
of its members. The variance is expressed in squared units, while the standard deviation is in the
same unit as the data set. The following symbols are used to designate these measures
for a population and a sample: σ² and σ denote the population variance and standard deviation, while s² and s denote the sample variance and standard deviation.
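The variance and standard deviation of a population are calculated using the formulas
below, where μ is the population mean and n is the number of observations:
$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}$, $\sigma = \sqrt{\sigma^2}$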
The result indicates that, on average, the student's percentage scores tend to deviate
from the mean by about 6.23 units.
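The population formulas translate directly into code. The following is a minimal Python sketch; the helper names `population_variance` and `population_std` are hypothetical, and the data set is a made-up population chosen only because its results are easy to verify by hand (mean 5, squared deviations summing to 32).

```python
import math
import statistics

def population_variance(values):
    """sigma^2: the mean of the squared deviations from the population mean mu."""
    n = len(values)
    if n == 0:
        raise ValueError("the variance of an empty data set is undefined")
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def population_std(values):
    """sigma: the positive square root of the population variance."""
    return math.sqrt(population_variance(values))

# Hypothetical population values, chosen only to illustrate the formulas.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data), population_std(data))  # 4.0 2.0

# Cross-check with the standard library's population (divide-by-n) functions.
print(statistics.pvariance(data), statistics.pstdev(data))
```

Note that the variance (4.0) is in squared units, while the standard deviation (2.0) is back in the units of the data.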
Example 2: The following data were obtained by sampling from a population: 10, 12, 14, 15, 17, 18,
18, 24. Find the variance and the standard deviation of the sample.
Solution: The sample mean is $\bar{x} = \frac{128}{8} = 16$.
x        x - x̄      (x - x̄)²
10       -6         36
12       -4         16
14       -2          4
15       -1          1
17        1          1
18        2          4
18        2          4
24        8         64
Total    128                   130
The variance is $s^2 = \frac{130}{8 - 1} \approx 18.57$, while the standard deviation is $s = \sqrt{18.57} \approx 4.31$. What can you infer
from this?
Remarks: A large standard deviation indicates that, on average, the
data values lie far from the mean, while a small standard deviation shows
that, on average, the data values lie close to the mean.
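The sample computation above can be reproduced in Python. In this minimal sketch, `sample_variance` is a hypothetical helper that divides by n - 1; the standard library's `statistics.variance` and `statistics.stdev` use the same n - 1 divisor, so they serve as a cross-check.

```python
import math
import statistics

def sample_variance(values):
    """s^2: sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("a sample variance needs at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

# The eight sampled values from Example 2.
sample = [10, 12, 14, 15, 17, 18, 18, 24]
s2 = sample_variance(sample)
s = math.sqrt(s2)  # standard deviation = positive square root of the variance
print(f"s^2 = {s2:.2f}, s = {s:.2f}")  # s^2 = 18.57, s = 4.31

# statistics.variance and statistics.stdev also divide by n - 1.
print(math.isclose(s2, statistics.variance(sample)))  # True
print(math.isclose(s, statistics.stdev(sample)))      # True
```

Dividing by n instead (as `statistics.pvariance` does) would give 130/8 = 16.25, which is why it matters whether the data are treated as a sample or as a whole population.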