Overview and Nature of Data
OVERVIEW
Specific numbers
Method of analysis
STATISTICS
Example: Twenty-three percent of people surveyed believed that learning statistics is difficult.
STATISTICS
Specific number
Example: Her vital statistics are: 34-24-34
STATISTICS
Method of analysis: a collection of methods for (1) planning experiments, (2) obtaining data, and then (3) organizing, (4) summarizing, (5) presenting, (6) analyzing, (7) interpreting, and (8) drawing conclusions based on the data.
STATISTICS
(Collection, Organization, Summary, Presentation, Analysis and Interpretation of Data)
INFERENTIAL STATISTICS is a scientific discipline concerned with developing and using mathematical tools to make forecasts and inferences. The concepts of probability theory are basic to the development and understanding of inferential (inductive) statistics.
DEFINITIONS
QUALITATIVE (CATEGORICAL OR ATTRIBUTE) DATA consist of attributes, labels, or non-numerical entries.
QUANTITATIVE (NUMERICAL) DATA consist of numerical measurements or counts.
The information shown in the table can be separated into two data sets. One data set contains the names of vehicle models, and the other contains the suggested retail prices of vehicle models.
The names of vehicle models are non-numerical entries, so these are qualitative data. The suggested retail prices are numerical entries, so these are quantitative data.
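The split into qualitative and quantitative data sets can be sketched in Python. This is a minimal illustration, not the source's method: the original table is not reproduced here, so the model names and prices below are hypothetical, and the helper name `data_type` is an assumption made for this example.

```python
from numbers import Number

def data_type(entries):
    """Return 'quantitative' if every entry is numeric, else 'qualitative'."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if all(isinstance(e, Number) and not isinstance(e, bool) for e in entries):
        return "quantitative"
    return "qualitative"

# Hypothetical values for illustration only; not taken from the original table.
model_names = ["Model A", "Model B", "Model C"]
retail_prices = [21500, 34900, 18250]

print(data_type(model_names))    # qualitative
print(data_type(retail_prices))  # quantitative
```

A type check like this is only a first pass. Numeric labels such as identification numbers are stored as numbers but are still qualitative, because they do not measure or count anything.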
Levels of Measurements
Another common way of classifying data is to use four levels of measurement: nominal, ordinal, interval, and ratio. In applying statistics to real problems, the level of measurement of the data is an important factor in determining which procedure to use. Never perform computations or apply statistical methods that are not appropriate for the level of measurement of the data.
For example, it would not make sense to compute an average of social security numbers, because those numbers are used for identification; they don't represent measurements or counts of anything.
TYPES OF DATA
Qualitative (categorical or attribute) data can be separated into different categories that are distinguished by some nonnumeric characteristic.
NOMINAL ORDINAL INTERVAL RATIO
DEFINITIONS
The nominal level of measurement is characterized by data that consist of names, labels, or categories only. The data cannot be arranged in an ordering scheme (such as low to high). Data at the nominal level of measurement are qualitative only. Data at this level are categorized using names, labels, or qualities. No mathematical computations can be made at this level.
Examples: survey responses (yes, no, undecided); gender (male or female).
DEFINITIONS
The ordinal level of measurement involves data that may be arranged in some order, but differences between data values either cannot be determined or are meaningless. Data at the ordinal level of measurement are qualitative or quantitative. Data at this level can be arranged in order, or ranked, but differences between data entries are not meaningful.
Two data sets are shown. Which data set consists of data at the nominal level? Which data set consists of data at the ordinal level?
Data 1: Top 5 Grossing Movies of 2012
1. Marvel's The Avengers
2. The Dark Knight Rises
3. The Hunger Games
4. Skyfall
5. The Twilight Saga: Breaking Dawn Part 2
Data 2: VHF Philippine Television Stations
1. ABS-CBN TV-2
2. People's Television Network TV-4
3. ABC Development Corp. TV-5
4. GMA Network Inc. TV-7
5. Radio Philippine Network and Solar Entertainment TV-9
6. Intercontinental Broadcasting Corporation TV-13
Example: In 2012, Business Insider chose the 50 best business schools in the world. The ranking was based on an extensive survey of more than one thousand professionals (n > 1000), of whom 87 percent had attended business school and 71 percent had hiring experience.
Business Insider World's Top 5 Business Schools
1. Stanford University
2. Harvard University
3. University of Pennsylvania (Wharton)
4. Massachusetts Institute of Technology (Sloan)
5. London Business School
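A short Python sketch of what ordinal data allow, using the Business Insider ranking above: positions can be compared, but the gap between two positions does not measure anything. The helper names `rank_of` and `ranks_higher` are illustrative choices, not part of the source.

```python
# Ordinal data: the list order encodes the rank (index 0 is rank 1).
top_schools = [
    "Stanford University",
    "Harvard University",
    "University of Pennsylvania (Wharton)",
    "Massachusetts Institute of Technology (Sloan)",
    "London Business School",
]

def rank_of(school):
    """Return the 1-based rank of a school in the list."""
    return top_schools.index(school) + 1

def ranks_higher(a, b):
    """True if school a is ranked above school b (smaller rank number)."""
    return rank_of(a) < rank_of(b)

print(ranks_higher("Harvard University", "London Business School"))  # True

# The rank numbers can be subtracted (5 - 2 = 3), but that 3 does not say
# how much better one school is than the other, so the difference is not
# a meaningful quantity at the ordinal level.
```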
Example: Mathematics students can take elective courses that range from linear algebra to math structures for computer science. Graduates can find jobs in statistics, teaching, and many other professional fields. These are the world's best universities for mathematics:
US News World's Best Mathematics Universities
Massachusetts Institute of Technology (MIT)
Harvard University
Stanford University
Princeton University
University of California Los Angeles (UCLA)
University of California Berkeley (UCB)
New York University
Yale University
University of Cambridge
University of Oxford
What is the level of measurement?
DEFINITIONS: The two highest levels of measurement consist of quantitative data only.
Data at the interval level of measurement can be ordered, and meaningful differences between data entries can be calculated. At the interval level, a zero entry simply represents a position on a scale; the entry is not a natural or inherent zero.
Years 1000, 2000, 1776, and 1492
1. Category: [11th, 21st, 18th, 15th centuries]
2. Rank/Order: [1000, 1492, 1776, 2000]
3. Difference between two values can be calculated:
1492 - 1000 = 492 (year 1000 is 492 years earlier than year 1492)
2000 - 1776 = 224 (year 2000 is 224 years later than year 1776)
Interval Data
1. Category
2. Rank
3. Difference between two values can be calculated
4. No inherent zero (zero is not a starting point)
DEFINITIONS: Ratio
Data at the ratio level of measurement are similar to data at the interval level, with the added property that a zero entry is an inherent or natural zero. A ratio of two data values can be formed so that one data value can be meaningfully expressed as a multiple of another.
DEFINITIONS: Ratio
Example: Prices of college textbooks in Pesos: 100, 400, 650 and 825
1. Category: Php100, Php400, Php650, and Php825 books
2. Rank/Order: cheapest to most expensive
3. Difference between two values:
400 - 100 = 300 (a 400-peso book is 300 pesos more expensive than a 100-peso book)
650 - 400 = 250 (a 400-peso book is 250 pesos cheaper than a 650-peso book)
4. Inherent zero (zero as starting point):
400 / 100 = 4 (a 400-peso book is 4 times as expensive as a 100-peso book)
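The arithmetic in the textbook example can be checked with a few lines of Python. This is a sketch using the four prices stated in the example (100, 400, 650, and 825 pesos); the variable names are illustrative.

```python
# Textbook prices in pesos, as given in the example.
prices = [100, 400, 650, 825]

# Rank/order: ratio data can be sorted from cheapest to most expensive.
cheapest_to_most_expensive = sorted(prices)

# Differences are meaningful.
difference_a = 400 - 100  # 300 pesos
difference_b = 650 - 400  # 250 pesos

# Ratios are meaningful because zero pesos is an inherent zero.
ratio = 400 / 100  # 4.0

print(cheapest_to_most_expensive)  # [100, 400, 650, 825]
print(difference_a, difference_b)  # 300 250
print(ratio)                       # 4.0
```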
DEFINITIONS: Ratio
Examples (ratio data): Height: A person who is 66 inches tall is twice as tall as a child who is 33 inches tall.
Weight: A person who weighs 60 kilos is three times as heavy as a child who weighs 20 kilos.
An inherent or natural zero is a zero that implies none. For instance, the amount of money you have in a savings account could be zero baht. In this case, the zero represents no money; it is an inherent or natural zero.
On the other hand, a temperature of 0°C does not represent a condition in which no heat is present. The temperature is simply a position on the Celsius scale; it is not an inherent or natural zero.
DEFINITIONS: The two highest levels of measurement consist of quantitative data only.
To distinguish between data at the interval level and at the ratio level, determine whether the expression "twice as much" has any meaning in the context of the data. For example, 2 pesos is twice as much as 1 peso, so these data are at the ratio level. On the other hand, 20°C is not twice as warm as 10°C, so these data are at the interval level.
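One way to see why "twice as much" fails for temperatures is to change the unit and check whether the ratio survives. The Python sketch below is an illustration added here, not a test from the source. It assumes the standard conversions (1 peso = 100 centavos, °F = °C × 9/5 + 32), and the function names are illustrative.

```python
def pesos_to_centavos(pesos):
    """Convert pesos to centavos (1 peso = 100 centavos)."""
    return pesos * 100

def celsius_to_fahrenheit(celsius):
    """Convert a Celsius temperature to Fahrenheit."""
    return celsius * 9 / 5 + 32

# Ratio level: the ratio is the same in either unit.
peso_ratio = 2 / 1
centavo_ratio = pesos_to_centavos(2) / pesos_to_centavos(1)

# Interval level: the ratio depends on where the scale puts its zero.
celsius_ratio = 20 / 10
fahrenheit_ratio = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

print(peso_ratio, centavo_ratio)              # 2.0 2.0
print(celsius_ratio, round(fahrenheit_ratio, 2))  # 2.0 1.36
```

The peso ratio is 2 in both units, while the same two temperatures give 2 in Celsius and 1.36 in Fahrenheit, so the "twice" in 20°C versus 10°C is only a product of the chosen scale.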
The following tables summarize which operations are meaningful at each of the four levels of measurement. When identifying a data set's level of measurement, use the highest level that applies.
Level of Measurement    Put data in categories
Nominal                 YES
Ordinal                 YES
Interval                YES
Ratio                   YES
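The definitions in this section imply a simple rule: each operation becomes meaningful at some level and stays meaningful at every higher level. The Python sketch below encodes that rule as a lookup. The operation names and the function `is_meaningful` are assumptions made for illustration, derived from the definitions above rather than copied from the original table.

```python
# Levels of measurement, from lowest to highest.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Each operation is paired with the lowest level at which it is meaningful.
LOWEST_LEVEL_FOR = {
    "categorize": "nominal",   # put data in categories
    "order": "ordinal",        # arrange data in order (rank)
    "subtract": "interval",    # compute a difference between two values
    "multiple": "ratio",       # express one value as a multiple of another
}

def is_meaningful(operation, level):
    """True if the operation is meaningful for data at the given level."""
    return LEVELS.index(level) >= LEVELS.index(LOWEST_LEVEL_FOR[operation])

# Print one row per level, one YES/NO per operation.
for level in LEVELS:
    row = ["YES" if is_meaningful(op, level) else "NO" for op in LOWEST_LEVEL_FOR]
    print(level, row)
```

Keeping the levels in an ordered list makes "use the highest level that applies" a matter of comparing positions in that list.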
July 73.4, August 71.7, September 62.4, October 51.0, November 37.5, December 30.3
July 7.2, August 6.3, September 5.8, October 2.7, November 2.3, December 2.3
DEFINITIONS
Population (N): the complete collection of all elements (scores, people, measurements, and so on) to be studied. The collection is complete in the sense that it includes all subjects to be studied.
Target Population: the whole group of study units to which we are interested in applying our inferences or conclusions.
DEFINITIONS
A population can be finite (countable) or infinite (cannot be counted) and is made up of study units. Unfortunately, the target population is not always readily accessible, and we can study only the part of it that is available.
There are many ways to collect information about the study population. One way is to draw a sample. A sample (n) is a sub-collection of members selected from a population.
DEFINITIONS
The whole pizza represents a POPULATION.
DEFINITIONS
Example: A fisheries researcher is interested in the behaviour pattern of a crab along the coast of the Lingayen Gulf. It would be unthinkable and impossible to investigate every crab individually. The only way to make any kind of educated guess about their behaviour would be by examining a small subcollection, that is, a sample.
Example: Suppose a machine has produced 10,000 electric bulbs and we are interested in getting some idea about how long the bulbs will last. It would not be practical to test all the bulbs, because the bulbs that are tested will never reach the market. So we might pick 50 of these bulbs to test. Our interest is in learning about the 10,000 bulbs and we study 50. The 10,000 bulbs constitute the population and the 50 bulbs a sample.
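The bulb example can be sketched in Python. The source only says that 50 bulbs are picked; drawing them by simple random sampling without replacement is an assumption made here, and the serial numbers and the fixed seed are hypothetical details added so the sketch is repeatable.

```python
import random

N = 10_000  # population size: all bulbs produced
n = 50      # sample size: bulbs selected for testing

# Hypothetical serial numbers 1..N stand in for the bulbs.
population = range(1, N + 1)

# A seeded generator makes the draw repeatable; the seed value is arbitrary.
rng = random.Random(0)

# Simple random sample without replacement: no bulb is picked twice.
sample = rng.sample(population, n)

print(len(population))  # 10000
print(len(sample))      # 50
```

Only the 50 sampled bulbs would be tested, and what is learned from them is then used to draw conclusions about all 10,000.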