Definitions: Data Is a Collection of Any Number of Related Observations
Data is a collection of any number of related observations. Examples include the number of classes per day for an MBA student over a week, the number of cars produced per shift by an automobile major in a month, and the stock price of a listed organization over a quarter. Observations can be collected through questionnaires and/or telephone surveys, content analysis of newspapers and magazines, planned experiments, direct observation in natural settings, etc. A collection of data is called a data set, and a single observation is called a data point.
Definitions
A population is a collection of all the elements one wants to study and about which one is trying to draw conclusions. A sample is a collection of some, but not all, of the elements of a population. Consider a beauty soap targeted at middle-class women customers aged between 18 and 45 years. The population is the entire set of middle-class females aged between 18 and 45. One needs to be careful about the definition of middle class, though. Clearly, a schoolgirl is not a member of the population. A sample is any subset of this population.
Quick check
IITM surveyed its students to determine average weekly time spent surfing the internet. From the 174 students surveyed, the average time was found to be 23.1 hours. What is the population? What is the sample?
Quick check
One airline claims that fewer than 1% of its scheduled flights out of Mumbai airport depart late. Data from 20 flights showed that 1.5% of the flights departed later than the scheduled time. What is the population? What is the sample?
Example
The State of Tamil Nadu (TN) plans to introduce a Hindi language course in primary schools. The government wants to know the percentage of TN residents who favor the plan. The population of interest is the collection of more than 62 million residents of TN (2001 census). Because it is impossible to contact everyone in the state, we conduct a poll of, say, 1000 residents of TN (the sample). Suppose that in this poll 46% of the sampled subjects said they favored the introduction of Hindi in the school curriculum (descriptive statistics). But we are not interested only in what 1000 residents are saying!
Example
Based on the sample data we have, can we draw inferences about all the TN residents? Yes, through the techniques of inferential statistics! An inferential statistics method can say that the population percentage favoring Hindi falls between 42% and 49%. Even though the sample is very small compared to the population, we can still conclude, for instance, that there is probably only minority support for Hindi.
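One common way to obtain such a range is a normal-approximation (Wald) interval for a proportion. The sketch below is only an illustration: the slides do not say which method or confidence level produced the quoted range, so the 95% level (z = 1.96) and the function name proportion_interval are assumptions. The count of 460 follows from 46% of 1000 respondents.

```python
import math

def proportion_interval(successes, n, z=1.96):
    """Normal-approximation (Wald) interval for a population proportion.

    z = 1.96 corresponds to a 95% confidence level (an assumption here;
    the slides do not state the level they used).
    """
    p_hat = successes / n                    # sample proportion (a statistic)
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error
    margin = z * se                          # margin of error
    return p_hat - margin, p_hat + margin

# 46% of 1000 sampled residents favored the plan, i.e. 460 respondents.
low, high = proportion_interval(460, 1000)
print(f"{low:.3f} to {high:.3f}")  # prints "0.429 to 0.491"
```

Under these assumptions the interval runs from roughly 43% to 49%, broadly in line with the range quoted above. Its upper end stays below 50%, which is what supports the "minority support" conclusion.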
Definitions
A parameter is a numerical summary of the population. A statistic is a numerical summary of the sample data. In the above example, the percentage of residents (out of the 1000 sampled) favoring Hindi is a statistic. Since we haven't asked every resident of TN, the population parameter is unknown. Through inferential statistics, we use known sample statistics to make inferences about unknown population parameters.
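The difference between a fixed parameter and a statistic that changes from sample to sample can be seen in a small simulation. This is a hypothetical sketch: the true proportion is unknown in the TN example, so the value true_p = 0.46 below is an assumption made purely for illustration, as are the function name and the number of repeated polls.

```python
import random

def simulate_sample_proportions(true_p, n, repeats, seed=0):
    """Draw `repeats` polls of size `n` from a population in which a
    fraction `true_p` favors the plan; return each poll's proportion."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    results = []
    for _ in range(repeats):
        favor = sum(1 for _ in range(n) if rng.random() < true_p)
        results.append(favor / n)  # the statistic for this one poll
    return results

# Hypothetical parameter: assume 46% of the whole population favors the plan.
props = simulate_sample_proportions(true_p=0.46, n=1000, repeats=200)
within_3_points = sum(abs(p - 0.46) <= 0.03 for p in props) / len(props)

print(f"smallest {min(props):.3f}, largest {max(props):.3f}")
print(f"share of polls within 3 points of the parameter: {within_3_points:.2f}")
```

Each simulated poll gives a slightly different statistic, while the parameter stays fixed. In practice only one poll is taken, and inferential statistics describes how far its result is likely to be from the parameter.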
Definitions
An important consideration is the accuracy with which we can infer population parameters from sample statistics. How close is the sample value of 46% likely to be to the true (unknown) percentage of the population favoring Hindi?
Conceptual population
AutoDrive (magazine) evaluates gas mileage for a new model by observing the average kmpl for five sample cars driven on a standardized 100-km course. Inferences could be made about mileage on this course for all the cars of this model that could be manufactured.