Document From Nashra

The document outlines different types of variables, including numerical (discrete and continuous) and categorical, as well as measurement scales such as nominal, ordinal, interval, and ratio. It discusses data collection methods, distinguishing between primary and secondary sources, and highlights the importance of proper survey design and potential errors in data. Additionally, it covers sampling techniques and classifications of data based on the type of data collected, such as cross-sectional, time series, and panel data.

Uploaded by

Nashra Ansari

Types of variables

• Numerical – variables whose data represent a counted or measured quantity (e.g., monthly sales)
• Discrete – data arise from a counting process (e.g., smartphones sold monthly)
• Continuous – data arise from a measuring process (e.g., waiting time)
• Categorical – variables whose data represent categories (e.g., gender)

• The same data may be defined as a numerical variable for one problem and as a categorical variable for another (e.g., age)
• Numerical – age in years
• Categorical – children, young adults, middle aged, retirement age
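The age example can be sketched in Python as follows. This is a minimal illustration only: the function name `age_group` and the cut-off ages (18, 40, 60) are assumptions chosen for the example and do not come from the slides.

```python
def age_group(age):
    """Map a numerical age (in years) to one of the four categories."""
    # The cut-off values below are illustrative assumptions.
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 18:
        return "children"
    if age < 40:
        return "young adults"
    if age < 60:
        return "middle aged"
    return "retirement age"


ages = [7, 25, 47, 68]                  # age treated as a numerical variable
groups = [age_group(a) for a in ages]   # the same data treated as categorical
print(groups)
```

Whether age is kept as a number or binned this way depends on the question being asked, for example averaging ages versus counting customers per group.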
Measurement scales
• Nominal
• Simple codes assigned to objects as labels, which are not measurements. E.g., marital status: married(1), unmarried(2), divorced(3)
• Ordinal
• Assigned codes represent rank order. E.g., credit score: low(1), medium(2), high(3)
• Interval
• Differences between values are meaningful, but there is no true zero point. E.g., temperature on the Celsius scale
• Ratio
• Measurement is the estimation of the ratio between a magnitude of a continuous quantity and a unit magnitude of the same kind, so there is a true zero. E.g., height, weight, age
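The practical difference between the interval and ratio scales can be shown with a short Python sketch. The temperatures and heights are made-up values, and the Kelvin conversion is added here only for contrast; it is not part of the slides.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius (interval scale) to Kelvin (ratio scale, true zero)."""
    return celsius + 273.15


t1_c, t2_c = 10.0, 20.0

# Differences are meaningful on an interval scale.
difference = t2_c - t1_c

# Ratios are not: 20 degrees C is not "twice as hot" as 10 degrees C.
naive_ratio = t2_c / t1_c
kelvin_ratio = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)

# Height is on a ratio scale, so this ratio is meaningful as it stands.
height_ratio = 180.0 / 90.0

print(difference, naive_ratio, round(kelvin_ratio, 3), height_ratio)
```

The Celsius ratio comes out as 2.0 only because of where the zero of the scale happens to sit; on the Kelvin scale the same two temperatures differ by about 3.5%.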
Data collection
• Improper data collection -> faulty statistical analysis
• Biases, errors
• Data is collected from
• Sample – contains only a portion of the population of interest (e.g., select 200 sales transactions)
• Collecting data from a sample may be less time-consuming, less cumbersome, and more practical than collecting it from the whole population
• Statistic – summarizes the value of a specific variable for a sample
• Population – contains all the items or individuals of interest that you seek to study (e.g., all sales transactions in a year)
• Parameter – summarizes the value of a specific variable for a population
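A small Python sketch of the parameter/statistic distinction, assuming a hypothetical population of 10,000 sales transaction amounts generated at random. The amounts, the seed, and the variable names are all illustrative.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: every sales transaction amount in a year.
population = [round(rng.uniform(5, 500), 2) for _ in range(10_000)]

# Parameter: summarizes the variable for the whole population.
parameter = statistics.mean(population)

# Sample: 200 transactions drawn without replacement.
sample = rng.sample(population, 200)

# Statistic: summarizes the same variable for the sample only.
statistic = statistics.mean(sample)

print(f"population mean (parameter): {parameter:.2f}")
print(f"sample mean (statistic):     {statistic:.2f}")
```

The statistic is used as an estimate of the parameter; how close the two are depends on the sample size and on how the sample was drawn.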
Data sources
• Primary sources
• Data collected directly from the field of enquiry for a specific purpose
• Primary data collection is a time-consuming process and costly
• E.g., data collected through a survey

• Secondary sources
• Information already collected and published by others
• E.g., publications of state and central governments, international bodies (UNESCO, etc.)
• Internal source
• Data obtained from within the organization and relating to the organization’s operations
• External source
• Data obtained from outside the organization (e.g., data published in financial periodicals)
Data set
• A dataset contains some basic measurements of individual items or elementary units
• Based on the number of variables, a dataset can be classified as
• Univariate – one variable (e.g., income level)
• Bivariate – two variables (e.g., cost of production, number of units produced)
• Multivariate – at least three variables (e.g., gender, age, experience, salary)
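The classification by number of variables can be written as a few lines of Python. The function name `classify_dataset` is a hypothetical helper introduced for this sketch.

```python
def classify_dataset(variables):
    """Classify a dataset by how many variables it records."""
    n = len(variables)
    if n == 0:
        raise ValueError("a dataset needs at least one variable")
    if n == 1:
        return "univariate"
    if n == 2:
        return "bivariate"
    return "multivariate"


print(classify_dataset(["income level"]))
print(classify_dataset(["cost of production", "number of units produced"]))
print(classify_dataset(["gender", "age", "experience", "salary"]))
```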
Data collection
• Primary data can be collected through
• Observation
• The person collecting the data asks no questions, but carefully observes the phenomenon and records the essential data
• Mechanical and electronic devices can be used
• Demerits – may be inaccurate, difficult for observer
• Questionnaire
• A questionnaire with relevant questions is designed
• Data can be collected through
• Personal interaction – face to face; accurate and reliable, but time-consuming and costly
• Mail – sent to the respondent's address
• Telephonic interaction – somewhat accurate data
• A pilot study may be done to judge the quality of the questionnaire and modify it where needed
• The units of measurement have to be clearly defined -> uniformity
Survey design
• Questionnaire design
• Efficient and imaginative questions
• Evoke a sense of satisfaction among the respondents
• Letter of introduction
• Specify purpose of study
• Assurance to respondents about confidentiality
• Motivate respondents
• Number of questions
• Too many questions may stress and strain the respondent
• Suggested number of questions is 20 to 50
• Questions may be grouped under subheadings
• Structure of questions
• Simple, short, and easy to understand
• Nature of questions
• Sensitive or confidential questions should be avoided
Survey design
• Sequence of questions
• Mixture of introductory, crucial, and light questions
• Proper sequence to ensure continuity of responses
• Include questions that cross-verify earlier responses
• Uniqueness
• The question should mean the same to each respondent
• Clear and unambiguous questions
• Marking for clarity
• Examples may be given in the question for more clarity
• Pilot survey
• Pre-testing the questionnaire to rectify problems, inconsistencies, repetitions, etc.
• Editing the primary data
• After data collection, it must be edited before analysis.
• Completeness, consistency, and accuracy of data must be ensured
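The three editing checks (completeness, consistency, accuracy) can be sketched in Python as below. The field names (`age`, `experience_years`, `salary`), the plausible age range, and the helper name `edit_checks` are assumptions made for the example; a real survey would define its own rules.

```python
def edit_checks(record, required_fields):
    """Return a list of problems found in one survey response."""
    problems = []

    # Completeness: every required field must be filled in.
    for field in required_fields:
        if record.get(field) in (None, ""):
            problems.append(f"missing: {field}")

    age = record.get("age")
    experience = record.get("experience_years")

    # Accuracy: the value must fall in a plausible range (assumed 0 to 120).
    if isinstance(age, (int, float)) and not 0 <= age <= 120:
        problems.append("age out of range")

    # Consistency: answers must not contradict each other.
    if (isinstance(age, (int, float))
            and isinstance(experience, (int, float))
            and experience > age):
        problems.append("experience exceeds age")

    return problems


required = ["age", "experience_years", "salary"]
responses = [
    {"age": 34, "experience_years": 10, "salary": 52000},
    {"age": 29, "experience_years": 35, "salary": ""},
]
for response in responses:
    print(edit_checks(response, required))
```

Records that return a non-empty list would be corrected, followed up with the respondent, or excluded before analysis.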
Possible errors in secondary data
• Transcribing error
• Estimating error
• The data may be statistical estimates rather than direct measurements
• Errors due to bias
• Bias of the estimator for secondary data
• Secondary data should be verified before use, as it may have issues such as bias, small sample size, computational errors, etc.
Secondary data users should consider
• Complete history of data
• Methods used for data collection
• Time frame and area covered
• Reliability of the source and authenticity of the primary investigator
• Units of measurement of the data collected
Methods of primary data collection
• Census or complete enumeration
• Data is collected from each and every individual in the population
• + information collected is more accurate
• - requires lots of time and money
• Sampling
Types of sampling
• Non-probability – items or individuals are selected without knowing their probability of selection
• Judgement sample
• Collect opinions of preselected experts in a subject matter
• Convenience sample
• Items are selected because they are easy, inexpensive, and convenient to sample
• Probability – items are selected based on known probabilities
• Simple random sample
• Systematic sample
• Stratified sample
• Cluster sample
Whenever possible, a probability sample should be used, as it allows inferences to be made about the population
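The four probability sampling methods are only named above. The Python sketch below shows one common way each is carried out, using textbook definitions rather than anything stated in the slides. The transaction data, the `region` field, and all function names are hypothetical.

```python
import random


def simple_random_sample(items, n, rng):
    """Every item has the same chance of selection."""
    return rng.sample(items, n)


def systematic_sample(items, n, rng):
    """Pick a random start, then take every k-th item."""
    if not 0 < n <= len(items):
        raise ValueError("n must be between 1 and len(items)")
    k = len(items) // n
    start = rng.randrange(k)
    return [items[start + i * k] for i in range(n)]


def group_by(items, key):
    """Split items into groups (strata or clusters) by a key function."""
    groups = {}
    for item in items:
        groups.setdefault(key(item), []).append(item)
    return groups


def stratified_sample(items, key, fraction, rng):
    """Draw a random sample of the same fraction from every stratum."""
    sample = []
    for members in group_by(items, key).values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample


def cluster_sample(items, key, n_clusters, rng):
    """Choose whole clusters at random and keep every item in them."""
    clusters = group_by(items, key)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [item for name in chosen for item in clusters[name]]


rng = random.Random(0)
transactions = [{"id": i, "region": "NSEW"[i % 4]} for i in range(1000)]

srs = simple_random_sample(transactions, 200, rng)
systematic = systematic_sample(transactions, 200, rng)
stratified = stratified_sample(transactions, lambda t: t["region"], 0.2, rng)
clustered = cluster_sample(transactions, lambda t: t["region"], 2, rng)
print(len(srs), len(systematic), len(stratified), len(clustered))
```

In each case the chance of an item being selected is known in advance, which is what separates these methods from judgement and convenience samples.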
Simple Taxonomy of Data
• Data in analytics is either structured or unstructured/semi-structured
• Structured data – data in matrix form with rows and columns; either categorical or numerical
• Categorical – divides a variable into specific groups (e.g., race, sex, age group, and educational level); measured on a nominal or ordinal scale
• Nominal – simple codes assigned to objects as labels, which are not measurements. E.g., marital status: married(1), unmarried(2), divorced(3)
• Ordinal – assigned codes represent rank order. E.g., credit score: low(1), medium(2), high(3)
• Numerical – numeric values of a variable (e.g., age, distance, temperature); measured on an interval or ratio scale
• Interval – variables that can be measured on interval scales. E.g., temperature on the Celsius scale
• Ratio – measurement is the estimation of the ratio between a magnitude of a continuous quantity and a unit magnitude of the same kind. E.g., height, weight, age
• Unstructured or semi-structured data – any data that is not originally in matrix form
• Textual
• Multimedia – image, audio, video
• XML/JSON
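To make the structured versus semi-structured distinction concrete, the Python sketch below turns a small JSON document into rows and columns. The JSON content and the column names are invented for the example.

```python
import json

# Semi-structured input: records need not share the same fields.
raw = """
[
  {"name": "A", "age": 31, "skills": ["sql", "python"]},
  {"name": "B", "age": 45}
]
"""
records = json.loads(raw)

# Structured output: a fixed set of columns, one row per record.
columns = ["name", "age", "n_skills"]
rows = [
    [r.get("name"), r.get("age"), len(r.get("skills", []))]
    for r in records
]

print(columns)
for row in rows:
    print(row)
```

Choosing the columns is a modelling decision: here the nested list of skills is reduced to a count so that every row has the same shape.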
Classification of data based on type of data collected
• Cross-sectional data – data collected on many variables of interest at the same time or duration of time. E.g., data on movies such as budget, collection, actors, directors, genre during 2019
• Time series data – data collected for a single variable for several time intervals (weekly, monthly, etc.). E.g., demand for smartphones collected monthly
• Panel data – data collected on several variables over several time intervals; also called longitudinal data. E.g., data collected on variables such as GDP, Gini index, and unemployment rate for several countries for several years
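The three definitions can be expressed as a simple rule on the number of variables and the number of time periods. This Python sketch follows the definitions given here; the function name `classify_by_collection` and the period counts in the examples are assumptions.

```python
def classify_by_collection(n_variables, n_periods):
    """Classify data by how many variables and time periods it covers."""
    if n_variables < 1 or n_periods < 1:
        raise ValueError("need at least one variable and one time period")
    if n_periods == 1:
        return "cross-sectional"   # many variables, one point or span of time
    if n_variables == 1:
        return "time series"       # one variable, several time intervals
    return "panel"                 # several variables, several time intervals


# Movies in 2019: budget, collection, actors, directors, genre.
print(classify_by_collection(n_variables=5, n_periods=1))

# Monthly smartphone demand over an assumed 12 months.
print(classify_by_collection(n_variables=1, n_periods=12))

# GDP, Gini index, unemployment rate over an assumed 10 years.
print(classify_by_collection(n_variables=3, n_periods=10))
```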
