Data processing
Unit IV
Data
The word data is derived from Latin. It is the plural of datum, although data is usually treated as a singular term. Data is any collection of facts or figures; it is the raw material to be processed by a computer.
Example: Names of students, marks obtained in an examination, designations of employees, addresses, quantities, rates, sales figures, or anything else that is input to the computer is data. Even pictures, photographs, drawings, charts and maps can be treated as data. The computer processes the data and produces the output or result.
Types of Data
Data is mainly divided into two types:
1. Numeric Data
2. Character Data
1. Numeric Data
Data represented in the form of numbers is known as numeric data. It includes the digits 0-9, a decimal point (.), the +, / and sign characters, and the letters E or D.
2. Character Data
Character data falls into two groups:
i. String Data
ii. Graphical Data
String Data
String data consists of a sequence of characters. The characters may be English letters, digits or spaces. The space that separates two words is also a character. String data is further divided into two types:
a. Alphabetic Data
b. Alphanumeric Data
Graphical Data
Pictures, charts and maps can also be treated as data. A scanner is normally used to enter this type of data. A common use of this data is found in the National Identity Card.
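The classification above can be sketched in a few lines of Python. This is only an illustration: the function name `classify` is hypothetical, the pattern assumes the sign is + or - and that E or D introduces an exponent (as in 1.5E3), and it assumes "alphabetic" means letters and spaces only while "alphanumeric" covers every other string. The slides do not define these details.

```python
import re

# Assumed pattern for numeric data: optional sign, digits with an optional
# decimal point, and an optional E/D exponent (e.g. 1.5E3).
_NUMERIC = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([EeDd][+-]?\d+)?")

def classify(value: str) -> str:
    """Return a rough data-type label for one piece of text input."""
    if _NUMERIC.fullmatch(value):
        return "numeric"
    # Letters and spaces only: treated here as alphabetic string data.
    if value and all(ch.isalpha() or ch == " " for ch in value):
        return "alphabetic"
    # Anything else (letters mixed with digits, punctuation) is alphanumeric.
    return "alphanumeric"

for sample in ["42", "-3.14", "1.5E3", "Ali Khan", "House 12B"]:
    print(sample, "->", classify(sample))
```

Graphical data (pictures, charts, maps) is not text, so it falls outside this sketch.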
Information
A collection of data that conveys some meaningful idea is information. It may provide answers to questions such as who, which, when, why, what, and how.
Put another way, raw input is data, and it has no significance while it exists in that form. When data is collated or organized into something meaningful, it gains significance. This meaningful organization is information.
Equivalently, observation and recording are done to obtain data, while analysis is done to obtain information.
Data Processing
Data processing: Any operation or set of operations performed upon data, whether or not by automatic means, such as collection, recording, organization, storage, adaptation or alteration to convert it into useful information.
Data Processing Cycle
Once data is collected, it is processed to convert it into useful information. The data is processed again and again until an accurate result is achieved. This is called the data processing cycle. Data processing is a very important activity and involves very careful planning. Usually, it involves three basic activities:
1. Input
2. Processing
3. Output
1. Input
Data Processing Cycle Step-1
Input is the process through which collected data is transformed into a form that the computer can understand. It is a very important step because correct output depends entirely on the input data. In the input step, the following activities can be performed.
i) Verification: The collected data is verified to determine whether it is correct as required. For example, the collected data of all B.Sc. students who appeared in the final examination of the university is verified. If errors occur in the collected data, the data is corrected or collected again.
ii) Coding: The verified data is coded or converted into machine-readable form so that it can be processed by the computer.
iii) Storing: The data is stored in a file on secondary storage. The data stored on the storage media is later given to the program as input for processing.
2. Processing
Data Processing Cycle Step-2
The term processing denotes the actual data manipulation techniques, such as classifying, sorting, calculating, summarizing and comparing, that convert data into information.
i) Classification: The data is classified into different groups and subgroups, so that each group or subgroup of data can be handled separately.
ii) Sorting: The data is arranged into an order so that it can be accessed very quickly as and when required.
iii) Calculations: Arithmetic operations are performed on the numeric data to get the required results. For example, the total marks of each student are calculated.
iv) Summarizing: The data is processed to represent it in a summarized form; that is, a summary of the data is prepared for top management. For example, a summary of the student data is prepared to show the percentage of students who passed and failed the examination.
3. Output
Data Processing Cycle Step-3
After the processing step is complete, output is generated. The main purpose of data processing is to get the required result. Mostly, the output is stored on storage media for later use. In the output step, the following activities can be performed.
i) Retrieval: Output stored on the storage media can be retrieved at any time. For example, the result of students is prepared and stored on disk. This result can be retrieved when required for different purposes.
ii) Conversion: The generated output can be converted into different forms. For example, it can be represented in graphical form.
iii) Communication: The generated output is sent to different places. For example, a weather forecast is prepared and sent to the different agencies, newspapers, etc. where it is required.
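The three steps of the cycle can be sketched as a small Python pipeline. Everything here is illustrative: the records, the function names `verify`, `process` and `output`, and the pass threshold `PASS_TOTAL` are assumptions, not part of the slides. For brevity, invalid records are simply dropped, whereas the text says they would be corrected or collected again.

```python
from collections import defaultdict

# Hypothetical raw records: (student name, class group, marks in three papers).
raw_records = [
    ("Sara", "B.Sc. I", [78, 65, 90]),
    ("Bilal", "B.Sc. II", [40, 35, 28]),
    ("Asma", "B.Sc. I", [55, 61, 47]),
    ("Omar", "B.Sc. II", [88, 110, 70]),  # 110 is outside the 0-100 range
]

PASS_TOTAL = 120  # assumed pass threshold for the total of three papers

def verify(records):
    """Input step: keep only records whose marks are all within 0-100."""
    return [r for r in records if all(0 <= m <= 100 for m in r[2])]

def process(records):
    """Processing step: classify, calculate, sort, and summarize."""
    groups = defaultdict(list)
    for name, group, marks in records:
        groups[group].append((name, sum(marks)))  # classification + calculation
    summary = {}
    for group, totals in groups.items():
        totals.sort(key=lambda t: t[1], reverse=True)  # sorting by total marks
        passed = sum(1 for _, total in totals if total >= PASS_TOTAL)
        summary[group] = {
            "ranking": totals,
            "pass_percent": 100 * passed / len(totals),  # summarizing
        }
    return summary

def output(summary):
    """Output step: convert the summary into printable report lines."""
    return [f"{g}: {s['pass_percent']:.0f}% passed" for g, s in sorted(summary.items())]

report = output(process(verify(raw_records)))
print("\n".join(report))
# B.Sc. I: 100% passed
# B.Sc. II: 0% passed
```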
Types of Data Processing
1. Manual Data Processing:
This method of data processing involves human intervention. Manual data entry creates many opportunities for error and cost: delays in data capture, because every single data field has to be keyed in by hand; a high rate of operator misprints or typos; and high labor costs from the amount of manual work required. Manual processing also implies higher overheads for equipment and supplies, rent, etc.
2. EDP (Electronic Data Processing):
EDP (electronic data processing), an infrequently used term for what is today usually called "IS" (information services or systems) or "MIS" (management information services or systems), is the processing of data by a computer and its programs in an environment involving electronic communication. EDP evolved from "DP" (data processing), a term that was created when most computing input was physically put into the computer in punched card or ATM card form, and output was produced as punched cards or paper reports.
3. Real-time processing:
In real-time processing, there is a continual input, processing and output of data. Data has to be processed within a small stipulated time period (real time); otherwise it will create problems for the system. For example, when a bank customer withdraws a sum of money from his or her account, it is vital that the transaction be processed and the account balance updated as soon as possible, allowing both the bank and the customer to keep track of funds.
4. Batch processing:
In batch processing, a group of transactions is collected over a period of time, entered, processed, and then the batch results are produced. Batch processing requires separate programs for input, processing and output. It is an efficient way of processing a high volume of data. Examples: payroll systems, examination systems and billing systems.
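The difference between real-time and batch processing can be illustrated with a toy bank account. This is a sketch under simple assumptions: the `Account` class and the transaction amounts are hypothetical, and a real system would involve storage, concurrency and error handling that are not shown.

```python
class Account:
    """A toy bank account used only to contrast the two processing styles."""

    def __init__(self, balance):
        self.balance = balance

    def apply(self, amount):
        """Apply one transaction (a negative amount is a withdrawal)."""
        self.balance += amount

transactions = [-200, -50, 300]

# Real-time processing: each transaction updates the balance as it arrives,
# so the balance is current after every single step.
realtime = Account(1000)
seen_balances = []
for amount in transactions:
    realtime.apply(amount)
    seen_balances.append(realtime.balance)

# Batch processing: transactions are first collected over a period of time,
# then the whole batch is processed in one later run.
batch = Account(1000)
pending = []
for amount in transactions:
    pending.append(amount)        # collection phase: balance not yet updated
balance_before_run = batch.balance
for amount in pending:            # processing phase: the batch run
    batch.apply(amount)
pending.clear()

print(seen_balances)                      # [800, 750, 1050]
print(balance_before_run, batch.balance)  # 1000 1050
```

Both approaches reach the same final balance; they differ in when the balance becomes up to date.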
Hypothesis Testing
- Hypothesis testing is a decision-making process.
- Statistics are used as a tool to assist with decision-making.
- A scientific hypothesis is a statement of the predicted relationship among the variables.
- A null hypothesis is a statement of no relationship among the variables.
Null Hypothesis Not Rejected
[Figure: a single total population containing both the sample reared in a sterile environment and the sample reared in an enriched environment]
Null Hypothesis Rejected
[Figure: two separate populations, rats reared in a sterile environment and rats reared in an enriched environment, each containing the sample used in the study]
Hypothesis Testing In Experimental Studies
Your research design determines the kind of statistical test you will use. Experimental studies test hypotheses while quasi-experimental studies tend to focus more on generating hypotheses.
Research Designs/Approaches
Type: Experimental
  Purpose: Test for cause/effect relationships
  Time frame: Current
  Degree of control: High
  Example: Comparing two types of treatments for anxiety.
Type: Quasi-experimental
  Purpose: Test for cause/effect relationships without full control
  Time frame: Current or past
  Degree of control: Moderate to high
  Example: Gender differences in visual/spatial abilities.
Type: Nonexperimental correlational
  Purpose: Examine the relationship between two variables
  Time frame: Current (cross-sectional) or past
  Degree of control: Low to medium
  Example: Relationship between studying style and grade point average.
Type: Ex post facto
  Purpose: Examine the effect of a past event on current functioning
  Time frame: Past and current
  Degree of control: Low to medium
  Example: Relationship between history of child abuse and depression.
Type: Nonexperimental correlational (predictive)
  Purpose: Examine the relationship between two variables where one is measured later
  Time frame: Future
  Degree of control: Low to moderate
  Example: Relationship between history of depression and development of cancer.
Type: Cohort-sequential
  Purpose: Examine change in a variable over time in overlapping groups
  Time frame: Future
  Degree of control: Low to moderate
  Example: How mother-child negativity changed over adolescence.
Type: Survey
  Purpose: Assess opinions or characteristics that exist at a given time
  Time frame: Current
  Degree of control: None or low
  Example: Voting preferences before an election.
Type: Qualitative
  Purpose: Discover potential relationships; descriptive
  Time frame: Past or current
  Degree of control: None or low
  Example: People's experiences of quitting smoking.
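Tests of Significance
Group differences
Question: Difference between means of 2 different groups
  Null hypothesis: H0: μg1 = μg2
  Statistical test: t-independent
Question: Difference between 2 means of related groups
  Null hypothesis: H0: μg1a = μg1b
  Statistical test: t-dependent
Question: Difference between means of 3 groups
  Null hypothesis: H0: μg1 = μg2 = μg3
  Statistical test: ANOVA
Group relationships
Question: Relationship between 2 variables
  Null hypothesis: H0: ρxy = 0
  Statistical test: t-test for significance of correlation
Question: Relationship between 2 correlations
  Null hypothesis: H0: ρab = ρcd
  Statistical test: t-test for significance of difference between 2 correlations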
Experimental Designs
- Examines differences between experimentally manipulated groups or variables (e.g., one group gets a certain drug and the other gets a placebo).
- At minimum, the experimental (independent) variable has two levels (e.g., drug vs. placebo).
- The advantage is that you can determine causality.
- The disadvantages are cost and the fact that many variables cannot be experimentally manipulated (e.g., smoke exposure over time).
Null Hypothesis Significance Testing
Null hypothesis (H0): Results are due to chance.
Alternative (scientific) hypothesis (H1): Results are due to a true effect.
Assess: Assuming H0 is true, what is the probability or chance of obtaining the data we did?
Decide: If the chance is small enough, reject H0 and infer the effect is real.
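The assess-then-decide logic can be made concrete with a permutation test, which estimates how often chance alone (random relabelling of the scores) produces a group difference at least as large as the one observed. This is a sketch of one possible approach, not a method named in the slides; the function name `permutation_p_value` and the drug and placebo scores are hypothetical.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_shuffles=2000, seed=0):
    """Estimate P(difference at least this large | H0 is true) by shuffling labels."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # under H0 the group labels are interchangeable
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_shuffles

# Hypothetical scores for a drug group and a placebo group.
drug = [12, 15, 14, 16, 13, 17]
placebo = [9, 10, 8, 11, 10, 9]

p = permutation_p_value(drug, placebo)            # Assess
decision = "reject H0" if p < 0.05 else "fail to reject H0"  # Decide
print(p, decision)
```

Because every drug score here is higher than every placebo score, random relabelling almost never reproduces so large a difference, so the estimated probability is small and H0 is rejected.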
Experimental Designs: Hypothesis Testing
Type of experimental research design: Between Subject
  One independent variable
    Two groups or two levels of the independent variable: Independent samples t-test
    More than two groups or more than two levels of the independent variable: One-way ANOVA
  Two independent variables: Two-way ANOVA
Type of experimental research design: Within Subject
  Two groups: Correlated t-tests
  More than two groups: Repeated measures ANOVA
Parametric Vs. Non-Parametric Statistics: Two-Sample Cases
Level of measurement: Nominal
  Related samples: McNemar test
  Independent samples: Fisher exact test; χ² test
Level of measurement: Ordinal
  Related samples: Sign test; Wilcoxon matched-pairs signed-ranks test
  Independent samples: Median test; Mann-Whitney U test
Level of measurement: Interval
  Related samples: t-test for matched pairs
  Independent samples: t-independent test
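Parametric Vs. Non-Parametric Statistics: > 2-Sample Cases
Level of measurement: Nominal
  Related samples: Cochran Q test
  Independent samples: χ² test
Level of measurement: Ordinal
  Related samples: Friedman 2-way ANOVA
  Independent samples: Kruskal-Wallis one-way ANOVA
Level of measurement: Interval
  Related samples: Repeated measures ANOVA
  Independent samples: ANOVA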
Parametric Vs. Non-Parametric Statistics: Correlation
Level of measurement: Nominal
  Correlation: Contingency coefficient
Level of measurement: Ordinal
  Correlation: Spearman rank correlation, Kendall rank correlation, etc.
Level of measurement: Interval
  Correlation: Pearson's correlation coefficient
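To make the ordinal versus interval distinction concrete, the sketch below computes Pearson's correlation coefficient on raw scores and a Spearman rank correlation by applying the same formula to ranks. The function names and data are hypothetical, and the `ranks` helper assumes there are no tied values (ties would need average ranks, which are not handled here).

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical data with a perfectly monotonic but non-linear relationship.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(round(pearson_r(x, y), 3))     # 0.981
print(round(spearman_rho(x, y), 3))  # 1.0
```

The rank-based measure reaches 1 because the ordering is preserved perfectly, while Pearson's coefficient is slightly lower because the relationship is not a straight line.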
Sampling Distribution of Mean Difference Scores
[Figure: normal curve of mean difference scores, marking the ranges that contain 95% and 99% of all cases]
Critical Values of T
- Need to determine the degrees of freedom: df = N - 2
- Need to determine the p value for rejecting the null hypothesis (alpha).
- Need to determine whether this is a 1-tailed or 2-tailed level of significance.
T-Values
Example of a reported result: t(120) = 2.00, p < 0.05
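As an illustration of where a t-value and its degrees of freedom come from, the sketch below computes an independent-samples t statistic with a pooled variance, using df = N - 2 where N is the total number of scores. The function name `independent_t` and the data are hypothetical, and the pooled-variance form assumes the two groups have similar variances.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group1, group2):
    """Return (t, df) for an independent-samples t-test with pooled variance."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2  # df = N - 2, with N the total number of scores
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / standard_error, df

# Hypothetical scores for two independent groups.
t, df = independent_t([5, 6, 7, 8, 9], [1, 2, 3, 4, 5])
print(round(t, 2), df)  # 4.0 8
```

With df = 8, the two-tailed critical value at alpha = .05 from a standard t table is 2.306, so a t of 4.0 would lead to rejecting the null hypothesis.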
What is one of the major criticisms of employing statistical tests of the null hypothesis to determine if effects are true?
Limitations of Statistical Tests of the Null Hypothesis
Does not take into account the size of the difference between means (effect size)
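One common way to report the size of a difference between means is a standardized effect size. The sketch below uses Cohen's d, which is an assumption on my part since the slides do not name a specific measure; the function name `cohens_d` and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group1, group2):
    """Difference between means expressed in pooled standard deviation units."""
    n1, n2 = len(group1), len(group2)
    pooled_sd = sqrt(
        ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    )
    return (mean(group1) - mean(group2)) / pooled_sd

# Hypothetical scores: a mean difference of 4 with a pooled SD of about 1.58.
d = cohens_d([5, 6, 7, 8, 9], [1, 2, 3, 4, 5])
print(round(d, 2))  # 2.53
```

Unlike a p value, this number does not shrink or grow simply because the sample size changes, which is why it complements a significance test.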
Analysis of Variance (ANOVA)
F-ratio = MSbet / MSwithin
Essentially, this is the between-group variance divided by the within-group variance. If the groups come from similar populations, the variance between the groups will be similar to the variance within groups (the null hypothesis is not rejected).
ANOVA
Between-group variance consists of:
- Variability due to the effect of the independent variable (treatment effect)
- Variability due to chance factors
Within-group variance consists of:
- Variability in data within the treatment groups that is due to chance, since if the treatment effect were consistent, all subjects within a treatment group would experience a similar magnitude of effect.
Analysis of Variance (ANOVA)
F-ratio = MSbet / MSwithin
MS refers to the mean square, which is the sum of squares divided by the appropriate degrees of freedom.
- Df for MSbet is the number of groups minus 1.
- Df for MSwithin is the total number of scores in the experiment minus the number of groups.
ANOVA
MSbet = treatment effect + chance variability
MSwithin = chance variability
The ratio will be close to 1 if there is no treatment effect.
Example of a reported result: F(2,144) = 5.56, p < 0.05.
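The F-ratio can be computed directly from the definitions above: each mean square is a sum of squares divided by its degrees of freedom. The sketch below is a minimal one-way ANOVA calculation; the function name `f_ratio` and the three groups of scores are hypothetical.

```python
from statistics import mean

def f_ratio(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1        # number of groups minus 1
    df_within = n_total - k   # total number of scores minus number of groups
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within

# Hypothetical scores for three treatment groups.
F, df_b, df_w = f_ratio([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(F, 2), df_b, df_w)  # 13.0 2 6
```

Here MSbet is 13 and MSwithin is 1, so the between-group variability is far larger than chance variability alone would suggest.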
Two-Way ANOVA
Used where you have 2 independent variables, each having at least 2 levels. For example:
- Drug dose (none vs. 5 mg)
- Delivery mode (intravenous vs. oral)
It is a factorial design, so you can test both main effects and interaction effects.
Mixed Model: 2 Between-Subject Factors, 1 Within-Subject Factor
Used where you have 2 between-subject independent variables, each having at least 2 levels. For example:
- Drug dose (none vs. 5 mg)
- Delivery mode (intravenous vs. oral)
Plus one within-subject factor with, for example, 3 levels:
- Pre-treatment, 3-month and 6-month follow-up
It is a factorial design, so you can test both main effects and interaction effects (including 3-way interaction effects).
Rejecting the Null Hypothesis
The null hypothesis can be rejected but not accepted.
Arguments have been made for allowing some flexibility in concluding that the null hypothesis is true when:
- No other studies of the phenomenon have rejected the null hypothesis.
- The p value for the test of the null hypothesis is large (e.g., > .20 or .40).
- The research design is sufficiently powerful.
Errors in Statistical Decision-Making
Type I error: falsely rejecting the null hypothesis
- At p < .05 there is a 5% chance (5 in 100) of falsely rejecting the null hypothesis when it is actually true
Type II error: failing to reject the null hypothesis when it is false
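The 5% Type I error rate can be checked with a small simulation: generate many experiments in which the null hypothesis is true by construction, test each at alpha = .05, and count how often it is falsely rejected. The sketch uses a z-test with a known standard deviation purely for simplicity, which is an assumption rather than a test named in the slides; the function name `type_i_error_rate` and its parameters are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist

def type_i_error_rate(n_experiments=2000, n=30, alpha=0.05, seed=1):
    """Proportion of experiments that falsely reject a true null hypothesis."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical z, about 1.96
    rejections = 0
    for _ in range(n_experiments):
        # H0 is true by construction: the population mean is 0 and sigma is 1.
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = (sum(sample) / n) * sqrt(n)  # z statistic with known sigma = 1
        if abs(z) > z_crit:
            rejections += 1              # a false rejection (Type I error)
    return rejections / n_experiments

rate = type_i_error_rate()
print(rate)  # expected to be close to 0.05
```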
External Validity
Chapter 14
Goals of Psychology Research
The goal is to understand the underlying laws governing the behaviour of organisms. The more the results of your study help inform one about these underlying laws, the more valuable the findings. The importance of the findings is limited by their internal and external validity.
External Validity
The extent to which the results of the study can be generalized across different persons, settings, and times. One typically thinks of generalizing to specific populations (e.g., North American elementary school students) rather than to the world at large. The best safeguard is random selection, but this is not usually feasible.
Threats to External Validity
- Lack of population validity
- Lack of ecological validity
- Lack of time validity
Population Validity
Generalizing to the defined population (i.e., the target population) from which the sample was drawn. The sample is drawn from the experimentally accessible population.
Population Validity
[Figure: nested populations. The target population contains the experimentally accessible population, which contains the sample.]
Population Validity
Threatened by a selection by treatment interaction:
- Treatment results may not be exactly reproducible in the target population.
- Even willingness to volunteer for studies has been shown to result in a selection by treatment interaction effect.
Ecological Validity
The extent to which the results can be generalized across settings or environmental conditions.
E.g., would the treatment effect observed in patients recruited from a first-class medical centre be the same as the treatment effect observed in patients recruited from a local community hospital?
Ecological Validity
Multiple-Treatment Interference
- A sequencing effect whereby exposure to one treatment influences responses to another treatment; or
- Exposure to one experiment influences response in another experiment (e.g., sophisticated participants).
Ecological Validity
Hawthorne Effect
- Knowing one is in a study can affect one's behaviour.
- Participant bias effects (e.g., social acceptability, compliance)
Novelty or Disruption Effect
- Effects are simply due to novelty and wear off once the novelty diminishes.
Ecological Validity
Experimenter Effect
- An enthusiastic experimenter/clinician may get different effects than a clinician who is implementing the treatment in routine care.
Pre-testing Effect
- Administering a pre-test may sensitize the participant in such a way that he/she responds differently to the experiment than would have occurred without a pre-test.
Temporal Validity
The extent to which the results would generalize to other times.
- Results might vary depending on the time elapsed between presentation of the independent variable and the measurement of the dependent variable.
Temporal Validity
Seasonal Variation
- Variation that appears regularly over time (e.g., change in traffic accident rates between daylight saving time and non-daylight saving time).
- Fixed-time variation: variation at specific, predictable time points
- Variable-time variation: you don't know when the variation will occur, but when it occurs, there are predictable responses.
Temporal Validity
Cyclical Variation
- Predictable variation within people or other organisms
Personological Variation
- Variation in the characteristics of the individual over time
Internal Vs. External Validity
There tends to be an inverse relationship: as internal validity increases, external validity decreases.
In testing for between-group differences, you want to minimize within-group variability and maximize between-group differences. To do so, you want to ensure high control over factors that could confound the results, but this often results in increasingly artificial experimental conditions.
When Is External Validity Less Important
When you don't need to demonstrate that X will happen, but rather that X can happen. Sometimes the main goal is to test a theory, and the extent to which it reflects real life is less important.