Vivian 2nd Assignment
Data (as given): 13, 1, 16, 19, 26, 20, 14, 21, 15, 18, 22, 76
Sorted data: 1, 13, 14, 15, 16, 18, 19, 20, 21, 22, 26, 76
The mean is the sum of the values divided by the number of values:
Mean = 261 / 12 = 21.75
The median is the middle value of the sorted data. Since there are 12 values (an even number),
the median will be the average of the two middle numbers.
The two middle numbers in the sorted list are the 6th and 7th values:
6th value: 18
7th value: 19
Median = (18 + 19) / 2 = 37 / 2 = 18.5
The mode is the value that appears most frequently. In this data set there is no mode, since each value occurs exactly once.
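As a quick check, these three results can be reproduced with Python's built-in statistics module (a minimal sketch using the sorted data above):

import statistics

data = [1, 13, 14, 15, 16, 18, 19, 20, 21, 22, 26, 76]

print(statistics.mean(data))       # 261 / 12 = 21.75
print(statistics.median(data))     # (18 + 19) / 2 = 18.5
print(statistics.multimode(data))  # returns every value, since each appears only once (no single mode)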
b) Which measures of central tendency would you recommend as being appropriate to describe
data? Give justification of your answers (4).
When selecting a measure of central tendency to describe data, I recommend using the median.
This recommendation is based on several compelling justifications that highlight the median's
robustness and reliability in various contexts.
First and foremost, the median is less influenced by outliers and skewed distributions compared
to the mean. In many real-world datasets, particularly those involving income or age, extreme
values can significantly distort the mean, leading to a misrepresentation of the typical value. For
instance, consider a dataset of household incomes where most families earn between $10,000
and $30,000, but a few families earn several million dollars. The mean income would be
artificially inflated by these high earners, suggesting a higher standard of living than what most
families experience. In contrast, the median would accurately reflect the central tendency by
identifying the middle value of the income distribution, providing a clearer picture of the
economic reality for the majority.
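To make this concrete, the short Python sketch below uses hypothetical income figures (not taken from any real dataset); a single extreme value pulls the mean far above the typical household while the median is unaffected:

import statistics

incomes = [12000, 15000, 18000, 22000, 25000, 28000, 3000000]  # hypothetical figures

print(statistics.mean(incomes))    # about 445,714 - inflated by the single outlier
print(statistics.median(incomes))  # 22,000 - still close to a typical household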
Secondly, the median is particularly beneficial when dealing with non-normally distributed data.
Many datasets in fields such as social sciences and economics do not follow a normal
distribution, often exhibiting skewness. In these cases, the mean may not represent the data
effectively, whereas the median remains a stable measure of central tendency. By focusing on the
median, analysts can avoid the pitfalls of using a measure that may not be appropriate for skewed
distributions, ensuring that their interpretations and conclusions are grounded in a more accurate
representation of the data.
Lastly, the median is applicable to ordinal data, where the mean may not be valid. In cases where
data can be ranked but not measured on a numerical scale (such as survey responses on a Likert
scale), the median provides a meaningful way to summarize central tendencies without assuming
equal intervals between points. This flexibility allows researchers to derive central tendencies
from a wider range of data types, further emphasizing the median's versatility.
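For instance, with hypothetical responses on a five-point Likert scale (1 = strongly disagree, 5 = strongly agree), the median can be reported without treating the scale points as equally spaced:

import statistics

responses = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]  # hypothetical Likert responses

print(statistics.median(responses))  # 3.0 - the middle of the ranked responses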
c) Describe the advantages and disadvantages of the different measures of location (mode,
median, mean) which will help to decide the suitable average for a given situation (13).
Measures of central tendency provide a single value that indicates the general magnitude of the data and describes its characteristics by identifying a value at or near the central location of the data (Bordens and Abbott, 2011). King and Minium (2013) described a measure of central tendency as a summary figure that helps describe the central location of a certain group of scores. Each measure has its own advantages
and disadvantages, which should be carefully evaluated based on the data's characteristics.
Understanding these measures is essential for informed decision-making across various fields,
including statistics, economics, and social sciences.
According to Tate (1995), the mean, often called the arithmetic average, is a widely recognized measure of central tendency. The mean is the total of all the scores in the data divided by the number of scores. For example, if there are 100 students in a class and we want to find the mean mark they obtained in a psychology test, we add all their marks and divide by 100 (the number of students). Its main advantage is that it takes into account all
values in a dataset, providing a balanced view of the overall trend. This makes the mean
particularly effective for normally distributed data, where values cluster symmetrically around a
central point. Additionally, the mean has useful mathematical properties that facilitate further
statistical analysis, such as calculating variance and standard deviation.
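The class-marks example can be sketched in Python with hypothetical scores; once the mean is computed, the variance and standard deviation follow directly from it, which illustrates the mathematical convenience mentioned above:

import random
import statistics

random.seed(1)
marks = [random.randint(40, 95) for _ in range(100)]  # 100 hypothetical test marks

mean = statistics.mean(marks)                 # sum of all marks divided by 100
variance = statistics.pvariance(marks, mean)  # average squared deviation from the mean
std_dev = statistics.pstdev(marks, mean)      # square root of the variance

print(mean, variance, std_dev)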
However, the mean has notable drawbacks, particularly its sensitivity to outliers. Extreme values
can skew the mean, resulting in a distorted representation of the dataset, especially in skewed
distributions. For example, when analyzing income data, a few individuals with very high
earnings can inflate the mean, making it unrepresentative of the majority. Mangal (2002) noted
that the mean is not suitable for nominal or ordinal data, as it relies on numerical values. In cases
of non-normally distributed data or significant outliers, the mean may not be a reliable measure
of central tendency.
The median is defined as the middle value in a dataset arranged in order. Its key strength lies in
its robustness; it is unaffected by outliers and skewed data, making it an ideal choice for datasets
that do not follow a normal distribution. The median effectively represents the "typical" value,
especially when extreme values could mislead interpretations based on the mean. Moreover, it is
easy to calculate and comprehend. Mohanty and Misra (2016) noted that one of the most significant advantages
of the median is its robustness against outliers and extreme values. Unlike the mean, which can
be heavily influenced by very large or very small numbers, the median remains stable. For
example, in a dataset of incomes where most values are relatively low but a few are extremely
high, the median provides a better representation of the typical income, as it is not skewed by
those few high values.
Despite its advantages, the median has limitations. It overlooks the magnitude of values beyond
the midpoint, potentially missing important trends in the data. For example, if values are closely
clustered but include an extreme outlier, the median may not accurately reflect the overall
distribution. Additionally, in smaller datasets, the median can be less stable and may not reliably
represent central tendency. While applicable to both ordinal and continuous data, the median
lacks the mathematical utility of the mean, which can restrict its use in further statistical
analyses.
The mode is the value that occurs most frequently in a dataset and offers a unique perspective on
central tendency. Its primary advantage is its simplicity; it is easy to identify, especially with
categorical data. According to Veeraraghavan (2016), the mode is the only measure applicable to
nominal data, making it crucial for identifying the most common category or response. It can
also provide insights into multimodal distributions, where multiple values share the highest
frequency. An advantage of the mode is that it can be easily computed even when a distribution
has open-ended classes, such as age ranges like "60 and above." This flexibility allows the mode
to identify the most frequently occurring value or interval, providing valuable insights from
datasets with incomplete boundaries. Consequently, the mode remains useful in various fields,
including marketing and social sciences, where such data structures are common.
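The short sketch below uses hypothetical nominal data to show how the mode applies where the mean and median would not, including an open-ended age grouping and a case with two modes:

import statistics

transport = ["bus", "car", "bus", "walk", "car", "bus"]  # hypothetical nominal data
age_groups = ["under 20", "20-39", "40-59", "60 and above",
              "60 and above", "20-39"]                   # includes an open-ended class

print(statistics.multimode(transport))   # ['bus'] - the most common category
print(statistics.multimode(age_groups))  # ['20-39', '60 and above'] - two modes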
However, the mode has significant drawbacks. It can be misleading if it does not represent the
dataset as a whole. Some datasets may have no mode or multiple modes, complicating
interpretation. King (2001) noted that the mode provides less information about the overall
distribution compared to the mean or median, focusing solely on frequency. This narrow
perspective can lead to an incomplete understanding of central tendency, particularly when
frequency does not correlate with underlying data trends.
In conclusion, the choice among mean, median, and mode should be guided by the specific
characteristics of the data and the analytical context. The mean is best for normally distributed
data without outliers, offering comprehensive insights for further calculations. The median is a
robust alternative when dealing with skewed distributions or outliers, providing a dependable
representation of the central value. The mode is invaluable for categorical data, highlighting the
most common occurrences. By carefully weighing the advantages and disadvantages of each
measure, analysts can ensure a more accurate and meaningful interpretation of the data.
Gahi (2020) defined data processing as a fundamental aspect of the research process,
encompassing a series of operations that transform raw data into meaningful information. This
transformation is essential for drawing valid conclusions and making informed decisions based
on empirical evidence. The operations involved in data processing can be categorized into
several key stages: data collection, data cleaning, data analysis, and data interpretation. Each of
these stages plays a vital role in ensuring the integrity and utility of research findings.
The initial step in data processing is data collection, which involves gathering information from
various sources. This can include surveys, experiments, observational studies, and existing
databases. The choice of data collection method depends on the research objectives and the
nature of the data required. For instance, quantitative research often employs structured surveys
or experiments to collect numerical data, while qualitative research may utilize interviews or
focus groups to gather descriptive information (Creswell, 2014).
Effective data collection is paramount, as the quality of the data directly impacts the validity of
the research outcomes. Researchers must ensure that the data gathered is relevant, accurate, and
representative of the population being studied. This may involve designing appropriate
instruments, such as questionnaires or measurement tools, and piloting them to identify any
potential issues before full-scale data collection begins (Fowler, 2014). Additionally, ethical
considerations must be addressed, such as obtaining informed consent from participants and
ensuring confidentiality.
Once data has been collected, the next critical step is data cleaning, which involves identifying
and correcting errors or inconsistencies in the dataset. This process is essential for ensuring the
reliability of the data and involves several operations, including removing duplicates, handling
missing values, and correcting data entry errors. Duplicate entries can skew results and lead to inaccurate
conclusions. Researchers must identify and eliminate these duplicates to maintain data integrity.
Missing data can arise for various reasons, such as non-responses in surveys. According to
Little and Rubin (2019), researchers must decide how to address these gaps, whether by imputing
values, removing affected entries, or using statistical techniques that accommodate missing data.
Data entry errors, such as typos or incorrect coding, must be identified and corrected. This may
involve cross-referencing with original sources or using automated tools to detect anomalies.
Data cleaning is often a time-consuming but necessary step in the data processing workflow, as it
ensures that the dataset is accurate and ready for analysis (Pinto et al., 2020). The integrity of the
dataset after cleaning is crucial, as it sets the foundation for meaningful analysis.
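A minimal cleaning sketch using the pandas library illustrates the three operations described above; the file name and column names ("responses.csv", "age", "income") are hypothetical:

import pandas as pd

df = pd.read_csv("responses.csv")  # hypothetical raw survey data

df = df.drop_duplicates()                                  # remove duplicate records
df["income"] = df["income"].fillna(df["income"].median())  # impute missing incomes with the median
df = df[(df["age"] >= 0) & (df["age"] <= 120)]             # drop rows with implausible ages (entry errors)

df.to_csv("responses_clean.csv", index=False)              # save the cleaned dataset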
Following data cleaning, researchers proceed to data analysis, where they apply statistical
methods to interpret the data. This stage can involve various techniques, depending on the
research questions and the type of data collected. Common analytical methods include
descriptive statistics, inferential statistics and qualitative analysis. Descriptive statistics provide a
summary of the data, including measures of central tendency (mean, median, and mode) and
measures of variability (range, variance, standard deviation). Descriptive statistics help
researchers understand the basic features of the data and identify patterns or trends (Field,
2018). Inferential statistical techniques allow researchers to make generalizations about a
population based on a sample.
Blander (2019) noted that inferential statistics include hypothesis testing, confidence intervals,
and regression analysis, which help determine relationships between variables and assess the
significance of findings. For qualitative data, researchers may use thematic analysis, content
analysis, or grounded theory to identify themes and patterns within the data. This involves
coding the data and interpreting the meanings behind participants’ responses (Braun and Clarke,
2006).
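As a small illustration of an inferential step, the sketch below runs an independent two-sample t-test with the SciPy library on hypothetical scores from two groups; a small p-value would suggest that the group means differ:

from scipy import stats

group_a = [72, 75, 78, 80, 69, 74, 77]  # hypothetical scores, group A
group_b = [65, 70, 68, 72, 66, 71, 69]  # hypothetical scores, group B

t_stat, p_value = stats.ttest_ind(group_a, group_b)  # independent two-sample t-test
print(t_stat, p_value)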
The choice of analytical methods is crucial, as it influences the conclusions drawn from the data.
Researchers must select appropriate techniques that align with their research objectives and the
nature of the data. Furthermore, the analytical process often involves iterative cycles of analysis
and refinement, allowing researchers to delve deeper into the data and uncover more nuanced
insights.
The final stage of data processing is data interpretation, where researchers draw conclusions
based on the analyzed data. This involves contextualizing the findings within the broader
research framework and considering their implications. Researchers must critically evaluate the
results, considering factors such as limitations, generalizability and practical implications. Every
study has limitations that may affect the validity of the findings. Maxwell (2013) is of the view
that researchers should acknowledge these limitations and discuss how they may influence the
interpretation of the results. This transparency is vital for maintaining the integrity of the
research process and providing a balanced view of the findings.
Additionally, researchers must consider whether the findings can be generalized to a larger
population or if they are specific to the sample studied. This is particularly important in
quantitative research, where sample size and selection methods play a significant role in
generalizability. According to Cohen et al. (2018), understanding the scope and applicability of the findings is
essential for drawing meaningful conclusions and making informed recommendations. Finally,
researchers should discuss the practical implications of their findings, including how they
contribute to existing knowledge, inform policy, or guide future research. This step is crucial for
translating research outcomes into actionable insights that can benefit various stakeholders. Data
interpretation is a critical step that requires careful consideration of the results and their broader
significance. It is the stage where research findings are synthesized and communicated to the
intended audience, whether in academic publications, reports, or presentations.
In conclusion, data processing operations in research encompass a series of systematic steps that
transform raw data into meaningful insights. From data collection to cleaning, analysis, and
interpretation, each stage plays a vital role in ensuring the integrity and validity of research
findings. By adhering to rigorous data processing practices, researchers can enhance the
reliability of their conclusions and contribute valuable knowledge to their respective fields. The
careful execution of these stages not only strengthens the research process but also aids in the
advancement of knowledge across disciplines.
References
1. Blander, J. (2019). “Inferential statistics in research: A guide for beginners.” Statistics Journal, 15(1), pp. 34-47.
2. Fowler, F. J. (2014). Survey Research Methods. 5th ed. Thousand Oaks, CA: Sage Publications.
3. Field, A. (2018). Discovering Statistics Using IBM SPSS Statistics. 5th ed. London: Sage Publications.
4. Little, R. J. and Rubin, D. B. (2019). Statistical Analysis with Missing Data. 3rd ed. Hoboken, NJ: Wiley.
5. Mangal, S. K. (2002). Statistics in Psychology and Education. New Delhi: PHI Learning Private Limited.
6. Mohanty, B. and Misra, S. (2016). Statistics for Behavioural and Social Sciences. Delhi: Sage.
7. Minium, E. W., King, B. M. and Bear, G. (2001). Statistical Reasoning in Psychology and Education. Singapore: John Wiley.
8. Pinto, D. et al. (2020). “Data cleaning: A survey of approaches and techniques.” Journal of Data Science, 18(3), pp. 491-511.
9. Faroukhi, A. Z., El Alaoui, I., Gahi, Y. and Amine, A. (2020). Journal of Big Data. Springer.