Statistics Research
Introduction
Data are a set of facts and provide a partial picture of reality. Whether data are being collected for a specific purpose or existing data are being utilized, we must keep in mind what information the data convey, how the data can be used, and what must be done to include more useful information. Data refers to raw, unprocessed facts or
observations. It represents the most basic level of information, consisting of individual elements
that have not yet been organized or interpreted. Data can be quantitative or qualitative and
come from various sources, such as sensors, surveys, transactions, and documents. (Foster
Provost, 2013).
Information is organized and processed data that has been given context and meaning. It
represents a higher level of understanding than data, as it provides insights into the underlying
patterns and relationships within the data. Information can be derived from data through
various methods, such as analysis, interpretation, and synthesis. (Frances Bee, 2005)
https://books.google.com.vn/books?
hl=vi&lr=&id=xRcuDwAAQBAJ&oi=fnd&pg=PR7&dq=what+is+information+in+statistics&ots=rSE
fch0irD&sig=x-FnJSig_pV1E1nvE0H4V22qd50&redir_esc=y#v=onepage&q=what%20is
%20information%20in%20statistics&f=false
Knowledge is the accumulation of information and experience that has been applied to solve
problems or make decisions. It represents the deepest level of understanding, as it combines
information with expertise and judgment. Knowledge is often tacit or subjective, and it can be
difficult to codify or share.
Data, the raw material of understanding, forms the foundation of our quest for knowledge.
However, data by itself is like a pile of puzzle pieces: each piece holds a unique fragment, but no single piece shows the whole picture. Turning these raw materials into meaningful insights begins with data collection, which ensures we have a comprehensive data set for analysis. This involves identifying relevant sources, using appropriate methods, and ensuring data quality.
Once collected, data must be meticulously cleaned to remove errors, inconsistencies, and
outliers. This ensures data accuracy and reliability, providing a solid foundation for further
analysis. Next, data organization is important to facilitate effective analysis. Data is structured
into a clearly defined format, often involving the creation of a database or data warehouse,
making the data easily accessible and manageable.
This organized data then undergoes data analysis, using statistical methods, machine learning,
or other analytical techniques to discover patterns, trends, and relationships. This process
extracts meaningful insights from structured data. The transformation culminates in information
interpretation, where the results of data analysis are given context and meaning. This involves
understanding the underlying factors, drawing conclusions from identified patterns, and
considering the limitations of the data. Through this process, we transform raw data into
structured information, revealing the story it holds. At its heart, the transition from data to
information is a journey of discovery and understanding. It's about drawing meaning from
chaos, turning raw observations into valuable insights. This process allows us to make informed
decisions, solve complex problems, and better understand the world around us.
Information, the structured and organized representation of data, provides context and
meaning, revealing the patterns and relationships embedded within the raw material. However, knowledge holds the key to unlocking the true potential of this transformed data.
Knowledge, the culmination of information, expertise, experience, and judgment, empowers us
to make informed decisions, solve complex problems, and drive innovation.
The transition from information to knowledge is a multifaceted process that involves a deep
synthesis of understanding. It begins with interpretation, where we delve into the information,
seeking to comprehend the underlying factors, drawing conclusions from the identified patterns, and
acknowledging the limitations of the data.
Next, experience plays a crucial role. As we engage with the information and apply it to real-world scenarios, we gain valuable insights that enrich our understanding. Through trial and error,
success and failure, we refine our knowledge, making it more robust and adaptable.
Expertise, the specialized knowledge and skills acquired through years of study, practice, and
mentorship, further elevates the transformation. Experts bring a depth of understanding to the
information, enabling them to identify nuances, draw connections, and make inferences that
might elude others.
Finally, judgment, the ability to make sound decisions based on accumulated knowledge and
experience, completes the metamorphosis. We evaluate the information, considering its
relevance, reliability, and potential biases, before applying our knowledge to make informed
choices.
The journey from information to knowledge is not a linear progression; rather, it is a continuous
cycle of refinement and understanding. As we encounter new information, we revisit our
existing knowledge, challenging and expanding its boundaries. This dynamic process ensures
that our knowledge remains relevant, adaptable, and actionable. In essence, knowledge is the
lifeblood of progress. It empowers us to navigate the complexities of the world, make informed
decisions, solve problems, and drive innovation. The transformation from information to
knowledge is a testament to our innate human capacity to learn, adapt, and grow.
Example
Data: A retail company meticulously collects sales data from its point-of-sale systems. This data
encompasses individual transactions, capturing details such as products purchased, quantities,
customer demographics, and transaction dates.
Information: By analyzing this vast trove of data, the company identifies patterns and trends in
customer purchasing behavior. They discover that certain products are more popular during
specific seasons, while others are frequently bought together. They also gain insights into
customer preferences based on demographics and location.
Knowledge: Armed with this knowledge, the company transforms customer behavior insights
into actionable strategies. They develop targeted marketing campaigns that resonate with
specific customer segments, optimize product placement and inventory management, and tailor
customer service interactions to enhance satisfaction. This knowledge-driven approach leads to
increased customer engagement, improved sales performance, and a stronger competitive
edge.
Data: A healthcare provider diligently collects patient data from electronic health records. This
data includes patient demographics, medical history, diagnoses, treatment plans, medication
records, and outcomes.
Information: Through rigorous analysis of patient data, the healthcare provider identifies risk
factors for various diseases, such as high blood pressure, a family history of certain conditions,
or lifestyle factors that contribute to health risks.
Knowledge: By applying this knowledge, the healthcare provider can develop personalized
preventive care programs that target individual patient needs. These programs may include
lifestyle modifications, medication management, regular screenings, and educational resources.
This proactive approach to healthcare reduces the risk of adverse patient outcomes, improves
overall patient health, and lowers healthcare costs.
Data: A government agency collects economic data, such as employment figures, economic growth indicators, and other key statistics.
Information: By analyzing this economic data, the government agency identifies trends in
employment, economic growth, and potential areas of concern. They can assess the
effectiveness of current policies and identify emerging economic challenges.
Knowledge: Equipped with this knowledge, the government can formulate informed economic
policies that promote job creation, stimulate economic growth, and address potential
challenges. These policies may involve targeted tax incentives, infrastructure investments,
educational initiatives, or social welfare programs. By applying data-driven knowledge to
economic policymaking, the government can foster a stable and prosperous economy for its
citizens.
There are many ways to classify data analysis methods; however, from the perspective of this subject, analysis methods are divided into three types: descriptive data analysis, exploratory data analysis, and confirmatory data analysis.
Descriptive Analytics is the act of collecting and arranging past data of a business in summary
form, thereby helping businesses better understand past performance to make the right
strategies in the future. Descriptive statistics are like summaries of a dataset, providing a quick
overview of its characteristics and patterns. They are categorized into measures of central
tendency (mean, median, mode) and measures of variability (standard deviation, range, etc.).
These measures help us understand where most of the data lies (center) and how spread out it
is (variability). Descriptive statistics are essential tools for transforming large datasets into
meaningful insights. For example, a cosmetics company can see how many products of each brand were sold last month and use that information to plan its restocking. (Joel Grus, 2019)
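To illustrate these measures, the following is a minimal Python sketch, using only the standard statistics module and a small, made-up list of monthly unit sales (the numbers are purely illustrative and not from the report), that computes the measures of central tendency and variability mentioned above:

import statistics

# Hypothetical monthly unit sales for one cosmetics brand (illustrative data only)
sales = [120, 135, 150, 150, 160, 175, 140, 130, 155, 165, 150, 145]

# Measures of central tendency
mean_sales = statistics.mean(sales)       # arithmetic average
median_sales = statistics.median(sales)   # middle value of the sorted data
mode_sales = statistics.mode(sales)       # most frequent value

# Measures of variability (spread)
stdev_sales = statistics.stdev(sales)     # sample standard deviation
range_sales = max(sales) - min(sales)     # range = max minus min

print(mean_sales, median_sales, mode_sales, stdev_sales, range_sales)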
In 2009, Cheryl Bagley Thompson published a report titled "Descriptive Data Analysis" showing
that descriptive data analysis is an important step in any research project, as it helps researchers
understand and summarize their data before proceeding with more complex analyses. By
planning data analysis, researchers can avoid common pitfalls such as wasting time, missing
important data, producing misleading results, and making statistical errors. Descriptive data
analysis encompasses a variety of techniques, including frequency distributions, measures of
central tendency (mean, median, mode), and variability (range, standard deviation) to help
researchers comprehensively understand their data and identify patterns and trends.
Descriptive data analysis is essential for researchers, providing a foundation for understanding
and interpreting their data. It allows researchers to condense large amounts of information into
manageable summaries, identify patterns and trends, and evaluate the overall characteristics of
their sample. By planning data analysis, researchers can ensure that they collect the necessary
information, avoid errors, and conduct objective analysis. Descriptive data analysis is a valuable skill for anyone involved in data-driven decision-making. It empowers individuals to
extract meaningful insights from complex data sets, allowing them to make informed choices
and solve problems effectively. Whether in research, business, or everyday life, analyzing data
descriptively is a powerful asset.
https://www.sciencedirect.com/science/article/abs/pii/S1067991X08002976
Based on my personal opinion, I see descriptive data analysis (DDA) as an indispensable tool in the field of data analysis, creating a solid foundation for deeper exploration and empowering wise decision-making. It serves as a guide, transforming dense data into manageable summaries, allowing researchers to identify patterns, trends, and overall characteristics of their data. By meticulously planning data analysis in advance, researchers can avoid wasting time
on irrelevant information, missing important data points, or producing misleading results. This
proactive approach ensures that valuable resources are used efficiently, leading to more
accurate and reliable conclusions.
https://www.sciencedirect.com/science/article/abs/pii/S1036731408001732
In my personal opinion, I find that descriptive data analysis (DDA) proves its value in summarizing and describing data, but it lacks the ability to explain the "why" behind the patterns, making it susceptible to outliers and to challenges when comparing data sets. Additionally, DDA
provides limited insights into complex, variable relationships and cannot establish cause-and-
effect connections. To overcome these limitations, researchers often combine descriptive
analytics with inferential and predictive statistical methods to gain a more comprehensive
understanding of the data.
Descriptive data analysis (DDA) is a fundamental technique in data analysis that involves
summarizing and describing the characteristics of a dataset. It provides a basic understanding of
the data by identifying central trends, variability, and patterns. DDA is widely used in various
research fields, including:
Education research: In 2008, Meilun Shih, Jui Feng, and Chin-Chung Tsai used the Descriptive
data analysis method to research and analyze the content of articles about cognition in e-
learning. The author used this method to describe the scope of research, determine the total
number of articles published within the time frame and the number of articles related to the
research topic, identify research trends, analyze the distribution of research across different
categories such as year of publication, preferred journal, and topics explored (e.g., "Teaching Approaches" was the most popular), and to identify trends in data collection methods, showing increased use of student learning diaries and online messages in addition to traditional survey methods. Besides, the authors also used this method to assess potential impact: by analyzing citation counts, they suggest that studies on "Teaching approaches", "Processing information" and "Motivation" may have a stronger influence on future studies.
https://sci-hub.se/https://doi.org/10.1016/j.compedu.2007.10.004
Research: In 2018, Assarroudi, A., Heshmati Nabavi, F., Armat, M. R., Ebadi, A., & Vaismoradi, M. used the Descriptive Data Analysis method in the research article "Directed qualitative content analysis: description and development of foundational methods and data analysis procedures" (Journal of Nursing Research). The article focuses on Qualitative Content Analysis;
the entire passage revolves around Directed Qualitative Content Analysis (DQCA), a method
used to analyze text data based on theories or existing frameworks. On the other hand,
descriptive data analysis focuses on summarizing and describing numerical data. It would not be
suitable for analyzing interview transcripts or other qualitative documents. The passage
highlights a gap in the literature regarding the specific steps involved in DQCA that the authors
aim to address. They propose a detailed 16-step method for DQCA, demonstrating its
advantages for qualitative researchers.
https://journals.sagepub.com/doi/abs/10.1177/1744987117741667
DDA is a powerful tool that can be used to gain valuable insights from data. However, using it
responsibly and in conjunction with other analytical techniques is important. By combining DDA
with data visualization, inferential statistics, and predictive modeling, researchers can gain a
more comprehensive understanding of the data and make informed decisions. In addition, it's
crucial to consider the context and limitations of the data when using DDA. Data is never
perfect, and it's important to be aware of potential biases, errors, and missing data that could
affect the analysis. DDA should be used to communicate findings effectively. Clear and concise
presentations of data summaries, visualizations, and conclusions can help others understand
the research and its implications.
https://www.sciencedirect.com/science/article/abs/pii/B9780080970868421252
In 1986, Chatfield published a research paper on Exploratory data analysis in the European
Journal of Operational Research. The research shows that Exploratory data analysis helps
summarize and understand the data through techniques like calculating means and standard
deviations, plotting histograms and boxplots, and identifying outliers. Exploratory data analysis
aids in selecting a suitable statistical model for further analysis by revealing patterns and
relationships in the data. It helps assess if assumptions about the data distribution (normal,
exponential, etc.) are reasonable. By providing a clear understanding of the data, Exploratory
data analysis prevents the use of inappropriate statistical methods and helps focus on the most
relevant aspects for analysis. It can even make formal testing unnecessary in some cases.
Exploratory data analysis is applicable across various data types and situations, from completely
new datasets to analyzing a sequence of similar datasets. Overall, Exploratory data analysis is an
essential initial step in data analysis because it provides the foundation for all subsequent
stages. It helps ensure that chosen models and statistical methods are suitable for the data,
leading to more accurate and reliable results.
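As an illustration of the EDA techniques Chatfield mentions (summary statistics, histograms, boxplots, and outlier identification), here is a hedged Python sketch; the pandas and matplotlib libraries and the small "value" column are assumptions made for this example, not part of the original study:

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data set; in practice this would be loaded from a file
df = pd.DataFrame({"value": [2.1, 2.4, 2.2, 2.8, 3.0, 2.5, 2.6, 9.5, 2.3, 2.7]})

# Summary statistics: mean, standard deviation, quartiles, min, max
print(df["value"].describe())

# Histogram and boxplot to inspect the distribution visually
df["value"].plot(kind="hist", bins=5, title="Histogram of value")
plt.show()
df["value"].plot(kind="box", title="Boxplot of value")
plt.show()

# Flag outliers using the common 1.5 * IQR rule
q1, q3 = df["value"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["value"] < q1 - 1.5 * iqr) | (df["value"] > q3 + 1.5 * iqr)]
print(outliers)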
Based on my personal perspective, I see the versatility of this method across many different
fields, from social sciences to business and engineering. It serves as a bridge between data and
our understanding, allowing us to formulate meaningful questions, choose appropriate
statistical models, and ultimately draw reliable conclusions.
https://www.sciencedirect.com/science/article/abs/pii/0377221786902092
In 1986, Chatfield published a research paper titled "Exploratory data analysis" on exploratory
data analysis in the European Journal of Operational Research. Research shows that exploratory
data analysis (EDA), although valuable for understanding data, has some limitations. EDA can be
difficult to understand and standardize because it relies on subjective judgment rather than
strict rules. This lack of formality also increases the risk of misinterpreting the data if
appropriate statistical analysis is not performed. EDA is a preliminary step to identify trends and
patterns, not a substitute for rigorous statistical methods or well-defined models. Using an
inappropriate model can be more detrimental than not using it at all. The article also criticizes
the popular approach to EDA for ignoring its role as a precursor to statistical analysis and
undervaluing traditional statistical techniques.
Based on my personal perspective, I see exploratory data analysis (EDA) as a valuable tool in
data analysis, but it is important to recognize its limitations. While EDA excels at discovering
patterns and initial insights, its reliance on subjective judgment can introduce bias and
misinterpretation. Without formal statistical analysis to confirm or refute EDA's findings, the
conclusions may be inconclusive. EDA serves as a preliminary step, not a substitute for rigorous
statistical methods or well-defined models. Overemphasis on EDA while undervaluing
traditional statistical techniques can hinder the pursuit of accurate and reliable conclusions.
Therefore, a balanced approach that acknowledges both the strengths and limitations of EDA is
essential for effective data discovery and analysis.
https://www.sciencedirect.com/science/article/abs/pii/0377221786902092
Business and Management: In 2023, Marcos Ferasso used the Exploratory data analysis method
in the research paper "Mapping the circular economy in the SME sector: Exploratory network
analysis". The Exploratory data analysis method focuses on understanding the characteristics of
a data set in the early stages of analysis. It involves techniques such as histograms, scatterplots,
and basic statistics to understand patterns and underlying problems.
https://www.sciencedirect.com/science/article/pii/S2666784323000505
Research: In 2021, David E. Hubbard applied the Exploratory Data Analysis method to the
research article "An exploratory study of library science journal articles in syllabi". This method
is used to help visualize and summarize data distribution. It involves techniques such as
histograms or frequency tables to understand how often articles are cited, publication dates,
subject areas, and disciplines where they are used. Exploratory Data Analysis helps discover
original patterns in data. Here, Exploratory Data Analysis helped identify trends such as a higher
frequency of citations to recently published articles or the dominance of traditional library
science topics. Additionally, Exploratory Data Analysis helps develop exploratory research questions that can guide the researcher toward more specific questions. Here, Exploratory Data Analysis revealed the need to investigate why newer topics are less cited and to explore how library science articles are used in other disciplines.
Based on my personal perspective, I see EDA serving as a foundational step in data analysis,
providing a deeper understanding of data characteristics, patterns, and potential relationships.
It's like getting to know a person before starting a deep conversation. EDA techniques like
visualization, summary statistics, and basic data manipulation help us uncover hidden insights,
identify outliers, and evaluate data quality. Without EDA, data analysis is like trying to navigate a
maze without a map. It is important to understand complex data sets, formulate meaningful
research questions, and promote informed decision making. EDA is an extremely useful tool for
anyone working with data, regardless of their field or expertise. It allows us to transform raw
data into meaningful knowledge, leading to better understanding, problem solving, and
innovation.
https://dataheadhunters.com/academy/exploratory-vs-confirmatory-data-analysis-approaches-
and-mindsets/
In 2022, Dustin Fife and Joseph Lee Rodgers published a report titled "Understanding Exploratory/Confirmatory Data Analysis: Moving Beyond the "Replication Crisis"". This article reveals the strengths of the Confirmatory Data Analysis (CDA) method. It is a rigorous approach that focuses on testing predetermined hypotheses using statistical methods, which reduces the possibility of making incorrect decisions and mistakes. CDA also makes the research process more transparent by requiring researchers to state their assumptions before collecting data, so the validity of the research can be evaluated and checked more easily. In addition, using appropriate statistical methods and controlling confounding factors helps increase the reliability of research results.
https://osf.io/preprints/psyarxiv/5vfq6
Based on my personal opinion, I find that Confirmatory Data Analysis (CDA) is a powerful
research method with many advantages. Outstanding strengths of CDA include rigor and
transparency, helping to control errors and enhance verifiability. CDA also supports more
informed decision-making by estimating effect sizes and is suitable for directed research. Thanks
to these advantages, CDA is a valuable tool for researchers in many different fields.
In 2011, Dr. Elissaios Karageorgiou published the report "THE LOGIC OF EXPLORATORY AND
CONFIRMATORY DATA ANALYSIS", the report pointed out the weaknesses of the Confirmatory
Data Analysis method. According to the article "Understanding Exploratory/Confirmatory Data
Analysis: Overcoming the "Replication Crisis"", one weakness of Confirmatory Data Analysis
(CDA) is selection bias. Sampling bias can distort statistical results because the sample collection
is not representative of the entire population. This can lead to over-reliance on the constructed
model, meaning that the model is only accurate for the specific data set and cannot be
generalized to other data sets. Besides, CDA can be less flexible than other data analysis
methods, such as exploratory data analysis. This can make CDA difficult to apply to studies with
open goals or unclear data. CDA requires researchers to state hypotheses before collecting data.
This can limit the ability to discover new and unexpected findings. The results of CDA can be
difficult to interpret for those without a statistical background.
http://www.cogcrit.umn.edu/docs/karageorgiou_11.shtml
Based on my personal opinion, although CDA offers a structured approach to hypothesis testing,
its limitations cannot be ignored. The possibility of bias, inflexibility, and limitations in
exploration need to be carefully considered. Depending on the research goals and clarity of the
data, alternative methods such as EDA may be more suitable for exploring the unknown and
discovering new possibilities.
Medicine: In 2013, Alvydas Mikulskis published the report "Novel data analysis methods to
overcome cut point challenges and enable comprehensive assessment of antidrug binding
activity in confirmatory assays". In this study, confirmatory data analysis was applied indirectly
through improving the accuracy of confirmatory testing, an important step in the drug
immunogenicity assessment process. Confirmatory data analysis is innovatively applied to
improve the accuracy of drug immunogenicity assessment.
https://www.sciencedirect.com/science/article/abs/pii/S0022175913000999
Nursing education: In 2020, Alexis Harerimana published the research paper "Using Exploratory
and Confirmatory Factor Analysis to understand the role of technology in nursing education". In
this report, confirmatory data analysis method is used to check the validity of this model. CFA
evaluates the fit of the five-factor model to the collected data. Researchers use fit indices such
as CFI, IFI, TLI, RMSEA, and SRMR. These indices measure how well the five-factor model
explains the relationships among the 14 questions in the questionnaire.
https://www.sciencedirect.com/science/article/abs/pii/S0260691719311414
Based on my personal opinion, I see CDA as an essential tool for researchers who want to test
and validate their theories in a quantitative way. It provides a systematic approach to
understanding the underlying structure and relationships between variables in many fields of
study. However, it is important to carefully consider assumptions, sample size requirements, and
model complexity when applying CDA to ensure the validity and interpretability of results.
2.2.4 Example
First, when analyzing the data in depth, we perform descriptive analysis for the purpose of... The results of the descriptive analysis are presented in Figure 1.
From the results of Figure 1, we see that ROA and TAT each have 368 observations, showing that the numbers of observations are equal and neither variable has missing data. This is essential information for the author to take the next data analysis steps. Furthermore, ROA recorded an average value of 0.0695049, a standard deviation of 0.0711706, a minimum value of -0.6536702, and a maximum value of 0.4597398. Besides, the results from Figure 1 also show that the average value of TAT is 1.104643, the standard deviation is 1.56614, the minimum value is -5.069544 and the maximum value is 13.83886.
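For readers who do not use STATA, a roughly equivalent summary can be sketched in Python; the file name firm_data.csv and the column names ROA and TAT below are assumptions made for illustration only:

import pandas as pd

# Hypothetical file and column names; adjust to the actual data set
df = pd.read_csv("firm_data.csv")

# Count, mean, standard deviation, min and max for ROA and TAT,
# analogous to STATA's "summarize" output
print(df[["ROA", "TAT"]].describe())

# Check for missing values (the report notes both variables have 368 observations)
print(df[["ROA", "TAT"]].isna().sum())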
To have a deeper look at these two variables, the author decided to draw histograms to review the data distribution of ROA and TAT in the data set. The results of plotting in STATA 14 are presented in Figures 2 and 3.
Figure 2: STATA's result of drawing histogram for ROA
Figure 3: STATA's result of drawing histogram for TAT
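The same kind of histograms can be sketched outside STATA; the short matplotlib example below again assumes the hypothetical firm_data.csv with ROA and TAT columns:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("firm_data.csv")  # hypothetical file name

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].hist(df["ROA"].dropna(), bins=30)  # distribution of ROA
axes[0].set_title("Histogram of ROA")
axes[1].hist(df["TAT"].dropna(), bins=30)  # distribution of TAT
axes[1].set_title("Histogram of TAT")
plt.tight_layout()
plt.show()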
From the results in Figure 2, the ROA chart shows a number of distinct characteristics. First, in
terms of column width, the classes in the histogram chart have equal widths. This helps to
compare the frequency of ROA values. Next in terms of skew, the histogram has a slight positive
skew. The majority of ROA values lie near zero, but there are some higher values (large positive
ROA) that pull the right tail of the distribution farther out. Regarding the top, the peak of the
chart is very clear, located around ROA = 0. This shows that the majority of businesses in the
data sample have ROA close to 0. The number of ROA values concentrated here is the highest, with a frequency of about 300. Overall, the distribution of the ROA data is quite
concentrated, with the majority of values located near the average value (about 0). This may
indicate that the majority of businesses have relatively uniform asset utilization efficiency.
Finally, regarding the fit of the distribution, the ROA distribution does not seem to follow a
completely normal distribution because there are some outliers and a slight right skew.
However, the majority of values cluster around the mean, suggesting that the distribution may
approximate a normal distribution but with some adjustments.
From the results in Figure 3, the LEV chart shows a number of distinct characteristics. First, regarding column width, this histogram has a horizontal axis (LEV) ranging from about -5 to 15, so the value range of LEV is quite wide. However, most of the values are concentrated in the
range from -1 to 5. Next, about skew, this chart has a positive skew. The majority of LEV values
lie near zero, but there are some higher values (large positive LEV) that pull the right tail of the
distribution further out. The peak of the chart is very clear, located around LEV = 0. This shows
that the majority of businesses in the data sample have LEV close to 0. The number of LEV values concentrated here is the highest, with a frequency of around 175. There are some outliers present on both sides of the distribution, especially on the right side with large positive values (LEV near 15). However, the number of these outliers is small. Overall, the distribution of the LEV data is quite concentrated, with the majority of values lying near the mean value (around 0). This may show that the majority of businesses have
relatively equal levels of financial leverage. Regarding the fit of the distribution, the LEV
distribution does not seem to follow a normal distribution completely because there are some
outliers and an obvious right skew. However, the majority of values cluster around the mean,
suggesting that the distribution may approximate a normal distribution but with some
adjustments.
Exploratory data analysis is often used for the purpose of… In this case, exploratory data analysis is used to examine the trend in the relationship between the two variables above (ROA and TAT). The exploratory data analysis technique used here is a scatter plot. The scatter plot will help… Figure 4 presents the STATA result of drawing a scatter plot for the two variables ROA and TAT.
Figure 4: STATA's result for drawing scatter plot
The scatter plot between the two variables ROA (Return on Assets) and LEV (Leverage) shows an inverse relationship between them. ROA is on the vertical axis and LEV is on the horizontal axis: as LEV increases, ROA tends to decrease, reflected in the red trend line sloping down to the right. The data points are concentrated mainly around LEV values from 0 to 5, with a few outliers farther away. The fitted-values line clearly illustrates the downward trend of ROA as LEV increases.
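A comparable scatter plot with a fitted line can be sketched in Python; this is only an illustration, assuming the hypothetical firm_data.csv also contains a LEV column:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("firm_data.csv").dropna(subset=["ROA", "LEV"])  # hypothetical file

plt.scatter(df["LEV"], df["ROA"], s=10, alpha=0.6)  # scatter of ROA against LEV

# Least-squares line of degree 1, analogous to STATA's fitted-values line
slope, intercept = np.polyfit(df["LEV"], df["ROA"], 1)
x = np.linspace(df["LEV"].min(), df["LEV"].max(), 100)
plt.plot(x, intercept + slope * x, color="red")

plt.xlabel("LEV")
plt.ylabel("ROA")
plt.title("Scatter plot of ROA against LEV with fitted line")
plt.show()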
The regression equation is ROA = 0.0749701 − 0.0049475 × LEV. In particular, the intercept coefficient
(Intercept) is 0.0749701, the coefficient of the LEV variable is -0.0049475. The above equation
represents the relationship between return on assets (ROA) and financial leverage (LEV). The
intercept is 0.0749701, showing that when LEV is 0, the average value of ROA is 0.0749701. This
means that if a company did not use financial leverage, its expected ROA would be 0.0749701,
or 7.49701%. The coefficient of LEV is -0.0049475, which represents the change in ROA when
LEV increases by one unit. Specifically, when LEV increases by 1 unit, ROA will decrease on
average by 0.0049475 units, or 0.49475%. This coefficient has a negative value, showing a
negative relationship between LEV and ROA: as the level of financial leverage increases, the
return on assets decreases. The accompanying statistical results indicate that this relationship is
statistically significant. The p-value of LEV is 0.037, smaller than the conventional significance
level of 0.05, showing that the effect of LEV on ROA is significant. The R² coefficient is 0.0119, indicating that about 1.19% of the variation in ROA is explained by LEV. Although the R² value is low, it still suggests a meaningful relationship between these two variables.
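The same regression can be estimated outside STATA; the sketch below uses statsmodels (assuming it is installed, together with the hypothetical firm_data.csv) and, on the author's data, should report an intercept, LEV coefficient, p-value, and R² comparable to the figures quoted above:

import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("firm_data.csv").dropna(subset=["ROA", "LEV"])  # hypothetical file

# Ordinary least squares: ROA = b0 + b1 * LEV
X = sm.add_constant(df["LEV"])     # adds the intercept term
model = sm.OLS(df["ROA"], X).fit()

print(model.params)    # intercept and slope (0.0749701 and -0.0049475 in the report)
print(model.pvalues)   # p-value of LEV (0.037 in the report)
print(model.rsquared)  # R-squared (0.0119 in the report)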
III. Analyse and evaluate raw business data using a number of statistical methods
Quantitative analysis is a mathematical approach that collects and evaluates measurable and
verifiable data in order to evaluate performance, make better decisions, and predict trends.
Unlike qualitative analysis, quantitative analysis uses numerical data to provide an explanation
of "what" happened, but not "why" those events occurred. ( Will Kenton, 2023)
https://www.investopedia.com/terms/q/quantitativeanalysis.asp#:~:text=Quantitative
%20analysis%20is%20a%20mathematical,better%20decisions%2C%20and%20predict
%20trends.
Example: In 2024, Yinlong Luo and colleagues in a research paper titled "Quantitative analysis of
microplastics in water environments based on Raman spectroscopy and convolutional neural
network" used the Quantitative analysis method to study microplastics. Quantitative analysis
plays an important role in assessing the level of microplastic pollution in the water environment.
The above article introduces a new method combining Raman spectroscopy and convolutional
neural network (CNN) to determine the concentration of Polyethylene (PE) microplastics in
actual water samples. This method uses average mapping spectrum (AMS) to improve the
uniformity in Raman spectrum analysis and filters MP solutions of different concentrations to
expand the effective detection range. Experimental results with 6 different PE sizes in 5 water
environments show high accuracy of the method, with high R² and low RMSE. Compared with
other machine learning models such as Random Forest (RF) and Support Vector Machine (SVM),
the combined Raman and CNN method appears to be more effective in determining
microplastic concentration.
https://www.sciencedirect.com/science/article/abs/pii/S0048969724020680
Descriptive statistics are brief informational coefficients that summarize a given data set, which
can be either a representation of the entire population or a sample of a population. Descriptive
statistics are broken down into measures of central tendency and measures of variability
(spread). Measures of central tendency include the mean, median, and mode, while measures
of variability include standard deviation, variance, minimum and maximum variables, kurtosis,
and skewness. (Adam Hayes, 2024)
Mean is a basic and common statistical measure used to describe the central value of a data set.
It is calculated by taking the sum of all the values in the data set and dividing it by the number
of values. (Nga Vu, 2022)
Example: Suppose we have a math data set of 10 students: {8, 9, 7, 6, 10, 5, 9, 8, 7, 6}.
Mean = (8 + 9 + 7 + 6 + 10 + 5 + 9 + 8 + 7 + 6) / 10 = 75 / 10 = 7.5
The median is an important statistical measure used to describe the middle value of a data set
when the values are arranged in ascending or descending order. In other words, the median
divides the data set into two parts with an equal number of values. (Nga Vu, 2022)
Example: Sort the data set in ascending order: {2, 3, 4, 5, 6, 7}. Since there is an even number of values, the median is the average of the two middle values: (4 + 5) / 2 = 4.5.
Example to demonstrate how variance works. Let’s say returns for stock in Company ABC are
10% in Year 1, 20% in Year 2, and −15% in Year 3. The average of these three returns is 5%. The
differences between each return and the average are 5%, 15%, and −20% for each consecutive
year. Squaring these deviations yields 0.25%, 2.25%, and 4.00%, respectively. We get a total of
6.5% if we add these squared deviations. When you divide the sum of 6.5% by one less than the number of returns in the data set, since this is a sample (3 − 1 = 2), we get a variance of 3.25%
(0.0325). Taking the square root of the variance yields a standard deviation of 18% (√0.0325 =
0.180) for the returns.
Example: Say we have the data points 5, 7, 3, and 7, which total 22. You would then divide 22 by
the number of data points, in this case, four—resulting in a mean of 5.5. This leads to the
following determinations: x̄ = 5.5 and N = 4. The variance is determined by subtracting the
mean value from each data point, resulting in -0.5, 1.5, -2.5, and 1.5. Each of those values is
then squared, resulting in 0.25, 2.25, 6.25, and 2.25. The square values are then added together,
giving a total of 11, which is then divided by the value of N minus 1, which is 3, resulting in a
variance of approximately 3.67.
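The worked examples above (the students' marks, the sorted data set, the stock returns, and the four data points) can be checked with a few lines of Python using the standard statistics module:

import statistics

# Mean of the ten students' marks (should be 7.5)
marks = [8, 9, 7, 6, 10, 5, 9, 8, 7, 6]
print(statistics.mean(marks))

# Median of the sorted data set (should be 4.5)
print(statistics.median([2, 3, 4, 5, 6, 7]))

# Sample variance and standard deviation of the stock returns
# (should be about 0.0325 and 0.180)
returns = [0.10, 0.20, -0.15]
print(statistics.variance(returns), statistics.stdev(returns))

# Sample variance of the data points 5, 7, 3, 7 (should be about 3.67)
print(statistics.variance([5, 7, 3, 7]))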
Figure 6 presents the STATA result of descriptive statistics for the LEV variable. The results in the figure show that LEV has a recorded mean of 1.587965 and a median of 1.354083. Moreover, the variance of the variable is recorded as 1.421404 and the standard deviation is 1.192226. The number of observations of the variable is also recorded at 368.
Based on the histogram chart above, we can make some important observations. First, in terms
of width, this histogram has a horizontal axis (TAT) ranging from 0 to 10, with the value ranges of
TAT divided into columns (bins) with a width of about 1 unit. The peak of the histogram is in the
first bin (TAT from 0 to 1), with the highest frequency around 250, showing that most of the data
is concentrated at very low TAT values. This graph also has a pronounced right skew, meaning
that there are some high TAT values but their number is much less than the low TAT values.
Regarding the overall assessment, the data distribution on this histogram shows that the majority of TAT values are between 0 and 3, and the number of TAT values decreases as TAT increases. In terms
of distribution fit, this histogram shows an uneven distribution of the data, focusing mainly on
low TAT values. This is not a normal distribution but a right-skewed distribution. In summary,
this histogram shows that the majority of TAT values are very small, with a few large TAT values.
This may reflect the specific characteristics of the data being analyzed, in which most values are small, with a few large exceptions.
Inferential statistics is a statistical method used to draw conclusions about a population based
on data collected from a sample of that population. It allows researchers to make inferences
about a larger population from which a random sample is drawn. Inferential statistics are used
to test hypotheses and estimate population characteristics. (C.W. Kuhar, 2010). Inferential
statistics is a powerful tool that allows researchers to draw meaningful conclusions about the
world around them. However, it is important to note that the conclusions drawn from
Inferential statistics are based only on data collected from a sample of the population. (Sue A
Hill, 2006)
Example: You randomly select a sample of 11th graders in your state and collect data on their
SAT scores and other characteristics. You can use inferential statistics to make estimates and test
hypotheses about the whole population of 11th graders in the state based on your sample data.
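As a hedged illustration of this example, the sketch below uses scipy (assumed to be installed) with a small made-up sample of scores to build a 95% confidence interval for the population mean and to test a hypothesis about it; the numbers are invented for demonstration only:

import numpy as np
from scipy import stats

# Hypothetical SAT scores from a random sample of 11th graders (illustrative only)
scores = np.array([1010, 1150, 980, 1230, 1100, 1045, 1190, 990, 1080, 1120])

n = len(scores)
mean = scores.mean()
sem = stats.sem(scores)  # standard error of the mean

# 95% confidence interval for the population mean (t distribution, n - 1 degrees of freedom)
ci_low, ci_high = stats.t.interval(0.95, df=n - 1, loc=mean, scale=sem)
print(mean, (ci_low, ci_high))

# One-sample t-test: is the population mean different from a hypothesized value of 1050?
t_stat, p_value = stats.ttest_1samp(scores, popmean=1050)
print(t_stat, p_value)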