Statistics Research
Introduction
Data are a set of facts and provide a partial picture of reality. Whether data are being collected for a specific purpose or existing data are being utilized, we must keep in mind what information the data convey, how the data can be used, and what must be done to include more useful information. Data refers to raw, unprocessed facts or
observations. It represents the most basic level of information, consisting of individual elements
that have not yet been organized or interpreted. Data can be quantitative or qualitative and
come from various sources, such as sensors, surveys, transactions, and documents. (Foster
Provost, 2013).
Information is organized and processed data that has been given context and meaning. It
represents a higher level of understanding than data, as it provides insights into the underlying
patterns and relationships within the data. Information can be derived from data through
various methods, such as analysis, interpretation, and synthesis. (Frances Bee, 2005)
https://books.google.com.vn/books?
hl=vi&lr=&id=xRcuDwAAQBAJ&oi=fnd&pg=PR7&dq=what+is+information+in+statistics&ots=rSE
fch0irD&sig=x-FnJSig_pV1E1nvE0H4V22qd50&redir_esc=y#v=onepage&q=what%20is
%20information%20in%20statistics&f=false
Knowledge is the accumulation of information and experience that has been applied to solve
problems or make decisions. It represents the deepest level of understanding, as it combines
information with expertise and judgment. Knowledge is often tacit or subjective, and it can be
difficult to codify or share.
Data, the raw material of understanding, forms the foundation of our quest for knowledge.
However, data by itself is like a pile of puzzle pieces: each piece holds a unique fragment, but no single piece shows the whole picture. Turning these raw materials into meaningful insights begins with data collection, which ensures we have a comprehensive data set for analysis. This involves identifying relevant sources, using appropriate methods, and ensuring data quality.
Once collected, data must be meticulously cleaned to remove errors, inconsistencies, and
outliers. This ensures data accuracy and reliability, providing a solid foundation for further
analysis. Next, data organization is important to facilitate effective analysis. Data is structured
into a clearly defined format, often involving the creation of a database or data warehouse,
making the data easily accessible and manageable.
This organized data then undergoes data analysis, using statistical methods, machine learning,
or other analytical techniques to discover patterns, trends, and relationships. This process
extracts meaningful insights from structured data. The transformation culminates in information
interpretation, where the results of data analysis are given context and meaning. This involves
understanding the underlying factors, drawing conclusions from identified patterns, and
considering the limitations of the data. Through this process, we transform raw data into
structured information, revealing the story it holds. At its heart, the transition from data to
information is a journey of discovery and understanding. It's about drawing meaning from
chaos, turning raw observations into valuable insights. This process allows us to make informed
decisions, solve complex problems, and better understand the world around us.
Information, the structured and organized representation of data, provides context and
meaning, revealing the patterns and relationships embedded within the raw material. However, knowledge holds the key to unlocking the true potential of this transformed data.
Knowledge, the culmination of information, expertise, experience, and judgment, empowers us
to make informed decisions, solve complex problems, and drive innovation.
The transition from information to knowledge is a multifaceted process that involves a deep
synthesis of understanding. It begins with interpretation, where we delve into the information,
seeking to comprehend the underlying factors, drawing conclusions from the identified patterns, and
acknowledging the limitations of the data.
Next, experience plays a crucial role. As we engage with the information and apply it to real-world scenarios, we gain valuable insights that enrich our understanding. Through trial and error,
success and failure, we refine our knowledge, making it more robust and adaptable.
Expertise, the specialized knowledge and skills acquired through years of study, practice, and
mentorship, further elevates the transformation. Experts bring a depth of understanding to the
information, enabling them to identify nuances, draw connections, and make inferences that
might elude others.
Finally, judgment, the ability to make sound decisions based on accumulated knowledge and
experience, completes the metamorphosis. We evaluate the information, considering its
relevance, reliability, and potential biases, before applying our knowledge to make informed
choices.
The journey from information to knowledge is not a linear progression; rather, it is a continuous
cycle of refinement and understanding. As we encounter new information, we revisit our
existing knowledge, challenging and expanding its boundaries. This dynamic process ensures
that our knowledge remains relevant, adaptable, and actionable. In essence, knowledge is the
lifeblood of progress. It empowers us to navigate the complexities of the world, make informed
decisions, solve problems, and drive innovation. The transformation from information to
knowledge is a testament to our innate human capacity to learn, adapt, and grow.
Example
Data: A retail company meticulously collects sales data from its point-of-sale systems. This data
encompasses individual transactions, capturing details such as products purchased, quantities,
customer demographics, and transaction dates.
Information: By analyzing this vast trove of data, the company identifies patterns and trends in
customer purchasing behavior. They discover that certain products are more popular during
specific seasons, while others are frequently bought together. They also gain insights into
customer preferences based on demographics and location.
Knowledge: Armed with this knowledge, the company transforms customer behavior insights
into actionable strategies. They develop targeted marketing campaigns that resonate with
specific customer segments, optimize product placement and inventory management, and tailor
customer service interactions to enhance satisfaction. This knowledge-driven approach leads to
increased customer engagement, improved sales performance, and a stronger competitive
edge.
Data: A healthcare provider diligently collects patient data from electronic health records. This
data includes patient demographics, medical history, diagnoses, treatment plans, medication
records, and outcomes.
Information: Through rigorous analysis of patient data, the healthcare provider identifies risk
factors for various diseases, such as high blood pressure, a family history of certain conditions,
or lifestyle factors that contribute to health risks.
Knowledge: By applying this knowledge, the healthcare provider can develop personalized
preventive care programs that target individual patient needs. These programs may include
lifestyle modifications, medication management, regular screenings, and educational resources.
This proactive approach to healthcare reduces the risk of adverse patient outcomes, improves
overall patient health, and lowers healthcare costs.
Data: A government agency collects economic data, such as employment figures, economic growth indicators, and other key statistics.
Information: By analyzing this economic data, the government agency identifies trends in
employment, economic growth, and potential areas of concern. They can assess the
effectiveness of current policies and identify emerging economic challenges.
Knowledge: Equipped with this knowledge, the government can formulate informed economic
policies that promote job creation, stimulate economic growth, and address potential
challenges. These policies may involve targeted tax incentives, infrastructure investments,
educational initiatives, or social welfare programs. By applying data-driven knowledge to
economic policymaking, the government can foster a stable and prosperous economy for its
citizens.
There are many ways to classify data analysis methods; however, from the perspective of this subject, analysis methods are divided into three types: descriptive data analysis, exploratory data analysis, and confirmatory data analysis.
Descriptive Analytics is the act of collecting and arranging past data of a business in summary
form, thereby helping businesses better understand past performance to make the right
strategies in the future. Descriptive statistics are like summaries of a dataset, providing a quick
overview of its characteristics and patterns. They are categorized into measures of central
tendency (mean, median, mode) and measures of variability (standard deviation, range, etc.).
These measures help us understand where most of the data lies (center) and how spread out it
is (variability). Descriptive statistics are essential tools for transforming large datasets into
meaningful insights. For example, a cosmetics company can see how many products of each brand were sold last month and use that information to plan its restocking. (Joel Grus, 2019)
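To illustrate these measures, the following is a minimal Python sketch, using only the standard statistics module and a small, made-up list of monthly unit sales (the numbers are purely illustrative and not from the report), that computes the measures of central tendency and variability mentioned above:

import statistics

# Hypothetical monthly unit sales for one cosmetics brand (illustrative data only)
sales = [120, 135, 150, 150, 160, 175, 140, 130, 155, 165, 150, 145]

# Measures of central tendency
mean_sales = statistics.mean(sales)       # arithmetic average
median_sales = statistics.median(sales)   # middle value of the sorted data
mode_sales = statistics.mode(sales)       # most frequent value

# Measures of variability (spread)
stdev_sales = statistics.stdev(sales)     # sample standard deviation
range_sales = max(sales) - min(sales)     # range = max minus min

print(mean_sales, median_sales, mode_sales, stdev_sales, range_sales)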
In 2009, Cheryl Bagley Thompson published a report titled "Descriptive Data Analysis" showing
that descriptive data analysis is an important step in any research project, as it helps researchers
understand and summarize their data before proceeding with more complex analyses. By
planning data analysis, researchers can avoid common pitfalls such as wasting time, missing
important data, producing misleading results, and making statistical errors. Descriptive data
analysis encompasses a variety of techniques, including frequency distributions, measures of
central tendency (mean, median, mode), and variability (range, standard deviation) to help
researchers comprehensively understand their data and identify patterns and trends.
Descriptive data analysis is essential for researchers, providing a foundation for understanding
and interpreting their data. It allows researchers to condense large amounts of information into
manageable summaries, identify patterns and trends, and evaluate the overall characteristics of
their sample. By planning data analysis, researchers can ensure that they collect the necessary
information, avoid errors, and conduct objective analysis. Descriptive data analysis is a valuable skill for anyone involved in data-driven decision-making. It empowers individuals to
extract meaningful insights from complex data sets, allowing them to make informed choices
and solve problems effectively. Whether in research, business, or everyday life, analyzing data
descriptively is a powerful asset.
https://www.sciencedirect.com/science/article/abs/pii/S1067991X08002976
Based on my personal opinion, I see descriptive data analysis (DDA) as an indispensable tool in the field of data analysis, creating a solid foundation for deeper exploration and empowering wise decision-making. It serves as a guide, transforming dense data into manageable summaries, allowing researchers to identify patterns, trends, and overall characteristics of their data. By meticulously planning data analysis in advance, researchers can avoid wasting time
on irrelevant information, missing important data points, or producing misleading results. This
proactive approach ensures that valuable resources are used efficiently, leading to more
accurate and reliable conclusions.
https://www.sciencedirect.com/science/article/abs/pii/S1036731408001732
In my personal opinion, I find that descriptive data analysis (DDA) proves its value in summarizing and describing data, but it lacks the ability to explain the "why" behind the patterns, making it susceptible to outliers and to challenges when comparing data sets. Additionally, DDA
provides limited insights into complex, variable relationships and cannot establish cause-and-
effect connections. To overcome these limitations, researchers often combine descriptive
analytics with inferential and predictive statistical methods to gain a more comprehensive
understanding of the data.
Descriptive data analysis (DDA) is a fundamental technique in data analysis that involves
summarizing and describing the characteristics of a dataset. It provides a basic understanding of
the data by identifying central trends, variability, and patterns. DDA is widely used in various
research fields, including:
Education research: In 2008, Meilun Shih, Jui Feng, and Chin-Chung Tsai used the Descriptive
data analysis method to research and analyze the content of articles about cognition in e-
learning. The author used this method to describe the scope of research, determine the total
number of articles published within the time frame and the number of articles related to the
research topic, identify research trends, analyze the distribution of research across different
categories such as year of publication, preferred journal, and topics explored (e.g., "Teaching Approaches" was the most popular), and to identify trends in data collection methods, showing increased use of student learning diaries and online messages in addition to traditional survey methods. Besides, the authors also used this method to assess potential impact: by analyzing citation counts, they suggest that studies on "Teaching approaches", "Processing information" and "Motivation" may have a stronger influence on future studies.
https://sci-hub.se/https://doi.org/10.1016/j.compedu.2007.10.004
Research: In 2018, Assarroudi, A., Heshmati Nabavi, F., Armat, M. R., Ebadi, A., & Vaismoradi, M. used the Descriptive Data Analysis method in the research article "Directed qualitative content analysis: description and development of foundational methods and data analysis procedures" (Journal of Nursing Research). The article focuses on Qualitative Content Analysis;
the entire passage revolves around Directed Qualitative Content Analysis (DQCA), a method
used to analyze text data based on theories or existing frameworks. On the other hand,
descriptive data analysis focuses on summarizing and describing numerical data. It would not be
suitable for analyzing interview transcripts or other qualitative documents. The passage
highlights a gap in the literature regarding the specific steps involved in DQCA that the authors
aim to address. They propose a detailed 16-step method for DQCA, demonstrating its
advantages for qualitative researchers.
https://journals.sagepub.com/doi/abs/10.1177/1744987117741667
DDA is a powerful tool that can be used to gain valuable insights from data. However, using it
responsibly and in conjunction with other analytical techniques is important. By combining DDA
with data visualization, inferential statistics, and predictive modeling, researchers can gain a
more comprehensive understanding of the data and make informed decisions. In addition, it's
crucial to consider the context and limitations of the data when using DDA. Data is never
perfect, and it's important to be aware of potential biases, errors, and missing data that could
affect the analysis. DDA should be used to communicate findings effectively. Clear and concise
presentations of data summaries, visualizations, and conclusions can help others understand
the research and its implications.
https://www.sciencedirect.com/science/article/abs/pii/B9780080970868421252
In 1986, Chatfield published a research paper on Exploratory data analysis in the European
Journal of Operational Research. The research shows that Exploratory data analysis helps
summarize and understand the data through techniques like calculating means and standard
deviations, plotting histograms and boxplots, and identifying outliers. Exploratory data analysis
aids in selecting a suitable statistical model for further analysis by revealing patterns and
relationships in the data. It helps assess if assumptions about the data distribution (normal,
exponential, etc.) are reasonable. By providing a clear understanding of the data, Exploratory
data analysis prevents the use of inappropriate statistical methods and helps focus on the most
relevant aspects for analysis. It can even make formal testing unnecessary in some cases.
Exploratory data analysis is applicable across various data types and situations, from completely
new datasets to analyzing a sequence of similar datasets. Overall, Exploratory data analysis is an
essential initial step in data analysis because it provides the foundation for all subsequent
stages. It helps ensure that chosen models and statistical methods are suitable for the data,
leading to more accurate and reliable results.
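As an illustration of the EDA techniques Chatfield mentions (summary statistics, histograms, boxplots, and outlier identification), here is a hedged Python sketch; the pandas and matplotlib libraries and the small "value" column are assumptions made for this example, not part of the original study:

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data set; in practice this would be loaded from a file
df = pd.DataFrame({"value": [2.1, 2.4, 2.2, 2.8, 3.0, 2.5, 2.6, 9.5, 2.3, 2.7]})

# Summary statistics: mean, standard deviation, quartiles, min, max
print(df["value"].describe())

# Histogram and boxplot to inspect the distribution visually
df["value"].plot(kind="hist", bins=5, title="Histogram of value")
plt.show()
df["value"].plot(kind="box", title="Boxplot of value")
plt.show()

# Flag outliers using the common 1.5 * IQR rule
q1, q3 = df["value"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["value"] < q1 - 1.5 * iqr) | (df["value"] > q3 + 1.5 * iqr)]
print(outliers)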
Based on my personal perspective, I see the versatility of this method across many different
fields, from social sciences to business and engineering. It serves as a bridge between data and
our understanding, allowing us to formulate meaningful questions, choose appropriate
statistical models, and ultimately draw reliable conclusions.
https://www.sciencedirect.com/science/article/abs/pii/0377221786902092
In 1986, Chatfield published a research paper titled "Exploratory data analysis" on exploratory
data analysis in the European Journal of Operational Research. Research shows that exploratory
data analysis (EDA), although valuable for understanding data, has some limitations. EDA can be
difficult to understand and standardize because it relies on subjective judgment rather than
strict rules. This lack of formality also increases the risk of misinterpreting the data if
appropriate statistical analysis is not performed. EDA is a preliminary step to identify trends and
patterns, not a substitute for rigorous statistical methods or well-defined models. Using an
inappropriate model can be more detrimental than not using it at all. The article also criticizes
the popular approach to EDA for ignoring its role as a precursor to statistical analysis and
undervaluing traditional statistical techniques.
Based on my personal perspective, I see exploratory data analysis (EDA) as a valuable tool in
data analysis, but it is important to recognize its limitations. While EDA excels at discovering
patterns and initial insights, its reliance on subjective judgment can introduce bias and
misinterpretation. Without formal statistical analysis to confirm or refute EDA's findings, the
conclusions may be inconclusive. EDA serves as a preliminary step, not a substitute for rigorous
statistical methods or well-defined models. Overemphasis on EDA while undervaluing
traditional statistical techniques can hinder the pursuit of accurate and reliable conclusions.
Therefore, a balanced approach that acknowledges both the strengths and limitations of EDA is
essential for effective data discovery and analysis.
https://www.sciencedirect.com/science/article/abs/pii/0377221786902092
Business and Management: In 2023, Marcos Ferasso used the Exploratory data analysis method
in the research paper "Mapping the circular economy in the SME sector: Exploratory network
analysis". The Exploratory data analysis method focuses on understanding the characteristics of
a data set in the early stages of analysis. It involves techniques such as histograms, scatterplots,
and basic statistics to understand patterns and underlying problems.
https://www.sciencedirect.com/science/article/pii/S2666784323000505
Research: In 2021, David E. Hubbard applied the Exploratory Data Analysis method to the
research article "An exploratory study of library science journal articles in syllabi". This method
is used to help visualize and summarize data distribution. It involves techniques such as
histograms or frequency tables to understand how often articles are cited, publication dates,
subject areas, and disciplines where they are used. Exploratory Data Analysis helps discover
original patterns in data. Here, Exploratory Data Analysis helped identify trends such as a higher
frequency of citations to recently published articles or the dominance of traditional library
science topics. Additionally, Exploratory Data Analysis helps develop exploratory research questions that can guide the researcher toward more specific questions. Here, Exploratory Data Analysis revealed the need to investigate why newer topics are less cited and to explore how library science articles are used in other disciplines.
Based on my personal perspective, I see EDA serving as a foundational step in data analysis,
providing a deeper understanding of data characteristics, patterns, and potential relationships.
It's like getting to know a person before starting a deep conversation. EDA techniques like
visualization, summary statistics, and basic data manipulation help us uncover hidden insights,
identify outliers, and evaluate data quality. Without EDA, data analysis is like trying to navigate a
maze without a map. It is important to understand complex data sets, formulate meaningful
research questions, and promote informed decision making. EDA is an extremely useful tool for
anyone working with data, regardless of their field or expertise. It allows us to transform raw
data into meaningful knowledge, leading to better understanding, problem solving, and
innovation.
https://dataheadhunters.com/academy/exploratory-vs-confirmatory-data-analysis-approaches-
and-mindsets/
In 2022, Dustin Fife and Joseph Lee Rodgers published a report titled "Understanding Exploratory/Confirmatory Data Analysis: Moving Beyond the "Replication Crisis"". This article reveals the strengths of the Confirmatory Data Analysis (CDA) method. It is a rigorous approach that focuses on testing predetermined hypotheses using statistical methods, which reduces the possibility of making incorrect decisions and mistakes. CDA also makes the research process more transparent by requiring researchers to state their assumptions before collecting data, so the validity of the research can be evaluated and checked more easily. In addition, using appropriate statistical methods and controlling confounding factors helps increase the reliability of research results.
https://osf.io/preprints/psyarxiv/5vfq6
Based on my personal opinion, I find that Confirmatory Data Analysis (CDA) is a powerful
research method with many advantages. Outstanding strengths of CDA include rigor and
transparency, helping to control errors and enhance verifiability. CDA also supports more
informed decision-making by estimating effect sizes and is suitable for directed research. Thanks
to these advantages, CDA is a valuable tool for researchers in many different fields.
In 2011, Dr. Elissaios Karageorgiou published the report "THE LOGIC OF EXPLORATORY AND
CONFIRMATORY DATA ANALYSIS", the report pointed out the weaknesses of the Confirmatory
Data Analysis method. According to the article "Understanding Exploratory/Confirmatory Data
Analysis: Overcoming the "Replication Crisis"", one weakness of Confirmatory Data Analysis
(CDA) is selection bias. Sampling bias can distort statistical results because the sample collection
is not representative of the entire population. This can lead to over-reliance on the constructed
model, meaning that the model is only accurate for the specific data set and cannot be
generalized to other data sets. Besides, CDA can be less flexible than other data analysis
methods, such as exploratory data analysis. This can make CDA difficult to apply to studies with
open goals or unclear data. CDA requires researchers to state hypotheses before collecting data.
This can limit the ability to discover new and unexpected findings. The results of CDA can be
difficult to interpret for those without a statistical background.
http://www.cogcrit.umn.edu/docs/karageorgiou_11.shtml
Based on my personal opinion, although CDA offers a structured approach to hypothesis testing,
its limitations cannot be ignored. The possibility of bias, inflexibility, and limitations in
exploration need to be carefully considered. Depending on the research goals and clarity of the
data, alternative methods such as EDA may be more suitable for exploring the unknown and
discovering new possibilities.
Medicine: In 2013, Alvydas Mikulskis published the report "Novel data analysis methods to
overcome cut point challenges and enable comprehensive assessment of antidrug binding
activity in confirmatory assays". In this study, confirmatory data analysis was applied indirectly
through improving the accuracy of confirmatory testing, an important step in the drug
immunogenicity assessment process. Confirmatory data analysis is innovatively applied to
improve the accuracy of drug immunogenicity assessment.
https://www.sciencedirect.com/science/article/abs/pii/S0022175913000999
Nursing education: In 2020, Alexis Harerimana published the research paper "Using Exploratory
and Confirmatory Factor Analysis to understand the role of technology in nursing education". In
this report, confirmatory data analysis method is used to check the validity of this model. CFA
evaluates the fit of the five-factor model to the collected data. Researchers use fit indices such
as CFI, IFI, TLI, RMSEA, and SRMR. These indices measure how well the five-factor model
explains the relationships among the 14 questions in the questionnaire.
https://www.sciencedirect.com/science/article/abs/pii/S0260691719311414
Based on my personal opinion, I see CDA as an essential tool for researchers who want to test
and validate their theories in a quantitative way. It provides a systematic approach to
understanding the underlying structure and relationships between variables in many fields of
study. However, it is important to carefully consider assumptions, sample size requirements, and
model complexity when applying CDA to ensure the validity and interpretability of results.
2.2.4 Example
First, when analyzing the data in depth, we perform descriptive analysis for the purpose of... The results of the descriptive analysis are presented in Figure 1.
From the results of Figure 1, we see that ROA and TAT each have 368 observations, showing that the numbers of observations are equal and neither variable has missing data. This is essential information for the author to take the next data analysis steps. Furthermore, ROA recorded an average value of 0.0695049, a standard deviation of 0.0711706, a minimum value of -0.6536702, and a maximum value of 0.4597398. Besides, the results from Figure 1 also show that the average value of TAT is 1.104643, the standard deviation is 1.56614, the minimum value is -5.069544 and the maximum value is 13.83886.
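For readers who do not use STATA, a roughly equivalent summary can be sketched in Python; the file name firm_data.csv and the column names ROA and TAT below are assumptions made for illustration only:

import pandas as pd

# Hypothetical file and column names; adjust to the actual data set
df = pd.read_csv("firm_data.csv")

# Count, mean, standard deviation, min and max for ROA and TAT,
# analogous to STATA's "summarize" output
print(df[["ROA", "TAT"]].describe())

# Check for missing values (the report notes both variables have 368 observations)
print(df[["ROA", "TAT"]].isna().sum())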
To have a deeper look at these two variables, the author decided to draw histograms to review the data distribution of ROA and TAT in the data set. The results of plotting in STATA 14 are presented in Figures 2 and 3.
Figure 2: STATA's result of drawing histogram for ROA
Figure 3: STATA's result of drawing histogram for TAT
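The same kind of histograms can be sketched outside STATA; the short matplotlib example below again assumes the hypothetical firm_data.csv with ROA and TAT columns:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("firm_data.csv")  # hypothetical file name

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].hist(df["ROA"].dropna(), bins=30)  # distribution of ROA
axes[0].set_title("Histogram of ROA")
axes[1].hist(df["TAT"].dropna(), bins=30)  # distribution of TAT
axes[1].set_title("Histogram of TAT")
plt.tight_layout()
plt.show()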
From the results in Figure 2, the ROA chart shows a number of distinct characteristics. First, in
terms of column width, the classes in the histogram chart have equal widths. This helps to
compare the frequency of ROA values. Next in terms of skew, the histogram has a slight positive
skew. The majority of ROA values lie near zero, but there are some higher values (large positive
ROA) that pull the right tail of the distribution farther out. Regarding the top, the peak of the
chart is very clear, located around ROA = 0. This shows that the majority of businesses in the
data sample have ROA close to 0. The number of ROA values concentrated here is the highest, with a frequency of about 300. Overall, the distribution of the ROA data is quite
concentrated, with the majority of values located near the average value (about 0). This may
indicate that the majority of businesses have relatively uniform asset utilization efficiency.
Finally, regarding the fit of the distribution, the ROA distribution does not seem to follow a
completely normal distribution because there are some outliers and a slight right skew.
However, the majority of values cluster around the mean, suggesting that the distribution may
approximate a normal distribution but with some adjustments.
From the results in Figure 3, the LEV chart shows a number of distinct characteristics. First, regarding column width, this histogram has a horizontal axis (LEV) ranging from about -5 to 15, so the value range of LEV is quite wide. However, most of the values are concentrated in the
range from -1 to 5. Next, about skew, this chart has a positive skew. The majority of LEV values
lie near zero, but there are some higher values (large positive LEV) that pull the right tail of the
distribution further out. The peak of the chart is very clear, located around LEV = 0. This shows
that the majority of businesses in the data sample have LEV close to 0. The number of LEV values concentrated here is the highest, with a frequency of around 175. There are some outliers present on both sides of the distribution, especially on the right side with large positive values (LEV near 15). However, the number of these outliers is small. Overall, the distribution of the LEV data is quite concentrated, with the majority of values lying near the mean value (around 0). This may show that the majority of businesses have
relatively equal levels of financial leverage. Regarding the fit of the distribution, the LEV
distribution does not seem to follow a normal distribution completely because there are some
outliers and an obvious right skew. However, the majority of values cluster around the mean,
suggesting that the distribution may approximate a normal distribution but with some
adjustments.
Exploratory data analysis is often used for the purpose of… In this case, exploratory data analysis is used to examine the trend in the relationship between the two variables above (ROA and TAT). The exploratory data analysis technique used here is a scatter plot. The scatter plot will help… Figure 4 presents the STATA result of drawing a scatter plot for the two variables ROA and TAT.
Figure 4: STATA's result for drawing scatter plot
The scatter plot between the two variables ROA (Return on Assets) and LEV (Leverage) shows an inverse relationship between them. ROA is on the vertical axis and LEV is on the horizontal axis: as LEV increases, ROA tends to decrease, reflected in the red trend line sloping down to the right. The data points are concentrated mainly around LEV values from 0 to 5, with a few outliers farther away. The fitted-values line clearly illustrates the downward trend of ROA as LEV increases.
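A comparable scatter plot with a fitted line can be sketched in Python; this is only an illustration, assuming the hypothetical firm_data.csv also contains a LEV column:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("firm_data.csv").dropna(subset=["ROA", "LEV"])  # hypothetical file

plt.scatter(df["LEV"], df["ROA"], s=10, alpha=0.6)  # scatter of ROA against LEV

# Least-squares line of degree 1, analogous to STATA's fitted-values line
slope, intercept = np.polyfit(df["LEV"], df["ROA"], 1)
x = np.linspace(df["LEV"].min(), df["LEV"].max(), 100)
plt.plot(x, intercept + slope * x, color="red")

plt.xlabel("LEV")
plt.ylabel("ROA")
plt.title("Scatter plot of ROA against LEV with fitted line")
plt.show()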
The regression equation is ROA = 0.0749701 − 0.0049475 × LEV. In particular, the intercept coefficient
(Intercept) is 0.0749701, the coefficient of the LEV variable is -0.0049475. The above equation
represents the relationship between return on assets (ROA) and financial leverage (LEV). The
intercept is 0.0749701, showing that when LEV is 0, the average value of ROA is 0.0749701. This
means that if a company did not use financial leverage, its expected ROA would be 0.0749701,
or 7.49701%. The coefficient of LEV is -0.0049475, which represents the change in ROA when
LEV increases by one unit. Specifically, when LEV increases by 1 unit, ROA will decrease on
average by 0.0049475 units, or 0.49475%. This coefficient has a negative value, showing a
negative relationship between LEV and ROA: as the level of financial leverage increases, the
return on assets decreases. The accompanying statistical results indicate that this relationship is
statistically significant. The p-value of LEV is 0.037, smaller than the conventional significance
level of 0.05, showing that the effect of LEV on ROA is significant. The R² coefficient is 0.0119, indicating that about 1.19% of the variation in ROA is explained by LEV. Although the R² value is low, it still suggests a meaningful relationship between these two variables.
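The same regression can be estimated outside STATA; the sketch below uses statsmodels (assuming it is installed, together with the hypothetical firm_data.csv) and, on the author's data, should report an intercept, LEV coefficient, p-value, and R² comparable to the figures quoted above:

import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("firm_data.csv").dropna(subset=["ROA", "LEV"])  # hypothetical file

# Ordinary least squares: ROA = b0 + b1 * LEV
X = sm.add_constant(df["LEV"])     # adds the intercept term
model = sm.OLS(df["ROA"], X).fit()

print(model.params)    # intercept and slope (0.0749701 and -0.0049475 in the report)
print(model.pvalues)   # p-value of LEV (0.037 in the report)
print(model.rsquared)  # R-squared (0.0119 in the report)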
III. Analyse and evaluate raw business data using a number of statistical methods
Quantitative analysis is a mathematical approach that collects and evaluates measurable and
verifiable data in order to evaluate performance, make better decisions, and predict trends.
Unlike qualitative analysis, quantitative analysis uses numerical data to provide an explanation
of "what" happened, but not "why" those events occurred. ( Will Kenton, 2023)
https://www.investopedia.com/terms/q/quantitativeanalysis.asp#:~:text=Quantitative
%20analysis%20is%20a%20mathematical,better%20decisions%2C%20and%20predict
%20trends.
Example: In 2024, Yinlong Luo and colleagues in a research paper titled "Quantitative analysis of
microplastics in water environments based on Raman spectroscopy and convolutional neural
network" used the Quantitative analysis method to study microplastics. Quantitative analysis
plays an important role in assessing the level of microplastic pollution in the water environment.
The above article introduces a new method combining Raman spectroscopy and convolutional
neural network (CNN) to determine the concentration of Polyethylene (PE) microplastics in
actual water samples. This method uses average mapping spectrum (AMS) to improve the
uniformity in Raman spectrum analysis and filters MP solutions of different concentrations to
expand the effective detection range. Experimental results with 6 different PE sizes in 5 water
environments show high accuracy of the method, with high R² and low RMSE. Compared with
other machine learning models such as Random Forest (RF) and Support Vector Machine (SVM),
the combined Raman and CNN method appears to be more effective in determining
microplastic concentration.
https://www.sciencedirect.com/science/article/abs/pii/S0048969724020680
Descriptive statistics are brief informational coefficients that summarize a given data set, which
can be either a representation of the entire population or a sample of a population. Descriptive
statistics are broken down into measures of central tendency and measures of variability
(spread). Measures of central tendency include the mean, median, and mode, while measures
of variability include standard deviation, variance, minimum and maximum variables, kurtosis,
and skewness. (Adam Hayes, 2024)
Mean is a basic and common statistical measure used to describe the central value of a data set.
It is calculated by taking the sum of all the values in the data set and dividing it by the number
of values. (Nga Vu, 2022)
Example: Suppose we have a math data set of 10 students: {8, 9, 7, 6, 10, 5, 9, 8, 7, 6}.
Mean = (8 + 9 + 7 + 6 + 10 + 5 + 9 + 8 + 7 + 6) / 10 = 75 / 10 = 7.5
The median is an important statistical measure used to describe the middle value of a data set
when the values are arranged in ascending or descending order. In other words, the median
divides the data set into two parts with an equal number of values. (Nga Vu, 2022)
Example: Sort the data set in ascending order: {2, 3, 4, 5, 6, 7}. Since there is an even number of values, the median is the average of the two middle values: (4 + 5) / 2 = 4.5.
Example to demonstrate how variance works. Let’s say returns for stock in Company ABC are
10% in Year 1, 20% in Year 2, and −15% in Year 3. The average of these three returns is 5%. The
differences between each return and the average are 5%, 15%, and −20% for each consecutive
year. Squaring these deviations yields 0.25%, 2.25%, and 4.00%, respectively. We get a total of
6.5% if we add these squared deviations. When you divide the sum of 6.5% by one less than the number of returns in the data set, since this is a sample (3 − 1 = 2), we get a variance of 3.25%
(0.0325). Taking the square root of the variance yields a standard deviation of 18% (√0.0325 =
0.180) for the returns.
Example: Say we have the data points 5, 7, 3, and 7, which total 22. You would then divide 22 by
the number of data points, in this case, four—resulting in a mean of 5.5. This leads to the
following determinations: x̄ = 5.5 and N = 4. The variance is determined by subtracting the
mean value from each data point, resulting in -0.5, 1.5, -2.5, and 1.5. Each of those values is
then squared, resulting in 0.25, 2.25, 6.25, and 2.25. The square values are then added together,
giving a total of 11, which is then divided by the value of N minus 1, which is 3, resulting in a
variance of approximately 3.67.
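The worked examples above (the students' marks, the sorted data set, the stock returns, and the four data points) can be checked with a few lines of Python using the standard statistics module:

import statistics

# Mean of the ten students' marks (should be 7.5)
marks = [8, 9, 7, 6, 10, 5, 9, 8, 7, 6]
print(statistics.mean(marks))

# Median of the sorted data set (should be 4.5)
print(statistics.median([2, 3, 4, 5, 6, 7]))

# Sample variance and standard deviation of the stock returns
# (should be about 0.0325 and 0.180)
returns = [0.10, 0.20, -0.15]
print(statistics.variance(returns), statistics.stdev(returns))

# Sample variance of the data points 5, 7, 3, 7 (should be about 3.67)
print(statistics.variance([5, 7, 3, 7]))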
Figure 6 presents the STATA result of descriptive statistics for the LEV variable. The results in the figure show that LEV has a recorded mean of 1.587965 and a median of 1.354083. Moreover, the variance of the variable is recorded as 1.421404 and the standard deviation is 1.192226. The number of observations of the variable is also recorded at 368.
Based on the histogram chart above, we can make some important observations. First, in terms
of width, this histogram has a horizontal axis (TAT) ranging from 0 to 10, with the value ranges of
TAT divided into columns (bins) with a width of about 1 unit. The peak of the histogram is in the
first bin (TAT from 0 to 1), with the highest frequency around 250, showing that most of the data
is concentrated at very low TAT values. This graph also has a pronounced right skew, meaning
that there are some high TAT values but their number is much less than the low TAT values.
Regarding the overall assessment, the data distribution on this histogram shows that the majority of TAT values are between 0 and 3, and the number of TAT values decreases as TAT increases. In terms
of distribution fit, this histogram shows an uneven distribution of the data, focusing mainly on
low TAT values. This is not a normal distribution but a right-skewed distribution. In summary,
this histogram shows that the majority of TAT values are very small, with a few large TAT values.
This may reflect the specific characteristics of the data being analyzed, in which most values are small, with a few large exceptions.
Inferential statistics is a statistical method used to draw conclusions about a population based
on data collected from a sample of that population. It allows researchers to make inferences
about a larger population from which a random sample is drawn. Inferential statistics are used
to test hypotheses and estimate population characteristics. (C.W. Kuhar, 2010). Inferential
statistics is a powerful tool that allows researchers to draw meaningful conclusions about the
world around them. However, it is important to note that the conclusions drawn from
Inferential statistics are based only on data collected from a sample of the population. (Sue A
Hill, 2006)
Example: You randomly select a sample of 11th graders in your state and collect data on their
SAT scores and other characteristics. You can use inferential statistics to make estimates and test
hypotheses about the whole population of 11th graders in the state based on your sample data.
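As a hedged illustration of this example, the sketch below uses scipy (assumed to be installed) with a small made-up sample of scores to build a 95% confidence interval for the population mean and to test a hypothesis about it; the numbers are invented for demonstration only:

import numpy as np
from scipy import stats

# Hypothetical SAT scores from a random sample of 11th graders (illustrative only)
scores = np.array([1010, 1150, 980, 1230, 1100, 1045, 1190, 990, 1080, 1120])

n = len(scores)
mean = scores.mean()
sem = stats.sem(scores)  # standard error of the mean

# 95% confidence interval for the population mean (t distribution, n - 1 degrees of freedom)
ci_low, ci_high = stats.t.interval(0.95, df=n - 1, loc=mean, scale=sem)
print(mean, (ci_low, ci_high))

# One-sample t-test: is the population mean different from a hypothesized value of 1050?
t_stat, p_value = stats.ttest_1samp(scores, popmean=1050)
print(t_stat, p_value)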