List of Abbreviations
This statement of the research problem identifies the key issues related to
diabetes in Ghana and places them in the context of global trends. Data will be
collected from government health reports, surveys, and interviews with healthcare
professionals. These data will be used to analyze the prevalence of diabetes, the
accessibility of diabetes services, and the cost of diabetes care in Ghana. The
research will also examine global trends in diabetes in order to identify best
practices that can be applied in Ghana to improve diabetes management. Ultimately,
this research aims to clarify the challenges posed by diabetes in Ghana and how to
address them.
1.3 Objective
In this study, we will use logistic regression analysis to model our data.
Logistic regression will be used to classify individuals into two groups, those
with diabetes (1) and those without diabetes (0), and to explain the
relationship between diabetes and the independent variables: Pregnancies, Glucose,
Blood Pressure, Skin Thickness, Insulin, Body Mass Index (BMI), Diabetes Pedigree
Function, and Age. Outcome is the binary response variable.
The analysis of this study will be conducted using R Statistical Software. Data
will be gathered from the Internet, libraries, personal notes, lecture notes and
other relevant sources such as the World Health Organization (WHO). All of
these sources will provide valuable insight into the research topic, allowing us to
draw meaningful conclusions.
1.5 Justification
The success of this study will provide insight into the factors that
contribute to diabetes in Ghana and beyond. With this knowledge, we can work
towards reducing the prevalence of diabetes, not just in Ghana but around the
world. With a greater understanding of the risk factors for this condition, people
can take steps to protect their health, reducing the number who suffer
from diabetes.
Literature Review
2.1 Introduction
This chapter reviews some literature on diabetes. It focuses on some factors causing
diabetes, the effects of diabetes, and the prevention and treatment of diabetes.
2.2.1 Pregnancies: Gestational diabetes can develop during pregnancy, when the
body's ability to control blood sugar levels is temporarily impaired. Although
gestational diabetes usually goes away after childbirth, it raises the risk of type 2
diabetes later in life.
2.2.2 Glucose: Glucose is the sugar in the bloodstream that serves as the body's main
source of energy. In people with diabetes, the body either produces insufficient insulin
(type 1 diabetes) or is unable to use the insulin it produces properly (type 2 diabetes).
As a result, blood glucose levels rise, which can lead to a number of complications if
not controlled.
2.2.3 Blood Pressure (BP): Diabetes frequently coexists with high blood pressure,
also referred to as hypertension. When the two conditions occur together, each can
worsen the effects of the other. People with diabetes who have high blood pressure
are at an increased risk of complications such as heart disease, stroke, and kidney
disease.
2.2.4 Skin Thickness: There is no obvious direct connection between skin thickness and
diabetes. It may, however, be related to conditions such as acanthosis nigricans, a skin
disorder marked by dark, thickened areas of skin. Acanthosis nigricans may indicate
insulin resistance, which is a risk factor for type 2 diabetes.
2.2.5 Insulin: The pancreas releases the hormone insulin, which aids in controlling
blood sugar levels. Type 1 diabetes occurs when the body is unable to produce insulin,
but type 2 diabetes occurs when the body develops a resistance to the effects of insulin.
Insulin enables glucose to enter cells where it can be used as fuel. Blood glucose levels
stay high when insulin is inefficient or absent.
2.2.6 Body Mass Index (BMI): BMI is a measure calculated from a person's height
and weight and is used as an indicator of body fat. It indicates whether someone is
underweight, normal weight, overweight, or obese. Excess weight, especially
abdominal obesity, is a major risk factor for type 2 diabetes. Obesity can cause
insulin resistance and impair the body's use of insulin.
2.2.8 Age: The risk of type 2 diabetes increases with age. This may be the result of
factors such as decreased physical activity, changes in metabolism, and an increase
in body fat.
2.2.9 Outcome: The outcome records whether or not a person has diabetes. In this
dataset it indicates whether the person has received a diabetes diagnosis. The
factors described above may contribute to the development of diabetes or affect
its treatment.
2.3 Effects of Diabetes
One of the primary effects of diabetes is its impact on blood sugar levels. When a
person has diabetes, their blood sugar levels can become dangerously high or low,
conditions known as hyperglycemia and hypoglycemia, respectively.
Hyperglycemia can cause symptoms such as increased thirst, frequent urination, fatigue,
and blurred vision. Hypoglycemia can result in symptoms like dizziness, confusion,
sweating, and even loss of consciousness.
Diabetes also significantly affects the cardiovascular system. People with diabetes are
at a higher risk of developing cardiovascular conditions such as heart attacks, strokes,
and high blood pressure. Elevated blood sugar levels can damage blood vessels and
increase the buildup of fatty deposits, leading to atherosclerosis. This condition
restricts blood flow to vital organs and increases the likelihood of heart-related
complications.
Another area where diabetes has a profound impact is the kidneys. Over time, high blood
sugar levels can damage the blood vessels in the kidneys, impairing their ability to
function properly. This can lead to a condition called diabetic nephropathy, which is
characterized by the gradual loss of kidney function. If left untreated, diabetic
nephropathy can progress to end-stage renal disease, requiring dialysis or a kidney
transplant.
Diabetes can also have a significant impact on a person's mental health. The
stress and emotional burden of managing the condition, along with the potential
complications, can contribute to feelings of anxiety and depression. The
impact of diabetes on physical health can further exacerbate mental health issues.
Nerve damage, known as diabetic neuropathy, is another common effect of diabetes. High
blood sugar levels can damage the nerves, particularly in the feet and legs. This can
result in symptoms such as numbness, tingling, and pain. Diabetic neuropathy can also
affect other parts of the body, leading to issues like gastroparesis (delayed stomach
emptying) when the digestive system is involved, and erectile dysfunction in men.
Methodology
3.1 Introduction
This chapter describes the methods, data and analytical procedures employed
to attain the objectives of the research study. It covers the analytical
framework, data source and acquisition, sampling and sample size, binary logistic
regression, estimation techniques, and the definition and measurement of variables.
https://www.kaggle.com/datasets/mathchi/diabetes-data-set
3.3 Sample Size and Sampling Procedure
The dataset contains 768 rows and 9 columns. The column labels are listed below.
[1] "Pregnancies"
[2] "Glucose"
[3] "BloodPressure"
[4] "SkinThickness"
[5] "Insulin"
[6] "BMI"
[7] "DiabetesPedigreeFunction"
[8] "Age"
[9] "Outcome"
Eight variables are taken as predictors in the dataset. The variable Outcome is the
response, indicating whether or not a person has diabetes, with the value
0 for No and 1 for Yes. Number of Attributes: 8 plus class
The logistic regression model is based on the logit function and its inverse, the
logistic function, which maps the linear predictor onto the interval (0, 1). This allows
us to interpret the output as the probability of the event occurring.
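The mapping between the linear predictor and the probability scale can be sketched as follows. This is an illustration in Python, not the study's R code, and the function names logistic and logit are chosen here for the example.

```python
import math

def logistic(z):
    """Map a linear predictor z (any real number) to a probability in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the algebraically equivalent form to avoid overflow.
    ez = math.exp(z)
    return ez / (1.0 + ez)

def logit(p):
    """Return the log-odds of a probability p, where 0 < p < 1."""
    return math.log(p / (1.0 - p))

# The two functions are inverses: logit(logistic(z)) recovers z.
for z in (-4.0, -1.0, 0.0, 1.0, 4.0):
    p = logistic(z)
    print(f"z = {z:+.1f}  p = {p:.4f}  logit(p) = {logit(p):+.4f}")
```

A linear predictor of 0 corresponds to a probability of 0.5, and large positive or negative values approach 1 or 0 without reaching them.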
In logistic regression, the dependent variable is binary, meaning it can take only two
values, such as "yes" or "no," "success" or "failure." The independent variables can be
continuous or categorical. The goal is to estimate the coefficients of the independent
variables that maximize the likelihood of the observed data.
The logistic regression model assumes that the relationship between the independent
variables and the log-odds of the dependent variable is linear. However, this linearity
assumption can be relaxed by including higher-order terms or interaction terms in the
model.
Once the logistic regression model is fitted, it can be used to make predictions on new
data. The predicted probabilities can be converted into binary outcomes using a
specified cutoff value, such as 0.5. However, the choice of the cutoff value depends on
the specific application and the trade-off between false positives and false negatives.
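The trade-off governed by the cutoff can be illustrated with a small sketch in Python. The predicted probabilities and true labels below are hypothetical values made up for the example, not results from the study's dataset.

```python
def classify(probabilities, cutoff=0.5):
    """Convert predicted probabilities into 0/1 predictions at a given cutoff."""
    return [1 if p >= cutoff else 0 for p in probabilities]

def count_errors(y_true, y_pred):
    """Return (false positives, false negatives) for 0/1 labels."""
    fp = sum(1 for t, yhat in zip(y_true, y_pred) if t == 0 and yhat == 1)
    fn = sum(1 for t, yhat in zip(y_true, y_pred) if t == 1 and yhat == 0)
    return fp, fn

# Hypothetical predicted probabilities and true outcomes (illustration only).
probs = [0.10, 0.35, 0.45, 0.55, 0.62, 0.80, 0.32, 0.90]
labels = [0, 0, 1, 0, 1, 1, 1, 1]

for cutoff in (0.3, 0.5, 0.7):
    fp, fn = count_errors(labels, classify(probs, cutoff))
    print(f"cutoff = {cutoff:.1f}  false positives = {fp}  false negatives = {fn}")
```

Raising the cutoff reduces false positives at the cost of more false negatives, which is why a screening setting may prefer a lower cutoff than 0.5.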
There are several evaluation metrics that can be used to assess the performance of a
logistic regression model, such as accuracy, precision, recall, and F1 score. These
metrics provide insights into how well the model is able to classify the binary outcome.
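As a minimal sketch, the four metrics can be computed from the counts of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). The code is in Python for illustration, and the labels are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for 0/1 true labels and 0/1 predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def evaluation_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall and F1 score as a dictionary."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical labels and predictions (illustration only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

results = evaluation_metrics(*confusion_counts(y_true, y_pred))
print(results)
```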
In conclusion, logistic regression is a widely used statistical model for predicting binary
outcomes. It provides a flexible framework for modeling the relationship between
independent variables and the probability of the event occurring. By estimating the
coefficients using maximum likelihood estimation, the logistic regression model can
make predictions and evaluate its performance using various metrics.
Some of the instances in which binary logistic regression can be used are:
1. Modelling the probability that a patient is diabetic given some factors.
2. Modelling the factors that determine whether or not a student smokes, drinks, and
takes a particular elective course.
3. Determining the risk factors of accident severity
4. Establishing the risk factors of marital dissolution or determining the probability
that couples will get divorced.
The logistic regression is most appropriate for categorical and binary outcomes because:
1. The response variable, Yi, takes only the values 0 and 1; hence, the logistic
regression ensures that predicted values lie between 0 and 1.
2. The errors are heteroskedastic.
3. Error terms are not normally distributed.
4. The logistic regression does not need a linear relationship between the predictor and
response variables.
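$$\operatorname{logit}(p(y)) = \ln\left(\frac{p(y)}{1 - p(y)}\right) = \omega$$
$$p(y) = \frac{\exp(\omega)}{1 + \exp(\omega)}$$
$$\ln\left(\frac{p(y)}{1 - p(y)}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$$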
Where;
β0 is the model intercept,
β1, ..., βk are the coefficients of the predictor variables X1, ..., Xk, and
y is the binary outcome variable.
The logistic regression model above models the logarithm of the odds of the outcome
variable as a linear combination of the predictor variables. The model coefficients
β0, β1, ..., βk are estimated using maximum likelihood estimation.
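To make the estimation step concrete, the following is a minimal sketch of maximum likelihood fitting by Newton-Raphson for a model with a single predictor. It is written in Python for illustration. In R the equivalent fit would normally be obtained with glm() using family = binomial. The function name fit_logistic and the toy data are assumptions made for this example.

```python
import math

def fit_logistic(x, y, iterations=25, tolerance=1e-10):
    """Estimate (b0, b1) in logit(p) = b0 + b1*x by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # entries of the information matrix
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 system (information matrix) * step = gradient.
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tolerance:
            break
    return b0, b1

# Toy data (illustration only): the outcome becomes more likely as x grows.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(x, y)
print(f"intercept = {b0:.4f}, slope = {b1:.4f}")
```

At the maximum likelihood estimate the gradient is zero, so the fitted probabilities sum to the observed number of events.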
The graph of the logistic function is shown in the figure below
Let's assume we have a logistic regression model with one independent variable,
denoted as X. The logistic regression equation can be written as:
logit(p) = β0 + β1*X
Where;
logit(p) represents the log-odds of the event, p represents the probability of the event
(prevalence), and β0 and β1 are the coefficients estimated from the logistic regression
model.
To estimate the prevalence, we need to convert the log-odds back to the probability
scale. This can be done using the inverse of the logit function, known as the logistic
or sigmoid function:
p = 1 / (1 + exp(-logit(p)))
Now, let's say we have a threshold value p_threshold. We can estimate the prevalence
as the proportion of individuals in our dataset whose predicted probability (calculated
using the logistic regression equation) exceeds the threshold:
prevalence = (number of individuals with p > p_threshold) / n
where n is the number of individuals in the dataset.
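The threshold-based estimate can be sketched as follows, in Python for illustration. The coefficients and predictor values are hypothetical and are not estimates from the study's data.

```python
import math

def predicted_probability(x, b0, b1):
    """Predicted probability from logit(p) = b0 + b1*x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def estimated_prevalence(xs, b0, b1, threshold=0.5):
    """Proportion of individuals whose predicted probability exceeds the threshold."""
    flagged = sum(1 for x in xs if predicted_probability(x, b0, b1) > threshold)
    return flagged / len(xs)

# Hypothetical coefficients and predictor values (illustration only).
b0, b1 = -5.0, 0.04
values = [85, 100, 110, 120, 130, 145, 160, 180]

print(estimated_prevalence(values, b0, b1, threshold=0.5))
print(estimated_prevalence(values, b0, b1, threshold=0.3))
```

Lowering the threshold from 0.5 to 0.3 flags more individuals, which shows how strongly the estimate depends on the chosen threshold.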
In summary, the logistic regression equation and the sigmoid function allow us to
estimate the probability (prevalence) of an event based on the coefficients obtained
from the logistic regression model. By setting a threshold, we can determine the
proportion of individuals above that threshold and estimate the prevalence accordingly.
Please note that the threshold value is a subjective choice and can impact the estimated
prevalence. Additionally, this approach assumes that the logistic regression model is
appropriately specified and valid for the data being analyzed.
$$H_0: \beta_i = 0$$
$$H_1: \beta_i \neq 0, \quad i = 1, 2, 3, \dots, k$$
The p-value of this test is obtained from the standard normal table and is then
compared to the level of significance, α = 0.05.
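A minimal sketch of this test, assuming the usual Wald statistic z = estimate / standard error, is given below in Python for illustration. The coefficient estimates and standard errors are hypothetical placeholders, not results from the study.

```python
import math

def wald_test(estimate, std_error):
    """Return the Wald z statistic and its two-sided standard normal p-value."""
    z = estimate / std_error
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

alpha = 0.05
# Hypothetical (estimate, standard error) pairs for two predictors X1 and X2.
coefficients = {"X1": (0.035, 0.004), "X2": (0.001, 0.007)}

for name, (estimate, std_error) in coefficients.items():
    z, p_value = wald_test(estimate, std_error)
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    print(f"{name}: z = {z:.3f}, p = {p_value:.4f} -> {decision}")
```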
3.8.2 Confidence Intervals for Model Parameters
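One common construction, assumed here because the original formula is not shown, is the Wald interval: estimate ± z × standard error, which can be exponentiated to give an interval for the odds ratio. The sketch below is in Python for illustration, and the estimate and standard error are hypothetical.

```python
import math
from statistics import NormalDist

def wald_interval(estimate, std_error, level=0.95):
    """Return the Wald confidence interval (lower, upper) for a coefficient."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% level
    return estimate - z * std_error, estimate + z * std_error

# Hypothetical coefficient estimate and standard error (illustration only).
beta_hat, std_error = 0.035, 0.004

lower, upper = wald_interval(beta_hat, std_error)
print(f"95% CI for the coefficient: ({lower:.4f}, {upper:.4f})")
print(f"95% CI for the odds ratio:  ({math.exp(lower):.4f}, {math.exp(upper):.4f})")
```

An interval for a coefficient that excludes 0, or for an odds ratio that excludes 1, agrees with rejecting the null hypothesis at the matching significance level.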
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Chapter 4
Data Analysis and Results
4.1 Introduction
This chapter presents the analysis and results. It includes descriptive and summary
statistics, the estimation and interpretation of relationships using odds ratios,
estimation of prevalence, model fitting and diagnostics.
Outcome
Min. :0.000
1st Qu.:0.000
Median :0.000
Mean :0.349
3rd Qu.:1.000
Max. :1.000
References
1. American Diabetes Association. (2021). Standards of Medical Care in Diabetes—2021. Diabetes
Care, 44(Supplement 1), S1-S232. doi: 10.2337/dc21-S000
2. Centers for Disease Control and Prevention. (2021). National Diabetes Statistics Report, 2020.
Retrieved from https://www.cdc.gov/diabetes/pdfs/data/statistics/national-diabetes-statistics-report.pdf
3. International Diabetes Federation. (2019). IDF Diabetes Atlas, 9th Edition. Retrieved from
https://www.diabetesatlas.org
4. Zhang, J., & Yu, K. F. (1998). What's the relative risk? A method of correcting the odds ratio
in cohort studies of common outcomes. JAMA, 280(19), 1690-1691. This article introduces a method
for correcting the odds ratio obtained from logistic regression so that it better approximates
the relative risk when the outcome is common.