LIST OF ABBREVIATIONS
IDF International Diabetes Federation

WHO World Health Organization
BMI Body Mass Index
CHAPTER 1
1. Introduction
The purpose of this chapter is to provide an overview of diabetes and its effects on
Ghana and the world at large.
In addition, the chapter presents the objectives, methodology, justification, scope
and limitations of the study, and the organization of the thesis.
1.1 Background of study
Diabetes is a chronic disease that affects millions of people worldwide. It is
characterized by high levels of blood glucose resulting from defects in insulin
production, insulin action, or both. In recent years, the prevalence of diabetes has
been on the rise, posing a significant public health challenge. In Ghana, diabetes
has become a major concern. According to the International Diabetes Federation
(IDF), an estimated 1.7 million people in Ghana were living with diabetes in 2019.
This number is expected to rise to 2.7 million by 2045 if appropriate measures are
not taken.
The factors contributing to the increasing prevalence of diabetes in Ghana are
multifaceted. Rapid urbanization, sedentary lifestyles, unhealthy diets, and limited
access to healthcare services are among the key factors. The impact of diabetes on
individuals and the healthcare system is substantial, leading to increased morbidity,
mortality, and economic burden. To address the challenges posed by diabetes,
various studies have been conducted in Ghana and across the world. These studies
aim to understand the epidemiology, risk factors, complications, and management
of diabetes. They provide valuable insights into the disease and inform policy-
making and healthcare interventions.
In Ghana, research studies have focused on identifying the prevalence of diabetes in
different regions, exploring the risk factors associated with the disease, and
evaluating the effectiveness of interventions. These studies have highlighted the
need for targeted prevention and control programs, early detection, and improved
access to healthcare services. On a global scale, extensive research has been
conducted to gain a better understanding of diabetes. The World Health
Organization (WHO) and the International Diabetes Federation (IDF) have played a
crucial role in coordinating research efforts and promoting collaboration among
countries. These efforts have led to significant advancements in diabetes research,
including the development of new treatments, improved diagnostic tools, and
enhanced management strategies. The research conducted in Ghana and around the
world has provided valuable evidence on the burden of diabetes, its risk factors, and
effective prevention and management approaches. However, there is still much
work to be done. Continued research efforts, innovative interventions, and strong
policy support are essential to address the growing diabetes epidemic in Ghana and
globally.
In conclusion, diabetes is a significant health challenge in Ghana and worldwide.
The rising prevalence of the disease calls for urgent action. Research studies
conducted in Ghana and across the world have provided valuable insights into the
epidemiology, risk factors, and management of diabetes. However, more research
and coordinated efforts are needed to effectively prevent and control diabetes and
its associated complications.
1.2 Problem Statement

Diabetes has become a major public health concern, with an estimated 425
million adults living with diabetes worldwide. In recent years, the prevalence of
diabetes has been increasing in Ghana, with approximately 4.1 million people
estimated to be living with diabetes in 2018. This alarming growth has created a
significant burden on Ghana’s healthcare system, as the costs associated with
diagnosis, treatment, and management of diabetes can be prohibitively expensive.
Furthermore, the inadequate availability of specialized healthcare providers,
including endocrinologists and diabetes educators, has exacerbated the issue.
This study seeks to identify the key issues related to
diabetes in Ghana and explore the associated global trends. Data will be collected
from various sources, including government health reports, surveys, and
interviews with healthcare professionals. This data will be used to analyze the
rate of diabetes prevalence, the accessibility of diabetes services, and the cost of
diabetes care in Ghana. Additionally, the research will explore the global trends
in diabetes in order to identify best practices that can be applied in Ghana to
improve the management of diabetes. Ultimately, this research will provide
valuable insights into the challenges posed by diabetes in Ghana and how to
address them.
1.3 Objective
Using logistic regression, we can create a statistical model to better understand the
prevalence, risk factors, and complications associated with diabetes in Ghana.
We can also compare the findings with global trends to gain further insight into
the current state of diabetes in Ghana and how it impacts the global population.
This research will be beneficial in helping us identify the most effective
interventions to reduce the burden of diabetes in Ghana and beyond.
Furthermore, it will help to create a better understanding of the risk factors
associated with diabetes in Ghana and how they differ from those in other
countries. Ultimately, this will help to inform medical and public health
professionals so that they can create more effective strategies to tackle the
growing problem of diabetes in Ghana and the world.
1.4 METHODOLOGY
In this study, we will use logistic regression analysis to model our data.
Logistic regression will be used to classify individuals into two groups, those
with diabetes (1) and those without diabetes (0), and to explain the
relationship between diabetes status (the outcome) and the independent variables: Age,
Body Mass Index (BMI), Insulin, Diabetes Pedigree Function, Skin Thickness,
Blood Pressure, Pregnancies, and Glucose.
The analysis of this study will be conducted using R Statistical Software. Data
will be gathered from the Internet, libraries, personal notes, lecture notes and
other relevant sources such as the World Health Organization (WHO). All of
these sources will provide valuable insight into the research topic, allowing us to
draw meaningful conclusions.
1.5 JUSTIFICATION
The success of this study will provide valuable insight into the factors that
contribute to Diabetes in Ghana, and beyond. With this knowledge, we can work
towards reducing the prevalence of Diabetes, not just in Ghana, but around the
world. With a greater understanding of the causes of this condition, everyone can
take steps to protect their health and reduce the number of people who suffer
from Diabetes.
1.6 SCOPE AND LIMITATION OF STUDY

The study seeks to examine the effects of various factors on the rate of Diabetes.
By looking at how these factors contribute to the development of the disease, we
can better identify what needs to be addressed in order to reduce the negative
impact of Diabetes. Although we are aware of many of the factors that cause
Diabetes, not all of them were taken into consideration in this study.
Additionally, the study was limited by the fact that other variables were not
included in the model, which could have had an influence on the findings.
However, by understanding how the factors influence Diabetes, we can work
towards reducing its devastating effects.
1.7 Thesis Organization

In our research study, there are five chapters. Chapter one deals with the
background of the study, problem statement, objectives of the study, the
methodology, justification, limitations of the study and the organization of the
study. Chapter two reviews the related literature of the study. Chapter three
focuses on the methodology of the study. Topics discussed include analytical
framework, data source, sample and sampling procedure, logistic regression,
generalized linear model and binary logistic regression, estimating the single
regression model, estimation techniques, marginal effect, definition and
measurement of variables and data analysis procedure. Chapter four focuses on
data collection, the research findings and the results of our findings. Chapter five
discusses the summary, conclusions from findings and recommendation from the
study.
CHAPTER 2
Literature Review
2.1 Introduction
This chapter reviews some literature on diabetes. It focuses on some factors causing
diabetes, the effects of diabetes, and the prevention and treatment of diabetes.
2.2 Factors causing Diabetes
Diabetes is a chronic medical condition that affects millions of people worldwide. It
occurs when the body is unable to regulate blood sugar levels effectively. There are
several factors that can contribute to the development of diabetes. Understanding these
factors is crucial in managing and preventing the onset of this disease. This section
reviews the most common factors associated with diabetes.
2.2.1 Pregnancies: Gestational diabetes can develop during pregnancy, when the
body's ability to control blood sugar levels efficiently is temporarily impaired. Although
gestational diabetes usually goes away after childbirth, it raises the risk of type 2
diabetes later in life.
2.2.2 Glucose: Glucose is the sugar in the bloodstream that serves as the body's main
source of energy. In people with diabetes, the body either generates insufficient insulin
(type 1 diabetes) or is unable to properly utilise the insulin that is produced (type 2
diabetes). As a result, blood glucose levels rise, which can lead to a number of
complications if not controlled.
2.2.3 Blood Pressure (BP): Diabetes frequently coexists with high blood pressure,
also known as hypertension. When the two conditions occur together, each can worsen
the effects of the other. People with diabetes who have high blood pressure are at an
increased risk of complications such as heart disease, stroke, and kidney disease.
2.2.4 Skin Thickness: There is no obvious connection between skin thickness and
diabetes. It might, however, be connected to ailments like acanthosis nigricans, a skin
disorder marked by dark, thicker areas of skin. Acanthosis nigricans may indicate
insulin resistance, which is a risk factor for type 2 diabetes.
2.2.5 Insulin: The pancreas releases the hormone insulin, which aids in controlling
blood sugar levels. Type 1 diabetes occurs when the body is unable to produce insulin,
but type 2 diabetes occurs when the body develops a resistance to the effects of insulin.
Insulin enables glucose to enter cells where it can be used as fuel. Blood glucose levels
stay high when insulin is inefficient or absent.
2.2.6 Body Mass Index (BMI): BMI is a measure of body fat based on a person's height
and weight. It indicates whether someone is underweight, normal weight, overweight,
or obese. Excess weight, especially abdominal obesity, contributes significantly to
type 2 diabetes. Obesity can cause insulin resistance and impair the body's
utilization of insulin.
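As a small illustration of the calculation, BMI is weight in kilograms divided by the square of height in metres. The sketch below is written in Python purely for illustration (the analysis in this study is carried out in R). The function names and example values are hypothetical, and the category cut-offs are the commonly used WHO adult ranges, assumed here rather than taken from the study.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """BMI = weight (kg) divided by height (m) squared."""
    return weight_kg / (height_m ** 2)


def bmi_category(bmi: float) -> str:
    """Commonly used WHO adult categories (an assumption for illustration)."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal weight"
    if bmi < 30.0:
        return "overweight"
    return "obese"


# Hypothetical individuals: (weight in kg, height in m)
for weight, height in [(50.0, 1.70), (70.0, 1.75), (95.0, 1.70)]:
    bmi = body_mass_index(weight, height)
    print(f"{weight} kg, {height} m -> BMI {bmi:.1f} ({bmi_category(bmi)})")
```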
2.2.7 Diabetes Pedigree Function: This mathematical function calculates an individual's
risk for diabetes based on their family history. It considers a person's family history of
diabetes and estimates how likely it is for them to have the disease based on their
genetic make-up.
2.2.8 Age: Type 2 diabetes risk increases with age. People are more likely to get
diabetes as they age. This may be the result of things like decreased physical activity,
modifications to metabolism, and an increase in body fat storage.
2.2.9 Outcome: The outcome indicates whether or not a person has been diagnosed with
diabetes. It is the response variable in this study. The various factors stated above
may influence whether diabetes develops and how it is treated.
2.3 Effects of Diabetes
One of the primary effects of diabetes is the impact it has on blood sugar levels. When a
person has diabetes, their blood sugar levels can become dangerously high or low,
leading to a condition known as hyperglycemia or hypoglycemia, respectively.
Hyperglycemia can cause symptoms such as increased thirst, frequent urination, fatigue,
and blurred vision. On the other hand, hypoglycemia can result in symptoms like
dizziness, confusion, sweating, and even loss of consciousness. Diabetes also
significantly affects the cardiovascular system. People with diabetes are at a higher risk
of developing cardiovascular diseases such as heart attacks, strokes, and high blood
pressure. Elevated blood sugar levels can damage blood vessels and increase the
buildup of fatty deposits, leading to atherosclerosis. This condition restricts blood flow
to vital organs and increases the likelihood of heart-related complications. Another area
where diabetes has a profound impact is on the kidneys. Over time, high blood sugar
levels can damage the blood vessels in the kidneys, impairing their ability to function
properly. This can lead to a condition called diabetic nephropathy, which is
characterized by the gradual loss of kidney function. If left untreated, diabetic
nephropathy can progress to end-stage renal disease, requiring dialysis or a kidney
transplant. Diabetes can also have a significant impact on a person's mental health. The
stress and emotional burden of managing the condition, along with the potential
complications, can contribute to feelings of anxiety and depression. Additionally, the
impact of diabetes on physical health can further exacerbate mental health issues. Nerve
damage, known as diabetic neuropathy, is another common effect of diabetes. High
blood sugar levels can damage the nerves, particularly in the feet and legs. This can
result in symptoms such as numbness, tingling, and pain. Diabetic neuropathy can also
affect other parts of the body, including the digestive system, leading to issues like
gastroparesis (delayed stomach emptying) and erectile dysfunction in men.
2.4 Treatment and Prevention of Diabetes

The effects of diabetes are enormous, hence the need to create awareness by
educating the public on the existence of diabetes, its treatment and preventive measures.
Through education, people could be encouraged to maintain good health and reduce
sedentary lifestyles which may predispose them to diabetes.
Regular monitoring of blood sugar levels is an essential part of diabetes management.
This can be done using a blood glucose meter or continuous glucose monitoring
systems. By monitoring blood sugar levels, individuals with diabetes can make
adjustments to their medication, diet, and physical activity to maintain optimal control.
Engaging in regular physical activity can help improve insulin sensitivity and lower
blood sugar levels. It is recommended to aim for at least 150 minutes of moderate-
intensity aerobic activity, such as brisk walking, cycling, or swimming, per week.
Strength training exercises are also beneficial for maintaining muscle mass and overall
health. Regular screening for diabetes is important, especially for individuals with risk
factors such as obesity, family history of diabetes, or a sedentary lifestyle. Early
detection allows for timely intervention and better management of the condition.
Participating in diabetes prevention programs can help individuals at high risk of
developing diabetes make lifestyle changes and reduce their risk. These programs
typically include education, support, and guidance on healthy eating, physical activity,
and weight management.
CHAPTER 3
Methodology
3.1 Introduction
This chapter describes the methods, data and analytical procedures employed to attain
the objectives of the study. It covers the data source and acquisition, sampling and
sample size, binary logistic regression, estimation techniques, and the definition and
measurement of variables.
3.2 Data Source and Acquisition

To obtain data for our study, we used secondary data. Secondary data is
data that has already been collected through primary sources and made readily
available for researchers to use in their own research. Sources of secondary data
include books, personal sources, journals, newspapers, websites and government
records. The analysis is based on data from the National Institute of Diabetes and
Digestive and Kidney Diseases. The objective is to predict, based on diagnostic
measurements, whether a patient has diabetes. The data provides information on the
variables Pregnancies, Glucose, Blood Pressure (BP), Skin Thickness, Insulin, Body
Mass Index (BMI), Diabetes Pedigree Function, Age and Outcome.
https://www.kaggle.com/datasets/mathchi/diabetes-data-set
3.3 Sample Size and Sampling Procedure
The dataset contains 768 rows and 9 columns. The column labels are listed below.
[1] "Pregnancies"
[2] "Glucose"
[3] "BloodPressure"
[4] "SkinThickness"
[5] "Insulin"
[6] "BMI"
[7] "DiabetesPedigreeFunction"
[8] "Age"
[9] "Outcome"
Eight of the variables are used as predictors. The variable Outcome is the
response, which states whether or not a person has diabetes: 0 for No and 1 for Yes.
Number of attributes: 8 plus the class variable.
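A quick structural check of the file before modelling might look like the sketch below. It is written in Python for illustration only (the study itself uses R), the three data rows are made up to stand in for the real file, and the helper name `load_rows` is hypothetical.

```python
import csv
import io

EXPECTED_COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin",
    "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]

# Made-up rows standing in for the real CSV file (not actual study data).
SAMPLE = (
    "Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,"
    "DiabetesPedigreeFunction,Age,Outcome\n"
    "2,120,70,25,100,28.5,0.35,30,0\n"
    "5,160,82,33,180,35.1,0.62,45,1\n"
    "0,95,64,20,85,24.0,0.20,23,0\n"
)


def load_rows(handle):
    """Read the CSV, check the 9 expected columns and the binary response."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    rows = [{name: float(value) for name, value in row.items()} for row in reader]
    for row in rows:
        if row["Outcome"] not in (0.0, 1.0):
            raise ValueError("Outcome must be coded 0 or 1")
    return rows


rows = load_rows(io.StringIO(SAMPLE))
print(len(rows), "rows,", len(rows[0]), "columns")
```

With the real file, the same function could be called on an opened file handle, and the row count compared against the 768 rows stated above.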
3.4 Logistic Regression

Logistic regression analysis extends the techniques of multiple regression analysis to
research situations in which the outcome is categorical. Linear regression works well
when its assumptions are met, but several of them are likely to be violated if
the dependent variable has only two or three response categories. With a binary
dependent variable, the assumptions of homoscedasticity, linearity and normality
are violated, and the Ordinary Least Squares estimates are inefficient at best.
Logistic regression overcomes this by modelling the logit of Y (the log of the odds of
falling into the "1" category) and estimating the coefficients by maximum likelihood.
Logistic regression determines the impact of multiple independent variables presented
simultaneously to predict membership of one or the other of the two dependent variable
categories. It also provides information on the direction and strength of the
relationship between each independent variable and the outcome.
3.5 Variable Measurement and Their Definitions

VARIABLE                   DEFINITION                                   MEASUREMENT
DEPENDENT
Outcome                    Whether or not the individual has            0 = normal, 1 = high
                           diabetes (class variable)
INDEPENDENT
Age                        Age                                          Years
BMI                        Body Mass Index                              Weight (kg) over height in
                                                                        metres squared (kg/m^2)
Pregnancies                Number of times pregnant                     Count
Blood pressure             Diastolic blood pressure                     mm Hg
SkinThickness              Triceps skin fold thickness                  mm
Insulin                    2-hour serum insulin                         mu U/ml
Glucose                    Plasma glucose concentration at 2 hours
                           in an oral glucose tolerance test
DiabetesPedigreeFunction   Diabetes pedigree function


3.6 The Logistic Regression Model
Logistic regression is a statistical model used to predict the probability of a binary
outcome based on one or more predictor variables. It is commonly used in various
fields, including healthcare, finance, and marketing.
The logistic regression model is based on the logit function, the log of the odds. Its
inverse, the logistic function, maps the linear combination of the predictors onto the
interval (0, 1). This allows us to interpret the output as the probability of the event
occurring.
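The relationship between the log-odds scale and the probability scale can be sketched as follows. This is an illustrative Python snippet (the study itself uses R), and the function names are chosen here for clarity rather than taken from the study.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability p strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inverse_logit(w: float) -> float:
    """Logistic (sigmoid) function: maps any real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-w))


# The two functions undo each other, so a linear predictor on the
# log-odds scale can always be read back as a probability.
for w in (-4.0, 0.0, 4.0):
    p = inverse_logit(w)
    print(f"linear predictor {w:+.1f} -> probability {p:.4f} -> logit {logit(p):+.4f}")
```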
In logistic regression, the dependent variable is binary, meaning it can take only two
values, such as "yes" or "no," "success" or "failure." The independent variables can be
continuous or categorical. The goal is to estimate the coefficients of the independent
variables that maximize the likelihood of the observed data.
The logistic regression model assumes that the relationship between the independent
variables and the log-odds of the dependent variable is linear. However, this linearity
assumption can be relaxed by including higher-order terms or interaction terms in the
model.
To estimate the coefficients of the logistic regression model, maximum likelihood
estimation (MLE) is commonly used. MLE finds the values of the coefficients that
maximize the likelihood of observing the data. The significance of each coefficient can
be assessed with a Wald test and its p-value (Section 3.8). Exponentiating a coefficient
gives an odds ratio, which can be used to interpret the effect of each independent
variable on the odds of the outcome.
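To make the idea of maximum likelihood estimation concrete, the sketch below fits a one-predictor logistic model with Newton-Raphson updates. It is a minimal Python illustration under stated assumptions: the data are invented, the function name `fit_logistic` is hypothetical, and Newton-Raphson is only one common way of maximizing the likelihood, not necessarily the routine used by the software in this study.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Maximum likelihood fit of logit(p) = b0 + b1 * x by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # score for the intercept
            g1 += (yi - p) * xi       # score for the slope
            h00 += w                  # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * step = score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Invented toy data: one predictor and a binary outcome (not study data).
x_values = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
y_values = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(x_values, y_values)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
```

At the maximum, the fitted probabilities sum to the number of observed events, which is a convenient check that the iterations have converged.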
Once the logistic regression model is fitted, it can be used to make predictions on new
data. The predicted probabilities can be converted into binary outcomes using a
specified cutoff value, such as 0.5. However, the choice of the cutoff value depends on
the specific application and the trade-off between false positives and false negatives.
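The effect of the cutoff on the two kinds of error can be seen in a small Python sketch. The predicted probabilities and true labels below are invented for illustration, and the helper names are hypothetical.

```python
def classify(probabilities, cutoff=0.5):
    """Label an observation 1 when its predicted probability reaches the cutoff."""
    return [1 if p >= cutoff else 0 for p in probabilities]


def false_positives_and_negatives(actual, predicted):
    """Count false positives (0 predicted as 1) and false negatives (1 predicted as 0)."""
    fp = sum(1 for a, c in zip(actual, predicted) if a == 0 and c == 1)
    fn = sum(1 for a, c in zip(actual, predicted) if a == 1 and c == 0)
    return fp, fn


# Invented predicted probabilities and true labels (not study data).
probs = [0.10, 0.35, 0.45, 0.55, 0.62, 0.80, 0.30, 0.90]
actual = [0, 0, 1, 0, 1, 1, 1, 1]

for cutoff in (0.3, 0.5, 0.7):
    fp, fn = false_positives_and_negatives(actual, classify(probs, cutoff))
    print(f"cutoff {cutoff}: false positives = {fp}, false negatives = {fn}")
```

Lowering the cutoff catches more true cases at the cost of more false alarms, which may be preferable in a screening setting.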
There are several evaluation metrics that can be used to assess the performance of a
logistic regression model, such as accuracy, precision, recall, and F1 score. These
metrics provide insights into how well the model is able to classify the binary outcome.
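These metrics follow directly from the four cells of a confusion matrix. The Python sketch below uses invented counts, and the function name is hypothetical.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 score from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Invented counts for illustration (not results from this study).
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=80)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```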
In conclusion, logistic regression is a widely used statistical model for predicting binary
outcomes. It provides a flexible framework for modeling the relationship between
independent variables and the probability of the event occurring. By estimating the
coefficients using maximum likelihood estimation, the logistic regression model can
make predictions and evaluate its performance using various metrics.
Some of the instances in which binary logistic regression can be used are:
1. Modelling the probability that a patient is diabetic given some factors.
2. Modelling the factors that determine whether or not a student smokes, drinks, or
takes a particular elective course.
3. Determining the risk factors of accident severity.
4. Establishing the risk factors of marital dissolution or determining the probability that
a couple will get divorced.
Logistic regression is most appropriate for binary outcomes because:
1. The response variable Yi takes only the values 0 and 1, and logistic regression ensures
that predicted probabilities lie between 0 and 1.
2. With a binary response, the errors of a linear regression are heteroskedastic; logistic
regression does not assume constant error variance.
3. The error terms are not normally distributed, and logistic regression does not require
them to be.
4. Logistic regression does not need a linear relationship between the predictors and the
response variable itself; linearity is assumed only on the log-odds scale.
3.6.1 Binary Model

In the simplest case of one predictor X and one binary (dichotomous) outcome variable
Y, the logistic regression model predicts the logit of Y from X. Diabetes status (y) is
coded as y = 1 (diabetic) and y = 0 (not diabetic). The method models the log odds of y
as a linear function of the predictors. Denote P(y = 1) by p(y), the probability that y = 1.
Logistic regression (LR) is one of the most important predictive models in
classification. Put simply, it can be used to model the probability
of diabetes. The key concept of logistic regression is the logit, the natural logarithm of
the odds.
For this dichotomous classification task, we use R to load the
data, split it into training and test datasets, perform data visualization and model
training using the training dataset, and finally evaluate the model using the hold-out
dataset.
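The training and hold-out split described above can be sketched as follows. This is a Python illustration only (the study uses R); the 80/20 proportion and the seed are assumptions, since the split used in the study is not stated, and the function name is hypothetical.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle the rows reproducibly and hold out a fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Row indices standing in for the 768 observations of the dataset.
rows = list(range(768))
train, test = train_test_split(rows)
print(len(train), "training rows,", len(test), "test rows")
```

Fixing the seed keeps the split reproducible, so that the reported accuracy can be regenerated from the same hold-out set.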
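The simple logistic model has the form:
\[
\mathrm{Odds}(y) = \frac{p(y)}{1 - p(y)}
\]
Let
\[
\omega = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k = \beta_0 + \sum_{i=1}^{k} \beta_i X_i
\]
Then
\[
\mathrm{Logit}(p(y)) = \ln\left(\frac{p(y)}{1 - p(y)}\right) = \omega,
\qquad
p(y) = \frac{\exp(\omega)}{1 + \exp(\omega)}
\]
Hence the model is given by
\[
\ln\left(\frac{p(y)}{1 - p(y)}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k
\]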
where
\(\beta_0\) is the model intercept,
\(\beta_i\) are the coefficients of the model, i = 1, 2, 3, …, k,
\(X_i\) are the predictor variables, i = 1, 2, 3, …, k, and
y is the binary outcome variable.
The logistic regression model above models the
logarithm of the odds of the outcome variable as a linear combination of the predictor
variables. The model coefficients \(\beta_0, \beta_1, \dots, \beta_k\) are estimated using maximum likelihood
estimation.
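For completeness, the probability can be recovered from the logit by solving for \(p(y)\). The steps below are a standard derivation, written with \(\omega\) denoting the linear predictor \(\beta_0 + \beta_1 X_1 + \dots + \beta_k X_k\):

\[
\ln\left(\frac{p(y)}{1 - p(y)}\right) = \omega
\;\Longrightarrow\;
\frac{p(y)}{1 - p(y)} = e^{\omega}
\;\Longrightarrow\;
p(y)\,(1 + e^{\omega}) = e^{\omega}
\;\Longrightarrow\;
p(y) = \frac{e^{\omega}}{1 + e^{\omega}} = \frac{1}{1 + e^{-\omega}}
\]

As a check, \(\omega = 0\) gives \(p(y) = 1/2\), that is, even odds, and \(p(y)\) approaches 1 as \(\omega\) grows and 0 as \(\omega\) becomes very negative.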
The graph of the logistic function is shown in the figure below
3.6.2 Estimation of Prevalence

Logistic regression can be used to estimate prevalence indirectly. The estimated
prevalence can be calculated using the logistic regression equation and the proportion of
individuals with a predicted probability above a certain threshold.
Let's assume we have a logistic regression model with one independent variable,
denoted as X. The logistic regression equation can be written as:
logit(p) = β0 + β1*X
Where;
Logit (p) represents the log-odds of the event, p represents the probability of the event
(prevalence), and β0 and β1 are the coefficients estimated from the logistic regression
model.
To estimate the prevalence, we need to convert the log-odds back to the probability
scale. This can be done using the inverse of the logistic function, also known as the
sigmoid function:
p = 1 / (1 + exp(-logit(p)))
Now, suppose we choose a threshold value p_threshold. We can estimate the prevalence
as the proportion of individuals in our dataset whose predicted probability (calculated
using the logistic regression equation) exceeds the threshold:
Prevalence = (Number of individuals with predicted probability > p_threshold) / (Total
number of individuals)
In summary, the logistic regression equation and the sigmoid function allow us to
estimate the probability (prevalence) of an event based on the coefficients obtained
from the logistic regression model. By setting a threshold, we can determine the
proportion of individuals above that threshold and estimate the prevalence accordingly.
Please note that the threshold value is a subjective choice and can impact the estimated
prevalence. Additionally, this approach assumes that the logistic regression model is
appropriately specified and valid for the data being analyzed.
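The threshold-based estimate described in this section can be sketched in a few lines. The Python snippet below is illustrative only: the coefficients and glucose values are invented, not estimates or data from this study, and the function names are hypothetical.

```python
import math


def predicted_probability(x: float, beta0: float, beta1: float) -> float:
    """Sigmoid of the linear predictor beta0 + beta1 * x."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))


def estimated_prevalence(xs, beta0, beta1, threshold=0.5):
    """Proportion of individuals whose predicted probability exceeds the threshold."""
    flagged = sum(1 for x in xs if predicted_probability(x, beta0, beta1) > threshold)
    return flagged / len(xs)


# Invented coefficients and predictor values (assumptions for illustration).
beta0, beta1 = -5.0, 0.04
glucose = [85, 95, 110, 120, 130, 145, 160, 180]

for threshold in (0.3, 0.5, 0.7):
    share = estimated_prevalence(glucose, beta0, beta1, threshold)
    print(f"threshold {threshold}: estimated prevalence = {share:.2f}")
```

The same fitted model gives different prevalence estimates at different thresholds, which is why the choice of threshold should be reported alongside the estimate.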
3.7 Assumptions of the Logistic Regression Model

1. Binary logistic regression assumes that the dependent variable yi comes
from a binomial distribution with parameters (n, pi), where n is known and pi is
unknown.
2. Each observation of the dependent variable is independent of the others.
3. The log odds of yi is a linear function of the independent variables.
4. There is no, or very little, multicollinearity among the independent variables.
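One simple way to screen for the multicollinearity mentioned in the last assumption is to inspect pairwise correlations between predictors. The Python sketch below uses invented values, and the 0.8 cut-off is a rule of thumb assumed here, not a criterion taken from this study.

```python
import math


def pearson_correlation(a, b):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Invented predictor values (not study data).
bmi = [22.0, 27.5, 31.0, 35.2, 40.1]
skin_thickness = [18.0, 25.0, 29.0, 33.0, 41.0]

r = pearson_correlation(bmi, skin_thickness)
flag = "possible multicollinearity" if abs(r) > 0.8 else "no concern"
print(f"r = {r:.3f} ({flag})")
```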
3.8 Testing for Significance of the Model

The two methods employed in this study for testing the significance of
model coefficients are hypothesis testing and confidence intervals.
3.8.1 Hypotheses Testing

All hypothesis tests and confidence intervals in this study use a 95%
confidence level. When the p-value < α = 0.05, the null hypothesis is rejected.
Testing for significance of individual coefficients is based on the following
hypotheses:
\[
H_0: \beta_i = 0 \qquad \text{versus} \qquad H_1: \beta_i \neq 0, \quad i = 1, 2, 3, \dots, k
\]
The maximum likelihood estimates of the coefficients are asymptotically normally
distributed, with a Wald test statistic given by
\[
Z = \frac{\hat{\beta}_i}{\mathrm{se}(\hat{\beta}_i)}
\]
The p-value of this test can be found from the standard normal table and is then
compared to the level of significance, α = 0.05.
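The Wald statistic and its two-sided p-value can be computed as in the Python sketch below. The coefficient and standard error are invented for illustration, and the function name is hypothetical.

```python
import math


def wald_test(beta_hat: float, std_error: float):
    """Return the Wald z statistic and its two-sided standard normal p-value."""
    z = beta_hat / std_error
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Invented estimate and standard error (not results from this study).
z, p_value = wald_test(beta_hat=0.035, std_error=0.010)
decision = "reject H0" if p_value < 0.05 else "fail to reject H0"
print(f"z = {z:.2f}, p-value = {p_value:.5f}: {decision}")
```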
3.8.2 Confidence Intervals for Model Parameters
The 95% confidence interval of \(\beta_i\) is given by
\[
\hat{\beta}_i \pm Z_{\alpha/2}\, \mathrm{se}(\hat{\beta}_i)
\]
The odds ratio of the kth coefficient is expressed as
\[
\mathrm{OR}_k = \exp(\hat{\beta}_k)
\]
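The interval for a coefficient, and the odds ratio obtained by exponentiating the coefficient, can be computed as in the Python sketch below. The estimate and standard error are invented for illustration, the function names are hypothetical, and exponentiating the interval endpoints to get an interval for the odds ratio is a standard practice assumed here rather than a step stated in this study.

```python
import math
from statistics import NormalDist


def confidence_interval(beta_hat, std_error, level=0.95):
    """Normal-theory interval: beta_hat +/- z_{alpha/2} * se."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    return beta_hat - z * std_error, beta_hat + z * std_error


def odds_ratio_with_interval(beta_hat, std_error, level=0.95):
    """Odds ratio exp(beta_hat) with the exponentiated interval endpoints."""
    lower, upper = confidence_interval(beta_hat, std_error, level)
    return math.exp(beta_hat), math.exp(lower), math.exp(upper)


# Invented estimate and standard error (not results from this study).
lower, upper = confidence_interval(0.035, 0.010)
odds_ratio, or_lower, or_upper = odds_ratio_with_interval(0.035, 0.010)
print(f"95% CI for beta: ({lower:.4f}, {upper:.4f})")
print(f"odds ratio = {odds_ratio:.4f}, 95% CI ({or_lower:.4f}, {or_upper:.4f})")
```

An interval for the odds ratio that excludes 1 corresponds to an interval for the coefficient that excludes 0.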
3.9 Model Accuracy

The accuracy of a logistic regression model refers to the proportion of correct
classifications. To estimate the accuracy of the final model, predictions were
made using the test set and the predicted probabilities rounded to the nearest binary
digit (0 or 1). The results of the prediction were then summarized in a confusion matrix
and the accuracy of the model calculated as shown below, where 1 (diabetic) is the
positive class.
Table 3.2: Confusion Matrix

                     PREDICTED
                0                      1
ACTUAL   0      True Negative (TN)     False Positive (FP)
         1      False Negative (FN)    True Positive (TP)

\[
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
\]
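The construction of the confusion matrix and the accuracy calculation can be sketched as follows, treating 1 (diabetic) as the positive class. This is a Python illustration with invented probabilities and labels, and the function names are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP and FN, with 1 as the positive class."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1
        elif a == 0 and p == 0:
            counts["TN"] += 1
        elif a == 0 and p == 1:
            counts["FP"] += 1
        else:
            counts["FN"] += 1
    return counts


def accuracy(counts):
    """(TP + TN) divided by the total number of classified observations."""
    return (counts["TP"] + counts["TN"]) / sum(counts.values())


# Invented predicted probabilities and true labels (not study data).
probabilities = [0.91, 0.20, 0.65, 0.40, 0.08, 0.77, 0.55, 0.31]
actual = [1, 0, 1, 1, 0, 1, 0, 0]
predicted = [1 if p >= 0.5 else 0 for p in probabilities]

counts = confusion_counts(actual, predicted)
print(counts, "accuracy =", accuracy(counts))
```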
CHAPTER 4
Data Analysis and Results
4.1 Introduction
This chapter presents the analysis and results. It includes
descriptive and summary statistics, establishing relationships using odds ratios,
interpretation of the relationships, estimation of prevalence, and model fitting and
diagnostics.
4.2 Descriptive Statistics

The dataset used was the Pima Indian Diabetes dataset from the Machine Learning
Repository (originally from the National Institute of Diabetes and Digestive and Kidney
Diseases), which contains 8 medical diagnostic attributes and one target variable (i.e.,
Outcome) for 768 female patients, of whom 34.9% (268 patients) have diabetes. The
variance of insulin was quite high in both categories. The dataset is used to predict
whether a person with certain medical diagnostic attributes is likely to have diabetes
or not. All analyses were made using R software, version "2023.3.386". The summary
statistics of the variables in the study are presented below.
  Pregnancies         Glucose        BloodPressure    SkinThickness
 Min.   : 0.000   Min.   : 44.00   Min.   : 24.00   Min.   : 7.00
 1st Qu.: 1.000   1st Qu.: 99.75   1st Qu.: 64.00   1st Qu.:25.00
 Median : 3.000   Median :117.00   Median : 72.00   Median :28.00
 Mean   : 3.845   Mean   :121.68   Mean   : 72.39   Mean   :29.09
 3rd Qu.: 6.000   3rd Qu.:140.25   3rd Qu.: 80.00   3rd Qu.:32.00
 Max.   :17.000   Max.   :199.00   Max.   :122.00   Max.   :99.00

    Insulin           BMI        DiabetesPedigreeFunction      Age
 Min.   : 14.0   Min.   :18.20   Min.   :0.0780           Min.   :21.00
 1st Qu.:102.5   1st Qu.:27.50   1st Qu.:0.2437           1st Qu.:24.00
 Median :102.5   Median :32.05   Median :0.3725           Median :29.00
 Mean   :141.8   Mean   :32.43   Mean   :0.4719           Mean   :33.24
 3rd Qu.:169.5   3rd Qu.:36.60   3rd Qu.:0.6262           3rd Qu.:41.00
 Max.   :846.0   Max.   :67.10   Max.   :2.4200           Max.   :81.00

    Outcome
 Min.   :0.000
 1st Qu.:0.000
 Median :0.000
 Mean   :0.349
 3rd Qu.:1.000
 Max.   :1.000
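The six statistics reported for each variable (minimum, first quartile, median, mean, third quartile and maximum) can be reproduced with a short routine. The sketch below is in Python for illustration (the table above was produced in R), the ages are invented, and the `inclusive` quantile method is an assumption about how the quartiles are interpolated.

```python
from statistics import mean, quantiles


def six_number_summary(values):
    """Minimum, quartiles, mean and maximum of a list of numbers."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": median,
        "Mean": mean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }


# Invented ages for illustration (not the study data).
ages = [21, 24, 29, 33, 41, 81]
summary = six_number_summary(ages)
for name, value in summary.items():
    print(f"{name:8s}: {value:.2f}")
```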
References
1. American Diabetes Association. (2021). Standards of Medical Care in Diabetes—2021. Diabetes
Care, 44(Supplement 1), S1-S232. doi: 10.2337/dc21-S000
2. Centers for Disease Control and Prevention. (2021). National Diabetes Statistics Report, 2020.
Retrieved from https://www.cdc.gov/diabetes/pdfs/data/statistics/national-diabetes-statistics-report.pdf
3. International Diabetes Federation. (2019). IDF Diabetes Atlas, 9th Edition. Retrieved from
https://www.diabetesatlas.org
4. Hosmer, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied
logistic regression (3rd ed.). Wiley. Chapter 4 specifically covers logistic
regression for binary outcomes and discusses prevalence estimation.
5. Kleinbaum, D. G., & Klein, M. (2010). Logistic regression: A self-
learning text (3rd ed.). Springer. This book provides a comprehensive
introduction to logistic regression, including discussions on prevalence
estimation.
6. Bursac, Z., Gauss, C. H., Williams, D. K., & Hosmer, D. W. (2008).
Purposeful selection of variables in logistic regression. Source Code for
Biology and Medicine, 3(17). This article discusses variable selection
techniques in logistic regression, which can be useful in prevalence
estimation.
7. Zhang, Z., & Yu, K. F. (1998). What's the relative risk? A method of
correcting the odds ratio in cohort studies of common outcomes. JAMA,
280(19), 1690-1691. This article introduces a method for estimating
prevalence directly from the odds ratio obtained from logistic regression.
