Group 1 Biostat Assignment
Contents
1. Objective
2. Background
3. Assumptions
3.1 Independence
3.2 No Multicollinearity
3.3 Linearity
5. Research questions
6. Descriptive Statistics
7. Checking Assumptions
8. Model Fitness
10. Recommendation
REFERENCE
1. Objective
To identify the appropriate statistical method for analyzing the prevalence and associated
factors of maternal health knowledge at St. Paul Hospital, 2024 G.C.
To describe the extent of maternal health knowledge and the study variables using relevant
descriptive statistical methods.
To conduct the selected statistical analysis on the dataset to test associations between
the outcome and predictor variables.
To assess how well the selected statistical model fits the dataset using relevant model
fitness tests.
To interpret the results of the statistical analysis, draw conclusions about the relationships
between variables, and provide recommendations based on the findings.
2. Background
The logistic model is one of the most widely used statistical models. It is a predictive model
used when the dependent variable is categorical and the outcome is binary. The logistic
function was first proposed in the early 19th century, when the mathematician Pierre François
Verhulst used it to model population growth. Despite this early introduction, it was not until
the late 1950s that the logistic regression model started to take shape
(Boateng & Abaye, 2019).
This approach allowed predicting the probability of an event rather than the event itself. Around
the same time, statistician John Tukey began studying logit transformations as an alternative to
linear regression for non-normal data. In the 1960s, developments in computer technology enabled
practical implementations of Cox's method (Cramer, 2005). Programmers created algorithms for
iterative maximum likelihood estimation of logistic regression parameters. A breakthrough came
in 1977, when statisticians Arthur Dempster, Nan Laird, and Donald Rubin published an
influential article outlining the EM algorithm, an iterative method for maximum likelihood
estimation. In the late 1970s and 1980s, logistic regression gained more traction as
statistical computing capabilities advanced (Dempster, n.d.). Researchers could more easily
carry out the maximum likelihood calculations needed for model fitting. Important early adopters
and contributors to the methodology included Norusis, Hosmer, and Lemeshow. Their
texts and software packages helped popularize logistic regression across fields such as
epidemiology, the social sciences, and other areas dealing with binary outcomes
(Joseph Newton et al., n.d.).
By the 1990s, logistic regression had become a standard technique taught in graduate statistics
programs and widely used in published research. Advances continued around model selection,
interpretation of coefficients, and model diagnostics. Today, binary logistic regression remains
one of the most commonly applied statistical methods, especially in medical, biological, and
social science research where dichotomous outcomes are frequently encountered (Hilbe, n.d.).
3. Assumptions
Statistical models such as binary logistic regression are built on certain fundamental
assumptions about the data. Assumptions are characteristics of the data that are necessary for
the model to function as intended; when one or more of them are violated, the model may yield
inaccurate results. For a binary logistic regression model to be considered unbiased and
generalizable beyond the sample, the data must satisfy all of the assumptions. Some of the major
assumptions are listed below (Harris, 2021a).
3.1 Independence
Independence is one of the major assumptions of binary logistic regression. It requires that
each observation in a data set is unrelated to the other observations in the data set. This
assumption can fail in two important ways. The first is repeated measurement of the same
subject over time. For instance, the health outcomes of a patient across different visits are
likely to be correlated rather than independent; outcomes measured weekly for trial
participants are correlated within subjects; and survey responses from the same individuals
over different periods are related. The second is clustered or hierarchical data, where
correlation within subgroups violates the independence requirement. For instance, students in
the same class will have more similar outcomes than students in different classes, patients
seen at the same hospital may respond more similarly than those from different hospitals, and
neighbors are likely to hold more correlated views than people living in separate
neighborhoods. Checking this assumption requires knowing how the data were collected, to
ensure that the observations are unrelated (Harris, 2021a).
3.2 No Multicollinearity
Multicollinearity occurs when the predictor variables are highly intercorrelated, which causes
unstable individual parameter estimates and wider confidence intervals. This assumption
requires that no independent variable be perfectly correlated with the others. The absence of
perfect multicollinearity can be checked in several ways, and the Variance Inflation Factor
(VIF) is frequently used. A variable's VIF indicates the degree to which that variable is
explained by the other variables in the model. For binary logistic regression, the VIF is
generalized (GVIF) and takes on greater values (Shrestha, 2020).
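As an illustration of how tolerance and VIF relate, the following is a minimal sketch for the simplest case of two continuous predictors, where the R-squared from regressing one predictor on the other equals their squared Pearson correlation. The data values and function names are hypothetical and are not taken from the study; with more predictors, or with categorical predictors (GVIF), a full regression of each predictor on the others would be needed.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def tolerance_and_vif(r_squared):
    """Tolerance = 1 - R^2 and VIF = 1 / tolerance for one predictor."""
    tolerance = 1.0 - r_squared
    return tolerance, 1.0 / tolerance

# Hypothetical predictor values, for illustration only.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]

r = pearson_r(x1, x2)
tolerance, vif = tolerance_and_vif(r ** 2)
print(f"r = {r:.3f}, tolerance = {tolerance:.3f}, VIF = {vif:.3f}")
```

Because VIF is the reciprocal of tolerance, a VIF above 10 and a tolerance below 0.1 describe the same condition.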
3.3 Linearity
According to the linearity assumption, continuous independent variables (predictors) must
have a linear relationship with the log odds of the predicted probability of the outcome.
Relationships that follow a comparatively straight path are called linear relationships. One
way of examining this relationship is to make a scatterplot with the continuous predictor on
the x-axis and the log odds of the predicted probability on the y-axis, and then add a loess
curve and a straight line of best fit to the scatterplot. The loess curve gives a more nuanced
picture of the relationship between the predictor and the transformed outcome, whereas the
fitted line shows what the relationship would look like if it were linear
(Harris, 2021b).
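To make the idea of "linear in the log odds" concrete, here is a small sketch. The intercept, slope, and ages are hypothetical values chosen for illustration, not estimates from the study. It shows that when a model is truly linear on the logit scale, equal steps in the predictor produce equal steps in the log odds, even though the predicted probabilities themselves do not change by equal amounts.

```python
import math

def logistic(z):
    """Convert a linear predictor (log odds) into a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    """Convert a probability into log odds (the logit)."""
    return math.log(p / (1.0 - p))

# Hypothetical intercept and slope for a continuous predictor such as age.
b0, b1 = -2.0, 0.08
ages = [18, 25, 32, 39]  # equally spaced, 7 years apart

probs = [logistic(b0 + b1 * age) for age in ages]
logits = [log_odds(p) for p in probs]

# Differences between consecutive values on each scale.
logit_steps = [logits[i + 1] - logits[i] for i in range(len(logits) - 1)]
prob_steps = [probs[i + 1] - probs[i] for i in range(len(probs) - 1)]

print("log-odds steps:   ", [round(s, 4) for s in logit_steps])
print("probability steps:", [round(s, 4) for s in prob_steps])
```

In a real check, the y-axis values would be the log odds of the fitted probabilities from the estimated model, plotted against the observed predictor.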
4. Determinants of Maternal Health Knowledge Among Pregnant Women
Maternal health knowledge is one of the major factors that play a crucial role in ensuring
positive pregnancy and delivery outcomes. Lack of awareness about maternal and child health
issues can undermine the health-seeking behaviors of women during pregnancy,
childbirth, and the postnatal period. Several studies have highlighted how social, economic, and
demographic factors influence a woman's access to information and healthcare services, thereby
affecting her overall health knowledge (Kifle et al., 2017).
One of the key determinants of maternal knowledge is the utilization of antenatal care (ANC)
services. ANC visits provide opportunities for health professionals to educate and counsel
women on nutrition, danger signs during pregnancy, institutional delivery, postnatal care, and
newborn care practices. However, in many developing countries, uptake of the recommended
minimum of 4 ANC visits remains low especially in rural areas due to financial barriers and poor
infrastructure. Missing out on these crucial counseling sessions during pregnancy can negatively
impact a woman’s awareness levels. (Negero et al., 2023)
Whether the current pregnancy was planned or unplanned also plays a role. Unintended
pregnancies are often associated with less pre-pregnancy preparation including seeking health
information. Lack of planning may translate to inadequate knowledge about fetal development,
self-care during pregnancy, and delivery preparedness. Educational attainment of both the
woman and her husband is another important determinant of maternal knowledge. Higher
education is tied to improved health literacy, the ability to comprehend health messages better,
and greater autonomy in healthcare decision-making. (Moges et al., 2020)
Rural-urban differentials also emerge. Women in rural areas generally have limited access to
mass media carrying health promotion campaigns, fewer trained health workers for information,
education, and communication (IEC) activities, and constrained mobility that restricts the
social interactions that foster information exchange. Marital status affects knowledge, as
unmarried women may receive less regular guidance and support than married women, who often
have the involvement of in-laws. Advancing maternal age brings more life experience, whereas
young mothers below 20 years of age tend to demonstrate poorer awareness due to less maturity
(Okwaraji et al., 2015).
The occupation of the mother also plays a role: involvement in income-generating activities
outside the home leaves limited time for self-education, while indoor domestic duties allow more
flexibility. Place of last childbirth, whether institutional or non-institutional, is another
factor, as facility-based deliveries expose women to health education whereas home births miss
such instructional opportunities. Finally, primigravid women conventionally display lower
knowledge than multigravid mothers, who gain insights from prior pregnancies
(Alemu et al., 2022).
Considering these various individual, household, and community-level determinants that
influence access to health information and services, a cross-sectional study was conducted to
evaluate the social and demographic predictors of maternal knowledge.
5. Research questions
1. What is the level of maternal health knowledge among pregnant women visiting St. Paul
Hospital?
2. How does maternal health knowledge vary across different socio-demographic factors
such as age, education level, and occupation?
3. What factors are most strongly associated with higher levels of maternal health
knowledge?
6. Descriptive Statistics
A total of 567 pregnant women were involved in the study, and the study had a 100% response rate.
The majority of the study participants were married (97.5%) and lived in rural areas (69.7%).
Furthermore, 76.9% of the study subjects delivered at home and 23.1% delivered at a
health institution.
Figure 1: Gravida status of the study subjects
The majority of study participants fall into the 2-4 pregnancies category. This suggests that a
significant proportion of the subjects have experienced multiple pregnancies. The number of
subjects with only one pregnancy is relatively lower. This implies that fewer participants have
had just one pregnancy. Interestingly, the frequency of subjects with five or more pregnancies is
also relatively low.
Furthermore, the data indicated that 76.9% of the study subjects had their last delivery at
home, whereas 23.1% had their last delivery in a health institution. The data also indicated
that most of the pregnancies were planned: 91.5% of the study subjects reported a planned
pregnancy and 8.5% reported an unplanned pregnancy.
Figure 2: The general knowledge score of the study participants
The two levels of maternal health knowledge were nearly equally frequent: the proportion of
study subjects who were knowledgeable (51.3%) was only slightly different from the proportion
who were not knowledgeable (48.7%).
7. Checking Assumptions
1. Binary Dependent Variable: This assumption is fulfilled because the outcome of our
dependent variable is either knowledgeable or not knowledgeable. Checking it does not require
any statistical test.
2. Independent Observations: The data were collected once, so there were no
repeated measures of the same subject over time. Furthermore, the data were collected
through probability sampling, establishing randomness. The study is an institution-based
cross-sectional study.
3. Multicollinearity:
Variable                          Tolerance   VIF
mother
Educational status of husband     .612        1.635
The above table was taken from the SPSS output. As the table indicates, the assumption of no
multicollinearity was fulfilled. Multicollinearity is considered a problem when the VIF is
larger than 10 and the tolerance value is close to zero (Shantha Kumari, n.d.). Since all the
VIF values were substantially less than 10 and the tolerance values were well above 0.1, we
can conclude that there was no multicollinearity between the independent variables.
We could also check multicollinearity by running a correlation analysis between each pair of
independent variables, but this might not be as effective as VIF and tolerance, since three or
more independent variables could be correlated simultaneously.
4. Extreme Outliers: There are various ways of checking for extreme outliers, including
Cook's distance and casewise listing.
Casewise listing: One frequent cut-off criterion used to determine whether a given residual
may reflect an outlier is an absolute value greater than 3. In our case, we used 3
standard deviations for the casewise diagnostics (Wiggins, n.d.). The outcome was described as
follows.
5. Sample Size Sufficiency: A minimum sample size of 500 is required for observational
studies with large populations that use logistic regression in the analysis, in order to obtain
statistics that accurately represent the population parameters (Bujang et al., 2018).
For this reason, the study has a sample size of 567 with no m
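The casewise outlier criterion described under item 4 can be sketched as follows. This is an illustrative sketch only: it assumes the residual being screened is the standardized (Pearson) residual of a logistic model, and the observed outcomes and predicted probabilities are hypothetical, not values from the study.

```python
import math

def standardized_residual(y, p):
    """Pearson residual for a binary outcome y (0/1) with predicted probability p."""
    return (y - p) / math.sqrt(p * (1.0 - p))

# Hypothetical (observed outcome, predicted probability) pairs.
cases = [(1, 0.80), (0, 0.30), (1, 0.05), (0, 0.60)]

CUTOFF = 3.0  # cut-off of 3 standard deviations, as used in the text

# Collect the positions of cases whose residual exceeds the cut-off in absolute value.
flagged = [
    index
    for index, (y, p) in enumerate(cases)
    if abs(standardized_residual(y, p)) > CUTOFF
]
print("Cases flagged as potential outliers:", flagged)
```

Here only the third case is flagged: the outcome occurred although the model assigned it a probability of just 0.05.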
8. Model Fitness
There are various ways of checking model fitness for binary logistic regression:
Omnibus Tests of Model Coefficients - Tests whether the model is significantly better than
the null model. The result should be significant for a good fit.
Classification Table - Shows the percentage of cases correctly classified by the model. This
should be clearly higher than the chance level.
Hosmer and Lemeshow Test - A non-significant result (p>0.05) indicates that the model
predicts outcomes equally well across risk categories.
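The classification table mentioned above can be illustrated with a short sketch. The outcomes and predicted probabilities are hypothetical, and the 0.5 cut-off is an assumption made for this example rather than a setting reported for the study.

```python
def classification_table(y_true, probs, cutoff=0.5):
    """Cross-tabulate observed outcomes against predictions at a probability cut-off.

    Returns a dict keyed by (observed, predicted) and the percentage correct.
    """
    table = {(1, 1): 0, (1, 0): 0, (0, 1): 0, (0, 0): 0}
    for y, p in zip(y_true, probs):
        predicted = 1 if p >= cutoff else 0
        table[(y, predicted)] += 1
    correct = table[(1, 1)] + table[(0, 0)]
    return table, 100.0 * correct / len(y_true)

# Hypothetical observed outcomes (1 = knowledgeable) and predicted probabilities.
observed = [1, 1, 1, 0, 0, 0, 1, 0]
predicted_probs = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1, 0.8, 0.3]

table, percent_correct = classification_table(observed, predicted_probs)
print(table)
print(f"Overall percentage correct: {percent_correct:.1f}%")
```

The overall percentage is usually compared with the percentage obtained by always predicting the more frequent outcome category.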
H0: The model with all predictor variables does not fit the data significantly better than a
model with just the intercept (constant).
Ha: The model with at least one predictor variable fits the data significantly better than a
model with just the intercept.
If the p-value is less than 0.05, H0 is rejected and the model is a good fit. This means that
at least one independent variable can significantly predict the maternal health knowledge of
the study subjects.
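The omnibus test compares the log-likelihood of the intercept-only model with that of the full model. The sketch below shows the arithmetic under stated assumptions: the outcomes are hypothetical, the "fitted" probabilities are stand-ins for what an estimated model would output, and a single predictor (1 degree of freedom) is assumed so that the well-known 5% chi-square critical value of 3.841 can be used.

```python
import math

def log_likelihood(y_values, probs):
    """Bernoulli log-likelihood of observed 0/1 outcomes given predicted probabilities."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(y_values, probs)
    )

# Hypothetical outcomes and stand-in fitted probabilities (not study data).
outcomes = [1, 1, 1, 0, 0, 0, 1, 0]
fitted = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1, 0.8, 0.3]

# The intercept-only model predicts the overall proportion for every case.
p_null = sum(outcomes) / len(outcomes)
ll_null = log_likelihood(outcomes, [p_null] * len(outcomes))
ll_model = log_likelihood(outcomes, fitted)

# Likelihood-ratio (omnibus) chi-square statistic.
g_statistic = -2.0 * (ll_null - ll_model)

CRITICAL_CHI2_DF1 = 3.841  # 5% critical value, 1 degree of freedom (assumed here)
print(f"G = {g_statistic:.3f}, reject H0: {g_statistic > CRITICAL_CHI2_DF1}")
```

With more predictors, the degrees of freedom equal the number of estimated coefficients beyond the intercept, and the critical value changes accordingly.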
The null hypothesis for the Hosmer and Lemeshow Test states that the model's estimates fit the
data at an acceptable level (i.e., the model is a good fit). The alternative hypothesis states
that the model's fit is significantly different from the expected fit
(Fagerland & Hosmer, 2012). With a Sig. value of 0.696, we do not have sufficient evidence to
reject the null hypothesis. Therefore,
we conclude that the model's estimates adequately fit the observed data.
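The statistic behind the Hosmer and Lemeshow Test can be sketched as below. This is a simplified illustration, assuming equal-sized groups formed after sorting by predicted probability; the data are hypothetical and only three groups are used to keep the example small (ten groups are typical in practice). The statistic is normally compared with a chi-square distribution with (number of groups - 2) degrees of freedom, and the p-value step is omitted here.

```python
def hosmer_lemeshow_statistic(y_values, probs, groups=10):
    """Sum over risk groups of (observed - expected)^2 / expected for events and non-events.

    Assumes no group has an expected count of exactly zero.
    """
    pairs = sorted(zip(probs, y_values), key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected_events = sum(p for p, _ in chunk)
        observed_events = sum(y for _, y in chunk)
        expected_non_events = len(chunk) - expected_events
        observed_non_events = len(chunk) - observed_events
        statistic += (observed_events - expected_events) ** 2 / expected_events
        statistic += (observed_non_events - expected_non_events) ** 2 / expected_non_events
    return statistic

# Hypothetical predicted probabilities and observed outcomes (not study data).
probs = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.5, 0.6, 0.7, 0.8, 0.8, 0.9]
outcomes = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1]

hl_stat = hosmer_lemeshow_statistic(outcomes, probs, groups=3)
print(f"Hosmer-Lemeshow statistic: {hl_stat:.4f}")
```

A small statistic means the observed counts in each risk group are close to the counts the model expects, which corresponds to a non-significant test result.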
The model explained 11.2% of the variability in the knowledge status of the women.
This is quite low; a higher percentage indicates better model prediction.
Since all the assumptions were met, the binary logistic regression was conducted.
First, a bivariate regression analysis was conducted to identify important covariates. All
variables with a p-value ≤ 0.2 were taken into the multivariable model to control for possible
confounders. Finally, the strength of association was measured by the odds ratio with a 95% CI,
and a p-value less than 0.05 was considered statistically significant.
Following the bivariate analysis, occupation of the mother, gravida, ANC visit, last place of
delivery, place of residence, and educational status of the mother had p-values < 0.2, so these
independent variables progressed into the multivariate analysis. After the multivariate
regression analysis was conducted on the variables that passed the bivariate analysis,
occupation of the mother, AOR=2.349 (1.24, 4.45) with p-value 0.009, and last place of
delivery, AOR=3.2 (1.92, 5.337), were found to be statistically significant.
Interpretation: Women who had a governmental occupation had 2.349 times higher odds of
being knowledgeable than women who were housewives [AOR=2.349, 95% CI (1.24,
4.45)].
Women who had their delivery at a health institution had 3.2 times higher odds of being
knowledgeable than those who had their delivery at home [AOR=3.2, 95% CI (1.92, 5.337)].
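The relationship between an adjusted odds ratio, its confidence interval, and its p-value can be checked with a short calculation. The sketch below uses the reported values for occupation of the mother (AOR=2.349, 95% CI 1.24 to 4.45, p=0.009) and assumes a Wald-type interval, which is symmetric around the coefficient on the log scale; the coefficient and standard error are recovered from the reported figures rather than taken from the SPSS output.

```python
import math

# Reported values for occupation of the mother.
aor, ci_lower, ci_upper = 2.349, 1.24, 4.45
Z_95 = 1.96  # standard normal quantile for a 95% interval

# Recover the log-odds coefficient and its standard error (Wald interval assumed).
beta = math.log(aor)
standard_error = (math.log(ci_upper) - math.log(ci_lower)) / (2 * Z_95)

# Rebuild the interval from beta and the standard error as a consistency check.
rebuilt_ci = (
    math.exp(beta - Z_95 * standard_error),
    math.exp(beta + Z_95 * standard_error),
)

# Two-sided Wald p-value from the standard normal distribution.
wald_z = beta / standard_error
p_value = math.erfc(abs(wald_z) / math.sqrt(2))

print(f"beta = {beta:.3f}, SE = {standard_error:.3f}")
print(f"rebuilt 95% CI = ({rebuilt_ci[0]:.2f}, {rebuilt_ci[1]:.2f}), p = {p_value:.3f}")
```

Because the interval excludes 1 and the recovered p-value is about 0.009, the reported figures are internally consistent.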
Table 2: Results for bivariate and multivariate analysis
Variables                  Category   Knowledgeable status   COR (95% CI)           AOR (95% CI)        p-value
                                      Yes (%)    No (%)
Age                        15-24      68         86
Residence                  Urban      74         98          1.386 (0.966, 1.988)
Gravida                    1          46         60
Pregnancy planned or not   No         25         23          1                      1
                           Yes        251        268         1.161 (0.642, 2.098)   2.48 (1.12, 5.17)
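A crude odds ratio (COR) and its 95% CI can be computed directly from the counts of a two-by-two table. The sketch below uses the four counts shown for pregnancy planning in Table 2 and the standard log-scale (Woolf) interval; the function name is hypothetical. The cross-product ratio (25 × 268) / (23 × 251) reproduces the reported 1.161 (0.642, 2.098). Which category ends up in the numerator depends on the column order of the flattened table, so the direction of the comparison is not asserted here.

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Cross-product odds ratio (a*d)/(b*c) with a log-scale (Woolf) confidence interval."""
    odds_ratio = (a * d) / (b * c)
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * standard_error)
    upper = math.exp(math.log(odds_ratio) + z * standard_error)
    return odds_ratio, lower, upper

# Counts from the "Pregnancy planned or not" rows of Table 2:
# first row 25 and 23, second row 251 and 268.
cor, cor_lower, cor_upper = odds_ratio_with_ci(25, 23, 251, 268)
print(f"COR = {cor:.3f} ({cor_lower:.3f}, {cor_upper:.3f})")
```

Since the interval contains 1, the crude association between pregnancy planning and knowledge is not statistically significant at the 5% level.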
10. Recommendation
REFERENCE
Alemu, A., Woltamo, T., & Abuto, A. (2022). Determinants of women participation in income
generating activities: evidence from Ethiopia. Journal of Innovation and Entrepreneurship, 11(1).
https://doi.org/10.1186/s13731-022-00260-1
Boateng, E. Y., & Abaye, D. A. (2019). A Review of the Logistic Regression Model with Emphasis
on Medical Research. Journal of Data Analysis and Information Processing, 07(04), 190–207.
https://doi.org/10.4236/jdaip.2019.74012
Bujang, M. A., Sa’At, N., Tg Abu Bakar Sidik, T. M. I., & Lim, C. J. (2018). Sample size guidelines
for logistic regression from observational studies with large population: Emphasis on the
accuracy between statistics and parameters based on real life clinical data. Malaysian Journal of
Medical Sciences, 25(4), 122–130. https://doi.org/10.21315/mjms2018.25.4.12
Dempster. (n.d.).
Harris, J. K. (2021a). Primer on binary logistic regression. Family Medicine and Community Health,
9. https://doi.org/10.1136/fmch-2021-001290
Harris, J. K. (2021b). Primer on binary logistic regression. Family Medicine and Community Health,
9. https://doi.org/10.1136/fmch-2021-001290
Joseph Newton, E. H., Cox, N. J., Bellocco, R., Institutet, K., Buis, M. L., Colin Cameron, G. A.,
Cleves, M. A., Dupont, W. D., Ender, P., Epstein, D., Gregory, A., Hardin, J., Jann, B., Jenkins,
S., Kreuter, F., Lachenbruch, P. A., Lauritsen, J., Scott Long, J., Newson, R., … Skaggs, D.
(n.d.). The Stata Journal.
http://www.stata-journal.com http://www.stata.com/bookstore/sj.html
http://www.stata.com/bookstore/sjj.html http://www.stata-journal.com/archives.html
Kifle, D., Azale, T., Gelaw, Y. A., & Melsew, Y. A. (2017). Maternal health care service seeking
behaviors and associated factors among women in rural Haramaya District, Eastern Ethiopia: a
triangulated community-based cross-sectional study. Reproductive Health, 14(1), 1–11.
https://doi.org/10.1186/s12978-016-0270-5
Moges, Y., Worku, S. A., Niguse, A., & Kelkay, B. (2020). Factors Associated with the Unplanned
Pregnancy at Suhul General Hospital, Northern Ethiopia, 2018. In Journal of Pregnancy (Vol.
2020). Hindawi Limited. https://doi.org/10.1155/2020/2926097
Negero, M. G., Sibbritt, D., & Dawson, A. (2023). Women’s utilisation of quality antenatal care,
intrapartum care and postnatal care services in Ethiopia: a population-based study using the
demographic and health survey data. BMC Public Health, 23(1).
https://doi.org/10.1186/s12889-023-15938-8
Okwaraji, Y. B., Webb, E. L., & Edmond, K. M. (2015). Barriers in physical access to maternal
health services in rural Ethiopia. BMC Health Services Research, 15(1).
https://doi.org/10.1186/s12913-015-1161-0
Schreiber-Gregory, D., Jackson Foundation Karlen Bader, H. M., & Jackson Foundation, H. M. (n.d.).
Logistic and Linear Regression Assumptions: Violation Recognition and Control.
van Smeden, M., Moons, K. G. M., de Groot, J. A. H., Collins, G. S., Altman, D. G., Eijkemans, M. J.
C., & Reitsma, J. B. (2019). Sample size for binary logistic prediction models: Beyond events
per variable criteria. Statistical Methods in Medical Research, 28(8), 2455–2474.
https://doi.org/10.1177/0962280218784726