Exploratory Data Analysis of Migraine Data
ISSN No:-2456-2165
Abstract:- This paper aims to provide a methodology for the quantitative analysis of migraine data. The main objective is to assist health practitioners who are interested in data analysis by offering a concise guide that may prove useful across a wide range of medical applications. To illustrate the proposed study, a typical migraine dataset is used to demonstrate how these steps are useful in practice. Migraine is now becoming a common problem among almost all kinds of people. The reasons for a migraine attack include a stressful working environment, heredity, lifestyle change, irregular food habits, weather conditions, excess consumption of caffeine, medication overuse, menstrual headache, menopause-related stress in women, and tension. The Kostecki-Dillon data set, downloaded from the UCI repository, with 4152 observations on 133 subjects for 9 variables, is considered for the study. It is a subset of records on migraine treatment collected by Tammy Kostecki-Dillon and consists of headache entries kept in a treatment program. The study discusses some standard ideas of correlation and p-values to quantify “importance” (or, more accurately in mathematical terms, statistical significance). It also discusses some standard statistical analyses and hypothesis tests that offer an improved understanding of the data.

Keywords:- Migraine; Statistical analysis; Hypothesis testing; Correlation

I. INTRODUCTION

Migraine is one of the most common complaints affecting the human nervous system. There are various forms of migraine, which affect people depending on their surroundings, age, gender, and other factors. This paper carries out some statistical tests on migraine data and presents some inferences concerning patients’ age, gender, and headache type. In turn, these findings will help patients and other individuals to be aware of the occurrence of migraine. It is reasonable to say that statistical techniques are essential for working effectively on machine learning projects. Statistical methods can be used to clean and prepare data for modeling. Statistical hypothesis tests can assist in model selection and in presenting the skill and predictions of final models. This study is concerned with presenting the associations between the variables, performing some standard statistical and hypothesis tests on the variables, and drawing conclusions about the data. The paper continues with a literature review, a data set description, a methodology for performing the statistical analysis, and a conclusion with further enhancements, and ends with references.

II. LITERATURE REVIEW

Among all the medical factors, the most dominant is the frequency of headache occurrence. Women above 40 years of age are the most extensively affected by migraine. The throbbing pain disturbs daily functioning. The most relevant factors governing headaches in women are age, education, and headache frequency. Because headache frequency appears to be the most influential factor, it is of the utmost importance to inform patients of the value of taking prophylactic measures. It is also important to find the cause of migraine occurrence. This will help patients manage the condition, either with medicines or by other approaches [1].

The main reason for the occurrence of migraine is emotional strain. Migraine sufferers are often affected when emotional and stressful events take place. When a person becomes emotional, certain chemicals are released in the brain to combat the situation. The release of these chemicals causes migraines. Stress is also an important factor in the occurrence of tension headaches. Tension headaches can be either episodic or chronic. An episodic tension headache is triggered by a stressful situation or a build-up of stress and can be treated with over-the-counter painkillers. Daily strain, such as a high-anxiety job, can lead to chronic tension headaches. Treatment for chronic tension headaches usually involves stress management, therapy, biofeedback, and possibly the use of antidepressant or anxiety-reducing medicines [2].

Environmental factors such as a change in the atmosphere or weather, a change in altitude or barometric pressure, high winds, traveling, or a change in routine can trigger migraine. Other environmental triggers include bright or flickering light (sunlight reflections, glare, fluorescent lighting, television, or movies), extremes of heat and sound, and strong smells or fumes. Any change in the environment of a headache patient can aggravate the headache. A change of job or school, and the period of adjustment that follows, will affect migraine patients. Travel, a change in diet, and changes in environmental and atmospheric conditions may bring on a headache. Some physical factors can also trigger migraine headaches, including overexertion such as bending, straining, or lifting; toothache; and localized head or neck pain [3].
Table 1 Data set description

Variable Name   Description
id              Patient id.
time            Time in days relative to the start of treatment, which occurs at time 0.
dos             Time in days from the start of the study.
hatype          A factor with levels Aura, Mixed, and No Aura: the type of migraine experienced by a subject.
age             Age at the onset of treatment, in years.
airq            A measure of air quality.
medication      A factor with levels none, reduced, and continuing, representing subjects who discontinued their medication, who continued but at a reduced dose, or who continued at the previous dose.

Fig 1 Bar plot for hatype variable.

The continuous variable age is displayed using a bar plot in Fig 2.

Fig 2 Bar plot for age variable.
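The exploratory plots of the hatype and age variables can be reproduced along the following lines. This is a minimal sketch in base R, not the code used in the study: the data frame d is a synthetic placeholder that borrows only the column names hatype and age from Table 1, and all values in it are made up.

```r
# Sketch of the exploratory plots on a small synthetic data frame.
# Column names follow Table 1; the values are made up for illustration.
set.seed(42)
d <- data.frame(
  hatype = factor(sample(c("Aura", "Mixed", "No Aura"), 200, replace = TRUE)),
  age    = round(runif(200, min = 18, max = 66))
)

pdf(NULL)  # discard graphics output so the script runs without a display

counts <- table(d$hatype)
barplot(counts, main = "hatype", ylab = "Frequency")    # categorical variable
h <- hist(d$age, main = "age", xlab = "Age (years)")    # continuous variable
boxplot(d$age, main = "age", ylab = "Age (years)")      # spread and outliers

invisible(dev.off())
print(counts)
print(summary(d$age))
```

With the real data, d would be replaced by the data frame read from the CSV file and the pdf(NULL) line removed so that the plots are shown.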
Fig 3 Histogram for the age variable

Fig 4 Box plot of variable age

V. STATISTICAL TESTS

A. T-Test
The t-test used to compare the means of two groups of samples is called the two-sample t-test. It evaluates whether the means of the two sets of data differ from each other to a statistically significant degree. In this study, the unpaired two-sample t-test is used to compare the mean “age” between independent groups defined by “hatype”. The test is implemented in R.
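As an illustration of the unpaired two-sample t-test, the following is a minimal sketch using R's built-in t.test function. It is not the algorithm used in the study: the data frame d is synthetic, only two headache-type groups are simulated, and the 0.05 significance level is an assumption.

```r
# Sketch of an unpaired two-sample t-test on synthetic data.
# The data frame `d` and its values are made up for illustration.
set.seed(1)
d <- data.frame(
  age    = c(rnorm(40, mean = 42, sd = 11), rnorm(40, mean = 40, sd = 11)),
  hatype = rep(c("Aura", "No Aura"), each = 40)
)

# Welch's unpaired t-test (R's default, var.equal = FALSE).
res <- t.test(age ~ hatype, data = d)

t_score <- unname(res$statistic)
p_value <- res$p.value

# Reject H0 only when the p-value falls below the chosen level (0.05 assumed).
reject_h0 <- p_value < 0.05
cat("t =", t_score, " p =", p_value, " reject H0:", reject_h0, "\n")
```

A p-value above the chosen level means the test fails to reject the null hypothesis of equal group means.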
Fig 6 Box plot for comparison of means (age and hatype)

Figure 6 illustrates the means of “age” for each level of “hatype”.

The t score and p-value obtained by running the algorithm in RStudio are as follows:

    Tdata (Aura):   1.390601
    P-value (Aura): 0.1650248

Because the p-value is high (0.165 > 0.05), the t-test fails to reject the null hypothesis (H0). There is no significant difference in “age” with respect to “hatype”.

B. Z-Test
The following R code checks whether the proportion of headache type differs significantly with respect to sex:

    rm(list = ls())
    # a <- read.csv("KosteckiDillon.csv", header = TRUE, sep = "\t", row.names = 1)
    a <- read.csv("KosteckiDillon.csv")
    # Check whether the proportion of headache type differs significantly by sex
    n <- table(a$sex)                      # observations per sex (all headache types)
    f <- table(a$hatype, a$sex)            # headache type by sex contingency table
    p1 <- f[2, 1] / (f[1, 1] + f[2, 1])    # share of type 2 among types 1 and 2, first sex
    p2 <- f[2, 2] / (f[1, 2] + f[2, 2])    # same share, second sex
    pp <- (f[2, 1] + f[2, 2]) / (f[1, 1] + f[2, 1] + f[1, 2] + f[2, 2])  # pooled share
    zdata <- (p1 - p2) / sqrt(pp * (1 - pp) * (1 / n[1] + 1 / n[2]))     # z score
    pvalue <- 2 * pnorm(abs(zdata), lower.tail = FALSE)                  # two-sided p-value

The z score formula was used to check whether the proportion of headache type differs significantly with respect to sex, and the result is as follows:

    p-value (female): 1.769618e-128

The reported p-value is far below 0.05, so the Z-test rejects the null hypothesis. Hence the proportion of headache type differs significantly with respect to gender.
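The pooled two-proportion z score can be sketched on a small example. The counts x and group sizes n below are hypothetical, and the cross-check against R's built-in prop.test is an addition for illustration, not part of the study.

```r
# Sketch of a pooled two-proportion z-test on made-up counts.
# x = number of "successes" in each group, n = group sizes (hypothetical values).
x <- c(45, 30)
n <- c(100, 100)

p1 <- x[1] / n[1]
p2 <- x[2] / n[2]
pp <- sum(x) / sum(n)                      # pooled proportion under H0

z <- (p1 - p2) / sqrt(pp * (1 - pp) * (1 / n[1] + 1 / n[2]))
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)   # two-sided p-value

# Cross-check: without continuity correction, prop.test's X-squared equals z^2.
chk <- prop.test(x, n, correct = FALSE)
cat("z =", z, " p =", p_value, " prop.test p =", chk$p.value, "\n")
```

In this sketch the group sizes in the standard error are the same counts used to form the proportions, which is what keeps the z score consistent with prop.test.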
C. ANOVA Test
The built-in function aov(a$age~a$hatype) is used to calculate the analysis of variance:

    rm(list = ls())
    a <- read.csv("KosteckiDillon.csv")
    # Box plot of age for each headache type
    boxplot(a$age ~ a$hatype, col = rainbow(3))
    # Mean age within each headache type
    M <- tapply(a$age, a$hatype, mean)
    # One-way analysis of variance of age by headache type
    res <- aov(a$age ~ a$hatype)
    summary(res)

Fig 7 Box plot comparing “age” and “hatype”

The result of the above code is given in Table 3.

Table 3 Result of ANOVA test

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

From the results shown in Table 3, the p-value is high and the ANOVA test fails to reject the null hypothesis (H0). Hence there is no significant difference in mean “age” among the “hatype” groups.

D. Chi-Square Test
There are two types of chi-square tests:

1. A chi-square test for goodness of fit determines whether sample data fit a distribution from a certain population.
2. A chi-square test for independence tests whether the distributions of categorical variables differ from each other.

A very small chi-square test statistic means that the observed data fit the expected data extremely well; in a test for independence, this means there is no evidence of a relationship between the two variables. A very large chi-square value means that the observed data do not fit the expected data well, which indicates a relationship between the variables tested.

The chi-square statistic is a measure of how expectations compare to results. The data used in calculating a chi-square statistic must be random, raw, mutually exclusive, drawn from independent variables, and drawn from a large sample [12].

The statistic is computed as

    \chi^2_c = \sum_i \frac{(O_i - E_i)^2}{E_i}

The subscript “c” is the degrees of freedom; “O” and “E” are the observed and expected values.

The algorithm to calculate the chi-square value implemented in R is as follows:

    rm(list = ls())
    a <- read.csv("KosteckiDillon.csv")
    # Chi-square test of independence between headache type and sex
    res <- chisq.test(a$hatype, a$sex)

From the result, due to the high p-value, the chi-square test fails to reject the null hypothesis (H0). Hence there is no evidence of a relationship between headache type and gender.

VI. CONCLUSION AND FURTHER ENHANCEMENTS

Headache has become a common complaint owing to viral infections, the present working environment, and other lifestyle factors. It is therefore worthwhile to explore it using existing data and to understand the actual relationships among age, gender, and type of headache. The main objective is to help clinical practitioners who are interested in data analysis by offering a succinct guide that may prove useful across a wide range of medical applications. Inferences related to migraine can be drawn using statistical tests. Further, predictions and classifications can be made using the available statistical models.

REFERENCES

[1]. Dorota Talarska, Małgorzata Zgorzalewicz-Stachowiak, Michał Michalak, Agrypina Czajkowska, and Karolina Hudaś, “Functioning of Women with Migraine Headaches”, Hindawi Publishing Corporation, The Scientific World Journal, Volume 2014, Article ID 492350, 8 pages, http://dx.doi.org/10.1155/2014/492350.
[2]. https://my.clevelandclinic.org/health/articles/9646-stress-and-headaches
[3]. https://headaches.org/2007/10/25/environmental-physical-factors/