OpenVigil2 Tutorial
authors:
Rahel Bröhan
Marie Steglich
Ruwen Böhm <[email protected]>
Hans-Joachim Klein <[email protected]>
version: 2015-12-07
Contents
1. Introduction
2. Definitions
2.1. Pharmacovigilance
3. Examples
3.1. Individual Safety Reports (ISR)
3.3. Query construction for the most reported adverse event connected to a drug/pharmaproduct
3.9. Compare OpenVigil 1 & 2 data (no. reports, PRR) to published data
4. SQL-database schema
1. Introduction
The data currently used in OpenVigil 2.0 are taken from the Adverse Event Reporting System (AERS) of
the Food and Drug Administration (FDA) of the USA and, with respect to information on drugs,
from Drugbank (drugbank.ca) and Drugs@FDA.
The advantage of the FDA source is the large amount of data due to the size of the reporting
population. The disadvantage is that AERS reports are often incomplete (e.g., missing patient
demographic data) or wrong (e.g., non-professional reporter or biased reporting, see the OpenVigil
caveat documents [1]).
Nevertheless, this data source can be used to generate hypotheses where clinical trials
would be difficult to conduct (e.g., because the adverse event is very rare).
OpenVigil 2.0 is a data analysis tool which extracts, filters and analyses pharmacovigilance data (e.g.,
AERS) by different criteria.
The following examples of the tutorial illustrate which queries can be realised by using OpenVigil 2.0.
[1] Caveat documents: OpenVigil 1: http://www.uni-kiel.de/pharmacology/pvt/caveat.html
2. Definitions
2.1. Pharmacovigilance
Pharmacovigilance is the science of drug safety. It covers the observation of pharmaceutical products
after the clinical trials leading to marketing authorization as well as the collection, monitoring and
prevention of adverse effects. [1]
In most jurisdictions it is mandatory for physicians, pharmacists and pharmaceutical companies to
report adverse events.
Since OpenVigil relies on external databases for mapping the drugnames to USAN, there is a risk of
mismappings.
Note that there are also other drugnames like the British Adopted Name (BAN) which exist in the raw
FDA data. BAN allows combining two drugs into one “drugname”, e.g., cotrimoxazole as a
combination of trimethoprim and sulfamethoxazole.
[1] http://en.wikipedia.org/wiki/Pharmacovigilance
2.3. “Pharmaproduct” (as used by OpenVigil)
OpenVigil uses “pharmaproduct” as the term for pharmaceutical products such as a pill or liquid forms like
a suspension or solution for injection, which contain one or more drugs and excipients. Synonyms of the
term “pharmaproduct” are thus
• medicine,
• medication,
• medicinal product,
• brand,
• brand name and
• pharmaceutical product.
To achieve correct results with OpenVigil 2.0, it is important to differentiate between the term
“pharmaproduct” and the term “drug”, which is often used colloquially as a synonym.
An adverse event (AE) is an event which occurs after the use of a pharmaceutical product. This does
not automatically reflect a causal relationship. However, statistical, biological or clinical analysis of
this association might reveal such a causal relationship. In this case it is called an adverse drug reaction
(ADR). [2]
The Structured Query Language (SQL) is used by OpenVigil to retrieve a certain dataset from a large
database.
SQL is a domain-specific language designed for storing, retrieving and modifying data
in a relational database managed by a relational database management system (RDBMS). [3]
OpenVigil uses an SQL database to store the pharmacovigilance data. For complex queries which
cannot be sufficiently phrased using the available graphical user interfaces (GUI), a generic SQL
interface was added.
Additionally, when using the GUI in OpenVigil 2.0 to construct a query, pressing the button “Show Query”
will show the SQL query code(s) which resulted from your query. You can use this code to build a more
complex query on top of it.
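To give an impression of what retrieving a dataset with SQL looks like, here is a minimal sketch in Python using the built-in sqlite3 module. The table `report` and its columns are hypothetical and were chosen for illustration only; they do not reproduce the OpenVigil schema, and OpenVigil itself may run on a different database system.

```python
import sqlite3

# Toy in-memory database; table and column names are hypothetical
# and do not reproduce the OpenVigil schema.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE report (isr INTEGER, drugname TEXT, event TEXT)")
con.executemany(
    "INSERT INTO report VALUES (?, ?, ?)",
    [
        (1, "loperamide", "drug abuse"),
        (2, "loperamide", "nausea"),
        (3, "amiodarone", "drug interaction"),
    ],
)

# Retrieve only the reports for one drug, ordered by report number.
rows = con.execute(
    "SELECT isr, event FROM report WHERE drugname = ? ORDER BY isr",
    ("loperamide",),
).fetchall()
print(rows)
```

The WHERE clause does the filtering; a GUI-generated query can be extended in the same way by adding further conditions.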
[2] http://en.wikipedia.org/wiki/Adverse_event
[3] http://en.wikipedia.org/wiki/SQL
3. Examples
3.1. Individual Safety Reports (ISR)
Problem: Show all individual safety reports for a new drug (azilsartan medoxomil).
Result: A list of all reports; each single report can be accessed by clicking on the link in the ISR
column.
Single Report:
In the ISR above some data (for example age, gender and weight of the patient) are missing.
3.2. Interpretation of statistics used in OpenVigil 2.0
Query construction: Choose “drug” in “OpenVigil Search”; drugname is “loperamide”; adverse event
is “drug abuse”; data presentation and statistics are “Frequentist methods” (i.e., calculate a
contingency table and various observed/expected ratios like PRR); choose an output format (e.g.,
HTML):
Result:
OpenVigil 2.0 counts the number of unique ISRs, not the number of patients (several ISRs can be
connected to a single patient) and not the number of drug usages.
The Chi-Squared value estimates whether the observed values in this table differ from the expected ones: a
Chi-Squared value of 5 for one degree of freedom (= 2x2 table) corresponds to a p-value of about 0.025, i.e., a
deviation of this size would arise by chance in only about 2.5 % of cases if there were no association. [4] [5]
The PRR (Proportional Reporting Ratio) in this case is 2.077. This tells us that drug abuse is reported about
twice as frequently for loperamide as for all other drugs.
The ROR (Reporting Odds Ratio) is 2.081, which means that the odds of a drug abuse report for
loperamide are about twice the odds for all other drugs. [6]
The lower bound of the confidence interval is 1.182; the upper bound is 3.664 (at a confidence
level of 95 %). Since the lower bound is > 1, the disproportionality is statistically significant at the
5 % level.
Details for observed/expected ratios like PRR and ROR can be found in the disproportionality analysis
primer on the OpenVigil 2 website.
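The measures discussed above can be computed from a 2x2 contingency table with a few lines of Python. This is a sketch under stated assumptions: the cell counts are hypothetical (the counts behind the loperamide example are not given here), the Chi-Squared value is Pearson's statistic without continuity correction, and the confidence interval uses the usual log-scale approximation. OpenVigil may compute these values differently in detail.

```python
import math

def chi2_p_value(chi2):
    """p-value of a Chi-Squared statistic with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

def disproportionality(a, b, c, d):
    """2x2 table: a = drug & event,        b = drug & other events,
                  c = other drugs & event, d = other drugs & other events."""
    n = a + b + c + d
    prr = (a / (a + b)) / (c / (c + d))
    ror = (a * d) / (b * c)
    # 95 % confidence interval of the ROR, computed on the log scale
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci = (math.exp(math.log(ror) - 1.96 * se),
          math.exp(math.log(ror) + 1.96 * se))
    # Pearson Chi-Squared statistic without continuity correction (assumption)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return prr, ror, ci, chi2, chi2_p_value(chi2)

# Hypothetical counts, for illustration only
prr, ror, ci, chi2, p = disproportionality(20, 980, 100, 9900)
print(prr, ror, ci, chi2, p)
```

A lower confidence bound above 1 together with a small p-value is what the text above describes as a disproportionality.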
The result of this example might reflect the use of loperamide as an illicit drug.
Loperamide is able to cross the blood-brain barrier but is normally immediately pumped out again by
P-glycoprotein (= ABCB1, MDR1). If loperamide is taken in combination with substances that
inhibit P-glycoprotein, like quinidine, loperamide has effects on the central nervous system. [8]
Another explanation for the result is that loperamide is a drug used against diarrhoea. Drug addicts
are often medicated with loperamide to prevent the diarrhoea which is a consequence of the drug
withdrawal. People might have reported wrong data concerning loperamide to the AERS. For
example, adverse event and indication might have been switched: Drug abuse is the reason why
loperamide is used and not the consequence.
[4] http://math.hws.edu/javamath/ryan/ChiSquare.html
[5] https://people.richland.edu/james/lecture/m170/tbl-chi.html
[6] http://en.wikipedia.org/wiki/Odds_ratio
[8] http://en.wikipedia.org/wiki/Loperamide
3.3. Query construction for the most reported adverse event connected to a
drug/pharmaproduct
Problem: What are the most reported adverse events connected to the drug amiodarone?
Query construction: Choose “drug” in “OpenVigil Search”; drugname is “amiodarone”; instead of raw data,
a list of occurrences of each adverse event shall be reported.
Result:
Remember that these are just raw counts that have to be normalized to other drugs (e.g., by using
PRR, see example 3.5, or by using drug utilization data).
3.4. Query construction for a specific time interval
Problem: How many hypoglycaemic adverse events are reported for glibenclamide (USAN glyburide)
in the year 2008? How many adverse events are reported in total?
Query construction: Choose “drug” in “OpenVigil Search”; drugname is “glyburide”; use the
“Advanced search” to restrict the reporting date to the FDA (in this case the reporting date shall be
within 2008); data presentation and statistics are “Frequency”. Output format is “Excel CSV” for
further analysis and visualisation in a spreadsheet program.
Result: A CSV file with two columns: name and count of the events.
There are 93 ISRs with the adverse event “hypoglycaemia” reported for glibenclamide.
7009 adverse events have been reported in total.
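Instead of a spreadsheet program, the exported list can also be evaluated with a short script. The following sketch assumes a comma-separated file with the header names `name` and `count`; the real export may use other column names or delimiters. Apart from the count for hypoglycaemia, the sample rows are made up.

```python
import csv
import io

# Hypothetical excerpt of an exported frequency list. With a real export,
# replace the StringIO object by open("export.csv", newline="").
sample = io.StringIO(
    "name,count\n"
    "hypoglycaemia,93\n"
    "nausea,40\n"
    "dizziness,25\n"
)

# Map each adverse event name to its number of reports
counts = {row["name"]: int(row["count"]) for row in csv.DictReader(sample)}
total = sum(counts.values())
print(counts["hypoglycaemia"], total)
```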
3.5. Proportional Reporting Ratio (PRR) analysis of a drug or pharmaproduct
Problem: How likely is it that the reported adverse events are truly adverse drug reactions specific to
the drug amiodarone?
Result:
Cave: If you cannot properly import numbers into your spreadsheet software, this might be due to the
different symbols used for decimal marks. OpenVigil uses the U.S. convention, i.e., a
point represents the decimal mark. For further information see:
http://en.wikipedia.org/wiki/Decimal_mark
Use the columns “prr” and “chi-square” to create a graph: x-axis title is “PRR”; y-axis title is “Chi-
Square”.
Changing the scale of both axes to logarithmic gives the final PRR graph:
The upper-right quadrant contains putative adverse drug reactions. Everything else is just an adverse
event.
In the result list, “drug interaction” (cf. example above) is reported with a PRR of 8.151 and a Chi-Squared
value of 4042. Therefore, drug interaction is very likely an adverse drug reaction of
amiodarone.
However, prior knowledge of the CYP3A4 inhibition by amiodarone will influence the reporting of these
cases and thus skew the results.
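The "upper-right quadrant" rule can also be applied without a graph. The thresholds in the following sketch are an assumption: they follow the commonly cited criteria of Evans et al. (PRR of at least 2, Chi-Squared of at least 4, at least 3 reports) and are not taken from OpenVigil itself. The report counts passed in the usage lines are hypothetical.

```python
def is_putative_adr(prr, chi2, n_reports, min_prr=2.0, min_chi2=4.0, min_n=3):
    """Return True if an adverse event falls into the upper-right quadrant
    of the PRR graph and is backed by a minimum number of reports.
    The default thresholds are assumptions (criteria of Evans et al.)."""
    return prr >= min_prr and chi2 >= min_chi2 and n_reports >= min_n

# "drug interaction" for amiodarone; the report count of 100 is hypothetical
print(is_putative_adr(8.151, 4042, 100))
# A PRR just below the threshold does not count as a signal
print(is_putative_adr(1.991, 10.0, 100))
```

Such a filter only flags candidates; the reporting bias mentioned above still has to be judged by a human.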
3.6. Reverse PRR analysis of an adverse event
Query construction: Adverse reaction is “agranulocytosis”; data presentation and statistics are
“Frequentist methods” (reverse PRR analysis of the adverse event “agranulocytosis”). “Excel CSV” is
chosen as output format for further analysis and visualisation in a spreadsheet program.
Results:
Create a PRR graph like in the example above:
The upper-right quadrant contains drugs that likely have agranulocytosis as an adverse drug reaction,
for example pirenzepine, a drug used in the treatment of peptic ulcer [9]: PRR 178.020285; Chi-Squared
3086.672747. Pirenzepine is shown in the result list with 21 occurrences for agranulocytosis.
You can also choose “HTML” as output format of the query result. The query result is shown in a new
window of the browser:
Tip: The result list can be sorted according to the values in a column by clicking on the arrows in the
corresponding column header (for example, data can be sorted in ascending order).
In addition, the list can be sorted by two criteria (for example, rPRR in descending order
and Chi-Squared value in ascending order) by holding down the shift key and clicking on a second
arrow:
[9] http://en.wikipedia.org/wiki/Pirenzepine
3.7. Query construction for different adverse events
Problem: What are the two most reported pharmaproducts with gastrointestinal haemorrhage as an
adverse event?
Result:
The two most reported pharmaproducts with gastrointestinal haemorrhage as an adverse event are
aspirin and pradaxa.
3.8. Structured Query Language (SQL)
Problem: The occurrence of gastrointestinal haemorrhage as an adverse event of the two most used
acetylsalicylic acid-containing pharmaproducts shall be compared. To identify these pharmaproducts, a
more complex query was constructed that cannot be created with the GUI of OpenVigil 2.0:
-- Count the drug usages per brand name for all products containing acetylsalicylic acid
SELECT
    COUNT(drugusage.brandname), drugusage.brandname
FROM
    drugusage, pharmaproduct, product
WHERE
    product.drugname = 'acetylsalicylic acid'
    AND pharmaproduct.brandname = product.brandname
    AND product.brandname = drugusage.brandname
GROUP BY
    drugusage.brandname
-- Most frequently used brand names first
ORDER BY
    COUNT(drugusage.brandname) DESC
Results: Query result is a list with 31 pharmaproducts (brand names).
Result is a list of pharmaceutical products (“pharmaproducts”):
In this example Bufferin® and Ecotrin® are compared to each other. Both pharmaproducts contain no
other drugs except acetylsalicylic acid and appear to be used with a similar frequency, extrapolated
from the number of reports in the database.
Query construction: Choose “pharmaproduct” in “OpenVigil Search”; product name is “bufferin” (or
“ecotrin”); adverse event is “gastrointestinal haemorrhage”. Data presentation and statistics are
“Frequentist methods”; output format of the query result is “HTML”.
Comparing the PRR of bufferin (3.194307) and ecotrin (4.692033), gastrointestinal
haemorrhage appears likely to be an adverse drug reaction to both pharmaproducts. Gastrointestinal
haemorrhage is reported about 3.2 times as frequently for bufferin as for all other drugs, and about
4.7 times as frequently for ecotrin. The values for Chi-Squared support the results of the PRR: 7.00987
for bufferin (for one degree of freedom this corresponds to a p-value of about 0.008) [10] and 34.278924
for ecotrin (p-value below 0.001).
[10] https://people.richland.edu/james/lecture/m170/tbl-chi.html
The results of the two contingency tables can be merged in one table for further analysis (e.g., Fisher
exact test, Chi-Squared test):
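As an illustration of such a further analysis, the following sketch implements a two-sided Fisher exact test for a merged 2x2 table using only the Python standard library. The layout (rows = bufferin and ecotrin, columns = gastrointestinal haemorrhage and all other events) and the counts in the usage line are hypothetical; the real counts have to be taken from the two contingency tables.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of all tables at most as likely as the observed one
    # (the small factor guards against floating-point ties)
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )

# Hypothetical counts: bufferin (12 haemorrhage reports, 400 other),
# ecotrin (30 haemorrhage reports, 620 other)
print(fisher_exact_two_sided(12, 400, 30, 620))
```

A small p-value would indicate that the two pharmaproducts differ in their proportion of gastrointestinal haemorrhage reports.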
3.9. Compare OpenVigil 1 & 2 data (no. reports, PRR) to published data
Introduction: This example stresses the importance of carefully checking any results obtained.
Common pitfalls are
• counting multiplicates,
• counting ambiguous reports and
• accidentally losing a portion of the raw data.
These can happen at any point in the workflow. Therefore, it is important to know your data! Try
different extraction conditions, check numbers for plausibility and browse result lists to manually
screen the data.
Problem: Sakaeda et al. (Sakaeda T, Tamon A, Kadoyama K, Okuno Y. Data mining of the public
version of the FDA Adverse Event Reporting System. Int. J. Med. Sci. 2013; 10(7):796-803. doi:
10.7150/ijms.6048, http://www.medsci.org/v10p0796.htm) report their results of data-mining AERS
data from 2004 to 2009 for “warfarin” and other drugs and the adverse event “haematemesis” (see
table below at the end of this example). The number of co-occurrences (drug used, adverse event
seen) was reported to be 268. A subsequent analysis of disproportionality did not reveal a statistically
significant association.
Can we reproduce these data?
Discussion: OpenVigil 2 operates on cleaned and validated FDA data only. The drug “warfarin” is
referred to in AERS data/marketed as
• warfarin
• Waran
• Jantoven
• Coumadin
• Lawarin
• Marevan
• Warfant
• coumarin derivative
and perhaps other names which we could not identify.
Hint: You can also use OpenVigil 2 to learn more about drugs and pharmaproducts. Select
Browse and Drugs to see a list of drugnames. Clicking on a drug shows you the associated
pharmaproducts (= brandnames).
Drugs named something like “WARFARIN 5 MG” are currently discarded in OpenVigil 2 since the
current version of OpenVigil 2 does not know what “5 MG” means. The misspelled “COUMADIN
(WAFRARIN SODIUM)” is not ambiguous for humans and should be mapped to warfarin, too. We are
trying to improve that while at the same time keeping all drug mapping unambiguous: Verbatim
drugnames containing “BLIND” (like “BLINDED: WARFARIN SODIUM”) or ambiguous drugnames like
“COUMADIN (CLOTRIMAZOLE)” must never be mapped to warfarin.
Finally, one has to decide whether “COUMARIN DERIVATE” should be included since drugs named
like this or named “COUMARIN AND TROXERUTIN” or “ESBERIVEN (COUMARIN, HEPARIN SODIUM,
MELILOT, RUTIN)” are probably not used to inhibit blood clotting and might contain no warfarin (a
4-hydroxy derivative of coumarin) at all.
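The conservative mapping policy described above can be sketched as follows. This is not the OpenVigil mapping system, only an illustration of the principle "map exact, known names and discard everything else". The function name and the synonym set are hypothetical and deliberately incomplete.

```python
# Illustrative synonym set (assumption); a real mapping table is far larger.
WARFARIN_SYNONYMS = {
    "WARFARIN", "WARFARIN SODIUM", "COUMADIN", "JANTOVEN", "MAREVAN", "WARAN",
}

def map_to_warfarin(verbatim):
    """Return "warfarin" for unambiguous verbatim names, otherwise None."""
    # Normalise case and whitespace before comparing
    name = " ".join(verbatim.upper().split())
    # Blinded study medication must never be mapped
    if "BLIND" in name:
        return None
    # Only exact matches are accepted; names with a strength, misspellings
    # or a second substance are discarded rather than guessed
    return "warfarin" if name in WARFARIN_SYNONYMS else None

for verbatim in ("Coumadin", "BLINDED: WARFARIN SODIUM",
                 "COUMADIN (CLOTRIMAZOLE)", "WARFARIN 5 MG"):
    print(verbatim, "->", map_to_warfarin(verbatim))
```

The price of this strictness is visible in the last example: a name that is clear to a human reader is lost, which is exactly the trade-off discussed in the text.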
The 162 cases in OpenVigil 2.0 are correct: You can look at the original free-text drugname and verify
that only precise, unambiguous reports were considered.
However, OpenVigil 2.0 uses unique ISRs (162) for counting while unique CASEs (140) are probably
the only reasonable way to count in this scenario. This mode of counting was added in OpenVigil 2.1.
Unfortunately, OpenVigil currently does not offer an automated check for multiplicates other than
via CASE/ISR, so the result list has to be screened manually.
Sakaeda states that “the total number of reports used was 2,231,029”.
AERS raw data is published quarterly. The lines in the DEMO AERS files from 2004Q1 to 2009Q4 were
counted:
wc -l DEMO0[4-9]*TXT
2234955
The result contains 24 header lines. Thus the real number of records is 2234931.
That is 3,902 reports more than Sakaeda used. Some lines are discarded before importing
them into the SQL database due to syntax errors (i.e., wrong number of items per line). The current
importer of OpenVigil 1 just skips all non-matching data. The OpenVigil 2 import process provides an
error correction mode and suggestions like merging two adjacent text lines. E.g., where OpenVigil 1
has discarded the two lines, OpenVigil 2 has merged them into one record. OpenVigil 1 stores these
import failures in the database (http://www.uni-kiel.de/pharmacology/pvt/openvigil.php?cd=if).
However, the DEMO files in question had only one premature line break, in DEMO09Q3, which results
in two lines being discarded. So the raw data still contain 3,900 to 3,901 reports more than Sakaeda
used.
Within OpenVigil 2 there is currently no easy way to analyse certain data files only. Instead, we have
to rely on date fields in the DEMO table that tell us whether a report falls into the period 2004 to
2009. Of note, later DEMO files can contain reports from earlier quarters. OpenVigil 1 offers the
possibility to include only, or to exclude, data from certain quarterly FDA AERS files.
So we have counted the total number of reports (containing duplicates), reports with a unique ISR and
reports with a unique CASENO for the period, where the time period is defined by either FDA_DT,
MFR_DT or EVENT_DT, for all data imported from DEMO04Q1 to DEMO09Q4 in OpenVigil 1.
Out of curiosity, we have also counted all reports/cases minus the reports in the data files from
2004Q1 to 2005Q2 (see below for explanation):

Data files and filtering                        all reports   unique ISR     unique CASENO
all files (2004-2012) and
  2003-12-31 < FDA_DT < 2010-01-01              2234986       2231030        1645633
all reports in the quarterly files 2004-2009    2234929       2231036        1645605
only the quarterly files 2004-2009 and
  2003-12-31 < date < 2010-01-01, with date =
    FDA_DT                                      2234923       2231030        1645600
    EVENT_DT                                    1655915       1653317        1184848
    MFR_DT                                      2180288       2176768        1584290
    FDA_DT minus data files
      DEMO04Q1 till DEMO05Q2                    1805798       1803719        1331082
Sakaeda 2013                                    2231029       not provided   1644220
raw line count (minus headers)                  2234931       n/a            n/a
Warfarin 148, Waran 3, Jantoven 1, Coumadin 109 (originally 110, but manual inspection of the list
shows one overlap with warfarin since “WARFARIN 2.5 MG COUMADIN” was reported), Marevan 7,
adding up to 268.
Thus, at first glance, we have found exactly as many “co-occurrences” as Sakaeda.
Calculating the PRR is not automatically possible in OpenVigil 1.2.6 since the total number of reports
containing one of the above listed terms needs to be added up while avoiding double counting.
SQL query construction in OpenVigil 1: We use the SQL code that was generated by the query above
and fine-tune it to
SELECT DRUG.DRUGNAME, COUNT(DEMO.ISR), COUNT(DISTINCT DEMO.ISR),
       COUNT(DISTINCT DEMO.CASENO)
FROM DRUG, REAC, DEMO
WHERE (DRUG.DRUGNAME LIKE "%WARAN%" OR DRUG.DRUGNAME LIKE "%WARFARIN%"
       OR DRUG.DRUGNAME LIKE "%COUMADIN%" OR DRUG.DRUGNAME LIKE "%JANTOVEN%"
       OR DRUG.DRUGNAME LIKE "%MAREVAN%")
  AND REAC.PT = "HAEMATEMESIS"
  AND DEMO.FDA_DT >= "2004-01-01" AND DEMO.FDA_DT <= "2009-12-31"
  AND DRUG.ISR = REAC.ISR AND DRUG.ISR = DEMO.ISR
GROUP BY DRUG.DRUGNAME DESC;
The result is a list of report, ISR and CASE counts grouped by the different drugnames, adding up to
268 reports, of which 256 have a unique ISR, of which 212 have a unique CASENO:
Therefore, only 212 unique patients for warfarin (and generics) and the adverse event haematemesis
appear to exist. However, re-performing the query without grouping (no “GROUP BY DRUG.DRUGNAME
DESC”) shows even fewer, just 202 distinct cases:
Obviously, some patients were on more than just one warfarin-containing drug and were thus listed
several times in the output shown above.
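The effect can be reproduced with a toy example: as soon as one case lists two different drugnames, the distinct counts per group add up to more than the overall distinct count. The sketch uses Python's built-in sqlite3 module and a simplified, hypothetical table with made-up rows; it is not the OpenVigil 1 schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Simplified, hypothetical table: one row per reported drugname
con.execute("CREATE TABLE drug (isr INTEGER, caseno INTEGER, drugname TEXT)")
con.executemany(
    "INSERT INTO drug VALUES (?, ?, ?)",
    [
        (1, 100, "WARFARIN"),
        (1, 100, "COUMADIN"),  # the same case lists a second warfarin product
        (2, 200, "WARFARIN"),
    ],
)

# Distinct cases per drugname (grouped) ...
per_name = con.execute(
    "SELECT drugname, COUNT(DISTINCT caseno) FROM drug "
    "GROUP BY drugname ORDER BY drugname"
).fetchall()
# ... versus distinct cases over all drugnames (ungrouped)
overall = con.execute("SELECT COUNT(DISTINCT caseno) FROM drug").fetchone()[0]

grouped_sum = sum(count for _, count in per_name)
print(per_name, grouped_sum, overall)
```

Case 100 is counted once under each drugname, so the grouped sum is 3 while only 2 distinct cases exist.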
The next step was to inspect the raw data to find any oddities:
It became apparent that no reports from 2004 and from January to June 2005 were included in this list. How
could that be? We realized that the DEMO data prior to 2005Q3 were not imported properly into
OpenVigil 1.2.3 at the time of the above presented analyses due to a change in the FDA data format
in one data table. Re-performing the analysis with these data yields more reports (and cases):
There appear to be 413 reports from 299 distinct cases.
Hint: You can emulate losing the data prior to 2005Q3 in OpenVigil 1 by adding a corresponding restriction
to the WHERE clause of your SQL query, like we did to obtain the screenshots above in spite of now
using the complete dataset.
It is always important to look at the raw data before trusting any automated counts:
Ideally, the resulting list has to be scanned completely for multiplicates. E.g., we found the reports
#5503640 and #5502179, which are linked to different CASENOs but otherwise have identical
demographic data including the date of death. Another example is #5064922 and #5655430. More
examples might exist, but we have not yet established a fast protocol to detect multiplicates.
However, extrapolating from our findings here, we estimate that less than 1% are multiplicates.
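One conceivable way to pre-screen for such multiplicates is to group reports by their demographic data and list all groups with more than one ISR. This is only a sketch, not an established protocol: the field names and the demographic values in the sample are hypothetical, and every candidate still has to be checked manually.

```python
from collections import defaultdict

def candidate_multiplicates(reports,
                            keys=("age", "sex", "event_date", "death_date")):
    """Group reports with identical demographic fields (field names are
    hypothetical) and return the ISR lists of groups with several reports."""
    groups = defaultdict(list)
    for report in reports:
        fingerprint = tuple(report.get(key) for key in keys)
        # Skip incomplete reports: missing values would match far too easily
        if any(value in (None, "") for value in fingerprint):
            continue
        groups[fingerprint].append(report["isr"])
    return [sorted(isrs) for isrs in groups.values() if len(isrs) > 1]

# Made-up demographic data attached to the ISR numbers mentioned above
reports = [
    {"isr": 5503640, "age": 71, "sex": "M",
     "event_date": "2007-03-01", "death_date": "2007-03-05"},
    {"isr": 5502179, "age": 71, "sex": "M",
     "event_date": "2007-03-01", "death_date": "2007-03-05"},
    {"isr": 5064922, "age": 64, "sex": "F",
     "event_date": "2006-07-12", "death_date": ""},
]
print(candidate_multiplicates(reports))
```

Requiring all fields to be filled keeps the number of false candidates low, at the cost of missing multiplicates among incomplete reports.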
Similarly, one would need to run the above query without the adverse event and a third time with the
adverse event but without the drugs to populate the 2x2 contingency table for disproportionality
analysis. Before these numbers can be trusted, duplicates have to be eliminated (e.g., cases 4004520
and 3909737 appear to be the same). Furthermore, the dataset in question has records like
“[THERAPY UNSPECIFIED]” (76 records), “.” (16 records) or “1 CONCOMITANT DRUG” (14 records) which are
impossible to map to a drugname and thus need a pre-defined way of dealing with them. We’ll leave this
as an exercise to the reader. ;-)
Conclusions:
Using OpenVigil 1 is tedious work: You have to work out yourself which names and synonyms to
use. Due to the constraints of the OpenVigil 1 implementation currently running at Kiel University,
you cannot put everything into one big query. The output has to be checked manually to avoid
duplicates.
Using OpenVigil 1 with SQL allows extraction of raw data which can be cleansed further, e.g., of the 268
and 413 reports initially mentioned above, at most 202 and 299, respectively, are unique cases.
OpenVigil 2 is much easier to use but offers just 140 (or 143) of the putative 299 cases. However,
here you can trust that only valid reports with an unambiguous mapping of the free-text drugname
to a USAN drugname were included in the analysis. A reason for not finding the potential additional
reports can be our drugname mapping system: Names like “WARFARIN 5 MG”, “WARFARIN
(WARFARIN POTASSIUM)”, “WARFARIN 2.5 MG COUMADIN“ are clear and understandable for
human users but the drugname mapping system currently discards these verbatim “drugnames” to
avoid potential mismapping.
There is no exact information available on how Sakaeda extracted the 268 cases and the other non-
case-numbers needed for disproportionality analysis since the Japanese closed source system CzeekV
by Kyoto Constella Technology was used. It is interesting to see that we can reproduce the number
268 when counting reports (including duplicates) and not using data prior to 2005Q3.
We can see that changes in the number of cases (268 vs 162) and non-cases (the remaining 3 fields of
the 2x2 contingency table) can have a serious impact on signal generation (PRR 1.991 is smaller than
2 and does thus not yield a signal).
4. SQL-database schema:
5. References and resources
http://math.hws.edu/javamath/ryan/ChiSquare.html
http://en.wikipedia.org/wiki/Adverse_event
http://en.wikipedia.org/wiki/Loperamide
http://en.wikipedia.org/wiki/Odds_ratio
http://en.wikipedia.org/wiki/Pharmacovigilance
http://en.wikipedia.org/wiki/Pirenzepine
http://en.wikipedia.org/wiki/Proportional_reporting_ratio
http://en.wikipedia.org/wiki/SQL
https://people.richland.edu/james/lecture/m170/tbl-chi.html