Knowledge Graph 4 Paper
Connecting Big Data Semantics
Ying Ding
Indiana University
Outline
Vision
Use Case: VIVO Ontology
Use Case: Chem2Bio2RDF
Challenges
VISION
Vision: Changes in Search
Strings vs. things
Relations matter: connecting things/entities
Subgraph: context is king
Future search: string → entity → relation → subgraph
Filippo Menczer & Elinor Ostrom
http://ella.slis.indiana.edu/~dingying/pathfinder3/bin-debug/pathfinder.html
Entities
Entities are everywhere
Entities on the Web: person, location, organization, book, music (vivoweb.org)
Entities in medicine: gene, drug, disease, protein, side effect (chem2bio2rdf.org)
VIVO
VIVO: National networking of scientists
VIVO: $12.5M funded by the National Institutes of Health to enable national networking of scientists
9/1/2009 to 8/31/2012, with a one-year extension
www.vivoweb.org, http://sourceforge.net/projects/vivo/
7 partners (Univ. of Florida, Cornell Univ., Indiana University, Washington Univ., Scripps, Weill Cornell, Ponce Medical School)
It uses Semantic Web technologies to model scientists and provides federated search to enhance the discovery of researchers and collaborators across the country
Together with its sister project eagle-i ($13M), it will provide the semantic portals to network people and share resources.
VIVO Ontology: Modeling Network of Scientists
Network Structure:
People: foaf:Person, foaf:Organization
Output: vivo:InformationResources
Relationship: vivo:role
Academic Setting:
Research (bibo:Document, vivo:Grant, vivo:Project, vivo:Software, vivo:Dataset, vivo:ResearchLaboratory)
Teaching (vivo:TeacherRole, vivo:Course)
Service (vivo:Service, vivo:EditorRole, vivo:OrganizerRole)
Expertise (skos:Concept)
Relationships have nuances
The VIVO ontology supports representing rich information about relationships and how they change over time:
description and duration of a person's participation in a project or event
current and former employment, with titles and dates
author order in a publication
Implemented as classes whose members we call context nodes
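The context-node idea can be illustrated with a small sketch. It assumes plain Python tuples stand in for triples; every `ex:` identifier and property name below is hypothetical and is not a term from the VIVO ontology. Only the pattern follows the slide: an intermediate node sits between a person and an organization and carries the title and dates.

```python
# Hypothetical identifiers throughout (ex:...); not VIVO ontology terms.
# Each ex:positionN is a context node holding the details of one employment.
triples = {
    ("ex:person1", "ex:hasPosition", "ex:position1"),
    ("ex:position1", "ex:inOrganization", "ex:org1"),
    ("ex:position1", "ex:title", "Assistant Professor"),
    ("ex:position1", "ex:startYear", 2005),
    ("ex:position1", "ex:endYear", 2010),
    ("ex:person1", "ex:hasPosition", "ex:position2"),
    ("ex:position2", "ex:inOrganization", "ex:org1"),
    ("ex:position2", "ex:title", "Associate Professor"),
    ("ex:position2", "ex:startYear", 2010),
}


def objects(subject, predicate):
    """Return all objects for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def employment_history(person):
    """List (title, start, end) per context node, ordered by start year."""
    rows = []
    for node in objects(person, "ex:hasPosition"):
        title = objects(node, "ex:title")[0]
        start = objects(node, "ex:startYear")[0]
        end = (objects(node, "ex:endYear") or [None])[0]  # None = current
        rows.append((title, start, end))
    return sorted(rows, key=lambda r: r[1])


history = employment_history("ex:person1")
```

A direct person-to-organization edge could not hold two titles with different dates; the intermediate node is what makes current and former employment distinguishable.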
VIVO ontology localization
Different localization required by different institutions
UF, Cornell, IU, WASHU, Scripps, MED Cornell
How to localize:
Adding a local namespace:
indiana: http://vivo.iu.edu/ontology/vivoindiana/
core: http://vivoweb.org/ontology/core#
Local classes are subclasses of the VIVO Core
foaf:Person → core:NonAcademic → indiana:ProfessionalStaff → indiana:AdministrativeServices
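A minimal sketch of how local classes can hang off the core ontology, assuming the class names on the slide form a single subclass chain (that reading, and the exact spelling of the local class names, are assumptions). A dictionary of child-to-parent edges stands in for rdfs:subClassOf statements.

```python
# Child -> parent edges; the chain is an assumed reading of the slide.
SUBCLASS_OF = {
    "indiana:AdministrativeServices": "indiana:ProfessionalStaff",
    "indiana:ProfessionalStaff": "core:NonAcademic",
    "core:NonAcademic": "foaf:Person",
}


def ancestors(cls):
    """Walk subclass edges upward and return the superclasses in order."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def is_a(cls, target):
    """True if cls equals target or is a (transitive) subclass of it."""
    return cls == target or target in ancestors(cls)


local_chain = ancestors("indiana:AdministrativeServices")
```

Because every local class resolves to a core class, a query written against the core namespace (for example, "all foaf:Person instances") still matches locally typed individuals.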
Modeling examples: Research
Scenario: Prof. Katy Börner co-authored the following publication with Nianli, Russell, and Angela: Börner, Katy, Ma, Nianli, Duhon, Russell J., Zoss, Angela M. (2009). Open Data and Open Code for S&T Assessment. IEEE Intelligent Systems, 24(4), pp. 78-81, July/August.
<http://vivo.iu.edu/individual/person25557> rdf:type <http://vivoweb.org/ontology/core#FacultyMember> .
<http://vivo.iu.edu/individual/person25557> <http://vivoweb.org/ontology/core#authorInAuthorship> <http://vivo.iu.edu/individual/n74> .
<http://vivo.iu.edu/individual/n74> rdf:type <http://vivoweb.org/ontology/core#Authorship> .
<http://vivo.iu.edu/individual/n74> <http://vivoweb.org/ontology/core#linkedInformationResource> <http://vivo.iu.edu/individual/n7109> .
<http://vivo.iu.edu/individual/n7109> rdf:type <http://purl.org/ontology/bibo/Article> .
<http://vivo.iu.edu/individual/person714388> rdf:type <http://vivoweb.org/ontology/core#NonAcademic> .
<http://vivo.iu.edu/individual/person714388> <http://vivoweb.org/ontology/core#authorInAuthorship> <http://vivo.iu.edu/individual/n2881> .
<http://vivo.iu.edu/individual/n2881> rdf:type <http://vivoweb.org/ontology/core#Authorship> .
<http://vivo.iu.edu/individual/n2881> <http://vivoweb.org/ontology/core#authorRank> 2 .
<http://vivo.iu.edu/individual/n2881> <http://vivoweb.org/ontology/core#linkedInformationResource> <http://vivo.iu.edu/individual/n7109> .
RDF Graph
[Figure: RDF graph of the modeling example. individual:person25557 (rdf:type core:FacultyMember) and individual:person714388 (rdf:type core:NonAcademic) each point via core:authorInAuthorship to a core:Authorship node (individual:n74 and individual:n2881; the latter has core:authorRank 2). Both Authorship nodes point via core:linkedInformationResource to individual:n7109 (rdf:type http://purl.org/ontology/bibo/Article).]
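To show how the Authorship context nodes in the modeling example are traversed, here is a minimal sketch that stores the non-type triples as Python tuples and finds co-authors by walking person → Authorship → article → Authorship → person. The `coauthors` helper is hypothetical and not part of VIVO; a real deployment would more likely run a SPARQL query against a triple store.

```python
CORE = "http://vivoweb.org/ontology/core#"
IND = "http://vivo.iu.edu/individual/"

# The non-rdf:type triples from the modeling example.
triples = [
    (IND + "person25557", CORE + "authorInAuthorship", IND + "n74"),
    (IND + "n74", CORE + "linkedInformationResource", IND + "n7109"),
    (IND + "person714388", CORE + "authorInAuthorship", IND + "n2881"),
    (IND + "n2881", CORE + "authorRank", 2),
    (IND + "n2881", CORE + "linkedInformationResource", IND + "n7109"),
]


def coauthors(person):
    """People linked to the same information resource through Authorship nodes."""
    in_authorship = CORE + "authorInAuthorship"
    linked = CORE + "linkedInformationResource"
    # Authorship nodes of this person.
    own = {o for s, p, o in triples if s == person and p == in_authorship}
    # Papers those Authorship nodes point to.
    papers = {o for s, p, o in triples if s in own and p == linked}
    # All Authorship nodes pointing to the same papers.
    shared = {s for s, p, o in triples if p == linked and o in papers}
    return sorted(
        s for s, p, o in triples
        if p == in_authorship and o in shared and s != person
    )


result = coauthors(IND + "person25557")
```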
Applications
Querying semantic data
SPARQL query builder
http://vivoonto.slis.indiana.edu/SPARQL/
Federated Search
VIVO Search
http://vivosearch.org/
CHEM2BIO2RDF
Big Data in Life Sciences
There is now an incredibly rich resource of public information relating compounds, targets, genes, pathways, and diseases. Just for starters, there is in the public domain information on:
69 million compounds and 449,392 bioassays (PubChem)
59 million compound bioactivities (PubChem Bioassay)
4,763 drugs (DrugBank)
9 million protein sequences (SwissProt) and 58,000 3D structures (PDB)
14 million human nucleotide sequences (EMBL)
22 million life sciences publications, 800,000 new each year (PubMed)
Multitude of other sets (drugs, toxicogenomics, chemogenomics, metagenomics)
Even more important are the relationships between these entities. For example, a chemical compound can be linked to a gene or a protein target in a multitude of ways:
Biological assay with percent inhibition, IC50, etc.
Crystal structure of ligand/protein complex
Co-occurrence in a paper abstract
Computational experiment (docking, predictive model)
Statistical relationship
System association (e.g. involved in same pathways, cellular processes)
How to take advantage of big data?
[Figure: layered diagram, listed here from bottom to top]
Databases & Publications
Networks of data & relationships: Chem2Bio2RDF, PubMedNet (Compounds, Drugs, Proteins, Genes, Pathways, Diseases, Side Effects, Publications)
Integrative Tools & Algorithms: SPARQL query builder; Association Search & pathfinding; ChemoHub: network predictive models; Topic models & ranking; WENDI & Chemogenomic Explorer; Plotviz 3D visualization
Knowledge discovery processes
New biomedical insights
Nuclear receptors: PPARgamma, PXR
[Figure: data about Patient, Disease, Tissue, Cell, Pathway, DNA, RNA, Protein, and Drug is spread across Text, CSV, Table, HTML, and XML sources.]
Text, CSV, Table, HTML, XML → RDF
We need a data format! We need semantics!
Example: http://chem2bio2rdf.org/drug/troglitazone bindTo http://chem2bio2rdf.org/target/PPARG
Chem2Bio2RDF
NCI Human Tumor Cell Lines Data
PubChem Compound Database
PubChem Bioassay Database
PubChem Descriptions of all PubChem bioassays
Pub3D: A similarity-searchable database of minimized 3D structures for PubChem compounds
Drugbank
MRTD: An implementation of the Maximum Recommended Therapeutic Dose set
Medline: IDs of papers indexed in Medline, with SMILES of chemical structures
ChEMBL chemogenomics database
KEGG Ligand pathway database
Comparative Toxicogenomics Database
PhenoPred Data
HuGEpedia: an encyclopedia of human genetic variation in health and disease.
In total: 31M chemical structures; 59M bioactivity data points; 3M/19M publications; ~5,000 drugs
[Figure: Chem2Bio2RDF architecture. Labels shown: dereferenceable URI; PlotViz: visualization; Bio2RDF; browsing; Cytoscape plugin; RDF; Chem2Bio2RDF triple store; linked path generation and ranking; LODD; UniProt; others; SPARQL endpoints; third-party tools.]
Relating Pathways to Adverse Drug Reactions
RDF alone is not enough
Need standardization
Troglitazone binds to PPARG
Romozins binds to PPARG
Romozins is another name of Troglitazone
Chem2Bio2OWL
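The standardization problem can be sketched in a few lines: two statements that look different as strings collapse into one once names are mapped to a canonical entity. The `SAME_AS` table and the `bindsTo` label are illustrative assumptions, not the Chem2Bio2OWL vocabulary; treating "Troglitazone" as the canonical name is also an assumption.

```python
# Synonym table in the spirit of owl:sameAs; the canonical choice is assumed.
SAME_AS = {"Romozins": "Troglitazone"}

statements = [
    ("Troglitazone", "bindsTo", "PPARG"),
    ("Romozins", "bindsTo", "PPARG"),
]


def canonical(name):
    """Map a name to its canonical form, leaving unknown names unchanged."""
    return SAME_AS.get(name, name)


def normalize(stmts):
    """Rewrite subjects and objects to canonical names and drop duplicates."""
    return sorted({(canonical(s), p, canonical(o)) for s, p, o in stmts})


merged = normalize(statements)
```

Without the mapping, a query for targets of Troglitazone would miss facts recorded under the other name; with it, both source statements contribute to the same entity.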
RDF Search
Target for Troglitazone
PREFIX c2b2r: <http://chem2bio2rdf.org/chem2bio2rdf.owl#>
PREFIX bp: <http://www.biopax.org/release/biopax-level3.owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?target
FROM <http://chem2bio2rdf.org/owl#>
WHERE {
  ?chemical rdfs:label ?drugName ;
            c2b2r:hasInteraction ?interaction .
  ?interaction c2b2r:hasTarget [ bp:name ?target ] ;
               c2b2r:drugTarget true .
  FILTER (str(?drugName) = "Troglitazone")
}
Mashed Chem2Bio2RDF vs. Annotated Chem2Bio2OWL
SEMANTIC GRAPH MINING: PATH FINDING ALGORITHM
[Figure: example graph with nodes numbered 1 to 26]
Dijkstra's algorithm
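For reference, a compact sketch of Dijkstra's algorithm on a small weighted graph. The node names and edge weights are made up for illustration and carry no biological meaning; the slides do not say how edges in the semantic graph are weighted, so the weights here are purely an assumption.

```python
import heapq


def dijkstra(graph, source, target):
    """Return (cost, path) of a cheapest path, or (inf, []) if unreachable.

    `graph` maps each node to a dict of neighbor -> non-negative edge weight.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry
        done.add(node)
        if node == target:
            break
        for nbr, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


# Hypothetical graph: generic node names, arbitrary weights.
graph = {
    "drug": {"target_a": 1.0, "gene_b": 2.0},
    "target_a": {"drug": 1.0, "disease": 4.0},
    "gene_b": {"drug": 2.0, "disease": 1.0},
    "disease": {"target_a": 4.0, "gene_b": 1.0},
}
cost, path = dijkstra(graph, "drug", "disease")
```

Here the route through `gene_b` (total weight 3.0) beats the route through `target_a` (total weight 5.0), even though its first hop is more expensive.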
BioLDA
Latent Dirichlet Allocation (LDA)
The core of a group of powerful statistical modeling techniques for automated extraction of latent topics from large document collections
BioLDA
Extended LDA model with bio-terms as a latent variable
Bio-terms: compound, gene, drug, disease, protein, side effect, pathways
Calculate bio-term entropies over topics
Use the Kullback-Leibler divergence as the non-symmetric distance measure for two bio-terms over topics
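The two measures named above can be written down directly. This is a minimal sketch assuming each bio-term is described by a probability distribution over topics; the three-topic distributions are invented for illustration and are not BioLDA output.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


def kl_divergence(p, q):
    """D_KL(p || q); assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical topic distributions for two bio-terms (3 topics).
term_a = [0.7, 0.2, 0.1]
term_b = [0.5, 0.3, 0.2]
uniform = [1 / 3, 1 / 3, 1 / 3]

a_to_b = kl_divergence(term_a, term_b)
b_to_a = kl_divergence(term_b, term_a)
```

A term concentrated on few topics has low entropy, while a term spread evenly has the maximum, log(number of topics). The two divergences `a_to_b` and `b_to_a` generally differ, which is what "non-symmetric" means here.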
Example: Topic 10
Apply BioLDA on 336,899 PubMed article abstracts in 2009 and extract 50 topics
Diversity subgraph
Fig. Ranked association graphs between myocardial infarction and Troglitazone
Thiazolidinediones (TZDs): revolutionary treatment for type II diabetes
Troglitazone (Rezulin): withdrawn in 2000 (liver disease)
Rosiglitazone (Avandia): restricted in 2010 (cardiac disease)
Rosiglitazone bound into PPAR
Pioglitazone: ???? (does decrease blood sugar levels, was associated with bladder tumors, and has been withdrawn in some countries)
PPARG: TZD target
SAA2: Involved in inflammatory response implicated in cardiovascular disease (Current Opinion in Lipidology 15, 3, 269-278, 2004)
APOE: Apolipoprotein E3, essential for lipoprotein catabolism. Implicated in cardiovascular disease.
ADIPOQ: Adiponectin, involved in fatty acid metabolism. Implicated in metabolic syndrome, diabetes, and cardiovascular disease
CYP2C8: Cytochrome P450 present in cardiovascular tissue and involved in metabolism of xenobiotics
CDKN2A: Tumor suppression gene
SLC29A1: Membrane transporter
Semantic Prediction
http://chem2bio2rdf.org/slap
[Figure: From ligand perspective. Drug 1 ? Target 1; Drug 2 bind; links labeled substructure, side effect, chemical ontology, gene expression profile.]
[Figure: From target perspective. Drug 1 ? Target 1; Ligand bind; Target 2; links labeled sequence, 3D structure, GO.]
Example: Troglitazone and PPARG
Association score: 2385.9
Association significance: 9.06 x 10^-6 => missing link predicted
Topology is important for association
[Figure: Cmpd 1 → hasSubstructure → Substructure 1 → hasSubstructure → Cmpd 2 → bind → Protein 1]
SLAP Pipeline
Path filtering
Cross-check with SEA
SEA analysis (Nature 462, 175-181, 2009) predicts 184 new compound-target pairs, 30 of which were experimentally tested
23 of these pairs were experimentally validated (<15 µM), including 15 aminergic GPCR targets and 8 which crossed major receptor classification boundaries
9 of the aminergic GPCR target pairings were correctly predicted by SLAP (p<0.05); for the other 6, the compounds were not present in our set
1 of the 8 cross-boundary pairs was predicted
Assessing drug similarity from biological function
Took 157 drugs with 10 known therapeutic indications, and created SLAP profiles against 1,683 human targets
Pearson correlation between profiles > 0.9 from SLAP was used to create associations between drugs
Drugs with the same therapeutic indication unsurprisingly cluster together
Some drugs with similar profiles have different indications: potential for use in drug repurposing?
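The thresholding step can be sketched as follows, assuming each drug's profile is a vector of scores against a fixed list of targets. The drug names and four-element profiles are invented for illustration; the real profiles span 1,683 targets.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (non-zero variance assumed)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical profiles: one score per target, same target order for all drugs.
profiles = {
    "drug_a": [0.9, 0.1, 0.8, 0.2],
    "drug_b": [0.8, 0.2, 0.9, 0.1],
    "drug_c": [0.1, 0.9, 0.2, 0.8],
}


def associations(profiles, threshold=0.9):
    """Pairs of drugs whose profile correlation exceeds the threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(profiles), 2)
        if pearson(profiles[a], profiles[b]) > threshold
    ]


edges = associations(profiles)
```

The resulting pairs are the edges of a drug-drug association network; clusters in that network can then be compared against known therapeutic indications.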
Challenges
Generating entities: converting strings to things
Using URIs to identify/integrate entities (RDF)
Using common schemas to represent semantics (ontologies)
Managing relations:
Model properties of relations
Search and rank relations
Handling context: tricky
Triples vs. quads
Provenance: who says what, data provenance, how (process) provenance, workflow provenance
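A small sketch of why quads help with "who says what": the fourth element names the graph (source) that asserts the triple. All identifiers below, including the source names and the `hasSideEffect` statement, are hypothetical placeholders.

```python
from collections import namedtuple

# A quad is a triple plus the named graph (source) that asserts it.
Quad = namedtuple("Quad", "subject predicate obj graph")

# Hypothetical statements and sources.
quads = [
    Quad("drug:troglitazone", "bindTo", "target:PPARG", "graph:sourceA"),
    Quad("drug:troglitazone", "bindTo", "target:PPARG", "graph:sourceB"),
    Quad("drug:troglitazone", "hasSideEffect", "effect:example", "graph:sourceB"),
]


def who_says(subject, predicate, obj):
    """Return the named graphs (sources) asserting a given triple."""
    return sorted(
        q.graph for q in quads
        if (q.subject, q.predicate, q.obj) == (subject, predicate, obj)
    )


def as_triples(quads):
    """Dropping the fourth element merges duplicates and loses the provenance."""
    return {(q.subject, q.predicate, q.obj) for q in quads}


sources = who_says("drug:troglitazone", "bindTo", "target:PPARG")
```

With plain triples, the two `bindTo` assertions become indistinguishable; with quads, the number of independent sources behind a relation remains available, for example as a ranking signal.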
Others:
Query efficiency,
Data security,
Data quality
Big Data + Big Challenge = Unlimited Potential