Knowledge Graph 4 Paper

The document discusses a knowledge graph called Chem2Bio2RDF that connects big data from life sciences. It integrates data from various sources like PubChem, DrugBank, Medline etc. and represents relationships between entities such as compounds, drugs, proteins, genes, pathways and diseases. It addresses the challenge of making sense of large amounts of disconnected data in life sciences by converting it to RDF and modeling semantic relationships. This allows for knowledge discovery through techniques like SPARQL querying, association search, network analysis and visualization.

Knowledge Graph:

Connecting Big Data Semantics
Ying Ding
Indiana University
Outline
Vision
Use Case: VIVO Ontology
Use Case: Chem2Bio2RDF
Challenges
VISION
Vision: Changes in Search
Strings vs. things
Vision: Changes in Search
Relation matters: connecting things/entities
Vision: Changes in Search
Subgraph: Context is king
Vision: Changes in Search
Future search:
string → entity → relation → subgraph
Filippo Menczer & Elinor Ostrom
http://ella.slis.indiana.edu/~dingying/pathfinder3/bin-debug/pathfinder.html
Entities
Entities are everywhere
Entities on the Web: person, location, organization, book, music (vivoweb.org)
Entities in medicine: gene, drug, disease, protein, side effect (chem2bio2rdf.org)
VIVO
VIVO: National networking of scientists
VIVO: $12.5M funded by the National Institutes of Health to enable national networking of scientists
9/1/2009 to 8/31/2012, with one-year extension
www.vivoweb.org, http://sourceforge.net/projects/vivo/
7 partners (Univ. of Florida, Cornell Univ., Indiana University, Washington Univ., Scripps, Weill Cornell, Ponce Medical School)
It utilizes Semantic Web technologies to model scientists and provides federated search to enhance the discovery of researchers and collaborators across the country
Together with its sister project eagle-i ($13M), it will provide the semantic portals to network people and share resources.
VIVO Ontology:
Modeling Network of Scientists
Network Structure:
People: foaf:Person, foaf:Organization
Output: vivo:InformationResources
Relationship: vivo:role

Academic Setting:
Research (bibo:Document, vivo:Grant, vivo:Project, vivo:Software, vivo:Dataset, vivo:ResearchLaboratory)
Teaching (vivo:TeacherRole, vivo:Course)
Service (vivo:Service, vivo:EditorRole, vivo:OrganizerRole)
Expertise (skos:Concept)
Relationships have nuances

The VIVO ontology supports representing rich information about relationships and how they change over time
description and duration of a person's participation in a project or event
current and former employment, with titles and dates
author order in a publication

Implemented as classes whose members we call context nodes
VIVO ontology localization

Different localization required by different institutions
UF, Cornell, IU, WASHU, Scripps, MED Cornell
How to make localization:
Adding local namespace:
indiana: http://vivo.iu.edu/ontology/vivoindiana/
core: http://vivoweb.org/ontology/core#
Local classes are the subclasses of the VIVO Core
foaf:Person → core:Nonacademic → indiana:Professional Staff → indiana:AdministrativeServices
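The localization rule (local classes are subclasses of VIVO Core classes) can be sketched as a simple parent lookup. This is only an illustration: the class names are copied from the slide, and reading the chain as "each class is a subclass of the one to its left" is an assumption, as are the helper names `superclasses` and `is_subclass_of`.

```python
# Assumed reading of the slide: each local class points at its parent class.
SUBCLASS_OF = {
    "core:Nonacademic": "foaf:Person",
    "indiana:Professional Staff": "core:Nonacademic",
    "indiana:AdministrativeServices": "indiana:Professional Staff",
}


def superclasses(cls):
    """Return every ancestor of cls, nearest first."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def is_subclass_of(cls, ancestor):
    """True if ancestor appears anywhere above cls in the hierarchy."""
    return ancestor in superclasses(cls)


if __name__ == "__main__":
    print(superclasses("indiana:AdministrativeServices"))
    print(is_subclass_of("indiana:AdministrativeServices", "foaf:Person"))
```

Because every local class eventually reaches a core class, a query written against the core ontology would still cover locally typed individuals once subclass reasoning is applied.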
Modeling examples: Research
Scenario: Prof. Katy Börner co-authored with Nianli, Russell, and Angela on the following publication: Börner, Katy, Ma, Nianli, Duhon, Russell J., Zoss, Angela M. (2009) Open Data and Open Code for S&T Assessment. IEEE Intelligent Systems. 24(4), pp. 78-81, July/August.
Modeling examples: Research
<http://vivo.iu.edu/individual/person25557>
rdf:type
<http://vivoweb.org/ontology/core#FacultyMember> .
<http://vivo.iu.edu/individual/person25557>
<http://vivoweb.org/ontology/core#authorInAuthorship>
<http://vivo.iu.edu/individual/n74> .
<http://vivo.iu.edu/individual/n74>
rdf:type
<http://vivoweb.org/ontology/core#Authorship> .

<http://vivo.iu.edu/individual/n74>
<http://vivoweb.org/ontology/core#linkedInformationResource>
<http://vivo.iu.edu/individual/n7109> .
<http://vivo.iu.edu/individual/n7109>
rdf:type
<http://purl.org/ontology/bibo/Article> .
Modeling examples: Research
<http://vivo.iu.edu/individual/person714388>
rdf:type
<http://vivoweb.org/ontology/core#NonAcademic> .
<http://vivo.iu.edu/individual/person714388>
<http://vivoweb.org/ontology/core#authorInAuthorship>
<http://vivo.iu.edu/individual/n2881> .
<http://vivo.iu.edu/individual/n2881>
rdf:type
<http://vivoweb.org/ontology/core#Authorship> .
<http://vivo.iu.edu/individual/n2881>
<http://vivoweb.org/ontology/core#authorRank> 2 .

<http://vivo.iu.edu/individual/n2881>
<http://vivoweb.org/ontology/core#linkedInformationResource>
<http://vivo.iu.edu/individual/n7109> .
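To see why the Authorship context node matters, the triples from the two examples can be loaded into a plain Python list and walked from a person, through the Authorship node, to the article and back out to the other authors. This is a minimal sketch and not how VIVO itself queries data: the prefixes `ind:`, `core:` and `bibo:` are shorthand for the URIs above, and the helpers `objects`, `subjects` and `coauthors` are hypothetical names.

```python
# The triples from the two modeling examples, with URIs shortened to prefixes.
TRIPLES = [
    ("ind:person25557", "rdf:type", "core:FacultyMember"),
    ("ind:person25557", "core:authorInAuthorship", "ind:n74"),
    ("ind:n74", "rdf:type", "core:Authorship"),
    ("ind:n74", "core:linkedInformationResource", "ind:n7109"),
    ("ind:n7109", "rdf:type", "bibo:Article"),
    ("ind:person714388", "rdf:type", "core:NonAcademic"),
    ("ind:person714388", "core:authorInAuthorship", "ind:n2881"),
    ("ind:n2881", "rdf:type", "core:Authorship"),
    ("ind:n2881", "core:authorRank", 2),
    ("ind:n2881", "core:linkedInformationResource", "ind:n7109"),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in TRIPLES."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is in TRIPLES."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def coauthors(person):
    """Follow person -> Authorship -> article -> Authorship -> other person."""
    found = set()
    for authorship in objects(person, "core:authorInAuthorship"):
        for article in objects(authorship, "core:linkedInformationResource"):
            for other in subjects("core:linkedInformationResource", article):
                for author in subjects("core:authorInAuthorship", other):
                    if author != person:
                        found.add(author)
    return sorted(found)


if __name__ == "__main__":
    print(coauthors("ind:person25557"))
    print(objects("ind:n2881", "core:authorRank"))
```

The author rank lives on the Authorship node rather than on the person or the article, which is what lets the same person hold different ranks on different publications.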
RDF Graph
[Figure: RDF graph of the two examples. individual:person25557 (rdf:type core:FacultyMember) and individual:person714388 (rdf:type core:NonAcademic) each link through core:authorInAuthorship to a core:Authorship node, individual:n74 and individual:n2881 respectively. individual:n2881 carries core:authorRank 2. Both Authorship nodes link through core:linkedInformationResource to individual:n7109, which has rdf:type http://purl.org/ontology/bibo/Article.]
Applications
Querying semantic data
SPARQL query builder
http://vivoonto.slis.indiana.edu/SPARQL/

Federated Search
VIVO Search
http://vivosearch.org/
CHEM2BIO2RDF
Big Data in Life Sciences
There is now an incredibly rich resource of public information relating compounds, targets, genes, pathways, and diseases. Just for starters there is in the public domain information on:
69 million compounds and 449,392 bioassays (PubChem)
59 million compound bioactivities (PubChem Bioassay)
4,763 drugs (DrugBank)
9 million protein sequences (SwissProt) and 58,000 3D structures (PDB)
14 million human nucleotide sequences (EMBL)
22 million life sciences publications, 800,000 new each year (PubMed)
Multitude of other sets (drugs, toxicogenomics, chemogenomics, metagenomics)

Even more important are the relationships between these entities. For example, a chemical compound can be linked to a gene or a protein target in a multitude of ways:
Biological assay with percent inhibition, IC50, etc.
Crystal structure of ligand/protein complex
Co-occurrence in a paper abstract
Computational experiment (docking, predictive model)
Statistical relationship
System association (e.g. involved in same pathways, cellular processes)
How to take advantage of big data?
[Figure: layered diagram, listed here from top to bottom]
New biomedical insights: Nuclear receptors: PPAR-gamma, PXR
Knowledge discovery processes
Integrative Tools & Algorithms: SPARQL query builder; Association Search & pathfinding; ChemoHub: network predictive models; Topic models & ranking; WENDI & Chemogenomic Explorer; Plotviz 3D visualization
Networks of data & relationships: Chem2Bio2RDF, PubMedNet (Compounds, Drugs, Proteins, Genes, Pathways, Diseases, Side Effects, Publications)
Databases & Publications
[Figure: data held as Text, CSV, Table, HTML, and XML across biological scales: Patient, Disease, Tissue, Cell, Pathway, DNA, RNA, Protein, Drug]
[Figure: the same formats and scales, with RDF as the common layer]
Need a data format! RDF
Need semantics!
http://chem2bio2rdf.org/drug/troglitazone bindTo http://chem2bio2rdf.org/target/PPARG
Chem2Bio2RDF
NCI Human Tumor Cell Lines Data
PubChem Compound Database
PubChem Bioassay Database
PubChem Descriptions of all PubChem bioassays
Pub3D: A similarity-searchable database of minimized 3D structures for PubChem compounds
Drugbank
MRTD: An implementation of the Maximum Recommended Therapeutic Dose set
Medline: IDs of papers indexed in Medline, with SMILES of chemical structures
ChEMBL chemogenomics database
KEGG Ligand pathway database
Comparative Toxicogenomics Database
PhenoPred Data
HuGEpedia: an encyclopedia of human genetic variation in health and disease.

31m chemical structures
59m bioactivity data points
3m/19m publications
~5,000 drugs
[Figure: Chem2Bio2RDF architecture. Components shown: dereferenceable URI; PlotViz visualization; browsing; Cytoscape plugin; linked path generation and ranking; third-party tools; SPARQL endpoints; RDF triplestores for Chem2Bio2RDF, Bio2RDF, LODD, uniprot, and others.]
Relating Pathways to Adverse Drug Reactions
RDF alone is not enough
Need standardization

Troglitazone binds to PPARG

Romozin binds to PPARG

Romozin is another name of Troglitazone
Chem2Bio2OWL
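The standardization problem can be illustrated with a few lines of Python: two statements that look different as strings collapse into one fact once the alias is mapped to a preferred name. This is a sketch of the idea only, not Chem2Bio2OWL itself; the `SAME_AS` table and the helpers `canonical` and `merged` are hypothetical.

```python
# Alias -> preferred name. The single entry comes from the slide above.
SAME_AS = {"Romozin": "Troglitazone"}

# The two binding statements from the slide, as (subject, predicate, object).
STATEMENTS = [
    ("Troglitazone", "bindsTo", "PPARG"),
    ("Romozin", "bindsTo", "PPARG"),
]


def canonical(name):
    """Map an alias to its preferred name; unknown names pass through."""
    return SAME_AS.get(name, name)


def merged(statements):
    """Rewrite every statement with canonical names and drop duplicates."""
    return sorted({(canonical(s), p, canonical(o)) for s, p, o in statements})


if __name__ == "__main__":
    print(merged(STATEMENTS))
```

Without the alias table, a query for the targets of Troglitazone would miss every statement recorded under the other name.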
RDF Search
Target for Troglitazone
PREFIX c2b2r: <http://chem2bio2rdf.org/chem2bio2rdf.owl#>
PREFIX bp: <http://www.biopax.org/release/biopax-level3.owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

select distinct ?target
from <http://chem2bio2rdf.org/owl#>

where {

?chemical rdfs:label ?drugName ;
c2b2r:hasInteraction ?interaction .
?interaction c2b2r:hasTarget [bp:name ?target] ;
c2b2r:drugTarget true .

FILTER (str(?drugName)="Troglitazone")}

Mashed Chem2Bio2RDF    Annotated Chem2Bio2OWL
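The graph pattern in the query can be traced by hand over a toy in-memory triple list. The sketch below is not a SPARQL engine and does not contact the Chem2Bio2RDF endpoint; apart from Troglitazone and PPARG, every identifier in the data (`chem:1`, `int:2`, `HYPOTHETICAL_PROTEIN`, `OtherDrug`) is made up for illustration, as are the helpers `match` and `targets_for`.

```python
# Toy data. "_:t1" and "_:t2" stand in for the blank target nodes in the query.
TRIPLES = [
    ("chem:1", "rdfs:label", "Troglitazone"),
    ("chem:1", "c2b2r:hasInteraction", "int:1"),
    ("int:1", "c2b2r:hasTarget", "_:t1"),
    ("int:1", "c2b2r:drugTarget", True),
    ("_:t1", "bp:name", "PPARG"),
    ("chem:1", "c2b2r:hasInteraction", "int:2"),
    ("int:2", "c2b2r:hasTarget", "_:t2"),
    ("int:2", "c2b2r:drugTarget", False),  # hypothetical non-target interaction
    ("_:t2", "bp:name", "HYPOTHETICAL_PROTEIN"),
    ("chem:2", "rdfs:label", "OtherDrug"),  # hypothetical second compound
    ("chem:2", "c2b2r:hasInteraction", "int:3"),
    ("int:3", "c2b2r:hasTarget", "_:t1"),
    ("int:3", "c2b2r:drugTarget", True),
]


def match(s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        (ts, tp, to)
        for ts, tp, to in TRIPLES
        if s in (None, ts) and p in (None, tp) and o in (None, to)
    ]


def targets_for(drug_name):
    """Mirror the query: label -> interaction (drugTarget true) -> target name."""
    names = set()
    for chemical, _, _ in match(p="rdfs:label", o=drug_name):
        for _, _, interaction in match(s=chemical, p="c2b2r:hasInteraction"):
            if not match(s=interaction, p="c2b2r:drugTarget", o=True):
                continue  # same effect as the "c2b2r:drugTarget true" pattern
            for _, _, target in match(s=interaction, p="c2b2r:hasTarget"):
                for _, _, name in match(s=target, p="bp:name"):
                    names.add(name)
    return sorted(names)


if __name__ == "__main__":
    print(targets_for("Troglitazone"))
```

Using a set for the result plays the role of `select distinct`, and the early `continue` is what excludes interactions that are not marked as drug targets.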
SEMANTIC GRAPH MINING: PATH FINDING ALGORITHM
[Figure: example graph with nodes numbered 1 to 26]
Dijkstra's algorithm
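The slide names Dijkstra's algorithm without further detail, so the following is a generic textbook version rather than the project's own path finder. The six-node graph and its edge weights are hypothetical and are not taken from the figure.

```python
import heapq

# Hypothetical undirected weighted graph: node -> {neighbor: edge weight}.
GRAPH = {
    1: {2: 7, 3: 9, 6: 14},
    2: {1: 7, 3: 10, 4: 15},
    3: {1: 9, 2: 10, 4: 11, 6: 2},
    4: {2: 15, 3: 11, 5: 6},
    5: {4: 6, 6: 9},
    6: {1: 14, 3: 2, 5: 9},
}


def dijkstra(graph, source, target):
    """Return (shortest path, total weight), or (None, inf) if unreachable."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry for an already settled node
        done.add(node)
        if node == target:
            break
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if target not in dist:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]


if __name__ == "__main__":
    print(dijkstra(GRAPH, 1, 5))  # ([1, 3, 6, 5], 20)
```

In a semantic graph the edge weights would have to encode something meaningful, such as the reliability or specificity of a relation; how the weights are chosen is not stated on the slide.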
BioLDA
Latent Dirichlet Allocation (LDA)
The core of the group of powerful statistical modeling techniques for automated extraction of latent topics from large document collections

BioLDA
Extended LDA model with Bio-terms as latent variable
Bio-terms: compound, gene, drug, disease, protein, side effect, pathways
Calculate bio-term entropies over topics
Use the Kullback-Leibler divergence as the non-symmetric distance measure for two bio-terms over topics
Example: Topic 10

Apply BioLDA on 336,899 PubMed article abstracts in 2009 and extract 50 topics
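The two measures mentioned for bio-terms, entropy over topics and Kullback-Leibler divergence between two terms, can be computed directly from topic distributions. The sketch below uses the standard definitions with base-2 logarithms; the two three-topic distributions are invented for illustration and are not BioLDA output.

```python
import math


def entropy(p):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def kl_divergence(p, q):
    """KL(p || q) in bits. Assumes q is non-zero wherever p is non-zero."""
    return sum(x * math.log2(x / y) for x, y in zip(p, q) if x > 0)


# Hypothetical topic distributions for two bio-terms over three topics.
TERM_A = [0.5, 0.25, 0.25]
TERM_B = [0.8, 0.1, 0.1]

if __name__ == "__main__":
    print(entropy(TERM_A))                 # 1.5
    print(kl_divergence(TERM_A, TERM_B))
    print(kl_divergence(TERM_B, TERM_A))   # differs from the line above
```

A term with low entropy is concentrated in a few topics, and the two KL values differ because the measure is not symmetric, which is the property the slide calls out.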
Diversity subgraph

Fig. Ranked association graphs between myocardial infarction and Troglitazone
Thiazolidinediones (TZDs): revolutionary treatment for type II Diabetes

Troglitazone (Rezulin): withdrawn in 2000 (liver disease)

Rosiglitazone (Avandia): restricted in 2010 (cardiac disease)

Rosiglitazone bound into PPAR-γ

Pioglitazone: ???? (does decrease blood sugar levels, was associated with bladder tumors and has been withdrawn in some countries.)
PPARG: TZD target
SAA2: Involved in inflammatory response implicated in cardiovascular disease (Current Opinion in Lipidology 15, 3, 269-278, 2004)
APOE: Apolipoprotein E3 essential for lipoprotein catabolism. Implicated in cardiovascular disease.
ADIPOQ: Adiponectin involved in fatty acid metabolism. Implicated in metabolic syndrome, diabetes and cardiovascular disease
CYP2C8: Cytochrome P450 present in cardiovascular tissue and involved in metabolism of xenobiotics
CDKN2A: Tumor suppression gene
SLC29A1: Membrane transporter
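Semantic Prediction
http://chem2bio2rdf.org/slap
[Figure: from the ligand perspective. The unknown link (?) between Drug 1 and Target 1 is approached through Drug 2, which binds Target 1 and is related to Drug 1 by substructure, side effect, chemical ontology, and gene expression profile.]
[Figure: from the target perspective. The unknown link (?) between Drug 1 and Target 1 is approached through Target 2, which Drug 1 binds and which is related to Target 1 by sequence, 3D structure, GO, and ligand.]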
Example: Troglitazone and PPARG
Association score: 2385.9
Association significance: 9.06 x 10^-6 => missing link predicted
Topology is important for association
[Figure: two networks containing the same path, Cmpd 1 hasSubstructure (substructure) hasSubstructure Cmpd 2 bind Protein 1, drawn with different surrounding topology]
Semantics is important for association
[Figure: five kinds of path from Cmpd 1 to Protein 1]
Cmpd 1 bind Protein 2 bind Cmpd 2 bind Protein 1
Cmpd 1 bind Protein 2 hasGO GO:00001 hasGO Protein 1
Cmpd 1 bind Protein 2 PPI Protein 1
Cmpd 1 hasSideeffect hypertension hasSideeffect Cmpd 2 bind Protein 1
Cmpd 1 hasSubstructure substructure 1 hasSubstructure Cmpd 2 bind Protein 1
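One way to make the point concrete is to reduce each path to its sequence of node types and edge labels, so that paths with the same length but different semantics fall into different groups. This is a sketch of that idea under stated assumptions: the node type labels (`Compound`, `Protein`, `GO`, `SideEffect`, `Substructure`) and the helper `pattern` are hypothetical, and how SLAP actually scores path types is not described on the slide.

```python
# Each path alternates (node name, node type) tuples and edge-label strings.
PATHS = [
    [("Cmpd1", "Compound"), "bind", ("Protein2", "Protein"), "bind",
     ("Cmpd2", "Compound"), "bind", ("Protein1", "Protein")],
    [("Cmpd1", "Compound"), "bind", ("Protein2", "Protein"), "hasGO",
     ("GO:00001", "GO"), "hasGO", ("Protein1", "Protein")],
    [("Cmpd1", "Compound"), "bind", ("Protein2", "Protein"), "PPI",
     ("Protein1", "Protein")],
    [("Cmpd1", "Compound"), "hasSideeffect", ("hypertension", "SideEffect"),
     "hasSideeffect", ("Cmpd2", "Compound"), "bind", ("Protein1", "Protein")],
    [("Cmpd1", "Compound"), "hasSubstructure", ("substructure1", "Substructure"),
     "hasSubstructure", ("Cmpd2", "Compound"), "bind", ("Protein1", "Protein")],
]


def pattern(path):
    """Replace each node by its type and keep edge labels as they are."""
    parts = [step if isinstance(step, str) else step[1] for step in path]
    return " -> ".join(parts)


if __name__ == "__main__":
    for path in PATHS:
        print(pattern(path))
```

All five example paths connect the same compound to the same protein, yet each yields a distinct pattern, so a scoring scheme can weight them differently.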
SLAP Pipeline

Path filtering
Cross-check with SEA
SEA analysis (Nature 462, 175-181, 2009) predicts 184 new compound-target pairs, 30 of which were experimentally tested
23 of these pairs were experimentally validated (<15 uM), including 15 aminergic GPCR targets and 8 which crossed major receptor classification boundaries
9 of the aminergic GPCR target pairings were correctly predicted by SLAP (p<0.05); for the other 6, compounds were not present in our set
1 of the 8 cross-boundary pairs was predicted
Assessing drug similarity from biological function
Took 157 drugs with 10 known therapeutic indications, and created SLAP profiles against 1,683 human targets
Pearson correlation between profiles >0.9 from SLAP was used to create associations between drugs
Drugs with the same therapeutic indication unsurprisingly cluster together
Some drugs with similar profile have different indications: potential for use in drug repurposing?
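The profile comparison step can be sketched as follows: compute the Pearson correlation between every pair of drug profiles and keep the pairs above 0.9. The three four-value profiles are invented placeholders (real SLAP profiles would have one value per human target), and `associated_pairs` is a hypothetical helper name.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def associated_pairs(profiles, threshold=0.9):
    """Return drug pairs whose profile correlation exceeds the threshold."""
    names = sorted(profiles)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if pearson(profiles[a], profiles[b]) > threshold
    ]


# Hypothetical profiles: one association score per target, four targets only.
PROFILES = {
    "drug_A": [1.0, 2.0, 3.0, 4.0],
    "drug_B": [2.0, 4.1, 5.9, 8.0],
    "drug_C": [4.0, 1.0, 3.0, 2.0],
}

if __name__ == "__main__":
    print(associated_pairs(PROFILES))  # [('drug_A', 'drug_B')]
```

The resulting pairs form the edges of a drug-drug network, which can then be clustered and compared against known therapeutic indications.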
Challenges
Generating entities: converting strings to things
Using URI to identify/integrate entities (RDF)
Using common schemas to represent semantics (ontologies)
Managing relations:
Model properties of relations
Search and rank relations
Handling context: tricky
Triples vs. Quads
Provenance: who says what, data provenance, how (process) provenance, workflow provenance
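The difference between triples and quads can be shown with a fourth element that records which source asserted each statement. This is a minimal sketch: the graph names `graph:source_A` and `graph:source_B` and the side-effect statement are placeholders, and `provenance` is a hypothetical helper.

```python
from collections import defaultdict

# Quads: (subject, predicate, object, named graph that asserted the triple).
QUADS = [
    ("drug:troglitazone", "bindTo", "target:PPARG", "graph:source_A"),
    ("drug:troglitazone", "bindTo", "target:PPARG", "graph:source_B"),
    ("drug:troglitazone", "hasSideEffect", "effect:example", "graph:source_B"),
]


def provenance(quads):
    """Group quads by their triple and list the graphs that assert each one."""
    sources = defaultdict(set)
    for s, p, o, graph in quads:
        sources[(s, p, o)].add(graph)
    return {triple: sorted(graphs) for triple, graphs in sources.items()}


if __name__ == "__main__":
    for triple, graphs in provenance(QUADS).items():
        print(triple, "asserted by", graphs)
```

With plain triples the two bindTo statements would merge into one and the record of who said what would be lost; the fourth element keeps it.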
Challenges
Others:
Query efficiency,
Data security,
Data quality,

Big Data + Big Challenge    Unlimited Potential

Connect, Share, Discover


Thanks
[email protected]
http://info.slis.indiana.edu/~dingying/index.html