User:ProteinBoxBot
This user account is a bot with a bot flag. The bot is operated by Andrawaag and Sulhasan.
Purpose
The objective of this bot is to provide Wikidata with up-to-date, high-quality information about genes, diseases, and drugs from authoritative sources. These concepts will form the backbone upon which many biomedical applications of Wikidata will be based. Specifically, it will make it possible to answer important biomedical questions using the Wikidata query service. We are working to establish a common set of standards for representing the evidence and provenance of this kind of information in Wikidata, and we will apply these standards to all of the work described below. For more information on the Gene Wiki project as a whole, please see WikiProject Gene Wiki.
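As an illustration of the kind of question the query service can answer, the following sketch builds (but does not send) a request URL for the public SPARQL endpoint. It is only an example, not part of the bot code: the function name `build_query_url` is hypothetical, and the query assumes that P351 is the Entrez Gene ID property, P703 is "found in taxon", and Q15978631 is Homo sapiens.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata query service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Example question: list ten human genes with their Entrez Gene IDs.
# Assumed identifiers: P351 (Entrez Gene ID), P703 (found in taxon),
# Q15978631 (Homo sapiens).
QUERY = """
SELECT ?gene ?geneLabel ?entrez WHERE {
  ?gene wdt:P351 ?entrez ;
        wdt:P703 wd:Q15978631 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
"""


def build_query_url(query, endpoint=WDQS_ENDPOINT):
    """Return a GET URL that asks the endpoint for JSON results."""
    params = {"query": query.strip(), "format": "json"}
    return endpoint + "?" + urlencode(params)


url = build_query_url(QUERY)
print(url)
```

The resulting URL can be opened in a browser or fetched with any HTTP client; the same query text can also be pasted directly into the query service's web interface.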
Sister Bots
To better divide the many tasks we are undertaking, our team also runs these bot accounts:
Data Sources
ProteinBoxBot
Name | Data Used |
---|---|
mygene.info | NCBI Entrez, Ensembl, UniProt |
Gene Ontology | ontology |
Disease Ontology | ontology |
InterPro | ontology, protein annotations |
Phenocarta | GWAS Catalog |
SoCalChemBot
Name | Data Used |
---|---|
PubChem | |
Guide to Pharmacology | |
ChEBI | |
DrugBank | |
FDA UNII | |
ChEMBL | |
NDF-RT | |
MicrobeBot
Bot tasks and state
The bots read from and write to Wikidata using a Python module called WikidataIntegrator. The open-source bot code is divided into a collection of tasks. The initial tasks are concerned with establishing sets of entities corresponding to the three main classes (genes, diseases, drugs) and creating a stable cycle of updates. The next level of tasks focuses on establishing relationships between these entities. All bot edits are based on content from trusted, manually curated scientific resources. For additional information about each bot task, follow the links in the status table below.
Bot task | Discussion started | Coding and testing | Production ready | Is approved | Has been run |
---|---|---|---|---|---|
Gene and protein items | x | x | x | x | x |
Gene Ontology | x | x | x | x | x |
Disease items | x | x | x | x | x |
Drug items | x | x | x | x | x |
Gene-drug links | x | x | x | x | x |
Gene-disease links | x | x | x | x | x |
Drug-disease links | x | x | x | x | x |
Microbial gene and protein items | x | x | x | x | x |
Protein Families | x | | | | |
GO Protein Annotations | x | | | | |
Bot Status
The results of scheduled bot runs are automatically added to User:ProteinBoxBot/Bot_Status. Jenkins updates this table after each bot run. A report of each run is generated and linked under the "Log Report" column.
Legalities
Much of the work done by this bot involves the import, synchronization, and maintenance of information brought in from other sources. Where those sources are not entirely in the public domain, specific agreements need to be reached about which content can be brought into Wikidata and hence released under CC0. We will track these agreements on the legal subpage.
Task permission requests
- (1) Initial gene/protein test run circa 2013: (closed)
- (2) Genes: Entrez gene (approved)
- (3) Diseases: Disease Ontology (approved)
- (4) Drugs: Drugbank drugs (approved)
Discussions
- (open) To track resource license information for Wikidata, a tracking table like the one Daniel Himmelstein made in one of his recent projects could be useful. [1]
- (open) Getting disease content from Wikidata into the disease infobox on Wikipedia.
- (closed) Handling interwiki links for genes (on Wikipedia). https://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_Molecular_and_Cellular_Biology#Preparing_for_WikiData_in_gene.2Fprotein_infoboxes
- (closed) Handling interwiki links (Wikidata talk). (Started June 29, 2015)
- (closed) Gaining re-approval of the bot after it was blocked in response to errors that created duplicate items. (Started June 2, 2015)
- (closed) SubclassOf disease.
- (completed) Representing genes, proteins, functions, and orthologues.
Workshops
Date/Time | Venue | Title | Location |
---|---|---|---|
2020-10-29/TBA | Biocuration online workshop | Gene Wiki: how to synchronize and curate primary sources with and in Wikidata | online (Zoom, link TBA) |
2020-10-28/TBA | 2020 Wikicite Virtual Conference | Gene Wiki: using Wikicite in Wikidata as a portal to evidence and structured annotations of the scientific literature | |
Sprints
- 2020 Complex portal sprint
- 2020 SARS-CoV sprint (finished)
- 2020 Proteins sprint
- 2020 Disease Ontology sprint
- 2020 MonDO Ontology sprint
- 2020 Symptom Ontology sprint
Archived sprints
- 2017 CIViC sprint
- 2016 TOGO Picture Gallery sprint (SWAT4LS)
- 2016 WikiPathways sprint (SWAT4LS)
- 2016 ShEx sprint (SWAT4LS)
- 2016 CIViC sprint
- 2015 Gene Ontology sprint
- 2015 Gene Disease relations sprint
Bot development cycle
- Manually model 1 or 2 example entries.
- Develop the bot on 10 entries.
- Do a test run on 100 entries.
- Wait for possible constraint violations to surface.
- Perform a full run.
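The growing batch sizes in this cycle can be sketched in a few lines of Python. This is a minimal illustration, not the bot's actual code: the function name `staged_batches` and its `stage_sizes` parameter are hypothetical, and the manual modeling and the wait for constraint violations happen between stages, outside the code.

```python
def staged_batches(entries, stage_sizes=(2, 10, 100)):
    """Yield (label, batch) pairs of growing size, ending with the full set.

    Stages that would already cover every entry are skipped, so a small
    data set goes straight from its last partial stage to the full run.
    """
    entries = list(entries)
    for size in stage_sizes:
        if size >= len(entries):
            break
        yield f"first {size}", entries[:size]
    yield "full run", entries


# Example with 250 placeholder entries standing in for source records.
stages = [(label, len(batch)) for label, batch in staged_batches(range(250))]
for label, count in stages:
    print(f"{label}: {count} entries")
```

A caller would process one batch, review the edits and any reported constraint violations, and only then request the next batch.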
Useful Links
- General bot information, including a list of all approved bots.
- View the flags on a user's page. Bots like this one should have a (bot) flag.
- bot's page on Wikipedia
- Wikipedia bot's 'phase 3'
- Finding things on Wikidata. For example, to check whether a property exists.
- Merging InterPro items. Help merge InterPro items with their Wikipedia pages.
- SPARQL Examples
- Maintenance Queries