See the future of ChemSpider

18 Jan 2024

For several months now we have been working hard to rebuild ChemSpider from the ground up. This involves redeveloping the technical implementation of the site, as well as completely reassessing the website: how it looks, how it works and the data that we present.

We’ve been careful to retain all the familiar features of the site, particularly the record pages. Drawing on the way visitors use the site's functionality, user surveys, and user interviews, we have made changes to provide a cleaner and simpler interface. We hope this new ChemSpider site, while not a complete copy of the features in the old site, will provide access to all the information that you need in a more intuitive and user-friendly way.

While we continue to refine the new site, we have made a preview version of the new interface available at Beta.ChemSpider.com alongside the existing ChemSpider site. For the moment, Beta.ChemSpider only contains the data for the first 5 million ChemSpider records. Once we are happy that the new site is ready, it will replace the current ChemSpider site.

For the moment the new website is running in parallel to the existing https://chemspider.com site. This allows you to try the improved design, provide feedback, and switch back to the familiar interface should you still need to. We are still adding functionality and tweaking the data, but your feedback will help us to validate where we have made improvements and where we still need to do more.

To try our new site go to: https://beta.chemspider.com/

FAQs

1. Where is Feature X/Data Y on beta.chemspider.com?

Some record tabs which were based on out-of-date or limited availability data have been retired. Other features like structure searching are yet to be added to the new website as we look for a solution to replace the one currently used on ChemSpider.com.

2. Should I use Beta.ChemSpider?

This is a decision for you to make after you’ve tried the site. While we don’t have ChemSpider IDs from 5,000,001 to 129,000,000 in the beta site, we have data for many common compounds. Please try it out and decide for yourself.

3. What does beta mean?

It means that things might change on the beta.chemspider.com site: features might be added, removed or temporarily break. We are also working on how we process the data that is loaded into the site, so sometimes the data might change as we reload it.

4. Can you add Feature A/Data B?

We are always interested in new sources of data and features and welcome any suggestions, but it might take some time for us to review and implement them.

5. What about ChemSpider accounts?

For the moment the new website doesn’t need any sign-in to use it, as ChemSpider accounts were not well used. We may have new features in the future that will require a login, but for now we won’t have one.

6. How do I provide feedback?

We want to know what you like or dislike about the new site. Please fill in the in-site surveys/feedback forms, or send an email to [email protected].

ChemSpider data cleanup

15 Dec 2023

By Susan Richardson.

In previous posts, we have discussed the automated workflow we use to check new incoming data for structure and synonym errors. These checks allow us to remove the most common types of errors before they are added to the site. However, these filters do not apply to data already in ChemSpider.

Manual curation is an important part of our work. We periodically review the data on our most accessed records, in addition to ad-hoc removal or correction of erroneous data that we or our users notice when using the site. However, there are far too many records and far too much data to clean up using manual curation alone.

Recently we have focused on bulk identification and removal of erroneous data. This work has covered mapping errors and other clearly incorrect values in our experimental property data, correction or removal of malformed synonyms, correction of incorrectly labelled synonyms, and resolution of structure/synonym clashes.

Experimental Properties

We retrieved all 6.3 million experimental properties, text properties, and associated annotations from the ChemSpider database. We then compared the original text of the property as it was written in the original file to how that text was parsed and mapped by our deposition system. This enabled us to identify and correct several types of errors affecting around 2% of the properties in our database:

35,774 experimental property values had been assigned the incorrect unit (e.g. g/L instead of g/mL, °C instead of °F)
2,591 boiling points measured under non-standard pressure did not have this pressure displayed
4,292 densities had their density and temperature values swapped
79,252 miscellaneous erroneous properties and associated annotations were deleted. For example, “white crystals” mapped as melting point, impossibly high melting points or densities, etc.
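
As a rough illustration of the comparison described above, the sketch below re-reads the unit from the original deposited text and flags records where the parsed unit differs. The record layout (`source_text`, `parsed_unit`), the regular expression, and the function name are all assumptions made for this example; they are not taken from the ChemSpider deposition system.

```python
import re

# Hypothetical pattern: a number followed by one of a few known units.
# Longer alternatives come first so "g/mL" is never read as "g/L".
VALUE_WITH_UNIT = re.compile(r"(-?\d+(?:\.\d+)?)\s*(g/mL|g/L|°C|°F)")


def find_unit_mismatches(records):
    """Return records whose parsed unit differs from the unit in the source text."""
    mismatches = []
    for record in records:
        match = VALUE_WITH_UNIT.search(record["source_text"])
        if match is None:
            continue  # no recognisable value/unit in the text; leave for manual review
        source_unit = match.group(2)
        if source_unit != record["parsed_unit"]:
            mismatches.append({**record, "source_unit": source_unit})
    return mismatches


# Illustrative records only, not real ChemSpider data.
records = [
    {"id": 1, "source_text": "Density: 1.05 g/mL at 20 °C", "parsed_unit": "g/L"},
    {"id": 2, "source_text": "Melting point: 212 °F", "parsed_unit": "°F"},
]
flagged = find_unit_mismatches(records)
```

Here only the first record would be flagged, because its source text says g/mL while the parsed unit is g/L. A real check would need a far larger unit vocabulary and a manual review step for text that does not match the pattern.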

Synonyms

Synonyms, chemical names, and identifiers are the most abundant type of data on ChemSpider, with a total of more than 446 million synonyms. These synonyms have additional metadata including language labels and flags identifying what type of synonym they are (e.g. CAS number, UNII, INN, trade name).

Simple Checks

We ran a series of regular expression string searches to identify synonyms with incorrect metadata, as well as malformed or otherwise erroneous synonyms.

200,007 synonym type flags added, and 4,766 incorrect flags removed.
9,170 synonyms with an incorrect language label identified.
631,697 erroneous synonyms identified, including scrambled characters, properties/units, molecular formulae as synonyms, purity information, or invalid CAS numbers or EC numbers (formerly called EINECS).
922,334 instances of these erroneous synonyms deleted from ChemSpider records.
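
One of the checks named above, detecting invalid CAS numbers, lends itself to a short example. CAS Registry Numbers have the form of two to seven digits, two digits, and a single check digit, separated by hyphens; the check digit is the weighted sum of the other digits (weights 1, 2, 3, ... counted from the right) modulo 10. The sketch below is one possible way to combine a regular expression with that checksum. The function name is made up for this example, and this is not necessarily how the ChemSpider checks were implemented.

```python
import re

# Format of a CAS Registry Number: 2-7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def is_valid_cas(candidate):
    """Return True if the string is well formed and its check digit is correct."""
    match = CAS_PATTERN.fullmatch(candidate.strip())
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(digits), start=1)
    )
    return total % 10 == check_digit
```

For example, `is_valid_cas("7732-18-5")` (water) passes, while `is_valid_cas("7732-18-4")` fails the checksum and `is_valid_cas("white crystals")` fails the format test.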

Structure/Synonym comparison

After identifying and removing these synonym-level errors, we then cross-checked ChemSpider records and their synonyms to identify mismatches. This work included amino acids, nucleic acids, and pharmaceutically acceptable salts.

As a first pass, we compared synonyms to molecular formulae to identify records missing key elements. Examples include synonyms describing a sodium salt when the molecular formula does not contain sodium, or describing an amino acid when the molecular formula contains no nitrogen. A total of 28,194 of these synonym/formula clashes were identified and removed.
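
A minimal sketch of this kind of first-pass check follows. It assumes a simple Hill-style formula string and a small keyword-to-element table; both the table and the function names are hypothetical and far smaller than anything that would be needed in practice.

```python
import re

# An element symbol is one capital letter plus an optional lowercase letter,
# followed by an optional count (e.g. "Na", "C7", "Cl2").
ELEMENT_TOKEN = re.compile(r"([A-Z][a-z]?)\d*")

# Hypothetical mapping from a word in the synonym to an element it implies.
REQUIRED_ELEMENT = {
    "sodium": "Na",
    "potassium": "K",
    "chloride": "Cl",
    "bromide": "Br",
}


def elements_in_formula(formula):
    """Return the set of element symbols that appear in a molecular formula."""
    return set(ELEMENT_TOKEN.findall(formula))


def formula_clashes(synonym, formula):
    """Return the elements implied by the synonym but absent from the formula."""
    present = elements_in_formula(formula)
    lowered = synonym.lower()
    return sorted(
        element
        for keyword, element in REQUIRED_ELEMENT.items()
        if keyword in lowered and element not in present
    )
```

With these definitions, `formula_clashes("Sodium benzoate", "C7H6O2")` reports `["Na"]`, whereas the same synonym against `"C7H5NaO2"` reports no clash. Plain substring matching is deliberately crude, which is one reason a manual spot-check of the output is still needed.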

For records that passed this initial molecular formula check, we performed a SMARTS comparison to identify chemical structures missing key structural features described in the synonym. These SMARTS strings were written broadly, with common substitutions allowed to prevent unnecessary removal of valid synonyms from derivative compounds.

In the following examples, the mismatched part of the synonym is highlighted in bold.

Structure	Removed synonym
[structure image]	Sulfate ion
[structure image]	Zolpidem tartrate
[structure image]	Sodium S-sulfocysteine hydrate

After identifying these clashes, we manually spot-checked the output to weed out false positives and iterate on the SMARTS filters. In total, 101,257 synonym/structure clashes were identified and removed.

These checks included the following categories:

Amino acids and their derivatives: 6 formula clashes, 56 structure clashes
Nucleic acids, nucleosides, nucleotides: 977 formula clashes, 1,870 structure clashes
Halogens: 13,437 formula clashes, 1,256 structure clashes
Alkali and alkaline earth metals, and aluminium: 3,586 formula clashes, 56 structure clashes
Carboxylic acids and their derivatives: 5,002 formula clashes, 88,501 structure clashes
Other pharmaceutically acceptable acids: 3,534 formula clashes, 1,529 structure clashes
Amides and amines: 190 formula clashes, 304 structure clashes
Deuterates, hydrates, methylbromides: 1,462 formula clashes, 7,685 structure clashes

Get involved

You are the expert in your area of chemistry, so if you see something that doesn’t look quite right please let us know. If the error is confined to a single ChemSpider record, click the “Comment On This Record” box at the top of the affected record and let us know what the problem is. All we need is a sentence describing the error; however, the more information you can provide, the better.

For more systemic errors, or in cases where you want to attach supplementary information or corrected chemical structures, please get in touch via email ([email protected]).

Webinar 3: Chemistry data: Challenges and opportunities. Watch the recording

18 Nov 2023

By Richard Kidd.

We will explore ongoing and planned initiatives developing standards and tools, research infrastructures, and cultures to support FAIR chemistry data as well as its preparation, publication, and reuse.

Webinar 3: Challenges and opportunities

Held 7 December 2023. Recording available on-demand. Register now to watch the recording

Speakers

“How to initiate the cultural change towards digital chemistry” SLIDES
Sonja Herres-Pawlis
Chair of Bioinorganic Chemistry, RWTH Aachen

“How can we combat heterogeneous, unfair and disparate data in digital chemistry?” SLIDES
Samantha Kanza
Senior Enterprise Fellow, University of Southampton
Pathfinder Lead, Physical Sciences Data Infrastructure (PSDI)

“How data journals can support (chemistry) data sharing and discovery” SLIDES
Guy Jones
Chief Editor of Scientific Data, Springer Nature

Sponsored by Revvity

Revvity Signals Software, formerly PerkinElmer Informatics, has over three decades of experience providing support for scientific workflows.

Our powerful informatics solutions are used in R&D across disciplines from drug discovery to materials development. Now under our Signals Research Suite, our end-to-end SaaS solution integrates workflows to accelerate innovation and help scientists collaborate. In addition, our solution powered by TIBCO® Spotfire® can transform clinical trials.

From our flagship ChemDraw® and E-Notebook applications, to our Signals Research Suite, to our TIBCO® Spotfire® partnership for data analytics, Revvity Signals offers a powerful suite of scientific solutions.

Supported by

About ChemSpider

Explore more than 128 million structures on the ChemSpider database. Drawing on over 200 data sources, ChemSpider is a valuable source of information for chemical scientists working with data.

Freely accessible and comprehensive, this rich source of structure-based chemistry information is a fundamental resource for chemical scientists working with data everywhere.

Learn more about ChemSpider

Webinar 2: What does the future hold? Watch the recording

17 Oct 2023

By Richard Kidd.

Webinar 2: What does the future hold?