
RCDK

This package allows users to access functionality in the CDK (Chemistry Development Kit), a Java framework for cheminformatics. It loads molecules, evaluates fingerprints and molecular descriptors, and allows viewing structures in 2D. The package provides an interface to analyze chemical data using descriptors and manipulate molecular structures using CDK classes and methods.


Package rcdk

October 7, 2013
Version: 3.2.3.2
Date: 2013-10-07
Title: rcdk - Interface to the CDK Libraries
Author: Rajarshi Guha
Maintainer: Rajarshi Guha
Depends: fingerprint
Imports: rJava, rcdklibs (>= 1.5.4), methods, png, iterators
Suggests: xtable, RUnit
License: LGPL
LazyLoad: yes
Description: This package allows the user to access functionality in the CDK, a Java framework for cheminformatics. This allows the user to load molecules, evaluate fingerprints, calculate molecular descriptors and so on. In addition the CDK API allows the user to view structures in 2D.
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2013-10-07 23:58:34

R topics documented:
Atoms
bpdata
cdk.version
cdkFormula-class
do.aromaticity
eval.atomic.desc
eval.desc
generate.formula
get.atomic.desc.names
get.atoms
get.bonds
get.connected.atom
get.desc.categories
get.desc.names
get.fingerprint
get.formula
get.isotopes.pattern
get.mol2formula
get.murcko.fragments
get.properties
get.property
get.smiles
get.smiles.parser
get.total.charge
get.total.hydrogen.count
get.tpsa
hasNext
is.connected
isvalid.formula
load.molecules
matches
Molecule
parse.smiles
remove.hydrogens
remove.property
set.charge.formula
set.property
view.molecule.2d
view.table
write.molecules
Index

Atoms

Operations on atoms

Description
get.symbol returns the chemical symbol for an atom.
get.point3d returns the 3D coordinates of the atom.
get.point2d returns the 2D coordinates of the atom.
get.atomic.number returns the atomic number of the atom.
get.hydrogen.count returns the number of implicit Hs on the atom. Depending on where the molecule was read from, this may be NULL or an integer greater than or equal to 0.
get.charge returns the partial charge on the atom. If charges have not been set, the return value is NULL; otherwise it is the partial charge.
get.formal.charge returns the formal charge on the atom. By default the formal charge will be 0 (i.e., NULL is never returned).
is.aromatic returns TRUE if the atom is aromatic, FALSE otherwise.
is.aliphatic returns TRUE if the atom is part of an aliphatic chain, FALSE otherwise.
is.in.ring returns TRUE if the atom is in a ring, FALSE otherwise.
get.atom.index returns the index of the atom in the molecule (starting from 0).
get.connected.atoms returns a list of atoms that are connected to the specified atom.

Usage
get.symbol(atom)
get.point3d(atom)
get.point2d(atom)
get.atomic.number(atom)
get.hydrogen.count(atom)
get.charge(atom)
get.formal.charge(atom)
get.connected.atoms(atom, mol)
get.atom.index(atom, mol)
is.aromatic(atom)
is.aliphatic(atom)
is.in.ring(atom)

Arguments
atom: A jobjRef representing an IAtom object
mol: A jobjRef representing an IAtomContainer object

Value
In the case of get.point3d the return value is a 3-element vector containing the X, Y and Z coordinates of the atom. If the atom does not have 3D coordinates, it returns a vector of the form c(NA,NA,NA). Similarly for get.point2d, in which case the return vector is of length 2.

Author(s)
Rajarshi Guha

See Also
get.atoms
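Because get.point3d and get.point2d return plain numeric vectors, geometric quantities can be computed directly in R. The sketch below computes an interatomic distance from two such vectors and propagates the c(NA, NA, NA) case described under Value; atom.distance is a hypothetical helper, not an rcdk function.

```r
# Hypothetical helper (not part of rcdk): Euclidean distance between two
# coordinate vectors shaped like get.point3d() output.
atom.distance <- function(p1, p2) {
  # get.point3d() returns c(NA, NA, NA) for atoms without 3D coordinates
  if (anyNA(p1) || anyNA(p2)) return(NA_real_)
  sqrt(sum((p1 - p2)^2))
}

atom.distance(c(0, 0, 0), c(1.2, 0.5, 0))  # 1.3
atom.distance(c(0, 0, 0), c(NA, NA, NA))   # NA
```

With rcdk attached, two IAtom references a1 and a2 (hypothetical names) could be compared with atom.distance(get.point3d(a1), get.point3d(a2)).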


bpdata

Boiling Point Data

Description
Structures and associated boiling points for 277 molecules, primarily alkanes and substituted alkanes.

Usage
bpdata

Format
A data.frame with two columns:
[,1] SMILES (character): Structure in SMILES format
[,2] BP (numeric): Boiling point in Kelvin
The names of the molecules are used as the row names.

References
Goll, E.S. and Jurs, P.C.; "Prediction of the Normal Boiling Points of Organic Compounds From Molecular Structures with a Computational Neural Network Model", J. Chem. Inf. Comput. Sci., 1999, 39, 974-983.

cdk.version

Get Current CDK Version

Description
Returns a string containing the version of the CDK used in this package.

Usage
cdk.version()

Value
A string representing the CDK version

Author(s)
Rajarshi Guha


cdkFormula-class

Class cdkFormula, a class for handling molecular formula

Description
This class handles molecular formulae. It provides extra information such as the IMolecularFormula Java object, the elements contained and the number of each.

Objects from the Class
Objects can be created with the new constructor and filled with a specific mass and window accuracy.

Note
No notes yet.

Author(s)
Miguel Rojas-Cherto

References
A parallel effort to expand the Chemistry Development Kit: http://cdk.sourceforge.net

See Also
get.formula, set.charge.formula, get.isotopes.pattern, isvalid.formula

do.aromaticity

Perform Aromaticity Detection, atom typing or isotopic configuration

Description
These methods can be used to perform aromaticity detection, atom typing or isotopic configuration on a molecule object. In general, when molecules are loaded via load.molecules these operations are performed by default. If molecules are obtained via parse.smiles these operations are not performed, so the user should call one or more of these methods to correctly configure a molecule.

Usage
do.aromaticity(molecule)
do.typing(molecule)
do.isotopes(molecule)

Arguments
molecule: The molecule on which the operation is to be performed. Should be of class jobjRef with a jclass attribute of IAtomContainer.

Value
No return value. If the operations fail, an exception is thrown and an error message is printed.

Author(s)
Rajarshi Guha

See Also
load.molecules, parse.smiles

eval.atomic.desc

Evaluate an Atomic Descriptor

Description
The CDK implements a number of descriptors divided into three main groups: atomic, molecular and bond. This method evaluates the specified atomic descriptor(s) for a molecule.

Usage
eval.atomic.desc(molecule, which.desc, verbose=FALSE)

Arguments
molecule: A reference to a CDK IAtomContainer object
which.desc: The fully qualified class name of the descriptor to evaluate, or a vector of such names
verbose: If TRUE, progress will be written to the screen; otherwise the function performs silently

Value
A data.frame is returned.

Author(s)
Rajarshi Guha

See Also
get.atomic.desc.names, get.desc.names, eval.desc


eval.desc

Evaluate a Molecular Descriptor

Description
The CDK implements a number of descriptors divided into three main groups: atomic, molecular and bond. This method evaluates the specified molecular descriptor(s) for a molecule.

Usage
eval.desc(molecules, which.desc, verbose=FALSE)

Arguments
molecules: A single IAtomContainer object or a list of references to CDK IAtomContainer objects
which.desc: The fully qualified class name of the descriptor to evaluate, or a vector of such names
verbose: If TRUE, progress will be written to the screen; otherwise the function performs silently

Value
A data.frame is returned. For a single molecule it will have one row; for multiple molecules the number of rows equals the number of molecules.

Author(s)
Rajarshi Guha

See Also
get.desc.names, get.desc.categories

Examples
smiles <- c("CCC", "c1ccccc1", "CC(=O)C")
mols <- parse.smiles(smiles)
dnames <- get.desc.names("topological")
descs <- eval.desc(mols, dnames, verbose=TRUE)


generate.formula

Generate a cdkFormula object.

Description
This function generates a list of cdkFormula objects given a mass.

Usage
generate.formula(mass, window=0.01, elements=list(c("C",0,50),c("H",0,50), c("N",0,50),c("O",0,50), c("S",0,50)), validation=FALSE, charge=0.0)

Arguments
mass: The mass value from which to generate the formulas.
window: The window accuracy in the same units as mass.
elements: Elements to take into account.
validation: If TRUE, the method only generates valid formulas. If FALSE, nonsensical formulae may be generated, which must be filtered out by the user.
charge: The charge value of the formula.

Value
Objects of class MassToFormulaTool, from the IMolecularFormula package

Author(s)
Miguel Rojas-Cherto

See Also
get.formula, set.charge.formula, get.isotopes.pattern, isvalid.formula

Examples
mfSet <- generate.formula(18.03383, charge = 1,
                          elements = list(c("C", 0, 50), c("H", 0, 50), c("N", 0, 50)))
for (i in mfSet) {
  print(i)
}
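To make the mass, window and charge semantics concrete, here is a naive brute-force sketch in base R that mirrors the example above (target mass 18.03383, charge 1, window 0.01, elements C, H and N up to 50 each). It is not the CDK MassToFormulaTool algorithm and performs no validation; the function name brute.formula and the hard-coded monoisotopic and electron masses are assumptions of this sketch.

```r
# Naive brute-force sketch, NOT the CDK MassToFormulaTool algorithm.
# Assumed monoisotopic masses for C, H, N and the electron mass.
mono <- c(C = 12.0, H = 1.00782503207, N = 14.0030740048)
electron <- 0.00054858

# Hypothetical helper: enumerate every C/H/N composition up to max.counts and
# keep those whose charge-corrected mass lies within `window` of `mass`.
brute.formula <- function(mass, window = 0.01, charge = 0,
                          max.counts = c(C = 50, H = 50, N = 50)) {
  grid <- expand.grid(C = 0:max.counts[["C"]],
                      H = 0:max.counts[["H"]],
                      N = 0:max.counts[["N"]])
  # a cation with charge z has lost z electrons
  m <- as.vector(as.matrix(grid) %*% mono[colnames(grid)]) - charge * electron
  keep <- abs(m - mass) <= window
  hits <- grid[keep, , drop = FALSE]
  hits$mass <- m[keep]
  hits
}

brute.formula(18.03383, charge = 1)
```

Under these assumptions the only composition inside the window is N1H4 (NH4+, about 18.0338 after subtracting one electron mass); CH6+, at about 18.0464, falls just outside it. Exhaustive enumeration grows multiplicatively with each added element, which is one reason a dedicated tool is preferable for real searches.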

get.atomic.desc.names

Get the names of the available atomic descriptors

Description
The CDK implements a number of descriptors divided into three main groups: atomic, molecular and bond. This method returns the names of the available atomic descriptors.

Usage
get.atomic.desc.names(type = "all")

Arguments
type: A string which can be one of "all", "topological", "geometrical", "hybrid", "constitutional" or "electronic", allowing you to choose atomic descriptors of specific categories. The keyword "all" will return all available descriptors.

Value
A vector of fully qualified descriptor names.

Author(s)
Rajarshi Guha

See Also
eval.atomic.desc, get.desc.names, eval.desc

get.atoms

Get the atoms from a molecule or bond

Description
This function returns a list containing IAtom objects from a molecule or a bond. get.atom.count returns the number of atoms in a molecule.

Usage
get.atoms(object)
get.atom.count(molecule)

Arguments
object: A jobjRef representing an IAtomContainer, IMolecule or IBond object
molecule: A jobjRef representing an IAtomContainer

Value
For get.atoms, a list containing jobjRefs to CDK IAtom objects; for get.atom.count, the number of atoms in the molecule

Author(s)
Rajarshi Guha

See Also
get.bonds, get.point3d, get.symbol

get.bonds

Get the bonds from a molecule

Description
This function returns a list containing IBond objects from a molecule.

Usage
get.bonds(molecule)

Arguments
molecule: A jobjRef representing an IAtomContainer or IMolecule object

Value
A list containing jobjRefs to CDK IBond objects

Author(s)
Rajarshi Guha

See Also
get.atoms, get.connected.atom


get.connected.atom

Get the atom connected to an atom in a bond

Description
This function returns the atom that is connected to a specified atom in a specified bond. Note that this function assumes 2-atom bonds, mainly because the CDK does not currently support other types of bonds.

Usage
get.connected.atom(bond, atom)

Arguments
bond: A jobjRef representing an IBond object
atom: A jobjRef representing an IAtom object

Value
A jobjRef representing an IAtom object

Author(s)
Rajarshi Guha

See Also
get.atoms

get.desc.categories

Get Descriptor Class Names

Description
This function returns the broad descriptor categories that are available, for example topological, geometrical and so on. Using a specific category avoids calculating all descriptors for a set of molecules and saves you from having to select individual descriptors by hand.

Usage
get.desc.categories()

Value
A character vector of descriptor category names

Author(s)
Rajarshi Guha

See Also
eval.desc, get.desc.names


get.desc.names

Get Descriptor Class Names

Description
The CDK implements a number of descriptors divided into three main groups: atomic, molecular and bond. Currently the package will only evaluate molecular descriptors. This function returns the class names of the available descriptors, which can then be used to calculate descriptors for a specific molecule. By default all available descriptor class names are returned. However, it is possible to specify that a subset of the descriptors should be considered. The subset is specified by keyword and can be one of: topological, geometrical, hybrid, constitutional, protein, electronic.

Usage
get.desc.names(type = "all")

Arguments
type: Indicates which subset of molecular descriptors should be considered

Value
A character vector of descriptor class names

Author(s)
Rajarshi Guha

See Also
eval.desc, get.desc.categories


get.fingerprint

Evaluate Fingerprints

Description
This function evaluates fingerprints of a specified type for a set of molecules or a single molecule. Depending on the nature of the fingerprint, parameters can be specified. The following fingerprint types are available:
standard - Considers paths of a given length. The default depth is 6 (see Usage), but can be changed. These are hashed fingerprints, with a default length of 1024.
extended - Similar to the standard type, but takes rings and atomic properties into account.
graph - Similar to the standard type, but simply considers connectivity.
hybridization - Similar to the standard type, but only considers hybridization state.
maccs - The popular 166-bit MACCS keys described by MDL.
estate - 79-bit fingerprints corresponding to the E-State atom types described by Hall and Kier.
pubchem - 881-bit fingerprints defined by PubChem.
kr - 4860-bit fingerprint defined by Klekota and Roth.
shortestpath - A fingerprint based on the shortest paths between pairs of atoms that takes into account ring systems, charges etc.
signature - A feature,count type of fingerprint, similar in nature to circular fingerprints, but based on the signature descriptor.
Depending on whether the input is a list of molecules or a single IAtomContainer object, a list or a single fingerprint is returned. Each fingerprint is an S4 object of class fingerprint-class or featvec-class, which can be manipulated with the fingerprint package.

Usage
get.fingerprint(molecule, type = "standard", fp.mode = "bit", depth=6, size=1024, verbose=FALSE)

Arguments
molecule: An IAtomContainer object, which can be obtained by loading it from disk or drawing it in the editor.
type: The type of fingerprint. See Description for possible values. The default is the standard binary fingerprint.
fp.mode: The type of fingerprint to return. Possible values are "bit", "raw" and "count". The raw mode will return a featvec-class type of fingerprint, representing fragments and their count of occurrence in the molecule. The count mode is similar, except that it returns hash values of fragments and their count of occurrence. While any of these values can be specified, a given fingerprint implementation may not implement all of them, and in those cases the return value is NULL.
depth: The search depth. This argument is ignored for the pubchem, maccs, kr and estate fingerprints.
size: The length of the fingerprint bit string. This argument is ignored for the pubchem, maccs, kr, signature and estate fingerprints.
verbose: If TRUE, exceptions, if they occur, will be printed.

Value
Objects of class fingerprint-class or featvec-class, from the fingerprint package. If there is a problem during fingerprint calculation, NULL is returned.
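The standard type is described above as a hashed fingerprint of fixed length (1024 bits by default). As a rough illustration of what "hashed" means, the toy sketch below maps feature strings (for example atom paths written as "C-C-N") to bit positions with a simple polynomial string hash. The hash, the path notation and the name fold.features are assumptions of this sketch; the CDK uses its own path enumeration and hashing, so these bit positions will not match real CDK fingerprints.

```r
# Toy sketch of a hashed ("folded") fingerprint, NOT the CDK scheme.
# Each feature string is hashed to one bit in a fixed-length logical vector.
fold.features <- function(features, size = 1024) {
  bits <- logical(size)
  for (f in features) {
    h <- 0
    # simple polynomial string hash, reduced modulo the fingerprint length
    for (code in utf8ToInt(f)) h <- (h * 31 + code) %% size
    bits[h + 1] <- TRUE  # R vectors are 1-indexed
  }
  bits
}

fp <- fold.features(c("C", "C-C", "C-C-N"), size = 64)
which(fp)  # 4 58 59
```

Different features can land on the same bit (a collision), so a larger size lowers the chance that unrelated substructures set the same bits.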

Author(s) Rajarshi Guha

References Faulon et al, The Signature Molecular Descriptor. 1. Using Extended Valence Sequences in QSAR and QSPR studies, J. Chem. Inf. Comput. Sci., 2003, 43, 707-720.

See Also load.molecules

Examples
## get some molecules
sp <- get.smiles.parser()
smiles <- c("CCC", "CCN", "CCN(C)(C)", "c1ccccc1Cc1ccccc1", "C1CCC1CC(CN(C)(C))CC(=O)CC")
mols <- parse.smiles(smiles)
## get a single fingerprint using the standard
## (hashed, path based) fingerprinter
fp <- get.fingerprint(mols[[1]])
## get MACCS keys for all the molecules
fps <- lapply(mols, get.fingerprint, type = "maccs")
## get Signature fingerprint
## feature, count fingerprinter
fps <- lapply(mols, get.fingerprint, type = "signature", fp.mode = "raw")
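Bit fingerprints such as the ones computed in the example above are commonly compared with the Tanimoto coefficient: the number of bits set in both fingerprints divided by the number set in either. The fingerprint package has its own comparison functions; the base-R sketch below only illustrates the arithmetic, assuming two bit fingerprints represented as logical vectors of equal length rather than as fingerprint-class objects.

```r
# Tanimoto coefficient for two equal-length bit vectors:
# |A AND B| / |A OR B|
tanimoto <- function(a, b) {
  stopifnot(length(a) == length(b))
  n.union <- sum(a | b)
  # convention chosen for this sketch: two empty fingerprints count as identical
  if (n.union == 0) return(1)
  sum(a & b) / n.union
}

a <- c(TRUE, TRUE, FALSE, TRUE, FALSE)
b <- c(TRUE, FALSE, FALSE, TRUE, TRUE)
tanimoto(a, b)  # 2 / 4 = 0.5
```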


get.formula

Get the formula object from a formula character.

Description
Given a formula as a character string, this function returns a formula object containing the mass, the formula string and the isotopes.

Usage
get.formula(mf, charge=0)

Arguments
mf: A string containing the molecular formula of the chemical object.
charge: The charge of the molecular formula.

Value
Objects of class cdkFormula, from the IMolecularFormula package

Author(s)
Miguel Rojas-Cherto

References
A parallel effort to expand the Chemistry Development Kit: http://cdk.sourceforge.net

See Also
set.charge.formula, get.isotopes.pattern, isvalid.formula, generate.formula

Examples
formula <- get.formula("NH4", charge = 1)
formula


get.isotopes.pattern

Generate the isotope pattern.

Description
This function generates the isotope pattern for a given cdkFormula object. It modifies the IMolecularFormula Java object as well as its mass.

Usage
get.isotopes.pattern(formula, minAbund=0.1)

Arguments
formula: A cdkFormula object.
minAbund: Minimal abundance of the isotopes to be added in the combinatorial search.

Value
Objects of class IsotopePatternGenerator, from the IMolecularFormula package

Author(s)
Miguel Rojas-Cherto

References
A parallel effort to expand the Chemistry Development Kit: http://cdk.sourceforge.net

See Also
get.formula, set.charge.formula, isvalid.formula, generate.formula
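An isotope pattern can be viewed as the convolution of the isotope distributions of all atoms in the formula. The base-R sketch below works at nominal-mass resolution only (offsets in whole mass units above the lightest peak), covers only C, H and N with two isotopes each, and uses standard natural abundances. The function names and the reading of minAbund as a cut-off relative to the most intense peak are assumptions of this sketch, not a description of the CDK IsotopePatternGenerator.

```r
# Assumed natural isotope abundances (lightest isotope first); two isotopes per
# element, nominal-mass resolution only.
iso <- list(
  C = c(0.9893, 0.0107),      # 12C, 13C
  H = c(0.999885, 0.000115),  # 1H, 2H
  N = c(0.99636, 0.00364)     # 14N, 15N
)

# Discrete convolution of two abundance vectors indexed by mass offset.
conv <- function(a, b) {
  out <- numeric(length(a) + length(b) - 1)
  for (i in seq_along(a))
    for (j in seq_along(b))
      out[i + j - 1] <- out[i + j - 1] + a[i] * b[j]
  out
}

# Hypothetical helper: fold in one isotope distribution per atom, then drop
# peaks below minAbund relative to the most intense peak (an assumed reading
# of minAbund).
isotope.pattern <- function(counts, minAbund = 0.1) {
  p <- 1
  for (el in names(counts))
    for (k in seq_len(counts[[el]])) p <- conv(p, iso[[el]])
  keep <- p / max(p) >= minAbund
  data.frame(offset = which(keep) - 1, abundance = p[keep])
}

isotope.pattern(c(C = 2), minAbund = 0)
```

For two carbons and minAbund = 0 the result reduces to the binomial terms 0.9893^2, 2(0.9893)(0.0107) and 0.0107^2, which sum to 1.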

get.mol2formula

Parse a molecule into a formula object.

Description
This function converts a molecule object to a formula object.

Usage
get.mol2formula(molecule, charge=0)

Arguments
molecule: The molecule to be parsed.
charge: The charge characterizing the molecule.

Value
Objects of class MolecularFormulaManipulator, from the IMolecularFormulaManipulator package

Author(s)
Miguel Rojas-Cherto

See Also
set.charge.formula, get.isotopes.pattern, isvalid.formula

Examples
molecule <- parse.smiles("N")[[1]]
convert.implicit.to.explicit(molecule)
formula <- get.mol2formula(molecule, charge=0)
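Formula strings are conventionally written in Hill order: carbon first, then hydrogen, then the remaining elements alphabetically; when there is no carbon, all elements (including H) are listed alphabetically. The base-R sketch below applies that ordering to a vector of element symbols. hill.formula is a hypothetical helper, not the CDK implementation, and this sketch does not claim that get.mol2formula output follows exactly this layout.

```r
# Hypothetical helper (not part of rcdk): Hill-order formula string from a
# vector of element symbols, one entry per atom.
hill.formula <- function(symbols) {
  counts <- table(symbols)
  els <- names(counts)
  if ("C" %in% els) {
    # Hill order: C first, then H, then the rest alphabetically
    order.els <- c("C", if ("H" %in% els) "H", sort(setdiff(els, c("C", "H"))))
  } else {
    # no carbon: everything alphabetically
    order.els <- sort(els)
  }
  n <- as.integer(counts[order.els])
  paste0(order.els, ifelse(n > 1, n, ""), collapse = "")
}

hill.formula(c("N", "H", "H", "H"))                          # "H3N"
hill.formula(c("C", "C", "O", "H", "H", "H", "H", "H", "H"))  # "C2H6O"
```

With rcdk attached, a symbol vector could be collected with sapply(get.atoms(molecule), get.symbol) after convert.implicit.to.explicit, as in the example above.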

get.murcko.fragments

Molecule Fragmentation Methods

Description
A variety of methods for fragmenting molecules are available, ranging from exhaustive fragmentation and ring systems to more specific methods such as Murcko frameworks. Fragmenting a collection of molecules can be useful for a variety of analyses. In addition, fragment-based analysis can be a useful and faster alternative to traditional clustering of the whole collection, especially when it is large. Note that exhaustive fragmentation of large molecules (with many single bonds) can become time consuming.

Usage
get.murcko.fragments(mols, min.frag.size = 6, as.smiles = TRUE, single.framework = FALSE)
get.exhaustive.fragments(mols, min.frag.size = 6, as.smiles = TRUE)

Arguments
mols: A molecule object or list of molecule objects. Each object should have a jclass of IAtomContainer.
min.frag.size: The size of the smallest fragments to be considered.
as.smiles: If TRUE, the fragments are returned as SMILES strings, otherwise as IAtomContainer objects.
single.framework: If TRUE, then a single framework (i.e., the framework consisting of the union of all ring systems and linkers) is returned for each molecule. Otherwise, all combinations of ring systems and linkers are returned.

Value
get.murcko.fragments returns a list with each element being a list with two elements: rings and frameworks. Each of these elements is either a character vector of SMILES strings or a list of IAtomContainer objects.
get.exhaustive.fragments returns a list of length equal to the number of input molecules. Each element is a character vector of SMILES strings or a list of IAtomContainer objects.

Author(s)
Rajarshi Guha

See Also
load.molecules, parse.smiles

Examples
mol <- parse.smiles("c1ccc(cc1)CN(c2cc(ccc2[N+](=O)[O-])c3c(nc(nc3CC)N)N)C")[[1]]
mf1 <- get.murcko.fragments(mol, as.smiles=TRUE, single.framework=TRUE)
mf2 <- get.murcko.fragments(mol, as.smiles=TRUE, single.framework=FALSE)

get.properties

Get All Property Values of a Molecule

Description
Returns a list of all the properties of a molecule. The names of the list are set to the property names.

Usage
get.properties(molecule)

Arguments
molecule: A Java object of class IAtomContainer or IMolecule

Value
A list of the property values, with names equal to the property names. NULL property values are returned as NA.

Author(s)
Rajarshi Guha

See Also
get.property, set.property, remove.property

Examples
smiles <- "c1ccccc1"
mol <- parse.smiles(smiles)[[1]]
set.property(mol, "prop1", 23.45)
set.property(mol, "prop2", "inactive")
get.properties(mol)

get.property

Get the Value of a Molecule Property

Description
This function retrieves the value of a keyed property that has previously been set on the molecule. The get.title function is simply a wrapper around get.property that directly provides access to the molecule title.

Usage
get.property(molecule, key)
get.title(molecule)

Arguments
molecule: A Java object of class IAtomContainer
key: A string naming the property

Value
The value of the property if the key is found, else NA. For get.title, the title of the molecule if available, otherwise NA.

Author(s)
Rajarshi Guha

See Also
get.properties, set.property, remove.property

Examples
smiles <- "c1ccccc1"
mol <- parse.smiles(smiles)[[1]]
set.property(mol, "prop1", 23.45)
set.property(mol, "prop2", "inactive")
get.property(mol, "prop1")


get.smiles

Get the SMILES for a Molecule

Description
The function will generate a SMILES representation of an IAtomContainer object. The default parameters of the CDK SMILES generator are used, which can mean that the method fails for large ring systems. See the CDK Javadocs for more information.

Usage
get.smiles(molecule)

Arguments
molecule: A Java object of class IAtomContainer

Value
An R character object containing the SMILES

Author(s)
Rajarshi Guha

Examples
sp <- get.smiles.parser()
smiles <- c("CCC", "CCN", "CCN(C)(C)", "c1ccccc1Cc1ccccc1", "C1CCC1CC(CN(C)(C))CC(=O)CC")
mols <- parse.smiles(smiles)
smis <- unlist(lapply(mols, get.smiles))


get.smiles.parser

Get a SMILES Parser

Description
This function returns a reference to a SMILES parser object. If you are parsing multiple SMILES strings, it is preferable to create your own parser and supply it to parse.smiles, rather than forcing that function to instantiate a new parser for each call.

Usage
get.smiles.parser()

Value
A jobjRef to a CDK SmilesParser object

Author(s)
Rajarshi Guha

See Also
get.smiles, get.smiles.parser, view.molecule.2d

get.total.charge

Get the Total Charges for the Molecule

Description
get.total.charge returns the summed partial charges for a molecule and get.total.formal.charge returns the summed formal charges. Currently, if one or more partial charges are unset, the function simply returns the sum of formal charges (via get.total.formal.charge). This is slightly different from how the CDK evaluates the total charge of a molecule (via AtomContainerManipulator.getTotalCharge()), but is in line with how OEChem determines net charge on a molecule. In general, you will want to use the get.total.charge function.

Usage
get.total.charge(molecule)
get.total.formal.charge(molecule)

Arguments
molecule: A Java object of class IAtomContainer

Value
A double value indicating the total partial charge or total formal charge

Author(s)
Rajarshi Guha
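The fallback rule for get.total.charge can be stated compactly in R. This is a sketch on plain numeric vectors standing in for per-atom charges, with NA marking an unset partial charge; total.charge is a hypothetical name, not rcdk's implementation, and the charge values are made up for illustration.

```r
# Hypothetical sketch of the documented rule (not rcdk's implementation):
# partial and formal are per-atom vectors; NA marks an unset partial charge.
total.charge <- function(partial, formal) {
  if (anyNA(partial)) return(sum(formal))  # fall back to summed formal charges
  sum(partial)
}

total.charge(partial = c(-0.5, 0.25, -0.75, 0), formal = c(0, 0, -1, 0))  # -1
total.charge(partial = c(NA, 0.1, -0.1), formal = c(0, 1, 0))            # 1
```

In the second call one unset partial charge is enough to switch the result to the formal-charge sum, which is the OEChem-like behaviour described above.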

get.total.hydrogen.count

Get the Total Hydrogen Count for a Molecule

Description
The function returns the sum of the implicit hydrogen counts of all atoms in the specified AtomContainer.

Usage
get.total.hydrogen.count(molecule)

Arguments
molecule: A Java object of class IAtomContainer

Value
An integer value indicating the number of implicit hydrogens

Author(s)
Rajarshi Guha

get.tpsa

Commonly Used Molecular Descriptors

Description
These methods return the value of the corresponding descriptors. While they can always be evaluated using eval.desc, they are common enough that separate functions are provided.

Usage
get.tpsa(molecule)
get.alogp(molecule)
get.xlogp(molecule)
get.volume(molecule)

Arguments
molecule: A jobjRef representing an IAtomContainer object

Details
It is important to note that ALogP and XLogP assume that the molecule has explicit hydrogens. If the molecule is read from an SD file, explicit Hs are usually present. On the other hand, if the molecule is obtained from a SMILES string, explicit hydrogens must be added. The molecular volume is calculated using a group contribution method rather than an analytical method, which avoids the need for 3D structures.

Value
A single numeric value representing TPSA, ALogP, XLogP or molecular volume.

Author(s)
Rajarshi Guha

See Also
eval.desc

hasNext

Does This Iterator Have A Next Element

Description
hasNext is a generic function that indicates whether the iterator has another element.

Usage
hasNext(obj, ...)
## S3 method for class 'iload.molecules'
hasNext(obj, ...)

Arguments
obj: An iterator object.
...: Additional arguments that are ignored.

Value
Logical value indicating whether the iterator has a next element. In the context of reading a structure file, this indicates whether there are more molecules to read.

See Also
iload.molecules
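The hasNext/nextElem protocol used with iload.molecules amounts to an object that remembers its position. The sketch below reproduces that pattern with a base-R closure over an in-memory character vector standing in for the lines of a SMILES file; make.line.iter, has.next and next.elem are hypothetical names, not the rcdk or iterators API.

```r
# Hypothetical closure-based iterator (not rcdk's implementation).
make.line.iter <- function(lines) {
  i <- 0  # position of the last element handed out
  list(
    has.next  = function() i < length(lines),
    next.elem = function() {
      if (i >= length(lines)) stop("StopIteration")
      i <<- i + 1  # advance the shared position
      lines[i]
    }
  )
}

it <- make.line.iter(c("CCC", "c1ccccc1", "CC(=O)C"))
seen <- character(0)
while (it$has.next()) seen <- c(seen, it$next.elem())
seen
```

In this sketch the data are already in memory; a file-backed iterator would instead read one record per next.elem call, which is what allows large structure files to be processed.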


is.connected

Get the Largest Component in a Disconnected Molecule

Description
These methods allow one to check whether a molecule is fully connected or else retrieve the largest disconnected component.

Usage
get.largest.component(mol)
is.connected(mol)

Arguments
mol: A jobjRef representing an IAtomContainer object

Value
For get.largest.component, if the input molecule has more than one disconnected component, the largest is returned. Otherwise, the molecule itself is returned. For is.connected, TRUE if the molecule is fully connected, FALSE otherwise.

Author(s)
Rajarshi Guha

Examples
m <- parse.smiles("CC.CCCCCC.CCCC")[[1]]
largest <- get.largest.component(m)
length(get.atoms(largest)) == 6
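Whether a molecule is connected comes down to counting the connected components of its molecular graph. The base-R sketch below finds components by breadth-first search over an explicit bond list, here the heavy-atom graph of the "CC.CCCCCC.CCCC" example written out by hand. The function name and the 1-based atom numbering are assumptions of this sketch, and it is not the code rcdk uses.

```r
# Hypothetical sketch (not rcdk's implementation): label connected components
# by breadth-first search. Atoms are numbered 1..n; bonds is a 2-column matrix.
components <- function(n, bonds) {
  adj <- vector("list", n)
  for (k in seq_len(nrow(bonds))) {
    i <- bonds[k, 1]; j <- bonds[k, 2]
    adj[[i]] <- c(adj[[i]], j)
    adj[[j]] <- c(adj[[j]], i)
  }
  comp <- integer(n)  # 0 = not yet visited
  id <- 0L
  for (start in seq_len(n)) {
    if (comp[start] != 0L) next
    id <- id + 1L
    comp[start] <- id
    queue <- start
    while (length(queue) > 0) {
      v <- queue[1]; queue <- queue[-1]
      for (w in adj[[v]]) if (comp[w] == 0L) { comp[w] <- id; queue <- c(queue, w) }
    }
  }
  comp
}

# heavy-atom graph of "CC.CCCCCC.CCCC": chains of 2, 6 and 4 carbons
bonds <- rbind(c(1, 2),
               c(3, 4), c(4, 5), c(5, 6), c(6, 7), c(7, 8),
               c(9, 10), c(10, 11), c(11, 12))
comp <- components(12, bonds)
sizes <- table(comp)
length(sizes) == 1  # fully connected? FALSE
max(sizes)          # atoms in the largest component: 6
```

The largest component has 6 atoms, consistent with the length(get.atoms(largest)) == 6 check in the example above.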


isvalid.formula

Validate a cdkFormula object.

Description
This function validates a cdkFormula object. At the moment it uses the nitrogen rule and the RDBE (rings plus double bonds equivalent) rule.

Usage
isvalid.formula(formula, rule = c("nitrogen","RDBE"))

Arguments
formula: A cdkFormula object.
rule: The rules to be applied: nitrogen and RDBE.

Value
Objects of class MolecularFormulaChecker, from the IMolecularFormula package

Author(s)
Miguel Rojas-Cherto

References
A parallel effort to expand the Chemistry Development Kit: http://cdk.sourceforge.net

See Also
get.formula, set.charge.formula, get.isotopes.pattern, generate.formula

Examples
formula <- get.formula("NH4", charge = 0)
isvalid.formula(formula, rule = c("nitrogen","RDBE"))
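The two rules can be written out for a formula given as named element counts. RDBE is computed here as C - H/2 + N/2 + 1, counting Si with C, halogens with H and P with N (standard valence assumptions). The nitrogen rule for neutral, even-electron molecules says that the nominal mass is odd exactly when the number of nitrogen atoms is odd. The function names are hypothetical and the nominal masses cover only C, H, N and O; this sketches the rules, not the CDK MolecularFormulaChecker.

```r
# Hypothetical helpers sketching the RDBE and nitrogen rules (not the CDK code).
# Assumed valences: C/Si 4; N/P 3; H and halogens 1 (O and S do not contribute).
rdbe <- function(counts) {
  cnt <- function(el) if (el %in% names(counts)) counts[[el]] else 0
  nC <- cnt("C") + cnt("Si")
  nH <- cnt("H") + cnt("F") + cnt("Cl") + cnt("Br") + cnt("I")
  nN <- cnt("N") + cnt("P")
  nC - nH / 2 + nN / 2 + 1
}

# Nominal (integer) masses; only C, H, N and O are covered in this sketch.
nominal <- c(C = 12, H = 1, N = 14, O = 16)

# Nitrogen rule for neutral, even-electron molecules:
# odd nominal mass <=> odd number of nitrogen atoms.
nitrogen.rule.ok <- function(counts) {
  m <- sum(counts * nominal[names(counts)])
  n <- if ("N" %in% names(counts)) counts[["N"]] else 0
  (m %% 2) == (n %% 2)
}

rdbe(c(C = 6, H = 6))                     # benzene: 4
nitrogen.rule.ok(c(C = 5, H = 5, N = 1))  # pyridine, nominal mass 79: TRUE
rdbe(c(N = 1, H = 4))                     # neutral NH4: -0.5
nitrogen.rule.ok(c(N = 1, H = 4))         # nominal mass 18, one N: FALSE
```

For a neutral, even-electron formula the RDBE is expected to be a non-negative integer, so under these sketches neutral NH4, as in the example above, fails both checks.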


load.molecules

Load Molecular Structures From Disk

Description
The CDK can read a variety of molecular structure formats. This function encapsulates the calls to the CDK API to load a structure given its filename.

Usage
load.molecules(molfiles=NA, aromaticity = TRUE, typing = TRUE, isotopes = TRUE, verbose=FALSE)
iload.molecules(molfile, type="smi", aromaticity = TRUE, typing = TRUE, isotopes = TRUE, skip=TRUE)

Arguments
molfiles: A character vector of filenames. Note that the full path to the files should be provided. URLs can also be used as paths; in such a case, the URL should start with "http://".
molfile: A string containing the filename to load. Must be a local file.
type: Indicates whether the input file is SMILES or SDF. Valid values are "smi" or "sdf".
aromaticity: If TRUE then aromaticity detection is performed on all loaded molecules. If this fails for a given molecule, then the molecule is set to NA in the return list.
typing: If TRUE then atom typing is performed on all loaded molecules. The assigned types will be CDK internal types. If this fails for a given molecule, then the molecule is set to NA in the return list.
isotopes: If TRUE then atoms are configured with isotopic masses.
verbose: If TRUE, output (such as file download progress) will be verbose.
skip: If TRUE, then the reader will continue reading even when faced with an invalid molecule. If FALSE, the reader will stop at the first invalid molecule.

Details
Note that if molecules are read in from formats that do not have rules for handling implicit hydrogens (such as MDL MOL), the molecule will not have implicit or explicit hydrogens. To add explicit hydrogens, make sure that the molecule has been typed (this is TRUE by default for this function) and then call convert.implicit.to.explicit. On the other hand, for a format such as SMILES, implicit or explicit hydrogens will be present.

Value
load.molecules returns a list of CDK Molecule objects, which can be used in other rcdk functions.
iload.molecules is an iterating version of the loader and is applicable to large SMILES or SDF files. In contrast to load.molecules, it does not load all the molecules into memory at once, and as a result lets you process arbitrarily large structure files.

Author(s)
Rajarshi Guha

See Also
view.molecule.2d, convert.implicit.to.explicit

Examples
## Not run:
## load a single file
amol <- load.molecules("foo.sdf")
## load multiple files
mols <- load.molecules(c("mol1.sdf", "mol2.smi",
  "https://github.com/rajarshi/cdkr/blob/master/data/set2/dhfr00008.sdf?raw=true"))
## iterate over a large file
moliter <- iload.molecules("big.sdf", type = "sdf")
while (hasNext(moliter)) {
  mol <- nextElem(moliter)
  print(get.property(mol, "cdk:Title"))
}
## End(Not run)

matches

Perform Substructure Searching & MCS Detection

Description These functions perform substructure searches of a query, specied in SMILES or SMARTS forms, over one or more target molecules and maximum common substructure searches for pairs of molecules. Usage matches(query, target, return.matches=FALSE) is.subgraph(query, target) get.mcs(mol1, mol2, as.molecule = TRUE)

28 Arguments query target mol1 mol2 A SMILES or SMARTS string A single IAtomContainer object or a list of IAtomContainer objects An IAtomContainer An IAtomContainer

matches

return.matches If TRUE the lists of atom indices that correspond to the matching substructure are returned as.molecule If TRUE the MCS is returned as a new IAtomContainer object. Otherwise a atom index maping between the two molecules is returned as a 2D array of integers

Details For the case of is.subgraph, the query molecule must be a single IAtomContainer or a valid SMILES string. This method can be significantly faster than matches, but it is limited by the fact that SMARTS patterns cannot be specified. It uses the "TurboSubStructure" SMSD method and so only searches for the first substructure match.

For MCS detection, the default SMSD algorithm is employed and the best scoring MCS is returned by default. The resultant MCS can be obtained in one of two forms. The first is an IAtomContainer, in which the atoms and bonds are clones of the corresponding matching atoms and bonds in one of the molecules. The second is a 2D array of dimensions Nx2 of atom index mappings. Here N is the size of the MCS. The first column holds the atom index from the first molecule and the second column holds the atom index from the second molecule.

The CDK SMARTS matcher internally performs aromaticity perception and atom typing. As a result, the target molecules need not have these operations done on them beforehand for the matches method. However, if is.subgraph or get.mcs is being used, aromaticity detection and atom typing should be performed on the molecules explicitly. If the atom indices of the matching substructures (in the target molecule) are desired, use the matches function directly.

Value For matches with return.matches = FALSE, the value is a boolean vector. Each element is TRUE or FALSE depending on whether the corresponding element in targets contains the query or not.

If return.matches = TRUE, the return value is a list of lists. The number of elements of the top-level list equals the number of target molecules. Each element is a list of two elements, named "match" and "mapping". The first element is TRUE if the query matched the target. If so, the second element is a list of numeric vectors giving the atom indices (0-indexed) of the target atoms that matched the query. If there was no match for this target molecule, this element will be NULL.

For is.subgraph, the value is a boolean vector. Each element is TRUE or FALSE depending on whether the corresponding element in targets contains the query or not.

For get.mcs, the value is an IAtomContainer object or a 2D array of atom index mappings between the two molecules.

Author(s) Rajarshi Guha (<[email protected]>)
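get.mcs can return its result as an Nx2 array of atom index mappings instead of a molecule. That array can be turned into a lookup from atoms of the first molecule to atoms of the second. The sketch below is illustrative only:

- mcs.lookup is a hypothetical helper, not part of rcdk.
- The mock matrix stands in for real get.mcs output.
- The indices are kept exactly as returned, with no re-basing.

```r
# Hypothetical helper, not part of rcdk. 'mapping' is assumed to be the N x 2
# array returned by get.mcs when as.molecule = FALSE: column 1 holds atom
# indices in the first molecule, column 2 the matching indices in the second.
mcs.lookup <- function(mapping) {
  mapping <- as.matrix(mapping)
  stopifnot(ncol(mapping) == 2)
  # Named vector: names are first-molecule indices, values second-molecule ones
  lookup <- mapping[, 2]
  names(lookup) <- as.character(mapping[, 1])
  lookup
}

# Mock 3-atom MCS mapping (filled column-wise: first line is column 1)
m <- matrix(c(0, 1, 2,
              5, 3, 4), ncol = 2)
lk <- mcs.lookup(m)
lk[["1"]]  # 3: atom 1 of the first molecule maps to atom 3 of the second
```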

See Also load.molecules, get.smiles, do.aromaticity, do.typing, do.isotopes
Examples


library(rcdk)
smiles <- c("CCC", "c1ccccc1", "C(C)(C=O)C(CCNC)C1CC1C(=O)")
mols <- sapply(smiles, parse.smiles)
query <- "[#6]=O"
doesMatch <- matches(query, mols)
## get mappings
mappings <- matches("CCC", mols, TRUE)
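The mappings object in the example above nests per-target results inside a list of lists. A small helper makes it easier to read. The sketch below is hypothetical: summarize.matches is not part of rcdk, and the mock list only imitates the documented structure. It assumes the following:

- There is one element per target.
- Each element holds a logical "match".
- Each element holds a "mapping" list of 0-indexed atom indices.

The helper shifts those indices to R's 1-based convention.

```r
# Hypothetical helper, not part of rcdk. 'res' is assumed to be the value of
# matches(query, targets, return.matches = TRUE): one element per target, each
# a list with a logical "match" and a "mapping" list of 0-indexed atom indices.
summarize.matches <- function(res) {
  # Which targets contained the query?
  hit <- vapply(res, function(x) isTRUE(x$match), logical(1))
  # Shift CDK's 0-based atom indices to R's 1-based indices; NULL if no match
  mapping <- lapply(res, function(x) {
    if (!isTRUE(x$match)) return(NULL)
    lapply(x$mapping, function(idx) as.integer(idx) + 1L)
  })
  list(hit = hit, mapping = mapping)
}

# Mock value standing in for real matches() output
res <- list(
  list(match = TRUE, mapping = list(c(0, 1, 2), c(2, 1, 0))),
  list(match = FALSE, mapping = NULL)
)
s <- summarize.matches(res)
s$hit               # TRUE FALSE
s$mapping[[1]][[1]] # 1 2 3
```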


Molecule

Operations on molecules

Description Various functions to perform operations on molecules.
get.exact.mass returns the exact mass of a molecule.
get.natural.mass returns the natural exact mass of a molecule.
convert.implicit.to.explicit converts implicit hydrogens to explicit hydrogens. This function does not return any value but rather modifies the molecule object passed to it.
is.neutral returns TRUE if all atoms in the molecule have a formal charge of 0, otherwise FALSE.
Usage
get.exact.mass(molecule)
get.natural.mass(molecule)
convert.implicit.to.explicit(molecule)
is.neutral(molecule)
Arguments
molecule  A jobjRef representing an IAtomContainer or IMolecule object
Details In some cases, a molecule may not have any hydrogens (such as when read in from an MDL MOL file that did not have hydrogens). In such cases, convert.implicit.to.explicit will add implicit hydrogens and then convert them to explicit ones. For such cases, also make sure that the molecule has been typed beforehand.

Value get.exact.mass returns a numeric.
get.natural.mass returns a numeric.
convert.implicit.to.explicit has no return value.
is.neutral returns a boolean.
Author(s) Rajarshi Guha (<[email protected]>)
See Also get.atoms, do.typing
Examples
library(rcdk)
m <- parse.smiles("c1ccccc1")[[1]]
## Need to configure the molecule
do.aromaticity(m)
do.typing(m)
do.isotopes(m)
get.exact.mass(m)
get.natural.mass(m)
convert.implicit.to.explicit(m)
get.natural.mass(m)
do.isotopes(m) # Configure isotopes of newly added hydrogens
get.exact.mass(m)
is.neutral(m)
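The difference between get.exact.mass and get.natural.mass can be checked by hand for benzene (C6H6). This sketch rests on two assumptions:

- The exact mass is the monoisotopic mass, using the most abundant isotope of each element.
- The natural mass is the abundance-weighted average mass.

The constants are standard reference values typed in for illustration, not read from the CDK, so CDK output may differ in the last decimals. formula.mass is a hypothetical helper.

```r
# Illustrative constants only (standard reference values, not read from the
# CDK's isotope tables, so CDK results may differ in the last decimals).
mono <- c(C = 12.0, H = 1.00782503207, O = 15.99491461956)  # most abundant isotope
avg  <- c(C = 12.0107, H = 1.00794, O = 15.9994)             # standard atomic weight

# Hypothetical helper: sum of (element mass x atom count) for a formula
# given as a named vector of element counts
formula.mass <- function(counts, masses) {
  sum(masses[names(counts)] * counts)
}

benzene <- c(C = 6, H = 6)
formula.mass(benzene, mono)  # about 78.04695, monoisotopic ("exact") mass
formula.mass(benzene, avg)   # about 78.11184, average ("natural") mass
```

The counts include all six hydrogens. This is one reason the example above re-runs do.isotopes after convert.implicit.to.explicit: newly added explicit hydrogens need isotope information before masses are recomputed.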


parse.smiles

Parse a Vector of SMILES Strings

Description This function parses a vector of SMILES strings to generate a list of IAtomContainer objects. The resultant molecules will not have any 2D or 3D coordinates. In contrast to the load.molecules method, no aromaticity perception, atom typing or isotopic configuration is performed on the molecules obtained from this method. You should therefore perform these steps manually.
Usage parse.smiles(smiles)

Arguments
smiles  A SMILES string, or a character vector of SMILES strings
Value A list of jobjRefs to the corresponding CDK IAtomContainer objects. If a SMILES string could not be parsed, NA is returned in its place.
Author(s) Rajarshi Guha (<[email protected]>)
See Also load.molecules, get.smiles, get.smiles.parser, view.molecule.2d, do.aromaticity, do.typing, do.isotopes
Examples
library(rcdk)
smiles <- c("CCC", "c1ccccc1", "C(C)(C=O)C(CCNC)C1CC1C(=O)")
mol <- parse.smiles(smiles[1])
mols <- parse.smiles(smiles)
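parse.smiles returns NA in place of any SMILES string it cannot parse. It is therefore worth separating failures before further processing. The sketch below is hypothetical: split.parsed is not part of rcdk. The mock list uses character strings as stand-ins for the jobjRef molecules that parse.smiles would return.

```r
# Hypothetical helper, not part of rcdk. Assumes 'mols' is the list returned by
# parse.smiles(smiles), where an unparseable SMILES yields NA instead of a
# molecule reference.
split.parsed <- function(smiles, mols) {
  # A failed parse is a length-one atomic NA; molecule references are not atomic
  failed <- vapply(mols, function(m) is.atomic(m) && length(m) == 1 && is.na(m),
                   logical(1))
  list(mols = mols[!failed], failed = smiles[failed])
}

# Mock: strings stand in for jobjRef molecules, NA marks a failed parse
smiles <- c("CCC", "C1CC(", "c1ccccc1")
mols <- list("mock-mol-1", NA, "mock-mol-3")
out <- split.parsed(smiles, mols)
out$failed  # "C1CC("
```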

remove.hydrogens

Remove Hydrogens from a Molecule

Description This function generates a new IAtomContainer object in which the hydrogens have been removed. This can be useful for descriptor calculations.
Usage remove.hydrogens(molecule)
Arguments
molecule  A Java object of class IAtomContainer
Value A jobjRef that refers to an IAtomContainer object
Author(s) Rajarshi Guha (<[email protected]>)


remove.property

Remove A Property From a Molecule

Description This function removes a keyed property from a molecule object. Both the key and its value are deleted from the molecule.
Usage remove.property(molecule, key)
Arguments
molecule  A Java object of class IAtomContainer
key  A string naming the property
Value None
Author(s) Rajarshi Guha (<[email protected]>)
See Also get.property, set.property

set.charge.formula

Set the charge to a cdkFormula object.

Description This function sets the charge of a cdkFormula object. It modifies the underlying IMolecularFormula Java object as well as its mass.
Usage set.charge.formula(formula, charge=-1)
Arguments
formula  A cdkFormula object.
charge  The value of the charge to set.

Value Returns the formula object with the specified charge
Author(s) Miguel Rojas-Cherto (<[email protected]>)
References A parallel effort to expand the Chemistry Development Kit: http://cdk.sourceforge.net
See Also get.formula, get.isotopes.pattern, isvalid.formula, generate.formula

set.property

Set A Property On A Molecule

Description This function allows one to add a keyed property to a molecule. The key must be a string, but the value can be a string, a numeric or even an arbitrary Java object (of class jobjRef).
Usage set.property(molecule, key, value)
Arguments
molecule  A Java object of class IAtomContainer
key  A string naming the property
value  The value of the property. This can be character, integer, double or of class jobjRef

Value None
Author(s) Rajarshi Guha (<[email protected]>)
See Also get.property, get.properties, remove.property
Examples
library(rcdk)
smiles <- "c1ccccc1"
mol <- parse.smiles(smiles)[[1]]
set.property(mol, "prop1", 23.45)
set.property(mol, "prop2", "inactive")
get.properties(mol)
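When several molecules carry keyed properties, it can be handy to collect them into a data frame. The sketch below is hypothetical: props.to.df is not part of rcdk. It assumes that get.properties returns, for each molecule, a named list whose values are scalar character or numeric values. Properties stored as arbitrary jobjRef objects would not fit in a data frame and are not handled. Keys missing from a molecule become NA.

```r
# Hypothetical helper, not part of rcdk. 'prop.list' is assumed to hold one
# named list per molecule (e.g. from get.properties), with scalar character or
# numeric values; Java object (jobjRef) values are not handled here.
props.to.df <- function(prop.list) {
  keys <- unique(unlist(lapply(prop.list, names)))
  rows <- lapply(prop.list, function(p) {
    # Missing keys become NA so every row has the same columns
    vals <- lapply(keys, function(k) if (is.null(p[[k]])) NA else p[[k]])
    names(vals) <- keys
    as.data.frame(vals, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Mock property lists standing in for get.properties() output
plist <- list(list(prop1 = 23.45, prop2 = "inactive"),
              list(prop1 = 1.5))
props.to.df(plist)
```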


view.molecule.2d

View and Copy 2D Structure Diagrams

Description The CDK is capable of generating 2D structure diagrams, and these methods allow one to view them. The result depends on the method called:
- A Swing JFrame is displayed, which allows resizing of the image.
- Or a raster image (derived from a PNG byte stream) is returned, which can be viewed using rasterImage.
It is also possible to copy a 2D depiction to the system clipboard, from which it can be pasted into various external applications.
Usage
view.molecule.2d(molecule, ncol = 4, cellx = 200, celly = 200)
view.image.2d(molecule, width = 200, height = 200)
copy.image.to.clipboard(molecule, width = 200, height = 200)
Arguments
molecule  If a single molecule is to be viewed, this should be a reference to an IAtomContainer object. If multiple molecules are to be viewed, this should be a list of such objects. If a character string is specified, it is taken as the name of a file and the molecules are loaded from the file
ncol  The number of columns if a grid is desired
cellx  The width of the grid cells
celly  The height of the grid cells
width  The width of the image
height  The height of the image
Details

For the case of view.molecule.2d, if a jobjRef is passed, it should be a reference to an IAtomContainer object. If the first argument is of class character, it is assumed to be a file name and the file is loaded by the function. This function can be used to view a single molecule or multiple molecules. If a list of molecule objects is supplied, the molecules are displayed as a grid of 2D viewers. If a file is specified, one or several molecules are displayed, depending on how many molecules are loaded from it. For view.image.2d, the image can be viewed via rasterImage.


copy.image.to.clipboard copies the 2D depiction to the system clipboard in PNG format. From there it can be pasted into other applications.
On OS X, event handling issues mean the depiction will show but the window will be unresponsive, and copying images to the clipboard will not work. As a result, on OS X we make use of a standalone helper that is run via the system command. Currently, this is supported for the view.molecule.2d method (for a single molecule) and the copy.image.to.clipboard method. In the future, other view methods will also be accessible via this mechanism. This allows OS X users to view molecules, but it is slow because it invokes a new process. On Linux and Windows the depictions work fine (i.e., there is no need to shell out).
Value view.molecule.2d and copy.image.to.clipboard do not return anything. view.image.2d returns an array of dimensions height x width x channels, derived from the original PNG version of the 2D depiction.
Author(s) Rajarshi Guha (<[email protected]>)
See Also view.table, rasterImage, readPNG
Examples
library(rcdk)
m <- parse.smiles("c1ccccc1C(=O)NC")[[1]]
## Not run:
img <- view.image.2d(m, 100, 100)
plot(1:10, 1:10, pch = 19)
rasterImage(img, 0, 8, 2, 10)
## End(Not run)
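The example above stretches the depiction into a fixed box. When the requested width and height differ, the image can instead be placed so that its aspect ratio is preserved. The sketch below is hypothetical: raster.corners is not part of rcdk. It relies only on the documented height x width x channels shape of the array returned by view.image.2d. A mock all-white RGBA array stands in for a real depiction. The ratio is preserved in user coordinates, which matches what appears on screen only when the plot uses asp = 1.

```r
# Hypothetical helper, not part of rcdk. Given an image array of dimensions
# height x width x channels (the shape view.image.2d is documented to return),
# compute rasterImage() corners that keep the image's aspect ratio in user
# coordinates, anchored at (x, y) with the requested width.
raster.corners <- function(img, x, y, width) {
  h.px <- dim(img)[1]
  w.px <- dim(img)[2]
  height <- width * h.px / w.px
  c(xleft = x, ybottom = y, xright = x + width, ytop = y + height)
}

# Mock 100 x 200 RGBA image standing in for view.image.2d() output
img <- array(1, dim = c(100, 200, 4))
r <- raster.corners(img, x = 1, y = 8, width = 2)
plot(1:10, 1:10, pch = 19, asp = 1)
rasterImage(img, r["xleft"], r["ybottom"], r["xright"], r["ytop"])
```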

view.table

View 2D Structures With Data

Description The CDK is capable of generating 2D structure diagrams. This function can be used to view a set of molecules along with some associated data. The output is formatted as a table whose first column contains the 2D images of the molecules, followed by the data columns.
Usage view.table(molecules, dat, cellx = 200, celly = 200)

Arguments
molecules  A list of jobjRef objects that represent IAtomContainer objects
dat  A data.frame containing numeric or character columns. If the columns are named, the names will be used in the data table; if not, names are autogenerated. The number of rows of the data.frame should be equal to the number of molecules
cellx  Initial width of the table cells
celly  Initial height of the table cells
Details

On OS X, event handling issues mean the depiction will show but the window will be unresponsive. The depictions work fine on Linux and Windows.
Value Nothing
Author(s) Rajarshi Guha (<[email protected]>)
See Also view.molecule.2d
Examples
library(rcdk)
smiles <- c("CCC", "CCN", "CCN(C)(C)", "c1ccccc1Cc1ccccc1", "C1CCC1CC(CN(C)(C))CC(=O)CC")
mols <- parse.smiles(smiles)
dframe <- data.frame(x = runif(4),
                     toxicity = factor(c("Toxic", "Toxic", "Nontoxic", "Nontoxic")),
                     solubility = c("yes", "yes", "no", "yes"))
## Not run:
view.table(mols[1:4], dframe)
## End(Not run)
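view.table expects one data row per molecule. In the example above, five SMILES strings are parsed but dframe has only four rows, so only mols[1:4] is passed. A quick size check before calling view.table catches such mismatches early. check.table.args is a hypothetical helper, not part of rcdk. The mock inputs stand in for real molecules.

```r
# Hypothetical check, not part of rcdk: view.table() expects one data row per
# molecule, so verify the sizes before calling it.
check.table.args <- function(molecules, dat) {
  if (!is.data.frame(dat)) stop("'dat' must be a data.frame")
  if (nrow(dat) != length(molecules))
    stop(sprintf("'dat' has %d rows but there are %d molecules",
                 nrow(dat), length(molecules)))
  invisible(TRUE)
}

# Mock: a list of four placeholders stands in for four parsed molecules
check.table.args(as.list(1:4), data.frame(x = runif(4)))
```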

write.molecules

Write Molecules To Disk

Description This function writes one or more molecules to an SD file on disk, which can be of the single- or multi-molecule variety. In addition, if the molecules have keyed properties, these can also be written out as SD tags.
Usage write.molecules(mols, filename, together=TRUE, write.props=FALSE)

Arguments
mols  A list of Java objects of class IAtomContainer
filename  The name of the SD file to write. Note that if together is FALSE, this argument is taken as a prefix for the names of the individual files
together  If TRUE, all the molecules are written to a single SD file. If FALSE, each molecule is written to an individual file
write.props  Should keyed properties be included in the SD file output
Details

This function can be used to write a single SD file containing multiple molecules. If individual SD files are desired, the together argument can be set to FALSE. In this case, the value of filename is used as a prefix, to which a numeric identifier and the suffix ".sdf" are appended. If a single molecule is to be written to disk, simply specify the filename and use the default value of together.
Value The value of the property
Author(s) Rajarshi Guha (<[email protected]>)
See Also load.molecules, set.property, get.property, remove.property
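With together = FALSE, each molecule goes to its own file, named from the filename prefix plus a numeric identifier and ".sdf". The sketch below predicts those names, for example to check for them afterwards with file.exists. It is hypothetical: expected.sdf.names is not part of rcdk. It also ASSUMES that the identifier is a 1-based counter appended directly to the prefix. The manual does not state the exact numbering, so compare against the files actually written.

```r
# Hypothetical helper, not part of rcdk: the file names we would expect
# write.molecules(mols, prefix, together = FALSE) to create, ASSUMING the
# numeric identifier is a 1-based counter appended directly to the prefix.
expected.sdf.names <- function(prefix, n) {
  # sprintf returns character(0) when n is 0, unlike paste0
  sprintf("%s%d.sdf", prefix, seq_len(n))
}

expected.sdf.names("mol", 3)  # "mol1.sdf" "mol2.sdf" "mol3.sdf"
```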

Index
Topic classes
cdkFormula-class
Topic datasets
bpdata
Topic methods
hasNext
Topic programming
Atoms, cdk.version, do.aromaticity, eval.atomic.desc, eval.desc, generate.formula, get.atomic.desc.names, get.atoms, get.bonds, get.connected.atom, get.desc.categories, get.desc.names, get.fingerprint, get.formula, get.isotopes.pattern, get.mol2formula, get.murcko.fragments, get.properties, get.property, get.smiles, get.smiles.parser, get.total.charge, get.total.hydrogen.count, get.tpsa, is.connected, isvalid.formula, load.molecules, matches, Molecule, parse.smiles, remove.hydrogens, remove.property, set.charge.formula, set.property, view.molecule.2d, view.table, write.molecules

Atoms
bpdata
cdk.version
cdkFormula-class
charge (get.total.charge)
convert.implicit.to.explicit (Molecule)
copy.image.to.clipboard (view.molecule.2d)
depict (view.molecule.2d)
do.aromaticity
do.isotopes (do.aromaticity)
do.typing (do.aromaticity)
eval.atomic.desc
eval.desc
fragment (get.murcko.fragments)
generate.formula
get.alogp (get.tpsa)
get.atom.count (get.atoms)
get.atom.index (Atoms)
get.atomic.desc.names
get.atomic.number (Atoms)
get.atoms
get.bonds
get.charge (Atoms)
get.connected.atom
get.connected.atoms (Atoms)
get.desc.categories
get.desc.names
get.exact.mass (Molecule)
get.exhaustive.fragments (get.murcko.fragments)
get.fingerprint
get.formal.charge (Atoms)
get.formula
get.hydrogen.count (Atoms)
get.isotopes.pattern
get.largest.component (is.connected)
get.mcs (matches)
get.mol2formula
get.murcko.fragments
get.natural.mass (Molecule)
get.point2d (Atoms)
get.point3d (Atoms)
get.properties
get.property
get.smiles
get.smiles.parser
get.symbol (Atoms)
get.title (get.property)
get.total.charge
get.total.formal.charge (get.total.charge)
get.total.hydrogen.count
get.tpsa
get.volume (get.tpsa)
get.xlogp (get.tpsa)
hasNext
iload.molecules (load.molecules)
is.aliphatic (Atoms)
is.aromatic (Atoms)
is.connected
is.in.ring (Atoms)
is.neutral (Molecule)
is.subgraph (matches)
isvalid.formula
load.molecules
match (matches)
matches
mcs (matches)
Molecule
parse.smiles
rasterImage
remove.hydrogens
remove.property
set.charge.formula
set.property
show,cdkFormula-method (cdkFormula-class)
smarts (matches)
substructure (matches)
view.image.2d (view.molecule.2d)
view.molecule.2d
view.table
write.molecules
