Documentation
CTS-Lite is a lightweight Chemical Translation Service inspired by the Fiehn Lab's original CTS. It allows users to easily match InChIs, InChIKeys, SMILES, Molecular Formulas, and PubChem CIDs against a curated subset of the PubChem database containing 10.6 million compounds.
Using CTS-Lite
To use CTS-Lite, enter your queries into the input box on the main page. You can separate entries using spaces, tabs, or newlines. When you click Match, the results are displayed in the results section. Results can be downloaded in JSON or CSV format using the buttons provided.
REST API
Request Formats
To make queries using the REST API, use the following formats:
JSON (Standard):
curl -X POST \
-H "Content-Type: application/json" \
-d '{"queries":"query1 query2 ..."}' \
"cts-lite.metabolomics.us/match"
CSV:
curl -X POST \
-H "Content-Type: application/json" \
-H "Accept: text/csv" \
-d '{"queries":"query1 query2 ..."}' \
"cts-lite.metabolomics.us/match"
Request Parameters
Disable top hit only:
"cts-lite.metabolomics.us/match?top_hit_only=false"
Disable first block matches:
"cts-lite.metabolomics.us/match?first_block_matches=false"
Disable RDKit conversion:
"cts-lite.metabolomics.us/match?rdkit_conversion=false"
Response Formats
Example query: XMBWDFGMSWQBCA-UHDFADDYSA-N will_fail
JSON
[
{
"query": "XMBWDFGMSWQBCA-UHDFADDYSA-N",
"query_type": "inchikey",
"found_match": true,
"match_level": "First Block",
"matches": [
{
"identifier": "24841",
"inchikey": "XMBWDFGMSWQBCA-UHFFFAOYSA-N",
"inchi": "InChI=1S/HI/h1H",
"smiles": "I",
"compound_name": "Hydrogen iodide",
"molecular_formula": "HI",
"exact_mass": 127.9123,
"literature_count": 4430,
"patent_count": 329042
}
],
"error_message": ""
},
{
"query": "will_fail",
"query_type": "unidentified",
"found_match": false,
"match_level": "",
"matches": null,
"error_message": "Invalid query type, could not identify, see documentation"
}
]
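A client will usually want to separate successful queries from failed ones. The following sketch does this for an abridged copy of the example response; the function name `split_results` is hypothetical, and the field names are taken from the example above.

```python
import json

# Abridged copy of the example response (some match fields omitted).
SAMPLE_RESPONSE = """
[
  {"query": "XMBWDFGMSWQBCA-UHDFADDYSA-N", "query_type": "inchikey",
   "found_match": true, "match_level": "First Block",
   "matches": [{"identifier": "24841", "compound_name": "Hydrogen iodide"}],
   "error_message": ""},
  {"query": "will_fail", "query_type": "unidentified",
   "found_match": false, "match_level": "",
   "matches": null,
   "error_message": "Invalid query type, could not identify, see documentation"}
]
"""


def split_results(response_text):
    """Return (matched, failed): compound names and error messages, keyed by query."""
    matched, failed = {}, {}
    for result in json.loads(response_text):
        if result["found_match"]:
            # "matches" is a list of compound records for successful queries.
            matched[result["query"]] = [m["compound_name"] for m in result["matches"]]
        else:
            # "matches" is null for failed queries, so keep the error message.
            failed[result["query"]] = result["error_message"]
    return matched, failed
```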
CSV
query,query_type,converted_query,found_match,match_level,error_message,pubchem_cid,inchikey,inchi,smiles,compound_name,molecular_formula,exact_mass,literature_count,patent_count
XMBWDFGMSWQBCA-UHDFADDYSA-N,inchikey,,true,First Block,,24841,XMBWDFGMSWQBCA-UHFFFAOYSA-N,InChI=1S/HI/h1H,I,Hydrogen iodide,HI,127.9123,4430,329042
will_fail,unidentified,,false,,"Invalid query type, could not identify, see documentation",,,,,,,,,
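The CSV output flattens each match into one row, and the error message is quoted because it contains commas. A standard CSV parser handles this, as in the sketch below, which reads the example rows with Python's `csv` module. The function name `read_results` is hypothetical, and converting `found_match` to a boolean is a convenience added here.

```python
import csv
import io

# The example CSV response, copied from above.
SAMPLE_CSV = """query,query_type,converted_query,found_match,match_level,error_message,pubchem_cid,inchikey,inchi,smiles,compound_name,molecular_formula,exact_mass,literature_count,patent_count
XMBWDFGMSWQBCA-UHDFADDYSA-N,inchikey,,true,First Block,,24841,XMBWDFGMSWQBCA-UHFFFAOYSA-N,InChI=1S/HI/h1H,I,Hydrogen iodide,HI,127.9123,4430,329042
will_fail,unidentified,,false,,"Invalid query type, could not identify, see documentation",,,,,,,,,
"""


def read_results(csv_text):
    """Parse the CSV response into a list of dicts keyed by column name."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Every CSV value arrives as a string; turn the flag into a bool.
        row["found_match"] = row["found_match"] == "true"
        rows.append(row)
    return rows
```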
Query Types
Query types are parsed using the following logic:
- InChIKeys must be in the format XXXXXXXXXXXXXX-XXXXXXXXXX-X (14-10-1, all uppercase letters)
- InChIs must start with InChI= (case-sensitive)
- SMILES are first identified by the presence of structural characters: = # - / \ : . @ + [ ] ( )
- Converted SMILES are SMILES queries that failed to match, but were then matched using their converted InChIKey. The SMILES are converted to InChIKeys using RDKit.
- PubChem CIDs are identified as queries which only contain numbers
- Molecular Formulas are recognized by starting with letters that cannot be at the start of SMILES: ADEGHKLMRTUVWXYZ
- SMILES/Mol. Formula: some queries, like C, are ambiguous and can be either SMILES or Molecular Formulas. In these cases, the query first tries to match against SMILES, and then Molecular Formula.
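The rules above can be approximated client-side, for example to pre-check queries before sending them. The sketch below is an illustration, not the service's actual parser. The order of the checks, the requirement that ambiguous queries be purely alphanumeric, and every returned label except `inchikey` and `unidentified` (which appear in the example response) are assumptions.

```python
import re

INCHIKEY = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")
SMILES_CHARS = set("=#-/\\:.@+[]()")
# Letters that cannot start a SMILES string, per the list above.
FORMULA_ONLY_START = set("ADEGHKLMRTUVWXYZ")


def classify(query):
    """Guess the query type using the documented rules (assumed order)."""
    # InChIKeys and InChIs are tested first because they contain
    # characters ("-", "=", "/") that would otherwise look like SMILES.
    if INCHIKEY.fullmatch(query):
        return "inchikey"
    if query.startswith("InChI="):
        return "inchi"
    if re.fullmatch(r"[0-9]+", query):
        return "pubchem_cid"
    if any(ch in SMILES_CHARS for ch in query):
        return "smiles"
    if query[:1] in FORMULA_ONLY_START:
        return "molecular_formula"
    # Assumption: what remains is ambiguous only if it is alphanumeric.
    if query.isalnum():
        return "smiles_or_formula"
    return "unidentified"
```

For example, `classify("C")` returns the ambiguous label, while `classify("H2O")` returns `molecular_formula` because no SMILES string can start with `H`.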
Malformed Queries
Malformed queries are identified as follows:
- InChIKeys that match the regex pattern: ^[a-zA-Z]{12,16}-[a-zA-Z]{9,11}-[a-zA-Z]{0,2}$
- InChIs that start with InChI=, but with improper capitalization
- Unidentified are queries that didn't fit any of the above criteria
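The two malformed-query checks can be sketched as follows. A well-formed InChIKey also satisfies the loose pattern, so this sketch assumes the strict 14-10-1 uppercase check runs first and only near misses are reported as malformed. The function name and the returned labels are hypothetical.

```python
import re

STRICT_INCHIKEY = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")
# Loose pattern copied from the list above.
LOOSE_INCHIKEY = re.compile(r"^[a-zA-Z]{12,16}-[a-zA-Z]{9,11}-[a-zA-Z]{0,2}$")


def malformed_reason(query):
    """Return a label for a malformed query, or None if neither rule applies."""
    # Near miss: looks like an InChIKey but has wrong case or block lengths.
    if LOOSE_INCHIKEY.match(query) and not STRICT_INCHIKEY.match(query):
        return "malformed inchikey"
    # Right prefix, wrong capitalization (for example "inchi=" or "INCHI=").
    if query.lower().startswith("inchi=") and not query.startswith("InChI="):
        return "malformed inchi"
    return None
```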
Match Levels
When first block matching is enabled (the default), "InChIKey" and "Converted SMILES" queries that find no exact match can fall back to matching on the first block, which is the first fourteen characters of the InChIKey. Such results are reported with the First Block match level.
For example, the query XLYOFNOQVPJJNP-XXXXXXXXXX-X would be a first block match with Water, whose key is XLYOFNOQVPJJNP-UHFFFAOYSA-N.
To disable first block matching, use the settings cog in the web UI or add first_block_matches=false to the API request.
All other query types can only be Exact matches.
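The comparison behind the two match levels can be sketched in a few lines. This is an illustration of the rule described above, not the service's code; the function name `match_level` is hypothetical, and the empty string for "no match" follows the `match_level` field in the example response.

```python
FIRST_BLOCK_LENGTH = 14  # characters before the first hyphen of an InChIKey


def match_level(query_key, candidate_key, first_block_matches=True):
    """Compare two InChIKeys and return "Exact", "First Block", or ""."""
    if query_key == candidate_key:
        return "Exact"
    same_first_block = (
        query_key[:FIRST_BLOCK_LENGTH] == candidate_key[:FIRST_BLOCK_LENGTH]
    )
    if first_block_matches and same_first_block:
        return "First Block"
    return ""
```

With the Water example, `match_level("XLYOFNOQVPJJNP-XXXXXXXXXX-X", "XLYOFNOQVPJJNP-UHFFFAOYSA-N")` gives `"First Block"`, and passing `first_block_matches=False` gives `""`.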
Top Hit Only
By default, CTS-Lite returns only the top hit per query. To return all hits, toggle the setting under the cog icon next to the "Match" button, or add the top_hit_only=false parameter to the API request.
For each query, the top hit is determined by ranking the hits on a weighted relevance score:
(0.7 * literature_count) + (0.3 * patent_count).
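As a worked example of the score, the sketch below ranks a list of hits using the formula above. The hit dictionaries reuse the field names from the JSON response; the second hit is invented for illustration, and how the service breaks ties is not documented, so `max` simply keeps the first of equal scores here.

```python
def relevance(hit):
    """Weighted relevance score: 0.7 * literature_count + 0.3 * patent_count."""
    return 0.7 * hit["literature_count"] + 0.3 * hit["patent_count"]


def top_hit(hits):
    """Return the highest-scoring hit, or None when there are no hits."""
    return max(hits, key=relevance) if hits else None


hits = [
    # Counts for hydrogen iodide, taken from the example response.
    {"compound_name": "Hydrogen iodide", "literature_count": 4430, "patent_count": 329042},
    # Hypothetical second hit, for illustration only.
    {"compound_name": "Example compound", "literature_count": 10, "patent_count": 5},
]
```

For hydrogen iodide the score is 0.7 * 4430 + 0.3 * 329042 = 3101 + 98712.6 = 101813.6, so `top_hit(hits)` returns that record.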
RDKit Conversion
By default, CTS-Lite will attempt to convert failed SMILES queries into InChIKeys using RDKit. It will then retry the lookup against the database using the converted InChIKey. A successful conversion match is returned with the query type Converted SMILES.
Because SMILES strings are not unique, the same compound can have many valid SMILES representations, but PubChem only stores one of them. Converting to an InChIKey makes the lookup independent of how the SMILES was written.
To disable RDKit conversion, toggle the setting under the cog icon next to the "Match" button, or add the rdkit_conversion=false parameter to the API request.
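The fallback described in this section can be sketched as follows. The conversion uses RDKit's `Chem.MolFromSmiles` and `Chem.MolToInchiKey`; everything else is an assumption made for illustration: the function names, the `lookup_smiles` and `lookup_inchikey` callables standing in for the database, and the `"SMILES"` label for an unconverted query. RDKit is imported lazily so the matching logic can be exercised without it.

```python
def rdkit_smiles_to_inchikey(smiles):
    """Convert a SMILES string to an InChIKey with RDKit, or return None."""
    from rdkit import Chem  # lazy import: only needed when a conversion runs

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit could not parse the SMILES
        return None
    return Chem.MolToInchiKey(mol) or None


def match_smiles(smiles, lookup_smiles, lookup_inchikey,
                 convert=rdkit_smiles_to_inchikey, rdkit_conversion=True):
    """Try a direct SMILES lookup, then retry via the converted InChIKey."""
    hit = lookup_smiles(smiles)
    if hit is not None:
        return "SMILES", hit
    if not rdkit_conversion:
        return "SMILES", None
    key = convert(smiles)
    if key is None:
        return "SMILES", None
    hit = lookup_inchikey(key)
    if hit is None:
        return "SMILES", None
    # A match found through the converted key is reported separately.
    return "Converted SMILES", hit
```

Passing the lookups and the converter as arguments keeps the retry logic independent of both the database and RDKit, which also makes it easy to test with stand-ins.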