
CTS-Lite

UC Davis Fiehn Lab


Documentation

CTS-Lite is a lightweight Chemical Translation Service inspired by the Fiehn Lab's original CTS. It allows users to easily match InChIs, InChIKeys, SMILES, Molecular Formulas, and PubChem CIDs against a curated subset of the PubChem database containing 10.6 million compounds.

Using CTS-Lite

To use CTS-Lite, simply enter your queries into the input box on the main page. You can separate entries using spaces, tabs, or newlines. When you click Match, the results will be displayed in the results section. Results can be downloaded in JSON and CSV formats using the buttons provided.

Note: Queries are limited to 100,000 entries at a time. Very large queries may take some time to process. Please be patient while the server handles your request.
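For query lists longer than the limit, one option is to split the input into batches on the client side and submit each batch separately. The following is a minimal Python sketch; the helper name split_into_batches is hypothetical, and it assumes the limit counts whitespace-separated entries.

```python
from typing import Iterator, List

MAX_ENTRIES = 100_000  # limit stated in the note above


def split_into_batches(raw_text: str, batch_size: int = MAX_ENTRIES) -> Iterator[List[str]]:
    """Yield lists of at most batch_size whitespace-separated queries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    entries = raw_text.split()  # splits on spaces, tabs, and newlines
    for start in range(0, len(entries), batch_size):
        yield entries[start:start + batch_size]
```

Each yielded batch can be joined with spaces again before it is pasted into the input box or sent to the API.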

REST API

Request Formats

To make queries using the REST API, use the following formats:

JSON (Standard):

curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"queries":"query1 query2 ..."}' \
  "cts-lite.metabolomics.us/match"

CSV:

curl -X POST \
  -H "Content-Type: application/json" \
  -H "Accept: text/csv" \
  -d '{"queries":"query1 query2 ..."}' \
  "cts-lite.metabolomics.us/match"

Request Parameters

Disable top hit only:

"cts-lite.metabolomics.us/match?top_hit_only=false"

Disable first block matches:

"cts-lite.metabolomics.us/match?first_block_matches=false"

Disable RDKit conversion:

"cts-lite.metabolomics.us/match?rdkit_conversion=false"

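The same requests can be assembled from a script. The sketch below uses only the Python standard library and builds the request without sending it. The function name build_match_request is hypothetical, the https scheme is an assumption (the examples above give the host without a scheme), and combining several parameters in one URL with & is assumed to work as in any standard query string.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the documentation gives the host without a scheme; https is a guess.
BASE_URL = "https://cts-lite.metabolomics.us/match"


def build_match_request(queries, as_csv=False, **params):
    """Build (but do not send) a POST request for the /match endpoint."""
    body = json.dumps({"queries": " ".join(queries)}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if as_csv:
        headers["Accept"] = "text/csv"
    # Booleans are written as lowercase true/false, as in the URLs above.
    query_string = urllib.parse.urlencode(
        {name: str(value).lower() for name, value in params.items()}
    )
    url = f"{BASE_URL}?{query_string}" if query_string else BASE_URL
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_match_request(
    ["XMBWDFGMSWQBCA-UHDFADDYSA-N", "will_fail"],
    as_csv=True,
    top_hit_only=False,
    first_block_matches=False,
)
# To send it: urllib.request.urlopen(request).read()
```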
Response Formats

Example query: XMBWDFGMSWQBCA-UHDFADDYSA-N will_fail

JSON

[
  {
    "query": "XMBWDFGMSWQBCA-UHDFADDYSA-N",
    "query_type": "inchikey",
    "found_match": true,
    "match_level": "First Block",
    "matches": [
      {
        "identifier": "24841",
        "inchikey": "XMBWDFGMSWQBCA-UHFFFAOYSA-N",
        "inchi": "InChI=1S/HI/h1H",
        "smiles": "I",
        "compound_name": "Hydrogen iodide",
        "molecular_formula": "HI",
        "exact_mass": 127.9123,
        "literature_count": 4430,
        "patent_count": 329042
      }
    ],
    "error_message": ""
  },
  {
    "query": "will_fail",
    "query_type": "unidentified",
    "found_match": false,
    "match_level": "",
    "matches": null,
    "error_message": "Invalid query type, could not identify, see documentation"
  }
]
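
A response like the one above can be reduced to one line per query. The sketch below is illustrative only: the function name summarize is hypothetical, and the sample is an abbreviated copy of the response shown above. Note that "matches" is null, not an empty list, when nothing matched.

```python
import json

# Abbreviated copy of the example response above.
sample = """
[
  {"query": "XMBWDFGMSWQBCA-UHDFADDYSA-N", "query_type": "inchikey",
   "found_match": true, "match_level": "First Block",
   "matches": [{"identifier": "24841",
                "inchikey": "XMBWDFGMSWQBCA-UHFFFAOYSA-N",
                "compound_name": "Hydrogen iodide"}],
   "error_message": ""},
  {"query": "will_fail", "query_type": "unidentified",
   "found_match": false, "match_level": "", "matches": null,
   "error_message": "Invalid query type, could not identify, see documentation"}
]
"""


def summarize(results):
    """Map each query to its first matched InChIKey, or to its error message."""
    summary = {}
    for result in results:
        matches = result["matches"] or []  # null becomes an empty list
        if result["found_match"] and matches:
            summary[result["query"]] = matches[0]["inchikey"]
        else:
            summary[result["query"]] = result["error_message"]
    return summary


summary = summarize(json.loads(sample))
```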

CSV

query,query_type,converted_query,found_match,match_level,error_message,pubchem_cid,inchikey,inchi,smiles,compound_name,molecular_formula,exact_mass,literature_count,patent_count
XMBWDFGMSWQBCA-UHDFADDYSA-N,inchikey,,true,First Block,,24841,XMBWDFGMSWQBCA-UHFFFAOYSA-N,InChI=1S/HI/h1H,I,Hydrogen iodide,HI,127.9123,4430,329042
will_fail,unidentified,,false,,"Invalid query type, could not identify, see documentation",,,,,,,,,
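
The CSV output can be read with a standard CSV parser. The sketch below uses Python's csv module on the example rows above; the error message contains commas, so it arrives quoted and should not be split by hand. All values are returned as strings, with empty strings for missing fields.

```python
import csv
import io

# The example CSV response from above, reproduced as a string.
sample_csv = (
    "query,query_type,converted_query,found_match,match_level,error_message,"
    "pubchem_cid,inchikey,inchi,smiles,compound_name,molecular_formula,"
    "exact_mass,literature_count,patent_count\n"
    "XMBWDFGMSWQBCA-UHDFADDYSA-N,inchikey,,true,First Block,,24841,"
    "XMBWDFGMSWQBCA-UHFFFAOYSA-N,InChI=1S/HI/h1H,I,Hydrogen iodide,HI,"
    "127.9123,4430,329042\n"
    'will_fail,unidentified,,false,,"Invalid query type, could not identify, '
    'see documentation",,,,,,,,,\n'
)

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Keep only the rows that found a match; "found_match" is the text true/false.
matched = [row for row in rows if row["found_match"] == "true"]
```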

Query Types

Query types are parsed using the following logic:

  • InChIKeys must be in the format XXXXXXXXXXXXXX-XXXXXXXXXX-X (14-10-1, all uppercase letters)
  • InChIs must start with InChI= (case-sensitive)
  • SMILES are first identified by the presence of structural characters: = # - / \ : . @ + [ ] ( )
  • Converted SMILES are SMILES queries that failed to match, but were then matched using their converted InChIKey. The SMILES are converted to InChIKeys using RDKit.
  • PubChem CIDs are queries that contain only digits
  • Molecular Formulas are recognized when they start with a letter that cannot begin a SMILES string: ADEGHKLMRTUVWXYZ
  • SMILES/Mol. Formula: some queries, such as C, are ambiguous and can be either SMILES or Molecular Formulas. In these cases, the query is first matched as a SMILES and then, if that fails, as a Molecular Formula.
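
The rules above can be approximated in a few lines of Python. This is a sketch of the described logic, not the service's actual implementation. The function name guess_query_type and most of the returned labels are hypothetical (only "inchikey" and "unidentified" appear in the example response). The order of the checks and the rule that only purely alphanumeric leftovers count as ambiguous are assumptions.

```python
import re

INCHIKEY = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")  # 14-10-1, uppercase
SMILES_CHARS = set("=#-/\\:.@+[]()")                   # structural characters
FORMULA_ONLY_START = set("ADEGHKLMRTUVWXYZ")           # cannot begin a SMILES


def guess_query_type(query: str) -> str:
    """Classify a single query string following the rules listed above."""
    if not query:
        return "unidentified"
    # InChIKeys and InChIs are checked first because they also contain
    # characters ("-" and "=") that would otherwise look like SMILES.
    if INCHIKEY.match(query):
        return "inchikey"
    if query.startswith("InChI="):
        return "inchi"
    if query.isascii() and query.isdigit():
        return "pubchem_cid"
    if any(ch in SMILES_CHARS for ch in query):
        return "smiles"
    if query[0] in FORMULA_ONLY_START:
        return "molecular_formula"
    # Assumption: only plain alphanumeric strings are treated as ambiguous.
    if query.isalnum():
        return "smiles_or_formula"
    return "unidentified"
```

For example, guess_query_type("C") falls through to the ambiguous case, which the service resolves by trying SMILES first and Molecular Formula second.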

Malformed Queries

Malformed queries are identified as follows:

  • InChIKeys that fail the strict 14-10-1 format above but still match the looser regex pattern: ^[a-zA-Z]{12,16}-[a-zA-Z]{9,11}-[a-zA-Z]{0,2}$
  • InChIs that start with InChI= but with the wrong capitalization (for example, inchi=)
  • Unidentified: queries that do not fit any of the above criteria
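
As an illustration, the malformed checks can be sketched in Python by comparing each query against both the strict and the loose pattern. The function name malformed_reason and the returned labels are hypothetical, and the assumption is that a query counts as a malformed InChIKey only when it matches the loose pattern but not the strict 14-10-1 uppercase format.

```python
import re

STRICT_INCHIKEY = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")
LOOSE_INCHIKEY = re.compile(r"^[a-zA-Z]{12,16}-[a-zA-Z]{9,11}-[a-zA-Z]{0,2}$")


def malformed_reason(query: str):
    """Return a label for a malformed query, or None if neither rule applies."""
    if LOOSE_INCHIKEY.match(query) and not STRICT_INCHIKEY.match(query):
        return "malformed InChIKey"
    # Same prefix as InChI= but with different capitalization.
    if query.lower().startswith("inchi=") and not query.startswith("InChI="):
        return "malformed InChI"
    return None
```

Under these assumptions, a lowercase key such as xmbwdfgmswqbca-uhfffaoysa-n is reported as a malformed InChIKey, while the correctly cased key passes.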

Match Levels

When first block matching is enabled (the default), "InChIKey" and "Converted SMILES" queries fall back to a first block match if no exact match is found. Such results are reported with the First Block match level. The first block of an InChIKey is its first fourteen characters.

For example, the query XLYOFNOQVPJJNP-XXXXXXXXXX-X would be a first block match with Water, whose key is XLYOFNOQVPJJNP-UHFFFAOYSA-N.

To disable first block matching, use the settings cog in the web UI or add first_block_matches=false to the API request.

All other query types can only be Exact matches.
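
The comparison behind the two match levels can be sketched as follows. The function names are hypothetical, and the literal label "Exact" is an assumption; only "First Block" and the empty string appear in the example response.

```python
def first_block(inchikey: str) -> str:
    """Return the part of an InChIKey before the first hyphen (14 characters)."""
    return inchikey.split("-", 1)[0]


def match_level(query_key: str, candidate_key: str, first_block_matches: bool = True) -> str:
    """Compare two InChIKeys and name the level at which they match."""
    if query_key == candidate_key:
        return "Exact"
    if first_block_matches and first_block(query_key) == first_block(candidate_key):
        return "First Block"
    return ""  # no match


water = "XLYOFNOQVPJJNP-UHFFFAOYSA-N"
level = match_level("XLYOFNOQVPJJNP-XXXXXXXXXX-X", water)
strict_level = match_level("XLYOFNOQVPJJNP-XXXXXXXXXX-X", water, first_block_matches=False)
```

With first block matching enabled, the Water example above yields a First Block match; with it disabled, the same query yields no match.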

Top Hit Only

By default, CTS-Lite will return only the top hit per query. This can be changed by toggling the setting from the cog-icon next to the "Match" button, or by adding the top_hit_only=false parameter to the API request.

For each query, the top hit is determined by ranking the hits on a weighted relevance score:
(0.7 * literature_count) + (0.3 * patent_count).
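
As a worked example of the formula, the sketch below scores the Hydrogen iodide entry from the example response and picks the top hit from a small list. The function names are hypothetical, the second entry is made-up illustration data, and how the service breaks ties is not documented (Python's max keeps the first of equal scores).

```python
def relevance_score(literature_count: int, patent_count: int) -> float:
    """Weighted relevance score as given by the formula above."""
    return 0.7 * literature_count + 0.3 * patent_count


def top_hit(matches):
    """Return the match with the highest relevance score."""
    return max(
        matches,
        key=lambda m: relevance_score(m["literature_count"], m["patent_count"]),
    )


hits = [
    # Hypothetical low-ranked hit, for illustration only.
    {"compound_name": "example isotopologue", "literature_count": 10, "patent_count": 5},
    # Counts taken from the example response in this documentation.
    {"compound_name": "Hydrogen iodide", "literature_count": 4430, "patent_count": 329042},
]

best = top_hit(hits)
score = relevance_score(4430, 329042)  # 0.7 * 4430 + 0.3 * 329042, about 101813.6
```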

RDKit Conversion

By default, CTS-Lite will attempt to convert failed SMILES queries into InChIKeys using RDKit. It will then retry the lookup against the database using the converted InChIKey. A successful conversion match is returned with the query type Converted SMILES.

Because SMILES are non-canonical, the same compound can have many SMILES representations, but PubChem only stores one of them. Converting to InChIKey ensures the lookup is format-independent.
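
The effect can be reproduced locally with RDKit's Python bindings. The sketch below is not the service's code: the function name smiles_to_inchikey is hypothetical, and it assumes an RDKit build with InChI support. If RDKit is not installed, the function simply returns None.

```python
try:
    from rdkit import Chem
except ImportError:  # RDKit is optional for this sketch
    Chem = None


def smiles_to_inchikey(smiles: str):
    """Return the InChIKey for a SMILES string, or None if it cannot be converted."""
    if Chem is None:
        return None
    mol = Chem.MolFromSmiles(smiles)  # returns None for unparsable SMILES
    if mol is None:
        return None
    return Chem.MolToInchiKey(mol) or None


# Two different SMILES spellings of ethanol.
ethanol_a = smiles_to_inchikey("CCO")
ethanol_b = smiles_to_inchikey("OCC")
```

Both spellings should produce the same InChIKey, which is why a lookup by converted key can succeed where a literal SMILES comparison fails.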

Disable RDKit conversion by toggling the setting from the cog-icon next to the "Match" button, or by adding the rdkit_conversion=false parameter to the API request.