SMILES Notation to Structure Converter: A Chemist's Reference for Reading and Debugging

SMILES syntax explained: graph traversal, brackets, ring closures, stereochemistry, plus tool comparison for SMILES-to-structure converters.

ChemStitch · May 6, 2026

The fastest way to read a SMILES string is to know what each character is doing — not to memorize a table of substructures, but to understand that SMILES describes a graph traversal. The atoms appear in the order you would walk through the structure, branches are bracketed, ring closures are tracked with matching numerical labels, and stereochemistry sits as small prefixes on the atom or bond. Once that mental model is in place, every SMILES is decodable on sight, and the failures that send chemists to a SMILES-to-structure converter become diagnosable rather than mysterious.

This post is a practitioner reference for SMILES: the syntax that matters at the bench, the gotchas that turn a valid string into a wrong structure, and a comparison of the converter tools chemists actually use when a string needs to become a 2D drawing. It targets the chemist who has pasted a SMILES string into a tool, gotten back something that looked off, and wanted to know whether the string or the converter was at fault.

SMILES as a graph traversal

SMILES (Simplified Molecular Input Line Entry System) was devised by David Weininger in the 1980s, first at a US EPA laboratory and then developed further at Daylight Chemical Information Systems. Daylight maintains the original specification, and OpenSMILES is a community-maintained open specification of the same language. Every chemistry tool that consumes SMILES (RDKit, Open Babel, ChemDraw, Ketcher, PubChem) implements some version of one of these specs.

The mental model: you start at any atom, walk through the molecule one bond at a time, and write down what you see. Atoms are letters; bonds between consecutive atoms in the string are single by default; branches open with ( and close with ); rings open and close with matching digits; double and triple bonds are = and #; aromatic atoms use lowercase letters.

  • CC — ethane (two carbons, single bond between them)
  • C=C — ethylene
  • C#N — hydrogen cyanide (HCN, with the H implicit)
  • CCO — ethanol (the third atom is O, bonded to the second C)
  • c1ccccc1 — benzene (six aromatic carbons in a ring)
  • O=C(N)c1ccccc1 — benzamide (carbonyl O, then a C with two branches: an NH2 and a phenyl ring)
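The traversal model can be made concrete with a toy parser. The sketch below is an illustration only, not how RDKit or any other toolkit parses SMILES: the function name `parse_smiles` is hypothetical, and it assumes a small subset (unbracketed atoms, the bond symbols - = #, branches, and single-digit ring closures) with no validation of element symbols.

```python
def parse_smiles(smiles):
    """Return (atoms, bonds) for a small, bracket-free SMILES subset."""
    atoms = []        # atom symbols in the order they are written
    bonds = []        # (atom_index_a, atom_index_b, bond_order)
    prev = None       # index of the atom the next atom will bond to
    stack = []        # saved `prev` values, one per open branch
    open_rings = {}   # ring label -> (atom index, bond order given at opening)
    pending = None    # bond order set by an explicit bond symbol
    orders = {"-": 1, "=": 2, "#": 3}
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch in orders:
            pending = orders[ch]
        elif ch == "(":
            stack.append(prev)        # remember where the branch started
        elif ch == ")":
            prev = stack.pop()        # return to the branch point
        elif ch.isdigit():
            if ch in open_rings:      # second occurrence closes the ring
                start, order = open_rings.pop(ch)
                bonds.append((start, prev, pending or order or 1))
            else:                     # first occurrence opens it
                open_rings[ch] = (prev, pending)
            pending = None
        elif ch.isalpha():
            symbol = smiles[i:i + 2] if smiles[i:i + 2] in ("Cl", "Br") else ch
            i += len(symbol) - 1
            atoms.append(symbol)
            if prev is not None:
                # Aromatic bonds are stored as order 1 here; the lowercase
                # symbol is what carries the aromatic flag.
                bonds.append((prev, len(atoms) - 1, pending or 1))
            prev = len(atoms) - 1
            pending = None
        else:
            raise ValueError(f"unsupported character {ch!r}")
        i += 1
    if open_rings or stack:
        raise ValueError("unclosed ring or branch")
    return atoms, bonds
```

Calling `parse_smiles("O=C(N)c1ccccc1")` yields nine atoms and nine bonds; the last bond in the list is the ring closure joining the first and last aromatic carbons.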

The hydrogens are implicit. The parser fills in enough Hs on each atom to reach its default valence (4 for C, 3 for N, 2 for O, 1 for halogens). This shorthand applies only to the organic subset (B, C, N, O, P, S, F, Cl, Br, I). When the default does not describe the atom (charged atoms, radicals, isotopes, unusual valences, or any other element), you wrap the atom in square brackets and specify what you mean.
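The fill-in rule is simple enough to sketch. The table and the function name `implicit_h_count` below are illustrative assumptions: the valences follow the OpenSMILES convention for unbracketed, non-aromatic atoms as I understand it (pick the lowest listed valence that is at least the sum of the bond orders already drawn), and real toolkits layer aromaticity and charge handling on top.

```python
# Default valences for unbracketed atoms (assumed from the OpenSMILES convention).
DEFAULT_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,),
    "P": (3, 5), "S": (2, 4, 6),
    "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_h_count(symbol, bond_order_sum):
    """Hydrogens a parser would add to an unbracketed, non-aromatic atom."""
    for valence in DEFAULT_VALENCES[symbol]:
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    return 0  # already above the highest default valence: add nothing


# Ethanol, CCO: each atom's bond-order sum comes from the bonds written in the string.
print(implicit_h_count("C", 1))  # terminal carbon
print(implicit_h_count("C", 2))  # middle carbon
print(implicit_h_count("O", 1))  # hydroxyl oxygen
```

For ethanol this gives 3, 2, and 1 hydrogens, which adds up to the expected C2H6O.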

The bracketed atom: where ambiguity ends

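Bare letters have implicit defaults. Bracketed atoms make every property explicit, including the hydrogen count: a bracketed atom gets no implicit hydrogens, so [C] is a bare carbon atom, not methane. Inside [ ] you can specify isotope, chirality, explicit hydrogen count, formal charge, and atom class: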

  • [NH4+] — ammonium (4 explicit H, +1 charge)
  • [OH-] — hydroxide (1 explicit H, −1 charge)
  • [13C] — carbon-13 isotope
  • [Fe+2] — iron(II) cation
  • [C@H] — carbon with one explicit H and counterclockwise neighbor ordering
  • [C@@H] — carbon with one explicit H and clockwise neighbor ordering
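To see how those fields sit inside the brackets, here is a minimal sketch that splits a bracket atom into its parts with a regular expression. The pattern and the function name `parse_bracket_atom` are my own illustration, not a toolkit API; it assumes the field order isotope, symbol, chirality, H count, charge, atom class, and it only recognizes the @ and @@ chirality markers.

```python
import re

BRACKET_ATOM = re.compile(
    r"\[(?P<isotope>\d+)?"
    r"(?P<symbol>se|as|[bcnops]|[A-Z][a-z]?|\*)"
    r"(?P<chiral>@@?)?"
    r"(?P<hcount>H\d?)?"
    r"(?P<charge>\+\+|--|[+-]\d*)?"
    r"(?::(?P<cls>\d+))?\]"
)


def parse_bracket_atom(token):
    """Split a bracket atom such as '[NH4+]' into its explicit properties."""
    match = BRACKET_ATOM.fullmatch(token)
    if match is None:
        raise ValueError(f"not a supported bracket atom: {token!r}")
    hcount = match["hcount"]
    charge = match["charge"] or ""
    if charge in ("++", "--"):
        charge_value = 2 if charge == "++" else -2
    elif charge:
        sign = 1 if charge[0] == "+" else -1
        charge_value = sign * int(charge[1:] or 1)
    else:
        charge_value = 0
    return {
        "isotope": int(match["isotope"]) if match["isotope"] else None,
        "symbol": match["symbol"],
        "chirality": match["chiral"],
        # A bare 'H' means one hydrogen; no 'H' at all means zero.
        "h_count": int(hcount[1:] or 1) if hcount else 0,
        "charge": charge_value,
        "atom_class": int(match["cls"]) if match["cls"] else None,
    }
```

For example, `parse_bracket_atom("[NH4+]")` reports four hydrogens and a charge of +1, while `parse_bracket_atom("[13C]")` reports isotope 13 and zero hydrogens.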

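The @ and @@ symbols are tetrahedral chirality markers, and they are the most common source of confusion. Looking from the first listed neighbor toward the stereocenter, @ means the remaining neighbors appear counterclockwise in the order they are written; @@ means clockwise. A hydrogen written inside the bracket takes its place in that order at the position of the bracketed atom itself. The mapping to R/S depends on CIP priorities, not just on which marker is used: @@ is not always R, and tools compute the R/S descriptor from the resulting spatial arrangement, not from the symbol itself.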

Common Mistake: Assuming [C@@H] always means R. The chirality marker describes the geometry of the listed neighbors; the R/S descriptor depends on CIP priorities. Two different SMILES with the same chirality marker can map to opposite R/S descriptors if the neighbor priorities differ. L-alanine, N[C@@H](C)C(=O)O, is S, while L-cysteine written the same way, N[C@@H](CS)C(=O)O, is R. Always verify against a 2D drawing or a CIP-aware tool.
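A related trap is that the marker also depends on the order in which the neighbors are written: swapping two neighbors flips @ to @@ without changing the molecule. The sketch below works that out from permutation parity. The function names are hypothetical and the neighbor labels are plain strings chosen for illustration; it assumes a four-neighbor tetrahedral center.

```python
def permutation_is_odd(original, reordered):
    """True if `reordered` is an odd permutation of `original`."""
    positions = [original.index(item) for item in reordered]
    inversions = sum(
        1
        for i in range(len(positions))
        for j in range(i + 1, len(positions))
        if positions[i] > positions[j]
    )
    return inversions % 2 == 1


def marker_after_reordering(marker, original, reordered):
    """Chirality marker that keeps the same geometry after reordering neighbors."""
    if marker not in ("@", "@@"):
        raise ValueError("marker must be '@' or '@@'")
    if sorted(original) != sorted(reordered):
        raise ValueError("both lists must contain the same neighbors")
    if permutation_is_odd(original, reordered):
        return "@" if marker == "@@" else "@@"
    return marker


# L-alanine written as N[C@@H](C)C(=O)O lists the neighbors N, H, CH3, COOH.
# Writing the carboxyl branch first, N[C...H](C(=O)O)C, swaps the last two.
print(marker_after_reordering(
    "@@", ["N", "H", "CH3", "COOH"], ["N", "H", "COOH", "CH3"]
))
```

The single swap is an odd permutation, so the call prints `@`: N[C@H](C(=O)O)C is the same enantiomer as N[C@@H](C)C(=O)O.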

Ring closures and numbering

Rings are written by opening with a digit on one atom and closing with the same digit on another. The bond between them is implicit and single by default; specify it before the closing digit if the ring closure is a double or triple bond.

  • C1CCCCC1 — cyclohexane (six carbons, ring closure between atom 1 and atom 6)
  • C1=CC=CC=C1 — benzene (Kekulé form, with explicit double bonds)
  • c1ccccc1 — benzene (aromatic form, lowercase)
  • C1CCC2CCCCC2C1 — decalin (two fused six-membered rings, with two ring-closure pairs)

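Ring-closure labels above 9 are written as two digits after %: C%10CCCCCCCCCCC%10 opens and closes ring 10. Because a digit can be reused once its ring has closed, two-digit labels are strictly needed only when all the single-digit labels are open at the same point in the string. Most small-molecule SMILES never get there, but the notation shows up in large machine-written strings for oligonucleotides, peptides, and macrocycles.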
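When a ring looks wrong, pairing up the closure labels by hand is tedious. The following sketch does the bookkeeping: it walks the string, skips digits inside brackets (those are isotopes, H counts, or charges), and reports where each label opens and closes. The function name `ring_closure_pairs` is hypothetical, and the positions are character offsets into the string, not atom indices.

```python
def ring_closure_pairs(smiles):
    """Return (label, open_offset, close_offset) for each ring closure."""
    open_labels = {}   # label -> offset where it was opened
    pairs = []
    in_bracket = False
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            in_bracket = False
        elif not in_bracket and (ch.isdigit() or ch == "%"):
            if ch == "%":                 # two-digit label such as %10
                label = int(smiles[i + 1:i + 3])
                width = 3
            else:
                label = int(ch)
                width = 1
            if label in open_labels:      # second occurrence closes the ring
                pairs.append((label, open_labels.pop(label), i))
            else:
                open_labels[label] = i
            i += width - 1
        i += 1
    if open_labels:
        raise ValueError(f"unclosed ring labels: {sorted(open_labels)}")
    return pairs


print(ring_closure_pairs("C1CCCCC1"))     # one ring
print(ring_closure_pairs("C1CC1C1CC1"))   # label 1 reused for a second ring
```

An unmatched label raises an error, which is usually the quickest way to spot a truncated or mis-pasted string.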

Aromatic vs. Kekulé representations

SMILES allows two different ways to represent aromatic systems: lowercase letters for atoms understood to be in an aromatic ring (the “aromatic” form), or alternating single and double bonds with uppercase letters (the Kekulé form). Both are valid; both should round-trip through any compliant parser.

In practice they don’t always. Some parsers handle Kekulé forms more reliably; others prefer aromatic forms. The default canonical form most modern tools emit (RDKit, Open Babel, ChemAxon) is the aromatic form. If a SMILES string round-tripped through one tool comes out different, the most common cause is aromatic perception — the input parser saw aromaticity differently than the output writer wrote it.

Tip: For maximum interoperability when sharing SMILES between tools, run the string through RDKit first to get its canonical form, then share that. Two non-canonical SMILES for the same molecule look different as text even though they are chemically equivalent; agreeing on one tool's canonical form removes the ambiguity.

Canonical vs. non-canonical SMILES

A given molecule can be written as many different valid SMILES strings depending on which atom you start from and which branches you traverse first. CCO, OCC, and C(O)C all describe ethanol. The canonical form is the unique SMILES a tool produces by following its canonicalization algorithm (typically Morgan-algorithm-based atom ranking).

Canonical SMILES are not standardized across tools — RDKit’s canonical form, ChemAxon’s canonical form, and Daylight’s canonical form can all differ for the same structure. They are stable within a tool but not portable across tools. For database keys that need to be portable, InChI is the better identifier; canonical SMILES is fine for a tool-internal lookup.
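Within a single tool, canonicalization gives a practical equality test. The sketch below uses RDKit's `Chem.MolFromSmiles()` and `Chem.MolToSmiles()`; the helper names `canonical_smiles` and `same_molecule` are my own, and the import is guarded only so the snippet loads on a machine without RDKit installed.

```python
try:
    from rdkit import Chem  # third-party package, installed separately
except ImportError:
    Chem = None


def canonical_smiles(smiles):
    """RDKit's canonical SMILES for the input, or an error if it will not parse."""
    if Chem is None:
        raise RuntimeError("RDKit is required for this sketch")
    mol = Chem.MolFromSmiles(smiles)   # returns None on a parse failure
    if mol is None:
        raise ValueError(f"could not parse {smiles!r}")
    return Chem.MolToSmiles(mol)       # canonical by default


def same_molecule(a, b):
    """Compare two SMILES by their canonical forms within RDKit."""
    return canonical_smiles(a) == canonical_smiles(b)


if Chem is not None:
    print(same_molecule("CCO", "OCC"))                   # two spellings of ethanol
    print(same_molecule("C1=CC=CC=C1", "c1ccccc1"))      # Kekulé vs. aromatic benzene
```

The comparison is only meaningful when both strings go through the same tool and version; for a cross-tool key, the InChI advice above still applies.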

If you are calculating molecular weight or other properties from a structure, the canonical/non-canonical distinction does not matter: the parser produces the same molecule either way. It matters only for string-equality checks. For bench calculations that start from a SMILES, such as computing MW for a stock solution, see our walkthrough on molarity calculations from molecular weight for how MW from a structure flows into solution prep; the molarity calculator handles the mass-volume conversion once you have the MW.

Stereochemistry and double-bond geometry

Beyond @ / @@ for tetrahedral chirality, SMILES uses / and \ on the bonds adjacent to a double bond to specify E/Z geometry:

  • F/C=C/F — trans-1,2-difluoroethylene (E)
  • F/C=C\F — cis-1,2-difluoroethylene (Z)

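The / and \ symbols mark whether a neighboring bond points “up” or “down” relative to the double bond as the string is read left to right. In the linear form shown above, matching symbols on the two sides put the substituents on opposite sides (trans), and mismatched symbols put them on the same side (cis). The convention is geometric, not E/Z; as with chirality, the E/Z descriptor depends on the CIP priorities of the substituents.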
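For the simple linear pattern A/C=C/B, reading the geometry reduces to comparing the two direction symbols. The helper below is a deliberately tiny sketch with a hypothetical name; it assumes both symbols are written in chain order on either side of the double bond, and it does not cover substituents written inside a branch, where the symbol reads in the opposite sense.

```python
def geometry_across_double_bond(left_symbol, right_symbol):
    """Classify a linear A?C=C?B pattern from its two direction symbols."""
    valid = ("/", "\\")
    if left_symbol not in valid or right_symbol not in valid:
        raise ValueError("direction symbols must be '/' or '\\'")
    # Same symbol on both sides: substituents on opposite sides of the bond.
    return "trans" if left_symbol == right_symbol else "cis"


print(geometry_across_double_bond("/", "/"))    # F/C=C/F
print(geometry_across_double_bond("/", "\\"))   # F/C=C\F
```

This prints `trans` and then `cis`, matching the two difluoroethylene examples; turning that into E or Z still requires ranking the substituents by CIP priority.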

Stereochemistry is also where parser disagreements show up most often. Some tools strip stereochemistry on canonicalization unless explicitly preserved; others preserve it but reinterpret it; a few drop it silently if the input is ambiguous. For substrates where stereochemistry matters, which covers most chiral bioactive molecules and any stereoselective reaction, verify the round-trip through any tool you use to manipulate SMILES.

Tools for SMILES-to-structure conversion

The converter ecosystem has split into single-purpose web tools, integrated structure editors, and library APIs. Pick by what you actually need.

Single-purpose web converters

  • Chempirical, Leskoff SMILES-to-Structure, ChemAI, Novoprolabs — paste-and-render workflows. Useful for one-off visualization. Most do not let you edit the resulting structure or compute properties beyond MW. Free.
  • PubChem “Sketch from SMILES” — converts SMILES, then lets you search the PubChem database for the matching compound. Useful when you have a SMILES and want to find the published name, CAS number, or vendor data.

Structure editors with SMILES support

  • ChemDraw — paste SMILES via Edit → Paste Special → SMILES or via a SMILES dialog. Round-trips reliably for most structures; preserves stereochemistry for explicit @/@@ inputs.
  • Ketcher — paste SMILES via the import dialog; emits SMILES on copy. Open-source, with the code on GitHub.
  • ChemStitch — SMILES import in the import dialog, SMILES copy from the toolbar; the AI also converts text descriptions (“draw ibuprofen”) to SMILES then to structure, with RDKit validation before the structure loads onto the canvas.
  • MarvinSketch — full SMILES support, including extended SMILES (CXSMILES) for aromaticity hints and reaction SMILES.

Library APIs (for batch work)

  • RDKit — the open-source standard for cheminformatics. Chem.MolFromSmiles() parses and returns None when the string is invalid; Chem.MolToSmiles() emits canonical SMILES by default. Robust handling of edge cases, free, well-documented.
  • Open Babel — cross-format converter; SMILES is one of many formats. Useful when converting between SMILES, MOL, SDF, PDB, etc.
  • ChemAxon JChem — commercial library with strong SMILES extensions and stereochemistry handling. The right choice for enterprise workflows that need vendor support.

How to debug a SMILES that “looks wrong”

When a SMILES converter renders something different from what you expected, the cause is almost always one of four things:

  1. Implicit hydrogens differ from intended. The parser fills in Hs to satisfy standard valence; if your intent was a non-standard species (a radical, a hypervalent atom), bracket the atom and specify Hs explicitly.
  2. Aromatic perception mismatch. The input has Kekulé bonds in a system the output writer treats as aromatic, or vice versa. The drawn structure is chemically correct but visually different from your input.
  3. Stereochemistry stripped or reinterpreted. The chirality markers in the input were ambiguous (e.g., not enough explicit neighbors for the parser to assign), and the parser dropped them on output.
  4. Charge or radical not specified. The input was a bare atom symbol where a bracketed atom was needed; the parser assumed neutral, and the output is the neutral form.

The fastest debug move: parse the SMILES with RDKit, write it back out in canonical form, and compare. If RDKit’s canonical SMILES differs from yours in a way that matters (atom count, charge, stereochemistry), the original string was ambiguous or non-standard. If it differs only in atom ordering, the structures are identical and the “wrong” rendering was a canonicalization difference.
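That comparison is easier to read as a short report than as two strings side by side. The sketch below is one way to set it up, with hypothetical helper names `summarize` and `differences`; it relies on RDKit's `CalcMolFormula`, `GetFormalCharge`, and `FindMolChiralCenters`, and the import is guarded so the snippet still loads without RDKit.

```python
try:
    from rdkit import Chem                     # third-party package
    from rdkit.Chem import rdMolDescriptors
except ImportError:
    Chem = None


def summarize(smiles):
    """Facts worth comparing, or None if RDKit cannot parse the string."""
    if Chem is None:
        raise RuntimeError("RDKit is required for this sketch")
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    return {
        "canonical": Chem.MolToSmiles(mol),
        "formula": rdMolDescriptors.CalcMolFormula(mol),
        "charge": Chem.GetFormalCharge(mol),
        # Unassigned centers are included so stripped stereo shows up as a change.
        "stereocenters": Chem.FindMolChiralCenters(mol, includeUnassigned=True),
    }


def differences(expected, observed):
    """Names of the summary fields on which two SMILES disagree."""
    a, b = summarize(expected), summarize(observed)
    if a is None or b is None:
        return ["unparseable"]
    return [key for key in a if a[key] != b[key]]


if Chem is not None:
    # Same molecule, different atom order: nothing to report.
    print(differences("CCO", "OCC"))
    # Stereo marker lost on the way: formula and charge agree, stereocenters do not.
    print(differences("N[C@@H](C)C(=O)O", "NC(C)C(=O)O"))
```

An empty list means the strings differ only in how the atoms are ordered; a non-empty list names the property to inspect first.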

For chemists who use SMILES routinely, the time-saving move is to keep both your input SMILES and the canonical form on hand: one for human reading, the other for tool interchange. Many published procedures include SMILES in the supporting information for exactly this reason: the structure is unambiguous, the reader does not have to retype, and any tool can pick up where the paper left off.
