Why Your SMILES Won't Kekulize: Fixing the [nH] Nitrogen and Other Parse Errors
SMILES can't kekulize? Usually it's the [nH] aromatic nitrogen. A chemist's guide to the parse errors that stop a SMILES from rendering, and how to fix each one.
You paste a SMILES that looks perfectly reasonable — an imidazole, an indole, a pyrrole fused into something bigger — and the parser throws Can't kekulize mol and refuses to draw it. The string isn’t random; it came out of a database or a colleague’s spreadsheet. The problem is almost always a single aromatic nitrogen that didn’t say whether it carries a hydrogen. Here are the parse errors that stop a SMILES from rendering, why each happens, and the specific fix — starting with the one behind most "can’t kekulize" failures.
Mistake 1 — Aromatic nitrogen written n instead of [nH] (the usual can’t-kekulize cause)
What you see: c1cccn1 or n1cccc1 for pyrrole fails to render; the parser reports it can’t kekulize the ring.
Why it happens: An aromatic nitrogen carries a different number of hydrogens depending on its role in the ring. The nitrogen in pyridine has none; the nitrogen in pyrrole has one. RDKit reads a bare lowercase n as carrying no hydrogen, so in a five-membered ring all five atoms each need one double bond. An odd number of atoms cannot be paired off into a valid alternating-double-bond (Kekulé) pattern, and the parser refuses to guess where the missing hydrogen belongs.
The fix: Write the pyrrole-type nitrogen explicitly as [nH]. Pyrrole is [nH]1cccc1; imidazole is n1c[nH]cc1 — one nitrogen pyridine-like (n), one pyrrole-like ([nH]). A tool built for chemists should detect this exact case and offer the [nH] rewrite rather than just printing "invalid."
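The difference is easy to confirm in RDKit itself. The sketch below is a minimal check, assuming RDKit is installed; the helper name parses is made up for illustration, and it relies on Chem.MolFromSmiles returning None when sanitization (which includes kekulization) fails.

```python
from rdkit import Chem, RDLogger

# Silence RDKit's error log so the failures show up only as return values.
RDLogger.DisableLog("rdApp.error")


def parses(smiles):
    """Hypothetical helper: True if RDKit parses and sanitizes the string."""
    return Chem.MolFromSmiles(smiles) is not None


# The SMILES strings are the ones discussed above, plus pyridine for contrast.
for smi in ("c1cccn1", "[nH]1cccc1", "n1c[nH]cc1", "c1ccncc1"):
    print(smi, "parses" if parses(smi) else "fails")
```

Only the bare-n pyrrole string should fail; pyridine parses with a bare n because its six-membered ring kekulizes without a hydrogen on nitrogen.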
n = aromatic nitrogen with no H (pyridine-type). [nH] = aromatic nitrogen carrying one H (pyrrole-type). If a five-membered aromatic ring won’t kekulize, find the nitrogen that should hold the lone hydrogen and bracket it.
Mistake 2 — Multi-nitrogen aromatics where protonation is genuinely ambiguous
What you see: A tetrazole, triazole, or fused azole still throws Can't kekulize mol even after you’ve bracketed one nitrogen.
Why it happens: With several aromatic nitrogens in one ring, the parser has to decide which one holds the hydrogen, and more than one assignment can look plausible. RDKit will not pick for you — it raises the kekulization error rather than silently choosing a tautomer you didn’t intend, the behavior documented in the canonical RDKit issue on pyrrole and indole SMILES. The Oxford Protein Informatics Group reviewed this class of failure and found that nitrogen-protonation corrections resolve a large share of "RDKit-invalid" structures — the structures aren’t wrong, the H placement is just underspecified.
The fix: Decide which nitrogen is protonated and bracket it as [nH]. For 1H-1,2,4-triazole that’s c1nc[nH]n1. If you genuinely don’t know the tautomer, draw it as the explicit Kekulé form with uppercase atoms and alternating single/double bonds — that sidesteps aromaticity perception entirely and forces the bond pattern you mean.
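To see that the bracketed aromatic form and the explicit Kekulé form describe the same molecule, and that a different hydrogen placement is a different molecule, you can compare canonical SMILES. This is a sketch assuming RDKit is available; the helper canonical is a hypothetical name, and the Kekulé strings N1N=CN=C1 and the 4H tautomer c1nnc[nH]1 are written out here for illustration rather than taken from the text above.

```python
from rdkit import Chem, RDLogger

RDLogger.DisableLog("rdApp.error")  # keep kekulization errors out of the console


def canonical(smiles):
    """Hypothetical helper: canonical SMILES, or None if the string is rejected."""
    mol = Chem.MolFromSmiles(smiles)
    return None if mol is None else Chem.MolToSmiles(mol)


underspecified = canonical("c1ncnn1")   # no nitrogen carries the H
aromatic_1h = canonical("c1nc[nH]n1")   # 1H-1,2,4-triazole, aromatic form
kekule_1h = canonical("N1N=CN=C1")      # same tautomer, explicit Kekulé form
aromatic_4h = canonical("c1nnc[nH]1")   # 4H tautomer: H on the other nitrogen

print(underspecified, aromatic_1h, kekule_1h, aromatic_4h)
```

The two 1H strings should canonicalize identically, while the 4H tautomer comes out as a distinct structure. That is why the choice of which nitrogen to bracket is yours to make and not the parser's.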
Mistake 3 — Invalid valence
What you see: The error names an atom rather than a ring (an over-valent carbon, a four-coordinate neutral nitrogen, or something like [NH8]), and the structure is rejected during sanitization.
Why it happens: RDKit validates valences during its sanitize pass and rejects configurations that can’t exist. This is a feature: a permissive converter such as Open Babel will accept odd valences as a bare graph and hand you back a "structure" that is chemically impossible. The stricter behavior tells you the string is wrong instead of silently mangling it.
The fix: Read the atom number in the error and check that atom’s bonds and charge. A nitrogen that needs to be [N+] to carry four bonds, a carbon that picked up an extra bond from a mis-typed ring closure — correct the charge or the bond and re-parse. A good editor points at the specific atom and the violation, not a generic failure.
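One way to get at the offending atom programmatically is RDKit's Chem.DetectChemistryProblems, which reports sanitization problems without raising. The sketch below assumes RDKit is installed; valence_problems is a hypothetical helper name, and it parses with sanitize=False so that a valence violation does not abort parsing before it can be inspected.

```python
from rdkit import Chem


def valence_problems(smiles):
    """Hypothetical helper: list (atom index, element, message) per valence error.

    Returns None if the string cannot be read at all (a syntax problem,
    not a valence problem).
    """
    mol = Chem.MolFromSmiles(smiles, sanitize=False)  # skip validation for now
    if mol is None:
        return None
    found = []
    for problem in Chem.DetectChemistryProblems(mol):
        if problem.GetType() == "AtomValenceException":
            idx = problem.GetAtomIdx()
            symbol = mol.GetAtomWithIdx(idx).GetSymbol()
            found.append((idx, symbol, problem.Message()))
    return found


# A neutral nitrogen with four bonds, then the same atom written as [N+].
for smi in ("CN(C)(C)C", "C[N+](C)(C)C"):
    print(smi, valence_problems(smi))
```

Atom indices are zero-based and follow the order in which atoms appear in the SMILES string, so index 1 in CN(C)(C)C is the nitrogen.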
Mistake 4 — Ring-closure digits that don’t match
What you see: "Unclosed ring bond" or a structure missing a ring you know should be there.
Why it happens: Ring bonds are encoded by matching digit pairs: a 1 opened has to be closed by a later 1. A digit can be reused once its ring is closed, so only rings that are open at the same time compete for labels. Truncation on copy, reusing a digit before closing it, or running out of single digits without switching to the %nn two-digit notation all break the pairing. This is common when a long SMILES gets clipped pasting out of a PDF.
The fix: Find the unmatched digit the error names and either close it or remove the orphan. When you run out of single digits for rings that are open at the same time, the syntax is %10, %11, and so on; a bare 10 reads as ring 1 then ring 0.
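The pairing rule can be checked without a chemistry toolkit. The following is a simplified sketch in plain Python, not a full SMILES parser: the function name unmatched_ring_closures is made up for illustration, and it assumes that every digit outside a bracket atom is a ring-closure label.

```python
from typing import List

ASCII_DIGITS = "0123456789"


def unmatched_ring_closures(smiles: str) -> List[int]:
    """Return ring-closure labels that are opened but never closed."""
    open_labels = set()
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            # Digits inside a bracket atom are isotopes, H counts, or charges.
            end = smiles.find("]", i)
            if end == -1:
                break  # unterminated bracket atom: a different error
            i = end + 1
        elif ch == "%":
            # Two-digit label such as %10; malformed pairs are skipped.
            pair = smiles[i + 1 : i + 3]
            if len(pair) == 2 and all(c in ASCII_DIGITS for c in pair):
                open_labels ^= {int(pair)}  # first sight opens, second closes
            i += 3
        elif ch in ASCII_DIGITS:
            open_labels ^= {int(ch)}
            i += 1
        else:
            i += 1
    return sorted(open_labels)


print(unmatched_ring_closures("c1ccc2ccccc2c"))  # naphthalene clipped at the end
print(unmatched_ring_closures("C%10CCCCC10"))    # %10 closed with a bare 10
```

In the second example the opening %10 is label 10, while the bare 10 at the end is read as labels 1 and 0, so three labels are left dangling.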
Mistake 5 — Copy-paste corruption
What you see: A string that worked yesterday fails today, or a paste from a spreadsheet or chat won’t parse at all.
Why it happens: The usual culprits are a trailing compound name or tab from a spreadsheet cell, smart quotes substituted by a word processor, a hard line-wrap splitting the string across two lines, or leading and trailing whitespace. None of these are chemistry errors; they are transport damage.
The fix: Trim whitespace, strip any trailing label after the SMILES, convert smart quotes back to ASCII, and rejoin a wrapped line. A tool that expects pasted input should normalize this for you and flag "this looks like a SMILES with trailing text" rather than failing on the whole blob.
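A minimal normalizer for these cases might look like the sketch below. It uses only the standard library; the function name clean_pasted_smiles is hypothetical, and it makes two assumptions you may not want: a line break is treated as a hard wrap inside one SMILES, and quotes wrapping the whole string are dropped after smart quotes are converted.

```python
import re
from typing import Tuple

# Curly quotes mapped back to their ASCII equivalents.
SMART_QUOTES = str.maketrans({
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
})


def clean_pasted_smiles(raw: str) -> Tuple[str, str]:
    """Return (smiles, trailing_label) recovered from pasted text."""
    text = raw.translate(SMART_QUOTES)
    # Assumption: a newline is a hard wrap inside one SMILES, so rejoin it.
    text = re.sub(r"[ \t]*\r?\n[ \t]*", "", text)
    # Trim whitespace, then any quotes wrapping the whole string.
    text = text.strip().strip("'\"")
    # SMILES contains no whitespace, so anything after the first gap is a label.
    parts = text.split(None, 1)
    smiles = parts[0] if parts else ""
    label = parts[1].strip() if len(parts) > 1 else ""
    return smiles, label


print(clean_pasted_smiles("  c1ccccc1\tbenzene \n"))
print(clean_pasted_smiles("CC(=O)Oc1ccc\n  cc1C(=O)O"))
```

Returning the trailing label separately, instead of discarding it, lets a caller show the "this looks like a SMILES with trailing text" message described above.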
Spot-check: when a SMILES can’t kekulize, work the fix order
When a SMILES won’t render, work the list in order. Is it a transport problem — whitespace, a wrapped line, a trailing name? Then a ring-closure mismatch, then a valence violation at a named atom. Finally, for any five-membered or fused aromatic that won’t kekulize, look for an aromatic nitrogen that needs [nH]. That last case accounts for most kekulization failures, and the fix is one character.
The broader mechanics of reading and debugging SMILES syntax are in this chemist’s reference for SMILES notation. If you’re deciding which identifier to store once the structure parses, see choosing between InChI and SMILES.
You can run this whole check visually. Paste the string into the browser SMILES-to-structure editor: it surfaces located parse errors — including a one-click [nH] fix for the unambiguous pyrrole case — and renders the structure on an editable canvas the moment it parses, all client-side. Seeing the molecule is still the fastest way to confirm the fix did what you meant.