
From SMILES to InChI and InChIKey: Which Identifier to Use When

Convert SMILES to InChI and InChIKey, and know which to store: what each format preserves, the one-way hash, and the tautomer-normalization trap.

ChemStitch, June 11, 2026

You have a SMILES and a system that wants something else: the registration database keys on InChIKey, the archival record wants a standard InChI, the modeling script expects canonical SMILES. Converting between them is one click in most tools. The harder question is which identifier to keep for which job, because they preserve different things and fail in different ways. This article lays out what you get when you convert a SMILES to InChI and InChIKey, and gives a decision rule for which to store.

The three identifiers carry different amounts of information

Start from what each format is actually for. They sit in a rough hierarchy of "how much does it carry, and is it unique?"

  • SMILES: portable connectivity, bond orders, charge, isotopes, and (in isomeric form) stereo. Human-readable and editable, but its canonical form is unique only within one toolkit’s algorithm. RDKit, OpenEye, and ChemDraw can each emit a different canonical SMILES for the same molecule.
  • InChI: a layered, non-proprietary string that is algorithmically unique across software. A given structure has one standard InChI for a given InChI version, no matter which tool generates it. It normalizes mobile-hydrogen tautomers (not every tautomer class) and encodes only absolute stereo, with separate layers for formula, connectivity, hydrogens, charge, stereo, and isotope.
  • InChIKey: a fixed 27-character hashed digest of the InChI, built for database search and deduplication, not for reconstructing the molecule.

The practical takeaway: reach for SMILES to communicate or edit a molecule, InChI to archive it unambiguously, and InChIKey to look it up or prove two records are the same compound. The two-way trade-off between the first two is covered in depth in choosing between InChI and SMILES; the piece most people miss is where InChIKey fits, so that’s the focus here.

What the InChIKey actually encodes

An InChIKey is always 27 characters in a 14-10-1 pattern — a 14-character skeleton block, a hyphen, a 10-character block, a hyphen, and a final flag character. That structure is the whole reason it’s useful for matching:

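Reading an InChIKey for a match: The first 14-character block is a hash of the connectivity-level information in the InChI, with no stereo in it. The second 10-character block is an 8-character hash of the remaining layers, chiefly stereochemistry and isotopes, followed by a standard/non-standard flag character and an InChI version character. The final character after the last hyphen indicates the protonation state. So two enantiomers share an identical first block and differ in the second, which lets you ask "same skeleton?" and "same stereoisomer?" as separate questions from one string.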
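The block structure can be checked mechanically. The following is a minimal Python sketch, standard library only, that splits a key into its three blocks and compares two keys at the skeleton level and at the full-key level. The helper names (`split_inchikey`, `same_skeleton`, `same_stereoisomer`) are made up for illustration and do not come from any toolkit. The two example keys are the ones commonly listed for L- and D-alanine in public databases; treat them as illustrative and verify them against your own source.

```python
import re
from typing import NamedTuple

# 14 uppercase letters, hyphen, 10 uppercase letters, hyphen, 1 uppercase letter.
_INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{10})-([A-Z])$")


class KeyParts(NamedTuple):
    skeleton: str  # first block, 14 characters
    detail: str    # second block, 10 characters
    flag: str      # final single character


def split_inchikey(key: str) -> KeyParts:
    """Split an InChIKey into its three hyphen-separated blocks."""
    match = _INCHIKEY_RE.match(key.strip())
    if match is None:
        raise ValueError(f"not a 14-10-1 InChIKey: {key!r}")
    return KeyParts(*match.groups())


def same_skeleton(a: str, b: str) -> bool:
    """True when the first 14-character blocks agree."""
    return split_inchikey(a).skeleton == split_inchikey(b).skeleton


def same_stereoisomer(a: str, b: str) -> bool:
    """True only when all 27 characters agree."""
    return split_inchikey(a) == split_inchikey(b)


# Example keys (assumed values, as commonly listed for alanine enantiomers).
L_ALANINE = "QNAYBMKLOCPYGJ-REOHZXQOSA-N"
D_ALANINE = "QNAYBMKLOCPYGJ-UWTATZPHSA-N"

if __name__ == "__main__":
    print(same_skeleton(L_ALANINE, D_ALANINE))      # True: first blocks match
    print(same_stereoisomer(L_ALANINE, D_ALANINE))  # False: second blocks differ
```

Validating the 14-10-1 shape before comparing keeps a truncated or lowercased key from silently matching nothing.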

Because it’s a hash, the InChIKey is one-way. You can paste it into PubChem or ChemSpider to find the compound, but you cannot decode a structure back out of it — the only "reverse" is matching against a database of known compounds. If a tool offers to import an InChIKey and draw it, what it’s really doing is a database lookup, not a decode.
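To make the "lookup, not decode" point concrete, here is a small Python sketch of what resolving an InChIKey amounts to: a dictionary lookup against compounds someone has already registered. The registry contents and the function name `resolve_inchikey` are hypothetical, and the two keys are the ones commonly listed for ethanol and aspirin; a real resolver would query a service such as PubChem instead of a local dict.

```python
from typing import Dict, Optional

# Hypothetical local registry: InChIKey -> SMILES stored at registration time.
KNOWN_COMPOUNDS: Dict[str, str] = {
    "LFQSCWFLJHTTHZ-UHFFFAOYSA-N": "CCO",                    # ethanol
    "BSYNRYMUTXBXSQ-UHFFFAOYSA-N": "CC(=O)Oc1ccccc1C(=O)O",  # aspirin
}


def resolve_inchikey(key: str,
                     registry: Dict[str, str] = KNOWN_COMPOUNDS) -> Optional[str]:
    """Return the stored SMILES for a key, or None if the key is unknown.

    Nothing is decoded from the key itself: a compound that was never
    registered cannot be recovered, however valid its key looks.
    """
    return registry.get(key.strip().upper())


if __name__ == "__main__":
    print(resolve_inchikey("LFQSCWFLJHTTHZ-UHFFFAOYSA-N"))  # CCO
    print(resolve_inchikey("AAAAAAAAAAAAAA-UHFFFAOYSA-N"))  # None: not registered
```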

Converting SMILES to InChI and InChIKey: the path, and where it loses information

Converting a SMILES to an InChIKey runs through the InChI: SMILES → InChI → InChIKey. Each hop can change what the identifier represents, and that is exactly where the surprises live.
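As one possible scripted version of that path, the sketch below uses RDKit's Python API (`Chem.MolFromSmiles`, `Chem.MolToInchi`, `Chem.InchiToInchiKey`, `Chem.MolToSmiles`). It assumes an RDKit build with InChI support is installed; the wrapper name `smiles_to_identifiers` and the dictionary layout are illustrative choices, not part of RDKit.

```python
try:
    from rdkit import Chem  # third-party; assumed installed with InChI support
except ImportError:         # keep this module importable without RDKit
    Chem = None


def smiles_to_identifiers(smiles: str) -> dict:
    """Return the input SMILES plus canonical SMILES, InChI, and InChIKey."""
    if Chem is None:
        raise RuntimeError("RDKit is required for this sketch")

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit returns None when the SMILES does not parse
        raise ValueError(f"could not parse SMILES: {smiles!r}")

    inchi = Chem.MolToInchi(mol)  # hop 1: structure -> standard InChI
    if not inchi:
        raise ValueError(f"InChI generation failed for: {smiles!r}")
    inchikey = Chem.InchiToInchiKey(inchi)  # hop 2: InChI -> hashed key

    return {
        "input_smiles": smiles,                     # exactly what was typed
        "canonical_smiles": Chem.MolToSmiles(mol),  # RDKit's canonical form
        "inchi": inchi,
        "inchikey": inchikey,
    }
```

Calling `smiles_to_identifiers("CCO")` returns all four strings for ethanol. Keeping `input_smiles` next to the derived identifiers preserves the drawn form that the InChI hop may normalize away.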

Same InChIKey does not mean identical input: Standard InChI normalizes mobile-hydrogen tautomers and several charge and bonding conventions. Paste 2-hydroxypyridine and its 2-pyridone tautomer and they collapse to one InChI, and therefore one InChIKey. The normalization is not exhaustive: a keto/enol pair that moves a hydrogen onto carbon, such as acetone and propen-2-ol, generally keeps two distinct standard InChIs. Two chemists who drew different-looking SMILES (different atom order, charge-separated vs. covalent nitro, a different mobile-hydrogen tautomer) can land on the same InChIKey. That is a feature for deduplication and a trap if you expected your exact drawn form back.
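The collapse is easy to observe directly. The sketch below, again assuming RDKit with InChI support, compares the keys generated from two input SMILES; `standard_inchikey` and `same_inchikey` are illustrative helper names. The example pair is 2-hydroxypyridine and 2-pyridone, a heteroatom mobile-hydrogen tautomer pair, written here as two different SMILES.

```python
try:
    from rdkit import Chem  # third-party; assumed installed with InChI support
except ImportError:
    Chem = None


def standard_inchikey(smiles: str) -> str:
    """InChIKey of the standard InChI that RDKit generates for a SMILES."""
    if Chem is None:
        raise RuntimeError("RDKit is required for this sketch")
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return Chem.MolToInchiKey(mol)


def same_inchikey(smiles_a: str, smiles_b: str) -> bool:
    """True when two drawn forms end up on the same InChIKey."""
    return standard_inchikey(smiles_a) == standard_inchikey(smiles_b)


# Two drawn forms of one compound (mobile hydrogen on O versus on N).
HYDROXYPYRIDINE = "Oc1ccccn1"
PYRIDONE = "O=c1cccc[nH]1"
```

`same_inchikey(HYDROXYPYRIDINE, PYRIDONE)` is expected to come back true even though the two input strings differ, which is why a matching key says "same compound under InChI's rules", not "same drawing".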

A second loss shows up across SMILES↔MOL round-trips rather than at the InChI hop. A lowercase-aromatic SMILES has to be kekulized to write a molfile, and different toolkits choose different bond assignments. So a SMILES→MOL→SMILES round-trip can change the literal string even though the molecule is unchanged. The fix is to treat the canonical SMILES from a single toolkit, or the InChIKey, as your equality test, never byte-for-byte string identity.
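A minimal sketch of that equality test, assuming RDKit is available: both inputs are parsed and re-emitted as RDKit canonical SMILES before comparison, and a helper pushes a SMILES through a molblock and back. The function names are illustrative. The examples are achiral on purpose, since stereo handling across a coordinate-free molblock is a separate concern.

```python
try:
    from rdkit import Chem  # third-party toolkit, assumed to be installed
except ImportError:
    Chem = None


def _parse(smiles: str):
    if Chem is None:
        raise RuntimeError("RDKit is required for this sketch")
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return mol


def canonical(smiles: str) -> str:
    """RDKit canonical SMILES; comparable only with other RDKit output."""
    return Chem.MolToSmiles(_parse(smiles))


def roundtrip_via_molblock(smiles: str) -> str:
    """SMILES -> molblock -> canonical SMILES."""
    block = Chem.MolToMolBlock(_parse(smiles))  # bonds are kekulized by default
    mol = Chem.MolFromMolBlock(block)
    if mol is None:
        raise ValueError("molblock did not parse back")
    return Chem.MolToSmiles(mol)


def same_molecule(smiles_a: str, smiles_b: str) -> bool:
    """Compare structures, not strings."""
    return canonical(smiles_a) == canonical(smiles_b)
```

With these helpers, `same_molecule("c1ccccc1", "C1=CC=CC=C1")` treats the aromatic and Kekulé spellings of benzene as equal, where a plain string comparison would not.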

Side-by-side

| Property | SMILES | InChI | InChIKey |
| --- | --- | --- | --- |
| Unique across software? | No (canonical form is per-toolkit) | Yes (one standard InChI) | Yes (derived from standard InChI) |
| Reversible to a structure? | Yes | Yes | No (one-way hash) |
| Human-readable / editable? | Yes | Barely | No |
| Tautomers / charge forms | Preserved as written | Normalized | Normalized |
| Fixed length? | No | No | Yes (27 chars, 14-10-1) |
| Best at | Sharing, drawing, editing | Unambiguous archival | Database search, dedup |

Which to store, by what you’re doing

Pick by the task, not by habit:

  • Sharing a molecule in an email, a slide, or a methods section, or handing it to someone to re-draw → SMILES. It’s the only one a chemist can read and edit. Keep your exact input rather than a re-canonicalized form if the precise tautomer or charge state matters to your work.
  • Registering or archiving a compound so it resolves the same way years later, in any tool → standard InChI. Its cross-software uniqueness is the point.
  • Building a lookup key, joining two datasets, or deduplicating a library → InChIKey. Match on the full 27 characters to require the same stereoisomer; match on the first 14-character block to group by connectivity regardless of stereo. Just remember the normalization caveat above before you treat a key collision as proof of an identical drawing.
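The two matching modes in the last bullet can be sketched with the standard library alone. `group_by_key` and the sample records are hypothetical; the keys are the ones commonly listed for L-alanine, D-alanine, and ethanol, used here only as illustrative data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_key(records: Iterable[Tuple[str, str]],
                 skeleton_only: bool = False) -> Dict[str, List[str]]:
    """Group (name, inchikey) records by full key or by first block.

    skeleton_only=False requires all 27 characters to match.
    skeleton_only=True groups on the first 14-character block,
    so stereoisomers fall into the same bucket.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, key in records:
        bucket = key.split("-")[0] if skeleton_only else key
        groups[bucket].append(name)
    return dict(groups)


# Illustrative records: (label, InChIKey). Keys are assumed values.
RECORDS = [
    ("L-alanine", "QNAYBMKLOCPYGJ-REOHZXQOSA-N"),
    ("D-alanine", "QNAYBMKLOCPYGJ-UWTATZPHSA-N"),
    ("L-alanine (vendor B)", "QNAYBMKLOCPYGJ-REOHZXQOSA-N"),
    ("ethanol", "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"),
]

if __name__ == "__main__":
    print(len(group_by_key(RECORDS)))                      # 3 exact-key groups
    print(len(group_by_key(RECORDS, skeleton_only=True)))  # 2 skeleton groups
```

Full-key grouping merges the duplicate L-alanine entries and keeps D-alanine apart; skeleton grouping puts all three alanine records in one bucket.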

If you need reversibility and the ability to keep working on the molecule, store SMILES (or the full InChI) — never only the InChIKey, since you can’t get the structure back from it.

Generating all three from a structure you can see

The safest way to convert is to render the structure first, confirm it’s the molecule you meant, then copy the identifier. That way a transcription error in the source SMILES doesn’t propagate silently into your database key. The browser SMILES-to-structure editor parses a pasted SMILES (or InChI, or MOL) onto an editable canvas and copies SMILES, canonical SMILES (RDKit), InChI, or InChIKey from the current structure, client-side. Because the identifiers recompute from what’s on the canvas, an edit is reflected in the InChIKey you copy — exactly what you want when you fixed a stereocenter before generating the key. If the string won’t parse in the first place, the SMILES notation reference covers the common syntax errors and their fixes.
