
Cumulative Yield in Multi-Step Synthesis: Where the Losses Come From

Calculating theoretical yield in multi-step synthesis is more than multiplying step yields. Where the losses actually come from, with a worked route.

ChemStitch, April 24, 2026

Most route scouting meetings end with the same surprise. A senior chemist sketches a five-step sequence, the team nods at the per-step yields — 80%, 75%, 85%, 70%, 80% — and then someone multiplies them together: 28.6% overall. The room goes quiet. A route that looked reasonable on paper just turned into "we need half a kilogram of starting material to get 100 g of compound." Calculating theoretical yield for a multi-step synthesis is the math that forces that moment of honesty, and the step where most routes get re-scoped or killed.

This post walks through the arithmetic, then does the part the textbook treatments skip: the practitioner-level breakdown of where those per-step losses actually come from. If you only know the formula, you can compute overall yield. If you also know the failure modes, you can shorten the route, reorder the steps, or pick the disconnection that survives scale-up.

The formula for cumulative yield in a multi-step synthesis

For a linear sequence, the overall yield is the product of each step's fractional yield:

$Y_{overall} = Y_1 \times Y_2 \times \cdots \times Y_n$

where each $Y_i$ is the isolated yield of step $i$ expressed as a decimal (80% = 0.80). This is the same logic as percent yield for a single reaction, applied recursively: the product of step 1 becomes the limiting reagent for step 2, so the losses compound.
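As a minimal sketch of the formula above, the product can be computed in a few lines of Python. The function name `overall_yield` is illustrative, and the input is the set of per-step yields from the route-scouting example in the introduction.

```python
from math import prod


def overall_yield(step_yields):
    """Cumulative yield of a linear sequence.

    Each entry is a fractional isolated yield (0.80 means 80%).
    """
    for y in step_yields:
        if not 0.0 <= y <= 1.0:
            raise ValueError(f"yield {y!r} is not a fraction between 0 and 1")
    return prod(step_yields)


# Per-step yields from the route-scouting example in the introduction
scouting_route = [0.80, 0.75, 0.85, 0.70, 0.80]
print(f"Overall yield: {overall_yield(scouting_route):.1%}")  # Overall yield: 28.6%
```

The range check is there to catch the most common input slip, which is entering 80 instead of 0.80.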

Common Mistake: Averaging step yields instead of multiplying them. Five 75% steps do not give 75% overall; they give 0.75⁵ = 23.7%. The arithmetic mean of the step yields is always higher than their product unless every step is quantitative, so averaging makes bad routes look acceptable. Always multiply.

Worked example: a hypothetical five-step medicinal chemistry route

Consider an illustrative lead-optimization route (not a published synthesis — the step yields below are typical values drawn from bench experience, used to demonstrate the arithmetic):

Step 1 — Boc protection of a primary amine on the starting aminopyridine: 92%
Step 2 — Suzuki–Miyaura coupling with an arylboronic acid: 78%
Step 3 — Buchwald–Hartwig amination to install the secondary amine: 65%
Step 4 — Amide coupling with a carboxylic acid using HATU: 80%
Step 5 — Boc deprotection with TFA, free-basing, and recrystallization: 85%

Applying the cumulative yield formula:

$Y_{overall} = 0.92 \times 0.78 \times 0.65 \times 0.80 \times 0.85 = 0.317$

That is 31.7% overall. If the target is 50 g of final compound at MW 450, the theoretical starting-material demand at MW 180 is:

$m_{SM} = \frac{50 \text{ g}}{0.317} \times \frac{180}{450} = 63.1 \text{ g}$

That calculation assumes a 1:1 mole ratio through every step, which rarely holds exactly; use the stoichiometry calculator to propagate equivalents and molecular weights correctly across each coupling partner and protecting-group change. For this illustrative route, about 63 g of starting material is needed to produce 50 g of target, before accounting for any reagent excess, catalyst loading, or solvent.

Worked Check: Walk the calculation backward to sanity-check: 63.1 g × 0.317 overall yield × (450/180) MW ratio = 50.0 g. The forward and backward calculations agree. If your back-calculated number disagrees with the forward calculation by more than a percent or two, you have a stoichiometric coefficient error somewhere, most commonly a missed equivalents ratio in a coupling step.
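The worked example can be reproduced with a short Python sketch. It makes the same simplifying assumption as the text (a 1:1 mole ratio through every step), and the function name `starting_material_mass` is illustrative rather than part of any calculator mentioned here.

```python
from math import prod


def starting_material_mass(target_mass_g, target_mw, sm_mw, step_yields):
    """Grams of starting material needed, assuming a 1:1 mole ratio at every step."""
    target_mol = target_mass_g / target_mw
    sm_mol = target_mol / prod(step_yields)  # scale moles up by the overall yield
    return sm_mol * sm_mw


# Illustrative five-step route from the worked example
steps = [
    ("Boc protection", 0.92),
    ("Suzuki-Miyaura coupling", 0.78),
    ("Buchwald-Hartwig amination", 0.65),
    ("HATU amide coupling", 0.80),
    ("Boc deprotection", 0.85),
]

# Running cumulative yield after each step
running = 1.0
for name, y in steps:
    running *= y
    print(f"{name:<28} {y:>5.0%}   cumulative {running:6.1%}")

mass = starting_material_mass(50.0, 450.0, 180.0, [y for _, y in steps])
print(f"Starting material required: {mass:.1f} g")  # about 63.1 g
```

Printing the running cumulative yield makes it easy to see which step takes the largest bite out of the material balance; here the drop from 71.8% to 46.6% at the amination stands out.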

Where the losses actually come from

The formula tells you how much is lost. It says nothing about why. Knowing the mechanism of loss is what lets you fix a bad step, and it is the thing most online treatments of overall yield omit entirely.

Incomplete conversion

The reaction stops before all starting material is consumed. Causes: equilibrium positions unfavorable to product, deactivated catalyst, insufficient reaction time, moisture or oxygen ingress in an air-sensitive coupling. In the illustrative route above, the Buchwald–Hartwig amination at 65% is the prime suspect: palladium-catalyzed aminations are notoriously sensitive to ligand choice, base, and trace water. If TLC or LCMS shows substantial remaining starting material at the workup point, incomplete conversion is the dominant loss mode. Fix: longer reaction time, fresh catalyst, degassed solvent, switch ligand (XPhos → RuPhos or BrettPhos).

Chromatography losses

Silica gel flash chromatography routinely loses 5–15% of material even for a clean separation, more when the product streaks or when polar impurities co-elute. Because those losses compound, a five-step route with chromatography at every step bleeds roughly 23–56% of the theoretical product (1 − 0.95⁵ to 1 − 0.85⁵) into column fractions that get discarded. Fix: telescope steps (carry crude forward without purification), switch to recrystallization where a solid product allows it, or use reverse-phase only for the final step. Process chemistry groups routinely redesign medchem routes to eliminate column purification above a certain scale because the throughput and material losses compound.

Recrystallization and trituration losses

Good for purity, bad for yield. A well-designed recrystallization recovers 70–85% of the dissolved product as crystals; the mother liquor retains the rest. If Step 5 in the illustrative route uses a single recrystallization and the API needs ≥99.5% purity, expect closer to 75% recovery on first pass. Fix: concentrate the mother liquor and take a second crop, but validate purity — second crops often carry through trace impurities that the first crystallization rejected.

Workup losses (aqueous extraction)

Product partitions between the organic and aqueous layers based on logP, pH, and salt content. A compound with logP ~1.5 and an ionizable amine can leave 20–30% of material in the aqueous phase if the extraction is done at neutral pH instead of basic pH. Fix: check logP before designing the workup, use pH adjustment to push the compound fully into organic, and run three-volume back-extractions of the aqueous layer for polar intermediates. A practitioner weighs the aqueous phase, back-extracts, and only discards it once a blank TLC confirms nothing is lost.
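The pH effect described above can be estimated with a rough sketch. It rests on several assumptions that are not in the original text: octanol/water logP stands in for the actual extraction solvent, only the neutral form of a monoprotic base enters the organic layer, the layers have equal volume, and the conjugate-acid pKa of 8.0 is a hypothetical value chosen for illustration.

```python
import math


def log_d_base(log_p, pka, ph):
    """logD of a monoprotic base, assuming only the neutral form partitions."""
    return log_p - math.log10(1.0 + 10.0 ** (pka - ph))


def fraction_left_in_aqueous(log_d, n_extractions=1, vol_ratio_org_to_aq=1.0):
    """Fraction of product still in the aqueous layer after n equal extractions."""
    d = 10.0 ** log_d
    return (1.0 / (1.0 + d * vol_ratio_org_to_aq)) ** n_extractions


LOG_P = 1.5  # value quoted in the text
PKA = 8.0    # hypothetical conjugate-acid pKa, for illustration only

for ph in (7.0, 10.0):
    for n in (1, 3):
        left = fraction_left_in_aqueous(log_d_base(LOG_P, PKA, ph), n)
        print(f"pH {ph:4.1f}, {n} extraction(s): {left:.1%} left in aqueous")
```

Under these assumptions a single neutral-pH extraction leaves about a quarter of the material behind, while basifying to pH 10 or extracting three times brings the loss down to a few percent. Real solvent systems and salt effects will shift the numbers, so treat this as a planning estimate rather than a prediction.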

Solubility-limited reactions

If the substrate is sparingly soluble in the reaction solvent, effective concentration drops, kinetics slow, and the reaction plateaus before completion. Common failure mode in amide couplings of hydrophobic intermediates. Fix: switch to DMF or NMP at higher temperature, or add a co-solvent; if the product precipitates during the reaction it may coat undissolved starting material and freeze conversion.

Tautomer and stereochemical equilibria

For scaffolds with accessible tautomers (e.g., 2-aminopyrimidines, imidazoles) or epimerizable stereocenters adjacent to a carbonyl, a fraction of product can drift into an off-target isomer during workup or purification. You isolate the desired isomer in lower yield because some of the theoretical product became something else. Fix: lower reaction temperature, avoid prolonged basic or acidic conditions during workup, and confirm isomeric purity by chiral HPLC rather than achiral.

Protecting-group mismanagement

Two steps dedicated purely to adding and removing a protecting group multiply into the cumulative yield along with the bond-forming steps. In the illustrative route, the Boc protection (Step 1, 92%) and Boc deprotection (Step 5, 85%) together cost 1 − (0.92 × 0.85) = 21.8% of the theoretical product, just to babysit one amine. Fix: ask whether the protection is really necessary. Many couplings tolerate free amines with the right conditions, and eliminating even a 90%/90% protect/deprotect pair removes a 19% loss (1 − 0.90 × 0.90), which multiplies the overall yield by about 1.23.
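As a rough upper bound on what dropping the Boc group could buy in the illustrative route, assume (optimistically, and without any supporting data) that the three bond-forming steps would run at the same yields on the unprotected amine:

$Y_{unprotected} = 0.78 \times 0.65 \times 0.80 = 0.406$

That is 40.6% overall against 31.7% for the five-step route. In practice a free amine may lower the yield of the palladium-catalyzed steps, so the real gain is whatever survives that penalty.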

The route-design sensitivity chart

Step count amplifies per-step losses nonlinearly. To hit a given overall-yield target, the required average per-step yield (the n-th root of the target for an n-step route) rises steeply as the route gets longer.

For example, to hit 20% overall across five steps, you need an average of 72.5% per step. Across ten steps, that target demands 85.1% per step, an average most discovery-stage routes do not hit. This is the curve that drives the process-chemistry instinct to collapse routes to three or four steps before scale-up.
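The required per-step yield is the n-th root of the overall target, so the sensitivity relationship can be tabulated with a short sketch. The particular targets and step counts below are chosen for illustration and are not taken from the original chart.

```python
def required_step_yield(target_overall, n_steps):
    """Per-step yield (geometric mean) needed to reach an overall target in n steps."""
    return target_overall ** (1.0 / n_steps)


targets = (0.10, 0.20, 0.30, 0.50)  # overall-yield targets (illustrative choice)
step_counts = (3, 5, 8, 10)         # route lengths (illustrative choice)

print("steps " + "".join(f"{t:>8.0%}" for t in targets))
for n in step_counts:
    row = "".join(f"{required_step_yield(t, n):>8.1%}" for t in targets)
    print(f"{n:<6}{row}")
```

The 20% column reproduces the two figures quoted above: 72.5% per step for five steps and 85.1% per step for ten.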

Planning Tip: When scoping a new target, write down the planned step count first. If it exceeds five for a non-natural-product target, pressure-test every disconnection, because route length compounds faster than most chemists intuit. A convergent route with two four-step branches that merge at a late-stage coupling will almost always out-yield a linear eight-step sequence, because only the longest linear sequence (here five steps) enters the cumulative multiplication.

Convergent routes: the one case where the formula changes

In a convergent synthesis, two fragments are built separately and coupled late. Only the longest linear sequence contributes to the overall yield; the shorter branch's losses are not multiplied with the longer branch's except at the convergence step. This is why process chemists aggressively seek convergent disconnections: splitting a six-step linear route into two three-step branches plus a coupling step shortens the longest linear sequence from six steps to four, which roughly doubles overall yield at 70% per step (0.70⁴ = 24% versus 0.70⁶ = 12%) without changing any individual step's efficiency.

For a convergent synthesis with a three-step left branch (yields L₁, L₂, L₃), a three-step right branch (R₁, R₂, R₃), and a coupling step (C) joining them, the overall yield along the left branch plus the coupling, taking the left-branch fragment as the limiting one, is:

$Y_{overall} = L_1 \times L_2 \times L_3 \times C$

assuming the right-branch fragment is made at sufficient scale that it is not limiting. If the right-branch fragment is instead the limiting reagent in the coupling, the overall yield is reckoned along the right branch ($R_1 \times R_2 \times R_3 \times C$), and how much left-branch material is consumed depends on the excess charged. The practical upshot: convergence is almost always worth an extra step if it lets you move a low-yielding transformation off the main line.
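The linear-versus-convergent comparison can be sketched numerically. The uniform per-step yields below are hypothetical, and the function names are illustrative; the convergent yield is reckoned along the branch that supplies the limiting fragment, as in the formula above.

```python
from math import prod


def linear_yield(step_yields):
    """Overall yield of a fully linear sequence."""
    return prod(step_yields)


def convergent_yield(limiting_branch_yields, coupling_yield):
    """Overall yield along the branch supplying the limiting fragment, times the coupling."""
    return prod(limiting_branch_yields) * coupling_yield


# Hypothetical uniform per-step yields, for illustration only
for y in (0.70, 0.85):
    linear = linear_yield([y] * 6)          # six-step linear route
    convergent = convergent_yield([y] * 3, y)  # three-step branch plus coupling
    print(f"per-step {y:.0%}: linear {linear:.1%}, "
          f"convergent {convergent:.1%}, ratio {convergent / linear:.2f}x")
```

The advantage is largest when per-step yields are poor (about 2.0x at 70% per step, about 1.4x at 85%). The sketch ignores the material and labor spent building the second branch, which still has to be made even though its losses do not enter the headline number.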

From cumulative yield to practical route decisions

Calculating theoretical yield in a multi-step synthesis is the first math every medicinal and process chemist runs when a new target comes in. But the number is only useful if it drives a decision: cut a step, telescope two together, switch to a convergent disconnection, eliminate a protecting group, or redesign the low-yielding step. The formula is one line. The productive work is in identifying which of the seven loss modes above is dragging down each step, and whether it is fixable on the timeline you have.

For route-planning math — propagating equivalents, molecular weights, and theoretical masses across a full reagent table — the stoichiometry calculator handles the arithmetic so you can spend your time on the chemistry. For related topics, see the limiting reagent discussion for scale-up and the companion post on single-step percent yield, which is the building block this cumulative-yield story sits on top of.

Further reading on green-chemistry metrics that build on yield — atom economy and the E-factor — is collected in Sheldon's 25-year retrospective on the E-factor in Green Chemistry. Detailed, independently reproduced multi-step procedures with honest yield ranges are published in Organic Syntheses, which requires each procedure to be verified in a second laboratory before publication — a rare and valuable data source for realistic per-step yield expectations.