A/B Testing Email Design and Content: What to Test and What to Ignore
Subject line tests get most of the attention in email A/B testing, but the biggest revenue opportunities often lie deeper in the email itself — in the offer, the structure, the CTA, and the content hierarchy. The problem is that most brands either don’t test email content at all, or they test the wrong things with the wrong structure and learn nothing useful.
This post gives you a framework for testing email design and content in a way that produces genuine learnings and measurable revenue improvements.
The Testing Hierarchy: What to Test First
Not all email variables carry equal revenue weight. Direct your testing at the variables that most strongly affect whether subscribers buy, not at the variables that are easiest to change.
Tier 1: Offer and CTA (highest revenue impact)
The offer — what you’re promoting and how you’re presenting it — is the most powerful variable in any commercial email. Small changes here produce the largest measurable differences in revenue per recipient.
What to test in this tier:
- Free shipping vs percentage discount vs monetary discount: Does your audience respond better to “Free shipping on orders over £40” or “15% off your next order”?
- Offer prominently featured at the top vs revealed lower in the email after context-setting: Does leading with the offer convert better than leading with the story first?
- Single CTA vs multiple CTAs: Does one focused CTA outperform showing three product options with separate CTAs?
- CTA button text specificity: “Shop now” vs “Shop the sale” vs “Claim your 20% off” — more specific CTAs consistently outperform generic ones, but test this for your audience
Tier 2: Email structure (medium revenue impact)
Structure affects how subscribers navigate your email and whether they reach the conversion point. The key structural decisions worth testing:
- Long-form vs short-form: A concise email (one strong image, one offer, one CTA) vs a more detailed email (brand story + products + testimonial + CTA). Neither is universally better — it depends on your audience’s relationship with your brand and the campaign context.
- Product grid vs editorial layout: A product-forward grid email vs a more editorial, image-led layout. Fashion and lifestyle brands often find editorial layouts perform better for brand-building campaigns; product grids work better for promotional sends.
- Content order: Does leading with social proof before the offer outperform leading with the offer directly?
- Number of products featured: Showing 2 products vs 6 products. Choice overload is real — fewer featured products often produce more clicks on each one.
Tier 3: Visual and copy elements (lower direct revenue impact, but worth testing)
These elements affect engagement and brand perception but rarely move revenue metrics dramatically on their own. Test these after you’ve exhausted Tier 1 and Tier 2 opportunities:
- Image style: Lifestyle photography vs product-on-white vs flat lay vs UGC-style
- Hero image vs no hero image: A pure text email sometimes outperforms a heavily visual email for certain campaign types (personal-feeling messages, win-backs)
- Colour of CTA button: Contrast matters — a button that blends into the background converts poorly regardless of colour. Test high-contrast alternatives.
- Copy tone: Direct and transactional vs conversational and brand-voice-driven
How to Structure an Email Design A/B Test
The same rules that apply to subject line testing apply to content testing, but email content tests require extra care because the variable under test is harder to isolate.
One variable rule
Every email content test must change exactly one meaningful thing. This sounds obvious but is frequently violated in practice. If you’re testing whether leading with social proof improves conversions, you can’t also change the hero image, the CTA text, and the product selection between the variants. You won’t know which change caused any difference you observe.
Define your variable clearly before building variants. “We’re testing whether featuring one product (hero product) vs four products (product grid) drives more total revenue per recipient” is a clear variable. “We’re testing a new email design” is not.
Meaningful difference between variants
For a content test to be worth running, the two variants need to be meaningfully different in the specific variable being tested. Testing two CTA button texts that are nearly identical (“Shop now” vs “Shop here”) is unlikely to produce a meaningful result. Testing “Shop now” vs “Claim your 20% off today” is a genuine test.
Adequate sample size
The same sample size constraints that apply to subject line tests apply here. For click rate tests (the right metric for content tests), you typically need even more recipients per variant than for open rate tests: click rates are lower, so the same relative lift is a smaller absolute difference, and smaller differences take larger samples to detect.
For lists under 10,000, use content tests to build directional hypotheses rather than expecting definitive statistical results from single tests.
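To make that constraint concrete, the standard two-proportion power calculation can be run in a few lines of Python (standard library only). The 2% baseline click rate and 20% relative lift below are illustrative assumptions, not benchmarks for your list:

```python
# Approximate recipients needed per variant to detect a relative lift in
# click rate with a two-proportion z-test. Illustrative sketch only.
from statistics import NormalDist

def recipients_per_variant(baseline: float, relative_lift: float,
                           alpha: float = 0.05, power: float = 0.80) -> int:
    p1 = baseline
    p2 = baseline * (1 + relative_lift)   # expected click rate in the variant
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance
    z_beta = NormalDist().inv_cdf(power)            # statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return int((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2) + 1

# A 2% baseline click rate and a hoped-for 20% relative lift:
print(recipients_per_variant(0.02, 0.20))   # ≈ 21,100 per variant
```

At roughly 21,000 recipients per variant, even a healthy 20% lift in click rate is beyond the reach of a 10,000-person list split in half, which is why directional hypotheses are the realistic goal at that size.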
Setting Up Content A/B Tests in Klaviyo
Klaviyo’s campaign A/B test feature allows you to test a full email content variant, not just the subject line. In the campaign builder, select “A/B test” and choose “Email content” as the variable.
Build both variants in full. Configure:
- Split: 50/50 for a pure test, or a smaller test group followed by a winner send on larger lists
- Winning metric: Use revenue per recipient for commercial campaigns. This is available in Klaviyo’s A/B test reporting and is the most direct measure of commercial success. Click rate is the second choice if the send volume is too small for revenue data to be meaningful.
- Test duration: For content tests, 24–48 hours is preferable to the 4-hour windows used for subject line tests. Purchase decisions often happen hours after the email is opened.
Measuring the Right Outcomes
This is where most email content tests fail to produce value: they measure the wrong outcome.
Open rate is not a useful metric for content tests. By the time a subscriber opens the email, the subject line has done its job; the content variant can only influence what happens after the open. Content tests should always be measured on post-open behaviour.
Click rate measures whether subscribers engaged with your email content. It’s a meaningful metric for content tests, though it doesn’t tell you whether clicks converted to purchases.
Revenue per recipient (total campaign revenue ÷ total recipients) is the correct primary metric for commercial campaigns. A variant that gets fewer opens, slightly fewer clicks, but higher-value purchases can win on revenue per recipient even when it loses on click rate.
Click-to-open rate (CTOR) is useful when you want to isolate the quality of clicks: how many people who opened actually clicked? This is a clean measure of how effective the email content was for people who were already engaged enough to open.
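To see how these metrics interact, here is a small sketch with invented campaign figures; the numbers exist only to illustrate the scenario described above, where a variant loses on click rate but wins on revenue per recipient:

```python
# Post-open metrics for a content A/B test. All figures are hypothetical.
def campaign_metrics(recipients: int, opens: int, clicks: int, revenue: float) -> dict:
    return {
        "click_rate": clicks / recipients,       # clicks per recipient
        "ctor": clicks / opens,                  # click-to-open rate
        "revenue_per_recipient": revenue / recipients,
    }

variant_a = campaign_metrics(recipients=10_000, opens=2_500, clicks=300, revenue=4_200.0)
variant_b = campaign_metrics(recipients=10_000, opens=2_400, clicks=280, revenue=4_900.0)

# Variant B loses on click rate (2.8% vs 3.0%) yet wins where it counts:
# £0.49 revenue per recipient against Variant A's £0.42.
print(variant_a)
print(variant_b)
```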
Common Testing Mistakes
Testing too many things at once
“New email design test” where the template, imagery, copy, CTA, and product selection have all changed simultaneously. This is a brand refresh, not a test. You’ll know which variant performed better, but you won’t know why — and you won’t know which element to carry forward.
Ending tests too early
Seeing a result after 6 hours and declaring a winner. Email engagement is time-dependent: some subscribers open in the first 15 minutes, others won’t open until the next morning or even a few days later. For content tests with a purchase-intent component, cut-off decisions made in the first few hours miss a significant portion of the eventual responders.
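A hedged illustration: the sketch below evaluates the same hypothetical test at a 6-hour cutoff and again at 48 hours, using a standard two-proportion z-test. The click counts are invented; the pattern of an early "winner" dissolving as late responders arrive is the failure mode described above.

```python
# Two-sided p-value for a difference in click rates between two variants.
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(clicks_a: int, n_a: int, clicks_b: int, n_b: int) -> float:
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (clicks_a / n_a - clicks_b / n_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# 5,000 recipients per variant. At 6 hours only early openers have clicked:
print(two_proportion_p_value(45, 5_000, 25, 5_000))    # ≈ 0.016, looks "significant"

# By 48 hours the rest of the list has responded and the gap has closed:
print(two_proportion_p_value(150, 5_000, 138, 5_000))  # ≈ 0.47, no real difference
```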
Testing small changes on under-sized lists
Testing whether changing a CTA button from blue to green lifts clicks on a 5,000-person list will not produce a statistically meaningful result. Reserve small-variable tests for when your list is large enough to support them. On smaller lists, test larger structural differences that are more likely to produce detectable effects.
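The same arithmetic can be run in reverse: given a fixed list size, what is the smallest lift the test could reliably detect? A rough sketch, again with an assumed 2% baseline click rate:

```python
# Smallest relative lift in click rate detectable at a given sample size,
# using the usual two-proportion approximation. Illustrative only.
from math import sqrt
from statistics import NormalDist

def min_detectable_lift(baseline: float, per_variant: int,
                        alpha: float = 0.05, power: float = 0.80) -> float:
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    absolute_diff = z * sqrt(2 * baseline * (1 - baseline) / per_variant)
    return absolute_diff / baseline   # expressed as a relative lift

# A 5,000-person list split 50/50 gives 2,500 recipients per variant:
print(f"{min_detectable_lift(0.02, 2_500):.0%}")   # ≈ 55% relative lift required
```

No button colour swap plausibly produces a 55% lift in click rate, which is why small lists are better spent on large structural tests.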
Applying learnings from one campaign type to all campaigns
A content structure that wins for promotional sale campaigns may not be optimal for new product launch campaigns or re-engagement campaigns. Build a testing log that categorises results by campaign type and audience segment.
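What a log entry might capture is sketched below; the field names and the sample entry are suggestions, not a prescribed schema:

```python
# One possible shape for a testing log entry. Fields are illustrative.
from dataclasses import dataclass

@dataclass
class TestLogEntry:
    campaign_type: str      # e.g. "promotional", "product_launch", "win_back"
    segment: str            # audience segment the test ran against
    variable: str           # the single variable under test
    variant_a: str
    variant_b: str
    winner: str
    rpr_lift: float         # relative revenue-per-recipient lift of the winner
    per_variant: int        # recipients per variant
    notes: str = ""

testing_log = [
    TestLogEntry(
        campaign_type="promotional", segment="engaged_90d",
        variable="offer framing",
        variant_a="15% off", variant_b="free shipping over £40",
        winner="variant_b", rpr_lift=0.12, per_variant=14_000,
        notes="Free shipping won; retest on the launch-campaign segment.",
    ),
]
```

Filtering the log by campaign_type is what stops a promotional-send winner from being silently generalised to launch or re-engagement campaigns.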
Building a Testing Roadmap
A testing roadmap converts ad hoc testing into a structured programme. For email content, build a 12-week roadmap that moves through the testing hierarchy:
- Weeks 1–4: Tier 1 tests — offer presentation and CTA variations
- Weeks 5–8: Tier 2 tests — structural variations (long vs short, product grid vs editorial)
- Weeks 9–12: Tier 3 tests — visual and copy element variations
At the end of each cycle, review your testing log and apply learnings to your default template. Then repeat, testing the next layer of variables with your improved baseline.
This approach means your email programme gets measurably better every quarter — not because of luck, but because of a disciplined learning process.
Excelohunt manages structured A/B testing programmes for e-commerce brands, ensuring every test produces actionable intelligence that compounds into long-term performance improvement.
Related Excelohunt Services
Looking to implement these strategies with expert support?
- A/B Testing — learn how we implement this for clients
- Email Design — learn how we implement this for clients
- Email Copywriting — learn how we implement this for clients

Book a free strategy call with Excelohunt →
Want Us to Implement This for Your Brand?
Get a free email audit and see exactly where you're losing revenue.
Get Your Free Audit