Most brands running SMS never A/B test. They write one message, send it, see results, and move on. That’s leaving 20–40% of achievable performance on the table every single campaign. The teams that win in SMS are the ones running small, structured tests on every send and compounding what they learn over time.

This guide covers what to test, how to structure tests for valid results, how to calculate sample sizes that actually work, and eight specific tests worth running in the next 30 days.

Why SMS A/B Testing Works So Well

SMS is uniquely suited to A/B testing because of three channel properties:

  1. High open rate (98%) means clicks and conversions are almost a pure function of message content — there’s no inbox-placement variance to obscure the signal.
  2. Fast feedback loop. Most SMS engagement happens within 90 minutes. You can run a test today and have conclusive results by tomorrow morning.
  3. Tight content constraints (160 characters) force every variable to matter. In email, changing 3 words buried in paragraph 4 has negligible impact. In SMS, changing 3 words in a 160-character message can double conversion.

What to Test (In Order of Impact)

Not all tests are equally valuable. Start at the top of this list and work down:

1. Send Time of Day

Biggest lever overall. Send the same message at 11 AM vs 6 PM and you’ll often see a 2x difference in click rate for the same audience. Full data is in our best time to send SMS guide.

2. First 30 Characters (Preview Text)

Phones preview the first ~30 characters in the notification. This is the only part 100% of recipients see. Changing the opening word can shift click-through by 15–30%.

3. Call-to-Action Phrasing

“Shop now” vs “Tap to shop” vs “Claim yours” produce meaningfully different click rates. Specific action verbs consistently outperform vague ones.

4. Incentive Type

$10 off vs 15% off vs free shipping. These three usually produce quite different behavior depending on your average order value. Dollar-off often wins below $50 AOV; free shipping wins above $75.

5. Message Length

Short (60 chars) vs medium (130 chars) vs long (210 chars / 2 segments). Shorter usually wins, but some verticals (real estate, high-consideration B2B) benefit from more context.

6. Personalization Depth

Name only vs name + product vs name + product + location. Deeper personalization lifts response rates but plateaus — name alone usually gets 80% of the full lift.

7. Urgency Framing

“Ends tonight” vs “Ends in 6 hours” vs “Only 12 left.” Scarcity framing outperforms time framing for physical products; time wins for digital.

8. Emoji vs No Emoji

One relevant emoji typically lifts engagement 5–12% but can double cost: most emoji force the message out of GSM-7 into UCS-2 encoding, which cuts the per-segment limit from 160 characters to 70. Run this test specifically to see whether the lift justifies the cost for your audience.
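
A rough sketch of the segment math, assuming the standard SMS limits (GSM-7: 160 characters in a single message, 153 per segment when split; UCS-2: 70 and 67). The character check is simplified and the example message is made up:

```python
import math

# Simplified GSM-7 basic character set (the real spec also has an extension
# table; this is an approximation for illustration only).
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà"
)

def segment_count(text: str) -> int:
    """Estimate billable segments. Any character outside GSM-7 (most emoji)
    switches the whole message to UCS-2, shrinking the per-segment limit."""
    if all(ch in GSM7_BASIC for ch in text):
        single, multi = 160, 153   # GSM-7 limits
    else:
        single, multi = 70, 67     # UCS-2 limits (code-unit counting simplified)
    return 1 if len(text) <= single else math.ceil(len(text) / multi)

msg = ("Sarah, your cart's waiting - free shipping ends tonight. "
       "Tap before midnight to claim it: example.com/cart")
print(segment_count(msg))           # ~106 GSM-7 chars -> 1 segment
print(segment_count(msg + " 🛒"))   # same text plus one emoji -> UCS-2 -> 2 segments
```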

Example Test Setups

Test 1: Time of Day (the highest-impact first test)

Variant A (10:30 AM): Sarah, your cart’s waiting — free shipping ends tonight! example.com/cart
Variant B (6:15 PM): Sarah, your cart’s waiting — free shipping ends tonight! example.com/cart

Split list 50/50, send at respective times on the same day. Measure clicks and orders per recipient over 24h.

Test 2: Dollar-off vs Percent-off

Variant A: Sarah, $10 off your cart for the next 4 hours → example.com/cart (code SAVE10)
Variant B: Sarah, 15% off your cart for the next 4 hours → example.com/cart (code SAVE15)

Test 3: Opening Line (first 30 chars)

Variant A: Hey Sarah, quick question: did you want to grab those shoes?
Variant B: Sarah — the shoes you liked are down to 2 left. Yours? → example.com/shoes

How to Structure Tests for Valid Results

1. Random Sampling

Split your list by a random method: modulo of contact ID, hashed email, or your platform’s built-in random splitter. Never split by signup date, geography, or engagement score — those introduce confounds that make the result meaningless.
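
If your platform doesn’t have a built-in splitter, a minimal sketch of a deterministic hash-based split looks like this (the contact IDs and test name are made up):

```python
import hashlib

def assign_variant(contact_id: str, test_name: str, variants=("A", "B")) -> str:
    """Deterministically assign a contact to a variant by hashing the contact ID
    together with the test name, so every test gets its own independent split."""
    digest = hashlib.sha256(f"{test_name}:{contact_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# Hypothetical contact IDs split roughly 50/50 for a send-time test
contacts = ["cust_1001", "cust_1002", "cust_1003", "cust_1004"]
groups = {cid: assign_variant(cid, "send-time-june") for cid in contacts}
print(groups)
```

Salting the hash with the test name matters: without it, the same contacts land in the same group on every test, and any quirk of that fixed cohort quietly contaminates all of your future results.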

2. One Variable at a Time

If Variant A has a different time AND a different CTA AND a different incentive, you learned nothing when A wins. You don’t know which of the three changes caused it. Change one thing per test.

3. Hold Everything Else Constant

Send to the same audience (not different segments), on the same day, with identical formatting except the variable being tested. Control every dimension you’re not measuring.

4. Define Success Metric Up Front

Decide before launching the test: are you optimizing for click-through, conversion, or revenue per recipient? These can disagree — a discount-heavy variant might drive more clicks but less revenue. Pick one metric and stick to it.
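
As a made-up illustration of how the metrics can point in different directions (every number below is hypothetical):

```python
# Hypothetical results: the discounted variant wins on clicks and orders,
# but the full-price variant wins on revenue per recipient.
variants = {
    "A (15% off)":     {"sends": 5000, "clicks": 400, "orders": 60, "revenue": 3600.0},
    "B (no discount)": {"sends": 5000, "clicks": 300, "orders": 45, "revenue": 4050.0},
}
for name, v in variants.items():
    print(name,
          f"CTR {v['clicks'] / v['sends']:.1%}",
          f"conv {v['orders'] / v['sends']:.1%}",
          f"rev/recipient ${v['revenue'] / v['sends']:.2f}")
```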

Sample Size: The Math That Actually Matters

The single biggest mistake in SMS testing is declaring a winner with too little data. Noise looks like signal at small sample sizes. Here’s the math:

At a typical 5% baseline click rate, detecting a 10% relative lift (5% vs 5.5%) at 95% confidence and 80% power takes roughly 31,000 recipients per variant. Detecting a 20% lift (5% vs 6%) takes about 8,000 per variant, and a 50% lift (5% vs 7.5%) takes about 1,500 per variant.
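
Those figures come from the standard two-proportion sample-size formula. A quick sketch in plain Python (no external libraries) so you can plug in your own baseline rate:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline: float, relative_lift: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate recipients needed per variant to detect a relative lift
    in a conversion rate (two-sided z-test on two proportions)."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 for 95% confidence
    z_beta = NormalDist().inv_cdf(power)            # 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

print(sample_size_per_variant(0.05, 0.10))   # ~31,000 for a 10% lift on a 5% click rate
print(sample_size_per_variant(0.05, 0.20))   # ~8,000 for a 20% lift
print(sample_size_per_variant(0.05, 0.50))   # ~1,500 for a 50% lift
```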

In practice: most lists can only detect big swings. If you’re sending to a few thousand contacts, start with the high-impact variables at the top of this guide (send time, offer type), where 30–50% lifts are realistic, and save subtle copy tests for lists of 20,000 or more.

How to Calculate Significance After the Test

Use a free A/B test calculator (Evan Miller’s at evanmiller.org is a classic). Plug in the number of recipients in each variant and the number of clicks (or orders, if that’s your success metric) each variant produced.

The tool returns a p-value. If it’s below 0.05, the difference is unlikely to be random chance and you can call the winner; if it’s 0.05 or higher, you can’t separate the variants yet, so either extend the test to more recipients or treat it as a tie.
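
If you’d rather compute it yourself, a pooled two-proportion z-test (essentially the same calculation those calculators run, up to the choice of test) fits in a few lines. The click counts below are hypothetical:

```python
from statistics import NormalDist

def ab_test_p_value(clicks_a: int, n_a: int, clicks_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two click rates
    (pooled two-proportion z-test)."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_a - p_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Example: variant A got 120 clicks from 2,000 sends, variant B got 158 from 2,000
p = ab_test_p_value(120, 2000, 158, 2000)
print(f"p = {p:.3f}")   # roughly 0.02 -> significant at the 0.05 level
```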

Eight Tests to Run in the Next 30 Days

  1. Time-of-day: 10 AM vs 6 PM, same message.
  2. Day-of-week: Tuesday vs Thursday, same message.
  3. Offer framing: Dollar-off vs percent-off.
  4. Opening phrase: Name-first vs benefit-first.
  5. CTA verb: “Shop” vs “Claim” vs “Get.”
  6. Personalization: Name-only vs name + product vs generic.
  7. Message length: 80 chars vs 140 chars.
  8. Urgency type: Time-based (“ends tonight”) vs quantity-based (“only 12 left”).

Run them one at a time, two per week. By week 4 you’ll have eight learnings specific to your audience — compounding into 30–60% better baseline performance than you started with.

What to Stop Testing

Some tests aren’t worth running: anything whose expected lift is smaller than your list can detect. Swapping one synonym for another, tweaking punctuation, or reordering words mid-message will almost never reach significance at typical SMS list sizes; spend those sends on the higher-impact variables above.

Automating the Testing Loop

ReadySMS has native A/B testing and Smart Send built in. You define variants, the platform splits the audience, runs the test, and — for ongoing drip campaigns — automatically routes the remaining audience to the winning variant once statistical significance is reached. This turns a one-time manual test into an always-on optimization loop.
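
Conceptually, the routing step looks something like the sketch below. This is not the ReadySMS API (the function and field names are hypothetical), and it reuses the ab_test_p_value helper from the significance sketch above:

```python
def route_remaining(results: dict, remaining_contacts: list, alpha: float = 0.05):
    """results = {"A": {"clicks": int, "sends": int}, "B": {"clicks": int, "sends": int}}.
    Returns the variant to send to the untested remainder, or None if the
    test has not reached significance yet."""
    a, b = results["A"], results["B"]
    p = ab_test_p_value(a["clicks"], a["sends"], b["clicks"], b["sends"])
    if p >= alpha:
        return None   # no significant winner yet; keep splitting traffic
    winner = "A" if a["clicks"] / a["sends"] > b["clicks"] / b["sends"] else "B"
    return {"variant": winner, "contacts": remaining_contacts}

# Example: test-slice results are in; route everyone not yet contacted
decision = route_remaining(
    {"A": {"clicks": 120, "sends": 2000}, "B": {"clicks": 158, "sends": 2000}},
    remaining_contacts=["cust_2001", "cust_2002"],  # truncated for the example
)
print(decision)   # variant B wins here, so the remainder gets B
```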

For high-volume accounts, the compounding gains are significant. A brand that starts the year at 8% click-through and adds 1.5 percentage points per quarter through automated testing ends the year at 14%, a 75% relative improvement in click-through for zero additional spend.

The Bottom Line

A/B testing is the difference between SMS as a “broadcast channel” and SMS as a precision conversion tool. The brands that treat every campaign as an experiment — with one variable changed, sufficient sample size, and rigorous measurement — consistently outperform competitors spending 10x more on creative and strategy. The process is cheap. The math is simple. The only thing that separates the winners is discipline.