Most brands running SMS never A/B test. They write one message, send it, see results, and move on. That’s leaving 20–40% of achievable performance on the table every single campaign. The teams that win in SMS are the ones running small, structured tests on every send and compounding what they learn over time.

This guide covers what to test, how to structure tests for valid results, how to calculate sample sizes that actually work, and eight specific tests worth running in the next 30 days.

Why SMS A/B Testing Works So Well

SMS is uniquely suited to A/B testing because of three channel properties:

  1. High open rate (98%) means clicks and conversions are almost a pure function of message content — there’s no inbox-placement variance to obscure the signal.
  2. Fast feedback loop. Most SMS engagement happens within 90 minutes. You can run a test today and have conclusive results by tomorrow morning.
  3. Tight content constraints (160 characters) force every variable to matter. In email, changing 3 words buried in paragraph 4 has negligible impact. In SMS, changing 3 words in a 160-character message can double conversion.

What to Test (In Order of Impact)

Not all tests are equally valuable. Start at the top of this list and work down:

1. Send Time of Day

Biggest lever overall. Send the same message at 11 AM vs 6 PM and you’ll often see a 2x difference in click rate for the same audience. Full data is in our best time to send SMS guide.

2. First 30 Characters (Preview Text)

Phones preview the first ~30 characters in the notification. This is the only part 100% of recipients see. Changing the opening word can shift click-through by 15–30%.

3. Call-to-Action Phrasing

“Shop now” vs “Tap to shop” vs “Claim yours” produce meaningfully different click rates. Specific action verbs consistently outperform vague ones.

4. Incentive Type

$10 off vs 15% off vs free shipping. These three usually produce quite different behavior depending on your average order value. Dollar-off often wins below $50 AOV; free shipping wins above $75.

5. Message Length

Short (60 chars) vs medium (130 chars) vs long (210 chars / 2 segments). Shorter usually wins, but some verticals (real estate, high-consideration B2B) benefit from more context.

6. Personalization Depth

Name only vs name + product vs name + product + location. Deeper personalization lifts response rates but plateaus — name alone usually gets 80% of the full lift.

7. Urgency Framing

“Ends tonight” vs “Ends in 6 hours” vs “Only 12 left.” Scarcity framing outperforms time framing for physical products; time wins for digital.

8. Emoji vs No Emoji

One relevant emoji typically lifts engagement 5–12% but can double cost: most emoji force the message out of GSM-7 into UCS-2 encoding, which cuts the per-segment limit from 160 characters to 70. Run this test specifically to see whether the lift justifies the cost for your audience.
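
A rough sketch of the segment math, assuming the standard SMS limits (GSM-7: 160 characters in a single message, 153 per segment when split; UCS-2: 70 and 67). The character check is simplified and the example message is made up:

```python
import math

# Simplified GSM-7 basic character set (the real spec also has an extension
# table; this is an approximation for illustration only).
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà"
)

def segment_count(text: str) -> int:
    """Estimate billable segments. Any character outside GSM-7 (most emoji)
    switches the whole message to UCS-2, shrinking the per-segment limit."""
    if all(ch in GSM7_BASIC for ch in text):
        single, multi = 160, 153   # GSM-7 limits
    else:
        single, multi = 70, 67     # UCS-2 limits (code-unit counting simplified)
    return 1 if len(text) <= single else math.ceil(len(text) / multi)

msg = ("Sarah, your cart's waiting - free shipping ends tonight. "
       "Tap before midnight to claim it: example.com/cart")
print(segment_count(msg))           # ~106 GSM-7 chars -> 1 segment
print(segment_count(msg + " 🛒"))   # same text plus one emoji -> UCS-2 -> 2 segments
```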

Example Test Setups

Test 1: Time of Day (the highest-impact first test)

Variant A (10:30 AM): Sarah, your cart’s waiting — free shipping ends tonight! example.com/cart
Variant B (6:15 PM): Sarah, your cart’s waiting — free shipping ends tonight! example.com/cart

Split list 50/50, send at respective times on the same day. Measure clicks and orders per recipient over 24h.

Test 2: Dollar-off vs Percent-off

Variant A: Sarah, $10 off your cart for the next 4 hours → example.com/cart (code SAVE10)
Variant B: Sarah, 15% off your cart for the next 4 hours → example.com/cart (code SAVE15)

Test 3: Opening Line (first 30 chars)

Variant A: Hey Sarah, quick question: did you want to grab those shoes?
Variant B: Sarah — the shoes you liked are down to 2 left. Yours? → example.com/shoes

How to Structure Tests for Valid Results

1. Random Sampling

Split your list by a random method: modulo of contact ID, hashed email, or your platform’s built-in random splitter. Never split by signup date, geography, or engagement score — those introduce confounds that make the result meaningless.
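
If your platform doesn’t have a built-in splitter, a minimal sketch of a deterministic hash-based split looks like this (the contact IDs and test name are made up):

```python
import hashlib

def assign_variant(contact_id: str, test_name: str, variants=("A", "B")) -> str:
    """Deterministically assign a contact to a variant by hashing the contact ID
    together with the test name, so every test gets its own independent split."""
    digest = hashlib.sha256(f"{test_name}:{contact_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# Hypothetical contact IDs split roughly 50/50 for a send-time test
contacts = ["cust_1001", "cust_1002", "cust_1003", "cust_1004"]
groups = {cid: assign_variant(cid, "send-time-june") for cid in contacts}
print(groups)
```

Salting the hash with the test name matters: without it, the same contacts land in the same group on every test, and any quirk of that fixed cohort quietly contaminates all of your future results.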

2. One Variable at a Time

If Variant A has a different time AND a different CTA AND a different incentive, you learned nothing when A wins. You don’t know which of the three changes caused it. Change one thing per test.

3. Hold Everything Else Constant

Send to the same audience (not different segments), on the same day, with identical formatting except the variable being tested. Control every dimension you’re not measuring.

4. Define Success Metric Up Front

Decide before launching the test: are you optimizing for click-through, conversion, or revenue per recipient? These can disagree — a discount-heavy variant might drive more clicks but less revenue. Pick one metric and stick to it.
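
As a made-up illustration of how the metrics can point in different directions (every number below is hypothetical):

```python
# Hypothetical results: the discounted variant wins on clicks and orders,
# but the full-price variant wins on revenue per recipient.
variants = {
    "A (15% off)":     {"sends": 5000, "clicks": 400, "orders": 60, "revenue": 3600.0},
    "B (no discount)": {"sends": 5000, "clicks": 300, "orders": 45, "revenue": 4050.0},
}
for name, v in variants.items():
    print(name,
          f"CTR {v['clicks'] / v['sends']:.1%}",
          f"conv {v['orders'] / v['sends']:.1%}",
          f"rev/recipient ${v['revenue'] / v['sends']:.2f}")
```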

Sample Size: The Math That Actually Matters

The single biggest mistake in SMS testing is declaring a winner with too little data. Noise looks like signal at small sample sizes. Here’s the math:

At a typical 5% baseline click rate, detecting a 10% relative lift (5% vs 5.5%) at 95% confidence and 80% power takes roughly 31,000 recipients per variant. Detecting a 20% lift (5% vs 6%) takes about 8,000 per variant, and a 50% lift (5% vs 7.5%) takes about 1,500 per variant.
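
Those figures come from the standard two-proportion sample-size formula. A quick sketch in plain Python (no external libraries) so you can plug in your own baseline rate:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline: float, relative_lift: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate recipients needed per variant to detect a relative lift
    in a conversion rate (two-sided z-test on two proportions)."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 for 95% confidence
    z_beta = NormalDist().inv_cdf(power)            # 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

print(sample_size_per_variant(0.05, 0.10))   # ~31,000 for a 10% lift on a 5% click rate
print(sample_size_per_variant(0.05, 0.20))   # ~8,000 for a 20% lift
print(sample_size_per_variant(0.05, 0.50))   # ~1,500 for a 50% lift
```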

In practice: most lists can only detect big swings. If you’re sending to a few thousand contacts, start with the high-impact variables at the top of this guide (send time, offer type), where 30–50% lifts are realistic, and save subtle copy tests for lists of 20,000 or more.

How to Calculate Significance After the Test

Use a free A/B test calculator (Evan Miller’s at evanmiller.org is a classic). Plug in the number of recipients in each variant and the number of clicks (or orders, if that’s your success metric) each variant produced.

The tool returns a p-value. If it’s below 0.05, the difference is unlikely to be random chance and you can call the winner; if it’s 0.05 or higher, you can’t separate the variants yet, so either extend the test to more recipients or treat it as a tie.
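
If you’d rather compute it yourself, a pooled two-proportion z-test (essentially the same calculation those calculators run, up to the choice of test) fits in a few lines. The click counts below are hypothetical:

```python
from statistics import NormalDist

def ab_test_p_value(clicks_a: int, n_a: int, clicks_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two click rates
    (pooled two-proportion z-test)."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_a - p_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Example: variant A got 120 clicks from 2,000 sends, variant B got 158 from 2,000
p = ab_test_p_value(120, 2000, 158, 2000)
print(f"p = {p:.3f}")   # roughly 0.02 -> significant at the 0.05 level
```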

Eight Tests to Run in the Next 30 Days

  1. Time-of-day: 10 AM vs 6 PM, same message.
  2. Day-of-week: Tuesday vs Thursday, same message.
  3. Offer framing: Dollar-off vs percent-off.
  4. Opening phrase: Name-first vs benefit-first.
  5. CTA verb: “Shop” vs “Claim” vs “Get.”
  6. Personalization: Name-only vs name + product vs generic.
  7. Message length: 80 chars vs 140 chars.
  8. Urgency type: Time-based (“ends tonight”) vs quantity-based (“only 12 left”).

Run them one at a time, two per week. By week 4 you’ll have eight learnings specific to your audience — compounding into 30–60% better baseline performance than you started with.

What to Stop Testing

Some tests aren’t worth running: anything whose expected lift is smaller than your list can detect. Swapping one synonym for another, tweaking punctuation, or reordering words mid-message will almost never reach significance at typical SMS list sizes; spend those sends on the higher-impact variables above.

Automating the Testing Loop

ReadySMS has native A/B testing and Smart Send built in. You define variants, the platform splits the audience, runs the test, and — for ongoing drip campaigns — automatically routes the remaining audience to the winning variant once statistical significance is reached. This turns a one-time manual test into an always-on optimization loop.
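
Conceptually, the routing step looks something like the sketch below. This is not the ReadySMS API (the function and field names are hypothetical), and it reuses the ab_test_p_value helper from the significance sketch above:

```python
def route_remaining(results: dict, remaining_contacts: list, alpha: float = 0.05):
    """results = {"A": {"clicks": int, "sends": int}, "B": {"clicks": int, "sends": int}}.
    Returns the variant to send to the untested remainder, or None if the
    test has not reached significance yet."""
    a, b = results["A"], results["B"]
    p = ab_test_p_value(a["clicks"], a["sends"], b["clicks"], b["sends"])
    if p >= alpha:
        return None   # no significant winner yet; keep splitting traffic
    winner = "A" if a["clicks"] / a["sends"] > b["clicks"] / b["sends"] else "B"
    return {"variant": winner, "contacts": remaining_contacts}

# Example: test-slice results are in; route everyone not yet contacted
decision = route_remaining(
    {"A": {"clicks": 120, "sends": 2000}, "B": {"clicks": 158, "sends": 2000}},
    remaining_contacts=["cust_2001", "cust_2002"],  # truncated for the example
)
print(decision)   # variant B wins here, so the remainder gets B
```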

For high-volume accounts, the compounding gains are significant. A brand that starts the year at 8% click-through and adds 1.5 percentage points per quarter through automated testing ends the year at 14%, a 75% relative improvement in click-through for zero additional spend.

The Bottom Line

A/B testing is the difference between SMS as a “broadcast channel” and SMS as a precision conversion tool. The brands that treat every campaign as an experiment — with one variable changed, sufficient sample size, and rigorous measurement — consistently outperform competitors spending 10x more on creative and strategy. The process is cheap. The math is simple. The only thing that separates the winners is discipline.