An A/B Testing Framework for Ecommerce SMS Campaigns
Most ecommerce SMS "A/B tests" I see aren't tests at all. Someone sends version A to half the list, version B to the other half, eyeballs the click counts, declares a winner, and moves on. No baseline, no idea whether the difference was real or noise, and usually a sample size too small to tell either way. That's not testing — that's flipping coins and calling heads a strategy.
Full disclosure: I work for ReadySMS, so I think about per-segment costs constantly. That actually helps here, because the thing nobody talks about with SMS A/B testing is that every test costs money per message, and your test design either respects that budget or burns it. Below is the framework I'd actually use to run tests on an ecommerce list — what to vary, how to split, how big the sample needs to be, and how to read the result without fooling yourself.
Pick one variable, and make it matter
The whole point of an A/B test is attribution: if you change three things and the numbers move, you've learned nothing about which thing moved them. So change one variable per test. The trick is picking variables big enough to produce a detectable difference.
Things worth testing, roughly in order of impact:
- Offer structure — "$10 off" vs. "15% off", or "free shipping" vs. a flat discount. This usually moves conversion more than anything else.
- Send timing — morning vs. evening, weekday vs. weekend. (We've got a whole post on the best time to send SMS if you want a starting hypothesis.)
- Urgency framing — "ends tonight" vs. "this week" vs. no deadline.
- First-line hook — what shows in the preview before they open.
- Personalization depth — first name + last-viewed product vs. a generic blast.
- Link presence and placement — link early vs. link after the offer.
Cosmetic things — an emoji, a comma, a single word swap — rarely move conversion enough to detect at realistic list sizes. Test the things customers actually respond to: price, urgency, relevance.
Watch the segment math before you watch the results
Here's where SMS A/B testing diverges from email. Each variant has a character count, and character count drives cost. A test where version A is 150 plain-text characters and version B is 175 characters with an emoji isn't a fair comparison: A fits in one segment, while B becomes a three-part Unicode message and costs three times as much to send.
Quick refresher: a plain-text SMS segment is 160 GSM-7 characters; longer messages split into 153-character parts. Drop in a single emoji, or any other character outside GSM-7 such as a curly quote or an em dash, and the whole message switches to Unicode: the limit collapses to 70 characters per segment (67 for multipart), and an emoji typically counts as two of those. So:
- Version A: "Your cart's waiting - 15% off if you check out today: [link]" → ~60 chars → 1 segment
- Version B: "🛒 Your cart's waiting! Grab 15% off before midnight tonight — checkout here: [link]" → ~85 chars with emoji → 2 unicode segments
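If you want to check segment counts before a test goes out, here is a minimal sketch of the counting rules described above. The function name `count_segments` is my own, and the logic assumes the default GSM-7 alphabet with its standard extension table; your provider's counter is the source of truth, since some platforms substitute characters or handle encodings differently.

```python
# Characters in the default GSM-7 alphabet (each costs one unit).
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ ÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# GSM-7 extension characters (each costs two units: escape + character).
GSM7_EXTENDED = set("^{}\\[~]|€\f")


def count_segments(text: str) -> tuple[str, int]:
    """Return (encoding, segment count) for one SMS body."""
    if all(ch in GSM7_BASIC or ch in GSM7_EXTENDED for ch in text):
        units = sum(2 if ch in GSM7_EXTENDED else 1 for ch in text)
        encoding, single_limit, multipart_limit = "GSM-7", 160, 153
    else:
        # Any character outside GSM-7 forces UCS-2 for the whole message.
        # Count UTF-16 code units, so most emoji count as two.
        units = len(text.encode("utf-16-be")) // 2
        encoding, single_limit, multipart_limit = "UCS-2", 70, 67
    if units <= single_limit:
        return encoding, 1
    # Ceiling division: every part of a multipart message carries a header.
    return encoding, -(-units // multipart_limit)


print(count_segments("Your cart's waiting - 15% off today"))       # ('GSM-7', 1)
print(count_segments("\U0001F6D2 Your cart's waiting - 15% off today"))  # ('UCS-2', 1)
```

One thing to watch: run the counter on the message with the real shortened URL in place, not a placeholder. Square brackets sit in the GSM-7 extension table and count double, so a literal "[link]" will not measure the same as the link you actually send.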
On a 5,000-contact test split (2,500 per arm), on the Starter tier at $0.0084/segment plus the $0.0045 carrier pass-through:
- Arm A: 2,500 × 1 × ($0.0084 + $0.0045) = $32.25
- Arm B: 2,500 × 2 × ($0.0084 + $0.0045) = $64.50
Arm B costs roughly double to send. If it wins on revenue, you have to net that extra cost out before declaring it the winner. A 4% lift in conversion that costs 2x to deliver may be a loss. Run the numbers through the cost calculator before you commit a recurring template. Our SMS cost optimization post goes deeper on keeping segment counts honest.
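To make the netting-out concrete, here is a small sketch that reproduces the arm costs above and compares the two arms on revenue per recipient after delivery cost. The rates are the ones quoted in this post; the function names and the revenue figures are hypothetical, picked only to show how a variant can win on gross revenue and still lose once sending cost is subtracted.

```python
from decimal import Decimal

PLATFORM_RATE = Decimal("0.0084")  # Starter tier, per segment (from this post)
CARRIER_FEE = Decimal("0.0045")    # carrier pass-through, per segment (from this post)


def arm_send_cost(recipients: int, segments_per_message: int) -> Decimal:
    """Total delivery cost for one test arm."""
    return recipients * segments_per_message * (PLATFORM_RATE + CARRIER_FEE)


def net_revenue_per_recipient(
    revenue: Decimal, recipients: int, segments_per_message: int
) -> Decimal:
    """Attributed revenue minus delivery cost, divided by recipients."""
    cost = arm_send_cost(recipients, segments_per_message)
    return (revenue - cost) / recipients


print(arm_send_cost(2500, 1))  # 32.2500
print(arm_send_cost(2500, 2))  # 64.5000

# Hypothetical revenue: arm B brings in slightly more, but costs double to send.
net_a = net_revenue_per_recipient(Decimal("5000"), 2500, 1)
net_b = net_revenue_per_recipient(Decimal("5020"), 2500, 2)
print(net_a, net_b, "A wins" if net_a > net_b else "B wins")
```

I've used `Decimal` rather than floats so the per-segment rates add up exactly instead of drifting in the last digit.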
Size your sample so the result means something
This is the part everyone skips. If you split 400 contacts into two arms of 200 and one converts at 6% and the other at 5%, that's two extra orders (12 vs. 10), which is pure noise. You cannot make decisions on that.
Rough rule of thumb for ecommerce SMS: to reliably detect a meaningful conversion difference (say, distinguishing a 3% baseline from a 4% variant), you want several thousand recipients per arm. The standard two-proportion calculation puts that example at roughly 5,300 per arm for 80% power at a 5% significance level. The smaller the true difference, the bigger the sample you need. Big, obvious offer differences ("10% off" vs. "30% off") show up at smaller sizes; subtle copy tweaks need a lot of volume to confirm.
If your whole opted-in list is only 1,200 people, you mostly can't run subtle tests — and that's fine. Test big swings, or accept that small differences will stay invisible. Don't pretend you've found a winner you don't have the data to see.
| List size available | What you can test | What you can't |
|---|---|---|
| Under ~1,000 | Big offer differences (2x discount) | Copy tweaks, timing nuance |
| ~5,000–20,000 | Offer + timing + urgency framing | Sub-1% conversion differences |
| 50,000+ | Most variables, including subtler copy | Truly cosmetic single-word swaps |
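If you'd rather compute a sample size than trust a rule of thumb, the usual normal-approximation formula for comparing two conversion rates fits in a few lines. This is a sketch, not a full power analysis: the function name is mine, and the defaults (two-sided 5% significance, 80% power) are common conventions rather than anything specific to SMS.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(
    p_control: float, p_variant: float, alpha: float = 0.05, power: float = 0.80
) -> int:
    """Recipients needed in EACH arm to detect p_control vs. p_variant."""
    if p_control == p_variant:
        raise ValueError("The two rates must differ to size a test.")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired power
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = p_variant - p_control
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# The example from the text: a 3% baseline against a 4% variant.
print(sample_size_per_arm(0.03, 0.04))
# Halving the gap to half a point needs far more volume.
print(sample_size_per_arm(0.03, 0.035))
```

The first call comes out at roughly 5,300 per arm, which is why a 1,200-person list can't resolve a one-point difference. Because the required size scales with one over the squared gap, halving the difference you want to detect roughly quadruples the sample.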
Set the test up so it's actually clean
A few rules that keep a test honest:
- Randomize the split, don't slice by signup date or geography. If arm A is all your newest subscribers, you're testing recency, not your message.
- Send both arms at the same time. Sending A on Tuesday and B on Thursday turns your copy test into a day-of-week test. The exception is when timing is the variable — then hold copy identical.
- Define the success metric before you send. Clicks, add-to-cart, completed orders, and revenue-per-recipient can all disagree. Decide which one you're optimizing. For ecommerce, revenue-per-recipient is usually the truest north star — our conversion metrics breakdown walks through why clicks lie.
- Use unique tracking links per arm so attribution isn't guesswork.
- Respect compliance on both arms. Both versions need consent, STOP handling, and quiet-hours respect. ReadySMS enforces opt-out propagation and quiet hours automatically, so a test can't accidentally text someone who unsubscribed from the control arm.
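The randomization rule is the one people most often get wrong in practice, so here is one simple way to do it: shuffle the whole eligible list with a fixed seed and cut it in half. The function name and the seed value are placeholders of mine, and the contact IDs stand in for whatever identifier your list export uses.

```python
import random


def split_into_arms(contact_ids, seed=2024):
    """Randomly split contacts into two equal-as-possible arms.

    A fixed seed makes the split reproducible, so you can rebuild
    the same arms later when you attribute orders.
    """
    shuffled = list(contact_ids)           # copy; leave the input untouched
    random.Random(seed).shuffle(shuffled)  # local RNG, no global state
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]


arm_a, arm_b = split_into_arms(range(5000))
print(len(arm_a), len(arm_b))  # 2500 2500
```

Shuffling before cutting is what breaks the link between arm membership and things like signup date, which is usually the order a list export comes in. Use a different seed for each test so the same people don't always land in the same arm.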
Read the result without fooling yourself
Once results land, resist the urge to declare a winner the moment one number is higher. Ask:
- Is the difference bigger than the noise? If arm A converted 4.2% and arm B converted 4.4% on a few thousand each, that gap is well within random variation. Call it a tie.
- Did the winner win on the metric that pays the bills? A variant can win on clicks and lose on revenue if it pulls in bargain-hunters who don't check out.
- Did it cost more to send? Back to the segment math — net the delivery cost out of the revenue lift.
- Would it hold up again? A real winner repeats. If you've got the volume, the strongest move is to re-run the test on a fresh split and confirm before you roll it out to the whole list.
One winning test isn't a law of nature. It's a hypothesis you've made slightly less wrong.
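For the "bigger than the noise" question, a two-proportion z-test is a reasonable first check. The sketch below uses the pooled standard error and a two-sided p-value; the function name is mine, and the order counts are illustrative numbers matching the 4.2% vs. 4.4% example on 3,000 recipients per arm.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for arm B's rate vs. arm A's rate."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("No variation in outcomes; the test is undefined.")
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# 4.2% vs. 4.4% on 3,000 recipients each: 126 orders vs. 132 orders.
z, p = two_proportion_z_test(126, 3000, 132, 3000)
print(f"z = {z:.2f}, p = {p:.2f}")
```

On those numbers the p-value lands far above the usual 0.05 cutoff, so the honest call is a tie. The normal approximation gets shaky when an arm has only a handful of conversions, so treat results on very small counts with extra suspicion.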
Build a testing cadence, not a one-off
The highest-leverage thing isn't any single test — it's making testing routine. A simple loop:
- Pick the highest-impact variable you haven't tested (start with offer, then timing, then urgency).
- Run it on the largest sample you can afford, both arms simultaneously.
- Measure revenue-per-recipient, net of segment cost.
- Promote a confirmed winner to your default template, then test the next variable against that new baseline.
Each confirmed winner becomes the control for the next round. Over a quarter, a list of decent size can compound several small, real improvements into a meaningfully better campaign. That's worth far more than one lucky blast. The same discipline applies to your abandoned-cart flows, which are usually where the testing ROI is highest because the intent is already there.
The practical takeaway
A/B testing SMS isn't complicated, but it is disciplined: one variable at a time, a sample big enough to see the signal, both arms sent together, revenue-per-recipient as the scorecard, and the delivery cost netted out before you crown a winner. The math matters because, unlike email, every test message has a per-segment price tag — so a "winning" variant that doubles your segment count might not be winning at all.
If you want to model what a given test will cost before you run it, the cost calculator and the pricing tiers will give you the per-segment numbers to plug in. Start with one offer test on your biggest segment, confirm it, and let the wins stack from there.