A/B Testing Guide


What is A/B Testing?

A/B testing, also called split testing, is a method of experimentation in which two or more versions of a variable are shown to different randomly assigned segments of the target audience over the same period to determine which version has the most impact on the target business metrics.
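
One common way to get stable, random-looking segments is to hash a visitor identifier. The sketch below is only an illustration of that idea, not a method prescribed by this guide: the function name `assign_variant`, the experiment name, and the user IDs are all hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str,
                   variants=("control", "variation")) -> str:
    """Map a user to a variant deterministically (hypothetical helper).

    Hashing the experiment name together with the user ID means the same
    user always sees the same version, while different experiments split
    the audience independently of each other.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]


# The same user gets the same answer on every call.
print(assign_variant("user-42", "landing-page-cta"))
```

Because the assignment depends only on the inputs, no lookup table of past assignments is needed to keep a returning visitor in the same segment.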

Why is A/B Testing Used?

As marketers would say, A/B testing is particularly used to improve overall conversion rates as part of Conversion Rate Optimization (CRO).

And as a user-centric marketer would say, A/B testing is really a way to solve users’ pain points by finding and addressing what they need.

Some of the reasons why it’s one of the best approaches to CRO are listed below:

  • A/B testing reduces guesswork by grounding decisions in data
  • It improves conversion rates as winning variations are found and rolled out
  • Making small, incremental changes is a lean process and limits the risk to conversion rates
  • A/B testing is agile and lets businesses reach the audience and include their feedback at any time

Important Terms and Definitions

Some key terms that are used in A/B testing are as follows:

  • Control: The original version
  • Variation: The new version that is being tested
  • Winner: The version that moves the target metric in the positive direction
  • Target Metric: The specific metric that is measured to decide the split test winner
  • Lift/Drop: The increase or decrease in the target metric relative to the control
  • Hypothesis: The reasoning behind the test, that is, what change we expect and why
  • Sample Size: The number of visitors (or amount of traffic) included in the test. The larger the sample size, the less the measured mean varies from sample to sample, so it is more likely to be close to the true value.
  • Variance: A measure of how spread out the data points are around the mean. The higher the variance, the less reliably the mean predicts any individual data point.
  • Statistical Significance: How unlikely it is that the observed difference between versions arose by chance alone
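
To make the terms concrete, here is a minimal sketch of how a conversion rate and a relative lift could be computed. The visitor and conversion counts are made-up illustration values, and the function names are hypothetical.

```python
def conversion_rate(conversions: int, visitors: int) -> float:
    """Share of visitors who completed the target action."""
    return conversions / visitors


def relative_lift(control_rate: float, variation_rate: float) -> float:
    """Lift (positive) or drop (negative) of the variation versus the control."""
    return (variation_rate - control_rate) / control_rate


# Made-up numbers: 10,000 visitors in each version.
control = conversion_rate(200, 10_000)    # 2.0% conversion rate
variation = conversion_rate(230, 10_000)  # 2.3% conversion rate

print(f"Lift: {relative_lift(control, variation):.1%}")  # Lift: 15.0%
```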

What can you A/B test?

While it may seem that you should be split testing everything, it’s important to prioritize the elements and channels that would have a higher impact and focus on those.

A/B testing should be done on specific elements within channels. While the list of channel and element combinations is endless, here’s a list that serves as a starting example:

  • Copy on ads
  • Copy on landing pages
  • Colors on ad creatives
  • Colors on landing page banners
  • CTA button texts on ads
  • CTA button texts on landing pages
  • CTA button color on emailers
  • Form fields on landing pages
  • Subject lines on emailers
  • Audiences on ads
  • Price points on product pages
  • New CRO hacks on product pages
  • Order of content on landing pages
  • CTA on SMS campaigns
  • Time of sending on emailers
  • Time of posting on social media
  • Media formats on social media
  • Content style on blogs
  • CTAs on blog posts

If you’re starting out, look at your digital sales funnel and start by optimizing your ads, landing pages, and welcome emails.

This would be a good place to start for most businesses.

Types of A/B Tests

There are three main types of A/B tests:

  • Split tests
    • “Split test” is commonly used as a synonym for “A/B test”. Strictly, though, a split test compares two fundamentally different designs or versions, while an A/B test compares very small variations.
    • Testing page load time implementations or alternative database implementations would be considered a split test, while testing color or text changes would be considered an A/B test.
    • Also, testing dynamic content makes a test a split test rather than an A/B test.
  • Multivariate tests
    • In this method, variations of multiple page elements are tested simultaneously, in combination.
    • When conducted properly, multivariate testing can help eliminate the need to run multiple and sequential A/B tests on a web page with similar goals.
    • The number of variants multiplies across elements. So if 3 variants of element A are tested against 3 variants of element B, the multivariate test has a total of 3 × 3 = 9 variants.
  • Multipage tests
    • This is a form of experimentation where you test changes to certain elements across multiple pages or assets.
    • For example, we can create a complete duplicate of all the pages or assets in a funnel and apply the variation across the entire funnel.
    • This way, each individual in the target audience sees a consistent set of elements throughout, which preserves the user experience.
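
The way variants multiply in a multivariate test can be seen by enumerating the combinations. This is a small sketch with placeholder element names (the headlines and buttons are hypothetical):

```python
from itertools import product

# Placeholder variants for two page elements.
headlines = ["Headline 1", "Headline 2", "Headline 3"]
buttons = ["Button 1", "Button 2", "Button 3"]

# Every headline is paired with every button.
combinations = list(product(headlines, buttons))

print(len(combinations))  # 9
for headline, button in combinations:
    print(headline, "+", button)
```

Each added element multiplies the count again, which is why multivariate tests need far more traffic than a simple two-version test.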

The Statistical Approaches to A/B Testing

The Frequentist Approach (Long-term probability)

What it is: The probability of an event is defined by how frequently it occurs over a large number of repeated trials or data points.

What it means: This means the A/B tests need a large number of visitors or traffic, which could mean longer or more expensive testing.

The Bayesian Approach (Logical Probability)

What it is: The probability of an event is expressed as a degree of belief, based on the information available about it, including prior information from other tests that is relevant to the current one.

What it means: Rather than relying only on collecting a lot of new data points, you can combine past information with the new data and make decisions more quickly.

While the frequentist approach uses data from the current experiment only, the Bayesian approach more closely resembles how we naturally make decisions in life: by using the data we are gathering now along with the information we have from the past.
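
As a rough illustration of the Bayesian idea, the sketch below uses a Beta-Binomial model, which is one common choice and an assumption here rather than something this guide prescribes. The prior parameters stand in for past information, and all counts are made up.

```python
import random


def prob_variation_beats_control(control_conv, control_n, var_conv, var_n,
                                 prior_alpha=1.0, prior_beta=1.0,
                                 draws=20_000, seed=0):
    """Estimate P(variation rate > control rate) by sampling the posteriors.

    prior_alpha and prior_beta encode prior belief about the conversion
    rate; the default of 1 and 1 is a flat prior (no past information).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        control_rate = rng.betavariate(prior_alpha + control_conv,
                                       prior_beta + control_n - control_conv)
        variation_rate = rng.betavariate(prior_alpha + var_conv,
                                         prior_beta + var_n - var_conv)
        if variation_rate > control_rate:
            wins += 1
    return wins / draws


# Made-up counts: 200 of 10,000 converted on control, 260 of 10,000 on variation.
print(prob_variation_beats_control(200, 10_000, 260, 10_000))
```

Raising `prior_alpha` and `prior_beta` in proportion to earlier results is one way past tests could inform the current one.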

The A/B Testing Process

The A/B testing process resembles the growth marketing process as it’s one of the many methods used in growth marketing.

The key stages in the process are:

  • Research: Research user behavior, competitors, feedback, analytics data, and other quantitative and qualitative data points
  • Ideation: Come up with A/B testing ideas based on the research
  • Experimentation: Run the A/B test
  • Integration: Analyse results, then either drop or integrate the test into the experience (Alternatively, repeat the test with new information as needed)

What makes an A/B test significant?

Not every A/B test you run will produce a significant result, and only the results of statistically significant tests should be acted on.

A statistically significant result means that the observed difference is unlikely to be due to chance, so it is likely to hold up at a larger scale when implemented or integrated into your channels.

To calculate your A/B test significance you need to know (or calculate) the following three things:

  1. Your sample size
  2. Level of confidence needed
  3. The size of your lift

Here’s a calculator you can use. This will help you understand how much traffic or engagement you’ll need to create a significant test.

You can input the variables on the calculator as per your test design.

For the size of the uplift, you can start with 5% as your target. You can further adjust based on your experience.

Eventually, as you have more data and experience, you’ll be able to select your significance level based on specific experiments.

Let’s say you select a significance level of 5%, which corresponds to a 95% confidence level.

This means that if you ran your A/B test many times, about 95% of the resulting confidence intervals would capture the true conversion rate. Put another way, the true conversion rate would fall outside the interval about 5% of the time (or whatever significance level you’ve set).
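
For readers who want to see what such a calculator does, here is a minimal sketch using the standard normal-approximation formulas for comparing two proportions. The 80% power default is an assumption that the text above does not mention, and the function names and counts are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_lift,
                            confidence=0.95, power=0.80):
    """Approximate visitors needed in EACH version (two-sided test)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_power = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Made-up inputs: 2% baseline conversion rate, 5% target uplift.
print(sample_size_per_variant(0.02, 0.05))
# Made-up results: significant at the 5% level if the p-value is below 0.05.
print(two_proportion_p_value(200, 10_000, 260, 10_000) < 0.05)
```

Small uplifts on low baseline rates need very large samples, which is why the target uplift has such a strong effect on how much traffic a test requires.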

How long should you run an A/B test?

While the answer depends on multiple factors, it’s important to:

  • First, identify your business cycle and set that as the base for the duration of your split test.
  • Next, make sure you run tests in full weeks to account for weekend variance.
  • Next, extend the test duration as needed, from two to three to four weeks.

30 days is usually the maximum for A/B testing, but this can change depending on the channel you’re experimenting on.
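
The full-weeks rule and the 30-day ceiling can be combined into a quick planning check. This is a sketch under simple assumptions (steady daily traffic, a known required sample size); the function name and the traffic figures are hypothetical.

```python
from math import ceil


def planned_duration_days(required_visitors, daily_visitors, max_days=30):
    """Round the test length up to full weeks and check it against max_days.

    Returns (days, fits), where fits is False if the test would run
    longer than max_days.
    """
    raw_days = ceil(required_visitors / daily_visitors)
    full_weeks = max(1, ceil(raw_days / 7))
    days = full_weeks * 7
    return days, days <= max_days


# Made-up traffic: 20,000 visitors needed in total, 1,500 visitors per day.
print(planned_duration_days(20_000, 1_500))  # (14, True)
```

If the check fails, the options are usually to test a bolder change (a larger expected lift needs fewer visitors) or to test on a higher-traffic channel.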

A/B Testing Tools

While there are many A/B testing tools out there, the ones we recommend are:

  • Google Optimize: Free, with some limits on multivariate testing that shouldn’t really affect you if you’re just getting started. It works well alongside Google Analytics, which is a plus. Multivariate testing is only practical if you have very high traffic. Note that Google sunset Optimize in September 2023.
  • Optimizely: Easy to get minor tests up and running, without technical skills. Stats Engine makes it easier to analyze test results. Typically, Optimizely is the most expensive option of the three.
  • VWO: VWO has SmartStats to make analysis easier. Plus, it has a great WYSIWYG editor for beginners. Every VWO plan comes with heatmaps, on-site surveys, form analytics, etc.
  • Ads: For Facebook (and Instagram) Ads, you can use Facebook’s own Experiments tool or set up split tests manually within your ad campaigns.
  • Email: For email marketing, ActiveCampaign and Klaviyo (for eCommerce) have great built-in A/B testing tools.

A/B testing by user segment

Sometimes a test is a loser overall but still performs well with at least one segment.

This is why you should review your test results by segment.

Some examples of user segments include:

  • New visitors
  • Returning visitors
  • iOS visitors
  • Android visitors
  • Chrome visitors
  • Safari visitors
  • Desktop visitors
  • Tablet visitors
  • Organic search visitors
  • Paid visitors
  • Social media visitors
  • Logged-in buyers
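
Reviewing results by segment amounts to grouping visits by segment and version before computing conversion rates. The sketch below assumes each visit is recorded as a (segment, variant, converted) tuple, which is a hypothetical data shape, and the sample rows are made up.

```python
from collections import defaultdict


def conversion_by_segment(visits):
    """Return {(segment, variant): conversion rate} for an iterable of
    (segment, variant, converted) tuples."""
    totals = defaultdict(lambda: [0, 0])  # key -> [conversions, visitors]
    for segment, variant, converted in visits:
        totals[(segment, variant)][0] += int(converted)
        totals[(segment, variant)][1] += 1
    return {key: conv / n for key, (conv, n) in totals.items()}


# Tiny made-up sample, far too small for a real decision.
sample_visits = [
    ("New visitors", "control", False),
    ("New visitors", "control", True),
    ("New visitors", "variation", False),
    ("New visitors", "variation", False),
    ("Returning visitors", "control", False),
    ("Returning visitors", "control", False),
    ("Returning visitors", "variation", True),
    ("Returning visitors", "variation", True),
]

rates = conversion_by_segment(sample_visits)
for (segment, variant), rate in sorted(rates.items()):
    print(f"{segment:20} {variant:10} {rate:.0%}")
```

Each segment contains only a fraction of the total traffic, so significance should be rechecked per segment before acting on a segment-level winner.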

A/B testing and SEO

This section covers A/B tests conducted on-site (not tests conducted on ads, emails, or other channels outside the website).

A/B testing does not pose a risk to website search rankings when it is implemented correctly, but there are a few things to keep in mind.

In particular, it is possible to jeopardize your search rank by abusing an A/B testing tool for purposes such as cloaking.

Here are some best practices, as mentioned by Google, to ensure that this doesn’t happen:

  • No Cloaking: Cloaking is the practice of showing search engines different content than a typical visitor would see. To prevent cloaking, do not abuse A/B testing or visitor segmentation to display different content to Googlebot based on user-agent or IP address.
  • Use rel="canonical": If you use multiple URLs for your split tests, use the rel="canonical" attribute to point the variations back to the original version of the page.
  • Use 302 Redirects Instead Of 301s: If you run a test that redirects the original URL to a variation URL, use a 302 temporary redirect instead of a 301 permanent redirect.

Finally, run experiments only as long as necessary.

Remember the Purpose of A/B Testing

Sometimes marketers narrow their focus so much during A/B testing and incremental learning that they continue cycles of optimization for the sake of the metric itself.

Do remember that the purpose of A/B testing is to eventually have a positive impact on the business.

This often requires zooming out again to a broader level and analyzing other business metrics before further testing.

For instance, a CRO split testing routine can go so deep into successive optimization cycles that it misses a drop in Average Order Value (AOV) caused by some of the changes made to improve the conversion rate.

At this point, for this particular example, it’s important to go back and see if the drop in AOV is acceptable relative to the lift in Conversion Rate.

This balance needs to be maintained for the business to have overall good metrics.
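
One simple way to weigh the two metrics against each other is revenue per visitor, which is conversion rate multiplied by AOV. The figures below are made up purely to illustrate the trade-off, and the function name is hypothetical.

```python
def revenue_per_visitor(conversion_rate: float, average_order_value: float) -> float:
    """Expected revenue from one visitor: conversion rate x AOV."""
    return conversion_rate * average_order_value


# Made-up numbers: conversion rate rises 10%, but AOV falls 12%.
before = revenue_per_visitor(0.020, 50.0)
after = revenue_per_visitor(0.022, 44.0)

print(f"Before: {before:.3f}")  # Before: 1.000
print(f"After:  {after:.3f}")   # After:  0.968
```

In this made-up case the conversion rate "winner" earns less per visitor than the original, so the drop in AOV would not be acceptable.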

Top Tips to Remember:

  1. Understand what you need to test first to improve the business
  2. Focus on the goal metric you’re looking to improve
  3. Check the data for statistical significance
  4. Test in small iterations for compound learnings and improvement
  5. Use qualitative findings to verify the quantitative data
  6. Keep your A/B tests 30 days or less
  7. Use the A/B testing tools that best fit your requirements
  8. Apply user segmentation for testing
  9. Keep the key business goal in mind
  10. A failed hypothesis doesn’t mean a failed experiment!

Point #10 is important to remember because not every hypothesis you build your split tests on will be proven. In fact, most hypotheses you test might be invalidated.

This, however, doesn’t mean that your A/B test has failed. In fact, whether you validate or invalidate your hypothesis, your experiment has succeeded.

This is because, in the end, you know what works better. With each test, you have learned more than you knew before.