How To Run Effective A/B Testing For Startups

Business research argues that experimentation should be the organizing principle for entrepreneurial strategy. Indeed, there is scientific evidence that experimentation leads to organizational learning, which drives improvements in firm performance (Koning, Hasan, & Chatterji, 2022).

I decided to write this blog post because A/B testing has become so popular in online marketing, yet the question of how to test when a website has only a small number of users and conversions comes up frequently in my work with early-stage startups.

Over the years, the term A/B testing has gained popularity in the tech industry, and it has often been used interchangeably with the following terms: A/B tests, experimentation, CRO, and online randomized controlled experiments. This post aims to help startup founders and marketing teams run better experiments and to clear up some of the doubts that may arise around the terminology.

A/B Testing Terminology For Startups

To set accurate expectations for team members and stakeholders such as startup investors, VCs, founders, product teams, and marketing teams, it is important to clarify what these terms mean to your company.

  • A/B testing is a method of comparing two versions of a product or feature to determine which one performs better. In a simple A/B test, A and B are the two variants, usually called the treatment and the control: the first includes the new feature and the second does not (the “control” group). The test randomly assigns a portion of visitors to each group and measures the conversion rate (the percentage of visitors who complete a key action, e.g. a purchase or a lead) for each group; a minimal sketch of this assignment and comparison follows this list. However, there are some cases when A/B testing is not the best test to run.
  • Online controlled experiment is a more generic term and may include the following tests: A/A tests, A/B tests, A/B/n tests, field experiments, randomized controlled trials (RCTs), split tests, bucket tests, and flights (Kohavi, Deng, & Vermeer, 2022).
  • CRO (Conversion Rate Optimisation) is the process of improving the user experience on the website. The goal of CRO is to increase the efficiency of a website by converting more visitors into customers. This is typically done through a combination of design, content, and usability improvements.   
  • Experimentation is a much broader concept that involves a scientific mindset. Experimentation can be used to validate or disprove hypotheses by trying out new strategies or tactics across a wider range of outputs. Ideas are converted into test hypotheses, then prioritized, and finally the team runs scientific tests. A data-driven company should not act on results that do not achieve statistical significance.
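
Below is a minimal sketch of the core mechanics described above: visitors are randomly split between a control and a treatment group and the conversion rate of each group is compared. The visitor counts and conversion numbers are made-up placeholders, not data from any real test.

```python
# Minimal sketch of a simple A/B test: randomly assign visitors to control (A)
# or treatment (B) and compare conversion rates. All numbers are hypothetical.
import random

random.seed(42)

visitors = [f"visitor_{i}" for i in range(10_000)]
groups = {"control": [], "treatment": []}

# Random assignment: each visitor has a 50/50 chance of seeing either variant.
for v in visitors:
    groups[random.choice(["control", "treatment"])].append(v)

# Suppose we later observe these conversion counts (hypothetical figures).
conversions = {"control": 230, "treatment": 270}

for name, members in groups.items():
    rate = conversions[name] / len(members)
    print(f"{name}: {len(members)} visitors, conversion rate = {rate:.2%}")
```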

Why Should Startups Run A/B Testing?

Recent research focused on startup performance provides evidence of how digital experimentation affects the performance of tech startups and their products: the authors studied a sample of 35,000 startups over four years to analyze whether A/B testing improves the performance of the startups in the sample (Koning, Hasan, & Chatterji, 2022).

The study found that while relatively few startups adopt A/B testing, among those that do, performance improved by 30% to 100% after a year of use. The authors note: “We then argue that this substantial effect and relatively low adoption rate arises because startups do not only test one-off incremental changes but also use A/B testing as part of a broader strategy of experimentation. Qualitative insights and additional quantitative analyses show that experimentation improves organizational learning, which helps startups develop more new products, identify and scale promising ideas, and fail faster when they receive negative signals. These findings inform the literature on entrepreneurial strategy, organizational learning, and data-driven decision-making.”
Click here to read more about HBR Research.

Statistical principles in A/B Tests

To take advantage of A/B testing, startup marketing teams need to apply statistical principles. The hypothesis is a prediction about how a change to a product or feature will impact the behaviour of users. To evaluate the impact of an experiment, a manager needs to make informed, data-driven decisions and accept or reject the hypothesis by analyzing the right metrics at the end of the experiment. Controlled experiments are the best scientific way to establish causality with a high probability, while detecting small changes and ensuring that results are trustworthy.

Statistical significance is usually used to validate an experiment. Below you will find additional information on what statistical significance is, how to measure it, and how to achieve it in order to validate your experiments.

What is Statistical Significance? 

Statistical significance indicates how unlikely it is that the observed outcomes are the result of random chance. It refers to the likelihood that the difference between the results of the two groups (control and treatment) is not due to random chance alone.

Below I analyze four factors that contribute to the statistical significance of an A/B test.

Confidence level  

An important aspect of an A/B test is its confidence level. If you run a test at a 95% confidence level, you can be 95% confident that the observed difference between your experiment’s control version and the test version is real. You may want to set a confidence level of 85%, 90%, or 95%; 95% is the accepted standard for statistical significance. Click here to learn more about how to calculate your confidence level.
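
As an illustration, here is a small sketch of how the chosen confidence level translates into a confidence interval for the difference in conversion rates. It uses only the Python standard library and a normal approximation; the visitor and conversion counts are hypothetical.

```python
# Confidence interval for the difference in conversion rates (treatment - control).
# Counts below are hypothetical; uses a normal approximation.
from math import sqrt
from statistics import NormalDist

def diff_confidence_interval(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Normal-approximation CI for the difference in conversion rates (B - A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # ~1.96 for 95%
    diff = p_b - p_a
    return diff - z * se, diff + z * se

low, high = diff_confidence_interval(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"95% CI for the uplift: [{low:.4f}, {high:.4f}]")
# If the interval excludes 0, the difference is significant at that confidence level.
```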

Hypothesis testing   

The first step of any experiment is to generate a hypothesis. Whenever conducting an experiment, it is essential to consider both a null hypothesis and an alternative hypothesis.

The null hypothesis, usually denoted H0, is a statement that there is no significant difference between the two groups, or no relationship between the two variables being observed. The alternative hypothesis attempts to demonstrate that there is in fact a relationship between them and serves as the basis for the experiment. After establishing the null and alternative hypotheses, statisticians may run further tests: the null hypothesis can be evaluated through a z-score, while the p-value indicates how strong the evidence is against the null hypothesis and, therefore, in favour of the alternative.
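
The sketch below shows how the z-score and p-value mentioned above can be computed with a two-proportion test of the null hypothesis that control and treatment convert at the same rate. The counts are hypothetical and only the standard library is used.

```python
# Two-proportion z-test for H0: the control and treatment conversion rates are equal.
# Counts are hypothetical placeholders.
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z-score, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)           # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))       # two-sided p-value
    return z, p_value

z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}")
# If p < 0.05, reject the null hypothesis at the 95% confidence level.
```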

If the null hypothesis cannot be rejected, it means that the results of the test are not statistically significant and it is not possible to draw any conclusions about the effect of the change. Click here to learn more about hypothesis testing.

Sample size

Sample size refers to how large the sample for your experiment is. With a larger sample size, you can be more confident in the outcome of the experiment, assuming you are running a randomized experiment. The larger the sample, the faster you can achieve statistical significance and validate your test.
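
As a rough guide, the sketch below estimates the sample size needed per variant using the standard normal-approximation formula for two proportions. The baseline conversion rate and minimum detectable effect (MDE) are hypothetical inputs you would replace with your own numbers; a dedicated calculator such as the CXL one mentioned later should give similar figures.

```python
# Rough per-variant sample size for a two-proportion test (normal approximation).
# Baseline rate and MDE are hypothetical inputs.
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline, mde, alpha=0.05, power=0.80):
    """Visitors needed per group to detect an absolute uplift of `mde`."""
    p1, p2 = baseline, baseline + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # ~0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Example: 4% baseline conversion rate, detect an absolute +1% uplift.
print(sample_size_per_variant(baseline=0.04, mde=0.01), "visitors per variant")
```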

Randomized sampling

It is essential to recognize the necessity of random selection when taking a sample. To obtain results that accurately represent the population, it is important to distribute visits between the two web pages randomly.
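
In practice, experimentation tools often implement this random assignment deterministically, for example by hashing a user id together with an experiment name so that each visitor always lands in the same, effectively random bucket. The sketch below illustrates the idea; the experiment name and user id are placeholders.

```python
# Deterministic "random" assignment: hash the user id with the experiment name
# so a visitor always sees the same variant. Names and ids are placeholders.
import hashlib

def assign_variant(user_id: str, experiment: str = "homepage_test") -> str:
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100          # bucket 0-99, roughly uniform
    return "treatment" if bucket < 50 else "control"

print(assign_variant("user_123"))  # the same user always gets the same variant
```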

A/B Testing With A Small Sample Size for Startups

It is recommended to have a large sample size when conducting A/B testing, as this can increase the accuracy and reliability of the test results. A small sample size can lead to unreliable or inconclusive results. However, there may be cases where it is not feasible to test a large sample size, such as when you have a limited budget or a small number of users. In these cases, it is still possible to conduct A/B testing with a small sample size, but you should be aware of the potential limitations and take steps to mitigate them.

Here are some tips for conducting A/B testing with a small sample size for early-stage startups:

  1. Increase Time: increase the duration of the test to allow more users to be exposed to the two versions. This can help improve the accuracy of the results.
  2. Accept higher uncertainty: reduce the confidence threshold and accept a higher risk of false positives. You may also increase the MDE (minimum detectable effect) or accept that the test will have lower power; the sketch after this list shows how these choices shrink the required sample size. Learn more on this topic here: G. Georgiev (A/B Testing with Small Sample Size, 2019).
  3. Micro-conversions: test micro-conversions instead of macro-conversions. For example, instead of optimizing the A/B test in terms of sales, you may want to measure add-to-cart events.
  4. Additional methods: gather data through design thinking tools like user interviews, surveys, and other methods to supplement the results of the A/B test (these methods are not 100% scientific). 
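
To make tip 2 concrete, the sketch below shows how loosening the confidence level, the power, or the minimum detectable effect reduces the required sample size. It reuses the same normal-approximation formula as the earlier sample-size sketch; the baseline rate and the scenario values are hypothetical.

```python
# Trade-offs for small samples: lower confidence, lower power, or a larger MDE
# all shrink the sample you need. Baseline rate and scenarios are hypothetical.
from math import ceil
from statistics import NormalDist

def n_per_variant(baseline, mde, alpha, power):
    p1, p2 = baseline, baseline + mde
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return ceil(z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / mde ** 2)

for alpha, power, mde in [(0.05, 0.80, 0.01), (0.10, 0.80, 0.01), (0.10, 0.70, 0.02)]:
    print(f"alpha={alpha}, power={power}, MDE={mde}: "
          f"{n_per_variant(0.04, mde, alpha, power)} visitors per variant")
```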

The experimentation process changes for every startup and team depending on team size, business model, and priorities. However, to design and execute a successful A/B test for a startup, you can follow these standard steps:

  1. Identify the goal of the test: What do you want to learn or improve upon? Make sure the goal is specific and measurable; the steps below form a simple scientific framework you can follow.
  2. Idea generation: Qualitative and quantitative data are used to generate new test hypotheses. A new test objective could be the improvement of the design of a website, the subject line of an email, or the pricing of a product. The team should always aim for the highest impact.
  3. Sample size: You can estimate the minimum sample required and how long you need to run the experiment by using the CXL Calculator (a duration sketch follows this list).
  4. Create the variations: Define the Control and Treatment of the element you aim to test. Test only one change at a time, so you can accurately isolate and analyze the effect of that specific change in terms of conversions. 
  5. Tools & test setup: Spend some time evaluating the tool that best suits the needs of your team. In this recent blog post, I reviewed the 3 best A/B testing tools for startups. Also, before running your test, you should run a pre-test analysis.
  6. Run the test: Allow the test to run for a sufficient amount of time to collect enough data according to the sample size available.  
  7. Analyze the results: Use statistical analysis to determine which version performed better and whether the difference is statistically significant before implementing the winning version.
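
To connect steps 3 and 6, here is a small sketch that turns a required sample size and your daily eligible traffic into an expected test duration. The traffic and sample-size figures are hypothetical placeholders.

```python
# Estimate how long a test needs to run given the required sample size
# and the daily eligible traffic. All numbers are hypothetical.
from math import ceil

def test_duration_days(sample_per_variant: int, variants: int, daily_visitors: int) -> int:
    total_needed = sample_per_variant * variants
    return ceil(total_needed / daily_visitors)

# Example: ~6,700 visitors per variant, 2 variants, 500 eligible visitors per day.
print(test_duration_days(sample_per_variant=6700, variants=2, daily_visitors=500), "days")
```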

Conclusion

Having a strategy is critical for running experiments and identifying areas with high ROI. Recent research provides evidence of the benefits of running A/B testing for startups. A/B tests help break down larger business challenges into smaller, testable hypotheses and thereby reduce uncertainty. Many firms and startups do not need a large sample size to run experiments, although a greater traffic volume reduces the time needed to reach a conclusion.



Bibliography


A/B test sample size calculator by CXL. Retrieved January 6, 2023, from https://cxl.com/ab-test-calculator/

Deng, A., Lu, J., & Litz, J. (2017, February). Trustworthy Analysis of Online A/B Tests: Pitfalls, challenges and solutions. WSDM ’17: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, 641-649. doi:10.1145/3018661.3018677

Georgiev, G. (2019, March 4). A/B testing with a small sample size. Analytics Toolkit.

Kohavi, R., Tang, D., & Xu, Y. (2020). Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing. Cambridge: Cambridge University Press. doi:10.1017/9781108653985

Koning, R., Hasan, S., & Chatterji, A. (2022). Experimentation and Start-up Performance: Evidence from A/B Testing. Management Science.

Optimizely. (2021, June 24). What’s Next? Steps For Prioritizing Your Experiment Backlog. Retrieved 2022.

Optimizely. (2022, August). Testing tips for low-traffic sites. Optimizely.

Rusonis, S. (2021, June 24). Example A/B test hypothesis. Optimizely. Retrieved April 5, 2022.

Schwartz, T. (2018, June 14). Create a Growth Culture, Not a Performance-Obsessed One. Harvard Business Review. Retrieved April 5, 2022.

