How to Calculate the Sample Size for an A/B Test, Including Calculator & Derivation

Introduction

Suppose that we have two treatments, A and B, representing two different marketing strategies. Treatment A is called the “control” and represents the marketing strategy that is already in place; treatment B is called the “variation” and represents a newly developed marketing strategy. We compare strategies A and B through the conversion rate associated with each, where the conversion rate is the number of purchases divided by the number of visiting customers. For example, on an e-commerce website, the conversion rate for treatment A is the proportion of visitors shown the website designed according to strategy A who go on to purchase the product. A higher conversion rate indicates a more effective marketing strategy and a lower conversion rate a less effective one. In A/B testing, we decide whether there is a significant difference between the two strategies and, if there is, which strategy is superior and by how much. The main statistical tool for A/B testing is hypothesis testing; in particular, when conversion rates are involved, we use the test of proportions.

In this article we first provide a calculator for the sample size that must be reached before the A/B test is conducted to ensure a proper comparison between the two treatments. We then give the formula for the required sample size, together with a full mathematical description of the A/B test and a derivation of the sample size equation.

Sample Size Calculator for A/B Testing

The calculator takes four inputs, the control conversion rate (\hat{p}_A, in %), the minimum detectable effect (in %), the confidence level (1-\alpha) and the power (1-\beta), and returns the required sample size per variation.

  • The control conversion rate is the conversion rate associated with the marketing strategy already in place.
  • The confidence level is the probability of correctly concluding that the variation is not a better alternative to the marketing strategy already in place when, in fact, the two strategies have the same conversion rate.
  • The power is the probability of correctly concluding that the variation is a better alternative to the marketing strategy already in place when its conversion rate is higher by at least the minimum detectable effect.
  • The minimum detectable effect is the value MDE such that the test can identify a relative increase of MDE or more from the conversion rate of A to the conversion rate of B, with a confidence level of at least 1-\alpha and a power of at least 1-\beta.
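The four inputs translate directly into the quantities used in the formula and derivation below. The following is a minimal Python sketch, assuming the one-sided critical values used in this article and percentages entered as fractions; the helper name `calculator_inputs` is hypothetical, and `statistics.NormalDist` comes from Python's standard library:

```python
from statistics import NormalDist


def calculator_inputs(p_a, mde, confidence=0.95, power=0.80):
    """Translate the calculator inputs into alpha, beta, z_alpha, z_beta and p_B.

    Hypothetical helper for illustration; all rates are fractions (8% -> 0.08).
    """
    alpha = 1 - confidence
    beta = 1 - power
    std_normal = NormalDist()  # standard normal N(0, 1)
    # One-sided critical values: P[Z <= z_alpha] = 1 - alpha, P[Z <= z_beta] = 1 - beta
    z_alpha = std_normal.inv_cdf(1 - alpha)
    z_beta = std_normal.inv_cdf(1 - beta)
    # The MDE is a relative lift, so the smallest B rate to detect is (1 + MDE) * p_A
    p_b = (1 + mde) * p_a
    return alpha, beta, z_alpha, z_beta, p_b


alpha, beta, z_alpha, z_beta, p_b = calculator_inputs(0.08, 0.25)
print(f"alpha={alpha:.2f} beta={beta:.2f} z_alpha={z_alpha:.4f} z_beta={z_beta:.4f} p_B={p_b:.2f}")
# prints alpha=0.05 beta=0.20 z_alpha=1.6449 z_beta=0.8416 p_B=0.10
```

Note that an MDE of 25% on a control rate of 8% means detecting a B conversion rate of 10%, not 33%: the effect is relative, not an absolute number of percentage points.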

For example, suppose that the conversion rate associated with the marketing strategy already in place is 8%, that is, \hat{p}_A is 8%. Suppose also that the minimum detectable effect is 25%, the confidence level is 95% and the power is 80%. The required sample size for variant B is then 2531 (and, since the calculation assumes equal group sizes, the same number for control A). With 2531 customers in each group we can perform the A/B test and correctly conclude that variant B offers no better alternative to control A with probability 95% when the two conversion rates are equal, or correctly conclude that variant B is the better alternative with probability at least 80% when the relative increase from the conversion rate of A to the conversion rate of B is at least 25%.

Required Sample Size Formula for A/B Testing

The minimum sample size for variation B, given a control conversion rate \hat{p}_A and a minimum detectable effect MDE, that ensures a confidence level of 1-\alpha and a power of at least 1-\beta is:

\Bigg\lceil\frac{\bigg(z_{\alpha}\sqrt{\frac{(2+MDE)(2-2\hat{p}_A-\hat{p}_A MDE)}{2}}+z_{\beta}\sqrt{1-\hat{p}_A+(1+MDE)(1-\hat{p}_A-\hat{p}_A MDE)}\bigg)^2}{MDE^2 \hat{p}_A}\Bigg\rceil
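The formula can be evaluated directly in code. Below is a hedged Python sketch that stands in for the calculator; the function name `ab_sample_size` is hypothetical, rates are passed as fractions, and the one-sided critical values are computed exactly with the standard-library `statistics.NormalDist` rather than read from a table:

```python
import math
from statistics import NormalDist


def ab_sample_size(p_a, mde, confidence=0.95, power=0.80):
    """Minimum sample size per variation from the closed-form formula.

    p_a: control conversion rate as a fraction (8% -> 0.08)
    mde: minimum detectable effect as a relative lift (25% -> 0.25)
    """
    z_alpha = NormalDist().inv_cdf(confidence)  # P[Z <= z_alpha] = 1 - alpha
    z_beta = NormalDist().inv_cdf(power)        # P[Z <= z_beta] = 1 - beta
    term_alpha = z_alpha * math.sqrt((2 + mde) * (2 - 2 * p_a - p_a * mde) / 2)
    term_beta = z_beta * math.sqrt(1 - p_a + (1 + mde) * (1 - p_a - p_a * mde))
    n = (term_alpha + term_beta) ** 2 / (mde ** 2 * p_a)
    return math.ceil(n)  # round up to a whole number of visitors


print(ab_sample_size(0.08, 0.25))  # 2531
```

Because the result is rounded up, it is sensitive to how the critical values are rounded: with z_{\alpha}=1.645 and z_{\beta}=0.842 the same inputs give 2532.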

Thus, if the relative increase from p_A to p_B is MDE or more and variation B has this sample size, we can perform the A/B test with a probability of a Type 1 error of at most \alpha and a probability of a Type 2 error of at most \beta. The derivation assumes n_A=n_B, so control A needs the same number of visitors. The following section gives the mathematical derivation of the sample size equation.

Mathematical Derivation of the Sample Size

Let p_A be the population proportion (i.e. the conversion rate) for control A and let p_B be the population proportion (i.e. the conversion rate) for treatment B. We use a one-tailed test, so the hypotheses for the test of proportions are:

H_0:\ p_A-p_B=0

H_1:\ p_A-p_B< 0

Control A is the marketing strategy currently in place. If marketing strategy B is better, that is p_B>p_A, then the alternative hypothesis H_1 (p_A-p_B<0) is true. If marketing strategy B is worse than or as effective as marketing strategy A, then B is disregarded.

In practice the true values of p_A and p_B are unknown, so we use the sample proportions \hat{p}_A and \hat{p}_B to draw conclusions about the two population parameters p_A and p_B and thus choose between the two hypotheses. In hypothesis testing we can make two types of errors. A Type 1 error occurs when we reject H_0 when in reality H_0 is true. A Type 2 error occurs when we accept (fail to reject) H_0 when in reality H_0 is false. The probability of making a Type 1 error is \alpha and the probability of making a Type 2 error is \beta. We want a sample size that keeps \alpha and \beta at pre-defined values. A common value for \alpha is 0.05 (5%), whereas a common value for \beta is 0.20 (20%). In that case we look for a sample size that ensures that the probability of a Type 1 error is 5% and the probability of a Type 2 error is no more than 20%.

Let n_A and n_B be the sample sizes for control A and variation B respectively. Recall from probability theory that, for large samples, the sampling distribution of the difference \hat{p}_A-\hat{p}_B is approximately normal:

\hat{p}_A-\hat{p}_B\sim\mathcal{N}(p_A-p_B,\frac{p_A(1-p_A)}{n_A}+\frac{p_B(1-p_B)}{n_B})

Therefore,

Z=\frac{\hat{p}_A-\hat{p}_B-(p_A-p_B)}{\sqrt{\frac{p_A(1-p_A)}{n_A}+\frac{p_B(1-p_B)}{n_B}}}\sim\mathcal{N}(0,1)
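The normal approximation can be checked by simulation. This sketch uses Python's `random` and `statistics` modules; the function name `simulate_differences` is hypothetical, and the rates (8% and 10%) and group size (500) are arbitrary illustration values. It repeats a simulated experiment many times and compares the empirical mean and variance of \hat{p}_A-\hat{p}_B with p_A-p_B and \frac{p_A(1-p_A)}{n_A}+\frac{p_B(1-p_B)}{n_B}:

```python
import random
from statistics import fmean, variance


def simulate_differences(p_a, p_b, n_a, n_b, trials, seed=0):
    """Simulate `trials` experiments and return the observed differences p̂_A - p̂_B."""
    rng = random.Random(seed)  # seeded for reproducibility
    diffs = []
    for _ in range(trials):
        # Each visitor converts independently with probability p_a (or p_b)
        conv_a = sum(rng.random() < p_a for _ in range(n_a))
        conv_b = sum(rng.random() < p_b for _ in range(n_b))
        diffs.append(conv_a / n_a - conv_b / n_b)
    return diffs


p_a, p_b, n_a, n_b = 0.08, 0.10, 500, 500
diffs = simulate_differences(p_a, p_b, n_a, n_b, trials=2000)
mean, var = fmean(diffs), variance(diffs)  # variance() is the sample variance
theory_var = p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b
print(f"mean:     simulated {mean:.5f}, theory {p_a - p_b:.5f}")
print(f"variance: simulated {var:.7f}, theory {theory_var:.7f}")
```

The simulated values should land close to the theoretical ones; a histogram of `diffs` would also show the bell shape that justifies standardizing to Z.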

We fix the probability of a Type 1 error at \alpha. Since H_1 states that p_A-p_B<0, H_0 is rejected when the test statistic is sufficiently negative. Hence:

    \begin{equation*} \begin{split} \mathbb{P}[H_0\mbox{ is rejected}|H_0\mbox{ is true}]&=\alpha\\ \mathbb{P}[H_0\mbox{ is accepted}|H_0\mbox{ is true}]&=1-\alpha\\ \mathbb{P}[ Z\geq -z_{\alpha}|H_0\mbox{ is true}]&=1-\alpha\mbox{ (where }z_{\alpha}\mbox{ is the critical value corresponding to a cumulative probability of }1-\alpha)\\ \mathbb{P}[ \frac{\hat{p}_A-\hat{p}_B-(p_A-p_B)}{\sqrt{\frac{p_A(1-p_A)}{n_A}+\frac{p_B(1-p_B)}{n_B}}} \geq -z_{\alpha}|H_0\mbox{ is true}]&=1-\alpha\\ \mathbb{P}[\frac{\hat{p}_A-\hat{p}_B-0}{\sqrt{P(1-P)(\frac{1}{n_A}+\frac{1}{n_B})}} \geq -z_{\alpha}]&=1-\alpha\mbox{ (where }P\mbox{ is the pooled sample proportion }\frac{n_A \hat{p}_A+n_B \hat{p}_B}{n_A+n_B})\\ \mathbb{P}[\hat{p}_A-\hat{p}_B \geq -z_{\alpha}\sqrt{P(1-P)(\frac{1}{n_A}+\frac{1}{n_B})}]&=1-\alpha \end{split} \end{equation*}

If we assume that n_A=n_B, the pooled sample proportion P reduces to \frac{\hat{p}_A+\hat{p}_B}{2}, and the region in which H_0 is accepted, which has probability 1-\alpha under H_0, becomes:

    \begin{equation*} \begin{split} \mathbb{P}[\hat{p}_A-\hat{p}_B \geq -z_{\alpha}\sqrt{\frac{\hat{p}_A+\hat{p}_B}{n_B}(1-\frac{\hat{p}_A+\hat{p}_B}{2})}]&=1-\alpha \end{split} \end{equation*}

Now let us consider the probability of a Type 2 error under the alternative hypothesis H_1:\ p_A-p_B< 0, at a particular value \lambda=p_A-p_B<0. We want the probability of a Type 2 error to be at most \beta. Therefore:

    \begin{equation*} \begin{split} \mathbb{P}[H_0\mbox{ is accepted}|H_1\mbox{ is true}]&\leq\beta\\ \mathbb{P}[H_0\mbox{ is rejected}|H_1\mbox{ is true}]&\geq 1-\beta\\ \mathbb{P}[\hat{p}_A-\hat{p}_B<-z_{\alpha}\sqrt{\frac{\hat{p}_A+\hat{p}_B}{n_B}(1-\frac{\hat{p}_A+\hat{p}_B}{2})}|H_1\mbox{ is true}]&\geq 1-\beta\\ \mathbb{P}[\frac{\hat{p}_A-\hat{p}_B-\lambda}{\sqrt{\frac{\hat{p}_A(1-\hat{p}_A)}{n_A}+\frac{\hat{p}_B(1-\hat{p}_B)}{n_B}}}<\frac{-z_{\alpha}\sqrt{\frac{\hat{p}_A+\hat{p}_B}{n_B}(1-\frac{\hat{p}_A+\hat{p}_B}{2})}-\lambda}{\sqrt{\frac{\hat{p}_A(1-\hat{p}_A)}{n_A}+\frac{\hat{p}_B(1-\hat{p}_B)}{n_B}}}|H_1\mbox{ is true}]&\geq 1-\beta\\ \mathbb{P}[Z<\frac{-z_{\alpha}\sqrt{\frac{\hat{p}_A+\hat{p}_B}{n_B}(1-\frac{\hat{p}_A+\hat{p}_B}{2})}-\lambda}{\sqrt{\frac{\hat{p}_A(1-\hat{p}_A)}{n_A}+\frac{\hat{p}_B(1-\hat{p}_B)}{n_B}}}]&\geq 1-\beta \end{split} \end{equation*}

For this inequality to hold, where z_{\beta} is the critical value corresponding to a cumulative probability of 1-\beta, we need the following (the fourth line uses n_A=n_B, and the last line squares both sides, which are positive since -\lambda>0):

    \begin{equation*} \begin{split} \frac{-z_{\alpha}\sqrt{\frac{\hat{p}_A+\hat{p}_B}{n_B}(1-\frac{\hat{p}_A+\hat{p}_B}{2})}-\lambda}{\sqrt{\frac{\hat{p}_A(1-\hat{p}_A)}{n_A}+\frac{\hat{p}_B(1-\hat{p}_B)}{n_B}}}&\geq z_{\beta}\\ -z_{\alpha}\sqrt{\frac{\hat{p}_A+\hat{p}_B}{n_B}(1-\frac{\hat{p}_A+\hat{p}_B}{2})}-\lambda&\geq z_{\beta}\sqrt{\frac{\hat{p}_A(1-\hat{p}_A)}{n_A}+\frac{\hat{p}_B(1-\hat{p}_B)}{n_B}}\\ z_{\alpha}\sqrt{\frac{\hat{p}_A+\hat{p}_B}{n_B}(1-\frac{\hat{p}_A+\hat{p}_B}{2})}+z_{\beta}\sqrt{\frac{\hat{p}_A(1-\hat{p}_A)}{n_A}+\frac{\hat{p}_B(1-\hat{p}_B)}{n_B}}&\leq -\lambda\\ \frac{1}{\sqrt{n_B}}\bigg(z_{\alpha}\sqrt{(\hat{p}_A+\hat{p}_B)(1-\frac{\hat{p}_A+\hat{p}_B}{2})}+z_{\beta}\sqrt{\hat{p}_A(1-\hat{p}_A)+\hat{p}_B(1-\hat{p}_B)}\bigg)&\leq -\lambda\\ n_B&\geq \frac{\bigg(z_{\alpha}\sqrt{(\hat{p}_A+\hat{p}_B)(1-\frac{\hat{p}_A+\hat{p}_B}{2})}+z_{\beta}\sqrt{\hat{p}_A(1-\hat{p}_A)+\hat{p}_B(1-\hat{p}_B)}\bigg)^2}{\lambda^2} \end{split} \end{equation*}
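The derivation implies that, for n_A=n_B=n, the approximate power of the one-sided test at \lambda<0 is \Phi\Big(\frac{-\lambda\sqrt{n}-z_{\alpha}\sqrt{(p_A+p_B)(1-\frac{p_A+p_B}{2})}}{\sqrt{p_A(1-p_A)+p_B(1-p_B)}}\Big), where \Phi is the standard normal CDF. The Python sketch below (function names hypothetical; the rates themselves are plugged in for their estimates, and p_A=0.08, p_B=0.10 are illustration values) evaluates the final bound on n_B and checks that the power crosses 1-\beta=0.80 exactly where n_B crosses that bound:

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def approx_power(p_a, p_b, n, confidence=0.95):
    """Approximate power of the one-sided pooled test with n_A = n_B = n.

    Assumes p_b > p_a, i.e. lambda = p_a - p_b < 0 as under H_1.
    """
    z_alpha = STD_NORMAL.inv_cdf(confidence)
    lam = p_a - p_b
    s = p_a + p_b
    se0 = math.sqrt(s / n * (1 - s / 2))                      # pooled SE under H_0
    se1 = math.sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / n)  # SE under H_1
    # Power = P[Z < (-z_alpha * se0 - lambda) / se1]
    return STD_NORMAL.cdf((-z_alpha * se0 - lam) / se1)


def n_b_bound(p_a, p_b, confidence=0.95, power=0.80):
    """Right-hand side of the final inequality for n_B, before rounding up."""
    z_alpha = STD_NORMAL.inv_cdf(confidence)
    z_beta = STD_NORMAL.inv_cdf(power)
    s = p_a + p_b
    num = (z_alpha * math.sqrt(s * (1 - s / 2))
           + z_beta * math.sqrt(p_a * (1 - p_a) + p_b * (1 - p_b)))
    return num ** 2 / (p_a - p_b) ** 2


bound = n_b_bound(0.08, 0.10)
print(f"bound on n_B: {bound:.2f}")
for n in (2530, 2531):
    print(f"n = {n}: approximate power {approx_power(0.08, 0.10, n):.5f}")
```

The bound falls between 2530 and 2531, so 2530 visitors per group fall just short of 80% power and 2531 reach it.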

Since control A is the marketing strategy already in place, we assume that we already have an estimate \hat{p}_A of its conversion rate. In fact, \hat{p}_A is the control conversion rate, one of the inputs of the sample size calculator. We express p_B in terms of p_A as follows:

p_B=(1+MDE)p_A,

and \hat{p}_B in terms of \hat{p}_A as follows:

\hat{p}_B=(1+MDE)\hat{p}_A,

where MDE stands for the minimum detectable effect: the minimum relative increase from p_A to p_B for which the test has confidence level 1-\alpha and power of at least 1-\beta. With this substitution, \lambda=p_A-p_B=-MDE\,p_A.

Therefore,

    \begin{equation*} \begin{split} n_B&\geq \frac{\bigg(z_{\alpha}\sqrt{(\hat{p}_A+\hat{p}_B)(1-\frac{\hat{p}_A+\hat{p}_B}{2})}+z_{\beta}\sqrt{\hat{p}_A(1-\hat{p}_A)+\hat{p}_B(1-\hat{p}_B)}\bigg)^2}{\lambda^2}\\ &=\frac{\bigg(z_{\alpha}\sqrt{(\hat{p}_A+(1+MDE)\hat{p}_A)(1-\frac{\hat{p}_A+(1+MDE)\hat{p}_A}{2})}+z_{\beta}\sqrt{\hat{p}_A(1-\hat{p}_A)+(1+MDE)\hat{p}_A(1-(1+MDE)\hat{p}_A)}\bigg)^2}{(p_A-(1+MDE)p_A)^2}\\ &=\frac{\bigg(z_{\alpha}\sqrt{\frac{\hat{p}_A(2+MDE)(2-2\hat{p}_A-\hat{p}_A MDE)}{2}}+z_{\beta}\sqrt{\hat{p}_A(1-\hat{p}_A+(1+MDE)(1-\hat{p}_A-\hat{p}_A MDE))}\bigg)^2}{(-MDE p_A)^2}\\ &=\frac{\bigg(z_{\alpha}\sqrt{\frac{\hat{p}_A(2+MDE)(2-2\hat{p}_A-\hat{p}_A MDE)}{2}}+z_{\beta}\sqrt{\hat{p}_A(1-\hat{p}_A+(1+MDE)(1-\hat{p}_A-\hat{p}_A MDE))}\bigg)^2}{MDE^2 p_A^2}\\ &\simeq\frac{\bigg(z_{\alpha}\sqrt{\frac{\hat{p}_A(2+MDE)(2-2\hat{p}_A-\hat{p}_A MDE)}{2}}+z_{\beta}\sqrt{\hat{p}_A(1-\hat{p}_A+(1+MDE)(1-\hat{p}_A-\hat{p}_A MDE))}\bigg)^2}{MDE^2 \hat{p}_A^2}\mbox{(since we are using }\hat{p}_A\mbox{ as an estimate of }p_A)\\ &=\frac{\bigg(z_{\alpha}\sqrt{\frac{(2+MDE)(2-2\hat{p}_A-\hat{p}_A MDE)}{2}}+z_{\beta}\sqrt{1-\hat{p}_A+(1+MDE)(1-\hat{p}_A-\hat{p}_A MDE)}\bigg)^2}{MDE^2 \hat{p}_A}\\ \end{split} \end{equation*}
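The algebra in this chain can be spot-checked numerically. This hedged Python sketch (function names hypothetical; z values fixed at the rounded 1.6449 and 0.8416) evaluates the first line, with \hat{p}_B=(1+MDE)\hat{p}_A and \lambda=-MDE\,\hat{p}_A substituted, and the last line for a few arbitrary inputs; the two should agree up to floating-point error:

```python
import math


def general_form(p_a_hat, p_b_hat, lam, z_alpha, z_beta):
    """First line of the chain: bound on n_B in terms of p̂_A, p̂_B and lambda."""
    s = p_a_hat + p_b_hat
    num = (z_alpha * math.sqrt(s * (1 - s / 2))
           + z_beta * math.sqrt(p_a_hat * (1 - p_a_hat) + p_b_hat * (1 - p_b_hat)))
    return num ** 2 / lam ** 2


def mde_form(p_a_hat, mde, z_alpha, z_beta):
    """Last line of the chain: the simplified bound in terms of p̂_A and MDE."""
    num = (z_alpha * math.sqrt((2 + mde) * (2 - 2 * p_a_hat - p_a_hat * mde) / 2)
           + z_beta * math.sqrt(1 - p_a_hat + (1 + mde) * (1 - p_a_hat - p_a_hat * mde)))
    return num ** 2 / (mde ** 2 * p_a_hat)


Z_ALPHA, Z_BETA = 1.6449, 0.8416  # rounded z_{0.05} and z_{0.20}
for p_a_hat, mde in [(0.08, 0.25), (0.02, 0.10), (0.30, 0.05)]:
    # Substitute p̂_B = (1 + MDE) p̂_A and lambda = -MDE * p̂_A (p_A estimated by p̂_A)
    lhs = general_form(p_a_hat, (1 + mde) * p_a_hat, -mde * p_a_hat, Z_ALPHA, Z_BETA)
    rhs = mde_form(p_a_hat, mde, Z_ALPHA, Z_BETA)
    print(f"p_A={p_a_hat}, MDE={mde}: {lhs:.6f} vs {rhs:.6f}")
```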

Thus the minimum sample size for variation B, given a control conversion rate \hat{p}_A, a minimum detectable effect MDE, a confidence level 1-\alpha and a power of at least 1-\beta, is:

\Bigg\lceil\frac{\bigg(z_{\alpha}\sqrt{\frac{(2+MDE)(2-2\hat{p}_A-\hat{p}_A MDE)}{2}}+z_{\beta}\sqrt{1-\hat{p}_A+(1+MDE)(1-\hat{p}_A-\hat{p}_A MDE)}\bigg)^2}{MDE^2 \hat{p}_A}\Bigg\rceil

Worked Example

Suppose that the conversion rate associated with the marketing strategy already in place is 8%, that is, \hat{p}_A is 8%. Suppose that we want a test with a confidence level of 95% and a power of at least 80% whenever the relative increase from the conversion rate of strategy A to the conversion rate of strategy B is 25% or more. Thus \alpha=0.05, \beta=0.20 and MDE is 25%, and the standard normal critical values are z_{0.05}\approx 1.6449 and z_{0.20}\approx 0.8416. The required minimum sample size is:

    \begin{equation*} \begin{split} &\Bigg\lceil\frac{\bigg(z_{0.05}\sqrt{\frac{(2+0.25)(2-2(0.08)-0.08\times 0.25)}{2}}+z_{0.20}\sqrt{1-0.08+(1+0.25)(1-0.08-0.08\times 0.25)}\bigg)^2}{0.25^2 \times 0.08}\Bigg\rceil\\ =&\Bigg\lceil\frac{\bigg(1.6449\sqrt{2.0475}+0.8416\sqrt{2.045}\bigg)^2}{0.005}\Bigg\rceil\\ =&\lceil 2530.7\ldots\rceil\\ =&2531 \end{split} \end{equation*}
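A Monte Carlo simulation gives an independent check on this number. The sketch below is an illustration under stated assumptions, not the calculator's implementation: it simulates 1000 experiments with 2531 visitors per group using Python's `random` module, applies the one-sided pooled z-test (reject H_0 when Z<-z_{\alpha}), and estimates the rejection rate when both conversion rates are 8% (the Type 1 error rate) and when B converts at 10% (the power). The function names, seeds and number of trials are arbitrary choices:

```python
import math
import random
from statistics import NormalDist


def pooled_z_test(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """One-sided pooled test of H_0: p_A - p_B = 0 against H_1: p_A - p_B < 0.

    Returns True when H_0 is rejected, i.e. when Z < -z_alpha.
    """
    pooled = (conv_a + conv_b) / (n_a + n_b)  # pooled proportion P
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_a / n_a - conv_b / n_b) / se
    return z < -NormalDist().inv_cdf(1 - alpha)


def rejection_rate(p_a, p_b, n, trials, seed):
    """Fraction of simulated experiments (n visitors per group) that reject H_0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        conv_a = sum(rng.random() < p_a for _ in range(n))
        conv_b = sum(rng.random() < p_b for _ in range(n))
        rejections += pooled_z_test(conv_a, n, conv_b, n)
    return rejections / trials


n = 2531
type1 = rejection_rate(0.08, 0.08, n, trials=1000, seed=1)      # H_0 true
power_est = rejection_rate(0.08, 0.10, n, trials=1000, seed=2)  # p_B = 1.25 * p_A
print(f"estimated Type 1 error rate: {type1:.3f} (target 0.05)")
print(f"estimated power:             {power_est:.3f} (target at least 0.80)")
```

With only 1000 simulated experiments per scenario, each estimate carries sampling noise of about a percentage point, so values near 0.05 and 0.80 are what to look for rather than exact matches.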