What is Estimated Unique Visitors in A/B testing?
Calculating unique visitor numbers can be resource-intensive, especially over longer periods. Most metrics can be computed by summing daily totals, but unique visitors cannot: a visitor who returns on several days would be counted once per day. As a result, archiving reports for large A/B tests can be time-consuming, sometimes taking over 12 hours.
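To make the double-counting problem concrete, here is a small Python sketch. The dates and visitor IDs are made up for illustration and do not come from Matomo.

```python
# Hypothetical visitor IDs seen on each day of a three-day test.
daily_visitors = {
    "2024-01-01": {"alice", "bob", "carol"},
    "2024-01-02": {"alice", "dave"},
    "2024-01-03": {"alice", "bob", "erin"},
}

# Summing the daily unique counts counts a returning visitor once per day.
sum_of_daily_uniques = sum(len(ids) for ids in daily_visitors.values())

# The true figure requires de-duplicating across the whole period.
true_uniques = len(set().union(*daily_visitors.values()))

print(sum_of_daily_uniques)  # 8
print(true_uniques)          # 5
```

The exact count needs every visitor ID for the whole period to be compared at once, which is what becomes expensive on large tests.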
To address this, we now use estimated unique visitors for large A/B tests. The estimate is produced using a technique called HyperLogLog, which accurately estimates the number of unique elements in large datasets while maintaining performance.
Here’s how it works:
- Probabilistic Data Structure: HyperLogLog is a probabilistic data structure designed to estimate the cardinality (the number of distinct elements) of a multiset (a collection that allows multiple occurrences of the same element) while using far less memory than storing each element individually.
- Estimation Accuracy: HyperLogLog provides an estimate rather than an exact count, but it lets you set a target accuracy level. Higher accuracy can slow down archiving, so a 98% accuracy level is commonly used.
- Implementation in Matomo: To balance accuracy with efficiency, Matomo is configured to use HyperLogLog with a 99% target accuracy level. Your reports are therefore not exact, but they are highly accurate and available quickly.
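The points above can be illustrated with a minimal Python sketch of the textbook HyperLogLog algorithm. This is not Matomo's actual code: the class name, the `p` parameter, the SHA-1 hash, and the visitor IDs are all assumptions chosen for illustration. With m = 2^p registers, the typical relative error of HyperLogLog is about 1.04/sqrt(m), so `p=14` gives roughly 0.8%. How Matomo maps its target accuracy level to a register count is not stated here.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch (illustrative only)."""

    def __init__(self, p=14):
        self.p = p                      # number of bits used to pick a register
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # Derive a 64-bit hash from the item.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                   # first p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Rank: 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        # Harmonic mean of 2^register across all registers.
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            estimate = self.m * math.log(self.m / zeros)
        return round(estimate)


# Every visit is fed into the sketch (no sampling); repeat visits by the
# same visitor hash to the same value and do not inflate the estimate.
hll = HyperLogLog(p=14)
for day in range(3):
    for visitor in range(50_000):
        hll.add(f"visitor-{visitor}")

estimate = hll.count()
relative_error = abs(estimate - 50_000) / 50_000
print(estimate, f"{relative_error:.2%}")
```

The sketch stores only 16,384 small integers regardless of how many visits it processes, which is where the memory and speed savings come from. Raising `p` lowers the expected error at the cost of more registers to maintain.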
In summary, using HyperLogLog in Matomo to estimate unique visitors helps mitigate the performance cost of long-running, high-traffic A/B tests. The Matomo implementation balances accuracy with efficiency, making it a suitable choice for web analysts and marketeers who work with large datasets and need timely, data-driven insights.
Note: The estimation processes the entire data set, using an algorithm designed to deliver results faster than a traditional COUNT(DISTINCT) SQL query. It does not involve any form of data sampling: every data point is included in the computation, which preserves accuracy while improving performance.