I have the following table showing number of new customers, and number of returning customers:
fact_date NewCustomer ReturningCustomer
01/08/2022 10000 0
02/08/2022 0 9944
03/08/2022 0 9894
04/08/2022 0 9842
05/08/2022 0 9803
06/08/2022 0 9748
07/08/2022 0 9707
08/08/2022 0 9654
I am trying to create a difference column to calculate churn.
select *,
lag(ReturningCustomer,1) over (order by fact_date) as PrevDay,
ReturningCustomer - lag(ReturningCustomer,1) over (order by fact_date) as Diff
from table
order by fact_date
Which gives me:
fact_date NewCustomer ReturningCustomer PrevDay Diff
01/08/2022 10000 0 NULL NULL
02/08/2022 0 9944 0 9944
03/08/2022 0 9894 9944 -50
04/08/2022 0 9842 9894 -52
05/08/2022 0 9803 9842 -39
06/08/2022 0 9748 9803 -55
07/08/2022 0 9707 9748 -41
08/08/2022 0 9654 9707 -53
However you can see the second date shows a "diff" of 9,944. But really it should be -56. Because the first day had 1000 customers and on the second day we lost 56 customers.
How can I make sure the Diff value is 56?
I have tried playing with case statements, default values for the lag etc. but it’s a complete mess.
>Solution :
To calculate the churn correctly, you need to include the NewCustomer column in the calculation. Get the sum of NewCustomer and ReturningCustomer for the previous day, and then subtracts it from the ReturningCustomer count for the current day.
SELECT
*,
LAG(ReturningCustomer + NewCustomer, 1) OVER (ORDER BY fact_date) as PrevDay,
ReturningCustomer - LAG(ReturningCustomer + NewCustomer, 1) OVER (ORDER BY fact_date) as Diff
FROM Customer_data
ORDER BY fact_date