Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Avoiding first zero in SQL Lag() and Diff calculation

I have the following table showing number of new customers, and number of returning customers:


fact_date   NewCustomer ReturningCustomer
01/08/2022  10000       0
02/08/2022  0           9944
03/08/2022  0           9894
04/08/2022  0           9842
05/08/2022  0           9803
06/08/2022  0           9748
07/08/2022  0           9707
08/08/2022  0           9654

I am trying to create a difference column to calculate churn.

select *,
lag(ReturningCustomer,1) over (order by fact_date) as PrevDay,
ReturningCustomer - lag(ReturningCustomer,1) over (order by fact_date) as Diff 
from table
order by fact_date

Which gives me:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

fact_date   NewCustomer ReturningCustomer   PrevDay Diff
01/08/2022  10000       0                   NULL    NULL
02/08/2022  0           9944                0       9944
03/08/2022  0           9894                9944    -50
04/08/2022  0           9842                9894    -52
05/08/2022  0           9803                9842    -39
06/08/2022  0           9748                9803    -55
07/08/2022  0           9707                9748    -41
08/08/2022  0           9654                9707    -53

However you can see the second date shows a "diff" of 9,944. But really it should be -56. Because the first day had 1000 customers and on the second day we lost 56 customers.

How can I make sure the Diff value is 56?

I have tried playing with case statements, default values for the lag etc. but it’s a complete mess.

>Solution :

To calculate the churn correctly, you need to include the NewCustomer column in the calculation. Get the sum of NewCustomer and ReturningCustomer for the previous day, and then subtracts it from the ReturningCustomer count for the current day.

SELECT 
  *,
  LAG(ReturningCustomer + NewCustomer, 1) OVER (ORDER BY fact_date) as PrevDay,
  ReturningCustomer - LAG(ReturningCustomer + NewCustomer, 1) OVER (ORDER BY fact_date) as Diff 
FROM Customer_data
ORDER BY fact_date
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading