THE ISSUE
I need to get all defaults per day of a machine which logs are stored in a MySQL table every 5 seconds like this :
| time | def1 | def2 | def3 | def4 | def5 |
|---|---|---|---|---|---|
| 2022-09-01 12:00:00 | 0 | 0 | 0 | 0 | 0 |
| 2022-09-01 12:00:05 | 0 | 1 | 0 | 0 | 0 |
| 2022-09-01 12:00:10 | 0 | 1 | 0 | 0 | 0 |
| 2022-09-01 12:00:15 | 0 | 0 | 0 | 1 | 0 |
| 2022-09-01 12:00:20 | 1 | 0 | 0 | 1 | 0 |
| 2022-09-01 12:00:25 | 1 | 0 | 0 | 0 | 0 |
| 2022-09-01 12:00:30 | 0 | 1 | 0 | 0 | 0 |
| 2022-09-01 12:00:35 | 0 | 0 | 0 | 1 | 0 |
| 2022-09-01 12:00:40 | 0 | 0 | 0 | 0 | 0 |
| 2022-09-01 12:00:45 | 0 | 0 | 0 | 0 | 0 |
| 2022-09-02 12:00:00 | 0 | 0 | 0 | 0 | 0 |
| 2022-09-02 12:00:05 | 0 | 0 | 1 | 0 | 0 |
| 2022-09-02 12:00:10 | 1 | 0 | 0 | 0 | 0 |
| 2022-09-02 12:00:15 | 1 | 0 | 1 | 0 | 1 |
| 2022-09-02 12:00:20 | 1 | 0 | 1 | 0 | 1 |
| 2022-09-02 12:00:25 | 1 | 0 | 0 | 0 | 1 |
| 2022-09-02 12:00:30 | 1 | 1 | 0 | 0 | 1 |
| 2022-09-02 12:00:35 | 0 | 1 | 1 | 0 | 0 |
| 2022-09-02 12:00:40 | 0 | 0 | 1 | 0 | 0 |
| 2022-09-02 12:00:45 | 0 | 0 | 0 | 0 | 0 |
So the query need to count rising edge only for each column, grouped by date on time column.
Expected result :
| date | def1 | def2 | def3 | def4 | def5 |
|---|---|---|---|---|---|
| 2022-09-01 | 1 | 2 | 0 | 2 | 0 |
| 2022-09-02 | 1 | 1 | 3 | 0 | 1 |
I’ve got now about 30 columns and 3’800’000 records for 6 months of logs. Our customer wants to keep at least 6 months, ideally 2 years of logs.
So, for me the difficulties are to be efficient in storage but also to retrieve data as quick as possible.
FINDING SOLUTION
Based on the following logical (n = actual record, m = previous record) :
m 0 0 1 1
n 0 1 0 1
_______
x 0 1 0 0
We will get only rising edge (not falling edge) so change between 0 to 1 and ignore all other values :
Def3 on 2022-09-02 0 1 0 1 1 0 0 1 1 0
occurred 3 times ↑ ↑ ↑
finding solution :
n - m 0 1 -1 1 0 -1 0 1 0 -1
ABS(n - m) 0 1 1 1 0 1 0 1 0 1
n XOR m 0 1 1 1 0 1 0 1 0 1
So, how to get only rising edges !? In other words, how to get only 1 values as true and -1/0 as false
>Solution :
WITH
cte AS (
SELECT DATE(dt) dt, def1, def2, def3, def4, def5,
ROW_NUMBER() OVER (PARTITION BY DATE(dt) ORDER BY dt) rn
FROM test
)
SELECT dt,
SUM(t2.def1 > t1.def1) def1,
SUM(t2.def2 > t1.def2) def2,
SUM(t2.def3 > t1.def3) def3,
SUM(t2.def4 > t1.def4) def4,
SUM(t2.def5 > t1.def5) def5
FROM cte t1
JOIN cte t2 USING (dt)
WHERE t2.rn = t1.rn + 1
GROUP BY 1;
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=c8402e9ea53a0aa580b18868c913465a