Is it a good practise to store `IsValid` column in a table, for validation?

byMR

November 12, 2021

Goal

To record room temperatures in MySQL database.

My 2 scenarios to record the values

Scenario 1

Schema (MySQL v5.7)

CREATE TABLE Room (
  Id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
  Description VARCHAR(50)
);

INSERT INTO Room (Description) VALUES ('Room 1'), ('Room 2');

CREATE TABLE RoomSpecification (
  Id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
  RoomId INT NOT NULL,
  MinVal INT,
  MaxVal INT
);

INSERT INTO RoomSpecification (RoomId, MinVal, MaxVal) VALUES
(1, 10, 20),
(2, 20, 30);


CREATE TABLE Records (
  Id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
  RoomId INT,
  RecordedVal INT,
  IsValid BOOL
);

INSERT INTO Records (RoomId, RecordedVal, IsValid) VALUES
(1 , 20, 1), -- Valid
(2 , 10, 0); -- Not-Valid

Query #1

SELECT * FROM Records;

Id	RoomId	RecordedVal	IsValid
1	1	20	1
2	2	10	0

View on DB Fiddle

Summary

This scenario includes a column IsValid. By using data type BOOL, I can recognise that this value is not valid by doing a simple query.

Scenario 2

Schema (MySQL v5.7)

CREATE TABLE Room (
  Id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
  Description VARCHAR(50)
);

INSERT INTO Room (Description) VALUES ('Room 1'), ('Room 2');

CREATE TABLE RoomSpecification (
  Id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
  RoomId INT NOT NULL,
  MinVal INT,
  MaxVal INT
);

INSERT INTO RoomSpecification (RoomId, MinVal, MaxVal) VALUES
(1, 10, 20),
(2, 20, 30);


CREATE TABLE Records (
  Id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
  RoomId INT,
  RecordedVal INT
);

INSERT INTO Records (RoomId, RecordedVal) VALUES
(1 , 20), -- Valid
(2 , 10); -- Not-Valid

Query #1

SELECT
    r.*,
    CASE
        WHEN r.RecordedVal BETWEEN MinVal AND MaxVal THEN 1 ELSE 0
    END AS IsValid
FROM Records r
JOIN RoomSpecification rs ON rs.RoomId = r.Id;

Id	RoomId	RecordedVal	IsValid
1	1	20	1
2	2	10	0

View on DB Fiddle

Summary

On the other hand, I have removed the column IsValid and used a query to do the Validation.

Question

Between the two scenarios, which would be the correct/good practise scenario to go for? Unless someone else has other suggestions?

>Solution :

Your Scenario 1 is an example of denormalization. This is okay if you really need to make it more optimized to check for a record’s validity without doing the join. That is, if you can’t tolerate the overhead both in typing the code and executing the code of doing the join to check if each record is valid, as you show in your Scenario 2.

Denormalization is okay. We do this frequently in database design, in cases where we want a short-cut to avoid overhead or extra code.

But keep in mind that denormalization comes with its own cost. The cost is that you have a risk the denormalized data itself will end up being wrong, because it represents a fact that isn’t true.

For example, in your Scenario 1, what if someone subsequently updates the data?

UPDATE Records SET RecordedVal = 25 WHERE Id = 2;

The value they changed is now between the MinVal, MaxVal range for the references room. So this update should have also set IsValid=1, but it didn’t. The validity column is now wrong. This is called a data anomaly.

Another data anomaly could occur if someone updates the MinValue, MaxValue of a row in the RoomSpecification table. That single update could cause anomalies on thousands of rows or even millions of rows in the Records table.

This is a common risk when you use denormalization. You take on extra work to prevent or at least detect data anomalies. You may have to write more code to ensure updates are forced to update the IsValid column. Maybe you write a trigger to change IsValid if the table is updated. Maybe you write a stored procedure to do the update, and limit update privilege so users must use the procedure. Or maybe you don’t prevent anomalies, but you write a scheduled job to search for anomalies that have crept in, and report them.

Whereas if you use Scenario 2, you don’t get the convenience of the denormalized design, but you do get the assurance against data anomalies.

Which is best? That’s up to you, and the answer could be different for each project, or even for different cases within the same project.

You must use your own judgment based on your knowledge of your own project. This is part of being a professional software engineer.