I’ve been playing with .NET 8 RC2 and I have not come across this before in EF Core.
I have created a dynamic query and part of the query is this:
_ordersQuery = _ordersQuery.Where(x => !stationIds.Contains(x.StationNo));
stationIds is of type List<short>.
When I do .ToQueryString() on the query I notice this:
DECLARE @__stationIds_7 nvarchar(4000) = N'[117,116,98]';
then in the actual WHERE part of the SQL query I see this
NOT IN (
SELECT [s0].[value]
FROM OPENJSON(@__stationIds_7) WITH ([value] smallint '$') AS [s0]
Why is it being treated as JSON?
>Solution :
That’s a smart idea! The type is definitely not wrong. Instead of generating a hard-coded IN, the list is passed as a JSON array, deserialized with OPENJSON (a very fast method) and used with NOT IN.
Until now, list.Contains or !list.Contains were transformed to hard-coded SQL clauses IN (@id1, @id2, ....) or NOT IN (...). The number of items had to be known in advance. Now, while the server can take advantage of the statistics of the IN (...) clause to calculate how many matches there may be, a NOT IN (..) is almost useless. If there are 1M IDs and 100 items in NOT IN, the server will still have to scan the entire table for the other 9999000 matching items.
Another concern is that a varying number of parameters in IN (...) would result in multiple execution plans, not only filling the cache with single-use plans but evicting other useful plans
This was announced in EF Core 8 Preview 4, in the Translating LINQ Contains with a parameter collection section and addressed a pretty serious performance issue, especially for long-lived web sites (emphasis mine):
But crucially, the negative performance impact of constantly varying SQLs goes beyond this particular query. SQL Server (and Npgsql) can only cache a certain number of SQLs; at some point, they have to get rid of old entries to avoid using too much memory. If you frequently use Contains with a variable array, each individual invocation causes valuable cache entries to be taken at the database, for SQLs that will most probably never be used (since they have the specific array values baked in). That means you’re also evicting cache entries for other, important SQLs that will need to be used, and requiring them to be re-planned again and again.
As the docs say, this was the second highest voted performance issue in the EF Core repo.