I have a .netcore 6 BackGroundService which pushes data from on-premise to a 3rd party API.
The 3rd party API takes about 500 milliseconds to process the API call.
The problem is that I have about 1,000,000 rows of data to push to this API one at a time. At 1/2 second per row, it’s going to take about 6 days to sync up.
So, I would like to try to spawn multiple threads in order to hit the API simultaneously with 10 threads.
var startTime = DateTimeOffset.Now;
var batchSize = _config.GetValue<int>("BatchSize");
using (var scope = _serviceScopeFactory.CreateScope())
{
var context = scope.ServiceProvider.GetRequiredService<PlankContext>();
var dncEntries = await context.PlankQueueDnc.Where(x => x.ToProcessFlag == true).Take(batchSize).ToListAsync();
foreach (var plankQueueDnc in dncEntries)
{
var response = await _plankConnector.InsertDncAsync(plankQueueDnc);
context.PlankQueueDnc.Update(plankQueueDnc);
}
await context.SaveChangesAsync();
}
Here is the code. As you can see, it gets a batch of 100 records and then processes them one by one. Is there a way to modify this so this line is not awaited? I don’t quite understand how it would work if it were not awaited. Would it create a thread for each execution in the loop?
var response = await _plankConnector.InsertDncAsync(plankQueueDnc);
I am clearly not up to speed on threads as well as the esteemed @StephanCleary.
So suggestions would be appreciated.
>Solution :
In .NET 6 you can use Parallel.ForEachAsync to execute operations concurrently, using either all available cores or a limited Degree-Of-Parallelism.
The following code loads all records, executes the posts concurrently, then updates the records :
using (var scope = _serviceScopeFactory.CreateScope())
{
var context = scope.ServiceProvider.GetRequiredService<PlankContext>();
var dncEntries = await context.PlankQueueDnc
.Where(x => x.ToProcessFlag == true)
.Take(batchSize)
.ToListAsync();
await Parallel.ForEachAsync(dncEntries,async plankQueueDnc=>
{
var response = await _plankConnector.InsertDncAsync(plankQueueDnc);
plankQueueDnc.Whatever=response.Something;
};
await context.SaveChangesAsync();
}
There’s no reason to call Update as a DbContext tracks the objects it loaded and knows which ones were modified. SaveChangesAsync will persist all changes in a single transaction