I’m working on a project that requires chewing through a lot of data. While looking for ways to make the code run faster – I hate waiting – I decided to throw the new Task Parallel Library at the problem to see what sort of improvements I could gain. Below is the concept and the results I saw (remember: your mileage may vary).
The Code
Here is the gist of the original code:
var lines = File.ReadAllLines(path).ToList();
lines.ForEach(x => parsedResults.Add(lineParser.ParseLine(x)));
Quite simple for sure, but running through 10,000 lines took about 15,000 milliseconds (15 seconds), or roughly 1.5 ms per line. To me that felt slow, especially when you start talking about millions of rows (a million rows at 1.5 ms each would take 25 minutes).
Task Parallel Library Version (TPL)
There’s a very simple Parallel.ForEach overload that’s quite handy. However, I need to merge every parsed line into a master collection for later processing, so I need the overload of Parallel.ForEach that supports thread-local variables: each worker fills its own list, and the lists are merged into the master collection as the workers finish. Here is that code:
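var mergeLock = new object();
Parallel.ForEach(lines,
    // Runs once per worker task: gives each one its own private list.
    () => new List<LineResult>(),
    (current, loop, threadLocalList) =>
    {
        threadLocalList.Add(lineParser.ParseLine(current));
        return threadLocalList;
    },
    // The final delegates can run concurrently. Assuming parsedResults is a
    // List<T>, which is not thread-safe, the merge needs a lock.
    threadLocalList => { lock (mergeLock) { parsedResults.AddRange(threadLocalList); } }
);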
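To try the pattern in isolation, the following is a minimal, self-contained sketch rather than the project's actual code. The LineResult and LineParser types here are hypothetical stand-ins (the real parser is not shown in this post), and the comma-counting "parse" is only a placeholder workload.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for the post's LineResult type.
public sealed class LineResult
{
    public int FieldCount { get; set; }
}

// Hypothetical stand-in for the post's lineParser.
public static class LineParser
{
    // Placeholder logic: counts comma-separated fields.
    public static LineResult ParseLine(string line)
    {
        return new LineResult { FieldCount = line.Split(',').Length };
    }
}

public static class ParallelParseSketch
{
    public static List<LineResult> ParseAll(IEnumerable<string> lines)
    {
        var parsedResults = new List<LineResult>();
        var gate = new object();

        Parallel.ForEach(
            lines,
            // localInit: one private list per worker task.
            () => new List<LineResult>(),
            // body: no lock needed, each task only touches its own list.
            (current, loopState, localList) =>
            {
                localList.Add(LineParser.ParseLine(current));
                return localList;
            },
            // localFinally: can run on several tasks at once, and List<T>
            // is not thread-safe, so the merge is guarded by a lock.
            localList =>
            {
                lock (gate)
                {
                    parsedResults.AddRange(localList);
                }
            });

        return parsedResults;
    }

    public static void Main()
    {
        var lines = Enumerable.Range(0, 10000).Select(i => "a,b," + i).ToList();
        var results = ParseAll(lines);
        Console.WriteLine(results.Count);
    }
}
```

Note that the merged list is not guaranteed to be in the same order as the input lines. If order matters, one option is to carry the line index along with each result and sort afterwards.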
Results
Using the 10,000 line test version and taking an average of several runs, here are the results (in milliseconds):
Before: 15,267 ms
After: 7,546 ms
Speed improvement: about 2x (run time cut by roughly 51%)
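For readers who want to produce comparable numbers on their own machine, here is one way the timing could be done. This is a sketch under assumptions, not the harness used for the figures above: FakeParse is a hypothetical CPU-bound stand-in for the real parser, and five runs per variant is an arbitrary choice.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class TimingSketch
{
    // Hypothetical CPU-bound stand-in for lineParser.ParseLine.
    public static int FakeParse(string line)
    {
        int hash = 17;
        for (int i = 0; i < 20000; i++)
        {
            hash = unchecked(hash * 31 + line[i % line.Length]);
        }
        return hash;
    }

    // Runs the action several times and returns the mean elapsed milliseconds.
    public static double AverageMilliseconds(Action action, int runs)
    {
        long total = 0;
        for (int i = 0; i < runs; i++)
        {
            var watch = Stopwatch.StartNew();
            action();
            watch.Stop();
            total += watch.ElapsedMilliseconds;
        }
        return (double)total / runs;
    }

    public static void Main()
    {
        var lines = Enumerable.Range(0, 10000).Select(i => "row-" + i).ToList();

        // Sequential version, as in the original code.
        double before = AverageMilliseconds(() =>
        {
            var results = new List<int>();
            lines.ForEach(x => results.Add(FakeParse(x)));
        }, 5);

        // Parallel version with thread-local lists and a locked merge.
        double after = AverageMilliseconds(() =>
        {
            var results = new List<int>();
            var gate = new object();
            Parallel.ForEach(lines,
                () => new List<int>(),
                (current, loopState, local) => { local.Add(FakeParse(current)); return local; },
                local => { lock (gate) { results.AddRange(local); } });
        }, 5);

        Console.WriteLine("Before: {0:F0} ms, After: {1:F0} ms, Speedup: {2:F2}x",
            before, after, before / after);
    }
}
```

The speedup depends heavily on core count and on how CPU-bound the per-line work is, so the ratio will differ from machine to machine.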
Conclusion
Overall this isn’t a very exciting example, but it does show the potential impact the Task Parallel Library (TPL) can have on your runtime performance. I was pleased with the library: I had a very simple case (looping) and there was very little to do. I simply added the “using” directive and that was about it, with no configuration or extra dependencies. If you’ve played with the Task Parallel Library, what real-world uses have you found? What benefits have you seen from it?
Posted 11-19-2010 12:47 AM by Tim Barcz