Article 4AARD CodeSOD: For Each Parallel

CodeSOD: For Each Parallel

by
Remy Porter
from The Daily WTF on (#4AARD)

Parallel programming is hard. For all the advancements and tweaks we've made to our abstractions, for all the extra cores we've shoved into every CPU, deep down, software still carries the bias of the old uni-tasking model.

Aleksei P works on a software package that is heavily parallel. As such, when interviewing, he talks to candidates about their experience with .NET's Task objects and the async/await keywords.

One candidate practically exploded with enthusiasm when asked. "I've just finshed a pretty large text processing project that reads a text file in parallel!" They whipped out a laptop, pulled up the code, and proudly showed it to Aleksei (and gave Aleksei a link to their repo for bonus points).

public async Task<IDictionary<string, int>> ParseTextAsync(string filePath, ParamSortDic dic){ if (string.IsNullOrEmpty(filePath)) { throw new ArgumentNullException(nameof(filePath)); } if (Path.GetExtension(filePath)!=".txt") { throw new ArgumentException("Invalid filetype"); } Dictionary<ParamSortDic, Func<IDictionary<string, int>, IOrderedEnumerable<KeyValuePair<string, int>>>> sorting = new Dictionary<ParamSortDic, Func<IDictionary<string, int>, IOrderedEnumerable<KeyValuePair<string, int>>>> { {ParamSortDic.KeyAsc,word=> word.OrderBy(ws=>ws.Key)}, {ParamSortDic.KeyDesc,word=>word.OrderByDescending(ws=>ws.Key)}, {ParamSortDic.ValueAsc,word=>word.OrderBy(ws=>ws.Value)}, {ParamSortDic.ValueDesc,word=>word.OrderByDescending(ws=>ws.Value)}, }; var wordCount = new Dictionary<string, int>(); object lockObject = new object(); using (var fileStream = File.Open(filePath, FileMode.Open, FileAccess.Read)) { using (var streamReader = new StreamReader(fileStream)) { string line; while ((line = await streamReader.ReadLineAsync()) != null) { var lineModifyLower = Regex .Replace(line, "[^D-ND-N \\dictionary]", "") .ToLower(); var words = lineModifyLower .Split(Separators, StringSplitOptions.RemoveEmptyEntries) .Where(ws => ws.Length >= 4); Parallel.ForEach(words, word => { lock (lockObject) { if (wordCount.ContainsKey(word)) { wordCount[word] = wordCount[word] + 1; } else { wordCount.Add(word, 1); } } }); } } } return sorting[dic](wordCount).ToDictionary(k => k.Key, k => k.Value);}

There's so much to dislike about this code. Most of it is little stuff- it's painfully nested, I don't like methods which process file data also being the methods which manage file handles. Generics which look like this: Dictionary<ParamSortDic, Func<IDictionary<string, int>, IOrderedEnumerable<KeyValuePair<string, int>>>> are downright offensive.

But all of that's little stuff, in the broader context here. You'll note that the file is read using ReadLineAsync, which is asynchronous. Of course, we await the results of that method in the same line we call it. await is a blocking operation, so" not asynchronous at all.

Of course, that's a trend with this block of code. Note that the words on each line are processed in a Parallel.ForEach. And the body of that ForEach starts by creating a lock, guaranteeing that only one thread is ever going to enter that block at the same time.

Suffice to say, Aleksei didn't recommend that candidate.

raygun50.png [Advertisement] Forget logs. Next time you're struggling to replicate error, crash and performance issues in your apps - Think Raygun! Installs in minutes. Learn more. TheDailyWtf?d=yIl2AUoC8zAx9O-gjoGs2A
External Content
Source RSS or Atom Feed
Feed Location http://syndication.thedailywtf.com/TheDailyWtf
Feed Title The Daily WTF
Feed Link http://thedailywtf.com/
Reply 0 comments