Archive: Threading Do’s and Dont’s

ThreadingFor work recently I was asked to write a little document on some threading tips, and while I was about it, I noticed this thread on  StackOverflow asking for the same thing. I don’t pretend that this is comprehensive or even necessarily 100% correct. However, it’s a start to try and apply some simple guidelines to threading. Any improvements, suggestions, additions, please let me know and I’ll make the necessary changes. Since I see this as a “living” page I won’t clutter it with edit marks as it changes. Look in the comments for change history.

DO reconsider your options

Concurrency is very tricky and difficult to get right. It often leads to subtle and difficult to debug programming errors, and far too commonly does not result in significant speed improvements. Make absolutely certain that there are not other alternatives.

DO Use lock() { … }

In almost all cases this is faster and more efficient than more complex low lock schemes involving Interlocked or ReaderWriterLocks.Threading Safe

DO Ensure that all static methods are thread-safe

It is an accepted pattern that static methods and properties are thread safe, and non-statics are not. If you violate this in either direction, document it, and be prepared to explain why.

DO NOT use the object being locked as the lock

Always create a new object to control the lock e.g.

private object lockCollection = new object();
private List<string> collection = new List<string>();

This ensures that even if you pass the collection around and someone else locks on it, that this will not affect your locking code and will not result in deadlocks.

DO place locks anywhere when multithreaded execution order can be significant

Especially if it seems that your code does not actually need a lock. It is impossible to determine in what order code will be executed if there are no direct dependencies. Consider the following code:

int x = 0, y = 0;

// Thread 1
x = 10;                 // Line a
y++;                    // Line b

// Thread 2
y = 4;                  // Line c
x++;                    // Line d

What will the possible outputs be after both threads complete?

What perhaps is not clear is that there is no requirement that b execute after a, or that d execute after c. The reason is processor reordering, where the CPU sees that the “local” execution is unaffected by order and takes it upon itself to run in any order that makes sense from a performance perspective. Thus, a result of x = 10 and y = 4 is possible. Just because your code is in a certain sequence does not mean that the sequence will be honored by the CLR or the CPU. Wrapping it in a lock will ensure correct ordering. Alternatively you can use Thread.MemoryBarrier to separate a from b, and c from d.

SpeedDO NOT assume that multithreading will automatically create a speedup

Concurrency can lead to enhancements in performance, but not in all cases. A good understanding of execution times should be acquired before looking at concurrency. Note, not estimated execution times,actual execution times. If you have a task that can be broken into two concurrent portions, it does not help if one of the portions only takes 5% of the total execution time. The overhead in threading, context switches and synchronization will almost certainly exceed the expected concurrency gains. Ideally, the tasks should be fairly similar in their execution times for the best improvements.

DO try and use well-understood patterns like Producer-Consumer

Many concurrency issues can be simplified to one of the Producer-Consumer options (single Producer/multi Consumer, multi Producer/single Consumer, or multi Producer/single Consumer).There are a great many articles, code samples, and libraries (e.g. the Task Parallel Library) around this pattern. Make use of them to make your life easier. That way it’s much less likely that you’ll be bitten by some obscure and difficult to debug problem.

DO NOT have long-running tasks in QueueUserWorkItem and other thread pools

Thread pools are designed for short running tasks. Long-running tasks in such pools cause initial starvation while the pool determines that the relevant thread is not going to be usable. This can cause significant performance hits, especially early on in service startup. Long running tasks should have their own thread.

DO NOT spin up threads for short-running tasks

Threads are expensive resources, and should not be created and destroyed without good reason. If you have a quick task, rather use the ThreadPool (or the Task Parallel Library) to execute it.

DO use Begin… and End…

Many classes, such as the IO classes have methods such as these, e.g. File.BeginRead and File.EndRead. These are often much more efficient than using the equivalent synchronous methods. File.Read effectively removes a thread from your app for the duration of the call. File.BeginRead makes use of IO completion ports and does not keep a thread occupied. The callback is fired when the OS (via the device driver) notifies .NET that the operation has completed. This effectively means you are not using a thread for the read operation at all, just a tiny bit in the beginning, and to invoke the callback on completion.

DO call End…

Many async operations create expensive resources which are only disposed when the relevant End… method is called. Always ensure you call End, otherwise you can leak resources. If you’re really, really lucky these will be .NET resources and will eventually be reclaimed. In many cases they will not, which will lead to resource leaks.

DO NOT create threads in IIS or SQL CLR

IIS  and SQL Server are heavily controlled environments with many many threads. They are finely tuned to make effective use of their threads, and adding new threads into the mix can make them run less effectively. Always try to use ThreadPool threads when running in IIS. In fact, in IIS, it is sometimes better to schedule mid to long-running tasks in the ThreadPool than to spin up a new thread, as IIS monitors it and adjusts accordingly. In SQL CLR, creating new threads is usually not a very good idea at all.

DO use concurrency libraries and controls

BackgroundWorker and the Task Parallel Library are brilliant examples of making threading less difficult. Make use of such tools and libraries extensively when you can, as they will help shield you from some of the more common concurrency issues.

DO use InvokeRequired

If you’re writing multi-threaded Windows Forms applications, please, please, please use BackgroundWorker, and only update the UI in the RunWorkerCompleted and ProgressChanged events. If that is not an option for whatever reason, use the following pattern in the method called by the threaded code:

private void OnEventOccurred()
{
    if (!InvokeRequired)
    {
        // Do work here
    }
    else
        Invoke(new Action(OnEventOccurred), null);
}

Obviously if your method takes parameters you would use a delegate other than Action and would pass the parameters in when you call Invoke.

DO buy Joe’s Book

One of the best concurrency books around.  It’s a bit of a hard slog, mainly due to the depth and breadth of the content, but well worth it if you’re interested in concurrent programming.

This article has been recovered from an archive from my old blog site. Slight changes have been made.

Archive: Faking Performance

Performance

What’s something everybody wants for their application, but very few people have the time to deliver? Performance. Let’s face it, in most software projects, performance requirements are relegated to the very end of the project, when every knows they won’t have the time to address them. In one sense this is a good thing, as one of my biggest bugbears is premature optimisation.

Premature optimization is the root of all evil.

- Hoare’s Dictum, Sir Tony Hoare

When should I optimise?

Now keep in mind that by denigrating premature optimisation I am not saying that you should never think of performance when writing an application. Of course you should, especially when looking at your design. Sir Tony’s’ quote is all too often taken out of context as Randall Hyde argues. Good performance is most effectively obtained by thinking very carefully about your design up-front.

However, when you have the choice between an easy to write, slow algorithm and a difficult, fast one then chances are that you should probably write the easy one. This is not always true, of course. For example, you may know that the algorithm is being used on a critical path in your application, in which case you definitely should go for speed. Far too few people even keep these trade-offs in mind and go for one extreme or another; optimising code that will not be a bottleneck, or ignoring code that will. I often comment code that I know is slow with a // PERF: comment , so that I can go back to it later to improve it. The nice thing about this approach is that you can move the slow code into your unit tests in order to ensure that the results match your optimised code, since all too often bugs are introduced during optimisation.

The type of application you are writing matters a lot as well. Obviously, if you are developing a software product, you will want much tighter performance requirements from most of your code than if you’re writing a bog-standard enterprise app. The reason? Well, in my experience almost all custom enterprise applications are IO-bound, and spend an awful lot of time waiting for the user or for results from the database. In such an application, your database design and tweaks will likely make far more impact than anything else you may do.

That said, what happens when, at the end of your project you find that your code is too slow to deliver to the customer? Well, apparently in one talk Rico Mariani said that the Ten Commandments of Performance are

1. Measure

2. Measure

3. Measure

4. Measure

5. Measure

6. Measure

7. Measure

8. Measure

9. Measure

10. Measure

Scott Kirkwood has some interesting arguments and counter arguments to premature optimisation:

[…] Now back to premature optimization. I think what they really want to say is that “Unnecessary optimization makes code that is unmanageable, buggy and late” and there’s more:

  • When a program has performance problems the programmer always knows which part of the code is slow…and is always wrong.
  • Only through profiling do you really see where the performance issue is.
  • You can waste a lot of time doing optimization that doesn’t matter.
  • Optimization can often make the code more obscure, and hard to maintain.
  • Spending more time on optimization means you are spending less time on other things (like correctness and testing).

Well that’s the theory. And here’s some of my counter arguments:

  • If a developer really enjoys what he is doing it wont take any “extra” time. In other words, taking time to optimize doesn’t necessarily steal time from testing, more likely it steals time from surfing the web
  • Every time a developer looks at the code for something to optimize, he’s looking at the code! He understands (groks) it better and may fix more bugs.
  • Encouraging developers to leave in code that they know is embarrassingly slow makes them a little less proud of their code, a little less enthusiastic about finding and fixing their bugs.
  • Products have failed because in a review they mention that it took twice as long to load a document than the competitor (even though it was 2 seconds instead of 1)
  • When you put the code in production and it’s too slow, you may be able to fix it by profiling and optimizing, but then again, you may not – you may have to redesign it.

So, crack out your profiler (I’m a big fan of the ANTS Profiler from Red Gate) and measure and find your bottlenecks and optimise them. If you follow my approach with marking code you know to be slow, you might be surprised to find how rarely you are correct in your estimation of what code is a performance bottleneck.

When should I stop optimising?

Obviously, this is a big question to ask. Usually, you will get a few low-hanging fruit, a couple of optimisations that give you large performance benefits. After that, however, it will become more and more difficult to find good high-value optimisations. At this point most people stop optimising and ship the code. However, what if that’s still not good enough? Well, let’s think about faking performance.

John Maeda says

Often times, the perception of waiting less is just as effective as the actual fact of waiting less. For instance, an owner of a Porsche achieves the thrill of directness between translation of a slight tap on the acceleration pedal, to be manifest as an immediate burst of speed. Yet in any normal rush hour situation, a Porsche doesn’t go any faster than a Hyundai. The Porsche owner, however, still derives pleasure from his or her perception that they are getting to work faster in a quantitatively faster machine. The visual and tactile semantics of the Porsche’s cockpit all support the qualitative illusion that the driver is going faster than when he or she is sitting inside a Hyundai.

[…] The premise was when a user was presented with a task that required time for the computer to crunch on something, when a progress barwas shown, the user would perceive that the computer took less time to process versus having been shown no progress bar at all.

So, one of the easiest ways to fake performance is to slap in a BackgroundWorker component, put your expensive code in there, and report progress via a progress bar. Since you are adding not only an extra thread, but also more UI updates there is no doubt whatsoever that your code is less efficient, yet the user will perceive it as being more efficient.

Now, if even that’s not enough, another approach is obviously to offload the processing to another machine. This is even better than the progress bar if the user does not need the results of the calculation right away, since they say what they want to happen and press the start button or whatever and can immediately begin working on something else. By offloading the processing, perhaps to a multiprocessor server, you are gaining a massive improvement in the users perception of the speed of your applications, as well as an improvement in the running time. The cost, obviously, is the work required to implement the handover as well as the hardware costs of the server.

Now, I am not advocating no optimisations at all, but I am trying to get across that sometimes these “faking it” approaches are easier and cheaper than extensive performance tweaking. Needless to say, sometimes, even with massive optimisation, you still need massive offloading capabilities. Just look at SETI@Home as an example.

Conclusion

So, keep performance very much in mind when designing your software, keep performance trade-offs in mind when writing it, keep difficulty and impact of optimisations in mind when profiling, and keep faking it mind when polishing your application.

Update

Jeff Atwood has a nice post on how changes to the File Copy progress bar made users see the copy as less efficient, even when it was in fact more accurate.

This article has been recovered from an archive from my old blog site. Slight changes have been made.