ThreadPool performance you can see

We’ve spent a lot of time touting improvements to the .NET Framework in .NET 4 around threading, including core enhancements to the performance of the runtime itself.  Sometimes data is more powerful than words, however, and it’s useful to be able to see exactly what kind of difference such improvements can make.  To assist with that, here is code for a small sample you can try compiling and running on your own:


using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;

class Program
{
    static void Main(string[] args)
    {
        // Run the benchmark six times, discard the first (warm-up) run,
        // and print the average of the remaining five.
        Console.WriteLine(
            TimeSpan.FromMilliseconds(
                Enumerable.Range(0, 6).Select(_ =>
                {
                    var sw = Stopwatch.StartNew();
                    CreateAndWaitForWorkItems(10000000);
                    return sw.ElapsedMilliseconds;
                }).Skip(1).Average()
            )
        );
    }

    static void CreateAndWaitForWorkItems(int numWorkItems)
    {
        using (ManualResetEvent mre = new ManualResetEvent(false))
        {
            int itemsRemaining = numWorkItems;
            for (int i = 0; i < numWorkItems; i++)
            {
                ThreadPool.QueueUserWorkItem(delegate
                {
                    // The last work item to finish signals the waiting thread.
                    if (Interlocked.Decrement(
                        ref itemsRemaining) == 0) mre.Set();
                });
            }
            mre.WaitOne();
        }
    }
}


The CreateAndWaitForWorkItems method simply launches N work items using ThreadPool.QueueUserWorkItem and then waits for all N to complete: each work item atomically decrements a shared counter, and the one that brings the counter to zero signals the waiting thread.  The Main method then times the invocation of this method with N equal to 10 million, doing so six times, discarding the first run as a warm-up, and averaging the other five.  This microbenchmark is pure overhead, as there’s no actual work being performed in each work item. In fact, we should expect that as we add more cores, the time to complete this operation will increase, as more cores will contend for the data structures employed in both the ThreadPool and in my simple test. The hope is that the work done in .NET 4 decreases that overhead, especially on higher core counts where more and more threads will be contending for those shared data structures.


The following numbers are in no way official benchmarks, but they can give you a sense for how the work that’s been done in .NET 4 really does make a difference. These are the numbers I see when I run this microbenchmark informally on .NET 3.5 and on .NET 4 on two laptops I currently have access to while writing this blog post.  The only change I made to go from .NET 3.5 to .NET 4 was modifying the “Target framework” in the project’s properties in Visual Studio, taking advantage of Visual Studio 2010’s multitargeting support.




















Machine            .NET 3.5         .NET 4          Improvement
A dual-core box    5.03 seconds     2.45 seconds    2.05x
A quad-core box    19.39 seconds    3.42 seconds    5.67x


Some pretty awesome performance improvements simply by upgrading to .NET 4.


Using Parallel Extensions for .NET 4 in ASP.NET apps

ASP.NET applications already get a lot of concurrency for free. The .NET Framework load balances incoming requests among ThreadPool worker threads, striving for optimal use of available CPUs. As long as you minimize blocking in your ASP.NET page code, ASP.NET will process requests concurrently. In most cases, and in particular for Web applications with heavy usage, it is probably not necessary to introduce extra parallelism since adding more work items will only result in competition for CPU time and ultimately reduce request throughput.

Dealing with I/O bound work

If most of the work being done in an ASP.NET request is asynchronous in nature (such as I/O), doing the asynchronous work synchronously can be a huge scalability bottleneck. Solutions based on Asynchronous Programming Model (APM) and Event-based Asynchronous Pattern (EAP) have been recommended to ease this bottleneck. For an in-depth discussion on this refer to Scalable Apps with Asynchronous Programming in ASP.NET and Asynchronous Pages in ASP.NET 2.0. The article Improving ASP.NET Performance also has some good pointers to improving the scalability of your web applications.

New features in the .NET Framework 4 can also be used to make programming asynchronous pages easier. The System.Threading.Tasks.Task class (and the Task<TResult> class that derives from it) can be used to represent asynchronous operations. Both classes implement IAsyncResult, and they provide capabilities for coordinating between multiple asynchronous activities. Since part of ASP.NET’s asynchronous pages support is based on the Asynchronous Programming Model (APM) pattern and IAsyncResult, Task can play a role in easing the implementation of asynchronous pages. In particular, Task is most useful if you want to structure your code with continuations, which helps when multiple stages of asynchronous activity need to happen before the rest of the page continues execution. For more details, refer to Tasks and the Event-based Asynchronous Pattern and Tasks and the APM Pattern.
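As a rough illustration of how Task and the APM pattern can fit together, here is a minimal sketch. The AsyncStages class and its ReadTextAsync, BeginReadText and EndReadText methods are hypothetical names made up for this example. It assumes a single read is enough to fetch the data, which holds for a MemoryStream but not for streams in general. It does not touch ASP.NET itself; it only shows a Begin/End pair, the shape that APM-based consumers expect, built from a Task with a continuation.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only.
public static class AsyncStages
{
    // Stage 1: wrap an existing APM pair (Stream.BeginRead/EndRead) in a Task<int>.
    // Stage 2: a continuation decodes the bytes once the read has completed.
    // Assumption: one read returns all the data we care about.
    public static Task<string> ReadTextAsync(Stream stream, int maxBytes)
    {
        var buffer = new byte[maxBytes];
        Task<int> read = Task<int>.Factory.FromAsync(
            stream.BeginRead, stream.EndRead, buffer, 0, buffer.Length, null);
        return read.ContinueWith(t => Encoding.UTF8.GetString(buffer, 0, t.Result));
    }

    // Expose the Task-based operation as a Begin/End pair. The returned
    // Task<string> is itself the IAsyncResult, and its AsyncState is 'state'.
    public static IAsyncResult BeginReadText(
        Stream stream, int maxBytes, AsyncCallback callback, object state)
    {
        var tcs = new TaskCompletionSource<string>(state);
        ReadTextAsync(stream, maxBytes).ContinueWith(t =>
        {
            if (t.IsFaulted) tcs.TrySetException(t.Exception.InnerExceptions);
            else if (t.IsCanceled) tcs.TrySetCanceled();
            else tcs.TrySetResult(t.Result);
            if (callback != null) callback(tcs.Task);
        });
        return tcs.Task;
    }

    // Blocks if the operation is still running; failures surface
    // wrapped in an AggregateException.
    public static string EndReadText(IAsyncResult asyncResult)
    {
        return ((Task<string>)asyncResult).Result;
    }
}
```

A caller that only understands IAsyncResult can invoke BeginReadText with a callback and later call EndReadText, while the implementation stays a chain of Task continuations.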

Dealing with CPU intensive work

Web applications that need to perform expensive computations may still benefit from parallelism if the latency of an individual request is more important than overall request throughput. If this is the case, the new APIs for parallelism in .NET 4 such as Task Parallel Library and PLINQ can simplify writing the parallel code. When integrating parallelism into your web application, consider the following factors:


If requests are computationally cheap to process, then parallelism is probably an unnecessary overhead.

If the incoming request rate is high, then adding more parallelism will likely yield few benefits and could actually decrease performance, since the incoming rate of work may be high enough to keep the CPUs busy.

If the incoming request rate is low, then the Web application could benefit from parallelism by using the idle CPU cycles to speed up the processing of an individual request. We can use either PLINQ or TPL (either Parallel loops or the Task class) to parallelize the computation over all the processors. Note that by default, however, the PLINQ implementation in .NET 4 will tie up one ThreadPool worker per processor for the entire execution of the query. As such, it should only be used in Web applications that see few but expensive requests.

If the incoming request rate is variable, i.e. there are long periods when request rate is low (say, at night) and then other periods when request rate is high (say, midday), we need a strategy that will dynamically adjust to the available resources. When the load is high, we don’t want to add to the contention but when the load is low, we want to use the idle resources. For this scenario, we can use TPL’s Parallel or Task constructs since they can adapt to use available resources within a process. If the server is already loaded, the Parallel loops can use as little as one worker and make forward progress. If the server is mostly free, they can grow to use as many workers as the ThreadPool can spare.
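To make the last scenario a bit more concrete, here is a minimal sketch of a CPU-bound computation written with a Parallel loop. The PrimeCounter class and its CountPrimes method are hypothetical and stand in for whatever expensive work a request performs. The loop uses default options, so the number of workers is left to the TPL and the ThreadPool rather than fixed up front.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for an expensive, CPU-bound piece of request processing.
public static class PrimeCounter
{
    public static int CountPrimes(int fromInclusive, int toExclusive)
    {
        int total = 0;
        Parallel.For(fromInclusive, toExclusive,
            () => 0,                                   // per-worker running count
            (n, loopState, localCount) =>
                IsPrime(n) ? localCount + 1 : localCount,
            localCount => Interlocked.Add(ref total, localCount)); // merge once per worker
        return total;
    }

    // Simple trial division; good enough for a sketch.
    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int d = 2; (long)d * d <= n; d++)
        {
            if (n % d == 0) return false;
        }
        return true;
    }
}
```

Keeping a per-worker count and merging it once at the end avoids contending on the shared total for every iteration, which matters most when the server is already busy.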

Developing libraries for ASP.NET

If you’re developing a library that uses the parallel programming features of .NET 4, you should consider whether it is going to be used within ASP.NET. If it is, you should consider exposing knobs from your library that enable controlling how much parallelism is employed by the library. This is particularly important for libraries that utilize PLINQ. In .NET 4, PLINQ by default uses a fixed number of workers equal to the number of logical processors. By exposing control to the consumer of the library, the consumer can specify a maximum amount of parallelism to be employed, and this value can be configured based on the environment. The number of workers PLINQ utilizes is controllable through the WithDegreeOfParallelism operator; the maximum number of workers utilized by the Parallel loops is controllable through the ParallelOptions class, an instance of which is supplied as a parameter to overloads of the looping constructs.
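One way such a knob might look is sketched below. The Transformer class, its method names and the maxDegreeOfParallelism parameter are assumptions made for this example, not part of any real library; only WithDegreeOfParallelism, ParallelOptions and the Parallel.For overload are actual .NET 4 APIs.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical library type that lets the caller cap its parallelism.
public static class Transformer
{
    // PLINQ version: the caller-supplied value is passed to WithDegreeOfParallelism.
    // AsOrdered keeps results in input order.
    public static TResult[] MapWithPlinq<T, TResult>(
        T[] input, Func<T, TResult> selector, int maxDegreeOfParallelism)
    {
        return input.AsParallel()
                    .AsOrdered()
                    .WithDegreeOfParallelism(maxDegreeOfParallelism)
                    .Select(selector)
                    .ToArray();
    }

    // Parallel loop version: the same value is passed through ParallelOptions.
    // Each iteration writes to its own slot, so no locking is needed.
    public static TResult[] MapWithParallelFor<T, TResult>(
        T[] input, Func<T, TResult> selector, int maxDegreeOfParallelism)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegreeOfParallelism };
        var results = new TResult[input.Length];
        Parallel.For(0, input.Length, options, i => { results[i] = selector(input[i]); });
        return results;
    }
}
```

An ASP.NET host could then read the cap from configuration and pass a small value (even 1) on a busy server, while a desktop or batch host could pass Environment.ProcessorCount.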

Conclusion

ASP.NET already takes advantage of multiple processors on your server. Most developers will not need to explicitly add any parallelism into their ASP.NET Web applications. However, if your particular situation requires explicit parallelism, the new parallelism APIs in .NET 4 can be beneficial to you.


Debugging TPL apps in VS2010

The new parallel debugger windows in Visual Studio 2010 (Parallel Tasks and Parallel Stacks) have had many fixes and updates.


I have refreshed the existing content and also added new material for Beta 2. Find links to all of it from my blog post on Parallel Debugging.


Cheers
Daniel
