
Managed I/O Completion Ports (IOCP)

1. Introduction - Native Win32 IOCP

I/O Completion Ports (IOCP), supported on Microsoft Windows platforms, have two facets. First, I/O handles such as file handles and socket handles can be associated with a completion port. Any async I/O completion events related to an I/O handle associated with the IOCP get queued onto this completion port, which allows threads to wait on the IOCP for completion events. Second, an I/O completion port can be created without being associated with any I/O handle. In this case, the IOCP is used purely as an efficient, threadsafe, waitable queue. Using this technique, a pool of a few threads can achieve good scalability and performance for an application. For instance, if you are implementing an HTTP server application, you need to do the following mundane tasks apart from the protocol implementation.

  1. Create a client connection listen socket.
  2. Once we get the client connection, use the client socket to communicate with the client to and fro.

You can implement this by creating one dedicated thread per client connection that continuously communicates with the client. But this technique quickly becomes a tremendous overhead on the system and reduces its performance as the number of simultaneously active client connections increases. This is because threads are costly resources, and thread switching becomes a major performance bottleneck when there are many threads.

The best way to solve this is to use an IOCP with a pool of threads that can work with multiple client connections simultaneously. This can be achieved using some simple steps...

  1. Create a client connection listen socket.
  2. Once we get the client connection, post an IOCP read message on the socket to an IOCP.
  3. One of the threads waiting for completion events on this IOCP will receive the first read message for the client. It immediately posts another read onto the same IOCP and continues processing the read message it got. Once it has finished processing the read message, it waits on the IOCP again for another event.

This technique will allow a small pool of threads to efficiently handle communication with hundreds of client connections simultaneously. Moreover this is a proven technique for developing scalable server side applications on Windows platforms.

The above is a simplified description of using IOCP in multithreaded systems. There are some good in-depth articles on this topic in CodeProject and the Internet. Do a bit of Googling on words like IO Completion Ports, IOCP, etc., and you will be able to find good articles.

2. Introduction - Managed IOCP

Managed IOCP is a small .NET class library that provides the second facet of native Win32 IOCP. This class library can be used by both C# and VB.NET applications. I chose the name Managed IOCP to keep readers close to the techniques they are used to with native Win32 IOCP. As the name highlights, Managed IOCP is implemented using pure .NET managed classes and pure .NET synchronization primitives. At its core, it provides a threadsafe object queuing and waitable object receive mechanism. Apart from that, it provides a lot more features. Here is what it does.

  1. Multiple Managed IOCP instances per process.
  2. Registration of multiple threads per Managed IOCP instance.
  3. Dispatching System.Object types to a threadsafe queue maintained by each Managed IOCP instance.
  4. Waitable multi-thread safe retrieval of objects from the Managed IOCP instance queue by all the threads registered for that particular Managed IOCP instance.
  5. Ability to restrict number of concurrent active threads processing the queued objects related to a particular Managed IOCP instance.
  6. Policy based replaceable/customizable approach for choosing a registered thread to process the next available queued object.
  7. Pause the Managed IOCP processing. Internally pauses processing of queued objects by registered threads. Also, by default, disallows enqueuing new objects (can be changed).
  8. Run the Managed IOCP instance. Internally re-starts the processing of queued objects by registered threads. Also allows enqueuing new objects (if it is disallowed previously).
  9. Modify the max allowed concurrent threads at runtime.
  10. Provides easy accessibility to Managed IOCP instance runtime properties like...
    1. Number of active concurrent threads.
    2. Number of objects left in queue.
    3. Number of allowed concurrent threads.
    4. Running status.
  11. Safe and controlled closing of a Managed IOCP instance.

2.1. Managed IOCP in Job/Task oriented Business Processes

Managed IOCP can be used in scenarios other than the sample I mentioned in the introduction to native Win32 IOCP. It can be used in process-oriented server-side business applications. For instance, if you have a business process (_not_ a Win32 process) with a sequence of tasks that will be executed by several clients, you have to execute several instances of the business process in parallel, one for each client. As mentioned in my introduction to native Win32 IOCP, you can achieve this by spawning one dedicated thread per business process instance. But the system will quickly run out of resources, and system and application performance will degrade as more instances are created. Using Managed IOCP, you can achieve the same sequential execution of multiple business process instances, but with fewer threads. This is done by dispatching each task in a business process instance as an object to Managed IOCP. It is picked up by one of the waiting threads and executed. After completing the execution, the thread dispatches the next task in the business process instance to the same Managed IOCP, where it is picked up by another waiting thread. This is a continuous cycle. The advantage is that you achieve the sequential execution goal of a business process, because only one waiting thread can receive a dispatched object, while keeping system resource utilization at the required level. Also, system and business process execution performance improve because a few threads execute multiple parallel business processes.
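The task-chaining cycle described above can be sketched as follows. This is only an illustration under stated assumptions: `BlockingCollection<T>` (from later .NET versions than the library targets) stands in for the Managed IOCP queue, and the names `ProcessStep` and `BusinessProcessSketch` are hypothetical, not part of Sonic.Net. Each worker executes one task and then queues the next task of the same instance, so the tasks of one instance run in order even though different threads may execute them.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// Hypothetical work item: one task (step) of one business process instance.
public sealed class ProcessStep
{
    public int InstanceId;
    public int StepIndex;
}

public static class BusinessProcessSketch
{
    // Runs `instances` business processes of `stepsPerInstance` tasks each on
    // `workers` threads and returns the step order recorded per instance.
    // Assumes instances >= 1 and stepsPerInstance >= 1.
    public static Dictionary<int, List<int>> Run(int instances, int stepsPerInstance, int workers)
    {
        // Stand-in for the Managed IOCP queue (an assumption of this sketch).
        var queue = new BlockingCollection<ProcessStep>();
        var log = new Dictionary<int, List<int>>();
        int remaining = instances;   // instances that have not finished yet

        for (int i = 0; i < instances; i++)
        {
            log[i] = new List<int>();
            queue.Add(new ProcessStep { InstanceId = i, StepIndex = 0 });
        }

        var threads = new List<Thread>();
        for (int w = 0; w < workers; w++)
        {
            var worker = new Thread(() =>
            {
                foreach (ProcessStep step in queue.GetConsumingEnumerable())
                {
                    // "Execute" the task. Only one step per instance is ever
                    // queued, so no two threads touch the same list at once.
                    log[step.InstanceId].Add(step.StepIndex);

                    if (step.StepIndex + 1 < stepsPerInstance)
                    {
                        // Dispatch the next task of this instance to the same queue.
                        queue.Add(new ProcessStep
                        {
                            InstanceId = step.InstanceId,
                            StepIndex = step.StepIndex + 1
                        });
                    }
                    else if (Interlocked.Decrement(ref remaining) == 0)
                    {
                        // Every instance has finished: release the waiting workers.
                        queue.CompleteAdding();
                    }
                }
            });
            worker.Start();
            threads.Add(worker);
        }

        foreach (Thread t in threads) t.Join();
        return log;
    }
}
```

For example, `BusinessProcessSketch.Run(4, 3, 2)` executes four process instances of three tasks each on two threads, and each instance's recorded step order is 0, 1, 2.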

3. Using Managed IOCP in .NET applications

Multithreaded systems are complex in the sense that most problems show up only in real production scenarios. To limit the possibility of such surprises while using Managed IOCP, I created a test application with which several aspects of the Managed IOCP library can be tested. Nevertheless, I look forward to any suggestions/corrections/inputs to improve this library and its demo application.

Before getting into the demo application, below is the sequence of steps that an application would typically perform while using the Managed IOCP library.

  1. Create an instance of the ManagedIOCP class:
    using Sonic.Net;
    ManagedIOCP mIOCP = new ManagedIOCP();      
    The ManagedIOCP constructor takes one argument, concurrentThreads. This is an integer that specifies the maximum number of concurrent active threads allowed to process objects queued onto this instance of ManagedIOCP. I used the no-argument constructor, which defaults to a maximum of 1 concurrent active thread.
  2. From a thread that needs to wait on objects queued onto the ManagedIOCP instance, call the Register() method on the ManagedIOCP instance. This will return an instance of the IOCPHandle class. This is like a native Win32 IOCP handle, using which the registered thread can wait on arrival of objects onto the ManagedIOCP instance. This thread can use the Wait() method on the IOCPHandle object. Wait() will wait indefinitely until it grabs an object queued onto the ManagedIOCP instance to which the calling thread is registered. It either comes out with an object or with an exception in case the ManagedIOCP instance is stopped (we will cover this later).
    IOCPHandle hIOCP = mIOCP.Register();
    while(true)
    {
        try
        {
            object obj = hIOCP.Wait();
            // Process the object
        }
        catch(ManagedIOCPException e)
        {
            break;
        }
        catch(Exception e)
        {
            break;
        }
    }      
  3. Any thread (one that is registered with the ManagedIOCP instance or any non-registered thread) that has access to the ManagedIOCP instance can dispatch (Enqueue) objects to it. These objects are picked up by waiting threads that are registered with the ManagedIOCP instance onto which objects are being dispatched.
    string str = "Test string";
    mIOCP.Dispatch(str);      
  4. When a thread decides not to wait for objects any more, it should un-register with the ManagedIOCP instance.
    mIOCP.UnRegister();      
  5. Once the application is done with an instance of ManagedIOCP, it should call the Close() method on it. This will release any threads waiting on this instance of ManagedIOCP, clear internal resources, and reset the internal data members, thus providing a controlled and safe closure of a ManagedIOCP instance.
    mIOCP.Close();      
    There are certain useful statistics that are exposed as properties in the ManagedIOCP class. You can use them for fine tuning the application during runtime.
    // Current number of threads that are
    // concurrently processing the objects queued
    // onto this instance of Managed IOCP
    // (This is readonly property)
    
    int activeThreads = mIOCP.ActiveThreads;      
    // Max number of concurrent threads
    // allowed to process objects queued onto this
    // instance of Managed IOCP (This is a read/write property)
    
    int concurThreads = mIOCP.ConcurrentThreads;      
    // Current count of objects queued onto this Managed IOCP instance. 
    // NOTE: This value may change very quickly
    // as multiple concurrent threads might 
    // be processing objects from this instance of Managed IOCP queue. 
    // So _do not_ depend on this value
    // for logical operations. Use this only for
    // monitoring purpose (Status reporting, etc.)
    // and during cleanup processes 
    // (like not exiting main thread until the queued object count becomes 0, 
    // i.e. no more objects to be processed, etc)
    // (This is readonly property)
    
    int qCount = mIOCP.QueuedObjectCount;      
    // Number of threads that are registered with this instance of Managed IOCP
    // (This is readonly property)
    
    int regThreadCount = mIOCP.RegisteredThreads;      

3.1. Advanced usage

Following are the advanced features of Managed IOCP that need to be used carefully.

Managed IOCP execution can be paused at runtime. When a Managed IOCP instance is paused, all the threads registered with this instance of Managed IOCP will stop processing the queued objects. Also, if the 'EnqueueOnPause' property of the ManagedIOCP instance is false (by default, it is false), then no thread will be able to dispatch new objects onto the Managed IOCP instance queue. Calling Dispatch on the ManagedIOCP instance will throw an exception in the Pause state. If the 'EnqueueOnPause' property is set to true, then threads can dispatch objects onto the queue, but you need to be careful while setting this property to true, as this will increase the number of pending objects in the queue, thus occupying more memory. Also, when the Managed IOCP instance is re-started, all the registered threads will suddenly start processing a huge number of objects, thus creating spikes in system resource utilization.

mIOCP.Pause();      

Once paused, the ManagedIOCP instance can be re-started using the Run method.

mIOCP.Run();      

The running status of the Managed IOCP instance can be obtained using the IsRunning property:

bool bIsRunning = mIOCP.IsRunning;      
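The pause and run rules described above can be sketched with a small stand-in class. This is a minimal sketch, not the library's implementation: `PausableQueueSketch` and `TryRetrieve` are hypothetical names, retrieval is non-blocking here (the real library makes registered threads wait), and `InvalidOperationException` is only a placeholder for whatever exception Dispatch actually throws in the Pause state.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in that models only the pause/run rules from the text.
public sealed class PausableQueueSketch
{
    private readonly object _sync = new object();
    private readonly Queue<object> _items = new Queue<object>();
    private bool _running = true;
    private bool _enqueueOnPause;   // false by default, as the article states

    public bool EnqueueOnPause
    {
        get { lock (_sync) { return _enqueueOnPause; } }
        set { lock (_sync) { _enqueueOnPause = value; } }
    }

    public bool IsRunning { get { lock (_sync) { return _running; } } }

    public int QueuedObjectCount { get { lock (_sync) { return _items.Count; } } }

    public void Pause() { lock (_sync) { _running = false; } }

    public void Run() { lock (_sync) { _running = true; } }

    public void Dispatch(object item)
    {
        lock (_sync)
        {
            // While paused, enqueuing is allowed only if EnqueueOnPause is true.
            if (!_running && !_enqueueOnPause)
                throw new InvalidOperationException("Paused and EnqueueOnPause is false.");
            _items.Enqueue(item);
        }
    }

    // Non-blocking retrieval: hands out nothing while paused or empty.
    public bool TryRetrieve(out object item)
    {
        lock (_sync)
        {
            if (_running && _items.Count > 0)
            {
                item = _items.Dequeue();
                return true;
            }
            item = null;
            return false;
        }
    }
}
```

With `EnqueueOnPause` set to true, objects dispatched during a pause simply accumulate, which is the memory growth and restart spike the text warns about.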

One of the core functions of Managed IOCP is to choose a thread for notifying when an object is dispatched to its queue. When Managed IOCP identifies a thread for notifying an object dispatch, it evaluates whether that thread is in a state to process that object. This is done by calling a delegate named 'ChooseThreadPolicy'. Managed IOCP will pass the IOCPHandle instance of the selected thread to the function attached to this delegate. If the function returns true, it will notify the thread of an object arrival, otherwise it tries to find another thread and the process will continue until the delegate function returns true for one of the registered threads. If none of the registered threads are selected, then Managed IOCP will pick up the first thread in its registered thread list.

Now the interesting part of this technique is that you can attach your own function to this delegate using the Managed IOCP instance method, 'SetChooseThreadPolicy'. Within your function, you can determine based on your own conditions whether the thread belonging to the IOCPHandle instance passed onto your function can be notified/activated to process the objects or not. You can retrieve the System.Threading.Thread object of the thread associated with the IOCPHandle instance from its property named 'OwningThread'.
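A custom policy function might look like the sketch below. The exact delegate signature in Sonic.Net is not shown in this article, so `HandleSketch` and `ChooseThreadPolicySketch` are hypothetical stand-ins for IOCPHandle and the 'ChooseThreadPolicy' delegate; only the 'OwningThread' property name comes from the text. The example policy rejects threads that are not started, stopped, suspended, or being aborted.

```csharp
using System.Threading;

// Hypothetical stand-in for IOCPHandle; only OwningThread is taken from the article.
public sealed class HandleSketch
{
    public Thread OwningThread;

    public HandleSketch(Thread owningThread) { OwningThread = owningThread; }
}

// Hypothetical stand-in for the 'ChooseThreadPolicy' delegate: receives the
// handle of a candidate thread and returns true if that thread may be notified.
public delegate bool ChooseThreadPolicySketch(HandleSketch handle);

public static class PolicySketch
{
    // Example policy: accept only threads that are able to process an object.
    public static bool SkipUnavailableThreads(HandleSketch handle)
    {
        const ThreadState unavailable =
            ThreadState.Unstarted | ThreadState.Stopped |
            ThreadState.Suspended | ThreadState.SuspendRequested |
            ThreadState.AbortRequested | ThreadState.Aborted;

        // ThreadState is a flags enum, so test the bits rather than compare for equality.
        return (handle.OwningThread.ThreadState & unavailable) == 0;
    }
}
```

A function like this would then be attached through 'SetChooseThreadPolicy'; check the library source for the actual delegate type before adapting it, for example `ChooseThreadPolicySketch policy = PolicySketch.SkipUnavailableThreads;` in this sketch.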

3.2. Demo Application

I provided two demo applications with similar logic. The first is implemented using Managed IOCP, the other using native Win32 IOCP. These two demo applications perform the following steps.

  1. Create a global static ManagedIOCP instance or Native Win32 IOCP.
  2. Create five threads.
  3. Each thread will dispatch one integer value at a time to the ManagedIOCP instance or Native Win32 IOCP until the specified number of objects are completed.
  4. Start (creates a new set of five threads) & Stop (closes the running threads) the object processing.

The Sonic.Net (ManagedIOCP) demo application additionally demonstrates the following features of Managed IOCP that are unavailable in Win32 IOCP:

  1. Pause & Continue the object processing during runtime.
  2. Changing the concurrent threads at runtime.
  3. Statistics like, Active Threads, Max Concurrent threads, Queued object count and Running status of Managed IOCP.

Below is the image showing both the demo applications after their first cycle of object processing:

[Figure: Demo application results]

As you can see in the above figure, Managed IOCP is as fast as (even slightly faster than) native Win32 IOCP. The goal of these two demo applications is _not_ to compare the speed or features of Win32 IOCP with those of Managed IOCP, but rather to highlight that Managed IOCP provides all the advantages of native Win32 IOCP (with additional features) in a purely managed environment.

I tested these two demo applications on a single processor CPU and a dual processor CPU. The results are similar, in the sense that Managed IOCP performs as well as (sometimes better than) native Win32 IOCP.

3.3. Source & Demo application files

Below are the details of files included in the article's ZIP file.

  1. Sonic.Net (Folder) - I named this class library Sonic.Net (Sonic stands for speed). The namespace is also specified as Sonic.Net. The two classes that I described in this article, ManagedIOCP and IOCPHandle, are defined within this namespace. The folder hierarchy is described below:
    Sonic.Net
    |
     --> Assemblies
    |
     --> Solution Files
    |
     --> Sonic.Net
    |
     --> Sonic.Net Demo Application      
    The Assemblies folder contains the Sonic.Net.dll (contains ManagedIOCP and IOCPHandle classes) and Sonic.Net Demo Application.exe (demo application showing the usage of ManagedIOCP and IOCPHandle classes).

    The Solution Files folder contains the VS.NET 2003 solution file for the Sonic.Net assembly project and Sonic.Net demo application WinForms project.

    The Sonic.Net folder contains the Sonic.Net assembly source code.

    The Sonic.Net Demo Application folder contains the Sonic.Net demo application source code.

  2. Win32IOCPDemo (Folder) - This folder contains the WinForms based demo application for demonstrating the Win32 IOCP usage using PInvoke. When compiled, the Win32IOCPDemo.exe will be created in the Win32IOCPDemo/bin/debug or Win32IOCPDemo/bin/Release folder based on the current build configuration you selected. Default build configuration is set to Release mode.

4. Inside Managed IOCP

This section discusses the how and why part of the core logic that is used to implement Managed IOCP.

4.1. Waiting and retrieving objects in Managed IOCP

Managed IOCP provides a thread safe object dispatch and retrieval mechanism. This could have been achieved by a simple synchronized queue. But with synchronized queue, when a thread (thread-A) dispatches (enqueues) an object onto the queue, for another thread (thread-B) to retrieve that object, it has to continuously monitor the queue. This technique is inefficient as thread-B will be continuously monitoring the queue for arrival of objects irrespective of whether objects are present in the queue. This leads to heavy CPU utilization and thread switching in the application when multiple threads are monitoring the same queue, thus degrading the performance of the system.

Managed IOCP deals with this situation by attaching an auto reset event to each thread that wants to monitor the queue for objects and retrieve them. This is why any thread that wants to wait on a Managed IOCP queue and retrieve objects from it has to register with the Managed IOCP instance using its 'Register' method. The registered threads wait for object arrival and retrieve them using the 'Wait' method of the IOCPHandle instance. The IOCPHandle instance contains an AutoResetEvent that will be set by the Managed IOCP instance when any thread dispatches an object onto its queue. There is an interesting problem in this technique. Let us say that there are three threads: thread-A dispatching the objects, and thread-B and thread-C waiting on object arrival and retrieving them. Now, say thread-A dispatches 10 objects in its slice of CPU time. Managed IOCP will set the AutoResetEvent of thread-B and thread-C, thus informing them of the new object arrival. Since it is an event, it has no indication of how many times it has been set. So if thread-B and thread-C just wake up when the event is set, retrieve one object each from the queue, and wait on the event again, there would be 8 more objects left unattended in the queue. Also, this mechanism would waste the CPU slice given to thread-B and thread-C, as they would go back into waiting mode after processing a single object from the Managed IOCP queue.

So in Managed IOCP, when thread-B and thread-C call the 'Wait' method on their respective IOCPHandle instances, the method first tries to retrieve an object from the Managed IOCP instance queue before waiting on its event. If it is able to successfully retrieve an object, it does not go into wait mode; rather, it returns from the Wait method. This is efficient because there is no point in threads waiting on their event while there are objects to process in the queue. The beauty of this technique is that when there are no objects in the queue, the IOCPHandle instance Wait method will suspend the calling thread by waiting on its internal AutoResetEvent, which will be set again by the Managed IOCP instance 'Dispatch' method when thread-A dispatches more objects.
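The "check the queue first, wait on the event only when it is empty" pattern can be sketched as below. This is a simplified illustration, not the library's code: `ConsumerHandleSketch` and `WaitSketch` are hypothetical names, a plain `Queue<object>` guarded by a lock stands in for the internal queue, and the concurrency limit and stop handling are left out.

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for an IOCPHandle: one AutoResetEvent per consumer thread.
public sealed class ConsumerHandleSketch
{
    private readonly Queue<object> _queue;   // shared queue, guarded by lock (_queue)
    private readonly AutoResetEvent _signal = new AutoResetEvent(false);

    public ConsumerHandleSketch(Queue<object> sharedQueue) { _queue = sharedQueue; }

    public void SetEvent() { _signal.Set(); }

    public object Wait()
    {
        while (true)
        {
            // First try to take an object; do not touch the event if one is available.
            lock (_queue)
            {
                if (_queue.Count > 0) return _queue.Dequeue();
            }
            // Queue was empty: sleep until a dispatcher signals, then re-check.
            _signal.WaitOne();
        }
    }
}

public static class WaitSketch
{
    // Enqueue an object and signal one consumer. Repeated signals to an
    // AutoResetEvent that is already set are coalesced into one.
    public static void Dispatch(Queue<object> queue, ConsumerHandleSketch handle, object item)
    {
        lock (queue) { queue.Enqueue(item); }
        handle.SetEvent();
    }
}
```

If ten objects are dispatched before the consumer runs, the event records only a single signal, yet ten consecutive calls to `Wait()` return all ten objects without blocking, because each call finds an object in the queue before it ever waits.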

4.2. Compare-And-Swap (CAS) in Managed IOCP

CAS is a very familiar term in the software community dealing with multi-threaded applications. It allows you to compare two values and update one of them with a new value, all in a single atomic thread-safe operation. In Managed IOCP, when a thread successfully grabs an object from the IOCP queue, it is considered to be active. Before grabbing an available object from the queue, Managed IOCP checks if the number of currently active threads is less than the allowed max concurrent threads. In case the number of current active threads is equal to max allowed concurrent threads, then Managed IOCP will block the thread trying to receive the object from the IOCP queue. To do this, Managed IOCP has to follow the logical steps as mentioned below:

  1. Get the new would-be value of active threads (current active threads + 1).
  2. Compare it with the max allowed concurrent threads.
  3. If new would-be value is <= max allowed concurrent threads, then assign the would-be value to the active threads.

In the above logic, step-3 consists of two operations, comparison and assignment. If we perform these two operations separately in Managed IOCP, then for instance, thread-A and thread-B might both reach conditional expression with the same would-be value for active threads. If this value is less than or equal to the max allowed concurrent threads, then the condition will pass for both the threads and both of them will assign the same would-be value for the active threads. Though the active thread count may not increase in this scenario, the actual number of physically active threads will be more than the desired max concurrent threads, as in the above scenario both the threads think that they can be active.

So Managed IOCP performs this operation as shown below:

  1. Get the current value of active threads and store it in a local variable.
  2. Get the new would-be value of active threads (current active threads + 1).
  3. Compare it with the max allowed concurrent threads.
  4. If new would-be value is <= max allowed concurrent threads, then CAS(ref activethreads variable, would-be value of active threads, current value of active threads stored in a local variable in step-1). Come out of the method if would-be value is greater than max allowed concurrent threads.
  5. If CAS returns false then go to step-1.

In the above logic, the CAS operation supported by the .NET framework (Interlocked.CompareExchange) is used to assign the new would-be value to active threads only if the original value of active threads has not changed since the time we observed it (and stored it in a local variable) before proceeding to our compare-and-decide step. This way, though two threads might pass the decision in step-4, one of them will fail in the CAS operation and retry from step-1 instead of going into active mode. Below is the active threads increment method extracted from the ManagedIOCP class implementation.

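internal bool IncrementActiveThreads()
{
    bool incremented = true;
    do
    {
        // Snapshot the current count; the CAS below succeeds only if
        // no other thread has changed it in the meantime
        int curActThreads = _activeThreads;
        int newActThreads = curActThreads + 1;
        if (newActThreads <= _concurrentThreads)
        {
           // Interlocked.CompareExchange returns the value that was in
           // _activeThreads before the call. Break if it matches our
           // snapshot, i.e. we had successfully incremented
           // the active threads
           if (Interlocked.CompareExchange(ref _activeThreads, 
                  newActThreads, curActThreads) == curActThreads)
               break;
        }
        else
        {
            // Already at the max allowed concurrent threads
            incremented = false;
            break;
        }
    } while(true);
    return incremented;
}      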

I could have used a lock mechanism like Monitor for the entire duration of the active threads increment operation. But since this is a very frequent operation in Managed IOCP, it would lead to heavy lock contention and decrease the performance of the system/application in multi-CPU environments. The technique that I used in Managed IOCP is generally called a lock-free technique and is used heavily to build lock-free data structures in performance critical applications.
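The bounded increment can also be written as a small self-contained class, paired with the decrement a thread performs when it stops being active. This is a sketch under assumptions: `ActiveThreadGate`, `TryEnter` and `Exit` are hypothetical names, not members of ManagedIOCP, and `Volatile.Read` comes from later .NET versions than the library targets.

```csharp
using System.Threading;

// Hypothetical helper that caps a counter at `max` without taking a lock.
public sealed class ActiveThreadGate
{
    private int _active;
    private readonly int _max;

    public ActiveThreadGate(int max) { _max = max; }

    public int Active { get { return Volatile.Read(ref _active); } }

    // Returns true if the caller became active, false if the cap is reached.
    public bool TryEnter()
    {
        while (true)
        {
            int observed = Volatile.Read(ref _active);
            if (observed + 1 > _max) return false;

            // CompareExchange returns the value it found. It equals `observed`
            // only if no other thread changed the counter since we read it.
            if (Interlocked.CompareExchange(ref _active, observed + 1, observed) == observed)
                return true;

            // Another thread won the race: re-read the counter and try again.
        }
    }

    // Decrement needs no bound check, so a plain atomic decrement suffices.
    public void Exit() { Interlocked.Decrement(ref _active); }
}
```

With a gate created as `new ActiveThreadGate(2)`, the first two `TryEnter()` calls succeed and a third fails until some thread calls `Exit()`.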

4.3. Concurrency management in Managed IOCP

Concurrency is one area where native Win32 IOCP excels. It provides a mechanism where the max allowed concurrent threads can be set during its creation. It guarantees that at any given point of time, no more than the max allowed concurrent threads are running, and more importantly, it sees to it that _at least_ max allowed concurrent threads are _always_ notified/awakened to process completion events, if the number of threads using its IOCP handle is more than the max allowed concurrent threads.

Managed IOCP also provides the above two guarantees, with more features like the ability to modify the max allowed concurrent threads at runtime, which native Win32 IOCP does not provide. Managed IOCP provides the first guarantee using the Compare-And-Swap (CAS) technique in its Wait mode, as described in the previous section (4.2). When a thread waits on its IOCPHandle instance to grab a Managed IOCP queue object, it first tries to become active by incrementing the active thread count using the CAS technique. If it fails to increment active threads, it means that the number of current active threads is equal to max allowed concurrent threads, and the calling thread will go into Wait mode. You can see this in the code implementation of the IOCPHandle::Wait() method in ManagedIOCP.cs in the attached source code ZIP file.

I could have used Win32 Semaphores to limit the max allowed concurrent threads. But it will defeat the whole purpose of the Managed IOCP, being completely managed, as .NET 1.1 does not provide a Semaphore type. Also I wanted this library to be as compatible as possible with Mono .NET runtime. These are the reasons I did not explore the usage of semaphore for this feature. Maybe I'll take a serious look at it if .NET 2.0 has a Semaphore object.
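For comparison only, here is how a semaphore-based limit might look on later runtimes, where a managed `System.Threading.Semaphore` class is available (it was added in .NET Framework 2.0). This is not how Managed IOCP works; `SemaphoreLimitSketch` is a hypothetical name, and the short sleep merely simulates work so that threads overlap.

```csharp
using System.Threading;

public static class SemaphoreLimitSketch
{
    // Starts `workers` threads but lets at most `maxConcurrent` of them be
    // inside the guarded region at once; returns the highest concurrency seen.
    public static int Run(int workers, int maxConcurrent)
    {
        int active = 0;
        int peak = 0;
        using (var slots = new Semaphore(maxConcurrent, maxConcurrent))
        {
            var threads = new Thread[workers];
            for (int i = 0; i < workers; i++)
            {
                threads[i] = new Thread(() =>
                {
                    slots.WaitOne();   // blocks while all slots are taken
                    try
                    {
                        int now = Interlocked.Increment(ref active);

                        // Record the peak with a CAS retry loop, as in section 4.2.
                        int seen;
                        while (now > (seen = Volatile.Read(ref peak)))
                        {
                            if (Interlocked.CompareExchange(ref peak, now, seen) == seen) break;
                        }

                        Thread.Sleep(5);   // simulated work
                        Interlocked.Decrement(ref active);
                    }
                    finally
                    {
                        slots.Release();   // free the slot for the next thread
                    }
                });
                threads[i].Start();
            }
            foreach (Thread t in threads) t.Join();
        }
        return peak;
    }
}
```

Unlike the CAS approach, a thread that cannot get a slot blocks inside the semaphore itself, so this variant would need extra work to support changing the limit at runtime.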

The second feature of IOCP as described in the beginning of this section is described in more detail in the next section (Dispatching objects in Managed IOCP).

4.4. Dispatching objects in Managed IOCP

Managed IOCP maintains an array of IOCPHandle objects for all registered threads. When an object is dispatched to it by any thread, it increments an internal index and picks up the next item (IOCPHandle object) in the array. It then sets the AutoResetEvent of the IOCPHandle object at that index. Before doing that, Managed IOCP tries to evaluate whether the thread at the index can be used to process the object. It does this by checking whether the thread is in waiting mode using its IOCPHandle instance's Wait method or the thread is running. If so, it sets its AutoResetEvent so that the thread wakes up and processes the object if it is waiting on IOCP or if it is running (which means it is not suspended for some reason).

This thread choosing policy is round robin relative to the registered threads, and will choose those threads to activate that are not in suspended mode thus providing a chance for other running threads or threads suspended on IOCP to process objects. Below is the method that is used to choose a thread when an object is dispatched onto Managed IOCP.

private void WakeUpNextThread(int iocpHandleCount)
{
    // Wake up next waiting thread to service a dispatch request
    // only if we are running
    //
    if (_run == true)
    {
        int thIndex = _threadIndex;
        IOCPHandle hIOCP = _regIOCPHandles[thIndex] as IOCPHandle;
        while(true)
        {
            // Check whether we can choose this thread for handling this 
            // dispatch notification
            //
            if (_chThPolicy(hIOCP) == true)
            {
                if (++thIndex >= iocpHandleCount)
                {
                    thIndex = 0;
                }
                break;
            }
            else
            {
                if (++thIndex >= iocpHandleCount)
                {
                    thIndex = 0;
                    break;
                }
                hIOCP = _regIOCPHandles[thIndex] as IOCPHandle;
            }
        }
        _threadIndex = thIndex;
        hIOCP.SetEvent();
    }
}      

This policy can be changed by providing your own thread choose function for the delegate 'ChooseThreadPolicy'.

This technique provides the second aspect of native Win32 IOCP's concurrency management, which guarantees that _at least_ max allowed concurrent threads are _always_ notified/awakened to process queued objects, if the number of threads using the Managed IOCP instance is more than the max allowed concurrent threads.
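The round-robin selection with a pluggable policy can be sketched as a pure function over an array of handles. This is an illustration rather than the library's method: `RoundRobinSketch.Choose` is a hypothetical name, the policy is passed as a `Func<T, bool>` instead of the library's delegate, and the sketch tries every handle once before falling back to the first one, which follows the prose description rather than the exact listing above.

```csharp
using System;

public static class RoundRobinSketch
{
    // Returns the index of the handle to notify and advances `nextIndex` so the
    // following call starts after it. Assumes handles.Length > 0 and that the
    // caller serializes calls (the sketch itself is not thread safe).
    public static int Choose<T>(T[] handles, ref int nextIndex, Func<T, bool> policy)
    {
        int count = handles.Length;
        int start = nextIndex % count;

        for (int offset = 0; offset < count; offset++)
        {
            int candidate = (start + offset) % count;
            if (policy(handles[candidate]))
            {
                nextIndex = (candidate + 1) % count;
                return candidate;
            }
        }

        // No handle satisfied the policy: fall back to the first registered one.
        nextIndex = 0;
        return 0;
    }
}
```

For three handles and a policy that accepts everything, successive calls return indexes 0, 1, 2, 0, and so on. If the policy rejects a handle, the search moves to the next one in order.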

5. Points of interest

Currently, I'm using the synchronized System.Collections.Queue class as the internal object queue for Managed IOCP. This leaves scope for lock contention between threads that are dispatching objects to Managed IOCP and threads that are retrieving objects from Managed IOCP. This lock contention can be reduced and the performance of Managed IOCP improved further by using a Lock-Free queue. Lock-Free algorithms are complex to design. A more subtle part of them is that it is difficult to verify their correctness. There is a lot of research going on in this field as to their design and verification. I'm studying these algorithms, and when I have a good implementation of a Lock-Free queue, I'll share the updated Managed IOCP code.

The other data-structures that I use to maintain the list of registered threads with a Managed IOCP instance will not have lock contention as they are mostly used for enumerating their elements. And .NET Collection classes in their synchronized mode provide lock free thread safe reading and enumerations (as far as I know) in multi-threaded/multi-CPU environments.

5.1. Managed IOCP & Mono

I believe that Managed IOCP (Sonic.Net assembly, but _not_ demo applications) conforms to core .NET specifications and can be compiled and used easily on Mono .NET runtime. I request developers working on Mono to try out Managed IOCP on Mono .NET runtime and post any suggestions/comments/inputs to the .NET community.

6. History

Sonic.Net v1.0 (Class library hosting ManagedIOCP and IOCPHandle class implementation).

7. Software Usage

This software is provided "as is" with no expressed or implied warranty. I accept no liability for any type of damage or loss that this software may cause.
