Mac/PC Abstraction, Part 2:
Multithreading

Overview

For this article, I will focus on multithreading, outlining the subset of Win32 and pthread routines needed to create a simple thread control class that can be used on either Windows or MacOS. This uses some of the low-level wrapper routines covered in the previous article, and covers some issues with spawning threads in a way that works the same for both Windows and Mac.

Some multi-platform projects attempt to emulate Win32 routines using pthreads, or to emulate pthreads using Win32 routines. This ends up being an awkward proposition, since the two systems behave differently and expose different types of functionality. A pthread condition variable requires both a condition and a mutex, while Win32 only exposes a single handle for an event. Win32 has PulseEvent and WaitForMultipleObjects, for which there are no equivalents in pthreads. Some libraries go to great lengths to emulate the full set of functionality from one API using the other. In my experience, it is easier and safer to implement a single abstraction that can use either Win32 or pthreads, rather than dealing with code that attempts to emulate Win32 functionality using pthreads or vice versa.

This article takes the approach of defining a single interface class, with both a Win32 version and a pthread version that provide the same behavior on both Windows and MacOS. Only the common subset of Win32 and pthread functionality is exposed — functionality specific to only one platform is ignored to avoid trying to emulate non-native functionality.

Grab this zip file: crossplat.zip. It contains a project that can be built with both DevStudio 7 and Xcode 3. There's quite a bit of extra framework in there for something I've just started working on (which you can ignore), along with some platform-specific abstractions I've used on a number of other projects built on both Mac and PC.

The Thread Function

Although the semantics are different between the platforms, the basic approach to creating and using threads is the same for both Win32 and pthreads.

As an entry point, both APIs require a thread function, which is essentially the thread's equivalent of main(). The operating system will call into the thread function. When the thread function returns, the OS will automatically terminate the thread.

Win32 requires the following prototype for the function that is passed into _beginthreadex:

    unsigned __stdcall QzBaseThreadFunc(void *pContext);

For pthreads, the thread function given to pthread_create needs to have the following prototype:

    void* QzBaseThreadFunc(void *pContext);

Although the return types are different, both systems allow for a void* pointer to be passed into the thread function. The programmer can use this pointer to pass any desired block of data to the worker thread, such as a struct filled with parameters. When dealing with C++ code, a common technique is to pass in a this pointer, allowing the thread to be contained inside a class object. However, normal class methods cannot be used for this, since calling a class method involves passing a hidden this pointer.

Since only one pointer can be passed to the thread function, a simple approach is to use two class methods, one of which is static. Since static methods do not have hidden this pointers, they can be used for callbacks. The static method can recast the context pointer to a this pointer, then invoke a normal class method that serves as the real thread function, which allows full access to the class's member variables.

Such a class generally looks like this:

class MyThreadClass
{
public:
    // This is the real thread function, which does the real work.
    void* RealFunc(void)
    {
        while (StillBusy()) {
            DoSomething();
        }
        return NULL;
    }

    // This is a static function, since normal class methods cannot be
    // used to create threads.  The context pointer is a pointer to an
    // instance of MyThreadClass, so all the code needs to do is recast
    // the pointer and pass the function call into the real thread
    // function for this class.
    static void* StaticFunc(void *pContext)
    {
        return reinterpret_cast<MyThreadClass*>(pContext)->RealFunc();
    }

    // Calling this method will spawn a worker thread for this object.
    // We pass in a function pointer to the static function, and the
    // *this pointer so StaticFunc can reference the object.
    void StartThread(void)
    {
        pthread_t hThread;
        pthread_create(&hThread, NULL, StaticFunc, this);
    }
};

Calling the StartThread method will create a new thread, with StaticFunc as the entry point and this as the context pointer. StaticFunc only exists to recast the pointer back to a MyThreadClass pointer and invoke RealFunc. This is a bit of a circuitous way to do things, but the end result is that a class method is used for the thread function, providing an object oriented wrapper for the worker thread.

The downside to this specific example is that the code only works with pthreads. Win32 uses a slightly different prototype for the thread function, so a different function declaration is needed for the current operating system. That could be accomplished by using #ifdefs to provide the correct prototype for the current platform, but that kind of construct can be messy when multiple thread functions are required, and it would require cross-compiling to verify every declaration.

A cleaner implementation is to move the platform-specific prototypes behind an abstraction layer. This allows a single prototype to be used for worker threads across all platforms. It also allows extra platform-specific logic to be added to the thread entry and exit code, as well as the addition of debugging support (assigning names to threads, logging when threads are created and terminated, setting status flags, etc.).

The approach I use with threading is to define a custom function pointer:

    U32 FuncPointer(void *pContext);

This allows a common prototype to be used for both platforms. Microsoft's prototype requires the __stdcall declaration, which Xcode does not like. Meanwhile, pthreads use void* as the return type, which Win32 does not support (although it could be simulated if required).

A Common Interface

In my library, I use a single class declaration, as defined in QzSystem.h. This allows the app to use a single construct for interacting with threads, providing a common set of routines to start, stop, signal, and wait on a thread. More importantly, this class behaves the same, regardless of whether the Win32 or pthread implementation is being used.

class QzThreadControl
{
private:
    void* m_pContext;

public:
    QzThreadControl(void);
    ~QzThreadControl(void);

    bool StartThread(U32 (*func)(void*), void *pParams, const char threadName[]);
    void StopThread(void);
    U32  WaitForClosure(void);
    U32  WaitForEvent(U32 milliseconds = 0xFFFFFFFF);
    void SignalThread(void);

    U32  GetID(void);
    bool IsThreadAlive(void);
    bool TestStopFlag(void);
};

The class hides its implementation by using a custom structure for each implementation. This structure is dynamically allocated, and stored as a void* pointer. The definition of the structure only exists in the platform-specific CPP file, and requires that the internal methods cast this pointer back to the actual type before accessing any of the data hidden in the struct. Needing to recast the pointer is a bit ugly, but it keeps the definition of the struct hidden from the rest of the app.
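
The opaque-pointer arrangement described above can be sketched in a single file. The names Counter and CounterContext_t are hypothetical stand-ins for QzThreadControl and QzThreadContext_t. In a real project the class declaration would sit in the shared header and everything below it in the platform-specific CPP file.

```cpp
#include <cassert>
#include <cstddef>

// --- What the shared header would contain ---------------------------
class Counter
{
private:
    void* m_pContext;   // opaque; the real type is hidden from callers

public:
    Counter(void);
    ~Counter(void);

    void Increment(void);
    int  GetValue(void) const;
};

// --- What the platform-specific CPP file would contain --------------
struct CounterContext_t
{
    int Value;
};

Counter::Counter(void)
    :   m_pContext(NULL)
{
    CounterContext_t *pCC = new CounterContext_t;
    pCC->Value = 0;
    m_pContext = pCC;
}

Counter::~Counter(void)
{
    // Cast back to the real type before deleting, since deleting
    // through a void* is undefined behavior.
    delete static_cast<CounterContext_t*>(m_pContext);
    m_pContext = NULL;
}

void Counter::Increment(void)
{
    CounterContext_t *pCC = static_cast<CounterContext_t*>(m_pContext);
    assert(NULL != pCC);
    ++(pCC->Value);
}

int Counter::GetValue(void) const
{
    return static_cast<const CounterContext_t*>(m_pContext)->Value;
}
```

The sketch uses static_cast, which is sufficient when converting from void* back to the original pointer type; the reinterpret_cast used elsewhere in this article behaves the same way for this purpose.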

Internal Structs

For Win32, the custom structure is defined as follows:

struct QzThreadContext_t
{
    char          ThreadName[32];
    U32           ThreadID;
    HANDLE        hThread;
    volatile bool StopRequested;
    volatile bool StopCompleted;
    void*         pParams;
    U32           (*Func)(void*);
    QzSyncEvent   SignalEvent;
};

This struct stores all of the information required for the thread. These values are used both from the worker thread itself for information and communication, and from the main thread to signal to the worker thread and to detect when the thread has terminated.

ThreadName
For debugging purposes, an ASCII string is assigned to the thread. Under Win32, this is used to name the thread at the kernel level, so the name will appear in system-level debugging messages. For MacOS, this is just used in logging messages. In either case, assigning a unique string to each thread makes it much easier to identify an arbitrary thread when breaking in the debugger.
ThreadID
On Windows, each thread is assigned a unique ID, which is displayed by low-level debugging messages. This is really just for informational and debugging purposes. Thread creation will log the ID in the struct for reference when debugging. For MacOS, this serves no purpose, but is emulated to expose the same functionality on both platforms. The ID is useful enough on Windows that it is worth emulating on MacOS. (If you're familiar enough with the kernel internals, you could store a task ID here, but I have not needed to do this on MacOS.)
hThread
The thread handle is used to uniquely identify the thread. Whereas the thread ID is just informational, the handle is used in function calls to access the thread. There are a number of Win32 functions that can touch a thread, but the only one this code cares about is for detecting whether the thread has terminated. MacOS also provides a thread handle (although it uses a different type than Win32), which is needed when calling some pthread functions.
StopRequested
This flag is used by the waiting logic to test whether the app has requested that the thread terminate.
StopCompleted
Once the thread exits, this flag is set so that the main thread is able to test the exit status of the thread. Although the main thread can wait for the worker thread to terminate, this flag allows the main thread to poll the status without executing an infinite wait.
pParams
This is a generic void* context pointer that is provided by the app when creating a thread. This one is called pParams since there is already an m_pContext pointer being used.
Func
This is the static function pointer that is used for the worker thread's main function. The pParams value will be passed into this function as the void* context pointer.
SignalEvent
This event is used to wake up a waiting worker thread. It is used for both internal signalling to wake up a thread that needs to terminate, and to wake up the thread in the general case to process data.

A further point on using events with worker threads: Win32 programmers often use WaitForMultipleObjects when communicating with a worker thread. This allows multiple events to be used: one to indicate that the thread should terminate, another to indicate that data on a queue needs to be processed, and as many as 62 others to indicate other logic states.

The problem with this is twofold. First, any logic driven by WaitForMultipleObjects must be coded very carefully, or some events may be dropped, or the code may fail to detect multiple signals to the same event.

The second — and more important — problem is that pthreads do not have any concept of multiple events. It is possible to jerry-rig a form of multiple wait using pthreads, but the reliability of any such implementation would be questionable. I personally have never seen a reliable implementation. The much safer approach is to simply never use multiple-wait logic. Use only one event to signal the worker thread.

Sometimes you can get away with using a few volatile status variables to indicate why a thread is being signalled. However, the safest approach is to define a message queue, with all information to the worker thread going through the queue. After a message has been appended to the queue, use the QzThreadControl::SignalThread() method to wake up the thread. If messages need to be returned, use a separate response queue that the main thread can poll. Obviously, any such queues would need to be thoroughly protected by critical sections (mutexes).
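
As a rough illustration of the queue approach, here is a minimal sketch of a mutex-protected queue. SketchMessage and SketchMessageQueue are hypothetical names, not part of the library, and the sketch uses a raw pthread mutex for brevity. On Win32 the same shape would use a CRITICAL_SECTION, or better, a critical-section wrapper like the one from the previous article.

```cpp
#include <cassert>
#include <deque>
#include <pthread.h>

// Hypothetical message type; a real app would carry whatever the
// worker thread needs to know.
struct SketchMessage
{
    int Command;
    int Payload;
};

// A minimal queue guarded by a mutex.  Every access to m_Queue happens
// with the mutex held, so either thread may call Push or Pop.
class SketchMessageQueue
{
private:
    pthread_mutex_t           m_Mutex;
    std::deque<SketchMessage> m_Queue;

    // Not copyable: a pthread mutex must not be copied.
    SketchMessageQueue(const SketchMessageQueue&);
    SketchMessageQueue& operator=(const SketchMessageQueue&);

public:
    SketchMessageQueue(void)
    {
        int err = pthread_mutex_init(&m_Mutex, NULL);
        assert(0 == err);
        (void)err;
    }

    ~SketchMessageQueue(void)
    {
        pthread_mutex_destroy(&m_Mutex);
    }

    // Called by the sender.  After this returns, the sender would call
    // QzThreadControl::SignalThread() to wake the worker.
    void Push(const SketchMessage &msg)
    {
        pthread_mutex_lock(&m_Mutex);
        m_Queue.push_back(msg);
        pthread_mutex_unlock(&m_Mutex);
    }

    // Called by the worker.  Returns false when the queue is empty, so
    // the worker can loop until it has drained everything.
    bool Pop(SketchMessage &msg)
    {
        bool found = false;
        pthread_mutex_lock(&m_Mutex);
        if (!m_Queue.empty()) {
            msg = m_Queue.front();
            m_Queue.pop_front();
            found = true;
        }
        pthread_mutex_unlock(&m_Mutex);
        return found;
    }
};
```

A worker that calls Pop in a loop until it returns false each time it wakes up does not need to know how many times it was signalled, which sidesteps the dropped-event problem described earlier.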

There are also various non-blocking wait techniques for inter-thread communication, but those are beyond the scope of this article.

Now, for comparison, here is the struct I use for MacOS:

struct QzThreadContext_t
{
    char          ThreadName[32];
    U32           ThreadID;
    pthread_t     hThread;
    volatile bool StopRequested;
    volatile bool StopCompleted;
    void*         pParams;
    U32           (*Func)(void*);
    QzSyncEvent   SignalEvent;
};

This is almost identical to the Win32 struct. The only difference is hThread. Win32 uses a handle (literally, just a void*) for thread handles, whereas pthreads define a specific variable type. Depending on the platform, pthread_t may be a 32 or 64-bit integer, or it may be a pointer.

Note: You could define hThread as a void* pointer, then put the struct definition in the header file and reuse it for both Win32 and MacOS, but this would not necessarily be reusable for pthread implementations on other platforms. If you've ported code to work on OS X, you're halfway to supporting Linux. Keeping the specifics of the implementation in the CPP file will future-proof the code for working with other versions of pthreads, as well as any future non-backwards-compatible changes Apple makes to pthread_t in future versions of their OS.
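
One related wrinkle: the code in this article tests hThread against NULL, which works on MacOS because pthread_t is a pointer there. On a platform where pthread_t is an integer or a struct, that comparison is not portable. A minimal sketch of one way around it, using hypothetical names that are not part of the library, is to pair the handle with an explicit flag and to compare handles only through pthread_equal:

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical wrapper: pairs the opaque pthread_t with a flag, so no
// code ever has to compare a pthread_t against NULL or zero.
struct SketchThreadHandle
{
    pthread_t Handle;
    bool      IsValid;

    SketchThreadHandle(void)
        :   Handle()
        ,   IsValid(false)
    {
    }
};

// Records the calling thread in an unused handle.
inline void SketchAttachCurrentThread(SketchThreadHandle &h)
{
    assert(!h.IsValid);
    h.Handle  = pthread_self();
    h.IsValid = true;
}

// Returns true if the handle refers to the calling thread.
// pthread_equal is the portable way to compare two pthread_t values;
// operator== is not guaranteed to compile for every pthread_t type.
inline bool SketchIsCurrentThread(const SketchThreadHandle &h)
{
    return h.IsValid && (0 != pthread_equal(h.Handle, pthread_self()));
}
```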

For both Win32 and pthreads, the class constructor is the same: all it needs to do is allocate a struct, initialize all of the variables to zeroes (using the correct semantic symbols), and create the event that will be used to signal the thread.

QzThreadControl::QzThreadControl(void)
    :   m_pContext(NULL)
{
    QzThreadContext_t *pTC = new QzThreadContext_t;
    pTC->ThreadName[0] = '\0';
    pTC->ThreadID      = 0;
    pTC->hThread       = NULL;
    pTC->StopRequested = false;
    pTC->StopCompleted = false;
    pTC->pParams       = NULL;
    pTC->Func          = NULL;
    pTC->SignalEvent.Create();

    m_pContext = pTC;
}

Base Thread Function

A separate thread function is defined for each platform. This function, QzBaseThreadFunc, is the actual function pointer that will be passed into the OS call to create the thread. This is little more than a hook function, which passes control to the app-provided function pointer. By using a hook function, we gain two benefits: a common function pointer type is exposed to the rest of the app (hiding the platform-specific prototype), and we can have the hook function do some set-up before calling into the worker function, then any clean-up when the function exits.

Here is the Win32 version of the hook function (minus some logging code):

static unsigned __stdcall QzBaseThreadFunc(void *pContext)
{
    QzThreadContext_t *pTC = reinterpret_cast<QzThreadContext_t*>(pContext);

    // Apply the name to the worker thread.  This name will be
    // displayed in some Win32 debugging messages, and can be
    // accessed in the debugger to help figure out which thread
    // is currently being viewed.
    QzSetThreadName(pTC->ThreadName);

    // Invoke the static thread function defined in higher-level
    // code.  This function call will not return until the thread
    // has finished.
    unsigned result = pTC->Func(pTC->pParams);

    // The last thing we do is set this volatile flag, indicating
    // that the thread has terminated normally.
    pTC->StopCompleted = true;

    // Return the result code from pTC->Func().  This gets passed
    // through to Win32, which will write a termination message
    // to debug output.
    return result;
}

The MacOS hook function is almost identical:

static void* QzBaseThreadFunc(void *pContext)
{
    QzThreadContext_t *pTC = reinterpret_cast<QzThreadContext_t*>(pContext);

    // Invoke the static thread function defined in higher-level
    // code.  This function call will not return until the thread
    // has finished.
    unsigned result = pTC->Func(pTC->pParams);

    // The last thing we do is set this volatile flag, indicating
    // that the thread has terminated normally.
    pTC->StopCompleted = true;

    // Return NULL, since the return value is not important to
    // pthreads.  It is possible for the app to retrieve the
    // return pointer, but that logic is specific to pthreads.
    // Since it does not generalize to Win32, we cannot do
    // anything useful with this pointer.
    return NULL;
}

The Win32 version will set the thread's kernel-level name. Since MacOS does not have any capacity to name threads (or at least none I can find), it just ignores the thread name. The MacOS version also ignores the return result from the thread function, since the integer return type cannot be mapped to a pointer (the return value is only intended for logging purposes).

Starting Threads

Under Win32, a thread is created as follows:

bool QzThreadControl::StartThread(
            U32 (*func)(void*),
            void *pParams,
            const char threadName[])
{
    QzThreadContext_t *pTC = reinterpret_cast<QzThreadContext_t*>(m_pContext);

    // Do not attempt to start the thread if one is already running.
    if (NULL != pTC->hThread) {
        return false;
    }

    // The thread name needs to be ASCII, so we need to use standard string
    // copy functions instead of UTF-8 routines.  And since strncpy() is not
    // safe, we must explicitly make certain the copied string is terminated.
    strncpy(pTC->ThreadName, threadName, ArraySize(pTC->ThreadName));
    pTC->ThreadName[ArraySize(pTC->ThreadName)-1] = '\0';

    pTC->StopRequested = false;
    pTC->StopCompleted = false;
    pTC->pParams       = pParams;
    pTC->Func          = func;

    // Use a temporary variable to hold the thread ID, since strict
    // compiling will complain about pointer types if we attempt to
    // pass in a pointer to any other integer type.
    unsigned int id = 0;
    pTC->hThread  = reinterpret_cast<HANDLE>(_beginthreadex(NULL, 0, QzBaseThreadFunc, m_pContext, 0, &id));
    pTC->ThreadID = id;

    // NOTE: _beginthreadex returns NULL if it cannot create the thread
    // (as opposed to _beginthread, which returns the inconsistently
    // used INVALID_HANDLE_VALUE value if there is an error)

    return (NULL != pTC->hThread);
}

Also, be wary of the INVALID_HANDLE_VALUE symbol. _beginthreadex will return NULL if there is an error. Since it takes a lot of effort to force thread creation to fail, any error handling code you put in place is almost certainly never going to be exercised by test code, so you may never realize that the wrong value is being used when testing for errors. Confusion about when to use INVALID_HANDLE_VALUE is common among Win32 programmers (thanks to Microsoft's inconsistent use of it), and a lot of code has been written over the years with _beginthread, only later to have it changed over to using _beginthreadex, without careful checking of the return symbol.

Make certain any error handling code you write is testing for NULL.

In comparison, here is the MacOS version of the code:

bool QzThreadControl::StartThread(
            U32 (*func)(void*),
            void *pParams,
            const char threadName[])
{
    QzThreadContext_t *pTC = reinterpret_cast<QzThreadContext_t*>(m_pContext);

    // Do not attempt to start the thread if one is already running.
    if (NULL != pTC->hThread) {
        return false;
    }

    // The thread name needs to be ASCII, so we need to use standard string
    // copy functions instead of UTF-8 routines.  And since strncpy() is not
    // safe, we must explicitly make certain the copied string is terminated.
    strncpy(pTC->ThreadName, threadName, ArraySize(pTC->ThreadName));
    pTC->ThreadName[ArraySize(pTC->ThreadName)-1] = '\0';

    pTC->StopRequested = false;
    pTC->StopCompleted = false;
    pTC->pParams       = pParams;
    pTC->Func          = func;
    pTC->ThreadID      = QzThreadSafeIncrement(&g_NextThreadID);

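    // pthread_create returns zero on success.  On failure the contents
    // of hThread are undefined, so reset it before reporting the error.
    if (0 != pthread_create(&(pTC->hThread), NULL, QzBaseThreadFunc, m_pContext)) {
        pTC->hThread = NULL;
        return false;
    }

    return true;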
}

This is almost identical to the Win32 implementation, with two differences. The first is that we have to manufacture the ThreadID value, since pthreads do not have an equivalent value. We only do this to maintain compatibility with Win32's logic, so the thread ID can be written to the log and exposed to any code that needs to fetch it for testing the ID value. (It is possible to substitute a low-level process ID, if you are familiar enough with the Linux routines, but this information is not exposed by pthreads.)

The second difference, of course, is the call to pthread_create to create the worker thread.
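
The manufactured ThreadID can be sketched as follows. QzThreadSafeIncrement comes from the wrapper routines in the previous article and its implementation is not shown here, so this is only a stand-in: it assumes the routine returns the incremented value, and it relies on the __sync_add_and_fetch builtin available in GCC and Clang. All of the Sketch names are hypothetical.

```cpp
#include <cassert>

typedef unsigned int U32;   // assumed to match the library's U32

// Shared counter.  Zero is never handed out, so it can be used to
// mean "no thread".
static volatile U32 g_SketchNextThreadID = 0;

// Atomically adds one and returns the new value.  This is a stand-in
// for QzThreadSafeIncrement, built on a GCC/Clang builtin.
inline U32 SketchThreadSafeIncrement(volatile U32 *pValue)
{
    assert(pValue != 0);
    return __sync_add_and_fetch(pValue, 1);
}

// Hands out 1, 2, 3, ... with no duplicates, even when several
// threads request IDs at the same time.
inline U32 SketchAllocateThreadID(void)
{
    return SketchThreadSafeIncrement(&g_SketchNextThreadID);
}
```

The atomic increment matters because StartThread may be called from more than one thread; a plain ++ on the counter could hand the same ID to two workers.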

Signalling

Signalling a thread is a common operation. This is done any time the thread needs to wake up and process data. A common case is when using a message queue to send commands to the thread: after new messages have been pushed onto the queue, the thread is signalled to make sure it wakes up and processes all of the data in the queue.

There is already an abstraction for events, so the main thread only needs to signal that event after appending a message to the queue (or by whatever other technique is used to pass data to the worker thread for processing).

void QzThreadControl::SignalThread(void)
{
    QzThreadContext_t *pTC = reinterpret_cast<QzThreadContext_t*>(m_pContext);

    pTC->SignalEvent.Signal();
}

The worker thread would sit in a loop, waiting for events to process (see the example code at the end of this article for one possible way to implement the body of a worker thread that waits for events). It calls WaitForEvent to handle the waiting, which will return one of three results: signalled, timed out, or a stop request. The stop request indicates that the thread function needs to break out of its loop, clean up, and return so that the thread can be terminated.

The distinction between signalled and timed out is less important. Although signalled indicates that there is some kind of event to process, this is not a guaranteed state. Due to concurrency, it is possible that the "new" work to process was already processed on the previous iteration of the loop, before the event was signalled. Or the thread's wait may time out just before the main thread signals the thread to wake up, so new data is available when a time-out occurs. These conditions may not be likely, but they will occur. It might happen once every few seconds or only once a week, depending on the vagaries of timing in your app. As such, the simplest approach is to treat signalled and timed out as the same state: always process some work when the thread wakes up, regardless of the reason (unless a termination request has been received).

Since abstractions hide the platform-specific details, WaitForEvent is implemented the same on both platforms.

U32 QzThreadControl::WaitForEvent(U32 milliseconds)
{
    QzThreadContext_t *pTC = reinterpret_cast<QzThreadContext_t*>(m_pContext);

    // Test the termination flag before waiting.
    if (pTC->StopRequested) {
        return QzSyncWait_StopRequest;
    }

    U32 result = pTC->SignalEvent.Wait(milliseconds);

    // Test the termination flag once more after the wait completes.
    // We have no idea how long the wait required, so the flag may have
    // been set in the interim.  This requires that the flag be volatile
    // to avoid memory caching problems.
    if (pTC->StopRequested) {
        return QzSyncWait_StopRequest;
    }

    return result;
}

All the code really needs to do is check whether the termination flag is set. If not, it needs to wait for the sync event to become signalled.
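
WaitForEvent leans on QzSyncEvent, which was covered in the previous article. For readers without that code at hand, here is a rough sketch of how an auto-reset event with a timeout might be assembled from a pthread mutex and condition variable. SketchEvent and the SketchWait_* values are hypothetical, and the real QzSyncEvent may well differ.

```cpp
#include <cassert>
#include <errno.h>
#include <pthread.h>
#include <time.h>

typedef unsigned int U32;   // assumed to match the library's U32

// Hypothetical result codes standing in for the QzSyncWait_* values.
enum SketchWaitResult
{
    SketchWait_Signalled,
    SketchWait_Timeout,
    SketchWait_Error
};

class SketchEvent
{
private:
    pthread_mutex_t m_Mutex;
    pthread_cond_t  m_Cond;
    bool            m_Signalled;    // only touched with m_Mutex held

    // Not copyable: pthread objects must not be copied.
    SketchEvent(const SketchEvent&);
    SketchEvent& operator=(const SketchEvent&);

public:
    SketchEvent(void)
        :   m_Signalled(false)
    {
        int err = pthread_mutex_init(&m_Mutex, NULL);
        assert(0 == err);
        err = pthread_cond_init(&m_Cond, NULL);
        assert(0 == err);
        (void)err;
    }

    ~SketchEvent(void)
    {
        pthread_cond_destroy(&m_Cond);
        pthread_mutex_destroy(&m_Mutex);
    }

    void Signal(void)
    {
        pthread_mutex_lock(&m_Mutex);
        m_Signalled = true;
        pthread_cond_signal(&m_Cond);
        pthread_mutex_unlock(&m_Mutex);
    }

    U32 Wait(U32 milliseconds)
    {
        // pthread_cond_timedwait takes an absolute deadline.
        timespec deadline;
        if (0 != clock_gettime(CLOCK_REALTIME, &deadline)) {
            return SketchWait_Error;
        }
        deadline.tv_sec  += milliseconds / 1000;
        deadline.tv_nsec += long(milliseconds % 1000) * 1000000L;
        if (deadline.tv_nsec >= 1000000000L) {
            deadline.tv_sec  += 1;
            deadline.tv_nsec -= 1000000000L;
        }

        pthread_mutex_lock(&m_Mutex);

        // Loop to guard against spurious wake-ups: only the flag
        // counts, not the fact that the wait returned.
        int err = 0;
        while (!m_Signalled && (0 == err)) {
            if (0xFFFFFFFF == milliseconds) {
                err = pthread_cond_wait(&m_Cond, &m_Mutex);
            }
            else {
                err = pthread_cond_timedwait(&m_Cond, &m_Mutex, &deadline);
            }
        }

        U32 result = SketchWait_Error;
        if (m_Signalled) {
            m_Signalled = false;    // auto-reset: consume the signal
            result = SketchWait_Signalled;
        }
        else if (ETIMEDOUT == err) {
            result = SketchWait_Timeout;
        }

        pthread_mutex_unlock(&m_Mutex);
        return result;
    }
};
```

Because the signalled state is stored in a flag rather than in the condition variable itself, a Signal that arrives before the worker starts waiting is not lost; the next Wait returns immediately.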

Stopping Threads

Stopping a thread works the same as signalling the thread. The difference is that the termination flag needs to be set before signalling the sync event.

void QzThreadControl::StopThread(void)
{
    QzThreadContext_t *pTC = reinterpret_cast<QzThreadContext_t*>(m_pContext);

    // First, set this volatile flag.  WaitForEvent() will test this
    // flag to determine whether the thread should terminate.
    pTC->StopRequested = true;

    // Then we signal this event.  This will wake up the worker thread
    // so it can check the status of the StopRequested flag.
    pTC->SignalEvent.Signal();
}

By setting the termination flag, WaitForEvent will detect the termination request and return the stop request enum value.

The final step in terminating a thread is to wait for the thread to enter a fully terminated state. Win32 and pthreads both provide functions that can be used to wait for the thread to stop. The difference between them is that pthread_join will never return unless the thread terminates. However, Win32 exposes thread termination as another event that can be tested by WaitForSingleObject. Since it is possible to provide a short timeout for WaitForSingleObject, the Win32 API allows an app to periodically poll whether a thread has terminated, then go do something else if the thread is still running.

Since pthread_join will wait forever, a generic thread wrapper class should adopt the "wait forever" approach even when using Win32 threads. This is a good approach, since many programmers (myself included) have used this timeout period to assume that a thread is hung, and then proceed with shutting down the app. The danger is that a still-running thread can wake up during the app shut down sequence and attempt to access resources that are being deleted, causing the app to crash while shutting down. Some users may not notice or care about this condition, but any good SQA group will detect this and write it up in the bug database.

The problem is that if the app takes a long time to shut down, you may feel inclined to use a short wait so the app shuts down faster when there is a very high CPU load. This is a case where SQA or management may try to browbeat you into reducing the wait so it times out after a short period of time, allowing the app to shut down faster when being subjected to stress tests. This only results in more bug reports from SQA when the app periodically crashes while shutting down, and you have to choose between "shutdown is slow" versus "sometimes crashes when shutting down". Been there, done that, resolved the bug reports.

There is no good answer to a worker thread that deadlocks or crashes. Waiting forever for the thread to terminate can cause the whole app to deadlock. Abandoning the wait so the app can terminate may give a thread enough time to wake up and access deleted resources. Neither choice is good, and either way, the app must shut down or risk letting things get worse. The only real answer is to never let worker threads crash or deadlock — in other words, the answer to that problem is not something this article can address.

The implementation for WaitForClosure is almost identical for Win32 and MacOS.

The difference is that once WaitForSingleObject has returned, the Win32 version needs to call CloseHandle to release the handle. Otherwise the handle will never be freed, a minor resource leak that could result in the app running out of handle values if an extremely large number of threads are created and destroyed over the lifetime of the app.

U32 QzThreadControl::WaitForClosure(void)
{
    QzThreadContext_t *pTC = reinterpret_cast<QzThreadContext_t*>(m_pContext);

    U32 result = QzSyncWait_Signalled;

    // WaitForClosure() may be called more than once to assure
    // that a thread is really stopped, as part of fall-back
    // error handling when shutting down.  So this pointer is
    // NULL after the thread has been stopped.
    //
    if (NULL != pTC->hThread) {
        U32 flag = WaitForSingleObject(pTC->hThread, INFINITE);

        // Once the thread is done executing, we need to close
        // the handle to release the associated resources, and
        // so the rest of the logic in this class can detect
        // that the thread has finished.
        CloseHandle(pTC->hThread);
        pTC->hThread = NULL;

        if (WAIT_OBJECT_0 == flag) {
            result = QzSyncWait_Signalled;
        }
        else if (WAIT_TIMEOUT == flag) {
            result = QzSyncWait_Timeout;
        }
        else {
            result = QzSyncWait_Error;
        }
    }

    return result;
}

The pthread version of WaitForClosure is nearly the same, except that it uses pthread_join to wait for the thread to terminate. Since pthread_join is inherently an infinite wait, we don't need to worry about timing out.

U32 QzThreadControl::WaitForClosure(void)
{
    QzThreadContext_t *pTC = reinterpret_cast<QzThreadContext_t*>(m_pContext);

    U32 result = QzSyncWait_Signalled;

    // WaitForClosure() may be called more than once to assure
    // that a thread is really stopped, as part of fall-back
    // error handling when shutting down.  So this pointer is
    // NULL after the thread has been stopped.
    //
    if (NULL != pTC->hThread) {
        // This gets assigned the void* pointer that is returned by
        // the function pointer that was passed to StartThread.
        void *pReturnedVoid = NULL;

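        // Wait for the thread to terminate.  The error code goes in
        // its own variable so it does not shadow the result declared
        // above, which is what this method returns.
        S32 err = pthread_join(pTC->hThread, &pReturnedVoid);

        pTC->hThread = NULL;

        if (0 == err) {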
            result = QzSyncWait_Signalled;
        }
        else {
            result = QzSyncWait_Error;
        }
    }

    return result;
}

Example Thread

An example worker thread is provided in the QzTest project (consult the TestThread.cpp source file for more comments). This example is simple, but demonstrates how all of the above code is used by a worker thread. All the thread does is wake up every 250 milliseconds, increment a counter, then go back to sleep. Visually, you will see that the thread is working because there is a little spinning character at the top of the screen.

class TestThread
{
private:
    volatile S32 m_Counter;

    QzThreadControl m_Thread;

public:
    TestThread(void)
        :    m_Counter(0)
    {
    }

    ~TestThread(void)
    {
        // Always make certain that the worker thread has been terminated.
        DestroyWorkerThread();
    }

    void CreateWorkerThread(void)
    {
        m_Thread.StartThread(StaticThreadFunc, this, "ThreadFunc");
    }

    void DestroyWorkerThread(void)
    {
        if (false == m_Thread.IsThreadAlive()) {
            return;
        }

        // Request the thread to stop.
        m_Thread.StopThread();

        // Now wait for the thread to wake up, process the
        // termination request, and exit.
        if (QzSyncWait_Signalled == m_Thread.WaitForClosure()) {
            LogMessage("ThreadFunc is dead");
        }
        else {
            LogErrorMessage("ThreadFunc did not terminate");
        }
    }

    static U32 StaticThreadFunc(void *pContext)
    {
        TestThread *p = reinterpret_cast<TestThread*>(pContext);
        return p->ThreadFunc();
    }

    U32 ThreadFunc(void)
    {
        for (;;) {
            // Set the time-out duration so the thread will wake up
            // and rotate through all four states once every second.
            U32 result = m_Thread.WaitForEvent(250);

            if (QzSyncWait_Error == result) {
                LogErrorMessage("ThreadFunc wait failed");
                break;
            }

            if (QzSyncWait_StopRequest == result) {
                LogMessage("ThreadFunc detected termination request");
                break;
            }

            // Cycle through four possible counter values.
            m_Counter = (m_Counter + 1) % 4;
        }

        return 0;
    }

    S32 GetCounter(void)
    {
        return m_Counter;
    }
};

Final Warnings

Never call _endthreadex. It will kill the thread without allowing the stack to unwind. This prevents the destructors for stack objects from being called, which can result in memory and resource leaks. Unfortunately, some of the MSDN documentation shows this function being used to terminate a thread, which has resulted in many programs getting written that make this mistake. (Of course, if the thread function does not have any local objects, then it is possible for this to be a safe way to terminate a thread, but as a rule, avoid this function. If you are writing C++ code, local objects are almost guaranteed to be added to the thread function at some point — even if avoided in the initial version of the code, someone maintaining the code can easily add them without realizing the problems that will result.)

Do not use _beginthread to create a thread. First, _beginthread will close the thread handle when the thread exits, which prevents the app from detecting when the thread has terminated via WaitForSingleObject. More subtly, _beginthread may not return a valid handle to begin with, if the worker thread can terminate fast enough. The odds of this second case happening are very small, but possible — the worker thread needs to start, run, and exit before the call to _beginthread returns.

Never use the Win32 CreateThread function. This suffers from potential memory leaks due to standard library functions. Certain functions (such as strtok) require thread local storage to store state information between calls. This TLS will only be allocated the first time one of these standard functions is called, but it will not be freed when the thread terminates. The size of the leak is small, but will add up over the lifetime of the app if a very large number of threads are created and destroyed. This problem does not exist with _beginthreadex; use that function instead of CreateThread.