Symmetric Multi-Processing on Haiku – A Programming Example

By Andrew Hudson

Multiple Threads and Multiple Cores

BeOS and Haiku were designed from the start to run multiple threads on multiple cores.  The BeOS application framework was designed to run multiple threads from its inception and the first workstation that ran BeOS had 2 processors.  Some 20 years after BeOS was first released PCs with multiple cores are now common place, but applications are still struggling to effectively make use of extra cores.  This article will explore one coding technique that allows you to easily use all cores.

First of all, what is a thread? Wikipedia defines a thread as, “the smallest sequence of programmed instructions that can be managed independently by a scheduler”.  One or more threads can run within a process. A process is the basis for multi-tasking and time-sharing. A process has a context or process state, which consists of all the CPU registers, the code, the data, the call stack, open files, etc.  Threads run within a process and share process resources such as variables, memory, open files. 

Threads are useful for managing I/O. Why have your whole application process lock up while waiting for input from a device or a user?  Threads in Haiku make the user interface very responsive, even when the system is under load.

Thread Functions

Threads under Haiku can be explicitly created, controlled, and deleted with a variety of functions:

spawn_thread() – spawn a new thread

find_thread() – look for a thread using its name

get_thread_info() – retrieve info on a thread

resume_thread() – start a thread and return to process, thread runs asynchronously

wait_for_thread() – start a thread and wait for it to complete, thread runs synchronously

snooze() – put a thread to sleep for a time

suspend_thread() – suspend execution of a thread

exit_thread() – kill the calling thread

 kill_thread() – kill some other thread

Additional  functions can be found in the BeBook.

The Challenges of Thread Programming

Usually programming with threads means very careful attention to a whole variety of issues. Synchronization, race conditions, deadlocks, variables going out of scope, are just some of the many things that can cause bugs. Multi-threaded programming bugs can be very subtle. A program might run fine on your development system, but become intermittently erratic on another PC.  Or it might run fine 19 out of 20 times, and deliver an incorrect result once out of 20 runs.  Multi-threaded programs can also be difficult to debug because the debugger can alter the threading state, changing the conditions that caused the bug. Fortunately, few of these conditions apply to this example.

Haiku’s New Scheduler

In February of 2014, Haiku updated its scheduler. Up to that point, BeOS and Haiku had been limited to a maximum of 8 cores. The new scheduler allows you can use up to 64 cores. As someone interested in graphics, multiple cores, and Haiku, I thought it would be fun to write a Haiku application that would use multiple cores and create some fun graphics. This was the start of BeTracer, a ray tracer using symmetric multi-processing.

Using an Old Technique - Mandelbrot

Mandelbrot is a nice demo that ships with Haiku and nicely shows off symmetrical multi-processing for 2 cores. If you haven’t tried it recently, watch carefully the next time you run it. You will notice that the screen updates two separate rows at a time. Mandelbrot uses a simple technique for multiple threads. It does this by duplicating the same rendering code into two functions, and calls one for even lines and the other for odd lines.

 

void TShowBit::mand(double vx, double vy, double sx, double sy)

{

    vvx = vx; vvy = vy; ssx = sx;

    t1_done = 0; t2_done = 0;

 

    precompute(vx, vy, sx, sy);

   

    resume_thread(spawn_thread(__calc1, "calc1", B_NORMAL_PRIORITY, NULL));

    resume_thread(spawn_thread(__calc2, "calc2", B_NORMAL_PRIORITY, NULL));

    busy = TRUE;

}

__calc1 and __calc2 are pointers to functions that are identical. Line for line they are the same code.

The result is that the Mandelbrot demo runs twice as fast when you have 2 cores. The downside is that to support additional cores, you have to duplicate the rendering code that many more times. As an exercise I rewrote the demo to use 4 functions and use 4 cores. It did work, but it does not scale well and it is not elegant. It’s not how the professionals do rendering.

Using a BeOS Pattern

Haiku is a great platform to learn programming. There are a lot of great programming resources to be found with a quick search. I highly recommend browsing through the old Be Newsletters, and looking through the old BeOS sample code. There are plenty of code samples with explanations by the original BeOS team. While browsing through the old Be Newsletters, I came across a bare-bones C++ class that would launch multiple threads. When I read it, a light bulb went off in my head. This was what I needed to implement symmetric multi-processing code.

The class is called ThreadPrimitive and it packs a lot into one class:

class ThreadPrimitive {

public:

 

  ThreadPrimitive(int32 priority = B_LOW_PRIORITY, const char *name = 0):  scanThread(-1), priority(priority), name(name)

    {

    }

 

  virtual ~ThreadPrimitive(){

      if (scanThread > 0) {

        kill_thread(scanThread);

        ASSERT(!"should not be here");

      }

    }

 

  void Go()

    {

      scanThread = spawn_thread(&ThreadPrimitive::RunBinder,

        name ? name : "UntitledThread", priority, this);

      resume_thread(scanThread);

    }

 

  virtual void Run() = 0;

 

private:

 

  static status_t RunBinder(void *castToThis)

    {

      // In this call we do the dirty casting work, making

      // the rest of the interfaces fully typed and clean

 

      ThreadPrimitive *self = (ThreadPrimitive *)castToThis;

      self->Run();

 

      return B_OK;

    }

 

protected:

  thread_id scanThread;

  int32 priority;

 

private:

  const char *name;  // only valid in the constructor and in

                     // the Go call

};

 

ThreadPrimitive is an abstract base class and has two important aspects. The constructor and Go()  method set up the thread context, and Run() is subclassed with code for useful work.  In other words, your code inherits from ThreadPrimitive, and your code goes in Run().  RunBinder() takes Run(), casts it to a function pointer, and which is passed to the thread for execution. In this way you can call your own code with as many threads as you want with whatever parameters you need. This is what I used for the basis of my code.

Adding Threads To Ray Tracing

Ray tracing is a very cool technique used for rendering photorealistic 3D scenes. It is based on an algorithm for firing off light rays from a source, bouncing them off 3D objects, and tracing the ray of light back to a point of view. It is considered to be an “embarrassingly easy” algorithm to parallelize.  This means it does not require any of the usual methods used to synchronize and control threads. Some of those techniques are discussed here.  Raytracing was first made popular with the famous juggling balls demo on the Amiga.

 I didn’t want to re-invent the ray tracing wheel, as it were, so I began looking at source code I could use for ray tracing. I looked at, ported, and discarded 5 implementations before settling on Micha Riser’s Ray-Tracer code. This was after stumbling across Micha’s long lost BeOS implementation of POV3.1. 

Micha’s code was well-written, easily ported, and single threaded. The key to converting this code to multiple threads was to map an equal portion of each image rendering onto each thread. This was done by parameterizing each thread call with a sub class named RenderThread.  RenderThread would know how big the image was, how many threads there were, and parcel an equal piece of the image to each thread. The scheduler would then run each thread on a separate core. The cool thing about multi-threaded ray tracing is that you watch each thread render its results to the screen in real time. Each additional processor you add means the rendering is that much faster.

 

From MainWindow.cpp, we start a thread for each core, pass in enough parameters for processing:

for (i=0; i<NumThreads; i++){

    RenderThread::Launch(scene, pout, XRes, YRes, Xloc, Yloc, Zloc,

        NumThreads, i, (uchar *)theBitmap->Bits(),

        B_LOW_PRIORITY, "thread");

}

               

    From RenderThread.cpp, the inner loop of rendering, for each thread:

//do rendering    

void RenderThread::Run(){

     int y,x;

     int Ychunk, Ystart, Yend;

     uchar *b0;

     b0 = MyBits;

   

     // outer loop

     // Calculate offsets for thread image portion

     Ychunk = YRes / MyTotThreads;

     Ystart = MyWhichThread * Ychunk;

     Yend = Ystart + Ychunk;

 

     for (y=Ystart; y<Yend; y++){

           b0 = MyBits + (y*XRes*4); // pointer into thread’s section

           for (x=xloc; x<XRes+xloc; x++){

           // inner loop

               // create a point of view vector of X,Y,Z

                Vector3 pos( ((DBL)x+((DBL).5))/XRes, ((DBL)1)-((DBL)y+yloc+((DBL)0.5))/YRes, (DBL)zloc);

 

               // trace the ray from light to POV

                ColourA res = imageplane->evaluateAt(pos);

 

               // once calculated, update bitmap with R,G,B

               *b0++ = (uint)((std::min(res.c[2],(CLR)1)*255));

               *b0++ = (uint)((std::min(res.c[1],(CLR)1)*255));

               *b0++ = (uint)((std::min(res.c[0],(CLR)1)*255));

               *b0++;

     }

 

     // redraw image at the end of every row

     // lock the bitmap so only one thread updates the screen

     // at a time

     theBitmap->Lock();

     povView->LockLooper();

     povView->Invalidate(BRect(0,y,x,y));

     povView->UnlockLooper();

     theBitmap->Unlock();

 

     }

     delete this;

 }

Adding a GUI

I added a simple GUI to the ray tracing code to allow choices for screen size, number of threads, scene selection, basic motion control, and to display the image. The technique for display is to create a single bitmap that would be updated by each thread. Normally you would not want multiple threads updating an object willy-nilly. This would be a source of contention, resulting in lost writes. Normally you would need to enforce orderly updates using semaphores and locks. But in this case each thread is guaranteed to update only its section of the bitmap. For efficiency sake the bitmap is only redrawn at the completion of each line.

Screen Shot 1 – Basic Interface

 

Screen Shot 2 – Shows rendering with 4 threads

Future Directions: Real Time Rendering, Cloud Computing, GPU

Micha’s code was a great place to start. It had the benefits of being portable and very modular. What it didn’t have was the ability to read in scene files.  It would be nice to create additional 3D scenes and import them. The code is also not the fastest. Other code such as Intel’s Embree  code are faster, and bring the promise of real-time rendering within reach.

Another direction is to de-couple the GUI from the rendering, and implement a client/server approach. The GUI would act as a front end client that requests the rendering from another, presumably faster server. The server would render the image and return portions of the image when completed. This is how commercial rendering is done, with thousands of servers sharing the rendering.  By running Haiku ray trace servers remotely, in  Amazon AWS, for instance, it would be possible to introduce Haiku to cloud computing.

It should also be possible to add code that uses the processing power of a video card, or GPU, to compute the ray tracing.

The method to assign image portions to threads in BeTrace is the simplest possible. The down side is that it is not always efficient. A thread could have a portion of an image with little to render. It will finish early and the core will go idle.  A smarter subdivision method would assign smaller portions of the image for more efficient overall usage of CPU. For instance, on a line by line basis.

POV-Ray and Blender are two open source 3D development applications. It would be great to port them to Haiku.

BeTracer source code is provided in a link in the appendix.

Appendix

 

Wikipedia Definition of Threads

https://en.wikipedia.org/wiki/Thread_(computing)

Be Newsletters

https://www.haiku-os.org/legacy-docs/benewsletter/

Haiku Scheduler

https://www.haiku-os.org/blog/pawe%C5%82_dziepak/2013-12-20_haiku_meets_9th_processor

Fun with threads

What Makes Multi-threaded Programming Hard?

http://www.futurechips.org/tips-for-power-coders/parallel-programming.html

Haiku Mandelbrot Code

https://github.com/luciang/haiku/tree/master/src/apps/mandelbrot

Archived Be Newsletters

https://www.haiku-os.org/legacy-docs/benewsletter/Issue3-32.html

Slider code

https://www.haiku-os.org/legacy-docs/benewsletter/Issue3-9.html

More slider code, Dynagraph sample code

https://www.haiku-os.org/legacy-docs/benewsletter/Issue3-22.html

Intel Embree Optimized Ray Tracing Kernels

http://embree.github.io/

POV-Ray

http://www.povray.org/download/

Blender

https://www.blender.org/download/

BeTracer Source Code

https://onedrive.live.com/redir?resid=23646DEFC0827C85!33167&authkey=!AC9ascWSzjh5Nik&ithint=file%2ctar

A Programmers Introduction to Haiku

http://www.osnews.com/story/24945/A_Programmer_s_Introduction_to_the_Haiku_OS

Writing Applications for Haiku

http://www.osnews.com/story/22903/Writing_Applications_for_Haiku