This project is at its core a systems programming example that demonstrates the use of shared memory, semaphores, signal handling and pthreads in C. It consists of an "ImageWriter" program and a "PixelGenerator" program. The "PixelGenerator" continuously calculates images of the Mandelbrot set and writes them into a shared memory segment. The "ImageWriter" program reads the image data out of the shared memory segment and stores the image in a P6 ppm file.
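To illustrate the P6 ppm output mentioned above, here is a minimal sketch of writing raw RGB data to a binary PPM file. The function name `write_ppm_p6` and the assumption of 8 bits per colour channel are illustrative and not taken from the project's source.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: writes width * height RGB triplets (8 bits per
 * channel) as a binary P6 PPM file. Returns 0 on success, -1 on failure. */
int write_ppm_p6(const char *path, int width, int height,
                 const unsigned char *rgb)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL) {
        return -1;
    }

    size_t count = (size_t)width * (size_t)height * 3;

    /* P6 header: magic number, width and height, maximum colour value. */
    int ok = fprintf(fp, "P6\n%d %d\n255\n", width, height) > 0
          && fwrite(rgb, 1, count, fp) == count;

    if (fclose(fp) != 0) {
        ok = 0;
    }
    return ok ? 0 : -1;
}
```

A caller would pass the pixel buffer read from the shared memory segment, for example `write_ppm_p6("image_0001.ppm", 800, 600, pixels)`, where the file name is only an example.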
The shared memory segment can hold one image at a time. The "PixelGenerator" and "ImageWriter" are synchronized by semaphores to avoid simultaneous access of the shared memory segment.
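The synchronization described above can be sketched with a pair of System V semaphores: one counts free slots in the segment, the other counts finished images. This is a sketch under assumptions; the README does not say which semaphore API the project uses, and all names below (`SEM_EMPTY`, `SEM_FULL`, `create_sync_set`, `begin_write` and so on) are hypothetical.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Hypothetical indices into a set of two semaphores. */
enum { SEM_EMPTY = 0, SEM_FULL = 1 };

/* On Linux the caller has to define union semun itself;
 * macOS already declares it in <sys/sem.h>. */
#if !defined(__APPLE__)
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};
#endif

/* Adds delta to one semaphore; blocks while the result would be negative. */
static int sem_change(int semid, unsigned short index, short delta)
{
    struct sembuf op = { .sem_num = index, .sem_op = delta, .sem_flg = 0 };
    return semop(semid, &op, 1);
}

/* Creates the set: the segment starts empty (EMPTY = 1, FULL = 0). */
int create_sync_set(void)
{
    int semid = semget(IPC_PRIVATE, 2, IPC_CREAT | 0600);
    if (semid == -1) {
        return -1;
    }
    union semun arg;
    arg.val = 1;
    if (semctl(semid, SEM_EMPTY, SETVAL, arg) == -1) {
        return -1;
    }
    arg.val = 0;
    if (semctl(semid, SEM_FULL, SETVAL, arg) == -1) {
        return -1;
    }
    return semid;
}

/* Generator side: wait for a free segment, then publish the image. */
int begin_write(int semid) { return sem_change(semid, SEM_EMPTY, -1); }
int end_write(int semid)   { return sem_change(semid, SEM_FULL, +1); }

/* Writer side: wait for a finished image, then release the segment. */
int begin_read(int semid)  { return sem_change(semid, SEM_FULL, -1); }
int end_read(int semid)    { return sem_change(semid, SEM_EMPTY, +1); }
```

Two separate programs would have to share the set through a common key instead of `IPC_PRIVATE`; the private key is used here only to keep the sketch self-contained.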
The "PixelGenerator" generates an image of the Mandelbrot set and alters the start parameters each time a new image is calculated, zooming into a section of the Mandelbrot set.
The computation of the Mandelbrot set can be done by multiple threads. The number of threads used to process the image, as well as the width and height of the image, can be changed. Depending on the specified number of threads, the computation of one image is either done by a single thread or split across several threads. For example: if the number of threads is set to 4, every thread generates a quarter of the image. If the image width is set to 800 and the image height to 600 pixels, thread one will calculate image rows 0 to 149, thread two rows 150 to 299, thread three rows 300 to 449 and thread four rows 450 to 599.
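The row split from the example above can be expressed as a small helper. This is an illustrative sketch; the function name `rows_for_thread` and its error convention are assumptions, not the project's actual code.

```c
/* Hypothetical helper: computes the inclusive row range handled by one
 * thread. thread_index counts from 0. Returns 0 on success, or -1 if the
 * arguments are invalid or height is not evenly divisible by num_threads. */
int rows_for_thread(int height, int num_threads, int thread_index,
                    int *first_row, int *last_row)
{
    if (num_threads <= 0 || thread_index < 0 ||
        thread_index >= num_threads || height % num_threads != 0) {
        return -1;
    }

    int rows_per_thread = height / num_threads;
    *first_row = thread_index * rows_per_thread;
    *last_row = *first_row + rows_per_thread - 1;
    return 0;
}
```

With a height of 600 and 4 threads, thread index 0 gets rows 0 to 149 and thread index 3 gets rows 450 to 599, matching the example in the text.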
This project has been extended to use the OpenMP library or the OpenCL framework to calculate the image in parallel instead of using pthreads. With OpenMP, the number of threads is set automatically depending on the number of threads supported by the CPU, and the processing of the image is divided among the available threads automatically. With OpenCL, it can be specified whether the calculation of the image should be done by the CPU or the GPU.
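As a rough illustration of the OpenMP approach, a single pragma on the row loop lets the runtime distribute rows over the available threads. The function names, the coordinate mapping and the `schedule(dynamic)` clause are assumptions made for this sketch; the project's own loop may look different.

```c
#include <stddef.h>

/* Escape-time iteration for one point c = cr + i*ci (illustrative). */
static int escape_iterations(double cr, double ci, int max_iter)
{
    double zr = 0.0, zi = 0.0;
    int n = 0;
    while (n < max_iter && zr * zr + zi * zi <= 4.0) {
        double t = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = t;
        n++;
    }
    return n;
}

/* Fills out[y * width + x] with iteration counts for the region
 * [-2, 1) x [-1, 1). Rows are independent, so OpenMP can hand them to
 * different threads. Without -fopenmp the pragma is ignored and the
 * loop simply runs serially. */
void render_rows_openmp(int *out, int width, int height, int max_iter)
{
    #pragma omp parallel for schedule(dynamic)
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            double cr = -2.0 + 3.0 * x / width;
            double ci = -1.0 + 2.0 * y / height;
            out[(size_t)y * width + x] = escape_iterations(cr, ci, max_iter);
        }
    }
}
```

A dynamic schedule is chosen here because rows inside the set take longer than rows outside it; a static schedule would also be valid.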
When pthreads or the OpenMP library are used, the threads run truly in parallel, without context switching between them, if the CPU has enough cores to execute all of the specified threads at the same time.
At least in the pthreads version of the program, each thread can only calculate one pixel at a time. By using SIMD ("Single Instruction, Multiple Data") intrinsics, however, each thread can process multiple pixels at the same time. The Mandelbrot set is obtained from the quadratic recurrence equation z_{n+1} = z_n^2 + C. It is calculated by applying the same mathematical operations to each pixel of the image over and over again. Interim results are stored in 64-bit double variables. Using SIMD intrinsics for SSE, a 128-bit wide register can be loaded with two 64-bit double variables. This makes it possible to operate on two variables (two pixels) at once by performing the calculations on the 128-bit register, instead of applying the same operations successively to each variable (each pixel). Using SIMD intrinsics for AVX, 256-bit registers are available, so four 64-bit variables can be processed at once.
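The two-pixels-per-register idea can be sketched as follows. A scalar version of the recurrence is shown next to a version that iterates two points in one 128-bit register. This is not the project's implementation: the function names are hypothetical, the code requires an x86 CPU, and the packed-double intrinsics used here come from the SSE2 header `<emmintrin.h>`.

```c
#include <emmintrin.h>  /* SSE2 intrinsics for packed doubles */

/* Scalar reference: iterations until |z|^2 > 4, capped at max_iter. */
int mandel_scalar(double cr, double ci, int max_iter)
{
    double zr = 0.0, zi = 0.0;
    int n = 0;
    while (n < max_iter && zr * zr + zi * zi <= 4.0) {
        double t = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = t;
        n++;
    }
    return n;
}

/* Iterates two points at once; each lane of the register is one pixel. */
void mandel_sse2(const double cr[2], const double ci[2], int max_iter,
                 int out[2])
{
    __m128d vcr = _mm_loadu_pd(cr);
    __m128d vci = _mm_loadu_pd(ci);
    __m128d zr = _mm_setzero_pd();
    __m128d zi = _mm_setzero_pd();
    __m128d count = _mm_setzero_pd();
    const __m128d four = _mm_set1_pd(4.0);
    const __m128d one = _mm_set1_pd(1.0);

    for (int n = 0; n < max_iter; n++) {
        __m128d zr2 = _mm_mul_pd(zr, zr);
        __m128d zi2 = _mm_mul_pd(zi, zi);

        /* Lane mask: all bits set where |z|^2 <= 4 still holds. */
        __m128d active = _mm_cmple_pd(_mm_add_pd(zr2, zi2), four);
        if (_mm_movemask_pd(active) == 0) {
            break;  /* both pixels have escaped */
        }

        /* Only lanes that are still active get their counter increased. */
        count = _mm_add_pd(count, _mm_and_pd(active, one));

        __m128d zri = _mm_mul_pd(zr, zi);
        zr = _mm_add_pd(_mm_sub_pd(zr2, zi2), vcr);
        zi = _mm_add_pd(_mm_add_pd(zri, zri), vci);
    }

    double c[2];
    _mm_storeu_pd(c, count);
    out[0] = (int)c[0];
    out[1] = (int)c[1];
}
```

The termination condition is the interesting part: the loop only stops when both lanes have escaped, and the mask keeps an escaped lane from counting further while the other lane continues.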
Note: All of the above-mentioned methods of parallelization are implemented solely to calculate the pixels of one image in parallel, not to generate several images at once.
1.2.1
See CHANGELOG for details.
Note: After having been introduced to pthreads, OpenMP, OpenCL and SIMD, this was a first attempt at implementing these techniques to improve the performance of the image generation. The implementation is mostly trial and error and should therefore not be viewed as the right way to use these methods.
The Image-Generator folder contains five subdirectories each representing a full version of the "ImageWriter" and "PixelGenerator" program with respective changes to the image generation routine of the "PixelGenerator":
Each folder contains a makefile and also three subdirectories "PixelGenerator", "ImageWriter" and "shared" that contain the source code. Typing make will generate two executables:
- pixelGenerator.out
- imageWriter.out
The subdirectories "PixelGenerator" and "ImageWriter" each hold a "src" and an "include" directory as well as a makefile. The subdirectory "shared" holds a "src" and an "include" directory for files used by both programs. The makefile at the top level changes into both subdirectories, "ImageWriter" and "PixelGenerator", and executes the make command there.
Start each program in a separate terminal window or tab and the "ImageWriter" will start dumping images into your current directory.
In addition to the ImageWriter, an SDL_Viewer has been added. See 99_SDL_Viewer for more details.
To quit the programs, press "ctrl-c", as both programs run in an endless loop. Terminating the "PixelGenerator" by pressing ctrl-c will automatically shut down the "ImageWriter" program.
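A common way to leave an endless loop cleanly on ctrl-c is a SIGINT handler that only sets a flag, which the main loop then checks. The sketch below shows that pattern; the names are hypothetical, and it does not show how the project shuts down the "ImageWriter" from the "PixelGenerator".

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler, polled by the main loop. */
static volatile sig_atomic_t stop_requested = 0;

/* Handlers should only do async-signal-safe work, so just set the flag. */
static void on_sigint(int signo)
{
    (void)signo;
    stop_requested = 1;
}

/* Installs the SIGINT handler. Returns 0 on success, -1 on failure. */
int install_sigint_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/* Returns nonzero once ctrl-c has been pressed. */
int should_stop(void)
{
    return stop_requested != 0;
}
```

The main loop would then run as `while (!should_stop()) { ... }` and release the shared memory segment and semaphores after the loop ends.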
Tested on OSX 10.9.5 and Fedora 23 with image WIDTH and HEIGHT set to 800x600 and 2560x1920, and with the number of threads specified in thread_handler.h set to 1, 2, 4 and 8.
Nothing to do, you are good to go.
The makefile will link -lpthread
Note: Compiler optimizations are set to -O3 -mavx -ffast-math to see how well the compiler can optimize the code compared to the version of the program written explicitly with SIMD AVX intrinsics.
On OSX 10.9.5 you will need to install "clang-omp". On OSX the makefile sets the compiler to "clang-omp".
The makefile will link -fopenmp
Note: Compiler optimizations are set to -O3 -mavx -ffast-math to see how well the compiler can optimize the code compared to the version of the program written explicitly with SIMD AVX intrinsics.
On OSX OpenCL works out of the box. If you want to run this on Linux you will need the OpenCL headers and library to compile the code and the OpenCL driver runtime to be able to run the executable. For that purpose you could install the Intel OpenCL SDK and Intel OpenCL Runtime.
On Linux the makefile will link -lOpenCL
On OSX the makefile will link -framework OpenCL
On OSX you may have to add /Library/Frameworks directory to your search path:
echo 'export PATH=$PATH:/Library/Frameworks' >> ~/.bash_profile
As long as your CPU supports SSE you are good to go.
The makefile will link -lpthread and set optimizations to -O3 -msse -ffast-math
As long as your CPU supports AVX you are good to go.
The makefile will link -lpthread and set optimizations to -O3 -mavx -ffast-math
Number of images after about one minute of execution on a 2012 MacBook Pro with an i7-3615QM @ 2.3 GHz and a GeForce GT 650M:
| Method of parallelization | # of images after about 1 minute on OSX 10.9.5 | # of images after about 1 minute on Fedora 23 |
|---|---|---|
| pthread | 185 | 182 |
| pthread -O3 -mavx -ffast-math | 331 | 335 |
| OpenMP | 180 | x |
| OpenMP -O3 -mavx -ffast-math | 323 | 319 |
| OpenCL CPU | 554 | 684 |
| OpenCL GPU GeForce GT 650M | 363 | x |
| pthread-SIMD-SSE | 510 | 502 |
| pthread-SIMD-AVX | 847 | 850 |

| Method of parallelization | # of images after about 1 minute on OSX 10.9.5 | # of images after about 1 minute on Fedora 23 |
|---|---|---|
| pthread | 55 | 55 |
| pthread -O3 -mavx -ffast-math | 72 | 71 |
| OpenMP | 55 | x |
| OpenMP -O3 -mavx -ffast-math | 72 | 70 |
| OpenCL CPU | 111 | 133 |
| OpenCL GPU GeForce GT 650M | 82 | x |
| pthread-SIMD-SSE | 97 | 96 |
| pthread-SIMD-AVX | 144 | 146 |
Note: The OpenCL performance could probably be improved by making better use of the OpenCL memory model.
For the versions using pthreads, the number of threads is set to 8 in thread_handler.h, but changing it to 1, 2 or 4 is also possible. If you want to choose a custom image size, make sure that the HEIGHT of the image divided by the number of threads you want to use results in an integer value.
For the OpenCL version, you can choose whether the kernel is executed on the CPU or the GPU by changing COMPUTE_DEVICE in setup_OpenCL.h.
In the file numberOfPixel.h, LARGE_IMAGE can be set to 1 or 0. Setting LARGE_IMAGE to 1 will generate images of 2560 x 1920 pixels. Setting it to 0 will generate images of 800 x 600 pixels. In the file numberOfPixel.c you can manually change the width and height of the image. If you set LARGE_IMAGE to 1, the shared memory segment may be bigger than the maximum shared memory size configured on your system. In that case you have to increase "shmmax" on your system.
There are .clang_complete files in the PixelGenerator, ImageWriter and shared directories (specifying -I include paths for the Atom text editor with linter-clang).
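To judge whether "shmmax" is large enough, it helps to estimate the size of one image. The sketch below assumes 3 bytes per pixel (8-bit R, G and B, as in a P6 file with a maximum value of 255); the actual layout of the project's shared memory segment may differ, and the function name is hypothetical.

```c
#include <stddef.h>

/* Assumption: 3 bytes per pixel (8-bit red, green and blue). */
size_t image_bytes(size_t width, size_t height)
{
    return width * height * 3u;
}
```

Under that assumption an 800 x 600 image needs 1,440,000 bytes and a 2560 x 1920 image needs 14,745,600 bytes. The current limit can usually be read with `sysctl kernel.shmmax` on Linux or `sysctl kern.sysv.shmmax` on OSX.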
A great introduction to OpenMP:
- "Easy multithreading programming for C++": http://bisqwit.iki.fi/story/howto/openmp/
A great article on OpenCL:
- "A Gentle Introduction to OpenCL": http://www.drdobbs.com/parallel/a-gentle-introduction-to-opencl/231002854
A great lecture course on OpenCL:
- "Hands On OpenCL": https://handsonopencl.github.io
The video that introduced me to SIMD:
- "Handmade Hero Day 115 - SIMD Basics": https://www.youtube.com/watch?v=YnnTb0AQgYM
Two great SIMD Mandelbrot examples that helped me design the termination condition of the Mandelbrot calculation:
- https://github.com/skeeto/mandel-simd by Chris Wellons
- http://iquilezles.org/www/articles/sse/sse.htm by Inigo Quilez
Intel Intrinsics Guide to SIMD programming:
Thank you to Christian Fibich for introducing me to OpenMP and OpenCL and for providing valuable input to advance the project and improve my skills in C programming.
This project is licensed under the terms of the MIT License. See LICENSE for details.