This project is at its core a systems programming example that demonstrates the use of shared memory, semaphores, signal handling and pthreads in C. It consists of an "ImageWriter" program and a "PixelGenerator" program. The "PixelGenerator" continuously calculates images of the Mandelbrot set and writes them into a shared memory segment. The "ImageWriter" program reads each image out of the shared memory segment and stores it in a P6 PPM file.
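As a rough illustration of the output format, a P6 PPM file is a short text header followed by raw RGB bytes. The sketch below is not taken from the project sources; the function name `write_ppm` and the packed 8-bit RGB buffer layout are assumptions made for the example.

```c
#include <stdio.h>

/* Hypothetical helper: writes a width x height image of packed 8-bit RGB
 * triples as a binary PPM (P6). Returns 0 on success, -1 on a write error. */
int write_ppm(FILE *fp, int width, int height, const unsigned char *rgb)
{
    size_t count = (size_t)width * (size_t)height * 3;

    /* Header: magic number, dimensions, maximum colour value. */
    if (fprintf(fp, "P6\n%d %d\n255\n", width, height) < 0)
        return -1;

    /* Pixel data follows as raw bytes, row by row, R G B per pixel. */
    if (fwrite(rgb, 1, count, fp) != count)
        return -1;

    return 0;
}
```

A caller would open the target file with `fopen(path, "wb")`, pass the image buffer, and close the file afterwards.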
The shared memory segment can hold one image at a time. The "PixelGenerator" and "ImageWriter" are synchronized by semaphores to avoid simultaneous access of the shared memory segment.
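The handshake described above can be sketched with two semaphores, one signalling "buffer is free" and one signalling "image is ready". This is only a minimal model under several assumptions: it uses two threads in one process and unnamed POSIX semaphores instead of the project's two programs and real shared memory segment, and all names (`shared_frame`, `run_handshake`, and so on) are hypothetical. Unnamed POSIX semaphores are not implemented on OSX, so this sketch is meant for Linux.

```c
#include <pthread.h>
#include <semaphore.h>

/* Hypothetical stand-in for the shared memory segment: holds one "image". */
static int shared_frame;
static sem_t sem_empty;  /* posted when the buffer may be overwritten */
static sem_t sem_full;   /* posted when a new image is ready */

/* Plays the role of the PixelGenerator. */
static void *generator(void *arg)
{
    int frames = *(int *)arg;
    for (int i = 1; i <= frames; i++) {
        sem_wait(&sem_empty);  /* wait until the previous image was consumed */
        shared_frame = i;      /* "calculate" the next image */
        sem_post(&sem_full);   /* hand it over */
    }
    return NULL;
}

/* Plays the role of the ImageWriter. Runs the handshake for `frames` images
 * and returns how many of them arrived complete and in order. */
int run_handshake(int frames)
{
    pthread_t tid;
    int in_order = 0;

    sem_init(&sem_empty, 0, 1);  /* the buffer starts out free */
    sem_init(&sem_full, 0, 0);   /* nothing to read yet */
    pthread_create(&tid, NULL, generator, &frames);

    for (int i = 1; i <= frames; i++) {
        sem_wait(&sem_full);     /* wait for a complete image */
        if (shared_frame == i)
            in_order++;          /* "store" the image */
        sem_post(&sem_empty);    /* let the generator continue */
    }

    pthread_join(tid, NULL);
    sem_destroy(&sem_empty);
    sem_destroy(&sem_full);
    return in_order;
}
```

Because each side waits on the semaphore the other side posts, the buffer is never read while it is being written and no image is overwritten before it has been stored.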
The "PixelGenerator" alters the start parameters each time a new image is calculated in order to zoom into a section of the Mandelbrot set.
The Mandelbrot set can be computed by multiple threads. The number of threads as well as the width and height of the image can be changed. The computation of one image is split evenly across the threads. For example, if the number of threads is set to 4, every thread generates a quarter of the image: with an image width of 800 and a height of 600 pixels, thread one calculates rows 0 to 149, thread two rows 150 to 299, thread three rows 300 to 449 and thread four rows 450 to 599.
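The row split from the example above can be written down as a small helper. The function name `rows_for_thread` and its signature are assumptions for illustration; the helper rejects thread counts that do not divide the image height evenly, which matches the restriction mentioned in the configuration notes further down.

```c
/* Hypothetical helper: computes the inclusive row range that the thread with
 * the given zero-based index has to process.
 * Returns 0 on success, -1 if the arguments are invalid or the height
 * cannot be divided evenly among the threads. */
int rows_for_thread(int height, int num_threads, int thread_index,
                    int *first_row, int *last_row)
{
    if (num_threads <= 0 || thread_index < 0 || thread_index >= num_threads)
        return -1;
    if (height % num_threads != 0)
        return -1;

    int rows_per_thread = height / num_threads;
    *first_row = thread_index * rows_per_thread;
    *last_row = *first_row + rows_per_thread - 1;
    return 0;
}
```

With a height of 600 and 4 threads, index 0 yields rows 0 to 149 and index 3 yields rows 450 to 599.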
This project has been extended to use the OpenMP library or the OpenCL framework to calculate the image in parallel instead of using pthreads. With OpenMP the number of threads is set automatically, depending on the number of threads supported by the CPU, and the processing of the image is divided among the available threads automatically. With OpenCL you can specify whether the image is calculated by the CPU or the GPU.
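To give an idea of how little code the OpenMP variant needs, the following sketch parallelizes the loop over image rows with a single pragma. It is not the project's routine: the function names, the fixed view of the complex plane and the output buffer of iteration counts are assumptions. Compiled without -fopenmp the pragma is ignored and the loop simply runs serially.

```c
/* Escape-time iteration count for one point c = cr + ci*i. */
static int mandel_iters(double cr, double ci, int max_iter)
{
    double zr = 0.0, zi = 0.0;
    int i = 0;

    while (i < max_iter && zr * zr + zi * zi <= 4.0) {
        double next_zr = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = next_zr;
        i++;
    }
    return i;
}

/* Hypothetical render routine: fills iters[y * width + x] for the region
 * real in [-2, 1], imaginary in [-1, 1]. Requires width, height >= 2. */
void render(int width, int height, int max_iter, int *iters)
{
    /* OpenMP hands the rows out to the available threads. */
    #pragma omp parallel for schedule(dynamic)
    for (int y = 0; y < height; y++) {
        double ci = -1.0 + 2.0 * y / (height - 1);
        for (int x = 0; x < width; x++) {
            double cr = -2.0 + 3.0 * x / (width - 1);
            iters[y * width + x] = mandel_iters(cr, ci, max_iter);
        }
    }
}
```

Every iteration of the outer loop writes to a different row of the buffer, so the threads do not need any locking.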
The processing of the image with pthreads or the OpenMP library happens concurrently, without context switching, if the CPU has enough cores to execute all of the specified threads at the same time. In the versions described so far, each thread calculates only one pixel at a time. With SIMD ("Single Instruction Multiple Data") intrinsics, however, each thread can process multiple pixels at the same time. The Mandelbrot set is calculated by applying the same mathematical operations to each pixel over and over again. Interim results are stored in 64-bit double variables. With the SIMD intrinsics for SSE it is possible to load a 128-bit wide register with two 64-bit double variables. This makes it possible to operate on two variables (two pixels) at once by performing the calculations on the 128-bit register, instead of applying the same operations successively to each variable (each pixel). The SIMD intrinsics for AVX use 256-bit registers, so four 64-bit variables can be processed at once.
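The two-pixels-per-register idea can be sketched as follows. This is not the project's implementation: the function name `mandel_pair`, the way iterations are counted and the termination test are assumptions chosen to keep the example short. The packed double operations come from `emmintrin.h` (the SSE2 part of the SSE family) and need an x86 CPU.

```c
#include <emmintrin.h>  /* __m128d: one 128-bit register holding two doubles */

/* Hypothetical helper: iterates z = z*z + c for two pixels at once. Both
 * pixels share the imaginary part ci and differ in the real part. */
void mandel_pair(double cr0, double cr1, double ci, int max_iter, int iters[2])
{
    const __m128d cr = _mm_set_pd(cr1, cr0);  /* lane 0 = cr0, lane 1 = cr1 */
    const __m128d cim = _mm_set1_pd(ci);
    const __m128d four = _mm_set1_pd(4.0);
    const __m128d one = _mm_set1_pd(1.0);
    __m128d zr = _mm_setzero_pd();
    __m128d zi = _mm_setzero_pd();
    __m128d count = _mm_setzero_pd();
    double lanes[2];

    for (int i = 0; i < max_iter; i++) {
        __m128d zr2 = _mm_mul_pd(zr, zr);
        __m128d zi2 = _mm_mul_pd(zi, zi);

        /* A lane of the mask is all ones while |z|^2 <= 4 for that pixel. */
        __m128d inside = _mm_cmple_pd(_mm_add_pd(zr2, zi2), four);
        if (_mm_movemask_pd(inside) == 0)
            break;  /* both pixels have escaped */

        /* Add 1.0 only in the lanes that are still inside. */
        count = _mm_add_pd(count, _mm_and_pd(inside, one));

        /* z = z*z + c, computed for both pixels with single instructions. */
        __m128d zri = _mm_mul_pd(zr, zi);
        zr = _mm_add_pd(_mm_sub_pd(zr2, zi2), cr);
        zi = _mm_add_pd(_mm_add_pd(zri, zri), cim);
    }

    _mm_storeu_pd(lanes, count);
    iters[0] = (int)lanes[0];
    iters[1] = (int)lanes[1];
}
```

The loop keeps running until both pixels have escaped or the iteration limit is reached; the mask makes sure a pixel that escaped early stops counting. An AVX version would follow the same pattern with `__m256d` and the `_mm256_*` intrinsics from `immintrin.h`, handling four pixels per register.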
Note: All of the above-mentioned methods of parallelization are implemented solely to calculate the pixels of one image in parallel, not to generate several images at once.
1.1
Note: After having been introduced to pthreads, OpenMP, OpenCL and SIMD, this has been a first attempt at using them to improve the performance of the image generation. The implementation is mostly trial and error and should therefore not be viewed as the right way to do it.
Version 1.1: Split the OpenCL setup (creating the context, building the program and the kernel) and the execution of the kernel (calculating the Mandelbrot set, generating the image) into separate functions (setup_OpenCL() and generate_image()). This avoids rebuilding the same OpenCL kernel after every image calculation and wasting CPU time.
Version 1.0: Initial release
The Image-Generator folder contains five subdirectories each representing a full version of the "ImageWriter" and "PixelGenerator" program with respective changes to the image generation routine of the "PixelGenerator":
Each folder contains a makefile and also three subdirectories "PixelGenerator", "ImageWriter" and "shared" that contain the source code. Typing make will generate two executables:
- pixelGenerator.out
- imageWriter.out
The subdirectories "PixelGenerator" and "ImageWriter" each hold a "src" and an "include" directory as well as a makefile. The subdirectory "shared" holds a "src" and an "include" directory for files used by both programs. The top-level makefile changes into both subdirectories, "ImageWriter" and "PixelGenerator", and executes the make command there.
Start each program in a separate terminal window or tab and the "ImageWriter" will start dumping images into your current directory.
To quit the programs, press "ctrl-c", as both programs run in an endless loop. Terminating the "PixelGenerator" with ctrl-c will automatically shut down the "ImageWriter" program as well.
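A common way to leave an endless loop cleanly on ctrl-c is to install a SIGINT handler that only sets a flag, and to check that flag in the loop. The sketch below shows this pattern; it is an assumption about the general approach, not a copy of the project's handler, and the names `stop_requested` and `install_sigint_handler` are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Flag set by the handler. A volatile sig_atomic_t can be written safely
 * from inside a signal handler. */
static volatile sig_atomic_t stop_requested = 0;

static void on_sigint(int signo)
{
    (void)signo;         /* unused */
    stop_requested = 1;  /* do nothing else here; clean up in the main loop */
}

/* Hypothetical helper: registers the handler for SIGINT (ctrl-c).
 * Returns 0 on success, -1 on error. */
int install_sigint_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}
```

The main loop would then run as `while (!stop_requested) { ... }` and release the semaphores and the shared memory segment after the loop has ended.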
Tested on OSX 10.9.5 and Fedora 23 with image WIDTH and HEIGHT set to 800x600 and 2560x1920, and the number of threads specified in thread_handler.h set to 1, 2, 4 and 8.
Nothing to do, you are good to go.
The makefile will link -lpthread
Note: Compiler optimizations are set to -O3 -mavx -ffast-math to see how well the compiler can optimize the code compared to the version of the program written explicitly with SIMD AVX intrinsics.
On OSX 10.9.5 you will need to install "clang-omp". On OSX the makefile sets the compiler to "clang-omp".
The makefile will link -fopenmp
Note: Compiler optimizations are set to -O3 -mavx -ffast-math to see how well the compiler can optimize the code compared to the version of the program written explicitly with SIMD AVX intrinsics.
On OSX OpenCL works out of the box. If you are running this under Fedora you will need the OpenCL headers and library to compile the code and the OpenCL driver runtime to be able to run the executable. For that purpose you could install the Intel OpenCL SDK and Intel OpenCL Runtime.
On Linux the makefile will link -lOpenCL
On OSX the makefile will link -framework OpenCL
On OSX you may have to add /Library/Frameworks directory to your search path:
echo 'export PATH=$PATH:/Library/Frameworks' >> ~/.bash_profile
As long as your CPU supports SSE you are good to go.
The makefile will link -lpthread and set optimizations to -O3 -msse -ffast-math
As long as your CPU supports AVX you are good to go.
The makefile will link -lpthread and set optimizations to -O3 -mavx -ffast-math
Number of images generated after about one minute of execution on a 2012 MacBook Pro with an i7-3615QM @ 2.3 GHz and a GeForce GT 650M:
Image resolution 800x600, max iterations 1023:
OSX 10.9.5
- Image-Generator_pthread → 185 Images
- Image-Generator_pthread -O3 -mavx -ffast-math → 331 Images
- Image-Generator_OpenMP → 180 Images
- Image-Generator_OpenMP -O3 -mavx -ffast-math → 323 Images
- Image-Generator_OpenCL CPU → 554 Images
- Image-Generator_OpenCL GPU GeForce GT 650M → 363 Images
- Image-Generator_pthread-SIMD-SSE → 510 Images
- Image-Generator_pthread-SIMD-AVX → 847 Images
Fedora 23
- Image-Generator_pthread → 182 Images
- Image-Generator_pthread -O3 -mavx -ffast-math → 335 Images
- Image-Generator_OpenMP -O3 -mavx -ffast-math → 319 Images
- Image-Generator_OpenCL CPU → 684 Images
- Image-Generator_pthread-SIMD-SSE → 502 Images
- Image-Generator_pthread-SIMD-AVX → 850 Images
Image resolution 2560x1920, max iterations 1023:
OSX 10.9.5
- Image-Generator_pthread → 55 Images
- Image-Generator_pthread -O3 -mavx -ffast-math → 72 Images
- Image-Generator_OpenMP → 55 Images
- Image-Generator_OpenMP -O3 -mavx -ffast-math → 72 Images
- Image-Generator_OpenCL GPU GeForce GT 650M → 82 Images
- Image-Generator_OpenCL CPU → 111 Images
- Image-Generator_pthread-SIMD-SSE → 97 Images
- Image-Generator_pthread-SIMD-AVX → 144 Images
Fedora 23
- Image-Generator_pthread → 55 Images
- Image-Generator_pthread -O3 -mavx -ffast-math → 71 Images
- Image-Generator_OpenMP -O3 -mavx -ffast-math → 70 Images
- Image-Generator_OpenCL CPU → 133 Images
- Image-Generator_pthread-SIMD-SSE → 96 Images
- Image-Generator_pthread-SIMD-AVX → 146 Images
Note: The OpenCL performance could probably be improved by making better use of the OpenCL memory model.
For the versions using pthreads the number of threads is set to 8 in /PixelGenerator/include/thread_handler.h, but changing it to 1, 2 or 4 is also possible. Any other number of threads should work as well, as long as the HEIGHT of the image (which is set to 600) divided by the number of threads results in an integer value.
For the OpenCL version you can choose whether the kernel is executed on the CPU or the GPU by changing COMPUTE_DEVICE in /include/setup_OpenCL.h
In the file /shared/include/numberOfPixel.h LARGE_IMAGE can be set to 1 or 0. Setting it to 1 will generate images of 2560 x 1920 pixels. Setting it to 0 will generate images of 800 x 600 pixels. In the file /shared/src/numberOfPixel.c you can manually change the width and height of the image. If you set LARGE_IMAGE to 1, the shared memory segment may be bigger than the maximum shared memory size configured on your system. In that case you have to increase "shmmax" on your system.
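To estimate whether the segment fits under the "shmmax" limit, the image size in bytes can be calculated up front. The sketch below assumes three bytes per pixel (8-bit R, G and B, as in a P6 file with a maximum colour value of 255); the actual layout of the project's shared memory segment may differ, and the names `BYTES_PER_PIXEL` and `image_bytes` are hypothetical.

```c
#include <stddef.h>

/* Assumption: one byte each for R, G and B. */
#define BYTES_PER_PIXEL 3

/* Hypothetical helper: size of one image in bytes. */
size_t image_bytes(size_t width, size_t height)
{
    return width * height * BYTES_PER_PIXEL;
}
```

Under this assumption an 800 x 600 image needs 1,440,000 bytes and a 2560 x 1920 image needs 14,745,600 bytes. The current limit can be read with `sysctl kernel.shmmax` on Linux and `sysctl kern.sysv.shmmax` on OSX.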
There are .clang_complete files in the PixelGenerator, ImageWriter and shared directories (specifying -I include paths for the Atom text editor with linter-clang).
A great introduction to OpenMP:
- "Easy multithreading programming for C++": http://bisqwit.iki.fi/story/howto/openmp/
A great article on OpenCL:
- "A Gentle Introduction to OpenCL": http://www.drdobbs.com/parallel/a-gentle-introduction-to-opencl/231002854
A great lecture course on OpenCL:
- "Hands On OpenCL": https://handsonopencl.github.io
The video that introduced me to SIMD:
- "Handmade Hero Day 115 - SIMD Basics": https://www.youtube.com/watch?v=YnnTb0AQgYM
Two great SIMD Mandelbrot examples that helped me design the termination condition of the Mandelbrot calculation:
- https://github.com/skeeto/mandel-simd by Chris Wellons
- http://iquilezles.org/www/articles/sse/sse.htm by Inigo Quilez
Intel Intrinsics Guide to SIMD programming:
Thank you to Christian Fibich for introducing me to OpenMP and OpenCL and for providing valuable input to advance the project and improve my skills in C programming.
This project is licensed under the terms of the MIT License. See LICENSE for details.