using pthreads, OpenMP, OpenCL and SIMD Intrinsics to generate images of the mandelbrot set

BernhardLindner/Image-Generator

Image Generator

This project is at its core a system programming example, demonstrating the use of shared memory, semaphores, signal handling and pthreads in C. It consists of an "ImageWriter" program and a "PixelGenerator" program. The "PixelGenerator" continuously calculates images of the Mandelbrot set and writes them into a shared memory segment. The "ImageWriter" program reads the image data out of the shared memory segment and stores the image in a P6 ppm file.
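As an illustration of the output format, here is a minimal sketch of writing a binary P6 ppm file. It assumes 8-bit RGB data (maximum color value 255, three bytes per pixel); the function name `write_ppm_p6` is hypothetical and not taken from the project sources.

```c
#include <stdio.h>

/* Hypothetical helper: write a width x height RGB image as binary P6 ppm.
   Assumes 3 bytes per pixel and a maximum color value of 255.
   Returns 0 on success, -1 on a write error. */
int write_ppm_p6(FILE *fp, int width, int height, const unsigned char *rgb)
{
    size_t n = (size_t)width * (size_t)height * 3;

    /* Header: magic number, dimensions, maximum color value. */
    if (fprintf(fp, "P6\n%d %d\n255\n", width, height) < 0)
        return -1;

    /* Pixel data follows the header as raw bytes, row by row. */
    if (fwrite(rgb, 1, n, fp) != n)
        return -1;

    return 0;
}
```

The caller would open the target file in binary mode (`fopen(path, "wb")`) and pass the pixel buffer read from the shared memory segment.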

The shared memory segment can hold one image at a time. The "PixelGenerator" and "ImageWriter" are synchronized by semaphores to avoid simultaneous access of the shared memory segment.
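The handshake described above could be sketched with two semaphores, one signalling "buffer is empty" and one signalling "buffer is full". This is only an illustration under stated assumptions: it uses an anonymous shared mapping and `fork()` so that it fits in one file, whereas the project runs two independent programs; the struct, its fields and `run_handshake` are hypothetical names, and an integer stands in for the pixel data.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc */
#include <semaphore.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical layout: two semaphores plus one "image" slot. */
typedef struct {
    sem_t empty;     /* posted when the reader is done with the slot */
    sem_t full;      /* posted when the generator has filled the slot */
    int   frame_id;  /* stands in for the pixel data */
} shared_frame;

/* The child plays the generator, the parent plays the writer.
   Returns the sum of all frame ids the reader saw, or -1 on error. */
int run_handshake(int frames)
{
    shared_frame *shm = mmap(NULL, sizeof *shm, PROT_READ | PROT_WRITE,
                             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shm == MAP_FAILED)
        return -1;

    /* pshared = 1: the semaphores are shared between processes. */
    if (sem_init(&shm->empty, 1, 1) != 0 || sem_init(&shm->full, 1, 0) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {                      /* generator */
        for (int i = 1; i <= frames; i++) {
            sem_wait(&shm->empty);       /* wait until the slot is free */
            shm->frame_id = i;           /* "write the image" */
            sem_post(&shm->full);        /* hand the slot to the reader */
        }
        _exit(0);
    }

    int sum = 0;                         /* reader */
    for (int i = 0; i < frames; i++) {
        sem_wait(&shm->full);            /* wait for a complete image */
        sum += shm->frame_id;            /* "store the image" */
        sem_post(&shm->empty);           /* give the slot back */
    }

    waitpid(pid, NULL, 0);
    sem_destroy(&shm->empty);
    sem_destroy(&shm->full);
    munmap(shm, sizeof *shm);
    return sum;
}
```

Because each side waits on the semaphore the other side posts, the two processes never touch the slot at the same time and no frame is skipped or read twice.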

The "PixelGenerator" generates an image of the Mandelbrot set and alters the start parameters each time a new image is calculated, in order to zoom into a section of the Mandelbrot set.

The Mandelbrot set can be computed by multiple threads. The number of threads, as well as the width and height of the image, can be changed. Depending on the specified number of threads, one image is computed either by a single thread or split across several threads. For example, if the number of threads is set to 4, every thread generates a quarter of the image: with an image width of 800 and an image height of 600 pixels, thread one calculates rows 0 to 149, thread two rows 150 to 299, thread three rows 300 to 449, and thread four rows 450 to 599.
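The row split in the example above can be expressed as a small helper. This is a sketch, not the project's actual code: `row_range` and `rows_for_thread` are hypothetical names, thread indices are assumed to start at 0, and the image height is assumed to be an exact multiple of the thread count.

```c
#include <assert.h>

/* Hypothetical type: the inclusive range of image rows one thread handles. */
typedef struct {
    int first_row;
    int last_row;
} row_range;

/* Split `height` rows evenly across `num_threads` threads and return the
   slice that belongs to thread `thread_index` (0-based). */
row_range rows_for_thread(int thread_index, int num_threads, int height)
{
    /* The even split only works if the height divides without remainder. */
    assert(num_threads > 0 && height % num_threads == 0);
    assert(thread_index >= 0 && thread_index < num_threads);

    int rows_per_thread = height / num_threads;
    row_range r;
    r.first_row = thread_index * rows_per_thread;
    r.last_row  = r.first_row + rows_per_thread - 1;
    return r;
}
```

With a height of 600 and 4 threads, index 0 yields rows 0 to 149 and index 3 yields rows 450 to 599, matching the example in the text.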

This project has been extended to use the OpenMP library or the OpenCL framework instead of pthreads to calculate the image in parallel. With OpenMP, the number of threads is set automatically, depending on the number of threads supported by the CPU, and the processing of the image is divided among the available threads automatically. With OpenCL, you can specify whether the image is calculated by the CPU or the GPU.
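To show what the automatic division looks like with OpenMP, here is a minimal sketch in which a single pragma on the row loop replaces the manual thread setup. The function names, the fixed viewport (real axis -2 to 1, imaginary axis -1 to 1) and the output of plain iteration counts are assumptions made for illustration; the project's own routine may differ. The code has to be compiled with -fopenmp for the pragma to take effect; without it the loop simply runs serially.

```c
/* Number of iterations of z = z*z + c until |z| exceeds 2,
   capped at max_iter. */
static int escape_count(double cr, double ci, int max_iter)
{
    double zr = 0.0, zi = 0.0;
    int n = 0;
    while (n < max_iter && zr * zr + zi * zi <= 4.0) {
        double t = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = t;
        n++;
    }
    return n;
}

/* Hypothetical render routine: fills counts[y * width + x] for a fixed
   viewport. Assumes width > 1 and height > 1. */
void render_openmp(int width, int height, int max_iter, int *counts)
{
    /* OpenMP distributes the rows across the available threads. */
    #pragma omp parallel for
    for (int y = 0; y < height; y++) {
        double ci = -1.0 + 2.0 * y / (height - 1);
        for (int x = 0; x < width; x++) {
            double cr = -2.0 + 3.0 * x / (width - 1);
            counts[y * width + x] = escape_count(cr, ci, max_iter);
        }
    }
}
```

Each iteration of the outer loop writes to its own row of `counts`, so the threads need no further synchronization.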

With pthreads or OpenMP, the threads run truly in parallel, without context switching between them, if the CPU has enough cores to execute all of the specified threads at the same time.

At least in the pthreads version of the program, each thread can only calculate one pixel at a time. With SIMD ("Single Instruction, Multiple Data") intrinsics, however, each thread can process multiple pixels at the same time.

The Mandelbrot set is obtained from the quadratic recurrence equation $z_{n+1} = z_n^2 + C$. The calculation applies the same arithmetic operations to each pixel of the image over and over again, storing interim results in 64-bit double variables. The SSE intrinsics make it possible to load a 128-bit register with two 64-bit doubles and then operate on both variables (two pixels) at once, instead of applying the same operations to each pixel in turn. The AVX intrinsics use 256-bit registers, so four 64-bit doubles (four pixels) can be processed at once.
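The two-pixels-at-once idea can be sketched with the packed double intrinsics from `<emmintrin.h>` (these belong to SSE2, which every x86-64 CPU supports). This is an illustration rather than the project's routine: the function name, the argument layout and the masking scheme used as the termination condition are assumptions. Each lane keeps its own iteration counter, and a lane stops counting once its value has left the circle of radius 2.

```c
#include <emmintrin.h>  /* SSE2: packed double-precision intrinsics */

/* Hypothetical helper: iterate z = z*z + c for two points at once.
   cr/ci hold the real and imaginary parts of the two points;
   counts receives the iteration count of each lane. */
void mandel_iterations_sse2(const double cr[2], const double ci[2],
                            int max_iter, int counts[2])
{
    __m128d c_re = _mm_loadu_pd(cr);
    __m128d c_im = _mm_loadu_pd(ci);
    __m128d z_re = _mm_setzero_pd();
    __m128d z_im = _mm_setzero_pd();
    __m128d four = _mm_set1_pd(4.0);
    __m128d one  = _mm_set1_pd(1.0);
    __m128d n    = _mm_setzero_pd();          /* per-lane counters */
    __m128d active = _mm_cmpeq_pd(four, four); /* all bits set: both lanes run */

    for (int i = 0; i < max_iter; i++) {
        __m128d re2 = _mm_mul_pd(z_re, z_re);
        __m128d im2 = _mm_mul_pd(z_im, z_im);

        /* A lane stays active only while |z|^2 <= 4. */
        active = _mm_and_pd(active, _mm_cmple_pd(_mm_add_pd(re2, im2), four));
        if (_mm_movemask_pd(active) == 0)
            break;                             /* both lanes have escaped */

        __m128d re_im = _mm_mul_pd(z_re, z_im);
        z_re = _mm_add_pd(_mm_sub_pd(re2, im2), c_re);
        z_im = _mm_add_pd(_mm_add_pd(re_im, re_im), c_im);

        /* Add 1 to active lanes, 0 to escaped ones. */
        n = _mm_add_pd(n, _mm_and_pd(active, one));
    }

    double out[2];
    _mm_storeu_pd(out, n);
    counts[0] = (int)out[0];
    counts[1] = (int)out[1];
}
```

The loop only ends early when both lanes have escaped, so a slow lane keeps a fast lane busy; the mask makes sure the fast lane's counter is not incremented in the meantime. An AVX variant would follow the same pattern with `__m256d` and four lanes.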

Note
All of the above-mentioned methods of parallelization are implemented solely to calculate the pixels of one image in parallel, not to generate several images at once.

Version

1.2.1

See CHANGELOG for details.

Note
After being introduced to pthreads, OpenMP, OpenCL and SIMD, this was my first attempt at using these techniques to improve the performance of the image generation. The implementation is mostly trial and error and for that reason should not be viewed as the right way to use these methods.

Usage

The Image-Generator folder contains five subdirectories each representing a full version of the "ImageWriter" and "PixelGenerator" program with respective changes to the image generation routine of the "PixelGenerator":

Each folder contains a makefile and also three subdirectories "PixelGenerator", "ImageWriter" and "shared" that contain the source code. Typing make will generate two executables:

  • pixelGenerator.out

  • imageWriter.out

The subdirectories "PixelGenerator" and "ImageWriter" each hold a "src" and an "include" directory as well as a makefile. The subdirectory "shared" holds a "src" and an "include" directory for files used by both programs. The makefile at the top level changes into both subdirectories, "ImageWriter" and "PixelGenerator", and executes the make command there.

Start each program in a separate terminal window or tab and the "ImageWriter" will start dumping images into your current directory.

In addition to the ImageWriter, an SDL_Viewer has been added. See 99_SDL_Viewer for more details.

To quit the programs, press Ctrl-C, as both programs run in an endless loop. Terminating the "PixelGenerator" with Ctrl-C will automatically shut down the "ImageWriter" program.
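A common way to leave an endless loop cleanly on Ctrl-C is to install a SIGINT handler that only sets a flag, and to test that flag in the loop condition. The sketch below shows this pattern; the names `keep_running`, `on_sigint` and `install_sigint_handler` are hypothetical, and how the project propagates the shutdown from the "PixelGenerator" to the "ImageWriter" is not shown here.

```c
#define _POSIX_C_SOURCE 200809L  /* for sigaction under strict C modes */
#include <signal.h>
#include <string.h>

/* Flag checked by the main loop; sig_atomic_t is safe to write
   from a signal handler. */
static volatile sig_atomic_t keep_running = 1;

/* Handler: do as little as possible, just request the shutdown. */
static void on_sigint(int signo)
{
    (void)signo;
    keep_running = 0;
}

/* Install the handler for SIGINT (Ctrl-C). Returns 0 on success. */
int install_sigint_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}
```

The main loop would then read `while (keep_running) { ... }`, followed by the cleanup code that detaches the shared memory segment and releases the semaphores.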

Tested

Tested on OSX 10.9.5 and Fedora 23 with image WIDTH and HEIGHT set to 800x600 and 2560x1920, and with the number of threads specified in thread_handler.h set to 1, 2, 4 and 8.

Prerequisite

For the pthread version there is nothing to install; you are good to go.

The makefile will link -lpthread

Note
Compiler optimizations are set to -O3 -mavx -ffast-math to see how well the compiler can optimize the code compared to the version of the program written explicitly with SIMD AVX intrinsics.

On OSX 10.9.5 you will need to install "clang-omp". On OSX the makefile sets the compiler to "clang-omp".

The makefile will link -fopenmp

Note
Compiler optimizations are set to -O3 -mavx -ffast-math to see how well the compiler can optimize the code compared to the version of the program written explicitly with SIMD AVX intrinsics.

On OSX OpenCL works out of the box. If you want to run this on Linux you will need the OpenCL headers and library to compile the code and the OpenCL driver runtime to be able to run the executable. For that purpose you could install the Intel OpenCL SDK and Intel OpenCL Runtime.

On Linux the makefile will link -lOpenCL

On OSX the makefile will link -framework OpenCL

On OSX you may have to add /Library/Frameworks directory to your search path:

echo 'export PATH=$PATH:/Library/Frameworks' >> ~/.bash_profile

As long as your CPU supports SSE you are good to go.

The makefile will link -lpthread and set optimizations to -O3 -msse -ffast-math

As long as your CPU supports AVX you are good to go.

The makefile will link -lpthread and set optimizations to -O3 -mavx -ffast-math

Performance Results

Number of images generated after about one minute of execution on a 2012 MacBook Pro with an i7 3615QM @ 2.3 GHz and a GeForce GT 650M:

Image resolution 800x600, max iterations 1023:

| Method of parallelization | # of images after about 1 minute on OSX 10.9.5 | # of images after about 1 minute on Fedora 23 |
|---|---|---|
| pthread | 185 | 182 |
| pthread -O3 -mavx -ffast-math | 331 | 335 |
| OpenMP | 180 | x |
| OpenMP -O3 -mavx -ffast-math | 323 | 319 |
| OpenCL CPU | 554 | 684 |
| OpenCL GPU GeForce GT 650M | 363 | x |
| pthread-SIMD-SSE | 510 | 502 |
| pthread-SIMD-AVX | 847 | 850 |

Image resolution 2560x1920, max iterations 1023:

| Method of parallelization | # of images after about 1 minute on OSX 10.9.5 | # of images after about 1 minute on Fedora 23 |
|---|---|---|
| pthread | 55 | 55 |
| pthread -O3 -mavx -ffast-math | 72 | 71 |
| OpenMP | 55 | x |
| OpenMP -O3 -mavx -ffast-math | 72 | 70 |
| OpenCL CPU | 111 | 133 |
| OpenCL GPU GeForce GT 650M | 82 | x |
| pthread-SIMD-SSE | 97 | 96 |
| pthread-SIMD-AVX | 144 | 146 |

Note
The OpenCL performance could probably be improved by making better use of the OpenCL memory model.

Additional Information

For the versions using pthreads, the number of threads is set to 8 in thread_handler.h, but changing it to 1, 2 or 4 is also possible. If you want to choose a custom image size, make sure that the HEIGHT of the image divided by the number of threads you want to use results in an integer value.

For the OpenCL version, you can choose whether the kernel is executed on the CPU or the GPU by changing COMPUTE_DEVICE in setup_OpenCL.h.

In the file numberOfPixel.h, LARGE_IMAGE can be set to 1 or 0. Setting LARGE_IMAGE to 1 generates images of 2560 x 1920 pixels; setting it to 0 generates images of 800 x 600 pixels. In the file numberOfPixel.c you can manually change the width and height of the image. If you set LARGE_IMAGE to 1, the shared memory segment may be bigger than the maximum shared memory size configured on your system, in which case you have to increase "shmmax".
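To estimate whether the segment fits under the limit, the image size can be worked out as below. This sketch assumes three bytes per pixel (8-bit RGB, as in a P6 ppm file with a maximum color value of 255) and ignores any bookkeeping data the segment may hold in addition to the pixels; `image_bytes` is a hypothetical helper.

```c
#include <stddef.h>

/* Hypothetical helper: bytes needed for one 8-bit RGB image. */
size_t image_bytes(size_t width, size_t height)
{
    return width * height * 3;  /* 3 bytes (R, G, B) per pixel */
}
```

Under these assumptions an 800 x 600 image needs 1,440,000 bytes, while a 2560 x 1920 image needs 14,745,600 bytes (about 14 MiB). The current limit can be inspected with `sysctl kernel.shmmax` on Linux and `sysctl kern.sysv.shmmax` on OSX.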

There are .clang_complete files in the PixelGenerator, ImageWriter and shared directories, specifying -I include paths for the Atom text editor with linter-clang.

A great introduction to OpenMP:

A great article on OpenCL:

A great lecture course on OpenCL:

The video that introduced me to SIMD:

Two great SIMD Mandelbrot examples that helped me design the termination condition of the Mandelbrot calculation:

Intel Intrinsics Guide to SIMD programming:

Thanks

Thank you to Christian Fibich for introducing me to OpenMP and OpenCL, and for providing valuable input to advance the project and improve my C programming skills.

License

This project is licensed under the terms of the MIT License. See LICENSE for details.
