Problem 1: Parallel Fractal Generation Using Pthreads#

下面是提供的mandelbrotSerial算法，由此得出两种解决方法

void mandelbrotSerial(
    float x0, float y0, float x1, float y1,
    int width, int height,
    int startRow, int totalRows,
    int maxIterations,
    int output[])
{
    float dx = (x1 - x0) / width;
    float dy = (y1 - y0) / height;

    int endRow = startRow + totalRows;

    for (int j = startRow; j < endRow; j++) {
        for (int i = 0; i < width; ++i) {
            float x = x0 + i * dx;
            float y = y0 + j * dy;

            int index = (j * width + i);
            output[index] = mandel(x, y, maxIterations);
        }
    }
}

Method1#

上述代码会遍历给定范围行的每个像素
我们为每个线程分配指定行数，进行并行，得到解法1

void* workerThreadStart(void* threadArgs) {
    WorkerArgs* args = static_cast<WorkerArgs*>(threadArgs);
    int rowsPerThread = args->height / args->numThreads;
    int startRow = args->threadId * rowsPerThread;  // 计算起始行
    int totalRows = rowsPerThread;
    // 该情况下的边界条件，最后一个线程负责执行到最后一行
    if (args->threadId + 1 == args->numThreads) {
        totalRows = args->height - startRow;
    }
    mandelbrotSerial(args->x0, 
                     args->y0, 
                     args->x1,
                     args->y1,
                     args->width, 
                     args->height, 
                     startRow, 
                     totalRows, 
                     args->maxIterations, 
                     args->output);
    printf("Hello world from thread %d\n", args->threadId);
    return NULL;
}

Method2#

我们不再为每个线程分配连续的行，而是分配k*ThreadNumber + ThreadID的行给第ThreadID号线程。由此得到解法2
常见的CPU上parallel for的形式

void mandelbrotParallelfor(
    float x0, float y0, float x1, float y1,
    int width, int height,
    int startRow, int step,
    int maxIterations,
    int output[])
{
    float dx = (x1 - x0) / width;
    float dy = (y1 - y0) / height;

    int endRow = startRow + totalRows;

    for (int j = startRow; j < endRow; j+=step) {
        for (int i = 0; i < width; ++i) {
            float x = x0 + i * dx;
            float y = y0 + j * dy;

            int index = (j * width + i);
            output[index] = mandel(x, y, maxIterations);
        }
    }
}

Problem 1: Parallel Fractal Generation Using Pthreads#

Method1#

Method2#

Problem 2: Vectorizing Code Using SIMD Intrinsics#

Problem 3: Using Instruction-level Parallelism in Fractal Generation#

Problem 4: Parallel Fractal Generation Using ISPC#

Problem 4.1: A Few ISPC Basics#

Problem 4.2: Combining instruction-level and SIMD parallelism#

Problem 4.3: ISPC Tasks#

Problem 5: Iterative Cubic Root#