Anthosq
451 字
2 分钟
CMU15418_A1

Problem 1: Parallel Fractal Generation Using Pthreads#

下面是提供的mandelbrotSerial算法,由此得出两种解决方法

void mandelbrotSerial(
    float x0, float y0, float x1, float y1,
    int width, int height,
    int startRow, int totalRows,
    int maxIterations,
    int output[])
{
    float dx = (x1 - x0) / width;
    float dy = (y1 - y0) / height;

    int endRow = startRow + totalRows;

    for (int j = startRow; j < endRow; j++) {
        for (int i = 0; i < width; ++i) {
            float x = x0 + i * dx;
            float y = y0 + j * dy;

            int index = (j * width + i);
            output[index] = mandel(x, y, maxIterations);
        }
    }
}

Method1#

  • 上述代码会遍历给定范围的每个像素
  • 我们为每个线程分配指定行数,进行并行,得到解法1
void* workerThreadStart(void* threadArgs) {
    WorkerArgs* args = static_cast<WorkerArgs*>(threadArgs);
    int rowsPerThread = args->height / args->numThreads;
    int startRow = args->threadId * rowsPerThread;  // 计算起始行
    int totalRows = rowsPerThread;
    // 该情况下的边界条件,最后一个线程负责执行到最后一行
    if (args->threadId + 1 == args->numThreads) {
        totalRows = args->height - startRow;
    }
    mandelbrotSerial(args->x0, 
                     args->y0, 
                     args->x1,
                     args->y1,
                     args->width, 
                     args->height, 
                     startRow, 
                     totalRows, 
                     args->maxIterations, 
                     args->output);
    printf("Hello world from thread %d\n", args->threadId);
    return NULL;
}

Method2#

  • 我们不再为每个线程分配连续的行,而是分配k*ThreadNumber + ThreadID的行给第ThreadID号线程。由此得到解法2
  • 常见的CPU上parallel for的形式
void mandelbrotParallelfor(
    float x0, float y0, float x1, float y1,
    int width, int height,
    int startRow, int step,
    int maxIterations,
    int output[])
{
    float dx = (x1 - x0) / width;
    float dy = (y1 - y0) / height;

    int endRow = startRow + totalRows;

    for (int j = startRow; j < endRow; j+=step) {
        for (int i = 0; i < width; ++i) {
            float x = x0 + i * dx;
            float y = y0 + j * dy;

            int index = (j * width + i);
            output[index] = mandel(x, y, maxIterations);
        }
    }
}
  • 还有很多的解法,但是出于时间考虑暂时不深入

Problem 2: Vectorizing Code Using SIMD Intrinsics#

Problem 3: Using Instruction-level Parallelism in Fractal Generation#

Problem 4: Parallel Fractal Generation Using ISPC#

Problem 4.1: A Few ISPC Basics#

Problem 4.2: Combining instruction-level and SIMD parallelism#

Problem 4.3: ISPC Tasks#

Problem 5: Iterative Cubic Root#

CMU15418_A1
https://anthosq.github.io/posts/cmu15418/cmu15418_a1/
作者
Anthosq
发布于
2024-05-31
许可协议
CC BY-NC-SA 4.0