Compute PI in parallel

并行計算圓周率

看到這個題目，俗了，大家都在計算圓周率。不過咱們的目的是看一下并行計算的基本流程。

書上計算PI用的是精确的數值計算方法，我這裡再給出一種機率計算方法。

OpenMP和MPI将同時亮相。

計算PI的方法

1.tan(PI/4)=1 => PI=4arctan1。知道arctan1轉化為定積分的形式是什麼吧。

`01`	`#include<stdio.h>`

`02`	`#include<time.h>`

`03`	`#define N 1000000`

main(){

`05`	`double` `local,pi=0.0,w;`

`06`	`long` `i;`

`07`	`w=1.0/N;`

`08`	`clock_t` `t1=` `clock` `();`

`09`	`for` `(i=0;i<N;i++){`

`10`	`local=(i+0.5)*w;`

`11`	`pi=pi+4.0/(1.0+local*local);`

`13`	`clock_t` `t2=` `clock` `();`

`14`	`printf` `(` `"PI is %.20f/n"` `,pi*w);`

`15`	`printf` `(` `"Time: %.2f seconds/n"` `,(` `float` `)(t2-t1)/CLOCKS_PER_SEC);`

[email protected]:~/Program$ ./PI1

PI is 3.14159265358976336202

Time: 0.02 seconds

2.以坐标原點為形心，作半徑為1的圓和邊長為2的正方形。正方形與圓的面積之比即為PI

`01`	`#include<stdio.h>`

`02`	`#include<stdlib.h>`

`03`	`#include<time.h>`

`04`	`#include<math.h>`

`05`	`#define N 1000000`

main(){

`07`	`long` `i,sum;`

`08`	`double` `x,y;`

`09`	`srand` `((unsigned)` `time` `(NULL));`

sum=0;

`11`	`clock_t` `t1=` `clock` `();`

`12`	`for` `(i=0;i<N;i++){`

`13`	`x=(` `double` `)` `rand` `()/RAND_MAX;`

`14`	`y=(` `double` `)` `rand` `()/RAND_MAX;`

`15`	`if` `(xx+yy<1)`

sum++;

`18`	`clock_t` `t2=` `clock` `();`

`19`	`printf` `(` `"PI is %.20f/n"` `,4*(` `double` `)sum/N);`

`20`	`printf` `(` `"Time: %.2f/n"` `,(` `float` `)(t2-t1)/CLOCKS_PER_SEC);`

[email protected]:~$ ./PI0

PI is 3.14301599999999980994

Time: 0.16

對比可以看到方法1在計算精度和速度上都具有絕對的優勢。在下面的openMP和MPI計算中我們都采用方法1。

OpenMP

OpenMP[OMP]是一個編譯器指令和庫函數的集合（已包含在gcc中），它用于為共享存儲器計算機建立并行程式。OMP組合了C、C++和Fortran。

`01`	`#include<stdio.h>`

`02`	`#include<time.h>`

`03`	`#include<omp.h>`

`04`	`#define N 1000000`

main(){

`06`	`double` `local,pi=0.0,w;`

`07`	`long` `i;`

`08`	`w=1.0/N;`

`09`	`clock_t` `t1=` `clock` `();`

`10`	`#pragma omp parallel for private(local) reduction(+:pi)`

`11`	`for` `(i=0;i<N;i++){`

`12`	`local=(i+0.5)*w;`

`13`	`pi=pi+4.0/(1.0+local*local);`

`15`	`clock_t` `t2=` `clock` `();`

`16`	`printf` `(` `"PI is %.20f/n"` `,pi*w);`

`17`	`printf` `(` `"Time: %.2f seconds/n"` `,(` `float` `)(t2-t1)/CLOCKS_PER_SEC);`

[email protected]:~/Program$ ./PI2

PI is 3.14159265358976336202

Time: 0.02 seconds

跟串行計算結果是一模一樣。

#pragma omp parallel表示下面的一行代碼或代碼塊要配置設定到多個執行單元中并行計算。

#pragma omp parallel for用在一個for循環的前面

private(local)預設情況下定義在并行代碼之外的變量為各并行的執行單元所共享，使用private限制，表示每個執行單元建立該變量的一個副本

reduction(+:pi)表示并行代碼執行完畢後對各個執行單元中的pi進行相加操作

MPICH2

ubuntu下mpich2的安裝配置見這位老兄的部落格http://hi.baidu.com/mousetwo/blog/item/ddba71a9b0adf4f11f17a2fc.html

MPI（Message Parsing Interface）消息傳遞接口是用于分布式存儲器并行計算機的标準程式設計環境。MPI的核心構造是消息傳遞：一個程序将資訊打包成消息，并将該消息發送給其他程序。MPI最常用的兩個實作是LAM/MPI[LAM]和MPICH[MPI]。

在MPI中執行單元（UE）指的就是程序。

`01`	`#include<stdio.h>`

`02`	`#include<mpi.h>`

`03`	`#include<math.h>`

`05`	`int` `main(` `int` `argc,` `char` `*argv[]){`

`06`	`int` `my_rank,num_procs;`

`07`	`int` `i,n=0;`

`08`	`double` `sum,width,local,mypi,pi;`

`09`	`double` `start=0.0,stop=0.0;`

`10`	`int` `proc_len;`

`11`	`char` `processor_name[MPI_MAX_PROCESSOR_NAME];`

`13`	`MPI_Init(&argc,&argv);` `//初始化環境`

`14`	`MPI_Comm_size(MPI_COMM_WORLD,&num_procs);` `//擷取并行的程序數`

`15`	`MPI_Comm_rank(MPI_COMM_WORLD,&my_rank);` `//目前程序在所有程序中的序号`

`16`	`MPI_Get_processor_name(processor_name,&proc_len);` `//擷取總的處理機數和各個處理機的名稱`

`18`	`printf` `(` `"Processor %d of %d on %s/n"` `,my_rank,num_procs,processor_name);`

`19`	`if` `(my_rank==0){`

`20`	`printf` `(` `"please give n="` `);`

`21`	`scanf` `(` `"%d"` `,&n);`

`22`	`start=MPI_Wtime();` `//MPI計時`

`24`	`MPI_Bcast(&n,1,MPI_INT,0,MPI_COMM_WORLD);` `//把n廣播給本通信環境中的所有程序`

`25`	`width=1.0/n;`

`26`	`sum=0.0;`

`27`	`for` `(i=my_rank;i<n;i+=num_procs){`

`28`	`local=width*((` `double` `)i+0.5);`

`29`	`sum+=4.0/(1.0+local*local);`

`31`	`mypi=width*sum;`

`32`	`MPI_Reduce(&mypi,&pi,1,MPI_DOUBLE,MPI_SUM,0,MPI_COMM_WORLD);` `//由程序0進行歸約，把每個程序計算出來的mypi進行相加（MPI_SUM）,賦給pi`

`33`	`if` `(my_rank==0){`

`34`	`printf` `(` `"PI is %.20f/n"` `,pi);`

`35`	`stop=MPI_Wtime();`

`36`	`printf` `(` `"Time: %f/n"` `,stop-start);`

`37`	`fflush` `(stdout);`

`39`	`MPI_Finalize();`

`40`	`return` `0;`

[email protected]:~/Program$ mpdboot

[email protected]:~/Program$ mpicc -o PI3 PI3.c

[email protected]:~/Program$ mpirun -np 4 ./PI3

Processor 0 of 4 on orisun-desktop

please give n=Processor 2 of 4 on orisun-desktop

Processor 1 of 4 on orisun-desktop

Processor 3 of 4 on orisun-desktop

1000000

PI is 3.14159465358887635134

Time: 0.012510

[email protected]:~/Program$ mpdcleanup

時間是0.01251秒，比0.02秒明顯減少。

注意輸出中有這麼一行：please give n=Processor 2 of 4 on orisun-desktop

這說明是我們不能保證代碼中的18行和20行的執行順序。

Reference: http://www.cnblogs.com/zhangchaoyang/articles/1871168.html

Compute PI in parallel

繼續閱讀

Java中Integer和int的差別

最新2007年OWASP十大Web資安漏洞 (2007 OWASP Top 10)

Flash AS3 垃圾回收機制詳解

[分享] 英文寫作基本注意事項

object-c 記憶體管理學習筆記

Replace Data Value with Object(以對象取代資料值)

Intel® 64 and IA-32 Architectures Software Developer's Manuals

關于Flex的一些學習資料

Common Questions Of Gtk+

Learning Perl: 1.3. How Can I Get Perl?

Partitioning Concepts

JVM 記憶體模型和垃圾回收（二）：串行垃圾回收器

兩種清除浮動的方法

effective c++ 第三版讀書筆記1

Compile workrave under windows &ndash; My exprience 在Windows上編譯Workrave

什麼是BNF範式