How to Reasonably Estimate Thread Pool Size?

Thanks to reader 【蔣小強】 for contributing this article.

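This question looks small, but it is not that easy to answer. If you know a better method, please share it. Let's start with a naive estimate. Suppose a system must sustain a TPS (Transactions Per Second, or Tasks Per Second) of at least 20, each transaction is handled by a single thread, and a thread takes 4 s on average to process one transaction. The question then becomes:

How large must the thread pool be so that the system can complete 20 transactions per second?

The calculation is simple: each thread has a throughput of 0.25 TPS, so reaching 20 TPS takes 20 / 0.25 = 80 threads.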
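The arithmetic above can be written down in a few lines of Java. This is only a sketch of the naive estimate; the names `NaiveEstimate` and `threadsNeeded` are made up for illustration and do not come from the original article.

```java
class NaiveEstimate {

    /**
     * Naive estimate: threads = target TPS * seconds one thread needs per transaction.
     * Equivalent to targetTps / (1 / secondsPerTransaction), i.e. 20 / 0.25 in the text.
     */
    static int threadsNeeded(double targetTps, double secondsPerTransaction) {
        return (int) Math.ceil(targetTps * secondsPerTransaction);
    }

    public static void main(String[] args) {
        // 20 TPS with 4 s per transaction
        System.out.println(threadsNeeded(20, 4.0)); // prints 80
    }
}
```

Note that the estimate takes no hardware parameter at all, which is exactly its weakness.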

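Clearly this estimate is naive, because it ignores the number of CPUs. A typical server has 16 or 32 CPU cores; with 80 threads, it would incur a lot of unnecessary thread context-switching overhead.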

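Here is a second method, simple but of uncertain validity (N is the total number of CPU cores):

If the application is CPU-bound, set the thread pool size to N+1
If the application is IO-bound, set the thread pool size to 2N+1

If a server hosts only this one application, and the application has only this one thread pool, this estimate may be reasonable, but you still need to test and verify it yourself.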
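As a rough illustration, the rule of thumb can be computed from the core count that the JVM reports. The class and method names below (`RuleOfThumb`, `cpuBound`, `ioBound`) are hypothetical; only `Runtime.availableProcessors()` is a real JDK call.

```java
class RuleOfThumb {

    /** CPU-bound workload: N + 1 threads. */
    static int cpuBound(int cores) {
        return cores + 1;
    }

    /** IO-bound workload: 2N + 1 threads. */
    static int ioBound(int cores) {
        return 2 * cores + 1;
    }

    public static void main(String[] args) {
        // Number of processors available to this JVM; may differ from the physical core count.
        int n = Runtime.getRuntime().availableProcessors();
        System.out.println("CPU-bound pool size: " + cpuBound(n));
        System.out.println("IO-bound pool size:  " + ioBound(n));
    }
}
```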

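Next, I found an estimation formula in the document 伺服器性能IO優化 ("Server Performance IO Optimization"):

Optimal number of threads = ((thread wait time + thread CPU time) / thread CPU time) * number of CPUs

For example, if each thread spends on average 0.5 s running on the CPU and 1.5 s waiting (non-CPU time, such as IO), and there are 8 CPU cores, the formula gives ((0.5 + 1.5) / 0.5) * 8 = 32. The formula can be rewritten as:

Optimal number of threads = (ratio of thread wait time to thread CPU time + 1) * number of CPUs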

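This leads to a conclusion:

The larger the share of time a thread spends waiting, the more threads are needed. The larger the share of time a thread spends on the CPU, the fewer threads are needed.

The previous method agrees with this conclusion: it gives IO-bound applications the larger pool.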
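The formula and the worked example (0.5 s CPU, 1.5 s wait, 8 cores) can be checked with a small sketch. The names `WaitComputeFormula` and `optimalThreads` are made up for this example, and rounding to the nearest integer is an assumption the formula itself does not specify.

```java
class WaitComputeFormula {

    /** (wait time / CPU time + 1) * number of CPUs, rounded to the nearest integer. */
    static long optimalThreads(double waitSeconds, double cpuSeconds, int cpuCount) {
        return Math.round((waitSeconds / cpuSeconds + 1.0) * cpuCount);
    }

    public static void main(String[] args) {
        System.out.println(optimalThreads(1.5, 0.5, 8)); // prints 32
        // A larger share of waiting calls for more threads:
        System.out.println(optimalThreads(4.5, 0.5, 8)); // prints 80
        // No waiting at all (pure CPU work) collapses to one thread per core:
        System.out.println(optimalThreads(0.0, 0.5, 8)); // prints 8
    }
}
```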

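The CPU is the fastest part of a system, so the CPU sets the upper bound on system throughput. Increasing CPU processing power raises that upper bound. But by the bucket effect (a system is limited by its weakest component), real throughput cannot be calculated from the CPU alone. To improve throughput, start from the system's weak points (such as network latency and IO):

Parallelize the bottleneck operations as much as possible, for example with multithreaded downloading
Strengthen the bottleneck itself, for example by replacing IO with NIO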

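The first point relates to Amdahl's law, which defines the speedup obtained by parallelizing a serial system:

Speedup = system time before optimization / system time after optimization

The larger the speedup, the more effective the parallelization. Amdahl's law also relates the degree of parallelism, the number of CPUs, and the speedup. With speedup Speedup, serial fraction F (the fraction of code that must run serially), and N CPUs:

Speedup <= 1 / (F + (1-F)/N)

When N is large enough, the bound approaches 1/F, so the smaller the serial fraction F, the larger the speedup.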
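To get a feel for the bound, it can be evaluated for a few values of F and N. The sketch below is only an illustration; `Amdahl` and `maxSpeedup` are hypothetical names.

```java
class Amdahl {

    /** Upper bound on speedup: 1 / (F + (1 - F) / N). */
    static double maxSpeedup(double serialFraction, int cpus) {
        return 1.0 / (serialFraction + (1.0 - serialFraction) / cpus);
    }

    public static void main(String[] args) {
        // With 10% serial code, adding CPUs helps less and less; the limit is 1 / 0.1 = 10.
        for (int n : new int[] {1, 2, 8, 32, 1024}) {
            System.out.printf("F=0.10, N=%d -> speedup <= %.2f%n", n, maxSpeedup(0.10, n));
        }
        // Fully parallel code (F = 0) scales linearly with N.
        System.out.println(maxSpeedup(0.0, 8)); // prints 8.0
    }
}
```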

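At this point a question occurred to me.

Is using a thread pool always more efficient than using a single thread?

The answer is no. Redis, for example, is single-threaded yet very efficient: basic operations reach on the order of 100,000 per second. From a threading perspective, part of the reason is:

Multithreading incurs thread context-switching overhead; a single thread does not
Locks: a single thread does not need to lock shared data, so it avoids lock contention

Of course, the more fundamental reason Redis is fast is that nearly all of its operations are in memory, and in that case a single thread can use the CPU very efficiently. Multithreading generally suits workloads with a substantial share of IO and network operations.

So even though the simple estimation methods above may look reasonable, they may not be in practice. You need to combine them with the system's actual characteristics (IO-bound, CPU-bound, or purely in-memory) and the hardware environment (CPU, memory, disk read/write speed, network conditions, and so on), and keep experimenting until you reach an estimate that fits reality.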

Finally, here is a "Dark Magic" estimation method (so called because I have not yet figured out how it works). It uses the following class:

package pool_size_calculate;

import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.BlockingQueue;

/**
 * A class that calculates the optimal thread pool boundaries. It takes the
 * desired target utilization and the desired work queue memory consumption as
 * input and returns thread count and work queue capacity.
 *
 * @author Niklas Schlimm
 */
public abstract class PoolSizeCalculator {

/**
 * The sample queue size to calculate the size of a single {@link Runnable}
 * element.
 */
private final int SAMPLE_QUEUE_SIZE = 1000;

/**
 * Accuracy of test run. It must finish within 20ms of the testTime
 * otherwise we retry the test. This could be configurable.
 */
private final int EPSYLON = 20;

/**
 * Control variable for the CPU time investigation.
 */
private volatile boolean expired;

/**
 * Time (millis) of the test run in the CPU time calculation.
 */
private final long testtime = 3000;

/**
 * Calculates the boundaries of a thread pool for a given {@link Runnable}.
 *
 * @param targetUtilization
 *            the desired utilization of the CPUs
 *            ({@code 0 <= targetUtilization <= 1})
 * @param targetQueueSizeBytes
 *            the desired maximum work queue size of the thread pool (bytes)
 */
protected void calculateBoundaries(BigDecimal targetUtilization,
		BigDecimal targetQueueSizeBytes) {
	calculateOptimalCapacity(targetQueueSizeBytes);
	Runnable task = creatTask();
	start(task);
	start(task); // warm up phase
	long cputime = getCurrentThreadCPUTime();
	start(task); // test interval
	cputime = getCurrentThreadCPUTime() - cputime;
	long waittime = (testtime * 1000000) - cputime;
	calculateOptimalThreadCount(cputime, waittime, targetUtilization);
}

private void calculateOptimalCapacity(BigDecimal targetQueueSizeBytes) {
	long mem = calculateMemoryUsage();
	BigDecimal queueCapacity = targetQueueSizeBytes.divide(new BigDecimal(
			mem), RoundingMode.HALF_UP);
	System.out.println("Target queue memory usage (bytes): "
			+ targetQueueSizeBytes);
	System.out.println("createTask() produced "
			+ creatTask().getClass().getName() + " which took " + mem
			+ " bytes in a queue");
	System.out.println("Formula: " + targetQueueSizeBytes + " / " + mem);
	System.out.println("* Recommended queue capacity (bytes): "
			+ queueCapacity);
}

/**
 * Brian Goetz' optimal thread count formula, see 'Java Concurrency in
 * Practice' (chapter 8.2)
 *
 * @param cpu
 *            cpu time consumed by considered task
 * @param wait
 *            wait time of considered task
 * @param targetUtilization
 *            target utilization of the system
 */
private void calculateOptimalThreadCount(long cpu, long wait,
		BigDecimal targetUtilization) {
	BigDecimal waitTime = new BigDecimal(wait);
	BigDecimal computeTime = new BigDecimal(cpu);
	BigDecimal numberOfCPU = new BigDecimal(Runtime.getRuntime()
			.availableProcessors());
	BigDecimal optimalthreadcount = numberOfCPU.multiply(targetUtilization)
			.multiply(
					new BigDecimal(1).add(waitTime.divide(computeTime,
							RoundingMode.HALF_UP)));
	System.out.println("Number of CPU: " + numberOfCPU);
	System.out.println("Target utilization: " + targetUtilization);
	System.out.println("Elapsed time (nanos): " + (testtime * 1000000));
	System.out.println("Compute time (nanos): " + cpu);
	System.out.println("Wait time (nanos): " + wait);
	System.out.println("Formula: " + numberOfCPU + " * "
			+ targetUtilization + " * (1 + " + waitTime + " / "
			+ computeTime + ")");
	System.out.println("* Optimal thread count: " + optimalthreadcount);
}

/**
 * Runs the {@link Runnable} over a period defined in {@link #testtime}.
 * Based on Heinz Kabutz' ideas
 * (http://www.javaspecialists.eu/archive/Issue124.html).
 *
 * @param task
 *            the runnable under investigation
 */
public void start(Runnable task) {
	long start = 0;
	int runs = 0;
	do {
		if (++runs > 5) {
			throw new IllegalStateException("Test not accurate");
		}
		expired = false;
		start = System.currentTimeMillis();
		Timer timer = new Timer();
		timer.schedule(new TimerTask() {
			public void run() {
				expired = true;
			}
		}, testtime);
		while (!expired) {
			task.run();
		}
		start = System.currentTimeMillis() - start;
		timer.cancel();
	} while (Math.abs(start - testtime) > EPSYLON);
	collectGarbage(3);
}

private void collectGarbage(int times) {
	for (int i = 0; i < times; i++) {
		System.gc();
		try {
			Thread.sleep(10);
		} catch (InterruptedException e) {
			Thread.currentThread().interrupt();
			break;
		}
	}
}

/**
 * Calculates the memory usage of a single element in a work queue. Based on
 * Heinz Kabutz' ideas
 * (http://www.javaspecialists.eu/archive/Issue029.html).
 *
 * @return memory usage of a single {@link Runnable} element in the thread
 *         pools work queue
 */
public long calculateMemoryUsage() {
	BlockingQueue queue = createWorkQueue();
	for (int i = 0; i < SAMPLE_QUEUE_SIZE; i++) {
		queue.add(creatTask());
	}
	long mem0 = Runtime.getRuntime().totalMemory()
			- Runtime.getRuntime().freeMemory();
	long mem1 = Runtime.getRuntime().totalMemory()
			- Runtime.getRuntime().freeMemory();
	queue = null;
	collectGarbage(15);
	mem0 = Runtime.getRuntime().totalMemory()
			- Runtime.getRuntime().freeMemory();
	queue = createWorkQueue();
	for (int i = 0; i < SAMPLE_QUEUE_SIZE; i++) {
		queue.add(creatTask());
	}
	collectGarbage(15);
	mem1 = Runtime.getRuntime().totalMemory()
			- Runtime.getRuntime().freeMemory();
	return (mem1 - mem0) / SAMPLE_QUEUE_SIZE;
}

/**
 * Create your runnable task here.
 *
 * @return an instance of your runnable task under investigation
 */
protected abstract Runnable creatTask();

/**
 * Return an instance of the queue used in the thread pool.
 *
 * @return queue instance
 */
protected abstract BlockingQueue createWorkQueue();

/**
 * Calculate current cpu time. Various frameworks may be used here,
 * depending on the operating system in use. (e.g.
 * http://www.hyperic.com/products/sigar). The more accurate the CPU time
 * measurement, the more accurate the results for thread count boundaries.
 *
 * @return current cpu time of current thread
 */
protected abstract long getCurrentThreadCPUTime();

}
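Reading the code, the class appears to do three things: it measures how much CPU time the calling thread consumes while running the task for a fixed period, treats the rest of the elapsed time as wait time, and plugs both into N * U * (1 + W/C), the formula its Javadoc attributes to Brian Goetz. The sketch below reduces that idea to a few lines. It is not the author's class: `CpuWaitProbe`, `threadCount` and `sampleTask` are hypothetical names, and the task (a small loop plus a 20 ms sleep standing in for IO) is an assumption chosen for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

class CpuWaitProbe {

    /** N * U * (1 + W/C); W/C is rounded to a whole number first, as the class above does. */
    static long threadCount(int cpus, double utilization, long waitNanos, long cpuNanos) {
        if (cpuNanos <= 0) {
            throw new IllegalArgumentException("cpuNanos must be positive");
        }
        long ratio = Math.round((double) waitNanos / cpuNanos);
        return Math.round(cpus * utilization * (1 + ratio));
    }

    /** Hypothetical task: a little computation followed by a simulated 20 ms wait. */
    static long sampleTask() throws InterruptedException {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += i % 7;
        }
        Thread.sleep(20); // stands in for IO or network latency
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long wallStart = System.nanoTime();
        long cpuStart = bean.getCurrentThreadCpuTime(); // nanoseconds of CPU used by this thread
        long sink = 0;
        for (int i = 0; i < 50; i++) {
            sink += sampleTask();
        }
        long cpu = Math.max(1, bean.getCurrentThreadCpuTime() - cpuStart);
        long wait = Math.max(0, (System.nanoTime() - wallStart) - cpu);
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("CPU ns: " + cpu + ", wait ns: " + wait + " (checksum " + sink + ")");
        System.out.println("Suggested threads: " + threadCount(cpus, 1.0, wait, cpu));
    }
}
```

Because the task spends most of its time sleeping, the W/C ratio is large and the suggested thread count comes out far above the core count. Note that `getCurrentThreadCpuTime()` is not supported on every JVM, so the numbers depend on the platform.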

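Then extend this abstract class yourself and implement its three abstract methods. Below is an example I wrote (the task requests data over the network). I set the target CPU utilization to 1.0 (that is, 100%) and cap the total size of the task queue at 100,000 bytes: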

package pool_size_calculate;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.lang.management.ManagementFactory;
import java.math.BigDecimal;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SimplePoolSizeCaculatorImpl extends PoolSizeCalculator {

@Override
protected Runnable creatTask() {
	return new AsyncIOTask();
}

@Override
protected BlockingQueue<Runnable> createWorkQueue() {
	return new LinkedBlockingQueue<Runnable>(1000);
}

@Override
protected long getCurrentThreadCPUTime() {
	// CPU time consumed by the calling thread, in nanoseconds
	return ManagementFactory.getThreadMXBean().getCurrentThreadCpuTime();
}

public static void main(String[] args) {
	PoolSizeCalculator poolSizeCalculator = new SimplePoolSizeCaculatorImpl();
	poolSizeCalculator.calculateBoundaries(new BigDecimal(1.0), new BigDecimal(100000));
}

}


/**
 * Custom asynchronous IO task.
 *
 * @author Will
 */
class AsyncIOTask implements Runnable {

@Override
public void run() {
	HttpURLConnection connection = null;
	BufferedReader reader = null;
	try {
		String getURL = "http://baidu.com";
		URL getUrl = new URL(getURL);

		connection = (HttpURLConnection) getUrl.openConnection();
		connection.connect();
		reader = new BufferedReader(new InputStreamReader(
				connection.getInputStream()));

		String line;
		while ((line = reader.readLine()) != null) {
			// read and discard the response body
		}
	} catch (IOException e) {
		// ignored in this example
	} finally {
		if (reader != null) {
			try {
				reader.close();
			} catch (Exception e) {
				// ignore failures while closing
			}
		}
		// connection is still null if opening it failed
		if (connection != null) {
			connection.disconnect();
		}
	}
}

}

The output is as follows:

Target queue memory usage (bytes): 100000
createTask() produced pool_size_calculate.AsyncIOTask which took 40 bytes in a queue
Formula: 100000 / 40
* Recommended queue capacity (bytes): 2500
Number of CPU: 4
Target utilization: 1
Elapsed time (nanos): 3000000000
Compute time (nanos): 47181000
Wait time (nanos): 2952819000
Formula: 4 * 1 * (1 + 2952819000 / 47181000)
* Optimal thread count: 256

The recommended task queue size is 2500 and the thread count is 256, which is somewhat unexpected. I can construct a thread pool as follows:

ThreadPoolExecutor pool =
 new ThreadPoolExecutor(256, 256, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>(2500));
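One practical consequence of a bounded queue is that submissions are rejected once all threads are busy and the queue is full; with the default `AbortPolicy`, `execute` then throws `RejectedExecutionException`. The sketch below shows one way to handle that, using the JDK's `CallerRunsPolicy` so that the submitting thread runs the overflow itself. The choice of policy, and the names `SizedPoolDemo`, `newPool` and `runTasks`, are assumptions for illustration and not part of the original article.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SizedPoolDemo {

    /** Fixed-size pool with a bounded queue; overflow tasks run on the submitting thread. */
    static ThreadPoolExecutor newPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    /** Submits taskCount trivial tasks and returns how many completed. */
    static int runTasks(int threads, int queueCapacity, int taskCount) {
        ThreadPoolExecutor pool = newPool(threads, queueCapacity);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            pool.execute(completed::incrementAndGet);
        }
        pool.shutdown(); // already queued tasks still run
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            pool.shutdownNow();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        // Sizes taken from the calculator output above: 256 threads, queue capacity 2500.
        System.out.println(runTasks(256, 2500, 10_000)); // prints 10000
    }
}
```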

Original article. When reposting, please credit the source: 并發程式設計網 (ifeve.com). Link to this article: How to Reasonably Estimate Thread Pool Size?

Original URL: http://ifeve.com/how-to-calculate-threadpool-size/
