
NUMA and node interleaving

Chris Linton-Ford writes:

> "Enable Node Interleaving": initially disabled. When turned on, the
> operating system booted, but I got the following error message:
> "MPO disabled because memory is interleaved"

You definitely don't want "Node Interleaving." As a BIOS option, it wins a special award for being the most misleading misfeature since plug-n-play.

Opteron systems have memory attached to each CPU, but a single global address space. When you access memory that's not local to your CPU, you incur a penalty for going over the internal HyperTransport link between the CPUs.
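To make "memory attached to each CPU, but a global address space" concrete, here is a toy model. It is a sketch under stated assumptions, not how any real BIOS or OS describes the layout: a hypothetical four-CPU machine where each CPU owns one contiguous 1 GiB slice of physical memory. The owning node is then just the address divided by the slice size, and any access from another CPU pays a remote penalty. `NUM_NODES`, `NODE_SIZE`, and `REMOTE_PENALTY` are made-up illustrative values, not Opteron measurements. The model also treats every remote access as one flat penalty, although on larger boxes the cost grows with the number of hops.

```python
# Toy model of a NUMA machine with one global physical address space.
# Assumptions (hypothetical): 4 nodes, each CPU owns one contiguous 1 GiB slice,
# and every remote access costs the same flat multiplier.
NUM_NODES = 4
NODE_SIZE = 1 << 30      # bytes of memory attached to each CPU
LOCAL_COST = 1.0         # relative cost of a local access (illustrative)
REMOTE_PENALTY = 1.5     # hypothetical multiplier for crossing HyperTransport


def home_node(addr: int) -> int:
    """Return the node whose memory controller owns physical address addr."""
    if not 0 <= addr < NUM_NODES * NODE_SIZE:
        raise ValueError("address outside installed memory")
    return addr // NODE_SIZE


def access_cost(cpu: int, addr: int) -> float:
    """Relative cost for `cpu` to touch `addr`: cheap if local, penalized if remote."""
    if home_node(addr) == cpu:
        return LOCAL_COST
    return LOCAL_COST * REMOTE_PENALTY


if __name__ == "__main__":
    addr = 3 * NODE_SIZE + 4096      # somewhere in node 3's slice
    print(home_node(addr))           # 3
    print(access_cost(3, addr))      # 1.0 (local)
    print(access_cost(0, addr))      # 1.5 (remote)
```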

On OSes (such as Solaris) that are aware of NUMA hardware, that's not a problem. The OS automatically optimizes the allocation of memory based on where the task is running (or vice-versa), so that you get good locality.
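A minimal sketch of the "allocate memory based on where the task is running" half of that sentence, assuming a simplified local-first policy: pages come from the requesting CPU's own node and spill over to other nodes only when it runs out. The class name `LocalFirstAllocator` and the fallback order (remaining nodes by index) are hypothetical illustrations, not Solaris MPO's actual implementation.

```python
# Sketch of a "place memory near the task" policy, in the spirit of what a
# NUMA-aware OS does. Hypothetical: per-node sizes in pages, fallback by node index.
from typing import Dict, List


class LocalFirstAllocator:
    """Hand out pages from the requesting CPU's own node, spilling over only when it is full."""

    def __init__(self, pages_per_node: List[int]) -> None:
        self.free: List[int] = list(pages_per_node)   # free page count per node

    def allocate(self, cpu: int, npages: int) -> Dict[int, int]:
        """Allocate npages for a task running on `cpu`; return {node: pages taken}."""
        if npages > sum(self.free):
            raise MemoryError("not enough free pages")
        placement: Dict[int, int] = {}
        # Local node first, then the other nodes in index order.
        order = [cpu] + [n for n in range(len(self.free)) if n != cpu]
        remaining = npages
        for node in order:
            if remaining == 0:
                break
            take = min(remaining, self.free[node])
            if take:
                placement[node] = take
                self.free[node] -= take
                remaining -= take
        return placement


if __name__ == "__main__":
    alloc = LocalFirstAllocator([100, 100])
    print(alloc.allocate(1, 30))   # {1: 30}: fits entirely on the local node
    print(alloc.allocate(1, 80))   # {1: 70, 0: 10}: local node fills up, rest spills to node 0
```

The "or vice-versa" half, keeping a thread scheduled on the node where its memory already lives, is the scheduler's side of the same bargain and is not modeled here.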

On OSes that are ignorant of this sort of hardware, naive testing can produce strange results. You run your test once and it runs fast. Run it again, and it's slow. Run it again, and it's fast again. It all depends on where Wind^Wthat OS puts your application, and it's unpredictable.
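The fast/slow/fast pattern can be reproduced with a small simulation. This is a toy under loudly hypothetical assumptions: on each run a NUMA-unaware OS picks a CPU for the process and, independently, a node for its memory; each page touched costs 1.0 if local and `REMOTE_PENALTY` otherwise. The function names and all numbers are illustrative, not measurements.

```python
# Toy illustration of why benchmarks on a NUMA-unaware OS look erratic.
# Hypothetical model: each run, CPU and memory node are chosen independently at random.
import random
from typing import List

REMOTE_PENALTY = 1.5   # illustrative multiplier, not a measured figure


def run_benchmark(rng: random.Random, num_nodes: int, pages: int) -> float:
    """Return the simulated run time of one benchmark run."""
    cpu = rng.randrange(num_nodes)        # where the scheduler happened to put us
    mem_node = rng.randrange(num_nodes)   # where the allocator happened to put our data
    per_page = 1.0 if cpu == mem_node else REMOTE_PENALTY
    return pages * per_page


def repeated_runs(runs: int, num_nodes: int = 2, pages: int = 1000, seed: int = 1) -> List[float]:
    """Run the simulated benchmark `runs` times with a fixed seed for repeatability."""
    rng = random.Random(seed)
    return [run_benchmark(rng, num_nodes, pages) for _ in range(runs)]


if __name__ == "__main__":
    # On two nodes every run is either 1000.0 (all local, fast) or 1500.0 (all remote, slow).
    print(repeated_runs(6))
```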

This is where "Node Interleaving" comes in. Turning that misfeature on causes memory to be mapped page-by-page in a round-robin fashion among the CPUs instead of being contiguous. In other words, at every 4K boundary you move on to the next CPU's memory. This averages out the costs, making everything run equally poorly, and makes your naive OS and benchmarks seem stable.
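Here is a sketch of the page-by-page round-robin mapping and the averaging it produces, assuming a hypothetical 4-node box with 4 KiB pages and the same illustrative flat remote penalty p = 1.5. With N nodes, a thread on any CPU finds only 1/N of a contiguous buffer local, so its mean cost is (1 + (N-1)p)/N = (1 + 3 × 1.5)/4 = 1.375, compared with 1.0 for fully local placement and 1.5 for fully remote placement under the same assumptions. The names and numbers are illustrative only.

```python
# Sketch of page-granular node interleaving (hypothetical 4-node box, 4 KiB pages).
PAGE_SIZE = 4096
NUM_NODES = 4
REMOTE_PENALTY = 1.5   # illustrative multiplier, not an Opteron measurement


def interleaved_node(addr: int) -> int:
    """With interleaving on, consecutive 4 KiB pages rotate round-robin across nodes."""
    return (addr // PAGE_SIZE) % NUM_NODES


def mean_cost(cpu: int, start: int, npages: int) -> float:
    """Average relative cost for `cpu` to touch npages contiguous pages from `start`."""
    total = 0.0
    for i in range(npages):
        node = interleaved_node(start + i * PAGE_SIZE)
        total += 1.0 if node == cpu else REMOTE_PENALTY
    return total / npages


if __name__ == "__main__":
    for cpu in range(NUM_NODES):
        # Every CPU sees the same 1.375: equally poor, hence "stable" benchmarks.
        print(cpu, mean_cost(cpu, 0, 1024))
```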

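The OS can't make sense of that sort of addressing mess: no stretch of memory larger than a single page belongs to any one CPU, so there is no locality left to optimize, and Solaris turns MPO (Memory Placement Optimization) off. You don't want this on Solaris.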
