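How to Count the Number of 1s in a Binary Representation

Preface

I had barely opened "The Beauty of Programming" (《编程之美》) when I ran into a problem that looked very familiar: "count the number of 1s in a binary number". The book states the problem as follows:

Given a one-byte (8-bit) unsigned integer variable, find the number of 1s in its binary representation, making the algorithm as efficient as possible.

This is, of course, a fairly simple problem, and I have also solved it on LeetCode: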

Number of 1 Bits: https://leetcode.com/problems/number-of-1-bits/

Write a function that takes an unsigned integer and returns the number of '1' bits it has (also known as the Hamming weight).

For example, the 32-bit integer 11 has binary representation “00000000000000000000000000001011”, so the function should return 3.

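The problem itself is simple, but it is a classic with many different solutions, so let's go through every approach (that I know of).

Analysis and Solutions

The first approach that comes to mind is probably to keep dividing by 2 and count as we go. An example first: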

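Take 10100010 (binary), whose value is 162 (decimal):

1st division by 2: quotient 1010001 (81), remainder 0;

2nd division by 2: quotient 101000 (40), remainder 1;

3rd division by 2: quotient 10100 (20), remainder 0;

4th division by 2: quotient 1010 (10), remainder 0;

5th division by 2: quotient 101 (5), remainder 0;

6th division by 2: quotient 10 (2), remainder 1;

7th division by 2: quotient 1 (1), remainder 0;

8th division by 2: quotient 0, remainder 1.

Notice that repeatedly dividing a number by 2 amounts to "dropping" bits one at a time, starting from the lowest. If the lowest bit is 1, the division leaves a remainder of 1; otherwise the remainder is 0. So we can count the 1s by repeatedly dividing by 2 and checking the value modulo 2.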

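#include <cstdint>

class Solution {
public:
    int hammingWeight(uint32_t n) {
        int count = 0;
        while (n) {
            // A remainder of 1 means the lowest bit is set.
            if (n % 2 == 1) {
                count++;
            }
            // Dividing by 2 drops the lowest bit.
            n /= 2;
        }
        return count;
    }
};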

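That idea is a bit "naive", though. Since we are working on the binary representation, we can use bit operations directly. We can keep shifting right by one bit, but how do we tell whether the lowest bit is 1? We AND the number with 0x1. This is more direct than the first solution, and bit operations are generally cheaper than division and remainder.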

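#include <cstdint>

class Solution {
public:
    int hammingWeight(uint32_t n) {
        int count = 0;
        while (n) {
            count += n & 1;  // add the lowest bit (0 or 1)
            n >>= 1;         // shift the next bit into the lowest position
        }
        return count;
    }
};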

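Even so, the time complexity of this solution is still O(log n). Here log means the base-2 logarithm, so log n is simply the number of bits in the binary representation of n. "The Beauty of Programming" then asks: can we make the complexity depend only on the number of 1s? It turns out this does not take a particularly complicated idea. The book gives the following example: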

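First consider the case of a single 1: how do we decide whether a given binary number contains exactly one 1? Take n = 01000000. For this n we can perform an AND, 01000000 & 00111111, and a result of 0 tells us that there is only one 1. The same operation can be written as n & (n-1).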
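The single-bit check described above could be sketched roughly as follows. The function name hasExactlyOneBit is made up for this illustration and does not come from the book. The sketch also adds an explicit n != 0 test, because 0 & (0 - 1) is 0 as well, even though zero contains no 1 at all.

```cpp
#include <cstdint>

// Hypothetical helper (the name is chosen for this sketch only):
// returns true when n has exactly one bit set, i.e. n is a power of two.
bool hasExactlyOneBit(uint32_t n) {
    // n & (n - 1) clears the lowest set bit. If nothing is left,
    // n had at most one set bit; the n != 0 test rules out zero.
    return n != 0 && (n & (n - 1)) == 0;
}
```

For example, hasExactlyOneBit(0x40) is true for the 01000000 case in the text, while hasExactlyOneBit(0x24) (00100100) is false.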

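Now consider the case of two 1s, say n = 00100100. Then n - 1 = 00100011 and n & (n-1) = 00100000, and in turn 00100000 & (00100000 - 1) = 0. Each application of n & (n-1) clears exactly the lowest remaining 1.

Based on this analysis, we can write the following code: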

#include <cstdint>

class Solution {
public:
    int hammingWeight(uint32_t n) {
        int count = 0;
        // Each iteration clears the lowest set bit of n.
        for (; n != 0; count++) {
            n = n & (n - 1);
        }
        return count;
    }
};

Clearly, if the binary representation of the given n contains m 1s, the loop runs exactly m times, so we now have a faster algorithm.
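To make the difference concrete, here is a small sketch that only counts loop iterations for the two strategies. The function names are invented for this comparison; they mirror the shift-and-test loop and the n & (n - 1) loop discussed above but are not part of the original solutions.

```cpp
#include <cstdint>

// Iterations of the shift-and-test loop: one per bit position,
// up to and including the highest set bit.
int shiftLoopIterations(uint32_t n) {
    int iterations = 0;
    while (n) {
        n >>= 1;
        ++iterations;
    }
    return iterations;
}

// Iterations of the n & (n - 1) loop: one per set bit.
int clearLowestBitIterations(uint32_t n) {
    int iterations = 0;
    while (n) {
        n &= n - 1;  // clear the lowest set bit
        ++iterations;
    }
    return iterations;
}
```

For n = 0x80000000, which has a single 1 in the top position, the first loop needs 32 iterations and the second needs only 1. For a value whose bits are all 1, both loops do the same amount of work.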

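The book also gives two more solutions specific to 8-bit numbers. One uses a switch statement, which is effectively a hard-coded table but rather clumsy. Following the same idea of trading space for time, we arrive at the other: store the known results in an array of size 256 and simply look up the answer. In terms of time complexity alone, this is the fastest of the lot, since each query is a single array access.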
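The table-lookup idea could be sketched like this. The book's own listing is not reproduced here, so everything below is an assumption of this sketch: the names buildBitCountTable, countBits8 and countBits32, the choice to fill the table at first use with the recurrence bits(i) = bits(i / 2) + (i & 1), and the extension to 32 bits by looking up each byte separately.

```cpp
#include <array>
#include <cstdint>

// Build a 256-entry table where table[i] is the number of 1 bits in i.
// Assumption of this sketch: the table is filled with the recurrence
// bits(i) = bits(i / 2) + (i & 1) instead of being written out by hand.
std::array<uint8_t, 256> buildBitCountTable() {
    std::array<uint8_t, 256> table{};  // table[0] starts at 0
    for (int i = 1; i < 256; ++i) {
        table[i] = static_cast<uint8_t>(table[i >> 1] + (i & 1));
    }
    return table;
}

// 8-bit case: a single table lookup.
int countBits8(uint8_t n) {
    static const std::array<uint8_t, 256> table = buildBitCountTable();
    return table[n];
}

// 32-bit case: one lookup per byte, then add the four results.
int countBits32(uint32_t n) {
    return countBits8(static_cast<uint8_t>(n & 0xff)) +
           countBits8(static_cast<uint8_t>((n >> 8) & 0xff)) +
           countBits8(static_cast<uint8_t>((n >> 16) & 0xff)) +
           countBits8(static_cast<uint8_t>(n >> 24));
}
```

With this sketch, countBits8(162) gives 3 for the 10100010 example used earlier, and countBits32(11) gives 3 for the LeetCode example.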

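The book also mentions an extension problem: given two positive integers A and B, how many bits must be changed to turn A into B? In other words, in how many bit positions do the binary representations of A and B differ?

This problem is actually very simple. Suppose there were a number C whose count of 1s is exactly the number of positions in which A and B differ. Then we would only need to count the 1s in C, and the approaches above already do that. So how do we get C? That is not hard either, and bit operations give the most direct answer: C = A ^ B. A single XOR does it.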

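#include <cstdint>

uint32_t cal(uint32_t a, uint32_t b) {
  uint32_t c = a ^ b;  // bits set in c mark the positions where a and b differ
  int num = 0;
  while (c) {
    c &= (c - 1);      // clear the lowest set bit
    num++;
  }
  return num;
}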

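But there is more to dig into here.

LeetCode hints that this problem is related to the "Hamming weight". What is that? "The Hamming weight of a string is the number of non-zero symbols in it. It is therefore equal to the Hamming distance from the all-zero string of the same length. In the most common case, a string of data bits, it is the number of 1s." So "counting the 1s in a binary number" is really computing the Hamming weight, and the book's extension problem is computing the Hamming distance between two bit strings.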
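The definition quoted above can be illustrated on ordinary strings. This is only a sketch: the function names hammingDistance and hammingWeight are chosen here for illustration, the bit string is assumed to be written with the characters '0' and '1', and throwing on unequal lengths is one possible choice among several.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hamming distance: the number of positions at which two strings of
// equal length differ.
std::size_t hammingDistance(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) {
        throw std::invalid_argument("strings must have the same length");
    }
    std::size_t distance = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] != b[i]) {
            ++distance;
        }
    }
    return distance;
}

// Hamming weight of a '0'/'1' string, computed literally from the
// definition: its distance from the all-zero string of the same length.
std::size_t hammingWeight(const std::string& bits) {
    return hammingDistance(bits, std::string(bits.size(), '0'));
}
```

For instance, hammingWeight("00001011") is 3, matching the LeetCode example for the integer 11, and hammingDistance("10100010", "10100000") is 1.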

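Wikipedia shows a rather ingenious approach (the same idea also appears in the top-voted answers in the LeetCode discussion board). It is essentially a divide-and-conquer idea: count the 1s within small groups of bits, then repeatedly add neighboring groups together. The code below is the version Wikipedia gives for 64-bit numbers: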

#include <cstdint>

const uint64_t m1  = 0x5555555555555555; //binary: 0101...
const uint64_t m2  = 0x3333333333333333; //binary: 00110011..
const uint64_t m4  = 0x0f0f0f0f0f0f0f0f; //binary:  4 zeros,  4 ones ...
const uint64_t m8  = 0x00ff00ff00ff00ff; //binary:  8 zeros,  8 ones ...
const uint64_t m16 = 0x0000ffff0000ffff; //binary: 16 zeros, 16 ones ...
const uint64_t m32 = 0x00000000ffffffff; //binary: 32 zeros, 32 ones
const uint64_t hff = 0xffffffffffffffff; //binary: all ones
const uint64_t h01 = 0x0101010101010101; //the sum of 256 to the power of 0,1,2,3...

//This is a naive implementation, shown for comparison,
//and to help in understanding the better functions.
//It uses 24 arithmetic operations (shift, add, and).
int popcount_1(uint64_t x) {
    x = (x & m1 ) + ((x >>  1) & m1 ); //put count of each  2 bits into those  2 bits
    x = (x & m2 ) + ((x >>  2) & m2 ); //put count of each  4 bits into those  4 bits
    x = (x & m4 ) + ((x >>  4) & m4 ); //put count of each  8 bits into those  8 bits
    x = (x & m8 ) + ((x >>  8) & m8 ); //put count of each 16 bits into those 16 bits
    x = (x & m16) + ((x >> 16) & m16); //put count of each 32 bits into those 32 bits
    x = (x & m32) + ((x >> 32) & m32); //put count of each 64 bits into those 64 bits
    return x;
}

//This uses fewer arithmetic operations than any other known
//implementation on machines with slow multiplication.
//It uses 17 arithmetic operations.
int popcount_2(uint64_t x) {
    x -= (x >> 1) & m1;             //put count of each 2 bits into those 2 bits
    x = (x & m2) + ((x >> 2) & m2); //put count of each 4 bits into those 4 bits
    x = (x + (x >> 4)) & m4;        //put count of each 8 bits into those 8 bits
    x += x >>  8;  //put count of each 16 bits into their lowest 8 bits
    x += x >> 16;  //put count of each 32 bits into their lowest 8 bits
    x += x >> 32;  //put count of each 64 bits into their lowest 8 bits
    return x & 0x7f;
}

//This uses fewer arithmetic operations than any other known
//implementation on machines with fast multiplication.
//It uses 12 arithmetic operations, one of which is a multiply.
int popcount_3(uint64_t x) {
    x -= (x >> 1) & m1;             //put count of each 2 bits into those 2 bits
    x = (x & m2) + ((x >> 2) & m2); //put count of each 4 bits into those 4 bits
    x = (x + (x >> 4)) & m4;        //put count of each 8 bits into those 8 bits
    return (x * h01) >> 56;  //returns left 8 bits of x + (x<<8) + (x<<16) + (x<<24) + ...
}

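In this code, popcount_1 is the most straightforward form and helps in understanding popcount_2 and popcount_3; the comments spell out the idea quite clearly. For a 32-bit number, as in the LeetCode problem, we can rewrite it like this: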

#include <cstdint>

class Solution {
public:
    const int helper1 = 0x55555555;  // binary: 0101...
    const int helper2 = 0x33333333;  // binary: 00110011...
    const int helper3 = 0x0f0f0f0f;  // binary: 4 zeros, 4 ones ...
    const int helper4 = 0x00ff00ff;  // binary: 8 zeros, 8 ones ...
    const int helper5 = 0x0000ffff;  // binary: 16 zeros, 16 ones
    int hammingWeight(uint32_t n) {
        n = (n & helper1) + ((n >>  1) & helper1); // put count of each  2 bits into those  2 bits
        n = (n & helper2) + ((n >>  2) & helper2); // put count of each  4 bits into those  4 bits
        n = (n & helper3) + ((n >>  4) & helper3); // put count of each  8 bits into those  8 bits
        n = (n & helper4) + ((n >>  8) & helper4); // put count of each 16 bits into those 16 bits
        n = (n & helper5) + ((n >> 16) & helper5); // put count of each 32 bits into those 32 bits
        return n;
    }
};
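Because the masks and shift amounts in this kind of code are easy to get wrong, it can be worth cross-checking it against a reference. The sketch below restates the 32-bit divide-and-conquer steps as a free function and compares it with std::bitset<32>::count() from the standard library. The names popcountParallel, popcountReference and agreesOnSamples, the sample values, and the constants of the small pseudo-random generator are all assumptions of this sketch.

```cpp
#include <bitset>
#include <cstdint>

// Divide-and-conquer bit count for 32-bit values, written as a free
// function with the masks inlined.
int popcountParallel(uint32_t n) {
    n = (n & 0x55555555u) + ((n >> 1) & 0x55555555u);   // counts per 2 bits
    n = (n & 0x33333333u) + ((n >> 2) & 0x33333333u);   // counts per 4 bits
    n = (n & 0x0f0f0f0fu) + ((n >> 4) & 0x0f0f0f0fu);   // counts per 8 bits
    n = (n & 0x00ff00ffu) + ((n >> 8) & 0x00ff00ffu);   // counts per 16 bits
    n = (n & 0x0000ffffu) + ((n >> 16) & 0x0000ffffu);  // count for all 32 bits
    return static_cast<int>(n);
}

// Reference count using the standard library.
int popcountReference(uint32_t n) {
    return static_cast<int>(std::bitset<32>(n).count());
}

// Compare both functions on zero, all ones, every single-bit value,
// and a run of pseudo-random values from a simple linear congruential
// generator (the constants are an arbitrary choice for this sketch).
bool agreesOnSamples() {
    if (popcountParallel(0) != popcountReference(0)) return false;
    if (popcountParallel(0xffffffffu) != popcountReference(0xffffffffu)) return false;
    for (int bit = 0; bit < 32; ++bit) {
        uint32_t value = uint32_t{1} << bit;
        if (popcountParallel(value) != popcountReference(value)) return false;
    }
    uint32_t x = 12345u;
    for (int i = 0; i < 10000; ++i) {
        x = x * 1664525u + 1013904223u;  // wraps modulo 2^32
        if (popcountParallel(x) != popcountReference(x)) return false;
    }
    return true;
}
```

A spot check like this does not prove correctness, but a wrong mask or shift would almost certainly show up as a mismatch on the single-bit inputs.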

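For more on the Hamming weight, see the Wikipedia entry.

Closing Remarks

This problem is simple, but there is a good deal of background knowledge and coding technique behind it. In my humble opinion, writing programs calls for exactly this spirit of persistent exploration and eagerness to learn. That holds for an interview question, for engineering work, and presumably for project managers and leaders as well.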
