天天看點

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

php, apache, perl, bsddb都使用time33哈希.

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

  uint32_t time33(char const *str, int len) 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

    { 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

        unsigned long  hash = 0; 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

        for (int i = 0; i < len; i++) { 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

            hash = hash *33 + (unsigned long) str[i]; 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

        } 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

        return hash; 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

    }

這個版本最可以展現time33的算法思路,夠簡單。

把乘法操作換成位操作

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

        unsigned long time33(char const *str, int len) 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash

            hash = ((hash <<5) + hash) + (unsigned long) str[i]; 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash

    } 

59個字元1000 0000次運作(gcc沒有開啟優化,因為開了優化後兩個函數的實際代碼會一樣)

第一個:

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

real    0m4.389s 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

user    0m4.388s 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

sys     0m0.000s

第二個:

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

real    0m4.137s 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

user    0m4.120s 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

gcc –O2優化後是

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

real    0m1.367s 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

user    0m1.360s 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash

inline unsigned time33(char const*str, int len) 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

     unsigned long hash = 5381; 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

     /* variant with the hash unrolled eight times */ 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

     for (; len >= 8; len -= 8) { 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

         hash = ((hash << 5) + hash) + *str++; 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash

        hash = ((hash << 5) + hash) + *str++; 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash

    switch (len) { 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

        case 7: hash = ((hash << 5) + hash) + *str++; /* fallthrough

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

 */ 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

        case 6: hash = ((hash << 5) + hash) + *str++; /* fallthrough

time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash

        case 5: hash = ((hash << 5) + hash) + *str++; /* fallthrough

time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash

        case 4: hash = ((hash << 5) + hash) + *str++; /* fallthrough

time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash

        case 3: hash = ((hash << 5) + hash) + *str++; /* fallthrough

time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash

        case 2: hash = ((hash << 5) + hash) + *str++; /* fallthrough

time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash

        case 1: hash = ((hash << 5) + hash) + *str++; break; 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

        case 0: break; 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash

    return hash; 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

59個字元,1000 0000次

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

real    0m1.088s 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

user    0m1.068s 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

速度提升主要在循環展開, 對于短字元,這個是不明顯的。

php版本的hash初始值是5381, 這個

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

Magic Constant 5381:

time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash

  1. odd number

time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash

  2. prime number

time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash

  3. deficient number

time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash

  4. 001/010/100/000/101 b

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

unsigned long time33(char const  *str, int *len) 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash

    unsigned long hash = 0;

time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash

    const char *p=str; 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

    if (*len<=0) { 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

        for(p = str; *p; p++) { 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

            hash = hash * 33 + *p; 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash

        *len = p - str; 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash

    else { 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

        int i = *len; 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

        for (p = str;i; i--, p++) { 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash

}

測試結果

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

real    0m1.418s 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

user    0m1.412s 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

sys     0m0.004s

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

#define likely(x) __builtin_expect((x),1) 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

#define unlikely(x) __builtin_expect((x),0) 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

    //php版本 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

    unsigned long time33(char const *str, int len=-1) 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash

        unsigned long hash = 5381; 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

        /* variant with the hash unrolled eight times */ 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

        char const *p = str; 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

        if (unlikely(len<0)) { 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

                for(; *p; p++) { 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

                    hash = hash * 33 + *p; 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

                } 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

                return hash; 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

        }

time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash

#define TIME33_HASH_MIXED_CH()  hash = ((hash<<5)+hash) + *p++ 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

        for (; len >= 8; len -= 8) { 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

            TIME33_HASH_MIXED_CH(); //1 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

            TIME33_HASH_MIXED_CH(); //2 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

            TIME33_HASH_MIXED_CH(); //3 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

            TIME33_HASH_MIXED_CH(); //4 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

            TIME33_HASH_MIXED_CH(); //5 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

            TIME33_HASH_MIXED_CH(); //6 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

            TIME33_HASH_MIXED_CH(); //7 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

            TIME33_HASH_MIXED_CH(); //8 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

       } 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

       switch (len) { 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

           case 7: TIME33_HASH_MIXED_CH(); /* fallthrough

time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash

           case 6: TIME33_HASH_MIXED_CH(); /* fallthrough

time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash

           case 5: TIME33_HASH_MIXED_CH(); /* fallthrough

time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash

           case 4: TIME33_HASH_MIXED_CH(); /* fallthrough

time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash

           case 3: TIME33_HASH_MIXED_CH(); /* fallthrough

time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash

           case 2: TIME33_HASH_MIXED_CH(); /* fallthrough

time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash

           case 1: TIME33_HASH_MIXED_CH(); break; 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

           case 0: break; 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash

       return hash; 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

   }

time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash

#undef TIME33_HASH_MIXED_CH

time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash
time33 哈希函數,又叫 DJBX33A,Bernstein's hash

real    0m1.072s 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

user    0m1.064s 

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

測試過, 重複率在 1/2000。

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

DJBX33A (Daniel J. Bernstein, Times 33 with Addition)

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

  This is Daniel J. Bernstein's popular `times 33' hash function as

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

  posted by him years ago on comp.lang.c. It basically uses a function

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

  like ``hash(i) = hash(i-1) * 33 + str[i]''. This is one of the best

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

  known hash functions for strings. Because it is both computed very

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

  fast and distributes very well.

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

  The magic of number 33, i.e. why it works better than many other

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

  constants, prime or not, has never been adequately explained by

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

  anyone. So I try an explanation: if one experimentally tests all

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

  multipliers between 1 and 256 (as RSE did now) one detects that even

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

  numbers are not useable at all. The remaining 128 odd numbers

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

  (except for the number 1) work more or less all equally well. They

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

  all distribute in an acceptable way and this way fill a hash table

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

  with an average percent of approx. 86%.

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

  If one compares the Chi^2 values of the variants, the number 33 not

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

  even has the best value. But the number 33 and a few other equally

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

  good numbers like 17, 31, 63, 127 and 129 have nevertheless a great

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

  advantage to the remaining numbers in the large set of possible

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

  multipliers: their multiply operation can be replaced by a faster

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

  operation based on just one shift plus either a single addition

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

  or subtraction operation. And because a hash function has to both

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

  distribute good _and_ has to be very fast to compute, those few

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

  numbers should be preferred and seems to be the reason why Daniel J.

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

  Bernstein also preferred it.

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

                   -- Ralf S. Engelschall [email protected]

time33 哈希函數,又叫 DJBX33A,Bernstein's hash

Bob在他的文章說,小寫英文詞彙适合33, 大小寫混合使用65。time33比較适合的是英文詞彙的hash.