Consistent Hashing II

In Consistent Hashing I we introduced a relatively simple consistency hashing algorithm. This simple version has two defects：

After adding a machine, the data comes from one of the machines. The read load of this machine is too large, which will affect the normal service.
When adding to 3 machines, the load of each server is not balanced, it is 1:1:2

In order to solve this problem, the concept of micro-shards was introduced, and a better algorithm is like this：

From 0~359 to a range of 0 ~ n-1, the interval is connected end to end and connected into a circle.
When joining a new machine, randomly choose to sprinkle k points in the circle, representing the k micro-shards of the machine.
Each data also corresponds to a point on the circumference, which is calculated by a hash function.
Which machine belongs to which data is to be managed is determined by the machine to which the first micro-shard point that is clockwise touched on the circle is corresponding to the point on the circumference of the data.

n and k are typically 2^64 and 1000 in a real NoSQL database.

Implement these methods of introducing consistent hashing of micro-shard.

create(int n, int k)
addMachine(int machine_id) // add a new machine, return a list of shard ids.
getMachineIdByHashCode(int hashcode) // return machine id

Example

Example 1:

Input:
  create(100, 3)
  addMachine(1)
  getMachineIdByHashCode(4)
  addMachine(2)
  getMachineIdByHashCode(61)
  getMachineIdByHashCode(91)
Output:
  [77,83,86]
  1
  [15,35,93]
  1
  2

Example 2:

Input:
  create(10, 5)
  addMachine(1)
  getMachineIdByHashCode(4)
  addMachine(2)
  getMachineIdByHashCode(0)
  getMachineIdByHashCode(1)
  getMachineIdByHashCode(2)
  getMachineIdByHashCode(3)
  getMachineIdByHashCode(4)
  getMachineIdByHashCode(5)
Output:
  [2,3,5,6,7]
  1
  [0,1,4,8,9]
  2
  2
  1
  1
  2
  1

Notice

When n is 2^64, there is almost no repetition in the interval within this interval.

However, in order to test the correctness of your program, n may be small in the data, so you must ensure that the k random numbers you generate will not be duplicated.

LintCode does not judge the correctness of your returnMachine's return (because it is a random number), it will only judge the correctness of your getMachineIdByHashCode result based on the result of the addMachine you return.

思路：懂得用treeMap，treeMap有個method ceilingEntry(key)，可以找到比這個key稍微大一點最小的那個entry。還有random.nextInt(n); [0,n)

public class Solution {
    /*
     * @param n: a positive integer
     * @param k: a positive integer
     * @return: a Solution object
     */
    private int n, k;
    private TreeMap<Integer, Integer> nodeToMachine;
    private HashSet<Integer> randomNumSet;
    public static Solution create(int n, int k) {
        Solution solution = new Solution();
        solution.n = n;
        solution.k = k;
        solution.nodeToMachine = new TreeMap<Integer, Integer>();
        solution.randomNumSet = new HashSet<Integer>();
        return solution;
    }

    /*
     * @param machine_id: An integer
     * @return: a list of shard ids
     */
    public List<Integer> addMachine(int machine_id) {
        List<Integer> list = new ArrayList<Integer>();
        Random random = new Random();
        int count = 0;
        while(count < k) {
            int num = random.nextInt(n);
            if(!randomNumSet.contains(num)){
                count++;
                list.add(num);
                nodeToMachine.put(num, machine_id);
                randomNumSet.add(num);
            }
        }
        return list;
    }

    /*
     * @param hashcode: An integer
     * @return: A machine id
     */
    public int getMachineIdByHashCode(int hashcode) {
        // TreeMap 有ceilingEntry method去找比這個數key大的最小值；
        Map.Entry<Integer, Integer> entry = nodeToMachine.ceilingEntry(hashcode);
        if(entry == null) {
            return nodeToMachine.firstEntry().getValue();
        }
        return entry.getValue();
    }
}

Consistent Hashing II

Example

Notice

繼續閱讀

通過TreeMap了解紅黑樹總體介紹預備知識方法剖析 TreeSet

Java基礎Map和Set

JAVA的Map接口及實作類，properties類的配置檔案、collections工具類的常用方法Map接口及實作類propertiescollections工具類

java SE基礎(Map接口及其實作)1。Map接口2。HashMap類3。SortedMap接口4。TreeMap類5。Hashtable類

Java基礎總結篇---Map接口

Java集合-TreeMap源碼

集合架構(HashSet，HashMap，HashTable)

吃透Java集合系列十二：TreeMap

java集合系列——Map介紹（七）

JAVA集合之二—TreeMap

【Java】集合系列12(TreeMap詳細介紹源碼解析和使用示例)

Map集合之TreeMap

Java集合-Map接口-TreeMap和HashTable

Map,HashMap,Hashtable,TreeMap總結

NC97 字元串出現次數的TopK問題