In Consistent Hashing I we introduced a relatively simple consistency hashing algorithm. This simple version has two defects:
- After adding a machine, the data comes from one of the machines. The read load of this machine is too large, which will affect the normal service.
- When adding to 3 machines, the load of each server is not balanced, it is 1:1:2
In order to solve this problem, the concept of micro-shards was introduced, and a better algorithm is like this:
- From 0~359 to a range of 0 ~ n-1, the interval is connected end to end and connected into a circle.
- When joining a new machine, randomly choose to sprinkle k points in the circle, representing the k micro-shards of the machine.
- Each data also corresponds to a point on the circumference, which is calculated by a hash function.
- Which machine belongs to which data is to be managed is determined by the machine to which the first micro-shard point that is clockwise touched on the circle is corresponding to the point on the circumference of the data.
n and k are typically 2^64 and 1000 in a real NoSQL database.
Implement these methods of introducing consistent hashing of micro-shard.
-
create(int n, int k)
-
// add a new machine, return a list of shard ids.addMachine(int machine_id)
-
// return machine idgetMachineIdByHashCode(int hashcode)
Example
Example 1:
Input:
create(100, 3)
addMachine(1)
getMachineIdByHashCode(4)
addMachine(2)
getMachineIdByHashCode(61)
getMachineIdByHashCode(91)
Output:
[77,83,86]
1
[15,35,93]
1
2
Example 2:
Input:
create(10, 5)
addMachine(1)
getMachineIdByHashCode(4)
addMachine(2)
getMachineIdByHashCode(0)
getMachineIdByHashCode(1)
getMachineIdByHashCode(2)
getMachineIdByHashCode(3)
getMachineIdByHashCode(4)
getMachineIdByHashCode(5)
Output:
[2,3,5,6,7]
1
[0,1,4,8,9]
2
2
1
1
2
1
Notice
When n is 2^64, there is almost no repetition in the interval within this interval.
However, in order to test the correctness of your program, n may be small in the data, so you must ensure that the k random numbers you generate will not be duplicated.
LintCode does not judge the correctness of your returnMachine's return (because it is a random number), it will only judge the correctness of your getMachineIdByHashCode result based on the result of the addMachine you return.
思路:懂得用treeMap,treeMap有個method ceilingEntry(key),可以找到比這個key稍微大一點最小的那個entry。還有random.nextInt(n); [0,n)
public class Solution {
/*
* @param n: a positive integer
* @param k: a positive integer
* @return: a Solution object
*/
private int n, k;
private TreeMap<Integer, Integer> nodeToMachine;
private HashSet<Integer> randomNumSet;
public static Solution create(int n, int k) {
Solution solution = new Solution();
solution.n = n;
solution.k = k;
solution.nodeToMachine = new TreeMap<Integer, Integer>();
solution.randomNumSet = new HashSet<Integer>();
return solution;
}
/*
* @param machine_id: An integer
* @return: a list of shard ids
*/
public List<Integer> addMachine(int machine_id) {
List<Integer> list = new ArrayList<Integer>();
Random random = new Random();
int count = 0;
while(count < k) {
int num = random.nextInt(n);
if(!randomNumSet.contains(num)){
count++;
list.add(num);
nodeToMachine.put(num, machine_id);
randomNumSet.add(num);
}
}
return list;
}
/*
* @param hashcode: An integer
* @return: A machine id
*/
public int getMachineIdByHashCode(int hashcode) {
// TreeMap 有ceilingEntry method去找比這個數key大的最小值;
Map.Entry<Integer, Integer> entry = nodeToMachine.ceilingEntry(hashcode);
if(entry == null) {
return nodeToMachine.firstEntry().getValue();
}
return entry.getValue();
}
}