天天看點

Consistent Hashing算法及相關技術

當面對bigdata, scale-up思路完全行不通, 需要使用scale-out來進行系統擴充的時候 

data sharding将是必須要面對的問題, how to map records to physical nodes?

1. load balance, 避免出現hotspots

2. 節點發生變化時, fail, new add, leave, 不會影響到sharding的結果

使用的方法,

1. 基于master, 比如google的bigtable, master來協調一切, 避免hotspots, 當節點失效或加入時, 經行相應的調整 

    所有的資料分布狀況資訊都存在master上, 當然問題就是單點

2. 基于内容的劃分, 比如時間, 地點, 問題在于無法解決hotspots問題

3. 基于hash的劃分 (partition = key mod (total_vns)), 這個可以比較有效的解決hotspots問題, 但是無法解決節點變化問題 

   當節點變化時, 會導緻之前的所有劃分失效

4, 一緻性hash, 比較理想的方案, 并且是去中心化設計

the idea of consistent hashing was introduced by david karger et al. in 1997 (cf. [kll+97]) in the context of a paper about “a family of caching protocols for distributed networks that can be used to decrease or eliminate the occurrence of hot spots in the networks”.

Consistent Hashing算法及相關技術

a到b就是, 普通hash到一緻性hash的轉化 

c, 當節點增加或删除時, 一緻性hash可以簡單的應對 

d, 為了load balance, 使用虛拟節點的概念, 并可以根據節點的能力配置設定不同個數的節點

client是否需要儲存一緻性hash環資訊, 并如何更新的問題? 

如果能容忍多一跳的延遲, client可以不儲存任何一緻性hash環資訊, 就近将request發給任一server, 由server進行coordinate

to provide high reiability from individually unreliable resource, we need to replicate the data partitions.

Consistent Hashing算法及相關技術

in a partitioned database where nodes may join and leave the system at any time without impacting its operation all nodes have to communicate with each other, especially when membership changes.

一緻性hash有效解決節點動态變化的問題, 也隻是将影響降到較小的範圍, 在節點變化時, 鄰接的節點仍然需要做一定的調整和資料transfer 

when a new node joins the system the following actions have to happen (cf. [ho09a]): 

1. the newly arriving node announces its presence and its identifier to adjacent nodes or to all nodes via broadcast. 

2. the neighbors of the joining node react by adjusting their object and replica ownerships. 

3. the joining node copies datasets it is now responsible for from its neighbours. this can be done in bulk and alsoasynchronously. 

4. if, in step 1, the membership change has not been broadcasted to all nodes, the joining node is now announcing its arrival.

Consistent Hashing算法及相關技術

when a node leaves the system the following actions have to occur (cf. [ho09a]):

1. nodes within the system need to detect whether a node has left as it might have crashed and not been able to notify the other nodes of its departure.

2. if a node’s departure has been detected, the neighbors of the node have to react by exchanging data with each other and adjusting their object and replica ownerships.

Consistent Hashing算法及相關技術

本文章摘自部落格園,原文釋出日期:2013-04-13

繼續閱讀