天天看點

# Twitter雪花算法(Snowflake)Twitter雪花算法(Snowflake)

Twitter雪花算法(Snowflake)

The Problem

We currently use MySQL to store most of our online data. In the beginning, the data was in one small database instance which in turn became one large database instance and eventually many large database clusters. For various reasons, the details of which merit a whole blog post, we’re working to replace many of these systems with the Cassandra distributed database or horizontally sharded MySQL (using gizzard).

Unlike MySQL, Cassandra has no built-in way of generating unique ids – nor should it, since at the scale where Cassandra becomes interesting, it would be difficult to provide a one-size-fits-all solution for ids. Same goes for sharded MySQL.

Our requirements for this system were pretty simple, yet demanding:

We needed something that could generate tens of thousands of ids per second in a highly available manner. This naturally led us to choose an uncoordinated approach.

These ids need to be roughly sortable, meaning that if tweets A and B are posted around the same time, they should have ids in close proximity to one another since this is how we and most Twitter clients sort tweets.

Additionally, these numbers have to fit into 64 bits. We’ve been through the painful process of growing the number of bits used to store tweet ids before. It’s unsurprisingly hard to do when you have over 100,000 different codebases involved.

以上是Snowflake背景。

Snowflake算法是一個分布式下的ID生成器,官方由Scala實作。

Snowflake算法由64位存儲ID(對應Java中long類型),其中:

  • 1位辨別,由于long基本類型在Java中是帶符号的,最高位是符号位,正數是0,負數是1,是以ID一般是正數,最高位是0;
  • 41位時間截(毫秒級),注意,41位時間截不是存儲目前時間的時間截,而是存儲時間截的內插補點(目前時間截 - 開始時間截)得到的值,這裡的的開始時間截,一般是我們的ID生成器開始使用的時間,由我們程式來指定的(如下下面程式IdWorker類的startTime屬性)。41位的時間截,可以使用69年; 1 L < < 41 1000 L × 60 × 60 × 24 × 365 = 69 \frac{1L<<41}{1000L\times60\times60\times24\times365}=69 1000L×60×60×24×3651L<<41​=69
  • 10位的資料機器位,可以部署在1024個節點,包括5位datacenter ID和5位worker ID;
  • 12位序列,毫秒内的計數,12位的計數順序号支援每個節點每毫秒(同一機器,同一時間截)産生4096個ID序号;
  • ID有含義/可逆性, ID可以反解出來,加入我們對ID進行統計分析,我們可以很簡單的分析出整個系統的繁忙曲線。還可以下鑽到每個機器,在某段時間分别承擔了多少工作,很容易分析出負載均衡情況。這個特性對系統監控非常有價值。
    # Twitter雪花算法(Snowflake)Twitter雪花算法(Snowflake)

SnowFlake的優點是,整體上按照時間自增排序,并且整個分布式系統内不會産生ID碰撞(由資料中心ID和機器ID作區分),并且效率較高,經測試,SnowFlake每秒能夠産生26萬ID左右。

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.concurrent.atomic.AtomicLong;

/**
 * java edition of Twitter <b>Snowflake</b>, a network service for generating
 * unique ID numbers at high scale with some simple guarantees.
 * 
 * https://github.com/twitter/snowflake
 */
public class Snowflake {

    /*
     * bits allocations for timeStamp, datacenterId, workerId and sequence
     */

    private final long unusedBits = 1L;
    /**
     * 'time stamp' here is defined as the number of millisecond that have
     * elapsed since the {@link #epoch} given by users on {@link Snowflake}
     * instance initialization
     */
    private final long timestampBits = 41L;
    private final long datacenterIdBits = 5L;
    private final long workerIdBits = 5L;
    private final long sequenceBits = 12L;

    /*
     * max values of timeStamp, workerId, datacenterId and sequence
     */
    private final long maxDatacenterId = -1L ^ (-1L << datacenterIdBits); // 2^5-1
    private final long maxWorkerId = -1L ^ (-1L << workerIdBits); // 2^5-1
    private final long maxSequence = -1L ^ (-1L << sequenceBits); // 2^12-1

    /**
     * left shift bits of timeStamp, workerId and datacenterId
     */
    private final long timestampShift = sequenceBits + datacenterIdBits + workerIdBits;
    private final long datacenterIdShift = sequenceBits + workerIdBits;
    private final long workerIdShift = sequenceBits;

    /*
     * object status variables
     */

    /**
     * reference material of 'time stamp' is '2016-01-01'. its value can't be
     * modified after initialization.
     */
    private final long epoch = 1451606400000L;

    /**
     * data center number the process running on, its value can't be modified
     * after initialization.
     * <p>
     * max: 2^5-1 range: [0,31]
     */
    private final long datacenterId;

    /**
     * machine or process number, its value can't be modified after
     * initialization.
     * <p>
     * max: 2^5-1 range: [0,31]
     * 
     */
    private final long workerId;

    /**
     * the unique and incrementing sequence number scoped in only one
     * period/unit (here is ONE millisecond). its value will be increased by 1
     * in the same specified period and then reset to 0 for next period.
     * <p>
     * max: 2^12-1 range: [0,4095]
     */
    private long sequence = 0L;

    /** the time stamp last snowflake ID generated */
    private long lastTimestamp = -1L;

    /**
     * generate an unique and incrementing id
     *
     * @return id
     */
    public synchronized long nextId() {
        long currTimestamp = timestampGen();

        if (currTimestamp < lastTimestamp) {
            throw new IllegalStateException(
                    String.format("Clock moved backwards. Refusing to generate id for %d milliseconds",
                            lastTimestamp - currTimestamp));
        }

        if (currTimestamp == lastTimestamp) {
            sequence = (sequence + 1) & maxSequence;
            if (sequence == 0) { // overflow: greater than max sequence
                currTimestamp = waitNextMillis(currTimestamp);
            }

        } else { // reset to 0 for next period/millisecond
            sequence = 0L;
        }

        // track and memo the time stamp last snowflake ID generated
        lastTimestamp = currTimestamp;

        return ((currTimestamp - epoch) << timestampShift) | //
                (datacenterId << datacenterIdShift) | //
                (workerId << workerIdShift) | // new line for nice looking
                sequence;
    }

    /**
     * @param datacenterId
     *            data center number the process running on, value range: [0,31]
     * @param workerId
     *            machine or process number, value range: [0,31]
     */
    public Snowflake(long datacenterId, long workerId) {
        if (datacenterId > maxDatacenterId || datacenterId < 0) {
            throw new IllegalArgumentException(
                    String.format("datacenter Id can't be greater than %d or less than 0", maxDatacenterId));
        }
        if (workerId > maxWorkerId || workerId < 0) {
            throw new IllegalArgumentException(
                    String.format("worker Id can't be greater than %d or less than 0", maxWorkerId));
        }

        this.datacenterId = datacenterId;
        this.workerId = workerId;
    }

    /**
     * track the amount of calling {@link #waitNextMillis(long)} method
     */
    private final AtomicLong waitCount = new AtomicLong(0);

    /**
     * @return the amount of calling {@link #waitNextMillis(long)} method
     */
    public long getWaitCount() {
        return waitCount.get();
    }

    /**
     * running loop blocking until next millisecond
     * 
     * @param   currTimestamp   current time stamp
     * @return current time stamp in millisecond
     */
    protected long waitNextMillis(long currTimestamp) {
        waitCount.incrementAndGet();
        while (currTimestamp <= lastTimestamp) {
            currTimestamp = timestampGen();
        }
        return currTimestamp;
    }

    /**
     * get current time stamp
     * 
     * @return current time stamp in millisecond
     */
    protected long timestampGen() {
        return System.currentTimeMillis();
    }

    /**
     * show settings of Snowflake
     */
    @Override
    public String toString() {
        return "Snowflake Settings [timestampBits=" + timestampBits + ", datacenterIdBits=" + datacenterIdBits
                + ", workerIdBits=" + workerIdBits + ", sequenceBits=" + sequenceBits + ", epoch=" + epoch
                + ", datacenterId=" + datacenterId + ", workerId=" + workerId + "]";
    }
    
    public long getEpoch() {
        return this.epoch;
    }

    /**
     * extract time stamp, datacenterId, workerId and sequence number
     * information from the given id
     * 
     * @param id
     *            a snowflake id generated by this object
     * @return an array containing time stamp, datacenterId, workerId and
     *         sequence number
     */
    public long[] parseId(long id) {
        long[] arr = new long[5];
        arr[4] = ((id & diode(unusedBits, timestampBits)) >> timestampShift);
        arr[0] = arr[4] + epoch;
        arr[1] = (id & diode(unusedBits + timestampBits, datacenterIdBits)) >> datacenterIdShift;
        arr[2] = (id & diode(unusedBits + timestampBits + datacenterIdBits, workerIdBits)) >> workerIdShift;
        arr[3] = (id & diode(unusedBits + timestampBits + datacenterIdBits + workerIdBits, sequenceBits));
        return arr;
    }

    /**
     * extract and display time stamp, datacenterId, workerId and sequence
     * number information from the given id in humanization format
     * 
     * @param id snowflake id in Long format
     * @return snowflake id in String format
     */
    public String formatId(long id) {
        long[] arr = parseId(id);
        String tmf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS").format(new Date(arr[0]));
        return String.format("%s, #%d, @(%d,%d)", tmf, arr[3], arr[1], arr[2]);
    }

    /**
     * a diode is a long value whose left and right margin are ZERO, while
     * middle bits are ONE in binary string layout. it looks like a diode in
     * shape.
     * 
     * @param offset
     *            left margin position
     * @param length
     *            offset+length is right margin position
     * @return a long value
     */
    private long diode(long offset, long length) {
        int lb = (int) (64 - offset);
        int rb = (int) (64 - (offset + length));
        return (-1L << lb) ^ (-1L << rb);
    }

}
           

繼續閱讀