Java DualPivotQuicksort: Notes on the Dual-Pivot Quicksort Source Code

Synced from Jianshu

DualPivotQuicksort source code

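This algorithm is the concrete implementation that Arrays.java uses to sort arrays of primitive types. There is a separate implementation for each primitive type; they differ slightly in detail but follow the same idea, so these notes only look at the int version.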
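For orientation: this class is package-private, and in JDK 7 it is reached through the public Arrays.sort overloads for primitive arrays. The small sketch below only exercises that public API; the class name ArraysSortUsage and its helper methods are made up for illustration. Note that the public range overload takes an exclusive end index, while the internal sort(a, left, right) discussed in these notes takes an inclusive one.

```java
import java.util.Arrays;

// Hypothetical demo class; only Arrays.sort, Arrays.copyOf and Arrays.toString are JDK APIs.
class ArraysSortUsage {

    /** Returns a sorted copy of the whole array. */
    static int[] sortedCopy(int[] input) {
        int[] copy = Arrays.copyOf(input, input.length);
        Arrays.sort(copy);
        return copy;
    }

    /** Returns a copy in which only the range [from, toExclusive) is sorted. */
    static int[] sortedRangeCopy(int[] input, int from, int toExclusive) {
        int[] copy = Arrays.copyOf(input, input.length);
        Arrays.sort(copy, from, toExclusive);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sortedCopy(new int[] {3, 1, 2})));
        System.out.println(Arrays.toString(sortedRangeCopy(new int[] {5, 4, 3, 2, 1}, 1, 4)));
    }
}
```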

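The overall approach is to check the array length first: if it is below a threshold (QUICKSORT_THRESHOLD, 286), dual-pivot quicksort is used directly. Otherwise the code first checks how well ordered the data already is. It records where each ascending or descending run begins and reverses the descending runs along the way, so the data ends up cut into a series of ascending runs.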
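The run-scanning step can be sketched roughly as follows. This is a simplified illustration, not the JDK code: the class RunScanSketch and the method countRuns are hypothetical names, and the sketch leaves out the MAX_RUN_COUNT and MAX_RUN_LENGTH bail-outs that the real loop has.

```java
import java.util.Arrays;

// Hypothetical sketch: split an array into ascending runs, reversing descending ones in place.
class RunScanSketch {

    /** Returns the number of runs found; descending runs are reversed as a side effect. */
    static int countRuns(int[] a) {
        int count = 0;
        int k = 0;
        while (k < a.length) {
            int start = k++;
            if (k < a.length && a[k] < a[k - 1]) {
                // Descending run: extend it, then reverse it so it becomes ascending.
                while (k < a.length && a[k] <= a[k - 1]) {
                    k++;
                }
                for (int lo = start, hi = k - 1; lo < hi; lo++, hi--) {
                    int t = a[lo]; a[lo] = a[hi]; a[hi] = t;
                }
            } else {
                // Ascending (or single-element) run: just extend it.
                while (k < a.length && a[k] >= a[k - 1]) {
                    k++;
                }
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 9, 7, 5, 4, 6};
        System.out.println(countRuns(data) + " runs: " + Arrays.toString(data));
    }
}
```

The fewer runs this scan finds, the more of the existing order the merge phase can reuse.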

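If the array is well ordered (it splits into few enough runs), those runs are merged TimSort-style; the JDK comments simply call this a merge sort. As covered in an earlier post, the core of TimSort is exploiting the order already present in the input, which can improve efficiency considerably. The version here is a pared-down variant of the TimSort described earlier, with the dynamic-threshold part cut out.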
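To make the merge step concrete, here is a minimal sketch of merging two adjacent ascending runs from a source array into a destination array. It reuses the loop condition from the listing further down, but the class RunMergeSketch and the method mergeAdjacent are names I made up for illustration.

```java
import java.util.Arrays;

// Hypothetical sketch of merging two adjacent sorted runs, src[lo, mi) and src[mi, hi).
class RunMergeSketch {

    static void mergeAdjacent(int[] src, int[] dst, int lo, int mi, int hi) {
        for (int i = lo, p = lo, q = mi; i < hi; ++i) {
            // Take from the left run when the right run is exhausted,
            // or when the left run still has elements and its head is not larger.
            if (q >= hi || (p < mi && src[p] <= src[q])) {
                dst[i] = src[p++];
            } else {
                dst[i] = src[q++];
            }
        }
    }

    public static void main(String[] args) {
        int[] src = {1, 4, 7, 2, 3, 9};
        int[] dst = new int[src.length];
        mergeAdjacent(src, dst, 0, 3, 6);
        System.out.println(Arrays.toString(dst));
    }
}
```

Because q >= hi is tested first, src[q] is never read once the right run is exhausted, which is why the condition cannot index out of bounds.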

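Arrays whose data is poorly ordered go straight to dual-pivot quicksort plus pair insertion sort. Pair insertion sort is an improved insertion sort that inserts two elements at a time to raise efficiency. Dual-pivot quicksort evolved from classic single-pivot quicksort by way of 3-way quicksort, and plenty of blog posts already cover it. I recommend one English-language article in particular: its three diagrams and the accompanying code helped me understand how quicksort, 3-way quicksort and dual-pivot quicksort relate to one another.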
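As a rough picture of the dual-pivot idea, here is a textbook-style sketch. It is not the JDK's tuned version: it simply takes the first and last elements as pivots instead of sampling five elements, and it has no insertion-sort cutoff. The class name DualPivotSketch is hypothetical.

```java
import java.util.Arrays;

// Hypothetical textbook sketch of dual-pivot quicksort on a[lo..hi] (both inclusive).
class DualPivotSketch {

    static void sort(int[] a, int lo, int hi) {
        if (lo >= hi) {
            return;
        }
        if (a[lo] > a[hi]) {
            swap(a, lo, hi);
        }
        int p = a[lo], q = a[hi]; // two pivots, p <= q

        // Invariant: a[lo+1..lt-1] < p, p <= a[lt..i-1] <= q, a[gt+1..hi-1] > q
        int lt = lo + 1, gt = hi - 1, i = lo + 1;
        while (i <= gt) {
            if (a[i] < p) {
                swap(a, i++, lt++);
            } else if (a[i] > q) {
                swap(a, i, gt--);
            } else {
                i++;
            }
        }
        // Move the pivots into their final positions.
        swap(a, lo, --lt);
        swap(a, hi, ++gt);

        // Three sub-ranges instead of the two of single-pivot quicksort.
        sort(a, lo, lt - 1);
        sort(a, lt + 1, gt - 1);
        sort(a, gt + 1, hi);
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        int[] data = {5, 3, 8, 1, 9, 2, 7, 3, 5, 0};
        sort(data, 0, data.length - 1);
        System.out.println(Arrays.toString(data));
    }
}
```

With both pivots equal this degenerates into something close to a 3-way partition, which is one way to see the link between the three variants.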

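In terms of code style, this feels less polished than the TimSort code. Most variables have names like a, b, i, k, j, t, which makes the code hard to follow. So I suggest avoiding such opaque names in everyday code; ideally others can understand a name at a glance, for example index instead of i, temp instead of t, and so on. Fortunately the core parts are thoroughly commented, so reading it is not too painful.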

Below I translate and explain the source of the java.util.DualPivotQuicksort class in JDK 1.7; anyone interested is welcome to study it along with me.

final class DualPivotQuicksort{


    /**
     * Prevents instantiation of this class.
     */
    private DualPivotQuicksort(){}
        
    /**
     * The maximum number of runs in the merge sort.
     */
    private static final int MAX_RUN_COUNT = 67;

    /**
     * The maximum length of a run in the merge sort (used below to cap runs of equal elements).
     */
    private static final int MAX_RUN_LENGTH = 33;

    /**
     * If the length of the array to be sorted is less than this value, quicksort is used in preference to merge sort.
     */
    private static final int QUICKSORT_THRESHOLD = 286;

    /**
     * If the length of the array to be sorted is less than this value, insertion sort is used in preference to quicksort.
     */
    private static final int INSERTION_SORT_THRESHOLD = 47; 

    /**
     * Sorts the specified array.
     *
     * @param a the array to be sorted
     */
    public static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    /**
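     * Sorts the specified range of the array.
     * @param a the array to be sorted
     * @param left the index of the first element to be sorted (inclusive)
     * @param right the index of the last element to be sorted (inclusive)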
     */
    public static void sort(int[] a, int left, int right) {
        
        if(right-left < QUICKSORT_THRESHOLD){
            sort(a, left, right, true);
            return;
        }
        
        /**
         * run[i] is the start index of the i-th run (ascending or descending).
         **/
        int[] run =new int[MAX_RUN_COUNT + 1];
        int count=0; run[0] = left;
        
        // Check whether the array is already nearly sorted
        for(int k = left; k < right; run[count] = k) {
            if(a[k] < a[k + 1]){ // ascending
                while(++k <= right && a[k - 1] <= a[k]) ;
            } else if(a[k] > a[k + 1]) { // descending
                while(++k <=right && a[k - 1] >= a[k]);
                // For a descending run, once its end k is found, reverse it in place
                for (int lo = run[count] - 1, hi = k; ++lo < --hi; ) {
                    int t = a[lo]; a[lo] = a[hi]; a[hi] = t;
                }
            } else { // equal
                for(int m = MAX_RUN_LENGTH; ++k <=right && a[k - 1] == a[k];) {
                    // Once MAX_RUN_LENGTH consecutive elements are equal, fall back to quicksort.
                    // Why is it handled this way?
                    if(--m == 0){
                        sort(a, left, right, true);
                        return;
                    }
                }
            }
            
            /**
             * The array is not highly structured: the number of runs has reached MAX_RUN_COUNT, so use quicksort
             */
            if(++count == MAX_RUN_COUNT) {
                sort(a, left, right, true);
                return;
            }
        }
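        // Check special cases
        if(run[count] == right++){ // the last run consists of only the last element
            run[++count] =right; // so add a sentinel run boundary just past the last element
        } else if(count == 1) { // only one run in the whole array: it is already sorted, nothing to do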
            return;
        }

        /**
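         * Create the temporary array used for merging.
         * Note: right has been incremented by 1 here, so it now points one past the last element.
         * odd is the parity of (number of merge passes - 1). It decides which array starts as the
         * merge source, so that the final pass always writes into the caller's array.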
         */
        int[] b; byte odd=0;
        for(int n=1; (n <<= 1) < count; odd ^=1);

        if(odd == 0) {
            b=a;a= new int[b.length];
            for(int i=left -1; ++i < right; a[i] = b[i]);
        } else {
            b=new int[a.length];
        }

        // Merging
        // Outer loop: runs until count is 1, i.e. only one run remains, which means merging is done
        // a is the source array, b is the destination array
        for(int last; count > 1; count = last) { 
            // Walk the runs, merging each adjacent pair of ascending runs
            for(int k = (last = 0) + 2; k <= count; k += 2) {
                // Merge the two runs that start at run[k-2] and run[k-1]
                int hi = run[k], mi = run[k - 1];
                for(int i = run[k - 2], p = i,q = mi; i < hi; ++i){
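                    // I added parentheses to the original source here to make it easier to read. At first I thought
                    // this could index out of bounds, but with the parentheses it is clear that it cannot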
                    if(q >= hi  ||  (p < mi && a[p] <= a[q])) {
                        b[i] = a[p++];
                    } else {
                        b[i] = a[q++];
                    }
                }
                // Record the end of the merged run, compacting the run table toward the front
                run[++last] = hi;
            }
            // If the number of runs is odd, copy the last unpaired run across to the other array
            if((count & 1) != 0) {
                for(int i = right, lo =run[count -1]; --i >= lo; b[i] = a[i]);
                run[++last] = right;
            }
            // Swap the temporary and source arrays so that a remains the source and b the destination
            int[] t = a; a = b; b = t;
        }

    }

    /**
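     * Sorts the specified range of the array by dual-pivot quicksort
     * @param a the array to be sorted
     * @param left the index of the first element to be sorted (inclusive)
     * @param right the index of the last element to be sorted (inclusive)
     * @param leftmost whether this range is the leftmost part of the array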
     */
     private static void sort(int[] a, int left, int right, boolean leftmost) {
        int length = right - left + 1;
        
        // Use insertion sort on small arrays
        if (length < INSERTION_SORT_THRESHOLD) {
            if(leftmost) {
                /**
                 * Traditional insertion sort without a sentinel, optimized, used for the leftmost part
                 */
                for(int i = left, j = i; i < right; j = ++i) {
                    int ai = a[i + 1];
                    while(ai < a[j]){
                        a[j + 1] = a[j];
                        if(j-- == left){
                            break;
                        }
                    }
                    a[j + 1] = ai;
                }
            } else {
               
               /**
                * First skip the leading ascending part
                */
                do {
                    if(left >= right) {
                        return;
                    }
                }while(a[++left] >= a[left - 1]);
                
                /**
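                 * Pair insertion sort is used here; it is somewhat more efficient than plain insertion sort.
                 * This branch only runs when there are elements to the left of the range,
                 * and they act as a sentinel, so the backward scans from left need no bounds check.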
                 */
                for(int k = left; ++left <= right; k = ++left) {
                    int a1 = a[k], a2 = a[left];
                    
                    // Ensure a1 >= a2
                    if(a1 < a2) {
                        a2 = a1; a1 = a[left];
                    }
                    // First move the larger of the two values into place
                    while(a1 < a[--k]) {
                        a[k + 2] = a[k]; // each larger element shifts right by two slots
                    }
                    a[++k + 1] = a1;
                    // Then move the smaller of the two values into place
                    while(a2 < a[--k]) {
                        a[k + 1] = a[k]; // each larger element shifts right by one slot
                    }
                    a[k + 1] = a2;
                }
                int last = a[right];

                while(last < a[--right]) {
                    a[right + 1] = a[right];
                }
                a[right + 1] = last;
            }
            return;
        }
        
        // A cheap approximation of length / 7: (length * 9 / 64 + 1)
        int seventh = (length >> 3) + (length >> 6) + 1;
        
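        // Sort five evenly spaced elements around (and including) the center of the range;
        // they will be used to pick the pivots (see below).
        // The spacing was chosen empirically from testing on large amounts of data
        int e3 = (left + right) >>> 1; // the midpoint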
        int e2 = e3 - seventh;
        int e1 = e2 - seventh;
        int e4 = e3 + seventh;
        int e5 = e4 + seventh;

        // Sort these five elements with a hand-unrolled insertion sort, no for loop
        if(a[e2] < a[e1]){ int t = a[e2]; a[e2] = a[e1]; a[e1] = t; }
        if (a[e3] < a[e2]) {
            int t = a[e3]; a[e3] = a[e2]; a[e2] = t;
            if (t < a[e1]) {
                a[e2] = a[e1]; a[e1] = t; 
            }
        }
        if (a[e4] < a[e3]) {
            int t = a[e4]; a[e4] = a[e3]; a[e3] = t;
            if (t < a[e2]) {
                a[e3] = a[e2]; a[e2] = t;
                if (t < a[e1]) {
                    a[e2] = a[e1]; a[e1] = t;
                }
            }
        }
        if (a[e5] < a[e4]) {
            int t = a[e5]; a[e5] = a[e4]; a[e4] = t;
            if (t < a[e3]) {
                a[e4] = a[e3]; a[e3] = t;
                if (t < a[e2]) {
                    a[e3] = a[e2]; a[e2] = t;
                    if (t < a[e1]) {
                        a[e2] = a[e1]; a[e1] = t;
                    }
                }
            }
        }
        
        // Pointers
        int less = left;   // index of the first element of the center part
        int great = right; // index before the first element of the right part
        if (a[e1] != a[e2] && a[e2] != a[e3] && a[e3] != a[e4] && a[e4] != a[e5]) {
            /*
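             * Use the second and fourth of the five sorted elements as pivots;
             * they are cheap approximations of the first and second terciles of the array.
             * Note that pivot1 <= pivot2.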
             */
            int pivot1 = a[e2];
            int pivot2 = a[e4];

            /*
             * The first and the last elements to be sorted are moved to the
             * locations formerly occupied by the pivots. When partitioning
             * is complete, the pivots are swapped back into their final
             * positions, and excluded from subsequent sorting.
             */
            a[e2] = a[left];
            a[e4] = a[right];

            /*
             * Skip leading elements less than pivot1 and trailing elements greater than pivot2
             */
            while (a[++less] < pivot1);
            while (a[--great] > pivot2);

            /*
             * Partitioning:
             *
             *   left part           center part                   right part
             * +--------------------------------------------------------------+
             * |  < pivot1  |  pivot1 <= && <= pivot2  |    ?    |  > pivot2  |
             * +--------------------------------------------------------------+
             *               ^                          ^       ^
             *               |                          |       |
             *              less                        k     great
             *
             * Invariants:
             *
             *              all in (left, less)   < pivot1
             *    pivot1 <= all in [less, k)     <= pivot2
             *              all in (great, right) > pivot2
             *
             * Pointer k is the first index of ?-part.
             */
            outer:
            for (int k = less - 1; ++k <= great; ) {
                int ak = a[k];
                if (ak < pivot1) { // Move a[k] to left part
                    a[k] = a[less];
                    /*
                     * Nice attention to detail here: "a[i] = b; i++" is used
                     * instead of "a[i++] = b" for performance reasons
                     */
                    a[less] = ak;
                    ++less;
                } else if (ak > pivot2) { // Move a[k] to right part
                    while (a[great] > pivot2) {
                        if (great-- == k) { // k has met great: this partitioning pass is done
                            break outer;
                        }
                    }
                    if (a[great] < pivot1) { // a[great] <= pivot2
                        a[k] = a[less];
                        a[less] = a[great];
                        ++less;
                    } else { // pivot1 <= a[great] <= pivot2
                        a[k] = a[great];
                    }
                    /*
                     * Same as above: "a[i] = b; i--" is used instead of "a[i--] = b"
                     */
                    a[great] = ak;
                    --great;
                }
            } // partitioning ends here; this is also where "break outer" lands

            // Swap the two pivots back into their final positions
            a[left]  = a[less  - 1]; a[less  - 1] = pivot1;
            a[right] = a[great + 1]; a[great + 1] = pivot2;

            // Recursively sort the left and right parts (excluding the pivots), much like ordinary quicksort
            sort(a, left, less - 2, leftmost);
            sort(a, great + 2, right, false);

            /*
             * If center part is too large (comprises > 4/7 of the array),
             * swap internal pivot values to ends.
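             * That is, the center part is preprocessed before it is sorted recursively:
             * elements equal to pivot1 are gathered on the left and elements equal to pivot2
             * on the right, leaving a range that contains neither pivot value for the recursion.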
             */
            if (less < e1 && e5 < great) {
                /*
                 * Skip elements, which are equal to pivot values.
                 */
                while (a[less] == pivot1) {
                    ++less;
                }

                while (a[great] == pivot2) {
                    --great;
                }

                /*
                 * Partitioning:
                 *
                 *   left part         center part                  right part
                 * +----------------------------------------------------------+
                 * | == pivot1 |  pivot1 < && < pivot2  |    ?    | == pivot2 |
                 * +----------------------------------------------------------+
                 *              ^                        ^       ^
                 *              |                        |       |
                 *             less                      k     great
                 *
                 * Invariants:
                 *
                 *              all in (*,  less) == pivot1
                 *     pivot1 < all in [less,  k)  < pivot2
                 *              all in (great, *) == pivot2
                 *
                 * Pointer k is the first index of ?-part.
                 */
                outer:
                for (int k = less - 1; ++k <= great; ) {
                    int ak = a[k];
                    if (ak == pivot1) { // Move a[k] to left part
                        a[k] = a[less];
                        a[less] = ak;
                        ++less;
                    } else if (ak == pivot2) { // Move a[k] to right part
                        while (a[great] == pivot2) {
                            if (great-- == k) {
                                break outer;
                            }
                        }
                        if (a[great] == pivot1) { // a[great] < pivot2
                            a[k] = a[less];
                            /*
                             * Even though a[great] equals to pivot1, the
                             * assignment a[less] = pivot1 may be incorrect,
                             * if a[great] and pivot1 are floating-point zeros
                             * of different signs. Therefore in float and
                             * double sorting methods we have to use more
                             * accurate assignment a[less] = a[great].
                             */
                            a[less] = pivot1;
                            ++less;
                        } else { // pivot1 < a[great] < pivot2
                            a[k] = a[great];
                        }
                        a[great] = ak;
                        --great;
                    }
                } // "break outer" lands here
            }

            // Sort center part recursively
            sort(a, less, great, false);

        } else { // some adjacent samples among the five are equal: fall back to traditional single-pivot 3-way quicksort
            
            /*
             * Use the median of the five sampled elements as the pivot
             */
            int pivot = a[e3];

            /*
             * 
             * Partitioning degenerates to the traditional 3-way
             * (or "Dutch National Flag") schema:
             *
             *   left part    center part              right part
             * +-------------------------------------------------+
             * |  < pivot  |   == pivot   |     ?    |  > pivot  |
             * +-------------------------------------------------+
             *              ^              ^        ^
             *              |              |        |
             *             less            k      great
             *
             * Invariants:
             *
             *   all in (left, less)   < pivot
             *   all in [less, k)     == pivot
             *   all in (great, right) > pivot
             *
             * Pointer k is the first index of ?-part.
             */
            for (int k = less; k <= great; ++k) {
                if (a[k] == pivot) {
                    continue;
                }
                int ak = a[k];
                if (ak < pivot) { // move a[k] to the left part; the center part slides right by one
                    a[k] = a[less];
                    a[less] = ak;
                    ++less;
                } else { // a[k] > pivot - move a[k] to the right part
                    while (a[great] > pivot) { // scan left for the last element that is not greater than pivot
                        --great;
                    }
                    if (a[great] < pivot) { // a[great] <= pivot: move it to the left part
                        a[k] = a[less];
                        a[less] = a[great];
                        ++less;
                    } else { // a[great] == pivot: the center part simply grows
                        /*
                         * These are ints, so a[great] == pivot and writing pivot is the same as a[k] = a[great]
                         */
                        a[k] = pivot;
                    }
                    a[great] = ak;
                    --great;
                }
            }

            /*
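             * The left and right parts are not fully sorted yet, so sort them recursively.
             * The center part holds a single value and needs no further sorting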
             */
            sort(a, left, less - 1, leftmost);
            sort(a, great + 1, right, false);
        }
     }
}
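The pair insertion branch in the listing above leans on the neighbouring partition as a sentinel, which makes it hard to try in isolation. As a standalone illustration, here is a minimal sketch of the same idea with explicit bounds checks instead of a sentinel. The class name PairInsertionSketch is hypothetical, and this is my simplified rendering rather than the JDK code.

```java
import java.util.Arrays;

// Hypothetical sketch of pair insertion sort with bounds checks (no sentinel needed).
class PairInsertionSketch {

    static void sort(int[] a) {
        int n = a.length;
        int i = 1;
        // Insert the elements two at a time into the sorted prefix a[0..i-1].
        for (; i + 1 < n; i += 2) {
            int big = a[i], small = a[i + 1];
            if (big < small) {
                int t = big; big = small; small = t;
            }
            int k = i - 1;
            // Elements greater than the larger value shift right by two slots.
            while (k >= 0 && a[k] > big) {
                a[k + 2] = a[k];
                k--;
            }
            a[k + 2] = big;
            // Continue from the same position: the smaller value cannot belong
            // to the right of it, and elements now shift right by one slot.
            while (k >= 0 && a[k] > small) {
                a[k + 1] = a[k];
                k--;
            }
            a[k + 1] = small;
        }
        // An odd element may be left over; insert it the ordinary way.
        if (i < n) {
            int last = a[i];
            int k = i - 1;
            while (k >= 0 && a[k] > last) {
                a[k + 1] = a[k];
                k--;
            }
            a[k + 1] = last;
        }
    }

    public static void main(String[] args) {
        int[] data = {4, 3, 2, 1, 7, 5, 6};
        sort(data);
        System.out.println(Arrays.toString(data));
    }
}
```

The saving comes from the second scan resuming where the first one stopped, so the pair shares one pass over the larger elements instead of making two.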

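The source comes from the file jdk1.7/src/java/util/DualPivotQuicksort.java; this post translates and explains its logic. If this infringes any rights, it will be removed immediately.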

于曉飛
