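Getting the Most out of Java 8 Stream: Collectors

As mentioned in an earlier article, the heart of Stream is Collectors, which gathers the processed data into a result. Collectors offers a large and powerful set of APIs that can collect the final data into a List, a Set, a Map, or even more complex structures (nested combinations of these three).

Collectors exposes many methods, a good number of which are overloads. I personally sort them into three broad categories:

Data collection: set, map, list
Aggregation and reduction: counting, summing, min/max, averaging, string joining, reducing
Pre- and post-processing: partitioning, grouping, custom operations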

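API usage

This section covers the commonly used APIs rather than all of them: there are simply too many, and the possible combinations get dauntingly complex.

Data collection

1. Collectors.toCollection() collects the data into a Collection. Any Collection implementation works, for example ArrayList or HashSet. The method takes a Collection factory as its argument, that is, a Supplier that creates the target collection (typically a constructor reference).

Example: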

//List
Stream.of(1,2,3,4,5,6,8,9,0)
    .collect(Collectors.toCollection(ArrayList::new));
//Set
Stream.of(1,2,3,4,5,6,8,9,0)
     .collect(Collectors.toCollection(HashSet::new));
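
Since any Collection implementation is accepted, the same call also works with containers that have no dedicated collector. The sketch below is my own illustration (the class and method names are made up for this example) and collects into a TreeSet and a LinkedList:

```java
import java.util.LinkedList;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ToCollectionDemo {
    // Collects into a TreeSet, so the result is sorted and de-duplicated.
    static TreeSet<Integer> sortedSet() {
        return Stream.of(3, 1, 2, 3, 1)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    // Collects into a LinkedList, which keeps encounter order and duplicates.
    static LinkedList<Integer> linkedList() {
        return Stream.of(3, 1, 2, 3, 1)
                .collect(Collectors.toCollection(LinkedList::new));
    }

    public static void main(String[] args) {
        System.out.println(sortedSet());  // [1, 2, 3]
        System.out.println(linkedList()); // [3, 1, 2, 3, 1]
    }
}
```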

2. Collectors.toList() and Collectors.toSet() are much the same as Collectors.toCollection(), except that the container type is fixed: the JDK 8 implementation uses ArrayList and HashSet. I had assumed these two methods would call Collectors.toCollection() internally, but they do not; each one creates a new CollectorImpl directly.

What I expected:

public static <T>
    Collector<T, ?, List<T>> toList() {
    return toCollection(ArrayList::new);
}
public static <T>
    Collector<T, ?, Set<T>> toSet() {
    return toCollection(HashSet::new);
}

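At first I could not see what the author had in mind. Then I noticed that CollectorImpl takes a Set<Collector.Characteristics> (a set of characteristics). Because a Set is unordered, the toSet() implementation passes CH_UNORDERED_ID, whereas toCollection() always passes CH_ID. Does that mean passing a Set type to toCollection() is discouraged? If anyone knows, please tell me.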
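
The difference can be observed directly through Collector.characteristics(). The sketch below (class and method names are my own) prints whether each collector reports UNORDERED. My reading, which the JDK source does not state explicitly, is that UNORDERED is only an optimization hint, and that toCollection() cannot know whether the supplied collection cares about order, so it stays conservative; under that reading, passing a Set factory to toCollection() is still perfectly valid.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class CharacteristicsDemo {
    // Characteristics reported by Collectors.toSet().
    static Set<Collector.Characteristics> ofToSet() {
        Collector<Integer, ?, Set<Integer>> collector = Collectors.toSet();
        return collector.characteristics();
    }

    // Characteristics reported by toCollection(), even when given a Set factory.
    static Set<Collector.Characteristics> ofToCollection() {
        Collector<Integer, ?, Set<Integer>> collector =
                Collectors.toCollection(HashSet::new);
        return collector.characteristics();
    }

    public static void main(String[] args) {
        // toSet() declares itself unordered; toCollection() does not.
        System.out.println(ofToSet().contains(Collector.Characteristics.UNORDERED));
        System.out.println(ofToCollection().contains(Collector.Characteristics.UNORDERED));
    }
}
```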

Example:

//List
Stream.of(1,2,3,4,5,6,8,9,0)
                .collect(Collectors.toList());
//Set
Stream.of(1,2,3,4,5,6,8,9,0)
                .collect(Collectors.toSet());

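Collectors.toMap() and Collectors.toConcurrentMap() do what their names say: they collect into a Map and a ConcurrentMap, using HashMap and ConcurrentHashMap by default. toConcurrentMap() supports concurrent collection from a parallel stream. Each of the two has three overloads. Unlike a Collection, a Map stores key-value pairs, so when collecting into a Map you must say what the key is. At a minimum, toMap() and toConcurrentMap() take two arguments: a function that extracts the key and a function that produces the value to store.

Example: this uses a Student class with two fields, id and name.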

public class Student{

        // Unique identifier
        private String id;

        private String name;

        public Student() {
        }

        public Student(String id, String name) {
            this.id = id;
            this.name = name;
        }

        public String getId() {
            return id;
        }

        public void setId(String id) {
            this.id = id;
        }

        public String getName() {
            return name;
        }

        public void setName(String name) {
            this.name = name;
        }
    }

Note: here the key is the id, and the value can be either the object itself or one of its fields, so collecting into a Map is highly customizable.

Student studentA = new Student("20190001", "小明");
Student studentB = new Student("20190002", "小紅");
Student studentC = new Student("20190003", "小丁");
// Function.identity() returns the object itself, so the result is a Map<String, Student>, i.e. id -> student
// Sequential collection
Stream.of(studentA, studentB, studentC)
        .collect(Collectors.toMap(Student::getId, Function.identity()));
// Concurrent collection
Stream.of(studentA, studentB, studentC)
        .parallel()
        .collect(Collectors.toConcurrentMap(Student::getId, Function.identity()));
//================================================================================
// Map<String, String>, i.e. id -> name
// Sequential collection
Stream.of(studentA, studentB, studentC)
        .collect(Collectors.toMap(Student::getId, Student::getName));
// Concurrent collection
Stream.of(studentA, studentB, studentC)
        .parallel()
        .collect(Collectors.toConcurrentMap(Student::getId, Student::getName));

What happens when keys collide? Suppose two Students share the same id. When building the Map, we keep the one whose name compares greater and discard the other.

// Map<String, Student>
Stream.of(studentA, studentB, studentC)
        .collect(Collectors.toMap(
                Student::getId,
                Function.identity(),
                BinaryOperator.maxBy(Comparator.comparing(Student::getName))));
// The version above may look dense, so here is the same merge written imperatively
// Map<String, Student>
Stream.of(studentA, studentB, studentC)
        .collect(Collectors.toMap(
                Student::getId,
                Function.identity(),
                (s1, s2) -> {
                    // compareTo returns a positive number if s1's name is greater,
                    // 0 if the names are equal, and a negative number otherwise
                    if (s1.getName().compareTo(s2.getName()) < 0) {
                        return s2;
                    } else {
                        return s1;
                    }
                }));
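
For contrast, the two-argument toMap() does not tolerate duplicate keys at all: it throws an IllegalStateException. The sketch below uses plain strings keyed by their length instead of Student, purely to stay self-contained; the class and method names are my own.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ToMapDuplicateDemo {
    // "apple" and "grape" both have length 5, so the key 5 occurs twice.
    static boolean twoArgToMapThrows() {
        try {
            Stream.of("apple", "banana", "grape")
                    .collect(Collectors.toMap(String::length, Function.identity()));
            return false;
        } catch (IllegalStateException e) {
            return true; // duplicate key and no merge function
        }
    }

    // With a merge function, the value that compares greater wins.
    static Map<Integer, String> keepGreater() {
        return Stream.of("apple", "banana", "grape")
                .collect(Collectors.toMap(String::length, Function.identity(),
                        (a, b) -> a.compareTo(b) >= 0 ? a : b));
    }

    public static void main(String[] args) {
        System.out.println(twoArgToMapThrows()); // true
        System.out.println(keepGreater().get(5)); // grape
    }
}
```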

If you do not want the default HashMap or ConcurrentHashMap, the third overload also accepts a custom Map via a map factory (a Supplier).

// Custom map: LinkedHashMap
//Map<String,Student>
Stream.of(studentA, studentB, studentC)
                .collect(Collectors
                        .toMap(Student::getId,
                                Function.identity(),
                                BinaryOperator
                                        .maxBy(Comparator.comparing(Student::getName)),
                                LinkedHashMap::new));
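
The choice of map factory is visible in the iteration order of the result. A small sketch with made-up data (strings mapped to their lengths; names are my own) compares LinkedHashMap with TreeMap:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class MapFactoryDemo {
    // LinkedHashMap keeps the keys in encounter order.
    static Map<String, Integer> insertionOrdered() {
        return Stream.of("pear", "fig", "banana")
                .collect(Collectors.toMap(Function.identity(), String::length,
                        (first, second) -> first, LinkedHashMap::new));
    }

    // TreeMap keeps the keys sorted.
    static Map<String, Integer> sortedByKey() {
        return Stream.of("pear", "fig", "banana")
                .collect(Collectors.toMap(Function.identity(), String::length,
                        (first, second) -> first, TreeMap::new));
    }

    public static void main(String[] args) {
        System.out.println(insertionOrdered()); // {pear=4, fig=3, banana=6}
        System.out.println(sortedByKey());      // {banana=6, fig=3, pear=4}
    }
}
```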

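Aggregation and reduction

Collectors.joining() concatenates strings. It has three overloads and is built on StringBuilder underneath, appending the elements one after another. You can also specify a delimiter, a prefix, and a suffix. The delimiter in particular is very handy: turning a list into a single String is a common need, and specifying a delimiter does the job.

Example: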

Student studentA = new Student("20190001", "小明");
Student studentB = new Student("20190002", "小紅");
Student studentC = new Student("20190003", "小丁");
// No delimiter: 201900012019000220190003
Stream.of(studentA, studentB, studentC)
        .map(Student::getId)
        .collect(Collectors.joining());
// Using ^_^ as the delimiter
// 20190001^_^20190002^_^20190003
Stream.of(studentA, studentB, studentC)
        .map(Student::getId)
        .collect(Collectors.joining("^_^"));
// Using ^_^ as the delimiter
// and [ ] as the prefix and suffix
// [20190001^_^20190002^_^20190003]
Stream.of(studentA, studentB, studentC)
        .map(Student::getId)
        .collect(Collectors.joining("^_^", "[", "]"));

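Collectors.counting() counts the elements. It does the same job as Stream.count(); the former returns a boxed Long and the latter a primitive long. Their use cases do differ, though, as discussed later.

Example: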

// Long 8
Stream.of(1,0,-10,9,8,100,200,-80)
                .collect(Collectors.counting());
// If all you need is a count, there is no need for Collectors; it only costs more resources
// long 8
Stream.of(1,0,-10,9,8,100,200,-80)
                .count();
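
One place where counting() has no Stream.count() equivalent is as a downstream collector, for example counting the members of each group produced by groupingBy() (covered in the pre- and post-processing part of this article). A minimal sketch with made-up data and names of my own:

```java
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CountingDemo {
    // Groups words by length and counts how many words fall into each group.
    static Map<Integer, Long> countByLength() {
        return Stream.of("a", "bb", "cc", "ddd")
                .collect(Collectors.groupingBy(String::length, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(countByLength()); // {1=1, 2=2, 3=1}
    }
}
```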

Collectors.minBy() and Collectors.maxBy() do the same job as Stream.min() and Stream.max(); the Collectors versions are simply meant for more advanced scenarios.

Example:

// maxBy 200
Stream.of(1, 0, -10, 9, 8, 100, 200, -80)
                .collect(Collectors.maxBy(Integer::compareTo)).ifPresent(System.out::println);
// max 200
Stream.of(1, 0, -10, 9, 8, 100, 200, -80)
                .max(Integer::compareTo).ifPresent(System.out::println);
// minBy -80
Stream.of(1, 0, -10, 9, 8, 100, 200, -80)
                .collect(Collectors.minBy(Integer::compareTo)).ifPresent(System.out::println);
// min -80
Stream.of(1, 0, -10, 9, 8, 100, 200, -80)
                .min(Integer::compareTo).ifPresent(System.out::println);
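
As one possible "advanced scenario", maxBy() can serve as the downstream collector of partitioningBy() (described later in this article) to find a maximum per partition in a single pass. This is my own sketch, reusing the numbers from the example above; the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class MaxByDownstreamDemo {
    // Splits the numbers into even (true) and odd (false), keeping the maximum of each.
    static Map<Boolean, Optional<Integer>> maxOfEvensAndOdds() {
        return Stream.of(1, 0, -10, 9, 8, 100, 200, -80)
                .collect(Collectors.partitioningBy(n -> n % 2 == 0,
                        Collectors.maxBy(Integer::compareTo)));
    }

    public static void main(String[] args) {
        Map<Boolean, Optional<Integer>> result = maxOfEvensAndOdds();
        System.out.println(result.get(true));  // Optional[200]
        System.out.println(result.get(false)); // Optional[9]
    }
}
```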

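Collectors.summarizingInt(), Collectors.summarizingLong(), and Collectors.summarizingDouble() compute summary statistics for int, long, and double data respectively. Each returns a SummaryStatistics object (IntSummaryStatistics, LongSummaryStatistics, or DoubleSummaryStatistics) containing the count, sum, minimum, average, and maximum.

IntStream, DoubleStream, and LongStream all offer sum(), but that yields only the sum, not the richer result of summarizing. When you want the count, the average, and so on in a single pass, the summarizing collectors are very convenient.

Example: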

// IntSummaryStatistics{count=10, sum=55, min=1, average=5.500000, max=10}
Stream.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
        .collect(Collectors.summarizingInt(Integer::intValue));
// DoubleSummaryStatistics{count=10, sum=55.000000, min=1.000000, average=5.500000, max=10.000000}
Stream.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
        .collect(Collectors.summarizingDouble(Integer::doubleValue));
// LongSummaryStatistics{count=10, sum=55, min=1, average=5.500000, max=10}
Stream.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
        .collect(Collectors.summarizingLong(Integer::longValue));
// 55
Stream.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10).mapToInt(Integer::intValue)
        .sum();
// 55.0
Stream.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10).mapToDouble(Integer::doubleValue)
        .sum();
// 55
Stream.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10).mapToLong(Integer::longValue)
        .sum();
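
The statistics object exposes each value through a getter. The sketch below (names are my own) reads them back, and also shows that the primitive streams have a summaryStatistics() method of their own that produces the same kind of object:

```java
import java.util.IntSummaryStatistics;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class SummaryStatisticsDemo {
    // Summary via the collector, starting from a Stream<Integer>.
    static IntSummaryStatistics viaCollector() {
        return Stream.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
                .collect(Collectors.summarizingInt(Integer::intValue));
    }

    // Summary via the primitive stream, starting from an IntStream.
    static IntSummaryStatistics viaPrimitiveStream() {
        return IntStream.rangeClosed(1, 10).summaryStatistics();
    }

    public static void main(String[] args) {
        IntSummaryStatistics stats = viaCollector();
        System.out.println(stats.getCount());   // 10
        System.out.println(stats.getSum());     // 55
        System.out.println(stats.getMin());     // 1
        System.out.println(stats.getMax());     // 10
        System.out.println(stats.getAverage()); // 5.5
    }
}
```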

Collectors.averagingInt(), Collectors.averagingDouble(), and Collectors.averagingLong() compute the average. They are meant for more advanced scenarios, discussed later.

Example:

Stream.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
        .collect(Collectors.averagingInt(Integer::intValue));
Stream.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
        .collect(Collectors.averagingDouble(Integer::doubleValue));
Stream.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
        .collect(Collectors.averagingLong(Integer::longValue));

Collectors.reducing() looks much like Stream.reduce(); both perform a reduction. In fact, in JDK 8 Collectors.counting() is implemented with reducing(), as the source shows:

public static <T> Collector<T, ?, Long> counting() {
        return reducing(0L, e -> 1L, Long::sum);
    }

That being so, let's implement a reduction that sums the lengths of all the students' names.

Example:

// Result: Optional[6]
Stream.of(studentA, studentB, studentC)
        .map(student -> student.getName().length())
        .collect(Collectors.reducing(Integer::sum));
// Result: 6
// Or supply an identity value, so that an empty stream still produces a result
Stream.of(studentA, studentB, studentC)
        .map(student -> student.getName().length())
        .collect(Collectors.reducing(0, (i1, i2) -> i1 + i2));
// Result: 6
// Or skip the map step and do the conversion inside the reduction
Stream.of(studentA, studentB, studentC)
        .collect(Collectors.reducing(0, s -> s.getName().length(), Integer::sum));
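
The difference between the overloads only shows up on an empty stream. A small sketch (names are my own) compares the one-argument form, which wraps its result in an Optional, with the form that takes an identity value:

```java
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ReducingEmptyDemo {
    // No identity value: an empty stream yields an empty Optional.
    static Optional<Integer> withoutIdentity() {
        return Stream.<Integer>empty()
                .collect(Collectors.reducing(Integer::sum));
    }

    // With an identity value: an empty stream yields the identity itself.
    static Integer withIdentity() {
        return Stream.<Integer>empty()
                .collect(Collectors.reducing(0, Integer::sum));
    }

    public static void main(String[] args) {
        System.out.println(withoutIdentity()); // Optional.empty
        System.out.println(withIdentity());    // 0
    }
}
```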

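Pre- and post-processing

Collectors.groupingBy() and Collectors.groupingByConcurrent() differ only in their intended use: single-threaded versus multi-threaded. Why do I file groupingBy under pre- and post-processing? Because groupingBy groups the data before it is collected and then hands each group's data to a downstream collector.

Below is the groupingBy overload with the most parameters: classifier is the classification function, mapFactory is the factory for the result map, and downstream is the downstream collector. It is precisely the downstream collector that makes all sorts of clever tricks possible as the data is passed along.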

public static <T, K, D, A, M extends Map<K, D>>
    Collector<T, ?, M> groupingBy(Function<? super T, ? extends K> classifier,
                                  Supplier<M> mapFactory,
                                  Collector<? super T, A, D> downstream)

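Example: this splits a set of integers into positive, negative, and zero. groupingByConcurrent() takes the same parameters, so I will not give a separate example.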

// Map<String, List<Integer>>
Stream.of(-6, -7, -8, -9, 1, 2, 3, 4, 5, 6)
        .collect(Collectors.groupingBy(integer -> {
            if (integer < 0) {
                return "negative";
            } else if (integer == 0) {
                return "zero";
            } else {
                return "positive";
            }
        }));
// Map<String, Set<Integer>>
// Custom downstream collector
Stream.of(-6, -7, -8, -9, 1, 2, 3, 4, 5, 6)
        .collect(Collectors.groupingBy(integer -> {
            if (integer < 0) {
                return "negative";
            } else if (integer == 0) {
                return "zero";
            } else {
                return "positive";
            }
        }, Collectors.toSet()));
// Map<String, Set<Integer>>
// Custom map container and downstream collector
Stream.of(-6, -7, -8, -9, 1, 2, 3, 4, 5, 6)
        .collect(Collectors.groupingBy(integer -> {
            if (integer < 0) {
                return "negative";
            } else if (integer == 0) {
                return "zero";
            } else {
                return "positive";
            }
        }, LinkedHashMap::new, Collectors.toSet()));
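
For completeness, here is a rough sketch of the concurrent variant on a parallel stream; the class and method names and the group labels are my own. The result is a ConcurrentMap, and because groupingByConcurrent() is an unordered collector, the order of the elements inside each list is not guaranteed.

```java
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class GroupingByConcurrentDemo {
    // Groups integers by sign; all threads accumulate into one shared ConcurrentMap.
    static ConcurrentMap<String, List<Integer>> bySign() {
        return Stream.of(-6, -7, -8, -9, 0, 1, 2, 3)
                .parallel()
                .collect(Collectors.groupingByConcurrent(
                        n -> n < 0 ? "negative" : (n == 0 ? "zero" : "positive")));
    }

    public static void main(String[] args) {
        // Keys: negative, zero, positive; element order within each list may vary.
        System.out.println(bySign().keySet());
    }
}
```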

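Collectors.partitioningBy() literally means partitioning. It can split the data into at most two parts, because it partitions by a Predicate, and a Predicate can only yield true or false. Apart from taking a Predicate instead of a classifier, partitioningBy accepts the same downstream collector as groupingBy, though it has no overload that takes a mapFactory.

Example: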

//Map<Boolean,List<Integer>>
Stream.of(0,1,0,1)
                .collect(Collectors.partitioningBy(integer -> integer==0));
//Map<Boolean,Set<Integer>>
// Custom downstream collector
Stream.of(0,1,0,1)
                .collect(Collectors.partitioningBy(integer -> integer==0,Collectors.toSet()));
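
One practical difference from groupingBy() with a boolean classifier, as far as I can tell from the JDK implementation (newer Javadoc versions also state it): the partitioned map always contains both the true and the false key, even when one side is empty, whereas groupingBy() only creates keys it actually encounters. A sketch with names of my own:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PartitionVsGroupDemo {
    // No even numbers in the input, yet the "true" key is still present.
    static Map<Boolean, List<Integer>> partitioned() {
        return Stream.of(1, 3, 5)
                .collect(Collectors.partitioningBy(n -> n % 2 == 0));
    }

    // groupingBy only creates the keys it sees, so "true" is missing here.
    static Map<Boolean, List<Integer>> grouped() {
        return Stream.of(1, 3, 5)
                .collect(Collectors.groupingBy(n -> n % 2 == 0));
    }

    public static void main(String[] args) {
        System.out.println(partitioned()); // {false=[1, 3, 5], true=[]}
        System.out.println(grouped());     // {false=[1, 3, 5]}
    }
}
```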

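Collectors.mapping() lets you choose what to collect from each element, for example a single field: it applies a mapping function to every element before passing the result to a downstream collector.

Example: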

//List<String>
Stream.of(studentA,studentB,studentC)
                .collect(Collectors.mapping(Student::getName,Collectors.toList()));
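
Used on its own like this, mapping() does the same as calling map() before collect(). It becomes more interesting as a downstream collector, where map() is not available. The sketch below uses made-up strings and names of my own: it groups words by their first letter and keeps only each word's length.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class MappingDownstreamDemo {
    // Groups words by first letter, then maps each word to its length within the group.
    static Map<String, List<Integer>> lengthsByInitial() {
        return Stream.of("apple", "avocado", "banana")
                .collect(Collectors.groupingBy(
                        w -> w.substring(0, 1),
                        Collectors.mapping(String::length, Collectors.toList())));
    }

    public static void main(String[] args) {
        System.out.println(lengthsByInitial()); // {a=[5, 7], b=[6]}
    }
}
```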

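Collectors.collectingAndThen() applies a finishing step after collection. If you need to do something with the result once the data has been collected, this is very useful.

Example: here the collected list is converted to a ListIterator. This is only a simple illustration; what you do in the finishing step is up to your imagination.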

// ListIterator<Student>
Stream.of(studentA,studentB,studentC)
                .collect(Collectors.collectingAndThen(Collectors.toList(),List::listIterator));
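
A more everyday use of the finishing step is to make the collected result read-only. The sketch below follows the pattern shown in the Collectors Javadoc, with made-up names and data:

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CollectingAndThenDemo {
    // Collects into a list, then wraps it so callers cannot modify it.
    static List<String> readOnlyNames() {
        return Stream.of("Ann", "Bob", "Cid")
                .collect(Collectors.collectingAndThen(
                        Collectors.toList(), Collections::unmodifiableList));
    }

    // Returns true if the list refuses modification.
    static boolean rejectsAdd() {
        try {
            readOnlyNames().add("Dee");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(readOnlyNames()); // [Ann, Bob, Cid]
        System.out.println(rejectsAdd());    // true
    }
}
```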

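Summary

Collectors is the core of Stream, with a rich and powerful feature set. In the business code I write, there is almost nothing Collectors cannot handle. Even when a problem looks too hard, thinking it through and experimenting with combinations of these APIs will usually get it done with Collectors.

Once, to produce a sorted id, I spent nearly six hours combining these APIs, but I did get it written in the end. It was one of the more complex operations in my business code.

One more point: where a Stream operation offers functionality similar to a collector in Collectors, prefer the Stream operation if it does the job, since that reduces overhead.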

END

Original author: litesky
