Set up the project environment and process streaming data

pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.bygones</groupId>
    <artifactId>learn-flink</artifactId>
    <version>1.0.0-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>1.10.1</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_2.12</artifactId>
            <version>1.10.1</version>
        </dependency>
    </dependencies>
</project>

Java and Flink code for processing streaming data

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

// Stream processing with Flink.
// DataStream is the API used to process streaming data.
public class StreamWordCount {
    public static void main(String[] args) throws Exception {
        // Create the stream execution environment
        StreamExecutionEnvironment executionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment();
        // Set the parallelism; when running locally, the default is the number of CPU cores
        executionEnvironment.setParallelism(8);

        // Read the input data from a file
        String inputPath = "F:\\workspace\\data-process\\flink\\learn-flink\\src\\main\\resources\\batch1.txt";
        // DataStreamSource<String> stringDataStreamSource = executionEnvironment.readTextFile(inputPath);
        DataStream<String> stringDataStream = executionEnvironment.readTextFile(inputPath);

        // Transform the stream: split lines into (word, 1) pairs, key by the word (field 0), sum the counts (field 1)
        SingleOutputStreamOperator<Tuple2<String, Integer>> resultStream = stringDataStream.flatMap(new MyFlatMapper())
                .keyBy(0)
                .sum(1);

        resultStream.print();

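        // The code above only defines the processing pipeline; nothing has run yet

        // Execute the job; records are processed one at a time as they arrive
        executionEnvironment.execute();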

    }

    // FlatMapFunction<input type, output type>
    // Tuple2<String, Integer> is a 2-tuple type, for example (word, 1)
    public static class MyFlatMapper implements FlatMapFunction<String,Tuple2<String,Integer>> {

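        // Defines how each record is processed

        /**
         * Splits a line into words and emits a (word, 1) pair for each word.
         *
         * @param value the input record (one line of text)
         * @param collector collects the records to be emitted downstream
         * @throws Exception if processing fails
         */
        @Override
        public void flatMap(String value, Collector<Tuple2<String, Integer>> collector) throws Exception {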
            // Split the line on spaces
            String[] words = value.split(" ");
            // Iterate over the words and wrap each one in a 2-tuple
            for (String word : words) {
                collector.collect(new Tuple2<>(word, 1));
            }
        }
    }
}
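
One detail of the flatMap step above: `String.split(" ")` splits on every single space, so a line with consecutive or leading spaces produces empty strings that would then be counted as words. The sample input used here contains only single spaces, so the job above is not affected. The following is a minimal plain-Java sketch (no Flink dependency) of a more tolerant split; the class name `Tokenizer` and the method `tokenize` are hypothetical and are not part of the original job.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the original StreamWordCount job.
public class Tokenizer {

    // Splits a line on runs of whitespace and drops empty tokens.
    public static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("hello  world"));   // two spaces between the words
        System.out.println(tokenize("  hello flink ")); // leading and trailing spaces
    }
}
```

Inside `MyFlatMapper.flatMap`, the same idea would amount to splitting on `"\\s+"` and skipping empty words before calling `collector.collect`.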

Input data (batch1.txt)

hello world
hello flink
hello spark
hello scala
how are you
fine thank you
and you

Output analysis

Records are processed one at a time: each record is handled as soon as it arrives.

State is kept: the running count for each word is retained from one record to the next.
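
The two observations above can be reproduced without Flink. The following is a minimal plain-Java sketch, not Flink's implementation: a `HashMap` stands in for the per-key state that `sum(1)` maintains, and every incoming word immediately produces an updated count. The class name `RunningWordCount` and the method `process` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of record-at-a-time processing with per-key state.
public class RunningWordCount {

    // Processes the lines in order and returns every emitted (word,count) record.
    public static List<String> process(List<String> lines) {
        Map<String, Integer> state = new HashMap<>(); // stand-in for keyed state
        List<String> emitted = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                // Update the stored count for this word and emit the new value right away
                int count = state.merge(word, 1, Integer::sum);
                emitted.add("(" + word + "," + count + ")");
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello world", "hello flink", "hello spark");
        for (String record : process(lines)) {
            System.out.println(record);
        }
    }
}
```

For these three lines the sketch emits (hello,1), (world,1), (hello,2), (flink,1), (hello,3), (spark,1): one output record per input word, with the count for hello growing because its previous value was remembered.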

Data is processed in parallel by default.
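
Parallel processing still yields correct per-word counts because keying a stream sends every record with the same key to the same parallel subtask. The sketch below illustrates only that idea in plain Java. It is a simplification and an assumption for illustration: Flink's actual assignment hashes keys into key groups and maps those to subtasks, so the subtask numbers printed here will not match a real job. The class name `KeyPartitionSketch` and the method `subtaskFor` are hypothetical.

```java
// Hypothetical, simplified model of key-based routing; not Flink's real algorithm.
public class KeyPartitionSketch {

    // Maps a key to a subtask index in the range [0, parallelism).
    public static int subtaskFor(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    public static void main(String[] args) {
        int parallelism = 8; // same value the job passes to setParallelism
        String[] words = {"hello", "world", "hello", "flink", "hello", "spark"};
        for (String word : words) {
            // Every occurrence of the same word lands on the same subtask
            System.out.println(word + " -> subtask " + subtaskFor(word, parallelism));
        }
    }
}
```

Because each word always maps to one subtask, that subtask alone holds the word's count, and no coordination between subtasks is needed.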

