天天看点

【Structured Streaming】-- 输出模式

1、环境

  • spark 2.4.0
  • scala 2.11.8
  • jdk 1.8
  • maven
<dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
            <version>2.4.0</version>
</dependency>      

2、源码说明

package org.apache.spark.sql.streaming;

import org.apache.spark.annotation.InterfaceStability;
import org.apache.spark.sql.catalyst.streaming.InternalOutputModes;

/**
 * OutputMode describes what data will be written to a streaming sink when there is
 * new data available in a streaming DataFrame/Dataset.
 *
 * @since 2.0.0
 */
@InterfaceStability.Evolving
public class OutputMode {

  /**
   * OutputMode in which only the new rows in the streaming DataFrame/Dataset will be
   * written to the sink. This output mode can be only be used in queries that do not
   * contain any aggregation.
   *
   * @since 2.0.0
   */
  public static OutputMode Append() {
    return InternalOutputModes.Append$.MODULE$;
  }

  /**
   * OutputMode in which all the rows in the streaming DataFrame/Dataset will be written
   * to the sink every time there are some updates. This output mode can only be used in queries
   * that contain aggregations.
   *
   * @since 2.0.0
   */
  public static OutputMode Complete() {
    return InternalOutputModes.Complete$.MODULE$;
  }

  /**
   * OutputMode in which only the rows that were updated in the streaming DataFrame/Dataset will
   * be written to the sink every time there are some updates. If the query doesn't contain
   * aggregations, it will be equivalent to `Append` mode.
   *
   * @since 2.1.1
   */
  public static OutputMode Update() {
    return InternalOutputModes.Update$.MODULE$;
  }
}      

3、区别

  • Append :默认模式这种模式保证每一行只输出一次(假设是容错接收器)。只输出结果表中本批次新增的数据,即本批次中的数据。支持使用:select、where、map、flatMap、filter、join等的查询。
  • Completed:每次触发后,输出最新的完整的结果表数据。 支持使用:聚合查询。
  • Updated:(自Spark 2.1.1起可用)只输出结果表中被本批次修改的数据。 

4、参考