天天看点

Spark Transformation算子->mapPartitionsWithIndex

类似于 mapPartitions,除此之外还会携带分区的索引值。

  1. java
package transformations;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function2;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/**
 * @Author yqq
 * @Date 2021/12/09 20:45
 * @Version 1.0
 */
public class MapPartitionsWithIndexTest {
    /**
     * Demonstrates {@code mapPartitionsWithIndex}: like {@code mapPartitions},
     * except the supplied function also receives the index of the partition
     * it is processing.
     */
    public static void main(String[] args) {
        JavaSparkContext context = new JavaSparkContext(
                new SparkConf()
                        .setMaster("local")
                        .setAppName("mappartitionswithindex")
        );
        context.setLogLevel("Error");
        // 3 partitions -> the function below is invoked once per partition.
        JavaRDD<String> rdd = context.parallelize(Arrays.asList("a", "b", "c", "e", "f", "g"), 3);
        JavaRDD<String> rdd1 = rdd.mapPartitionsWithIndex(new Function2<Integer, Iterator<String>, Iterator<String>>() {
            @Override
            public Iterator<String> call(Integer v1, Iterator<String> v2) throws Exception {
                // Accumulate into a list LOCAL to this call. The original code
                // captured a single list created in the driver, so every
                // partition returned the elements gathered by all previously
                // processed partitions as well, producing duplicated output
                // (and the pattern breaks entirely on a real cluster, where
                // the closure is serialized per task).
                List<String> partitionValues = new ArrayList<>();
                while (v2.hasNext())
                    partitionValues.add("partition:" + v1 + "\t" + "value:" + v2.next());
                return partitionValues.iterator();
            }
        }, false);
        rdd1.collect().forEach(e -> System.out.println(e));
        // Release the local Spark context before exiting.
        context.stop();
    }
}
package transformation

import org.apache.spark.{SparkConf, SparkContext}

import scala.collection.mutable.ListBuffer

/**
 * @Author yqq
 * @Date 2021/12/09 21:11
 * @Version 1.0
 */
object MapPartitionsWithIndexTest {
  /**
   * Demonstrates mapPartitionsWithIndex: like mapPartitions, except the
   * function is also handed the index of the partition it is processing.
   */
  def main(args: Array[String]): Unit = {
    val context = new SparkContext(
      new SparkConf()
        .setMaster("local")
        .setAppName("mappartitionswithindex")
    )
    context.setLogLevel("Error")
    val data = Array[String]("a", "b", "c", "e", "f", "g")
    context.parallelize(data, 3)
      .mapPartitionsWithIndex { (index, ite) =>
        // Tag every element with the index of the partition it lives in.
        ite.map(value => s"partition:$index,value:$value")
      }
      .foreach(println)
  }
}