How to use Zeppelin for data analysis

2022-11-02 15:02:54

Official Spark documentation:

http://spark.apache.org/docs/2.2.0/api/scala/index.html#org.apache.spark.sql.Column

I. Data import

It is recommended to first install a tool that supports drag-and-drop file upload:

yum -y install lrzsz

1. From the hadoop directory on the virtual machine, upload the file user.csv from the local (Windows) machine

2. List the HDFS root directory

hdfs dfs -ls /

3. Create the target directory recursively

hdfs dfs -mkdir -p /events/users

4. Upload the file to HDFS

hdfs dfs -put user.csv /events/users
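Steps 2 to 4 can also be collected into one small script. The following is a minimal sketch, not a definitive setup: the helper names `run` and `upload_to_hdfs` and the `DRY_RUN` switch are hypothetical additions, and the final `hdfs dfs -ls` on the target directory is an extra check that is not part of the steps above. With `DRY_RUN=1` (the default in this sketch) the commands are only printed, so the script can be tried on a machine without the hdfs CLI.

```shell
#!/bin/sh
# Sketch: steps 2-4 as one script. DRY_RUN=1 (the default here) only prints
# the commands; set DRY_RUN=0 on a machine where the hdfs CLI is installed.
set -eu

DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# upload_to_hdfs LOCAL_FILE HDFS_DIR (hypothetical helper name)
upload_to_hdfs() {
  run hdfs dfs -ls /               # step 2: list the HDFS root
  run hdfs dfs -mkdir -p "$2"      # step 3: create the directory and any parents
  run hdfs dfs -put "$1" "$2"      # step 4: copy the local file into HDFS
  run hdfs dfs -ls "$2"            # extra check: confirm the file arrived
}

upload_to_hdfs user.csv /events/users
```

Running it with `DRY_RUN=0` executes the same four commands in order and stops at the first one that fails, because of `set -e`.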

5. Work in Zeppelin

(1) Load the data

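// Read the tab-delimited file from HDFS; the path matches the user.csv uploaded in step 4
val users = spark.read
  .options(Map("inferSchema" -> "true", "delimiter" -> "\t", "header" -> "true"))
  .csv("/events/users/user.csv")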

users.printSchema
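The read above assumes a tab-delimited file whose first row holds the column names. If printSchema reports only a single column, a delimiter mismatch is one likely cause. A minimal sketch for checking the header row of the local copy before loading it is shown here; the helper name `header_field_count` and the sample columns are hypothetical, and the real user.csv may look different.

```shell
#!/bin/sh
# Sketch: count the tab-separated fields in a file's header row.
set -eu

# header_field_count FILE (hypothetical helper): prints the number of
# tab-separated fields on the first line of FILE.
header_field_count() {
  head -n 1 "$1" | awk -F '\t' '{ print NF }'
}

# Hypothetical sample data standing in for user.csv.
sample="$(mktemp)"
printf 'user_id\tlocale\tbirthyear\n1\ten_US\t1990\n' > "$sample"

header_field_count "$sample"   # prints 3: the header has three tab-separated fields
rm -f "$sample"
```

A result of 1 on the real file would suggest that the file uses a different delimiter, for example a comma, and that the "delimiter" option should be changed to match.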
