Applications of the Hive Data Warehouse: Class Notes on Hive

2023-07-31 23:54:36

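Hive basics:

Hive is a data warehouse tool built on Hadoop. It maps structured data files to tables and provides SQL-like query capability.

In essence, Hive translates HQL into MapReduce programs (the underlying implementation).

(1) The data that Hive processes is stored in HDFS.

(2) Hive's data analysis is implemented by MapReduce underneath.

(3) The resulting programs run on YARN.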
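
To make the HQL-to-MapReduce idea more concrete, here is a minimal Python sketch of how a query such as select name, count(*) from student group by name could be broken into a map phase, a shuffle, and a reduce phase. It is not Hive's actual execution engine; the function names and the sample rows are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical rows of a student(id, name) table, standing in for data stored in HDFS.
rows = [(1, "ann"), (2, "bob"), (3, "ann")]

def map_phase(records):
    """Emit one (key, 1) pair per row, keyed by the GROUP BY column."""
    for _id, name in records:
        yield name, 1

def shuffle(pairs):
    """Group the emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the values of each key, which corresponds to count(*)."""
    return {key: sum(values) for key, values in grouped.items()}

counts = reduce_phase(shuffle(map_phase(rows)))
print(counts)  # {'ann': 2, 'bob': 1}
```

In a real cluster the map and reduce steps would run as distributed tasks scheduled by YARN rather than as local function calls.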

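Creating a Hive table:

Syntax: create table table_name (column_name data_type) row format delimited fields terminated by ',';

Example: create table student(id int,name string) row format delimited fields terminated by ',';

row format delimited: specifies how the rows of the data file are formatted (as delimited text)
fields terminated by ',': specifies the column delimiter used in the data file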
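
As a rough illustration of what the delimiter clause means, the following Python sketch splits each text line on the delimiter and casts the fields according to a schema that mirrors student(id int, name string). The SCHEMA list and the parse_row helper are hypothetical, and turning an unparseable value into None is an assumption of this sketch meant to resemble a NULL column value.

```python
# Hypothetical schema mirroring: create table student(id int, name string)
SCHEMA = [("id", int), ("name", str)]

def parse_row(line, delimiter=","):
    """Split one text line on the field delimiter and cast each field by the schema."""
    fields = line.rstrip("\n").split(delimiter)
    row = {}
    for (column, cast), raw in zip(SCHEMA, fields):
        try:
            row[column] = cast(raw)
        except ValueError:
            # Assumption of this sketch: a value that does not fit the column type becomes None.
            row[column] = None
    return row

lines = ["9,stu1\n", "10,stu2\n"]
table = [parse_row(line) for line in lines]
print(table)  # [{'id': 9, 'name': 'stu1'}, {'id': 10, 'name': 'stu2'}]
```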

Ways to load data into a Hive table:

1. Insert directly:

insert into student values(9,'stu1');

2. Load a local data file into the Hive table:

load data local inpath '/usr/local/data.txt' into table student;

3. Load a data file on HDFS into the Hive table:

load data inpath '/data/data.txt' into table student;
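
The two load statements differ in what happens to the source file. The Python sketch below imitates that difference on the local filesystem, under the assumption (not stated in these notes) that a load with the local keyword copies the file into the table directory while a load from HDFS moves it. The load_data helper and all paths are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path

def load_data(source, table_dir, local):
    """Place a data file into a table's directory.

    Assumption of this sketch: a local load copies the file, while a
    non-local (HDFS) load moves it, so the source path no longer exists.
    """
    table_dir.mkdir(parents=True, exist_ok=True)
    target = table_dir / source.name
    if local:
        shutil.copy(source, target)
    else:
        shutil.move(str(source), str(target))
    return target

root = Path(tempfile.mkdtemp())
table_dir = root / "warehouse" / "student"  # hypothetical table directory

local_file = root / "data_local.txt"
local_file.write_text("9,stu1\n")
hdfs_file = root / "data_hdfs.txt"          # stands in for a file already on HDFS
hdfs_file.write_text("10,stu2\n")

load_data(local_file, table_dir, local=True)
load_data(hdfs_file, table_dir, local=False)

print(local_file.exists())  # True: the local source is still there
print(hdfs_file.exists())   # False: the source was moved into the table directory
print(sorted(p.name for p in table_dir.iterdir()))  # ['data_hdfs.txt', 'data_local.txt']
```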

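Creating an external table:

Syntax: create external table table_name(column_name data_type) row format delimited fields terminated by ' ';

Example: create external table student3(id int,name string) row format delimited fields terminated by ' ';

external: the keyword that makes the table an external table
row format delimited: specifies how the rows of the data file are formatted (as delimited text)
fields terminated by ' ': specifies the column delimiter used in the data file (a space in this example)

Loading data into an external table works the same way as for an internal table.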

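Differences between internal and external tables:

The main difference between the two shows up when running drop:

Dropping an internal table deletes both the table structure and the table's data; dropping an external table deletes only the table structure and leaves the data in place.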
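
The drop behaviour can be imitated with a small Python sketch in which a dictionary plays the role of the table metadata and temporary directories play the role of the data locations. The metastore dictionary and the create_table and drop_table helpers are hypothetical names for illustration, not Hive APIs.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical stand-in for table metadata: name -> (data directory, is_external)
metastore = {}

def create_table(name, data_dir, external):
    """Register the table and make sure its data directory exists."""
    data_dir.mkdir(parents=True, exist_ok=True)
    metastore[name] = (data_dir, external)

def drop_table(name):
    """Remove the table definition; delete the data only for an internal table."""
    data_dir, external = metastore.pop(name)
    if not external:
        shutil.rmtree(data_dir)

root = Path(tempfile.mkdtemp())
create_table("student", root / "student", external=False)
create_table("student3", root / "student3", external=True)
for data_dir, _ in metastore.values():
    (data_dir / "data.txt").write_text("1 stu1\n")

drop_table("student")
drop_table("student3")

print((root / "student").exists())                # False: internal table data is gone
print((root / "student3" / "data.txt").exists())  # True: external table data survives
print(metastore)                                  # {}: both definitions were removed
```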

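Hive data types

Primitive data types:

int, boolean, tinyint, smallint, float, double, string, date, timestamp, …

Complex data types:

array: an array whose elements all have the same type

map: stores key-value pairs

struct: stores structured data (a record with named fields)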
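
For readers who know Python, the three complex types correspond loosely to a list, a dict, and a named record. The sketch below is only an analogy; the Address type, the column names, and the values are hypothetical.

```python
from typing import NamedTuple

class Address(NamedTuple):
    """Plays the role of a struct: a fixed set of named, typed fields."""
    city: str
    zip_code: str

# Hypothetical row with one column of each complex type.
row = {
    "scores": [90, 85, 77],               # array: elements share one type
    "contacts": {"phone": "555-0100"},    # map: key-value pairs
    "address": Address("Taipei", "100"),  # struct: fields accessed by name
}

print(row["scores"][0])          # 90
print(row["contacts"]["phone"])  # 555-0100
print(row["address"].city)       # Taipei
```

In HiveQL the matching access forms are scores[0], contacts['phone'], and address.city.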

Using Hive's basic functions:

Function	Purpose	Example
Split(string A, delimiter)	Splits string A into an array of fields at the delimiter	select split(line,' ') from textword;
Size(array)	Returns the length of an array	select size(split(line,' ')) from textword;
Explode()	Turns the elements of an array in one row into multiple rows	select explode(split(line,' ')) from textword;
Round(double A)	Rounds A to the nearest whole number	select round(3.1415926);
Round(double A,int B)	Rounds A to B decimal places	select round(3.5415926,2);
Length(string A)	Returns the length of string A	select length(line) from textword;
Reverse(string A)	Returns string A reversed	select reverse('asdfg');
Upper(string A)	Converts the string to uppercase	select upper('asbh');
Lower(string B)	Converts the string to lowercase	select lower('SKJ');
Avg(), sum(), min(), max()	Aggregate functions (average, sum, minimum, maximum)
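
The behaviour of several functions in the table can be approximated in plain Python, which may help when checking expected results without a cluster. The hive_split, hive_explode, and hive_round helpers are hypothetical names. Treating the split separator as a regular expression and rounding half up are assumptions of this sketch about how Hive behaves, not statements taken from these notes.

```python
import re
from decimal import Decimal, ROUND_HALF_UP

def hive_split(text, pattern):
    """split(): cut a string into a list; the separator is treated as a regular expression."""
    return re.split(pattern, text)

def hive_explode(arrays):
    """explode(): emit one output row per array element."""
    return [element for array in arrays for element in array]

def hive_round(value, places=0):
    """round(): round half up to the given number of decimal places."""
    quantum = Decimal(10) ** -places
    return float(Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP))

# Hypothetical contents of the textword table, one string per row.
textword = ["hello hive", "hello hadoop yarn"]
arrays = [hive_split(line, " ") for line in textword]

print([len(a) for a in arrays])  # size():    [2, 3]
print(hive_explode(arrays))      # explode(): ['hello', 'hive', 'hello', 'hadoop', 'yarn']
print(hive_round(3.1415926))     # round():   3.0
print(hive_round(3.5415926, 2))  # round():   3.54
print(len("asdfg"), "asdfg"[::-1], "asbh".upper(), "SKJ".lower())  # 5 gfdsa ASBH skj
```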
