2021-02-26 NumPy study notes

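What NumPy is

NumPy is a computation-focused library that serves as the foundation for most of Python's scientific computing libraries. It is mainly used for numerical operations on large, multi-dimensional arrays.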
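
As a rough illustration of what "numerical operations on arrays" looks like in practice, the sketch below multiplies two arrays element by element and sums the result. The sample values are made up for the example.

```python
import numpy as np

# Made-up sample data: prices and quantities for four items.
prices = np.array([2.5, 4.0, 1.25, 10.0])
quantities = np.array([4, 1, 8, 2])

# Element-wise multiplication runs over the whole array without a Python loop.
line_totals = prices * quantities
grand_total = line_totals.sum()

# The same result with plain Python lists needs an explicit loop.
loop_total = sum(p * q for p, q in zip(prices.tolist(), quantities.tolist()))

print(line_totals)
print(grand_total, loop_total)
```

Both totals agree; the array version simply expresses the whole computation in one step.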

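Array shapes

Classified by number of dimensions

One-dimensional array

import numpy as np

t1 = np.arange(12)
print(t1)
# [ 0  1  2  3  4  5  6  7  8  9 10 11]    # one-dimensional array
print(t1.shape)
# (12,)    # a single value is the number of elements in the array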

Two-dimensional array

t2 = np.array([[1, 2, 3], [4, 5, 6]])
print(t2)
# [[1 2 3]    # two-dimensional array
#  [4 5 6]]
print(t2.shape)
# (2, 3)    # 2 rows, 3 columns

Three-dimensional array

t3 = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(t3)
'''
[[[ 1  2  3]    # three-dimensional array
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]]
'''
print(t3.shape)
# (2, 2, 3)    # 2 blocks, each block has 2 rows, each row has 3 columns
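
The three cases above can be compared side by side. This is a minimal sketch using the standard `ndim`, `shape`, and `size` attributes; the arrays mirror the ones in the listings, with `t3` built through `reshape` for brevity.

```python
import numpy as np

t1 = np.arange(12)                      # 1-D
t2 = np.array([[1, 2, 3], [4, 5, 6]])   # 2-D
t3 = np.arange(12).reshape((2, 2, 3))   # 3-D

for name, arr in (("t1", t1), ("t2", t2), ("t3", t3)):
    # ndim is the number of axes, shape the length of each axis,
    # size the total number of elements (the product of the shape).
    print(name, arr.ndim, arr.shape, arr.size)
```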

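Classified by axis

In NumPy an axis can be understood as a direction, numbered 0, 1, 2, and so on. A one-dimensional array has only axis 0; a two-dimensional array (for example shape (2, 2)) has axes 0 and 1; a three-dimensional array (for example shape (2, 2, 3)) has axes 0, 1, and 2.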

One-dimensional array: axis 0
Two-dimensional array: axes 0 and 1
Three-dimensional array: axes 0, 1, and 2
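
One way to get a feel for axes is to reduce along each of them. The sketch below uses `sum` with the `axis` argument; the arrays are small examples chosen for illustration.

```python
import numpy as np

t2 = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)

# Summing along axis 0 collapses the rows: one result per column.
col_sums = t2.sum(axis=0)
# Summing along axis 1 collapses the columns: one result per row.
row_sums = t2.sum(axis=1)
print(col_sums, row_sums)

t3 = np.arange(12).reshape((2, 2, 3))
# The axis that is reduced disappears from the shape.
reduced_shapes = [t3.sum(axis=axis).shape for axis in range(t3.ndim)]
print(reduced_shapes)
```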

Common NumPy data types (extension)
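
A minimal sketch of inspecting and converting data types. The specific dtypes here (`int8`, `float32`, and so on) are only examples picked for illustration, not a complete list.

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int8)   # explicit 8-bit integer
b = np.array([1.5, 2.25, 3.0])           # float64 by default
c = np.array([True, False, True])        # bool

print(a.dtype, b.dtype, c.dtype)

# astype returns a new array with the requested type; the original is unchanged.
b_int = b.astype(np.int64)       # the fractional part is truncated toward zero
a_float = a.astype(np.float32)
print(b_int, a_float.dtype)

# np.round keeps the float type and only changes the values.
print(np.round(b, 1))
```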

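Reshaping arrays

From one dimension to multiple dimensions

t4 = np.arange(12)
print(t4.reshape(3, 4))
'''
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

Tip: the new shape must hold exactly as many elements as the original. The 12 elements of t4 fit 3 rows by 4 columns, but not 3 rows by 5 columns:
ValueError: cannot reshape array of size 12 into shape (3,5)
'''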

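t5 = np.arange(24).reshape((2, 3, 4))
print(t5)
'''
Split into 2 blocks, each with 3 rows and 4 columns
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]
'''
print(t5.reshape((4, 6)))

'''
Tip: reshape() returns a new array; the variable t5 it was called on is not changed.
    1. To keep the result, assign it to a new variable:  t6 = t5.reshape((4, 6))
    2. Assigning back to the same name, t5 = t5.reshape((4, 6)), rebinds t5 to the returned array. This is not an in-place modification: reshape() still returns a value and the original array object keeps its shape.
[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]]
'''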
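
To make the "returns a new array" behaviour concrete, here is a small sketch. It also shows `-1` as a placeholder dimension and `np.shares_memory` to check whether the result is a view; both are standard NumPy features, and the variable names simply follow the listing above.

```python
import numpy as np

t5 = np.arange(24).reshape((2, 3, 4))

# reshape returns a new array object; t5 keeps its shape.
t6 = t5.reshape((4, 6))
print(t5.shape, t6.shape)

# -1 asks NumPy to infer that dimension from the total size.
t7 = t5.reshape((-1, 8))
print(t7.shape)

# For a contiguous array like this one, the result is a view on the same data,
# so a write through t6 is visible through t5 as well.
print(np.shares_memory(t5, t6))
t6[0, 0] = 100
print(t5[0, 0, 0])
```

Because the reshaped array may share memory with the original, call `.copy()` on the result when an independent array is needed.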

From two dimensions to one

t5 = t5.reshape((4, 6))
t6 = t5.reshape((24,))  # a shape with a single value gives a one-dimensional array
print(t6)
t7 = t5.reshape((t5.shape[0] * t5.shape[1],))  # rows of t5 * columns of t5
print(t7)
print(t5.flatten())  # flatten() returns a one-dimensional copy of the array
'''
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
'''
t8 = t5.reshape(1, 24)
print(t8)
'''
[[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]]
Tip: t8 is not one-dimensional; it is a two-dimensional array with 1 row and 24 columns
'''
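
The ways of flattening differ in whether they copy the data. A minimal sketch, assuming a contiguous array as in the listing above (`ravel` is a standard NumPy method that is not used in the original notes):

```python
import numpy as np

t5 = np.arange(24).reshape((4, 6))

flat_copy = t5.flatten()     # always a copy
flat_view = t5.ravel()       # a view when the memory layout allows it
flat_auto = t5.reshape(-1)   # -1 infers the length, here 24

flat_copy[0] = -1            # does not touch t5
value_after_copy_write = t5[0, 0]
flat_view[0] = 99            # writes through to t5 for this contiguous array
value_after_view_write = t5[0, 0]
print(value_after_copy_write, value_after_view_write)

row = t5.reshape((1, 24))    # still two-dimensional
print(flat_auto.shape, row.shape, row.ndim)
```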

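Array computation

The broadcasting rule for array computation: two arrays are broadcast-compatible if, for every trailing dimension (counting from the end), the axis lengths are equal or one of them is 1. Broadcasting is then performed over the missing dimensions and/or the dimensions of length 1.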
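
The rule can be written down as a short check. This is a sketch only: `broadcast_compatible` is a hypothetical helper written for this note, not a NumPy function, and the result is cross-checked by actually adding two arrays of the given shapes.

```python
import numpy as np

def broadcast_compatible(shape_a, shape_b):
    """Return True if two shapes satisfy the trailing-dimension rule."""
    # zip stops at the shorter shape, so missing leading dimensions are
    # ignored, which matches "broadcast over the missing dimensions".
    for len_a, len_b in zip(reversed(shape_a), reversed(shape_b)):
        if len_a != len_b and len_a != 1 and len_b != 1:
            return False
    return True

pairs = [((2, 6), (2, 1)), ((2, 6), (6,)), ((3, 2, 4), (6,)), ((2, 6), (3, 4))]
results = []
for shape_a, shape_b in pairs:
    predicted = broadcast_compatible(shape_a, shape_b)
    try:
        # Cross-check against NumPy by adding two arrays of these shapes.
        np.zeros(shape_a) + np.zeros(shape_b)
        actual = True
    except ValueError:
        actual = False
    results.append((predicted, actual))
    print(shape_a, shape_b, predicted, actual)
```

The first two pairs are compatible and the last two are not, which matches the shapes used in the examples later in this section.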

Array and scalar: every element of the array is combined with the scalar

import numpy as np
# when an array is combined with a scalar, every element of the array is combined with that scalar
t1 = np.arange(24).reshape((4,6))
#print(t1.shape)
'''
[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]]
'''
print(t1 + 2)
print(t1 - 2)
print(t1 * 2)
print(t1 / 2)
print(t1/0)
'''
Addition
[[ 2  3  4  5  6  7]
 [ 8  9 10 11 12 13]
 [14 15 16 17 18 19]
 [20 21 22 23 24 25]]
Subtraction
[[-2 -1  0  1  2  3]
 [ 4  5  6  7  8  9]
 [10 11 12 13 14 15]
 [16 17 18 19 20 21]]
Multiplication
[[ 0  2  4  6  8 10]
 [12 14 16 18 20 22]
 [24 26 28 30 32 34]
 [36 38 40 42 44 46]]
Division
[[ 0.   0.5  1.   1.5  2.   2.5]
 [ 3.   3.5  4.   4.5  5.   5.5]
 [ 6.   6.5  7.   7.5  8.   8.5]
 [ 9.   9.5 10.  10.5 11.  11.5]]

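Division by zero: nan means "not a number" (the result of 0/0); inf is positive infinity and -inf is negative infinity. NumPy also emits a RuntimeWarning here.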
[[nan inf inf inf inf inf]
 [inf inf inf inf inf inf]
 [inf inf inf inf inf inf]
 [inf inf inf inf inf inf]]
'''
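
A short sketch of how the nan and inf values from a division by zero can be detected. `np.errstate` is used here only to silence the warnings inside the block; the smaller array is chosen for brevity.

```python
import numpy as np

t1 = np.arange(6).reshape((2, 3))

# errstate silences the divide-by-zero and invalid-value warnings for this block only.
with np.errstate(divide="ignore", invalid="ignore"):
    result = t1 / 0

print(result)
nan_count = np.isnan(result).sum()   # only 0/0 gives nan
inf_count = np.isinf(result).sum()   # every other element gives inf
print(nan_count, inf_count)

# nan is not equal to itself, so an equality test cannot find it; use np.isnan.
print(np.nan == np.nan)
```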

Array and array (same shape): the operation is applied to the corresponding elements of the two arrays

import numpy as np
t1 = np.array([[3,4,5,6,7,8],[9,10,11,12,13,14]])
print(t1.shape)
t2 = np.array([[21,22,23,24,25,26],[27,28,29,30,31,32]])
print(t2.shape)

print(t1+t2)
'''
[[24 26 28 30 32 34]
 [36 38 40 42 44 46]]
'''
print(t2-t1)
'''
[[18 18 18 18 18 18]
 [18 18 18 18 18 18]]
'''
print(t1*t2)
'''
[[ 63  88 115 144 175 208]
 [243 280 319 360 403 448]]
'''
print(t2/t1)
'''
[[7.         5.5        4.6        4.         3.57142857 3.25      ]
 [3.         2.8        2.63636364 2.5        2.38461538 2.28571429]]
'''

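Array and array (different shapes): the operation works when the shapes are broadcast-compatible, for example when they line up along the rows or along the columns

import numpy as np
# arrays of different shapes can be combined when their shapes are broadcast-compatible

########################### Compatible shapes ############################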
t1 = np.array([[3,4,5,6,7,8],[9,10,11,12,13,14]])
#print(t1.shape)
t2 = np.array(range(2)).reshape((2,1))
'''[[0]
 [1]]'''

#print(t1-t2)
'''
t2 lines up with the rows of t1; every element in a row of t1 is combined with that row's value from t2
[[ 3  4  5  6  7  8]    t1:[3,4,5,6,7,8]    t2:[0]
 [ 8  9 10 11 12 13]]   t1:[9,10,11,12,13,14]   t2:[1]
'''

t3 = np.array(range(1,7))
'''[1 2 3 4 5 6]'''
#print(t1*t3)
'''
t3 lines up with the columns of t1; both rows of t1 are multiplied element by element (column by column) with the single row of t3
[[ 3  8 15 24 35 48]    t1:[3,4,5,6,7,8]    t3:[1 2 3 4 5 6]
 [ 9 20 33 48 65 84]]   t1:[9,10,11,12,13,14]   t3:[1 2 3 4 5 6]
'''

########################### Incompatible shapes ############################
t4 = np.arange(24).reshape((3,2,4))
'''
[[[ 0  1  2  3]
  [ 4  5  6  7]]

 [[ 8  9 10 11]
  [12 13 14 15]]

 [[16 17 18 19]
  [20 21 22 23]]]
'''

t5 = np.arange(12).reshape((3,4))
'''
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
'''


'''When the shapes are not broadcast-compatible, the operation fails
print(t4-t3)
ValueError: operands could not be broadcast together with shapes (3,2,4) (6,) 

print(t1*t5)
ValueError: operands could not be broadcast together with shapes (2,6) (3,4) 
'''
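
Incompatible shapes can sometimes be made compatible by inserting an axis of length 1. The sketch below applies this to `t4` and `t5` from the listing above; combining them this way is an illustration added here, not something the original notes do.

```python
import numpy as np

t4 = np.arange(24).reshape((3, 2, 4))
t5 = np.arange(12).reshape((3, 4))

# (3, 2, 4) and (3, 4) do not line up: from the end, 4 matches 4, but 2 does not match 3.
try:
    t4 - t5
    failed = False
except ValueError as exc:
    failed = True
    print("failed:", exc)

# Inserting a length-1 axis gives t5 the shape (3, 1, 4), which is compatible.
t5_expanded = t5[:, np.newaxis, :]
diff = t4 - t5_expanded
print(t5_expanded.shape, diff.shape)
```

Each block of `t4` is then combined with the matching row of `t5`.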

Working with data in NumPy

Usage

Additional parameters for reading data
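
A minimal sketch of a few `np.loadtxt` parameters. The CSV content is hypothetical and is read from an in-memory `io.StringIO` object so the example runs without a file on disk; the column names are invented for illustration.

```python
import io
import numpy as np

# Hypothetical CSV content standing in for a real file: a header line and three rows.
csv_text = "views,likes,dislikes,comments\n100,10,1,5\n200,30,2,8\n300,60,3,20\n"

# delimiter: column separator; dtype: element type of the result;
# skiprows: number of leading lines to skip (here the header).
data = np.loadtxt(io.StringIO(csv_text), delimiter=",", dtype=int, skiprows=1)
print(data.shape)

# usecols picks columns by index; unpack=True transposes the result,
# so each column can be assigned to its own variable.
likes, comments = np.loadtxt(
    io.StringIO(csv_text), delimiter=",", dtype=int,
    skiprows=1, usecols=(1, 3), unpack=True,
)
print(likes, comments)
```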

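Modifying values (replacing nan with the column mean)

import numpy as np


def fill_ndarray(a):
    for i in range(a.shape[1]):  # iterate over every column
        temp_col = a[:, i]  # the current column (a view into a)
        nan_num = np.count_nonzero(temp_col != temp_col)  # nan != nan, so this counts the nan values
        if nan_num != 0:  # non-zero means the column contains nan
            temp_not_nan_col = temp_col[temp_col == temp_col]  # the values in this column that are not nan
            # select the nan positions and assign them the mean of the non-nan values in the column
            temp_col[np.isnan(temp_col)] = temp_not_nan_col.mean()

    return a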

if __name__ == '__main__':
    a = np.arange(12).reshape((3, 4)).astype('float')

    a[1, 2:] = np.nan
    print(a)
    '''
    [[ 0.  1.  2.  3.]
     [ 4.  5. nan nan]
     [ 8.  9. 10. 11.]]
    '''
    a = fill_ndarray(a)
    print(a)
    '''
    [[ 0.  1.  2.  3.]
     [ 4.  5.  6.  7.]
     [ 8.  9. 10. 11.]]
    '''
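
The same fill can be sketched without an explicit loop. This is an alternative to `fill_ndarray`, not a replacement for it, and it assumes every column has at least one non-nan value (an all-nan column would give a nan mean and a warning).

```python
import numpy as np

a = np.arange(12).reshape((3, 4)).astype(float)
a[1, 2:] = np.nan

# nanmean ignores nan values; axis=0 gives one mean per column.
col_means = np.nanmean(a, axis=0)

# np.where keeps existing values and takes the column mean where a is nan.
# col_means has shape (4,), so it broadcasts across the rows of a.
filled = np.where(np.isnan(a), col_means, a)
print(filled)
```

Unlike `fill_ndarray`, this version leaves `a` unchanged and returns a new array.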

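Selecting data and plotting

Get the extreme values, then pick a bin width to derive the number of bins, and adjust based on the plotted result. Consider dropping sparse, outlying data and keep plotting the dense region until the data shows a pattern that can be observed.

# plot a histogram (YouTube comment data)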
import numpy as np
from matplotlib import pyplot as plt

us_file_path = 'US_video_data_numbers.csv'
gb_file_path = 'GB_video_data_numbers.csv'

#t1 = np.loadtxt(us_file_path,delimiter=',',dtype='int',unpack=True)
t_us = np.loadtxt(us_file_path,delimiter=',',dtype='int')

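# get the comment counts (last column)
t_us_comment = t_us[:,-1]

# keep only videos with at most 5000 comments so the dense region is visible
t_us_comment = t_us_comment[t_us_comment<=5000]
print(t_us_comment.max(),t_us_comment.min())

d = 250   # bin width
bin_nums = (t_us_comment.max()-t_us_comment.min())//d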

plt.figure(figsize=(20,8),dpi=80)
plt.hist(t_us_comment,bin_nums)
plt.show()
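
The bin arithmetic can be tried without the CSV file. The sketch below uses made-up comment counts and `np.histogram` in place of the plot. Building explicit bin edges with `np.arange` is a suggested alternative, not what the listing above does.

```python
import numpy as np

# Made-up comment counts standing in for the CSV column.
comments = np.array([3, 40, 260, 270, 900, 1200, 1250, 4999, 5000])

d = 250  # bin width
low, high = comments.min(), comments.max()

# Number of bins from the range and the bin width, as in the listing above.
bin_nums = (high - low) // d

# Alternative: explicit edges make every bin exactly d wide, even when the
# range is not a multiple of d. The last edge is at or beyond the maximum.
edges = np.arange(low, high + d, d)
counts, used_edges = np.histogram(comments, bins=edges)
print(bin_nums, len(edges) - 1)
print(counts.sum() == comments.size)
```

When the range is not a multiple of `d`, passing only a bin count makes the actual bin width slightly different from `d`. An edges array like this one can be passed as the `bins` argument of `plt.hist` instead.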

# compare two variables with a scatter plot to look for a pattern in their distribution
import numpy as np
from matplotlib import pyplot as plt

us_file_path = 'US_video_data_numbers.csv'
gb_file_path = 'GB_video_data_numbers.csv'

#t1 = np.loadtxt(us_file_path,delimiter=',',dtype='int',unpack=True)
t_us = np.loadtxt(us_file_path,delimiter=',',dtype='int')

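# restrict the range so the plot is easier to read
# filter here to keep the full rows (including comments) whose like count is at most 500,000
t_us = t_us[t_us[:,1]<=500000]
# get the comment and like columns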
t_us_comment = t_us[:,-1]
t_us_like = t_us[:,1]

plt.scatter(t_us_like,t_us_comment)
plt.show()
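
The row filter used before the scatter plot can be sketched on a small array. The values are made up; the column layout (likes in column 1, comments in the last column) follows the listing above.

```python
import numpy as np

# Made-up rows: column 1 is likes, the last column is comments.
data = np.array([
    [1000,     50,   5],
    [9000,    700,  80],
    [5000, 900000, 300],   # outlier in likes
    [2000,    120,  10],
])

# A boolean mask built from one column selects whole rows,
# so likes and comments stay aligned after filtering.
mask = data[:, 1] <= 500000
filtered = data[mask]
likes, comments = filtered[:, 1], filtered[:, -1]
print(filtered.shape, likes, comments)
```

Filtering the whole array first, rather than each column separately, is what keeps the two plotted columns the same length.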

Concatenating data

Add an identifier to each dataset, then concatenate them

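import numpy as np

# load the raw data
us_data = 'US_video_data_numbers.csv'
uk_data = 'GB_video_data_numbers.csv'

us_data = np.loadtxt(us_data,delimiter=',',dtype=int)
uk_data = np.loadtxt(uk_data,delimiter=',',dtype=int)

# add country information (encoded as 0 and 1)
# build a column of zeros for the US rows and a column of ones for the GB rows
zeros_data = np.zeros((us_data.shape[0], 1)).astype(int)
ones_data = np.ones((uk_data.shape[0], 1)).astype(int)

# append the country code as the last column of each dataset
# using horizontal stacking
us_data = np.hstack((us_data,zeros_data))
uk_data = np.hstack((uk_data,ones_data))

# Tip: the stacking functions take a single argument (a tuple of arrays); mind the parentheses, otherwise an error is raised
# stack the two datasets vertically
final_data = np.vstack((us_data,uk_data))
print(final_data)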
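
The same steps on small made-up arrays, so the result can be inspected without the CSV files. `np.concatenate` with an explicit `axis` is shown as an equivalent of `vstack` for two-dimensional input; the `*_flag` and `*_tagged` names are chosen for this sketch.

```python
import numpy as np

# Made-up stand-ins for the two CSV files (2 rows and 3 rows, 3 columns each).
us_data = np.array([[10, 1, 5], [20, 2, 6]])
uk_data = np.array([[30, 3, 7], [40, 4, 8], [50, 5, 9]])

# One flag column per dataset: 0 for US, 1 for GB.
us_flag = np.zeros((us_data.shape[0], 1), dtype=int)
uk_flag = np.ones((uk_data.shape[0], 1), dtype=int)

# hstack joins along columns (row counts must match),
# vstack joins along rows (column counts must match).
us_tagged = np.hstack((us_data, us_flag))
uk_tagged = np.hstack((uk_data, uk_flag))
final_data = np.vstack((us_tagged, uk_tagged))
print(final_data)

# np.concatenate with an explicit axis does the same for 2-D arrays.
same = np.concatenate((us_tagged, uk_tagged), axis=0)
print(np.array_equal(final_data, same))

# The flag column can later be used to split the data again.
gb_rows = final_data[final_data[:, -1] == 1]
print(gb_rows.shape)
```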
