27 Python Data Science Libraries in Practice (with Code)

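This article runs to about 8,000 characters; the suggested reading time is 15 minutes.

To give readers a first look at the Python libraries commonly used in AI work, and to help them choose which ones to learn, this article gives a brief but broad introduction to the most common ones.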

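1. NumPy

NumPy (Numerical Python) is an extension library for Python that supports large multi-dimensional arrays and matrix operations, and provides a large collection of mathematical functions that operate on those arrays. Its core is written in C, and an array stores its values directly rather than storing pointers to Python objects, so array operations run far faster than equivalent pure-Python code. The example below compares the time taken to compute the sine of every element of a list in pure Python and with NumPy: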

import numpy as np
import math
import random
import time


start = time.time()
for i in range(10):
    list_1 = list(range(1,10000))
    for j in range(len(list_1)):
        list_1[j] = math.sin(list_1[j])
print("Pure Python took {}s".format(time.time()-start))


start = time.time()
for i in range(10):
    list_1 = np.array(np.arange(1,10000))
    list_1 = np.sin(list_1)
print("NumPy took {}s".format(time.time()-start))

The output below shows that the NumPy version runs faster than the pure-Python code:

Pure Python took 0.017444372177124023s

NumPy took 0.001619577407836914s
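
The point about storing values directly rather than pointers can be illustrated without NumPy. The sketch below is only an analogy: it uses the standard library's array module, which also packs raw C doubles into one buffer, and compares that with a plain list of float objects. It does not inspect NumPy's own internals, and the byte counts assume a typical CPython build.

```python
import array
import sys

values = [float(i) for i in range(1000)]

# A list is a container of pointers; every float is a separate Python object.
container_bytes = sys.getsizeof(values)
object_bytes = sum(sys.getsizeof(v) for v in values)

# array('d') packs raw C doubles side by side in one contiguous buffer.
packed = array.array('d', values)
packed_bytes = packed.itemsize * len(packed)

print("list: {} bytes for the container + {} bytes of float objects".format(
    container_bytes, object_bytes))
print("array('d'): {} bytes of raw doubles".format(packed_bytes))
```

Less memory per element and no pointer chasing is a large part of why a loop written in C over such a buffer beats a Python-level loop.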

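2. OpenCV

OpenCV is a cross-platform computer vision library that runs on Linux, Windows, and macOS. It is lightweight and efficient, consisting of a set of C functions and a small number of C++ classes, and it also provides a Python interface. It implements many general-purpose algorithms for image processing and computer vision. The code below applies a few simple filters, including averaging (box) smoothing, Gaussian blur, and a bilateral filter: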

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
img = cv.imread('h89817032p0.png')
kernel = np.ones((5,5),np.float32)/25
dst = cv.filter2D(img,-1,kernel)
blur_1 = cv.GaussianBlur(img,(5,5),0)
blur_2 = cv.bilateralFilter(img,9,75,75)
plt.figure(figsize=(10,10))
plt.subplot(221),plt.imshow(img[:,:,::-1]),plt.title('Original')
plt.xticks([]), plt.yticks([])
plt.subplot(222),plt.imshow(dst[:,:,::-1]),plt.title('Averaging')
plt.xticks([]), plt.yticks([])
plt.subplot(223),plt.imshow(blur_1[:,:,::-1]),plt.title('Gaussian')
plt.xticks([]), plt.yticks([])
plt.subplot(224),plt.imshow(blur_2[:,:,::-1]),plt.title('Bilateral')
plt.xticks([]), plt.yticks([])
plt.show()
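
To make the averaging step concrete, here is a rough pure-Python sketch of what a normalized all-ones kernel does: each output pixel becomes the mean of its neighbourhood. The function name box_filter and the clamped edge handling are choices made for this illustration; OpenCV's own border handling may differ, so edge pixels will not necessarily match cv.filter2D.

```python
def box_filter(image, k=3):
    """Average each pixel with its k x k neighbourhood (edges are clamped)."""
    h, w = len(image), len(image[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    # Clamp coordinates so border pixels reuse the nearest edge value.
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += image[yy][xx]
            out[y][x] = total / (k * k)
    return out


# A single bright pixel in the middle of a dark 5 x 5 image.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 9, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
smoothed = box_filter(image, k=3)
print(smoothed[2][2], smoothed[0][0])
```

The bright pixel is spread evenly over its 3 x 3 neighbourhood, which is the blurring effect seen in the 'Averaging' panel.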


3. Scikit-image

scikit-image is an image processing library built on SciPy that treats images as NumPy arrays. For example, it can change the scale of an image: it provides the rescale, resize, and downscale_local_mean functions for this.

from matplotlib import pyplot as plt
from skimage import data, color, io
from skimage.transform import rescale, resize, downscale_local_mean


image = color.rgb2gray(io.imread('h89817032p0.png'))


image_rescaled = rescale(image, 0.25, anti_aliasing=False)
image_resized = resize(image, (image.shape[0] // 4, image.shape[1] // 4),
                       anti_aliasing=True)
image_downscaled = downscale_local_mean(image, (4, 3))
plt.figure(figsize=(20,20))
plt.subplot(221),plt.imshow(image, cmap='gray'),plt.title('Original')
plt.xticks([]), plt.yticks([])
plt.subplot(222),plt.imshow(image_rescaled, cmap='gray'),plt.title('Rescaled')
plt.xticks([]), plt.yticks([])
plt.subplot(223),plt.imshow(image_resized, cmap='gray'),plt.title('Resized')
plt.xticks([]), plt.yticks([])
plt.subplot(224),plt.imshow(image_downscaled, cmap='gray'),plt.title('Downscaled')
plt.xticks([]), plt.yticks([])
plt.show()


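4. PIL

The Python Imaging Library (PIL) became the de facto standard image processing library for Python because it is powerful while keeping a simple, easy-to-use API. However, PIL only supports Python up to 2.7 and has long gone unmaintained, so a group of volunteers created a compatible fork called Pillow, which supports current Python 3.x and adds many new features. In practice we can skip PIL and install Pillow directly.

5. Pillow

Use Pillow to generate a letter CAPTCHA image: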

from PIL import Image, ImageDraw, ImageFont, ImageFilter


import random


# Random letter:
def rndChar():
    return chr(random.randint(65, 90))


# Random color 1 (background noise):
def rndColor():
    return (random.randint(64, 255), random.randint(64, 255), random.randint(64, 255))


# Random color 2 (text):
def rndColor2():
    return (random.randint(32, 127), random.randint(32, 127), random.randint(32, 127))


# 360 x 360:
width = 60 * 6
height = 60 * 6
image = Image.new('RGB', (width, height), (255, 255, 255))
# Create the Font object:
font = ImageFont.truetype('/usr/share/fonts/wps-office/simhei.ttf', 60)
# Create the Draw object:
draw = ImageDraw.Draw(image)
# Fill every pixel:
for x in range(width):
    for y in range(height):
        draw.point((x, y), fill=rndColor())
# Draw the text:
for t in range(6):
    draw.text((60 * t + 10, 150), rndChar(), font=font, fill=rndColor2())
# Blur:
image = image.filter(ImageFilter.BLUR)
image.save('code.jpg', 'jpeg')
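
One thing the script above does not do is keep the letters it draws, and a CAPTCHA is only useful if the answer can be checked later. The sketch below shows one possible way to generate the text first and verify a submission; make_captcha_text and check_captcha are hypothetical helper names invented for this example, and the case-insensitive comparison is an assumption about the desired behaviour.

```python
import random
import string


def make_captcha_text(length=6, rng=random):
    """Return `length` random uppercase letters, like repeated calls to rndChar()."""
    return ''.join(rng.choice(string.ascii_uppercase) for _ in range(length))


def check_captcha(expected, submitted):
    """Compare case-insensitively, ignoring surrounding whitespace."""
    return expected == submitted.strip().upper()


answer = make_captcha_text()
print(answer, check_captcha(answer, answer.lower()))
```

Each character of answer could then be passed to draw.text in place of a fresh rndChar() call. The random module is not cryptographically secure, so a real deployment would probably want the secrets module instead.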


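6. SimpleCV

SimpleCV is an open-source framework for building computer vision applications. With it you can access high-performance computer vision libraries such as OpenCV without first learning about bit depth, file formats, color spaces, buffer management, eigenvalues, or matrices. Its Python 3 support is very poor, however. Running the following code under Python 3.7: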

from SimpleCV import Image, Color, Display
# load an image from imgur
img = Image('http://i.imgur.com/lfAeZ4n.png')
# use a keypoint detector to find areas of interest
feats = img.findKeypoints()
# draw the list of keypoints
feats.draw(color=Color.RED)
# show the  resulting image. 
img.show()
# apply the stuff we found to the image.
output = img.applyLayers()
# save the results.
output.save('juniperfeats.png')

raises the following error, so SimpleCV is not recommended for Python 3:

SyntaxError: Missing parentheses in call to 'print'. Did you mean print('unit test')?

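7. Mahotas

Mahotas is a fast computer vision library built on NumPy. It currently offers more than 100 image processing and computer vision functions, and the number keeps growing. The code below loads an image with Mahotas and operates on its pixels: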

import numpy as np
import mahotas
import mahotas.demos


from mahotas.thresholding import soft_threshold
from matplotlib import pyplot as plt
from os import path
f = mahotas.demos.load('lena', as_grey=True)
f = f[128:,128:]
plt.gray()
# Show the data:
print("Fraction of zeros in original image: {0}".format(np.mean(f==0)))
plt.imshow(f)
plt.show()


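8. Ilastik

Ilastik gives users machine-learning-based bioimage analysis. It uses machine learning algorithms to segment, classify, track, and count cells or other experimental data. Most operations are interactive and require no machine learning expertise.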

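9. Scikit-Learn

Scikit-learn is a free machine learning library for the Python programming language. It features various classification, regression, and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means, and DBSCAN. The code below clusters sample data with KMeans (and MiniBatchKMeans) in scikit-learn: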

import time


import numpy as np
import matplotlib.pyplot as plt


from sklearn.cluster import MiniBatchKMeans, KMeans
from sklearn.metrics.pairwise import pairwise_distances_argmin
from sklearn.datasets import make_blobs


# Generate sample data
np.random.seed(0)


batch_size = 45
centers = [[1, 1], [-1, -1], [1, -1]]
n_clusters = len(centers)
X, labels_true = make_blobs(n_samples=3000, centers=centers, cluster_std=0.7)


# Compute clustering with KMeans


k_means = KMeans(init='k-means++', n_clusters=3, n_init=10)
t0 = time.time()
k_means.fit(X)
t_batch = time.time() - t0


# Compute clustering with MiniBatchKMeans


mbk = MiniBatchKMeans(init='k-means++', n_clusters=3, batch_size=batch_size,
                      n_init=10, max_no_improvement=10, verbose=0)
t0 = time.time()
mbk.fit(X)
t_mini_batch = time.time() - t0


# Plot result
fig = plt.figure(figsize=(8, 3))
fig.subplots_adjust(left=0.02, right=0.98, bottom=0.05, top=0.9)
colors = ['#4EACC5', '#FF9C34', '#4E9A06']


# We want to have the same colors for the same cluster from the
# MiniBatchKMeans and the KMeans algorithm. Let's pair the cluster centers per
# closest one.
k_means_cluster_centers = k_means.cluster_centers_
order = pairwise_distances_argmin(k_means.cluster_centers_,
                                  mbk.cluster_centers_)
mbk_means_cluster_centers = mbk.cluster_centers_[order]


k_means_labels = pairwise_distances_argmin(X, k_means_cluster_centers)
mbk_means_labels = pairwise_distances_argmin(X, mbk_means_cluster_centers)


# KMeans
for k, col in zip(range(n_clusters), colors):
    my_members = k_means_labels == k
    cluster_center = k_means_cluster_centers[k]
    plt.plot(X[my_members, 0], X[my_members, 1], 'w',
            markerfacecolor=col, marker='.')
    plt.plot(cluster_center[0], cluster_center[1], 'o', markerfacecolor=col,
            markeredgecolor='k', markersize=6)
plt.title('KMeans')
plt.xticks(())
plt.yticks(())


plt.show()
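
For readers curious about what fit() is doing, the following is a minimal pure-Python sketch of the basic k-means loop (Lloyd's algorithm): assign every point to its nearest center, then move each center to the mean of its members. It is not scikit-learn's implementation; the initial centers are passed in by hand rather than chosen with k-means++, and the iteration count is fixed.

```python
def kmeans(points, centers, n_iter=10):
    """Plain Lloyd iterations on 2-D points; returns (centers, labels)."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(n_iter):
        # Assignment step: index of the nearest center for every point.
        labels = [
            min(range(len(centers)),
                key=lambda k: (p[0] - centers[k][0]) ** 2 + (p[1] - centers[k][1]) ** 2)
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append((sum(p[0] for p in members) / len(members),
                                    sum(p[1] for p in members) / len(members)))
            else:
                new_centers.append(centers[k])  # an empty cluster keeps its center
        centers = new_centers
    return centers, labels


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
print(centers, labels)
```

MiniBatchKMeans follows the same idea but updates the centers from small random batches, trading a little accuracy for speed.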


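10. SciPy

SciPy provides many user-friendly and efficient numerical routines, for example for numerical integration, interpolation, optimization, and linear algebra. It also defines many special functions of mathematical physics, including elliptic functions, Bessel functions, gamma functions, beta functions, hypergeometric functions, and parabolic cylinder functions. The example below uses Bessel functions to plot a vibrating drumhead: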

from scipy import special
import matplotlib.pyplot as plt
import numpy as np


def drumhead_height(n, k, distance, angle, t):
    kth_zero = special.jn_zeros(n, k)[-1]
    return np.cos(t) * np.cos(n*angle) * special.jn(n, distance*kth_zero)


theta = np.r_[0:2*np.pi:50j]
radius = np.r_[0:1:50j]
x = np.array([r * np.cos(theta) for r in radius])
y = np.array([r * np.sin(theta) for r in radius])
z = np.array([drumhead_height(1, 1, r, theta, 0.5) for r in radius])




fig = plt.figure()
ax = fig.add_axes(rect=(0, 0.05, 0.95, 0.95), projection='3d')
ax.plot_surface(x, y, z, rstride=1, cstride=1, cmap='RdBu_r', vmin=-0.5, vmax=0.5)
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_xticks(np.arange(-1, 1.1, 0.5))
ax.set_yticks(np.arange(-1, 1.1, 0.5))
ax.set_zlabel('Z')
plt.show()


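11. NLTK

NLTK is a library for building Python programs that work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, and wrappers for industrial-strength natural language processing (NLP) libraries. NLTK has been called "a wonderful tool for teaching, and working in, computational linguistics using Python".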

import nltk
from nltk.corpus import treebank


# These resources must be downloaded on first use
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')
nltk.download('treebank')


sentence = """At eight o'clock on Thursday morning Arthur didn't feel very good."""
# Tokenize
tokens = nltk.word_tokenize(sentence)
tagged = nltk.pos_tag(tokens)


# Identify named entities
entities = nltk.chunk.ne_chunk(tagged)


# Display a parse tree
t = treebank.parsed_sents('wsj_0001.mrg')[0]
t.draw()


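12. spaCy

spaCy is a free, open-source library for advanced NLP in Python. It can be used to build applications that process large volumes of text, to build information extraction or natural language understanding systems, or to pre-process text for deep learning.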

import spacy


texts = [
    "Net income was $9.4 million compared to the prior year of $2.7 million.",
    "Revenue exceeded twelve billion dollars, with a loss of $1b.",
]


nlp = spacy.load("en_core_web_sm")
for doc in nlp.pipe(texts, disable=["tok2vec", "tagger", "parser", "attribute_ruler", "lemmatizer"]):
    # Do something with the doc here
    print([(ent.text, ent.label_) for ent in doc.ents])

nlp.pipe yields Doc objects, so we can iterate over them and access the named entity predictions:

[('$9.4 million', 'MONEY'), ('the prior year', 'DATE'), ('$2.7 million', 'MONEY')]
[('twelve billion dollars', 'MONEY'), ('1b', 'MONEY')]

13. LibROSA

librosa is a Python library for music and audio analysis. It provides the building blocks needed to create music information retrieval systems.

# Beat tracking example
import librosa


# 1. Get the file path to an included audio example
filename = librosa.example('nutcracker')


# 2. Load the audio as a waveform `y`
#    Store the sampling rate as `sr`
y, sr = librosa.load(filename)


# 3. Run the default beat tracker
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
print('Estimated tempo: {:.2f} beats per minute'.format(tempo))


# 4. Convert the frame indices of beat events into timestamps
beat_times = librosa.frames_to_time(beat_frames, sr=sr)
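
The last step is simple arithmetic: a frame index is multiplied by the hop length and divided by the sampling rate. The sketch below reproduces that calculation in plain Python, assuming librosa's usual defaults of a 22050 Hz sampling rate and a hop length of 512 samples; frames_to_seconds is a name made up for this example, not a librosa function.

```python
def frames_to_seconds(frames, sr=22050, hop_length=512):
    """Convert frame indices to seconds: frame * hop_length / sr."""
    return [f * hop_length / sr for f in frames]


# With these defaults, roughly 43 frames correspond to one second of audio.
print(frames_to_seconds([0, 43, 86]))
```

If the audio is loaded at a different sampling rate, or the beat tracker runs with another hop length, the same values have to be used in the conversion.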

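14. Pandas

Pandas is a fast, powerful, flexible, and easy-to-use open-source tool for data analysis and manipulation. It can import data from many formats, such as CSV, JSON, SQL, and Microsoft Excel, and supports operations such as merging, reshaping, and selecting, as well as data cleaning and data wrangling. Pandas is widely used for data analysis in academia, finance, statistics, and other fields.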

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np


ts = pd.Series(np.random.randn(1000), index=pd.date_range("1/1/2000", periods=1000))
ts = ts.cumsum()


df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index, columns=list("ABCD"))
df = df.cumsum()
df.plot()
plt.show()
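
The cumsum() calls are what turn independent random noise into the wandering curves in the plot: each value becomes the running total of everything before it. As a rough standard-library illustration of the same idea (not how pandas implements it), itertools.accumulate produces the same running sums:

```python
import itertools
import random

random.seed(0)  # fixed seed so the run is repeatable

# 1000 independent steps drawn from a standard normal distribution.
steps = [random.gauss(0.0, 1.0) for _ in range(1000)]

# walk[i] is the sum of steps[0..i], which is what cumsum computes.
walk = list(itertools.accumulate(steps))

print(len(walk), walk[0] == steps[0])
```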


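15. Matplotlib

Matplotlib is a plotting library for Python. It provides a set of command-style APIs similar to MATLAB's and can produce publication-quality figures. Matplotlib makes plotting simple and strikes a good balance between ease of use and performance. The code below plots several curves on one figure: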

# plot_multi_curve.py
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0.1, 2 * np.pi, 100)
y_1 = x
y_2 = np.square(x)
y_3 = np.log(x)
y_4 = np.sin(x)
plt.plot(x,y_1)
plt.plot(x,y_2)
plt.plot(x,y_3)
plt.plot(x,y_4)
plt.show()


16. Seaborn

Seaborn is a Python data visualization library that wraps Matplotlib in a higher-level API, making plotting easier. Seaborn should be seen as a complement to Matplotlib, not a replacement.

import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style="ticks")


df = sns.load_dataset("penguins")
sns.pairplot(df, hue="species")
plt.show()


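17. Orange

Orange is open-source data mining and machine learning software that provides components for data exploration, visualization, preprocessing, and modeling. It has a clean, intuitive interactive user interface that suits beginners doing exploratory data analysis and visualization, while advanced users can use it as a Python module for data manipulation and widget development. Conveniently, Orange can be installed with pip: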

$ pip install orange3

Once it is installed, run the orange-canvas command to start the Orange GUI:

$ orange-canvas

After startup, the Orange GUI appears and is ready to use.


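18. PyBrain

PyBrain is a modular machine learning library for Python. Its goal is to offer flexible, easy-to-use, yet powerful algorithms for machine learning tasks, along with a variety of predefined environments for testing and comparing algorithms. PyBrain is short for Python-Based Reinforcement Learning, Artificial Intelligence and Neural Network Library. We will use a simple example to show how PyBrain is used by building a multilayer perceptron (MLP). First, create a new feed-forward network object: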

from pybrain.structure import FeedForwardNetwork
n = FeedForwardNetwork()

Next, construct the input, hidden, and output layers:

from pybrain.structure import LinearLayer, SigmoidLayer


inLayer = LinearLayer(2)
hiddenLayer = SigmoidLayer(3)
outLayer = LinearLayer(1)

To use the layers, add them to the network:

n.addInputModule(inLayer)
n.addModule(hiddenLayer)
n.addOutputModule(outLayer)

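Multiple input and output modules can be added. For forward computation and error backpropagation, the network must know which layers are inputs and which are outputs. We also have to state explicitly how the layers are connected. For this we use the most common connection type, the full connection, implemented by the FullConnection class: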

from pybrain.structure import FullConnection
in_to_hidden = FullConnection(inLayer, hiddenLayer)
hidden_to_out = FullConnection(hiddenLayer, outLayer)

As with the layers, the connections must be added to the network explicitly:

n.addConnection(in_to_hidden)
n.addConnection(hidden_to_out)

All elements are now in place. Finally, call the .sortModules() method to make the MLP usable:

n.sortModules()

This call performs some internal initialization that is required before the network can be used.
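
To see what this 2-3-1 network computes once it is assembled, here is a rough pure-Python sketch of its forward pass: two linear inputs, three sigmoid hidden units, and one linear output, with no bias terms because none were added above. The weights are random placeholders and the function names are invented for this illustration; none of this is PyBrain's own API.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def forward(x, w_in_hidden, w_hidden_out):
    """2 linear inputs -> 3 sigmoid hidden units -> 1 linear output."""
    # Full connection: every hidden unit sees a weighted sum of all inputs.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_in_hidden]
    # Linear output layer: weighted sum of the hidden activations.
    return sum(w * h for w, h in zip(w_hidden_out, hidden))


random.seed(0)
w_in_hidden = [[random.gauss(0, 1) for _ in range(2)] for _ in range(3)]  # 3 x 2
w_hidden_out = [random.gauss(0, 1) for _ in range(3)]                     # 1 x 3

print(forward([1.0, 2.0], w_in_hidden, w_hidden_out))
```

In PyBrain itself, the corresponding step would be calling the network's activate method with a two-element input, for example n.activate([1, 2]).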

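19. Milk

MILK (Machine Learning Toolkit) is a machine learning toolkit for Python. It focuses on supervised classification, with several classifiers available such as SVMs, k-NN, random forests, and decision trees, and it also performs feature selection. These classifiers can be combined in many ways to form different classification systems. For unsupervised learning, MILK supports k-means clustering and affinity propagation. The code below trains a classifier with MILK: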

import numpy as np
import milk
features = np.random.rand(100,10)
labels = np.zeros(100)
features[50:] += .5
labels[50:] = 1
learner = milk.defaultclassifier()
model = learner.train(features, labels)


# Now you can use the model on new examples:
example = np.random.rand(10)
print(model.apply(example))
example2 = np.random.rand(10)
example2 += .5
print(model.apply(example2))

20. TensorFlow

TensorFlow is an end-to-end open-source machine learning platform with a comprehensive, flexible ecosystem. It is usually divided into TensorFlow 1.x and TensorFlow 2.x; the main difference is that TF 1.x uses static graphs while TF 2.x defaults to eager execution (dynamic graphs). The example here uses TensorFlow 2.x to build a convolutional neural network (CNN).

import tensorflow as tf


from tensorflow.keras import datasets, layers, models


# Load the data
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()


# Preprocess the data
train_images, test_images = train_images / 255.0, test_images / 255.0


# Build the model
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10))


# Compile and train the model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
history = model.fit(train_images, train_labels, epochs=10, 
                    validation_data=(test_images, test_labels))
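
The final Dense(10) layer has no activation, so the model outputs raw scores (logits), and from_logits=True tells the loss to apply softmax itself. As a hedged illustration of the quantity being minimized for a single sample, the pure-Python sketch below computes -log(softmax(logits)[label]) using the log-sum-exp trick; it is a simplified stand-in written for this article, not TensorFlow's implementation.

```python
import math


def sparse_cross_entropy_from_logits(logits, label):
    """-log(softmax(logits)[label]), computed with the log-sum-exp trick."""
    m = max(logits)  # subtracting the max keeps exp() from overflowing
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


# The loss is small when the correct class has the largest logit.
print(sparse_cross_entropy_from_logits([2.0, 1.0, 0.1], label=0))
print(sparse_cross_entropy_from_logits([2.0, 1.0, 0.1], label=2))
```

"Sparse" here means the label is an integer class index, as in the CIFAR-10 labels above, rather than a one-hot vector.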

21. PyTorch

PyTorch grew out of Torch and shares its underlying framework, but much of it was rewritten in Python. It is more flexible, supports dynamic graphs, and provides a Python interface.

# Import libraries
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda, Compose
import matplotlib.pyplot as plt


# Build the model
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using {} device".format(device))


# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)  # raw logits; CrossEntropyLoss applies log-softmax itself
        )


    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits


model = NeuralNetwork().to(device)


# Loss function and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)


# Train the model
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)


        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)


        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()


        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

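22. Theano

Theano is a Python library, built on top of NumPy, that lets you define, optimize, and efficiently evaluate mathematical expressions involving multi-dimensional arrays. The code below computes a Jacobian matrix in Theano: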

import theano
import theano.tensor as T
x = T.dvector('x')
y = x ** 2
J, updates = theano.scan(lambda i, y,x : T.grad(y[i], x), sequences=T.arange(y.shape[0]), non_sequences=[y,x])
f = theano.function([x], J, updates=updates)
f([4, 4])
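
For y = x ** 2 applied element-wise, the Jacobian is diagonal with entries 2 * x, so at [4, 4] the expected result is a 2 x 2 matrix with 8 on the diagonal. One way to sanity-check a symbolic result like this without Theano is a finite-difference approximation. The sketch below is only such a numerical check; jacobian_fd and square are names invented for this example.

```python
def jacobian_fd(f, x, eps=1e-6):
    """Central-difference Jacobian: J[i][j] approximates d f_i / d x_j."""
    n_out = len(f(x))
    J = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        up = list(x)
        up[j] += eps
        down = list(x)
        down[j] -= eps
        f_up, f_down = f(up), f(down)
        for i in range(n_out):
            J[i][j] = (f_up[i] - f_down[i]) / (2 * eps)
    return J


def square(x):
    return [v ** 2 for v in x]


J = jacobian_fd(square, [4.0, 4.0])
print(J)  # close to [[8, 0], [0, 8]], up to floating-point error
```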

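23. Keras

Keras is a high-level neural network API written in Python that can run on top of TensorFlow, CNTK, or Theano. Keras was developed with a focus on enabling fast experimentation, going from idea to result with the least possible delay.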

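import numpy as np
from keras.models import Sequential
from keras.layers import Dense


# Placeholder data so the example runs (random values, for illustration only):
# 1000 samples with 100 features each, and one-hot labels over 10 classes
x_train = np.random.random((1000, 100))
y_train = np.eye(10)[np.random.randint(10, size=1000)]


# Build the model
model = Sequential()
model.add(Dense(units=64, activation='relu', input_dim=100))
model.add(Dense(units=10, activation='softmax'))


# Compile and train the model
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5, batch_size=32)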

24. Caffe

The official Caffe2 website states that Caffe2 is now part of PyTorch. While the existing APIs will continue to work, users are encouraged to use the PyTorch APIs.

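25. MXNet

MXNet is a deep learning framework designed for both efficiency and flexibility. It lets you mix symbolic and imperative programming to maximize efficiency and productivity. The code below builds a handwritten digit recognition model with MXNet: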

import mxnet as mx
from mxnet import gluon
from mxnet.gluon import nn
from mxnet import autograd as ag
import mxnet.ndarray as F


# Load the data
mnist = mx.test_utils.get_mnist()
batch_size = 100
train_data = mx.io.NDArrayIter(mnist['train_data'], mnist['train_label'], batch_size, shuffle=True)
val_data = mx.io.NDArrayIter(mnist['test_data'], mnist['test_label'], batch_size)


# CNN model
class Net(gluon.Block):
    def __init__(self, **kwargs):
        super(Net, self).__init__(**kwargs)
        self.conv1 = nn.Conv2D(20, kernel_size=(5,5))
        self.pool1 = nn.MaxPool2D(pool_size=(2,2), strides = (2,2))
        self.conv2 = nn.Conv2D(50, kernel_size=(5,5))
        self.pool2 = nn.MaxPool2D(pool_size=(2,2), strides = (2,2))
        self.fc1 = nn.Dense(500)
        self.fc2 = nn.Dense(10)


    def forward(self, x):
        x = self.pool1(F.tanh(self.conv1(x)))
        x = self.pool2(F.tanh(self.conv2(x)))
        # 0 means copy over size from corresponding dimension.
        # -1 means infer size from the rest of dimensions.
        x = x.reshape((0, -1))
        x = F.tanh(self.fc1(x))
        x = F.tanh(self.fc2(x))
        return x
net = Net()
# Initialization and optimizer definition
# Use the GPU if one is available, otherwise the CPU
ctx = [mx.gpu() if mx.test_utils.list_gpus() else mx.cpu()]
net.initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.03})


# Train the model
# Use Accuracy as the evaluation metric.
metric = mx.metric.Accuracy()
softmax_cross_entropy_loss = gluon.loss.SoftmaxCrossEntropyLoss()


epoch = 10  # number of training epochs (assumed value; the original never defines it)
for i in range(epoch):
    # Reset the train data iterator.
    train_data.reset()
    for batch in train_data:
        data = gluon.utils.split_and_load(batch.data[0], ctx_list=ctx, batch_axis=0)
        label = gluon.utils.split_and_load(batch.label[0], ctx_list=ctx, batch_axis=0)
        outputs = []
        # Inside training scope
        with ag.record():
            for x, y in zip(data, label):
                z = net(x)
                # Computes softmax cross entropy loss.
                loss = softmax_cross_entropy_loss(z, y)
                # Backpropogate the error for one iteration.
                loss.backward()
                outputs.append(z)
        metric.update(label, outputs)
        trainer.step(batch.data[0].shape[0])
    # Gets the evaluation result.
    name, acc = metric.get()
    # Reset evaluation result to initial state.
    metric.reset()
    print('training acc at epoch %d: %s=%f'%(i, name, acc))

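26. PaddlePaddle

PaddlePaddle builds on Baidu's years of deep learning research and production use. It combines a core training and inference framework, a library of base models, end-to-end development kits, and a rich set of tool components, and it is China's first independently developed, full-featured, open-source, industrial-grade deep learning platform. The code below implements LeNet-5 with PaddlePaddle: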

# Import the required packages
import paddle
import numpy as np
from paddle.nn import Conv2D, MaxPool2D, Linear


## Network definition
import paddle.nn.functional as F


# Define the LeNet network structure
class LeNet(paddle.nn.Layer):
    def __init__(self, num_classes=1):
        super(LeNet, self).__init__()
        # Create the convolution and pooling layers
        # First convolution layer
        self.conv1 = Conv2D(in_channels=1, out_channels=6, kernel_size=5)
        self.max_pool1 = MaxPool2D(kernel_size=2, stride=2)
        # Shape logic: pooling does not change the channel count, which is now 6
        # Second convolution layer
        self.conv2 = Conv2D(in_channels=6, out_channels=16, kernel_size=5)
        self.max_pool2 = MaxPool2D(kernel_size=2, stride=2)
        # Third convolution layer
        self.conv3 = Conv2D(in_channels=16, out_channels=120, kernel_size=4)
        # Shape logic: the data is flattened before the fully connected layers, [B,C,H,W] -> [B,C*H*W]
        # With a [28,28] input, after three convolutions and two poolings C*H*W equals 120
        self.fc1 = Linear(in_features=120, out_features=64)
        # Fully connected layers: the first outputs 64 units, the second outputs one unit per class label
        self.fc2 = Linear(in_features=64, out_features=num_classes)
    # Forward pass of the network
    def forward(self, x):
        x = self.conv1(x)
        # The first two convolution layers each use a sigmoid activation followed by 2x2 pooling
        x = F.sigmoid(x)
        x = self.max_pool1(x)
        x = self.conv2(x)
        x = F.sigmoid(x)
        x = self.max_pool2(x)
        x = self.conv3(x)
        # Shape logic: flatten [B,C,H,W] -> [B,C*H*W]
        x = paddle.reshape(x, [x.shape[0], -1])
        x = self.fc1(x)
        x = F.sigmoid(x)
        x = self.fc2(x)
        return x
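
The comment in the code claims that a 28 x 28 input leaves exactly 120 values after three convolutions and two poolings. That can be checked by hand with the usual output-size formula for convolution and pooling layers. The short sketch below does the arithmetic in plain Python, assuming no padding and a stride of 1 for the convolutions (the defaults used in the layer definitions above); conv_out is a helper name made up for this example.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


size = 28
size = conv_out(size, kernel=5)            # conv1     -> 24
size = conv_out(size, kernel=2, stride=2)  # max_pool1 -> 12
size = conv_out(size, kernel=5)            # conv2     -> 8
size = conv_out(size, kernel=2, stride=2)  # max_pool2 -> 4
size = conv_out(size, kernel=4)            # conv3     -> 1

flattened = 120 * size * size  # channels * height * width
print(size, flattened)  # prints "1 120"
```

This is why fc1 is declared with in_features=120. A different input size would change the flattened length, and fc1 would need to be adjusted to match.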

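27. CNTK

CNTK (Cognitive Toolkit) is a deep learning toolkit that describes a neural network as a series of computational steps via a directed graph. In this graph, leaf nodes represent input values or network parameters, while the other nodes represent matrix operations on their inputs. CNTK makes it easy to implement and combine popular model types such as CNNs. CNTK describes a neural network with the network description language (NDL). Put simply, you describe the input features, the input labels, some parameters, the computations relating the parameters to the inputs, and which nodes are the targets.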

NDLNetworkBuilder=[
    
    run=ndlLR
    
    ndlLR=[
      # sample and label dimensions
      SDim=$dimension$
      LDim=1
    
      features=Input(SDim, 1)
      labels=Input(LDim, 1)
    
      # parameters to learn
      B0 = Parameter(4) 
      W0 = Parameter(4, SDim)
      
      
      B = Parameter(LDim)
      W = Parameter(LDim, 4)
    
      # operations
      t0 = Times(W0, features)
      z0 = Plus(t0, B0)
      s0 = Sigmoid(z0)   
      
      t = Times(W, s0)
      z = Plus(t, B)
      s = Sigmoid(z)    
    
      LR = Logistic(labels, s)
      EP = SquareError(labels, s)
    
      # root nodes
      FeatureNodes=(features)
      LabelNodes=(labels)
      CriteriaNodes=(LR)
      EvalNodes=(EP)
      OutputNodes=(s,t,z,s0,W0)
    ]
]
