
My TensorFlow implementation of "A Neural Conversational Model"

This project attempts to reproduce the results of the paper A Neural Conversational Model (a.k.a. the Google chatbot).

It uses a recurrent neural network (the seq2seq model) for sentence prediction, and is developed in Python with TensorFlow.

The main loading part of the program is based on the Torch project neuralconvo from macournoyer.

DeepQA currently supports the following dialog corpora:

Cornell Movie Dialogs corpus (default). Already included when cloning the repository.

OpenSubtitles (thanks to Eschnou). Much bigger corpus (but also noisier). To use it, follow those instructions and use the flag --corpus opensubs.

Supreme Court Conversation Data (thanks to julien-c). Available using --corpus scotus. See the instructions for installation.

Ubuntu Dialogue Corpus (thanks to julien-c). Available using --corpus ubuntu. See the instructions for installation.

Your own data (thanks to julien-c), by using a simple custom conversation format (see here for more info).

To speed up the training, it's also possible to use pre-trained word embeddings (thanks to Eschnou). More info here.

The program requires the following dependencies (easy to install using pip: pip3 install -r requirements.txt):

python 3.5

tensorflow (tested with v1.0)

numpy

CUDA (for using GPU)

nltk (natural language toolkit, for tokenizing the sentences)

tqdm (for the nice progression bars)

You may also need to download additional data for nltk to work properly.
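For example, using the standard nltk downloader (the `punkt` package name is an assumption; the exact data packages DeepQA needs may differ):

```shell
# Download nltk tokenizer data (the 'punkt' package name is assumed here).
python3 -m nltk.downloader punkt
```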

The Cornell dataset is already included. For the other datasets, see the readme files in their respective folders (inside data/).

The web interface requires some additional packages:

django (tested with 1.10)

channels

Redis (see here)

asgi_redis (at least 1.0)
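A rough sketch of what bringing the interface up typically looks like for a Django + channels + Redis stack (these are generic steps, not verified against DeepQA's actual layout; consult its readme):

```shell
# Assumed steps for a standard Django/channels project (not DeepQA-specific).
redis-server &               # channels uses Redis as its message backend
python3 manage.py migrate    # standard Django database setup
python3 manage.py runserver  # serve the chat interface locally
```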

A Docker installation is also supported; see here for a more detailed tutorial.

To train the model, simply run main.py. Once trained, you can test the results with main.py --test (the results are generated in 'save/model/samples_predictions.txt') or with main.py --test interactive (more fun).

Here are some flags which could be useful. For more help and options, use python main.py -h:

--modelTag: allows you to give a name to the current model to differentiate between models when testing/training.

--keepAll: use this flag when training if, when testing, you want to see the predictions at different steps (it can be interesting to see how the program changes its name and age as the training progresses). Warning: it can quickly take a lot of storage space if you don't increase the --saveEvery option.

--filterVocab 20 or --vocabularySize 30000: Limit the vocabulary size to optimize performance and memory usage. Replace the words used less than 20 times by the `<unknown>` token and set a maximum vocabulary size.

--verbose: when testing, will print the sentences as they are computed.

--playDataset: show some dialogue samples from the dataset (can be used in conjunction with --createDataset if this is the only action you want to perform).
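The effect of the --filterVocab/--vocabularySize flags above can be illustrated with a small stand-alone sketch (this is not DeepQA's actual code; the counting logic and `<unknown>` handling here are illustrative):

```python
from collections import Counter

def filter_vocab(tokens, min_count=20, max_size=30000, unk="<unknown>"):
    """Keep the max_size most frequent words seen at least min_count times;
    map everything else to the unk token (mirrors --filterVocab/--vocabularySize)."""
    counts = Counter(tokens)
    kept = [w for w, c in counts.most_common(max_size) if c >= min_count]
    vocab = set(kept)
    return [w if w in vocab else unk for w in tokens]

# Tiny demo with min_count=2: "hi" appears twice and survives, "rare" does not.
print(filter_vocab(["hi", "rare", "hi"], min_count=2, max_size=10))
# → ['hi', '<unknown>', 'hi']
```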

To visualize the computational graph and the cost with TensorBoard, just run tensorboard --logdir save/.

By default, the network architecture is a standard encoder/decoder with two LSTM layers (hidden size 256) and a vocabulary embedding size of 32. The network is trained with ADAM. The maximum sentence length is set to 10 words, but this can be increased.
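The defaults described above can be summarized as a small config sketch (the key names here are illustrative, not DeepQA's actual flag names; the values come from the paragraph above):

```python
# Default architecture described above (key names are illustrative).
default_hparams = {
    "cell": "LSTM",             # standard encoder/decoder cells
    "num_layers": 2,            # two LSTM layers
    "hidden_size": 256,         # hidden layer size
    "embedding_size": 32,       # vocabulary embedding size
    "optimizer": "ADAM",
    "max_sentence_length": 10,  # in words; can be increased
}
print(default_hparams["hidden_size"])
# → 256
```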

Of course, the network won't be very good at chatting:

Here are some cases where it cannot answer correctly:

[Image: chatbot_miniature.png]

[Image: Screenshot from 2017-09-05 14-47-52.png]

1. Download the project:

https://github.com/Conchylicultor/DeepQA

2. Download the trained model:

https://drive.google.com/file/d/0Bw-phsNSkq23OXRFTkNqN0JGUU0/view

(If the link won't open, I will upload the model to Baidu Netdisk tonight and share it at: http://www.tensorflownews.com/)

3. After unpacking, place it under the project's save directory, as shown:

[Image: Screenshot from 2017-09-05 14-52-13.png]

4. Copy the file save/model-pretrainedv2/dataset-cornell-old-lenght10-filter0-vocabSize0.pkl to data/samples/, as shown:

[Image: Screenshot from 2017-09-05 14-55-00.png]

5. Run the following command in the project directory:
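The command itself is missing from the original post; based on the --modelTag and --test interactive flags described above and the model-pretrainedv2 directory name from step 4, it is presumably something like:

```shell
# Assumed command (reconstructed from the flags and directory name above).
python3 main.py --modelTag pretrainedv2 --test interactive
```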

After the program has loaded the pre-trained model, it looks like this:

[Image: Screenshot from 2017-09-05 14-57-14.png]

Author of this article: AI研習社