
Whisper is said to have reached human-level speech recognition — is it really that good?

By 拉風小霸王

Hi, long time no see — I haven't written anything in a while, so today I'll give you a quick introduction to the speech recognition model Whisper.

Whisper is an open-source speech recognition and translation model that OpenAI released in September. Its English recognition is already robust and highly accurate, it supports 99 languages, and it is quick to install and use. Below I'll walk through installing Whisper and some basic usage. I ran into a few problems along the way, so I'll include the fixes as well — I hope they're useful to you.

  • Environment:

Windows, Python 3.8

  • Installation:

1. Installing the whisper library

pip install git+https://github.com/openai/whisper.git

Once the install finishes, running whisper in a cmd window should print its usage help, which confirms the installation succeeded.


2. Installing ffmpeg

Whisper uses the ffmpeg tool to extract audio data, so you need to download and install ffmpeg from its download page.

On the download page, click through the links to grab a Windows build of ffmpeg.


Extract the downloaded archive anywhere on your machine and add its bin directory to your PATH environment variable — in my case, C:\Users\heyj01\Desktop\ffmpeg-master-latest-win64-gpl-shared\bin. Typing ffmpeg in a cmd window should then print version information, confirming it is set up correctly.


3. Other Python dependencies

Whisper also depends on PyTorch, transformers, and a few other libraries. If any are missing when you first run Whisper, just install each one as prompted with pip install <module name>.

  • Usage:

Whisper is very easy to use:

# import the whisper module
import whisper
# load the large model
model = whisper.load_model("large")
# transcribe the file, telling Whisper the speech is Chinese
result = model.transcribe("test.mp4", language='Chinese')
# Whisper works on 30-second windows; result["segments"] is a list of
# timestamped segments, one dict per recognized phrase
for i in result["segments"]:
    print(i['text'])
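Each segment dict also carries start and end times, so the loop above can easily be turned into subtitles. Here is a minimal sketch (the srt_timestamp and segments_to_srt helpers are my own, not part of Whisper), assuming segment dicts with the start, end, and text keys Whisper returns:

```python
def srt_timestamp(seconds):
    # SRT timestamps look like HH:MM:SS,mmm
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    # number each segment and format it as one SRT cue
    cues = []
    for n, seg in enumerate(segments, start=1):
        cues.append(f"{n}\n"
                    f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
                    f"{seg['text'].strip()}\n")
    return "\n".join(cues)
```

Writing the returned string to a .srt file gives you subtitles most players can load directly.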

Whisper ships in several model sizes, each with different memory requirements, speed, and quality. Smaller models transcribe faster, but do noticeably worse on languages other than the one they handle best; larger models are slower and need more memory, but give better results.

The sizes, as listed in the Whisper README:

tiny    —   39 M parameters,  ~1 GB VRAM, ~32x relative speed
base    —   74 M parameters,  ~1 GB VRAM, ~16x
small   —  244 M parameters,  ~2 GB VRAM, ~6x
medium  —  769 M parameters,  ~5 GB VRAM, ~2x
large   — 1550 M parameters, ~10 GB VRAM, 1x

The load_model function takes two more parameters: device and download_root.

device selects the compute engine, either 'cpu' or 'cuda' (i.e. GPU); if omitted, Whisper picks CUDA when a GPU is available and falls back to CPU otherwise. If you use the GPU, its VRAM must be large enough for the model size you chose, or you will get an out-of-memory error.

download_root is the path where models are saved and loaded from; by default it is a directory under your user folder — in my case, C:\Users\heyj01\.cache\whisper. The first time you load a model that isn't already there, it is downloaded into download_root.
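Putting that together, here is a small sketch of picking a device before loading; the fallback logic is mine, though whisper.load_model does something similar internally when device is omitted:

```python
# Pick "cuda" when a usable GPU is present, otherwise fall back to "cpu".
try:
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:
    # PyTorch is not installed in this environment
    device = "cpu"

# model = whisper.load_model("large", device=device,
#                            download_root=r"C:\models\whisper")  # hypothetical path
```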

The language parameter of the transcribe function currently supports 99 languages, listed below:

"en": "english","zh": "chinese",
"de": "german","es": "spanish",
"ru": "russian","ko": "korean",
"fr": "french","ja": "japanese",
"pt": "portuguese","tr": "turkish",
"pl": "polish","ca": "catalan",
"nl": "dutch","ar": "arabic",
"sv": "swedish","it": "italian",
"id": "indonesian","hi": "hindi",
"fi": "finnish","vi": "vietnamese",
"he": "hebrew","uk": "ukrainian",
"el": "greek","ms": "malay",
"cs": "czech","ro": "romanian",
"da": "danish","hu": "hungarian",
"ta": "tamil","no": "norwegian",
"th": "thai","ur": "urdu",
"hr": "croatian","bg": "bulgarian",
"lt": "lithuanian","la": "latin",
"mi": "maori","ml": "malayalam",
"cy": "welsh","sk": "slovak",
"te": "telugu","fa": "persian",
"lv": "latvian","bn": "bengali",
"sr": "serbian","az": "azerbaijani",
"sl": "slovenian","kn": "kannada",
"et": "estonian","mk": "macedonian",
"br": "breton","eu": "basque",
"is": "icelandic","hy": "armenian",
"ne": "nepali","mn": "mongolian",
"bs": "bosnian","kk": "kazakh",
"sq": "albanian","sw": "swahili",
"gl": "galician","mr": "marathi",
"pa": "punjabi","si": "sinhala",
"km": "khmer","sn": "shona",
"yo": "yoruba","so": "somali",
"af": "afrikaans","oc": "occitan",
"ka": "georgian","be": "belarusian",
"tg": "tajik","sd": "sindhi",
"gu": "gujarati","am": "amharic",
"yi": "yiddish","lo": "lao",
"uz": "uzbek","fo": "faroese",
"ht": "haitian creole","ps": "pashto",
"tk": "turkmen","nn": "nynorsk",
"mt": "maltese","sa": "sanskrit",
"lb": "luxembourgish","my": "myanmar",
"bo": "tibetan","tl": "tagalog",
"mg": "malagasy","as": "assamese",
"tt": "tatar","haw": "hawaiian",
"ln": "lingala","ha": "hausa",
"ba": "bashkir","jw": "javanese","su": "sundanese",           

The official README also shows a lower-level way to call the model:

import whisper
model = whisper.load_model("base")
# load audio and pad/trim it to fit 30 seconds
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)
# make log-Mel spectrogram and move to the same device as the model
mel = whisper.log_mel_spectrogram(audio).to(model.device)
# detect the spoken language
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")
# decode the audio
options = whisper.DecodingOptions(language='Chinese')
result = whisper.decode(model, mel, options)
# print the recognized text
print(result.text)           
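The pad_or_trim step forces every input to exactly 30 seconds, the window the model was trained on. A rough pure-Python sketch of what it does (the real whisper.pad_or_trim operates on NumPy arrays or tensors):

```python
SAMPLE_RATE = 16000           # Whisper resamples all audio to 16 kHz
N_SAMPLES = SAMPLE_RATE * 30  # a 30-second window

def pad_or_trim(audio, length=N_SAMPLES):
    # cut anything beyond the window...
    if len(audio) >= length:
        return list(audio[:length])
    # ...or right-pad with silence (zeros) up to it
    return list(audio) + [0.0] * (length - len(audio))
```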

This method raised an error on my machine. Because my computer has no GPU, this line

options = whisper.DecodingOptions(language='zh')

has to become options = whisper.DecodingOptions(language='zh', fp16=False), because CPUs don't support fp16.
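If you want the same script to run on both CPU and GPU, you can derive the flag from the device instead of hard-coding it — a tiny sketch (the decoding_kwargs helper is mine, not part of Whisper):

```python
def decoding_kwargs(device, language="zh"):
    # fp16 inference only works on CUDA; on CPU it must be disabled
    return {"language": language, "fp16": device == "cuda"}

# options = whisper.DecodingOptions(**decoding_kwargs("cpu"))
```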

  • Summary

In my tests, Whisper's English recognition is excellent. Smaller languages need the larger models before recognition and translation get good, but it still beats many other recognition and translation models — and it's open source. I believe Whisper will keep getting better. Finally, here is Whisper's GitHub address:

https://github.com/openai/whisper

That's it for installing and using Whisper. I hope you can use this open-source model to build some fun tools. In my next article I'll use Whisper + PyQt5 to build a tool that recognizes speech to generate subtitles, automatically adds subtitles to videos, and produces live captions from a microphone — stay tuned if you're interested.
