Implementing Speech To Text with the Google Speech API

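Long ago, a free, highly accurate, and stable Speech To Text API circulated online: the Google Speech API. Recently, however, requests to it have been returning 500 errors. Reading the source code showed that an extra parameter, key=..., is now required, probably to prevent abuse. In addition, Chrome recently released a separate real-time recognition interface based on long-lived connections, which is great news for developers. This post describes how to use both interfaces.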

  • Blog: http://www.cnblogs.com/jhzhu
  • Email: [email protected]
  • Author: 知明是以
  • Date: 2014-03-28

Keywords

SpeechToText,API,google,STT,ASR,SR,speech,recognition

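Applying for Chromium API keys

The Google Speech API used in this post serves Google's own browser, Chrome. You can try this demo to see how it performs in practice: Google Speech To Text Demo.

Chrome is derived from the open-source project Chromium. To make debugging easier for developers, Google opened up this STT (Speech to Text) interface. However, because the interface is intended for debugging only, it is limited in both traffic and number of requests, and additional quota cannot be purchased.

With the background covered, the first step is to apply for Chromium developer access.

For the detailed steps, see how to get chromium API keys.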

Acquiring Keys
  1. Make sure you are a member of [email protected] (you can just subscribe to chromium-dev and choose not to receive mail).

    For convenience, the APIs below are only visible to people subscribed to that group.

  2. Make sure you are logged in with the Google account associated with the email address that you used to subscribe to chromium-dev.
  3. Go to https://cloud.google.com/console (use the old version of the console)
  4. Click the red Create project… button.
  5. (Optional) You may add other members of your organization or team on the Team tab.
  6. In the ‘APIs & auth’ > APIs tab, click the On/Off button to turn each of the following APIs to the On position, and read and agree to the Terms of Service that is shown:

    (This list might be out of date; try searching for APIs starting with “Chrome” or having “for Chrome” in the name.)

    • Chrome Remote Desktop API
    • Chrome Spelling API
    • Chrome Suggest API
    • Chrome Sync API
    • Chrome Translate Element
    • Google Maps Geolocation API (requires enabling billing but is free to use; you can skip this one, in which case geolocation features of Chrome will not work)
    • Safe Browsing API
    • Speech API
    • Time Zone API
    • Google Cloud Messaging for Chrome
    • Google Now For Chrome API

      If any of these APIs are not shown, recheck step 1.

  7. Go to the Credentials tab under the APIs & auth tab.
  8. Click the red Create New Client ID button in the OAuth section to create an OAuth 2.0 client ID.
    • You want “Installed Application” for the Application type section
    • You want “Other” for the Installed application type section
  9. A new box should now appear titled “Client ID for installed applications”. In the next sections, we will refer to the values of the “Client ID” and “Client secret” fields in this box later (below).
  10. Click the red Create New Key button in the Public API Access section and create a new Browser key.

    You want to leave the box on the “Create a browser key and configure allowed referers” empty.

  11. A new box should appear titled “Key for browser applications”. The next sections will refer to the value of the “API key” field too.

At this point we have obtained the application key. In the rest of this post, {key} stands for this key.

One Shot Recognition

We use curl to send the request to the server:

curl -X POST \
--data-binary @speech.flac \
--user-agent 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.77 Safari/535.7' \
--header 'Content-Type: audio/x-flac; rate=8000;' \
'https://www.google.com/speech-api/v1/recognize?client=chromium&lang=zh-CN&maxresults=5&pfilter=0&key=AIzaSyC6Tkf4*****Q0CdISn-qnHhwLaS3cg2a0'             
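
| Parameter | Explanation |
| --- | --- |
| -X POST | Send the request as an HTTP POST |
| --data-binary @speech.flac | Upload the audio file speech.flac as the request body |
| --user-agent '…' | HTTP parameter; sets the browser user-agent string |
| --header | HTTP parameter; specifies the content type (audio/x-flac) and the sample rate (8000 Hz). Note that only a few specific rates are supported (8000 Hz, 4000 Hz, and a few others I can't recall), and the sample rate of the uploaded FLAC file must match this parameter. |
| https://www.google.com/…/&key=AIzaSyC6Tkf*****Q0CdISn-qnHhwLaS3cg2a0 | The request URL; replace the key value at the end with the {key} you applied for |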
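The same request can be assembled in Python. This is a minimal sketch that mirrors the curl command above using only the standard library; the function name `build_recognize_request` and its default arguments are illustrative assumptions, and the endpoint, query parameters, and headers are copied from the curl example rather than from any official documentation.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint and User-Agent copied from the curl command in this post.
ENDPOINT = "https://www.google.com/speech-api/v1/recognize"
USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.7 "
    "(KHTML, like Gecko) Chrome/16.0.912.77 Safari/535.7"
)


def build_recognize_request(audio, key, lang="zh-CN", rate=8000, max_results=5):
    """Build (but do not send) the one-shot recognition POST request.

    `audio` is the raw bytes of a FLAC file; `rate` must match the
    sample rate of that file. The function name is hypothetical.
    """
    query = urlencode({
        "client": "chromium",
        "lang": lang,
        "maxresults": max_results,
        "pfilter": 0,
        "key": key,
    })
    headers = {
        "Content-Type": "audio/x-flac; rate=%d;" % rate,
        "User-Agent": USER_AGENT,
    }
    return Request(ENDPOINT + "?" + query, data=audio, headers=headers, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes it easy to inspect the URL and headers before spending any of the limited quota.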

After waiting about a minute, with some luck you will see a result like the one below. The format should be self-explanatory:

{
    "status": 0,
    "id": "b3447b5d98c5653e0067f35b32c0a8ca-1",
    "hypotheses": 
    [
        {
            "utterance": "i like pickles",
            "confidence": 0.9012539
        },
        {
            "utterance": "i like pickle"
        }
    ]
}           
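To pick a transcript out of a response like this, something along the following lines may help. It is a sketch based only on the sample above: the helper name `best_hypothesis` is made up for illustration, and treating a missing "confidence" field as 0.0 is an assumption, since the sample shows that not every hypothesis carries one.

```python
import json


def best_hypothesis(response_text):
    """Return the utterance with the highest confidence, or None if there is none.

    Hypotheses without a "confidence" field are ranked as 0.0 (an assumption).
    """
    data = json.loads(response_text)
    hypotheses = data.get("hypotheses") or []
    if not hypotheses:
        return None
    best = max(hypotheses, key=lambda h: h.get("confidence", 0.0))
    return best["utterance"]


sample = """
{
    "status": 0,
    "id": "b3447b5d98c5653e0067f35b32c0a8ca-1",
    "hypotheses": [
        {"utterance": "i like pickles", "confidence": 0.9012539},
        {"utterance": "i like pickle"}
    ]
}
"""
print(best_hypothesis(sample))
```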

If your recording is in the wrong format, the open-source tool sox makes it easy to convert the format and sample rate. For example:

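sox ./speech.mp3 -r 8000 speech.flac trim 0 15

| Parameter | Explanation |
| --- | --- |
| ./speech.mp3 | Input file |
| -r 8000 | Output sample rate of 8 kHz (in sox, -b sets the bit depth, not the sample rate) |
| speech.flac | Output file name |
| trim 0 15 | Keep only seconds 0 to 15 of the input and write that part out |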
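Because the sample rate of the uploaded file has to agree with the rate= value in the Content-Type header, it can be worth checking the converted file before sending it. The sketch below reads the sample rate, channel count, and bit depth from the STREAMINFO block at the start of a FLAC stream, using the field layout from the FLAC format specification. The function name `read_flac_stream_info` is an illustrative assumption, and the code assumes the file starts directly with the fLaC marker (no prepended tags).

```python
def read_flac_stream_info(data):
    """Return (sample_rate_hz, channels, bits_per_sample) from FLAC header bytes.

    `data` must contain at least the first 42 bytes of the file:
    4-byte "fLaC" marker, 4-byte metadata block header, 34-byte STREAMINFO.
    """
    if data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream (missing fLaC marker)")
    if len(data) < 42:
        raise ValueError("need at least 42 bytes to read STREAMINFO")
    if data[4] & 0x7F != 0:
        raise ValueError("first metadata block is not STREAMINFO")

    info = data[8:42]
    # Bytes 0-9 hold block and frame size limits; the packed fields follow:
    # 20 bits sample rate, 3 bits (channels - 1), 5 bits (bits per sample - 1).
    sample_rate = (info[10] << 12) | (info[11] << 4) | (info[12] >> 4)
    channels = ((info[12] >> 1) & 0x07) + 1
    bits_per_sample = (((info[12] & 0x01) << 4) | (info[13] >> 4)) + 1
    return sample_rate, channels, bits_per_sample
```

For a real file, read the first 42 bytes in binary mode (for example `open("speech.flac", "rb").read(42)`) and compare the first element of the result with the rate you intend to declare.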

Stream Recognition

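Later, Google provided a more advanced, live, full-duplex recognition interface. It works by opening two HTTP connections at the same time: one sends (POST) the audio stream in real time, and the other receives (GET) the results.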
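The two-connection pattern can be illustrated without touching the network. The sketch below is only a local simulation: two in-memory queues stand in for the POST and GET connections, and a caller-supplied `recognize` function stands in for the remote service. None of these names come from the real API, and the actual streaming endpoints are not shown here.

```python
import queue
import threading


def run_duplex(chunks, recognize):
    """Stream `chunks` upstream while results are collected downstream.

    The queues simulate the two HTTP connections; `recognize` simulates
    the server turning one audio chunk into one result.
    """
    upstream = queue.Queue()    # stands in for the POST connection
    downstream = queue.Queue()  # stands in for the GET connection
    results = []

    def fake_server():
        while True:
            chunk = upstream.get()
            if chunk is None:          # end of audio: close the result stream
                downstream.put(None)
                return
            downstream.put(recognize(chunk))

    def sender():
        for chunk in chunks:
            upstream.put(chunk)
        upstream.put(None)             # signal end of audio

    def receiver():
        while True:
            item = downstream.get()
            if item is None:
                return
            results.append(item)

    threads = [threading.Thread(target=f) for f in (fake_server, sender, receiver)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The point of the structure is that the receiver does not wait for the sender to finish: results can arrive while audio is still being uploaded, which is what makes the interface usable for live recognition.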

Here is a PHP demo that you can use as a reference when implementing your own Stream Recognition:

Google Speech API – Full Duplex PHP Version

References:

  1. Google Speech API – Full Duplex PHP Version

    http://mikepultz.com/2013/07/google-speech-api-full-duplex-php-version/

  2. Accessing Google Speech API / Chrome 11

    http://mikepultz.com/2011/03/accessing-google-speech-api-chrome-11/

  3. Google Speech To Text API

    https://gist.github.com/alotaiba/1730160

  4. Implementing Android speech recognition with the Google Speech API, bypassing Google Voice Search

    http://my.eoe.cn/sisuer/archive/5960.html

  5. How to Use Google Speech API( with sox )

    http://www.x2q.net/blog/2013/09/16/how-to-use-google-speech-api/

  6. Google Chromium Open Project

    http://src.chromium.org/viewvc/chrome/trunk/src/content/browser/speech/

    http://src.chromium.org/viewvc/chrome/trunk/src/content/browser/speech/google_one_shot_remote_engine.cc

Written with StackEdit.

This blog has moved to blog.bookbook.in and will no longer be updated here.