Speech Recognition / Speech-to-Text on iOS: A Detailed Tutorial

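Preface: I recently looked into speech recognition and tried both Baidu's and iFlytek's SDKs. My personal take on the two: iFlytek is clearly the more specialized, and its recognition rate and accuracy are high. However, like many developers I wanted offline recognition, and iFlytek charges for its offline mode. As for request quotas, both vendors let you apply for a higher quota, so for apps with many users they are practically equivalent. Because it is free and supports offline use, I chose Baidu's offline speech recognition. The integration itself is fairly simple; most of the work is in the UI. Here is the tutorial: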

1. First, the required libraries

2. I built a custom UI, so the focus here is on getting the functionality working (header files)

// Header files
#import "BDVRCustomRecognitonViewController.h"
#import "BDVRClientUIManager.h"
#import "WBVoiceRecordHUD.h"
#import "BDVRViewController.h"
#import "MyViewController.h"
#import "BDVRSConfig.h"

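3. What you need to know: the methods you are likely to use

//------------------- Class methods -------------------

// Returns the speech recognition client object; it is a singleton
+ (BDVoiceRecognitionClient *)sharedInstance;

// Releases the speech recognition client object
+ (void)releaseInstance;

//------------------- Recognition methods -------------------

// Checks whether recording is possible
- (BOOL)isCanRecorder;

// Starts speech recognition. You must implement the MVoiceRecognitionClientDelegate
// methods and pass in the object that will listen for events.
// For the return value, see TVoiceRecognitionStartWorkResult
- (int)startVoiceRecognition:(id<MVoiceRecognitionClientDelegate>)aDelegate;

// Done speaking; call this when the user actively finishes recording
- (void)speakFinish;

// Ends the current recognition session
- (void)stopVoiceRecognition;

/**
 * @brief Gets the sample rate of the current recognition session
 * @return Sample rate (16000/8000)
 */
- (int)getCurrentSampleRate;

/**
 * @brief Gets the current recognition mode (deprecated)
 * @return The current recognition mode
 */
- (int)getCurrentVoiceRecognitionMode __attribute__((deprecated));

/**
 * @brief Sets the current recognition mode (deprecated); use -(void)setProperty:(TBDVoiceRecognitionProperty)property; instead
 * @param aMode The recognition mode
 */
- (void)setCurrentVoiceRecognitionMode:(int)aMode __attribute__((deprecated));

// Sets the recognition type
- (void)setProperty:(TBDVoiceRecognitionProperty)property __attribute__((deprecated));

// Gets the current recognition type
- (int)getRecognitionProperty __attribute__((deprecated));

// Sets the list of recognition types. Except for EVoiceRecognitionPropertyInput and
// EVoiceRecognitionPropertySong, recognition types can be combined.
- (void)setPropertyList:(NSArray *)prop_list;

// cityID applies only to the EVoiceRecognitionPropertyMap recognition type
- (void)setCityID:(NSInteger)cityID;

// Gets the current list of recognition types
- (NSArray *)getRecognitionPropertyList;

//------------------- Prompt tones -------------------

// Plays prompt tones; by default the recording-start and recording-end tones are played
// BDVoiceRecognitionClientResources/Tone
// record_start.caf  sound file for the start of recording
// record_end.caf    sound file for the end of recording
// The sound resources must be added to the project. You may replace the files, but the
// file names must not change. Keep the tones short, around 0.5 seconds.
// For aTone values, see TVoiceRecognitionPlayTones. Returns NO if the file is not found.
- (BOOL)setPlayTone:(int)aTone isPlay:(BOOL)aIsPlay;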

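4. The record button and its animation (my own custom code; feel free to borrow from it)

// Record button
@property (nonatomic, weak, readonly) UIButton *holdDownButton; // The talk button

/**
 * Whether the recording has been cancelled
 */
@property (nonatomic, assign, readwrite) BOOL isCancelled;

/**
 * Whether recording is in progress
 */
@property (nonatomic, assign, readwrite) BOOL isRecording;

/**
 * Fired when the record button is pressed; this starts recording
 */
- (void)holdDownButtonTouchDown;

/**
 * Fired when the finger leaves the screen outside the record button; this cancels recording
 */
- (void)holdDownButtonTouchUpOutside;

/**
 * Fired when the finger leaves the screen inside the record button; this completes recording
 */
- (void)holdDownButtonTouchUpInside;

/**
 * Fired when the finger slides outside the record button
 */
- (void)holdDownDragOutside;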

5. Initializing the UI

#pragma mark - Layout subviews UI

/**
 * Creates a button from a normal-state image and a highlighted-state image
 *
 * @param image   Image for the normal state
 * @param hlImage Image for the highlighted state
 *
 * @return The button object
 */
- (UIButton *)createButtonWithImage:(UIImage *)image HLImage:(UIImage *)hlImage;

- (void)holdDownDragInside;

- (void)createInitView;        // Creates the initial view; used while the prompt tone plays
- (void)createRecordView;      // Creates the recording view
- (void)createRecognitionView; // Creates the recognition view
- (void)createErrorViewWithErrorType:(int)aStatus; // Shows detailed error information in the recognition view
- (void)createRunLogWithStatus:(int)aStatus;       // Shows detailed status information in the status view

- (void)finishRecord:(id)sender; // The user tapped "finish"
- (void)cancel:(id)sender;       // The user tapped "cancel"

- (void)startVoiceLevelMeterTimer;
- (void)freeVoiceLevelMeterTimerTimer;

6. The most important part

// Finish recording
[[BDVoiceRecognitionClient sharedInstance] speakFinish];

// Cancel recording
[[BDVoiceRecognitionClient sharedInstance] stopVoiceRecognition];

7. The two delegate methods

- (void)VoiceRecognitionClientWorkStatus:(int)aStatus obj:(id)aObj
{
    switch (aStatus)
    {
        case EVoiceRecognitionClientWorkStatusFlushData: // Intermediate results, shown on screen as they arrive
        {
            NSString *text = [aObj objectAtIndex:0];
            if ([text length] > 0)
            {
                // [clientSampleViewController logOutToContinusManualResut:text];
                UILabel *clientWorkStatusFlushLabel = [[UILabel alloc] initWithFrame:CGRectMake(kScreenWidth/2 - 100, 64, 200, 60)];
                clientWorkStatusFlushLabel.text = text;
                clientWorkStatusFlushLabel.textAlignment = NSTextAlignmentCenter;
                clientWorkStatusFlushLabel.font = [UIFont systemFontOfSize:18.0f];
                clientWorkStatusFlushLabel.numberOfLines = 0;
                clientWorkStatusFlushLabel.backgroundColor = [UIColor whiteColor];
                [self.view addSubview:clientWorkStatusFlushLabel];
            }
            break;
        }
        case EVoiceRecognitionClientWorkStatusFinish: // Recognition completed normally and returned a result
        {
            [self createRunLogWithStatus:aStatus];
            if ([[BDVoiceRecognitionClient sharedInstance] getRecognitionProperty] != EVoiceRecognitionPropertyInput)
            {
                // In search mode the result is an array, for example
                // ["公園", "公元"]
                NSMutableArray *audioResultData = (NSMutableArray *)aObj;
                NSMutableString *tmpString = [[NSMutableString alloc] initWithString:@""];
                for (int i = 0; i < [audioResultData count]; i++)
                {
                    [tmpString appendFormat:@"%@\r\n", [audioResultData objectAtIndex:i]];
                }
                clientSampleViewController.resultView.text = nil;
                [clientSampleViewController logOutToManualResut:tmpString];
            }
            else
            {
                NSString *tmpString = [[BDVRSConfig sharedInstance] composeInputModeResult:aObj];
                [clientSampleViewController logOutToContinusManualResut:tmpString];
            }
            if (self.view.superview)
            {
                [self.view removeFromSuperview];
            }
            break;
        }
        case EVoiceRecognitionClientWorkStatusReceiveData:
        {
            // This status is only used in input mode.
            // In input mode the result carries confidence scores, for example:
            // [
            //     [
            //         {
            //             "百度" = "0.6055192947387695";
            //         },
            //         {
            //             "擺渡" = "0.3625582158565521";
            //         },
            //     ]
            //     [
            //         {
            //             "一下" = "0.7665404081344604";
            //         }
            //     ],
            // ]
            // Disabled for now; otherwise it interferes with navigating to the result
            // NSString *tmpString = [[BDVRSConfig sharedInstance] composeInputModeResult:aObj];
            // [clientSampleViewController logOutToContinusManualResut:tmpString];
            break;
        }
        case EVoiceRecognitionClientWorkStatusEnd: // The user has finished speaking; waiting for the server to return the result
        {
            if ([BDVRSConfig sharedInstance].voiceLevelMeter)
            {
                [self freeVoiceLevelMeterTimerTimer];
            }
            [self createRecognitionView];
            break;
        }
        case EVoiceRecognitionClientWorkStatusCancel:
        {
            if ([BDVRSConfig sharedInstance].voiceLevelMeter)
            {
                [self freeVoiceLevelMeterTimerTimer];
            }
            [self createRunLogWithStatus:aStatus];
            if (self.view.superview)
            {
                [self.view removeFromSuperview];
            }
            break;
        }
        case EVoiceRecognitionClientWorkStatusStartWorkIng: // The recognizer has started working; the user can speak
        {
            if ([BDVRSConfig sharedInstance].playStartMusicSwitch) // If a prompt tone was played, now tell the user they can speak
            {
                [self createRecordView];
            }
            if ([BDVRSConfig sharedInstance].voiceLevelMeter) // Start monitoring the voice level
            {
                [self startVoiceLevelMeterTimer];
            }
            [self createRunLogWithStatus:aStatus];
            break;
        }
        case EVoiceRecognitionClientWorkStatusNone:
        case EVoiceRecognitionClientWorkPlayStartTone:
        case EVoiceRecognitionClientWorkPlayStartToneFinish:
        case EVoiceRecognitionClientWorkStatusStart:
        case EVoiceRecognitionClientWorkPlayEndToneFinish:
        case EVoiceRecognitionClientWorkPlayEndTone:
        case EVoiceRecognitionClientWorkStatusNewRecordData:
        default:
            break;
    }
}

- (void)VoiceRecognitionClientNetWorkStatus:(int)aStatus
{
    switch (aStatus)
    {
        case EVoiceRecognitionClientNetWorkStatusStart:
        {
            [[UIApplication sharedApplication] setNetworkActivityIndicatorVisible:YES];
            break;
        }
        case EVoiceRecognitionClientNetWorkStatusEnd:
        {
            [[UIApplication sharedApplication] setNetworkActivityIndicatorVisible:NO];
            break;
        }
    }
}
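The two result shapes described in the comments of the work-status delegate (a flat array of candidates in search mode, confidence-scored candidates in input mode) could be turned into display text roughly as follows. This is only a sketch: `JoinSearchCandidates` and `BestTranscript` are hypothetical helpers, not Baidu SDK functions, and this post does not show how `composeInputModeResult:` is actually implemented. It also assumes that the input-mode result is an array of segments, each an array of single-entry dictionaries mapping candidate text to a confidence string, as the sample comment suggests.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: search mode delivers a flat array of candidate strings,
// e.g. ["公園", "公元"]; show one candidate per line.
static NSString *JoinSearchCandidates(NSArray *candidates) {
    return [candidates componentsJoinedByString:@"\n"];
}

// Hypothetical helper: input mode is assumed to deliver an array of segments,
// each segment being an array of { text = confidence } dictionaries.
// For every segment, keep the text with the highest confidence and
// concatenate the winners in order.
static NSString *BestTranscript(NSArray *segments) {
    NSMutableString *result = [NSMutableString string];
    for (NSArray *candidates in segments) {
        NSString *bestText = nil;
        double bestScore = -1.0;
        for (NSDictionary *candidate in candidates) {
            // Fast enumeration over a dictionary iterates its keys.
            for (NSString *text in candidate) {
                double score = [[candidate objectForKey:text] doubleValue];
                if (score > bestScore) {
                    bestScore = score;
                    bestText = text;
                }
            }
        }
        if (bestText != nil) {
            [result appendString:bestText];
        }
    }
    return result;
}
```

With the sample data from the comment, `BestTranscript` picks "百度" over "擺渡" in the first segment and appends "一下" from the second. Picking the top-scoring candidate is a design choice of this sketch; an app could just as well keep the alternatives and let the user choose.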

8. Record button handling

#pragma mark ------ Record button handling ------

- (void)holdDownButtonTouchDown {
    // Start the animation; the display link calls delayAnimation once every 40 frames
    _displayLink = [CADisplayLink displayLinkWithTarget:self selector:@selector(delayAnimation)];
    _displayLink.frameInterval = 40;
    [_displayLink addToRunLoop:[NSRunLoop currentRunLoop] forMode:NSDefaultRunLoopMode];

    self.isCancelled = NO;
    self.isRecording = NO;

    // Start speech recognition. Before this, the VoiceRecognitionClientWorkStatus:obj: method
    // of the MVoiceRecognitionClientDelegate protocol must be implemented.
    int startStatus = -1;
    startStatus = [[BDVoiceRecognitionClient sharedInstance] startVoiceRecognition:self];
    if (startStatus != EVoiceRecognitionStartWorking) { // Report an error if starting failed
        NSString *statusString = [NSString stringWithFormat:@"%d", startStatus];
        // Delay by 0.3 seconds so the view can be removed cleanly when an error occurs
        [self performSelector:@selector(firstStartError:) withObject:statusString afterDelay:0.3];
        return;
    }

    // "Hold to talk, release to search" hint
    [voiceImageStr removeFromSuperview];
    voiceImageStr = [[UIImageView alloc] initWithFrame:CGRectMake(kScreenWidth/2 - 40, kScreenHeight - 153, 80, 33)];
    voiceImageStr.backgroundColor = [UIColor colorWithPatternImage:[UIImage imageNamed:@"searchvoice"]];
    [self.view addSubview:voiceImageStr];
}

- (void)holdDownButtonTouchUpOutside {
    // Stop the animation
    [self.view.layer removeAllAnimations];
    [_displayLink invalidate];
    _displayLink = nil;

    // Cancel recording
    [[BDVoiceRecognitionClient sharedInstance] stopVoiceRecognition];
    if (self.view.superview) {
        [self.view removeFromSuperview];
    }
}

- (void)holdDownButtonTouchUpInside {
    // Stop the animation, as in the cancel path, so the display link does not keep firing
    [self.view.layer removeAllAnimations];
    [_displayLink invalidate];
    _displayLink = nil;

    // Finish recording
    [[BDVoiceRecognitionClient sharedInstance] speakFinish];
}

- (void)holdDownDragOutside {
    // If recording has already started, handle the drag-out action;
    // otherwise just set isCancelled so that recording never starts.
    if (self.isRecording) {
        // if ([self.delegate respondsToSelector:@selector(didDragOutsideAction)]) {
        //     [self.delegate didDragOutsideAction];
        // }
    } else {
        self.isCancelled = YES;
    }
}

- (UIButton *)createButtonWithImage:(UIImage *)image HLImage:(UIImage *)hlImage {
    UIButton *button = [[UIButton alloc] initWithFrame:CGRectMake(kScreenWidth/2 - 36, kScreenHeight - 120, 72, 72)];
    if (image) {
        [button setBackgroundImage:image forState:UIControlStateNormal];
    }
    if (hlImage) {
        [button setBackgroundImage:hlImage forState:UIControlStateHighlighted];
    }
    return button;
}
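The handlers in this section still have to be connected to the button's touch events, a step this post does not show. One plausible wiring uses UIKit's target-action mechanism, sketched below. `HoldToTalkViewController` is a hypothetical class name and the handler bodies are placeholders; mapping the two drag handlers to `UIControlEventTouchDragExit` and `UIControlEventTouchDragEnter` is an assumption about the intended behavior, not something the post confirms.

```objective-c
#import <UIKit/UIKit.h>

// Hypothetical controller; only the handler names come from the tutorial.
@interface HoldToTalkViewController : UIViewController
@end

@implementation HoldToTalkViewController

- (void)viewDidLoad {
    [super viewDidLoad];

    UIButton *button = [UIButton buttonWithType:UIButtonTypeCustom];
    button.frame = CGRectMake(0, 0, 72, 72);

    // Finger goes down on the button: start recording.
    [button addTarget:self action:@selector(holdDownButtonTouchDown)
     forControlEvents:UIControlEventTouchDown];
    // Finger lifted inside the button: finish recording.
    [button addTarget:self action:@selector(holdDownButtonTouchUpInside)
     forControlEvents:UIControlEventTouchUpInside];
    // Finger lifted outside the button: cancel recording.
    [button addTarget:self action:@selector(holdDownButtonTouchUpOutside)
     forControlEvents:UIControlEventTouchUpOutside];
    // Finger slides out of / back into the button while still pressed.
    [button addTarget:self action:@selector(holdDownDragOutside)
     forControlEvents:UIControlEventTouchDragExit];
    [button addTarget:self action:@selector(holdDownDragInside)
     forControlEvents:UIControlEventTouchDragEnter];

    [self.view addSubview:button];
}

// Placeholder bodies; the tutorial's real implementations go here.
- (void)holdDownButtonTouchDown {}
- (void)holdDownButtonTouchUpInside {}
- (void)holdDownButtonTouchUpOutside {}
- (void)holdDownDragOutside {}
- (void)holdDownDragInside {}

@end
```

`UIControlEventTouchDragExit` fires once when the finger crosses the button's bounds, whereas `UIControlEventTouchDragOutside` fires repeatedly while the finger keeps moving outside; the former is usually enough for toggling a "release to cancel" hint.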

#pragma mark ----------- Animation -----------

- (void)startAnimation
{
    CALayer *layer = [[CALayer alloc] init];
    layer.cornerRadius = [UIScreen mainScreen].bounds.size.width/2;
    layer.frame = CGRectMake(0, 0, layer.cornerRadius * 2, layer.cornerRadius * 2);
    layer.position = CGPointMake([UIScreen mainScreen].bounds.size.width/2, [UIScreen mainScreen].bounds.size.height - 84);
    // self.view.layer.position;
    UIColor *color = [UIColor colorWithRed:arc4random()%10*0.1 green:arc4random()%10*0.1 blue:arc4random()%10*0.1 alpha:1];
    layer.backgroundColor = color.CGColor;
    [self.view.layer addSublayer:layer];

    CAMediaTimingFunction *defaultCurve = [CAMediaTimingFunction functionWithName:kCAMediaTimingFunctionDefault];

    _animationGroup = [CAAnimationGroup animation];
    _animationGroup.delegate = self;
    _animationGroup.duration = 2;
    _animationGroup.removedOnCompletion = YES;
    _animationGroup.timingFunction = defaultCurve;

    // Grow the circle from nothing to full size
    CABasicAnimation *scaleAnimation = [CABasicAnimation animationWithKeyPath:@"transform.scale.xy"];
    scaleAnimation.fromValue = @0.0;
    scaleAnimation.toValue = @1.0;
    scaleAnimation.duration = 2;

    // Fade the circle out while it grows
    CAKeyframeAnimation *opacityAnimation = [CAKeyframeAnimation animationWithKeyPath:@"opacity"];
    opacityAnimation.duration = 2;
    opacityAnimation.values = @[@0.8, @0.4, @0];
    opacityAnimation.keyTimes = @[@0, @0.5, @1];
    opacityAnimation.removedOnCompletion = YES;

    NSArray *animations = @[scaleAnimation, opacityAnimation];
    _animationGroup.animations = animations;
    [layer addAnimation:_animationGroup forKey:nil];

    [self performSelector:@selector(removeLayer:) withObject:layer afterDelay:1.5];
}

- (void)removeLayer:(CALayer *)layer
{
    [layer removeFromSuperlayer];
}

- (void)delayAnimation
{
    [self startAnimation];
}

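Once all of the above is in place, you are done!

Tips:

1. Because this is speech recognition, it needs microphone permission. Building for the simulator produces 12 errors; building for a real device resolves them.

2. The license file is not complicated: the project's Bundle Identifier just has to match the one registered for Baidu's offline license.

[Figure: Bundle Identifier setting]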
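Regarding the microphone permission mentioned in tip 1, a minimal sketch of asking for access before starting recognition is shown below. It uses `AVAudioSession`'s `requestRecordPermission:`; the wrapper name `RequestMicrophoneAccess` is hypothetical. On iOS 10 and later the app's Info.plist must also contain an `NSMicrophoneUsageDescription` entry, otherwise the app is terminated when it first touches the microphone.

```objective-c
#import <AVFoundation/AVFoundation.h>

// Hypothetical wrapper: asks the user for microphone access and reports the
// answer on the main queue, so UI code can run directly in the completion block.
static void RequestMicrophoneAccess(void (^completion)(BOOL granted)) {
    [[AVAudioSession sharedInstance] requestRecordPermission:^(BOOL granted) {
        dispatch_async(dispatch_get_main_queue(), ^{
            completion(granted);
        });
    }];
}
```

A caller could start recognition only inside the completion block when `granted` is YES, and otherwise show a hint that points the user to the Settings app. The system prompt appears only the first time; later calls return the stored answer.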

The final result looks like this:

[Figure: final result]

If anything is unclear, you can reach me on Weibo:
