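This article covers the usage scenarios for Cortana's Voice Command Definitions (VCD) and shows how to hook a UWP app into them, so that users can launch the app directly through Cortana and use some of its lightweight features. Bing Dictionary serves as the running example: we walk through how the UWP version of Bing Dictionary was integrated with Cortana's voice features.
Cortana is one of Windows 10's big selling points, and it has drawn more applause than its neighbors Google Now and Siri. Besides the "return" of the classic Start menu and the UWP universal app architecture, Cortana is the thing marketing people can't go three sentences without mentioning. Cortana can in fact integrate with third-party apps, and Microsoft's intent to turn it into a platform is faintly visible, so the Cortana entry point has naturally become ground that developers have to fight for.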
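Usage scenarios
At present, the main interface Cortana opens to third-party apps is Voice Command Definitions (VCD).
Let's look at what VCD can actually do. Its usage scenarios fall into two broad categories:
The first uses the third-party app's own data. After the user enters a voice or text command, Cortana's UI displays data supplied by the third-party app, which completes a lightweight task such as showing some text or information.
The second takes the information in the user's voice or text command and passes it to the third-party app as launch parameters, so the app opens and jumps straight to the relevant page, shortening the navigation path. For example, if you tell Cortana "find restaurants nearby that serve grilled fish in Dianping" (在大衆點評中查找附近吃烤魚的飯館), Cortana opens Dianping and jumps straight to the nearby restaurants that serve grilled fish. Cortana saves the user three steps here: opening the app, opening the search page, and searching for nearby grilled fish restaurants.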
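The new version of Bing Dictionary mainly uses the first scenario. Let's first see what the whole experience looks like:
The user says to Cortana: "必應詞典,告訴我cute是什麼意思?" ("Bing Dictionary, tell me what cute means"). Once Cortana understands the request, it talks to Bing Dictionary, fetches the meaning of "cute", and displays it.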
![](https://img.laitimes.com/img/_0nNw4CM6IyYiwiM6ICdiwiIn5GcuIjY4UWMkV2M4gzY3M2M5MmMyU2MmJWNyU2MwEDZ2YGZfdWbp9CXt92Yu4GZjlGbh5SZslmZxl3Lc9CX6MHc0RHaiojIsJye.png)
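Implementation
Two key pieces are needed to make this work. The first is defining the grammar.
Cortana's VCD grammar file is an XML file. Start with the official documentation and the official sample code: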
<a href="https://github.com/Microsoft/Windows-universal-samples/tree/master/Samples/CortanaVoiceCommand">https://github.com/Microsoft/Windows-universal-samples/tree/master/Samples/CortanaVoiceCommand</a>
Here I focus on Bing Dictionary's VCD implementation. Below is Bing Dictionary's VCD grammar file:
<?xml version="1.0" encoding="utf-8" ?>
<VoiceCommands xmlns="http://schemas.microsoft.com/voicecommands/1.2">
  <CommandSet xml:lang="zh-hans-cn" Name="DictCommandSet_zh-cn">
    <AppName> 必應詞典 </AppName>
    <Example> 翻譯一下 friend </Example>
    <Command Name="searchWord">
      <Example> 翻譯一下 friend </Example>
      <ListenFor RequireAppName="ExplicitlySpecified"> {builtin:AppName}[告訴我]{query}的意思 </ListenFor>
      <ListenFor RequireAppName="ExplicitlySpecified"> {builtin:AppName}[告訴我]{query}[是]什麼意思 </ListenFor>
      <ListenFor RequireAppName="ExplicitlySpecified"> {builtin:AppName}{query}[用][英語][英文]怎麼說 </ListenFor>
      <ListenFor RequireAppName="ExplicitlySpecified"> {builtin:AppName}[英語][英文]{query}怎麼說 </ListenFor>
      <ListenFor RequireAppName="ExplicitlySpecified"> {builtin:AppName}{query}用[漢語][中文]怎麼說 </ListenFor>
      <ListenFor RequireAppName="ExplicitlySpecified"> {builtin:AppName}什麼是{query}</ListenFor>
      <Feedback>正在查詢{query}的釋義...</Feedback>
      <VoiceCommandService Target="DictVoiceCommandService"/>
    </Command>
    <Command Name="translate">
      <Example> 翻譯一下 friend</Example>
      <ListenFor RequireAppName="ExplicitlySpecified"> {builtin:AppName}翻譯[一下][單詞]{query}</ListenFor>
      <Feedback>正在翻譯{query}...</Feedback>
      <VoiceCommandService Target="DictVoiceCommandService"/>
    </Command>
    <PhraseTopic Label="query" Scenario="Search">
      <Subject> Words </Subject>
    </PhraseTopic>
  </CommandSet>
</VoiceCommands>
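VCD grammar is language-specific: each Cortana language gets its own CommandSet. For Chinese, that is zh-CN or zh-hans-CN. Every CommandSet requires an AppName. In principle the AppName can be customized and does not have to match the app's name exactly. For example, our app's full name is "必應詞典Win10版" (Bing Dictionary for Win10); if users had to say "必應詞典Win10版告訴我cute是什麼意思?" they would probably lose their minds. Still, pick the name with some care. First, it should be convenient for users to say. Second, a name that is too common may be ambiguous with other apps, and it may also interfere with Cortana's own features and get the app uninstalled.
In a ListenFor statement, [] marks optional words and {} marks special words. A ListenFor cannot consist entirely of optional words: like a regular expression made up only of . and *, it would have nothing required to anchor on and could not be matched meaningfully. {builtin:AppName} marks where the app name appears; it can come at the start of the phrase or anywhere else.
For example: <ListenFor RequireAppName="ExplicitlySpecified"> {builtin:AppName}{query}[用][英語][英文]怎麼說 </ListenFor>
With this rule, "必應詞典xxx怎麼說", "必應詞典xxx用英語怎麼說", "必應詞典xxx用英文怎麼說", "必應詞典xxx英語怎麼說", and so on are all recognized.
As MSDN describes, a PhraseTopic can stand for arbitrary words; Subject and Scenario help the speech recognition model recognize the voice input more accurately. The allowed values are listed on MSDN.
In the app's App.xaml.cs, the finished VCD file has to be installed into Cortana when the app starts.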
// At the top of App.xaml.cs:
using System;
using System.Threading.Tasks;
using Windows.ApplicationModel;
using Windows.ApplicationModel.Activation;
using Windows.Storage;

protected override async void OnLaunched(LaunchActivatedEventArgs e)
{
    // …
    await InstallVoiceCommand();
}

private async Task InstallVoiceCommand()
{
    try
    {
        // The user can turn VCD off in the app settings.
        if (AppSettings.GetInstance().CortanaVCDEnableStatus == false)
            return;
        // Install the main VCD. Since there's no simple way to test that the VCD has been imported, or that it's your most recent
        // version, it's not unreasonable to do this upon app load.
        StorageFile vcdStorageFile = await Package.Current.InstalledLocation.GetFileAsync(@"DictVoiceCommands.xml");
        await Windows.ApplicationModel.VoiceCommands.VoiceCommandDefinitionManager.InstallCommandDefinitionsFromStorageFileAsync(vcdStorageFile);
    }
    catch (Exception ex)
    {
        // Installation failure should not crash the app; log it and carry on.
        System.Diagnostics.Debug.WriteLine("Installing Voice Commands Failed: " + ex.ToString());
    }
}
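The second key piece is the voice app service (app service).
Following the MSDN sample, Bing Dictionary also adds a BingDictUWP.VoiceCommands project to the solution. Note that this project's output type must be Windows Runtime Component; otherwise the background task will not work. As shown in the figure below:
For the background task project, you can again download the project from the GitHub link shared above. The overall framework of that sample can be used as is, with your own changes in the relevant places.
Below is the code Bing Dictionary uses to handle the voice commands Cortana sends back: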
using System;
using System.Globalization;
using Windows.ApplicationModel.AppService;
using Windows.ApplicationModel.Background;
using Windows.ApplicationModel.Resources.Core;
using Windows.ApplicationModel.VoiceCommands;

namespace BingDictUWP.AppExtensions
{
    /// <summary>
    /// The VoiceCommandService implements the entrypoint for all headless voice commands
    /// invoked via Cortana. The individual commands supported are described in the
    /// DictVoiceCommands.xml VCD file. The service entrypoint is defined in the
    /// Package Manifest (see section uap:Extension in Package.appxmanifest).
    /// </summary>
    public sealed class DictVoiceCommandService : IBackgroundTask
    {
        // ...
        /// <summary>
        /// Background task entrypoint. Voice Commands using the &lt;VoiceCommandService Target="..."&gt;
        /// tag will invoke this when they are recognized by Cortana, passing along details of the
        /// invocation.
        ///
        /// Background tasks must respond to activation by Cortana within 0.5 seconds, and must
        /// report progress to Cortana every 5 seconds (unless Cortana is waiting for user
        /// input). There is no execution time limit on the background task managed by Cortana,
        /// but developers should use plmdebug (https://msdn.microsoft.com/en-us/library/windows/hardware/jj680085%28v=vs.85%29.aspx)
        /// on the Cortana app package in order to prevent Cortana timing out the task during
        /// debugging.
        /// Cortana dismisses its UI if it loses focus. This will cause it to terminate the background
        /// task, even if the background task is being debugged. Use of Remote Debugging is recommended
        /// in order to debug background task behaviors. In order to debug background tasks, open the
        /// project properties for the app package (not the background task project), and enable
        /// Debug -> "Do not launch, but debug my code when it starts". Alternatively, add a long
        /// initial progress screen, and attach to the background task process while it executes.
        /// </summary>
        /// <param name="taskInstance">Connection to the hosting background service process.</param>
        public async void Run(IBackgroundTaskInstance taskInstance)
        {
            mServiceDeferral = taskInstance.GetDeferral();
            // Register to receive an event if Cortana dismisses the background task. This will
            // occur if the task takes too long to respond, or if Cortana's UI is dismissed.
            // Any pending operations should be cancelled or waited on to clean up where possible.
            taskInstance.Canceled += OnTaskCanceled;
            var triggerDetails = taskInstance.TriggerDetails as AppServiceTriggerDetails;
            // Load localized resources for strings sent to Cortana to be displayed to the user.
            mCortanaResourceMap = ResourceManager.Current.MainResourceMap.GetSubtree("Resources");
            // Select the system language, which is what Cortana should be running as.
            mCortanaContext = ResourceContext.GetForViewIndependentUse();
            var lang = Windows.Media.SpeechRecognition.SpeechRecognizer.SystemSpeechLanguage.LanguageTag;
            mCortanaContext.Languages = new string[] { lang };
            // Get the currently used system date format
            mDateFormatInfo = CultureInfo.CurrentCulture.DateTimeFormat;
            // This should match the uap:AppService and VoiceCommandService references from the
            // package manifest and VCD files, respectively. Make sure we've been launched by
            // a Cortana Voice Command.
            if ((triggerDetails != null) && (triggerDetails.Name == "DictVoiceCommandService"))
            {
                try
                {
                    mVoiceServiceConnection = VoiceCommandServiceConnection.FromAppServiceTriggerDetails(triggerDetails);
                    mVoiceServiceConnection.VoiceCommandCompleted += OnVoiceCommandCompleted;
                    VoiceCommand voiceCommand = await mVoiceServiceConnection.GetVoiceCommandAsync();
                    //var properties = voiceCommand.SpeechRecognitionResult.SemanticInterpretation.Properties.Values.First()[0];
                    // Depending on the operation (defined in DictVoiceCommands.xml)
                    // perform the appropriate command.
                    switch (voiceCommand.CommandName)
                    {
                        case "searchWord":
                        case "translate":
                            // {query} from the matched ListenFor phrase is the word to look up.
                            var keyword = voiceCommand.Properties["query"][0];
                            await SendCompletionMessageForKeyword(keyword);
                            break;
                    }
                }
                catch (Exception ex)
                {
                    System.Diagnostics.Debug.WriteLine("Handling Voice Command failed " + ex.ToString());
                }
            }
        }
        // …
    }
}
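The listing above calls SendCompletionMessageForKeyword without showing its body. As a rough sketch of what such a method might do, and not Bing Dictionary's actual implementation, the helper below reports progress to Cortana, looks the word up, and returns the result as a content tile. The class name CortanaReplySketch, the lookUpAsync delegate, and the English message strings are all assumptions made for illustration. The signature also differs from the original call, which takes only the keyword and presumably reads the connection from the mVoiceServiceConnection field.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Windows.ApplicationModel.VoiceCommands;

namespace BingDictUWP.AppExtensions
{
    // Hypothetical helper for illustration; not taken from the Bing Dictionary source.
    internal static class CortanaReplySketch
    {
        // lookUpAsync is an assumed delegate that returns the definition text
        // for a word, or null/empty when nothing is found.
        public static async Task SendCompletionMessageForKeyword(
            VoiceCommandServiceConnection connection,
            string keyword,
            Func<string, Task<string>> lookUpAsync)
        {
            // Report progress first so Cortana knows the task is still alive
            // while the (possibly slow) lookup runs.
            var progress = new VoiceCommandUserMessage();
            progress.DisplayMessage = progress.SpokenMessage = "Looking up " + keyword;
            await connection.ReportProgressAsync(VoiceCommandResponse.CreateResponse(progress));

            string definition = await lookUpAsync(keyword);

            var reply = new VoiceCommandUserMessage();
            var tiles = new List<VoiceCommandContentTile>();
            if (string.IsNullOrEmpty(definition))
            {
                reply.DisplayMessage = reply.SpokenMessage = "No result found for " + keyword;
            }
            else
            {
                reply.DisplayMessage = reply.SpokenMessage = "Here is what I found for " + keyword;
                // One tile inside Cortana's canvas: the word as title, the definition as text.
                tiles.Add(new VoiceCommandContentTile
                {
                    ContentTileType = VoiceCommandContentTileType.TitleWithText,
                    Title = keyword,
                    TextLine1 = definition
                });
            }

            // Attach tiles only when there is something to show.
            VoiceCommandResponse response = tiles.Count > 0
                ? VoiceCommandResponse.CreateResponse(reply, tiles)
                : VoiceCommandResponse.CreateResponse(reply);
            await connection.ReportSuccessAsync(response);
        }
    }
}
```

In a real service the message strings would more likely come from the localized resources loaded into mCortanaResourceMap than from hard-coded English text.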
You need to declare the App Service in Package.appxmanifest and fill in the service's entry point correctly, as shown in the figure below:
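In XML form, the declaration might look roughly like the fragment below. This is a sketch that assumes the names used in the listings in this article: the AppService Name has to match both the Target in the VCD file and the triggerDetails.Name check in Run, and the EntryPoint is assumed to be the namespace-qualified class name of the background task. The main app project also needs a reference to the Windows Runtime Component project that contains the class.

```xml
<!-- Inside <Application> in Package.appxmanifest. Assumes the uap prefix is
     bound to http://schemas.microsoft.com/appx/manifest/uap/windows10 on <Package>. -->
<Extensions>
  <uap:Extension Category="windows.appService"
                 EntryPoint="BingDictUWP.AppExtensions.DictVoiceCommandService">
    <!-- Name must equal the VCD's VoiceCommandService Target. -->
    <uap:AppService Name="DictVoiceCommandService" />
  </uap:Extension>
</Extensions>
```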
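Closing thoughts
Cortana's Voice Command Definitions still have room for improvement. For example, the grammar has to be written entirely by hand, and there is no natural language understanding capability. If you run into other pain points as a developer, leave us a comment and let's discuss. Who knows, those pain points might be solved in the next version :)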