
How to Provide Training Data for a Chatbot

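What Is Conversation, Really?

Our lives are full of conversation: chatting with a boyfriend while making dinner, ordering a roast duck from a fast-food restaurant, presenting a summary of the company's quarterly sales. Conversation is everywhere. Conversations vary in length, topic, importance, and setting, yet we rarely stop to ask: why am I having this conversation? What is my goal?

In this article, we look at conversation as coordinating joint action. Conversation is dynamic, full of signals and interaction. We can begin a conversation the way we planned, but often we cannot be sure where it will end. "Chatbot" and "conversation" are nouns, but to understand them well it helps to think of them as verbs. How do we interact with other people? How do we make sure a conversation goes in the direction we want?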

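About Conversation

A large part of any conversation is its context. Imagine you are at a dance and want to ask someone to dance. You may only need to walk over, nod, and say "May I?" for your partner to understand that you are inviting her to dance. Ask a stranger on the street the same question, though, and she will probably be confused, unsure what you are asking for, or will take it as a friendly greeting. That is a conversation that does not fit its context.

The same holds for chatbots. When you chat with a travel agency's or an airline's chatbot, the context implies that it should be able to book you a hotel or rebook your flight. You should not expect it to discuss political news or calculus in depth. Context is essential to a successful conversation, so what is conversation actually about? We can usually divide it into four main components: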

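1. Exchanging greetings: This part is easy to understand. When you say "Good morning" and the other person replies "Have you eaten?", both of you are saying hello. Greetings take many forms, but the goal is always the same: establishing a good relationship.

2. Exchanging information: When you ask "What are you doing tonight?", you expect the other person to give you a matching answer. Conversation is a process of asking questions and giving answers.

3. Prompting action: When you say "Let's go shopping together tomorrow" or "Could you grab my laptop for me?", part of the conversation is about making plans and asking someone to do something.

4. Establishing opinions: When you say "I agree, grapes taste better than apples", you are establishing your point of view.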

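Of course, conversation cannot be forced into four cleanly separated categories. But when you build a chatbot, it is important to be aware of how individual utterances can be classified: your AI needs to understand these four parts and how they relate to one another. Above all, remember that the key to conversation is coordinating joint action. A chatbot has to take what the other party says as input and coordinate what happens next and which action to take. Is a refund being requested? Does the person need to provide more information before the chatbot acts? Does a human need to step in to resolve the issue? A chatbot has to make these judgments before it acts.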

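Why Build a Chatbot?

If you think of chatbots as a bit rigid, that is understandable. Chatbots and intelligent assistants have been heavily hyped, and you have reason to wonder how useful a chatbot really is. Before a company commits people to building one, it should understand why a chatbot is worth building.

Here are a few of the main reasons:

Most companies first looked into chatbots for customer service. A chatbot can help a prospective customer check whether the right clothing size will be in stock next week, rebook a hotel, or even handle more complex and sensitive financial questions.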

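Once chat becomes an important part of customer service, a company can greatly reduce what it spends serving each customer. A chat agent can serve several customers at the same time, and a chat costs only about 30% of what a phone interaction costs. A chatbot raises efficiency further, because within a given context most questions are repeated over and over and are predictable (a hotel booking site handles countless questions about cancellations, room upgrades, and check-in times). In many specific contexts a chatbot can handle most of the problems customers run into, which frees your customer service staff for the more complex work that really needs a human.

Stanford professor Chris Potts has a rule of thumb: routine events can be handled with routine language, while non-routine events call for non-routine language. Chatbots generally have no trouble with routine events; it is the non-routine ones they struggle with.

Those non-routine events are exactly what human agents are able to resolve. By routing different kinds of conversation to different channels (phone, chat window, chatbot), you let your agents focus on the harder problems, and because you no longer need as many agents, you save money. In addition, a chatbot works 24 hours a day, works through the Spring Festival holiday, never calls in sick, and never misses a shift. That all sounds good.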

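There is another fact to consider: messaging apps have become very popular in recent years. In fact, according to business surveys, messaging apps are now more popular than social networks. In other words, the people using messaging apps are your customers, and a chatbot can be added to those apps seamlessly. If your customers spend more time on WhatsApp and Messenger than on Twitter and LinkedIn, why not make good use of that? Chatbots belong there. A messaging app lets customers reach you conveniently, without downloading your company's own chat software or calling you on the phone. So why not choose a chatbot? It rescues your customer service staff from repetitive, tedious questions, and it lets your customers interact with your company at any time inside the popular chat apps they already use.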

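All Right, Let's Start Building a Chatbot!

The first question to think about is why we are building a chatbot at all. Chatbots are most often used for customer service: helping consumers make decisions, booking hotels during a trip, answering questions for a large SaaS vendor, or any scenario where many employees have to interact with customers to solve problems. Remember that the goal you want to achieve and the thing you set out to build matter a great deal. You do not need to build a chatbot like Siri, and if you have ever asked Siri a question about a narrow business scenario, you will have found that it cannot give a satisfying answer either. You also need to decide how your chatbot will be scored, whether by customers served per hour, NPS, or some other metric. You need to monitor these metrics, and a well-built chatbot can give you continuous feedback on many of the measures that matter. In this article we use a chatbot for an airline as the running example to outline how to make your chatbot smarter, more flexible, and more robust, so that it ultimately meets your business needs.

There are all kinds of chatbots around us. They can tell you tomorrow's weather, send you regular updates on a news story, schedule your meetings, manage your finances, or, if you like, talk with you and become a friend. What we discuss here is what underlies all of these chat applications: conversation, and some principles and methods for training a chatbot to converse.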

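Once the goal is set, consider what everyday interactions can teach us. One "trick" is that behind a chatbot there are many answers that engineers programmed in advance. When you say to Siri, "Tell me a joke," Siri does not actually think up, on the spot, a joke it heard last weekend. In reality, Siri consults a lookup table in the background: Apple's engineers anticipated the question and wrote it into Siri's knowledge base. For most companies, this approach is workable. Recall the principle quoted above, that routine events can be handled with routine language: you can write answers for the questions that come up often in a given use case. This works because we can guess in advance how customers will interact with us, or because we know how customers normally behave. Your customer service staff usually know which questions customers ask most, and your ticketing system or other data analysis can show you the common questions and the ways customers phrase them. So all we need to do is encode as many cases as possible, right?

In practice, a chatbot is not that simple. What we have described so far is only an information retrieval system, or a search system, and a successful chatbot is not a search bar. It needs interaction, it needs conversation, and it needs to coordinate joint action. Below we introduce four kinds of algorithm training. You will usually get the training data from your own database or from a data service provider, and the data is what lets your chatbot hold a conversation and interact with customers. The four are: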

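    Utterances: How many ways are there to say the same thing? Your chatbot needs to understand as many phrasings as possible, or it will always be confused.

    Relevance: Is a particular answer relevant to a particular question?

    Intent detection: Does your chatbot understand your customer's intent or goal? If it does not understand what the customer wants to do, it cannot coordinate joint action.

    Entity extraction: "I really want to eat an apple" and "This apple is really tasty" do not mean the same thing. Entity extraction helps the algorithm understand the fine distinctions in language.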

The Four Language Tasks for Training a Chatbot

1: Utterance, or, How Many Ways Can You Order a Pizza?

To work at all, your chatbot needs to understand what users are asking it to do. And while you can likely easily identify the most frequent, most normal requests from a user, it's tough to come up with every permutation of those core questions on your own. That's what utterance data collection is all about. The task is simple: set up a task where a bunch of people come up with different ways to ask the same question. What's the question? That's up to you and your team. But you'd be surprised just how many ways there are to ask for the simplest things.

KEY USE CASES

• Transforming FAQ content into a chatbot (you've already written answers, but want to match them to lots of different questions)

• Building up voice/text activation for a new feature (how many ways are there to ask for a song to play?)

An example? Reddit's Random Acts of Pizza, where people ask for pizzas and potentially the community responds. If we look at 5,671 requests for pizza, we'll find that 99.4% of all of them have unique titles. In fact, there were only four repeats at all! Inside the body of the posts themselves, the only repetitions across more than 27,000 sentences are basically just greetings and assorted gratitude:

[Figure: repeated sentences in Random Acts of Pizza posts]

This is a good example of the breadth of even simple requests. "Please pass the salt" and "Salt!" are both ways to make a request, after all, but they feel rather different. And while people will interact with chatbots differently than with people (think about how you search for shoes or use Google; it's not exactly how you talk to your friends), accruing a database of the ways people ask for things gives your chatbot fuel to answer those requests in kind.

Now, a section or two ago, we mentioned that we're going to use this eBook to demonstrate how to create the data you need to train a chatbot. We chose to create data around an airline customer service chatbot, but of course, you can do utterance tasks for whatever utterances you want to capture. For our example job, we chose to ask for ways to ask "can I change my flight?" Again, there are no specifics here (like "I need to change flight 563" or "I have to fly to Vegas instead"), so the pool of utterance data is artificially limited a bit, but here's how you do it:

[Figure: example utterance collection task for "can I change my flight?"]

Pretty easy, right? Now, one of the things we pride ourselves on is quality control. But with utterance tasks, that can be tough. You can't come up with the "correct" ways to ask this question (in fact, you're trying to accrue just that data), so you can't use the typical test question format most jobs take. We get around this in a pretty simple way: two different, intertwined jobs.
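Once utterances have been collected, a quick sanity check is to see how many distinct phrasings you actually have. The following is a minimal sketch, not the eBook's method: the `normalize` and `summarize_utterances` helpers and the sample utterances are hypothetical, and the normalization (lowercasing, stripping punctuation) is just one reasonable assumption about what counts as "the same" phrasing.

```python
import re
from collections import Counter


def normalize(utterance: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace (an assumed notion of sameness)."""
    text = utterance.lower()
    text = re.sub(r"[^\w\s]", "", text)
    return re.sub(r"\s+", " ", text).strip()


def summarize_utterances(utterances):
    """Return counts per normalized phrasing and the share of distinct phrasings."""
    counts = Counter(normalize(u) for u in utterances)
    unique_ratio = len(counts) / len(utterances) if utterances else 0.0
    return counts, unique_ratio


# Hypothetical collected answers to "how would you ask to change your flight?"
collected = [
    "Can I change my flight?",
    "can i change my flight",
    "I need to move my flight to a later date.",
    "How do I rebook?",
]

counts, ratio = summarize_utterances(collected)
print(f"{len(counts)} distinct phrasings out of {len(collected)} ({ratio:.0%})")
```

A low ratio suggests contributors are copying one another or the prompt; a high ratio, as in the pizza data above, is what you want from an utterance job.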

Next, let’s look at relevance:

2: Relevance, or, Are We Making Sense or Not?

Once you have a set of utterances, you want to be able to match them with answers and actions. Relevance tasks do this by giving you training data you can use to map utterances that users might say to the help pages and action triggers in your database. They are usually of the form, "here's a question, here's an answer, how relevant is it?"

In doing this mapping, you are likely to find that certain flavors of questions need longer or shorter responses. The more a response just looks like "the best matching paragraphs" or "an adjacent answer from our FAQ section," the less direct help it offers, the less human it feels, and the less satisfied your user is.

To get a sense of how people know what to say, let's look at the four maxims Paul Grice developed that people follow when talking. If you flout these maxims, things get weird.

1. Quantity: be as informative as you possibly can and give as much information as is needed, and no more

2. Quality: try to be truthful and don't give information that is false or that is not supported by evidence

3. Relation: try to be relevant and say things that are pertinent to the discussion

4. Manner: try to be as clear, as brief, and as orderly as you can in what you say and avoid obscurity and ambiguity

We can reduce these even more. For Dan Sperber and Deirdre Wilson, the central thing is "Be relevant". Or more formally:

The issue for chatbots is they can have trouble understanding context. They're certainly worse at it than we are. And because of that, some of their responses are, well, irrelevant. And irrelevant responses make for bad conversations. They don't coordinate joint action.

This is one of the reasons it's much simpler to create a chatbot to handle discrete issues (like rescheduling a flight) than one that just wants to talk about any old thing.

You see similar tasks in search relevance projects: given a query, does this result match? Is it relevant? Doing that with chatbot question/answer pairings gives you the tools you need to tweak your models and make them more accurate. It also will show you where your model is falling down and where it's succeeding.
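To make the "here's a question, here's an answer, how relevant is it?" format concrete, here is a minimal sketch of how such judgments might be aggregated. Everything in it is an assumption for illustration: the 1 to 3 rating scale, the `aggregate_relevance` helper, the answer identifiers, and the threshold are hypothetical and not taken from any particular annotation platform.

```python
from collections import defaultdict
from statistics import mean


def aggregate_relevance(judgments, threshold=2.0):
    """Average annotator ratings per (question, answer) pair and list the weak pairs.

    judgments: iterable of (question, answer_id, rating), rating on an assumed 1-3 scale.
    """
    by_pair = defaultdict(list)
    for question, answer_id, rating in judgments:
        by_pair[(question, answer_id)].append(rating)
    scores = {pair: mean(ratings) for pair, ratings in by_pair.items()}
    weak = sorted(pair for pair, score in scores.items() if score < threshold)
    return scores, weak


# Hypothetical judgments from two annotators per pairing.
judgments = [
    ("can I change my flight?", "faq_rebooking", 3),
    ("can I change my flight?", "faq_rebooking", 3),
    ("can I change my flight?", "faq_baggage", 1),
    ("can I change my flight?", "faq_baggage", 2),
]

scores, weak = aggregate_relevance(judgments)
for pair in weak:
    print("needs a better answer:", pair, scores[pair])
```

The weak pairs are the places where the model is "falling down"; they are candidates for a new answer, a retrained ranker, or a hard-coded response.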

3: Intent, or, What Were You Trying to Do Anyway?

When we're engaging with people in joint activities like conversation, we are (or become) attuned to their intentions. That's what's behind the comedy of something like Lucy and Charlie Brown's "I know you know I know you know" chains of reasoning. Other minds aren't entirely opaque to us, even if we tend to fill them in with our own projections.

Much like the last example, you see intent work in information retrieval projects like internal search relevance tasks. Basically: does this output match the intent of what someone wanted? When someone searches for an iPhone and they're presented with an iPhone case, does that match their intent? The same is true for chatbot replies. Given a question from your utterance corpus, how relevant is the answer your model or hardcoded bot returns?

The reality is that relevance isn't quite enough for chatbots. Conversation is simply too complicated for simple relevance to make chatbot responses good enough.

Take the airline customer chatbot we're building. Imagine a customer typing "baggage fees?" What do they actually mean? Are they asking what the baggage fees for a particular flight are? Are they demanding a refund for baggage fees they were recently charged? A chatbot that doesn't understand context and intent might just send the customer to an FAQ about baggage fees. And that customer isn't going to be particularly enthused about that interaction.

Intent and relevance are intrinsically linked. You want to start the process by identifying which flags your chatbot will be able to support. Do you want to handle your top ten issues? Top five? You want to tackle as many permutations of those conversations as possible in your relevance and intent tasks. And keep in mind, these tasks are sometimes even more valuable for tuning your bot after it's been released or with test conversations you conduct with it. You'll be able to analyze whole conversations, find out where they fall down, and give annotators fuller conversations to understand customer intent.

Because, really, that's an important point here: intent shows itself most clearly in the context of a full conversation. That "baggage fees?" comment means a much different thing based on particular, individual conversations.


Intent tasks often present annotators with conversations (or snippets thereof) and ask them whether the chatbot is understanding the intent of the customer. In the places it did not, it's important to understand where and why your bot hit a snag. Once that's understood, you can hone your models or hard-code answers to deal with precisely those issues.

Last thing: remember that point we made about your chatbot's personality? That plays here. If your chatbot isn't sure it's going to be relevant (essentially, it's unconfident about output) or is at sea over intent, just ask! Chatbots that deal with requests by asking a series of probing questions to find the exact thing that user is looking to do are far, far more successful than those that make pseudo-guesses where they're not fully confident. When in doubt, your chatbot should aim towards further clarity, not action.
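The "when in doubt, ask" rule can be sketched as a simple confidence check. This is only an illustration under stated assumptions: the intent names, the scores, the `decide_next_step` helper, and the 0.7 threshold are all hypothetical, and a real system would get its scores from whatever intent model it uses.

```python
def decide_next_step(intent_scores, min_confidence=0.7):
    """Act on the top intent only when its score clears the threshold; otherwise clarify.

    intent_scores: dict mapping a (hypothetical) intent name to a model score in [0, 1].
    Returns ("act", [intent]) or ("clarify", [top candidate intents to ask about]).
    """
    ranked = sorted(intent_scores.items(), key=lambda item: item[1], reverse=True)
    if ranked and ranked[0][1] >= min_confidence:
        return "act", [ranked[0][0]]
    return "clarify", [name for name, _ in ranked[:2]]


# "baggage fees?" is ambiguous, so the bot should ask which of the two is meant.
ambiguous = {"baggage_fee_info": 0.48, "baggage_fee_refund": 0.45, "other": 0.07}
print(decide_next_step(ambiguous))

# A clear request can be acted on directly.
clear = {"change_flight": 0.93, "other": 0.07}
print(decide_next_step(clear))
```

Returning the top two candidates lets the bot phrase a targeted question ("Do you want to know the fees, or are you asking about a refund?") instead of a generic "I didn't understand."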

4: Entity Recognition, or, Which Washington is this Washington?

Entity recognition is the last major training job for your algorithm. Essentially, it involves looking at passages of text and identifying "entities" within. Those might be places, people, product names, you name it, but it generally works best looking for specific entities that are valid for your particular use case.

Take our example use case of an airline service chatbot. If you tell it that you're looking to go to Washington, what does that mean? Because it could mean any of the following:

[Figure: possible meanings of "Washington"]

You get the idea. Now, if you're building a chatbot that's looking to engage over American history, Washington has a totally different meaning. Ditto for a bot looking to give out college sports scores. The list goes on.

For starters, this is why more generic, multi-purpose bots are so difficult and why context is so important for any chatbot. But it's also why you need to work on entity extraction for your chatbot project. In fact, named entity recognition is one of the basic building blocks of natural language processing and it allows your bot to function properly.

We've created an entity extraction tool that's very similar to a popular one you may have heard of called BRAT. Essentially, on our platform, you provide users with text blocks and they highlight the entities you care about. You can see an example below:

[Figure: screenshot of the entity extraction tool]

In that screenshot, we're interested in a few salient things to build into our airline chatbot. Note especially that numbers are important here. Is it a flight number? An arrival time? An amount of ounces for carry-on sunscreen? The more examples of named entities your model sees, the more it learns to understand that sometimes people typing won't write "7:25" and instead just write "725", but your bot will actually understand. That increases your bot's accuracy, its ability to actually converse, and, yes, makes it function in the way it's supposed to: coordinating joint action.
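As a small illustration of the "725" versus "7:25" point, here is a rule-based sketch that normalizes a token to a clock time when it can be read as one. It is an assumption-laden stand-in, not a trained entity model: the `normalize_time` helper is hypothetical, it handles only 24-hour style digits, and it cannot tell a time from a flight number without context, which is exactly why labeled examples matter.

```python
import re

# One or two hour digits, an optional colon, then exactly two minute digits.
TIME_PATTERN = re.compile(r"(\d{1,2}):?(\d{2})")


def normalize_time(token: str):
    """Return 'H:MM' if the token can be read as a clock time, else None."""
    match = TIME_PATTERN.fullmatch(token.strip())
    if not match:
        return None
    hour, minute = int(match.group(1)), int(match.group(2))
    if hour > 23 or minute > 59:
        return None
    return f"{hour}:{minute:02d}"


for token in ["7:25", "725", "1925", "563", "Vegas"]:
    print(token, "->", normalize_time(token))
```

Note that "563" is rejected only because 63 is not a valid minute; a flight number such as "725" would still be read as a time, so a real system needs the surrounding words to decide.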

CONCLUSION

Nice as it would be, you can't just buy chatbot software out of a box and simply deploy it. You need to test, tune, and train your chatbot. Hopefully, this eBook gave you an understanding of how that's actually done. But we do want to highlight a few of the key takeaways we'd love to leave you with now that we're finished:

• Conversations are about coordinating joint action. The best chatbots have real conversations and, thus, coordinate real joint actions.

• When in doubt, make sure your chatbot is curious. A curious chatbot understands what a user really wants before acting. And people are much more willing to answer a few extra questions than deal with bad outcomes.

• There are four major chatbot data projects. Each is important. They are:

    • Utterance tasks: How many ways are there to say a thing?

    • Relevance tasks: Does this response even make sense?

    • Intent tasks: What did the user want to happen here?

    • Entity extraction: What are these particular words exactly?
