A new field to work on - J. V. King

Now that I am in the US, keeping a blog might also help me maintain my Chinese. That is a nice side benefit.

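The first project my lab advisor gave me turned out to be machine translation. It is a field I have always been interested in but never worked on, and one that some colleagues have told me is not very interesting = =! Still, to each their own, and since the work involves translating between Chinese and English, my Chinese background can actually be of some help. A while ago I finished watching the online course videos by Dan Jurafsky and Chris Manning. I did not fully master the material and probably retained only about 50% of it, but I did get some sense of what natural language processing looks like. Some notes: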

I made some analogies between speech processing and natural language processing. To me, linguistics seems to play the same role in NLP that signal processing plays in speech science. Linguistics provides ways to extract features, as well as objective or subjective metrics for system evaluation. It's the "heuristic" or "not so automatic" part of NLP, just like signal processing in speech. Linguistics also provides techniques for preprocessing raw NLP data and for post-processing the final outcome. All the other parts of NLP relate to machine learning.

Problems in NLP seem to have even more flexibility than those in speech processing. In speech recognition or synthesis, there is not that much variability in the output text or sounds, but an NLP outcome may have several forms or interpretations. Thus there might be more unsupervised or heuristic learning methods applied in NLP than in speech processing.

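Recently I have been reading Peter Brown's early machine translation papers, hoping to get a sense of this specific area. A main task going forward is to finish these two papers and to survey the conferences across the NLP field (conference quality, paper acceptance rates, when the yearly deadlines fall, and so on). I also want to do a field survey of machine translation to see which hot topics people are working on, try to divide the subfield into categories, read a few survey or journal papers in each category, and then pick a direction of my own. Another thing to research is whether there is any open-source code, meaning baseline tools like HTK in speech or OpenCV in image processing.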

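Another important point is to start getting hands-on with our current system. Choosing a direction based on both the papers I read and the system itself will probably be more reliable. I will need regular sync meetings with my advisor on these questions.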