LargeNGramModel API: Language Model

<span style="font-size:24px;">public class LargeNGramModel implements LanguageModel, BackoffLanguageModel
A language model that uses a binary n-gram language model file (a "DMP file"). Such a file is generated by SphinxBase's sphinx_lm_convert.
Properties of this class:
@S4String(mandatory = false)
public static final String PROP_QUERY_LOG_FILE = "queryLogFile"; Property naming the file in which all queried n-grams are logged. If the value is null, queried n-grams are not logged.
@S4Integer(defaultValue = 100000)
public static final String PROP_NGRAM_CACHE_SIZE = "ngramCacheSize"; Property defining the maximum number of n-grams that can be cached.
@S4Boolean(defaultValue = false)
public static final String PROP_CLEAR_CACHES_AFTER_UTTERANCE = "clearCachesAfterUtterance"; Property controlling whether the n-gram cache is cleared after each utterance.
@S4Double(defaultValue = 1.0f)
public final static String PROP_LANGUAGE_WEIGHT = "languageWeight"; Property defining the language weight used by the search.
@S4Component(type = LogMath.class)
public final static String PROP_LOG_MATH = "logMath"; Property defining the LogMath component.
@S4Boolean(defaultValue = false)
public final static String PROP_APPLY_LANGUAGE_WEIGHT_AND_WIP = "applyLanguageWeightAndWip"; Property controlling whether the language model applies the language weight and the word insertion probability.
@S4Double(defaultValue = 1.0f)
public final static String PROP_WORD_INSERTION_PROBABILITY = "wordInsertionProbability"; Property for the word insertion probability.
@S4Boolean(defaultValue = false)
public final static String PROP_FULL_SMEAR = "fullSmear"; If true, full bigram information is used to determine the smear.
public static final int BYTES_PER_NGRAM = 4;
public static final int BYTES_PER_NMAXGRAM = 2;
The number of bytes occupied by each n-gram in a language model file generated by the CMU-Cambridge Statistical Language Modeling Toolkit.
private final static int SMEAR_MAGIC = 0xC0CAC01A;
The smear magic number. "Things will go better."
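Returning to the languageWeight, wordInsertionProbability and applyLanguageWeightAndWip properties listed above, the following is a minimal sketch of how such weights are conventionally applied to a log-domain score. The class and method names are hypothetical, natural logarithms stand in for whatever base the configured LogMath uses, and the formula is the usual decoder convention rather than something confirmed by these notes.

```java
// Hypothetical sketch; not part of Sphinx4.
final class WeightSketch {
    /**
     * Scales a log-domain probability by the language weight and adds the
     * log of the word insertion probability (assumed convention).
     */
    static double applyWeights(double logProb, double languageWeight, double wip) {
        return logProb * languageWeight + Math.log(wip);
    }

    public static void main(String[] args) {
        double logProb = Math.log(0.01);
        // With both defaults at 1.0 the score is returned unchanged.
        System.out.println(applyWeights(logProb, 1.0, 1.0));
        // A larger weight and a smaller insertion probability both lower the score.
        System.out.println(applyWeights(logProb, 8.0, 0.5));
    }
}
```

Under this convention the defaults (both 1.0) leave every score untouched, which fits applyLanguageWeightAndWip defaulting to false.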
Configuration data:
URL location;
protected Logger logger;
protected LogMath logMath;
protected int maxDepth;
protected int ngramCacheSize;
protected boolean clearCacheAfterUtterance;
protected boolean fullSmear;
protected Dictionary dictionary;
protected String format;
protected boolean applyLanguageWeightAndWip;
protected float languageWeight;
protected float unigramWeight;
protected double wip;
Statistics:
private int ngramMisses;
private int ngramHits;
private int smearTermCount;
protected String ngramLogFile;
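The ngramHits and ngramMisses counters are easiest to read next to a backoff lookup: a hit means the requested n-gram is stored, a miss means the model falls back to a shorter n-gram plus a backoff weight. The sketch below is a toy model under stated assumptions. The class name, the string keys and the map layout are all hypothetical and do not reflect how the binary file is actually read.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model; names and data layout are illustrative only.
final class BackoffSketch {
    private final Map<String, Float> logProbs = new HashMap<>();    // n-gram -> log probability
    private final Map<String, Float> logBackoffs = new HashMap<>(); // history -> log backoff weight
    int hits;
    int misses;

    void add(String ngram, float logProb, float logBackoff) {
        logProbs.put(ngram, logProb);
        logBackoffs.put(ngram, logBackoff);
    }

    /** Returns the log probability of a space-separated word sequence, backing off on a miss. */
    float getProbability(String ngram) {
        Float direct = logProbs.get(ngram);
        if (direct != null) {
            hits++; // every successful lookup counts, including the one that ends a backoff chain
            return direct;
        }
        int firstSpace = ngram.indexOf(' ');
        if (firstSpace < 0) {
            throw new IllegalArgumentException("unknown unigram: " + ngram);
        }
        misses++;
        String history = ngram.substring(0, ngram.lastIndexOf(' ')); // all words but the last
        String shorter = ngram.substring(firstSpace + 1);            // all words but the first
        float backoff = logBackoffs.getOrDefault(history, 0.0f);
        return backoff + getProbability(shorter);
    }

    public static void main(String[] args) {
        BackoffSketch lm = new BackoffSketch();
        lm.add("a", -1.0f, -0.5f);
        lm.add("b", -2.0f, -0.3f);
        lm.add("a b", -0.7f, 0.0f);
        System.out.println(lm.getProbability("a b")); // stored bigram: one hit
        System.out.println(lm.getProbability("b a")); // unseen bigram: one miss, then a unigram hit
        System.out.println("hits=" + lm.hits + " misses=" + lm.misses);
    }
}
```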
Subcomponents:
private BinaryLoader loader;
private PrintWriter logFile;
Runtime data:
private Map<Word, UnigramProbability> unigramIDMap;
private Map<WordSequence, NGramBuffer>[] loadedNGramBuffers;
private LRUCache<WordSequence, ProbDepth> ngramDepthCache;
private Map<Long, Float> bigramSmearMap;
private NGramBuffer[] loadedBigramBuffers;
private UnigramProbability[] unigrams;
private int[][] ngramSegmentTable;
private float[][] ngramProbTable;
private float[][] ngramBackoffTable;
private float[] unigramSmearTerm;
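The ngramDepthCache field above is an LRU cache bounded by ngramCacheSize. As a rough illustration of that idea, here is a minimal sketch built on java.util.LinkedHashMap. The class LruCacheSketch is hypothetical, and the LRUCache class used by LargeNGramModel may be implemented differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a bounded n-gram cache; not the class used by Sphinx4.
final class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    LruCacheSketch(int maxSize) {
        super(16, 0.75f, true); // accessOrder = true: iteration runs from least to most recently used
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize; // evict the least recently used entry once the cap is exceeded
    }

    public static void main(String[] args) {
        LruCacheSketch<String, Float> cache = new LruCacheSketch<>(2);
        cache.put("a b", -1.0f);
        cache.put("b c", -2.0f);
        cache.get("a b");        // touch "a b" so that "b c" becomes the eldest entry
        cache.put("c d", -3.0f); // exceeds the cap and evicts "b c"
        System.out.println(cache.keySet());
    }
}
```

Clearing such a cache after each utterance, as clearCachesAfterUtterance allows, would simply be a call to clear().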
Constructors:
public LargeNGramModel(String format, URL location, String ngramLogFile,
int maxNGramCacheSize, boolean clearCacheAfterUtterance, int maxDepth, LogMath logMath, Dictionary dictionary, boolean applyLanguageWeightAndWip, float languageWeight, double wip, float unigramWeight, boolean fullSmear); Creates the object from the given parameters.
public LargeNGramModel(); The empty constructor.
Methods:
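public void newProperties(PropertySheet ps); Sets the properties from the given property sheet.
public void allocate(); Allocates resources.
private void buildUnigramIDMap(Dictionary dictionary); Builds the map from each Word to its UnigramProbability, that is, populates unigramIDMap with Word/UnigramProbability pairs.
public void start(); Called before recognition.
public void stop(); Called after recognition. Clears the cache and flushes the log file.
private void clearCache(); Clears the n-gram cache.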
public ProbDepth getProbDepth(WordSequence wordSequence); Returns the probability and the prediction depth for the word sequence. Used for higher-order n-grams. wordSequence is the word sequence whose probability is requested.
private ProbDepth getUnigramProbDepth(WordSequence wordSequence); Returns the unigram probability of the given unigram as a ProbDepth, taken from the first word of wordSequence. wordSequence is the unigram word sequence.
public float getProbability(WordSequence wordSequence); Gets the n-gram probability of the word sequence. The probability is in the log domain.
private NGramProbability findNGram(WordSequence wordSequence); Finds or loads the NGramProbability of the given n-gram and returns it. wordSequence is the n-gram to load.
private boolean is32bits(); Tells whether the model is 16-bit or 32-bit. Returns true if it is 32-bit.
private NGramBuffer loadNGramBuffer(WordSequence ws); Loads all n-gram followers of the given (n-1)-gram into a buffer and returns them as an NGramBuffer. ws is the (n-1)-gram whose followers are looked up.
private NGramBuffer getBigramBuffer(int firstWordID); Returns the bigrams of the given word. firstWordID is the ID of the word.
private NGramBuffer getNGramBuffer(WordSequence wordSequence); Returns the NGramBuffer holding the n-grams of the given word sequence. wordSequence is the word sequence whose buffer is requested.
private int getFirstNGramEntry(NGramProbability nMinus1Gram, int firstNMinus1GramEntry, int n); Returns the index of the first n-gram entry of the given (n-1)-gram. nMinus1Gram is the (n-1)-gram whose first n-gram entry is sought, firstNMinus1GramEntry is the first (n-1)-gram entry of the (n-1)-gram under consideration, and n is the order of the n-gram.
private UnigramProbability getUnigram(Word unigram); Returns the UnigramProbability of the given unigram if this language model contains it, otherwise null. unigram is the word to look up.
private boolean hasUnigram(Word unigram); Returns true if this language model contains the given unigram word, otherwise false.
public final int getWordID(Word word); Returns the ID of the given word.
public float getSmearOld(WordSequence wordSequence); Returns the smear term associated with the given word sequence.
public float getSmear(WordSequence wordSequence); Returns the smear term associated with the input word sequence.
private int getNumberBigramFollowers(int wordID); Returns the number of bigram followers of a word. wordID is the ID of the word.
public int getMaxDepth(); Returns the maximum depth of the language model, that is, the maxDepth field.
public Set<String> getVocabulary(); Returns the set of word spellings in the language model. The set is unmodifiable. Each String is the spelling of a word.
public int getNGramMisses(); Returns the number of times an n-gram was queried but was not present in the language model, in which case the backoff probability is used. This is the ngramMisses field.
public int getNGramHits(); Returns the number of n-gram hits, that is, the ngramHits field.
private NGramBuffer loadTrigramBuffer(int firstWordID, int secondWordID); Loads all trigram followers of the given bigram into a buffer. firstWordID is the ID of the first word and secondWordID is the ID of the second word. Returns the NGramBuffer holding the trigram followers.
private void buildSmearInfo(); Builds the smear information.
private void dumpProbs(double[] ugNumerator, double[] ugDenominator, int i, int j, float logugprob, float logbgprob, double ugprob, double bgprob, double backoffbgprob, double logbackoffbgprob); Prints the probability information.
private void writeSmearInfo(String filename); Writes the smear information to the given file.
private void readSmearInfo(String filename); Reads the smear information from the given file.
private void putSmearTerm(int word1, int word2, float smearTerm); Stores the smear term for a pair of words.
private Float getSmearTerm(int word1, int word2); Gets the smear term for a pair of words.
private float getBigramProb(int word1, int word2); Gets the bigram probability of the two given words.
public void deallocate(); Releases the corresponding resources by calling loader.deallocate().</span>
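Since bigramSmearMap is declared as Map<Long, Float> while putSmearTerm and getSmearTerm take two int word IDs, one plausible way to key the map is to pack both IDs into a single long. The sketch below shows that idea only. The class SmearMapSketch and its key method are hypothetical, and the packing scheme is an assumption rather than something stated in these notes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a bigram smear map keyed by two packed word IDs.
final class SmearMapSketch {
    private final Map<Long, Float> smearMap = new HashMap<>();

    /** Packs word1 into the high 32 bits and word2 into the low 32 bits (assumed scheme). */
    static long key(int word1, int word2) {
        return ((long) word1 << 32) | (word2 & 0xFFFFFFFFL);
    }

    void putSmearTerm(int word1, int word2, float smearTerm) {
        smearMap.put(key(word1, word2), smearTerm);
    }

    /** Returns the stored smear term, or null if none was stored for this ordered pair. */
    Float getSmearTerm(int word1, int word2) {
        return smearMap.get(key(word1, word2));
    }

    public static void main(String[] args) {
        SmearMapSketch smear = new SmearMapSketch();
        smear.putSmearTerm(1, 2, 0.5f);
        System.out.println(smear.getSmearTerm(1, 2)); // the stored term
        System.out.println(smear.getSmearTerm(2, 1)); // the reversed pair is a different key
    }
}
```

Masking word2 keeps a negative ID from overwriting the high half of the key, so the pair order is always preserved.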
