天天看點

sklearn.feature_extraction.text.TfidfVectorizer.fit_transform(text)


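from sklearn.feature_extraction.text import TfidfVectorizer


def normal_test():
    # Two short documents form the corpus
    corpus = [
        'This is the first document.',
        'This document is the second document.',
    ]
    # Learn the vocabulary and idf weights, then return the tf-idf matrix
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(corpus)  # scipy sparse matrix, shape (2, 6)
    print(X)


normal_test()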
           

output:

  (0, 0)        0.40909010368335985
  (0, 1)        0.5749618667993135
  (0, 4)        0.40909010368335985
  (0, 2)        0.40909010368335985
  (0, 5)        0.40909010368335985
  (1, 3)        0.4691317250431934
  (1, 0)        0.6675821723880022
  (1, 4)        0.3337910861940011
  (1, 2)        0.3337910861940011
  (1, 5)        0.3337910861940011
           


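  • What it does:

    Computes the tf-idf weight of every word in each document it appears in: the word's term frequency (tf) in that document multiplied by its inverse document frequency (idf) across the whole corpus, so words that appear in every document get a low weight and rarer words a higher one. By default each document's row is then L2-normalized. The result is a sparse matrix with one row per document and one column per vocabulary term; each printed line "(i, j)  w" means document i contains term j with weight w.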
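To make the calculation concrete, the following sketch recomputes the matrix by hand with NumPy. It assumes scikit-learn's documented defaults (`smooth_idf=True`, `sublinear_tf=False`, `norm='l2'`), under which idf(t) = ln((1 + n) / (1 + df(t))) + 1 with the natural log, n the number of documents and df(t) the number of documents containing t. It is an illustration of the formula, not scikit-learn's actual implementation (which works on sparse matrices via `TfidfTransformer`).

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = [
    'This is the first document.',
    'This document is the second document.',
]

# Raw term counts tf(t, d); CountVectorizer uses the same default
# tokenizer and alphabetical vocabulary order as TfidfVectorizer.
counts = CountVectorizer().fit_transform(corpus).toarray().astype(float)
n_docs = counts.shape[0]

# Document frequency df(t): number of documents that contain term t
df = (counts > 0).sum(axis=0)

# Smoothed idf: ln((1 + n) / (1 + df)) + 1
idf = np.log((1 + n_docs) / (1 + df)) + 1

# Multiply tf by idf, then L2-normalize each document's row
tfidf = counts * idf
tfidf /= np.linalg.norm(tfidf, axis=1, keepdims=True)

reference = TfidfVectorizer().fit_transform(corpus).toarray()
print(np.allclose(tfidf, reference))  # True
print(tfidf.round(4))
```

For document 0, "first" has df = 1, so idf = ln(3/2) + 1 ≈ 1.4055, while the other four words have df = 2 and idf = 1. The row norm is sqrt(4 + 1.4055²) ≈ 2.4444, which gives 1.4055 / 2.4444 ≈ 0.5750 and 1 / 2.4444 ≈ 0.4091, matching the output above.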