
Machine Learning & Data Mining: A Summary Outline of Key Knowledge Points

Basis (Fundamentals):

  • SSE (Sum of Squared Error)
  • SAE (Sum of Absolute Error)
  • SRE (Sum of Relative Error)
  • MSE (Mean Squared Error)
  • RMSE (Root Mean Squared Error)
  • RRSE (Root Relative Squared Error)
  • MAE (Mean Absolute Error)
  • RAE (Relative Absolute Error)
  • MRSE (Mean Relative Squared Error)
  • Expectation & Variance
  • Standard Deviation
  • CP (Conditional Probability)
  • JP (Joint Probability)
  • MP (Marginal Probability)
  • Bayes' Formula
  • CC (Correlation Coefficient)
  • Quantile
  • Covariance & Covariance Matrix
  • GD (Gradient Descent)
  • SGD (Stochastic Gradient Descent)
  • LMS (Least Mean Squares)
  • LSM (Least Squares Method)
  • NE (Normal Equation)
  • MLE (Maximum Likelihood Estimation)
  • QP (Quadratic Programming)
  • L1/L2 Regularization (and further variants, e.g. the recently popular L2.5 regularization)
  • Eigenvalue
  • Eigenvector
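A few of the error metrics above are simple enough to sketch in plain Python; these helper names are illustrative, not from the original post:

```python
import math

def mse(y_true, y_pred):
    # Mean Squared Error: average of squared residuals
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Root Mean Squared Error: square root of MSE
    return math.sqrt(mse(y_true, y_pred))

def mae(y_true, y_pred):
    # Mean Absolute Error: average of absolute residuals
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 2.0]
y_pred = [2.0, 5.0, 4.0]
print(mse(y_true, y_pred))  # (1 + 0 + 4) / 3
print(mae(y_true, y_pred))  # (1 + 0 + 2) / 3 = 1.0
```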

Common Distributions:

Discrete Distributions:

  • Bernoulli Distribution / Binomial Distribution
  • Negative Binomial Distribution
  • Multinomial Distribution
  • Geometric Distribution
  • Hypergeometric Distribution
  • Poisson Distribution

Continuous Distributions:

  • Uniform Distribution
  • Normal Distribution / Gaussian Distribution
  • Exponential Distribution
  • Lognormal Distribution
  • Gamma Distribution
  • Beta Distribution
  • Dirichlet Distribution
  • Rayleigh Distribution
  • Cauchy Distribution
  • Weibull Distribution

Three Major Sampling Distributions:

  • Chi-square Distribution
  • t-Distribution
  • F-Distribution

Data Pre-processing:

  • Missing Value Imputation
  • Discretization
  • Mapping
  • Normalization / Standardization
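The two most common normalization schemes from the list above can be sketched as follows (function names are illustrative):

```python
def min_max_normalize(xs):
    # Min-max scaling: map values linearly into [0, 1]
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def z_score_standardize(xs):
    # Z-score standardization: zero mean, unit (population) std deviation
    n = len(xs)
    mean = sum(xs) / n
    std = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5
    return [(x - mean) / std for x in xs]

print(min_max_normalize([2, 4, 6]))  # [0.0, 0.5, 1.0]
```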

Sampling:

  • Simple Random Sampling
  • Offline Sampling (offline equal-probability sampling of k items)
  • Online Sampling (online equal-probability sampling of k items)
  • Ratio-based Sampling
  • Acceptance-Rejection Sampling
  • Importance Sampling
  • MCMC (Markov Chain Monte Carlo sampling: Metropolis-Hastings & Gibbs)
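Online equal-probability sampling of k items is commonly implemented as reservoir sampling; a minimal sketch (the original post names the technique but not this implementation):

```python
import random

def reservoir_sample(stream, k, rng=None):
    # Keep a uniform random sample of k items from a stream of unknown length.
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # The i-th item (0-based) survives with probability k / (i + 1)
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir

sample = reservoir_sample(range(1000), 10, random.Random(0))
print(len(sample))  # 10
```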

Clustering:

  • K-Means
  • K-Medoids
  • Bisecting K-Means
  • FK-Means
  • Canopy
  • Spectral K-Means (spectral clustering)
  • GMM-EM (Gaussian Mixture Model, fitted via Expectation-Maximization)
  • K-Prototypes
  • CLARANS (partition-based)
  • BIRCH (hierarchy-based)
  • CURE (hierarchy-based)
  • STING (grid-based)
  • CLIQUE (density- and grid-based)
  • the density-peaks clustering algorithm published in Science in 2014, etc.
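K-Means, the first entry above, reduces to a short loop on 1-D data; a minimal sketch of Lloyd's algorithm (toy data, illustrative names):

```python
def kmeans_1d(points, centers, iters=20):
    # Lloyd's algorithm on 1-D data: assign each point to its nearest
    # center, then move each center to the mean of its cluster.
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[idx].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

# Two well-separated groups: centers converge near [1.0, 9.0]
print(kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.2, 8.8], [0.0, 5.0]))
```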

Clustering Effectiveness Evaluation:

  • Purity
  • RI (Rand Index)
  • ARI (Adjusted Rand Index)
  • NMI (Normalized Mutual Information)
  • F-measure
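Purity, the simplest of these measures, counts how many points fall into their cluster's majority class; a sketch under the assumption that true labels are known:

```python
from collections import Counter

def purity(clusters):
    # clusters: list of lists of true class labels, one list per cluster.
    # Purity = (points matching their cluster's majority class) / total points
    total = sum(len(c) for c in clusters)
    majority = sum(Counter(c).most_common(1)[0][1] for c in clusters)
    return majority / total

print(purity([["a", "a", "b"], ["b", "b", "b", "a"]]))  # (2 + 3) / 7
```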

Classification & Regression:

  • LR (Linear Regression)
  • LR (Logistic Regression)
  • SR (Softmax Regression, multi-class logistic regression)
  • GLM (Generalized Linear Model)
  • RR (Ridge Regression, L2-regularized least squares) & LASSO (Least Absolute Shrinkage and Selection Operator, L1-regularized least squares)
  • DT (Decision Tree)
  • RF (Random Forest)
  • GBDT (Gradient Boosting Decision Tree)
  • CART (Classification And Regression Tree)
  • KNN (K-Nearest Neighbors)
  • SVM (Support Vector Machine, including SVC for classification & SVR for regression)
  • CBA (Classification Based on Association rules)
  • KF (Kernel Functions)
  • Polynomial Kernel Function
  • Gaussian Kernel Function
  • RBF (Radial Basis Function)
  • String Kernel Function
  • NB (Naive Bayes)
  • BN (Bayesian Network / Bayesian Belief Network / Belief Network)
  • LDA (Linear Discriminant Analysis / Fisher Linear Discriminant)
  • EL (Ensemble Learning)
  • Boosting
  • Bagging
  • Stacking
  • AdaBoost (Adaptive Boosting)
  • MEM (Maximum Entropy Model)
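Logistic regression, one of the workhorses above, can be trained with plain batch gradient descent on the log-loss; a minimal 1-D sketch on a toy separable dataset (names and hyperparameters are illustrative):

```python
import math

def train_logistic(xs, ys, lr=0.5, epochs=2000):
    # Batch gradient descent for p = sigmoid(w*x + b); the log-loss
    # gradient w.r.t. w is mean((p - y) * x), and w.r.t. b is mean(p - y).
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += (p - y)
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

xs = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(xs, ys)
print([1 if w * x + b > 0 else 0 for x in xs])  # [0, 0, 0, 1, 1, 1]
```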

Classification Effectiveness Evaluation:

  • Confusion Matrix
  • Precision
  • Recall
  • Accuracy
  • F-score
  • ROC Curve
  • AUC (Area Under the ROC Curve)
  • Lift Curve
  • KS Curve
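Precision, recall, and the F-score all fall out of the confusion-matrix counts; a short sketch (argument names are illustrative):

```python
def precision_recall_f1(tp, fp, fn):
    # From confusion-matrix counts:
    # precision = TP / (TP + FP), recall = TP / (TP + FN),
    # F1 is their harmonic mean.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

print(precision_recall_f1(tp=8, fp=2, fn=4))  # (0.8, 0.666..., 0.727...)
```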

PGM (Probabilistic Graphical Models):

  • BN (Bayesian Network / Bayesian Belief Network / Belief Network)
  • MC (Markov Chain)
  • MEM (Maximum Entropy Model)
  • HMM (Hidden Markov Model)
  • MEMM (Maximum Entropy Markov Model)
  • CRF (Conditional Random Field)
  • MRF (Markov Random Field)
  • Viterbi (Viterbi algorithm)
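The Viterbi algorithm decodes the most likely hidden-state sequence of an HMM by dynamic programming; a sketch on the classic toy weather model (the model parameters below are illustrative, not from the original post):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    # Dynamic programming over the HMM trellis: for each step and state,
    # store (best path probability ending here, predecessor state).
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for o in obs[1:]:
        prev_layer, layer = V[-1], {}
        for s in states:
            layer[s] = max(
                (prev_layer[p][0] * trans_p[p][s] * emit_p[s][o], p)
                for p in states)
        V.append(layer)
    # Backtrack from the most probable final state.
    path = [max(states, key=lambda s: V[-1][s][0])]
    for layer in reversed(V[1:]):
        path.append(layer[path[-1]][1])
    return path[::-1]

states = ["Rainy", "Sunny"]
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
print(viterbi(["walk", "shop", "clean"], states, start_p, trans_p, emit_p))
# ['Sunny', 'Rainy', 'Rainy']
```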

NN (Neural Networks):

  • ANN (Artificial Neural Network)
  • SNN (Static Neural Network)
  • BP (Error Back-Propagation)
  • HN (Hopfield Network)
  • DNN (Dynamic Neural Network)
  • RNN (Recurrent Neural Network)
  • SRN (Simple Recurrent Network)
  • ESN (Echo State Network)
  • LSTM (Long Short-Term Memory)
  • CW-RNN (Clockwork RNN, ICML 2014), etc.

Deep Learning:

  • Auto-encoder
  • SAE (Stacked Auto-encoders)
  • Sparse Auto-encoders
  • Denoising Auto-encoders
  • Contractive Auto-encoders
  • RBM (Restricted Boltzmann Machine)
  • DBN (Deep Belief Network)
  • CNN (Convolutional Neural Network)
  • Word2Vec (word-embedding model)

Dimensionality Reduction:

  • LDA (Linear Discriminant Analysis / Fisher Linear Discriminant)
  • PCA (Principal Component Analysis)
  • ICA (Independent Component Analysis)
  • SVD (Singular Value Decomposition)
  • FA (Factor Analysis)
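For 2-D data, PCA reduces to an eigenproblem on a 2×2 covariance matrix that can be solved in closed form; a stdlib-only sketch (illustrative names, not a general implementation):

```python
import math

def first_principal_component(data):
    # PCA for 2-D points: build the 2x2 covariance matrix and return the
    # unit eigenvector belonging to its largest eigenvalue.
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    cxx = sum((x - mx) ** 2 for x, _ in data) / n
    cyy = sum((y - my) ** 2 for _, y in data) / n
    cxy = sum((x - mx) * (y - my) for x, y in data) / n
    # Largest eigenvalue of [[cxx, cxy], [cxy, cyy]] via the quadratic formula
    lam = (cxx + cyy) / 2 + math.sqrt(((cxx - cyy) / 2) ** 2 + cxy ** 2)
    if cxy != 0:
        v = (cxy, lam - cxx)
    else:  # axis-aligned data: pick the axis with larger variance
        v = (1.0, 0.0) if cxx >= cyy else (0.0, 1.0)
    norm = math.hypot(*v)
    return (v[0] / norm, v[1] / norm)

# Points on the line y = x: the first component is (1/sqrt(2), 1/sqrt(2))
print(first_principal_component([(0, 0), (1, 1), (2, 2), (3, 3)]))
```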

Text Mining:

  • VSM (Vector Space Model)
  • Word2Vec (word-embedding model)
  • TF (Term Frequency)
  • TF-IDF (Term Frequency-Inverse Document Frequency)
  • MI (Mutual Information)
  • ECE (Expected Cross Entropy)
  • QEMI (quadratic information entropy)
  • IG (Information Gain)
  • IGR (Information Gain Ratio)
  • Gini (Gini index)
  • χ² Statistic (chi-squared statistic)
  • TEW (Text Evidence Weight)
  • OR (Odds Ratio)
  • N-Gram Model
  • LSA (Latent Semantic Analysis)
  • PLSA (Probabilistic Latent Semantic Analysis)
  • LDA (Latent Dirichlet Allocation)
  • SLM (Statistical Language Model)
  • NPLM (Neural Probabilistic Language Model)
  • CBOW (Continuous Bag-of-Words Model)
  • Skip-gram (Skip-gram Model)
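TF-IDF, the most common of these term-weighting schemes, is a one-liner per term once documents are tokenized; a sketch using one common variant (TF as relative frequency, IDF as log(N/df) — other variants exist):

```python
import math
from collections import Counter

def tf_idf(docs):
    # docs: list of token lists. TF = term count / doc length;
    # IDF = log(N / df), df = number of documents containing the term.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return weights

docs = [["cat", "sat", "mat"], ["cat", "cat", "dog"], ["dog", "ran"]]
w = tf_idf(docs)
print(w[0]["mat"])  # only doc 0 contains "mat": (1/3) * log(3/1)
```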

Association Mining:

  • Apriori
  • FP-growth (Frequent Pattern Tree Growth)
  • MSApriori (Multiple Support-based Apriori)
  • GSpan (Graph-based Substructure Pattern Mining, frequent subgraph mining)
  • Sequential Pattern Analysis
  • AprioriAll
  • SPADE
  • GSP (Generalized Sequential Patterns)
  • PrefixSpan
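Apriori's level-wise search — a (k+1)-itemset can be frequent only if all its k-subsets are — can be sketched briefly (a naive in-memory version with illustrative names, not an efficient implementation):

```python
from itertools import combinations

def apriori(transactions, min_support):
    # transactions: list of frozensets. Returns {frequent itemset: support}.
    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    singletons = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    current = {s for s in singletons if support(s) >= min_support}
    k = 1
    while current:
        frequent.update({s: support(s) for s in current})
        k += 1
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Apriori pruning: drop candidates with an infrequent (k-1)-subset
        current = {c for c in candidates
                   if all(frozenset(sub) in frequent
                          for sub in combinations(c, k - 1))
                   and support(c) >= min_support}
    return frequent

txns = [frozenset(t) for t in
        (["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"])]
freq = apriori(txns, min_support=3)
print(sorted(sorted(s) for s in freq))  # all singletons and all pairs
```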

Forecasting:

  • LR (Linear Regression)
  • SVR (Support Vector Regression)
  • ARIMA (Autoregressive Integrated Moving Average)
  • GM (Grey Model)
  • BPNN (Back-Propagation Neural Network)
  • SRN (Simple Recurrent Network)
  • LSTM (Long Short-Term Memory)
  • CW-RNN (Clockwork Recurrent Neural Network)
  • …

Link Analysis:

  • HITS (Hyperlink-Induced Topic Search)
  • PageRank
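PageRank can be computed by simple power iteration over the link graph; a sketch assuming every node has at least one outgoing link (the toy graph below is illustrative):

```python
def pagerank(links, damping=0.85, iters=100):
    # Power iteration: PR(p) = (1-d)/N + d * sum over q->p of PR(q)/outdeg(q)
    nodes = list(links)
    n = len(nodes)
    ranks = {p: 1.0 / n for p in nodes}
    for _ in range(iters):
        ranks = {p: (1 - damping) / n + damping * sum(
                     ranks[q] / len(links[q]) for q in nodes if p in links[q])
                 for p in nodes}
    return ranks

links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # "c" collects links from both "a" and "b"
```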

Recommendation Engines:

  • SVD
  • Slope One
  • DBR (Demographic-based Recommendation)
  • CBR (Content-based Recommendation)
  • CF (Collaborative Filtering)
  • UCF (User-based Collaborative Filtering)
  • ICF (Item-based Collaborative Filtering)

Similarity Measures & Distance Measures:

  • Euclidean Distance
  • Chebyshev Distance
  • Minkowski Distance
  • Standardized Euclidean Distance
  • Mahalanobis Distance
  • Cosine Similarity
  • Hamming Distance / Edit Distance
  • Jaccard Distance
  • Correlation Coefficient Distance
  • Information Entropy
  • KL (Kullback-Leibler Divergence / Relative Entropy)
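Several of the measures above are one-liners; a sketch of three representatives (function names are illustrative):

```python
import math

def euclidean(a, b):
    # Straight-line distance between two equal-length vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def jaccard_distance(a, b):
    # 1 - |A ∩ B| / |A ∪ B| for sets
    return 1 - len(a & b) / len(a | b)

print(euclidean((0, 0), (3, 4)))               # 5.0
print(jaccard_distance({1, 2, 3}, {2, 3, 4}))  # 1 - 2/4 = 0.5
```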

Optimization:

Unconstrained Optimization:

  • Cyclic Variable Methods
  • Variable Simplex Methods
  • Newton's Method
  • Quasi-Newton Methods
  • Conjugate Gradient Methods

Constrained Optimization:

  • Approximation Programming Methods
  • Penalty Function Methods
  • Multiplier Methods

Heuristic Algorithms:

  • SA (Simulated Annealing)
  • GA (Genetic Algorithm)
  • ACO (Ant Colony Optimization)

Feature Selection:

  • Mutual Information
  • Document Frequency
  • Information Gain
  • Chi-squared Test
  • Gini (Gini index)

Outlier Detection:

  • Statistics-based
  • Density-based
  • Clustering-based

Learning to Rank:

  • Pointwise: McRank
  • Pairwise: Ranking SVM, RankNet, FRank, RankBoost
  • Listwise: AdaRank, SoftRank, LambdaMART

Tools:

  • MPI
  • Hadoop ecosystem
  • Spark
  • IGraph
  • BSP
  • Weka
  • Mahout
  • Scikit-learn
  • PyBrain
  • Theano

Reprinted from: http://blog.csdn.net/heyongluoyao8/article/details/47840255
