semi-supervised learning

2023-04-09 07:15:47

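Semi-supervised learning has been gaining ground for seven or eight years now, but in China it is still only just getting started.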

I. Introduction to semi-supervised learning

What is semi-supervised learning and transductive learning? Why can we ever learn a classifier from unlabeled data? Does unlabeled data always help? Which semi-supervised learning methods are out there? Which one should I use? Answers to these questions set the stage for a detailed look at individual algorithms.

II. Semi-supervised learning algorithms

We will focus on classification algorithms that use both labeled and unlabeled data. Several families of algorithms will be discussed, each resting on different model assumptions:

1、Self-training

Probably the earliest semi-supervised learning method. Still extensively used in the natural language processing community.
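The self-training loop can be sketched in a few lines. This is only a minimal illustration, assuming a 1-D nearest-centroid base classifier and a "gap to the runner-up" confidence score; the function names and the toy data are hypothetical and not taken from the tutorial. In practice any classifier that yields a confidence score could be wrapped the same way.

```python
def fit_centroids(points, labels):
    """Return the mean of each class: a very simple base classifier."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict_with_confidence(centroids, x):
    """Predict the nearest centroid; confidence is the gap to the runner-up."""
    ranked = sorted(centroids, key=lambda y: abs(x - centroids[y]))
    best, second = ranked[0], ranked[1]
    margin = abs(x - centroids[second]) - abs(x - centroids[best])
    return best, margin


def self_train(labeled_x, labeled_y, unlabeled_x, per_round=1):
    """Repeatedly pseudo-label the most confident unlabeled points."""
    xs, ys = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    while pool:
        centroids = fit_centroids(xs, ys)
        scored = [(predict_with_confidence(centroids, x), x) for x in pool]
        # Most confident predictions first.
        scored.sort(key=lambda item: item[0][1], reverse=True)
        for (label, _margin), x in scored[:per_round]:
            xs.append(x)      # the pseudo-labeled point joins the training set
            ys.append(label)
            pool.remove(x)
    return fit_centroids(xs, ys)


# Toy data (assumed): one labeled point per class, unlabeled points in two clumps.
labeled_x, labeled_y = [0.0, 10.0], ["a", "b"]
unlabeled_x = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
final_centroids = self_train(labeled_x, labeled_y, unlabeled_x)
print(final_centroids)
```

Note that an early wrong pseudo-label is never revisited in this sketch, which is the usual weakness of self-training.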

2、Generative models

Mixtures of Gaussian or multinomial distributions, Hidden Markov Models, and pretty much any generative model can do semi-supervised learning. We will also look into the EM algorithm, which is often used for training generative models when there is unlabeled data.
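As a rough illustration of EM with unlabeled data, the sketch below fits a 1-D Gaussian mixture with one component per class. Labeled points keep their known class in the E-step, while unlabeled points receive soft memberships. The function name, the starting variance of 1.0, and the toy data are assumptions made for this example only.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def semi_supervised_em(labeled, unlabeled, n_classes=2, n_iter=50):
    """labeled: list of (x, class_index); unlabeled: list of x."""
    # Initialise each class from its labeled points only.
    weights, means, variances = [], [], []
    for k in range(n_classes):
        xs = [x for x, y in labeled if y == k]
        means.append(sum(xs) / len(xs))
        variances.append(1.0)  # assumed starting variance
        weights.append(len(xs) / len(labeled))

    all_x = [x for x, _ in labeled] + list(unlabeled)
    for _ in range(n_iter):
        # E-step: labeled points keep their known class with probability 1;
        # unlabeled points get posterior class probabilities.
        resp = [[1.0 if y == k else 0.0 for k in range(n_classes)] for _, y in labeled]
        for x in unlabeled:
            joint = [weights[k] * gaussian_pdf(x, means[k], variances[k])
                     for k in range(n_classes)]
            total = sum(joint)
            resp.append([j / total for j in joint])

        # M-step: re-estimate parameters from all points, weighted by membership.
        for k in range(n_classes):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, all_x)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, all_x)) / nk
            variances[k] = max(var, 1e-6)  # guard against a collapsing component
            weights[k] = nk / len(all_x)
    return weights, means, variances


# Toy data (assumed): one labeled point per class, six unlabeled points.
labeled = [(0.0, 0), (6.0, 1)]
unlabeled = [-1.0, 0.5, 1.0, 5.0, 5.5, 7.0]
weights, means, variances = semi_supervised_em(labeled, unlabeled)
print(weights, means, variances)
```

The unlabeled points refine the class means and variances beyond what the two labeled points alone could support, provided the mixture assumption roughly holds.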

3、S3VMs

Originally called Transductive SVMs, they are now called Semi-Supervised SVMs to emphasize the fact that they are capable of induction too, not just transduction. The idea is simple and elegant: find a decision boundary that lies in 'low density' regions. However, the optimization problem behind it is difficult, and so we will discuss the various optimization techniques for S3VMs, including the one used in SVM-light, the Convex-Concave Procedure (CCCP), Branch-and-Bound, the continuation method, etc.
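To make the 'low density' idea concrete, the sketch below evaluates a commonly used form of the S3VM objective for a 1-D linear classifier: the usual hinge loss on labeled points plus a symmetric hinge on unlabeled points. It only evaluates the objective at two candidate boundaries and is not one of the optimization techniques listed above; the function names, the unit cost parameters, and the toy data are assumptions for illustration.

```python
def hinge(z):
    """Hinge loss: zero once the margin z reaches 1."""
    return max(0.0, 1.0 - z)


def s3vm_objective(w, b, labeled, unlabeled, c_labeled=1.0, c_unlabeled=1.0):
    """Objective for a 1-D linear classifier f(x) = w * x + b."""
    regulariser = 0.5 * w * w
    labeled_loss = sum(hinge(y * (w * x + b)) for x, y in labeled)
    # Unlabeled points are penalised for falling inside the margin,
    # whichever side they are on; this term pushes the boundary
    # away from dense regions and also makes the problem non-convex.
    unlabeled_loss = sum(hinge(abs(w * x + b)) for x in unlabeled)
    return regulariser + c_labeled * labeled_loss + c_unlabeled * unlabeled_loss


# Toy data (assumed): labels are -1 / +1, unlabeled points form two clumps.
labeled = [(-3.0, -1), (1.0, +1)]
unlabeled = [-3.0, -2.5, -2.0, -1.5, 1.0, 1.5, 2.0, 2.5, 3.0]

# Boundary at x = -1.0, midway between the two labeled points.
near_cluster = s3vm_objective(1.0, 1.0, labeled, unlabeled)
# Boundary at x = -0.25, in the middle of the gap in the unlabeled data.
through_gap = s3vm_objective(1.0, 0.25, labeled, unlabeled)
print(near_cluster, through_gap)
```

Both boundaries classify the labeled points with full margin, so only the unlabeled term separates them: the boundary in the gap scores lower.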

4、Graph-based methods

Here one constructs a graph over the labeled and unlabeled examples, and assumes that two strongly-connected examples tend to have the same label. The graph Laplacian matrix is a central quantity. We will discuss representative algorithms, including manifold regularization.
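One simple member of this family can be sketched as iterative label propagation: labeled nodes are clamped, and every unlabeled node repeatedly takes the weighted average of its neighbours. At convergence the scores f satisfy (L f)_i = 0 on the unlabeled nodes, where L = D - W is the graph Laplacian. The function name, the chain graph, and its edge weights below are assumptions chosen for illustration; this is not manifold regularization itself.

```python
def propagate_labels(weights, labels, n_iter=200):
    """weights: symmetric n x n list of lists; labels: dict node -> 0.0 or 1.0."""
    n = len(weights)
    f = [labels.get(i, 0.5) for i in range(n)]  # unlabeled nodes start at 0.5
    for _ in range(n_iter):
        new_f = list(f)
        for i in range(n):
            if i in labels:
                continue  # labeled nodes stay clamped to their label
            degree = sum(weights[i])
            new_f[i] = sum(weights[i][j] * f[j] for j in range(n)) / degree
        f = new_f
    return f


# Toy graph (assumed): a chain 0-1-2-3-4-5 with a weak link between 2 and 3.
n = 6
W = [[0.0] * n for _ in range(n)]
for i, j, w in [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 0.1), (3, 4, 1.0), (4, 5, 1.0)]:
    W[i][j] = W[j][i] = w

scores = propagate_labels(W, {0: 0.0, 5: 1.0})
predicted = [1 if s > 0.5 else 0 for s in scores]
print(scores, predicted)
```

Nodes 1 and 2 end up close to the label of node 0, and nodes 3 and 4 close to the label of node 5, because the weak edge carries little influence across the two groups.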

5、Multiview learning

Exemplified by the Co-Training algorithm, these methods employ multiple 'views' of the same problem, and require that different views produce similar classifications.
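A heavily simplified co-training loop might look as follows. Each example has two 1-D views; the classifier for each view in turn picks the unlabeled example it is most confident about and adds it, with its predicted label, to a labeled set shared by both views. The nearest-centroid classifier, the function names, and the toy data are assumptions for this sketch, which is not the exact procedure of the original Co-Training algorithm.

```python
def centroids(values, labels):
    """Per-class mean of a single 1-D view."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    return {y: sum(vs) / len(vs) for y, vs in groups.items()}


def classify(model, v):
    """Return (label, confidence) from a two-class nearest-centroid model."""
    ranked = sorted(model, key=lambda y: abs(v - model[y]))
    confidence = abs(v - model[ranked[1]]) - abs(v - model[ranked[0]])
    return ranked[0], confidence


def co_train(labeled, unlabeled):
    """labeled: list of ((view1, view2), label); unlabeled: list of (view1, view2)."""
    labeled, pool = list(labeled), list(unlabeled)
    while pool:
        for view in (0, 1):
            if not pool:
                break
            model = centroids([x[view] for x, _ in labeled], [y for _, y in labeled])
            # This view's classifier picks the example it is most sure about...
            best = max(pool, key=lambda x: classify(model, x[view])[1])
            label, _ = classify(model, best[view])
            # ...and teaches it to the other view through the shared labeled set.
            labeled.append((best, label))
            pool.remove(best)
    return labeled


# Toy data (assumed): some examples are ambiguous in one view but clear in the other.
seed = [((0.0, 0.0), "neg"), ((10.0, 10.0), "pos")]
unlabeled = [(1.0, 5.0), (5.0, 9.0), (9.0, 6.0), (4.0, 1.0)]
result = dict(co_train(seed, unlabeled))
print(result)
```

The example (5.0, 9.0) sits exactly between the classes in the first view, yet it receives a label because the second view is confident about it.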

6、Other approaches

Metric-based model selection, tree-based learning, information-based methods, etc.

7、Related problems

Regression with unlabeled data, clustering with side information, classification with positive and unlabeled data, dimensionality reduction with side information, inferring the label-missing mechanism, etc.

III. Semi-supervised learning in nature

Long before computers came around and machine learning became a discipline, learning occurred in nature. Is semi-supervised learning part of it? Research in this area has only just begun. We will look at a few case studies, ranging from infant word learning to the human visual system and human categorization behavior.

IV. Challenges for the future

There are many open questions. What new algorithms can we design, and what new assumptions can we make? How can we efficiently perform semi-supervised learning on very large problems? What special methods are needed for structured output domains? Can we find a way to guarantee that unlabeled data will not decrease performance? What can we borrow from natural learning? We suggest these as a few potential research directions.

Researchers working on semi-supervised learning give more, and more detailed, introductions on their home pages:

http://pages.cs.wisc.edu/~jerryzhu/

http://www.kyb.tuebingen.mpg.de/~chapelle

from:http://bbs.w3china.org/blog/more.asp?name=DMman&id=27357
