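Abstract: Lecture 3 of Machine Learning Foundations (Hsuan-Tien Lin), which categorizes learning problems from several different perspectives.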
Learning with Different Output Space $\mathcal{Y}$
- binary classification: $\mathcal{Y}=\lbrace -1, +1 \rbrace$
- multiclass classification: $\mathcal{Y}=\lbrace 1,2,\cdots,K \rbrace$
- regression: $\mathcal{Y}=\mathbb{R}$
- structured learning: $\mathcal{Y}=\text{structures}$ (e.g. the part-of-speech structure of a sentence)
- …
Among these, binary classification and regression are the core building blocks for solving more complex problems.
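As one illustration of how binary classification can serve as a building block, the sketch below recodes multiclass labels in $\lbrace 1,2,\cdots,K \rbrace$ into $K$ binary label sets in $\lbrace -1, +1 \rbrace$ (a one-vs-rest decomposition). This decomposition is an assumption chosen for illustration and is not part of this lecture's notes; the function name `one_vs_rest_labels` and the toy labels are hypothetical.

```python
def one_vs_rest_labels(y, k):
    """Recode multiclass labels as binary labels: +1 for class k, -1 otherwise."""
    return [+1 if label == k else -1 for label in y]


# Toy multiclass labels with K = 3 (hypothetical data).
y_multiclass = [1, 3, 2, 3, 1]

# One binary classification problem per class.
binary_problems = {
    k: one_vs_rest_labels(y_multiclass, k) for k in sorted(set(y_multiclass))
}

for k, labels in binary_problems.items():
    print(f"class {k} vs rest: {labels}")
```

Each of the resulting label lists could then be handed to any binary classifier, which is one sense in which the binary case underlies the multiclass one.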
Learning with Different Data Label
- supervised learning: every $\pmb{x}_n$ comes with corresponding $y_n$
- unsupervised learning: learning without $y_n$
  - clustering
  - density estimation
  - outlier detection
  - …
- semi-supervised learning: only some $y_n$ are given; leverage unlabeled data to avoid ‘expensive’ labeling
- reinforcement learning: learn with ‘partial/implicit information’ (often sequentially); $y_n$ is only implicit
Among these, supervised learning is currently the core tool.
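The first three settings differ only in how many $y_n$ are available, which can be sketched as a small check over a label list. This is a minimal sketch under the assumption that a missing label is represented by `None`; the function name `label_setting` is hypothetical. Reinforcement learning is left out because its implicit feedback cannot be read off a label list.

```python
def label_setting(ys):
    """Name the learning setting implied by which labels are present."""
    if not ys:
        raise ValueError("need at least one example")
    n_labeled = sum(y is not None for y in ys)
    if n_labeled == len(ys):
        return "supervised"       # every x_n comes with its y_n
    if n_labeled == 0:
        return "unsupervised"     # no y_n at all
    return "semi-supervised"      # only some y_n are given


print(label_setting([+1, -1, +1]))        # all labels present
print(label_setting([None, None, None]))  # no labels
print(label_setting([+1, None, None]))    # a few labels
```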
Learning with Different Protocol $f\Rightarrow (\pmb{x}_n, y_n)$
- batch learning: learn from all known data
- online learning: hypothesis ‘improves’ through receiving data instances sequentially (passive)
- active learning: improve hypothesis with fewer labels (hopefully) by asking questions strategically
Among these, batch learning is still the most common protocol at present.
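The difference between the batch and online protocols can be sketched with a perceptron-style update rule. This is a minimal sketch, assuming a linear hypothesis $\text{sign}(\pmb{w}^T\pmb{x})$ with a constant first coordinate as the bias term; the helper names and the toy data are hypothetical. The batch learner sees all known data and may pass over it repeatedly, while the online learner updates once per incoming example.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def perceptron_step(w, x, y):
    """Return corrected weights if (x, y) is misclassified, else w unchanged."""
    if y * dot(w, x) <= 0:
        return [wi + y * xi for wi, xi in zip(w, x)]
    return w


def batch_learn(data, max_passes=100):
    """Batch protocol: all data is known, so loop until a mistake-free pass."""
    w = [0.0] * len(data[0][0])
    for _ in range(max_passes):
        mistakes = 0
        for x, y in data:
            new_w = perceptron_step(w, x, y)
            if new_w is not w:  # a new list means an update happened
                mistakes += 1
            w = new_w
        if mistakes == 0:
            break
    return w


def online_learn(stream, dim):
    """Online protocol: each example arrives once and is used immediately."""
    w = [0.0] * dim
    for x, y in stream:
        w = perceptron_step(w, x, y)
    return w


# Toy linearly separable data; the leading 1 is the bias coordinate.
data = [([1, 2, 2], +1), ([1, 1, 3], +1), ([1, -1, -2], -1), ([1, -2, -1], -1)]

w_batch = batch_learn(data)
w_online = online_learn(iter(data), dim=3)
print(w_batch, w_online)
```

An active learner would differ in a third way: instead of receiving $(\pmb{x}_n, y_n)$ pairs passively, it would choose which $\pmb{x}_n$ to ask a label for.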
Learning with Different Input Space $\mathcal{X}$
- concrete features: each dimension of $\mathcal{X}\subset \mathbb{R}^d$ represents ‘sophisticated physical meaning’
- raw features: often need human or machines to convert to concrete ones
- abstract features: again need ‘feature conversion/extraction/construction’
Among these, inputs with concrete features are the easiest to handle.
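Converting raw features to concrete ones can be sketched on a tiny black-and-white image. This is a minimal sketch, assuming pixel values in $\lbrace 0, 1 \rbrace$; the two features below (ink density and left-right symmetry) and their exact definitions are illustrative assumptions, not formulas given in these notes.

```python
def density(image):
    """Fraction of pixels that are 'on'."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)


def symmetry(image):
    """Negative mean absolute difference between the image and its mirror.

    0 means perfectly left-right symmetric; more negative means less symmetric.
    """
    diffs = [abs(p - q) for row in image for p, q in zip(row, reversed(row))]
    return -sum(diffs) / len(diffs)


def to_concrete_features(image):
    """Map a raw pixel grid to a 2-dimensional concrete feature vector."""
    return (density(image), symmetry(image))


# Raw inputs: 3x3 pixel grids (hypothetical data).
vertical_bar = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
lopsided = [[1, 0, 0], [1, 0, 0], [1, 1, 0]]

print(to_concrete_features(vertical_bar))
print(to_concrete_features(lopsided))
```

Each output dimension now carries a physical meaning that a learner can use directly, whereas the individual raw pixels do not.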