Reference Book: Simon J. D. Prince, Computer Vision: Models, Learning, and Inference
Question: How to fit probability models to data {x_i}?
Answer: Learn the parameters θ of the model from the data.
Methods:
- maximum likelihood
- maximum a posteriori
- Bayesian approach
Maximum likelihood (ML)
- Likelihood function:
- Pr(x_i | θ) at single data point x_i
- Pr(x_{1...I} | θ) for a set of points
- Assume the points are drawn independently from the distribution
- Pr(x_{1...I} | θ) = \prod_{i=1}^{I} Pr(x_i | θ)
- Estimate of the parameter (a minimal numerical sketch follows below)
- \hat{θ} = argmax_θ [ Pr(x_{1...I} | θ) ]
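A minimal sketch of ML fitting in code (my own illustration, not from the book; the choice of a 1-D normal and all variable names are assumptions). For i.i.d. data the log of the product becomes a sum, and for the normal the maximizer has a closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)  # observed data {x_i}

def log_likelihood(mu, sigma, x):
    # sum_i log Pr(x_i | μ, σ) for a 1-D normal
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (x - mu) ** 2 / (2 * sigma**2))

# Closed-form ML estimates: sample mean and (biased) sample std
mu_hat, sigma_hat = x.mean(), x.std()

# Sanity check: the closed form beats a nearby parameter setting
assert log_likelihood(mu_hat, sigma_hat, x) >= log_likelihood(mu_hat + 0.1, sigma_hat, x)
print(mu_hat, sigma_hat)
```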
- Example #1: The skip-gram model
- Reference: https://arxiv.org/pdf/1402.3722v1.pdf
- Given a corpus of words w and their contexts c
- Consider the conditional probabilities Pr(c|w)
- Goal: Set the parameters θ of Pr(c|w;θ) so as to maximize the corpus probability:
- argmax_θ \prod_{w \in Text} \prod_{c \in C(w)} Pr(c|w;θ), where C(w) is the set of contexts of word w
- Equivalently, argmax_θ \prod_{(w,c) \in D} Pr(c|w;θ), where D is the set of all observed (word, context) pairs
- Model for Pr(c|w;θ): a softmax over context vectors
- Pr(c|w;θ) = e^{v_c \cdot v_w} / \sum_{c' \in C} e^{v_{c'} \cdot v_w}
- v_c, v_w: vector representations of context c and word w
- C: the set of all available contexts
- Estimate: take the log, which turns the product over D into a sum
- argmax_θ \sum_{(w,c) \in D} log Pr(c|w;θ)
- argmax_θ \sum_{(w,c) \in D} ( v_c \cdot v_w - log \sum_{c' \in C} e^{v_{c'} \cdot v_w} )
- Very expensive to compute: the log-normalizer sums over every context c' \in C (see the sketch after the solutions list)
- Solutions:
- Hierarchical softmax
- Negative sampling
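A minimal sketch (my own toy code; the sizes and names are assumptions) of why the full softmax is the bottleneck: computing log Pr(c|w;θ) for a single pair needs a dot product with every context vector in C.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, num_contexts = 50, 100_000               # assumed toy sizes
V_ctx = rng.normal(size=(num_contexts, dim))  # one v_{c'} per context c'
v_w = rng.normal(size=dim)                    # vector for the input word w
c = 42                                        # index of the observed context

scores = V_ctx @ v_w                          # O(|C|·dim): the expensive part
m = scores.max()
log_Z = m + np.log(np.exp(scores - m).sum())  # stable log-sum-exp normalizer
log_prob = scores[c] - log_Z                  # v_c·v_w − log Σ_{c'} e^{v_{c'}·v_w}
print(log_prob)
```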
- Negative sampling:
- Model Pr( D=1 | w,c;θ ) = σ(v_c \cdot v_w), where σ is the sigmoid and D=1 means the pair (w,c) came from the corpus
- Naive estimate: argmax_θ \sum_{(w,c) \in D} log σ(v_c \cdot v_w); this alone is trivially maximized (push every σ(·) toward 1), which is why negative pairs are needed
- D': a set of randomly generated (w, c) pairs assumed to be incorrect (negative samples)
- Estimate:
- argmax_θ \sum_{(w,c) \in D} log Pr( D=1 | w,c;θ ) + \sum_{(w,c) \in D'} log Pr( D=0 | w,c;θ )
- argmax_θ \sum_{(w,c) \in D} log σ(v_c \cdot v_w) + \sum_{(w,c) \in D'} log σ(-v_c \cdot v_w), using Pr(D=0|w,c;θ) = 1 - σ(v_c \cdot v_w) = σ(-v_c \cdot v_w) (see the sketch after this example)
- Sampling the negatives:
- Reference: Mikolov et al., "Distributed Representations of Words and Phrases and their Compositionality"
- Each observed pair is contrasted with k negatives drawn from a noise distribution P_n(w): \sum_{i=1}^{k} E_{w_i \sim P_n(w)} [ log σ(-v_{w_i} \cdot v_w) ]
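A minimal sketch (my own toy code, not word2vec itself; the uniform noise distribution is a simplification of the paper's unigram^{3/4} choice) of the negative-sampling objective for one observed pair (w, c) with k sampled negatives, costing k+1 dot products instead of |C|:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(2)
dim, vocab, k = 50, 10_000, 5
V_ctx = rng.normal(scale=0.1, size=(vocab, dim))  # context vectors v_c
v_w = rng.normal(scale=0.1, size=dim)             # word vector v_w
c = 7                                             # observed context index

neg = rng.integers(0, vocab, size=k)              # negatives w_i ~ P_n(w)

# log σ(v_c·v_w) + Σ_i log σ(−v_{w_i}·v_w)
objective = (np.log(sigmoid(V_ctx[c] @ v_w))
             + np.sum(np.log(sigmoid(-V_ctx[neg] @ v_w))))
print(objective)
```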
- Example #2: Bernoulli trial
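A minimal sketch filling in this stub (not the book's code; λ for the success probability follows the book's notation): for binary data x_i ∈ {0,1} with Pr(x=1|λ) = λ, the log-likelihood \sum_i [ x_i log λ + (1-x_i) log(1-λ) ] is maximized at \hat{λ} = (1/I) \sum_i x_i, the observed fraction of successes.

```python
import numpy as np

rng = np.random.default_rng(3)
x = (rng.random(500) < 0.3).astype(int)  # simulated coin flips, true λ = 0.3

lam_hat = x.mean()                       # closed-form ML estimate

def log_likelihood(lam, x):
    return np.sum(x * np.log(lam) + (1 - x) * np.log(1 - lam))

# The closed form should beat a nearby value of λ
assert log_likelihood(lam_hat, x) >= log_likelihood(lam_hat + 0.05, x)
print(lam_hat)
```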