Consider a single Boolean random variable $Y$ (the “classification”). Let the prior probability $P(Y=true)$ be $\pi$. Let’s try to find $\pi$, given a training set $D=(y_1,\ldots,y_N)$ of $N$ independent samples of $Y$, of which $p$ are positive and $n$ are negative.
-
Write down an expression for the likelihood of $D$ (i.e., the probability of seeing this particular sequence of examples, given a fixed value of $\pi$) in terms of $\pi$, $p$, and $n$.
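A sketch of one standard answer, using the independence of the samples stated above so that the likelihood factors into a product of Bernoulli terms:

$$P(D \mid \pi) = \prod_{j=1}^{N} P(y_j \mid \pi) = \pi^{p}\,(1-\pi)^{n}.$$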
-
By differentiating the log likelihood $L$, find the value of $\pi$ that maximizes the likelihood.
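A sketch of the differentiation step: taking logs turns the product into a sum, and setting the derivative to zero gives the familiar frequency estimate.

$$L = \log P(D \mid \pi) = p\log\pi + n\log(1-\pi), \qquad \frac{\partial L}{\partial \pi} = \frac{p}{\pi} - \frac{n}{1-\pi} = 0 \;\Longrightarrow\; \pi = \frac{p}{p+n}.$$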
-
Now suppose we add in $k$ Boolean random variables $X_1, X_2,\ldots,X_k$ (the “attributes”) that describe each sample, and suppose we assume that the attributes are conditionally independent of each other given the goal $Y$. Draw the Bayes net corresponding to this assumption.
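For reference, this assumption corresponds to the naive Bayes structure: $Y$ is the sole parent of every attribute, and the $X_i$ have no other parents. A rough ASCII rendering:

```
         Y
       / | \
    X_1 X_2 ... X_k
```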
-
Write down the likelihood for the data including the attributes, using the following additional notation:
- $\alpha_i$ is $P(X_i=true | Y=true)$.
- $\beta_i$ is $P(X_i=true | Y=false)$.
- $p_i^+$ is the count of samples for which $X_i=true$ and $Y=true$.
- $n_i^+$ is the count of samples for which $X_i=false$ and $Y=true$.
- $p_i^-$ is the count of samples for which $X_i=true$ and $Y=false$.
- $n_i^-$ is the count of samples for which $X_i=false$ and $Y=false$.
[Hint: consider first the probability of seeing a single example with specified values for $X_1, X_2,\ldots,X_k$ and $Y$.]
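A sketch of where the hint leads: under the conditional-independence assumption, a single example with values $(x_1,\ldots,x_k,y)$ has probability $P(y)\prod_{i} P(x_i | y)$, and multiplying over the whole training set groups the factors according to the counts defined above:

$$P(D \mid \pi, \alpha_1,\ldots,\alpha_k,\beta_1,\ldots,\beta_k) = \pi^{p}(1-\pi)^{n} \prod_{i=1}^{k} \alpha_i^{p_i^{+}} (1-\alpha_i)^{n_i^{+}} \beta_i^{p_i^{-}} (1-\beta_i)^{n_i^{-}}.$$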
-
By differentiating the log likelihood $L$, find the values of $\alpha_i$ and $\beta_i$ (in terms of the various counts) that maximize the likelihood and say in words what these values represent.
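A sketch of the derivation: the log likelihood decomposes into separate terms for each parameter, so each maximization has the same form as the one for $\pi$:

$$\frac{\partial L}{\partial \alpha_i} = \frac{p_i^{+}}{\alpha_i} - \frac{n_i^{+}}{1-\alpha_i} = 0 \;\Longrightarrow\; \alpha_i = \frac{p_i^{+}}{p_i^{+}+n_i^{+}}, \qquad \beta_i = \frac{p_i^{-}}{p_i^{-}+n_i^{-}}.$$

In words, each estimate is the observed frequency of $X_i=true$ among the positive (respectively negative) examples.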
-
Let $k = 2$, and consider a data set consisting of all four possible examples of the XOR function. Compute the maximum likelihood estimates of $\pi$, $\alpha_1$, $\alpha_2$, $\beta_1$, and $\beta_2$.
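A minimal Python sketch of this computation (the tuple layout and the helper name `mle_attr` are illustrative choices, not part of the exercise). Counting frequencies over the four examples gives $\pi = \alpha_1 = \alpha_2 = \beta_1 = \beta_2 = 1/2$:

```python
from itertools import product

# The four possible examples of XOR: y = x1 XOR x2.
data = [(x1, x2, x1 ^ x2) for x1, x2 in product([0, 1], repeat=2)]

positives = [ex for ex in data if ex[2] == 1]
negatives = [ex for ex in data if ex[2] == 0]

pi = len(positives) / len(data)  # p / (p + n)

def mle_attr(i):
    """Frequency estimates alpha_i, beta_i for attribute index i (0-based)."""
    alpha = sum(ex[i] for ex in positives) / len(positives)  # p_i+ / (p_i+ + n_i+)
    beta = sum(ex[i] for ex in negatives) / len(negatives)   # p_i- / (p_i- + n_i-)
    return alpha, beta

print(pi, *mle_attr(0), *mle_attr(1))  # 0.5 0.5 0.5 0.5 0.5
```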
-
Given these estimates of $\pi$, $\alpha_1$, $\alpha_2$, $\beta_1$, and $\beta_2$, what are the posterior probabilities $P(Y=true | x_1,x_2)$ for each example?
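A sketch of the computation with these estimates: every parameter equals $1/2$, so the two hypotheses receive identical scores for every example:

$$P(Y{=}true \mid x_1,x_2) = \frac{\pi \prod_i P(x_i | Y{=}true)}{\pi \prod_i P(x_i | Y{=}true) + (1-\pi) \prod_i P(x_i | Y{=}false)} = \frac{1/8}{1/8+1/8} = \frac{1}{2}.$$

That is, the learned model assigns posterior $1/2$ to every example: a naive Bayes model cannot represent XOR, which depends entirely on the interaction between the attributes.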