This post is just a silly attempt to look smart by putting fancy equations on my blog, playing around with LaTeX, and talking about MLE.
Below is an HTML version of part of a .tex file that I created for a homework assignment.
I am new to LaTeX and am finding it incredibly useful for formatting documents that include equations or other mathematical symbols.
It is also great for general typesetting; for instance, if you want to create a resume that really stands out, there are great templates available.
From my vantage point, LaTeX documents work similarly to HTML documents, with commands (much like tags) that control various features. The great thing is that once you get the hang of them, it is very easy to type in equations like the ones below. Here is the LaTeX code for equation 1:
\begin{equation}
\vec{\theta} = (\lambda_1,\lambda_2,...,\lambda_k, Z)
\end{equation}
The rest of the equations use similar ideas. Here is the complete file.
Some blogs and web frameworks have renderers that translate a LaTeX equation directly into a PNG image on the fly.
Unfortunately, I haven't found anything in ASP.NET that does this, so I ran a program called htlatex to generate the PNG files for the equations below.
Now, for a quick, crude explanation of what's below.
Maximum entropy is a concept that is very popular in statistics these days. The idea is this: say you care about certain statistics of a dataset or problem; for instance, you might be interested in the mean and the variance of the data. Then say you want to fit a distribution to the data while preserving those statistics.
Well, many different distributions can share the same mean and variance, so one idea is to choose, among all of them, the distribution with the maximum entropy. If you actually do the derivation, you end up with an expression for the probability, like equation 2, that depends on a set of functions φ (the phis) whose averages are the statistics you care about. So if we only cared about the mean and variance, we would have two phis (x and x², because fixing the mean and variance is the same as fixing the averages of x and x²) and two lambdas in the equations below.
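To make the mean-and-variance case concrete, here is a minimal sketch (an illustration, not part of the homework) assuming the two phis are x and x². With those features the maximum entropy form exp(λ1·x + λ2·x²)/Z is just a Gaussian; the helper names `gaussian_lambdas`, `maxent_density`, and `normal_pdf` are hypothetical, and the λ values come from completing the square.

```python
import math

# Minimal sketch (illustration only): with the hypothetical features
# phi_1(x) = x and phi_2(x) = x**2, the maximum entropy form
# p(x) = exp(lambda_1*x + lambda_2*x**2) / Z is a Gaussian.


def gaussian_lambdas(mu, sigma):
    """Lambdas that turn exp(l1*x + l2*x**2) / Z into N(mu, sigma**2)."""
    # Expand -(x - mu)**2 / (2*sigma**2) and match the x and x**2 terms.
    return mu / sigma**2, -1.0 / (2.0 * sigma**2)


def maxent_density(x, lam1, lam2):
    """exp(lam1*x + lam2*x**2) / Z with Z in closed form (requires lam2 < 0)."""
    # Gaussian integral: int exp(lam2*x**2 + lam1*x) dx
    #   = sqrt(pi / -lam2) * exp(-lam1**2 / (4*lam2))
    Z = math.sqrt(math.pi / -lam2) * math.exp(-lam1**2 / (4.0 * lam2))
    return math.exp(lam1 * x + lam2 * x**2) / Z


def normal_pdf(x, mu, sigma):
    """Textbook normal density, used only for comparison."""
    return math.exp(-(x - mu)**2 / (2.0 * sigma**2)) / (sigma * math.sqrt(2.0 * math.pi))


lam1, lam2 = gaussian_lambdas(mu=1.5, sigma=0.7)
for x in (-1.0, 0.0, 1.5, 3.0):
    # The last two columns should agree to floating-point precision.
    print(x, maxent_density(x, lam1, lam2), normal_pdf(x, 1.5, 0.7))
```

The point is only that the "phis" are functions of x, and the lambdas are the knobs that the rest of the derivation solves for.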
Equation 2 depends on the parameters λ and has to be normalized (which is the job of Z), so one still needs to find the lambdas. Eventually one derives equations like equation 19, which let you use the data to solve for the lambdas.
All of this can be shown from the maximum entropy point of view, but we were asked to show that the equations for the lambdas can also be derived using Maximum Likelihood Estimation, or MLE for short. Here we maximize the probability of the data given our choice of lambdas, i.e., we look for the lambdas at which that probability is largest.
Below is the meat of the derivation. Nothing earth-shattering.
We want to use Maximum Likelihood Estimation to find the parameters
$$\vec{\theta} = (\lambda_1, \lambda_2, \ldots, \lambda_k, Z) \tag{1}$$
We want to maximize the probability of the N data points given the parameters, assuming the points are drawn independently from the maximum entropy distribution:
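$$p(x \mid \vec{\theta}) = \frac{1}{Z}\exp\Big(\sum_{j=1}^{k}\lambda_j\phi_j(x)\Big) \tag{2}$$
$$P(x_1, \ldots, x_N \mid \vec{\theta}) = \prod_{n=1}^{N} p(x_n \mid \vec{\theta}) = \prod_{n=1}^{N}\frac{1}{Z}\exp\Big(\sum_{j=1}^{k}\lambda_j\phi_j(x_n)\Big) \tag{3}$$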
Here we rewrite the likelihood as a log-likelihood, which turns the product over data points into a sum:
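$$\mathcal{L}(\vec{\theta}) = \log P(x_1, \ldots, x_N \mid \vec{\theta}) = \sum_{n=1}^{N}\log p(x_n \mid \vec{\theta}) \tag{4}$$
$$\mathcal{L}(\vec{\theta}) = \sum_{n=1}^{N}\Big[\sum_{j=1}^{k}\lambda_j\phi_j(x_n) - \log Z\Big] \tag{5}$$
$$\mathcal{L}(\vec{\theta}) = \sum_{j=1}^{k}\lambda_j\sum_{n=1}^{N}\phi_j(x_n) - \sum_{n=1}^{N}\log Z \tag{6}$$
$$\mathcal{L}(\vec{\theta}) = \sum_{j=1}^{k}\lambda_j\sum_{n=1}^{N}\phi_j(x_n) - N\log Z \tag{7}$$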
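A quick numerical check of the simplification, as a minimal sketch: it assumes two hypothetical features (x and x²), computes Z by midpoint-rule integration on [-10, 10], and uses made-up data and lambdas. It confirms that the literal log-likelihood, the sum of log p(x_n), equals Σ_i λ_i Σ_n φ_i(x_n) − N log Z, so the data only enter through the sums Σ_n φ_i(x_n).

```python
import math

# Minimal sketch, assuming hypothetical features phi_1(x) = x, phi_2(x) = x**2
# and a Z computed numerically on [-10, 10] (the integrand is assumed to be
# negligible outside that range).
FEATURES = [lambda x: x, lambda x: x * x]


def log_partition(lams, lo=-10.0, hi=10.0, steps=20000):
    """log Z, with Z = integral of exp(sum_i lam_i*phi_i(x)) dx via the midpoint rule."""
    h = (hi - lo) / steps
    total = 0.0
    for j in range(steps):
        x = lo + (j + 0.5) * h
        total += math.exp(sum(lam * f(x) for lam, f in zip(lams, FEATURES)))
    return math.log(total * h)


def loglik_direct(data, lams):
    """Sum of log p(x_n | theta), with p(x) = exp(sum_i lam_i*phi_i(x)) / Z."""
    Z = math.exp(log_partition(lams))
    return sum(
        math.log(math.exp(sum(lam * f(x) for lam, f in zip(lams, FEATURES))) / Z)
        for x in data
    )


def loglik_simplified(data, lams):
    """sum_i lam_i * sum_n phi_i(x_n) - N log Z."""
    stats = [sum(f(x) for x in data) for f in FEATURES]  # per-feature sums over the data
    return sum(lam * s for lam, s in zip(lams, stats)) - len(data) * log_partition(lams)


data = [0.2, -1.1, 0.7, 1.9, 0.4]  # made-up data points
lams = [0.5, -0.8]                 # arbitrary lambdas (lambda_2 < 0 keeps Z finite)
print(loglik_direct(data, lams), loglik_simplified(data, lams))  # the two should agree
```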
We want to maximize the log-likelihood with respect to these parameters, subject to the constraint that p(x) is normalized, i.e., integrates to 1 over all space.
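$$\int p(x \mid \vec{\theta})\,dx = \int \frac{1}{Z}\exp\Big(\sum_{j=1}^{k}\lambda_j\phi_j(x)\Big)\,dx = 1 \tag{8}$$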
We then use a Lagrange multiplier α to impose this constraint on the log-likelihood while maximizing, and take derivatives with respect to the lambdas and Z.
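$$\Lambda(\vec{\theta}, \alpha) = \mathcal{L}(\vec{\theta}) + \alpha\Big(\int p(x \mid \vec{\theta})\,dx - 1\Big) \tag{9}$$
$$\frac{\partial \Lambda}{\partial \lambda_i} = \sum_{n=1}^{N}\phi_i(x_n) + \alpha\int \phi_i(x)\,\frac{1}{Z}\exp\Big(\sum_{j=1}^{k}\lambda_j\phi_j(x)\Big)\,dx \tag{10}$$
$$\frac{\partial \Lambda}{\partial Z} = -\frac{N}{Z} + \alpha\,\frac{\partial}{\partial Z}\int \frac{1}{Z}\exp\Big(\sum_{j=1}^{k}\lambda_j\phi_j(x)\Big)\,dx \tag{11}$$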
Assuming Z is just a parameter (a normalization constant) that doesn't depend on x, we can move it outside the integral.
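$$\frac{\partial \Lambda}{\partial Z} = -\frac{N}{Z} - \frac{\alpha}{Z^2}\int \exp\Big(\sum_{j=1}^{k}\lambda_j\phi_j(x)\Big)\,dx \tag{12}$$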
Using equation 8 to replace the integral in the right-hand term and setting the derivative to 0, we get
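$$0 = -\frac{N}{Z} - \frac{\alpha}{Z^2}\,Z \tag{13}$$
$$0 = -\frac{N}{Z} - \frac{\alpha}{Z} \tag{14}$$
$$\alpha = -N \tag{15}$$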
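Here is a small numerical sanity check of α = -N, as a sketch under stated assumptions: a single hypothetical feature φ(x) = x on [0, 1], made-up values for λ, N, and S = Σ_n φ(x_n), and a Lagrangian of the assumed form log-likelihood + α(∫p dx − 1). Holding λ fixed and treating Z as a free parameter, the finite-difference derivative with respect to Z vanishes at the normalizing Z only when α = -N.

```python
import math

# Sanity-check sketch for alpha = -N. Assumptions (illustrative only): one
# feature phi(x) = x on [0, 1], made-up lam, N and S = sum_n phi(x_n), and a
# Lagrangian of the assumed form loglik + alpha * (integral of p dx - 1).


def unnormalized_mass(lam, steps=10000):
    """Integral of exp(lam * x) over [0, 1] via the midpoint rule."""
    h = 1.0 / steps
    return sum(math.exp(lam * (j + 0.5) * h) for j in range(steps)) * h


def lagrangian(Z, lam, alpha, S, N):
    """lam*S - N*log(Z) + alpha * (integral of exp(lam*x)/Z dx - 1), Z treated as free."""
    return lam * S - N * math.log(Z) + alpha * (unnormalized_mass(lam) / Z - 1.0)


def dlagrangian_dZ(Z, lam, alpha, S, N, eps=1e-6):
    """Central finite-difference derivative with respect to Z (lam, alpha fixed)."""
    up = lagrangian(Z + eps, lam, alpha, S, N)
    down = lagrangian(Z - eps, lam, alpha, S, N)
    return (up - down) / (2.0 * eps)


lam, S, N = 0.7, 2.4, 5
Z = unnormalized_mass(lam)  # the value of Z that normalizes p for this lam
print(dlagrangian_dZ(Z, lam, -N, S, N))      # close to 0
print(dlagrangian_dZ(Z, lam, -N + 1, S, N))  # close to -1/Z, not 0
```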
Replacing α in equation 10 with -N and setting the derivative to 0, we get
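$$0 = \sum_{n=1}^{N}\phi_i(x_n) - N\int \phi_i(x)\,p(x \mid \vec{\theta})\,dx \tag{16}$$
$$N\int \phi_i(x)\,p(x \mid \vec{\theta})\,dx = \sum_{n=1}^{N}\phi_i(x_n) \tag{17}$$
$$\int \phi_i(x)\,p(x \mid \vec{\theta})\,dx = \frac{1}{N}\sum_{n=1}^{N}\phi_i(x_n) \tag{18}$$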
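As a cross-check, here is an alternative route that skips the multiplier (not the approach the homework asked for). Assuming p(x | θ) = exp(Σ_j λ_j φ_j(x)) / Z, treat Z as the function of the lambdas fixed by normalization. Differentiating log Z gives the model average of each φ, so the derivative of the log-likelihood reproduces the condition obtained above with α = -N:

$$
Z(\lambda_1, \ldots, \lambda_k) = \int \exp\Big(\sum_{j=1}^{k}\lambda_j\phi_j(x)\Big)\,dx
\quad\Longrightarrow\quad
\frac{\partial \log Z}{\partial \lambda_i} = \frac{1}{Z}\int \phi_i(x)\exp\Big(\sum_{j=1}^{k}\lambda_j\phi_j(x)\Big)\,dx = \int \phi_i(x)\,p(x \mid \vec{\theta})\,dx
$$

$$
\frac{\partial}{\partial \lambda_i}\Big[\sum_{j=1}^{k}\lambda_j\sum_{n=1}^{N}\phi_j(x_n) - N\log Z\Big] = \sum_{n=1}^{N}\phi_i(x_n) - N\int \phi_i(x)\,p(x \mid \vec{\theta})\,dx
$$

Since the second derivatives of log Z form the covariance matrix of the φ's, the log-likelihood is concave in the lambdas, so a solution of this condition is a maximum.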
Finally, substituting equation 2 into equation 18, we get
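$$\int \phi_i(x)\,\frac{1}{Z}\exp\Big(\sum_{j=1}^{k}\lambda_j\phi_j(x)\Big)\,dx = \frac{1}{N}\sum_{n=1}^{N}\phi_i(x_n), \qquad i = 1, \ldots, k \tag{19}$$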
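Equation 19 usually has no closed-form solution, so in practice the lambdas are found numerically. Below is a minimal sketch, under assumptions that are not from the original post: x lives on [-1, 1], the integrals are replaced by a midpoint-rule grid, the features are x and x², and the data are made up. It runs plain gradient ascent on the average log-likelihood, whose gradient for λ_i is the sample average of φ_i minus the model average of φ_i. At convergence, the two sides of equation 19 match.

```python
import math

# Sketch: solve the moment-matching equations numerically. Assumptions (not
# from the original post): x lives on [-1, 1], integrals are approximated with
# a midpoint-rule grid, and the features are phi_1(x) = x and phi_2(x) = x**2.
FEATURES = [lambda x: x, lambda x: x * x]
GRID_STEPS = 400
GRID = [-1.0 + (j + 0.5) * 2.0 / GRID_STEPS for j in range(GRID_STEPS)]


def model_expectations(lams):
    """Approximate integral of phi_i(x) p(x | lambda) dx for each feature i."""
    weights = [math.exp(sum(lam * f(x) for lam, f in zip(lams, FEATURES))) for x in GRID]
    total = sum(weights)  # proportional to Z; the grid spacing cancels in the ratio
    return [sum(w * f(x) for w, x in zip(weights, GRID)) / total for f in FEATURES]


def fit_lambdas(data, lr=1.0, tol=1e-10, max_iter=20000):
    """Return (lambdas, sample averages) after gradient ascent starting from lambda = 0."""
    # Right-hand side of the moment-matching equations: sample average of each feature.
    targets = [sum(f(x) for x in data) / len(data) for f in FEATURES]
    lams = [0.0] * len(FEATURES)
    for _ in range(max_iter):
        # Gradient of the average log-likelihood: sample average minus model average.
        grad = [t - e for t, e in zip(targets, model_expectations(lams))]
        if max(abs(g) for g in grad) < tol:
            break
        lams = [lam + lr * g for lam, g in zip(lams, grad)]
    return lams, targets


data = [-0.5, -0.2, 0.1, 0.3, 0.4, 0.6, 0.8]  # made-up data inside [-1, 1]
lams, targets = fit_lambdas(data)
print("lambdas:", lams)
print("model averages:", model_expectations(lams), "sample averages:", targets)
```

A step size of 1.0 is safe here because the features are bounded on [-1, 1], which caps the curvature of the log-likelihood; with unbounded features a smaller step or a Newton-type method would be a better choice.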