Thoughts on the Science of Computing

Bayesian Networks
January 25, 2007
Currently, in one of my classes,  Reasoning with Partial Beliefs, I am learning about Bayesian Networks.  What are they? Probabilistic networks in the form of Directed Acylicy Graphs that can be used to model belief networks.  From these representations one can make various statements about the probability of the various events composing the system given evidence.

I find them incredibly fascinating because they offer powerful ways to model and make statements about various systems.  The applications range from medical diagnosis, to problems involving Genetic linkage and the reliablity of circuts.

The techniques are relatively new in that there was some skepticism about these techniques maybe 10 or 15 years ago, but this no longer the case. 

One can view Bayesian Networks as separate from Computer Science but the linkage is that in most cases computer programs process the Bayesian Networks and give us the answers from the model we input.  The graph representation of these networks also easily lends itself to computer algorithms. 

In a follow up to this article I will provide a simple example of a Bayesian Network and show how it can be used to model something.  Stay tuned.


Birthday Paradox and Probability of Collisions
September 28, 2009

The birthday paradox is the phenomenon where even with a small fraction of the number of days in the year the probability of two people having the same birthday becomes large.  For instance if there are 36 people in a room (10% of 365) there is a 83% probability that two of them have the same birthday.  With only 23 people we are already over 50%. 

 

This same logic doesn't only apply, though, to birthdays.  One can calculate the probability of collision for any set of elements given the number of samples.  For instance, assume you want to assign a unique number to each of your customers.  And let's say you were going to choose the number at random from a huge pool of numbers say 999,999,999. If you anticpate having 100,000 customers then one would expect with probability 99.3% to sample a number that was previously sampled for another customer, even though you have only selected .01% of the available numbers. I thought it would be fun to write an F# program to calculate this.

 

The probability is calculated as follows P = 1 - Probability of no collisions

 

It is easier to calculate the probability that there aren't collisions and subtract that from one. We can use simple counting principles to calculate the probability of choosing n samples without any collisions. For example if we have 365 days as the elements and we have 3 people (samples), then the first person has 365 possibble days he could choose without collision. The second person has 364 possible days out of 365 and the third 363 possible days. So

Probabiliby of No Collisions = (365/365)*(364/365)*(363/365)
and
Probabiliby of Collision = 1 - (365/365)*(364/365)*(363/365)

Here is the following code which implements the above algorithm in F#

 

let collisionProbability elements samples =
  let probNthSample n =
    let a = (float elements)
    let b = (float (elements - n + 1))
     b / a
  let rec probNoCollision counter answer =
     match counter with
     | 0 -> answer
     | _ -> probNoCollision (counter - 1) (answer * (probNthSample counter))
   in
     1.0 - (probNoCollision samples 1.0)

 

Some things I like about F# are its terseness. You don't have to provide type annotations for everyting unlike in C# or Java. The compiler infers them based on how they are used. Here, I am also using nested functions, which are great for breaking up an algorithm into smaller bits while still maintaining the scope of the functions to where they will be used. Also, nested functions can use values in scope. For instance elements is not passed into probNthSample. This algorithm was coded using tail-recursion to iterate over the different samples. Typically one would use a loop for this in imperative programming, but in a functional style recursion is used more often. (One could also make a list of the samples and fold_left.) Tail-recursion prevents the stack size from growing by not keeping previously called procedures on the stack if there is no additional work to be done. In the inner probNoCollision method, this is the case. The last thing that is used here is pattern matching. Stylistically, I like pattern matching better than if statements.

 

The probNoCollision function calculates what it name says. It multiplies together the probability of all the samples not colliding.

probNthSample is a helper function which just gets a float of the probability of the nth sample not colliding. For instance, in the birthday problem for the 2nd sample this method would return 364/365.

The last step just returns 1-probNoCollision.

 

To run this you could load this in the F# interactive window. And then run the command
collisionProbability 365 23;;
collisionProbability 999999999 100000;;

 

I am really enjoying F# and OCAML programming so far. I will be posting an article soon on permutations that I also coded in F#. This will also be in a functional style and will show more techniques of code reuse that are powerful, that one doesn't typically enjoy in languages like C#.

 

Lastly, I haven't had much time to focus on Yabe as I am working on some research projects. One thing that I will release soon is a replacement of the view engine. I have switched to using StringTemplate. I am enjoying this engine much more. It supports cleaner syntax for views and limits what you can do, programatically, in views. It is also functional in nature. This has resulted in cleaner views.

YABE Blogging Engine

Copyright © 2008 Jeff Bergman