Thoughts on the Science of Computing

Data Mining On Genes
May 23, 2007
One of the hottest application areas for computer science right now is biology.  A large share of research dollars seems to be going toward genetic research, in the hope of unlocking the secrets of diseases.  There is a huge amount of biological data to be mined for patterns.  One freely available data set is the Gensat data, which records which genes are expressed in which regions of the mouse brain, and how strongly.  One type of analysis tries to find links between genes and regions, in the hope of suggesting a possible function for each gene.

I recently did such an analysis on the data using the LSI methods mentioned in a previous article.  LSI is built on a singular value decomposition (SVD) of a matrix.  For this problem I created a genes-by-regions matrix, with one row per gene and one column per region, and in each element recorded a numerical value for the strength of that gene's expression in that region.  Once the SVD is done, one can project both the genes and the regions onto the first two components and see which genes and regions land in the same part of the plot.
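The idea can be sketched in a few lines of Python with NumPy.  This is only an illustration and not the code I used for the analysis: the gene names, region names, expression values, and the helper `lsi_coordinates` are all made up for the example, and scaling the coordinates by the singular values is just one common convention.

```python
import numpy as np


def lsi_coordinates(matrix, k=2):
    """Project the rows and columns of `matrix` onto the top-k singular directions.

    Hypothetical helper for illustration only.
    """
    # Thin SVD: matrix = u @ diag(s) @ vt, with s sorted in descending order.
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    row_coords = u[:, :k] * s[:k]      # one row of coordinates per gene
    col_coords = vt[:k, :].T * s[:k]   # one row of coordinates per region
    return row_coords, col_coords, s


# Made-up toy data: rows are genes, columns are regions,
# entries are invented expression strengths (not Gensat values).
genes = ["geneA", "geneB", "geneC", "geneD"]
regions = ["region1", "region2", "region3"]
expression = np.array([
    [3.0, 0.0, 2.0],
    [2.0, 0.0, 3.0],
    [0.0, 3.0, 0.0],
    [0.0, 2.0, 1.0],
])

gene_xy, region_xy, singular_values = lsi_coordinates(expression, k=2)

# Print the 2-D coordinates; genes and regions that plot near each other
# are candidates for a closer look.
for name, (x, y) in zip(genes + regions, np.vstack([gene_xy, region_xy])):
    print(f"{name:10s} {x:8.3f} {y:8.3f}")
```

Plotting `gene_xy` and `region_xy` on the same axes gives the kind of picture described above.  Whether nearby points mean anything biologically is a separate question.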

Unfortunately, data mining can be a messy business, and my results didn't show anything interesting.  It would help to have a biologist guide me on how to interpret the expression levels and patterns.  Another analysis could take into account the age of the mice in question and see how gene expression varies with age.

This is often the way with data mining: you have to try several techniques before you find the pattern or the key to the data, if there is one.


Copyright © 2008 Jeff Bergman