- A user should be able to enter a list of genes in the first search box, and the results displayed should be the same as when they search by gene id and fold value, except that in place of the Pearson coefficient, the p-value should be displayed. Also - the search results should be sorted by p-value.
Status: Solved this weekend. I created an approximation of the regularized incomplete beta function (which you can then use to calculate the p-value for the t-distribution given the degrees of freedom of the data set). I used the approximation found in Numerical Recipes in C (1992).
Additionally, this approximation relies on another approximation (that of the natural log of the gamma function), and for this I used a modified version of the famous Stirling approximation (Robert H. Windschitl suggested it in 2002 as a formula convenient for programmable calculators because it is fast and does not consume a lot of memory). I tested some of the results against the t-test in R and the results seemed ok. I would like to do a more advanced plot of errors on data with different degrees of freedom, however.
- The default search for the first box (genes + fold values) - should input the fold values in log2.
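The p-value calculation described in the status note for the gene-list search can be sketched in Python roughly as follows. This is only an illustration, not the code running on the site: the function names are made up, a two-tailed p-value is assumed, and the argument shift in `ln_gamma` is one possible "modification" of Windschitl's formula (an assumption, since the notes do not say what was changed). The continued fraction follows the `betacf`/`betai` routines from Numerical Recipes in C.

```python
import math

def ln_gamma(z):
    """ln Gamma(z) for z > 0 using Windschitl's Stirling-type formula.

    Assumption: small arguments are shifted up to z >= 8 with the
    recurrence ln Gamma(z) = ln Gamma(z + 1) - ln z, because the
    formula is only accurate for larger z.
    """
    if z <= 0.0:
        raise ValueError("z must be positive")
    shift = 0.0
    while z < 8.0:
        shift -= math.log(z)
        z += 1.0
    inner = z * math.sinh(1.0 / z) + 1.0 / (810.0 * z ** 6)
    return (shift + 0.5 * (math.log(2.0 * math.pi) - math.log(z))
            + z * (math.log(z) - 1.0 + 0.5 * math.log(inner)))

def _guard(v, tiny=1e-300):
    """Keep a denominator away from zero (modified Lentz method)."""
    return v if abs(v) >= tiny else tiny

def _beta_cf(a, b, x, max_iter=1000, eps=1e-12):
    """Continued fraction for the incomplete beta function."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 / _guard(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even step of the recurrence
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 / _guard(1.0 + aa * d)
        c = _guard(1.0 + aa / c)
        h *= d * c
        # odd step of the recurrence
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 / _guard(1.0 + aa * d)
        c = _guard(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            return h
    raise RuntimeError("continued fraction did not converge")

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b), 0 <= x <= 1."""
    if x < 0.0 or x > 1.0:
        raise ValueError("x must be in [0, 1]")
    if x == 0.0 or x == 1.0:
        bt = 0.0
    else:
        bt = math.exp(ln_gamma(a + b) - ln_gamma(a) - ln_gamma(b)
                      + a * math.log(x) + b * math.log(1.0 - x))
    # use the symmetry relation where the fraction converges faster
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _beta_cf(a, b, x) / a
    return 1.0 - bt * _beta_cf(b, a, 1.0 - x) / b

def t_test_p_value(t, df):
    """Two-tailed p-value for a t statistic with df degrees of freedom."""
    return reg_inc_beta(0.5 * df, 0.5, df / (df + t * t))
```

For example, `t_test_p_value(2.0, 2)` can be checked against the closed form `1 - 2/sqrt(6)` for two degrees of freedom. The same kind of comparison against R's `pt` over a range of degrees of freedom would give the error plot mentioned above.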
- Search by Gene Title (the third search) takes a really long time to filter search results and display an answer. I need to figure out what is up with that.
- Related to this - find a way to speed up all SQL queries in general (if there is a way)
- Fix the heatmap - by overlaying the heatmap with a clickable image map.
- Based upon the size of the result set - the site should return either a heatmap or a bargraph.
Status: This has been partially solved. For moderately large result sets - a heatmap will be returned. Unfortunately - for really large result sets (like 100 genes or so) - generating the heatmap kills the webserver and the script times out. It would be nice to have a zoomable heatmap - but I'm not sure how easy it will be to implement given the constraints of the hardware we are currently running the site on. And of course - if more than two people hit the site at the same time and each try to generate their own heatmap, it will totally die. I might be able to implement this functionality in PHP and then put it on the site if we ever go to a fully dedicated server. From my Google searches, I do not believe it has ever been done before in PHP (and maybe there is a good reason for that). What we need is really fast image-generating code (perhaps in Python?) which I can run on the server, and then use PHP to deliver the images to the client. Pretty sure downtown hosting will not let me run my own C code to generate images on their server (even if we have a dedicated server). But they do have Python installed.
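As a rough idea of what server-side heatmap generation in Python could look like, here is a minimal sketch that writes a PNG using only the standard library (`struct` and `zlib`), so nothing extra would have to be installed on the host. Everything in it is an assumption made for illustration: the function names, the green/black/red colour scale for log2 fold values, the `cell` size and the `limit` of 3.0 are all hypothetical, and it has not been benchmarked against the PHP version.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def _png_chunk(tag, data):
    """Build one PNG chunk: length, tag, data, CRC over tag + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)

def fold_to_rgb(value, limit):
    """Map a log2 fold value to a colour (hypothetical scale).

    Values are clamped to [-limit, limit]; positive values shade
    towards red, negative values towards green, zero is black.
    """
    frac = max(-1.0, min(1.0, value / limit))
    level = int(round(255 * abs(frac)))
    return (level, 0, 0) if frac >= 0 else (0, level, 0)

def heatmap_png(rows, cell=10, limit=3.0):
    """Return PNG bytes for a grid of values, one cell x cell block each."""
    height = len(rows) * cell
    width = len(rows[0]) * cell
    raw = bytearray()
    for row in rows:
        line = bytearray([0])  # filter type 0 (none) for this scanline
        for value in row:
            line += bytes(fold_to_rgb(value, limit)) * cell
        raw += line * cell  # repeat the scanline to make square cells
    # 8-bit depth, colour type 2 (truecolour RGB), no interlacing
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    return (PNG_SIGNATURE
            + _png_chunk(b"IHDR", ihdr)
            + _png_chunk(b"IDAT", zlib.compress(bytes(raw)))
            + _png_chunk(b"IEND", b""))
```

A script could write the result to disk with `open("heatmap.png", "wb").write(heatmap_png(rows))` and let PHP serve the file. Because every cell has a fixed size and position, the coordinates for a clickable image map could be computed from the same `cell` value.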
