Arraysearch bugs which need to be fixed
[info]arraysearch


  1. A user should be able to enter a list of genes in the first search box, and the results displayed should be the same as when they search by gene ID and fold value, except that the p-value should be displayed in place of the Pearson coefficient. Also - the search results should be sorted by p-value.



    Status: Solved this weekend. I created an approximation of the regularized incomplete beta function (which can then be used to calculate the p-value for the t-distribution given the degrees of freedom of the data set). I used the approximation found in Numerical Recipes in C (1992).
    Additionally, this approximation relies on another approximation (that of the natural log of the gamma function), and I used a modified version of the famous Stirling approximation for this (Robert H. Windschitl suggested it in 2002 as a formula convenient for inputting into programmable calculators because it is fast and does not consume a lot of memory. Scroll down to the bottom for his formula). I tested some of the results against the t-test in R and the results seemed OK. I would still like to do a more thorough plot of the errors on data with different degrees of freedom, however.

  2. The default search for the first box (genes + fold values) should take the fold values on a log2 scale.



  3. Search by Gene Title (the third search) takes a really long time to filter search results and display an answer. I need to figure out what is up with that.


  4. Related to this - find a way to speed up all SQL queries in general (if there is a way)

  5. Fix the heatmap - by overlaying the heatmap with a clickable image map.

  6. Based upon the size of the result set - the site should return either a heatmap or a bargraph.

    Status: This has been partially solved. For moderately large result sets - a heatmap will be returned. Unfortunately - for really large result sets (like 100 genes or so) - this will kill the webserver and the script will time out. It would be nice to have a zoomable heatmap - but I'm not sure how easy it will be to implement given the constraints of the hardware we are currently running the site on. And of course - if more than two people hit the site at the same time and each try to generate a heatmap, it will totally die. I might be able to implement this functionality in PHP and then put it on the site if we ever go to a fully dedicated server. From my Google searches, I do not believe it has ever been done before in PHP (and maybe there is a good reason for that). What we need is really fast image-generating code (perhaps in Python?) which I can run on the server, and then use PHP to deliver the images client-side. I am pretty sure Downtown Hosting will not let me run my own C code to generate images on their server (even if we have a dedicated server). But they do have Python installed.
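
For item 1 above, the p-value calculation can be sketched roughly as follows. This is a minimal illustration and not the code running on the site: the function names (lnGamma, betaContinuedFraction, regularizedIncompleteBeta, tTestPValue) are hypothetical, the shift to larger arguments in lnGamma is an added assumption to keep the Windschitl formula accurate for small inputs, and the continued fraction follows the general Lentz-style approach described in Numerical Recipes rather than reproducing it. It assumes a two-sided test, for which p = I_x(df/2, 1/2) with x = df / (df + t^2).

```php
<?php
// ln Gamma(z) for z > 0 via Windschitl's approximation. Small z is shifted
// upward with Gamma(z) = Gamma(z + 1) / z (an assumption of this sketch),
// because the approximation is more accurate for larger arguments.
function lnGamma($z)
{
    $shift = 0.0;
    while ($z < 8.0) {
        $shift += log($z);
        $z += 1.0;
    }
    $inner = $z * sinh(1.0 / $z) + 1.0 / (810.0 * pow($z, 6));
    $lnG = 0.5 * (log(2.0 * M_PI) - log($z))
         + $z * (log($z) - 1.0 + 0.5 * log($inner));
    return $lnG - $shift;
}

// Continued fraction for the incomplete beta function (modified Lentz method).
function betaContinuedFraction($a, $b, $x)
{
    $tiny = 1.0e-300;   // guards against division by zero
    $eps  = 1.0e-12;    // convergence tolerance
    $c = 1.0;
    $d = 1.0 - ($a + $b) * $x / ($a + 1.0);
    if (abs($d) < $tiny) { $d = $tiny; }
    $d = 1.0 / $d;
    $h = $d;
    for ($m = 1; $m <= 200; $m++) {
        $m2 = 2 * $m;
        // Even step of the recurrence.
        $aa = $m * ($b - $m) * $x / (($a - 1.0 + $m2) * ($a + $m2));
        $d = 1.0 + $aa * $d;
        if (abs($d) < $tiny) { $d = $tiny; }
        $c = 1.0 + $aa / $c;
        if (abs($c) < $tiny) { $c = $tiny; }
        $d = 1.0 / $d;
        $h *= $d * $c;
        // Odd step of the recurrence.
        $aa = -($a + $m) * ($a + $b + $m) * $x / (($a + $m2) * ($a + 1.0 + $m2));
        $d = 1.0 + $aa * $d;
        if (abs($d) < $tiny) { $d = $tiny; }
        $c = 1.0 + $aa / $c;
        if (abs($c) < $tiny) { $c = $tiny; }
        $d = 1.0 / $d;
        $delta = $d * $c;
        $h *= $delta;
        if (abs($delta - 1.0) < $eps) { break; }
    }
    return $h;
}

// Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1.
function regularizedIncompleteBeta($a, $b, $x)
{
    if ($x <= 0.0) { return 0.0; }
    if ($x >= 1.0) { return 1.0; }
    $bt = exp(lnGamma($a + $b) - lnGamma($a) - lnGamma($b)
        + $a * log($x) + $b * log(1.0 - $x));
    // Use the symmetry relation where the continued fraction converges faster.
    if ($x < ($a + 1.0) / ($a + $b + 2.0)) {
        return $bt * betaContinuedFraction($a, $b, $x) / $a;
    }
    return 1.0 - $bt * betaContinuedFraction($b, $a, 1.0 - $x) / $b;
}

// Two-sided p-value for a t statistic with $df degrees of freedom.
function tTestPValue($t, $df)
{
    return regularizedIncompleteBeta($df / 2.0, 0.5, $df / ($df + $t * $t));
}
```

A call such as tTestPValue(2.0, 2) can be checked against the closed form for two degrees of freedom, 1 - t / sqrt(2 + t^2), or against 2 * pt(-2, 2) in R.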


Updates for September 9, 2009
I have been going over the PHP code for the website and cleaning it up. The sections of code for pChart which I hacked to create the heatmap and bargraph plots are really ugly, and I will have to tackle them next. Anyway - Dr. Ge gave me a to-do list of things to finish as soon as possible.



  1. Instead of 'Searching by Gene Fold Value', the title for this search should be 'Search for Similar Expression Profiles'

  2. Widen the text area for the searches (especially the first one)

  3. Input list of genes and fold values in log2 scale

  4. Display graphical as well as textual results in this first search (i.e. show a heatmap/bargraph)

  5. Currently we are sorting by maximum positive Pearson coefficient. We should also return results with high negative correlations.

  6. The query could also be just a list of genes. Use a t-test to return significant contrasts.

  7. Dynamically switch between bargraph and heatmap.

  8. search_ontology.php : This page looks messed up in IE even though it looks great in Firefox. Find out why.
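
For item 5 in the list above, one possible approach is to rank by the absolute value of the Pearson coefficient, so that strongly negative correlations surface alongside strongly positive ones. The sketch below is only an illustration under assumptions: the function names (pearson, rankByAbsoluteCorrelation) and the array layout (contrast name => list of expression values) are hypothetical and are not taken from the site's code.

```php
<?php
// Pearson correlation between two equal-length numeric arrays.
// Returns null when the inputs are too short, mismatched, or flat.
function pearson(array $x, array $y)
{
    $x = array_values($x);
    $y = array_values($y);
    $n = count($x);
    if ($n < 2 || $n !== count($y)) {
        return null;
    }
    $meanX = array_sum($x) / $n;
    $meanY = array_sum($y) / $n;
    $sxy = 0.0;
    $sxx = 0.0;
    $syy = 0.0;
    for ($i = 0; $i < $n; $i++) {
        $dx = $x[$i] - $meanX;
        $dy = $y[$i] - $meanY;
        $sxy += $dx * $dy;
        $sxx += $dx * $dx;
        $syy += $dy * $dy;
    }
    if ($sxx == 0.0 || $syy == 0.0) {
        return null; // a constant profile has no defined correlation
    }
    return $sxy / sqrt($sxx * $syy);
}

// Rank profiles by |r| in descending order, keeping the sign of r in the output.
function rankByAbsoluteCorrelation(array $query, array $profiles)
{
    $results = array();
    foreach ($profiles as $name => $profile) {
        $r = pearson($query, $profile);
        if ($r !== null) {
            $results[] = array('contrast' => $name, 'r' => $r);
        }
    }
    usort($results, function ($a, $b) {
        $diff = abs($b['r']) - abs($a['r']);
        return $diff > 0 ? 1 : ($diff < 0 ? -1 : 0);
    });
    return $results;
}

// Hypothetical example data: one query profile and three stored profiles.
$ranked = rankByAbsoluteCorrelation(
    array(1, 2, 3),
    array(
        'weak' => array(1, 3, 2),
        'up'   => array(1, 2, 4),
        'down' => array(3, 2, 1),
    )
);
```

Because the sign of r is kept in each result, the results page could still label a hit as positively or negatively correlated after sorting.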




Additionally, I am thinking strongly about moving this journal over to wikidot (the support for LaTeX code being a prime motivating factor).
Note to self: Set up office hours as soon as possible.

LJ Developer Blog created
This is the new development blog for the Arraysearch research project at South Dakota State University. Primarily, I intend to use this site to document bugs in the source code for the arraysearch website, as well as to discuss where I currently am in developing new functions and algorithms.

Yesterday:  I changed the DNS nameservers at GoDaddy so that now they point to Downtown Hosting.

http://www.arraysearch.org now redirects to http://zanshin3d.net/arraysearch/arraysearch.php

Currently working on this function:

function getExpressionMatrix($db, $query) {
    $expressions = array();
    $contrasts = array();

    // Run the search query; each result row is one line of the matrix.
    $dtls = ExecuteQuery($db, $query);

    // Read the column names of tbl_signature into $contrasts.
    $getColumns = "SHOW columns FROM tbl_signature";
    $cNames = ExecuteQuery($db, $getColumns);
    while ($row = mysql_fetch_array($cNames, MYSQL_BOTH)) {
        $contrasts[] = $row[0];
    }
    $numCols = count($contrasts);

    // Copy the values into a [row][column] matrix.
    // The loop starts at 1, so the first column of tbl_signature is skipped.
    $rowNum = 0;
    while ($row = mysql_fetch_array($dtls, MYSQL_ASSOC)) {
        for ($colNum = 1; $colNum < $numCols; $colNum++) {
            $expressions[$rowNum][$colNum] = $row[$contrasts[$colNum]];
        }
        $rowNum++;
    }
    $numRows = $rowNum;

    return array($expressions, $numRows, $numCols);
}


The matrix returned from this function is sometimes HUGE - and I need to limit its potential size - while allowing the user to navigate through the data quickly and easily.
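
One way the row count could be capped is to fetch the matrix one page at a time with a MySQL LIMIT ... OFFSET clause. The sketch below is only an illustration under assumptions: buildPagedQuery and countPages are hypothetical helpers, the default page size of 50 is arbitrary, and the incoming query is assumed not to carry a LIMIT clause of its own.

```php
<?php
// Append a LIMIT/OFFSET clause so that only one page of rows is fetched.
// $page is 1-based; both values are cast to int before being placed in the SQL.
function buildPagedQuery($query, $page, $pageSize = 50)
{
    $page = max(1, (int) $page);
    $pageSize = max(1, (int) $pageSize);
    $offset = ($page - 1) * $pageSize;
    $query = rtrim($query, "; \t\n\r"); // drop a trailing semicolon or whitespace
    return $query . ' LIMIT ' . $pageSize . ' OFFSET ' . $offset;
}

// Number of pages needed to show $totalRows rows.
function countPages($totalRows, $pageSize = 50)
{
    $pageSize = max(1, (int) $pageSize);
    return (int) ceil(max(0, (int) $totalRows) / $pageSize);
}

// Example: the third page of 50 rows.
$paged = buildPagedQuery("SELECT * FROM tbl_signature;", 3, 50);
```

The paged query string could then be passed to getExpressionMatrix in place of the original query. This only bounds the number of rows; the number of columns would still be everything in tbl_signature.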




