Dr Raj Acharya, Professor and Head of the Department of Computer Science and Engineering at Penn State University, University Park, is currently giving a keynote speech on "A Grid-based Virtual Center in the Framework of Information Fusion". The goal of the project is to take in biomarkers at the subject level and other levels and put them in a data warehouse. On top of that, an information fusion box is built so that we can make sense of the data.

The National Cancer Institute has started a new project called the Cancer Grid to support cancer research, so that individual nodes or cancer centers can share applications on a grid and perform analyses. The proposal was to simulate a virtual cancer center: one that allows data to be shared, and whose applications do not reside in just a few elite centers but are available everywhere.

Information fusion, the main concept behind this project, holds that the whole is greater than the sum of its parts. It enables one to analyse multiple, disparate biomedical data sets simultaneously. One idea is to use wireless technology for data collection. To this end, the team has developed a prototype grid-based data warehouse for cancer data, which simulates aspects of a virtual cancer center scenario.

The cancer research grid allows patient, clinical and pathology information, gene expression information and public data (literature, etc.) to be collated so that all of it can be used by cancer analysis applications. The basic idea is to treat the information as a set of facts and dimensions. Facts are the objects to be analysed, and they are analysed with respect to the dimensions. For example, for the Patient Demographics dataset, the user query "What percentage of the prostate cancer patients are African American and fall in the age range of 50-60?" has the fact "percentage of patients" and the dimensions "race" and "age at diagnosis".
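To make the fact/dimension idea concrete, here is a minimal Python sketch of the example query. The `PatientRecord` type, the `percentage_of_patients` function and the records are hypothetical illustrations, not the project's warehouse schema; a real data warehouse would answer this with an aggregate query over a fact table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRecord:
    # Hypothetical dimension attributes for one row of a patient demographics table.
    cancer_type: str
    race: str
    age_at_diagnosis: int


def percentage_of_patients(records, cancer_type, race, age_lo, age_hi):
    """Fact: percentage of `cancer_type` patients that fall in the slice
    race == `race` and age_lo <= age_at_diagnosis <= age_hi (the dimensions)."""
    cohort = [r for r in records if r.cancer_type == cancer_type]
    if not cohort:
        return 0.0
    in_slice = [r for r in cohort
                if r.race == race and age_lo <= r.age_at_diagnosis <= age_hi]
    return 100.0 * len(in_slice) / len(cohort)


# Toy records, invented purely for illustration.
records = [
    PatientRecord("prostate", "African American", 55),
    PatientRecord("prostate", "African American", 67),
    PatientRecord("prostate", "White", 58),
    PatientRecord("prostate", "African American", 52),
    PatientRecord("breast", "African American", 54),
]

print(percentage_of_patients(records, "prostate", "African American", 50, 60))  # prints 50.0
```

Changing the race or age arguments slices the same fact along different values of the dimensions, which is the kind of roll-up and drill-down a multidimensional warehouse is built for.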

Multidimensional analysis helps discover simple patterns and associations among the various data sets. Building on this, a toolkit may include multidimensional analysis, combined clustering, correspondence analysis, homogeneity analysis and other components. Correspondence analysis examines whether there is any association between the categories in the data. For the patient demographics data set, this tool helps answer questions such as: which age/race profiles, if any, define a typical prostate cancer patient?
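The talk did not say how association is tested. As an illustrative assumption, the standard Pearson chi-square statistic of independence on an age-group by race contingency table is a natural first check, and it is also the quantity that correspondence analysis decomposes. The helper names and data below are hypothetical.

```python
from collections import Counter


def contingency_table(pairs):
    """Cross-tabulate (row_category, column_category) pairs into a count matrix."""
    counts = Counter(pairs)
    rows = sorted({a for a, _ in pairs})
    cols = sorted({b for _, b in pairs})
    table = [[counts[(a, b)] for b in cols] for a in rows]
    return rows, cols, table


def chi_square_statistic(table):
    """Pearson chi-square statistic for independence of rows and columns.
    Assumes every row and column total is nonzero."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if age group and race were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical (age group, race) pairs for prostate cancer patients.
pairs = [("50-60", "African American")] * 12 + [("50-60", "White")] * 6 \
      + [("60-70", "African American")] * 5 + [("60-70", "White")] * 15
rows, cols, table = contingency_table(pairs)
print(rows, cols, table)
print(chi_square_statistic(table))
```

A large statistic relative to a chi-square distribution with (rows - 1)(cols - 1) degrees of freedom suggests that age group and race are associated; correspondence analysis then shows which categories drive that association.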

Correspondence analysis is therefore essentially a dimension-reduction tool, and it can also be generalised into a multivariate analysis tool.
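To illustrate why correspondence analysis counts as dimension reduction, the sketch below computes its first dimension in plain Python. It assumes the standard formulation (a singular value decomposition of the standardised residuals of a two-way contingency table), with the leading singular vector found by power iteration instead of a full SVD. `first_ca_axis` and the counts are hypothetical, and the project's own tool may differ. Every row and column must have a nonzero total.

```python
import math
import random


def first_ca_axis(table, iters=500, seed=0):
    """Return (principal_inertia, total_inertia, row_coords, col_coords) for the
    first correspondence-analysis dimension of a two-way contingency table."""
    n = float(sum(sum(row) for row in table))
    I, J = len(table), len(table[0])
    r = [sum(row) / n for row in table]                              # row masses
    c = [sum(table[i][j] for i in range(I)) / n for j in range(J)]   # column masses
    # Standardised residuals: S_ij = (p_ij - r_i * c_j) / sqrt(r_i * c_j)
    S = [[(table[i][j] / n - r[i] * c[j]) / math.sqrt(r[i] * c[j])
          for j in range(J)] for i in range(I)]
    total_inertia = sum(s * s for row in S for s in row)            # equals chi-square / n

    # Power iteration on S^T S converges to the leading right singular vector v.
    rng = random.Random(seed)
    v = [rng.random() for _ in range(J)]
    for _ in range(iters):
        sv = [sum(S[i][j] * v[j] for j in range(J)) for i in range(I)]   # S v
        w = [sum(S[i][j] * sv[i] for i in range(I)) for j in range(J)]   # S^T S v
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # rows and columns exactly independent: no inertia to explain
            return 0.0, total_inertia, [0.0] * I, [0.0] * J
        v = [x / norm for x in w]

    sv = [sum(S[i][j] * v[j] for j in range(J)) for i in range(I)]
    sigma = math.sqrt(sum(x * x for x in sv))                        # leading singular value
    u = [x / sigma for x in sv]
    # Principal coordinates: F = D_r^(-1/2) U sigma, G = D_c^(-1/2) V sigma.
    row_coords = [sigma * u[i] / math.sqrt(r[i]) for i in range(I)]
    col_coords = [sigma * v[j] / math.sqrt(c[j]) for j in range(J)]
    return sigma ** 2, total_inertia, row_coords, col_coords


# Hypothetical age-group x race counts, for illustration only.
age_groups = ["<50", "50-60", "60-70", ">70"]
counts = [[8, 5, 2], [20, 15, 4], [12, 25, 6], [5, 18, 3]]
inertia1, total, rows, cols = first_ca_axis(counts)
print(f"axis 1 explains {100 * inertia1 / total:.1f}% of the inertia")
for group, x in zip(age_groups, rows):
    print(group, round(x, 3))
```

Each age group is reduced to a single coordinate on the first axis, and groups with similar race profiles land close together; the sign of the axis is arbitrary. A library SVD would return all dimensions at once, while power iteration keeps the sketch free of dependencies.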

Information fusion may also be applied to gene expression analysis. Genes encoded in DNA are expressed as proteins (Dr Raj skipped a couple of slides at this point). He looked into the co-regulation of genes, asking whether expression profiles can be used to determine which genes are co-regulated. He then covered KL clustering, in which the Kullback-Leibler (KL) divergence measures the relative dissimilarity of the shapes of two gene profiles. Next, common motifs were examined and combined with the gene expression data, with the aim of identifying clusters of genes that share similar properties across all the data.
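The talk did not detail the clustering algorithm, so the following is only a sketch under assumptions: each positive expression profile is normalised to sum to 1 so that only its shape matters, KL(p || q) = sum_i p_i log(p_i / q_i) measures dissimilarity, and genes are grouped with a k-means-style loop that uses KL divergence in place of Euclidean distance. All function names and profiles are hypothetical.

```python
import math


def to_shape(profile):
    """Normalise a positive expression profile to sum to 1, keeping only its shape."""
    total = sum(profile)
    return [x / total for x in profile]


def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i); assumes every q_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kl_cluster(profiles, initial_centroids, iters=20):
    """k-means-style clustering with KL divergence as the dissimilarity."""
    shapes = [to_shape(p) for p in profiles]
    centroids = [to_shape(c) for c in initial_centroids]
    labels = [0] * len(shapes)
    for _ in range(iters):
        # Assign each gene to the centroid whose shape diverges least from its own.
        labels = [min(range(len(centroids)),
                      key=lambda k: kl_divergence(s, centroids[k]))
                  for s in shapes]
        # The mean of the member shapes minimises the summed KL(p || centroid).
        for k in range(len(centroids)):
            members = [s for s, lab in zip(shapes, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical expression profiles over three conditions.
genes = [[1, 2, 4], [2, 4, 9], [1, 2, 3], [4, 2, 1], [9, 4, 2], [3, 2, 1]]
labels, centroids = kl_cluster(genes, initial_centroids=[[1, 1, 2], [2, 1, 1]])
print(labels)
```

Because profiles are normalised first, two genes whose expression rises and falls together at different overall levels have near-zero divergence, which matches the idea of comparing profile shapes. KL divergence is asymmetric, so the direction KL(gene || centroid) used here is itself a design choice.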

Some experimental data are available, but they cannot be made public because the investigations are still ongoing.

In conclusion, a proposal for a virtual cancer center was discussed, the caBIG framework was described, and preliminary results on an information fusion data warehouse on the grid were presented.


