Microarray Analysis through Transcription Kinetic
Modeling and Information Theory
Abdallah Sayyed-Ahmad, Kagan Tuncay and Peter Ortoleva*
Center for Cell and Virus Theory, Department of Chemistry, Indiana University, Bloomington IN 47405,
USA
ABSTRACT
cDNA microarray and other multiplex data hold promise for addressing the challenges of cellular complexity,
disease progression and drug discovery. We believe that combining transcription kinetic modeling with
microarray time series data through information theory will yield more information about the gene regulatory
networks than obtained previously. A novel analysis of gene regulatory networks is presented based on the
integration of microarray data and cell modeling through information theory. Given a partial network and time
series data, a probability density is constructed that is a functional of the time course of intra-nuclear
transcription factor (TF) thermodynamic activities, and is a function of RNA degradation and transcription
rate and equilibrium constants for TF/gene binding. The most probable TF time courses and the values of
aforementioned parameters are computed. Accuracy and robustness of the method are evaluated and an
application to Escherichia coli is demonstrated. A kinetic (and not a steady state) formulation allows us to
analyze phenomena with a strongly dynamical character (e.g. the cell cycle, metabolic oscillations, viral
infection or response to changes in the extra-cellular medium).
1 INTRODUCTION
cDNA microarray (Schena et al., 1995; DeRisi et al., 1997; Sauter et al., 2003) and other multiplex data acquisition
techniques yield a great volume of information and thereby hold promise for addressing the challenges of
cellular complexity, disease progression and drug discovery (Brown and Botstein, 1999; Debouck and
Goodfellow, 1999; Gerhold et al., 1999; Chitler, 2004). Recently we proposed an approach to the analysis
of such data based on its integration with cell modeling through information theory (Sayyed-Ahmad et al.,
2003). Here we show how this approach can be extended to analyze microarray time series data. The
approach has the potential to yield more information on the network of cellular processes from this data
than existing methods. Cell models hold great promise for understanding the complex networks of
processes underlying cell behavior (Slepchenko et al., 2003; Weitzke and Ortoleva, 2003; Navid and
Ortoleva, 2004). Unfortunately they suffer from a lack of information about many of the rate and
equilibrium constants for reaction and transport processes (Mendes and Kell, 1998; Sayyed-Ahmad et al.,
2003). Furthermore, key aspects of the biochemical network have yet to be resolved, i.e. we are presented
with the challenge of calibrating and using an incomplete model. In contrast, microarray, protein mass
spectroscopy, NMR and other multiplex data acquisition techniques yield many simultaneous
measurements but they are often only indirectly related to the quantities we seek (e.g. protein and mRNA
production and degradation rates and binding constants or the stoichiometry of the biochemical network).
Our MACI (Model-Automated Cell Informatics) approach integrates cell modeling and multiplex data
analysis into one seamless endeavor, using the strengths of one to overcome weaknesses in the other.
* To whom correspondence should be addressed. E-mail:[email protected], Telephone: 812-855-2717, Fax: 812-855-8300
The state of a cell is specified by a set of variables $\Psi$ for which we know the governing equations
and a set $T(t)$ which are at the frontier of our understanding and for which we do not know the governing
equations. The challenge is that the dynamics of $\Psi$ is given by a cell model,
\[ \frac{d\Psi}{dt} = G(\Psi, T, \Lambda), \qquad (1) \]
in which the rate $G$ depends not only on many rate and equilibrium constants $\Lambda$, but also on the frontier
variables $T(t)$ at each time. Thus the model cannot be solved completely (i.e. $\Psi$ can only be determined
as a functional of the unknown time course $T(t)$).
A microarray time series data set $M$ reflects the evolving state of a cell and hence one should in
principle be able to construct a synthetic time series $M^{syn}$ from the predicted RNA time course (or more
generally $\Psi$ and $T(t)$). By solving (1) one can determine $\Psi$ as a function of $\Lambda$ and the initial
state $\Psi(t=0) \equiv \Psi_0$ and as a functional of $T(t)$ over the time of the experiment (i.e. $\Psi = \Psi[T, \Lambda, \Psi_0)$). In this
way the constructed microarray data, $M^{syn}$, is a function of $\Lambda$ and $\Psi(0)$ and a functional of $T(t)$. An
attractive aspect of the present approach is that the physics and chemistry built into a kinetic cell model
put strong constraints on the relationship between $\Psi$ and $T(t)$, $\Lambda$ and $\Psi(0)$ to facilitate the
determination of these variables from time series data. In a sense, physics and chemistry enhance the
solvability of the inverse problem, i.e. the determination of $T(t)$, $\Lambda$ and $\Psi(0)$ given the time series
microarray data.
In addressing cellular complexity, we must account for the omnipresent uncertainty in observed
data and the structure of a cell model. Thus a probabilistic framework is needed. We suggest that the
probability of interest (denoted $\rho$) is a function of $\Lambda$ and $\Psi(0)$ and a functional of the time course $T(t)$
of the frontier variables. We use information theory to construct $\rho$, develop a model (e.g. a realization of
(1)), and introduce time series microarray data as a way to constrain $\rho$ (i.e. we know $M$ so, in principle,
some $T(t)$, $\Lambda$ and $\Psi_0$ are more likely to be consistent with it than others).
Time series experiments involve monitoring a synchronized culture of cells over their cycle or
observing changes due to time-varying conditions in the extra-cellular medium. Other dynamical
phenomena of interest involve behaviors in response to nuclear transplantation, fertilization or viral
infection, or the time course of normal development, gene insertion/deletion, radiation, heat shock and
transitions to abnormality or drug resistance. All these phenomena, and the analysis of time series data on
them, require kinetic approaches if the full dynamics of these systems is to be explored. For example,
steady state approaches can only yield ratios of rate coefficients and not all coefficients independently.
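To make the last point concrete, consider a generic species with population $R$, produced at a constant rate $k$ and degraded with first-order constant $\lambda$ (an illustrative simplification introduced here, not a model taken from the text):

\[ \frac{dR}{dt} = k - \lambda R, \qquad R(t) = \frac{k}{\lambda} + \left( R(0) - \frac{k}{\lambda} \right) e^{-\lambda t}. \]

At steady state only $R_{ss} = k/\lambda$ is observable, so $k$ and $\lambda$ cannot be separated. A time course, in contrast, fixes $\lambda$ through the relaxation time $1/\lambda$ and then $k$ through $R_{ss}$.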
A number of methods have been proposed for extracting information and overcoming systematic
errors from microarray data after it has been preprocessed (e.g. through image analysis, statistical
approaches and channel normalization) (Wilson et al., 2002; Hyduke et al., 2003; Smyth and Speed, 2003).
Among them are Boolean network models (Shmulevich et al., 2002; Shmulevich et al., 2003), Bayesian
network models (Pe'er et al., 2001), Bayesian statistics (Friedman et al., 2000; Li et al., 2002), cluster
analysis (Azuaje, 2002; Bolshakova and Azuaje, 2003), independent component analysis (ICA)
(Liebermeister, 2002), principal component analysis (PCA) (Holter et al., 2000; Holter et al., 2001) and
network component analysis (NCA) (Liao et al., 2003; Kao et al., 2004). These techniques assume that the
system is at steady state. In Boolean and Bayesian network models the goal is to infer gene regulatory
network structure. Cluster analysis, Bayesian statistics, ICA, and PCA classify genes into groups; genes that
have similar expression profiles are identified as functionally similar. NCA differs from other techniques in
that the structure of the transcription control network is assumed to be known. Furthermore, a number of
assumptions are made in NCA to arrive at the final model. The MACI algorithm presented here also
requires the structure of the transcription control network. However, we place no restrictions on the
structure of this network, use a kinetic model (1), construct synthetic time series microarray data, apply a
physically motivated regularization constraint for the time-dependence of TF activities and then derive the
information of interest. We determine rate and equilibrium constants that appear in our chemical kinetic
formulation of RNA polymerase binding to the genes. Rate constants for transcription and RNA
degradation are also obtained. The use of a kinetic model also allows us to generalize our approach to
proteomic and metabolic data either by themselves or with cDNA microarray data. Our method is
significantly different from the approach proposed by Gardner et al. (2003), whose objective is to construct
the gene control network using microarray data. However, we demonstrate how our method can be used
to fill in incomplete networks and to identify inconsistencies of a proposed network with observed data.
2 A TRANSCRIPTION MODEL
2.1 The Model
In the MACI approach, one no longer requires a comprehensive cell model to make quantitative predictions
even though processes within and among the genome, proteome and metabolome are all strongly coupled.
Rather our approach only requires an incomplete model wherein governing equations for the variables $\Psi$
of (1) are set forth; the model must contain those variables with which one may construct synthetic
data of the type available. Thus the following model is based on equations for RNA levels so that synthetic
microarray data can be constructed. But to do so one must know the time course of TFs as they up/down
regulate genes. Specifically, to implement our approach the TF intra-nuclear thermodynamic activities are
identified as the frontier variables $T(t)$ of (1). Closure is obtained by using time series data to generate
functional differential equations for the $T(t)$. Thus one need not make oversimplified assumptions on
$T(t)$ to compute $\Psi$, i.e. one may calibrate and run an incomplete transcription model even though the
dynamical variables (e.g. RNA populations) are strongly coupled to $T(t)$.
We first develop a forward model in which, given a set of time courses of TF thermodynamic
activities, $T(t)$, the time course of intra-cellular RNA populations $R(t)$ is predicted. Due to the dense
environment within the nucleus, TF thermodynamic activities are preferred over concentrations. With such
a model and time series microarray data, we show in the next section how transcription rates and other
central genome model parameters, as well as the most probable $T(t)$, can be obtained, and how the gene control network
can be quantitatively characterized. Finally, note that the following model is rather simple. While more
complete models could be used (Weitzke and Ortoleva, 2003), our purpose here is to demonstrate the
approach and not to address the supercomputer challenge of a very detailed approach. Given a transcription
control network with $N_{TF}$ TFs and $N_g$ genes, the i-th gene (gi), $i = 1, 2, \ldots, N_g$, is assumed to have $N_s^i$ TF
binding sites labeled $j = 1, 2, \ldots, N_s^i$. Assume the binding at any site is independent of the state of others and
that only one type of TF can bind to each site. While the latter two assumptions are not inherent
restrictions of our approach, they are made here for simplicity. Let $n_{ij}$ label the TF that can bind to site j on
gene i. Let $b_{ij}$, equal to minus one or plus one, indicate the nature of regulation (down or up) of TF type $n_{ij}$ on gi.
At TF/gene equilibrium, the probability $H_i$ that gi is available for RNA polymerase (RP) complexing is
taken to be given by an equilibrium Langmuir uncompetitive adsorption isotherm (Laidler and Meiser,
1995)
\[ H_i = \prod_{j=1}^{N_s^i} \frac{\left( Q_{ij} T_{n_{ij}} \right)^{(1+b_{ij})/2}}{1 + Q_{ij} T_{n_{ij}}}, \qquad (2) \]
for binding constant $Q_{ij}$ (liter/mole) and intra-nuclear activity $T_{n_{ij}}$ of the $n_{ij}$-th TF. The rate of RNA
transcription initiation is written as
\[ k_i = k_i^{max} H_i(Q, b, T), \qquad (3) \]
where $k_i^{max}$ is a saturation rate coefficient that we suggest is diffusion-limited. After RP binds to a gene,
RNA elongation commences. If nucleotide concentrations are roughly steady during transcription and the
RP advancement velocity is $u_i$ (nucleotides/sec), then the transcription polymerization rate $A_i$ is taken to be
\[ \frac{1}{A_i} = \frac{1}{k_i [RP]} + \frac{N_{nuc}^i}{u_i}, \qquad (4) \]
a form which captures the rate limiting step of the two serial processes (RP binding and transcription).
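The limiting behavior of the series form (4) can be checked numerically. The short Python sketch below reads (4) as the sum of a mean RP binding time and an elongation time; the function name, argument names and numerical values are hypothetical and chosen only to show the two limits.

```python
def polymerization_rate(k_i, rp, n_nuc, u_i):
    """Series combination of Eq. (4): 1/A = 1/(k_i*[RP]) + N_nuc/u_i."""
    binding_time = 1.0 / (k_i * rp)   # mean time for RP to bind the gene
    elongation_time = n_nuc / u_i     # time for RP to traverse the gene
    return 1.0 / (binding_time + elongation_time)

# Hypothetical numbers chosen only to show the two limits.
slow_binding = polymerization_rate(k_i=1e-3, rp=1.0, n_nuc=1000, u_i=50.0)
fast_binding = polymerization_rate(k_i=10.0, rp=1.0, n_nuc=1000, u_i=50.0)
print(slow_binding)  # close to k_i*[RP] = 1e-3: binding is rate limiting
print(fast_binding)  # close to u_i/N_nuc = 0.05: elongation is rate limiting
```

In both cases the combined rate stays below the slower of the two steps, which is the sense in which the form captures the rate limiting step.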
With $A_i$ given by (4), the governing equation for the RNA populations is written
\[ \frac{dR_i}{dt} = A_i - \lambda_i R_i, \qquad (5) \]
where $N_{nuc}^i$ in (4) is the total length of the gene gi, and $\lambda_i$ is the decay constant. If the rate limiting step for
transcription is RNA polymerase binding to the gene, then the second term in (4) may be dropped. If, in addition,
$[RP]$, the activity of free RNA polymerase, is assumed constant (and henceforth is subsumed in
$k_i^{max}$), then the governing equation for RNA evolution becomes
\[ \frac{dR_i}{dt} = k_i^{max} H_i(Q, b, T) - \lambda_i R_i. \qquad (6) \]
Finally, our methodology can easily be generalized by relaxing any of the above assumptions.
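As an illustration of the forward model, the following Python sketch evaluates the availability (2) and integrates (6) for a single gene. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the explicit midpoint integrator, and the values of `k_max` and `lam` are all hypothetical. Only the unit binding constant, the initial RNA level of $10^{-9}$ M and the sampling at 10 points 500 seconds apart follow the synthetic example described later.

```python
import math

def availability(Q_i, b_i, T_sites):
    """Eq. (2): product over sites of (Q*T)^((1+b)/2) / (1 + Q*T)."""
    H = 1.0
    for Q, b, T in zip(Q_i, b_i, T_sites):
        qt = Q * T
        H *= qt ** ((1 + b) / 2) / (1.0 + qt)  # b=+1: QT/(1+QT); b=-1: 1/(1+QT)
    return H

def simulate_gene(R0, k_max, lam, Q_i, b_i, n_i, tf_activity, sample_times, dt=1.0):
    """Integrate Eq. (6) with an explicit midpoint rule and sample R(t).

    n_i[j] is the index of the TF binding site j; tf_activity(t) returns
    the activities of all TFs at time t; sample_times must be increasing.
    """
    def rate(t, R):
        T_all = tf_activity(t)
        H = availability(Q_i, b_i, [T_all[n] for n in n_i])
        return k_max * H - lam * R

    samples, t, R = [], 0.0, R0
    for t_sample in sample_times:
        while t < t_sample - 1e-9:
            h = min(dt, t_sample - t)
            k1 = rate(t, R)
            k2 = rate(t + 0.5 * h, R + 0.5 * h * k1)
            R += h * k2
            t += h
        samples.append(R)
    return samples

# Hypothetical single-gene example: one activating site, constant TF activity.
sample_times = [500.0 * (m + 1) for m in range(10)]   # 10 points, 500 s apart
series = simulate_gene(R0=1e-9, k_max=4e-12, lam=1e-3, Q_i=[1.0], b_i=[+1],
                       n_i=[0], tf_activity=lambda t: [1.0],
                       sample_times=sample_times)

# For constant T, Eq. (6) has the closed form R_ss + (R0 - R_ss)*exp(-lam*t).
R_ss = 4e-12 * availability([1.0], [+1], [1.0]) / 1e-3
exact_end = R_ss + (1e-9 - R_ss) * math.exp(-1e-3 * sample_times[-1])
```

With a constant TF activity the numerical result can be compared against the closed-form relaxation, which gives a quick sanity check before time-dependent activities are supplied through `tf_activity`.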
3 INFORMATION THEORY MODEL/DATA INTEGRATION
4 RESULTS
4.1 Synthetic Example
In order to test our implementation of the approach described above, and to find its practical limitations, we
used a model network that consists of 20 genes and 10 TFs. The gene control network is shown in Table 1,
where $\pm 1$ implies up/down regulation**. We took all binding constants and initial RNA concentrations to
be unity and $10^{-9}$ M, respectively.
We generated TF time courses and created the synthetic time series microarray data using our
transcription model, selecting 10 “data points” that are 500 seconds apart. In the following, we demonstrate
the robustness of our approach in reconstructing $T(t)$ despite an incorrect regulatory network and noise in
the microarray data, conditions commonly encountered in practice.
4.2 Uncertain Regulatory Network Information
Sequence analysis can be used to determine the structure of the gene control network based on likely
binding sites. However, this approach is likely to suggest a large number of incorrect interactions in the
gene control network. It is of interest to test whether our approach can filter the redundant nonzero entries
in the control network. In our approach, if a TF is assumed to upregulate a gene, large binding constants,
i.e. $QT \gg 1$, imply that this interaction is unlikely (redundant) as $QT/(1+QT) \approx 1$. A similar argument
holds for wrongly assumed down regulation as indicated by small binding constants ($QT \ll 1$, and therefore
$1/(1+QT) \approx 1$). Therefore our methodology conveniently filters out incorrect interactions by assigning
large/small binding constants for up/down regulation. To check the vulnerability of our approach to such
redundant interactions, we added random nonzero factors in the control network, and obtained a
“conjectured” full control matrix as shown in Table 2. As this matrix is full, the NCA method fails. In our
approach, the match between the predicted and real TF time courses is remarkable even when the
“conjectured” full matrix was used. Fig. 2 illustrates the effect of the number of redundant interactions on
the mismatch between the predicted and actual TF time courses. The mismatch (relative to the one obtained
using the actual network) does not exceed 2 even when the full interaction matrix is used as a starting point.
Thus our approach effectively filters the gene control network of unnecessary interactions.
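The filtering argument above can be illustrated with a few lines of Python. The sketch evaluates the single-site factor of the availability for a spurious up-regulating site with $QT \gg 1$ and a spurious down-regulating site with $QT \ll 1$; the function name and the numerical values are hypothetical and serve only to show that such sites contribute a factor close to one, nearly independent of the TF activity.

```python
def site_factor(Q, T, b):
    """Single-site factor of the availability: (Q*T)^((1+b)/2) / (1 + Q*T)."""
    qt = Q * T
    return qt ** ((1 + b) / 2) / (1.0 + qt)

# Spurious up-regulation absorbed by a large binding constant (QT >> 1).
up_redundant = site_factor(Q=1e4, T=1.0, b=+1)
# Spurious down-regulation absorbed by a small binding constant (QT << 1).
down_redundant = site_factor(Q=1e-4, T=1.0, b=-1)
# A genuinely active up-regulating site (QT of order one) for comparison.
up_active = site_factor(Q=1.0, T=1.0, b=+1)

# Doubling the TF activity barely changes a redundant factor,
# but changes an active one noticeably.
redundant_shift = abs(site_factor(1e4, 2.0, +1) - up_redundant)
active_shift = abs(site_factor(1.0, 2.0, +1) - up_active)
print(up_redundant, down_redundant, up_active, redundant_shift, active_shift)
```

A calibration that drives a binding constant to such an extreme therefore effectively removes the corresponding entry from the control network.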
** Table 1 and Table 2 are provided in the supplementary materials
Fig. 2 Mismatch between the predicted and actual TF time course relative to the one obtained using the actual gene
control network shown in Table 1. Although the mismatch increases as the number of interactions in the gene control
network increases, it stays within a limit of 2 in this particular example, showing the robustness of our approach in
discovering the operating gene control network hidden in a larger one (see Tables 1 and 2 provided in the supplementary
material).
4.3 Robustness to Microarray Error Levels
Despite advances in the technology, microarray data has considerable levels of error. In our
implementation, we input raw microarray channel data, perform standard channel normalization (based on
housekeeping genes determined using ranking of channel intensities and quality filtering for multiple spot
and slide data (Hyduke et al., 2003)). The resulting confidence intervals constitute prior information about
the level of noise in the microarray response. In this test, we investigate the vulnerability of our approach to
error in microarray data. We added random noise to the synthetic microarray data that was obtained using
the assumed TF time courses, and the gene control network (Table 1). Fig. 3 shows the mismatch (relative
to the TF time course obtained without added noise) as a function of added noise level with and without the
regularization constraint (see equation 10). The regularization constraint yields a better match at noise levels
higher than 30%, providing robustness to the calibration process for very noisy data. More generally,
the Lagrange multiplier for the regularization constraint in (12) should decrease with the width of the
confidence intervals, although we did not pursue this relationship here. This example also illustrates one of
the advantages of time series data over steady state data: time series data is less vulnerable to noise
due to the use of our novel regularization constraint.
Fig. 3 Mismatch between the predicted and actual TF time course (relative to that obtained using noise-free microarray
data) as a function of the amplitude of random noise added to the microarray data. The diamond and square markers
represent the mismatch with and without the regularization constraint, respectively. For large noise levels the regularization
constraint provides significant improvement in the calibration process.
4.4 Healing Mistakes in a Proposed Regulatory Network
Often we rely on TF-gene interactions obtained from resources of questionable quality. Therefore, it is
important that our algorithm is robust to potential sign mistakes in the gene control network. In this case,
we run our code in a discovery mode that searches for network mistakes. We first rank genes based on the
mismatch between the predicted and observed microarray response, highest ranked having the greatest
mismatch. Then, we rank the TFs based on the rank and number of genes that they regulate. As calibration
progresses, we periodically check the genes whose mismatch is greater than $E_{average} + a\sigma$, where $E_{average}$
is the average mismatch, $\sigma$ is the standard deviation of gene mismatch and $a$ is an empirical
parameter. Once the genes satisfying this criterion are identified, we change the sign of the highest ranked
TF (up/down). We also consider additional input that the user provides on their confidence in each
element of the gene control matrix. At a given step in this process, we only change one sign per column of
the gene control network. After a few iterations, we monitor the mismatch behavior; if the sign change
failed to improve the mismatch, we change the sign back. To test this algorithm, we took the gene control
network of Table 1 and introduced four mistakes by changing the sign of the diagonal elements of genes
1-4. Our algorithm successfully corrected the network after only 10 iterations. Fig. 4 shows the predicted
and observed microarray data.
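The gene selection step of the discovery mode can be sketched as follows. This is a minimal Python illustration of the $E_{average} + a\sigma$ criterion only; the function name and the mismatch values are hypothetical, and the use of the population standard deviation is an assumption, since the text does not specify which estimator is used.

```python
import statistics

def flag_genes(mismatch, a=1.0):
    """Return indices of genes whose mismatch exceeds E_average + a*sigma,
    ordered from the largest mismatch to the smallest."""
    e_average = statistics.mean(mismatch)
    sigma = statistics.pstdev(mismatch)   # population std. dev. (assumption)
    threshold = e_average + a * sigma
    flagged = [i for i, e in enumerate(mismatch) if e > threshold]
    return sorted(flagged, key=lambda i: mismatch[i], reverse=True)

# Hypothetical per-gene mismatches; genes 2 and 5 fit the data poorly.
mismatch = [0.10, 0.12, 0.90, 0.11, 0.10, 0.80, 0.09, 0.10]
print(flag_genes(mismatch, a=1.0))   # both poorly fitted genes are flagged
print(flag_genes(mismatch, a=1.8))   # a stricter threshold keeps only the worst
```

A larger value of `a` makes the search more conservative, so fewer sign changes are attempted per iteration.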
Fig. 4 Comparison of observed (solid line) and predicted microarray data for genes 1-4 (see Table 1). Triangles
indicate the best microarray fit when there are four mistakes in the gene control network (the first four entries in the
upper left diagonal for genes 1-4); squares indicate the best microarray fit when we allow our program to correct the
network. This shows that our algorithm is not only able to calculate the TF time courses, binding constants, etc.; it can
also be used as a tool to decide whether a TF is up or down regulating a gene.
The two sets of markers in Fig. 4 compare the best fit to the microarray data when the control network with four
mistakes is assumed with the best fit after our algorithm corrected the
mistakes. This demonstrates another aspect of our methodology, correcting a user-supplied gene control
network via microarray data. For large, real systems, we believe that many iterations will be necessary to
arrive at an accurate network. The methodology will recommend changes in the gene control network
based on the microarray data and a user-suggested network. The rankings supplied with the improved network
can be used to guide literature searches or carry out sequence analysis that can be used to further refine the
network.
4.5 Application to E. coli
To test our methodology we used the E. coli microarray data obtained for carbon source transition from
glucose to acetate media. Details of the experimental conditions and the microarray procedure are provided
in Kao et al. (2004). The data included expression levels (relative to initial time) of 100 genes at 300, 900,
1800, 3600, 7200, 10800, 14400, 18000 and 21600 seconds. The gene control matrix used is based on
RegulonDB (Salgado et al., 2001) as modified by Kao et al. (2004). We made additional changes based on
EcoCyc (Karp et al., 1998). Fig. 5 shows the time courses of 16 (out of 38) representative TF activities.
Kao et al. (2004) applied NCA (Liao et al., 2003) to the same problem. Our gene control matrix therefore
differs somewhat from theirs, and the transcription
kinetics in our approach differs from the steady state assumption and binding formulation used in NCA.
Despite these differences, 15 out of 16 TF activity time courses (Kao et al. only presented the time courses
of 16) are in qualitative agreement. As shown in Fig. 5, PhoB activity increases in response to acetate
enrichment of the medium, in contrast to the decreasing activity predicted by NCA. Because pstC and pstS
are upregulated by PhoB and their expression levels increase in time (Fig. 6), one would
expect the PhoB activity to increase as well. As another check on our results, consider the arcA gene, which
encodes the ArcA TF that upregulates the gene itself.
Fig. 5 TF activity time courses for 16 of 38 TFs. These results are in qualitative agreement with those obtained by Kao
et al. (2004) except for PhoB. pstC and pstS are upregulated by PhoB and their level of expression increases in time
(shown in Fig. 6); therefore one would expect the activity of PhoB to increase as well.
One would expect a correlation between the expression of arcA and TF activity of ArcA. To have an
unbiased test, we removed the expression of arcA from the microarray data and calculated the time course of
ArcA activity based on the other 17 genes in the network that it regulates. The resultant ArcA time course
was indistinguishable from the one obtained using the arcA microarray expression data. A
comparison of ArcA activity (Fig. 5) and expression of arcA (Fig. 6) shows a similar trend. We also added
up to 30% noise to the E. coli microarray data; the approach was still found to be robust.
Fig. 6 Comparison of predicted (hollow markers) and observed (solid markers) microarray response for a) nuoJ, nuoA,
b) arcA, c) livK, ppsA, pykF, d) pstC and pstS.
CONCLUSIONS
ACKNOWLEDGEMENTS
This project was supported by the United States Air Force (through the Defense Advanced Research
Projects Agency), the United States Department of Energy (Genomics: Genome to Life Program) and IBM
Life Sciences Institute of Innovations as well as the College of Arts and Sciences and the Office of the Vice
President for Research (Indiana University) as general support for the Center for Cell and Virus Theory.
REFERENCES
Azuaje, F. (2002) A cluster validity framework for genome expression data. Bioinformatics, 18, 319-320.
Bolshakova, N. and F. Azuaje (2003) Cluster validation techniques for genome expression data. Signal Processing, 83,
825-833.
Brown, P. and D. Botstein (1999) Exploring the new world of the genome with DNA microarray. Nature Genetics, 21,
33-37.
Chitler, S. (2004) DNA microarrays: Tools for the 21(st) century. Combinatorial Chemistry and High throughput
Screening, 7, 531-537.
Debouck, C. and P. N. Goodfellow (1999) DNA microarrays in drug discovery and development. Nature Genetics, 21,
48-50.
DeRisi, J. L., V. R. Iyer and P. O. Brown (1997) Exploring the Metabolic and Genetic Control of Gene Expression on
a Genome Scale. Science, 278, 680-686.
Friedman, N., M. Linial, I. Nachman and D. Pe'er (2000) Using Bayesian networks to analyze expression data. Journal
of Computational Biology, 7, 601-620.
Gardner, T. S., D. di Bernardo, D. Lorenz and J. J. Collins (2003) Inferring Genetic Networks and Identifying
Compound Mode of Action via Expression Profiling. Science, 301, 102-105.
Gerhold, D., T. Rushmore and C. T. Caskey (1999) DNA chips: promising toys have become powerful tools. Trends in
Biochemical Sciences, 24, 168-173.
Holter, N. S., A. Maritan, M. Cieplak, N. V. Fedoroff and J. R. Banavar (2001) Dynamics modeling of gene expression
data. Proceedings of National Academy of Science, 98, 1693-1698.
Holter, N. S., M. Mitra, A. Maritan, M. Cieplak, J. R. Banavar and N. V. Fedoroff (2000) Fundamental patterns
underlying gene expression profiles: Simplicity from complexity. Proceedings of National Academy of Science,
97, 8409-8414.
Hyduke, D., L. Rohlin, K. Kao and J. Liao (2003) A software package for cDNA microarray data normalization and
assessing confidence intervals. OMICS: A Journal of Integrative Biology, 7, 227-234.
Kao, K. C., Y.-L. Yang, R. Boscolo, C. Sabatti, V. Roychowdhury and J. C. Liao (2004) Transcriptome-based
determination of multiple transcription regulator activities in Escherichia coli by using network component analysis.
Proceedings of National Academy of Science, 101, 641-646.
Karp, P. D., M. Riley, S. M. Paley, A. Pellegrini-Toole and M. Krummenacker (1998) EcoCyc: Encyclopedia of
Escherichia coli genes and metabolism. Nucleic Acids Research, 26, 50-53.
Laidler, K. J. and J. H. Meiser (1995). Physical Chemistry. Boston, Houghton Mifflin Company.
Li, Y., C. Campbell and M. Tipping (2002) Bayesian automatic relevance determination algorithms for classifying gene
expression data. Bioinformatics, 18, 1332-1339.
Liao, J. C., R. Boscolo, Y.-L. Yang, L. M. Tran, C. Sabatti and V. P. Roychowdhury (2003) Network component
analysis: Reconstruction of regulatory signals in biological systems. Proceedings of National Academy of Science,
100, 15522-15527.
Liebermeister, W. (2002) Linear modes of gene expression determined by independent component analysis.
Bioinformatics, 18, 51-60.
Mendes, P. and D. Kell (1998) Nonlinear optimization of biochemical pathways: applications to metabolic engineering
and parameter estimation. Bioinformatics, 14, 869-883.
Navid, A. and P. J. Ortoleva (2004) Simulated complex dynamics of glycolysis in the protozoan parasite Trypanosoma
brucei. Journal of Theoretical Biology, 228, 449-458.
Pe'er, D., A. Regev, G. Elidan and N. Friedman (2001) Inferring subnetworks from perturbed expression profiles.
Bioinformatics, 17, S215-S224.
Salgado, H., A. Santos-Zavaleta, S. Gama-Castro, D. Millen-Zarate, E. Diaz-Peredo, F. Sanchez-Solano, E. Perez-
Rueda and C. Bonavides-Martinez (2001) RegulonDB (version 3.2): transcriptional regulation and operon
organization in Escherichia coli K-12. Nucleic Acids Research, 29, 72-74.
Sauter, G., R. Simon and K. Hillan (2003) Tissue microarrays in drug discovery. Nature Reviews Drug Discovery, 2,
962-972.
Sayyed-Ahmad, A., K. Tuncay and P. J. Ortoleva (2003) Toward Automated Cell Model Development through
Information Theory. Journal of Physical Chemistry A, 107, 10554-10565.
Schena, M., D. Shalon, R. W. Davis and P. O. Brown (1995) Quantitative Monitoring of Gene Expression Patterns with
a Complementary DNA microarray. Science, 270, 467-470.
Shmulevich, I., E. R. Dougherty, S. Kim and W. Zhang (2002) Probabilistic Boolean Networks: a rule-based uncertainty
model for gene regulatory networks. Bioinformatics, 18, 261-274.
Shmulevich, I., H. Lahdesmaki, E. R. Dougherty, J. Astola and W. Zhang (2003) The role of certain Post classes in
Boolean network models of genetic networks. Proceedings of National Academy of Science, 100, 10734-10739.
Slepchenko, B., J. Schaff, I. Macara and L. Loew (2003) Quantitative cell biology with the virtual cell. Trends in Cell
Biology, 13, 570-576.
Smyth, G. and T. Speed (2003) Normalization of cDNA Microarray Data. Methods, 31, 265-273.
Weitzke, E. L. and P. J. Ortoleva (2003) Simulating cellular dynamics through a coupled transcription, translation,
metabolic model. Computational Biology and Chemistry, 27, 469-480.
Wilson, D. L., M. J. Buckley, C. A. Helliwell and I. W. Wilson (2002) New normalization methods for cDNA
microarray data. Bioinformatics, 19, 1325-1332.