Identifying Actives from HTS Data Sets: Practical Approaches for the Selection of an Appropriate HTS Data-Processing Method and Quality Control Review

Tong Ying Shun, John S. Lazo, Elizabeth R. Sharlow, and Paul A. Johnston

J Biomol Screen 2011 16: 1; originally published online December 15, 2010 (OnlineFirst Version of Record); Version of Record January 7, 2011. DOI: 10.1177/1087057110389039. The online version of this article can be found at http://jbx.sagepub.com/content/16/1/1. Published by SAGE Publications (http://www.sagepublications.com) on behalf of the Journal of Biomolecular Screening. Downloaded from jbx.sagepub.com at UNIV OF VIRGINIA on August 13, 2013.


© 2011 Society for Laboratory Automation and Screening. www.slas.org

INTRODUCTION

High-throughput screening (HTS) is the dominant lead generation strategy used in drug discovery to identify chemically tractable compounds that modulate the biological activity of selected targets or cellular phenotypes.1-10 To achieve this goal, automated screening assays must be developed and optimized to provide a robust and reproducible signal window with sufficient throughput to screen the desired number of compounds.1-10 Advances in assay technologies, miniaturization, robotic systems, and detection platforms have expanded the types of targets and cellular phenotypes that can be addressed, increased the number of compounds that can be screened within a given time period, and reduced the costs of screening.1-11 For example, more than a million compounds per day may be screened in ultra-high throughput against targets compatible with 1536-well assay formats.11 For more complex biochemical or phenotypic cellular targets, however, an appropriately relevant assay may be of lower throughput, and an HTS campaign may need to be prosecuted over several weeks. Fewer compounds will typically be tested in concentration-response assays to confirm the hits from primary HTS assays, and most secondary hit characterization or lead optimization assays will be conducted at significantly lower throughput.1,12 However, regardless of throughput, compound numbers, or assay format, the goal for any screening assay is to accurately distinguish active compounds from inactive compounds.

In an ideal HTS assay, the behavior of active compounds would be completely distinct from that of inactive compounds, the maximum (MAX) and minimum (MIN) control values

1University of Pittsburgh Drug Discovery Institute, Pittsburgh, PA, USA. 2Department of Pharmacology and Chemical Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA.

Received Jun 9, 2010, and in revised form Sep 20, 2010. Accepted for publication Sep 23, 2010.

Supplementary material for this article is available on the Journal of Biomolecular Screening Web site at http://jbx.sagepub.com/supplemental.

Journal of Biomolecular Screening 16(1); 2011. DOI: 10.1177/1087057110389039

Identifying Actives from HTS Data Sets: Practical Approaches for the Selection of an Appropriate HTS Data-Processing Method and Quality Control Review

TONG YING SHUN,1 JOHN S. LAZO,1,2 ELIZABETH R. SHARLOW,1,2 and PAUL A. JOHNSTON1,2

High-throughput screening (HTS) has achieved a dominant role in drug discovery over the past 2 decades. The goal of HTS is to identify active compounds (hits) by screening large numbers of diverse chemical compounds against selected targets and/or cellular phenotypes. The HTS process consists of multiple automated steps involving compound handling, liquid transfers, and assay signal capture, all of which unavoidably contribute to systematic variation in the screening data. The challenge is to distinguish biologically active compounds from assay variability. Traditional plate controls-based and non-controls-based statistical methods have been widely used for HTS data processing and active identification by both the pharmaceutical industry and academic sectors. More recently, improved robust statistical methods have been introduced, reducing the impact of systematic row/column effects in HTS data. To apply such robust methods effectively and properly, we need to understand their necessity and functionality. Data from 6 HTS case histories are presented to illustrate that robust statistical methods may sometimes be misleading and can result in more, rather than less, false positives or false negatives. In practice, no single method is the best hit detection method for every HTS data set. However, to aid the selection of the most appropriate HTS data-processing and active identification methods, the authors developed a 3-step statistical decision methodology. Step 1 is to determine the most appropriate HTS data-processing method and establish criteria for quality control review and active identification from 3-day assay signal window and DMSO validation tests. Step 2 is to perform a multilevel statistical and graphical review of the screening data to exclude data that fall outside the quality control criteria. Step 3 is to apply the established active criterion to the quality-assured data to identify the active compounds. (Journal of Biomolecular Screening 2011:1-14)

Key words: high-throughput screening, hit identification, statistical decision

Original Articles


would be highly reproducible and widely separated from each other, and there would be no false positives or false negatives.12-16 In practice, all HTS assays contain both random and systematic variability that contribute to the "noise" associated with the experimental data.1,3,5,7,8,12-15 Sources of random variability include variability in biological activity (true activity) and random errors that arise from occasional technical or procedural failures.12-16 In addition, there are many potential sources of systematic variability, including differences between reagent lots, buffer preparations, compound concentration, and consumable materials (microplates, pipette tips); defective signal capture by detection instruments (e.g., nonnormalized photomultiplier tubes); uneven temperature or evaporation equilibration across plates; and failed compound or reagent transfer.12-16 The consequences of a high false-positive rate will depend on the time and resources that will be required to identify and eliminate these compounds during the follow-up hit characterization effort. False-negative results are more serious because they represent potential lead candidates that were missed in the screen and consequently never selected for follow-up studies. The compound collection may also contribute to systematic variability and false positives, promiscuously interfering with assay formats due to physical properties such as color, fluorescence, and aggregation or by reacting chemically with target proteins or assay reagents.2,17-20 For example, Michael acceptors may modulate activity through the alkylation of critical cysteine or amino groups of target proteins, and redox cycling compounds generate H2O2 in reducing environments that may indirectly inhibit the activity of target proteins susceptible to oxidation.2,17

There are typically 3 major steps in any strategy to identify active compounds from a primary HTS campaign: (1) the development and validation of a quality-assured assay compatible with HTS, (2) data processing and quality control (QC) review of the screening data, and (3) analysis of quality-assured HTS data to identify actives.2-10,15,17 Assay quality plays a major role in determining the hit confirmation rate,15,16 and specific criteria for the exclusion or inclusion of HTS assay data will typically be established during the assay development and validation phases of an HTS campaign. Some of the critical objectives of assay development are to maximize the assay signal window, minimize or control assay variability, estimate statistical variability of the assay, and select a threshold to distinguish actives from inactive compounds.2-10,17 The HTS community has developed and established automated data analysis methods to process data, provide measures for QC review, and select active compounds.12-15,21,22 Traditionally, HTS data are preprocessed for analysis by 1 of 2 methods, plate controls-based normalization methods (e.g., percent inhibition) or non-controls-based statistical methods (e.g., Z score). Both approaches have been widely used to QC HTS data and to identify hits.12,13 More recently, advanced statistical methods such as B scores and BZ scores have been introduced for robust hit identification that reduce the impact of systematic row/column effects in HTS data.13,23 Given the enormous variety and complexity of targets, assay formats, and data outputs, it is hardly surprising that none of the data-processing and analysis procedures provides a universal hit selection method for HTS.12,13 Here, we describe a practical 3-step process to guide the selection of the most appropriate data-processing method for a particular HTS assay and to select active compounds. The process begins with a rigorous statistical analysis of HTS assay development and validation data and incorporates multiple-level statistical and graphical QC review of the primary screening data. Finally, we present an analysis of the unbiased application of 5 different processing and hit identification methods to the data from 6 HTS campaigns.

MATERIALS AND METHODS

Experimental data

As part of the National Institutes of Health (NIH) Roadmap initiative, the Pittsburgh Molecular Library Screening Center (PMLSC) was 1 of 10 pilot phase (2005-2009) centers of the Molecular Libraries Screening Center Network (MLSCN) established to provide the academic community broader access to HTS technology and capabilities.24,25 By the third year of the MLSCN pilot phase, the NIH had amassed a screening library of ~220,000 chemically diverse small molecules that were distributed to the centers for HTS campaigns. The assay descriptions and results from all MLSCN-sponsored HTS campaigns are published in the PubChem database and are publicly accessible via their respective assay identifier numbers (AIDs; Suppl. Table S1).26 All 6 HTS assays used in the present study were developed and implemented in a 384-well plate format with the same layout for plate controls and compounds (Suppl. Fig. S1). The controls were located in columns 1, 2, 23, and 24 of the plate, with 32 MAX control wells distributed at the 4 plate corners, 24 MIN control wells in the middle of the 2 sides of the plate, and eight 50% inhibition controls at 1 side. The distributed locations for both the MIN and MAX controls minimized edge-related variability, and locating the 50% inhibition controls at 1 side provided a means to identify plate orientation. We used assay development and validation data from 6 of the PMLSC HTS campaigns (Suppl. Table S1) conducted in the first (65K library), second (97K library), and third (200K library) years of the pilot phase of the MLSCN to demonstrate the application of the 3-step statistical methodology described in this article.

HTS data-processing and hit identification methods

In this report, we compare 5 HTS data-processing and hit identification methods, which are either widely used or newly introduced. The 5 methods include controls-based percent inhibition/activation,12,13 Z score,12,13 median absolute deviation (MAD) score,12 B score,12,13 and BZ score.23 Each method is described in detail in the supplementary text.
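For concreteness, the first 3 of these methods can be sketched in a few lines of Python (a minimal illustration, not the authors' production code; the B and BZ scores additionally require the 2-way median polish described in the supplementary text):

```python
import numpy as np

def percent_inhibition(x, max_ctrl, min_ctrl):
    """Controls-based normalization: 0% at the mean MAX (uninhibited) control,
    100% at the mean MIN (fully inhibited) control."""
    hi, lo = np.mean(max_ctrl), np.mean(min_ctrl)
    return 100.0 * (hi - x) / (hi - lo)

def z_score(x):
    """Non-controls-based standardization of the sample wells; assumes the
    sample values are approximately normally distributed."""
    return (x - np.mean(x)) / np.std(x, ddof=1)

def mad_score(x):
    """Robust analogue of the Z score using the median and the median absolute
    deviation (1.4826 rescales the MAD to estimate the SD under normality)."""
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    return (x - med) / (1.4826 * mad)
```

Because the Z score uses the mean and standard deviation of all sample wells, a handful of strong actives can inflate its denominator; the MAD score substitutes the median and MAD, which resist such outliers.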

Statistical indices to direct selection of HTS data-processing and hit identification methods

We developed decision criteria based on statistical indices derived from 2 experimental data sets generated during our HTS assay development and validation process to allow us to select the most appropriate data-processing and hit identification methods for an HTS assay.

Statistical analysis of 3-day assay signal window variability data

In practice, the robustness and reproducibility of the assay signal window are arguably the most critical features of an HTS assay.4,8,12,13,15,22 We employed a procedure to rigorously measure and evaluate the signal windows of our assays by capturing data from 2 full plates each of the MAX and MIN controls tested in 3 independent experiments conducted on separate days. The reproducibility and variability of the 3-day assay signals and window were evaluated within the assay plate itself (intraplate), from plate to plate (interplate), and from day to day.3-5,8,22

The Z′ factor is one of the most widely accepted statistical coefficients used to assess the quality of HTS assays,15 and we used the data to calculate Z′ factors for each of the 3 days.

We also calculated the signal-to-background ratio (S/B), another commonly used parameter employed to estimate the separation or range of an assay signal window,15,22 for each of the 3 days.

The percent coefficient of variation (%CV) is a normalized measure of the dispersion of a probability distribution. We calculated the %CVs for the 2 full plates of MAX and MIN control signals for each day.

The definitions of the Z′ factor, S/B, and %CV are described in detail in the supplementary text.
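As a rough guide to those definitions, the 3 indices can be computed from the MAX/MIN control populations as follows (a sketch based on the standard published formulas, e.g., Z′ = 1 − 3(SDmax + SDmin)/|MEANmax − MEANmin|; the function names are our own):

```python
import numpy as np

def z_prime(max_ctrl, min_ctrl):
    """Z' factor (Zhang et al. 1999): 1 - 3*(SDmax + SDmin)/|MEANmax - MEANmin|.
    Values > 0.5 indicate an assay well suited to HTS."""
    span = abs(np.mean(max_ctrl) - np.mean(min_ctrl))
    return 1.0 - 3.0 * (np.std(max_ctrl, ddof=1) + np.std(min_ctrl, ddof=1)) / span

def signal_to_background(max_ctrl, min_ctrl):
    """S/B ratio: the separation (range) of the assay signal window."""
    return np.mean(max_ctrl) / np.mean(min_ctrl)

def percent_cv(ctrl):
    """%CV: normalized dispersion of a control population."""
    return 100.0 * np.std(ctrl, ddof=1) / np.mean(ctrl)
```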

Statistical analysis of the 3-day DMSO validation data

The 3-day, 5-plate DMSO validation experiments mimic 3 independent days of screening operations in which DMSO is substituted for compounds. In a random diversity screen, it is assumed that the majority of the compounds will be inactive,12 and the 3-day DMSO validation data therefore represent a measure of the variability of inactive compounds in the HTS assay. We analyzed the 3-day DMSO validation data by performing a normality evaluation and analysis of variance (ANOVA) to identify row/column effects or other positional biases.

Normality analysis. The normplot functionality of MATLAB (MathWorks, Inc., Natick, MA) was used to generate normal probability plots of the DMSO validation data and to evaluate the normality of the measurements. A normal probability plot presents the sample data superimposed on the line joining the first and third quartiles of each column (a robust linear fit of the sample order statistics). The x-axis values are the data measurements. In a normal probability plot, when the samples approximately follow the straight line through the first and third quartiles, an assumption of normality is reasonable. Otherwise, if the points between the first and third quartiles curve away from the line, an assumption of normality is false.
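A comparable check can be sketched in Python with scipy.stats.probplot (an assumed stand-in for MATLAB's normplot; probplot fits its reference line by least squares rather than through the quartiles, but the visual interpretation is the same, and the simulated plate reads below are hypothetical):

```python
import numpy as np
from scipy import stats

# Simulated DMSO validation data: 15 plates x 320 compound wells
# (hypothetical values standing in for real plate reads).
rng = np.random.default_rng(0)
dmso = rng.normal(loc=100.0, scale=10.0, size=15 * 320)

# probplot pairs the ordered sample values with theoretical normal quantiles
# and fits a straight line; r near 1 means the points track the line, so an
# assumption of normality is reasonable.
(osm, osr), (slope, intercept, r) = stats.probplot(dmso, dist="norm")
print(f"linear-fit r = {r:.4f}")
```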

Two-way ANOVA of column/row effects. The anova2 functionality of MATLAB was used to perform the 2-way ANOVA of the DMSO validation data by comparing the means of the columns and rows of data in the matrix X, where each data point represented an independent sample containing mutually independent observations. The function returns the columns' and rows' p-values under the null hypothesis that all samples in matrix X are drawn from populations with the same mean. If the columns' or rows' p-value is less than 0.01, it casts doubt on the null hypothesis and suggests that at least 1 sample mean is significantly different from the other sample means. The row/column matrixes of the 15 DMSO validation plates for each assay were loaded into MATLAB, and the anova2 function was applied to each plate data matrix. If the columns' p-value of a plate was <0.01 (a significance level at which a true null hypothesis would be rejected only 1 time in 100), 2 or more columns of that plate were statistically different (i.e., column effects existed on that plate). Similarly, if the rows' p-value of a plate was <0.01, 2 or more rows of that plate were statistically different (i.e., row effects existed on that plate).
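MATLAB's anova2 can be approximated in Python as follows (a sketch of a 2-way ANOVA without replication on a rows × columns plate matrix; the function name and return order here are our own):

```python
import numpy as np
from scipy import stats

def anova2(X):
    """Two-way ANOVA without replication on a plate matrix X (rows x columns),
    analogous to MATLAB's anova2 with reps = 1. Returns (p_cols, p_rows) under
    the null hypothesis that all column (row) means are equal."""
    X = np.asarray(X, float)
    r, c = X.shape
    grand = X.mean()
    # Partition the total sum of squares into row, column, and error terms.
    ss_rows = c * ((X.mean(axis=1) - grand) ** 2).sum()
    ss_cols = r * ((X.mean(axis=0) - grand) ** 2).sum()
    ss_err = ((X - grand) ** 2).sum() - ss_rows - ss_cols
    df_rows, df_cols = r - 1, c - 1
    df_err = df_rows * df_cols
    ms_err = ss_err / df_err
    f_rows = (ss_rows / df_rows) / ms_err
    f_cols = (ss_cols / df_cols) / ms_err
    # Upper-tail F probabilities for the row and column effects.
    p_rows = stats.f.sf(f_rows, df_rows, df_err)
    p_cols = stats.f.sf(f_cols, df_cols, df_err)
    return p_cols, p_rows
```

Applied to a 16 × 24 matrix of a 384-well plate's sample wells, a columns' p-value < 0.01 would flag a column effect on that plate.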

If the DMSO data are not normally distributed, the 2-way ANOVA is not a valid means of evaluating the column/row effects, but a qualitative decision can be made from a graphical inspection of the data (plate heat map and plate scatter plot vs. rows or columns). The evaluation of normality and column/row effects is useful in selecting an appropriate hit identification method. For example, if the DMSO data exhibit a statistically normal distribution with significant row and/or column position effects, a B score hit identification method is recommended over a Z score method.12,13

Multilevel quality control review of HTS data

For QC purposes, HTS data are typically reviewed at 3 levels: the plate, operations run, and overall screen performance levels.2-10,17 We used a variety of data visualization tools, such as heat maps, scatter plots, and results frequency distribution plots, together with plate controls-based Z′ factors and S/B ratios for the QC of HTS data.2,3,5-10,17 We also incorporated non-controls-based QC parameters such as the plate median and robust %CV.14,27 The robust %CV is calculated from the median and MAD of the normalized sample values, which reduce plate row and column positional effects by fitting the 2-way median polish algorithm. The median of the sample values on a plate is more resistant to outliers and provides a better representation of any trends in plate-to-plate measurements.13,14 In a primary screen, the majority of samples are inactive, whereas a few outliers are either true actives or false actives (random errors). Since the mean and standard deviation of all samples can be skewed by the unusual values of active compounds, we calculated the plate robust %CV to take into consideration any plate row and column position effects, which may unexpectedly increase the sample variability, and small numbers of actives or random errors that act like outliers. For QC purposes, we passed all HTS plates with robust %CVs <20%.
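One plausible implementation of the plate robust %CV, assuming the median polish/MAD construction described above (the authors' exact formula is defined in the supplementary text; this sketch takes robust %CV = 100 × 1.4826 × MAD / median of the polished values):

```python
import numpy as np

def median_polish(X, n_iter=10):
    """Tukey's 2-way median polish: decompose a plate into an overall effect,
    row effects, column effects, and residuals."""
    resid = np.asarray(X, float).copy()
    overall, row, col = 0.0, np.zeros(resid.shape[0]), np.zeros(resid.shape[1])
    for _ in range(n_iter):
        rmed = np.median(resid, axis=1)
        resid -= rmed[:, None]; row += rmed
        m = np.median(row); row -= m; overall += m
        cmed = np.median(resid, axis=0)
        resid -= cmed[None, :]; col += cmed
        m = np.median(col); col -= m; overall += m
    return overall, row, col, resid

def robust_cv(X):
    """Robust %CV: %CV computed from the median and MAD of the plate values
    after row/column effects are removed by median polish (1.4826 rescales
    the MAD to a normal-consistent spread estimate)."""
    overall, _, _, resid = median_polish(X)
    adjusted = resid + overall
    mad = np.median(np.abs(adjusted - np.median(adjusted)))
    return 100.0 * 1.4826 * mad / np.median(adjusted)
```

Note that a plate with a pure column shift yields a robust %CV near zero even though its naive %CV is inflated, which is exactly the positional tolerance described above.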

RESULTS AND DISCUSSION

Statistical analysis of HTS assay development and validation data

During the assay development phase, all 6 of the HTS assay case histories considered here (Suppl. Table S1) were subjected to the 3-day assay signal window test described above, and the derived average Z′ factors, %CVs of the MAX and MIN control signals, and S/B ratios are presented in Table 1. The 3-day Z′ factors for all 6 assays were >0.5, indicating that they were high-quality assays that should be compatible with HTS.15 The 3-day %CVs of the MAX controls for all 6 assays were <15%, well below the 20% threshold for the maximum acceptable dispersion of the probability distribution of the assay control data.22 Similarly, the 3-day %CVs of the MIN controls for 5 of the 6 assays were also <15%. Only the 3-day %CVs of the MIN controls for the WNV NS2B-NS3 assay (47.7 ± 15.6) exceeded the 20% threshold. The CDC25B, WNV NS2B-NS3, PLK-1, PKD, and RT-RNaseH assays all exhibited 3-day average S/B ratios >3.5-fold, whereas the PLK-1 PBD assay 3-day average S/B ratio was only 3.06 ± 0.07-fold. The combination of Z′ factors >0.5 together with low %CVs in both control populations and average S/B ratios >3.5-fold all indicated that a controls-based data-processing and active identification strategy would be most suitable for the CDC25B, PLK-1, PKD, and RT-RNaseH assays. In contrast, the relatively narrow 3-fold signal window of the PLK-1 PBD assay and the very large (%CV = 47%) variation observed in the MIN control population of the WNV NS2B-NS3 assay suggested that these 2 assays would be better suited to non-controls-based statistical data-processing and hit identification methods.

The Z score statistical method is only valid for normally distributed measurements.12,13,15,21,22 The graphical normal probability plots for the DMSO validation data for the 6 assays are presented in Figure 1. Superimposed on the DMSO sample data is a line representing the robust linear fit of the sample order statistics for a normal distribution. The 15 plates of DMSO validation data for all 6 HTS assays closely approximated a normal distribution line (Fig. 1), and a 2-way ANOVA performed by rows and columns indicated that none of these assays exhibited significant (p > 0.01) row or column effects. The same statistical indices described above were also generated from the DMSO validation plate controls (Suppl. Table S2). Consistent with the 3-day assay signal window data (Table 1), the CDC25B, PLK-1, PKD, and RT-RNaseH assays all performed well during the 3-day DMSO validation test, and their plate controls exhibited robust and reproducible Z′ factors and S/B ratios (Suppl. Table S2). In addition, for all 4 of these assays, the %CVs of the MAX and MIN controls were consistently <20%, confirming that they were well behaved (Suppl. Table S2). Using an active criterion of ≥50% inhibition/activation for these 4 assays, the PLK-1 and PKD assays exhibited false-positive rates of 0.19% and 0.12%, respectively, in the 4800 DMSO wells tested (Suppl. Table S2). For the PLK-1 PBD assay, however, the average S/B ratio collapsed from 3-fold (Table 1) to ~2-fold, the corresponding plate Z′ factors were only acceptable on day 3 of the test, and based on an active criterion of ≥50% inhibition, the assay exhibited a false-positive rate of 0.17% (Suppl. Table S2). The average %CVs of the PLK-1 PBD MAX and MIN controls were <20%, indicating that the assay was reasonably well behaved (Suppl. Table S2). Although the majority of the WNV NS2B-NS3 assay plate Z′ factors were acceptable during the 15-plate DMSO validation test, the average S/B ratio had collapsed from ~15-fold (Table 1) to ~3-fold, and based on an active criterion of ≥50% inhibition, the assay produced the highest false-positive rate of 0.52% (Suppl. Table S2). The large variation observed in the WNV NS2B-NS3 MIN plate control data during the 3-day assay signal window test (Table 1) was also apparent during the DMSO validation test and contributed to the lower Z′ factors, the collapse in the S/B ratio, and the high false-positive rate (Suppl. Table S2).

Figure 2 illustrates how the multiple statistical indices calculated from the 3-day assay signal window (Table 1) and DMSO validation tests (Fig. 1 and Suppl. Table S2) may be applied to aid the selection of the most suitable data-processing method for HTS data. Controls-based percent inhibition/activation normalization methods are sensitive to variability in the plate controls when the numbers of replicates are limited and

Table 1. Three-Day Assay Signal Window Statistical Indices Summary

Assay           Z′ Factor      MAX Control %CV    MIN Control %CV    S/B
CDC25B          0.60 ± 0.08    11.43 ± 2.55       14.43 ± 2.09       20.60 ± 1.81
WNV NS2B-NS3    0.66 ± 0.044   7.29 ± 1.18        47.74 ± 15.56      14.94 ± 4.29
PLK-1           0.60 ± 0.18    5.77 ± 3.23        12.24 ± 7.46       3.82 ± 0.23
PLK-1 PBD       0.65 ± 0.03    3.91 ± 0.62        11.61 ± 0.52       3.06 ± 0.07
RT-RNaseH       0.85 ± 0.02    4.62 ± 0.65        4.56 ± 0.25        13.60 ± 0.53
PKD             0.68 ± 0.13    4.47 ± 1.74        11.75 ± 0.46       5.29 ± 0.41

CV, coefficient of variation; S/B, signal-to-background ratio.


FIG. 1. Normal probability plots of the 3-day, 5-plate DMSO validation data: (A) CDC25B, (B) WNV NS2B-NS3, (C) PLK-1, (D) PLK-1 PBD, (E) RT-RNaseH, and (F) PKD. A normal probability plot presents the sample data (■) superimposed on the line (—) joining the first and third quartiles of each column (a robust linear fit of the sample order statistics). The x-axis values are the data measurements. In a normal probability plot, when the samples approximately follow the straight line through the first and third quartiles, an assumption of normality is reasonable.

FIG. 2. Workflow to aid the selection of high-throughput screening (HTS) data-processing methods, establish quality control criteria, and select actives. CV, coefficient of variation; false +ve, false positive; S/B, signal-to-background ratio.


also when the magnitude of the assay signal window is small.12,13,15,22 In contrast to the 2 full plates of MAX and MIN plate controls (n = 768) of the 3-day assay signal window determinations (Table 1), the 384-well HTS plate format used in the DMSO validation test and HTS only provides a limited number of control wells (24 MIN controls, 32 MAX controls, and 320 compound wells; Suppl. Fig. S1). Nevertheless, the cumulative statistical indices from the 3-day assay signal window and DMSO validation studies indicated that a controls-based data-processing and active identification strategy would be suitable for the CDC25B, PLK-1, PKD, and RT-RNaseH assays (Fig. 1, Table 1, and Suppl. Table S2); the MAX and MIN control %CVs were both <20%, the S/B ratio was >3.5-fold, the DMSO validation data closely approximated a normal distribution, and no significant row or column biases were detected. In contrast, the statistical indices for the PLK-1 PBD and WNV NS2B-NS3 assays indicated that they would be better suited to non-controls-based Z score statistical data-processing and active identification methods. Although the DMSO validation data closely approximated a normal distribution (Fig. 1) and no significant row or column biases were detected, the WNV NS2B-NS3 and PLK-1 PBD assays violated 1 or more of the 3 other statistical indices (Table 1 and Suppl. Table S2). The WNV NS2B-NS3 assay exhibited MIN control %CVs >20%, and for the PLK-1 PBD assay, the S/B ratio was consistently ≤3-fold. Robust hit identification methods such as B scores and BZ scores can reduce the impact of systematic row/column/edge effects in HTS data,13,23 but since none of the HTS assays exhibited significant row/column biases in either test, these methods would not have been selected. Indeed, it would be our recommendation that any assay that exhibits significant row/column biases in these tests should be suspended until the source of the systematic error has been identified and corrected. The value of such robust methods resides in their ability to rescue HTS data on occasions when a systematic positional bias such as a row/column/edge effect may have occurred in a large HTS data set. Rather than failing the affected plates and rescreening them, the retrospective application of these data analysis methods may serve as an alternative that allows the data to be used and actives identified.
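The selection logic summarized in Figure 2 might be encoded as follows (a hypothetical sketch; the threshold values come from the text, while the function name and the returned labels are our own):

```python
def choose_method(z_prime, cv_max, cv_min, s_b, dmso_normal, row_col_bias):
    """Pick an HTS data-processing/hit identification method from the
    validation-stage statistical indices (hypothetical encoding of Fig. 2)."""
    if row_col_bias:
        # Significant positional bias: fix the systematic error before
        # screening; B/BZ scores are reserved for retrospective data rescue.
        return "suspend screening and correct the systematic error"
    if (z_prime > 0.5 and cv_max < 20 and cv_min < 20
            and s_b > 3.5 and dmso_normal):
        return "controls-based percent inhibition/activation"
    if dmso_normal:
        # Normal distribution but a narrow window or noisy controls:
        # non-controls-based Z score.
        return "Z score"
    # Non-normal sample distribution: robust MAD score.
    return "MAD score"
```

Feeding in the Table 1 indices, the CDC25B row would map to the controls-based route, whereas the WNV NS2B-NS3 row (MIN %CV ≈ 48%) and the PLK-1 PBD row (S/B ≈ 3-fold) would each fall through to the Z score route.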

Multiple-level statistical and graphical quality control review of HTS data

Although the measured signals from HTS operations are determined accurately, artifacts contributing to the measurements can lead to false conclusions (inappropriate positives or negatives).12-14 Nonrandom systematic variability from technical or procedural failures can often be detected through the application of appropriate statistical and graphical QC procedures at the plate, operations run, and overall HTS performance levels.2-10,13,14,17 A thorough QC review and analysis of the HTS data are therefore critical to the correct identification of actives/hits. To illustrate the utility of a multilevel QC process, we present data from one of three 150-plate days of screening operations from the PKD HTS campaign of 197,000 compounds.

Plate-level QC. The scatter plot (Fig. 3A) data visualization of the percent inhibition from a single assay plate randomly selected from the PKD HTS campaign (AID 797) indicated that the assay was well behaved: there was a robust assay signal window between the MAX and MIN plate controls, and the majority of compounds were inactive and exhibited activity levels that coincided with the DMSO controls. The corresponding heat map of the percent inhibition data, encoded by continuous color changes from white (low) to black (high), revealed no discernible systematic patterns within the data (Fig. 3B). The Z′ factor for the plate presented was 0.62, the S/B ratio was 3.2-fold, and a single active compound (well E13, 95% inhibition) was readily apparent in both the scatter plot and heat map data visualizations (Fig. 3).

Screening operations run QC. A scatter plot of the plate Z′ factors versus plate order from a 150-plate day of operations in the PKD HTS is presented in Figure 4A. The average Z′ factor across all 150 plates was 0.44 ± 0.24. The negative Z′ factor values for 3 of the plates (121, 122, and 125) were clearly outliers, indicating that the plate control data for these 3 plates were of poor quality and that they should be failed and scheduled for retesting. When the number of plate controls is limited and the assay has a narrow signal range, the Z′ factor is very sensitive to variability in the control data,15 and the use of Z′ factors alone may be a poor choice for assessing the quality of an HTS assay. Figure 4B depicts the plate median raw data value for all 320 samples on the plate versus plate order for the same 150 plates presented in Figure 4A. Figure 4B clearly illustrates that the 150 plates screened on this day were batch processed in 3 × 50-plate batches, which was not apparent from the Z′ factor scatter plot (Fig. 4A). The average median raw data value for all plates was 334.7 ± 18.0, and the plate medians of plates 122 and 125 were >3 standard deviations below the mean of the plate medians. The median values for plates 122 and 125 clearly departed from the general trend of the plate medians in the third batch of plates processed (Fig. 4B). Figure 4C is a scatter plot of the plate robust %CV versus plate order for the same 150 plates presented in Figure 4A,B. The average robust %CV for all 150 plates was 4.3 ± 3.1. The plate robust %CV values for 5 plates (121-125) were clearly outliers compared to the other 145 plates screened (Fig. 4C), raising concerns about the quality of data from all 5 of these HTS plates. The plate robust %CV was the only plate-level QC parameter that captured the suspicious data quality of plates 123 and 124. In practice, we strongly suggest that the plate robust %CV

at UNIV OF VIRGINIA on August 13, 2013jbx.sagepub.comDownloaded from

Appropriate HtS data-Processing and Quality control Methods

Journal of biomolecular Screening 16(1); 2011 www.slas.org 7

FIg. 3. primary high-throughput screening (hts) data quality control review. (A) percent inhibition single-plate scatter plot and (b) percent inhibition data single-plate heat map from the pKd hts campaign. (A) the scatter plot data visualization of the percent inhibition from a single assay plate in the pKd hts campaign (aid:797) indicated that there was a robust assay signal window between the maX (0 % inhibition, ■) and min (100% inhibition, ▲) plate controls and that the majority of compounds (○) were inactive and exhibited activity levels that coincided with the dmso controls. one active compound (●) was apparent on the plate. (b) the corresponding heat map of the percent inhibition processed data encoded by continuous color changes from white (low) to black (high) revealed no discernible systematic pattern within the data and 1 active compound.

FIg. 4. high-throughput screening (hts) operations data quality control review. (A) plate Z′ factors, (b) plate medians, and (c) plate robust % coefficients of variation (cVs). (A) a scatter plot of the plate Z′ factors (▲) versus plate order from a single 150-plate day of operations from the pKd hts. individual plate Z′ factors were calculated from the 32 maX and 24 min control wells on each plate. (b) a scatter plot of the plate median raw data values calculated from all 320 compound sample wells on the plate (▲) versus plate order for the same 150 plates pre-sented in A. (c) a scatter plot of the plate robust %cV (▲) versus plate order for the same 150 plates presented in A and b.

at UNIV OF VIRGINIA on August 13, 2013jbx.sagepub.comDownloaded from

Shun et al.

8 www.slas.org Journal of biomolecular Screening 16(1); 2011

should be calculated to evaluate the Qc of hts assay plates since it implicitly and robustly reflects the variation in the sam-ples tested.
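The paper does not spell out its exact robust %CV formula. A common choice, which we assume here, scales the median absolute deviation (MAD) by 1.4826 so it is comparable to a standard deviation and divides by the plate median of the sample wells:

```python
import numpy as np

def robust_cv(sample_wells):
    """Robust %CV of a plate's compound wells: 100 * (1.4826 * MAD) / median.

    The 1.4826 scaling constant is an assumption (it makes the MAD a
    consistent estimator of the SD for normal data); a hit or outlier
    well barely moves this statistic, unlike the classical %CV.
    """
    x = np.asarray(sample_wells, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    return 100.0 * 1.4826 * mad / med
```

On the toy plate [98, 100, 102, 100, 100, 150], the classical %CV is roughly 19% while the robust %CV is about 1.5%: the single "hit" well at 150 dominates the former but not the latter, which is exactly why the robust %CV reflects sample variation rather than actives.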

To further illustrate the important contribution of data visualizations to the QC review process, we present trellis views of heat maps (Fig. 5A) and scatter plots (Fig. 5B) for 15 of the same 150 assay plates from the PKD HTS presented in Figure 4. Columns 1, 2, 23, and 24 on each plate contained the MAX and MIN plate controls, which produced consistent color patterns in the heat maps (Fig. 5A) and a robust assay signal window in the scatter plots (Fig. 5B). Even a brief glance reveals that the sample data from 5 of the plates (PMLSC1071 to PMLSC1075) exhibit significantly different patterns compared to the rest of the plates in both the heat map and scatter plot representations (Fig. 5). PMLSC plates 1071, 1072, and 1075 exhibit quite high variability in their MAX and MIN plate control data and negative Z′ factor values (Fig. 4A). There was less variability in the controls on plates 1073 and 1074, but the systematic errors in the sample data for both of these plates were readily apparent in the heat map and scatter plot representations (Fig. 5), even though they produced Z′ factors of 0.38 and 0.31, respectively. An underlying assumption of controls-based data-processing methods is that the behavior of control samples truly reflects the performance of the test samples. In practice, however, several factors may compromise the validity of the controls-based assumption and the accuracy of any QC conclusions: (1) as discussed above, limitations in both the number and position of control well replicates relative to the number and placement of samples on the plate can lead to large differences between the QC parameters calculated from controls and the actual behavior of the samples; (2) plate controls are often stored differently than the compound samples and/or may be added to assay plates by a different automated process; and (3) temporal or spatial plate effects and artifacts from compound properties may also not be apparent in the plate controls. For these reasons, we strongly recommend the use of both statistical and graphical QC procedures to review and pass/fail all HTS data.
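Putting the three run-level statistical checks together (negative Z′ factors, plate medians far from the run mean, outlying robust %CVs), a hypothetical plate-flagging sketch; the thresholds and names below are ours, not the authors', and a real QC review would still include the graphical step:

```python
import numpy as np

def flag_plates(z_primes, plate_medians, robust_cvs,
                z_min=0.0, cv_max=10.0, median_k=3.0):
    """Flag suspect plates in a screening run for review/retest.

    Illustrative thresholds: fail a plate if its Z' factor is negative,
    its median raw value sits more than `median_k` SDs from the mean of
    the plate medians, or its robust %CV exceeds `cv_max`.
    """
    z = np.asarray(z_primes, dtype=float)
    med = np.asarray(plate_medians, dtype=float)
    cv = np.asarray(robust_cvs, dtype=float)
    med_mean, med_sd = med.mean(), med.std(ddof=1)
    flags = (z < z_min) \
        | (np.abs(med - med_mean) > median_k * med_sd) \
        | (cv > cv_max)
    return np.flatnonzero(flags)  # indices of plates to review
```

Because any one check can miss a bad plate (as plates 123 and 124 show for the Z′ factor), the three criteria are OR-ed together rather than required jointly.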





FIG. 5. Trellis heat map and scatter plot high-throughput screening (HTS) operations data quality control review. (A) Trellis plate heat map visualization of the raw data from 15 of the 384-well plates from the single 150-plate day of the PKD HTS operations, corresponding to plate numbers 116 to 130 designated in Figure 4. The 15 heat maps represent the percent inhibition processed data encoded by continuous color changes from white (low) to black (high). (B) Trellis scatter plot visualization of the raw data from 15 of the 384-well plates from the single 150-plate day of the PKD HTS operations presented in Figure 4. MAX plate controls (0% inhibition, ■), MIN plate controls (100% inhibition, ▲), and compounds (○).

HTS campaign QC. In Figure 6, we directly compare the plate Z′ factor statistical coefficient, a widely used indicator of HTS assay quality,12,13 to the robust %CVs for the assay plates that passed the QC review from our 6 HTS campaigns.

Overall, the Cdc25B assay performed extremely well during the HTS campaign, and the plate controls provided a robust percent inhibition assay signal window and produced average plate Z′ factors and S/B ratios of 0.74 ± 0.1 and 64.4 ± 27-fold, respectively (Fig. 6A and Suppl. Table S3).6 The compound wells of the plates in the Cdc25B HTS exhibited an average robust %CV of 4.4% ± 1.2% for the whole campaign (Fig. 6A and Suppl. Table S3), consistent with high-quality HTS data. A similar analysis of the data from 5 days of operations for the PKD kinase screening campaign revealed that the assay also performed very well in HTS (Fig. 6C)10 with a reproducible controls-based percent inhibition assay signal window that produced average Z′ factors and S/B ratios of 0.55 ± 0.1 and 3.22 ± 0.22-fold, respectively (Suppl. Table S3). For each of the 5 PKD operations runs (Fig. 6C), the plate Z′ factors improved with plate order, indicating that there was a systematic enhancement of the assay signal window over time. It may be that as the screening run progressed, later plates benefitted from longer incubation times after certain additions during the HTS process, or that starting reagents exhibited a time-dependent increase in activity as they sat in reservoirs over the duration of the run. In any case, the observed trend was an increase in assay quality (improved Z′ factors), and the compound wells of the plates in the PKD HTS exhibited an average robust %CV of 3.09% ± 0.83% for the campaign, also consistent with high-quality HTS data (Fig. 6C and Suppl. Table S3). The same was also true for the PLK-1 kinase HTS campaign (Fig. 6D), where the plate controls again provided a good percent inhibition assay signal window and produced average Z′ factors and S/B ratios of 0.66 ± 0.09 and 4.65 ± 0.38-fold, respectively (Suppl. Table S3). In addition, the compound wells of the plates in the PLK-1 HTS exhibited an average robust %CV of 3.31% ± 0.96% for the whole campaign (Fig. 6D and Suppl. Table S3), indicative of high-quality HTS data. The RT-RNaseH assay also performed extremely well in HTS, and the plate controls provided a robust assay percent activation signal window with average Z′ factors and S/B ratios of 0.84 ± 0.04 and 10.89 ± 1.09-fold, respectively (Fig. 6F and Suppl. Table S3). The high quality of the RT-RNaseH HTS data was also indicated by the average robust %CV of 5.56% ± 1.06% for the campaign (Fig. 6F and Suppl. Table S3). In agreement with the data from the 3-day assay signal window and DMSO validation tests (Table 1 and Suppl. Table S2), the HTS performance data indicate that the plate controls for the Cdc25B, PKD, PLK-1, and RT-RNaseH assays exhibited very low variability and produced signal windows with relatively large and stable dynamic ranges. The average plate Z′ factors for all 4 HTS campaigns were >0.5, and the compound-derived average robust %CVs were <10%, indicating that both run-level parameters reflected the quality of the HTS data (Fig. 6A,C,D,F and Suppl. Table S3).

For the WNV NS2B-NS3 HTS campaign, however, the large variation observed in the MIN plate control data during the 3-day assay signal window and DMSO validation tests (Table 2 and Suppl. Table S2) was also apparent throughout the screen and resulted in negative Z′ factor performance data for all of the HTS plates (Fig. 6B and Suppl. Table S3). In contrast, the average robust %CV for the compounds in the WNV NS2B-NS3 HTS campaign was 5.28% ± 1.71%, indicating that despite the poor Z′ factor coefficients of the plate controls, the HTS compound data were of good quality (Fig. 6B).5 In the PLK-1 PBD campaign, a significant number of the plates also exhibited negative Z′ factors that contributed to a low overall average Z′ factor of 0.10 ± 0.12 for the screen (Fig. 6E and Suppl. Table S3). The narrow dynamic range of the PLK-1 PBD assay observed in the 3-day assay signal window and DMSO validation tests (Table 1 and Suppl. Table S2) was also apparent in the screen, where the plate controls only produced an average S/B ratio of 2.14 ± 0.1-fold (Suppl. Table S3). Despite the very narrow assay signal window and the apparently low plate Z′ factors, the compound wells of the PLK-1 PBD plates exhibited an average robust %CV of 6.55% ± 0.83%, consistent with high-quality HTS data (Fig. 6E and Suppl. Table S3).

FIG. 6. High-throughput screening (HTS) campaign-level data quality control review. Plate Z′ factors and plate robust % coefficients of variation (%CVs). Scatter plots of the plate Z′ factors (solid triangles, left y-axis) and plate robust %CVs (filled circles, right y-axis) versus plate order for the 6 HTS campaigns. Individual plate Z′ factors were calculated from the 32 MAX and 24 MIN control wells on each plate. (A) Cdc25B HTS campaign, (B) WNV NS2B-NS3 HTS campaign, (C) PKD kinase HTS campaign, (D) PLK-1 kinase HTS campaign, (E) PLK-PBD binding HTS campaign, and (F) RT RNase H HTS campaign.

Table 2. In silico comparison of high-throughput screening (HTS) data-processing and hit identification methods

HTS Campaign                  Percent Inhibition  Z Score   MAD Score  B Score   BZ Score
Cdc25B
  Compounds tested            65,222              65,222    65,222     65,222    65,222
  Primary active threshold    50                  3 SD      3          Top 1%    3 SD
  # Primary actives           84                  378       665        653       432
  # Confirmed actives         29                  29        29         29        29
  # False positives           55                  349       636        624       403
  # False negatives           0                   0         0          0         0
  Primary active rate         0.13                0.58      1.02       1.00      0.66
  Hit confirmation rate       34.52               7.67      4.36       4.44      6.71
  False-negative rate         0.00                0.00      0.00       0.00      0.00
WNV NS2B-NS3
  Compounds tested            65,218              65,218    65,218     65,218    65,218
  Primary active threshold    50                  3 SD      3          Top 1%    3 SD
  # Primary actives           4726                119       227        653       254
  # Confirmed actives         2                   15        15         14        14
  # False positives           4724                104       212        639       240
  # False negatives           13                  0         0          1         1
  Primary active rate         7.25                0.18      0.35       1.00      0.39
  Hit confirmation rate       0.04                12.61     6.61       2.14      5.51
  False-negative rate         86.67               0.00      0.00       6.67      6.67
PLK-1
  Compounds tested            97,101              97,101    97,101     97,101    97,101
  Primary active threshold    50                  3 SD      3          Top 1%    3 SD
  # Primary actives           518                 1012      2397       971       1042
  # Confirmed actives         39                  38        39         39        38
  # False positives           479                 974       2358       932       1004
  # False negatives           0                   1         0          0         1
  Primary active rate         0.53                1.04      2.47       1.00      1.07
  Hit confirmation rate       7.53                3.75      1.63       4.02      3.65
  False-negative rate         0.00                2.56      0.00       0.00      2.56
PLK-1 PBD
  Compounds tested            97,041              97,041    97,041     97,041    97,041
  Primary active threshold    50                  3 SD      3          Top 1%    3 SD
  # Primary actives           67                  127       195        971       188
  # Confirmed actives         14                  15        15         14        13
  # False positives           53                  112       180        957       175
  # False negatives           1                   0         0          1         2
  Primary active rate         0.07                0.13      0.20       1.00      0.19
  Hit confirmation rate       20.90               11.81     7.69       1.44      6.91
  False-negative rate         6.67                0.00      0.00       6.67      13.33
RT-RNaseH
  Compounds tested            65,218              65,218    65,218     65,218    65,218
  Primary active threshold    50                  3 SD      3          Top 1%    3 SD
  # Primary actives           1250                1151      3166       653       1193
  # Confirmed actives         370                 370       370        273       368
  # False positives           878                 781       2796       380       825
  # False negatives           0                   0         0          97        2
  Primary active rate         1.92                1.76      4.85       1.00      1.83
  Hit confirmation rate       29.60               32.15     11.69      41.81     30.85
  False-negative rate         0.00                0.00      0.00       26.22     0.54
PKD
  Compounds tested            196,146             196,146   196,146    196,146   196,146
  Primary active threshold    50                  3 SD      3          Top 1%    3 SD
  # Primary actives           109                 823       3456       1962      912
  # Confirmed actives         31                  31        31         31        31
  # False positives           78                  792       3425       1931      881
  # False negatives           0                   0         0          0         0
  Primary active rate         0.06                0.42      1.76       1.00      0.46
  Hit confirmation rate       28.44               3.77      0.90       1.58      3.40
  False-negative rate         0.00                0.00      0.00       0.00      0.00

# Primary actives represents the number of primary actives; # Confirmed actives, the number of confirmed actives; # False positives, the number of false positives; # False negatives, the number of false negatives. The primary actives identified by applying the optimal hit identification method for the primary screens of the 6 HTS campaigns are highlighted in bold. The bold-highlighted confirmed actives for each campaign were determined by running concentration-response screening on the primary actives (in bold), which are the true actives of the campaign. The unhighlighted confirmed actives were estimated by searching for the number of the true actives (bold-highlighted confirmed actives) among the primary actives identified by the other, nonoptimal hit identification methods. The # false positives is the number of primary actives minus the number of confirmed actives (i.e., # false positives = # primary actives − # confirmed actives). The # false negatives is the number of the true actives (bold-highlighted confirmed actives) missing from the primary actives. MAD, median absolute deviation.

Active identification determined by an optimal active identification method

To be consistent in our comparison of the 5 different data-processing and active identification methods, we applied the following terms to assess the performance of the methods:

1. The primary active rate16 defines the ratio of the total number of primary actives to the total number of compounds tested in a primary screen (i.e., primary active rate = [100 × number of actives]/number of tested compounds).

2. The hit confirmation rate16 defines the ratio of the number of confirmed actives to the total number of primary actives selected in a primary HTS screen (i.e., hit confirmation rate = [number of confirmed actives × 100]/number of primary HTS actives). Primary HTS actives were confirmed as hits in concentration-response assays.

3. The false-positive rate16 defines the ratio of the number of primary HTS actives that failed to be confirmed in concentration-response assays to the total number of primary HTS actives (i.e., false-positive rate = [number of primary HTS actives − number of confirmed hits] × 100/number of primary HTS actives).

4. The false-negative rate16 defines the ratio of the number of true actives missed among the primary actives to the number of confirmed actives (i.e., false-negative rate = [number of missed actives × 100]/number of confirmed actives).
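The 4 definitions above transcribe directly into code. In this sketch the names are ours; n_true denotes the campaign's total number of true (confirmed) actives, which the text uses as the false-negative denominator:

```python
def screening_metrics(n_tested, n_primary, n_confirmed, n_true):
    """Performance metrics for a hit identification method (ref. 16).

    n_true is the campaign's total number of true (confirmed) actives;
    n_confirmed is how many of them the method's primary actives captured.
    """
    return {
        # 1. primary active rate: % of tested compounds called active
        "primary_active_rate": 100.0 * n_primary / n_tested,
        # 2. hit confirmation rate: % of primary actives that confirmed
        "hit_confirmation_rate": 100.0 * n_confirmed / n_primary,
        # 3. false-positive rate: % of primary actives not confirmed
        "false_positive_rate": 100.0 * (n_primary - n_confirmed) / n_primary,
        # 4. false-negative rate: % of true actives the method missed
        "false_negative_rate": 100.0 * (n_true - n_confirmed) / n_true,
    }
```

Applied to the Cdc25B percent inhibition column of Table 2 (65,222 tested; 84 primary actives; 29 confirmed of 29 true actives), this reproduces the reported 0.13% primary active rate and 34.52% hit confirmation rate.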

To compare the 5 data-processing and hit identification methods, we estimated false-positive and false-negative rates based on the number of actives that were actually confirmed in concentration-response screening of the 6 campaigns (Table 2). A good data-processing and active identification method should provide a lower primary active rate, a higher hit confirmation rate, and a lower false-negative rate. We applied each of the 5 methods to the committed, quality-assured HTS data measurements acquired in the 6 campaigns (Table 2). The active thresholds for the 5 methods, the number of primary actives identified, the corresponding active rate, and the number of hits confirmed in concentration-response assays, together with the estimated numbers of false positives and negatives, are presented in Table 2.

Our analysis indicates that the controls-based percent inhibition method clearly provided the lowest primary HTS active rate and the highest hit confirmation rates for the Cdc25B, PLK-1, and PKD HTS campaigns (Table 2). For all 3 of these screens, the other data-processing and active identification methods would have produced a significantly higher primary HTS active rate, with correspondingly more false positives and lower hit confirmation rates. In the WNV NS2B-NS3 campaign, however, a plate controls-based percent inhibition method would have produced a primary HTS active rate (7.25%) that was much too high, and 13 of the 15 confirmed hits would have been missed because they would not have met the active criterion in the primary screen (Table 2). For the WNV NS2B-NS3 screen, the Z score statistical method provided the lowest primary HTS active rate (0.18%) and the highest hit confirmation rate (12.6%). For the WNV NS2B-NS3 data set, the MAD score, B score, and BZ score methods all provided higher primary HTS active rates, with correspondingly more false positives and lower hit confirmation rates than the Z score method (Table 2). The Z score statistical method was also optimal for the PLK-1 PBD campaign, where the other methods produced significantly higher primary HTS active rates, more false positives, and lower hit confirmation rates (Table 2). Both the plate controls-based percent inhibition and the Z score statistical methods provided comparable primary HTS active and hit confirmation rates for the RT-RNaseH HTS data set. In this instance, the MAD score method would have provided a much higher primary HTS active rate, with more false positives and a lower hit confirmation rate (Table 2). Although the B score method actually provided the highest active confirmation rate for the RT-RNaseH HTS data set, with the lowest primary HTS active and false-positive rates, it also produced a false-negative rate of 26% that would be unacceptable (Table 2). These data suggest that although the use of advanced robust statistical methods such as B scores may help to reduce the impact of systematic row/column biases in HTS data, care must be exercised to ensure that they are effectively and properly used.

The 5 hit identification methods described in this article can all be used to normalize plate-to-plate variability.12,13,23 By definition, the Z score data-processing method makes a normal distributional assumption about the assay measurements.12,13 Since the mean and standard deviation are influenced by statistical outliers, Z score methods are not resistant to statistical outliers. In contrast, the MAD score and B score methods are nonparametric, with minimal distributional assumptions for the measurements, and are consequently resistant to the presence of statistical outliers. The BZ score does not assume a normal distribution for the measurements but makes normal distributional assumptions for the calculated B scores.23 With the B score method, embedding a 2-way median polish removes systematic row and/or column effects, which has the potential to provide 3 main advantages: (1) minimal distributional assumptions, (2) a reduction of measurement bias due to positional effects, and (3) resistance to statistical outliers.12,13 However, setting the threshold for actives identified by the B score is often contingent on personal preference (e.g., top 1%).13

CONCLUSIONS

Many factors beyond biological activity affect HTS data quality. Identifying systematic errors and taking steps to reduce and/or control their impact is critical for identifying actives in primary HTS data. No single data-processing and active identification method is optimal for all HTS data sets. To aid the selection of the most appropriate HTS data-processing and active identification method, we have developed a 3-step statistical decision methodology (Fig. 2). Step 1 is to conduct the 3-day assay signal window and DMSO validation tests and, on the basis of the statistical indices determined from these data, to select the most appropriate HTS data-processing method and to establish criteria for QC review and active identification. Step 2 is to perform a multilevel statistical and graphical quality control review of the screening data to exclude data that fall outside the QC criteria. Step 3 is to apply the established active criterion to the quality-assured data to identify the actives. When used in conjunction with the optimal data-processing method selected in step 1 and applied systematically to every HTS run, the multiple-level statistical and graphical QC methods provide a better way to accurately evaluate the quality of HTS screen data and improve hit confirmation rates.

ACKNOWLEDGMENTS

We thank Caleb Foster, Sunita Shinde, Jennifer Phillips, John Skoko, and Stephanie Leimgruber for their excellent work in the development and implementation of the 6 HTS campaigns used in this article. We also acknowledge the many screen leaders and researchers at Eli Lilly's Research Triangle Park global assay development and HTS facility (Sphinx) who contributed to the genesis of the practical methodologies described herein (http://www.ncgc.nih.gov/guidance/index.html). The work reported here was supported by National Institutes of Health grants U54 MH074411, X01 MH077605, X01 MH077601, X01 MH078959, X01 MH76330, X01 MH078944, and R03 DA024898; a National Cancer Institute grant CA078093; and a grant from the Commonwealth of Pennsylvania (DOH-SAP4100027294).

REFERENCES

1. Dove A: Screening for content: the evolution of high throughput. Nat Biotechnol 2003;21:859-864.

2. Johnston PA, Soares KM, Shinde SN, Foster CA, Shun TY, Takyi HK, et al: Development of a 384-well colorimetric assay to quantify hydrogen peroxide generated by the redox cycling of compounds in the presence of reducing agents. Assay Drug Dev Technol 2008;6:505-518.

3. Johnston PA, Foster CA, Shun TY, Skoko JJ, Shinde S, Wipf P, et al: Development and implementation of a 384-well homogeneous fluorescence intensity high-throughput screening assay to identify mitogen-activated protein kinase phosphatase-1 dual-specificity protein phosphatase inhibitors. Assay Drug Dev Technol 2007;5:319-332.

4. Johnston PA: Cellular platforms for HTS: three case studies. Drug Discov Today 2002;7:353-363.

5. Johnston PA, Phillips J, Shun TY, Shinde S, Lazo JS, Huryn DM, et al: HTS identifies novel and specific uncompetitive inhibitors of the two-component NS2B-NS3 proteinase of West Nile virus. Assay Drug Dev Technol 2007;5:737-750.

6. Johnston PA, Tierno MB, Shun TY, Brummond KM, Wipf P, Lazo JS: Characterization of the Cdc25B dual specificity phosphatase inhibitor hits identified in a high throughput screen of the NIH compound library. Assay Drug Dev Technol 2009;7:250-265.

7. Sharlow ER, Leimgruber S, Shun TY, Lazo JS: Development and implementation of a miniaturized high-throughput time-resolved fluorescence energy transfer assay to identify small molecule inhibitors of polo-like kinase 1. Assay Drug Dev Technol 2007;5:723-736.

8. Tierno MB, Johnston PA, Foster C, Skoko JJ, Shinde SN, Shun TY, et al: Development and optimization of high-throughput in vitro protein phosphatase screening assays. Nat Protoc 2007;2:1134-1144.

9. Sharlow ER, Leimgruber S, Yellow-Duke A, Barrett R, Wang QJ, Lazo JS: Development, validation and implementation of immobilized metal affinity for phosphochemicals (IMAP)-based high-throughput screening assays for low-molecular-weight compound libraries. Nat Protoc 2008;3:1350-1363.

10. Sharlow ER, Giridhar KV, LaValle CR, Chen J, Leimgruber S, Barrett R, et al: Potent and selective disruption of protein kinase D functionality by a benzoxoloazepinolone. J Biol Chem 2008;283:33516-33526.

11. Wunder F, Kalthof B, Muller T, Huser J: Functional cell-based assays in microliter volumes for ultra-high throughput screening. Comb Chem High Throughput Screen 2008;11:495-504.

12. Malo N, Hanley JA, Cerquozzi S, Pelletier J, Nadon R: Statistical practice in high-throughput screening data analysis. Nat Biotechnol 2006;24:167-175.

13. Brideau C, Gunter B, Pikounis B, Liaw A: Improved statistical methods for hit selection in high-throughput screening. J Biomol Screen 2003;8:634-647.

14. Gunter B, Brideau C, Pikounis B, Liaw A: Statistical and graphical methods for quality control determination of high-throughput screening data. J Biomol Screen 2003;8:624-633.

15. Zhang JH, Chung TD, Oldenburg KR: A simple statistical parameter for use in evaluation and validation of high throughput screening assays. J Biomol Screen 1999;4:67-73.

16. Zhang JH, Chung TD, Oldenburg KR: Confirmation of primary active substances from high throughput screening of chemical and biological populations: a statistical approach and practical considerations. J Comb Chem 2000;2:258-265.

17. Soares K, Blackmon N, Shun TY, Shinde SN, Takyi HK, Wipf P, et al: Profiling the NIH Small Molecule Repository for compounds that generate H2O2 by redox cycling in reducing environments. Assay Drug Dev Technol 2010;8:152-174.

18. Coan K, Maltby DA, Burlingame AL, Shoichet BK: Promiscuous aggregate-based inhibitors promote enzyme unfolding. J Med Chem 2009;52:2067-2075.

19. Jadhav A, Ferreira RS, Klumpp C, Mott BT, Austin CP, Inglese J, et al: Quantitative analyses of aggregation, autofluorescence, and reactivity artifacts in a screen for inhibitors of a thiol protease. J Med Chem 2010;53:37-51.

20. Simeonov A, Thomas CJ, Wang Y, Huang R, Southall NT, Shinn P, et al: Fluorescence spectroscopic profiling of compound libraries. J Med Chem 2008;51:2363-2371.

21. Gribbon P, Lyons R, Laflin P, Bradley J, Chambers C, Williams BS, et al: Evaluating real-life high-throughput screening data. J Biomol Screen 2005;10:99-107.

22. Sittampalam GS, Iversen PW, Boadt JA: Design of signal windows in high throughput screening assays for drug discovery. J Biomol Screen 1997;2:159-169.

23. Wu Z, Liu D, Sui Y: Quantitative assessment of hit detection and confirmation in single and duplicate high-throughput screenings. J Biomol Screen 2008;13:159-167.

24. Austin CP, Brady LS, Insel TR, Collins FS: NIH Molecular Libraries Initiative. Science 2004;306:1138-1139.

25. Lazo JS: Roadmap or roadkill: a pharmacologist's analysis of the NIH Molecular Libraries Initiative. Mol Interv 2006;6:240-243.

26. The PubChem Project. Retrieved from http://pubchem.ncbi.nlm.nih.gov/.

27. Hoaglin DC, Mosteller F, Tukey JW: Understanding Robust and Exploratory Data Analysis. New York: John Wiley, 1983.

Address correspondence to:
Tong Ying Shun
University of Pittsburgh Drug Discovery Institute, School of Medicine
Room 9049, BST-3, 3501 Fifth Avenue, Pittsburgh, PA 15260
E-mail: [email protected]
