Large Scale Imaging Analytics for In Silico Biomedicine
Joel Saltz, Fusheng Wang, George Teodoro, Lee Cooper, Patrick Widener, Jun Kong, David
Gutman, Tony Pan, Sharath Cholleti, Ashish Sharma, Daniel Brat, Tahsin Kurc
Center for Comprehensive Informatics, the Department of Biomedical Informatics, and the
Department of Pathology & Laboratory Medicine
Emory University
Introduction
The ability to quantitatively characterize biological structure and function in detail through in
silico experiments1 has great potential to reveal new insights into disease mechanisms and enable
the development of novel preventive approaches and targeted treatments. High-resolution
microscopy imaging is playing an increasingly pivotal role in realizing this potential in
healthcare delivery and biomedical research. Digital microscopy technology reduces dependence
on physical slides; it can also enable more effective ways of screening for disease, classifying
disease state, understanding its progression, and evaluating the efficacy of therapeutic strategies.
Systematic studies of tumors at the cellular and sub-cellular levels, for example, provide
tremendous insight into how alterations in intercellular signaling occur and allow investigators
to study the relationship among morphologic characteristics, cellular-level processes, and
genetic, genomic, and protein expression signatures. Studies conducted using tissue slide images
and genomic data in the In Silico Brain Tumor Research Center[1] have produced results that
reveal morphological subtypes of glioblastoma not previously recognized by pathologists and
1 The term “in silico experiment” broadly refers to an experiment performed on a computer by analyzing, mining, and integrating biomedical databases and/or through simulations.
demonstrate significantly correlated genes through correlation of the extent of necrosis and
angiogenesis with gene expression data[2, 3]. In these studies, in silico experiments analyzing
images from 480 tissue slides from 167 patients discovered that the morphological signatures in
glioblastoma self-aggregate into four distinct clusters. The survival characteristics of this
morphology-driven stratification are significant when compared to the survival of molecular
subtypes, suggesting that morphology is a good predictor of prognosis.
Since the first application of digital technology to microscopic data[4], the ability to acquire
high-resolution images from whole tissue slides and tissue microarrays has become more
affordable, faster, and practical[5-13]. The latest generation of devices offers advanced dynamic
focus mechanisms to improve auto-focus quality. Cassette-style slide holders that handle slides
indirectly have reduced mishandling of slides and malfunction during scanning. These advances
have made the digital microscopy technology more practical and efficient. Image scanning times
have decreased from 6-8 hours per whole tissue slide about a decade ago to a few minutes with
advanced scanners; and improvements in auto-focusing and slide holders have facilitated high
throughput image generation from batches of slides with minimal manual intervention.
With the improving cost-effectiveness of scanners, it is rapidly becoming possible for a
research study or healthcare operation to routinely generate hundreds to thousands of whole slide
images per day. This progress is fueling the emergence of what we refer to as analytical
digital pathology, which involves the investigation of quantitative correlation and integration of
high resolution, high throughput datasets comprising imaging features from multiple imaging
modalities and from across temporal, functional, and dimensional scales. However, unlike other
imaging modalities (such as MRI, which enjoys widespread adoption), to date, microscopic
imaging has been underutilized in biomedicine. This is primarily because even moderate
numbers of digitized microscopy specimens quickly lead to formidable information synthesis
and management problems. Software to support the extraction and interpretation of information
from thousands of tissue images has to deal with hundreds of terabytes of data, expensive data
processing requirements, and trillions of microscopic objects and their features.
This chapter describes the computational challenges of employing large volumes of
digitized tissue slide data in biomedicine and presents some of the data-intensive computing
approaches we have developed and employed to address these challenges.
Background
A basic application of the microscopy imaging technology is telepathology, in which a
pathologist can remotely render diagnoses for patient care in the absence of glass slides and a
microscope[14, 15]. In this form of use, a whole slide imaging system should support the
implementation of a “virtual microscope”[14, 16-29], which emulates the basic operation of a
microscope, enabling browsing through a slide to locate an area of interest, local browsing in a
region of interest to observe the region surrounding the current view, and changing magnification
level and focal plane. A virtual microscope can also implement functionality that cannot be
achieved with a physical microscope, such as manual annotation of areas of interest and viewing
of a slide by multiple users simultaneously.
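As an illustrative sketch of the browsing operations above (the tile size, downsampling factors, and function name are our own assumptions, not any particular viewer's API), a virtual microscope client maps a viewport at a requested magnification onto the underlying tile grid and fetches only the tiles that are visible:

```python
import math

def visible_tiles(viewport_x, viewport_y, viewport_w, viewport_h,
                  level_downsample, tile_size=256):
    """Return (col, row) indices of the tiles that cover a viewport.

    viewport_* are given in base-image (full-resolution) pixel coordinates;
    level_downsample is the downsampling factor of the requested
    magnification level (e.g., 1 for 40X, 8 for 5X if 40X is the base).
    """
    # Map the viewport into the coordinate space of the requested level.
    x0 = viewport_x / level_downsample
    y0 = viewport_y / level_downsample
    x1 = (viewport_x + viewport_w) / level_downsample
    y1 = (viewport_y + viewport_h) / level_downsample
    # Enumerate the tile grid cells the mapped viewport intersects.
    return [(col, row)
            for row in range(int(y0 // tile_size), int(math.ceil(y1 / tile_size)))
            for col in range(int(x0 // tile_size), int(math.ceil(x1 / tile_size)))]

# A 1024x1024-pixel window at the slide origin, viewed at 8x downsampling
# (5X if the slide was scanned at 40X), needs only a single 256x256 tile;
# the same window at full resolution needs a 4x4 block of tiles.
print(visible_tiles(0, 0, 1024, 1024, 8))  # → [(0, 0)]
```

Panning and zooming then reduce to re-evaluating this mapping and fetching only the tiles not already cached.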
While a virtual microscope facilitates remote viewing and sharing of images, the
digitization of microscopy slides has been accompanied by a proliferation of image analysis
methods as well. The objectives of extracting detailed cellular and sub-cellular information from
whole slide images will depend on the particular study being conducted – a study may focus on
reducing variability and error in diagnosis by emulating established diagnostic procedures;
another study may target novel insights into the biology of disease progression by investigating
morphological characteristics of the disease. Nevertheless, analytical digital pathology employs a
set of common data processing operations[30]: (1) Stitching and registration. Some instruments
capture a whole slide image as a set of image tiles. These tiles need to be aligned and stitched
together to form a full image. Image registration is the process of mapping two or more images
into the same coordinate frame. (2) Segmentation of objects and regions. Often the entities to be
segmented are composed of collections of simple and complex objects and structures and are
defined by a complex shape and textural appearance. Examples include identifying cell nuclei,
cell membranes, the boundaries of blood vessels, and the extent of regions at the level of tissues.
(3) Feature extraction. It is the process of calculating informative descriptions of objects or
regions, and often precedes classification or segmentation tasks. It can be applied on the whole
image or individual segmented objects to describe characteristics such as shape and texture. (4)
Classification. Segmented objects, regions, or whole slides can be classified into meaningful
groups based on extracted features. Classification of cell types, antibody activation, or entire
slides into pathologic categories are common classification themes. A high level depiction of a
nuclear segmentation and classification workflow is shown in Figure 1.
We should note that this list of operations is a high-level, simplified description of steps
in microscopy image analysis. Individual steps are often composed of a series of sub-steps. There
have been algorithmic advances in improving accuracy of image analysis methods and reducing
their execution times. Nonetheless, computation, storage, and networking still remain significant
challenges in analytical digital pathology. We describe these challenges in the next section.
Figure 1. Nuclear segmentation and classification pipeline. Images are processed through a set of
operations for detecting boundaries of nuclei, computing a set of features for each nucleus, and
classifying nuclei into categories using machine learning algorithms on these features. The results are
stored in a database for further analysis and algorithm evaluation.
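A minimal, toy version of the segmentation-features-classification workflow can be sketched as below. Thresholding plus connected components stands in for real nuclear segmentation, area is the only feature, and an area cutoff plays the role of the classifier; all of these are illustrative simplifications, not the actual pipeline of Figure 1:

```python
from collections import deque

def segment(img, threshold):
    """Label connected foreground components (pixels above threshold)."""
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if img[sy][sx] > threshold and labels[sy][sx] == 0:
                next_label += 1
                queue = deque([(sy, sx)])
                labels[sy][sx] = next_label
                while queue:  # flood fill one component
                    y, x = queue.popleft()
                    for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                        if 0 <= ny < h and 0 <= nx < w and \
                           img[ny][nx] > threshold and labels[ny][nx] == 0:
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels, next_label

def extract_features(labels, n):
    """Per-object feature vector; here just the area in pixels."""
    areas = [0] * n
    for row in labels:
        for lab in row:
            if lab:
                areas[lab - 1] += 1
    return [{"area": a} for a in areas]

def classify(features, area_cutoff):
    """Toy classifier: objects at or above the cutoff are called 'tumor'."""
    return ["tumor" if f["area"] >= area_cutoff else "normal" for f in features]

img = [[0, 9, 9, 0, 0],
       [0, 9, 9, 0, 8],
       [0, 0, 0, 0, 8]]
labels, n = segment(img, threshold=5)
print(classify(extract_features(labels, n), area_cutoff=3))  # → ['tumor', 'normal']
```

Real pipelines replace each stage with far more sophisticated algorithms, but the dataflow structure (image → objects → features → classes → database) is the same.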
Data Intensive Computation Challenges in Analytical Digital Pathology
A typical whole slide pathology image contains 20 billion pixels (with digitization at 40X
objective magnification). Uncompressed, with 8 bits per color channel, such an image occupies
about 56 GB. A multilayer image stack that provides a focus capability typically contains tens
of such images. In a typical analysis scenario, computational requirements for a single analysis
type (e.g., feature extraction and region classification) are approximately 10 hours for a single
image stack at 5X magnification. Furthermore, a single scanner can generate hundreds of images,
and a study may generate or reference thousands of slides. For example, through the NCI-funded
In Silico Brain Tumor Research Center (ISBTRC) project[1], our team has so far collected 678
slides at 40X and 480 slides at 20X magnification. We will continue to collect approximately
3500 slides from about 700 patients over the next couple of years.
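The arithmetic behind these figures can be checked directly; the 20-layer stack depth and the 3-bytes-per-pixel encoding are our own illustrative assumptions ("tens of such images", 8 bits per RGB channel):

```python
# Back-of-the-envelope storage for whole slide images, following the
# figures in the text: 20 billion pixels at 40X, uncompressed.
pixels_per_slide = 20_000_000_000
bytes_per_pixel = 3            # 8-bit R, G, B
GiB = 2**30

slide_gib = pixels_per_slide * bytes_per_pixel / GiB
print(f"one slide: {slide_gib:.0f} GiB")             # → one slide: 56 GiB

# A focus stack of, say, 20 such layers, and a 3500-slide collection:
stack_tib = 20 * slide_gib / 1024
collection_pib = 3500 * 20 * slide_gib / 1024**2
print(f"one 20-layer stack: {stack_tib:.1f} TiB")    # → one 20-layer stack: 1.1 TiB
print(f"3500 stacks: {collection_pib:.1f} PiB")      # → 3500 stacks: 3.7 PiB
```

Even with aggressive compression, collections of this scale quickly exceed the capacity of workstation-class storage.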
The data intensive computation requirements are exacerbated in large-scale studies
involving thousands of images. Each image analysis pipeline has its strengths and limitations.
The effectiveness of an analysis pipeline depends on many factors including the nature of the
histological structures being segmented, the classifications being carried out, and on sample
preparation and staining. It is not feasible to manually inspect each image for every feature and
fine-tune the analysis pipelines in a large-scale study. Thus, detailed multi-scale characterization
of morphology requires (1) coordinated use of many interrelated analysis pipelines, (2)
performing algorithm sensitivity analyses, and (3) comparison of analysis results from multiple
analysis pipelines and analysis runs. For instance, several hundred variations of analysis
pipelines can be evaluated on a few hundred images. Systematic comparison and evaluation of
the results from these runs can help weed out bad pipeline choices and identify a smaller number
of priority pipelines. These pipelines are then executed on the larger collection of images.
This approach leads to a difficult data management, querying, and integration problem.
Image analysis algorithms segment and classify 10^5 to 10^7 cells in each virtual slide of size 10^5
by 10^5 pixels. Brain tumor tissue analyses, for instance, can encompass the identification and
quantification of mitotic figures and subcellular structures, which is done through processing in
cells or regions identified as being brain tumor, as well as of angiogenesis and pseudopalisades,
which requires a synthesis of regional texture analyses and segmentation of larger scale
histological structures. As these analyses will execute multiple interrelated analysis pipelines as
described above, a systematic analysis of large-scale image data, therefore, involves
classification of roughly 10^9 to 10^12 micro-anatomic structures. Classifying a given cell
typically uses roughly 10-100 shape, texture, and (when appropriate) stain quantification
features. Thus, a thorough data analysis limited to classifying cells could encompass 10^10 to
10^13 features. Without parallelization, it could take an hour or longer to compare, in a
database, the results generated by two algorithms for a single image; comparing two result sets
from a hundred images could take a week. In addition to comparing results from
multiple analyses, scalable mechanisms are needed for producing biologically or
computationally meaningful data aggregates (e.g., machine learning based clustering) from
spatial objects and features. Computation of data aggregates on large numbers of images could
take days or weeks.
Data Intensive and High Performance Computing Approaches for Large Scale Analytical
Digital Pathology
Analysis of Large Microscopy Imaging Datasets
Processing of very large images and image datasets requires careful coordination of data retrieval,
distribution of data among processing nodes, and mapping of processing tasks to nodes. A
combination of multiple parallelism approaches can be employed to quickly render results from a
large dataset. Multiple images can be processed concurrently in a bag-of-tasks strategy, in which
images or image tiles are assigned to groups of computing nodes, generally using a demand-
driven strategy. Several systems have employed this type of parallelism on cluster systems and in
Grid computing environments. Gurcan et al. reported the successful application of distributed
computing in a pilot project to support automated characterization of Neuroblastoma using the
Shimada classification system[31]. The ImageMiner system employed IBM’s World Community
Grid in July 2006, using more than 100,000 imaged tissue discs[27, 32]. High-resolution images
may not fit in the main memory of a single processor. In addition, image analysis workflows may
consist of operations that can process data in a pipelined, streaming manner. These
characteristics of data and operations are suitable for combined use of task- and data-parallelism.
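A minimal demand-driven bag-of-tasks sketch follows. Threads here stand in for the cluster nodes or processes an actual system would use, and the per-tile analysis is a stub; the point is the scheduling pattern, in which each idle worker pulls the next available task:

```python
from concurrent.futures import ThreadPoolExecutor

def analyze_tile(tile_id):
    """Stand-in for an expensive per-tile analysis (segmentation, features)."""
    # Here we just 'compute' a nucleus count deterministically from the id.
    return tile_id, (tile_id * 7) % 13

def process_slide(tile_ids, workers=4):
    # Demand-driven bag of tasks: each idle worker pulls the next tile,
    # so fast tiles do not hold up slow ones (dynamic load balancing).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(analyze_tile, tile_ids))

results = process_slide(range(100))
print(len(results))  # → 100
```

On a cluster, the same pattern is typically realized with a master process handing out image or tile identifiers to worker nodes on request.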
The out-of-core virtual microscope (OCVM) system[33, 34], based on the DataCutter
infrastructure[35], supports multiple parallelism approaches. In this system, multiple instances of
workflows can be created and executed with each instance processing a subset of images. Within
each workflow instance, an image is partitioned into chunks (rectangular sub-regions) so that I/O
operations can be coordinated by the runtime system rather than relying on virtual memory.
The processing operations constituting the workflow can be mapped to processors to reduce I/O
and communication overheads. Multiple instances of an operation can be instantiated to allow for
data-parallelism and pipelined processing. The execution time of image analyses can further be
reduced when it is acceptable to trade off analysis accuracy for performance – in some cases,
there may not be enough resources available to carry out an analysis at the highest resolution or
an exploratory study may not need the highest accuracy to process a large set of potential data
points quickly. A framework to support accuracy-performance trade-offs in microscopy image
analysis has been developed by Kumar et al.[33] This framework integrates the Wings system
for high-level semantic expression of image analysis workflows, Pegasus for execution of
workflows in a Grid environment, OCVM for fine-grain parallelism and pipelined execution
within a high-performance Grid node, and a set of algorithms for adaptive processing. The
algorithms exploit spatial locality of image features to create dynamic data processing schedules
in order to improve performance while meeting quality of output requirements.
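The chunk-based decomposition used by out-of-core systems such as OCVM can be sketched as follows; the 4096-pixel chunk dimensions are illustrative:

```python
def chunk_image(width, height, chunk_w, chunk_h):
    """Partition an image into rectangular chunks (x, y, w, h).

    Edge chunks are clipped to the image bounds; the runtime can then
    schedule I/O and processing chunk by chunk instead of paging the
    whole image through virtual memory.
    """
    chunks = []
    for y in range(0, height, chunk_h):
        for x in range(0, width, chunk_w):
            chunks.append((x, y, min(chunk_w, width - x), min(chunk_h, height - y)))
    return chunks

# A 100,000 x 100,000-pixel slide split into 4,096 x 4,096 chunks.
chunks = chunk_image(100_000, 100_000, 4096, 4096)
print(len(chunks))  # → 625
```

Each chunk is then an independently schedulable unit, which is what enables the runtime to overlap I/O with computation and to map chunks to processors to reduce communication.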
General purpose graphics processing units (GPGPUs) have emerged as a popular
implementation platform for many data-intensive computational science applications. An
increasing number of research projects have developed GPU-accelerated implementations of
image processing operations[36-60]. The GPU implementations of image processing operations
aim to exploit low-latency and high-bandwidth GPU memories and massively multi-threaded
execution models.
Representing and Managing Image Data and Analysis Results
Efficient data repositories anchored on rich and flexible data models play a crucial role in
interpretation, reusability, and reproducibility of imaging studies. The underlying data model
should be able to precisely and unambiguously describe image datasets, analyses, and analysis
results. It should be able to represent a) context relating to patient data, specimen preparation,
special stains, etc, b) human observations involving pathology classification and characteristics,
and c) algorithm and human-described segmentations (markups), features, and annotations.
Markups can be either geometric shapes or image masks; annotations can be calculations,
observations, disease inferences or external annotations. The relationships between data elements
can also be complex. For example, additional annotations can be derived from existing
annotations. As a result, generic and extensible metadata models are required to support different
types of analyses and applications.
Several projects have developed data models for representation and management of
microscopy images and analysis results[61-65], although there are yet no official standard
models. The Open Microscopy Environment (OME)[61] project has developed a data model and
a database system that can be used to represent, exchange, and manage image data and metadata.
The OME provides a data model of common specification for storing details of microscope setup
and image acquisition. Cell-Centered Database (CCDB)[64, 65] provides a data model to capture
image analysis output, image data, and information on the specimen preparation and imaging
conditions that generated the image data. DICOM Working Group 26 is developing a DICOM
based standard for storing microscopy images[66]. The metadata in this model captures
information such as patient, study, and equipment information. The PAIS model[62, 63] is an
object-oriented, extensible, semantically enabled data model designed to support large-scale
analytical imaging and human observations; it is oriented toward storage and performance
efficiency and supports alternative implementations. An XML based representation of the PAIS
model can be used to facilitate exchange and sharing of results in a format more compatible with
Web standards and tools. However, for very large result sets, the XML representation is not
efficient, even with compression of the documents. An alternative approach is to employ self-
describing structured container technologies such as HDF5. Such container technologies provide
more efficient storage than text-based file formats like XML, while still making available the
structure of the data for query purposes. We have observed in our projects that the HDF5
representation of analysis results is on average 6-7 times smaller, in compressed form, than
compressed XML representation of the same results.
Query Support
An image analysis results repository should allow retrieval of information from a large
number of data elements by a wide range of queries. Many of the data elements are anatomic
objects such as lesions, cells, nuclei, blood vessels, etc. In addition to an object’s classification,
spatial relationships among objects are also important in understanding the biological structure
and function. Examples of query types include: (i) retrieval of image data and metadata (e.g.,
count nuclei whose grade is less than 3); (ii) queries to compare results generated from
different approaches, and validate machine generated results against human observations (e.g.,
find nuclei that are classified by observer A and by algorithm B and whose feature f is within the
range of a and b); (iii) queries on assessing relative prevalence of features or classified objects, or
assessing spatial coincidence of combinations of features or objects (e.g., which nucleus types
preserve features such as distance and shape between two images); (iv) queries to support
selection of collections of segmented regions, features, objects for further machine learning or
content based retrieval applications (e.g., find nuclei with an area between 50 pixels and 200
pixels in selected region of interest); and (v) semantic queries based on spatial relationships and
annotations and properties drawn from domain ontologies (e.g., search for objects with an
observation concept astrocytoma and that are within 100 pixels of each other, but also expand to
include all the subclass concepts, gliosarcoma and giant cell glioblastoma, of astrocytoma).
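Query type (iv) maps naturally onto a relational formulation. The sketch below uses an in-memory SQLite table; the table and column names are illustrative, not the actual PAIS schema:

```python
import sqlite3

# A toy results table: one row per segmented nucleus, with its centroid
# coordinates and area as extracted features.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE nucleus
              (id INTEGER, algorithm TEXT, cx REAL, cy REAL, area REAL)""")
db.executemany("INSERT INTO nucleus VALUES (?,?,?,?,?)",
               [(1, "A", 10, 12, 80), (2, "A", 400, 90, 350),
                (3, "A", 55, 40, 120), (4, "B", 57, 41, 118)])

# Find nuclei with an area between 50 and 200 pixels inside a selected
# region of interest (here the 200x200 window at the slide origin).
rows = db.execute("""SELECT id FROM nucleus
                     WHERE area BETWEEN 50 AND 200
                       AND cx BETWEEN 0 AND 200 AND cy BETWEEN 0 AND 200
                     ORDER BY id""").fetchall()
print([r[0] for r in rows])  # → [1, 3, 4]
```

At the scale described earlier (billions of objects), the same declarative queries must of course run over partitioned, indexed storage rather than a single table.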
In order to scale to large volumes of data, databases of analysis results can be physically
partitioned across the nodes of a cluster system. Distributed memory can also be leveraged to
reduce I/O costs. We have investigated the performance of different database configurations for
spatial joins and cross-match operations[67]. The configurations included a parallel database
management system with active disk style execution support for some types of database
operations, a database system designed for high-availability and high-throughput (MySQL
Cluster), and a distributed collection of database management systems with data replication. Our
experimental evaluation of cross-match algorithms[68, 69] shows that the choice of a database
configuration can significantly impact the performance of the system. The configuration with
distributed database management systems with replication (i.e., replication of portions of the
database) provides a flexible environment, which can be adjusted to the data access patterns and
dataset characteristics.
Applying the computing infrastructure and software stack, such as MapReduce[70], from
the domain of enterprise data analysis to pathology image analysis has potential to pave the way
for efficient, cost-effective solutions as well. Recent work[71] has demonstrated a Hadoop-
based[72, 73] implementation of spatial query processing in analytical digital pathology,
as illustrated in Figure 2. The implementation provides a declarative query language and an
efficient real-time spatial query engine with dynamically built spatial indexes to support query
processing on clusters with multi-core CPUs. Processing of a query is accomplished in several
steps: i) Analysis results with spatial boundaries are retrieved by the query engine, and R*-tree
indices are built on the fly; ii) initial spatial filtering (spatial join) is done on minimum
bounding rectangles (MBRs) using the entries in the R*-tree indices; and iii) computational
geometry algorithms for query refinement and spatial measurement are performed to generate the
final results. The last step dominates the cost of query execution, and the first two steps
effectively filter out non-intersecting spatial objects to minimize geometry computations. To
parallelize spatial data processing, data is partitioned based on image tiles, which form natural
units for MapReduce based execution. The data is staged on HDFS[72] (Figure 2a); the map function
forms tasks keyed by tile, and the reduce function calls the query engine to execute
spatial joins (Figure 2b). Support for feature queries is based on Hive[73], which provides a
SQL like query language and supports major aggregation queries running on MapReduce. To
provide an integrated query language, Hive can be extended to support both feature queries and
spatial queries. The MapReduce based approach not only provides high performance spatial
queries on cost-effective clusters, but also makes it convenient for users to write queries.
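The tile-keyed map and per-tile spatial-join reduce described above can be sketched in plain Python. MBR filtering stands in for the R*-tree pass, and the two-algorithm comparison is a toy stand-in for the real query engine:

```python
from collections import defaultdict

def mbr(poly):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a polygon."""
    xs, ys = zip(*poly)
    return min(xs), min(ys), max(xs), max(ys)

def mbrs_intersect(a, b):
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def map_phase(records):
    """Map: key each (imageID, tileID, algorithm, boundary) record by its
    tile, so one reducer sees all boundaries that can possibly intersect."""
    groups = defaultdict(list)
    for image_id, tile_id, algorithm, boundary in records:
        groups[(image_id, tile_id)].append((algorithm, boundary))
    return groups

def reduce_phase(boundaries):
    """Reduce: within one tile, join algorithm A results against algorithm
    B results; MBR filtering here stands in for the R*-tree index pass,
    and a real engine would refine candidates with exact geometry tests."""
    a_res = [b for alg, b in boundaries if alg == "A"]
    b_res = [b for alg, b in boundaries if alg == "B"]
    return [(pa, pb) for pa in a_res for pb in b_res
            if mbrs_intersect(mbr(pa), mbr(pb))]

records = [
    ("img1", 0, "A", [(0, 0), (4, 0), (4, 4)]),
    ("img1", 0, "B", [(2, 2), (6, 2), (6, 6)]),
    ("img1", 1, "B", [(50, 50), (60, 50), (60, 60)]),
]
matches = {tile: reduce_phase(bs) for tile, bs in map_phase(records).items()}
print(len(matches[("img1", 0)]))  # → 1
```

Because tiles are independent, Hadoop can run one reducer per tile and scale the join across the cluster.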
[Figure 2 diagram: (a) HDFS data staging – per-tile boundary files produced by different algorithms are merged into records of the form (imageID, tileID, boundary) and copied to HDFS; (b) MapReduce based queries – data scans feed the map phase, and one reducer per tile executes the spatial join.]
Figure 2. MapReduce based query processing for result comparison.
In addition to spatial query support, semantic query support is also needed, because annotations
on objects may draw from domain ontologies (e.g., cell ontology to describe different cell types,
genome ontology to represent genomic characteristics), creating a semantically rich environment.
An important aspect of semantic information systems is the fact that additional assertions (i.e.,
annotations and classifications) can be inferred from initial assertions (also called explicit
assertions) based on the ontology and the semantics of the ontology language. This facilitates a
more comprehensive mechanism for exploration of experiment results in the context of domain
knowledge. Query execution and on-the-fly computation of assertions may take too long on a
single processor machine. Pre-computation of inferred assertions, also referred to as the
materialization process, can reduce the execution times of subsequent queries. Combined use of
semantic stores[74-76] and rule engines[77] can offer a repository system capable of evaluating
spatial predicates and rules. In such a system, the rule engine and the semantic store/inference
engine interact to compute inferred assertions based on the ontology in the system, the set of
rules, and the initial set of explicit assertions (annotations). Rules that utilize spatial relationships
might generate new instances of ontological concepts based on the evaluation of the rules.
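A toy forward-chaining materialization over the astrocytoma example reads as follows; the fact encoding and the single subclass rule are illustrative, not an actual semantic store or rule engine API:

```python
# Explicit assertions plus a subclass hierarchy produce inferred
# assertions. Concept names follow the astrocytoma example in the text.
subclass_of = {"gliosarcoma": "astrocytoma",
               "giant cell glioblastoma": "astrocytoma"}

def materialize(facts):
    """Close a set of (subject, class) facts under the subclass axiom:
    type(x, C) and C subClassOf D imply type(x, D)."""
    inferred = set(facts)
    changed = True
    while changed:           # iterate to a fixed point
        changed = False
        for subj, cls in list(inferred):
            parent = subclass_of.get(cls)
            if parent and (subj, parent) not in inferred:
                inferred.add((subj, parent))
                changed = True
    return inferred

explicit = {("nucleus42", "gliosarcoma")}
closure = materialize(explicit)
print(("nucleus42", "astrocytoma") in closure)  # → True
```

After materialization, a query for objects annotated as astrocytoma matches nucleus42 directly, without inference at query time; this is the computation whose cost motivates the parallelization strategies discussed next.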
Execution strategies leveraging high-performance parallel and distributed machines can reduce
execution times and speed up the materialization process for very large datasets[78, 79]. One
possible strategy is to employ data parallelism by partitioning the space in which the spatial
objects are embedded. Another parallelization strategy is to partition the ontology axioms and
rules, distributing the computation of axioms and rules to processors. This partitioning would
enable processors to evaluate different axioms and rules in parallel. Inter-processor
communication might be necessary to ensure correctness. This parallelization strategy attempts
to leverage axiom-level parallelism and will benefit applications where the ontology contains
many axioms with few dependencies. A third possible strategy is to combine the first two
strategies with task-parallelism. In this strategy, N copies of the semantic store engine and M
copies of the rule engine are instantiated on the parallel machine. The system coordinates the
exchange of information and the partitioning of workload between the semantic store engine
instances and the rule engine instances. The numbers N and M will depend on the cost of the
inference execution as well as the partitioning of the workload based on spatial domain and/or
ontology axioms.
Discussion and Conclusions
High-resolution, high-throughput instruments are being employed routinely not only in medical
science, but also in health care delivery settings at an accelerating rate. As this decade
progresses, significant advances in medical information technologies will transform very large
volumes of multi-scale, multi-dimensional data into actionable information to drive the
discovery, development, and delivery of new mechanisms of preventing, diagnosing, and healing
complex disease. Data produced by advances in digitization and image analysis are outpacing the
storage and computation capacities of workstations and small cluster systems. The “big data”
from image analysis has performance and scalability requirements similar to those of enterprise
healthcare data, but presents unique challenges. In the future, even medium scale hospitals and
research projects will require capabilities to manage thousands of high-resolution images,
execute and manage interrelated analysis pipelines, and query trillions of microscopic objects
and their features. These applications demand fast loading and query response, as well as
declarative query interfaces for high usability.
Computational systems with multiple levels of computing and memory hierarchies, such
as high performance computing systems with multi-core CPUs and multiple GPUs, backed by tiered
storage of spinning drives and SSDs in RAID configurations, are
becoming mainstream configurations, replacing more traditional homogeneous computing
clusters. These systems offer tremendous computing power and low-latency and high-throughput
I/O capabilities. Many challenges, however, remain for the effective use of these new
technologies. Novel storage, indexing, data staging, and scheduling techniques and middleware
support are needed to manage storage hierarchies in tandem with executing computations on
heterogeneous systems of CPU-GPU nodes. There have been substantial advances in network
switches and networking protocols for intra-cluster communications. Technologies such as
Infiniband provide low-latency, high-bandwidth communication substrates. However, progress
in wide-area networking has been relatively slow. While multi-Gigabit networks are becoming
more widely deployed within institutions, access to remote resources is still hindered by slow,
high-latency networks. Efficient compression, progressive data transmission, and intelligent data
caching and computation reuse methods will continue to play critical roles in enabling digital
pathology and scientific collaborations involving large pathology image datasets.
Acknowledgments. This work was supported in part by SAIC/NCI Contract No. HHSN261200800001E and N01-CO-12400 from the National Cancer Institute, R24HL085343 from the National Heart Lung and Blood Institute, by Grants 1R01LM011119-01 and R01LM009239 from the National Library of Medicine, RC4MD005964 from the National Institutes of Health, PHS Grant UL1RR025008 from the Clinical and Translational Science Awards program, and P20 EB000591 from the Biomedical Information Science and Technology Initiative program.

References
1. Saltz, J., et al., Multi-Scale, Integrative Study of Brain Tumor: In Silico Brain Tumor Research Center. Proceedings of the Annual Symposium of American Medical Informatics Association 2010 Summit on Translational Bioinformatics (AMIA-TBI 2010), San Francisco, CA, 2010.
2. Cooper, L.A., et al., An integrative approach for in silico glioma research. IEEE Trans Biomed Eng, 2010. 57(10): p. 2617-21.
3. Cooper, L.A.D., et al., Morphological Signatures and Genomic Correlates in
Glioblastoma, in International Symposium on Biomedical Engineering, 2011, IEEE: Chicago.
4. Mayall, B.H. and M.L. Mendelsohn, Deoxyribonucleic acid cytophotometry of stained human leukocytes. II. The mechanical scanner of CYDAC, the theory of scanning photometry and the magnitude of residual errors. J Histochem Cytochem, 1970. 18(6): p. 383-407.
5. Eide, T.J., I. Nordrum, and H. Stalsberg, The validity of frozen section diagnosis based on video-microscopy. Zentralbl Pathol, 1992. 138(6): p. 405-7.
6. Eide, T.J. and I. Nordrum, Frozen section service via the telenetwork in northern Norway. Zentralbl Pathol, 1992. 138(6): p. 409-12.
7. Kaplan, K.J., et al., Use of robotic telepathology for frozen-section diagnosis: a retrospective trial of a telepathology system for intraoperative consultation. Mod Pathol, 2002. 15(11): p. 1197-204.
8. Nordrum, I., et al., Remote frozen section service: a telepathology project in northern Norway. Hum Pathol, 1991. 22(6): p. 514-8.
9. Fey, E.G. and S. Penman, The morphological oncogenic signature. Reorganization of epithelial cytoarchitecture and metabolic regulation by tumor promoters and by transformation. Dev Biol (N Y 1985), 1986. 3: p. 81-100.
10. Weinstein, R.S., K.J. Bloom, and L.S. Rozek, Telepathology and the networking of pathology diagnostic services. Arch Pathol Lab Med, 1987. 111(7): p. 646-52.
11. Weinstein, R.S., et al., Overview of telepathology, virtual microscopy, and whole slide imaging: prospects for the future. Hum Pathol, 2009. 40(8): p. 1057-69.
12. Williams, S., et al., Telepathology for patient care: what am I getting myself into? Adv Anat Pathol, 2010. 17(2): p. 130-49.
13. Rojo, M.G., et al., Critical comparison of 31 commercially available digital slide systems in pathology. Int J Surg Pathol, 2006. 14(4): p. 285-305.
14. Wilbur, D.C., et al., Whole-slide imaging digital pathology as a platform for teleconsultation: a pilot study using paired subspecialist correlations. Arch Pathol Lab Med, 2009. 133(12): p. 1949-53.
15. Gilbertson, J.R., et al., Primary histologic diagnosis using automated whole slide imaging: a validation study. BMC Clin Pathol, 2006. 6: p. 4.
16. Afework, A., et al., Digital dynamic telepathology--the Virtual Microscope. Proc AMIA Symp, 1998: p. 912-6.
17. Catalyurek, U., et al., The Virtual Microscope. IEEE Transactions on Information Technology in Biomedicine, 2003. 7(4): p. 230-248.
18. Ferreira, R., et al., The Virtual Microscope. Proc AMIA Annu Fall Symp, 1997: p. 449-53.
19. Balis, U.J., Telemedicine and telepathology. Clin Lab Med, 1997. 17(2): p. 245-61.
20. Dziegielewski, M., G.M. Velan, and R.K. Kumar, Teaching pathology using 'hotspotted' digital images. Med Educ, 2003. 37(11): p. 1047-8.
21. Farah, C.S. and T. Maybury, Implementing digital technology to enhance student learning of pathology. Eur J Dent Educ, 2009. 13(3): p. 172-8.
22. Furness, P.N., The use of digital images in pathology. J Pathol, 1997. 183(3): p. 253-63.
23. Guzman, M. and A.R. Judkins, Digital pathology: a tool for 21st century neuropathology. Brain Pathol, 2009. 19(2): p. 305-16.
24. Leong, F.J. and A.S. Leong, Digital imaging in pathology: theoretical and practical considerations, and applications. Pathology, 2004. 36(3): p. 234-41.
25. Marchevsky, A.M., et al., Storage and distribution of pathology digital images using integrated web-based viewing systems. Arch Pathol Lab Med, 2002. 126(5): p. 533-9.
26. Saltz, J.H., Digital pathology--the big picture. Hum Pathol, 2000. 31(7): p. 779-80.
27. Yang, L., et al., Virtual microscopy and grid-enabled decision support for large-scale analysis of imaged pathology specimens. IEEE Trans Inf Technol Biomed, 2009. 13(4): p. 636-44.
28. Zheng, L., et al., Design and analysis of a content-based pathology image retrieval system. IEEE Trans Inf Technol Biomed, 2003. 7(4): p. 249-55.
29. Hadida-Hassan, M., et al., Web-based telemicroscopy. J Struct Biol, 1999. 125(2-3): p. 235-45.
30. Gurcan, M.N., et al., Histopathological Image Analysis: A Review. IEEE Rev Biomed Eng, 2009. 2: p. 147-171.
31. Gurcan, M.N., et al., Computerized pathological image analysis for neuroblastoma prognosis. AMIA Annu Symp Proc, 2007: p. 304-8.
32. Yang, L., et al., High throughput analysis of breast cancer specimens on the grid. Med Image Comput Comput Assist Interv Int Conf Med Image Comput Comput Assist Interv, 2007. 10(Pt 1): p. 617-25.
33. Kumar, V., et al., An Integrated Framework for Parameter-based Optimization of Scientific Workflows (accepted for publication), in The 18th International Symposium on High Performance and Distributed Computing (HPDC 2009), 2009: Germany.
34. Kumar, V., et al., Large-scale biomedical image analysis in grid environments. IEEE Transactions on Information Technology in Biomedicine, 2008. 12(2): p. 154-161.
35. Beynon, M., et al., Distributed Processing of Very Large Datasets with DataCutter. Parallel Computing, 2001. 27(11): p. 1457-1478.
36. Meilander, D., et al., Parallel medical image reconstruction: from graphics processing units (GPU) to Grids. Journal of Supercomputing, 2011. 57(2): p. 151-160.
37. Ying, Z.G., et al., Parallel fuzzy connected image segmentation on GPU. Medical Physics, 2011. 38(7): p. 4365-4371.
38. Huang, M.C., F. Liu, and E.H. Wu, A GPU-based matting Laplacian solver for high resolution image matting. Visual Computer, 2010. 26(6-8): p. 943-950.
39. Shams, R., et al., A Survey of Medical Image Registration on Multicore and the GPU. IEEE Signal Processing Magazine, 2010. 27(2): p. 50-60.
40. Abramov, A., et al., Real-Time Image Segmentation on a GPU. Facing the Multicore-Challenge: Aspects of New Paradigms and Technologies in Parallel Computing, 2010. 6310: p. 131-142.
41. Singhal, N., I.K. Park, and S. Cho, Implementation and Optimization of Image Processing Algorithms on Handheld GPU. 2010 IEEE International Conference on Image Processing, 2010: p. 4481-4484.
42. Zhang, N., J.L. Wang, and Y.S. Chen, Image Parallel Processing Based on GPU. 2nd IEEE International Conference on Advanced Computer Control (ICACC 2010), Vol. 3, 2010: p. 367-370.
43. Herout, A., et al., GP-GPU Implementation of the "Local Rank Differences" Image Feature. Computer Vision and Graphics, 2009. 5337: p. 380-390.
44. Allusse, Y., et al., GpuCV: A GPU-Accelerated Framework for Image Processing and Computer Vision. Advances in Visual Computing, Pt II, Proceedings, 2008. 5359: p. 430-439.
45. Xu, Z.P. and W.B. Xu, GPU in texture image processing. DCABES 2006 Proceedings, Vols 1 and 2, 2006: p. 380-383.
46. Schmeisser, M., et al., Parallel, distributed and GPU computing technologies in single-particle electron microscopy. Acta Crystallographica Section D-Biological Crystallography, 2009. 65: p. 659-671.
47. Crookes, D., et al., GPU Implementation of MAP-MRF for Microscopy Imagery Segmentation. 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, Vols 1 and 2, 2009: p. 526-529.
48. Tan, G.M., et al., Single-particle 3D Reconstruction from Cryo-Electron Microscopy Images on GPU. ICS'09: Proceedings of the 2009 ACM SIGARCH International Conference on Supercomputing, 2009: p. 380-389.
49. Hartley, T.D.R., et al., Investigating the Use of GPU-Accelerated Nodes for SAR Image Formation. 2009 IEEE International Conference on Cluster Computing and Workshops, 2009: p. 663-670.
50. Ruiz, A., et al., Pathological image analysis using the GPU: Stroma classification for neuroblastoma. 2007 IEEE International Conference on Bioinformatics and Biomedicine, Proceedings, 2007: p. 78-85.
51. Teodoro, G., et al., Coordinating the Use of GPU and CPU for Improving Performance of Compute Intensive Applications. 2009 IEEE International Conference on Cluster Computing and Workshops, 2009: p. 437-446.
52. Coutinho, B.R., et al. Profiling general purpose gpu applications. in Computer Architecture and High Performance Computing, 2009. SBAC-PAD'09. 21st International Symposium on. 2009. IEEE.
53. Saltz, J.H., et al., Feature-based analysis of large-scale spatio-temporal sensor data on hybrid architectures. International Journal of High Performance Computing Applications, 2013. 27(3): p. 263-272.
54. Teodoro, G., Efficient Execution of Dataflows on Parallel and Heterogeneous Environments. Distributed Computing Innovations for Business, Engineering, and Science, 2012: p. 1.
55. Teodoro, G., et al. Run-time optimizations for replicated dataflows on heterogeneous environments. in Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing. 2010. ACM.
56. Teodoro, G., et al., Optimizing dataflow applications on heterogeneous environments. Cluster Computing, 2012. 15(2): p. 125-144.
57. Teodoro, G., et al. Accelerating large scale image analyses on parallel, CPU-GPU equipped systems. in Parallel & Distributed Processing Symposium (IPDPS), 2012 IEEE 26th International. 2012. IEEE.
58. Teodoro, G., et al., A Fast Parallel Implementation of Queue-based Morphological Reconstruction using GPUs. Technical Report, CCI-TR-2012-2, Center for Comprehensive Informatics, Emory University, 2012.
59. Teodoro, G., et al., High-throughput execution of hierarchical analysis pipelines on hybrid cluster platforms. arXiv preprint arXiv:1209.3332, 2012.
60. Çatalyürek, Ü.V., et al., Data Flow Frameworks for Emerging Heterogeneous Architectures and Their Application to Biomedicine. 2010.
61. Goldberg, I.G., et al., The Open Microscopy Environment (OME) Data Model and XML file: open tools for informatics and quantitative analysis in biological imaging. Genome Biol, 2005. 6(5): p. R47.
62. Foran, D.J., et al., ImageMiner: a software system for comparative analysis of tissue microarrays using content-based image retrieval, high-performance computing, and grid technology. Journal of the American Medical Informatics Association: JAMIA, 2011. 18(4): p. 403-15.
63. Wang, F., et al., A data model and database for high-resolution pathology analytical image informatics. Journal of Pathology Informatics, 2011. 2: p. 32.
64. Martone, M.E., et al., A cell-centered database for electron tomographic data. J Struct Biol, 2002. 138(1-2): p. 145-55.
65. Martone, M.E., et al., The cell-centered database: a database for multiscale structural and protein localization data from light and electron microscopy. Neuroinformatics, 2003. 1(4): p. 379-95.
66. DICOM. Digital Imaging and Communications in Medicine. 2011 [cited 2011 May]; Available from: http://medical.nema.org/.
67. Kumar, V., et al., Architectural Implications for Spatial Object Association Algorithms. the 23rd IEEE International Parallel and Distributed Processing Symposium (IPDPS 09), Rome, Italy, 2009.
68. Gray, J., M. Nieto-Santisteban, and A. Szalay, The zones algorithm for finding points-near-a-point or cross-matching spatial datasets. The ACM Computing Research Repository (CoRR), abs/cs/0701171, 2007.
69. Becla, J., et al., Organizing the extremely large LSST database for real-time astronomical processing. 17th Annual Astronomical Data Analysis Software and Systems Conference (ADASS 2007), London, England., 2007.
70. Dean, J. and S. Ghemawat, MapReduce: Simplified data processing on large clusters. USENIX Association Proceedings of the Sixth Symposium on Operating Systems Design and Implementation (OSDI '04), 2004: p. 137-149.
71. Wang, F., et al., Hadoop-GIS: A High Performance Query System for Analytical Medical Imaging with MapReduce. Technical Report, CCI-TR-2011-3, Center for Comprehensive Informatics, Emory University, 2011.
72. Shvachko, K., et al., The Hadoop Distributed File System. 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), 2010.
73. Thusoo, A., et al., Hive - A Petabyte Scale Data Warehouse Using Hadoop. 26th International Conference on Data Engineering (ICDE 2010), 2010: p. 996-1005.
74. Wilkinson, K., et al., Efficient RDF storage and retrieval in Jena2. Proceedings of VLDB Workshop on Semantic Web and Databases, 2003: p. 131-150.
75. Broekstra, J., A. Kampman, and F. van Harmelen, Sesame: A generic architecture for storing and querying RDF and RDF schema. International Semantic Web Conference, Lecture Notes in Computer Science, 2002(2342): p. 54-68.
76. Kiryakov, A., D. Ognyanov, and D. Manov, OWLIM - A pragmatic semantic repository for OWL. WISE Workshops, volume 3807 of Lecture Notes in Computer Science, 2005: p. 182-192.
77. Hill, E.F., Jess in Action: Java Rule-Based Systems. 2003: Manning Publications Co., Greenwich, CT, USA.
78. Narayanan, S., et al., Parallel Materialization of Large ABoxes. the 24th Annual ACM