ISYA 28th 2005 Astronomical Databases & VOs
Astronomical Databases and Virtual Observatories
Surfing the Tsunami of Data
28th International School for Young Astronomers
Instituto Nacional de Astronomía, Óptica y Electrónica
Tonantzintla, México
Rodolfo H. Barbá
Universidad de La Serena Chile
The New Astronomy
Astronomy has become immensely data-rich
Data sets are measured in terabytes (10^12 bytes) and soon in petabytes (10^15 bytes)
The sky is being systematically surveyed at many wavelengths, with billions of stars, galaxies, quasars, planets and other unexpected monsters detected and measured with an unprecedented level of detail.
These massive data sets are a new empirical foundation for the astronomy of the 21st century, hopefully leading to a new golden era of discoveries (Djorgovski 2002).
This data revolution is based on the great progress in technology, including:
digital imaging (the realm of astronomy)
processing
storing
accessing the information
The New Astronomy
Most scientific measurements are generated in digital form, and most instruments contain digital imaging arrays.
Such devices are based on the same technology that is governed by Moore's law (exponential growth): the number of bits in astronomy doubles every 1 to 2 years, while telescope technology develops more slowly (Szalay & Gray 2001).
Figure: Total area of 3 m+ telescopes in the world (in m^2), and the total number of CCD pixels available to astronomers (in Mpx), as a function of time. Growth over the last 25 years is a factor of 30 in glass and a factor of 3000 in pixels.
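The growth factors quoted above can be turned into doubling times. The following is a minimal sketch, assuming smooth exponential growth over the whole 25-year period (an idealisation; the function name is only illustrative).

```python
import math

def doubling_time(growth_factor, years):
    """Years needed to double, given the total growth factor over a period."""
    return years * math.log(2) / math.log(growth_factor)

glass = doubling_time(30, 25)     # telescope collecting area
pixels = doubling_time(3000, 25)  # CCD pixels

print(f"telescope area doubles every {glass:.1f} years")
print(f"CCD pixel count doubles every {pixels:.1f} years")
```

Under this assumption the pixel count doubles roughly every two years, while the glass takes about five years to double, which is the contrast the figure is meant to show.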
The New Astronomy
[Figure labels: 15 years ago; microprocessor; tube, transistor; mechanical relay]
At the moment, the increase in CPU performance and the falling prices of hard disks compensate for the exponential growth in astronomical data.
Moore's Law
Performance/price of CPUs doubles every 18 months, 100x per decade
The progress in the next 18 months is equivalent to ALL previous progress
New storage = sum of ALL old storage (ever)
New processing = sum of all old processing
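Both statements can be checked with a few lines of arithmetic. This is a sketch assuming an exact 18-month doubling and counting progress in discrete 18-month periods; the choice of ten periods is arbitrary.

```python
def growth(months, doubling_months=18):
    """Growth factor after `months` for a fixed doubling time."""
    return 2 ** (months / doubling_months)

# One decade of 18-month doublings.
decade = growth(120)
print(f"growth per decade: {decade:.0f}x")

# If each 18-month period adds twice as much as the previous one,
# the next period adds as much as all earlier periods together (plus one unit).
previous = sum(2 ** k for k in range(10))  # periods 0..9
new = 2 ** 10                              # period 10
print(previous, new)
```

The second result is just the geometric series 1 + 2 + ... + 2^(n-1) = 2^n - 1, which is why "new storage = sum of all old storage" holds for any doubling process.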
The New Astronomy
Technology issues (Gray 2000)
Good news
In the limit, the processing & storage & network is free
Processing & network is infinitely fast
Bad news
Most of us live in the present
People are getting more expensive
Management/programming cost exceeds hardware cost
Speed of light not improving
Wide Area Network prices have not changed so much in ten years
The New Astronomy
How are research modes changing?
It is now common for (particularly younger) astronomers to investigate the existence of relevant archival data at the beginning of a project and to evaluate the role that those data might play in their research.
Archives could provide:
all of the data (with an obvious cost advantage), part of the data (complemented by new observations), or valuable feasibility tests, or finder charts.
During their careers, younger astronomers will consume far more data via data archives than their older counterparts.
The New Astronomy
There will also be a boom in multiwavelength astronomy as the archives are populated with well-calibrated datasets.
The tyranny of the energy band that restricts astronomers to working in their ``native wavelength regime'' will be broken as the requirement for a deep technical understanding of every instrument that one uses is alleviated.
This is all part of a gradual shift, induced by space missions, queue-mode observing, and data archiving, of the technical burden away from the individual scientist, freeing that scientist to focus on physics rather than technical considerations.
In summary, we see a quiet revolution occurring among graduate students (YOU!) and young astronomers who are not tied to conventional proposal/observing/reduction/analysis routines but are eager to operate in whatever mode is the most scientifically effective.
This will result in the best possible science.
The information
Document (old name: “file”):
FITS, HTML, TeX, Image, PPT, PDF, EPS
Database Table:
Tab-separated, comma-separated, Excel, OpenOffice, FITS
Group of documents: Tar/zip
Web pages: XML, XDF, XSIL, HTML
XML: Astrores, XSIL table format
we do
• Data Request
• Object Catalog
• Program/Filter/Query
• Exception/diagnostic report
• Service Capability
• Service catalog
And for each one
• Semantic definition
• C++ and Java libraries
• Browser/editor
• Relationships/Translations
• Documentation
From files to services: virtual data
Most web pages are dynamic
e.g. search using Google or Yahoo
This is what I want to do:
“Convert these positions to images of the sky”
“Give me the result of this SQL query”
“Give me the next block from the archive”
“Connect these services and do crossmatch”
Needs:
Document representations (FITS, Table, HTML)
Request formulations (Keyword/value, XML)
Request for capabilities
Session services
Exceptions and diagnostics
Publication and discovery
Authentication
Example of service
Client interface
Request: Keyword/value
name=M51
x=10
y=10
sky_survey=DSS2_red
mime_type=download_gif
Server: http://archive.eso.org/dss/dss/image
Response: image/gif (dynamically generated)
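The keyword/value request of this example can be assembled in a few lines. This is a sketch only: the URL and the parameter names are those shown on the slide, the helper name `build_dss_request` is made up for illustration, and whether the server still accepts exactly this request today is an assumption. No network call is made here.

```python
from urllib.parse import urlencode

# Service address as given in the example above.
BASE_URL = "http://archive.eso.org/dss/dss/image"

def build_dss_request(name, x, y, sky_survey="DSS2_red", mime_type="download_gif"):
    """Build the keyword/value request URL (hypothetical helper)."""
    params = {
        "name": name,
        "x": x,
        "y": y,
        "sky_survey": sky_survey,
        "mime_type": mime_type,
    }
    return BASE_URL + "?" + urlencode(params)

url = build_dss_request("M51", 10, 10)
print(url)
```

The point of the example is that the "document" (a GIF image) does not exist as a file anywhere: it is generated by the server when the request arrives.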
XML and VOTable
XML, the eXtensible Markup Language, released in February 1998 by the W3 Consortium, was designed to permit the exchange of formatted information over the World Wide Web by providing a standardized framework for the description of both data and metadata.
XML has since been used in the scientific community for both:
the management of information documents such as publications, for instance to ease the process of handling them in different formats;
the modelling and exchange of scientific data.
XML is certainly a key tool for metadata management in the future. CDS has led the Astrores consortium, which defined an XML standard for tabular data in 1999 (Ochsenbein et al. 2000b). ALADIN is fully Astrores/XML compatible, and VizieR results can be retrieved in XML format (among many others) to be interpreted and used by any other service.
XML and VOTable: structured info
<From>Antonio Stradivarius</From>
<To>Domenico Scarlatti</To>
<Date>
<Day>13</Day>
<Month>4</Month>
<Year>1723</Year>
</Date>
<Body>Io bisogno una appartamento acoglienti a Cremona …</Body>
The same date can be presented in many ways:
13/4/23
April 13, 1723
17.iv.1723
Separation of structure from presentation
The computer can read the document: “Find all memos from April 1723”
XML is transforming the WWW to the WWDatabase
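The query “Find all memos from April 1723” can be sketched with the Python standard library. The wrapping `<Memos>` and `<Memo>` elements and the second memo are hypothetical additions made only so that the fragment is a complete document with something to filter out; the memo bodies are left elided.

```python
import xml.etree.ElementTree as ET

# Sample data: the first memo follows the slide; the second is invented
# for illustration, as are the <Memos>/<Memo> wrapper elements.
MEMOS = """
<Memos>
  <Memo>
    <From>Antonio Stradivarius</From>
    <To>Domenico Scarlatti</To>
    <Date><Day>13</Day><Month>4</Month><Year>1723</Year></Date>
    <Body>...</Body>
  </Memo>
  <Memo>
    <From>Domenico Scarlatti</From>
    <To>Antonio Stradivarius</To>
    <Date><Day>2</Day><Month>6</Month><Year>1723</Year></Date>
    <Body>...</Body>
  </Memo>
</Memos>
"""

def memos_from(root, month, year):
    """Return the <Memo> elements whose structured date matches month/year."""
    hits = []
    for memo in root.iter("Memo"):
        date = memo.find("Date")
        if int(date.findtext("Month")) == month and int(date.findtext("Year")) == year:
            hits.append(memo)
    return hits

root = ET.fromstring(MEMOS.strip())
april = memos_from(root, 4, 1723)
senders = [memo.findtext("From") for memo in april]
print(senders)
```

Because day, month and year are separate elements, the query does not depend on how the date happens to be printed.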
XML and VOTable: structured info
• Documents and data
• Human readable, editable, mailable
• Can encode many data models
• Can encode programs too
• Many tools
Parsers in Java, C, C++, Perl, Python, ...
Browsers and editors
XML databases
Style sheets, formatting, transformation
• For serialization, mediation, brokers
Topic maps
XML allows excellent structuring of information; however, XML by itself is not good enough for the intelligent maneuvering necessary in a large database. Topic maps are an ideal solution to this problem: one can define complex knowledge structures and attach them as metadata to information resources, which allows one to organize knowledge on a variety of subjects systematically so that retrieval and sharing with other users are easy.
A topic map is a collection of topics linked together by associations between the topics.
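As a rough illustration of the idea, a topic map can be held in two dictionaries: one for associations between topics and one for the resources attached to each topic. This is a toy sketch, not an implementation of the Topic Maps standard; the class, the topic names and the resource names are all hypothetical.

```python
from collections import defaultdict

class TopicMap:
    """Toy topic map: topics, associations between them, attached resources."""

    def __init__(self):
        self.associations = defaultdict(set)  # topic -> {(relation, other topic)}
        self.occurrences = defaultdict(set)   # topic -> {resource identifiers}

    def associate(self, topic_a, relation, topic_b):
        # Store the link in both directions so it can be followed from either topic.
        self.associations[topic_a].add((relation, topic_b))
        self.associations[topic_b].add((relation, topic_a))

    def attach(self, topic, resource):
        self.occurrences[topic].add(resource)

    def related_resources(self, topic):
        """Resources attached to a topic or to any directly associated topic."""
        found = set(self.occurrences[topic])
        for _, other in self.associations[topic]:
            found |= self.occurrences[other]
        return found

tm = TopicMap()
tm.associate("M51", "is a", "spiral galaxy")
tm.attach("M51", "dss_image_m51.gif")
tm.attach("spiral galaxy", "galaxy_catalog.xml")
resources = tm.related_resources("M51")
print(sorted(resources))
```

A search for "M51" then also reaches resources filed under "spiral galaxy", which is the kind of navigation that plain XML structure does not provide by itself.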
XML consensus
XML is a comfortable vehicle for our thoughts and metadata but the real challenge is:
Astronomical information is very heterogeneous
We need to define VO-specific data objects and how they are used
We need consensus more than either software or hardware (and people are working on it...)
New analysis tools
each topic is a lecture...
new statistical approaches
multiscale approaches
neural networks (self-organizing maps)
wavelet decomposition
many etc...
Automatic extraction and classification of multi-object spectroscopy (Pirzkal et al. 2001).
The new software aXe was recently used to analyze new HST/ACS grism observations of the HDF-N. Using this method, 1700 objects were extracted in a matter of minutes, and each extracted spectrum could then be quickly examined. This resulted in two supernovae being discovered within a matter of hours of the data first being archived! (Pirzkal & Kerber 2002).
Organising the mesh
● Global astrophysics data resources loosely connected by the Internet
– Observational data archives or repositories
– Derived data products (astronomical catalogs, browse images, video)
– Data analysis packages
– Visualization/presentation packages
– Special services (bibliography, discipline-specific knowledge bases, directories)
● Requires both Vertical and Horizontal Integration
Organising the mesh
Path to the Future
● Current (connections via hyperlinks): one to one
● Near Future (connections to multiple DBs all at once, via middleware): one to many
● Long Term (multiple interconnectivity, federated databases): many to many
– Distributed Autonomous data centers
– Intelligent Agents
– User-defined Profiles and Preferences
– Access via Multiple Interfaces
[Figure: diagram with users, middleware, and resources]
Organising the mesh
Huge surveys on the way
bottleneck: STORAGE
searching/manipulating databases too slow
bottlenecks: I/O and SOFTWARE
want a global archive query system
bottlenecks: NETWORK and SOCIOLOGY and SOFTWARE
massive simulations
bottlenecks: RAM and FLOPs
real time operations
bottlenecks: I/O and FLOPs
Organising the mesh
storage: needs and technology keeping pace
invest in hardware
searching: I/O speed won’t change much
need new database techniques, parallel machines
comms: needs growing faster than network
need remote analysis services at expert data centres
shift the results not the data
hard to keep up with growing list of archives
need global standards and middleware
data, metadata, annotation, exchange protocols...
What is data mining
● Data mining is defined as “an information extraction activity whose goal is to discover hidden facts contained in databases.”
● Data mining is used to find patterns and relationships in data by using sophisticated techniques to build models: abstract representations of reality.
● There are two main types of data mining models:
✔ Descriptive: describe patterns and create meaningful subgroups or clusters.
✔ Predictive: forecast explicit values, based upon patterns in known results.
● How does this apply to scientific research? ...
through KNOWLEDGE DISCOVERY
What is knowledge discovery
● Knowledge discovery refers to “finding out new knowledge about an application domain using data on the domain usually stored in a database.” (The “application domain” may be astrophysics, solar system science, earth science, credit card usage histories, etc.)
● In large scientific databases, data mining and knowledge discovery come in two flavors:
Event-based mining
Relationship-based mining
Event-based mining for science
Event-based mining is based upon events or trends in data.
● Known events / known algorithms use existing physical models (descriptive models) to locate known phenomena of interest either spatially or temporally within a large database.
● Known events / unknown algorithms use pattern recognition and clustering properties of data to discover new observational (in our case, astrophysical) relationships among known phenomena.
● Unknown events / known algorithms use expected physical relationships (predictive models) among observational parameters of astrophysical phenomena to predict the presence of previously unseen events within a large complex database.
● Unknown events / unknown algorithms use thresholds or trends to identify transient or otherwise unique ("one-of-a-kind") events and therefore to discover new phenomena.
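The last case (thresholds used to pick out one-of-a-kind events) can be sketched as a robust outlier test on a light curve. The magnitudes below are invented sample values, and the 3-sigma cut and the use of the median absolute deviation are illustrative choices, not a method prescribed by the lecture.

```python
import statistics

def flag_transients(magnitudes, n_sigma=3.0):
    """Indices of points deviating from the median by more than n_sigma.

    The scatter is estimated from the median absolute deviation (MAD),
    scaled by 1.4826 so that it approximates a Gaussian standard deviation.
    """
    median = statistics.median(magnitudes)
    mad = statistics.median([abs(m - median) for m in magnitudes])
    sigma = 1.4826 * mad
    if sigma == 0:
        return []
    return [i for i, m in enumerate(magnitudes) if abs(m - median) > n_sigma * sigma]

# Hypothetical light curve: a constant star with one sudden brightening.
light_curve = [15.02, 14.98, 15.01, 15.00, 14.99, 13.20, 15.03, 14.97]
flagged = flag_transients(light_curve)
print(flagged)
```

Median-based statistics are used so that the outlier itself does not inflate the scatter estimate and hide the event.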
Relationship-based mining for science
Relationship-based mining is based on associations.
● Spatial associations identify events (astronomical objects) at the same location in the sky.
● Temporal associations identify events occurring during the same or related periods of time.
● Coincidence associations use clustering techniques to identify events that are co-located within a multidimensional parameter space.
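A spatial association is, in its simplest form, a positional match between two source lists. The sketch below uses a brute-force search with the haversine formula; the catalogue entries, the source names and the 1 arcsec matching radius are all hypothetical, and a real archive would use a spatial index instead of a double loop.

```python
import math

def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds (inputs in degrees, haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600.0

def cross_match(cat_a, cat_b, radius_arcsec=1.0):
    """For each source in cat_a, keep the closest cat_b source within the radius."""
    matches = []
    for name_a, ra_a, dec_a in cat_a:
        best = None
        for name_b, ra_b, dec_b in cat_b:
            sep = angular_separation(ra_a, dec_a, ra_b, dec_b)
            if sep <= radius_arcsec and (best is None or sep < best[1]):
                best = (name_b, sep)
        if best is not None:
            matches.append((name_a, best[0], best[1]))
    return matches

# Hypothetical catalogues: (name, RA in degrees, Dec in degrees).
optical = [("opt-1", 150.0000, 2.0000), ("opt-2", 150.0100, 2.0100)]
infrared = [("ir-1", 150.0001, 2.0001), ("ir-2", 151.0000, 3.0000)]

pairs = cross_match(optical, infrared)
for name_a, name_b, sep in pairs:
    print(f"{name_a} <-> {name_b}: {sep:.2f} arcsec")
```

The cost of the double loop grows as the product of the two catalogue sizes, which is one reason why searching is listed among the bottlenecks for large surveys.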
Science requirements for data mining
● Cross-Identification refers to the classical problem of connecting the source list in one catalog (or observational database) to the source list in another, in order to derive new astrophysical understanding of the cross-identified objects.
● Cross-Correlation refers to the application of “what if” scenarios to the full suite of parameters in a database (e.g. identify distant galaxies as U-band dropouts in a color-color scatter plot from the HDF survey).
● Nearest-Neighbor Identification refers to the general application of clustering algorithms in multidimensional parameter space, usually within a database (e.g. finding the closest known population of young stars in the TW Hya association through their similar kinematics, X-ray emission, H-alpha, and Li abundance).
● Systematic Data Exploration refers to the application of the broad range of event-based and relationship-based queries to a database in the hope of making a serendipitous discovery of new objects or a new class of objects (e.g. finding new types of variable stars, such as bumpers, in the MACHO database).
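Nearest-neighbor identification in a multidimensional parameter space can be sketched as follows. The table of stars is entirely invented (two proper-motion components, an X-ray flux and a Li equivalent width, in arbitrary units), and standardising each column before measuring Euclidean distances is one possible choice of metric among many.

```python
import math

def standardize(rows):
    """Rescale every column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def nearest_neighbors(rows, target_index, k=2):
    """Indices of the k rows closest to the target in standardized space."""
    scaled = standardize(rows)
    target = scaled[target_index]
    dists = [(math.dist(target, row), i)
             for i, row in enumerate(scaled) if i != target_index]
    return [i for _, i in sorted(dists)[:k]]

# Hypothetical sample: [pm_ra, pm_dec, x_ray_flux, li_equivalent_width]
stars = [
    [-66.0, -14.0, 1.00, 0.45],  # reference star
    [-68.0, -12.0, 0.90, 0.50],
    [-64.0, -15.0, 1.10, 0.48],
    [5.0, 3.0, 0.01, 0.02],
    [-2.0, 8.0, 0.02, 0.01],
]
neighbors = nearest_neighbors(stars, 0, k=2)
print(neighbors)
```

Standardising matters because the parameters have different units; without it, the column with the largest numerical range would dominate the distance.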
The virtual observatory
• Data avalanche in astronomy (multi-TB, heading to petabytes)
• Data distributed across institutions, agencies, and countries
• Data in heterogeneous systems (H/W, S/W, DBMS, metadata)
• Data from large areas of the sky at multiple wavelengths
• Data from many(!!) instruments, in complex data structures
• VO provides seamless, transparent, integrated access to data
• VO enables astronomical “observations” and discoveries via remote access to digital representations of the sky
• VO supports a wide range of astronomical explorations
• VO enables discovery via new computational & analysis tools
==> distributed DATA MINING
Function of a VO
• To facilitate data mining and knowledge discovery within the very large astronomical databases (the new “terabases”)
Requires:
– indexing for fast queries, filtering of large queries, data subsetting, visualization, parallelization (queries, access), ...
• To facilitate linkages and cross-archive investigations utilizing new mission data in conjunction with the rich legacy data archives that preceded them
Requires:
– distributed computing, scalable architectures, load balancing, thin middleware layer, interoperability, code libraries, code-shipping, data-finding services, data standards & interchange formats, query/results protocols, data fusion, quality assessment, archive/metadata profiles, user profiles, intelligent agents, …
• To serve a broad community of users (thousands of queries per day): professionals, amateurs, schools, general public
Virtual observatories
The IVOA was formed in June 2002
The mission is to facilitate:
the international coordination and collaboration necessary for the development and deployment of the tools, systems and organizational structures necessary to enable the international utilization of astronomical archives as an integrated and interoperating virtual observatory.
By January 2005, the IVOA had grown to include 15 funded VO projects.
This membership is now being expanded to include representation from projects constructing and planning new observatories and astronomical communities that seek to benefit from the global availability of VO facilities and technologies.
Virtual Observatories
Virtual Observatories and related institutions
AVO The Astrophysical Virtual Observatory (ESO), http://www.eso.org/avo/ or http://www.eurovo.org
NVO National Virtual Observatory (USA), http://www.usvo.org
ASTROGRID UK contribution to a Global Virtual Observatory, http://www.astrogrid.org
OPTICON The European Optical and Infrared Coordination Network, http://www.astroopticon.org/
Supersurveys: SuperCOSMOS
• Digitising UK Schmidt plates
• Entire sky in B,R,I,2nd-R : on-line (www-wfau.roe.ac.uk/sss/)
• H-alpha survey Gal.Plane + LMC, SMC : by 2002-3
• Coordinates, colours, proper motions and morphology for first time, B<22, R<20, I<19
• External errors : posn 0.15” / phot 0.2 mag / proper motion 3-15 mas/yr
• Pixel image and object catalogues in same file
• Catalogues to 10 degrees; search facility on whole sky soon
• Multiple object small pixel images in batch mode
• Final raw data 15TB
• Random patch in seconds but searching whole SGC-cat takes 2 hours…..
Rare object searches
moving object is B=20.6 ...candidate halo WD ... probably one every few degrees (hundreds more in SuperCOSMOS data ..)
Ideas... stellar long term variability
At present, there are about 60 plate archives in the world, with a total of more than 2 million plates accumulated.
500,000 Harvard College Observatory (USA)
270,000 Sonneberg Observatory (Germany)
165,000 Charlottesville (USA)
110,000 Ondrejov (Czech Rep.)
Complete list by Hudec (1999)
Kroll et al. (2002) analysed 300 stars in the Orion/Taurus/Auriga region using a subset of 500 plates obtained between 1960 and 1996 from the Sonneberg Plate Archive. They found that more than 50% of those stars have to be regarded as variable (accuracy of about 0.1 mag); some of them show new types of variability.
Ideas... stellar long term variability
Kroll et al. (2002)
UKIDSS: UK Infrared Deep Sky Survey
• new IR wide-field camera for UKIRT (3.8m)
• http://www.ukidss.org
• 4 x 2K x 0.4” Rockwell arrays at 94% spacing
• Operated by UKIDSS consortium (about 100 astronomers!)
● Five year plan 2005-2010
● 7500 sq.deg. JHK=18.5 IR atlas
● Milky Way atlas to a depth K=19
● Large-area survey of open clusters and star-forming regions to K=18.8
● 35 sq.deg. K=21 Deep Survey
● 0.77 sq.deg. K=23 Ultra Deep Survey
● several passes: variability and proper motion
VISTA: Visible and Infrared Telescope for Astronomy
New 4m wide-field telescope in Chile
owned by an 18-university consortium, under development
1 deg IR field, 9 x 2K Rockwell arrays at 90% spacing, with 0.34”
to be located next to ESO VLT
in operation by 2006; ESO will have 25% of time
by 2010 raw data ~2 PB, science data ~300 TB
simple-minded search would take several months...
Provision for later 2.25 sq.deg. optical camera, 0.25”
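The "several months" estimate for a simple-minded search of the VISTA archive can be reproduced with a back-of-the-envelope calculation. The sustained read rate of 50 MB/s from a single disk is an assumption chosen for illustration; it is not a figure from the lecture.

```python
def scan_time_days(size_bytes, rate_bytes_per_s):
    """Days needed to read a data set once, sequentially, at a fixed rate."""
    return size_bytes / rate_bytes_per_s / 86400

TB = 10 ** 12
PB = 10 ** 15
RATE = 50 * 10 ** 6  # assumed sustained read rate: 50 MB/s (hypothetical)

science = scan_time_days(300 * TB, RATE)
raw = scan_time_days(2 * PB, RATE)

print(f"science data (300 TB): {science:.0f} days")
print(f"raw data (2 PB): {raw:.0f} days")
```

With this assumed rate a single pass over the science data takes a bit more than two months, so any practical search has to rely on indexing or on reading many disks in parallel.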
VISTA: Visible and Infrared Telescope for Astronomy
– 25% open time
– 75% preplanned legacy programmes
● suite of key programmes currently being developed by proposal and debate process within VISTA consortium
– Likely to include
● thousands sq.deg. to K~20
● hundreds sq.deg. to K~21
● few sq.deg. to K~23
● intranight monitoring for solar system bodies and exoplanets
● night to night monitoring for supernovae, microlensing
● long term monitoring for quasars, variable stars
● deep proper motion samples for solar neighbourhood
VISTA: Science drivers
– Evolution of Structure
– High Redshift galaxies
– High Redshift supernovae
– Dark matter mapping : weak lensing
– Quasar populations and variability
– Disk and Halo Brown Dwarfs and White Dwarfs
– Trans Neptunian Objects
– Exoplanet transits
– Support of space facilities: XMM IDs
VISTA surveys
http://www.vista.ac.uk/jifbid/case/index.html
My pretty telescope
Don't worry about the future ...
... it's almost here
My pretty telescope...
TMT: Thirty Meter Telescope
30-m segmented mirror, 20' FOV, 0.25” 80% PSF at 0.3-30 mu
AURA (GSMT)
ACURA (VLOT)
California Univ., CalTech, LLNL (CELT)
Univ. Victoria
Astronomers are now thinking about the next generation of supertelescopes (2010+). Here are some examples (incomplete list).
My pretty telescope...
Giant Segmented Mirror Telescope
GSMT: http://www.auranio.noao.edu/book/
California Extremely Large Telescope
CELT: http://celt.ucolick.org
My pretty telescope...
VLOT: Very Large Optical Telescope, the Next Generation CFHT.
A possible design is a 20-30 m petal or segmented mirror, diffraction limit at 0.7 deg FOV in optical and IR wavelengths
http://www.hia-iha.nrc-cnrc.gc.ca/projects/VLOT_e.html
My pretty telescope...
EURO50: Extremely Large Optical and Infrared Telescope
The telescope will have a segmented 50-m mirror and a monolithic secondary mirror. There will be full, dual-conjugate adaptive optics in the K-band.
Resolution: 2-3 mas optical, 10 mas IR
Finland, Sweden, Spain, Ireland and United Kingdom
At the same time, the telescope would have a collecting area of about 2000 square meters, which is about 25 times larger than that of the largest telescope existing today (Keck Telescope) and of the same order of magnitude as the total collecting area of all telescopes built to date. The aim is to complete this telescope in about 10 years from now, yielding maximal synergy with the NGST and the ALMA telescopes, and ensuring competitiveness of European astronomy in the second decade of the century.
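The collecting-area figures follow from the mirror diameters. This is a minimal check assuming filled circular apertures, ignoring central obstruction and segment gaps, and taking 10 m as the Keck aperture.

```python
import math

def collecting_area(diameter_m):
    """Area of a filled circular aperture in square meters."""
    return math.pi * (diameter_m / 2) ** 2

euro50 = collecting_area(50)
keck = collecting_area(10)
ratio = euro50 / keck

print(f"EURO50: {euro50:.0f} m^2, Keck: {keck:.1f} m^2, ratio: {ratio:.0f}")
```

Since the area scales with the square of the diameter, a mirror five times wider collects 25 times more light.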
http://www.astro.iu.se/~torben/euro50
My pretty telescope...
EURO50: Extremely Large Optical and Infrared Telescope
My pretty telescope... or SciFi?
OWL: OverWhelmingly Large Telescope
The European Southern Observatory has undertaken a concept study for the next generation of ground-based Extremely Large Telescopes (ELTs). Dubbed OWL, ESO's concept is conceived as a 100 m diameter optical and near-infrared adaptive telescope. With milliarcsecond resolution and limiting magnitude V~38, OWL will be capable of imaging solar system objects at resolutions comparable to that offered by space probes, over much longer time scales. It will unveil the intricate processes underlying the formation of stellar and planetary systems. It will be able to image exoplanets and determine their atmospheres' composition, and thereby, possibly, reveal the existence of biospheres. It will peer into the deepest reaches of the universe and witness the birth of the very first stars and galaxies. It may, eventually, revolutionize our perception of the universe as much as Galileo's telescope did.
My pretty telescope... or SciFi?
OWL: OverWhelmingly Large Telescope
http://www.eso.org/projects/owl
OWL feasibility study: completion in fall 2005
The major technological breakthroughs that made the current generation of 8- to 10-m telescopes possible provide strong confidence that with a sound industrial approach, OWL could be feasible at an affordable cost.
OWL could start to deliver scientific data by 2016, and become fully operational in 2020-21.
It is conceived as a diffraction-limited, fully steerable telescope of 100 m diameter, with milliarcsecond resolution and limiting magnitude V~38
YOU COULD BE A FUTURE USER OF THAT MONSTER!
My pretty telescope... OWL
A crude simulation showing the increase of angular resolution with the telescope size
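The scaling of angular resolution with telescope size can be estimated from the Rayleigh criterion, theta = 1.22 lambda / D. The sketch below assumes a perfectly diffraction-limited circular aperture and an observing wavelength of 0.5 micron; both are illustrative choices.

```python
import math

def diffraction_limit_mas(wavelength_m, diameter_m):
    """Rayleigh diffraction limit of a circular aperture, in milliarcseconds."""
    theta_rad = 1.22 * wavelength_m / diameter_m
    return math.degrees(theta_rad) * 3600.0 * 1000.0

WAVELENGTH = 0.5e-6  # assumed observing wavelength: 0.5 micron (visible light)

limits = {d: diffraction_limit_mas(WAVELENGTH, d) for d in (10, 50, 100)}
for diameter, limit in limits.items():
    print(f"D = {diameter:3d} m -> {limit:.1f} mas")
```

The resolution improves in direct proportion to the diameter, so a 100 m aperture reaches the milliarcsecond regime at visible wavelengths, provided adaptive optics can remove the atmospheric blurring.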
Expert people request (and eat) them... yummm!!!!
and me toooo!!!!!
Others are lost in the darkness of the databases and they think...