DESY Data Preservation Project - CERN Indico
K. Wichmann, 24.05.12, CHEP 2012: DESY Data Preservation Project
DESY Data Preservation Project
Katarzyna Wichmann
on behalf of the DESY Data Preservation Group:
DESY-IT: Y. Kemp, D. Ozerov
DESY-Library / INSPIRE: Z. Akopov
H1: S. Baghdasaryan, V. Dodonov, S. Levonian, B. Lobodzinski, J. Olsson, D. South (coordinator), Michael Steder*
HERMES: E. Avetisyan, G. Schnell
ZEUS: A. Ausheva, V. Bokhonov, A. Geiser, J. Malka, KW
* Many thanks for help in preparing this talk
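HEP Data Preservation Project
● Global DPHEP initiative launched in 2009 at DESY
● DESY Data Preservation Group established soon after
  – 17 people involved
● Many HEP data sets are unique and retain their scientific potential
● No clear model of long-term preservation existed before DPHEP
● Physics cases for data preservation:
  – Long-term data analysis
  – Re-using and re-analyzing data
  – Combining results between experiments
  – Education, training and outreach
● Four models (DPHEP levels) of preservation are defined:
  1. Provide additional documentation
  2. Preserve the data in a simplified format
  3. Preserve the analysis-level software and data format
  4. Preserve the reconstruction and simulation software as well as the basic-level data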
DESY Data Preservation Project
● Considerable effort invested in the DESY Data Preservation Project:
  – Digital and non-digital documentation
  – Software preservation
  – Data archiving
● Significant progress in all areas since the beginning of the project
● Very fruitful collaboration between H1, HERMES, ZEUS, DESY Library & INSPIRE and DESY-IT
  – Regular meetings
Non-Digital Documentation
Great care was taken to preserve as much as possible of the documentation collected over the years of running the experiments
● Much material exists from pre-web days
● All kinds of web applications were used
● Requires considerable management, cataloguing and new archives
● Non-digital documentation sorted and safely stored in a dedicated library archive
● Part of the non-digital documentation digitized
● Theses
● Talks
● Minutes
● Logbook
● Internal notes
Digital Documentation
Digital documentation:
● Former online monitoring and shift tools
● Web-based documentation, electronic logbooks, meeting presentations, minutes, ...
● Digital documentation needs to be improved and modernized
➔ missing or unavailable documentation restored
➔ new condensed “tutorials” provided on the topics most important for future analysis
ZEUS Primer: http://www-zeus.desy.de/ZEUS_ONLY/analysis/primer/
INSPIRE
● INSPIRE: new dedicated effort to create an improved information system, succeeding SPIRES
● INSPIRE offers many convenient options for archiving digitized documents
● Collaborations' internal notes submitted to INSPIRE (password protected)
● Other documents from the HERA collaborations are under discussion: preliminary results, theses, conference talks, proceedings, paper history, ...
INSPIRE gives a unique opportunity to preserve documentation, wikis, news forums and even data outside collaboration resources, and to keep them available and undisturbed “forever”
Long-Term Archiving of Web Pages
● Most of the digital information essential for analysis is stored on web servers: a huge amount of data of various types to be stored safely
● Investigating and refining collaboration web pages
● All web servers now run in the DESY-IT central environment
  – Hardware renewal and failures handled by IT
● Consolidation of web pages still requires considerable work
Analysis Software & Data Preservation
The interface to access and handle the data has to be fully functional
The integrity of the data has to be guaranteed (without frequent user access)
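Guaranteeing integrity without relying on users to notice damage is commonly done with a periodic fixity check: record a checksum for every archived file once, then recompute and compare on a schedule. The sketch below assumes a simple manifest that maps relative paths to SHA-256 digests; the function names and the example file name are hypothetical and not part of any DESY tool.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks so large files need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Record a digest for every regular file below `root` (done once, at archiving time)."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Recompute digests; return the relative paths that are missing or have changed."""
    damaged = []
    for rel_path, expected in manifest.items():
        path = root / rel_path
        if not path.is_file() or sha256_of(path) != expected:
            damaged.append(rel_path)
    return damaged


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "run1.dst").write_bytes(b"event data")  # hypothetical file name
        manifest = build_manifest(root)
        print(verify_manifest(root, manifest))  # []
        (root / "run1.dst").write_bytes(b"bit rot")
        print(verify_manifest(root, manifest))  # ['run1.dst']
```

The manifest itself would need to be stored separately from the data (and ideally checked too), otherwise a single storage failure could damage both.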
Data Analysis Models @ DESY
● H1
  – Preservation level 4
  – Full chain, from compilation of simulation, reconstruction and analysis code
  – Full flexibility in the future for data and MC
● HERMES
  – Preservation level 4
  – ADAMO-based micro-DST files for data and MC
● ZEUS
  – Preservation level between 3 and 4
  – Data and MC preserved in the form of ROOT-based Common Ntuples
  – In addition, the ability to simulate small samples of new MC in the future is maintained, using existing executables
sp-system: Software Preservation System
● Automated validation system to facilitate future software and OS transitions
● Clear separation of tasks between IT and experiments
● Use of virtual machines offers great flexibility
  – OS and configuration chosen by parameter
● Development system allows interactive coding and debugging
● Computing resources of all experiments end in the next few years
● Access to DESY-IT infrastructure given for future analysis & file production
● Requires an installation similar to Grid/IT cluster nodes
  – No administrative actions, no pre-installed software, no AFS access, ...
[Diagram: division of tasks between the Computing Centre and the Experiment]
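Choosing OS and configuration "by parameter" can be illustrated by expanding a small parameter grid into individual validation jobs. The experiment names and the SL5/SL6 and 32/64-bit values come from this talk; the job fields, the VM image naming and the `expand_jobs` helper are hypothetical and do not describe the actual sp-system interface.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Sequence


@dataclass(frozen=True)
class ValidationJob:
    """One validation run: which experiment's tests, on which VM configuration."""
    experiment: str
    os_release: str
    arch_bits: int

    @property
    def vm_image(self) -> str:
        # Hypothetical naming scheme for the VM image chosen by these parameters.
        return f"{self.os_release.lower()}-{self.arch_bits}bit"


def expand_jobs(experiments: Sequence[str], os_releases: Sequence[str],
                arch_bits: Sequence[int]) -> List[ValidationJob]:
    """One job per combination of parameters (Cartesian product)."""
    return [ValidationJob(exp, os_rel, bits)
            for exp, os_rel, bits in product(experiments, os_releases, arch_bits)]


if __name__ == "__main__":
    jobs = expand_jobs(["H1", "HERMES", "ZEUS"], ["SL5", "SL6"], [32, 64])
    print(len(jobs))         # 12
    print(jobs[0].vm_image)  # sl5-32bit
```

In this scheme, validating a new OS means adding one value to a list; the tests themselves stay unchanged.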
Example Structure of a Validation Project
[Figure (H1 example): Fortran simulation/reconstruction software and the h1oo analysis software (requiring ROOT, FastJet, Neurobayes, ...) are compiled into executables and packaged as tar-balls of the H1 software, together with database and h1oo snapshots; tests cover MC generators (to come), DST production (h1simrec), HAT/μODS production (dst2all), physics analyses and the event display, for data periods 1996–2007 (all periods); common storage and a software repository (currently CVS; needs access to the running VM).]
● Compilation of reconstruction and simulation software (+ external dependencies)
● Sequential testing of: a) MC generators, b) DST production, c) analysis-level data production
● Parallel testing of: a) physics analyses, b) event display, c) tools and binaries
Running jobs in the sp-system
[Diagram: build step, running step and validation step; labels: source code, tgz, ${ID}]
Initial step
● Compilation of analysis (level 3) and sim/rec (level 4) software
● OR: use tar-balls of pre-compiled software
● Provide access to the software
Copy tar-balls to persistent storage
● All output kept in a directory with a unique name
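A minimal sketch of the bookkeeping in this initial step, assuming a local directory stands in for persistent storage: every run gets a fresh directory with a unique name and the pre-compiled tar-ball is copied into it, so no run can overwrite another. The directory naming (`run-` plus a random UUID), the function names and the tar-ball name are illustrative assumptions.

```python
import shutil
import tarfile
import tempfile
import uuid
from pathlib import Path


def new_run_dir(persistent_root: Path) -> Path:
    """Create an output directory with a unique name under the persistent area."""
    run_dir = persistent_root / f"run-{uuid.uuid4().hex}"
    run_dir.mkdir(parents=True, exist_ok=False)  # never reuse or overwrite a directory
    return run_dir


def stage_tarball(tarball: Path, persistent_root: Path) -> Path:
    """Copy a pre-compiled software tar-ball into a fresh, uniquely named run directory."""
    if not tarfile.is_tarfile(tarball):
        raise ValueError(f"{tarball} is not a tar archive")
    run_dir = new_run_dir(persistent_root)
    shutil.copy2(tarball, run_dir / tarball.name)
    return run_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp)
        payload = work / "README"
        payload.write_text("pre-compiled software would go here\n")
        tarball = work / "sw.tgz"  # hypothetical tar-ball name
        with tarfile.open(tarball, "w:gz") as archive:
            archive.add(payload, arcname="README")
        first = stage_tarball(tarball, work / "persistent")
        second = stage_tarball(tarball, work / "persistent")
        print(first != second)               # True: each run has its own directory
        print((first / "sw.tgz").is_file())  # True
```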
Run tests in parallel
● Set up the software environment
● Validate binaries with persistent input, e.g. event display, DB access, ...
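Because these tests only read persistent input and do not depend on each other, they can be dispatched concurrently. A minimal sketch using Python's standard `concurrent.futures`; the two check functions are hypothetical stand-ins for tests such as the event display or database access, not real sp-system tests.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_parallel(tests: Dict[str, Callable[[], bool]], max_workers: int = 4) -> Dict[str, str]:
    """Run independent tests concurrently and map each test name to 'ok' or 'problem'."""
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(test) for name, test in tests.items()}
        for name, future in futures.items():
            try:
                results[name] = "ok" if future.result() else "problem"
            except Exception:
                # A test that crashes is reported as failed instead of aborting the suite.
                results[name] = "problem"
    return results


# Hypothetical placeholder checks; real ones would run the experiment's binaries.
def check_event_display() -> bool:
    return True


def check_db_access() -> bool:
    raise ConnectionError("database snapshot not reachable")


if __name__ == "__main__":
    print(run_parallel({"event display": check_event_display, "DB access": check_db_access}))
    # {'event display': 'ok', 'DB access': 'problem'}
```

Threads suffice when each test mostly waits on an external process; CPU-bound checks written in Python would be better served by `ProcessPoolExecutor`, which offers the same `submit`/`result` interface.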
Run sequential tests
● Set up the software environment
● Validate file production:
  1. MC generation (-> generator files)
  2. Reconstruction (generator files -> DSTs)
  3. Analysis level (DSTs -> ROOT files)
● Each test uses the output of the previous test as input
-> Results remain accessible or can be reproduced with identical results
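The chaining of the sequential tests, and the requirement that results be reproducible with identical results, can be sketched as a pipeline in which each stage consumes the previous stage's output and the final output is fingerprinted. The three stage functions are hypothetical toy transformations standing in for MC generation, reconstruction and analysis-level production; only the structure mirrors the talk.

```python
import hashlib
import json
import random
from typing import Callable, List


def generate(seed: int) -> List[float]:
    """Toy 'MC generation': a fixed seed makes the output reproducible."""
    rng = random.Random(seed)
    return [rng.gauss(50.0, 5.0) for _ in range(1000)]


def reconstruct(generated: List[float]) -> List[float]:
    """Toy 'reconstruction': apply a fixed calibration factor."""
    return [round(1.01 * x, 6) for x in generated]


def analyse(dst: List[float]) -> dict:
    """Toy 'analysis level': a selection plus summary numbers."""
    selected = [x for x in dst if 40.0 < x < 60.0]
    return {"n_selected": len(selected), "mean": round(sum(selected) / len(selected), 6)}


def run_chain(seed: int, stages: List[Callable]) -> str:
    """Feed each stage the previous stage's output; fingerprint the final result."""
    data = seed
    for stage in stages:
        data = stage(data)
    return hashlib.sha256(json.dumps(data, sort_keys=True).encode()).hexdigest()


if __name__ == "__main__":
    chain = [generate, reconstruct, analyse]
    reference = run_chain(1234, chain)
    print(run_chain(1234, chain) == reference)  # True: a rerun reproduces the result
```

Storing the reference fingerprint next to the outputs lets a later rerun on a new OS be compared with a single string comparison; a mismatch then points to the stage whose output changed.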
Bookkeeping of Validation Results
● Display validation results in a comprehensible way
● Provide links to additional information
  – plots, ROOT files, ...
● Similar task for all collaborations
  – profit from synergies
● Allow different levels of complexity
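One way to display results comprehensibly and at different levels of detail is to keep one record per test and derive a coarse per-experiment summary from it. A sketch, assuming the four status categories used later in this talk (ok, ongoing, to be done, problem); the record fields, the optional link and the rule that the worst status of a group wins are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional

# Ordered from best to worst; a group's summary reports its worst entry (assumed rule).
STATUS_ORDER = ["ok", "ongoing", "to be done", "problem"]


@dataclass
class ValidationRecord:
    experiment: str
    test: str
    status: str
    link: Optional[str] = None  # e.g. a plot or ROOT file with more detail

    def __post_init__(self) -> None:
        if self.status not in STATUS_ORDER:
            raise ValueError(f"unknown status {self.status!r}")


def summarise(records: List[ValidationRecord]) -> Dict[str, str]:
    """Coarse view: the worst status per experiment."""
    worst: Dict[str, str] = defaultdict(lambda: "ok")
    for rec in records:
        if STATUS_ORDER.index(rec.status) > STATUS_ORDER.index(worst[rec.experiment]):
            worst[rec.experiment] = rec.status
    return dict(worst)


def detail_table(records: List[ValidationRecord]) -> str:
    """Detailed view: one line per test, with a link to more information if available."""
    lines = [f"{'experiment':<10} {'test':<22} {'status':<11} link"]
    for rec in records:
        lines.append(f"{rec.experiment:<10} {rec.test:<22} {rec.status:<11} {rec.link or '-'}")
    return "\n".join(lines)


if __name__ == "__main__":
    records = [
        ValidationRecord("H1", "compilation", "ok"),
        ValidationRecord("H1", "full chain", "ongoing", "plots/h1_chain.png"),  # hypothetical path
        ValidationRecord("ZEUS", "common ntuples", "ok"),
    ]
    print(summarise(records))  # {'H1': 'ongoing', 'ZEUS': 'ok'}
    print(detail_table(records))
```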
Example of Validation of Physics Analysis
● Test stability of the analysis against a changing software environment
● Validate results of the ZEUS Z0 analysis
● Check access to data/MC in the common ntuple format (very important)
● Check results of the Z0 analysis (event list, cross section, acceptance and invariant-mass calculations) against various possible future changes:
  – 32-bit vs. 64-bit machines
  – New ROOT versions
  – Speed
  – New data access schemes
  – New operating systems
  – ...
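A sketch of such a regression check, comparing a rerun of an analysis with stored reference results: the selected event list must match exactly, while floating-point quantities such as the cross section and acceptance are compared within a relative tolerance, since a platform change (for example 32-bit to 64-bit) can shift the last digits. The field names, the tolerance of 1e-6 and the numbers in the example are assumptions, not ZEUS results.

```python
import math
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class AnalysisResult:
    events: Set[Tuple[int, int]]  # (run, event) numbers of selected events
    cross_section: float          # in whatever units the reference uses
    acceptance: float


def compare(new: AnalysisResult, ref: AnalysisResult, rel_tol: float = 1e-6) -> List[str]:
    """Return a list of discrepancies; an empty list means the validation passed."""
    issues = []
    if new.events != ref.events:
        issues.append(
            f"event list differs: {len(new.events - ref.events)} extra, "
            f"{len(ref.events - new.events)} missing"
        )
    for name in ("cross_section", "acceptance"):
        a, b = getattr(new, name), getattr(ref, name)
        if not math.isclose(a, b, rel_tol=rel_tol):
            issues.append(f"{name}: {a!r} vs reference {b!r}")
    return issues


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    reference = AnalysisResult({(1, 10), (1, 42), (2, 7)}, 1.000, 0.500)
    rerun = AnalysisResult({(1, 10), (1, 42), (2, 7)}, 1.0000000001, 0.500)
    print(compare(rerun, reference))  # []
```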
Status of Experiments’ Software
[Status table; legend: ok / ongoing / to be done / problem]
Status of H1 Software
● All software compiles on SL5/64-bit without problems
● DST and analysis-level production started
● Implementation of validation scripts for the full chain ongoing
● First tests for other binaries implemented, e.g. DB integrity
Status of HERMES Software
● All experiment software successfully compiled
● Validation of results ongoing (small differences with respect to SL3 spotted)
● Production of ADAMO-based μDSTs requires cernlib2005 on 64-bit
● Archival mode already has to be used from 2013 on
Status of ZEUS Software
● Pre-compiled SL5/32-bit software runs without problems
● Validation of stand-alone MC and common ntuple production started
● First physics validation scripts implemented
Status of Experiments’ Software
● All experiments run their software inside the sp-system
  – Reference OS for the experiments is SL5/32-bit
● All experiments plan to finalise the validation scheme on SL5 this year
● Very different requirements call for flexibility and scalability of the system
● Validation on 64-bit systems: a major step towards migration to future OSes
● Next step will be migration to SL6
● sp-system has already proven useful, e.g. by detecting:
  – Corrupted database snapshot file production (fixed)
  – A bug in the analysis-level filling code (fixed)
Summary
● DESY Data Preservation Group very active
● Very good collaboration between the experiments, DESY Library, INSPIRE and DESY-IT
● (Non-)digital documentation
  – Huge effort made by DESY Library and the experiments to find, digitise and catalogue all documents
  – Unique possibilities offered by INSPIRE pursued by the DESY collaborations
● Software preservation system
  – All three experiments use the sp-system and its development system
  – Successful compilation and/or running of the experiments' software
  – Necessary tests identified; implementation in full swing