Anomaly detection with Machine Learning at the LHC
Judita Mamužić
IFIC / CSIC - University of Valencia 28 April 2021, I Workshop de Computing y Software de la Red Española de LHC
Judita Mamužić | Anomaly detection with Machine Learning at the LHC | 28 May 2021 | I Workshop de Computing y Software de la Red Española de LHC
Contents
• Introduction
• Anomaly detection
• Proof of concept examples
• Example in physics analyses
• Extended and future topics
• Discussion
This overview represents a personal choice of interesting topics. As the field is developing very fast, new exciting topics arise daily.
In collaboration with Roberto Ruiz, many thanks for useful input!
Main resources:
• The LHC Olympics 2020
• Dark Machines Challenge - Unsupervised Learning (in preparation)
Introduction: Searches for New Physics
• No significant sign of new physics in any of the searches at ATLAS and CMS.
• High number of models and signatures considered.
• With increasing computing power, explore data in new ways (Machine Learning).
Introduction: Anomaly Detection
• Anomaly - any deviation from the expectation.
• Any new physics is a deviation from the Standard Model expectation, and represents an anomaly.
• Conventional searches:
  • Define a search region dedicated to the assumptions of one model or a group of models.
  • Optimise sensitivity for signal to background ratio (cut-and-count, multi-bin fit, machine learning).
  • Determine background using Monte Carlo and/or data-driven background determination using control and validation regions.
  • Look for an excess in the data (statistical interpretation).
• Change of paradigm:
  • Look for any anomaly as any deviation from the Standard Model (model agnostic, consider a large variety of models in a single analysis).
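The "look for an excess" step of a cut-and-count search can be illustrated with a small sketch. The slide does not say which figure of merit is used; the Asimov formula below is a common choice and is an assumption here, as are the function names and the made-up yields.

```python
import math


def asimov_significance(s: float, b: float) -> float:
    """Median expected significance Z for s signal events on b background events."""
    if b <= 0.0:
        raise ValueError("background expectation must be positive")
    if s <= 0.0:
        return 0.0
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))


def naive_significance(s: float, b: float) -> float:
    """Large-background approximation s / sqrt(b)."""
    return s / math.sqrt(b)


# Illustrative (made-up) yields in a search region after all cuts.
for s, b in [(10.0, 100.0), (10.0, 5.0)]:
    z_a = asimov_significance(s, b)
    z_n = naive_significance(s, b)
    print(f"s={s:5.1f}  b={b:6.1f}  Z_asimov={z_a:.2f}  s/sqrt(b)={z_n:.2f}")
```

The two numbers agree when the background is large and drift apart when it is small, which is why s/sqrt(b) tends to overstate the sensitivity of tight search regions.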
[Figure: sketches of the number of events versus m and versus MET, each showing signal and background; diagram with two quarks (q, q).]
Introduction: Analyses at the LHC
• With increasing computing power, Machine Learning techniques are gaining momentum.
• Machine learning is being applied to all types of analyses.
• Model agnostic searches for new physics should be done in addition to classical searches.
[Figure labels: Combined Performance, Trigger.]
Introduction: Model Independence
• Aim to have the least model dependence on Monte Carlo for both signal and background (analysis dependent).
• Background:
  • Can be determined from Monte Carlo
  • Using control regions for scaling
  • Using ABCD method
  • Using data driven methods.
• Signal:
  • Classical searches train using MC simulations
  • Some analyses train signal vs data (rare)
  • General searches make little/no assumption on the signal, but rely on MC for background determination.
  • Anomaly detection ML algorithms aim to have the least dependence on signal assumptions and Monte Carlo background.
arXiv:2010.14554
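The ABCD method named in the background list can be sketched in a few lines. The region naming (A is the signal region, B and C each invert one of two cuts, D inverts both) and the simple Poisson error propagation are assumptions of this sketch, not details taken from the slides.

```python
import math


def abcd_estimate(n_b: float, n_c: float, n_d: float) -> tuple[float, float]:
    """Background estimate in region A and its statistical uncertainty.

    Assumes the two cut variables are independent for background,
    so that N_A / N_B = N_C / N_D.
    """
    if min(n_b, n_c, n_d) <= 0:
        raise ValueError("all control-region counts must be positive")
    estimate = n_b * n_c / n_d
    # Relative Poisson uncertainties of the three counts, added in quadrature.
    relative_error = math.sqrt(1.0 / n_b + 1.0 / n_c + 1.0 / n_d)
    return estimate, estimate * relative_error


# Illustrative (made-up) control-region counts.
n_a_pred, n_a_err = abcd_estimate(n_b=200, n_c=50, n_d=1000)
print(f"predicted background in A: {n_a_pred:.1f} +- {n_a_err:.1f}")
```

The estimate uses only data counts, which is what makes it attractive when the aim is to reduce reliance on Monte Carlo. It is only as good as the independence assumption, which has to be checked in each analysis.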
Anomaly Detection: General Searches (no ML)
• Search for new physics in a (quasi) model independent way:
  • Conventional searches have a bias on the selection due to the model assumption (SUSY, exotics)
  • Fill in the gap if we missed some selection.
• Events are first classified in 704 final-state categories, considering electrons, muons, b-tagged jets, non-b-tagged jets, photons and MET (type, multiplicity, pT).
• Use MC to predict expected background. Auxiliary measurements for CR and VR.
• Scan on different variables (e.g. Meff) and look for statistically significant deviation.
• Calculate probability of finding this p-value using all bins and all categories, using toys.
• Results consistent with the SM expectation.
arXiv:1807.07447
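The toy-based step, converting the smallest local p-value into a probability of seeing something at least that extreme anywhere, can be sketched as below. This is a strongly simplified illustration: one histogram, Poisson fluctuations only, no systematic uncertainties, and made-up numbers. The function names are hypothetical and not taken from the cited analysis.

```python
import math
import random


def poisson_pvalue(n_obs: int, b: float) -> float:
    """Local p-value P(N >= n_obs) for a Poisson mean b."""
    term = math.exp(-b)  # P(N = 0)
    cdf = 0.0
    for k in range(n_obs):
        cdf += term
        term *= b / (k + 1)  # P(N = k + 1)
    return max(0.0, 1.0 - cdf)


def sample_poisson(rng: random.Random, mean: float) -> int:
    """Knuth's multiplication method, adequate for moderate means."""
    limit = math.exp(-mean)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def global_pvalue(observed, expected, n_toys=1000, seed=1):
    """Fraction of background-only toys whose smallest local p-value
    is at least as small as the one observed."""
    rng = random.Random(seed)
    p_min_obs = min(poisson_pvalue(n, b) for n, b in zip(observed, expected))
    n_as_extreme = 0
    for _ in range(n_toys):
        toy = [sample_poisson(rng, b) for b in expected]
        p_min_toy = min(poisson_pvalue(n, b) for n, b in zip(toy, expected))
        if p_min_toy <= p_min_obs:
            n_as_extreme += 1
    return p_min_obs, n_as_extreme / n_toys


expected = [20.0] * 30          # background prediction per bin
observed = [20] * 29 + [35]     # one bin with an excess
local_p, global_p = global_pvalue(observed, expected)
print(f"smallest local p-value: {local_p:.4f}, global p-value: {global_p:.3f}")
```

With many bins and categories scanned, the global p-value is much larger than the smallest local one, which is why the toys are needed before calling any single deviation significant.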
Anomaly Detection: Level of supervision for ML
• Supervision - level of label information (e.g. signal and background) given to the machine learning algorithm during training.
Supervised ML methods:
• Labels of signal and background for all events.
Semi-supervised:
• Labels of signal and background for some events
• Use signal MC to build signal sensitivity, but not for testing.
Weakly supervised:
• Labels are noisy
• Compare two datasets with different amounts of potential signal
• Instead of signal and background use possibly-signal-enriched and possibly-signal-depleted.
Unsupervised:
• No label information
• Learn directly from background dominated data.
[Figure label: Level of supervision.]
Anomaly Detection: Semi-supervised
Deep Ensemble Anomaly Detection
• Uses labeled signal and background
• Hybrid solution using:
  • Convolutional NN, input: raw data 2D η−ϕ images of unclustered particles in the event
  • BDT, input: CNN output and kinematic variables of fat jets
• ~5% increase in BDT classification power.
• Method good in identifying new physics events, but not so good in estimating their mass.
• Method not so model independent.
arXiv:2101.08320
Anomaly Detection: Weakly-supervised
Classification without labels (CWoLa):
• Instead of using signal and background, use mixed samples with different proportions of signal.
• All events in mixed sample M0 labeled 0, all events in mixed sample M1 labeled 1.
• Train the classifier to distinguish M0 from M1, then it will also be optimal to distinguish signal from background.
• Assumptions:
  • Mixed samples should be statistically identical aside from different class proportion.
• Physics analysis performed using a search for di-jet resonance.
• Train classifiers directly on data.
arXiv:2010.14554
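Why a classifier trained on M0 versus M1 is also optimal for signal versus background can be checked numerically. Writing f1 and f0 for the signal fractions of M1 and M0 (symbols introduced here for illustration), the mixed-sample likelihood ratio is a function of the signal-to-background likelihood ratio L alone, and it increases with L whenever f1 > f0. The sketch below evaluates that function; the chosen fractions are made-up values.

```python
def mixed_likelihood_ratio(lr_sb: float, f1: float, f0: float) -> float:
    """p_M1(x) / p_M0(x) written in terms of L = p_S(x) / p_B(x).

    p_M1 = f1 * p_S + (1 - f1) * p_B and p_M0 = f0 * p_S + (1 - f0) * p_B,
    so dividing numerator and denominator by p_B leaves a function of L only.
    """
    return (f1 * lr_sb + (1.0 - f1)) / (f0 * lr_sb + (1.0 - f0))


lr_values = [0.01, 0.1, 1.0, 10.0, 100.0]
ratios = [mixed_likelihood_ratio(lr, f1=0.20, f0=0.05) for lr in lr_values]
for lr, ratio in zip(lr_values, ratios):
    print(f"L_S/B = {lr:7.2f}  ->  L_M1/M0 = {ratio:.3f}")
```

Because the mapping is monotonic, ranking events by the M1-versus-M0 score gives the same ordering as ranking them by the ideal signal-versus-background score, without ever using a true label. The separation is compressed, though: the mixed ratio saturates at f1/f0, so small differences in signal fraction need a lot of data.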
[Figure: mixed samples M0 and M1.]
Physics Analyses: ATLAS
• ATLAS di-jet resonance search using weak supervision.
• Features for ML are the masses of the first two jets, bump hunt.
• Generic search: A -> BC (tau-leptons, b-quarks, top-quarks, vector bosons, Higgs boson and asymmetric decays), small trial factor.
• 6 SRs with sidebands, NN different for each SR
• NN able to detect the injected signal.
• Strongest limits for di-jet events.
[Diagram labels: q, q, A, B, C.]
Phys. Rev. Lett. 125 (2020) 13
[Figure caption: Low-efficiency (signal-like) regions of the NN are localised near the injected signal.]
Anomaly Detection: Unsupervised
Autoencoder
• Perform anomaly detection in high dimensional parameter space.
• Deep Neural Network with the task of learning the identity map.
• Enforces compression in the latent space.
• Training performed on QCD, reconstruction error large for signal or different sample.
• Use some metric to compare input and output (reconstruction error), and perform classification.
arXiv:1808.08992
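The reconstruction-error idea can be shown with a deliberately tiny stand-in. Instead of the deep network from the slide, the sketch below uses a linear autoencoder with a one-dimensional latent space on two-dimensional toy data, fitted in closed form as the projection onto the leading principal axis (which is what a linear autoencoder converges to). The data, names and numbers are all made up for illustration.

```python
import math
import random


def fit_linear_autoencoder(points):
    """Return (mean, axis) of a 2D -> 1D -> 2D linear autoencoder."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    var_x = sum((p[0] - mean_x) ** 2 for p in points) / n
    var_y = sum((p[1] - mean_y) ** 2 for p in points) / n
    cov_xy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / n
    # Angle of the direction of largest variance.
    theta = 0.5 * math.atan2(2.0 * cov_xy, var_x - var_y)
    return (mean_x, mean_y), (math.cos(theta), math.sin(theta))


def reconstruction_error(point, mean, axis):
    """Squared distance between the input and its encode/decode round trip."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    latent = dx * axis[0] + dy * axis[1]               # encoder
    rec_x, rec_y = latent * axis[0], latent * axis[1]  # decoder
    return (dx - rec_x) ** 2 + (dy - rec_y) ** 2


# "Background" training sample: two strongly correlated features.
rng = random.Random(0)
background = []
for _ in range(500):
    t = rng.gauss(0.0, 1.0)
    background.append((t, t + rng.gauss(0.0, 0.1)))

mean, axis = fit_linear_autoencoder(background)
error_background_like = reconstruction_error((1.0, 1.0), mean, axis)
error_anomalous = reconstruction_error((1.0, -1.0), mean, axis)
print(f"background-like event: {error_background_like:.4f}")
print(f"anomalous event:       {error_anomalous:.4f}")
```

The event that breaks the correlation learned from the training sample cannot be squeezed through the latent space and comes back with a large error, which is then used as the anomaly score.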
Unsupervised Comparison: Dark Machines Challenge
• Inspect unsupervised anomaly detection algorithms.
• Results will be published soon on arXiv.
Dark Machines challenge:
• Use public dataset at arXiv:2002.12220 section 23.
• Develop and train the algorithms (classification done on event-by-event basis, no densities) to define an anomaly score, within Dark Machines effort.
• Considering a number of unsupervised learning methods.
• First study all methods using known new physics signals.
• Compare methods against a “secret” dataset.
• Aim to iterate with the public, invite public to train their algorithms and compare.
• Converge on an optimal, global model-independent unsupervised method to be applied on e.g. ATLAS/CMS data.
Strong contribution to open data MC generation and calculation using Artemisa computing facility.
Unsupervised Comparison: Dark Machines Challenge
Public dataset for the challenge:
• https://zenodo.org/record/3685861
• Samples include MadGraph+Pythia8 at LO event generation, and Delphes (ATLAS card) for simulation, preselection using objects that could be triggered.
• O(1B) events in CSV format
• Includes:
  • Background and signal (Z’, stop, squarks, gluinos (RPC and RPV), charginos, various mass points)
  • Data grouped in 4 channels with different preselection:
    1) HT > 600 GeV, MET > 200 GeV, MET/HT > 0.2, at least 4 jets with pT > 200, 50, 50, 50 GeV
    2.a) MET > 50 GeV and at least 3 leptons with pT > 15 GeV
    2.b) MET > 50 GeV, HT > 50 GeV and at least 2 leptons with pT > 15 GeV
    3) HT > 600 GeV, MET > 100 GeV
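The channel definitions above translate directly into selection functions. The sketch below covers channels 1 and 3 only. The event layout (a dictionary with `met` and a list `jet_pt`, all in GeV) is hypothetical and is not the CSV layout of the public dataset, and HT is assumed here to be the scalar sum of the jet pT values.

```python
CHANNEL_1_JET_PT_CUTS = (200.0, 50.0, 50.0, 50.0)  # GeV, leading to fourth jet


def scalar_ht(event: dict) -> float:
    """Assumption of this sketch: HT is the scalar sum of jet pT."""
    return sum(event["jet_pt"])


def passes_channel_1(event: dict) -> bool:
    """HT > 600, MET > 200, MET/HT > 0.2, four jets above 200, 50, 50, 50 GeV."""
    jets = sorted(event["jet_pt"], reverse=True)
    if len(jets) < len(CHANNEL_1_JET_PT_CUTS):
        return False
    jets_ok = all(pt > cut for pt, cut in zip(jets, CHANNEL_1_JET_PT_CUTS))
    ht, met = scalar_ht(event), event["met"]
    return jets_ok and ht > 600.0 and met > 200.0 and met / ht > 0.2


def passes_channel_3(event: dict) -> bool:
    """HT > 600 GeV and MET > 100 GeV."""
    return scalar_ht(event) > 600.0 and event["met"] > 100.0


# Made-up event for illustration.
example_event = {"met": 250.0, "jet_pt": [300.0, 150.0, 100.0, 60.0]}
print(passes_channel_1(example_event), passes_channel_3(example_event))
```

Channel 1 is a subset of channel 3 by construction, so an algorithm trained on channel 3 sees more events but a looser, more background-like sample.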
[Figure: SM processes.]
Unsupervised Comparison: Preliminary Results #1, #2
Inspect the signal:
==> Not all anomalies are anomalous: Area Under the Curve (AUC) around 0.5 for chargino-neutralino production models, while very high for gluino pair production.
Inspect the amount of training data (channel/colour):
==> Systematic underperformance for algorithms that train on small amount of data, e.g. channel 2a - very tight selection (results in tight cuts and low statistics)
[Figure: ROC curve with axes ϵS and 1−ϵB, with regions marked Excellent and Poor.]
Unsupervised Comparison: Preliminary Results #3, #4, #5
Inspect best performing method based on different score:
==> Different optimal method for different choice of score (e.g. overall best median vs above threshold).
Inspect AUC (Area Under ROC Curve):
==> AUC is not the best measure when considering low signal efficiency region in ROC.
Inspect tight background cuts:
==> No method was able to find any signal with tight background cuts.
==> Clear potential of studied algorithms and possibility to study hybrid approaches. No one-size-fits-all, methods need to be studied.
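The two kinds of score compared above, an overall AUC and a signal efficiency at a fixed tight background cut, can be computed from anomaly scores as sketched below. The function names and the scores are made up for illustration, and higher scores are assumed to mean "more anomalous".

```python
def roc_auc(signal_scores, background_scores) -> float:
    """Probability that a random signal event outscores a random
    background event (ties count one half)."""
    wins = 0.0
    for s in signal_scores:
        for b in background_scores:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5
    return wins / (len(signal_scores) * len(background_scores))


def signal_efficiency_at(background_efficiency, signal_scores, background_scores) -> float:
    """Signal efficiency for the loosest cut that lets through at most the
    requested fraction of background events."""
    ranked = sorted(background_scores, reverse=True)
    n_allowed = int(background_efficiency * len(ranked))
    # Events must score strictly above the first background event not allowed through.
    threshold = ranked[n_allowed] if n_allowed < len(ranked) else float("-inf")
    return sum(s > threshold for s in signal_scores) / len(signal_scores)


background = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
signal = [0.05, 0.55, 0.95, 1.5]

auc = roc_auc(signal, background)
eff_loose = signal_efficiency_at(0.5, signal, background)
eff_tight = signal_efficiency_at(0.0, signal, background)
print(f"AUC = {auc:.2f}, eps_S at eps_B=0.5: {eff_loose:.2f}, eps_S at eps_B=0: {eff_tight:.2f}")
```

The AUC averages over all working points, so two methods with the same AUC can differ strongly at the tight background cuts where a search would actually operate. This is one way to read the statement that the best method depends on the choice of score.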
Anomaly Detection: Trigger
• Deploy autoencoder algorithms at L1 trigger on FPGA to trigger BSM signatures.
• arXiv:1811.10276, IML slides
• Autoencoders very good in finding different new physics scenarios.
• Use encoder part of the autoencoder, reduce to very small latent space. Not running the decoder.
• Add figure of merit in the latent space, deviations from expectation will be the trigger for new physics.
• Use simple χ² function to determine the level of deviation from expectation (small for noise, large for new physics).
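A χ² figure of merit in the latent space could look like the sketch below. The slide does not give the exact definition, so the diagonal form (each latent coordinate compared to an expected mean and spread), the threshold and all names are assumptions. This is plain Python for illustration, not FPGA firmware.

```python
def latent_chi2(latent, expected_mean, expected_sigma) -> float:
    """Sum of squared pulls of the latent coordinates with respect to
    their expected values for ordinary events."""
    return sum(
        ((z - mu) / sigma) ** 2
        for z, mu, sigma in zip(latent, expected_mean, expected_sigma)
    )


def fires_trigger(latent, expected_mean, expected_sigma, threshold: float) -> bool:
    """Accept the event when its latent-space deviation exceeds the threshold."""
    return latent_chi2(latent, expected_mean, expected_sigma) > threshold


# Made-up two-dimensional latent space and threshold.
mean, sigma, threshold = [0.0, 0.0], [1.0, 2.0], 9.0
ordinary_event = [0.1, -0.4]
unusual_event = [3.0, 4.0]
print(latent_chi2(ordinary_event, mean, sigma), fires_trigger(ordinary_event, mean, sigma, threshold))
print(latent_chi2(unusual_event, mean, sigma), fires_trigger(unusual_event, mean, sigma, threshold))
```

Only the encoder output is needed, which keeps the computation to a few multiplications and additions per event and fits the constraint of not running the decoder at trigger level.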
[Figure labels: ML interface to FPGA in Python; FPGA.]
Quantum Technology: Quantum Annealing
Nature 550 (2017) 7676
• Proof of principle study for the Higgs boson discovery using quantum annealing.
• Considering signal H -> γγ and SM background processes with decays to two photons.
• Area under ROC curve. Solid lines correspond to quantum (green) or simulated (blue) annealing, and dotted lines to the DNN (red) or XGB (cyan).
• DNN and XGB have an advantage for large training datasets, while annealer-trained networks perform better for small training datasets.
• Quantum computing can find the Higgs boson.
• Anomaly detection using quantum technology should be explored further.
Summary
• Anomaly detection complementary to “conventional” new physics searches.
• Proof of concept for supervised, weakly-supervised and unsupervised learning.
• Comparison of unsupervised algorithms (in preparation).
• Physics analysis examples.
• Further topics for anomaly detection.
Discussion
1. What should open data samples include?
2. What are the advantages of supervised vs unsupervised approach?
3. How general can the anomaly detection algorithms be, dependence on the input features?
4. What improvements are needed for data format for ML algorithms?
5. What should we consider for HL-LHC and future colliders?
6. How should the ML models be systematically collected?
7. What other applications for anomaly detection could be made (advantages and limitations)?