Searching for New Galaxies in the nearby Universe



ICRAR-IVEC Summer Project • 2013-14

Searching for New Galaxies in the Nearby Universe

Asif R Rasha

Supervisor: Dr Tobias Westmeier

International Centre for Radio Astronomy Research

University of Western Australia, 35 Stirling Highway, Crawley WA 6009, Australia

Abstract

The objective of the summer project was to process and analyze the deep sky survey data collected using the 64 m Parkes Radio Telescope at the 21 cm emission line of neutral hydrogen to search for galaxies. The report discusses the data processing done using Parkes data reduction software and source finding using existing software called Duchamp. It also discusses the process of analyzing the output data from Duchamp to produce a parameterized catalogue of galaxies.

I. Introduction

Hydrogen is the most abundant element in the Universe and is a major constituent of galaxies. Neutral hydrogen, which consists of a proton and an electron, emits or absorbs a photon at a wavelength of 21 cm (1420.405751 MHz) when there is a relative spin flip between the proton and the electron [2].

Since the Universe is expanding, most galaxies are receding from us. As the galaxies recede, the Doppler effect shifts the observed frequency. By surveying the sky with a radio telescope at ~1420 MHz with a wide bandwidth, the receding galaxies can be detected at frequencies lower than the HI rest frequency.
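As a rough illustration of this shift, the sketch below converts between redshift and observed HI frequency using the standard relation f_obs = f_0 / (1 + z). The function names are illustrative and are not part of the software used in the project.

```python
HI_REST_MHZ = 1420.405751  # HI rest frequency quoted in the text


def observed_frequency_mhz(z):
    """Frequency at which the HI line of a source at redshift z is observed."""
    return HI_REST_MHZ / (1.0 + z)


def redshift_from_frequency(f_obs_mhz):
    """Redshift implied by an observed HI line frequency."""
    return HI_REST_MHZ / f_obs_mhz - 1.0


# The line moves to lower frequencies as the redshift grows.
for z in (0.0, 0.05, 0.1):
    print(f"z = {z:.2f} -> {observed_frequency_mhz(z):.2f} MHz")
```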

II. Background

An HI survey was carried out between May and September 2013 using the Parkes Radio Telescope across the Sculptor region. The survey frequency ranged from 1100 MHz to 1500 MHz, covering the Sculptor region out to a redshift of up to 0.2.

The Parkes Radio Telescope has several backend correlators, of which MBCORR and HIPSR were used for this survey. MBCORR is a fairly old correlator, whereas HIPSR was installed recently. Both correlators were used simultaneously for the survey; however, our project only looked at the HIPSR data. The MBCORR data were used for retrieving T-sys information from the header.

Table 1: Parameters for the HIPSR Sculptor group survey

Sky coverage (RA)            00 < RA < 01 h
Sky coverage (dec)           -30 < δ < -18
Integration time per beam    1 h 15 min
Frequency resolution         10 km/s
FWHM of beam                 14.68 arcmin
RMS                          4.8 mJy

Since the HIPSR correlator is new and largely untested, the final results could indirectly provide some feedback on the efficiency and reliability of the correlator. The survey also covers a much wider frequency range with a longer integration time (i.e. lower RMS) than the original HIPASS survey, increasing the possibility of finding new galaxies that were not detected in HIPASS.

III. Methodology

I. Data Reduction

A script written by Danny Price was used to convert the HIPSR files (from HDF5 to sdfits format) while extracting the calibration information from the corresponding MBCORR data. The MBCORR data, however, first had to be converted to sdfits format using Livedata before they could be applied to the HIPSR data (otherwise the script does not generate any output).

The Parkes data reduction software Livedata and Gridzilla [1, pg 497] were then used to apply bandpass calibration to the converted HIPSR sdfits files and to grid the bandpass-calibrated data into datacubes. The same parameters and techniques used previously for HIPASS data reduction were applied to the HIPSR data. For the smoothing kernel, a Tukey filter was used because a Hanning filter would have degraded the spectral resolution by a factor of 2 [1, pg 491]. The median estimator was used, as it is statistically more robust and widely used in multibeam processing [1, pg 492].

Table 2: Parameters used for bandpass calibration using Livedata

Parameters               HIPSR Survey
Spectral smoothing       Tukey
Smoothing kernel         Median
Doppler frame            Barycentric
Spectral baseline fit    None (with Livedata)

The datacube produced had a large number of continuum sources, with standing wave ripples present throughout all the frequency channels at the positions of the continuum sources (Figure 1).

Figure 1: First data cube produced using Livedata & Gridzilla

To get rid of the strong continuum emission and standing wave ripples, we restricted the frequency range to 1300 MHz to 1450 MHz (the initial coverage was from 1100 MHz to 1500 MHz), which discarded much of the RFI that existed outside this range (the redshift coverage is now down to 0.1). Two algorithms written by my supervisor, Dr Tobias Westmeier, were used to try different bandpass polynomial fits (figure 2) and to minimize the effect of the standing waves (figure 3). In addition, one of the regions in the cube, which was observed with the MBCORR correlator in the first few days, had very high noise. This was mainly due to not having enough HIPSR data for that region. This was the only region that had MBCORR HI data in our datacube.

[Plot: rms (in Jy) against order of fitting, for orders 0 to 10]

Figure 2: Effect of the polynomial fitting order on the RMS value. Once the standing wave algorithm is applied, the RMS drops significantly.

Figure 3: Standing wave ripple from two continuum sources

The data were reprocessed without the noisy region, and a 10th order polynomial fit together with the standing wave removal algorithm was applied to the cube. A 10th order polynomial was chosen so that the fit completely overlays the bandpass before the standing wave removal is applied. This was to ensure that the standing wave removal algorithm does not affect any HI detections. The edges of the cube were also very noisy and were trimmed using Miriad [3] to avoid false detections during source finding. Figure 4 shows the datacube after applying the corrections above, showing significant improvements.

Figure 4: HIPSR final cube

We also had access to the HIPASS 2 data (publication still in progress), which is the re-processing of the original all-sky HIPASS HI survey data and covers the entire southern sky (δ > 0.2). A portion of the HIPASS 2 data overlaps with our Sculptor region HIPSR data, so we decided to combine the HIPSR and HIPASS 2 data to increase the sensitivity by a factor of $\sqrt{2}$.

Using Miriad, the HIPASS 2 cube was regridded to extract the region of overlap with HIPSR and to change the radial velocity axis of HIPASS 2 to frequency (the HIPSR data are in frequency, whereas HIPASS 2 is in radial velocity). The HIPASS 2 and HIPSR cubes were then combined using Miriad.

Figure 5: HIPSR & HIPASS 2 combined

Table 3: Combining the HIPSR & HIPASS 2 cubes shows a significant improvement in the rms level, by a factor of about $\sqrt{2}$

Datacube    RMS (Jy)
HIPSR       0.0048
Combined    0.0033
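The factor of $\sqrt{2}$ is what one would expect when averaging two cubes with independent noise of equal rms. A minimal sketch of that arithmetic follows, assuming inverse-variance weighting and assuming (the report does not state this) that the HIPASS 2 cube has the same rms as the HIPSR cube. The function name is illustrative.

```python
import math


def combined_rms(rms_values):
    """rms of an inverse-variance weighted average of independent cubes."""
    return 1.0 / math.sqrt(sum(1.0 / r ** 2 for r in rms_values))


hipsr_rms_mjy = 4.8  # from Table 1
# Assumption: the HIPASS 2 rms is taken equal to the HIPSR value for illustration.
print(f"{combined_rms([hipsr_rms_mjy, hipsr_rms_mjy]):.2f} mJy")
```

Under these assumptions the expected combined rms is about 3.4 mJy, close to the measured value in the table.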

However, when the HIPASS 2 cube was overlaid on the HIPSR cube, an offset in frequency was observed between identical sources. Figure 6 shows the overlaid cubes with a contour of a source from the HIPSR cube, although there is no visible source underneath the contour.

Figure 6: HIPSR & HIPASS 2 overlaid

Duchamp was used (described in the Source Finding section) with a high detection threshold (9σ) to find genuine sources in both the HIPSR and HIPASS 2 cubes and to see if there is any correlation in the frequency difference between those sources. No correlation in frequency was found, so we averaged the frequency difference of the common sources in the two cubes and subtracted it (0.000138149 GHz) from the centre frequency (1.374992187 GHz) in the header of the HIPSR cube. Figure 7 shows the HIPASS 2 cube overlaid on the HIPSR cube (new centre frequency 1.374854038 GHz).
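The header correction amounts to subtracting the mean per-source offset from the centre frequency. A small sketch of that step, using only the numbers quoted above; the individual per-source offsets are not listed in the report, so the quoted mean is passed in as a single entry, and the function name is illustrative.

```python
def corrected_centre_frequency(centre_ghz, offsets_ghz):
    """Subtract the mean frequency offset (HIPSR minus HIPASS 2) from the centre frequency."""
    mean_offset = sum(offsets_ghz) / len(offsets_ghz)
    return centre_ghz - mean_offset


# Centre frequency and mean offset as quoted in the text.
new_centre = corrected_centre_frequency(1.374992187, [0.000138149])
print(f"{new_centre:.9f} GHz")  # 1.374854038 GHz
```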

Figure 7: HIPASS 2 & HIPSR cubes overlaid, with the HIPSR centre frequency correction applied


In the figure above, the source (middle left) in the HIPSR cube overlaps perfectly with the same source found in the HIPASS 2 cube (contour). Using Karma, the overlaid cubes were checked in all channels and no false contours were observed, with the sources from HIPASS 2 overlaying the corresponding HIPSR sources perfectly.

With the processed data cubes, source finding and parameterization were done only on the HIPSR datacube; the process is discussed in the following section.

II. Source Finding (Using Duchamp)

Duchamp is open-source software primarily designed to work with spectral-line cubes, and is publicly available for download from the Duchamp website [9]. It offers a wide range of parameters, providing the flexibility to produce the required output. All the Duchamp parameters (with their values), the image file directory and the desired outputs (i.e. output catalogue, moment map etc.) are written in a text file, which can then be executed from the command line.

Different combinations of input parameters were tried for source finding with Duchamp to get the best detection results. The Duchamp user guide [8] was used to become familiar with the Duchamp parameters. Different combinations of parameter sets and their effect on source finding with Duchamp were previously tested and published in a paper titled "Basic testing of the Duchamp source finder" [7]. We restricted our selection of Duchamp parameters and their values to those used in that paper and tried 20 different combinations of parameters on the HIPSR datacube.

A flux threshold between 2.5σ and 3σ was used for all the tests, as going below 2.5σ resulted in many false detections, whereas going above 3σ missed genuine low-flux sources.

Duchamp provides a useful method of reconstructing the entire data cube using wavelets. It improves the reliability of the source finding by removing much of the noise from the image. Reconstruction can be carried out either in the spectral dimension (frequency in our case) or in both the spatial and spectral dimensions. Almost all the sources in the data cube had a small angular size, so there was no appreciable gain from a spatial reconstruction. The reconstruction was therefore done only in the spectral domain (reconDim set to 1), as most of our sources were well resolved and extended in frequency [7, p. 17].

The data cube was reconstructed in Duchamp using the "à trous" wavelet reconstruction method [4]. It convolved our data cube with a wavelet filter function available in Duchamp. The wavelet coefficients were then calculated by taking the difference between the convolved and the original data cube, and those exceeding the threshold were added to the reconstructed cube. We specified the wavelet coefficient threshold (snrRecon) and a minimum and a maximum scale for the filter function (scaleMin and scaleMax) in the parameter file. Duchamp then doubled the scale of the filter function and repeated the procedure using the convolved array as the new input data set. When the maximum filter scale (scaleMax) was reached, the final convolved data were added to the reconstructed cube and the source finding was initiated.
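The procedure described above can be sketched for a single spectrum as follows. This is a simplified illustration and not Duchamp's implementation: it assumes a B3-spline filter (a common choice for the à trous transform), a single noise level `sigma` for all scales, and one pass with no iteration. The treatment of `scale_min` (coefficients at smaller scales are simply dropped) is my reading of the parameter, and all function names are hypothetical.

```python
B3_SPLINE = (1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16)


def smooth_with_holes(data, spacing):
    """Convolve with the B3-spline filter, with taps `spacing` channels apart."""
    n = len(data)
    out = []
    for i in range(n):
        total = 0.0
        for k, weight in enumerate(B3_SPLINE):
            j = i + (k - 2) * spacing
            j = min(max(j, 0), n - 1)  # clamp at the edges of the spectrum
            total += weight * data[j]
        out.append(total)
    return out


def atrous_reconstruct(spectrum, sigma, snr_recon=4.0, scale_min=1, scale_max=4):
    """Keep wavelet coefficients above snr_recon * sigma, then add the final smooth array."""
    current = [float(x) for x in spectrum]
    recon = [0.0] * len(current)
    spacing = 1
    for scale in range(1, scale_max + 1):
        smoothed = smooth_with_holes(current, spacing)
        if scale >= scale_min:
            for i, (before, after) in enumerate(zip(current, smoothed)):
                coeff = before - after  # wavelet coefficient at this scale
                if abs(coeff) > snr_recon * sigma:
                    recon[i] += coeff
        current = smoothed  # the smoothed array is the input for the next scale
        spacing *= 2        # double the filter scale
    return [r + c for r, c in zip(recon, current)]
```

With `snr_recon=0` and `scale_min=1` every coefficient is kept and the input is recovered exactly, which is a convenient sanity check; raising the threshold removes small-amplitude fluctuations while large features survive.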

ScaleMin was kept between 2 and 3, with ScaleMax and the reconstruction threshold (snrRecon) held constant at 8 and 4 respectively. We varied the flux threshold, the growthThreshold (for growing detected objects down to a lower threshold) and minPix (the minimum number of pixels for a spatial detection to be counted) to optimize the source detection.

The four parameter sets in the table below gave the best results (described below) out of the 20 different parameter sets tested.

Table 4: Duchamp parameters

Parameters          Test 1   Test 2   Test 3   Test 4
Threshold           1.5      1        1.5      3.5
GrowthThreshold     1.5      1        1.5      1
reconDim            1        1        1        1
snrRecon            4        4        4        4
ScaleMin            2        2        2        2
minPix              5        5        25       25
Total detections    403      651      119      207
No of matches       7        3        5        1
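Since Duchamp reads its settings from a plain text file, a parameter set such as Test 1 could be generated programmatically. The sketch below is only illustrative: the key spellings are copied from Table 4 and the text rather than checked against the user guide [8], the layout of one name and value per line is an assumption about the file format, and the `imageFile` entry with the file name `hipsr_cube.fits` is a placeholder.

```python
def format_parameter_file(params):
    """Render parameters as one 'name  value' pair per line (assumed layout)."""
    width = max(len(name) for name in params)
    lines = [f"{name.ljust(width)}  {value}" for name, value in params.items()]
    return "\n".join(lines) + "\n"


# Test 1 from Table 4, plus ScaleMax = 8 from the text.
# Key spellings follow the report and should be verified before use.
test1 = {
    "imageFile": "hipsr_cube.fits",  # hypothetical file name
    "Threshold": 1.5,
    "GrowthThreshold": 1.5,
    "reconDim": 1,
    "snrRecon": 4,
    "ScaleMin": 2,
    "ScaleMax": 8,
    "minPix": 5,
}
print(format_parameter_file(test1))
```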

Duchamp can produce several useful outputs. For our source finding, we generated an output catalogue, a spectra file containing spectral and other useful information for each individual detected source, and a detection map showing all the detections made by Duchamp in the cube.

The detection map (figure 8) shows the sources detected by Duchamp throughout the cube. Although there were many false detections at the edges, we did not trim the edges too much, in order to retain what could be genuine sources. There were 618 detections from test 1, out of which 31 sources were selected as possible candidates after manually looking through each source's spectral information in the spectra file produced by Duchamp.

Figure 8: Detection map produced by Duchamp from Test 1

To measure the performance of each run, we matched the Duchamp output galaxies with the original HIPASS sources in sky coordinates, with a maximum angular separation of 15 arcmin (Parkes beam size ~14.6 arcmin at ~1420 MHz) and a frequency difference (between two matching sources) of less than 140 kHz, using a software package called Topcat [5]. Row 8 in table 4 shows the number of matched sources between HIPSR and the original HIPASS all-sky survey catalogue. Test 1 shows the highest number of matches (7 sources).
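The matching itself was done in Topcat, but the criterion can be written down in a few lines. The sketch below is an independent illustration of the two cuts (15 arcmin on the sky, 140 kHz in frequency) and does not reproduce Topcat's algorithm; the tuple layout and function names are assumptions.

```python
import math


def angular_separation_arcmin(ra1_deg, dec1_deg, ra2_deg, dec2_deg):
    """Great-circle separation between two sky positions (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1_deg, dec1_deg, ra2_deg, dec2_deg))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 60.0


def is_match(src_a, src_b, max_sep_arcmin=15.0, max_dfreq_khz=140.0):
    """Each source is a tuple (ra_deg, dec_deg, freq_mhz)."""
    sep = angular_separation_arcmin(src_a[0], src_a[1], src_b[0], src_b[1])
    dfreq_khz = abs(src_a[2] - src_b[2]) * 1000.0
    return sep <= max_sep_arcmin and dfreq_khz < max_dfreq_khz


# Made-up positions: 6 arcmin apart on the sky and 100 kHz apart in frequency.
print(is_match((10.0, -25.0, 1400.0), (10.0, -25.1, 1400.1)))
```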

The integrated fluxes (Fint) of the matched HIPSR sources (all four tests) from the Duchamp catalogue were compared against the Fint of the corresponding matched original HIPASS sources. Figure 9 shows the percentage difference in Fint along the x-axis and the frequency difference between the sources along the y-axis.

Figure 9: Fint difference between HIPSR and original HIPASS galaxies against their frequency difference in kHz

The percentage difference in Fint is calculated using the following formula:

$$\frac{F_{int}(\mathrm{HIPASS}) - F_{int}(\mathrm{HIPSR})}{F_{int}(\mathrm{HIPASS})} \times 100$$

Test 1 gave the highest number of matched sources, with a smaller frequency difference and percentage difference in Fint than the corresponding sources in the other three tests (Figure 9). The test 1 Duchamp catalogue was therefore used to parameterize the sources and produce a final HIPSR source catalogue.

III. Parameterization

The Fint of HIPASS 2 (yet to be published) differs from the original Fint of HIPASS by about 10%. The original flux calibration of HIPASS is correct, as it has a reliability of up to 99% [10]. We therefore expect the Fint of the same galaxy in our HIPSR data and in HIPASS to agree within 10%. However, in figure 9, only one source in test 1 agrees within 10%.

After trying a few different techniques to improve the Fint difference, we found that Duchamp was calculating the Fint of the sources incorrectly. Using Miriad, we re-calculated Fint for each source, and the new Fint difference showed significant improvement. However, the Fint values of the galaxies still did not all agree within 10%.

The beam size (in pixels) in the HIPSR data cube is calculated by Duchamp from the header information (BMAJ and BMIN) using the following equation [8, pg 30]:

$$\text{Beam size (arcmin)} = \frac{\pi \, B_{maj} \, B_{min}}{4 \ln 2}$$

However, the actual image beam would differ from the intrinsic Parkes beam because of the gridding process (the beam depends on the smoothing radius, pixel size, sky sampling and source shape). A reasonable approach is therefore to use the beam size measured from a known calibrator source within the data cube, which would be a better estimate of the beam size for the HIPSR sources than the actual Parkes beam size specified in the header of the data cube.

We generated a continuum map of the original data cube using Livedata and then selected a strong continuum source, 0023-263, from the continuum map at 1384 MHz (figure 10). Using Karma, the beam size of the source was measured to be 13.8 arcmin. We ran Duchamp again on the HIPSR datacube using the same parameters as test 1 but specifying a beam size of 13.8 arcmin instead of the 14.68 arcmin used by Duchamp.

Figure 10: The continuum map of the HIPSR data. The strong continuum source 0023-263 in the middle of the cube is used for the flux correction.

We then applied a flux correction to the HIPSR sources using the following formula (assuming all are point sources and intrinsically Gaussian) to scale the Fint of the sources to that of HIPASS:

$$\frac{F_{int}}{13.8\ \mathrm{arcmin}} \times 14.68\ \mathrm{arcmin}$$

where 13.8 arcmin is the measured beam size and 14.68 arcmin is the Parkes beam size as specified in the header of the data cube. The following graph was plotted after applying the above flux correction to all 7 matched HIPSR galaxies:

[Plot: frequency difference (kHz) against integrated flux difference (% error)]

Figure 11: Red points are the sources before applying the beam correction and green points are the same sources after applying the beam correction.

In the above plot, the flux correction technique shows a significant improvement, as the percentage difference in Fint now agrees within 10% for 6 out of the 7 matched galaxies.

Assuming that all 31 confirmed galaxies in our catalogue are point sources, the flux correction was applied to all the galaxies by the same process described above.
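The flux correction and the percentage difference used in the comparison with HIPASS can be sketched as two small functions. The beam sizes are those quoted in the text; the function names and the example fluxes are made up for illustration.

```python
MEASURED_BEAM_ARCMIN = 13.8   # measured on the continuum source 0023-263
HEADER_BEAM_ARCMIN = 14.68    # Parkes beam size from the cube header


def beam_corrected_flux(fint):
    """Scale an integrated flux by the ratio of header beam to measured beam."""
    return fint / MEASURED_BEAM_ARCMIN * HEADER_BEAM_ARCMIN


def percent_difference(fint_hipass, fint_hipsr):
    """Percentage difference in Fint relative to the HIPASS value."""
    return (fint_hipass - fint_hipsr) / fint_hipass * 100.0


# Hypothetical fluxes (arbitrary units), not values from the catalogue.
hipass, hipsr = 10.0, 9.0
print(percent_difference(hipass, hipsr))
print(percent_difference(hipass, beam_corrected_flux(hipsr)))
```

Because the measured beam is smaller than the header beam, the correction raises every HIPSR flux by the same factor of 14.68/13.8, about 6%.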

Since we know the observed frequency of the sources, their radial velocity is calculated using the following formula [6]:

$$\frac{V_{rad}}{c} = 1 - \frac{f}{f_0}$$

where $f$ is the observed frequency and $f_0$ is the HI rest frequency. Using the radial velocity of the sources, their distance from the Milky Way is calculated using Hubble's Law:

$$d = \frac{V_{rad}}{H_0}$$


Using the corrected integrated flux and the Vrad calculated above, the HI masses of the galaxies were calculated using the following formula [6]:

$$\frac{M_{HI}}{M_\odot} = 236 \times \frac{S_{int}}{\mathrm{mJy\,km/s}} \times \left(\frac{d}{\mathrm{Mpc}}\right)^2$$
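The chain from observed frequency to HI mass can be sketched as below. The report does not state which value of the Hubble constant was used, so `h0=70.0` km/s/Mpc is an assumption, and the function names and the example source are illustrative only.

```python
C_KM_S = 299792.458          # speed of light in km/s
HI_REST_MHZ = 1420.405751    # HI rest frequency quoted in the text


def radial_velocity_km_s(f_obs_mhz):
    """Radial velocity from the observed frequency: V/c = 1 - f/f0."""
    return C_KM_S * (1.0 - f_obs_mhz / HI_REST_MHZ)


def hubble_distance_mpc(v_km_s, h0=70.0):
    """Distance from Hubble's law; h0 in km/s/Mpc is an assumed value."""
    return v_km_s / h0


def hi_mass_solar(sint_mjy_km_s, d_mpc):
    """HI mass in solar masses from integrated flux (mJy km/s) and distance (Mpc)."""
    return 236.0 * sint_mjy_km_s * d_mpc ** 2


# Hypothetical source at 1400 MHz with an integrated flux of 2000 mJy km/s.
v = radial_velocity_km_s(1400.0)
d = hubble_distance_mpc(v)
print(f"V = {v:.0f} km/s, d = {d:.1f} Mpc, M_HI = {hi_mass_solar(2000.0, d):.2e} Msun")
```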

Theoretical masses of the HIPSR galaxies were calculated by taking an overall 5σ flux threshold and a velocity resolution of 50 km/s. The following plot shows the log of the theoretical, calculated and original HIPASS survey HI masses (in $M_\odot$) against distance in Mpc.

[Plot: log HI mass (solar masses) against distance squared (megaparsec)]

Figure 12: The green, red & blue curves are the log of the HI mass (in solar masses) of the theoretical limit, our calculated HIPSR sources and the original HIPASS survey respectively, against distance (in Mpc) from the Milky Way. The HIPSR sources stay within the "theoretical detection limit".
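One way to read the theoretical curve is sketched below. It assumes that the limiting integrated flux is simply the 5σ threshold multiplied by the 50 km/s velocity width, with σ = 4.8 mJy taken from Table 1; the report does not spell out this step, so this is an interpretation rather than the calculation actually used.

```latex
\begin{align}
S_{lim} &= 5\sigma \times \Delta v
         = 5 \times 4.8\ \mathrm{mJy} \times 50\ \mathrm{km/s}
         = 1200\ \mathrm{mJy\,km/s} \\
\frac{M_{HI,lim}}{M_\odot} &= 236 \times 1200 \times \left(\frac{d}{\mathrm{Mpc}}\right)^2
         \approx 2.8 \times 10^{5} \left(\frac{d}{\mathrm{Mpc}}\right)^2
\end{align}
```

Under this assumption the limit at 100 Mpc is about $2.8 \times 10^{9}\,M_\odot$, i.e. a log mass of roughly 9.45.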

IV. Conclusion

From the figure above, the HI mass with distance of the HIPSR sources stays within the theoretical curve (detection limit) and agrees with the curve plotted from the HIPASS sources.

To summarize, these are the outcomes of the project:

- Successfully converting and processing the HIPSR data using the existing data reduction software.

- Successfully combining the existing HIPASS 2 data with our HIPSR data.

- A frequency shift (138 kHz) of unknown origin.

- Flux calibration appears to be correct.

- A catalogue of 30 galaxies produced with their physical parameters.

- 23 out of 30 galaxies seem to be new HI detections; however, further investigation is required.

However, while applying the flux corrections, we assumed all the HIPSR sources to be point sources and intrinsically Gaussian, which is not completely true and might have added some uncertainty to the Fint of a few sources. Nevertheless, the result should not be significantly affected by this uncertainty, as almost all our HIPSR sources are point sources.

While the main objectives of the project were successfully achieved, a few things still need to be looked at:

- Try to further improve the flux corrections to a higher accuracy.

- Process and parameterize the combined data, which should reveal some new sources due to its higher sensitivity.

- Do scientific analysis with the galaxies from the catalogue.

V. Acknowledgement

We thank the International Centre for Radio Astronomy Research and IVEC for providing me with this wonderful opportunity. Thanks to Dr Valerie Maxville for her support during the whole project and Kirsten Gottschalk for all the administrative help and support. Special thanks to my supervisor Dr Tobias Westmeier for looking after me throughout the project. Thanks also to A/Prof Martin Meyer and Dr Attila Popping for providing extra assistance with the project.


References

[1] DG Barnes, L Staveley-Smith, WJG De Blok, T Oosterloo, IM Stewart, AE Wright, GD Banks, R Bhathal, PJ Boyce, MR Calabretta, et al. The HI Parkes All Sky Survey: southern observations, calibration and robust imaging. Monthly Notices of the Royal Astronomical Society, 322(3):486–498, 2001.

[2] National Radio Astronomy Observatory. The HI 21 cm Line. http://www.cv.nrao.edu/course/astr534/HILine.html/, 2014.

[3] R Sault and N Killeen. Multichannel image reconstruction, image analysis and display (Miriad) users guide, ATNF, 1996.

[4] JL Starck, E Pantin, and F Murtagh. Deconvolution in astronomy: A review. Publications of the Astronomical Society of the Pacific, 114(800):1051–1069, 2002.

[5] MB Taylor. Topcat & stil: starlink table/votable processing software. In Astronomical Data Analysis Software and Systems XIV, volume 347, page 29, 2005.

[6] Tobias Westmeier. Useful equations for radio astronomy flux conversion. http://www.atnf.csiro.au/people/Tobias.Westmeier/, 2014.

[7] Tobias Westmeier, Attila Popping, and Paolo Serra. Basic testing of the Duchamp source finder. Publications of the Astronomical Society of Australia, 29(3):276–295, 2012.

[8] Matthew Whiting. Source Detection with Duchamp- A Users Guide. http://www.atnf.csiro.au/people/Matthew.Whiting/Duchamp/downloads/UserGuide-1.5.pdf/, 2006.

[9] Matthew T Whiting. Duchamp: a 3d source finder for spectral-line data. Monthly Notices of the Royal Astronomical Society, 421(4):3242–3256, 2012.

[10] MA Zwaan, MJ Meyer, RL Webster, L Staveley-Smith, MJ Drinkwater, DG Barnes, R Bhathal, WJG De Blok, MJ Disney, RD Ekers, et al. The HIPASS catalogue II. Completeness, reliability and parameter accuracy. Monthly Notices of the Royal Astronomical Society, 350(4):1210–1219, 2004.
