A New False Peak Elimination Method for Poor DNA Gel Images Analysis
978-1-4799-7938-7/14/$31.00 ©2014 IEEE 180
Ros Surya Taher1, Nursuriati Jamil2, Sharifalillah Nordin3
Faculty of Computer and Mathematical Sciences, UiTM Malaysia,
Shah Alam, Selangor, Malaysia. 1Surya, 2liza,3sharifa @tmsk.uitm.edu.my
Umikalsum Mohamed Bahari Horticulture Research Centre,
Institut Penyelidikan dan Kemajuan Pertanian Malaysia, Serdang, Selangor, Malaysia.
Abstract - In this paper, we describe the use of threshold values in the analysis of poor DNA gel electrophoresis images to eliminate false peaks contained in the intensity profile obtained from image data projection. Peak detection is performed during lane detection, a common task in DNA gel image analysis. In this study, the peaks of the profile represent the gaps between lanes. Due to the poor quality and inconsistent pattern of the images, detecting the true peaks is a challenge for automating the analysis of DNA gel images. To resolve the problem, a set of threshold values was applied in the peak detection process to eliminate the false peaks. Fifty images with a total of 2,164 lanes, affected by various noise and artefacts, were used to evaluate the proposed false peak elimination scheme. The number of lanes in an individual image ranges from 36 to 50. The results showed a sensitivity of 99% and a precision of 93%, with an accuracy of 96%.
Keywords – false peak, true peak, lane detection, DNA gel electrophoresis
I. INTRODUCTION
Lane detection in the analysis of poor DNA gel electrophoresis images presents a great challenge to computer interpretation due to the quality and pattern of the images. Besides the problems caused during electrophoresis, the gel image also suffers from various types of noise and distortion, and is often degraded by large intensity variation among the bands within each lane [1][2]. Commonly, poor DNA gel images suffer from low intensity bands, smeared bands, curved bands and atypical banding patterns. Figure 1 shows an example of a poor quality DNA gel image. Because of experimental errors and the quality of the gel images, the images are not exploitable in many situations. Researchers have to spend much time extracting data from the gel image manually, which leads to reproducibility issues and in turn affects the reliability of the results obtained [3][4].
Thus, various studies have used image analysis and pattern recognition approaches to improve the qualitative and quantitative interpretation of the gel image when extracting information from the banding patterns of the DNA. The image processing tasks commonly used to analyse the gel image can be classified into four tasks: lane detection, band segmentation, band quantification and band scoring.
Figure 1. Example of curved bands in a DNA electrophoresis gel image that is also affected by low intensity bands, smeared bands and atypical banding patterns. Source: Horticulture Research Centre, MARDI, Malaysia.
In this paper, we propose a new method that is applied in the lane detection process of DNA gel image analysis. Our study focuses on examining the local maxima, also known as peaks, to find the true peaks by eliminating the false peaks prior to detecting the gaps between lanes based on the projected intensity profile. The true peaks detected ease the process of lane gap tracking, as shown in the example in Figure 2. Instead of lane center detection, which has been performed in various studies [5][6][7][8][9], we detect the lane border, or the gap between lanes. With this approach, a lane can be detected even if it does not contain any bands, and the elimination of important band information can be avoided.
2014 International Conference on Intelligent Systems Design and Applications (ISDA)
Several studies have proposed lane separation or detection by tracking the lane borders. In [10], the peaks were obtained from the relative intensity of one third of the gel image; the tracking algorithm then starts the detection of the lane border from the peaks identified in the earlier phase. In [11], the boundaries of lanes were estimated by computing the positions of the local minima, and the lanes were then separated based on the estimated lane width to generate a sub-image for each lane. The intensity profile has also been used to identify gaps between lanes based on the background levels: because the background level changes slowly compared to the levels of the lanes, the left and right edges of each lane can be located and the image divided into lanes [4]. However, most of these studies require a user input parameter, namely the number of lanes, to ease the process of identifying the individual lanes. This input is then used in estimating the true peaks and the lanes in the gel image.
Figure 2. Example of lane's gap tracking based on local maxima.
The challenge arises when we attempt to fully automate the process of lane detection. Identifying the true peaks in poor DNA gel images is crucial but difficult because of the quality and pattern of the images. Unfortunately, the elimination of false peaks in DNA gel image analysis has not been widely discussed in previous studies. Most of the proposed lane detection methods require the number of lanes as a priori information. Based on the intensity profile, the most accurate peaks can then be identified and the corresponding lane width estimated from the number of lanes given by the user.
An exception is [6], where false peaks are removed using a slope change criterion. A true peak is assumed to be located only at the top of a hill representing the center of a lane, so the slope of the tangent changes from positive to negative at the local maximum. The positive slopes are counted from the starting point of the lane to the local maximum, and the negative slopes are counted from the local maximum to the end point of the lane. The estimated lane width calculated in the earlier process then becomes a baseline parameter.
On the other hand, peak detection has been studied in other areas such as mass spectra [15][16], time series [17] and bioinformatics [12][18]. A Benjamini-Hochberg (BH)-based approach that automatically selects the number of peaks was proposed in [12] for nuclear magnetic resonance (NMR)-based protein structure determination. In [15], a continuous wavelet transform (CWT)-based peak detection algorithm was devised that identifies peaks with different scales and amplitudes. In the same area, [16] developed a novel algorithm for low-level processing of SELDI spectra, including denoising with the undecimated discrete wavelet transform (UDWT), baseline correction, peak detection and quantification. An automated method for baseline correction, peak detection and grouping of similar peaks in chromatographic data was developed in [18]. Despite the several methods introduced for peak detection, the complexity of the data means that a few parameters are still required to facilitate the identification of peaks.
II. METHODOLOGY
A. Dataset
The images are a collection of poor DNA gel electrophoresis images affected by various noise and artefacts generated during the gel electrophoresis experiment or during the acquisition of the images. Besides the large intensity variation of bands in the gel image, they also suffer from smeared bands, curved bands and atypical banding patterns.
Our experiment examines 50 plant DNA gel images collected from the Horticulture Research Centre, Institut Penyelidikan dan Kemajuan Pertanian Malaysia (MARDI), Malaysia. The DNA gel images were produced by an electrophoresis-based method using a polymerase chain reaction (PCR)-based marker system. The images are saved in .jpeg format in grayscale, with similar resolution and a size of 748x2841 pixels. These images were rejected by the experts because their quality was below the satisfactory level for analysis.
B. True Peaks Detection
Typically, in good quality DNA gel images, the true peaks can be easily differentiated from the false ones. In our study, however, selecting the true peaks is crucial, as there can be a number of weak peaks of diverse intensity caused by noise. These peaks are difficult to identify even manually. As noted in [1], there is no clear cutoff threshold intensity value to distinguish the true peaks from the false ones.
Thus, in this paper we propose a threshold interval as a simple approach to eliminate false peaks in the projected profile of poor DNA gel images. As the aim of this study is to detect the gaps between lanes, a sequence of processing steps is performed to minimize the effect of noise in the images. Once the gaps have been detected, we can estimate the lane width, and thus the number of lanes in the image can be predicted for later analysis. This process is part of the lane detection procedure; the remaining procedures are outside the scope of this paper. Figure 3 illustrates the process of detecting the true peaks by eliminating the false peaks using the threshold interval.
1) Vertical Projection Profile
To estimate the most accurate position of the gap between lanes, the intensity profile of a whole image that has been enhanced [13] is projected. The intensity profile is projected along the lanes. This vertical projection of image intensity is obtained by averaging the grey level values of each column in the image, as in equation (1).
$$P(x) = \frac{1}{N} \sum_{y=1}^{N} I(x, y), \quad x = 1, \ldots, M \tag{1}$$
Figure 3. The process of eliminating false peaks in lane's gap detection to estimate individual lane width.
P(x) is the projected profile of image I(x,y) of size MxN. Based on the common assumption about the gel image intensity profile, a peak (local maximum) represents the gap between lanes and a valley (local minimum) represents the center of a lane. Figure 4 shows an example of a poor DNA gel image and Figure 5 is its corresponding projected intensity profile. Figure 6 is another example of a DNA gel image with a better view of lane separation, and its projected intensity profile is shown in Figure 7.
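As an illustration of equation (1), the column averaging can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name vertical_projection, the row-major list-of-rows image layout and the toy image are all hypothetical and chosen only for the example.

```python
from typing import List, Sequence


def vertical_projection(image: Sequence[Sequence[float]]) -> List[float]:
    """Return the mean grey level of each column of a grayscale image.

    `image` is assumed to be a list of rows (row-major), so image[y][x]
    plays the role of I(x, y) in equation (1).
    """
    n_rows = len(image)
    if n_rows == 0:
        raise ValueError("image has no rows")
    n_cols = len(image[0])
    if any(len(row) != n_cols for row in image):
        raise ValueError("all rows must have the same length")
    # Average over the rows (y) for every column index x.
    return [sum(row[x] for row in image) / n_rows for x in range(n_cols)]


# Toy image (assumed data): bright background (255) with two darker lanes.
toy = [[255] * 9 for _ in range(6)]
for y in range(1, 5):
    for x in (1, 2):
        toy[y][x] = 40  # bands of the first lane
for y in range(2, 4):
    for x in (5, 6, 7):
        toy[y][x] = 60  # bands of the second lane

profile = vertical_projection(toy)
```

In this toy image the background columns average 255 while the columns that cross the darker bands average lower, so the gaps between lanes appear as peaks in the profile, in line with the assumption stated above.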
Figure 4. An example of DNA gel image with SD (σ) = 18.1118. Source: Horticulture Research Centre, MARDI, Malaysia.
Figure 5. Intensity profile obtained by applying equation (1) to the image in Figure 4.
Figure 6. An example of DNA gel image with SD (σ) = 8.0543. Source: Horticulture Research Centre, MARDI, Malaysia.
Figure 7. Intensity profile obtained by applying equation (1) to the image in Figure 6.
2) Profile Smoothing
As can be seen from the projected vertical profiles in Figure 5 and Figure 7, local maxima may appear several times between adjacent gaps because of image noise generated during the wet lab procedure and/or the acquisition of the image. Thus, we apply a moving average filter to smooth the vertical profile, with the expectation of leaving only one local maximum representing each gap between lanes in the profile. The moving average filter is a convolution operator that outputs the average value of the pixels beneath a predefined window. The pixel corresponding to the center of the window is updated with this average value. Experimentally, the ideal window for the moving average filter was set to 5. In equation (2), x[ ] is the input signal, y[ ] is the output signal, and M is the filter size used in the moving average. The filter window consists of a group of points chosen symmetrically around the output point, which requires M to be an odd number.
$$y[i] = \frac{1}{M} \sum_{j=-(M-1)/2}^{(M-1)/2} x[i + j] \tag{2}$$
Although the features of poor DNA gel images pose several challenges for detecting the gap between lanes, the profiles have two characteristics in common. The value of the projection profile calculated along a lane is lower than that along a gap between two lanes, because lanes are composed of sparsely located horizontal bands that are darker than the background. Therefore, the local maxima (peaks) represent the gaps, while the lower intensity values represent the lanes. However, even after smoothing the profile, more than one peak can occur between two lanes, and the extra peaks are false peaks. We therefore propose a false peak elimination scheme to overcome this problem.
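The moving average of equation (2), with the window of 5 reported above, can be sketched as follows. The source does not say how the ends of the profile are handled, so averaging over a truncated window at the borders is an assumption, as are the function name moving_average and the toy signal.

```python
from typing import List, Sequence


def moving_average(signal: Sequence[float], window: int = 5) -> List[float]:
    """Smooth `signal` with a centered moving average of odd size `window`.

    Assumption: near the two ends, where the full window does not fit,
    the average is taken over the samples that are available.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = (window - 1) // 2
    n = len(signal)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half)          # first sample inside the window
        hi = min(n, i + half + 1)      # one past the last sample
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


# Toy profile (assumed data) with three noisy local maxima around one gap.
noisy = [10, 12, 11, 30, 28, 31, 29, 12, 11, 10]
smooth = moving_average(noisy, window=5)
```

On this toy signal the three local maxima of the raw data collapse into a single maximum after smoothing, which is the behavior the filter is expected to provide for each gap.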
3) Mean and Standard Deviation of Intensity Profile
We utilized the mean and standard deviation (σ) of image intensity to construct the false peak elimination scheme. In general, the standard deviation measures the dispersion of a set of data from its mean. In our study, the standard deviation shows the spread of intensity levels in a particular image. Visual inspection of our collected images shows that the bigger the standard deviation value, the poorer the DNA gel image.
Figure 8. Standard deviation distribution of 50 images. Error bar shows the ±1σ.
Figure 8 shows the standard deviation distribution of the 50 images. Assuming that the distribution is normal, with most of the data close to the average, the standard deviation of these values is 5.9673 and their mean is 14.9931. The standard deviation interval can then be computed: -1σ corresponds to 9.0258 and +1σ to 20.9604. The proposed threshold interval for categorizing the images is given in Table I, with the assumption that an image with a standard deviation of intensity below -1σ is poor, within ±1σ is poorer and above +1σ is very poor.
TABLE I. THRESHOLD INTERVAL USED TO CLASSIFY THE POOR DNA GEL IMAGES
Standard deviation (σ)    Category of image
σ <= 9                    Poor
9 < σ <= 21               Poorer
σ > 21                    Very Poor
Figure 9 shows an example of an image with standard deviation 32.2732, which is categorized as a 'very poor' image. Figure 4 is an example of an image in the 'poorer' category, and Figure 6 is categorized as a 'poor' image. A common characteristic of very poor images is the existence of empty lanes in the image. Changes in running conditions during the lab procedure may influence DNA fragment mobility, which causes this kind of electrophoresis artefact. As the intensity of the empty lanes is very low compared to the lanes containing bands, the dispersion of intensity leads to a bigger standard deviation.
Figure 9. An example of DNA gel image with SD (σ) = 32.2732. Source: Horticulture Research Centre, MARDI, Malaysia.
Another attribute needed to compose the false peak elimination scheme is the mean of image intensity. The mean (μ) is the average of image intensity calculated from the intensity profile by applying (3).
$$\mu = \frac{1}{M} \sum_{x=1}^{M} P(x) \tag{3}$$
Based on the threshold interval and the mean (μ), we composed the conditions and parameters for the false peak elimination process, as shown in Figure 10. In [14], a variety of techniques for evaluating DNA gel images were introduced. One of the parameters used to evaluate a DNA fragment is the minimum peak height. On this basis, we use the same parameter to filter the false peaks by setting an acceptable intensity level that preserves the true peaks. We therefore propose a false peak elimination scheme that incorporates the minimum peak height, standard deviation and mean to maximize true peak detection, as explained further in the next section.
Figure 10. False Peak Elimination Scheme: The conditions to eliminate false peaks in poor DNA gel image profile.
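The thresholds in Table I and the minimum peak height parameter can be sketched as below. This is a hedged illustration only: the conditions of Figure 10 are not reproduced in the text, and the only formula stated explicitly is the minimum peak height of μ - σ used in the worked example for an image with 9 < σ <= 21. The sketch therefore returns no threshold for the other two categories. The function names, the strict-neighbor test for local maxima, the choice to keep peaks exactly at the threshold and the toy profile are all assumptions.

```python
from typing import List, Optional, Sequence


def classify_image(sigma: float) -> str:
    """Image category according to the intervals of Table I."""
    if sigma <= 9:
        return "poor"
    if sigma <= 21:
        return "poorer"
    return "very poor"


def min_peak_height(mu: float, sigma: float) -> Optional[float]:
    """Minimum peak height for the 'poorer' category (mu - sigma).

    The thresholds for the other categories are not stated in the text,
    so None is returned for them instead of guessing.
    """
    if classify_image(sigma) == "poorer":
        return mu - sigma
    return None


def local_maxima(profile: Sequence[float]) -> List[int]:
    """Indices that are strictly higher than both neighbors.

    A discrete stand-in for 'first derivative zero, second derivative
    negative'; flat plateaus are not reported by this simple test.
    """
    return [i for i in range(1, len(profile) - 1)
            if profile[i - 1] < profile[i] > profile[i + 1]]


def eliminate_false_peaks(profile: Sequence[float], threshold: float) -> List[int]:
    """Keep only the local maxima whose height reaches the threshold."""
    return [i for i in local_maxima(profile) if profile[i] >= threshold]


# Values reported for the image in Figure 4.
threshold = min_peak_height(mu=228.9551, sigma=18.1118)

# Toy smoothed profile (assumed data): peaks at indices 1, 3 and 5.
toy_profile = [200, 230, 205, 209, 204, 232, 201]
true_peaks = eliminate_false_peaks(toy_profile, threshold)
```

With the reported values the threshold evaluates to 210.8433, so in the toy profile the low peak at index 3 is discarded and the peaks at indices 1 and 5 are kept.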
4) Finding Local Maxima
In our experiment, a point in the intensity profile cannot easily be defined as a local maximum (peak) by comparing the point with its neighboring points. Hence, we treat the intensity profile as a function f(x). If the function has a critical point where the first derivative equals 0 and the second derivative is negative, then f(x) has a local maximum at this point. As the objective of this study is to find true peaks by eliminating false peaks, we applied the elimination scheme in Figure 10, whereby the detected peaks are filtered using the given conditions.
The conditions are applied according to the standard deviation value calculated for the individual image. For the image in Figure 4, the irregular intensity of the bands complicates the process of identifying the lane positions. After smoothing and applying the elimination scheme, the result can be seen in Figure 11, where the true peaks were correctly detected and the two false peaks were correctly eliminated. An example calculation using the elimination scheme on this image is given below:
1. The standard deviation (σ) value of the image = 18.1118;
2. The second condition is applied;
3. The calculated mean (μ) of intensity of the image using equation (3) = 228.9551;
4. The minimum peak height (mean (μ) - (σ)) = (228.9551 - 18.1118) = 210.8433;
5. Eliminate peaks using a minimum peak height of 210.8433.
Figure 11. Smoothed intensity profile of the image in Figure 4. Note that two false peaks, marked '*', were eliminated by applying the elimination scheme in Figure 10.
Figure 12 shows the result for the image in Figure 6, which has a standard deviation of 8.0543. As the standard deviation value is less than 9, the first condition is applied. The elimination of false peaks worked perfectly with the proposed scheme: seven false peaks were detected precisely. We presume that with well separated lanes and consistent band intensity, the elimination of false peaks using this scheme is better suited and more accurate.
Figure 12. Smoothed intensity profile of the image in Figure 6. Note that seven false peaks, marked '*', were eliminated by applying the elimination scheme in Figure 10.
III. RESULT AND DISCUSSION
To validate the false peak elimination, we used 50 poor DNA gel electrophoresis images with a total of 2,164 lanes, as counted manually. The images vary in quality, and the number of lanes in an individual image ranges from 36 to 50. As explained in the previous section, the threshold interval was applied to eliminate the false peaks in each test image. For comparison purposes, the number of lanes in each image was analyzed manually and then compared with the peaks detected using the proposed scheme.
The results of applying the false peak elimination scheme to the 50 images are summarized in Table II.
TABLE II. THE RESULTS OBTAINED BY APPLYING THE FALSE PEAK ELIMINATION SCHEME
              No. of peaks
Test images   TP detected (tp)   FP detected (fp)   TP missed (fn)   FP correctly rejected (tn)
50            2270               175                13               2422
TP = True Peaks; FP = False Peaks; tp = true positive; fp = false positive; fn = false negative; tn = true negative
Based on this result, we evaluate the performance with respect to recall, precision, accuracy and F-measure as shown in Table III, described by the following equations (4), (5), (6) and (7). The accuracy of peak detection by eliminating the false peaks is 96.15%.
$$\text{Recall} = \frac{tp}{tp + fn} \tag{4}$$
$$\text{Precision} = \frac{tp}{tp + fp} \tag{5}$$
$$\text{Accuracy} = \frac{tp + tn}{tp + tn + fp + fn} \tag{6}$$
TABLE III. PERFORMANCE EVALUATION ON THE FALSE PEAK ELIMINATION SCHEME
Recall    Precision    Accuracy
99.43%    92.84%       96.15%
The actual number of peaks in the 50 images is 2,214, as the total number of lanes is 2,164. Although there is a difference between the actual peaks and the peaks detected using the proposed elimination scheme, Table IV indicates that the scheme improved the detection rate: the percentage of undetected peaks was reduced to 9.45%, compared with 10.44% before applying the elimination scheme.
TABLE IV. COMPARISON ON UNDETECTED PEAK BEFORE AND AFTER APPLYING ELIMINATION SCHEME
Actual    Before    %         After    %
2214      2472      10.44%    2445     9.45%
To evaluate the performance of the elimination scheme further, we applied the scheme to 30 good images that have been used by the experts for their analysis. The performance evaluation is shown in Table V. The accuracy of 98.2% is higher than the 96.15% obtained on the poor images. This indicates that the scheme is applicable and is a promising approach for false peak elimination in DNA gel images.
TABLE V. PERFORMANCE EVALUATION ON THE FALSE PEAK ELIMINATION SCHEME ON 30 GOOD DNA GEL IMAGES.
Recall    Precision    Accuracy
99.28%    97.11%       98.2%
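The figures reported for the 50 poor images can be cross-checked from the counts in Table II with a few lines of Python. The recall, precision and accuracy functions use the standard definitions that equations (4) to (6) refer to. The F-measure is given in its standard harmonic-mean form because equation (7) and its value are not recoverable from the text. Reading the Table IV percentages as surplus detected peaks divided by detected peaks is an inference from the numbers, not a statement from the authors, and all function names are hypothetical.

```python
def recall(tp: int, fn: int) -> float:
    """Fraction of true peaks that were detected."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Fraction of detected peaks that are true peaks."""
    return tp / (tp + fp)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all decisions (keep or reject) that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def f_measure(p: float, r: float) -> float:
    """Standard harmonic mean of precision and recall (assumed form)."""
    return 2 * p * r / (p + r)


def surplus_rate(detected: int, actual: int) -> float:
    """Assumed reading of the '%' columns of Table IV."""
    return (detected - actual) / detected


# Counts from Table II.
tp, fp, fn, tn = 2270, 175, 13, 2422

r = recall(tp, fn)
p = precision(tp, fp)
a = accuracy(tp, tn, fp, fn)

print(f"recall    = {100 * r:.2f}%")   # 99.43%
print(f"precision = {100 * p:.2f}%")   # 92.84%
print(f"accuracy  = {100 * a:.2f}%")   # 96.15%

# Table IV: actual peaks 2214, detected 2472 before and 2445 after.
before = surplus_rate(2472, 2214)
after = surplus_rate(2445, 2214)
print(f"before = {100 * before:.2f}%, after = {100 * after:.2f}%")  # 10.44%, 9.45%
```

The three metrics reproduce Table III, and the 2,445 peaks detected after elimination equal tp + fp from Table II, which supports the reading of Table IV used here.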
IV. CONCLUSION
Detecting the true peaks in a poor DNA gel image is very challenging because the images are of low quality and have inconsistent pixel intensity. The proposed false peak elimination scheme facilitates true peak detection based on the projected intensity profile. The scheme identifies false peaks using a threshold interval on the standard deviation computed from the spread of pixel intensity in DNA gel images. False peaks can be identified well, which maximizes true peak detection. The scheme will be improved further because, according to the experimental results, some false peaks still remain after the elimination process. Locating the true peaks in poor DNA gel images is very important, as correct detection of true peaks eases the process of lane extraction for further analysis. Additionally, more images will be considered in determining the robustness of the proposed scheme.
ACKNOWLEDGEMENT
The authors would like to thank the Ministry of Higher Education (MOHE), Malaysia for the financial support received through the Exploratory Research Grant Scheme (ERGS/1/2013/ICT07UiTM/02/03) and the Research Management Institute (RMI), Universiti Teknologi MARA.
REFERENCES
[1] Bajla, I. et al., "Improvement of Electrophoretic Gel Image Analysis", Measurement Science Review, Vol. 1(1), pp. 5-10, 2001.
[2] Akbari, A. and Albregtsen, F., "Automatic Segmentation of DNA Bands in One Dimensional Gel Images Produced by Hybridizing Techniques", 26th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 2852-2855, 2004.
[3] Weising, K., Nybom, H., Wolff, K. and Kahl, G., DNA Fingerprinting In Plants: Principles, Methods And Applications, CRC Press, Taylor & Francis Group, NW, 2005.
[4] Labyed, Y. et al., "An Improved 1-D Gel Electrophoresis Image Analysis System", in Arabnia, H.R. (ed), Advances in Computational Biology, pp. 609-617, Springer New York, 2010.
[5] Machado, A. M. C. et al., "An Iterative Algorithm For Segmenting Lanes In Gel Electrophoresis Images", The X Brazilian Symposium on Computer Graphics and Image Processing, pp. 140-146, 1997.
[6] Park, S. C. et al., "Lane Detection And Tracking In PCR Gel Electrophoresis Images", Computers and Electronics in Agriculture, Vol. 83, pp. 85-91, 2012.
[7] Lee, J., Huang, C., Wang, N. and Lu, C., "Automatic DNA Sequencing For Electrophoresis Gels Using Image Processing Algorithms", J. Biomedical Science and Engineering, Vol. 4, pp. 523-528, 2011.
[8] Wong, R. T. F. et al., "LaneRuler: Automated Lane Tracking for DNA Electrophoresis Gel Images", IEEE Transactions on Automation Science and Engineering, Vol. 7(3), pp. 706-708, 2010.
[9] Moreira, B. et al., "Automatic Lane Segmentation in TLC Images Using the Continuous Wavelet Transform", Computational and Mathematical Methods in Medicine, Vol. 2013, p. 19, 2013.
[10] Skutkova, H. et al., "Preprocessing and Classification of Electrophoresis Gel Images Using Dynamic Time Warping", International Journal of Electrochemical Science, pp. 1609-1622, 2013.
[11] Akbari, A., Albregtsen, F. and Jakobsen, K. S., "Automatic Lane Detection And Separation In One Dimensional Gel Images Using Continuous Wavelet Transform", Analytical Methods, Vol. 2(9), pp. 1360-1371, 2010.
[12] Abbas, A. et al., "Automatic Peak Selection by a Benjamini-Hochberg-Based Algorithm", PLoS ONE 8(1): e53112, 2013.
[13] Taher, R.S., Jamil, N., Nordin, S., Yusof, F.H. and Mohamed Bahari, U., "Poor DNA Gel Electrophoresis Image Enhancement: Spatial vs Frequency Domain Filters", 2013 IEEE Conference on Systems, Process & Control (ICSPC2013), p. 6, 2013.
[14] Weising, K., Nybom, H., Wolff, K. and Kahl, G., DNA Fingerprinting In Plants: Principles, Methods And Applications, CRC Press, Taylor & Francis Group, NW, 2005.
[15] Du, P., Kibbe, W.A. and Lin, S. M., "Improved Peak Detection In Mass Spectrum By Incorporating Continuous Wavelet Transform-Based Pattern Matching", Bioinformatics, Vol. 22(17), pp. 2059-2065, 2006.
[16] Coombes, K.R. et al., "Improved Peak Detection and Quantification of Mass Spectrometry Data Acquired from Surface-Enhanced Laser Desorption and Ionization by Denoising Spectra with the Undecimated Discrete Wavelet Transform", Proteomics, Vol. 5(16), pp. 4107-4117, 2005.
[17] Palshikar, G.K., "Simple Algorithms for Peak Detection in Time-Series", [Online]. http://tcs-trddc.com/trddc_website/pdf/srl/palshikar_sapdts_2009.pdf
[18] Johnsen, L.G. et al., "An Automated Method For Baseline Correction, Peak Finding And Peak Grouping In Chromatographic Data", Analyst, 138, pp. 3502-3511, 2013.