Breast Mass Detection in Mammograms using Modified K-means Clustering
29th NATIONAL RADIO SCIENCE CONFERENCE
(NRSC 2012) April 10 - 12, 2012, Faculty of Engineering/Cairo University, Egypt
C19. Breast Mass Detection in Mammograms using Modified K-means Clustering
Osama R. Shahin (1), Hamdy M. Kelash (2), Gamal Mahrous (2), and Osama S. Faragallah (2)
(1) Physic and Mathematic Department, Faculty of Engineering, Helwan University, Egypt
(2) Department of Computer Science and Engineering, Faculty of Electronic Engineering, Menoufia University,
Menouf, 32952, Egypt
ABSTRACT
Breast cancer is one of the major causes of death among women. Small clusters of masses appearing as
collections of white spots on mammograms are an early warning sign of breast cancer. Early detection with
X-ray mammography is the key to improving breast cancer diagnosis. To increase radiologists' diagnostic
performance, several computer-aided diagnosis (CAD) schemes have been developed to improve the early
identification of this disease. In this paper, a modified k-means clustering algorithm is developed for
breast image segmentation, aimed at early detection of breast cancer. Our proposal segments the breast by
grouping pixels with similar tissue appearance into a certain number of clusters. The number of clusters is
determined according to the number of bins in the image histogram. The method was tested on several images
taken from the Breast Imaging Reporting and Data System (BI-RADS), developed by the American College of
Radiology, which provides a standardized classification for mammographic studies, and on digital mammogram
images from the online MIAS mammogram database.
Keywords: K-means, breast image, segmentation, detection.
I. INTRODUCTION
Breast cancer is the most frequently diagnosed cancer in women worldwide with an estimated 1.4 million new
cases in 2008. About half of these cases occurred in economically developing countries [1]. Breast cancer has
placed itself on top of the list of health problems for women in Egypt, representing 35.1% of all female cancer
cases, according to the National Cancer Institute in Cairo [2]. No effective way to prevent the occurrence of breast
cancer exists. Therefore, early detection is the first crucial step towards treating breast cancer. It plays a key role
in breast cancer diagnosis and treatment. This process requires image segmentation and analysis of the images.
Different segmentation algorithms are used for breast density segmentation, such as the k-Means algorithm [3],
the Fuzzy C-Means (FCM) algorithm [4], and the Dogs and Rabbit (DaR) algorithm [5]. The main difficulty with
those algorithms is the selection of the final segmentation clusters. The work presented in this paper is
concerned specifically with segmenting the breast pixels with similar tissue appearance into a certain number
of clusters. The number of clusters is determined according to the number of bins in the image histogram. All
the input images are gray-level images, so their histograms can be used to classify each image into three main
regions: the background, normal tissue, and lesion tissue. According to these regions we can define the
number of final segmentation clusters. In our experimental study, we use the digital mammogram images
provided by the online MIAS mammogram database [6]. The remainder of the paper is organized as follows: the
main mammographic abnormalities are discussed in section II, the methodology is described in section III,
experimental results are presented in section IV, and conclusions are drawn in section V.
II. MAMMOGRAPHIC ABNORMALITIES
There are about eight typical kinds of abnormalities revealed with a conventional mammogram. An
experienced radiologist is highly tuned to the appearance of abnormalities in breast X-rays, and most of the time
has a pretty good idea whether a suspicious abnormality is likely to be malignant or not. Typical mammographic
findings from breast cancer screening mammograms would include asymmetrical breast tissue, asymmetric
density, architectural distortion, mass, microcalcifications, interval changes compared with previous films,
adenopathy, and other miscellaneous findings. Usually, a mammographic abnormality is followed by additional
imaging studies, such as ultrasound, and if the lesion still appears suspicious it may be sent for biopsy [7].
A. Characteristics of Mass Lesion
In terms of shape, if it is round, oval, or slightly lobular, the mass is probably benign. If the mass has a multi-
lobular contour, or an irregular shape, then it is suggestive of malignancy. 'Margin' refers to the characteristics of
the border of the mass image. When the margin is circumscribed and well-defined the mass is probably benign. If
the margin is obscured more than 75% by adjacent tissue, it is moderately suspicious of malignancy. Likewise,
there is moderate suspicion if the margin has many small lobes. If the margin is indistinct or spiculated
(consisting of many small 'needle-like' sections) then there is high suspicion of malignancy. 'Density' is
usually classified as fatty, low, or high. The mass is probably benign for fatty and low densities, and
moderately suspicious of malignancy for high densities [8].
B. Microcalcifications
Microcalcifications are one of the main ways breast cancer is detected mammographically when it is in the very
early stages. Microcalcifications are tiny specks of mineral deposits (such as calcium) that can be
distributed in various ways. Sometimes microcalcifications are found scattered throughout the breast tissue, and
they often occur in clusters. Frequently, microcalcification deposits are due to benign causes. However, certain
features and presentations of microcalcifications are more likely to be associated with malignant breast cancer
(Fig. 1).
Fig. 1: Microcalcifications
Three categories of calcifications have been identified by the American College of Radiology (ACR)
BI-RADS [9]:
(a) Typically benign
(b) Intermediate concern
(c) High probability of malignancy
The BI-RADS classification of calcifications is summarized in Table 1.
Table 1. Summary of BI-RADS classification of calcifications [9]
Category               Type of calcification          Characteristics
Typically benign       Skin                           Typical polygonal shape.
                       Vascular                       Parallel tracks or linear tubular.
                       Coarse or popcorn-like         Involving fibroadenomas.
                       Rod-shaped                     Large rod, usually > 1 mm.
                       Round                          Smooth, round clusters.
                       Punctate                       Round or oval calcifications.
                       Spherical or lucent-centered   Found in debris collected in ducts, or necrosis areas.
                       Rim or eggshell                Found in wall of cysts.
                       Milk of calcium                Calcium precipitates.
                       Dystrophic                     Usually large, > 0.5 mm in size.
Intermediate concern   Indistinct or amorphous        Appear round and hazy; uncertain morphology.
High risk              Pleomorphic or heterogeneous   Irregular in shape and size; < 0.5 mm raises suspicion.
                       Fine, linear or branching      Thin, irregular; appear linear from a distance.
III. PROPOSED METHOD
A breast cancer CAD scheme separates suspicious regions that may contain masses from the background
parenchyma, that is, the tissue characteristic of an organ as distinguished from associated connective or
supporting tissues [10]. In other words, such schemes partition the mammogram into several non-intersecting
regions and extract regions of interest (ROIs) and suspicious mass candidates from the image. While a suspicious
area is darker than its surroundings, it has a similar density and a regular shape of variable size. Thus, image
segmentation is essential to maintaining the sensitivity and accuracy of the entire mass detection and
classification system.
We propose an adaptive K-means segmentation method for the detection of mass lesions in digital
mammograms. In the present work, we attempt to improve the performance of the existing K-means
approach by varying the values of certain parameters discussed in the algorithm [11, 12]. The K-means
algorithm is an iterative technique used to partition an image into K clusters. In statistics and machine
learning, k-means clustering is a method of cluster analysis that aims to partition n observations into k
clusters, in which each observation belongs to the cluster with the nearest mean. The basic algorithm is:
1. Pick K cluster centers, either randomly or based on some heuristic.
2. Assign each pixel in the image to the cluster that minimizes the distance between the pixel and the
   cluster center.
3. Re-compute the cluster centers by averaging all of the pixels in each cluster.
4. Repeat the last two steps until convergence is achieved, i.e. until no pixel changes its cluster.
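The basic algorithm above can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch, not the MATLAB implementation used in this work: it assumes a 2-D gray-level image, clusters on pixel intensity alone, and the function name `kmeans_pixels` and its parameters (`max_iter`, `seed`) are hypothetical.

```python
import numpy as np

def kmeans_pixels(image, k, max_iter=100, seed=0):
    """Cluster the pixel intensities of a 2-D gray-level image into k groups."""
    pixels = image.reshape(-1).astype(float)
    rng = np.random.default_rng(seed)
    # Pick K initial centers at random from the distinct intensities
    # (assumes the image has at least k distinct gray levels).
    centers = rng.choice(np.unique(pixels), size=k, replace=False)
    labels = np.full(pixels.shape, -1)
    for _ in range(max_iter):
        # Assign each pixel to the cluster with the nearest center.
        new_labels = np.argmin(np.abs(pixels[:, None] - centers[None, :]), axis=1)
        # Convergence: stop when no pixel changes its cluster.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Re-compute each center as the mean of the pixels assigned to it.
        for j in range(k):
            members = pixels[labels == j]
            if members.size:  # an empty cluster keeps its previous center
                centers[j] = members.mean()
    return labels.reshape(image.shape), centers
```

For example, `kmeans_pixels(np.array([[0, 1, 2], [200, 201, 202]]), k=2)` separates the dark row from the bright row. Random initialization is used here only for simplicity; any heuristic for choosing the initial centers fits the same loop.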
Consider a given set of observations $(x_1, x_2, x_3, \ldots, x_n)$, where each observation is a D-dimensional
real vector. K-means clustering aims to partition the n observations into k sets
$S = \{S_1, S_2, S_3, \ldots, S_k\}$. Thus, the algorithm aims to minimize the sum of squared Euclidean
distances from each sample to the nearest cluster center:

$$J(S, U) = \sum_{i=1}^{n} \sum_{j=1}^{k} u_{ij} \, \lVert x_i - \mu_j \rVert^2 \qquad (1)$$

where $\mu = (\mu_1, \ldots, \mu_k)$ is a vector containing the cluster centers, each equal to the mean of the
points in $S_j$ (the means are written $m_i$ in equations (2) and (3)), and $U$ is a matrix whose entry
$u_{ij}$ is 1 if sample $x_i$ belongs to cluster $j$ and 0 otherwise.
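As a small worked illustration of the objective in equation (1), the following Python sketch builds the indicator matrix U explicitly and evaluates the sum of squared distances. The function name `kmeans_objective` and the toy data are assumptions made for this example only.

```python
import numpy as np

def kmeans_objective(X, labels, centers):
    """Sum of squared Euclidean distances from each sample to its assigned center."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    n, k = X.shape[0], centers.shape[0]
    # U[i, j] = 1 if sample i belongs to cluster j, and 0 otherwise.
    U = np.zeros((n, k))
    U[np.arange(n), labels] = 1.0
    # Squared distance from every sample to every center, shape (n, k).
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float((U * sq_dist).sum())

# Toy data: four 2-D samples, two clusters, each sample at distance 1 from its center.
X = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
labels = np.array([0, 0, 1, 1])
centers = [[0.0, 1.0], [10.0, 1.0]]
print(kmeans_objective(X, labels, centers))  # prints 4.0
```

Each of the four samples contributes a squared distance of 1, so the objective is 4.0; moving a sample to the other cluster would raise the value, which is why the assignment to the nearest center is preferred.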
A. Standard Algorithm
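Given an initial set of k means $m_1^{(1)}, \ldots, m_k^{(1)}$, which may be specified randomly or by some
heuristic, the algorithm alternates between the following two steps.
I. Assignment Step
Assign each observation to the cluster with the closest mean, i.e. partition the observations according to the
Voronoi diagram generated by the means, as in equation (2):

$$S_i^{(t)} = \{\, x_j : \lVert x_j - m_i^{(t)} \rVert \le \lVert x_j - m_{i^*}^{(t)} \rVert \ \text{for all } i^* = 1, 2, 3, \ldots, k \,\} \qquad (2)$$

II. Update Step
Calculate the new means as the centroids of the observations in each cluster, as in equation (3):

$$m_i^{(t+1)} = \frac{1}{\lvert S_i^{(t)} \rvert} \sum_{x_j \in S_i^{(t)}} x_j \qquad (3)$$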
B. Modified Algorithm “1st Modification”
The first modification works as follows. A histogram is a summary graph showing a count of the data points
falling in various ranges, which gives a rough approximation of the frequency distribution of the data. The
groups of data are called classes, and in the context of a histogram they are known as bins, because one can
think of them as containers that accumulate data and fill up at a rate equal to the frequency of that data class.
The shape of the histogram is sometimes particularly sensitive to the number of bins. With wide bins, important
information may be lost. Reducing the number of bins and then increasing the number of classes in the
K-means algorithm was found to increase the detection accuracy. Quantization in terms of color histograms
refers to the process of reducing the number of bins by taking colors that are very similar to each other and
putting them in the same bin. By default, the maximum number of bins one can obtain using the histogram
function is 256.
The modification process can be summarized in the following points:
1. Histogram equalization is performed for every pixel based on its neighborhood values: copy the image
   matrix into a new matrix and pad it with zeros on all sides.
2. Trace this new matrix with a window of the chosen size, starting from position (1, 1).
3. Find the probability density function of each value inside the new matrix.
4. Find the cumulative distribution function for each pixel and use it to replace the value of the middle
   element in the window; repeat this process for every element in the matrix. See Fig. 1(c) for the
   resulting histogram.
5. Select the number of histogram bins in the traced image as the number of bins and clusters in the
   k-means algorithm discussed in [10, 13].
This process helps the modified algorithm converge to the optimal segmentation, because the number of
clusters is predefined at the beginning of the algorithm.
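One possible reading of the steps above is sketched below in Python with NumPy. It is an interpretation, not the MATLAB code used in this work: the window size of 3, the 8-bit gray-level range, the scaling of the cumulative distribution to 0..255, and the choice to count only non-empty bins are all assumptions, and the function names `local_hist_equalize` and `occupied_bins` are hypothetical.

```python
import numpy as np

def local_hist_equalize(image, window=3, levels=256):
    """Window-based histogram equalization of a 2-D uint8 image (window must be odd)."""
    pad = window // 2
    # Copy the image into a new matrix padded with zeros on all sides.
    padded = np.pad(image, pad, mode="constant", constant_values=0)
    out = np.empty_like(image)
    rows, cols = image.shape
    for r in range(rows):
        for c in range(cols):
            # Window centered on pixel (r, c) of the original image.
            block = padded[r:r + window, c:c + window]
            # Probability density of each gray level inside the window.
            pdf = np.bincount(block.ravel(), minlength=levels) / block.size
            # Cumulative distribution, evaluated at the middle element's gray level.
            cdf = np.cumsum(pdf)
            out[r, c] = int(np.rint((levels - 1) * cdf[image[r, c]]))
    return out

def occupied_bins(image, levels=256):
    """Number of non-empty histogram bins, taken here as the candidate cluster count."""
    return int(np.count_nonzero(np.bincount(image.ravel(), minlength=levels)))
```

With these helpers, `occupied_bins(local_hist_equalize(img))` gives a bin count that could be passed as the number of clusters to a k-means routine. The double loop is written for clarity rather than speed.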
For the purpose of saving time when trying to compare color histograms, one can quantize the number of bins.
Obviously quantization reduces the information regarding the content of images but as was mentioned this is the
tradeoff when one wants to reduce processing time.
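The quantization described here can be illustrated for gray levels with a short Python sketch. Equal-width bins are an assumption of this example, and `quantize_gray_levels` is a hypothetical name.

```python
import numpy as np

def quantize_gray_levels(image, n_bins, levels=256):
    """Map each gray level to the index of one of n_bins equal-width bins."""
    width = levels / n_bins
    # Similar gray levels fall into the same bin; clip guards the top edge.
    return np.minimum((image / width).astype(int), n_bins - 1)

sample = np.array([0, 63, 64, 127, 128, 255], dtype=np.uint8)
print(quantize_gray_levels(sample, n_bins=4))  # prints [0 0 1 1 2 3]
```

Going from 256 bins to 4 makes histogram comparison cheaper, at the cost of no longer distinguishing, for instance, gray level 0 from gray level 63.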
Fig.1: (a) Original mammogram from the image database. (b) Histogram for the original image. (c) Histogram for
the equalized image.
C. Modified Algorithm “2nd modification”
Another choice for the number of classes is given by equation (4):

$$\text{number of clusters} = \log_2(\text{number of bins in the equalized image}) \qquad (4)$$

The detection accuracy increases when the algorithm is run with the optimum number of clusters.
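Equation (4) is simple enough to compute directly. In the Python sketch below, rounding to the nearest integer and the lower bound of one cluster are assumptions added so that the result is always a usable cluster count; `clusters_from_bins` is a hypothetical name.

```python
import math

def clusters_from_bins(n_bins):
    """Number of clusters as the base-2 logarithm of the number of histogram bins."""
    if n_bins < 2:
        return 1  # assumed lower bound: at least one cluster
    return max(1, round(math.log2(n_bins)))

print(clusters_from_bins(256))  # prints 8
print(clusters_from_bins(64))   # prints 6
```

For a full 256-bin histogram this gives 8 clusters, far fewer than using the bin count itself as the number of clusters.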
IV. RESULTS
A database of 120 breast images was formed. All of the real breast images were collected from the Breast
Imaging Reporting and Data System (BI-RADS) and from the online MIAS mammogram database [6, 9]. The
software used for this algorithm was MATLAB 7.9, which has a collection of functions supporting a wide
range of image processing operations. Fig. 2 shows original images from the image database together with the
segmentation results. It can be seen that benign and malignant elements in the breast images became clearer.
The modified algorithm works better when the number of clusters is selected as the base-2 logarithm of the
number of bins obtained from the equalized image. A comparison between this algorithm and those described in
[11, 14] is shown in Fig. 3.
[Figure: five rows of panels, each showing (a) Original Image, (b) Result for 1st modification, (c) Result for
2nd modification]
Fig. 2: (a) Original mammogram images, (b) Results of the 1st modification, (c) Results of the 2nd modification
Fig. 3: Comparison of detection rates for the 1st modification, the 2nd modification, and AKC [11]
Finally, the detection accuracy was estimated and compared with that of previous similar research work.
The results obtained also agree with the findings and diagnosis of a senior radiologist at MISR RADIOLOGY
CENTER. Table 1 shows the detection accuracy of the proposed and existing work.
Table 1: Detection accuracy of mass lesion
                        Mass lesion detection accuracy
Type of mass lesion     Bhagwati Charan Patel et al. [11]   Proposed K-means method,   Proposed K-means method,
                                                            1st modification           2nd modification
Benign tumor            77%                                 92%                        94%
Malignant tumor         91%                                 93%                        95%
V. CONCLUSION
Breast cancer is one of the major causes of death among women. In this paper we have presented a modified
approach to identifying the presence of breast cancer masses and calcifications in mammograms using image
processing functions and K-means clustering for clear identification of clusters. The modified k-means
algorithm successfully detected the breast cancer area in mammogram images. The results indicate that this
system can help the doctor detect breast cancer in the early stage of diagnosis as well as classify the total
cancer-affected area. This helps the doctor determine which stage of cancer the patient has, so that the
necessary and appropriate treatment steps can be taken. This paper is based on a visual detection method
applied to the processed mammogram images.
REFERENCES
[1] World Health Organization, http://www.who.int/cancer/en.
[2] National Cancer Institute in Cairo, http://www.nci.edu.eg/
[3] J. B. MacQueen, “Some methods of classification and analysis of multivariate observations”, In
Proceeding of the Fifth Berkeley Symposium on Mathematical Statistics and Probability,
1:281-297, 1967
[4] A. Oliver, “Automatic mass segmentation in mammographic images”, PhD Thesis. Department of
Electronics, Computer Science and Automatic Control. University of Gerona, 2007.
[5] P. McKenzie and M. Alder, “Initializing the EM Algorithm for use in Gaussian Mixture Modeling”,
In Proceeding on Pattern Recognition in Practice IV, pages 91-105, 1994.
[6] MIAS database, http://peipa.essex.ac.uk/info/mias.html [Accessed: September 20, 2011].
[7] Bijay Ketan Panigrahi, Manas Ranjan, “Feature Extraction for Classification of Microcalcifications
and Mass Lesions in Mammograms”, IJCSNS International Journal of Computer
Science and Network Security, Vol. 9, No. 5, May 2009, p. 255.
[8] Sampat MP, Whitman GJ, Stephens TW, Broemeling LD, Heger NA, Bovik AC, Markey MK., “
The reliability of measuring physical characteristics of spiculated masses on mammography”, Br J
Radiol. Dec 2006; 79 Spec No 2:S134-40.
[9] American College of Radiology. The ACR breast imaging reporting and data system (BI-RADS) ,
http://www.acr.org/departments/stand_accred/birads/contents.html. [Accessed: September 20, 2011]
[10] Yanhui Guo, “Computer-Aided Detection of Breast Cancer Using Ultrasound Images”, PhD thesis in
Computer Science, Utah State University, Logan, UT, 2010.
[11] Bhagwati Charan Patel, G.R.Sinha , “An Adaptive K-means Clustering Algorithm for Breast Image
Segmentation”, International Journal of Computer Applications (0975 – 8887) Volume 10– N.4,
November 2010.
[12] Chris Solomon, Toby Breckon, “Fundamentals of Digital Image Processing: A Practical Approach
with Examples in Matlab”, John Wiley & Sons, Ltd, 2011.
[13] Ron Wehrens and Lutgarde M.C. Buydens , “Model-Based Clustering for Image Segmentation and
Large Datasets Via Sampling ”, Technical report no. 424, University of Washington, Feb.2003.
[14] Mencattini, A., Salmeri, M., Casti, P., Raguso, G. “ Automatic breast masses boundary extraction
in digital mammography using spatial fuzzy c-means clustering and active contour models”, IEEE
International Workshop on Medical Measurements and Applications (MeMeA) , Bari, Italy, pp.632-
637 , May 30-31, 2011.