AUTOMATED MOTION DETECTION FROM SPACE IN SEA SURVEILLANCE
Elisavet Charalambous(a)(b), Junichi Takaku(c), Pantelis Michalis(d), Ian Dowman(e), Vasiliki
Charalampopoulou(f)
(a) ADITESS Ltd , Nicosia, Cyprus, (b) Department of Electrical and Computing Engineering,
University of Cyprus, Nicosia, Cyprus, (c) Remote Sensing Technology Center of Japan, (d) Center
for Security Studies (KEMEA), (e) Department of Civil, Environmental and Geomatic Engineering,
University College London, (f) Geosystems Hellas
ABSTRACT
The Panchromatic Remote-sensing Instrument for Stereo Mapping (PRISM) carried by the Advanced Land-Observing
Satellite (ALOS) was designed to generate worldwide topographic data with its high-resolution and stereoscopic
observation. PRISM performs along-track (AT) triplet stereo observations using independent forward (FWD), nadir
(NDR), and backward (BWD) panchromatic optical line sensors of 2.5m ground resolution in swaths 35 km wide. The
FWD and BWD sensors are arranged at an inclination of ±23.8◦ from NDR.
In this paper, PRISM images are used from a new perspective, in the security domain for sea surveillance, based on the
sequence of the triplet, which is acquired over a time interval of 90 sec (45 sec between images). An automated motion
detection algorithm is developed, allowing the combination of the information captured at each instant and therefore the
identification of patterns and trajectories of moving objects at sea, including the extraction of geometric characteristics
along with the speed and direction of movement. The developed methodology combines well-established image
segmentation and morphological operation techniques for the detection of objects. Each object in the scene is represented
by dimensionless measure properties and maintained in a database to allow the generation of trajectories as these arise
over time, while the location of moving objects is updated based on the result of neighbourhood calculations.
Most importantly, the developed methodology can be deployed in any airborne (optionally piloted) sensor system with
along-track stereo capability, enabling the provision of near-real-time automatic detection of targets; a task that cannot
be achieved with satellite imagery due to its very intermittent coverage.
Keywords: ALOS, PRISM, stereo mapping, satellite images, image processing, automated motion detection, sea
surveillance
1. INTRODUCTION
Earth observation satellites capture an area on Earth periodically according to their orbit. Humans are good at deriving
information from such images because of our innate visual and mental abilities [1]. In image processing, a range of
algorithms has already been developed to extract specific (application-dependent) geospatial information for
the Earth itself, along with the position, size, shape and interrelationships of man-made objects on its surface. Thus, the
satellite image is translated into meaningful geospatial information in various formats (raster, vector and matrix).
In this paper, the ALOS PRISM triplet of along-track stereo panchromatic images is used as the source for the automatic
detection of vessels' motion on the sea. The test areas are two large commercial ports in Cyprus (Larnaca, Limassol). As
a first step, the PRISM images must be geometrically registered to each other before applying the automated motion
detection algorithms. This procedure is important, as the improvement of the relative position increases the accuracy and
the precision of the automatic motion detection algorithm. Since this is our first attempt at performing automatic motion
detection with satellite images, we note that the selected methods were chosen in ways that would limit bias
(even at some cost to the effectiveness of the outcome). The automatic detection of patterns within the images
is done by making use of edge detection techniques along with several pattern descriptors, followed by a
number of dimensionless measures.
The results of this study are very promising: the large majority of moving vessels are detected, from
large ships to very small boats (as long as they are captured in the image). Moreover, the motion of a vessel is identified
even at very low speeds (the lowest could be less than 0.7 miles/hour). Surprisingly, during the
implementation of the procedure an airplane on its way to land at Larnaca airport was detected, too. Thus, it seems that the
developed methodology could be a generic one.
In this paper, the importance of along-track stereoscopic viewing with optical sensors for automatic motion
detection procedures is presented, along with a very effective methodology to extract the position and velocity of moving vessels
on the sea. In general terms, the developed methodology could be used in any satellite or airborne (optionally piloted)
sensor system with along-track stereo capability, which could provide even real-time automatic detection of targets.
Thus, an integrated solution could be established, starting from a Remotely Piloted Aircraft System (RPAS) capable of
carrying an along-track sensor payload, which can run the above-mentioned methodology on board and send
information on the moving vessels in near real time to a control center.
2. ALOS/PRISM
The Advanced Land Observing Satellite (ALOS) was launched in January 2006 and operated very well until May 2011.
Though the mission life of the satellite ended years ago, approximately 6.5 million scenes of archived data covering the
entire globe are still available to users [2]. The Panchromatic Remote-sensing Instrument for Stereo Mapping (PRISM), one
of the sensors carried on board ALOS, was designed to generate worldwide topographic data with its optical stereoscopic
observation. The sensor consists of three independent panchromatic radiometers for viewing forward (FWD), nadir
(NDR), and backward (BWD) in 2.5m ground resolution producing a triplet stereoscopic image along its track of 35km
swath. The FWD and BWD sensors are arranged at an inclination of ±23.8◦ from NDR, so that the time interval of the
acquisitions for the triplet images is 90 seconds (45 seconds between images). This specific imaging mechanism
can be utilized not only for creating the topographic data but also for detecting moving objects during the time intervals.
a. Test data set
We selected four datasets of PRISM triplet images covering the two test areas in Cyprus.
Table 1 shows the basic information of the datasets.
Table 1. Basic information of the PRISM triplet image datasets.
Area Scene ID (NDR) Date Scene center lat./lon. (NDR)
Larnaca ALPSMN231682900 2010/05/31 34.804/ 33.657
ALPSMN271942900 2011/03/03 34.806/ 33.637
Limassol ALPSMN254292905 2010/11/02 34.557/ 33.040
ALPSMN267712905 2011/02/02 34.559/ 33.035
b. Registration of the stereo images
The PRISM triplet stereo images should be geometrically registered with each other before applying the automated
motion-detection-algorithms. The relative orientation technique, which is being used in the triangulation for creating the
topographic data, is utilized in this process [3]. In the relative orientation, tie-points among the triplet images of level-1B1
(i.e., not geometrically corrected) are automatically generated first. Approximately 100 tie-points were generated for
each image triplet. Figure 1 shows the distribution of 100 tie-points generated on one of the NDR images, and one of the
tie-points generated on each image.
Figure 1. Distribution of 100 tie-points on one of NDR images (a), and one of the tie-points generated on each image: (b)
NDR, (c) FWD, and (d) BWD.
Next, the tie-points are used to estimate errors of the orientation parameters for the FWD/BWD images, while those for the NDR
image are fixed as a reference. A viewing vector, corresponding to each tie-point in image space, is calculated in
object space for each triplet image by using the sensor model of PRISM and the orbit/attitude data of the satellite. Then
the errors of the orientation parameters (i.e., attitude errors) of the FWD/BWD sensors are determined with the conventional
least-squares method so that the RMSE of the x-parallaxes (i.e., cross-track parallaxes) of the viewing vectors between FWD
and NDR and between BWD and NDR is minimized. The symmetry of the y-parallaxes (i.e., along-track parallaxes)
between FWD and NDR and between BWD and NDR is also considered in the estimation. Table 2 shows the residual
RMSE of the x-parallaxes for the tie-points on each image triplet. The RMSEs range from 0.76 m to 1.50 m and
correspond to the relative registration accuracy of each image triplet. The absolute errors of the relative orientation
models remain almost entirely as a vertical offset, due to instability of the pitch angles of the FWD/BWD sensors; they are corrected
with a tie-point generated on an area where the absolute height is known, such as at sea level (i.e., height = 0).
Table 2: Residual RMSE of x-parallaxes for the tie-points on each image triplet (in meters).
Scene ID (NDR) No. of tie-points FWD – NDR BWD – NDR
ALPSMN231682900 98 1.404 1.005
ALPSMN271942900 100 0.935 1.078
ALPSMN254292905 100 0.756 1.006
ALPSMN267712905 100 1.502 1.063
Finally, the level-1B1 triplet images are projected onto the geoid surface (i.e., mean sea level) in UTM coordinates
with the orientation models, so that they include only y-parallaxes, which correspond to the orthometric height on
the ground. Therefore, the objects on the sea are basically registered geometrically on the images without any post-
processing.
3. BACKGROUND
Motion or traffic detection based on satellite data has already been examined by the research community, using either
high-resolution optical satellite images (WorldView) or SAR images (TerraSAR-X).
In the WorldView case, traffic monitoring or car motion detection is based on specific characteristics of the
WorldView acquisition. WorldView-2's focal plane consists of one PAN and two MS imaging sensors (MS1: Blue,
Green, Red, Near-IR1 and MS2: Coastal Blue, Yellow, Red Edge, Near-IR2), with a very small lag in data recording
between the spectral bands, a few hundred milliseconds, which causes slight displacements of moving objects in
image space.
In [4], the use of the MS1 and MS2 spectral bands for traffic detection, owing to the high spectral contrast
between roads and vehicles at these specific wavelengths, is investigated. In this case, the delay in data recording between
Coastal Blue and Blue has been estimated at only 316 ms [5]; thus, an object with a speed of 30 km/h (18.6 mph) will
show a ground displacement of 2.6 m between the image sensed at time t0 and time t1 = t0 + 316 ms. In terms of image
space, the ground displacement corresponds to about 5 image pixels in the PAN and fewer than 2 image pixels in the MS
channels. The automated motion detection procedure for moving cars in [4] appears to be very effective, with a
reported accuracy better than 90%.
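The displacement arithmetic above can be sketched as a quick check; the 0.5 m PAN and 2.0 m MS ground sample distances used below are assumptions taken from the WorldView-2 mission specifications, not stated in this paper:

```python
def ground_displacement_m(speed_kmh: float, lag_s: float) -> float:
    """Distance (in metres) an object travels during the inter-band lag."""
    return speed_kmh / 3.6 * lag_s  # km/h -> m/s, then multiply by seconds

# 30 km/h over the estimated 316 ms Coastal Blue / Blue lag -> ~2.6 m
d = ground_displacement_m(30.0, 0.316)

# Expressed in pixels for the assumed WorldView-2 ground sample distances
pan_pixels = d / 0.5   # PAN at 0.5 m/pixel -> about 5 pixels
ms_pixels = d / 2.0    # MS at 2.0 m/pixel -> under 2 pixels
```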
However, for sea surveillance the WorldView image acquisition process is of little help, since the time lag is too
short to allow detection of slow-moving vessels. Moreover, the ALOS PRISM images are all in the
same waveband, while the WorldView images are in different spectral bands, so additional pre-processing would be needed.
In the TERRASAR case, the Dual Receive Antenna (DRA) or Aperture Switching (AS) mode [6] enables the reception
of two SAR images of the same scene within a small temporal gap, which can be utilized for interferometric detection
approaches. Since moving objects suffer from special effects in the SAR processing algorithm, specific methods to detect
vehicles are required [7]. Thus, in order to detect moving vehicles, the effects in the SAR images caused by the vehicle's
motion in the across- and along-track directions must be derived simultaneously.
However, for sea surveillance, as in the WorldView case, this acquisition mode is not useful, since the temporal gap
is too short to allow detection of slow-moving vessels. In conclusion, the main tool that can provide the data for
automated motion detection is a sensor system based on the same architecture as ALOS PRISM. This means that
along-track image acquisition with at least one sensor in oblique view is the essential requirement.
4. METHODOLOGY
Satellite images are essentially raster information where the cell (pixel) size determines the resolution at which the data
is represented. The depiction of pictorial information with resolution dependent media introduces the benefit that the
location of each point in the image is straightforwardly characterized by its coordinates in the grid of pixels, but also the
issue that the resolution imposes limits on the level of approximation (during digitization) of every element. Satellite
images commonly capture the objects of interest from fairly large distances and, as a consequence, large areas fall within
the field of view; artifacts in the resulting image may not be accurately represented. Moreover, the study of consecutive
images – obtained at regular time intervals – makes this task even more challenging, as it involves the extraction and
identification of under-sampled artifacts. The automatic detection of patterns within the images is done by making use of
well-established edge detection techniques [8] while pattern identification is accomplished with the use of several pattern
descriptors [9]; among them a number of dimensionless measures [10]. The combination and further analysis of the
results obtained by the individual snapshots then allows the automatic motion detection of the overlapping scenes [11].
Each pixel of ALOS PRISM translates into an area of 6.25 m²; each pixel side corresponds to a distance of 2.5 m.
According to the Nyquist criterion, a sampling frequency equal to twice the highest spatial frequency of the specimen is required
in order to accurately preserve its spatial resolution in the resulting digital image [8]. An equivalent statement is Shannon's
sampling theorem, which states that the digitizing device must use a sampling interval no greater than one-half
the size of the smallest resolvable feature of the optical image. Therefore, to capture the smallest degree of detail present
in a specimen, the sampling frequency must be sufficient that two samples are collected for each feature, guaranteeing
that both the light and dark portions of the spatial period are gathered by the imaging device [10]. With regards to the
analyzed sample, a pixel size of 2.5 m may only adequately depict artifacts larger than 5 m. Even though the
automatic motion detection of ships does not require their image representation at the finest level, it is evident that this
limitation may easily cause problems, especially for small ships, ships close to each other, or ships close to the
coastline. Due to this, the image processing techniques applied for pattern recognition were selected with this
limitation in mind, while still considering computational performance.
Image segmentation was accomplished by applying global thresholding to each image; this serves as a computationally
efficient method even for very large images. Thresholding is a very well-known method in the field, which partitions the
image histogram by using a single global threshold. Segmentation is then accomplished by scanning the image pixel by
pixel and labeling each pixel as object or background [8]. Over the years this method has evolved, producing methods for
optimal and adaptive thresholding; despite this, the most basic implementation was adopted, as it limits the bias over
any triplet of images. The resultant binary image then became subject to binary dilation with a diamond structuring
element of 5 neighbours, which allowed filling one-pixel gaps between extracted edges. In binary morphology, dilation is
a shift-invariant (translation-invariant) operator, strongly related to the Minkowski addition [12]. The objective at this
point of the process was to extract the patterns drawn by the objects of interest and any other artifacts that remained in the
sea. The extraction of patterns became possible by filling the connected edges and then extracting their boundaries. This
important step resulted in the outlining of the ships.
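This segmentation step can be sketched minimally as follows, assuming NumPy/SciPy; the threshold value and the toy image below are illustrative, not those used in the paper:

```python
import numpy as np
from scipy import ndimage

def segment_objects(img: np.ndarray, threshold: float) -> np.ndarray:
    """Global thresholding followed by a diamond dilation, as described
    in the text. `img` is a 2-D grayscale array."""
    binary = img > threshold                      # label pixels as object/background
    diamond = np.array([[0, 1, 0],
                        [1, 1, 1],
                        [0, 1, 0]], dtype=bool)   # 5-neighbour diamond element
    # Dilation closes one-pixel gaps between extracted edge fragments
    return ndimage.binary_dilation(binary, structure=diamond)

# Toy example: two bright edge fragments separated by a one-pixel gap
img = np.zeros((5, 7))
img[2, 1:3] = 1.0
img[2, 4:6] = 1.0
mask = segment_objects(img, 0.5)   # the gap at (2, 3) is filled by dilation
```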
The aim of this paper was to test the feasibility of automatic motion detection of ships from satellite images; therefore,
the coastline was considered as given and was manually removed after boundary detection. The reduced versions of the
triplet images, consisting only of patterns in the sea, were then used for the detection of pattern movements. The reliable
and robust tracking of a ship over discontinuous snapshots requires the use of pattern descriptors for identification,
beyond mere proximity, especially considering peak times near ports. Therefore, for each detected pattern, in each of the
three images, the following measures/features were calculated: area, perimeter, roundness, elongation, bounding box
area, eccentricity, centroid, orientation and solidity (shown in Table 3), where the centroid was expressed in the form of
standard coordinates by combining the geo-information attached to each image with the pictorial coordinates.
Table 3. List of calculated pattern descriptors
Metric Description
Area Actual number of pixels in the region
Perimeter Length in pixels around the boundary of the region; calculated as the distance between each
adjoining pair of pixels around the border of the region
Roundness A ratio function to the area and perimeter of the region (dimensionless ratio)
Elongation Minor axis width to major axis length ratio (dimensionless ratio)
Bounding box area The area of the box completely containing the object, calculated as the product of the lengths of the
major and minor axes (in pixels)
Eccentricity The ratio of the distance between the foci of the ellipse and its major axis length (dimensionless
ratio)
Centroid Specifies the center of mass of the region
Orientation Calculated as the angle between the line segment connecting two points and the equator (x-axis)
Solidity Proportion of the pixels in the convex hull that are also in the region. Computed as area/convex area
(dimensionless ratio)
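A few of the descriptors in Table 3 can be computed with a short NumPy sketch on a binary region mask; as a simplification, the elongation here uses the bounding-box sides rather than the fitted-ellipse axes implied by the paper:

```python
import numpy as np

def basic_descriptors(mask: np.ndarray) -> dict:
    """Simplified versions of a few Table 3 descriptors for one binary region."""
    ys, xs = np.nonzero(mask)
    area = ys.size                           # number of pixels in the region
    centroid = (ys.mean(), xs.mean())        # center of mass (row, col)
    h = ys.max() - ys.min() + 1              # bounding-box height
    w = xs.max() - xs.min() + 1              # bounding-box width
    elongation = min(h, w) / max(h, w)       # minor-to-major ratio (dimensionless)
    return {"area": area, "centroid": centroid,
            "bbox_area": h * w, "elongation": elongation}

# Toy 2x4 rectangular region, e.g. an elongated vessel trace
mask = np.zeros((6, 8), dtype=bool)
mask[2:4, 1:5] = True
d = basic_descriptors(mask)
```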
Image differencing is commonly used for the detection of movement [11] between images of the same size. The triplet
images under analysis are captured at times separated by intervals of 45 seconds, from three different angles (backward,
nadir and forward), and at non-standard resolution. Despite the high overlap of the registered images, image
differencing was not an option due to the limitations already discussed. The detection of motion for each pattern
was instead performed by local searching: for two successive images, proximity searches were initiated on the
second image around the centroid points of the patterns in the first. The radius of the search was determined as a
function of the elapsed time between the images, a ship's speed limit when travelling near the coast, and the pixel
resolution, while the distance between two points was calculated with the Haversine distance metric [13]. A match was
signified if the proximity search yielded a pattern with pattern descriptors (roundness, elongation, eccentricity and
solidity; area and perimeter were used in the case of multiple matches) similar to those of the initiating pattern. The
centroid points (expressed as coordinates) of the matching tuples are used for calculating the speed of movement, in
nautical miles per hour. The orientation of movement is calculated from the angle formed between the equator (x-axis)
and the line through the centroids of the tracked ship in two consecutive images. This avoids miscalculating the
orientation of square shapes and also provides information on the direction of movement.
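The distance, speed, and direction calculations can be sketched as follows (Haversine per [13]); the Earth radius and the sample coordinates are illustrative assumptions, and the heading uses a simple planar approximation:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2, r=6371000.0):
    """Great-circle distance in metres between two lat/lon points (Haversine)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def speed_knots(dist_m: float, dt_s: float) -> float:
    """Speed in nautical miles per hour from displacement and elapsed time."""
    return dist_m / 1852.0 / (dt_s / 3600.0)

def heading_deg(lat1, lon1, lat2, lon2):
    """Angle of the centroid-to-centroid segment w.r.t. the equator (x-axis),
    as a planar approximation of the orientation of movement."""
    return math.degrees(math.atan2(lat2 - lat1, lon2 - lon1))

# Centroids of a matched pattern in two images 45 s apart (illustrative values)
dist = haversine_m(34.8040, 33.6570, 34.8060, 33.6570)
v = speed_knots(dist, 45.0)   # roughly 9.6 knots for this displacement
```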
The discussed process iterated twice for each triplet of images: during the first iteration the backward and nadir images
were analyzed, while during the second iteration the accordance between the nadir and forward images was examined.
Each set of results contained the list of matched patterns, and the final step involved tracking the presence of each
identified pattern in both sets. It is also important to note that each pattern is used only once (during the matching
process); in case more than one pattern matches the same characteristics, the one closest to the initial state is selected.
5. RESULTS AND DISCUSSION
The effectiveness of the discussed methodology has been tested on the triplets of PRISM images, with the proximity
search being limited to radii of 230 m (which corresponds to a speed of 10 nautical miles per hour) and 460 m
(a speed of 20 nautical miles per hour). Figure 2 shows part of a satellite image captured at nadir, overlaid with some
detected and tracked ships. The patterns drawn in red are extracted from the BWD image, the blue from the NDR and the
green from the FWD sensor. Moving ships are captured in different snapshots which, when combined, reveal their track;
along the same lines, the traces of stationary ships fall on top of each other. Figure 2 also illustrates the problem of extracting
patterns very close to the dock or ships at anchor; however, with the deployment of the already discussed methodology this
is solved, as the limits of the coastline are given.
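The chosen search radii follow directly from the assumed speed limit and the 45 s interval between acquisitions (1 nautical mile = 1852 m):

```python
def search_radius_m(max_speed_knots: float, dt_s: float) -> float:
    """Maximum distance a vessel can cover at a given speed limit during
    the interval between two acquisitions."""
    return max_speed_knots * 1852.0 / 3600.0 * dt_s  # knots -> m/s, times seconds

r10 = search_radius_m(10, 45)   # ~231.5 m, matching the 230 m radius in the text
r20 = search_radius_m(20, 45)   # ~463 m, matching the 460 m radius in the text
```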
Figure 2. Example shot where both still and moving ships are successfully tracked
Figure 3 shows instances in which the patterns were correctly matched; the ships are correctly matched regardless of their
difference in speed of movement. Pattern representation with the use of descriptors, and the expression of centroids in
the coordinate system, allow the superposition of the information contained within the images. The impact of the limitations
imposed by satellite images (i.e., the fuzzy/blurry depiction of vessels) may be significantly decreased when the analysis
expands beyond single triplets and previously gathered information is used to increase the reliability of
already produced results. Also, the detection of vessels near non-overlapping regions is flawed, as shown in Figure 4, since
not all three sensors achieve complete coverage of the area.
More interestingly, it has been noticed that the proposed methodology is also valid for the detection of any type of
moving object (large enough) that resides within the captured field of view. Upon the analysis of triplets of the
Larnaca port it was noticed that the patterns drawn by flying aircraft could also be detected, as long as the proximity
search was performed with a radius large enough to capture an aircraft's speed (Figure 5).
Figure 3. Example patterns of successful motion detection and tracking
Figure 4. The incomplete overlap of the images may prevent the tracking of objects that lie outside the fully covered areas
Motion detection from space is possible when images are collected on a regular basis. The implemented experiment
illustrated that the speed and orientation of ships may be calculated adequately, while the trajectory of movement may be
obtained by combining the calculated orientation at each snapshot. Taking it a step further, this could be enhanced with
an estimated arrival time, calculated as the length of the trajectory line to the coast over the average speed of movement.
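Such an estimated-arrival-time calculation could look like the following sketch; the distance and speed values are purely illustrative:

```python
def eta_hours(trajectory_to_coast_m: float, avg_speed_knots: float) -> float:
    """Estimated time to reach the coast: trajectory length over average speed
    (1 nautical mile = 1852 m)."""
    return trajectory_to_coast_m / (avg_speed_knots * 1852.0)

# A vessel 9260 m from the coast moving at an average of 10 knots
eta = eta_hours(9260.0, 10.0)   # 0.5 hours
```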
The obtained results could be improved by employing a more sophisticated method for edge detection. Even though the
use of thresholding significantly reduces the computational complexity and processing time, the result
is highly dependent on the selection of the threshold value (and the overall contrast).
Figure 5. Aircraft automatic detection during landing procedure
Undetected ships result from instances where the ship either lies outside the overlapping region or, due to under-
sampling, its trace changes significantly from one capture to another. Also, most vessels share common shape
properties (i.e., vessels commonly appear as elongated patterns) and operational properties (i.e., limits on acceleration
and deceleration). Such measures may be used as a means of increasing robustness in the presence of noise, which may
be caused by bad weather conditions. The numbers of successfully and unsuccessfully detected artefacts for each triplet
are summarized in Table 4. The presented numbers are the result of manually counting and examining the resultant
images; however, it is important to stress that most misses occurred in the detection of small vessels (with areas of less
than 10 px), which the eye could not identify with certainty. The figures in Table 4 also indicate that the success rate
varied from triplet to triplet. This is mainly due to the fact that the methodology did not consider the differences in the
contrast of the images (so that no additional bias was introduced), the fact that the counting and identification of the
vessels during evaluation was done by eye (no ground truth existed), and finally the fact that the images do not
completely overlap.
Table 4. The number of detected and missed vessels/artefacts in the sea
Scene ID (NDR) Detected Missed Error (%)
ALPSMN231682900 28 4 12.5
ALPSMN271942900 33 5 13.1
ALPSMN254292905 47 4 7.8
ALPSMN267712905 35 3 7.8
6. CONCLUSIONS
The automatic detection of vessels from satellite images is possible despite the challenges imposed; however, it requires
consideration not only of the object's properties (speed of movement, ability to change orientation) but also of the issues
introduced by the large pixel size. Additionally, the accurate and reliable detection of a vessel's track requires the use of
information from multiple, sequential triplets. The more times a pattern is identified, the higher the chance of generating
reliable and plausible conclusions on the trajectory and route of a ship, while the use of several dimensionless metrics
serves the objective of pattern identification relatively well when the depiction of an object is not accurate.
As already mentioned in this paper, an along-track stereoscopic view, combined with an effective automatic motion
detection methodology, can provide the position and velocity of moving vessels. More generally, since ALOS PRISM,
along with any other satellite system, cannot provide near-real-time data, an airborne platform is needed for real-time
surveillance. Thus, a Remotely Piloted Aircraft System (RPAS) could be an option for real-time information: one
capable of carrying an along-track sensor payload, with onboard processing implementing the above methodology and
an appropriate communication channel to send information on the moving vessels (targets) in near real time to a
control center in an appropriate format.
ACKNOWLEDGEMENTS
The authors would like to thank Dr. Takeo Tadono of the Earth Observation Research Center (EORC) at the Japan
Aerospace Exploration Agency (JAXA) for his contribution related to the provision of ALOS PRISM datasets.
REFERENCES
[1] Y. Shubham, T. Rupali, J. Ankush, and B. Roshan, “Crack Detection of Medical Bone Image Using Contrast
Stretching Algorithm with the Help of Edge Detection,” Int. J. Sci. Eng. Technol., vol. 4, no. 3, pp. 223–227,
2015.
[2] T. Tadono, H. Ishida, F. Oda, S. Naito, K. Minakawa, and H. Iwamoto, “Precise Global DEM Generation by
ALOS PRISM,” Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. II–4.
2014.
[3] J. Takaku and T. Tadono, “PRISM On-orbit Geometric Calibration and DSM Performance,” IEEE Trans.
Geosci. Remote Sens., vol. 47, no. 12, pp. 4060–4073, 2009.
[4] R. K. Mishra and Y. Zhang, “Moving Vehicle Extraction from One-Pass WorldView-2 Satellite Imagery,” in
Proceedings of Global Geospatial Conference 2012, 2012.
[5] A. Kääb, “Vehicle velocity from WorldView-2 satellite imagery,” IEEE Data Fusion Contest 2011, 2011.
[6] H. Runge, C. Laux, R. Metzig, and U. Steinbrecher, “Performance analysis of virtual multi-channel ts-x sar
modes,” in Proceedings of EUSAR European Conference on Synthetic Aperture Radar 2006, 2006.
[7] D. Weihing, S. Suchandt, S. Hinz, H. Runge, and R. Bamler, “Traffic Parameter Estimation Using TerraSAR-X
Data,” in Proceedings of the International Society for Photogrammetry and Remote Sensing (ISPRS) Congress,
2008.
[8] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 3rd ed. Prentice Hall, 2008.
[9] T. Caelli, Structural, Syntactic, and Statistical Pattern Recognition: Joint IAPR International Workshops SSPR
2002 and SPR 2002, Windsor, Ontario, Canada, August 6-9, 2002. Proceedings. Springer.
[10] J. Friel, Practical Guide to Image Analysis. ASM International, 2000.
[11] A. Mitiche and J. Aggarwal, Computer Vision Analysis of Image Motion by Variational Methods. 2014.
[12] P. K. Ghosh, “An algebra of polygons through the notion of negative shapes,” CVGIP Image Underst., vol. 54,
no. 1, pp. 119–144, Jul. 1991.
[13] R. W. Sinnott, “Virtues of the Haversine,” Sky and Telescope, vol. 68, no. 2, p. 159, 1984.