Research on Ship Collision Probability Model Based ... - MDPI

17
Citation: VukĆĄa, S.; Vidan, P.; BukljaĆĄ, M.; PaviÂŽ c, S. Research on Ship Collision Probability Model Based on Monte Carlo Simulation and Bi-LSTM. J. Mar. Sci. Eng. 2022, 10, 1124. https://doi.org/10.3390/ jmse10081124 Academic Editor: Sangjin Kim Received: 3 August 2022 Accepted: 9 August 2022 Published: 15 August 2022 Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional afïŹl- iations. Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/). Journal of Marine Science and Engineering Article Research on Ship Collision Probability Model Based on Monte Carlo Simulation and Bi-LSTM Sr ÂŻ dan VukĆĄa 1, * , Pero Vidan 1, *, Mihaela BukljaĆĄ 2 and Stjepan PaviÂŽ c 3 1 Faculty of Maritimes Studies, University of Split, 21000 Split, Croatia 2 Faculty of Transport and TrafïŹc Sciences, University of Zagreb, 10000 Zagreb, Croatia 3 Department of Physics and Chemistry, Faculty of Science, University of Split, 21000 Split, Croatia * Correspondence: [email protected] (S.V.); [email protected] (P.V.); Tel.: +385-91-784-7860 (S.V.) Abstract: The efïŹciency and safety of maritime trafïŹc in a given area can be measured by analyzing trafïŹc density and ship collision probability. Maritime trafïŹc density is the number of ships passing through a given area in a given period of time. It can be measured using vessel tracking systems, such as the Automatic IdentiïŹcation System (AIS). The information provided by AIS is real-time data designed to improve maritime safety. However, the AIS data can also be used for scientiïŹc research purposes to improve maritime safety by developing predictive models for collisions in a research area. This article proposes a ship collision probability estimation model based on Monte Carlo simulation (MC) and bidirectional long short-term memory neural network (Bi-LSTM) for the maritime region of Split. The proposed model includes the processing of AIS data, the veriïŹcation of AIS data, the determination of ports and ship routes, MC and the collision probability, the Bi-LSTM learning process based on MC, the ship collision probability for new or existing routes, and the trafïŹc density. The results of MC, i.e., trafïŹc/vessel route and density, and collision probability for the study area can be used for Bi-LSTM training with the aim of estimating ship collision probability. This article presents the ïŹrst part of research that includes MC in detail, followed by a preliminary result based on one day of processed AIS data used to simulate MC and propose a model architecture that implements Bi-LSTM for ship collision probability estimation. Keywords: automatic identification system (AIS); AIS data processing; collision probability; traffic density modeling; Monte Carlo simulation; bidirectional long short-term memory neural network (Bi-LSTM) 1. Introduction The safe navigation of ships is of global importance. Due to the high density of maritime trafïŹc on some fairways or in certain marine regions, accidents occur more frequently. Therefore, collisions, whether ship-to-ship or ship with ïŹxed objects, cause loss of life, environmental damage, and economic losses, as well as other undesirable events that can lead to catastrophic consequences [1,2]. Despite the great efforts of many international organizations to improve maritime safety, many accidents continue to occur at sea. Most maritime accidents occur in areas with dense trafïŹc and difïŹcult navigational conditions (e.g., northern Baltic Sea, Adriatic Sea, Singapore Strait, etc.) [3,4]. The development of maritime trafïŹc monitoring and control has slowed down in recent years due to the rapid growth of trafïŹc. This is due to trafïŹc monitoring and control not keeping pace with the development of ships with larger dimensions and higher speeds. Therefore, estimates and analysis of trafïŹc density on the main navigational routes are carried out [5]. However, by analyzing the data, it is possible to obtain information about areas with increased trafïŹc density. Therefore, using data from AIS, which is one of the devices used by ships, it is possible to predict potential accident hazard areas [6]. It allows for the identiïŹcation of maritime trafïŹc and the consolidation of ship-tracking data, which can J. Mar. Sci. Eng. 2022, 10, 1124. https://doi.org/10.3390/jmse10081124 https://www.mdpi.com/journal/jmse

Transcript of Research on Ship Collision Probability Model Based ... - MDPI

Citation: VukĆĄa, S.; Vidan, P.; BukljaĆĄ,

M.; Pavic, S. Research on Ship

Collision Probability Model Based on

Monte Carlo Simulation and

Bi-LSTM. J. Mar. Sci. Eng. 2022, 10,

1124. https://doi.org/10.3390/

jmse10081124

Academic Editor: Sangjin Kim

Received: 3 August 2022

Accepted: 9 August 2022

Published: 15 August 2022

Publisher’s Note: MDPI stays neutral

with regard to jurisdictional claims in

published maps and institutional affil-

iations.

Copyright: © 2022 by the authors.

Licensee MDPI, Basel, Switzerland.

This article is an open access article

distributed under the terms and

conditions of the Creative Commons

Attribution (CC BY) license (https://

creativecommons.org/licenses/by/

4.0/).

Journal of

Marine Science and Engineering

Article

Research on Ship Collision Probability Model Based on MonteCarlo Simulation and Bi-LSTMSrdan VukĆĄa 1,* , Pero Vidan 1,*, Mihaela BukljaĆĄ 2 and Stjepan Pavic 3

1 Faculty of Maritimes Studies, University of Split, 21000 Split, Croatia2 Faculty of Transport and Traffic Sciences, University of Zagreb, 10000 Zagreb, Croatia3 Department of Physics and Chemistry, Faculty of Science, University of Split, 21000 Split, Croatia* Correspondence: [email protected] (S.V.); [email protected] (P.V.); Tel.: +385-91-784-7860 (S.V.)

Abstract: The efficiency and safety of maritime traffic in a given area can be measured by analyzingtraffic density and ship collision probability. Maritime traffic density is the number of ships passingthrough a given area in a given period of time. It can be measured using vessel tracking systems,such as the Automatic Identification System (AIS). The information provided by AIS is real-timedata designed to improve maritime safety. However, the AIS data can also be used for scientificresearch purposes to improve maritime safety by developing predictive models for collisions in aresearch area. This article proposes a ship collision probability estimation model based on MonteCarlo simulation (MC) and bidirectional long short-term memory neural network (Bi-LSTM) for themaritime region of Split. The proposed model includes the processing of AIS data, the verification ofAIS data, the determination of ports and ship routes, MC and the collision probability, the Bi-LSTMlearning process based on MC, the ship collision probability for new or existing routes, and the trafficdensity. The results of MC, i.e., traffic/vessel route and density, and collision probability for the studyarea can be used for Bi-LSTM training with the aim of estimating ship collision probability. Thisarticle presents the first part of research that includes MC in detail, followed by a preliminary resultbased on one day of processed AIS data used to simulate MC and propose a model architecture thatimplements Bi-LSTM for ship collision probability estimation.

Keywords: automatic identification system (AIS); AIS data processing; collision probability;traffic density modeling; Monte Carlo simulation; bidirectional long short-term memory neuralnetwork (Bi-LSTM)

1. Introduction

The safe navigation of ships is of global importance. Due to the high density ofmaritime traffic on some fairways or in certain marine regions, accidents occur morefrequently. Therefore, collisions, whether ship-to-ship or ship with fixed objects, cause lossof life, environmental damage, and economic losses, as well as other undesirable events thatcan lead to catastrophic consequences [1,2]. Despite the great efforts of many internationalorganizations to improve maritime safety, many accidents continue to occur at sea. Mostmaritime accidents occur in areas with dense traffic and difficult navigational conditions(e.g., northern Baltic Sea, Adriatic Sea, Singapore Strait, etc.) [3,4]. The development ofmaritime traffic monitoring and control has slowed down in recent years due to the rapidgrowth of traffic. This is due to traffic monitoring and control not keeping pace with thedevelopment of ships with larger dimensions and higher speeds. Therefore, estimates andanalysis of traffic density on the main navigational routes are carried out [5].

However, by analyzing the data, it is possible to obtain information about areas withincreased traffic density. Therefore, using data from AIS, which is one of the devices usedby ships, it is possible to predict potential accident hazard areas [6]. It allows for theidentification of maritime traffic and the consolidation of ship-tracking data, which can

J. Mar. Sci. Eng. 2022, 10, 1124. https://doi.org/10.3390/jmse10081124 https://www.mdpi.com/journal/jmse

J. Mar. Sci. Eng. 2022, 10, 1124 2 of 17

also be used for scientific research purposes [7]. The AIS data transmission depends onthe type of AIS station, the group of messages, the navigational status, the speed, and thecourse change of the ships. Reporting intervals vary from 2 s to 6 min [7–9]. AIS dataare increasingly used in certain areas of maritime research, such as ship collisions andgroundings probability, traffic monitoring, marine pollution, etc. [10,11].

AIS was originally developed to enable the identification of ships to improve commu-nication and, thus, the safety of navigation and maritime security. In combination withRadio Detection and Ranging (RADAR) and Electronic Chart Display and InformationSystem (ECDIS), it is of great importance to navigational officers and Vessel Traffic Service(VTS) operators. Over the past decade, AIS data have been collected and used to analyzeand study traffic in specific maritime regions [12]. Unless it is a satellite AIS, the data forthe specific maritime regions are limited by the range of the very high frequency (VHF)antenna. Therefore, the data used in this study refer to the maritime region of Split shownin Figure 1. The range of the AIS antenna defines the research area. However, such a studycan be conducted for any maritime region for which AIS data are available.

J. Mar. Sci. Eng. 2022, 10, x FOR PEER REVIEW 2 of 18

However, by analyzing the data, it is possible to obtain information about areas with increased traffic density. Therefore, using data from AIS, which is one of the devices used by ships, it is possible to predict potential accident hazard areas [6]. It allows for the iden-tification of maritime traffic and the consolidation of ship-tracking data, which can also be used for scientific research purposes [7]. The AIS data transmission depends on the type of AIS station, the group of messages, the navigational status, the speed, and the course change of the ships. Reporting intervals vary from 2 s to 6 min [7–9]. AIS data are increasingly used in certain areas of maritime research, such as ship collisions and groundings probability, traffic monitoring, marine pollution, etc. [10,11].

AIS was originally developed to enable the identification of ships to improve com-munication and, thus, the safety of navigation and maritime security. In combination with Radio Detection and Ranging (RADAR) and Electronic Chart Display and Information System (ECDIS), it is of great importance to navigational officers and Vessel Traffic Service (VTS) operators. Over the past decade, AIS data have been collected and used to analyze and study traffic in specific maritime regions [12]. Unless it is a satellite AIS, the data for the specific maritime regions are limited by the range of the very high frequency (VHF) antenna. Therefore, the data used in this study refer to the maritime region of Split shown in Figure 1. The range of the AIS antenna defines the research area. However, such a study can be conducted for any maritime region for which AIS data are available.

Figure 1. Adapted from [13]; maritime region of Split.

The problem of maritime coastal regions is the variations in traffic density and char-acteristics, i.e., how these variations on existing or newly formed navigational routes af-fect the ship collision probability or other accidents at sea. This paper presents research that can be used for a ship collision probability model based on MC and Bi-LSTM. How-ever, this is the first part of the research that focuses on the AIS data processing and MC. Therefore, the acquisition and processing of the “raw” AIS data will be explained first, followed by the data validation using web scraping. In addition, the research area is di-vided into cells and a traffic density map is created. A database is then created containing the AIS data for all cells. From the database, the distributions for the cells are created and used for MC. Next, the MC is explained in detail, which uses bivariate Gaussian distribu-tions to select starting waypoints for ship routes. The results of MC provide information and create a database for collision probability, ship routes, and traffic density. Finally, the architecture of a model that can be used for collision probability using Bi-LSTM is pro-posed.

Figure 1. Adapted from [13]; maritime region of Split.

The problem of maritime coastal regions is the variations in traffic density and charac-teristics, i.e., how these variations on existing or newly formed navigational routes affectthe ship collision probability or other accidents at sea. This paper presents research that canbe used for a ship collision probability model based on MC and Bi-LSTM. However, this isthe first part of the research that focuses on the AIS data processing and MC. Therefore, theacquisition and processing of the “raw” AIS data will be explained first, followed by thedata validation using web scraping. In addition, the research area is divided into cells anda traffic density map is created. A database is then created containing the AIS data for allcells. From the database, the distributions for the cells are created and used for MC. Next,the MC is explained in detail, which uses bivariate Gaussian distributions to select startingwaypoints for ship routes. The results of MC provide information and create a database forcollision probability, ship routes, and traffic density. Finally, the architecture of a modelthat can be used for collision probability using Bi-LSTM is proposed.

2. Literature Review

Initially, an overview of two papers related to the research area is given, in whichAIS data are used to analyse maritime traffic. LuĆĄic et al. [14] described the area of thecentral Adriatic Sea, which includes the maritime region of Split. Maritime traffic in thisarea is analysed for the period 2014 and 2015. The AIS data were used as input variablesfor the Inland Water Risk Assessment Program (IWRAP Mk2). The model determines theprobability of collisions and groundings during this period for the area around PalagruĆŸa

J. Mar. Sci. Eng. 2022, 10, 1124 3 of 17

Island. The initial results in this article did not indicate a higher risk. However, whenadditional variables are added, such as the prevailing navigational routes and the presenceof smaller vessels for which AIS is not mandatory, the risk of vessel collisions increasessignificantly. BukljaĆĄ et al. [15] addressed the analysis of maritime traffic in the Strait ofSplit based on AIS data. The article presents a method for determining the number ofsimultaneous ship passages through the Strait of Split and information about these shipsfor five randomly selected days in summer.

A significant contribution in this field was made by Pedersen [16], who is a pioneerin the field of research on the risk of ship collisions and groundings. Analysis of marineaccident statistics shows that collisions and groundings are the most common marineaccidents. The main objective of his research was to identify optimal and economic (cost-effective) measures to improve prevention and reduce the probability of such accidents.

Svanberg et al. [17] described, in detail, how AIS data are used for scientific researchpurposes. The most commonly used data in research are contained in AIS messages oftypes 1–3 and 5, namely: Maritime Mobile Service Identity (MMSI), navigational status,rate of turn, Speed Over Ground (SOG), position accuracy, longitude, latitude, Course OverGround (COG), true course, timestamp, call sign, vessel name, vessel dimensions, EstimatedTime of Arrival (ETA), draught, and destination. The paper provides a structured andcomprehensive overview of the application of AIS data in the maritime domain, methodsto determine the ship collision probability, one of which is MC. Zhang et al. [18] presenteda model for assessing the near-miss collision risk of multiple vessels in the open waterbased on sufficient kinematic data of the ships from AIS data. Wang et al. [19] presenteda new method for analysing vessel traffic with respect to speed limits using AIS data.The authors explain, in detail, how to clean the data, geocode the AIS data to waterwaysegments, calculate ship traffic characteristics, and estimate ship speed and traffic volume.Vestre et al. [20] presented a method for creating a database of near-collision situationsusing processed raw AIS data by implementing a closest-point-of-approach framework.Gil et al. [21] examined the typical empirical values of bow crossing range (BCR) and thefactors that influence it using AIS data over a 10-year period.

Mou et al. [1] focused on the traffic separation scheme area in the port of Rotterdamand performed an analysis of ship collision analysis using AIS data and a linear regressionmodel. They attempted to determine the relationship between a ship closest point ofapproach as a key factor in collision avoidance and vessel size, speed, and course. Inthe study by Nowy et al. [22], an analysis of vessel traffic flow based on AIS data wasperformed using statistical methods. The area of the port of Swinoujscie was used as acase study to show the distribution of maritime traffic and its parameters. In the articleabout the classification of ships and their trajectories based on AIS data, the authors classifyand categorise the ships in the Indonesian maritime area [23]. Based on characteristics,such as length, width, and deadweight, the ships are classified into nine types. The dataare analysed to determine the ship routes. The application of the IWRAP Mk2 model toidentify risk areas for ship collisions is presented in a case study for the Gulf of Finland [24].The authors describe the drawbacks of the model, due to the lack of a verification process,and propose a new framework for assessing ship collision probability and identifying riskareas. A new, clear, and streamlined approach is also presented that includes a verificationprocess that can be applied to any collision probability model that uses tools similar toIWRAP Mk2. Wolsing et al. [25] provided a literature review on anomaly detection inmaritime AIS tracks. Based on a review of 44 articles, the authors identify several typesof anomalies.

Coric et al. [26] provided an overview of the research and models used to determinethe ship collision probability. The models used to determine the collision probability areclassified into different categories and their main characteristics are described. Twenty-ninemodels were examined and divided into AIS data-processing-based models, simulations,and analytical models. This review article can serve as a good basis for improving theexisting, and developing a new, ship collision probability model.

J. Mar. Sci. Eng. 2022, 10, 1124 4 of 17

The use of MC is well illustrated in a study for the Gulf of Finland [27,28]. The authorspropose a model/method for estimating the ship collision probability. The model candetermine the expected number of accidents/collisions, areas, and times when accidents aremost likely to occur and displays input data (AIS) that can be used for future research andmodels dealing with expected consequences, i.e., probability. MC is one of the approachesused by Huang et al. [29] to evaluate the safety of navigation in approaching channelsbased on AIS data. The risk probability results showed consistency with another crossing-line method used. From the following articles, it can be seen that MC has been used inmaritime research, whether in the study of traffic density and flows, ship collisions andgroundings, pollution, or ship maintenance [30–33]. An interesting model that combines theMarkov model and MC to estimate the marine accident risks is presented by Jon et al. [34].The advantage of this method is its simplicity, its flexibility in application to any type ofmaritime accidents, and the fact that the method does not require extensive data collection.Xin et al. [35] used MC to estimate collision risk and ensure accuracy and efficiency ofassessments, based on AIS data. The results were found to be effective and reliable ina complex dynamic situation. Moreover, MC was compared with some other methodsand found to be the least restrictive. An example of using MC in a hybrid model ispresented by Zhang et al. [36]. The method incorporates big data (AIS) analytics fordetermination of ship collision probability and the damage stability that may lead to theserious flooding. The collision risk estimation involves three steps: reconstruction of vesseltrajectories of the struck vessels using K-means for static voyage clustering and density-based spatial clustering of applications with noise (DB-SCAN), clustering of collisionscenarios, identification of clustered collision scenarios using the proposed avoidancebehaviour-based collision detection model, and evaluation of collision breaches using ananalysis tool named SHARP. The application of the non-zonal MC to damage stabilityanalysis for the design draft is also presented. The recent research by Ma et al. [37], basedon AIS data, presented a calculation for collision probability (ship-bridge) by MC andBayesian networks. The results of the proposed model show a promising evaluation ofsafety measures, which may be valuable for implementing preventive safety measuresfor ships passing under a bridge. In addition to bridges, offshore installations, suchas wind turbines, could also increase the collision probability in congested traffic areas.Liu et al. [38] presented a numerical study of offshore wind turbines rammed by ships todetermine the most damaging impact scenarios.

It is important to highlight two studies on the ship domain-based model of collisionrisk. Due et al. [39] made an important contribution to a new concept in the dynamic shipdomain and its analysis. The concept of this risk-perception-based ship domain (SD) isbased on the timing of ships taking evasive manoeuvres and the associated perceived risklevels using a large amount of data from AIS. Compared to existing distance-based shipdomains, this domain is dynamic. The paper also demonstrates the application of theDouglas Peucker algorithm (DP) and the non-linear velocity obstacles (NLVO) algorithm.The results show that the proposed risk-perception-based SD can provide an understandingof the ship-to-ship collision avoidance process. Szlapczynski and Szlapczynska [40] alsoproposed a ship domain-based model for collision risk, including five decision variables.

Ji et al. [41] developed a dynamic adaptive threshold grating compression algorithmfor trajectory compression based on AIS data. Due to the large amount of data, such amethod is important to compress the data more easily and process it more effectively andwith better quality. Wang et al. [42] proposed the detection of an abnormal ship trajectoriesbased on complex polygon using AIS data. They showed that anomalies depend on shiptype, time period, weather conditions, and traffic characteristics.

The application of AIS data in a neural-network-based model for traffic monitoringis presented in [43]. The authors propose a deep learning model that reduces memoryredundancy and computational requirements. The authors used trajectory reconstruction,noise generated by detection, and vessel type from AIS data. The performance of theproposed deep learning model was empirically evaluated. The results showed the ability

J. Mar. Sci. Eng. 2022, 10, 1124 5 of 17

to process AIS data containing noise and irregular time samples. The article [44] proposes amodel based on Ship Traffic Extraction Network (STENet), a neural network for predictingmaritime traffic in the caution area. The network was trained using AIS data collectedover a 2-year period in the Yeosu region of South Korea. The article presents four modelsthat predict future traffic in the area in 20, 30, 40, and 50 min. The results show over50% improvement in traffic prediction. Zhang et al. [45] presented a new approach basedon image recognition using Convolutional Neural Networks (CNNs). The authors questionwhether such a CNN-based model can quickly and accurately interpret images based onAIS data to analyse and classify collision risk. They conclude that CNN can interpret imagesand that adding AIS data to the images improves model prediction. A neural-network-based method for predicting ship trajectories using AIS data is presented in [46]. For shipautonomy, it is necessary to know the position of the ship, the actual trajectory of theship, and the direction of motion in order to determine the correction to the ship’s courserequired to return to the planned route. The possibilities of applying neural networksto create such a correction system using only the coordinates of the ship’s position arediscussed. It was found that this is possible on the sections of the route where the ship ison the steady course. Murray et al. [47] described an AIS data processing method usedfor the neural-network-based model, recurrent neural networks. The authors present adeep learning framework that would support the prediction of ship behaviour based onthe collected AIS data.

To recover missing or incomplete vessel monitoring data, Yuan et al. [48] combinedstatistical analysis, data mining, and LSTM neural network to develop a model. TheLSTM neural network was implemented for three important tasks: vessel trajectory repair,engine speed modelling, and fuel consumption prediction. A new risk assessment modelthat can be used to plan preventive measures for collision accidents is presented in [49].Zou et al. [50] proposed research on collision avoidance of navigational marks based onLSTM. The authors used an LSTM neural network for ship trajectory prediction. Theexperimental results showed high prediction accuracy using the proposed LSTM model.The improved recurrent neural network (RNN), like the LSTM neural network, has a widerange of applications. The advantage of the LSTM network is its ability to remember pastdata over a long period of time, its good learning ability, and its ability to handle nonlinearproblems. It uses evidence consequences, a decision tree model, and a back-propagationneural network. He et al. [51] introduced Gated Recurrent Unit (GRU) and bidirectionalrecurrent neural networks (BRNNs) in a Current Information Mining Model (CIMM) usingAIS data. A machine learning model for ship-to-ship collision risk based on big (AIS)and hydrometeorological data and current traffic conditions is presented in [52]. Thestrong development and progress in maritime traffic monitoring based on the collectionof data from various sources, such as AIS, RADAR, Long Range Identification & Tracking(LRIT), and satellite RADAR imagery, opens opportunities for the development of artificialintelligence models. Sawada et al. [53] proposed a new approach for collision avoidancebased on a redesign network using Long Short Memory (LSTM) cell. The result showedthat the method used was suitable in the continuous action spaces but tended to have lowcourse-keeping ability. The model incorporating heading and speed changes was proposedas future work.

To address the problem of low accuracy and high complexity in models for shiptrajectory estimation and collision risk evaluation, Abebe et al. [54] proposed a hybridAutoregressive Integrated Moving Average-Long Short-Term Memory (ARIMA-LSTM)model based on AIS data. Zhou et al. [55] proposed the introduction of a particle swarm op-timization (PSO) algorithm to optimise the LSTM model and determine a better predictionof collision risk. The results showed that LSTM based on a particle swarm optimization(PSO-LSTM) model can handle complex problems with higher accuracy.

AIS trajectory data restoration based on Bi-directional Long Short-Term Memory Re-current Neural Networks (Bi-LSTM-RNNs) was presented in [56]. The model architectureis presented and results indicated that the method is feasible and more accurate for water-

J. Mar. Sci. Eng. 2022, 10, 1124 6 of 17

ways with more complex geometry. A ship trajectory prediction model, which can be usedto develop collision avoidance algorithms, is proposed in [57]. The processing of AIS data(over a period of 14 days) by scaling and spectral clustering is presented and the applicationof Bi-LSTM for model development is proposed. The results of the model showed higheraccuracy compared to two other similar methods. The authors suggest future improvementin data processing and application inefficiency.

From the analysis of the available literature, we may summarize that the proposedhybrid model that introduces the use of AIS data processing, MC, ship domain, and Bi-LSTM is feasible and that such models will be increasingly used to assess navigation safetyin the future, especially as the level of ship autonomy increases.

3. Methodology

In order to effectively examine ship collision probability, various research methodswill be utilized. The research methodology includes the following:

‱ Method for collecting and processing the “raw” AIS data;‱ Data validation using web scraping with a comparative method;‱ Mathematical methods and software solutions create a map of maritime traffic density

(research area);‱ Description method describes the relevant data needed to build the model;‱ The analysis method selects the relevant data (input variables) for the research area

and creates a database for MC;‱ The statistical method determines the distributions of maritime traffic for the research

area from the database;‱ The MC calculates the ship collision probability for the research area;‱ The modelling method will be used to create a neural network that learns by synthesis

based on the results of the MC;‱ The deductive method is used to determine the ship collision probability for the

studied area as a function of navigation routes and traffic density.

The detailed configuration parameters of the software and hardware environmentused for the model development are presented in Table 1.

Table 1. Software and hardware environment.

Hardware SoftwareCPU Ryzen 7 3700x

GPU RTX 3080ti

RAM 32.00 GB

OS Linux

Programming Language Python 3.7.3

IDE Pycharm

Framework Jax

The proposed model of AIS data processing, MC, and collision probability estimationconsists of four steps:

‱ Step 1—AIS data processing (the upper part in Figure 2);‱ Step 2—determination of port and ship route (the lower part in Figure 2);‱ Step 3—process of MC (Figure 3);‱ Step 4—estimation of collision probability based on the results of MC (Figure 4).

The AIS data are received and stored in what is called “raw” format, i.e., as NationalMarine Electronics Association (NMEA) files. The NMEA prescribes standards for theinterconnection of marine electronics. AIS stations use internationally accepted maritimedigital interfaces and data sentences to exchange data between other devices, systems, ornetworks [58]. Data have to be processed, i.e., decoded with an application AisDecoder,

J. Mar. Sci. Eng. 2022, 10, 1124 7 of 17

to be used in the model and to determine the collision probability. Therefore, in Figure 2(Step 1), the NMEA file is imported and configured according to the AisDecoder usermanual [59]. Further, the AIS messages of types 1–3 and 5 were selected because theycontain input variables for MC [27,28,35]. These variables are:

‱ The ship MMSI;‱ The timestamps;‱ The ship position;‱ The routes;‱ The ship speed;‱ The ship type;‱ The ship dimensions (length and width).

Finally, the processed AIS data are stored in a comma separated value (.csv) file.However, due to the AIS data vulnerability, the verification for the processed data wasperformed using web scraping (comparing and extracting data from the website). Next, inFigure 2 (Step 2), ports are determined by checking the time of stopping and the speed ofships along with parallel sorting of ship positions. The ports include the anchor or driftpositions of the ships and do not have to be within the port limits. Parallel sorting meansthat ports and positions are determined at the same time, since there are vessels in theresearch area that pass the same route multiple times on the same day (same route andopposite course). Then, everything that does not belong to the research area is deleted andthe routes are created using the Ramer Douglas Peucker algorithm (RDP) [60].

J. Mar. Sci. Eng. 2022, 10, x FOR PEER REVIEW 7 of 18

The proposed model of AIS data processing, MC, and collision probability estimation consists of four steps: ‱ Step 1—AIS data processing (the upper part in Figure 2); ‱ Step 2—determination of port and ship route (the lower part in Figure 2); ‱ Step 3—process of MC (Figure 3); ‱ Step 4—estimation of collision probability based on the results of MC (Figure 4).

k: = 0

INPUT – AIS DATA FOR SPECIFIC DAY

LOOP BY DAYS

CLEAR INCOPLETE DATA – SORTING BY TIME & MMSI

WEB SCRAPING BY MMSI – COMPLETITION OF MISSING DATA

STEP

1

PORT DETECTION BASED ON LONGER PERIOD OF SHIPS

STOPPAGE

SORTING OF SHIP POSITIONS PARALLEL WITH PORT DETECTION

DELETING EVERYTHING WHAT IS OUTSIDE OF MAP – SPLIT MARITIME

REGION

CREATING A ROUTE FOR EACH SHIP USING RAMER-DOUGLEAS-

PEUCKER ALGORITHM

STEP

2

1

Figure 2. Flowchart for AIS data processing (Step 1) and port and ship route determination (Step 2). Figure 2. Flowchart for AIS data processing (Step 1) and port and ship route determination (Step 2).

J. Mar. Sci. Eng. 2022, 10, 1124 8 of 17J. Mar. Sci. Eng. 2022, 10, x FOR PEER REVIEW 8 of 18

DEFINING MAP OF CELLS FOR RESEARCH AREA

DETERMINING STATISTICAL OCUPATIO0N OF EACH CHELL

START LOOP TO GENERATE THE NUMBER OF SHIPS, THEIR COURSES, SPEEDS, TYPES AND

TIMES

i: = 0

RANDOM GENERATOR FROM DISTRIBUTION OF CELL OCUPATION

CELL SELECTION

CREATING STATISTICAL

DISTRIBUTION OF PASSAGE TIME IN A

CELL

CREATING UNIFORM ROUTE DISTRIBUTION

IN A CELL

RANDOM GENERATOR

FROM DISTRIBUTON

RANDOM GENERATOR

FROM DISTRIBUTON

SELECTED DEPARTURE TIME

SELECTED ROUTE

SHIP READY FOR THE ROUTE PASSAGE – GENERATED FROM THE DISTRIBUTIONS

SELECTED LENGHT

STATISTICAL DISTRIBUTION OF

SHIP TYPES IN A CELL

STATISTICAL DISTRIBUTION OF

LENGHT, WIDTH AND SPEED FOR GIVEN SHIP

TYPE

SELECTED WIDTHSELECTED SPEED

RANDOM GENERATOR FROM EACH SEPERATE

DISTRIBUTION

RANDOM GENERATOR

FROM DISTRIBUTON

SELECTED TYPE OF SHIP

EMPIRICALLY DETERMINED MEAN VALUES AND STANDARD

DEVIATIONS FOR THE BI-VARIANT GAUSSIAN FOR EACH

ROUTE WP

GENERATING POISSON DISTRIBUTION DURING DELAYS AND

CORRECTING PORT TIME

START OF THE LOOP TO GENERATE THE NEXT WAYPOINT COORDINATES

ALONG THE ROUTE

j: = 0

CREATING A LIST OF POSITIONS, TIME AND ORIENTATION WITH RESPECT TO

A GIVEN SPEED AND MINIMUM SPATIAL STEP

AT THE END OF THE LOOP OF ONE SHIP, SAVE ALL DATA AND CONTINUE THE MAIN LOOP

TESTING IF THE SHIP IS IN PORT

DETERMINING THE RADIUS OF BI-VARIANT GAUSSIAN WITH

RESPECT TO THE LAND AND ISLANDS OF EACH SUBSEQUENT

WAYPOINT

SELECTION OF STARTING LAT. & LONG. WITH BI-VARIANT GAUSSIAN

GENERATE THE NEXT WAYPOINT

STEP

3

1

Figure 3. MC flowchart (Step 3). Figure 3. MC flowchart (Step 3).

J. Mar. Sci. Eng. 2022, 10, 1124 9 of 17J. Mar. Sci. Eng. 2022, 10, x FOR PEER REVIEW 9 of 18

INPUT SAVED SIMULATION DATA

SORT BY TIME

STEP

4

TESTING IF TWO SHIPS ARE

IN THE SAME POSITION FOR A SAME TIME INTERVAL

(FOR A GIVEN MAXIMUMCOLLISION RADIUS AND TIME PERIOD)

YES

LOG DATA

NO

END

Figure 4. Collision probability estimation flowchart based on MC results (Step 4).

The AIS data are received and stored in what is called “raw” format, i.e., as National Marine Electronics Association (NMEA) files. The NMEA prescribes standards for the in-terconnection of marine electronics. AIS stations use internationally accepted maritime digital interfaces and data sentences to exchange data between other devices, systems, or networks [58]. Data have to be processed, i.e., decoded with an application AisDecoder, to be used in the model and to determine the collision probability. Therefore, in Figure 2 (Step 1), the NMEA file is imported and configured according to the AisDecoder user manual [59]. Further, the AIS messages of types 1–3 and 5 were selected because they contain input variables for MC [27,28,35]. These variables are: ‱ The ship MMSI; ‱ The timestamps; ‱ The ship position; ‱ The routes; ‱ The ship speed; ‱ The ship type; ‱ The ship dimensions (length and width).

Finally, the processed AIS data are stored in a comma separated value (.csv) file. However, due to the AIS data vulnerability, the verification for the processed data was performed using web scraping (comparing and extracting data from the website). Next, in Figure 2 (Step 2), ports are determined by checking the time of stopping and the speed of ships along with parallel sorting of ship positions. The ports include the anchor or drift positions of the ships and do not have to be within the port limits. Parallel sorting means that ports and positions are determined at the same time, since there are vessels in the research area that pass the same route multiple times on the same day (same route and opposite course). Then, everything that does not belong to the research area is deleted and the routes are created using the Ramer Douglas Peucker algorithm (RDP) [60].

In Figure 3 (Step 3), MC is performed. The first part is to divide the research area into arbitrary number of cells. Therefore, for this model, the research area is divided into 91 cells. Next, the statistical occupancy for each cell is determined and an iteration of 10,000

Figure 4. Collision probability estimation flowchart based on MC results (Step 4).

In Figure 3 (Step 3), MC is performed. The first part is to divide the research area intoarbitrary number of cells. Therefore, for this model, the research area is divided into 91 cells.Next, the statistical occupancy for each cell is determined and an iteration of 10,000 shipsis performed on one day of AIS data. In each iteration, a random cell is selected and theship is generated based on the statistical data. The statistical data consist of distributionsrelated to ship types, ship speed, ship width and length, and passage time within the cell.In addition, a navigational route is also selected by a random generator from the statisticalroute distribution of a selected cell. Next, to generate more realistic waypoints for shiproutes a bi-variant Gaussian is used since each waypoint consists of two variables, latitudeand longitude. In addition, the Poisson distribution was implemented to include possibleship delays in port departure and arrival times in the simulation. Finally, the generatedship and its route are saved and the process continues.

Once MC is complete, the ship collision probability is determined by testing whethertwo ships are in the same position during the same time period, taking into account theship domain as shown in Figure 4 (Step 4).

The results of MC are the traffic density, ship routes, and ship collision probability forresearch area. Since all of these variables are time dependent, they are most suitable as inputvariables for the Bi-LSTM. Therefore, the Bi-LSTM can be trained based on these results.The purpose of the Bi-LSTM is to estimate the collision probability for new or existingnavigational routes and changes in traffic density. Figure 5 proposes the architecture of amodel that can be used to estimate ship collision probability using MC and Bi-LSTM.

J. Mar. Sci. Eng. 2022, 10, 1124 10 of 17

J. Mar. Sci. Eng. 2022, 10, x FOR PEER REVIEW 10 of 18

ships is performed on one day of AIS data. In each iteration, a random cell is selected and the ship is generated based on the statistical data. The statistical data consist of distribu-tions related to ship types, ship speed, ship width and length, and passage time within the cell. In addition, a navigational route is also selected by a random generator from the statistical route distribution of a selected cell. Next, to generate more realistic waypoints for ship routes a bi-variant Gaussian is used since each waypoint consists of two variables, latitude and longitude. In addition, the Poisson distribution was implemented to include possible ship delays in port departure and arrival times in the simulation. Finally, the generated ship and its route are saved and the process continues.

Once MC is complete, the ship collision probability is determined by testing whether two ships are in the same position during the same time period, taking into account the ship domain as shown in Figure 4 (Step 4).

The results of MC are the traffic density, ship routes, and ship collision probability for research area. Since all of these variables are time dependent, they are most suitable as input variables for the Bi-LSTM. Therefore, the Bi-LSTM can be trained based on these results. The purpose of the Bi-LSTM is to estimate the collision probability for new or existing navigational routes and changes in traffic density. Figure 5 proposes the architec-ture of a model that can be used to estimate ship collision probability using MC and Bi-LSTM.

AIS data Monte Carlo simulation

Traffic density

Ship routes

Collision probability

Bi-LSTM

New or existing route with with variable

traffic density

Collision probability

Training dataset

Validation dataset

Test dataset

Figure 5. Proposed block diagram of a model for collision probability estimation using MC and Bi-LSTM.

4. Preliminary Results and Discussions Preliminary results are based on the decoded NMEA file for one day in summer 2019.

This chapter provides details about MC and the database creation process, which includes traffic density, ship routes, and ship collision probability.

After decoding the AIS data, as explained in the methodology, Figure 2 (Step 1), to create a map of navigational route density, it is necessary to define the coastline bounda-ries for the research area. The coastline boundaries were established by drawing lines along the coast, creating an object with its coordinates, indicating the coastal area. Figure 6 shows an example of the defined positions of the coastlines for two islands. In the same way, all coastlines were extracted and integrated into the Python program code to create a navigational route density map. This is necessary to avoid the ship routes running over ground.

Figure 5. Proposed block diagram of a model for collision probability estimation using MC and Bi-LSTM.

4. Preliminary Results and Discussions

Preliminary results are based on the decoded NMEA file for one day in summer 2019.This chapter provides details about MC and the database creation process, which includestraffic density, ship routes, and ship collision probability.

After decoding the AIS data, as explained in the methodology, Figure 2 (Step 1), tocreate a map of navigational route density, it is necessary to define the coastline boundariesfor the research area. The coastline boundaries were established by drawing lines alongthe coast, creating an object with its coordinates, indicating the coastal area. Figure 6shows an example of the defined positions of the coastlines for two islands. In the sameway, all coastlines were extracted and integrated into the Python program code to createa navigational route density map. This is necessary to avoid the ship routes runningover ground.

J. Mar. Sci. Eng. 2022, 10, x FOR PEER REVIEW 11 of 18

(a) (b)

Figure 6. Coastline boundaries positions: (a) Island of Brac; (b) Island of Vis.

To reduce computation time due to the large number of waypoints for each ship route, the RDP algorithm is used to generate a similar ship route with fewer waypoints. Thus, once the coastlines are defined, ship routes are created (density map) using the RDP algorithm and stored by MMSI number for each ship, as shown in Figure 2 (Step 2), show-ing the methodology. Figure 7 shows a density map based on processed AIS data.

Figure 7. Density map based on processed AIS data.

In the following, it is necessary to divide the research area into a grid and determine the statistical occupancy of each cell. For this purpose, the research area is arbitrarily di-vided into a 7 × 13 grid, as shown in Figure 8.

Figure 6. Coastline boundaries positions: (a) Island of Brac; (b) Island of Vis.

To reduce computation time due to the large number of waypoints for each shiproute, the RDP algorithm is used to generate a similar ship route with fewer waypoints.Thus, once the coastlines are defined, ship routes are created (density map) using theRDP algorithm and stored by MMSI number for each ship, as shown in Figure 2 (Step 2),showing the methodology. Figure 7 shows a density map based on processed AIS data.

J. Mar. Sci. Eng. 2022, 10, 1124 11 of 17

J. Mar. Sci. Eng. 2022, 10, x FOR PEER REVIEW 11 of 18

(a) (b)

Figure 6. Coastline boundaries positions: (a) Island of Brac; (b) Island of Vis.

To reduce computation time due to the large number of waypoints for each ship route, the RDP algorithm is used to generate a similar ship route with fewer waypoints. Thus, once the coastlines are defined, ship routes are created (density map) using the RDP algorithm and stored by MMSI number for each ship, as shown in Figure 2 (Step 2), show-ing the methodology. Figure 7 shows a density map based on processed AIS data.

Figure 7. Density map based on processed AIS data.

In the following, it is necessary to divide the research area into a grid and determine the statistical occupancy of each cell. For this purpose, the research area is arbitrarily di-vided into a 7 × 13 grid, as shown in Figure 8.

Figure 7. Density map based on processed AIS data.

In the following, it is necessary to divide the research area into a grid and determinethe statistical occupancy of each cell. For this purpose, the research area is arbitrarilydivided into a 7 × 13 grid, as shown in Figure 8.

J. Mar. Sci. Eng. 2022, 10, x FOR PEER REVIEW 12 of 18

Figure 8. Cell occupation for one day.

Furthermore, the cell occupancy (CO) is calculated and presented as a matrix in Equation (1). Each element in the matrix CO corresponds to a cell in the grid, where the first element in the matrix represents the first cell in the grid, i.e., the top left cell in Figure 8. Then, each element in the CO matrix indicates the number of ships in the corresponding cell. From Equation (1), the seventh element in the matrix indicates the highest number of ships, i.e., 26 that correspond to the seventh cell in the grid in Figure 8, which covers the port of Split.

CO =

⎣⎱⎱⎱⎱⎱⎡4 0 4 1 8 5 26 0 0 0 0 0 01 3 4 7 2 3 2 4 2 0 0 0 02 1 1 2 1 5 13 0 0 2 1 1 00 0 0 0 0 18 13 0 0 2 1 1 00 0 1 0 3 9 1 0 0 0 0 0 00 0 1 0 3 9 1 0 0 0 0 0 00 0 0 0 0 2 0 0 0 0 0 0 0⎊⎄⎄

⎄⎄⎄⎀ (1)

The implementation of a grid is important in order to have the possibility to select only cells of interest, e.g., Strait of Split, port areas, etc. Another advantage of the grid is the faster computational time. Statistical distributions can be created from the CO matrix. Therefore, Figure 9 shows an example of the distributions generated for a selected cell, i.e., ship speed, length, and time a ship passes the cell. Similar distributions are also gen-erated for the ship type, width, and navigational route.

Figure 8. Cell occupation for one day.

Furthermore, the cell occupancy (CO) is calculated and presented as a matrix inEquation (1). Each element in the matrix CO corresponds to a cell in the grid, where thefirst element in the matrix represents the first cell in the grid, i.e., the top left cell in Figure 8.Then, each element in the CO matrix indicates the number of ships in the correspondingcell. From Equation (1), the seventh element in the matrix indicates the highest number of

J. Mar. Sci. Eng. 2022, 10, 1124 12 of 17

ships, i.e., 26 that correspond to the seventh cell in the grid in Figure 8, which covers theport of Split.

CO =

4 0 4 1 8 5 26 0 0 0 0 0 01 3 4 7 2 3 2 4 2 0 0 0 02 1 1 2 1 5 13 0 0 2 1 1 00 0 0 0 0 18 13 0 0 2 1 1 00 0 1 0 3 9 1 0 0 0 0 0 00 0 1 0 3 9 1 0 0 0 0 0 00 0 0 0 0 2 0 0 0 0 0 0 0

(1)

The implementation of a grid is important in order to have the possibility to selectonly cells of interest, e.g., Strait of Split, port areas, etc. Another advantage of the grid isthe faster computational time. Statistical distributions can be created from the CO matrix.Therefore, Figure 9 shows an example of the distributions generated for a selected cell, i.e.,ship speed, length, and time a ship passes the cell. Similar distributions are also generatedfor the ship type, width, and navigational route.

J. Mar. Sci. Eng. 2022, 10, x FOR PEER REVIEW 13 of 18

(a) (b) (c)

Figure 9. Distributions for ship type: (a) speed; (b) length; (c) time.

From these distributions, MC generates a random ship and selects a random naviga-tional route. Figure 10 shows an example of a generated ship from a selected cell and distributions. Furthermore, Figure 10 shows the simulation of the ship on the randomly selected route. The details selected from the distributions of the generated ship are: ‱ The selected cell from which the ship was generated is cell number 45; ‱ The departure position is 𝜑 = 43°09â€Č19″ N, λ = 016°25â€Č48″ E; ‱ The arrival position is 𝜑 = 43°30â€Č08″ N, λ = 016°27â€Č36″ E; ‱ Ship type is Sailing Vessel; ‱ Time of passage is 8.65 h; ‱ MMSI is 238,182,736; ‱ Length is 111.1 m; ‱ Width is 15 m; ‱ The speed is 6.8 kt.

Figure 9. Distributions for ship type: (a) speed; (b) length; (c) time.

From these distributions, MC generates a random ship and selects a random navi-gational route. Figure 10 shows an example of a generated ship from a selected cell anddistributions. Furthermore, Figure 10 shows the simulation of the ship on the randomlyselected route. The details selected from the distributions of the generated ship are:

‱ The selected cell from which the ship was generated is cell number 45;‱ The departure position is ϕ1 = 43

◩09â€Č19â€Čâ€Č N, λ1 = 016

◩25â€Č48â€Čâ€Č E;

‱ The arrival position is ϕ2 = 43◩30â€Č08â€Čâ€Č N, λ2 = 016

◩27â€Č36â€Čâ€Č E;

‱ Ship type is Sailing Vessel;‱ Time of passage is 8.65 h;‱ MMSI is 238,182,736;‱ Length is 111.1 m;‱ Width is 15 m;‱ The speed is 6.8 kt.

J. Mar. Sci. Eng. 2022, 10, 1124 13 of 17J. Mar. Sci. Eng. 2022, 10, x FOR PEER REVIEW 14 of 18

Figure 10. Generated ship and its route from MC.

The results of the MC ship collision probability are shown in Figure 11. It should be noted that the results are presented as a 3D density map of collision probability, where the x-axis represents Longitude (°) East, the y-axis represents Latitude (°) North, and the z-axis represents collision probability in %. In addition, the lower part in Figure 11 shows a map of the research area and the upper part along the z-axis shows a map of collision probability. Furthermore, the peaks in the density map are colored according to the colli-sion probability. From this, it can be seen that the collision probability is highest in the congested port, channel, and island area. However, the collision probability density map was created by defining ship domain as a circle of 2 nautical miles around the ship. It should be noted that this is a distance considered by mariners to be safe during navigation. In addition, the simplified ship domain was introduced to reduce the computational time required by MC and to verify that the simulation produces valid results. However, more accurate results can be obtained with a more complex ship domain. Therefore, a ship do-main presented in [2,60] will be used for further research. The calculated long (𝑎 ) and short radii (𝑏 ) of the ship domain can be determined for 0.1 kt each using Equations (2) and (3).

𝑎 = ..

𝑏 = . ..

if 𝑉 𝑉 (2)

𝑎 = ..

𝑏 = . ..

if 𝑉 𝑉 (3)

where 𝐿 is the length, 𝑉 is the velocity of ship, and 𝑉 is the speed of 10 knots (kt).

Figure 10. Generated ship and its route from MC.

The results of the MC ship collision probability are shown in Figure 11. It should benoted that the results are presented as a 3D density map of collision probability, where thex-axis represents Longitude (◩) East, the y-axis represents Latitude (◩) North, and the z-axisrepresents collision probability in %. In addition, the lower part in Figure 11 shows a mapof the research area and the upper part along the z-axis shows a map of collision probability.Furthermore, the peaks in the density map are colored according to the collision probability.From this, it can be seen that the collision probability is highest in the congested port,channel, and island area. However, the collision probability density map was created bydefining ship domain as a circle of 2 nautical miles around the ship. It should be notedthat this is a distance considered by mariners to be safe during navigation. In addition, thesimplified ship domain was introduced to reduce the computational time required by MCand to verify that the simulation produces valid results. However, more accurate resultscan be obtained with a more complex ship domain. Therefore, a ship domain presentedin [2,60] will be used for further research. The calculated long (a0) and short radii (b0) ofthe ship domain can be determined for 0.1 kt each using Equations (2) and (3).

a0=8L0 −

((V10kt−V0)×0.06

0.1kt

)2

b0=3.2L0 −

((V10kt−V0)×0.028

0.1kt

)2

if V0 ≀ V10kt (2)

a0=8L0 +

((V0−V10kt)×0.06

0.1kt

)2

b0=3.2L0 +

((V0−V10kt)×0.028

0.1kt

)2

if V0 > V10kt (3)

where L0 is the length, V0 is the velocity of ship, and V10kt is the speed of 10 knots (kt).The performance of this algorithm compared to the approaches used by other re-

searchers is the integration of MC and the deep learning process (Bi-LSTM) into a hybridmodel. However, most researchers use MC or the deep learning process for collision proba-bility independently and, in this way, they are mostly used for areas where a large numberof collisions occur. Therefore, hybrid models are more desirable for areas such as the mar-itime region of Split, where traffic density is lower but increases every year and is highlydependent on the season. For this reason, MC is used to create a realistic database due tothe large number of iterations for the research area. Therefore, deep learning processes canbe trained on these databases and provide better insight into the research area.

J. Mar. Sci. Eng. 2022, 10, 1124 14 of 17J. Mar. Sci. Eng. 2022, 10, x FOR PEER REVIEW 15 of 18

Figure 11. Collision probability as result of MC.

The performance of this algorithm compared to the approaches used by other re-searchers is the integration of MC and the deep learning process (Bi-LSTM) into a hybrid model. However, most researchers use MC or the deep learning process for collision prob-ability independently and, in this way, they are mostly used for areas where a large num-ber of collisions occur. Therefore, hybrid models are more desirable for areas such as the maritime region of Split, where traffic density is lower but increases every year and is highly dependent on the season. For this reason, MC is used to create a realistic database due to the large number of iterations for the research area. Therefore, deep learning pro-cesses can be trained on these databases and provide better insight into the research area.

Although this article presents the preliminary results and methodology of model de-velopment, some future research directions can be highlighted, such as: ‱ Inclusion of hydro-meteorological data (currents, winds, visibility, etc.) in the pro-

posed model that may influence the ship collision probability; ‱ Constructing a model that determines the probability of oil spills in areas of high

collision risk and using the Potential Incident Simulation, Control and Evaluation System (PISCES) software to predict the extent of oil spills and determine the optimal prevention measures;

‱ Based on the proposed model, develop a “user interface” to assist VTS operators in early identification of collision-prone areas or as a tool for creating and implementing safety fairways, traffic separation schemes, and navigational routes.

5. Conclusions This paper proposed a model that can be used to estimate ship collision probability

based on MC and Bi-LSTM using AIS data. Furthermore, this paper is the first part of the research, in which MC of the model is explained in detail. Based on MC, a database is created, i.e., input data for Bi-LSTM. Therefore, the processing of AIS raw data is ex-plained and MC architecture is presented. Moreover, validation of the processed AIS data using web scraping method is introduced. The paper also highlights some steps to be

Figure 11. Collision probability as result of MC.

Although this article presents the preliminary results and methodology of modeldevelopment, some future research directions can be highlighted, such as:

‱ Inclusion of hydro-meteorological data (currents, winds, visibility, etc.) in the proposedmodel that may influence the ship collision probability;

‱ Constructing a model that determines the probability of oil spills in areas of highcollision risk and using the Potential Incident Simulation, Control and EvaluationSystem (PISCES) software to predict the extent of oil spills and determine the optimalprevention measures;

‱ Based on the proposed model, develop a “user interface” to assist VTS operators inearly identification of collision-prone areas or as a tool for creating and implementingsafety fairways, traffic separation schemes, and navigational routes.

5. Conclusions

This paper proposed a model that can be used to estimate ship collision probabilitybased on MC and Bi-LSTM using AIS data. Furthermore, this paper is the first part of theresearch, in which MC of the model is explained in detail. Based on MC, a database iscreated, i.e., input data for Bi-LSTM. Therefore, the processing of AIS raw data is explainedand MC architecture is presented. Moreover, validation of the processed AIS data usingweb scraping method is introduced. The paper also highlights some steps to be taken whenusing MC to obtain more accurate results, such as marking and excluding all coastlines inthe study area, to prevent ship routes from running over ground. Furthermore, the gridfor a studied maritime region is implemented so that only cells of interest can be observed,with the aim to reduce computational time. Based on all of these steps and processedAIS data, distributions of ship characteristics can be determined and used as input datafor the MC.

In addition, preliminary results of the MC based on a selected day in summer 2019were presented in this paper, as shown in Figure 11. These results are a good starting pointfor further development of a model that facilitates Bi-LSTM. However, it is necessary to

J. Mar. Sci. Eng. 2022, 10, 1124 15 of 17

run MC with data from the entire year to create a large enough database to train, validate,and test the Bi-LSTM. Once the deep learning process is complete, it should be possibleto determine the collision probability for each new or existing navigational route and thetraffic density for the area under study.

The advantage of such a hybrid model is that once a neural network learns from anexisting database, it no longer needs to run new simulations. Based on this, the networkshould estimate the collision probability for the given navigational route and traffic density.The preliminary results presented are based on AIS data for the maritime region of Split,but the model can be applied to any navigational area for which AIS data are available.

The application of the proposed hybrid model is desirable for areas where trafficdensity is lower but increases every year and is highly dependent on the season. Moreover,in addition to ship collisions, it can also be used for grounding probability estimation byselecting the appropriate AIS message type. It can also be used for traffic analysis in theplanning and development of new ports.

The limitation in this preliminary research is the simplified and fixed ship domain thatdoes not vary by ship type, potentially dangerous cargo on board, ship speed, restrictedareas, and weather conditions.

Therefore, it is necessary to complete the proposed methodology and validationprocess to obtain reasonably positive results. In future research, more complex ship domainswill be applied along with the implementation of the Bi-LSTM architecture. Finally, theresults of the proposed model will provide insight of ship collision probability for a givennavigational route and traffic density.

Author Contributions: Conceptualization, S.V. and P.V.; methodology, S.V., P.V. and S.P.; software,S.P. and S.V.; writing—original draft preparation, S.V.; writing—review and editing, S.V. and M.B.;visualization, S.V. and S.P.; supervision, P.V. and M.B.; All authors have read and agreed to thepublished version of the manuscript.

Funding: This research received no external funding.

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.

Data Availability Statement: Not applicable.

Conflicts of Interest: The authors declare no conflict of interest.

References1. Mou, J.M.; van der Tak, C.; Ligteringen, H. Study on collision avoidance in busy waterways by using AIS data. Ocean Eng. 2010,

37, 483–490. [CrossRef]2. Zhang, S.; Pedersen, P.T.; Villavicencio, R. Probability of ship collision and grounding. In Probability and Mechanics of Ship Collision

and Grounding; Bolger, C., Ed.; Elsevier: Amsterdam, The Netherlands, 2019; pp. 1–61.3. Du, L.; Goerlandt, F.; Kujala, P. Review and analysis of methods for assessing maritime waterway risk based on non-accident

critical events detected from AIS data. Reliab. Eng. Syst. Saf. 2020, 200, 106933. [CrossRef]4. Zou, Y.; Zhang, Y.; Ma, Z. Emergency Situation Safety Evaluation of Marine Ship Collision Accident Based on Extension Cloud

Model. J. Mar. Sci. Eng. 2021, 9, 1370. [CrossRef]5. Liu, C.; Liu, J.; Zhou, X.; Zhao, Z.; Wan, C.; Liu, Z. AIS data-driven approach to estimate navigable capacity of busy waterways

focusing on ships entering and leaving port. Ocean Eng. 2020, 218, 108215. [CrossRef]6. Greig, N.C.; Hines, E.M.; Cope, S.; Liu, X. Using Satellite AIS to Analyze Vessel Speeds Off the Coast of Washington State, U.S., as

a Risk Analysis for Cetacean-Vessel Collisions. Front. Mar. Sci. 2020, 7, 109. [CrossRef]7. Bole, A.; Wall, A.; Norris, A. Automatic Identification System (AIS). In Radar and ARPA Manual; Elsevier: Amsterdam, The

Netherlands, 2014; pp. 255–275.8. IALA Guideline—An Overview of AIS. Available online: https://www.navcen.uscg.gov/pdf/IALA_Guideline_1082%0A_An_

Overview_of_AIS.pdf%0A (accessed on 11 March 2022).9. Androjna, A.; Perkovic, M.; Pavic, I.; MiĆĄkovic, J. AIS Data Vulnerability Indicated by a Spoofing Case-Study. Appl. Sci. 2021, 11,

5015. [CrossRef]10. Le Tixerant, M.; Le Guyader, D.; Gourmelon, F.; Queffelec, B. How can Automatic Identification System (AIS) data be used for

maritime spatial planning? Ocean Coast. Manag. 2018, 166, 18–30. [CrossRef]

J. Mar. Sci. Eng. 2022, 10, 1124 16 of 17

11. Costa, N.; Svanberg, M.; Horteborn, A.; Olsson, F. The Use and Usefulness of AIS Data. Highlights 67/2020. Availableonline: https://www.sspa.se/sites/www.sspa.se/files/field_page_files/2020_sspa_highlights_67_the_use_and_usefulness_of_ais_data.pdf (accessed on 13 March 2022).

12. Gunnar Aarséther, K.; Moan, T. Estimating Navigation Patterns from AIS. J. Navig. 2009, 62, 587–607. [CrossRef]13. Plovput LLC. Maritime Regions. Available online: https://www.plovput.hr/pomorska-signalizacija/plovna-podrucja (accessed

on 18 March 2022).14. LuĆĄic, Z.; PuĆĄic, D.; Medic, D. Analysis of the maritime traffic in the central part of the adriatic. In Proceedings of the AIIT

International Congress on Transport Infrastructure and Systems, Rome, Italy, 10–12 April 2017.15. Bukljaơ Skocibuơic, M.; Vukơa, S.; Pavic, I. Method for Analyzing AIS Data and Determination of Simultaneously Ships Passage

via Strait of Split. TransNav Int. J. Mar. Navig. Saf. Sea Transp. 2018, 12, 679–684. [CrossRef]16. Pedersen, P.T. A Pioneer of Ship Collision and Grounding. Ships Offshore Struct. 2021, 16, 5–10. [CrossRef]17. Svanberg, M.; SantĂ©n, V.; Hörteborn, A.; Holm, H.; FinnsgĂ„rd, C. AIS in maritime research. Mar. Policy 2019, 106, 103520.

[CrossRef]18. Zhang, W.; Feng, X.; Qi, Y.; Shu, F.; Zhang, Y.; Wang, Y. Towards a Model of Regional Vessel Near-miss Collision Risk Assessment

for Open Waters based on AIS Data. J. Navig. 2019, 72, 1449–1468. [CrossRef]19. Wang, L.; Li, Y.; Wan, Z.; Yang, Z.; Wang, T.; Guan, K.; Fu, L. Use of AIS data for performance evaluation of ship traffic with speed

control. Ocean Eng. 2020, 204, 107259. [CrossRef]20. Vestre, A.; Bakdi, A.; Vanem, E.; Engelhardtsen, Ø. AIS-based near-collision database generation and analysis of real collision

avoidance manoeuvres. J. Navig. 2021, 74, 985–1008. [CrossRef]21. Gil, M.; KozioƂ, P.; Wróbel, K.; Montewka, J. Know your safety indicator—A determination of merchant vessels Bow Crossing

Range based on big data analytics. Reliab. Eng. Syst. Saf. 2022, 220, 108311. [CrossRef]22. Nowy, A.; Ɓazuga, K.; Gucma, L.; Androjna, A.; Perkovic, M.; Srơe, J. Modeling of Vessel Traffic Flow for Waterway Design–Port

of Swinoujscie Case Study. Appl. Sci. 2021, 11, 8126. [CrossRef]23. Damastuti, N.; Aisjah, A.S.; Masroeri, A. Vessel Classifying and Trajectory Based on Automatic Identification System Data. IOP

Conf. Ser. Earth Environ. Sci. 2021, 830, 012049. [CrossRef]24. Mazurek, J.; Lu, L.; Krata, P.; Montewka, J.; Krata, H.; Kujala, P. An updated method identifying collision-prone locations for

ships. A case study for oil tankers navigating in the Gulf of Finland. Reliab. Eng. Syst. Saf. 2022, 217, 108024. [CrossRef]25. Wolsing, K.; Roepert, L.; Bauer, J.; Wehrle, K. Anomaly Detection in Maritime AIS Tracks: A Review of Recent Approaches. J. Mar.

Sci. Eng. 2022, 10, 112. [CrossRef]26. Coric, M.; MandĆŸuka, S.; Gudelj, A.; LuĆĄic, Z. Quantitative Ship Collision Frequency Estimation Models: A Review. J. Mar. Sci.

Eng. 2021, 9, 533. [CrossRef]27. Goerlandt, F.; Kujala, P. Modeling of ship collision probability using dynamic traffic simulation. Reliab. Risk Saf. Back Futur. 2010,

10, 440–447.28. Goerlandt, F.; Kujala, P. Traffic simulation based ship collision probability modeling. Reliab. Eng. Syst. Saf. 2011, 96, 91–107.

[CrossRef]29. Huang, J.-C.; Nieh, C.-Y.; Kuo, H.-C. Risk assessment of ships maneuvering in an approaching channel based on AIS data. Ocean

Eng. 2019, 173, 399–414. [CrossRef]30. Xu, W.; Wu, S. Ship Agent model for traffic flow simulation in inland waterway. IOP Conf. Ser. Mater. Sci. Eng. 2020, 768, 072104.

[CrossRef]31. Tang, Y.; Mao, Y.; Wu, M.; Shi, T.; Fan, C. Probability Analysis of Ship Collision and Grounding in Inland Waterway Based on Big

Data Analysis. J. Phys. Conf. Ser. 2020, 1486, 052016. [CrossRef]32. Manea, M.G.; Zăgan, R.; Manea, E. Comments regarding the use of Monte Carlo method in the analysis of maritime ship

maintenance works. IOP Conf. Ser. Mater. Sci. Eng. 2021, 1182, 012043. [CrossRef]33. Hörteborn, A.; Ringsberg, J.W. A method for risk analysis of ship collisions with stationary infrastructure using AIS data and a

ship manoeuvring simulator. Ocean Eng. 2021, 235, 109396. [CrossRef]34. Jon, M.H.; Kim, Y.P.; Choe, U. Determination of a safety criterion via risk assessment of marine accidents based on a Markov

model with five states and MCMC simulation and on three risk factors. Ocean Eng. 2021, 236, 109000. [CrossRef]35. Xin, X.; Liu, K.; Yang, Z.; Zhang, J.; Wu, X. A probabilistic risk approach for the collision detection of multi-ships under

spatiotemporal movement uncertainty. Reliab. Eng. Syst. Saf. 2021, 215, 107772. [CrossRef]36. Zhang, M.; Conti, F.; Le Sourne, H.; Vassalos, D.; Kujala, P.; Lindroth, D.; Hirdaris, S. A method for the direct assessment of ship

collision damage and flooding risk in real conditions. Ocean Eng. 2021, 237, 109605. [CrossRef]37. Ma, W.; Zhu, Y.; Grifoll, M.; Liu, G.; Zheng, P. Evaluation of the Effectiveness of Active and Passive Safety Measures in Preventing

Ship–Bridge Collision. Sensors 2022, 22, 2857. [CrossRef] [PubMed]38. Liu, X.; Jiang, D.; Liufu, K.; Fu, J.; Liu, Q.; Li, Q. Numerical investigation into impact responses of an offshore wind turbine jacket

foundation subjected to ship collision. Ocean Eng. 2022, 248, 110825. [CrossRef]39. Du, L.; Banda, O.A.V.; Huang, Y.; Goerlandt, F.; Kujala, P.; Zhang, W. An empirical ship domain based on evasive maneuver and

perceived collision risk. Reliab. Eng. Syst. Saf. 2021, 213, 107752. [CrossRef]40. Szlapczynski, R.; Szlapczynska, J. A ship domain-based model of collision risk for near-miss detection and Collision Alert Systems.

Reliab. Eng. Syst. Saf. 2021, 214, 107766. [CrossRef]

J. Mar. Sci. Eng. 2022, 10, 1124 17 of 17

41. Ji, Y.; Qi, L.; Balling, R. A dynamic adaptive grating algorithm for AIS-based ship trajectory compression. J. Navig. 2022, 75,213–229. [CrossRef]

42. Weng, J.; Li, G.; Zhao, Y. Detection of abnormal ship trajectory based on the complex polygon. J. Navig. 2022, 1–18. [CrossRef]43. Duong, N.; Vadaine, R.; Hajduch, G.; Garello, R.; Fablet, R. Neural Networks for Vessel Monitoring Using AIS Streams. OCEANS

Oct 2018, Charleston, United States, ffhal-01863943f. Available online: https://hal.archives-ouvertes.fr/hal-01863943/document(accessed on 16 April 2022).

44. Kim, K.-I.; Lee, K. Deep Learning-Based Caution Area Traffic Prediction with Automatic Identification System Sensor Data.Sensors 2018, 18, 3172. [CrossRef]

45. Zhang, W.; Feng, X.; Goerlandt, F.; Liu, Q. Towards a Convolutional Neural Network model for classifying regional ship collisionrisk levels for waterway risk analysis. Reliab. Eng. Syst. Saf. 2020, 204, 107127. [CrossRef]

46. Volkova, T.A.; Balykina, Y.E.; Bespalov, A. Predicting Ship Trajectory Based on Neural Networks Using AIS Data. J. Mar. Sci. Eng.2021, 9, 254. [CrossRef]

47. Murray, B.; Perera, L.P. An AIS-based deep learning framework for regional ship behavior prediction. Reliab. Eng. Syst. Saf. 2021,215, 107819. [CrossRef]

48. Yuan, Z.; Liu, J.; Liu, Y.; Zhang, Q.; Liu, R.W. A multi-task analysis and modelling paradigm using LSTM for multi-sourcemonitoring data of inland vessels. Ocean Eng. 2020, 213, 107604. [CrossRef]

49. Wang, Y.-F.; Wang, L.-T.; Jiang, J.-C.; Wang, J.; Yang, Z.-L. Modelling ship collision risk based on the statistical analysis of historicaldata: A case study in Hong Kong waters. Ocean Eng. 2020, 197, 106869. [CrossRef]

50. Zou, J.; Chen, S.; Cheng, Z.; Liao, J. Research on Collision Avoidance of Navigation Marks Based on Neural Network. InProceedings of the 2020 7th International Forum on Electrical Engineering and Automation (IFEEA), Hefei, China, 25–27September 2020; pp. 632–635.

51. He, Z.; Li, S.; Wang, S.; Zhuang, Z. Mining navigable water current information from ship-based big automated identificationsystem data. Ocean Eng. 2021, 242, 110076. [CrossRef]

52. Zhang, M.; Montewka, J.; Manderbacka, T.; Kujala, P.; Hirdaris, S. A Big Data Analytics Method for the Evaluation of Ship—ShipCollision Risk reflecting Hydrometeorological Conditions. Reliab. Eng. Syst. Saf. 2021, 213, 107674. [CrossRef]

53. Sawada, R.; Sato, K.; Majima, T. Automatic ship collision avoidance using deep reinforcement learning with LSTM in continuousaction spaces. J. Mar. Sci. Technol. 2021, 26, 509–524. [CrossRef]

54. Abebe, M.; Noh, Y.; Kang, Y.-J.; Seo, C.; Kim, D.; Seo, J. Ship trajectory planning for collision avoidance using hybrid ARIMA-LSTMmodels. Ocean Eng. 2022, 256, 111527. [CrossRef]

55. Zhou, W.; Li, Y.; Xiao, Y.; Zheng, J. The Application of Automatic Identification System Information and PSO-LSTM NeuralNetwork in CRI Prediction. Comput. Intell. Neurosci. 2022, 2022, 8699322. [CrossRef]

56. Zhong, C.; Jiang, Z.; Chu, X.; Liu, L. Inland Ship Trajectory Restoration by Recurrent Neural Network. J. Navig. 2019, 72,1359–1377. [CrossRef]

57. Park, J.; Jeong, J.; Park, Y. Ship Trajectory Prediction Based on Bi-LSTM Using Spectral-Clustered AIS Data. J. Mar. Sci. Eng. 2021,9, 1037. [CrossRef]

58. Kessler, G.C. Protected AIS: A Demonstration of Capability Scheme to Provide Authentication and Message Integrity. TransNavInt. J. Mar. Navig. Saf. Sea Transp. 2020, 14, 279–286. [CrossRef]

59. Arundale, N. AIS Decoder. Available online: https://arundaleais.github.io/docs/ais/nocontact.html (accessed on 24 April 2022).60. Namgung, H.; Kim, J.-S. Regional Collision Risk Prediction System at a Collision Area Considering Spatial Pattern. J. Mar. Sci.

Eng. 2021, 9, 1365. [CrossRef]