Maneuver Planning for Highly Automated Vehicles

Cristina Menéndez Romero

Technische Fakultät, Albert-Ludwigs-Universität Freiburg

Dissertation submitted for the academic degree of Doktor-Ingenieur

Advisor: Prof. Dr. Wolfram Burgard

Dean: Prof. Dr. Rolf Backofen

First examiner and thesis supervisor: Prof. Dr. Wolfram Burgard, Albert-Ludwigs-Universität Freiburg

Second examiner: Prof. Dr. Luis Montano, University of Zaragoza

Date of the disputation: 28.07.2021

Acknowledgments

This thesis presents the results of three and a half years as a PhD student at the BMW Group in Munich.

My sincere thanks go to Professor Wolfram Burgard for supervising my doctoral thesis. Your expertise and your advice helped me to focus and to turn abstract ideas into specific concepts. I am also very thankful to Professor Luis Montano for acting as co-examiner of this thesis and to Professor Bernhard Nebel and Professor Joschka Bödecker for acting as president and vice-president of the disputation committee.

This thesis would not have been possible without the supervision and support of Dr. Franz Winkler, who always encouraged me to present my ideas and to discuss the potential and limitations of my work transparently.

Special thanks go to Dr. Christian Dornhege, whose supervision and advice helped me especially in formulating the research questions and explaining the methodology.

I was lucky to develop my thesis and spend my time as a PhD candidate with a great team of people. I am thankful to the colleagues of the “Cluster Regelung” and of the FAS-predevelopment group for their helpfulness, the interesting debates, the coffee breaks and the different events. I would like to mention the support of Helena, Christoph, Luca, Leo, Christian and Steffen with the test vehicles and the open discussions. Special thanks to Nina for bringing new perspectives from the psychological side. Additionally, I would like to thank Thomas and Mustafa for trusting me as supervisor of their master's theses.

Finally, I would like to express my deepest gratitude to my family, especially to my parents Pilar and Miguel, who always believed in me and gave me the courage to begin and pursue everything I set out to do. My biggest support during this journey was my better half Raúl. Your love and your encouragement, particularly during the critical phases, helped me not to give up and to finish writing this work.


Summary

One important aspect of autonomous driving lies in the selection of maneuver sequences. Here, the objective is to optimize driving comfort and travel duration while always staying within the safety limits. Human drivers analyze and try to anticipate the traffic situation, choosing their actions based not only on current information but also on experience. Unlike assistance systems, where the final decision and the responsibility still fall back on the driver, in a highly automated vehicle the driver does not have continuous control. Thus, the system has to guarantee safety during the autonomous driving phase. The challenge is to perform the driving activity based on only partial knowledge of the situation. Even if the observed data can be complemented by back-end information, the sensor range is still limited. Besides, the behavior of other road users is only partially predictable, and only for a short time horizon. Therefore, the planning system is forced to deal with uncertainties and partial knowledge. The ability to react to unexpected situations should be ensured under defined constraints. The system needs to:

• Be robust to uncertainties and traffic evolutions.

• Provide feasible solutions with regard to the dynamic limitations of the vehicle and the weather conditions, while meeting real-time requirements.

• Handle complexity in a traceable way, remaining intuitive for the driver.

This thesis proposes a planning system that ensures driving safety on short horizons and integrates previous experiences to optimize the expected reward. The planner has a multi-level architecture, similar to the human reasoning process, which combines continuous planning with semantic information. This allows the planning system to deal with the complexity of the problem in a computationally efficient way and also provides an intuitive interface to communicate the decisions to the driver. A qualitative analysis of the different parameters that influence the passengers' perception of comfort and safety is presented. The planner clusters the different options, assesses them and selects the best policy based on the expected reward over time. The integration of different abstraction levels makes it possible to deal with the increasing time horizon as well as with the increasing uncertainties. This approach takes into account not only the information provided by the environment, but also the observed and learned values from past situations.


Summary (translated from German)

An important area of autonomous driving is the selection of maneuver sequences. The challenge is to optimize driving comfort and travel time while respecting the safety limits. Based on an analysis of human driving behavior, the approach attempts to predict driving situations. The actions of the autonomous vehicle are therefore based not only on current information but also on previous experience.

In assisted driving, unlike in highly automated driving, the final decision and the responsibility lie with the driver. In a highly automated vehicle, the system must therefore guarantee safety for as long as it is in autonomous driving mode. A major challenge in highly automated driving is the incomplete knowledge of the situation. Even when the sensor data is combined with data from the back end, the available information remains limited.

Furthermore, the behavior of other road users is predictable only to a limited extent and only over a short time horizon. For these reasons, the planning algorithm must be able to deal with uncertainty and missing information. The ability to react to unexpected situations must be ensured under certain constraints. The system must:

• be robust to uncertainties and unexpected traffic events.

• provide feasible solutions that take into account the dynamic limitations of the vehicle, the weather conditions and the online constraints.

• handle complexity in a traceable way, so that the driver can understand the system intuitively.

This thesis presents a planning algorithm that ensures driving safety over the short horizon while integrating previous experience to optimize the expected reward. The algorithm has a multi-level architecture that combines continuous planning with semantic information. This allows the planning algorithm to handle the complexity of the problem in a computationally efficient way while also providing an intuitive interface to the driver. In addition, a qualitative analysis of the effects of different parameters on driving comfort and perceived safety is presented. The planning algorithm structures the different options, evaluates them and selects the best action based on the expected reward. The integration of different abstraction levels makes it possible to handle increasing time horizons and uncertainties. This approach uses not only the available on-board information but also the experience gained from previous situations.


Contents

1 Introduction
    1.1 Contributions of the Thesis
    1.2 Collaborations
    1.3 Publications

2 Maneuver Planning
    2.1 The Driving Activity - Problem Statement
        2.1.1 Introduction to Semantic Attachments
        2.1.2 World as 2D Representation
        2.1.3 The Planning Domain
        2.1.4 The Highway Scenario
        2.1.5 The Planning Problem
    2.2 The Planning Framework
        2.2.1 Perceiving, Planning and Acting
        2.2.2 The Planning Layers
        2.2.3 Longitudinal Transitions Restrictions between Lanes: Speed Limitation and Undertaking
        2.2.4 Range View and Road-condition Limitations
        2.2.5 Consideration of the Safety Constraints as Precondition for the Maneuver planner
    2.3 Experiments
        2.3.1 Velocity, Range and Road Limitations - Keep Lane Constraints
        2.3.2 Safety Constraints as Precondition for the Lane Change
    2.4 Related Work
    2.5 Discussion

3 Planning within the Sensor Range
    3.1 The Tactical Planner Approach
        3.1.1 Gap Oriented Action Description
        3.1.2 Modeling agents’ behavior
        3.1.3 Feasibility Check
        3.1.4 Gap Assessment
        3.1.5 Maneuver Optimization
    3.2 Experiments
        3.2.1 Evaluation Metrics
        3.2.2 Simulated Experiments
        3.2.3 Real-world Experiment
        3.2.4 Discussion
    3.3 Related Work
    3.4 Conclusions

4 Planning and Prediction: Providing Courtesy Behavior
    4.1 Intention Prediction and Courtesy Behavior
        4.1.1 Problem and Task Description
        4.1.2 Prediction Module
        4.1.3 Decision-Making
        4.1.4 Approach Generalization in Populated Environments
    4.2 Experiments
        4.2.1 Evaluation Metrics
        4.2.2 Simulated experiments I: Single Conflicting Vehicle
        4.2.3 Real-world Experiments
        4.2.4 Simulated Experiments II: Populated Environments
        4.2.5 Discussion
    4.3 Related Work
    4.4 Conclusion

5 Driving towards the Highway Exit Ramp: A User Study
    5.1 Study Design
        5.1.1 Parameters of Interest: Variables of Study
        5.1.2 Study Objectives
        5.1.3 The Cost/Reward Function
    5.2 User Study on the Dynamic Driving Simulator
        5.2.1 Dynamic Driving Simulator
        5.2.2 Study Setup
        5.2.3 Procedure
        5.2.4 Participants
        5.2.5 Experiment Results
        5.2.6 Discussion
    5.3 Conclusions

6 Learning and Planning: Lane Selection via Reinforcement Learning
    6.1 Planning Framework: Maneuver Planning, State Representation and Reward
        6.1.1 State Representation and Action Space
        6.1.2 Discrete Representation for Tabular Learning
        6.1.3 Reward Function and Return
        6.1.4 Action-State Value Updates
    6.2 Policy Selection, Reinforcement Learning with Maneuver Planning
        6.2.1 Decision-Making based on Reinforcement Learning
        6.2.2 Combined Decision-Making: Planning and Learning
    6.3 Experiments
        6.3.1 Simulation Experiments Setup
        6.3.2 Simulation Experiments Results
        6.3.3 Discussion
    6.4 Related Work
    6.5 Conclusions

7 Conclusion
    7.1 Limitations and Outlook


Chapter 1

Introduction

Intelligent Transportation Systems profit from the significant advances in computational and sensor technologies. Real-world applications for intelligent autonomous vehicles have taken the leap from theoretical and expensive military applications to affordable solutions for every consumer. The recognition of hazardous situations in combination with the systematic and continuous supervision of the ego-vehicle’s surroundings can significantly improve safety and comfort on the roads.

The driver’s rising awareness, enhanced passive safety systems and new active safety systems have contributed positively to the improvement of road safety. According to the European Commission of Mobility and Transport [29], the number of fatalities in the European Union in 2015 was over 26,100. These statistics represent a great improvement since 2006, when the number of fatalities was over 43,700, but the numbers are still too high.

Active safety systems such as autonomous emergency braking or rear collision warning process the information about the vehicle's surroundings provided by the sensors and select different actions, such as warning the driver, activating a braking maneuver or activating the restraint systems in order to protect the passengers. Harper et al. [63] showed that a high market penetration of active safety systems such as autonomous braking or lane departure warning could reduce the number of accidents and generate great savings. Safety is the main contribution of automated vehicles, but other advantages can also be achieved, especially in terms of comfort and alternative time usage during the drive.

The interest in autonomous driving has exploded over the last years. It began with the DARPA Grand Challenge [157] and the DARPA Urban Challenge [162] in the 2000s. The arrival of new players in the automotive industry such as Google, Uber or Tesla has accelerated the competition to provide vehicles with high or full levels of automation. However, the focus of regulations and consumer associations is still set on the assistance systems: different consumer rating programs, such as the New Car Assessment Programs (NCAP), include active systems in their vehicle rating [118].


Integrating active safety programs into our vehicles, which are able to compute evolutions and variations of the situation more precisely and faster than a human mind, is a great opportunity to increase safety on the roads, since a large number of road accidents are caused by human error. According to an analysis by the National Highway Traffic Safety Administration (NHTSA) of the accidents that occurred in the USA in 2016 [113], 28% of the fatalities occurred in alcohol-impaired-driving crashes, fatalities in speeding-related crashes represented 27.8% of the total, fatalities in distraction-affected crashes were 9.2%, and 2.1% of the total fatalities involved a drowsy driver.

Automated vehicles are able to perceive and analyse their environment and compute several variations and combinations of the scene, but how can it be assured that a computer system achieves a similar or even better performance than a human driver? Many routine tasks can be taken over by automated systems that perform the activity faster and more precisely. The use of robots in production has been widely explored and mastered, achieving high accuracy and speed. The ability of artificial intelligence systems to learn and make their own decisions has increased rapidly over the last years. In 2016, AlphaGo [141] surprised everyone by mastering the game of Go and beating the human world champion. Only one year later, AlphaZero [142] pushed the limits further: it not only mastered Go but also learned other strategy games on its own. These are exciting examples of how much can be learned and improved by an artificial intelligence. Nevertheless, production robots and strategy games work within defined boundaries as closed systems, without any interaction beyond their system limits. Those are structured problems in a reduced and controlled environment. This is not the case for the navigation of autonomous vehicles in highway scenarios, due to the high number of external and dynamic factors that influence the navigation. The agent is continuously interacting with other independent agents in a highly dynamic and uncontrolled environment. Driving involves not only fast-changing conditions but also strong limitations on the time available to process the information, select an action and react correctly to unexpected new situations.

According to the Society of Automotive Engineers International (SAE), six levels of driving automation are considered [128]. Figure 1.1 shows these six levels, from no automation (level 0) to fully automated driving (level 5). The biggest step lies between the assistance systems (levels 1 and 2), where the driver monitors the system and is assisted in the longitudinal and/or lateral motion, and the highly automated levels (levels 3 and 4), where the system performs the perception and motion tasks and the driver is only required occasionally and in domain-dependent scenarios. In level 5, the automated driving system can operate the vehicle under all on-road conditions with no design-based restrictions.

Assistance systems are present on the roads nowadays and are offered to customers by almost all automotive companies. These assistance systems range from advice on the current velocity to longitudinal and lateral guidance support for the driver, always assuming that the driver is still supervising the drive. Prototypes of higher automation levels are already operating on the streets and interacting with purely manually driven vehicles.

Figure 1.1: Automation levels according to the Society of Automotive Engineers [128]. Source: SAE International Releases Updated Visual Chart for Its Levels of Driving Automation Standard for Self-Driving Vehicles [127].

The challenge now lies in these higher levels of automation, where the attention of human drivers can shift away from the driving task. Two advantages are targeted by these higher levels of automation: safer systems and the freedom to choose how to spend the commuting time. An autonomous system is expected to drive safely and comfortably, but it should also be perceived by the other traffic participants as a cooperative road partner. Furthermore, the automated vehicle has to assess the different possible evolutions of each decision and available action, considering not only the short-term advantages but also the long-term objectives.

When approaching this problem, I had to deal with some limitations. The environmental information is obtained by on-board sensors, whose performance can be reduced under adverse environmental conditions. Also, the road conditions can vary and the available acceleration may be restricted. These restrictions need to be considered by the system in order to guarantee safety. This work proposes a precondition that takes these conditions into account during the planning.

Another challenge comes from the uncertainties about the scene evolution: the behaviour of the traffic participants is only partially predictable, and predictions are accurate only in the short term. On the one hand, it is not enough to consider only the most likely evolution of the situation, because neglecting less likely actions could result in dangerous situations. On the other hand, it is not possible to include all the potential scene evolutions, because this may result in overly conservative systems with high computational loads. Therefore, a trade-off between computational cost and exploration needs to be found, allowing the proposed system to be robust to unexpected situations.

The first approaches to autonomous vehicles were rule-based and model-based. These approaches perform well in most circumstances but are limited when unexpected situations appear. The system has to be flexible enough to deal with new situations and also has to be able to present a comprehensible and traceable behaviour. The system explores the available actions, integrating the information of previous experiences.

The key problem in the context of planning and decision-making for autonomous vehicles addressed in this Thesis is to find and select the optimal maneuver sequence in order to achieve the mid- and long-term strategic goal. The following requirements need to be considered:

• Provide feasible solutions with regard to the dynamic limitations of the vehicle and the weather conditions.

• Safety has to be guaranteed.

• Robustness to future unknown traffic evolutions.

• Meet real-time requirements and provide flexibility to cover different computational specifications.

This Thesis focuses on highway scenarios, which present a structured topology and highly dynamic conditions due to the interaction with the other traffic participants.

1.1 Contributions of the Thesis

This work addresses the driving task for highly automated driving in highway scenarios. Considering an automated vehicle, where the information is obtained by sensors, an internal model is generated and a strategy is planned and implemented, the focus of this Thesis lies on the decision-making: how the system assesses the different available actions and selects the most adequate strategy. This work presents a multilevel architecture, where several levels interact with each other at different degrees of abstraction. A higher level works on the generic lane and velocity selection according to the current traffic flow, an intermediate level provides a longitudinal and lateral strategy given the desired lane and velocity, and the lower levels implement the requested sub-goals in the short term.

The key aspects of this Thesis and the proposed system are the integration of the safety criteria in the planning, the consideration of the uncertainties of the scene evolution and the detailed evaluation of the parameters and their influence on the objective function optimized by the planning system.

The first contribution is the consideration of the safety criteria as preconditions for the planning search. In order to explore the possibilities of an available action, several preconditions or safety criteria have to be fulfilled. Thus, each explored action is proven to be non-critical in the short term, under some defined assumptions or premises. As presented in Chapter 2, the preconditions of the planner verify that a safety braking maneuver within the lane is possible, and further criteria are introduced in order to allow or consider the lane change. The weather conditions are considered by limiting the available deceleration according to the measured friction coefficient and by restricting the current velocity so that an emergency brake within the visibility range is always possible.
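
As a rough illustration of the weather-dependent precondition described above, the following minimal sketch limits the available deceleration by the friction coefficient and derives the highest velocity that still allows a full stop within the visibility range. It assumes a point-mass model with constant deceleration a = mu * g and an optional reaction delay; the function names, the reaction-time parameter and the model itself are illustrative assumptions and are not taken from the planner presented in Chapter 2.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def max_deceleration(friction_coefficient: float) -> float:
    """Assumed upper bound on braking deceleration (m/s^2): a = mu * g."""
    if friction_coefficient <= 0.0:
        raise ValueError("friction coefficient must be positive")
    return friction_coefficient * G


def stopping_distance(velocity: float, friction_coefficient: float,
                      reaction_time: float = 0.0) -> float:
    """Distance (m) travelled during the reaction delay plus full braking."""
    a = max_deceleration(friction_coefficient)
    return velocity * reaction_time + velocity ** 2 / (2.0 * a)


def max_safe_velocity(visibility_range: float, friction_coefficient: float,
                      reaction_time: float = 0.0) -> float:
    """Largest velocity (m/s) whose stopping distance fits the visibility range.

    Solves v * t_r + v^2 / (2 a) = d for the positive root v.
    """
    a = max_deceleration(friction_coefficient)
    at = a * reaction_time
    return -at + math.sqrt(at ** 2 + 2.0 * a * visibility_range)


def keep_lane_precondition(velocity: float, visibility_range: float,
                           friction_coefficient: float,
                           reaction_time: float = 0.0) -> bool:
    """Hypothetical precondition: an emergency stop inside the visible range is possible."""
    return stopping_distance(velocity, friction_coefficient,
                             reaction_time) <= visibility_range


if __name__ == "__main__":
    # Same visibility range, dry versus wet road (illustrative values only).
    for mu in (0.8, 0.4):
        v = max_safe_velocity(visibility_range=100.0,
                              friction_coefficient=mu, reaction_time=0.5)
        print(f"mu={mu}: admissible velocity {v * 3.6:.1f} km/h")
```

For a fixed visibility range, a lower friction coefficient yields a lower admissible velocity, which is the qualitative effect the precondition is meant to capture.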

The second contribution is the consideration of uncertainties derived from the environmental evolution. The presented Maneuver Planning bridges the gap between classical planning approaches and machine learning methods. In a planning system, two kinds of uncertainties about the evolution of the situation need to be considered: the uncertainties in the perception and model accuracy, and the uncertainties derived from the lack of knowledge about the intentions and the dynamic evolution of the system. Even assuming perfect knowledge of the whole environment, the evolution of the dynamic agents and objects involved is unknown. Some assumptions about their behaviour can be made to simplify the computation of the different variations, but in the end, the prediction becomes less accurate as the planning horizon increases. These uncertainties are considered on a microscopic traffic level, for the intention prediction of the other agents, and on a macroscopic level, for the traffic state evolution used to assess lane adequacy.

The third contribution is a study of the influence of different parameters on the comfort and safety perception of the passengers. The planner tries to find the maneuver sequence that optimizes an objective function, but this objective function should be defined in accordance with the safety limitations and should include the preferences of the final customer.

This Thesis is organized as follows:

• Chapter 2 introduces the robust framework for decision-making. The robustness is provided through the integration of semantic and numeric reasoning between different planning levels and through the safety constraints as precondition for planning.

• Chapter 3 presents the planning for the mid-term horizon. An uncertainty success assessment parameter is introduced. This parameter indicates how adequate the selection of a gap between two other vehicles was in past situations. It reduces the criticality of the scenarios and improves the success rate. The interaction with the prediction of other agents’ behaviour is also introduced in Chapter 3.

• Chapter 4 goes into detail on the prediction of the behaviour of the other traffic participants and on how to cluster the possible state evolutions into complementary actions in order to reduce the branching factor when considering the different scenario variations. This Chapter also presents the courtesy behaviour, that is, how selecting ego-actions that cooperate with potential merging vehicles can improve the results not only for the ego vehicle but also for the surrounding traffic participants.

• Chapter 5 presents a user study that evaluates the factors influencing the passengers' perception of comfort and safety when driving towards a highway exit ramp.

• Chapter 6 includes the experience of previous episodes in the long-term decision-making and evaluates the interaction and influence on the action selection considering the mid-term and long-term objectives.

• Chapter 7 summarizes the main results and contributions of the Thesis and analyses the challenges and limitations of the proposed methods.

1.2 Collaborations

This Thesis summarizes my work as a PhD candidate. During this time, I had the opportunity to share ideas with my colleagues, who contributed with very interesting discussions. I am particularly thankful to my advisor Professor Wolfram Burgard, who guided this work and always contributed good advice. Also very important was the continuous exchange with my supervisors Christian Dornhege and Franz Winkler, who always provided great orientation.

Chapter 5 presents the analysis of the factors influencing the customer perception during the automated drive, based on a user study carried out on the dynamic driving simulator. The analysis presented in the chapter was fully conducted by me. The user study was carried out by Thomas Nader as part of his Master Thesis Komfort- und Sicherheitswahrnehmung bei eine Autobahnabfahrt (2018) [158]. The Master Thesis was supervised by me, and together we prepared the study design.


1.3 Publications

The following peer-reviewed contributions were published in the context of this Thesis.

Conferences

• Cristina Menéndez-Romero, Franz Winkler, Christian Dornhege, Wolfram Burgard. Maneuver Planning for Highly Automated Driving, In Proceedings of Intelligent Vehicles Symposium (IV), June 2017, [102].
I, as the main author of the paper, implemented the method, performed the experimental evaluation and wrote the draft. Christian Dornhege provided consultation on the writing and structure of the paper. Wolfram Burgard and Franz Winkler contributed with general consultation. The results of this paper are included in Chapters 2 and 3.

• Cristina Menéndez-Romero, Mustafa Sezer, Franz Winkler, Christian Dornhege, Wolfram Burgard. Courtesy Behaviour for Highly Automated Vehicles on Highway Interchanges, In Proceedings of IEEE Intelligent Vehicles Symposium (IV), June 2018. - Best Paper Award, [101].
I am the main author of the paper. The method that integrates the prediction of surrounding vehicles into the decision-making strategy, which makes it possible to provide Courtesy Behaviour towards other traffic participants, and the Multinomial Regression Classifier were developed and implemented by me. I conducted and evaluated the simulations and real-world experiments based on the Driving Strategy presented in Chapter 3. I wrote the initial draft of the paper. The predictor based on the Gentle AdaBoost Classifier and Monte Carlo Sampling, as well as the simulation experiments with the Driving Strategy after Bahram et al. [11], were developed by Mustafa Sezer as part of his Master Thesis Design and Analysis of a Cooperative Driving Strategy for Highly Automated Driving on Freeways [109]; he also contributed to the writing of the paper. Wolfram Burgard, Christian Dornhege and Franz Winkler contributed with general consultation and supervised the paper. The results of this paper appear in Chapter 4.

• Cristina Menéndez-Romero, Franz Winkler, Christian Dornhege, Wolfram Burgard. Maneuver Planning and Learning: A Lane Selection Approach for Highly Automated Vehicles in Highway Scenarios, In Proceedings of IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), September 2020, [103].
I am the main author of the paper. I developed and implemented the method to evaluate the lane adequacy, integrating former experiences in the planning framework. I implemented, compared and evaluated the different approaches presented in the experiments. I wrote the initial draft of the paper. Wolfram Burgard, Christian Dornhege and Franz Winkler contributed with general consultation. The results of the paper appear in Chapter 6.

Patents

During my time in the pre-development department of driving assistance at BMW Group, I was part of a team where I had the opportunity to discuss and develop several initial concepts and preliminary implementations. The following patent applications directly related to this thesis were registered as part of my collaboration:

• DE102014223000A1 (doc. laid open) - Einstellbare Trajektorienplanung und Kollisionsvermeidung. Adjustable trajectory planning and prevention of collision.
The patent registration is based on the work of Christian Rathgeber and Franz Winkler [122]. I contributed to the implementation of the trajectory planning, the collision and potential check. The trajectory planning that we developed is used in all the experiments with the maneuver planning presented in this thesis.

• DE102016205806A1 (doc. laid open) - Verfahren zur Optimierung einer Funktion in einem Fahrzeug. Method for optimizing a function in a vehicle. Christoph Hellfritsch, Cristina Menéndez, Christian Rathgeber and Franz Winkler.
All authors contributed equally, both in the development and in the implementation. I used the resimulation framework for the resimulation of the real-world measurements presented in Chapter 4.

• DE102017205134A1 (doc. laid open) - Verzögerungsassistenzsystem in einem Kraftfahrzeug und Verfahren zur Steuerung eines entsprechenden Verzögerungsassistenzsystemes. Delay assist system in a motor vehicle and method for controlling a corresponding delay assist system. Nina Kauffmann, Cristina Menéndez, Christian Rathgeber and Franz Winkler.
All authors contributed equally to the patent registration, both in the development and in the implementation. The integration of the delay assist system due to a speed limitation within the maneuver planner is explained in Chapter 2.

• DE102017200580A1 (doc. laid open) - Verfahren zur Optimierung einer Maneuverplanung für autonom fahrende Fahrzeuge. Method for optimizing a maneuver planning for autonomously driving vehicles. Cristina Menéndez and Franz Winkler.
The patent registration is based on the abstraction levels presented in Chapter 2 and the tactical planning presented in Chapter 3, both developed by me. All authors contributed equally to the writing of the patent draft.

• DE201810204185A1 (doc. laid open) - Fahrerassistenz mit einer variabel veraenderbaren Kooperationsgroesse. Driver Assistance with a variably modifiable cooperation size. Cristina Menéndez, Mustafa Sezer, Franz Winkler and Anton Wolf.
The patent registration is based on the courtesy behavior method presented in Chapter 4, developed by me. The cooperation parameter was discussed by all authors and all authors contributed equally to the writing of the patent draft.

• DE102018213971A1 (doc. laid open) - Verfahren und Vorrichtung zur Auswahl eines Fahrmanoevers. Method and apparatus for selecting a driving maneuver. Cristina Menéndez and Franz Winkler.
The patent registration is based on the combined planning and reinforcement learning presented in Chapter 6 and developed by me. All authors contributed equally to the writing of the patent draft.

• US2018253103A (Granted patent) - Method and device for controlling a trajectory planning process of an ego-vehicle. Franz Winkler, Christoph Hellfritsch, Christian Rathgeber and Cristina Menéndez.
All authors contributed equally to the patent draft and to the implementation. This patent registration describes the longitudinal component of the trajectory planning and is used in all the experiments with the maneuver planning presented in this thesis.

Chapter 2

Maneuver Planning

The decision-making process can be treated as a planning problem. Classical systems consider the autonomous driving task as a global numeric optimization problem, which in populated dynamic environments can become computationally intractable. In addition, purely numeric computations hamper the understanding of the decision-making for human users. This chapter proposes a planning system that presents a multi-level architecture, similar to the human reasoning process, which combines continuous planning with semantic information. This allows the planning system to deal with the complexity of the problem in a computationally efficient way and also provides an intuitive interface to communicate the decisions to the driver.

Since the DARPA Grand Challenge, the interest in automated driving systems has increased not only within the research community but also among the general public. Many automotive manufacturers have brought driver assistance systems with different degrees of automation to market, including lane departure warning, adaptive cruise control (ACC) and the lane change assistant. In contrast to assistance systems, where the final decision and the responsibility still fall to the driver, in a highly automated vehicle the driver does not need to be continuously in control. The great challenge of autonomous systems is to guarantee the safety of the selected maneuvers during the autonomous driving phase. The ability to react to unexpected situations should be ensured under defined constraints. In addition, if the system requires the human to take over control, safety should be guaranteed while the driver is warned and at least until the driver is back in control.

The challenge is to perform the driving activity based on the partially available knowledge of the situation. Even if the observed data can be complemented by back-end information, the sensor range is still limited. Besides, the behaviour of the other road users is predictable only partially and only for short time horizons. Therefore, the planning system is forced to deal with uncertainties and partial knowledge.

A detailed review of the state of the art is presented in Section 2.4. Classical approaches consider the autonomous driving task as a global optimization problem, resulting in intractable computational effort. Other approaches present a top-down architecture where some critical information is lost between the layers. The trade-off between safety and computational effort results in solutions that provide reactive lane changes. In other words, they perform conservative lane changes only when the free space near the vehicle is big enough. This conservative behaviour overlooks valid and safe maneuvers that a human driver would actually select.

In this thesis, the overall problem is considered as a planning problem, presenting a novel framework based on the integration of semantic and continuous planning. The same planning domain is defined on different abstraction levels for all the planning layers. This approach deals with the partial information in an efficient and traceable way. The planner clusters different options, assesses them and selects the best policy based on the expected future reward. The integration of different abstraction levels allows dealing with the increasing time horizon as well as with the increasing uncertainties. It takes into account not only the information provided by the environment but also observed and learned values from past situations. This results in a proactive driving strategy that plans and executes the best policy to reach a desired lane and the desired velocity.

This chapter introduces the driving task as a planning problem, and presents the framework to plan and execute the driving task in a comfortable and safe way. To achieve a complete understanding of the world and the driving task, the problem and the involved elements need to be defined. An introduction to semantic attachments is also provided in this chapter. A novel planning framework for highly automated vehicles driving in highway scenarios is presented and its consideration of safety is introduced. The experiments in Section 2.3 illustrate the decision-making process and the results of the planning framework in different scenarios. A comparison with the state of the art is presented in Section 2.4.

2.1 The Driving Activity - Problem Statement

The first step to approach the driving task is to identify and define the main elements related to it. The highway environment is defined as a 2D model including the road topology, its different representations and the elements located on it. This work approaches the driving task as a combination of numerical and semantic planning.


Figure 2.1: The driving task on highway scenarios is approached as a planning problem and described as a planning instance.

Figure 2.1 presents the main elements involved in solving the driving task on highway scenarios. An introduction to semantic attachments provides the formal description needed to understand the planning instance; then the domain is described and the problem is instantiated.

2.1.1 Introduction to Semantic Attachments

As proposed by Dornhege et al. [37], the use of semantic attachments allows combining a declarative part, including the domain and problem description, with a procedural part or symbolic planner. To describe the driving task as a problem, the definitions of planning instance, domain and schematic operator according to Bäckström et al. [9], Dornhege [35] and Waters et al. [165] are adopted:

Definition 1. A planning instance is defined to be a pair I = (Dom, Prob) where the domain Dom describes the model of the world and the generic actions and the problem Prob instantiates the current world.

Definition 2. The planning domain is a tuple Dom = (P, F, O) of a finite set of predicate symbols (P), each one with an associated arity, a finite set of function symbols that represent numerical values (F) and a finite set of schematic operators (O).

Definition 3. A schematic operator is a tuple O = (φ, e, cost) formed by a condition checker or precondition (φ), an effect (e) and a cost (cost). The condition checker φ states which requirements have to be satisfied in order to be able to apply the operator successfully. The effect e or post-condition specifies which values the variables will have after a successful execution of the operator. The cost is a function that maps to R≥0.


Section 2.2.5 presents the use of semantic operators in order to integrate the planning restrictions and to provide safety.

Definition 4. Semantic attachments are external reasoning modules that compute valuations of variables used by a planner at run-time. They keep the planning task modular and allow more complex reasoning modules to be introduced when the available time and computational resources permit.
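As a minimal sketch of Definition 4, the following snippet delegates the condition checker of a lane-change operator to an external module that is evaluated at run-time. The names `gap_is_sufficient`, `AttachedOperator`, `min_gap_m` and the state keys are hypothetical and chosen only for illustration; they are not taken from the planner described in this thesis.

```python
from typing import Callable, Dict

State = Dict[str, float]


def gap_is_sufficient(state: State, min_gap_m: float = 30.0) -> bool:
    """External reasoning module (assumed): checks the free gap on the target lane."""
    return state["gap_front_m"] >= min_gap_m and state["gap_rear_m"] >= min_gap_m


class AttachedOperator:
    """Operator whose condition checker is delegated to a semantic attachment."""

    def __init__(self, name: str, condition_checker: Callable[[State], bool]):
        self.name = name
        self.condition_checker = condition_checker

    def applicable(self, state: State) -> bool:
        # The planner only sees a truth value; the attached module computes it
        # at run-time and could be swapped for a more complex one.
        return self.condition_checker(state)


change_lane = AttachedOperator("change_lane", gap_is_sufficient)
free_state = {"gap_front_m": 45.0, "gap_rear_m": 35.0}
dense_state = {"gap_front_m": 45.0, "gap_rear_m": 10.0}
```

Because the planner depends only on the callable interface, a cheap geometric check like this one could be replaced by a more expensive module without touching the operator itself.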

Definition 5. Given a set of variables V, a set of operators O, an initial state of the variables Vinit and a goal state Vgoal, a plan is a sequence of valid operators that transforms the initial state into another state. A plan is a solution if the sequence of operators results in a state where all the conditions of the goal state are satisfied.

The main objective is to obtain a plan which is optimal or close to optimal with respect to some cost or utility measure. Considering automated vehicles driving on the highway, it is necessary to find a plan under given constraints of computational time and computing resources. The best solution presents a good trade-off between optimization and available resources.
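The definitions of operator, plan and solution can be illustrated with a small sketch. The toy state (a single lane index), the operator names `move_left` and `stay` and their costs are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, int]


@dataclass
class Operator:
    name: str
    precondition: Callable[[State], bool]  # condition checker
    effect: Callable[[State], State]       # post-condition
    cost: float                            # non-negative cost


def execute(plan: List[Operator], init: State) -> Optional[Tuple[State, float]]:
    """Apply the operators in order; return None if a precondition fails."""
    state, total = dict(init), 0.0
    for op in plan:
        if not op.precondition(state):
            return None
        state = op.effect(state)
        total += op.cost
    return state, total


def is_solution(plan: List[Operator], init: State, goal: State) -> bool:
    """A plan is a solution if all goal conditions hold in the final state."""
    result = execute(plan, init)
    return result is not None and all(result[0].get(k) == v for k, v in goal.items())


# Toy example (assumed): lanes are numbered 0..2, moving left increases the index.
move_left = Operator(
    "move_left",
    precondition=lambda s: s["lane"] < 2,
    effect=lambda s: {**s, "lane": s["lane"] + 1},
    cost=1.0,
)
stay = Operator("stay", lambda s: True, lambda s: dict(s), 0.1)

init, goal = {"lane": 0}, {"lane": 2}
```

Among several solutions, a planner would prefer the one with the lowest accumulated cost, subject to the available computation time.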

Definition 6. For the driving task, the problem Prob = (T, Obs, Ags, C, Goal) is described by a 5-tuple consisting of the current topology, the objects and agents interacting on it and the context and goal state specifications.

2.1.2 World as 2D Representation

For the driving task, the world can be described through a 2D representation, with a topology, objects and agents.

Topology
As presented by Kuipers et al. [86], the ontology of the topological level can be defined considering the world as a 2D abstraction. The same element can be defined through different dimensional subspaces depending on the abstraction needed by the planner.

• Region: Two-dimensional subset of the environment that can be defined by one or more boundaries.

• Boundary: One-dimensional subspace, used to delimit other subspaces.

• Path: Describes part of the environment as a one-dimensional subspace and can be directed.

• Lane: Defined through one reference path, two lateral boundaries and two longitudinal boundaries with at least one drivable boundary. A lane can be characterized as a 2D-region or as a 1D-reference path.


• Intersection: Connects at least one incoming and one outgoing reference path. The 2D abstraction of an intersection contains the region within its boundaries.

• Joint: The 1D-joint abstraction represents the boundary between two elements of the topology.

• Default-region: Other regions limited by 1D boundaries are classified as default-regions, like parking areas or filling stations.

Figure 2.2: Exit-ramp represented as 2D abstraction (left) and 1D abstraction (right)

Figure 2.3: Fork intersection represented as 2D abstraction (left) and 1D abstraction (right)

The maneuver planner considers the elements of highway scenarios. Most of the time the vehicle moves through the available lanes, although other elements such as intersections can be found. A local planner provides the lateral and longitudinal trajectories driving within the defined boundaries and considering the 2D description of the lane (it may be desired to drive towards a center-line, and deviations are accepted within the lane boundaries but penalized). Higher-level planners are concerned with the lane selection over time; therefore, they reason over the 1D abstraction of the lanes and their connectivity. Figure 2.2 presents an example of a highway section. A highway fork contains an intersection where one reference line goes in and at least two different ones go out. In the intersection region of Figure 2.3, two different reference paths are found, connecting the input section and the two output boundaries. Sometimes the reference paths are trivial or can be obtained from a digital map. When the extraction of the path is more complex, a path planner such as the one presented by Tanzmeister et al. ([152], [153], [154]) can be used.

Classically, topological algorithms represent maps as graphs, where nodes correspond to places and arcs correspond to paths between them [156]. Within this work, places are locations with specific distinguishing features, such as intersections and T-junctions, or locations where the road attributes change. Objects and agents can be placed or located on the topology.
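A topological map of this kind can be stored as a directed graph of places and the paths between them. The sketch below uses a hypothetical highway fork (the element names `lane_in`, `fork`, `lane_left` and `lane_exit` are invented for illustration) and a plain breadth-first search over the 1D connectivity; it is not the search used by the planner in this thesis.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical highway fork: one incoming lane, an intersection, two outgoing lanes.
lane_graph: Dict[str, List[str]] = {
    "lane_in": ["fork"],
    "fork": ["lane_left", "lane_exit"],
    "lane_left": [],
    "lane_exit": [],
}


def lane_sequence(graph: Dict[str, List[str]], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search over the 1D connectivity of the topology."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None  # goal not reachable from start
```

Calling `lane_sequence(lane_graph, "lane_in", "lane_exit")` returns the sequence of topology elements that leads to the exit; since the arcs are directed, no sequence exists from `lane_left` back to `lane_exit`.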


Objects
Objects are non-autonomous elements located on the topology. Some of them are non-drivable obstacles, others can modify the topology attributes (maximal velocity, connectivity between lanes...) or represent a new boundary. Some of the most important ones are listed here:

• Traffic lights: lead to a new lateral boundary of the lane in the 2D abstraction and a joint in the 1D abstraction. They modify their drivability according to the current light colour.

• Stop signs and give-way signs: represent a boundary, where the drivability is associated with an available space precondition.

• Non-drivable static objects: for example barrier boards at roadworks, which can reduce the section available to the autonomous vehicle.

Agents
Unlike objects, agents are proactive, temporally continuous, autonomous and reactive [48]. They are also not drivable and are usually dynamic. As autonomous entities they follow their own goals. There are two different kinds of agents:

• Traffic-participants: follow the road topology and are conscious of the traffic rules.

• No-traffic-users: their behaviour does not follow the topology and usually presents more uncertainties, like animals or pedestrians.

One key element for the planning is the evolution of the state of the agents over time. Different models, such as physics-based, maneuver-based or interaction-aware models [90], can be used for the prediction of agents. The prediction is a useful input for the planning algorithm, but the algorithm also needs to be able to deal with the lack of information. Therefore, an assessment parameter derived from uncertainties is integrated into the selection policy. Sections 3.1 and 4.1 introduce the role of the prediction and explain in more detail how this information is integrated within the planning process.
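To make the simplest of these model families concrete, the sketch below shows a physics-based prediction under a constant-velocity assumption. The linearly growing standard deviation and the parameter `sigma_growth_mps` are illustrative assumptions that stand in for the growing uncertainty; they are not the assessment parameter used in this thesis.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AgentState:
    s_m: float    # longitudinal position along the lane [m]
    v_mps: float  # longitudinal velocity [m/s]


def predict_constant_velocity(
    agent: AgentState,
    horizon_s: float,
    dt_s: float,
    sigma_growth_mps: float = 0.5,  # assumed growth of the position std-dev per second
) -> List[Tuple[float, float, float]]:
    """Return (time, predicted position, position std-dev) samples."""
    steps = int(round(horizon_s / dt_s))
    return [
        (k * dt_s, agent.s_m + agent.v_mps * k * dt_s, sigma_growth_mps * k * dt_s)
        for k in range(steps + 1)
    ]


prediction = predict_constant_velocity(AgentState(s_m=10.0, v_mps=30.0), 3.0, 1.0)
```

The further a sample lies in the future, the larger its assumed uncertainty, which is why a planner cannot rely on the prediction alone for long horizons.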

2.1.3 The Planning Domain

Based on the relationships between topology elements, agents and objects, and according to different premises or conditions, a sequence of actions can be planned. The predicates, function symbols and schematic operators define the planning domain.

Figure 2.4: Highway section with three lanes where a yellow vehicle (caryelow), a white vehicle (carwhite), a blue vehicle (carblue) and a red vehicle (carred) are driving.

Predicates
A predicate is a binary variable that depends only on the state, not on previous or external values. Each predicate is defined by a unique name and a set of arguments. They represent which combinations and relationships between arguments can be true.

• x is-drivable: means that the ego vehicle can drive over an element x. For example, agents are not drivable, but lane markings (boundaries) are drivable if they are dotted and non-drivable if they are continuous. A lane is drivable if it presents a section big enough which is free of static non-drivable obstacles.

• x is-on-lane y: an entity "x" is located on lane "y". This predicate indicates that the whole 2D projection of the entity "x" is located within the boundaries of lane "y".

• x is-partially-on-lane y: an entity "x" is partially located on lane "y". This predicate indicates that part of the 2D projection of the entity "x" is located within the boundaries of lane "y" and part of the projection is outside the boundaries of lane "y". In Figure 2.4, caryelow-is-partially-on-lane1 and caryelow-is-partially-on-lane2.

• x in-front-of/behind-of y: an entity "x" is located in-front-of another entity "y" when, taking "y" as reference system, the entity "x" has a positive longitudinal coordinate. An entity "x" is located behind-of another entity "y" when, taking "y" as reference system, the entity "x" has a negative longitudinal coordinate. For dynamic entities, the longitudinal axis points in their driving direction and the lateral axis is perpendicular to it in the positive sense (counter-clockwise). For static entities, the longitudinal axis is defined according to the directed path of the topology element where they are located.

• x is-left-of/right-of y: an entity "x" is located left-of another entity "y" when, taking "y" as reference system, the entity "x" has a positive lateral coordinate. An entity "x" is located right-of another entity "y" when, taking "y" as reference system, the entity "x" has a negative lateral coordinate.
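The relational predicates can be evaluated by expressing one entity in the reference frame of the other. The sketch below is an assumption-laden illustration: it reads "x in-front-of y" as "x has a positive longitudinal coordinate in the frame of y", reduces entities to point poses, and uses hypothetical names (`Pose`, `in_frame_of`, `lane_relation`) that do not come from the thesis.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Pose:
    x: float        # position in a global frame [m]
    y: float
    heading: float  # driving direction [rad], counter-clockwise positive


def in_frame_of(ref: Pose, other: Pose) -> Tuple[float, float]:
    """(longitudinal, lateral) coordinates of `other` in the frame of `ref`."""
    dx, dy = other.x - ref.x, other.y - ref.y
    c, s = math.cos(ref.heading), math.sin(ref.heading)
    return c * dx + s * dy, -s * dx + c * dy


def in_front_of(x: Pose, y: Pose) -> bool:
    return in_frame_of(y, x)[0] > 0.0


def behind_of(x: Pose, y: Pose) -> bool:
    return in_frame_of(y, x)[0] < 0.0


def is_left_of(x: Pose, y: Pose) -> bool:
    return in_frame_of(y, x)[1] > 0.0


def is_right_of(x: Pose, y: Pose) -> bool:
    return in_frame_of(y, x)[1] < 0.0


def lane_relation(lat_min: float, lat_max: float, lane_right: float, lane_left: float) -> str:
    """Classify the lateral extent of an entity against the lane boundaries."""
    if lane_right <= lat_min and lat_max <= lane_left:
        return "is-on-lane"
    if lat_max <= lane_right or lat_min >= lane_left:
        return "off-lane"
    return "is-partially-on-lane"


ego = Pose(0.0, 0.0, 0.0)
other = Pose(20.0, 3.5, 0.0)
```

With these poses, `other` is in front of and to the left of `ego`, while an entity whose lateral extent straddles a boundary is classified as partially on the lane, as for the yellow vehicle in Figure 2.4.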

Function symbols
A function symbol contains the numeric value of a parameter. For example, the velocity or acceleration of a vehicle are function symbols.

Schematic Operators
Schematic operators define the transitions between states. In this work they can also be seen as actions when they are totally grounded. The main operators for the driving task on the highway are keep driving on the current lane {KL} and change lane {CL}. For the driving task in other scenarios, the operators can be extended with stop at the region P {SaP} and drive from region A to region B {DfAtB}.

2.1.4 The Highway Scenario

A consistent terminology is used within this work. Some of these terms accept several valid interpretations. Ulbrich et al. [161] and Geyer et al. [54] proposed to unify some definitions for automated driving. In the following, the relevant terminology used within this work is described:

Scene, scenario and episode

• Scene, instant description or instantiation: Describes the ensemble of the environment. Includes the topology, the agents and objects and their locations. A scene can be fully observable and known (objective scene - ground truth) or perceived from an observer's point of view and thus incomplete, including uncertainties and errors (subjective scene).

• Scenario: Describes a temporal development between several scenes in a sequence of scenes. It presents some characteristics related to the topology, objects and agents included in it.

• Scenery or setting: the ensemble of the static environment. It includes the topology (geometry and characterization of the road, of the surrounding environment, number of lanes...) and the static elements (trees, buildings, traffic signals, walls...).

• Setup: Collection of related scenes.

• Episode and scene evolution: Both terms define a temporal sequence of actions, events and states. The main difference is that an episode usually refers to a real evolution, and it is evaluated once the temporal sequence has taken place. Scene evolution, on the other hand, is used to define both the real evolution and the predicted or forward-simulated temporal sequence.

Scenario characterization
A vehicle can drive in different scenarios with specific characteristics. Depending on the dynamics and on the structure, a possible classification is to consider dynamic or static environments, and structured, semi-structured and non-structured environments.

• Classification according to the environment changeability: A static environment is defined by static elements, which could be a priori unknown. The main constraints are time-independent. In a quasi-static environment, dynamic elements condition only the velocity of the path following. For example, many parking scenarios first select a geometrical path that will be driven at a slow velocity. If a pedestrian crosses the planned path, it is enough to brake and wait until the space is free again. When navigating in dynamic environments, the interaction between all the dynamic agents plays a specific role. The optimal movement results from a combination of longitudinal and lateral movements over time, taking into account the future movement of other dynamic agents.

• Classification according to the environment structure: In a structured environment the traffic flow is parallel. In semi-structured environments, static elements are mainly integrated into a regular road network and the dynamic elements are guided by this network; however, the traffic flow is not parallel. Non-structured environments present an irregular, changing or non-existent road network, where the dynamic agents can move freely.

The main challenge of static environments is to acquire the information correctly and to generate adequate maps. The planning task can be solved by one of the classic planners, such as those presented by Latombe [87] or LaValle [88], which are adequate to deal with static scenarios and generate a geometrical path followed by the vehicle. When other dynamic elements also move in the scenario, the planning task has to consider the time component. It is relevant not only where to drive but also when. In fact, as the predicted behaviour of other vehicles is only an assumption, both a fast adaptation of the plan and a correct handling of uncertainties are needed. A restriction for the planning task comes with situations that need to be solved within a finite time or distance. In such situations the decision cannot be postponed indefinitely without risking the accomplishment of the task. Highway scenarios are structured and highly dynamic. The environment information can change quickly and the right decision has to be taken fast; it is therefore important to know the main objectives and to adapt the decision to the current context.

2.1.5 The Planning Problem

The aim of approaching the driving task as a planning problem is to find a sequence of actions that reaches a defined objective or goal. The topology, agents and objects are instantiated according to the current information, defining an initial state, and the different restrictions given by the current context are included.


Context
This work considers a two-dimensional abstraction of the world; this representation does not consider the street profile explicitly. Nevertheless, this information is relevant in those situations where the available power of the vehicle is reduced. For example, a vehicle driving with a trailer could have less power available to perform an overtaking maneuver. In addition, the road conditions can change depending on the weather, for example due to rain or snow. The force that can be transmitted to the road determines the vehicle behaviour and has to be considered during the planning step. Through the context specification in the problem definition, constraints such as the available acceleration can be limited so that the inclination, load and friction are implicitly included.
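One way to picture how inclination, load and friction collapse into a single acceleration constraint is a simplified point-mass model. The formula and all numeric values below are assumptions made for illustration (a drive force limit, a friction coefficient and a slope angle as inputs); the thesis does not specify how the context limits are computed.

```python
import math

G = 9.81  # gravitational acceleration [m/s^2]


def available_acceleration(
    drive_force_n: float,   # assumed maximal drive force at the wheels [N]
    mass_kg: float,         # total mass, including any trailer [kg]
    friction_coeff: float,  # tyre-road friction coefficient (lower on rain or snow)
    slope_rad: float,       # road inclination, positive uphill [rad]
) -> float:
    """Upper bound on longitudinal acceleration for a point-mass vehicle."""
    traction_limit = friction_coeff * G * math.cos(slope_rad)  # friction-limited
    drive_limit = drive_force_n / mass_kg                      # power/load-limited
    return min(traction_limit, drive_limit) - G * math.sin(slope_rad)
```

In this sketch a heavier load lowers the drive-limited term, a slippery road lowers the friction-limited term, and an uphill slope subtracts from both, so a planner that only receives the resulting bound accounts for all three implicitly.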

Goals
The vehicle simultaneously follows two different goals:

• Maintenance goals: always drive safely.

• Achievement goals: drive on the selected lane with a selected velocity, or reach a given position.

The world and its entities are fully defined and instantiated with the domain and the problem definition. The driving task takes place at a defined location (based on street topology), where different objects can be placed, some of them changing the attributes of the location. There are different agents moving and interacting. The ego vehicle follows the maintenance and achievement goals.

2.2 The Planning Framework

An autonomous vehicle is a robot that perceives its environment and moves controlled by a computer program. Rasmussen [120] classifies the performance of human operators into knowledge-based behaviour, rule-based behaviour and skill-based behaviour. Human decision-making is a combination of rule-based behaviour, when the decision is based on associations of state and task or on stored rules for the task, and knowledge-based behaviour, when different plans and their effects are represented and tested against a defined goal. The knowledge-based behaviour provides the ability to react to new situations. Similar to the human reasoning process, the presented Maneuver Planner is able to deal with new situations but also to include rule-based and experience-based criteria. This thesis focuses on the decision-making of the robot, but good perception and execution are also essential to master the navigation.


2.2.1 Perceiving, Planning and Acting

Rasmussen [121] proposed to structure the knowledge representation of a decision maker for complex systems in several levels of abstraction. Donges [33], [34] introduced a hierarchical three-level structure to explain the task of driving a vehicle, presented in Figure 2.5. According to the work of Donges, the driver activity is subdivided into a navigation, a course guidance and a stabilization task.

Figure 2.5: Three-layer structure of the vehicle guidance task proposed by Donges [33]. Diagram blocks: driver (navigation, course guidance, stabilization), vehicle (lateral and longitudinal dynamics) and environment (road network, street and traffic situation, driving surface).

Figure 2.6 shows the equivalent process for an automated vehicle. The navigation route is assumed to be known by the system. First, the vehicle receives information about the environment through different sensors such as lidar, radar or camera, and from the back-end communication. Lidar works based on the reflection of light beams and provides the position of different elements by measuring the time difference between the emitted pulse and the detection of its reflection. As an optical sensor, it is sensitive to poor visibility caused, for example, by weather conditions like rain or fog. Radar is based on radio waves and can measure accurate relative velocities using the Doppler effect; it is less sensitive to weather conditions. Cameras provide good images and are especially well suited to classifying objects. All this information is processed in the Environmental Model, generating the information about the surrounding topology, objects and agents. For the dynamic elements, a prediction of their behaviour is computed in the prediction module. With the provided information, the decision-making process or maneuver planner optimizes the expected return for a given cost function. In the Maneuver Planner, the maximal velocity profile is processed according to the topology. Then, the available keep-lane and change-lane strategies are assessed and the most adequate policy is selected and optimized, providing a drivable and collision-free trajectory towards the goal. The provided trajectory is tracked, stabilized and implemented by the controllers and the vehicle's actuators, thus closing the system loop.
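The data flow of this loop can be sketched as a chain of functions. Everything in the snippet is a placeholder assumed for illustration: the function names, the dictionary keys, the two-second horizon and, in particular, the gap threshold rule in `maneuver_planner`, which only stands in for the assessment of keep-lane and change-lane strategies described above.

```python
def sense():
    # Stand-in for fused sensor and back-end data.
    return {"gap_m": 40.0, "ego_v": 30.0, "lead_v": 25.0}


def environment_model(raw):
    # Structure the raw data into ego state and surrounding agents.
    return {"ego_v": raw["ego_v"], "lead": {"gap_m": raw["gap_m"], "v": raw["lead_v"]}}


def prediction(model, horizon_s=2.0):
    # Constant-velocity prediction of the gap to the leading vehicle.
    lead = model["lead"]
    return lead["gap_m"] + (lead["v"] - model["ego_v"]) * horizon_s


def maneuver_planner(model, predicted_gap_m, min_gap_m=35.0):
    # Placeholder decision rule, not the planner of this thesis.
    action = "keep_lane" if predicted_gap_m >= min_gap_m else "change_lane"
    return {"action": action, "target_v": model["ego_v"]}


def controller(trajectory):
    # Stand-in for trajectory tracking by controllers and actuators.
    return f"track {trajectory['action']} at {trajectory['target_v']:.1f} m/s"


def cycle():
    """One pass through sensing, modelling, prediction, planning and acting."""
    model = environment_model(sense())
    return controller(maneuver_planner(model, prediction(model)))
```

In a vehicle, such a cycle would be repeated continuously, so that each new measurement can revise the previously selected maneuver.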

Figure 2.6: Simplified process for sensing, planning and acting for autonomous driving. Diagram blocks: sensors, back-end, environment model, prediction, maneuver planner, controller, actuators and vehicle.

2.2.2 The Planning Layers

Many classical planning algorithms are based on a discretized model of the continuous solution space, where a guided search like A* [114] or D* [146] generates a plan close to the optimal one. Lifelong methods [82], [81] reduce the time of subsequent searches by reusing the parts that were identical in previous steps. This kind of lifelong planning is difficult to apply in highly dynamic environments, as the behaviour of the dynamic agents usually deviates slightly from the predicted behaviour. The complexity of the solution increases with the resolution of the discretization, leading to the branching factor problem, where the set of different possibilities to be selected tends to infinity.

In order to deal with the branching factor, the planner proposed in this work presents a multilevel architecture where the first levels work with a higher abstraction of the domain and a longer time horizon, and the lower levels operate with more precision for shorter time horizons.
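A short worked example shows why this split helps. The numbers (nine discretized actions per step, a 20 s horizon, a 0.5 s step, and a coarse layer with three options every 5 s) are purely illustrative assumptions and do not describe the discretization used in this thesis.

```python
def sequence_count(actions_per_step: int, horizon_s: float, step_s: float) -> int:
    """Number of distinct action sequences in an exhaustive search tree."""
    steps = round(horizon_s / step_s)
    return actions_per_step ** steps


# Flat search: fine resolution over the whole horizon (9 actions, 40 steps).
flat = sequence_count(actions_per_step=9, horizon_s=20.0, step_s=0.5)

# Layered search: a coarse layer over the long horizon plus a fine layer
# over a short horizon (3**4 + 9**4 sequences).
layered = sequence_count(3, 20.0, 5.0) + sequence_count(9, 2.0, 0.5)
```

Under these assumptions the flat search has 9^40 candidate sequences, whereas the layered one has 81 + 6561 = 6642, which illustrates how abstraction at long horizons keeps the search tractable.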

Multilevel architectures using top-down approaches and taking hierarchical decisions have been explored since the first DARPA Challenges. In 2005, the winner of the DARPA Grand Challenge, the robot Stanley [157], refined the road references given by a navigation module with the obstacles and provided a smoothed trajectory. In the winner of the DARPA Urban Challenge in 2007, the robot Boss [162], a mission planning module provided the navigation route, a behaviour module provided a motion goal and two kinds of dynamically feasible trajectories were evaluated in order to achieve the given motion goal. The main difference between the maneuver planning and a top-down architecture is its parallelism. The different planning layers of the maneuver planner work on different abstraction levels of the topology. Figure 2.7 shows the three different planning layers used in this work. The lane selection works with the 1D abstraction of the lanes and their connectivity; it assesses the long-term reward of driving on a selected lane and plans the temporal sequence of the lanes and the maximal ego velocity. The decision-making to select the next action is proposed as a combination of planning and tabular learning, as introduced in Chapter 6. The tactical planner works with a 1.5D abstraction of the topology; it is in charge of planning within the sensor range and assesses lane change and lane keep actions based on the available gaps. The tactical planner only explores the actions where the preconditions of the safety guarantees are fulfilled, as explained in the next section. This planner also integrates the predicted behaviour of the other traffic participants and is further explained in Chapters 3 and 4. Finally, the trajectory planner is a local planner that provides the lateral and longitudinal trajectories for a selected action, driving within the defined boundaries of a 2D consideration of the lane. The trajectory planner used in this thesis is based on the work of [122] and is not the focus of this thesis.

Figure 2.7: The three different planning layers: (a) lane selection, (b) tactical planning, (c) trajectory planning.

The planning combines an abstract decision-making process with numerical mechanisms. This leads to a complete and structured exploration, identification and assessment of the different options during the planning task. The abstraction of maneuver clusters allows the planner to reduce the complexity, overcoming the infinite branching factor of the planning task. Besides, the abstraction level allows a better traceability of the planner and can be directly used as an interface to communicate with the driver.

Section 2.1 presented the different elements of the topology for the driving task. Velocity restrictions are imposed by traffic regulations in order to guide the traffic flow and its characteristics. These restrictions are given in the form of traffic signals and traffic lights, and the maneuver planner needs to respect the mandatory velocity limits and passing restrictions. Physical velocity limitations come from the physics of the road: the state of the tyres, the maximal available friction forces, the road curvature and the sensor range. The planner needs to integrate all these restrictions into the planning and guarantee that the proposed plan is safe.


2.2.3 Longitudinal Transitions Restrictions between Lanes: Speed Limitation and Undertaking

The elements of the topology present different attributes; one of the most relevant is the speed limit. The vehicle is required to drive at or below the defined speed limit at all times. Usually a constant velocity is defined by a traffic sign. Conditional transitions such as traffic lights, gates or barriers can also be considered as speed limitations that change over short periods of time. In this way, the maximal speed is the same as for the region if the traffic light is not red or the gate or barrier is open. Otherwise, the maximal speed is limited to zero.
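The treatment of conditional transitions as time-varying speed limits can be sketched as follows. This is only an illustration: the `TopologyElement` container and its field names are hypothetical and are not taken from the planner described in this thesis.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TopologyElement:
    """Hypothetical container for one element of the road topology."""
    speed_limit: float               # regional speed limit in m/s
    is_open: Optional[bool] = None   # None: no conditional transition on this element
                                     # True: light not red / gate or barrier open
                                     # False: light red / gate or barrier closed


def effective_speed_limit(element: TopologyElement) -> float:
    """Return the maximal allowed speed on this element in m/s."""
    if element.is_open is False:
        # Red traffic light, closed gate or closed barrier: limit is zero.
        return 0.0
    # No conditional transition, or the transition is currently open.
    return element.speed_limit
```

With this representation, a red traffic light needs no special handling in the planner: it is simply a region whose speed limit temporarily drops to zero.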

Speed limitation given by a traffic signal
When the speed limit of a lane is reduced by a traffic signal, the system has to respect the new speed limit from the required point on. This introduces a longitudinal transition between lanes: at the moment the vehicle enters the section with the lower speed limit, it should be driving at most at this velocity. How this velocity is reached is initially left open. The vehicle can continue at the highest velocity and brake at the last moment with the maximal comfort deceleration, or it can take an energy-efficient approach and brake with foresight at a lower deceleration, or release the accelerator and sail until the new speed limit [94], [45].

The sailing deceleration ($dec_{sail}$) is defined as the deceleration required to reach the given point at the given velocity, limited by a minimal value of the sailing deceleration ($dec_{sail_0}$):

\[ dec_{sail} = \min\left( dec_{sail_0},\; \frac{v_{newLimit}^2 - v_{ego_0}^2}{2 \cdot s_{signal_0}} \right) \tag{2.1} \]

To reach a new, lower desired velocity $v_{newLimit}$ given by a static traffic signal located at $s_{signal_0}$ (with $v_{signal}(t) = 0$), the ego vehicle must not drive faster than the new limit velocity when crossing the signal location: $v_{ego}(t) \le v_{newLimit}$ at $s_{ego}(t) = s_{signal}(t)$. The required deceleration time $t_{dec}$ to reach the new velocity limit $v_{newLimit}$ can be computed assuming a constant deceleration profile:

\[ t_{dec} = \frac{v_{newLimit} - v_{ego_0}}{a} \tag{2.2} \]

The distance driven during the deceleration maneuver, $s_{dec}$, can be obtained from $t_{dec}$:

\[ s_{dec} = v_{ego_0} \cdot t_{dec} + 0.5 \cdot a \cdot t_{dec}^2 \tag{2.3} \]
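As a minimal numerical sketch of Equations (2.1) to (2.3), the following functions compute the sailing deceleration, the deceleration time and the deceleration distance. Decelerations are assumed to be negative accelerations in m/s², and the function names and default value are illustrative choices, not part of the original planner.

```python
def sailing_deceleration(v_ego0, v_new_limit, s_signal0, dec_sail0=-0.5):
    """Equation (2.1): deceleration (negative, m/s^2) to reach v_new_limit at s_signal0.

    dec_sail0 is the minimal sailing deceleration (assumed negative).
    """
    required = (v_new_limit ** 2 - v_ego0 ** 2) / (2.0 * s_signal0)
    return min(dec_sail0, required)


def deceleration_time(v_ego0, v_new_limit, a):
    """Equation (2.2): time in s to reach v_new_limit with constant acceleration a."""
    return (v_new_limit - v_ego0) / a


def deceleration_distance(v_ego0, t_dec, a):
    """Equation (2.3): distance in m driven during the deceleration maneuver."""
    return v_ego0 * t_dec + 0.5 * a * t_dec ** 2


if __name__ == "__main__":
    v0, v1 = 130.0 / 3.6, 80.0 / 3.6           # 130 km/h -> 80 km/h, in m/s
    for label, a in (("comfort", -1.5), ("sailing", sailing_deceleration(v0, v1, 500.0))):
        t = deceleration_time(v0, v1, a)
        print(label, round(a, 3), round(t, 2), round(deceleration_distance(v0, t, a), 1))
```

Evaluating both profiles for the same velocity step gives the two ends of the deceleration interval: the comfort profile brakes late over a short distance, the sailing profile starts earlier and uses a longer distance.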

Substituting the generic acceleration $a$ by the maximal comfort deceleration ($a = dec_{comf}$) provides the maximal comfort velocity profile for the planning. Substituting it by the computed sailing deceleration ($a = dec_{sail}$) provides an energy-efficient profile. These two profiles define the interval for the deceleration maneuver when approaching a lower velocity limit. Criteria optimizing energy will be closer to the sailing profile, while criteria optimizing time will be closer to the maximal comfort deceleration profile.

Undertaking
In many countries it is explicitly forbidden to undertake a slower vehicle. For example, in right-hand-traffic countries such as Germany or Spain, undertaking means overtaking a vehicle on its right side instead of its left side. In order to avoid undertaking, the ego velocity is dynamically restricted to the velocity of the front vehicle driving on the nearest lane that must not be passed in this way (the left lane for right-hand-traffic countries).

2.2.4 Range View and Road-condition Limitations

Non-drivable elements can appear in the lane at any moment, and sometimes a lateral avoidance is not possible. An autonomous vehicle (as well as any human driver) should always be able to perform a safety brake when an obstacle appears within its view range. The worst case on a highway is a static element (wrong-way vehicles are not considered).

Safety Constraint 1 (Keep Lane I). The autonomous vehicle has to be able to perform an emergency braking and stop if a static obstacle blocks its lane.

The limitation of the velocity due to the maximal frontal perception range ($s_{FrontViewRange}$) is included into the planning. For the safety brake, the vehicle considers the current maximal available deceleration ($dec_{safety}$) and the current front visibility range ($s_{FrontViewRange}$). This limitation also considers that the available acceleration can change according to the road conditions or the friction coefficient $\mu$ [55]. For example, wet road conditions present a lower friction coefficient and the available road acceleration $a_{road}$ is reduced. Although the force transmission depends on the road conditions, tyre type, tyre state, vehicle velocity and other factors [125], for the sake of simplicity the influence of these parameters is subsumed under an estimated road friction $\mu$. The acceleration is composed of a lateral and a longitudinal component. The maximal relationship between the lateral and longitudinal components can be obtained from the forces of the maximal traction circle or Kamm's circle, as presented in Figure 2.8. For a safety braking within the lane, the acceleration is composed of a lateral component, consisting of the lateral acceleration required to follow the road curvature $\kappa_{road}$, and a longitudinal component, consisting of the available longitudinal safety deceleration $dec_{safety}$.

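\[ a_{road} \ge \left\lVert \vec{a}_{lateral} + \vec{a}_{longitudinal} \right\rVert = \sqrt{a_{lateral}^2 + a_{longitudinal}^2} \tag{2.4} \]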

The lateral acceleration needed to follow the road curvature $\kappa_{road}$ can be defined as the product of the squared velocity and the road curvature:

\[ a_{lateralRoad} = \kappa_{road} \cdot v^2 \tag{2.5} \]


Figure 2.8: The maximal available force transmission to the road can be approximated as a friction circle, also known as Kamm's circle [115], where the maximal force transmission $F_{max}$ is the resultant of the longitudinal ($F_x$) and lateral ($F_y$) forces and is limited by the available road friction coefficient.

Note that the approximation of the lateral acceleration as the centripetal acceleration due to the road curvature is only valid if the instantaneous centre of rotation is approximated by the road curvature and the vehicle follows the lane centre. For lane changes, the lateral acceleration would present an additional component.

According to Safety Constraint 1 (Keep Lane I), the vehicle has to be able to perform a safety braking and stop within the frontal view range:

\[ v \le \sqrt{-2 \cdot dec_{safety} \cdot s_{FrontViewRange}} \tag{2.6} \]

resulting in a velocity restriction of:

\[ v_{max} = \left( \frac{4 \cdot s_{FrontViewRange}^2 \cdot a_{road}^2}{4 \cdot s_{FrontViewRange}^2 \cdot \kappa_{road}^2 + 1} \right)^{1/4} \tag{2.7} \]
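The velocity restriction of Equation (2.7) can be evaluated with a few lines of code. The sketch below assumes that the lateral and longitudinal accelerations combine on the friction circle (their squares add up to $a_{road}^2$) and that all quantities are given in SI units; the function name is an illustrative choice.

```python
import math


def max_safe_velocity(s_front_view_range, a_road, kappa_road=0.0):
    """Equation (2.7): maximal velocity in m/s that still allows a stop within
    the front view range (m) while following the road curvature (1/m),
    given the available road acceleration a_road (m/s^2, positive magnitude)."""
    s2 = s_front_view_range ** 2
    return (4.0 * s2 * a_road ** 2 / (4.0 * s2 * kappa_road ** 2 + 1.0)) ** 0.25


if __name__ == "__main__":
    # Straight road: the limit reduces to sqrt(2 * a_road * s), as in Equation (2.6).
    print(max_safe_velocity(130.0, 10.0))
    # A curved road consumes part of the friction circle and lowers the limit.
    print(max_safe_velocity(130.0, 10.0, kappa_road=0.005))
```

At the returned velocity, the lateral demand $\kappa_{road} \cdot v^2$ and the braking demand $v^2 / (2 \cdot s_{FrontViewRange})$ together use exactly the available road acceleration.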

These restrictions are applied to the keepLane actions. keepLane includes two different actions: keepLaneWithinComfortRange and keepLaneWithinSafetyRange.

The precondition for keepLaneWithinComfortRange requires that the current velocity allows a safety braking within the view range and under the current road conditions. If the precondition is fulfilled, a velocity sequence is planned. Within this sequence, the maximal velocity has to stay below the maximal road velocity resulting from the range and road limitations. In order to provide some buffer to the controller and to avoid continuously switching between actions in limit cases, a safety margin is selected and the planned velocity is restricted according to the range limitation and a maximal acceleration corresponding to 90% of the maximal available road acceleration.

The action keepLaneWithinSafetyRange has no precondition. The vehicle plans a maneuver that stops at the limit of the sensor range using the available deceleration. A detailed example of the functionality of the keepLane actions is presented in Section 2.3.
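The interplay of the two keepLane actions can be sketched as a precondition check followed by a fallback. This is a simplified illustration under stated assumptions: the precondition is taken to be "current velocity not above the limit of Equation (2.7)", the 90% margin is applied to the road acceleration only for the planned velocity, and the function names and return format are hypothetical.

```python
SAFETY_MARGIN = 0.9  # plan with 90 % of the available road acceleration


def max_safe_velocity(s_range, a_road, kappa_road):
    """Velocity limit of Equation (2.7) in m/s."""
    s2 = s_range ** 2
    return (4.0 * s2 * a_road ** 2 / (4.0 * s2 * kappa_road ** 2 + 1.0)) ** 0.25


def select_keep_lane_action(v_ego, s_range, a_road, kappa_road):
    """Return (action name, velocity value in m/s).

    For the comfort action the value is the maximal planned velocity;
    for the safety action it is the target velocity at the range limit (zero).
    """
    if v_ego <= max_safe_velocity(s_range, a_road, kappa_road):
        # Precondition holds: plan below the limit reduced by the safety margin.
        v_plan_max = max_safe_velocity(s_range, SAFETY_MARGIN * a_road, kappa_road)
        return "keepLaneWithinComfortRange", v_plan_max
    # No precondition required: plan a stop at the limit of the sensor range.
    return "keepLaneWithinSafetyRange", 0.0
```

Because the planned velocity is bounded with the reduced acceleration while the precondition is checked with the full one, small disturbances around the limit do not immediately toggle the selected action.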


2.2.5 Consideration of the Safety Constraints as Precondition for the Maneuver Planner

The view range limitation introduces the first safety criterion or precondition for the driving task: the vehicle should always be able to brake within its view range. The main challenge, however, arises from the dynamic behaviour of the different agents interacting on the road. Safety distances to potential collision partners have to be dynamically evaluated and maintained.

The lane change and lane keep strategies are assessed in a coordinated way by two planning layers (a detailed description of these layers will be introduced in Chapters 2, 4 and 6). A first layer decides the adequacy of changing the lane, and a second layer assesses the available spaces and actions for the lane keep and lane change strategies and executes them [102]. This second layer (Tactical Planner) assesses and searches over the space of actions induced by the available gaps within the sensor range and takes into account the safety constraints for dynamic elements.

Gap
A gap is the free space limited by two consecutive agents or static obstacles. Within this Thesis, each gap is assigned to a lane and is defined in the one-dimensional abstraction, and it has both a longitudinal [meters] and a temporal description [seconds].

During the gap assessment, the Tactical Planner includes a feasibility check for the gap selection. In this way, the system safety limits are screened at each step. The tactical planner forward-simulates the different longitudinal and lateral motions to reach and change into each gap; see Chapter 3 for more information. If the intention of a potentially conflicting merging vehicle is unclear, the planner considers the resulting expected utility of the ego action over both options of the potential merging vehicle. The method is explained in Chapter 4. For each ego action, the different scene evolutions with the conflicting vehicle (merging in front of the ego vehicle or yielding the right-of-way to the ego vehicle) are forward-simulated and weighted with the probabilities of each action. Thus, we obtain the expected return for each ego action from the initial state at $t_0$ until the simulation horizon $t_k$: $E[G_{t_0:t_k} \mid a, gap]$.
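The weighting of the simulated scene evolutions can be illustrated with a small sketch. The returns per scene evolution would come from the forward simulation described in Chapter 4; here they are passed in as plain numbers. The function names, the dictionary layout and the choice of the action with the highest expected return are assumptions made for illustration only.

```python
def expected_return(returns_by_intention, probabilities):
    """E[G | a, gap]: returns of each scene evolution weighted by its probability."""
    return sum(probabilities[i] * returns_by_intention[i] for i in probabilities)


def best_ego_action(simulated_returns, probabilities):
    """Pick the (action, gap) pair with the highest expected return.

    simulated_returns: {(action, gap): {intention: return}}
    probabilities:     {intention: probability}, summing to one
    """
    scores = {
        key: expected_return(returns, probabilities)
        for key, returns in simulated_returns.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical numbers: the other vehicle merges with probability 0.3.
    probabilities = {"merge": 0.3, "yield": 0.7}
    simulated_returns = {
        ("changeLane", "gap1"): {"merge": -10.0, "yield": 5.0},
        ("keepLane", "gap0"): {"merge": 1.0, "yield": 2.0},
    }
    print(best_ego_action(simulated_returns, probabilities))
```

In this toy case the lane change looks attractive if the other vehicle yields, but the risk of a merge lowers its expected return below that of keeping the lane.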

At each time step, a braking maneuver with the maximal available deceleration is checked for plausibility for the keep lane strategy. This maneuver should be able to avoid a frontal collision with any collision partner already present on the ego lane. It is assumed that merging vehicles will merge into the lane without violating the safety distances.

Safety Constraint 2 (Keep Lane II). The autonomous vehicle has to keep a safety distance to the agent in front of it that allows the vehicle to adapt its velocity and react to a longitudinal maneuver of the agent in front without collision.


Figure 2.9: Lane relevance for other agents during lane change. [Marked regions: relevant for primary ego lane; relevant for goal lane.]

The lane change is a complex maneuver in which the ego vehicle becomes relevant for several lanes and the other vehicles may need to react to it. To assure safety during a lane change maneuver, Miller et al. [105] and MobilEye [137] proposed an assessment to guarantee the safety of the vehicle. Here, the limitations for the lane change are considered in a similar way. As presented in Figure 2.9, a lane change is composed of three relevant parts. During the first one (gap approaching), the ego vehicle is only relevant for the ego lane. During the second interval (changing lane), the ego vehicle is relevant for both lanes, and during the third part (adapting to new lane) it is only relevant for the new lane. During the execution of the lane change, the gap feasibility assessment keeps running. If the gap is identified as no longer available, the vehicle returns to the keep lane strategy. During the gap approaching, if the distance to the front vehicle falls outside the comfort limits, a deceleration is pursued to return to the minimal reaction distance for the comfort limits with respect to the front vehicle. The distance to the rear vehicle of the new lane also needs to be considered. A conservative comfort deceleration $dec_{reactive}$ can be assumed for the rear vehicles. During the changing lane and the adapting to new lane intervals, the distance between the rear vehicle of the goal lane and the ego vehicle has to be large enough to allow the rear vehicle to perform a comfort braking with $dec_{reactive}$ and maintain a safety distance to the ego vehicle.

Safety Constraint 3 (Change Lane). The autonomous vehicle can perform a lane change only if the available spaces between the front vehicles allow it to adapt its velocity and avoid a collision with the front vehicles within the comfort limits, and only if it can be guaranteed that the rear vehicle of the goal lane has enough space to perform a comfort braking and safely adapt its velocity.
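The rear-vehicle part of Safety Constraint 3 can be sketched as a distance check. The kinematic model below is an assumption made for illustration and is not the assessment used in this thesis: the ego vehicle keeps a constant velocity, the rear vehicle reacts after a hypothetical reaction time `t_react` and then brakes with the constant deceleration `dec_reactive` until it matches the ego velocity, and a hypothetical minimal distance `d_min` has to remain at the end.

```python
def required_rear_gap(v_rear, v_ego, dec_reactive, t_react=1.0, d_min=5.0):
    """Distance in m the rear vehicle needs to adapt to the ego velocity.

    v_rear, v_ego: velocities in m/s
    dec_reactive:  assumed comfort deceleration of the rear vehicle (negative, m/s^2)
    t_react:       hypothetical reaction time of the rear vehicle in s
    d_min:         hypothetical remaining safety distance in m
    """
    dv = v_rear - v_ego
    if dv <= 0.0:
        # The rear vehicle is not closing in: only the minimal distance is needed.
        return d_min
    closing_during_reaction = dv * t_react
    closing_during_braking = dv ** 2 / (2.0 * abs(dec_reactive))
    return closing_during_reaction + closing_during_braking + d_min


def rear_precondition_holds(gap_rear, v_rear, v_ego, dec_reactive, **kwargs):
    """True if the current rear gap (m) is sufficient under the model above."""
    return gap_rear >= required_rear_gap(v_rear, v_ego, dec_reactive, **kwargs)
```

The required gap grows quadratically with the closing speed, which is why a fast vehicle approaching from behind can invalidate a lane change even when it is still far away.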

Virtual vehicles
In order to include the restriction of the sensor range during the assessment of the comfort actions, virtual vehicles are added at the limits of the sensor range. A vehicle driving at the maximal legal road velocity is placed at the rear limit of the perception range, and a vehicle driving at the velocity of the ego vehicle is placed at the front limit of the perception range. During the action keepLaneWithinSafetyRange, a virtual static vehicle is placed at the front limit of the perception range.
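The placement of the virtual vehicles can be sketched as follows. The `Vehicle` container, the ego-relative longitudinal coordinate and the `safety_action` flag are hypothetical names chosen for this illustration; keeping the rear virtual vehicle during keepLaneWithinSafetyRange is likewise an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Vehicle:
    s: float               # longitudinal position relative to the ego vehicle in m
    v: float               # velocity in m/s
    virtual: bool = False  # True for vehicles added at the perception limits


def add_virtual_vehicles(vehicles: List[Vehicle], v_ego: float, v_legal_max: float,
                         front_range: float, rear_range: float,
                         safety_action: bool = False) -> List[Vehicle]:
    """Return the perceived vehicles plus virtual ones at the range limits."""
    # Front limit: ego velocity for comfort actions, static for the safety action.
    front_velocity = 0.0 if safety_action else v_ego
    return list(vehicles) + [
        Vehicle(s=-rear_range, v=v_legal_max, virtual=True),   # rear limit
        Vehicle(s=front_range, v=front_velocity, virtual=True),  # front limit
    ]
```

Treating the range limits as ordinary vehicles lets the same gap and distance checks cover what the sensors cannot see.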

In this way, the safety constraints are considered for each ego action at the plausibilization or precondition step.

2.3 Experiments

This section illustrates with examples the planning steps and the use of preconditions as safety guards for the planner.

2.3.1 Velocity, Range and Road Limitations - Keep Lane Constraints

In order to illustrate the performance of the system and the different elements considered in the planning framework, an experiment with two different scenarios was conducted. In this experiment the vehicle drives on a straight lane and enters a right curve; different available accelerations and perception ranges are analysed. For these scenarios, the action keepLaneWithinSafetyRange has a constant cost of 10 and keepLaneWithinComfortRange has an associated cost of 3. These costs are constant in order to illustrate the preference for keeping the lane within the comfort limits whenever this action is possible, and to illustrate how the preconditions work. The next chapters introduce the utility function used to assign the different costs to the available actions depending on comfort and safety criteria as well as on the lane adequacy.

The first scenario, Legal Limitations, shows a vehicle driving on a straight lane and entering a curve with a small radius. Shortly before the curve entrance, the legal velocity is reduced from 130 km/h to 80 km/h. The maximal available acceleration is 10 m/s² and the front perception range is 130 m. The comfort acceleration is set to 1.5 m/s² and the sailing acceleration is set to 0.5 m/s². The topology information (curvature and legal velocity) is processed for the following 500 m.

Figure 2.11 shows the vehicle kinematics during the experiment and Figure 2.10 shows the 2D bird's-eye view of the vehicle at different time steps. The black surface indicates the limits of the perception range. The lilac surface indicates the limits of the front perception range still covered by the topology foresight. During the whole experiment, the precondition for keepLaneWithinComfortRange is fulfilled (the vehicle is able at any moment to perform an emergency brake within the perception range with the available acceleration restricted by the current road conditions). Therefore the planner plans to maintain the velocity at the maximum allowed by the legal and road conditions within the foresight. At the beginning, the most restrictive condition is the maximal legal velocity restriction of 80 km/h at position1 (red circle in Figure 2.10). When the distance between the ego vehicle and position1 is smaller than the distance needed by the vehicle to reach the new velocity using the sailing deceleration, a pair of command velocities is given to the trajectory planning: drive at a maximal velocity of 120 km/h and reach 80 km/h at distanceX, where distanceX is the longitudinal distance in the road coordinate system between position1 and the position of the ego vehicle. From second 7 on, the foresight planning limits the velocity due to the scenario in order to reach an adequate velocity to drive in the curve, resulting in a new planning command: drive at a maximal velocity of 109 km/h and reach 80 km/h at distanceX. Once position1 is reached, the new command velocity is 80 km/h. Note that the limit reference profiles for the comfort and the sailing decelerations are only related to the maximal legal velocity.

Figure 2.10: Evolution of the ego vehicle during the Legal Limitations scenario. [Four bird's-eye-view panels at 10 s, 20 s, 30 s and 40 s; axes x [m] and y [m]; legend: ego position, out of range, foresight range, 80 km/h legal limitation, perception range.]

Figure 2.11: Ego vehicle dynamics during the Legal Limitations scenario. [Four panels over time [s]: longitudinal distance [m], longitudinal velocity [m/s], lateral acceleration [m/s²] and longitudinal acceleration [m/s²]; legend: ego actual, ego planned, target legal, target foresight, target safety, maximal safety, target comfort, limit comfort, target sailing, limit sailing, road limit, limit safety.]

The second scenario, called Sensor Failure, shows the interaction of the different restrictions. The vehicle is driving on a wet road with a maximal available acceleration of 6 m/s² and the front perception range is 130 m. The comfort acceleration is set to 1.5 m/s² and the sailing acceleration is set to 0.5 m/s². The topology information (curvature and legal velocity) is processed for the following 500 m.

Figure 2.12: Evolution of the ego vehicle during the Sensor Failure scenario. [Four bird's-eye-view panels at 10 s, 20 s, 30 s and 40 s; axes x [m] and y [m]; legend: ego position, out of range, foresight range, 80 km/h legal limitation, perception range.]

At time step 5.8 s, the foresight velocity indicates a reduction in order to adapt the vehicle to the current acceleration and the foresight curvature: drive at a maximal velocity of 84 km/h and reach 80 km/h at distanceX. While the vehicle is driving through the curve, at time step 30 s a sensor failure is injected and the front perception range is suddenly limited to 52 m. The system limits its velocity to adapt itself to the new maximal safe velocity. For 0.84 s the precondition for keepLaneWithinComfortRange is not fulfilled, and the planning selects the keepLaneWithinSafetyRange trajectory, resulting in the command reach 0 km/h at the sensor limit (52 m). For the keepLaneWithinSafetyRange action, the maximal available longitudinal acceleration can be used. Once the precondition is fulfilled again, the vehicle selects the keepLaneWithinComfortRange action, which includes the velocity restriction due to the road curvature and the maximal acceleration.


Figure 2.13: Ego vehicle dynamics during the Sensor Failure scenario. [Four panels over time [s]: longitudinal distance [m], longitudinal velocity [m/s], lateral acceleration [m/s²] and longitudinal acceleration [m/s²]; legend: ego actual, ego planned, target legal, target foresight, target safety, maximal safety, target comfort, limit comfort, target sailing, limit sailing, road limit, limit safety.]

2.3.2 Safety Constraints as Precondition for the Lane Change

The experiment Preconditions Lane Change analyses the safety constraints presented in Section 2.2.5 for the Change Lane actions. The vehicle is driving on a dry road with a maximal available acceleration of 10 m/s², the front perception range is 130 m and the rear perception range is 80 m. The comfort acceleration is set to 1.5 m/s² and the sailing acceleration is set to 0.5 m/s². The assumed braking deceleration for rear vehicles is −2 m/s² for vehicles located within $(rear_{range}, 0.85 \cdot rear_{range})$ and −0.5 m/s² for the other vehicles. The topology information (curvature and legal velocity) is processed for the following 500 m. At each step, the planning algorithm checks the availability of the actions keepLaneWithinSafetyRange, keepLaneWithinComfortRange and ChangeLaneComfortToTheLeft. These actions have a cost of 10, 3 and 1, respectively. The costs are constant in order to illustrate the first preference of changing the lane and, if this is not possible, of keeping the lane within the comfort limits or within the safety limits, and thus to show how the preconditions of the safety criteria work. The next chapters introduce the utility function used to assign the different costs to the actions depending on comfort and safety criteria as well as on the lane adequacy.

The black surface indicates the limits of the perception range. The lilac surface indicates the limits of the front perception range still covered by the topology foresight. The ego vehicle (white and black) attempts to perform a lane change. The red vehicle (veh1) is driving at 85 km/h, the orange vehicle (veh2) is driving at 150 km/h and the blue vehicle (veh3) is driving at 130 km/h. A virtual vehicle (vehvirt) driving at the maximal legal velocity is located at the limit of the rear perception range.
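With constant costs, the action selection reduces to picking the cheapest action whose precondition holds. The sketch below uses the three action names and costs given for this experiment; the function name and the dictionary of precondition results are hypothetical.

```python
ACTION_COSTS = {
    "keepLaneWithinSafetyRange": 10,
    "keepLaneWithinComfortRange": 3,
    "ChangeLaneComfortToTheLeft": 1,
}


def select_action(preconditions):
    """Return the cheapest available action.

    preconditions: {action name: bool} for the actions that have a precondition.
    keepLaneWithinSafetyRange has no precondition and is always available.
    """
    available = ["keepLaneWithinSafetyRange"]
    available += [
        name for name, fulfilled in preconditions.items()
        if fulfilled and name in ACTION_COSTS and name not in available
    ]
    return min(available, key=ACTION_COSTS.__getitem__)
```

For example, while a fast vehicle approaches from behind, the lane change precondition is false and the selection falls back to keepLaneWithinComfortRange; once the precondition holds again, the lane change is preferred because of its lower cost.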

Figure 2.15 shows the scenario evolution and Figure 2.14 shows the main kinematic values related to the ego vehicle. The scenario begins with the ego vehicle driving at a velocity of 100 km/h and the maximal legal velocity limited to 130 km/h. On its lane, a slower vehicle (veh1, red) is driving at 85 km/h. The action ChangeLaneComfortToTheLeft is possible and activated. Around 3 seconds after the beginning of the experiment, a fast vehicle (veh2, orange) driving at 150 km/h (faster than legally allowed) appears in the perception range of the ego vehicle. The precondition for the lane change is no longer valid and the lane change is aborted, leaving only the actions keepLaneWithinSafetyRange and keepLaneWithinComfortRange available. The ego vehicle reduces its velocity to drive behind veh1. When veh2 reaches and passes the ego vehicle, the lane change is activated again. Around second 17, another fast vehicle driving at 130 km/h (veh3, blue) appears in the perception range of the ego vehicle, but the distance and relative velocities remain within the limits of the lane change precondition and the action ChangeLaneComfortToTheLeft remains activated until the lane change is completed. Once it is located on the left lane and no longer driving behind veh1, the ego vehicle begins to accelerate up to the desired velocity of 130 km/h.

Figure 2.14: Ego vehicle dynamics during the Preconditions Lane Change scenario. [Four panels over time [s]: longitudinal distance [m] to veh1, veh2, veh3 and vehvirt, longitudinal velocity [m/s], lateral position [m] and longitudinal acceleration [m/s²]; legend: ego actual, ego planned, maximal safety, target legal, precondition lane change, limit sailing, road limit, limit safety, limit comfort.]

This experiment shows how the preconditions needed to select the Change Lane action serve as a guard to guarantee safety during the lane change, making it possible to reach the desired goal or to abort the maneuver at each time step.

The presented experiments had a demonstrative constant cost assigned to each action, reducing the strategy selection to picking among the a priori assessed and prioritized available actions. The strategy to select the most adequate time, gap and lane according to the surrounding traffic participants is presented in the next chapters.


Figure 2.15: Evolution of the ego vehicle during the Preconditions Lane Change scenario. [Ten bird's-eye-view panels at 3 s, 5 s, 8 s, 12 s, 15 s, 18 s, 21 s, 25 s, 35 s and 45 s; axes x [m] and y [m]; legend: ego position, veh_1 position, veh_2 position, veh_3 position, veh_virt position, out of range, foresight range, perception range.]


2.4 Related Work

The concept of legal safety as a basis for the interaction between human and automated drivers was proposed by Vanholme et al. [164]. This concept was expanded to a lane safety assessment based on the ICS (Inevitable Collision States) and its stochastic equivalent PDS (Probabilistic Collision States) [3]. One limitation of this safety assessment is the infinite branching factor, because the number of possible scenarios tends to infinity when the horizon time increases. Althoff et al. [4] simplified this problem by considering the extreme possible behaviour (maximal acceleration and deceleration) of the other road users. Their on-line formal verification can guarantee safety for all times, but the method can lead to extremely conservative behaviours because a solution for the ego vehicle is only accepted if no overlap with other road users happens during the whole horizon. Nevertheless, an evaluation of the risk is necessary for the whole maneuver, and the safety needs to be guaranteed under defined constraints for the immediate time horizon.

Many proposed architectures rely on a top-down architecture where the lower level optimizes a safe trajectory for a selected longitudinal or lateral goal [1, 41, 42, 59, 122, 168]. The main problem of the top-down architecture is its level dependency. If the high level fails to make a feasible decision, the lower module can only return that no solution was found. This forces the high level to propose another solution, consequently losing critical time in dangerous situations. Therefore it is important to guarantee that the subgoals selected by the higher layers of the architecture are still feasible on the lower layers. Richards et al. [126] and Schouwenaars et al. [132] proposed the use of mixed-integer linear programming (MILP) to integrate the obstacle avoidance as a constraint of the linear program and find optimal maneuvers. In [133] the authors explored the safety issue when using MILP: when the approach is used in a receding horizon setting without taking further safety policies into account, the selected strategy could become infeasible in the next iteration and the vehicle could get stuck in a collision risk. A combination of hybrid automaton and decision tree is the architecture proposed by Ardelt et al. [7], integrating a discrete decision architecture within the continuous data processing. Ardelt et al. [6] evaluate the utility of each lane considering measurement uncertainties. Some works concentrate on the cooperation between road users. Schwarting et al. [136] present a cooperative decision-making algorithm that anticipates and solves predictable conflicts. Based on game theory, a combined prediction and planning framework is presented by Bahram et al. [10]. The approach of Galceran et al. [52] proposes a framework based on behavioural anticipation and decision-making to consider the interaction between agents. All these methods optimize the immediate utility for the lane change or lane keep decision, also considering the cooperation between vehicles, but they do not take its further effects into consideration. In other words, these methods work within a short time horizon but do not provide a medium-term strategy.

Partially Observable Markov Decision Processes (POMDPs) have been widely used to approach planning under uncertainty [72], [144]. Belief vectors build policy trees to compute optimal policies using dynamic programming algorithms, as presented by Cassandra et al. [24]. The problem of this method is its associated computational load. Wiering et al. [170] introduced HQ-learning, a method that decomposes a given POMDP into a sequence of reactive problems, distinguishing between reactive policies and critical points. Recently, efficient methods for solving online POMDP problems have been presented and applied to the motion planning task. Bai et al. [12] presented an intention-aware planner, demonstrated on a golf cart. Morere et al. [108] proposed a Continuous Belief Tree Search, which dynamically samples promising actions while constructing a belief tree, and demonstrated it on a real-world parking task application. Nevertheless, the planning times of around three seconds make it unsuitable for driving applications.

Fikes and Nilsson [44] presented STRIPS (Stanford Research Institute Problem Solver), an automated planner whose goal is to find some composition of operators that transforms the initial world model into a world model that satisfies the goal condition. The Planning Domain Definition Language (PDDL) [98] was introduced as an attempt to unify the planning languages. Since then, several planning formalisms have been developed and extended to integrate temporal dependencies [13, 47] or to model continuous domains [46]. Planning formalisms provide the tools to deal with the complexity, improving the knowledge representation and the reasoning process. Frazzoli et al. [49] introduced an approach to generate plans as the concatenation of a finite number of motion primitives and trim primitives. A Maneuver Automaton was able to generate a motion plan starting from an initial state by concatenating compatible primitives. In this approach, the feasible trajectories are reduced to combinations of motion primitives. Dornhege [35] presents a semantic planning approach where the high and low levels work more tightly integrated for a robotic planning task. Zhao et al. [177] integrate ontology-based knowledge into the decision-making for intersections, but the rule-based strategies can become too conservative.

One contribution of this Thesis is the combination of symbolic and continuous planning. This combination overcomes the computational disadvantages of a bottom-up approach and the information loss of a top-down approach and achieves integration and coherence between different abstraction levels. Symbolic entities allow the actions or schematic operators to be considered on a high level, while different external operators can be called according to the available information, detail level and computational resources to analyze and compute the requested values. The exploration of the plan is guided by opening the safety-critical branches first and then expanding the most promising and immediate branches. The system presented in this Thesis also handles the lack of information and the uncertainties derived from the behaviour of other traffic participants by combining actions with the numerical evolution of the scene.

2.5 Discussion

The transition to fully autonomous intelligent vehicles needs to bridge the gap between traditional rule-based systems and adaptive systems that can learn and integrate knowledge acquired through the interaction with the environment and previous experiences. Nevertheless, safety is a critical issue that needs to be guaranteed and traced.

Two main contributions are presented in this chapter. Firstly, after defining and delimiting the domain restrictions for driving in highway scenarios, the planning framework to plan and execute the maneuvers of a highly automated vehicle in highway scenarios is presented. Secondly, the safety restrictions are included in the planning framework in the form of preconditions. In this way, the planner can guarantee that the resulting plan complies with the safety constraints. This method makes it possible to explore different feasible planning alternatives while maintaining a safety backup at all times.

In this chapter, the driving task is set out as a planning problem. The highway is defined as a 2D model with a road topology and different objects located on it, where different agents interact with each other. The autonomous vehicle plans and executes at each time step according to the available information obtained from the environment and according to former experiences.

With the combination of symbolic and continuous planning instances, several modules can interact with each other, achieving a non-hierarchical decision-making process. The domain is defined on different abstraction levels that allow a progressive refinement of the planning, always respecting the safety constraints. The system covers the problems of driving in a highly dynamic environment and the uncertainties derived from the behaviour of other agents. In order to obtain a robust behaviour, information observed in former experiences can be integrated into the system within different planning layers. The preconditions are analytical restrictions that need to be fulfilled in order to select the related action. In this way, the safety constraints are continuously satisfied within the short horizon, including under the changing conditions of the environment.

This chapter introduces the planning framework and the main definitions related to decision-making. The inclusion of safety constraints in the planning as preconditions is presented and discussed. The actions can only be selected if the safety preconditions are satisfied. A comfortable lane keeping guarantees that the autonomous vehicle is able to adapt its velocity to its front vehicle. At each time step the autonomous vehicle considers the maximal available braking deceleration under the current road conditions, and if a static obstacle blocks its lane, it is able to perform an emergency braking and stop. During a comfortable lane change the vehicle has to adapt its velocity to the front vehicles of the related lanes, and a safety rear distance in the goal lane is also monitored during the lane change. The introduction of virtual vehicles at the perception limits of the ego vehicle makes it possible to implicitly include the perception range in the preconditions. The interaction of the planning framework and the preconditions is illustrated with experiments. Using the examples of a new velocity limit, a sensor failure and a lane change, the experiments show how the system is able to react to changing conditions and respect the safety constraints.

The next chapters present the different planning layers, which make it possible to combine and optimize the planning on a short horizon as well as on a long horizon.

Chapter 3

Planning within the Sensor Range

During the autonomous driving task the vehicle has to select the most adequate maneuver sequence to optimize the travel duration and the driving comfort, always respecting the safety limits. One important aspect of autonomous driving lies in the selection of maneuver sequences. Human drivers analyze and try to anticipate the traffic situation, choosing their actions not only based on current information but also based on experience. In a similar manner, the information provided by the sensors is combined with some assumptions about the behavior of other traffic participants and with information from previous experiences in order to optimize the lane change and lane keep strategies.

The objective of this work is to provide an adequate framework that allows the autonomous vehicle to drive safely, maintaining the requested goals and providing robustness against the behavior of other traffic users over a longer time horizon.

When considering the planning horizon, there is a trade-off between the accuracy of the information and the time horizon: the further ahead the planning horizon extends, the less precise the prediction of the scene evolution becomes. This characteristic is not a problem for human drivers, because they are used to combining an anticipatory behavior, which evaluates the current situation and its evolution, with a more reactive behavior that depends on the immediate actions. For example, a driver who wants to change to the left lane to drive faster selects a gap between two vehicles, drives towards this gap and is able to adapt their velocity if another unexpected vehicle merges in front. This chapter introduces the intermediate decision-making level, responsible for planning and executing the driving activity on a medium time horizon: the Tactical Planner.

The approach is described using the example of a highway entrance ramp scenario such as the one shown in Figure 3.1. The approach is also valid for interchanges and further highway scenarios.

Figure 3.1: In entrance ramp situations, the vehicles driving on the entrance lane have to perform a mandatory lane change within a limited space and merge into the traffic flow.

This chapter presents a novel medium-horizon planner (Tactical Planner) that combines the information provided by the sensors with some assumptions about the behavior of the other traffic participants and focuses on the lane change and lane keep strategies. The approach is validated in simulation and through a set of experiments carried out with a real vehicle and an integrated traffic simulation, also known as vehicle-in-the-loop (VIL).

3.1 The Tactical Planner Approach

This chapter focuses on medium-horizon planning. The main functionality of the Tactical Planner is to identify the different maneuver sequences that reach a desired lane and velocity, quantify the cost of each of them, select the best policy and forward the next selected maneuver to the trajectory planning. Depending on the automation level, the lane and velocity can be given by the system or by a human driver. The selection of the most adequate lane to drive on, considering the reward over a longer horizon, is described in Chapter 6.

3.1.1 Gap Oriented Action Description

The tactical level combines a semantic abstraction of the configuration space with a continuous estimation of the scene evolution. The configuration space is abstracted into gap spaces, allowing the planner to evaluate different clusters of possible motion behaviors related to each gap. This abstraction provides the planner with a simplified interface to reason over time and restricts the search space for the numerical optimizer. With this description, the lane keeping strategy includes the possibilities of continuing on the current lane (keepLane actions, {KL}): keep driving while maintaining the safety distance to the current front gap, keep driving while considering the gap generated by a predicted merging vehicle, or brake to a standstill on the current lane. A change lane strategy {CL} includes all the possibilities of changing either to the left or to the right lane into the different defined gaps. The tactical planner is composed of five different operators, as shown in Figure 3.2.

[Figure 3.2 blocks: Gaps Sort & Predict; Keep Lane Feasibility & Pre-assessment; Change Lane Feasibility & Pre-assessment; Policy Selection; Maneuver Optimization; output: Selected Maneuver (gap-based maneuver selection)]

Figure 3.2: Tactical Planner work-flow

Sort and Predict Gaps:
The agent list is processed to provide a list of current and potential gaps. A gap is defined by its front and rear limitations. These restrictions of the available space can be generated by agents, non-drivable objects or other static restrictions such as an ending lane. For each considered lane, the relevant agents and other space limiters are sorted with respect to their longitudinal distance, generating the gap information. Virtual vehicles are included at the sensor range limits as explained in section 2.2.5.
The intention of changing or keeping the lane is predicted for the surrounding vehicles. For each vehicle with a predicted lane change intention, two different gap sets are generated to consider both the scenario in which it does not change the lane and the scenario in which it does. Both scenarios are forward simulated, and the costs are computed and weighted proportionally to the belief that the lane change will be successful. A detailed explanation of the intention prediction is presented in Chapter 4. Once the intention of the surrounding vehicles is predicted and the different gaps are generated, a feasibility and assessment evaluation is performed.
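The gap generation described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the names `Limiter`, `Gap`, `build_gaps` and `expected_cost` are hypothetical, and the weighting of the two scenario costs as a probability-weighted sum is an assumption about how the belief enters the cost.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Limiter:
    """An agent, static object or virtual vehicle on one lane (hypothetical type)."""
    name: str
    s: float  # longitudinal position along the lane [m]


@dataclass
class Gap:
    """Free space between a rear and a front limiter (hypothetical type)."""
    rear: Limiter
    front: Limiter


def build_gaps(limiters: List[Limiter]) -> List[Gap]:
    """Sort the limiters of one lane longitudinally and pair up neighbours."""
    ordered = sorted(limiters, key=lambda lim: lim.s)
    return [Gap(rear=r, front=f) for r, f in zip(ordered, ordered[1:])]


def expected_cost(cost_if_change: float, cost_if_keep: float, p_change: float) -> float:
    """Assumed weighting: combine both scenario costs by the lane-change belief."""
    return p_change * cost_if_change + (1.0 - p_change) * cost_if_keep


# Virtual vehicles at the sensor range limits close the outermost gaps.
lane = [
    Limiter("virtual_rear", -150.0),
    Limiter("truck", 40.0),
    Limiter("car", -20.0),
    Limiter("virtual_front", 150.0),
]
gaps = build_gaps(lane)
```

With four limiters on a lane, three gaps result; each gap keeps references to its front and rear elements so that later steps can query their motion models.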

Feasibility check and Pre-assessment:
The current lane is evaluated with regard to the gaps located in the ego lane, in front of the ego vehicle. The front vehicle, potential merging vehicles and other obstacles located on the ego lane are considered, a keepLane assessment is made based on their current longitudinal distances, and an associated utility cost for the keepLane actions is estimated. In a similar way, the current and predicted gaps of the neighbour lanes are analysed, the reachable gaps for a lane change are identified and an associated utility cost is calculated. The feasibility and assessment functions are presented in more detail in section 3.1.3.

Policy Selection:
This block has two different tasks. Firstly, it selects the most conservative maneuver and passes it to the optimization module. Then it processes the next most promising maneuver from the feasible set. Once all the feasible maneuvers have been expanded or the available time is over, it selects the most adequate maneuver to be implemented by the trajectory planning.

Maneuver Optimization:
A prediction of the system evolution over a given time horizon is computed in this step. The time horizon is discretized and, for each time step, the ego control variable is computed. At this level, in order to simplify the model, the longitudinal variable is the acceleration whereas the lateral variable is the velocity. Section 3.1.5 introduces the different kinds of optimization considered within this work. To obtain the optimization or prediction of the scene evolution up to a given time horizon, a model of the other agents' behavior needs to be considered. Different behavior models are discussed in the next section.

To ensure safe behavior, the most conservative maneuver is selected and optimized first and then, depending on the available computing time, further options are explored. Thus, a fall-back policy is available at each step.
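The "conservative first, then most promising" ordering can be sketched as an anytime loop. The following Python fragment is only an illustration under stated assumptions: `select_policy`, the `(name, pre_assessed_cost)` candidate tuples and the `optimize` callback are hypothetical names, and the time budget is modelled with a simple monotonic clock.

```python
import time
from typing import Callable, Dict, List, Tuple


def select_policy(
    fallback: str,
    candidates: List[Tuple[str, float]],
    optimize: Callable[[str], float],
    budget_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Return the maneuver with the lowest refined cost found within the budget."""
    deadline = clock() + budget_s
    # The most conservative maneuver is always optimized, so a fall-back exists.
    refined: Dict[str, float] = {fallback: optimize(fallback)}
    # Expand the remaining maneuvers by increasing pre-assessed cost.
    for name, _pre_cost in sorted(candidates, key=lambda c: c[1]):
        if clock() >= deadline:
            break  # out of computing time: keep what has been refined so far
        if name not in refined:
            refined[name] = optimize(name)
    return min(refined, key=refined.get)


# Hypothetical refined costs returned by the optimizer for each maneuver.
refined_costs = {"brake": 5.0, "KL-comfort": 3.0, "CL-left-gap1": 2.0}
feasible = [("KL-comfort", 0.4), ("CL-left-gap1", 0.2)]
best = select_policy("brake", feasible, refined_costs.__getitem__, budget_s=1.0)
only_fallback = select_policy("brake", feasible, refined_costs.__getitem__, budget_s=0.0)
```

With a zero budget the loop never expands a candidate, so the conservative maneuver is returned; with enough time the cheapest refined maneuver wins.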

3.1.2 Modeling agents’ behavior

At each time step, the maneuver planner simulates the evolution of different policies in order to obtain the estimated cost of each ego policy. As explained before, agents are dynamic and proactive and follow their own objectives. It is therefore not possible to obtain certainty about their behavior, but only estimations and predictions. Similar to the ego approach, a behavior model of the other traffic participants is assumed in this work. A first prediction of the behavior computes the probability of the maneuver, which can be to keep the current lane or to change lanes, merging in front of another agent. Then a longitudinal and a lateral model are used to obtain the lateral and longitudinal trajectory corresponding to the selected maneuver.

Longitudinal Motion Model for the surrounding agents
Two different longitudinal motion models are used within this work:

• Constant acceleration (CA) - Constant velocity (CV): a simple kinematic model defined as a combination of constant CA-CV profiles. The CA-CV model gives a first estimation of the evolution. These profiles also allow analytic solutions to be found for the time when two vehicles intersect, and therefore for the reachability limits. In the absence of more information, the vehicles are assumed to keep the observed acceleration ($a(t_0)$) during a defined time span ($t_{acc}$) and to keep a constant velocity afterwards. In this way, the free-drive velocity ($v_{free}$) of a vehicle at each time ($t$) can be computed from the current attributes determined at $t_0$, the observation time:

$$v_{free}(t) = v(t_0) + a(t_0) \cdot \min(t_{acc}, t), \quad t \ge 0. \tag{3.1}$$

Vehicles driving behind a slower vehicle are also assumed to react and brake in order to adapt their velocities to the slower front vehicle. They are assumed to brake with a defined deceleration ($dec_{reactive}$) from some time point ($t_{braking}$) until the velocity is adapted to that of their front vehicle ($t_{adapted}$). This model defines a vehicle-following adaption behavior, but more accurate models can also be used to describe the interactions between vehicles. As the deceleration parameters of other road users cannot be controlled, a maximal conservative deceleration ($dec^{max}_{reactive}$) based on experience can be assumed. This makes it possible to consider a reactive behavior:

$$
v_{reactive}(t) =
\begin{cases}
v(t_0) + a(t_0) \cdot \min(t_{acc}, t), & t < t_{braking} \\
v(t_{braking}) + dec_{reactive} \cdot \Delta t_{braking}, & t_{braking} \le t < t_{adapted} \\
v(t_{adapted}), & t \ge t_{adapted}
\end{cases}
\tag{3.2}
$$

with $t_{adapted} = t_{braking} + \Delta t_{braking}$,

$$
\Delta t_{braking} = \frac{v_{reactive} - v_{front}}{dec_{reactive}},
\qquad
dec_{reactive} =
\begin{cases}
dec^{max}_{reactive}, & v_{reactive} > v_{ego} \\
0, & \text{otherwise.}
\end{cases}
\tag{3.3}
$$

• Intelligent Driver Model (IDM). The model developed by Treiber et al. [159] provides a detailed interaction model between the vehicles. Equation 3.4 presents the IDM acceleration computation, with $\Delta v$ and $\Delta s$ being the relative longitudinal velocity and distance between a vehicle and its corresponding front vehicle, and $v$ and $v_{des}$ being the current and desired velocities of the vehicle. $s_{IDM}$ corresponds to the minimal accepted inter-vehicular distance, $a_{IDM}$ and $b_{IDM}$ are the maximal acceleration and braking values, $THW_{des}$ is the desired time headway and $\delta$ is an experimental exponent. This work assumes a similar behavior for all the vehicles, following Equation 3.4 and saturating the values to $a_{IDM}$, $b_{IDM}$. Table 3.1 summarizes the values used within this work, which are assumed to be constant. For a detailed prediction these values could be inferred and predicted from the observed behavior of the vehicles. Using a sequential simulation of the scene, the equation delivers an acceleration value for each vehicle.

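$$
a = a_{IDM}\left[1 - \left(\frac{v}{v_{des}}\right)^{\delta} - \left(\frac{s_{IDM} + THW_{des}\cdot v_{des} + \dfrac{v\cdot\Delta v}{2\sqrt{a_{IDM}\,b_{IDM}}}}{\Delta s}\right)^{2}\right] \tag{3.4}
$$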

Table 3.1: IDM parametrization used within this work

parameter   s_IDM   a_IDM   b_IDM   THW_des   δ
value       2       1       1.5     1         4

The deceleration during the reactive part (decreactive) of the CA-CV model canbe considered as a relaxation of the following behavior of the IDM.
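Both longitudinal models can be written down in a few lines. The sketch below is illustrative only: the function names are hypothetical, the IDM term follows the dimensionless reading of Equation 3.4 (desired distance divided by $\Delta s$, then squared), $\Delta v$ is assumed to be the approach rate (own velocity minus front velocity), and the saturation is assumed to clamp the result to the interval $[-b_{IDM}, a_{IDM}]$. Default parameters are taken from Table 3.1.

```python
import math


def v_free(t: float, v0: float, a0: float, t_acc: float) -> float:
    """Free-drive velocity of Eq. 3.1: constant acceleration, then constant velocity."""
    return v0 + a0 * min(t_acc, t)


def idm_acceleration(
    v: float,
    dv: float,
    ds: float,
    v_des: float,
    s_idm: float = 2.0,
    a_idm: float = 1.0,
    b_idm: float = 1.5,
    thw_des: float = 1.0,
    delta: float = 4.0,
) -> float:
    """IDM acceleration following Eq. 3.4, saturated to [-b_idm, a_idm] (assumed)."""
    # Desired distance term as written in Eq. 3.4 (uses v_des in the headway term).
    s_star = s_idm + thw_des * v_des + v * dv / (2.0 * math.sqrt(a_idm * b_idm))
    a = a_idm * (1.0 - (v / v_des) ** delta - (s_star / ds) ** 2)
    return max(-b_idm, min(a_idm, a))


# A vehicle observed at 20 m/s accelerating with 2 m/s^2 for at most 3 s.
v_after_1s = v_free(1.0, v0=20.0, a0=2.0, t_acc=3.0)
v_after_5s = v_free(5.0, v0=20.0, a0=2.0, t_acc=3.0)

# Standing vehicle on a practically free road versus a vehicle very close to its leader.
a_free_road = idm_acceleration(v=0.0, dv=0.0, ds=1e9, v_des=30.0)
a_tailgating = idm_acceleration(v=30.0, dv=5.0, ds=1.0, v_des=30.0)
```

On a free road the model returns approximately the maximal acceleration, while a very small distance to the front vehicle saturates at the maximal braking value.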

Both longitudinal motion models are based on a following behavior. In order to obtain this behavior, it is crucial to determine which agent is following which, in other words, to obtain the relevance sequence of the agents for each lane. One vehicle is considered to be longitudinally relevant for another when it is located in front of it and on the same lane. To improve the longitudinal behavior of the ego vehicle and provide better foresight, the ego vehicle can consider up to three front vehicles at the same time: the front vehicle on the ego lane, the front vehicle of the goal gap, and a predicted vehicle merging in front of the ego vehicle in the ego lane. Figure 3.3 shows the relevance configuration. The ego vehicle considers all of them during the planning and takes the optimal trajectory within the safety limits.
In a first instance, only the most likely maneuvers of the other traffic participants are considered. Chapter 4 explains how this method can be improved in order to gain robustness by considering more unlikely outcomes.

Figure 3.3: The ego vehicle (centre lane, black and white) is performing a lane change between the orange and the white vehicle on the left lane. The blue vehicle on the right lane is predicted to make a lane change into the centre lane behind the green vehicle. The ego vehicle's velocity is going to be limited by three vehicles: the white one on the left lane, the green one on the centre lane and the incoming blue one on the right lane.


Lateral Motion Model for the surrounding agents
As explained before, the lane and decision-making strategy works on 1- and 1.5-dimensional abstractions. The longitudinal motion is studied in detail, whereas the lateral motion is only relevant for predicting the lane correspondence at each time step. A simple model for the lateral motion is used within this work. A Constant-Velocity model is assumed for a lane change, and a constant lateral lane change duration ($t_{LC_{lat}}$) is assumed. Vehicles that are not performing a lane change are assumed to be driving in the middle of the lane.

3.1.3 Feasibility Check

The operators for lane change and lane keep policies are checked for plausibility based on a reachability analysis for the available gaps. The reachability analysis or feasibility check verifies analytically whether, under the given acceleration assumptions, the distances during each maneuver comply with the safety constraints introduced in Chapter 2.

Lane Keep Feasibility
For the current lane, the front vehicle and potential front merging vehicles are analyzed, and the lane keeping policies for the comfort limits and also for the safety limits are checked for plausibility.

As presented in Chapter 2, the precondition of keepLaneWithinComfortRange is satisfied when the ego vehicle is able to adapt its velocity within the limits of the comfort deceleration $dec_{comfort}$. Frontal limitations are the front vehicle, the front merging vehicles and the end-of-driveability of the ego lane. If the preconditions Safety Requirements Keep Lane I and II are satisfied, the action is feasible.

The action keepLaneWithinSafetyRange has no precondition. At each time step the ego vehicle is allowed to adapt its velocity within the safety deceleration. These actions have higher associated costs, but remain as a fall-back option if no comfort maneuver is available. If a collision cannot be avoided, the damage can still be reduced by decreasing the kinetic energy as much as possible before the unavoidable front collision takes place.

Lane Change Feasibility
The maximal velocity profile for the goal lane is given by the conditions from the Safety Requirements Keep Lane I. The maximal velocity profile for the ego lane is given by the The lane change policies are computed for the comfort limits and included in the policy list. A lane change is defined by the original lane, a connected and drivable goal lane and the targeted gap. The maneuver is limited by the targeted gap and also by the available space in front. This space is defined by the front vehicle(s) on the current lane and the original lane's end-of-driveability, if given.


Algorithm 1 Change Lane Comfort
if current situation compliant with Safety Requirements Keep Lane (I, II) then
    for each connected and drivable neighbour lane l_i, i ∈ {R, L} do
        for each gap_i on neighbour lane do
            if gap_i is-reachable then
                Add CL-l_i-into-gap_i to policy list
            end if
        end for
    end for
end if
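Algorithm 1 translates almost directly into code. The following Python sketch is an illustration, not the implementation of this work: the function name, the dictionary of gaps per neighbour lane and the `is_reachable` callback are hypothetical, and the reachability test itself is left abstract.

```python
from typing import Callable, Dict, List


def change_lane_comfort(
    keep_lane_safe: bool,
    neighbour_gaps: Dict[str, List[str]],
    is_reachable: Callable[[str, str], bool],
) -> List[str]:
    """Collect the lane change policies whose target gap is reachable."""
    policies: List[str] = []
    # Outer guard of Algorithm 1: Safety Requirements Keep Lane (I, II) must hold.
    if not keep_lane_safe:
        return policies
    for lane, gaps in neighbour_gaps.items():  # connected, drivable neighbour lanes
        for gap in gaps:
            if is_reachable(lane, gap):
                policies.append(f"CL-{lane}-into-{gap}")
    return policies


# Hypothetical scene: two gaps on the left lane, none on the right lane.
scene = {"L": ["gap0", "gap1"], "R": []}
only_gap1 = lambda lane, gap: gap == "gap1"
policies = change_lane_comfort(True, scene, only_gap1)
blocked = change_lane_comfort(False, scene, only_gap1)
```

If the keep-lane safety requirements are not met, no lane change policy is generated at all, which mirrors the outer condition of the algorithm.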

Gap Reachability and Minimal estimated Lane Change Duration (mineLCD)
The minimal estimated lane change duration into a gap is first computed in the Lane Change Feasibility module. In further steps, in the Maneuver Optimization module, this first estimation is refined. Each gap is defined by a front element, which can be a vehicle or a static limitation, and a rear element.

Regarding the lateral behavior, the ego vehicle drives on the ego lane until a lane change is initiated ($t_{LC}$). The lateral motion of the lane change is assumed to last a constant duration $t_{LC_{lat}}$. During the first interval of the lateral lane change the ego vehicle is only relevant for the ego lane ($t \le t_{LC-1}$ with $t_{LC-1} = \alpha^{e}_{LC} \cdot t_{LC_{lat}}$). Then the ego vehicle is relevant for both the ego lane and the goal lane until it is fully located on the goal lane ($t_{LC-2}$ with $t_{LC-2} = \alpha^{g}_{LC} \cdot t_{LC_{lat}}$). During the last interval of the lateral lane change the ego vehicle is only relevant for the goal lane. Within this Thesis, the duration of the lateral lane change is assumed to be constant and the duration of both intervals is assumed to be constant, with $\alpha^{e}_{LC} = 1/3$ and $\alpha^{g}_{LC} = 2/3$. The different time steps are visualized in Figure 3.4.
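The lane relevance of the ego vehicle during a lane change can be expressed as a small lookup. The sketch below is an assumption-laden illustration: the function name is hypothetical, the two interval limits are measured from the lane change start $t_{LC}$ (as in the time bounds of Equation 3.6), and at the interval boundaries the ego vehicle is conservatively treated as relevant for both lanes.

```python
from typing import Set


def relevant_lanes(
    t: float,
    t_lc: float,
    t_lc_lat: float,
    alpha_e: float = 1.0 / 3.0,
    alpha_g: float = 2.0 / 3.0,
) -> Set[str]:
    """Return the lanes for which the ego vehicle is relevant at time t."""
    t_lc_1 = t_lc + alpha_e * t_lc_lat  # ego starts to be relevant for the goal lane
    t_lc_2 = t_lc + alpha_g * t_lc_lat  # ego stops being relevant for the ego lane
    if t < t_lc_1:
        return {"ego"}
    if t > t_lc_2:
        return {"goal"}
    return {"ego", "goal"}


# Lane change starting at t = 2 s with a lateral duration of 3 s.
before = relevant_lanes(2.5, t_lc=2.0, t_lc_lat=3.0)
during = relevant_lanes(3.5, t_lc=2.0, t_lc_lat=3.0)
after = relevant_lanes(4.5, t_lc=2.0, t_lc_lat=3.0)
```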

In order to comply with the Safety Requirement Keep Lane I, the ego velocity always has to stay below the maximal velocity ($v_{max}$) defined by the perception range, the available acceleration and the maximal legal velocity.

vego(t) ≤ vmax(t) ∀t (3.5)

The Safety Requirement Keep Lane II states that the ego vehicle has to maintain a safety distance to the agents in front. This requirement is met when the ego vehicle can adapt its velocity to a slower or static element in front without colliding and maintain a distance that allows a further velocity adaption after a reaction time ($\Delta t_{reaction}$).

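$$
\begin{aligned}
s_{ego}(t) + v_{ego}(t)\cdot\Delta t_{reaction} &\le s_{front}(t), & t &\le t_{LC} + \alpha^{g}_{LC}\cdot t_{LC_{lat}} \\
s_{ego}(t) + v_{ego}(t)\cdot\Delta t_{reaction} &\le s_{gap_i:front}(t), & t &\ge t_{LC} + \alpha^{e}_{LC}\cdot t_{LC_{lat}}
\end{aligned}
\tag{3.6}
$$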


[Figure 3.4 panels: temporal evolution of the ego vehicle lane change on a time axis with the marks t0, tLC, tLC−1, tLC−2 and tLC + tLClat, showing the intervals "Relevant for primary ego lane" and "Relevant for goal lane"; scene snapshots at t0, tLC, tLC−1 and tLC−2 showing gapi and the front gap]

Figure 3.4: Gap description and temporal evolution of a lane change of the ego vehicle. The ego vehicle is the black and white one. The front gap of the ego lane is limited by the orange truck in front. The gap gapi of the goal lane is defined by the front red vehicle gapi:front and the rear blue vehicle gapi:rear.

The Lane Change Feasibility module assumes a combination of CV and CA profiles for the involved agents.

The velocity of the front vehicle is defined as a CA-CV combination as shown in Equation 3.1, $v_{free}$. This profile is also considered for the rear-vehicle motion during the first segment of the ego lane change. When the ego vehicle begins a lateral lane change ($t_{LC}$), the longitudinal profile of the rear vehicle is extended with a reactive behavior as described in Equation 3.2, $v_{reactive}$. The reactive behavior of the rear vehicle begins when the lateral lane change maneuver reaches the goal lane relevance ($t_{LC} + \alpha^{e}_{LC}\cdot t_{LC_{lat}}$). This behavior assumes that the rear vehicle recognizes the lateral motion of the ego vehicle and reacts with a conservative braking deceleration $dec_{reactive}$. The Change Lane constraints extend the conditions of Keep Lane II shown in Equation 3.6 with the conditions for the rear vehicle of the goal gap presented in Equation 3.7.

$$
s_{ego}(t) \le s_{gap_i:rear}(t) + v_{gap_i:rear}(t)\cdot\Delta t_{braking}, \quad t \ge t_{LC} + \alpha^{g}_{LC}\cdot t_{LC_{lat}} \tag{3.7}
$$

The ego vehicle is assumed to combine a first constant velocity phase with duration $\Delta t_1$, a second constant deceleration phase with the maximal comfort deceleration $dec_{comfort}$ and a duration of $\Delta t_2$, a third acceleration phase with the maximal comfort acceleration $a_{comfort}$ and a duration of $\Delta t_3$, a fourth deceleration phase with $dec_{comfort}$ and a duration of $\Delta t_4$, and a final constant velocity phase.

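[Figure 3.5 plot: velocity [m/s] over time t, starting at v(t0); segments labelled deccomfort, acomfort and deccomfort between the time marks t1, t2, t3 and t4, with the velocity levels v(t2), v(t3) and v(t4)]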

Figure 3.5: Pre-assessment ego vehicle velocity profile based on CV-CA with the maximal available comfort acceleration and deceleration values.

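$$
v_{ego}(t) =
\begin{cases}
v(t_0), & t \le t_1 \\
v(t_1) + dec_{comfort}\cdot(t - t_1), & t_1 < t \le t_2 \\
v(t_2) + a_{comfort}\cdot(t - t_2), & t_2 < t \le t_3 \\
v(t_3) + dec_{comfort}\cdot(t - t_3), & t_3 < t \le t_4 \\
v(t_4), & t > t_4
\end{cases}
\tag{3.8}
$$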

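$$
\begin{aligned}
t_1 &= t_0 + \Delta t_1, \\
t_2 &= t_0 + \Delta t_1 + \Delta t_2, \\
t_3 &= t_0 + \Delta t_1 + \Delta t_2 + \Delta t_3, \\
t_4 &= t_0 + \Delta t_1 + \Delta t_2 + \Delta t_3 + \Delta t_4, \\
&\text{with } \Delta t_1, \Delta t_2, \Delta t_3, \Delta t_4 \ge 0.
\end{aligned}
\tag{3.9}
$$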
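The piecewise profile of Equations 3.8 and 3.9 can be evaluated as follows. This is a minimal sketch under stated assumptions: the function name and argument layout are hypothetical, and `dec_comfort` is treated as a signed (negative) acceleration so that it can be added directly, as in Equation 3.8.

```python
from typing import Tuple


def ego_velocity(
    t: float,
    v0: float,
    t0: float,
    dts: Tuple[float, float, float, float],
    dec_comfort: float,
    a_comfort: float,
) -> float:
    """Evaluate the CV-CA pre-assessment profile of Eq. 3.8 at time t."""
    dt1, dt2, dt3, dt4 = dts
    # Switching times of Eq. 3.9.
    t1 = t0 + dt1
    t2 = t1 + dt2
    t3 = t2 + dt3
    t4 = t3 + dt4
    # Velocities at the switching times.
    v2 = v0 + dec_comfort * dt2
    v3 = v2 + a_comfort * dt3
    v4 = v3 + dec_comfort * dt4
    if t <= t1:
        return v0
    if t <= t2:
        return v0 + dec_comfort * (t - t1)
    if t <= t3:
        return v2 + a_comfort * (t - t2)
    if t <= t4:
        return v3 + dec_comfort * (t - t3)
    return v4


# Hypothetical profile: 1 s constant, 2 s braking, 3 s accelerating, 1 s braking.
args = dict(v0=20.0, t0=0.0, dts=(1.0, 2.0, 3.0, 1.0), dec_comfort=-1.0, a_comfort=2.0)
samples = [ego_velocity(t, **args) for t in (0.5, 2.0, 4.0, 6.5, 10.0)]
```

A feasibility search would vary the four durations to find the earliest lane change start that satisfies the distance constraints; that search is not shown here.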

The Lane Change Feasibility module computes the minimal $t_{LC}$ at which the ego vehicle can begin the lateral lane change under the limitations given by Equations 3.5, 3.6, 3.7 and 3.8. A laneChange into a gap is considered valid if the ego vehicle can reach the gap within the time horizon $t_k$ while complying with the Change Lane and Keep Lane (I and II) safety constraints. If these conditions are met, the reachability analysis for the gap is considered valid.

3.1.4 Gap Assessment

The keepLane actions begin with the ego vehicle driving on the ego lane, having as front limitation another agent, a static limitation (given by the topology or by static objects located on the road) or its own perception range. At the end of the maneuver, the ego vehicle is driving on the ego lane, having an agent, a static limitation or its perception range as frontal limitation. The laneChange actions begin with the ego vehicle driving on the ego lane, include a lane change and end on the required neighbour lane, left or right. As preconditions, the lateral boundaries from the ego lane to the goal lane have to be drivable and the safety constraints presented in Chapter 2 must be fulfilled. The effect is the ego vehicle driving on the goal lane once the maneuver is completed. A lane change request can be an external input requested by the human driver (for example in assistance applications (SAE level 2) such as a lane change assistance system) or a decision of the autonomous system. In the case of an external lane change request, the benefits of changing the lane have already been evaluated and the system assumes a constant cost penalization for the keepLane actions. In highly and fully automated systems, the lane adequacy is evaluated by the system. How decision-making works in these cases, how the benefits of each lane are considered over the time horizons and which metrics are used is introduced in Chapter 6.
If a lane change is desired, different gaps can be selected to reach the goal lane. The adequacy of each available gap for the lane change can be measured according to the estimated time needed to complete the whole maneuver if this gap is selected. For changeLane actions, the effect is to be on the goal lane when the maneuver is finished, and the planner considers it deterministic. The planner assumes that the effect is achieved when an action is selected, but it also receives cyclic updates of the environmental information. If the situation changes, a lane change maneuver might be aborted because it no longer complies with the safety conditions or because it is no longer optimal. In order to include this possibility, the adequacy of a gap for a lane change can also be evaluated depending on the success of lane changes into similar gaps observed in former experiences. The planner evaluates the different possibilities to change the lane given the different gaps, and the costs or negative rewards are computed in several steps during the planning. The maneuver cost is defined as a combination of comfort, safety and convenience parameters.

cManeuver = w1 · ccomf + w2 ·csafety + w3 ·cconvenience (3.10)


• Comfort. The comfort is related to the kinematics of the ego vehicle, measured through the variations in the acceleration and the deviations from the required velocity.

• Safety. The safety is related to the relative distances between the different agents and road limitations. The tactical planner only considers the longitudinal distance between agents, topology limitations and obstacles located on the same lanes.

• Adequacy. The adequacy of a maneuver is a combination of the cost of selecting a defined gap to perform the maneuver and the cost associated with the state reached once the maneuver is finished.

An a priori cost estimation can be obtained using simple models and analytical solutions that are fast to compute. In this way, the different feasible maneuvers get an initial value, and then more detailed forward simulations can be computed, beginning with the most relevant ones and depending on the available computation time.

Pre-assessment or Instant cost estimation
A first estimation of the maneuver cost, the maneuver cost pre-assessment or instant cost estimation ($c_{MF}$), is made based on the current values during the feasibility assessment of the lane change and lane keep policies (Lane Change Assessment, Lane Keep Assessment). The estimation is based on a reachability analysis for the available gaps. When a gap is reachable, it is included with its associated instant cost and success assessment in the list of available policies.

As the assessment is based on the maximal available comfort accelerations, the acceleration cannot be used for the initial assessment. Instead, the maximal velocity deviation computed for the maneuver (c∆v) is used as comfort indicator. The safety is related to the size of the gap into which the vehicle is changing (cgap). The convenience or adequacy of the maneuver is defined as a combination of the cost of the estimated maneuver duration (ceD) and its success rate assessment (cSA). The convenience term considers the physical values obtained from the environmental model and the success rate in past situations. This results in an instant gap-cost estimation defined as a weighted combination of the variations in velocity, the estimated time span to perform the lane change, the temporal gap size and the probability of a successful lane change.

$$c_{MF} = w_1\cdot c_{\Delta v} + w_2\cdot c_{gap} + w_3\cdot c_{LCD} + w_4\cdot c_{SA}. \tag{3.11}$$

Elements of the Cost Pre-assessment Function

• Cost for Velocity Deviation (c∆v)

The ego vehicle should try to drive as close as possible to the reference velocity ($v_{ref}$) while respecting the safety distance to the other road users. The reference velocity is defined as the current maximal velocity limited by the velocity desired by the driver, the current legal velocity limitation and the maximal road velocity due to range and road limitations, as explained in Chapter 2. Equation (3.12) evaluates the maximal velocity deviation from the desired one during the maneuver.

$$
c_{\Delta v} = \frac{\max(|\Delta v(t)|)}{v_{ref}} = \frac{\max(|v_{ref}(t) - v(t)|)}{v_{ref}} \tag{3.12}
$$

• Cost for Temporal Gap Size (cgap)

The temporal gap size ($gap_{size}$) is defined as the sum of the required Adaption Time ($t_{adaption}$) to reach the front vehicle velocity (if it is driving faster), as presented in Equation 3.2, and the resulting Time Headway (THW) ($t_{follow} = \Delta s(t_{adaption}) / v(t_{adaption})$) between a front vehicle and the vehicle behind it (rear vehicle). The THW between a reference vehicle and another entity is defined as the quotient of the longitudinal distance between them ($\Delta s$) and the velocity of the rear vehicle.

The criterion for gap acceptance is defined as $gap_{min}$, the minimal acceptable gap size. Smaller gap sizes result in infinite cost (they are not accepted), while bigger values get a normalized cost assessment. The resulting temporal gap size cost is defined by:

$$
c_{gap} =
\begin{cases}
\infty, & gap_{size} < gap_{min} \\
\dfrac{gap_{min}}{gap_{size}}, & gap_{size} \ge gap_{min}
\end{cases}
\tag{3.13}
$$

• Cost for estimated Maneuver Duration (ceD)

The estimated time for the ego lane change, teD, provides relevant information for comparing strategies. Lane changes that finish later are penalized because they imply a higher risk of the situation changing. A generic available time, the time horizon ($t_k$) of the planner, is considered as the maximal available time for the maneuver ($t_{max}$). For spatially restricted situations (such as ending lanes or highway nodes), the maneuver has to be finished within the available geometrical restrictions. As shown in (3.14), the cost is calculated with a quadratic function so that it increases faster as the estimated time approaches the limit of the available time.

$$
c_{eLCD} =
\begin{cases}
\infty, & t_{eRG} > t_{max} \\
\left(\dfrac{t_{eRG}}{t_{max}}\right)^{2}, & t_{eRG} \le t_{max}
\end{cases}
\tag{3.14}
$$

• Cost for Learned Success Assessment (cSA)

The success assessment evaluates the consequences of a scene evolution different from the predicted one. The most likely evolution is evaluated and forward simulated within the cost function term. Based on learned values from past situations, the success assessment gives an associated cost for the probability that the situation evolves worse than predicted and the selected gap becomes unreachable under the safety constraints. Uncertainties in the behavior of other traffic participants are considered through the cost value cSA, calculated as the probability of an unsuccessful lane change given a selected maneuver: cSA = 1 − p(Success|selectedManeuver). Each lane change initiated and finished within the maximal time horizon is considered a success. For the computation of this cost parameter, a set of simulations with different start configurations was run and a neural network with two layers and 30 nodes was trained. For each simulation, once the lane change has started, the information about the selected gap and the result of the maneuver (whether or not the lane change was successfully completed into the selected gap) are saved and given as target value to the network. The input feature vector is defined by the intervehicular time, the intervehicular distance and the time to collision between the ego vehicle and the main vehicles involved in the maneuver. Those vehicles are the front and rear vehicles defining the goal gap of the desired lane and the vehicle in front of the ego vehicle on the current lane. The proposed cost for Learned Success Assessment is provided by a function learned off-line. However, this parameter could also be continuously updated using Reinforcement Learning techniques, for example Q-Learning [166].
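The four pre-assessment terms can be combined as in Equation 3.11. The Python sketch below is illustrative only: the function names are hypothetical, a constant reference velocity is assumed in the velocity deviation term, the success probability is passed in directly instead of being produced by the trained network, and the weights in the usage example are placeholders rather than the values used in this work.

```python
import math
from typing import Sequence


def c_dv(v_ref: float, velocities: Sequence[float]) -> float:
    """Eq. 3.12 with a constant reference velocity (simplifying assumption)."""
    return max(abs(v_ref - v) for v in velocities) / v_ref


def c_gap(gap_size: float, gap_min: float) -> float:
    """Eq. 3.13: gaps below the acceptance threshold get infinite cost."""
    return math.inf if gap_size < gap_min else gap_min / gap_size


def c_elcd(t_erg: float, t_max: float) -> float:
    """Eq. 3.14: quadratic cost of the estimated time to reach the gap."""
    return math.inf if t_erg > t_max else (t_erg / t_max) ** 2


def c_sa(p_success: float) -> float:
    """Learned success assessment: probability of an unsuccessful lane change."""
    return 1.0 - p_success


def c_mf(weights: Sequence[float], terms: Sequence[float]) -> float:
    """Eq. 3.11: weighted sum of the four pre-assessment terms."""
    return sum(w * c for w, c in zip(weights, terms))


# Hypothetical maneuver: velocity dips to 24 m/s, 4 s gap, 5 s to reach the gap.
terms = [
    c_dv(30.0, [30.0, 27.0, 24.0]),
    c_gap(gap_size=4.0, gap_min=2.0),
    c_elcd(t_erg=5.0, t_max=10.0),
    c_sa(p_success=0.9),
]
total = c_mf([1.0, 1.0, 1.0, 1.0], terms)
```

An infinite term marks the maneuver as not acceptable regardless of the other terms, provided its weight is non-zero.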

The pre-assessment, or instant cost estimation, provides a first estimate of the cost of each lane change into a gap. Depending on the available computing time, the cost values can be refined and a predicted cost estimation can be provided.

Predicted cost estimation
Each selected scene evolution is optimized or forward simulated from the current time ($t_0$) until a defined time horizon ($T$). The cost values are updated according to the cumulated cost obtained at each simulation step.

• Comfort Cost ($c_{comf}$)

$$
c_{comf} = \sum_{t=t_0}^{T} w_1 \cdot |a_x(t)| + w_2 \cdot |a_y(t)| \tag{3.15}
$$

where $a_x$ and $a_y$ are the longitudinal and lateral acceleration values.

• Cost for optimized Lane Change Duration ($c_{oLCD}$)

Similar to (3.14), but using the time resulting from the forward simulation, $t_{oRG}$, instead of the estimated time to reach the gap, $t_{eRG}$.

• Safety Cost ($c_{safety}$)

Evaluates the safety of the ego vehicle with respect to the surrounding vehicles. It is defined as the safety ratio based on the Time to Collision (TTC) and the time headway (THW) between vehicles [11]. The safety ratio takes the maximum of the two normalized values $r_{TTC}$ and $r_{THW}$. $TTC_{min}$ and $TTC_{max}$ are the limit values and $TTC(t)$ is the TTC at time $t$.

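$$
c_{safety} = \sum_{t=t_0}^{T} w \cdot \max\left( r_{TTC}(t),\, r_{THW}(t) \right) \tag{3.16}
$$

with:

$$
r_{TTC}(t) =
\begin{cases}
1, & TTC(t) < TTC_{min} \\
0, & TTC(t) > TTC_{max} \\
1 - \dfrac{TTC(t) - TTC_{min}}{TTC_{max} - TTC_{min}}, & \text{otherwise}
\end{cases}
\tag{3.17}
$$

$$
r_{THW}(t) =
\begin{cases}
1, & THW(t) < THW_{min} \\
0, & THW(t) > THW_{max} \\
1 - \dfrac{THW(t) - THW_{min}}{THW_{max} - THW_{min}}, & \text{otherwise}
\end{cases}
\tag{3.18}
$$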
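As an illustration of the accumulated costs (3.15) to (3.18), the following Python sketch evaluates them over sampled trajectories. The weights and the TTC and THW limits are placeholder assumptions chosen for the example, not the values used in this work, and the function names are hypothetical.

```python
def risk_ratio(value: float, v_min: float, v_max: float) -> float:
    """Normalized ratio of Eqs. (3.17)/(3.18): 1 below v_min, 0 above v_max,
    linearly decreasing in between."""
    if value < v_min:
        return 1.0
    if value > v_max:
        return 0.0
    return 1.0 - (value - v_min) / (v_max - v_min)


def comfort_cost(ax, ay, w1=1.0, w2=1.0):
    """Accumulated weighted absolute accelerations, Eq. (3.15).

    ax, ay: per-step acceleration samples; w1, w2 are assumed weights.
    """
    return sum(w1 * abs(x) + w2 * abs(y) for x, y in zip(ax, ay))


def safety_cost(ttc, thw, ttc_limits=(2.0, 5.0), thw_limits=(0.5, 2.0), w=1.0):
    """Accumulated safety ratio, Eq. (3.16).

    ttc, thw: per-step samples in seconds. The limit pairs (min, max) are
    assumptions made for this sketch only.
    """
    return sum(
        w * max(risk_ratio(t, *ttc_limits), risk_ratio(h, *thw_limits))
        for t, h in zip(ttc, thw)
    )


if __name__ == "__main__":
    print(comfort_cost([1.0, -2.0], [0.5, 0.0]))
    print(safety_cost([3.5, 10.0], [3.0, 1.25]))
```

Taking the maximum of the two ratios means that a step is rated by whichever indicator is currently more critical, so a comfortable TTC cannot mask a too short time headway.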

With the updated costs, the best policy is selected and given to the Trajectory Planner, which optimizes a collision-free and jerk-minimal trajectory for the short horizon [122].

The system is able to deal with dynamic environments because the foresight planning structure considers the most likely evolution of the situation but also integrates values from former experiences and includes the vehicle dynamic limitations. The next section presents the simulation and experimental results.

3.1.5 Maneuver Optimization

The precondition gives a first assessment of whether an ego action into a selected gap is possible and how good this ego action is. It provides an estimation based on the current situation and simple assumptions about the evolution of each scene. For each action into a defined gap, a more detailed estimation of the cost of each maneuver can be computed by a maneuver optimization.

No optimization
No detailed planning is made for the planning horizon. The decision-making is based only on the values computed at the precondition step. The reference velocity for the trajectory planning is based on the first segment of the CA-CV profile estimated for the ego vehicle as described in Equation 3.8. Each action is assigned an immediate cost and the policy is selected according to a greedy criterion: the action with the lowest immediate cost is selected.

Forwards simulation - based on the selected gap
A simpler way to consider the scene evolution is to apply the IDM to the ego vehicle for each selected action and each corresponding gap. The forward simulation is computed assuming that each vehicle selects the acceleration provided by the IDM and that the lane changes begin with a constant lateral velocity when the distance to the front vehicle on the goal lane is within the safety limits.
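The longitudinal part of such a forward simulation can be sketched as follows, using the commonly published form of the Intelligent Driver Model. All parameter values (desired speed, time gap, comfortable acceleration and deceleration, minimum gap) are illustrative assumptions, the leader is assumed to drive at constant speed, and the lateral motion and the lane change trigger are not modeled.

```python
import math


def idm_acceleration(v, gap, dv, v0=30.0, T=1.5, a_max=1.5, b=2.0, s0=2.0, delta=4.0):
    """IDM acceleration of a follower.

    v: follower speed [m/s], gap: distance to the leader [m],
    dv: approach rate v - v_leader [m/s]. Parameters are assumed values.
    """
    # Desired dynamic gap; the clamp at zero is a common numerical safeguard.
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)


def simulate_follower(v, gap, v_leader, horizon=10.0, dt=0.25):
    """Forward simulate the follower behind a constant-speed leader."""
    for _ in range(int(round(horizon / dt))):
        a = idm_acceleration(v, gap, v - v_leader)
        v_new = max(0.0, v + a * dt)
        # Gap change uses the mean follower speed over the step.
        gap += (v_leader - 0.5 * (v + v_new)) * dt
        v = v_new
    return v, gap


if __name__ == "__main__":
    print(simulate_follower(25.0, 50.0, 20.0, horizon=10.0))
```

The step size of 0.25 s is taken from the time step mentioned for the graph search; any other value could be used in this sketch.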

Forwards simulation - Graph search
The maneuver optimization is based on a classic discrete graph search in which, starting from the current position and velocity, each node is expanded using a set of predefined accelerations. This graph approach is similar to the one presented by Bahram et al. [11]. The discrete set of available actions for the ego vehicle is defined as $M_{longitudinal} \times M_{lateral}$. The evolution of the longitudinal motion is given by discrete acceleration values $M_{longitudinal} = \{a, 0, d\}$, with $a$ and $d$ being discrete sets of acceleration and deceleration values. For the lateral motion a constant velocity model is assumed: $M_{lateral} = \{v_l, 0, -v_l\}$.

In the experiments presented in Section 3.2, the acceleration and deceleration sets are defined as 100%, 50% and 25% of the comfort acceleration and deceleration values. For practical reasons, to limit the number of branches of the graph, the longitudinal actions are restricted to at most a constant velocity phase followed by one deceleration, a constant velocity phase, one acceleration, one deceleration and a final constant velocity phase. The lateral actions are limited to one velocity phase and two constant phases, so only one lane change is possible within the time horizon. The time step is set to 0.25 s.

The graph search is a global optimization over the time horizon; the optimum is afterwards mapped back to the corresponding maneuvers into their corresponding gaps.
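A minimal sketch of the discrete action set and of a single node expansion is given below. The comfort acceleration, comfort deceleration and lateral velocity are assumed example values, the state is reduced to (x, v, y), and the restriction on the number of acceleration phases is not modeled here.

```python
from itertools import product


def longitudinal_actions(a_comf=1.5, d_comf=-2.0, fractions=(1.0, 0.5, 0.25)):
    """Accelerations, zero and decelerations as fractions of the comfort values.

    a_comf and d_comf are assumed values in m/s^2.
    """
    accelerations = [f * a_comf for f in fractions]
    decelerations = [f * d_comf for f in fractions]
    return accelerations + [0.0] + decelerations


def lateral_actions(v_lat=1.0):
    """Constant lateral velocity model: {v_l, 0, -v_l}; v_lat is assumed."""
    return [v_lat, 0.0, -v_lat]


def expand(state, dt=0.25):
    """Expand one node (x, v, y) with every action in M_long x M_lat."""
    x, v, y = state
    successors = []
    for a, v_l in product(longitudinal_actions(), lateral_actions()):
        v_new = max(0.0, v + a * dt)           # no reversing
        x_new = x + 0.5 * (v + v_new) * dt     # mean speed over the step
        successors.append(((x_new, v_new, y + v_l * dt), (a, v_l)))
    return successors


if __name__ == "__main__":
    children = expand((0.0, 20.0, 0.0))
    print(len(children))
```

With three fractions per direction this yields 7 longitudinal and 3 lateral actions, so every unrestricted expansion creates 21 successors, which illustrates why the number of phases has to be limited in practice.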

3.2 Experiments

The Tactical Planner presented in this chapter was validated within two different frameworks. The first one was purely simulative and the second one was carried out on a prototype vehicle within a simulated traffic environment (vehicle-in-the-loop, VIL). The experiment setup consists of an entrance ramp scenario as defined in Figure 3.1. The vehicle has to complete the lane change before the current lane ends. The Trajectory Planner module is based on the one defined by Rathgeber et al. [122].

The planner has an internal model of the motion of other agents, independent of the planning structure, as presented in Section 3.1.2.

3.2.1 Evaluation Metrics

The following metrics are used to assess and compare the different versions of the tactical planning.

Safety Evaluation

• TTC. The Time To Collision (TTC) is an indicator of the vehicle's safety during the drive. The minimal average TTC for all the experiments is also presented. The literature considers TTC values relevant below 5 seconds and critical below 2 seconds [80].

Comfort Evaluation

• Longitudinal Jerk: according to [39], the most common approach to improving passenger comfort is to minimize the resulting jerk.

Efficiency of the method

• Success rate: Percentage of completed lane changes. If the vehicle cannot complete the lane change, it has to brake to a standstill at the end of the lane; the maneuver is then not successful but still safe. No collision occurs during the experiments.

• Lane Change Duration: Indicates the time span between the lane change desire and the successfully completed lane change.

• Computational burden: Average computation time of the tactical planner.
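The metrics listed above can be computed from logged signals roughly as in the following sketch. The helper names and the finite-difference jerk are assumptions for illustration; the thresholds of 2 s and 5 s are the literature values quoted in the TTC item.

```python
def time_to_collision(gap, v_follower, v_leader):
    """TTC in seconds; infinite when the follower is not closing in."""
    closing_speed = v_follower - v_leader
    return gap / closing_speed if closing_speed > 0.0 else float("inf")


def ttc_level(ttc):
    """Classify a TTC value: critical below 2 s, relevant below 5 s."""
    if ttc < 2.0:
        return "critical"
    if ttc < 5.0:
        return "relevant"
    return "uncritical"


def longitudinal_jerk(acceleration, dt):
    """Finite-difference jerk from equally spaced acceleration samples."""
    return [(a1 - a0) / dt for a0, a1 in zip(acceleration, acceleration[1:])]


def success_rate(outcomes):
    """Percentage of completed lane changes from a list of booleans."""
    return 100.0 * sum(outcomes) / len(outcomes)


if __name__ == "__main__":
    ttc = time_to_collision(gap=20.0, v_follower=25.0, v_leader=20.0)
    print(ttc, ttc_level(ttc))
    print(longitudinal_jerk([0.0, 0.5, 0.5], dt=0.25))
    print(success_rate([True, True, False, True]))
```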

3.2.2 Simulated Experiments

The system was first evaluated in simulation with entrance ramp scenarios. As shown in Figure 3.1, the ego vehicle is forced to merge into the neighboring lane before the entrance ramp lane ends. Several vehicles are driving on the adjacent lane and the ego vehicle has to select the most adequate strategy. The scenario runs in a co-simulation of MATLAB/Simulink® and the traffic simulator Pelops [28]. Pelops offers different driver types based on the work of Wiedemann [169], which are used for the simulation of the surrounding vehicles.

The experiment consists of 140 configurations. Each of the simulated vehicles drives with a different driver profile provided by Pelops in order to evaluate the limitations of the planner and the dependency on the parameters for successfully achieving the mission.

The experiment was run on four different systems: a baseline consisting of a merely reactive application and three different configurations of the proposed approach:

• C1 is the baseline, where the lane change is only allowed when the current gap is free.

• C2 is the proposed approach with the first three parameters of the cost estimation and with a heuristic velocity selection instead of maneuver optimization.

• C3 is the proposed approach without the learned model, with a maneuver optimization through graph search.

• C4 is the proposed approach with the learned success probability and a maneuver optimization through graph search.

Figure 3.6: Longitudinal Jerk [$\mathrm{m/s^3}$], maximal $TTC^{-1}$ [$\mathrm{s^{-1}}$] and Lane Change Duration [s] for the experiment setups C1 to C4

Table 3.2: Simulation Results

Approach   Success rate   Computational time   Critical situations
C1         90.9%          0.0022 ms            8.7%
C2         94.9%          0.0037 ms            1.4%
C3         95.6%          0.2839 ms            4.3%
C4         96.4%          0.2848 ms            0.7%

Table 3.2 summarizes the rate of successfully accomplished lane changes, the computational time and the rate of critical situations (TTC ≤ 2 seconds) for each strategy. Fig. 3.6 presents the longitudinal jerk of the complete lane change maneuver, the average maximal $TTC^{-1}$ and the Lane Change Duration for the successful lane changes. The upper whisker for experiment C1 is not shown in the figure, since it corresponds to 4.1 $\mathrm{s^{-1}}$ and the values are scaled up to 1 $\mathrm{s^{-1}}$ to provide a better overview.

Lower values of longitudinal jerk indicate a more comfortable behavior of the strategies C2 and C4 compared with C1 and C3. The TTC improves substantially with the use of a proactive strategy (C2, C3, C4) compared with a reactive strategy. Regarding the warning rate based on TTC, the values of a proactive strategy first get worse with the introduction of a maneuver optimization, but then improve with the inclusion of the learned success probability. The proactive strategy generally improves the success rate of a lane change when the space is limited. Considering prior experiences when selecting the strategy increases the success rate further. The computational time increases due to the maneuver optimization (C3, C4), but the approach is still computable online on a real-time vehicle platform.

3.2.3 Real-world Experiment

Simulations provide a useful tool for performing sensitivity analyses. To validate the correct integration of the planning approach with the vehicle, the system was also evaluated in a vehicle-in-the-loop configuration. The real vehicle (Fig. 3.7) drives on a test track and the other road users run in an integrated traffic simulation. The vehicle is equipped with a dGPS (differential GPS, high-precision localization), which makes it possible to position the vehicle within the test track and to generate the road model and the integrated traffic simulation. The planner runs on an Autobox® real-time platform.

Figure 3.7: Test Vehicle with dGPS

Figure 3.8: Evolution of the merging maneuver of the ego vehicle, shown as bird's-eye views (x [m], y [m]) at t = 2 s to t = 9 s with the ego vehicle, the road work vehicle veh_1 and the vehicles on the main lane veh_i. A slower vehicle (the red one) is driving in front. The ego vehicle has to complete the lane change before entering the shoulder lane (grey zone).

Fig. 3.8 shows an entrance ramp situation with flowing traffic in the main lane. On the entrance lane a slow road work vehicle (veh_1) is driving in front of the ego vehicle. The grey zone indicates a shoulder lane where driving is not allowed; therefore the lane change has to be completed before the ego vehicle reaches the grey zone.

Figure 3.9 shows how our planning framework is able to provide maneuvers that are smooth and feasible on the real system. The ego vehicle uses a proactive strategy: it reduces its velocity during the lane change to keep the safety distance to the slower front vehicle and, when it is safe, accelerates and finishes the lane change. The blue lines represent the selected longitudinal and lateral trajectories and the red ones the measured state of the vehicle. The lane change action is selected shortly before t = 3 s. The longitudinal distances to all vehicles are indicated in grey; the distances to the front and rear vehicles of the selected goal gap are indicated by the cyan and the indigo lines, and the distance to the current front vehicle is given by the green line. The safety distance to the red vehicle is maintained within the safety limits until the ego vehicle is completely located on the main lane at t ∼ 7 s. The safety distance to the front vehicle on the goal lane is maintained from t ∼ 5 s. Around t ∼ 6 s, the ego vehicle is located on the main lane and begins to accelerate in order to reach the new velocity. The system runs in real time: the surrounding vehicles are processed by the system, the most adequate action is selected and the planned actions are implemented by the actuators, closing the control loop. Only small deviations between the planned and the executed strategies were observed.

Figure 3.9: Ego vehicle dynamics during the lane change maneuver. Panels over time [s]: Longitudinal Distance [m], Longitudinal Velocity [m/s], Lateral Distance [m] and Longitudinal acceleration [$\mathrm{m/s^2}$]. Legend entries: veh_front, veh_x, veh_gap-selected:front, veh_gap-selected:rear, ego actual, ego planned, reference.

3.2.4 Discussion

The experiments show that the use of a proactive strategy increases the success rate of a lane change compared with a merely reactive strategy. The success rate of the proactive strategy can be increased further by improving the maneuver optimization for a selected gap, computing the maneuver over the time horizon in more detail. Nevertheless, the longitudinal jerk and the TTC statistics initially become slightly worse. This effect probably occurs because the detailed optimization is more sensitive to changes of the selected gap over time. The results improve with the introduction of a learned success probability. The planning that includes the learned success probability performs better without increasing the risk of the maneuver.

The model still has some limitations. The assumption of a reactive behavior of the rear vehicles, namely that those vehicles will brake behind the ego vehicle within conservative limits, allows the planner to change into gaps that are closing; that is, a cooperative reaction of the other vehicles is assumed during the planning of the lane change. Although in the future communication between vehicles (v2v) or between vehicles and infrastructure (v2x) may improve the prediction of the behaviour of the other traffic participants, this information is not yet available, and the autonomous vehicle will need to work in mixed traffic with older vehicles, where safety still has to be assured with the available information. The safety preconditions introduced in Chapter 2 provide the safety criteria during the maneuver. The Change Lane criteria check that the goal gap is big enough so that the rear vehicle has enough space and time to brake using a conservative deceleration. Furthermore, the lane change can be aborted, since the ego vehicle still keeps the comfort and safety distance to the front vehicle in the original lane during the lane change maneuver and is able to return to a lane keep strategy if the gap closes. The safety conditions together with a fast replanning (40 ms) assure the safety of the maneuver. The presented strategy takes into account the most likely evolution of the scene; taking other possible scene evolutions into account could improve the robustness of the system, as presented in the next chapter. Nevertheless, in situations with high traffic density the originally predicted gap size could still be too small and the proactive algorithm would not accomplish the lane change. These situations could be mastered with a cooperative strategy for gap opening, which is out of the scope of this work.

Real-world experiments show a successful, smooth integration of the planning on the real vehicle. This allows a safe validation of critical situations and enables the future step of driving with sensors in real traffic.

3.3 Related Work

Many systems for performing a lane change, controlling the velocity and keeping the safety distance to the front vehicle have been studied over the last years. Assistance systems corresponding to SAE Level 2 of automation [128] are nowadays present on the roads and offered to customers by almost all automotive companies. Cruise control, lane keeping and warning functions have improved over the last decades.

The first implementations of assistance systems were based on elementary controllers, presenting a merely reactive approach. For example, an ACC was only a controller tuned to give an acceleration depending on the current distance and velocities. Potential fields [19] were first used in robotics to allow fast computation and to introduce the influence of several sources. The idea of potential fields was also implemented for autonomous driving [171], but this approach still has some issues representing the influence of the time component and can get stuck in local minima. Although these methods are fast and are able to maintain the distance in most situations, they are reactive: no foresight planning is performed with them and no reasoning process is possible.

Frazzoli et al. [49] proposed a formal language description to generate maneuver sequences based on motion primitives. Several methods that take the time component into account generate a directed graph based on the sampling of motion primitives. Graph search algorithms based on spatiotemporal lattices, as presented by McNaughton et al. [100] and Gu et al. [60], provide an intensive exploration of the search space, taking into account the non-holonomic constraints of the system.

A direct method to solve the planning problem is to consider it as a Model Predictive Control (MPC) problem. MPC allows the current timeslot to be optimized while taking future timeslots into account. Many works formulate motion planning as an MPC problem. Léfevre et al. [89] proposed an interesting learning-based framework to learn from the driver and thus individualize the longitudinal behavior based on the front vehicle. They use Robust MPC (RMPC) to keep the safety distance and define a polyhedral terminal constraint for the RMPC problem. Although this method includes the safety constraints with respect to the front vehicle, the approach only considers one of the possible scene evolutions. Although these are interesting approaches for the maneuver optimization, the local consideration of the problem does not address how to solve the global strategy.

Collision avoidance has also been widely studied as a mixed-integer linear programming problem, as in the spacecraft trajectory planning presented by Richards et al. [126]. Bemporad et al. [17] combined a linear quadratic optimal controller for constrained systems with MPC methods. Some works propose to solve the global decision-making strategy using Mixed-Integer Quadratic Programming (MIQP). The approach of Qian et al. [119] integrates continuous and logical constraints into the problem formulation and solves it as a MIQP. Their method works in different driving scenarios, but still has computational issues related to the resolution of the discretization step and the time horizon. Furthermore, considering collision avoidance as a single binary constraint, without including the evaluation of the distance to the vehicles in the decision-making, could lead to robustness problems related to the uncertainties in the behavior of the other traffic participants.

The approach presented by Glaser et al. [56] proposes a maneuver generator formed by nine different combinations of maneuver sets and a safety trajectory. The generator is limited to one acceleration phase, which is a limitation in complex scenarios.

The consideration of the time horizon implies making assumptions about, and models of, the evolution of the scenes, which could be wrong or not accurate enough. To deal with the uncertainties of the other participants, Xu et al. [173] proposed a motion planning under uncertainty that includes the uncertainties of the other traffic participants.

The behavior of the traffic participants is related to their surrounding environment: each agent interacts with its environment and influences the evolution of the scene with its actions. This introduces further uncertainties related to the behavior of the traffic participants. Partially Observable Markov Decision Processes (POMDPs) have been widely used in the literature to approach and model the interaction between traffic participants ([12], [21], [68]), but they present a high computational burden. For this reason, Cunningham et al. [30] proposed a relaxation of the POMDP approach which can model the interactions but can be adapted to reach a low computational load.

The method presented in this section approaches the lane keep and lane change strategies from a combined semantic and numerical perspective, which allows a progressive expansion of the most promising options. The assessment of all available gaps can be complemented with machine learning methods, improving the performance of the system. The fulfillment of safety limits is a precondition for the expansion of each option, avoiding the problems of hierarchical systems and resulting in no cycles lost if an option is not available.

3.4 Conclusions

This chapter presents a system that provides a robust framework for the autonomous driving task through the integration of semantic and numeric reasoning between different planning levels. The keep lane and change lane approach allows a description that can be progressively detailed. It also allows the behavioral predictions of the other agents to be included in a more abstract manner. The gap assessment can be expanded to introduce an uncertainty assessment method based on learned situations. With this parameter, more robustness is gained against behaviors that deviate from the assumed model.

The presented system has a highly flexible structure that allows different implementation levels to be included depending on the available information and computational power. In this chapter, experimental results obtained in simulation and on a real vehicle with virtual surrounding traffic are shown. The experimental results show how the method can be adapted depending on the available computational resources and that it is feasible online in a real-time system.

Chapter 4

Planning and Prediction: Providing Courtesy Behavior

The ability to plan adequate courtesy behaviors improves public acceptance of autonomous systems and the comfort of the surrounding vehicles without considerably decreasing the ego vehicle's own comfort. This chapter presents a novel method that automatically adapts the driving behavior, integrating the merging intention of other vehicles. In contrast to other systems, robustness is achieved by considering not only the most likely evolution, but also the expected value of other possible outcomes in real time. The flexibility of this method allows us to integrate it within different planning systems. The system is therefore able to offer courtesy behaviors to other vehicles, thereby improving the collective comfort of the situation and also safety.

Vehicles of different automation levels are already operating on the streets and interacting with purely manually driven vehicles. One challenge lies in situations that a human driver solves in an intuitive way and that are still non-trivial for the machine.

Particularly interesting are those situations that arise from the politeness of the traffic participants. In many merging situations such as the one presented in Figure 4.1, human drivers anticipate the merging intention of other vehicles driving on the neighboring lane and select a cooperative behavior. Drivers also expect such cooperative behavior from automated vehicles. In dense traffic situations, the vehicle should be able to adapt its strategy where a slight decrease of its own comfort considerably improves the collective one. For example, if a merging vehicle is reaching the end of its lane or approaching a slower vehicle, the ego vehicle could decide to open a gap by decelerating or changing lanes.

Figure 4.1: In a merging scenario, incoming vehicles should select the appropriate gap to merge into, and vehicles in the main flow can cooperate to facilitate the maneuver.

This chapter focuses on such cooperative behaviors. After identifying a conflicting situation for a potential merging vehicle, our vehicle weighs the options of behaving cooperatively or following its own interest. The challenge is to optimize safety and comfort over the evolution of several possible scenes without an explosion of computational costs.

Many works address the prediction of other traffic participants' behavior, as described in Section 4.3. However, this behavior is only partially predictable and predictions are only accurate for a short time horizon. Classical planning systems consider only the most likely evolution of the situation, neglecting less likely actions that could result in more dangerous situations. At the other extreme, some approaches include all possible scene evolutions, resulting in systems that are too conservative or in a computational load that is too high for real-world applications.

The decision-making should deal with the uncertainties derived from the behavior of other traffic participants in an efficient manner. This means that the trade-off between computational load, comfort and safety should be optimized. The focus of this work is the integration of the intention of merging vehicles into the decision-making process.

In this chapter, a courtesy behavior method that enhances several existing planning approaches is presented. This method makes it possible to assess how adequate it is for the ego vehicle to adapt its own strategy in order to facilitate the merging maneuver of a potential merging vehicle. The main contribution is the improvement of two existing planning approaches using the presented method. The limits between good planning, an intention prediction algorithm and the re-planning ability of the system are also analyzed.

4.1 Intention Prediction and Courtesy Behavior

The objective of this work is to provide the autonomous driving planning system with a courtesy behavior, which identifies the intention of other traffic participants and assesses the cost of adapting the ego strategy. The inclusion of courtesy behavior in the planning is achieved by integrating an intention prediction algorithm into the decision-making. A better foresight of the scene evolution is thereby gained by including all possible outcomes, which provides the planning system with robustness against false predictions. Decisions are made by maximizing the expected utility of the involved traffic participants.

4.1.1 Problem and Task Description

When human drivers are confronted with a situation like the one presented in Figure 4.2, different outcomes are possible depending on the actions of the merging vehicle (blue) and the one on the main lane (white-black).

(a) Forced merge (b) Yield the right-of-way (c) Courtesy merge

Figure 4.2: Some possible scene evolutions when a vehicle driving on the entrance ramp intends to merge into the main lane.

If the white-black vehicle continues its drive without any adaptation, the merging vehicle could force its merge in front of the white vehicle, forcing the white vehicle to brake abruptly; this results in a high cost for comfort and even for safety. It could also happen that the white-black vehicle continues its drive and the merging vehicle aborts its merging maneuver, braking to yield the right-of-way to the white vehicle and merging behind it; although the white-black vehicle is not affected, this situation could bring a high cost for the merging vehicle. It can also happen that the white-black vehicle identifies the situation and chooses to decelerate slightly in order to open the space in front and to facilitate the merging maneuver; this kind of conflict resolution with minimal associated cost is the goal of this chapter.

The maneuver planning combines continuous planning with semantic information in order to deal with the complexity of the problem in a computationally efficient way. In a similar way, the possible actions for the ego and the conflicting vehicle are determined semantically, as presented in Tables 4.1 and 4.2.

NC     Follow solely the ego objectives
COKL   Cooperation by deceleration when keeping lane
CLL    Cooperation by changing to the left lane

Table 4.1: Ego Vehicle Actions

Y      Yield the right-of-way to the ego vehicle and merge after it
NY     Merge in front of the ego vehicle

Table 4.2: Conflicting Vehicle Actions

All possible ego actions are clustered as No Cooperative (NC) and Cooperative (CO). Thus, the action space of the ego vehicle is defined as $A_{ego} := \{NC, CO\}$. For the conflicting vehicle, all merge actions are grouped into the No Yield actions (NY), corresponding to merging in front of the ego vehicle, and the Yield actions (Y). Thus, the action space of the conflicting vehicle (cv) is defined as $A_{cv} := \{Y, NY\}$.

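The goal is to choose an action for the ego vehicle ($\alpha_{ego}$) so that the combined expected utility ($U(\alpha)$) is maximized, i.e.,

$$
\begin{aligned}
\alpha^{*}_{ego} &= \operatorname*{argmax}_{\alpha \in A_{ego}} E(U(\alpha)) \\
&= \operatorname*{argmax}_{\alpha \in A_{ego}} \sum_{\alpha_{cv} \in A_{cv}} p(\alpha_{cv}) \cdot U(\alpha_{ego}, \alpha_{cv}).
\end{aligned}
\tag{4.1}
$$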

This makes it necessary to predict the intention of the merging vehicle, $p(\alpha_{cv})$, which is presented in the next section. The description of the expected utilities is given in Section 4.1.3.
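The selection rule (4.1) can be sketched in a few lines of Python. The probabilities and utility values in the usage example are purely hypothetical numbers chosen to show how the decision flips with the predicted intention; the function name is also an assumption.

```python
def best_ego_action(p_cv, utility):
    """Select the ego action maximizing the expected combined utility, Eq. (4.1).

    p_cv:    dict mapping conflicting-vehicle action -> probability
    utility: dict mapping (ego_action, cv_action) -> combined utility
    Returns the best ego action and the expected utility of every ego action.
    """
    ego_actions = sorted({ego for ego, _ in utility})
    expected = {
        ego: sum(p * utility[(ego, cv)] for cv, p in p_cv.items())
        for ego in ego_actions
    }
    return max(expected, key=expected.get), expected


if __name__ == "__main__":
    # Hypothetical utilities for A_ego = {NC, CO} and A_cv = {Y, NY}.
    U = {("NC", "Y"): 1.0, ("NC", "NY"): 0.2,
         ("CO", "Y"): 0.8, ("CO", "NY"): 0.7}
    print(best_ego_action({"Y": 0.3, "NY": 0.7}, U))
    print(best_ego_action({"Y": 0.95, "NY": 0.05}, U))
```

In this toy setting the cooperative action wins when a merge in front of the ego vehicle is likely, while the non-cooperative action wins when the conflicting vehicle is almost certain to yield.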

4.1.2 Prediction Module

In state-of-the-art methods, the costs of actions are computed for the most likely actions of other traffic participants. But these approaches lack robustness against false predictions. The aim of this section is to predict the probability of unlikely behavior of other traffic participants, specifically the misprediction probability for the conflicting vehicle. Using this probability, the expected utility of an ego action considering two possible decisions of the conflicting vehicle can be computed.

A potential conflicting vehicle is a vehicle that might perform a lane change in front of the ego vehicle. Lane changes occur constantly between lanes. The case of vehicles merging from an entrance ramp is of special interest due to the additional spatial restrictions for finishing the lane change. The prediction approach presented in this section focuses on lane changes in merging lanes, but the courtesy approach is also valid for all lane change predictions.

Intention prediction at the merging lane
In order to evaluate the probability of different outcomes, an intention predictor was needed. Two different classifiers were trained: a Gentle AdaBoost classifier [129], [50] with Monte Carlo sampling and a Multinomial Regression Classifier [2]. As presented in [109] and [101], a scene with the ego and the conflicting vehicle was generated with 300135 different initial configurations to generate the samples for the training of the predictor. The configurations were simulated in the model predictive control framework of Bahram et al. [11] to obtain and label the data. For the simulation, a conflicting vehicle model based on the merging models proposed by Choudhury [27] was used.

The first considered predictor was based the ensemble learning based Gentle Boost classi-fier [51] and presented in the master thesis of Sezer [109]. The classifier’s feature vector~x consists of the Time-to-Lane-End (TTL) of the merging vehicle, the time headway(THWego-cv) between the ego and the conflicting vehicle, the velocities and positionsof both vehicles relative to the ending lane (~x = [TTL, THWego-cv, vego, vcv, xego, xcv]).Its output is the merging decision, thus either Y or NY . The classification tree istrained using a Gentle Boost classifier provided by Matlab R© with 40 learners (classifi-cation trees) and 20 maximum splits with 20-fold cross validation. The output of theclassifier is either Y or NY , thus not probabilistic. In order to compute how certainthe prediction is, we use Monte Carlo sampling with Gaussian distributions aroundxego0 , xcv0 , vego0 and vcv0 with N = 400 samples. The votes for each class are countedwithin the sampling region and the ratio is computed to get the probability of eachclass. The Monte Carlo sampling approach allows to convert the binary output of theAdaboost prediction into a numerical probability of the action, required for computingthe expected utility. The probabilistic consideration introduces robustness againstfalse classifications. It also introduces robustness against measurement errors since itsamples around the current measurement values.
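The conversion of a binary classifier output into a probability by Monte Carlo sampling can be sketched as follows. The trained Gentle Boost classifier is not available here, so `toy_classifier` is a hypothetical stand-in with an invented decision rule; the standard deviations are assumed values as well. Only the sampling and vote counting reflect the procedure described above.

```python
import random


def toy_classifier(x_ego, x_cv, v_ego, v_cv):
    """Hypothetical stand-in for the trained binary classifier.

    Invented rule: the conflicting vehicle does not yield ("NY") if it would
    be ahead of the ego vehicle after 2 s at constant speed.
    """
    return "NY" if x_cv + 2.0 * v_cv > x_ego + 2.0 * v_ego else "Y"


def merge_probability(classify, mean, sigma, n=400, seed=0):
    """Estimate class probabilities from votes over Gaussian samples.

    mean:  measured (x_ego, x_cv, v_ego, v_cv)
    sigma: assumed standard deviation of each measurement
    """
    rng = random.Random(seed)
    votes = {"Y": 0, "NY": 0}
    for _ in range(n):
        sample = [rng.gauss(m, s) for m, s in zip(mean, sigma)]
        votes[classify(*sample)] += 1
    return {label: count / n for label, count in votes.items()}


if __name__ == "__main__":
    p = merge_probability(toy_classifier,
                          mean=(0.0, 3.0, 20.0, 20.0),
                          sigma=(1.0, 1.0, 1.0, 1.0))
    print(p)
```

In a real setup the derived features (TTL, THW) would be recomputed from each sampled state before calling the classifier; the stand-in skips this step by taking the sampled positions and velocities directly.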

This predictor estimates the intentions accurately, but its computational load is too high. For this reason, a second prediction module was implemented: a Multinomial Regression Classifier.

The Multinomial Regression Classifier directly fits a probability value; for the classification, probabilities over 0.5 are accepted as positive. The accuracy is computed as $\frac{TP+TN}{TP+TN+FN+FP}$, where TP are the true positives, TN the true negatives, FN the false negatives and FP the false positives. Table 4.3 shows the recall $\frac{TP}{TP+FN}$ and precision $\frac{TP}{TP+FP}$ values resulting for both classifiers. An accuracy of 99.2% was obtained for

68 4.1 Intention Prediction and Courtesy Behavior

        Gentle Boost Classifier      Multinomial Regression Classifier
        Precision     Recall         Precision     Recall
Y       92.85%        94.78%         98.25%        98.87%
NY      99.64%        99.50%         82.22%        74.67%

Table 4.3: Recall and precision values for both classifiers [101]

the Gentle Boost Classifier and an accuracy of 97.3% for the Multinomial Regression Classifier.
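As a small worked example of the accuracy, precision and recall definitions, the following Python sketch evaluates them from confusion-matrix counts. The counts are made-up illustrative numbers, not the data behind Table 4.3, and the function assumes non-zero denominators.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fn + fp)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall


# Illustrative counts only (hypothetical, not taken from the experiments).
acc, prec, rec = classification_metrics(tp=90, tn=880, fp=10, fn=20)
```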

The accuracy and precision values are better for the Gentle Boost Classifier, but the experiments presented in Section 4.2 show that a good performance can also be achieved with the Regression Classifier.

4.1.3 Decision-Making

The goal of the planning approach is to deal with the uncertainties derived from the behavior of other traffic participants and to provide cooperative behavior if necessary. For this purpose, the planning strategy is enhanced with the information coming from the prediction, providing a courtesy strategy.

The selection of the action strategy is based on a utility function. The utility for each vehicle ($U(\alpha_j)$, where $j \in \{ego, cv\}$) is defined as the inverse of the cost, the total cost being a combination of a safety and a comfort term. The comfort cost is defined as the accumulated acceleration over time. The safety cost is the accumulated maximal risk defined by Bahram et al. [11].

For each of the ego strategies, the scene is forward simulated, first for the most likely predicted merging decision and once again for the opposite merging action. The common expected utility of an action combination $U(\alpha_{ego}, \alpha_{cv})$ is defined as the combination of the individual utilities, i.e.,

$$U(\alpha_{ego}, \alpha_{cv}) = U_{\alpha_{ego}}(\alpha_{cv}) + \lambda \cdot U_{\alpha_{cv}}(\alpha_{ego}) \tag{4.2}$$

The parameter $\lambda$ in Equation 4.2 is the cooperation coefficient, ranging from $\lambda = 0$ for purely egoistic up to $\lambda = 1$ for highly cooperative behavior.
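Equation 4.2 can be sketched in a few lines of Python. The sketch assumes that the total cost is the plain sum of the safety and comfort costs; the text only states that the two terms are combined, so the sum and the function names are assumptions made for illustration.

```python
def utility(safety_cost, comfort_cost):
    """Utility as the inverse of the total cost (plain sum is an assumption)."""
    return 1.0 / (safety_cost + comfort_cost)


def common_utility(u_ego, u_cv, cooperation=0.5):
    """Equation 4.2: ego utility plus the lambda-weighted utility of the cv."""
    if not 0.0 <= cooperation <= 1.0:
        raise ValueError("cooperation coefficient must lie in [0, 1]")
    return u_ego + cooperation * u_cv


u_egoistic = common_utility(u_ego=2.0, u_cv=4.0, cooperation=0.0)
u_cooperative = common_utility(u_ego=2.0, u_cv=4.0, cooperation=1.0)
```

With `cooperation=0.0` the utility of the conflicting vehicle is ignored entirely, while `cooperation=1.0` weights both vehicles equally.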

The goal is to choose the ego action which has the maximum expected utility. Hence, the action $\alpha_{cv}$ of the conflicting vehicle is marginalized out of the common expected utility (Equation 4.1). Since the behavior of the conflicting vehicle is analyzed for a fixed ego strategy, the probability of the given ego action is assumed to be equal to 1 for each

Chapter 4 69

of the NC and CO actions. Thus, assuming that $\alpha_{cv}$ and $\alpha_{ego}$ are independent and using the addition rule that follows from Kolmogorov's probability axioms:

$$P(\alpha_{cv} \cup \alpha_{ego}) = P(\alpha_{cv}) + P(\alpha_{ego}) - P(\alpha_{cv} \cap \alpha_{ego}) \quad \text{(Kolmogorov's axiom)}$$

$$\Downarrow \quad \text{with } P(\alpha_{ego}) = 1 \text{ (given a fixed ego action)}$$

$$P(\alpha_{ego} \cap \alpha_{cv}) = P(\alpha_{ego}, \alpha_{cv}) = P(\alpha_{cv})$$

resulting in $P(\alpha_{cv}) = P(\alpha_{ego}, \alpha_{cv})$. Thus, the single merging probability can be used instead of the joint probability in order to evaluate unlikely actions weighted by their probability.

For simplicity, in this work the actions of the conflicting vehicle are assumed to be independent of the actions of the ego vehicle and are maintained during the forward simulation. The single probability provided by the prediction classifier with the information available at planning time is used. Although the ego and conflicting vehicle actions are tightly interconnected, this assumption, together with frequent updates and re-planning of the system, reduces the branching options and provides a trade-off for the computational load.
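The selection of the ego action by maximum expected utility can be sketched as follows, using the single merging probability as the weight of the two forward-simulated outcomes. The utility values in the example are invented for illustration, and the names are hypothetical.

```python
def expected_utility(p_merge, u_if_merge, u_if_yield):
    """Weight the two simulated outcomes by the single merging probability."""
    return p_merge * u_if_merge + (1.0 - p_merge) * u_if_yield


def select_ego_action(p_merge, utilities):
    """Pick the ego action with the maximum expected utility.

    utilities maps each ego action to a pair of common utilities:
    (value if the cv merges in front, value if the cv yields).
    """
    return max(utilities,
               key=lambda action: expected_utility(p_merge, *utilities[action]))


# Invented utilities: NC is good only if the cv yields, CO is robust.
utilities = {"NC": (0.2, 1.0), "CO": (0.8, 0.7)}
likely_merge = select_ego_action(0.9, utilities)
likely_yield = select_ego_action(0.1, utilities)
```

Because the probability is re-estimated at every planning step, the selected action can change as soon as the predicted intention changes.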

4.1.4 Approach Generalization in Populated Environments

The previous section presents the courtesy behavior approach with a focus on merging lanes. It shows how to deal with an intention prediction when a vehicle is driving near an entrance ramp and intends to merge into the main lane. The approach presented in Section 4.1.1 is valid for general situations where the merging intention of a vehicle driving in the neighboring lane is predicted. The agents' behavior is predicted as explained in Section 3.1.2, providing the trajectories for the most likely evolution. For vehicles whose intention to merge in front of the ego vehicle is unclear, two different possible evolutions are considered, as detailed in Algorithm 2.

In populated environments or situations with several traffic participants, as depicted in Figure 4.3, the aggregate utility should include not only the values for the ego and the conflicting vehicles but also the values for the relevant vehicles directly influenced by the decision. The aggregate utility is therefore extended to the rear vehicles of the ego lane, the conflicting vehicle lane and the lane selected for the potential courtesy lane change.

For the generalization of the intention prediction to non-merging lanes, the THW between the conflicting vehicle and its front vehicle could be used instead of the TTL, and positions should be relative to the ego vehicle to provide a more generic reference. For example, a prediction method like the one presented by Scheel et al. [130] could be used.

70 4.2 Experiments

Algorithm 2 Prediction and Decision-Making
for all relevant lanes $l_i$ do
    update most likely scene evolution $\alpha \in l_i$
    for Potential Merging Agents in Front of Ego do
        Get Probability of $\alpha_{cv_k}$ = Merge-in-Front
        if Situation unclear then
            generate both hypotheses for $\alpha_{cv_k}$: Merge-in-Front and Yield
            include cooperative action $\alpha_{CO-KL}$ with $\alpha_{cv_k}$
            if not included $\alpha_{CO-CLL}$ then
                include $\alpha_{CO-CLL}$ with $\alpha_{cv_k}$
            end if
        end if
    end for
end for
$\alpha^*_{ego} = \arg\max_{\alpha \in A_{ego}} E(U(\alpha))$

Figure 4.3: Populated merging scenario. When optimizing the aggregate traffic utility, the ego vehicle (white-black vehicle driving in the middle lane) has to consider the utilities of the conflicting vehicle (blue one in the merging lane) and of the directly affected vehicles, i.e., the rear vehicles of the involved lanes: the green vehicle on the middle lane, the truck on the merging lane and the red vehicle on the left lane (due to the courtesy lane change).

Given a prediction probability of the merging-in-front maneuver for the conflicting vehicle, the Maneuver Planner Module is able to integrate and assess different scene evolutions with their associated probabilities of occurrence and select the ego action with the highest expected utility.
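The branching step of Algorithm 2 for a single potential merging agent might look like the Python sketch below. The text does not specify when a situation counts as unclear; the probability band `unclear_band` is therefore a hypothetical criterion, as is the choice to treat the most likely hypothesis as certain in clear situations.

```python
def expand_options(p_merge, unclear_band=(0.2, 0.8)):
    """Return (ego_actions, cv_hypotheses) for one potential merging agent.

    p_merge: predicted probability of merge-in-front.
    unclear_band: assumed probability interval in which both hypotheses
    of the conflicting vehicle are expanded (not taken from the source).
    """
    low, high = unclear_band
    ego_actions = ["NC-KL"]
    if low <= p_merge <= high:
        # Unclear situation: keep both hypotheses with their probabilities
        # and add the cooperative ego actions.
        cv_hypotheses = {"merge-in-front": p_merge, "yield": 1.0 - p_merge}
        ego_actions += ["CO-KL", "CO-CLL"]
    else:
        # Clear situation: only the most likely evolution is considered.
        likely = "merge-in-front" if p_merge > high else "yield"
        cv_hypotheses = {likely: 1.0}
    return ego_actions, cv_hypotheses


actions_unclear, hyps_unclear = expand_options(0.5)
actions_clear, hyps_clear = expand_options(0.95)
```

Each returned combination of ego action and hypothesis would then be forward simulated and scored by its expected utility.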

4.2 Experiments

This chapter presents a method to enhance the decision-making strategy with cooperative behavior while avoiding the infinite branching factor of all possible trajectory evolutions over time. To evaluate the method, a set of experiments using the same system configurations was designed, modifying only the parts corresponding to the prediction and decision-making.

Figure 4.4 presents the workflow of the system. First, the vehicle receives information about the environment through the different sensors and the back-end. This information is processed by the Environment Model. The behavior of the other traffic participants is predicted by the Prediction Module. Then, the Maneuver Planner Module selects the best policy for the current situation and provides a drivable and collision-free trajectory. This trajectory is tracked and the vehicle controller commands the actuators, closing the control loop.

[Figure 4.4: block diagram with the blocks Sensors, Backend, Environment Model, Prediction, Maneuver Planner (Decision Making, Trajectory Planner), Controller, Actuators and Vehicle.]

Figure 4.4: Simplified environment and vehicle control loop. This chapter focuses on the interaction between prediction and decision-making.

The system configuration used in the simulation and real-world experiments was the same: Prediction, Maneuver Planning, and the Trajectory Tracking, which is part of the controller. The behavior of the other traffic participants is predicted by the Prediction Module, which provides the probability that the vehicle merges in front of the ego vehicle. The prediction or estimation of the agents' motion is computed within the Maneuver Planning, as explained in Section 3.1.2. The Maneuver Planner Module consists of a decision-making module and a trajectory planning module. For the simulation experiments, two different non-cooperative decision-making implementations were integrated. The first one (DS-NC) corresponds to the driving strategy presented by Bahram et al. [11]. The second planning strategy (MPL-NC) is presented in Chapter 3 and in [102]. Both planning strategies are enhanced with the presented courtesy behavior approach, yielding the planning strategies DS-CO and MPL-CO. For both systems, the trajectory planning and trajectory tracking are based on the approach described by Rathgeber [122].


4.2.1 Evaluation Metrics

The setup consists, for both the real-world and the simulated experiments, of a merging ramp scenario where a merging vehicle has to perform a mandatory lane change into the ego lane. In order to evaluate the performance of our proposed method, different safety and comfort metrics are evaluated:

Safety Evaluation:

• $\max TTC^{-1}$: maximum inverse TTC over time. Note that the inverse TTC is considered instead of the TTC so that cases where the TTC is infinite can also be averaged.

Comfort Evaluation:

• maxDecEgo: maximal longitudinal deceleration of the ego vehicle.

Conflict Resolution Efficiency:

• $t_{Merging}$: the conflict resolution efficiency of the proposed approach is evaluated using the merging time. Lower times indicate that the vehicle merged in front of the ego vehicle, while higher times indicate that the conflicting vehicle tended to yield the right-of-way to the ego vehicle, waited until it passed by and merged behind the ego vehicle.
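The safety and comfort metrics listed above can be computed from logged time series as in the following Python sketch. The function names and the sample values are hypothetical; the sketch assumes positive gaps and treats a non-closing gap as an infinite TTC, i.e., an inverse TTC of zero.

```python
def inverse_ttc(gap, closing_speed):
    """Inverse time-to-collision in 1/s; zero when the gap is not closing."""
    if closing_speed <= 0.0:
        return 0.0  # TTC is infinite, so its inverse is zero
    return closing_speed / gap


def max_inverse_ttc(gaps, closing_speeds):
    """Maximum inverse TTC over time for paired gap / closing-speed samples."""
    return max(inverse_ttc(g, dv) for g, dv in zip(gaps, closing_speeds))


def max_deceleration(accelerations):
    """Strongest longitudinal deceleration, reported as a signed value in m/s^2."""
    return min(accelerations)


# Hypothetical log: gaps in m, closing speeds in m/s, accelerations in m/s^2.
worst_ittc = max_inverse_ttc([40.0, 30.0, 25.0], [2.0, 3.0, -1.0])
worst_dec = max_deceleration([0.3, -1.2, -0.4])
```

Using the inverse keeps the metric finite for every sample, so averaging over many runs remains well defined.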

4.2.2 Simulated Experiments I: Single Conflicting Vehicle

The simulation runs in a co-simulation of MATLAB/Simulink® and the traffic simulator Pelops [28]. Pelops provides realistic driver behavior for the other traffic participants. The performance of the courtesy behavior strategy was evaluated with different planning strategies. For this test, the Gentle Boost Classifier presented in Section 4.1.2 is used for the merging prediction. For the selection of the ego strategy, the Driving Strategy after Bahram et al. [11] and the Maneuver Planning presented in Chapter 3 and in [102] are employed. Both strategies are simulated with the courtesy (CO) and non-courtesy (NC) configurations indicated in Section 4.2. Both strategies are computed at a frequency of 5 GHz.

Random initial configurations of the scene shown in Figure 4.5 were generated. For the simulation experiments, the length of the entrance ramp is 250 m, the ego and conflicting vehicle positions $x_0^{ego}$, $x_0^{cv}$ are varied randomly between 0 and 200 m from the reference point located at the beginning of the entrance ramp, and their velocities $v_0^{ego}$ and $v_0^{cv}$ are varied between 60 and 130 km/h. The vehicles start with a constant velocity at the center of their lanes.
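A random initial configuration within the stated ranges could be drawn as in the sketch below. The text does not name the sampling distribution, so the uniform distribution is an assumption, and the dictionary keys are hypothetical names.

```python
import random

KMH_TO_MS = 1.0 / 3.6  # conversion factor from km/h to m/s


def sample_initial_configuration(rng):
    """Draw one initial scene; uniform sampling is an assumption."""
    return {
        "x_ego": rng.uniform(0.0, 200.0),                  # m from ramp start
        "x_cv": rng.uniform(0.0, 200.0),                   # m from ramp start
        "v_ego": rng.uniform(60.0, 130.0) * KMH_TO_MS,     # m/s
        "v_cv": rng.uniform(60.0, 130.0) * KMH_TO_MS,      # m/s
    }


rng = random.Random(42)  # fixed seed for reproducible scenes
configurations = [sample_initial_configuration(rng) for _ in range(469)]
```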

Table 4.4 shows the metrics for the different configurations. The computational time corresponds to the prediction and the decision-making modules. The enhancement of

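[Figure 4.5: sketch of the merging scene. The entrance ramp spans x = 0 m to $x_{end}$ = 250 m; the ego and the conflicting vehicle start at positions $x_0^{ego}$, $x_0^{cv}$ with velocities $v_0^{ego}$, $v_0^{cv}$.]

Figure 4.5: Initial configuration for simulated experiments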

Table 4.4: Simulation results for different strategies. Metrics averaged over 469 cases.

Approach   maxTTC^-1     maxDecEgo     Merging Time   Computation Time
DS-NC      0.080 s^-1    -1.22 m/s²    5.910 s        0.195 s
DS-CO      0.068 s^-1    -1.03 m/s²    5.440 s        1.545 s
MPL-NC     0.049 s^-1    -1.04 m/s²    7.576 s        6.66e-06 s
MPL-CO     0.023 s^-1    -1.67 m/s²    5.206 s        1.350 s

the Driving Strategy with a cooperative module improves all the considered metrics. A reduction of $\max TTC^{-1}$ indicates that, on average, the most critical point in time becomes safer. The lower deceleration values also indicate that the ego reaction benefits from better foresight in the planning. Similarly, the merging times are improved by the cooperative strategy. With the non-cooperative approach for the MPL (MPL-NC), the ego vehicle optimizes only its own utility and does not provide an anticipatory deceleration. Therefore, the merging vehicle selects a conservative behavior, yielding the right-of-way to the ego vehicle and merging after it. In this case, the ego vehicle pursues a merely egoistic approach and the average merging time increases considerably. The enhancement of the MPL approach with a courtesy behavior reduces the merging time and the values of $\max TTC^{-1}$ by increasing the average deceleration values. It provides a safer and faster conflict resolution by braking.

One restriction of the non-cooperative approach of the driving strategy is the high computation time. The difference between the driving strategy by Bahram et al. [11] and the maneuver planning is that the former allows a different action selection at each planning step (similar to the graph search presented in Chapter 3), whereas the latter fixes the semantic action for the planning horizon and forward simulates the situation based on the selected gap, as explained in Chapter 3. The strategies for the maneuver planning leave a large time buffer for other computations, but for both courtesy implementations the computation times obtained were too high for real-time operation. The problem was the high computation time of the merging-intention prediction. For this reason, the intention prediction with the Multinomial Regression Classifier explained in Section 4.1.2 was integrated. The

Table 4.5: Simulation results depending on the task rate. Metrics averaged over 469 cases.

Approach     Computation Time   Task Rate   N     maxTTC^-1     maxDecEgo     Merging Time
MPL-NC       6.67e-06 s         200 ms      1     0.049 s^-1    -1.04 m/s²    7.576 s
MPL-NC       6.67e-06 s         40 ms       1     0.048 s^-1    -1.14 m/s²    5.400 s
MPL-CO GB    1.350 s            200 ms      400   0.023 s^-1    -1.67 m/s²    5.206 s
MPL-CO GB    0.145 s            200 ms      35    0.051 s^-1    -1.03 m/s²    5.242 s
MPL-CO MR    4.43e-05 s         200 ms      1     0.039 s^-1    -1.09 m/s²    5.502 s

task time of the decision-making strategies was also modified according to their online capabilities. Table 4.5 shows the results.

MPL-NC presents the results of the reactive strategy without prediction for two different task times, in order to study the influence of faster updates. MPL-CO GB presents the courtesy strategy with the Gentle Boost classifier presented in Section 4.1.2 providing the prediction information. The number of samples was reduced to obtain an online-capable computation time. MPL-CO MR integrates the courtesy strategy with the multinomial regression classifier.

The non-cooperative strategy improves the merging time when computed at a faster task rate: it becomes more reactive and increases the braking rate. Nevertheless, the $\max TTC^{-1}$ values remain similar. The improvement in the $\max TTC^{-1}$ values obtained with 400 samples over the non-cooperative strategy is not achieved with a lower number of samples (35). The use of multinomial regression for the predictions strikes a balance between the computation time and the improvement of the metrics over a merely reactive approach. The MPL-CO MR configuration was selected for the evaluation in real-world experiments.

4.2.3 Real-world Experiments

The real-world tests were performed on a test vehicle: a BMW 7 Series with series-production sensors, as shown in Figure 4.6. The information is obtained by the sensors, and an environment model module sends the agents, objects and topology information to our system, which runs on the real-time platform Autobox®.

The experiments should be reproducible in order to compare the results, and therefore they were carried out in the controllable environment of a test track. To perform these experiments, a virtual end of lane was placed in front of the merging vehicle; it was activated based on the relative distance between both vehicles.

Figure 4.6: The experiments on courtesy behavior were conducted with a BMW 7 Series with series-production sensors and a virtually triggered end of lane. Picture from BMW Communication.

The aim of this experiment was to compare the decision-making strategy with and without courtesy behavior. The merging vehicle was instructed to perform the lane change independently of the ego vehicle. This situation represents the use case in which the merging vehicle overlooks the vehicles driving on the merging lane or underestimates the danger of the situation.

We drove a set of 14 configurations with different initial velocities ($v_{ego}$: 65 to 100 km/h, $v_{cv}$: 60 to 80 km/h). Several repetitions of each measurement were taken and, in order to provide a more accurate overview, the scene was also re-simulated with the traces.

[Figure 4.7 panels: Maximal $TTC^{-1}$ [s^-1] and Maximal Deceleration [m/s²], each for the groups real-NC, resim-NC, real-CO and resim-CO.]

Figure 4.7: Metrics for real-world experiments with and without the courtesy behavior strategy. Measured values and re-simulated values.

Figure 4.7 presents the metrics for the different configurations. The lower $\max TTC^{-1}$ values indicate that the experiments are less critical for the courtesy setup. In addition, the deceleration values are slightly lower for these experiments. Table 4.6 shows the p-values obtained with the two-tailed t-test for the hypothesis that the compared values are similar. The low p-values (≤ 0.05) obtained for the maximal $TTC^{-1}$ indicate that the difference is statistically significant. For the deceleration values, there is a trend toward lower deceleration when using courtesy behaviour, but it is not statistically significant. The simulation with traces also provides a reference for the quality of the simulated values relative to the real-world experiments. The high p-values obtained when comparing the real-world experiments with the corresponding resimulations indicate that the simulated results are also representative of real-world behaviour.

Compared experiments      Maximal TTC^-1 p-value   Minimal deceleration p-value
real NC vs. resim NC      0.31                     0.83
real CO vs. resim CO      0.48                     0.67
real NC vs. real CO       0.02                     0.24
resim NC vs. resim CO     < 0.01                   ∼0.1

Table 4.6: P-values obtained for the real and resimulated experiments to analyse the statistical significance of the results.
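For readers who want to reproduce such a comparison, the sketch below computes a two-tailed p-value with the Python standard library only. It uses Welch's unequal-variance variant of the t-test, which is an assumption, since the text does not state which variant was applied; the sample values are invented and are not the measured data.

```python
import math
from statistics import mean, variance


def t_pdf(t, df):
    """Probability density of Student's t distribution."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|), using Simpson's rule on the density over [0, |t|]."""
    t = abs(t)
    if t == 0.0:
        return 1.0
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)


def welch_t_test(a, b):
    """Return (t, degrees of freedom, two-tailed p) for two samples."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df, two_tailed_p(t, df)


# Invented maxTTC^-1 samples in 1/s, for illustration only.
t_stat, dof, p_value = welch_t_test([0.12, 0.15, 0.11, 0.14],
                                    [0.08, 0.07, 0.10, 0.09])
```

A p-value at or below 0.05 would be read, as in Table 4.6, as a statistically significant difference between the two groups.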

Figure 4.8 shows the velocities and positions of a real-world measurement with the MPL-CO strategy active.

The first plot shows the longitudinal distance between the ego and the conflicting vehicle from the ego perspective. The second plot represents the velocities of the ego and merging vehicles, differentiating between the classification of the merging vehicle on the right lane and on the ego lane. The third plot shows the lateral distance. The longitudinal reaction of the ego vehicle begins about two seconds before the conflicting vehicle is classified on the ego lane. Figure 4.9 shows the situation from the ego perspective.

[Figure 4.8 panels, each over time [s] from 10 to 25 s: Longitudinal Distance [m]; Longitudinal Velocity [m/s] (cv rightLane, cv egoLane, ego actual, ego planned, ego target); Lateral Distance [m]; Longitudinal Acceleration [m/s²].]

Figure 4.8: Experimental results in the real world for an active courtesy strategy: kinematic values of the ego and the conflicting vehicle.

[Figure 4.9 panels: ego perspective at 13 s, 16 s, 18 s and 19 s.]

Figure 4.9: Experimental results in the real world for an active courtesy strategy: experiment sequence from the ego perspective.

4.2.4 Simulated Experiments II: Populated Environments

The previous results only evaluate the approach for one conflicting agent. This example shows how the extended approach works in a populated traffic situation, where the ego vehicle is surrounded by several agents. The experiment was run with two different configurations:

• egoistic configuration: the ego vehicle follows solely its own objectives, taking only the most likely evolution into account.

• courtesy behavior: the ego vehicle includes the courtesy behavior and can select either a courtesy behavior, i.e., a courtesy lane change to the left ($\alpha_{ego}$: CO-CLL) or a courtesy deceleration ($\alpha_{ego}$: CO-KL), or a non-cooperative behavior based only on the most likely prediction ($\alpha_{ego}$: NC-KL).

Figure 4.10 shows the scene evolution for the two options: egoistic and courtesy. Figure 4.11 shows the evolution of the longitudinal dynamics and relevant safety parameters for the main vehicles involved in the maneuver, for both configurations during the experiment. Figure 4.12 and Figure 4.13 show the expanded action combinations considered by the decision-making at one time step. The predicted dynamics of the ego vehicle and the main involved vehicles for the different action combinations are plotted in Figure 4.14. The planned evolution figures show the internal predicted scene evolutions at the time step 3.2 s. From this moment, the two strategies (courtesy and egoistic) took different decisions and the scenes evolved differently.

At the beginning of the scenario, the ego vehicle veh_1 is driving on the center lane. The right lane and the center lane are separated by a continuous line, and therefore lane changes between the right-most lane and the center lane are not allowed. The center lane and the left lane are separated by a discontinuous line, so lane changes between both lanes are allowed. The map informs that the lane change from right to left will be allowed from position 450 m and that the right-most lane will end at position 730 m. The scenarios Courtesy Behavior and Egoistic Configuration begin to differ from t = 3.2 s. A conflicting vehicle veh_129 is identified and its intention to change to the ego lane is recognized.

At this moment, the Courtesy Behavior selects a cooperative lane change to the left (CO-CLL) in order to open the front gap and to allow the conflicting vehicle veh_129 to change to the center lane. The ego vehicle plans to open the available gap between its front vehicle veh_131 and its rear vehicle veh_122. Shortly after the lane change of the ego vehicle, its rear vehicle veh_122 also changes to the left lane, opening the gap even more. However, the conflicting vehicle stays on the right lane with a conservative behaviour, and it is the vehicle veh_3 on the right lane that uses the generated gap and changes to the center lane. The new gap generated for the conflicting vehicle (veh_131 − veh_3) is smaller than before, forcing the conflicting vehicle to continue decelerating in order to stay on the ending lane and to pass behind veh_3.

On the other hand, the ego vehicle in the Egoistic Configuration continues driving on its lane, adapting its velocity to the vehicle in front (veh_131). The rear vehicle veh_122 performs a lane change to the left. The vehicle veh_3 on the right lane profits from the gap left by veh_122 and performs a smooth lane change behind the ego vehicle at t ≈ 10 s. The conflicting vehicle veh_129 decelerates at the end of the lane and forces its merge in front of the ego vehicle at t ≈ 18.5 s.

The vehicle dynamics (Figure 4.11) show that the scenario with the Courtesy Behavior presents a smoother and better velocity and acceleration profile for the ego vehicle than the Egoistic Configuration. There are two reasons for this: the first is the higher vehicle density and lower mean velocity of the center lane, and the second is the forced merge of the conflicting vehicle in the Egoistic Configuration. As the conflicting vehicle forces its merge at the end of the merging lane, the ego vehicle has to brake harder. The objective pursued by the Courtesy Behavior was to facilitate the merging of the conflicting vehicle. Due to the complex traffic situation and the conservative behaviour of the conflicting vehicle, neither strategy allows a fluent resolution of the situation for the vehicle veh_129. With the Courtesy Behavior, the velocity of the conflicting vehicle veh_129 went down to v = 5.1 m/s, the middle lane was reached at t = 21 s and the minimal TTC was 3 s. For the Egoistic Configuration, on the other hand, the velocity of the conflicting vehicle veh_129 went down to 13.6 m/s, the middle lane was reached at t = 18.5 s and the minimal TTC was 0.6 s.

[Figure 4.10 panels: bird's-eye views at 0 s and 3 s, followed by paired views for the courtesy and the egoistic configuration at 5 s, 7 s, 10 s, 15 s, 18.5 s and 20 s.]

Figure 4.10: Scene evolution for the courtesy and for the egoistic approaches.

[Figure 4.11 panels, each for the courtesy and the egoistic strategy over time [s] from 5 to 20 s: Time Headway THW [s]; Time to Collision TTC [s]; Longitudinal Velocity v [m/s]; Longitudinal Acceleration a [m/s²]. Legend: ego vehicle, veh_2, veh_3, veh_6, veh_115, veh_122, veh_124, veh_129, veh_131.]

Figure 4.11: Evolution of the headway, time to collision, linear velocity and acceleration for the main vehicles involved in the maneuver. For the THW and TTC computation, the end of lane is included as a static vehicle at the end of the merging lane.
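The convention used for Figure 4.11, where the end of the lane is treated as a static vehicle, can be sketched in Python as follows. The function names are hypothetical; the default lane-end position of 730 m is the value given for this scenario, and the standard definitions of THW (gap over follower speed) and TTC (gap over closing speed) are assumed.

```python
import math


def time_headway(gap, v_follow):
    """THW in s: time the follower needs to cover the current gap."""
    return gap / v_follow if v_follow > 0.0 else math.inf


def time_to_collision(gap, v_follow, v_lead):
    """TTC in s: gap divided by the closing speed, infinite if not closing."""
    closing = v_follow - v_lead
    return gap / closing if closing > 0.0 else math.inf


def lane_end_metrics(x, v, x_lane_end=730.0):
    """THW and TTC towards the lane end, modelled as a static lead vehicle."""
    gap = x_lane_end - x
    return time_headway(gap, v), time_to_collision(gap, v, 0.0)


# A vehicle 100 m before the lane end, driving at 20 m/s.
thw, ttc = lane_end_metrics(x=630.0, v=20.0)
```

For a static obstacle both quantities coincide, because the closing speed equals the speed of the approaching vehicle.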

Regarding the different action combinations planned at t = 3.2 s (Figures 4.12 and 4.13), it can be observed that the cooperative lane change CO-CLL presents the smoother trajectories for the ego vehicle (as also happens in reality); nevertheless, the predicted values of the conflicting vehicle veh_129 differ strongly from reality for both actions, merging-in-front and yield-the-right-of-way. The cooperative keepLane CO-KL presents a better reaction than the non-cooperative keepLane NC-KL: an earlier adaptation of the velocity avoids abrupt braking and, in case the conflicting vehicle selects to yield the right-of-way, the reduction of the ego velocity can rapidly be recovered. The estimated evolution of the conflicting vehicle for yield-the-right-of-way, the conservative option, corresponds to the observed evolution of the conflicting vehicle in the Courtesy Behavior scenario. The influence of the rear vehicles veh_3 and veh_121 on the real scene evolution plays an essential role in the conservative behavior selected by the conflicting vehicle.

[Figure 4.12 panels: predicted scene (x [m], y [m]) at $t_{init}$, $t_{init}$ + 5 s, $t_{init}$ + 10 s and $t_{init}$ + 15 s for the ego courtesy strategies Change Lane to the Left (CO-CLL) and Keep Lane with Courtesy Deceleration (CO-KL), each combined with the conflicting vehicle actions merging-in-front and yield-the-right-of-way.]

Figure 4.12: Prediction of the scene evolution depending on the different courtesy ego actions $\alpha_{ego}$ = {CO-CLL, CO-KL} and the different actions $\alpha_{cv}$ = {merging-in-front, yield-the-right-of-way} for the conflicting vehicle, at time $t_{init}$ = 3.2 s

[Figure 4.13 panels: predicted scene (x [m], y [m]) at $t_{init}$, $t_{init}$ + 5 s, $t_{init}$ + 10 s and $t_{init}$ + 15 s for the ego non-courtesy strategy Keep Lane (NC-KL), combined with the conflicting vehicle actions merging-in-front and yield-the-right-of-way. Legend: ego vehicle, veh_2, veh_3, veh_6, veh_115, veh_122, veh_124, veh_129, veh_131.]

Figure 4.13: Prediction of the scene evolution for the non-courtesy keepLane ego action $\alpha_{ego}$ = {NC-KL} and the different actions $\alpha_{cv}$ = {merging-in-front, yield-the-right-of-way} for the conflicting vehicle, at time $t_{init}$ = 3.2 s

[Figure 4.14 panels: Planned Velocity v [m/s] and Planned Acceleration a [m/s²] over time [s] from 0 to 20 s for the ego actions CO-CLL, CO-KL and NC-KL, each combined with the conflicting vehicle actions merging-in-front and yield-the-right-of-way. Legend: ego vehicle, veh_2, veh_3, veh_6, veh_115, veh_122, veh_124, veh_129, veh_131.]

Figure 4.14: Predicted dynamics of the main involved vehicles depending on the courtesy and non-courtesy ego actions $\alpha_{ego}$ = {CO-CLL, CO-KL, NC-KL} and the different possible actions $\alpha_{cv}$ = {merging-in-front, yield-the-right-of-way} for the conflicting vehicle, predicted at time $t_{init}$ = 3.2 s

During complex traffic situations, it is difficult to predict the behavior of all the other traffic participants involved. If the rear vehicle on the right lane, veh_3, had considered the intention of its preceding vehicle, the resolution of the situation would have been more fluent for everyone. Nevertheless, this cannot be controlled by the ego vehicle. It indicates that cooperation by deceleration allows a tighter control of the final situation, because in this way the ego vehicle maintains control over the gap for the merging vehicle. Lane changes, on the other hand, may increase the ego vehicle's own rewards but depend more strongly on the behavior of the other traffic participants with regard to the overall benefit.

4.2.5 Discussion

Simulation results show how the enhancement of a driving strategy with courtesybehavior improves the comfort of the conflicting vehicle, reducing its merging time. Inaddition, the safety metrics, TTC−1, for the ego vehicle are also improved due to theforesight planning for the two different planning approach analyzed. The usage of a fastdecision-making algorithm, allows fast-replanning approaches (faster task rates) and,even without courtesy approaches the Maneuver Planning approach was able to obtaingood values on the safety metrics. Nevertheless, the best compromise was obtained forthe foresight planning. Advantages of a foresight planning are particularly illustratedin the measurements of the real-world experiments. For critical situations, where themerging vehicle overlooks the ego vehicle, the selection of a cooperative strategy allowsthe ego vehicle to adapt itself comfortably, outperforming the no-cooperative strategy.

For the cooperative strategies the methods presented in Section 4.1.2 were used; however, other prediction methods could provide more accurate information and reduce the computation time. The presented prediction and experiments focus on merging lanes, but the situation is similar to other use cases, such as vehicles that merge in order to overtake slower vehicles. The challenge is to estimate correctly the intention of potential merging vehicles, and the assessment of a courtesy behavior can also be applied in those situations.

The approach presented in this chapter fixes the action of the conflicting vehicle for each simulation option in order to reduce the branching factor. In the real world, both vehicles can adapt their decisions at any moment. This approach assumes that the conflicting vehicle actions are independent of the ego action and relies on frequent updates and fast re-planning to evaluate the new predicted intentions. In each planning step, the decision of each vehicle is treated as independent of the other and dependent only on the current situation, and the prediction is updated every planning step (40 ms) according to the available metrics. Studying the dependencies between the decisions of the vehicles would be interesting for developing fully cooperative strategies, but such strategies would require direct communication between vehicles or between the vehicles and the infrastructure. A comparison of the proposed approach with one that uses the conditional probability p(αcv|αego) instead of the single probability p(αcv) would also be interesting, but is beyond the scope of this work.

Real-world experiments were conducted in a supervised environment. An important future step is to test the system intensively in a public environment.

4.3 Related Work

Human drivers analyze and anticipate the traffic situation. Similarly, autonomous vehicles should integrate a prediction of the behavior of other participants into their driving task.

The problem of robots interacting in populated human environments is not only a focus of autonomous driving but also a point of interest in other robotic fields. Bennewitz et al. [18] presented a method to predict the trajectories of persons and improve the navigation behavior of a mobile robot. Kuderer [85] and Kretzschmar [83] present a cooperative navigation model for mobile robots interacting with pedestrians. Nevertheless, when navigating on freeways and highways, the topology is more structured and the velocities of the traffic participants are higher, which calls for specific solutions.

In the near future, as presented by Hobert [67], communication between vehicles and infrastructure (V2X) will make it possible to acquire precise information about the intentions of the traffic participants and the evolution of the situation. The authors of [104] proposed an on-ramp merging system which assesses the road traffic conditions and transmits instructions to the vehicles in the surrounding area. The system performs well but relies on an advanced infrastructure. V2X technology presents encouraging results, but it is still not mature enough.

Over the last years, several prediction methods for traffic participants have been intensively studied in the literature. Different motion models, such as physics-based models, maneuver-based models or interaction-aware models, can be used for the prediction [90]. The integration of prediction and intention information within the decision-making process plays a crucial role in the system performance.


Düring et al. [38] compute the complete set of collision-free start and end points of trajectories for each vehicle. This kind of intensive computation provides accurate results to the detriment of the on-line capability of the system. Other strategies take the current most likely prediction and rely on a continuous update of the available information and a fast re-planning system, such as the multilevel planning system presented by Menéndez [102]. Carvalho et al. [23] integrate the most likely cut-in prediction information to improve the autonomous cruise control. The combination of the most likely prediction with fast re-planning works quite well in most situations, but still does not consider other possible interactions between the agents involved.

Iterative planning strategies combine the planning and prediction tasks. Wei et al. [167] propose a strategy generation based on intention prediction. In [10] the authors suggest a game-theoretic approach which can model the re-planning capabilities of the drivers. Song et al. [143] proposed a decision-making method based on a partially observable Markov decision process (POMDP). The multi-policy decision-making presented by Cunningham et al. [30] also simulates the scene evolution using the most likely evolution of the other agents involved in order to reason about the policies. The problem with such iterative planning approaches is that they only consider the most likely evolution of the other traffic participants. Especially for longer prediction horizons, the model predictions can become inaccurate and overlook some critical situations.

Going one step further, the proposed system not only anticipates the behavior of other traffic participants to improve its own safety but also plans a cooperative behavior to improve the aggregate traffic comfort.

4.4 Conclusion

This chapter presents a novel approach that provides the automated vehicle with courtesy behaviors. During conflicting situations, different possible scene evolutions are assessed. The approach takes into account not only the most likely behavior of the other traffic participant but also the opposite one, in order to evaluate the effect of cooperating with them. This chapter shows how this method can complement several decision-making strategies based on a generic prediction algorithm. The simulation results show that the use of this courtesy behavior improves the results of already existing decision-making strategies. The scalability of the approach, which allows it to be adapted to the computational requirements and therefore to be on-line capable, is also discussed. Results show that a fast re-planning strategy (non-cooperative) can obtain good results. Nevertheless, foresight planning (cooperative strategy), which considers not only the most likely outcome but also the opposite one, performs better and can improve the results further. The set of experiments with a test vehicle illustrates the usability for real applications.

Chapter 5

Driving towards the Highway Exit Ramp: A User Study

The decision-making of highly automated vehicles is classically driven by a cost function, which evaluates different variables such as comfort and safety with a mathematical formula. Many works are based on such cost functions, which can be given by the designer or learned by the system. Nevertheless, the optimization function is usually selected without involving the final user of the system. This can cause a gap between an "optimal function" and the desired behaviour of the customer. This chapter explores, based on a user study, different factors that influence the customer's perception.

On the highway, human drivers continuously make decisions and adapt their driving behavior. When driving towards an exit ramp, they intuitively select their lane to optimize their desired velocity without the risk of passing the exit. In highly automated vehicles, this decision should be taken by the system, while the driver can divert their attention from the driving task and use the time for other activities. When driving towards an exit ramp, the selection of the velocity and the lane can be optimized. In doing so, it is important to ensure the comfort of the vehicle occupants. This optimization can be based on a cost function or a reward function which weights the relevant parameters. The challenge is to select the significant parameters and their relative importance (weight), and to determine how they reflect the subjective comfort and safety perception of the occupants. Many user studies have been performed and the parameters involved in the comfort perception during a lane change have been intensively analyzed, but the information about the perception when driving towards an exit ramp is still not sufficient. Nevertheless, as it is one of the last moments of the highway drive, this perception has a high influence on the evaluation of the whole drive.


This chapter presents a systematically conducted user study that analyses the influence of the main parameters involved in the comfort and safety perception when taking the highway exit, using a dynamic driving simulator. The perception of the drive is also studied for different attentiveness levels of the drivers. As a result, a relation between the main factors affecting the overall perception of the autonomous drive towards an exit ramp is obtained.

5.1 Study Design

As human drivers, we tend to plan ahead according to the expected traffic situation. When the traffic is not congested, many drivers tend to use the faster lanes more often to overtake slower vehicles and to use up the distance to the exit ramp, because they trust that adequate gaps will also be available near the exit ramp. When the traffic becomes more congested, most drivers tend to be more conservative and to move towards the exit lane sooner. How the lane change is carried out also affects the perception of the driving comfort; big gaps are usually favored. The vehicle dynamics and the driver attentiveness also influence the comfort perception of the occupants during the drive.

5.1.1 Parameters of Interest: Variables of Study

The traffic characteristics, the distance to the exit ramp, the selected gap for the lane change, the vehicle velocity and its dynamics, as well as the driver attentiveness, are the variables studied to identify the factors influencing comfort and safety perception when driving towards the exit ramp.

• The traffic characteristics - Traffic flow and traffic density
Traffic models can be classified according to their aggregation level into macroscopic or microscopic models [160]. Macroscopic models describe traffic flow analogously to liquids or gases in motion. The dynamical variables are locally aggregated quantities such as the traffic density (vehicles/km), flow (vehicles/hour) or mean speed. Microscopic models describe individual driver-vehicle entities, which collectively form the traffic. These models describe the reaction of every driver (accelerating, braking, lane changing) depending on the surrounding traffic.

The prediction models presented and used for the tactical planning in the previous chapters are microscopic models. These models perform well on short time horizons but become more inaccurate in long-term simulations. For predicting the long-term evolution of the traffic, the individual behavior of each vehicle becomes less relevant than the aggregated behavior or trend of each lane, making the macroscopic characteristics of the traffic flow a good indicator of the long-term evolution. Macroscopic theories try to model and explain the traffic in order to improve the infrastructure design accordingly. Hall [61] analyzed several three-dimensional models of the traffic stream based on flow, concentration and speed. These models already indicated different behaviors in uncongested and congested operation. Kerner [76, 77] proposed the three-phase traffic theory, dividing congested traffic into synchronized flow and wide moving jams. The three-phase theory after Kerner [76] differentiates between:

– free flow: characterized by high vehicle speeds, which may differ among neighboring lanes.

– synchronized flow: the speed of vehicles drops significantly, but the flow rate remains similar due to the increase in vehicle density. The term synchronized reflects the synchronization of the velocity of the vehicles in different lanes.

– wide moving jams: at this stage, both flow rate and velocity drop significantly; the velocity is close to zero.

The three-phase traffic theory presents some limitations when confronted with complex multi-ramp setups or accident-induced bottlenecks, as explained by Schönhof et al. [131], who also consider the classification qualitative and not well defined. The speed-density diagrams of German and Dutch highways presented by Treiber et al. [160] show a differentiated lane velocity for left lane velocities greater than 100 km/h, with low traffic densities, and the beginning of synchronized flow at densities of 10 vehicles/km/lane. In synchronized traffic, the difference in mean speed between neighboring lanes is under 10 km/h. Highly congested traffic appears at velocities under 60 km/h and traffic densities greater than 20 vehicles/km/lane. Within this work we simplify the three phases of Kerner and consider three clusters of traffic flow: free flow when the maximal lane velocity is greater than 100 km/h, synchronized flow when the maximal lane velocity is lower than 100 km/h and greater than 60 km/h, and traffic jam when the maximal lane velocity is smaller than 60 km/h. Note that this classification is used in this work for two- and three-lane highways with the entrance ramp and exit ramp located adjacent to the lower speed lane.

Traffic jam situations are not considered in this chapter, as they usually do not present a distribution of available gaps for the lane change but rather need a different strategy to open a merging space between two vehicles. The work of Kauffmann et al. [74] presents different strategies to interact with the surrounding vehicles in dense traffic situations in order to open suitable gaps. This work focuses on free flow and synchronized flow scenarios, where gaps sufficient to complete a lane change can be found.


• Velocity
Each road segment has an associated maximal velocity. This velocity is restricted by the legal velocity limit and by the maximal velocity that still allows safe braking. As presented in Chapter 2, the maximal velocity is limited in order to allow safe braking with the available deceleration (limited by the current friction coefficient) within the sensor range. Some highway segments are exempt from a legal velocity limit, for instance some highway segments in Germany. If there is no legal velocity limit, the maximal available velocity for an automated vehicle is given by the sensor range. However, this is only an upper limit, since the maximal segment velocity is also restricted by the surrounding traffic. Some countries have right-hand traffic, and on their highways the left lanes are usually faster than the right lanes. Other countries have left-hand traffic. In many right-hand-traffic countries overtaking on the right is not allowed at higher velocities, and likewise in many left-hand-traffic countries overtaking on the left is not allowed. In other countries no legal restrictions apply to overtaking on the right or on the left.

In order to optimize the driving time, the autonomous vehicle should drive as close as possible to the maximal velocity, taking into account the corresponding rules for overtaking on the right or left. Faster lanes are associated with more aggressive drivers, especially on roads without a speed limit. Some drivers prefer a more conservative approach and select a lower cruise speed.

• Distance to exit ramp
The longitudinal distance to exit ramp is the longitudinal distance, in the one-dimensional abstraction of the topology, between the current vehicle position and the position of the exit ramp. It can also be defined as the longitudinal distance in the road coordinate system.
The lane distance to the exit ramp is defined as the minimal number of lane changes required to reach the exit ramp lane. Most highway exit ramps have special lanes to leave the road, and the vehicle has to be in the correct lane in order not to pass them. If the exit ramp is missed, the vehicle has to find an alternative route to reach its goal.

• Temporal Gap Size or Time Headway
The gap size is defined by the distance between two consecutive vehicles and can be considered as a temporal or a length variable. It can be measured in meters (length distance) or in seconds (time required by the rear vehicle to cover the current longitudinal distance between both vehicles at its current velocity). In this thesis the temporal gap size or Time Headway (THW) is considered. Bellem et al. [16] identified the THW as one of the objective metrics to evaluate driving comfort. According to Lewis-Evans et al. [93], comfort decreased and risk perception increased linearly once the THW fell below 2 seconds, independently of the velocity range. The existence of this THW threshold in the subjective risk and comfort perception was also confirmed by the THW study of adaptive cruise control performed by Siebert et al. [139]. Taieb-Maimon et al. [151] found strong differences depending on the driver preferences and the selected velocity; some drivers even chose a THW below what their own braking capabilities would require. Visibility conditions still matter when the occupants are driven by an automated vehicle: the study of Siebert et al. [140] showed a dependence between the preferred THW and the velocity and visibility for highly automated vehicles. The naturalistic study performed by Ivanco [69] also showed that in real traffic, the chosen THW and its scattering increased for low velocities.

• Acceleration and Jerk

Bellem et al. [16] proposed to use objective metrics in the evaluation of driving comfort; among the selected metrics were the acceleration and its variation, or jerk. As the authors explain, the human vestibular system is not able to perceive speed but perceives changes in speed, and it is also sensitive to rapid changes in acceleration.
The works of MacNeilage et al. [97] and Müller [112] show that similar variations in the linear acceleration are perceived similarly by the driver, independently of the acceleration reference. The work of Müller et al. [111] shows that the perception of changes in acceleration strongly depends on the approaching direction. The vestibular perception differs strongly when approaching a constant velocity, depending on the approaching direction. Variations when approaching a constant deceleration are perceived more intensely by the drivers than when approaching an acceleration. This effect indicates that the jerk term should be weighted differently depending on the sign of the gradient.
Furthermore, Kobayashi et al. [79] identified that the change in subjective anxiety due to acceleration and deceleration intensity varies depending on the inter-vehicle distance. These results are also consistent with the findings of Dillen et al. [32], who identified that the positive relationship between longitudinal acceleration and jerk and physiological responses was further magnified by the presence and proximity of a lead vehicle.

• Attentiveness Level
The attentiveness level conditions the perception of the external situation. The influence of visual and cognitive distraction on driver performance has been widely studied. Engstrom et al. [40] studied the influence of visual and cognitive distraction on active driving in different settings: static simulator, moving base simulator and field. Both distractions implied a reduced driving performance. Visual distractions represented the highest physiological load, and intermittent control strategies were developed by the drivers as they strove to maintain acceptable lane keeping performance. For purely cognitive distractions, it was identified that the gaze concentration increased towards the road center and decreased towards the neighbouring lanes. The results were consistent between the different settings. Kaber et al. [71] demonstrated that driver visual and cognitive distractions have independent and combined effects on driver performance and workload. Visual distractions were compensated by complex gaze behaviors or by increasing the headway time. Cognitive distraction also leads to increased workload by dividing driver concentration between the roadway and secondary tasks. They also identified that the workload depends on the level of driving control: more complex maneuvers like overtaking a slower vehicle presented higher workloads than simple maneuvers like car-following.

The on-road assessment performed by Harbluk et al. [62] demonstrated that when drivers are confronted with cognitive loads, they concentrate their visual attention on the road center and less on the side and peripheral devices. They also observed increased incidents of hard braking and a reduction of safety.

Festner et al. [43] studied the influence of three different lane change dynamics and three secondary tasks with different levels of distraction. The study found that participants involved in a cognitive and motion-interactive secondary task found dynamic lane changes disturbing, independently of their own driving preferences. The study also found that participants involved in only visual activities preferred lane changes more similar to their own driving preferences.

The participants were divided into two study groups: "visual surveillance" and "secondary task". Figure 5.1 shows an example of a participant of each group.

– The "visual surveillance" group was instructed to monitor the road. The participant was aware of the traffic situation during the whole scenario and was only released from the manual driving task ("eyes-on").

– The "secondary task" group received a combined visual and cognitive distraction. The study leader conducted a quiz similar to "Who Wants to Be a Millionaire". The participant played with the help of a tablet located in the cockpit. The participant read the questions aloud, discussed them with the study leader and had to select the right answer on the tablet. The participant was distracted from the traffic situation and concentrated on an unrelated secondary task ("mind-off").


Figure 5.1: Two participants during the study. The participant on the left was part of the visual surveillance group and the participant on the right was part of the secondary task group.

5.1.2 Study Objectives

The aim of the study was to identify the factors that define comfort and safety in order to obtain an adequate lane selection when driving towards the exit ramp. In addition to the influence of these factors, the following two hypotheses were also evaluated:

• A highly automated system should behave similarly to the driver.

• The selection of a desired lane by a highly automated system has the same influence on the comfort perception independently of the attentiveness level of the driver.

5.1.3 The Cost/Reward Function

The presented parameters have a great influence on the comfort and safety perception of the vehicle occupant. In autonomous vehicles the driver hands control over to the vehicle, so the correct combination of comfort and safety perception is essential to gain their trust and acceptance.

Jian et al. [70] empirically studied how to measure trust between people and automated systems. They proposed a scale to help understand how system characteristics might affect the perception of trust. Reliability and familiarity were among the factors associated with the perception of trust in human-machine interactions.

Trust, reliability and familiarity are subjective factors and thus difficult to measure and to generalize. The planning system, on the other hand, can optimize its cost or reward function based on measurable values. A mapping of the trust, comfort and safety perception onto quantifiable parameters is thus needed.


Planning approaches that optimize towards an objective function define a cost function in which several parameters are weighted. The goal of the planner is to find the sequence of actions that optimizes the objective function. How the objective function is defined therefore plays an essential role. Many different cost functions are proposed in the literature to define the function that has to be optimized by the motion planner. Xu [174] proposed a combination of static costs (geometrical costs like path or distance to static obstacles) with dynamic costs (like time, velocity...), classifying the cost terms into different categories: efficiency, comfort, behavior, energy and safety. Lefèvre et al. [89] first train a driver model based on the longitudinal distance and velocity of the ego vehicle and its preceding vehicle to obtain a reference acceleration, and then optimize the selected acceleration based on the velocity deviation, the acceleration deviation from the reference and the jerk. Bahram et al. [11] proposed a three-level objective function for autonomous vehicles on highways. The highest level takes safety into account by evaluating the Time To Collision (TTC) and Time Headway; then the traffic law of not overtaking on the right is weighted, and finally the comfort is evaluated. The comfort term proposed by the authors is composed of a deviation term from the desired velocity, a deviation from the right-most lane and the longitudinal distance to the preceding vehicles on the ego and neighbor lanes.

Althoff, Manzinger and Koschi [5] proposed to standardize the cost functions based on those most commonly accepted for autonomous driving according to the state of the art. The objective was to provide a benchmark to evaluate and compare different motion planning approaches. Table 5.1 presents the cost parameters from the benchmark [5] used in our study.

Running cost
  Acceleration (longitudinal or lateral)   $J_{A,long/lat} = \int_{t_0}^{t_e} a^2 \, dt$
  Jerk (longitudinal or lateral)           $J_{J,long/lat} = \int_{t_0}^{t_e} \dot{a}^2 \, dt$
  Steering angle                           $J_{SA} = \int_{t_0}^{t_e} \delta^2 \, dt$
  Yaw rate                                 $J_{YR} = \int_{t_0}^{t_e} \dot{\psi}^2 \, dt$
  Lane center offset                       $J_{loffset} = \int_{t_0}^{t_e} d(t)^2 \, dt$
  Velocity offset                          $J_{V} = \int_{t_0}^{t_e} (v_{des}(x(t)) - v(t))^2 \, dt$
  Path length                              $J_{L} = \int_{t_0}^{t_e} v \, dt$
Terminal cost
  Time                                     $J_{T} = t_e - t_0$

Table 5.1: Cost parameters evaluated in the study, adapted from the benchmark proposal by Althoff, Manzinger and Koschi [5]
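As an illustration of how the running costs of Table 5.1 could be evaluated from logged data, the following minimal sketch approximates some of the integrals with the trapezoidal rule on a sampled longitudinal trajectory. The function names, the use of NumPy, the finite-difference jerk estimate and the constant desired velocity are assumptions of this sketch and are not taken from the benchmark [5] or from the study.

```python
import numpy as np


def trapezoid(y, t):
    """Trapezoidal approximation of the integral of samples y over times t."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))


def running_costs(t, v, a, v_des):
    """Approximate selected cost terms of Table 5.1 for one sampled trajectory.

    t: sample times [s], v: velocity [m/s], a: acceleration [m/s^2],
    v_des: desired velocity [m/s] (scalar or one value per sample; assumption).
    """
    t = np.asarray(t, dtype=float)
    v = np.asarray(v, dtype=float)
    a = np.asarray(a, dtype=float)
    jerk = np.gradient(a, t)  # finite-difference estimate of da/dt
    return {
        "J_A": trapezoid(a ** 2, t),            # acceleration cost
        "J_J": trapezoid(jerk ** 2, t),         # jerk cost
        "J_V": trapezoid((v_des - v) ** 2, t),  # velocity offset cost
        "J_L": trapezoid(v, t),                 # path length
        "J_T": float(t[-1] - t[0]),             # terminal time cost
    }


# Hypothetical example: 10 s of constant 1 m/s^2 acceleration starting at 20 m/s.
t = np.linspace(0.0, 10.0, 101)
costs = running_costs(t, v=20.0 + t, a=np.ones_like(t), v_des=30.0)
```

For this hypothetical trajectory the jerk cost vanishes and the path length equals the 250 m covered, which gives a quick plausibility check of the discretization.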

As explained in the previous section, the Time Headway is highly relevant to the comfort and safety perception. Therefore we substitute the term "Distance to obstacles" ($J_D = \int_{t_0}^{t_e} \max(\xi_n, \dots, \xi_0) \, dt$, with $\xi_i = e^{-w_{dis} d_i}$) proposed by Althoff [5] and Xu [174] by two terms based on the THW and TTC after Bahram [11], presented in eq. 5.1 and 5.2. In this study TTCmin and THWmin are set to 0.2 s. TTCmax is set to 5 s, the time necessary for safe braking with 10 m/s² from 180 km/h. THWmax is set to 2 s, the THW threshold below which the risk perception increases linearly according to Lewis-Evans et al. [93]. The running cost parameters are extended with the related terms as presented in Table 5.2.
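The value of 5 s for TTCmax can be checked with a short calculation. The constant-deceleration model down to standstill is an assumption of this sketch and is not stated explicitly in the text:

```latex
\[
v_0 = 180\,\mathrm{km/h} = \frac{180}{3.6}\,\mathrm{m/s} = 50\,\mathrm{m/s},
\qquad
t_{\mathrm{brake}} = \frac{v_0}{a} = \frac{50\,\mathrm{m/s}}{10\,\mathrm{m/s^2}} = 5\,\mathrm{s}.
\]
```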

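\[
r_{TTC}(t) =
\begin{cases}
1, & TTC(t) < TTC_{min} \\
0, & TTC(t) > TTC_{max} \\
1 - \dfrac{TTC(t) - TTC_{min}}{TTC_{max} - TTC_{min}}, & \text{otherwise}
\end{cases}
\tag{5.1}
\]

\[
r_{THW}(t) =
\begin{cases}
1, & THW(t) < THW_{min} \\
0, & THW(t) > THW_{max} \\
1 - \dfrac{THW(t) - THW_{min}}{THW_{max} - THW_{min}}, & \text{otherwise}
\end{cases}
\tag{5.2}
\]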

Running cost
  TTC   $J_{TTC} = \sum_{t=t_0}^{T} r_{TTC}(t)$, with $r_{TTC}$ from eq. 5.1
  THW   $J_{THW} = \sum_{t=t_0}^{T} r_{THW}(t)$, with $r_{THW}$ from eq. 5.2

Table 5.2: Cost parameters evaluated in the study, adapted from Bahram et al. [11]
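The piecewise terms of eq. 5.1 and 5.2 and the sums of Table 5.2 can be sketched in a few lines. The thresholds are the ones given in the text; the function names, the helper that derives the THW from a gap and a velocity, and the sample values are hypothetical and only serve as illustration.

```python
TTC_MIN, TTC_MAX = 0.2, 5.0  # seconds, as set in the study
THW_MIN, THW_MAX = 0.2, 2.0  # seconds, as set in the study


def normalized_risk(value, v_min, v_max):
    """Piecewise-linear term of eq. 5.1 / 5.2: 1 below v_min, 0 above v_max."""
    if value < v_min:
        return 1.0
    if value > v_max:
        return 0.0
    return 1.0 - (value - v_min) / (v_max - v_min)


def time_headway(gap_m, rear_speed_mps):
    """THW: time the rear vehicle needs to cover the current gap at its speed."""
    if rear_speed_mps <= 0.0:
        return float("inf")  # a standing rear vehicle never closes the gap
    return gap_m / rear_speed_mps


def summed_cost(samples, v_min, v_max):
    """Sum of the normalized term over all sampled time steps (Table 5.2)."""
    return sum(normalized_risk(s, v_min, v_max) for s in samples)


# Hypothetical THW samples in seconds: very close, intermediate, comfortable.
j_thw = summed_cost([0.1, 1.1, 3.0], THW_MIN, THW_MAX)
```

With these hypothetical samples the three steps contribute 1, 0.5 and 0, so only headways below THWmax add to the cost.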

Especially in uncongested traffic, different mean lane velocities are observed. In Germany, the legislation requires driving as far to the right as possible; on multilane roads vehicles may use the left lanes when the traffic flow so requires, but overtaking on the right is not allowed (StVO §7 [147]). Driving on the faster lanes can reduce the travel time, and most drivers tend to pass their lead vehicle when they perceive that the neighbor lane is moving faster, as shown by Redelmeier et al. [123, 124]. Therefore another metric is included: the velocity deviation compared with the deviation in the fastest lane, J_vlane (Table 5.3 and eq. 5.3).

Running cost
  $J_{vlane} = \int_{t_0}^{t_e} w_{i,vLane}(t) \cdot \dfrac{|v_{ref}(x(t)) - v(t)|}{v_{ref}(x(t))} \, dt$, with $w_{i,vLane}$ from eq. 5.3

Table 5.3: Cost parameters for the lane adequacy


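\[
w_{i,vLane}(t) =
\begin{cases}
1, & \max(v_{section} - v_i) \le 0.1 \cdot v_{ref} \\
10, & 0.1 \cdot v_{ref} < |\max(v_{section} - v_i)| \le 0.2 \cdot v_{ref} \\
20, & |\max(v_{section} - v_i)| > 0.2 \cdot v_{ref}
\end{cases}
\tag{5.3}
\]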
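The lane weight of eq. 5.3 and the lane adequacy cost of Table 5.3 can be sketched as follows. This is one possible reading and rests on several assumptions that are not confirmed by the text: v_section is taken as the list of mean lane velocities of the current road section including the own lane (so the velocity deficit is never negative), v_i is the mean velocity of the lane under evaluation, the weight is held constant over the evaluated interval, and the integral is replaced by a sum with a fixed time step. All names are hypothetical.

```python
def lane_weight(v_lane, v_section_lanes, v_ref):
    """Weight of eq. 5.3 for a lane with mean velocity v_lane.

    v_section_lanes: mean velocities of all lanes of the section,
    assumed to include the evaluated lane itself (deficit >= 0).
    """
    deficit = max(v_section_lanes) - v_lane
    if deficit <= 0.1 * v_ref:
        return 1
    if deficit <= 0.2 * v_ref:
        return 10
    return 20


def lane_adequacy_cost(v_samples, v_lane, v_section_lanes, v_ref, dt):
    """Discretized J_vlane: weighted relative velocity deviation summed over time."""
    w = lane_weight(v_lane, v_section_lanes, v_ref)  # held constant (assumption)
    return sum(w * abs(v_ref - v) / v_ref * dt for v in v_samples)


# Hypothetical example in km/h: 10 s on a 90 km/h right lane next to a
# 125 km/h middle lane, with the reference velocity set to 125 km/h.
j_vlane = lane_adequacy_cost([90.0] * 10, 90.0, [125.0, 90.0], 125.0, dt=1.0)
```

In this example the deficit of 35 km/h exceeds 20 % of the reference velocity, so the highest weight of 20 applies to every time step.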

When the exit ramp is approaching, not driving in the lane next to the exit ramp increases the probability of missing it if the necessary lane changes cannot be completed in time. Therefore the lane index, or the distance to the exit ramp at which the right-most lane is selected, is also included in the evaluation.

Some of the presented metrics are perceived visually by the occupants. If the occupants are focused on secondary tasks and visually distracted, they will only be aware of the vestibular metrics (acceleration or jerk). The attentiveness level can also influence the perception of the passengers, but only in combination with the other metrics.

The corresponding partial costs of the different scenarios are computed using the presented metrics.

5.2 User Study on the Dynamic Driving Simulator

5.2.1 Dynamic Driving Simulator

The use of a driving simulator instead of real traffic helps to maintain the replicability and reproducibility of the study between different test persons. Although the work of Engstrom et al. [40] showed comparable results for the driver attentiveness on static simulators, dynamic simulators and real traffic, those results were evaluated for active driving and interaction. Greenberg et al. [58] observed better lane keeping than on static simulators and found that undesired lane departures were significantly more frequent when the lateral motion of the driving simulation was deactivated. Bellem et al. [15] confirmed that the results obtained for the comfort assessment of ADAS systems in dynamic simulators reproduced the results obtained on real tracks.

A dynamic simulator provides realistic accelerations and velocities. As an important part of the parameters of interest is based on kinematic values, the vestibular stimulation fed back by the dynamic simulator provides more realistic results than static simulators, where the perception is only visual. Thus, the dynamic simulator was selected as the more adequate setting. Figure 5.2 shows the simulator used in the study.


Figure 5.2: Dynamic simulator of BMW in Munich used for the study. Photo provided by BMW Group

5.2.2 Study Setup

The study was composed of twelve scenarios, varying the traffic flow (through the mean lane velocity and the mean Time Headway) and the distance to the exit ramp at which the lane change into the right lane was triggered. For the traffic flow, the first group is free flow traffic with big gaps available to perform the lane change (free-flow3.6s). When the Time Headways get shorter, the traffic begins to articulate, with platoons forming on the right lane but still with a distinct velocity difference between lanes (free-flow1.8s). For lower velocities the traffic begins to synchronize and the velocity difference between lanes decreases; two different sets were defined (synchron90 and synchron70). The lane change to the right was triggered at three different distances, oriented to the informative signposting that can be found on highways. Tables 5.4 and 5.5 show an overview of the scenarios and Table 5.6 shows the measured values of the main parameters for each scenario.

Traffic flow                 free-flow3s   free-flow1.8s   synchron90   synchron70
Mean velocity middle lane    125 km/h      125 km/h        95 km/h      72.5 km/h
Mean velocity right lane     90 km/h       90 km/h         90 km/h      70 km/h
Mean gap THW right lane      3 s           1.8 s           1.8 s        1.8 s

Table 5.4: Scenario variation related to the traffic flow
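The scenario velocities of Table 5.4 can be mapped onto the three traffic flow clusters defined in Section 5.1.1 with a simple threshold rule. The following sketch is illustrative only; the function name is hypothetical, and the assignment of the exact boundary values (100 km/h and 60 km/h), which the text leaves open, is an assumption.

```python
def classify_traffic_flow(max_lane_velocity_kmh):
    """Cluster a road section by its maximal mean lane velocity in km/h.

    Boundary values are assigned to the slower cluster (assumption).
    """
    if max_lane_velocity_kmh > 100.0:
        return "free flow"
    if max_lane_velocity_kmh > 60.0:
        return "synchronized flow"
    return "traffic jam"


# Maximal mean lane velocity (middle lane) of each traffic flow set in Table 5.4.
scenario_sets = {
    "free-flow3s": 125.0,
    "free-flow1.8s": 125.0,
    "synchron90": 95.0,
    "synchron70": 72.5,
}
clusters = {name: classify_traffic_flow(v) for name, v in scenario_sets.items()}
```

Under this rule both free-flow sets fall into the free flow cluster and both synchron sets into the synchronized flow cluster, which matches their naming; no set reaches the traffic jam cluster, consistent with traffic jams being excluded from this chapter.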

Trigger lane change to the right lane   early     normal    late
Distance to exit ramp                   1800 m    1000 m    400 m

Table 5.5: Variation to trigger the lane change to the right lane

Traffic flow          free-flow3s                   free-flow1.8s                 synchron90                    synchron70
Trigger [m]           1800      1000      400       1800      1000      400       1800      1000      400       1800      1000      400
Scenario              1         2         3         4         5         6         7         8         9         10        11        12

Running cost
$J_{A_{long}}$        0.74e+03  0.72e+03  0.82e+03  1.65e+03  1.22e+03  1.17e+03  0.74e+03  0.66e+03  0.59e+03  1.37e+03  1.58e+03  1.48e+03
$J_{A_{lat}}$         120.4     115.8     122.2     133.5     121.4     123.2     121.6     118.4     118.2     112.8     107.6     109.1
$J_{J_{long}}$        104.3     273.3     153.5     749.2     366.8     290.7     172.4     242.9     90.9      181.0     231.6     241.1
$J_{SA}$              5.18e-01  5.27e-01  5.17e-01  5.27e-01  5.55e-01  5.50e-01  6.00e-01  6.08e-01  5.74e-01  8.63e-01  9.29e-01  9.14e-01
$J_{YR}$              4.14      4.32      4.35      4.46      4.29      4.33      4.40      4.23      4.29      4.22      4.27      4.27
Lane center offset    4.55e+03  4.70e+03  4.43e+03  4.54e+03  4.52e+03  4.44e+03  4.61e+03  4.54e+03  4.52e+03  4.65e+03  4.66e+03  4.53e+03
$J_V$                 2.53e+05  8.12e+04  7.4e+03   2.54e+05  1.14e+05  1.9e+03   2.78e+05  2.07e+05  1.69e+05  1.01e+06  1.02e+06  1.00e+06
Path length           3.41e+03  3.41e+03  3.41e+03  3.41e+03  3.41e+03  3.41e+03  3.41e+03  3.41e+03  3.41e+03  3.41e+03  3.41e+03  3.41e+03
$J_{TTC}$             0         1.06      0.25      12.37     0.25      1.70      0         0         0         0.72      0.06      0.26
$J_{THW}$             0.97e+03  0.50e+03  0.15e+03  1.03e+03  0.59e+03  0.21e+03  1.24e+03  1.05e+03  1.05e+03  1.57e+03  1.52e+03  1.52e+03

Terminal cost
Time                  120.2     110.1     104.3     120.2     112.2     105.2     123.0     120.0     119.1     158.6     157.0     156.7

Other scenario values
$TTC_{min}$           5.53      4.48      4.79      2.74      4.60      4.05      6.46      6.08      6.15      4.52      4.90      4.69
$THW_{min}$           0.95      0.92      1.01      0.42      0.85      0.84      0.61      0.80      0.90      1.00      0.77      0.68
$\min(a_{long})$      -1.83     -1.97     -1.94     -3.52     -2.93     -2.78     -1.49     -1.49     -1.51     -2.31     -2.20     -2.21
$\max(|a_{lat}|)$     0.99      0.99      0.99      1.00      0.99      1.00      1.00      1.00      1.00      1.01      1.01      1.01
$J_{vLane}$           1.46e+4   5.76e+3   8.29e+2   1.50e+4   6.78e+3   4.21e+2   8.39e+2   6.39e+2   5.73e+2   2.00e+3   2.02e+3   2.04e+3

Table 5.6: Summary of the cost term values for the different parameters considered in the study

The study did not show any difference in the path-length term $J_L$, since the driven route was the same in all twelve scenarios. The values of the lateral acceleration $J_{A_{lat}}$, the lane center offset and the yaw rate $J_{YR}$ are also quite similar, since the number of lane changes performed in each scenario was the same and the parametrization of the lane change dynamics was constant. The steering-angle values increase for the lower-speed scenarios, but the yaw rate remains similar. The main differences are observed in the longitudinal terms: the longitudinal acceleration and the longitudinal jerk vary between the scenarios, and the THW and TTC parameters also differ. One special case is Scenario 4, where the safety cost values are higher and the longitudinal jerk is also the highest. In this scenario the lane change was forced behind the lead vehicle with a short front gap, and the distance was corrected afterwards. This scenario allows evaluating the safety perception of the participants during more abrupt maneuvers; nevertheless, the maneuver was still safe and the Time To Collision remained greater than 2.7 s during the whole scenario.

5.2.3 Procedure

The study took about 100 minutes per participant and was structured in three parts:

• Reception: each participant filled in a questionnaire about their demographic situation, their driving skills and their previous experience with ADAS. Afterwards they received the safety instructions and the instructions corresponding to their distraction group. Finally, they drove a test scene to get comfortable with the system.

• Test execution: the twelve scenarios were driven and, after each drive, the participant filled in a questionnaire about the perceived safety and comfort. The sequence of scenarios was varied for each test person in order to reduce learning effects due to repetition during the experiment, following the method of Bortz and Döring 2007 [20].

• Conclusion: at the end of the study, participants were asked to summarize their experience and filled in a survey about their attitude towards automated vehicles.
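The scenario grid and the per-participant variation of the sequence can be sketched as follows. This is only an illustration: the level names are taken from Tables 5.4 and 5.5 and the numbering follows Table 5.6, but the seeded shuffle in `scenario_order` is an assumption, since the text does not state how the sequences were actually generated.

```python
import itertools
import random

# Traffic-flow and trigger-distance levels as listed in Tables 5.4 and 5.5.
TRAFFIC_FLOWS = ["free-flow3s", "free-flow1.8s", "synchron90", "synchron70"]
TRIGGER_DISTANCES_M = [1800, 1000, 400]


def build_scenarios():
    """Return {scenario number: (traffic flow, trigger distance)}, numbered as in Table 5.6."""
    combos = itertools.product(TRAFFIC_FLOWS, TRIGGER_DISTANCES_M)
    return {number: combo for number, combo in enumerate(combos, start=1)}


def scenario_order(participant_id, n_scenarios=12):
    """Hypothetical randomization: a reproducible shuffle of the scenario numbers."""
    order = list(range(1, n_scenarios + 1))
    random.Random(participant_id).shuffle(order)
    return order


scenarios = build_scenarios()
```

Seeding the generator with the participant identifier keeps each sequence reproducible, which is convenient when the questionnaire answers have to be mapped back to scenario numbers afterwards.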

5.2.4 Participants

A total of N = 65 participants between 21 and 59 years old took part in the study. Each participant was randomly assigned to one of the attentiveness-level groups. Table 5.7 summarizes the gender and the mean and standard deviation of the age and of the years holding a driving license for both attentiveness-level groups. Due to the requirements of the driving simulator, all participants were employees of the BMW Group. 40% of the participants work or have previously worked in a job related to advanced driver assistance systems or autonomous driving.

Groups                      Group 1                Group 2
Attention level             Visual surveillance    Secondary task

                            mean      SD           mean      SD
Age                         32.2      8.4          35.1      10.9
Years of driving license    14.8      8.2          17.4      10.6

                            male      female       male      female
Gender (N)                  22        13           25        5

Table 5.7: Demographic characteristics of the study participants

Participant previous knowledge and mindset

In order to determine the preferences and previous knowledge of the participants, several questions about their driving style and preferences were asked. The results of this survey are shown in Figure 5.3 and Figure 5.4. More than half of the participants were frequent drivers with a high number of kilometers driven per year. The ACC system (adaptive cruise control) was in general better known and more often used than the lateral assistance system ALC. Most of the participants also classified themselves as sporty and comfort-oriented drivers who enjoy driving but are open-minded about being assisted by the car or driving in a highly automated vehicle.


[Plot: "Previous driving expertise of the participants". Stacked bars over Responses [%] (0 to 100). "How many km do I drive per year?": <5000 km/year, 5000 - 10000 km/year, 10000 - 20000 km/year, >20000 km/year. "Previous experience with ACC" and "Previous experience with ALC": I do not know the system, I know about the system but I have never used it, I use it sometimes, I use it regularly.]

Figure 5.3: Driving experience of the participants. The first question relates to their manual driving experience. They were also asked about their experience with assistance systems of SAE level 2: the adaptive cruise control (ACC) and the Active Lane Keeping and Traffic Jam Assistant (ALC). Results are averaged over the answers of the 65 participants.

[Plot: "Driving preferences of the participants". Stacked bars over Responses [%] (0 to 100) for the statements "I drive sportily", "I drive comfort-oriented", "I enjoy to drive by myself", "I am happy to be assisted by the car while driving", "I would like to drive highly automated" and "I find comfortable being the co-driver". Answer categories: Strongly disagree, Somewhat disagree, Neutral, Somewhat agree, Strongly agree.]

Figure 5.4: Driving preferences of the participants measured on a 5-point Likert scale. Results are averaged over the answers of the 65 participants.


5.2.5 Experiment Results

Each participant drove the 12 different scenarios in a randomized sequence. After each drive, they were asked to evaluate different aspects perceived during the experience. The results are presented as box plots in order to facilitate the qualitative analysis and overview. For the statistical tests, a normal distribution of the responses was assumed and a 5% significance level was used to accept or reject the hypotheses. Note that, within each Attention Group, traffic flow and distance are within-subject factors (all combinations were tested for each participant), whereas the attention group is a between-subject factor (each participant belongs either to the Visual Surveillance group or to the Secondary Task group).

Distance to exit ramp

Firstly, the participants were asked to evaluate the moment when the lane change started, considering the desired exit ramp and the traffic situation. The results are presented in Figure 5.5.

[Plot: "Adequacy of distance to exit ramp". Box plots for Q1.1 "I find the moment of starting the lane change ..." (very inadequate -4 to very adequate 4) and Q1.2 "I think the lane change started ..." (too late -4 to too soon 4), each shown for the attention groups Visual Surveillance and Secondary Task. Horizontal axis: traffic flow (free-3.6s, free-1.8s, sync-90km/h, sync-70km/h), grouped by trigger distance (1800 m, 1000 m, 400 m).]

Figure 5.5: Participants' responses about the timing adequacy of the lane change to the right-most lane. The answers are shown for the two attention groups: the participants who were supervising the drive and the participants who were distracted with a combined visual and cognitive distraction.

Regarding the adequacy of the lane change with respect to the remaining distance to the exit ramp, the visual surveillance group mainly evaluated the furthest distance (1800 m) as inadequate and too soon in the free-flow scenarios; in the synchronized-traffic scenarios the position was also evaluated as soon, but the evaluation tends towards neutral. For later lane changes the evaluation was positive, with the 1000 m trigger rated as slightly soon and the 400 m trigger as slightly late. The opinions of the distracted group show the same trend, with slightly more moderate evaluations for the 400 m and 1000 m triggers and for the synchronized-flow scenarios; for the 1800 m scenarios with free flow, however, they have a more favourable opinion of an earlier lane change than the participants of the visual surveillance group.

Question Q1.1 allows analyzing the dependency of the lane change adequacy on the traffic flow and on the distance to the exit ramp. As measures of effect size, $\eta^2$ and partial $\eta^2$ are presented, as suggested by Levine et al. [92]; these metrics can be interpreted in terms of the variance accounted for by a variable. The metrics are computed with Matlab and the toolboxes provided by Hentschke et al. [64] and Schurger [134]. Table 5.8 summarizes the results of the two-way Analysis of Variance (ANOVA) with repeated measures for the adequacy of the distance to the exit ramp, considering traffic flow and distance. Traffic flow and distance to the exit ramp have a significant influence on the adequacy of the lane change moment for both attention groups (p < 0.01), but only the visual surveillance group shows a significant interaction of traffic flow and distance. The effect size of the factors is also greater for the visual surveillance group than for the secondary task group.

Visual Surveillance
Measurements            F        df1   df2   p        $\eta^2$   partial $\eta^2$
Distance                12.55    2     68    0.000    0.224      0.251
TrafficFlow             68.65    3     102   0.000    0.061      0.084
TrafficFlow*Distance    4.96     6     204   0.000    0.048      0.068

Secondary Task
Measurements            F        df1   df2   p        $\eta^2$   partial $\eta^2$
Distance                12.55    2     68    0.000    0.000      0.090
TrafficFlow             68.65    3     102   0.000    0.001      0.031
TrafficFlow*Distance    4.96     6     204   0.784    0.068      0.008

Table 5.8: Results of the two-way ANOVA with repeated measures for Adequacy of distance to exit ramp considering traffic flow and distance
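For orientation, the two effect-size measures reported in Table 5.8 follow the standard definitions based on sums of squares. The sketch below is not the Matlab toolbox code used in the study; the function names and the numeric sums of squares are made up purely for illustration.

```python
def eta_squared(ss_effect, ss_total):
    """Share of the total variance accounted for by the effect."""
    return ss_effect / ss_total


def partial_eta_squared(ss_effect, ss_error):
    """Share of variance accounted for by the effect once the other effects are excluded."""
    return ss_effect / (ss_effect + ss_error)


# Made-up sums of squares for one factor of a repeated-measures ANOVA.
ss_effect = 30.0   # variation attributed to the factor
ss_error = 90.0    # error term belonging to that factor
ss_total = 200.0   # total variation in the responses

eta2 = eta_squared(ss_effect, ss_total)
partial_eta2 = partial_eta_squared(ss_effect, ss_error)
```

Because the denominator of partial $\eta^2$ leaves out the variation explained by the other factors, it is never smaller than $\eta^2$ for the same effect.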

Distance was confirmed as the most relevant parameter for the perceived adequacy of the distance to the exit ramp. In order to check whether the responses of both Attention Groups presented significant differences in the Q1.2 evaluations for each distance, a T-Test for similar variances was performed; the results are presented in Table 5.9. The perception of an early lane change at 1800 m differed significantly between the "eyes-on" group and the "mind-off" group in the Q1.2 evaluation: p ≤ 0.001 suggests that the data of both groups come from populations with unequal means. The perception of the lane changes at 1000 m and 400 m presented no significant differences between the groups. The most-preferred lane change position lies between 1000 m and 400 m; by linear interpolation, a position of about 600 m could be suggested.

                Visual Surveillance    Secondary Task
Distance [m]    mean      SD           mean      SD         p-value
1800            2.25      1.61         1.50      1.83       5.55e-4
1000            0.85      1.16         0.68      1.28       0.27
400             -0.7      1.07         -0.5      1.13       0.14

Table 5.9: Results of the T-Test between both Attention Groups for the evaluation of the distance to exit ramp when the lane change started.
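The interpolation mentioned above can be illustrated with the group means of Table 5.9. The text does not state which values were interpolated, so the following is only a sketch under the assumption that the neutral rating (0, neither too late nor too soon) is located linearly between the 400 m and 1000 m means; `zero_crossing` is a hypothetical helper.

```python
def zero_crossing(x_low, y_low, x_high, y_high):
    """Linearly interpolate the x at which y crosses zero between two points."""
    if y_low == y_high or y_low * y_high > 0:
        raise ValueError("the rating does not change sign between the two distances")
    return x_low + (0.0 - y_low) * (x_high - x_low) / (y_high - y_low)


# Mean Q1.2 ratings from Table 5.9 (negative: too late, positive: too soon).
visual_surveillance_m = zero_crossing(400, -0.7, 1000, 0.85)
secondary_task_m = zero_crossing(400, -0.5, 1000, 0.68)
```

Under this assumption the neutral point lies at roughly 670 m for the Visual Surveillance group and roughly 655 m for the Secondary Task group, which is of the same order as the suggested position of about 600 m.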

Gap Adequacy for manual drive and for automated drive

The participants were then asked to evaluate the gap selected for the lane change. It was explained to them that the vehicle merged into an available gap on the right lane and integrated itself into the rightmost lane in order to reach the exit ramp, and they were asked to assess whether they would have chosen another gap, both when driving themselves and when driven by an automated system. Results are presented in Figure 5.6.

The results show that, for the 1800 m variants, the participants of the visual surveillance group would have overtaken one or several more vehicles if they had been driving, with a trend towards the same gap in the scenarios with higher traffic density; they would also have expected the automated vehicle to take a later gap. This tendency is also present in the secondary task group, although their preferences are more moderate. Regarding the free-flow scenarios and the visual surveillance group, in the 1000 m situations the participants would have overtaken one more vehicle when driving manually, and they agree with the chosen gap in the 400 m scenarios. In the case of synchronized traffic flow, the preferences of the visual surveillance group are close to the selected gap at 1000 m, and most of the drivers would have overtaken one vehicle fewer in the 400 m scenarios. For an automated system the trends are similar but slightly more conservative. For the secondary task group, the preferences as active driver and for the automated system are more similar. They would have overtaken one more vehicle in the 1800 m scenarios and in the 1000 m scenario with free-flow traffic, and they agree with the gap in the 1000 m scenario with synchronized traffic as well as in the 400 m scenarios.

[Plot: "Gap adequacy". Box plots for Q2.1 "If you were driving the vehicle you would have ..." and Q2.2 "In your opinion the automated system should have ...", each shown for the attention groups Visual Surveillance and Secondary Task. Vertical axis: overtaken several fewer vehicles, overtaken one vehicle fewer, taken the same gap, overtaken one vehicle more, overtaken several more vehicles. Horizontal axis: traffic flow (free-3.6s, free-1.8s, sync-90km/h, sync-70km/h), grouped by trigger distance (1800 m, 1000 m, 400 m).]

Figure 5.6: Participants' responses about their gap selection criteria for a vehicle driven by themselves and by an automated system. The answers are shown for the two attention groups: the participants who were supervising the drive and the participants who were distracted with a combined visual and cognitive distraction.

Generally, the results of the visual surveillance group indicate that they would choose gaps nearer to the exit ramp if they were the drivers than if they were driven by an automated system. This could indicate that drivers tend towards a more aggressive strategy when they are driving themselves and prefer a slightly more conservative strategy when they are passengers of an automated vehicle. In order to analyze this hypothesis a T-Test was performed, assessing whether the means of both cases presented significant differences. Table 5.10 presents the results.

Manual drive vs. Automated drive
                       Manual drive          Automated drive
Attention group        mean      SD          mean      SD         p-value
Visual Surveillance    -1.295    1.093       -1.557    1.085      5.21e-04
Secondary Task         -1.480    1.060       -1.605    1.0014     0.104

Table 5.10: Results of the T-Test for both Attention Groups for desired gap evaluation for a manual or an automated drive.

The null hypothesis, H0, states that the participants would select the same gap when driving themselves as they expect the vehicle to take when the drive is automated. The alternative hypothesis is the opposite: they would select a different gap when driving themselves than when riding in an automated vehicle. The comparison of the preferences for the manual drive and the automated drive presents significantly unequal means (p ≤ 0.01) for the "eyes-on" group, whereas for the "mind-off" group the values were not significantly different. The results indicate that the Visual Surveillance group generally prefers a more conservative strategy, with an earlier alignment on the right side, when the vehicle is driven by an automated function than when they are driving themselves. The Secondary Task group presents similar preferences for both options, manual drive and automated drive. In detail, the results of the Secondary Task group correspond to the conservative preferences that the Visual Surveillance group selected for the automated drive, as presented in Table 5.11.

Visual Surveillance vs. Secondary Task
                                      "eyes-on"            "mind-off"
Comparison                            mean      SD         mean       SD         p-value
"eyes-on" manual vs. "mind-off"       -1.295    1.093      -1.5431    1.0326     1.38e-04
"eyes-on" automated vs. "mind-off"    -1.557    1.085      -1.5431    1.0326     0.827

Table 5.11: Results of the T-Test comparing the response of the "mind-off" group with the results of the "eyes-on" group for a manual or an automated drive.

Vehicle Dynamics

Then, the participants were asked to evaluate the vehicle dynamics in longitudinal and lateral terms. Figure 5.7 presents the participants' responses.

[Plot: "Vehicle dynamics". Box plots for Q3.1 "How did you find the vehicle longitudinal dynamics?" and Q3.2 "How did you find the vehicle lateral dynamics?" (very uncomfortable -4 to very comfortable 4), each shown for the attention groups Visual Surveillance and Secondary Task. Horizontal axis: traffic flow (free-3.6s, free-1.8s, sync-90km/h, sync-70km/h), grouped by trigger distance (1800 m, 1000 m, 400 m).]

Figure 5.7: Participants' responses about the vehicle dynamics during the lane change. The answers are shown for the two attention groups: the participants who were supervising the drive and the participants who were distracted with a combined visual and cognitive distraction.

The higher lateral and longitudinal accelerations of the free-flow-1.8s scenario at 1800 m were perceived by both groups and evaluated as less comfortable than the other cases, for both the longitudinal and the lateral dynamics. For the other scenarios, the evaluation of the longitudinal dynamics in free flow presented a larger dispersion than in synchronized traffic flow. The visual surveillance group evaluated the lateral dynamics similarly in all cases, while the secondary task group evaluated the lateral dynamics worse in the scenarios that also received a worse evaluation of the longitudinal dynamics. The results of the secondary task group present fewer differences between lateral and longitudinal dynamics than those of the visual surveillance group. A possible explanation is the diverted attention: they felt something (viscerally) but were less able to attribute it to the longitudinal or the lateral dimension.

Table 5.12 presents the vehicle dynamics perception of each Attention Group, analyzed with a T-Test. The hypothesis that the responses of both groups have equal means with equal variances cannot be rejected. As the responses of both groups did not present significant differences, both groups were treated as one in the analysis of the parameters that influence the vehicle dynamics perception.

                         "eyes-on"            "mind-off"
Dimension                mean      SD         mean      SD         p-value
Longitudinal dynamics    1.3262    2.3775     1.6417    2.2020     0.0563
Lateral dynamics         2.2167    1.8209     2.0889    1.8519     0.332

Table 5.12: Results of the T-Test comparing the response of the "mind-off" group with the results of the "eyes-on" group regarding the vehicle dynamics.
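A two-sample T-Test with pooled variance can be reproduced from summary statistics alone. The sketch below is not the analysis code of the study: the sample sizes (35 and 30 participants with 12 rated scenarios each, i.e. 420 and 360 responses) are an assumption, and the p-value uses a normal approximation of the t distribution, which is only adequate for large degrees of freedom.

```python
import math


def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Student's two-sample t statistic with pooled variance, from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    t_stat = (mean1 - mean2) / standard_error
    # Two-sided p-value via the normal approximation (assumption: large df).
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))
    return t_stat, df, p_value


# Means and standard deviations from Table 5.12; n = 420 and n = 360 are assumed.
t_long, df_long, p_long = pooled_t_test(1.3262, 2.3775, 420, 1.6417, 2.2020, 360)
t_lat, df_lat, p_lat = pooled_t_test(2.2167, 1.8209, 420, 2.0889, 1.8519, 360)
```

With these assumed sample sizes the approximation yields p-values of about 0.056 for the longitudinal and about 0.33 for the lateral dynamics, close to the values reported in Table 5.12.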

Section 5.1.3 presented the different parameters involved in the assessment of the expected reward, and Table 5.6 summarized the cost values of each term in the twelve scenarios. The scenarios presented similar lateral dynamics and different longitudinal dynamics. In order to find out which parameter had the greatest influence on the perception of the participants, a linear regression model was fitted to quantify the perceived comfort of the vehicle dynamics ($c_{comfDyn}$). Fitting a linear regression model is an iterative process, as indicated by Montgomery et al. [107]. The first model included the parameters related to the vehicle kinematics, $J_{A_{long}}$, $J_{A_{lat}}$, $J_{J_{long}}$, $J_{SA}$, $J_{YR}$ and $J_V$, and their interaction with the traffic flow as a categorical variable ($TrFl$). For the traffic flow interaction the four variations were combined into two clusters: FreeFlow and synchronFlow. The $\beta_i$ are the coefficients of the linear regression model.

$$
\begin{aligned}
c_{comfDyn} ={}& \beta_0 + \beta_1 \cdot J_{A_{long}} + \beta_2 \cdot J_{A_{lat}} + \beta_3 \cdot J_{J_{long}} + \beta_4 \cdot J_{SA} + \beta_5 \cdot J_{YR} + \beta_6 \cdot J_V \\
&+ \beta_{1:TrFl} \cdot J_{A_{long}} * TrFl + \beta_{2:TrFl} \cdot J_{A_{lat}} * TrFl + \beta_{3:TrFl} \cdot J_{J_{long}} * TrFl \\
&+ \beta_{4:TrFl} \cdot J_{SA} * TrFl + \beta_{5:TrFl} \cdot J_{YR} * TrFl + \beta_{6:TrFl} \cdot J_V * TrFl
\end{aligned}
\tag{5.4}
$$


The final model was obtained progressively: in each step, the parameter without significant influence and with the greatest p-value was eliminated, until all the parameters defining the model presented a significant relevance (p < 0.05).
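The control flow of such a backward elimination can be sketched as follows. This is a generic illustration and not the Matlab procedure used in the study: `fit_p_values` stands for any routine that refits the regression on the remaining terms and returns their p-values, and the p-values in `MADE_UP_P` are invented so that the loop can be run without a statistics library.

```python
def backward_eliminate(terms, fit_p_values, alpha=0.05):
    """Drop the least significant term until every remaining p-value is below alpha.

    fit_p_values(terms) is supplied by the caller; it refits the model on the
    given terms and returns a dict {term: p_value}.
    """
    kept = list(terms)
    while kept:
        p_values = fit_p_values(kept)
        worst = max(kept, key=lambda term: p_values[term])
        if p_values[worst] < alpha:
            break  # all remaining terms are significant
        kept.remove(worst)
    return kept


# Toy stand-in for a real refit: fixed, invented p-values per term.
MADE_UP_P = {"J_Along": 0.04, "J_Alat": 0.01, "J_SA": 0.60, "J_YR": 0.35, "J_V": 0.20}


def toy_fit(terms):
    return {term: MADE_UP_P[term] for term in terms}


selected = backward_eliminate(list(MADE_UP_P), toy_fit)
```

In a real fit the p-values change after every elimination, which is why the model is refitted inside the loop instead of ranking all terms once.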

Table 5.13 presents the parameters identified as significant for the comfort perception of the lateral and the longitudinal dynamics, with the estimated $\beta_i$ as well as the standard error, t-statistic and p-value of each term, together with the RMSE and $R^2$ of each model.

Longitudinal dynamics: RMSE 2.06, $R^2$ 0.207
Measurements                   Coef        SE         tStat     p-value
Intercept                      9.153       2.844      3.218     0.001
$J_{A_{long}}$                 -7.45e-4    3.75e-4    -1.984    0.047
$J_{A_{lat}}$                  -0.057      0.023      -2.426    0.015
$J_{J_{long}}$                 -2.04e-3    9.5e-4     -2.136    0.033
$J_{A_{long}}$ * synchronFlow  7.9e-4      2.27e-4    3.483     0.0005

Lateral dynamics: RMSE 1.72, $R^2$ 0.122
Measurements                   Coef        SE         tStat     p-value
Intercept                      10.189      1.295      7.865     0.000
$J_{A_{lat}}$                  -0.063      0.011      -5.571    0.000
$J_{J_{long}}$                 -1.75e-3    4.58e-4    -3.837    0.000

Table 5.13: Results of regression analysis for the relevant terms to assess the longitudinal and lateral vehicle dynamics.
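To illustrate how the fitted longitudinal model of Table 5.13 is read, the sketch below evaluates it for three scenarios using the cost values of Table 5.6. It assumes that the cost terms enter the model in the units listed in Table 5.6 and that synchronFlow is coded as a 0/1 indicator with free flow as the reference category; neither detail is stated explicitly in the text, and the function name is hypothetical.

```python
# Coefficients of the longitudinal-dynamics model as reported in Table 5.13.
COEF = {
    "intercept": 9.153,
    "j_along": -7.45e-4,
    "j_alat": -0.057,
    "j_jlong": -2.04e-3,
    "j_along_x_sync": 7.9e-4,
}


def predicted_longitudinal_comfort(j_along, j_alat, j_jlong, synchronized):
    """Predicted comfort rating on the -4..4 scale (assumed 0/1 coding of the flow)."""
    sync = 1.0 if synchronized else 0.0
    return (
        COEF["intercept"]
        + COEF["j_along"] * j_along
        + COEF["j_alat"] * j_alat
        + COEF["j_jlong"] * j_jlong
        + COEF["j_along_x_sync"] * j_along * sync
    )


# Cost values of Scenarios 1, 4 and 7 taken from Table 5.6.
scenario_1 = predicted_longitudinal_comfort(0.74e3, 120.4, 104.3, synchronized=False)
scenario_4 = predicted_longitudinal_comfort(1.65e3, 133.5, 749.2, synchronized=False)
scenario_7 = predicted_longitudinal_comfort(0.74e3, 121.6, 172.4, synchronized=True)
```

Under these assumptions the model predicts a clearly lower rating for Scenario 4 (about -1.2) than for Scenarios 1 and 7 (about 1.5 and 1.9), in line with the more abrupt maneuver of Scenario 4.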

The linear models explain only a small percentage of the variance in the results, 20% for the longitudinal dynamics and 12% for the lateral dynamics, but they help to understand the main influence factors. According to the findings of Bellem et al. [16], the acceleration and the jerk have the greatest influence on the perception of the vehicle dynamics, as they are perceived by the vestibular system. Their findings could also explain why the perception of the vehicle dynamics was similar for both attention groups. The influence of the yaw rate $J_{YR}$ could be neglected in this study; it should also be remarked that this term was similar in all scenarios. Although the steering-angle values were higher at the lower velocities, they did not show any influence, since the resulting lateral acceleration values were similar. This could indicate that the lateral acceleration or the yaw rate is more adequate than the steering angle as a cost term for comparing the vehicle dynamics of different scenarios, since they are better perceived by the vestibular system and depend less on the vehicle's velocity than the steering angle.

Scenario Assessment

[Plot: "Personal perception". Box plots for the Q4 items "I perceived the previous experience, in terms of ..., as ...": safety (dangerous -4 to safe 4), comfort (uncomfortable -4 to comfortable 4), comprehensibility (puzzling -4 to comprehensive 4), conservatism (speculative -4 to conservative 4) and desirability (not desirable -4 to desirable 4), each shown for the attention groups Visual Surveillance and Secondary Task. Horizontal axis: traffic flow (free-3.6s, free-1.8s, sync-90km/h, sync-70km/h), grouped by trigger distance (1800 m, 1000 m, 400 m).]

Figure 5.8: Participants' responses about their experience with the automated vehicle in each scenario in terms of perceived safety, comfort, comprehensibility, conservatism and desirability. The answers are shown for the two attention groups: the participants who were supervising the drive and the participants who were distracted with a combined visual and cognitive distraction.

Finally, the participants were asked to rate their experience with the automated driving function in the driven scenario in terms of perceived safety, comfort, comprehensibility, conservatism and desirability. Figure 5.8 shows the responses.

The scenario free-flow-3.6s, 1800 m received a better evaluation in terms of desirability in the secondary task group than in the visual surveillance group, with similar values in terms of perceived safety and perceived conservatism. This result could indicate that passengers give a higher value to their travel time when they are not concentrated on other tasks.

In general terms, the participants of both groups perceived the scenarios as safe, comfortable and comprehensible. The free-flow-1.8s scenario at 1800 m was the exception: this scenario, with a highly dynamic lane change far away from the exit ramp, was perceived as neither safe nor dangerous, slightly uncomfortable and puzzling. It was also the worst-rated scenario in terms of desirability. Both the 1000 m and the 400 m scenarios were well evaluated in terms of desirability, as were the 1800 m scenarios with dense synchronized traffic flow.

Table 5.14 shows the results of a T-Test comparing both attention groups, the Visual Surveillance or "eyes-on" group and the Secondary Task or "mind-off" group. The terms comfort, comprehensibility and desirability presented significant differences between the attention groups; only the responses about safety and conservatism were similar.

Visual Surveillance vs. Secondary Task
                     "eyes-on"           "mind-off"
Term                 mean     SD         mean     SD        p-value
safety               2.238    1.875      2.266    1.772     0.82
comfort              1.431    2.217      2.030    1.944     7.44e-5
comprehensibility    1.269    2.357      1.680    2.053     0.01
conservatism         1.295    2.135      1.308    2.070     0.93
desirability         0.945    2.476      1.444    2.302     0.003

Table 5.14: Results of the T-Test comparing the response of the "mind-off" group with the results of the "eyes-on" group regarding the perceived safety, comfort, comprehensibility, conservatism and overall desirability.

The main influence parameters of each term were analyzed using a linear regression, as explained before. The parameters considered were those related to the vehicle dynamics, $J_{A_{long}}$, $J_{A_{lat}}$ and $J_{J_{long}}$, as well as the interaction of $J_{A_{long}}$ with TrafficFlow. Additionally, the parameters related to the time to collision $J_{TTC}$, the Time Headway $J_{THW}$ and the distance to the exit ramp $D2E$ were included, together with their interaction with the two categories of TrafficFlow ($TrFl$), with the two categories of AttentionGroup ($AtGr$) and with the combined interaction of TrafficFlow with the distance to the exit ramp $D2E$, since question 1 revealed an influence of the interaction of both factors. The steering angle and yaw rate parameters, $J_{SA}$ and $J_{YR}$, were not included. In order to assess the influence of the velocity deviation, two parameters were analyzed: the velocity term $J_V$ proposed by Althoff [5] and the deviation from the maximal velocity of the section, $J_{vLane}$, presented in Section 5.1.3.

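$$
\begin{aligned}
c_k ={} & \beta_0 + \beta_1 \cdot J_{A_{long}} + \beta_2 \cdot J_{A_{lat}} + \beta_3 \cdot J_{J_{long}} + \beta_{1:TrFl} \cdot J_{A_{long}} * TrFl \\
& + \beta_4 \cdot J_{THW} + \beta_{4:TrFl} \cdot J_{THW} * TrFl + \beta_{4:AtGr} \cdot J_{THW} * AtGr + \beta_{4:TrFl:D2E} \cdot J_{THW} * TrFl * D2E \\
& + \beta_5 \cdot J_{TTC} + \beta_{5:TrFl} \cdot J_{TTC} * TrFl + \beta_{5:AtGr} \cdot J_{TTC} * AtGr + \beta_{5:TrFl:D2E} \cdot J_{TTC} * TrFl * D2E \\
& + \beta_6 \cdot D2E + \beta_{6:TrFl} \cdot D2E * TrFl + \beta_{6:AtGr} \cdot D2E * AtGr \\
& + \beta_7 \cdot J_V + \beta_8 \cdot J_{vLane}
\end{aligned}
\tag{5.5}
$$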
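To make the structure of such a regression with interaction terms concrete, the following sketch fits a reduced version of the model (intercept, one TTC term, the distance to the exit ramp, and the interaction of the distance with a binary traffic-flow indicator) by ordinary least squares. It is only an illustration under stated assumptions: the data are synthetic and noise-free, the distance is expressed in kilometres for numerical convenience, and the helper names `solve`, `design_row` and `fit_ols` are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def design_row(j_ttc, d2e_km, sync_flow):
    """Intercept, two main effects and the D2E * TrafficFlow interaction."""
    return [1.0, j_ttc, d2e_km, d2e_km * sync_flow]


def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations X^T X b = X^T y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Synthetic, noise-free responses generated from assumed coefficients.
true_beta = [1.5, -0.2, 0.5, -0.4]
samples = [(ttc, d2e, flow)
           for ttc in (1.0, 2.0, 4.0)
           for d2e in (0.2, 0.6, 1.0)
           for flow in (0, 1)]
rows = [design_row(*s) for s in samples]
y = [sum(b * x for b, x in zip(true_beta, r)) for r in rows]
beta = fit_ols(rows, y)
```

With noise-free data the fitted `beta` reproduces the assumed coefficients; with real responses, the standard errors and p-values reported in the following tables would additionally be computed from the residuals.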

Table 5.15 shows that, for the safety perception, the TTC, the THW and the distance to the exit ramp were identified as significant parameters, as well as the THW and the distance in dependence of the TrafficFlow. The fitted model explains only 12.4% of the variance.

Safety: RMSE 1.72, R² 0.124
Measurements                  Coef       SE        tStat     p-value
β0                            1.603      0.196     8.157     1.37e-15
J_TTC                         -0.164     0.024     -6.604    7.426e-11
J_THW                         -8.24e-3   2.46e-3   -3.3481   8.53e-4
D2E                           5.24e-3    1.40e-3   3.7207    2.13e-4
J_THW * SynchronizedFlow      8.82e-3    2.52e-3   3.489     5.11e-4
D2E * SynchronizedFlow        -5.08      1.41e-3   -3.59     3.51e-4

Table 5.15: Results of regression analysis for the relevant terms to assess safety perception.

Table 5.16 presents the relevant parameters for the comfort perception: the longitudinal acceleration and the lateral acceleration. The model explains only 14% of the variance for the Visual Surveillance group and less than 10% of the variance for the Secondary Task group. Although the lateral and the longitudinal acceleration are the only values identified as relevant, the variance explained by the model is small, especially when compared with the vehicle dynamic perception (Table 5.13), where the model was able to explain 20% of the variance for the longitudinal dynamics. Factors other than those explored in the experiment may also influence the comfort perception. For example, the perception of the "mind-off" group was significantly more positive (Table 5.14, mean 2.03, SD 1.94), probably due to the secondary task.


Comfort - Visual Surveillance: RMSE 2.06, R² 0.141
Measurements    Coef       SE        tStat     p-value
β0              16.615     1.857     8.944     1.23e-17
J_Along         -8.76e-4   2.69e-4   -3.249    0.001
J_Alat          -0.120     0.015     -7.900    2.50e-14

Comfort - Secondary Task: RMSE 1.85, R² 0.096
Measurements    Coef       SE        tStat     p-value
β0              12.263     1.805     6.792     4.61e-11
J_Along         -9.73e-4   2.62e-4   -3.713    2.36e-4
J_Alat          -0.077     0.014     -5.245    2.68e-07

Table 5.16: Results of regression analysis for the relevant terms to assess comfort perception.

Table 5.17 presents the relevant parameters for the perceived comprehensibility of the system. The combined influence of the TTC and the distance, together with the corrected delta velocity term, are the significant parameters for the comprehensibility of the system for the participants of the "eyes-on" group. Using $J_{vLane}$ instead of the $J_V$ parameter increases the R² from 0.14 to 0.18. For the "mind-off" group, higher values of Time To Collision are also less understood for sooner lane changes. Sooner lane changes also negatively influence the comprehensibility, but this effect is more accepted for synchronized traffic flow. The influence of the velocity parameter is not significant for this group, which could indicate that optimizing the travel time becomes less relevant when a secondary task, unrelated to the driving task, is performed.

Comprehensibility - Visual Surveillance: RMSE 2.13, R² 0.185
Measurements              Coef        SE         tStat     p-value
β0                        2.011       0.138      14.568    3.72e-39
J_vLane                   -1.39e-4    2.62e-05   -5.342    1.51e-07
J_TTC * D2E               -6.48e-05   2.20e-05   -2.941    3.44e-3

Comprehensibility - Secondary Task: RMSE 2.12, R² 0.196
Measurements              Coef        SE         tStat     p-value
β0                        2.298       0.224      10.257    3.71e-22
D2E                       -1.32e-3    2.41e-4    -5.458    8.26e-08
J_TTC * D2E               -6.49e-05   2.11e-05   -3.065    2.31e-3
D2E * SynchronizedFlow    9.72e-4     1.95e-4    4.978     9.39e-07

Table 5.17: Results of regression analysis for the relevant terms to assess comprehensibility.

Table 5.18 presents the relevant parameters for the perceived conservatism of the system: the system is perceived as more aggressive when the term related to the Time to Collision increases and as more conservative when the distance to the exit ramp increases. The model was able to explain 15.5% of the variance.

Conservatism: RMSE 1.94, R² 0.155
Measurements    Coef      SE        tStat     p-value
β0              -0.124    0.147     -0.847    0.397
J_TTC           -0.135    0.022     -6.155    1.20e-09
D2E             1.51e-3   1.29e-4   11.739    2.07e-29

Table 5.18: Results of regression analysis for the relevant terms to assess conservatism.

The assessment of the overall desirability was obtained as a model of the other four terms: safety, comfort, comprehensibility and conservatism. Table 5.19 shows the results of the linear regression:

$$c_{Desirability} = \beta_0 + \beta_1 \cdot Safety + \beta_2 \cdot Comfort + \beta_3 \cdot Comprehensibility + \beta_4 \cdot Conservatism \tag{5.6}$$

Desirability - Visual Surveillance: RMSE 1.06, R² 0.819
Measurements         Coef      SE       tStat     p-value
β0                   -0.338    0.065    -5.151    4.00e-07
Comfort              0.428     0.0376   11.391    2.38e-26
Comprehensibility    0.622     0.034    18.125    1.55e-54
Conservatism         -0.092    0.025    -3.635    3.13e-3

Desirability - Secondary Task: RMSE 0.95, R² 0.831
Measurements         Coef      SE       tStat     p-value
β0                   -0.634    0.083    -7.63     2.18e-13
Safety               0.128     0.061    2.096     0.036
Comfort              0.376     0.058    6.453     3.59e-10
Comprehensibility    0.670     0.038    17.45     1.11e-49
Conservatism         -0.079    0.033    -2.402    0.016

Table 5.19: Results of regression analysis for the relevant terms to assess desirability.

Comfort and comprehensibility are the most significant terms to explain the overall desirability and have a proportional influence, while conservatism has a small negative influence. The safety term was identified as significantly relevant only in the Secondary Task group. This is probably because safety could be considered a hygiene factor (it must be given), whereas the other ones are motivation factors, according to the Hygiene-Motivation Factors of Herzberg [65].


Participant posterior mindset

[Chart: "Personal opinion of the participants about autonomous driving". Horizontal axis: participant responses [%], from 0 to 100. Response scale: I totally disagree, Neutral, I totally agree. Rated statements:
- I can trust the system [70]
- The system is reliable [70]
- I am confident with the system [70]
- The actions of the system lead to harmful outcomes or disadvantages - adap. [70]
- I am suspicious of the system's intents, actions or outputs [70]
- The system is deceptive [70]
- The system provides road safety - adap. [8]
- The system is confusing - adap. [26]
- The actions of the system are not transparent to me - adap. [26]
- I find the system boring - adap. [8]
- I would like to adapt the system according to my preferences - own item
- I would consider to buy the system - adap. [8]]

Figure 5.9: Personal mindset of the participants after the study. Some statements were adapted from the related literature (adap.).

In the last part of the study, after driving the different scenarios, the participants were asked about their personal opinion on autonomous driving. The general mindset showed a high trust in the system and the belief that the system provides road safety and does not lead to disadvantages. Despite the high acceptance shown by the participants, the confidence in the system was rated more neutrally than the other criteria, as most drivers need some adaptation time to give up the control of the vehicle to a machine. Nevertheless, they rated the actions of the system as transparent and were willing to consider buying such a system.

One interesting finding was that, when the participants were asked about the possibility to adapt the system to their own preferences, most of them were strongly interested. This indicates that, although they are willing to give up the control of the car, they still want an individualized solution and to make their own choices.

5.2.6 Discussion

One limitation of this work is the use of a dynamic driving simulator. McGehee et al. [99] showed statistical equivalence between important driver reaction times for studies in a dynamic driving simulator and in a vehicle driving on a test track. Nevertheless, drivers would need more adaptation time to give their trust to a vehicle driving in complex real traffic situations with several vehicles interacting with each other.

Note that the objective of the linear regression presented here was not to obtain the weights of the cost function but to determine which parameters presented a significant influence. In order to obtain the weights based on real data, a higher number of sample points would be needed. These values could be obtained, for example, by collecting feedback from drivers based on real fleet data. Many works propose to learn directly from the driver in order to capture their preferences and reproduce their driving style. This is a controversial issue. The work of Taieb-Maimon et al. [151] shows that a significant number of drivers maintained a Time Headway below their braking reaction time. The authors proposed two possible explanations: either people were not accurately aware of their braking abilities, or they assumed that the lead vehicle would not brake or slow down suddenly. In addition, our results show that, when the driving task is performed by the system, participants tend to prefer a more conservative driving style than their own. This trend is aligned with the findings of Basu et al. [14], who identified that the driving style of drivers tends to be more aggressive than the one they prefer for an automated system. Therefore, it may not be enough to learn from normal driving; it may be required to learn from "good drivers" or from drives that reflect this conservative trend.

This work does not focus on the take-over request, but another important question with highly automated driving is whether the driver is required to be available to take over within a reduced amount of time. In most situations the driver will be distracted and will need an adaptation time, or even some assistance, to regain concentration on the driving task during the take-over request. The required take-over time depends on the driver characteristics, as described by Lin et al. [95], but also on the traffic state, as presented by Gold et al. [57]. In order to guarantee a safe drive, the vehicle should be able to bring itself into a safe state if the driver is not able to regain the control of the vehicle.

5.3 Conclusions

In order to be accepted, highly automated vehicles should be able to perform at least as well as an average driver. That means that the occupants should trust the vehicle and feel comfortable with it. The development of this trust level is essential in the early phase of automated vehicles and it is tightly related to the perception of comfort and safety. Unfortunately, this perception is often subjective and differs from person to person, and even for the same person at different moments.

In this study, driver comfort and safety perception when driving towards a highway exit ramp have been explored using a dynamic simulator-based experiment. The use of a dynamic simulator provides realistic results in terms of the perception of the dynamic parameters, and allows a repeatability of scenarios that would not be given in real traffic. Nevertheless, the number of scenarios driven within the study is quite limited and the results should be intensively validated in real traffic.

Within the scope of the study, the following conclusions were obtained:

• The perception of the occupant depends on the attentiveness level to the surrounding environment: participants with a visual and cognitive distraction were more willing to accept sooner lane changes, although these represent a slight increase of the travel time.

• Driving as close as possible to the reference velocity was identified as a significant parameter for the participants who were supervising the surrounding traffic, but not for the participants who had a visual and cognitive distraction.

• A dependency on the traffic flow is observed for both attentiveness groups.

• The vehicle dynamics are perceived similarly by both attention groups, and the responses are well explained by terms related to the vestibular perception.

• Although several trends could be observed, the preferences of each individual vary; a personalized system could be more adequate to fit all preferences.

This study benefits from the integration of a real system which is also able to run on test vehicles on the road. The findings of this work contribute to the calibration and adjustment of the design of automated vehicles driving on the highway. By observing the preferences of the participants, the most relevant parameters to assess the perception of comfort, safety, comprehensibility and adequacy were obtained. Also, a range of preferred distances to the exit ramp at which the vehicle should be on the rightmost lane, depending on the traffic flow, could be found. Nevertheless, the initial findings were restricted to a closed environment and to participants with a high affinity to vehicles, some of them with previous experience with driving assistance functions. It is possible that the general public would prefer a more conservative approach. In order to generalize and obtain a mature system that can be trusted and perceived as familiar by the driver, further studies of the customers' behavior and their feedback on the function in the field should be integrated.

Chapter 6

Learning and Planning: Lane Selection via Reinforcement Learning

Highway scenarios are highly dynamic environments where several vehicles interact following their own goals, leading to different combinations of scenes that also change over time. An autonomous system performing any driving activity should be able to integrate information learned from former interactions. Reinforcement Learning has shown promising results, but it should only be applied to autonomous vehicles if the system is also able to fulfil safety and integrity requirements in a deterministic and reproducible way. This chapter presents a planning system that is able to learn over time, always complying with the safety requirements and combining the advantages of Reinforcement Learning based systems and reactive systems. The method is evaluated in simulation, comparing different learning techniques. Results show that the planning system is able to adaptively integrate this experience, outperforming rule-based strategies.

An autonomous driving system should be able to optimize comfort and safety while taking hard constraints like traffic rules or collision avoidance into account. Highways are highly dynamic environments, which makes it impossible to predict every state evolution involving other agents' behaviour and allows only short-term predictions. The vehicle, however, is able to collect its previous experiences and should use this information to improve the decision policy.

This chapter presents a system that integrates the advantages of Reinforcement Learning in a planning structure. The focus is set on the example of driving towards an exit ramp such as the one shown in Figure 6.1, but our approach may be extended to other scenarios. In such situations, a trade-off between comfort and driving time has to be made, e.g. by balancing the chance to overtake slower vehicles and the risk of missing the exit ramp. Human drivers make this decision based on the current situation and their former experiences. In a similar manner, the decision-making in autonomous vehicles is considered as a planning problem: a plan within a reliably predictable horizon is enhanced with the vehicle's experiences to select the most adequate strategy.

Figure 6.1: Vehicle driving towards an exit ramp - Image from Spider Driving Simulator.

An autonomous vehicle is confronted with a highly dynamic environment with a large number of possible evolutions. Reinforcement learning (RL) is a promising approach to incorporate the experience from former interactions, where actions are encouraged depending on prior results [150]. Decision-making for autonomous driving and similar applications implies high requirements on the integrity and safety of the system. The interaction of a large number of agents makes the observation of traffic rules mandatory, even within the learning phase. Safe RL techniques address such safety constraints. While some works rely on modifying the optimization criteria, based on simulations that encourage the agent to avoid disallowed or risky behaviors, other works modify the exploration process [53]. In this chapter, safety constraints are also integrated as an intrinsic and explicit part of the learning and decision-making process.

To integrate the vehicle's experiences, reinforcement learning is used. Two different approaches are proposed in this chapter. The first one follows the reinforcement-learning-based policy, but the decision is supervised by the tactical planner, which ensures the safety constraints. The second approach plans through different scene evolutions, as far as a complete lane change or lane keep maneuver can be predicted, and appends the reinforcement-learning-based value estimation at the predicted state.


6.1 Planning Framework: Maneuver Planning, State Representation and Reward

Chapter 2 presented how a safety braking maneuver within the sensor range is always planned, and Chapter 3 described how the tactical planner layer selects the most adequate gap for a desired lane change. It is assumed that the other traffic participants behave socially, i.e. that they change lanes only if this does not imply a collision for the surrounding traffic.

[Block diagram with the blocks: Sensors, backend, Environment Model, Prediction, Maneuver Planner (Lane Selection, Tactical Planner, Trajectory Planner), Controller and Actors.]

Figure 6.2: Simplified environment and vehicle control loop.

Figure 6.2 shows the simplified control loop, where the environment and background information are pre-processed in an environmental model, the scene evolution is predicted and the maneuver planning selects the optimal maneuver and the corresponding trajectory. This chapter focuses on the lane selection, a higher planning level than the tactical planner presented in Chapter 3.

In order to integrate the lane selection into the planning framework, two different methods are followed: direct decision-making based on reinforcement learning and an integrated planning and learning approach. The first one selects the lane with the higher expected reward based on the experience learned from previous situations and then plans the most adequate sequence of actions to reach the lane. The second method combines the predicted reward of the planning with the previous experiences to select the most adequate lane. Both methods are explained in detail in Section 6.2.

6.1.1 State Representation and Action Space

The lane selection is formulated as a classical RL problem, with $s \in S$ the current state of our vehicle, $a \in A$ the action that the agent can select at each step, $s' \in S$ the state at the next step and $r \in R$ the immediate reward obtained after each transition. The selected action $a$ is determined at each step following a policy $\pi$. The aim is to find the optimal deterministic policy so that $\pi^* = \arg\max_a E(s' \mid s, a)$.

State Representation
Each state is defined with a feature-based state representation, established by the set of features described in Table 6.1.

Feature          Description
l_ego            absolute ego lane
l_Goal           goal lane
s_relGoal        relative longitudinal distance to goal position
v_ego            ego longitudinal velocity
v_lane(lane)     average lane velocity
v_maxSet         maximal allowed velocity
a_ego            ego longitudinal acceleration
ego_α            ongoing action; the ego vehicle is:
                 - kl: keeping the lane
                 - clr: changing to the right lane
                 - cll: changing to the left lane

Table 6.1: Features description for state representation.

Two different kinds of states are present during each episode, Transition States andthe Terminal State.

• Transition States, as their name suggests, are the set of states where the goal is still not achieved but at least one action can be taken to reach the goal.

• The Terminal State T is a particular state for each episode. It determines when the episode finishes. Within this work, an episode is considered finished when:

– the ego vehicle reaches the goal longitudinal position (srelGoal = 0).

– the ego vehicle brakes into a standstill (because of a static obstacle, the endof the drive-able lane or due to another agent).

– the ego vehicle collides with another entity. Note that no collision shouldoccur in the situations covered by the safety constraints (chapter 2).

An episode is considered successful if, at the terminal state, the vehicle is driving on the goal lane at the goal longitudinal position. In any other terminal state, the episode is considered a failure.


Action Space
In addition to the keep lane (KL), change lane to the right (CLR) and change lane to the left (CLL) actions, the action of abort lane change (ALC) is introduced to simplify the representation of the transitions between states. Once a lane change has begun, it is pursued until it is completed or aborted. ALC continuously checks whether all the preconditions for a requested lane change are still met. ALC also allows additional abort criteria to be introduced (e.g. a maximum duration of a lane change).
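The feature set of Table 6.1 and the four actions can be captured in a small data structure. The sketch below is only one possible encoding: the class and field names, the lane indexing (lane 0 as the right-most lane) and the rule used in `available_actions` are assumptions made for illustration and are not taken from the thesis implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    KL = "keep lane"
    CLR = "change lane to the right"
    CLL = "change lane to the left"
    ALC = "abort lane change"


@dataclass(frozen=True)
class State:
    l_ego: int          # absolute ego lane (assumption: 0 is the right-most lane)
    l_goal: int         # goal lane
    s_rel_goal: float   # relative longitudinal distance to the goal position
    v_ego: float        # ego longitudinal velocity
    v_lane: tuple       # average velocity of each lane
    v_max_set: float    # maximal allowed velocity
    a_ego: float        # ego longitudinal acceleration
    ongoing: Action     # ongoing action: KL, CLR or CLL


def available_actions(state, n_lanes):
    """Actions allowed by the road layout and the ongoing maneuver (assumed rule)."""
    if state.ongoing in (Action.CLR, Action.CLL):
        # A started lane change is pursued until it is completed or aborted.
        return [state.ongoing, Action.ALC]
    actions = [Action.KL]
    if state.l_ego > 0:
        actions.append(Action.CLR)
    if state.l_ego < n_lanes - 1:
        actions.append(Action.CLL)
    return actions


example = State(l_ego=1, l_goal=0, s_rel_goal=800.0, v_ego=30.0,
                v_lane=(25.0, 30.0, 35.0), v_max_set=33.0, a_ego=0.0,
                ongoing=Action.KL)
options = available_actions(example, n_lanes=3)
```

Making the state immutable (`frozen=True`) keeps it hashable, which is convenient if a discretized version of it is later used as a key of a tabular value function.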

6.1.2 Discrete Representation for Tabular Learning

The RL methods used in this chapter are tabular methods; therefore, the continuous features need to be discretized. The relative longitudinal distance to the goal position, $s_{relGoal}$, is discretized into $s_{step}$ increments over the lane index. For the velocity, the deviations are considered and classified into four groups: faster than, in range, slower than, and much slower than. For example, the deviation from the maximal section velocity, $\delta v_{section} = v_{ego} - v_{maxSet}$, is discretized into: over the maximal allowed velocity, in range of the maximal allowed velocity, slower than the allowed velocity and much slower than the allowed velocity. The lane velocity is discretized following the three-phase traffic theory of Kerner [77] into free flow, synchronized flow and wide moving jam.

The world representation is discretized into a tabular representation, but the lane decision module runs with a short task cycle, which makes it possible to stay in the same state over several time steps. We denote by $s_i$ the immediate state at each time step $i$, and by $S$ the corresponding discretized state. A transition between two states is defined to happen between $t$ and $t+1$ if the current discretized state $S'$ differs from the previous discretized state $S$ ($S_t \to S'_{t+1}$, $S \neq S'$).
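The discretization and the resulting notion of a transition can be sketched as follows. This is a minimal illustration under assumptions: the thresholds (`tol`, `much_slower`), the step size `s_step` and all function names are hypothetical, and the classification of the lane velocity into traffic phases is left out.

```python
def velocity_class(delta_v, tol=2.0, much_slower=10.0):
    """Classify a velocity deviation in m/s into four groups (hypothetical thresholds)."""
    if delta_v > tol:
        return "faster"
    if delta_v >= -tol:
        return "in_range"
    if delta_v >= -much_slower:
        return "slower"
    return "much_slower"


def discretize(s_rel_goal, l_ego, v_ego, v_max_set, s_step=50.0):
    """Map continuous features to a hashable discretized state S."""
    return (int(s_rel_goal // s_step), l_ego, velocity_class(v_ego - v_max_set))


def count_transitions(samples, s_step=50.0):
    """Count how often the discretized state changes between consecutive time steps."""
    states = [discretize(*sample, s_step=s_step) for sample in samples]
    return sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)


# Four consecutive time steps: (s_rel_goal, l_ego, v_ego, v_max_set).
samples = [
    (120.0, 1, 30.0, 30.0),
    (110.0, 1, 30.0, 30.0),   # same discretized state as before
    (95.0, 1, 30.0, 30.0),    # distance bin changes
    (90.0, 0, 25.0, 30.0),    # lane and velocity class change
]
n_transitions = count_transitions(samples)
```

In this example the four time steps produce only two transitions, which illustrates why several immediate states $s_i$ can map to the same discretized state $S$.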

6.1.3 Reward Function and Return

A challenge of the driving task is the simultaneous consideration of short- and long-term goals. On the one hand, the whole drive needs to be comfortable and safe; on the other hand, it must be ensured that the final destination is reached. Both kinds of objectives need to be included to balance the conservative side with the practical one. Thus, staying behind much slower vehicles should be avoided, the exit ramp must be reached, and the selection of a faster lane, where the traffic is driving faster than the ego vehicle is allowed to, should be avoided.

The reward is defined as a composition of long- and short-term metrics. As presented in Chapter 3, the cost value is defined as a combination of comfort, safety and benefit. The reward can be computed as the negative of the cost. At each time step $i$, an immediate (negative) reward $r_i$ based on comfort, safety, lane adequacy and success metrics is computed:

• Comfort cost ($c_{comf_i}$): Evaluation of the vehicle dynamics. As presented in Chapter 5, it is defined as a combination of the longitudinal and lateral acceleration values:

$$c_{comf_i} = w_1 \cdot a_{ego} + w_2 \cdot a_{lat_{ego}} \tag{6.1}$$

• Safety cost ($c_{safe_i}$): Evaluation of the safety of the ego vehicle in relation to the surrounding vehicles. It is defined as the safety ratio based on the Time to Collision (TTC) and the Time Headway (THW) between vehicles (see Chapter 2).

• Lane cost ($c_{l_i}$): Evaluation of the lane adequacy. When the desired lane change is not an external input, the benefit of driving on each lane has to be assessed. The lane cost is defined as a combination of the lateral lane adequacy, or deviation from the right-most lane, and the longitudinal lane adequacy, or deviation from the maximal velocity.

The cost of the right-most lane rule ($c_{rml_i}$) is defined as the ratio between the current absolute lane and the total number of lanes of the current section:

$$c_{rml_i} = \frac{l_{Goal}}{\text{total number of lanes}}$$

The cost of the current lane velocity ($c_{lv_i}$) assesses the deviation between the velocity at a given lane ($v_{lane}$) and the maximal allowed velocity ($v_{maxSet}$). In this chapter, and based on the results of Chapter 5, four clusters of relative velocity are defined (much faster, maximal velocity, slower and much slower). The weight is assigned proportionally, as presented in Figure 6.3:

$$c_{lv_i} = w_i \cdot \frac{|\Delta v|}{v_{maxSet}}$$

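[Plot: weight over velocity $v$. Horizontal axis: velocity thresholds $v_1$ to $v_6$ delimiting the regions Much slower, Slower, Similar and Faster. Vertical axis: weight levels $w_{sm}$, $w_s$, $w_0$ and $w_f$.]

Figure 6.3: Weight coefficient values for the current lane velocity cost.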

• Episode success ($c_{success}$): The terminal state is penalized if it is not successful:

$$c_{success} = \begin{cases} P_{fail}, & (i = T) \wedge (l_{ego_i} \neq l_{Goal}) \\ 0, & \text{otherwise.} \end{cases}$$
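The cost terms listed above can be combined into an immediate reward as in the following sketch. Several choices here are assumptions made only for illustration: the individual costs are simply summed, acceleration magnitudes are used so that braking also counts as discomfort, the right-most lane cost uses the current absolute lane as in the textual definition, the safety cost of Chapter 2 is passed in as a precomputed number, and all weights and the penalty `p_fail` are hypothetical values.

```python
def comfort_cost(a_long, a_lat, w1=1.0, w2=1.0):
    """Weighted combination of longitudinal and lateral acceleration magnitudes."""
    return w1 * abs(a_long) + w2 * abs(a_lat)


def lane_cost(lane, n_lanes, v_lane, v_max_set, w_velocity):
    """Right-most lane cost plus weighted relative velocity deviation of the lane."""
    c_rml = lane / n_lanes
    c_lv = w_velocity * abs(v_lane - v_max_set) / v_max_set
    return c_rml + c_lv


def success_cost(is_terminal, l_ego, l_goal, p_fail=100.0):
    """Penalty applied only at a terminal state that is not on the goal lane."""
    return p_fail if is_terminal and l_ego != l_goal else 0.0


def immediate_reward(a_long, a_lat, c_safe, l_ego, l_goal, n_lanes,
                     v_lane, v_max_set, w_velocity, is_terminal):
    """Reward as the negative of the summed costs at one time step."""
    cost = (comfort_cost(a_long, a_lat)
            + c_safe
            + lane_cost(l_ego, n_lanes, v_lane, v_max_set, w_velocity)
            + success_cost(is_terminal, l_ego, l_goal))
    return -cost


r_step = immediate_reward(a_long=-1.0, a_lat=0.5, c_safe=0.2, l_ego=1, l_goal=0,
                          n_lanes=3, v_lane=25.0, v_max_set=30.0,
                          w_velocity=2.0, is_terminal=False)
r_fail = immediate_reward(a_long=-1.0, a_lat=0.5, c_safe=0.2, l_ego=1, l_goal=0,
                          n_lanes=3, v_lane=25.0, v_max_set=30.0,
                          w_velocity=2.0, is_terminal=True)
```

The only difference between `r_step` and `r_fail` is the failure penalty, which is applied once, at the terminal state.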


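The reward $R_{t+1}$ corresponding to the transition $S_t \to S'_{t+1}$ can be defined as the accumulated instant rewards ($r_i$) while the discretized state $S$ is held:

$$R_{t+1} = \sum_{\tau = t_0}^{\tau < t+1} r_\tau(s) \;\Big|\; S(\tau) = S_t,\ \forall \tau \in [t, t+1) \tag{6.2}$$

[Diagram: the immediate states $s_{\tau_0}, s_{\tau_1}, s_{\tau_2}, \dots, s_{\tau_n}$ belong to the discretized state $S_t$ and the immediate states $s'_{\tau_0}, \dots, s'_{\tau_m}$ to $S'_{t+1}$; the instant rewards $r_{\tau_1}, r_{\tau_2}, \dots, r_{\tau_{n+1}}$ are accumulated into $R_{t+1}$.]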

When the vehicle is far away from its exit ramp, the influence of the terminal state should be lower. With the inclusion of a discount factor $\gamma$, the values of states near the exit position are influenced more strongly by the episode's success than the states located further away. The return $G_{t_k}$ is defined as the cumulative discounted reward following $t_k$, with $T$ the index of the terminal state:

$$G_{t_k} = \sum_{j=t_k}^{T} \gamma^{j-k} R_j \tag{6.3}$$

The return from $t_0$, $G_{t_0}$, can be subdivided into two parts: the accumulated reward until $t_k$, $G_{t_0:t_k}$, and the discounted return following $t_k$, $G_{t_k}$. The return $G_{t_0:t_k}$ corresponds to the cumulative discounted reward from $t_0$ until $t_k$, i.e. the expected reward between $S_0$ and $S_k$, and the return $G_{t_k}$ corresponds to the expected reward from $S_k$ onwards:

$$G_{t_0} = G_{t_0:t_k} + \gamma^k G_{t_k} = \sum_{j=t_0}^{t_k} \gamma^j R_j + \gamma^k \sum_{m=t_k}^{T} \gamma^{m-k} R_m \tag{6.4}$$
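The decomposition of the return can be checked numerically. The sketch below assumes $t_0 = 0$ and an index convention in which the first part stops just before index $k$, so that no reward is counted twice; the reward values and the function names are hypothetical.

```python
def discounted_return(rewards, gamma, start=0):
    """Cumulative discounted reward from index `start` to the end of the episode."""
    return sum(gamma ** (j - start) * r
               for j, r in enumerate(rewards) if j >= start)


def partial_return(rewards, gamma, k):
    """Discounted reward accumulated from index 0 up to, but excluding, index k."""
    return sum(gamma ** j * r for j, r in enumerate(rewards[:k]))


# Hypothetical transition rewards; the last one carries a failure penalty.
rewards = [-1.0, -2.0, -0.5, -10.0]
gamma, k = 0.9, 2

g_total = discounted_return(rewards, gamma)
g_split = partial_return(rewards, gamma, k) + gamma ** k * discounted_return(rewards, gamma, start=k)
```

Both `g_total` and `g_split` evaluate to the same value, which shows how a short-horizon part and a discounted tail can be estimated separately and then combined.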

6.1.4 Action-State Value Updates

The goal of the agent is to select the action that optimizes the expected return, or sum of discounted rewards. Value functions are functions of states or state-action pairs that estimate how good it is for the agent to be in a given state, or how good it is for the agent to perform a given action in a given state, defined in terms of expected return [150]. The rewards depend on which action the agent takes. The mapping from states to the probability of selecting each possible action is the policy $\pi$. $v_\pi$ is the state-value function for a policy $\pi$, the value of a state $s$ under a policy $\pi$ being the expected return starting at $s$ and following $\pi$:

$$v_\pi(s) \doteq E_\pi[G_t \mid S_t = s] \tag{6.5}$$

The value of taking action $\alpha$ in state $s$ under a policy $\pi$, denoted $q_\pi(s, \alpha)$, is defined as the expected return starting from $s$, taking the action $\alpha$ and following the policy $\pi$ afterwards:

$$q_\pi(s, \alpha) \doteq E_\pi[G_t \mid S_t = s, A_t = \alpha] \tag{6.6}$$

This work focuses on the state-action pair values, and the aim is to learn the state-action value function for the selected policy $\pi$. To estimate the state-action value $q_\pi$ from experience, an average of the observed returns that followed a taken action can be kept for each state-action pair; when the number of times that a state-action pair is visited tends to infinity, the averages converge to the action values $q_\pi(s, \alpha)$. Instead of keeping a separate average for each state-action pair, it is also possible to approximate $q_\pi$ as a parametrized function and to adjust its parameters in order to match the observed values. Within this work, a tabular approach with a discretized state representation is followed and separate values for each state-action pair $q_\pi(s, \alpha)$ are kept. The definition of a parametrized function approximator is out of the scope of this thesis.

Monte Carlo estimations for the state-action values:
One way to estimate $q_\pi(s, \alpha)$ is to average the complete observed return that followed a state-action pair until the end of the episode, i.e. until the terminal state $S_T$ is reached. This method is called Monte Carlo because it averages over random samples of actual returns.

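[Backup diagram: $(S_t, \alpha_t) \to S_{t+1}$ with reward $R_{t+1}$, $(S_{t+1}, \alpha_{t+1}) \to S_{t+2}$ with reward $R_{t+2}$, ..., $(S_{T-1}, \alpha_{T-1}) \to S_T$ with reward $R_T$.]

Figure 6.4: Monte-Carlo backup diagram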

In this way, for Monte Carlo state-action value updates, each state estimate does not depend on the estimates of the other states. Monte Carlo state-action value updates are presented in Algorithm 3. As this method averages the returns of an episode once it is finished, no prior knowledge of the environment or its dynamics is required.

Algorithm 3 Monte Carlo Updates for State-Action Values
When the episode finishes:
for each pair (s, α) appearing in the episode do
    G ← the return that follows the first occurrence of (s, α)
    Append G to Returns(s, α)
    Q(s, α) ← average(Returns(s, α))
end for

Monte Carlo methods allow learning directly from former experiences without a model of the environment, but in order to update the estimate they have to wait for the episode to finish. Temporal-Difference learning methods update their estimates based on previous estimates and on the current experience without waiting for the final outcome, i.e. they bootstrap.
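The Monte Carlo update of Algorithm 3 can be written compactly in Python. This is a sketch under assumptions: an episode is represented as a list of `(state, action, reward)` tuples where the reward is the one received after taking the action, states and actions are plain strings, and the function name `monte_carlo_update` is hypothetical.

```python
from collections import defaultdict


def monte_carlo_update(episode, returns, q, gamma=1.0):
    """First-visit Monte Carlo update of the tabular state-action values q."""
    g = 0.0
    first_return = {}
    # Walk backwards so the running sum g is the discounted return from each step.
    for state, action, reward in reversed(episode):
        g = reward + gamma * g
        # Earlier occurrences overwrite later ones, so the first visit remains.
        first_return[(state, action)] = g
    for pair, g_first in first_return.items():
        returns[pair].append(g_first)
        q[pair] = sum(returns[pair]) / len(returns[pair])


returns = defaultdict(list)
q = {}
episode_1 = [("s0", "KL", -1.0), ("s1", "CLR", -2.0), ("s0", "KL", -3.0)]
monte_carlo_update(episode_1, returns, q, gamma=0.5)
q_after_first = dict(q)
episode_2 = [("s0", "KL", -1.0)]
monte_carlo_update(episode_2, returns, q, gamma=0.5)
```

After the second episode, the value of `("s0", "KL")` is the average of the two first-visit returns observed for that pair, while the value of `("s1", "CLR")` is unchanged.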

Dyn-n-steps estimations for the state-action values:
Sutton [150] presented n-step bootstrapping, which allows a fast update of the action values along relevant time intervals. This concept is used here keeping n dynamic.

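[Backup diagram: $(S_k, \alpha_k) \to S_{k+1}$ with reward $R_{k+1}$, $(S_{k+1}, \alpha_{k+1}) \to S_{k+2}$ with reward $R_{k+2}$, ..., $(S_{t-1}, \alpha_{t-1}) \to S_t$ with reward $R_{t-1}$, ending at $(S_t, \alpha_t)$, with $\alpha_k = \alpha_{k+1} = \dots = \alpha_{t-1}$ and $t_t - t_k \leq t_{updateMax}$.]

Figure 6.5: Dyn-n-updates backup diagram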

In this way, updates are made after every change of the selected action or after a defined maximum update time interval ($t_{updateMax}$), as shown in Algorithm 4. Dyn-n-steps is a bootstrapping method where the estimate for one state depends on the estimates of the following states. Through this, the aim is to better reflect the rewards coming from significant local changes. The main difference between the dyn-n-updates and the Monte Carlo updates used in this work is that the former update the return values for the state-action pairs locally, whereas with the Monte Carlo approach the return values are updated at the end of the episode.
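A dynamic n-step update in the spirit of Algorithm 4 can be sketched as follows. The sketch is a simplified reading, not the thesis implementation: the transitions collected since the last update are kept in a list `segment` of `(state, action, reward)` tuples, the update is triggered by the hypothetical helper `should_update`, and the value of the pair reached at the end of the segment is used for bootstrapping unless the episode has terminated.

```python
from collections import defaultdict


def should_update(previous_action, new_action, n_steps, n_max):
    """Trigger an update on an action change or when the maximum interval is reached."""
    return new_action != previous_action or n_steps >= n_max


def flush_segment(q, segment, bootstrap_pair, gamma, beta):
    """Update q for every pair in the segment with its truncated, bootstrapped return.

    segment: list of (state, action, reward) where reward follows the action.
    bootstrap_pair: (state, action) reached after the segment, or None if terminal.
    """
    n = len(segment)
    for k, (state, action, _) in enumerate(segment):
        # Discounted rewards observed from step k to the end of the segment.
        g = sum(gamma ** (i - k) * segment[i][2] for i in range(k, n))
        if bootstrap_pair is not None:
            # Append the current estimate of the pair reached after the segment.
            g += gamma ** (n - k) * q[bootstrap_pair]
        q[(state, action)] += beta * (g - q[(state, action)])


q = defaultdict(float)
q[("s2", "CLL")] = -4.0
segment = [("s0", "KL", -1.0), ("s1", "KL", -2.0)]
if should_update("KL", "CLL", n_steps=len(segment), n_max=10):
    flush_segment(q, segment, bootstrap_pair=("s2", "CLL"), gamma=0.5, beta=0.5)

# Terminal case: no bootstrapping, only the observed reward is used.
flush_segment(q, [("s2", "CLL", -10.0)], bootstrap_pair=None, gamma=0.5, beta=0.5)
```

Compared with the Monte Carlo sketch, the values of the pairs in the segment are corrected as soon as the selected action changes, without waiting for the end of the episode.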

6.2 Policy Selection, Reinforcement Learning with Maneuver Planning

Algorithm 4 Update State-Action Values with dyn-n-steps
if t < T then
    if new action or time horizon exceeded then
        for k = t − n; k < t; k++ do
            G ← Σ_{i=k+1}^{min(k+n, T)} γ^{i−k−1} R_i
            G ← G + γ^{t−k} Q(S_t, α_t)
            Q(S_k, α_k) ← Q(S_k, α_k) + β [G − Q(S_k, α_k)]
        end for
        n ← 0
    else
        n ← n + 1
    end if
else
    for k = t − n; k <= t; k++ do
        G ← Σ_{i=k+1}^{min(t+n, T)} γ^{i−k−1} R_i
        Q(S_k, α_k) ← Q(S_k, α_k) + β [G − Q(S_k, α_k)]
    end for
end if

The lane change decision occurs at the highest level of the planning structure presented in Figure 6.2. One level below, the tactical planner processes the gaps and provides an estimation of each lane change and lane keep maneuver within the sensor range. Each trajectory that achieves a defined gap belongs to the maneuver driving into this gap. Safety requirements are already covered, as the tactical planner selects only among feasible maneuvers, and a safety braking within the visibility range is also guaranteed. The lower planning level ensures that the planner is continuously updated with the sensor information. This structure allows the learning process to be integrated in two different ways:

• Direct decision-making based on Reinforcement Learning. The expected reward of each action is computed based solely on the reward obtained during previous experiences. In this chapter, two different tabular RL methods for the estimation of the action values are analyzed: ε-soft policies with Monte Carlo and dynamic-n-step updates.

• Integrated planning and learning approach. The integration of the RL prediction within the planning structure follows the idea of Dyna [149] and MCTS [22]. The estimation of the state-action values for the policy selection is a combination of the information provided by the tactical planner in the short horizon (model-based information) with the estimated state-action value obtained with RL. The tactical planner provides an estimation for every available action, i.e. for lane changes and for keeping the lane, within a defined time horizon. Then, the expected value for the resulting state is appended. The method studied in this chapter also obtains the estimated reward with tabular methods.
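The integrated approach can be illustrated by combining a short-horizon return from the planner with a learned value at the predicted state. The following sketch rests on several assumptions: the planner output is reduced to a list of predicted rewards and a predicted discretized state per maneuver, the learned value of that state is taken as the maximum over its tabulated action values, and the names `combined_value` and `select_maneuver` are hypothetical.

```python
def combined_value(planned_rewards, predicted_state, action_values, gamma):
    """Planner return over the predicted horizon plus the discounted learned value."""
    k = len(planned_rewards)
    g_plan = sum(gamma ** j * r for j, r in enumerate(planned_rewards))
    # Learned value of the predicted state; 0.0 if the state was never visited.
    tail = max(action_values.get(predicted_state, {}).values(), default=0.0)
    return g_plan + gamma ** k * tail


def select_maneuver(maneuvers, action_values, gamma):
    """Pick the maneuver with the highest combined estimate."""
    return max(
        maneuvers,
        key=lambda name: combined_value(*maneuvers[name], action_values, gamma),
    )


# Hypothetical planner output: maneuver -> (predicted rewards, predicted state).
maneuvers = {
    "KL": ([-1.0, -1.0], "sA"),
    "CLR": ([-2.0, -2.0], "sB"),
}
# Hypothetical learned table: state -> {action: value}.
action_values = {
    "sA": {"KL": -10.0, "CLR": -12.0},
    "sB": {"KL": -1.0},
}
best = select_maneuver(maneuvers, action_values, gamma=0.5)
```

In this example the lane change is more costly within the planning horizon, but it is selected because the learned value of the state it leads to is much better.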


6.2.1 Decision-Making based on Reinforcement Learning

The first method corresponds to the classical planning cascade, where the lane change decision is taken on the highest level and the lower levels provide different options to execute it.

The algorithm learns and updates the state-action values for each state-action pair visited during each episode. Any standard RL algorithm can hence be integrated for the lane selection. The presented experiments use Monte Carlo and dynamic n-step updates for the state-action values and follow ε-soft policies to select the next action.
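As a minimal sketch of the Monte Carlo variant mentioned above, the following snippet updates a tabular state-action value function after one finished episode. The state encoding (lane, discretized distance to the exit), the action labels and the function name `mc_update` are illustrative assumptions and are not taken from the implementation used in the experiments.

```python
from collections import defaultdict


def mc_update(q, visits, episode, gamma=0.9):
    """Every-visit Monte Carlo update of a tabular Q after one episode.

    episode: list of (state, action, reward) tuples, where reward is the
    reward received after taking `action` in `state`.
    """
    g = 0.0
    # Walk backwards so that the return G can be accumulated incrementally.
    for state, action, reward in reversed(episode):
        g = reward + gamma * g
        key = (state, action)
        visits[key] += 1
        # A step size of 1/n turns the update into a running sample average.
        q[key] += (g - q[key]) / visits[key]
    return q


q = defaultdict(float)
visits = defaultdict(int)
# Hypothetical states: (lane, discretized distance to the exit).
episode = [((2, 3), "KL", -10.0), ((2, 2), "CLR", -20.0), ((1, 1), "KL", -5.0)]
mc_update(q, visits, episode)
```

Because the values are only updated once the episode has ended, this variant needs no bootstrapping from other table entries.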

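Algorithm 5 Policy Selection - Decision Making based on Reinforcement Learning
each time step:
    limit maximal velocity to current visibility
    $A(s_{t_0}) \leftarrow$ available actions according to road limitations
    $\alpha^{*} \leftarrow \arg\max_{\alpha} [G \mid S_{t_0}, \alpha]$
    For all $\alpha \in A(s)$ select ε-greedy policy with:
        $\pi(\alpha \mid s) \leftarrow 1 - \varepsilon + \varepsilon / |A(s)|$, if $\alpha = \alpha^{*}$
        $\pi(\alpha \mid s) \leftarrow \varepsilon / |A(s)|$, if $\alpha \neq \alpha^{*}$
    tactical planner:
        ▷ identify and assess reachable gaps following $\alpha_{t_0}(s_{t_0})$.
        select $\alpha_{t_0}, gap \leftarrow \arg\max_{gap} E[G_{t_0:t_k} \mid s_{t_0}, \alpha_{t_0}, gap]$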
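The ε-greedy selection step of Algorithm 5 can be sketched as follows. This is only an illustration: the function names and the lane-level value estimates are hypothetical, and the sketch covers only the action probabilities, not the tactical planner.

```python
import random


def epsilon_greedy_probs(action_values, epsilon):
    """Return pi(a|s) for an epsilon-greedy policy over the available actions."""
    n = len(action_values)
    best = max(action_values, key=action_values.get)
    # Every action receives the exploration share epsilon / |A(s)| ...
    probs = {a: epsilon / n for a in action_values}
    # ... and the greedy action additionally receives 1 - epsilon.
    probs[best] += 1.0 - epsilon
    return probs


def sample_action(probs, rng=random):
    """Draw one action according to the given probabilities."""
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]


# Hypothetical estimates for keep lane (KL) and change lane left/right (CLL/CLR).
values = {"KL": -50.0, "CLL": -80.0, "CLR": -65.0}
probs = epsilon_greedy_probs(values, epsilon=0.3)
action = sample_action(probs)
```

Restricting `action_values` to the actions allowed by the road limitations keeps the exploration inside the set of available actions.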

6.2.2 Combined Decision-Making: Planning and Learning

This approach tries to make better use of the available information. The accurate estimation of the cost of each maneuver, already provided by the tactical planner, is used in the further decisions. The highest level then integrates this information and appends the learned value of the state reached after the maneuver completion. In this way, a detailed estimation of the associated reward for each maneuver cluster and a learned estimation of the long-term consequences can be acquired. For the policy selection, the highest planning level processes the topology information, including upcoming velocity and lane change limitations. It receives the information from the tactical planner about the available maneuvers and their estimated cost. At the resulting predicted state of each maneuver, the estimated state value is considered. The resulting state-action values are then updated including all available information, as presented in Algorithm 6. For each time step t, the planner identifies the maneuvers that fulfil the preconditions and estimates the reward related to each one until a new stable state is achieved ($E[G_{t_0:t_k} \mid \alpha, gap]$). A stable state is reached when the lane change is finished or when keep-lane has been held over a short time horizon. Then, the expected value of the reached state, $G_{t_k+1} = \sum_{\alpha} \pi(\alpha \mid S_{t_k}) Q(S_{t_k}, \alpha)$, is added to the planned transition, obtaining a state-action value estimation for each maneuver into the different available gaps. The estimated state-action value for each lane results from the maximum expected value of all available gaps on the lane. The lane policy is selected following an ε-greedy policy and the gap for the selected action is chosen following a greedy policy.

Figure 6.6 shows several steps of one episode from the experiments presented in Section 6.3.1. In this example, the valid gaps and their corresponding values are listed. The gaps which do not fulfil the reachability or gap size requirements are not listed. Note that virtual vehicles are located at the limits of the visibility range. In the first planning step, two different options are available for changing lane to the left, the better of which is the change behind $veh_{138}$, with an expected cumulative reward of -755. For keeping the lane there is only one option, with a return of -582. For changing lane to the right, the best available option is to change between $veh_{131}$ and $veh_{124}$, with an expected return of -620. The greedy policy in this case is to keep the lane. In the second planning step, the lane change to the left is available but misses the exit ramp, leaving only the keep lane strategy as an option.

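Algorithm 6 Policy Selection
each time step:
    limit maximal velocity to current visibility
    $A(s_{t_0}) \leftarrow$ available actions according to road limitations
    tactical planner:
        ▷ identify and assess reachable gaps following $A(s_{t_0})$.
        for all reachable gap ($gap$) with action ($\alpha$) do
            $G \leftarrow E[G_{t_0:t_k} \mid \alpha, gap]$    ▷ forwards simulate transient state.
            $G \leftarrow G + \gamma^{t-k} \sum_{\alpha} \pi(\alpha \mid S_{t_k}) Q(S_{t_k}, \alpha)$    ▷ append expected value.
        end for
    $\alpha^{*} \leftarrow \arg\max_{\alpha, gap} [G \mid S_{t_0}, \alpha, gap]$
    For all $\alpha \in A(s)$ select ε-greedy policy with:
        $\pi(\alpha \mid s) \leftarrow 1 - \varepsilon + \varepsilon / |A(s)|$, if $\alpha = \alpha^{*}$
        $\pi(\alpha \mid s) \leftarrow \varepsilon / |A(s)|$, if $\alpha \neq \alpha^{*}$
    select $\alpha_{t_0}, gap \leftarrow \arg\max_{gap} [G \mid S_{t_0}, \alpha_{t_0}, gap]$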
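The value combination in Algorithm 6 can be illustrated with the numbers of the first planning step in Figure 6.6. The sketch below is not the thesis implementation: the `Candidate` type and the function `combined_values` are hypothetical, and the discount applied to the appended state value is set to 1.0 on the assumption that the two value columns of the figure simply add up to the totals quoted in the text.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    action: str         # "CLL", "KL" or "CLR"
    gap: str            # "[id_rear-id_front]"
    planned: float      # E[G_{t0:tk} | action, gap] from the tactical planner
    state_value: float  # sum_a pi(a|S_tk) Q(S_tk, a) from the learned table


def combined_values(candidates, discount=1.0):
    """Best combined estimate and gap per action (lane-level state-action value)."""
    best = {}
    for c in candidates:
        g = c.planned + discount * c.state_value
        if c.action not in best or g > best[c.action][0]:
            best[c.action] = (g, c.gap)
    return best


# Values of the first planning step in Figure 6.6.
candidates = [
    Candidate("CLL", "[138-128]", -173, -684),
    Candidate("CLL", "[idR-138]", -252, -503),
    Candidate("KL", "[127-idF]", -50, -532),
    Candidate("CLR", "[132-131]", -241, -400),
    Candidate("CLR", "[131-124]", -179, -441),
]
best = combined_values(candidates)
greedy_action = max(best, key=lambda a: best[a][0])
print(best, greedy_action)
```

Taking the maximum over the gaps of each action gives one value per lane, over which the ε-greedy lane policy is then applied; the gap itself is chosen greedily.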

Planning step 1, $S_{t_0}$ $[l_{ego}, v_{ego}, s_{relGoal}]$: [2, 22 m/s, 1820 m]

Selected α   Gap [id_rear-id_front]   Planning estimation   Expected value of S_tk   S_tk [l_ego, v_ego, s_relGoal]
CLL          [138-128]                -173                  -684                     [3, 22 m/s, 1558 m]
CLL          [idR-138]                -252                  -503                     [3, 22 m/s, 1381 m]
KL           [127-idF]                -50                   -532                     [2, 22 m/s, 1726 m]
CLR          [132-131]                -241                  -400                     [1, 20 m/s, 1492 m]
CLR          [131-124]                -179                  -441                     [1, 20 m/s, 1600 m]

Planning step 2, $S_{t_0}$ $[l_{ego}, v_{ego}, s_{relGoal}]$: [1, 21 m/s, 404 m]

Selected α   Gap [id_rear-id_front]   Planning estimation   Expected value of S_tk   S_tk [l_ego, v_ego, s_relGoal]
CLL          [idR-137]                -143                  Exit Ramp Missed         [2, 22 m/s, 212 m]
KL           [127-131]                -143                  -16                      [1, 21 m/s, 320 m]

The planning estimation is $E[G_{t_0:t_k} \mid \alpha, gap]$ and the expected value of $S_{t_k}$ is $\sum_{\alpha} \pi(\alpha \mid S_{t_k}) Q(S_{t_k}, \alpha)$.

Figure 6.6: Example during different planning steps, visibility range (-100 m, +150 m). idR and idF denote the rear and front virtual vehicles, respectively, located at the limits of the visibility range.

6.3 Experiments

6.3.1 Simulation Experiments Setup

A highway section with three main lanes is considered, for which 800 different traffic configurations are generated. At the beginning of the highway segment there is an entrance ramp ending after 750 m. An exit ramp begins after 3.2 km and has a length of 300 m. Three different variations of the maximal allowed highway velocity are included: unlimited, 100 km/h and 80 km/h. Figure 6.7 shows the mean and the 95% range (mean ± 2 standard deviations (2SD)) of velocity and traffic density of the experiments. The simulation runs in Pelops [28], and the other traffic participants follow the TRM model. Each vehicle gets a driver profile with individual desired velocity, safety needs and estimation capacity, and makes its lane and velocity decisions independently.

A broken-down vehicle stands on the right lane in 1% of the scenarios. At the beginning of each scenario, the ego vehicle is situated at 250 m on the entrance ramp with an initial velocity of 70 km/h. The discount factor γ was set to 0.9 and the step size (β in Algorithm 4) was set to 1/n, with n the number of state visits, which corresponds to the sample-average method.
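Why a step size of 1/n corresponds to a sample average can be seen from the incremental update. The notation below is an assumption following common RL usage: $G_i$ is the i-th return observed for a state-action pair and $Q_n$ is the estimate after n observed returns.

```latex
\begin{align}
Q_n &= Q_{n-1} + \frac{1}{n}\left(G_n - Q_{n-1}\right)
     = \frac{n-1}{n}\,Q_{n-1} + \frac{1}{n}\,G_n \\
\text{if } Q_{n-1} &= \frac{1}{n-1}\sum_{i=1}^{n-1} G_i
\quad\Rightarrow\quad
Q_n = \frac{1}{n}\sum_{i=1}^{n-1} G_i + \frac{1}{n}\,G_n
    = \frac{1}{n}\sum_{i=1}^{n} G_i
\end{align}
```

For n = 1 the update gives $Q_1 = G_1$ regardless of the initial value, so the estimate equals the arithmetic mean of all returns observed so far.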

Figure 6.7: Average Traffic Velocity and Density. (Plot of the mean velocity v [km/h] and the mean ± 2SD band over the traffic density [vehicles/km].)

6.3.2 Simulation Experiments Results

The four methods detailed in Section 6.1 are compared against a rule-based strategy: always driving on the N-lane and changing to the right lane by (0.5 km * N-lane) for the exit, N being the middle lane. Figures 6.8a and 6.8b show the learning process over all the episodes. The success rates and the reward values begin to stabilize around the 300th episode; the success rate then continues to improve asymptotically. Table 6.2 presents the average metrics comparison for the last 300 episodes. The comfort cost of the baseline method (37.52) is lower than that of the learned methods: as the number of lane changes is limited, the cost derived from changes in the lateral jerk is lower. However, since the success rate of the baseline is also lower than for the MC and the Mpl-RL methods, the average episode reward is better for these two learning methods. MC and the combined method thus present better results than the baseline on the average episode reward and success rate. Furthermore, the average episode reward of the combined method outperforms the MC approach, since accurately planning the lane changes before selecting the strategy increases the episode reward for a similar success rate.

Figure 6.8: Simulation Results. (a) Success rate over the number of episodes. (b) Average reward over the number of episodes.

Table 6.2: Metrics comparison for simulation results over different approaches.

Method                  Success Rate   Safety Cost   Comfort Cost   Episode Reward
Baseline                0.73           88.15         37.52          -4362.77
MC (ε = 0.3)            0.78           127.98        48.97          -3947.54
dyn-n-steps (ε = 0.3)   0.69           76.69         55.54          -4703.35
Mpl-RL (ε = 0.3)        0.78           77.58         55.51          -3751.10

6.3.3 Discussion

Learning based on experiences presents several advantages over a predefined rule-based strategy. For example, stationary vehicles or obstacles impeding the traffic flow on one lane can be easily avoided without including extra rules. On the other hand, the learning process takes some time to stabilize.

Not all the methods are adequate for our scenario. The exploration rate ε is 0.3 and the set of available maneuvers has three elements; therefore, the maximum expected success rate for the ε-greedy policy is 0.8. Both MC and the combined method reach this value, but this is not the case for the dynamic-n-step approach. The selected bootstrapping method presents some disadvantages compared with a back-propagation after the end of the episode. Due to the characteristics of a drive towards an exit ramp and the discretization of the distance, no state can be revisited during the same episode, and the advantages of temporal difference learning over Monte Carlo (avoiding the need to wait until the end of the episode to update the expected rewards) do not have any influence. The idea of the dynamic-n-step was to obtain more information about the local effects of each action change. Here, this problem is solved by merging the lane decision-making with the tactical planning information. In this way, we include precise information about the near environment and the possible scene evolution without jeopardizing the long-term goal.
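The value of 0.8 quoted in this discussion follows from the probability with which an ε-greedy policy selects its greedy action $\alpha^{*}$. Reading this probability as an upper bound on the success rate assumes that only the greedy action leads to a successful step; this is an interpretation and is not stated explicitly in the text.

```latex
\pi(\alpha^{*} \mid s)
  = 1 - \varepsilon + \frac{\varepsilon}{|A(s)|}
  = 1 - 0.3 + \frac{0.3}{3}
  = 0.8
```

Each of the two remaining maneuvers is selected with probability $\varepsilon / |A(s)| = 0.1$.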

6.4 Related Work

Chen et al. [25] identified two different approaches for the decision-making in autonomous driving: mediated perception approaches, where the sensor information is fused into a consistent world representation on which the decision-making is then based, and behavior reflex approaches, which map the sensor input directly to a driving action. They proposed a third approach, the direct perception approach, where a set of compact descriptions of the scene enables a simple controller to drive autonomously. Most of the classical decision-making approaches belong to the first group; in particular, the decision-making in autonomous driving has classically been approached as a planning problem. Galceran et al. [52] presented a multi-policy decision-making through an approximation of a Partially Observable Markov Decision Process (POMDP), which allows high-level behavioral decisions to be taken. Aeberhard et al. [1] proposed a hybrid automaton that combines discrete decisions over a finite set of lateral and longitudinal states with a lower level trajectory planning.

Recent results integrate learning methods into the decision-making, providing a promising way to improve the adaptation and personalization of the driving task in autonomous driving. Menéndez-Romero et al. [102] improved the gap assessment during a lane change with the observed probability of a lane change into a gap. Vallon et al. [163] showed how to learn the lane change initiation based on Support Vector Machines (SVM). Learning by Demonstration approaches use Inverse RL to imitate human behavior. Kuderer et al. [84] presented the use of Inverse RL to individualize the velocity and acceleration behaviors, and Sharifzadeh et al. [138] proposed the use of a Deep Q-Network as a refinement step in Inverse RL approaches.

Hester et al. [66] showed how to learn to control the velocity of an autonomous vehicle using a model-based framework by learning random forest models. The work of Google DeepMind [106] presented the Deep Q-Network (DQN), a model-free end-to-end reinforcement learning approach using deep neural networks, which allows policies to be learned directly from high-dimensional sensory inputs. Yu et al. [176] investigated the use of Deep Q-Learning to learn to control a simulated car. In a similar direction, Mukadam et al. [110] presented a Deep Learning approach for the decision-making and introduced Q-Masking in their learning process in order to restrict the decision according to the current traffic restrictions. Q-Masking presents a possible solution to the safety and integrity problem when reinforcement learning is used.

An autonomous vehicle has to guarantee safety during the whole drive. Many approaches rely on lower level controllers that optimize a collision-free trajectory within a short horizon. In this direction, Schwarting et al. [135] proposed a framework for city scenarios where, under defined constraints, the MPC approach is able to guarantee safety. Pek et al. [117] presented a formal lane change maneuver verification based on traffic rules. One problem with top-down architectures arises when the lower level fails to find a solution: the higher level then has to propose another alternative, losing valuable time in critical situations. Therefore, the safety verification needs to be integrated in the planning decision.

Different proposals to integrate domain knowledge into the planning have been studied. Dornhege et al. [36] integrated external reasoning processes into motion planning, so that domain-independent planners and domain-specific sub-solvers could be combined where necessary. Sridharan et al. [145] proposed an architecture to integrate previous knowledge and to identify and integrate new axioms when unexpected outcomes are experienced. A hybrid model combining learning and planning was proposed by Wu et al. [172] for visual indoor-navigation tasks.

An autonomous system should be able to integrate former experiences, to plan in anticipation and to react safely in unexpected dangerous situations. AlphaZero [142] used Monte Carlo Tree Search (MCTS) to evaluate the evolution of the scenes. Sutton [148] presented the Dyna architecture, which integrates learning, planning and reaction. Thrun [155] presented a method to learn how to act in POMDPs, representing all belief distributions by samples and using Monte Carlo sampling to perform the backups. Lu et al. [96] integrated logical-probabilistic representation and reasoning using POMDPs with model-based reinforcement learning for a service robot accomplishing delivery tasks. Leonetti et al. [91] presented how a service robot can take advantage of planning and reinforcement learning to constrain the behavior of the agent to reasonable choices, to adapt itself to the environment and to increase its reliability. Yang et al. [175] presented a combination of reinforcement learning and symbolic planning to allow rapid policy search and robust symbolic plans in complex domains.

In a similar way, this chapter presents a method to restrict the learning process to the available actions, coming from a consistent world representation. The advantages of a planning approach and the advantages of learning from previous experiences are integrated together. The use of local world models within our sensor range allows us to obtain accurate local descriptions, while the use of reinforcement learning in our planning structure allows a better estimation of the long-term effects. Thus, the automated vehicle is able to optimize not only the short term but the whole drive.

6.5 Conclusions

This chapter presents a novel method that binds together the advantages of learning and planning without losing the reactive component of the system. The lower planning layers provide maneuvers that fulfil the safe driving guarantees, while the higher levels select among the available maneuvers those that optimize the long-term objectives. The integration of planning and learning allows the detailed scene evolution and maneuver estimation provided by the tactical planner to be combined with experience gained from former interactions. Thus, both the short- and long-term goals are optimized while maintaining the safety constraints. Simulation results confirm that the direct integration of classic Reinforcement Learning methods can improve the results of rule-based methods. The combined planning and learning system outperforms both the rule-based lane selection and the direct decision-making based on reinforcement learning. The combined planning and learning system also showed slightly better results than the hierarchical system, as the inclusion of the tactical planner results within the lane selection helps to avoid local optima. This system is applicable to automation levels ranging from assistance systems to highly automated driving applications. Future work should address experiments in real driving situations.

Chapter 7

Conclusion

This thesis investigates automated driving for highway scenarios. The aim is to enable the automated vehicle to plan and execute the maneuvers needed to drive on a highway, from the entrance to the exit ramp, without a human supervising the system and with safety constraints integrated into the planning. The proposed system is able to integrate the information from the surrounding environment as well as to learn from previous experiences in order to optimize the drive in the mid and long term. The parameters used in the objective function are analyzed within a user study in a dynamic driver simulator.

As a first step, the planning domain and the planning problem were defined. A combination of semantical and numerical planning was proposed in order to integrate the infinite possible numerical combinations into a finite description. Highways are highly structured environments; therefore, one- and two-dimensional abstractions of the topology, especially of its lanes, allow different planning abstractions working on different time horizons to be combined. The planner combines an action feasibility check (preconditions) with a detailed assessment of the actions in the mid horizon (tactical planner) and a lane selection strategy. The precondition module includes the safety constraints within the perception time horizon. Based on the current situation, the tactical planner assesses the available actions that are compliant with the safety constraints. The lane selection layer evaluates the utility of each lane and optimizes the short- and long-term objectives considering the information provided by the tactical planner. Finally, a 2D trajectory planner provides the jerk-optimized trajectory corresponding to the selected action, which would be executed by the actuators.

The topology and its lane abstractions allow a first classification of the possible actions: KeepLane and ChangeLane. Those abstractions also allow the safety constraints to be referred to the corresponding lanes, as presented in Chapter 2. Then, the safety constraints were modelled as preconditions for the different actions. The maximal velocity is defined by the limitation of the sensor perception, the maximal current visibility and the current acceleration and deceleration limits given by the road conditions: the autonomous vehicle has to be able to perform an emergency braking and stop if a static object blocks its lane. The information about static obstacles and dynamic agents was included with the concept of gaps, defining a gap as the available space between two obstacles or agents in a lane. The main precondition for the automated vehicle is that it always has to maintain a safety distance to the agent or object in front of it and, if no object or agent is currently located in its field of view, it still must be able to react to a new obstacle at its perception limits and to perform an emergency braking if needed. Once this condition is fulfilled, the planner is allowed to select other actions such as lane changes. The precondition for a lane change is related to the goal gap: the available distance should be big enough to allow the ego vehicle to merge into it, allowing all the involved agents to adapt their drive without leaving the comfort limits.
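The velocity limit implied by the emergency braking requirement can be sketched with a simple stopping-distance model. This is an illustrative assumption, not the precondition model of Chapter 2: constant deceleration after a fixed reaction delay, with hypothetical parameter values (only the +150 m visibility range is taken from Figure 6.6).

```python
import math


def max_safe_velocity(visibility_m, max_decel_mps2, reaction_time_s=0.0):
    """Largest velocity [m/s] from which the vehicle can stop within visibility_m.

    Solves v * t_r + v**2 / (2 * a) = d for v (constant deceleration a after a
    reaction delay t_r). All three parameters are illustrative assumptions.
    """
    if visibility_m <= 0 or max_decel_mps2 <= 0:
        return 0.0
    a, t_r, d = max_decel_mps2, reaction_time_s, visibility_m
    return a * (-t_r + math.sqrt(t_r ** 2 + 2.0 * d / a))


def stopping_distance(v_mps, max_decel_mps2, reaction_time_s=0.0):
    """Distance travelled until standstill when braking from v_mps."""
    return v_mps * reaction_time_s + v_mps ** 2 / (2.0 * max_decel_mps2)


# Hypothetical deceleration limit (6 m/s^2) and reaction delay (0.5 s).
v_max = max_safe_velocity(visibility_m=150.0, max_decel_mps2=6.0, reaction_time_s=0.5)
```

A lower available deceleration, for example on a wet road, or a shorter visibility range directly lowers the admissible velocity in this model.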

The tactical planner introduced in Chapter 3 allows the vehicle to plan and execute the required actions in the mid-term horizon. The autonomous vehicle evaluates the different reachable gaps and assesses the most suitable maneuver to change or keep the lane. A gap is defined as reachable if a combination of the maximal and minimal comfort acceleration can be computed that would position the vehicle between the front and the rear gap limits within a defined time horizon, and if this combination would allow the maneuver to be completed safely (safety distances are kept and the rear vehicle is never forced to brake abruptly). Once the reachable gaps are identified, the scene of the maneuver where the ego vehicle reaches each gap is forward simulated with the most likely behavior of the other traffic participants, including the calculation of the estimated cost for each maneuver. This maneuver assessment can be enhanced with a term that evaluates the success probability of a lane change into a gap based on former situations observed by the vehicle. In this way, uncertainties in the behavior of other traffic participants can be integrated.
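A strongly simplified version of the reachability test can be sketched as an interval overlap check. The sketch rests on assumptions that go beyond the text: the ego vehicle uses one constant acceleration within the comfort limits, the surrounding vehicles keep a constant velocity, and a single safety margin replaces the safety distance model. It does not check whether the rear vehicle would be forced to brake abruptly, and all names are hypothetical.

```python
def reachable_positions(s0, v0, a_min, a_max, horizon):
    """Interval of ego positions reachable at `horizon` with constant acceleration."""
    def position(a):
        # If braking would stop the vehicle before the horizon, it stays stopped.
        if a < 0 and v0 + a * horizon < 0:
            return s0 + v0 ** 2 / (-2.0 * a)
        return s0 + v0 * horizon + 0.5 * a * horizon ** 2
    return position(a_min), position(a_max)


def gap_is_reachable(ego, rear, front, a_min, a_max, horizon, margin):
    """Check whether the ego interval overlaps the predicted gap interval.

    ego, rear, front: (position [m], velocity [m/s]); rear and front vehicles
    are predicted with constant velocity, `margin` is a safety distance.
    """
    lo, hi = reachable_positions(ego[0], ego[1], a_min, a_max, horizon)
    gap_lo = rear[0] + rear[1] * horizon + margin
    gap_hi = front[0] + front[1] * horizon - margin
    return gap_lo <= gap_hi and lo <= gap_hi and hi >= gap_lo


# Hypothetical scene: ego at 0 m with 22 m/s, comfort limits -2 and 1.5 m/s^2.
near_gap = gap_is_reachable((0.0, 22.0), (-30.0, 20.0), (20.0, 20.0), -2.0, 1.5, 5.0, 10.0)
far_gap = gap_is_reachable((0.0, 22.0), (100.0, 30.0), (150.0, 30.0), -2.0, 1.5, 5.0, 10.0)
```

In this example the first gap overlaps the reachable interval of the ego vehicle, whereas the second gap moves away faster than the ego vehicle can follow within the horizon.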

The planner optimizes the objective function, but in order to increase the acceptance of automated vehicles and contribute to fluent traffic, the ego vehicle should be able to recognize those situations in which a slight decrease of its own individual reward could improve the overall traffic flow. This ability to provide courtesy behavior is explored in Chapter 4. The ego vehicle strategies are enhanced with courtesy actions in order to facilitate the merging maneuver of a potential merging vehicle. The keepLane and the changeLane strategies are extended with a keepLane-courtesy maneuver and a changeLane-courtesy maneuver, where the ego vehicle opens a gap to facilitate the merging maneuver of the conflicting vehicle by, respectively, braking or changing the lane immediately. Furthermore, not only the most likely predicted maneuver of the conflicting vehicle is considered but also its opposite. The resulting pairs of action combinations for the ego and the conflicting vehicle are forward simulated in order to obtain the individual utility of the ego and the conflicting vehicle for each scenario. The estimated utility for the conflicting vehicle is weighted with a cooperation coefficient. With this information, the utility of the ego vehicle is computed as the combination of the utility of both variants of the conflicting action, weighted by the probability of each conflicting action. The experiment results show how the introduction of the courtesy behavior in the ego vehicle would improve the average comfort and safety metrics, achieving a behavior that is beneficial not only for the traffic situation but also for the ego vehicle.
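The weighting described above can be sketched as a probability-weighted sum. The additive combination of the two utilities, the function name `ego_strategy_utility` and all numbers are assumptions made for illustration; the exact formulation is given in Chapter 4.

```python
def ego_strategy_utility(outcomes, cooperation=0.5):
    """Expected utility of one ego strategy over the conflicting vehicle's variants.

    outcomes: list of (probability, ego_utility, conflicting_utility), one entry
    per simulated maneuver of the conflicting vehicle.
    """
    return sum(p * (u_ego + cooperation * u_other) for p, u_ego, u_other in outcomes)


# Hypothetical simulation results: the conflicting vehicle merges with
# probability 0.7 and stays on its lane with probability 0.3.
strategies = {
    "keepLane": [(0.7, -40.0, -90.0), (0.3, -10.0, -20.0)],
    "keepLane-courtesy": [(0.7, -30.0, -20.0), (0.3, -25.0, -20.0)],
}
utilities = {name: ego_strategy_utility(o) for name, o in strategies.items()}
best_strategy = max(utilities, key=utilities.get)
```

With a cooperation coefficient of zero the selection reduces to a purely egoistic comparison of the ego utilities.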

The objective function is usually defined in terms of comfort and safety, using measurable, non-subjective parameters such as the acceleration, the time-to-collision and the inter-vehicle time gaps. The question that arises is whether these parameters are actually representative of the perception of the vehicle occupants. Chapter 5 explores this question based on the results of a user study on a dynamic driver simulator with 65 participants. The study evaluates the factors that influence the perception of the automated function when driving towards an exit ramp. The study results show an influence of the attentiveness level of the occupant on the perceived drive. An interdependency between the lane selection and the current traffic flow in the perceived adequacy of the automated driving function is also observed. In general, earlier lane changes were better accepted in synchronized traffic flow than in free flow.

The adequate lane selection influences the long-term reward. Indeed, during the study presented in Chapter 5, the lane selection was perceived as a relevant parameter for the perceived adequacy. The balance and consideration of the long-term objectives together with the mid-term objectives and the safety constraints is treated in Chapter 6, where a Reinforcement Learning based system for autonomous driving with a high-level scene understanding is proposed. In particular, the presented method integrates learned policies within a planning system that ensures safe driving: the planner assesses the currently available actions and their mid- and long-term rewards. Firstly, the precondition analysis presented in Chapter 2 provides the available ego actions that comply with the safety constraints. Then, the different actions to continue driving in the current lane and to change into the left and right lanes are forward simulated as presented in Chapters 3 and 4, obtaining accurate estimations of the scene evolution during the short and mid-term. The simulation of the scene evolution can provide good results for the short and mid horizon, but the results are more uncertain for longer time horizons. For this reason, the simulation is computed until a non-transient state is reached within the mid-term. After that, the expected reward of the resulting state is assessed with a tabular Q-value RL method using the information of former experiences. The resulting reward for each available ego action is the combination of the expected reward of the action during the mid horizon, obtained in the simulation, and the expected reward of the resulting state at the end of the simulation, obtained in former episodes. In order to explore new actions, the planner can select the action with an ε-greedy strategy instead of selecting the action with the maximal expected reward. Once the final state is reached, the new values are integrated into the tabular reward matrix. In this way, the decision is optimized based on previous experiences (how adequate it was to be in each lane for a given velocity and traffic conditions in former situations), while a model-based planning allows prior knowledge of the system dynamics to be integrated (accurate estimations of the scene evolution during the short and mid horizon for the different available actions). For the example of lane selection when driving towards an exit ramp, the results show how this adaptive method outperforms a rule-based strategy.

To summarize, this work presents a planning framework for the driving strategy of automated vehicles on highways that integrates safety constraints and long- and mid-term objectives, provides courtesy behavior and allows former experiences to be integrated in order to improve the performance. The different abstraction levels and model assumptions allow a real-time trade-off between computational burden and the exploration of different scene evolutions. The system presented in this work is tested not only in simulations but also in vehicle prototypes driving on the test road and in a dynamic simulator.

7.1 Limitations and Outlook

This work and its results focused on automated driving for highway scenarios. Some simplifications of the real world, like the lane-based description, are assumed to achieve the trade-off between fast planning computation and the assessment of the multiple available options. The planning domain is based on the premise of a structured topology (the lanes) and a reduced set of actions: keepLane and changeLaneRight/Left. This model is expandable to other kinds of scenarios thanks to the semantic-continuous representation. In complex intersections and unstructured sections, other operators such as crossIntersection or enter/exitRoundabout could be provided. The safety guarantees are provided in this work by the preconditions, which only allow those actions to be chosen that are compliant with the defined safety constraints. Other works, like those presented by Pek et al. [116], propose an online verification of the trajectories. The inclusion of the online verification or other safety guards at the end of the planning process could provide a redundant safety proof to the system, gaining robustness against unexpected failures.

This thesis presented a method for a lane change that assumes the existence of an available space or gap. Nevertheless, in many traffic situations the gap does not exist or is not big enough and the vehicle is not able to change lane, especially in situations with dense traffic or traffic jams. In order to solve those situations, the autonomous vehicle should be able to communicate its intention to the other agents so that they open a gap. The work conducted by Kauffmann [73, 75] outlines the importance of transmitting and interpreting the intentions of other drivers in order to prevent negative external perception of the automated vehicle during interactions. Furthermore, Traiber [78] estimated, based on traffic simulations, that the politeness factor allowing lane changes could increase the mean speed by 20 km/h. The method presented in this thesis provides the ego vehicle with courtesy behavior. A good enhancement for dense or urban traffic would be the integration of the interaction strategies proposed by Hubmann et al. [68] and the communication strategies presented by Kauffmann [73, 75]. Introducing politeness into the driving strategy and developing a good interaction strategy with the surrounding traffic will be crucial for the acceptance of automated vehicles in mixed traffic.

The results of the study conducted on the dynamic driving simulator show that user acceptance depends strongly on the traffic flow, but also on the attentiveness level. A significant dispersion of the driving preferences, the adequacy and the comfort perception was observed between the different participants. Individualization and personalization could improve the positive perception of the system. Learning from the driver, as proposed by Kuderer [84], could be technically possible, but this should be treated with care, as not every driver is a good driver. For example, the work of Taieb-Maimon et al. [151] showed that a significant number of drivers maintained a time headway below their braking reaction time. The authors proposed two possible explanations: either people were not accurately aware of their braking abilities when they adjusted their headways to the vehicle in front of them, or they assumed the lead vehicle would not brake or slow down suddenly.

The proposed methods to integrate former experiences with reinforcement learning or neural networks showed an improvement in performance over rule-based strategies. Future work should explore in depth the opportunities opened by these methods. The reinforcement learning approach presented in this work was restricted to tabular learning, but non-discrete methods could perform and generalize better if the right amount of data is provided. Furthermore, the function updates could run offline on a back-end system with many more available resources.

The experiments presented within this work were limited to exemplary use cases and validated in simulations, tests on prototype vehicles and the dynamic simulator. In order to test and validate a customer-ready automated vehicle function, a full verification in real traffic is needed. Today, many customer vehicles are already equipped with advanced environmental sensors and different driving assistance systems, and thus with the ability to collect data. A development driven by this data collection would allow higher levels of automation to be integrated progressively.


The interest in autonomous driving increased during the last decade and generated a lot of expectation. Nevertheless, the availability of products offering higher levels of automation was announced and then postponed by different manufacturers. The 2020 Automotive Consumer Study by Deloitte [31] showed how the interest in autonomous vehicles has stalled in the last years and that a significant percentage of consumers agreed with the idea that autonomous vehicles would not be safe. In my opinion, two aspects are crucial to gain and improve the acceptance of the drivers: they have to comprehend the system, and safety has to be guaranteed. In order to facilitate the comprehension of the system by normal users, the limits of the system and the assumptions made in order to restrict the domain need to be transparent. The interaction with the system also needs to be perceived as familiar. This perception refers not only to the occupants of the vehicle but also to the surrounding traffic participants.

Automated systems represent a major advance for road safety. Constant supervision of the system can decrease incident rates, and the safety systems in the vehicle can go even further. Integrating the environmental information already available for comfort functions and active safety functions could also improve the classical passive safety systems.

List of Figures

1.1 Automation levels according to the Society of Automotive Engineers [128]. Source: SAE International Releases Updated Visual Chart for Its Levels of Driving Automation Standard for Self-Driving Vehicles [127]

2.1 The driving task on highway scenarios is approached as a planning problem and described as a planning instance.

2.2 Exit ramp represented as 2D abstraction (left) and 1D abstraction (right)

2.3 Fork intersection represented as 2D abstraction (left) and 1D abstraction (right)

2.4 Highway section with three lanes where a yellow vehicle (car_yellow), a white vehicle (car_white), a blue vehicle (car_blue) and a red vehicle (car_red) are driving.

2.5 Three-layer structure of the vehicle guidance task proposed by Donges [33].

2.6 Simplified process for sensing, planning and acting for autonomous driving.

2.7 The three different planning layers.

2.8 The maximal available force transmission to the road can be approximated as a friction circle, also known as Kamm's circle [115], where the maximal force transmission is the resultant of the longitudinal and lateral forces and is limited by the available road friction coefficient.

2.9 Lane relevance for other agents during lane change.

2.10 Evolution of the ego vehicle during the Legal Limitations scenario.

2.11 Ego vehicle dynamics during the Legal Limitations scenario.

2.12 Evolution of the ego vehicle during the Sensor Failure scenario.

2.13 Ego vehicle dynamics during the Sensor Failure scenario

2.14 Ego vehicle dynamics during the Preconditions Lane Change scenario

2.15 Evolution of the ego vehicle during the Preconditions Lane Change scenario.

3.1 In entrance ramp situations, the vehicles driving on the entrance lane have to complete a mandatory lane change within a limited space and merge into the traffic flow.


3.2 Tactical Planner workflow

3.3 The ego vehicle (centre lane, black and white) is performing a lane change between the orange and the white vehicle on the left lane. The blue vehicle on the right lane is predicted to make a lane change into the centre lane behind the green vehicle. The ego vehicle's velocity is going to be limited by three vehicles: the white one on the left lane, the green one on the centre lane and the incoming blue one on the right lane

3.4 Gap description and temporal evolution of a lane change of the ego vehicle. The ego vehicle is the black and white one. The front gap of the ego lane is limited by the orange truck in the front. The gap gap_i of the goal lane is defined by the front red vehicle gap_i:front and the rear blue vehicle gap_i:rear.

3.5 Pre-assessment ego vehicle velocity profile based on CV - CA with the maximal available comfort acceleration and deceleration values.

3.6 Longitudinal Jerk, maximal TTC^-1 and Lane Change Duration

3.7 Test Vehicle with dGPS

3.8 Evolution of the merging maneuver of the ego vehicle. A slower vehicle (the red one) is driving in front. The ego vehicle has to complete the lane change before entering the shoulder lane (grey zone).

3.9 Ego vehicle dynamics during the lane change maneuver

4.1 In a merging scenario, incoming vehicles should select the appropriate gap to merge and vehicles in the main flow can cooperate to facilitate the maneuver.

4.2 Some possible scene evolutions when a vehicle driving on the entrance ramp intends to merge into the main lane.

4.3 Populated merging scenario. When optimizing the aggregate traffic utility, the ego vehicle (white-black vehicle driving in the middle lane) has to consider the utilities of the conflicting vehicle (blue one in the merging lane) and the directly affected vehicles, the rear vehicles of the involved lanes: green vehicle on the middle lane, truck on the merging lane and red vehicle on the left lane (due to the courtesy by lane change).

4.4 Simplified environment and vehicle control loop. This chapter focuses on the interaction between prediction and decision-making.

4.5 Initial configuration for simulated experiments

4.6 The experiments of courtesy behavior were conducted with a BMW 7 series with serial sensors and a virtual triggered end of lane. Picture from BMW Communication.

4.7 Metrics for real-world experiments with and without the courtesy behavior strategy. Measured values and re-simulated values


4.8 Experiment results in the real world for an active courtesy strategy: kinematic values of the ego and the conflicting vehicle.

4.9 Experiment results in the real world for an active courtesy strategy: experiment sequence from ego perspective.

4.10 Scene evolution for the courtesy and for the egoistic approaches.

4.11 Evolution of the headway, time to collision, linear velocity and acceleration for the main vehicles involved in the maneuver. For the THW and TTC computation, the end of lane is included as a static vehicle at the end of the merging lane.

4.12 Prediction of the scene evolution depending on the different courtesy ego actions α_ego = {CO-CCL, CO-KL} and the different actions α_cv = {merging-in-front, yield-the-right-of-way} for the conflicting vehicle, at time t_init = 3.2 s

4.13 Prediction of the scene evolution for the non-courtesy keepLane ego action α_ego = {NC-KL} and the different actions α_cv = {merging-in-front, yield-the-right-of-way} for the conflicting vehicle, at time t_init = 3.2 s

4.14 Predicted dynamics of the main involved vehicles depending on the courtesy and non-courtesy ego actions α_ego = {CO-CLL, CO-KL, NC-KL} and the different possible actions α_cv = {merging-in-front, yield-the-right-of-way} for the conflicting vehicle, predicted at time t_init = 3.2 s

5.1 Two participants during the study. The participant on the left was part of the Visual Surveillance group and the participant on the right was part of the Secondary Task group

5.2 Dynamic Simulator of BMW in Munich used for the study. Photo provided by BMW Group

5.3 Driving experience of the participants. The first question relates to their manual driving experience. They were also asked about their experience with assistance systems of SAE level 2: the adaptive cruise control (ACC) and the Active Lane Keeping and Traffic Jam Assistant (ALC). Results are averaged over the answers of the 65 participants.

5.4 Driving preferences of the participants measured on a 5-point Likert scale. Results are averaged over the answers of 65 participants.

5.5 Participants' responses about the timing adequacy of the lane change to the right-most lane. The answers are shown for the two attention groups: the participants who were supervising the drive and the participants who were distracted with a combined visual and cognitive distraction


5.6 Participants' responses about their gap selection criteria for a vehicle driven by themselves and by an automated system. The answers are shown for the two attention groups: the participants who were supervising the drive and the participants who were distracted with a combined visual and cognitive distraction.

5.7 Participants' responses about the vehicle dynamics during the lane change. The answers are shown for the two attention groups: the participants who were supervising the drive and the participants who were distracted with a combined visual and cognitive distraction

5.8 Participants' responses about their experience for each scenario with the automated vehicle in terms of perceived safety, comfort, comprehensibility, conservatism and desirability. The answers are shown for the two attention groups: the participants who were supervising the drive and the participants who were distracted with a combined visual and cognitive distraction

5.9 Personal mindset of the participants after the study. Some statements were adapted from the related literature (adap.)

6.1 Vehicle driving towards an exit ramp. Image from Spider Driving Simulator.

6.2 Simplified environment and vehicle control loop.

6.3 Weight coefficient values for the current lane velocity cost.

6.4 Monte-Carlo backup diagram

6.5 Dyn-n-updates backup diagram

6.6 Example during different planning steps, visibility range (-100 m, +150 m). idR and idF depict respectively the rear and front virtual vehicles located at the limits of the visibility range.

6.7 Average Traffic Velocity and Density.

6.8 Simulation Results.


List of Tables

3.1 IDM parametrization used within this work

3.2 Simulation Results

4.1 Ego Vehicle Actions

4.2 Conflicting Vehicle Actions

4.3 Recall and precision values for both classifiers [101]

4.4 Simulation Results for different strategies. Metrics averaged over 469 cases.

4.5 Simulation Results depending on the rate time. Metrics averaged over 469 cases.

4.6 P-values obtained for the real and resimulated experiments to analyse the statistical significance of the results.

5.1 Cost Parameters evaluated in the study adapted from the benchmark proposal by Althoff, Koschi and Manzinger [5]

5.2 Cost Parameters evaluated in the study adapted from Bahram et al. [11]

5.3 Cost Parameters for the Lane Adequacy

5.4 Scenario variation related to the traffic flow

5.5 Variation to trigger the lane change to the right lane

5.6 Summary of the cost term values for the different parameters considered in the study

5.7 Demographic characteristics of the study participants

5.8 Results of the two-way ANOVA with repeated measures for Adequacy of distance to exit ramp considering traffic flow and distance

5.9 Results of the T-Test between both Attention Groups for the evaluation of the distance to exit ramp when the lane change started.

5.10 Results of the T-Test for both Attention Groups for desired gap evaluation for a manual or an automated drive.

5.11 Results of the T-Test comparing the response of the "mind-off" group with the results of the "eyes-on" group for a manual or an automated drive.

5.12 Results of the T-Test comparing the response of the "mind-off" group with the results of the "eyes-on" group regarding the vehicle dynamics.


5.13 Results of regression analysis for the relevant terms to assess the longitudinal and vehicle dynamics.

5.14 Results of the T-Test comparing the response of the "mind-off" group with the results of the "eyes-on" group regarding the perceived safety, comfort, comprehensibility, conservatism and overall desirability.

5.15 Results of regression analysis for the relevant terms to assess safety perception.

5.16 Results of regression analysis for the relevant terms to assess comfort perception.

5.17 Results of regression analysis for the relevant terms to assess comprehensibility.

5.18 Results of regression analysis for the relevant terms to assess conservatism.

5.19 Results of regression analysis for the relevant terms to assess desirability.

6.1 Features description for state representation.

6.2 Metrics comparison for simulation results over different approaches.


Bibliography

[1] M. Aeberhard, S. Rauch, M. Bahram, G. Tanzmeister, J. Thomas, Y. Pilat, F. Homm, W. Huber, and N. Kaempchen, “Experience, results and lessons learned from automated driving on Germany’s highways,” IEEE Intelligent Transportation Systems Magazine, vol. 7, no. 1, pp. 42–57, Spring 2015.

[2] A. Agresti, Categorical data analysis. John Wiley & Sons, 2003, vol. 482.

[3] D. Althoff, M. Werling, N. Kaempchen, D. Wollherr, and M. Buss, “Lane-based safety assessment of road scenes using inevitable collision states,” in 2012 IEEE Intelligent Vehicles Symposium, June 2012, pp. 31–36.

[4] M. Althoff and J. M. Dolan, “Online verification of automated road vehicles using reachability analysis,” IEEE Transactions on Robotics, vol. 30, no. 4, pp. 903–918, Aug 2014.

[5] M. Althoff, M. Koschi, and S. Manzinger, “CommonRoad: Composable benchmarks for motion planning on roads,” in 2017 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2017, pp. 719–726.

[6] M. Ardelt, C. Coester, and N. Kaempchen, “Highly automated driving on freeways in real traffic using a probabilistic framework,” IEEE Transactions on Intelligent Transportation Systems, vol. 13, no. 4, pp. 1576–1585, Dec 2012.

[7] M. Ardelt, P. Waldmann, N. Kaempchen, and F. Homm, “Strategic decision-making process in advanced driver assistance systems,” in Advances in Automotive Control, 2010, pp. 566–571.

[8] S. Arndt, Evaluierung der Akzeptanz von Fahrerassistenzsystemen. Springer, 2011.

[9] C. Bäckström, “Computational aspects of reordering plans,” Journal of Artificial Intelligence Research, vol. 9, pp. 99–137, 1998.

[10] M. Bahram, A. Lawitzky, J. Friedrichs, M. Aeberhard, and D. Wollherr, “A game-theoretic approach to replanning-aware interactive scene prediction and planning,” IEEE Transactions on Vehicular Technology, vol. PP, no. 99, pp. 1–1, 2016.

[11] M. Bahram, A. Wolf, M. Aeberhard, and D. Wollherr, “A prediction-based reactive driving strategy for highly automated driving function on freeways,” in Intelligent Vehicles Symposium Proceedings, 2014 IEEE, June 2014, pp. 400–406.

[12] H. Bai, S. Cai, N. Ye, D. Hsu, and W. S. Lee, “Intention-aware online POMDP planning for autonomous driving in a crowd,” in 2015 IEEE International Conference on Robotics and Automation (ICRA), May 2015, pp. 454–460.

[13] J. Barreiro, M. Boyce, M. Do, J. Frank, M. Iatauro, T. Kichkaylo, P. Morris, J. Ong, E. Remolina, T. Smith et al., “Europa: a platform for AI planning, scheduling, constraint programming, and optimization,” 4th International Competition on Knowledge Engineering for Planning and Scheduling (ICKEPS), 2012.

[14] C. Basu, Q. Yang, D. Hungerman, M. Singhal, and A. D. Dragan, “Do you want your autonomous car to drive like you?” in 2017 12th ACM/IEEE International Conference on Human-Robot Interaction (HRI). IEEE, 2017, pp. 417–425.

[15] H. Bellem, M. Klüver, M. Schrauf, H.-P. Schöner, H. Hecht, and J. F. Krems, “Can we study autonomous driving comfort in moving-base driving simulators? a validation study,” Human Factors, vol. 59, no. 3, pp. 442–456, 2017, PMID: 28005453. [Online]. Available: https://doi.org/10.1177/0018720816682647

[16] H. Bellem, T. Schönenberg, J. F. Krems, and M. Schrauf, “Objective metrics of comfort: Developing a driving style for highly automated vehicles,” Transportation Research Part F: Traffic Psychology and Behaviour, vol. 41, pp. 45–54, 2016. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S136984781630064X

[17] A. Bemporad, M. Morari, V. Dua, and E. N. Pistikopoulos, “The explicit linear quadratic regulator for constrained systems,” Automatica, vol. 38, no. 1, pp. 3–20, 2002.

[18] M. Bennewitz, W. Burgard, G. Cielniak, and S. Thrun, “Learning motion patterns of people for compliant robot motion,” International Journal of Robotics Research, vol. 24, pp. 31–48, 2005.

[19] J. Borenstein and Y. Koren, “The vector field histogram - fast obstacle avoidance for mobile robots,” IEEE Journal of Robotics and Automation, June 1991.

[20] J. Bortz and N. Döring, Forschungsmethoden und Evaluation für Human- und Sozialwissenschaftler: Limitierte Sonderausgabe. Springer-Verlag, 2007.


[21] M. Bouton, A. Cosgun, and M. J. Kochenderfer, “Belief state planning for autonomously navigating urban intersections,” in Intelligent Vehicles Symposium (IV), 2017 IEEE. IEEE, 2017, pp. 825–830.

[22] C. B. Browne, E. Powley, D. Whitehouse, S. M. Lucas, P. I. Cowling, P. Rohlfshagen, S. Tavener, D. Perez, S. Samothrakis, and S. Colton, “A survey of Monte Carlo tree search methods,” IEEE Transactions on Computational Intelligence and AI in Games, vol. 4, no. 1, pp. 1–43, 2012.

[23] A. Carvalho, A. Williams, S. Lefèvre, and F. Borrelli, “Autonomous cruise control with cut-in target vehicle detection,” in Int. Symposium on Advanced Vehicle Control (AVEC), 2016.

[24] A. R. Cassandra, L. P. Kaelbling, and M. L. Littman, “Acting optimally in partially observable stochastic domains,” in AAAI, vol. 94, 1994, pp. 1023–1028.

[25] C. Chen, A. Seff, A. Kornhauser, and J. Xiao, “DeepDriving: Learning affordance for direct perception in autonomous driving,” in Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), ser. ICCV ’15. Washington, DC, USA: IEEE Computer Society, 2015, pp. 2722–2730. [Online]. Available: http://dx.doi.org/10.1109/ICCV.2015.312

[26] J. K. Choi and Y. G. Ji, “Investigating the importance of trust on adopting an autonomous vehicle,” International Journal of Human-Computer Interaction, vol. 31, no. 10, pp. 692–702, 2015.

[27] C. Choudhury, V. Ramanujam, and M. Ben-Akiva, “Modeling acceleration decisions for freeway merges,” Transportation Research Record: Journal of the Transportation Research Board, no. 2124, pp. 45–57, 2009.

[28] F. Christen and Q. Huang, Das Fahrermodell im Verkehrsflusssimulationsprogramm PELOPS: Modellierung und Applikationsmöglichkeiten, ser. Fortschritt-Berichte VDI: Reihe 22, Mensch-Maschine-Systeme. Düsseldorf: VDI-Verl., 2008, vol. 28. [Online]. Available: http://publications.rwth-aachen.de/record/98429

[29] European Commission, “Annual accident report,” European Commission, Directorate General for Transport, Tech. Rep., Jun. 2017. [Online]. Available: https://ec.europa.eu/transport/road_safety/sites/roadsafety/files/

[30] A. G. Cunningham, E. Galceran, R. M. Eustice, and E. Olson, “MPDM: Multipolicy decision-making in dynamic, uncertain environments for autonomous driving,” in 2015 IEEE International Conference on Robotics and Automation (ICRA), May 2015, pp. 1670–1677.


[31] Deloitte, “2020 global automotive consumer study, are consumers ready for disruptive automotive technology?” Deloitte, Tech. Rep., Nov. 2020. [Online]. Available: https://www2.deloitte.com/us/en/pages/manufacturing/articles/automotive-trends-millennials-consumer-study.html

[32] N. Dillen, M. Ilievski, E. Law, L. E. Nacke, K. Czarnecki, and O. Schneider, “Keep calm and ride along: Passenger comfort and anxiety as physiological responses to autonomous driving styles,” in Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 2020, pp. 1–13.

[33] E. Donges, “Aspekte der aktiven Sicherheit bei der Führung von Personenkraftwagen,” Automob-Ind, vol. 27, no. 2, 1982.

[34] E. Donges, “A conceptual framework for active safety in road traffic,” Vehicle System Dynamics, vol. 32, no. 2-3, pp. 113–128, 1999.

[35] C. Dornhege, “Task planning for high-level robot control,” Ph.D. dissertation, University of Freiburg, 2015, https://www.freidok.uni-freiburg.de/data/10122.

[36] C. Dornhege, P. Eyerich, T. Keller, M. Brenner, and B. Nebel, “Integrating task and motion planning using semantic attachments,” in Workshops at the Twenty-Fourth AAAI Conference on Artificial Intelligence, 2010.

[37] C. Dornhege, P. Eyerich, T. Keller, S. Trüg, M. Brenner, and B. Nebel, “Semantic attachments for domain-independent planning systems,” in Proceedings of ICAPS, 2009.

[38] M. Düring, K. Franke, R. Balaghiasefi, M. Gonter, M. Belkner, and K. Lemmer, “Adaptive cooperative maneuver planning algorithm for conflict resolution in diverse traffic situations,” in 2014 International Conference on Connected Vehicles and Expo (ICCVE), Nov 2014, pp. 242–249.

[39] M. Elbanhawi, M. Simic, and R. Jazar, “In the passenger seat: investigating ride comfort measures in autonomous cars,” IEEE Intelligent Transportation Systems Magazine, vol. 7, no. 3, pp. 4–17, 2015.

[40] J. Engström, E. Johansson, and J. Östlund, “Effects of visual and cognitive load in real and simulated motorway driving,” Transportation Research Part F: Traffic Psychology and Behaviour, vol. 8, no. 2, pp. 97–120, 2005.

[41] D. Ferguson, T. M. Howard, and M. Likhachev, “Motion planning in urban environments: Part I,” in 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, Sept 2008, pp. 1063–1069.


[42] D. Ferguson, T. Howard, and M. Likhachev, “Motion planning in urban environments: Part II,” in Intelligent Robots and Systems, 2008. IROS 2008. IEEE/RSJ International Conference on, Sept 2008, pp. 1070–1076.

[43] M. Festner, A. Eicher, and D. Schramm, “Beeinflussung der Komfort- und Sicherheitswahrnehmung beim hochautomatisierten Fahren durch fahrfremde Tätigkeiten und Spurwechseldynamik,” in Uni-DAS 11. Workshop Fahrerassistenzsysteme und automatisiertes Fahren, Walting im Altmühltal, 2017.

[44] R. Fikes and N. J. Nilsson, “STRIPS: A new approach to the application of theorem proving to problem solving,” Artificial Intelligence, vol. 2, pp. 189–208, Dec 1971.

[45] C. Flores, V. Milanés, J. Pérez, D. González, and F. Nashashibi, “Optimal energy consumption algorithm based on speed reference generation for urban electric vehicles,” in Intelligent Vehicles Symposium (IV), 2015 IEEE. IEEE, 2015, pp. 730–735.

[46] M. Fox and D. Long, “Modelling mixed discrete-continuous domains for planning,” J. Artif. Intell. Res. (JAIR), vol. 27, pp. 235–297, 2006.

[47] J. Frank and A. Jónsson, “Constraint-based attribute and interval planning,” Constraints, vol. 8, no. 4, pp. 339–364, 2003.

[48] S. Franklin and A. Graesser, “Is it an agent, or just a program?: A taxonomy for autonomous agents,” in Proceedings of the Workshop on Intelligent Agents III, Agent Theories, Architectures, and Languages, ser. ECAI ’96. London, UK: Springer-Verlag, 1997, pp. 21–35. [Online]. Available: http://dl.acm.org/citation.cfm?id=648203.749270

[49] E. Frazzoli, M. A. Dahleh, and E. Feron, “Maneuver-based motion planning for nonlinear systems with symmetries,” IEEE Transactions on Robotics, vol. 21, no. 6, pp. 1077–1091, 2005.

[50] Y. Freund, R. Schapire, and N. Abe, “A short introduction to boosting,” Journal-Japanese Society For Artificial Intelligence, vol. 14, no. 771-780, p. 1612, 1999.

[51] J. Friedman, T. Hastie, and R. Tibshirani, “Additive Logistic Regression: a Statistical View of Boosting,” The Annals of Statistics, vol. 38, no. 2, 2000.

[52] E. Galceran, A. G. Cunningham, R. M. Eustice, and E. Olson, “Multipolicy decision-making for autonomous driving via changepoint-based behavior prediction,” in Robotics: Science and Systems, vol. 1, no. 2, 2015.


[53] J. García and F. Fernández, “A comprehensive survey on safe reinforcement learning,” Journal of Machine Learning Research, vol. 16, pp. 1437–1480, 2015. [Online]. Available: http://jmlr.org/papers/v16/garcia15a.html

[54] S. Geyer, M. Baltzer, B. Franz, S. Hakuli, M. Kauer, M. Kienle, S. Meier, T. Weißgerber, K. Bengler, R. Bruder et al., “Concept and development of a unified ontology for generating test and use-case catalogues for assisted and automated vehicle guidance,” IET Intelligent Transport Systems, vol. 8, no. 3, pp. 183–189, 2013.

[55] T. D. Gillespie, Fundamentals of vehicle dynamics. Society of Automotive Engineers, Warrendale, PA, 1992, vol. 400.

[56] S. Glaser, B. Vanholme, S. Mammar, D. Gruyer, and L. Nouveliere, “Maneuver-based trajectory planning for highly autonomous vehicles on real road with traffic and driver interaction,” IEEE Transactions on Intelligent Transportation Systems, vol. 11, no. 3, pp. 589–606, 2010.

[57] C. Gold, M. Körber, D. Lechner, and K. Bengler, “Taking over control from highly automated vehicles in complex traffic situations: The role of traffic density,” Human Factors, vol. 58, no. 4, pp. 642–652, 2016, PMID: 26984515. [Online]. Available: https://doi.org/10.1177/0018720816634226

[58] J. Greenberg, L. Tijerina, R. Curry, B. Artz, L. Cathey, D. Kochhar, K. Kozak, M. Blommer, and P. Grant, “Driver distraction: Evaluation with event detection paradigm,” Transportation Research Record, vol. 1843, no. 1, pp. 1–9, 2003.

[59] T. Gu, J. Snider, J. M. Dolan, and J. W. Lee, “Focused trajectory planning for autonomous on-road driving,” in 2013 IEEE Intelligent Vehicles Symposium (IV), June 2013, pp. 547–552.

[60] T. Gu and J. M. Dolan, “On-road motion planning for autonomous vehicles,” in International Conference on Intelligent Robotics and Applications. Springer, 2012, pp. 588–597.

[61] F. L. Hall, “Traffic stream characteristics,” Traffic Flow Theory. US Federal Highway Administration, vol. 36, 1996.

[62] J. L. Harbluk, Y. I. Noy, P. L. Trbovich, and M. Eizenman, “An on-road assessment of cognitive distraction: Impacts on drivers’ visual behavior and braking performance,” Accident Analysis & Prevention, vol. 39, no. 2, pp. 372–379, 2007.


[63] C. D. Harper, C. T. Hendrickson, and C. Samaras, “Cost and benefit estimates of partially-automated vehicle collision avoidance technologies,” Accident Analysis & Prevention, vol. 95, pp. 104–115, 2016.

[64] H. Hentschke and M. C. Stüttgen, “Computation of measures of effect size for neuroscience data sets,” European Journal of Neuroscience, vol. 34, no. 12, pp. 1887–1894, 2011.

[65] F. Herzberg, B. Mausner, and B. Snyderman, “The motivation to work,” 1959.

[66] T. Hester and P. Stone, “TEXPLORE: real-time sample-efficient reinforcement learning for robots,” Machine Learning, vol. 90, no. 3, pp. 385–429, 2013.

[67] L. Hobert, A. Festag, I. Llatser, L. Altomare, F. Visintainer, and A. Kovacs, “Enhancements of V2X communication in support of cooperative autonomous driving,” IEEE Communications Magazine, vol. 53, no. 12, pp. 64–70, Dec 2015.

[68] C. Hubmann, J. Schulz, M. Becker, D. Althoff, and C. Stiller, “Automated driving in uncertain environments: Planning with interaction and uncertain maneuver prediction,” IEEE Transactions on Intelligent Vehicles, vol. 3, no. 1, pp. 5–17, March 2018.

[69] A. Ivanco, “Fleet analysis of headway distance for autonomous driving,” Journal of Safety Research, vol. 63, pp. 145–148, 2017. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S002243751730018X

[70] J.-Y. Jian, A. M. Bisantz, and C. G. Drury, “Foundations for an empirically determined scale of trust in automated systems,” International Journal of Cognitive Ergonomics, vol. 4, no. 1, pp. 53–71, 2000.

[71] D. B. Kaber, Y. Liang, Y. Zhang, M. L. Rogers, and S. Gangakhedkar, “Driver performance effects of simultaneous visual and cognitive distraction and adaptation behavior,” Transportation Research Part F: Traffic Psychology and Behaviour, vol. 15, no. 5, pp. 491–501, 2012.

[72] L. P. Kaelbling, M. L. Littman, and A. R. Cassandra, “Planning and acting in partially observable stochastic domains,” Artificial Intelligence, vol. 101, no. 1-2, pp. 99–134, 1998.

[73] N. Kauffmann, F. Naujoks, F. Winkler, and W. Kunde, “Learning the ‘language’ of road users - how shall a self-driving car convey its intention to cooperate to other human drivers?” in International Conference on Applied Human Factors and Ergonomics. Springer, 2017, pp. 53–63.


[74] N. Kauffmann, F. Winkler, F. Naujoks, and M. Vollrath, “What makes a cooperative driver? identifying parameters of implicit and explicit forms of communication in a lane change scenario,” Transportation Research Part F: Traffic Psychology and Behaviour, vol. 58, pp. 1031–1042, 2018.

[75] N. Kauffmann, F. Winkler, and M. Vollrath, “What makes an automated vehicle a good driver?” in Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, 2018, pp. 1–9.

[76] B. S. Kerner, The physics of traffic: empirical freeway pattern features, engineering applications, and theory. Springer, 2012.

[77] B. S. Kerner and H. Rehborn, “Experimental properties of complexity in traffic flow,” Physical Review E, vol. 53, no. 5, p. R4275, 1996.

[78] A. Kesting, M. Treiber, and D. Helbing, “General lane-changing model MOBIL for car-following models,” Transportation Research Record, vol. 1999, pp. 86–94, Jan 2007.

[79] T. Kobayashi, T. Ikeda, Y. O. Kato, A. Utsumi, I. Nagasawa, and S. Iwaki, “Evaluation of mental stress in automated following driving,” in 2018 3rd International Conference on Robotics and Automation Engineering (ICRAE), 2018, pp. 131–135.

[80] A. Koenig, M. Gutbrod, S. Hohmann, and J. Ludwig, “Bridging the gap between open loop tests and statistical validation for highly automated driving,” SAE Int. J. Trans. Safety, vol. 5, pp. 81–87, Mar 2017. [Online]. Available: https://doi.org/10.4271/2017-01-1403

[81] S. Koenig and M. Likhachev, “Fast replanning for navigation in unknown terrain,” IEEE Transactions on Robotics, vol. 21, no. 3, pp. 354–363, June 2005.

[82] S. Koenig, M. Likhachev, and D. Furcy, “Lifelong planning A*,” Artificial Intelligence, vol. 155, no. 1, pp. 93–146, 2004. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S000437020300225X

[83] H. Kretzschmar, M. Spies, C. Sprunk, and W. Burgard, “Socially compliant mobile robot navigation via inverse reinforcement learning,” The International Journal of Robotics Research, 2016. [Online]. Available: http://ais.informatik.uni-freiburg.de/publications/papers/kretzschmar16ijrr.pdf

[84] M. Kuderer, S. Gulati, and W. Burgard, “Learning driving styles for autonomous vehicles from demonstration,” in 2015 IEEE International Conference on Robotics and Automation (ICRA), May 2015, pp. 2641–2646.

[85] M. Kuderer, H. Kretzschmar, and W. Burgard, “Teaching mobile robots to cooperatively navigate in populated environments,” in Int. Conf. on Intelligent Robots and Systems (IROS), 2013.

[86] B. Kuipers, “The spatial semantic hierarchy,” Artificial Intelligence, vol. 119, no. 1-2, pp. 191–233, 2000.

[87] J.-C. Latombe, Robot Motion Planning. Norwell, MA, USA: Kluwer Academic Publishers, 1991.

[88] S. M. LaValle, Planning Algorithms. Cambridge, U.K.: Cambridge University Press, 2006, available at http://planning.cs.uiuc.edu/.

[89] S. Lefevre, A. Carvalho, and F. Borrelli, “A learning-based framework for velocity control in autonomous driving,” IEEE Transactions on Automation Science and Engineering, vol. 13, no. 1, pp. 32–42, 2015.

[90] S. Lefèvre, D. Vasquez, and C. Laugier, “A survey on motion prediction and risk assessment for intelligent vehicles,” ROBOMECH Journal, vol. 1, no. 1, pp. 1–14, 2014. [Online]. Available: http://dx.doi.org/10.1186/s40648-014-0001-z

[91] M. Leonetti, L. Iocchi, and P. Stone, “A synthesis of automated planning and reinforcement learning for efficient, robust decision-making,” Artificial Intelligence, vol. 241, pp. 103–130, September 2016. [Online]. Available: http://www.cs.utexas.edu/users/ai-lab?leonetti:aij16

[92] T. R. Levine and C. R. Hullett, “Eta squared, partial eta squared, and misreporting of effect size in communication research,” Human Communication Research, vol. 28, no. 4, pp. 612–625, 2002.

[93] B. Lewis-Evans, D. De Waard, and K. A. Brookhuis, “That’s close enough: A threshold effect of time headway on the experience of risk, task difficulty, effort, and comfort,” Accident Analysis & Prevention, vol. 42, no. 6, pp. 1926–1933, 2010.

[94] S. E. Li, H. Peng, K. Li, and J. Wang, “Minimum fuel control strategy in automated car-following scenarios,” IEEE Transactions on Vehicular Technology, vol. 61, no. 3, pp. 998–1007, 2012.

[95] Q. Lin, S. Li, X. Ma, and G. Lu, “Understanding take-over performance of high crash risk drivers during conditionally automated driving,” Accident Analysis & Prevention, vol. 143, p. 105543, 2020. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0001457519312102

[96] K. Lu, S. Zhang, P. Stone, and X. Chen, “Robot representation and reasoning with knowledge from reinforcement learning,” arXiv preprint arXiv:1809.11074, 2018.

[97] P. R. MacNeilage, N. Ganesan, and D. E. Angelaki, “Computational approaches to spatial orientation: from transfer functions to dynamic bayesian inference,” Journal of Neurophysiology, vol. 100, no. 6, pp. 2981–2996, 2008.

[98] D. McDermott, M. Ghallab, A. Howe, C. Knoblock, A. Ram, M. Veloso, D. Weld, and D. Wilkins, “Pddl - the planning domain definition language,” Yale Center for Computational Vision and Control, Tech. Rep. TR-98-003, 1998.

[99] D. V. McGehee, E. N. Mazzae, and G. S. Baldwin, “Driver reaction time in crash avoidance research: Validation of a driving simulator study on a test track,” Proceedings of the Human Factors and Ergonomics Society Annual Meeting, vol. 44, no. 20, pp. 3–320–3–323, 2000. [Online]. Available: https://doi.org/10.1177/154193120004402026

[100] M. McNaughton, C. Urmson, J. M. Dolan, and J. Lee, “Motion planning for autonomous driving with a conformal spatiotemporal lattice,” in 2011 IEEE International Conference on Robotics and Automation, May 2011, pp. 4889–4895.

[101] C. Menéndez-Romero, M. Sezer, F. Winkler, C. Dornhege, and W. Burgard, “Courtesy behavior for highly automated vehicles on highway interchanges,” in IEEE Intelligent Vehicles Symposium (IV), 2018, pp. 943–948. [Online]. Available: http://ais.informatik.uni-freiburg.de/publications/papers/menendez18iv.pdf

[102] C. Menéndez-Romero, F. Winkler, C. Dornhege, and W. Burgard, “Maneuver planning for highly automated vehicles,” in Intelligent Vehicles Symposium (IV), 2017 IEEE. IEEE, 2017, pp. 1458–1464.

[103] ——, “Maneuver planning and learning: A lane selection approach for highly automated vehicles in highway scenarios,” in 2020 IEEE Intelligent Transportation Systems Conference (ITSC). IEEE, Sep. 2020.

[104] V. Milanes, J. Godoy, J. Villagra, and J. Perez, “Automated on-ramp merging system for congested traffic situations,” IEEE Transactions on Intelligent Transportation Systems, vol. 12, no. 2, pp. 500–508, June 2011.

[105] C. Miller, C. Pek, and M. Althoff, “Efficient mixed-integer programming for longitudinal and lateral motion planning of autonomous vehicles,” in Proc. of the IEEE Intelligent Vehicles Symposium, 2018.

[106] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, p. 529, 2015.

[107] D. C. Montgomery, E. A. Peck, and G. G. Vining, Introduction to linear regression analysis. John Wiley & Sons, 2012, vol. 821.

[108] P. Morere, R. Marchant, and F. Ramos, “Continuous state-action-observation pomdps for trajectory planning with bayesian optimisation,” IROS, 2018.

[109] M. Sezer, “Design and analysis of a cooperative driving strategy for highly automated driving on freeways,” Master’s thesis, Technical University of Munich (TUM), Dec. 2016.

[110] M. Mukadam, A. Cosgun, A. Nakhaei, and K. Fujimura, “Tactical decision making for lane changing with deep reinforcement learning,” in NIPS workshop on Machine Learning for Intelligent Transportation Systems (MLITS), 2017.

[111] T. Müller, H. Hajek, L. Radić-Weißenfeld, and K. Bengler, “Can you feel the difference? the just noticeable difference of longitudinal acceleration,” in Proceedings of the Human Factors and Ergonomics Society Annual Meeting, vol. 57, no. 1. SAGE Publications Sage CA: Los Angeles, CA, 2013, pp. 1219–1223.

[112] T. A. Müller, “Ermittlung vestibulärer wahrnehmungsschwellen zur zielgerichteten gestaltung der fahrzeug-längsdynamik,” Ph.D. dissertation, Technische Universität München, 2015.

[113] NHTSA, “2016 fatal motor vehicle crashes: Overview,” NHTSA’s National Center for Statistics and Analysis, Tech. Rep., Oct. 2017. [Online]. Available: https://crashstats.nhtsa.dot.gov/Api/Public/ViewPublication/812456

[114] N. J. Nilsson, P. E. Hart, and B. Raphael, “A formal basis for the heuristic determination of minimum cost paths,” IEEE Transactions on Systems Science and Cybernetics, vol. SSC-4, no. 2, pp. 100–107, 1968.

[115] H. Pacejka, Tire and vehicle dynamics. Elsevier, 2005.

[116] C. Pek, S. Manzinger, M. Koschi, and M. Althoff, “Using online verification to prevent autonomous vehicles from causing accidents,” Nature Machine Intelligence, vol. 2, no. 9, pp. 518–528, 2020.

[117] C. Pek, P. Zahn, and M. Althoff, “Verifying the safety of lane change maneuvers of self-driving vehicles based on formalized traffic rules,” 2017 IEEE Intelligent Vehicles Symposium (IV), pp. 1477–1483, 2017.

[118] E. N. C. A. Programme, “Assessment protocol - safety assist,” (Euro NCAP), Tech. Rep., Nov. 2018. [Online]. Available: https://cdn.euroncap.com/media/41771/euro-ncap-assessment-protocol-sa-v804.201811091311338276.pdf

[119] X. Qian, F. Altché, P. Bender, C. Stiller, and A. de La Fortelle, “Optimal trajectory planning for autonomous driving integrating logical constraints: An MIQP perspective,” in ITSC. IEEE, 2016, pp. 205–210.

[120] J. Rasmussen, “Skills, rules, and knowledge; signals, signs, and symbols, and other distinctions in human performance models,” IEEE Transactions on Systems, Man, and Cybernetics, no. 3, pp. 257–266, 1983.

[121] ——, “The role of hierarchical knowledge representation in decisionmaking and system management,” IEEE Transactions on Systems, Man, and Cybernetics, no. 2, pp. 234–243, 1985.

[122] C. Rathgeber, F. Winkler, and S. Müller, “Kollisionsfreie längs- und quertrajektorienplanung unter berücksichtigung fahrzeugspezifischer potenziale.” Automatisierungstechnik, vol. 64, no. 1, pp. 61–76, 2016.

[123] D. A. Redelmeier and R. J. Tibshirani, “Why cars in the next lane seem to go faster,” Nature, vol. 401, no. 6748, pp. 35–35, 1999.

[124] ——, “Are those other drivers really going faster?” Chance, vol. 13, no. 3, pp. 8–14, 2000.

[125] J. Reimpell, H. Stoll, and J. Betzler, The automotive chassis: engineering principles. Elsevier, 2001.

[126] A. Richards, T. Schouwenaars, J. P. How, and E. Feron, “Spacecraft trajectory planning with avoidance constraints using mixed-integer linear programming,” Journal of Guidance, Control, and Dynamics, vol. 25, no. 4, pp. 755–764, 2002.

[127] SAE, “SAE International releases updated visual chart for its levels of driving automation standard for self-driving vehicles,” Dec. 2018. [Online]. Available: https://www.sae.org/news/press-room/2018/12/sae-international-releases-updated-visual-chart-for-its-%E2%80%9Clevels-of-driving-automation%E2%80%9D-standard-for-self-driving-vehicles

[128] Taxonomy and Definitions for Terms Related to Driving Automation Systems for On-Road Motor Vehicles, SAE J 3016, SAE International Std.

[129] R. E. Schapire and Y. Singer, “Improved boosting algorithms using confidence-rated predictions,” Machine learning, vol. 37, no. 3, pp. 297–336, 1999.

[130] O. Scheel, L. A. Schwarz, N. Navab, and F. Tombari, “Situation assessment for planning lane changes: Combining recurrent models and prediction,” in 2018 IEEE International Conference on Robotics and Automation, ICRA 2018, Brisbane, Australia, May 21-25, 2018. IEEE, 2018, pp. 2082–2088. [Online]. Available: https://doi.org/10.1109/ICRA.2018.8460957

[131] M. Schönhof and D. Helbing, “Criticism of three-phase traffic theory,” Transportation Research Part B: Methodological, vol. 43, no. 7, pp. 784–797, 2009.

[132] T. Schouwenaars, B. D. Moor, E. Feron, and J. How, “Mixed integer programming for multi-vehicle path planning,” in 2001 European Control Conference (ECC), Sept 2001, pp. 2603–2608.

[133] T. Schouwenaars, É. Féron, and J. How, “Safe receding horizon path planning for autonomous vehicles,” in Proceedings of the Annual Allerton Conference on Communication Control and Computing, vol. 40, no. 1. The University; 1998, 2002, pp. 295–304.

[134] Schurger, “Two-way repeated measures anova,” MATLAB Central File Exchange. Retrieved October 26, 2020. [Online]. Available: https://www.mathworks.com/matlabcentral/fileexchange/6874-two-way-repeated-measures-anova

[135] W. Schwarting, J. Alonso-Mora, L. Pauli, S. Karaman, and D. Rus, “Parallel autonomy in automated vehicles: Safe motion generation with minimal intervention,” in 2017 IEEE International Conference on Robotics and Automation (ICRA), May 2017, pp. 1928–1935.

[136] W. Schwarting and P. Pascheka, “Recursive conflict resolution for cooperative motion planning in dynamic highway traffic,” in 17th International IEEE Conference on Intelligent Transportation Systems (ITSC), Oct 2014, pp. 1039–1044.

[137] S. Shalev-Shwartz, S. Shammah, and A. Shashua, “On a formal model of safe and scalable self-driving cars,” CoRR, vol. abs/1708.06374, 2017.

[138] S. Sharifzadeh, I. Chiotellis, R. Triebel, and D. Cremers, “Learning to drive using inverse reinforcement learning and deep q-networks,” CoRR, vol. abs/1612.03653, 2016. [Online]. Available: http://arxiv.org/abs/1612.03653

[139] F. W. Siebert, M. Oehl, and H.-R. Pfister, “The influence of time headway on subjective driver states in adaptive cruise control,” Transportation Research Part F: Traffic Psychology and Behaviour, vol. 25, pp. 65–73, 2014. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1369847814000710

[140] F. W. Siebert and F. L. Wallis, “How speed and visibility influence preferred headway distances in highly automated driving,” Transportation Research Part F: Traffic Psychology and Behaviour, vol. 64, pp. 485–494, 2019. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1369847819301287

[141] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot et al., “Mastering the game of go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, p. 484, 2016.

[142] D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. P. Lillicrap, K. Simonyan, and D. Hassabis, “Mastering chess and shogi by self-play with a general reinforcement learning algorithm,” CoRR, vol. abs/1712.01815, 2017.

[143] W. Song, G. Xiong, and H. Chen, “Intention-aware autonomous driving decision-making in an uncontrolled intersection,” Mathematical Problems in Engineering, vol. 2016, 2016.

[144] M. T. Spaan and N. Vlassis, “Perseus: Randomized point-based value iteration for pomdps,” Journal of Artificial Intelligence Research, vol. 24, pp. 195–220, 2005.

[145] M. Sridharan and B. Meadows, “An architecture for discovering affordances, causal laws, and executability conditions,” Advances in Cognitive Systems, vol. 5, pp. 1–16, 2017.

[146] A. Stentz, “Optimal and efficient path planning for partially-known environments,” in ICRA, vol. 94, 1994, pp. 3310–3317.

[147] Straßenverkehrs-Ordnung (StVO), “§7 benutzung von fahrstreifen durch kraftfahrzeuge,” 2013. [Online]. Available: https://www.gesetze-im-internet.de/stvo_2013/__7.html

[148] R. S. Sutton, “Integrated architectures for learning, planning, and reacting based on approximating dynamic programming,” in Proceedings of the Seventh International Conference on Machine Learning. Morgan Kaufmann, 1990, pp. 216–224.

[149] ——, “Dyna, an integrated architecture for learning, planning, and reacting,” ACM Sigart Bulletin, vol. 2, no. 4, pp. 160–163, 1991.

[150] R. S. Sutton and A. G. Barto, Introduction to Reinforcement Learning, 1st ed. Cambridge, MA, USA: MIT Press, 1998.

[151] M. Taieb-Maimon and D. Shinar, “Minimum and comfortable driving headways: Reality versus perception,” Human Factors, vol. 43, no. 1, pp. 159–172, 2001.

[152] G. Tanzmeister, M. Friedl, A. Lawitzky, D. Wollherr, and M. Buss, “Road course estimation in unknown, structured environments,” IEEE Intelligent Vehicles Symposium, 2013.

[153] G. Tanzmeister, D. Wollherr, and M. Buss, “Environment-based trajectory clustering to extract principal directions for autonomous vehicles,” IROS 2014, 2014.

[154] ——, “Grid-based multi-road-course estimation using motion planning,” IEEE Transactions on Vehicular Technology, 2015.

[155] S. Thrun, “Monte carlo pomdps,” in Advances in Neural Information Processing Systems, 2000, pp. 1064–1070.

[156] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics (Intelligent Robotics and Autonomous Agents). The MIT Press, 2005.

[157] S. Thrun, M. Montemerlo, H. Dahlkamp, D. Stavens, A. Aron, J. Diebel, P. Fong, J. Gale, M. Halpenny, G. Hoffmann et al., “Stanley: The robot that won the darpa grand challenge,” Journal of Field Robotics, vol. 23, no. 9, pp. 661–692, 2006.

[158] T. Nader, “Komfort- und sicherheitswahrnehmung bei eine autobahnabfahrt,” Master’s thesis, Hochschule für angewandte Wissenschaften München, Jan. 2018.

[159] M. Treiber, A. Hennecke, and D. Helbing, “Congested traffic states in empirical observations and microscopic simulations,” Phys. Rev. E, vol. 62, pp. 1805–1824, Aug 2000. [Online]. Available: http://link.aps.org/doi/10.1103/PhysRevE.62.1805

[160] M. Treiber and A. Kesting, “Traffic flow dynamics,” Traffic Flow Dynamics: Data, Models and Simulation, Springer-Verlag Berlin Heidelberg, 2013.

[161] S. Ulbrich, T. Menzel, A. Reschka, F. Schuldt, and M. Maurer, “Defining and substantiating the terms scene, situation, and scenario for automated driving,” in Intelligent Transportation Systems (ITSC), 2015 IEEE 18th International Conference on. IEEE, 2015, pp. 982–988.

[162] C. Urmson, J. Anhalt, D. Bagnell, C. Baker, R. Bittner, M. Clark, J. Dolan, D. Duggins, T. Galatali, C. Geyer et al., “Autonomous driving in urban environments: Boss and the urban challenge,” Journal of Field Robotics, vol. 25, no. 8, pp. 425–466, 2008.

[163] C. Vallon, Z. Ercan, A. Carvalho, and F. Borrelli, “A machine learning approach for personalized autonomous lane change initiation and control,” in Intelligent Vehicles Symposium (IV), 2017 IEEE. IEEE, 2017, pp. 1590–1595.

[164] B. Vanholme, D. Gruyer, S. Glaser, and S. Mammar, “A legal safety concept for highly automated driving on highways,” in 2011 IEEE Intelligent Vehicles Symposium (IV), June 2011, pp. 563–570.

[165] M. Waters, B. Nebel, L. Padgham, and S. Sardiña, “Plan relaxation via action debinding and deordering,” in ICAPS. AAAI Press, 2018, pp. 278–287.

[166] C. J. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.

[167] J. Wei, J. M. Dolan, and B. Litkouhi, “Autonomous vehicle social behavior for highway entrance ramp management,” in 2013 IEEE Intelligent Vehicles Symposium (IV), June 2013, pp. 201–207.

[168] M. Werling and D. Liccardo, “Automatic collision avoidance using model-predictive online optimization,” in 2012 IEEE 51st IEEE Conference on Decision and Control (CDC), Dec 2012, pp. 6309–6314.

[169] R. Wiedemann, “Simulation des strassenverkehrsflusses,” Ph.D. dissertation, Schriftenreihe des Instituts für Verkehrswesen der Universität Karlsruhe, Band 8, Karlsruhe, 1974.

[170] M. Wiering and J. Schmidhuber, “Hq-learning,” Adaptive Behavior, vol. 6, no. 2, pp. 219–246, 1997.

[171] M. T. Wolf and J. W. Burdick, “Artificial potential functions for highway driving with collision avoidance,” 2008 IEEE International Conference on Robotics and Automation, Pasadena, CA, USA, May 19-23 2008.

[172] Y. Wu, Y. Wu, A. Tamar, S. Russell, G. Gkioxari, and Y. Tian, “Learning and planning with a semantic model,” arXiv preprint arXiv:1809.10842, 2018.

[173] W. Xu, J. Pan, J. Wei, and J. M. Dolan, “Motion planning under uncertainty for on-road autonomous driving,” in 2014 IEEE Int. Conf. on Robotics and Automation (ICRA), May 2014, pp. 2507–2512.

[174] W. Xu, J. Wei, J. M. Dolan, H. Zhao, and H. Zha, “A real-time motion planner with trajectory optimization for autonomous vehicles,” in 2012 IEEE International Conference on Robotics and Automation. IEEE, 2012, pp. 2061–2067.

[175] F. Yang, D. Lyu, B. Liu, and S. Gustafson, “Peorl: integrating symbolic planning and hierarchical reinforcement learning for robust decision-making,” in Proceedings of the 27th International Joint Conference on Artificial Intelligence, 2018, pp. 4860–4866.

[176] A. Yu, R. Palefsky-Smith, and R. Bedi, “Deep reinforcement learning for simulated autonomous vehicle control,” Course Project Reports: Winter, pp. 1–7, 2016.

[177] L. Zhao, R. Ichise, Y. Sasaki, Z. Liu, and T. Yoshikawa, “Fast decision making using ontology-based knowledge base,” in 2016 IEEE Intelligent Vehicles Symposium (IV), June 2016, pp. 173–178.
