Simulating Group Interactions through Machine Learning and Human Perception

FANGKAI YANG

Doctoral Thesis
Stockholm, Sweden 2020

TRITA-EECS-AVL-2021:3
ISBN 978-91-7873-744-4

KTH School of Computer Science and Communication
SE-100 44 Stockholm

SWEDEN

Academic dissertation which, with the permission of KTH Royal Institute of Technology, is presented for public examination for the degree of Doctor of Technology in Computer Science on Monday, 25 January 2021, at 10:00 in Visualiseringsstudion VIC, KTH Royal Institute of Technology, Lindstedtsvägen 7, Stockholm.

© Fangkai Yang, November 2020

Tryck: Universitetsservice US AB


Abstract

Human-Robot/Agent Interaction is well researched in many areas, but approaches commonly either focus on dyadic interactions or crowd simulations. However, the intermediate structure between individuals and crowds, i.e., small groups, has been studied less. In small group situations, it is challenging for mobile robots or agents to approach free-standing conversational groups in a socially acceptable manner. It requires the robot or agent to plan trajectories that avoid collisions with people and consider the perception of group members to make them feel comfortable. Previous methods are mostly procedural, with handcrafted features that limit the realism and adaptation of the simulation. In this thesis, Human-Robot/Agent Interaction is investigated at multiple levels, including individuals, crowds, and small groups. Firstly, this thesis is an exploration of proxemics in dyadic interactions in virtual environments. It investigates the impact of various embodiments on human perception and sensitivities. A related toolkit is developed as a foundation for simulating virtual characters in the subsequent research. Secondly, this thesis extends proxemics to crowd simulation and trajectory prediction by proposing neighbor perception models. It then focuses on group interactions in which robots/agents approach small groups in order to join them. To address the challenges above, novel procedural models based on social space and machine learning models, including generative adversarial neural networks, state refinement LSTM, reinforcement learning, and imitation learning, are proposed to generate approach behaviors. A novel dataset of full-body motion-captured markers was also collected in order to support the machine learning approaches. Finally, these methods are evaluated in scenarios involving humans, virtual agents, and physical robots.


Sammanfattning

Human-Robot/Agent Interaction is well researched in many areas, but these works typically focus either on individual interactions or on crowd simulations, and the link between individuals and crowds, i.e., small groups, is rarely studied. It is challenging for robots or agents navigating to approach free-standing conversational groups to behave in a safe and socially acceptable way. This requires not only that the robot or agent plans trajectories that avoid collisions with people, but also that it takes the feelings of the group members into account to make them feel comfortable. Previous methods are mostly procedural, with handcrafted features that limit the realism and adaptation of the simulation. In this thesis, we investigate Human-Robot/Agent Interaction at several levels, including individuals, crowds, and, as our focus, small groups. Firstly, we explore proxemics in individual interactions in virtual environments and develop related social games and mixed reality demos to discover behaviors in virtual interactions. Secondly, we extend the research on proxemics to crowd simulation and trajectory prediction by proposing perception models. We then focus on group interaction behaviors in which robots/agents approach small groups in order to join them. To address the challenges above, we propose novel procedural models and machine learning models to generate group approach behaviors. In the process of developing the machine learning models, we collect a full-body motion-captured dataset and present use cases that employ it. Finally, we evaluate our proposed methods with robots and human groups through group interaction. The highlight of the thesis is the exploration of group interaction with data-driven approaches. Our work has been published in a number of well-reputed conferences and journals.

List of Publications

In this section, I list the publications discussed in this thesis, in chronological order. In all cases except paper No. 2 (for which I was the leading researcher) and paper No. 7 (for which I was the lead researcher, with the first two authors contributing equally to the paper), I was the leading author and the leading researcher, and I worked on all aspects of the paper, guided by appropriate supervision.

All of the papers are peer-reviewed and published at international conferences.

Papers included in this thesis

1. Fangkai Yang, Chengjie Li, Robin Palmberg, Ewoud Van Der Heide and Christopher Peters.
“Expressive virtual characters for social demonstration games.”
9th International Conference on Virtual Worlds and Games for Serious Applications (VS-Games), (September 2017).
To accelerate and better enable the use of virtual characters in social games, we present a virtual character behavior toolkit to develop expressive virtual characters. It is a middleware toolkit that sits on top of the game engine, focusing on providing high-level character behaviors to create social games quickly.

2. Christopher Peters, Fangkai Yang, Himangshu Saikia, Chengjie Li, Gabriel Skantze.
“Towards the use of mixed reality for hri design via virtual robots.”
Proceedings of the 1st International Workshop on Virtual, Augmented, and Mixed Reality for HRI (VAM-HRI), (2018).

In this paper, we present our ongoing work towards developing a mixed reality platform for designing social interactions with robots through the use of virtual robots.

3. Fangkai Yang, Himangshu Saikia, Christopher Peters.
“Who are my neighbors? A perception model for selecting neighbors of pedestrians in crowds.”
Proceedings of the 18th International Conference on Intelligent Virtual Agents (IVA), (2018).

In this paper, we propose a novel approach for selecting neighbors of an agent by modeling its perception as a combination of a location model and a locomotion model.

4. Fangkai Yang, Jack Shabo, Adam Qureshi, Christopher Peters.
“Do you see groups? The impact of crowd density and viewpoint on the perception of groups.”
Proceedings of the 18th International Conference on Intelligent Virtual Agents (IVA), (2018).

Based on a real-time crowd simulator that has been implemented as a unilateral incompressible fluid and augmented with group behaviors, a perceptual study was conducted in order to determine the impact of groups on the perception of crowds at various densities from different camera views.

5. Fangkai Yang, Christopher Peters.
“Social-aware navigation in crowds with static and dynamic groups.”
11th International Conference on Virtual Worlds and Games for Serious Applications (VS-Games), (2019).
This paper presents social-aware navigation for crowds in serious games. It enables a virtual character to join a group or navigate to a goal position through a crowd with static and dynamic group formations in a safe and socially acceptable way.

6. Fangkai Yang, and Christopher Peters.
“App-LSTM: Data-driven generation of socially acceptable trajectories for approaching small groups of agents.”
Proceedings of the 7th International Conference on Human-Agent Interaction, (September 2019).
In this paper, we propose a novel neural network, App-LSTM, to generate the approach trajectory of an agent towards a small free-standing conversational group of agents. The App-LSTM model is trained on a dataset of approach behaviors towards the group. Since current publicly available datasets for these encounters are limited, we develop a social-aware navigation method as a basis for creating a semi-synthetic dataset composed of a mixture of real and simulated data representing safe and socially acceptable approach trajectories.

7. Fangkai Yang, and Christopher Peters.
“AppGAN: Generative Adversarial Networks for Generating Robot Approach Behaviors into Small Groups of People.”
28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), (October 2019).
This paper proposes AppGAN, a novel trajectory prediction model capable of generating trajectories into free-standing conversational groups, trained on a dataset of safe and socially acceptable paths.

8. Fangkai Yang, Wenjie Yin, Tetsunari Inamura, Mårten Björkman, and Christopher Peters.
“Group Behavior Recognition Using Attention- and Graph-Based Neural Networks.”
24th European Conference on Artificial Intelligence (ECAI), (September 2020).
The recognition and analysis of such socially compliant dynamic group behaviors have rarely been studied in depth and remain challenging in social multi-agent systems. This paper presents novel group behavior recognition models, attention-based and graph-based, that consider behaviors on both the individual and group levels.

9. Fangkai Yang, Wenjie Yin, Mårten Björkman, and Christopher Peters.
“Impact of Trajectory Generation Methods on Viewer Perception of Robot Approaching Group Behaviors.”
29th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), (2020).
In this paper, we conducted an experiment to examine the impact of three trajectory generation methods for a mobile robot to approach groups from multiple directions: a Wizard-of-Oz (WoZ) method, a procedural social-aware navigation model (PM), and a novel generative adversarial model imitating human approach behaviors (IL).

Other relevant papers

1. Vanya Avramova, Fangkai Yang, Chengjie Li, Christopher Peters and Gabriel Skantze.
“A virtual poster presenter using mixed reality.”
Proceedings of the 17th International Conference on Intelligent Virtual Agents (IVA), (2017).
In this demo, we showcase a platform we are currently developing for experimenting with situated interaction using mixed reality. The user wears a Microsoft HoloLens and is able to interact with a virtual character presenting a poster.

2. Christopher Peters, Chengjie Li, Fangkai Yang, Vanya Avramova and Gabriel Skantze.
“Investigating social distances between humans, virtual humans and virtual robots in mixed reality.”
Proceedings of the 17th International Conference on Intelligent Virtual Agents (IVA), (2017).
In this paper, we summarise initial experiments we are conducting in which we measure comfortable social distances between humans, virtual humans and virtual robots in mixed reality environments.

3. Naresh Balaji Ravichandran, Fangkai Yang, Christopher Peters, Anders Lansner and Pawel Herman.
“Pedestrian simulation as multi-objective reinforcement learning.”
Proceedings of the 18th International Conference on Intelligent Virtual Agents (IVA), (2018).
In this work, we model pedestrians in a modular framework integrating navigation and collision-avoidance tasks as separate modules. Each such module consists of independent state spaces and rewards, but with shared action spaces.

4. Chengjie Li, Theofronia Androulakaki, Alex Yuan Gao, Fangkai Yang, Himangshu Saikia, Christopher Peters and Gabriel Skantze.
“Effects of posture and embodiment on social distance in human-agent interaction in mixed reality.”
Proceedings of the 18th International Conference on Intelligent Virtual Agents (IVA), (2018).
In this paper, we conducted an experiment in which participants were asked to walk up to an agent to ask a question, in order to investigate the social distances maintained, as well as the subject’s experience of the interaction. We manipulated both the embodiment of the agent (robot vs. human and virtual vs. physical) as well as the closed vs. open posture of the agent.

5. Himangshu Saikia, Fangkai Yang, Christopher Peters.
“Priority driven Local Optimization for Crowd Simulation.”
Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), (2019).
We provide an initial model and preliminary findings of a lookahead-based local optimization scheme for collision resolution between agents in large goal-directed crowd simulations.

6. Himangshu Saikia, Fangkai Yang, Christopher Peters.
“Criticality-based Collision Avoidance Prioritization for Crowd Navigation.”
11th International Conference on Virtual Worlds and Games for Serious Applications (VS-Games), (2019).
Our method resolves critical agents (agents that are likely to come within collision range of each other) in order of priority using a Particle Swarm Optimization scheme. The resolution involves altering the velocities of agents to avoid criticality.

7. Yuan Gao, Fangkai Yang, Martin Frisk, Daniel Hernandez, Christopher Peters and Ginevra Castellano.
“Learning Socially Appropriate Robot Approaching Behavior Toward Groups using Deep Reinforcement Learning.”
28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), (2019).
In this paper, we present a deep learning scheme that acquires a prior model of robot approaching behavior in simulation and applies it to real-world interaction with a physical robot approaching groups of humans.

Unpublished Manuscripts

1. Fangkai Yang, Yuan Gao, Ruiyang Ma, Sahba Zojaji, Ginevra Castellano and Christopher Peters.
“A Dataset of Human and Robot Approach Behaviors into Small Free-Standing Conversational Groups.”
Submitted to PLOS ONE.
We present CongreG8, a novel dataset that captures full-body human motions in an approach-to-join scenario, in which a newcomer (a human or a robot) moves towards and joins a free-standing conversational group consisting of three people.

2. Fangkai Yang, Christopher Peters.
“ISAN: Imitated Social Aware Navigation in Virtual Crowds with Groups and Obstacles.”
Submitted to IEEE Transactions on Games.
This paper presents a Social-Aware Navigation (SAN) method for crowds in serious games, and we augment the SAN model with Imitated Social-Aware Navigation (ISAN), which imitates the trajectories of SAN without the need to compute the social-aware space.

In the remainder of this thesis, I shall use the pronouns we, our, and us to indicate contributions in the aforementioned papers, as is standard when referencing contributions in the scientific literature.

Acknowledgments

A Ph.D. is like a marathon, and not everyone has the courage to try it. It is a long journey, full of pressure and depression, and I could not have made it without help from the people who accompanied me all the way along.

Thank you, Prof. Christopher Peters, for being my supervisor. In my Ph.D. journey, you have been a very supportive, optimistic, and energetic boss. I still remember the last question you asked me in my Ph.D. interview: ‘What do you think makes a good boss?’ You always did as I answered, i.e., you were the one I could turn to with any question. I still keep the draft of my first paper, full of your handwritten comments. Thank you for the freedom you gave me to explore research areas that I am interested in. I could never be grateful enough, and those good memories will keep inspiring me in my life and future career. I am and will always be lucky to be your Ph.D. student.

Thank you, Prof. Ginevra Castellano, for being my co-supervisor. You have always been very supportive of my experiment designs in Human-Robot Interaction. Your helpful guidance in this area continues to inspire me to be thoughtful.

To Prof. Mårten Björkman, thank you for your help with paper revisions and discussions. Thank you, Prof. Johan Hoffman, Prof. Björn Thuresson, and Prof. Stefano Markidis, for supporting my Ph.D. research; it has been a great pleasure to work with you at CST.

To Xi Chen, Wenjie Yin, and Yuan Gao, you are my most important and supportive friends in my Ph.D. journey. You are the first people who come to mind when I encounter a question in my research, and I know I can start a discussion right away without any hesitation.

To Himangshu Saikia, Maha El-Garf, and Sahba Zojaji, thank you for your generous help with my experiments and for brainstorming excellent ideas. It is my honor to be with you in our ESAL lab.

Thank you, Tianzhi Zhou, Shengmei Xiang, Zheng Wei, Jiaqiao Peng, Shenglin Yu, and Zhengyang Lv, for cheering me up through countless Swedish winters and nights.

To Shaobo Jin, my senior from high school to university, thank you for your guidance and help when I first arrived in Sweden; you are the one who introduced such a great place to me. To Lele Cao, it is such a fate that we became friends in Greece, and our friendship will always be my great treasure. To Zheng Ning, my oldest friend, I would never have come to Sweden without you. You know everything about Statistics, and you never tire of discussing research questions. I wish you the best in your future life and career.

Finally, I would like to thank my family and all my friends for your countless support. It is your faith in me that encourages me to overcome difficulties and become a better person.

Contents

Papers included in this thesis
Other relevant papers
Unpublished Manuscripts
Acknowledgments

1 Introduction
   1.1 Overview
   1.2 Concepts
   1.3 Related Work
   1.4 Motivation
   1.5 Contributions

2 Foundations for Group Interactions
   2.1 Contributions
   2.2 Virtual characters in social games
   2.3 Human-Machine Interactions via mixed reality

3 Crowd Simulation and Perception
   3.1 Contributions
   3.2 Neighbor selection in trajectory prediction
   3.3 Perception of groups in crowds

4 Approaching Behaviors into Small Groups
   4.1 Contributions
   4.2 Trajectory generation methods
   4.3 Dataset
   4.4 Group behavior recognition
   4.5 Evaluation

5 Conclusions
   5.1 Discussion
   5.2 Shortcomings

Bibliography


Chapter 1

Introduction

1.1 Overview

Mobile robots and virtual agents are increasingly involved in human lives, with their roles evolving from providing services to acting as companions and even partners. Constructing systems capable of interacting with humans while adapting to their social norms is crucial to this evolution. However, this is a challenging prospect due to the social sensitivities of humans, which are not only verbal in nature but also include non-verbal behaviors related to social distances, body gestures, and movements. Therefore, traditional approaches that model robot and agent behaviors by handcrafted features are not always robust when considering the complexity of shared environments and human perception. Robots and agents should adapt to humans and be socially compliant by taking human perception and sensitivities into consideration. Data on social behaviors from human-human interactions could be an important basis for learning and modeling socially compliant behaviors for robots and agents.

Most previous research concerning mobile robots or virtual agents has either focused on dyadic interactions or on crowd simulations [PPD∗10]. However, Human-Machine Interaction¹ at the level of small groups still remains largely underexplored. Small groups are ubiquitous in real-life scenarios, since it is natural for people to gather into clusters [CJ61, Jam53] to interact. It is the goal of this thesis to investigate how mobile robots and virtual agents consider human social norms when they interact with small groups of people.

This thesis focuses on non-verbal behaviors in HMI. While verbal behaviors are very important, they are explicit and may abruptly interrupt the ongoing conversations or activities within groups. Non-verbal behaviors have the potential to be more subtle and, for humans, may help minimize social embarrassment in certain situations [Gof08]. Further, non-verbal behaviors are crucial in Human-Machine Interactions, where robots and agents should be capable of understanding and predicting people’s intentions. For example, people may try to understand the intentions of others by considering their eye gaze, body gestures, social distance, and walking direction.

¹The term Human-Machine Interaction (HMI) will be used throughout this thesis to represent the work done both on Human-Agent Interaction and Human-Robot Interaction.

As a foundation for investigating social norms in HMI, a basic toolkit for virtual agents was developed in Section 2.2, and perceptual studies concerning proxemics between human users and virtual agents are presented in Section 2.3. With virtual agents varying in appearance, gender, and body gestures, participants were asked to take the most comfortable social distance during non-verbal interactions. The embodiment of a system may change the way that humans treat it. Therefore, the work presented here on dyadic interaction explores the impact of embodiment on human behaviors, and it provides foundations for the subsequent study and simulation of group interactions that account for proxemics and full-body behaviors. It also raises questions about whether and how social norms hold across different media, including Virtual Reality (VR), Mixed Reality (MR), and the physical world. The impact of embodiment on human perception is studied in crowd scenarios (Section 3.3) to investigate some of the circumstances in which humans perceive groups in high-density crowds.

As humans, we have to adapt to machines in our daily lives, for example by using external user interfaces to send commands to machines via mouse and keyboard. However, a socially compliant system should understand social norms and adapt to humans. By investigating social norms in one-to-N interactions, i.e., crowd simulation, neighbor selection models are developed (Section 3.2) for trajectory prediction to improve the adaptation of collision avoidance in crowds. Similarly, mobile robots should adapt to groups of people in group interactions by taking appropriate and socially acceptable trajectories (Section 4.2). This adaptation brings humans into the loop, such that the robots and agents should synthesize and understand social norms and human behaviors. Thus, machine learning approaches are proposed (Section 4.4) to recognize behaviors in group interactions so that robots can adapt their behaviors when they must join human groups. These behaviors are referred to in this thesis as approaching and joining behaviors. Since data is necessary for deep machine learning, this thesis collects novel datasets (Section 4.3) containing full-body behaviors in group interactions. Furthermore, experiments are presented in which robot joining behaviors generated by several previously developed methods are evaluated in a real-world scenario. The thesis concludes that machine learning approaches achieve better performance than procedural models relying on handcrafted features, by adapting to social norms and human sensitivities.

Within this thesis, Human-Machine Interaction is explored at the levels of dyadic interactions, crowd (one-to-N) interactions, and group interactions. The thesis focuses on group interactions as a bridge between dyadic interactions and crowd interactions, investigating the impact of robots and agents that are sensitive to some human social norms. The core contributions of the thesis are control algorithms based on machine learning approaches for realizing this. The thesis has implications for domains that have need of social robots and agents, from socially compliant mobile systems that navigate in the same spaces as humans, to prosocial demonstration systems for use in education and virtual interactions.

1.2 Concepts

This section provides a brief explanation of important background concepts related to social norms and group interactions in static and free-standing conversational groups of humans, as background for the research done in this thesis.

Proxemics

Hall [Hal66] proposed the important term proxemics, concerning social distance in social gatherings. It is typically divided into four distance zones corresponding to a person’s social relation to those they interact with: intimate, personal, social, and public distances. The intimate distance is for the most trusted and loved in one’s social circle, such as partners and siblings. Personal distance is reserved for friends and family, people one knows and trusts. Social distance is more common for acquaintances in social gatherings, such as sales or service providers. Finally, public distance is reserved for activities outside of personal involvement, e.g., public speaking. Social distance and its impact on HMI is an important consideration in this thesis.
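As a rough illustration, Hall’s zones can be encoded as simple distance thresholds. The metric boundaries below (0.45 m, 1.2 m, 3.6 m) are approximate values commonly cited for Hall’s model and vary across cultures and individuals; the function itself is only a sketch, not part of the thesis models.

```python
def proxemic_zone(distance_m: float) -> str:
    """Classify an interpersonal distance (in meters) into one of Hall's
    four proxemic zones, using commonly cited approximate boundaries."""
    if distance_m < 0.45:
        return "intimate"   # partners, siblings
    elif distance_m < 1.2:
        return "personal"   # friends and family
    elif distance_m < 3.6:
        return "social"     # acquaintances, service encounters
    else:
        return "public"     # e.g., public speaking

# A robot keeping roughly 1.5 m from a user sits in the social zone.
print(proxemic_zone(1.5))  # social
```

Such a hard-threshold view is exactly the kind of handcrafted feature that the data-driven methods later in the thesis try to move beyond.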

Groups

People tend to gather in smaller groups when they have a shared identity or goal. As proposed by Kendon [Ken88], there are two types of groups based on the type of focused attention within them. The first has jointly focused attention, such as conversational groups or groups of workers cooperating on a task. The second has common focused attention, such as an audience listening to a speech. Moreover, groups are also common in larger crowds, i.e., moving groups. People tend to congregate in smaller groups rather than walking alone, and group-level phenomena have greater prominence in crowds [CJ61, Jam53]. This thesis covers both static groups and moving groups, focusing on jointly focused attention groups, specifically conversational groups.

F-Formations

In a free-standing conversational group of humans, Kendon [Ken90] proposed the F-formation system to define the positions and orientations of individuals within a group. The o-space, proposed by Kendon as an exclusive space surrounded by the group members, has been used as a basis for models seeking to prevent individuals outside of the group from intruding into it, as shown in Figure 1.1(a). The p-space is a narrow strip that surrounds the o-space and contains the bodies of the participants, while the r-space is the area beyond the p-space. F-formations have various arrangements (Figure 1.1(b)-(e)); in the case of two participants, typical arrangements are face-to-face, L-shaped, and side-by-side. When there are more than three participants, a circular formation is typically formed.

Figure 1.1: (a) Kendon’s F-Formation [Ken90]; (b) face-to-face formation; (c) V-shaped formation; (d) side-by-side formation; (e) L-shaped formation. The blue circles represent o-spaces.
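The o-space concept above admits a minimal geometric sketch: if the o-space is approximated as a circle around the group centroid (a simplification; actual detectors fit it from members’ positions and orientations), a candidate trajectory point intrudes when it falls inside that circle. The `margin` heuristic below is an assumption for illustration, not a value from Kendon or from this thesis.

```python
import math

def o_space_center_and_radius(member_positions, margin=0.4):
    """Approximate the o-space as a circle centered at the group centroid,
    with radius reaching the closest member minus a body-size margin.
    This is an illustrative heuristic, not Kendon's formal definition."""
    n = len(member_positions)
    cx = sum(x for x, _ in member_positions) / n
    cy = sum(y for _, y in member_positions) / n
    r = min(math.hypot(x - cx, y - cy) for x, y in member_positions) - margin
    return (cx, cy), max(r, 0.0)

def intrudes_o_space(point, member_positions):
    """True if a trajectory point falls inside the approximated o-space."""
    (cx, cy), r = o_space_center_and_radius(member_positions)
    return math.hypot(point[0] - cx, point[1] - cy) < r

# Three people standing in a circular formation of roughly 1 m radius.
group = [(1.0, 0.0), (-0.5, 0.87), (-0.5, -0.87)]
print(intrudes_o_space((0.0, 0.0), group))  # group center: True
print(intrudes_o_space((2.0, 2.0), group))  # well outside: False
```

A socially aware planner would reject (or penalize) waypoints for which this check returns True, which is the constraint the approach-trajectory models in Chapter 4 must satisfy.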

1.3 Related Work

Small Groups

Large clusters of people are often associated with crowd simulation. Numerous works have simulated human behaviors in large crowds at both the macroscopic [Hug03, TCP06, NGCL09] and the microscopic [HM95, VdBLM08, YCDD05] scales. In order to capture the behaviors and interactions of each individual, this thesis focuses on the microscopic perspective, where small group simulation has attracted more attention.

As mentioned in Section 1.2, there are basically two main types of groups: moving groups and free-standing conversational groups. In dynamic crowds, people tend to congregate in smaller groups rather than walking alone [CJ61], and ‘group’ here is used in its sociological sense, as stated in [Har76]. The spatial formations of moving groups have been examined through empirical studies [PE09, MPG∗10, KO10]. Karamouzas et al. [ALSN09] modeled moving group formations based on an empirical study by Moussaïd et al. [MPG∗10]. Besides the empirical studies, works have also simulated moving groups via certain patterns, such as the leader-follower pattern [VMdO03] and the abreast, UV, and river patterns [FGMV12].

However, free-standing conversational groups, or F-formations, are commonly seen in static crowds, such as cocktail parties [RVS∗15] and poster sessions [APYR∗15]. Kendon [Ken90] proposed the F-formation system to define the positions and orientations of individuals within a conversational group. The o-space, an exclusive space surrounded by the group members, has been used in F-formation systems to keep people outside the group from intruding into it.

In this thesis, individuals in an F-formation are considered quasi-dynamic, in the sense that while the group as a whole is not moving, individual members within it may shift position and orientation, and the agent may therefore need to update its approach trajectory. The machine learning methods proposed here are thus adaptive to both static and quasi-dynamic small groups, and they perform better than procedural models in quasi-dynamic group situations.

Approaching Group Behaviors

Recently, there have been many studies concerning mobile robot behaviors for approaching humans. They can be classified into two general categories: a) approaching an individual human, or b) approaching a group of humans. Research concerning the converse case, of humans approaching a robot either alone or in a group, is not considered in this thesis.

The first category typically involves proxemics: social distances between humans and robots or virtual agents. Mead and Mataric [MM15] presented an experiment on how a human perceives robots during interactions at different distances. Peters et al. [PYS∗18] proposed comfortable distances when a virtual robot approaches a human user, using Hall’s model [Hal10]. Walters et al. [WDWK07] showed that people prefer to be approached within their field of view (FOV) rather than from directly behind, and Ball et al. [BRSTV17] also found that people feel uncomfortable when the approaching agent cannot be seen. Approach behaviors towards humans in different states, such as sitting, walking, or standing, have been studied in [KSAO∗14, TCJ13, CTWB13].

Figure 1.2: Images from a cocktail party dataset [RVS∗15]. A man (green triangle overhead) tries to approach a group (red circle). However, the closest joining position is occupied, so he goes around the group to find a place to join without intruding upon the o-space of the group members.

The second category of behaviors, in which an individual approaches a group of humans, has been studied in [TN18, GMG14, NSPB15]. As mentioned, there are works concerning small groups and individual behaviors separately. However, less work brings these together by modeling the interactions between small groups and individual agents, i.e., the case of an agent approaching a group. For free-standing conversational groups of humans, Kendon [Ken90] proposed the F-formation system to define the positions and orientations of individuals within a group. The o-space, proposed by Kendon as an exclusive space surrounded by the group members, has been used as a basis for models seeking to prevent robots from intruding into the group. For example, a robot outside a group that wants to approach and join it needs to calculate a trajectory that does not intersect the o-space, as shown in Fig. 1.2. Leveraging the F-formation system, Truong et al. [TN18] proposed a framework to enable a robot to approach a human group safely and socially. Althaus et al. [AIK∗04] developed a topological map-based model for approaching a group. Gómez et al. [GMG14] extended a fast marching algorithm to navigate a robot towards engaging a group of people. Returning to the domain of human-agent interaction, Cafaro et al. [CRO∗16] estimated users' attitudes towards other users embedded in the virtual environment, for example, in order for them to join a group of agents. Pedica et al. [PV08] augmented the Social Forces model [JT07, HM95] to model agents' group-approach behaviors, with attractive forces driving towards the target and repulsive forces encouraging collision avoidance. However, these methods relied on hand-crafted distance features or predefined rules. Other recent works [BRSTV17, VCM∗17] investigate the factors in a conversational human group that may impact robot behaviors, such as human orientations and robot approach distances. This thesis considers both of the aforementioned categories, with a focus on approaching small group behaviors, i.e., when an agent or a robot, referred to in this thesis as the newcomer, approaches a small group to join it.
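To illustrate the role of the o-space in approach planning, the following sketch approximates the o-space as a circle around the group centroid, tests whether a straight-line approach would intrude on it, and picks a joining position in the largest angular gap between members. This is a simplification for illustration only; the centroid-based circle and the margin parameter are assumptions, not the models proposed in this thesis:

```python
import math

def ospace(members, margin=0.4):
    """Approximate the o-space as a circle: centre at the members' centroid,
    radius the mean member distance to the centroid plus a safety margin
    (margin is an assumed, illustrative parameter)."""
    cx = sum(x for x, y in members) / len(members)
    cy = sum(y for x, y in members) / len(members)
    r = sum(math.hypot(x - cx, y - cy) for x, y in members) / len(members) + margin
    return (cx, cy), r

def segment_hits_circle(p, q, centre, r):
    """True if the straight segment p->q passes through the circle."""
    (px, py), (qx, qy), (cx, cy) = p, q, centre
    dx, dy = qx - px, qy - py
    # parameter of the closest point on the segment to the circle centre
    t = ((cx - px) * dx + (cy - py) * dy) / max(dx * dx + dy * dy, 1e-9)
    t = min(1.0, max(0.0, t))
    return math.hypot(px + t * dx - cx, py + t * dy - cy) < r

def joining_position(members, margin=0.4):
    """Midpoint of the largest angular gap between members on the o-space
    boundary: a free slot where a newcomer could stand."""
    centre, r = ospace(members, margin)
    angles = sorted(math.atan2(y - centre[1], x - centre[0]) for x, y in members)
    gaps = [(angles[(i + 1) % len(angles)] - a) % (2 * math.pi)
            for i, a in enumerate(angles)]
    i = max(range(len(gaps)), key=gaps.__getitem__)
    mid = angles[i] + gaps[i] / 2
    return (centre[0] + r * math.cos(mid), centre[1] + r * math.sin(mid))
```

A planner would then route the newcomer around the circle whenever `segment_hits_circle` flags an intrusion, heading for the slot returned by `joining_position`, as in the detour of Fig. 1.2.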

Data-driven Models

Most previous works are either experimental studies or computational models implemented and validated in simulation. All assume that group members remain entirely static, i.e., do not change body orientations or positions at all during a conversation. Data-driven models could address these limitations and support planning approaching group behaviors in a human-like and socially acceptable manner, but such models have been lacking.

With the advent of deep learning, data-driven methods based on Recurrent Neural Networks (RNNs) and variants such as Long Short-Term Memory networks (LSTMs) have been used to simulate interactions and behaviors in crowds. These methods model human interactions by capturing neighboring information. One of the most popular RNN-based methods is Social LSTM [AGR∗16], which represents each pedestrian with an LSTM and uses a local pooling method to gather information from nearby neighbors. It has been further augmented in [VMO18] with a spatio-temporal graph-based social attention model, and in [FDSF18b] with a soft and hardwired attention framework. Zhang et al. [ZOZ∗19] refined states for LSTM networks in order to capture the current intention of the neighbors.
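The local pooling idea can be sketched as follows: neighbors' hidden states are summed into a spatial grid centred on the queried pedestrian, so the network sees both who is nearby and roughly where. Grid and cell sizes are illustrative placeholders; this is a sketch of the idea, not the exact formulation of [AGR∗16]:

```python
import numpy as np

def social_pooling(query_pos, neighbor_pos, neighbor_hidden,
                   grid_size=4, cell_size=0.5):
    """Social-LSTM-style pooling (sketch): sum neighbours' hidden states into
    a grid_size x grid_size grid of cells centred on the queried pedestrian.

    neighbor_pos: (N, 2) array of positions; neighbor_hidden: (N, D) array.
    Returns a (grid_size, grid_size, D) tensor to be fed to the LSTM.
    """
    D = neighbor_hidden.shape[1]
    pooled = np.zeros((grid_size, grid_size, D))
    half = grid_size * cell_size / 2.0
    for pos, h in zip(neighbor_pos, neighbor_hidden):
        dx, dy = pos[0] - query_pos[0], pos[1] - query_pos[1]
        if abs(dx) >= half or abs(dy) >= half:
            continue  # neighbours outside the local grid are ignored
        i = int((dx + half) // cell_size)
        j = int((dy + half) // cell_size)
        pooled[i, j] += h
    return pooled
```

The hard cut-off at the grid boundary is exactly the "grid-like filter" behavior discussed in Section 3.2, which the neighbor selection model of this thesis replaces with a perception-based criterion.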


Moreover, trajectory prediction methods based on Generative Adversarial Networks (GANs) [GPAM∗14], such as [SKS∗18, FDSF18a, GJFF∗18], have shown promising results in capturing interactions between pedestrians and the surrounding environment. Social GAN [GJFF∗18] proposed a pooling mechanism to learn social norms in a data-driven approach. SoPhie [SKS∗18] leveraged social and scene attention-based mechanisms to generate a distribution of predicted trajectories. GD GAN [FDSF18a] augmented the task of trajectory prediction with group detection to discover social group interactions. GAN models can be used to predict trajectories based on a training set and starting conditions; they can then also be used to generate new trajectories.

However, none of these methods focus specifically on predicting a human's or robot's trajectory when approaching a group. Moreover, only position information is considered in these works, since their training datasets contain few or no conversational groups, and orientation information is ignored or set to the moving direction by default. To overcome these challenges, this thesis proposes machine learning methods trained to generate behaviors for robots to approach small groups of people. The training data is not limited to 2D information such as positions and orientations; full-body behaviors of groups are used to train the machine learning models. Furthermore, the models are adapted to group dynamics, where individuals in the group retain some mobility rather than being totally static.

Group interaction datasets

Human-human interaction databases have been extensively reviewed in several surveys, including [SP18, BCC13, ZZZ∗19]. Unlike datasets containing individual action recordings, human-human interaction datasets, i.e., those containing multiple interacting humans, are relatively scarce. One is the CMU Panoptic Dataset [JSL∗17], in which different kinds of interactions, such as dance and haggling, are collected. This dataset's advantage is that the recordings are relatively accurate, although a disadvantage is that the recording space is limited if trajectories are to be considered. Another is the BARD dataset [CIOP14], which records human interactions in the wild with a focus on human behavior analysis in video sequences with multiple targets. However, there is no particular joining behavior in these scenarios. In addition, the recent MHHRI dataset [CSG17] focuses on analyzing and comparing the natural behavior of human-human and human-robot interactions. Although it contains trajectories of head and hand movement, group approach behaviors are not considered. The SALSA dataset [APSS∗15], the MatchNMingle dataset [CQDG∗18], and the Idiap Poster dataset [HK11] contain a limited number of group approach behaviors with only position and orientation information. There is thus still a shortage of datasets with enough training data to learn and understand group behaviors. This thesis collects a dataset, called CongreG8 (see Section 4.3), that contains full-body motion-captured data of robots and humans interacting with small groups. The dataset collected in this thesis provides foundations for training machine learning models that recognize group behaviors and generate approaching group behaviors.

1.4 Motivation

As noted in previous sections, most research has focused on either dyadic interactions or crowd simulations [PPD∗10]; there are fewer studies modeling multi-party interactions at the level of small groups, and fewer still on behaviors for approaching small groups. Yet such behaviors are crucial for natural interactions in many real-world situations. First, group interactions are pervasive when people gather in clusters (e.g., cocktail parties, poster sessions), and it is important to understand such social gatherings in various scenarios. Second, group interaction is not merely an assembly of multiple dyadic interactions. The way people behave in a group is different from how they behave in dyadic interactions [Zaj65]. Group interactions pose additional challenges related to group dynamics, such as formation control, navigation, and turn-taking. When considering embodied agents, how should these agents be positioned and oriented so that they can participate in the conversation while also interacting with people outside of the group, such as newcomers? Moreover, what paths and behaviors should robots or agents adopt when approaching small groups in order to make the people in the groups feel comfortable? Finally, when it comes to perception, little is known about how perception models perform when tested with real groups of people, and many procedural models are tested only in simulation. This motivates the thesis to include perception as a consideration, i.e., how humans may perceive an agent or a robot, and how the agent should interpret the behaviors of group members when interacting with the group.

Another critical reason to research this topic is that previous experimental studies on group interaction typically do not propose computational models, and when computational models are proposed, they may only be evaluated in simulations with handcrafted features. Although these methods and experiments involve safe and socially acceptable paths, collectively they have shortcomings that limit their utility as a full solution for simulating group interactions. As introduced in Section 1.3, data-driven models may generally perform better and do not need delicate parameter design and tuning. This thesis aims to develop data-driven models and evaluate them against people's perceptions while comparing them with previous methods in group interaction. Additionally, current research often focuses on fully static groups within the environment and does not properly consider group members' full-body behaviors or localized shifts in their positions within the group. In contrast to such totally static groups, in real situations individuals in a group may retain some mobility, for example, changing their positioning within a formation during the course of an interaction to accommodate newcomers, or reforming due to the departure of a group member (Figure 1.3). Thus, this thesis presents machine learning methods that not only consider group dynamics but also cover some full-body behaviors of small groups.

Figure 1.3: Members of the conversational group (red circle) are quasi-dynamic, changing body positions and orientations. The robot initially plans a path (red curve) to approach and join the group from the front without interrupting the conversation (left). However, the two people in the group change positions and body orientations (right), and the robot changes its path accordingly (green curve) to continue to approach the group from the front.

1.5 Contributions

The research reported in this thesis contributes to the area of group interaction by simulating approaching and joining group behaviors, understanding group dynamics, and evaluating the resulting models.

In Chapter 2, work on dyadic interactions concerning proxemics is presented through social games and interactive mixed reality demos. This work explores Human-Machine Interaction in virtual environments and provides parameters and inspiration for the models and simulations in the following chapters. The virtual character behavior toolkit developed within this work builds a technical foundation for simulating and visualizing group interactions.

Chapter 3 presents the research of this thesis on crowd simulation, extending the research on proxemics. A group perceptual study investigates humans' perception of small groups in crowds with various densities and viewpoints. It also inspires a neighbor selection model that increases the accuracy of pedestrian trajectory prediction.

Finally, Chapter 4 presents the focus of this thesis: small group interactions. It presents novel methods, both procedural and machine learning based, for generating behaviors with which robots approach groups of humans. The machine learning methods enable robots to join quasi-dynamic groups. To train the machine learning models, motion capture techniques are used to collect full-body behaviors of small groups. These methods are evaluated in real-world scenarios with robots interacting with groups of people.

Chapter 2

Foundations for Group Interactions

This chapter presents an overview of the research of this thesis in relation to Human-Agent Interaction at the level of dyadic interactions within virtual environments. The animation toolkit in this chapter provides the foundations for the virtual simulated group interactions, especially in animating groups of characters, and it also demonstrates some possible application areas of this research.

2.1 Contributions

In Section 2.2, this thesis introduces virtual characters and their behaviors in social games by presenting a simple toolkit and a social game developed upon it. Section 2.3 further presents the abilities of virtual characters in Human-Machine Interaction, specifically in mixed reality concerning proxemics.

Papers included in this thesis

• Fangkai Yang, Chengjie Li, Robin Palmberg, Ewoud Van Der Heide and Christopher Peters. "Expressive virtual characters for social demonstration games." 9th International Conference on Virtual Worlds and Games for Serious Applications (VS-Games), (September 2017).

• Christopher Peters, Fangkai Yang, Himangshu Saikia, Chengjie Li, Gabriel Skantze. "Towards the use of mixed reality for HRI design via virtual robots." Proceedings of the 1st International Workshop on Virtual, Augmented, and Mixed Reality for HRI (VAM-HRI), (2018).



Other relevant papers

• Vanya Avramova, Fangkai Yang, Chengjie Li, Christopher Peters and Gabriel Skantze. "A virtual poster presenter using mixed reality." Proceedings of the 17th International Conference on Intelligent Virtual Agents (IVA), (2017).

• Christopher Peters, Chengjie Li, Fangkai Yang, Vanya Avramova and Gabriel Skantze. "Investigating social distances between humans, virtual humans and virtual robots in mixed reality." Proceedings of the 17th International Conference on Intelligent Virtual Agents (IVA), (2017).

• Chengjie Li, Theofronia Androulakaki, Alex Yuan Gao, Fangkai Yang, Himangshu Saikia, Christopher Peters and Gabriel Skantze. "Effects of posture and embodiment on social distance in human-agent interaction in mixed reality." Proceedings of the 18th International Conference on Intelligent Virtual Agents (IVA), (2018).

2.2 Virtual characters in social games

Human-Machine Interactions are often integral to game-based learning environments, especially those involving social scenarios. A common requirement in such systems is to enable social behaviors via animated virtual characters, which represent players or non-player characters. These characters can take on a wide range of roles in game-based learning environments, where they may become the focus of the game. They are suitable for such roles since they are capable of acting in a controllable and repeatable manner. However, there are several limitations concerning the use of sophisticated 3D characters in such environments, such as the level of technical skill required for implementation and the cost involved.

More specifically, a common requirement is that games maintain engagement by remaining challenging and adapting to the player's skill level. Virtual characters support various challenges that meet both gameplay and learning objectives by allowing the complexity of their appearance and behaviors to be precisely altered in a controllable manner [Pel05]. Moreover, virtual characters are capable of a range of expressive full-body behaviors [HMP05], communicating social cues and both basic and complex emotional states. Since humanoid virtual characters often share a similar embodiment to humans, studies of human behavior also provide insights towards creating behaviors for virtual characters. In both areas, studies generally focus on the facial area, the body (excluding facial expressions [ONP15]), full face and bodily expressions, and higher-level expressions potentially associated with impressions of social character [KKPS14], e.g., trustworthiness and cooperation. Furthermore, the embodiment and appearance of virtual characters may be changed, and their behaviors varied, to support generalization while maintaining engagement.

To address the aforementioned challenges, work related to this thesis presented a simple toolkit for developing expressive virtual character behaviors [YLP∗17]. The toolkit sits on top of the Unity game engine1 and provides high-level character behaviors for quickly creating social games, i.e., a bridge between the game engine and social games. It enables social interactions for individuals and groups in social game scenarios by controlling their behaviors around the environment. As an accelerator, it allows more time to be devoted to designing pedagogical game scenarios, taking the attention away from the fundamental technical development work involved in animating expressive virtual characters.

The virtual character behavior toolkit

The toolkit introduced in this thesis is designed for developing virtual character behaviors aimed specifically at social game scenarios involving small groups. Its aim is not to provide a single game scenario or set of characters, but rather a development kit for creating a wide range of scenarios around social behaviors and interactions that supports a variety of character appearances.

The behavior toolkit manages the control of the face, full-body, and formation behaviors [JT07] of individuals and groups of virtual characters (see Figure 2.1). It also supports peripheral features required for working with virtual characters, such as camera, player, and navigation control, and a number of special features, such as the copy controller, which enables and determines how sets of human facial expressions are mapped onto the faces of virtual characters in real-time.

A number of existing virtual characters are supported in the toolkit, combined with existing predefined behavior labels. Since the style and needs of each game differ, the expectation is that game designers will eventually design, rig, and create predefined animations for their own characters, which they will associate with the existing predefined labels in the toolkit. The process for doing this involves binding the facial expressions and full-body animations of the new characters.

Emotions with Friends

A small game demonstrator, Emotions with Friends, concerning the recognition of social cues and emotions has been created, featuring two levels of difficulty and facial and full-body behaviors for two virtual characters whose embodiments differ in terms of realism. The demonstrator's initial pedagogical aim has been to raise awareness and discussion possibilities with students about face and body expressions and to enrich their emotional vocabulary, i.e., through the use of a range of emotion labels.

1 https://unity.com/


Figure 2.1: Virtual characters in a group welcome and make space for the player (left). A character in the group is introducing the player to the rest (right).

This toolkit is also the basis for the simulations involving virtual characters throughout this thesis. It can create a wide range of scenarios concerning various social behaviors and interactions within a short period of time, and these scenarios are useful for the visual evaluation of simulations and for perceptual studies. In the following chapters, this thesis presents research supported by the toolkit.

2.3 Human-Machine Interactions via mixed reality

Mixed reality environments enable the design of social interaction experiences in which virtual characters may feel present while the user remains in contact with the surrounding environment. Since users are free to move around the environment, as in virtual reality, the issue of social distance is fundamental. Few studies have previously investigated social distances or related factors (varying character types, gender, viewpoint) in mixed reality environments. Mixed reality is significant since it is not based fully in either the physical or the virtual environment; instead, it anchors and embeds virtual objects into the real environment. If mixed reality interactions share similarities with their purely physical counterparts, there is the potential to design real interactions without the need for physical objects, which may be expensive or susceptible to breakdown (e.g., in the case of complex social humanoid robots).

As described previously, a virtual character toolkit was developed for quickly creating social scenarios with expressive virtual characters. Building on it, this thesis proposed a mixed reality platform for designing social interactions with robots through the use of virtual robots. Such a platform allows design possibilities to be investigated and experiments to be conducted with participants without physical robots needing to be present and available. Furthermore, changes can be made to the virtual robots that would be very expensive to make to actual robots.


The platform involves four major components: the IrisTK2 multimodal dialogue framework [SAM12], a toolkit to animate full-body behaviors for virtual characters, Unity3D, and the Microsoft HoloLens. An overview of the platform is shown in Figure 2.2. The Microsoft HoloLens is used to display mixed reality characters (referred to as holograms) that are rendered in the Unity 3D game engine. The HoloLens allows virtual objects to be world-locked, i.e., they stay in the same world-space position even when the user rotates their head or walks around. Coordinate values are expressed in meters, allowing objects to be easily rendered at real-world scale and real-world distances to be measured efficiently.

Figure 2.2: Overview of a platform being developed for the project. Unity 3D visualizes the virtual scene in the mixed reality environment through the HoloLens, animating characters using a series of behavior controllers.

2 http://www.iristk.net

Virtual poster presenter

An initial version of the platform was demonstrated in a scenario involving a human-sized mixed reality virtual poster presenter (see Figure 2.3). This demo was selected as an initial use-case for the platform and the design process. It serves not only as an initial exploration of the utility of the HoloLens, but also as an entry point to the research questions that run throughout this thesis. For example, if multiple listeners are standing in front of the poster, the virtual poster presenter can distinguish which listener to interact with based on speech recognition. This inspired the research on Human-Machine Interaction with small groups.

Figure 2.3: A virtual poster presenter scenario [AYL∗17], demonstrating an initial version of the platform and representing one of the use-cases in the project.

The virtual poster presenter demo raises the following questions: If the virtual human in the scenario is replaced with a human, a virtual robot, or a physical robot, what impact does each of these embodiments and their associated behaviors have on the interaction, and how could the appearance and behavior of the virtual robots be varied to better match the other cases? If a mobile robot or an agent wants to initiate a conversation with a group of people, what kinds of trajectories or behaviors should it adopt in order to be socially acceptable and make people within the group feel comfortable?

Human-Machine Interaction in proxemics

As mentioned in Section 1.2, proxemics is a significant factor in a variety of models and interactions. However, there are fewer studies of proxemics in virtual environments; for example, it is unclear whether distance zones with similar parameters apply in virtual interactions. Proxemics has been explored not only for creating plausible-looking groups [EO12], but also as one of a number of modalities that help agents signal their attitudes towards users embedded in the virtual environment, for example, in order for them to join a group of agents [CRO∗16]. It thus acts as a fundamental building block of the group interaction simulations.


This part of the thesis focuses on proxemics: how humans use and manipulate distances to others in social behavior. The thesis aims to explore the connections between proxemics in the real world and in virtual environments. It conducted, to the best of our knowledge, one of the first studies comparing proxemics and other interpersonal factors with respect to varying embodiments of an agent, such as type (human or robot), gender (male or female), and size (standard size or human size) [PYS∗18]. Participants were asked to walk up to an agent to ask a question, in order to investigate the social distances maintained and the subjects' experience of the interaction. The virtual agent was displayed using a mixed reality headset (see Figure 2.4).

Figure 2.4: The experiment investigates proxemics between humans and virtual characters projected into physical environments via the HoloLens. The virtual characters vary in type (human or robot), gender (male or female), and size (standard size or human size) [PYS∗18].

Work related to this thesis further extends the perceptual study by comparing with physical Human-Human/Robot Interaction [PLY∗18], with additional interpersonal factors with respect to posture (open or closed) and virtuality (physical or virtual) (see Figure 2.5). Expected outcomes were observed in several cases, such as preferences for the physical environment, including higher engagement in it; this is at least partially attributable to the limited field of view afforded by the headset. There were also some unexpected observations, such as participants remaining slightly further back on average for the open posture than for the closed posture. One possibility is that the closed posture is indicative of a troubled state, which might attract concern from others. However, participants reported the open posture as more engaging and expressed more willingness to interact with it, which shows that willingness to interact and proximity may not always be strongly correlated. There was no significant effect of prior knowledge of VR or of gender on the results.


Figure 2.5: The virtual characters vary in type (human or robot), posture (open or closed) and virtuality (physical or virtual).

Chapter 3

Crowd Simulation and Perception

Chapter 2 focused on dyadic interactions, specifically concerning proxemics in virtual environments. As mentioned in Section 2.3, proxemics is important for creating plausible-looking groups. This chapter presents an overview of how this thesis extends the previous research on proxemics in virtual environments by establishing a neighbor selection model for crowd simulation. The latter half of this chapter presents a perceptual study of small groups in crowds, introducing an environment in which small groups exist.

3.1 Contributions

In Section 3.2, this thesis extends the previous research on proxemics by proposing a neighbor selection model based on the perception of the agent. The proposed model is shown to achieve better trajectory prediction than state-of-the-art methods. Moreover, the perception model is applied to crowd simulation together with a novel method based on criticality. Section 3.3 extends the crowd simulation model of this thesis by explicitly simulating groups in crowds and performing a perceptual study to find out under which conditions small groups should be simulated in crowd situations.

Papers included in this thesis

• Fangkai Yang, Himangshu Saikia, Christopher Peters. "Who are my neighbors? A perception model for selecting neighbors of pedestrians in crowds." Proceedings of the 18th International Conference on Intelligent Virtual Agents (IVA), (2018).

• Fangkai Yang, Jack Shabo, Adam Qureshi, Christopher Peters. "Do you see groups? The impact of crowd density and viewpoint on the perception of groups." Proceedings of the 18th International Conference on Intelligent Virtual Agents (IVA), (2018).

Other relevant papers

• Naresh Balaji Ravichandran, Fangkai Yang, Christopher Peters, Anders Lansner and Pawel Herman. "Pedestrian simulation as multi-objective reinforcement learning." Proceedings of the 18th International Conference on Intelligent Virtual Agents (IVA), (2018).

• Himangshu Saikia, Fangkai Yang, Christopher Peters. "Priority driven Local Optimization for Crowd Simulation." Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), (2019).

• Himangshu Saikia, Fangkai Yang, Christopher Peters. "Criticality-based Collision Avoidance Prioritization for Crowd Navigation." 11th International Conference on Virtual Worlds and Games for Serious Applications (VS-Games), (2019).

3.2 Neighbor selection in trajectory prediction

In Chapter 2, this thesis investigated proxemics in Human-Machine Interaction, which inspires the development of a perception model for trajectory prediction. Pedestrians perceive and act according to a variety of information present in the environment, including the surroundings and the relative positions, velocities, and perceived intents of other pedestrians, some located quite far away, and they process all of this in a streaming fashion to successfully navigate their path. Designing virtual agents that perform similar processing of their surroundings and evaluate their path trajectories correctly is of great importance in social robotics [LSTA10], urban planning for public safety [BDD03], and realistic simulation of virtual crowds [LMM03], among other areas.

However, pedestrian trajectory prediction is a challenging problem, as the future positions of an agent are determined by its previous positions and by the interaction of the agent with its neighbors. Previous methods either take all pedestrians in the scene as the neighbors of the predicted agent [VMO18, HM95, TMMK13] or use a grid-like filter to select nearby neighbors [AGR∗16]. Both approaches suffer from problems in some cases. The former methods are not efficient in large crowded scenarios, as the trajectory prediction of a queried agent involves all agents. Moreover, they end up assigning high attention weights to agents who are far away from the queried agent and/or moving in the opposite direction, even though such agents might have little to no impact on the queried agent's trajectory. The latter methods, on the other hand, do not achieve performance comparable to the former ones. To overcome these challenges, this thesis proposes a novel approach [YSP18] for selecting the neighbors of an agent by modeling its perception as a combination of a location model and a locomotion model.

The overall architecture of the proposed neighbor selection model consists of two parts (see Figure 3.1). The location model selects interesting neighbors out of all the agents based on their proximity and bearing angles. The locomotion model selects the agents with a high risk of future collisions based on their angular and tangential velocities. The formal details are not included in this thesis but are presented in [YSP18]. In addition, this thesis adopts distance parameters from the previous study on proxemics in the neighbor selection model.
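The location model's selection step can be sketched as follows, combining an ellipse elongated along the agent's heading with a forward field-of-view sector. All parameter values here are illustrative placeholders, not the calibrated values from [YSP18]:

```python
import math

def select_neighbors(agent_pos, agent_vel, others,
                     a=4.0, b=2.0, fov_deg=120.0, sector_range=8.0):
    """Location-model sketch: keep agents inside an ellipse around the
    queried agent (elongated along its heading, semi-axes a, b) or inside
    a forward-facing field-of-view sector of the given range.

    Returns indices into `others` of the selected neighbours.
    """
    heading = math.atan2(agent_vel[1], agent_vel[0])
    selected = []
    for idx, (ox, oy) in enumerate(others):
        dx, dy = ox - agent_pos[0], oy - agent_pos[1]
        # rotate the offset into the agent's frame (x axis = heading)
        fx = dx * math.cos(-heading) - dy * math.sin(-heading)
        fy = dx * math.sin(-heading) + dy * math.cos(-heading)
        in_ellipse = (fx / a) ** 2 + (fy / b) ** 2 <= 1.0
        bearing = abs(math.degrees(math.atan2(fy, fx)))
        in_sector = bearing <= fov_deg / 2 and math.hypot(fx, fy) <= sector_range
        if in_ellipse or in_sector:
            selected.append(idx)
    return selected
```

Only the agents passing this filter are then handed to the locomotion model, keeping the prediction cost bounded even in large crowds.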

Figure 3.1: (a) The location model. For an agent Ai, this consists of an ellipse Ei and a sector Si. (b) The effect of the bearing angle velocity on the risk of collision. Assume two agents (represented by the green and blue circles), one moving vertically up and the other horizontally towards the left. If the bearing angle between the two reduces with time (i.e., θ̇ < 0), the blue agent will pass in front of the green agent and not collide. If the bearing angle increases with time (i.e., θ̇ > 0), the green agent passes the blue agent and again there is no collision. However, if θ̇ ∼ 0, the two agents will probably collide.
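The bearing-angle criterion can be estimated numerically: if the bearing of one agent as seen from another stays roughly constant (θ̇ ≈ 0), the pair is on a likely collision course. A finite-difference sketch, where the threshold `eps` is an assumed illustrative value:

```python
import math

def bearing_rate(p1, v1, p2, v2, dt=0.1):
    """Finite-difference estimate of the bearing-angle change (theta-dot)
    of agent 2 as seen from agent 1, assuming constant velocities."""
    def bearing(pa, pb):
        return math.atan2(pb[1] - pa[1], pb[0] - pa[0])
    now = bearing(p1, p2)
    nxt = bearing((p1[0] + v1[0] * dt, p1[1] + v1[1] * dt),
                  (p2[0] + v2[0] * dt, p2[1] + v2[1] * dt))
    # wrap the angular difference into (-pi, pi] before dividing by dt
    d = (nxt - now + math.pi) % (2 * math.pi) - math.pi
    return d / dt

def collision_risk(p1, v1, p2, v2, eps=0.05):
    """Constant-bearing criterion: near-constant bearing => likely collision."""
    return abs(bearing_rate(p1, v1, p2, v2)) < eps
```

Two agents heading straight at each other keep a constant bearing and are flagged, while a pair whose bearing drifts is pruned from the neighbor set.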

The model was evaluated on three publicly available datasets: ETH [PESVG09], UCY [LCL07], and Pedestrian Walking Path (PWP) [YLW15]. The method's performance was demonstrated by comparing it with existing state-of-the-art methods, such as Social LSTM [AGR∗16] and Social Attention [VMO18], on these datasets. The results show that the neighbor selection model improves the overall accuracy of trajectory prediction and enables prediction in scenarios with large numbers of agents, in which other methods do not scale well. To better visualize the performance, random cases are selected. Figure 3.2 shows an example presenting the difference between the Social Attention model and the proposed model, and Figure 3.3 shows that with the neighbor selection model, the attention weights are assigned correctly to the corresponding neighbors.

The neighbor perception model was also applied to crowd simulation. Goal-directed agent navigation in crowd simulations involves a complex decision-making process.

22 CHAPTER 3. CROWD SIMULATION AND PERCEPTION

Figure 3.2: An example illustrating the difference between the Social Attention model and the proposed model. (1) Prediction accuracy: The solid dots represent the ground truth positions, and the '+' markers represent the predicted positions. As can be observed, the proposed method predicts future positions more accurately, and there is less deviation between the true positions and the predicted positions. (2) Neighbor importance: The queried agent, whose neighbors are being estimated for importance, is shown in red. The blue diamond markers represent the current positions of the various agents. The circular radii represent their attention weights. The Social Attention model assigns high attention weights to agents who are far away from the queried agent and/or moving in the opposite direction. The proposed model successfully prunes such agents and assigns high weights to only those likely to influence the queried agent's trajectory.

An agent must avoid all collisions with static or dynamic obstacles (such as other agents) while keeping a trajectory faithful to its target. Here, the concept of criticality in [SYP19] is used to break down the global optimization problem into small local optimization problems by resolving critical agents, i.e., agents that are likely to come within collision range of each other, in order of priority using a Particle Swarm Optimization scheme. Figure 3.4 shows a test scenario in which the proposed model, unlike the compared methods, achieves a crowd with lane formations. Results also show that the navigation problem can be solved in several important test cases with minimal collisions and minimal deviation from the target direction. The method's efficiency and correctness were demonstrated by comparing it to four other well-known algorithms and evaluating them with various quality measures.
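The local resolution step can be illustrated with a toy Particle Swarm Optimization: for one agent of a critical pair, candidate velocities are scored by goal deviation plus a predicted-proximity penalty, and a swarm searches for the best one. The cost terms, weights, and PSO hyperparameters below are illustrative assumptions, not those of [SYP19].

```python
import math
import random

def cost(v, goal_dir, p_self, p_other, v_other, horizon=2.0, radius=0.5):
    """Deviation from the goal direction plus a penalty for predicted proximity."""
    dev = 1.0 - (v[0] * goal_dir[0] + v[1] * goal_dir[1]) / (
        math.hypot(v[0], v[1]) * math.hypot(goal_dir[0], goal_dir[1]) + 1e-9)
    # minimum predicted distance to the critical neighbor over the horizon
    min_d = min(
        math.hypot(p_self[0] + v[0] * t - (p_other[0] + v_other[0] * t),
                   p_self[1] + v[1] * t - (p_other[1] + v_other[1] * t))
        for t in (i * 0.1 for i in range(int(horizon * 10) + 1)))
    return dev + 10.0 * max(0.0, 2 * radius - min_d)

def pso_velocity(goal_dir, p_self, p_other, v_other, iters=40, n=20):
    """Pick a velocity for one critical agent with Particle Swarm Optimization."""
    random.seed(0)
    f = lambda v: cost(v, goal_dir, p_self, p_other, v_other)
    parts = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(n)]
    vels = [[0.0, 0.0] for _ in range(n)]
    pbest = [p[:] for p in parts]
    gbest = min(pbest, key=f)[:]
    for _ in range(iters):
        for i, p in enumerate(parts):
            for d in range(2):  # standard PSO update per dimension
                vels[i][d] = (0.7 * vels[i][d]
                              + 1.5 * random.random() * (pbest[i][d] - p[d])
                              + 1.5 * random.random() * (gbest[d] - p[d]))
                p[d] += vels[i][d]
            if f(p) < f(pbest[i]):
                pbest[i] = p[:]
            if f(p) < f(gbest):
                gbest = p[:]
    return gbest
```

With an obstacle agent standing directly on the goal line, the swarm settles on a velocity that skirts the collision zone rather than heading straight through it.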

3.3 Perception of groups in crowds

People tend to gather into larger masses when facing similar routes and obstacles, which results in a dynamic crowd of people. It is of great interest to simulate crowds for applications ranging from special effects in movies to mobile robots. However, many studies of crowd behaviors [KGS06, HJAA07, YJ07] neglected the social interactions among pedestrians in crowds and treated a crowd as a collection of isolated individuals. In reality, people tend to congregate in smaller groups


Figure 3.3: The weights in the Social Attention model (a), (c), (e), and the proposed model (b), (d), (f). The solid dots represent the ground truth positions. The queried agent, whose trajectory is being predicted, is shown in red. The blue diamond markers represent the current positions of the various agents. The circular radii represent the attention weights.

rather than walking alone. Group-level phenomena have greater prominence in crowds [CJ61, Jam53], and a crowd is a sentient collection with psychological behaviors, primarily sharing the property of maintaining certain distances between agents [SAD∗09].

To determine the impact of groups in crowds, this thesis extends the previous research on proxemics to simulate groups in crowds [YSQP18]; e.g., proxemics within groups is an essential factor in group formation. However, explicitly simulating groups requires additional computation, since there are several group formations: abreast, UV, and river formation [FGMV12]. The actual formation is either one of the three formations or an interpolation, depending on the surroundings [HMC∗12]. Therefore, human perception of groups in crowds is analyzed under various group conditions, densities, and viewpoints (see Figure 3.5). This thesis aims to discover under which conditions groups should be explicitly simulated from a perceptual standpoint.

The results indicate that groups are perceived more saliently in the egocentric view, but a greater number of groups are seen in the perspective view. Further, crowds are perceived to be denser in the egocentric view. The dominantly perceived group type is the two-agent group in lower-density crowds, but four-agent


Figure 3.4: Illustration of criss-cross interactions in the Orthogonal Corridor scenario for the five methods. Method (a) [SYP19] successfully forms the 45° lanes necessary to navigate the crossing, avoiding unnecessary collisions or maneuvers, in comparison with state-of-the-art methods: ORCA (b) [VDBGLM11], Social Force (c) [HM95], Power Law (d) [KSG14], and Zanlungo '11 (e) [ZIK11].

Figure 3.5: The real-time simulator [YSQP18], based on unilateral incompressible fluids augmented with group behaviors, is capable of simulating very high-density crowds. A perceptual study is presented that varies densities, viewpoints, and group conditions to determine the impact of groups on the perception of the realism of crowds. Perspective view (upper), egocentric view (lower); high density (left), low density (right).

or larger groups dominate in higher-density crowds. These results are of use to researchers when recording, annotating, and detecting groups in real-world crowd data [GCR09, GCR12]. Finally, groups were less salient and were seen less in very high-density crowds. However, participants were more likely to see groups in high-density crowds regardless of whether they were explicitly simulated. In this case, it may be possible not to simulate groups explicitly, reducing the amount of calculation necessary for them.

Chapter 4

Approaching Behaviors into Small Groups

Previous chapters presented the work on dyadic interactions in virtual environments (Chapter 2) and perceptual models in crowd simulation (Chapter 3). While there is extensive research on these two areas, the intermediate structure between individual virtual characters and large crowds, i.e., group interactions, remains relatively underexplored. This chapter presents an overview of the focus of this thesis: behavior generation for joining groups, group behavior recognition, and evaluation.

4.1 Contributions

In Section 4.2, this thesis introduces methods to generate approaching-group trajectories using procedural models and machine learning models. Due to the lack of datasets available for training purposes, a dataset was collected (see Section 4.3) using motion capture techniques. This dataset enables further research involving full-body behaviors. A use case of the dataset is presented in Section 4.4. Finally, a user study is performed to evaluate the previously proposed methods for generating robot approach trajectories.

Papers included in this thesis

• Fangkai Yang and Christopher Peters.
"Social-aware navigation in crowds with static and dynamic groups."
11th International Conference on Virtual Worlds and Games for Serious Applications (VS-Games), (2019).

• Fangkai Yang and Christopher Peters.
"App-LSTM: Data-driven generation of socially acceptable trajectories for approaching small groups of agents."
Proceedings of the 7th International Conference on Human-Agent Interaction, (2019).

• Fangkai Yang and Christopher Peters.
"AppGAN: Generative Adversarial Networks for Generating Robot Approach Behaviors into Small Groups of People."
28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), (2019).

• Fangkai Yang, Wenjie Yin, Tetsunari Inamura, Mårten Björkman, and Christopher Peters.
"Group Behavior Recognition Using Attention- and Graph-Based Neural Networks."
24th European Conference on Artificial Intelligence (ECAI), (2020).

• Fangkai Yang, Wenjie Yin, Mårten Björkman, and Christopher Peters.
"Impact of Trajectory Generation Methods on Viewer Perception of Robot Approaching Group Behaviors."
29th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), (2020).

Other relevant papers

• Yuan Gao, Fangkai Yang, Martin Frisk, Daniel Hernandez, Christopher Peters, and Ginevra Castellano.
"Learning Socially Appropriate Robot Approaching Behavior Toward Groups using Deep Reinforcement Learning."
28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), (2019).

4.2 Trajectory generation methods

Robots or agents that navigate to approach free-standing conversational groups should do so in a safe and socially acceptable manner. As introduced and discussed in Section 1.3, this is challenging since it not only requires the robot to plan trajectories that avoid collisions with members of the group but also to do so without making those in the group feel uncomfortable, for example, by moving too close to them or approaching them from behind. In this section, the thesis presents both procedural and machine learning methods to generate such approaching-group trajectories, focusing on the machine learning aspects.

Social-aware navigation in crowds

Social-aware methods have mostly been applied to the socially acceptable navigation of robots [SMUAS07, SKG∗13, GMG14, YLW15]. However, these works regard


static groups as ordinary obstacles rather than social entities. A novel social-aware navigation method in crowds with obstacles and free-standing conversational groups is proposed in [YP19c]. The social-aware space is modeled in order to generate a speed map for navigation (see Figure 4.1). Social-aware space represents a sense of discomfort: if a character enters another entity's social-aware space in the environment, closer distances will cause more discomfort.

Figure 4.1: A static group with three group members. (a) F-formation with o-, p-, and r-spaces. (b) Related social-aware space of groups.

The fast marching method [OS88] was used to navigate the character in the social-aware space, and it was further compared with state-of-the-art procedural methods (see Figure 4.2) to show that the proposed method achieves better performance.
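The speed-map idea can be sketched as follows: discomfort around group members lowers the local speed, and a planner then minimizes travel time over the resulting field. Here a Gaussian discomfort model and a grid Dijkstra search stand in for the thesis's social-aware space and the fast marching method; all parameters and function names are illustrative.

```python
import heapq
import math

def speed_map(w, h, members, sigma=2.0):
    """Speed in [0.1, 1]; lower near group members (social-aware discomfort)."""
    grid = [[1.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            discomfort = sum(math.exp(-((x - mx) ** 2 + (y - my) ** 2)
                                      / (2 * sigma ** 2)) for mx, my in members)
            grid[y][x] = max(0.1, 1.0 - discomfort)
    return grid

def plan(grid, start, goal):
    """Dijkstra over the grid; edge cost = 1 / speed (slow cells are expensive)."""
    w, h = len(grid[0]), len(grid)
    dist = {start: 0.0}
    prev = {}
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue
        x, y = u
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < w and 0 <= ny < h:
                nd = d + 1.0 / grid[ny][nx]
                if nd < dist.get((nx, ny), float("inf")):
                    dist[(nx, ny)] = nd
                    prev[(nx, ny)] = u
                    heapq.heappush(pq, (nd, (nx, ny)))
    path, node = [], goal
    while node != start:
        path.append(node)
        node = prev[node]
    path.append(start)
    return path[::-1]
```

A character crossing a room with one group member in the middle will detour around the slow region instead of cutting straight through it.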

Machine learning methods

The procedural methods presented above typically suffer from problems such as high computational cost or reliance on handcrafted features. Also, many previous methods assume that groups are static. However, as introduced in Section 1.3, real groups are quasi-dynamic: group members change body position and orientation slightly during the conversation, and this affects the approaching trajectories of newcomers. In this section, machine learning methods are presented that generate approaching trajectories driven by data. The thesis proposes a semi-synthetic dataset in [YP19a] and utilizes state refinement LSTM [ZOZ∗19] to develop the App-LSTM model. It captures both the position and orientation information of the group members and iteratively refines the newcomer's current state to better focus on recent information about the group (see Figure 4.3).
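The refinement idea can be sketched as a single attention step: the newcomer's state queries the group members' position/orientation features, and the resulting group content vector refines the state. This is a schematic of the mechanism, not the App-LSTM implementation; the projections, scaling, and residual update are illustrative assumptions.

```python
import numpy as np

def refine_state(newcomer, members, W_q, W_k, W_v):
    """One attention-based refinement step over group member features.

    newcomer: (d,) current state; members: (n, d) position/orientation features.
    Returns the refined state and the attention weights over members.
    """
    q = newcomer @ W_q                    # query from the newcomer's state
    k = members @ W_k                     # keys from group member features
    v = members @ W_v                     # values from group member features
    scores = k @ q / np.sqrt(q.shape[0])  # scaled dot-product attention
    w = np.exp(scores - scores.max())     # numerically stable softmax
    w = w / w.sum()
    context = w @ v                       # group content vector
    return newcomer + context, w          # residual refinement of the state
```

With identity projections, the member whose features align best with the newcomer's state receives the largest attention weight, which is the behavior the group interaction module relies on.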

This thesis also applies Generative Adversarial Networks (GANs) [GPAM∗14] to generate approaching-group trajectories. Previous GAN-based trajectory prediction methods [SKS∗18, FDSF18a, GJFF∗18] have shown promising results in capturing interactions between pedestrians and the surrounding environment. GAN models


Figure 4.2: A scenario from the SALSA dataset [APSS∗16] where a person (with a green triangle overhead) leaves a conversational group and approaches to join another (red circle). Three methods, including the proposed social-aware method (red line), SF (orange line), and A* (yellow line), are compared in this scenario with the ground truth trajectory (green line).

can be used to predict trajectories based on a training set and starting conditions. They can then also be used for generating new trajectories. However, none of the aforementioned works focus specifically on predicting a human's or robot's trajectory when approaching a group.

Moreover, only position information is considered in these works, since their training datasets have few or no conversational groups, and orientation information is always ignored or defaulted to the moving direction. Thus, the AppGAN model is proposed in [YP19b], as shown in Figure 4.4. Other methods, such as reinforcement learning, have also been investigated to generate trajectories [GYF∗19].

Figure 4.3: Overview of the App-LSTM framework. It contains a Group Interaction Module (GIM) which selectively captures both the position and orientation features of the group members. App-LSTM iteratively refines the current state of the approaching agent within the GIM to better focus on the current intention of the group members.

Figure 4.4: Overview of AppGAN. It consists of a Generator (G), a Group Interaction Module (GIM), and a Discriminator (D). G takes the observed trajectories of the humans and the robot, and encodes the relative positions and orientations of the humans separately as well as the robot positions. GIM is a multimodal attention-based fusion module which takes the hidden states of the humans as input and outputs a group content vector for trajectory generation. The decoder generates the future trajectories from the group content vector and the hidden states of the robot. D takes Treal or Tfalse as input and classifies them as real or not.

As shown in [YP19a, YP19b], App-LSTM and AppGAN achieve better performance than procedural, LSTM-based, and GAN-based models by considering the position and orientation information of group members. Moreover, the generated trajectories adapt to quasi-dynamic groups, so that the newcomer can change its approach direction to find a better place to join a group. However, the two proposed models are trained on the generated semi-synthetic dataset due to the lack of publicly available datasets. Thus, a dataset has been collected that contains both human-group interactions and robot-group interactions, as described in the following section (see Section 4.3).

4.3 Dataset

Analysis and simulation of human interactions within or between small groups are important in numerous research areas, especially those involving the control of behaviors for mobile, social artificial systems. Due to the lack of available datasets of human behaviors in groups, problems such as modeling and enabling socially acceptable group behaviors for mobile robots remain challenging. To this end, this thesis presents CongreG8 (pronounced 'con-gre-gate'). This novel dataset captures the full-body human motions of an approaching-to-join-a-group scenario, in which a newcomer (a human or a robot) moves towards and joins a free-standing conversational group consisting of three people (see Figure 4.5). In this dataset, the behaviors of both the newcomer and the three group members are captured. It contains human approach trials and robot approach trials. Additionally, it includes participant questionnaires related to personality (BFI-10), perception of robots (Godspeed), and custom human/robot interaction questions. Moreover, an analysis of the dataset is provided, showing that human groups are more likely to accommodate a human newcomer than a robot newcomer. The thesis also shows several use cases in the domains of behavior detection and generation in both virtual and physical environments. A sample of the CongreG8 dataset is available at https://sites.google.com/view/congreg8.1

1The dataset paper has been submitted to PLOS ONE and is currently under review.


Figure 4.5: Setup for the human-group interaction motion capture. A participant in a motion capture suit, T-pose (left). Three group players play Who's the Spy [YYI∗20] in a group, and an adjudicator approaches to join them (right).

4.4 Group behavior recognition

Unlike the previous trajectory generation methods (e.g., Section 4.2), a full-body behavior analysis has been conducted with the collected CongreG8 dataset. First, group behavior recognition is performed in [YYI∗20]. Analysis of the CongreG8 dataset shows that a group exhibits one of two behaviors when a newcomer approaches to join, i.e., Accommodate or Ignore (see Figure 4.6). These group behaviors are observed in real-life datasets [APSS∗15, HK11] and experiments [VJL∗15].

In [YYI∗20], two novel categories of group behavior recognition models are proposed, attention-based and graph-based, that consider behaviors on both the individual and group levels. The attention-based category consists of the Approach Group Net (AGNet) and the Approach Group Transformer (AGTransformer). They share a similar architecture and use attention mechanisms to encode temporal and spatial information on both the individual and group levels. The graph-based category consists of Approach Group Graph Convolutional Networks (AG-GCN), which combine Multi-Spatial-Temporal Graph Convolutional Networks (MST-GCN) on the individual level and Graph Convolutional Networks (GCN) on the group level, with multi-temporal stages (see Figure 4.7). The individual level learns the spatial and temporal movement patterns of each agent, while the group level captures the relations and interactions of multiple agents.
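The group-level graph convolution in such models follows the standard GCN propagation rule with self-loops and symmetric normalization. A minimal numpy sketch; the feature sizes, adjacency, and weights here are illustrative, not taken from AG-GCN.

```python
import numpy as np

def gcn_layer(H, A, W):
    """One GCN layer: H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W).

    H: (n, f) node features (e.g., per-agent embeddings from an encoder);
    A: (n, n) adjacency among the agents; W: (f, f_out) weights.
    """
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # symmetric normalization
    return np.maximum(0.0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

# a fully connected 4-agent group (3 members + newcomer), 8-dim features
rng = np.random.default_rng(0)
H = rng.standard_normal((4, 8))
A = np.ones((4, 4)) - np.eye(4)
out = gcn_layer(H, A, rng.standard_normal((8, 2)))
```

Stacking such layers and pooling the node outputs yields the group-level representation that a classifier maps to Accommodate or Ignore.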

To show the application of the proposed group behavior recognition models, a use case in a virtual environment has been developed (see Figure 4.8). A typical pattern of online multi-agent interaction is that small groups of people gather


Figure 4.6: Two group behaviors when the newcomer (yellow character) approaches to join the group. The red arrow indicates the movement of the newcomer. The group members stand still and ignore the newcomer purposefully (top). The group members accommodate the newcomer, with one group member (red character) moving backward to make space for them (bottom). All skeletons above are reconstructed from the collected data.

and stand in an environment to converse, e.g., VRChat2. The group behaviors should account for the interactions between the group and a newcomer: the group members either ignore the newcomer or react to them by adjusting their positions and orientations to better accommodate the newcomer in the group formation. The motivation for establishing such a virtual reality interaction environment is to show

2https://www.vrchat.com/


Figure 4.7: Overview of the Approach Group Graph Convolutional Neural Network (AG-GCN) for group behavior analysis. The full-body markers are connected as skeleton graphs and fed into the Multi-Spatial-Temporal Graph Convolutional Network (MST-GCN), which encodes the markers' spatial relationships and movement over time. The group members (P1, P2, P3) share the same model, while the newcomer (Q) is modeled through another MST-GCN model. The output from the MST-GCN module is then fed to the Group Graph Convolutional Neural Network (G-GCN) module, which encodes the group's spatial relationships. The output is used to classify the group behavior as either Accommodate or Ignore.

that the capability of perceiving group behaviors is desirable for intelligent agents when approaching free-standing conversational groups in a socially acceptable manner.

4.5 Evaluation

In the last part of this thesis, an evaluation of generated approach-group trajectories is presented [YYMP20]. Existing trajectory generation methods focus on collision avoidance with pedestrians without considering feelings of discomfort among group members. The models that generate approach behaviors into groups are evaluated in simulation. To evaluate the methods with human participants, trajectories are generated to navigate mobile robots to approach and join small groups of people (see Figure 4.9). The experiment examined the impact of three trajectory generation methods for a mobile robot approaching groups from multiple directions: a Wizard-of-Oz (WoZ) method, a procedural social-aware navigation model (PM), and a novel generative adversarial model imitating human approach behaviors (IL). The IL model is trained on the CongreG8 dataset utilizing the framework of Generative Adversarial Imitation Learning (GAIL) [HE16]. The measures also compared two camera viewpoints (see Figure 4.10) and static versus quasi-dynamic groups. As previously mentioned, a quasi-dynamic group is one whose members change orientation and position throughout the approach task, even though the group entity remains static in the environment. This represents a more realistic but challenging scenario for the robot. The three methods are evaluated with objective and subjective measurements based on viewer perception. Results show that WoZ and IL have


Figure 4.8: The virtual reality use case. The perspective view of the multi-agent interaction scenario, where a newcomer approaches to join a conversational group (top-left). The first-person view of the newcomer, with the normalized probability of group behaviors represented by color bars (top-right). Each participant controls one virtual character with one VR device (bottom).

comparable performance, and both perform better than PM under most conditions.
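In GAIL, the imitation reward is supplied by a discriminator trained to separate expert from policy samples. The following toy sketch uses a logistic-regression discriminator over 1-D features; the data, dimensionality, and hyperparameters are illustrative and unrelated to the actual CongreG8 training.

```python
import numpy as np

def train_discriminator(expert, policy, epochs=200, lr=0.1):
    """Logistic discriminator D(x) = sigmoid(x @ w + b); expert samples labeled 1."""
    X = np.vstack([expert, policy])
    y = np.concatenate([np.ones(len(expert)), np.zeros(len(policy))])
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        g = p - y                      # gradient of the cross-entropy loss
        w -= lr * X.T @ g / len(X)
        b -= lr * g.mean()
    return w, b

def gail_reward(x, w, b):
    """Imitation reward r = -log(1 - D(x)): high when D finds x expert-like."""
    d = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    return -np.log(1.0 - d + 1e-8)
```

The policy is then optimized against this reward, so it is pushed toward state-action features the discriminator attributes to the human demonstrations.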


Figure 4.9: The setup for the experiment in a motion capture lab, where a Pepper robot approaches to join a free-standing group with three group members.

Figure 4.10: Collected videos in the egocentric view (left) and in the perspective view (right).

Chapter 5

Conclusions

5.1 Discussion

This thesis covers work at multiple levels of interaction: dyadic interactions, one-to-N interactions, and group interactions. The thesis starts with proxemics for dyadic interactions in virtual environments. The virtual agents vary in terms of appearance, body gesture, and gender. The following work investigates proxemics in virtual and physical scenarios. Although there are differences in the preferences for and engagement with the embodiments in VR and physical scenarios, the embodiments in both scenarios still have a common social impact on users' perceptions, providing support for further research on simulating group interactions in virtual environments before testing in real-world scenarios. Further, this inspired the proposal of perceptual models in crowd simulation and trajectory prediction.

This thesis bridges dyadic interactions and one-to-N interactions by studying the small-group level. There are fewer studies in this area, and this thesis proposes methods concerning robots' or agents' joining-group behaviors, including procedural models and machine learning models. It starts with static groups utilizing 2D position and orientation information. Groups are found to be quasi-dynamic: the group members do not always stand still, and their positions and orientations change as the conversation goes on. The quasi-dynamic group makes the joining position dynamic, and the newcomer needs to re-plan the approaching trajectory to maintain a socially acceptable manner. In order to make robots aware of human perception in groups and adopt social behaviors accordingly, machine learning methods are developed with various approaches, including reinforcement learning, generative adversarial networks, and imitation learning. These methods can adapt to quasi-dynamic groups.

Further, the 2D position and orientation information is extended to full-body behaviors, as they contain richer behavioral information. To make robots and agents better perceive human behaviors and intents, and adapt to groups of humans, this thesis proposes models of group behavior recognition and trajectory generation


based on full-body information. In the meantime, the thesis collects a full-body motion-captured dataset to train these machine learning models and proposes several use cases based on these models. The collected dataset makes the training process possible, and the machine learning approaches generally perform better than the procedural models. By taking human behaviors into consideration, robots become sensitive to human behaviors and perception, and they can adopt socially acceptable behaviors.

The thesis found that some social norms hold across virtual reality and the physical world when exploring proxemics in dyadic interactions. It furthermore found that the embodiment of agents and robots, with varying appearances and body behaviors, changes the way people treat them. Although the collected human-human interaction data provides a foundation for modeling Human-Machine Interactions involving small groups, both the experiments in [YYMP20] and the CongreG8 data collection process found that some people are more likely to engage in Ignore behaviors toward robot newcomers than toward human newcomers. It is important to take embodiments and humans' perceptions into consideration when developing machine learning approaches to generate robot behaviors, since they likely impact the way humans interact with robots.

5.2 Shortcomings

The thesis focuses on the non-verbal behaviors of newcomers approaching small groups. The joining-group behavior acts as an important step to trigger and initialize the conversation. The proposed machine learning models are trained on the collected dataset of groups of three people and have not been extended to other group sizes. The limitation in group sizes and behavior types restricts the generation of other kinds of behaviors. However, the current dataset demonstrates its capacity for behavior recognition and robot behavior generation related to three-member groups. In general, the models could be adapted to different group sizes by training on datasets with other group sizes and modifying the network inputs. The CongreG8 dataset is being updated with group data in varying configurations and enhanced with more HRI data.

Leaving-group behaviors are not covered in this thesis and could be future work. It would also be interesting to explore how a free-standing conversational group finalizes the conversation and breaks the group formation. The models proposed in this thesis could be extended to generate such leaving-group behaviors by training on related datasets.

The evaluation (Section 4.5) was done in a motion capture room with motion capture markers and cameras. The full-body mocap data has higher quality, with less occlusion and more continuity, compared with skeleton information extracted from video. The work could be further extended to outdoor scenarios with computer vision techniques to extract skeletons from videos, and the previously proposed machine learning models with full-body markers as inputs could thus be applied.


The abilities of robots are limited by their appearance and mobility. As found during the data collection, participants in groups show less accommodating behavior towards an approaching robot than towards human newcomers. Although this thesis's research on embodiments found that some social norms appear to be generally shared across virtual reality and physical scenarios, embodiments are more likely to make a difference to participants' perceptions when full-body non-verbal behaviors are included. For example, robots adopting a moving speed similar to that of humans were reported as risking collision with the participants. Some participants had lower preferences towards robot appearances (i.e., Pepper) compared with humans. This inspires us to continue research on the use of virtual environments, where agents can be customized in terms of their appearance, for online virtual training with humans.

Bibliography

[AGR∗16] Alahi A., Goel K., Ramanathan V., Robicquet A., Fei-Fei L., Savarese S.: Social LSTM: Human trajectory prediction in crowded spaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016), pp. 961–971.

[AIK∗04] Althaus P., Ishiguro H., Kanda T., Miyashita T., Christensen H. I.: Navigation for human-robot interaction tasks. In IEEE International Conference on Robotics and Automation (ICRA'04) (2004), vol. 2, IEEE, pp. 1894–1900.

[ALSN09] Amaoka T., Laga H., Saito S., Nakajima M.: Personal space modeling for human-computer interaction. In International Conference on Entertainment Computing (2009), Springer, pp. 60–72.

[APSS∗15] Alameda-Pineda X., Staiano J., Subramanian R., Batrinca L., Ricci E., Lepri B., Lanz O., Sebe N.: SALSA: A novel dataset for multimodal group behavior analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 38, 8 (2015), 1707–1720.

[APSS∗16] Alameda-Pineda X., Staiano J., Subramanian R., Batrinca L., Ricci E., Lepri B., Lanz O., Sebe N.: SALSA: A novel dataset for multimodal group behavior analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 38, 8 (2016), 1707–1720.

[APYR∗15] Alameda-Pineda X., Yan Y., Ricci E., Lanz O., Sebe N.: Analyzing free-standing conversational groups: A multimodal approach. In Proceedings of the 23rd ACM International Conference on Multimedia (2015), ACM, pp. 5–14.

[AYL∗17] Avramova V., Yang F., Li C., Peters C., Skantze G.: A virtual poster presenter using mixed reality. In International Conference on Intelligent Virtual Agents (2017), Springer, pp. 25–28.


[BCC13] Borges P. V. K., Conci N., Cavallaro A.: Video-based human behavior understanding: A survey. IEEE Transactions on Circuits and Systems for Video Technology 23, 11 (2013), 1993–2008.

[BDD03] Batty M., Desyllas J., Duxbury E.: Safety in numbers? Modelling crowds and designing control for the Notting Hill Carnival. Urban Studies 40, 8 (2003), 1573–1590.

[BRSTV17] Ball A. K., Rye D. C., Silvera-Tawil D., Velonaki M.: How should a robot approach two people? Journal of Human-Robot Interaction 6, 3 (2017), 71–91.

[CIOP14] Cancela B., Iglesias A., Ortega M., Penedo M. G.: Unsupervised trajectory modelling using temporal information via minimal paths. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2014), pp. 2553–2560.

[CJ61] Coleman J. S., James J.: The equilibrium size distribution of freely-forming groups. Sociometry 24, 1 (1961), 36–45.

[CQDG∗18] Cabrera-Quiros L., Demetriou A., Gedik E., van der Meij L., Hung H.: The MatchNMingle dataset: A novel multi-sensor resource for the analysis of social interactions and group dynamics in-the-wild during free-standing conversations and speed dates. IEEE Transactions on Affective Computing (2018).

[CRO∗16] Cafaro A., Ravenet B., Ochs M., Vilhjálmsson H. H., Pelachaud C.: The effects of interpersonal attitude of a group of agents on user's presence and proxemics behavior. ACM Transactions on Interactive Intelligent Systems (TiiS) 6, 2 (2016), 12.

[CSG17] Celiktutan O., Skordos E., Gunes H.: Multimodal human-human-robot interactions (MHHRI) dataset for studying personality and engagement. IEEE Transactions on Affective Computing (2017).

[CTWB13] Carton D., Turnwald A., Wollherr D., Buss M.: Proactively approaching pedestrians with an autonomous mobile robot in urban environments. In Experimental Robotics (2013), Springer.

[EO12] Ennis C., O'Sullivan C.: Perceptually plausible formations for virtual conversers. Computer Animation and Virtual Worlds 23, 3–4 (May 2012), 321–329.

[FDSF18a] Fernando T., Denman S., Sridharan S., Fookes C.: GD-GAN: Generative adversarial networks for trajectory prediction and group detection in crowds. arXiv preprint arXiv:1812.07667 (2018).


[FDSF18b] Fernando T., Denman S., Sridharan S., Fookes C.: Soft + hardwired attention: An LSTM framework for human trajectory prediction and abnormal event detection. Neural Networks 108 (2018), 466–478.

[FGMV12] Federici M. L., Gorrini A., Manenti L., Vizzari G.: Data collection for modeling and simulation: Case study at the University of Milano-Bicocca. In International Conference on Cellular Automata (2012), Springer, pp. 699–708.

[GCR09] Ge W., Collins R. T., Ruback B.: Automatically detecting the small group structure of a crowd. In Workshop on Applications of Computer Vision (WACV) (2009), IEEE, pp. 1–8.

[GCR12] Ge W., Collins R. T., Ruback R. B.: Vision-based analysis of small groups in pedestrian crowds. IEEE Transactions on Pattern Analysis and Machine Intelligence 34, 5 (2012), 1003–1016.

[GJFF∗18] Gupta A., Johnson J., Fei-Fei L., Savarese S., Alahi A.: Social GAN: Socially acceptable trajectories with generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018), pp. 2255–2264.

[GMG14] Gómez J. V., Mavridis N., Garrido S.: Fast marching solution for the social path planning problem. In 2014 IEEE International Conference on Robotics and Automation (ICRA) (2014), IEEE, pp. 1871–1876.

[Gof08] Goffman E.: Behavior in Public Places. Simon and Schuster, 2008.

[GPAM∗14] Goodfellow I., Pouget-Abadie J., Mirza M., Xu B., Warde-Farley D., Ozair S., Courville A., Bengio Y.: Generative adversarial nets. In Advances in Neural Information Processing Systems (2014), pp. 2672–2680.

[GYF∗19] Gao Y., Yang F., Frisk M., Hernandez D., Peters C., Castellano G.: Social behavior learning with realistic reward shaping. In 2019 28th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN) (2019), IEEE.

[Hal10] Hall E. T.: The hidden dimension, vol. 609. Garden City, NY: Doubleday, 1990.

[Hal66] Hall E. T.: The hidden dimension. Doubleday, 1966.

[Har76] Hare A. P.: Handbook of small group research. Free Press, 1976.

[HE16] Ho J., Ermon S.: Generative adversarial imitation learning. In Advances in Neural Information Processing Systems (2016), pp. 4565–4573.

[HJAA07] Helbing D., Johansson A., Al-Abideen H. Z.: Dynamics of crowd disasters: An empirical study. Physical Review E 75, 4 (2007), 046109.

[HK11] Hung H., Kröse B.: Detecting f-formations as dominant sets. In Proceedings of the 13th International Conference on Multimodal Interfaces (2011), ACM, pp. 231–238.

[HM95] Helbing D., Molnar P.: Social force model for pedestrian dynamics. Physical Review E 51, 5 (1995), 4282.

[HMC∗12] Hocevar R., Marson F., Cassol V., Braun H., Bidarra R., Musse S. R.: From their environment to their behavior: A procedural approach to model groups of virtual agents. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7502 LNAI (2012), 370–376.

[HMP05] Hartmann B., Mancini M., Pelachaud C.: Implementing expressive gesture synthesis for embodied conversational agents. In International Gesture Workshop (2005), Springer, pp. 188–199.

[Hug03] Hughes R. L.: The flow of human crowds. Annual Review of Fluid Mechanics 35, 1 (2003), 169–182.

[Jam53] James J.: The distribution of free-forming small group size. American Sociological Review (1953).

[JSL∗17] Joo H., Simon T., Li X., Liu H., Tan L., Gui L., Banerjee S., Godisart T. S., Nabbe B., Matthews I., Kanade T., Nobuhara S., Sheikh Y.: Panoptic studio: A massively multiview system for social interaction capture. IEEE Transactions on Pattern Analysis and Machine Intelligence (2017).

[JT07] Jan D., Traum D. R.: Dynamic movement and positioning of embodied agents in multiparty conversations. In Proceedings of the Workshop on Embodied Language Processing (2007), Association for Computational Linguistics, pp. 59–66.

[Ken88] Kendon A.: Goffman's approach to face-to-face interaction. Erving Goffman: Exploring the Interaction Order (1988).

[Ken90] Kendon A.: Conducting interaction: Patterns of behavior in focused encounters, vol. 7. CUP Archive, 1990.

[KGS06] Kretz T., Grünebohm A., Schreckenberg M.: Experimental study of pedestrian flow through a bottleneck. Journal of Statistical Mechanics: Theory and Experiment 2006, 10 (2006), P10014.

[KKPS14] Keltner D., Kogan A., Piff P. K., Saturn S. R.: The sociocultural appraisals, values, and emotions (SAVE) framework of prosociality: Core processes from gene to meme. Annual Review of Psychology 65 (2014), 425–460.

[KO10] Karamouzas I., Overmars M.: Simulating the local behaviour of small pedestrian groups. In Proceedings of the 17th ACM Symposium on Virtual Reality Software and Technology (2010), ACM, pp. 183–190.

[KSAO∗14] Koay K. L., Syrdal D. S., Ashgari-Oskoei M., Walters M. L., Dautenhahn K.: Social roles and baseline proxemic preferences for a domestic service robot. International Journal of Social Robotics 6, 4 (2014), 469–488.

[KSG14] Karamouzas I., Skinner B., Guy S. J.: Universal power law governing pedestrian interactions. Physical Review Letters 113, 23 (2014), 238701.

[LCL07] Lerner A., Chrysanthou Y., Lischinski D.: Crowds by example. Computer Graphics Forum 26, 3 (2007), 655–664.

[LMM03] Loscos C., Marchal D., Meyer A.: Intuitive crowd behavior in dense urban environments using local laws. In Proceedings of Theory and Practice of Computer Graphics, 2003 (June 2003), pp. 122–129.

[LSTA10] Luber M., Stork J. A., Tipaldi G. D., Arras K. O.: People tracking with human motion predictions from social forces. In 2010 IEEE International Conference on Robotics and Automation (ICRA) (2010), IEEE, pp. 464–469.

[MM15] Mead R., Matarić M. J.: Proxemics and performance: Subjective human evaluations of autonomous sociable robot distance and social signal understanding. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2015), IEEE, pp. 5984–5991.

[MPG∗10] Moussaïd M., Perozo N., Garnier S., Helbing D., Theraulaz G.: The walking behaviour of pedestrian social groups and its impact on crowd dynamics. PLoS ONE 5, 4 (2010), e10047.

[NGCL09] Narain R., Golas A., Curtis S., Lin M. C.: Aggregate dynamics for dense crowd simulation. In ACM Transactions on Graphics (TOG) (2009), vol. 28, ACM, p. 122.

[NSPB15] Narayanan V. K., Spalanzani A., Pasteau F., Babel M.: On equitably approaching and joining a group of interacting humans. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2015), IEEE, pp. 4071–4077.

[ONP15] Ochs M., Niewiadomski R., Pelachaud C.: Facial expressions of emotions for virtual characters. In The Oxford Handbook of Affective Computing. Oxford University Press, USA, 2015, ch. 18, p. 261.

[OS88] Osher S., Sethian J. A.: Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton-Jacobi formulations. Journal of Computational Physics 79, 1 (1988), 12–49.

[PE09] Peters C., Ennis C.: Modeling groups of plausible virtual pedestrians. IEEE Computer Graphics and Applications 29, 4 (2009), 54–63.

[Pel05] Pelachaud C.: Multimodal expressive embodied conversational agents. In Proceedings of the 13th Annual ACM International Conference on Multimedia (2005), pp. 683–689.

[PESVG09] Pellegrini S., Ess A., Schindler K., Van Gool L.: You'll never walk alone: Modeling social behavior for multi-target tracking. In 2009 IEEE 12th International Conference on Computer Vision (2009), IEEE, pp. 261–268.

[PLY∗18] Peters C., Li C., Yang F., Avramova V., Skantze G.: Investigating social distances between humans, virtual humans and virtual robots in mixed reality. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems (2018), pp. 2247–2249.

[PPD∗10] Panzoli D., Peters C., Dunwell I., Sanchez S., Petridis P., Protopsaltis A., Scesa V., de Freitas S.: A Level of Interaction Framework for Exploratory Learning with Characters in Virtual Environments. Springer Berlin Heidelberg, Berlin, Heidelberg, 2010, pp. 123–143.

[PV08] Pedica C., Vilhjálmsson H.: Social perception and steering for online avatars. In International Workshop on Intelligent Virtual Agents (2008), Springer, pp. 104–116.

[PYS∗18] Peters C., Yang F., Saikia H., Li C., Skantze G.: Towards the use of mixed reality for HRI design via virtual robots. In Proceedings of the 1st International Workshop on Virtual, Augmented, and Mixed Reality for HRI (VAM-HRI) (2018).

[RVS∗15] Ricci E., Varadarajan J., Subramanian R., Rota Bulo S., Ahuja N., Lanz O.: Uncovering interactions and interactors: Joint estimation of head, body orientation and f-formations from surveillance videos. In Proceedings of the IEEE International Conference on Computer Vision (2015), pp. 4660–4668.

[SAD∗09] Singh H., Arter R., Dodd L., Langston P., Lester E., Drury J.: Modelling subgroup behaviour in crowd dynamics DEM simulation. Applied Mathematical Modelling 33, 12 (2009), 4408–4423.

[SAM12] Skantze G., Al Moubayed S.: IrisTK: a statechart-based toolkit for multi-party face-to-face interaction. In Proceedings of the 14th ACM International Conference on Multimodal Interaction (2012), pp. 69–76.

[SKG∗13] Satake S., Kanda T., Glas D. F., Imai M., Ishiguro H., Hagita N.: A robot that approaches pedestrians. IEEE Transactions on Robotics 29, 2 (2013), 508–524.

[SKS∗18] Sadeghian A., Kosaraju V., Sadeghian A., Hirose N., Savarese S.: SoPhie: An attentive GAN for predicting paths compliant to social and physical constraints. arXiv preprint arXiv:1806.01482 (2018).

[SMUAS07] Sisbot E. A., Marin-Urias L. F., Alami R., Simeon T.: A human aware mobile robot motion planner. IEEE Transactions on Robotics 23, 5 (2007), 874–883.

[SP18] Stergiou A., Poppe R.: Understanding human-human interactions: a survey. arXiv preprint arXiv:1808.00022 (2018).

[SYP19] Saikia H., Yang F., Peters C.: Priority driven local optimization for crowd simulation. In Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems (2019), pp. 2180–2182.

[TCJ13] Torta E., Cuijpers R. H., Juola J. F.: Design of a parametric model of personal space for robotic social navigation. International Journal of Social Robotics 5, 3 (2013), 357–365.

[TCP06] Treuille A., Cooper S., Popović Z.: Continuum crowds. ACM Transactions on Graphics (TOG) 25, 3 (2006), 1160–1168.

[TMMK13] Trautman P., Ma J., Murray R. M., Krause A.: Robot navigation in dense human crowds: the case for cooperation. In 2013 IEEE International Conference on Robotics and Automation (ICRA) (2013), IEEE, pp. 2153–2160.

[TN18] Truong X.-T., Ngo T.-D.: 'To approach humans?': A unified framework for approaching pose prediction and socially aware robot navigation. IEEE Transactions on Cognitive and Developmental Systems 10, 3 (2018), 557–572.

[VCM∗17] Vázquez M., Carter E. J., McDorman B., Forlizzi J., Steinfeld A., Hudson S. E.: Towards robot autonomy in group conversations: Understanding the effects of body orientation and gaze. In Proceedings of the 2017 ACM/IEEE International Conference on Human-Robot Interaction (2017), ACM, pp. 42–52.

[VDBGLM11] Van Den Berg J., Guy S. J., Lin M., Manocha D.: Reciprocal n-body collision avoidance. In Robotics Research. Springer, 2011, pp. 3–19.

[VdBLM08] Van den Berg J., Lin M., Manocha D.: Reciprocal velocity obstacles for real-time multi-agent navigation. In 2008 IEEE International Conference on Robotics and Automation (ICRA 2008) (2008), IEEE, pp. 1928–1935.

[VJL∗15] Vroon J., Joosse M., Lohse M., Kolkmeier J., Kim J., Truong K., Englebienne G., Heylen D., Evers V.: Dynamics of social positioning patterns in group-robot interactions. In 2015 24th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN) (2015), IEEE, pp. 394–399.

[VMdO03] Villamil M. B., Musse S. R., de Oliveira L. P. L.: A model for generating and animating groups of virtual agents. In International Workshop on Intelligent Virtual Agents (2003), Springer, pp. 164–169.

[VMO18] Vemula A., Muelling K., Oh J.: Social attention: Modeling attention in human crowds. In 2018 IEEE International Conference on Robotics and Automation (ICRA) (2018), IEEE, pp. 1–7.

[WDWK07] Walters M. L., Dautenhahn K., Woods S. N., Koay K. L.: Robotic etiquette: results from user studies involving a fetch and carry task. In 2007 2nd ACM/IEEE International Conference on Human-Robot Interaction (HRI) (2007), IEEE, pp. 317–324.

[YCDD05] Yu W., Chen R., Dong L., Dai S.: Centrifugal force model for pedestrian dynamics. Physical Review E 72 (2005), 026112.

[YJ07] Yu W., Johansson A.: Modeling crowd turbulence by many-particle simulations. Physical Review E 76, 4 (2007), 046105.

[YLP∗17] Yang F., Li C., Palmberg R., Van Der Heide E., Peters C.: Expressive virtual characters for social demonstration games. In 2017 9th International Conference on Virtual Worlds and Games for Serious Applications (VS-Games) (2017), IEEE, pp. 217–224.

[YLW15] Yi S., Li H., Wang X.: Understanding pedestrian behaviors from stationary crowd groups. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015), pp. 3488–3496.

[YP19a] Yang F., Peters C.: App-LSTM: Data-driven generation of socially acceptable trajectories for approaching small groups of agents. In Proceedings of the 7th International Conference on Human-Agent Interaction (2019), ACM.

[YP19b] Yang F., Peters C.: AppGAN: Generative adversarial networks for generating robot approach behaviors into small groups of people. In 2019 28th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN) (2019), IEEE.

[YP19c] Yang F., Peters C.: Social-aware navigation in crowds with static and dynamic groups. In 2019 11th International Conference on Virtual Worlds and Games for Serious Applications (VS-Games) (2019), IEEE.

[YSP18] Yang F., Saikia H., Peters C.: Who are my neighbors? A perception model for selecting neighbors of pedestrians in crowds. In Proceedings of the 18th International Conference on Intelligent Virtual Agents (2018), pp. 269–274.

[YSQP18] Yang F., Shabo J., Qureshi A., Peters C.: Do you see groups? The impact of crowd density and viewpoint on the perception of groups. In Proceedings of the 18th International Conference on Intelligent Virtual Agents (2018), pp. 313–318.

[YYI∗20] Yang F., Yin W., Inamura T., Björkman M., Peters C.: Group behavior recognition using attention- and graph-based neural networks. In Proceedings of the 24th European Conference on Artificial Intelligence (2020).

[YYMP20] Yang F., Yin W., Björkman M., Peters C.: Impact of trajectory generation methods on viewer perception of robot approaching group behaviors. In 2020 29th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN) (2020), IEEE.

[Zaj65] Zajonc R. B.: Social facilitation. Research Center for Group Dynamics, Institute for Social Research, University of Michigan (1965).

[ZIK11] Zanlungo F., Ikeda T., Kanda T.: Social force model with explicit collision prediction. EPL (Europhysics Letters) 93, 6 (2011), 68005.

[ZOZ∗19] Zhang P., Ouyang W., Zhang P., Xue J., Zheng N.: SR-LSTM: State refinement for LSTM towards pedestrian trajectory prediction. arXiv preprint arXiv:1903.02793 (2019).

[ZZZ∗19] Zhang H.-B., Zhang Y.-X., Zhong B., Lei Q., Yang L., Du J.-X., Chen D.-S.: A comprehensive survey of vision-based human action recognition methods. Sensors 19, 5 (2019), 1005.

Included Papers and Manuscripts

1. Fangkai Yang, Chengjie Li, Robin Palmberg, Ewoud Van Der Heide and Christopher Peters. “Expressive virtual characters for social demonstration games.” 9th International Conference on Virtual Worlds and Games for Serious Applications (VS-Games), (September 2017).

2. Christopher Peters, Fangkai Yang, Himangshu Saikia, Chengjie Li, Gabriel Skantze. “Towards the use of mixed reality for HRI design via virtual robots.” Proceedings of the 1st International Workshop on Virtual, Augmented, and Mixed Reality for HRI (VAM-HRI), (2018).

3. Fangkai Yang, Himangshu Saikia, Christopher Peters. “Who are my neighbors? A perception model for selecting neighbors of pedestrians in crowds.” Proceedings of the 18th International Conference on Intelligent Virtual Agents (IVA), (2018).

4. Fangkai Yang, Jack Shabo, Adam Qureshi, Christopher Peters. “Do you see groups? The impact of crowd density and viewpoint on the perception of groups.” Proceedings of the 18th International Conference on Intelligent Virtual Agents (IVA), (2018).

5. Fangkai Yang and Christopher Peters. “App-LSTM: Data-driven generation of socially acceptable trajectories for approaching small groups of agents.” Proceedings of the 7th International Conference on Human-Agent Interaction, (September 2019).

6. Fangkai Yang and Christopher Peters. “AppGAN: Generative Adversarial Networks for Generating Robot Approach Behaviors into Small Groups of People.” 28th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), (October 2019).

7. Fangkai Yang, Wenjie Yin, Tetsunari Inamura, Mårten Björkman, and Christopher Peters. “Group Behavior Recognition Using Attention- and Graph-Based Neural Networks.” 24th European Conference on Artificial Intelligence (ECAI), (September 2020).

8. Fangkai Yang, Wenjie Yin, Mårten Björkman, and Christopher Peters. “Impact of Trajectory Generation Methods on Viewer Perception of Robot Approaching Group Behaviors.” 29th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), (2020).