Analysis of Computer- Mediated Discourses Focusing on ...



Submitted by Dipl.-Ing. Robert Ecker

Submitted at Department of Telecooperation

Supervisor and First Examiner: Univ.-Prof. Mag. Dr. Gabriele Anderst-Kotsis

Second Examiner: o. Univ.-Prof. Dipl.-Ing. Dr. Michael Schrefl

August 2017

JOHANNES KEPLER UNIVERSITY LINZ
Altenbergerstraße 69
4040 Linz, Österreich
www.jku.at
DVR 0093696

Analysis of Computer-Mediated Discourses Focusing on Automated Detection and Guessing of Structural Sender-Receiver Relations

Doctoral Thesis

to obtain the academic degree of

Doktor der technischen Wissenschaften

in the Doctoral Program

Technische Wissenschaften

Kurzfassung (German abstract)

Forms of computer-mediated communication (CMC) are ubiquitous and influence our lives daily. Facebook, Myspace, Skype, Twitter, WhatsApp, and YouTube produce large amounts of data, which is ideal for analysis. Automated tools for discourse analysis process these enormous amounts of computer-mediated discourse quickly. This dissertation describes the development and structure of a software architecture for an automated tool that analyzes computer-mediated discourses in order to answer the question “Who is communicating with whom?” at any point in time. Assigning receivers to each individual message is an important step. Direct addressing helps, but it is not used in every message.

Popular communication models and the most widely used CMC systems are examined. The underlying communication model clarifies the essential elements of CMC and shows how this communication proceeds. With this understanding, several views are defined, each represented by different attributes and different guiding questions. Practical examples explain which basic information can be obtained from text-based discourses and how this is done. The author concentrates mainly on Internet Relay Chat (IRC) as an applied example because of its freely accessible and well-documented protocol. In discourses, it is not always clear who is communicating with whom. This is particularly problematic for automated discourse analysis. It is important to identify the users’ nicknames in a written discourse in order to determine the senders and receivers of messages. However, the linguistic possibilities for creating nicknames, and for using them in discourse, are manifold. To investigate how nicknames are created and used in IRC, logs of 13 channels, consisting of 8937 public chat messages and 7936 unique nicknames, were analyzed in detail. This dissertation describes the basic structure of IRC nicknames, the groups of parts of speech from which nicknames are composed, and which parts of nicknames are omitted in chat discourse. This knowledge, combined with the identity of the logged-in user, leads to a better prediction of whether the examined word in the discourse can be a shortened or creatively altered form of a nickname. In addition, this work improves two further functions: first, the automated detection of written receiver names (or parts thereof) and their mapping to logged-in users; second, when no receiver name is written, the automated guessing of the receiver name without semantics. The architecture of the software is described in detail. An IRC discourse with 5605 messages is analyzed manually and automatically; both approaches achieve similarly good results for detecting and guessing sender-receiver relations.


Abstract

Various forms of computer-mediated communication (CMC) have become ubiquitous, and influence our lives in many ways. Facebook, Myspace, Skype, Twitter, WhatsApp, and YouTube produce enormous amounts of traffic and data, which is ideal for analysis. Automated tools for discourse analysis process this tremendous amount of computer-mediated discourse quickly. The aim of this thesis is to describe and develop a software architecture for an automated tool that analyzes computer-mediated discourses to answer the question “Who is communicating with whom?” at any point in time. Assigning receivers to each message is an important step. While direct addressing is helpful, it is not used in every message.

The author explores popular communication models and the most widely used CMC systems. The underlying communication model highlights the basic elements of CMC, and shows how this communication takes place. Based on this understanding, multiple views are defined by using different attributes and various guiding questions. Practical examples explain which basic information can be extracted from text-based discourses, and how that is done. The author mainly focuses on Internet Relay Chat (IRC) as an applied example because of its open and well-documented protocol. In discourses, it is not always clear who is communicating with whom, which especially affects the automatic analysis of discourses. It is important to identify the users’ nicknames in written discourse in order to determine who the respective senders and receivers are. However, the linguistic possibilities for creating nicknames, and for using them in the discourse, are manifold. To study how nicknames are created and used in IRC, logs of 13 channels, consisting of 8937 public chat messages and 7936 unique nicknames, are analyzed in detail. This thesis shows the basic structure of IRC nicknames, which part-of-speech groups are used to compound nicknames, and which parts of a nickname are omitted within the chat discourse. This knowledge leads to a better prediction as to whether there is a link between a currently logged-in user and the examined word in the discourse, which can be a shortened or creatively changed form of a nickname. Additionally, this work improves two other aspects: first, automated detection of written receiver names (or parts thereof) and their mapping to logged-in users; and second, automated receiver guessing without semantics if no receiver name is specified. The architecture of the automated software is described in detail. An IRC discourse with 5605 messages is manually and automatically analyzed, and both approaches achieve similar results in detecting and guessing sender-receiver relations.
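To make the idea of detecting a written receiver name and mapping it to a logged-in user more concrete, the following is a minimal Python sketch. It is not the thesis's implementation: the function names `normalize` and `detect_receiver`, the "name:" or "name," addressing pattern, the minimum length of three letters, and the prefix rule for shortened nicknames are all assumptions made for illustration only.

```python
import re
from typing import List, Optional


def normalize(nick: str) -> str:
    """Lowercase a nickname and drop every character that is not a-z (hypothetical rule)."""
    return re.sub(r"[^a-z]", "", nick.lower())


def detect_receiver(message: str, logged_in: List[str], min_len: int = 3) -> Optional[str]:
    """Return the logged-in nickname addressed at the start of a message, if any."""
    # Assumption: direct addressing looks like "name: text" or "name, text".
    match = re.match(r"\s*([^\s:,]+)[:,]\s", message)
    if match is None:
        return None
    word = normalize(match.group(1))
    if len(word) < min_len:
        return None
    # An exact match on the normalized form wins over a shortened form.
    for nick in logged_in:
        if normalize(nick) == word:
            return nick
    # Otherwise accept a shortened form only if exactly one nickname starts with it.
    candidates = [nick for nick in logged_in if normalize(nick).startswith(word)]
    return candidates[0] if len(candidates) == 1 else None


if __name__ == "__main__":
    nicks = ["Robert_81", "Susan|away", "rob|work"]
    print(detect_receiver("susan: are you there?", nicks))
    print(detect_receiver("rob: hi", nicks))
```

In this sketch an ambiguous short form such as "rob" is deliberately left unmapped rather than guessed, because two logged-in nicknames start with it; messages without any written name would need a separate receiver-guessing step.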


Dedicated to my family and all my friends. Thank you for bringing color to my life. Thank you also to the staff of the hospital “Krankenhaus der Barmherzigen Schwestern Ried”.

Everybody can make their dreams come true.

— Walter Elias “Walt” Disney (1901–1966)

Acknowledgment

I am grateful to many people who were directly or indirectly involved in the preparation of this thesis. In particular, I want to thank

• my supervisor Univ.-Prof. Mag. Dr. Gabriele Anderst-Kotsis for her guidance, valuable feedback, and support that she provided throughout this work,

• o. Univ.-Prof. Dipl.-Ing. Dr. Michael Schrefl for reading this thesis as the second examiner and giving many valuable hints on various topics,

• Mag. Dr. Michael Karlinger for helpful discussions and feedback,

• Dewi Williams for proofreading my three papers and an earlier version of this thesis,

• Susan Gall, Leila Johnston, and Hashim Zakiullah for proofreading the thesis,

• Ewa Joanna Bogad, Carolin Zeitler, and Paul Froemel for reading the German abstract,

• the peer-reviewers for their helpful comments, and

• all my student colleagues for making the time at university as enjoyable as it was.


Contents

Abstract
Contents
List of tables
List of figures

1 Introduction
1.1 Research background and motivation
1.2 Objectives, research questions, and guiding questions
1.3 Scope and constraints
1.4 Research approach
1.5 Structure of the thesis
1.5.1 (Computer-mediated) communication
1.5.2 Computer-mediated discourse analysis
1.5.3 Automated discourse analysis with a focus on IRC
1.6 List of original publications
1.7 Chapter summary

I (Computer-mediated) communication

2 Communication and communication models
2.1 Definitions
2.1.1 Communication
2.1.2 Communication model
2.2 Communication models
2.2.1 Aristotle
2.2.2 Ferdinand de Saussure
2.2.3 Karl Bühler
2.2.4 Wendell Johnson
2.2.5 Harold Dwight Lasswell
2.2.6 Claude Elwood Shannon and Warren Weaver
2.2.7 Jürgen Ruesch and Gregory Bateson
2.2.8 Theodore Mead Newcomb
2.2.9 Wilbur Schramm
2.2.10 Charles Egerton Osgood and Wilbur Schramm
2.2.11 George Gerbner
2.2.12 Bruce H. Westley and Malcolm S. MacLean
2.2.13 John W. Riley and Matilda White Riley
2.2.14 David Kenneth Berlo
2.2.15 Roman Ossipowitsch Jakobson
2.2.16 Gerhard Maletzke
2.2.17 Melvin Lawrence DeFleur
2.2.18 Frank E. X. Dance
2.2.19 Samuel L. Becker
2.2.20 Elizabeth G. Andersch, Lorin C. Staats, and Robert N. Bostrom
2.2.21 Dean C. Barnlund
2.2.22 Ray Eldon Hiebert, Donald F. Ungurait, and Thomas W. Bohn
2.2.23 D. Lawrence Kincaid
2.2.24 Friedemann Schulz von Thun
2.3 Chapter summary

3 Computer-mediated communication (CMC)
3.1 Definition
3.2 Characteristics of CMC
3.3 CMC systems
3.3.1 Electronic mailing
3.3.2 Electronic mailing list
3.3.3 Usenet
3.3.4 Forum
3.3.5 Blog
3.3.6 Wiki
3.3.7 Online chat
3.3.8 Instant messaging
3.3.9 Text messaging
3.3.10 Social network site
3.3.11 Other CMC systems
3.4 Advantages and disadvantages of CMC
3.4.1 Technology
3.4.2 Costs
3.4.3 Availability
3.4.4 Place
3.4.5 Mobility
3.4.6 Time
3.4.7 Access
3.4.8 Data exchange
3.4.9 Interactivity
3.4.10 Shared knowledge
3.4.11 Amount of data
3.4.12 Richness of communication
3.4.13 Anonymity
3.4.14 Storage of data
3.5 Ethical issues in CMC research
3.6 Chapter summary

4 Communication model for CMC
4.1 Computer-mediated communication process
4.2 Basic elements of CMC
4.2.1 Context
4.2.2 CMC system
4.2.3 Communicator
4.2.4 Message
4.2.5 Communication barrier
4.3 Chapter summary

5 Communication model for CMC applied to Internet Relay Chat (IRC)
5.1 CMC system: IRC
5.1.1 IRC server and network
5.1.2 IRC server software (IRC daemon)
5.1.3 Service: IRC bot
5.1.4 IRC client software
5.1.5 IRC channel
5.2 Communicator and nickname
5.3 IRC message
5.3.1 IRC message format
5.3.2 IRC message delivery
5.3.3 IRC commands
5.4 Chapter summary

II Computer-mediated discourse analysis

6 Multiple-views analysis approach to computer-mediated discourses
6.1 Definitions
6.1.1 Analysis
6.1.2 Text and discourse analysis
6.1.3 Conversation analysis
6.1.4 Content analysis
6.1.5 Computer-mediated discourse analysis
6.2 Overview: General steps and extensions
6.3 Ethical considerations
6.4 Key questions
6.5 Views
6.5.1 Message
6.5.2 Context
6.5.3 Communication barrier
6.5.4 Date/time
6.5.5 Hardware and network
6.5.6 Software
6.5.7 Communicator
6.5.8 Relation
6.5.9 Topic
6.5.10 Emotion
6.5.11 Causality
6.5.12 Effectiveness
6.6 Attributes
6.7 Visualization
6.8 Chapter summary

7 Multiple-views analysis approach applied to IRC
7.1 Information on the general steps
7.2 Views
7.2.1 Message
7.2.2 Date/time
7.2.3 Hardware and network
7.2.4 Software
7.2.5 Communicator
7.2.6 Relation
7.2.7 Topic
7.2.8 Emotion
7.3 Chapter summary

8 Multiple-views analysis approach applied to IRC discourses
8.1 Datasets in this thesis
8.2 Information on the general steps
8.2.1 Data collection
8.2.2 Data extraction
8.3 Ethical considerations
8.4 Views
8.4.1 Message
8.4.2 Context
8.4.3 Communication barrier
8.4.4 Date/time
8.4.5 Hardware and network
8.4.6 Software
8.4.7 Communicator
8.4.8 Relation
8.4.9 Topic
8.4.10 Emotion
8.4.11 Causality
8.4.12 Effectiveness
8.5 Attributes
8.6 Message visualization
8.6.1 Visualization with LogBot
8.6.2 Visualization with mIRC
8.7 Visualization
8.8 Chapter summary

III Automated discourse analysis with a focus on IRC

9 Creation of IRC nicknames
9.1 Information on the general steps
9.1.1 Data collection
9.1.2 Data extraction
9.1.3 Analysis
9.2 Typologies of chat nicknames
9.3 Requirements for a “perfect” nickname
9.4 The story behind the chosen nickname
9.5 Nickname restriction
9.5.1 Nickname collision
9.5.2 Erroneous nickname
9.5.3 Letter case
9.5.4 Maximum nickname length (NICKLEN)
9.5.5 Inappropriate nicknames
9.6 Compounding/decompounding of nicknames
9.6.1 Creating a stem
9.6.2 Styling
9.7 Basic structure of IRC nicknames
9.7.1 Stem
9.7.2 Clan
9.7.3 Status
9.7.4 Decoration
9.7.5 Concatenation
9.8 Classification of nicknames
9.8.1 Stem-based nickname
9.8.2 Non-stem-based nickname
9.8.3 Mixed-based nickname
9.9 Chapter summary

10 Use of IRC nicknames in English chatroom discourse
10.1 Information on the general steps
10.2 Focus on chat communication
10.3 Direct addressing
10.4 Tracking of nicknames
10.5 Complications in the detection of nicknames while chatting
10.5.1 Punctuation
10.5.2 Orthographic errors
10.5.3 Saving keystrokes, time, and effort
10.5.4 Creative linguistic playground
10.6 Chapter summary

11 Automated detection and mapping of identifiers
11.1 Information on the general steps
11.1.1 Data collection
11.1.2 Data extraction
11.1.3 Analysis
11.2 Read log and config
11.3 Message structure
11.3.1 Handle system-specific message templates
11.3.2 Handle template “direct addressing”
11.3.3 Handle template “greeting/farewell”
11.4 Communicator
11.4.1 Tracking identifier changes
11.4.2 Check availability of communicators
11.5 Identifier structure
11.5.1 Calculate complexity of identifiers
11.5.2 Calculate identifierprint
11.5.3 Calculate POS groups
11.6 Comparing
11.6.1 Compare identifiers with text fragments
11.6.2 Adapt scope of identifiers due to tracking list
11.6.3 Compare POS groups of logged-in identifiers with text fragments
11.6.4 Adapt orthography and compare
11.7 Output
11.8 Chapter summary

12 Automated receiver guessing without semantics
12.1 Information on the general steps
12.2 Settings
12.2.1 Load analysis approach
12.2.2 Consider specific settings
12.3 Message structure
12.3.1 Handle messages addressed to a group or all
12.3.2 Consider adjacency pairs
12.4 Relation
12.4.1 Calculate the most probable receivers
12.4.2 Return the calculated receivers
12.5 Chapter summary

13 Manual vs. automated analysis
13.1 Manual analysis
13.1.1 Detection and mapping of identifiers
13.1.2 Receiver guessing
13.1.3 Visualization: Heatmap in HTML
13.2 Automated analysis
13.2.1 Detection and mapping of identifiers with IdentifierMapper
13.2.2 Receiver guessing with ReceiverGuesser
13.3 Comparing both versions
13.3.1 Detection and mapping of identifiers
13.3.2 Receiver guessing
13.4 Chapter summary

IV Conclusion

14 Results and future work
14.1 Summary
14.2 Future work

Bibliography

Appendix

A The Penn Treebank POS Tagset

B IRC message format
B.1 “Pseudo” BNF
B.2 Augmented BNF (ABNF)

C Detailed hourly usage (CET) per channel

D IRC commands
D.1 IRC server commands
D.2 Additional IRC server commands
D.3 IRC client (mIRC) commands
D.4 Mapping of commands

List of tables

1.1 Guiding questions and corresponding chapters . . . . . . . . . . . . . . . 31.2 Elements of research methods and corresponding chapters . . . . . . . . . 41.3 Chapters based on author’s publications . . . . . . . . . . . . . . . . . . . 7

3.1 Example of an email message . . . . . . . . . . . . . . . . . . . . . . . . . . 31

4.1 Influencers for the new model
4.2 Basic elements visualized within the communication model for CMC

5.1 IRC services
5.2 Examples of IRC message delivery (adapted from [OR93])
5.3 IRC command groups (adapted from [Kal00c; Kal00d; OR93])

6.1 Four domains of language (adapted from [Her04])
6.2 Herring’s medium and situation factors (adapted from [Her04])
6.3 Key questions
6.4 Views
6.5 Typical path of a message in CMC
6.6 Extracts of messages
6.7 Causes of communication barriers mapped to each view
6.8 Extracts of barriers
6.9 Extracts of timestamps
6.10 Extracts of hardware and network
6.11 Extracts of software
6.12 Extracts of identifiers
6.13 Extracts of relations
6.14 Extracts of topics
6.15 Extracts of emotions
6.16 Examples of cause-effect relationships
6.17 Attributes of each view (examples)
6.18 Figures of Table 6.19 mapped to their used views
6.19 Visualization of discourses

7.1 Typical path of a message in IRC
7.2 IRC Networks - Top 10 on 27.06.2012 (adapted from [Gel13])
7.3 Available statuses of communicators
7.4 Detection of nicknames
7.5 Mapping of nicknames: Assignment possibilities
7.6 IRC channels: Top 10 on 17.06.2012 (adapted from [Gel13])
7.7 Classification of typographic emoticons (adapted from [Ama12])
7.8 Japanese basic text emoticons (kaomoji) (adapted from [Kav15; Oku05; Pta+11])


8.1 Overview of datasets
8.2 Selected IRC networks, servers, and channels per category
8.3 Dataset 2
8.4 IRC commands used
8.5 Top 10 characters
8.6 Top 10 messages
8.7 Message flow between client and server with client output
8.8 Detailed hourly usage (CET)
8.9 User counts
8.10 Visualization of Log Example 12
8.11 Top 10 exactly written nicks in discourse (netiquette)
8.12 Top 10 directed sender-receiver relations
8.13 Top 10 undirected sender-receiver relations
8.14 Rearrangement of the messages into conversation threads
8.15 Rearrangement of the messages into topic threads
8.16 Topics of the threads
8.17 Top 10 emoticons in channel #help
8.18 Attributes of each view (examples)
8.19 Colored message for each LogBot Java function
8.20 RGB colors and Java constants for LogBot Java functions
8.21 Colored message for each mIRC item
8.22 RGB colors for mIRC items
8.23 Legend of an extended sociogram visualization

9.1 Used IRC networks
9.2 Selected channels for the 9 categories
9.3 Dataset 1
9.4 Frequencies of characters
9.5 Mapping of non-permissible characters
9.6 Character classes of each nickname
9.7 Letter case
9.8 Variants of shortening
9.9 Average length of nicknames per channel
9.10 The five most popular templates used for creating nicknames
9.11 Cluster of Penn Treebank POS Tags
9.12 New POS tags
9.13 Tagging nicknames
9.14 Top 10 POS groups
9.15 Traditional mechanisms for the creation of stems
9.16 Personal information
9.17 Non-traditional morphological processes
9.18 Letter and (part of) word mapping
9.19 Orthography
9.20 Different styling possibilities
9.21 Basic structure of nicknames with possible positions
9.22 Stem: Part(s) count
9.23 Clan
9.24 Status
9.25 Decoration of a stem
9.26 Decoration (Top 5 of each position)
9.27 Concatenation (Top 5 of each position)
9.28 Basis of the nick <[p]Xxx666xxX_AFK>
9.29 Classification of nicknames
9.30 The parts of the nick <^_^[p]Germ{a}n_boy-15^_^|Away>
9.31 The parts of the nick <BRB_Ghost>
9.32 Top 10 basic structure templates
9.33 Non-stem-based nicknames
9.34 Mixed-based nicknames

10.1 Use of nicknames within IRC discourses
10.2 Single direct addressing
10.3 Multiple direct addressing
10.4 Signal words
10.5 Words after signal words that are not usually nicknames
10.6 Visualization of nick changes
10.7 How nicks are renamed while chatting (Top 5)
10.8 Punctuation
10.9 Orthographic errors due to typing
10.10 Text normalization
10.11 Shortening or omitting parts of the nickname
10.12 Omitted POS of nickname while chatting
10.13 Creativity

11.1 Analysis of the logged messages
11.2 Automatic setting of senders and receivers
11.3 Templates for direct addressing (Top 5)
11.4 Signal words and receivers
11.5 Complexity of three identifiers
11.6 Identifierprint
11.7 Examples of detected and not detected receivers in discourse
11.8 Mapping of marked receivers to communicators: Assignment possibilities

12.1 Rules for calculating the most probable receivers

13.1 Used direct addressing
13.2 Linguistic use of written receivers
13.3 Assignment possibilities between messages and receivers, Part I
13.4 Assignment possibilities between messages and receivers, Part II
13.5 Detection and mapping of identifiers: Comparing results of both versions
13.6 Statistical measures for IdentifierMapper (adapted from [Faw06; Pow11])
13.7 Receiver guessing: Comparing results of both versions

A.1 The Penn Treebank POS Tagset (adapted from [Mar+94])

B.1 IRC message format in “pseudo” BNF
B.2 IRC message format in Augmented BNF (ABNF)


C.1 Detailed hourly usage (CET) per channel

D.1 IRC server commands (adapted from [Kal00c; Kal00d; OR93])
D.2 Additional IRC server commands
D.3 mIRC commands
D.4 Mapping of mIRC commands to the respective server commands


List of figures

1.1 Elements of research methods and dissertation structure
1.2 Main topics of the thesis and corresponding chapters

2.1 Aristotle’s model (adapted from [Nar06a])
2.2 de Saussure’s model [Sau66]
2.3 Bühler’s organon model [Bü90]
2.4 Johnson’s model [BB12]
2.5 Lasswell’s model (adapted from [Fle09])
2.6 Shannon and Weaver’s model (adapted from [Sha48])
2.7 Ruesch and Bateson’s model (adapted from [Kam16])
2.8 Newcomb’s ABX model [Kam16]
2.9 Schramm’s model (adapted from [BB12])
2.10 Schramm’s model with field of experience (adapted from [MD94])
2.11 Osgood and Schramm’s model (adapted from [MW95])
2.12 Gerbner’s model [Hal65]
2.13 Westley and MacLean’s model [Hal65]
2.14 Riley and Riley’s model [Hal65]
2.15 Berlo’s SMCR model (adapted from [BB12; MD94])
2.16 Jakobson’s model (adapted from [Lan13])
2.17 Maletzke’s model [Cri+08]
2.18 DeFleur’s model [Nar06a]
2.19 Dance’s model [Hil+07]
2.20 Becker’s model [Hil+07]
2.21 Andersch, Staats, and Bostrom’s model [Hil+07]
2.22 Barnlund’s model [PP10]
2.23 Hiebert, Ungurait, and Bohn’s model [RB16]
2.24 Kincaid’s model [Fig+02]
2.25 Schulz von Thun’s model (adapted from [Thu17])

3.1 Classification of CMC [Ngu08]
3.2 The email client “The Bat!”
3.3 Email system components (adapted from [Bye05])
3.4 The email list manager “Dada Mail”
3.5 The newsgroup newsreader “GrabIt”
3.6 Usenet group exchanges (adapted from [Wik13])
3.7 The forum software “phpBB”
3.8 The forum software “Serendipity”
3.9 The wiki software “MediaWiki”
3.10 The IRC client “qwebirc” used for web chatting
3.11 Instant messaging architecture
3.12 Instant messaging
3.13 Text messaging
3.14 SMS-enabled GSM network architecture (adapted from [LB05])

4.1 Transactional communication model for CMC

5.1 Scheme of IRC
5.2 Screenshot of the mIRC client
5.3 Scheme of IRC with communicators
5.4 IRC message delivery (adapted from [OR93])

6.1 Multiple-views analysis
6.2 Multiple-views analysis approach: Overview
6.3 Phrase of key questions
6.4 Examples of views
6.5 Important parts of the “View COM”

8.1 Hourly usage (CET)
8.2 Average time gap between messages
8.3 Visualization as an extended sociogram: Version 1
8.4 Visualization as an extended sociogram: Version 2
8.5 Visualization as an extended sociogram: Version 3

11.1 Overview of the IdentifierMapper, Part I
11.2 Decompounding of nicknames with NickDecompounder
11.3 Overview of the IdentifierMapper, Part II
11.4 Overview of the IdentifierMapper: Comparing

12.1 Overview of the architecture for the analysis of sender-receiver relations
12.2 Overview of the ReceiverGuesser
12.3 Analysis directions
12.4 Messages per hour
12.5 Messages per minute

13.1 Overview of manual vs. automated analysis
13.2 Heatmaps: 5605 messages
13.3 Heatmaps: 1146 messages with receivers
13.4 Heatmaps: 5605 messages with 5639 relations


CHAPTER 1 Introduction

This chapter gives a general introduction to the topics of this thesis, starting with the research background and motivation in Section 1.1. Section 1.2 introduces the objectives, research questions, and guiding questions pursued in this thesis. Section 1.3 describes the scope and constraints. Section 1.4 outlines the research approach and Section 1.5 the overall structure of this thesis. Finally, a list of the original publications written by the author is presented in Section 1.6.

1.1 Research background and motivation

Communication is a key factor in everyday life, as important as the need to eat, sleep, or love. The modern form of communication, known as computer-mediated communication (CMC), has influenced our lives enormously. Communication takes place between communicators and occurs with the help of information and communications technologies. CMC includes systems such as electronic mail (email), microblogging, instant messaging, text messaging, and online chat. Facebook, Myspace, Skype, Twitter, WhatsApp, and YouTube are current popular representatives. Nowadays, much communication is done via CMC, which produces a lot of data. Pingdom [Pin13a], in its “Internet 2012 in numbers” report, stated that there were 2.4 billion Internet users worldwide and that, every day, 144 billion emails were sent, 175 million tweets were posted, and 2.7 billion “likes” were given on Facebook. This flood of data is a perfect playground in which analysis can extract potentially useful information. In particular, automated discourse analysis that works without semantic knowledge (meaning) is faster and less complex than analysis that depends on it.

The author’s guiding definition of discourse is provided by Langdridge and Hagger-Johnson [LHJ09], who emphasize that “discourse consists of spoken and written communication and all other forms of communication.” Additionally, discourse analysis is defined in this thesis as a “[c]over term for various analyses of discourse” [Bus06]. Computer-mediated discourse analysis (CMDA; i.e., analysis of computer-mediated discourse) is an approach to the analysis of CMC focused on “online language and language use” [Her07]. According to Herring [Her04], CMDA can be applied to four domains or levels of language: structure, meaning, interaction, and social behavior (see Table 6.1). Of these, two are primarily analyzed in this thesis: structure (e.g., word formation/nickname creation, orthography when using nicknames in discourse) and interaction (e.g., turns, forms of addressivity, conversation threads, sender-receiver relations).

For automatic discourse analysis of chat transcripts (chat logs), it is important to identify nicknames in the written chat messages to know who is chatting with whom. A possible chat discourse extract is illustrated below:

<Limbic_Region> Hi all
<Goblin> hello Limbic_Region
<RobiX> limbic: hi


Users’ nicknames are surrounded by angle brackets as they appear on the Internet Relay Chat (IRC), followed by written messages. This work mainly focuses on IRC, originally written in 1988 by Oikarinen and Reed. IRC is one of the most frequently used chat systems in the world. It is a multi-user, multi-server, and multi-channel text-based chat system for near real-time communication. Several different independent IRC networks (e.g., QuakeNet, IRCnet, Undernet, EFnet) exist, and each one consists of a certain number of servers that communicate over a well-defined open protocol.

Nicknames have been used since the Middle Ages, and today in a computing context the word nickname is omnipresent, especially in CMC. People use nicknames (also known as nicks) to identify themselves, especially in chat rooms (channels), bulletin boards, and in social networks on platforms such as Facebook and Twitter. Nicknames play a special role in chat discourses for direct addressing: the way in which people address one another. A nick acts as a marker in the chat discourse, comparable with Sacks’s concept of “speaker select” [Bay98; Kor99; Nas05]. Basically, chats allow many-to-many conversations and can contain sequences of one-to-many or one-to-one interactions. To prevent misunderstandings regarding the addressing of a message, the nick of the receiver is frequently put in front of the message, followed by a colon and a space. This is one of the basic (written and unwritten) rules of online communication, called netiquette. Direct addressing, also known as addressivity [Wer96] or cross-turn reference [Her99], allows participants to take part in more than one conversation at the same time. Explicit direct addressing is not used or required in every case, however: a message without it is usually addressed to everybody in a channel [Mut04a]. Most IRC clients provide automatic text highlighting, including manually set nicknames and variants. In channels with a lot of traffic, this helps users to know who is talking to them.

The previous extract points out the following problems: First, as mentioned above, direct addressing is not always necessary or desirable. No nickname occurs in the message “Hi all”, although according to Rintel et al. [Rin+01], “openings are an excellent starting point for investigating how interaction on IRC functions to instantiate and develop interpersonal relationships.” Second, <Goblin> does not comply with the rules of netiquette, because the receiver <Limbic_Region> is not put at the front of the message and followed by a colon. A word-by-word comparison between the user list and each word of the message is necessary in order to find the receiver’s nickname. Third, the shortened variant “limbic” does not correspond exactly to one of the nicks in the existing user list (<Limbic_Region>, <Goblin>, and <RobiX>). The author presents a detailed linguistic study on the use of nicknames in discourse (for instance, on which parts of a nickname are omitted, and how chatters play with the original nickname).
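
The word-by-word comparison described above can be sketched in a few lines of Python. This is a minimal illustration and not the prototype developed in this thesis: the function names parse_line and exact_receivers, the regular expression, and the set of stripped punctuation characters are assumptions made for this example.

```python
import re

# Assumed log format: "<nick> message", as in the extract above.
LINE_RE = re.compile(r"^<(?P<sender>[^>]+)>\s*(?P<text>.*)$")


def parse_line(line):
    """Split a logged line into (sender, text); return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return match.group("sender"), match.group("text")


def exact_receivers(sender, text, users):
    """Compare each word of the message with the user list, ignoring letter case."""
    lookup = {user.lower(): user for user in users}
    found = []
    for word in text.split():
        # Strip trailing punctuation such as the colon used in "nick: message".
        token = word.strip(":,;.!?").lower()
        nick = lookup.get(token)
        if nick is not None and nick != sender and nick not in found:
            found.append(nick)
    return found


users = ["Limbic_Region", "Goblin", "RobiX"]
log = [
    "<Limbic_Region> Hi all",
    "<Goblin> hello Limbic_Region",
    "<RobiX> limbic: hi",
]
for line in log:
    sender, text = parse_line(line)
    print(sender, "->", exact_receivers(sender, text, users))
```

For the three lines of the extract, only the second message yields a receiver (<Limbic_Region>). “Hi all” contains no nickname, and the shortened variant “limbic” is missed by exact matching, which illustrates the first and third problems.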

Knowledge about sender-receiver relations is important for discourse analysis. It helps to answer the question “Who is communicating with whom?” for the whole discourse or even for a specific message. Answering this question improves the quality of discourse analysis in general.
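
Once receivers have been assigned to messages, answering “Who is communicating with whom?” for a whole discourse amounts to counting sender-receiver pairs. The following Python sketch shows one possible way to do this for directed and undirected relations. The function name count_relations and the third example pair are hypothetical and serve only as an illustration.

```python
from collections import Counter


def count_relations(pairs):
    """Count directed (sender -> receiver) and undirected ({a, b}) relations."""
    directed = Counter()
    undirected = Counter()
    for sender, receiver in pairs:
        directed[(sender, receiver)] += 1
        # A frozenset ignores the direction of the relation.
        undirected[frozenset((sender, receiver))] += 1
    return directed, undirected


# The first two pairs follow the chat extract; the third pair is made up.
pairs = [
    ("Goblin", "Limbic_Region"),
    ("RobiX", "Limbic_Region"),
    ("Limbic_Region", "Goblin"),
]
directed, undirected = count_relations(pairs)
for (sender, receiver), count in directed.most_common():
    print(sender, "->", receiver, count)
for members, count in undirected.most_common():
    print(" <-> ".join(sorted(members)), count)
```

In this example every directed relation occurs once, whereas the undirected relation between <Goblin> and <Limbic_Region> occurs twice, because both directions are merged.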

1.2 Objectives, research questions, and guiding questions

The main focus of this research is on computer-mediated discourse analysis. The objectives of this thesis are (1) to describe a software architecture and (2) to develop a prototype for an automated tool that analyzes computer-mediated discourses. The automated prototype should answer the question “Who is communicating with whom?” as well as human beings could. It should do so at any point in time, even for messages that do not contain receivers’ nicknames. To meet these key research objectives, the following research questions are addressed:

• Analysis approach of computer-mediated discourses: Can a common approach, derived from a computer-mediated communication model, be developed to support the analysis of discourses, and what does this approach look like?

• Sender-receiver relations: How and how well can senders and receivers be detected and guessed, both manually and automatically, in computer-mediated discourses for each message?

A number of guiding questions are defined to guide the reader through this thesis. The guiding questions, and the chapters in which they are answered, are shown in Table 1.1.

Table 1.1: Guiding questions and corresponding chapters

Nr. | Guiding question | Chapter(s)
1 | How does human communication take place? | 2
2 | What are the basic elements of human communication? | 2
3 | What are the characteristics of CMC? | 3
4 | What forms (systems) of CMC exist and how do they work? | 3
5 | How does CMC take place in general? | 3, 4
6 | What are the basic elements of CMC? | 4
7 | How does the communication model for CMC look when applied to IRC? | 5
8 | Can a common approach be developed to support the analysis of discourses? | 6
9 | Which key questions are useful to analyze discourses? | 6
10 | What basic information (i.e., attributes) can be extracted from discourses? | 6
11 | Can all this information be clustered into views? | 6
12 | Which views are fundamental to analyzing discourses? | 6
13 | How does this multiple-views analysis look when applied to IRC? | 7, 8
14 | What is the basic structure of IRC nicks? | 9
15 | Which parts of speech (POS) do nicks consist of, in detail? | 9
16 | In what order are POS concatenated to a compounded nick? | 9
17 | How and with what characters are they concatenated? | 9
18 | Are there any special cases or features that occur in the creation of nicks? | 9
19 | How are nicknames used to address users in a chat discourse? | 10
20 | How are nicks written in discourse, and which parts of them are omitted? | 10
21 | How can senders and receivers be detected in IRC? | 11
22 | How and how well can written receivers in discourses be mapped to logged-in users? | 11, 13
23 | How and how well can written receivers in discourses be guessed, if no receivers are mentioned or found? | 12, 13
24 | Can automated discourse analysis be done as well as analysis by hand? | 13


1.3 Scope and constraints

The limitations of this thesis are as follows:

• The discourse examples focus mainly on the text-based chat system IRC, which has open and well-documented protocols.

• Private messages are not monitored, in order to respect the privacy of users. Monitoring them would also require special access to the IRC servers.

• Automated analysis is done without semantics to achieve results more quickly.

1.4 Research approach

This thesis is organized into three main parts and focuses on (computer-mediated) communication (Chapters 2–5), computer-mediated discourse analysis (Chapters 6–8), and automated discourse analysis (Chapters 9–13). A fourth part concludes the thesis (Chapter 14). Table 1.2 presents the elements of research methods used in this thesis and relates them to the chapters.

Table 1.2: Elements of research methods and corresponding chapters

Element of research methods | Chapter: 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13
Literature surveys or reviews | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗
Modelling/approach: design communication model for CMC | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗
Modelling/approach: design discourse analysis approach | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗
Applied example: applied communication model for CMC | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗
Applied example: applied discourse analysis approach | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗
Applied example: creation of IRC nicknames | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗
Applied example: IRC nicknames in discourse | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗
CMC data: data presentation: CMC raw data | ✗ | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗
CMC data: data presentation: specific IRC logs | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✓ | ✗ | ✓ | ✗ | ✗
CMC data: data presentation: online interviews via IRC client | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗
CMC data: data analysis | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
Prototype: design and implementation of a prototype | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ | ✓ | ✗
Prototype: evaluation of prototype implementations | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓

Both quantitative and qualitative methods are used in this thesis (see datasets in Section 8.1). Following Herring [Her04] (see Table 6.1), text analysis is done for the domain “structure” and conversational analysis for “interaction”. Linguistic rule-based approaches are used to classify nicknames and to calculate the most probable receivers. Alternative methods exist in the field of machine learning, such as neural networks or decision tree learning. The reasons for developing a rule-based approach are that (1) no training datasets are necessary, (2) the rules used are simple, (3) the rules can be easily implemented, and (4) a rule-based approach is easier to handle and adjust than “black-box” classification methods. To the best of the author’s knowledge, this is the first study on automated detection and guessing of sender-receiver relations.

Figure 1.1: Elements of research methods and dissertation structure

Table 1.2 is visualized in Figure 1.1 as a diagram. “Data (A)” means “data analysis”, “Data (C)” stands for “data collection”, and “Data (C + A)” includes both. Figure 1.2 presents the four main topics of the thesis visualized as a Venn diagram composed of four equal ellipses [Ven80]: communication, system, model/approach, and discourse analysis. The numbers indicate the corresponding chapters of this work.

Figure 1.2: Main topics of the thesis and corresponding chapters


1.5 Structure of the thesis

This thesis has been organized into three main parts, which are summarized in the next subsections.

1.5.1 (Computer-mediated) communication

Chapters 2 and 3 give an overview of communication and computer-mediated communication, and provide a literature-based foundation for the fourth chapter. In Chapter 4, the present thesis defines a communication model that shows how CMC works in general. This knowledge is necessary for effective analysis, as researchers need to understand the communication process. The transactional model describes five basic elements in the computer-mediated communication process: communicator, CMC system, message, context, and communication barrier. The well-known text-based chat system IRC serves as the first example to which this communication model is applied (Chapter 5).

1.5.2 Computer-mediated discourse analysis

In Chapter 6, a multiple-views analysis approach is introduced. It provides a structured and generalized way of analyzing computer-mediated discourses and includes the common main steps: preparation, data collection, data extraction, analysis, and result. The approach is based on at least twelve views, each of which is explained and represented by various underlying key questions. Clarifying and answering these questions is a convenient way to analyze discourses. Additionally, each view explores attribute-value pairs and optional units of measure. The goal of the approach is to gain a better, multi-angled understanding. In Chapter 7, IRC is again the first example.

1.5.3 Automated discourse analysis with a focus on IRC

The main view of the multiple-views analysis approach in this part is the “View REL” (relation), which is mainly supported by “View COM” (communicator) and “View DAT” (date/time). Chapters 9 and 10 present an empirical study of nicknames in IRC, with observations that are both quantitative and qualitative. The aim is to analyze the link between written nicknames in discourse and current logged-in users. The author provides a series of examples that can help researchers and practitioners improve the quality of methods for automatically processing IRC chat, especially nickname detection in discourse. In particular, creativity and saving keystrokes complicate the detection of nicknames in discourse. They make exact string matching problematic. These chapters help to predict whether words in discourse match the original nicknames. For this purpose, one-to-one string matching (ignoring letter case) and a string comparison of the first two part-of-speech (POS) groups are mostly sufficient. For a completely automatic discourse analysis, though, it is important to identify all the nicks (and other links such as pronouns) in the written chat messages in order to find out more about the discourse structure [Hol08a], thread detection [She+06], or who is chatting with whom. The goal of the final chapters is to find sender-receiver relations for each message automatically as well as by hand. The author describes the software architectures in detail. Further, discourse analysis is done both by hand and automatically in order to compare the results.
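
The two matching steps named above can be sketched as follows. This Python example is only an approximation under stated assumptions: the names nick_parts and match_nick are hypothetical, and runs of letters within a nick merely stand in for the POS groups, which this thesis determines with linguistic rules that are not reproduced here.

```python
import re


def nick_parts(nick):
    """Split a nick into lowercase runs of letters.

    Assumption: these runs stand in for POS groups,
    e.g. 'Limbic_Region' -> ['limbic', 'region'].
    """
    return [part.lower() for part in re.findall(r"[A-Za-z]+", nick)]


def match_nick(word, users):
    """Return the users that a written word may refer to."""
    token = word.strip(":,;.!?").lower()
    if not token:
        return []
    # Step 1: one-to-one string matching, ignoring letter case.
    exact = [user for user in users if user.lower() == token]
    if exact:
        return exact
    # Step 2: compare the word with the first two parts of each nick.
    candidates = []
    for user in users:
        parts = nick_parts(user)[:2]
        if token in parts or token == "".join(parts):
            candidates.append(user)
    return candidates


users = ["Limbic_Region", "Goblin", "RobiX"]
for word in ["limbic:", "GOBLIN", "hi"]:
    print(word, "->", match_nick(word, users))
```

With this sketch, the shortened variant “limbic” is mapped to <Limbic_Region>. A known limitation of such loose matching is that ordinary words can collide with parts of a nick, so additional rules would be needed in practice.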

1.6 List of original publications

This dissertation is based on three previously published and peer-reviewed scientific papers. Below is a list of these publications in ascending chronological order. These publications are not referenced again in this dissertation.

1. Robert Ecker. Creation of Internet Relay Chat Nicknames and Their Usage in English Chatroom Discourse. Linguistik online, 50(6):3–29, 2011.

2. Robert Ecker. Multiple-Views Analysis of Computer-Mediated Discourses. In Proceedings of the 17th International Conference on Information Integration and Web-based Applications & Services, iiWAS ’15, pages 186–195, New York, NY, USA, 2015. ACM.

3. Robert Ecker. Automated Detection and Guessing without Semantics of Sender-Receiver Relations in Computer-Mediated Discourses. In Proceedings of the 18th International Conference on Information Integration and Web-based Applications & Services, iiWAS ’16, pages 172–180, New York, NY, USA, 2016. ACM.

Table 1.3 shows which chapters of this thesis are based on these three publications.

Table 1.3: Chapters based on author’s publications

Chapter
Publication 2 3 4 5 6 7 8 9 10 11 12 13
1 ✓ ✓
2 ✓ ✓
3 ✓ ✓ ✓

1.7 Chapter summary

This chapter provided an introduction to this thesis by articulating the research background and motivation, followed by formulating the objectives, research questions, and guiding questions. It also defined the scope and constraints. Next, the research approach and structure of the thesis were outlined. Finally, the list of original publications was presented. Chapter 2 will give an overview of communication and communication models.


Part I

(Computer-mediated) communication


CHAPTER 2 Communication and communication models

This chapter starts with two definitions, followed by some examples of communication models.

2.1 Definitions

The terms “communication” and “communication model” are defined and explained.

2.1.1 Communication

Human communication is a fundamental factor in our life. According to Verderber et al. (1976; as quoted by [Adl+11]), a sample group of college students “spent an average of over 61 percent of their waking hours engaged in some form of communication”. Additionally, Watzlawick et al. [Wat+67] note, “no matter how one may try, one cannot not communicate”. The word “communication” is derived from the Latin word “communicare” meaning “to share, divide out; communicate, impart, inform; join, unite, participate in”, literally “to make common” from “communis” (“in common, public, general, not pretentious, shared by all or many”) [BP08; Onl12]. This word, which has “a rich history[,] ... entered the English language in the fourteenth and fifteenth centuries” [Pet00]. Although communication is ubiquitous, it appears nonetheless difficult to define. Various definitions of communication are found in the literature [BP08; Kus10; MD94; Pat07; SR93; Sen11; Woo12]. These definitions focus on different aspects and view, for example, “communication as an act or process of transmitting information”, or “as a human activity or behavior” [MD94]. Further examples include:

• According to Louis A. Allen (1958; as quoted by [Rod00]), “[c]ommunication is the sum of all the things one person does when he [or she] wants to create understanding in the mind of another. It is a bridge of meaning. It involves a systematic and continuous process of telling, listening and understanding”.

• Gerbner [Ger67] defines communication as “social interaction through messages”.

• Theodorson and Theodorson (1969; as quoted by [MD94]) view communication as “the transmission of information, ideas, attitudes, or emotion from one person or group to another (or others) primarily through symbols”.

• In the words of Newman/Summer (1977; as quoted by [Kus10]), “[c]ommunication is an exchange of facts, ideas, opinions or emotions by two or more persons”.

• Kincaid [RK81] defines communication as “a process in which participants create and share information with one another in order to reach a mutual understanding”.


• One definition is proposed by Dance [Dan82]: “Acting upon information. Limited to organisms. The act may take place within an organism, or between or among organisms (the organism may or may not be human). At this level neither intent nor success is implicit in the concept of communication”.

• DeVito [DeV86] defines communication as “[t]he process or act of transmitting a message from a sender to a receiver, through a channel and with the interference of noise”.

• Bisen and Priya [BP08] state that communication “is a process of exchange of facts, ideas, opinions and as a means that individual or organization share meaning and understanding with one another”.

The definition of the term “communication” used in this thesis is presented in Section 3.1.

2.1.2 Communication model

An effective way “to understand the principles and processes that define the nature of communication is through modelling” [Kus10]. Communication models are idealized systematic representations [AF08]. They are descriptive tools for visualizing the communication process [Nar06a]. DeVito (1978; as quoted by [BB12]) notes that “communication models serve to organize the various elements and processes of the communication act”. The models focus on different aspects of communication. For example, Fielding [Fie06] divides the levels of communication in organizations into the categories: organizational, mass, small-group, interpersonal, public, and intrapersonal communication. Further, communication experts and scholars have proposed various models of communication, from simple diagrams to complex mathematical formulae, in order to explain why and how communication takes place. The process of communication can be modeled in three basic ways: linear, interactional, and transactional [Fuj09]. Another classification is given by Narula [Nar06c]. She classifies communication models in three categories: stages (action, interaction, transaction, and convergence), types (linear and non-linear), and forms of models (various forms such as symbolic, physical, mental, verbal, iconic, analog, and mathematical).

Early theoretical work described communication as a one-way (linear) process in which one source sends a message and then a recipient receives it. In the course of time, this linear model evolved into a more complex model. The interactional model is a two-way process, which describes the feedback sent from the receiver back to the sender. Nowadays, communication is defined as “a continuous, transactional process” [Adl+11]. The transactional model of communication underscores the fact that sending and receiving messages is simultaneous and mutual [WT11].

2.2 Communication models

Models of communication are diverse, and their focus can differ, for example in psychology, sociology, or linguistics. Many models were designed to focus on the communication process, and “perhaps they all [have] something in common” [AF08]. The following major models illustrate the process of human communication, “all attempting to characterize the ... factors involved in a communication situation” [Man10]. A non-exhaustive overview is presented in the next subsections, with short descriptions of the communication models. The selection criteria are the degree of popularity and the uniqueness of the communication model.

2.2.1 Aristotle

Perhaps the first model of communication was described over 2000 years ago by the Greek philosopher and polymath Aristotle [Cug+09; Mon07; MD94; Nar06a]. This model indicates a linear, one-way relationship with five essential elements [MD94; Nar06a]. In this speaker-centered model, the speaker constructs a speech (message) for different audiences (listeners) and occasions. These passive listeners are affected by the message received (see Figure 2.1). Narula [Nar06a] notes that the Aristotelian model of communication and language is “more applicable to public speaking than interpersonal communication”.

Figure 2.1: Aristotle’s model (adapted from [Nar06a])

2.2.2 Ferdinand de Saussure

In his work, published posthumously in 1916, the Swiss linguist Ferdinand de Saussure presented a circular communication model called “speech circuit” [Nö95]. In Figure 2.2a, two people, A and B, are conversing with each other [Sau66; Thi97]. This model contains two fields of circularity (see Figure 2.2b). First is the connection of two simple linear communication chains (audition, phonation). Second, circularity occurs in the mental process (“c” and “s”) indicated by two arrows within the circles (speaker and hearer) [Nö95; Seu16]. These arrows go both from “c” (concept) to “s” (sound-image) and vice versa.

Figure 2.2: de Saussure’s model [Sau66], panels (a) and (b)

2.2.3 Karl Bühler

Karl Bühler was a German psychologist and linguist. His Organon Model (1934) is founded on the linguistic sign and its representation of objects and states of affairs in relation to sender and receiver [JL02; Ren04]. In Figure 2.3, “representation” refers to the language-based transfer of information; “expression” to the way the sender produces the sign; and “appeal” to the way the sign affects the receiver beyond the bare content of the sign [Hau01].

Legend
S: linguistic sign

Figure 2.3: Bühler’s organon model [Bü90]

2.2.4 Wendell Johnson

The model of interpersonal communication conceived by American psychologist Wendell Johnson (1946) “indicates that communication takes place in a context which is external to both speaker and listener and to the communication process as well” [BB12]. The stages shown in Figure 2.4, which occur in both speakers and listeners, can be summarized as (1) event or source of stimulation, (2) sensory stimulation, (3) preverbal state, (4) symbolic state, (5) overt expression, and (6) transformation of overt expression into air waves and light waves which serve as stimulation for the listener [BB60; Hil74; Joh16].

Figure 2.4: Johnson’s model [BB12]


2.2.5 Harold Dwight Lasswell

The American communications theorist Harold D. Lasswell was primarily concerned with mass communication and propaganda [BD12; PS04]. In 1948, he suggested a communication model similar to Aristotle’s. Lasswell explicitly considered the effect of the message. His linear model can be expressed in the following formula: “Who says what in which channel to whom with what effect?” (see Figure 2.5). Studies of the five elements of Lasswell’s formula “describe the different kinds of communication research” [MD94]. However, this model does not “allow for any feedback, interruption, or interference with the message” [But11]; neither does it “provide much information about the relationship between the different parts of the process” [Mon07]. This model was extended in 1958 by Richard Braddock with the added questions “under what circumstances” and “for what purpose” [Bra58].

Figure 2.5: Lasswell’s model (adapted from [Fle09])

2.2.6 Claude Elwood Shannon and Warren Weaver

Shannon and Weaver’s model (1948) “has initially been developed from a mathematical point of view” [VER05], but was adopted for “transmission of messages in the field of telecommunication” [MD94]. This simple, linear model consists of the following parts [Sha48]: (1) an information source which produces the message, (2) a transmitter that encodes the message, (3) a channel or medium that transmits the signal (i.e., message), (4) a receiver that decodes the message, and (5) the destination (person/thing) for whom the message is intended. The American scientists Shannon and Weaver introduced an additional element, called noise, as a “factor that may distort the message” during the process of signal transmission through the channel [Mon07]. Figure 2.6 represents Shannon and Weaver’s model of communication.

Figure 2.6: Shannon and Weaver’s model (adapted from [Sha48])
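The five parts of this linear model, together with noise, can be traced in a small Python sketch. It is an illustration only: the function name `transmit`, the choice of a bit string as the signal, and the modeling of noise as random bit flips are assumptions made for this example and are not taken from [Sha48].

```python
import random


def transmit(message, noise_level=0.0, seed=None):
    """Pass a text message through the five parts of the linear model."""
    rng = random.Random(seed)
    # (1) Information source: produces the message (the function argument).
    # (2) Transmitter: encodes the message into a signal (here, a bit string).
    signal = "".join(format(byte, "08b") for byte in message.encode("utf-8"))
    # (3) Channel: carries the signal; noise may flip individual bits.
    received = "".join(
        ("1" if bit == "0" else "0") if rng.random() < noise_level else bit
        for bit in signal
    )
    # (4) Receiver: decodes the signal back into a message.
    data = bytes(int(received[i:i + 8], 2) for i in range(0, len(received), 8))
    # (5) Destination: obtains the (possibly distorted) message.
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(transmit("hello"))                            # noiseless channel
    print(transmit("hello", noise_level=0.05, seed=7))  # noisy channel
```

With `noise_level=0.0` the destination receives the message unchanged; any positive value may distort it, and the model provides no feedback path by which the source could notice.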


2.2.7 Jürgen Ruesch and Gregory Bateson

The American psychiatrist Ruesch and the English anthropologist Bateson (1951) specified that human communication operates on four levels or systems (see Figure 2.7). These levels are (1) intrapersonal (embodied consciousness), (2) interpersonal (dyadic interaction), (3) group (social interaction), and (4) cultural (inter-group culture) [Lan13; Ste88]. The communication process involves evaluating (E), sending (S), the channel (C), and receiving (R) [Lan13].

Figure 2.7: Ruesch and Bateson’s model (adapted from [Kam16])

2.2.8 Theodore Mead Newcomb

Newcomb’s model (or ABX model) introduces “the role of communication in a society or a social relationship” [Fis90] (see Figure 2.8). The tripartite model of the American social psychologist (1953) “consists of the sender (A), the receiver (B), and the social situation in which the communication takes place (X)” [Dan08].

Figure 2.8: Newcomb’s ABX model [Kam16]

2.2.9 Wilbur Schramm

The American scholar Wilbur Schramm developed several communication models. The first model, in 1954 (see Figure 2.9), was mainly based on the concept of encoding and decoding signals (messages).


Figure 2.9: Schramm’s model (adapted from [BB12])

The second model (see Figure 2.10), which was a further development of the first one, is “somewhat different from the previous models” [MD94]. It provides an additional perspective: the study of human behavior, especially in “the concept of overlapping ‘fields of experience’ between the sender and the receiver” [MD94]. Ideally, the sender encodes the message, based on the sender’s field of experience, and transmits it in the form of a signal via a channel to the receiver, where the signal is decoded according to the receiver’s field of experience. Communication can occur if there is some commonality in the sender’s and receiver’s fields of experience (e.g., a common language, culture, or background knowledge) [MD94].

Figure 2.10: Schramm’s model with field of experience (adapted from [MD94])

2.2.10 Charles Egerton Osgood and Wilbur Schramm

Schramm, in cooperation with the American psychologist Osgood, developed another model: the first circular model of communication (see Figure 2.11). This model depicts an ongoing interaction between the sender and receiver: the sender encodes the message and sends it to a receiver, who decodes and interprets the message based on his/her own experience and culture. The receiver then responds by encoding another signal and sends the feedback message as a new one back to the original sender [Mon07; MD94]. As a result, both participants, the sender and the receiver, are encoder, decoder, and interpreter. They continually swap roles.

Figure 2.11: Osgood and Schramm’s model (adapted from [MW95])


2.2.11 George Gerbner

George Gerbner was a Hungarian-American communication scholar. His general-purpose model of communication (1956) emphasizes “the importance of effects and context in the communication process and act” [Nar06b]. This model can be divided into three stages:

1. The process begins with an event (E) which is perceived by man or machine (M) [BB12; Woe92]. Bhatnagar and Bhatnagar [BB12] add that “M’s perception of E is a percept E1”. E1 is the “product of perceptual activity” [Woe92] that differs from E because M cannot perceive the whole actual event E. Three factors are involved between E and M: selection, context, and availability.

2. M has to use the selected channels for communication with some degree of control, depending on M’s skill in using these channels [KSD12].

3. This stage shows “the statement about the event (SE), being perceived by a second person (M2)” [Woe92]. Woelfel [Woe92] notes that “[t]his perceptual activity involves a transformation such that the differences between SE and SE1 occur”.

Legend
E: event
E1: perceived message by M
M, M2: man or machine
SE, SE1: S = signal or form; E = content

Figure 2.12: Gerbner’s model [Hal65]

In addition to the graphical representation in Figure 2.12, a verbal version is given by the following formula [Ber95; Nar06c]:

someone
perceives an event
and reacts
in a situation
through some means
to make available materials
in some form
and context
conveying content
with some consequence.


2.2.12 Bruce H. Westley and Malcolm S. MacLean

The Americans Westley and MacLean (1957) adapted Newcomb’s model specifically for mass media [Fis90]. They included a new element between sender and receiver called “gatekeeper” [WM57], “which is the process of deciding what and how to communicate” [AG01]. Narula [Nar06c] notes that “[l]inear feedback is an important component of this model”. The communication process for this model is explained in [BB60] as follows: the major components in the communication process are objects of orientation (X1, ..., X∞), their abstracted form (X1, X2, X3) and one example with more than one sense (X3m), message interpretation or coding (X’, X”), sender (A), receiver (B), gatekeeper (C), and feedback (f) (see Figure 2.13).

Figure 2.13: Westley and MacLean’s model [Hal65]

2.2.13 John W. Riley and Matilda White Riley

The Americans John and Matilda Riley (1959) pointed out in their model “the importance of the sociological view in communication” [Lee93] (see Figure 2.14). They “developed a model ... to illustrate ... sociological implications in communication” [Lee93]. The communicator (C) and receiver (R) are part of an overall social system. Both are members of certain primary groups which influence the formulation, selection, and perception of messages [RB16].

Figure 2.14: Riley and Riley’s model [Hal65]

2.2.14 David Kenneth Berlo

In 1960, the American scholar David Berlo introduced his linear model based on the Shannon-Weaver model (Figure 2.15). Berlo’s SMCR model consists of four primary components: source (S), message (M), channel (C), and receiver (R). He added five factors and characteristics for each of these four components. These factors, which affect the primary components, influence communication [Mon07]. Both the source and receiver are affected, e.g., by their communication skills and culture, and the message is influenced by what is sent and how it is sent. The channel is related to the five senses.

Figure 2.15: Berlo’s SMCR model (adapted from [BB12; MD94])

2.2.15 Roman Ossipowitsch Jakobson

The Russian–American linguist Jakobson (1960), influenced by Bühler’s model, describes six factors of language (e.g., addresser) which must be present for communication to be possible. Each of the factors has one associated element of verbal communication (e.g., emotive) [Ber95; Hun+10; Nö95]. In Figure 2.16, the model presents both the six factors and six elements.

Figure 2.16: Jakobson’s model (adapted from [Lan13])

2.2.16 Gerhard Maletzke

Gerhard Maletzke was a German communication scientist and psychologist who constructed a model of mass communication (1963). This model refers to “contextual and psychosocial factors that influence communication activities and patterns, such as the self-image of the interlocutors and the type of social environment in which the communication takes place” [Dan08]. The communication process has four elements: communicator (C), message (M), medium, and receiver (R) (see Figure 2.17).

Figure 2.17: Maletzke’s model [Cri+08]

2.2.17 Melvin Lawrence DeFleur

Melvin L. DeFleur was an American scholar whose mass communication model (1966) expands the version of Shannon and Weaver by interjecting a mass medium device. It also builds on the model of Westley and MacLean by adding a two-way feedback device to illustrate a circular communication process [Nar06a; Nar06b] (see Figure 2.18). DeFleur recognized that noise interferes at any stage in the process [Fol13; Nar06a].

Figure 2.18: DeFleur’s model [Nar06a]

2.2.18 Frank E. X. Dance

In 1967 [Dan67], the American communication scholar Frank Dance proposed a communication model called Dance’s Helical Model (see Figure 2.19). His model “represents the way communication evolves or progresses in a person from birth to the present moment” [GG06].

Figure 2.19: Dance’s model [Hil+07]

2.2.19 Samuel L. Becker

Samuel Becker was an American communication scholar, and deviser of the Mosaic Model (1968). This model “attempts to portray the multidimensional nature of communication, its inner and outer features, and the fact that the inner features run to considerable depths, some of them held back from, or hidden from, public view” [Hil+07] (see Figure 2.20).

Figure 2.20: Becker’s model [Hil+07]

2.2.20 Elizabeth G. Andersch, Lorin C. Staats, and Robert N. Bostrom

Three American scholars developed the communication model presented in Figure 2.21. This model (1969) “stresses the transactional nature of any communication, in which meanings are constructed and interpreted both by the sender and the receiver and are also subject to outside influences” [Bla06].

Figure 2.21: Andersch, Staats, and Bostrom’s model [Hil+07]

2.2.21 Dean C. Barnlund

The American scholar Barnlund identified human communication as a process which “describes the evolution of meaning” [Dix73; Hil+07]. Furthermore, communication is dynamic, continuous, circular, unrepeatable, irreversible, and complex [Dix73; Hil+07; TC78]. Therefore, Barnlund’s transactional model (1970) differs greatly from those previously discussed (see Figure 2.22).

Legend
CBEHNV: nonverbal behavioral cues
CBEHV: verbal behavioral cues
CPR: private cues
CPU: public cues
D: decoding
E: encoding
M: message
P: person

Figure 2.22: Barnlund’s model [PP10]

This model is “a complex graphic representation of spirals and arrows to show communication components as interrelated and constantly evolving” [Mas11]. There are three different sets of signs: public, private, and behavioral cues [Bar09; Hil+07]. Public cues (CPU) are subdivided into natural, which “come from our environment without human intervention”, and artificial cues, which “arise from a human’s involvement with the outer world, their impact on it, their modification and manipulation of it” [Hil+07]. Private cues (CPR) “operate intrapersonally, in our own head, part of the lexicon of memory and experience” [Hil+07]. Behavioral cues, which are either verbal or non-verbal, “may become cues for the other communicant” [Dix73]. The persons (P1, P2) are simultaneously sending and receiving messages (M). A person “decodes (D), or assigns meaning to, the various cues available in his/her perceptual field ...” [Bez96], “... responds to them, and encodes (E) them for transmission to a recipient or recipients in the form of behavioral cues” [Dix73]. The jagged lines illustrate the fact “that the number of cues to which meaning may be assigned is probably without limit” [Bar09].

2.2.22 Ray Eldon Hiebert, Donald F. Ungurait, and Thomas W. Bohn

The American scholars Hiebert, Ungurait, and Bohn designed a model “focused on content development” [Nar06b]. The concentric HUB model (1974), presented in Figure 2.23, is described as “a series of concentric circles with ... several elements that are important in the mass communication process” [Viv06]. These elements are the sender (communicator), codes, gatekeepers (e.g., controllers or reviewers of a message), media, regulators (social control of the media), filters (physical, emotional, psychological, cultural, or other frames of reference), audiences (social groups receiving information), and effect [HG00; RB16]. The content (or message) of mass communication starts with a sender and has to go through a series of steps, overcoming various obstacles or barriers, before it reaches the audience and produces a certain effect on their mind [HG00; RB16]. Rogala and Bialowas [RB16] add that “[f]eedback has to travel from the audience back to the communicator”.

Figure 2.23: Hiebert, Ungurait, and Bohn’s model [RB16]

2.2.23 D. Lawrence Kincaid

The American scientist Kincaid developed a convergence model in 1979. Narula [Nar06b] notes that “[a]ccording to this model, effective feedback creates convergence and ineffective feedback creates divergence”. Furthermore, “[i]nformation and mutual understanding are the dominant components of this model” [Fig+02] (see Figure 2.24).


Figure 2.24: Kincaid’s model [Fig+02]

2.2.24 Friedemann Schulz von Thun

Schulz von Thun is a German psychologist whose four-sided model (also named four-ears model or communication square) is an interpersonal communication model (1981). Communication is four-dimensional, that is, every message has four sides: self-revelation, factual information, relationship, and appeal [Bau10; Thu13; Wei+15] (see Figure 2.25).

Figure 2.25: Schulz von Thun’s model (adapted from [Thu17])

2.3 Chapter summary

This chapter defined the terms “communication” and “communication model”. Communication models and their main components (basic elements) were explained. The models show how human communication takes place in general. The basic elements described are synonyms or hyponyms of the author’s basic elements of communication: context, (transmission) system (e.g., air or a CMC system), communicator (sender and receiver), message (e.g., sound, signal), and communication barrier. These basic elements are used to define a communication model for CMC in Chapter 4. The next chapter introduces computer-mediated communication.


CHAPTER 3 Computer-mediated communication (CMC)

This chapter provides definitions and characteristics of CMC, examples of CMC systems,advantages and disadvantages of CMC, and ethical issues in CMC research.

3.1 Definition

The groundbreaking and visionary publication The Network Nation, written by Hiltz and Turoff, was first published in 1978 and updated in 1993. It is “one of the earliest studies of online communication” [Gur97]. The book, which is divided into three parts, describes the nature of computerized conferencing and related technologies, the impact and use of CMC technology, and a future scenario in 2084. Hiltz and Turoff [HT93] argue that “in order to understand computer-mediated communications at all, you must see them as a social process”. Teresa Carpenter comments that this book “has become the defining document and standard reference for the field of computer mediated communication (CMC)” [HT93]. Hiltz and Turoff’s work inspired other CMC researchers. Further, their observations, like the defined key problem areas, remain valid today [Kie08].

Many different definitions and similar terms for CMC exist, such as electronically mediated communication (EMC), digitally mediated communication (DMC), Internet-based communication (IBC), and Internet-mediated communication (IMC) [Cry11; JD12]. Simpson [Sim03] comments that “[t]here is debate as to what to include within a definition of CMC”. Nguyen [Ngu08] adds that “just like the fast-changing CMC technologies themselves, the definition of CMC is not fixed”. This implies the inclusion of video and audio conferencing, telephone text-messaging, and the World Wide Web [Bar10a; Kar13; Mur00; Sim02; Sim03]. Some researchers define CMC in a broader sense, such as Higgins [Hig92] (“human communication via computer”) and Metz [Met94] (“any communication patterns mediated through the computer”). On the other hand, Simpson [Sim02] notes that some writers such as Murray [Mur00] “restrict the definition to include only text-based modes”. Examples of definitions of CMC:

• December [Dec97] interprets CMC in a detailed way as a “process of human communication via computers, involving people, situated in particular contexts, engaging in processes to shape media for a variety of purposes”.

• Another popular definition is proposed by Herring [Her96]. She defines CMC simply as “communication that takes place between human beings via the instrumentality of computers”.

• Erlich et al. [Erl+05] summarize different sources as follows: “CMC is a combination of telecommunication technologies and computer networks ... that enable users to transmit, store, and receive information ... via synchronous and asynchronous communication tools ...”.


• Zitouni [Zit13] notes that “Warschauer (1999) proposes a ‘structure-based’ definition for CMC by decorticating CMC into three core concepts”. These core concepts are computer (computers and digital networks), mediated (communication is transmitted and facilitated through people’s interactions by means of computers and digital networks), and communication (which is dynamic, transactional, multifunctional, and multimodal).

The author of this work defines computer-mediated communication in a broad manner to cover a wide range of CMC systems: communication takes place between communicators via or with the help of information and communications technologies. Therefore, CMC can be briefly summarized as communication via computer(s). Warschauer’s three core terms of CMC (computer, mediated, and communication) are defined as follows.

• Communication: Dynamic, transactional, multifunctional, and multimodal process to exchange messages, including, for example, facts, ideas, opinions, or emotions. Communicators are usually human beings, but non-human communicators such as chatbots are also possible.

• Computer: System which consists of hardware, software, and optionally, a network. Mobile phones are included. Although some scholars call this stretch “information and communication technology” (ICT) [Len09], the author uses the term “computer-mediated communication” in this thesis. Again, it is not necessary to use networked hardware. It is possible that the communication is happening via a single computer. For example, a human being communicates with a chatbot on the same physical computer.

• Mediated: One or more computers act as mediators to transmit messages. The word “mediator” is derived from the Latin “mediatorem” (nominative “mediator”), which means “to be or divide in the middle” [Onl13].

3.2 Characteristics of CMC

The “multiplicity of CMC dimensions, and distinctions between these are not always clear” [Sim03]. A traditional distinction is between asynchronous (delay; communicators are not necessarily online simultaneously) and synchronous (real-time; online) CMC [Lee02; Seg02; Sim03; Zit04], although Simpson [Sim03] notes that this distinction “is somewhat problematic”. Segerstad [Seg02] adds that “[n]o synchronous CMC is fully synchronous in the way spoken face-to-face interaction is: there is always the lag and delay of typing and sending the message”. Therefore, Herring [Her99] distinguishes between one- and two-way text-based synchronous CMC (SCMC). In one-way SCMC (e.g., email, IRC), “it is technically impossible for the addressee to respond while the message is being written ... until a complete message appears” on the screen [Her99], while in two-way SCMC (e.g., the instant messaging tool ICQ), messages “can be seen as they are being composed” [Sim03]. Instead of synchronous, Garcia and Jacobs [GJ99] favor the term quasi-synchronous, “as only the produced messages and not the message production are simultaneous [sic] available to the participants in a CMC environment” [Str+07]. Apart from this aspect of “synchronicity” [Lee02; Seg02], CMC can be categorized according to different attributes, for example:


• “nature of the interaction” [Wet+01], e.g., one-to-one or many-to-many (group) interactions [AC07; Bar10b; Erl05; Lee02; Str+07; Wet+01];

• ephemeral (not recorded) or persistent (recorded) communication [Hol08c];

• medium, e.g., text, audio, graphics, or video-based [KP07];

• media richness and social presence [Bar10a; Ben09; Bub01; Kal07];

• push or pull communication [Hei+06; Mar04; Ver99];

• textual dichotomy, i.e., text-based CMC or not [Sim03]; and

• other media (e.g., size of message buffer) or situation factors (e.g., public, private) [Her07].

The “language used in CMC as a whole” [Hun+10] is called, among other things, netspeak, cyberspeak, Internet slang, or netlingua [Bar10a; Cry04; Gre10]. Various researchers have compared the language used in CMC in general, or in specific CMC systems such as chat and email, to spoken and written modes of communication (e.g., [Bar03b; Cry04; Has09; Seg02; Tag09]). Crystal [Cry04] and many other linguists (e.g., [Fre08; Gre10; MS00; Seg02]) conclude that netspeak is a hybrid between speech and writing. He adds that this makes netspeak a new and “genuine ‘third medium’” of linguistic communication. Examples of linguistic features of CMC are abbreviations or emoticons. Greiffenstern [Gre10] notes that “not everyone who communicates online uses all available features of CMC, and not all features are used in all modes of CMC”. The choice of linguistic features depends on different factors such as the communicator, CMC mode, communicative goal, and relationship [Gre10].

English is the dominant language in CMC [Bar10a; Mur00]. Murray [Mur00] states that this “is the result not only of the global expansion of English as a lingua franca but also of the historical development of the technology itself”. However, many participants “have very limited knowledge of English” [Gre10].

In CMC, basic (written and unwritten) rules should be kept in mind when sending messages to other communicators. These rules of proper conduct are called netiquette, which is short for network etiquette [She94]. The netiquette may vary across CMC systems. Examples of guidelines are written for CMC in general by Shea [She94] and Hambridge [Ham95], for IRC by Charalabidis [Cha00] (e.g., “Don’t flood the channel by sending large text files.”) and by “IRC Beginner.com” [IRC13] (use a firewall), for Usenet by Kehoe [Keh92] (useful subjects), and for email by Shapiro and Anderson [SA85] (avoid responding while emotional) and Flynn and Kahn [FK03] (copy with care). In summary, “think before you communicate” is a common piece of advice in CMC.

3.3 CMC systems

This section provides a brief explanation and a short overview of different forms of computer-mediated communication. CMC comes in numerous forms (or modes) [Bar10a; Bar03a; BS08; Chu08; Met94; Mur00; Riv02a; Seg02; Sim02; Sim03; Ver14; WS05]. It is not possible to present a complete list of all the forms because “CMC is constantly changing, and new forms of CMC are emerging” [Gre10]. Nowadays, “CMC modes are frequently used in combination” [Sim02]. As mentioned above, CMC can be divided into synchronous and asynchronous modes. Asynchronous modes include email, forum, blog, and wiki. Examples of synchronous modes are chat, instant messaging, and text messaging. Nguyen [Ngu08] points out that “[a]nother widely-accepted classification of CMC is whether it is text-based or audio/video-based”. Figure 3.1 illustrates both widely-accepted classifications of CMC.

Chapter 3 Computer-mediated communication (CMC)

Figure 3.1: Classification of CMC [Ngu08]

3.3.1 Electronic mailing

Email is perhaps the most popular and widely used CMC system [Hen09; WS05]. A number of abbreviated spelling variations of electronic mail exist, such as email, e-mail, Email, EMail, E-mail, and eMail. Initially, email was called “net notes” or “simply mail” [Abb00]. An exchange of emails is usually faster than the traditional postal service (mail or post), which is sometimes called “snail mail” or “smail” [Gre10; Met02]. Therefore, email has replaced the postal service in many situations. Partridge [Par08] mentions that “[e]lectronic mail existed before networks did”. Examples of free mail providers are Gmail (https://mail.google.com/), Outlook.com (http://www.outlook.com/), and Yahoo! Mail (https://mail.yahoo.com/).

Figure 3.2: The email client “The Bat!”

Email is an example of an asynchronous CMC. It is a service that allows users to send messages to the mailboxes of one or more users. A mailbox is identified by an email address which is made up of a local part (e.g., username), the at sign “@”, and the domain name or fully qualified domain name. An email message consists of three components: a message envelope (used to deliver email to a recipient), message header (contains control information), and message body (the actual message content). Common message header fields (key-value pairs) include, for example, “Date:” (local time and date when the message was originally sent), “From:” (the sender’s email address and, optionally, the sender’s name), “Reply-To:” (address where replies should be sent), “Message-ID:” (single unique message identifier), “To:” (address(es) and name(s) of the recipient(s)), “Cc:” (carbon copy), “Bcc:” (blind carbon copy), “Subject:” (a short summary of the content), “In-Reply-To:” (used to identify the message(s) to which the new message is a reply), “References:” (used to identify a thread of conversation), “MIME-Version:” (MIME version number), “Content-Type:” (indicates the media type of the message content), “Content-Transfer-Encoding:” (content transfer encoding applied), and “Received:” (tracking information generated by email servers) [KP05; Res08]. An example is given in Table 3.1.

Table 3.1: Example of an email message

Date: Wed, 22 May 2013 16:28:29 +0200
From: [email protected]
Reply-To: [email protected]
Message-ID: <[email protected]>
To: “Sue Thompson” <[email protected]>
Subject: Re: ASCII to Unicode
In-Reply-To: <[email protected]>
References: <[email protected]>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Hello Sue,

Thanks for your help!

Greetings,
Robert
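The header fields described above are the kind of information an automated analysis could read to recover who wrote to whom and which message is being answered. The following is a minimal sketch using Python's standard `email` package. The message, all addresses, and all message IDs under example.org are hypothetical placeholders and are not the values of Table 3.1; the function name `sender_receiver` is likewise only illustrative.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr

# Hypothetical message; every address and ID below is a placeholder.
RAW = """\
Date: Wed, 22 May 2013 16:28:29 +0200
From: Robert <robert@example.org>
Message-ID: <reply-1@example.org>
To: "Sue Thompson" <sue@example.org>
Cc: list@example.org
Subject: Re: ASCII to Unicode
In-Reply-To: <orig-1@example.org>
References: <orig-1@example.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Hello Sue,

Thanks for your help!
"""


def sender_receiver(raw):
    """Extract sender, receivers, and reply reference from one message."""
    msg = message_from_string(raw)
    sender = parseaddr(msg["From"])[1]
    # "To:" and "Cc:" both name receivers; "Bcc:" is normally not visible.
    recipient_headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    receivers = [addr for _, addr in getaddresses(recipient_headers)]
    local_part, _, domain = sender.partition("@")
    return {
        "sender": sender,
        "receivers": receivers,
        "in_reply_to": msg["In-Reply-To"],
        "local_part": local_part,
        "domain": domain,
    }


info = sender_receiver(RAW)
```

The sketch reads only explicit header fields; it makes no attempt to guess receivers that are not addressed in the header.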

An email can usually be composed, sent, read, replied to, redirected, forwarded, saved, or deleted. Originally, email messages were made up of 7-bit ASCII (American Standard Code for Information Interchange) [Res08]. MIME (multipurpose Internet mail extensions) extends this specification to carry one or more multimedia content attachments [Gru04] (e.g., documents, images, audio files, and video files). MIME builds on the message format of RFC 822 and is defined by RFC 2045–2049 and RFC 2387 [Cro82; FB96a; FB96b; FB96c; Fre+96; Lev98; Moo96]. Email can be sent in plain text, HTML (hypertext markup language) formatted text, or RTF (rich text format). Depending on the level of detail, several software components (email agents or message agents) are used within an email system [Cro09; Hut+07; KS07]. An overview of the components is given in Figure 3.3. The MAA and MRA mentioned below are not standardized in RFC 5598 [Cro09].


Figure 3.3: Email system components (adapted from [Bye05])

Details of these components are as follows:

• Mail user agent (MUA): Used to read and write emails. The user interacts with an MUA (email client or email reader). Stand-alone email clients are, for example, Mozilla Thunderbird (http://www.mozilla.org/thunderbird/), The Bat! (http://ritlabs.com/, see Figure 3.2), and Microsoft Outlook (http://office.microsoft.com/outlook/). Additionally, web-based applications (webmails) use web browsers (e.g., Gmail, https://gmail.com/).

• Mail transport agent (MTA): Takes care of mail routing and transport. SMTP (simple mail transfer protocol), which is defined by RFC 5321 [Kle08], is the most common transport protocol. Most MTAs come with a simple MDA functionality.

• Mail delivery agent (MDA): Accepts emails from an MTA. The local MDA (LDA) delivers emails to a local mailbox; an SMTP MDA delivers them to another MTA.

• Mail submission agent (MSA): Specialized form of the MTA that accepts emails from an MUA. It prepares and delivers them to an MTA. The MSA is often built into the MTA [KS07].

• Mail access agent (MAA): Reads emails from the message store (mailbox) and talks with an MUA (or MRA) using an access protocol such as POP (post-office protocol) or IMAP (Internet message access protocol). POP is specified in RFC 1939 (in its third version), 1734 (authentication command), and 2449 (extension mechanism) [Gel+98; Mye94; MR96]; IMAP in RFC 3501 [Cri03]. These are the most commonly used Internet mail protocols for retrieving emails. An MRA is usually part of the MUA.

• Mail retrieval agent (MRA): Retrieves emails from an MAA and makes them available to an MUA. Typically, the MRA is part of the MUA.


3.3.2 Electronic mailing list

Email is a one-to-one CMC system. In contrast, an electronic mailing list (also email list or elist) is a means for one-to-many exchanges at once. The electronic mailing list is an asynchronous CMC form which uses the email system for distribution. It is a list of stored email addresses of users who are interested in a particular topic or discussion. The most used commands are “subscribe” (join a mailing list) and “unsubscribe” (leave). The sender does not need to know the email addresses of the other users in the list for posting. There are different email addresses associated with one mailing list [Mil06; Ste00b]: (1) the list address used for the mailing list itself (i.e., for posting messages to mailing list subscribers), (2) the administrative address for subscribe messages and other administrative messages, (3) the address of the list owner (responsible for list operations), and (4) the address of the list moderator (responsible for moderated lists). Additional email headers for mailing lists present special features to the user. They are defined by RFC 2919 (List-Id field) [CW01] and RFC 2369 (e.g., fields List-Owner and List-Subscribe) [BN98]. The SMTP command EXPN expands a mailing list. It returns the email address of each member in the list [Kle08].

Figure 3.4: The email list manager “Dada Mail”

The term “Listserv” is usually used as a synonym for an electronic mailing list [Bar03a; Her02]. However, LISTSERV, a mailing list software, is a registered trademark licensed to L-Soft International, Inc. Web-based interfaces for convenient administration are available, and popular mailing list servers (or mailing list managers) for the Internet include LISTSERV (http://www.lsoft.se/products/listserv.asp) and Majordomo (http://www.greatcircle.com/majordomo/). A screenshot of Dada Mail (http://www.dadamailproject.com/), another mailing list manager, is shown in Figure 3.4. Mailing list servers offer several functions to manage email lists:


• One-way and two-way list: In a one-way list (also named announcement list), the owners or editors broadcast to the subscribers (e.g., through a newsletter or product announcement). Members of a two-way list (discussion list) can both send and receive messages. This allows list members to exchange information centered on a specific topic.

• Open (public) and closed (private) subscription: Any valid user can subscribe to a public list. In a closed list, subscriptions are approved by the list owner or moderator(s).

• Automatic and manual subscription: Subscription requests are processed automatically or manually (e.g., by the mailing list owner).

• Open (public) and closed (private) posting: Either anyone can post messages, or only subscribers are allowed to post messages in the mailing list.

• Moderated and unmoderated (anarchic): In a moderated list, the messages are approved or rejected by a moderator (or list owner) before being distributed. Messages sent to an unmoderated list are automatically distributed without any prior content review.

• Individual message and digest: Instead of receiving a copy of each message, subscribers can receive a digest, in which several messages within a certain time period (e.g., day, week, month) are combined and sent in a single message. Therefore, an email digest is a way to reduce the total number of messages.

• Reply to sender and list: Replies are sent to the sender or to the subscribers.

• Visible or hidden mailing list: A hidden mailing list is displayed only to subscribers (and not to everybody).

• Archived and not archived messages: Messages can either be saved, archived, and made accessible to others or not [Ste00b]. An archived mailing list allows the retrieval of previously stored postings.
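The interplay of some of these options (closed posting, moderation, archiving) can be sketched as follows. This is a toy model under stated assumptions: the class `MailingList`, its attribute names, and the example.org addresses are illustrative and are not taken from LISTSERV, Majordomo, Dada Mail, or any other list manager.

```python
from dataclasses import dataclass, field


@dataclass
class MailingList:
    # All option names are hypothetical and chosen for readability only.
    subscribers: set = field(default_factory=set)
    closed_posting: bool = True    # only subscribers may post
    moderated: bool = False        # posts wait for a moderator's approval
    pending: list = field(default_factory=list)
    archive: list = field(default_factory=list)

    def subscribe(self, address):
        self.subscribers.add(address)

    def unsubscribe(self, address):
        self.subscribers.discard(address)

    def post(self, sender, text):
        """Return the addresses the message is delivered to right now."""
        if self.closed_posting and sender not in self.subscribers:
            return []                          # rejected: not a subscriber
        if self.moderated:
            self.pending.append((sender, text))
            return []                          # held for the moderator
        return self._distribute(sender, text)

    def approve(self, index=0):
        """Moderator releases a held message for distribution."""
        sender, text = self.pending.pop(index)
        return self._distribute(sender, text)

    def _distribute(self, sender, text):
        self.archive.append((sender, text))    # archived list
        return sorted(self.subscribers)        # one-to-many delivery


lst = MailingList(moderated=True)
lst.subscribe("ann@example.org")
lst.subscribe("bob@example.org")
```

Note that the poster only needs the list itself, not the subscribers' addresses; every distributed message has all current subscribers as structural receivers.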

3.3.3 Usenet

Usenet (also netnews) stands for “user’s network” [Gar+94; Ste00b]. It is a worldwide, distributed, decentralized, and asynchronous CMC system. Usenet contains a large collection of hierarchically organized, computer-based discussion groups (newsgroups). A newsgroup involves a discussion on a particular subject which is identified with a period-separated name [Gar+94]. Major standard categories contain many subgroups. These categories are related to computer (“comp.*”), humanities (“humanities.*”), miscellaneous (“misc.*”), newsgroup (“news.*”), recreation (“rec.*”), science (“sci.*”), society (“soc.*”), talk (“talk.*”), and alternative (“alt.*”) [Gru04; Smi99; Ste00b]. For example, “comp.software.testing” is the newsgroup for discussing all aspects of software testing. Additionally, many other (smaller) top- and lower-level hierarchies exist, including regional or even company-specific ones. For example, “de.comp.lang.c” is the German newsgroup related to the C programming language. Usenet was originally designed to transfer text-only messages (articles, postings, news). A complex process is necessary to share binary files such as pictures, audio, and video files. Programs encode binary data (8-bit bytes) into plain text (7-bit ASCII) with some overhead. To avoid the maximum-size limitation of messages, large binary files have to be broken up into smaller parts. After transmission, the split files are reassembled into the original files and then decoded. Binary attachments are posted strictly in the “alt.binaries” hierarchy [Ste00b]. In general, Usenet is public, regardless of provider or location. However, newsgroups can either be unmoderated (anyone can post) or moderated. Moderators control which articles are allowed to be posted (e.g., to avoid spam). Nowadays, public Usenet servers typically have a short retention time (the time after which articles expire) or do not offer a large number of newsgroups. Furthermore, Usenet or Internet service providers block binary newsgroups to avoid legal issues or to reduce network traffic. Therefore, various Usenet newsgroup providers offer only paid access.
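The encode, split, reassemble, and decode steps for binary postings can be illustrated with a short sketch. Base64 from the Python standard library is used here only as a stand-in for a binary-to-text encoding; it is an assumption of this example and not necessarily what a given Usenet tool uses. The function names and the part size of 500 characters are likewise hypothetical.

```python
import base64


def encode_parts(data: bytes, max_chars: int):
    """Encode bytes as 7-bit text and split the text into parts."""
    text = base64.b64encode(data).decode("ascii")
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def decode_parts(parts):
    """Reassemble the parts in order, then decode back to bytes."""
    return base64.b64decode("".join(parts))


payload = bytes(range(256)) * 4            # 1024 bytes of binary data
parts = encode_parts(payload, max_chars=500)
overhead = sum(len(p) for p in parts) / len(payload)   # roughly 4/3
restored = decode_parts(parts)
```

With this stand-in encoding, the text form is about one third larger than the binary input, which is the kind of overhead the paragraph above refers to.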

Figure 3.5: The newsgroup newsreader “GrabIt”

Usenet servers can be accessed through a client (newsreader, email client with integrated newsreader, web browser, or website). GrabIt (http://www.shemes.com/) is an example of a free newsreader (see Figure 3.5). Users can subscribe to newsgroups, start new threads, read, post, filter, and browse articles. When a message is posted, the user’s online identity (screen name or alias) and email address appear in the post, and a copy of the message is sent for distribution to all other news servers on the network at regular intervals. Figure 3.6 shows a diagram of Usenet group exchanges. The green, blue, and red squares represent the group exchanges and subscriptions. The article format is similar to an email message. A Usenet article consists of (1) the header (provides all the administrative information necessary to process an article), (2) the body (contains the text of the article), and (3) (optionally) the signature (small amount of information, stored in a signature file, such as a real name or user’s website address) [Har95]. The six mandatory header fields are “Date” (time and date the message was posted), “From” (user’s screen name), “Message-ID” (unique message-identification string), “Newsgroups” (specifies the newsgroup(s) to which the article is posted), “Path” (indicates the route taken by an article since its injection into the Usenet), and “Subject” (subject of the message) [Har95; Mur+09; Smi99]. Optional header fields are, for example, “References” (reference to any previous postings with the same topic) or “Distribution” (specifies geographic or organizational limits on an article’s propagation) [Har95; Mur+09]. Originally, UUCP (Unix-to-Unix copy) was used to transmit Usenet data [Har95; Nem+95]. More recently, NNTP (network news transfer protocol) has become “the most common propagation mechanism” [Gru04]. NNTP is defined by RFC 3977 and updated by RFC 6048 [Eli10; Fea06], and UUCP by RFC 976 [Hor86]. Other important standards for Usenet are defined by RFC 5322 [Res08], RFC 5536 [Mur+09], and RFC 5537 [AL09].
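The six mandatory header fields and the period-separated newsgroup names lend themselves to a small check. The sketch below assumes an article's header is already available as a dictionary; the article itself, its addresses, and the helper names are hypothetical.

```python
MANDATORY = ("Date", "From", "Message-ID", "Newsgroups", "Path", "Subject")


def missing_headers(headers: dict):
    """List the mandatory header fields that an article lacks."""
    return [name for name in MANDATORY if name not in headers]


def target_groups(headers: dict):
    """Split the comma-separated "Newsgroups" field into group names."""
    return [g.strip() for g in headers["Newsgroups"].split(",")]


def hierarchy(newsgroup: str):
    """Return every level of a period-separated newsgroup name."""
    parts = newsgroup.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


# Hypothetical article header; all values are placeholders.
article = {
    "Date": "Wed, 22 May 2013 16:28:29 +0200",
    "From": "robert@example.org",
    "Message-ID": "<post-1@example.org>",
    "Newsgroups": "comp.software.testing, de.comp.lang.c",
    "Subject": "Re: ASCII to Unicode",
}
```

In this example the “Path” field is absent, so `missing_headers(article)` reports it, and `hierarchy("de.comp.lang.c")` walks from the top-level hierarchy “de” down to the full group name.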

Figure 3.6: Usenet group exchanges (adapted from [Wik13])

3.3.4 Forum

Forums are “asynchronous communication systems that allow a member of that forum to post a comment, idea, or question online” [Pfe10]. A forum is also known by various other names such as a bulletin board, discussion board, message board, discussion group, discussion forum, Internet forum, or online forum [Bar10a; BS08; Hol08c; Pfe10; WS05]. Usually, the forum software is installed on a central server. A popular forum software is phpBB (http://www.phpbb.com/), which is free and open source (see Figure 3.7). This forum software is written in the PHP scripting language (http://www.php.net/) and supports multiple database engines such as MySQL (http://www.mysql.com/) or PostgreSQL (http://www.postgresql.org/). Delphi Forums (http://www.delphiforums.com/) and ProBoards (http://www.proboards.com/) host free forums on the Internet. These provide different features, such as spam protection, search engine optimization, forum poll creation, calendar functionality, and a private messaging system. Forum administrators have several settings to choose from. For example, they can allow anonymous user-submitted messages (posts), allow authors to delete their own posts, or force moderation of posts. Moderators are users who review posts before they are made available and public to all users. Barasa [Bar10a] notes that “[a] forum consists of a tree like structure”. The forum structure can include categories, sub-forums, threads/topics, and posts/replies/comments. Depending on how messages are grouped and displayed, forums are divided into three display formats [Bul13; Kad+12]: (1) non-threaded or flat (no relation to any prior messages; messages are displayed in chronological order), (2) semi-threaded (replies to message topics are allowed; direct replies to replies are not allowed), and (3) fully-threaded display formats (replies to topic and/or replies to other replies are allowed; the relationship between the replies is shown by indenting each underneath the message being replied to) [Bul13; Kad+12; RR13].
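The difference between the flat and the fully-threaded display format can be sketched with a handful of posts. The data, the author names, and the function names below are made up for illustration; treating the author of the parent post as the receiver of a reply is one plausible reading of the reply structure, not a rule stated by the cited sources.

```python
from collections import defaultdict

# (post_id, parent_id, author); parent_id None marks the topic starter.
POSTS = [
    (1, None, "anna"),
    (2, 1, "ben"),
    (3, 2, "anna"),
    (4, 1, "cara"),
]


def render_flat(posts):
    """Flat format: chronological order, no visible reply relation."""
    return [f"{author} (#{pid})" for pid, _, author in sorted(posts)]


def render_threaded(posts):
    """Fully-threaded format: each reply is indented under its parent."""
    children = defaultdict(list)
    for pid, parent, author in sorted(posts):
        children[parent].append((pid, author))
    lines = []

    def walk(parent, depth):
        for pid, author in children[parent]:
            lines.append("  " * depth + f"{author} (#{pid})")
            walk(pid, depth + 1)

    walk(None, 0)
    return lines


def reply_targets(posts):
    """Map each reply to the author of the post it replies to."""
    authors = {pid: author for pid, _, author in posts}
    return {pid: authors[parent]
            for pid, parent, _ in posts if parent is not None}
```

In a flat forum only the first function applies, so the structural receiver of a reply is not given explicitly and has to be inferred by other means.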

Figure 3.7: The forum software “phpBB”

Web browsers are needed to visit and participate in forums. Forums usually require registration (username, email address) to fully utilize the features. Registered users must log in to access their account and profile. A profile may include user information such as the user’s email address, registration date, or total number of posts. Additionally, some forums allow guests (anonymous users) to view and/or post. A submitted message may display the username, the post date, and the body, for example. Posts are stored in text files or in a database.

3.3.5 Blog

Blogs (short for web logs) are web pages similar to online diaries in which dated entries are displayed, commonly in reverse chronological order [Her+04; Rin07]. They are “usually comprised of short, frequently updated postings” [Thu+04]. Puschmann [Pus10] notes that blogs are asynchronous, although “micro-blogging formats are shifting towards synchronicity”. Microblogging is a form of blogging for distributing short messages as well as small files. Twitter (https://twitter.com/) is currently the most popular platform for microblogging. Apart from microblogging, various other types of blogs exist [Her04; Pus10; Rin07]. Examples are video blog (vlog), warblog (or milblog), and anonyblog (or anonoblog).

One or more authors (bloggers) maintain a blog by writing and publishing blog entries (articles or posts) around a topic. Bloggers frequently write under pseudonyms, “but this seems less common than in other forms of CMC” [Pus10]. Blog entries are mainly text-based but may include multimedia content (e.g., pictures). Categories (topics) help to classify the entries. Readers often have the option to leave comments or questions after the entries [Pus10; Rin07], making a blog more interactive and interesting. Many types of blog software are free and offer different functionality, e.g., search engine-friendly permalink (permanent and direct link to a single entry) structure, trackback and pingback (notification between websites about related resources), or blogroll (list of links to other recommended or useful sites). Examples of blog software are WordPress (http://wordpress.org/) and Serendipity (http://www.s9y.org/). A screenshot of Serendipity is shown in Figure 3.8.

Figure 3.8: The blog software “Serendipity”

3.3.6 Wiki

A wiki is an asynchronous “[w]eb-based software application that allows users to collaboratively contribute and edit articles on various topics” [Hen09] (see Figure 3.9). It has a non-linear hypertext structure [Ebe+08]. Howard G. “Ward” Cunningham [Hen09] was the developer of the first wiki software, which was named WikiWikiWeb (or Wards Wiki) [Cun13a; Cun13b]. Inspired by the shuttle bus system “Wiki Wiki Shuttle” at the Honolulu International Airport, Cunningham [Cun13b] “chose wiki-wiki as an alliterative substitute for quick and thereby avoided naming this stuff quick-web”. The term “wiki” is short for the Hawaiian word “wiki-wiki”. Wikipedia (http://www.wikipedia.org/, a collaboratively written online encyclopedia), Wiktionary (http://www.wiktionary.org/, a free dictionary), and WikiLeaks (http://wikileaks.org/, designed “to bring important news and information to the public” [Wik13a]) are popular examples of wikis.

Users can easily access the content with web browsers. No additional software is required. Wikis are public (open to everyone) or restricted [Hen09]. The four types of participants are reader, author, wiki administrator, and system administrator [Ebe+08]. If users have access, they can create new pages (articles), and edit or remove existing ones. Wiki markup or WYSIWYG (what you see is what you get) editors help to format pages. Although wiki software platforms vary in detail, most of them include core functions such as editing pages by clicking on the edit button, creating internal links to other articles or external links to web resources on the Internet, storing a history with all previous versions of any single page and the modifications made from version to version, and providing overviews of recent changes to wiki pages or of all changes within a predefined time period, as well as a sandbox (or playground) to carry out experiments, and search functions (e.g., full-text search) [Cho08; Ebe+08; Hen09; WT07]. Advanced wiki features are, for example, RSS (really simple syndication) feed support, skinning to change the software’s look and feel, plugin architecture, or file uploading support. Wikis are primarily text-based, although multimedia files can be added. The content of a page is stored in a file or database. A great variety of wiki software is free, such as MediaWiki (http://www.mediawiki.org/, which is used to implement Wikipedia) and XWiki (http://www.xwiki.org/). Hosting services (wiki farms) such as Wikia (http://www.wikia.com/) offer both wiki software and web space [Hen09].

Figure 3.9: The wiki software “MediaWiki”

3.3.7 Online chat

A chat system allows synchronous communication among multiple users [Bar03a]. Although mainly designed for group communication, many chat systems also offer one-to-one communication. The process of taking part in chat rooms (channels) is called chatting. The oldest forms of online chat, such as IRC, are text-based. Additionally, graphic (e.g., avatars, i.e., graphical representations of users, or other images), voice (e.g., spoken word with microphone), and video chats (e.g., visual with webcam) are available. Web chatting, which is not only text-based, is done with a web browser without any additional software. Compared to IRC, web chats “are much simpler to use, ... and they have a much simpler structure than IRC, as they usually consist of only a single server” [Dew+03].

Chatters log in through nicknames, which are registered or unregistered (guests). Grubbs [Gru04] establishes that they “enter, participate, and leave channels continuously”. Typically, the user’s computer screen splits into two or more sections to show a current user list as well as the sent messages. Herring [Her02] adds that “[m]essages are displayed to everyone in the room or channel in the temporal order in which they are received, with the user’s nickname appended automatically before each message”. Dewes et al. [Dew+03] note that “there is no common protocol base for chat systems,” with some exceptions (e.g., IRC). Different technologies are used to implement chat servers and clients, such as C++, Java applets, PHP, HTML, Adobe Flash, or Ajax. For example, the IRC client qwebirc (http://www.qwebirc.org/) offers a web frontend (see Figure 3.10).

Figure 3.10: The IRC client “qwebirc” used for web chatting

3.3.8 Instant messaging

Instant messaging (IM) is a synchronous form of CMC. Each user defines a list of contacts (contact list, buddy list, or friend list) by adding identifiers. For example, WhatsApp (http://www.whatsapp.com/) requires phone numbers that can receive international SMS messages. Users interact with these specified, known contacts by, among other things, exchanging messages, and they can control their contacts by blocking, ignoring, or deleting them. A contact list typically displays the user’s friends by their screen names and current communication statuses. The presence status is, for example, online, busy, away, invisible, or offline. Besides text-based chatting, advanced IM clients (IM messengers) allow, among other things, voice/video chat, file transfers, remote desktop sharing, and inbound messages while offline. IM systems use a client-server or a direct peer-to-peer (P2P) architecture [RR05]. In client-server IM, the message is first sent to the server and then from the IM server to the intended receiver. In P2P IM, the client contacts the IM server to locate the desired client and then contacts the peer directly. Both of the basic IM architectures are shown in Figure 3.11. IM clients are available in different versions, such as mobile or web-based.

(a) Client-server IM (b) Peer-to-peer IM

Figure 3.11: Instant messaging architecture

Some popular IM applications are AIM (AOL instant messenger, http://www.aim.com/), Google+ Hangouts (http://www.google.com/hangouts/), ICQ (“I seek you”, http://www.icq.com/), WhatsApp (https://www.whatsapp.com/), and Yahoo! Messenger (http://messenger.yahoo.com/). Users usually have to use the same IM client to communicate because the various clients are usually incompatible with each other. The reason is that many incompatible IM protocols exist, some of them proprietary, such as OSCAR (open system for communication in real time) and YMSG (Yahoo! messenger protocol), alongside open ones such as XMPP (extensible messaging and presence protocol). A multi-protocol client is able to connect to a number of these popular IM protocols, obviating the need to run multiple clients simultaneously to reach users on other IM systems. Multi-protocol clients are, for example, Miranda IM (http://www.miranda-im.org/), Pidgin (http://pidgin.im/), and Trillian (https://www.trillian.im/) (see Figure 3.12).

(a) Buddy list in Pidgin (adapted from [Pid14])

(b) WhatsApp on Android phone

Figure 3.12: Instant messaging


3.3.9 Text messaging

Text messaging (or texting, TM) is an example of asynchronous CMC. TM is also known as SMS (short message service) or TMS (text messaging service, simply text). Available on most mobile devices, it allows users to receive, store, delete, forward, reply to, and send messages (see Figure 3.13). SMS is defined within the GSM (global system for mobile communication) digital mobile phone standard [Bar03a; Seg02] and has been ported to other network technologies such as GPRS (general packet radio service) and CDMA (code division multiple access) [LB05]. Typically, the following options are provided: mobile to mobile, mobile to email, web to mobile, and mobile to provider [SV08]. The maximum length of a short message (SM) is 140 octets (i.e., 160 characters with a 7-bit character encoding) [Bra11; Hor08]. MMS (multimedia messaging service) is an extension of SMS that can carry rich text (e.g., bold text), images (e.g., pictures), sound (e.g., audio files), or other content or formats. MMS has a size limit of 300 kilobytes [LB05].
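The relation between 140 octets and 160 characters is simple arithmetic: 140 octets are 1120 bits, and 1120 divided by 7 bits per character gives 160. The sketch below works this out for several character widths. The function names are illustrative, and the estimate of how many messages a longer text needs is a deliberate simplification that ignores any per-part header overhead of real concatenated messages.

```python
MAX_OCTETS = 140  # maximum payload of one short message, from the text above


def max_chars(bits_per_char: int) -> int:
    """Characters that fit into one short message at a given width."""
    return MAX_OCTETS * 8 // bits_per_char


def messages_needed(text_len: int, bits_per_char: int = 7) -> int:
    """Simplified estimate; ignores per-part header overhead."""
    per_message = max_chars(bits_per_char)
    return -(-text_len // per_message)   # ceiling division


seven_bit = max_chars(7)     # 1120 bits / 7
eight_bit = max_chars(8)     # 1120 bits / 8
sixteen_bit = max_chars(16)  # 1120 bits / 16
```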

(a) Text messaging on Android phone with QWERTY keyboard

(b) Short discourse on Android phone

Figure 3.13: Text messaging

TM is a store-and-forward messaging technology. Barasa [Bar10a] notes that messages are held “for a number of days until the phone is active and within range”. Components of a typical SMS-enabled GSM architecture are presented in Figure 3.14. The GSM network is composed of three subsystems: the base station subsystem (BSS), the network subsystem (NSS), and the operation subsystem (OSS). Le Bodic [LB05] adds that “[t]he OSS implements functions that allow the administration of the mobile network”. The OSS is not represented in this figure.

Figure 3.14: SMS-enabled GSM network architecture (adapted from [LB05])

• Terminal equipment (TE): Devices such as a personal digital assistant (PDA) or personal computer (PC) are external devices that can be connected to the ME.

• Mobile station (MS): The MS consists of two main elements: the ME and the SIM. A short message is typically stored in the MS [LB05].

– Mobile equipment (ME): An ME is a device which sends and receives radio signals within a cell site. Le Bodic [LB05] adds that it “contains the radio transceivers, the display, and digital signal processors”.

– Subscriber identity module (SIM): The SIM stores a unique IMSI (international mobile subscriber identity) which allows the network to identify the device when attached to the mobile network [LB05]. It is embedded in a plastic card.

• Base station subsystem (BSS): The BSS consists of the BTS and BSC, together with certain interfaces.

– Um: This interface is also known as the air interface or radio link.

– Base transceiver station (BTS): Le Bodic [LB05] notes that BTS “implements the air communications interface with all active MSs located under its coverage area (cell site)”. Several stations are connected to a single BSC.

– Abis: The interface between the BTS and the BSC that allows control of the radio equipment and radio frequency allocation in the BTS.

– Base station controller (BSC): The BSC manages one or multiple BTS units. It performs a set of essential control functions and coordination between the BTSs.

– A: This interface, which lies between the BSS and the MSC, “provides means for BSS management, call control and mobility management” [Kas+08].

• Network subsystem (NSS)

– Home location register (HLR): The HLR contains subscription details for each registered device.

– Mobile switching center (MSC): The MSC performs the switching functions of the network. It also provides functions for registration, authentication, and local user updates [LB05].

– Short message service center (SMSC): Le Bodic [LB05] explains that “[t]he SMSC manages the delivery and submissions of messages and commands to and from SMEs”. SMEs (short message entities) are elements that can send or receive short messages (e.g., software application in a mobile handset) [LB05].

• Email gateway (EGW): The email gateway makes it possible to send messages from an SME to an Internet host, and vice versa [LB05].

• Public switched telephone network (PSTN): The domestic public telephone network is traditionally a public utility providing a circuit-switched network optimized for voice communications [Hor08].
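The store-and-forward behaviour of text messaging, in which a message is held until the receiving phone is active and within range, can be sketched as a toy model of the role the SMSC plays. This is not an implementation of any GSM protocol: the class name, its methods, and the retention period of three days are assumptions made for illustration only.

```python
from collections import defaultdict, deque


class StoreAndForwardCenter:
    """Toy stand-in for the SMSC role; not a protocol implementation."""

    def __init__(self, retention_days=3):    # retention value is assumed
        self.retention_days = retention_days
        self.reachable = set()                # receivers currently in range
        self.stored = defaultdict(deque)      # held messages per receiver
        self.delivered = []

    def submit(self, sender, receiver, text, day):
        if receiver in self.reachable:
            self.delivered.append((sender, receiver, text))
        else:
            self.stored[receiver].append((sender, text, day))   # store

    def attach(self, receiver, day):
        """The receiver's phone becomes active and within range."""
        self.reachable.add(receiver)
        for sender, text, stored_day in self.stored.pop(receiver, deque()):
            if day - stored_day <= self.retention_days:         # forward
                self.delivered.append((sender, receiver, text))

    def detach(self, receiver):
        self.reachable.discard(receiver)


center = StoreAndForwardCenter(retention_days=3)
center.submit("alice", "bob", "first", day=0)
center.submit("alice", "bob", "second", day=2)
center.attach("bob", day=4)      # "first" has expired, "second" is forwarded
center.submit("alice", "bob", "third", day=4)
```

The sender never needs to know whether the receiver is reachable at the moment of sending, which is what makes the exchange asynchronous.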


3.3.10 Social network site

Human beings are (usually) a social species, and so in this computer era, social network sites (SNSs) “have become an increasingly popular way to initiate, develop, and maintain friendships online and to show one’s social network of friends” [WM11]. The three basic components of SNSs are profiles, relations, and interactions [Die12]. SNSs are web-based services that often contain the functions of instant messaging, online chat, blog, or forum. Examples of SNSs include Facebook (https://www.facebook.com/), Google+ (https://plus.google.com/), LinkedIn (http://www.linkedin.com/), and Myspace (https://myspace.com/).

3.3.11 Other CMC systems

The previously mentioned CMC systems are mainly text-based and are the main focus of this work. Many other systems exist, for example, audio- and video-based forms, which can be either asynchronous (e.g., streaming media such as video clips) or synchronous (e.g., video conferencing). The instant messaging client Skype (http://www.skype.com/) offers, among other things, voice calls (e.g., using VoIP, voice over Internet protocol, technology), video calls, file sharing, and screen sharing. Media and content sharing websites allow one to post and share media such as music, photos, videos, and PDF (portable document format) presentations. Popular sites are SoundCloud (https://soundcloud.com/), Flickr (http://www.flickr.com/), YouTube (http://www.youtube.com/), and SlideShare (http://www.slideshare.net/).

3.4 Advantages and disadvantages of CMC

Like any form of revolutionary technology, CMC has significant advantages and disadvantages. There is a sizable literature on this subject with different focuses on CMC, both in general and more specifically. For example, studies focus on computer-mediated communication compared with other forms such as face-to-face (F2F) communication (e.g., [Bor97]), CMC used in the workplace (e.g., [Lee11]), CMC in second-language classrooms (e.g., [Hat03]), or virtual teams (e.g., [AE+09]). In general, CMC is a powerful medium, but it cannot (i.e., should not) replace F2F communication. A brief summary of the advantages and disadvantages of CMC follows.

3.4.1 Technology

Hardware, a network, and software are needed to interact, but this combination can create countless problems which are often time-consuming or nerve-racking. Hardware and software may be incompatible, there may be no access because of a server crashing, the communicator may have poor writing skills, or even inhibitions related to using computers. Additionally, low bandwidth may restrict the communication; here, text-based CMC has a big advantage because it does not require high bandwidth. Because communicators of CMC technologies leave traces of their use (e.g., presence, actions), messages can be observed invisibly or traced back, thereby raising a variety of social and ethical issues [Her02; Kal07]. Lane [Lan94] notes that “[m]any of the disadvantages of CMC appear to relate to the participants rather than to the medium itself”, especially any inhibitions about using CMC technology and disinhibition on the Internet [Bub01; Lan94; Ste12]. Disadvantages of CMC include Internet addiction disorder [WS05], cyberstalking (systematic harassing or threatening) [WS05], email bombing (used to overload email systems by sending mass mailings) [Yar06], flaming (hostile and aggressive interaction, e.g., derisive or angry comments) [Thu+04], spamming (sending unsolicited and usually unwanted emails) [Cry04; Hor08; WS05], trolling (“posting of incendiary comments with the intent of provoking others into conflict”) [Har10], cyber-trespass (“crossing boundaries into other people’s property and/or causing damage”) [Yar06], and cyber-deception [Yar06].

3.4.2 Costs

CMC technology and its use (e.g., equipment for communicating, administration) is rarely free, but costs little [Kie+84]. Tennison [Ten99] notes that “[o]nce time and money has been invested ..., the results of that investment should be available” to others. CMC can also reduce costs. For example, groupware (collaborative software) cuts down on travel or transportation costs [Bub01].

3.4.3 Availability

CMC is available (nearly) everywhere.

3.4.4 Place

CMC is place-independent because there is no need “to be in close physical proximity” for communication [Alt97]. On the one hand, physically handicapped people can benefit from CMC [Alt97; HT93]. On the other hand, “[v]irtual team members may feel isolated and missing a team identity” [Ben09]. Furthermore, there is a lack of physical human contact such as touching or hugging [Lan94; Lei+08].

3.4.5 Mobility

CMC devices, especially mobile devices (e.g., smartphones, laptops, tablet computers), can be carried around anywhere by users. This mobility allows users to be connected with others at all times. This can influence work-life balance negatively [McL08; Ric+06].

3.4.6 Time

Asynchronous CMC is time-independent because communicators do not need to meet at a designated time. They can also take extra time to reflect on and formulate thoughtful messages [Alt97], which is useful for speakers of other languages, for example. However, an overly long lapse can be a disadvantage, because users may have to wait for responses and delayed feedback [Riv02b]. That might result in frustration, demotivation, or loss in textual coherence [LZ05]. Synchronous communication can be difficult with others “due to time-zone barriers and different work schedules” [BSJ11]. CMC can be more time-consuming than F2F [Neu05]. For example, writing a message on the tiny keypad of a mobile phone takes physical effort and more time [Bub01; Riv02b; Seg02].

3.4.7 Access

CMC allows 24/7/365 access to communication, information, and services. Applications vary in accessibility because some have restricted access, whereas others are open to all [LO04].

3.4.8 Data exchange

Numerous media (text, graphics, sound, and video) can be easily used and quickly exchanged via computer [Kie+84].

3.4.9 Interactivity

CMC allows one to send data to individuals or groups of any size.

3.4.10 Shared knowledge

Many people with a lot of different (and also similar) experiences and knowledge are available online [Lei+08; Lei+12; WM11]. The sharing that ensues can help to solve problems, but sometimes people make comments which are misleading, factually wrong, or have nothing to do with the subject [Lei+12]. Due to cultural and language differences among virtual team members, effective knowledge sharing between members can be difficult [GW08; GW13; Pow+04].

3.4.11 Amount of data

Access to tremendous amounts of information is possible. This large amount of information can be overwhelming, leading to information overload [Che03; HT85].

3.4.12 Richness of communication

Bennett [Ben09] notes that “[c]ommunication media vary in its ability to convey rich information”. Theories such as the social presence theory and the media richness theory (also known as the information richness theory) “ascribe an important role to cues in making the interaction more ‘social’” [Tan03]. F2F is the richest form of communication because it includes facial expressions, gestures, vocal tone, and proximity [Tan03; TP03]. Compared to F2F communication, CMC is limited and has lower richness, which can lead to misunderstandings [Alt97; Boo04; Bub01; DL83; Her02; Kal07]. Video conferencing is very close to F2F communication, whereas text-based communication is less rich [BLO07].

3.4.13 Anonymity

CMC is more anonymous (and therefore more egalitarian) than F2F communication [Her02]. Johnson [Joh97] mentions that “[a]nonymity poses several benefits and dangers”. These are, for example, freedom from prejudice or stigma based on age, disabilities, gender, language, ideology, nationality, race, or sexual orientation [Ben09; Lei+12; WM11]; building trust and goodwill through anonymity, which is not easy [Cro+16; LO09; Nis99; Nis07; Put00]; feeling safe enough to disclose private or intimate information [Pfe10]; discussions about topics which would be difficult in an F2F environment [WM11]; misuse of the technology (e.g., deception, flaming) [Her02]; or losing sight of the fact that people are really addressing other people, not the computer [Kie+84].

3.4.14 Storage of data

CMC provides the capability to record, store, and archive data permanently online or offline [Boy10]. As a result of the persistent nature of CMC [Riv02b], users can reread, replay, reproduce without loss of value (copy), edit prior to sending, or search previous communications with less effort [Bay10; Joh97; Ten99]. Having the data available also makes post-processing analyses easy (e.g., using automated tools).

3.5 Ethical issues in CMC research

West and Turner [WT11] define ethics as “the perceived rightness or wrongness of an action or behavior”. Fagernes and Ribu [FR07] add that “[t]he difference between right and wrong may be hard to determine, and is not necessarily a question of what is legal”. Various ethical issues arise in CMC research, especially within “data collecting, in data handling, and in reporting of findings respectively” [Liu99]. Maner is quoted by Bynum [Byn11] as saying that “computers introduced wholly new ethics problems that would not have existed if computers had not been invented”. Other researchers see no sense in developing a “new ethics”, because the new computer-ethical issues are only old ones in a new guise [Eyn+08; Joh99]. Liu [Liu99] points out that each “individual researcher has the ultimate responsibility for keeping the best interests of the research participants in mind”. Several related topics of interest that have received attention from researchers include: (1) privacy, anonymity, and confidentiality, (2) copyright, (3) harm and risk, (4) informed consent, (5) public versus private spaces, (6) respect for persons, and (7) research with minors [BE08; Fli+04; HV08; HB05; Neu05; PW05; Seg02].

3.6 Chapter summary

This chapter mainly gave an overview of computer-mediated communication, its characteristics, and its different forms (i.e., CMC systems). It showed that hardware, software, and network play a central role in CMC. Based on the knowledge of Chapters 2 and 3, a communication model for CMC (Chapter 4) and a multiple-views analysis approach to computer-mediated discourses (Chapter 6) are developed.

CHAPTER 4 Communication model for CMC

CMC allows people across the world to communicate with each other. The new transactional model attempts to describe the entire process of computer-mediated communication. It shows how each of its basic elements relates to the others, and how the model functions. The author adopts the following statements or text fragments from the models described in Section 2.2; they influence the new communication model for CMC (see Table 4.1).

Table 4.1: Influencers for the new model

New model (basic element) | Model | Statement/text fragment
context | Johnson | communication takes place in a context which is external to both speaker and listener and to the communication process as well
context | Newcomb | social situation in which the communication takes place
CMC system | Shannon/Weaver | transmitter that encodes the message
CMC system | Shannon/Weaver | channel or medium which transmits the signal
CMC system | Shannon/Weaver | receiver which decodes the message
communicator | Barnlund | persons are simultaneously sending and receiving messages
communicator | Bühler | sender and receiver
communicator | Osgood/Schramm | each participant is encoder, decoder, and interpreter
communicator | Osgood/Schramm | they continually swap roles
communicator | Schramm | sender encodes the message, based on the sender’s field of experience
communicator | Schramm | signal is decoded, based on the user’s field of experience
communicator | Shannon/Weaver | information source which produces the message
communicator | Shannon/Weaver | destination for whom the message is intended
message | Maletzke | spontaneous feedback from the receiver
message | Shannon/Weaver | signal
communication barrier | DeFleur | noise interferes at any stage in the process
communication barrier | Shannon/Weaver | introduced additional element, called noise, as a factor that may distort the message

The main differences between this new communication model for CMC and the others (described in Section 2.2) include:

• Context: Time is part of the physical context. Both server time and client time play an important role in creating message timestamps in discourses.

• CMC system: The system is complex and consists of hardware, software, and network. It can be strongly influenced by communication barriers such as network lag or server downtime.

• Communicator: A communicator (sender, receiver) is not only a human being. Both a virtual and a real world exist in the communicator’s mind (at least if the communicator is a human). Communication takes place primarily via virtual and unique names (identifiers).

• Message: A CMC message can contain more than the sender’s message. For example, an email also contains information about the CMC system (see Table 6.10).
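
The last point can be illustrated with a small sketch using Python's standard `email` module. The sample email is invented for illustration (all addresses, host names, dates, and the `User-Agent` value are assumptions), and the choice of which header fields count as "system information" is a simplification made here, not a definition taken from this work. The `Date` field is typically set by the client and the `Received` field by a server, which also hints at the difference between client time and server time.

```python
from email import message_from_string

# Hypothetical raw email; all names, hosts, and dates are invented.
RAW_EMAIL = """\
Received: from client.example.org by mail.example.org; Mon, 07 Aug 2017 10:15:02 +0200
Date: Mon, 07 Aug 2017 10:15:00 +0200
From: alice@example.org
To: bob@example.org
Subject: Meeting
User-Agent: ExampleMailer/1.0

See you at noon.
"""

# Header fields treated here as information about the CMC system (an assumption).
SYSTEM_FIELDS = ("Received", "Date", "User-Agent")


def split_message(raw):
    """Separate the sender's text from the fields added by software and servers."""
    msg = message_from_string(raw)
    system_info = {name: value for name, value in msg.items()
                   if name in SYSTEM_FIELDS}
    return msg.get_payload(), system_info


body, system_info = split_message(RAW_EMAIL)
print(body.strip())          # the sender's own words
print(sorted(system_info))   # fields contributed by the CMC system
```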

4.1 Computer-mediated communication process

In general, the exchange of messages in CMC between communicators takes place as follows (see Figure 4.1).

Figure 4.1: Transactional communication model for CMC

Communication (especially the meaning) is embedded in the context. Communicators use CMC systems to communicate with others. They are usually identified by unique names (identifiers). Senders encode messages based on their real and virtual world. A message is transmitted over a communication medium (with one or more channels) by a CMC system. Receivers decode and interpret the meanings of the messages. Communicators simultaneously send and receive messages. Additionally, communication barriers can occur which influence the communication process.
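
The process described above can be sketched as a few Python data classes. This is only a rough illustration under simplifying assumptions: all class and attribute names are hypothetical, the context is left out, encoding and decoding are reduced to passing text along, and a communication barrier is modeled as a single flag that stops transmission.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: str      # identifier (nickname) of the sender
    receiver: str    # identifier of the intended receiver
    text: str


@dataclass
class CMCSystem:
    """Stands in for hardware, software, and network."""
    barrier: bool = False                         # e.g., server downtime
    delivered: list = field(default_factory=list)

    def transmit(self, message):
        if self.barrier:
            return False                          # the barrier interrupts the process
        self.delivered.append(message)
        return True


@dataclass
class Communicator:
    identifier: str

    def send(self, system, receiver, thought):
        # Encoding: the thought becomes a transmittable message.
        message = Message(self.identifier, receiver.identifier, thought)
        return system.transmit(message)

    def receive(self, system):
        # Decoding: collect the texts addressed to this identifier.
        return [m.text for m in system.delivered
                if m.receiver == self.identifier]


system = CMCSystem()
alice, bob = Communicator("alice"), Communicator("bob")
alice.send(system, bob, "hello bob")
bob.send(system, alice, "hi alice")
print(bob.receive(system), alice.receive(system))
```

Both communicators send and receive through the same system object, which mirrors the statement that communicators act as senders and receivers at once.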

4.2 Basic elements of CMC

After studying different communication models and CMC systems, the author concluded that the basic elements of computer-mediated communication are: context, CMC system, communicator, message, and communication barrier. These five basic elements determine the quality of the communication (see Table 4.2). Some differences exist between this and the other communication models presented in Section 2.2. These differences include: (1) communicators have virtual and real worlds in their minds, (2) the presence of identifiers (nicknames), and (3) the importance of a CMC system which is divided into hardware, software, and a network (including medium/channel).

Table 4.2: Basic elements visualized within the communication model for CMC

Basic elements (each shown with its graphical visualization): context, CMC system, communicator, message, communication barrier

4.2.1 Context

Communication always relates to a specific context. Messages can only be understood in the related context. The word “context” is often informally used to indicate the background, environment, conditions, setting, or situation [Fuj09; Van08; Ver+07]. Jandt [Jan10] defines context “as the environment in which the communication takes place and [which] helps define the communication”. Merriam-Webster Online [Mer14c] characterizes context as “the group of conditions that exist where and when something happens”. In general, context is multidimensional and can involve, for example:

• cultural, “the values, attitudes, beliefs, orientations and underlying assumptions prevalent among people in a society” [Ver+07];

• historical, “messages are understood in relationship to previously sent messages” [WT11];

• physical, the communicator’s location (e.g., conference room, restaurant, soccer stadium), the environmental conditions (e.g., room temperature, lighting, noise level), distance between communicators, and time [Fuj09; Ham11; SM07; Van08; Ver+07];

• psychological, “[t]he mood and feelings each person brings to a conversation” [Ver+07]; or

• social, “the nature of the relationship between the participants” [Ver+07], for example, communication among family members, friends, or strangers [CM11; Van08; Ver+07].

4.2.2 CMC system

For most communicators, the CMC systems they use are black boxes because they often do not know how their CMC systems really work (e.g., transmitting and receiving data). The infrastructure of a computer-mediated communication system can be divided into the following components: hardware, software, and network.

Hardware: Hardware is made up of multiple physical components, such as the CPU (central processing unit), memory modules, motherboard, graphics card, network card, HDD (hard disk drive), power supply, monitor, keyboard, mouse, and printer. A computer that is connected to a computer network is called a host. A host acts as a server (makes resources available for other computers), a client (accesses the resources of a server), or a peer (acts as both client and server) [Shi02]. A collection of networked servers is called a server farm. Farms are mounted in server rooms or data centers by large companies or Internet service providers (ISPs). ISPs offer different services, such as Internet access or web hosting.

Software: In addition to hardware, computers also require software. Software, which runs on hardware, “is a set of programs that instructs the computer about the tasks to be performed; hardware carries out these tasks” [Goe10]. It includes system software (e.g., operating system, server software, device driver, utility program) and application software (e.g., web browser, email client, newsreader) [MP13].

Network: A network is a combination of hardware (e.g., network card), software (e.g., network operating system), and cabling (e.g., twisted pair cables) [Odo04]. It allows communication between communicators over a communication medium. A medium may be divided into multiple channels [Lin+12]. Channels (and therefore also the medium itself) use analog or digital signals to transmit information between communicators. The terms “medium” and “channel” are often used interchangeably in the related literature [Kar13; Win+09]. The Internet is “the network that is most commonly associated with the term CMC” [Mar06]. Sometimes referred to as a “network of networks” [Sch+09], the Internet is the world’s biggest and best-known computer network. Different network services such as the World Wide Web (WWW) are provided by servers to clients.

4.2.3 Communicator

Communicators can both send and receive messages simultaneously. Senders initiate the communication process.

Synonyms: Other interchangeable names for a communicator who sends messages are, for example, author, emitter, encoder, orator, sender, source, speaker, talker, transmitter, or writer; names for a communicator who receives messages are, for example, audience, decoder, destination, hearer, listener, reader, receiver, recipient, spectator, or target [Che12; Eis04; Kuc+08; Nö95; Riv02a; Rod00; Ste06; WT11]. Depending on the CMC system and current actions, communicators have several specific names. For example, a blogger is someone who blogs, a chatter uses chats to talk online, an emailer writes emails, a poster is a bulletin board author, and a texter sends text messages.

Types: Communicators are primarily human beings. Non-human communicators, such as bots, also communicate on CMC systems. A bot (short for robot) appears as a human user but is actually a computer program. Bots provide special services that are automated or semi-automated. Examples of bots are Wikipedia’s ClueBot NG, which tries to detect vandalism and reverse changes automatically [Wik13b]; Eggdrop (http://www.eggheads.org/), which is used to manage IRC channels [Mut04a]; forum bots, which add posts including advertising; and chatbots, which communicate with other chat users.

Encoding and decoding: Jandt [Jan10] notes that “[u]nfortunately (or perhaps fortunately), humans are not able to share thoughts directly”. Senders translate (encode) their information (such as ideas, thoughts, feelings, and intentions) in their head (i.e., communicator’s mind) into codes that can be understood by the receivers [Fuj09; Pea+11; Rod00; Ste06; TM11]. A code (language) is “a systematic arrangement of symbols” [Pea+11]. Verbal codes (symbols and their grammatical arrangement) and nonverbal codes (symbols that are not words, e.g., bodily movements) are the two types of code [Pea+11]. If messages are not properly encoded, it is unlikely that they will be exactly or fully understood. Senders transmit (send) the messages and usually expect the receivers’ feedback [Rod00]. Receivers have to interpret and translate (decode) the messages in their brain into meaningful information [Rod00].

4.2.4 Message

Messages refer to what communicators (i.e., senders) have transmitted to receivers over communication channels (mediums).

Synonyms: In general, synonyms for messages include “data, information, transmission, text, topic, subject, and ideas” [Eis04]. More specific synonyms are, for example, a whole or part of a chat message (e.g., of IRC), a tweet (Twitter), an email, a forum post, or an SMS (text messaging).

Types: Messages include verbal (e.g., spoken or written information) and nonverbal content (e.g., body movement, gestures, or eye contact) [BT10; Mon07; WT11]. They may contain texts, graphics, voices, videos, “and any other medium that can be represented in digital form” [ST08]. Many CMC systems are primarily text-based, where “participants interact by means of the written word, e.g., by typing a message on the keyboard of one computer which is read by others on their computer screens” [Her96].

Protocols: Protocols are sets of rules and conventions for message exchanges between electronic devices, such as computers [Hor08; Mer13; Pop06]. They are implemented as hardware, software, or both. The formal specification is based on messages and consists of the message format specification (structure of the message), the message-processing procedures specification (i.e., messages), and the error processing specification (set of error reactions) [Pop06]. Typically, a message consists of a message header and a body (payload) [Pop06]. A wide range of protocols exists, such as DHCP (dynamic host configuration protocol), HTTP (hypertext transfer protocol), and FTP (file transfer protocol) [BL+96; Dro+03; PR85].
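
The three parts of a protocol specification can be sketched with a toy protocol in Python. The format used here (header lines of the form `name: value`, a blank line, then the body) is an assumption loosely modeled on text-based protocols; the function names and the `OK`/`ERROR` replies are hypothetical and do not belong to any real protocol.

```python
def parse_message(raw):
    """Message format: header lines, one blank line, then the body (payload)."""
    head, separator, body = raw.partition("\n\n")
    if not separator:
        raise ValueError("missing blank line between header and body")
    header = {}
    for line in head.splitlines():
        name, colon, value = line.partition(":")
        if not colon:
            raise ValueError(f"malformed header line: {line!r}")
        header[name.strip().lower()] = value.strip()
    return header, body


def process(raw):
    """Message processing, with an error reaction for malformed input."""
    try:
        header, body = parse_message(raw)
    except ValueError as error:
        return f"ERROR {error}"               # error processing
    return f"OK {header.get('type', 'unknown')} {len(body)}"


print(process("Type: greeting\nTo: bob\n\nhello"))
print(process("no header at all"))
```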

4.2.5 Communication barrier

Communication is successful when the ideas, thoughts, and emotions of a sender match with the interpretations of the receiver(s). Interactions between communicators can lead to miscommunication and conflicts due to one or more barriers. There are a large number of communication barriers that can affect, complicate, and even interrupt or stop the communication process. Many examples and types are described in the literature. Gill and Adams [GA02] identify four types of barriers (mechanical, semantic, psychological, and organizational). Another typology is given by Kushal [Kus10], who classifies barriers of communication into five types (semantic or linguistic, organizational, personal, emotional or perceptional, and physical). Kaul [Kau09] describes communication barriers as “impediments/blocks/obstacles in the process of communication that hinder smooth progression of ideas and concepts”. Communication barriers “can arise while the message is being developed, transmitted, received or interpreted” [Kus10]. Zastrow [Zas01] notes that “[a] breakdown in the communication process may occur if the intended message was not encoded or decoded properly”. This may be inevitable because no two minds are the same. Although completely barrier-free CMC communication is not possible, efforts should be made to establish communication that is as effective as possible.

4.3 Chapter summary

The author presented his communication model for CMC in detail. This basic model consists of five elements, which are context, CMC system, communicator, message, and communication barrier. The following chapter introduces the CMC system Internet Relay Chat (IRC) in more detail. IRC is later used for discourse analysis.

CHAPTER 5 Communication model for CMC applied to Internet Relay Chat (IRC)

IRC is a CMC system through which text and files can be transferred. It “was the Internet’s first widely popular quasi-synchronous” CMC system [Rin+01]. During the summer of 1988, Jarkko Oikarinen was working in the Department of Information Processing Science at the University of Oulu [Oik93; Ste00b]. Oikarinen administered the department’s Sun Unix server, which ran the university’s bulletin board system (BBS), known as the OuluBox. This work did not take up all his time, so he started to develop a communications program whose purpose was to make OuluBox a little more usable. Jyrki Kuoppala had already implemented a program called “rmsg” for sending “short (a few lines) messages to users on other machines” [Kuo89]. Oikarinen [Oik93] notes that “[i]t didn’t have the channel concept implemented (though it supported it), so it was mainly used for person-to-person communications”. Another program on OuluBox was MUT (MultiUser talk), written by Jukka Pihl. This software had a bad habit of malfunctioning. Oikarinen decided to improve MUT, which was based on the basic “talk” program in Unix computers. He called the result IRC “and first deployed it at the end of August, 1988” [Ste00b]. “BITNET Relay” (“The Interchat Relay Network” or known simply as “Relay”) “was a good inspiration for IRC” [Oik93]. It operated on the BITNET (“because it’s time network” or originally “because it’s there network”) [GC00], which had started in 1981 as a cooperative US university computer network [Kul08; WM10]. Developed by Jeff Kell in 1985 [Hol+12], BITNET Relay can be considered the precursor to Internet Relay Chat. Markku Järvinen improved the only IRC client “by including support for Emacs editor commands” [Ste00b]. Oikarinen [Oik93] adds that “the idea of BBS extensions was given up and just IRC stayed”. He continued the development of IRC over the following years, which led to a clear definition of the IRC protocol in RFC 1459.

IRC became well-known to the general public through the 1991 Soviet coup d’état attempt, as well as during the previous invasion of Kuwait, and the 1994 Northridge earthquake [Gru04; Ste00b]. It was used both to get up-to-date information and to disseminate information around the world [Gru04; Ste00b]. Nowadays, IRC is still used by thousands of users worldwide. Even WikiLeaks and the hacktivist (blend of hack and activist [Oxf13]) group “Anonymous” use IRC, although compared to previous years, “IRC has seen a dramatic downturn in usage” [Pin13b]. Other, perhaps easier, ways to communicate or share files were developed, “resulting in less of a need for IRC” [Pin13b]. Nevertheless, IRC is not dead.

In general, communicators use client programs to connect to IRC servers. Each communicator is identified by a unique nickname. After joining channels, communicators can take part in conversations by writing and sending text-based messages to other communicators. Three of the basic elements of CMC (see Section 4.2), namely CMC system, communicator, and message, are adapted to IRC. These apply to all IRC discourses in general. The basic elements “context” and “communication barrier” are mostly not IRC-specific.
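
The sequence of connecting, choosing a nickname, joining a channel, and sending a message corresponds to a handful of protocol commands. The sketch below only builds the text lines a client would send, without opening a network connection. The command names `NICK`, `USER`, `JOIN`, and `PRIVMSG` and the `USER` parameter layout follow RFC 2812; the function names, nickname, channel, and message text are invented for illustration.

```python
def irc_session_lines(nickname, realname, channel, text):
    """Return the raw protocol lines of a minimal session, in sending order."""
    return [
        f"NICK {nickname}",                   # choose the unique nickname
        f"USER {nickname} 0 * :{realname}",   # register the user
        f"JOIN {channel}",                    # join (or create) a channel
        f"PRIVMSG {channel} :{text}",         # send a message to the channel
    ]


def to_wire(lines):
    """IRC messages are terminated by a carriage return and line feed."""
    return "".join(line + "\r\n" for line in lines).encode("utf-8")


lines = irc_session_lines("alice", "Alice Example", "#help", "hello everyone")
print("\n".join(lines))
```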

5.1 CMC system: IRC

IRC was first formally documented in May 1993 by RFC 1459 [OR93]. An RFC, which stands for “request for comments”, is a formal document from the Internet Engineering Task Force (IETF). In 2000, four new RFCs “were created to address many of the changes that took place since the original was written” [IRC12]: RFC 2810 (architecture), RFC 2811 (channel management), RFC 2812 (client protocol), and RFC 2813 (server protocol) [Kal00a; Kal00b; Kal00c; Kal00d]. A later addition, RFC 7194 (default port for IRC via TLS/SSL), followed in 2014 [Har14]. The two figures below (see Figure 5.1) give an overview of IRC.

(a) Focus on hardware and network (b) Focus on software

Figure 5.1: Scheme of IRC

5.1.1 IRC server and network

The IRC protocol is based on the client-server model [Cha00; OR93] in which users are connected via clients to servers. IRC servers handle the communication transfer as a communication relay for the logged-in users. An IRC server can expand the IRC network by connecting to other IRC servers to form the “backbone of IRC” [OR93]. There are two types of servers: leaves and hubs. A hub is connected to more than one server, while a leaf is only connected to one (see Figure 5.1).
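
The distinction between hubs and leaves depends only on the number of server-to-server links. A minimal Python sketch, assuming the network is given as a list of links between hypothetical server names:

```python
from collections import defaultdict


def classify_servers(links):
    """Label each server as 'hub' (more than one link) or 'leaf' (one link)."""
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {server: ("hub" if len(peers) > 1 else "leaf")
            for server, peers in neighbours.items()}


# Invented example network: two hubs, three leaves.
links = [("hub1", "leaf1"), ("hub1", "leaf2"), ("hub1", "hub2"), ("hub2", "leaf3")]
print(classify_servers(links))
```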

Important terms in this context are lag, netsplit, and netjoin. Witt [Wit04] defines network latency (lag) as “[a] delay in the transmission of messages between users or between a user and a server”. A netsplit (short for network split) divides the network into separate network fragments in case “two servers lose their link for any reason” [Cha00]. A netjoin (network join) takes place when servers relink (reconnect) to others on the network after a netsplit. This merging is a source of problems because the servers must keep all information about users and channels that exist on the entire network [Ant09]. For example, nickname collisions are possible if the same nickname is in use in separate fragments of the IRC network when they are merged [Ant09]. Additionally, updated information can lead to high information exchange between servers [Ant09].
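
A nickname collision after a netjoin can be sketched as follows. Each fragment is represented as a mapping from nickname to the server its user is connected to; the data and names are invented. Removing both colliding users is an assumed reaction chosen for this sketch, and nicknames are compared exactly, which ignores the case rules of real IRC servers.

```python
def netjoin(fragment_a, fragment_b):
    """Merge two network fragments and report nickname collisions."""
    collisions = sorted(nick for nick in fragment_a
                        if nick in fragment_b
                        and fragment_a[nick] != fragment_b[nick])
    merged = {**fragment_a, **fragment_b}
    for nick in collisions:
        del merged[nick]      # assumed reaction: drop both colliding users
    return merged, collisions


# During the netsplit, "bob" was registered independently in both fragments.
merged, collisions = netjoin({"alice": "server1", "bob": "server1"},
                             {"bob": "server2", "carol": "server2"})
print(merged, collisions)
```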

5.1.2 IRC server software (IRC daemon)

IRC server software, which is commonly known as IRC daemon (IRCd), implements the IRC protocol. Popular IRC daemons are, for example, InspIRCd (http://inspircd.github.com/) and UnrealIRCd (http://www.unrealircd.com/). Both are free software, licensed under the “GNU General Public License” (GNU GPL or simply GPL). Servers have a set of commands for communicating. An overview of the basic server commands is presented in Table 5.3.

5.1.3 Service: IRC bot

Depending on the network, different services can be offered. Services are bots which implement a set of features on IRC servers (e.g., logging channels). Table 5.1 describes the most common IRC services, whose names mainly end in “Serv”. The most popular services are ChanServ and NickServ, which are covered in popular service packages such as Anope (http://www.anope.org/) or Atheme (http://www.atheme.net/).

Table 5.1: IRC services

IRC service | Description
AuthServ | Similar to NickServ, but this authentication service only allows one to register an account (AuthName) that identifies users to ChanServ.
BotServ | Requests a bot for the channel that is predefined by the network administration.
ChanServ | Registers and controls various aspects of the channel.
HelpServ | Provides general help on a variety of IRC topics (e.g., simple overview of IRC services).
HostServ | Allows users to register virtual hosts (vhosts), which hide the real IP address behind a vanity mask and thus protect a user’s privacy.
Global | Notification service that sends server-wide messages to all users on the network.
MemoServ | Records and delivers short messages (memos) to offline users or to entire channels.
NickServ | Registers and protects user nicknames.
OperServ | Helps operators in managing the network in a very efficient way. Also known, for example, as AdminServ or RootServ.
StatServ | Performs various statistical analyses on the network.

Unlike on many other networks, IRC services on QuakeNet are named with one-character abbreviations. The best-known service is “Q” (channel management service, similar to ChanServ). Other services exist, such as “G” (automated facilitate and control service), “H” (automated help service), “L” (lightweight channel management service), “O” (operator service), “P” (proxy scanner), “R” (“Q” and “S” request service), “S” (spam scanner), and “T” (Trojan scanner).

5.1.4 IRC client software

Mutton [Mut04a] emphasizes that IRC is one of the most accessible chat environments. IRC clients communicate directly with IRC servers (port 6667 is the most common). They usually contain areas to view messages, logged-in users (sorted list of nicknames), and text fields to enter messages. Mutton [Mut04a] notes that IRC clients “are available for virtually all operating systems”. mIRC (http://www.mirc.com/) runs on Microsoft Windows. XChat (http://www.xchat.org/) and irssi (http://www.irssi.org/) support, e.g., Microsoft Windows, Unix-like operating systems such as FreeBSD (http://www.freebsd.org/), and Apple OS X. Also, qwebirc (http://qwebirc.org/) and KiwiIRC (https://kiwiirc.com/) are web clients for IRC. Khaled Mardam-Bey’s mIRC is the most popular IRC client for Windows. It has a graphical user interface (see screenshot in Figure 5.2). This client offers lots of tools, functions, and powerful features [Mut04a]. When mIRC is run for the first time, a dialog automatically pops up in which some details must be provided (full name, email address, nickname, and alternative nickname). Subsequently, it is possible to connect to servers, join specified channels, and communicate. mIRC presents both users (current logged-in nicks) and chat messages (which sometimes include written nicknames of the addressees) as lists of text. An overview of essential mIRC commands is given in Table D.3.
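
Since chat messages sometimes contain the written nicknames of their addressees, a naive way to guess receivers is to look for logged-in nicknames in the message text. The following Python sketch illustrates only this simple idea; it is an assumption-laden heuristic with invented names and data, not the detection method developed in this work.

```python
import re


def guess_addressees(text, nicknames):
    """Return the logged-in nicknames that are written in a chat message."""
    by_lowercase = {nick.lower(): nick for nick in nicknames}
    # Split on whitespace and common punctuation that follows a written nick.
    words = re.findall(r"[^\s,.:;!?]+", text.lower())
    found = []
    for word in words:
        nick = by_lowercase.get(word)
        if nick is not None and nick not in found:
            found.append(nick)
    return found


users = ["alice", "bob", "carol"]
print(guess_addressees("Bob: did you see that?", users))
print(guess_addressees("hello all", users))
```

A message without any written nickname yields an empty list, which is exactly the case where direct addressing does not help and other cues are needed.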

Legend
1 titlebar
2 menubar
3 toolbar
4 switchbar
5 treebar
6–7 server window
  6 status window
  7 input box/command line
8–10 channel window
  8 chat window
  9 user list
  10 input box/command line
11–12 query window
  11 chat window
  12 input box/command line

Figure 5.2: Screenshot of the mIRC client

In the narrow sense, both bots and bouncers (proxies) are special forms of client, because “[a]nything that connects to an IRC server that is not another server is called a client” [Mut04a]. Bouncers “are programs used to relay network connections, much like a proxy” [Ber+09]. They stay on the channel, permanently reserving the users’ nicknames, and logging the messages while users are disconnected from the Internet [Hä10]. A BNC (short for bouncer) is also able to hide the source Internet protocol (IP) address of a user [Ber+09; Hä10]. Some examples of BNC software are JBouncer (http://www.jibble.org/jbouncer/), psyBNC (http://www.psybnc.dk/), and ZNC (http://wiki.znc.in/ZNC).

5.1.5 IRC channel

IRC is organized into channels which “are often built around a particular topic” [Mut04a]. A channel is a “virtual meeting place” [Koz05] “where group conversations occur” [Von12]. It is sometimes called a “chat room”, “though IRC purists scoff at the use of that term” [Koz05]. Channel names are usually prefixed with a number sign “#”, as in #help. “#” indicates a normal channel that is available across a whole network. Other channel prefixes are “&” (local channel which is not distributed outside the IRC server), “+” (modeless channel with no operator and no topic), and “!” (safe channel that is not implicitly created) [Kal00b]. Some channels are password-protected and require invitations, and there are certain restrictions on channel names. They may not contain a space, a Control-G (Ctrl+G, ^G, or ASCII 7), or a comma [Cer69; Kal00c; OR93]. Channel names are case-insensitive [Kal00b; Kal00c]. Users can join multiple channels and communicate with others from all over the world. All clients in a channel receive messages addressed to that channel [OR93]. When a user attempts to join a non-existent channel, the channel is created by the server, and this first user is given operator status (ops) on that channel. If the last channel operator (commonly abbreviated as chanop or chop) leaves the channel, that user loses the chanop status and the channel becomes opless. This risk can be lessened by transferring the rights to other users or by using IRC services (bots). When the last user leaves the channel, the channel ceases to exist. Only channel operators can modify the various channel modes (single, case-sensitive letters), which “alter the characteristics of individual channels” [OR93]. Available standard modes for channels are “o” (channel operator privileges), “p” (private), “s” (secret), “i” (invite-only), “t” (topic changes only by channel operator), “n” (no external messages to channel from clients), “m” (moderated), and “l” (set user limit) [Kal00b; Kal00c; OR93].
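The naming rules above can be illustrated with a minimal sketch. The function names are hypothetical, and the case-insensitive comparison is simplified to plain lower-casing, which is an assumption and not the full rule set of the RFCs.

```python
# Channel prefixes and their meaning, as listed in the paragraph above.
CHANNEL_PREFIXES = {
    "#": "normal channel, available across the whole network",
    "&": "local channel, not distributed outside the server",
    "+": "modeless channel (no operator, no topic)",
    "!": "safe channel (not implicitly created)",
}

# Characters that may not appear in a channel name: space, Ctrl+G (ASCII 7), comma.
FORBIDDEN_CHARS = {" ", "\x07", ","}


def is_valid_channel_name(name: str) -> bool:
    """Check the prefix and the forbidden characters (hypothetical helper)."""
    if not name or name[0] not in CHANNEL_PREFIXES:
        return False
    return not any(ch in FORBIDDEN_CHARS for ch in name)


def same_channel(a: str, b: str) -> bool:
    """Case-insensitive comparison; simplified to str.lower() as an assumption."""
    return a.lower() == b.lower()
```

For example, `is_valid_channel_name("#help")` holds, while a name containing a comma or lacking a prefix is rejected.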

5.2 Communicator and nickname

Nicknames play a key role in text-based chat conversations. They are the communicator’s identity. A nickname is an interface between the real world and the virtual one. Communicators are human beings (circlets) and non-human beings (squares). They use hardware (big circles) and software (triangles) to communicate with others, including IRC clients, bots, and bouncers (see Figure 5.3).

Figure 5.3: Scheme of IRC with communicators

In IRC, communicators can send and receive messages at the same time. All users connected to IRC are identified by unique nicknames within the IRC network. They are able to perform all the basic functions of IRC. The NICK command “allows users to change their nicknames as often as they wish” [Rei91]. Nicknames create the “first impression” [Joh04], and “must be chosen with care” [BI95]. Johnová [Joh04] adds, “The more catching [sic] the nick is, the bigger are the chances of being addressed by other users”. A detailed analysis of the creation of IRC nicknames is presented in Chapters 9 and 10.

5.3 IRC message

This section shows how IRC messages (which may include commands) are delivered and what the IRC message format looks like.


5.3.1 IRC message format

The IRC protocol uses the TCP/IP (transmission control protocol/Internet protocol) network protocol and optionally TLS/SSL (transport layer security/secure sockets layer) [Bra89; DR08; Fre+11; Har14; OR93]. “Both the server and the client must comply with the protocol”, which is based on a client-server model, “in order to communicate effectively” [Cha00]. Servers and clients send each other messages over the IRC network. Nearly all IRC messages “sent to the server generate a reply of some sort” [OR93]. The most common reply is the numeric one [OR93]. This three-digit number (raw numeric) can be used to uniquely identify the message. The maximum character length of a single message sent from and to the server is 512, “including the trailing carriage return and linefeed pair (\r\n)” [Mut04a; OR93]. The IRC message format defined in RFC 1459 is represented as “pseudo” Backus-Naur form (BNF) in Table B.1 [OR93]. The augmented Backus-Naur form (ABNF) was later defined by RFC 5234 [CO08]. The ABNF representation for the IRC message format is defined in RFC 2812 in Table B.2 [Kal00c].
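As an illustration of the message format, the following sketch splits a raw line into an optional prefix, a command (or three-digit numeric), and parameters, with the last parameter optionally introduced by “ :”. It is a simplified reading of the grammar, not a complete RFC 1459/2812 parser; the names `IrcMessage` and `parse_irc_message` are hypothetical.

```python
from typing import List, NamedTuple, Optional

MAX_MESSAGE_LENGTH = 512  # includes the trailing "\r\n" pair


class IrcMessage(NamedTuple):
    prefix: Optional[str]  # origin of the message, if present
    command: str           # command name or three-digit numeric
    params: List[str]      # middle parameters, then the trailing one (if any)


def parse_irc_message(raw: str) -> IrcMessage:
    """Parse one raw IRC line (simplified sketch)."""
    if len(raw) > MAX_MESSAGE_LENGTH:
        raise ValueError("message exceeds 512 characters")
    line = raw.rstrip("\r\n")

    prefix = None
    if line.startswith(":"):
        # A leading colon marks the prefix, which ends at the first space.
        prefix, _, line = line[1:].partition(" ")

    # " :" introduces the trailing parameter, which may contain spaces.
    head, separator, trailing = line.partition(" :")
    parts = head.split()
    if not parts:
        raise ValueError("missing command")

    params = parts[1:]
    if separator:
        params.append(trailing)
    return IrcMessage(prefix, parts[0].upper(), params)
```

For instance, `parse_irc_message(":alice!a@host PRIVMSG #help :hello there\r\n")` yields the prefix `alice!a@host`, the command `PRIVMSG`, and the parameters `#help` and `hello there`; the nickname and host are made up for the example.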

5.3.2 IRC message delivery

Every message, whether typed in by hand or by using the GUI, is sent “quasi-synchronous[ly]” [GJ99] by the client across the Internet to IRC servers. These servers act as relays. They handle the communication transfer, making sure that each message is sent to the participants concerned that are connected to that channel (one-to-many to a group/channel). Private messages (PMs or queries) are seen only by two users on the network (one-to-one communication), and can be sent without being in the same channel. Figure 5.4 illustrates a small IRC network with five servers (A, B, C, D, and E) and four clients (1, 2, 3, and 4) [OR93].

Figure 5.4: IRC message delivery (adapted from [OR93])

In Table 5.2, one-to-one communication (Examples 1 to 3) and one-to-many to a group (channel, Examples 4 to 6) are presented. Additional information and a more detailed description are given in Oikarinen and Reed [OR93]. An interesting feature is DCC (direct client-to-client, direct client connection). DCC is used to transfer files and “allows clients to communicate directly” via one-on-one connection “with each other outside the IRC network” [Cha00].


Table 5.2: Examples of IRC message delivery (adapted from [OR93])

Nr. | Example | Description
1 | message between clients 1 and 2 | Only seen by server A, which sends it straight to client 2.
2 | message between clients 1 and 3 | Seen by servers A and B, and client 3. No other clients or servers are allowed to see the message.
3 | message between clients 2 and 4 | Seen by servers A, B, C, and D, and client 4.
4 | any channel with one client in it | Messages to the channel go only to the server.
5 | two clients in a channel | All messages traverse a path as if they were private messages between the two clients outside a channel.
6 | clients 1, 2, and 3 in a channel | All messages to the channel are sent to all clients and to only those servers which must be traversed by the message if it were a private message to a single client. If client 1 sends a message, it goes back to client 2 and then via server B to client 3.
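The one-to-one examples in Table 5.2 can be reproduced with a small sketch that computes which servers relay a private message. The server links below are an assumption: they are chosen to be consistent with Examples 1 to 3, and the attachment of server E is not determined by the table. All names are hypothetical.

```python
from collections import deque
from typing import Dict, List

# Assumed server links (a tree); consistent with Examples 1 to 3 of Table 5.2.
SERVER_LINKS: Dict[str, List[str]] = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D", "E"],
    "D": ["C"],
    "E": ["C"],
}

# Server to which each client is connected (also inferred from Table 5.2).
CLIENT_SERVER: Dict[int, str] = {1: "A", 2: "A", 3: "B", 4: "D"}


def servers_on_path(sender: int, receiver: int) -> List[str]:
    """Return the servers that see a private message, in relay order."""
    start, goal = CLIENT_SERVER[sender], CLIENT_SERVER[receiver]
    previous = {start: None}
    queue = deque([start])
    while queue:  # breadth-first search over the server links
        node = queue.popleft()
        if node == goal:
            break
        for neighbor in SERVER_LINKS[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)

    path = []
    node = goal
    while node is not None:  # walk back from the receiver's server
        path.append(node)
        node = previous[node]
    return path[::-1]
```

Under these assumptions, a message from client 2 to client 4 is relayed by servers A, B, C, and D, whereas server E never sees it.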

5.3.3 IRC commands

Commands are used by the client and server to “perform specific functions” [VG08]. Both server and client have a set of commands for communicating. In many cases, these two command sets do not exactly coincide [Cha00]. Table 5.3 presents an overview of groups with related IRC commands. These server commands are defined in RFC 1459, RFC 2812, and RFC 2813 [Kal00c; Kal00d; OR93].

Table 5.3: IRC command groups (adapted from [Kal00c; Kal00d; OR93])

Command group | IRC server commands
connection registration | MODE (user), NICK, OPER, PASS, QUIT, SERVER, SERVICE, SQUIT, USER
channel operations | INVITE, JOIN, KICK, LIST, MODE (channel), NAMES, NJOIN, PART, TOPIC
sending messages | NOTICE, PRIVMSG
server queries/commands | ADMIN, CONNECT, INFO, LINKS, LUSERS, MOTD, STATS, TIME, TRACE, VERSION
service query/commands | SERVLIST, SQUERY
user-based queries | WHO, WHOIS, WHOWAS
miscellaneous messages | ERROR, KILL, PING, PONG
optional features | AWAY, DIE, ISON, REHASH, RESTART, SUMMON, USERHOST, USERS, WALLOPS

Legend (color-coded in the original table): command is defined in RFC 1459; command is defined in RFC 2812; command is defined in RFC 2813


Table D.1 describes the above IRC server commands and their parameters in more detail. Angle brackets (< and >) signify the parameters of a command. Square brackets ([ and ]) indicate that the enclosed parameters are optional. In addition, curly brackets ({ and }) in BNF or an asterisk (*) in ABNF show that the parameters concerned are repeatable (0 or more). The vertical bar “|” (BNF) or the slash “/” (ABNF) indicates the logical OR operator (alternative). A detailed description can be found in RFC 5234.

Several extensions to the IRC protocols exist, such as IRCX, P10, and TS6 [Abr98; Har04; Und00]. Depending on which IRC daemon is in use, additional server commands are available. For example, the HELP command is not specified by any RFC, but is usually implemented by IRC daemons. Other examples of IRC server commands are shown in Table D.2.

mIRC commands are case-insensitive (user and channel modes are excluded). They are prefixed with a command character (usually a forward slash “/”). This client supports lots of commands. In particular, mIRC implements commands that are unique, or that are modifications or extensions of standard IRC commands [MB10]. Some of these client commands are described in Table D.3.

IRC clients such as mIRC read in and parse text messages. Depending on the command, clients filter commands, perform the appropriate actions, and if necessary, pass them (with or without modifications) to the IRC server [Von12]. In Table D.4, the author mapped mIRC commands to the respective server commands. For example, the commands to ignore users (IGNORE) and to clear the buffer of the current window (CLEAR) are client-side features. Another interesting example can be shown with the RAW command. The HELP command (“/help [<keyword>]”) opens up the built-in “mIRC Help” dialog. No message is sent to the IRC server because the command is performed by the client, in contrast to HELP in combination with the RAW command. With “/raw HELP”, the message “<server> HELP” is sent directly to the server. The specified server returns (if implemented) a list of available IRC commands.
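The difference between client-side and forwarded commands can be sketched as a small dispatcher. This is an illustration only and not mIRC’s implementation: the function name, the return convention, and the handling of unknown commands are assumptions.

```python
from typing import Tuple

# Commands handled entirely by the client in this sketch (see the examples above).
CLIENT_ONLY = {"IGNORE", "CLEAR", "HELP"}


def handle_input(line: str, current_target: str) -> Tuple[str, ...]:
    """Decide whether typed input stays in the client or goes to the server.

    Returns ("local", command, arguments) or ("server", raw_message).
    """
    if not line.startswith("/"):
        # Plain text becomes a message to the current channel or query.
        return ("server", f"PRIVMSG {current_target} :{line}")

    name, _, rest = line[1:].partition(" ")
    name = name.upper()  # client commands are case-insensitive

    if name == "RAW":
        # /raw passes the remaining text to the server unchanged.
        return ("server", rest)
    if name in CLIENT_ONLY:
        return ("local", name, rest)
    # Assumption: every other command is forwarded under the same name.
    return ("server", f"{name} {rest}".strip())
```

With this sketch, `/help` is resolved locally, whereas `/raw HELP` results in `HELP` being sent to the server.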

5.4 Chapter summary

The well-known IRC was presented in detail. IRC is the preferred example of use for discourse analysis in this thesis. The following chapter introduces the author’s multiple-views analysis approach to computer-mediated discourses.


Part II

Computer-mediated discourse analysis

CHAPTER 6 Multiple-views analysis approach to computer-mediated discourses

Figure 6.1 represents a basic process for analyzing computer-mediated discourses. Multiple-views analysis is similar to looking through glasses with differently colored lenses. Discourses and the related messages are sliced into several views, and combinations of one or more views are possible. This approach helps in focusing, understanding, and extracting information more effectively.

Figure 6.1: Multiple-views analysis

The next sections will start with some definitions, then present an overview and steps of the multiple-views analysis approach, ethical considerations, useful key questions, twelve defined views with potentially extracted attributes, and various visualization possibilities.

6.1 Definitions

For the sake of clarity, some terms should be defined, explained, or mentioned.

6.1.1 Analysis

The word “analysis” comes from the Greek “analyein” (to break up, to loosen) [Mer17a; Onl17]. Different methods of analysis exist that are qualitative, quantitative, or mixed [Tru04]. Discourse analysis is largely qualitative, while content analysis is usually quantitative but can also be qualitative (e.g., [Ber01; JI+07; Neu02; Sch12; WK00]). Tešitelová [Teš92] explains that “[q]uality means a substantial determination of an object, quantity means such determination according to which a thing can be divided into homogeneous parts which can then be integrated into one whole”.


Discourse analysis is either done online (real-time, ante-mortem) or offline (post-mortem, non real-time) [PHL10]. Grishman and Kittredge [GK86] comment that successful discourse analysis “requires detailed knowledge of the language” on various levels, including morphology, syntax, semantics, and pragmatics [Bus06; Tit+00; Tra07]. It is necessary to capture the data (e.g., voice recordings, chat discourses) for later processing and analyzing. Messages can be saved to files or into databases either in their original form or in reduced form [BS08]. Logged data tend to become large. Therefore, depending on the research questions, messages should be captured with an adequate level of detail. However, offline analysis reduces runtime overhead on servers or clients.

6.1.2 Text and discourse analysis

A convenient way to divide CMC is to categorize into text-based and non-text-based communication, as Simpson [Sim03] does. CMC systems are primarily based on text. Bussmann [Bus06] defines text as “[t]heoretical term of formally limited, mainly written expressions that include more than one sentence”. Abbasi and Chen [AC08a] describe distinct properties that differentiate CMC text from non-CMC documents (e.g., essays, research papers). They note that CMC text has a communicative nature that makes it rich in interaction. Additionally, it differs from non-CMC linguistically and with respect to its informational composition.

The term “discourse” is a word with many different meanings. McCarthy [McC01] explains that the two terms “discourse” and “text” have been used interchangeably. Bussmann [Bus06] defines discourse as a “[g]eneric term for various types of text”. Langdridge and Hagger-Johnson [LHJ09] add that “discourse consists of spoken and written communication and all other forms of communication”. Brown and Yule [BY83] state that “[t]he analysis of discourse is, necessarily, the analysis of language in use”. Bussmann [Bus06] notes that discourse analysis (DA) is a “[c]over term for various analyses of discourse”. She adds that “discourse analysis” is used synonymously with “text analysis”. In contrast, other researchers associate DA with the study of both written and spoken words, and use text analysis to analyze words only [McC01].

6.1.3 Conversation analysis

Agne and Tracy [AT09] note that conversation is “ordinarily understood as informal, free-flowing talk” such as interviewing or giving a speech. Conversation analysis (CA) is a “method for analysing spoken conversation” [OM03]. Wooffitt [Woo05] states that CA “is focused on interaction”. He adds that “[i]t examines language as social action”.

6.1.4 Content analysis

The term “content” has several senses, e.g., “what a communication that is about something is about” [Pri14]. Synonyms of content are, among other things, meaning, significance, subject matter, substance, and theme [Col14c; Mer14b; Pri14]. Jackson II et al. [JI+07] define content analysis as a “generic name for a variety of ways for conducting systematic, objective, quantitative, and/or qualitative textual analysis”.


6.1.5 Computer-mediated discourse analysis

According to Herring [Her01], computer-mediated discourse (CMD) is “the communication produced when human beings interact with one another by transmitting messages via networked computers”. This definition is updated in the second edition with the addition of “or mobile computers” [HA15]. Computer-mediated discourse analysis (CMDA) is an approach to the analysis of CMC focused on “online language and language use” [Her07]. Herring [Her04] describes CMDA as follows:

CMDA applies methods adapted from language-focused disciplines such as linguistics, communication, and rhetoric to the analysis of computer-mediated communication (Herring, 2001). It may be supplemented by surveys, interviews, ethnographic observation, or other methods; it may involve qualitative or quantitative analysis; but what defines CMDA at its core is the analysis of logs of verbal interaction (characters, words, utterances, messages, exchanges, threads, archives, etc.). In the broadest sense, any analysis of online behavior that is grounded in empirical, textual observations is computer-mediated discourse analysis.

Also according to Herring [Her04], there are four different domains or levels of CMDA: structure, meaning, interaction, and social behavior. These domains are presented in Table 6.1.

Table 6.1: Four domains of language (adapted from [Her04])

Domain | Phenomena | Issues | Methods
structure | typography, orthography, morphology, syntax, discourse schemata | genre characteristics, orality, efficiency, expressivity, complexity | structural/descriptive linguistics, text analysis
meaning | meaning of words, utterances (speech acts), macrosegments | what the speaker intends, what is accomplished through language | semantics, pragmatics
interaction | turns, sequences, exchanges, threads | interactivity, timing, coherence, interaction as co-constructed, topic development | conversation analysis, ethnomethodology
social behavior | linguistic expressions of status, conflict, negotiation, face-management, play; discourse styles, etc. | social dynamics, power, influence, identity | interactional sociolinguistics, critical discourse analysis

Herring proposes a faceted classification scheme for computer-mediated discourse with ten medium (technological) and eight situation (social) factors. Both types of factor “influence discourse usage in CMC environments” [Her07]. This open-ended list of categories (facets) is shown in Table 6.2. Additional categories can be added if “... they affect online discourse” [Her07]. For example, L2 (second language) factors for a learning environment are additionally defined in [Tay09]. L2 factors are, for example, course management system, course type, and teaching approach.


Table 6.2: Herring’s medium and situation factors (adapted from [Her04])

medium factors | synchronicity, message transmission, persistence of transcript, size of message buffer, channels of communication, anonymous messaging, private messaging, filtering, quoting, message format
situation factors | participation structure, participant characteristics, purpose, topic or theme, tone, activity, norms, code

6.2 Overview: General steps and extensions

This section is an introduction to one approach to computer-mediated discourse analysis. Gee [Gee11] mentions that “[t]here are many different approaches to discourse analysis, none of them, including this one [in his book], [is] uniquely ‘right’”. For example, one specific approach to computer-mediated discourse analysis described in [Her04] is informed by a linguistic perspective.

An overview of a multiple-views analysis approach, which focuses on CMC discourses, is given in Figure 6.2. The common main steps (orange rectangles), which include preparation, data collection, data extraction, analysis, and result, are extended. These extensions (blue rectangles) are described in the next sections in more detail.

Figure 6.2: Multiple-views analysis approach: Overview

This short outline can be viewed as a template of how to approach multiple-views analysis. The underlying approach provides a systematic way to analyze computer-mediated discourses. The following step-by-step introduction describes the five common main steps and references to their extensions.

• Preparation

– Identify and understand the nature of the research problem and opportunity. If necessary, undertake background reading about discourse analysis and subject study.

– Determine and clearly define the research questions that should be answered by analyzing computer-mediated discourses.

– Focus on CMC systems such as electronic mail (email), forum, blog, wiki, online chat, instant messaging, or text messaging (see Section 3.3).


• Data collection

– Consider ethical issues in data collection (see Section 6.3).

– Think about the research questions to make clear which information “is needed to answer the questions of interest” [PD12].

– Choose adequate attributes that help to answer the research questions. Examples of attributes are shown in Section 6.6.

– Check which views are involved (see Section 6.5) and which other attributes of these views are also of interest.

– Set the context parameters and limitations in which the computer-mediated discourses take place. These include, for example, clear topics, clear time frames, potential sources, and languages used.

– Verify which CMC system provides the required information (i.e., attributes or views).

– Consider Herring’s medium and situation factors that influence discourse use in CMC environments (see Table 6.2).

– Select one or more appropriate CMC systems and tools (e.g., for logging).

– Collect new data from one or more selected CMC systems or choose existing data sources.

• Data extraction

– Transform unstructured or poorly structured data such as forum pages or blogs into structured ones [Lö+09].

– Validate that data are complete and accurate enough for analysis.

– Remove unnecessary data such as outliers and duplicates to avoid noise (data cleaning).

• Analysis

– Perform real-time and/or offline analysis.

– Use qualitative and/or quantitative research methods for data analysis.

– Focus on the different views and attributes explained in Sections 6.5 and 6.6. Asking key questions is useful as a guide (see Section 6.4).

• Result

– Present data as effectively as possible (see Section 6.7).

– Interpret results and draw meaningful and logical conclusions.
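The data extraction step listed above can be sketched for a chat log as follows. The log line format `[HH:MM:SS] <nick> text` is an assumption made for illustration (real clients differ), and the names `Record` and `extract_records` are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple

# Assumed log line format: "[HH:MM:SS] <nick> text"
LOG_LINE = re.compile(
    r"^\[(?P<time>\d{2}:\d{2}:\d{2})\] <(?P<nick>[^>]+)> (?P<text>.*)$"
)


class Record(NamedTuple):
    time: str
    nick: str
    text: str


def extract_records(lines: Iterable[str]) -> List[Record]:
    """Turn raw log lines into structured records, dropping noise."""
    records: List[Record] = []
    seen = set()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # validation: skip lines that do not fit the format
        record = Record(**match.groupdict())
        if record in seen:
            continue  # data cleaning: skip exact duplicates
        seen.add(record)
        records.append(record)
    return records
```

Whether exact duplicates are noise depends on the research question; repeated identical messages may themselves be of interest, in which case the duplicate check should be dropped.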


6.3 Ethical considerations

Before collecting data, ethical questions should be taken into consideration [BS08; Seg02]. Herring [Her02] notes that “[t]he challenge is to strike a balance between allowing researchers to carry out quality CMC research, and protecting users from potential harm”. Androutsopoulos [And14] stresses that “relevant ethics guidelines ... vary considerably by country and institution”. He remarks that “[a] complete anonymization of public CMC data may even be technically impossible”. However, not all online communicators may wish to stay anonymous (e.g., famous bloggers) [And14].

6.4 Key questions

Discourses can be analyzed with the help of simple open-ended questions. Worley et al. [Wor+07] use the following question to summarize the components of most human communication: “Who (sender/receiver) is talking, listening, or responding to whom (receiver/sender), about what (content), where (context), when (context), how (channel), and why (motivation)?” The author’s fundamental interrogative words, with examples of the information that possible answers provide, are shown in Table 6.3. These questions cluster discourses into several views. The taxonomy is similar to that of Robinson and Rackstraw [Pom05], which is as follows: who, which, what, when, where, why, and how. In Lasswell’s linear model, another similar formula is expressed: “Who says what in which channel to whom with what effect?” [AG01; Fle09]. For example, the question “About what are they communicating in this message?” yields information about the topic of the specific message.

Table 6.3: Key questions

Question | Example | Example(s) of possible answer(s)/information about | View
about what | are they communicating? | message topic (e.g., computer) | TOP
how | are they communicating? | emotions (e.g., intensity), feelings | EMO
what | are they writing? | message, message type (e.g., question, answer) | MES
when | are they communicating? | date or time of communication (e.g., 23:59:58) | DAT
where | are they communicating? | virtual location (e.g., web page, site, channel) | SOW
where | are they communicating? | physical location (e.g., room at university) | CON
who | is communicating? | information about communicator (e.g., age, identifier) | COM
why | are they communicating? | cause (e.g., need of more information) | CAU
why | is there a lack of communication? | cause of communication barrier (e.g., network lag) | BAR
with what | effect are they communicating? | effect, effect type (e.g., intentional, long-term) | CAU
with what | effectiveness are they communicating? | effectiveness, effectiveness type (e.g., quantitative) | EFF
with which | software are they communicating? | software (e.g., client software) | SOW
with which | hardware are they communicating? | hardware (e.g., physical server, network) | HAW
with whom | are they communicating? | receiver (e.g., original logged-in nickname) | REL


For a detailed analysis, it is useful to find out answers to these key questions. The phrase of the key questions is visualized in Figure 6.3.

Figure 6.3: Phrase of key questions

The verbal version of the phrase is given below:

Who communicates
with which hardware, software, and network
where
when
(about) what
with whom
how
why
with what effect, and
with what effectiveness?

6.5 Views

Views “represent different angles to reveal the fundamental characteristics and properties” of a system [Che+12]. Each of the five basic elements of CMC can be analyzed with one or more views. Multiple views “provide a multi-angled understanding” [BLO00] of “different perspectives for different purposes” [Lic10]. In Table 6.4, twelve views are mentioned. Other views are possible, but the following are the most essential for analysis.

Table 6.4: Views

CON | context | BAR | barrier
DAT | date/time | HAW | hardware and network
SOW | software | COM | communicator
MES | message | REL | relation
TOP | topic | EMO | emotion
CAU | causality | EFF | effectiveness

In Figure 6.4, discourses (big yellow-filled rectangles), messages (black lines), views (small squares), and found information (attributes, values) related to the views (different colors of the squares) are represented. A view is either general (e.g., all communicators in the discourse) or specific (e.g., only communicator with the nickname <Admin>) (a). Adding one or more views usually expands the understanding of the whole discourse. In general, adding views has the following effects: on the one hand, analysis with the help of a new view can extract additional essential information (b), and on the other hand, a new view can restrict other views. In (c), the added “View DAT” restricts the “View SRC”, for example, because of the focus on a specific time interval.


(a) (b) (c)

Figure 6.4: Examples of views
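The distinction between general and specific views, and the restricting effect of an added view, can be sketched by treating each view as a filter over message records. This is only one possible reading of the figure; the record fields and function names are hypothetical.

```python
from datetime import time
from typing import Callable, Iterable, List, NamedTuple, Optional


class Message(NamedTuple):
    sent: time
    nick: str
    text: str


View = Callable[[Message], bool]


def com_view(nick: Optional[str] = None) -> View:
    """Communicator view: general if nick is None, otherwise specific."""
    return lambda m: nick is None or m.nick == nick


def dat_view(start: time, end: time) -> View:
    """Date/time view restricted to a time interval (inclusive)."""
    return lambda m: start <= m.sent <= end


def apply_views(messages: Iterable[Message], *views: View) -> List[Message]:
    """Keep only the messages that every given view accepts."""
    return [m for m in messages if all(view(m) for view in views)]
```

Combining `com_view("Admin")` with `dat_view(...)` narrows the messages of one communicator to a time interval, which mirrors how an added view restricts another one.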

6.5.1 Message

The term “message” can be interpreted in various ways. Additionally, a message may look different at the various stages on its path between communicators (see Table 6.5). CMC systems process input messages (e.g., created by audio input devices such as microphones), produce output messages (e.g., displayed by monitors), and transport both. Messages are sent from senders to intended and non-intended receivers. Non-intended receivers are called overhearers or lurkers, for example. The receiver’s response to a sender’s message is called feedback [Nai11; Win+09], which “can be verbal and nonverbal, intentional and unintentional” [AT09]. Rodriques [Rod00] mentions that “feedback has an important role of leading the communication cycle to completion”. Feedback indicates to senders whether each message was received and understood. In the author’s model, feedback is not considered to be a separate basic element; it is a special type of message. Pearson et al. [Pea+11] note that “[e]ven no response, or silence, is feedback”.

Table 6.5: Typical path of a message in CMC

Column groups: Communicator (Sender; Receiver: Intended, Non-intended) and CMC system (Input, Output, Medium)

Description | Sender | Intended | Non-intended | Input | Output | Medium
message in sender’s mind | ✓ | ✗ | ✗ | ✗ | ✗ | ✗
input message into sender’s client | ✓ | ✗ | ✗ | ✓ | ✗ | ✗
output message from the sender’s perspective | ✓ | ✗ | ✗ | ✗ | ✓ | ✗
raw message transmitted from sender’s client | ✓ | ✗ | ✗ | ✗ | ✗ | ✓
raw message between servers | ✗ | ✗ | ✗ | ✗ | ✗ | ✓
raw message transmitted to intended receiver’s client | ✗ | ✓ | ✗ | ✗ | ✗ | ✓
raw message transmitted to non-intended receiver’s client | ✗ | ✗ | ✓ | ✗ | ✗ | ✓
output message from intended receiver’s perspective | ✗ | ✓ | ✗ | ✗ | ✓ | ✗
output message from non-intended receiver’s perspective | ✗ | ✗ | ✓ | ✗ | ✓ | ✗
message in intended receiver’s mind | ✗ | ✓ | ✗ | ✗ | ✗ | ✗
message in non-intended receiver’s mind | ✗ | ✗ | ✓ | ✗ | ✗ | ✗


Most CMC discourses under investigation are still mainly text-based [Gre10; Her07]. However, due to increased bandwidth, there is a shift toward the “three-dimensional, multimedia based world wide web” [Sou00]. Messages may or may not be readable by humans, but are essential for both quantitative and qualitative analysis. Table 6.6 shows source code extracts of several CMC messages which may or may not be human-readable.

Table 6.6: Extracts of messages

Blog (BBC blog, http://www.bbc.co.uk/blogs/tv/posts/The-Bridge-Live-web-chat)
3. blooge
February 1, 2014, 22:34

I really like the characters in the series, particularly Saga. Who/What was the inspiration for her charact

Email including HTML (private email)
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: 7bit

<p>Sure I idle about in the quake net and Geeks irc whenever I have time - give me ideas please!</p>

Email including image (private email)
Content-Type: image/jpeg; name="image001.jpg"
Content-Description: image001.jpg
Content-Disposition: inline; filename="image001.jpg"; size=2266;
    creation-date="Wed, 26 Feb 2014 08:53:08 GMT";
    modification-date="Wed, 26 Feb 2014 08:53:08 GMT"
Content-ID: <[email protected]>
Content-Transfer-Encoding: base64

/9j/4AAQSkZJRgABAQEAeAB4AAD/2wBDAAoHBwkHBgoJCAkLCwoMDxkQDw4ODx4WFxIZJCIyIoLTkwKCo2KyIjMkQyNjs9QEBAJjBGS0U+Sjk/QD3/2wBDAQsLCw8NDx0QEB09KSMpPT09PT0

Facebook (CNN, https://www.facebook.com/cnn/posts/10152210252401211)
<div class="mbs _5pbx userContent" data-ft="&quot;tn&quot;:&quot;K&quot;">It was supposed to be a fun family outing to the movies, but an argument led to a fatal encounter. What happened next is disputed. See the details of the confrontation: <a href="http://cnn.it/1cPC35N" target="_blank" rel="nofollow" on mouse over="LinkshimAsyncLink.swap(this, &quot;http:cnn.it1cPC35N&quot;);" onclick="Linkshim

Skype (private conversation)
[08.02.2014 13:23:24] *** Call from Claudia ***
[08.02.2014 13:26:46] *** Call ended, duration 03:22 ***

Wikipedia (article “Conversation”, http://en.wikipedia.org/wiki/Conversation)


Wikipedia (editing article “Conversation”)
===One’s self===
Also called [[intrapersonal communication]], the act of conversing with oneself can help solve problems

6.5.2 Context

Communication does not occur in a vacuum [Fuj09; Nai11; Nar06c]. Narula [Nar06c] states that communicators send and receive messages in their own context. Depending on factors such as context, individual personality, and mood, Naidoo [Nai11] adds, “[c]ommunicators do not always communicate the same way from day to day”. They vary their communicative style [SM07; Seg02]. To understand a communication (i.e., discourse), it is necessary “to know where and under what circumstances people are communicating, because these have a major influence on the individuals involved” [Fuj09]. Different contexts are, for example, a distance learning video conference at home, reading blog entries on a mobile phone on a train, an early business meeting at 6:00 a.m., writing informal and private mails to a girlfriend, or online chats with Chinese activists about the fight for freedom of expression.

6.5.3 Communication barrier

Barriers can occur within each basic communication element of the presented communication model and their related views. It is important to identify the barriers because they can reduce communication effectiveness. An overview of causes of communication barriers is given in Table 6.7.

Table 6.7: Causes of communication barriers mapped to each view

View | Causes of communication barriers
CON | poor environment (e.g., poor room lighting, uncomfortable seating)
BAR | (see the other views)
DAT | little or no time to write, chatting through the night, different time zones
HAW | outdated or faulty equipment (e.g., keyboard, monitor, slow Internet connection)
SOW | software crash, service not available
COM | differences in culture (e.g., attitude, behavior), religion, and language; fatigue, illness, hunger, lack of attention, prejudice
MES | message overload, incomplete sentences, orthography errors
REL | wrongly addressed or written receiver, ignoring of a communicator
TOP | message complexity (e.g., lots of jargon), not aware of different meanings of words, wrong topic
EMO | inappropriate emotions, not expressing emotions, being negative or too assertive, conflict
CAU | receiver misunderstands the message, lack of feedback, improper feedback
EFF | (see the other views)


Some examples of the causes of barriers are shown in Table 6.8: an email delivery failure because of a wrong receiver email address, lack of concentration on IRC, an over-capacity error on Twitter, and the HTTP status “404 Not Found” on a website.

Table 6.8: Extracts of barriers

Email
A message that you sent could not be delivered to one or more of its recipients. This is a permanent error. The following address failed:

Chat (IRC)
<hardwarecat> hello comrades!
<hardwarecat> erm, wrong channel again
<hardwarecat> what the heick
<hardwarecat> *heck
<hardwarecat> qwerty: I make so many errors that it’s not funny anymore

Skype

Twitter

Website


YouTube

6.5.4 Date/time

Neuage [Neu05] emphasizes that the “Internet never sleeps ... making it difficult to say that there is a beginning or an end to any online communication”. CMC messages usually contain date and time information (timestamps) which are, for example, the dates and times that messages were sent or received by the communicators’ clients and servers. Without timestamps, it is hard (or even impossible) to know when messages were written, sent, or received, or if they are current or not. Several protocols exist that standardize the representation of dates and times, such as ISO 8601, RFC 822, RFC 2445, and RFC 3339 [Cro82; DS98; ISO04; KN02]. Examples of timestamps are shown in Table 6.9. This view also focuses on time periods (e.g., “two days”, “3500 BC–559 BC”) and parts of the day (“noon”, “morning”).

Table 6.9: Extracts of timestamps

Blog (BBC blog)
February 1, 2014, 22:34

Email (client “The Bat!”)
3 Apr 2012, 16:16; Sunday, Jun 4, 2017, 11:05; 3/29/2014, 10:35

Facebook
October 31, 2013; June 15 at 8:27am; April 30; 16 hrs

Skype
Yesterday; Thursday, November 29, 2012; Sunday, July 31, 2016 9:50 AM

Twitter
15h; Apr 3; 7 Jul 2010

YouTube
Published on Jul 11, 2017; 6 hours ago
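
The variety of representations in Table 6.9 is what makes the date/time view hard to automate. The following is a minimal sketch, not the tool described in this thesis, of how two of the standardized representations could be normalized with the Python standard library. The function names `parse_timestamp` and `to_utc` are illustrative assumptions. Relative forms such as “Yesterday” or “15h” need a reference time and are deliberately not handled here.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_timestamp(text: str) -> datetime:
    """Parse an ISO 8601 / RFC 3339 string, falling back to RFC 822 style."""
    try:
        # Handles e.g. "2010-05-14T02:41:54+02:00"
        return datetime.fromisoformat(text)
    except ValueError:
        pass
    # Handles e.g. "Fri, 14 May 2010 02:41:54 +0200" (email "Date:" style)
    return parsedate_to_datetime(text)


def to_utc(moment: datetime) -> datetime:
    """Convert an offset-aware timestamp to UTC; leave naive ones untouched."""
    if moment.tzinfo is None:
        return moment
    return moment.astimezone(timezone.utc)


email_style = parse_timestamp("Fri, 14 May 2010 02:41:54 +0200")
iso_style = parse_timestamp("2010-05-14T02:41:54+02:00")
print(email_style == iso_style)
print(to_utc(email_style).isoformat())
```

Converting to a single time zone before comparison matters whenever client and server timestamps of the same message are recorded in different zones.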


6.5.5 Hardware and network

CMC takes place via computer equipment such as personal computers and mobile devices. Input/output devices are used to interact with the computer equipment. For example, a finger, keyboard, and mouse are input devices, while a monitor, multimedia projector, and printer are output devices. Networks enable users to exchange messages, share resources, or transfer files within the same network. Information about the hardware and network of the CMC system used can be extracted (see Table 6.10). This includes, among other things, the Internet Protocol (IP) address [Pos81], uniform resource locator (URL) [BL+94], second-level domain name, host name, and website name.

Table 6.10: Extracts of hardware and network

Email (private email)
Received: from mx14lb.world4you.com [81.19.149.124] by mail10.world4you.com with ESMTP
    (SMTPD-11.0) id fe8e0005712dfa8b; Fri, 14 May 2010 02:41:57 0200
Received: from [209.85.212.54] (helo=mail-vw0-f54.google.com)
    by mx14lb.world4you.com with esmtp (Exim 4.69)
    (envelope-from <[email protected]>)
    id 1OCiyL-0006gC-98
    for [email protected]; Fri, 14 May 2010 02:41:54 +0200

Facebook (login page)
<link rel="alternate" media="handheld" href="https://www.facebook.com/" />
<title id="pageTitle">Welcome to Facebook - Log In, Sign Up or Learn More</title>

Forum (a phpBB support forum, https://www.phpbb.com/community/viewforum.php?f=46)

Twitter
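
As an illustration of how such hardware and network attributes might be extracted, the following sketch pulls IP addresses and host names out of “Received:” header values like those in Table 6.10. It is a simplified assumption, not the extraction method of this thesis: the regular expressions and the function name `extract_hops` are hypothetical and cover only the header shapes shown above.

```python
import re

# IPv4 address in square brackets, as written by many mail servers
IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")
# Host name following "from ", "by ", or "helo="
HOST = re.compile(r"\b(?:from\s+|by\s+|helo=)([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)")


def extract_hops(received_values):
    """Return one dict per header value with the IPs and hosts found in it."""
    hops = []
    for value in received_values:
        flat = " ".join(value.split())  # undo header line folding
        hops.append({"ips": IPV4.findall(flat), "hosts": HOST.findall(flat)})
    return hops


headers = [
    "from mx14lb.world4you.com [81.19.149.124] by mail10.world4you.com with ESMTP",
    "from [209.85.212.54] (helo=mail-vw0-f54.google.com)\n"
    "    by mx14lb.world4you.com with esmtp (Exim 4.69)",
]
for hop in extract_hops(headers):
    print(hop)
```

Each dictionary corresponds to one relay step, so the list as a whole approximates the route a message took between mail servers.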


6.5.6 Software

CMC systems consist not only of hardware and network, but also of software for clients and servers. Different attributes of the software used can be traced and extracted from discourses. The messages in Table 6.11 include information about operating software, current browsers, or installed plugins, programming languages (e.g., PHP), markup languages (e.g., HTML), and email spam filtering software (e.g., SpamAssassin, https://spamassassin.apache.org/).

Table 6.11: Extracts of software

Email (private email)
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on mx07lb.world4you.com
X-Spam-Level:
X-Spam-Status: No, score=-1.6 required=5.0 tests=DKIM_SIGNED,DKIM_VALID,
    DKIM_VALID_AU,FREEMAIL_FROM,GREYLIST_ISWHITE,SPF_PASS autolearn=disabled
    version=3.3.2

Forum (a phpBB support forum, https://www.phpbb.com/community/viewforum.php?f=46)
<li>
    <h4><a href="/community/viewforum.php?from=submenu&amp;f=81">Modification Forums</a>
    <p><a href="/community/viewforum.php?from=submenu&amp;f=81">Discuss and view MODs that
</li>

Twitter
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" version="XHTML+RDFa 1.0" dir="ltr"
    xmlns:og="http://ogp.me/ns#">
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <meta property="twitter:creator" content="@twitter" />
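
To show how software attributes could be read from such an extract, the sketch below parses the value of the “X-Spam-Status” header from Table 6.11 into a small dictionary. It assumes the layout seen in this one example (a verdict followed by `key=value` fields). The function name `parse_spam_status` is hypothetical, and other SpamAssassin configurations may produce different fields.

```python
import re

# key=value fields; a value may be a comma-separated list that was folded
FIELD = re.compile(r"(\w+)=([^\s,]+(?:,\s*[^\s,]+)*)")


def parse_spam_status(value: str) -> dict:
    """Split an X-Spam-Status header value into verdict, scores, and tests."""
    flat = " ".join(value.split())  # undo header line folding
    verdict = flat.split(",", 1)[0].strip()
    fields = dict(FIELD.findall(flat))
    tests = [t.strip() for t in fields.get("tests", "").split(",") if t.strip()]
    return {
        "is_spam": verdict.lower() == "yes",
        "score": float(fields["score"]),
        "required": float(fields["required"]),
        "tests": tests,
        "version": fields.get("version"),
    }


status = parse_spam_status(
    "No, score=-1.6 required=5.0 tests=DKIM_SIGNED,DKIM_VALID,\n"
    "    DKIM_VALID_AU,FREEMAIL_FROM,GREYLIST_ISWHITE,SPF_PASS autolearn=disabled\n"
    "    version=3.3.2"
)
print(status["is_spam"], status["score"], status["version"])
```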

6.5.7 Communicator

The parts of this view are the communicators and their real and virtual worlds, as well as the identifiers and related profiles that link to specific CMC systems (see Figure 6.5).

(a) (b) (c) (d) (e)

Figure 6.5: Important parts of the “View COM”


Communicators are human and non-human beings (Subfigure 6.5(a)). They can be characterized at several levels, especially through their activities, experiences, or rights. For example, a communicator can be a named administrator (e.g., a forum admin has specific controls for a forum), a newbie or noob (new user), an oldbie (opposite of newbie), a cluebie (clueless newbie), a guru (a person with knowledge, an expert), a flamer (writer of an extremely hostile message), a lurker (a user who reads chat conversations without participating), or a poster (e.g., bulletin board author) [GD04; Urb14f; Urb14i; Urb14j; Urb14k; Urb14m].

The virtual world is a second home for many users, where they are whoever they pretend to be [Tur94]. Wang and Evans [WE08] suggest an “Overlapping World View”, which says that “virtual and real world identities are neither entirely separate nor identical”. If a user spends a huge amount of time online, the nickname can become a major part of the user’s identity, which extends beyond the virtual. It is the same with the author, who is sometimes called by his long-time chat nickname <RobiX> in real life. The author’s nickname is a compound of the diminutive form of his first name Robert and the variable x. This variable is written with a capital letter and stands in for his current age instead of a number. Communicators exchange messages that include statements about their real (R) and virtual (V) world (Subfigure 6.5(b)). These statements are true (T) or false (F). For example, a 45-year-old male newbie writes in IRC that he is male (RT), 20 years old (RF), and an IRC chatter (VT) with more than 100 chat hours (VF).

Communicators are usually identified by unique names (identifiers) within a CMC system (Subfigure 6.5(c)) [Anr+05; WS05]. The displayed names can differ from the technical ones (e.g., real name “Bill Gates” in discourse vs. username “BillGates”). Common synonyms for an identifier are the user name, screen name, nickname, pseudonym, login name, alias, or handle [BS08; Her02; Joh97; Kal07; Neu05; Wev+04; WS05]. Subject to restrictions such as maximum length and non-permitted characters, any identifier that is still available can be created. An identifier is, for example, an email address, a telephone number, or a full real name or nickname (see Table 6.12). The attractiveness of identifiers is important, especially in communication with unknown people (e.g., in chat rooms).

Table 6.12: Extracts of identifiers

Email
[email protected]

Facebook
BillGates (username/profile name), 216311481960 (numeric ID)

SMS
+431234567

Twitter
BarackObama

YouTube
Shakira


Identifiers such as nicknames “can provide users with anonymity and create a resultant liberating effect” [Gel98]. Lakaw [Lak06] adopted Bechar-Israeli’s typology [BI95] and related it to the degree of anonymity of nicknames, from a high down to a low level. For example, “[a] very high degree of anonymity means that the nickname ... does not reveal any information about the user’s on-line identity” [Lak06]. Nevertheless, complete anonymity on the Internet is never guaranteed. Beißwenger and Storrer [BS08] add that “the real person behind the CMC character could be identified through a targeted inspection of the contact information provided by the user” (e.g., email address, telephone number). The virtual and real identities can differ, because users play with their identities [BI95; Rei91]. Additionally, a user can create multiple online identities [Mar06].

Information about communicators can be extracted and collected from their identifiers and from the messages they send about the virtual and real world (Subfigure 6.5(d)). It is difficult to find out which messages contain facts about the user’s real life and which ones are simply inventions. For example, gender switching (or gender swapping) [Bru92; Bru96; Dan98; McR96] “makes authorship attribution and characterization quite difficult” [Kuc+08]. Gender switching “occurs when people present a gender that is different from their biological sex” [RP99]. Not only do communicators give information away that is collected and used by others, but data are also requested by the server software or automatically sent by the client software. These data may be collected and merged to generate user profiles with user-specific attributes and CMC system-specific attributes. Rudman [Rud97] states that approximately 1000 stylistic features have already been isolated. Another typology, given by Zheng et al. [Zhe+06], includes 270 features. Commonly used stylistic features are, among other things, word and message length, vocabulary richness, the use of punctuation marks, and frequency of emoticons [Kuc+06]. These characteristics can be used to define a unique online writing style for a communicator [Kuc+08; OA09]. By analogy with a person’s unique fingerprint, this style is known as a writeprint or wordprint [AC08b; Iqb11].
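
A few of the stylistic features named above can be computed with very little code. The following is a minimal sketch under simplifying assumptions: vocabulary richness is approximated by the type-token ratio, the emoticon pattern covers only a handful of text smileys, and the function name `style_features` is hypothetical. It is not one of the feature sets cited above.

```python
import re

# Small, assumed emoticon pattern: eyes, optional nose, mouth
EMOTICON = re.compile(r"[:;][-']?[()DPp]")


def style_features(message: str) -> dict:
    """Compute a handful of simple stylistic features for one message."""
    emoticons = EMOTICON.findall(message)
    text = EMOTICON.sub(" ", message)  # keep emoticons out of the word count
    words = re.findall(r"[a-z']+", text.lower())
    word_count = len(words)
    return {
        "message_length": len(message),
        "word_count": word_count,
        "avg_word_length": sum(map(len, words)) / word_count if word_count else 0.0,
        # Type-token ratio as a rough proxy for vocabulary richness
        "vocabulary_richness": len(set(words)) / word_count if word_count else 0.0,
        "punctuation_marks": len(re.findall(r"[.,;:!?]", text)),
        "emoticons": len(emoticons),
    }


features = style_features("no no no, wait! :-)")
print(features)
```

Aggregating such vectors over all messages of one identifier gives a crude profile that can be compared across identifiers.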

Each identifier has one or more related profiles (Subfigure 6.5(c)) that refer to one or more CMC systems (Subfigure 6.5(e)).

6.5.8 Relation

This view focuses on the relations between one (reflexive) or more communicators. It includes relation types (e.g., family), relation frequencies (e.g., low), and relation intensities (e.g., strong and close tie). Nguyen [Ngu08] states that “CMC enables multi-dimensional communication” that includes one-to-one (e.g., instant messaging), one-to-many (e.g., mailing list), and many-to-many (group) interactions (e.g., online chat). Various ways of addressing other communicators are found in CMC textual environments. Identifiers are used for addressing receivers in message headers (fixed fields or positions) or somewhere in message bodies. For example, the header fields “To:”, “Bcc:”, and “Cc:” specify the recipients of emails [Res08]. The at sign “@” is commonly used as a “marker of addressivity” [HH09] in Facebook posts or Twitter messages (tweets), for example. The detection of the intended targets for each message is an important aspect of discovering which messages or parts are related to which receivers. In particular, the arbitrary position within discourses, and the shortened or creatively changed variants make detection of the receivers’ identifiers difficult. Greetings and farewells are often followed by receivers’ identifiers (see Table 6.13, example “Email”). The first example in the table additionally contains quoted (copied) previous messages that are set off with angle brackets. Severinson Eklundh [SE10] mentions that “the quoted text has a discourse-deictic relationship to the message from which it is taken”. The second example uses anaphora (“you”), which is a type of referential cohesion, to refer to the communicator “Bill Gates”.

Table 6.13: Extracts of relations

Email (private email)
Hello again Robert,

> do you live at the mainland, or on an island? I’ve been several times at Ios and Mykonos :)
> It was great.
No, I don’t live on an island, I live near Salonica, which lies on the North part of Greece.

Facebook (Bill Gates, https://www.facebook.com/BillGates/)

Skype
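
The addressing cues described in this view (the “@” marker, a leading identifier, and a greeting followed by an identifier) can be illustrated with a small heuristic. The sketch below is an assumption made for illustration only and not the detection approach developed in this thesis. The patterns, the greeting list, and the function name `guess_receivers` are hypothetical, and shortened or creatively changed identifiers are not handled.

```python
import re

MENTION = re.compile(r"(?<!\w)@(\w+)")            # "@BillGates"
LEADING = re.compile(r"^\s*([^\s:,]+)[:,]\s")     # "qwerty: ..."
GREETING = re.compile(
    r"^\s*(?:hi|hello|hey|dear|bye)(?:\s+again)?\s+(\w+)", re.IGNORECASE
)                                                  # "Hello again Robert,"


def guess_receivers(body: str, known_identifiers) -> list:
    """Return known identifiers that the message body appears to address."""
    lookup = {name.lower(): name for name in known_identifiers}
    candidates = MENTION.findall(body)
    for pattern in (LEADING, GREETING):
        match = pattern.match(body)
        if match:
            candidates.append(match.group(1))
    receivers = []
    for candidate in candidates:
        name = lookup.get(candidate.lower())
        if name is not None and name not in receivers:
            receivers.append(name)
    return receivers


print(guess_receivers("qwerty: I make so many errors", ["qwerty", "hardwarecat"]))
print(guess_receivers("Hello again Robert,", ["Robert"]))
print(guess_receivers("@BillGates thanks, and hi @qwerty", ["BillGates", "qwerty"]))
```

Restricting candidates to identifiers already seen in the discourse keeps ordinary words that happen to follow a greeting from being treated as receivers.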

6.5.9 Topic

The aim of this view is to find out “what is being talked/written about” [BY83]. The term “topic” is the “aboutness” of a unit of discourse (e.g., whole discourse, discourse fragment, sentence) [BY83; Ren04; Sim03]. Synonyms for a topic are, for example, “subject” or “theme” [Col15]. Especially in conversations with many users, multiple topics are discussed simultaneously, and this can lead to confusion, or make conversations difficult to follow. Simpson [Sim03] notes that “topics and their boundaries themselves are ... difficult ... to identify”. He adds that topic changes can be abrupt (topic shift) or gentle (topic drift). Examples of topics are given in Table 6.14: an email about a publication and migration, some Java topics on a forum, Pope Francis’ request to end slavery, a Wikipedia article title about computer-mediated communication, and a talk on YouTube about the song “Frankie Goes To Hollywood—The Power Of Love [Official Music Video]”. Inspired by IRC, Twitter uses the “#” (hashtag) in tweets to tag topics of interest [Twi14].

Table 6.14: Extracts of topics

Email (private mailing list)
Linguistik Online: 65/2014 & Relaunch

Dear friends and colleagues,
issue 65 has just been published, but what is more: Linguistik online has migrated to a new domain.

Forum (http://www.coderanch.com/forums/f-33/java)

Twitter (Pope Francis, https://twitter.com/pontifex)

Wikipedia (http://en.wikipedia.org/wiki/Computer-mediated_communication)

YouTube (Frankie Goes To Hollywood—The Power Of Love [Official Music Video])
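
Explicit topic markers such as hashtags are the easiest topic cue to extract. The following sketch counts hashtags in a list of messages. The sample messages and the function name `count_hashtags` are invented for illustration, and topics that are not tagged explicitly are outside its scope.

```python
import re
from collections import Counter

# "#" followed by word characters, not preceded by a word character
HASHTAG = re.compile(r"(?<!\w)#(\w+)")


def count_hashtags(messages) -> Counter:
    """Count case-insensitive hashtag occurrences over a list of messages."""
    counts = Counter()
    for message in messages:
        counts.update(tag.lower() for tag in HASHTAG.findall(message))
    return counts


sample = [
    "Learning #Java generics today",
    "Any #java experts around? see page#2",
    "#CMC research is fun",
]
print(count_hashtags(sample).most_common())
```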


6.5.10 Emotion

Communication starts with a specific mood and feeling in the communicator. Users transfer their real-life problems and personalities to their virtual lives. Chenault [Che98] notes that “therefore, CMC must inherently include all kinds of emotional content”. An emotion such as joy, sadness, or anger “is a fundamental component of being human” [BN08]. In CMC, messages among communicators “can often be deep and highly emotional” [Rei91]. A review of the literature (e.g., [Bin+10; Der+08; Ekk12; Ekm99; Fri86; OT90]) shows the existence of many different definitions, emotion models, and measurements. Many closely intertwined terms such as affect, mood, or feelings are associated with emotion. In general, words can convey emotional meaning [Str+06]. Clore et al. [Clo+87] suggest distinguishing between direct affective words (directly referring to emotional states, e.g., “fear”), and indirect affective words (indirect reference that depends on the context, e.g., “killer”). The identified emotional expressions at different levels (e.g., word level, sentence level) can be labeled with the appropriate emotion category (e.g., with Ekman’s six basic emotions: anger, fear, surprise, sadness, disgust, and happiness [EF86]), the indication of an emotional valence (e.g., positive), intensity (e.g., low), absolute or relative frequency (e.g., 100 times), and duration (e.g., thread). Further forms of expression “compensate for the lack of nonverbal cues and give the recipient access to the feelings and emotions of the author” [Mar+08]. In Table 6.15, the direct affective word “fear”, graphical emoticons that represent “:-(” and “:-)”, a text smiley “:-)”, an exclamation mark, and pressed “like buttons” are used to express emotions or feelings.

Table 6.15: Extracts of emotions

Email (client “The Bat!”)

Facebook

Skype

Twitter (Barack Obama, https://twitter.com/barackobama)

YouTube
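
The labeling of emotional expressions at the word level can be sketched as a lookup against a lexicon. The tiny lexicon below is a made-up assumption for illustration; real analyses rely on far larger resources and on context, in particular for indirect affective words. The function name `label_emotions` is hypothetical.

```python
import re
from collections import Counter

# Assumed mini-lexicon: direct affective words mapped to Ekman categories
AFFECTIVE_WORDS = {
    "fear": "fear",
    "afraid": "fear",
    "angry": "anger",
    "happy": "happiness",
    "sad": "sadness",
}
# Assumed emoticon mapping
EMOTICONS = {":-)": "happiness", ":-(": "sadness"}


def label_emotions(message: str) -> Counter:
    """Count emotion categories expressed by known words and emoticons."""
    counts = Counter()
    for emoticon, category in EMOTICONS.items():
        occurrences = message.count(emoticon)
        if occurrences:
            counts[category] += occurrences
    for word in re.findall(r"[a-z]+", message.lower()):
        if word in AFFECTIVE_WORDS:
            counts[AFFECTIVE_WORDS[word]] += 1
    return counts


print(label_emotions("I fear the exam :-( :-("))
```

The resulting counts correspond to the absolute frequency attribute mentioned above; valence and intensity would need additional lexicon columns.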


6.5.11 Causality

This view describes and analyzes causalities. A causality is a relationship between a cause (reason) and its effect (result, outcome, consequence) [Col14a; Col14b; Col14d; Mer14a]. A cause can have many effects and vice versa. Communication, intentional or unintentional, always has some effect on one or more involved communicators [DeV15]. Causes and effects can be of a virtual and/or real nature. A mixture of both worlds is also possible. Examples are a banned chatter (virtual cause) who becomes very angry (real effect), a flooder (virtual cause) who is ignored by someone in the chat (virtual effect), or a tired chatter (real cause) who exits the chat (virtual effect) and goes to bed (real effect). A flooder is someone who intentionally fills a chat room by repeating meaningless sentences with the intent to annoy [Urb14h]. Other possible examples of cause-effect relationships are shown in Table 6.16. The following example describes a series of events linking the causes of a problem with its effects (causal chain or domino [Cha14]): Communicator A has some problems such as dissatisfaction and frustrations at work. This user writes a CMC message with harsh content to communicator B. Communicator B then responds impolitely. These communicators then dislike and ignore each other, and stop exchanging new messages. After a while, communicator A disconnects from the CMC system.

Table 6.16: Examples of cause-effect relationships

CMC system | Cause | Effect
blog | Adding a harsh blog entry. | The blog entry was deleted.
chat | <flooder> was flooding in a channel. | User <flooder> was ignored.
email | John got a spam mail. | He did not reply and removed the mail.
email list | The content of the mail was boring. | User unsubscribed from the email list.
Facebook | Mary found an old friend on Facebook. | She added the old friend.
Skype | Skype randomly disconnected. | User uninstalled the currently installed version and installed a newer one.
Twitter | Graham wanted to share a tweet with all of his followers. | He used a built-in retweet feature to share the tweet.
Usenet | User was tired. | User shut down the Laptop.
website | User found an interesting website. | User read through many pages.

Different classification systems for causes and effects exist. Examples are significant and incidental, primary and secondary, and remote and proximate (immediate) [You+07]. Perse [Per08] describes the dimensions of effects as micro- vs. macro-level, intentional vs. unintentional, content-dependent vs. content-irrelevant, short- vs. long-term, and reinforcement vs. change. The effect of a message can be in direct correlation with the next one in a predictable way. Typical examples are the following adjacency pairs [Bla04; Cla01; Sea+04]: question-answer (“so where are you from”, “canada”), greeting-greeting (“Hi”, “Hi”), farewell-farewell (“Bye”, “Bye”), and request-grant/refusal (“would you like to chat?”, “sure”). However, not all cause-effect relationships are obvious, because they are not always visible or immediately observable.


6.5.12 Effectiveness

The last view focuses on effectiveness. The term “effectiveness” should not be confused with “effect” [Win+09], however. Effective communication occurs only when the receiver understands the exact information in the sender’s message, and the desired effect is achieved. It depends on communication skills, attitude, knowledge level, socio-cultural background, and on how the sender desires to affect the receiver [Ber60; TM11], and there is no guarantee that all messages will be understood as the sender intended (e.g., because of decoding errors). Feedback is the receiver’s response, and this allows the sender to evaluate the effectiveness of messages [DD09]. The quality of communication can be determined by the effectiveness between communicators. In general, two types of measurement exist, quantitative and qualitative, as well as a mixture of the two. In contrast to quantitative measurement, measuring qualitative factors is usually more difficult because these are not represented numerically. Measurable qualitative or quantitative attributes are, for example, unique visitors (e.g., the number of unique IP addresses per day), bounce rates (i.e., the percentage of visitors who view only the entrance page), keyword rankings, and the tonal biases of messages (e.g., neutral slant). Needless to say, determining the effectiveness is not an easy process.
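
Two of the quantitative attributes mentioned above, unique visitors and bounce rate, can be computed directly from a log of page views. The sketch below follows the definitions given in the text and makes one simplifying assumption: every distinct IP address counts as one visitor. The function name `visitor_metrics` and the sample log (which uses documentation-only IP addresses) are invented for illustration.

```python
from collections import defaultdict


def visitor_metrics(page_views) -> dict:
    """Compute unique visitors and bounce rate from (ip, page) pairs."""
    pages_by_ip = defaultdict(list)
    for ip, page in page_views:
        pages_by_ip[ip].append(page)
    unique_visitors = len(pages_by_ip)
    # A bounce is a visitor who viewed only the entrance page
    bounces = sum(1 for pages in pages_by_ip.values() if len(pages) == 1)
    rate = 100.0 * bounces / unique_visitors if unique_visitors else 0.0
    return {"unique_visitors": unique_visitors, "bounce_rate_percent": rate}


log = [
    ("192.0.2.1", "/index"),
    ("192.0.2.2", "/index"),
    ("192.0.2.2", "/forum"),
    ("192.0.2.3", "/index"),
    ("192.0.2.3", "/faq"),
    ("192.0.2.4", "/index"),
    ("192.0.2.4", "/forum"),
    ("192.0.2.4", "/faq"),
]
print(visitor_metrics(log))
```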

6.6 Attributes

Each selected view explores attribute-value pairs and optional units of measure. These collections can be summarized in categories. Examples of extracted attributes of computer-mediated discourses for each view are shown in Table 6.17.

Table 6.17: Attributes of each view (examples)

View | Attribute | Value (example(s))
CON | context → type → cultural context → country | United States, Iran
CON | context → type → cultural context → reason | different meaning of thumbs-up
CON | context → type → historical context → reference | it (pencil) is on my desk
CON | context → type → physical context → location | conference room
CON | context → type → psychological context → mood and feelings | user is normally good-natured but under a great deal of stress at the moment
CON | context → type → social context → communicator → relationship | stranger
BAR | barrier → affectedness → affected view | View MES, View COM
BAR | barrier → affectedness → affected by view | View BAR, View HAW
BAR | barrier → type → emotional or perceptional barrier | anger, pride
BAR | barrier → type → organizational barrier | organizational policy, rules
BAR | barrier → type → personal barrier | lack of energy or time
BAR | barrier → type → physical barrier | information overload
BAR | barrier → type → semantic or linguistic barrier | similar-sounding words
DAT | timestamp → creator → timer → type | server, client
DAT | timestamp → creator → software → name | LogBot, mIRC
DAT | timestamp → creator → software → version | 5.13
DAT | timestamp → representation → protocol → name | ISO 8601, RFC 822, RFC 2445
DAT | timestamp → representation → protocol → issuer | ISO
DAT | timestamp → representation → protocol → version | 2.3
DAT | timestamp → representation → type → date | 20100412
DAT | timestamp → representation → type → time of day | 23:20:50
DAT | timestamp → representation → type → date and time of day | 20100412 232050
DAT | timestamp → representation → type → time interval | 20100412 232050/20100413 102000
DAT | timestamp → representation → type → time zone | 0930Z, 134519Z
HAW | hardware/network → CMC system → name | chat, email
HAW | hardware/network → component → manufacturer → company → name | Realtek Semiconductor Corp.
HAW | hardware/network → component → type | Keyboard
HAW | hardware/network → component → name | Logitech Wave Keyboard
HAW | hardware/network → component → serial number | 99303011ZB
HAW | hardware/network → component → dimensions → height → value | 3.1
HAW | hardware/network → component → dimensions → height → unit | mm
HAW | hardware/network → component → dimensions → width → value | 5.7
HAW | hardware/network → component → dimensions → width → unit | cm
HAW | hardware/network → component → dimensions → depth → value | 5.0
HAW | hardware/network → component → dimensions → depth → unit | mm
HAW | hardware/network → component → weight → value | 9.97
HAW | hardware/network → component → weight → unit | lbs
HAW | hardware/network → component → power consumption → value | 22
HAW | hardware/network → component → power consumption → unit | W
HAW | hardware/network → component → network → architecture | client-server, peer-to-peer
HAW | hardware/network → component → network → medium | fiber optic cable
HAW | hardware/network → component → network → size | local area network
HAW | hardware/network → component → network → interface type | Ethernet
HAW | hardware/network → component → network → hardware address | 90-E7-AB-07-E3-5D
HAW | hardware/network → component → network → IP address | 173.252.110.27
HAW | hardware/network → component → network → link speed → value | 1
HAW | hardware/network → component → network → link speed → unit | Gbps
HAW | hardware/network → component → network → state | connected
HAW | hardware/network → component → network → data received → value | 0
HAW | hardware/network → component → network → data received → unit | byte
HAW | hardware/network → component → network → data sent → value | 10
HAW | hardware/network → component → network → data sent → unit | byte
HAW | hardware/network → component → cost → historical cost → value | 10,000.00
HAW | hardware/network → component → cost → historical cost → unit | €
HAW | hardware/network → component → cost → running cost → value | 1200.00
HAW | hardware/network → component → cost → running cost → unit | £
HAW | hardware/network → component → cost → running cost → period | year
HAW | hardware/network → protocol → name | FTP, POP, SSL
HAW | hardware/network → protocol → port number | 25, 80, 110
HAW | hardware/network → protocol → URI → URL | http://www.google.com/
SOW | software → CMC system → name | forum, chat
SOW | software → component → type | operating system, server software
SOW | software → component → developer → company → name | Microsoft
SOW | software → component → name | Windows
SOW | software → component → version | 10
SOW | software → component → architecture → value | 64
SOW | software → component → architecture → unit | bit
SOW | software → component → cost → historical cost → value | 2,500.00
SOW | software → component → cost → historical cost → unit | $
SOW | software → component → cost → running cost → value | 200.00
SOW | software → component → cost → running cost → unit | €
SOW | software → component → cost → running cost → period | month
COM | communicator → creature → type | human, non-human being
COM | communicator → identifier → real name → first name | Robert, Claudia
COM | communicator → identifier → real name → last name | Miller
COM | communicator → identifier → real name → initial | RE, GE
COM | communicator → identifier → nickname → real nickname | Bob, Johnny
COM | communicator → identifier → nickname → virtual nickname | Bobby87, JohnnyBG
COM | communicator → identifier → technical name → username/profile name | BillGates
COM | communicator → identifier → technical name → technical identifier | 216311481960
COM | communicator → physical appearance → gender → biological sex | male, female
COM | communicator → physical appearance → age → value | 25, 49
COM | communicator → physical appearance → age → unit | year, day
COM | communicator → physical appearance → body → type | mesomorph, ectomorph
COM | communicator → physical appearance → body → height → value | 1.84
COM | communicator → physical appearance → body → height → unit | m
COM | communicator → physical appearance → body → weight → value | 68
COM | communicator → physical appearance → body → weight → unit | kg
COM | communicator → physical appearance → body → modification | ear piercing, tattoo
COM | communicator → physical appearance → body → shape-altering device | tooth braces, glasses, gold teeth
COM | communicator → physical appearance → eye → color | blue, green, hazel
COM | communicator → physical appearance → skin → color | light, darkest brown
COM | communicator → physical appearance → hair → color | blond, red, brown
COM | communicator → physical appearance → hair → texture | curl, volume, consistency
COM | communicator → physical appearance → hair → hairstyle | Afro, side-parted style
COM | communicator → physical appearance → clothing | coat, hat, glove
COM | communicator → physical appearance → cosmetics | lipsticks, eye makeup
COM | communicator → social status → education | college, university
COM | communicator → social status → employment | employee, employer
COM | communicator → social status → job → name | clerk, receptionist
COM | communicator → social status → job → income → value | 90.00
COM | communicator → social status → job → income → unit | €
COM | communicator → social status → job → income → period | day
COM | communicator → behavior → language | English, German
COM | communicator → behavior → habit | nail-biting
COM | communicator → behavior → interest | playing soccer
COM | communicator → behavior → favorite - film | The Matrix
COM | communicator → behavior → favorite - pizza | Hawaiian pizza
COM | communicator → behavior → communication → type | aggressive, passive
COM | communicator → behavior → religion and spirituality → religion | Christianity, Buddhism
COM | communicator → CMC system related → user → account status | password protected, blocked
COM | communicator → CMC system related → user → availability status | online, offline, away
COM | communicator → CMC system related → user → role | editor, administrator
COM | communicator → CMC system related → user → rights | right to delete own posts
COM | communicator → CMC system related → user → writing style | use of technical jargon
MES | message → CMC system → name | email, chat
MES | message → CMC system → creator → identifier | Administrator
MES | message → CMC system → creator → command | KICK, JOIN
MES | message → identifier | [email protected]
MES | message → content | <n00>hi everybody
MES | message → creator → type | system, user
MES | message → language → name | English, German
MES | message → language → register → type | formal, intimate, casual
MES | message → language → variety → dialect → name | Austrian
MES | message → direction | from client to server
MES | message → format | HTML, plain text
MES | message → character encoding | UTF-8, ISO 8859-1
MES | message → persistence | yes, persisted to database
MES | message → length → value | 80
MES | message → length → unit | character
MES | message → size → value | 1024
MES | message → size → unit | kilobyte
MES | message → space | private, public
MES | message → encryption | Advanced Encryption Standard
MES | message → synchronization | asynchronous, synchronous
REL | relationship → type → genetic relationship | family, sibling, cousin
REL | relationship → type → relationship by marriage | husband, wife
REL | relationship → type → relationship partner | friendship, boyfriend
REL | relationship → type → sexual relationship | casual, monogamous, paramour
REL | relationship → activity | wedding, dating, bonding
REL | relationship → endings | divorce, breakup, separation
REL | relationship → practice | hypergamy, infidelity
REL | relation → association | one-to-one, many-to-many
REL | relation → frequency | high, low
REL | relation → intensity | strong, weak
REL | relation → duration | session, long-term
REL | relation → discourse partner (communicator) → identifier | [email protected], NickB
TOP | topic → name | “Cookies on the website”
TOP | topic → category/tag | politics, sports, culture
TOP | topic → contribution | on-topic, off-topic
EMO | emotion → form of expression → word | great!
EMO | emotion → form of expression → emoticon | :-))
EMO | emotion → form of expression → gestures | hiding the face (anxiety)
EMO | emotion → type | anger, fear, happiness
EMO | emotion → frequency | once, permanent condition
EMO | emotion → intensity | low, strong
EMO | emotion → duration | seconds, thread
EMO | emotion → valence | positive, negative
CAU | cause → name | unfriendly user
CAU | cause → type | significant, primary, remote
CAU | effect → name | quit the chat system, ignore user
CAU | effect → type | content-dependent
CAU | effect → duration | short-term
CAU | effect → valence | positive, negative
EFF | effectiveness → goal → name | increase typing speed to 120 words per minute within one week
EFF | effectiveness → goal → achievement → value | 80
EFF | effectiveness → goal → achievement → unit | %
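
The arrow-separated attribute paths of Table 6.17 map naturally onto a nested structure in which values and their units sit side by side. The following is a minimal sketch of one possible in-memory representation. The nested-dictionary layout and the function name `insert_attribute` are assumptions made for illustration, not the data model of the described architecture.

```python
def insert_attribute(tree: dict, path: str, value, separator: str = "→") -> dict:
    """Store a value under an arrow-separated attribute path in a nested dict."""
    keys = [key.strip() for key in path.split(separator)]
    node = tree
    for key in keys[:-1]:
        node = node.setdefault(key, {})  # create intermediate levels on demand
    node[keys[-1]] = value
    return tree


attributes = {}
insert_attribute(attributes, "hardware/network → component → weight → value", 9.97)
insert_attribute(attributes, "hardware/network → component → weight → unit", "lbs")
insert_attribute(attributes, "emotion → type", ["anger", "fear", "happiness"])

print(attributes["hardware/network"]["component"]["weight"])
```

Keeping the unit next to the value, as the table does, makes later conversions (for example, lbs to kg) possible without guessing.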

6.7 Visualization

A possible step is to visualize discourses. Pupyrev and Tikhonov [PT10] note that “visualization gives an intuitive sense of ‘what is going on’ and helps to capture basic patterns and rules”. Therefore, the use of visualization ensures a better overview and understanding. Challenges include decisions about which kind of visualization method should be used and at which granularity. A wide variety of visualization techniques and tools are used to map CMC messages and the included information to visual representations. Examples of visualizations are shown in Table 6.19. Based on their descriptions, Table 6.18 maps these figures to the views they use.

Table 6.18: Figures of Table 6.19 mapped to their used views

View | Figure(s) (word(s) in description)
DAT | Fig. D (sequences), Fig. E (chronology), Fig. F (over time), Fig. G (2-month period, how long, time), Fig. H (history), Fig. I (number of days), Fig. J (over time, boxes of a calendar)
SOW | Fig. A (IRC channels, #java), Fig. B (Google+), Fig. C (three-dimensional virtual worlds), Fig. D (Wikipedia), Fig. E (email), Fig. F (chat), Fig. G (message board), Fig. H (Wikipedia), Fig. I (Usenet newsgroup), Fig. J (Usenet newsgroups)
COM | Fig. A (communicators), Fig. B (users), Fig. F (users), Fig. G (user), Fig. H (authors), Fig. I (participants, author)
MES | Fig. B (information flow, public posts), Fig. C (number and length of utterances), Fig. D (textual, words), Fig. E (email, messages), Fig. F (postings), Fig. G (postings), Fig. H (page), Fig. I (average number of posts)
REL | Fig. A (relationships between pairs of communicators), Fig. E (conversational thread), Fig. I (per thread)
TOP | Fig. C (subject), Fig. J (news-based)
EMO | Fig. J (angry, peaceful)


Table 6.19: Visualization of discourses

A

The “PieSpy Social Network Bot” monitors IRC channels and performs a set of heuristics to infer relationships between pairs of communicators. In the figure, a simple social network for the freenode channel #java is created [Mut04b].

B

“Google+ Ripples” [Vi13] “is a visualization of information flow that shows users how public posts are shared on Google+”.

C

This figure visualizes the number and length of utterances (y-axis) collected in three-dimensional virtual worlds [Bö+02]. Some 19 subject numbers are plotted on the x-axis.

D

“Chromograms” displays long textual sequences (e.g., activities on Wikipedia) by mapping words to colors [Wat+07].

E

“Thread Arcs” visualizes email threads. Kerr [Ker03] notes that it combines “the chronology of messages with the branching tree structure of a conversational thread in a mixed-model visualization”.

F

The “Chat Circles” history interface shows “conversation over time where activity patterns become quickly observable” [DV02]. Vertical lines represent users, horizontal bars, their postings.

G

A “PeopleGarden” shows “messages from a message board with 1200 postings over a 2-month period” [XD99]. Xiong and Donath [XD99] add that the “height of a flower represents how long a user has been in the board, as indicated by time of first posting”.

H

A history flow visualization application produces “a graphical view of the revision history of an individual [Wikipedia] page” [Vi07]. It shows the revision sequence (x-axis) and the contributions of different colored authors (y-axis) [Vi04; Vi07].

I

“Newsgroup Crowds” shows the activity of participants in a Usenet newsgroup. Viégas and Smith [VS04] add that the “number of days an author has been active during the chosen month” is displayed on the vertical axis. The horizontal axis represents “the author’s average number of posts per thread in the newsgroup” [VS04].

J

Loom and Loom2 are visualization tools for Usenet newsgroups [Don+99; Don+01]. The figure opposite classifies and spatializes the actual content into four categories: angry (mapped as red), peaceful (green), news-based (yellow), and all others (blue) [DK01].

As seen in the previous Table 6.19, various possibilities exist for visualizing discourses. A sociogram (e.g., [Fre00] or [Hua+06]) is used to visualize Log Example 12. First published in 1934, sociograms were developed by Jacob L. Moreno [Mor78]. Nodes represent communicators, while links show connections (flows of messages) between these nodes. Additional information extends the sociogram (see Table 8.23). An extended sociogram is able to visualize all 12 suggested views.

6.8 Chapter summary

A multiple-views analysis approach to computer-mediated discourses was presented. Twelve fundamental views were explained and some of their randomly extracted attributes were shown. The key questions were summarized in the phrase “Who communicates with which hardware, software, and network, where, when, (about) what, with whom, how, why, with what effect, and with what effectiveness?” In the next chapter, the presented multiple-views analysis approach is applied to IRC.


CHAPTER 7 Multiple-views analysis approach applied to IRC

The multiple-views analysis approach to computer-mediated discourses is applied to IRC. This approach is described in Chapter 6. IRC is used for (automated) discourse analysis in the following chapters.

7.1 Information on the general steps

The author focuses on the popular (quasi-)synchronous IRC. The main reason for this choice of focus is the availability of open and well-documented protocols. The goal in this chapter is to describe the multiple-views analysis approach in a more IRC-specific way, especially concerning the views. Dataset 2 is used to describe log examples (see Section 8.1).

7.2 Views

The CMC system IRC is analyzed based on the views defined in Section 6.5. The views context, communication barrier, causality, and effectiveness are mainly not IRC-specific.

7.2.1 Message

Communicators send each other messages. Researchers examine these messages in discourse analysis.

Typical path of a message in IRC

Depending on the current step on the way from sender to receiver(s), a certain message may look different or may not be available for analyzing. In Table 7.1, the communicator <ALEX> sends a public and private message to the user <Ttech>. Another user in the channel (#channel) on the IRC server (morgan.freenode.net) is, for example, <Nazca>. The message in <ALEX>’s mind is encoded and typed (e.g., on a keyboard) into the input box of the client. Input via speech-to-text is also possible. The text is displayed on the sender’s screen. The raw message is transmitted either to all users in the channel or only to the specific user <Ttech>. The received message is then displayed on the receivers’ screens and decoded in their minds. <Nazca> can only read <ALEX>’s public message.

Table 7.1: Typical path of a message in IRC

Step                                Message

Sending a public message
<ALEX>’s mind                       I want to greet Ttech
<ALEX>’s client input               hi Ttech
<ALEX>’s client output              <ALEX> hi Ttech
raw message from <ALEX>’s client    morgan.freenode.net PRIVMSG #channel :hi Ttech
raw message to <Ttech>’s client     :ALEX!~ALEX@unaffiliated/alex PRIVMSG #channel :hi Ttech
raw message to <Nazca>’s client     :ALEX!~ALEX@unaffiliated/alex PRIVMSG #channel :hi Ttech
<Ttech>’s client output             <ALEX> hi Ttech
<Nazca>’s client output             <ALEX> hi Ttech
<Ttech>’s mind                      ALEX greets me
<Nazca>’s mind                      ALEX greets Ttech

Sending a private message
<ALEX>’s mind                       I want to privately greet Ttech
<ALEX>’s client input               /msg Ttech hi
<ALEX>’s client output              *ALEX* hi
raw message from <ALEX>’s client    morgan.freenode.net PRIVMSG Ttech :hi
raw message to <Ttech>’s client     morgan.freenode.net PRIVMSG Ttech :hi
raw message to <Nazca>’s client     -
<Ttech>’s client output             *ALEX* hi
<Nazca>’s client output             -
<Ttech>’s mind                      ALEX (privately) greets me
<Nazca>’s mind                      -
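The raw server-to-client lines of Table 7.1 can be split into sender, command, target, and text before any further analysis. The following is a minimal sketch, not the tool described in this thesis: the names `RawMessage`, `parse_raw`, and `is_public` are hypothetical, only lines that carry a leading ":" prefix are assumed, and a target is treated as a channel if it starts with "#" or "&".

```python
from typing import List, NamedTuple, Optional


class RawMessage(NamedTuple):
    prefix: Optional[str]   # e.g. "ALEX!~ALEX@unaffiliated/alex"
    nick: Optional[str]     # part of the prefix before "!"
    command: str            # e.g. "PRIVMSG"
    params: List[str]       # target first, message text last


def parse_raw(line: str) -> RawMessage:
    """Split one raw IRC line into prefix, command, and parameters."""
    rest = line.strip()
    prefix = None
    if rest.startswith(":"):
        # The optional prefix names the sender and ends at the first space.
        prefix, _, rest = rest[1:].partition(" ")
    # The trailing parameter (the message text) is introduced by " :".
    head, sep, trailing = rest.partition(" :")
    parts = head.split()
    if not parts:
        raise ValueError("no command found in line")
    params = parts[1:]
    if sep:
        params.append(trailing)
    nick = prefix.split("!", 1)[0] if prefix else None
    return RawMessage(prefix, nick, parts[0].upper(), params)


def is_public(message: RawMessage) -> bool:
    """Assumption: a PRIVMSG is public if its target looks like a channel."""
    return (message.command == "PRIVMSG" and bool(message.params)
            and message.params[0].startswith(("#", "&")))


public = parse_raw(":ALEX!~ALEX@unaffiliated/alex PRIVMSG #channel :hi Ttech")
print(public.nick, public.params, is_public(public))
```

With the public line from Table 7.1, the sender is ALEX, the target is #channel, and the text is "hi Ttech"; a line whose target is a nickname is classified as private.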

Message characteristics

IRC is used almost exclusively for text-based communication. IRC also allows direct user-to-user connections such as DCC to transfer binary files (e.g., pictures). The predominant language on IRC is English, although other languages such as German or Spanish are used as well (especially in dedicated channels). Certain linguistic features are common characteristics in IRC. Greiffenstern [Gre10] adds that “[t]he language in chat rooms can differ widely depending on the chat participants, the topic of the chat, and the general tone in the chat room”. Typical features in IRC discourses are, for example:

• short message lengths [Has09; Pao99];

• code switching and code mixing [Bar10a; VG06; VG08]; or

• non-standard orthographic forms [Pro11]

– various types of shortened forms (abbreviations) for words and phrases, including acronyms and initialisms [Has09; Hor08; Pro11];

– letter/number homophones [Pro11];

– non-conventional spellings or phonetic spellings [Bar10a; Gre10; LZ05; Pro11; VG06] including leetspeak [Bar10a; Mut04a];

– emoticons (e.g., smiley without nose) [Has09; Rei91; Sch99] and letter reduplications for emphasis [Sch99];

– words or phrases framed in asterisks to express gestures and actions [Hen98; O’C+10; Seg02];

– self-referencing [Kor99];

– ASCII art [Her02; Kal07];

– words or phrases written either in capitals only or all lower-case [Bar10a; Gre10];

– minimalist or missing punctuation [Gre10; Ngu08; Sim03]; and

– spelling and self-correction (repair) [Hun+10; Mut04a; VG06].

7.2.2 Date/time

Earlier messages can be scrolled back within a limited buffer [Her02]. This is especially helpful when many participants are involved [Her02]. Optionally, each message starts with a timestamp. While chatting, the following two commands display date/time with mIRC: CTCP TIME (the date and time of the current nickname) and TIME (the local time on the current server).

7.2.3 Hardware and network

Several sites like netsplit.de provide different IRC statistics about networks, users, and channels. According to netsplit.de [Gel13], at the end of April 2012, the top 100 IRC networks served 484524 users in 246789 channels on a total of 1277 servers worldwide. Some 674 known IRC networks were listed. There were three mavericks (networks that do not take part in the competition, e.g., freenode) and 46 applicants (networks that were not accepted yet, e.g., DragonNet) [Gel13]. An extract of the ten biggest networks (on average) is illustrated in Table 7.2. Mavericks and applicants are not included. The more users are online, the more messages are usually sent (i.e., can be analyzed).

Table 7.2: IRC Networks - Top 10 on 27.06.2012 (adapted from [Gel13])

Rank  Network      Ø Users  Network         Ø Channels  Network(s)          Ø Servers
1     QuakeNet     63440    QuakeNet        46403       OltreIrc, OpenJoke  43
2     IRCnet       58192    IRCnet          33595
3     Undernet     57649    Rizon           23993       MindForge           42
4     EFnet        34800    EFnet           17364       EFnet, QuakeNet     41
5     Ustream      20899    Undernet        14326
6     Rizon        20305    FCirc           12296       DarkSin, RusNet     35
7     IRC-Hispano  15152    IRC-Hispano     8147
8     DALnet       13201    DALnet          7089        DALnet              33
9     WebChat      11394    GameSurge       6942        DALNet.RU           32
10    FCirc        9597     OnlineGamesNet  6440        1ndonesia           30
...   ...                   ...                         ...
∑                  484524                   246789                          1277

Various commands are available for obtaining information about IRC networks, servers, and services. For example, LINKS lists all the servers currently linked to the network. The LUSERS and USERS commands return statistics about the size of the network; MAP displays a network map of the IRC network; and VERSION provides information on the IRCd software. A few commands are only available for server administrators. These are, e.g., the commands DIE, REHASH, or RESTART. JOIN and QUIT messages usually include the user’s IP address. A simple way to hide the IP address or host name from other users is to use a cloak. On freenode, a cloak looks like “unaffiliated/” followed by a registered NickServ account [Fre14]. On SwiftIRC, the numeric IPv4 address is replaced by three truncated MD5 hashes [Swi14]. A long lag time frustrates chatters [Oku05]. The CTCP PING command is used to measure the delay in the IRC network between clients. Mutton [Mut04a] notes that “[n]etsplits occur frequently on IRC”. Users can detect a netsplit from the characteristic QUIT messages [Cha00].

7.2.4 Software

As well as a hardware and network connection, software is also needed to run IRC. A detailed introduction to IRC software is given in Subsection 5.1.2.

7.2.5 Communicator

After connecting to IRC servers with clients (i.e., being online), communicators are literally visible, and are identified by their unique nicknames within the IRC networks. Each connected communicator is a potential discourse partner. Joining a channel is not necessary for communication.

Detection of the communicator’s nickname

Log Example 1

Line  mIRC command  Message
1     -             <Adeelsidiki> hi
2     CTCP          [Zaenux VERSION]
3     ECHO          #mIRC ECHO
4     INVITE        * Merbo (Merbo@MerbosMagic/Founder/Merbo) invites you to join #Test
5     JOIN          * Now talking in #freenode
6                   * karatkievich.freenode.net sets mode: +ns
7                   * Gryyt (~g@unaffiliated/gryyt) has joined #defocus
8     KICK          * Detergentizer was kicked from #defocus by daxx
9     ME            * Amis hugs the teddy
10    MODE          * mrmist sets mode -o MyGirl1
11    MSG           *hisp* hi...how are you?
12    NICK          * berban is now known as Guest88025
13    NOTICE        -Yui- hi
14    NOTIFY        * bazhang [~bazhang@unaffiliated/bazhang] is on IRC
15    PART          * Merbo (Merbo@MerbosMagic/Founder/Merbo) has left #freenode
16    QUIT          * habmala (~[email protected]) Quit (Quit: leaving)
17    TOPIC         * dax changes topic to ’Welcome to #freenode’
18    WALLOPS       !RobiX! Have Fun!
19    WHOIS         Dave2 is ~Dave2@freenode/staff/dave2 * Dave Wickham

A short general explanation of the log examples is given in the introduction to Chapter 8. Most of the discourse messages automatically include source nicknames. In Log Example 1, source nicknames are placed within messages at fixed positions depending on the message types (e.g., join, kick). Exceptions are, for example, the messages after joining a channel shown in mIRC (lines 5 and 6) or the ECHO command (e.g., “/echo #mIRC ECHO”), which prints text in the specified window (line 3). Both messages are displayed only in the own client window. The nicknames of communicators that are written within messages by senders are analyzed in Subsection 8.4.8.
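Because source nicknames sit at fixed positions that depend on the message type, a small table of patterns is enough to illustrate their extraction. The sketch below is an assumption-laden illustration rather than the method used in this thesis: the pattern list, the nickname character class `NICK`, and the function `source_nick` are hypothetical; the kicker (not the kicked user) is treated as the source of a kick; and names containing a dot are treated as servers, so they never match as nicknames.

```python
import re

# Assumed nickname alphabet: letters, digits, and a few special characters.
# A dot is excluded on purpose, so server names are not mistaken for nicks.
NICK = r"[A-Za-z0-9\[\]\\`_^{|}-]+"

# (message type, pattern); the named group "src" marks the source nickname.
PATTERNS = [
    ("local",   re.compile(r"^\* Now talking in ")),        # own client only
    ("message", re.compile(rf"^<(?P<src>{NICK})> ")),
    ("private", re.compile(rf"^\*(?P<src>{NICK})\* ")),
    ("notice",  re.compile(rf"^-(?P<src>{NICK})- ")),
    ("kick",    re.compile(rf"^\* {NICK} was kicked from \S+ by (?P<src>{NICK})")),
    ("nick",    re.compile(rf"^\* (?P<src>{NICK}) is now known as ")),
    ("join",    re.compile(rf"^\* (?P<src>{NICK}) \(.*\) has joined ")),
    ("part",    re.compile(rf"^\* (?P<src>{NICK}) \(.*\) has left ")),
    ("quit",    re.compile(rf"^\* (?P<src>{NICK}) \(.*?\) Quit")),
    ("mode",    re.compile(rf"^\* (?P<src>{NICK}) sets mode")),
    ("topic",   re.compile(rf"^\* (?P<src>{NICK}) changes topic to ")),
    ("action",  re.compile(rf"^\* (?P<src>{NICK}) ")),      # fallback for /me
]


def source_nick(message):
    """Return (message type, source nickname) or (None, None) if unknown."""
    for kind, pattern in PATTERNS:
        match = pattern.match(message)
        if match:
            return kind, match.groupdict().get("src")
    return None, None


for line in ["<Adeelsidiki> hi",
             "* Detergentizer was kicked from #defocus by daxx",
             "* Now talking in #freenode"]:
    print(source_nick(line))
```

The order of the patterns matters: the generic action pattern comes last, and the client-local "Now talking in" line is checked first so that "Now" is not reported as a nickname.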

Tracking IRC nickname changes

Communicators can easily change their nicknames over time. Thus, impersonating someone or stealing a nick just for fun is quite simple [Mut04b]. It is important to watch and track all the changes of nicknames to see the history of the recent one. This information is useful for analyzing discourse to find written nicks. For example, each word in a message can be compared with the history of current and older used nicknames to find the receivers of the message. Nickname changes are represented as “is now known as” (see Log Example 1, line 12).
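Tracking nickname changes amounts to keeping one history list per communicator and moving the "current nickname" pointer whenever an "is now known as" line appears. The following sketch assumes mIRC-style log lines as in Log Example 1; the class `NickTracker` and its methods are hypothetical names, and lowercasing is used as a simplification of IRC's case-insensitive nickname comparison.

```python
import re

NICK_CHANGE = re.compile(r"^\* (\S+) is now known as (\S+)$")


class NickTracker:
    """Keeps the nickname history of each communicator seen in a log."""

    def __init__(self):
        self._id_of = {}      # current nickname (lowercase) -> communicator id
        self._history = {}    # communicator id -> nicknames in order of use
        self._next_id = 0

    def see(self, nick):
        """Return the id for a current nickname, registering it if new."""
        key = nick.lower()
        if key not in self._id_of:
            self._id_of[key] = self._next_id
            self._history[self._next_id] = [nick]
            self._next_id += 1
        return self._id_of[key]

    def feed(self, line):
        """Process one log line; only nickname changes have an effect."""
        match = NICK_CHANGE.match(line)
        if not match:
            return
        old, new = match.groups()
        communicator = self.see(old)
        del self._id_of[old.lower()]          # the old nickname is free again
        self._id_of[new.lower()] = communicator
        self._history[communicator].append(new)

    def history(self, nick):
        """All nicknames used so far by the holder of a current nickname."""
        return list(self._history[self.see(nick)])


tracker = NickTracker()
tracker.feed("* berban is now known as Guest88025")
print(tracker.history("Guest88025"))
```

Each word of a later message can then be compared against `history(...)` of every current nickname instead of against the current nickname alone.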

Availabilities of communicators

IRC commands such as ISON, NAMES, WHO, and WHOWAS return information about the available statuses of communicators on IRC. Communicators signal that they are away by simply writing an away message as an action (see Log Example 2, line 1), using the AWAY command (line 2), or changing the nickname with the NICK command to indicate the current status (line 7). The AWAY command marks communicators as being away. The away message is shown to others when they message them (line 8), or when they use the WHO or WHOIS commands. “H” means “here” in line 9 and “G” means “gone” (line 11). Additionally, the idle time can be used to check how long users have been silent (line 19). <r00k> (line 3), <Mikaela> (line 5), <epix> (line 6), and <kloeri> (lines 9 and 14 to 21) are online. <Luke> was recently online (lines 24 to 29). <rook> (line 4) and <testuser> (lines 13, 22, 23, and 30) are not users.

Log Example 2

Line  mIRC command  Message
1     ACTION        * pur| goes away
2     AWAY          You have been marked as being away
3     ISON          ison: r00k
4                   ison: no such user
5     NAMES         #jsis Mikaela
6                   #fh-nuernberg epix
7     NICK          * tonymec__ is now known as tonymec|away
8     PRIVMSG       kloeri is away: i am away
9     WHO           #freenode kloeri H+ ~kloeri@freenode/staff/exherbo.kloeri :0 Bryan Ostergaard
10                  kloeri End of /WHO list.
11                  #freenode kloeri G+ ~kloeri@freenode/staff/exherbo.kloeri :0 Bryan Ostergaard
12                  kloeri End of /WHO list.
13                  testuser End of /WHO list.
14    WHOIS         kloeri is ~kloeri@freenode/staff/exherbo.kloeri * Bryan Ostergaard
15                  kloeri on +#freenode
16                  kloeri using verne.freenode.net NL
17                  kloeri is away: i am away
18                  kloeri is using a secure connection
19                  kloeri has been idle 29mins 23secs, signed on Fri May 16 15:47:07
20                  kloeri is logged in as Kloeri
21                  kloeri End of /WHOIS list.
22                  testuser No such nick/channel
23                  testuser End of /WHOIS list.
24    WHOWAS        Luke was ~Luke@unaffiliated/luke * Luke
25                  Luke was logged in as Luke
26                  Luke using leguin.freenode.net Sat May 31 08:39:35 2014
27                  Luke was ~Luke@unaffiliated/luke * Luke
28                  Luke was logged in as Luke
29                  End of WHOWAS
30                  testuser There was no such nickname
31                  End of WHOWAS

The availability of communicators is important to observe for several reasons. Communicators who are offline or no longer joined to channels, for whatever reason, are not able to read channel messages that are related to them. Three versions with a different number of statuses to visualize the availability of communicators are suggested by the author and presented in Table 7.3. The first version is channel-centered. Users are (not) joined to a channel or this information is unknown. The three statuses are visualized in green, red, and gray. The information required is extractable from the discourse without any IRC commands. The second version, with four statuses, additionally focuses on the network (or server), which is visualized in orange. This status means that a user is connected to the network but (1) is not joined to the specific analyzed channel, or (2) the channel status is unknown. The third version has five statuses and includes the virtual presence of communicators in channels (present or away) with the additional status visualized in yellow. This status means that a user has joined a channel but is currently away. The black visualized status is listed for the sake of completeness. It shows impossible statuses, e.g., a user cannot join a channel without connecting to a network.

Table 7.3: Available statuses of communicators

Three statuses  Four statuses                Five statuses
Channel         Network        Channel       Network        Channel       Communicator

joined          connected      joined        connected      joined        present
-               unknown        joined        unknown        joined        present
-               -              -             connected      joined        away
-               -              -             unknown        joined        away
-               connected      not joined    connected      not joined    -
-               connected      unknown       connected      unknown       -
not joined      not connected  not joined    not connected  not joined    -
-               not connected  unknown       not connected  unknown       -
unknown         unknown        unknown       unknown        unknown       -
-               unknown        not joined    unknown        not joined    -

-               not connected  joined        not connected  joined        present
-               -              -             not connected  joined        away


The extracting and calculating of the current available statuses of each communicator as well as the visualization process consist of the following steps:

• Extracting: If possible, the current available statuses of communicators are extracted for all logged lines. For example, communicators who exchange messages in channels or leave the channels have to be online at the time of sending.

• Calculating: Based on the information of the first step, all other unknown statuses of communicators are calculated. For example, a communicator writes two messages that are logged as lines 1 and 10. If there are no leave or quit messages between these lines, the availability status of this communicator is always “online”.

• Visualization: Table cells with a black background color indicate the first step. For example, the status of communicators who are online is visualized as green with a black background. Unfilled cells are calculated and filled with the related status color and a white background.

An example is visualized in Table 8.10.
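The extracting and calculating steps can be sketched for the channel-centered version with three statuses. This is only an illustration under stated assumptions, not the implementation behind Table 8.10: the function `channel_statuses`, the event kinds "message", "join", and "leave", and the rule that a status is carried forward until the next event are all hypothetical choices. Each result carries a flag that tells whether the status was extracted (black background) or calculated (white background).

```python
JOINED, NOT_JOINED, UNKNOWN = "joined", "not joined", "unknown"


def channel_statuses(n_lines, events):
    """Return one (status, extracted) pair per logged line for one communicator.

    events maps 1-based line numbers to "message", "join", or "leave"
    (leave covers part, quit, and kick). A communicator is counted as joined
    on the line of any event, because sending requires being online.
    """
    first = min(events) if events else None
    # Before an observed join the communicator cannot have been in the channel;
    # before any other first event the status is simply not known.
    current = NOT_JOINED if first is not None and events[first] == "join" else UNKNOWN
    result = []
    for line in range(1, n_lines + 1):
        kind = events.get(line)
        if kind is None:
            result.append((current, False))      # calculated status
        else:
            result.append((JOINED, True))        # extracted status
            current = NOT_JOINED if kind == "leave" else JOINED
    return result


# Two messages on lines 1 and 10 without a leave in between.
for line, pair in enumerate(channel_statuses(10, {1: "message", 10: "message"}), 1):
    print(line, pair)
```

In the example, lines 2 to 9 are calculated as "joined", which corresponds to the case described in the calculating step.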

User information and user profile

To obtain information about user-specific (real world) and IRC-specific attributes (virtual world), it is necessary to analyze chat discourses in detail. Communicators disclose information of all kinds when chatting, and nicknames can also include personal information such as first name, age, sex, or location. Users can be identified by their unique IRC nicknames, IP addresses, writeprints, or special cookies such as supercookies [Bro12; MM12]. This uniqueness is not always present but can simplify the assignment of identifiers to their virtual and real profiles. Most IRC networks do not require users to register accounts. Any user can connect to an IRC under any nickname at any time [GM04], even during a conversation. Liu [Liu99] notes that “[t]he more nicknames each participant uses, the more difficult it is to trace individual identities”. Communicators use not only fixed (static) IP addresses but also dynamic ones. Even the providers do not usually match if people are chatting both at home and at work. Additionally, writeprints could be used or simulated, e.g., to impersonate other communicators, and it is therefore difficult to generate individual user profiles.

7.2.6 Relation

Three kinds of communication occur in IRC. Conversations can either be public (in public channels), private (one-to-one communication), or semi-private (in private channels such as invite-only or secret channels). Vonck [Von12] mentions that a “channel is PUBLIC by default”. To get more information about relations among communicators, it is necessary to detect written nicknames in the chat discourse, map them to the users, and split the messages into conversation threads.

Detection of nicknames in discourse

IRC messages can include users’ nicknames for direct addressing. Direct addressing occurs when communicators insert the nicknames of other users for addressing [Nas05]. Mutton [Mut04b] adds that “[d]irect addressing is not always used (or required) to specify the target of a message”. If used, the message usually starts with the receiver’s nickname, followed by a colon and space. Addressing by nickname “functions in a similar manner to gaze in face to face interaction” [Bay98]. Werry [Wer96] calls this phenomenon “addressivity”, comparing it with Sacks’s concept of “speaker select”. Herring [Her99] calls it “cross-turn reference”. This practice prevents discourse confusion because it avoids “ambiguity and discontinuity in structures of exchange or turn-taking” [Wer96]. Multiple direct addressing is possible by separating addresses with different characters. Indirect addressing is also found in IRC. It is where users “refer to those who are either not present, or those who are present but do not participate in the interaction directly” [Has09].
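The common form of direct addressing, a nickname followed by a colon and a space at the start of the message, can be recognized with a few lines of code. The sketch below is illustrative only: the function `addressed_nicks` is a hypothetical name, and the set of separators accepted between multiple addressees (comma, semicolon, ampersand, slash, whitespace) is an assumption, since the text only states that different characters are used.

```python
import re


def addressed_nicks(text, known_nicks):
    """Return the known nicknames that are directly addressed at the start.

    The part before the first ": " is only accepted as an address list if
    every element is a known nickname (compared case-insensitively).
    """
    head, separator, _ = text.partition(": ")
    if not separator:
        return []
    lookup = {nick.lower(): nick for nick in known_nicks}
    candidates = [c for c in re.split(r"[,;&/\s]+", head) if c]
    if candidates and all(c.lower() in lookup for c in candidates):
        return [lookup[c.lower()] for c in candidates]
    return []


channel = ["ALEX", "Ttech", "Nazca"]
print(addressed_nicks("Ttech: hi", channel))
print(addressed_nicks("Ttech, nazca: hello", channel))
print(addressed_nicks("note: this is not an address", channel))
```

Messages that mention a nickname elsewhere (e.g., “hi Ttech”) are deliberately not covered by this sketch; they require the word-by-word detection discussed next.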

In contrast to one-to-one communication, in many-to-many communication it is not always clear who is chatting with whom. For automatic analysis, software tries to identify written words in discourse as (variants of) nicknames, but various complications in the detection of nicknames can occur. Possible results of nickname detection are shown in Table 7.4.

Table 7.4: Detection of nicknames

                                           Software identifies word in discourse ...
Communicator writes word in discourse ...  as a (variant of a) nick    NOT as a (variant of a) nick
as a (variant of a) nick                   right                       wrong1
NOT as a (variant of a) nick               wrong2                      right

Class “wrong1” occurs when the software does not identify the written (variants of) nicks in the discourse. This can happen through orthographic errors due to typing (e.g., “kenndy” instead of <kennyd>), by shortening or omitting parts of nicks (“sanity”, <sanity476>), or by playing with the original nickname (“El3M3NT”, <element>). In class “wrong2”, the software identifies words as nicks that are not nicks. The wrongly identified nicks are spelled exactly like common words (“i am eighteenth”, <eighteenth>), abbreviations (“i’m not prof in english”, <Prof>), acronyms (“Lol Sara_705 me 2”, <LOL>), or emoticons (“fine xD”, <XD>).
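The three causes of class “wrong1” suggest three simple countermeasures: undoing leetspeak, accepting prefixes, and tolerating small typing errors. The following sketch combines them using Python's standard `difflib`; it is not the detection algorithm of this thesis. The leetspeak table, the thresholds, and the function names `normalize` and `detect_nick` are hypothetical choices made for illustration.

```python
import difflib

# Assumed leetspeak substitutions: 0->o, 1->i, 3->e, 4->a, 5->s, 7->t.
LEET = str.maketrans("013457", "oieast")


def normalize(word):
    """Lowercase a word and undo the assumed leetspeak substitutions."""
    return word.lower().translate(LEET)


def detect_nick(word, nicks, threshold=0.8, min_prefix=4):
    """Return the best matching nickname for a written word, or None."""
    written = normalize(word.strip(".,:;!?"))
    best, best_score = None, 0.0
    for nick in nicks:
        target = normalize(nick)
        if written == target:
            score = 1.0                                   # exact or leetspeak
        elif len(written) >= min_prefix and target.startswith(written):
            score = 0.9                                   # shortened variant
        else:                                             # typing error
            score = difflib.SequenceMatcher(None, written, target).ratio()
        if score > best_score:
            best, best_score = nick, score
    return best if best_score >= threshold else None


nicks = ["kennyd", "sanity476", "element"]
for word in ["kenndy", "sanity", "El3M3NT", "hello"]:
    print(word, "->", detect_nick(word, nicks))
```

The sketch reduces class “wrong1” only. Words of class “wrong2” still match: with a user <XD> in the channel, the emoticon “xD” is reported as a nickname, so an additional filter (e.g., a list of common words and emoticons) would be needed.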

Mapping of written nicknames to communicators’ identifiers

The correct mapping between written nicknames in discourse and related communicators and their used nicknames is not easy; not even with the assumed knowledge that the written word is definitely a nickname of a current logged-in user (or someone who just left/quit). Some problems of mapping possibilities are shown in Table 7.5.

Table 7.5: Mapping of nicknames: Assignment possibilities

Possibility/-ies  Explanation

0                 No mapping is possible, due to a totally wrongly written nick or an extremely shortened variant.

1                 Usually, the best possible case. The mapping can be wrong if, for example, the written nick accidentally matches another nick (e.g., “guest2” instead of “guest1”).

2 or more         More mappings are possible, e.g., “guest” may refer to <Guest3> or <Guest_9>.
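The three rows of Table 7.5 correspond to the number of candidates that a lookup returns. A minimal sketch, assuming case-insensitive comparison and prefix matching as the only variant rule (the function name `map_written_nick` is hypothetical):

```python
def map_written_nick(written, current_nicks):
    """Return all nicknames a written form may refer to (0, 1, or more)."""
    form = written.lower()
    if not form:
        return []
    # An exact match wins, although it may still be the wrong person
    # (e.g., "guest2" typed instead of "guest1").
    exact = [nick for nick in current_nicks if nick.lower() == form]
    if exact:
        return exact
    # Otherwise every nickname that starts with the written form is a candidate.
    return [nick for nick in current_nicks if nick.lower().startswith(form)]


online = ["Guest3", "Guest_9", "guest1", "guest2", "ALEX"]
print(map_written_nick("guest_", online))   # one candidate
print(map_written_nick("guest", online))    # several candidates
print(map_written_nick("xyzzy", online))    # no candidate
```

With several candidates, further evidence such as recent activity or earlier exchanges between the communicators is needed to pick one; the sketch leaves that decision open.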


Conversation threading

Smith et al. [Smi+00] identify the following five core problems with text chat: (1) lack of links between users and what they say, (2) no visibility of listening-in-progress, (3) lack of visibility of turns-in-progress, (4) lack of control over turn positioning, and (5) lack of useful recordings and social context. Each message is “displayed in the chronological order in which it is received by the IRC system” [Her96]. It is impossible to overlap and interrupt turns [Her96; Nas05; Riv02b]. A turn is a “message followed by a carriage return” [Nas05]. A number of conversations (i.e., multiple threads) run simultaneously (i.e., in parallel). Hastrdlová [Has09] adds that “users often participate in more than one stream of conversation at a time”. The longer the discourse, the more adjacency pairs such as question-answer or greeting-greeting are intertwined. Discourses often look quite chaotic and can be difficult to follow (e.g., co-text loss) [Cha00; Her99; Hol+09; Mar06; OM03; Shi+06]. Furthermore, it is often difficult to determine the beginning and end of a single thread. In this work, a thread is defined as the exchange of at least one message between two specific communicators. A thread is “a chain of interrelated messages” [RS97]. The following formula identifies the number of communication paths (i.e., the maximum number of threads):

$\text{maximum number of threads} = \frac{n \cdot (n-1)}{2}$,

where $n$ is the number of communicators.
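As a short worked check of this formula, the possible threads can also be enumerated as unordered pairs of communicators. The function name `max_threads` is hypothetical; the three nicknames are those of Table 7.1.

```python
from itertools import combinations


def max_threads(n):
    """Maximum number of threads among n communicators: n * (n - 1) / 2."""
    return n * (n - 1) // 2


communicators = ["ALEX", "Ttech", "Nazca"]
pairs = list(combinations(communicators, 2))   # every unordered pair once
print(pairs)
print(len(pairs) == max_threads(len(communicators)))
```

For three communicators this gives three possible threads; for a channel with 10 communicators it already gives 45.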

7.2.7 Topic

IRC includes a huge diversity of topics. Discourses usually consist of many topics and subtopics. Simpson [Sim05] notes that subtopics “emerge and fade”. However, (sub)topics and their boundaries themselves are difficult to define clearly and identify [Sim05]. Users can join several channels to discuss certain topics. The LIST command is used to search for a specific topic. It returns a list of public channels with their topic and number of users.

After entering an IRC channel, a software client such as mIRC displays the channel topic. Channel names and their topics give a rough first impression of the subjects talked about on these channels [Sch99]. Users should check this topic to see if there are any restrictions [Jon97]. Although Bechar-Israeli [BI95] notes that “topics of conversation on most IRC channels are similar, regardless of the channel’s name” (except technological channels), Bengel et al. [Ben+04] say that a channel’s topics “vary based on its participants’ current interests”. A set of messages grouped according to specific topics is called a topic thread. It is important to detect topic boundaries within discourse for semantic analysis. Moens and De Busser [MDB01] note that “[o]nce the segments are found, they can be described by key terms”. Phrases such as “by the way”, “and another thing”, or “let’s talk about” can be used to indicate a new unrelated topic [OM03]; preclosings such as “well”, “so ...”, or “ok” [Hol08b; SS73] signal the end of topic threads. Several interwoven topics are discussed in parallel. After rearrangement of the messages in their “intended” order [Shi+06], the messages fall into seven parallel topic threads. Other ways to thread and name are certainly possible.

The 10 biggest IRC channels with their topics (cut off in the table) on June 17, 2012 are presented in Table 7.6. Mavericks and applicants are not included.


Table 7.6: IRC channels: Top 10 on 17.06.2012 (adapted from [Gel13])

Rank  Channel              Network      Users  Topic
1     #Mega_Fantasy        OpenJoke     2475   -=® |[ MEGA_FANTASY? ]| ®=-SPEED 500K
2     #CoRaZoN-GyTaNo      OpenJoke     2258   .:CoRaZoN-GyTaNo:. SIAMO IN MANUTEN
3     #GaLeoNe-dEi-PiRaTi  OpenJoke     2219   #GaLeoNe-dEi-PiRaTi FACCIAMO PARTE D
4     #ELITEWAREZ          Criten       2144   EEELiTEWAREZZZ /join #elite-chat !search/
5     #future_games        OpenJoke     1973   \\\GIOCHI/// PER LE PASS DIGITA !PASS
6     #NuCLeaR             OpenJoke     1795   [»NuCLeaR?«] ZERO LiMiT »NEWS-HD« 21
7     #THE-DOCTOR-46       OpenJoke     1789   ®THE-DOC-TOR-46?®EDICOLA: 17/06 NE
8     #NEWS                Rizon        1770   #NEWS - bring an xdcc bot for +a | http://ne
9     #A-TeAm              OpenJoke     1698   Edicola 17/06/2012 NOVITA Categoria Napo
10    #mas_de_40           IRC-Hispano  1694   ø¤◦`◦¤ø Bienvenid@s a #mas_de_40 Podéis en
∑                                       19815

7.2.8 Emotion

Different linguistic features are used in IRC to express emotions while chatting.

Emoticon (smiley)

An effective way for users to express emotion and reflect feelings on IRC is with text-based emoticons. An emoticon is a blend of the words “emotion” and “icon” [Are11].

Table 7.7: Classification of typographic emoticons (adapted from [Ama12])

Category               Subcategory                Emoticon   Explanation
emotional-attitudinal  facial expressions         :-)        happy, smile
                                                  :-(        sad
                                                  ;-)        wink
                       objects, peoples, animals  0:-)       angel
                                                  >:->       devil
                       action                     :#)        drunk
                                                  :-*        kissing
                       appearance                 :%)%       a face with acne
                                                  (8-)       bald head
                                                  (-:        left-handed
                                                  B-)        wearing glasses
                                                  :-{}       wearing lipstick
                                                  &:-)       curly hair
pictorial                                         *<<<<+     Christmas tree
                                                  ~(_8^(I)   Homer Simpson
                                                  @(*0*)@    koala
                                                  @@@@:-)    Marge Simpson
                                                  +<:-)      pope

Amaghlobeli [Ama12] classifies emoticons into three groups: typographic emoticons (formed with punctuation marks or other typographic symbols), graphic emoticons (often animated, represent images in GIF, i.e., graphics interchange format; sometimes automatically converted by clients), and verbal emoticons (verbally represent graphic or typographic ones, e.g., “Happy Smiley”). She adds that typographic emoticons can be placed into two major categories (see Table 7.7). These are: first, emotional-attitudinal emoticons that provide emotional information and represent a) facial expressions, b) objects, peoples, animals, c) action, and d) appearance; and second, pictorial emoticons, which are simple pictures made with keyboard characters that do not convey non-verbal information. Ruan [Rua11] emphasizes that “[e]ven the tones of one’s remarks can be expressed though emoticons” (e.g., “:-]” or “:->” mean sarcastic). The term “smiley” is often used as a hypernym for any emoticon, coming from the most popular one. It is easier and faster to use, for example, the three-character smiley “:-)” (tilted 90° to the left) than to write “this makes me happy”. Schulze [Sch99] analyzed an English IRC corpus and found that the type smiley (with its subtypes, e.g., “:)”, “:-)”) and the type frowney (including, e.g., “:(”, “:-(”, “:((”) are the most-used emoticons. Baron [Bar03a] notes that “new emoticons can arise at any time, especially among restricted groups of users”. The Eastern style is different from the Western one. Japanese smileys (called “kaomoji” or “face mark” [KY07; Lee16]) pay special attention to the eyes (see Table 7.8). They are not tilted sideways.

Table 7.8: Japanese basic text emoticons (kaomoji) (adapted from [Kav15; Oku05; Pta+11])

Kaomoji   Explanation
(^_^)     to be happy
m(._.)m   to bow
(^o^)     laughing face
\(^o^)/   raising both hands and saying ’banzai’ (yippee!)

Emote

The term “emote” means “to express emotion in a very dramatic or obvious way” [Mer17b]. An action command (“/action”, “/describe”, or “/me”) “leads to third-person-utterances about oneself” [Hen98]. These mIRC commands are used to indicate emotions, feelings, or to show actions by using words. The DESCRIBE command is the same as the ACTION or ME command, except for an optional channel parameter. Another way to express emotion is to surround text by asterisks (see Log Example 3).

Log Example 3

Line  Type              Input                        Output
1     ACTION command    /action just smiles          * Dozer just smiles
2     DESCRIBE command  /describe #room just smiles  * Dozer just smiles
3     ME command        /me just smiles              * Dozer just smiles
4     Asterisks         *just smiles*                <Dozer> *just smiles*

Linguistic shortenings

Acronyms are used to reduce the amount of typing and speed up the response time. They also can express emotions. Like emoticons, the list of acronyms seems to be endless. For example, the alphabet-soup LOL, OMG, ILU, and WTH mean “laughing out loud”, “oh my God”, “I love you”, and “what the hell”.


Punctuation

Punctuation marks are used most at the end of a message [Cry04]. Although often omitted, they are also used excessively to convey emotions, “[p]erhaps one reason ... is a result of the competition for attention in chat rooms” [Bla04]. In general, the more punctuation marks are used, the stronger the emotions that are expressed (“than???”, “to late now!!!”). The emotional punctuation marks include a question mark (“how old?”), an exclamation mark (“hi italian guy!”), quotation marks (“a question: Is this the ‘English Room’ ?”), or a combination form such as an interrobang (“and for what?!”). Periods/ellipsis dots (“well ... i can help u ... tell me the truth ..”) and hyphens (“help me--- please”) are “employed to create pauses and to indicate tempo” [Wer96].

Reduplication

With the morphological process of reduplication [Hen98; Wer96], repeated characters sometimes produce long strings. For example, vowel elongation (“cooool”), consonant repetitions (“hehehe”), repetition of words (“Yes, yes yesssssssss!!!!!!”), exaggerated punctuations (“like me!!!!!!”), deep sadness (“:(((((”), or positive applause (“+++”) show emotion or emphasis, or simulate the sounds of speech [Bla04; Hen98; Sim03; Wer96].
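Repeated characters and repeated two-letter sequences can be found with two regular expressions. This is a rough sketch under assumptions of my own choosing: a run counts as reduplication from three repetitions upward, and the names `REPEATED_CHAR`, `REPEATED_PAIR`, and `reduplications` are hypothetical. Repetition of whole words is not covered.

```python
import re

# The same character three or more times in a row, e.g. "oooo", "!!!", "(((".
REPEATED_CHAR = re.compile(r"(.)\1{2,}")
# A pair of two different word characters three or more times, e.g. "hehehe".
REPEATED_PAIR = re.compile(r"((\w)(?!\2)\w)\1{2,}")


def reduplications(text):
    """Return the repeated runs found in a message."""
    found = [match.group(0) for match in REPEATED_CHAR.finditer(text)]
    found += [match.group(0) for match in REPEATED_PAIR.finditer(text)]
    return found


for message in ["cooool", "hehehe", "like me!!!!!!", ":(((((", "hello"]:
    print(repr(message), reduplications(message))
```

The length of a run could serve as a simple indicator of emphasis, in line with the observation that more punctuation marks express stronger emotions; ordinary strings such as “www” also match and would need to be filtered separately.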

Other ways to highlight or emphasize

Further ways to emphasize messages also exist. A message written entirely in capital letters is considered rude, and should be avoided [Cry04] because it sounds harsh, like shouting (“U WILL BURN IN THE HELLS”) [Kro94]. Emphasis on certain words or phrases can be made with different markers such as capital letters (“dont look THERE”), letter spacing (“h e l l o”), asterisks (“throughput *can* be better”), slashes (“lolcat isn’t an /actual/ bot”), and underscores (“that is what _she_ said”). Many IRC clients such as mIRC support the viewing and sending of formatted messages [Mut04a]. mIRC interprets control codes for normal, bold, italic, underlined, reverse text, and foreground or background color of a word [MB10] by using key combinations to insert control codes (e.g., Control-B for bold). Only the part of the message “that is enclosed by the start and end codes will be affected” [MB10]. Unfortunately, those control codes are more often abused than used effectively. This overuse usually results in a ban.
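Before a text analysis, such formatting is usually removed. The sketch below shows one possible way to do this; it assumes the control characters commonly used by mIRC (0x02 bold, 0x03 color with optional foreground and background numbers, 0x0F normal, 0x16 reverse, 0x1D italic, 0x1F underline). The function name is hypothetical.

```python
import re

# Assumed mIRC control characters:
#   \x03 followed by optional "fg" or "fg,bg" color numbers (one or two digits each)
#   \x02 bold, \x0f normal, \x16 reverse, \x1d italic, \x1f underline
CONTROL_CODES = re.compile(
    r"\x03(?:\d{1,2}(?:,\d{1,2})?)?"
    r"|[\x02\x0f\x16\x1d\x1f]"
)


def strip_formatting(message: str) -> str:
    """Remove formatting control codes and keep only the visible text."""
    return CONTROL_CODES.sub("", message)
```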

7.3 Chapter summary

This chapter applied the multiple-views analysis approach, as described in Chapter 6, to IRC. Logged IRC discourses are analyzed in the next chapter.


CHAPTER 8 Multiple-views analysis approach applied to IRC discourses

This chapter uses qualitative and quantitative (i.e., mixed) methodologies for analyzing IRC discourses. Messages from public IRC channels were logged and analyzed post-mortem. In contrast to online analysis, missing (i.e., not stored) information is difficult, if at all possible, to reconstruct. Asking follow-up questions or performing IRC commands to get more information is usually no longer possible. Discourse extracts are explained and mainly visualized in this thesis with the help of the IRC client mIRC (see Subsection 8.6.2). The line numbers next to the log extracts are inserted for ease of reference. Additionally, it is common to enclose the user’s nickname in “<” and “>”. The underlined line numbers in the log examples indicate the borders of the pasted discourse fragments within the examples.

8.1 Datasets in this thesis

Dataset 2, which is used in this chapter, is described in the next section, following an overview of all the datasets in this thesis. The mixed discourse analysis in this thesis is based on two main datasets (Datasets 1 and 2; numbered in chronological order) and four subsets (Datasets 1a, 1b, 2a, and 2b). These datasets were collected from various IRC networks and channels. Table 8.1 presents them together with the chapters or subsections in which they are analyzed qualitatively (Qual.) and quantitatively (Quant.). Whispering via private message between other users was not logged. The main language of all the channels logged was English.

Table 8.1: Overview of datasets

                                                  Logging date            Analysis       Chapter(s) or
No.  Network(s)  Channel(s)  Messages  Nicknames  From        To          Qual.  Quant.  subsection
1    7           13          2403777   150278     27.06.2008  28.07.2008  ✓      ✗       9
1a   7           13          7936                 01.07.2008  01.07.2008  ✗      ✓       9
1b   1           2           8937                 01.07.2008  01.07.2008  ✗      ✓       9, 10
2    6           11          288507    23136      11.01.2012  15.01.2012  ✓      ✓       7, 8
2a   1           1           5605      456        13.01.2012  13.01.2012  ✓      ✓       11–13
2b   1           1           6382                 11.01.2012  15.01.2012  ✓      ✓       8.4.10

A detailed overview of Dataset 1 is given in Table 9.3, Dataset 2 in Table 8.3.

8.2 Information on the general steps

In this chapter, the multiple-views analysis approach is applied to logged IRC discourses. The goal is to find relevant attributes for the 12 defined views in IRC discourses (see Section 6.5). The steps below are followed to collect and extract data for this chapter.

8.2.1 Data collection

In a manner similar to the one proposed in Section 9.1.1 (the logging of Dataset 1 was done before), several channels have been selected to cover categories of frequently searched chat terms. Some new channels, that is, #music (instead of #eminem), #perl6 (#perl), and #irchelp (#chat-world, #talk), are logged for Dataset 2. The channel #church is not used because its traffic is too low for serious analysis. In summary, six different IRC networks and eleven public IRC channels are classified into seven different categories (see Table 8.2).

Table 8.2: Selected IRC networks, servers, and channels per category

Category                      Network    Server                 Channel
cars                          GameSurge  irc.gamesurge.net      #cars
conversation                  freenode   irc.freenode.net       #defocus, #freenode
                              SwiftIRC   irc.swiftirc.net       #irchelp
countries, languages, cities  IrCQ-Net   irc.icq.com            #English
                              QuakeNet   irc.quakenet.org       #england
games, sports                 EFnet      irc.servercentral.net  #soccer
love, relationship            IrCQ-Net   irc.icq.com            #Romance
music                         freenode   irc.freenode.net       #music
technology, Internet          freenode   irc.freenode.net       ##hardware, #perl6

Table 8.3 shows user messages (written by users to communicate with others), system messages (reported by IRC servers, e.g., nickname changes, user joins), and public messages (all logged messages, i.e., the sum of system and user messages) with absolute (Abs.) and relative (Rel.) frequencies. Similar to Dataset 1, about one third of all public messages in Dataset 2 are system messages (see Chapter 9).

Table 8.3: Dataset 2

            User messages    System messages  Public messages
Channel     Abs.    Rel.     Abs.    Rel.     Abs.    Rel.
#cars       29      21.32%   107     78.68%   136     0.05%
#defocus    22881   71.39%   9168    28.61%   32049   11.11%
#england    2696    72.79%   1008    27.21%   3704    1.28%
#English    65303   64.08%   36598   35.92%   101901  35.32%
#freenode   12229   48.40%   13038   51.60%   25267   8.76%
##hardware  21245   78.57%   5794    21.43%   27039   9.37%
#irchelp    6382    45.40%   7675    54.60%   14057   4.87%
#music      525     36.66%   907     63.34%   1432    0.50%
#perl6      5498    79.49%   1419    20.51%   6917    2.40%
#Romance    44564   61.69%   27677   38.31%   72241   25.04%
#soccer     2825    75.05%   939     24.95%   3764    1.30%
∑           184177  63.84%   104330  36.16%   288507  100.00%


All discourses in the selected channels were logged from January 11, 2012 to January 15, 2012 with the Java program LogBot. Web-based logs in XHTML (extensible hypertext markup language) format were created. LogBot is an IRC bot that uses the API (application programming interface) of the Java framework PircBot (http://www.jibble.org/pircbot.php). LogBot connects to an IRC server and creates public logs for IRC channels. Each chat log from a channel was stored in a single file per day. Additionally, compared to Section 9.1.1, the Java source code of LogBot was adapted to log the timestamp in a more detailed way. The standard log time format was changed from H:mm to HH:mm:ss:SSS. After logging, the text-based IRC discourses are prepared for analysis. One original log message is shown in Log Example 4 (line 1). Messages are timestamped using the following format (05:14:01:794): hours (05), minutes (14), seconds (01), and milliseconds (794).

Log Example 4
1  <span class="irc-date">[05:14:01:794]</span> <span class="irc-brick">* GTRsdk 6</span><br />
2  [05:14:01:794] * GTRsdk 6

8.2.2 Data extraction

To aid analysis, HTML span tags (e.g., “</span><br />”) were removed (Log Example 4, line 2). Also, several HTML entities were converted into special characters. For example, “&lt;”, “&gt;”, and “&amp;” became “<” (less than), “>” (greater than), and “&” (ampersand).
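The extraction step can be illustrated with a short Python sketch. It is not the code used for this thesis (LogBot itself is written in Java); the function names are hypothetical, and the sketch assumes that every log line has the shape shown in Log Example 4, with a timestamp in the format HH:mm:ss:SSS. Tags are removed before entities are converted, so that a nickname such as “&lt;Dozer&gt;” is not mistaken for a tag.

```python
import html
import re
from datetime import datetime

TAG = re.compile(r"<[^>]+>")
TIMESTAMP = re.compile(r"^\[(\d{2}:\d{2}:\d{2}:\d{3})\]\s*(.*)$")


def clean_log_line(raw: str) -> str:
    """Remove HTML tags first, then convert entities such as &lt; into characters."""
    without_tags = TAG.sub("", raw)
    return html.unescape(without_tags).strip()


def split_timestamp(line: str):
    """Split a cleaned line into (time, rest); time is None if no timestamp is found."""
    match = TIMESTAMP.match(line)
    if match is None:
        return None, line
    # %f accepts the three millisecond digits and stores them as microseconds.
    stamp = datetime.strptime(match.group(1), "%H:%M:%S:%f").time()
    return stamp, match.group(2)
```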

8.3 Ethical considerations

The author shares Liu’s [Liu99] opinion that “[c]onversations in publicly accessible IRC channels are public acts deliberately intended for public consumption”. Therefore, it is acceptable to use the data without users’ permission or consent [Bla04]. Nevertheless, private information about the participants, such as real names, email addresses, and IP addresses, is only published in this thesis with the permission of the users. Most of the presented log messages are kept as they are.

8.4 Views

IRC discourses are analyzed according to the defined views in Section 6.5. All the defined views are included in the analyzed discourses.

8.4.1 Message

In summary, PRIVMSG is the most frequently used command. Table 8.4 shows that around 64% of all 288507 logged messages are produced with the PRIVMSG command. JOIN and QUIT messages also make up a big share (together 28.74%).


Table 8.4: IRC commands used

Channel     JOIN   KICK  MODE  NICK  NOTICE  PART   PRIVMSG with ACTION  PRIVMSG without ACTION  QUIT   TOPIC  ∑
#cars       41     0     14    10    9       11     0                    20                      31     0      136
#defocus    3201   2     2230  532   5       185    1162                 21714                   3018   0      32049
#england    331    14    250   92    0       16     21                   2675                    303    2      3704
#English    17300  119   1115  760   3       4846   529                  64771                   12458  0      101901
#freenode   6018   0     159   852   0       968    135                  12094                   5037   4      25267
##hardware  2759   0     30    229   0       158    173                  21072                   2618   0      27039
#irchelp    2896   53    190   1681  0       774    0                    6382                    2081   0      14057
#music      436    0     0     49    0       28     22                   503                     394    0      1432
#perl6      688    0     0     47    0       30     142                  5356                    654    0      6917
#Romance    13014  68    1052  545   3       4163   2527                 42034                   8835   0      72241
#soccer     408    0     84    41    0       7      18                   2807                    399    0      3764
∑           47092  256   5124  4838  20      11186  4729                 179428                  35828  6      288507
%           16.32  0.09  1.78  1.68  0.01    3.88   1.64                 62.19                   12.42  0.00   100.00

91.31% of all logged IRC messages are less than 85 characters long. The minimum message length without a timestamp within the logs is five characters, the maximum is 915, and the average is 50.54. Almost all characters (99.97%) are represented by 7-bit ASCII. The top 10 most frequently written characters and written messages with the PRIVMSG command are shown in Tables 8.5 and 8.6. Letter (and word) frequency analysis can be used as a rudimentary technique for language identification.

Table 8.5: Top 10 characters

Rank  Character  Abs.      Rel.
1     space      1602978   10.99%
2     e          945887    6.49%
3     o          714309    4.90%
4     a          694032    4.76%
5     n          654897    4.49%
6     i          631285    4.33%
7     t          618353    4.24%
8     s          550720    3.78%
9     r          421237    2.89%
10    l          392715    2.69%
∑ (top 10)       7226413   49.56%
∑ (all)          14581066  100.00%

Table 8.6: Top 10 messages

Rank  Message  Abs.    Rel.
1     lol      3925    1.36%
2     hi       1555    0.54%
3     :)       932     0.32%
4     :D       799     0.28%
5     hello    571     0.20%
6     yes      536     0.19%
7     :P       478     0.17%
8     ok       467     0.16%
9     ?        458     0.16%
10    haha     402     0.14%
∑ (top 10)     10123   3.51%
∑ (all)        288507  100.00%
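Rankings such as those in Tables 8.5 and 8.6 can be computed with a simple frequency count. The following sketch is only an illustration under stated assumptions: the function names are hypothetical, and the relative frequency is taken over the items passed in, whereas Table 8.6 relates the counts to all 288507 logged messages.

```python
from collections import Counter


def top_n(items, n=10):
    """Return (item, absolute count, relative frequency in %) for the n most common items."""
    counts = Counter(items)
    total = sum(counts.values())
    return [(item, count, 100.0 * count / total)
            for item, count in counts.most_common(n)]


def top_characters(messages, n=10):
    """Rank single characters (including spaces) over all message texts."""
    return top_n((char for message in messages for char in message), n)


def top_messages(messages, n=10):
    """Rank complete message texts such as 'lol' or 'hi'."""
    return top_n(messages, n)
```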


Message characteristics

In Log Example 5, typical features of IRC discourses are shown. These are, for example, short messages (line 1), code switching and code mixing (line 2), shortened forms (e.g., BBL means “be back later”, line 3), letter/number homophones (e.g., l8ter for “later”, line 4), phonetic spellings (line 5), leetspeak (e.g., a noob or n00b is a newbie, line 6), emoticons (e.g., smiley without nose in line 7), letter reduplications for emphasis (line 8), expressed gestures and actions (line 9; lines 10 to 12 show a negative and positive reaction to a comment), self-referencing (line 13), ASCII art (e.g., a train, see lines 14 to 19), words/phrases written in either capitals only (line 20) or all lower-case (line 21), missing punctuation (line 22), and spelling and self-correction (repair) (lines 23 and 24). Only 1762 messages (0.01%) include web addresses.

Log Example 5
1   <denki> Try with ice this day
2   <Charlie> yeah russian deutsch French
3   * enchilado (~enchilado@defocus/yummy/enchilado) Quit (Quit: BBL->)
4   <haiyai> l8ter
5   <Ttech> mmm tasty
6   <KinG`PiN> Which makes me look n00ber than a n00b
7   <jnthn> ah :)
8   <ThankYou> pretty pleaseeeeeee
9   <Khorgath> heyas Stealth *hugs*
10  * orbii chews on bacon & egg on lepinja
11  <KindOne> bacon--
12  <D[_]> BACON ++++++++
13  <Tigs> <----- confused
14  <hmmm`> --------.-.-.-.-o-o-o-o-o
15  <hmmm`> ---------------_____------o-------
16  <hmmm`> ------____====--]OO|_n_n__][.-----
17  <hmmm`> -----[________]_|__|________)<----
18  <hmmm`> ------oo----oo--'oo-OOOO-|-oo\\_---
19  <hmmm`> +--+--+--+--+- ^_BorinG_^ -+--+--+--+--+
20  <girly-girl> IAM FROM MOROCCO
21  <orbii> (windows + nix works well for me)
22  <handsome_210> sarah are you sarah from yesterday this is hichalm
23  <janosch> i never found the “del” chars on the keyboard :P
24  <janosch> -chars+key

Communication between IRC client and server

IRC is “a dynamic form of communication” [Rhe00]. As chat messages are added, text lines continuously scroll up. The mIRC command “/DEBUG -ntp @window” “[o]utputs raw server messages, both incoming and outgoing” [Cha00] to the custom window “@window” with timestamps. The extract is shortened (e.g., lines of the “Message of the Day” between the lines 49 and 50 are excluded), timestamps are not displayed, and some log lines are cut off (because they are too long). Messages in raw format usually contain the most additional information such as server or channel names. The outputs are presented in different windows of mIRC, which comprise the debug window (D), the status window (S), and the chat window (C). The arrows indicate the flow direction of the messages: from client to server (“->”) or from server to client (“<-”). More details about extension specifications to RFC 1459 are described in the IRCv3 specification [IRC17]. A list of different replies is supplied in Butcher [But05] and Merlin [Mer12]. Table 8.7 illustrates, with the help of the DEBUG command, the communication between client and server.

Table 8.7: Message flow between client and server with client output (windows: D = debug, S = status, C = chat)

Line  Log
1     * Connecting to chat.freenode.net (6665)
2     -> chat.freenode.net CAP LS
3     -> chat.freenode.net NICK RobiX
4     -> chat.freenode.net USER robix 0 * :Robert
5     <- :leguin.freenode.net NOTICE * :*** Looking up your hostname...
6     -leguin.freenode.net- *** Looking up your hostname...
7     <- :leguin.freenode.net NOTICE * :*** Checking Ident
8     -leguin.freenode.net- *** Checking Ident
9     <- :leguin.freenode.net NOTICE * :*** Found your hostname
10    -leguin.freenode.net- *** Found your hostname
11    <- :leguin.freenode.net NOTICE * :*** No Ident response
12    -leguin.freenode.net- *** No Ident response
13    <- :leguin.freenode.net CAP * LS :account-notify extended-join identify-msg multi-p
14    -> chat.freenode.net CAP REQ :multi-prefix
15    <- :leguin.freenode.net CAP RobiX ACK :multi-prefix
16    -> chat.freenode.net CAP END
17    <- :leguin.freenode.net 001 RobiX :Welcome to the freenode Internet Relay Chat Net
18    Welcome to the freenode Internet Relay Chat Network RobiX
19    -> leguin.freenode.net USERHOST RobiX
20    <- :leguin.freenode.net 002 RobiX :Your host is leguin.freenode.net[130.239.18.172/66
21    Your host is leguin.freenode.net[130.239.18.172/6665], running version ircd-seven-1.1
22    <- :leguin.freenode.net 003 RobiX :This server was created Sun Dec 4 2011 at 14:42:30
23    This server was created Sun Dec 4 2011 at 14:42:30 CET
24    <- :leguin.freenode.net 004 RobiX leguin.freenode.net ircd-seven-1.1.3 DOQRSZaghi
25    leguin.freenode.net ircd-seven-1.1.3 DOQRSZaghilopswz CFILMPQbcefgijklmnopq
26    <- :leguin.freenode.net 005 RobiX CHANTYPES=# EXCEPTS INVEX CHANMODES
27    CHANTYPES=# EXCEPTS INVEX CHANMODES=eIbq,k,flj,CFLMPQcgimnprstz C
28    <- :leguin.freenode.net 005 RobiX CASEMAPPING=rfc1459 CHARSET=ascii NICKL
29    CASEMAPPING=rfc1459 CHARSET=ascii NICKLEN=16 CHANNELLEN=50 TOPI
30    <- :leguin.freenode.net 005 RobiX EXTBAN=$,arx WHOX CLIENTVER=3.0 SAFELI
31    EXTBAN=$,arx WHOX CLIENTVER=3.0 SAFELIST ELIST=CTU are supported by t
32    <- :leguin.freenode.net 251 RobiX :There are 235 users and 76849 invisible on 34 serve
33    There are 235 users and 76849 invisible on 34 servers
34    <- :leguin.freenode.net 252 RobiX 48 :IRC Operators online
35    48 IRC Operators online
36    <- :leguin.freenode.net 253 RobiX 6 :unknown connection(s)
37    6 unknown connection(s)
38    <- :leguin.freenode.net 254 RobiX 39045 :channels formed
39    39045 channels formed
40    <- :leguin.freenode.net 255 RobiX :I have 4532 clients and 1 servers
41    I have 4532 clients and 1 servers
42    <- :leguin.freenode.net 265 RobiX 4532 7692 :Current local users 4532, max 7692
43    Current local users: 4532 Max: 7692
44    <- :leguin.freenode.net 266 RobiX 77084 83501 :Current global users 77084, max 83501
45    Current global users: 77084 Max: 83501
46    <- :leguin.freenode.net 250 RobiX :Highest connection count: 7693 (7692 clients) (3910
47    Highest connection count: 7693 (7692 clients) (391019 connections received)
48    <- :leguin.freenode.net 375 RobiX :- leguin.freenode.net Message of the Day -
49    Message of the Day, leguin.freenode.net
50    <- :leguin.freenode.net 376 RobiX :End of /MOTD command.
51    End of /MOTD command.
52    <- :RobiX MODE RobiX :+i
53    * RobiX sets mode: +i
54    <- :NickServ!NickServ@services. NOTICE RobiX :This nickname is registered. Please
55    -NickServ- This nickname is registered. Please choose a different nickname, or identi
56    <- :leguin.freenode.net 302 RobiX :RobiX=+~[email protected]
57    Local host: PC (193.171.37.63)
58    -> leguin.freenode.net JOIN #freenode
59    <- :RobiX!~[email protected] JOIN #freenode
60    * Now talking in #freenode
61    -> leguin.freenode.net MODE #freenode
62    <- :leguin.freenode.net 332 RobiX #freenode :Welcome to #freenode | Staff are voiced
63    * Topic is ’Welcome to #freenode | Staff are voiced, some may also be on /stats p -- fe
64    <- :leguin.freenode.net 333 RobiX #freenode dax 1343286048
65    * Set by dax on Thu Jul 26 09:00:48
66    <- :leguin.freenode.net 353 RobiX = #freenode :RobiX kish kgs1992 Trenak Javacat Th
67    <- :leguin.freenode.net 353 RobiX = #freenode :+marienz kode54 iosctr ‘eric Cruelty +
68    <- :leguin.freenode.net 366 RobiX #freenode :End of /NAMES list.
69    <- :leguin.freenode.net 324 RobiX #freenode +CLPcntjf 5:10 #freenode-unreg
70    <- :leguin.freenode.net 329 RobiX #freenode 981760584
71    <- :ChanServ!ChanServ@services. NOTICE RobiX :[#freenode] Welcome to #freenode
72    -ChanServ- [#freenode] Welcome to #freenode. All network staff are voiced in here, b
73    <- :services. 328 RobiX #freenode :http://freenode.net/
74    <- :hellekin!~hellekin@lorea/faerie PRIVMSG #freenode :erry: is there a way for me t
75    <hellekin> erry: is there a way for me to display the current list of cloaks for my gro
76    <- :erry!erry@freenode/staff/erry PRIVMSG #freenode :hellekin, sure, i’ll pm it to yo
77    <+erry> hellekin, sure, i’ll pm it to you
78    <- :Stracci!~Stracci@mcbans/administration/freenode.sponsor.stracci NICK :MCBSt
79    * Stracci is now known as MCBStracci|Away
80    <- PING :leguin.freenode.net
81    -> leguin.freenode.net PONG :leguin.freenode.net
82    <- :codemaniac!~arijit@fedora/codemaniac QUIT :Quit: Leaving
83    * codemaniac (~arijit@fedora/codemaniac) Quit (Quit: Leaving.)
84    <- :opieng!~[email protected] JOIN #freenode
85    * opieng (~[email protected]) has joined #freenode

In Table 8.7, the client connects to the IRC network freenode through the random or round-robin server address chat.freenode.net on port 6665 (line 1). The CAP LS command (line 2) requests the list of all client capabilities supported by the server, which is returned in line 13 [ML05]. The NICK command sets the nickname (line 3), and the USER command sets the login (line 4). The connection is successful and the server responds (lines 5 to 12). No ident daemon [SJ93], which responds to requests on port 113, is found. The CAP REQ command requests the multi-prefix capability (line 14). In line 15, the request is acknowledged. The CAP END command (line 16) “signals to the server that capability negotiation is complete and requests that the server continue with client registration. If the client is already registered, this command MUST be ignored by the server” [ML05]. In line 17, a welcome message is sent after client registration. The raw number 001 is named RPL_WELCOME. After the welcome message, the client sends the USERHOST command (line 19), which is answered in line 56. The server then reports its name with IP address and port and the version of the IRC server (RPL_YOURHOST, line 20), the date and time when the server was created or last restarted (RPL_CREATED, line 22), and the name and version of the connected server with the various user/channel modes the server supports (RPL_MYINFO, line 24).
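The raw lines in the debug window follow the message format of RFC 1459: an optional prefix introduced by a colon, a command or three-digit numeric, and parameters, of which the last may be a “trailing” parameter introduced by “ :”. The sketch below shows how such a line could be split. It is a simplified illustration with hypothetical names (IrcMessage, parse_raw, pong_for), it expects the raw line without the arrow and server name that mIRC adds in the debug window, and it does not handle IRCv3 message tags.

```python
from typing import List, NamedTuple, Optional


class IrcMessage(NamedTuple):
    prefix: Optional[str]  # server name or nick!user@host; None if absent
    command: str           # e.g. "PRIVMSG" or a numeric such as "001"
    params: List[str]      # middle parameters plus the trailing one, if any


def parse_raw(line: str) -> IrcMessage:
    """Split one raw IRC line into prefix, command, and parameters."""
    line = line.rstrip("\r\n")
    prefix = None
    if line.startswith(":"):
        prefix, _, line = line[1:].partition(" ")
    head, has_trailing, trailing = line.partition(" :")
    parts = head.split()
    if not parts:
        raise ValueError("no command in line")
    params = parts[1:]
    if has_trailing:
        params.append(trailing)
    return IrcMessage(prefix, parts[0], params)


def pong_for(ping: IrcMessage) -> str:
    """Build the client's reply to a server PING (compare lines 80 and 81)."""
    return "PONG :" + ping.params[-1]
```

Applied to the PING message in line 80, `pong_for(parse_raw("PING :leguin.freenode.net"))` yields the PONG text that the client sends in line 81.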

The following lines show further parameters used in the “005 numeric” (RPL_ISUPPORT). An overview of all the parameters is presented in Brocklesby [Bro04] and Roeckx [Roe09]. For example, the parameters are CHANTYPES (supported channel prefixes), CHANMODES (channel modes) (line 26), CASEMAPPING (used for comparing nicknames and channel names), NICKLEN (maximum nickname length), CHANNELLEN (maximum channel name length), TOPICLEN (maximum topic length) (line 28), and EXTBAN (extended bans) (line 30).

Several raw numerics are displayed in the messages below:

• RPL_LUSERCLIENT (raw number 251): the number of (non-)invisible users currently online for the entire network (line 32);

• RPL_LUSEROP (252): the number of IRC operators online (line 34);

• RPL_LUSERUNKNOWN (253): the number of unknown connections (line 36);

• RPL_LUSERCHANNELS (254): the number of channels currently formed (line 38);

• RPL_LUSERME (255): the number of users connected to this server and the number of other servers connected to this server (line 40);

• RPL_LOCALUSERS (265): the number of users connected to this server and the highest number of users ever connected to this server (line 42);

• RPL_GLOBALUSERS (266): the number of users connected to the entire network and the highest number of users ever connected to the entire network (line 44);

• RPL_STATSCONN (250): the highest total number of connections/client connections (line 46);

• RPL_MOTDSTART (375): start of an RPL_MOTD list (line 48); and

• RPL_ENDOFMOTD (376): termination of an RPL_MOTD list (line 50).

The MODE command with parameter “+i” makes users invisible to other “normal” users (line 52). The nick <RobiX> is registered by someone (line 54). If it is one’s own nickname, it is possible to sign in by typing “/msg nickserv identify password”. Line 56 is returned in reply to a USERHOST request (see line 19). After sending the JOIN command (line 58), the user joins the channel (line 59). Further raw numerics are displayed:

• RPL_TOPIC (332): channel name with the current topic (line 62);

• RPL_TOPICWHOTIME (333): channel name, the user who set the current channel topic, and the time at which it was set (line 64);

• RPL_NAMREPLY (353): channel with the nicknames of joined users (lines 66 and 67);

• RPL_ENDOFNAMES (366): termination of an RPL_NAMREPLY list (line 68);

• RPL_CHANNELMODEIS (324): channel name with the current modes (line 69 is the reply to line 61);

• RPL_CREATIONTIME (329): channel with the time it was created (line 70); and

• RPL_CHANNEL_URL (328): channel with the ChanServ URL (line 73).
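For automated processing, the raw numerics discussed in this subsection can be kept in a simple lookup table. The sketch below only contains the numerics named in the text; the dictionary and function names are hypothetical, and the list is not complete for any particular IRC server.

```python
# Raw numerics named in this subsection, mapped to their symbolic names.
NUMERIC_NAMES = {
    "001": "RPL_WELCOME",
    "002": "RPL_YOURHOST",
    "003": "RPL_CREATED",
    "004": "RPL_MYINFO",
    "005": "RPL_ISUPPORT",
    "250": "RPL_STATSCONN",
    "251": "RPL_LUSERCLIENT",
    "252": "RPL_LUSEROP",
    "253": "RPL_LUSERUNKNOWN",
    "254": "RPL_LUSERCHANNELS",
    "255": "RPL_LUSERME",
    "265": "RPL_LOCALUSERS",
    "266": "RPL_GLOBALUSERS",
    "324": "RPL_CHANNELMODEIS",
    "328": "RPL_CHANNEL_URL",
    "329": "RPL_CREATIONTIME",
    "332": "RPL_TOPIC",
    "333": "RPL_TOPICWHOTIME",
    "353": "RPL_NAMREPLY",
    "366": "RPL_ENDOFNAMES",
    "375": "RPL_MOTDSTART",
    "376": "RPL_ENDOFMOTD",
}


def numeric_name(command: str) -> str:
    """Return the symbolic name of a numeric; other commands are returned unchanged."""
    return NUMERIC_NAMES.get(command, command)
```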

The NOTICE command sends a private welcome message (line 71). Now the user is able to chat with other users. Two users (<hellekin> and <+erry>) send public messages to the channel (lines 74 and 76). In line 78, the user <Stracci> goes away and changes the nick to <MCBStracci|Away>. The server sends a PING message at regular intervals to test the presence of an active client (line 80). The PONG message is the client’s reply to the PING message (line 81). In line 82, the user <codemaniac> quits the channel with a quit message. The user <opieng> joins the channel in line 84.

8.4.2 Context

IRC is technically available (nearly) everywhere in the world. Communicators with various languages and cultures exchange messages in IRC. Knowledge of these differences therefore greatly enhances understanding between people. For example, a thumbs-up may mean “Congratulations!” in the United States, but in Iran it is a rude, offensive gesture (see Log Example 6, line 1) [AT09; Pea+11]. Other types of communication contexts are extracted from discourse. These are historical (line 2), physical (lines 3 and 4), psychological (line 5), and social (lines 6 and 7).


Log Example 6
1  <suzie62> thumbs up to liking pizza?
2  <mputtr> as i said
3  <FailPowah> dioz; my computer case got a really bright blue diod setup at the power button.. lights
   <FailPowah> up my room like an xmas tree on the nigvht D
4  <DWSR> I have a mid and full tower in my room.
5  <oscar^pepper> Thank you! I feel better today. :)
6  <ziggo> marie is my friend.. ( I hope )
7  <Mariettaa_> ziggo yes we are friends:-)

8.4.3 Communication barrier

Some barriers to effective communication exist in IRC. Different causes of communication barriers are extracted in Log Example 7. These are the sender’s lack of knowledge of the English language (line 1), selecting the wrong receiver by tab-completion (lines 2 and 3), unintentionally closing the IRC client’s window (lines 4 to 6), not seeing messages from ignored users (lines 7 to 13), and difficulty in following the conversation because messages quickly scroll up the screen with new lines (lines 14 to 18).

Log Example 7
1   <Mumble> Let me understand your broken english first.
2   <nonix4> adaptr: WELP, I need somebody crazy enough to run my kernel code. SnB E3-12xx req’d.
3   <nonix4> "Adie:" I meant, damn tab-completion
4   * gogo4 (~[email protected]) Quit (Quit: Leaving)
5   * gogo4 (~[email protected]) has joined #freenode
6   <gogo4> i made a mistake and closed xchat. is there any way to recover the lost dialogs/pm’s i had
    <gogo4> going on?
7   <afk4life> imp^^. tell DreamyGirl to choke on a c o ck
8   <imp^^> dreamy i dont know if u should do what he said!
9   <DreamyGirl> [imp^^] who said?
10  <DreamyGirl> o.o
11  <imp^^> scarface
12  <imp^^> afk4life = scarface
13  <DreamyGirl> [imp^^] ohh i have added him to ignore list ..so icant see
14  <buddy11> theres alot goin on here cant see everything
15  <Lexi> keep up buddy ;)
16  <buddy11> im trying to keep up
17  <Chev1965> room not moving to fast
18  <buddy11> fast enough for me i suck at this

8.4.4 Date/time

The messages of Log Example 8 are logged with LogBot. They include all timestamps. Additionally, date- and time-related information such as parts of the day or time zone can be extracted from the written messages (e.g., lines 1, 10, and 13). mIRC can also log the timestamp.


Log Example 8
1   [14:04:06:588]<tadzik> goodmorning
2   [14:04:39:188]<masak> tadzik: good afternoon :P
3   [14:12:46:892]<masak> tadzik: Sun Jan 15 14:12:32 CET 2012 :P
4   [14:14:09:651]<tadzik> I just woke up, so it’s morning
5   [14:14:42:861]<tadzik> and I don’t accept any corrections. Even the sun is still shining :)
6   [18:44:38:640]<Ron--> UriR where are you from?
7   [18:45:01:742]<UriR> israel
8   [18:45:29:697]<Ron--> isnt it like 3am there?
9   [18:45:34:500]<UriR> nop
10  [18:45:38:430]<UriR> its 19:45
11  [18:45:39:395]<Xd1358> It’s 8 pm?
12  [18:45:41:314]<Xd1358> yay
13  [18:45:44:512]<Xd1358> utc+2 ftw!

64.36% of the messages are automatically logged with LogBot within the time ranges from 00:00:00:000 to 03:59:59:999 (18.34%) and from 15:00:00:000 to 23:59:59:999 (46.02%) CET (Central European Time). This may be the best time for logging English channels. The timestamps are extracted from the log files (format [hh:mm:ss:ms]) and truncated to the hour (format [hh]). The complete log files of the IrCQ-Net network from January 15, 2012 are not included because the log bot was disconnected during the day. In summary, 13148 messages are excluded (5904 messages from #English and 7244 from #Romance). Table 8.8 shows an overview of the IRC messages produced per hour. A more detailed overview for each channel is presented in Table C.1.

Table 8.8: Detailed hourly usage (CET)

Hour  Abs.   Rel.     Hour  Abs.    Rel.
0     15172  5.51%    12    8248    3.00%
1     12960  4.71%    13    9431    3.42%
2     11000  3.99%    14    9101    3.31%
3     11375  4.13%    15    11711   4.25%
4     10504  3.81%    16    12051   4.38%
5     9996   3.63%    17    12765   4.64%
6     9375   3.40%    18    13605   4.94%
7     8431   3.06%    19    13907   5.05%
8     8102   2.94%    20    14634   5.31%
9     7889   2.86%    21    16494   5.99%
10    8370   3.04%    22    15745   5.72%
11    8679   3.15%    23    15814   5.74%
∑                           275359  100.00%
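An hourly distribution of this kind can be derived by truncating each timestamp to its hour and counting. The following sketch is an illustration only; it assumes log lines that start with a timestamp in the LogBot format [HH:mm:ss:SSS], and the function name is hypothetical.

```python
import re
from collections import Counter

# Only the hour part of the leading timestamp is captured.
HOUR = re.compile(r"^\[(\d{2}):\d{2}:\d{2}:\d{3}\]")


def hourly_usage(lines):
    """Return {hour: (absolute count, relative frequency in %)} for hours 0 to 23."""
    counts = Counter()
    for line in lines:
        match = HOUR.match(line)
        if match:  # lines without a timestamp are skipped
            counts[int(match.group(1))] += 1
    total = sum(counts.values())
    return {
        hour: (counts[hour], 100.0 * counts[hour] / total if total else 0.0)
        for hour in range(24)
    }
```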


Figure 8.1: Hourly usage (CET)

Figure 8.1 visualizes the frequencies per hour of all logged channels. The average time gap between two chat messages is around 16 seconds (i.e., a message is sent within the logged channels approximately every 16 seconds). Between 0 and 1 o’clock CET the time gap is only 10.3 seconds (Figure 8.2).

Figure 8.2: Average time gap between messages

8.4.5 Hardware and network

Log Example 9 shows some message extracts. While connecting to IRC, the server address and the name of the IRC network are displayed (line 1). In line 2, a user writes a URL, while JOIN and QUIT messages in lines 3 and 4 include the users’ IP addresses. A cloak on freenode (“unaffiliated/”) and on SwiftIRC (MD5 hashes) are shown in lines 5 and 6. In lines 7 to 9, users reveal information about their hardware.

Log Example 9
1  -kornbluth.freenode.net- *** Looking up your hostname...
2  <Boohbah> http://www.google.com/search?q=lala+teletubby
3  * sythe (~[email protected]) has joined #defocus
4  * Guest52337 (~[email protected]) Quit (Changing host)
5  * kcj (~casey@unaffiliated/kcj1993) Quit (Ping timeout: 240 seconds)
6  * SilkThong (~[email protected]) has joined #irchelp
7  <Burninate> I can playback 720p4 on my athlon XP 2.4ghz
8  <Timslin> ive installed an internel card reader
9  <tandoori> my programs are installed on a mechanical HDD

In Log Example 10, a delay of the IRC network between two clients is presented in line 1. A netsplit is detected in lines 2 and 3. The names of the two servers indicate the broken links, for example, “(*.net *.split)” or “(*.SwiftIRC.net *.SwiftIRC.net)”.


Log Example 10
1  [al3x PING reply]: 1sec
2  * al3x (~al3x@unaffiliated/al3x) Quit (*.net *.split)
3  * element ([email protected]) Quit (*.SwiftIRC.net *.SwiftIRC.net)
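A netsplit can be recognized heuristically from the quit reason, which then consists of exactly two server names (or masked names such as “*.net *.split”). The sketch below assumes mIRC-style QUIT lines as in Log Example 10; the pattern and the function name are assumptions for illustration, and the heuristic is not guaranteed to hold on every IRC network.

```python
import re

QUIT_LINE = re.compile(r"^\* (?P<nick>\S+) \((?P<host>[^)]*)\) Quit \((?P<reason>.*)\)$")
# Heuristic: exactly two space-separated tokens that each contain a dot.
NETSPLIT_REASON = re.compile(r"^\S+\.\S+ \S+\.\S+$")


def netsplit_nick(line: str):
    """Return the nickname if the QUIT line looks like a netsplit, otherwise None."""
    match = QUIT_LINE.match(line)
    if match and NETSPLIT_REASON.match(match.group("reason")):
        return match.group("nick")
    return None
```

Ordinary quit reasons such as “Ping timeout: 240 seconds” or “Quit: Leaving.” do not match, because they are not a pair of dotted names.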

8.4.6 Software

Log Example 11 presents some information about the software used. In the first two lines, IRC daemons are mentioned, followed by two lines that include the IRC channel, the user’s current operating system (line 5), and two well-known IRC clients (line 6).

Log Example 11
1  Your host is leguin.freenode.net[130.239.18.172/6665], running version ircd-seven-1.1.3
2  <KindOne> freenode runs on Atheme, so i don’t think anyone where would have a clue
3  * SparkE ([email protected]) has joined #Romance
4  <quasi^> but me and #romance has run its course. be excellent to each other
5  <Ownage> I’m running windows 7 64-bit pro. I was previously running 2008r2
6  <denki> You mirc user or xchat?

Table 8.9 shows the minimum, maximum, and average user (identifier) counts of selectedIRC channels during logging.

Table 8.9: User counts

            11.01.2012  12.01.2012  13.01.2012  14.01.2012  15.01.2012
Channel     Min.  Max.  Min.  Max.  Min.  Max.  Min.  Max.  Min.  Max.  Ø
#cars       11    13    10    15    10    14    11    13    11    12    12.27
#defocus    481   528   466   523   475   523   481   533   498   530   503.94
#england    63    77    58    76    61    73    61    76    64    76    68.17
#English    67    238   58    225   64    233   85    253   61    232   159.12
#freenode   722   792   722   782   726   783   625   790   608   668   731.38
##hardware  339   388   335   378   335   385   352   389   350   379   361.99
#irchelp    150   170   140   184   141   169   141   176   151   170   158.50
#music      44    58    48    62    53    63    56    70    58    69    57.74
#perl6      187   206   188   207   189   207   192   209   189   207   199.39
#Romance    68    171   79    170   81    164   88    169   87    156   119.28
#soccer     71    85    70    83    41    83    70    79    68    86    74.85

8.4.7 Communicator

23136 unique nicknames take part in the logged conversations. 212 users are kicked, two of them (<ENGLISCHER> and <NoDdy->) four times. 30% of all users are active and write at least one PRIVMSG command. 70% are passive contributors (i.e., lurkers) who lie in wait [Kal07]. They only leave traces produced by JOIN, KICK, MODE, NICK, NOTICE, PART, QUIT, or TOPIC commands. The chat room participant <^jenbunny^> is the most active: this nickname created 3.90% of all messages in #Romance and 0.98% of all messages overall.


In Log Example 12, users join (lines 2 and 6), leave (line 8) or quit the channels (line 3), write messages (lines 4, 5, 7, 9, and 10), get kicked out (line 12), or change nicks to indicate their current status (line 11). These actions show or change the availability of each communicator. It is necessary to update current logged-in user lists either in general or for specific channels. The first log line contains no nickname of any sender. This line is displayed only on the screen of a user who has joined the channel (in this case it is the author’s nick <RobiX>). All other users see a JOIN message.

Log Example 12
1 * Now talking in #defocus
2 * Nazca (~[email protected]) has joined #defocus
3 * Nazca (~[email protected]) Quit (Changing host)
4 <ThePeer> would anyone be able to recommend a laptop with about 2GB RAM, 120HDD?
5 <Ttech> ThePeer, That should be fine, that is what my current laptop is.
6 * Nazca (~Nazca@atheme/member/nazca) has joined #defocus
7 <marienz> ThePeer: you’ll probably be fine, depending on what you run on it.
8 * tomprince (~[email protected]) has left #defocus
9 <ThePeer> Ttech: what type of laptop, if i may ask
10 <Ttech> A Dell Inspiron
11 * ThePeer is now known as ThePeer|Away
12 * irqq was kicked from #defocus by Ttech
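The list update described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the regular expressions are hypothetical patterns modelled on the mIRC-style lines of Log Example 12, the function name track_users is invented for this sketch, and it is not the implementation used in the thesis.

```python
import re

# Hypothetical line templates, modelled on the log lines of Log Example 12.
JOIN = re.compile(r"^\* (\S+) \(.*\) has joined (\S+)$")
PART = re.compile(r"^\* (\S+) \(.*\) has left (\S+)$")
QUIT = re.compile(r"^\* (\S+) \(.*\) Quit \(.*\)$")
NICK = re.compile(r"^\* (\S+) is now known as (\S+)$")
KICK = re.compile(r"^\* (\S+) was kicked from (\S+) by (\S+)$")
MSG = re.compile(r"^<([^>]+)> ")


def track_users(lines, channel):
    """Return the nicks assumed to be present in `channel` after `lines`."""
    present = set()
    for line in lines:
        if m := JOIN.match(line):
            if m.group(2) == channel:
                present.add(m.group(1))
        elif m := PART.match(line):
            if m.group(2) == channel:
                present.discard(m.group(1))
        elif m := QUIT.match(line):
            # A QUIT removes the user from every channel.
            present.discard(m.group(1))
        elif m := NICK.match(line):
            # The old nick disappears, the new nick takes its place.
            old, new = m.groups()
            present.discard(old)
            present.add(new)
        elif m := KICK.match(line):
            kicked, kick_channel, kicker = m.groups()
            if kick_channel == channel:
                present.discard(kicked)
                present.add(kicker)  # the kicker must be in the channel
        elif m := MSG.match(line):
            # A sender of a public message must be in the channel.
            present.add(m.group(1))
    return present
```

Applied to the twelve lines of Log Example 12, this sketch would report <Nazca>, <marienz>, <Ttech>, and <ThePeer|Away> as present. Users who never write or trigger a system message (such as lurkers) stay invisible to it, which is why the text assumes additional status checks with IRC commands.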

The visualization of Log Example 12 is shown in Table 8.10. In each table cell, results of the three versions are displayed to show the differences between them. Certain notional assumptions are made, especially for the second and third versions. These include the assumption that the current statuses of the users <MrElendig> and <Nazca> are checked with IRC commands. The status of user <Anon> is not checked because the user is in invisible mode (MODE +i). The invisible mode prevents normal users from finding this communicator, e.g., through the NAMES and WHO commands.

Table 8.10: Visualization of Log Example 12

             Line
Nick         1  2  3  4  5  6  7  8  9  10  11  12
<Anon>
<irqq>
<marienz>
<MrElendig>
<Nazca>
<RobiX>
<ThePeer>
<tomprince>
<Ttech>

8.4 Views

In Log Example 13, information about users can be extracted, such as real name (lines 1, 2 and 5), age (lines 1 and 3), ethnic group or citizenship (lines 2 and 3), sex (lines 2 and 3), or dislike (line 4).

Log Example 13
1 <paul_507> im 19
2 <friends> hi am egyption man my name is sameh and u ??
3 <nattie> 26 f uk
4 <MoarCowbell> I hate mowing lawns
5 erry is erry@freenode/staff/erry * Errietta Kostala (http://errietta.me/)
6 erry on +#freenode
7 erry using rajaniemi.freenode.net Helsinki, FI, EU
8 erry is using a secure connection
9 erry has been idle 9secs, signed on Thu Jan 12 23:36:59
10 erry is logged in as erry
11 erry End of /WHOIS list.

8.4.8 Relation

In 15% of all cases, the next turn (i.e., message) is written by the same sender. The direct addressing form “<nick>: ” (exactly written nickname followed by a colon at the beginning of the message) is used in 6.49% of all 184177 user messages (i.e., 11958 times). The nicks <niecza> (302 times), <roast> (291), and <KinG`PiN> (261) are the targets specified most often (Table 8.11). In total, 1384 unique nicknames are used in the direct addressing form “<nick>: ” (netiquette) to address communicators. That means that these nicks are written exactly, including upper and lower case. This type constitutes only 5.98% of all unique nicknames.

Table 8.11: Top 10 exactly written nicks in discourse (netiquette)

                        Frequency
Rank  Nickname          Abs.     Rel.
1     <niecza>           302    2.53%
2     <roast>            291    2.43%
3     <KinG`PiN>         261    2.18%
4     <nom>              205    1.71%
5     <markings_>        197    1.65%
6     <Dax>              168    1.40%
7     <Ttech>            160    1.34%
8     <orbii>            148    1.24%
9     <redcheckers>      141    1.18%
10    <Zuu>              132    1.10%
∑                       2005   16.77%
                       11958  100.00%
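The exact direct addressing form can be detected with a simple string test. The following Python sketch is an illustration only; the helper names addressed_nick and count_targets are hypothetical, and the set of known nicknames is assumed to come from the logged-in user list.

```python
import re
from collections import Counter

# A user message has the form "<sender> text".
MESSAGE = re.compile(r"^<([^>]+)> (.*)$")


def addressed_nick(text, known_nicks):
    """Return the nick written exactly as "<nick>: " at the start of `text`."""
    head, separator, _ = text.partition(": ")
    if separator and head in known_nicks:  # exact, case-sensitive match
        return head
    return None


def count_targets(lines, known_nicks):
    """Count how often each known nick is the target of direct addressing."""
    targets = Counter()
    for line in lines:
        match = MESSAGE.match(line)
        if match:
            nick = addressed_nick(match.group(2), known_nicks)
            if nick is not None:
                targets[nick] += 1
    return targets
```

Because the comparison is exact, variants such as “ThePeer,” (comma instead of colon) or a nick written in a different case are not counted; they would need the looser matching discussed in later chapters.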

4,402 unique directed sender-receiver relations are found (36.81% of all messages with exact direct addressing form). The directed relation <dalek> → <roast> is the most common (see Table 8.12).

Table 8.12: Top 10 directed sender-receiver relations

                                          Frequency
Rank  Relation (directed)                 Abs.     Rel.
1     <dalek> → <roast>                    291    2.43%
2     <dalek> → <niecza>                   218    1.82%
3     <redcheckers> → <KinG`PiN>           146    1.22%
4     <redcheckers> → <Dany>                96    0.80%
5     <dalek> → <zavolaj>                   75    0.63%
6     <redcheckers> → <orbii>               61    0.51%
7     <markings_> → <juanperez>             58    0.49%
8     <markings_> → <Zuu>                   54    0.45%
9     <redcheckers> → <The_Phoenix>         52    0.43%
10    <juanperez> → <markings_>             50    0.42%
∑                                         1101    9.21%
                                         11958  100.00%

In total, 3,570 undirected sender-receiver relations are extracted. Table 8.13 shows the top 10 undirected sender-receiver relations. The most active user <^jenbunny^> uses no direct addressing.

Table 8.13: Top 10 undirected sender-receiver relations

                                          Frequency
Rank  Relation (undirected)               Abs.     Rel.
1     <dalek> — <roast>                    291    2.43%
2     <dalek> — <niecza>                   218    1.82%
3     <KinG`PiN> — <redcheckers>           158    1.32%
4     <juanperez> — <markings_>            108    0.90%
5     <Dany> — <redcheckers>                99    0.83%
6     <orbii> — <redcheckers>               87    0.73%
7     <redcheckers> — <The_Phoenix>         81    0.68%
8     <markings_> — <Zuu>                   79    0.66%
9     <dalek> — <zavolaj>                   75    0.63%
10    <jnthn> — <moritz>                    62    0.52%
∑                                         1258   10.52%
                                         11958  100.00%
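The step from directed to undirected relations amounts to merging both directions of each pair. A minimal Python sketch, assuming the (sender, receiver) pairs have already been extracted from directly addressed messages (the function name relation_counts is hypothetical):

```python
from collections import Counter


def relation_counts(pairs):
    """Count directed and undirected relations from (sender, receiver) pairs."""
    directed = Counter(pairs)
    undirected = Counter()
    for (sender, receiver), frequency in directed.items():
        # A frozenset ignores the direction: {a, b} == {b, a}.
        undirected[frozenset((sender, receiver))] += frequency
    return directed, undirected
```

With the values of Table 8.12, the two directed relations <markings_> → <juanperez> (58) and <juanperez> → <markings_> (50) collapse into a single undirected relation with frequency 108, which matches rank 4 of Table 8.13.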

Detection of nicknames in discourse

In Log Example 14, senders’ nicknames are shown in green and receivers’ nicks in gold.

Log Example 14
Line  mIRC command  Message
1     -             <GameShark> hmmm need some support
2                   <moritz> nom: say chr 92
3                   <Denise-> hiya Marika :)
4     ME            * maslen is more into programming these days.
5                   * Ozztronomy waves to zoei from his corner
6     MSG           *hisp* hi...how are you?
7                   *ROUGH-RIDER* heyyy jamietech

Conversation threading

In Log Example 15, five users communicate in the discourse fragment: <berban>, <markings_>, <PerfM>, <pur|>, and <Zuu>. <pur|> greets everybody in the channel #defocus (line or turn 3). Mutton [Mut04b] notes that “[a] message without explicit direct addressing is either targeted to everybody ... or to an individual user”. This greeting is the first pair part of a greeting-greeting adjacency pair. <pur|> receives two greetings back from other users (second pair part; lines 5 and 6). <markings_> and <berban> are talking about <berban>’s name (lines 1 and 4). Their conversation (Thread 1 in Table 8.14) is disrupted by turns belonging to other conversation threads (e.g., lines 13 and 14). Additionally, an automatically generated system message also interrupts ongoing exchanges (JOIN message in line 7) [Her13]. System messages are not part of any conversation thread.

Log Example 15
1 <berban> markings_: it’s actually my name
2 <markings_> Zuu: I have ton to do anyway, only if I’d end procrastination
3 <pur|> hello!
4 <markings_> berban: I believe you, sure
5 <berban> hello pur|
6 * markings_ pets pur|
7 * Gershwin (~Donkey@unaffiliated/gershwin) has joined #defocus
8 <Zuu> markings_: ah yes, the procrastination... i have that disease too
9 <Zuu> right at this moment for instance :P
10 <berban> markings_: what are you procrastinating on
11 <berban> I can help
12 <berban> I need something to do
13 <Zuu> berban: you can help me too!
14 * PerfM gives Zuu a cookie
15 <berban> what are you working on
16 <markings_> berban: My projects at http://oasis.bombshellz.net/redmine
17 <markings_> I should get a shorter URL
18 <berban> I hate bitlys

Table 8.14 contains eight threads after rearrangement of the messages, i.e., messages between the same two communicators are arranged together in vertical columns (threads). With five communicators, the maximum number of threads is $\frac{5 \cdot (5-1)}{2} = 10$. For example, turns 1, 4, 10, 11, 12, 15, 16, 17, and 18 belong to the first thread, while turns 3 and 5 belong to the second thread. Threads 1, 2, 4, and 5 are two-way relations (users communicate in both directions). The other threads are one-way relations (only one direction) in this discourse example. There are no threads between <berban> and <PerfM> or between <markings_> and <PerfM>. Arrows with a golden background color mark directly addressed messages (otherwise a white background color is used).
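The rearrangement into threads can be illustrated with a short Python sketch. It assumes that the receivers of each turn are already known (for a greeting to everybody, all other communicators are listed); the function names and the tuple layout (turn number, sender, receivers) are hypothetical choices for this example.

```python
from collections import defaultdict
from math import comb


def max_threads(communicators):
    """Upper bound n * (n - 1) / 2 for the number of two-party threads."""
    return comb(communicators, 2)


def group_into_threads(turns):
    """Group turn numbers by the unordered pair of sender and receiver."""
    threads = defaultdict(list)
    for number, sender, receivers in turns:
        for receiver in receivers:
            threads[frozenset((sender, receiver))].append(number)
    return dict(threads)


# Turns 1, 3, 4, and 5 of Log Example 15; turn 3 greets everybody.
turns = [
    (1, "berban", ["markings_"]),
    (3, "pur|", ["berban", "markings_", "PerfM", "Zuu"]),
    (4, "markings_", ["berban"]),
    (5, "berban", ["pur|"]),
]
threads = group_into_threads(turns)
```

In this sketch, turns 1 and 4 end up in the thread of <berban> and <markings_>, and turns 3 and 5 in the thread of <berban> and <pur|>, in line with the description above.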

Table 8.14: Rearrangement of the messages into conversation threads

Thread  Communicators (left, right)
1       <berban>, <markings_>
2       <berban>, <pur|>
3       <berban>, <Zuu>
4       <markings_>, <pur|>
5       <markings_>, <Zuu>
6       <PerfM>, <pur|>
7       <PerfM>, <Zuu>
8       <pur|>, <Zuu>

      Conversation thread
Line  1  2  3  4  5  6  7  8
1     →
2                 →
3        ←     ←     ←     →
4     ←
5        →
6              →
7
8                 ←
9                 ←
10    →
11    →
12    →
13          ←
14                      →
15    →
16    ←
17    ←
18    →

8.4.9 Topic

The topic stored for the current channel is displayed after entering (see Log Example 16, lines 1 to 3). In line 4, a channel operator changes the topic on #england with the TOPIC command. A topic boundary within discourse is found between lines 5 and 6. A new unrelated topic (lines 7 to 11) is indicated by phrases such as “by the way”. Also, an abrupt topic shift is shown in lines 12 to 15.

Log Example 16
1 * Now talking in #freenode
2 * Topic is ’Welcome to #freenode | Staff are voiced, some may also be on /stats p -- feel free to /msg
  staff at any time | Channel guidelines: http://freenode.net/poundfreenode.shtml
3 * Set by dax on Thu Jul 26 09:00:48
4 * GanjaMan changes topic to ’Welcome to the Official Big Brother #England Channel | No advertising
5 <orbii> religious conversation is off-topic in #defocus
6 <jsoft> Has a cd drive, usb ports
7 <Marietta_> KBS by the way where are u from?:)
8 <Boohbah> and another thing: how can i use my unaffiliated cloak instead of my shell provider cloak?
9 <Inna_> let’s talk abt something interesting
10 <meggg> what about egypt
11 <wicked_lamb> ok let’s talk about you, Inna_
12 * ziggo ([email protected]) Quit (XMLSocket Connection closed)
13 * ziggo ([email protected]) has joined #English
14 <ziggo> I keep on pressing the wrong buttons
15 <ziggo> I love google maps

In Table 8.15, all messages of Log Example 15 are clustered into their specific topics. For example, the first topic thread consists only of messages about <berban>’s name.

Table 8.15: Rearrangement of the messages into topic threads

      Topic thread
Line  1  2  3  4  5  6  7
1     ✓  ✗  ✗  ✗  ✗  ✗  ✗
2     ✗  ✓  ✗  ✗  ✗  ✗  ✗
3     ✗  ✗  ✓  ✗  ✗  ✗  ✗
4     ✓  ✗  ✗  ✗  ✗  ✗  ✗
5     ✗  ✗  ✓  ✗  ✗  ✗  ✗
6     ✗  ✗  ✓  ✗  ✗  ✗  ✗
7     ✗  ✗  ✗  ✓  ✗  ✗  ✗
8     ✗  ✓  ✗  ✗  ✗  ✗  ✗
9     ✗  ✓  ✗  ✗  ✗  ✗  ✗
10    ✗  ✓  ✗  ✗  ✗  ✗  ✗
11    ✗  ✗  ✗  ✗  ✓  ✗  ✗
12    ✗  ✗  ✗  ✗  ✓  ✗  ✗
13    ✗  ✗  ✗  ✗  ✓  ✗  ✗
14    ✗  ✗  ✗  ✗  ✗  ✓  ✗
15    ✗  ✓  ✗  ✗  ✗  ✗  ✗
16    ✗  ✓  ✗  ✗  ✗  ✗  ✗
17    ✗  ✗  ✗  ✗  ✗  ✗  ✓
18    ✗  ✗  ✗  ✗  ✗  ✗  ✓

The topics of each thread are presented in Table 8.16.


Table 8.16: Topics of the threads

Thread  Attribute                                       Value            Topic
1       communicator→identifier→real name→first name    <berban>         <berban>’s name
2       communicator→social status→employment           work             work and procrastination
        communicator→behavior→habit                     procrastination
3       communicator→behavior→habit                     greeting         greeting
4       software→server→chat→channel→name               #defocus         system message in channel #defocus
        message→creator→type                            system
5       communicator→behavior→behavior prosocial        helping hand     offering and needing helping hand
6       emotion→type                                    pleasure         giving comfort
7       hardware/network→network→URL                    URL              URL-shortening and services

8.4.10 Emotion

Table 8.17 shows the occurrence of the most frequently used emoticons in IRC channel #irchelp. A vague description of their basic meaning is also given. It is notable that all of the top 10 forms are created without noses. In general, 349 emoticons are found in Dataset 2b (5.47%). These are 77 different emoticons, such as “=]”, “(:”, “;]”, “^-^”, “:Þ”, “=)”, “>.<”, “:]”, “:~()”, and “:<”.

Table 8.17: Top 10 emoticons in channel #help

                Frequency
Rank  Emoticon  Abs.     Rel.  Description
1     :P          38   10.89%  showing a tongue protruding from one’s face [Urb14n]
2     :)          34    9.74%  a smiling face [Urb14a]
3     :D          17    4.87%  very happy smiley face [Urb14g]
      ^           17    4.87%  a symbol that announces sarcasm [Urb14b]
5     :p          15    4.30%  see “:P”
6     :(          14    4.01%  an unhappy or sad face [Urb14c]
      :o          14    4.01%  act/face of being surprised [Urb14l]
8     :/           9    2.58%  an agreeable response to a problematic situation [Urb14d]
      <3           9    2.58%  love or more literally “heart” [Urb14e]
      xD           9    2.58%  a laughing face [Urb14o]
∑                176   50.43%
                 349  100.00%
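A frequency table of this kind can be produced with a simple token count. The sketch below is an assumption-laden illustration: it uses a small hand-made emoticon set taken from Table 8.17 rather than the full list of 77 forms, it only recognizes emoticons that stand alone between whitespace, and the function names are hypothetical.

```python
from collections import Counter

# Illustrative subset of emoticons (taken from Table 8.17), case-sensitive.
EMOTICONS = {":P", ":)", ":D", ":p", ":(", ":o", ":/", "<3", "xD"}


def count_emoticons(messages):
    """Count whitespace-separated tokens that are known emoticons."""
    counts = Counter()
    for message in messages:
        for token in message.split():
            if token in EMOTICONS:
                counts[token] += 1
    return counts


def relative_frequencies(counts):
    """Return the share of each emoticon in percent, rounded to two places."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {emoticon: round(100 * n / total, 2) for emoticon, n in counts.items()}
```

An emoticon glued to a word, as in “from?:)”, is missed by this whitespace-based split; catching such cases would need a more careful tokenizer.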

8.4.11 Causality

In Log Example 17, lines 1 to 3 show a series of adjacency pairs (chaining). <Chat5808> needs to go to bed (the reason could be tiredness). Therefore in line 1, he/she says goodbye. <Flox> replies with the farewell “[good ]Night Chat[5808]”. <Chat5808> leaves the IRC network without a QUIT command (line 3). In this case, the farewell in line 1 prompts another farewell (line 2). The effect of the QUIT command is (maybe) that the user shuts down the computer and goes to bed (line 3). In line 4, <FishFingers2000> writes an offensive remark. <falcon> ignores this user. A user floods the channel with fast nickname changes (lines 8 to 10) and gets kicked from #defocus (line 11).

Log Example 17
1 <Chat5808> i need to go to bed goodbye
2 <Flox> Night Chat
3 * Chat5808 ([email protected]) Quit (Connection closed)
4 <FishFingers2000> SUCK MY DICK
5 <LJ> Hey falcon..I think he wants you to suck his dick lmao
6 <falcon> Oh really? Lol
7 <falcon> Too bad i ignored him...
8 [05:58:33:863] * AREYOUCRAZY-8 is now known as lalaeats699cocks
9 [05:58:39:123] * lalaeats699cocks is now known as deunatacada
10 [05:59:17:832] * deunatacada is now known as withmycock700
11 [06:00:03:021] * withmycock700 was kicked from #defocus by Ttech
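The causal chain of nickname flooding followed by a kick suggests a simple detector for fast nickname changes. The following Python sketch assumes the timestamp format [HH:MM:SS:mmm] seen in lines 8 to 11; the threshold of three changes within 60 seconds and the function names are hypothetical choices for illustration, not values taken from the thesis.

```python
import re
from datetime import datetime, timedelta

# Timestamped NICK line, e.g. "[05:58:33:863] * old is now known as new".
NICK_CHANGE = re.compile(
    r"^\[(\d\d:\d\d:\d\d):(\d{3})\] \* (\S+) is now known as (\S+)$"
)


def parse_nick_change(line):
    """Return (time, old nick, new nick) or None for other lines."""
    match = NICK_CHANGE.match(line)
    if match is None:
        return None
    clock, millis, old, new = match.groups()
    time = datetime.strptime(clock, "%H:%M:%S") + timedelta(milliseconds=int(millis))
    return time, old, new


def nick_flooders(lines, max_changes=3, window=timedelta(seconds=60)):
    """Return current nicks of users with `max_changes` changes inside `window`."""
    history = {}  # current nick -> times of that user's recent nick changes
    flooders = set()
    for line in lines:
        parsed = parse_nick_change(line)
        if parsed is None:
            continue
        time, old, new = parsed
        # Carry the change history over from the old nick to the new one.
        times = [t for t in history.pop(old, []) + [time] if time - t <= window]
        history[new] = times
        if len(times) >= max_changes:
            flooders.add(new)
    return flooders
```

The three changes in lines 8 to 10 fall within about 44 seconds, so this sketch would flag the last nickname of the chain. It does not handle logs that run past midnight, since the timestamps carry no date.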

8.4.12 Effectiveness

Understanding is important for both effective communication and discourse analysis. In Log Example 18 (line 1), a guest user transmits an unintelligible message. Although Samantha’s messages are clear and understandable (lines 2 to 4), she does not reach her goal (lines 3 and 4) and quits (line 5). <itrekkie>’s question (line 6) is answered in line 8. He achieves what he wants. Therefore, <itrekkie> fully reaches his goal.

Log Example 18
1 <Guest_174> cfff
2 <877AFL400> hello :)
3 <877AFL400> write to me!
4 <877AFL400> we can talk about everythink!
5 * 877AFL400 ([email protected]) Quit (Connection closed)
6 <itrekkie> Hi everyone-can anyone tell me if AIFF is lossless?
7 <itrekkie> my goal is to go to ALAC, iTunes doesn’t like FLAC
8 <TXRoadkill> yes

8.5 Attributes

As seen in the 12 log examples, attributes of all 12 views can be found within IRC discourses. The extracted attributes of the short Log Example 12 are presented in Table 8.18, and there is some room for interpretation. “120HDD” could mean “120 GB HDD” because HDDs are usually not 120 TB large at the moment. Maybe <ThePeer>’s intention is to buy a laptop. A visualization of Log Example 12 is presented in Figures 8.3, 8.4, and 8.5.

Table 8.18: Attributes of each view (examples)

View  Line(s)         Attribute                                                   Value(s)
CON   7               context→type→historical context→reference                   laptop (“it”)
HAW   2, 3            hardware/network→component→network→IP address→client        77.75.106.60
      4, 5, 7, 9, 10  hardware/network→component→type                             laptop
      4               hardware/network→component→type                             random-access memory (RAM)
      4               hardware/network→component→storage→capacity→value           2
      4               hardware/network→component→storage→capacity→unit            GB
      4               hardware/network→component→type                             hard disk drive (HDD)
      4               hardware/network→component→storage→capacity→value           120
      8               hardware/network→component→network→domain→name              socrates.hocat.ca
      10              hardware/network→component→manufacturer→company→name        Dell
      10              hardware/network→component→name                             Dell Inspiron
SOW   1, 2, 6, 8, 12  software→CMC system→channel→name                            #defocus
      6               software→component→name                                     Atheme IRC Services
COM   2, 3, 6         communicator→identifier→nickname→virtual nickname           Nazca
      4, 9, 11        communicator→identifier→nickname→virtual nickname           ThePeer
      5, 10, 12       communicator→identifier→nickname→virtual nickname           Ttech
      7               communicator→identifier→nickname→virtual nickname           marienz
      8               communicator→identifier→nickname→virtual nickname           tomprince
MES   1–12            message→identifier→log number                               1–12
      1–12            message→language→name                                       English
      1, 2, 6         message→CMC system→creator→command                          JOIN
      3               message→CMC system→creator→command                          QUIT
      3               message→content→quit message                                Changing host
      4, 5, 7, 9, 10  message→CMC system→creator→command                          NORMAL
      4               message→adjacency pair→type                                 question-answer
      4               message→adjacency pair→part number                          first
      5               message→adjacency pair→type                                 question-answer
      5               message→adjacency pair→part number                          second
      7               message→adjacency pair→type                                 question-answer
      7               message→adjacency pair→part number                          second
      8               message→CMC system→creator→command                          PART
      9               message→adjacency pair→type                                 question-answer
      9               message→adjacency pair→part number                          first
      10              message→adjacency pair→type                                 question-answer
      10              message→adjacency pair→part number                          second
      11              message→CMC system→creator→command                          NICK
      11              message→content→NICK message                                ThePeer|Away
      12              message→CMC system→creator→command                          KICK
      12              message→content→KICK message                                irqq
REL   5, 7            relation→discourse partner (communicator)→identifier        ThePeer
      9               relation→discourse partner (communicator)→identifier        Ttech
      12              relation→discourse partner (communicator)→identifier        irqq
TOP   4, 5, 7, 9, 10  topic→name                                                  recommendation a laptop and opinion
EFF   7, 10           effectiveness→goal→achievement→question answered            yes


8.6 Message visualization

mIRC visualizes messages with different colors depending on the IRC command, similar to LogBot.

8.6.1 Visualization with LogBot

LogBot visualizes messages with different colors, which can be changed in the Java source file “LogBot.java” or in the style sheet defined in the PHP file “header.inc.php”. Table 8.19 shows the Java functions with examples of colored messages.

Table 8.19: Colored message for each LogBot Java function

LogBot Java function  Colored message
onAction              * Amis hugs the teddy
onDisconnect          * Disconnected
onJoin                * Gryyt (~g@unaffiliated/gryyt) has joined #defocus
onKick                * Detergentizer was kicked from #defocus by dax
onMessage             <jjohns71> hi how do i mask my address?
onMode                * ChanServ sets mode +o tdubellz
onNickChange          * berban is now known as Guest88025
onNotice              -Nilfirith- hi
onPart                * Merbo (Merbo@MerbosMagic/Founder/Merbo) has left #freenode
onPing                [ffm PING]
onPrivateMessage      <- *chiller17* hi
onQuit                * habmala (~[email protected]) Quit (Quit: leaving)
onTime                [armon TIME]
onTopic               * dax changes topic to ’Welcome to #freenode’
onVersion             [JacobF VERSION]

LogBot uses the following default RGB colors, defined Java constants, and CSS class names for the Java functions shown in Table 8.20. These constants, also found in the discourses, are used to analyze and handle system-specific message templates (see Subsection 11.3.1).

Table 8.20: RGB colors and Java constants for LogBot Java functions

RGB color  CSS class name  Constant  LogBot Java function(s)
000000     irc-black       BLACK     onMessage, onPrivateMessage
00007b     irc-navy        NAVY      onDisconnect, onQuit
009200     irc-green       GREEN     onJoin, onKick, onMode, onNickChange, onPart, onTopic
7b0000     irc-brown       BROWN     onNotice
9c009c     irc-brick       BRICK     onAction
ff0000     irc-red         RED       onPing, onTime, onVersion
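Since the CSS class of a logged line narrows down which LogBot function produced it, the mapping of Table 8.20 can serve as a first filter before template matching. The following Python sketch simply encodes the table as a dictionary; the names LOGBOT_COLORS, candidate_functions, and css_class_for are hypothetical and not part of LogBot.

```python
# Table 8.20 as a lookup: CSS class -> (RGB color, LogBot Java functions).
LOGBOT_COLORS = {
    "irc-black": ("000000", ["onMessage", "onPrivateMessage"]),
    "irc-navy": ("00007b", ["onDisconnect", "onQuit"]),
    "irc-green": ("009200", ["onJoin", "onKick", "onMode",
                             "onNickChange", "onPart", "onTopic"]),
    "irc-brown": ("7b0000", ["onNotice"]),
    "irc-brick": ("9c009c", ["onAction"]),
    "irc-red": ("ff0000", ["onPing", "onTime", "onVersion"]),
}


def candidate_functions(css_class):
    """Return the LogBot functions that can produce a line of this class."""
    return LOGBOT_COLORS[css_class][1]


def css_class_for(function):
    """Return the CSS class used for the output of a LogBot function."""
    for css_class, (_, functions) in LOGBOT_COLORS.items():
        if function in functions:
            return css_class
    raise KeyError(function)
```

For a line marked irc-brown or irc-brick, the class alone identifies the message type; for irc-green, six templates remain to be distinguished by their wording.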

8.6.2 Visualization with mIRC

The COLOR command changes the color settings for items in the colors dialog (shortcut Alt+K) [MB10]. The “mIRC Classic” color scheme is used to illustrate the colors of different chat discourse extracts (see Table 8.21).

Table 8.21: Colored message for each mIRC item

mIRC item     Colored message
action text   * Amis hugs the teddy
ctcp text     [Alice SEND] file
info text     * Disconnected
info2 text    Local host: PC (193.171.37.63)
invite text   * Merbo (Merbo@MerbosMagic/Founder/Merbo) invites you to join #Test
join text     * Gryyt (~g@unaffiliated/gryyt) has joined #defocus
kick text     * Detergentizer was kicked from #defocus by dax
mode text     * ChanServ sets mode +o tdubellz
nick text     * berban is now known as Guest88025
normal text   <jjohns71> hi how do i mask my address?
notice text   -Nilfirith- hi
notify text   * bazhang [~bazhang@unaffiliated/bazhang] is on IRC
part text     * Merbo (Merbo@MerbosMagic/Founder/Merbo) has left #freenode
quit text     * habmala (~[email protected]) Quit (Quit: leaving)
topic text    * dax changes topic to ’Welcome to #freenode’
wallops text  !RobiX! Have Fun!
whois text    Corey is ~Corey@freenode/staff/corey * Corey

The following default RGB (red, green, and blue) colors are predefined in the color scheme and used for the discourse messages in this thesis. They are represented as hexadecimal numbers in Table 8.22.

Table 8.22: RGB colors for mIRC items

RGB color  mIRC item(s)
000000     normal text, whois text
00007F     info text, quit text
009300     info2 text, invite text, join text, kick text, mode text, nick text, part text, topic text
7F0000     notice text, wallops text
9C009C     action text
FC7F00     notify text
FF0000     ctcp text

8.7 Visualization

The discourse of Log Example 12 is presented in three figures. Figure 8.3 is the most detailed, and includes information such as the sequence number for each log line. This presentation form is useful for short discourses. The other two figures are more compact. The legend is shown in Table 8.23.

Figure 8.3: Visualization as an extended sociogram: Version 1

In Figure 8.4, indirect message flows (questions to everyone, e.g., Log Example 12, line 4) and direct ones (direct addressing, e.g., line 9) are summarized.

Figure 8.4: Visualization as an extended sociogram: Version 2

An undirected graph with message frequencies is used in Figure 8.5. Nevertheless, disadvantages occur when sociograms grow in size. Mutton [Mut04b] notes that “the diagram becomes more complicated, with an increasing number of edges”.

Figure 8.5: Visualization as an extended sociogram: Version 3


The legend for all three figures is presented in Table 8.23.

Table 8.23: Legend of an extended sociogram visualization

View  Element                              Visualization (legend, example)
CON   context
BAR   communication barrier
DAT   date/time
HAW   network, server
SOW   channel
COM   availability status                  connected/joined, connected/not joined, not connected, unknown
      sex                                  male, female, unknown
      sender’s nickname
MES   message sequence (e.g., log line 1)
      message type                         answer, question
      message frequency                    a) once, twice (line thickness)
                                           b) numbers (e.g., once)
      message direction
      message                              direct, indirect
REL   -                                    (see “View COM” and “View MES”)
TOP   topic, summarization
EMO   emotion intensity                    high, medium, low (brightness)
      emotion valence                      positive, neutral/none, negative (message color)
CAU   effect                               answer, question
EFF   -                                    (not visualized in this example)

8.8 Chapter summary

Multiple-views analysis for IRC discourses was carried out. The next chapter focuses on the creation of IRC nicknames.

Part III

Automated discourse analysis with a focus on IRC

CHAPTER 9 Creation of IRC nicknames

The detection of communicators’ identifiers such as nicknames is an important step towards finding sender-receiver relations in discourses. Shortened or creatively-changed forms of nicks within chat discourse, or messages without written receivers, cannot be immediately linked to the original nickname. These relations are important for later automatic discourse analysis in order to know and understand who is chatting with whom (see Chapter 11). Therefore, it is necessary to find out how nicks are created and how they are used in discourse (see Chapter 10). The following chapters focus mainly on the views “View DAT” (analysis of timestamps, Subsection 8.4.4), “View COM” (IRC nicknames and nickname creation, Subsection 8.4.7), and “View REL” (nicknames written in IRC discourses, Subsection 8.4.8).

Nicknames can be viewed from various aspects such as psychology, sociology, or linguistics [BI95; Mor+79; Rei91]. Bechar-Israeli [BI95] defines a nick as “a name we receive in addition to our legal name [which is] usually given to us by the people surrounding us”. The term nickname “was established as a variant of the Middle English noun eke-name” [Lak06]. It should be mentioned that not only are human participants hidden behind nicks, but also computer programs (e.g., chatbots) [Dör03].

In this chapter, approaches to the classification of chat nicknames and analyses of chat communication are presented. Nicknames provide a way of distinguishing between chat users. For that reason, unique nicknames are important. Decision-making can be self-determined (e.g., requirements for the nick), other-determined (e.g., restrictions by the IRC network), or both (e.g., story behind the nick). The following section will focus on the inventive way of nickname creation, starting with information on the general steps.

9.1 Information on the general steps

The goal of this chapter is to understand how nicknames are created. Dataset 1 is used for qualitative analysis, while Datasets 1a (nicknames) and 1b (messages) are used for quantitative analysis (see Section 8.1).

9.1.1 Data collection

Gelhausen [Gel08] divided frequently-searched chat terms into nine categories. Therefore, at least one channel per category is selected to find as many different nicknames as possible because the topic of a channel influences the choice of nicknames [Lak06].

Table 9.1: Used IRC networks

Network    Server address         NICKLEN
DALnet     jade.va.us.dal.net     30
EFnet      irc.servercentral.net  9
freenode   irc.freenode.net       16
GameSurge  irc.gamesurge.net      30
IrCQ-Net   irc.icq.com            15
QuakeNet   irc.quakenet.org       15
SwiftIRC   irc.swiftirc.net       30

A total of 7 different IRC networks and 13 different public IRC channels are used (see Tables 9.1 and 9.2). Additionally, the maximum nickname length (NICKLEN) is given.

Table 9.2: Selected channels for the 9 categories

Category                      Network    Channel(s)
cars                          GameSurge  #cars
celebrities                   QuakeNet   #eminem
countries, languages, cities  IrCQ-Net   #English
                              QuakeNet   #england
conversation                  freenode   #defocus, #freenode
                              DALnet     #chat-world
                              SwiftIRC   #talk
games, sports                 EFnet      #soccer
love, relationship            IrCQ-Net   #Romance
music                         QuakeNet   #eminem
religion                      DALnet     #church
technology, Internet          freenode   ##hardware, #perl

Most of the public conversations in our selected channels were logged from June 27, 2008 to July 28, 2008. Because of logging problems, the logging of channel #eminem started on June 28. The log function of the Java program LogBot was used, which remained silent and merely observed the chat. The logging bots were named <JustMe01_>, <JustMe02_>, and so on. mIRC, the popular IRC client, was used to communicate with the participants. The users’ replies about the formation of their nicknames were quite helpful. mIRC automatically logged the author’s conversations (the nickname <RobiX> was used).

9.1.2 Data extraction

Table 9.3 shows the summaries of public messages (all logged messages; sum of system and user messages), system messages (reported by the IRC server), user messages (written by users), and nicknames of the logged-in users, which were found in the data. In total, 2403777 public logged messages (Dataset 1; 803786 by the system, 1599991 written by users) were analyzed. For quantitative analysis, the log files of two public channels (#defocus, #freenode) from July 1, 2008 were used, with a total of 8937 messages (Dataset 1b; 2958 by the system, 5979 by users).

Table 9.3: Dataset 1

            Public messages  System messages  User messages    Nicknames
Channel         Qual.  Quant.    Qual.  Quant.    Qual.  Quant.    Qual.  Quant.
#cars            9658             1398             8260               82  19 (25)
#chat-world    313216           175707           137509            39079  2041 (3038)
#church         10327             8821             1506              841  75 (101)
#defocus       136454    5389    35441    1365   101013    4024     4118  330 (516)
#eminem          2346             1380              966               96  5 (5)
#england        49139            11307            37832             1275  16 (30)
#English       700170           207826           492344            49547  1927 (2244)
#freenode       88670    3548    38625    1593    50045    1955     6744  394 (630)
##hardware      76988            16125            60863             1687  133 (220)
#perl          139559            37768           101791             3909  339 (631)
#Romance       683995           233328           450667            52201  2341 (2649)
#soccer         20949             6265            14684              451  59 (92)
#talk          172306            29795           142511             4032  257 (332)
∑             2403777    8937   803786    2958  1599991    5979   164062  7936 (10513)

Approximately one third of all public messages are system messages. 164062 nicknames (150278 unique, 141898 case-insensitive) were extracted from the join/leave/quit messages, the nick change messages, and the senders of the messages. Shortened nicknames and variants are extracted by reading the log files to compare the used nicknames in the chat discourse with those of the logged-in users. For quantitative analysis, the 13 log files from July 1, 2008 are used. In total, 10513 nicks were logged, of which 7936 were analyzed in detail with users’ feedback (Dataset 1a; 7420 unique, 7326 case-insensitive). 2577 nicks (24.51%) remained unspecified due to lack of user feedback.
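The three counts reported for the nicknames (all occurrences, unique, and unique when ignoring case) can be illustrated with a short Python sketch. It assumes that plain lower-casing is an adequate approximation of case-insensitive comparison; IRC servers may apply their own case mapping to some special characters, which this sketch ignores. The function name is hypothetical.

```python
def nickname_counts(nicks):
    """Return total, unique, and case-insensitively unique nickname counts."""
    return {
        "total": len(nicks),
        "unique": len(set(nicks)),
        # Assumption: str.lower() approximates the server's case folding.
        "case_insensitive": len({nick.lower() for nick in nicks}),
    }


# Share of nicks without user feedback, as reported for Dataset 1a.
unspecified_share = round(100 * (10513 - 7936) / 10513, 2)
```

For the small list <Nazca>, <nazca>, <Ttech>, <Ttech>, the sketch yields 4 nicknames in total, 3 unique ones, and 2 when case is ignored.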

9.1.3 Analysis

This chapter examines logs of IRC interactions using discourse analysis, which combines qualitative perspectives (to find similarities and differences) and quantitative perspectives (to generate statistics). The data were analyzed by hand and with some supporting tools (see NickDecompounder in Subsection 11.5.3). State-of-the-art techniques of natural language processing such as part-of-speech (POS) tagging and n-grams were applied. The analyses and results are described in the next sections.

9.2 Typologies of chat nicknames

Basically, nicknames are proper nouns [And07; Sto07]. Some further classifications of nicknames can be found, but these works focus mainly on semantics. Although Lakaw [Lak06] points out that the topic of a channel has an impact on the creation of IRC nicks, the following studies deal with a small number of investigated nicknames or channels.

One of the pioneering studies regarding IRC nicknames was done by Bechar-Israeli [BI95]. Some 260 nicknames from four different IRC channels were analyzed from the perspective of content analysis. The focus of this semantic typology is on the origin of these nicks. The author identifies seven main categories: (1) people using their real name, (2) self-related names, (3) names related to medium, technology, and their nature, (4) names of flora, fauna, or objects, (5) play on words and sounds, (6) names related to figures in literature, films, fairytales, and famous people, and (7) names related to sex and provocation.

Another typology is given by Johnová [Joh04]. She analyzed nicks on a British chat site with four chat forums and 12 chat rooms. The author mentions that a nick can be, for instance, a single word, a whole sentence, a combination of lower- and upper-case letters; it can include numbers, non-alphabetical symbols, or emoticons. She asserts that it is “difficult to predict which part of the nickname will be retained and which part will be dropped” to shorten it. And she adds, “Often several variations can be used ... with each user choosing their own variant”. Additionally, Johnová divides the most frequent types of nicknames into several categories: (1) legitimate names of the user (and their variants), (2) short characterization of the user (can include user’s age, sex, location, and physical or character description), (3) names of famous people or characters, and (4) animals, flowers, or objects.

The present chapter is also related to Stommel’s approach [Sto07]. She collected 83 nicknames from a German forum. The analysis distinguishes between names (proper nouns) and nouns (common nouns), which are sub-divided into six word types in nicknames: (1) commonly known names, (2) novel formations, (3) nouns and noun phrases, (4) adjectives, (5) verb forms, and (6) exclamations. However, the major differences between Stommel’s analysis and the present chapter are that Stommel uses a German corpus and chunk tags (i.e., phrases) for nickname tagging. Another difference is that compounded nicks (e.g., <Estrella1981>) are not further decompounded into single parts of speech (proper noun “Estrella” and cardinal number “1981”).
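To make the idea of decompounding concrete, the following Python sketch splits a nickname into runs of letters, digits, and other characters. It is a deliberately simple, hypothetical illustration and not the NickDecompounder tool mentioned in Subsection 11.5.3; assigning actual parts of speech (e.g., proper noun versus common noun) would require a POS tagger on top of this split.

```python
import re

# Runs of ASCII letters, runs of digits, or runs of any other characters.
PART = re.compile(r"[A-Za-z]+|[0-9]+|[^A-Za-z0-9]+")


def split_nick(nick):
    """Split a nickname into letter, digit, and symbol segments."""
    return PART.findall(nick)


def segment_kind(segment):
    """Roughly classify a segment produced by split_nick."""
    if segment[0].isdigit():
        return "number"
    if segment[0].isalpha():
        return "letters"
    return "symbols"
```

For <Estrella1981>, the split yields the segments “Estrella” and “1981”, which correspond to the proper noun and the cardinal number named above. Compounds without a visible boundary, such as two words written in lower case without a separator, are not split by this sketch.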

9.3 Requirements for a “perfect” nickname

In a computing context, a nickname “creates the first impression of a user and is thereforethe first condition for successful communication” [Joh04]. It “must be chosen with careand [be] easy to use both for the speaker and for the listener” [BI95]. Nicknames canprovide users with anonymity and a touch of freedom to obtain a new identity [Gel98;Joh04; Rei91]. The art of nickname creation lies in picking an attractive, unique nick. Theattributes “sounds good” and “short” were mentioned during the chat discourse analysis.But nicks that look like other frequently-used terms (e.g., common words, Internet slang)in the discourse can easily be confused with them. An extract is presented in Log Example19.

Log Example 191 * lol ([email protected]) has joined #Romance2 <ˆZoe> he probably has cheese breath3 <Ky|e> lol oh

In the above discourse, the nick <lol> (line 1) can be confused with the common elementof the Internet slang “LOL” (line 3), which means “laugh (or laughing) out loud”. Similarnicks were <rofl> (“rolling on the floor laughing”), <roflmao> (“rolling on the floorlaughing my ass off”), or <asl> (“age/sex/location”).
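Such ambiguous nicknames can be flagged before receiver detection with a simple lookup. The sketch below assumes a small, hand-made slang list containing only the four terms named above; a real analysis would need a much larger lexicon of slang and common words, and the function name is hypothetical.

```python
# Illustrative slang list, limited to the terms mentioned in the text.
SLANG = {"lol", "rofl", "roflmao", "asl"}


def confusable_nicks(nicks, slang=SLANG):
    """Return the nicknames that coincide with a slang term (ignoring case)."""
    return sorted(nick for nick in nicks if nick.lower() in slang)
```

For the users of Log Example 19, only <lol> is flagged, so an occurrence of “lol” in a message (line 3) should not be read as an address to that user without further evidence.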


9.4 The story behind the chosen nickname

When a chat nickname is chosen, there has always been some thought behind it (even if a nick consists of common words or is a random sequence of letters). In the best case, close relationships between users and their chosen chat identity are built up. There can be a great, interesting story behind the nickname. The following are typical of the kind of questions that can be asked:

• What does your nick mean?

• Why did you choose it?

• What is the history, background, or story behind it?

These stories show how nicks are created or selected. The author would like to thank the many participants on IRC who shared with him the colorful stories of their nicks in online interviews. Some extracts of the private messages are given below:

Log Example 20
1 <ahf> it’s my initials for my name
2 <lnf_> LiNux Fan
3 <nuba> Networked Ultimate Battle Android
4 <jenova> jenova is a non-playable character from final fantasy 7
5 <MrSkitZo> i have 2 personalities, one drunk one sober :)
6 <MrSkitZo> SkitZo aka Schichofrenic
7 <JermSnap> my name is jeremy, in the mountains, when I was a pro snowboarder, you get ONE
  <JermSnap> sylable,so you can shout peoples names easier,so people called me JERM ..then I started
  <JermSnap> breaking 1-2 snowboards a day so the guys at the factory started calling me JermSnap...
  <JermSnap> there ya have it.
8 <Dave-O> It’s my personal version of the nicjk nanme "steve-O" from jackass
9 <Dave-O> my name included
10 <Dave-O> the names david
11 <NightKhaos> Honestly, Chaos was likely to be taken everywhere I went, by changing it to a K i had
   <NightKhaos> a higher change of actually getting the nickname rather than NightChaos2,481,092
12 <Adys> was my first charname in wow
13 <Adys> which i typed randomly on the keyboard and made readable
14 <Rtkwe> on a qwerty keyboard rtkwe is tyler shifted one key to the left
15 <kunwon1> it’s an anagram for ’unknown’ with a letter missing. More importantly, it’s unique enough
   <kunwon1> that no one else uses it or a variant of it
16 <muicalc> its calcium backwards
17 <Bspec> well, i named myself after a mode in a video game called Gran Turismo 4
18 <Bspec> which in of itself contains many cars which have these certain tuned versions such as "M-spec"
   <Bspec> or "V-spec"
19 <Bspec> it’s called "B-Spec"
20 <Rov> My original and rsn is Aeselrov but i like Rov the most
21 <Rov> Well...Aeselrov means Dunkeyass in danish...Aesel = Dunkey and Rov = Ass
22 <nim> it is the short form of nimitz, which came from a book by david webber, a treecat was named
   <nim> nimitz in the honor harrington series (not the aircraft carrier)
23 <nazgjunk> I started off with "netjunk" when I got on the internet first, found out it was used aplenty,
   <nazgjunk> so I had to come up with something new - that became nazgjunk. Nazg means ring in the
   <nazgjunk> Dark Tongue of Mordor (Tolkien), and I’m using junk to mean "junkie" or addict
24 <Kooothor> Thor is a viking god
25 <Kooothor> and Kooo is because I like the letter "K" and for some reasons I wanted to have "o" in my nick :)
26 <imorrOw> Im swedish and the swedish word of tomorrow is imorgon, so my nick is a mix of thoose
   <imorrOw> words cuz im a person thats pretty good of doin things tomorrow insted of today :>
27 <aplsin> well it’s short for "Apelsinmannen" (swedish for Orange man), which is the name of an urban
   <aplsin> legend about a man who took so much acid he thought he was an orange
28 <aplsin> i came up with "aplsin" when entering a highscore on a game that only alowed 6 letters
29 <aplsin> the L sound like "el" when you say it, so i replaced "el" by "l"
30 <AHA> A Heart Attack - It is my username for a game I play
31 <Lawliet> Lawliet (pronounced Low Light)
32 <Lawliet> was stolen off death note
33 <LAO2829> Libra Alpha Omega
34 <lokkju_wrk> "lokkju" == one of the alternative spelling of the name of the Norse god Loki
35 <mikearr> nothing, it’s my name
36 <mikearr> right, mike R (arr to stand for R)
37 <Muzzz_> It’s an abbreviation and of my full name and a phonetically easier way to pronounce it,
38 <Karlprof> ’professor’, it’s an injoke among my real life friends
39 <cd2cd> ok..well..when you copy a CD it is from one CD to another..hence CD2CD !
40 <Teg> when I first started on irc, ten years ago, on efnet, I was given the nickname tegrity
41 <Teg> it was shortened from integrity
42 <DeD_ReclusE> it’s the name of one of my old bands
43 <jenova> jenova is a non-playable character from final fantasy 7
44 <Ziggy_Sawdust> the song, Ziggy Stardust.
45 <Ziggy_Sawdust> ’cuz I like trees
46 <SpComb> my nickname was derived from the phrase "Spontaneous Combustion", which I chose
   <SpComb> for some unknown reason back in 2003 as my nickname when I registered at some new
   <SpComb> forum - probably related to having played around with Crocodile Chemistry, dunno
47 <TAsn> i chose a random sequence of letters (length was chosen deliberately) and i tried to make
   <TAsn> it look nice.
48 <DrPraetor> It’s a roman imperial title, but it’s short for DrPraetorius, who is the villain from Bride
   <DrPraetor> of Frankenstein.
49 <ktwo> oh ;) well you know K2 skates? i just wrote that out
50 <haxplorer> It means several things. I started out with a nickname tuxplorer, a few years back, when
   <haxplorer> I was primarily interested in exploring the internals of linux. tux being the mascot for
   <haxplorer> linux, I combined tux+explorer to coin tuxplorer. Later my interests changed.
51 <nzk> It stands for New Zealand Klingon if you want a meaning
52 <kloeri> my nick doesn’t mean anything - I just picked something random about 15 years ago with
   <kloeri> the only requirements being that nobody else was using it and that it could be pronounced
   <kloeri> somehow


9.5 Nickname restriction

Creativity in nickname creation is limited because the IRC network imposes some restrictions on the choice of nicks [OR93].

9.5.1 Nickname collision

Chatters “prefer and consistently use one nickname” [Rei91]. But if a server detects more than one instance of a nickname on the network, a nickname collision occurs. The nickname registration service (NickServ), which is available on a large number of IRC networks, solves this problem. It allows users to register their favorite nicks and protects them from being used by others. The registered nickname expires after a short period of inactivity (for example, 30 days) and becomes available for registration by other users.

9.5.2 Erroneous nickname

The Latin alphabet, digits, and special characters are available to create IRC nicknames. All these permitted characters are used. For example, the frequencies in Table 9.4 can be compared with the frequencies of the letters of the English language given by Lewand [Lew00]. In summary, 82.11% of the 69686 characters used to create the 7936 nicknames are letters of the Latin alphabet. Compared with word formation in English, digits (10.83%) and special characters (7.06%) have a much higher influence on the creation of nicknames.

Table 9.4: Frequencies of characters

Latin alphabet   Lower-case   Upper-case   Abs. (∑)   Rel.
a, A             5185         799          5984       8.59%
b, B             858          449          1307       1.88%
c, C             1241         453          1694       2.43%
d, D             1362         361          1723       2.47%
e, E             5522         383          5905       8.47%
f, F             589          287          876        1.26%
g, G             1027         1084         2111       3.03%
h, H             1397         299          1696       2.43%
i, I             3498         336          3834       5.50%
j, J             275          247          522        0.75%
k, K             861          202          1063       1.53%
l, L             2629         539          3168       4.55%
m, M             1591         761          2352       3.38%
n, N             3553         452          4005       5.75%
o, O             3028         293          3321       4.77%
p, P             672          274          946        1.36%
q, Q             67           36           103        0.15%
r, R             3071         482          3553       5.10%
s, S             3020         715          3735       5.36%
t, T             2885         347          3232       4.64%
u, U             2151         185          2336       3.35%
v, V             477          98           575        0.83%
w, W             447          165          612        0.88%
x, X             279          109          388        0.56%
y, Y             1641         162          1803       2.59%
z, Z             286          89           375        0.54%
∑                47612        9607         57219      82.11%

Digit            Abs.    Rel.
0                695     1.00%
1                892     1.28%
2                1004    1.44%
3                879     1.26%
4                873     1.25%
5                601     0.86%
6                630     0.90%
7                656     0.94%
8                698     1.00%
9                622     0.89%
∑                7550    10.83%

Special character           Abs.    Rel.
`  grave accent             159     0.23%
ˆ  caret                    431     0.62%
_  underscore               3242    4.65%
\  backslash                13      0.02%
|  pipe                     156     0.22%
[  left square bracket      129     0.19%
]  right square bracket     94      0.13%
{  left curly bracket       15      0.02%
}  right curly bracket      14      0.02%
-  hyphen                   664     0.95%
∑                           4917    7.06%

Total                       69686   100.00%
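The shares per character class reported above can be recomputed from any list of nicks. The following is a minimal sketch in Python, not the tooling used for the thesis; the function names and the three-nick sample are assumptions made for illustration, and the caret is assumed to be the ASCII character `^`.

```python
from collections import Counter
import string

# The ten special characters listed in Table 9.4 (ASCII caret assumed).
SPECIALS = set("`^_\\|[]{}-")

def char_class(ch: str) -> str:
    """Return 'letter', 'digit', 'special', or 'other' for a single character."""
    if ch in string.ascii_letters:
        return "letter"
    if ch in string.digits:
        return "digit"
    if ch in SPECIALS:
        return "special"
    return "other"

def class_shares(nicks):
    """Relative share of each character class over all characters of all nicks."""
    counts = Counter(char_class(ch) for nick in nicks for ch in nick)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}

# Hypothetical mini-sample, not the thesis corpus.
sample = ["Guest_161", "tek1024", "clock_"]
shares = class_shares(sample)
```

Applied to the full corpus, the same counting should reproduce the three ∑ rows of Table 9.4, provided the corpus contains only the characters listed there.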

IRC nicks cannot contain characters such as slashes, umlauts, punctuation marks, or whitespace. A hyphen, digit, or space is not allowed at the beginning. A space is omitted or replaced by special characters (see Table 9.27). Another technical limitation is that IRC cannot handle diacritics such as German umlauts (e.g., “ä”, “Ö”, “ü”) or the German ligature “ß” in nicks. The proper way is to replace an umlaut with the underlying vowel (with or without a following “e”) and the ligature with “ss”. The apostrophe is used in English to indicate possession or to write contractions. In IRC, an apostrophe is sometimes omitted, or a grave accent is used as a substitute. A mapping of non-permissible characters is shown in Table 9.5.


Table 9.5: Mapping of non-permissible characters

Character                       Mapping                          Example(s)
space                           omitted                          <IamTheBest>, <CharmyRoman>
                                special character                <Keyboard-Cat>, <b|rdˆdog>
apostrophe   possessive case    omitted                          <DaddysGirl>, <Nobodys_Girl>
                                grave accent                     <Satan`s-Angel>, <ProudSerb`s_DaD>
             contraction        omitted                          <cantyouread>, <Hiitsnick>
diacritic    German umlaut      underlying vowel with “e”        <zahnaerztin_pretty>
                                underlying vowel without “e”     <bayernMunchen>
             German ligature    double “ss”                      <DerGrosseMann>
             accent             letter without diacritic mark    <Cesc_Fabregas>, <deja_vu>
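The mappings in Table 9.5 can be sketched as a small normalization routine. This is an illustration only: `to_nick` and its parameters are hypothetical names, the umlaut table covers only the German umlauts, and the diacritic stripping relies on Unicode decomposition from Python's standard `unicodedata` module.

```python
import unicodedata

# Replacement of German umlauts by the underlying vowel plus "e" (Table 9.5).
UMLAUTS_WITH_E = {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"}

def to_nick(text: str, space: str = "", apostrophe: str = "", umlaut_e: bool = True) -> str:
    """Map characters that are not permitted in IRC nicks (hypothetical helper)."""
    out = []
    # Compose first so that umlauts arrive as single code points.
    for ch in unicodedata.normalize("NFC", text):
        if ch == " ":
            out.append(space)              # omit, or replace by a special character
        elif ch in ("'", "’"):
            out.append(apostrophe)         # omit, or substitute a grave accent
        elif ch == "ß":
            out.append("ss")               # German ligature becomes double "s"
        elif umlaut_e and ch in UMLAUTS_WITH_E:
            out.append(UMLAUTS_WITH_E[ch])
        else:
            # Decompose and drop combining marks: "é" becomes "e", "ü" becomes "u".
            decomposed = unicodedata.normalize("NFD", ch)
            out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)
```

For example, `to_nick("Satan's Angel", space="-", apostrophe="`")` yields the nick form shown in the grave accent row, and `umlaut_e=False` selects the variant without the following “e”.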

Some 40.52% of all analyzed nicks are exclusively created with letters (Table 9.6). A nick cannot start with a digit. Therefore, nicknames consisting only of digits are not possible.

Table 9.6: Character classes of each nickname

Latin alphabet   Digits   Special characters   Abs.   Rel.      Example(s)
yes              no       no                   3216   40.52%    <BhaalWK>, <Jassim>
yes              no       yes                  1908   24.04%    <clock_>, <ˆjenessa>
yes              yes      yes                  1699   21.41%    <mib_nxelq2>, <Valerio_886>
yes              yes      no                   1102   13.89%    <tek1024>, <Guestguy996>
no               yes      yes                  7      0.09%     <[61814]>, <|0_0|>
no               no       yes                  4      0.05%     <``>, <________>
no               yes      no                   0      0.00%
∑                                              7936   100.00%
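The three yes/no columns of Table 9.6 amount to a signature per nick. A minimal sketch follows, assuming that every character that is neither an ASCII letter nor a digit counts as a special character; `class_signature` is a hypothetical name.

```python
import string

def class_signature(nick: str) -> tuple:
    """Return (has_letter, has_digit, has_special) for one nick."""
    has_letter = any(c in string.ascii_letters for c in nick)
    has_digit = any(c in string.digits for c in nick)
    # Assumption: anything that is not an ASCII letter or digit is "special".
    has_special = any(c not in string.ascii_letters + string.digits for c in nick)
    return has_letter, has_digit, has_special
```

Counting the signatures over a corpus with `collections.Counter` would give the absolute frequencies of the table rows.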

9.5.3 Letter case

In general, IRC nicks (and also channel names or commands) are case-insensitive. This means that <AMan> is the same as <Aman>, and only one of them can be online at the same time. Nevertheless, letter case is handled in different ways when nicknames are created.
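Case-insensitive comparison of nicks can be sketched as follows. The sketch assumes the case mapping described in RFC 1459, in which the characters {, }, and | are treated as the lower-case forms of [, ], and \; the mapping actually applied is server-dependent, and the function names are hypothetical.

```python
import string

# RFC 1459 case mapping: A-Z fold to a-z, and [ ] \ fold to { } |.
_RFC1459_FOLD = str.maketrans(
    string.ascii_uppercase + "[]\\",
    string.ascii_lowercase + "{}|",
)

def fold_nick(nick: str) -> str:
    """Return the nick in its folded (lower-case) form."""
    return nick.translate(_RFC1459_FOLD)

def same_nick(a: str, b: str) -> bool:
    """True if both spellings denote the same nick under the assumed mapping."""
    return fold_nick(a) == fold_nick(b)
```

Under this mapping, `same_nick("AMan", "Aman")` is true, which is why only one of the two spellings can be online at the same time.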

In Table 9.7 the letter case of all analyzed nicks is shown in detail. In general, these nicknames are stem-based nicks because they mainly consist of letters (see page 150). Unknown nicknames (2577 nicks) are excluded. Some chatters even used different conventions for the capitalization of their own nicks. For example, the same chatter logged in some sessions as <ˆˆARNOLD> and some as <ˆˆarnold>.


Table 9.7: Letter case

Letter case                                            Abs.   Rel.      Examples
lower-case                                             3390   42.72%    <_what_is_love>, <corvette>
upper-case                                             360    4.54%     <ALASKA>, <ˆJENNˆ>
mixed-case   only first letter in upper-case           303    3.82%     <`Maxi_vacationmode>
             first letter of each word in upper-case   3093   38.97%    <AMuslimGirl>, <Walt>
             alternate each letter                     93     1.17%     <AcTiVaTe>, <BaD_575>
             alternate each word                       30     0.38%     <AXL_roses>, <notNULL>
             random                                    647    8.15%     <Dr-knoK>, <JEsus>
undefined    nick only consists of number(s)           7      0.09%     <`4_8_15_16_23_42>, <[61814]>
             non-stem-based nick                       9      0.11%     <[-_-]>, <XxxxX>, <``>
             mixed-based nick                          4      0.05%     <fun8]>, <Springfield_XD>
∑                                                      7936   100.00%

9.5.4 Maximum nickname length (NICKLEN)

According to RFC 1459, the maximum nickname length that a client can use is 9 characters, but this is actually determined by the server. The minimum length is one character. These limitations restrict the creation of nicknames. Depending on the maximum length, long nicks need to be shortened to make them suitable (see Table 9.8). Frequently-used conventional abbreviations are, for example, “m” (male, man), “f” (female), and “gf” (girlfriend). Dropping letters is a quick and simple way to shorten nicks. In particular, the readability often remains intact when vowels are dropped.

Table 9.8: Variants of shortening

Category               Example(s)
abbreviation           <Youngprof-male>, <Mr-Destructive>
drop letter(s)         <SxySnglMan>
cut off the end        <benJIman‘on‘holi>, <|414RequestTooLo>
arbitrary shortening   <NkinOnHevnsDoor>
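Two of the shortening variants, dropping vowels and cutting off the end, can be sketched in a few lines. The functions below are hypothetical helpers; keeping the first character and falling back to truncation are design assumptions, not rules taken from the corpus. The long input nicks in the usage note are likewise assumed originals.

```python
VOWELS = set("aeiouAEIOU")

def drop_vowels(nick: str) -> str:
    """Drop all vowels except a leading one, so the nick stays recognizable."""
    return nick[:1] + "".join(c for c in nick[1:] if c not in VOWELS)

def cut_off(nick: str, nicklen: int = 9) -> str:
    """Cut the nick off at the maximum length (NICKLEN)."""
    return nick[:nicklen]

def shorten(nick: str, nicklen: int = 9) -> str:
    """Try vowel dropping first; cut off the end only if that is not enough."""
    if len(nick) <= nicklen:
        return nick
    candidate = drop_vowels(nick)
    return candidate if len(candidate) <= nicklen else cut_off(candidate, nicklen)
```

With a limit of 16 characters, `cut_off("|414RequestTooLong", 16)` gives the truncated form listed in Table 9.8, and `shorten("SexySingleMan")` produces a consonant-only variant that fits into 9 characters.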

The NICKLEN parameter depends on the respective IRC network. The maximum lengths allowed are 9 (EFnet), 15 (IrCQ-Net, QuakeNet), 16 (freenode), and 30 (DALnet, GameSurge, SwiftIRC). The average nick length for all channels is 8.78 characters (see Table 9.9). <DrPraetor>, a chatter at #perl, mentions that “this network [freenode] has criminally low nick length limits”. He adds, “On other networks I can fit much more interesting nicknames”. This statement is all the more astonishing because only 186 users (2.34%) take the opportunity of using the maximum possible length.


Table 9.9: Average length of nicknames per channel

              Nickname length           Nick with its length
Channel       Max.   Count   Average    Minimum      Maximum
#cars         30     0       9.00       4 <fork>     18 <Cactus-Jack|Server>
#chat-world   30     1       9.09       3 <era>      30 <ThisNickNerverWillBeRegistered>
#church       30     0       8.68       2 <``>       19 <AzuzephreCommunity->
#defocus      16     7       8.23       3 <AHA>      16 <MannyTheMolecule>
#eminem       15     0       6.80       1 <Q>        12 <MrSkitZoˆAFK>
#england      15     0       7.38       1 <Q>        15 <sandreinalove_b>
#English      15     54      8.67       2 <no>       15 <SilVeR_Sh[a]DoW>
#freenode     16     7       8.04       3 <Inf>      16 <LordGreystoke422>
##hardware    16     1       7.92       3 <row>      16 <black_Nightmare_>
#perl         16     4       7.83       3 <Kev>      16 <master_of_master>
#Romance      15     98      8.98       2 <hi>       15 <Gener[a]lPublic>
#soccer       9      14      6.86       3 <NEC>      9 <JingleBel>
#talk         30     0       9.47       3 <Zap>      29 <City|Sleep|ILY_Kacey|Call|Me|>

9.5.5 Inappropriate nicknames

Inappropriate nicknames, such as abusive or swear words, will be removed by operators. If a user on the auto-kick list (AKICK) attempts to join the channel, a channel service bot (ChanServ) will automatically kick and ban the participant from the channel.

9.6 Compounding/decompounding of nicknames

This section deals mainly with the compounding of stem-based nicknames, especially with the main part stem and its styling. Apart from these, the name of the clan is normally strictly predefined. No detailed research on the part status has been carried out because it is mostly made up of a single word.

Table 9.10: The five most popular templates used for creating nicknames

Template                        Abs.   Rel.      Examples
“Guest_”<number>                375    4.73%     <Guest_161>, <Guest_236>
“Guest”<number>                 180    2.27%     <Guest22410>, <Guest34312>
“Guest_”<number>“_”<number>     49     0.62%     <Guest_109_601>, <Guest_978_828>
“BRAZILGIRL_”<number>           15     0.19%     <BRAZILGIRL_294>, <BRAZILGIRL_841>
“Unknown”<number>               15     0.19%     <Unknown28115>, <Unknown63013>
∑                               634    7.99%
Total                           7936   100.00%

Furthermore, some nicknames look similar to each other, which happens if they are created from a similar idea or through creating new nicks from existing ones. The five most frequently used templates that have identical stem, decoration, and concatenation are shown in Table 9.10. The basic idea of these templates is that random numbers are added at the end. Copying nicknames (<zhang>) and adding numbers (<zhang2008>) is a simple way to vary nicknames. There are numerous traditional mechanisms (which are also used to create, e.g., new English words) and non-traditional mechanisms for constructing new creative IRC nicknames (or, more precisely, their stems).
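Templates of this kind can be found by replacing every run of digits with a placeholder and counting the results. This is a simplified sketch: it only collapses numbers and does not check stem, decoration, and concatenation separately as the analysis in this chapter does. The function name and the sample list are assumptions.

```python
import re
from collections import Counter

def template_of(nick: str) -> str:
    """Replace every run of digits by the placeholder <number>."""
    return re.sub(r"\d+", "<number>", nick)

# Hypothetical mini-sample built from examples in Table 9.10.
nicks = ["Guest_161", "Guest_236", "Guest22410", "Guest_109_601", "Unknown28115", "zhang"]
template_counts = Counter(template_of(n) for n in nicks)
most_common_template = template_counts.most_common(1)[0]
```

In this sample the two `Guest_` nicks collapse into the same template, while <zhang> contains no number and stays unchanged.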

9.6.1 Creating a stem

To find out in detail which parts of speech (POS) a nick consists of, the whole nick, and especially the stem, must be decompounded into single POS and tagged. This step is crucial for answering questions such as the following.

• Which parts of speech do nicks consist of in detail?

• In which order are POS concatenated to a compounded nick?

• Which parts of a nick are omitted in discourse?

Decompounding of nicknames: All nicks are decompounded into single POS with NickDecompounder, which the author has written in Java (see Subsection 11.5.3).

Tagging of nicknames: All nicks decompounded by NickDecompounder are manually checked and marked with a period at the end (“your heart .”). Automatic tagging is then executed with the “Stanford Log-linear Part-Of-Speech Tagger” [TM00] using the “Penn Treebank Tag-set” [Mar+94] (see Table A.1). Tagging can be complex because some words represent more than just one POS (e.g., “access” can be a noun or verb). Incorrectly tagged words are manually corrected.

Clustering of POS tags: The tags are clustered into coarse-grained syntactic categories (see Table 9.11).

Table 9.11: Cluster of Penn Treebank POS Tags

Cluster POS tag   Penn Treebank POS Tags
[JJ]              JJ, JJR, JJS
[NN]              NN, NNS, NNP, NNPS
[PRP]             PRP, PRP$
[RB]              RB, RBR, RBS
[VB]              MD, VB, VBD, VBG, VBN, VBP, VBZ
[WP]              WDT, WP, WP$, WRB

Not every unknown part of a nick refers to a foreign word. Unknown parts can also consist of emoticons, non-words (<zrttrtr>), or pseudowords (<Adys>). Additionally, jargon, slang, and non-traditional mechanisms for the creation of stems make tagging difficult. Table 9.12 shows the four new POS tags that have been added to the “Penn Treebank POS Tags”.

Table 9.12: New POS tags

Tag   Description            Explanation
IW    illegal word           a) non-word: not pronounceable nor meaningful
                             b) pseudoword: pronounceable but not meaningful
MB    mixed-based nick       see Subsection 9.8.3
NSB   non-stem-based nick    see Subsection 9.8.2
UW    unknown word or nick   word or nick of unknown origin


There is no need to use the original POS tag LS (list item marker), “which includes letters and numerals when they are used to identify items in a list” [Mar+94]. The original POS tag SYM (symbol) is divided into “mathematical, scientific and technical symbols” (now a subpart of NSB) and “expressions that aren’t words of English” (now a subpart of IW). The classification of the remaining tags (e.g., CC, CD, UH) is unchanged. Examples are shown in Table 9.13.

Table 9.13: Tagging nicknames

Tag   Description                                Example(s)
CC    coordinating conjunction                   and (<youandmee>), n (<bonnie-n-clyde>)
CD    cardinal number                            47 (<ˆMan47USA>), 25 (<boy25spain>)
DT    determiner                                 The (<ManOfTheYear>), A (<[[[[A-Man4U>)
EX    existential there                          there (<there_was_nothing_to_lose>)
FW    foreign word                               schokokeks (<schokokeks_3388>)
IN    preposition or subordinating conjunction   for (<oneforall>), with (<Always-with-u>)
IW    illegal word                               <dfgdatlrwfr>, <gghdfhgdfh>
JJ    adjective                                  sweetest (<sweetest_lady>), Bad (<[-BadBoy>)
MB    mixed-based nick                           <fun8]>, <danielita_xD>
NN    noun                                       yasmine (<yasmine_72>), eagle (<the---eagle>)
NSB   non-stem-based nick                        <|{0_0}|>, <[-_-]>, <________>
PDT   predeterminer                              such (<suchaniceday>)
POS   possessive ending                          `s (<Satan`s-Angel>)
PRP   personal or possessive pronoun             You (<You_248>)
RB    adverb                                     n0t (<l1k3_n0t>), Just (<JustAGuy>)
RP    particle                                   up (<ce_never_give_up>)
TO    to                                         To (<BoredToTears>), 2 (<talk2me-->)
UH    interjection                               hehe (<hehe-[boy]>), <okee>, <wau>
UW    unknown word or nick                       gt li v (<carla18_gt_li_v25>)
VB    verb                                       call (<call_me>), s (<hesthere>)
WP    wh-determiner, (possessive) wh-pronoun,    what (<whatsyourname>)
      or wh-adverb

Creating POS groups: Sequences of the same tag are merged together into manageable groups as in the examples below:

Nick               Penn Treebank Tag-Set     Cluster             POS group
<MariaPia>         NNP NNP                →  [NN NN]          →  (NN)
<hotsexyman37>     JJ JJ NN CD            →  [JJ JJ NN CD]    →  (JJ NN CD)
<GoodGuy_446_36>   JJ NN CD CD            →  [JJ NN CD CD]    →  (JJ NN CD)
<women>            NNS                    →  [NN]             →  (NN)
<Sabina-in-syd>    NNP IN NNP             →  [NN IN NN]       →  (NN IN NN)
<slimjim>          JJ NNP                 →  [JJ NN]          →  (JJ NN)
<handsome_8286>    JJ CD                  →  [JJ CD]          →  (JJ CD)
<theniceman>       DT JJ NN               →  [DT JJ NN]       →  (DT JJ NN)
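The two steps, clustering fine-grained tags and merging runs of the same cluster tag, can be sketched as follows. The cluster table is taken from Table 9.11; the function name `pos_group` and the list-based representation are assumptions for illustration and do not reproduce the author's Java implementation.

```python
from itertools import groupby

# Cluster definitions from Table 9.11.
_CLUSTERS = {
    "JJ": ["JJ", "JJR", "JJS"],
    "NN": ["NN", "NNS", "NNP", "NNPS"],
    "PRP": ["PRP", "PRP$"],
    "RB": ["RB", "RBR", "RBS"],
    "VB": ["MD", "VB", "VBD", "VBG", "VBN", "VBP", "VBZ"],
    "WP": ["WDT", "WP", "WP$", "WRB"],
}
# Invert to a lookup from Penn Treebank tag to cluster tag.
TAG_TO_CLUSTER = {tag: cluster for cluster, tags in _CLUSTERS.items() for tag in tags}

def pos_group(tags):
    """Map tags to their clusters, then merge consecutive identical cluster tags."""
    clustered = [TAG_TO_CLUSTER.get(tag, tag) for tag in tags]  # unclustered tags stay as-is
    return [tag for tag, _run in groupby(clustered)]
```

For <hotsexyman37>, `pos_group(["JJ", "JJ", "NN", "CD"])` returns `["JJ", "NN", "CD"]`, which corresponds to the POS group (JJ NN CD).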


Table 9.14 shows that the top 10 most frequently used POS groups cover 89.59% of all 7936 nicks with users’ feedback. The POS group (NN) is the most used. (NN) is, e.g., a first or last name, or a name related to a town, country, or film. The frequencies also help with ambiguous nicks. For instance, the nick <theniceman> can be decompounded and tagged into “the nice man” with cluster [DT JJ NN], and into “then ice man” with [RB NN NN]. Neither of the POS groups (DT JJ NN) and (RB NN) is in the top 10. However, parts of (DT JJ NN), namely (DT NN) (49) and (JJ NN) (907), are frequently-used POS groups. Therefore, the decompounding and tagging of the nick <theniceman> into “the nice man” with POS group (DT JJ NN) is more likely.

Table 9.14: Top 10 POS groups

POS group     Abs.   Rel.      Examples
(NN)          3348   42.19%    <women>, <monaliza>
(NN CD)       1972   24.85%    <FTorres9>, <User6>
(JJ NN)       907    11.43%    <slimjim>, <ˆbadboy>
(JJ NN CD)    198    2.49%     <coollady2008>, <GoodGuy_446_36>
(JJ)          188    2.37%     <|cute|>, <crzy>
(FW)          145    1.83%     <je_suis_belle>, <HaKuNa_MaTaTa>
(JJ CD)       136    1.71%     <handsome_8286>, <Crazy_51>
(NN JJ)       94     1.18%     <fransisca_cute>, <girl-cool>
(NN IN NN)    73     0.92%     <ianinAmsterdam>, <Sabina-in-syd>
(DT NN)       49     0.62%     <TheChicken>, <_A-Girl>
∑             7110   89.59%
Total         7936   100.00%

Morphological processes to create stems: As already mentioned in [BS08; Cry04], Internet language uses emoticons, jargon (e.g., the technical term <Bspec>), slang (<macnoob>: a noob is a person who is new or inexperienced in a subject), abbreviations, and contractions (<Letsˆchat>), all of which also occur in stem creation. Another possibility is an alternative spelling, which changes some letters in a word but keeps a similar pronunciation (e.g., <BigDawg> is an alternative spelling of “big dog”). A further interesting variant is spelling with the help of homophones. Homophones are words with the same pronunciation but with different spellings and meanings (e.g., <hairypotter> instead of “Harry Potter”). Various mechanisms of word formation and inflection are used to create IRC nicks. The most common word formation mechanisms are back-formation, blending, clipping, compounding, conversion, derivation, and neologism (e.g., loanword, onomatopoeia) [Bus06; Pla03; Tra07]. There is no clear-cut classification. All these morphological processes are important for nickname creation. Examples are shown in Table 9.15.

Table 9.15: Traditional mechanisms for the creation of stems

Category         Example            Explanation
abbreviation     <ProfLee>          “prof” is an abbreviation for “professor”
acronym          <Radar>            acronym for “radio detection and ranging”
back-formation   <EDIT>             removing suffix “or” from noun “editor”
blending         <pulsar>           combining of the two words “pulse” and “quasar”
borrowing        <zebra_591>        adopted word from the Bantu language
clipping         <Fridge>           reduction of the word “refrigerator”
coinage          <Aspirin_720>      invention of a totally new term
compounding      <EarthQuaqe>       stringing together words “earth” and “quake”
conversion       <green>            noun (referring to a putting-green in golf) is derived from the adjective
derivation       <simply_boyish>    adding suffix “ish” to the noun “boy”
inflection       <badboys>          adding inflectional plural affix “s” to the noun “boy”
initialism       <FBI>              initialism for “Federal Bureau of Investigation”
loanword         <zeitgeist_>       the spirit of the times or age
onomatopoeia     <meow>             sound uttered by cats

A popular strategy is to use self-related nicks, which disclose personal information to the other users. This has already been pointed out by Bechar-Israeli [BI95]. A classification of nicknames that includes personal information is shown in Table 9.16. In contrast, non-self-related (<sky>) and unknown nicks (<cr_yg>) have a high degree of anonymity, which means “that the nickname used by a participant does not reveal any information about the user’s on-line identity” [Lak06].

Table 9.16: Personal information

Category                                               Example(s)
name                         first                     <john117>
                             last                      <Wachert>
                             initials                  <chb>
                             nick                      <Ollie>
age                          years                     <Dana_22>
                             year of birth             <ROGER1975>
location                     country of origin         <From_Russia_>
                             residence                 <Rick_London>
                             country code              <zaggy-nl>, <young_PL>
sexuality                    gender identity           <ukgirl>, <FEMALE>, <[-_English_Man>
                             orientation               <alicia25bi>, <lesbian24>
                             preferences               <foot-fetishman>
interpersonal relationship   marital status            <[Single_Boy]>
                             family                    <grandmothera>
religion                                               <christianman>, <IslamicGirl>
career                       education                 <HighSchoolGirl>
                             job                       <geologist>
physical appearance          aging                     <young_303>
                             appearance                <TallGuy>
                             clothing                  <GuyinBlueJeans>
attitude                     positive or negative view <Hate-My-Self>, <_Life_is_Beautiful_>
behavior                     character                 <Intelligent_guy>, <nicegirl>
emotions                     feeling                   <feelgood>
                             wish                      <StillWannaBeInBali>
                             interjection              <haha>
interest, activity           hobby                     <ˆsportyGirlˆ>, <ArtLove>, <mr_computer08>
                             reason to be online       <letstlkboutsex>, <here4fun>
favorites                    club                      <ChelseaFC_Girl>
                             actor                     <[[[Leonardo_Di_Caprio]]]>

Nevertheless, a newly-created nick may still not be unique. Some creative non-traditional morphological processes change this state and help chatters adopt nicknames that distinguish them from others with the same name (see Table 9.17). Deliberately eccentric strategies to make a nickname unique are, for example, replacing a letter to stand out from other similar variants, or shortening the nickname.

Table 9.17: Non-traditional morphological processes

Category                                        Example(s)
add           number                            <Guest17447>
              element of obscurity              <adamx>
              variable (e.g., for ages)         <RobiX>
drop          vowel (consonant writing)         <sxyfml>
              consonant                         <lngblknhrd>
replace       letter                            <NightKhaos>, <ˆdRaGuLaˆ>
reduplicate   letter                            <ˆLoooVeeeRˆ>, <MMEEGGAA>
              part                              <brarraveheart>
backward                                        <muicalc>
anagram                                         <kunwon1>
swap          syllable                          <ConSeannery>
shift                                           <Rtkwe>
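Two of these processes are mechanical enough to sketch in code: the keyboard shift behind <Rtkwe> (explained by the chatter in Log Example 20) and the check that a nick uses only letters of a base word with some letters missing, as in <kunwon1>. The QWERTY row layout and both function names are assumptions; digits are expected to be stripped from the nick before the second check.

```python
from collections import Counter

# Letter rows of a QWERTY keyboard (assumed layout, letters only).
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]

def shift_left(word: str) -> str:
    """Replace each letter by its left neighbor on the same keyboard row."""
    out = []
    for ch in word.lower():
        for row in QWERTY_ROWS:
            i = row.find(ch)
            if i > 0:
                out.append(row[i - 1])
                break
        else:
            out.append(ch)  # leftmost key or non-letter: keep unchanged
    return "".join(out)

def missing_letters(word: str, nick_letters: str):
    """Letters of `word` that are absent from the nick, or None if the nick
    contains a letter that the word cannot supply."""
    extra = Counter(nick_letters) - Counter(word)
    if extra:
        return None
    return dict(Counter(word) - Counter(nick_letters))
```

`shift_left("tyler")` reproduces the stem of <Rtkwe>, and `missing_letters("unknown", "kunwon")` shows that exactly one “n” is missing.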

The “Universal Leet (L337, L33T, 1337) Converter”, which can be found at the URL http://www.robertecker.com/hp/research/leet-converter.php, is a small by-product of this examination. This script is web-based and written in PHP. It translates text-to-leet and leet-to-text. Leetspeak is a creative form of Internet slang, which can also be utilized for nickname creation. The converter implements different levels of letter mappings. The examples in Table 9.18 show how letters, words, or parts of them are represented in leetspeak. Neither encoding nor decoding is bijective.

Table 9.18: Letter and (part of) word mapping

Letter mapping   Example(s)
a ↔ 4            <Cr4zy-Legend>, <YuN4_Ch4N>
b ↔ 3, |3        <ˆCho03y_girlˆ>, <|3ug>
e ↔ 3            <Fir3blad3>, <Pix3l>
g ↔ 6            <an6eL>
h ↔ |-|          <|-|0T_Guy>
i ↔ 1, |         <Damn_G1RL>, <v|s|ble>
k ↔ |{, ]{       <Sm0|{e>, <]{ilroy>
l ↔ 1, |, |_     <fireba11>, <Breath|ess>, <DazZ|_3R>
n ↔ |\|          <|\|ala>
o ↔ 0            <T0tal>, <v0id>
r ↔ |2           <|2ebeL_girL>
s ↔ 5, c, z      <f00li5h>, <D0c5i5>, <pRinc3ZZ>
t ↔ 1            <AfterDea1h>

(Part of) word mapping   Example(s)
are ↔ r                  <UrBabe>
-ate ↔ 8                 <i_h8_j0ck5>
-ate- ↔ 8                <sk8r_Grl>
for ↔ 4                  <aB0Y4aGal>
for- ↔ 4                 <friend4ev>
one ↔ 1                  <Just1OfTheGuys>
-one ↔ 1                 <som1forme>
to ↔ 2                   <love2chatman>
you ↔ u                  <FuNK-U->
you- ↔ u                 <UrBabe>
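Why the mapping is not bijective becomes visible in a small sketch. The code below is not the author's PHP converter; it uses only a subset of the single-character letter mappings from Table 9.18, and all names are hypothetical.

```python
from itertools import product

# Encoding: a subset of the letter mappings in Table 9.18.
TO_LEET = {"a": "4", "e": "3", "o": "0", "s": "5"}

# Decoding: one leet symbol may stand for several letters.
FROM_LEET = {"4": "a", "3": "eb", "0": "o", "5": "s", "1": "ilt"}

def to_leet(text: str) -> str:
    """Encode text with the single-character mappings above."""
    return "".join(TO_LEET.get(ch.lower(), ch) for ch in text)

def decode_candidates(text: str):
    """Return every plain-text reading of a leet string."""
    options = [FROM_LEET.get(ch, ch) for ch in text]
    return ["".join(choice) for choice in product(*options)]
```

`to_leet("Pixel")` gives the stem of <Pix3l>, whereas `decode_candidates("fireba11")` returns nine readings because each “1” can stand for “i”, “l”, or “t”. A practical decoder would therefore need a dictionary or context to pick “fireball”.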

Several types of orthographic errors, which can either be intentional or unintentional, are found in IRC (see Table 9.19).

Table 9.19: Orthography

Category         Examples
wrong            <sexyandpritty>, <canadianseximan>
missing          <anyting4u>, <luckysot>
extra inserted   <Horney__Man>, <wildflopwer>
transposed       <Ronadl>, <shanghaigilr>
misspelled       <beautyful>, <crazi_boy>

These errors may in turn be due either to typographical errors (errors caused by pressing the wrong keys) or to spelling errors (errors due to insufficient language competence). A misspelled word can be a correct spelling of another word but can also lead to a new nickname. The morphological processes described above can be combined arbitrarily. There are theoretically (and maybe practically) no boundaries.

9.6.2 Styling

Adding a decoration or concatenation does not merely alter the appearance of a nick. It can also make an already-used nickname unique without changing the meaning. Examples for nickname styling with decoration and concatenation are shown in Table 9.20. Both options can also be combined.

Table 9.20: Different styling possibilities

Decoration      Concatenation
Gentleman_      [[webcam-for-u
Gentleman__     [[webcam-for_u
Gentleman___    [[webcam_for_u
ˆGentlemanˆ     [[web-cam-for-u


9.7 Basic structure of IRC nicknames

The basic structure of nicknames can be divided into the following parts: stem, status, clan, concatenation, and decoration. But not every part of this basic structure needs to be present. An overview of the basic structure with possible positions defined by the author is shown in Table 9.21. The position of the stem has the value 0. A negative/positive value of the position is used to address the part before/after the stem.

Table 9.21: Basic structure of nicknames with possible positions

                Position
Part            -5        -4        -3        -2        -1        0         1         2         3         4         5
clan            no        optional  no        no        no        no        no        no        no        no        no
stem            no        no        no        no        no        optional  no        no        no        no        no
status          no        no        no        no        no        no        no        no        no        optional  no
decoration      optional  optional  optional  no        optional  optional  optional  no        optional  optional  optional
concatenation   no        optional  no        optional  no        optional  no        optional  no        optional  no

9.7.1 Stem

From a linguistic perspective, a stem is a part of a word. From the author’s point of view, a stem is the basis or foundation of the nickname. A stem consists of one or more letters or digits, which can form words (<yellow>), phrases (<The_Dark_Dragon>), or sentences (<IamTheBest>). The average number of parts is 1.86. Note that these parts include words and numbers. Because this number is small, sentences are rare. An overview is given in Table 9.22.

Table 9.22: Stem: Part(s) count

Part(s) count   Abs.   Rel.      Example(s)
1               2646   33.40%    <black>, <Gentle__>, <|cute|>, <good>
2               4039   50.98%    <One-Kiss>, <Sugar_875>, <TexasGirl>
3               1004   12.67%    <art_m_45_>, <Call_Of_Booty>, <oneforall>
4               187    2.36%     <Male_With_Cam1>, <within-the-dark-mind>
5               39     0.49%     <JIM_UK_MSNonCAM>, <aB0Y4aGal>
6               6      0.08%     <the_man_who_sold_the_world>
7               2      0.03%     <i-am-a-girl-for-a-reason>
∑               7923   100.00%
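The reported average of 1.86 parts per stem follows directly from the absolute frequencies in Table 9.22, as this short check shows (the variable names are illustrative only).

```python
# Absolute frequencies from Table 9.22: number of parts -> number of stems.
parts = {1: 2646, 2: 4039, 3: 1004, 4: 187, 5: 39, 6: 6, 7: 2}

total_stems = sum(parts.values())
total_parts = sum(count * n for count, n in parts.items())
average_parts = total_parts / total_stems
```

The weighted sum is 14729 parts over 7923 stems, which rounds to 1.86.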

9.7.2 Clan

Characters before the stem can stand for some sort of game clan (electronic sports) or other organization. A clan is a group of users that play games together over the Internet or on local area networks. IRC, and especially the IRC network QuakeNet, is very popular among players of many different games. Therefore, the number of clan tags, which show the membership of a specific clan, is significantly higher there than in other networks. A clan tag is usually a shortened form of words or phrases (see Table 9.23).


Table 9.23: Clan

Example           Explanation
<[p]SandMan>      “[p]” is a distinctive mark for a Counter-Strike team. “p” means prominent.
<[rYs]ToScA->     “[rYs]” is an Unreal Tournament 99 clan. “rYs” means “resurrected Yoga slaughterers”.
<LLˆspyhunter>    “LL” (short for “LatterligLett”) is a Counter-Strike clan. It means “ridiculously easy”.

9.7.3 Status

It is common courtesy that users change their nickname with the command “/nick” to indicate their current status. Thus, for example, everyone can see at a glance that the user is currently not available for a while and cannot read messages right now (see also page 160). For example, statuses can be abbreviations, adjectives, or nouns. They are primarily single words. The most frequent status is named “away”. A list of statuses (with or without additional information) is shown in Table 9.24.

Table 9.24: Status

                     Example(s)
Category             Status                                                       Nick
unspecified action   AFK (away from keyboard), away, BNC (bouncer), BUSY, Gone,   <|FlorianˆoFF>
                     IDLE, off, Offline, out
time, date           bbiab (be back in a bit), BBL (be back later),               <[PowerZ|BRB]>
                     BRB (be right back), BBS (be back soon)
specified action     Bath, cook, dinner, eating, FishinG, food, Guitar,           <`jordan|Eating>
                     meditating, phone, Shower, Sleeping, STUDY, training,
                     work, zzz
location             AtWork, Bath-ROOm, Bed, Doctor, Home, Market, pooltime,      <Samurai|Doctor>
                     Pub, School, store, Toilet
time, date           in2-5, till18th                                              <Guru|BRB-in2-5>
feeling, emotion     Sick, Depressed, Sad, Headache, Enraged,                     <Smurf|Sad>
                     ILY_Kacey|Call|Me|

9.7.4 Decoration

The main parts (stem, clan, and status) can be decorated in front of, within, or behind a nick. In particular, special characters (<}{Muffin}{>) beautify nicknames, but letters, especially the letter x (<XxmelixX>), are also used. Decoration can highlight letters, words, or parts of words. Most of the time, chatters use the same look and style of nick for individualization. Examples for text-decorated stems are given in Table 9.25.


Chapter 9 Creation of IRC nicknames

Table 9.25: Decoration of a stem

Category                          Example(s)
in front of                       <][wayne][>, <^^Anne>, <`rocky>
within    split stem              <rapha^ela>, <memori_es>, <Ser`ena>
          highlight (letter)      <Sa[i]nT>, <gH[^0^]ST>, <[T]I[G]E[R]>, <D_R_A_C_U_L_A>
          highlight (word)        <{I}{Walk}{Alone}>, <hot{man}>
          highlight (part)        <{G}EN{TLE}MAN>
behind                            <iceman_->, <Coach__>, <magnets^^>

An overview of the top 5 decorations for each position is shown in Table 9.26. Decoration is mostly placed in front of and behind the stem.

Table 9.26: Decoration (Top 5 of each position)

Position                   Character   Abs.      Rel.
-5  in front of clan       [              7    77.78%
                           _              1    11.11%
                           `              1    11.11%
-4  within clan            (none)         0   100.00%
-3  behind clan            ]              7   100.00%
-1  in front of stem       ^             87    27.27%
                           _             83    26.02%
                           [             28     8.78%
                           ^^            21     6.58%
                           `             21     6.58%
 0  within stem            _             28    26.92%
                           -             21    20.19%
                           [             17    16.35%
                           ^             11    10.58%
                           `              8     7.69%
 1  behind stem            _            191    30.08%
                           ^             88    13.86%
                           -             54     8.50%
                           `             52     8.19%
                           __            43     6.77%
 3  in front of status     [              8    53.33%
                           {              3    20.00%
                           `              2    13.33%
                           |              2    13.33%
 4  within status          (none)         0   100.00%
 5  behind status          ]              8    42.11%
                           }              4    21.05%
                           ^              2    10.53%
                           _xx            2    10.53%
                           |              2    10.53%


9.7.5 Concatenation

Additionally, the main parts (stem, clan, status) and parts of them can be concatenated with different characters. In general, IRC nicks with spaces are not allowed (see the restriction in Section 9.5). Connecting two POS without spaces is the most common way to concatenate. An overview is given in Table 9.27.

Table 9.27: Concatenation (Top 5 of each position)

Position                        Char     Abs.      Rel.
-4  within clan                 (none)      0   100.00%
-2  between clan and stem       |           1   100.00%
 0  within stem                 space    3558    52.28%
                                _        2675    39.30%
                                -         435     6.39%
                                ^          51     0.75%
                                `          35     0.51%
 2  between stem and status     |          50    53.76%
                                `          15    16.13%
                                -           9     9.68%
                                ^           9     9.68%
                                _           9     9.68%
 4  within status               space       8    44.44%
                                |           5    27.78%
                                -           2    11.11%
                                ^           2    11.11%
                                _           1     5.56%

9.8 Classification of nicknames

The results of the analysis show that nicknames can be divided into stem-based (99.84%), non-stem-based (0.11%), and mixed-based nicks (0.05%). After removing clan, decoration, status, and concatenation from the whole nickname, the basis of the nick remains. Table 9.28 presents the basis of the nick <[p]Xxx666xxX_AFK>.

Table 9.28: Basis of the nick <[p]Xxx666xxX_AFK>

                 Character number
Removed part     16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1
clan              [  p  ]  X  x  x  6  6  6  x  x  X  _  A  F  K
status                     X  x  x  6  6  6  x  x  X  _  A  F  K
concatenation              X  x  x  6  6  6  x  x  X  _
decoration                 X  x  x  6  6  6  x  x  X
basis                               6  6  6


In this example, it is assumed that “666” is not part of a decoration. Therefore, the nick <[p]Xxx666xxX_AFK> is stem-based. If “666” is (part of) a decoration, the nick is a non-stem-based nickname. Whether the basis is a decoration or not is known only to the creator of the nickname. Table 9.29 gives an overview of this classification.

Table 9.29: Classification of nicknames

Basis contains ...               Nickname contains ...       Classification
Stem   ASCII art or              Clan, status,               Stem-b.   Non-stem-b.   Mixed-b.
       special character         concatenation,
                                 or decoration
✗      ✗                         ✓                           ✗         ✓             ✗
✗      ✓                         ✗                           ✗         ✓             ✗
✗      ✓                         ✓                           ✗         ✓             ✗
✓      ✗                         ✗                           ✓         ✗             ✗
✓      ✗                         ✓                           ✓         ✗             ✗
✓      ✓                         ✗                           ✗         ✗             ✓
✓      ✓                         ✓                           ✗         ✗             ✓
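
The classification in Table 9.29 depends only on what the basis contains, which can be expressed as a small decision function. The following is a minimal sketch in Python; the function name and the two boolean parameters are assumptions made for illustration and are not taken from the analysis software.

```python
def classify_nick(basis_has_stem: bool, basis_has_art: bool) -> str:
    """Classify a nick from what its basis contains (cf. Table 9.29).

    basis_has_art: the basis contains ASCII art or special characters.
    Clan, status, concatenation, and decoration do not change the class.
    """
    if basis_has_stem and basis_has_art:
        return "mixed-based"
    if basis_has_stem:
        return "stem-based"
    return "non-stem-based"


# <[p]Xxx666xxX_AFK> with "666" read as a stem:
print(classify_nick(basis_has_stem=True, basis_has_art=False))
# The same nick with "666" read as (part of) a decoration:
print(classify_nick(basis_has_stem=False, basis_has_art=True))
```

Because only the creator knows how the basis is meant, the two boolean inputs would have to come from a manual or heuristic judgment.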

Depending on the IRC client, some nicknames are displayed with special signs in front of them. These signs indicate that the users have individual rights (and duties) in a channel. For example, an at sign (“@”) in front of a name marks a channel operator, who has the power to kick and ban people. Chatters with a plus sign (“+”) have voice privileges and can talk in moderated channels (“normal” users cannot). Further user mode signs are “%” (user is a half-op on the current channel), “&” (user is an admin), and “~” (user is founder/owner of the current channel). These special signs are not part of the nick.
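
Since the mode signs are not part of the nick, a user list has to be cleaned before nicknames are compared. The sketch below is illustrative only: the names MODE_PREFIXES and split_mode_prefix are hypothetical, and it assumes that at most one sign is shown per user.

```python
from typing import Optional, Tuple

# User mode signs as described in the text above.
MODE_PREFIXES = {
    "~": "founder/owner",
    "&": "admin",
    "@": "channel operator",
    "%": "half-op",
    "+": "voice",
}


def split_mode_prefix(displayed: str) -> Tuple[str, Optional[str]]:
    """Split a displayed name into (nick, role); role is None for normal users."""
    if displayed and displayed[0] in MODE_PREFIXES:
        return displayed[1:], MODE_PREFIXES[displayed[0]]
    return displayed, None


print(split_mode_prefix("@Zoffix"))
print(split_mode_prefix("mauke"))
```

None of the five signs is a valid nickname character, so stripping a leading sign cannot damage a nick.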

9.8.1 Stem-based nickname

Stem-based nicks are the most frequently used ones in IRC. For example, the hypothetical nick <^_^[p]Germ{a}n_boy-15^_^|Away> is subdivided into the parts illustrated in Table 9.30.

Table 9.30: The parts of the nick <^_^[p]Germ{a}n_boy-15^_^|Away>

                Position
Part            -5     -4     0                  1      2     4
clan                   [p]
stem                          Germ{a}n_boy-15
status                                                        Away
decoration      ^_^    [p]    Germ{a}n_boy-15    ^_^
concatenation                 Germ{a}n_boy-15           |

In almost all cases, the stem is the fixed main part of the nick rather than an optional one. However, the stem can be empty: for example, the user <blizzard-> went away and changed the nick to <_away>. Additionally, a handful of users created nicks with a different basic structure in which the order of stem and status is swapped (see Table 9.31). Further examples are <brb-susu-time> (renamed from <susu-sipper>), <afkmum> (<animum>), and <[Away]Ollie> (<Ollie`>).


Table 9.31: The parts of the nick <BRB_Ghost>

                Position
Part            -4     -2    0
status          BRB
stem                         Ghost
concatenation          _
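
Both orders of stem and status can be recognized with a list of known statuses. The following Python sketch only illustrates the idea; KNOWN_STATUSES is a hypothetical subset of Table 9.24, and split_status is not the decompounding procedure used in this work. Nicks without a concatenation character (such as <afkmum>) are not covered.

```python
import re

# Hypothetical subset of the statuses listed in Table 9.24.
KNOWN_STATUSES = {"afk", "away", "bbl", "brb", "busy", "off"}
# Concatenation characters observed between stem and status.
CONCATENATORS = r"[|`^_-]"


def split_status(nick):
    """Return (stem_part, status); status is None if no known status is found."""
    # Usual order: stem, concatenator, status (e.g., Valentino`Away).
    match = re.match(r"^(.+?)" + CONCATENATORS + r"([A-Za-z]+)$", nick)
    if match and match.group(2).lower() in KNOWN_STATUSES:
        return match.group(1), match.group(2)
    # Swapped order: status, concatenator, stem (e.g., BRB_Ghost).
    match = re.match(r"^([A-Za-z]+)" + CONCATENATORS + r"(.+)$", nick)
    if match and match.group(1).lower() in KNOWN_STATUSES:
        return match.group(2), match.group(1)
    return nick, None


for example in ["Valentino`Away", "BRB_Ghost", "jason_k"]:
    print(example, split_status(example))
```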

The 10 most popular structure templates are shown in Table 9.32. They cover 98.54% of the types; a further 31 templates cover the remaining 1.46%.

Table 9.32: Top 10 basic structure templates

Pos. -1   Pos. 0                  Pos. 1   Pos. 2   Pos. 4   Frequency
Decor.    Stem   Decor.   Conc.   Decor.   Conc.    Status   Abs.   Rel.       Example(s)
no        yes    no       yes     no       no       no       4810    60.71%    <jason_k>, <JayJay>
no        yes    no       no      no       no       no       2153    27.17%    <women>, <sniper>
no        yes    no       no      yes      no       no        303     3.82%    <Infinito->, <carl_>
no        yes    no       yes     yes      no       no        143     1.80%    <no_way_>, <MrBig->
yes       yes    no       no      yes      no       no         85     1.07%    <{rani}>, <_Mike_>
yes       yes    no       yes     no       no       no         81     1.02%    <[_m36>, <_miss_egypt>
yes       yes    no       yes     yes      no       no         68     0.86%    <^cube3^>, <[Jon-Hall]>
yes       yes    no       no      no       no       no         67     0.85%    <_fanmeile>, <_Toni>
no        yes    no       no      no       yes      yes        51     0.64%    <Valentino`Away>
no        yes    yes      no      no       no       no         46     0.58%    <fri^da>, <rapha^ela>

∑ (top 10)                                                   7807    98.54%
All templates                                                7923   100.00%

9.8.2 Non-stem-based nickname

This type consists mainly of special characters used to create emoticons (a blend of emotion and icon), festoons, or other ASCII art objects. Eastern-style emoticons can be read without turning the head to the side. Similar-looking emoticons are bixies (BIXies), which were used by BIX (Byte Information Exchange). Examples are given in Table 9.33.

Table 9.33: Non-stem-based nicknames

Category                          Examples
ASCII art   emoticon              <d^_^b>, <d[o_o]b>, <(*_*)>, <[-_-]>
            object    festoon     <o-Oo_0_oO-o>, <O-OO-O-o-o-o-o_o>, <XxxxX>
                      else        <^__----------^>, <|_b>, <^o^>
special character                 <{}>, <|>, <\\>

9.8.3 Mixed-based nickname

As the name suggests, mixed-based nicks are a combination of stem-based and non-stem-based types. For example, the nick <Springfield_XD> is made up of the stem “Springfield” and the emoticon “XD”, which expresses hearty laughter. Further examples are given in Table 9.34. Note that adding a status to an emoticon is another possibility for creating mixed-based nicks.

Table 9.34: Mixed-based nicknames

Category        Examples
mixed-based     <fun8]>, <danielita_xD>

9.9 Chapter summary

IRC nicknames were analyzed in this chapter. It presented the restrictions on IRC nicknames and their basic structure. Nicknames were decompounded into their parts of speech (POS) for detailed analysis. The next chapter analyzes the use of IRC nicknames in discourses.


CHAPTER 10 Use of IRC nicknames in English chatroom discourse

Once connected to an IRC network via an IRC client program (such as mIRC), every user has his/her own unique nickname within the network. After someone joins a channel, IRC presents a page containing the room’s conversation and a list of logged-in users. These users include both human beings and software programs (chatbots).

This chat discourse analysis gives an overview of how nicknames are written in discourse and how they are used to address chatters. Different variations of direct addressing and inexact spelling make automatic detection of nicknames and unambiguous reference difficult. How written nicknames are changed within IRC discourses is summarized in Table 10.1. It is necessary to analyze nicknames in two ways: (1) the specific way a nick is written (inside) and (2) the nick’s surroundings (outside). A one-to-one string matching of logged-in users and written nicknames often does not work, for example, if the nickname is not written accurately (e.g., mistyped).

Table 10.1: Use of nicknames within IRC discourses

Category                                                                      Topic
Outside of nick
  freestanding nick                                                           direct addressing
    (a) the sender’s message starts with a nick followed by a space
    (b) the nick is surrounded by the start and end of the sender’s message
    (c) the nick is surrounded by spaces
    (d) a space is followed by a nick at the end of the sender’s message
  no freestanding nick                                                        direct addressing,
    (e) else, i.e., not (a), (b), (c), or (d)                                 punctuation
Within nick
  accurately written nick                                                     -
  not accurately written nick                                                 orthographic errors,
                                                                              text normalization,
                                                                              saving keystrokes, time, and effort,
                                                                              creativity

The topics of Table 10.1 are described in the next sections.
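
The “outside” cases (a) to (e) of Table 10.1 can be distinguished with plain string operations. The sketch below is a simplified illustration under the assumption that the nick is written exactly as logged in; the function name freestanding_case is hypothetical.

```python
def freestanding_case(message: str, nick: str):
    """Return the case letter of Table 10.1, or None if the nick does not occur."""
    if nick not in message:
        return None
    if message == nick:
        return "b"  # nick is the whole message
    if message.startswith(nick + " "):
        return "a"  # nick at the start, followed by a space
    if message.endswith(" " + nick):
        return "d"  # space followed by nick at the end
    if " " + nick + " " in message:
        return "c"  # nick surrounded by spaces
    return "e"      # not freestanding, e.g., followed by punctuation


print(freestanding_case("xteddy is a bot?", "xteddy"))
print(freestanding_case("GPT: good point.", "GPT"))
```

Case (e) is the one that needs the additional handling of punctuation described later in this chapter.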

10.1 Information on the general steps

This chapter uses Dataset 1b, which is described in Section 8.1.


10.2 Focus on chat communication

A comparison of discourse between IRC and spoken English or face-to-face communication is described in Hentschel [Hen98] and Kortti [Kor99]. Additional linguistic perspectives on IRC discourse are analyzed in Doell [Doe98], Stevenson [Ste00a], and Blakeman [Bla04]. Davis and Brewer [DB97] comment that computer-mediated communication “has many characteristics of both speech and writing”. Generally, when compared with oral speech, typing speed in text-based chats is slow [DH05]. To guarantee fluid written chat communication, the participants must act and react quite quickly [Rüg07]. Additionally, a slow response time bores discourse partners: they forget or lose interest before getting an answer.

Key strategies have been developed to increase typing speed, “namely to save time to keep up with the speed of conversation, ...” [Seg02] or to reduce the number of keystrokes. Segerstad [Seg02] mentions, for instance, “features related to space, punctuation, spelling and case constitute strategies ...”, and “common grammatical features”. There are also expressions such as abbreviations or emoticons. A disadvantage is the prevalence of orthographic errors due to fast typing [Rin+06; Tav07]. Words and phrases can help chatters stand out from the online crowd. Sun [Sun10] “pointed out that besides some common word-forming methods like derivation and compounding, abbreviation, blending are the typical methods to form Internet words”. Leetspeak is another possibility for creating new words. It is a form of Internet slang (also known as netspeak or chatspeak). In leetspeak, or leet for short, letters may be replaced by similar letters, numbers, or special signs. For example, “leet” would become “1337” [Per+08]. Döring [Dör03] distinguishes three main functions of Internet slang: a time-saving economy function, an identity function, and an interpretation function. The questions that arise here are whether and how Internet language influences the creation of a nickname or the spelling of a written nick in discourse.

10.3 Direct addressing

Direct addressing occurs when chatters insert nicknames of other users for addressing [Nas05]. This action prevents discourse confusion (see Chapter 10.3). Some 14.87% of all 8937 investigated public chat messages include at least one nickname.

Table 10.2: Single direct addressing

                                  Frequency
Category                          Abs.   Rel.       Example
colon and space (netiquette)       781    58.77%    <CBG> GPT: good point.
comma and space                    100     7.52%    <sarkar112> MooCow, hi :P
dropped colon and space             79     5.94%    <Wigyan> xteddy is a bot?
“-->” and space                     24     1.81%    <B_O_F_H> GPT--> but yes it is
semicolon and space                  8     0.60%    <eyecue> miranda; how do you figure?
else (e.g., within a message)      337    25.36%    <dmb> hey Starnestommy :D

Total                             1329   100.00%

If direct addressing is necessary, a message should start with the receiver’s nickname, followed by a colon and a space. Some 77.13% of all direct addresses start with a nickname, and 58.77% follow the style of the first example in Table 10.2. Other observed styles at the beginning involve dropping the colon or replacing it with other characters. Additionally, nicknames are found within or at the end of messages, which makes detection more complex.

It is important to note that not every word followed by a colon at the beginning of a message is a nickname. For example, this word can be part of a quotation, note, enumeration, or definition (see Log Example 21).

Log Example 21
1 <wigyanpy> quotes: “ CBG: Male? Female? Martian?”
2 <ei> btw: iron man is just a name
3 <tonny_m> First: How are you?
4 <Telek> CFM: Cubic Feet per Minute.
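
The two observations above (a leading nick followed by a separator, and leading words that are not nicknames) suggest checking each leading word against the list of logged-in users. The Python sketch below is a simplified illustration; the function name leading_addressee, the accepted separators, and the case-insensitive comparison are assumptions.

```python
import re


def leading_addressee(message, logged_in):
    """Return the logged-in nick that the message starts with, or None."""
    # First word, optionally followed by ":", ",", or ";", then a space.
    match = re.match(r"^(\S+?)([:,;])? ", message)
    if not match:
        return None
    by_lower = {nick.lower(): nick for nick in logged_in}
    return by_lower.get(match.group(1).lower())


users = ["GPT", "MooCow", "xteddy", "CBG"]
print(leading_addressee("GPT: good point.", users))
print(leading_addressee("quotes: “ CBG: Male? Female? Martian?”", users))
```

A word such as “quotes” or “btw” is rejected simply because no logged-in user carries that name; a user who is actually called <btw> would still produce a false positive.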

Multiple direct addressing also occurs within a single message. The recipient names are listed and separated by different characters (with or without a trailing space), as shown in Table 10.3.

Table 10.3: Multiple direct addressing

Category                                      Example
comma and space                               <sarkar112> hi MoiraA, Jacco, Kagome
                        transposed            <tom100> hello ,00Jafo ,00Sir
comma and no space                            <academy> rindolf,vincent: thanks
semicolon and no space                        <Nadine_635> HI SHADOW;DARIO
slash and no space                            <pigskin_chaser> Starnestommy/hrist: cool!
ampersand and space     one ampersand         <Daniel0> jpds & nalioth: doesn’t work
                        two ampersands        <pkrumins> hello Zoffix && mauke
conjunction word “and” and space              <Miranda> hi theblue and KangGuru
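
All separators in Table 10.3 are characters that cannot occur inside an IRC nickname, so a message can be split on them and each token looked up in the user list. This is a minimal sketch with hypothetical names (addressed_nicks, users); it matches whole tokens only and ignores case.

```python
import re


def addressed_nicks(message, logged_in):
    """Return all logged-in nicks that occur as separate tokens in the message."""
    by_lower = {nick.lower(): nick for nick in logged_in}
    found = []
    # Split on whitespace and on the separators comma, semicolon, slash,
    # ampersand, and colon; the word "and" is simply an unmatched token.
    for token in re.split(r"[\s,;/&:]+", message):
        nick = by_lower.get(token.lower())
        if nick and nick not in found:
            found.append(nick)
    return found


users = ["rindolf", "vincent", "Zoffix", "mauke"]
print(addressed_nicks("rindolf,vincent: thanks", users))
print(addressed_nicks("hello Zoffix && mauke", users))
```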

Log Example 22
1 * worried is now known as darkmyst
2 <computer_user> @worried, your lol key is stuck
3 * Guest_585 ([email protected]) has joined #English
4 <MichaelFromH3ll> hey mithos are you here?
5 <Guest_585> how r u micharl
6 <MichaelFromH3ll> I’m fine guest
7 <MichaelFromH3ll> why don’t you change your nick mithos
8 <MichaelFromH3ll> call yourself mithos
9 <MichaelFromH3ll> I like that name
10 * Guest_585 is now known as mithos
11 <notion> sfoster: Mibbit in “Dojo Toolbox”. Release date? ;-)
12 <notion> Sorry wrong channel ...
13 <Kagome> well gotta go sleep and then i will go bye bye
14 * Kagome (i=4a0d651f@gateway/web/ajax/mibbit.com/x-576faab69ecc122d) has left #defocus
15 <miranda> Kagome: bye e-mail..drat...


Sometimes there is no link between the nickname used in discourse and the original logged-in one. This problem occurs when, for example, (1) the chatters use another nick in the meantime (e.g., by changing nicknames; Log Example 22, lines 1 and 2); (2) they do not use their regular nicks (lines 3 to 10); (3) the message including a nick is entered in the wrong channel (lines 11 and 12); or (4) the user leaves the channel or quits the chat (lines 13 to 15). Conversely, some words in discourse coincide with the names of logged-in users merely because they happen to fit the context (see example with <lol> on page 136).

Direct addressing often occurs (a) after joining the channel (greeting), (b) before leaving the channel or quitting the network (farewell), and (c) when expressing thanks. In many cases, nicknames occur after special signal words (such as “hello” in “<Werdna> hello RichiH”). Examples of signal words are shown in Table 10.4.

Table 10.4: Signal words

Category                 Examples
greeting                 bonjour, greetings, hello, hey, hi, hiya, how are you, what’s up
farewell                 adios, aloha, bbl, bye, ciao, cu, cya, farewell, g2g (got to go), goodbye,
                         goodnight, night, peace out, see you, ttfn (ta ta for now), ttyl (talk to you later)
expressions of thanks    cheers, merci, thank you, thanks, thx, tnx, ty (thank you)

Exceptions are signal words that are used alone, in combination with punctuation (e.g., “hi!”), in messages to everybody (“hi there”, “bye then”), or with named groups, e.g., in the positive sense of friends or partners (“hi bro”); see Table 10.5. These signal words are used in Subsection 11.3.3 to detect nicknames.

Table 10.5: Words after signal words that are not usually nicknames

Category                       Examples
everybody                      all, room, channel, world, people, everyone, now, there, then,
                               smileys like “:)” or “:-)”
friend/partner, rival/enemy    babe, bro(ther), chicken, dudes, folk, friend, geek, guy, loser, mate,
                               mom, partner, sis(ter), stranger
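
Tables 10.4 and 10.5 together describe a simple heuristic: the word after a signal word is a nickname candidate unless it belongs to the exception list. The following Python sketch illustrates this under simplifying assumptions; the word sets are small hypothetical subsets of the two tables, multi-word signal words (such as “how are you”) are ignored, and the function name is invented.

```python
# Hypothetical subsets of Table 10.4 and Table 10.5.
SIGNAL_WORDS = {"hello", "hey", "hi", "hiya", "bye", "cya", "thanks", "thx", "ty"}
NOT_NICKS = {"all", "room", "channel", "world", "people", "everyone",
             "now", "there", "then", "bro", "dudes", "guy", "mate"}


def candidate_after_signal(message):
    """Return the word after the first usable signal word, or None."""
    words = message.split()
    for index, word in enumerate(words[:-1]):
        if word.lower() not in SIGNAL_WORDS:
            continue
        following = words[index + 1].strip(".,!?:;")
        # Smileys such as ":)" contain no letters or digits after stripping.
        if not any(char.isalnum() for char in following):
            continue
        if following.lower() not in NOT_NICKS:
            return following
    return None


print(candidate_after_signal("hello RichiH"))
print(candidate_after_signal("hi there"))
```

The returned word is only a candidate; it still has to be compared with the nicknames of the logged-in users.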

10.4 Tracking of nicknames

Chatters are potential discourse partners as long as they remain in the channels, and they can be receivers of written chat messages. Therefore, it is necessary to observe users and their nicknames to see which ones join, leave, or quit the channels, or change nicks. Beyond simple nickname comparison, a smarter approach is to compare IRC hostmasks. A user address (hostmask) on IRC networks typically consists of three parts: a nickname, the ident (user id), and either the hostname or the IP address. For example, a hostmask looks like “[email protected]” (“!” and “@” are separator characters). Commands such as WHO or WHOIS inform people about the specified nick [Cha00]. Log Example 23 shows changes for one specific user, who is usually called <Dysaniak>. The tracking of nicknames is used in Subsection 11.4.1 to build the history of used nicknames for communicators.
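
Splitting a hostmask at its two separator characters yields the three parts named above. The sketch below is illustrative only; the function name parse_hostmask and the sample hostmask are hypothetical.

```python
def parse_hostmask(hostmask):
    """Split 'nick!ident@host' into (nick, ident, host)."""
    nick, _, rest = hostmask.partition("!")
    ident, _, host = rest.partition("@")
    return nick, ident, host


# Hypothetical hostmask, used only for illustration.
print(parse_hostmask("SomeNick!someident@host.example.org"))
```

Comparing ident and host in addition to the nick allows a user to be recognized even after a nick change, as long as the connection stays the same.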


Log Example 23
1 [28.06.2008] * Dysaniak ([email protected]) has joined #defocus
2 [28.06.2008] * Dysaniak ([email protected]) has left #defocus
3 [01.07.2008] * Dysaniak ([email protected]) has joined #defocus
4 [01.07.2008] * Dysaniak is now known as Inf
5 [01.07.2008] * Inf is now known as Dysaniak
6 [01.07.2008] * Dysaniak is now known as GP1
7 [01.07.2008] * GP1 is now known as Inf
8 [01.07.2008] * Inf is now known as Dysaniak
9 [01.07.2008] * Dysaniak is now known as DeadBaby
10 [01.07.2008] * DeadBaby is now known as A
11 [01.07.2008] * A is now known as Aman
12 [01.07.2008] * Aman is now known as Skeleton
13 [01.07.2008] * Skeleton is now known as NaughtyChild
14 [01.07.2008] * NaughtyChild is now known as Dysaniak
15 [01.07.2008] * Dysaniak is now known as Paraelectri1
16 [01.07.2008] * Paraelectri1 is now known as meninslack
17 [01.07.2008] * meninslack is now known as Dysaniak
18 [01.07.2008] * Dysaniak ([email protected]) has left #defocus
19 [06.07.2008] * Dysaniak ([email protected]) has joined #defocus
20 [06.07.2008] * Dysaniak ([email protected]) Quit (“Leaving.”)

Table 10.6: Visualization of nick changes

Knot   Nickname
A      <Dysaniak>
B      <Inf>
C      <GP1>
D      <DeadBaby>
E      <A>
F      <Aman>
G      <Skeleton>
H      <NaughtyChild>
I      <Paraelectri1>
J      <meninslack>

[Visualization: network diagram (top) and timeline (bottom) of the nick changes]

<Dysaniak> seems to be the user’s regular nick, because (a) it is used most frequently and for the longest periods, and (b) he always logs in as <Dysaniak> (most IRC clients include a setting to automatically set a nickname at startup). Note that he floods the channel with his nick changes (lines 4 to 14), but is not banned. Two visualizations of the above log are shown in Table 10.6. In the top figure (network diagram), nicknames are represented by knots, and nick changes by sequentially numbered links. Additionally, the lower figure visualizes a timeline and knots with different states (white: change nickname; green: join channel; red: leave channel; black: quit/disconnect from the server). Further possibilities of statistical analysis could include, for example, the number of written messages, words, characters, addressees, words per line, last online time, or active or inactive status.
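
A history of nick changes can be built by following the “is now known as” messages of a log. The Python sketch below is a simplified illustration that follows the chain by name only (not by hostmask) and takes the most frequently occurring nick as the regular one; the function names are hypothetical, and the duration of use is not considered.

```python
import re
from collections import Counter

RENAME_RE = re.compile(r"\* (\S+) is now known as (\S+)")


def nick_history(log_lines, start_nick):
    """Follow the renames of one user, starting from start_nick."""
    history = [start_nick]
    for line in log_lines:
        match = RENAME_RE.search(line)
        if match and match.group(1) == history[-1]:
            history.append(match.group(2))
    return history


def regular_nick(history):
    """Return the nick that occurs most often in the history."""
    return Counter(history).most_common(1)[0][0]


log = [
    "[01.07.2008] * Dysaniak is now known as Inf",
    "[01.07.2008] * Inf is now known as Dysaniak",
    "[01.07.2008] * Dysaniak is now known as GP1",
]
history = nick_history(log, "Dysaniak")
print(history, regular_nick(history))
```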

Some 231 nick changes (“is now known as”) were found and analyzed. The results in Table 10.7 show how nicks are renamed while chatting (top 5). In most cases, the stem is changed into a completely new one. A common reason for changing the current nickname is that the user cannot pay attention to IRC at the moment. In this case, the nickname is changed into a new one that includes an additional argument (status). Some 13.85% of all nick changes serve specifically to change the status.

Table 10.7: How nicks are renamed while chatting (Top 5)

                                                Frequency         Nickname (example)
Stem       Decoration   Concatenator   Status   Abs.   Rel.       Old           New
new        -            -              -         137    59.31%    <Diluvium>    <daemon>
-          remove       -              -          29    12.55%    <guru_>       <guru>
-          -            add            add        15     6.49%    <Qst>         <Qst_away>
-          -            remove         remove      9     3.90%    <Qst_away>    <Qst>
add part   -            -              -           8     3.46%    <brown_ca>    <brown_cat>

∑ (top 5)                                        198    85.71%
All nick changes                                 231   100.00%

A better way of changing status is using the command “/away”, because the current nickname is not modified. The advantages are that (1) the user still remains reachable, (2) no public notification of nick-changing is necessary, and (3) nobody can “steal” the nick because it is not changed. In this case, if someone uses “/whois” or writes a private message, a description including the away message is shown. The command “/away” without any additional argument will remove the away message.

Mutton’s “PieSpy Social Network Bot” monitors a set of IRC channels and visualizes social networks on IRC. It uses direct addressing of users, monitoring of nick changes, temporal proximity, and temporal density to infer relationships between pairs of users [Mut04a]. However, linguistic play and creativity during discourse are not taken into account.

10.5 Complications in the detection of nicknames while chatting

Saving keystrokes to reduce time and effort plays an important role while chatting. But this is not always the case; some variant forms of linguistic play with nicks result in even more keystrokes than the original nickname, or require more effort. All these underlying processes, including punctuation and orthographic errors, influence and complicate the detection of nicknames in discourse.


10.5.1 Punctuation

IRC nicknames cannot contain certain punctuation characters, such as a comma, colon, question mark, exclamation mark, ellipsis, period, or round bracket. However, within a message a nick can be surrounded by any punctuation or by various characters in front of it (e.g., punctuation marks at the beginning of a quotation) or behind it (e.g., question marks for interrogation, periods at the end of sentences), especially if the space between separate words is forgotten. Using punctuation marks for self-correction or applause, mostly unique to online chat, is another important feature of written language. A misspelled word is replaced in the next message by the corrected word, marked with an asterisk. Furthermore, one plus sign means applause; the more plus signs, the louder and warmer the applause for this user. Table 10.8 shows examples of nicknames used in connection with punctuation.

Table 10.8: Punctuation

                                              Example
Category                                      Nick (original)   Nick (discourse)
in general      interjection, exclamation     <javanon>         “JAVANON!!!!!”
                interrogation                 <runatrain>       “runatrain?”
                possessive case               <Starnestommy>    “Starnestommy’s cat has 5 legs”
                word divider (no space)       <Jesus>           “thank you freefull and jesus:D”
chat-specific   self-correction               <cal>             “cal*” after “cak: go eat Cybergeek2021”
                applause                      <Zoffix>          “Zoffix++”
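
Because punctuation marks cannot be part of a nick, they can be trimmed from both ends of a token before it is compared with the logged-in nicknames. The sketch below assumes the nickname character set of RFC 2812 (letters, digits, hyphen, and the special characters [ ] \ ` _ ^ { | }); the function name nick_candidate is hypothetical. A missing word divider inside a token, as in “jesus:D”, is not resolved by this step.

```python
import re

# Characters allowed in IRC nicknames (assumption: RFC 2812 grammar).
NICK_CHARS = r"A-Za-z0-9\[\]\\`_^{|}-"
EDGE_RE = re.compile("^[^" + NICK_CHARS + "]+|[^" + NICK_CHARS + "]+$")


def nick_candidate(token):
    """Strip a possessive ending and surrounding punctuation from a token."""
    token = re.sub(r"['’]s$", "", token)
    return EDGE_RE.sub("", token)


for token in ["JAVANON!!!!!", "runatrain?", "Starnestommy’s", "cal*", "Zoffix++"]:
    print(token, "->", nick_candidate(token))
```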

10.5.2 Orthographic errors

Orthographic errors occur particularly when a nick is typed quickly, much as they do when IRC nicknames are created. Writing one letter in place of another, similar-looking one can be considered an unintentional error, since such characters are hard for human readers to tell apart (e.g., 0 ↔ O, 1 ↔ l, 8 ↔ B, q ↔ g, 2 ↔ Z, ‘ ↔ ’, m ↔ rn). Therefore, some nicks are written wrongly, and this is usually due to the chosen font. The same problem happens “when handwritten forms are scanned and optical character recognition (OCR) is applied” [Chr06]. Nevertheless, several errors seem to be obviously intentional and look like a provocation. The additional input of spaces complicates nickname detection because separated words are created. Table 10.9 gives examples of orthographic errors due to typing.

Table 10.9: Orthographic errors due to typing

                                             Example
Category                                     Nick (original)   Nick (discourse)
inadvertence   mistyped (wrong)              <jonsmith1982>    “jonsmith1984”
               mistyped (missing)            <sarkar112>       “akar112”
               mistyped (extra inserted)     <wahnfrieden>     “wahnfrienden”
               mistyped (transposed)         <sheep23591>      “sheep23951”
               misread (similar-looking)     <gavin_1>         “gavin_l”
intention      provocation                   <lady08>          “Lazy08”
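
The error types in Table 10.9 (wrong, missing, extra, and transposed characters) correspond to the operations of an edit distance. As an illustration only, the following sketch uses the optimal string alignment distance, a technique chosen here for demonstration and not necessarily the one used in this work; the threshold max_distance and both function names are assumptions.

```python
def osa_distance(a, b):
    """Optimal string alignment distance: insertions, deletions,
    substitutions, and transpositions of adjacent characters."""
    rows, cols = len(a) + 1, len(b) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i
    for j in range(cols):
        dist[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)
    return dist[-1][-1]


def closest_nick(written, logged_in, max_distance=2):
    """Return the logged-in nick closest to the written form, or None."""
    def distance(nick):
        return osa_distance(written.lower(), nick.lower())
    best = min(logged_in, key=distance, default=None)
    if best is not None and distance(best) <= max_distance:
        return best
    return None


print(closest_nick("sheep23951", ["sheep23591", "sarkar112"]))
```

A small threshold keeps ordinary words from being mapped to short nicknames, but it remains a trade-off between missed and wrongly detected nicks.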


Users make several text normalization decisions to correct created nicks for use in discourse (see Table 10.10). Correcting misspelled words, dropping reduplicated letters, or simply writing out the nick in correct English are examples of these procedures. Also, some nicknames written in leetspeak are fully, partially, or incorrectly reconverted into “normal” text.

Table 10.10: Text normalization

                                                Example
Category                                        Nick (original)      Nick (discourse)
mistyping         correct a mistake             <bloodoby>           “bloodboy”
reduplication     drop reduplicated letter      <SeXXXyyy>           “sexy”
                  drop reduplicated part        <nickck>             “nick”
leetspeak         fully reconverted             <WORLDW1D3>          “worldwide”
                  partially reconverted         <f00li5h>            “f00lish”
                  incorrectly reconverted       <|Stea|thBabe|>      “steathbabei”
correct English   marking of possessive case    <maripiasteacher>    “maripia’s teacher”
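
One way to make such variants comparable is to normalize both the original nick and the written form in the same way and to compare the results. The sketch below is illustrative only; the leetspeak mapping is a small hypothetical subset (and ambiguous, since “1” may stand for “i” or “l”), and reduplicated parts such as <nickck> are not handled.

```python
import re

# Hypothetical, deliberately small leetspeak mapping.
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t"})


def normalize(nick):
    """Lower-case, reconvert leetspeak, keep letters, collapse repeated letters."""
    text = nick.lower().translate(LEET)
    text = re.sub(r"[^a-z]", "", text)
    return re.sub(r"(.)\1+", r"\1", text)


print(normalize("WORLDW1D3") == normalize("worldwide"))
print(normalize("SeXXXyyy") == normalize("sexy"))
print(normalize("f00li5h") == normalize("f00lish"))
```

Collapsing repeated letters also shortens legitimate double letters, which is why the normalized forms are only compared with each other and never shown to a user.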

10.5.3 Saving keystrokes, time, and effort

Some 91.65% of all nicks written in discourse corresponded exactly to the logged-in ones. One reason for this high number of correctly written nicknames is the nick autocomplete feature. Some IRC clients provide this functionality, which completes the rest of the nickname after the user types the first letters of the nick and presses a special key (e.g., the tabulator key).

However, shortened nicknames and variants are also found in the chat discourse. Shortening or omitting parts of the nickname are two important strategies to save keystrokes. Examples are given in Table 10.11. The length of the dropped substring seems to be arbitrary. As a result, the remaining word can take on a new meaning. Interestingly, short nicks have even been shortened again, and this can lead to confusion. Therefore, many chatters only react if their nicks are written accurately in discourse. <_Tom>, a user at channel #talk, confirmed this impression: “I only respond when somebody says _Tom directly. Because there’s many toms on this network”.

Table 10.11: Shortening or omitting parts of the nickname

                                        Example
Category                                Nick (original)   Nick (discourse)
shortening     acronym, initialism      <HellDragon>      “HD”
               abbreviation             <GuCCiGuRL>       “guc”, “GuCC”
omitting POS   clan                     <[P]SandMan>      “sandman”
               status                   <mouz^away>       “mouz”
               decoration               <|Soujiro|>       “soujiro”
               concatenation            <B_O_F_H>         “bofh”
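
The “omitting POS” rows of Table 10.11 can be approximated by comparing only the letters and digits of both forms. The following Python sketch is a deliberately lenient illustration: it accepts any written form that is contained in the reduced nick, which also admits false positives; the names letters_only and matches_shortened and the minimum length of three characters are assumptions. Acronyms such as “HD” are not covered.

```python
import re


def letters_only(text):
    """Lower-case the text and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def matches_shortened(written, nick, min_len=3):
    """True if the written form is contained in the reduced nick."""
    short, full = letters_only(written), letters_only(nick)
    return len(short) >= min_len and short in full


print(matches_shortened("sandman", "[P]SandMan"))
print(matches_shortened("bofh", "B_O_F_H"))
```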

The comparison between the chatter’s original nickname and the variant used in the discourse shows in detail which part of a nickname has been omitted. Not only can clan, status, decoration, or concatenation be omitted (see Table 10.11), but also parts of a stem (see Table 10.12). Sequences of clusters are not merged together. The mapping below is not bijective.


Table 10.12: Omitted POS of nickname while chatting

Clustered POS                                                       Example
Nick (discourse)   Nick (original)                                  Nick (discourse)   Nick (original)
[CD]               [CD RB VB], [NN CD]                              “404”              <|404NotFound|>
[DT]               [DT CD]                                          “The”              <The_868>
[FW]               [FW CD]                                          “sanctus”          <sanctus2099>
[JJ]               [JJ CD], [JJ NN], [JJ PP], [NN JJ NN]            “elusive”          <Elusive^Babe>
[JJ NN]            [JJ NN CD], [JJ NN CD NN], [JJ NN IN NN]         “black beauty”     <blackbeauty_720>
[NN]               [DT NN], [JJ NN], [NN CD], [NN CD NN],           “Dragon”           <ThE_DrAgOn>
                   [NN IN NN], [NN JJ NN], [NN NN], [NN NN CD]
[NN NN]            [NN CD NN], [NN NN CD], [NN NN NN]               “engel girl”       <Engel08girl>
[RB]               [RB TO NN]                                       “Back”             <Back2Basics>
[VB]               [VB IN PP NN]                                    “dream”            <Dreamofmebab>

A prediction of which part of speech will be dropped is not easy to make, but the tendencies are as follows:

• Omitted parts are mostly cardinal numbers;

• Adjectives and/or nouns often remain; and

• Nicknames are shortened after the first one or two POS.

Another strategy for increasing typing speed and minimizing response time concerns letter case: the lower- and upper-case letters of a written nick do not always correspond exactly to the original user’s nickname. Such inaccuracies can be a consequence of typing quickly, when nicks or whole messages are written in lower-case letters. On the other hand, messages written entirely in upper-case letters are considered to be shouting.

10.5.4 Creative linguistic playground

Chatters play with the original nickname regardless of saved keystrokes, time, or effort. Reduplicated letters signal stressed syllables and are used to express emotions while chatting. Other creative ways of using nicknames in context are adding decorations or writing backwards. All these methods are well known from nickname creation. Not only does the creator of the nickname want to stand out, but so does the chatter who uses it (see Table 10.13).

Table 10.13: Creativity

                       Example
Category               Nick (original)   Nick (discourse)
reduplication          <Rita>            “Riiiitaaaaaa”
decoration             <adaptr>          “[adaptr]”
backwards writing      <GanjaMan>        “naMajnaG”


Messages become harder to understand and analyze if addressees are based on the user’s real name (see Log Example 24, lines 1 and 2), diminutive forms are used (lines 3 and 4), background information is necessary (lines 5 to 10), or linguistic playing occurs with the nick (lines 11 and 12).

Log Example 24
1 <fallingfromyou> hiya Michael
2 <Necromant> hi jen
3 <Chewbacca_360> me?
4 <t3hbowie> Yeah, you, baccy.
5 <OliverWoods> hay guys
6 <Chewbacca_360> hello harrypotter’s friend
7 <mercury> I’m chinese, from ShangHai,China
8 <eileen_> hi chinese guy
9 <MiSsUnDeRwEaR> HOWS U Tom
10 <Tom> hugs missdoubleyummyunderwear
11 <GothicAngel> MyBabe :))
12 * UrBabe kisses GothicAngel on the cheek

10.6 Chapter summary

This chapter showed how nicknames were used to address logged-in users in IRC discourses. Additionally, it analyzed how nicks were written fully in discourse and which parts of them were omitted. The following chapter introduces automated detection and mapping of nicknames (identifiers).


CHAPTER 11 Automated detection and mapping of identifiers

As already mentioned in Chapter 9, the detection of identifiers within discourses is an important step toward finding sender-receiver relations. These relations are essential for answering the question “Who is communicating with whom?” The senders’ nicknames are usually placed within messages at fixed positions (see Subsection 8.4.7). In contrast, the receivers’ nicknames can be found at any position within the discourses (Chapter 10). In addition, shortening or omitting parts of the nicks makes finding receivers’ nicknames difficult. Therefore, a nickname is decompounded into its single parts of speech (POS) by the NickDecompounder software. Every POS is tagged to see which part is later omitted within the discourse. With the information in Table 10.12, logged-in users and their nicknames are compared with the substrings of the messages. In this way, nicknames with omitted POS and mistyped nicknames can be detected.

IdentifierMapper identifies possible written nicknames within discourses and maps them to the receivers. This software consists of two main parts: (1) detection and (2) mapping of identifiers. The first part marks text fragments as potential identifiers. The second part compares these potential identifiers with the senders’ nicknames, decides which of them are identifiers, and maps them to the senders. IdentifierMapper is optimized for IRC but can be adapted for other CMC systems. An overview of the IdentifierMapper is presented in Figure 11.1.

Figure 11.1: Overview of the IdentifierMapper, Part I

After a description of some general steps, the main parts of this Java program are described in the following sections.


11.1 Information on the general steps

The Dataset 2a used in this chapter is described in the following subsections.

11.1.1 Data collection

The channel #defocus on the IRC network freenode with server address irc.freenode.net is logged.

11.1.2 Data extraction

As in Subsection 8.2.2, HTML tags are removed and several HTML entities are converted. Each message consists of a timestamp in milliseconds (e.g., 227062), a LogBot string that indicates the message type (e.g., black), and the IRC message itself (e.g., “<TomFarr> CanadianFly, You can’t...”) (see Log Example 25, line 2). The prefix “irc-” of each message type (i.e., LogBot CSS class name) is removed, and the column separator “~|~” is used.

Log Example 25
1 <span class="irc-date">[00:03:47:062]</span> <span class="irc-black"><TomFarr> CanadianFly, You can’t...</span><br />
2 227062~|~black~|~<TomFarr> CanadianFly, You can’t...~|~
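The extraction step shown in Log Example 25 can be sketched as follows. This is a minimal illustration and not the code of the thesis software: the regular expression, the function name `extract`, and the assumption that the message body arrives HTML-escaped are all hypothetical.

```python
import html
import re

# Hypothetical pattern for one LogBot HTML line: a date span followed by a message span.
LINE_RE = re.compile(
    r'<span class="irc-date">\[(\d+):(\d+):(\d+):(\d+)\]</span>\s*'
    r'<span class="(?:irc-)?([\w-]+)">(.*?)</span>'
)
SEPARATOR = "~|~"


def extract(line):
    """Convert one LogBot HTML line into 'timestamp~|~type~|~message~|~'."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    hours, minutes, seconds, millis = (int(g) for g in match.groups()[:4])
    # hh:mm:ss:ms since midnight, expressed in milliseconds
    timestamp = ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis
    msg_type = match.group(5)                # the "irc-" prefix is dropped by the pattern
    message = html.unescape(match.group(6))  # e.g. &lt;TomFarr&gt; becomes <TomFarr>
    return SEPARATOR.join([str(timestamp), msg_type, message]) + SEPARATOR
```

The timestamp arithmetic reproduces the value in line 2 of the log example: 3 minutes, 47 seconds, and 62 milliseconds give 227062 ms.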

11.1.3 Analysis

The whole day of January 13, 2012 is analyzed manually and automatically. In total, 5605 publicly logged messages (1779 created by the system, 3826 written by users) are analyzed. These extracted messages are the basis of the discourse analysis in this chapter. Some quantitative statistics are also presented and visualized.

11.2 Read log and config

The software imports a cleaned and delimited log file. Optionally, one’s own identifier can be read in from a config file. For example, one’s own nickname is mapped to incoming whispering (as receiver), to disconnect messages (as sender), or to “you” as a referential element (the latter is not yet implemented).

11.3 Message structure

The message structure is shown in Log Example 4. Public messages can be divided into user messages (written by users to communicate with others) and system messages (reported by the CMC system, e.g., user joins, nickname changes).

11.3.1 Handle system-specific message templates

System messages in particular are based on templates (i.e., nicknames are found at fixed positions). Table 11.1 shows several templates for mIRC and LogBot with absolute and relative frequencies.


Table 11.1: Analysis of the logged messages

Message template                                          Frequency        Log Example 1
Message type  LogBot CSS class name  mIRC item            Abs.   Rel.      Line
user          black                  normal               3590   64.05%    1
user          brick                  action                236    4.21%    9
system        brown                  e.g., notice            0    0.00%    13
system        green                  has joined            623   11.12%    7
system        green                  has left               27    0.48%    15
system        green                  is now known as        80    1.43%    12
system        green                  sets mode             452    8.06%    10
system        navy                   quit                  597   10.65%    16
system        red                    e.g., version           0    0.00%    2
public                                                    5605  100.00%

All logged messages include senders’ nicknames at fixed positions. Representations of the system-specific templates are shown in Table 11.2. Even some receivers’ nicknames can be extracted from the system-specific templates (lines 2, 8, and 10). Messages are automatically assigned to the found senders (see <sender>, <sender, old nick>, and <sender, new nick>) and receivers (see <receiver>; in line 4, “you” refers to one’s own nickname). The senders’ messages are analyzed to find receivers, and this process is described in the next two subsections.

Table 11.2: Automatic setting of senders and receivers

Line (Log Example 1)  mIRC command  System-specific message template
1                     -             <sender> <sender’s message>
2                     CTCP          [<receiver> VERSION]
3                     ECHO          <sender’s message>
4                     INVITE        * <sender> (<sender’s ip address/mask>) invites you to join <channel>
5                     JOIN          * Now talking in <channel>
6                                   * <server> sets mode: <modes>
7                                   * <sender> (<sender’s ip address/mask>) has joined <channel>
8                     KICK          * <receiver> was kicked from <channel> by <sender>
9                     ME            * <sender> <sender’s message>
10                    MODE          * <sender> sets mode <mode> <receiver>
11                    MSG           *<sender>* <sender’s message>
12                    NICK          * <sender, old nick> is now known as <sender, new nick>
13                    NOTICE        -<sender>- <sender’s message>
14                    NOTIFY        * <sender> [<sender’s ip address/mask>] is on IRC
15                    PART          * <sender> (<sender’s ip address/mask>) has left <channel>
16                    QUIT          * <sender> (<sender’s ip address/mask>) Quit (<sender’s quit message>)
17                    TOPIC         * <sender> changes topic to <sender’s topic message>
18                    WALLOPS       !<sender>! <sender’s message>!
19                    WHOIS         <sender> is <sender’s ip address/mask> * <sender’s info message>
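Because the templates in Table 11.2 place nicknames at fixed positions, they lend themselves to pattern matching. The following sketch covers a handful of the templates; the regular expressions, the group names, and the function `match_template` are assumptions made for illustration, and only the wording of the templates is taken from the table.

```python
import re

# A few templates from Table 11.2 rewritten as regular expressions (illustrative only).
# More specific patterns come first; the generic action (ME) pattern is tried last.
TEMPLATES = [
    ("KICK", re.compile(r"^\* (?P<receiver>\S+) was kicked from (?P<channel>\S+) by (?P<sender>\S+)$")),
    ("NICK", re.compile(r"^\* (?P<sender>\S+) is now known as (?P<new_nick>\S+)$")),
    ("MODE", re.compile(r"^\* (?P<sender>\S+) sets mode (?P<mode>\S+) (?P<receiver>\S+)$")),
    ("JOIN", re.compile(r"^\* (?P<sender>\S+) \((?P<mask>[^)]*)\) has joined (?P<channel>\S+)$")),
    ("QUIT", re.compile(r"^\* (?P<sender>\S+) \((?P<mask>[^)]*)\) Quit \((?P<reason>.*)\)$")),
    ("ME", re.compile(r"^\* (?P<sender>\S+) (?P<message>.+)$")),
    ("USER", re.compile(r"^<(?P<sender>[^>]+)> (?P<message>.*)$")),
]


def match_template(line):
    """Return (command, extracted fields) for the first matching template."""
    for command, pattern in TEMPLATES:
        found = pattern.match(line)
        if found:
            return command, found.groupdict()
    return None, {}
```

Ordering matters in such a list: the action template in line 9 of the table would also match most system messages, so it has to be tried after the more specific ones.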


11.3.2 Handle template “direct addressing”

The senders’ messages are analyzed. Within the senders’ messages of Table 11.2 (i.e., “<sender’s message>” in lines 1 and 9), receivers’ nicknames can occur at any position, which makes their detection more complicated. 1106 of all 5605 (19.73%) investigated messages include at least one nickname; in total, 1146 nicknames are found. A nickname followed by a colon and a space is the most often used form of direct addressing (59.69%). The top five templates cover 1091 nicknames (95.20%). They are presented in Table 11.3. “B” means the beginning of the written message, “E” the end of the message, and “identifier” is the placeholder for the written IRC nickname in discourse. Written text such as a space or a comma is surrounded by quotation marks. 76.18% of all messages with found identifiers start with identifiers.

Table 11.3: Templates for direct addressing (Top 5)

Template              Abs.   Rel.      Example
B identifier “:”       684   59.69%    <Pici> snova: /me does something
B identifier “ ”       139   12.13%    <Invisible_Cat> mary-kate shut up
“ ” identifier E       114    9.95%    <Ursinha> thanks tomaw
“ ” identifier “ ”     111    9.69%    <tictric> sorry UdontKnow :) Was a bit to much
B identifier “,”        43    3.75%    <cal> mary-kate, usually unnecessary
∑ (Top 5)             1091   95.20%
∑ (all)               1146  100.00%
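To make the template notation of Table 11.3 concrete, the sketch below derives the template for a known identifier from its left and right neighbours in the message text (without the “<sender>” prefix). The function name and the use of straight quotation marks in the result are illustrative choices, not part of IdentifierMapper.

```python
def addressing_template(message, identifier):
    """Return the Table 11.3 style template for the first occurrence of identifier."""
    pos = message.find(identifier)
    if pos < 0:
        return None
    end = pos + len(identifier)
    # "B" marks the beginning and "E" the end of the message; otherwise the
    # neighbouring character is shown in quotation marks.
    before = "B" if pos == 0 else f'"{message[pos - 1]}"'
    after = "E" if end == len(message) else f'"{message[end]}"'
    return f"{before} identifier {after}"
```

Applied to the example messages of the table, this yields the five listed templates; counting the results over a whole log would give a frequency table of the same shape.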

11.3.3 Handle template “greeting/farewell”

Greetings (opening sequences) and farewells (closing sequences) are often followed by receivers’ identifiers. Both are excellent starting and ending points for conversation threading between communicators. Table 11.4 shows these signal words within the analyzed discourse.

Table 11.4: Signal words and receivers

Signal word        Category   Abs.   Rel.      Example
cu                 farewell      1    2.38%    <jsoftw> D[_]: cu
good evening       greeting      1    2.38%    <orbii> good evening, dax.
good morning       greeting      1    2.38%    <mc44> good morning daxykins
hai                greeting      2    4.76%    <tdubellz> hai orbii
hello              greeting      4    9.52%    <orbii> hello tdubellz :)
hey                greeting      2    4.76%    <berban> hey benonsoftware
hi                 greeting     27   64.29%    <Aliv3> hi orbii!!
howdy              greeting      1    2.38%    <dax> howdy orbii
night              farewell      1    2.38%    <enchilado> Gosh I’m tired. ’night orbiicat, christel, everyone else...
nn (night night)   farewell      1    2.38%    <christel> nn enchilado <3
sleep well         farewell      1    2.38%    <orbiicat> sleep well, enchilado
∑                               42  100.00%


Marked identifiers are compared with the words in Table 10.5. If they are equal, they are removed, because they are usually not identifiers. In 42 cases (3.67%), nicknames occur before (1) or after (41) special signal words for greetings (38) and farewells (4). Further examples of signal words are shown in Table 10.4.
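A possible way to exploit the signal words of Table 11.4 is sketched below. The word list is taken from the table, but the matching rule (a nickname directly before or after a signal word, separated by at most two non-word characters) and the function name are assumptions for illustration only.

```python
import re

# Signal words from Table 11.4 with their category.
SIGNAL_WORDS = {
    "good evening": "greeting", "good morning": "greeting", "sleep well": "farewell",
    "hai": "greeting", "hello": "greeting", "hey": "greeting", "hi": "greeting",
    "howdy": "greeting", "night": "farewell", "nn": "farewell", "cu": "farewell",
}


def greeted_receivers(message, nicknames):
    """Return (category, nick) pairs for nicks directly after or before a signal word."""
    hits = []
    lowered = message.lower()
    for word, category in SIGNAL_WORDS.items():
        for nick in nicknames:
            n = re.escape(nick.lower())
            w = re.escape(word)
            # Assumption: at most two separator characters (e.g. ", " or ": ") in between.
            after = re.search(r"(?<!\w)%s\W{0,2}%s(?!\w)" % (w, n), lowered)
            before = re.search(r"(?<!\w)%s\W{0,2}%s(?!\w)" % (n, w), lowered)
            if after or before:
                hits.append((category, nick))
    return hits
```

The single “before” case of the analyzed log (“D[_]: cu”) and the common “after” case (“hi orbii!!”) are both covered by this rule.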

11.4 Communicator

For each found sender, a nickname history is built up. Additionally, the availability of each communicator is checked. 457 sender nicknames (456 case-insensitive) are extracted from the messages. The chat user <eir> is the most active one, followed by <redcheckers> and <dioz>. 17.88% of all messages in #defocus are created by these three nicknames. On average, 12.26 messages per communicator are sent.

11.4.1 Tracking identifier changes

Communicators can change their identifiers in IRC while chatting. Text fragments found within nick change messages are automatically treated as identifiers (see Table 11.2, line 12). It is important to track all changes of an identifier to know the history of the current one. Examples and visualization of identifier changes are shown in Section 10.4.

11.4.2 Check availability of communicators

Users usually inform others that they are currently not available online for a while (see Table 2). The scope (visibility) of an identifier starts with a user joining the channel and ends with a user leaving the channel or quitting the network (see Table 7.3). Problems in mapping a written identifier in discourse to a logged-in communicator occur when a message with an included identifier coincides with the receiver’s quitting or leaving (e.g., the user suddenly leaves the channel due to a bad network connection) or when the sender did not notice the quit or leave. In Log Example 26 (line 1), the user <derp> quits the IRC network. The written word “derp” in the second line is a common word and not an identifier.

Log Example 26
1 * derp (~omg@wikimedia/Zalgo) Quit (Remote host closed the connection)
2 <noms> This channel is full of derp today.

11.5 Identifier structure

The structure of each identifier is analyzed in two ways: complexity and parts of speech. Both variants are used to detect identifiers within text fragments (see Subsection 11.6.1).

11.5.1 Calculate complexity of identifiers

Similar to password meters (e.g., “The Password Meter” [Tod16]), a score is used to calculate the complexity of identifiers. The first score is based on positive (additions, e.g., number of characters) and negative (deductions, e.g., letters only) complexity attributes. The higher the score, the more complex the identifier and the lower the probability of its being a common word. In Table 11.5, (1) the less complex nickname <alkafoo> with the length of 7 characters (n = 7) has the score 44, (2) <chaospsychex> has a score of 49, and (3) <Phr33d0m> has 237.

Table 11.5: Complexity of three identifiers
Nicknames: (1) <alkafoo>, (2) <chaospsychex>, (3) <Phr33d0m>

Requirement                  Weight  Calculation           (1) n  (1) ∑   (2) n  (2) ∑   (3) n  (3) ∑
Additions
number of characters         4       n ∗ weight            7      28      12     48      8      32
upper-case                   1       n ∗ weight            0      0       0      0       1      7
lower-case                   1       (length − n) ∗ weight 7      0       12     0       4      4
digits                       1       n ∗ weight            0      0       0      0       3      3
minus                        6       n ∗ weight            0      0       0      0       0      0
underline                    6       n ∗ weight            0      0       0      0       0      0
grave accent                 6       n ∗ weight            0      0       0      0       0      0
special                      6       n ∗ weight            0      0       0      0       0      0
brackets                     6       n ∗ weight            0      0       0      0       0      0
type changes                 25      n ∗ weight            1      25      1      25      6      150
middle numbers or symbols    5       n ∗ weight            0      0       0      0       3      15
requirements                 10      n ∗ weight            1      10      1      10      3      30
Deductions
letters only                         −n                           -7             -12            0
numbers only                         −n                           0              0              0
repeated characters          2       −n ∗ weight           0      0       0      0       0      0
consecutive upper-case       2       −n ∗ weight           0      0       0      0       0      0
consecutive lower-case       2       −n ∗ weight           6      -12     11     -22     1      -2
consecutive digits           2       −n ∗ weight           0      0       0      0       1      -2
∑                                                                 44             49             237
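The following sketch reproduces the three totals of Table 11.5. It is a reconstruction from the table values, not the scoring code of IdentifierMapper, and it rests on several labeled assumptions: “type changes” is counted as the number of runs of one character type, “requirements” as the number of character classes present, the letter-case bonus is (length − n) and applies only when n > 0 (the value 7 for one upper-case letter in an eight-character nickname suggests this), and the “repeated characters” deduction, which is 0 in all three examples, is left out.

```python
def runs(nick):
    """Split a nick into runs of one character type: U, L, D, or S (symbol)."""
    def kind(c):
        return "U" if c.isupper() else "L" if c.islower() else "D" if c.isdigit() else "S"
    result = []
    for c in nick:
        if result and result[-1][0] == kind(c):
            result[-1] = (kind(c), result[-1][1] + 1)
        else:
            result.append((kind(c), 1))
    return result


def complexity(nick):
    length = len(nick)
    upper = sum(c.isupper() for c in nick)
    lower = sum(c.islower() for c in nick)
    digits = sum(c.isdigit() for c in nick)
    symbols = length - upper - lower - digits
    type_runs = runs(nick)

    score = 4 * length                                   # number of characters
    score += (length - upper) if upper else 0            # upper-case (assumed rule)
    score += (length - lower) if lower else 0            # lower-case
    score += digits                                      # digits
    score += 6 * symbols                                 # minus, underline, special, ...
    score += 25 * len(type_runs)                         # type changes (assumed: runs)
    score += 5 * sum(not c.isalpha() for c in nick[1:-1])  # middle numbers or symbols
    score += 10 * sum(1 for n in (upper, lower, digits, symbols) if n)  # requirements

    if upper + lower == length:
        score -= length                                  # letters only
    if digits == length:
        score -= length                                  # numbers only
    for kind, size in type_runs:                         # consecutive upper/lower/digits
        if kind in "ULD":
            score -= 2 * (size - 1)
    return score
```

With a threshold between 44 and 49, this score separates <alkafoo> from <chaospsychex>, which is how the threshold is used in Subsection 11.6.1.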

11.5.2 Calculate identifierprint

An identifierprint denotes the characteristics of an identifier. Character sorting, letter case, and the removal of duplicates are taken into account to calculate an identifierprint. In this way, written identifiers that contain transposed letters, reduplicated letters, or a different letter case can be detected. Table 11.6 shows two versions of the identifierprints. For example, <alkafoo> has the identifierprint “aafkloo” (Version A) and “afklo” (Version B).

Table 11.6: Identifierprint
Nicknames: (1) <alkafoo>, (2) <chaospsychex>, (3) <Phr33d0m>

Attribute     Step                                             (1)       (2)            (3)
sorting       in alphabetical order, ascending, ignore case    aafkloo   accehhopssxy   033dhmPr
letter case   Version A: unchanged                             aafkloo   accehhopssxy   033dhmPr
              Version B: converted to lower-case               aafkloo   accehhopssxy   033dhmpr
occurrence    Version A: unchanged                             aafkloo   accehhopssxy   033dhmpr
              Version B: removed duplicates                    afklo     acehopsxy      03dhmpr
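The two identifierprint versions can be computed in a few lines. The sketch below follows the steps of Table 11.6 (Version A keeps letter case and duplicates, Version B lower-cases and removes duplicates); the function names are illustrative.

```python
def identifierprint_a(nick):
    """Version A: characters sorted alphabetically (ignoring case); case and duplicates kept."""
    return "".join(sorted(nick, key=str.lower))


def identifierprint_b(nick):
    """Version B: lower-cased, duplicates removed, sorted alphabetically."""
    return "".join(sorted(set(nick.lower())))
```

Version B is what makes reduplicated or transposed spellings comparable: “chaospsycheeeex” and “chaosppyschex” both reduce to “acehopsxy”, the Version B identifierprint of <chaospsychex>.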


11.5.3 Calculate POS groups

The identifiers are decompounded into their single parts of speech (POS) with the integrated software NickDecompounder (see Subsection 9.6.1). Figure 11.2 shows a simplified activity diagram of nickname decompounding.

Figure 11.2: Decompounding of nicknames with NickDecompounder

A nick with length n has $\sum_{i=1}^{n} i$ n-grams and $2^{n-1}$ possibilities (paths) to split the nick. In the following example, the nick <your_heart> is decompounded with NickDecompounder.

• Read nickname: The nickname <your_heart> is read in from a text file.

• Check validation: The string is a valid IRC nickname.

• Remove decoration: No decoration is found.

• Remove concatenation: After removing the concatenation “_” the new string is “yourheart” (with length 9).

• Decompound into n-grams: 45 n-grams are created (“y”, “o”, ..., “r”, “yo”, ..., “yourhear”, “ourheart”, “yourheart”).

• Look up n-grams in dictionary: Nine n-grams (“a”, “art”, “ear”, “he”, “hear”, “heart”, “our”, “you”, “your”) exist in the common dictionary, not including jargon and slang terms.

• Calculate path: In total, 256 paths are possible.

• Evaluate path: For example, one path of the examined nickname consists of the n-grams (words) “art”, “heart” and “you”. The n-grams “you” and “heart” fit into <yourheart> at once, but “art” and “heart” overlap. Therefore, “······art”, “····heart”, “you······”, “you···art” and “you·heart” are possible. In this case “you·heart” is the best, because 8 positions of the whole nick are covered by the n-grams “you” and “heart” (88.89% of the total).

• Select best path and output decompounded nick: Finally, the best decompounding of the nick <yourheart> is into “your heart”, which covers all nine characters (100%). This result is better than “your he art”, which needs more n-grams. The selection criteria for the best path, in the order of selection, are: highest coverage, fewest n-grams, longest found n-gram, and random selection.
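The steps above can be sketched as a small search over dictionary words. This is not the NickDecompounder implementation: the dictionary is limited to the nine words of the example, decoration handling is reduced to removing “_”, and the final random tie-break is replaced by a deterministic one. The function names are hypothetical.

```python
from functools import lru_cache

# The nine dictionary hits named in the example.
DICTIONARY = frozenset({"a", "art", "ear", "he", "hear", "heart", "our", "you", "your"})


def ngrams(text):
    """All contiguous substrings; a string of length n yields n(n+1)/2 of them."""
    return [text[i:j] for i in range(len(text)) for j in range(i + 1, len(text) + 1)]


def decompound(nick, dictionary=DICTIONARY):
    """Return (words, coverage) of the best split of nick into dictionary words."""
    text = nick.replace("_", "").lower()      # remove the concatenation character

    @lru_cache(maxsize=None)
    def best(start):
        # (coverage, -number of n-grams, longest n-gram, words) for text[start:]
        if start == len(text):
            return (0, 0, 0, ())
        candidates = [best(start + 1)]        # leave this character uncovered
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in dictionary:
                cov, neg_count, longest, words = best(end)
                candidates.append((cov + len(word), neg_count - 1,
                                   max(longest, len(word)), (word,) + words))
        # Tuple comparison applies the criteria in order: coverage, fewest
        # n-grams, longest n-gram (then a deterministic instead of random choice).
        return max(candidates)

    coverage, _, _, words = best(0)
    return list(words), coverage / len(text)
```

For “yourheart”, `ngrams` yields the 45 substrings mentioned above, and `decompound("your_heart")` prefers “your heart” over “your he art” because both cover all nine characters but the former needs fewer n-grams.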


11.6 Comparing

Comparing identifiers and their parts with text fragments (strings and portions of strings) is the most sophisticated part of the IdentifierMapper. An overview of the IdentifierMapper in Figure 11.3 shows the previous steps in more detail.

Figure 11.3: Overview of the IdentifierMapper, Part II

In Figure 11.3, three lists are built up: (1) a list of communicators (i.e., senders) which includes 100% correctly detected senders, (2) a list of receivers (100% correctly detected), and (3) a list of marked receivers. An overview of the comparing process in IdentifierMapper is shown in Figure 11.4.

Figure 11.4: Overview of the IdentifierMapper: Comparing

11.6.1 Compare identifiers with text fragments

All senders are communicators and therefore possible receivers. They are compared with text fragments for each discourse message. Communicators (i.e., potential receivers) are only considered if their complexity scores are equal to or greater than a defined threshold (Subsection 11.5.1). This threshold helps to prevent the detection of short common words (which, in this case, can also be communicators) as identifiers (see Table 7.4). Receivers under the threshold are only detectable if senders use direct addressing. A receiver is detected within the analyzed message

1. via communicators (complexity ≥ threshold and communicator is available)

a) if an exact identifier is found within text fragment;

b) if a communicator’s nickname and a text fragment have at least one equalidentifierprint (Subsection 11.5.2); or

c) if the text fragment is an exact substring of the communicator’s nickname and

i. the omitted part of the communicator’s nickname is a cardinal number,

ii. the omitted part is not an adjective or noun, or

iii. at least the first two POS of the communicator’s nickname and text fragment are the same (see Subsection 10.5.3).

2. via marked receivers

a) if an exact identifier is found in the beginning of the user’s message, followed by a colon or a comma (see Table 11.3); or

b) if an exact identifier is found and a greeting/farewell is used.

Depending on the scope of the communicator, the receiver is adapted (see Subsection 11.6.2). Examples are given in Table 11.7. The threshold in these examples is 45. Communicators are <alkafoo>, <chaospsychex>, and <NightChaos2>.

Table 11.7: Examples of detected and not detected receivers in discourse

Message                           Receiver         Rule
<alkafoo> that’s chaospsychex     <chaospsychex>   1a
<chaospsychex> why alkafoo        <alkafoo>        not detected (complexity < threshold)
<alkafoo> good chaospsycheeeex    <chaospsychex>   1b (Version B is equal for both)
<alkafoo> no chaosppyschex        <chaospsychex>   1b (Version B is equal for both)
<alkafoo> yes NightChaos2         <NightChaos2>    1ci (omitted part is CD)
<alkafoo> yes Night2              <NightChaos2>    not detected (omitted part is NN, see 1cii)
<alkafoo> yes Chaos2              <NightChaos2>    not detected (see 1ciii)
<chaospsychex> alkafoo: ok        <alkafoo>        2a
<chaospsychex> hi alkafoo         <alkafoo>        2b
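A reduced version of these rules can be sketched as follows. It covers rules 1a, 1b, 2a, and 2b only; rule 1c (POS-based substrings) and the availability check are left out. The function names, the small greeting set, and the choice to pass the complexity scores in as a dictionary are assumptions; the scores 44 and 49 and the threshold 45 come from Table 11.5 and the text above.

```python
import re

GREETINGS = {"hi", "hello", "hey"}   # a small subset of the signal words


def prints(text):
    """Identifierprint Version A and Version B of a string."""
    return ("".join(sorted(text, key=str.lower)), "".join(sorted(set(text.lower()))))


def detect_receiver(message, communicators, scores, threshold=45):
    """Return (nick, rule) for the first rule that fires, else (None, None)."""
    words = message.split()
    # Rules 2a and 2b work on marked receivers and ignore the threshold.
    for nick in communicators:
        if re.match(re.escape(nick) + r"[:,]", message):
            return nick, "2a"
    for nick in communicators:
        if nick in words and GREETINGS & {w.lower() for w in words}:
            return nick, "2b"
    # Rules 1a and 1b only consider communicators at or above the threshold.
    for nick in communicators:
        if scores[nick] < threshold:
            continue
        if nick in words:
            return nick, "1a"
        if any(set(prints(nick)) & set(prints(w)) for w in words):
            return nick, "1b"
    return None, None
```

Run against the first rows of Table 11.7 with the scores of <alkafoo> and <chaospsychex>, the sketch returns the same rule labels as the table, including the undetected “why alkafoo”.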

Marked receivers are mapped to communicators’ nicknames. If necessary, the following assignment steps are executed (see Table 11.8).

Table 11.8: Mapping of marked receivers to communicators: Assignment possibilities

Possibility/-ies   Explanation
0                  No receiver is detected. An extended step is described in Chapter 11.
1                  Marked receiver is mapped to a specific communicator.
2 or more          More possible mappings are found. The last used of them is assigned. If never used before, a random selection and assignment is executed.


11.6.2 Adapt scope of identifiers due to tracking list

The scopes of all tracked identifiers are adapted (see Subsection 11.4.1). Based on Log Example 27, “hi Qcoder00” (line 3) includes the identifier <Qcoder00> although its scope ends in line 2. The entries of the tracking list with their new scopes are compared with the text fragments and, if necessary, the receivers’ nicknames are changed. Therefore, in line 3, the receiver of this message is <Guest9967> instead of <Qcoder00>.

Log Example 27
1 * Qcoder00_ is now known as Qcoder00
2 * Qcoder00 is now known as Guest9967
3 <bazhang> hi Qcoder00
4 * Guest9967 is now known as Qcoder00
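The adaptation in Log Example 27 amounts to resolving a written nickname to the nickname its owner currently uses. A minimal sketch, assuming a hypothetical `NickTracker` class that only looks at nick change messages and ignores the case of a third person later taking over a freed nickname:

```python
import re

NICK_CHANGE = re.compile(r"^\* (\S+) is now known as (\S+)$")


class NickTracker:
    def __init__(self):
        # every nick seen so far -> nick currently used by the same communicator
        self.current = {}

    def feed(self, line):
        """Update the tracking list if the line is a nick change message."""
        found = NICK_CHANGE.match(line)
        if not found:
            return
        old, new = found.groups()
        for nick, cur in list(self.current.items()):
            if cur == old:
                self.current[nick] = new   # earlier nicks follow the new one
        self.current[old] = new
        self.current[new] = new

    def resolve(self, written):
        """Map a written identifier to the currently valid nick."""
        return self.current.get(written, written)
```

After lines 1 and 2 of the log example have been fed in, `resolve("Qcoder00")` returns `Guest9967`, which is the receiver assigned to line 3.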

11.6.3 Compare POS groups of logged-in identifiers with text fragments

All parts of decompounded identifiers are compared with text fragments of the discourseand, if parts and fragments match, are determined as identifiers (see Subsection 11.5.3).

11.6.4 Adapt orthography and compare

In this step, Section 10.5 is taken into consideration. In addition to the identifierprints, orthographic errors are corrected with the help of the Levenshtein algorithm [WF74]. Letter mappings are replaced. For example, as in leetspeak, “4” is replaced by “a”. Further examples are shown in Table 9.18. The newly generated versions are then searched for and compared in the discourse again.
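Both ingredients of this step are standard and can be sketched briefly. Only the mapping “4” to “a” is taken from the text; the other leetspeak mappings below are illustrative assumptions (the actual list is in Table 9.18), and the edit distance is the textbook dynamic programme, not necessarily the variant used in IdentifierMapper.

```python
# Only "4" -> "a" comes from the text; the remaining mappings are assumptions.
LEET = str.maketrans({"4": "a", "3": "e", "0": "o", "1": "l"})


def normalize(text):
    """Lower-case a fragment and undo leetspeak letter mappings."""
    return text.lower().translate(LEET)


def levenshtein(a, b):
    """Edit distance counting insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                # deletion
                               current[j - 1] + 1,             # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]
```

A written fragment could then be accepted as a variant of a nickname when the distance between the normalized strings stays below some small bound; that bound would have to be chosen empirically.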

11.7 Output

After analyzing the discourse log (see Log Example 28, lines 1 and 2), two text files are created. First, a file with all communicators and their unique identification numbers (IDs) (lines 3 to 5). Second, a file with all sender-receiver relations that include message IDs, timestamp in milliseconds, sender IDs, and receiver IDs (lines 6 and 7). Both files can be imported as comma-separated values (CSV) files for further analysis and visualizations.

Log Example 28
1 5600849~|~black~|~<Aliv3> hi berban~|~
2 30966445~|~black~|~<berban> enchilado: yes~|~
3 533,enchilado
4 779,berban
5 843,Aliv3
6 471,5600849,843,779
7 2152,30966445,779,533


11.8 Chapter summary

In this chapter, the author presented automated detection and mapping of identifiers in detail, using the software IdentifierMapper. The next chapter extends this software by automated receiver guessing.


CHAPTER 12 Automated receiver guessing without semantics

Messages are usually addressed to one or more specific recipients, to groups, or to everyone in the channel. But explicit direct addressing is not always used or required; that is, not every message names its receivers. In Chapter 11, the program IdentifierMapper (with the help of NickDecompounder) finds senders and (parts of) written receivers in messages, but it cannot assign receivers to messages that contain none. A possible message-to-receiver assignment depends on the previous discourse messages (i.e., discourse partners). All sender-receiver relations are stored, and adjacency pairs are considered. If IdentifierMapper does not detect receivers in a message, ReceiverGuesser guesses and assigns receivers without semantics (i.e., interpretation without semantic knowledge/meaning) to the message. Manually and automatically analyzed results for both parts are compared in Chapter 13. In Figure 12.1, an overview of the architecture for the analysis of sender-receiver relations is given. The architecture mainly consists of two parts:

• automated detection and mapping of identifiers with the help of the programs IdentifierMapper and NickDecompounder (see Chapter 11); and

• automated receiver guessing without semantics with the program ReceiverGuesser, which is described in this chapter.

Figure 12.1: Overview of the architecture for the analysis of sender-receiver relations

In this chapter, receivers are automatically assigned to each message of the discourse. This automated analysis is done without semantics by ReceiverGuesser, which is an extension of IdentifierMapper (Figure 12.2). The “View DAT” (date/time) plays an important role for identifier guessing.

12.1 Information on the general steps

Dataset 2a is described in Section 11.1 and used in this chapter. Some 2720 of 3826 (71.09%) user messages contain no identifiers, although they are usually related to receivers.


Figure 12.2: Overview of the ReceiverGuesser

12.2 Settings

The default settings specified in a config file are loaded.

12.2.1 Load analysis approach

The analysis approaches are top-down, bottom-up, and a combination of both. The difference between top-down and bottom-up approaches is in the direction in which the discourse is analyzed (see Figure 12.3a). Additionally, a delta-range (∆-range) analysis is available (Figure 12.3b).

Figure 12.3: Analysis directions: (a) top-down/bottom-up, (b) delta-range

12.2.2 Consider specific settings

The delta (∆) specifies the size of the neighborhood of message_n, either as a number of messages or as the maximum timestamp difference in milliseconds. For example, if ∆1 = 3 and ∆2 = 6, the three messages before the current message, the current message itself, and the six messages after it (i.e., [n − ∆1, n + ∆2] = [n − 3, n + 6]) are analyzed to determine the receiver of message_n. If the current message_n has a timestamp of 100 ms, ∆1 = 30 ms, and ∆2 = 0 ms, only messages with timestamps between 70 ms and 100 ms are analyzed (i.e., [100 − 30, 100 + 0] = [70, 100]).
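Both readings of the delta can be expressed as a window over the message list. The sketch below uses 0-based message indices and hypothetical function names; clipping the window at the start and end of the discourse is an assumption.

```python
def window_by_count(n, delta1, delta2, total):
    """Indices in [n - delta1, n + delta2], clipped to a discourse of `total` messages."""
    return list(range(max(0, n - delta1), min(total - 1, n + delta2) + 1))


def window_by_time(timestamps, n, delta1_ms, delta2_ms):
    """Indices whose timestamp lies in [t_n - delta1, t_n + delta2] (milliseconds)."""
    low, high = timestamps[n] - delta1_ms, timestamps[n] + delta2_ms
    return [i for i, t in enumerate(timestamps) if low <= t <= high]
```

With ∆1 = 3 and ∆2 = 6, `window_by_count(10, 3, 6, 100)` yields the indices 7 to 16, that is, ten messages including the current one.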

Log Example 29
Line  Message                                  Timestamp      ∆ Timestamp
1     <MrElendig> ^ where I live               67787096 ms    - ms
2     <UriR> which is where?                   67795120 ms    8024 ms
3     <MrElendig> norway                       67799722 ms    4602 ms
4     <Xd1358> Friday the 13th </drumroll>     67800791 ms    1069 ms

Depending on the timestamp difference between messages and the number of written characters of the newer message, relations can be excluded. In Log Example 29, the written message with 27 characters (spaces included) in line 4 cannot be related to line 3. It is usually not possible to read line 3, write 27 characters, and respond within 1069 ms (67800791 ms minus 67799722 ms). Therefore, lines 3 and 4 are not in the same thread. Lines 1 and 4 can be related because it is possible to read line 1, write a message with 27 characters, and respond within 13695 ms (67800791 ms minus 67787096 ms). A formula used here for calculating the minimum response time RT is based on at least three assumptions. First, the average reading speed is approximately 250 words per minute (WPM). Second, a typing speed of 50 WPM is considered average. Third, an average English word consists of about five letters. The formula is calculated as follows:

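$$
\begin{aligned}
RT_{[n-1,n]}\,[\mathrm{ms}] &= \text{duration of reading message}_{n-1}\,[\mathrm{ms}] + \text{duration of typing message}_{n}\,[\mathrm{ms}] \\
&= \frac{\text{length of message}_{n-1}\,[\mathrm{W}]}{\text{reading speed}\,[\mathrm{WPM}]} + \frac{\text{length of message}_{n}\,[\mathrm{W}]}{\text{typing speed}\,[\mathrm{WPM}]}
\end{aligned}
$$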

If message_3 for reading has a length of 6 characters (1.2 words) and message_4 for writing has 27 characters (5.4 words), the minimum response time is 6768 milliseconds. Therefore, line 4 is most probably not related to line 3.

$$
RT_{[3,4]}\,[\mathrm{ms}] = \frac{6\,[\text{characters}]}{250\,[\mathrm{WPM}]} + \frac{27\,[\text{characters}]}{50\,[\mathrm{WPM}]} = 288 + 6480 = 6768\,[\mathrm{ms}]
$$
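The response-time check can be written out with the unit conversions made explicit (characters to words, minutes to milliseconds). The function names are illustrative; the default speeds and the five characters per word are the three assumptions stated above.

```python
def min_response_time_ms(read_chars, typed_chars,
                         reading_wpm=250, typing_wpm=50, chars_per_word=5):
    """Minimum time to read the previous message and type the current one, in ms."""
    ms_per_minute = 60_000
    reading = read_chars / chars_per_word / reading_wpm * ms_per_minute
    typing = typed_chars / chars_per_word / typing_wpm * ms_per_minute
    return reading + typing


def can_be_related(gap_ms, read_chars, typed_chars):
    """A relation is only plausible if the time gap is at least the minimum response time."""
    return gap_ms >= min_response_time_ms(read_chars, typed_chars)
```

For lines 3 and 4 of Log Example 29, the gap of 1069 ms is far below the 6768 ms needed, so the relation is excluded; for lines 1 and 4 (14 characters read, 27 typed, gap 13695 ms) the relation remains possible.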

Figure 12.4 shows a detailed overview of the IRC messages produced per hour. The timestamps in CET (Central European Time) are extracted from the log files (format hh:mm:ss:ms) and truncated to the hour (hh).

Figure 12.4: Messages per hour

Figure 12.5 is a 24 × 60 map that represents the whole day in hours and minutes. The highest number of messages (42) is sent at 02:03 and is represented by the red cell. On average, 3.89 messages are logged per minute. The average time gap between two messages is 15.41 seconds. It is important to understand that a low delta (e.g., ∆1 = 1000 ms) cannot work perfectly, especially for the marked red cell.

Figure 12.5: Messages per minute

In Log Example 30, consecutive turns are taken into account. <alkafoo> makes five immediately consecutive turns. The communicator “could have mistakenly hit the enter key, or may want to emphasize a point, or just to take up space” [Neu05]. Direct addressing is only used in the first message (“TomFarr:”). All other messages are automatically addressed to <TomFarr> by this rule.

Log Example 30
1 <alkafoo> TomFarr: samsung?
2 <alkafoo> rca?
3 <alkafoo> philips?
4 <alkafoo> casio?
5 <alkafoo> I CAN GO ON
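The consecutive-turn rule of Log Example 30 can be sketched as a single pass over (sender, receiver) pairs in which a message without a detected receiver inherits the receiver of the immediately preceding turn by the same sender. The data layout and function name are assumptions for illustration.

```python
def propagate_consecutive_turns(messages):
    """messages: list of (sender, receiver or None); returns the list with gaps filled."""
    result = []
    for sender, receiver in messages:
        if receiver is None and result and result[-1][0] == sender:
            receiver = result[-1][1]   # inherit from the immediately preceding turn
        result.append((sender, receiver))
    return result
```

A turn by another communicator interrupts the run, so the rule does not reach across speakers.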

12.3 Message structure

The message structure part of IdentifierMapper is extended in two respects.

12.3.1 Handle messages addressed to a group or all

Signal words for greeting/farewell (e.g., “hi” and “bye”) in combination with words such as “all”, “everyone”, or “friends” indicate that these messages are addressed either to a group (line 3) or all users (lines 1 and 2) in the channel (see Table 10.5). Examples are shown in Log Example 31.

Log Example 31
1 <YX> Hi all
2 <benonsoftware> Hello everyone
3 <dexter> goodbyebye friends.

12.3.2 Consider adjacency pairs

An adjacency pair [SS73] is a sequence of two turns by different communicators in discourse, one after the other. Examples of adjacency pairs are shown in Log Example 32. Although no identifiers are found in these messages, lines 1 and 2 (greeting-greeting), 3 and 4 (question-answer), and 5 and 6 (inform-acknowledge) are related.

Log Example 32
1 <pur|> hello
2 <happyfunpanda> hello!
3 <ImTheBitch> Bananas. You know why?
4 <berban> no
5 <rcmaehl> and yes it’s joes
6 <Ron–> i know... :P

12.4 Relation

This part of ReceiverGuesser connects senders of messages to their calculated receivers.


12.4.1 Calculate the most probable receivers

Different rules are applied to guess the receivers’ names for each message. Table 12.1 gives examples. Individual rules can be weighted and set on/off for fine-tuning and testing the calculations. The order of processing the rules is changeable in the source code. Duplicated relations per message are removed.

Table 12.1: Rules for calculating the most probable receivers

Order  Rule                                                               Receiver
1      initial setting of sender’s relation                               “a group or all”
2      message created with JOIN, MODE, NICK, PART, or QUIT commands      “a group or all”
3      no receiver found within message                                   <receiver>_latest
4      receiver found within message                                      <receiver>
5      greetings to a group or all                                        “a group or all”
6      farewell to a group or all                                         “a group or all”
7      empty sender’s message                                             “none”
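One way to read the rules of Table 12.1 is as a top-down pass that remembers each sender’s latest receivers. The sketch below is an interpretation, not the ReceiverGuesser code: the message dictionary layout, the precedence between the rules, and the decision that only user messages update the “latest receiver” are all assumptions, and rule weighting is left out.

```python
SYSTEM_COMMANDS = {"JOIN", "MODE", "NICK", "PART", "QUIT"}
GROUP = "a group or all"


def guess_receivers(messages):
    """messages: dicts with 'sender', 'command' (None for user messages), 'text',
    'receivers' (nicks already detected) and optional 'to_group' (greeting or
    farewell to a group or all). Returns one receiver list per message."""
    latest = {}                                            # sender -> latest receivers
    guessed = []
    for msg in messages:
        sender, detected = msg["sender"], msg.get("receivers", [])
        receivers = [GROUP]                                # rule 1
        if msg.get("command") in SYSTEM_COMMANDS:
            receivers = [GROUP]                            # rule 2
        elif not msg["text"].strip():
            receivers = ["none"]                           # rule 7
        elif msg.get("to_group"):
            receivers = [GROUP]                            # rules 5 and 6
        elif detected:
            receivers = list(dict.fromkeys(detected))      # rule 4, duplicates removed
        elif sender in latest:
            receivers = latest[sender]                     # rule 3
        if msg.get("command") is None and receivers != ["none"]:
            latest[sender] = receivers
        guessed.append(receivers)
    return guessed
```

Under these assumptions, a message without identifiers is assigned to the receiver its sender addressed most recently, and a sender who has not addressed anyone yet falls back to “a group or all”.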

A top-down direction without a delta-range is set to analyze the discourses. Timestamp differences, adjacency pairs, consecutive turns, and weighted rules (e.g., the younger the found relation, the higher the weight) are not included in the calculation or in the results presented in this thesis. Based on the return values of the previous steps and the calculation, ideally, the correct receivers’ names are guessed. Detection and guessing of receivers for building sender-receiver relations help to answer the question “Who communicates with whom?” for every single discourse message.

12.4.2 Return the calculated receivers

The calculated receivers of each message are returned to the IdentifierMapper and included in its output.

12.5 Chapter summary

In this chapter, the author presented automated receiver guessing using the software ReceiverGuesser. The next chapter compares manual vs. automated analysis.


CHAPTER 13 Manual vs. automated analysis

In this chapter, the approaches and results of the manual and automated analysis described in Chapters 11 and 12 are explained and compared in detail. An overview is presented in Figure 13.1.

Figure 13.1: Overview of manual vs. automated analysis

First, the results of the author’s manual analysis are presented; second, the results of the automated analysis achieved with the help of the software IdentifierMapper and ReceiverGuesser; and third, a comparison of both results.

13.1 Manual analysis

In this section, the author analyzes the logged IRC discourses from a structural point of view. Based on the system-specific message templates (see Log Example 1), senders and receivers are identified. Receivers are premarked, independent of their letter case, in senders’ messages with Microsoft Excel. Premarking helps to prevent receivers from being overlooked. Knowledge about topics and adjacency pairs is considered in order to find sender-receiver relations. The whole discourse is analyzed with different focuses: first, to find senders; second, to find receivers; and third, to find their relations. These analyses are done three times and compared to minimize analysis errors.

This section is split into two parts: (1) detection and mapping of identifiers, and (2) receiver guessing. A top-down analysis is done for the whole discourse by hand. A self-made script written in HTML to create heatmaps is shown in Listing 13.1. Several figures shown in Subsection 13.1.3 visualize the manually analyzed IRC discourses.

13.1.1 Detection and mapping of identifiers

19.73% of all 5605 messages contain one or more explicitly addressed receivers (see Table 13.1). 1020 receivers are extracted in 990 “normal” messages, 126 receivers in 116 action messages. These 1106 messages contain 1146 identifiers. Compared case-sensitively, 102 distinct receivers’ names are found; compared case-insensitively, 101.


Table 13.1: Used direct addressing

                          Message frequency      Receiver frequency
Receiver(s) per message   Abs.    Rel.           Abs.    Rel.
0                         4499    80.27%            0     0.00%
1                         1074    19.16%         1074    93.72%
2                           29     0.52%           58     5.06%
3                            2     0.04%            6     0.52%
8                            1     0.02%            8     0.70%
∑                         5605   100.00%         1146   100.00%

The detailed linguistic use of the written identifiers in the discourse is shown in Table 13.2. Some 318 of 1146 (27.75%) receivers’ names found by hand are freestanding and accurately written. Adding punctuation to identifiers is the most frequently found way to surround nicknames with text (67.02%). The most frequently found form used to change nicknames (inside) is letter case (3.58%). Multi-addressing is used in 32 messages containing 72 identifiers (see Table 13.1). Therefore, 5645 sender-receiver relations exist.

Table 13.2: Linguistic use of written receivers

Usage (outside)    Usage (inside)   Category                                                   Abs.   Rel.
freestanding       accurately       freestanding/exactly written nick: -                        318   27.75%
not freestanding   accurately       punctuation: in general                                     768   67.02%
not freestanding   accurately       punctuation: chat-specific                                    4    0.35%
not freestanding   not accurately   saving keystrokes, time, and effort: letter case             41    3.58%
not freestanding   not accurately   saving keystrokes, time, and effort: shortening               1    0.09%
not freestanding   not accurately   saving keystrokes, time, and effort: omitting POS             7    0.61%
not freestanding   not accurately   creative linguistic playground: reduplication                 1    0.09%
not freestanding   not accurately   creative linguistic playground: diminutive                    1    0.09%
not freestanding   not accurately   creative linguistic playground: linguistic playing            5    0.44%
∑                                                                                              1146  100.00%

Seven messages probably include identifiers but the sender-receiver relations are unclear. The potential identifiers do not occur as extracted logged-in users or variants. In Log Example 33, the unclear text fragments are written on a golden background. Each of these messages is unrelated and is not part of any other thread, even though the unclear text fragments look like identifiers. Maybe some communicators are lurkers, or messages are wrongly sent into this channel. The problem with lurkers can be solved if all currently logged-in communicators are detected and extracted at the beginning (e.g., in IRC with the NAMES command).


Log Example 33
1 <Aliv3> hi KiNG PiN
2 <ttuttle> JavaLover: my other suggestion
3 <CoJaBo> Lomion89CMN00b: I’ve had pretty hours
4 <YX> hi mr highlighter :D
5 <Aliv3> hey vegeta whats the scouter say about
6 <orbiicat> nini babe
7 <DigitalDragon> hello troii

The most active senders are <eir> (416 sent messages, 7.42%), <redcheckers> (402, 7.17%), and <dioz> (184, 3.28%). The nicknames <noms> (87 times, 7.59%), <redcheckers> (86, 7.50%), and <christel> (85, 7.42%) are the most often specified targets (i.e., receivers). The most frequent directed sender-receiver relations with explicitly addressed receivers are <redcheckers> → <noms> (58, 14.54%), <jsoft> → <christel> (40, 10.03%), and <redcheckers> → <jsoftw> (35, 8.77%).

Table 13.3: Assignment possibilities between messages and receivers, Part I

                                           Message frequency    Msg. with n receiver(s)      Receiver frequency
Category         Message-to-receiver(s)    Abs.    Rel.         n = 1    2     3     8       Abs.    Rel.
assignable       one-to-one                2832    50.53%       2832     0     0     0       2832    50.17%
assignable       one-to-“two or more”        32     0.57%          0    29     2     1         72     1.28%
assignable       one-to-“a group or all”    496     8.85%        496     0     0     0        496     8.79%
assignable       unclear/unknown            466     8.31%        466     0     0     0        466     8.26%
not assignable   none                      1779    31.74%       1779     0     0     0       1779    31.51%
∑                                          5605   100.00%       5573    29     2     1       5645   100.00%

In Table 13.3, assignment possibilities between discourse messages and receivers are summarized. Each message is manually assigned to (1) one receiver, (2) two or more receivers mentioned by name, or (3) a group or all users in the current channel. Some receiver assignments are unclear. IRC system messages usually do not include receivers’ identifiers, although, for example, a QUIT message could.

13.1.2 Receiver guessing

In some cases (20 messages with 21 identifiers), the previously detected and mapped identifiers are not the actual receivers. In Log Example 34, some of these cases are written on a golden background. Most of them are addressed to all users in the channel. In line 3, <Aliv3> is chatting with <berban> about <Aliv3>’s friend <ttech>. Six one-to-“two or more” relations (12 receivers) are changed to five one-to-one relations (5 receivers, e.g., line 2) and one one-to-“a group or all” relation (one receiver, line 9). Therefore, Table 13.4 contains six fewer sender-receiver relations in total than Table 13.3 (5639 instead of 5645).


Log Example 34
1 <orbii> tdubellz likes dub-step
2 <dax> meetvirginia: it’s because eir likes you
3 <Aliv3> ttech is my friend
4 <orbiicat> i dnt like savr
5 <savr> why is christel’s birthday not in the topic?
6 <savr> who is orbiicat?
7 <CoJaBo> Vote CoJaBo for President =D
8 <KindOne> eir don’t even exist there
9 <bazhang> MC44 here?

Table 13.4: Assignment possibilities between messages and receivers, Part II

Category | Message-to-receiver(s) | Message frequency (abs.) | Message frequency (rel.) | Msg. with 1 receiver | Msg. with 2 receivers | Msg. with 3 receivers | Msg. with 8 receivers | Receiver frequency (abs.) | Receiver frequency (rel.)
assignable | one-to-one | 2837 | 50.62% | 2837 | 0 | 0 | 0 | 2837 | 50.31%
 | one-to-“two or more” | 26 | 0.46% | 0 | 23 | 2 | 1 | 60 | 1.06%
 | one-to-“a group or all” | 497 | 8.87% | 497 | 0 | 0 | 0 | 497 | 8.81%
 | unclear/unknown | 466 | 8.31% | 466 | 0 | 0 | 0 | 466 | 8.26%
not assignable | none | 1779 | 31.74% | 1779 | 0 | 0 | 0 | 1779 | 31.55%
∑ | | 5605 | 100.00% | 5579 | 23 | 2 | 1 | 5639 | 100.00%
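The receiver frequencies in Tables 13.3 and 13.4 follow from the message counts, because a message with n named receivers contributes n sender-receiver relations. The short check below is only a sketch; the function name is hypothetical, and the input numbers are the ∑ rows of the two tables.

```python
def receiver_total(single: int, multi: dict[int, int]) -> int:
    """Relations = messages with one receiver entry + n * (messages with n receivers)."""
    return single + sum(n * count for n, count in multi.items())


# Sum rows of Table 13.3 (Part I) and Table 13.4 (Part II).
part_one = receiver_total(5573, {2: 29, 3: 2, 8: 1})
part_two = receiver_total(5579, {2: 23, 3: 2, 8: 1})

print(part_one, part_two, part_one - part_two)
```

The difference between the two totals corresponds to the six relations that disappear when six one-to-“two or more” relations are reassigned.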

Unclear messages or threads are skipped if subsequent messages do not provide clarity. In particular, the beginning of a logged discourse may consist of unclear and incomplete threads. Greetings with “Hi all” after joining the channel, or general messages (e.g., “* redcheckers is hungry now.”), are addressed to all communicators in the channel. Some messages refer to the senders themselves, such as “* orbiicat realises that orbiicat is even better liked than orbii”.

13.1.3 Visualization: Heatmap in HTML

The following self-made script creates n×m maps (heatmaps).

<html><body>
<canvas id="visualizeCanvas" width="1000" height="1000" style="border:1px solid #FFFFFF;"></canvas>
<script>
// Get the 2D drawing context of the canvas
var c = document.getElementById("visualizeCanvas");
var ctx = c.getContext("2d");
// One 10x10 cell per entry; the first two arguments are the pixel offsets of the cell
ctx.fillStyle = "#FFFFFF";
ctx.strokeRect(1, 1, 10, 10);
// ...
ctx.fillStyle = "#FFFFFF";
ctx.strokeRect(541, 741, 10, 10);
</script>
</body></html>

Listing 13.1: Heatmap in HTML
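The script that generates this HTML is not shown in the thesis. The following Python sketch is therefore only an assumption of how such a generator could look: it lays out one 10×10 cell per message in row-major order with a one-pixel offset, which matches the coordinates visible in Listing 13.1. The function name and the example color are hypothetical, and the sketch calls fillRect (not strokeRect as in the listing) so that fillStyle actually colors the cell.

```python
def heatmap_html(colors: list[str], columns: int, cell: int = 10) -> str:
    """Return an HTML page that paints one canvas cell per entry in `colors`."""
    rows = (len(colors) + columns - 1) // columns  # ceiling division
    lines = [
        "<html><body>",
        f'<canvas id="visualizeCanvas" width="{columns * cell + 2}" '
        f'height="{rows * cell + 2}"></canvas>',
        "<script>",
        'var c = document.getElementById("visualizeCanvas");',
        'var ctx = c.getContext("2d");',
    ]
    for index, color in enumerate(colors):
        row, col = divmod(index, columns)  # row-major placement
        lines.append(f'ctx.fillStyle = "{color}";')
        lines.append(
            f"ctx.fillRect({1 + col * cell}, {1 + row * cell}, {cell}, {cell});"
        )
    lines += ["</script>", "</body></html>"]
    return "\n".join(lines)


# 5605 messages on a map with 75 columns, all in one placeholder color.
page = heatmap_html(["#FF0000"] * 5605, columns=75)
```

With 75 columns, the 5605th cell lands at pixel offset (541, 741), the same position as the last rectangle in the listing.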


Detection and mapping of identifiers

Figure 13.2a shows the 5605 analyzed discourse messages line by line in their specific LogBot template colors for user messages and system messages (see Table 8.20). In Figure 13.2b, the red cells in the 75 × 75 map illustrate the 1106 messages with at least one receiver found by hand.

(a) LogBot templates (b) Messages with written receivers

Figure 13.2: Heatmaps: 5605 messages

The next figures are 34 × 34 maps. Figure 13.3a represents the extracted senders. Figure 13.3b displays the found and written receivers in discourse. Figure 13.3c shows 399 unique sender-receiver relations.

(a) Senders (b) Receivers (c) Relations

Figure 13.3: Heatmaps: 1146 messages with receivers

Receiver guessing

Figure 13.4a visualizes the results of Table 13.3. Each message is colored dark green (one-to-one), green (one-to-“two or more”), blue (one-to-“a group or all”), red (unclear/unknown), or black (not assignable). Figure 13.4b is a 76 × 75 map with 5639 sender-receiver relations. It shows all sender-receiver relations found by hand. Unclear receiver assignments are colored white; system messages are black. In total, 611 unique sender-receiver relations are detected. IRC system messages are mapped to the category “not assignable” and colored black.


(a) Message-to-receiver(s) (b) Sender-receiver relations

Figure 13.4: Heatmaps: 5605 messages with 5639 relations

13.2 Automated analysis

The automated analysis is performed with the help of IdentifierMapper and ReceiverGuesser. Detailed information about how both tools work is given in Chapters 11 and 12.

13.2.1 Detection and mapping of identifiers with IdentifierMapper

IdentifierMapper marks 1142 text fragments (i.e., strings) in 1099 messages as identifiers. The seven examples mentioned in Log Example 33 are not assigned to identifiers.

13.2.2 Receiver guessing with ReceiverGuesser

The default message-to-receiver relation is one-to-“a group or all”. If an identifier is detected within a message, this receiver is assigned to the sender’s messages as long as no new identifier is found within them. Messages like “Hello everyone”, “hi all”, and “Hey guys” are automatically mapped to one-to-“a group or all”. The 1776 system messages are not considered for receiver guessing.
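The rule described above can be sketched in a few lines. This is not the ReceiverGuesser implementation (see Chapter 12 for that); it is a simplified illustration under stated assumptions: the input contains user messages only, the identifiers per message have already been detected, and a group greeting does not overwrite the receiver remembered for a sender. All names, including the GROUP placeholder, are hypothetical.

```python
import re

GROUP = "<all>"  # placeholder for the one-to-"a group or all" relation
GROUP_GREETING = re.compile(r"^(hello everyone|hi all|hey guys)\b", re.IGNORECASE)


def guess_receivers(messages, identifiers):
    """messages: list of (sender, text); identifiers: detected receivers per message."""
    remembered = {}  # last explicitly addressed receivers per sender
    result = []
    for (sender, text), found in zip(messages, identifiers):
        if found:
            # A newly detected identifier replaces the remembered receiver(s).
            remembered[sender] = list(found)
            receivers = remembered[sender]
        elif GROUP_GREETING.match(text.strip()):
            # Assumption: greetings to everyone go to the group and leave
            # the remembered receiver untouched.
            receivers = [GROUP]
        else:
            # No identifier: reuse the remembered receiver, else the default.
            receivers = remembered.get(sender, [GROUP])
        result.append((sender, receivers))
    return result


log = [("dax", "hi all"), ("dax", "Boohbah: no, Ezri Dax"), ("dax", "lol"), ("Boohbah", "ok")]
detected = [[], ["Boohbah"], [], []]
guessed = guess_receivers(log, detected)
```

In this toy log, the short message “lol” inherits <Boohbah> as receiver, while the messages without any earlier identifier fall back to the group default.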

The previous results of IdentifierMapper are updated. Two duplicated relations are removed. In Log Example 35, the receiver <dax> is detected twice in line 10 and also in line 12. The reflexive sender-receiver relation in line 11, <dax> → <dax>, is automatically removed. Only the relation <dax> → <Boohbah> remains. In summary, 1139 written identifiers within 1099 messages are automatically detected and mapped.

13.3 Comparing both versions

The previous results of both approaches are compared in detail.

13.3.1 Detection and mapping of identifiers

Log Example 35 shows 14 differences (light goldenrod background) between the manual and the automated detection and mapping of the identifiers.


Log Example 35
1 <jjs999jjs> and tgree goes on a stadium tour
2 <dax> hidubellz
3 <dax> hugs44
4 <CanadianFly> hi dumbbells
5 <orbii> twubwubwubwubbellz
6 <mc44> good morning daxykins
7 <redcheckers> knight ill play you too.
8 <redcheckers> knight group chess is weird when its a stalemate
9 * CanadianFly shoots chillout
10 <Boohbah> dax: are you named after Jadzia Dax from DS9?
11 <dax> Boohbah: no, Ezri Dax
12 <Boohbah> The role of Ezri Dax was created when Terry Farrell (who played Jadzia Dax)
   <Boohbah> decided to leave the show and her character was subsequently killed by Dukat.
13 * orbii cries, is an orphan now :(

Lines 1 to 6 are not found and mapped by the software although the messages contain identifiers. These are “tgree” → <tgreer>, “dubellz” → <tdubellz>, “44” → <mc44>, “dumbbells” → <tdubellz>, “twubwubwubwubbellz” → <tdubellz>, and “daxykins” → <dax>. In lines 7 to 9, the words “knight” and “chillout”, which are parts of the nicks <[knight01]> and <chill|out>, are marked as identifiers. They are wrongly overruled by the “common word” rule, i.e., the complexity scores of these identifiers are too low. Therefore, nine mappings are false negatives. The last four lines include five false positives. The nick <dax> is part of “Jadzia Dax” (lines 10 and 12) and “Ezri Dax” (lines 11 and 12) but not directly related to the communicator <dax>, although they are talking about the origin of <dax>’s nickname. “orphan” is a common word but also the name of a currently logged-in user. In line 13, “orphan” is part of the message and wrongly determined as an identifier.

Table 13.5 summarizes the differences between the two versions. The following abbreviations are used in this table: “F” stands for “found” identifiers, “TP” for “true positive”, “FP” for “false positive”, and “FN” for “false negative” errors. The numbers in “FP/FN” refer to the log lines of Log Example 35.

Table 13.5: Detection and mapping of identifiers: Comparing results of both versions

Category | Manual | Automated F | Automated TP | Automated FP | Automated FN | Example(s): TP | Example(s): FP/FN
- | 318 | 319 | 315 | 4 | 3 | “CanadianFly” → <CanadianFly> | 7–13
in general | 768 | 769 | 768 | 1 | 0 | “orbii!!” → <orbii> | 12
chat-specific | 4 | 4 | 4 | 0 | 0 | “gheraint++” → <gheraint> |
letter case | 41 | 41 | 41 | 0 | 0 | “sutekh” → <Sutekh> |
shortening | 1 | 0 | 0 | 0 | 1 | “tgree” → <tgreer> | 1
omitting POS | 7 | 5 | 5 | 0 | 2 | “knight” → <[knight01]> | 2, 3
reduplication | 1 | 1 | 1 | 0 | 0 | “orbiiiiiiiiiiiiiiiiiiiiiiiiiiiii” → <orbii> |
diminutive | 1 | 0 | 0 | 0 | 1 | “daxykins” → <dax> | 6
linguistic playing | 5 | 3 | 3 | 0 | 2 | - | 4, 5
∑ | 1146 | 1142 | 1137 | 5 | 9 | |
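To illustrate why some rows in Table 13.5 are easy to detect and others are not, the following sketch maps a word to a logged-in nickname by stripping surrounding punctuation and ignoring letter case. It is a deliberately naive stand-in, not IdentifierMapper: the “common word” rule and the complexity scores described in Chapter 11 are not reproduced, and the function name and nickname list are hypothetical.

```python
import string
from typing import Optional


def map_identifier(token: str, nicks: list[str]) -> Optional[str]:
    """Return the logged-in nick that a whitespace-separated token refers to, if any."""
    core = token.strip(string.punctuation).lower()
    candidates = {token.lower(), core}
    for nick in nicks:
        # The empty string can never equal a nick, so bare punctuation is ignored.
        if nick.lower() in candidates:
            return nick
    return None


logged_in = ["orbii", "Sutekh", "gheraint", "tgreer", "orphan", "[knight01]"]

for word in ["orbii!!", "sutekh", "gheraint++", "tgree", "knight", "orphan"]:
    print(word, "->", map_identifier(word, logged_in))
```

Punctuation and letter-case variants resolve directly, shortened or partial forms such as “tgree” and “knight” do not, and a common word that is also a nickname (“orphan”) matches even when it is not meant as an address.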


The 3826 user messages are split into words by whitespace characters such as spaces to count the possible identifiers. In total, 25240 words are extracted to define the “true negative” (TN) with TN = 24095 (extracted words minus identifiers found by hand). On the basis of these four values (TP = 1137, TN = 24095, FP = 5, and FN = 9), the statistical measures in Table 13.6 are computed.

Table 13.6: Statistical measures for IdentifierMapper (adapted from [Faw06; Pow11])

Terminology | Formula | Result
true positive rate | $TPR = \frac{TP}{TP + FN}$ | 99.21466%
true negative rate | $TNR = \frac{TN}{TN + FP}$ | 99.97925%
positive predictive value | $PPV = \frac{TP}{TP + FP}$ | 99.56217%
negative predictive value | $NPV = \frac{TN}{TN + FN}$ | 99.96266%
false positive rate | $FPR = \frac{FP}{FP + TN}$ | 0.02075%
false negative rate | $FNR = \frac{FN}{TP + FN} = 1 - TPR$ | 0.78534%
false discovery rate | $FDR = \frac{FP}{TP + FP} = 1 - PPV$ | 0.43783%
accuracy | $ACC = \frac{TP + TN}{TP + FP + FN + TN}$ | 99.94455%
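The measures in Table 13.6 can be recomputed from the four counts given in the text. The sketch below does exactly that; the function name is hypothetical, and the formulas are the ones listed in the table.

```python
def confusion_measures(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Compute the statistical measures of Table 13.6 as fractions (not percent)."""
    return {
        "TPR": tp / (tp + fn),
        "TNR": tn / (tn + fp),
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "FPR": fp / (fp + tn),
        "FNR": fn / (tp + fn),
        "FDR": fp / (tp + fp),
        "ACC": (tp + tn) / (tp + fp + fn + tn),
    }


# Counts reported for IdentifierMapper.
measures = confusion_measures(tp=1137, tn=24095, fp=5, fn=9)
for name, value in measures.items():
    print(f"{name}: {value:.5%}")
```

Because true negatives outnumber all other cells by more than a factor of twenty, TNR, NPV, and ACC are close to 100% almost by construction; TPR and PPV are the more informative values for this task.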

13.3.2 Receiver guessing

The receivers of messages, and therefore the sender-receiver relations, are unclear in 8.26% of manual mappings. Besides conversation threading, the distinction between addressing specific users by names (one-to-one, one-to-“two or more”) and addressing all in the channels (one-to-“a group or all”) is difficult. Writing short messages (e.g., “hm”, “....”) can be seen as a very effective way of being noticed in a channel. Nevertheless, short messages like “lol”, “no”, “hehe”, and “+1” make mappings difficult. The average length of the analyzed discourse messages is 36.50 characters. Background knowledge and understanding of slang and abbreviations are often necessary for correct message-to-receiver(s) mappings.

Table 13.7 compares both approaches for receiver guessing. The true positive rate (TPR) for the automated version is 76.04% (73.88% without unclear messages, 60.19% without unclear messages and system messages based on the manual analysis).

Table 13.7: Receiver guessing: Comparing results of both versions

Message-to-receiver(s) | Manual frequency (abs.) | Manual frequency (rel.) | Automated TP (abs.) | Automated FN (abs.) | Automated Rel. (FN) | TPR
one-to-one | 2837 | 50.31% | 1775 | 1062 | 78.61% | 62.57%
one-to-“two or more” | 60 | 1.06% | 60 | 0 | 0.00% | 100.00%
one-to-“a group or all” | 497 | 8.81% | 208 | 289 | 21.39% | 41.85%
unclear/unknown | 466 | 8.26% | 466 | 0 | 0.00% | 100.00%
none | 1779 | 31.55% | 1779 | 0 | 0.00% | 100.00%
∑ | 5639 | 100.00% | 4288 | 1351 | 100.00% | 76.04%
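The three TPR values quoted for receiver guessing differ only in which categories of Table 13.7 are counted. The following sketch recomputes them from the TP and FN columns; the dictionary keys and the function name are hypothetical labels chosen for this example.

```python
# (TP, FN) per message-to-receiver category, copied from Table 13.7.
ROWS = {
    "one-to-one": (1775, 1062),
    "one-to-two-or-more": (60, 0),
    "one-to-group-or-all": (208, 289),
    "unclear/unknown": (466, 0),
    "none": (1779, 0),
}


def true_positive_rate(excluded: tuple[str, ...] = ()) -> float:
    """TPR = TP / (TP + FN) over all categories that are not excluded."""
    kept = [counts for name, counts in ROWS.items() if name not in excluded]
    tp = sum(t for t, _ in kept)
    fn = sum(f for _, f in kept)
    return tp / (tp + fn)


overall = true_positive_rate()
without_unclear = true_positive_rate(("unclear/unknown",))
without_unclear_and_system = true_positive_rate(("unclear/unknown", "none"))

print(f"{overall:.2%} {without_unclear:.2%} {without_unclear_and_system:.2%}")
```

The drop from the first to the third value shows how much of the overall rate comes from categories that are trivially counted as correct (unclear messages and system messages).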

13.4 Chapter summary

This chapter showed how well automated discourse analysis works compared to manual analysis. Although the architecture developed in this thesis is mainly applied to IRC, it could be easily tailored to other CMC systems. The next chapter summarizes the research and results achieved as well as directions for future work.


Part IV

Conclusion


CHAPTER 14 Results and future work

This chapter presents a summary of the main results in the thesis. It concludes with proposals for future work.

14.1 Summary

A multiple-views analysis approach supports the analysis of computer-mediated discourses. The first two parts of the thesis show how human communication, and especially computer-mediated communication (CMC), take place. Characteristics of CMC and its forms (systems) such as Internet Relay Chat (IRC) are described in detail. A transactional communication model for computer-mediated communication with five basic elements is presented: communicator, CMC system, message, context, and communication barrier. A multiple-views analysis approach focused on CMC discourses is introduced that extends general steps such as preparation, data collection, data extraction, analysis, and results. Twelve multiple views and their underlying key questions are defined to help in analyzing CMC discourses and extracting basic information (i.e., attributes). IRC is the initial example of use to show how this communication model for CMC and the multiple-views analysis approach work.

In the third part of the thesis, the main focus is on the “View REL” (relation) to identify senders, receivers, and their structural relations among users. To know the sender-receiver relations within discourses is essential in order to understand the question “Who is communicating with whom?” at any time. Identifiers such as IRC nicknames are analyzed to understand the linguistic possibilities in their creation, their basic structure, and how nicknames are used within discourses. The results show that there are various linguistic possibilities in the creation of nicknames. However, the top 10 most frequently used POS groups and the top 10 basic structure templates cover most of the investigated nicknames. The surprising fact is that about 90% of all written nicks are written exactly like the logged-in ones. This value is probably much lower in other chats lacking an autocomplete feature. Additionally, two software tools are used to illustrate which parts of a nickname are often omitted within the chat discourse. Both tools, IdentifierMapper, for detecting and mapping identifiers, and ReceiverGuesser, for receiver guessing, help automate the answer to the important question “Who is communicating with whom?” The architectures of both tools are described in detail. Discourse analysis was performed both by hand and automatically in order to compare the results. The true positive rate (TPR) for the automated version was 99.21% (detecting and mapping) and 76.04% (guessing).

14.2 Future work

CMC offers a huge number of discourses with interesting information, especially for automatic discourse analysis. Based on this thesis, existing software such as bots can be improved. For example, Mutton’s “PieSpy Social Network Bot” infers and visualizes social networks on IRC by monitoring occurrences of direct addressing [Mut04a]. Shortened IRC nicknames and nicknames with omitted parts are not detected. Additionally, messages without identifiers can now be taken into account better. Further studies to improve automated detection and guessing of sender-receiver relations could consider the following aspects:

• Automated receiver guessing with semantic knowledge (currently without semantics) to solve problems when creative linguistic playing within discourse is used. For example, the sender-receiver relation between <mercury> (“I’m chinese, from ShangHai,China”) and <eileen_> (“hi chinese guy”) in the fourth example in Log Example 24 could be detected.

• Comparing different analysis variants for the automated tools. Top-down discourse analysis is done in this thesis. Maybe the bottom-up approach, a combination of top-down and bottom-up, or another approach would be better (i.e., find more sender-receiver relations under certain circumstances).

• Merging the individual user’s nicknames to one profile. Users with several nicknames are currently analyzed separately. The nicknames of the same user could be merged to one communicator profile (e.g., because of the same fingerprint).

• Based on the manually tagged data sets, machine learning-based approaches can be developed.

Outside the scope of the specific implementation, some ideas for future research on the general approach are, for example:

• Generating an indicator and its visualization for (computer-mediated) discourses that is based on the multiple-views analysis. This indicator measures quality and quantity factors. It could be used to compare discourses. Instead of an integer, a radar chart could be used for visualization. Each spoke could represent one of the views (e.g., emotion, effectiveness).

• Developing a graphical user interface for IRC to minimize the “[l]ack of links between people and what they say” [Smi+00].


Bibliography

[AC08b] Ahmed Abbasi and Hsinchun Chen. “Writeprints: A Stylometric Approach to Identity-Level Identification and Similarity Detection in Cyberspace”. In: ACM Transactions on Information Systems 26.2 (2008), pp. 1–29.

[AC08a] Ahmed Abbasi and Hsinchun Chen. “CyberGate: A Design Framework and System for Text Analysis of Computer-Mediated Communication”. In: MIS Quarterly 32.4 (2008), pp. 811–837.

[Abb00] Janet Abbate. Inventing the Internet. MIT Press, 2000.

[Abr98] Dalen Abraham. Extensions to the Internet Relay Chat Protocol (IRCX). Microsoft Corporation, 1998. URL: http://tools.ietf.org/id/draft-pfenning-irc-extensions-04.txt (visited on 01/21/2014).

[Adl+11] Ronald B. Adler, George Rodman, and Carrie Cropley Hutchinson. Understanding Human Communication. Oxford University Press, 2011.

[AG01] Vir Bala Aggarwal and V. S. Gupta. Handbook of Journalism and Mass Communication. Concept Publishing Company, 2001.

[AT09] Robert R. Agne and Karen Tracy. “Conversation, Dialogue, and Discourse”. In: 21st Century Communication: A Reference Handbook. Ed. by William F. Eadie. Vol. 1. SAGE Publications, 2009. Chap. 20, pp. 177–185.

[AF08] Sabah Al-Fedaghi. “Modeling Communication: One More Piece Falling into Place”. In: Proceedings of the 26th Annual ACM International Conference on Design of Communication (SIGDOC ’08). New York, NY, USA: ACM, 2008, pp. 103–110.

[AE+09] Nader Ale Ebrahim, Shamsuddin Ahmed, and Zahari Taha. “Virtual Teams: a Literature Review”. In: Australian Journal of Basic and Applied Sciences (AJBAS) 3.3 (2009), pp. 2653–2669.

[AL09] Russ Allbery and Charles H. Lindsey. RFC 5537: Netnews Architecture and Protocols. IETF Trust, 2009. URL: http://tools.ietf.org/html/rfc5537 (visited on 10/21/2013).

[Alt97] Scott L. Althaus. “Computer-Mediated Communication in the University Classroom: An Experiment with On-Line Discussions”. In: Communication Education 46.3 (1997), pp. 158–174.

[Ama12] Natia Amaghlobeli. “Linguistic Features of Typographic Emoticons in SMS Discourse”. In: Theory and Practice in Language Studies. Vol. 2. 2. Finland: Academy Publisher, 2012, pp. 348–354.

[And07] John M. Anderson. The Grammar of Names. New York, NY, USA: Oxford University Press, 2007.

[And14] Jannis Androutsopoulos. “Computer-mediated Communication and Linguistic Landscapes”. In: Research Methods in Sociolinguistics: A Practical Guide. Ed. by Janet Holmes and Kirk Hazen. John Wiley & Sons, 2014. Chap. 5, pp. 74–90.


[Anr+05] Bernhard Anrig, Emmanuel Benoist, and David-Olivier Jaquet-Chiffelle. “Virtual? Identity”. In: FIDIS Deliverable 2.2: Set of use cases and scenarios. Ed. by Thierry Nabeth. FIDIS, 2005, pp. 22–34.

[Ant09] João André Pereira Antunes. “Academic Instant Messaging System: Deploying Instant Messaging Over an Existing Session Initiation Protocol and LDAP Service Infrastructure Using the Message Session Relay Protocol”. MA thesis. Universidade Técnica de Lisboa, 2009.

[Are11] Jenny Arendholz. “Flattering and Flaming: Interpersonal Relations in Online Message Boards”. PhD thesis. Augsburg, Germany: University of Augsburg, 2011.

[AC07] Maria Corazon Aspeli-Castro. “Let’s Chat: An Analysis of some Discourse Features of Synchronous Chat”. In: Journal of English Studies and Comparative Literature 9.1 (2007).

[BN98] Joshua D. Baer and Grant Neufeld. RFC 2369: The Use of URLs as Meta-Syntax for Core Mail List Commands and their Transport through Message Header Fields. Internet Society, 1998. URL: http://tools.ietf.org/html/rfc2369 (visited on 10/20/2013).

[BB60] John Ball and Francis C. Byrnes. Research, Principles, and Practices in Visual Communication. Information Age Publishing, 1960.

[BSJ11] Hani Bani-Salameh and Clinton Jeffery. “Teaching and Learning in a Social Software Development Tool”. In: Social Media Tools and Platforms in Learning Environments. Ed. by Bebo White, Irwin King, and Philip Tsang. Springer, 2011. Chap. 2, pp. 17–35.

[BD12] Stanley J. Baran and Dennis K. Davis. Mass Communication Theory: Foundations, Ferment, and Future. 6th ed. Wadsworth Series in Mass Communication and Journalism. Cengage Learning, 2012.

[Bar10a] Sandra Nekesa Barasa. Language, Mobile Phones and Internet: A Study of SMS Texting, Email, IM and SNS Chats in Computer Mediated Communication (CMC) in Kenya. LOT Dissertation Series. Landelijke Onderzoekschool Taalwetenschap (LOT), 2010.

[Bar09] Dean C. Barnlund. “A Transactional Model of Communication”. In: Communication Theory. Ed. by C. David Mortensen. 2nd ed. Transaction Publishers, 2009.

[Bar03a] Naomi S. Baron. “Language of the Internet”. In: Handbook for Language Engineers. Ed. by Ali Ahmed Sabry Farghaly. CSLI Publications, 2003.

[Bar03b] Naomi S. Baron. “Why Email Looks Like Speech: Proofreading, Pedagogy, and Public Face”. In: New Media Language. Ed. by Jean Aitchison and Diana M. Lewis. Routledge, 2003. Chap. 9, pp. 85–94.

[Bar10b] Naomi S. Baron. “Discourse Structures in Instant Messaging: The Case of Utterance Breaks”. In: Language@Internet 7.4 (2010).


[Bau10] Christine Bauer. Promotive Activities in Technology-Enhanced Learning: The Impact of Media Selection of Peer Review, Active Listening and Motivational Aspects. European University Studies/Europäische Hochschulschriften; Economics and Management V: Publications Universitaires Européennes. Peter Lang, 2010.

[Bay10] Nancy K. Baym. Personal Connections in the Digital Age. Digital Media and Society Series. Polity Press, 2010.

[Bay98] Hillary Bays. “Framing and face in Internet exchanges: A socio-cognitive approach”. In: Linguistik online 1.1 (1998).

[BI95] Haya Bechar-Israeli. “From <Bonehead> to <cLoNehEAd>: Nicknames, Play and Identity on Internet Relay Chat”. In: Journal of Computer-Mediated Communication 1.2 (1995).

[BS08] Michael Beißwenger and Angelika Storrer. “Corpora of Computer-Mediated Communication”. In: Corpus Linguistics. An International Handbook. Ed. by Anke Lüdeling and Merja Kytö. Vol. 1. Handbücher zur Sprach- und Kommunikationswissenschaft/Handbooks of Linguistics and Communication Science. Berlin, Germany: Mouton de Gruyter, 2008, pp. 292–308.

[BLO00] Boumediene Belkhouche and Cuauhtémoc Lemus-Olalde. “Multiple Views Analysis of Software Designs”. In: International Journal of Software Engineering and Knowledge Engineering 10.5 (2000), pp. 557–579.

[Ben+04] Jason Bengel, Susan Gauch, Eera Mittur, and Rajan Vijayaraghavan. “ChatTrack: Chat Room Topic Detection Using Classification”. In: Intelligence and Security Informatics, Second Symposium on Intelligence and Security Informatics, ISI 2004. Tucson, AZ, USA, 2004, pp. 266–277.

[Ben09] Teresa M. Bennett. “Development and Performance of Distributed Teams: Examining Differences Between Asynchronous and Synchronous Communication in Planning Task Execution”. PhD thesis. Minneapolis, MN, USA: Capella University, 2009.

[Ber01] Bruce L. Berg. Qualitative research methods for the social sciences. 4th ed. Allyn & Bacon, 2001.

[Ber95] Arthur Asa Berger. Essentials of Mass Communication Theory. SAGE Publications, 1995.

[Ber60] David K. Berlo. The Process of Communication: An Introduction to Theory and Practice. Holt, Rinehart and Winston, 1960.

[BL+94] Tim Berners-Lee, Larry Masinter, and Mark McCahill. RFC 1738: Uniform Resource Locators (URL). Network Working Group, 1994. URL: http://tools.ietf.org/html/rfc1738 (visited on 11/08/2013).

[BL+96] Tim Berners-Lee, Roy T. Fielding, and Henrik Frystyk Nielsen. RFC 1945: Hypertext Transfer Protocol -- HTTP/1.0. Network Working Group, 1996. URL: http://tools.ietf.org/html/rfc1945 (visited on 11/23/2013).

[Ber+09] Robin Berthier, Jorge Arjona, and Michel Cukier. “Analyzing the Process of Installing Rogue Software”. In: Proceedings of the 2009 IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2009, Estoril, Lisbon, Portugal, June 29–July 2, 2009. IEEE Computer Society, 2009, pp. 560–565.


[Bez96] Rose-Marié Bezuidenhout. “The Role and Functions of Intrapersonal and Transpersonal Communication in the Management, Development, Transformation and Transcendence of the Self: An Exploration”. MA Thesis. Johannesburg, South Africa: Rand Afrikaans University, 1996.

[BB12] Nitin Bhatnagar and Mamta Bhatnagar. Effective Communication and Soft Skills: Strategies for Success. Dorling Kindersley (India), 2012.

[Bü90] Karl Bühler. Theory of Language: The Representational Function of Language. Trans. by Donald Fraser Goodwin. Foundations of Semiotics. John Benjamins Publishing Company, 1990.

[BLO07] Cristina Bicchieri and Azi Lev-On. “Computer-mediated communication and cooperation in social dilemmas: an experimental analysis”. In: politics, philosophy & economics 6.2 (2007), pp. 139–168.

[Bin+10] Haji Binali, Chen Wu, and Vidyasagar Potdar. “Computational Approaches for Emotion Detection in Text”. In: 4th IEEE International Conference on Digital Ecosystems and Technologies (IEEE DEST 2010). 2010, pp. 172–177.

[BP08] Vikram Bisen and Priya. Business Communication. 1st ed. New Age International, 2008, p. 176.

[Bla06] A & C Black. Dictionary of Media Studies: Over 8,000 Terms Clearly Defined. Ideal for School and College. A & C Black, 2006.

[Bla04] Adam Blakeman. “An investigation of the language of Internet chat rooms”. Dissertation. Lancaster, England: LAMEL, Lancaster University, 2004.

[Boo04] Ranida Boonthanom. “Computer-Mediated Communication of Emotions: A Lens Model Approach”. Dissertation. Tallahassee, FL, USA, 2004.

[Bor97] Prashant Bordia. “Face-to-Face Versus Computer-Mediated Communication: A Synthesis of the Experimental Literature”. In: Journal of Business Communication 34.1 (1997), pp. 99–120.

[Boy10] Danah Boyd. “Social Network Sites as Networked Publics: Affordances, Dynamics, and Implications”. In: Networked Self: Identity, Community, and Culture on Social Network Sites (2010). Ed. by Zizi Papacharissi, pp. 39–58.

[Bra58] Richard Braddock. “An Extension of the “Lasswell Formula””. In: Journal of Communication 8.2 (1958), pp. 88–93.

[Bra89] Robert Braden. RFC 1122: Requirements for Internet Hosts - Communication Layers. Internet Engineering Task Force (IETF), 1989. URL: http://tools.ietf.org/html/rfc1122 (visited on 10/19/2013).

[Bra11] Paul Braeckel. “Feeling Bluetooth: From a Security Perspective”. In: Advances in Computers 81 (2011). Ed. by Marvin V. Zelkowitz, pp. 161–236.

[BN08] Scott Brave and Clifford Nass. “Emotion in Human-Computer Interaction”. In: The Human-Computer Interaction Handbook: Fundamentals, Evolving Technologies, and Emerging Applications. Ed. by Andrew Sears and Julie A. Jacko. 2nd ed. Hillsdale, NJ, USA: Lawrence Erlbaum Associates, 2008.


[Bö+02] Katy Börner, William R. Hazlewood, and Sy-Miaw Lin. “Visualizing the Spatial and Temporal Distribution of User Interaction Data Collected in Three-Dimensional Virtual Worlds”. In: Sixth International Conference on Information Visualization. IEEE Computer Society, 2002, pp. 25–31.

[Bro04] Edward Brocklesby. draft-brocklesby-irc-isupport-03: IRC RPL_ISUPPORT Numeric Definition. 2004.

[Bro12] Using Browser Properties for Fingerprinting Purposes. Enschede, Netherlands: University of Twente, 2012, pp. 169–176.

[BY83] Gillian Brown and George Yule. Discourse Analysis. Cambridge Textbooks in Linguistics. Cambridge University Press, 1983.

[Bru92] Amy Bruckman. Identity Workshop: Emergent Social and Psychological Phenomena in Text-Based Virtual Reality. Tech. rep. Cambridge, MA, USA: MIT Media Laboratory, 1992.

[Bru96] Amy Bruckman. “Gender Swapping on the Internet”. In: High Noon on the Electronic Frontier: Conceptual Issues in Cyberspace. Ed. by Peter Ludlow. Digital Communication. MIT Press, 1996. Chap. 26, pp. 317–325.

[Bub01] Goran Bubaš. “Computer mediated communication theories and phenomena: Factors that influence collaboration over the Internet”. In: 3rd CARNet Users Conference. Zagreb, Croatia, 2001.

[BE08] Elizabeth A. Buchanan and Charles Ess. “Internet Research Ethics: The Field and Its Critical Issues”. In: The Handbook of Information and Computer Ethics. Ed. by Kenneth Einar Himma and Herman T. Tavani. John Wiley & Sons, 2008. Chap. 11, pp. 273–292.

[Bul13] BulletinBoards.com. Non-Threaded vs Semi-Threaded vs Threaded Display Formats. 2013. URL: http://www.bulletinboards.com/ThreadHelp.cfm (visited on 10/27/2013).

[BT10] John O. Burtis and Paul D. Turman. Leadership Communication as Citizenship. SAGE Publications, 2010.

[Bus06] Hadumod Bussmann. Routledge Dictionary of Language and Linguistics. Ed. by Gregory Trauth and Kerstin Kazzazi. Taylor & Francis, 2006.

[But05] Simon Butcher. IRC/2 Numeric List. 2005. URL: https://www.alien.net.au/irc/irc2numerics.html (visited on 08/16/2012).

[But11] Keith Butterick. Introducing Public Relations: Theory and Practice. SAGE Publications, 2011.

[Bye05] David Byers. Electronic Mail: Principles - DNS - Architectures - Spam. 2005. URL: http://www.ida.liu.se/~TDDI09/lectures/TDDI09-F4.pdf (visited on 10/19/2013).

[Byn11] Terrell Bynum. “Computer and Information Ethics”. In: The Stanford Encyclopedia of Philosophy. Ed. by Edward N. Zalta. Metaphysics Research Lab, Stanford University, 2011. URL: http://plato.stanford.edu/archives/spr2011/entries/ethics-computer/ (visited on 02/23/2014).


[Cer69] Vinton G. Cerf. RFC 20: ASCII format for Network Interchange. Network Work-ing Group, 1969. URL: http://tools.ietf.org/html/rfc20 (visited on09/21/2013).

[Cha14] John Chaffee. Thinking Critically. 11th ed. Cengage Learning, 2014.

[CW01] Ravinder Chandhok and Geoffrey Wenger. RFC 2919: List-Id: A StructuredField and Namespace for the Identification of Mailing Lists. Internet Society, 2001.URL: http://tools.ietf.org/html/rfc2919 (visited on 10/20/2013).

[CM11] Daniel Chandler and Rod Munday. Dictionary of Media and Communication.Oxford University Press, 2011.

[Cha00] Alex Charalabidis. The Book of IRC - The Ultimate Guide to Internet Relay Chat.San Francisco, CA, USA: No Starch Press, 2000.

[Che03] Chun-Ying Chen. “Managing perceptions of information overload incomputer-mediated communication”. PhD thesis. College Station, TX, USA:Texas A&M University, 2003.

[Che12] Hsinchun Chen. Dark Web: Exploring and Data Mining the Dark Side of the Web.Integrated Series in Information Systems. Springer, 2012.

[Che+12] Ning Chen, Jun Zhu, Fuchun Sun, and Eric P. Xing. “Large-margin PredictiveLatent Subspace Learning for Multi-view Data Analysis”. In: IEEE Transactionson Pattern Analysis and Machine Intelligence 34.12 (2012), pp. 2365–2378.

[Che98] Brittney G. Chenault. “Developing Personal and Emotional Relationships ViaComputer-Mediated Communication”. In: Computer-Mediated CommunicationMagazine 5.5 (1998). URL: http://www.december.com/cmc/mag/1998/may/chenault.html (visited on 03/01/2017).

[Cho08] Mark S. Choate. Professional Wikis. Wiley Publishing, 2008.

[Chr06] Peter Christen. “A Comparison of Personal Name Matching: Techniques andPractical Issues”. In: Proceedings of the Sixth IEEE International Conference onData Mining - Workshops. Washington, DC, USA: IEEE Computer Society, 2006,pp. 290–294.

[Chu08] Dorothy M. Chun. “Computer-mediated discourse in instructed environ-ments”. In: Mediating Discourse Online. Ed. by Sally Sieloff Magnan. AILAApplied Linguistics Series 3. John Benjamins Publishing Company, 2008,pp. 15–45.

[Cla01] Herbert H. Clark. “Conversation: Linguistic Aspects”. In: International Ency-clopedia of the Social and Behavioral Sciences. Ed. by Neil J. Smelser and Paul B.Baltes. 1st ed. Elsevier, 2001.

[Clo+87] G. L. Clore, A. Ortony, and M. A. Foss. “The Psychological Foundations of the Affective Lexicon”. In: Journal of Personality and Social Psychology 53.4 (1987), pp. 751–766.

[Col14a] CollinsDictionary.com. Definition of “causality” | Collins English Dictionary. 2014. URL: http://www.collinsdictionary.com/dictionary/english/causality (visited on 11/02/2014).


[Col14b] CollinsDictionary.com. Definition of “cause” | Collins English Dictionary. 2014. URL: http://www.collinsdictionary.com/dictionary/english/cause (visited on 11/02/2014).

[Col14c] CollinsDictionary.com. Definition of “content” | Collins English Dictionary. 2014. URL: http://www.collinsdictionary.com/dictionary/english/content (visited on 02/11/2014).

[Col14d] CollinsDictionary.com. Definition of “effect” | Collins English Dictionary. 2014. URL: http://www.collinsdictionary.com/dictionary/english/effect (visited on 11/02/2014).

[Col15] CollinsDictionary.com. Definition of “topic” | Collins English Dictionary. 2015. URL: http://www.collinsdictionary.com/dictionary/english/topic (visited on 01/01/2015).

[Cri+08] Nathan Crilly, David Good, Derek Matravers, and P. John Clarkson. “Design as communication: exploring the validity and utility of relating intention to interpretation”. In: Design Studies 29.5 (2008), pp. 425–457.

[Cri03] Mark R. Crispin. RFC 3501: Internet Message Access Protocol - Version 4rev1. Internet Society, 2003. URL: http://tools.ietf.org/html/rfc3501 (visited on 10/19/2013).

[Cro09] Dave Crocker. RFC 5598: Internet Mail Architecture. IETF Trust, 2009. URL: http://tools.ietf.org/html/rfc5598 (visited on 10/18/2013).

[Cro82] David H. Crocker. RFC 822: Standard for The Format of ARPA Internet Text Messages. 1982. URL: http://tools.ietf.org/html/rfc822 (visited on 10/19/2013).

[CO08] David H. Crocker and Paul Overell. RFC 5234: Augmented BNF for Syntax Specifications: ABNF. IETF Trust, 2008. URL: http://tools.ietf.org/html/rfc5234 (visited on 09/21/2013).

[Cro+16] Lisa Crossley, Michael Woodworth, Pamela J. Black, and Robert Hare. “The dark side of negotiation: Examining the outcomes of face-to-face and computer-mediated negotiations among dark personalities”. In: Personality and Individual Differences 91 (2016), pp. 47–51.

[Cry04] David Crystal. Language and the Internet. Cambridge University Press, 2004.

[Cry11] David Crystal. Internet Linguistics: A Student Guide. Routledge, 2011.

[Cug+09] Brian Cugelman, Mike Thelwall, and Phil Dawes. “Communication-Based Influence Components Model”. In: Proceedings of the 4th International Conference on Persuasive Technology (Persuasive ’09). New York, NY, USA: ACM, 2009, 17:1–17:9.

[Cun13a] Ward Cunningham. WikiWikiWeb - Wards Wiki. 2013. URL: http://c2.com/cgi/wiki?WardsWiki (visited on 07/28/2013).

[Cun13b] Ward Cunningham. WikiWikiWeb - Wiki History. 2013. URL: http://c2.com/cgi/wiki?WikiHistory (visited on 07/28/2013).

[DL83] Richard L. Daft and Robert H. Lengel. Information Richness: A New Approach to Managerial Behavior and Organization Design. Technical Report TR-ONR-DG-02. College Station, TX, USA: Department of Management, 1983, pp. 1–69.


[Dan67] Frank E. X. Dance. “Towards a Theory of Human Communication”. In: Human Communication Theory: Original Essays. Ed. by Frank E. X. Dance. Holt, Rinehart and Winston, 1967, pp. 288–309.

[Dan82] Frank E. X. Dance. Human Communication Theory: Comparative Essays. Harper & Row, 1982.

[Dan08] Marcel Danesi. Dictionary of Media and Communications. M.E. Sharpe, 2008.

[Dan98] Brenda Danet. “Text as Mask: Gender, Play, and Performance on the Internet”. In: Cybersociety 2.0: Revisiting Computer-Mediated Communication and Community. Ed. by Steven G. Jones. Vol. 2. New media cultures. SAGE Publications, 1998. Chap. 5, pp. 129–158.

[DB97] Boyd H. Davis and Jeutonne P. Brewer. Electronic Discourse: Linguistic Individuals in Virtual Space. Albany, NY, USA: State University of New York Press, 1997.

[DS98] Frank Dawson and Derik Stenerson. RFC 2445: Internet Calendaring and Scheduling Core Object Specification (iCalendar). Internet Society, 1998. URL: http://tools.ietf.org/html/rfc2445 (visited on 04/05/2014).

[DD09] Sathya Swaroop Debasish and Bhagaban Das. Business Communication. Eastern Economy Edition. PHI Learning, 2009.

[Dec97] John December. “Notes on Defining of Computer-Mediated Communication”. In: CMC Magazine 4.1 (1997). URL: http://www.december.com/cmc/mag/1997/jan/december.html (visited on 08/15/2013).

[Der+08] Daantje Derks, Agneta H. Fischer, and Arjan E. R. Bos. “The role of emotion in computer-mediated communication: A review”. In: Computers in Human Behavior 24.3 (2008).

[DeV86] Joseph A. DeVito. The Communication Handbook: A Dictionary. Harper & Row, 1986.

[DeV15] Joseph A. DeVito. Human Communication: The Basic Course. 13th ed. Pearson Education, 2015.

[Dew+03] Christian Dewes, Arne Wichmann, and Anja Feldmann. “An Analysis of Internet Chat Systems”. In: IMC ’03: Proceedings of the 3rd ACM SIGCOMM Conference on Internet Measurement. New York, NY, USA: ACM, 2003, pp. 51–64.

[DH05] Siegfried Dewitte and Hendrik Hendricks. “Qualitative market research on-line. Easier said than typed.” In: Research Center of Marketing. K.U.Leuven, 2005, pp. 1–20.

[DR08] Tim Dierks and Eric Rescorla. RFC 5246: The Transport Layer Security (TLS) Protocol - Version 1.2. IETF Trust, 2008. URL: http://tools.ietf.org/html/rfc5246 (visited on 11/23/2013).

[Die12] Nils Diewald. “Decentralized Online Social Networks”. In: Handbook of Technical Communication. Ed. by Alexander Mehler, Laurent Romary, and Dafydd Gibbon. Vol. 8. Handbooks of Applied Linguistics [HAL]. Walter de Gruyter, 2012, pp. 461–505.


[Dix73] Gail Dixon. Communication Theory and Interpreters Theatre: Toward a Model for the Form. Honors Projects. 1973.

[DK01] Martin Dodge and Rob Kitchin. Atlas of Cyberspace. Addison-Wesley, 2001.

[Doe98] Wernfrid Doell. The Internet Relay Chat (IRC): Linguistic Perspectives. 1998.

[DV02] Judith Donath and Fernanda B. Viégas. “The Chat Circles Series: Explorations in designing abstract graphical communication interfaces”. In: Proceedings of the 4th Conference on Designing Interactive Systems: Processes, Practices, Methods, and Techniques (DIS ’02). New York, NY, USA: ACM, 2002, pp. 359–369.

[Don+99] Judith Donath, Karrie Karahalios, and Fernanda Viégas. “Visualizing Conversation”. In: Proceedings of the 32nd Annual Hawaii International Conference on System Sciences (HICSS). Washington, DC, USA: IEEE Computer Society, 1999.

[Don+01] Judith Donath, Hyun-Yeul Lee, Danah Boyd, and Jonathan Goler. “Loom2 - Intuitively Visualizing Usenet”. In: Proceedings of Conference on Computer Supported Cooperative Work (CSCW). 2001.

[Dör03] Nicola Döring. Sozialpsychologie des Internet. Die Bedeutung des Internet für Kommunikationsprozesse, Identitäten, soziale Beziehungen und Gruppen. 2nd ed. Göttingen, Germany: Hogrefe, 2003.

[Dro+03] Ralph Droms, Jim Bound, Bernie Volz, Ted Lemon, Charles E. Perkins, and Mike Carney. RFC 3315: Dynamic Host Configuration Protocol for IPv6 (DHCPv6). Internet Society, 2003. URL: http://tools.ietf.org/html/rfc3315 (visited on 11/23/2013).

[Ebe+08] Anja Ebersbach, Markus Glaser, Richard Heigl, and Alexander Warta. Wiki: Web Collaboration. 2nd ed. Springer, 2008.

[Eis04] Abne M. Eisenberg. Anatomy of Communication: Interpersonal - Intercultural. AuthorHouse, 2004.

[Ekk12] Panteleimon Ekkekakis. “Affect, Mood, and Emotion”. In: Measurement in Sport and Exercise Psychology. Ed. by Gershon Tenenbaum, Robert C. Eklund, and Akihito Kamata. Human Kinetics, 2012.

[Ekm99] Paul Ekman. “Basic Emotions”. In: Handbook of Cognition and Emotion. Ed. by Tim Dalgleish and Mick Power. Sussex, England: John Wiley & Sons, 1999. Chap. 3, pp. 45–60.

[EF86] Paul Ekman and Wallace V. Friesen. “A New Pan-Cultural Facial Expression of Emotion”. In: Motivation and Emotion 10.2 (1986), pp. 159–168.

[Eli10] Julien Elie. RFC 6048: Network News Transfer Protocol (NNTP) Additions to LIST Command. Internet Engineering Task Force (IETF), 2010. URL: http://tools.ietf.org/html/rfc6048 (visited on 10/25/2013).

[Erl05] Zippy Erlich. “Computer-Mediated Communication”. In: Encyclopedia of Distance Learning. Ed. by Caroline Howard, Judith Boettcher, Lorraine Justice, Karen Schenk, Patricia L. Rogers, and Gary A. Berg. Vol. 1. Hershey, PA, USA: Idea Group, 2005, pp. 353–364.

[Erl+05] Zippy Erlich, Iris Erlich-Philip, and Judith Gal-Ezer. “Skills required for participating in CMC courses: An empirical study”. In: Computers & Education 44.4 (2005), pp. 477–487.


[Eyn+08] Rebecca Eynon, Jenny Fry, and Ralph Schroeder. “The Ethics of Internet Research”. In: The SAGE Handbook of Online Research Methods. Ed. by Nigel Fielding, Raymond M. Lee, and Grant Blank. SAGE Publications, 2008. Chap. 2, pp. 23–41.

[FR07] Siri Fagernes and Kirsten Ribu. “Ethical, Legal and Social Aspects of Systems”. In: Handbook of Network and System Administration. Ed. by Jan Bergstra and Mark Burgess. Elsevier, 2007, pp. 969–997.

[Faw06] Tom Fawcett. “An Introduction to ROC Analysis”. In: Pattern Recognition Letters 27.8 (2006), pp. 861–874.

[Fea06] Clive D. W. Feather. RFC 3977: Network News Transfer Protocol (NNTP). Internet Society, 2006. URL: http://tools.ietf.org/html/rfc3977 (visited on 10/21/2013).

[Fie06] Michael Fielding. Effective Communication in Organisations: Preparing Messages That Communicate. 3rd ed. Juta & Co., 2006.

[Fig+02] Maria Elena Figueroa, D. Lawrence Kincaid, Manju Rani, and Gary L. Lewis. Communication for Social Change: An Integrated Model for Measuring the Process and Its Outcomes. Ed. by Brian I. Byrd. 2002. URL: http://www.communicationforsocialchange.org/pdf/socialchange.pdf (visited on 03/07/2017).

[Fis90] John Fiske. Introduction to Communication Studies. 2nd ed. Studies in Culture and Communication. Routledge, 1990.

[Fle09] Per Flensburg. “An Enhanced Communication Model”. In: International Journal of Digital Accounting Research 9 (2009), pp. 31–43.

[Fli+04] Sarah Flicker, Dave Haans, and Harvey Skinner. “Ethical Dilemmas in Research on Internet Communities”. In: Qualitative Health Research 14.1 (2004), pp. 124–134.

[FK03] Nancy Flynn and Randolph Kahn. E-Mail Rules: A Business Guide to Managing Policies, Security, and Legal Issues for E-Mail and Digital Communication. AMACOM, 2003. Chap. 7.

[Fol13] Michael Angel Folorunso. Dynamics of Political Communication: A Treatise in Honour of Asiwaju Bola Tinubu. AuthorHouse, 2013.

[FB96a] Ned Freed and Nathaniel S. Borenstein. RFC 2045: Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies. Network Working Group, 1996. URL: http://tools.ietf.org/html/rfc2045 (visited on 10/19/2013).

[FB96b] Ned Freed and Nathaniel S. Borenstein. RFC 2046: Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types. Network Working Group, 1996. URL: http://tools.ietf.org/html/rfc2046 (visited on 10/19/2013).

[FB96c] Ned Freed and Nathaniel S. Borenstein. RFC 2049: Multipurpose Internet Mail Extensions (MIME) Part Five: Conformance Criteria and Examples. Network Working Group, 1996. URL: http://tools.ietf.org/html/rfc2049 (visited on 10/19/2013).


[Fre+96] Ned Freed, John Klensin, and Jon Postel. RFC 2048: Multipurpose Internet Mail Extensions (MIME) Part Four: Registration Procedures. Network Working Group, 1996. URL: http://tools.ietf.org/html/rfc2048 (visited on 10/19/2013).

[Fre00] Linton C. Freeman. “Visualizing Social Networks”. In: Journal of Social Structure 1.1 (2000).

[Fre14] Freenode. freenode: Frequently-Asked Questions. 2014. URL: http://freenode.net/faq.shtml (visited on 04/13/2014).

[Fre08] Carmen Frehner. Email - SMS - MMS: The Linguistic Creativity of Asynchronous Discourse in the New Media Age. Linguistic Insights. Studies in Language and Communication. Peter Lang, 2008.

[Fre+11] Alan O. Freier, Philip Karlton, and Paul C. Kocher. RFC 6101: The Secure Sockets Layer (SSL) Protocol Version 3.0. IETF Trust, 2011. URL: http://tools.ietf.org/html/rfc6101 (visited on 11/23/2013).

[Fri86] Nico H. Frijda. The Emotions. Studies in Emotion and Social Interaction. Cambridge University Press, 1986.

[Fuj09] Randy Fujishin. Creating Communication: Exploring and Expanding Your Fundamental Communication Skills. 2nd ed. Rowman & Littlefield Publishers, 2009.

[GG06] Teri Kwal Gamble and Michael Gamble. Communication Works. 9th ed. Tata McGraw-Hill Education, 2006.

[GJ99] Angela Cora Garcia and Jennifer Baker Jacobs. “The Eyes of the Beholder: Understanding the Turn-Taking System in Quasi-Synchronous Computer-Mediated Communication”. In: Research on Language and Social Interaction 32.4 (1999), pp. 337–367.

[Gar+94] Simson Garfinkel, Daniel Weise, and Steven Strassmann. The UNIX-HATERS Handbook. IDG Books Worldwide, 1994.

[Gee11] James Paul Gee. An Introduction to Discourse Analysis: Theory and Method. 3rd ed. Routledge, 2011.

[Gel08] Andreas Gelhausen. IRC statistics. 2008. URL: http://irc.netsplit.de/ (visited on 11/02/2011).

[Gel13] Andreas Gelhausen. IRC statistics. 2013. URL: http://irc.netsplit.de/ (visited on 03/03/2013).

[Gel+98] Randall Gellens, Chris Newman, and Laurence Lundblade. RFC 2449: POP3 Extension Mechanism. Internet Society, 1998. URL: http://tools.ietf.org/html/rfc2449 (visited on 10/19/2013).

[Gel98] Péter Gelléri. “The IRC Vernacular: A Linguistic Study of Internet Relay Chat”. MA Thesis. 1998. URL: http://gelleri.weebly.com/the-irc-vernacular.html (visited on 09/19/2013).

[Ger67] George Gerbner. “Mass Media and Human Communication Theory”. In: Human Communication Theory: Original Essays. Ed. by Frank E. X. Dance. Pennsylvania, PA, USA: Holt, Rinehart and Winston, 1967, pp. 40–60.

[GA02] David Gill and Bridget Adams. ABC of Communication Studies. 2nd ed. Nelson Thornes, 2002.


[GC00] James Gillies and Robert Cailliau. How the Web was Born: The Story of the World Wide Web. Oxford University Press, 2000.

[Goe10] Anita Goel. Computer Fundamentals. Dorling Kindersley (India), 2010.

[GM04] Jennifer Golbeck and Paul Mutton. “Semantic Web Interaction on Internet Relay Chat”. In: Proceedings of Interaction Design on the Semantic Web. New York, NY, USA, 2004.

[GD04] Scott A. Golder and Judith Donath. “Social Roles in Electronic Communities”. In: Presented at Association of Internet Researchers (AoIR) conference Internet Research 5.0. Brighton, England, 2004.

[Gre10] Sandra Greiffenstern. The Influence of Computers, the Internet and Computer-Mediated Communication on Everyday English. Logos Verlag, 2010.

[GK86] Ralph Grishman and Richard Kittredge. Analyzing Language in Restricted Domains: Sublanguage Description and Processing. Lawrence Erlbaum Associates, 1986.

[Gru04] Jim Grubbs. “E-mail and Instant Messaging”. In: The Internet Encyclopedia: Volume 1. Ed. by Hossein Bidgoli. John Wiley & Sons, 2004.

[GW08] Linwu Gu and Jianfeng Wang. “Moderating Effects of Task type on Knowledge Sharing in a Virtual Team”. In: Proceedings of the 39th Annual Meeting of the Decision Sciences Institute. Indiana, PA, USA, 2008.

[GW13] Linwu Gu and Jianfeng Wang. “How conflicts may impact intentions to share knowledge in a virtual team”. In: Issues in Information Systems (IIS) 14.2 (2013), pp. 79–86.

[Gur97] Laura J. Gurak. Persuasion and Privacy in Cyberspace: The Online Protests Over Lotus MarketPlace and the Clipper Chip. Yale University Press, 1997.

[Hal65] J. D. Halloran. “The Communicator in Mass Communication Research”. In: The Sociological Review 13 (1965), pp. 5–21.

[Ham95] Sally Hambridge. RFC 1855: Netiquette Guidelines. Network Working Group, 1995. URL: http://tools.ietf.org/html/rfc1855 (visited on 08/16/2012).

[Ham11] Cheryl Hamilton. Communicating for Results: A Guide for Business and the Professions. 9th ed. Thomson/Wadsworth, 2011.

[Har10] Claire Hardaker. “Trolling in asynchronous computer-mediated communication: From user discussions to academic definitions”. In: Journal of Politeness Research 6.2 (2010), pp. 215–242.

[Har04] Lee Hardy. TS6 Proposal (v7). 2004. URL: http://www.leeh.co.uk/ircd/TS6.txt (visited on 01/21/2014).

[Har95] Mark Harrison. The USENET Handbook: A User’s Guide to Netnews. A Nutshell Handbook. O’Reilly & Associates, 1995.

[Har14] Richard Hartmann. Default Port for Internet Relay Chat (IRC) via TLS/SSL. IETF Trust, 2014. URL: http://tools.ietf.org/html/rfc7194 (visited on 08/15/2017).

[Has09] Šárka Hastrdlová. “Language of Internet Relay Chat (reconsidering (im)politeness in the context of IRC)”. PhD thesis. Brno, Czech Republic: Masaryk University, 2009.


[Hat03] Maiko Hata. “Literature Review: Using Computer-Mediated Communication in Second Language Classrooms”. In: Osaka Keidai Ronshu 54.3 (2003), pp. 115–125.

[Hau01] Roland Hausser. Foundations of Computational Linguistics: Human-Computer Communication in Natural Language. 2nd ed. Springer, 2001.

[Hei+06] Carmen Heine, Klaus Schubert, and Heidrun Gerzymisch-Arbogast. Text and Translation: Theory and Methodology of Translation. Jahrbuch Übersetzen und Dolmetschen. Narr Francke Attempto Verlag, 2006.

[Hen09] Harry Henderson. Encyclopedia of Computer Science and Technology. Revised Edition. Facts on File Science Library. Facts On File, 2009.

[Hen98] Elke Hentschel. “Communication on IRC”. In: Linguistik online 1.1 (1998).

[Her96] Susan C. Herring. Computer-Mediated Communication: Linguistic, Social, and Cross-cultural Perspectives. Pragmatics & Beyond. John Benjamins Publishing Company, 1996.

[Her99] Susan C. Herring. “Interactional Coherence in CMC”. In: Journal of Computer-Mediated Communication 4.4 (1999).

[Her01] Susan C. Herring. “Computer-Mediated Discourse”. In: The Handbook of Discourse Analysis. Ed. by Deborah Schiffrin, Deborah Tannen, and Heidi E. Hamilton. 1st ed. Oxford, England: Blackwell Publishing, 2001, pp. 612–634.

[Her02] Susan C. Herring. “Computer-Mediated Communication on the Internet”. In: Annual Review of Information Science and Technology 36.1 (2002), pp. 109–168.

[Her04] Susan C. Herring. “Computer-Mediated Discourse Analysis: An Approach to Researching Online Behavior”. In: Designing for Virtual Communities in the Service of Learning. Ed. by Sasha A. Barab, Rob Kling, and James H. Gray. Learning in Doing: Social, Cognitive & Computational Perspectives. Cambridge University Press, 2004. Chap. 12.

[Her07] Susan C. Herring. “A Faceted Classification Scheme for Computer-Mediated Discourse”. In: Language@Internet 4.1 (2007).

[Her13] Susan C. Herring. “Relevance in computer-mediated conversation”. In: Handbook of pragmatics of computer-mediated communication. Ed. by Susan C. Herring, Dieter Stein, and Tuija Virtanen. Mouton de Gruyter, 2013, pp. 245–268.

[HA15] Susan C. Herring and Jannis Androutsopoulos. “Computer-Mediated Discourse 2.0”. In: The Handbook of Discourse Analysis. Ed. by Deborah Tannen, Heidi E. Hamilton, and Deborah Schiffrin. 2nd ed. Oxford, England: John Wiley & Sons, 2015, pp. 127–151.

[Her+04] Susan C. Herring, Lois Ann Scheidt, Sabrina Bonus, and Elijah Wright. “Bridging the Gap: A Genre Analysis of Weblogs”. In: Proceedings of the 37th Hawai’i International Conference on System Sciences (HICSS-37). IEEE Computer Society Press. Los Alamitos, CA, USA, 2004.

[HG00] Ray Eldon Hiebert and Sheila Jean Gibbons. Exploring Mass Media for a Changing World. Lawrence Erlbaum Associates, 2000.


[Hig92] Robert N. Higgins. “Computer-mediated Cooperative Learning: Synchronous and Asynchronous Communication Between Students Learning Nursing Diagnosis”. PhD thesis. Toronto, Ontario, Canada, 1992.

[Hil+07] Anne Hill, James Watson, Danny Rivers, and Mark Joyce. Key Themes in Interpersonal Communication: Culture, Identities and Performance. Open University Press, 2007.

[Hil74] Robert Christian Hillestad. “A Schematic Approach to a Theoretical Analysis of Dress as Nonverbal Communication”. Dissertation. Columbus, OH, USA: Ohio State University, 1974.

[HT85] Starr Roxanne Hiltz and Murray Turoff. “Structuring computer-mediated communication systems to avoid information overload”. In: Communications of the ACM 28.7 (1985), pp. 680–689.

[HT93] Starr Roxanne Hiltz and Murray Turoff. The Network Nation: Human Communication via Computer. MIT Press, 1993.

[HV08] Michael A. Hogg and Graham M. Vaughan. Social Psychology. 5th ed. Pearson Education, 2008.

[Hol08a] Torsten Holmer. “Discourse Structure Analysis of Chat Communication”. In: Language@Internet 5.1 (2008).

[Hol+09] Torsten Holmer, Stephan Lukosch, and Verena Kunz. “Diminishing Chat Confusion by Multiple Visualizations”. In: Journal of Universal Computer Science 15.16 (2009), pp. 3139–3157.

[Hol+12] Jan Rune Holmevik, Ian Bogost, and Gregory Ulmer. Inter/vention - Free Play in the Age of Electracy. Massachusetts Institute of Technology, 2012.

[Hol08b] Thomas M. Holtgraves. Language as Social Action: Social Psychology and Language Use. Taylor & Francis e-Library, 2008.

[Hol08c] Carolyn F. Holton. “The Impact of Computer Mediated Communication Systems Monitoring on Organizational Communications Content”. PhD thesis. Tampa, FL, USA, 2008.

[HH09] Courtenay Honeycutt and Susan C. Herring. “Beyond Microblogging: Conversation and Collaboration via Twitter”. In: Proceedings of the 42nd Hawai’i International Conference on System Sciences (HICSS-42). Los Alamitos, CA, USA: IEEE Computer Society, 2009.

[Hor08] Ray Horak. Webster’s New World Telecom Dictionary. Webster’s New World. Wiley Publishing, 2008.

[Hor86] Mark R. Horton. RFC 976: UUCP Mail Interchange Format Standard. Network Working Group, 1986. URL: http://tools.ietf.org/html/rfc976 (visited on 10/21/2013).

[Hä10] Antti Hätinen. “Mitigating Denial of Service attacks using Quality of Service mechanisms”. S-38.3138 Special Assignment, Aalto University School of Science and Technology, Faculty of Electronics, Communications and Automation, Department of Communications and Networking. 2010.


[Hua+06] Weidong Huang, Seok-Hee Hong, and Peter Eades. “How People Read Sociograms: A Questionnaire Study”. In: Proceedings of the 2006 Asia-Pacific Symposium on Information Visualisation (APVis ’06). Vol. 60. Tokyo, Japan: Australian Computer Society, 2006, pp. 199–206.

[HB05] James M. Hudson and Amy Bruckman. “Using Empirical Data to Reason about Internet Research Ethics”. In: Proceedings of the 9th European Conference on Computer-Supported Cooperative Work (ECSCW ’05). Ed. by Hans Gellersen, Kjeld Schmidt, Michel Beaudouin-Lafon, and Wendy E. MacKay. Springer, 2005, pp. 287–306.

[Hun+10] Jeremy Hunsinger, Lisbeth Klastrup, and Matthew Allen, eds. International Handbook of Internet Research. Springer, 2010.

[Hut+07] Carl Hutzler, Dave Crocker, Peter Resnick, Eric Allman, and Tony Finch. RFC 5068: Email Submission Operations: Access and Accountability Requirements. IETF Trust, 2007. URL: http://tools.ietf.org/html/rfc5068 (visited on 10/19/2013).

[Iqb11] Farkhund Iqbal. “Messaging Forensic Framework for Cybercrime Investigation”. Dissertation. Montréal, Quebec, Canada: Concordia University, 2011.

[IRC13] IRC Beginner.com. #Beginner - Etiquette Netiquette - Play Nice Have Fun. 2013. URL: http://www.ircbeginner.com/ircinfo/etiquette.html (visited on 03/16/2014).

[IRC12] IRChelp.org. IRChelp.org Internet Relay Chat (IRC) help archive - RFC: Internet Relay Chat Protocol. 2012. URL: http://www.irchelp.org/irchelp/rfc/ (visited on 03/03/2013).

[IRC17] IRCv3 Working Group. IRCv3 Specifications. 2017. URL: http://ircv3.net/irc/ (visited on 08/04/2017).

[ISO04] ISO. ISO/FDIS 8601:2004(E): Data elements and interchange formats - Information interchange - Representation of dates and times. 2004.

[JI+07] Ronald L. Jackson II, Darlene K. Drummond, and Sakile Camara. “What Is Qualitative Research?” In: Qualitative Research Reports in Communication 8.1 (2007), pp. 21–28.

[Jan10] Fred Edmund Jandt. An Introduction to Intercultural Communication: Identities in a Global Community. 6th ed. SAGE Publications, 2010.

[JL02] Jørgen Dines Johansen and Svend Erik Larsen. Signs in Use: An Introduction to Semiotics. Routledge, 2002.

[Joh04] Markéta Johnová. “The Language of Chat”. In: PHILOLOGICA.NET: An On-line Journal of Modern Philology (2004).

[Joh97] Deborah G. Johnson. “Ethics Online: Shaping social behavior online takes more than new laws and modified edicts”. In: Communications of the ACM 40.1 (1997), pp. 60–65.

[Joh99] Deborah G. Johnson. “Sorting Out the Uniqueness of Computer-Ethical Issues”. In: Etica & Politica/Ethics & Politics 1 (1999). Ed. by Luciano Floridi. URL: http://www2.units.it/etica/1999_2/johnson.html (visited on 02/23/2014).


[Joh16] Wendell Johnson. The Communication Process and General Semantic Principles. 2016. URL: http://www.nicholasjohnson.org/wjohnson/wjcompro.html (visited on 08/04/2017).

[Jon97] Ewa Jonsson. Electronic Discourse: On Speech and Writing on the Internet. 1997. URL: http://www.ludd.luth.se/~jonsson/D-essay/index.html (visited on 03/16/2014).

[JD12] Andreas H. Jucker and Christa Dürscheid. “The Linguistics of Keyboard-to-screen Communication. A New Terminological Framework”. In: Linguistik online 56.6 (2012).

[Kad+12] Zaemah Abd Kadir, Marlyna Maros, and Bahiyah Dato’ Hj. Abdul Hamid. “Linguistic Features in the Online Discussion Forums”. In: International Journal of Social Science and Humanity 2.3 (2012).

[Kal07] Yoram M. Kalman. “Silence in Text-Based Computer Mediated Communication: The Invisible Component”. PhD thesis. Haifa, Israel: University of Haifa, 2007.

[Kal00a] Christophe Kalt. RFC 2810: Internet Relay Chat: Architecture. Internet Society, 2000. URL: http://tools.ietf.org/html/rfc2810 (visited on 08/16/2012).

[Kal00b] Christophe Kalt. RFC 2811: Internet Relay Chat: Channel Management. Internet Society, 2000. URL: http://tools.ietf.org/html/rfc2811 (visited on 08/16/2012).

[Kal00c] Christophe Kalt. RFC 2812: Internet Relay Chat: Client Protocol. Internet Society, 2000. URL: http://tools.ietf.org/html/rfc2812 (visited on 08/16/2012).

[Kal00d] Christophe Kalt. RFC 2813: Internet Relay Chat: Server Protocol. Internet Society, 2000. URL: http://tools.ietf.org/html/rfc2813 (visited on 08/16/2012).

[Kam16] Steven H. Kaminski. Communication Models. 2016. URL: http://www.shkaminski.com/Classes/Handouts/Communication%20Models.htm (visited on 02/28/2016).

[Kar13] Juuso Karikoski. “Empirical analysis of mobile interpersonal communication service usage”. PhD thesis. Helsinki, Finland: Aalto University, 2013.

[KSD12] Anupam Karmakar and Bidisha Sarkar Datta. Principles and Practice of Management and Business Communication. Dorling Kindersley (India), 2012.

[Kas+08] Sumit Kasera, Nishit Narang, and A. P. Priyanka. 2.5G Mobile Networks: GPRS and EDGE. McGraw-Hill Professional: Networking Series. Tata McGraw-Hill Publishing Company, 2008.

[KY07] Hirofumi Katsuno and Christine Yano. “Kaomoji and Expressivity in a Japanese Housewives’ Chat Room”. In: The Multilingual Internet: Language, Culture, and Communication Online. Ed. by Brenda Danet and Susan C. Herring. New York, NY, USA: Oxford University Press, 2007. Chap. 12, pp. 278–300.

[Kau09] Asha Kaul. Business Communication. 2nd ed. PHI Learning, 2009.

[Kav15] Barry Kavanagh. “A Contrastive Analysis of American and Japanese Online Communication: A Study of UMC Function and Usage in Popular Personal Weblogs”. PhD thesis. Sendai, Japan: Tohoku University, 2015.


[Keh92] Brendan P. Kehoe. Zen and the Art of the Internet: A Beginner’s Guide to the Internet. 1992. URL: http://www.cs.indiana.edu/docproject/zen/zen-1.0_6.html (visited on 03/16/2014).

[Ker03] Bernard Kerr. “THREAD ARCS: An Email Thread Visualization”. In: Proceedings of the IEEE Symposium on Information Visualization 2003 (INFOVIS ’03). Washington, DC, USA: IEEE Computer Society, 2003, pp. 211–218.

[KP07] Mehdi Khosrow-Pour. Dictionary of Information Science and Technology. Idea Group, 2007.

[Kie08] Sara Kiesler. “Network Nation: Human Communication via Computer”. In: HCI Remixed: Reflections on Works That Have Influenced the HCI Community. Ed. by Thomas Erickson and David W. McDonald. MIT Press, 2008.

[Kie+84] Sara Kiesler, Jane Siegel, and Timothy W. McGuire. “Social Psychological Aspects of Computer-Mediated Communication”. In: American Psychologist 39.10 (1984), pp. 1123–1134.

[Kle08] John C. Klensin. RFC 5321: Simple Mail Transfer Protocol. IETF Trust, 2008. URL: http://tools.ietf.org/html/rfc5321 (visited on 09/20/2013).

[KN02] Graham Klyne and Chris Newman. RFC 3339: Date and Time on the Internet: Timestamps. Internet Society, 2002. URL: http://tools.ietf.org/html/rfc3339 (visited on 04/05/2014).

[KP05] Graham Klyne and Jacob Palme. RFC 4021: Registration of Mail and MIME Header Fields. Internet Society, 2005. URL: http://tools.ietf.org/html/rfc4021 (visited on 12/08/2013).

[Kor99] Heikki Kortti. On some similarities between discourse in the Internet Relay Chat and the conventions of spoken English. 1999. URL: http://www.robertecker.com/hp/research/publication/Kortti1999.zip (visited on 11/11/2011).

[KS07] C. P. J. Koymans and J. Scheerder. Email. Ed. by Jan Bergstra and Mark Burgess. Elsevier, 2007. Chap. 2.3, pp. 147–172.

[Koz05] Charles M. Kozierok. The TCP/IP Guide: A Comprehensive, Illustrated Internet Protocols Reference. No Starch Press, 2005.

[Kro94] Ed Krol. The Whole Internet: User’s Guide & Catalog. 2nd ed. A Nutshell Handbook. O’Reilly & Associates, 1994.

[Kuc+06] Tayfun Kucukyilmaz, Berkant Barla Cambazoglu, Cevdet Aykanat, and Fazli Can. “Chat Mining for Gender Prediction”. In: ADVIS. Ed. by Tatyana Yakhno and Erich Neuhold. Vol. 4243. Lecture Notes in Computer Science. Berlin, Heidelberg, Germany: Springer, 2006, pp. 274–283.

[Kuc+08] Tayfun Kucukyilmaz, Berkant Barla Cambazoglu, Cevdet Aykanat, and Fazli Can. “Chat Mining: Predicting User and Message Attributes in Computer-Mediated Communication”. In: Information Processing & Management 44.4 (2008), pp. 1448–1466.

[Kul08] Sameer A. Kulkarni. A Text Book of Virtual Marketing. 1st ed. Excel Books, 2008.

[Kuo89] Jyrki Kuoppala. Re: How to get an urgent message to an arbitrary system. 1989. URL: http://securitydigest.org/tcp-ip/archive/1989/08 (visited on 10/06/2013).


[Kus10] Sri Jin Kushal. Business Communication. V.K. (India) Enterprises, 2010.

[LZ05] Andrew Laghos and Panayiotis Zaphiris. “Frameworks for Analyzing Computer-Mediated-Communication in e-Learning”. In: Proceedings of the 11th International Conference on Human-Computer Interaction (HCI-International). Las Vegas, NV, USA, 2005.

[Lak06] Alexander Lakaw. Hiding behind nicknames. A linguistic study of anonymity in IRC chatrooms. Växjö, Sweden, 2006.

[Lan94] Derek R. Lane. “Computer-Mediated Communication in the Classroom: Asset or Liability?” In: Interconnect ’94 Teaching, Learning & Technology Conference (1994). URL: http://www.uky.edu/~drlane/techno/cmcasset.htm (visited on 10/06/2013).

[LHJ09] Darren Langdridge and Gareth Hagger-Johnson. Introduction to Research Methods and Data Analysis in Psychology. 2nd ed. Pearson Education, 2009.

[Lan13] Richard L. Lanigan. “Information theories”. In: Theories and Models of Communication. Ed. by Paul Cobley and Peter J. Schulz. Handbooks of Communication Science. Walter de Gruyter, 2013. Chap. 4.

[LB05] Gwenaël Le Bodic. Mobile Messaging Technologies and Services: SMS, EMS and MMS. 2nd ed. John Wiley & Sons, 2005.

[Lee16] Carmen Lee. Multilingualism Online. Routledge, 2016.

[Lee02] Carmen K. M. Lee. “Literacy Practices in Computer-Mediated Communica-tion in Hong Kong”. In: The Reading Matrix 2.2 (2002), pp. 1–25.

[Lee11] Cheng Ean Lee. “Computer-Mediated Communication and OrganisationalCommunication: The Use of New Communication Technology in the Work-place”. In: SEARCH: The Journal of the South East Asia Research Centre forCommunication and Humanities 3 (2011), pp. 1–12.

[Lee93] Dick Lee. Developing Effective Communications: Extension and Agricultural Infor-mation. Columbia, MO, USA, 1993. URL: http://extension.missouri.edu/p/CM109 (visited on 09/12/2013).

[Lei+08] Jan Marco Leimeister, Karin Janina Schweizer, Stefanie Leimeister, and Helmut Krcmar. “Do virtual communities matter for the social support of patients? Antecedents and effects of virtual relationships in online communities”. In: Information Technology & People (ITP) 21.4 (2008), pp. 350–374.

[Lei+12] Jan Marco Leimeister, Karin Janina Schweizer, and Helmut Krcmar. “Do Virtual Communities Have an Effect on the Social Network of Cancer Patients? Empirical Insights from Germany”. In: E-Health Communities and Online Self-Help Groups: Applications and Usage. Ed. by Asa Smedberg. IGI Global, 2012. Chap. 5, pp. 72–84.

[Len09] Lara Lengel. “Computer-Mediated Communication”. In: 21st Century Communication: A Reference Handbook. Ed. by William F. Eadie. Vol. 2. SAGE Publications, 2009. Chap. 60, pp. 543–549.

[LO09] Azi Lev-On. “Cooperation with and without Trust Online”. In: eTrust: Forming Relationships in the Online World. Ed. by Karen S. Cook, Chris Snijders, Vincent Buskens, and Coye Cheshire. The Russell Sage Foundation series on trust. Russell Sage Foundation Publications, 2009. Chap. 11, pp. 292–318.

[Lev98] Edward Levinson. RFC 2387: The MIME Multipart/Related Content-type. Internet Society, 1998. URL: http://tools.ietf.org/html/rfc2387 (visited on 10/19/2013).

[Lew00] Robert Edward Lewand. Cryptological Mathematics. The Mathematical Association of America, 2000.

[Lic10] Song Lichao. “The Role of Context in Discourse Analysis”. In: Journal of Language Teaching and Research 1.6 (2010), pp. 876–879.

[Lin+12] Ying-Dar Lin, Ren-Hung Hwang, and Fred Baker. Computer Networks: An Open Source Approach. McGraw-Hill, 2012.

[LO04] Jeannette Littlemore and David Oakey. “Communication with a purpose: Exploiting the Internet to promote language learning”. In: ICT and Language Learning: Integrating Pedagogy and Practice. Ed. by Angela Chambers, Jean E. Conacher, and Jeannette Littlemore. The University of Birmingham Press, 2004. Chap. 5, pp. 95–120.

[Liu99] Geoffrey Z. Liu. “Virtual Community Presence in Internet Relay Chatting”. In: Journal of Computer-Mediated Communication 5.1 (1999).

[Lö+09] Alexander Löser, Fabian Hueske, and Volker Markl. “Situational Business Intelligence”. In: Business Intelligence for the Real-Time Enterprise. Ed. by Malu Castellanos, Umesh Dayal, and Timos Sellis. LNBIP 27. Springer, 2009, pp. 1–11.

[MS00] Chris Mann and Fiona Stewart. Internet Communication and Qualitative Research: A Handbook for Researching Online. 1st ed. New Technologies for Social Research series. SAGE Publications, 2000.

[Man10] Maja Lina Mansen. “Think Globally, Act Locally - A Comparative Analysis of Vestas’ Web Communication and Localisation”. MA thesis. Aarhus, Denmark: Aarhus School of Business, Aarhus University, 2010.

[Mar+08] Michel Marcoccia, Hassan Atifi, and Nadia Gauducheau. “Text-Centered versus Multimodal Analysis of Instant Messaging Conversation”. In: Language@Internet 5.7 (2008).

[Mar+94] Mitchell P. Marcus, Beatrice Santorini, and Mary A. Marcinkiewicz. “Building a Large Annotated Corpus of English: The Penn Treebank”. In: Computational Linguistics 19 (1994).

[MB10] Khaled Mardam-Bey. mIRC Version 7.17 - mIRC Help. 2010. URL: http://www.mirc.com/get.html (visited on 08/16/2012).

[Mar04] Annette N. Markham. “Internet communication as a tool for qualitative research”. In: Qualitative Research: Theory, Method and Practice. Ed. by David Silverman. 2nd ed. SAGE Publications, 2004, pp. 95–124.

[Mar06] Kristine Michelle Markman. “Computer-Mediated Conversation: The Organization of Talk in Chat-Based Virtual Team Meetings”. PhD thesis. Austin, TX, USA, 2006.

[Mas11] Ruth E. Massingill. “Social Marketing Strategies for Combating HIV/AIDS in Rural and/or Disadvantaged Communities in Mexico, Uganda, and the United States”. PhD thesis. Middlesbrough, England: Teesside University, 2011.

[MM12] Jonathan R. Mayer and John C. Mitchell. “Third-Party Web Tracking: Policy and Technology”. In: 2012 IEEE Symposium on Security and Privacy (SP). IEEE Computer Society, 2012, pp. 413–427.

[McC01] Michael McCarthy. “Discourse”. In: The Cambridge Guide to Teaching English to Speakers of Other Languages. Ed. by Ronald Carter and David Nunan. Cambridge University Press, 2001. Chap. 7, pp. 48–55.

[McL08] K. J. McLennan. The Virtual World of Work: How to Gain Competitive Advantage through the Virtual Workplace. Information Age Publishing, 2008.

[MW95] Denis McQuail and Sven Windahl. Communication Models for the Study of Mass Communications. 2nd ed. Routledge, 1995.

[McR96] Shannon McRae. “Coming Apart at the Seams: Sex, Text and the Virtual Body”. In: Wired Women: Gender and New Realities in Cyberspace. Ed. by Lynn Cherny and Elizabeth R. Weise. Computer/Cultural Studies. Seal Press, 1996, pp. 242–263.

[Mer12] Merlin. mIRC ScriptBox - RAW Name-Index. 2012. URL: http://www.mirc.org/mishbox/reference/raw.nameidx.htm (visited on 08/16/2012).

[Mer13] Merriam-Webster Online. Protocol - Definition and More from the Free Merriam-Webster Dictionary. 2013. URL: http://www.merriam-webster.com/dictionary/protocol (visited on 11/24/2013).

[Mer14a] Merriam-Webster Online. Causality - Definition and More from the Free Merriam-Webster Dictionary. 2014. URL: http://www.merriam-webster.com/dictionary/causality (visited on 10/26/2014).

[Mer14b] Merriam-Webster Online. Content - Definition and More from the Free Merriam-Webster Dictionary. 2014. URL: http://www.merriam-webster.com/dictionary/content (visited on 02/03/2014).

[Mer14c] Merriam-Webster Online. Context - Definition and More from the Free Merriam-Webster Dictionary. 2014. URL: http://www.merriam-webster.com/dictionary/context (visited on 04/21/2014).

[Mer17a] Merriam-Webster Online. Analysis | Definition of Analysis by Merriam-Webster. 2017. URL: https://www.merriam-webster.com/dictionary/analysis (visited on 04/01/2017).

[Mer17b] Merriam-Webster Online. Emote | Definition of Analysis by Merriam-Webster. 2017. URL: https://www.merriam-webster.com/dictionary/emote (visited on 04/09/2017).

[Met02] Allan A. Metcalf. Predicting New Words: The Secrets of Their Success. Houghton Mifflin, 2002.

[Met94] J. Michel Metz. “Computer-Mediated Communication: Literature Review of a New Context”. In: Interpersonal Computing and Technology: An Electronic Journal for the 21st Century 2.2 (1994), pp. 31–49. URL: http://www.helsinki.fi/science/optek/1994/n2/metz.txt (visited on 08/03/2013).

[Mil06] Miroslav Milinovic. Internet Users’ Research Guide. 4th ed. Pearson Education, 2006.

[ML05] Kevin L. Mitchell and Perry Lorier. draft-mitchell-irc-capabilities-01: IRC Client Capabilities Extension. 2005.

[MDB01] Marie-Francine Moens and Rik De Busser. “Generic Topic Segmentation of Document Texts”. In: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Ed. by W. Bruce Croft, David J. Harper, Donald H. Kraft, and Justin Zobel. SIGIR 2001. New York, NY, USA: ACM, 2001, pp. 418–419.

[Mon07] Rachel Sheal Preethi Mony. “An Exploratory Study of Docents as a Channel for Institutional Messages at Free-choice Conservation Education Settings”. Dissertation. Columbus, OH, USA: Ohio State University, 2007.

[MD94] David M. Moore and Francis M. Dwyer. Visual Literacy: A Spectrum of Visual Learning. Educational Technology Publications, 1994.

[Moo96] Keith Moore. RFC 2047: MIME (Multipurpose Internet Mail Extensions) Part Three: Message Header Extensions for Non-ASCII Text. Network Working Group, 1996. URL: http://tools.ietf.org/html/rfc2047 (visited on 10/19/2013).

[Mor78] Jacob Levy Moreno. Who Shall Survive? Foundations of Sociometry, Group Psychotherapy and Sociodrama. 3rd ed. Beacon House, 1978.

[Mor+79] Jane Morgan, Christopher O’Neill, and Rom Harré. Nicknames: Their Origins and Social Consequences. Ed. by Rom Harré. Social Worlds of Childhood. London, Boston, and Henley: Routledge & Kegan Paul, 1979.

[MP13] Deborah Morley and Charles S. Parker. Understanding Computers: Today and Tomorrow. 14th ed. Course Technology, Cengage Learning, 2013.

[Mur+09] Kenneth Murchison, Charles H. Lindsey, and Dan Kohn. RFC 5536: Netnews Article Format. IETF Trust, 2009. URL: http://tools.ietf.org/html/rfc5536 (visited on 10/22/2013).

[Mur00] Denise E. Murray. “Protean Communication: The Language of Computer-Mediated Communication”. In: TESOL Quarterly 34.3 (2000), pp. 397–421.

[Mut04a] Paul Mutton. “Inferring and Visualizing Social Networks on Internet Relay Chat”. In: Proceedings of the Information Visualisation, Eighth International Conference (IV ’04). Washington, DC, USA: IEEE Computer Society, 2004, pp. 35–43.

[Mut04b] Paul Mutton. IRC Hacks: 100 Industrial-Strength Tips & Tools. Hacks Series. O’Reilly Media, 2004.

[Mye94] John G. Myers. RFC 1734: POP3 AUTHentication command. Network Working Group, 1994. URL: http://tools.ietf.org/html/rfc1734 (visited on 10/22/2013).

[MR96] John G. Myers and Marshall T. Rose. RFC 1939: Post Office Protocol - Version 3. Network Working Group, 1996. URL: http://tools.ietf.org/html/rfc1939 (visited on 10/22/2013).

[Nai11] Paulene Naidoo. “Intercultural Communication: A Comparative Study of Japanese and South African Work Practice”. Dissertation. KwaDlangezwa, South Africa: University of Zululand, 2011.

[Nar06a] Uma Narula. Communication Models. Atlantic Publishers & Distributors, 2006.

[Nar06b] Uma Narula. Dynamics of Mass Communication: Theory and Practice. Atlantic Publishers & Distributors, 2006.

[Nar06c] Uma Narula. Handbook of Communication: Models, Perspectives, Strategies. Atlantic Publishers & Distributors, 2006.

[Nas05] Carlos M. Nash. “Cohesion and Reference in English Chatroom Discourse”. In: Hawaii International Conference on System Sciences 4 (2005), p. 108c.

[Nem+95] Evi Nemeth, Garth Snyder, Scott Seebass, and Trent R. Hein. UNIX System Administration Handbook. 2nd ed. Prentice Hall, 1995.

[Neu05] Terrell Neuage. “Conversational analysis of chatroom talk”. PhD thesis. Adelaide, Australia: University of South Australia, 2005.

[Neu02] Kimberly A. Neuendorf. The Content Analysis Guidebook. SAGE Publications, 2002.

[Ngu08] Long V. Nguyen. “Computer Mediated Communication and Foreign Language Education: Pedagogical Features”. In: International Journal of Instructional Technology and Distance Learning. Vol. 5. 12. 2008, pp. 23–44.

[Nis99] Helen Nissenbaum. “The Meaning of Anonymity in an Information Age”. In: The Information Society 15 (1999), pp. 141–144.

[Nis07] Helen Nissenbaum. “Will Security Enhance Trust Online, or Supplant It?” In: Trust and Distrust in Organizations: Dilemmas and Approaches. Ed. by Roderick M. Kramer and Karen S. Cook. The Russell Sage Foundation series on trust. Russell Sage Foundation Publications, 2007. Chap. 7, pp. 155–188.

[Nö95] Winfried Nöth. Handbook of Semiotics. Advances in Semiotics. Indiana University Press, 1995.

[O’C+10] Theresa O’Connell, John Grantham, Wyatt Wong, Kevin Workman, and Alexander Wang. “dint u say that: Digital Discourse, Digital Natives and Gameplay”. In: Journal of Virtual Worlds Research 3.1 (2010).

[Odo04] Wendell Odom. Computer Networking First-Step. First-Step. Cisco Press, 2004.

[Oik93] Jarkko Oikarinen. IRC History by Jarkko Oikarinen. 1993. URL: http://www.irc.org/history_docs/jarkko.html (visited on 05/06/2013).

[OR93] Jarkko Oikarinen and Darren Reed. RFC 1459: Internet Relay Chat Protocol. Network Working Group, 1993. URL: http://tools.ietf.org/html/rfc1459 (visited on 08/16/2012).

[Oku05] Yoshiko Okuyama. “Distance Language Learning via Synchronous Computer-Mediated Communication (SCMC): Eight Factors Affecting NS-NNS Chat Interaction”. In: JALT CALL Journal 1.2 (2005), pp. 3–20.

[OM03] Jacki O’Neill and David Martin. “Text Chat in Action”. In: Proceedings of the 2003 International ACM SIGGROUP Conference on Supporting Group Work (GROUP ’03). New York, NY, USA: ACM, 2003, pp. 40–49.

[Onl12] Online Etymology Dictionary. 2012. URL: http://www.etymonline.com/index.php?search=communication (visited on 09/20/2013).

[Onl13] Online Etymology Dictionary. 2013. URL: http://www.etymonline.com/index.php?term=mediator (visited on 11/17/2013).

[Onl17] Online Etymology Dictionary. 2017. URL: http://www.etymonline.com/index.php?term=analysis (visited on 04/01/2017).

[OA09] Angela Orebaugh and Jeremy E. Allnutt. “Data Mining Instant Messaging Communications to Perform Author Identification for Cybercrime Investigations”. In: ICDF2C. Ed. by Sanjay Goel. Vol. 31. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering. Springer, 2009, pp. 99–110.

[OT90] A. Ortony and Terence J. Turner. “What’s Basic About Basic Emotions?” In: Psychological Review 97.3 (1990), pp. 315–331.

[Oxf13] Oxford Dictionaries. hacktivist: definition of hacktivist in Oxford dictionary (British & World English). 2013. URL: http://www.oxforddictionaries.com/definition/english/hacktivist (visited on 11/23/2013).

[Pao99] John C. Paolillo. “The Virtual Speech Community: Social Network and Language Variation on IRC”. In: Journal of Computer-Mediated Communication. Vol. 4. 4. 1999.

[Par08] Craig Partridge. “The Technical Development of Internet Email”. In: IEEE Annals of the History of Computing 30.2 (2008), pp. 3–29.

[Pat07] B. V. Pathak. Industrial Psychology and Sociology. 5th ed. Nirali Prakashan, 2007.

[Pea+11] Judy C. Pearson, Paul E. Nelson, Scott Titsworth, and Lynn Harter. Human Communication. 4th ed. McGraw-Hill, 2011.

[PD12] Roxy Peck and Jay L. Devore. Statistics: The Exploration & Analysis of Data. 7th ed. Cengage Learning, 2012.

[Per+08] Manuel Perea, Jon Andoni Duñabeitia, and Manuel Carreiras. “R34D1NG W0RD5 W1TH NUMB3R5”. In: Journal of Experimental Psychology: Human Perception and Performance 34.1 (2008), pp. 237–241.

[Per08] Elizabeth M. Perse. Media effects and society. LEA’s communication series. Taylor & Francis e-Library, 2008.

[Pet00] John Durham Peters. Speaking into the Air: A History of the Idea of Communication. University of Chicago Press, 2000.

[PS04] John Durham Peters and Peter Simonson. Mass Communication and American Social Thought: Key Texts, 1919–1968. Critical Media Studies. Rowman & Littlefield Publishers, 2004.

[Pfe10] Ulrike Pfeil. Online Support Communities. Ed. by Panayiotis Zaphiris and Chee Siang Ang. Taylor & Francis, 2010. Chap. 6, pp. 121–150.

[Pid14] Pidgin Team. About Pidgin. 2014. URL: https://www.pidgin.im/about/ (visited on 01/04/2014).

[Pin13a] Pingdom. Internet 2012 in numbers. 2013. URL: http://royal.pingdom.com/2013/01/16/internet-2012-in-numbers/ (visited on 11/09/2013).

[Pin13b] Pingdom. IRC is dead, long live IRC. 2013. URL: http://royal.pingdom.com/2012/04/24/irc-is-dead-long-live-irc/ (visited on 11/23/2013).

[PHL10] Heidar Pirzadeh and Abdelwahab Hamou-Lhadj. “A View of Monitoring and Tracing Techniques and Their Application to Service-Based Environments”. In: Multimedia Services in Intelligent Environments: Software Development Challenges and Solutions. Ed. by George A. Tsihrintzis, Maria Virvou, and Lakhmi C. Jain. Vol. 2. Smart Innovation, Systems and Technologies. Springer, 2010. Chap. 4, pp. 49–62.

[Pla03] Ingo Plag. Word-Formation in English. Cambridge Textbooks in Linguistics. New York, NY, USA: Cambridge University Press, 2003.

[PW05] Michael Jay Polonsky and David S. Waller. Designing and Managing a Research Project: A Business Student’s Guide. SAGE Publications, 2005.

[Pom05] Jeffrey Pomerantz. “A Linguistic Analysis of Question Taxonomies”. In: Journal of the American Society for Information Science and Technology 56.7 (2005), pp. 715–728.

[Pop06] Miroslav Popovic. Communication Protocol Engineering. Taylor & Francis, 2006.

[Pos81] Jon Postel. RFC 791: Internet Protocol. Defense Advanced Research Projects Agency, 1981. URL: http://tools.ietf.org/html/rfc791 (visited on 04/06/2014).

[PR85] Jon Postel and Joyce Reynolds. RFC 959: File Transfer Protocol (FTP). Network Working Group, 1985. URL: http://tools.ietf.org/html/rfc959 (visited on 11/23/2013).

[Pow+04] Anne Powell, Gabriele Piccoli, and Blake Ives. “Virtual Teams: A Review of Current Literature and Directions for Future Research”. In: DATA BASE for Advances in Information Systems 35.1 (2004), pp. 6–36.

[PP10] Robert G. Powell and Dana L. Powell. Classroom Communication and Diversity: Enhancing Instructional Practice. 2nd ed. Routledge Communication Series. Taylor & Francis, 2010.

[Pow11] David M. W. Powers. “Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation”. In: Journal of Machine Learning Technologies 2.1 (2011), pp. 37–63.

[Pri14] Princeton University. WordNet Search - 3.1. Princeton, NJ, USA, 2014. URL: http://wordnetweb.princeton.edu/perl/webwn?s=content (visited on 02/11/2014).

[Pro11] Candice Proudfoot. “An analysis of the relationship between writing skills and “Short Messaging Service” Language: a self-regulatory perspective”. PhD thesis. Potchefstroom, South Africa: North-West University, 2011.

[Pta+11] Michal Ptaszynski, Rafal Rzepka, Kenji Araki, and Yoshio Momouchi. “Research on Emoticons: Review of the Field and Proposal of Research Framework”. In: Proceedings of the 17th Annual Meeting of the Association for Natural Language Processing (NLP-2011). The Association for Natural Language Processing, 2011, pp. 1159–1162.

[PT10] Sergey Pupyrev and Alexey Tikhonov. “Analyzing Conversations with Dynamic Graph Visualization”. In: Proceedings of 10th International Conference on Intelligent Systems Design and Applications (ISDA). IEEE, 2010, pp. 748–753.

[Pus10] Cornelius Puschmann. The corporate blog as an emerging genre of computer-mediated communication: features, constraints, discourse situation. Göttinger Schriften zur Internetforschung. Universitätsverlag Göttingen, 2010.

[Put00] Robert D. Putnam. Bowling Alone: The Collapse and Revival of American Community. Simon & Schuster Paperbacks, 2000.

[RS97] Sheizaf Rafaeli and Fay Sudweeks. “Networked Interactivity”. In: Journal of Computer-Mediated Communication 2.4 (1997).

[Rei91] Elizabeth M. Reid. Electropolis: Communication and Community on Internet Relay Chat. Honours Thesis. Melbourne, Australia: University of Melbourne, 1991.

[Ren04] Jan Renkema. Introduction to Discourse Studies. John Benjamins Publishing Company, 2004.

[Res08] Peter W. Resnick. RFC 5322: Internet Message Format. IETF Trust, 2008. URL: http://tools.ietf.org/html/rfc5322 (visited on 09/20/2013).

[Rhe00] Howard Rheingold. The Virtual Community: Homesteading on the Electronic Frontier. MIT Press, 2000.

[RR13] Jesse Rhoades and Rebecca Rhoades. “The Complexity of Online Discussion”. In: MERLOT Journal of Online Learning and Teaching 9.1 (2013).

[Ric+06] Peter Richter, Jelka Meyer, and Fanny Sommer. Well-being and Stress in Mobile and Virtual Work. Ed. by J. H. Erik Andriessen and Matti Vartiainen. Springer, 2006, pp. 231–252.

[Rin+06] Christoph Ringlstetter, Klaus U. Schulz, and Stoyan Mihov. “Orthographic Errors in Web Pages: Toward Cleaner Web Corpora”. In: Computational Linguistics 32.3 (2006), pp. 295–340.

[Rin07] Erik Ringmar. A Blogger’s Manifesto: Free Speech and Censorship in the Age of the Internet. Anthem Press, 2007.

[Rin+01] E. Sean Rintel, Joan Mulholland, and Jeffery Pittam. “First Things First: Internet Relay Chat Openings”. In: Journal of Computer-Mediated Communication 6.3 (2001).

[RR05] John W. Rittinghouse and James F. Ransome. IM Instant Messaging Security. Elsevier Digital Press, 2005.

[Riv02a] Giuseppe Riva. “Communicating in CMC: Making Order Out of Miscommunication”. In: Say Not to Say: New Perspectives on Miscommunication. Ed. by Luigi Anolli, Rita Ciceri, and Giuseppe Riva. Emerging Communication: Studies in New Technologies and Practices in Communication. IOS Press, 2002.

[Riv02b] Giuseppe Riva. “The Sociocognitive Psychology of Computer-Mediated Communication: The Present and Future of Technology-Based Interactions”. In: CyberPsychology & Behavior 5.6 (2002), pp. 581–598.

[RP99] Lynne D. Roberts and Malcolm R. Parks. “The Social Geography of Gender-Switching in Virtual Environments on the Internet”. In: Information, Communication and Society 2.4 (1999), pp. 521–540.

[Rod00] M. V. Rodriques. Perspectives of Communication and Communicative Competence. Concept Publishing Company, 2000.

[Roe09] Roeckx. DarNET Wiki: The 005 Numeric Explained. 2009. URL: http://wiki.darenet.org/The_005_Numeric_Explained (visited on 03/03/2013).

[RB16] Anna Rogala and Sylwester Bialowas. Communication in Organizational Environments: Functions, Determinants and Areas of Influence. Palgrave Macmillan, 2016.

[RK81] Everett M. Rogers and D. Lawrence Kincaid. Communication Networks: Toward a New Paradigm for Research. The Free Press, 1981.

[Rua11] Li Ruan. “Meaningful Signs - Emoticons”. In: Theory and Practice in Language Studies 1.1 (2011), pp. 91–94.

[Rud97] Joseph Rudman. “The State of Authorship Attribution Studies: Some Problems and Solutions”. In: Computers and the Humanities 31 (1997), pp. 351–365.

[Rüg07] Sabine Rüggenberg. “So nah und doch so fern. Soziale Präsenz und Vertrauen in der computervermittelten Kommunikation”. Dissertation. Cologne, Germany: University of Cologne, 2007.

[SM07] Larry A. Samovar and Edwin R. McDaniel. Public Speaking in a Multicultural Society: The Essentials. Roxbury Publishing Company, 2007.

[Sau66] Ferdinand de Saussure. Course in General Linguistics. Ed. by Charles Bally and Albert Sechehaye. McGraw-Hill Book Company, 1966.

[SS73] Emanuel A. Schegloff and Harvey Sacks. “Opening Up Closings”. In: Semiotica 8.4 (1973), pp. 289–327.

[SR93] Jorge R. Schement and Brent D. Ruben. Between Communication and Information. Vol. 4. Information & Behavior. Transaction Publishers, 1993.

[Sch+09] Gary P. Schneider, Jessica Evans, and Katherine T. Pinard. The Internet. 5th ed. Available Titles Skills Assessment Manager (SAM) - Office 2010 Series. Course Technology, Cengage Learning, 2009.

[Sch12] Margrit Schreier. Qualitative Content Analysis in Practice. SAGE Publications, 2012.

[Sch99] Markus Schulze. “Substitution of Paraverbal and Nonverbal Cues in the Written Medium of IRC”. In: Dialogue Analysis and the Mass Media. Proceedings of the International Conference, Erlangen, April 2–3, 1998. Naumann, Bernd, 1999, pp. 65–82.

[ST08] David G. Schwartz and Dov Te’eni. “The Impact of Computer-Mediated Communication on Knowledge Transfer and Organizational Form”. In: Knowledge Management: An Evolutionary View. Ed. by Irma Becerra-Fernandez and Dorothy E. Leidner. Vol. 12. Advances in Management Information Systems (AMIS). M.E. Sharpe, 2008. Chap. 8, pp. 145–162.

[Sea+04] Clive Seale, Giampietro Gobo, Jaber F. Gubrium, and David Silverman. Qualitative Research Practice. International advisory board. SAGE Publications, 2004.

[Seg02] Ylva Hård af Segerstad. “Use and Adaptation of Written Language to the Conditions of Computer-Mediated Communication”. Doctoral Dissertation. Göteborg, Sweden: Göteborg University, 2002.

[Sen11] Sailesh Sengupta. Business and Managerial Communication. PHI Learning, 2011.

[Seu16] Pieter A. M. Seuren. “Saussure and his intellectual environment”. In: History of European Ideas 42.6 (2016), pp. 819–847.

[SE10] Kerstin Severinson Eklundh. “To Quote or Not to Quote: Setting the Context for Computer-Mediated Dialogues”. In: Language@Internet 7.5 (2010).

[Sha48] Claude E. Shannon. “A Mathematical Theory of Communication”. In: Bell System Technical Journal 27 (1948), pp. 379–423, 623–656.

[SA85] Norman Z. Shapiro and Robert H. Anderson. Toward an Ethics and Etiquette for Electronic Mail. Santa Monica, CA, USA: Rand Corporation, 1985.

[She94] Virginia Shea. Netiquette. Albion Books, 1994.

[SV08] Gary B. Shelly and Misty E. Vermaat. Discovering Computers: Fundamentals. 5th ed. Shelly Cashman Series. Course Technology, Cengage Learning, 2008.

[She+06] Dou Shen, Qiang Yang, Jian-Tao Sun, and Zheng Chen. “Thread Detection in Dynamic Text Message Streams”. In: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR ’06. New York, NY, USA: ACM, 2006, pp. 35–42.

[Shi+06] Shufang Shi, Punya Mishra, Curtis J. Bonk, Sophia Tan, and Yong Zhao. “Thread Theory: A Framework Applied to Content Analysis of Synchronous Computer Mediated Communication Data”. In: International Journal of Instructional Technology and Distance Learning 3.3 (2006), pp. 19–38.

[Shi02] Debra Littlejohn Shinder. Computer Networking Essentials - An essential guide to understanding networking theory, implementation, and interoperability. Cisco Press Core Series. Cisco Press, 2002.

[Sim02] James Simpson. “Discourse and synchronous computer-mediated communication: uniting speaking and writing”. In: Unity and Diversity in Language Use. Ed. by Kristyan Spelman Miller and Paul Thompson. British Studies in Applied Linguistics Series. Continuum, 2002.

[Sim03] James Simpson. “The discourse of computer-mediated communication: A study of an online community”. PhD thesis. Reading, England: University of Reading, School of Linguistics and Applied Language Studies, 2003.

[Sim05] James Simpson. “Conversational floors in synchronous text-based CMC discourse”. In: Discourse Studies 7.3 (2005), pp. 337–361.

[Smi+00] Marc Smith, J. J. Cadiz, and Byron Burkhalter. “Conversation Trees and Threaded Chats”. In: Proceedings of the ACM conference on Computer Supported Cooperative Work (CSCW ’00). New York, NY, USA: ACM, 2000, pp. 97–105.

[Smi99] Marc A. Smith. “Invisible Crowds in Cyberspace: Mapping the Social Structure of the Usenet”. In: Communities in Cyberspace. Ed. by Marc A. Smith and Peter Kollock. Routledge, 1999, pp. 195–218.

[Sou00] Charles Soukup. “Building a theory of multi-media CMC: An analysis, critique and integration of computer-mediated communication theory and research”. In: New Media & Society 2.4 (2000), pp. 407–425.

[SJ93] Michael C. St. Johns. RFC 1413: Identification Protocol. Network Working Group, 1993. URL: http://tools.ietf.org/html/rfc1413 (visited on 06/05/2014).

[Ste06] Sheila Steinberg. Introduction to Communication - Course Book 1: The Basics. The Course Book Series. Juta & Co., 2006.

[Ste88] William Stephenson. The Play Theory of Mass Communication. Transaction Books, 1988.

[Ste12] Janet Sternberg. Misbehavior in Cyber Places: The Regulation of Online Conduct in Virtual Communities on the Internet. University Press of America, 2012.

[Ste00a] Jon Stevenson. Language Data Investigation. The Language of Internet Chat Rooms. 2000.

[Ste00b] William Stewart. Living Internet. 2000. URL: http://www.livinginternet.com/ (visited on 05/12/2013).

[Sto07] Wyke Stommel. “Mein Nick bin ich! Nicknames in a German Forum on Eating Disorders”. In: Computer-Mediated Communication 13.1 (2007).

[Str+06] Carlo Strapparava, Alessandro Valitutti, and Oliviero Stock. “The Affective Weight of Lexicon”. In: Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC-2006). 2006, pp. 423–426.

[Str+07] H. I. Strømsø, P. Grøttum, and K. H. Lycke. “Content and processes in problem-based learning: a comparison of computer-mediated and face-to-face communication”. In: Journal of Computer Assisted Learning 23.3 (2007), pp. 271–282.

[Sun10] Hong-mei Sun. “A Study of the Features of Internet English from the Linguistic Perspective”. In: Studies in Literature and Language 1.7 (2010), pp. 98–103.

[Swi14] SwiftIRC. SwiftIRC Wiki. 2014. URL: http://wiki.swiftirc.net/index.php?title=Hosts (visited on 04/13/2014).

[Tag09] Caroline Tagg. “A corpus linguistics study of SMS text messaging”. PhD thesis. Birmingham, England: University of Birmingham, 2009.

[Tan03] Martin Tanis. “Cues to Identity in CMC: The Impact on Person Perception and Subsequent Interaction Outcomes”. PhD thesis. Amsterdam, Netherlands: University of Amsterdam, 2003, pp. 1–146.

[TP03] Martin Tanis and Tom Postmes. “Social Cues and Impression Formation in CMC”. In: Journal of Communication 53.4 (2003), pp. 676–693.

[Tav07] Mirko Tavosanis. “A Causal Classification of Orthography Errors in Web Texts”. In: IJCAI-2007: Workshop on Analytics for Noisy Unstructured Text Data (2007), pp. 99–106.

[Tay09] Charlotte Taylor. “’Laughter’ in L2 Computer-Mediated Discourse”. In: Advances in Discourse Approaches. Ed. by Marta Dynel. Cambridge Scholars Publishing, 2009. Chap. 9, pp. 174–200.

[Ten99] Jenifer Tennison. “Living Ontologies: Collaborative Knowledge Structuring on the Internet”. PhD thesis. Nottingham, England: University of Nottingham, 1999.

[Teš92] Marie Tešitelová. Quantitative Linguistics. Vol. 37. Linguistic & literary studies in Eastern Europe (LSEE). John Benjamins Publishing Company, 1992.

[Thi97] Paul J. Thibault. Re-reading Saussure: The dynamics of signs in social life. Routledge, 1997.

[Thu13] Friedemann Schulz von Thun. Das Kommunikationsquadrat. 2013. URL: http://www.schulz-von-thun.de/index.php?article_id=71 (visited on 09/14/2013).

[Thu17] Friedemann Schulz von Thun. Das Kommunikationsquadrat. 2017. URL: http://www.schulz-von-thun.de/index.php?article_id=71 (visited on 03/07/2017).

[Thu+04] Crispin Thurlow, Laura Lengel, and Alice Tomic. Computer Mediated Communication: Social Interaction and the Internet. SAGE Publications, 2004, p. 256.

[Tit+00] Stefan Titscher, Michael Meyer, Ruth Wodak, and Eva Vetter. Methods of Text and Discourse Analysis: In Search of Meaning. SAGE Publications, 2000.

[Tod16] Jeff Todnem. The Password Meter. 2016. URL: http://www.passwordmeter.com/ (visited on 07/21/2016).

[TM00] Kristina Toutanova and Christopher D. Manning. “Enriching the knowledge sources used in a maximum entropy part-of-speech tagger”. In: Proceedings of the 2000 Joint SIGDAT conference on EMNLP/VLC. Morristown, NJ: Association for Computational Linguistics, 2000.

[Tra07] Robert L. Trask. Language and Linguistics: The Key Concepts. Ed. by Peter Stockwell. 2nd ed. Routledge, 2007.

[Tru04] Craig W. Trumbo. “Research Methods in Mass Communication Research: A Census of Eight Journals 1990–2000”. In: Journalism and Mass Communication Quarterly 81.2 (2004), pp. 417–436.

[TC78] Stewart L. Tubbs and Robert M. Carter. Shared Experiences in Human Communication. Hayden Book Company, 1978.

[Tur94] Sherry Turkle. “Constructions and Reconstructions of Self in Virtual Reality:Playing in the MUDs”. In: Mind, Culture, and Activity 1.3 (1994), pp. 158–167.

[Twi14] Twitter. Twitter Help Center | FAQs about Trends on Twitter. 2014. URL: https://support.twitter.com/articles/101125-faqs-about-trends-on-twitter (visited on 12/30/2014).

[TM11] Kavita Tyagi and Padma Misra. Professional Communication. PHI Learning, 2011.

[Und00] Undernet Coder Committee. Undernet P10 Protocol and Interface Specification. 2000. URL: http://web.mit.edu/klmitch/Sipb/devel/src/ircu2.10.11/doc/p10.html (visited on 01/21/2014).

[Urb14a] Urban Dictionary. Urban Dictionary: :) 2014. URL: http://www.urbandictionary.com/define.php?term=:) (visited on 10/13/2014).

[Urb14b] Urban Dictionary. Urban Dictionary: ˆ. 2014. URL: http://www.urbandictionary.com/define.php?term=^ (visited on 10/13/2014).

[Urb14c] Urban Dictionary. Urban Dictionary: :( 2014. URL: http://www.urbandictionary.com/define.php?term=:( (visited on 10/13/2014).

[Urb14d] Urban Dictionary. Urban Dictionary: :/. 2014. URL: http://www.urbandictionary.com/define.php?term=:/ (visited on 10/13/2014).

[Urb14e] Urban Dictionary. Urban Dictionary: <3. 2014. URL: http://www.urbandictionary.com/define.php?term=<3 (visited on 10/13/2014).

[Urb14f] Urban Dictionary. Urban Dictionary: cluebie. 2014. URL: http://www.urbandictionary.com/define.php?term=cluebie (visited on 05/07/2014).

[Urb14g] Urban Dictionary. Urban Dictionary: :D. 2014. URL: http://www.urbandictionary.com/define.php?term=:D (visited on 10/13/2014).

[Urb14h] Urban Dictionary. Urban Dictionary: flooder. 2014. URL: http : / / www .urbandictionary.com/define.php?term=flooder (visited on 11/02/2014).

[Urb14i] Urban Dictionary. Urban Dictionary: guru. 2014. URL: http : / / www .urbandictionary.com/define.php?term=guru (visited on 05/07/2014).

[Urb14j] Urban Dictionary. Urban Dictionary: newbie. 2014. URL: http : / / www .urbandictionary.com/define.php?term=newbie (visited on 05/07/2014).

[Urb14k] Urban Dictionary. Urban Dictionary: Noob. 2014. URL: http : / / www .urbandictionary.com/define.php?term=Noob (visited on 05/07/2014).

[Urb14l] Urban Dictionary. Urban Dictionary: :o. 2014. URL: http : / / www .urbandictionary.com/define.php?term=:o (visited on 10/13/2014).

[Urb14m] Urban Dictionary. Urban Dictionary: oldbie. 2014. URL: http : / / www .urbandictionary.com/define.php?term=oldbie (visited on 05/07/2014).

[Urb14n] Urban Dictionary. Urban Dictionary: :P. 2014. URL: http : / / www .urbandictionary.com/define.php?term=:P (visited on 10/13/2014).

[Urb14o] Urban Dictionary. Urban Dictionary: xD. 2014. URL: http : / / www .urbandictionary.com/define.php?term=XD (visited on 10/13/2014).

[Van08] Teun A. Van Dijk. Discourse and Context: A Sociocognitive Approach. CambridgeUniversity Press, 2008.

[VER05] H. A. Van Essen and A. F. Rovers. “Layered Protocols Approach to AnalyzeHaptic Communication over a Network”. In: Proceedings of the First Joint Euro-haptics Conference and Symposium on Haptic Interfaces for Virtual Environmentand Teleoperator Systems (WHC ’05). Washington, DC, USA: IEEE ComputerSociety, 2005, pp. 30–39.

[VG06] Kate M. Van Gass. “’Wat sê jy?’ The linguistic characteristics of Afrikaans onIRC”. In: Stellenbosch Papers in Linguistics PLUS 33 (2006), pp. 69–130.

[VG08] Kate M. Van Gass. “Language contact in computer-mediated communication:Afrikaans-English code switching on internet relay chat (IRC)”. In: SouthernAfrican Linguistics and Applied Language Studies 26.4 (2008), pp. 429–444.

[Ven80] John Venn. “On the Diagrammatic and Mechanical Representation of Proposi-tions and Reasonings”. In: Philosophical Magazine and Journal of Science. 5th ser.10 (1880), pp. 1–18.

[Ver+07] Rudolph F. Verderber, Kathleen S. Verderber, and Cynthia Berryman-Fink.Communicate! 12th ed. Thomson/Wadsworth, 2007.


[Ver14] Lieke Verheijen. “Out-of-the-ordinary orthography: the use of textisms in Dutch youngsters’ written computer-mediated communication”. In: Proceedings of the second Postgraduate and Academic Researchers in Linguistics at York (PARLAY 2014) conference. Ed. by Verónica González Temer, Jelena Horvatic, David O’Reilly, and Aiqing Wang. 1. The Postgraduate Academic Researchers in Linguistics at York (PARLAY), 2014, pp. 127–142.

[Ver99] Sameer Verma. “Diffusion and Adoption of Multicasting: Role of Implicit versus Explicit Communication Initiation Methods”. PhD thesis. Atlanta, GA, USA: Georgia State University, 1999.

[VS04] Fernanda B. Viégas and Marc Smith. “Newsgroup Crowds and AuthorLines: Visualizing the Activity of Individuals in Conversational Cyberspaces”. In: Proceedings of the 37th Hawaii International Conference on System Sciences (HICSS ’04). Washington, DC, USA: IEEE Computer Society, 2004.

[Vi04] Fernanda B. Viégas, Martin Wattenberg, and Kushal Dave. “Studying Cooperation and Conflict between Authors with history flow Visualizations”. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI 2004). Ed. by Elizabeth Dykstra-Erickson and Manfred Tscheligi. New York, NY, USA: ACM, 2004, pp. 575–582.

[Vi07] Fernanda B. Viégas, Martin Wattenberg, Jesse Kriss, and Frank van Ham. “Talk Before You Type: Coordination in Wikipedia”. In: Proceedings of the 40th Annual Hawaii International Conference on System Sciences (HICSS ’07). IEEE Computer Society, 2007.

[Vi13] Fernanda B. Viégas, Martin Wattenberg, Jack Hebert, Geoffrey Borggaard, Alison Cichowlas, Jonathan Feinberg, Jon Orwant, and Christopher R. Wren. “Google+ Ripples: A Native Visualization of Information Flow”. In: Proceedings of the IW3C2 WWW 2013 Conference. Ed. by Daniel Schwabe, Virgilio Almeida, Hartmut Glaser, Ricardo Baeza-Yates, and Sue Moon. Rio de Janeiro, Brazil, 2013, pp. 1389–1398.

[Viv06] John Vivian. The Media of Mass Communication. 8th ed. Allyn & Bacon, 2006.

[Von12] Tjerk Vonck. mIRC: IRC Frequently Asked Questions. 2012. URL: http://www.mirc.com/ircintro.html (visited on 03/03/2013).

[WF74] Robert A. Wagner and Michael J. Fischer. “The String-to-String Correction Problem”. In: Journal of the Association for Computing Machinery 21.1 (1974), pp. 168–173.

[WE08] Feihong Wang and Michael A. Evans. “Identity Transformation in Real and Virtual Worlds: The Overlapping World View”. In: International Conference on Cyberworlds 2008. IEEE Computer Society, 2008, pp. 81–85.

[Wat+07] Martin Wattenberg, Fernanda B. Viégas, and Katherine Hollenbach. “Visualizing Activity on Wikipedia with Chromograms”. In: Proceedings of the 11th IFIP TC 13 International Conference on Human-computer Interaction - Volume Part II (INTERACT ’07). Springer, 2007, pp. 272–287.

[Wat+67] Paul Watzlawick, Janet Helmick Beavin, and Don D. Jackson. Pragmatics of Human Communication: A Study of Interactional Patterns, Pathologies, and Paradoxes. Palo Alto, CA, USA: W. W. Norton & Company, 1967.


[Wei+15] Tim Weilkiens, Jesko G. Lamm, Stephan Roth, and Markus Walker. Model-Based System Architecture. Ed. by Andrew P. Sage. Wiley Series in Systems Engineering and Management. John Wiley & Sons, 2015.

[Wer96] Christopher C. Werry. “Linguistic and Interactional Features of Internet Relay Chat”. In: Computer-Mediated Communication: Linguistic, Social and Cross-Cultural Perspectives. Ed. by Susan C. Herring. Amsterdam, Netherlands: John Benjamins Publishing Company, 1996.

[WT11] Richard West and Lynn H. Turner. Understanding Interpersonal Communication: Making Choices in Changing Times. Cengage Learning, 2011.

[WM57] Bruce H. Westley and Malcolm S. MacLean. A Conceptual Model for Communications Research. Vol. 34. 1. 1957, pp. 31–38.

[Wet+01] Margaret Wetherell, Stephanie Taylor, and Simeon J. Yates. Discourse as Data: A Guide for Analysis. SAGE Publications, 2001.

[Wev+04] Peter Weverka, Tony Bove, Mark L. Chambers, Marsha Collier, Brad Hill, John R. Levine, Margaret Levine Young, Doug Lowe, Camille McCue, Deborah S. Ray, Eric J. Ray, and Cheryl Rhodes. The Internet GigaBook for Dummies. Wiley Publishing, 2004.

[Wik13a] WikiLeaks. About - What is Wikileaks ? 2013. URL: http://wikileaks.org/About.html (visited on 11/01/2013).

[Wik13] Wikipedia. Usenet - Wikipedia, the free encyclopedia.htm. 2013. URL: http://en.wikipedia.org/wiki/Usenet (visited on 12/13/2013).

[Wik13b] Wikipedia. User:ClueBot NG. 2013. URL: http://en.wikipedia.org/wiki/User:ClueBot_NG (visited on 11/03/2013).

[Win+09] Sven Windahl, Benno H. Signitzer, and Jean T. Olson. Using Communication Theory: An Introduction to Planned Communication. 2nd ed. SAGE Publications, 2009.

[Wit04] Paul L. Witt. “Internet Relay Chat (IRC)”. In: The Internet Encyclopedia: Volume 2. Ed. by Hossein Bidgoli. John Wiley & Sons, 2004, pp. 311–319.

[WM10] James C. Witte and Susan E. Mannon. The Internet and Social Inequalities. Contemporary Sociological Perspectives. Taylor & Francis, 2010.

[Woe92] Joseph Woelfel. “Principles of Communication”. Dissertation. Buffalo, NY, USA: Department of Communication, 1992.

[WS05] Andrew F. Wood and Matthew J. Smith. Online Communication: Linking Technology, Identity, and Culture. 2nd ed. LEA’s communication Series. Lawrence Erlbaum Associates, 2005.

[Woo12] Julia T. Wood. Communication in Our Lives. 6th ed. Cengage Learning, 2012.

[WK00] Linda A. Wood and Rolf O. Kroger. Doing Discourse Analysis: Methods for Studying Action in Talk and Text. SAGE Publications, 2000.

[WT07] Dan Woods and Peter Thoeny. Wikis for Dummies. Wiley Publishing, 2007.

[Woo05] Robin Wooffitt. Conversation Analysis & Discourse Analysis: A Comparative and Critical Introduction. SAGE Publications, 2005.


[Wor+07] David W. Worley, Debra A. Worley, and Laura B. Soldner. Communication Counts: Getting It Right in College and Life. Allyn & Bacon, 2007.

[WM11] Kevin B. Wright and Ahlam Muhtaseb. “Personal Relationships and Computer-Mediated Support Groups”. In: Computer-Mediated Communication in Personal Relationships. Ed. by Kevin B. Wright and Lynne M. Webb. Peter Lang Publishing, 2011. Chap. 8, pp. 137–155.

[XD99] Rebecca Xiong and Judith Donath. “PeopleGarden: Creating Data Portraits for Users”. In: Proceedings of the 12th Annual ACM Symposium on User Interface Software and Technology (UIST ’99). New York, NY, USA: ACM, 1999, pp. 37–44.

[Yar06] Majid Yar. Cybercrime and Society. SAGE Publications, 2006.

[You+07] Gerald Young, Andrew W. Kane, and Keith Nicholson. Causality of Psychological Injury: Presenting Evidence in Court. Springer, 2007.

[Zas01] Charles Zastrow. Social Work with Groups: Using the Class as a Group Leadership Laboratory. 5th ed. Social Work Series. Brooks/Cole, 2001.

[Zhe+06] Rong Zheng, Jiexun Li, Hsinchun Chen, and Zan Huang. “A Framework for Authorship Identification of online Messages: Writing-Style Features and Classification Techniques”. In: Journal of the American Society for Information Science and Technology 57.3 (2006), pp. 378–393.

[Zit13] Mimouna Zitouni. “Is English There? : Investigating Language Use among Young Algerian Users of Internet”. PhD thesis. Oran, Algeria: University of Oran, 2013.

[Zit04] Michaela Zitzen. “Topic Shift Markers in asynchronous and synchronous Computer-mediated Communication (CMC)”. Dissertation. Heinrich-Heine-Universität Düsseldorf, 2004.


Appendix


CHAPTER A The Penn Treebank POS Tagset

Table A.1: The Penn Treebank POS Tagset (adapted from [Mar+94])

Tag    Description
CC     coordinating conjunction
CD     cardinal number
DT     determiner
EX     existential there
FW     foreign word
IN     preposition or subordinating conjunction
JJ     adjective
JJR    adjective, comparative
JJS    adjective, superlative
LS     list item marker
MD     modal
NN     noun, singular or mass
NNS    noun, plural
NNP    proper noun, singular
NNPS   proper noun, plural
PDT    predeterminer
POS    possessive ending
PRP    personal pronoun
PRP$   possessive pronoun
RB     adverb
RBR    adverb, comparative
RBS    adverb, superlative
RP     particle
SYM    symbol (mathematical or scientific)
TO     to
UH     interjection
VB     verb, base form
VBD    verb, past tense
VBG    verb, gerund or present participle
VBN    verb, past participle
VBP    verb, non-3rd person singular present
VBZ    verb, 3rd person singular present
WDT    wh-determiner
WP     wh-pronoun
WP$    possessive wh-pronoun
WRB    wh-adverb


CHAPTER B IRC message format

B.1 “Pseudo” BNF

Table B.1: IRC message format in “pseudo” BNF

<message>    ::= [':' <prefix> <SPACE> ] <command> <params> <crlf>
<prefix>     ::= <servername> | <nick> [ '!' <user> ] [ '@' <host> ]
<command>    ::= <letter> { <letter> } | <number> <number> <number>
<SPACE>      ::= ' ' { ' ' }
<params>     ::= <SPACE> [ ':' <trailing> | <middle> <params> ]
<middle>     ::= <Any *non-empty* sequence of octets not including SPACE
                 or NUL or CR or LF, the first of which may not be ':'>
<trailing>   ::= <Any, possibly *empty*, sequence of octets not including
                 NUL or CR or LF>
<crlf>       ::= CR LF

<target>     ::= <to> [ "," <target> ]
<to>         ::= <channel> | <user> '@' <servername> | <nick> | <mask>
<channel>    ::= ('#' | '&') <chstring>
<servername> ::= <host>
<host>       ::= see RFC 952 [DNS:4] for details on allowed hostnames
<nick>       ::= <letter> { <letter> | <number> | <special> }
<mask>       ::= ('#' | '$') <chstring>
<chstring>   ::= <any 8bit code except SPACE, BELL, NUL, CR, LF and comma (',')>

<user>       ::= <nonwhite> { <nonwhite> }
<letter>     ::= 'a' ... 'z' | 'A' ... 'Z'
<number>     ::= '0' ... '9'
<special>    ::= '-' | '[' | ']' | '\' | '`' | '^' | '{' | '}'
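The <message> rule of Table B.1 can be illustrated with a small parser. The following Python sketch is not taken from the thesis tool; the names IrcMessage and parse_message are hypothetical, and the code only splits a line into prefix, command, and parameters without validating the character sets of <nick> or <host>.

```python
from typing import List, NamedTuple, Optional


class IrcMessage(NamedTuple):
    """Hypothetical container for the three parts of <message>."""
    prefix: Optional[str]
    command: str
    params: List[str]


def parse_message(line: str) -> IrcMessage:
    """Split one IRC line into prefix, command, and parameters."""
    line = line.rstrip("\r\n")            # drop the trailing <crlf>
    prefix = None
    if line.startswith(":"):              # optional ':' <prefix> <SPACE>
        prefix, _, line = line[1:].partition(" ")
    line = line.lstrip(" ")               # <SPACE> may consist of several blanks
    command, _, rest = line.partition(" ")
    params: List[str] = []
    rest = rest.lstrip(" ")
    while rest:
        if rest.startswith(":"):          # ':' <trailing> takes the remainder
            params.append(rest[1:])
            break
        middle, _, rest = rest.partition(" ")
        params.append(middle)             # one <middle> parameter
        rest = rest.lstrip(" ")
    return IrcMessage(prefix, command, params)
```

For a line such as ":guru!robert@example.org PRIVMSG #english :hello there", the sketch yields the sender prefix, the command PRIVMSG, and the two parameters "#english" and "hello there" (the nickname and host are made-up illustration values).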


B.2 Augmented BNF (ABNF)

Table B.2: IRC message format in Augmented BNF (ABNF)

message    =  [ ":" prefix SPACE ] command [ params ] crlf
prefix     =  servername / ( nickname [ [ "!" user ] "@" host ] )
command    =  1*letter / 3digit
params     =  *14( SPACE middle ) [ SPACE ":" trailing ]
           =/ 14( SPACE middle ) [ SPACE [ ":" ] trailing ]
nospcrlfcl =  %x01-09 / %x0B-0C / %x0E-1F / %x21-39 / %x3B-FF
                ; any octet except NUL, CR, LF, " " and ":"
middle     =  nospcrlfcl *( ":" / nospcrlfcl )
trailing   =  *( ":" / " " / nospcrlfcl )
SPACE      =  %x20        ; space character
crlf       =  %x0D %x0A   ; "carriage return" "linefeed"

target     =  nickname / server
msgtarget  =  msgto *( "," msgto )
msgto      =  channel / ( user [ "%" host ] "@" servername )
msgto      =/ ( user "%" host ) / targetmask
msgto      =/ nickname / ( nickname "!" user "@" host )
channel    =  ( "#" / "+" / ( "!" channelid ) / "&" ) chanstring [ ":" chanstring ]
servername =  hostname
host       =  hostname / hostaddr
hostname   =  shortname *( "." shortname )
shortname  =  ( letter / digit ) *( letter / digit / "-" ) *( letter / digit )
                ; as specified in RFC 1123 [HNAME]
hostaddr   =  ip4addr / ip6addr
ip4addr    =  1*3digit "." 1*3digit "." 1*3digit "." 1*3digit
ip6addr    =  1*hexdigit 7( ":" 1*hexdigit )
ip6addr    =/ "0:0:0:0:0:" ( "0" / "FFFF" ) ":" ip4addr
nickname   =  ( letter / special ) *8( letter / digit / special / "-" )
targetmask =  ( "$" / "#" ) mask
chanstring =  %x01-07 / %x08-09 / %x0B-0C / %x0E-1F / %x21-2B
chanstring =/ %x2D-39 / %x3B-FF
                ; any octet except NUL, BELL, CR, LF, " ", "," and ":"
channelid  =  5( %x41-5A / digit )   ; 5( A-Z / 0-9 )

user       =  1*( %x01-09 / %x0B-0C / %x0E-1F / %x21-3F / %x41-FF )
                ; any octet except NUL, CR, LF, " " and "@"

key        =  1*23( %x01-05 / %x07-08 / %x0C / %x0E-1F / %x21-7F )
                ; any 7-bit US_ASCII character, except NUL, CR, LF, FF, h/v TABs, and " "

letter     =  %x41-5A / %x61-7A   ; A-Z / a-z
digit      =  %x30-39             ; 0-9
hexdigit   =  digit / "A" / "B" / "C" / "D" / "E" / "F"
special    =  %x5B-60 / %x7B-7D   ; "[", "]", "\", "`", "_", "^", "{", "|", "}"
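As an illustration of how a single ABNF rule translates into code, the nickname rule of Table B.2 can be written as a regular expression. This is a minimal sketch; the function name is_valid_nickname is hypothetical, and real IRC networks often allow longer nicknames than the nine characters the rule permits.

```python
import re

# special = %x5B-60 / %x7B-7D, written as character-class ranges
SPECIAL = r"\x5B-\x60\x7B-\x7D"

# nickname = ( letter / special ) *8( letter / digit / special / "-" )
NICKNAME = re.compile(rf"[A-Za-z{SPECIAL}][A-Za-z0-9{SPECIAL}\-]{{0,8}}")


def is_valid_nickname(nick: str) -> bool:
    """Return True if nick matches the nickname rule of Table B.2."""
    return NICKNAME.fullmatch(nick) is not None
```

Under this rule, a nickname may start with a letter or a special character such as "[", but not with a digit or a hyphen.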


CHAPTER C Detailed hourly usage (CET) per channel

Table C.1: Detailed hourly usage (CET) per channel

Hour  #cars  #defocus  #england  #English  #freenode  ##hardware  #irchelp  #music  #perl1  #Romance  #soccer
   0      5      1725       144      5061       1504        1276       655      62     327      4128      285
   1      7      1807       124      4030       1159        1051       760      49     285      3374      314
   2      4      1134        16      3855       1204        1456       687     134     180      2264       66
   3     51      1599        25      3872       1159        1283       759      46     170      2352       59
   4      5      1429        24      3006       1087        1196       595     118     243      2755       46
   5      2      1170        13      2809        980        1395       379      79     123      3003       43
   6      3      1502        15      2394        771        1179       614      47     101      2727       22
   7      1      1260        20      1568        704        1052       526      43      88      3135       34
   8      0       927        59      1908        692        1039       517      23     122      2795       20
   9      1       758        64      2420        985         991       179      16     193      2241       41
  10      2      1069       154      3190        835        1276       277      85     117      1343       22
  11      5      1500       494      2998        660        1162       201      46     307      1290       16
  12      2      1252       178      3103        857         585       188      72     142      1814       55
  13      5      1346       169      3755       1096         503       273      43     321      1801      119
  14      3      1417       180      3244        894         803       310      56     456      1626      112
  15      2      1724       313      4739       1304         699       483      34     572      1773       68
  16      1      1379       303      4570        871         931       573      45     429      2808      141
  17      1      1188       101      4951       1154        1368       763      35     372      2674      158
  18      0      1189       172      5790       1205        1430       609      87     329      2312      482
  19      5      1242       105      5169       1177        1232      1022      92     301      3296      266
  20      7      1064       117      6063        966         952       798      72     331      3978      286
  21     12      1530       328      6176       1180        1220      1023      62     554      3877      532
  22      7      1501       198      5553       1233        1613      1218      29     497      3548      348
  23      5      1337       388      5773       1590        1347       648      57     357      4083      229
   ∑    136     32049      3704     95997      25267       27039     14057    1432    6917     64997     3764


CHAPTER D IRC commands

D.1 IRC server commands

Table D.1: IRC server commands (adapted from [Kal00c; Kal00d; OR93])

Command   Parameter and description

ADMIN     1459: [<server>]
          2812: [ <target> ]
          Provides information about the administrator of the current or specified server.

AWAY      1459: [<message>]
          2812: [ <text> ]
          Leaves a message indicating that the user is not currently paying attention to IRC. If the message is omitted, the away status is removed.

CONNECT   1459: <target server> [<port> [<remote server>]]
          2812: <target server> <port> [ <remote server> ]
          Instructs the current or remote server to connect to <target server> on <port> (only for operators).

DIE       2812:
          Instructs the server to shut down (only for operators).

ERROR     1459: <error message>
          2812: <error message>
          Used by servers to report errors to other servers or before terminating client connections.

INFO      1459: [<server>]
          2812: [ <target> ]
          Returns information about the current or specified server.

INVITE    1459: <nickname> <channel>
          2812: <nickname> <channel>
          Sends user an invitation to join a particular channel.

ISON      1459: <nickname>{<space><nickname>}
          2812: <nickname> *( SPACE <nickname> )
          The server returns the users who are currently online on the network in a space-separated list.

JOIN      1459: <channel>{,<channel>} [<key>{,<key>}]
          2812: ( <channel> *( "," <channel> ) [ <key> *( "," <key> ) ] ) / "0"
          2813: <channel>[ %x7 <modes> ] *( "," <channel>[ %x7 <modes> ] )
          Joins the specified channels (if needed with passwords) in a comma-separated list.

KICK      1459: <channel> <user> [<comment>]
          2812: <channel> *( "," <channel> ) <user> *( "," <user> ) [<comment>]
          Removes (kicks) a user from a channel with optional comment (only for channel operators).

KILL      1459: <nickname> <comment>
          2812: <nickname> <comment>
          Forcibly removes user from the network with an optional comment (only for IRC operators).

LINKS     1459: [[<remote server>] <server mask>]
          2812: [ [ <remote server> ] <server mask> ]
          Lists all servers which are known by the server answering the query (optional: which match the server mask). If a remote server is given, the command is forwarded to the first server found that matches that name (if any), and that server is then required to answer the query.

LIST      1459: [<channel>{,<channel>} [<server>]]
          2812: [ <channel> *( "," <channel> ) [ <target> ] ]
          Lists channels and their topics. If a channel is specified, only the status of that channel is displayed. If a server is used (wildcards are allowed), the request is forwarded to that server which will generate the reply.

LUSERS    2812: [ <mask> [ <target> ] ]
          Returns statistics about the size of the whole network. The mask parameter only returns statistics reflecting the masked subset of the network. If a target is used, the request is forwarded to that server which will generate the reply.

MODEc     1459: <channel> {[+|-]|o|p|s|i|t|n|b|v} [<limit>] [<user>] [<ban mask>]
          2812: <channel> *( ( "-" / "+" ) *<modes> *<modeparams> )
          Different modes are available for channels (used by channel operators).

MODEu     1459: <nickname> {[+|-]|i|w|s|o}
          2812: <nickname> *( ( "+" / "-" ) *( "i" / "w" / "o" / "O" / "r" ) )
          Available user modes.

MOTD      2812: [ <target> ]
          Displays the MOTD (message of the day) of the current or target server.

NAMES     1459: [<channel>{,<channel>}]
          2812: [ <channel> *( "," <channel> ) [ <target> ] ]
          Returns a list of all users who are on the comma-separated list of channels. If <channel> is omitted, a list of all channels and their users is shown. If <target> is specified, the command is sent to <target> for evaluation.

NICK      1459: <nickname> [ <hopcount> ]
          2812: <nickname>
          2813: <nickname> <hopcount> <username> <host> <servertoken> <umode> <realname>
          Changes current nickname to a new one. The <hopcount> parameter is used by servers to indicate how far away a nick is from its home server.

NJOIN     2813: <channel> [ "@@" / "@" ] [ "+" ] <nickname> *( "," [ "@@" / "@" ] [ "+" ] <nickname> )
          Used when two servers connect to each other to exchange the list of channel members for each channel.

NOTICE    1459: <nickname> <text>
          2812: <msgtarget> <text>
          Similar to the PRIVMSG command. This command sends a private message to a user, but automatic replies must never be sent in response to the message.

OPER      1459: <user> <password>
          2812: <name> <password>
          Authenticates a user as an operator on the server/network.

PART      1459: <channel>{,<channel>}
          2812: <channel> *( "," <channel> ) [ <part message> ]
          Used to part (or leave) a channel with an optional part message.

PASS      1459: <password>
          2812: <password>
          2813: <password> <version> <flags> [<options>]
          Sets a connection password. The user sends a password before sending the NICK/USER combination.

PING      1459: <server1> [<server2>]
          2812: <server1> [ <server2> ]
          Used to test the presence of an active client/server at the other end of the connection.

PONG      1459: <daemon> [<daemon2>]
          2812: <server> [ <server2> ]
          A reply to a PING message.

PRIVMSG   1459: <receiver>{,<receiver>} <text to be sent>
          2812: <msgtarget> <text to be sent>
          Sends private message to user or public message to channel. The receiver parameter may also be a host mask or server mask.

QUIT      1459: [<quit message>]
          2812: [ <quit message> ]
          2813: [<quit message>]
          Disconnects the user from the server with an optional quit message.

REHASH    1459:
          2812:
          Forces the server to re-read and process its configuration file (used by operators).

RESTART   1459:
          2812:
          Forces the server to restart itself (used by operators).

SERVER    1459: <servername> <hopcount> <info>
          2813: <servername> <hopcount> <token> <info>
          Tells a server with different parameters that the other end of a new connection is a server.

SERVICE   2812: <nickname> <reserved> <distribution> <type> <reserved> <info>
          2813: <servicename> <servertoken> <distribution> <type> <hopcount> <info>
          Registers a new service on the network.

SERVLIST  2812: [ <mask> [ <type> ] ]
          Lists services currently connected to the network. The result of the query can be restricted with the optional parameters.

SQUERY    2812: <servicename> <text>
          Similar to the PRIVMSG command, but the recipient must be a service.

SQUIT     1459: <server> <comment>
          2812: <server> <comment>
          2813: <server> <comment>
          This command is used to disconnect server links.

STATS     1459: [<query> [<server>]]
          2812: [ <query> [ <target> ] ]
          Returns statistics about the current server (if parameters are omitted) or <server>.


SUMMON    1459: <user> [<server>]
          2812: <user> [ <target> [ <channel> ] ]
          Gives users, who are on a host running a server, a message asking them to join.

TIME      1459: [<server>]
          2812: [ <target> ]
          Returns the local time on the specified or current server.

TOPIC     1459: <channel> [<topic>]
          2812: <channel> [ <topic> ]
          Changes or views the topic of a specified channel.

TRACE     1459: [<server>]
          2812: [ <target> ]
          Used to find the route to a specific server, in a similar way to "traceroute". If the target parameter is omitted, it shows the direct connection to the current server.

USER      1459: <username> <hostname> <servername> <realname>
          2812: <user> <mode> <unused> <realname>
          Used at the beginning of connection to specify, e.g., the real name of a new user.

USERHOST  1459: <nickname>{<space><nickname>}
          2812: <nickname> *( SPACE <nickname> )
          Returns a list of information about the specified nicknames.

USERS     1459: [<server>]
          2812: [ <target> ]
          Returns a list of users logged into the server.

VERSION   1459: [<server>]
          2812: [ <target> ]
          Returns the version of a server that is not directly connected or the current server (if parameter is omitted).

WALLOPS   1459: <Text to be sent to all operators currently online>
          2812: <Text to be sent>
          Sends a message to all operators currently online (RFC 1459) or all users with user mode "w" (RFC 2812).

WHO       1459: [<name> [<o>]]
          2812: [ <mask> [ "o" ] ]
          Returns a list of users which matches the parameter. If the "o" parameter is given, the server only returns information about operators.

WHOIS     1459: [<server>] <nickmask>[,<nickmask>[,...]]
          2812: [ <target> ] <mask> *( "," <mask> )
          Returns information about the comma-separated list of nicks which matches the mask. If the server (target) parameter is given, the command is forwarded to it for processing.

WHOWAS    1459: <nickname> [<count> [<server>]]
          2812: <nickname> *( "," <nickname> ) [ <count> [ <target> ] ]
          Returns information about a nickname which no longer exists because of a nick change or user leaving. Wildcards are allowed in the server and target parameters. In RFC 2812, <nickname> can be a comma-separated list of nicknames.
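To make the parameter notation of Table D.1 concrete, the following Python sketch serializes a command and its parameters into one protocol line. The helper name format_command is hypothetical; the sketch assumes the message format of Appendix B, in which only the last parameter may contain spaces and is then introduced by a colon, and every line ends with CR LF.

```python
def format_command(command: str, *params: str) -> str:
    """Serialize a command and its parameters into one IRC line."""
    parts = [command.upper()]
    for index, param in enumerate(params):
        is_last = index == len(params) - 1
        # Empty parameters, parameters with spaces, and parameters starting
        # with ':' can only be sent as the trailing parameter.
        needs_colon = param == "" or " " in param or param.startswith(":")
        if needs_colon and not is_last:
            raise ValueError("only the last parameter may contain spaces")
        parts.append(":" + param if needs_colon else param)
    return " ".join(parts) + "\r\n"
```

For example, format_command("PRIVMSG", "#english", "hello there") produces the line "PRIVMSG #english :hello there" followed by CR LF, which corresponds to the PRIVMSG entry <msgtarget> <text to be sent>.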


D.2 Additional IRC server commands

Table D.2: Additional IRC server commands

Command   Description                                                 IRCd example
COMMANDS  Lists all currently available commands, the module which    InspIRCd-2.0
          provides them (or the core), and the number of arguments
          they take as a minimum.
CYCLE     Equivalent to sending a PART and then a JOIN command.       InspIRCd-2.0
HELP      Shows help information for all available commands.          ircd-seven-1.1.3
MAP       Displays a network map of all server connections.           irc-2.11.2p3
MODULES   Lists all modules loaded on the IRC server.                 InspIRCd-2.0
RULES     Shows the rules of the current network.                     Unreal3.2.6.SwiftIRC(10)


D.3 IRC client (mIRC) commands

¹ See description in Table D.1.

Table D.3: mIRC commands

Command     Parameter and description

ACTION      <action text>
            Sends the specified action text to the active channel or query window.

AJINVITE    Turns auto-join on invite on or off.

AME         <action text>
            Sends an action to all connected channels.

AMSG        <text>
            Sends a message to all connected channels.

ANICK       <nickname>
            Changes the alternate nickname.

BAN         [-{k|r|uN}] [<channel>] {<nickname>|<address>} [<type>]
            Bans someone from the current channel.

CLEAR       [-{s|g|h|l|c}] [<windowname>]
            Clears the buffer of the mIRC current or specified window.

CLEARALL    [-{s|n|q|m|t|g|u|a}]
            Clears the buffers of the specified or all windows.

CTCP        <nickname> <ctcp types> [<message>]
            Does the given client-to-client protocol (CTCP) request on nickname. <ctcp types> are ping, finger, version, time, userinfo, or clientinfo.

CTCPREPLY   <nickname> <ctcp type> [<message>]
            Sends a reply to a CTCP query.

DCC         {chat <nickname>|send <nickname> <file>{,<file>}|get <nickname> <file>}
            Opens a direct client connection (DCC) window and sends a DCC chat request to nickname or sends/gets the specified files.

DESCRIBE    {<nickname>|<channel>} <action text>
            Sends the text to the specified nickname or channel.

DISCONNECT  Forces a hard and immediate disconnect from the IRC server.

DNS         [-46|c|h] [<nickname>|<address>]
            Uses provider’s domain name system (DNS) to resolve an address.

EXIT        [-n|r]
            Closes down mIRC and exits.

FULLNAME    <nickname>
            Changes the full name in the connect dialog.

HOP         [-c|n] [<channel>] [<message>]
            Parts the current channel and joins a new one. If no new channel is specified, it parts and rejoins the current channel without closing the window.

IGNORE      [<nickname>|<address>]
            Adds user or address to the ignore list.

JOIN        [-{i|n|x}] <channel>{,<channel>} [<key>{,<key>}] ¹

KNOCK       <channel> <message>
            A knock with an optional message is used to request an invitation on an invitation-only channel.

LEAVE       See PART command.

LINKS       [-{n|x}] ¹

LIST        [<channel> [<searchstring>] [-min <min>] [-max <max>] ¹

LOCALINFO   -{u|h|p} [<host>] [<ip>]
            Looks up and sets the local settings.

LOG         <on|off> <window> [-f filename]
            Turns logging on and off for a window. If a filename is specified, the logs file dialog is not popped up.

ME          <action text>
            Sends the specified action text to the active channel or query window.

MNICK       <nickname>
            Changes the main nickname.

MODEc       <channel> [[+|-]modechars [parameters]] ¹

MODEu       <nickname> [[+|-]modechars [parameters]] ¹

MSG         {<nickname>|<channel>} <text>
            Sends a private message to a channel or nickname without opening a query window.

NICK        <nickname> ¹

NOTICE      {<nickname>|<channel>} <message> ¹

OMSG        [<channel>] <message>
            Sends the specified message to all operators on a channel.

ONOTICE     [<channel>] <message>
            Sends the specified notice message to all channel operators.

PART        [<channel>,<channel> [<part message>]] ¹

PARTALL     [<part message>]
            User leaves all channels.

QME         <message>
            Sends an action to all open query windows.

QMSG        <message>
            Sends a message to all open query windows.

QUERY       [-n] <nickname> [message]
            Opens a query window to the specified nickname. If a message is provided, it is sent.

QUERYRN     <nickname> <newnickname>
            Changes the nickname of an open query window.

QUOTE       See RAW command.

RAW         [-{q|n}] <command>
            Sends the specified command directly in RAW format to the server.

SERVER      [-{46|e|m|n|s|a|r|p|f|o|c|z}] {<server>|<groupname>} [<port>] [<password>]
            Connects to an IRC server.

TIMESTAMP   [-{f|g|s|a|e}] [on|off|default] [<windowname>]
            Enables and disables timestamps as well as sets their formats.

TNICK       <nickname>
            Changes nick to a temporary one without affecting the main or alternate nicknames.

WHOIS       <nickname> ¹


D.4 Mapping of commands

Table D.4: Mapping of mIRC commands to the respective server commands

                      Server: mapping to
              Client  Same  Other command
mIRC command                JOIN  KICK  MODE  NICK  NOTICE  PART  PRIVMSG  QUIT  USER  Direct (Raw)
ACTION        ✗       ✗     ✗     ✗     ✗     ✗     ✗       ✗     ✓        ✗     ✗     ✗
ADMIN         ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
AJINVITE      ✓       ✗     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
AME           ✗       ✗     ✗     ✗     ✗     ✗     ✗       ✗     ✓        ✗     ✗     ✗
AMSG          ✗       ✗     ✗     ✗     ✗     ✗     ✗       ✗     ✓        ✗     ✗     ✗
ANICK         ✗       ✗     ✗     ✗     ✗     ✓     ✗       ✗     ✗        ✗     ✗     ✗
AWAY          ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
BAN           ✗       ✗     ✗     ✓     ✓     ✗     ✗       ✗     ✗        ✗     ✗     ✗
CLEARALL      ✓       ✗     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
CLEAR         ✓       ✗     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
COMMANDS      ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
CONNECT       ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
CTCPREPLY     ✗       ✗     ✗     ✗     ✗     ✗     ✓       ✗     ✗        ✗     ✗     ✗
CTCP          ✗       ✗     ✗     ✗     ✗     ✗     ✗       ✗     ✓        ✗     ✗     ✗
CYCLE         ✗       ✗     ✓     ✗     ✗     ✗     ✗       ✓     ✗        ✗     ✗     ✗
DCC           ✗       ✗     ✗     ✗     ✗     ✗     ✓       ✗     ✓        ✗     ✗     ✗
DESCRIBE      ✗       ✗     ✗     ✗     ✗     ✗     ✗       ✗     ✓        ✗     ✗     ✗
DIE           ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
DISCONNECT    ✗       ✗     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✓     ✗     ✗
DNS           ✓       ✗     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
EXIT          ✗       ✗     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✓     ✗     ✗
FULLNAME      ✓       ✗     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
HELP          ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
HOP           ✗       ✗     ✓     ✗     ✗     ✗     ✗       ✓     ✗        ✗     ✗     ✗
IGNORE        ✓       ✗     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
INFO          ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
INVITE        ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
ISON          ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
JOIN          ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
KICK          ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
KILL          ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
KNOCK         ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
LEAVE         ✗       ✗     ✗     ✗     ✗     ✗     ✗       ✓     ✗        ✗     ✗     ✗
LINKS         ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
LIST          ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
LOCALINFO     ✓       ✗     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
LOG           ✓       ✗     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
LUSERS        ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
MAP           ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
ME            ✗       ✗     ✗     ✗     ✗     ✗     ✗       ✗     ✓        ✗     ✗     ✗
MNICK         ✗       ✗     ✗     ✗     ✗     ✓     ✗       ✗     ✗        ✗     ✗     ✗
MODE          ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
MODULES       ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
MOTD          ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
MSG           ✗       ✗     ✗     ✗     ✗     ✗     ✗       ✗     ✓        ✗     ✗     ✗
NAMES         ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
NICK          ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
NOTICE        ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
OMSG          ✗       ✗     ✗     ✗     ✗     ✗     ✗       ✗     ✓        ✗     ✗     ✗
ONOTICE       ✗       ✗     ✗     ✗     ✗     ✗     ✓       ✗     ✗        ✗     ✗     ✗
OPER          ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
PARTALL       ✗       ✗     ✗     ✗     ✗     ✗     ✗       ✓     ✗        ✗     ✗     ✗
PART          ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
PASS          ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
PING          ✗       ✗     ✗     ✗     ✗     ✗     ✗       ✗     ✓        ✗     ✗     ✗
PONG          ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
PRIVMSG       ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
QME           ✗       ✗     ✗     ✗     ✗     ✗     ✗       ✗     ✓        ✗     ✗     ✗
QMSG          ✗       ✗     ✗     ✗     ✗     ✗     ✗       ✗     ✓        ✗     ✗     ✗
QUERYRN       ✓       ✗     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
QUERY         ✓       ✗     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
QUIT          ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
QUOTE         ✗       ✗     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✓
RAW           ✗       ✗     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✓
REHASH        ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
RESTART       ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
RULES         ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
SERVER        ✗       ✗     ✗     ✗     ✗     ✓     ✗       ✗     ✗        ✗     ✓     ✗
SERVLIST      ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
SQUERY        ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
SQUIT         ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
STATS         ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
SUMMON        ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
TIMESTAMP     ✓       ✗     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
TIME          ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
TNICK         ✗       ✗     ✗     ✗     ✗     ✓     ✗       ✗     ✗        ✗     ✗     ✗
TOPIC         ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
TRACE         ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
USERHOST      ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
USERS         ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
USER          ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
VERSION       ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
WALLOPS       ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
WHOIS         ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
WHOWAS        ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
WHO           ✗       ✓     ✗     ✗     ✗     ✗     ✗       ✗     ✗        ✗     ✗     ✗
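A few rows of Table D.4 can be sketched as a translation function from client input to raw server commands. The function name to_server_commands is hypothetical and covers only a handful of commands. The framing of actions as "\x01ACTION ...\x01" inside a PRIVMSG is an assumption based on the common CTCP convention and is not stated in the table itself.

```python
from typing import List

CTCP_DELIM = "\x01"  # assumption: usual CTCP framing character

# Commands marked "Client" in Table D.4 never reach the server.
CLIENT_ONLY = {"AJINVITE", "CLEAR", "CLEARALL", "DNS", "FULLNAME", "IGNORE",
               "LOCALINFO", "LOG", "QUERY", "QUERYRN", "TIMESTAMP"}


def to_server_commands(client_line: str, active_channel: str) -> List[str]:
    """Translate a '/command args' client line into raw server commands."""
    name, _, args = client_line.lstrip("/").partition(" ")
    name = name.upper()
    if name in CLIENT_ONLY:
        return []
    if name in ("ME", "ACTION"):          # mapped to PRIVMSG
        return [f"PRIVMSG {active_channel} :{CTCP_DELIM}ACTION {args}{CTCP_DELIM}"]
    if name == "MSG":                     # mapped to PRIVMSG
        target, _, text = args.partition(" ")
        return [f"PRIVMSG {target} :{text}"]
    if name == "LEAVE":                   # mapped to PART
        return [f"PART {args or active_channel}"]
    if name in ("QUOTE", "RAW"):          # sent directly (raw)
        return [args]
    if name == "HOP" and not args:        # mapped to PART followed by JOIN
        return [f"PART {active_channel}", f"JOIN {active_channel}"]
    # Commands marked "Same" are passed through unchanged.
    return [f"{name} {args}".strip()]
```

With "#english" as the active channel, "/me waves" becomes one PRIVMSG to that channel, while "/hop" becomes a PART followed by a JOIN.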


Curriculum Vitae

Robert Franz Ecker
Curriculum vitae

Personal data
Date of birth   Nov 21, 1972
Place of birth  Ried im Innkreis, Austria
Home address    Hohenzeller Straße 50, 4910 Ried im Innkreis, Austria

Email [email protected] Austrian

Gender Male

Experience

2012–now Business Analyst/Product Owner/Functional Tester, Catalysts GmbH, Linz, Austria.
Home: http://www.catalysts.cc/

2011–2012 Head of Quality Management, ecx.io austria GmbH, Wels, Austria.
Home: http://www.ecx.io/

2005–2010 Project Manager, Brau Union Österreich AG (Heineken Group), Linz, Austria.
Home: http://www.brauunion.at/

Academic education

2005–2017 Doctoral Program in Technical Sciences, Johannes Kepler University of Linz, Linz, Austria.
Home: http://www.jku.at/

1998–2005 Master’s Program in Computer Science, Johannes Kepler University of Linz, Linz, Austria.
Home: http://www.jku.at/

Research interests

- Computer-mediated communication (e.g., Internet Relay Chat)
- Artificial intelligence (e.g., chatbot)
- Computational linguistics (e.g., discourse analysis)

Interests

- Music
- Internet radio
- Online chat
- Dancing (especially Rueda de Casino)
- Soccer
- Continuing education

[email protected] • www.robertecker.com

Publications

Robert Ecker. Unverträglichkeiten von Softwarekomponenten und deren Erkennung im Open-Source-Bereich. Master’s thesis, Johannes Kepler University of Linz, Austria, 2005.

Robert Ecker. Creation of Internet Relay Chat Nicknames and Their Usage in English Chatroom Discourse. Linguistik online, 50(6):3–29, 2011.

Robert Ecker. Multiple-Views Analysis of Computer-Mediated Discourses. In Proceedings of the 17th International Conference on Information Integration and Web-based Applications & Services, iiWAS ’15, pages 186–195, New York, NY, USA, 2015. ACM.

Robert Ecker. Automated Detection and Guessing without Semantics of Sender-Receiver Relations in Computer-Mediated Discourses. In Proceedings of the 18th International Conference on Information Integration and Web-based Applications & Services, iiWAS ’16, pages 172–180, New York, NY, USA, 2016. ACM.


Statutory Declaration

I declare under oath that I have written this dissertation independently and without outside help, that I have not used any sources or aids other than those indicated, and that I have marked as such all passages taken verbatim or in substance from other sources.

This dissertation is identical to the electronically submitted text document.

Ried im Innkreis, August 2017

(Robert Ecker)
