A Proactive Database System and its Query Language for Social Network Simulation

Sadegh Aliakbary, Jafar Habibi

Sharif University of Technology

Tehran, Iran

Abstract. Social networks appear in different forms in various domains. Online social networks, mobile communication networks, co-authoring networks and human interactions in real life are some examples of social networks. Today, analysis of the dynamics and evolution of social networks is an important field of research. Computer simulation is a powerful method of study in this field, and in some cases the only one. Agent-based computer simulation has found many applications in the simulation of social processes, and Agent-Based Social Simulation (ABSS) has become the dominant method in this area. Although some well-known and mature ABSS models and frameworks exist for social simulation, most of them have limitations for social network simulation.

In this paper we propose a distributed and scalable simulation model for social networks which exploits a central proactive database system. In this model, it is possible to distribute millions of agents across different processing units and to propagate the simulation data between the agents efficiently using the central proactive database. We also propose a query language for this database system, to be used by agents for capturing network properties and listening to network changes. The database is considered proactive because it not only accepts select and update queries from the agents, but also actively sends changes of the network to the appropriate agents without being queried. We include some implementation notes about the proposed framework and describe the kinds of applications for which it is useful. Researchers and analysts of social networks may use the proposed model and framework to develop their desired agents and to run the simulation in a scalable and simple framework.

Keywords: Social network, computer simulation, multi-agent systems, database, query language.

Table of Contents

1. Introduction

2. Motivations and Related Works

3. Proactive DBMS

3.1. Query Language

3.2. Proactive Database System Components

4. Implementation Issues

4.1. Coordinator

4.2. Scheduler

4.3. Access Levels

5. Discussion and Conclusion

6. Future Works

References

Appendix. Grammar of Social Network Simulation Query Language

1. Introduction

Nowadays, social networks are important and ever-growing structures. A social network is a graph of social entities along with their relationships. The nodes of a social network are usually human beings, but other entities such as animals, robots, web sites and computers, together with their relationships, are sometimes also considered social networks. Online social networks (e.g. Facebook, Twitter and Google+), communication networks (e.g. mobile or email communications), citation networks and collaboration networks (e.g. co-authoring a paper) are some examples of social networks. The study of the characteristics and evolution of social networks is a noteworthy field in network analysis, with applicable methods such as mathematical modeling, computer simulation and field research.

Computer simulation, as a third way of doing science (after induction and deduction) [1], is a powerful

and flexible approach in network analysis. When the problem is too complex to be modeled

mathematically, computer simulation would usually be the best method of analysis. With social

network simulation, an analyst represents a social network as a computer program, runs social

activities and processes in a virtual simulated environment and then studies the properties, dynamics

and evolution of the network. Diffusion, synchronization, search, advertisement and bargaining are just some examples of social network interactions and processes that are candidates for simulation.

As a sample application of social network simulation, consider this example: the managers of an online social network web site want to enhance some features of their service, but they are uncertain about the effects of these changes on the behavior of their clients and on key performance indicators. This social network is a complex system with many interdependent processes (e.g. diffusion of news, dating, effect of advertisements) and parameters (e.g. properties of members and relationships). In this situation, the best approach would be to simulate the social network with the desired parameters and processes. Social network simulation also has many applications in sociology, physics, biology, economics and management.

To simulate a social network we need an appropriate computer program, and there are several ways to develop such a program. Since different social network simulations share many similarities and common functionalities, we can implement the common features as a core infrastructure and extend it for different simulation instances. Programming from scratch for the requirements of the target social network in each simulation problem is therefore perhaps the worst option. We need a platform that provides the infrastructure of the simulation and lets us develop and configure the desired behaviors of the target social entities. In fact, social network simulation is a special kind of social simulation, and there exist some mature platforms for social simulation such as Swarm [2], MASON [3] and Repast [4]. Among the different methods of implementation, agent-based simulation is perhaps the most popular one in social simulation [5]. In this approach, the social system is modeled as a multi-agent system with autonomous and intelligent agents that represent and simulate the behavior of real social entities.

In many situations, however, the existing agent-based frameworks are not suitable for the simulation of social networks. They are not designed for social networks, they do not support the special properties and functionalities of social networks, and they do not scale to very large networks with complicated agents.

In this paper we propose a new method for social network simulation. The main elements of this method are agents and a proactive database. The agents simulate the behavior of network nodes, and the database manages the state of the network structure and the properties of nodes and edges. While usual database systems (e.g. relational or graph databases) are passive, since they just answer the queries of clients, the database in our proposed method is proactive. It not only handles agent queries, but also informs agents about changes of the network. With this approach, agents do not need to query the database repeatedly to stay informed about the network properties: the database informs an agent about a network change if the change is considered important and is accessible to the agent. We designed a query language for this database and access levels for node/edge properties. The proposed method allows the simulation of very large networks with complicated agent behaviors by distributing the execution of agents and by propagating information efficiently.

The rest of this paper is structured as follows. We review the literature and explain the motivations of our model in Section 2. Section 3 describes the proactive database and its query language. Section 4 gives some technical and implementation notes. Section 5 discusses the properties of the proposed method and concludes the paper. Finally, future work is presented in Section 6. An appendix at the end of the paper gives a short specification of the grammar of the proposed query language.

2. Motivations and Related Works

In this section we give an overview of the main research related to social network simulation and to our proposed method. We discuss the features, advantages and disadvantages of existing approaches, and then propose a new approach for social network simulation and explain why this new model is needed.

When the outcome of a system is hard to model mathematically, computer simulation is often the best choice for studying and analyzing the system. In many social network analysis problems we face a system so complex, nonlinear and dynamic that simulation is the only applicable approach of study.

Social simulation is the area of applying computational methods for modeling and simulating social

systems and studying their mechanisms. Agent-based social simulation (ABSS) is one of the most

important methods and perhaps the dominant approach in this area [5], [6]. Popularity of ABSS

models and frameworks is growing rapidly[7]. ABSS has found many applications in physics, biology,

sociology and management[6]. As Fig 1 shows, ABSS is the intersection of three scientific fields:

social science, agent-based computing and computer simulation [5][8].

Fig 1. ABSS as a cross disciplinary research field [5]

Cognitive scientists have proposed several cognitive architectures for modeling the structure and behavior of agents; among them, Soar [9], ACT-R [10] and Clarion [11] are the most popular [12]. Although cognitive architectures can be used for developing agents in our proposed simulation approach, our method is a high-level framework for the simulation, so cognitive architectures are regarded as out of scope in this discussion.

Social simulation frameworks like MASON [3], Repast [4], Swarm [2] and NetLogo [13] offer the common functionalities of a simulation. The user of these frameworks implements the behavior of agents and simulates it using the tools provided by the framework. These frameworks have made possible a growing number of applications in a variety of fields and domains. At present, the most obvious choice for social network simulation is to use one of these agent-based social simulation frameworks as the starting point and the infrastructure of the simulation.

Despite the existence of ABSS frameworks, we believe that a new simulation model is required in many social network simulation problems. ABSS frameworks are not designed with social networks in mind. They usually support simple reactive agents, without powerful support for scalable world-state management. For example, in MASON there is a world-state object, named SimState (simulation state: agent properties, environment, etc.), that is shared among all the agents. This model of accessing the world-state is inefficient, insecure and unscalable. This model and the similar approaches of current ABSS frameworks do not support (at least some of) the following requirements:

Limited information access. An agent should see only the parts of the network that it is allowed to see, not the whole network.

Large networks. For very large networks with millions of nodes and edges, it is not possible to run the simulation on a single machine because of CPU and memory limits.

Complex agents. Centralized simulation frameworks do not support the distribution of agents among multiple processing units.

Interrupt instead of polling. In an ordinary approach, an agent requests the required information from the world-state object in each simulation cycle, but in many circumstances the agent is merely waiting for something to happen. In this case it is more efficient to inform the agent about a change whenever necessary, instead of forcing the agent to ask for the state in each cycle.

Hidden agent implementation. In some applications, the internal architecture and implementation of agents should be hidden, and only their interaction with the network may be public.

Flexible time period for simulation cycles. The length of a cycle is fixed in most simulation engines, but some cycles may need more or less time to finish.

Some attempts to address some of these challenges are reported in the literature. For example, the authors of [14] proposed using Terracotta [15] to provide a global heap memory for Repast agents. Grid, cluster and cloud computing techniques also seem able to handle some (but not all) of the scalability issues. However, we think that no thorough research is available that handles all of these challenges in a social network simulation platform or model. We need a more powerful and efficient model of information management and propagation (for handling the properties of nodes and edges) for social network simulation, with better mechanisms for accessing and changing the state of the network.

In this paper we propose a proactive database system and a new simulation paradigm to overcome the limitations mentioned above. In our model, the world-state is maintained and managed in a centralized social network database and the agents are executed on different machines. The centralized database is proactive, meaning that the database may call agents and inform them about a change. Unlike usual database systems (e.g. relational DBMSs), which are passive and handle only the requests of clients, our database not only responds to requests but also actively sends necessary information about network changes to the appropriate agents. This approach eliminates some of the mentioned limitations by enabling efficient distribution of agents and by managing the access of agents to node/edge properties.

Along with the new simulation model and the proactive database system, we present a query language for the social network database. Agents use this query language to communicate with the centralized database system. Some other SQL-like query languages have been proposed for use with social network databases. For example, SoQL [16] supports queries for finding groups and paths in social networks, BiQL [17] supports data manipulation operations on social networks, and FQL [18], developed at Facebook to give third-party developers access to Facebook data, supports the operations that a user can do on Facebook’s website. The goal of our proposed query language is different. Agents in a social network simulation may ask for information about the network (especially about the nodes and edges in their neighborhood), and they may change the properties of some nodes or edges (almost always they only update themselves and their own relationships). We therefore propose a query language that supports these requirements and is designed with efficiency in mind for a proactive database system.

As a technical note, NoSQL databases and graph databases seem well suited to implementing the proposed simulation model. We have implemented the framework with a relational database, but we think that NoSQL databases like neo4j [19] are better alternatives for this project, and we hope to use them in subsequent versions.

3. Proactive DBMS

In this section we explain the proposed simulation scheme and the proactive database system as the

heart of this scheme. In our model, there is a central database system responsible for maintaining and

managing simulation data and properties of nodes and edges. We propose a query language to be used

by agents for communication with this database system.

3.1. Query Language

Before we describe the architecture and processes of the proactive database, let us start with some sample queries that agents may use in social network simulation. We categorize queries into three major classes: Select, Update and Add-Listener. A select query asks the database for a piece of information that the agent is allowed to access. For example, an agent is usually allowed to ask about the properties of its friends and edges (and sometimes about the friends of friends), so the following are valid select queries:

Select name from friends n where link(me(),n).weight>10

Select * from friendsoffriends f where f.education='PhD'

Select weight, trust-level from links l where l.to.age>18

Sophisticated select queries, like those of SoQL [16] that discover special paths and groups among arbitrary nodes, are not supported in our database, because the nodes of a real social network almost never have access to such information.
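To illustrate how the first sample select query above could be evaluated on behalf of a particular agent, here is a minimal Python sketch over a tiny in-memory network. It is only an illustration, not the framework's implementation: the dictionaries nodes and links, the sample data and the function select_friend_names are assumptions introduced here, as is the reading that an agent's friends are the targets of its outgoing links.

```python
from typing import Dict, List, Tuple

# Hypothetical sample network (illustrative data, not from the paper).
nodes: Dict[int, Dict[str, str]] = {
    1: {"name": "Ali"},
    2: {"name": "Sara"},
    3: {"name": "Reza"},
}
# Directed links keyed by (from, to), each with its own properties.
links: Dict[Tuple[int, int], Dict[str, float]] = {
    (1, 2): {"weight": 15},
    (1, 3): {"weight": 4},
    (2, 3): {"weight": 20},
}


def select_friend_names(me: int, min_weight: float) -> List[str]:
    """Mimics: Select name from friends n where link(me(),n).weight > min_weight.

    Assumption: the friends of an agent are the targets of its outgoing links.
    """
    return sorted(
        nodes[to]["name"]
        for (frm, to), props in links.items()
        if frm == me and props["weight"] > min_weight
    )


# The same query text gives different answers for different requesting agents.
names_for_agent_1 = select_friend_names(me=1, min_weight=10)
names_for_agent_2 = select_friend_names(me=2, min_weight=10)
```

Here agent 1 sees only the friend reached by a link heavier than 10, and agent 2, issuing the same query, gets a different answer because friends and me() are resolved relative to the requester.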

In many cases, an agent needs a particular piece of information in each simulation cycle. Such an agent would request the information by repeating a Select query in each cycle, and the database system would be forced to answer it even if nothing has changed. In this situation, it is more efficient to inform the agent when a change happens. To support this mechanism, we propose Add-Listener queries, by which the agent asks the database system to inform it about any changes in the specified nodes and edges. Consider these examples:

Add node-listener on name,education from friends

Add edge-listener on weight from links

Add node-listener on * from friends

When an agent adds a listener, triggers are installed on some nodes or edges in the database. If the specified properties change on the target nodes/edges, the database proactively informs the agents that have added a corresponding listener. For example, with the first of the three queries above, the agent asks the database to inform it whenever the name or education of one of its friends changes.
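The listener mechanism can be pictured with a small Python sketch. This is a simplified illustration under stated assumptions, not the framework's code: the class ListenerRegistry, its methods and the callback signature are hypothetical names, and listeners are kept in memory rather than as database triggers.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Iterable, List, Tuple

# A listener callback receives (node id, property name, new value).
Callback = Callable[[int, str, object], None]


class ListenerRegistry:
    """Hypothetical stand-in for the listener records kept by the database."""

    def __init__(self) -> None:
        # (target node id, property name) -> callbacks of the listening agents
        self._listeners: DefaultDict[Tuple[int, str], List[Callback]] = defaultdict(list)

    def add_node_listener(
        self, targets: Iterable[int], properties: Iterable[str], callback: Callback
    ) -> None:
        """Mimics: Add node-listener on <properties> from <targets>."""
        props = list(properties)
        for node_id in targets:
            for prop in props:
                self._listeners[(node_id, prop)].append(callback)

    def notify_update(self, node_id: int, prop: str, new_value: object) -> int:
        """Pushes a change to its listeners and returns how many were notified."""
        callbacks = self._listeners.get((node_id, prop), [])
        for callback in callbacks:
            callback(node_id, prop, new_value)
        return len(callbacks)


# Agent 1 listens to the name and education of its friends (nodes 2 and 3).
inbox: List[Tuple[int, str, object]] = []
registry = ListenerRegistry()
registry.add_node_listener([2, 3], ["name", "education"], lambda n, p, v: inbox.append((n, p, v)))

notified = registry.notify_update(2, "education", "PhD")  # a listened property changes
ignored = registry.notify_update(2, "age", 30)            # nobody listens to age
```

The agent never polls: it receives the education change through its callback, while the age change produces no traffic at all.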

Finally, an agent may change its properties. Update queries are used to reflect a change in node/edge

properties. Here are some examples:

Update node m set health-status='sick' where m=me()

Update edge e set weight=10 where e.id=23

(Permission is denied if the link with id=23 is not one of the requester's links.)

Update edge e set weight=10 where e.from=me() and e.to.id=43

Add Link l where l.from=me() and l.to.id=45

Remove Link l where l.from=me() and l.to.age<18

As these examples show, every query is contextualized by an agent: the database uses the agent's identity to answer the query and to check permissions. It is not possible to answer an agent's query without knowing its identity, because spaces like friends and friendsoffriends and functions like me() would be undefined. The grammar of this query language is described in the appendix.
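The dependence of the query spaces on the requester's identity can be sketched in Python as follows. The adjacency map and the function names are hypothetical, and the sketch assumes that friendsoffriends contains the nodes exactly two hops away, excluding the agent itself and its direct friends; the paper does not fix this detail.

```python
from typing import Dict, Set

# Hypothetical in-memory adjacency map: node id -> ids of its direct friends.
Graph = Dict[int, Set[int]]


def friends(graph: Graph, me: int) -> Set[int]:
    """Nodes directly linked to the requesting agent."""
    return set(graph.get(me, set()))


def friends_of_friends(graph: Graph, me: int) -> Set[int]:
    """Nodes two hops away (assumption: direct friends and the agent itself are excluded)."""
    direct = friends(graph, me)
    two_hops: Set[int] = set()
    for friend in direct:
        two_hops |= graph.get(friend, set())
    return two_hops - direct - {me}


graph: Graph = {1: {2, 3}, 2: {1, 4}, 3: {1, 4, 5}, 4: {2, 3}, 5: {3}}

# The same space name resolves to different node sets for different agents.
scope_of_agent_1 = friends_of_friends(graph, me=1)
scope_of_agent_5 = friends_of_friends(graph, me=5)
```

Without the value of me, neither function can be evaluated, which is the sense in which the spaces are undefined for an anonymous query.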

3.2. Proactive Database System Components

A high-level view of the components of the proposed database system is shown in Fig 2. A query is first parsed and processed by the query processor. The permission checker authorizes the query to be executed by the requesting agent. The optimizer tries to convert the query into a more efficient one, and then the query executor runs the query using a lower-level database system. The network data (nodes, edges and their properties) are stored in this database, which the query executor uses for running agent queries. NoSQL databases and graph databases are suitable for this purpose.

If the query is an Update or Add-Listener query, the listener manager component is also involved. An Add-Listener query installs listeners by adding records to the Listener DB. After the query executor executes an Update query, if some agents are “listening” for this kind of update, the listener manager handles the notification of those agents.

Fig 2. Proactive social simulation database system

4. Implementation Issues

We have implemented some parts of the designed model as a social network simulation framework.

Java is used as the programming language and a relational database (PostgreSQL) is used for

maintaining network and simulation data. In this section, we briefly explain some technical notes

about implementation of the proposed model.

4.1. Coordinator

Since the simulation is distributed and scalable, we usually have separate servers with different purposes: one database server, one scheduler server and one or more servers for running the agents. (Even the database server could be distributed over different stations using technologies like Apache Hadoop.) In our implementation, each agent runs in a separate thread. A coordinator component is implemented as a process responsible for managing agent threads. Each coordinator is started on a separate workstation. The number of agents running on a coordinator depends on the complexity of the agents and the processing power of the workstation, ranging from a few agents to the whole population.
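The thread-per-agent arrangement can be sketched in a few lines of Python. The framework itself is written in Java, so this is only an illustration of the idea: the class Coordinator, its run_step method and the timeout parameter are hypothetical, and agents are modeled as plain callables.

```python
import threading
from typing import Callable, List


class Coordinator:
    """Hypothetical coordinator: runs every agent step in its own thread."""

    def __init__(self, agents: List[Callable[[], None]]) -> None:
        self._agents = agents

    def run_step(self, timeout: float = 5.0) -> bool:
        """Starts one thread per agent and reports whether all of them finished.

        Each thread is joined for at most `timeout` seconds.
        """
        threads = [threading.Thread(target=agent) for agent in self._agents]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join(timeout)
        return not any(thread.is_alive() for thread in threads)


finished_agents: List[int] = []
lock = threading.Lock()


def make_agent(agent_id: int) -> Callable[[], None]:
    def step() -> None:
        # A real agent would query the database and act here.
        with lock:
            finished_agents.append(agent_id)
    return step


coordinator = Coordinator([make_agent(i) for i in range(4)])
all_finished = coordinator.run_step()
```

The boolean returned by run_step is the kind of signal a coordinator could forward to the scheduler once all of its agents have completed a subcycle.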

4.2. Scheduler

The framework is implemented as a discrete-time simulation and a scheduler is designed for cycle

management. In each simulation cycle, agents observe the network, they process the data, they

perform some actions and they perhaps change their properties and characteristics of their edges. A

scheduler component informs the agents and the proactive database about starting a new cycle. Each

simulation cycle is divided into four subcycles:

1. Select. The agents send Select queries to the proactive database and the database responds.

2. Act. The agents process the world-state and perform some actions according to the input data.

3. Update. The agents inform the database about changes in their state using Update queries.

4. Listen. The database sends information about changes in node/edge properties to the agents that registered a listener (using Add-Listener queries) for these changes.

Although the scheduler has a constant hard deadline as the finishing time of each cycle and subcycle, the duration of a subcycle is dynamic and adaptive. Agents, coordinators and the database system cooperate with the scheduler to determine the end of a subcycle; in fact, these components help the scheduler to make the subcycles shorter whenever possible. For example, when an agent finishes sending select queries to the database, it informs its coordinator. When all agents of a coordinator have finished their select queries, the coordinator informs the scheduler. Finally, when all coordinators registered with the scheduler have reported finishing this subcycle, the scheduler announces the end of the subcycle to the database system and all the coordinators. The second and third subcycles follow a similar procedure, and the fourth subcycle is managed in cooperation with the database system (instead of the coordinators and agents).
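The adaptive end of a subcycle, which closes either when every coordinator has reported or when the hard deadline passes, can be sketched in Python as follows. The class SubcycleBarrier and its method names are hypothetical, and coordinators are represented only by string identifiers; this is an illustration of the rule, not the scheduler's actual code.

```python
import threading
from typing import Iterable, Set


class SubcycleBarrier:
    """Hypothetical scheduler helper: ends a subcycle early or at the deadline."""

    def __init__(self, coordinator_ids: Iterable[str], deadline_seconds: float) -> None:
        self._pending: Set[str] = set(coordinator_ids)
        self._deadline = deadline_seconds
        self._lock = threading.Lock()
        self._all_done = threading.Event()
        if not self._pending:
            self._all_done.set()  # nothing to wait for

    def report_finished(self, coordinator_id: str) -> None:
        """Called when all agents of one coordinator have finished the subcycle."""
        with self._lock:
            self._pending.discard(coordinator_id)
            if not self._pending:
                self._all_done.set()

    def wait_for_end(self) -> str:
        """Blocks until every coordinator has reported or the hard deadline passes."""
        finished = self._all_done.wait(timeout=self._deadline)
        return "early" if finished else "deadline"


# One coordinator never reports, so the hard deadline ends the subcycle.
slow = SubcycleBarrier({"c1", "c2"}, deadline_seconds=0.05)
slow.report_finished("c1")
slow_result = slow.wait_for_end()

# Both coordinators report, so the subcycle ends without waiting for the deadline.
fast = SubcycleBarrier({"c1", "c2"}, deadline_seconds=5.0)
fast.report_finished("c1")
fast.report_finished("c2")
fast_result = fast.wait_for_end()
```

The deadline bounds the worst case, while the reports let a subcycle with quick agents finish as soon as the last coordinator is done.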

4.3. Access Levels

Each node and each edge has some properties in social network simulation. For example, id, name, age, gender and education are node properties, and id, weight, direction and trust-level may appear among edge properties. The number of defined properties is not restricted by the framework, but in each simulation instance the user should specify the list of node/edge properties, the type of each property (number, Boolean, date-time and string are currently supported) and the access level of each property. The supported access levels are:

Public. Public properties are accessible by all agents.

Friendly. Friendly properties are accessible by friends.

Friend-of-friendly. These properties are accessible by friends and friends-of-friends.

Omnisciently. These properties are not accessible by agents, and their changes are sent to no agent, but they are stored in the central proactive database for the sake of future analysis or presentation.

Private. Private properties are used only internally in agent processes; they are not shared with other agents and their changes are not propagated to the database system.

The access level of a property is enforced by the “permission checker” component of the proactive database. If a query violates the access level of a property, the query is rejected. For example, if the access level of “education” is defined as “private”, then an agent cannot ask about the education level of one of its friends.
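A read check over these access levels might look like the following Python sketch. It is an illustration under stated assumptions rather than the permission checker itself: the enum AccessLevel, the function can_read and the adjacency map are hypothetical, friendship is assumed to be symmetric, and an agent is assumed to be able to read its own public, friendly and friend-of-friendly properties.

```python
from enum import Enum
from typing import Dict, Set


class AccessLevel(Enum):
    PUBLIC = "public"
    FRIENDLY = "friendly"
    FRIEND_OF_FRIENDLY = "friend-of-friendly"
    OMNISCIENTLY = "omnisciently"
    PRIVATE = "private"


def can_read(requester: int, owner: int, level: AccessLevel, graph: Dict[int, Set[int]]) -> bool:
    """Decides whether `requester` may read a property of `owner` with the given level."""
    if level in (AccessLevel.OMNISCIENTLY, AccessLevel.PRIVATE):
        return False  # never served to any agent by the database
    if level is AccessLevel.PUBLIC or requester == owner:
        return True   # own shareable properties are readable (assumption)
    owner_friends = graph.get(owner, set())
    if requester in owner_friends:
        return True   # friends may read friendly and friend-of-friendly properties
    if level is AccessLevel.FRIEND_OF_FRIENDLY:
        # A friend of one of the owner's friends is also allowed.
        return any(requester in graph.get(friend, set()) for friend in owner_friends)
    return False


# Hypothetical symmetric friendship graph: 1 - 2 - 3, with node 4 isolated.
graph: Dict[int, Set[int]] = {1: {2}, 2: {1, 3}, 3: {2}, 4: set()}

friend_reads_friendly = can_read(1, 2, AccessLevel.FRIENDLY, graph)
stranger_reads_friendly = can_read(1, 3, AccessLevel.FRIENDLY, graph)
fof_reads_fof = can_read(1, 3, AccessLevel.FRIEND_OF_FRIENDLY, graph)
```

A query that touches a property for which can_read is false would be rejected as a whole, in line with the rule stated above.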

5. Discussion and Conclusion

In this paper we proposed a new approach to social network simulation. The model is scalable since it is able to distribute the agents over different processing units. The agents access information about the social network and change it through a central database system. The database is proactive: by announcing network changes to the appropriate agents, it eliminates the need for all agents to reselect the world-state in each simulation cycle. A query language is proposed for this proactive database, which agents use to interact with the database. The simulation model is partially implemented as a software framework, capable of simulating social network problems with different purposes.

In which simulation scenarios is this framework useful, and in which problems should it be avoided? Here we briefly list some characteristics of problems for which the proposed simulation approach would be helpful.

Mathematical modeling is hard, impractical or impossible, and computer simulation is needed.

The basic structure of the environment is a social network. The social network may be dynamic and changing, but it should be the primary mechanism of connection and communication among agents. For example, if some mobile agents are spread over a geographical space without a defined set of neighbors, and the environment consists of many entities like roads and obstacles, the proposed method of simulation is not recommended, because the environment is more complicated than a social network.

A centralized simulation is not possible and the agents need to be distributed over different processing units, either because of the size of the network or because of the complexity of agent processes.

There are security concerns about the accessibility of the network or its properties for some agents. In this case we recommend using access levels for properties.

There are concerns about exposing the internal structure of agents to some human users. This requirement can be handled by executing these agents on an isolated coordinator server, accessible only by authorized people.

6. Future Works

First of all, we are going to simulate some real or well-known social network scenarios and processes using our proposed model. We will focus on simulation problems with the properties specified in Sections 2 and 5. We also plan to improve the performance and efficiency of the implemented framework. For example, we want to use efficient community detection algorithms like [20] to divide the social network into clusters and to run each cluster on a separate coordinator. If the nodes of an interconnected cluster are executed on the same processing unit, the need for information propagation between the database server and the processing unit decreases.

When there is no security concern about where an agent is executed, it is sometimes useful to transfer an agent from one coordinator to another. In addition to using community detection algorithms, adaptive relocation of agents among coordinators seems likely to improve the efficiency of the simulation. New algorithms and models are required in the database server for recommending the movement of agents, and agents, coordinators and the database server should support processes for this movement.

As another extension for the proposed framework, we are going to add message passing mechanisms

among the agents. Creating message types and message access levels would also be useful. As

mentioned in previous sections, we also think the utilization of NoSQL and graph databases is a good

choice in implementation of this framework. So we will soon replace the relational database with a

graph database.

References

[1] R. Axelrod, “Advancing the art of simulation in the social sciences,” Complexity, vol. 3, no. 2, pp. 16-22, Nov. 1997.
[2] “Swarm Group.” [Online]. Available: http://www.swarm.org/index.php/Main_Page. [Accessed: 14-Apr-2012].
[3] “MASON Multiagent Simulation Toolkit.” [Online]. Available: http://cs.gmu.edu/~eclab/projects/mason/. [Accessed: 14-Apr-2012].
[4] “Repast Suite.” [Online]. Available: http://repast.sourceforge.net/. [Accessed: 14-Apr-2012].
[5] X. Li, W. Mao, and D. Zeng, “Agent-based social simulation and modeling in social computing,” Intelligence and Security Informatics, pp. 401-412, 2008.
[6] C. M. Macal and M. J. North, “Tutorial on agent-based modelling and simulation,” Journal of Simulation, vol. 4, no. 3, pp. 151-162, Sep. 2010.
[7] S. F. Railsback, S. L. Lytinen, and S. K. Jackson, “Agent-based Simulation Platforms: Review and Development Recommendations,” Simulation, vol. 82, no. 9, pp. 609-623, Sep. 2006.
[8] P. Davidsson, “Agent Based Social Simulation: A Computer Science View,” Journal of Artificial Societies and Social Simulation, vol. 5, no. 1, 2002.
[9] J. E. Laird, “Extending the Soar cognitive architecture,” in Proceedings of the 2008 Conference on Artificial General Intelligence, 2008, pp. 224-235.
[10] J. Anderson, M. Matessa, and C. Lebiere, “ACT-R: A Theory of Higher Level Cognition and Its Relation to Visual Attention,” Human-Computer Interaction, vol. 12, no. 4, pp. 439-462, Dec. 1997.
[11] R. Sun, “Cognitive Architectures and Multi-agent Social Simulation,” Multi-Agent Systems for Society, pp. 7-21, 2009.
[12] R. Sun, Cognition and multi-agent interaction: From cognitive modeling to social simulation. Cambridge University Press, 2006.
[13] “NetLogo Home Page.” [Online]. Available: http://ccl.northwestern.edu/netlogo/. [Accessed: 14-Apr-2012].
[14] F. Cicirelli, A. Furfaro, A. Giordano, and L. Nigro, “Parallel Simulation of Multi-agent Systems Using Terracotta,” 2010 IEEE/ACM 14th International Symposium on Distributed Simulation and Real Time Applications, pp. 219-222, Oct. 2010.
[15] “Terracotta.” [Online]. Available: http://terracotta.org/. [Accessed: 16-Apr-2012].
[16] R. Ronen and O. Shmueli, “SoQL: A Language for Querying and Creating Data in Social Networks,” 2009 IEEE 25th International Conference on Data Engineering, pp. 1595-1602, Mar. 2009.
[17] A. Dries, S. Nijssen, and L. De Raedt, “A query language for analyzing networks,” in Proceedings of the 18th ACM Conference on Information and Knowledge Management - CIKM ’09, 2009, pp. 485-494.
[18] “Facebook Query Language (FQL) - Facebook Developers.” [Online]. Available: developers.facebook.com/docs/reference/fql/. [Accessed: 14-Apr-2012].
[19] “neo4j: World’s Leading Graph Database.” [Online]. Available: http://neo4j.org/. [Accessed: 14-Apr-2012].
[20] Z. Masdarolomoor, R. Azmi, S. Aliakbary, and N. Riahi, “Finding Community Structure in Complex Networks Using Parallel Approach,” 2011 IFIP 9th International Conference on Embedded and Ubiquitous Computing, pp. 474-479, Oct. 2011.

Appendix. Grammar of Social Network Simulation Query Language

The syntax of the proposed query language is presented here. This BNF representation of the query language is compacted and some details are omitted to keep it concise.

<sns-query> ::= <select-query> | <update-query> | <listener-query>
<select-query> ::= <select_clause> <from_clause> [<alias>] <where_clause>
<select_clause> ::= SELECT <select_target>
<select_target> ::= '*' | <property-name> {, <property-name>}*
<from_clause> ::= FROM <from_expression>
<from_expression> ::= friends | friendsoffriends | links
<where_clause> ::= WHERE <conditional_expression>
<update-query> ::= <update-node> | <update-edge> | <add-edge> | <remove-edge>
<update-node> ::= UPDATE NODE <update-query-expression>
<update-edge> ::= UPDATE EDGE <update-query-expression>
<update-query-expression> ::= <alias> SET <set_clause> <where_clause>
<add-edge> ::= ADD LINK <edge-query-expression>
<remove-edge> ::= REMOVE LINK <edge-query-expression>
<edge-query-expression> ::= <alias> <where_clause>
<listener-query> ::= ADD <listener-type> ON <listener_target> <from_clause>
<listener_target> ::= <select_target> | EXISTENCE
<listener-type> ::= node-listener | edge-listener
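As a small worked example of the grammar, the following Python sketch recognizes the <listener-query> rule with a regular expression. It is an illustration only, not the framework's parser: the names ListenerQuery and parse_listener_query are hypothetical, keywords are matched case-insensitively, and the pattern used for <property-name> (a letter followed by letters, digits, underscores or hyphens) is an assumption, since the compact grammar does not define it.

```python
import re
from typing import List, NamedTuple, Optional


class ListenerQuery(NamedTuple):
    listener_type: str   # "node-listener" or "edge-listener"
    targets: List[str]   # ["*"], ["EXISTENCE"] or a list of property names
    source: str          # "friends", "friendsoffriends" or "links"


# ADD <listener-type> ON <listener_target> FROM <from_expression>
_QUERY = re.compile(
    r"^\s*ADD\s+(node-listener|edge-listener)\s+ON\s+(.+?)\s+"
    r"FROM\s+(friends|friendsoffriends|links)\s*$",
    re.IGNORECASE,
)
# Assumed shape of <property-name>; not specified by the grammar.
_PROPERTY = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")


def parse_listener_query(text: str) -> Optional[ListenerQuery]:
    """Returns the parsed query, or None if the text does not match the rule."""
    match = _QUERY.match(text)
    if match is None:
        return None
    listener_type, raw_targets, source = match.groups()
    targets = [t.strip() for t in raw_targets.split(",")]
    if targets == ["*"]:
        pass  # '*' must stand alone
    elif len(targets) == 1 and targets[0].upper() == "EXISTENCE":
        targets = ["EXISTENCE"]
    elif not all(_PROPERTY.fullmatch(t) and t.upper() != "EXISTENCE" for t in targets):
        return None
    return ListenerQuery(listener_type.lower(), targets, source.lower())


parsed = parse_listener_query("Add node-listener on name,education from friends")
rejected = parse_listener_query("Add node-listener on *, name from friends")
```

The first query is one of the Add-Listener examples from Section 3.1; the second is rejected because the grammar allows '*' only on its own.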
