A Proactive Database System and its Query Language for Social Network Simulation
Sadegh Aliakbary, Jafar Habibi
Sharif University of Technology
Tehran, Iran
Abstract. Social networks appear in different forms in various domains. Online social networks,
mobile communication networks, co-authoring networks and human interactions in real life are some
examples of social networks. Today, analysis of dynamics and evolution of social networks is an
important field of research. Computer simulation is a powerful method of study – and in some cases
the only one – in this field. Agent-based computer simulation has found many applications for
simulation of social processes and Agent Based Social Simulation (ABSS) became the dominant
method in this area. Although some well-known and mature ABSS models and frameworks exist for social
simulation, most of them have limitations for social network simulation.
In this paper we propose a distributed and scalable simulation model for social networks which
exploits a central proactive database system. In this model, it is possible to distribute millions of
agents in different processing units and propagate the simulation data between the agents efficiently
using the central proactive database. We have also proposed a query language for this database system
to be used by agents for capturing network properties and listening to network changes. The database
is considered proactive, since it not only accepts select and update queries from the agents, but it also
actively sends changes of the network to appropriate agents without being queried. We included some
implementation notes about the proposed framework and the applications in which it is applicable and
useful. Researchers and analysts of social networks may use the proposed model and framework for
developing their desired agents and for running the simulation in a scalable and simple framework.
Keywords: Social network, computer simulation, multi agent systems, database, query language.
Table of Contents
1. Introduction
2. Motivations and Related Works
3. Proactive DBMS
3.1. Query Language
3.2. Proactive Database System Components
4. Implementation Issues
4.1. Coordinator
4.2. Scheduler
4.3. Access Levels
5. Discussion and Conclusion
6. Future Works
References
Appendix. Grammar of Social Network Simulation Query Language
1. Introduction
Nowadays, social networks are important and ever-growing structures. A social network is a graph of
some social entities along with their relationships. The nodes of a social network are usually human
beings, but sometimes some other entities like animals, robots, web-sites and computers with their
relationships are considered as social networks. Online social networks (e.g. Facebook, Twitter and
Google+), communication networks (e.g. mobile or email communications), citation networks and
collaboration networks (e.g. in co-authoring a paper) are some examples of social networks. Study of
characteristics and evolution of social networks is a noteworthy field in network analysis, with
applicable methods like mathematical modeling, computer simulation and field research.
Computer simulation, as a third way of doing science (after induction and deduction) [1], is a powerful
and flexible approach in network analysis. When the problem is too complex to be modeled
mathematically, computer simulation would usually be the best method of analysis. With social
network simulation, an analyst represents a social network as a computer program, runs social
activities and processes in a virtual simulated environment and then studies the properties, dynamics
and evolution of the network. Diffusion, synchronization, search, advertisement and bargaining are
just some examples of social network interactions and processes that are candidates for simulation.
As a sample application for social network simulation, consider this example: managers of an online
social network web-site are willing to enhance some features of their service, but they are uncertain
about the results and effects of these changes on the behavior of their clients and on key performance
indicators. This social network is a complex system with many interdependent processes (e.g.
diffusion of news, dating, effect of advertisements) and parameters (e.g. properties of members and
relationships). In this situation, the best approach would be the initiation of a simulation of the social
network with desired parameters and processes. Social network simulation also has many applications
in sociology, physics, biology, economics and management.
For simulating a social network, we need an appropriate computer program, and there are some
choices to develop such a program. Since there are many similarities and common functionalities in
different social network simulations, we can implement common features as a core infrastructure and
extend it for different simulation instances. So perhaps the worst approach is to write a program from
scratch for the requirements of the target social network in each simulation problem. It seems that we need a platform
which helps us with an infrastructure of the simulation and lets us develop and configure desired
behaviors of target social entities. In fact, social network simulation is a special kind of social
simulation and there exist some mature platforms for social simulation like Swarm[2], MASON[3] and
Repast[4]. Among different methods of implementation, agent-based simulation is perhaps the most
popular one in social simulation[5]. In this approach, the social system is modeled as a multi-agent
system with autonomous and intelligent agents, representing and simulating the behavior of real social
entities.
But in many situations, the existing agent-based frameworks are not suitable for simulation of social
networks. They are not designed for social networks, they do not support the special properties and
functionalities of social networks, and they do not scale to very large networks with complicated
agents.
In this paper we propose a new method for social network simulation. The main elements of this
method are agents and a proactive database. The agents simulate behavior of network nodes and the
database manages the state of the network structure and properties of nodes and edges. While usual
database systems (e.g. relational or graph databases) are passive, since they just answer queries of
clients, the database in our proposed method is a proactive one. It not only handles agent queries, but it
also informs agents about changes of the network. With this approach, agents do not need repetitive
queries of database to be informed about the network properties: the database informs an agent about a
network change, if this change is considered important and is accessible by the agent. We designed a
query language for this database and access levels for node/edge properties. The proposed method
allows the simulation of very large networks with complicated agent behaviors by distributing the
execution of agents and performing an efficient information flow process.
The rest of this paper is structured as follows: we review the literature and explain the motivations of
our model in Section 2. In the third section the proactive database and its query language are described.
Section 4 is about some technical and implementation notes. Section 5 covers some discussion about
properties of the proposed method and a conclusion. Finally, the future works are presented in the last
section. An appendix is also included at the end of this paper with a short specification for the
grammar of the proposed query language.
2. Motivations and Related Works
In this section we review the main research related to social network simulation and to our proposed
method. We discuss its features, strengths and weaknesses, and then propose a new approach for
social network simulation and explain why this new model is needed.
When the outcome of a system is hard to model mathematically, computer simulation is often
the best choice for study and analysis of the system. Many social network analysis problems involve
such a complex, nonlinear and dynamical system that simulation is the only applicable approach of
study.
Social simulation is the area of applying computational methods for modeling and simulating social
systems and studying their mechanisms. Agent-based social simulation (ABSS) is one of the most
important methods and perhaps the dominant approach in this area [5], [6]. Popularity of ABSS
models and frameworks is growing rapidly[7]. ABSS has found many applications in physics, biology,
sociology and management[6]. As Fig 1 shows, ABSS is the intersection of three scientific fields:
social science, agent-based computing and computer simulation [5][8].
Fig 1. ABSS as a cross disciplinary research field [5]
Some cognitive architectures are proposed by cognitive scientists, for modeling the structure and
behavior of agents, among them Soar [9], ACT-R [10] and Clarion [11] are more popular [12].
Although cognitive architectures could be used for developing agents in our proposed simulation
approach, our method is a high-level framework for the simulation, so cognitive architectures are
regarded as out of scope in this discussion.
Social simulation frameworks like MASON [3], Repast [4], Swarm [2] and NetLogo [13] offer
common functionalities of a simulation. The user of these frameworks should implement the behavior
of agents and simulate the behaviors using the tools provided by the framework. These frameworks
have made possible a growing number of applications in a variety of fields and domains. Perhaps the
first candidate for social network simulation is now using one of these agent-based social simulation
frameworks as the starting point and the infrastructure of the simulation.
Despite the existence of ABSS frameworks, we believe that a new simulation model is required in
many instances of social network simulation problems. ABSS frameworks are not designed with social
networks in mind. They usually support simple reactive agents, without a powerful support of scalable
world-state management. For example, in MASON there is a shared world-state object, named
SimState (simulation state: agent properties, environment, etc.), among all the agents. This model
of accessing the world-state is inefficient, insecure and unscalable. This model and the similar approaches
of current ABSS frameworks do not support (at least some of) the following requirements:
Limited information access. An agent should see only the parts of the network that it is allowed
to see (not the whole network).
Large networks. For very large networks with millions of nodes and edges, it is not possible to
run the simulation on a single machine. (CPU and memory limits)
Complex agents. Centralized simulation frameworks do not support distribution of agents
among multiple processing units.
Interrupt instead of polling. In an ordinary approach, an agent requests the required
information in each simulation cycle from the world-state object, but in many circumstances
an agent is waiting for something to happen. In this case it is better (more efficient) to inform
the agent about a change whenever it is necessary, instead of forcing the agent to ask the state
in each cycle.
Hidden agent implementation. In some applications, the internal architecture and
implementation of agents should be hidden and only their interaction with the network should
be public.
Flexible time period for simulation cycles. The time length of a cycle is fixed in most
simulation engines, but some cycles may need more or less time to finish.
Some attempts to address some of these challenges are reported in the literature. For example, the
authors of [14] proposed using Terracotta [15] for providing a global heap memory for Repast agents.
It seems that grid, cluster and cloud computing techniques are also able to handle some parts (but not
all) of the scalability issues. But to our knowledge, no thorough research is available that handles all of these
challenges as a social network simulation platform or model. It seems that we need a more powerful
and efficient model of information management and propagation (in handling properties of nodes and
edges) for social network simulation, with better mechanisms for accessing and changing state of the
network.
In this paper we propose a proactive database system and a new simulation paradigm to overcome
the mentioned limitations. In our model, the world-state is maintained and managed in a centralized social
network database and the agents are executed on different machines. The centralized database is
proactive, meaning that the database may call agents and inform them about a change. Unlike usual
database systems (e.g. relational DBMSs), which are passive and handle only the requests of clients, our
database not only responds to requests but also actively sends necessary information about
network changes to the appropriate agents. This approach eliminates some of the mentioned limitations by
enabling efficient distribution of agents and managing access of agents to node/edge properties.
Along with the new simulation model and the proactive database system, we have presented a query
language for the social network database. This query language is used by agents for communication
with the centralized database system. There exist some other SQL-like query languages, proposed to
be used with social network databases. For example, SoQL[16] supports queries for finding groups
and paths in social networks. BiQL[17] supports data manipulation operations on social networks and
FQL[18], developed in Facebook for its third-party developers for accessing Facebook data, supports
operations that a user can do on Facebook’s website. But the goal of our proposed query language is
different. Agents in social network simulation may ask for information about the network (especially
about the nodes and edges in their neighborhood), and they may change the properties of some nodes
or edges (almost always they only update themselves and their own relationships). So we propose a
query language that supports these requirements and is also designed with efficiency in mind for a
proactive database system.
As a technical note, it seems that NoSQL databases and graph databases are useful for implementation
of the proposed simulation model. We have implemented the framework with a relational database,
but we think that NoSQL databases like neo4j[19] are better alternatives in this project and hopefully
we will use them for subsequent versions.
3. Proactive DBMS
In this section we explain the proposed simulation scheme and the proactive database system as the
heart of this scheme. In our model, there is a central database system responsible for maintaining and
managing simulation data and properties of nodes and edges. We propose a query language to be used
by agents for communication with this database system.
3.1. Query Language
Before we describe the architecture and processes of the proactive database, let’s start with some
sample queries that agents may use in social network simulation. We categorize queries in three major
classes: Select, Update and Add-Listener. A select query asks the database about an allowed piece of
information. For example, an agent is usually allowed to ask about properties of its friends and edges
(and sometimes about the friends of friends), so these are some valid select queries:
Select name from friends n where link(me(),n).weight>10
Select * from friendsoffriends f where f.education='PhD'
Select weight, trust-level from links l where l.to.age>18
Sophisticated select queries, like those of SoQL[16] that discover special paths and
groups among arbitrary nodes, are not supported in our database because the nodes of a real social
network almost never have access to such information.
In many cases, an agent may need a particular type of information in each simulation cycle. Such an agent
would request the information by repeating a Select query in each cycle, and the database system is forced
to answer it even if nothing has changed. In this situation, it is more efficient to inform the agent
when a change happens. To support this mechanism, we propose using Add-Listener queries, by
which the agent asks the database system to inform it about any changes in specified nodes and
edges. Consider these examples:
Add node-listener on name,education from friends
Add edge-listener on weight from links
Add node-listener on * from friends
When an agent adds a listener, triggers are installed on some nodes or edges in the database. If the
specified properties change on the target nodes/edges, the database proactively informs the agents that
have added a corresponding listener. For example, with the first of the three example queries
above, the agent asks the database to inform it whenever the name or education of one of its
friends changes.
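The listener mechanism can be illustrated with a minimal in-memory sketch. It is not the implemented framework: the names NodeListener, ListenerRegistry, addNodeListener and update are hypothetical, listeners are keyed per node and property, and notification happens immediately on update rather than in a separate Listen subcycle.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Callback an agent registers to be told about a property change (hypothetical interface). */
interface NodeListener {
    void onChange(int nodeId, String property, Object oldValue, Object newValue);
}

/** In-memory stand-in for the listener bookkeeping of a proactive database (assumption). */
class ListenerRegistry {
    // key "nodeId:property" -> listeners installed on that property of that node
    private final Map<String, List<NodeListener>> listeners = new HashMap<>();
    // key "nodeId:property" -> current stored value
    private final Map<String, Object> values = new HashMap<>();

    private static String key(int nodeId, String property) {
        return nodeId + ":" + property;
    }

    /** Rough analogue of an Add node-listener query, applied to one target node. */
    void addNodeListener(int nodeId, String property, NodeListener listener) {
        listeners.computeIfAbsent(key(nodeId, property), k -> new ArrayList<>()).add(listener);
    }

    /**
     * Stores the new value and, only if it differs from the old one,
     * pushes the change to the registered listeners.
     * Returns the number of listeners that were notified.
     */
    int update(int nodeId, String property, Object newValue) {
        String k = key(nodeId, property);
        Object oldValue = values.put(k, newValue);
        if (Objects.equals(oldValue, newValue)) {
            return 0; // nothing changed, so nobody is informed
        }
        List<NodeListener> targets = listeners.getOrDefault(k, new ArrayList<>());
        for (NodeListener listener : targets) {
            listener.onChange(nodeId, property, oldValue, newValue);
        }
        return targets.size();
    }
}
```

With this shape, an agent that registered a listener on the education of node 7 is called back once when that value changes and is not contacted when an update leaves the value unchanged, which is the saving over repeating a Select query in every cycle.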
Finally, an agent may change its properties. Update queries are used to reflect a change in node/edge
properties. Here are some examples:
Update node m set health-status='sick' where m=me()
Update edge e set weight=10 where e.id=23
(Permission is denied if the link with id=23 is not one of the requester's links)
Update edge e set weight=10 where e.from=me() and e.to.id=43
Add Link l where l.from=me() and l.to.id=45
Remove Link l where l.from=me() and l.to.age<18
As these examples show, the queries are all contextualized by an agent, and the database considers the agent's identity
to answer the query and to check the permission. It is not possible to answer an agent's query without
knowing its identity, because spaces like friends and friendsoffriends and functions like me() would be
undefined. The grammar of this query language is described in the appendix.
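To make the role of the agent identity concrete, the following sketch resolves me(), friends and friendsoffriends for one requesting agent over a plain adjacency map. The class AgentContext and its methods are hypothetical, and it is an assumption of this sketch that friendsoffriends excludes the requester and its direct friends; the paper does not fix that detail.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Resolves the query spaces for one requesting agent (hypothetical class). */
class AgentContext {
    private final int me;
    private final Map<Integer, Set<Integer>> adjacency; // node id -> ids of its direct neighbors

    AgentContext(int me, Map<Integer, Set<Integer>> adjacency) {
        this.me = me;
        this.adjacency = adjacency;
    }

    /** Analogue of the me() function: the id of the requesting agent. */
    int me() {
        return me;
    }

    /** The "friends" space: direct neighbors of the requester. */
    Set<Integer> friends() {
        return new TreeSet<>(adjacency.getOrDefault(me, Collections.<Integer>emptySet()));
    }

    /** The "friendsoffriends" space; assumed to exclude the requester and its direct friends. */
    Set<Integer> friendsOfFriends() {
        Set<Integer> direct = friends();
        Set<Integer> result = new TreeSet<>();
        for (int friend : direct) {
            result.addAll(adjacency.getOrDefault(friend, Collections.<Integer>emptySet()));
        }
        result.removeAll(direct);
        result.remove(me);
        return result;
    }

    /** Helper for building a small undirected example graph. */
    static void link(Map<Integer, Set<Integer>> adjacency, int a, int b) {
        adjacency.computeIfAbsent(a, k -> new TreeSet<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new TreeSet<>()).add(a);
    }
}
```

The same graph yields different spaces for different requesters, which is why a query cannot be evaluated before the requester is known.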
3.2. Proactive Database System Components
The high-level view of the components of the proposed database system is shown in Fig 2. The query
is first parsed and processed by the query processor. The permission checker authorizes the query to be
executed by the requesting agent. The optimizer tries to convert the query to a more efficient one, and then
the query executor runs the query using a lower-level database system. The network data (nodes, edges
and their properties) are stored in a database, and the query executor uses this database
for running agent queries. NoSQL databases and graph databases are suitable for this purpose.
If the query is an Update or Add-Listener query, then the listener manager component is also involved. An
Add-Listener query causes the installation of some listeners by adding some records to the Listener DB.
After the query executor executes an Update query, if there are agents “listening” to these kinds
of updates, the listener manager handles the invocation of those agents.
Fig 2. Proactive social simulation database system
4. Implementation Issues
We have implemented some parts of the designed model as a social network simulation framework.
Java is used as the programming language and a relational database (PostgreSQL) is used for
maintaining network and simulation data. In this section, we briefly explain some technical notes
about implementation of the proposed model.
4.1. Coordinator
Since the simulation is distributed and scalable, we usually have separate servers for different
purposes: one database server, one scheduler server and one or more servers for running the agents
(even the database server could be distributed over different stations using technologies like Apache
Hadoop). In our implementation, each agent runs in a separate thread. A coordinator component
is implemented as a process responsible for managing agent threads. Each coordinator is started on a
separate workstation. The number of agents running on a coordinator depends on the complexity of the
agents and the processing power of the workstation, ranging from a few agents to the whole
population.
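A coordinator of this kind might be sketched in Java as follows. This is an illustration under assumptions, not the framework's code: the Agent interface, the Coordinator class and its runStep method are hypothetical names, and a fixed thread pool sized to the number of agents stands in for "one thread per agent".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Behavior of one simulated node for one step (hypothetical interface). */
interface Agent {
    void step(int cycle);
}

/** Manages the threads of the agents hosted on one workstation (hypothetical class). */
class Coordinator {
    private final List<Agent> agents;
    private final ExecutorService pool;

    Coordinator(List<Agent> agents) {
        this.agents = agents;
        // One worker thread per hosted agent, mirroring the thread-per-agent design.
        this.pool = Executors.newFixedThreadPool(Math.max(1, agents.size()));
    }

    /** Runs one step of every agent in parallel; returns how many finished without error. */
    int runStep(int cycle) {
        List<Callable<Void>> tasks = new ArrayList<>();
        for (Agent agent : agents) {
            tasks.add(() -> { agent.step(cycle); return null; });
        }
        int finished = 0;
        try {
            // invokeAll blocks until every submitted task has completed.
            for (Future<Void> result : pool.invokeAll(tasks)) {
                try {
                    result.get();
                    finished++;
                } catch (ExecutionException e) {
                    // The agent threw an exception; it is not counted as finished.
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return finished;
    }

    /** Stops the worker threads so the process can exit. */
    void shutdown() {
        pool.shutdown();
    }
}
```

The count returned by runStep is the kind of information a coordinator could forward to the scheduler once all of its agents are done.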
4.2. Scheduler
The framework is implemented as a discrete-time simulation and a scheduler is designed for cycle
management. In each simulation cycle, agents observe the network, they process the data, they
perform some actions and they perhaps change their properties and characteristics of their edges. A
scheduler component informs the agents and the proactive database about starting a new cycle. Each
simulation cycle is divided into four subcycles:
1. Select. The agents ask Select queries from the proactive database and the database responds.
2. Act. The agents process the world-state and perform some actions according to input data.
3. Update. The agents inform the database about some changes in their state using Update
queries.
4. Listen. The database sends information about changes in node/edge properties to agents who
registered a listener (using Add-Listener queries) for listening to these changes.
Although the scheduler has a constant hard deadline as the finishing time of each cycle and subcycle,
the duration of a subcycle is dynamic and adaptive. Agents, coordinators and the database system
cooperate with the scheduler to determine the end of a subcycle. In fact, these components help the
scheduler make the subcycles shorter when possible. For example, when an agent finishes asking
select queries from the database, it informs its coordinator. When all agents of a coordinator have finished
their select queries, the coordinator informs the scheduler. Finally, when all coordinators (registered with
the scheduler) have reported finishing this subcycle, the scheduler announces the end of the subcycle to the database
system and all the coordinators. The second and third subcycles follow a similar procedure, and the
fourth subcycle is managed in cooperation with the database system (instead of the coordinators and
agents).
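The adaptive end of a subcycle with a hard deadline can be sketched with a countdown latch: the subcycle ends as soon as every registered coordinator has reported, or when the deadline expires, whichever comes first. The names Subcycle, SubcycleBarrier, coordinatorDone and awaitEnd are hypothetical and the sketch runs in one process, whereas the framework's scheduler and coordinators are separate servers.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** The four subcycles of one simulation cycle, in order. */
enum Subcycle {
    SELECT, ACT, UPDATE, LISTEN;

    /** Next subcycle; after LISTEN the following cycle starts again with SELECT. */
    Subcycle next() {
        return values()[(ordinal() + 1) % values().length];
    }
}

/** Scheduler-side bookkeeping for one subcycle (hypothetical class). */
class SubcycleBarrier {
    private final CountDownLatch pending;

    SubcycleBarrier(int registeredCoordinators) {
        this.pending = new CountDownLatch(registeredCoordinators);
    }

    /** Called once by a coordinator when all of its agents have finished the subcycle. */
    void coordinatorDone() {
        pending.countDown();
    }

    /**
     * Waits for the subcycle to end. Returns true if every coordinator reported
     * before the hard deadline, false if the deadline cut the subcycle short.
     */
    boolean awaitEnd(long deadlineMillis) {
        try {
            return pending.await(deadlineMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

When all coordinators report early, awaitEnd returns immediately, so the subcycle lasts only as long as the slowest coordinator rather than the full deadline.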
4.3. Access Levels
Each node and each edge has some properties in social network simulation. For example id, name,
age, gender and education are node properties and id, weight, direction and trust-level may appear in
edge properties. The number of defined properties is not restricted by the framework, but in each
simulation instance, the user should specify the list of node/edge properties, the type of each property
(number, Boolean, date-time and string are supported now) and the access level of the property. The
supported access levels are:
Public. Public properties are accessible by all agents.
Friendly. Friendly properties are accessible by friends.
Friend-of-friendly. These properties are accessible by friends and friends-of-friends.
Omnisciently. These properties are not accessible by agents, and their changes are sent to no
agent, but they are stored in the central proactive database for the sake of future analysis or
presentations.
Private. Private properties are private to all agents and their changes are not propagated to the
database system. These properties are used only internally in agent processes.
The access level of a property is enforced by the “permission checker” component of the proactive
database. If a query violates the access level of a property, the query is rejected. For
example, if the access level of “education” is defined as “private”, then an agent cannot ask about the
education level of one of its friends.
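The read rule implied by these access levels can be written as a small check on the distance between the requester and the owner of the property. This is a sketch under assumptions: the names AccessLevel, PermissionChecker and canRead are hypothetical, distance 0 means the owner itself, 1 a friend and 2 a friend-of-friend, and it is assumed that an owner may read its own public, friendly and friend-of-friendly properties.

```java
/** The five access levels listed above. */
enum AccessLevel {
    PUBLIC, FRIENDLY, FRIEND_OF_FRIENDLY, OMNISCIENTLY, PRIVATE
}

/** Read-permission rule for a single property (hypothetical class). */
class PermissionChecker {

    /**
     * Decides whether a requester may read a property through the database.
     *
     * @param level    access level declared for the property
     * @param distance hops between requester and property owner
     *                 (0 = owner, 1 = friend, 2 = friend-of-friend, more = stranger)
     */
    static boolean canRead(AccessLevel level, int distance) {
        if (distance < 0) {
            throw new IllegalArgumentException("distance must not be negative");
        }
        switch (level) {
            case PUBLIC:
                return true;
            case FRIENDLY:
                return distance <= 1;
            case FRIEND_OF_FRIENDLY:
                return distance <= 2;
            default:
                // OMNISCIENTLY: stored for analysis only, never served to agents.
                // PRIVATE: never stored in the database at all.
                return false;
        }
    }
}
```

A query that touches a property for which canRead is false would be rejected as a whole.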
5. Discussion and Conclusion
In this paper we proposed a new approach to social network simulation. The model is scalable since
it is able to distribute the agents over different processing units. The agents access information about the
social network and change it using a central database system. The database is proactive: by
announcing network changes to the appropriate agents, it eliminates the need for all the agents to
reselect the world-state in each simulation cycle. A query language is proposed for this proactive
database, which is used by agents for interacting with the database. The simulation model is
partly implemented as a software framework, capable of handling social network
simulation problems of different kinds.
In which simulation scenarios could this framework be useful, and in which problems should we avoid it?
Here we briefly list some characteristics of problems for which the proposed
approach of simulation would be helpful.
Mathematical modeling is hard, impractical or impossible and computer simulation is needed.
The basic structure of the environment is a social network. The social network could be
dynamic and changing, but it should be the primary mechanism of connection and
communication among agents. For example if some mobile agents are spread in a
geographical space, without a defined set of neighbors, and the environment consists of many
entities like roads and obstacles, the proposed method of simulation is not recommended,
because the environment is more complicated than a social network.
A centralized simulation is not possible and we need the agents to be distributed in different
processing units, either because of the size of network or the complexity of agent processes.
There exist some security concerns about the accessibility of network or its properties for
some agents. In this case we recommend utilization of access levels for properties.
In some situations we have some concerns about the exposure of internal structure of agents
by some human users. This requirement could be handled by separating the execution of these
agents in an isolated coordinator server, accessible only by authorized people.
6. Future Works
First of all, we are going to simulate some real or well-known social network scenarios and processes
using our proposed model. We will focus on simulation problems with the properties specified in Sections 2
and 5. We also plan to improve the performance and efficiency of the implemented
framework. For example we want to use efficient community detection algorithms like [20] to divide
the social network into some clusters and to run each cluster on a separate coordinator. If the nodes of
an interconnected cluster are executed on the same processing unit, the need for information
propagation between the database server and the processing unit decreases.
When there is no security concern about the place of agent execution, it is sometimes useful to transfer
an agent from one coordinator to another. In addition to using community detection algorithms,
adaptive displacement of agents among coordinators seems to improve efficiency of the simulation.
New algorithms and models are required in database server for recommending the movement of
agents, and some processes should be supported for this movement by agents, coordinators and the
database server.
As another extension for the proposed framework, we are going to add message passing mechanisms
among the agents. Creating message types and message access levels would also be useful. As
mentioned in previous sections, we also think the utilization of NoSQL and graph databases is a good
choice in implementation of this framework. So we will soon replace the relational database with a
graph database.
References
[1] R. Axelrod, “Advancing the art of simulation in the social sciences,” Complexity, vol. 3, no. 2,
pp. 16-22, Nov. 1997.
[2] “Swarm Group.” [Online]. Available: http://www.swarm.org/index.php/Main_Page.
[Accessed: 14-Apr-2012].
[3] “MASON Multiagent Simulation Toolkit.” [Online]. Available:
http://cs.gmu.edu/~eclab/projects/mason/. [Accessed: 14-Apr-2012].
[4] “Repast Suite.” [Online]. Available: http://repast.sourceforge.net/. [Accessed: 14-Apr-2012].
[5] X. Li, W. Mao, and D. Zeng, “Agent-based social simulation and modeling in social
computing,” Intelligence and Security Informatics, pp. 401-412, 2008.
[6] C. M. Macal and M. J. North, “Tutorial on agent-based modelling and simulation,” Journal of
Simulation, vol. 4, no. 3, pp. 151-162, Sep. 2010.
[7] S. F. Railsback, S. L. Lytinen, and S. K. Jackson, “Agent-based Simulation Platforms: Review
and Development Recommendations,” Simulation, vol. 82, no. 9, pp. 609-623, Sep. 2006.
[8] P. Davidsson, “Agent Based Social Simulation : A Computer Science View,” Journal of
Artificial Societies and Social Simulation, vol. 5, no. 1, 2002.
[9] J. E. Laird, “Extending the Soar cognitive architecture,” in Proceeding of the 2008 Conference
on Artificial General Intelligence, 2008, pp. 224–235.
[10] J. Anderson, M. Matessa, and C. Lebiere, “ACT-R: A Theory of Higher Level Cognition and
Its Relation to Visual Attention,” Human-Computer Interaction, vol. 12, no. 4, pp. 439-462,
Dec. 1997.
[11] R. Sun, “Cognitive Architectures and Multi-agent Social Simulation,” Multi-Agent Systems for
Society, pp. 7–21, 2009.
[12] R. Sun, Cognition and multi-agent interaction: From cognitive modeling to social simulation.
Cambridge Univ Pr, 2006.
[13] “NetLogo Home Page.” [Online]. Available: http://ccl.northwestern.edu/netlogo/. [Accessed:
14-Apr-2012].
[14] F. Cicirelli, A. Furfaro, A. Giordano, and L. Nigro, “Parallel Simulation of Multi-agent
Systems Using Terracotta,” 2010 IEEE/ACM 14th International Symposium on Distributed
Simulation and Real Time Applications, pp. 219-222, Oct. 2010.
[15] “Terracotta.” [Online]. Available: http://terracotta.org/. [Accessed: 16-Apr-2012].
[16] R. Ronen and O. Shmueli, “SoQL: A Language for Querying and Creating Data in Social
Networks,” 2009 IEEE 25th International Conference on Data Engineering, pp. 1595-1602,
Mar. 2009.
[17] A. Dries, S. Nijssen, and L. De Raedt, “A query language for analyzing networks,” in
Proceeding of the 18th ACM conference on Information and knowledge management -
CIKM ’09, 2009, pp. 485-494.
[18] “Facebook Query Language (FQL) - Facebook Developers.” [Online]. Available:
developers.facebook.com/docs/reference/fql/. [Accessed: 14-Apr-2012].
[19] “neo4j: World’s Leading Graph Database.” [Online]. Available: http://neo4j.org/. [Accessed:
14-Apr-2012].
[20] Z. Masdarolomoor, R. Azmi, S. Aliakbary, and N. Riahi, “Finding Community Structure in
Complex Networks Using Parallel Approach,” 2011 IFIP 9th International Conference on
Embedded and Ubiquitous Computing, pp. 474-479, Oct. 2011.
Appendix. Grammar of Social Network Simulation Query Language
The syntax of the proposed query language is presented here. This BNF representation of the query
language is condensed, and some details are omitted to keep it concise.
<sns-query> ::= <select-query> | <update-query> | <listener-query>
<select-query> ::= <select_clause> <from_clause> [<alias>] <where_clause>
<select_clause> ::= SELECT <select_target>
<select_target> ::= ‘*’ | <property-name> {, <property-name>}*
<from_clause> ::= FROM <from_expression>
<from_expression> ::= friends | friendsoffriends | links
<where_clause> ::= WHERE <conditional_expression>
<update-query> ::= <update-node> | <update-edge> | <add-edge> | <remove-edge>
<update-node> ::= UPDATE NODE <update-query-expression>
<update-edge> ::= UPDATE EDGE <update-query-expression>
<update-query-expression> ::= <alias> SET <set_clause> <where_clause>
<add-edge> ::= ADD LINK <edge-query-expression>
<remove-edge> ::= REMOVE LINK <edge-query-expression>
<edge-query-expression> ::= <alias> <where_clause>
<listener-query> ::= ADD <listener-type> ON <listener_target> <from_clause>
<listener_target> ::= <select_target> | EXISTENCE
<listener-type> ::= node-listener | edge-listener
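As an illustration of how the listener-query production could be recognized, the following sketch parses it with a regular expression. It is not the framework's parser: the class ListenerQuery and its fields are hypothetical, keywords are assumed to be case-insensitive, and property names are only split on commas without further validation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Parsed form of an Add-Listener query (hypothetical class). */
class ListenerQuery {
    final String listenerType;  // "node-listener" or "edge-listener"
    final List<String> targets; // property names, or a single "*" or "EXISTENCE"
    final String space;         // "friends", "friendsoffriends" or "links"

    // ADD <listener-type> ON <listener_target> FROM <from_expression>
    private static final Pattern SHAPE = Pattern.compile(
        "^\\s*add\\s+(node-listener|edge-listener)\\s+on\\s+(.+?)"
            + "\\s+from\\s+(friends|friendsoffriends|links)\\s*$",
        Pattern.CASE_INSENSITIVE);

    private ListenerQuery(String listenerType, List<String> targets, String space) {
        this.listenerType = listenerType;
        this.targets = targets;
        this.space = space;
    }

    /** Parses one listener query or throws IllegalArgumentException if it does not match. */
    static ListenerQuery parse(String text) {
        Matcher m = SHAPE.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a listener query: " + text);
        }
        List<String> targets = new ArrayList<>();
        for (String part : m.group(2).split(",")) {
            String name = part.trim();
            if (name.isEmpty()) {
                throw new IllegalArgumentException("empty target in: " + text);
            }
            targets.add(name);
        }
        return new ListenerQuery(m.group(1).toLowerCase(), targets, m.group(3).toLowerCase());
    }
}
```

For example, parsing "Add node-listener on name,education from friends" yields the listener type node-listener, the targets name and education, and the space friends.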