Procedia Computer Science Procedia Computer Science 00 (2011) 000–000
www.elsevier.com/locate/procedia
3rd International Conference on Ambient Systems, Networks and Technologies
Resource Allocation for Multi-user Cognitive Radio Systems
using Multi-agent Q-Learning
Ahmed AZZOUNAa, Amel GUEZMIL
a, Anis SAKLY
b, Abdellatif MTIBAA
a,b
aCSR GROUP of Electronic and Microelectronic Laboratory, University of Monastir, Tunisia bNational Engineering School of Monastir, University of Monastir, Tunisia
Abstract
Cognitive Radio (CR) is a new generation of wireless communication system that enables unlicensed users to exploit
underutilized licensed spectrum to optimize the radio spectrum utilization. The resource allocation is difficult to
achieve in a dynamic distributed environment, in which CR users take decisions to select a channel without
negotiation, and react to the environmental changes. This paper focuses on using a multi-agent reinforcement-
learning (MARL), Q-learning algorithm, on channels selection decision by secondary users in 2×2 and 3×3 cognitive
radio system. Numerical results, obtained with MATLAB, demonstrate that resource allocation is realized without
any negotiation between secondary and primary users. In this work, the analogy between the numerical and simulated
results is also noted.
Keywords: Cognitive Radio; Resource Allocation; Multi-agent Reinforcement Learning; Q-Learning Algorithm
1. Introduction
To address the scarcity of spectrum resources and the wireless networks deploying difficulties, several
innovative approaches have been proposed allowing a dynamic and opportunistic use of unused frequency
spectrum [1-5], in addition to the current approach of static spectrum allocation.
Cognitive radio (CR) is a concept that responds this challenge. Radio technology is no longer enslaved
to a single type of network and a limited number of frequency bands. Based on software radio methods
and computing capabilities and analysis, cognitive radio observes, analyzes and adapts to its environment
in a very dynamic way. It’s particularly able to choose the frequency bands and radio networks available
depending on the desired quality of service (QoS) and rules imposed by the regulator. But beyond the
concept developed by Mitola in a founding document [6-7], much work remains to be done: transmission
innovative methods, field measurements, network architecture, heterogeneous networks management,
detection algorithms, dynamic frequency allocation, networks performance optimization...
Management methods of the frequency spectrum should know considerable changes in the coming
years [8]. They are evolving essentially under the influence of two major trends. First, many scientists
suppose that currently the spectrum is not enough efficiently used. On the other hand, radio technology
2 Author name / Procedia Computer Science 00 (2012) 000–000
becoming more and more flexible, allowing the same device to access easily to wide ranges of spectrum
and different types of networks (WiFi, WiMAX, 2G, 3G, 3G+ etc.).
Since its founding by Mitola, Cognitive Radio has been the target of several studies in order to find
new techniques for spectrum detection [9-10], collaborative spectrum sensing [11-12], spectral efficiency
optimization [13-14], radio resource management [15], resource allocation [16-18], and even some
contributions to the hardware implementation of some algorithms [19-20].
In contrast to existing systems where the spectrum allocation is static, the cognitive radio terminals can
dynamically find the network access frequency by detecting free spectrum bands. The resource allocation
is still a major problem addressed by researchers from different angles using several techniques for
example dynamic spectrum access [1], learning algorithms [21] such as Q-learning [22-23] and
reinforcement-learning [24-25], neurons networks [26-27], inference engine and fuzzy techniques [28]
and even meta-heuristic algorithms such as genetic algorithms [29-31] and biological algorithms [32].
This paper focuses on using a multi-agent reinforcement-learning (MARL), Q-learning algorithm, on
channels selection decision by secondary users in cognitive radio system with two channels, two SU
(2×2) and three canals, three SU (3×3). Multi-agent learning is a promising direction of recent research in
the context of intelligent systems. The single-agent case (SARL) has been studied extensively over the
past two decades; however, the multi-agent case has been little studied due to its complexity. When
multiple autonomous agents learn and operate simultaneously, the environment is strictly unpredictable
and all assumptions that are made in the single-agent case, such as stationary and Markov property, are
often inapplicable in the multi-agent context.
2. System Model
In multi-user and multi-channel cognitive radio systems, the resource change is very dynamic.
Therefore, without negotiation between different SUs, serious problems would occur in data transmission
because of the collusion the data packet may be lost, see Fig 1.
Fig. 1. Competition in multi-user and multi-channel cognitive radio systems
This paper treats the N×N case systems with N active secondary users (denoted by letters A,B,C,... )
and N frequency channels (denoted by numbers , , ,...1 2 3 ). The channel selection problem becomes a
game N×N [33]: the secondary user i , i A,B,C,... , receives a reward ijR 0
when transmitting
through the free channel j , j , , ,... 1 2 3 , as long as the other SUs do not use this channel and ijR 0
if one or more SUs attempt to transmit through this channel.
In this game, there is no communication between SUs and the rewards received are unknown to all
users.
Channel 1
SU A
SU B
SU C
SU D
Channel 2
Channel 3
Channel 4
Author name / Procedia Computer Science 00 (2012) 000–000 3
3. Q-learning
In cognitive radio systems, each cognitive user is considered as an agent and the wireless network as
the external environment. Cognitive radio can be formulated as a system in which communicating agents
sense their environment, learn, and adjust their transmission parameters in order to optimize their
performance. This formulation represents the reinforcement learning context, see Fig 2.
Fig. 2. Multi-agent reinforcement learning based cognitive radio
Q-learning is an online reinforcement learning algorithm that determines optimal policy without
detailed modeling of the system environment [21]. Denote decision epochs by t , ,... 1 2 , a constant
epoch duration by Dt , actions by
ta A , and delayed rewards by t tr a1 . Each agent i maintains a Q-
table with A entries to keep track of learnt action value or Q-value, tQ a within an interval of max,Q0
for all its possible actions. The Q-value estimates the level of local reward for an action a ; hence changes
in the Q-value will lead to changes in an agent's action. At each decision epoch t , agent i chooses an
action ta and receives a local reward t tr a1
at time t 1 . The agent i updates the Q-value of action
ta at time t 1
as follows:
i i i i i i
t t t t t tQ a α .Q a α.r a 1 11 (1)
where α , 0 1 is the learning factor.
To assure that all actions will be considered, Boltzmann distribution is used for random exploration,
i.e.
user choose channel ij
ij ij
Q / γ
Q / γQ / γ
eP i j
e e
(2)
where γ is called Boltzmann temperature, which controls the exploration frequency and j
the other
users different from user j.
Convergence using geometric argument proposed in [34], provide an intuitive explanation that the
updating rule of Q-Value, given in (1), will converge to a stationary equilibrium point. Fig 3 represents
Spectrum Decision
Spectrum Mobility
Sp
ectr
um
Sh
arin
g S
pectr
um
Sen
sing
Radio
Environment
Cognitive Users
4 Author name / Procedia Computer Science 00 (2012) 000–000
the dynamics in the Q-learning algorithm for the 2×2 case (B B Bu Q / Q 1 2 versus
A A Au Q / Q 1 2 ). As
can be seen, the plan is divided into four zones limited by Au 1
and
Bu 1 . In the first and third
regions (I and III), each secondary user prefers selecting a different channel. Meanwhile, in the second
and forth regions (II and IV), both secondary users prefer selecting the same channel.
Fig. 3. Dynamics in the Q-learning algorithm
4. Numerical Results
This section demonstrates the analogy between the numerical and simulated results obtained with
MATLAB.
4.1. 2×2 case:
The demonstration of Q-learning algorithm convergence is given by Fig 4 and 5. This figures show the
dynamics of B B Bu Q / Q 1 2 versus
A A Au Q / Q 1 2 for different trajectories. For an initial unstable point,
where the two SUs collide and choose the same channel (zone I or III in Fig 3), the curves converge to a
stable zone (II or IV in Fig 3) which means that SU A selected the channel 1 whilst the SU B selected
channel 2 (or vice versa) and the collusion problem is diverted. This result is confirmed by Fig 6 and 7
which represent the SUs probabilities to choose the first channel. For an initial case, where the two SUs
choose the first channel (A BP P 1 1 1 ), the Q-learning algorithm has avoided the collision by keeping
the second SU choice and switching the first SU to the channel 2 in Fig 6 (and vice versa for Fig 7).
The learning procedure, obtained from 1000 spectrum access periods, is considered as completed when
the probabilities of choosing a channel are larger than 95% for one SU and smaller than 5% for the other.
Fig 8 and 9 represent the learning speed delay for different learning factors α0 and different
Boltzmann temperatures γ , respectively. This Figures show that the learning procedure is faster with a
higher learning factor and a smaller temperature.
4.2. 3×3 case:
The developed algorithm is still valid for the case of 3 channels and 3 secondary users. Fig 10 shows
the evolution of channel’s probability choice for each user. As can be seen, the collision is avoided in a
situation where all users initially choose channel 1.
I II
IV III
1
1
0 QB1/QB2
QA1/QA2
Author name / Procedia Computer Science 00 (2012) 000–000 5
To analyze the impact of channels and SUs number for fixed α and γ , the learning speed for both
cases (2×2 and 3×3) is given in Fig 11. As can be seen, the learning procedure is becoming slower by
increasing N. Nevertheless, this minor difference remains insignificant.
Fig. 4. An example of dynamics of the Q-learning algorithm Fig. 5. An example of dynamics of the Q-learning algorithm
Fig. 6. An example of the evolution of channel selection
probability (N=2)
Fig. 7. An example of the evolution of channel selection
probability (N=2)
Fig. 8. CDF of learning delay with different learning factor α0 Fig. 9. CDF of learning delay with different temperature γ
6 Author name / Procedia Computer Science 00 (2012) 000–000
Fig. 10. An example of the evolution of channel selection probability (N=3)
Fig. 11. CDF of learning delay for N=2 and N=3
Author name / Procedia Computer Science 00 (2012) 000–000 7
5. Conclusion
This paper used a multi-agent reinforcement-learning, Q-learning algorithm, on channels selection
decision by secondary users in cognitive radio system. For simplicity, 2×2 and 3×3 systems was studied.
The learning procedure for spectrum access without negotiation in cognitive radio systems is discussed.
During the learning, each SU considered the channel and other secondary users as its environment,
updated its Q-values, and taked the best action. The learning speed delay was compared for different
learning factors α0 and different Boltzmann temperatures γ . Numerical results showed that SU could
avoid collusion in both cases 2×2 and 3×3.
References
[1] Li, C.; Li, C.; "Dynamic Channel Selection Algorithm for Cognitive Radios", 4th IEEE Conference on Circuits and Systems for Communications (CSC’08), Shanghai, China, May 26-28 2008.
[2] Unnikrishnan, J.; Veeravalli, V.V.; "Algorithms for Dynamic Spectrum Access With Learning for Cognitive Radio", IEEE
Transactions on Signal Processing, vol. 58, n°2, pp.750-760, 2010. [3] Bogere, P.; Okello, D.; "Dynamic spectrum allocation in multiuser wireless networks", IST-Africa Conference Proceedings,
Gaborone, Botswana, May 11-13, 2011.
[4] Alshamrani, A.; Liang-Liang Xie; Xuemin Shen; "Adaptive Admission-Control and Channel-Allocation Policy in Cooperative Ad Hoc Opportunistic Spectrum Networks", IEEE Transactions on Vehicular Technology, vol. 59, n°1, pp. 1618-1629,
May 2010.
[5] Mangold, S.; Zhun Zhong; Challapali, K.; Chun-Ting Chou; "Spectrum Agile Radio: Radio Resource Measurements For Opportunistic Spectrum Usage", IEEE Global Telecommunications Conference (GLOBECOM’04), Dallas, USA, November 29-
December 3, 2004.
[6] Joseph Mitola III; "Cognitive Radio: an Integrated Agent Architecture for Software Defined Radio" Ph.D. dissertation, Computer Communication System Laboratory, Department of Teleinformatics, Royal Institute of Technology (KTH), Stockholm,
Sweden, May 2000.
[7] Joseph Mitola III; "Software Radio Architecture: Object-Oriented Approaches to Wireless Systems Engineering" John Wiley & Sons, New York, USA, 2000.
[8] Qijun Song; Yanbing Su; Wei Zhao; Du Wei; Tian Tian; "Methods on Further Improving the Spectrum Management", 4th
International Symposium on Electromagnetic Compatibility (EMC’07), Qingdao, China, October 23-26, 2007. [9] Yi-bing Li; Hui Huang; Fang Ye; "Spectrum Detection Model for Cognitive Radio Networks", 5th IEEE International
Conference on Bio-Inspired Computing: Theories and Applications (BIC-TA’10), Changsha, China, September 23-26, 2010.
[10] Zengyou Sun; Qianchun Wang; Chenghua Che; "Study of Cognitive Radio Spectrum Detection in OFDM System", Asia-Pacific Conference on Wearable Computing Systems (APWCS’10), Shenzhen, China, April 17-18, 2010.
[11] Hong Li; Ma Junfei; Xu Fangmin; Li ShuRong; Zhou Zheng; "Optimization of Collaborative Spectrum Sensing for
Cognitive Radio", IEEE International Conference on Networking, Sensing and Control (ICNSC’08), Sanya, China, April 6-8, 2008. [12] Arshad, K.; Moessner, K.; "Collaborative Spectrum Sensing for Cognitive Radio", IEEE International Conference on
Communications Workshops (ICC’09), Dresden, Germany, June 14-18, 2009.
[13] Taki, M.; Lahouti, F.; "Spectral Efficiency Optimized Adaptive Transmission for Interfering Cognitive Radios", IEEE International Conference on Communications Workshops (ICC’09), Dresden, Germany, June 14-18, 2009.
[14] Liaoyuan Zeng; McGrath, S.; "Spectrum Efficiency Optimization in Multiuser Ultra Wideband Cognitive Radio Networks",
7th International Symposium on Wireless Communication Systems (ISWCS’10), York, UK, September 19-22, 2010. [15] Waheed, M.; Anni Cai; "Evolutionary algorithms for radio resource management in cognitive radio network", 28th IEEE
International Performance Computing and Communications Conference (IPCCC’09), Phoenix, Arizona, USA, December 14-16,
2009. [16] Akter, L.; Natarajan, B.; "Modeling Fairness in Resource Allocation for Secondary Users in a Competitive Cognitive Radio
Network", 9th Wireless Telecommunications Symposium, Tampa, Florida, USA, April 21-23, 2010.
[17] Wang, P.; Matyjas, J.; Medley, M.; "Joint Spectrum Allocation and Scheduling in Multi-Radio Multi-Channel Cognitive Radio Wireless Networks", 33rd IEEE Sarnoff Symposium, New Jersey, USA, April 12-14, 2010.
[18] Yang, K.; Wang, X.; "Optimal Radio Allocation for Multi-radio Cognitive Wireless Networks", Global
Telecommunications Conference, San Fransisco, California, USA, November 27-December 1, 2006. [19] Hayar, A.M.; Pacalet, R.; Knopp, R.; "Cognitive radio Research and Implementation Challenges", 41st Asilomar
Conference on Signals, Systems and Computers, Pacific Grove, California, USA, November 4-7, 2007.
8 Author name / Procedia Computer Science 00 (2012) 000–000
[20] Ren, Y.; Dmochowski, P.; Komisarczuk, P.; "Analysis and Implementation of Reinforcement Learning on a GNU Radio
Cognitive Radio Platform", 5th International Conference on Cognitive Radio Oriented Wireless Networks & Communications,
Cannes, France, June 09-11 2010. [21] Di Felice, M.; Chowdhury, K.R.; Wu, C.; Bononi, L.; Meleis, W.; "Learning-Based Spectrum Selection in Cognitive Radio
Ad Hoc Networks", 8th International Conference on Wired/Wireless Internet Communications (WWIC), Lulea, Sweden, June 1-3,
2010. [22] Li, H.; "Multi-agent Q-Learning of Channel Selection in Multi-user Cognitive Radio Systems: A Two by Two Case", IEEE
Conference on Systems, Man, and Cybernetics, San Antonio, Texas, USA, October 11-14, 2009.
[23] Li, M.; Xu, Y.; Hu, J.; "A Q-Learning Based Sensing Task Selection Scheme for Cognitive Radio Networks", International Conference on Wireless Communications & Signal Processing, Nanjing, China, November 13-15, 2009.
[24] Wu, C.; Chowdhury, K.R.; Di Felice, M.; Meleis, W.; "Spectrum Management of Cognitive Radio Using Multi-agent
Reinforcement Learning", 9th International Conference on Autonomous Agents and Multiagent Systems: Industry track, Montréal, Canada, May 10-14, 2010.
[25] Berthold, U.; Fangwen Fu; van der Schaar, M.; Jondral, F.K.; "Detection of Spectral Resources in Cognitive Radios Using Reinforcement Learning", 3rd IEEE New Frontiers in Dynamic Spectrum Access Networks, Chicago, USA, October 14-17, 2008.
[26] Zhang, Z.; Xie, X.; "Intelligent Cognitive Radio: Research on Learning and Evaluation of CR Based on Neural Network",
5th Information and Communications Technology, Cairo, Egypt, December 16-18, 2007. [27] Baldo, N.; Zorzi, M.; "Learning and Adaptation in Cognitive Radios using Neural Networks", 5th IEEE Consumer
Communications and Networking Conference, Las Vegas, Nevada, USA, January 10-12, 2008.
[28] Huang, Y., Wang, J.; Jiang, H.; "Modeling of Learning Inference and Decision-Making Engine in Cognitive Radio", 2nd International Conference on Networks Security, Wireless Communications and Trusted Computing, Wuhan, China, April 24-25
2010.
[29] Hauris, J. F.; He, Donya; Michel, Geremy; Ozbay, Cahit; "Cognitive Radio and RF Communications Design Optimization using Genetic Algorithms", Military Communications Conference, Orlando, Florida, USA, October 29-31, 2007.
[30] Zhang, Z.; Xie, X.; "Application Research of Evolution in Cognitive Radio Based on GA", 3rd IEEE Conference on
Industrial Electronics and Applications, Singapore, Singapore, June 03-05, 2008. [31] Herry, S.; Le Martret, C.J.; "Parameter Determination of Secondary User Cognitive Radio Network Using Genetic
Algorithm", IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, Victoria, B.C., Canada, August
23-26, 2009. [32] Xu Mao; Hong Ji; "Biologically-Inspired Distributed Spectrum Access for Cognitive Radio Network", 6th International
Conference on Wireless Communications Networking and Mobile Computing (WiCOM’10), Chengdu, China, September 23-25,
2010. [33] Fudenberg, D.; Levine, D.K.; "The Theory of Learning in Games", The MIT Press, Cambridge, Mass, USA, 1998.
[34] Metrick, A.; Polak, B.; "Fictitious Play in 2×2 Games: a Geometric Proof of Convergence", Economic Theory, vol. 4, n°6,
pp. 923–933, 1994.
Top Related