
Procedia Computer Science 00 (2011) 000–000

www.elsevier.com/locate/procedia

3rd International Conference on Ambient Systems, Networks and Technologies

Resource Allocation for Multi-user Cognitive Radio Systems using Multi-agent Q-Learning

Ahmed AZZOUNA (a), Amel GUEZMIL (a), Anis SAKLY (b), Abdellatif MTIBAA (a,b)

(a) CSR GROUP of Electronic and Microelectronic Laboratory, University of Monastir, Tunisia
(b) National Engineering School of Monastir, University of Monastir, Tunisia

Abstract

Cognitive Radio (CR) is a new generation of wireless communication systems that enables unlicensed users to exploit underutilized licensed spectrum in order to optimize radio spectrum utilization. Resource allocation is difficult to achieve in a dynamic, distributed environment in which CR users select channels without negotiation and must react to environmental changes. This paper focuses on applying a multi-agent reinforcement learning (MARL) technique, the Q-learning algorithm, to the channel selection decisions of secondary users in 2×2 and 3×3 cognitive radio systems. Numerical results, obtained with MATLAB, demonstrate that resource allocation is achieved without any negotiation between secondary and primary users. The agreement between the analytical and the simulated results is also noted.

Keywords: Cognitive Radio; Resource Allocation; Multi-agent Reinforcement Learning; Q-Learning Algorithm

1. Introduction

To address the scarcity of spectrum resources and the difficulties of deploying wireless networks, several innovative approaches have been proposed that allow dynamic and opportunistic use of unused frequency spectrum [1-5], in addition to the current approach of static spectrum allocation.

Cognitive radio (CR) is a concept that responds to this challenge. Radio technology is no longer tied to a single type of network and a limited number of frequency bands. Building on software-defined radio methods and on computing and analysis capabilities, cognitive radio observes, analyzes, and adapts to its environment in a very dynamic way. In particular, it is able to choose among the available frequency bands and radio networks depending on the desired quality of service (QoS) and the rules imposed by the regulator. But beyond the concept developed by Mitola in a founding document [6-7], much work remains to be done: innovative transmission methods, field measurements, network architecture, heterogeneous network management, detection algorithms, dynamic frequency allocation, network performance optimization, and more.

Spectrum management methods are expected to undergo considerable changes in the coming years [8]. They are evolving essentially under the influence of two major trends. First, many scientists consider that the spectrum is currently not used efficiently enough. Second, radio technology is


becoming more and more flexible, allowing the same device to easily access wide ranges of spectrum and different types of networks (WiFi, WiMAX, 2G, 3G, 3G+, etc.).

Since its introduction by Mitola, cognitive radio has been the subject of several studies aimed at finding new techniques for spectrum detection [9-10], collaborative spectrum sensing [11-12], spectral efficiency optimization [13-14], radio resource management [15], and resource allocation [16-18], as well as some contributions to the hardware implementation of certain algorithms [19-20].

In contrast to existing systems where spectrum allocation is static, cognitive radio terminals can dynamically find the network access frequency by detecting free spectrum bands. Resource allocation is still a major problem, addressed by researchers from different angles using several techniques: for example, dynamic spectrum access [1], learning algorithms [21] such as Q-learning [22-23] and reinforcement learning [24-25], neural networks [26-27], inference engines and fuzzy techniques [28], and even meta-heuristic algorithms such as genetic algorithms [29-31] and biologically inspired algorithms [32].

This paper focuses on applying a multi-agent reinforcement learning (MARL) technique, the Q-learning algorithm, to the channel selection decisions of secondary users (SUs) in cognitive radio systems with two channels and two SUs (2×2) and with three channels and three SUs (3×3). Multi-agent learning is a promising direction of recent research in the context of intelligent systems. The single-agent case (SARL) has been studied extensively over the past two decades; the multi-agent case, however, has received little attention because of its complexity. When multiple autonomous agents learn and operate simultaneously, the environment is strictly unpredictable, and the assumptions made in the single-agent case, such as stationarity and the Markov property, are often inapplicable in the multi-agent context.

2. System Model

In multi-user, multi-channel cognitive radio systems, resource availability changes very dynamically. Without negotiation between the different SUs, serious problems would therefore occur in data transmission: because of collisions, data packets may be lost, see Fig. 1.

Fig. 1. Competition in multi-user and multi-channel cognitive radio systems

This paper treats N×N systems with N active secondary users (denoted by letters A, B, C, ...) and N frequency channels (denoted by numbers 1, 2, 3, ...). The channel selection problem becomes an N×N game [33]: secondary user $i$, $i \in \{A, B, C, \ldots\}$, receives a reward $R_{ij} > 0$ when transmitting through the free channel $j$, $j \in \{1, 2, 3, \ldots\}$, as long as no other SU uses this channel, and $R_{ij} \le 0$ if one or more SUs attempt to transmit through the same channel.

In this game, there is no communication between the SUs, and the rewards received by each SU are unknown to the other users.
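To make the game concrete, the following minimal Python sketch encodes this reward rule. The user and channel sets and the numeric reward values (+1 and -1) are illustrative assumptions, not values taken from the paper.

```python
import random

SUS = ["A", "B", "C"]   # secondary users (illustrative)
CHANNELS = [1, 2, 3]    # frequency channels (illustrative)

def reward(choices: dict, user: str) -> float:
    """R_ij > 0 if `user` is alone on its chosen channel,
    R_ij <= 0 if at least one other SU picked the same channel."""
    my_channel = choices[user]
    collided = any(u != user and ch == my_channel for u, ch in choices.items())
    return -1.0 if collided else +1.0

# One round: every SU picks a channel independently, with no negotiation
# and no knowledge of the rewards received by the others.
choices = {u: random.choice(CHANNELS) for u in SUS}
print(choices, {u: reward(choices, u) for u in SUS})
```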


3. Q-learning

In cognitive radio systems, each cognitive user is considered an agent, and the wireless network is the external environment. Cognitive radio can thus be formulated as a system in which communicating agents sense their environment, learn, and adjust their transmission parameters in order to optimize their performance. This formulation corresponds to the reinforcement learning setting, see Fig. 2.

Fig. 2. Multi-agent reinforcement learning based cognitive radio (cognitive users interact with the radio environment through spectrum sensing, spectrum decision, spectrum sharing, and spectrum mobility)

Q-learning is an online reinforcement learning algorithm that determines an optimal policy without detailed modeling of the system environment [21]. Denote decision epochs by $t = 1, 2, \ldots$, the constant epoch duration by $D_t$, actions by $a_t \in A$, and delayed rewards by $r_{t+1}(a_t)$. Each agent $i$ maintains a Q-table with $|A|$ entries to keep track of the learnt action values, or Q-values, $Q_t(a)$, within an interval $[0, Q_{max}]$ for all its possible actions. The Q-value estimates the level of local reward for an action $a$; hence, changes in the Q-value will lead to changes in an agent's actions. At each decision epoch $t$, agent $i$ chooses an action $a_t^i$ and receives a local reward $r_{t+1}^i(a_t^i)$ at time $t+1$. Agent $i$ then updates the Q-value of action $a_t^i$ at time $t+1$ as follows:

$$Q_{t+1}^i(a_t^i) = (1 - \alpha) \, Q_t^i(a_t^i) + \alpha \, r_{t+1}^i(a_t^i) \qquad (1)$$

where $\alpha \in (0, 1)$ is the learning factor.
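Since Eq. (1) is simply a convex combination of the old Q-value and the newly observed reward, it reduces to a one-line update. The sketch below (the function name and numbers are ours, for illustration) shows one step:

```python
def q_update(q: float, r: float, alpha: float) -> float:
    """One step of Eq. (1): Q_{t+1}(a) = (1 - alpha) * Q_t(a) + alpha * r_{t+1}(a)."""
    return (1.0 - alpha) * q + alpha * r

# Worked example: with alpha = 0.2, Q = 0.5 and reward r = 1.0,
# the new estimate is 0.8 * 0.5 + 0.2 * 1.0 = 0.6.
print(q_update(0.5, 1.0, 0.2))  # 0.6
```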

To ensure that all actions are explored, the Boltzmann distribution is used for random exploration, i.e., user $i$ chooses channel $j$ with probability

$$P(\text{user } i \text{ chooses channel } j) = \frac{e^{Q_{ij}/\gamma}}{e^{Q_{ij}/\gamma} + \sum_{\bar{j} \neq j} e^{Q_{i\bar{j}}/\gamma}} \qquad (2)$$

where $\gamma$ is called the Boltzmann temperature, which controls the exploration frequency, and $\bar{j}$ denotes the channels different from channel $j$.
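A minimal Python sketch of the exploration rule in Eq. (2) follows; the Q-values and temperatures used here are illustrative assumptions.

```python
import math
import random

def boltzmann_choice(q_values: list, gamma: float) -> int:
    """Sample a channel index j with probability proportional to
    exp(Q_ij / gamma), as in Eq. (2)."""
    weights = [math.exp(q / gamma) for q in q_values]
    return random.choices(range(len(q_values)), weights=weights)[0]

# A small gamma concentrates the probability mass on the best channel
# (exploitation); a large gamma makes the choice nearly uniform (exploration).
print(boltzmann_choice([0.9, 0.1], gamma=0.1))   # almost always channel 0
print(boltzmann_choice([0.9, 0.1], gamma=10.0))  # close to a fair coin
```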

A geometric convergence argument proposed in [34] provides an intuitive explanation of why the Q-value updating rule given in (1) converges to a stationary equilibrium point. Fig. 3 represents


the dynamics of the Q-learning algorithm for the 2×2 case ($u_B = Q_{B1}/Q_{B2}$ versus $u_A = Q_{A1}/Q_{A2}$). As can be seen, the plane is divided into four zones delimited by the lines $u_A = 1$ and $u_B = 1$. In the first and third regions (I and III), each secondary user prefers a different channel; meanwhile, in the second and fourth regions (II and IV), both secondary users prefer the same channel.

Fig. 3. Dynamics in the Q-learning algorithm

4. Numerical Results

This section demonstrates the agreement between the analytical and the simulated results obtained with MATLAB.

4.1. 2×2 case:

The convergence of the Q-learning algorithm is demonstrated by Figs. 4 and 5. These figures show the dynamics of $u_B = Q_{B1}/Q_{B2}$ versus $u_A = Q_{A1}/Q_{A2}$ for different trajectories. From an initial unstable point where the two SUs collide by choosing the same channel (zone II or IV in Fig. 3), the curves converge to a stable zone (I or III in Fig. 3), which means that SU A selects channel 1 while SU B selects channel 2 (or vice versa), and the collision problem is averted. This result is confirmed by Figs. 6 and 7, which represent the SUs' probabilities of choosing the first channel. For an initial case where the two SUs both choose the first channel ($P_{A1} = P_{B1} = 1$), the Q-learning algorithm avoids the collision by keeping the second SU's choice and switching the first SU to channel 2 in Fig. 6 (and vice versa in Fig. 7).

The learning procedure, run over 1000 spectrum access periods, is considered complete when the probability of choosing a given channel is larger than 95% for one SU and smaller than 5% for the other. Figs. 8 and 9 represent the learning delay for different learning factors $\alpha$ and different Boltzmann temperatures $\gamma$, respectively. These figures show that the learning procedure is faster with a higher learning factor and a smaller temperature.
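The paper's experiments were run in MATLAB; as a rough, self-contained illustration of the 2×2 procedure described above, the following Python sketch combines Eqs. (1) and (2) and measures the learning delay with the 95%/5% criterion. All parameter values, the zero reward on collision (the model allows $R_{ij} \le 0$), and the equal Q-value initialization are assumptions for illustration, not the paper's settings.

```python
import math
import random

ALPHA, GAMMA = 0.2, 0.05   # learning factor and Boltzmann temperature (illustrative)
N_EPOCHS = 1000            # spectrum access periods, as in the text

def boltzmann(qs, gamma):
    """Channel choice per Eq. (2)."""
    weights = [math.exp(q / gamma) for q in qs]
    return random.choices(range(len(qs)), weights=weights)[0]

def prob_channel_1(qs, gamma):
    """Probability of selecting channel 1 under Eq. (2)."""
    weights = [math.exp(q / gamma) for q in qs]
    return weights[0] / sum(weights)

# Q-tables of the two SUs, one entry per channel; equal initial values mean
# each SU starts by picking either channel with probability 1/2.
Q = {"A": [0.5, 0.5], "B": [0.5, 0.5]}

delay = None
for t in range(1, N_EPOCHS + 1):
    # Each SU picks a channel independently, without negotiation.
    choice = {u: boltzmann(Q[u], GAMMA) for u in Q}
    for u in Q:
        r = 0.0 if choice["A"] == choice["B"] else 1.0   # collision -> no reward
        a = choice[u]
        Q[u][a] = (1 - ALPHA) * Q[u][a] + ALPHA * r      # Eq. (1)
    # Completion criterion: P(channel 1) > 95% for one SU and < 5% for the other.
    pA = prob_channel_1(Q["A"], GAMMA)
    pB = prob_channel_1(Q["B"], GAMMA)
    if (pA > 0.95 and pB < 0.05) or (pB > 0.95 and pA < 0.05):
        delay = t
        break

print("learning delay (epochs):", delay)
```

Repeating this loop many times and plotting the empirical CDF of `delay` for several (ALPHA, GAMMA) pairs reproduces the kind of comparison shown in Figs. 8 and 9.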

4.2. 3×3 case:

The developed algorithm remains valid for the case of 3 channels and 3 secondary users. Fig. 10 shows the evolution of each user's channel-choice probability. As can be seen, the collision is avoided in a situation where all users initially choose channel 1.


To analyze the impact of the number of channels and SUs for fixed $\alpha$ and $\gamma$, the learning speed for both cases (2×2 and 3×3) is given in Fig. 11. As can be seen, the learning procedure becomes slower as N increases; nevertheless, this difference remains minor.

Fig. 4. An example of the dynamics of the Q-learning algorithm

Fig. 5. An example of the dynamics of the Q-learning algorithm

Fig. 6. An example of the evolution of channel selection probability (N=2)

Fig. 7. An example of the evolution of channel selection probability (N=2)

Fig. 8. CDF of learning delay with different learning factors α

Fig. 9. CDF of learning delay with different temperatures γ


Fig. 10. An example of the evolution of channel selection probability (N=3)

Fig. 11. CDF of learning delay for N=2 and N=3


5. Conclusion

This paper applied a multi-agent reinforcement learning technique, the Q-learning algorithm, to the channel selection decisions of secondary users in cognitive radio systems. For simplicity, 2×2 and 3×3 systems were studied. The learning procedure for spectrum access without negotiation in cognitive radio systems was discussed. During learning, each SU considered the channels and the other secondary users as its environment, updated its Q-values, and took the best action. The learning delay was compared for different learning factors $\alpha$ and different Boltzmann temperatures $\gamma$. Numerical results showed that the SUs could avoid collisions in both the 2×2 and 3×3 cases.

References

[1] Li, C.; Li, C.; "Dynamic Channel Selection Algorithm for Cognitive Radios", 4th IEEE Conference on Circuits and Systems for Communications (CSC'08), Shanghai, China, May 26-28, 2008.

[2] Unnikrishnan, J.; Veeravalli, V.V.; "Algorithms for Dynamic Spectrum Access With Learning for Cognitive Radio", IEEE Transactions on Signal Processing, vol. 58, n°2, pp. 750-760, 2010.

[3] Bogere, P.; Okello, D.; "Dynamic spectrum allocation in multiuser wireless networks", IST-Africa Conference Proceedings, Gaborone, Botswana, May 11-13, 2011.

[4] Alshamrani, A.; Liang-Liang Xie; Xuemin Shen; "Adaptive Admission-Control and Channel-Allocation Policy in Cooperative Ad Hoc Opportunistic Spectrum Networks", IEEE Transactions on Vehicular Technology, vol. 59, n°1, pp. 1618-1629, May 2010.

[5] Mangold, S.; Zhun Zhong; Challapali, K.; Chun-Ting Chou; "Spectrum Agile Radio: Radio Resource Measurements For Opportunistic Spectrum Usage", IEEE Global Telecommunications Conference (GLOBECOM'04), Dallas, USA, November 29-December 3, 2004.

[6] Joseph Mitola III; "Cognitive Radio: an Integrated Agent Architecture for Software Defined Radio", Ph.D. dissertation, Computer Communication System Laboratory, Department of Teleinformatics, Royal Institute of Technology (KTH), Stockholm, Sweden, May 2000.

[7] Joseph Mitola III; "Software Radio Architecture: Object-Oriented Approaches to Wireless Systems Engineering", John Wiley & Sons, New York, USA, 2000.

[8] Qijun Song; Yanbing Su; Wei Zhao; Du Wei; Tian Tian; "Methods on Further Improving the Spectrum Management", 4th International Symposium on Electromagnetic Compatibility (EMC'07), Qingdao, China, October 23-26, 2007.

[9] Yi-bing Li; Hui Huang; Fang Ye; "Spectrum Detection Model for Cognitive Radio Networks", 5th IEEE International Conference on Bio-Inspired Computing: Theories and Applications (BIC-TA'10), Changsha, China, September 23-26, 2010.

[10] Zengyou Sun; Qianchun Wang; Chenghua Che; "Study of Cognitive Radio Spectrum Detection in OFDM System", Asia-Pacific Conference on Wearable Computing Systems (APWCS'10), Shenzhen, China, April 17-18, 2010.

[11] Hong Li; Ma Junfei; Xu Fangmin; Li ShuRong; Zhou Zheng; "Optimization of Collaborative Spectrum Sensing for Cognitive Radio", IEEE International Conference on Networking, Sensing and Control (ICNSC'08), Sanya, China, April 6-8, 2008.

[12] Arshad, K.; Moessner, K.; "Collaborative Spectrum Sensing for Cognitive Radio", IEEE International Conference on Communications Workshops (ICC'09), Dresden, Germany, June 14-18, 2009.

[13] Taki, M.; Lahouti, F.; "Spectral Efficiency Optimized Adaptive Transmission for Interfering Cognitive Radios", IEEE International Conference on Communications Workshops (ICC'09), Dresden, Germany, June 14-18, 2009.

[14] Liaoyuan Zeng; McGrath, S.; "Spectrum Efficiency Optimization in Multiuser Ultra Wideband Cognitive Radio Networks", 7th International Symposium on Wireless Communication Systems (ISWCS'10), York, UK, September 19-22, 2010.

[15] Waheed, M.; Anni Cai; "Evolutionary algorithms for radio resource management in cognitive radio network", 28th IEEE International Performance Computing and Communications Conference (IPCCC'09), Phoenix, Arizona, USA, December 14-16, 2009.

[16] Akter, L.; Natarajan, B.; "Modeling Fairness in Resource Allocation for Secondary Users in a Competitive Cognitive Radio Network", 9th Wireless Telecommunications Symposium, Tampa, Florida, USA, April 21-23, 2010.

[17] Wang, P.; Matyjas, J.; Medley, M.; "Joint Spectrum Allocation and Scheduling in Multi-Radio Multi-Channel Cognitive Radio Wireless Networks", 33rd IEEE Sarnoff Symposium, New Jersey, USA, April 12-14, 2010.

[18] Yang, K.; Wang, X.; "Optimal Radio Allocation for Multi-radio Cognitive Wireless Networks", Global Telecommunications Conference, San Francisco, California, USA, November 27-December 1, 2006.

[19] Hayar, A.M.; Pacalet, R.; Knopp, R.; "Cognitive radio Research and Implementation Challenges", 41st Asilomar Conference on Signals, Systems and Computers, Pacific Grove, California, USA, November 4-7, 2007.

[20] Ren, Y.; Dmochowski, P.; Komisarczuk, P.; "Analysis and Implementation of Reinforcement Learning on a GNU Radio Cognitive Radio Platform", 5th International Conference on Cognitive Radio Oriented Wireless Networks & Communications, Cannes, France, June 9-11, 2010.

[21] Di Felice, M.; Chowdhury, K.R.; Wu, C.; Bononi, L.; Meleis, W.; "Learning-Based Spectrum Selection in Cognitive Radio Ad Hoc Networks", 8th International Conference on Wired/Wireless Internet Communications (WWIC), Lulea, Sweden, June 1-3, 2010.

[22] Li, H.; "Multi-agent Q-Learning of Channel Selection in Multi-user Cognitive Radio Systems: A Two by Two Case", IEEE Conference on Systems, Man, and Cybernetics, San Antonio, Texas, USA, October 11-14, 2009.

[23] Li, M.; Xu, Y.; Hu, J.; "A Q-Learning Based Sensing Task Selection Scheme for Cognitive Radio Networks", International Conference on Wireless Communications & Signal Processing, Nanjing, China, November 13-15, 2009.

[24] Wu, C.; Chowdhury, K.R.; Di Felice, M.; Meleis, W.; "Spectrum Management of Cognitive Radio Using Multi-agent Reinforcement Learning", 9th International Conference on Autonomous Agents and Multiagent Systems: Industry track, Montréal, Canada, May 10-14, 2010.

[25] Berthold, U.; Fangwen Fu; van der Schaar, M.; Jondral, F.K.; "Detection of Spectral Resources in Cognitive Radios Using Reinforcement Learning", 3rd IEEE New Frontiers in Dynamic Spectrum Access Networks, Chicago, USA, October 14-17, 2008.

[26] Zhang, Z.; Xie, X.; "Intelligent Cognitive Radio: Research on Learning and Evaluation of CR Based on Neural Network", 5th Information and Communications Technology, Cairo, Egypt, December 16-18, 2007.

[27] Baldo, N.; Zorzi, M.; "Learning and Adaptation in Cognitive Radios using Neural Networks", 5th IEEE Consumer Communications and Networking Conference, Las Vegas, Nevada, USA, January 10-12, 2008.

[28] Huang, Y.; Wang, J.; Jiang, H.; "Modeling of Learning Inference and Decision-Making Engine in Cognitive Radio", 2nd International Conference on Networks Security, Wireless Communications and Trusted Computing, Wuhan, China, April 24-25, 2010.

[29] Hauris, J.F.; He, Donya; Michel, Geremy; Ozbay, Cahit; "Cognitive Radio and RF Communications Design Optimization using Genetic Algorithms", Military Communications Conference, Orlando, Florida, USA, October 29-31, 2007.

[30] Zhang, Z.; Xie, X.; "Application Research of Evolution in Cognitive Radio Based on GA", 3rd IEEE Conference on Industrial Electronics and Applications, Singapore, June 3-5, 2008.

[31] Herry, S.; Le Martret, C.J.; "Parameter Determination of Secondary User Cognitive Radio Network Using Genetic Algorithm", IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, Victoria, B.C., Canada, August 23-26, 2009.

[32] Xu Mao; Hong Ji; "Biologically-Inspired Distributed Spectrum Access for Cognitive Radio Network", 6th International Conference on Wireless Communications Networking and Mobile Computing (WiCOM'10), Chengdu, China, September 23-25, 2010.

[33] Fudenberg, D.; Levine, D.K.; "The Theory of Learning in Games", The MIT Press, Cambridge, Mass., USA, 1998.

[34] Metrick, A.; Polak, B.; "Fictitious Play in 2×2 Games: a Geometric Proof of Convergence", Economic Theory, vol. 4, n°6, pp. 923-933, 1994.