Design of Multistandard Channelization Accelerators for Software Defined Radio Handsets



IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 59, NO. 10, OCTOBER 2011


Navin Michael, Student Member, IEEE, A. P. Vinod, Senior Member, IEEE, Christophe Moy, Member, IEEE, and Jacques Palicot, Member, IEEE

Abstract—This paper presents a novel multistandard channelization accelerator design methodology for the digital front-end of a software defined radio (SDR) handset. Dedicated hardware (HW) accelerator cores have a power efficiency that is several orders of magnitude higher than that of a software implementation and hence have been extensively used for accelerating computationally intensive tasks like channelization. However, these cores are generally inflexible and optimized for a single standard. The growing need to support multiple wireless standards with heterogeneous throughput and mobility requirements in a small form factor mobile handset with limited silicon area requires the accelerator cores to be flexible and reusable in addition to being power efficient. The proposed methodology exploits commonalities in the channelization specifications to hardwire and reuse a significant portion of the accelerator across multiple standards. The resulting accelerator is area efficient and scalable for supporting an arbitrary number of standards.


I. INTRODUCTION

SUPPORT for multiple wireless standards is important for emerging paradigms like 4G, which envisions multiple access networks built on a core IP network, with flexible radio agnostic terminals providing seamless mobility across multiple standards. The traditional approach to multistandard support, commonly referred to as the Velcro approach [2], involves stacking multiple transceiver chains, each of them optimized for a single standard. The obvious drawback of the Velcro approach is its nonscalability: as the number of supported standards increases, it results in bulky and power consuming radios. The software defined radio (SDR) approach provides an alternative paradigm for designing multistandard radios [3], [4]. It can be characterized by two salient trends that differentiate it from traditional single mode radio design paradigms, namely, shifting the bulk of the signal processing load to the digital domain, and flexible physical layer hardware in both the analog front-end and the digital baseband, which allows a number of radio parameters like the carrier frequency, channel bandwidths, modulation scheme, etc. to be parameterized through software.

Manuscript received January 24, 2011; revised April 14, 2011 and June 26, 2011; accepted June 26, 2011. Date of publication July 05, 2011; date of current version September 14, 2011. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Xiang-Gen Xia. This work was supported in part by the Motorola Foundation and by the Merlion Ph.D. Grant, France-Singapore Cooperation Platform for Science and Technology.

N. Michael and A. P. Vinod are with the School of Computer Engineering, Nanyang Technological University, Singapore 639798 (e-mail: navi0001@ntu.edu.sg; [email protected]).

C. Moy and J. Palicot are with the SUPELEC/IETR, Rennes 35576, France (e-mail: [email protected]; [email protected]).

Digital Object Identifier 10.1109/TSP.2011.2161301

Flexibility necessarily incurs a power penalty, as flexible hardware is usually several orders of magnitude lower in power efficiency than custom hardware [5]. Improving the energy efficiency of battery powered mobile terminals faces some daunting challenges as network operators migrate from 2G and 3G technologies to 4G. In power-constrained mobile terminals, the available power budget for computations is only about 1 W, and moving towards 4G technologies would require nearly 1 TOPS to be performed within that budget [6], [7]. A heterogeneous multiprocessor system on chip (MPSoC) approach is emerging as one of the most promising alternatives for implementing a flexible digital baseband, due to its ability to provide both the flexibility required for multistandard support and the power efficiency required to perform the computationally intensive baseband algorithms within the power budget of a mobile radio [8], [9]. The processing elements in a heterogeneous MPSoC may comprise fixed instruction set processors, application specific instruction processors (ASIPs), or dedicated hardware accelerators. Hardware accelerators have a much higher power efficiency than a processor based approach, and in energy constrained mobile handsets they may be the only feasible alternative for implementing computationally intensive algorithms. However, they are relatively inflexible and are typically optimized for a single standard. Hence, a scalable MPSoC design that needs to provide multistandard support in a limited silicon area requires the accelerator cores to be flexible and reusable.

In this paper, we focus on the design of a power efficient, flexible hardware accelerator that performs the computationally intensive channelization tasks in the digital front end of a multistandard SDR handset [10]. The functional requirements of the channelization accelerator include selecting the frequency band containing the channel of interest, interferer attenuation, sample rate conversion, and pulse shaping. All the required functions can be performed using a multistage decimation filter. The Velcro approach [2] and the coprocessor approach [27] represent two ends of the traditional multistandard filter accelerator design space. The Velcro accelerator consists of multiple single-mode accelerators stacked in parallel, each of which has been optimized for a specific standard. While supporting multiple standards in this fashion incurs a significant area penalty, the Velcro accelerator has the advantage of a dynamic power consumption comparable to that of a single mode accelerator. However, its flexibility is limited to the ability to switch between a known set of standards, and it cannot support in-field upgradeability. The coprocessor approach maps the filtering function onto a set of generic multiply and accumulate (MAC) units and is capable of accommodating arbitrary filter specifications. However, its flexibility comes at the expense of a significant dynamic power consumption penalty when compared to a single mode accelerator. Hence, both of the above accelerator paradigms fall short of the flexibility and power efficiency requirements of an SDR handset. This paper tackles the problem of evolving an alternate channelization accelerator paradigm that has lower dynamic power consumption than the coprocessor approach, while still being sufficiently flexible to switch between a known set of standards or support a new standard by means of an in-field upgrade. The proposed architecture achieves these goals by reducing the computational load assigned to generic MAC units and identifying opportunities for reusing hard-wired filter stages across multiple standards.

1053-587X/$26.00 © 2011 IEEE

The current work expands upon the preliminary work on a multistandard channelization accelerator design paradigm proposed by us in a recent conference paper [17]. It highlights the need for reconfigurable hardware (HW) accelerators in the digital front-end of a flexible mobile radio and the limitations of existing multistandard accelerator designs. It also includes experimental synthesis results that demonstrate the area advantage of the proposed method. The paper is organized as follows. Section II introduces the multistandard channelization problem, the specific challenges that need to be addressed for multistandard support, and the limitations of the existing design paradigms in addressing these challenges. Section III presents the fixed factorization method for implementing a multistandard channelization accelerator. Section IV highlights the design considerations that need to be taken into account before using the proposed approach in a practical implementation. Section V presents some experimental synthesis results that demonstrate the area advantage of the proposed approach when compared to a Velcro style multistandard accelerator implementation. Section VI offers our conclusions.

II. THE MULTISTANDARD CHANNELIZATION PROBLEM

A. Background

There is still no consensus on a standard definition for the term “channelization.” In the current work, it is used to refer to all the signal processing tasks required to extract the channel of interest from the digitized wideband signal. Note that, unlike a base-station channelizer, which requires multiple channels to be extracted in parallel, a mobile handset typically has only a single channel of interest. In this paper, the required channel is assumed to have been shifted to DC, either by a tunable analog quadrature downconversion stage or a digital downconverter. The design of the channelization block is strongly coupled to the design of the analog front-end and the analog to digital converter (ADC). A multistandard radio has to accommodate multiple channel bandwidths. Performing the channel selection completely in the analog domain would require highly frequency selective analog filters, tunable over a wide range of bandwidths. The limited flexibility of analog filters is a major bottleneck for multistandard support. A more practical multistandard paradigm is the so-called “fixed digitization bandwidth” approach [10]. This approach allows the flexibility requirements of the analog filters to be considerably relaxed by designing them for the worst case channel bandwidth, as shown in Fig. 1. The task of fine channel selection is now shifted to the digital domain.

Fig. 1. Multistandard channelization.

Selecting a coarse frequency band in the analog front-end requires the ADC to digitize a wideband signal, which may comprise unwanted interferers and blockers in addition to the channel of interest. The power levels of the interferers and blockers can be several times that of the required channel. The high dynamic range of the input signal makes Nyquist ADC-based digitization very expensive in terms of power consumption [18]. Sigma-delta (ΣΔ) ADCs are an efficient alternative for the above scenario, because of their ability to shape the quantization noise away from the channel of interest and provide a very high dynamic range only within the channel of interest. ΣΔ ADCs are also inherently reconfigurable, providing a tradeoff between bandwidth and dynamic range. Various reconfigurable ΣΔ ADCs have also been proposed whose parameters like oversampling rate (OSR), noise shaping order, number of cascade stages, and quantizer bit resolution can be reconfigured to digitize different channel bandwidths at a low power consumption, making them a good candidate for multistandard SDRs [20], [45]. A class of ΣΔ ADC known as continuous time sigma delta (CT-ΣΔ) converters uses continuous time filters for noise shaping, rather than the discrete time filters used in traditional ΣΔ ADCs. These converters perform an implicit anti-alias function, thereby reducing the load on, or even eliminating the need for, analog anti-aliasing filters [21]. This can lead to highly integrated multistandard radios.

The ΣΔ ADC output is typically highly oversampled with respect to the final symbol rate (or chip rate for spread spectrum systems). The subsequent modem algorithms typically operate at the symbol rate or the chip rate. Hence, one of the most important channelization tasks is to perform a sample rate conversion (SRC) operation. The required SRC factors can be integral or rational. In addition to the channel of interest, the output spectrum of the ΣΔ ADC comprises interferers and blockers that were not attenuated by the analog front-end, as well as a large amount of quantization noise introduced by the ΣΔ ADC itself. Successful demodulation requires the potential interfering components to be sufficiently attenuated to satisfy the standard specific carrier-to-noise (C/N) ratio. In an SRC system, insufficiently attenuated interferers can also alias back into the channel of interest, irrecoverably corrupting the required signal. The bit errors that are introduced due to inter symbol interference (ISI) also need to be considered while implementing the channelization function. Band-limiting the transmitted pulse introduces ripples in the time domain, which can interfere with the symbol estimation operation. Combating ISI might require the use of pulse shaping filters like raised cosine (RC) filters, root raised cosine (RRC) filters, or Gaussian filters [22]. These filters attenuate the start and end portions of the symbol and ensure that the zero-crossings of the time domain ripples overlap with the midpoint of the neighboring symbols. Pulse shaping may be performed either completely by a transmit filter, or the load may be split between matched transmit and receive filters. In a digital system, RRC filters have to operate at a minimum oversampling rate (OSR) of two with respect to the symbol/chip rate to prevent aliasing [22]. (Note that henceforth the OSR is always expressed with respect to the final symbol/chip rate.) Hence, the receive-side pulse shaping filter can be considered as the last decimation stage in the SRC system. For standards that use RC or RRC pulse shaping filters, the passband width that has to be protected from aliasing is typically specified in terms of the symbol rate and the roll-off factor [22]. For standards that use Gaussian pulse shaping filters, the bandwidth that needs to be protected from aliasing can be determined from the symbol rate and the bandwidth-time (BT) product [23].
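As a rough illustration of how the protected bandwidth follows from the symbol rate and the roll-off factor, the sketch below uses the textbook raised cosine relation, in which the one-sided occupied bandwidth is (1 + roll-off) times half the symbol rate. The function names are hypothetical, and the WCDMA figures (3.84 Mchip/s, roll-off 0.22) are used only as a familiar example, not as values taken from this paper.

```python
def protected_band_edge_hz(symbol_rate_hz: float, roll_off: float) -> float:
    """One-sided bandwidth of an RC/RRC shaped signal centred at DC (textbook relation)."""
    if not 0.0 <= roll_off <= 1.0:
        raise ValueError("roll-off factor must lie in [0, 1]")
    return (1.0 + roll_off) * symbol_rate_hz / 2.0


def rrc_input_rate_hz(symbol_rate_hz: float, osr: int = 2) -> float:
    """Sample rate at the input of the receive RRC stage for a given OSR."""
    return osr * symbol_rate_hz


# Illustrative numbers only: WCDMA chip rate and roll-off factor.
chip_rate = 3.84e6
edge = protected_band_edge_hz(chip_rate, 0.22)
nyquist = rrc_input_rate_hz(chip_rate, osr=2) / 2.0
print(f"band edge = {edge / 1e6:.4f} MHz, Nyquist at OSR 2 = {nyquist / 1e6:.2f} MHz")
```

Because the roll-off factor never exceeds one, the band edge never exceeds the symbol rate, which is the Nyquist frequency at an OSR of two. This is consistent with the minimum OSR quoted above.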

The functionally different tasks of interference and alias attenuation, sample rate conversion, and pulse shaping can be jointly performed by a multistage decimation filter structure [24]. The order of a finite impulse response (FIR) filter is inversely proportional to its normalized transition bandwidth, and can be given by (1) [25]

(1)

where $\delta_p$ is the passband ripple, $\delta_s$ is the stopband ripple, $f_s$ is the sampling rate, and $\Delta f$ is the transition bandwidth. $\Delta f / f_s$ represents the normalized transition bandwidth. Since the ΣΔ ADC sampling rate, $f_s$, is typically very high, a single stage decimation filter has a very narrow normalized transition bandwidth, resulting in a high order filter. Decimating in multiple stages ensures that only low order filters operate in a high OSR environment, while higher order filters need to operate only on a low OSR signal, reducing the overall computational complexity. Designing a multistandard channelization accelerator is challenging because of the large number of variable parameters that need to be supported, namely, channel bandwidths, band edge specifications, interference and alias attenuation requirements, SRC factors, and pulse shaping requirements. Hence, the underlying platform that implements the channelization function should be flexible enough to accommodate these variable parameters. A software implementation of the channelization function on a general purpose processor (GPP) or a digital signal processor (DSP) offers the highest amount of flexibility, but these platforms have a power efficiency of only 1–10 MOPS/mW [5]. The filtering operations required for performing the channelization function incur a very large number of multiply and add operations per second. Typically, only application specific integrated circuit (ASIC) implementations have sufficient power efficiency to perform the required computations in a battery powered mobile handset, which has a power budget of about 1 W. However, ASICs also have the drawback of being highly inflexible. This paper addresses the problem of designing ASIC based channelization accelerators that are flexible enough to support multiple standards, while minimizing the power and area penalty.
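The benefit of multistage decimation described above can be made concrete with a small numerical sketch. Since the exact form of (1) is not reproduced here, the code assumes Bellanger's widely used approximation, length ≈ (2/3)·log10(1/(10·δp·δs))·fs/Δf, which may differ from the expression in [25]. The 64 MHz to 4 MHz specification, the band edges, and the ripple values are hypothetical.

```python
import math


def estimate_fir_length(delta_p, delta_s, f_s, delta_f):
    """Bellanger-style FIR length estimate (an assumed stand-in for (1))."""
    return math.ceil((2.0 / 3.0) * math.log10(1.0 / (10.0 * delta_p * delta_s)) * f_s / delta_f)


def mac_rate(length, output_rate_hz):
    """MACs per second for a decimating FIR that computes only the retained outputs."""
    return length * output_rate_hz


# Hypothetical specification, not taken from the paper.
F_IN, F_OUT = 64e6, 4e6          # input and output sample rates
F_PASS, F_STOP = 1e6, 2e6        # passband and stopband edges
DELTA_P, DELTA_S = 0.01, 0.001   # passband and stopband ripples

# Single stage: decimate by 16 with the full transition-band constraint at 64 MHz.
single_len = estimate_fir_length(DELTA_P, DELTA_S, F_IN, F_STOP - F_PASS)
single_macs = mac_rate(single_len, F_OUT)

# Two stages: decimate by 8, then by 2. The first stage only has to reject
# components that alias into the band below F_STOP, so its stopband can start
# at F_MID - F_STOP. The passband ripple budget is split between the stages.
F_MID = 8e6
stage1_len = estimate_fir_length(DELTA_P / 2, DELTA_S, F_IN, (F_MID - F_STOP) - F_PASS)
stage2_len = estimate_fir_length(DELTA_P / 2, DELTA_S, F_MID, F_STOP - F_PASS)
two_stage_macs = mac_rate(stage1_len, F_MID) + mac_rate(stage2_len, F_OUT)

print(single_len, single_macs)
print(stage1_len, stage2_len, two_stage_macs)
```

Under these assumed numbers the single-stage filter needs roughly 684 million MACs per second, while the two-stage chain needs roughly 388 million, because the long filter runs only at the lower rate.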

B. Area Power Tradeoffs in the Multistandard Filter Accelerator Design Space

The starting point for a typical design flow of a filter accelerator for channelization consists of a set of channelization specifications, typically the band edge specifications, the attenuation requirements, and the required SRC factor. These are used to generate the behavioral description of a multistage decimation filter that satisfies the channelization specifications. The high level synthesis steps transform this behavioral description into a register transfer level (RTL) description. The architectural synthesis steps use the RTL description to create the final physical layout. In a single mode accelerator, the filter coefficients are fixed. Hence, a fully parallel implementation of the multistage decimation filter structure is able to make use of a large class of constant-coefficient based behavioral optimizations like constant propagation, operator strength reduction, graph dependency algorithms, and common subexpression elimination [12]–[16]. These optimizations eliminate the need for generic multipliers for implementing the filter taps, replacing them with a network of hard-wired shifts and adders, thereby reducing the overall arithmetic complexity. The possibility of performing these optimizations at such an early stage in the design flow has a significant impact on reducing the area and power consumption. Given that single mode accelerators can be efficiently implemented, the “Velcro” approach to multistandard support would involve stacking multiple single mode accelerators in parallel [2]. However, this method has the disadvantage of being unscalable. When a large number of standards need to be supported, using dedicated hardware accelerators to implement the computationally intensive kernels in all the standards can translate to a very large area penalty, even if each of the accelerators individually has a high performance to area ratio.
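To illustrate how a fixed coefficient removes the need for a generic multiplier, the sketch below recodes an integer coefficient into canonical signed digit (CSD) form and multiplies using only shifts, additions, and subtractions. CSD recoding is one common strength reduction technique and is shown here as an assumption; the optimizations in [12]–[16] are not necessarily this one. The coefficient value 231 is hypothetical.

```python
def csd_digits(value: int) -> list[int]:
    """Return the CSD digits of a non-negative integer, least significant first.

    Each digit is -1, 0, or +1, and no two adjacent digits are nonzero.
    """
    digits = []
    while value != 0:
        if value & 1:
            digit = 2 - (value & 3)  # +1 if value mod 4 == 1, -1 if value mod 4 == 3
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def shift_add_multiply(x: int, digits: list[int]) -> int:
    """Multiply x by the constant encoded in digits using shifts and add/subtract only."""
    acc = 0
    for shift, digit in enumerate(digits):
        if digit == 1:
            acc += x << shift
        elif digit == -1:
            acc -= x << shift
    return acc


coefficient = 231  # hypothetical fixed filter tap
digits = csd_digits(coefficient)
nonzero_csd = sum(d != 0 for d in digits)
nonzero_binary = bin(coefficient).count("1")
print(digits, nonzero_csd, nonzero_binary)
print(shift_add_multiply(5, digits))
```

Each nonzero digit beyond the first costs one adder or subtractor, so fewer nonzero digits mean a smaller hard-wired network. A generic MAC unit cannot exploit this, because its coefficient is not known at design time.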

For very high volume products like mobile handsets, keeping the silicon die area to a minimum has a very big impact on costs. A smaller chip area allows more dies to be packed on a single wafer, resulting in higher yields [1]. For battery powered mobile devices, which spend a significant amount of time in standby mode, the leakage power consumption is also very important. In nanoscale CMOS technologies, where leakage power is the dominant mode of power consumption, increased area translates to an increased number of leaking transistors. Both gate leakage and subthreshold leakage power consumption are strongly correlated to the total gate width and area [26]. Hence, reducing the area and the number of leaking transistors is important for reducing the overall standby power consumption. Aggressive scaling over the years has also increased the gate density, allowing more gates to be packed within the same die area. Hence, a naive approach to multistandard accelerator design would be to use a Velcro style implementation with multiple standard specific accelerators in parallel. The dynamic and leakage power of the unused accelerators could potentially be reduced by clock gating and power gating, respectively. While clock gating significantly reduces the switching activity, power gating as a leakage reduction strategy does not scale well into nanoscale technologies due to gate leakage currents in the added power gating circuitry [11]. With the scaling of the gate oxide thickness, the gate leakage power has been increasing at a much faster rate than the subthreshold leakage, which makes the Velcro approach less power efficient in smaller technology nodes. Thus, the area efficiency and reusability of all the hardware accelerator cores in a multistandard SDR baseband are important from the perspective of scalability, manufacturing cost reduction, and standby leakage reduction.

At the other end of the channelization accelerator design space is a filter coprocessor approach, which has a high degree of flexibility and reusability. Rather than performing the computations in a spatial style, a coprocessor approach performs the computations in a temporal fashion, by time sharing a fixed set of generic multiply and accumulate (MAC) units [27]. Time sharing results in increased resource utilization and allows variable filter lengths to be folded onto the same underlying hardware. Hence, it has a considerably lower data path area than the Velcro approach. The use of generic MAC units also allows it to support arbitrary coefficient sets. However, the coprocessor approach has several drawbacks that need to be taken into account. Specialized decimation filter stages like the cascaded integrator comb (CIC) filter [28] or classical interpolation polynomial based Farrow structures [29] have either unity coefficients or small integral coefficients. Therefore, using generic MAC units for the coefficient multiplications in these stages is superfluous. The inability of the coprocessor approach to make use of constant-coefficient based behavioral optimizations to reduce the arithmetic complexity also results in increased dynamic power consumption when compared to spatial style implementations. Time sharing in FIR filters also results in a loss of input correlation compared to spatial implementations, which further increases the dynamic power consumption due to increased switching activity [30]. The use of generic MAC units also requires the complete coefficient set of the multistage decimation filter to be stored in a reprogrammable coefficient random access memory (RAM) or an internal register file. Due to the limited I/O bandwidth of the accelerator core as well as the limited memory bandwidth of the processor, the high level processor routines for parameterization may require a large number of cycles to completely transfer the configuration context to the core, especially when the number of filter taps is very high. In the programmable FIR filter in [31], for instance, the coefficient registers are connected in a serial scan chain, and the coefficients are loaded serially on the clock edge.
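The temporal style of computation described above can be sketched as a behavioral model in which a single generic MAC unit is time shared across all taps of a decimating FIR filter. The function name, the `coeff_ram` list standing in for the coefficient memory, and the sample values are all hypothetical; this is a software illustration, not the architecture of [27] or [31].

```python
def folded_fir_decimator(samples, coeff_ram, decimation):
    """Decimating FIR computed by one time-shared MAC unit.

    Returns the output samples and the number of MAC cycles consumed.
    """
    outputs = []
    mac_cycles = 0
    taps = len(coeff_ram)
    # Only the outputs that survive decimation are computed.
    for n in range(taps - 1, len(samples), decimation):
        acc = 0
        for k in range(taps):  # one MAC cycle per tap, coefficient fetched from RAM
            acc += coeff_ram[k] * samples[n - k]
            mac_cycles += 1
        outputs.append(acc)
    return outputs, mac_cycles


samples = list(range(16))

# Two hypothetical coefficient sets of different lengths share the same "hardware".
out_a, cycles_a = folded_fir_decimator(samples, [1, 2, 1], decimation=2)
out_b, cycles_b = folded_fir_decimator(samples, [1, 1, 1, 1, 1], decimation=4)
print(out_a, cycles_a)
print(out_b, cycles_b)
```

Switching standards in this model only means rewriting `coeff_ram`, which mirrors the flexibility of the coprocessor. The cost is that every tap consumes a generic multiply, and the whole coefficient set has to be transferred before filtering can start.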

C. Limitations of Existing Decimation Filters for Multistandard Channelization

There exists a large body of work on the design of decimation filters. As a part of the channelization function in the digital front-end of a multistandard radio, the decimation filter has to perform the multiple tasks of sample rate conversion, channel selection, interference attenuation, and pulse shaping, not all of which have been factored into most of the reported designs. The ratio of the oversampled ΣΔ ADC output rate to the final symbol/chip rate is typically a large rational or integral number. Most of the reported decimation filter designs have focused only on integral decimation. These designs either assume that the ΣΔ ADC sampling rate is an integral multiple of the standard specific symbol rates, or that any required fractional sample rate conversion is performed by a subsequent resampler after the integral decimation filter [32]–[35]. The first scenario, while eliminating the requirement of a fractional SRC stage, limits the flexibility of the multistandard radio to a set of specifications whose symbol rates are known a priori. Any new standard whose symbol rate is not an integral factor of the fixed clock cannot be supported. Due to the difficulty of providing multiple high quality master clocks [36], it can be envisioned that the total number of available master clocks in a multimode radio would be limited. This necessitates the presence of a fractional SRC stage in the decimation chain to handle arbitrary symbol rates. The second scenario implicitly assumes that fractional sample rate conversion can always be efficiently performed in a low OSR environment, which is not necessarily true. The stringency of the filtering specifications progressively increases in a multistage decimation filter chain. Hence, a fractional SRC operating in a relatively high OSR environment can usually be implemented in a much simpler fashion than one operating in a low OSR environment [36].
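The need for a fractional SRC stage can be checked with exact rational arithmetic. The sketch below assumes a single 26 MHz master clock, a common reference frequency in GSM handsets, and uses the GSM symbol rate (1625/6 ksymbol/s) and the WCDMA chip rate (3.84 Mchip/s) as familiar examples. These figures are illustrative and are not taken from this paper, and the function name is hypothetical.

```python
from fractions import Fraction


def src_factor(master_clock_hz: Fraction, symbol_rate_hz: Fraction) -> Fraction:
    """Overall sample rate conversion factor from the master clock to the symbol rate."""
    return master_clock_hz / symbol_rate_hz


MASTER_CLOCK = Fraction(26_000_000)          # assumed single master clock
GSM_SYMBOL_RATE = Fraction(1_625_000, 6)     # about 270.833 ksymbol/s
WCDMA_CHIP_RATE = Fraction(3_840_000)

for name, rate in [("GSM", GSM_SYMBOL_RATE), ("WCDMA", WCDMA_CHIP_RATE)]:
    factor = src_factor(MASTER_CLOCK, rate)
    kind = "integral" if factor.denominator == 1 else "rational"
    print(f"{name}: SRC factor = {factor} ({kind})")
```

With this assumed clock, the first rate needs only integral decimation, while the second needs a rational factor. A chain limited to integral decimation could therefore not serve both from the same clock.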

When the overall pulse shaping load is split between matched transmit and receive stages, a pulse shaping filter has to be incorporated into the multistage decimation filter chain. The oversampling factor is one of the key design parameters of pulse shaping filters. Hence, while partitioning the original SRC factor into multiple stages, the oversampling factor of the pulse shaping filter stage also needs to be taken into account [37]. Many of the reported decimation filter designs that decimate the oversampled input to the symbol rate have not taken into account any possible pulse shaping requirements [32], [35].

The design of a multistandard decimation filter chain is complicated by the fact that the band edge specifications of the individual stages, the pulse shaping requirement, the position of the aliasing components, their strengths, as well as the requirement of the standards can show considerable variation [35], thereby requiring each of the above stages to be programmable. The need for programmability precludes the use of the vast body of existing literature on low power, low area optimizations for fixed coefficient FIR filters. This translates to a much higher power consumption and area than a single mode decimation filter. In a scalable multistandard design, hardware reuse and resource sharing are very important design issues, in addition to the reconfigurability required for multistandard support. Few existing works have focused on identifying redundancies that may exist across multiple standards in a multistandard decimation filter, which may allow one or more stages to be hard wired and reused across standards with minimal changes. The few works that have touched upon the issue [33] have limited their scope to a very small set of standards; hence, the results are not easily extensible to a multistandard radio that is capable of supporting an arbitrary number of standards.

III. PROPOSED MULTISTANDARD CHANNELIZATION ACCELERATOR DESIGN FLOW

A. A Mathematical Formulation of the Multistandard Channelization Problem

Assume that ‘ ’ wireless standards have to be supported by the channelization accelerator. Let the C/N requirement of the th standard be . Let the SRC factor of the th standard be . The ADC design determines the OSR, , which is necessary to achieve the desired dynamic range. Assume that can be factorized into factors, so that the required decimation is performed by filtering stages. If the final symbol rate (or chip rate for spread spectrum systems) is , the ADC sampling rate


Fig. 2. Position of alias intervals.

can be expressed in terms of the decimation factors as follows [22]:

(2)

Without loss of generality, all the decimation factors are assumed to be integers. The alias attenuation requirements derived below are equally valid for fractional decimation factors. When the pulse shaping for a particular mode is performed by an RC transmit filter or a pair of matched RRC transmit and receive filters, the bandwidth that has to be protected from aliasing is a function of the symbol rate and the roll-off factor, as follows:

(3)

The term corresponds to the passband edge, as shown in Fig. 2. The roll-off factor lies in the interval [0,1] [22]. A smaller roll-off factor results in a smaller bandwidth but also causes an increase in the magnitude of the time domain ripples, which in turn could cause bit errors if the symbol estimation is not performed exactly at the midpoint of the symbol period. A higher roll-off factor results in lower ripple magnitude, and hence less sensitivity to timing errors, at the expense of increased channel bandwidth. Gaussian transmit filters are also commonly used for pulse shaping, but do not have an explicit roll-off factor. They are typically specified in terms of their BT product, which is defined as the product of the 3-dB bandwidth and the symbol period. Reducing the BT product has the effect of reducing the occupied bandwidth at the cost of increased ISI. Gaussian filters are typically used in scenarios where some amount of ISI is tolerable, and for most of the commonly used standards, the BT product lies in the interval [0,0.5] [38]. Let the BT product for the th standard be . The corresponding band edge of the information bandwidth that needs to be protected from aliasing can be given by

(4)
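As a numerical illustration of the quantities on which (3) and (4) depend, the following sketch computes the one-sided bandwidth of an RC/RRC-shaped signal and the 3-dB bandwidth implied by a BT product. The function names are hypothetical. The RC expression is the textbook one-sided bandwidth, (1 + roll-off) times half the symbol rate, and the Gaussian expression follows directly from the BT definition given above; it is not claimed to be the exact form of (4). The W-CDMA and GSM values in the usage note are the commonly quoted ones and are not taken from Table I.

```python
def rc_band_edge(symbol_rate_hz: float, roll_off: float) -> float:
    """One-sided bandwidth (Hz) of an RC or matched-RRC shaped signal."""
    if not 0.0 <= roll_off <= 1.0:
        raise ValueError("roll-off factor must lie in [0, 1]")
    return 0.5 * (1.0 + roll_off) * symbol_rate_hz


def gaussian_3db_bandwidth(symbol_rate_hz: float, bt_product: float) -> float:
    """3-dB bandwidth (Hz) implied by BT = bandwidth x symbol period."""
    if bt_product <= 0.0:
        raise ValueError("BT product must be positive")
    return bt_product * symbol_rate_hz


if __name__ == "__main__":
    # Commonly quoted parameters (assumptions, not values from Table I).
    print(rc_band_edge(3.84e6, 0.22))              # W-CDMA chip rate, roll-off 0.22
    print(gaussian_3db_bandwidth(13e6 / 48, 0.3))  # GSM symbol rate, BT = 0.3
```

With these inputs the RC band edge evaluates to about 2.34 MHz and the Gaussian 3-dB bandwidth to about 81.25 kHz.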

Assuming that the desired channel sensitivity level is given by dBm, each decimation stage has to ensure that any potential interferer, blocker, or quantization noise that can alias into the passband interval is attenuated to the level dBm. This criterion is henceforth referred to as the “antialiasing criterion” in the current work.

Consider the design of the th decimation stage of the th standard, which decimates by the factor . The input and output sampling rates of the above stage can be given by

(5)

(6)

The potential alias components for this stage lie in the frequency intervals of bandwidth , centered around integral multiples of the output sample rate , as shown in Fig. 2. The potential aliasing components within these bands may comprise interferers, blockers, or just quantization noise from the ADC. The degree of stopband attenuation required to suppress the aliasing components to the noise floor depends on the interference levels within the aliasing bands. Designing a decimation filter to strictly conform to the above constraints requires a multistopband filter with multiple stopband attenuation values and multiple don’t-care regions. The unattenuated don’t-care regions do not cause aliasing into the passband for the current stage, and can be removed by the later decimation stages [24]. Since the design methodologies for low pass FIR filters are more mature, with easily available design tools, it is more common to implement the decimation filters using low pass FIR filters, which provide uniform stopband attenuation over both the aliasing and don’t-care regions [24]. The required stopband attenuation can be chosen to satisfy the worst case attenuation requirements among all the aliasing bands. The requirement of preventing aliasing in the passband interval can be achieved by attenuating all the components above , by . The frequency represents the starting boundary of the first aliasing interval. Hence, the passband edge and the stopband edge of the th decimation stage can be given as follows:

(7)

(8)
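The relations described above can be sketched numerically. The following is a minimal illustration, assuming the usual multistage convention that the text describes: the ADC rate is the symbol rate times the product of the decimation factors, each stage's passband edge is the protected band edge, and its stopband edge is the start of the first alias interval (output rate minus passband edge). All function names and the example factor list are hypothetical, and the sketch applies only to stages before the last one.

```python
from math import prod


def stage_rates(symbol_rate, factors):
    """Return [(input_rate, output_rate), ...], first decimation stage first."""
    rate = symbol_rate * prod(factors)  # ADC sampling rate
    rates = []
    for d in factors:
        rates.append((rate, rate / d))
        rate /= d
    return rates


def stage_band_edges(f_pass, output_rate):
    """Passband edge and stopband edge (start of the first alias interval)."""
    f_stop = output_rate - f_pass
    if f_stop <= f_pass:
        raise ValueError("output rate too low: no transition band left")
    return f_pass, f_stop


def alias_intervals(f_pass, input_rate, output_rate):
    """Alias bands of width 2*f_pass centred on multiples of the output rate."""
    bands, m = [], 1
    while m * output_rate - f_pass < input_rate / 2:
        bands.append((m * output_rate - f_pass, m * output_rate + f_pass))
        m += 1
    return bands


if __name__ == "__main__":
    # Hypothetical example: unit symbol rate, factors 4, 2, 2, band edge 0.6.
    (f_in, f_out), *_ = stage_rates(1.0, [4, 2, 2])
    print(stage_band_edges(0.6, f_out))
    print(alias_intervals(0.6, f_in, f_out))
```

Frequencies between the alias bands returned by `alias_intervals` are the don't-care regions; a plain low pass design simply attenuates everything above the stopband edge.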

When RC or RRC filters are used for pulse shaping, the passband edge and the stopband edge of the th decimation stage in radians (normalized by ) can be derived from (5)–(8), as shown

(9)

(10)

When the pulse shaping is performed by a Gaussian filter, the corresponding band edges can be given by

(11)


(12)

Equations (9)–(12) highlight the main sources of variability in a multistandard multistage decimation filter design. It can be seen that the stopband edge, passband edge, and stopband attenuation parameters of each stage in the multistandard multistage decimation filter are dependent on the standard specific parameters. This limits the possibilities for hardwiring a filtering stage and directly reusing it across multiple standards without any changes. The need for reprogrammability precludes the use of constant-coefficient filter optimizations, and necessitates the use of generic multipliers.

B. Eliminating the Dependency of Filter Stages on Standard Specific Parameters

It was shown in (9)–(12) that the dependency of the filter stages on the standard specific parameters prevents them from being readily reused across multiple standards. This section discusses some specific conditions under which the filter specifications can be modified to eliminate this dependency, without changing their functionality. Consider the case of two multistage decimation filters corresponding to two different standards, henceforth referred to as standard-1 and standard-2. Assume that the first standard performs pulse shaping through a pair of matched transmit and receive RRC filters, while the second standard performs the same using a Gaussian transmit filter. Let be one of the decimation stages in the first filter, and be one of the decimation stages in the second filter. The respective filter specifications of these two stages are dependent on the input OSR, decimation factor, attenuation requirement, and the pulse shaping parameters. Now consider a special case where both and lie at the same OSR with respect to their respective symbol rates and both decimate by the same factor , as indicated in Fig. 3. The band edges and stopband attenuation of both the stages have to be specified such that the potential alias components are suppressed to the noise floor and their respective passband intervals are free from aliasing. Let the roll-off factor for the first standard be and the BT product for the second standard be . The passband intervals of the two standards (frequency values normalized by ) that have to be protected from aliasing can be given by and , respectively. For a filter which decimates by a factor of , the potential alias images lie around integral multiples of (frequency value normalized by ). The starting boundaries of the first aliasing intervals of and lie at and , respectively. Let the required stopband attenuation for and be and , respectively. The antialiasing criterion for and can be fulfilled by placing the passband edge at the boundary of the band containing the useful information in standard-1 and standard-2, respectively, and by placing the stopband edges at the starting boundary of the first alias intervals, while attenuating by and in their respective stopbands. The decimation filter responses and are shown in Fig. 4(a) and (b), respectively. It can be seen that the filter specifications show a dependency on the standard

Fig. 3. Position of the standard-1 and standard-2 decimation stages in the decimation filter chain.

Fig. 4. Frequency responses of the standard-1 stage, the standard-2 stage, and the modified stage.

dependent pulse shaping and stopband attenuation parameters. In a multistage decimation filter, the overall band edge is determined by the last stage filter. The stages prior to the last stage merely perform a decimation operation while ensuring that the passband is free from aliasing, and do not have an explicit band-edge shaping role. Hence the band edges and attenuation specification of all the stages prior to the last stage can potentially be modified without affecting the functionality of the overall multistage filter, as long as the modified specifications for the individual stages satisfy the antialiasing criterion in the passband.

Consider a modified filter stage , whose stopband attenuation is fixed at and whose band edges are designed for the boundary conditions of the pulse shaping parameters . These boundary conditions correspond to the maximum permissible passband intervals, and hence the modified stage protects a wider band from aliasing than either or . Fig. 4(c) shows the modified filter stage with the passband and stopband edges fixed at and , respectively. It can be seen that the modified specifications satisfy the antialiasing criterion for both the standards. Any unattenuated, undesired component will be removed by the succeeding decimation filter stages. It can be inferred from these observations that the presence of filtering stages which decimate by the same factor, at the same OSR in different multistage decimation filters, allows them to be replaced with a modified specification which can be hard wired and reused across the multiple standards without any changes. However, this band-edge modification strategy cannot be used for the last stage filter, since it determines the overall band edge of the multistage decimation filter.

Fig. 5. Programmable nonrecursive CIC structure.
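The band-edge modification argument can be checked with a small sketch. It assumes that each standard is characterised by a hypothetical parameter `beta`, the one-sided protected bandwidth in units of the symbol rate (for RC/RRC shaping, `beta = (1 + roll_off) / 2`), and that the shared stage is designed for the boundary value `beta = 1` and for the larger of the two attenuation requirements. The Gaussian value of `beta` and the attenuation figures used below are placeholders, since the exact expressions are not reproduced here.

```python
def normalized_edges(beta, osr_in, decim):
    """Passband/stopband edges in cycles per sample at the stage input rate.

    beta   : protected one-sided bandwidth in units of the symbol rate
    osr_in : input sampling rate divided by the symbol rate
    decim  : integer decimation factor of the stage
    """
    f_pass = beta / osr_in
    f_stop = 1.0 / decim - f_pass  # start of the first alias interval
    if f_stop <= f_pass:
        raise ValueError("no transition band for these parameters")
    return f_pass, f_stop


def covers(shared, specific):
    """True if the shared spec is at least as strict as the specific one."""
    return shared[0] >= specific[0] and shared[1] <= specific[1]


if __name__ == "__main__":
    osr_in, decim = 8, 2                                  # same OSR, same factor
    std1 = normalized_edges((1 + 0.22) / 2, osr_in, decim)  # RRC, roll-off 0.22
    std2 = normalized_edges(0.3, osr_in, decim)             # placeholder Gaussian beta
    shared = normalized_edges(1.0, osr_in, decim)           # boundary condition
    shared_atten_db = max(60.0, 75.0)                       # placeholder requirements
    print(shared, covers(shared, std1), covers(shared, std2), shared_atten_db)
```

The shared stage has a wider passband and an earlier stopband than either standard-specific stage, so it satisfies the antialiasing criterion for both at the cost of a narrower transition band.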

C. Proposed Factorization Strategy for Multistandard Channelization

The multistandard channelization accelerator design flow proposed in this section differs from the existing filter accelerator design paradigms in that commonalities in multiple channelization specifications are identified at the earliest possible stage in the design cycle. This allows a unified high level and architectural synthesis to be performed for mapping multiple modes onto the same underlying hardware. It also allows a significant portion of the accelerator to be hard wired and reused across multiple modes, with minimal overheads for reconfiguration and parameterization. The hard-wired portions of the accelerator are also able to make use of various constant-coefficient filter optimizations for reducing the area and power.

In the arbitrary factorization of the decimation ratio in (2), the number of decimation factors and the value of the decimation factors themselves can show considerable variation across standards. This section introduces an alternate factorization strategy that exploits the observations in the previous section, and maximizes the number of decimation filter stages that lie at the same OSR and decimate by the same factor, across multiple standards. Consider a factorization of a rational decimation factor, as shown here

(13)

where represents a standard dependent rational factor, while the factors , where , represent a set of integral factors that are common to all standards. The order of the factors in (13) also represents the actual order in which the multistage decimations are performed. Allowing the first factor to be rational allows any arbitrary rational factor to be factorized into the above form. Assume that each of the decimation factors corresponding to the supported modes has been factorized in the above form. Consider the case of the decimation stages which implement the decimation factor , where . In general, the band edges and alias attenuation of this stage, in each of the standards, are dependent on standard specific parameters, and hence the stage cannot be readily reused across multiple standards. However, it can be observed that in all the standards, this filter stage lies at the same input OSR , and decimates by the same factor . This implies that the standard dependent filter stages for the factor can be replaced by a modified filter, which satisfies the antialiasing criterion in the passband for all the standards. Hence the filter stages implementing the factors to can be hard wired and reused across all the supported modes without any changes. Only the last decimation stage, which decimates by the factor , has to be programmable to support variable band-edge specifications.

The factorization in (13) also requires decimation by a rational factor , prior to the integral decimations. Assume that the rational decimation factor can be further decomposed into an integral factor and a fractional decimation factor as shown

(14)

The fractional decimation factor is assumed to lie in the interval [1, 2). For integral decimation by in the first stage, cascaded integrator comb (CIC) filters are very efficient due to the absence of any multipliers operating at the ADC sampling rate [28]. Fractional decimation by the factor can be performed by a transpose Farrow structure based on classical interpolating polynomials, like Lagrange or B-splines [39], [40]. Both structures are advantageous from the perspective of supporting multiple standards, because of the ease of handling variable decimation factors and the absence of an explicit, standard dependent band edge. The zeroes of both these structures lie at the center of the potential aliasing bands. The worst case alias attenuation in a CIC filter is determined by its order, while that of the transpose Farrow structure depends on the underlying time domain polynomial which is used to approximate the sinc response of an ideal reconstruction filter [39]. Both the CIC order and the underlying polynomial of the transpose Farrow structure can be chosen to satisfy the worst case attenuation requirement among all the standards.

In a classical recursive CIC structure, changing the sample rate merely involves changing the downsampling clock in between the integrator and comb stages [28]. However, the recursive CIC structure has large wordlength adders in the integrators, which operate at the input oversampled rate, resulting in high power consumption [41]. When the CIC filter decimation factor is a power of two, the transfer function of a th order CIC filter can be factorized as follows:

(15)

The Noble identities can be used for commuting the subsequent downsampling operation with the filter stages for reducing the computational rate, resulting in a structure called the nonrecursive CIC filter (NRCIC) [42]. Assuming that the stages have been hard wired for the worst case CIC order, variable sample rates can be supported by inserting additional steering logic to bypass unwanted stages, as shown in Fig. 5.
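The factorization and the commutation of the downsamplers can be verified with integer arithmetic. The sketch below assumes the standard identity for a CIC filter of a given order whose decimation factor is a power of two: the boxcar of length 2^P raised to the order equals the product of the terms (1 + z^-(2^i)) raised to the same order, for i = 0 to P-1. It then checks that a cascade of identical (1 + z^-1)^order stages, each followed by downsampling by two, gives the same output as the direct filter followed by downsampling by 2^P. All function names are hypothetical, and the steering logic of Fig. 5 is not modelled.

```python
def poly_mul(a, b):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def poly_pow(a, k):
    out = [1]
    for _ in range(k):
        out = poly_mul(out, a)
    return out


def cic_direct(order, log2_decim):
    """Impulse response of the CIC filter: boxcar of length 2^P, to the order."""
    return poly_pow([1] * (2 ** log2_decim), order)


def cic_factored(order, log2_decim):
    """Product of (1 + z^-(2^i))^order for i = 0 .. P-1."""
    out = [1]
    for i in range(log2_decim):
        stage = [1] + [0] * (2 ** i - 1) + [1]
        out = poly_mul(out, poly_pow(stage, order))
    return out


def fir(x, h):
    """Causal FIR filtering with zero initial state."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def nrcic_decimate(x, order, log2_decim):
    """Nonrecursive form: P stages of (1 + z^-1)^order, each followed by /2."""
    h = poly_pow([1, 1], order)
    for _ in range(log2_decim):
        x = fir(x, h)[::2]
    return x


if __name__ == "__main__":
    x = [(7 * n) % 11 - 5 for n in range(64)]
    direct = fir(x, cic_direct(3, 3))[::8]
    print(direct == nrcic_decimate(x, 3, 3))
```

Each later stage of the nonrecursive form runs at half the rate of the previous one, which is where the reduction in computational rate comes from.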


Fig. 6. Transpose Farrow structure.

A Farrow structure is very efficient for fractional interpolation. It replaces the general requirement of a time-varying reconstruction filter with a structure that consists largely of linear time-invariant subfilters. The only time-varying components are the so-called fractional-interval multipliers, which share a common input called the fractional sample interval. The transpose Farrow structure can be obtained by a transposition of the traditional Farrow structure [40]. The zeroes of the resulting transfer function lie at the integral multiples of the output sample rate, and hence the structure is suitable for a fractional decimation operation. As long as the underlying interpolating polynomial is chosen to accommodate the worst case attenuation requirement, reconfiguring the transpose Farrow structure for a different fractional decimation factor merely consists of re-initializing the control word of the counter which generates the fractional interval value, as shown in Fig. 6 [29].
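The role of the control word can be illustrated with a simple accumulator model. The sketch below is an assumption about how such a counter may behave, not a description of the circuit in Fig. 6: for every input sample a modulo-1 accumulator advances by the reciprocal of the fractional decimation factor, its value serves as the fractional interval, and an overflow marks the instant at which an output sample is produced. The function name and the tuple layout are hypothetical.

```python
from fractions import Fraction


def fractional_intervals(decim, n_inputs):
    """Return [(mu, dump), ...] for n_inputs consecutive input samples.

    decim : fractional decimation factor in [1, 2), as a Fraction
    mu    : accumulator value after the update, always in [0, 1)
    dump  : True when the accumulator overflowed (an output sample is due)
    """
    if not 1 <= decim < 2:
        raise ValueError("fractional decimation factor must lie in [1, 2)")
    step = 1 / decim  # the only standard-dependent control word
    acc = Fraction(0)
    trace = []
    for _ in range(n_inputs):
        acc += step
        dump = acc >= 1
        if dump:
            acc -= 1
        trace.append((acc, dump))
    return trace


if __name__ == "__main__":
    for mu, dump in fractional_intervals(Fraction(3, 2), 6):
        print(mu, dump)
```

In this model, switching to a different fractional decimation factor only changes `step`; the subfilters that would consume the fractional interval are unaffected.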

The last stage filter, which decimates by the factor for all the modes, determines the overall standard specific band edge. Depending on the division of the pulse shaping load between the transmit and receive side, this filter might need to perform a pulse shaping function. When no pulse shaping is performed by the last stage, the stopband edge can be placed at the boundary of the channel bandwidth, while the passband edge is still determined by the transmit side pulse shaping filter. For RC pulse shaping at the transmit side, the passband edge and stopband edge in radians (normalized by ) can be given as follows:

(16)

(17)

where represents the interchannel spacing for the th standard. For Gaussian pulse shaping, the stopband edge is identical to (17), while the passband edge can be given as

(18)

When the last stage filter implements RRC pulse shaping, the passband edge is identical to (16), while the stopband edge is given by

(19)

A typical multistage decimation filter capable of multistandard support needs the individual stages to be reprogrammable to accommodate variable standard specific parameters like the band edge specification and the alias attenuation requirement. The proposed approach alleviates this problem, allowing all the stages prior to the last stage to be hard wired and reused across multiple modes, with very little reconfiguration overhead. For these stages, switching between different standards only requires the control words for the NRCIC filter in the first stage and the fractional sample interval generator of the transpose Farrow structure in the second stage to be parameterized. This allows the hard-wired stages to be implemented at a very low area and power cost, by making use of constant-coefficient filter optimizations. Only the last stage filter needs to be completely programmable and needs the use of generic multipliers.

IV. PRACTICAL DESIGN CONSIDERATIONS

Two crucial design issues still have to be addressed before the proposed factorization strategy can be used in a practical implementation: the optimum factorization of an arbitrary rational sample rate conversion factor in the form required by (13), and the identification of the ideal filtering structure for implementing the decimation stages.

A. Design Considerations for NRCIC Stage

While the NRCIC structure has the advantages of low complexity and low reconfiguration overhead, it suffers from a nonuniform passband. The magnitude response of a th-order CIC filter, which decimates by a factor of , is given as follows [28]:

(20)
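The passband droop can be quantified with the textbook CIC magnitude response. The sketch below assumes the common form, the ratio sin(pi f D) / sin(pi f) raised to the filter order, and additionally normalizes it to unity gain at DC; this normalization and the function names are assumptions and may differ from the scaling used in (20). The frequency f is in cycles per sample at the CIC input rate.

```python
import math


def cic_magnitude(f, decim, order):
    """Unity-DC-gain CIC magnitude at f cycles/sample (input rate)."""
    if f == 0:
        return 1.0
    ratio = math.sin(math.pi * f * decim) / (decim * math.sin(math.pi * f))
    return abs(ratio) ** order


def droop_db(f_pass_out, decim, order):
    """Droop (dB) at a passband edge given in cycles/sample at the output rate."""
    return -20.0 * math.log10(cic_magnitude(f_pass_out / decim, decim, order))


if __name__ == "__main__":
    # Hypothetical case: fifth-order CIC, decimation by 16,
    # passband edge at one eighth of the CIC output rate.
    print(round(droop_db(0.125, 16, 5), 2))
```

For this hypothetical case the droop at the passband edge is a little over 1 dB, which is the kind of deviation a droop correction filter would have to equalize.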

For low frequencies, can be approximated by . The passband droop can be compensated by cascading a droop correction filter as shown in Fig. 7(a). Using Noble identities, the decimation factor can be commuted to the right of the , as shown in Fig. 7(b). The compensated CIC transfer function is given by . If has a magnitude response of , within the intervals , the interpolated filter has a magnitude response of within the passband interval , and hence can correct the CIC droop within this interval. However, the passband edge being a function of the standard dependent pulse shaping parameters, the droop correction filter needs to be programmable to support multiple standards. The proposed factorization strategy in (13) guarantees a minimum OSR of at the output of the


Fig. 7. CIC droop correction.

CIC stage, which can be exploited for hard wiring and reusing the droop correction filter across multiple standards, as shown below.

The input sample rate to the CIC stage can be expressed in terms of the symbol rate , as follows:

(21)

Preserving the integrity of the signal within the passband interval requires to be greater than or equal to , as per the Nyquist criterion [43]. Taking this constraint into account, (21) can be used to derive the following inequalities:

(22)

(23)

where represents the passband edge of the th standard, in radians (normalized by ). Since the fractional decimation load lies in the interval [1, 2), (23) can be reformulated as follows:

(24)

Hence if the droop correction filter is designed to have a passband edge of , the interpolated response compensates the CIC response for all frequencies below . Such a filter satisfies the droop correction requirement over a band wider than the passband of any standard, as indicated by (24), and hence can be hard wired and reused by all standards.

B. Design Considerations for Fractional Decimation

The proposed factorization in (13) requires a fractional decimation stage to be present in the decimation chain to factorize arbitrary rational factors into the required regular form. The transpose Farrow structure can be used for performing the fractional decimation operation, as it attenuates the potential alias components [40]. Transpose Farrow structures based on classical interpolating polynomials like Lagrange or B-splines have the major drawback of having a poor wideband response. Hence they may not be able to sufficiently attenuate potential alias components if the fractional decimation is performed in a low OSR environment. The fixed factorization design strategy requires the fixed decimation stages to be designed for the maximum alias attenuation requirement among all supported standards.

Fig. 8. Frequency response of a transpose Farrow Structure.

An ideal reconstruction filter has a brickwall frequency response which removes all the spectral images in the input. The corresponding time domain response is a sinc function, which has an infinite temporal support. The Farrow structure uses piecewise polynomials with a finite temporal support to approximate the ideal sinc response. Fig. 8 plots the magnitude response of the transpose Farrow structure for different underlying polynomials. The x axis shows the frequency normalized with respect to the sampling rate at the output of the fractional decimation stage, . The zeroes of the transpose Farrow structure lie around the integral multiples of , which are also the midpoints of the aliasing intervals. The worst case attenuation for the th standard occurs at , which is the starting boundary of the first alias interval, normalized by the output sample rate. Since has to be satisfied to prevent aliasing within the passband, a relationship between and the output OSR, , can be established as follows:

(25)

(26)

Reducing the output OSR of the fractional decimation stage shifts away from the first zero of the transpose Farrow structure, resulting in a lower alias attenuation. Increasing has the inverse effect of shifting towards the first zero, thereby increasing the degree of stopband attenuation. Fig. 8 indicates that for a given output OSR, the alias attenuation is also a function of the specific class of the interpolating polynomial used for designing the transpose Farrow structure. Among the classical interpolating polynomials, B-splines offer the closest approximation to the ideal sinc response and the highest alias attenuation for the same temporal support. The polynomial has to be chosen such that for all supported standards, the transpose Farrow structure offers a sufficient amount of alias attenuation for the required output OSR of .
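The dependence of the alias attenuation on the output OSR can be illustrated with the B-spline kernel alone. The sketch below relies on the known result that the Fourier transform of a B-spline of degree n is the sinc function raised to the power n + 1, with frequency normalized to the output sample rate. It ignores any prefilter and any passband droop, and it assumes that the protected band edge equals `beta` times the symbol rate, with the worst case `beta = 1`; these simplifications and the function names are assumptions, so the numbers indicate the trend rather than reproduce Fig. 8.

```python
import math


def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def bspline_kernel_response(f_norm, degree):
    """|sinc(f)|^(degree+1); f_norm is frequency / output sample rate."""
    return abs(sinc(f_norm)) ** (degree + 1)


def worst_case_alias_attenuation_db(output_osr, degree, beta=1.0):
    """Attenuation at the start of the first alias interval, 1 - beta/OSR."""
    f_alias = 1.0 - beta / output_osr
    return -20.0 * math.log10(bspline_kernel_response(f_alias, degree))


if __name__ == "__main__":
    for osr in (2, 4, 8):
        print(osr,
              round(worst_case_alias_attenuation_db(osr, 1), 1),   # linear
              round(worst_case_alias_attenuation_db(osr, 3), 1))   # cubic
```

Raising either the output OSR or the polynomial degree moves the worst case frequency closer to the first zero or deepens the notch, respectively, which is the trade-off discussed above.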

C. Partitioning of the Decimation Load

Partitioning the arbitrary rational factor into multiple stages needs to take the specific constraints of available


TABLE I
CHANNELIZATION PARAMETERS

decimation filter structures into account. The nonrecursive implementation of the CIC structure needs the CIC decimation factor to be a power of two. For the subsequent integral decimation, halfband filters are highly efficient structures for a decimation factor of two [44]. Halfband filters have an advantage over generic FIR decimation stages, due to the fact that nearly 50% of the filter coefficients are zero. When the coefficient symmetry due to a linear phase is also taken into account, only about 25% of the filter taps need to be actually implemented. Hence they have relatively low area and power costs when compared to generic FIR filters. Halfband filters, however, have the limitation that the passband and stopband edges are constrained to be symmetrical around π/2 radians, and hence cannot be used for stages whose band edges do not satisfy this constraint. Performing the bulk of the decimation load using the NRCIC filter and halfband filters requires the decimation factors of two or powers of two to be extracted from . This can be achieved by splitting into fractional and integral factors as follows:

(27)

where . The factor represents the largest power of two less than or equal to . The fractional decimation factor lies in the interval [1, 2) and can be assigned to the transpose Farrow structure. The integral factor needs to be further split between the first stage NRCIC filter and the subsequent integral decimation stages after the transpose Farrow structure. The split of the integral load is determined by the output OSR required by the chosen transpose Farrow structure for satisfying the antialiasing criterion for all the standards. Consider a scenario where a simple polynomial is used for implementing the transpose Farrow structure, which offers the required degree of alias attenuation only at a higher output OSR. This reduces the decimation load on the NRCIC stage while increasing the decimation load after the fractional decimation stage, necessitating an increased number of hard-wired filter stages after the fractional decimation. Conversely, a higher order polynomial may offer the required degree of alias attenuation even at low OSRs, which shifts the bulk of the integral decimation load to the NRCIC stage. If the required output OSR is , the decimation load of the NRCIC stage can be given by . The decimation by a factor of after the fractional decimation stage can be performed by halfband stages. The final decimation by a factor of two can be performed by a programmable FIR filter, which provides a standard dependent band edge.
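The partitioning described above can be sketched as follows. The split is an interpretation of the text rather than a transcription of its expressions: the largest power of two not exceeding the overall factor is extracted, the remainder in [1, 2) goes to the transpose Farrow structure, the NRCIC load is taken to be that power of two divided by the required output OSR, and the remaining load is covered by halfband stages plus a final programmable decimate-by-two FIR filter. The function name, the dictionary keys, and the factor 96 in the usage note are hypothetical.

```python
from fractions import Fraction


def partition(src_factor, output_osr):
    """Split an overall rational decimation factor into per-stage loads.

    src_factor : overall decimation factor (Fraction, >= output_osr)
    output_osr : OSR required at the Farrow output (power of two, >= 2)
    """
    if output_osr < 2 or output_osr & (output_osr - 1):
        raise ValueError("output OSR must be a power of two >= 2")
    p = 0
    while Fraction(2) ** (p + 1) <= src_factor:
        p += 1
    integral = 2 ** p                       # largest power of two <= factor
    if integral < output_osr:
        raise ValueError("overall factor too small for this output OSR")
    return {
        "nrcic": integral // output_osr,            # first stage
        "farrow": src_factor / integral,            # in [1, 2)
        "halfbands": output_osr.bit_length() - 2,   # log2(OSR) - 1 stages
        "final_fir": 2,                             # programmable last stage
    }


if __name__ == "__main__":
    print(partition(Fraction(96), 4))
```

For the hypothetical factor 96 and an output OSR of 4, this gives an NRCIC load of 16, a fractional load of 3/2, one halfband stage, and the final decimation by two.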

V. EXPERIMENTAL SYNTHESIS RESULTS

In this section, we demonstrate a practical implementation of a multistandard channelization accelerator, based on the proposed design paradigm, and compare its area to that of a traditional Velcro implementation. The standards that have been considered for the current analysis are Global System for Mobile Communications (GSM), Wideband Code Division Multiple Access (W-CDMA), IEEE 802.11a, and Worldwide Interoperability for Microwave Access (WiMAX). The design of the channelization function is strongly coupled to the design of the analog front-end and the ADCs. We assume that a programmable zero-IF analog front-end architecture is used to downconvert the channels of interest to baseband. In line with the fixed digitization approach [10], the bandwidth of the low pass analog baseband filter is assumed to be fixed according to the widest possible channel bandwidth (10 MHz, for WiMAX and IEEE 802.11a). This low pass filter removes all possible out of band interferers. Only the interferers within this band are considered for designing the channelization accelerator. The A/D conversion is assumed to be performed by the multimode ADC proposed in [45], which is capable of handling GSM, W-CDMA, IEEE 802.11a, and WiMAX standards. The channelization and sample rate conversion specifications of the individual standards are given in Table I. The decimation filter stages of each standard have to be designed such that the interferers and quantization noise, unattenuated by the previous stages, are attenuated to the standard dependent noise floor. To demonstrate the area overheads of a Velcro style multistandard accelerator, four different multistage decimation filters were implemented, corresponding to each of the four standards under consideration. A cubic B-spline based transpose Farrow structure was used for performing the fractional decimation in all the standards. This filter satisfies the alias attenuation requirement of all the modes for an output OSR of 4. Spline interpolation belongs to a general class of interpolation, in which the signal is reconstructed using interpolation coefficients obtained by passing the input samples through a prefilter [39]. The cubic B-spline prefilter is an unstable IIR filter, which cannot be implemented directly. A linear phase FIR approximation of the prefilter [46] has been used in the current implementation.

The decimation load in each of the single mode decimation filters is partitioned according to the factorization in Table II. The integral decimation in the first stage is performed by a fifth order NRCIC filter for all the modes. The droop correction filter of the NRCIC stage and the cubic B-spline prefilter in the second stage can be combined into a single filter, whose coefficients can be obtained by a convolution of the above two filter responses. The final decimation by a factor of four is performed by a hard-wired constant-coefficient halfband stage and a hard-wired FIR stage, in each of the four standards. Halfband filters are much more efficient than generic FIR filters for decimation by a factor of two, as nearly 50% of the coefficients are zero. However, the band edges of these filters are constrained to be symmetric around π/2. The last FIR decimation stage provides the standard dependent band edge. For W-CDMA and WiMAX standards, the


Fig. 9. Proposed multistandard decimation filter.

TABLE II
FACTORIZATION OF SRC FACTOR

last stage FIR filters also perform an RRC pulse shaping function. The hard-wired filter stages have been optimized by the multiple constant multiplication (MCM) algorithm [15].

A multistandard decimation filter was also designed using the methodology proposed in the current work, as shown in Fig. 9. It uses the same factorization scheme suggested in Table II, and the same cubic B-spline based transpose Farrow structure that was used for fractional decimation in the single mode decimation filters. Hence in any particular mode, both the single mode and the proposed multistandard implementations would have the same number of stages, the main difference being that in the single mode case all stages are hard wired, whereas in the multistandard case some of the stages are programmable/configurable. The integral decimation in the first stage was performed by a programmable NRCIC filter. The decimation by a factor of four was performed by a constant-coefficient hard-wired halfband filter and a programmable time-shared FIR filter, which provides the standard dependent band edge. The band-edge modification strategy proposed in the current work allows the same halfband filter to be hard wired and reused by all standards without any reconfiguration overheads. The hard-wired stages in the multistandard accelerator are also optimized using the MCM algorithm [15]. The last stage FIR filter decimates by a factor of two, and is implemented in polyphase form with each of the two subfilters mapped onto a single MAC unit. A register file capable of storing 16 16-bit coefficients was instantiated for each subfilter. Both the Velcro style accelerator and the proposed multistandard accelerator were implemented using a TSMC 0.18-μm process. Their cell areas were estimated using the Synopsys Design Compiler. The area estimates in terms of the gate count in Table III were obtained by normalizing these cell areas by the cell area of a two input NAND gate of the same library.
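The polyphase form of the last stage can be illustrated behaviourally. The sketch below is a generic two-phase decomposition of a decimate-by-two FIR filter, in which the even and odd coefficient sets play the role of the two subfilters that would each be time-shared on one MAC unit; it is not a model of the actual hardware, and the function names and the 32-tap test filter are hypothetical (32 taps correspond to 16 coefficients per subfilter).

```python
def fir_decimate_direct(x, h):
    """Filter at the input rate, then keep every second output sample."""
    y = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
         for n in range(len(x))]
    return y[::2]


def fir_decimate_polyphase(x, h):
    """Two subfilters running at the output rate (one MAC loop each)."""
    h_even, h_odd = h[0::2], h[1::2]
    x_even = x[0::2]         # x[2m]
    x_odd = [0] + x[1::2]    # x[2m - 1], with x[-1] taken as zero
    out = []
    for m in range(len(x_even)):
        acc0 = sum(h_even[k] * x_even[m - k]
                   for k in range(len(h_even)) if m - k >= 0)
        acc1 = sum(h_odd[k] * x_odd[m - k]
                   for k in range(len(h_odd)) if m - k >= 0)
        out.append(acc0 + acc1)
    return out


if __name__ == "__main__":
    h = [(3 * k) % 7 - 3 for k in range(32)]   # 16 coefficients per subfilter
    x = [(5 * n) % 13 - 6 for n in range(100)]
    print(fir_decimate_direct(x, h) == fir_decimate_polyphase(x, h))
```

The polyphase form never computes the discarded output samples, so each subfilter only needs to complete its 16 multiply-accumulate operations once per output sample.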

The experimental data shows that, for the four standards under consideration, the proposed multistandard channelization accelerator approach can result in about a 65% reduction in area when compared to a Velcro-style multistandard accelerator implementation. With an increasing number of standards, the achieved reductions with respect to a Velcro style implementation can be expected to increase. The area advantage over the Velcro approach comes at the expense of increased power consumption and reconfiguration time. A Velcro style multistandard accelerator requires only

TABLE IIIGATE COUNT OF DIFFERENT CHANNELIZATION ACCELERATOR

IMPLEMENTATIONS

TABLE IVCHANNELIZATION COMPUTATIONAL LOAD

a small control word to switch between multiple single-modeaccelerators. In the proposed approach, the last stage filter iscompletely programmable, while both the NRCIC and transposeFarrow structure need a control word for parameterization.
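The gate-count normalization and the relative area saving discussed above reduce to two lines of arithmetic, sketched below. The cell areas and the NAND-gate area used here are placeholder numbers, not the Table III values and not data from any cell library. They are chosen only so that the example lands on a 65% reduction.

```python
def gate_count(cell_area_um2, nand2_area_um2):
    """Normalize a synthesized cell area by the area of a two-input NAND gate."""
    return cell_area_um2 / nand2_area_um2


def area_reduction_percent(velcro_gates, proposed_gates):
    """Relative saving of the proposed accelerator with respect to the Velcro one."""
    return 100.0 * (1.0 - proposed_gates / velcro_gates)


# Placeholder inputs (hypothetical, for illustration only).
NAND2_AREA_UM2 = 10.0
velcro = gate_count(1_000_000.0, NAND2_AREA_UM2)    # four stacked single-mode cores
proposed = gate_count(350_000.0, NAND2_AREA_UM2)    # one shared multistandard core
saving = area_reduction_percent(velcro, proposed)
```

Because both designs are normalized by the same NAND-gate area, the percentage saving is independent of the normalization constant. The gate count mainly makes the result easier to compare across libraries.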

The proposed structure will always be more expensive in terms of dynamic power consumption when compared to a Velcro style accelerator, due to the additional overheads of having a programmable FIR stage and a programmable NRCIC stage. The increased dynamic power consumption in the proposed multistandard channelization accelerator is primarily contributed by the programmable FIR filter in the last stage. This stage decimates by a factor of two, and is implemented in polyphase form, with each subfilter mapped to a single MAC FIR filter. Temporal style FIR filter implementations experience significantly higher switching activity when compared to spatial style implementations due to loss of input correlation, resulting in increased dynamic power. A number of power optimization strategies, like coefficient reordering and coefficient segmentation, have targeted the reduction of switching activity in MAC FIR filters [30]. Other works have focused on using increased parallelism and algorithmic strength reduction for reducing the power consumption of programmable time-shared filters [47], [48]. The use of these optimizations can further reduce the power consumption gap between the single-mode accelerators and the proposed multistandard accelerator. Hence a Velcro style multistandard accelerator with the application of power optimization strategies like clock gating and power gating can potentially have a lower dynamic and leakage power than the proposed accelerator in any single mode. However, the proposed design is scalable in terms of area when the number of standards is increased, as evident from Table III.

TABLE V: DYNAMIC POWER CONSUMPTION IN DIFFERENT MODES

TABLE VI: FACTORIZATION OF SRC FACTOR FOR WIMAX

When compared to the filter coprocessor approach, the proposed accelerator will always be more power efficient, as it offloads a significant portion of the computational load from generic MAC units to hard-wired filter stages, which can exploit constant-coefficient optimization algorithms to perform the coefficient multiplications at very low power consumption. The power consumption of the coprocessor method is highly dependent on a number of architectural parameters: the degree of parallelism, the temporal correlation effects due to time multiplexing, the number of memory accesses, the ability to exploit the linear phase property and polyphase decomposition, the control overheads, etc. A coarse, architecture-independent estimate of the coprocessor power consumption can be obtained using the number of MAC operations per second (MAC/s) in a particular mode and the energy per operation of a generic MAC unit. The computational load of the channelization accelerator [in terms of million MAC operations per second (MMAC/s)] has been listed for different operational modes in Table IV.

A MAC circuit comprising a 16 × 16 bit multiplier and a 32-bit accumulator was synthesized using the TSMC 0.18-μm process. The energy of a single MAC operation was estimated using the Synopsys Design Compiler as 164.268 pJ. Using the computational load data from Table IV, the estimated power consumption of the coprocessor approach is shown in Table V, along with the power consumption of the Velcro style accelerator and the proposed multistandard accelerator.
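The coarse coprocessor estimate described above is simply the MAC rate multiplied by the energy per MAC operation. The sketch below uses the 164.268 pJ figure from the text. The 100 MMAC/s load is a hypothetical input, since the Table IV values are not reproduced here, and the function name is an illustrative choice.

```python
ENERGY_PER_MAC_PJ = 164.268  # energy of one 16 x 16 MAC operation, from the text


def coprocessor_power_mw(load_mmac_per_s, energy_pj=ENERGY_PER_MAC_PJ):
    """Coarse dynamic power estimate: (MAC operations per second) x (energy per MAC).

    (load * 1e6 MAC/s) * (energy * 1e-12 J) gives watts; scaling to
    milliwatts leaves an overall factor of 1e-3.
    """
    return load_mmac_per_s * energy_pj * 1e-3


# Hypothetical computational load, not a Table IV entry.
example_mw = coprocessor_power_mw(100.0)   # about 16.43 mW
```

The estimate ignores memory accesses, control overheads and switching-activity effects, so it should be read as the architecture-independent approximation the text describes rather than as a measured figure.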

Table V reveals that the proposed multistandard accelerator significantly outperforms a filter coprocessor based accelerator implementation in terms of power consumption. The dynamic power consumption of the proposed structure is slightly higher than that of the Velcro approach, due to the programmability in the NRCIC stage and the last stage filter. When compared to a Velcro style multistandard accelerator, the proposed architecture has a greater potential for in-field upgrades for supporting a new standard, which is one of the important features envisioned in an SDR. Note that the proposed factorization scheme implicitly assumes that the SRC factor is greater than or equal to the factor . It is usually possible to identify a set of factors, to such that is smaller than , for all the standards under consideration. For instance, it can be seen that for the four standards considered here, , is greater than . The presence of the NRCIC stage and the transpose Farrow stage, which handle the standard dependent rational decimation load of , allows any arbitrary SRC factor greater than or equal to 4 to be factorized into the regular form required by (13). Hence the reusability of the hard-wired NRCIC stage, the transpose Farrow stage and the halfband stage in the proposed multistandard channelization accelerator for a new standard is limited only by the following two factors:

— The SRC factor is greater than 4.
— The hard-wired filter stages satisfy the alias attenuation requirement of the new standard.

The fixed decimation load, , after the fractional decimation stage is a design parameter that has to be chosen by the designer after taking into consideration the ADC architecture, the expected SRC factors for different standards and the alias attenuation offered by the chosen fractional decimation stage.

Reusing the above multistandard accelerator for a new standard can be illustrated by the following example. Assume that the SDR handset has to be upgraded to support the Long Term Evolution (LTE) standard (20-MHz bandwidth) [49]. The 20-MHz LTE channel bandwidth is identical to the channel bandwidths of the WiMAX and IEEE 802.11a standards, so the LTE mode can reuse the same multimode ADC [45] used for the above design. The symbol rate corresponding to the 20-MHz channel is 30.72 MSymbol/s. The resultant SRC factor can be mapped onto the multistage decimation filter as shown in Table VI. It can be seen that the NRCIC stage is completely bypassed in the LTE mode. The configuration information for bypassing the NRCIC stage, the configuration information for the Farrow structure and the coefficients for the last stage programmable FIR filter have to be supplied through an in-field upgrade. The hard-wired halfband stage can be reused without any reconfiguration overhead.
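A mapping of this kind can be sketched as follows, under stated assumptions. The fixed load of 4 (halfband by 2, last stage FIR by 2) follows the text. The split of the remaining load into an integer NRCIC factor (taken here as the floor) and a fractional Farrow ratio is an assumed rule for illustration, not necessarily the scheme of Table II. The 160-MHz ADC output rate is a hypothetical value, since the ADC rate is not stated in this section.

```python
from fractions import Fraction
from math import floor

FIXED_LOAD = 4  # halfband (x2) followed by the last stage FIR (x2), per the text


def factorize_src(adc_rate_hz, symbol_rate_hz, fixed_load=FIXED_LOAD):
    """Split an SRC factor into (NRCIC integer factor, Farrow ratio, fixed load).

    Assumed rule: the load left after removing the fixed stages is split into
    its integer part (NRCIC) and a ratio in [1, 2) (Farrow).
    """
    src = Fraction(adc_rate_hz, symbol_rate_hz)
    if src < fixed_load:
        raise ValueError("SRC factor must be at least the fixed decimation load")
    variable = src / fixed_load      # standard dependent rational load
    nrcic = floor(variable)          # 1 means the NRCIC stage is bypassed
    farrow = variable / nrcic        # fractional decimation ratio
    return src, nrcic, farrow


# LTE, 20-MHz channel: 30.72 MSymbol/s (from the text); ADC rate is hypothetical.
src, nrcic, farrow = factorize_src(160_000_000, 30_720_000)
```

With these inputs the SRC factor is 125/24 (about 5.21), the NRCIC factor is 1 and the Farrow ratio is 125/96 (about 1.30). Under the assumed rule the NRCIC stage is therefore bypassed, which matches the behavior described for the LTE mode.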

VI. CONCLUSION

Emerging communication paradigms require radio terminals to provide seamless mobility across multiple access networks. HW accelerators which implement the computationally intensive baseband kernels are typically optimized for a single standard and hence may limit the flexibility of a multistandard SDR. A traditional Velcro style accelerator implementation of the computationally intensive channelization function is not scalable for supporting an arbitrary number of standards. Reducing the silicon area penalty for providing multistandard support requires the opportunities for hardware reuse to be identified across multiple standards. In a single mode channelization accelerator, the filter stages of the multistage decimation filter that performs the channelization function can be efficiently implemented at a low area/power cost with the help of constant-coefficient filter optimizations. However, the ability to reuse these hard-wired filter stages for a different standard is limited by the dependencies on the standard specific band edge specifications, the attenuation requirements and the factorization of the SRC loads.

In the current paper, we have proposed a new factorization strategy to split arbitrary SRC loads onto a fixed set of decimation filter stages. The proposed method has the advantage that the stages prior to the last stage have very low reconfiguration overheads, and can be readily reused across multiple standards. Only the last stage FIR filter needs to be reprogrammable for supporting variable coefficient sets and filter lengths. The reusable stages prior to the last stage also benefit from the area reductions obtained by constant-coefficient filter optimizations. The proposed multistandard decimation filter thereby has a silicon area which is close to that of a single-mode decimation filter, while still being scalable enough to support an arbitrary number of standards.

ACKNOWLEDGMENT

The authors would like to acknowledge the help from Dr. S. Sreekumar in obtaining the hardware synthesis results.

REFERENCES

[1] C. H. Stapper, “The effects of wafer to wafer defect density variations on integrated circuit defect and fault distributions,” IBM J. Res. Develop., vol. 29, no. 1, pp. 87–97, Jan. 1985.

[2] V. Rodriguez, C. Moy, and J. Palicot, “Install or invoke?: The optimal tradeoff between performance and cost in the design of multi-standard reconfigurable radios,” Wiley InterSci. Wireless Commun. Mobile Comput. J., vol. 7, no. 9, pp. 1143–1156, 2007.

[3] J. Mitola, “The software radio architecture,” IEEE Commun. Mag., vol. 33, no. 5, pp. 26–69, 1995.

[4] J. A. Kilpatrick, R. J. Cyr, E. L. Org, and G. Dawe, “New SDR architecture enables ubiquitous data connectivity,” RF and Microw. Technol. Design Eng., Jan. 2006.

[5] J. M. Rabaey, “Silicon platforms for the next generation wireless systems—What role does reconfigurable hardware play?,” in Proc., 2000, pp. 277–85.

[6] Y. Neuvo, “Cellular phones as embedded systems,” in Proc. IEEE Int. Solid-State Circuits Conf., Feb. 2004, vol. 1, pp. 32–37.

[7] M. Eteläperä and J. P. Soininen, “4G Mobile Terminal Architecture,” VTT Tech. Rep., Nov.–Oct. 2007.

[8] W. Wolf, A. A. Jerraya, and G. Martin, “Multiprocessor System-on-Chip (MPSoC) Technology,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 27, no. 10, pp. 1701–1713, Oct. 2008.

[9] C. H. van Berkel, “Multi-core for mobile phones,” in Proc. Design, Autom. Test in Eur. Conf. (DATE’09), Apr. 2009, pp. 1260–1265.

[10] T. Hentschel, M. Henker, and G. P. Fettweis, “The digital front-end of software radio terminals,” IEEE Pers. Commun., vol. 6, no. 4, pp. 40–46, Aug. 1999.

[11] Y. Shin, S. Heo, H. Kim, and J. Choi, “Supply switching with ground collapse: Simultaneous control of subthreshold and gate leakage current in nanometer-scale CMOS circuits,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 15, no. 7, pp. 758–766, Jul. 2007.

[12] A. P. Vinod and E. M. K. Lai, “On the implementation of efficient channel filters for wideband receivers by optimizing common subexpression elimination methods,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 24, no. 2, pp. 295–304, Feb. 2005.

[13] M. Potkonjak et al., “Multiple constant multiplications: Efficient and versatile framework and algorithms for exploring common subexpression elimination,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 15, no. 2, pp. 151–165, 1996.

[14] A. G. Dempster and M. D. Macleod, “Use of minimum-adder multiplier blocks in FIR digital filters,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 42, no. 9, pp. 569–577, Sep. 1995.

[15] Y. Voronenko and M. Püschel, “Multiplierless multiple constant multiplication,” ACM Trans. Algorithms, vol. 3, no. 2, 2007.

[16] C. H. Chang, J. Chen, and A. P. Vinod, “Information theoretic approach to complexity reduction of FIR filter design,” IEEE Trans. Circuits Syst. I, vol. 55, no. 8, pp. 2310–2321, Sep. 2008.

[17] N. Michael, A. P. Vinod, C. Moy, and J. Palicot, “Design paradigm for standard agnostic channelization in flexible mobile radios,” in Proc. IEEE Int. Symp. Circuits Syst., Paris, France, May–Jun. 2010.

[18] K. Uyttenhove and M. Steyaert, “Speed-power-accuracy trade off in high speed ADC’s,” IEEE Trans. Circuits Syst. II, vol. 4, pp. 247–257, Apr. 2002.

[19] A. Rusu, A. Borodenkov, M. Ismail, and H. Tenhunen, “A triple-mode sigma-delta modulator for multi-standard wireless radio receivers,” Analog Integr. Circuits Signal Process., vol. 47, no. 2, pp. 113–124, May 2006.

[20] A. Silva, N. H. Horta, and J. G. Guilherme, “Reconfigurable multi-mode sigma-delta modulator for 4G mobile terminals,” Integr., The VLSI J., vol. 42, no. 1, pp. 34–46, Jan. 2009.

[21] O. Shoaei, “Continuous-time delta-sigma A/D converters for high speed applications,” Ph.D. dissertation, Carleton Univ., 1995.

[22] K. Gentile, Digital Pulse Shaping Filter Basics, Analog Devices, Appl. Note AN-992, pp. 1–12.

[23] T. S. Rappaport, Wireless Communications—Principles and Practice, 2nd ed. New Delhi, India: Prentice-Hall, 2004.

[24] L. R. Rabiner, Multirate Digital Signal Processing. Upper Saddle River, NJ: Prentice-Hall, PTR, 1996.

[25] M. Bellanger, “On computational complexity in digital filters,” in Proc. The European Conf. Circuit Theory Design, Aug. 1981, pp. 58–63.

[26] N. S. Kim et al., “Leakage current: Moore’s law meets static power,” Computer, vol. 36, no. 12, pp. 68–75, Dec. 2003.

[27] C. Xu et al., “Order-configurable programmable power efficient FIR filters,” in Proc. Int. Conf. on High Perf. Comput., Dec. 1996, pp. 357–361.

[28] E. B. Hogenauer, “An economical class of digital filters for decimation and interpolation,” IEEE Trans. Acoust. Speech Signal Process., vol. ASSP-29, no. 2, pp. 155–162, Apr. 1981.

[29] J. Vankka, Digital Synthesizers and Transmitters for Software Radio. New York: Springer-Verlag, 2005, pp. 239–257.

[30] A. T. Erdogan and T. Arslan, “Low power FIR filter implementations based on coefficient ordering algorithm,” in Proc. IEEE Comput. Soc. Ann. Symp. VLSI: Emerging Trends in VLSI Syst. Design, 2004, p. 226.

[31] Synopsys Inc., DW_fir: High-Speed Digital FIR Filter, Jun. 2009.

[32] T. K. Shahana et al., “Decimation filter design toolbox for multi-standard wireless transceivers using MATLAB,” Int. J. Signal Process., vol. 5, no. 2, p. 154, 2009.

[33] C. J. Barrett, “Low-power decimation filter design for multi-standard,” Univ. Calif., Berkeley, CA, Tech. Rep. No. UCB/ERL M97/88, 1997.

[34] A. Ghazel, L. Naviner, and K. Grati, “On design and implementation of a decimation filter for multistandard wireless transceivers,” IEEE Trans. Wireless Commun., vol. 1, no. 4, pp. 558–562, Oct. 2002.

[35] Z. Tao and S. Signell, “Multi-standard delta-sigma decimation filter design,” in Proc. IEEE Asia Pacific Conf. Circuits Syst., Dec. 2006, pp. 1212–1215.

[36] T. Hentschel and G. P. Fettweis, “Sample rate conversion for software radio,” IEEE Commun. Mag., vol. 38, no. 8, pp. 142–150, Aug. 2000.

[37] S. Mirabbasi and K. Martin, “IIR digital filter for ΔΣ decimation, channel selection, and square-root raised-cosine Nyquist filtering,” in Proc. IEEE Int. Solid-State Circuits Conf., 2002, vol. 2, pp. 96–417.

[38] Wireless Communications Specifications, RF Cafe [Online]. Available: http://www.rfcafe.com/references/electrical/wireless-comm-specs-new.htm

[39] A. Tkacenko, “Variable sample rate conversion techniques for the Advanced Receiver,” Interplanetary Netw. Progr. Rep., vol. 42–168, Feb. 15, 2007.

[40] T. Hentschel and G. Fettweis, “Continuous-time digital filters for sample-rate conversion in reconfigurable radio terminals,” Frequenz, vol. 55, pp. 185–188, 2001.

[41] Y. Gao, L. Jia, J. Isoaho, and H. Tenhunen, “A comparison design of comb decimators for sigma-delta analog-to-digital converters,” Analog Integr. Circuits Signal Process., vol. 22, pp. 51–60, 1999.

[42] H. Aboushady et al., “Efficient polyphase decomposition of comb decimation filters in ΣΔ analog-to-digital converters,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 48, no. 10, pp. 898–903, Oct. 2001.


[43] J. G. Proakis and D. G. Manolakis, Digital Signal Processing. New York: Pearson Education, 2004.

[44] P. P. Vaidyanathan, Multirate Systems and Filter Banks. Englewood Cliffs, NJ: Prentice-Hall, PTR, 1993.

[45] A. Rusu et al., “Reconfigurable ADCs enable smart radios for 4G wireless connectivity,” IEEE Circuits Dev. Mag., vol. 22, no. 3, pp. 6–11, 2006.

[46] M. Unser and M. Eden, “FIR approximations of inverse filters and perfect reconstruction filter banks,” Signal Process., vol. 36, no. 2, pp. 163–174, Mar. 1994.

[47] N. Michael, C. Moy, A. P. Vinod, and J. Palicot, “Area-power tradeoffs for flexible filtering in green radios,” J. Commun. Netw., vol. 12, no. 2, pp. 158–167, Apr. 2010.

[48] N. Michael, A. P. Vinod, C. Moy, and J. Palicot, “Area-efficient time-shared FIR filters in nanoscale CMOS,” in Proc. IEEE Int. Conf. Green Circuits Syst., Jun. 2010, pp. 54–59.

[49] H. Tarn, E. Hemphill, and D. Hawke, 3GPP LTE Digital Front End Reference Design, Xilinx Application Note XAPP1123 (v1.0), Oct. 2008.

Navin Michael (S’XX) was born on May 29, 1985, in Chennai, India. He received the B.Tech. degree in information and communication technology from DAIICT, Gandhinagar, India, in 2007.

Since August 2007, he has been pursuing the Ph.D. degree at Nanyang Technological University, Singapore. His research focuses on the flexible and low-power implementation of the digital front-end in multimode software defined radios. From June 2008 to November 2008 and from September 2009 to February 2010, he was a Visiting Researcher with the SCEE team of SUPELEC/IETR, Rennes, France. These visits were supported by the French Embassy of Singapore, as a part of the Merlion Ph.D. Grant 2007. His major interests are software defined radios, cognitive radios, green communications, low-power signal processing hardware, and channelization.

A. P. Vinod (SM’XX) received the B.Tech. degree in instrumentation and control engineering from the University of Calicut, India, in 1994 and the M.Eng. and Ph.D. degrees in computer engineering from Nanyang Technological University (NTU), Singapore, in 2000 and 2004, respectively.

He spent the first five years of his career in industry as an automation engineer with Kirloskar, Bangalore, India; Tata Honeywell, Pune, India; and Shell Singapore. From September 2000 to September 2002, he was a lecturer with the School of Electrical and Electronic Engineering, Singapore Polytechnic, Singapore. He was a lecturer with the School of Computer Engineering, NTU, from September 2002 to November 2004, and since December 2004, he has been an Assistant Professor with NTU. His research interests include digital signal processing (DSP), low-power and reconfigurable DSP circuits, software radio, cognitive radio, and brain-computer interface. He has published 100 papers in refereed international journals and conferences.

Dr. Vinod is an editor of the International Journal of Advancements in Computing Technology.

Christophe Moy (M’XX) received the engineer diploma of the National Institute of Applied Sciences (INSA), Rennes, France, in 1995. He received the M.Sc. and Ph.D. degrees in electronics in 1995 and 1999, respectively, from the INSA.

He was then with the Mitsubishi Electric ITE-TCL Research Lab for six years, where he focused on software radio systems and concepts, including digital signal processing, HW and SW architecture, codesign methodology, and reconfiguration. He represented Mitsubishi Electric at the SDR Forum and worked on the French research program A3S and the IST European project E²R. Since 2005, he has been a Professor with SUPELEC. His research, which focuses on software radio and cognitive radio, is done in the IETR entity of CNRS. He addresses heterogeneous design techniques for SDR, as well as high-level design for cognitive management and decision-making inside the cognitive cycle. He is participating in the IST Network of Excellence NEWCOM++ and the SEC EULER project, as well as a French ANR project on SDR design called Mopcom. He was also involved in the IST projects E²R phase 2 and NEWCOM, and the French ANR project Idromel.

Jacques Palicot (M’XX) received the Ph.D. degree in signal processing from the University of Rennes, France, in 1983.

Since 1988, he has been involved in studies about equalization techniques applied to digital transmissions and analog TV systems. Since 1991, he has focused mainly on studies concerning the digital communications area and automatic measurement techniques. He has taken an active part in various international bodies, such as EBU, CCIR, URSI, and within the RACE, ACTS, and IST European projects. He has published various scientific articles, notably on equalization techniques, echo cancellation, hierarchical modulations, and software radio techniques. He is currently involved in adaptive signal processing and in new techniques, such as software radio and cognitive radio. From November 2001 to September 2003, he had a temporary position within INRIA/IRISA, Rennes. Since October 2003, he has been with SUPELEC, Rennes, where he leads the Signal Communications and Embedded Electronics (SCEE) Research Team.