Application of Field Programmable Gate Array (FPGA) To Digital Signal Processing (DSP)

JORIND (9) 1, June, 2011. ISSN 1596-8308. www.transcampus.org./journals, www.ajol.info/journals/jorind


APPLICATION OF FIELD PROGRAMMABLE GATE ARRAY TO DIGITAL SIGNAL PROCESSING

O.A. Abisoye
Department of Computer Science, Federal University of Technology, Minna, Nigeria

E-mail: [email protected]

Abstract

This work shows how one parallel technology, the Field Programmable Gate Array (FPGA), can be applied to digital signal processing problems to increase computational speed. The Fast Fourier Transform (FFT), a core algorithm in digital signal processing applications, has shown significant speed improvement when implemented on an FPGA. The design methodology and the design tools for implementing DSP functions in FPGAs are discussed, e.g. System Generator from Xilinx and the Impulse C programming model. A comparison of FPGA design with other technologies is also envisaged. In this research work the FPGA exploits parallelism because it is a parallel device. Applying a simulation tool for the FPGA platform, Impulse CoDeveloper (Impulse C), to the FFT algorithm generates graphical tools that provide initial estimates of algorithm throughput, such as loop latencies and pipeline effective rates. Using such tools, you can interactively change optimization options or iteratively modify and recompile C code to obtain higher performance.

Keywords: Platform, Programmable Digital Signal Processors, Digital Signal Processing (DSP), Field Programmable Gate Array (FPGA)

Introduction

Throughout the history of computing, digital signal processing applications have pushed the limits of computing power, especially in terms of real-time computation. While processed signals have broadly ranged from media-driven speech, audio and video waveforms to specialized radar and sonar data, most calculations performed by signal processing systems have exhibited the same basic computational characteristics.

DSP algorithms have long been run on standard computers, on specialized processors called digital signal processors (DSPs), or on purpose-built hardware such as Application Specific Integrated Circuits (ASICs). Recently, DSP has received increased attention due to rapid advancement in multimedia computing and high-speed wired and wireless communication. Today, there are additional technologies used for digital signal processing, including more powerful general-purpose microprocessors, field programmable gate arrays (FPGAs), digital signal controllers (mostly for industrial applications such as motor control), and stream processors.

The inherent data parallelism found in many Digital Signal Processing (DSP) functions has made DSP algorithms ideal candidates for hardware implementation, leveraging expanding Very Large Scale Integration (VLSI) capabilities.

Digital signal processing applications of FPGAs include digital image processing, speech/audio signal processing, telecommunications, biomedical applications, radar, sonar, and robotics.



FPGAs are increasingly used in conventional high-performance computing applications where computational kernels such as FFT or convolution are performed on the FPGA instead of a microprocessor.

Objective

Difficulties remain when Programmable Digital Signal Processors (PDSPs) and Application Specific Integrated Circuits (ASICs) are used to solve digital signal processing applications. To overcome these difficulties, the Field Programmable Gate Array is considered here as a possible solution.

A radix-2 FFT algorithm was formulated and then implemented on the FPGA platform using Impulse CoDeveloper as the simulator. The algorithm is translated into a C++ program without requiring a great deal of FPGA-specific hardware knowledge. The resulting optimized C code is compiled by the FPGA development tools (in particular the C-to-hardware compiler) to create a parallel hardware/software implementation.

Research method

The research work was carried out in phases, as follows:

1. Describing the complete application in the C++ language and using a standard C++ debugger to verify the algorithm.

2. Profiling the application to find the computational "hot spots".

3. Using data streaming, message passing and/or shared memory to partition the algorithm into multiple communicating software and hardware processes.

4. Using interactive optimization tools to analyze and improve the performance of hardware-accelerated functions.

5. Using the C++-to-hardware compiler to generate synthesizable hardware, in the form of hardware description language files.
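The profiling phase (step 2) can be illustrated with a coarse timing sketch. This is not the profiling tool used in the study; it is a minimal Python illustration, and the stage names `window` and `direct_dft` as well as the 256-sample input are hypothetical. It times two stages of a small processing chain and reports the slower one as the candidate for hardware acceleration.

```python
import cmath
import time


def window(samples):
    """Cheap stage: scale every sample by a constant gain."""
    return [0.5 * s for s in samples]


def direct_dft(samples):
    """Expensive stage: DFT evaluated straight from its definition."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def time_stages(samples):
    """Run the stages in order and return {stage name: elapsed seconds}."""
    timings = {}
    data = samples
    for stage in (window, direct_dft):
        start = time.perf_counter()
        data = stage(data)
        timings[stage.__name__] = time.perf_counter() - start
    return timings


# Hypothetical input: 256 samples of a repeating ramp.
timings = time_stages([float(i % 8) for i in range(256)])
hot_spot = max(timings, key=timings.get)
print("hot spot:", hot_spot)
```

In a real design flow a dedicated profiler would give finer detail, but the principle is the same: the stage that dominates the run time is the one worth moving into FPGA logic.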

Approaches to FPGA application development

Digital signal processing

Digital signal processing (DSP) is concerned with the representation of signals digitally as sequences of numbers or symbols, and with the processing of these signals to extract information from them.

Digital signal processing key operations:

The basic DSP operations include convolution, correlation, filtering, transformations and modulation.
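Two of these operations, convolution and correlation, can be sketched in a few lines. The following Python example is an illustration only and assumes short real-valued sequences; the function names are chosen here and do not come from the study.

```python
def convolve(x, h):
    """Linear convolution: y[n] = sum over k of x[k] * h[n - k]."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def correlate(x, h):
    """Cross-correlation of real sequences, r[lag] = sum over n of x[n + lag] * h[n].

    The result covers lags from -(len(h) - 1) up to len(x) - 1 and is
    computed as a convolution with the time-reversed second sequence.
    """
    return convolve(x, h[::-1])


# A two-tap moving-sum filter applied to a short ramp.
filtered = convolve([1, 2, 3], [1, 1])
# Autocorrelation of the ramp; the peak sits at zero lag (middle element).
auto = correlate([1, 2, 3], [1, 2, 3])
print(filtered, auto)
```

Each output sample is a sum of independent products, which is the kind of multiply-accumulate structure that maps naturally onto parallel hardware.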

FPGA technology:

A field-programmable gate array is a semiconductor device that can be configured by the customer or designer after manufacturing, hence the name "field-programmable". To program an FPGA you specify how you want the chip to work with a logic circuit diagram or source code in a hardware description language (HDL). FPGAs can be used to implement any logical function that an application-specific integrated circuit (ASIC) could perform, but the ability to update the functionality after shipping offers advantages for many applications.

FPGAs contain programmable logic components called "logic blocks", and a hierarchy of reconfigurable interconnects that allow the blocks to be "wired together", somewhat like a one-chip programmable breadboard. Logic blocks can be configured to perform complex combinational functions, or merely simple logic gates like AND and XOR. In most FPGAs, the logic blocks also include memory elements, which may be simple flip-flops or more complete blocks of memory.
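The idea of a configurable logic block can be made concrete with a small software model. The sketch below is a simplification and an assumption of this illustration: it models a logic block as a 2-input lookup table (real devices typically use lookup tables with more inputs), configures two such blocks as AND and XOR, and "wires" them together into a half adder. The names `make_lut` and `half_adder` are hypothetical.

```python
from itertools import product


def make_lut(truth_table):
    """Return a 2-input logic function backed by a 4-entry lookup table.

    truth_table[i] is the output for inputs (a, b) with i = 2*a + b.
    """
    table = tuple(truth_table)
    if len(table) != 4:
        raise ValueError("a 2-input lookup table needs exactly 4 entries")

    def lut(a, b):
        return table[(a << 1) | b]

    return lut


# "Configuring" two logic blocks by loading different truth tables.
and_gate = make_lut([0, 0, 0, 1])
xor_gate = make_lut([0, 1, 1, 0])


def half_adder(a, b):
    """Wire the two configured blocks together: returns (sum, carry)."""
    return xor_gate(a, b), and_gate(a, b)


for a, b in product((0, 1), repeat=2):
    print(a, b, half_adder(a, b))
```

Reloading the truth table changes the function without changing the hardware, which is the essence of field programmability.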



At the highest level, FPGAs are reprogrammable silicon chips. Using prebuilt logic blocks and programmable routing resources, you can configure these chips to implement custom hardware functionality without ever having to pick up a breadboard or soldering iron.

DSP implementation

Digital signal processing is often implemented using specialized microprocessors such as the DSP56000, the TMS320, or the SHARC. These often process data using fixed-point arithmetic, although some versions use floating-point arithmetic and are more powerful. For faster applications with vast usage, ASICs might be designed specifically. For slow applications, a traditional slower processor such as a microcontroller may be adequate. For faster applications, FPGAs might be used.
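Fixed-point arithmetic, as mentioned above, represents fractional values as scaled integers. The following Python sketch assumes the common 16-bit Q15 format (15 fractional bits); the choice of Q15 and the helper names are assumptions of this illustration, not details taken from the processors named above.

```python
Q = 15                      # fractional bits in the Q15 format
SCALE = 1 << Q              # 32768
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def saturate(v):
    """Clamp an integer to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, v))


def to_q15(x):
    """Quantise a real value in [-1, 1) to a 16-bit Q15 integer."""
    return saturate(int(round(x * SCALE)))


def from_q15(v):
    """Convert a Q15 integer back to a real value."""
    return v / SCALE


def q15_mul(a, b):
    """Multiply two Q15 values with rounding and saturation."""
    product = a * b                              # Q30 intermediate result
    return saturate((product + (1 << (Q - 1))) >> Q)


half, quarter = to_q15(0.5), to_q15(0.25)
print(from_q15(q15_mul(half, quarter)))
```

Because every operation is an integer multiply, add and shift, this style of arithmetic is inexpensive to build in dedicated logic.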

Technique for the implementation: Fast Fourier Transform

The Fast Fourier Transform (FFT) is a fast method of computing the Discrete Fourier Transform (DFT). Evaluating the DFT directly from its definition requires on the order of O(n^2) operations, but the properties of the transform make it possible to reduce the number of operations to the order of O(n log2 n). Historically, the DFT is the discrete version of the continuous Fourier transform.
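The difference between the two approaches can be sketched in Python. This is an illustrative sketch, not the implementation used in the study: `dft` evaluates the definition directly, while `fft` is a textbook recursive radix-2 decimation-in-time scheme that splits the input into even- and odd-indexed halves. The sign convention e^{-2πikn/N} is assumed. For N = 1024, the direct form needs N^2 = 1,048,576 complex multiplications, whereas the radix-2 scheme needs about (N/2) log2 N = 5,120.

```python
import cmath


def dft(x):
    """Direct DFT: N outputs, each a sum over N inputs (N*N products)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])          # transform of even-indexed samples
    odd = fft(x[1::2])           # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def close(a, b, tol=1e-9):
    """True when two equally long sequences agree element by element."""
    return len(a) == len(b) and all(abs(p - q) < tol for p, q in zip(a, b))


samples = [1, 0, 0, 1]
print(close(dft(samples), fft(samples)))
```

Both functions return the same spectrum; only the amount of work differs, and the independent butterfly operations inside each FFT stage are what a parallel device can execute concurrently.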

For a continuous function of one variable f(x), the Fourier transform F(k) is defined as an integral of the form:

$$F(k) = \int_{-\infty}^{\infty} f(x)\, e^{-2\pi i k x}\, dx \qquad \text{(i)}$$

where F(k) is the Fourier transform at the kth harmonic, the exponential term is the twiddle factor, and for sampled data f(x) = x(nT), the consecutive voltage values of a time series of n values recorded at interval T.

The transform operates in the complex domain. Recall that an imaginary exponent can be written as:

$$e^{-i\theta} = \cos\theta - i\sin\theta \qquad \text{(ii)}$$

For a sampled function, the continuous transform (i) turns into a discrete one:

$$F(k) = \sum_{n=0}^{N-1} f_n\, e^{-2\pi i k n / N} \qquad \text{(iv)}$$

Expression (iv) is the discrete Fourier transform (DFT). Here {f0, f1, ... , fN-1} is the input discrete function and {F0, F1, ... , FN-1} is the result of the Fourier transform.

Writing $W_N = e^{-2\pi i / N}$, where $W_N$ is an Nth primitive root of unity, let us put N = 8 and write down our DFT:

$$F(k) = \sum_{n=0}^{7} f_n\, W_8^{kn} \qquad \text{(viii)}$$

We can split the sum into two by separating odd and even terms and factoring out the latter sum:

$$F(k) = \sum_{m=0}^{3} f_{2m}\, W_8^{2mk} + W_8^{k} \sum_{m=0}^{3} f_{2m+1}\, W_8^{2mk} \qquad \text{(ix)}$$

Worked examples

1. Assume that the sequence {1,0,0,1} has been processed. The data represents four consecutive voltages x(0) = 1, x(T) = 0, x(2T) = 0, x(3T) = 1, recorded at time intervals T. The value X(k) is then calculated with N = 4, thus for k = 0, k = 1, k = 2 and k = 3 (since N-1 = 3).

a. When the kth harmonic is k = 0, we have

$$X(0) = \sum_{n=0}^{3} x(nT) = x(0) + x(T) + x(2T) + x(3T) = 1 + 0 + 0 + 1 = 2$$

Thus, when k = 0, X(k) = 2 is entirely real, of magnitude 2 and phase angle 0.

b. When the kth harmonic is k = 1, we have

$$X(1) = \sum_{n=0}^{3} x(nT)\, e^{-i 2\pi n / 4} = x(0) + x(T)\, e^{-i\pi/2} + x(2T)\, e^{-i\pi} + x(3T)\, e^{-i 3\pi/2} = 1 + 0 + 0 + e^{-i 3\pi/2}$$

Since $e^{-i\theta} = \cos\theta - i\sin\theta$, then

$$X(1) = 1 + \cos(3\pi/2) - i\sin(3\pi/2) = 1 + i$$

Thus X(1) = 1 + i is complex, with magnitude $\sqrt{2}$ and phase angle $\tan^{-1} 1 = 45^\circ$.

It has now been shown that, for k = 0 and k = 1, the time series {1,0,0,1} has the Discrete Fourier Transform (DFT) given by the complex sequence {2, 1+i}.

Impulse C

Impulse C is a software-to-hardware compiler that generates hardware-to-software interfaces. It is a set of library functions that support parallel programming for FPGAs using the C language. Impulse C optimizes C code for parallelism and generates HDL ready for FPGA synthesis. It moves compute-intensive functions to the FPGA.

The Impulse C approach focuses on the mapping of algorithms to mixed FPGA/processor systems. It is ideal for many DSP applications because it creates highly parallel, dataflow-oriented applications. Impulse C simplifies the creation of highly parallel algorithms, including mixed software/hardware algorithms, through the use of well defined data communication, message passing, and synchronization mechanisms.
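The dataflow-oriented style described here, in which independent processes exchange data over streams, can be mimicked in software. The sketch below is only an analogy written with Python generators; it does not use or reproduce the Impulse C API, and the process names `producer`, `moving_sum` and `consumer` are hypothetical.

```python
def producer(samples):
    """Software-side process: writes samples onto a stream one at a time."""
    for s in samples:
        yield s


def moving_sum(stream, taps=2):
    """Modelled hardware-side process: consumes one sample per step and
    emits the sum of the most recent `taps` samples."""
    window = [0] * taps
    for s in stream:
        window = window[1:] + [s]
        yield sum(window)


def consumer(stream):
    """Software-side process: drains the output stream into a list."""
    return list(stream)


# Connect the three processes through two streams.
result = consumer(moving_sum(producer([1, 0, 0, 1])))
print(result)
```

Each stage only sees the stream in front of it, so a stage can be replaced by a hardware implementation without changing its neighbours, provided the stream interface stays the same.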

Experimentation with the simulator

The source code is divided into various blocks, and the simulation of each block on the hardware platform is shown. The pipeline stages, the latency, the effective rate, and the number of samples generated per cycle for each block are shown alongside the source code. This helps in evaluating the acceleration and the performance of each algorithm.

Discussion of results

Results show that the parallel implementation of the FFT achieves linear speed-up and real-time performance for large matrix sizes. This was achieved by the use of FPGA technology with the Impulse C tools as simulator.

Graphical tools showing the source code (Figs 1 and 3) and the datapath (Figs 2 and 4) help to provide initial estimates of algorithm throughput, such as loop latencies and pipeline effective rates. Using such tools, you can interactively change optimization options or iteratively modify and recompile C code to obtain higher performance. Such design iterations may take only a matter of minutes when using C, whereas the same iterations may require hours or even days when using VHDL or Verilog.

Moreover, the Impulse C tools use optimization techniques to increase the performance of the code being used for an application without requiring a great deal of FPGA-specific hardware knowledge. We have also shown that pipelining introduces a potentially high degree of parallelism in the generated logic, allowing us to achieve the best possible throughput.
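The benefit of pipelining can be estimated with a standard textbook cycle-count model. The sketch below is not derived from the simulator output: it assumes that latency is measured in clock cycles, and the `rate` parameter (cycles between successive iteration starts) and the iteration count of 1024 are hypothetical values chosen for illustration.

```python
def sequential_cycles(iterations, latency):
    """Without pipelining, each iteration finishes before the next starts."""
    return iterations * latency


def pipelined_cycles(iterations, latency, rate=1):
    """With pipelining, a new iteration starts every `rate` cycles and the
    first result appears after `latency` cycles."""
    if iterations == 0:
        return 0
    return latency + (iterations - 1) * rate


def speedup(iterations, latency, rate=1):
    """Ratio of sequential to pipelined cycle counts."""
    return sequential_cycles(iterations, latency) / pipelined_cycles(
        iterations, latency, rate
    )


# Hypothetical loop: 1024 iterations through a 3-stage pipeline.
print(sequential_cycles(1024, 3), pipelined_cycles(1024, 3))
print(round(speedup(1024, 3), 3))
```

For long loops the speed-up approaches the latency divided by the rate, which is why a deeper pipeline that still accepts one sample per cycle yields higher throughput.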

Graphical Representation of Synthesis of Code using the Simulator (Impulse CoDeveloper)

Pipeline stages

Fig 1. Showing the source code of pipeline2 generating latency of 3, effective rate 18, 2 samples/cycle and maximum unit delay of 9

Fig 2. Showing the datapath of pipeline2 generating latency of 3, effective rate 18, 2 samples/cycle and maximum unit delay of 9

Fig 3. Showing the source code of pipeline4 generating latency of 14, effective rate 32, 1 sample/clock cycle and maximum unit delay of 32

Fig 4. Showing the datapath of pipeline4 generating latency of 14, effective rate 32, 1 sample/clock cycle and maximum unit delay of 32

With simulator (Block Stages)

Fig 5. Max unit delay of 0

Fig 6. Showing the source code of the block stage

Without simulator

Fig 7. The modules or classes are shown, while the pipeline rate and effective rate cannot be determined

Findings: Performance Evaluation/Comparative Analysis

No. | Without FPGA simulator | With FPGA simulator
1 | It makes use of the Discrete Fourier Transform formula $X(k) = \sum_{n=0}^{N-1} x(nT)\, e^{-i 2\pi k n / N}$ to generate output filters. | It uses Impulse CoDeveloper as simulator to generate output filters from the input filters supplied.
2 | It does not generate a hardware program. | It generates a hardware program simultaneously.
3 | The number of adders and comparators cannot be calculated. The number of samples generated per cycle cannot be determined. | The number of adders and comparators can be calculated. The number of samples generated per cycle can be determined.
4 | The code cannot easily be pipelined to speed up the processing of filters. | It uses the CO UNROLL and CO PIPELINE pragmas to pipeline the code and the processing of filters.
5 | It can easily be used for a fixed format of filters. | It can be used for fixed and complex filters.
6 | No graphical representation of the synthesis of code. | It generates a graphical representation to show the synthesis of code in blocks.
7 | The processing flow of code synthesis cannot be seen. | The synthesis flow of blocks and statements can be shown.
8 | It does not generate hardware description language. | It generates hardware description language.
9 | It can easily be used to implement radix 2. | It can easily be used to implement the radix-2 and radix-4 algorithms.
10 | Presence of low-level embedded functions. | Presence of higher-level embedded functions (such as adders and multipliers) and embedded memories, as well as logic blocks to implement decoders or mathematical functions.

Conclusion

In conclusion, this project has described the benefits of using an FPGA as a DSP co-processor rather than relying on conventional processors alone. We have shown that DSP algorithms can take advantage of FPGAs as a viable resource to improve highly computationally expensive digital signal processing by moving expensive computations from the CPU into specifically designed logic inside the FPGA, thus obtaining high performance at an economical price.

Algorithms such as the Fast Fourier Transform have shown significant speed improvement when implemented on an FPGA. FPGAs are becoming easier to use as the development tools get better, and FPGA prices fall as smaller, denser chip manufacturing technology becomes available, making them affordable to use in more computing applications.

Therefore, FPGAs have now proved a better alternative to traditional technologies such as ASICs for a growing number of higher-volume applications. Further research can focus on hardware or software interfacing and FPGA tool development.

Recommendation

Based on the results and findings above, we recommend the use of the Field Programmable Gate Array (FPGA) as the best technology to solve digital signal processing applications rather than using conventional processors, because it increases the computational speed of filters. FPGA technology is reliable and efficient compared to conventional processors.



