JORIND (9) 1, June, 2011. ISSN 1596-8308. www.transcampus.org./journals, www.ajol.info/journals/jorind
APPLICATION OF FIELD PROGRAMMABLE GATE ARRAY TO DIGITAL SIGNAL PROCESSING
O.A. Abisoye
Department of Computer Science, Federal University of Technology, Minna, Nigeria
E-mail: [email protected]
Abstract
This work shows how one parallel technology, the Field Programmable Gate Array (FPGA), can be applied to digital signal processing problems to increase computational speed. The Fast Fourier Transform (FFT), a central algorithm in digital signal processing applications, shows significant speed improvement when implemented on an FPGA. The design methodology and the design tools for implementing DSP functions in FPGAs, such as System Generator from Xilinx and the Impulse C programming model, are discussed, and FPGA design is compared with other technologies. Because an FPGA is a parallel device, the implementation in this work exploits parallelism. Simulating the FFT algorithm on an FPGA platform with Impulse CoDeveloper (Impulse C) produces graphical tools that provide initial estimates of algorithm throughput, such as loop latencies and pipeline effective rates. Using such tools, you can interactively change optimization options or iteratively modify and recompile C code to obtain higher performance.
Keywords: Platform, Programmable Digital Signal Processors, Digital Signal Processing (DSP), Field Programmable Gate Array (FPGA)
Introduction
Throughout the history of computing, digital signal processing applications have pushed the limits of computing power, especially in terms of real-time computation. While processed signals have broadly ranged from media-driven speech, audio and video waveforms to specialized radar and sonar data, most calculations performed by signal processing systems have exhibited the same basic computational characteristics.
DSP algorithms have long been run on standard computers, on specialized processors called digital signal processors (DSPs), or on purpose-built hardware such as Application Specific Integrated Circuits (ASICs). Recently, DSP has received increased attention due to rapid advancement in multimedia computing and high-speed wired and wireless communication. Today, additional technologies are used for digital signal processing, including more powerful general-purpose microprocessors, field programmable gate arrays (FPGAs), digital signal controllers (mostly for industrial applications such as motor control), and stream processors.
The inherent data parallelism found in many digital signal processing (DSP) functions has made DSP algorithms ideal candidates for hardware implementation, leveraging expanding Very Large Scale Integration (VLSI) capabilities.
Applications of FPGAs in digital signal processing include digital image processing, speech and audio signal processing, telecommunications, biomedical systems, radar, sonar, and robotics.
FPGAs are increasingly used in conventional high-performance computing applications, where computational kernels such as the FFT or convolution are performed on the FPGA instead of a microprocessor.
Objective
Programmable Digital Signal Processors (PDSPs) and Application Specific Integrated Circuits (ASICs) still present difficulties when used to solve digital signal processing applications. To overcome these difficulties, the Field Programmable Gate Array (FPGA) is considered here as a possible solution.
A radix-2 FFT algorithm was posed and then implemented on an FPGA platform using Impulse CoDeveloper as the simulator. The algorithm is translated into a C++ program without requiring a great deal of FPGA-specific hardware knowledge. The resulting optimized C code is compiled by the FPGA development tools (in particular the C-to-hardware compiler) to create a parallel hardware/software implementation.
Research method
The research work is divided into phases, and its goals are achieved through the following steps:
1. Description of the complete application in the C++ language and use of a standard C++ debugger to verify the algorithm.
2. Profiling of the application to find the computational "hot spots".
3. Use of data streaming, message passing and/or shared memory to partition the algorithm into multiple communicating software and hardware processes.
4. Use of interactive optimization tools to analyze and improve the performance of hardware-accelerated functions.
5. Use of a C++-to-hardware compiler to generate synthesizable hardware, in the form of hardware description language files.
Approaches to FPGA application development
Digital signal processing
Digital signal processing (DSP) is concerned with the digital representation of signals as sequences of numbers or symbols, and with the processing of these signals to extract information from them.
Digital signal processing key operations:
The basic DSP operations include convolution, correlation, filtering, transformations and modulation.
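As a small illustration of the first two operations in this list, the sketch below computes a direct linear convolution and derives cross-correlation from it. The function names `convolve` and `correlate` are illustrative choices and do not come from the paper or from any tool it describes. Each inner-loop step is one multiply-accumulate, the operation that DSP hardware repeats most often.

```python
def convolve(signal, kernel):
    """Full linear convolution of two finite sequences (direct form)."""
    out = [0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(kernel):
            out[i + j] += s * h  # one multiply-accumulate (MAC) step
    return out


def correlate(a, b):
    """Cross-correlation of real sequences, computed as convolution with b reversed."""
    return convolve(a, b[::-1])
```

Filtering with a finite impulse response is the same computation: the kernel holds the filter coefficients. The independent MAC steps are what a parallel device can evaluate side by side.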
FPGA technology:
A field-programmable gate array is a semiconductor device that can be configured by the customer or designer after manufacturing, hence the name "field-programmable". To program an FPGA you specify how you want the chip to work with a logic circuit diagram or source code in a hardware description language (HDL). FPGAs can be used to implement any logical function that an application-specific integrated circuit (ASIC) could perform, but the ability to update the functionality after shipping offers advantages for many applications.
FPGAs contain programmable logic components called "logic blocks", and a hierarchy of reconfigurable interconnects that allow the blocks to be "wired together", somewhat like a one-chip programmable breadboard. Logic blocks can be configured to perform complex combinational functions, or merely simple logic gates like AND and XOR. In most FPGAs, the logic blocks also include memory elements, which may be simple flip-flops or more complete blocks of memory.
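One common way for a logic block to realize a combinational function is a look-up table (LUT) whose stored bits are the configuration. The sketch below is a software model under that assumption. The `LookupTable` class and the `half_adder` helper are hypothetical names, and the model does not represent any particular vendor's logic block.

```python
class LookupTable:
    """Software model of a k-input LUT: the truth table is the configuration."""

    def __init__(self, num_inputs, truth_table):
        if len(truth_table) != 2 ** num_inputs:
            raise ValueError("truth table must have 2**num_inputs entries")
        self.num_inputs = num_inputs
        self.truth_table = list(truth_table)

    def evaluate(self, *bits):
        if len(bits) != self.num_inputs:
            raise ValueError("wrong number of input bits")
        index = 0
        for bit in bits:  # the first input is the most significant address bit
            index = (index << 1) | (bit & 1)
        return self.truth_table[index]


# The same block type becomes AND or XOR purely by changing the stored bits.
AND2 = LookupTable(2, [0, 0, 0, 1])
XOR2 = LookupTable(2, [0, 1, 1, 0])


def half_adder(a, b):
    """'Wire together' two configured blocks: returns (sum, carry)."""
    return XOR2.evaluate(a, b), AND2.evaluate(a, b)
```

Reconfiguring the device corresponds, in this model, to loading different truth tables and different connections between blocks.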
At the highest level, FPGAs are reprogrammable silicon chips. Using prebuilt logic blocks and programmable routing resources, you can configure these chips to implement custom hardware functionality without ever having to pick up a breadboard or soldering iron.
DSP implementation
Digital signal processing is often implemented using specialized microprocessors such as the DSP56000, the TMS320, or the SHARC. These often process data using fixed-point arithmetic, although some more powerful versions use floating-point arithmetic. For faster applications deployed in very high volumes, ASICs might be designed specifically. For slow applications, a traditional slower processor such as a microcontroller may be adequate. For faster applications, FPGAs might be used.
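To make the fixed-point arithmetic mentioned above concrete, the sketch below models a 16-bit Q15 format (1 sign bit, 15 fractional bits) with rounding and saturation. Q15 is chosen here as an assumption for illustration; the paper does not state which fixed-point format any of the named processors use, and the helper names are hypothetical.

```python
Q15_ONE = 1 << 15          # scale factor: 1.0 corresponds to 32768
Q15_MAX = Q15_ONE - 1      # largest representable value, just below 1.0
Q15_MIN = -Q15_ONE         # smallest representable value, exactly -1.0


def saturate(raw):
    """Clamp an integer to the 16-bit Q15 range instead of wrapping around."""
    return max(Q15_MIN, min(Q15_MAX, raw))


def to_q15(value):
    """Convert a float in [-1, 1) to its Q15 integer representation."""
    return saturate(int(round(value * Q15_ONE)))


def from_q15(raw):
    """Convert a Q15 integer back to a float."""
    return raw / Q15_ONE


def q15_mul(a, b):
    """Multiply two Q15 values: Q30 product, round to nearest, shift back."""
    product = a * b + (1 << 14)   # add half an LSB before the shift
    return saturate(product >> 15)
```

For example, `from_q15(q15_mul(to_q15(0.5), to_q15(0.5)))` gives 0.25. An integer multiply followed by a shift is far cheaper in logic than a floating-point multiplier, which is one reason fixed-point formats are popular in hardware.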
Technique for the implementation: Fast Fourier Transform
The Fast Fourier Transform (FFT) is a fast approach to computing the Discrete Fourier Transform (DFT). Evaluated directly from its definition, a DFT of $n$ points requires a number of operations of order $O(n^2)$, but the properties of the transform make it possible to reduce the number of operations to the order of $O(n \log_2 n)$. Historically, the DFT originated as the discrete version of the continuous Fourier transform.
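The gap between the two orders of growth can be made concrete with a few lines of arithmetic. The sketch below counts $n^2$ operations for the direct DFT and $n \log_2 n$ for a radix-2 FFT. These are order-of-magnitude counts that ignore constant factors, and the function names are illustrative.

```python
def direct_dft_operations(n):
    """Order-of-magnitude operation count for the direct DFT: n * n."""
    return n * n


def radix2_fft_operations(n):
    """Order-of-magnitude operation count for a radix-2 FFT: n * log2(n)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    stages = n.bit_length() - 1   # exact log2(n) for a power of two
    return n * stages


def operation_ratio(n):
    """How many times more operations the direct DFT needs than the FFT."""
    return direct_dft_operations(n) / radix2_fft_operations(n)
```

For $n = 1024$ the counts are 1,048,576 and 10,240, a ratio of about 102, and the ratio keeps growing with $n$.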
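For a continuous function of one variable $f(x)$, the Fourier transform $F(k)$ is defined as an integral of the form:
$$F(k) = \int_{-\infty}^{\infty} f(x)\, e^{-2\pi i k x}\, dx \qquad \text{(i)}$$
where $F(k)$ is the Fourier transform at the $k$th harmonic, $e^{-2\pi i k x}$ is the twiddle factor, and $f(x) = x(nT)$ are the consecutive voltage values of a time series of $n$ samples taken at time interval $T$.
The transform operates in the complex domain. Recall that the imaginary exponent can be written as:
$$e^{i\varphi} = \cos\varphi + i\sin\varphi, \quad \text{so that} \quad e^{-i\varphi} = \cos\varphi - i\sin\varphi \qquad \text{(ii)}$$
For a sampled function, the continuous transform (i) turns into a discrete one:
$$F(k) = \sum_{n=0}^{N-1} f_n\, e^{-2\pi i k n / N} \qquad \text{(iv)}$$
Expression (iv) is the discrete Fourier transform (DFT). Here $\{f_0, f_1, \ldots, f_{N-1}\}$ is the input discrete function and $\{F_0, F_1, \ldots, F_{N-1}\}$, with $F_k = F(k)$, is the result of the Fourier transform.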
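Expression (iv) can be evaluated directly, and a radix-2 FFT must give the same values with fewer operations. The sketch below shows both under the sign convention $e^{-2\pi i k n / N}$. The function names `dft` and `fft_radix2` are illustrative, and the recursive decimation-in-time form is one common radix-2 formulation, not necessarily the one synthesized for the FPGA in this work.

```python
import cmath


def dft(samples):
    """Direct evaluation of the DFT sum: roughly N*N complex multiplications."""
    size = len(samples)
    return [
        sum(samples[n] * cmath.exp(-2j * cmath.pi * k * n / size)
            for n in range(size))
        for k in range(size)
    ]


def fft_radix2(samples):
    """Recursive radix-2 decimation-in-time FFT (length must be a power of two)."""
    size = len(samples)
    if size == 0 or size & (size - 1):
        raise ValueError("length must be a power of two")
    if size == 1:
        return [complex(samples[0])]
    even = fft_radix2(samples[0::2])   # transform of f0, f2, f4, ...
    odd = fft_radix2(samples[1::2])    # transform of f1, f3, f5, ...
    half = size // 2
    result = [0j] * size
    for k in range(half):
        twiddled = cmath.exp(-2j * cmath.pi * k / size) * odd[k]
        result[k] = even[k] + twiddled          # butterfly, upper output
        result[k + half] = even[k] - twiddled   # butterfly, lower output
    return result
```

For the four-sample input `[1, 0, 0, 1]`, both functions return values equal, up to floating-point rounding, to 2, 1 + i, 0 and 1 - i. The two recursive calls are independent of each other, and so are the butterflies inside one stage, which is the parallelism a hardware implementation can exploit.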
Writing $W_N = e^{-2\pi i / N}$, an $N$th primitive root of unity, let us put $N = 8$ and write down our DFT:
$$F(k) = f_0 + f_1 W_8^{k} + f_2 W_8^{2k} + f_3 W_8^{3k} + f_4 W_8^{4k} + f_5 W_8^{5k} + f_6 W_8^{6k} + f_7 W_8^{7k} \qquad \text{(viii)}$$
We can split the sum into two by separating odd and even terms and factoring out the latter sum:
$$F(k) = \left[f_0 + f_2 W_8^{2k} + f_4 W_8^{4k} + f_6 W_8^{6k}\right] + W_8^{k}\left[f_1 + f_3 W_8^{2k} + f_5 W_8^{4k} + f_7 W_8^{6k}\right] \qquad \text{(ix)}$$
Worked examples
1. Assume that the sequence {1, 0, 0, 1} has been processed. The data represents four consecutive voltages x(0) = 1, x(T) = 0, x(2T) = 0, x(3T) = 1, recorded at time intervals T. The value X(k) is then calculated with N = 4, thus for k = 0, k = 1, k = 2 and k = 3 (since N - 1 = 3):
$$X(k) = \sum_{n=0}^{N-1} x(nT)\, e^{-2\pi i k n / N}$$
a. When the kth harmonic is k = 0, we have
$$X(0) = x(0) + x(T) + x(2T) + x(3T) = 1 + 0 + 0 + 1 = 2$$
Thus X(0) = 2 is entirely real, with magnitude 2 and phase angle 0.
b. When the kth harmonic is k = 1, we have
$$X(1) = x(0) + x(T)\, e^{-i\pi/2} + x(2T)\, e^{-i\pi} + x(3T)\, e^{-3i\pi/2} = 1 + 0 + 0 + e^{-3i\pi/2}$$
Since $e^{-3i\pi/2} = \cos(3\pi/2) - i\sin(3\pi/2)$, then
$$X(1) = 1 + \cos(3\pi/2) - i\sin(3\pi/2) = 1 + i$$
Thus X(1) = 1 + i is complex, with magnitude $\sqrt{2}$ and phase angle $\tan^{-1} 1 = 45^\circ$.
It has now been shown that the time series {1, 0, 0, 1} has a Discrete Fourier Transform (DFT) whose first two terms are given by the complex sequence {2, 1 + i}.
Impulse C
Impulse C is a software-to-hardware compiler that generates hardware-to-software interfaces. It is a set of library functions that support parallel programming for FPGAs using the C language. Impulse C optimizes C code for parallelism and generates HDL ready for FPGA synthesis. It moves compute-intensive functions to the FPGA.
The Impulse C approach focuses on the mapping of algorithms to mixed FPGA/processor systems. It is ideal for many DSP applications because it creates highly parallel, dataflow-oriented applications. Impulse C simplifies the creation of highly parallel algorithms, including mixed software/hardware algorithms, through the use of well-defined data communication, message passing, and synchronization mechanisms.
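The idea of communicating processes connected by streams can be sketched in software. The model below uses Python threads and bounded queues as stand-ins for processes and streams. It does not use the Impulse C library, and every name in it (`producer`, `scale_process`, `consumer`, `run_pipeline`, the `gain` parameter) is hypothetical.

```python
import queue
import threading

END_OF_STREAM = object()   # sentinel that closes a stream


def producer(samples, out_stream):
    """Software-side process: writes input samples to a stream."""
    for sample in samples:
        out_stream.put(sample)
    out_stream.put(END_OF_STREAM)


def scale_process(in_stream, out_stream, gain):
    """Stand-in for a hardware process: transforms each sample as it arrives."""
    while True:
        item = in_stream.get()
        if item is END_OF_STREAM:
            out_stream.put(END_OF_STREAM)
            return
        out_stream.put(item * gain)


def consumer(in_stream, results):
    """Software-side process: collects the processed samples."""
    while True:
        item = in_stream.get()
        if item is END_OF_STREAM:
            return
        results.append(item)


def run_pipeline(samples, gain=2, depth=4):
    """Connect the three processes with bounded streams and run them together."""
    to_hw = queue.Queue(maxsize=depth)
    from_hw = queue.Queue(maxsize=depth)
    results = []
    workers = [
        threading.Thread(target=producer, args=(samples, to_hw)),
        threading.Thread(target=scale_process, args=(to_hw, from_hw, gain)),
        threading.Thread(target=consumer, args=(from_hw, results)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results
```

The bounded queues block a writer when the stream is full and a reader when it is empty, which is the synchronization role that stream buffers play between software and hardware processes.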
Experimentation with the simulator
The source code is divided into various blocks, and the simulation of each block on the hardware platform is shown. The pipeline stages, the latency, the effective rate, and the number of samples generated per cycle are shown for each block. This helps in evaluating the acceleration and the performance of each algorithm.
Discussion of results
Results show that the parallel implementation of the FFT achieves linear speed-up and real-time performance for large matrix sizes. This was achieved by the use of FPGA technology, with the Impulse C tools as simulator.
The graphical tools show the source code (Figs 1 and 3) and the datapath (Figs 2 and 4), and help to provide initial estimates of algorithm throughput such as loop latencies and pipeline effective rates. Using such tools, you can interactively change optimization options or iteratively modify and recompile C code to obtain higher performance. Such design iterations may take only a matter of minutes when using C, whereas the same iterations may require hours or even days when using VHDL or Verilog.
Moreover, the Impulse C tools use optimization techniques to increase the performance of the code being used for an application without requiring a great deal of FPGA-specific hardware knowledge. We have also shown that pipelining introduces a potentially high degree of parallelism in the generated logic, allowing us to achieve the best possible throughput.
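The effect of pipelining on throughput can be estimated with a simple cycle count. The sketch below assumes that latency is the number of cycles one sample needs to pass through the pipeline and that rate is the number of cycles between successive samples entering it. These definitions, the function names and the example numbers are assumptions for illustration and are not taken from the tool's reports.

```python
def sequential_cycles(num_samples, latency):
    """Cycles when each sample must finish before the next one starts."""
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    return num_samples * latency


def pipelined_cycles(num_samples, latency, rate=1):
    """Cycles when a new sample enters the pipeline every `rate` cycles."""
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    # The first result appears after `latency` cycles; each later one follows
    # `rate` cycles after its predecessor.
    return latency + (num_samples - 1) * rate


def pipeline_speedup(num_samples, latency, rate=1):
    """Ratio of sequential to pipelined cycle counts."""
    return (sequential_cycles(num_samples, latency)
            / pipelined_cycles(num_samples, latency, rate))
```

With 1024 samples, a latency of 3 and a rate of 1, the counts are 3072 and 1026 cycles, so the speed-up approaches the latency as the number of samples grows.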
Graphical Representation of Synthesis of Code using the Simulator (Impulse CoDeveloper)
Pipeline stages
Fig 1. Source code of pipeline2, showing a latency of 3, an effective rate of 18, 2 samples/cycle and a maximum unit delay of 9
Fig 2. Datapath of pipeline2, showing a latency of 3, an effective rate of 18, 2 samples/cycle and a maximum unit delay of 9
Fig 3. Source code of pipeline4, showing a latency of 14, an effective rate of 32, 1 sample/clock cycle and a maximum unit delay of 32
Fig 4. Datapath of pipeline4, showing a latency of 14, an effective rate of 32, 1 sample/clock cycle and a maximum unit delay of 32
With simulator (Block Stages)
Fig 5. Maximum unit delay of 0
Fig 6. Showing the source code of the block stage
Without Simulator
Fig 7. The modules or classes are shown, while the pipeline rate and effective rate cannot be determined
Findings: Performance Evaluation/Comparative Analysis
| No. | Without FPGA simulator | With FPGA simulator |
|---|---|---|
| 1 | It makes use of the Discrete Fourier Transform algorithm formula $X(k) = \sum_{n=0}^{N-1} x(nT)\, e^{-2\pi i k n / N}$ to generate output filters. | It uses Impulse CoDeveloper as simulator to generate output filters from the input filters supplied. |
| 2 | It does not generate a hardware program. | It generates a hardware program simultaneously. |
| 3 | The number of adders and comparators cannot be calculated. The number of samples generated per cycle cannot be determined. | The number of adders and comparators can be calculated. The number of samples generated per cycle can be determined. |
| 4 | The code cannot be easily pipelined to speed up the processing of filters. | It uses pragma CO-UNROLL and pragma CO-PIPELINE to pipeline the code and the processing of filters. |
| 5 | It can easily be used for a fixed format of filters. | It can be used for fixed and complex filters. |
| 6 | No graphical representation of the synthesis of code. | It generates a graphical representation to show the synthesis of code in blocks. |
| 7 | The processing flow of the synthesized code cannot be seen. | The synthesized flow of blocks and statements can be shown. |
| 8 | It does not generate hardware description language. | It generates hardware description language. |
| 9 | It can easily be used to implement the radix-2 algorithm. | It can easily be used to implement the radix-2 and radix-4 algorithms. |
| 10 | Presence of low-level embedded functions. | Presence of higher-level embedded functions (such as adders and multipliers) and embedded memories, as well as logic blocks to implement decoders or mathematical functions. |
Conclusion
In conclusion, this project has described the benefits of using an FPGA as a DSP co-processor rather than relying on conventional processors alone. We have shown that DSP algorithms can take advantage of FPGAs as a viable resource to improve computationally expensive digital signal processing by moving expensive computations from the CPU into specifically designed logic inside the FPGA, thus obtaining high performance at an economical price.
Algorithms such as the Fast Fourier Transform have shown significant speed improvement when implemented on an FPGA. FPGAs are becoming easier to use as the development tools get better, and FPGA prices fall as smaller, denser chip manufacturing technology becomes available, making them affordable to use in more computing applications.
Therefore, FPGAs have now proved a better alternative to traditional technologies such as ASICs for a growing number of higher-volume applications. Further research can focus on hardware or software interfacing and FPGA tool development.
Recommendation
Based on the results and findings above, we recommend the use of the Field Programmable Gate Array (FPGA) as the best technology for solving digital signal processing applications, rather than conventional processors, because it increases the computational speed of filters. FPGA technology is reliable and efficient compared to conventional processors.