Messages from 141775

Article: 141775
Subject: Re: About configuring FPGAs
From: gabor <gabor@alacron.com>
Date: Wed, 8 Jul 2009 07:23:50 -0700 (PDT)

On Jul 8, 4:54 am, cow <cowsgomoosodo...@gmail.com> wrote:
> Hi everyone.
>
> I'm trying to learn more about how configuration of an FPGA works. I
> have been looking at data sheets of integrated boards such as Opal
> Kelly, but I wasn't able to get much information. My question is: In
> boards where a bitstream is sent every time after power-up, where is
> the configuration bitstream stored? Is PROM or Flash required for
> storing it? Is a CPLD or something similar needed to manage the
> configuration of the FPGA? If the configuration is not done by JTAG
> but by sending the bitstream through USB or PCIe, how is the bitstream
> recognized as a bitstream?
>
> It'd be nice if you can point me to any resources concerning this too.
> Thanks in advance!

You can get a lot more info from the FPGA vendors.  Every newer
Xilinx part has an associated "Configuration Users Guide", for
example.

Briefly though, most newer SRAM-based FPGAs can load from a
simple serial Flash or PROM, such as an SPI serial Flash.  Older parts
tended to need specialty PROMs like the Atmel AT17xx series.
In the old days people used cheap CPLDs to "convert" more
standard PROM or Flash parts into FPGA bitstream loaders
since the specialty parts were considerably more expensive.
Systems with processors and associated storage media often
leave out the PROM or Flash memory and just download the
bitstream with software.

Regards,
Gabor

Article: 141776
Subject: Re: Multipliers and CORDIC cores
From: rickman <gnuarm@gmail.com>
Date: Wed, 8 Jul 2009 08:24:11 -0700 (PDT)

On Jul 8, 6:20 am, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
> Rai <raifa...@gmail.com> wrote:
> > Q.1. I have few questions. I want to know which is the most efficient
> > multiplier algorithm which can be implement on FPGA? I used core
> > generator for multiplier but it utilizes more CLB as i needed. I want
> > efficient and area constrained algorithm. Is there any algorithm
> > considering the efficiency both in area and timing?
>
> If you aren't in a hurry you could do an iterative multiplier
> which generates some number of product bits per clock cycle.
>
> You can multiply an M bit number by an N bit number in M clock
> cycles with an N bit adder and 2N bit shift register, for example.

That is exactly what I did for my current design.  A 16x16 multiply
done using a <= b*c; in VHDL used some 300 LUTs and executes in 1
clock cycle (combinatorial).  I coded it to use 1 clock per multiplier
bit and it was reduced to about 62 LUTs.  I found it was more
efficient to use an unsigned multiplier with detection of the sign of
the multiplier and negation of both inputs when the multiplier is
negative than to use Booth's algorithm.  The difference was not large
however.  Since the multiplier is used one bit at a time, the negative
of it can be calculated in a bit serial subtraction circuit using a
couple of LUTs including the carry term.   In essence the sign of the
multiplier and the current bit of the multiplier determines whether
the multiplicand, the negative of the multiplicand or zero is added to
the product.
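
A minimal Python sketch of the scheme described above, as a behavioral model rather than the actual VHDL. The function name and the `width` parameter are illustrative assumptions. Each loop pass stands for one clock cycle, and the multiplier's negation is done up front here instead of bit-serially as it would be in hardware.

```python
def serial_multiply(multiplicand: int, multiplier: int, width: int = 16) -> int:
    """Model a one-bit-per-clock signed multiply of two `width`-bit values."""
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not (lo <= multiplicand <= hi and lo <= multiplier <= hi):
        raise ValueError("operand out of range for the given width")

    # If the multiplier is negative, negate both inputs so the multiplier
    # can be treated as an unsigned value (its magnitude fits in `width` bits).
    negate = multiplier < 0
    magnitude = -multiplier if negate else multiplier
    addend = -multiplicand if negate else multiplicand

    product = 0
    for cycle in range(width):              # one pass models one clock cycle
        if (magnitude >> cycle) & 1:        # current multiplier bit
            product += addend << cycle      # add the shifted multiplicand (or its negative)
        # a zero bit adds nothing this cycle
    return product
```

For example, `serial_multiply(3, -5)` adds the negated multiplicand on the cycles where the magnitude 5 has a set bit.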

BTW, I don't count CLBs since they are most often not the limiting
measurable unit in an FPGA.  CLBs are counted as used if one or all of
the internal objects are used.  In a modern FPGA a CLB has 4 or 8
LUTs, IIRC.  So I count the LUTs.

> > Q.2. I need to know that is square root in fpga core uses CLB or it is
> > hard IP?
>
> There are iterative square root algorithms, too.  Well, most are,
> but there are convenient shift and subtract algorithms for
> simple hardware implementation.
>
> -- glen

I have never explored square root algorithms, but I once worked for a
company that made array processors (mini computer sized DSP
computers).  They used an iterative algorithm to do divide and square
root.  I recall that they used seven iterations rather than to
continue computing until the error term got to within a limit as is
specified by most algorithms.  I guess a known time of computation was
more important than getting a known accuracy.  I think they used
Newton-Raphson, but again, I can't be certain.  I believe I have read
that there are other, better algorithms, but they may be more
complex.  Newton-Raphson is easy to implement in hardware.

When you ask about "hard" IP, I don't think there are hard wired
square root circuits in any FPGAs that I know of.

Rick

Article: 141777
Subject: Re: Multipliers and CORDIC cores
From: "Symon" <symon_brewer@hotmail.com>
Date: Wed, 8 Jul 2009 17:26:43 +0100


"Rai" <raifasih@gmail.com> wrote in message 
news:37f254c4-d193-4948-94a8-baae8a56e50c@a36g2000yqc.googlegroups.com...
>
> Q.2. I need to know that is square root in fpga core uses CLB or it is
> hard IP?
>
> Regards
> Rai

Hi Rai,

Try the modified Dijkstra algorithm. Very simple to implement, 1 bit of 
output calculated per clock.

http://lib.tkk.fi/Diss/2005/isbn9512275279/article3.pdf

HTH., Syms.

Article: 141778
Subject: Breakdown of utilisation
From: Andi <andi@hotmail.com>
Date: Wed, 08 Jul 2009 18:17:59 +0100

Hi

Probably a quite popular question, but when checking on ISE 10.1 in the 
design summary->Module Level Utilisation I do not see the 
slice-breakdown of my single entities. Anyone an idea why I can not see
how much area the different modules in my design contribute to the 
overall size?

Thanks

Article: 141779
Subject: Re: Breakdown of utilisation
From: "Antti.Lukats@googlemail.com" <Antti.Lukats@googlemail.com>
Date: Wed, 8 Jul 2009 10:18:54 -0700 (PDT)

On Jul 8, 8:17 pm, Andi <a...@hotmail.com> wrote:
> Hi
>
> Probably a quite popular question, but when checking on ISE 10.1 in the
> design summary->Module Level Utilisation I do not see the
> slice-breakdown of my single entities. Anyone an idea why I can not see
> how much area the different modules in my design contribute to the
> overall size?
>
> Thanks

need enable "detailed map report"

Antti

Article: 141780
Subject: Re: Multipliers and CORDIC cores
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Wed, 8 Jul 2009 18:50:07 +0000 (UTC)

rickman <gnuarm@gmail.com> wrote:
(snip)
 
< I have never explored square root algorithms, but I once worked for a
< company that made array processors (mini computer sized DSP
< computers).  They used an iterative algorithm to do divide and square
< root.  I recall that they used seven iterations rather than to
< continue computing until the error term got to within a limit as is
< specified by most algorithms.  I guess a known time of computation was
< more important than getting a known accuracy.  I think they used
< Newton-Raphson, but again, I can't be certain.  I believe I have read
< that there are other, better algorithms, but they may be more
< complex.  Newton-Raphson is easy to implement in hardware.

Newton-Raphson is easy to implement if you have floating
point hardware available.   Actually, seven sounds high.
In the usual case, you divide the exponent by two.
A simple initial approximation, different for odd or even
input exponent, and two iterations (single precision)
or four (double precision) should be enough.
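
A rough Python sketch of that approach: halve the exponent, then run a fixed number of Newton-Raphson steps on the mantissa. The function name and the linear starting guess are assumptions made for illustration. This guess is cruder than a separate approximation for odd and even exponents, so the iteration counts it needs may differ from the ones quoted above.

```python
import math

def nr_sqrt(a: float, iterations: int = 4) -> float:
    """Square root by exponent halving plus a fixed number of Newton steps."""
    if a < 0.0:
        raise ValueError("negative input")
    if a == 0.0:
        return 0.0

    m, e = math.frexp(a)        # a == m * 2**e with 0.5 <= m < 1
    if e % 2:                   # odd exponent: shift one factor of 2 into m
        m *= 2.0                # now 1 <= m < 2
        e -= 1                  # exponent is even, so e // 2 is exact

    x = 0.5 + 0.5 * m           # assumed crude linear guess for sqrt(m)
    for _ in range(iterations): # fixed count gives a known computation time
        x = 0.5 * (x + m / x)   # Newton-Raphson step for x*x = m

    return math.ldexp(x, e // 2)
```

Using a fixed iteration count instead of an error test is what gives the known computation time mentioned earlier in the thread.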

If you have a fast enough multiplier, there is a Newton-Raphson
divide algorithm used on machines such as the IBM 360/91 and
the Cray-1.
 
< When you ask about "hard" IP, I don't think there are hard wired
< square root circuits in any FPGAs that I know of.

If you don't have a floating point processor available, there
are bit oriented algorithms, somewhat similar to the pencil and
paper algorithm used before electronic calculators.  I believe
that would be a better choice in an FPGA.
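
One common bit-oriented variant, sketched in Python as a behavioral model. It is not necessarily the exact algorithm any poster in this thread had in mind, and the function name and `bits` parameter are assumptions. It uses only shifts, compares and subtracts, and produces one result bit per loop pass.

```python
def isqrt_bitwise(value: int, bits: int = 32) -> int:
    """Return floor(sqrt(value)) for 0 <= value < 2**bits (bits must be even)."""
    if bits % 2 or not 0 <= value < (1 << bits):
        raise ValueError("bits must be even and value must fit in bits")

    root = 0
    remainder = value
    bit = 1 << (bits - 2)           # highest power of four in range

    while bit:                      # bits/2 passes, one result bit each
        trial = root + bit
        if remainder >= trial:      # this result bit is 1
            remainder -= trial
            root = (root >> 1) + bit
        else:                       # this result bit is 0
            root >>= 1
        bit >>= 2
    return root
```

In hardware, each pass maps to one clock cycle with a subtractor and two shift registers.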

-- glen

Article: 141781
Subject: Re: About configuring FPGAs
From: "alan@nishioka.com" <alan@nishioka.com>
Date: Wed, 8 Jul 2009 12:28:49 -0700 (PDT)

On Jul 8, 1:54 am, cow <cowsgomoosodo...@gmail.com> wrote:
> Hi everyone.
>
> I'm trying to learn more about how configuration of an FPGA works. I
> have been looking at data sheets of integrated boards such as Opal
> Kelly, but I wasn't able to get much information. My question is: In
> boards where a bitstream is sent every time after power-up, where is
> the configuration bitstream stored?

wherever there is non-volatile storage.

> Is PROM or Flash required for storing it?

usually, yes.  some fpga's have built in flash.

> Is a CPLD or something similar needed to manage the
> configuration of the FPGA?

usually no.  you can buy eeproms that are designed to output
bitstreams to fpgas.  newer fpgas will work with some generic eeproms.

> If the configuration is not done by JTAG
> but by sending the bitstream through USB or PCIe, how is the bitstream
> recognized as a bitstream?

xilinx uses special codes to align and recognize bitstreams.  xilinx
also uses crc's to verify accuracy.
but anything you send on the configuration pins is assumed to be
bitstream.
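
As an illustration of the "special codes", here is a small Python sketch that looks for a synchronization word in a bitstream file image. The value 0xAA995566 is the sync word documented for many Xilinx families, but treat it as an assumption and check the configuration user guide for your part. The function name is made up for this example.

```python
# Assumed sync word; verify against the configuration guide for your device.
XILINX_SYNC_WORD = bytes.fromhex("AA995566")

def find_sync_word(data: bytes) -> int:
    """Return the byte offset of the first sync word, or -1 if none is found."""
    return data.find(XILINX_SYNC_WORD)

# Example: dummy padding bytes, then the sync word, then further data.
sample = b"\xff" * 8 + XILINX_SYNC_WORD + b"\x20\x00\x00\x00"
offset = find_sync_word(sample)     # offset of the sync word in `sample`
```

Everything before the sync word is padding or header data that the configuration logic skips while it waits to align.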

Article: 141782
Subject: Re: webserver
From: "hvo" <hai.vo@synrad.com>
Date: Wed, 08 Jul 2009 16:11:31 -0500


>there are several ways,
>mostly based on HTML/HTTP ideas (like the "refresh" meta tag, just google it)
>or with a more complex AJAX method (XMLHttpRequest) that is called
>every X second by a client-side page that your server hosts.
>I have also imagined a version where the page is never ending loading,
>which updates your data, but your browser's cache may dislike it.
>
>> Thanks for any sugestions
>> HV
>yg
>
>-- 
>http://ygdes.com / http://yasep.org
>

Thanks for the suggestions.  The refresh meta tag works fine for me.
I also discovered the setTimeout function, which works as well.

HV

Article: 141783
Subject: bufif0 wired-or in Altera FLEX10K
From: "Andrew Holme" <ah@nospam.co.uk>
Date: Wed, 8 Jul 2009 22:49:18 +0100

I've just been sent some code which contains the equivalent of :

module top_level (
inout d,
input a,
input b,
input c,
.....);

always @ ( a, ....)
 if (a)
    d <= .......
 else
   d <= 1'bz;

bufif0 (d, ....., b);

bufif0 (d, ....., c);

endmodule

What will the Altera synthesis tool make of this and what happens when both 
b and c are zero?
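
For the simulation side of the question only, here is a simplified Python model of how Verilog resolves several tri-state drivers on one net. It assumes equal drive strengths and control inputs that are strictly 0 or 1, and it says nothing about what the Altera synthesis tool will build. The helper names are invented for this sketch.

```python
def bufif0(data: str, control: str) -> str:
    """bufif0 passes its data input when control is '0', else drives 'z'."""
    return data if control == "0" else "z"

def resolve(*drivers: str) -> str:
    """Resolve equal-strength drivers on a single net ('0', '1', 'x' or 'z')."""
    active = {d for d in drivers if d != "z"}
    if not active:
        return "z"              # nobody is driving the net
    if len(active) == 1:
        return active.pop()     # all active drivers agree
    return "x"                  # conflicting drivers: contention

# Both enables low (b = c = 0): both buffers drive the net at once.
same = resolve(bufif0("1", "0"), bufif0("1", "0"))       # drivers agree
conflict = resolve(bufif0("1", "0"), bufif0("0", "0"))   # drivers disagree
```

In this model the net is only well defined with both enables low if the two buffers happen to drive the same value.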

Article: 141784
Subject: web alternatives to USENET comp.arch.fpga
From: Bob Smith <usenet@linuxtoys.org>
Date: Wed, 08 Jul 2009 15:31:02 -0700

Sorry if this was asked once before ....

ATT is shutting down USENET for its residential DSL subscribers.

Is there a web portal to comp.arch.fpga anywhere?

I'd hate to have to pay for third-party usenet if it can be avoided.

thanks
Bob Smith

news-support@sbcglobal.net wrote:
> Please note that on July 15, 2009, AT&T will no longer be offering
> access to the Usenet netnews service.  If you wish to continue reading
> Usenet newsgroups, access is available through third-party vendors.
> 
> For further information, please visit http://support.att.net/usenet
> 
> Sincerely,
> 
> Your AT&T News Team
> 
> Distribution: AT&T SBC Global Usenet Netnews Servers
> 
>

Article: 141785
Subject: Re: How to interpret polyphase coefficients generated in MATLAB
From: vizziee <vizziee@gmail.com>
Date: Wed, 8 Jul 2009 15:44:06 -0700 (PDT)

> Check your equations.  I get
>
> ( ( (22-1)/2 )/20 + ( (22-1)/2 )/10 + ( (1010-1)/2 )/5 ) = 102.4750
> which is consistent with what you are actually getting, not what you
> are calculating.
>
> Also note that the filters you are identifying as half-band filters
> are not actually half-band filters.  Half-band filters have the
> property that almost half of the coefficients are zero, saving
> calculations, which may be useful to you. But, among other properties,
> the lengths of halfband filters are constrained to {3, 7, 11, 15, 19,
> 23, 27, ..., previous+4, ...}, so a filter of length 22 isn't a half-
> band filter.
>
> Dirk Bell

Thanks Dirk. I checked the Halfband filters that I generated in
MATLAB. It says it has Direct Form Polyphase Filter structure. The
coefficients were generated by specifying 'halfband' option to
fdesign.decimator function. The number of generated coefficients for
this half-band decimator is actually 21 and I padded a zero to make it
22 for a 2-path polyphase implementation.

I did a reading for Half-band filters and what you say about the
HalfBand FIR filter coefficients is absolutely true: the number of
coefficients should be 3+4n for a non-negative integer n. However I
couldn't reconcile the MATLAB halfband filter generated above with
this definition. Are halfband decimators different than the Nyquist
half-band filters? Also the way you calculated the delay of half-band
filters appears very much true. However assuming this is the polyphase
implementation of half-band decimators, shouldn't delay be calculated
like a standard polyphase filter delay formula: ((No_of_taps-1)/2)/
Decimation_Factor?
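
A short Python sketch that reproduces the arithmetic quoted above and the length rule for half-band filters. The function names are illustrative, and the per-stage divisors are taken directly from the quoted expression without interpreting what rate each one refers to.

```python
def cascade_delay(stages):
    """Sum ((taps - 1) / 2) / divisor over (taps, divisor) pairs."""
    return sum(((taps - 1) / 2) / divisor for taps, divisor in stages)

def is_halfband_length(num_taps: int) -> bool:
    """True if num_taps has the form 3 + 4n for a non-negative integer n."""
    return num_taps >= 3 and (num_taps - 3) % 4 == 0

# The three stages from the quoted expression: (number of taps, divisor).
total = cascade_delay([(22, 20), (22, 10), (1010, 5)])   # about 102.475

lengths = {n: is_halfband_length(n) for n in (19, 21, 22, 23)}
```

Here `lengths` marks 19 and 23 as valid half-band lengths and 21 and 22 as not, which is the mismatch being discussed.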

Thanks again for your insightful replies earlier. I could drastically
reduce the no of taps in my current design while also bettering the
response. Though the questions as above still linger in my mind.

Regards,

vizziee.

Article: 141786
Subject: Virtex 4 and 5
From: randyddr <randyddr@gmail.com>
Date: Wed, 8 Jul 2009 16:18:40 -0700 (PDT)

We're interested in buying any overstock or excess Virtex-4 , Virtex-5
or Spartan inventory you may have.  Please email me at
randyh@advancedmp.com .

I know inventory on these can be costly so hopefully we can help.  We
can handle large volume purchases.

thanks,
Randy
randyh@advancedmp.com

Article: 141787
Subject: Re: web alternatives to USENET comp.arch.fpga
From: Barry <barry374@gmail.com>
Date: Wed, 8 Jul 2009 16:31:54 -0700 (PDT)

On Jul 8, 3:31 pm, Bob Smith <use...@linuxtoys.org> wrote:
> Sorry if this was asked once before ....
>
> ATT is shutting down USENET for its residential DSL subscribers.
>
> Is there a web portal to comp.arch.fpga anywhere?
>
> I'd hate to have to pay for third-party usenet if it can be avoided.
>
> thanks
> Bob Smith
>
>
>
> news-supp...@sbcglobal.net wrote:
> > Please note that on July 15, 2009, AT&T will no longer be offering
> > access to the Usenet netnews service.  If you wish to continue reading
> > Usenet newsgroups, access is available through third-party vendors.
>
> > For further information, please visit http://support.att.net/usenet
>
> > Sincerely,
>
> > Your AT&T News Team
>
> > Distribution: AT&T SBC Global Usenet Netnews Servers

groups.google.com

Article: 141788
Subject: Re: web alternatives to USENET comp.arch.fpga
From: "Symon" <symon_brewer@hotmail.com>
Date: Thu, 9 Jul 2009 01:47:20 +0100

http://lmgtfy.com/?q=list+of+free+news+servers

Article: 141789
Subject: Re: Virtex 4 and 5
From: "Antti.Lukats@googlemail.com" <Antti.Lukats@googlemail.com>
Date: Thu, 9 Jul 2009 01:22:12 -0700 (PDT)

On Jul 9, 2:18 am, randyddr <randy...@gmail.com> wrote:
> We're interested in buying any overstock or excess Virtex-4 , Virtex-5
> or Spartan inventory you may have.  Please email me at
> ran...@advancedmp.com .
>
> I know inventory on these can be costly so hopefully we can help.  We
> can handle large volume purchases.
>
> thanks,
> Randy
> ran...@advancedmp.com

hm, Xilinx turnover decreased 5% last quarter, mainly because they
were not able to supply Virtex-5,
so it seems V-5 silicon could be traded like gold or shares

eh, that's a great idea.. use FPGA chips as coins
hm, I have some Altera coins left over too
but I bet I need to wait until the value of those coins goes up (as
antique coins, I mean)

Antti

Article: 141790
Subject: Re: Breakdown of utilisation
From: Andi <andi@hotmail.com>
Date: Thu, 09 Jul 2009 10:55:46 +0100


> 
> need enable "detailed map report"
> 
> Antti

Thanks for that Antti. I had a look at the map properties and there I
set

Other map command line options: -detail

According to the command line documentation, this should enable the
detailed map report; however, the result is the same... It does not show me
any breakdown.

Any other ideas maybe?

Article: 141791
Subject: EDK 8.2 executable.elf
From: Eyyub Can Odacioglu <ecodacioglu@gmail.com>
Date: Thu, 9 Jul 2009 06:58:25 -0700 (PDT)

Hi All,

I am working on an MP3 implementation project with EDK 8.2.

When I add a software application project, the tool does not create the
executable.elf file. I also have an example of the project, and if I copy
its file into my project folder, EDK deletes the file when I build the
applications.

What can I do?

thanks

Article: 141792
Subject: how to get back multi hier netlist in xst
From: Andy Botterill <andy@plymouth2.demon.co.uk>
Date: Thu, 09 Jul 2009 19:38:30 +0100

Using ISE webpack 10.1.02 running under linux.

I am integrating a set of previously written design blocks into a higher 
level of the design. The top level is instruction_decode. Some time ago 
I enabled  saving files hierarchically by ticking Generate Multiple 
Hierarchical Netlist Files button in the Generate Post-Synthesis 
Simulation Model process. This worked quite well for some time.

I corrected the name of the block within the file (to make it the same 
as the filename). I ended up with two entries in the library. I was 
deleting this and probably did something else wrong. At the moment I 
cannot turn on multiple hierarchical netlist files without using the tcl 
interface.

Using the tcl interface, I can turn on multiple hierarchical netlist
files with the following commands:
project set "Keep Hierarchy" "Yes" -process "Synthesize - XST"
project set "Generate Multiple Hierarchical Netlist Files" "True"
Whilst this works and gives me the correct number of files I get 
warnings of the form:-
WARNING:NetListWriters:306 - Signal bus add_0[56 : 8] on block multiply is not
    reconstructed, because there are some missing bus signals.

( That design block on its own synthesises cleanly.)

If I do a flat synthesis run. i.e. no sub modules at all. No synthesis 
warnings are given. There is only one design block in the one file. 
There are 4 design blocks.

AR #17693 describes some methods for ISE 6.1 . However some of the 
options do not exist in 10.1.

Using a previous snapshot does not restore the multiple hierarchical
netlist files setting.

What I would like to know is how to restore *all* of the defaults. Then I
can enable multiple hierarchical netlist files.

Are there any more modern explanations available e.g. for webpack 10.1?

How can I save settings like this (somewhere else) so that I can restore 
them easily?

Apologies for such a long post I am running out of ideas. Andy B

Article: 141793
Subject: Re: How to interpret polyphase coefficients generated in MATLAB
From: Dirk Bell <bellda2005@cox.net>
Date: Thu, 9 Jul 2009 11:47:16 -0700 (PDT)

On Jul 8, 6:44 pm, vizziee <vizz...@gmail.com> wrote:
> > Check your equations.  I get
>
> > ( ( (22-1)/2 )/20 + ( (22-1)/2 )/10 + ( (1010-1)/2 )/5 ) = 102.4750
> > which is consistent with what you are actually getting, not what you
> > are calculating.
>
> > Also note that the filters you are identifying as half-band filters
> > are not actually half-band filters.  Half-band filters have the
> > property that almost half of the coefficients are zero, saving
> > calculations, which may be useful to you. But, among other properties,
> > the lengths of halfband filters are constrained to {3, 7, 11, 15, 19,
> > 23, 27, ..., previous+4, ...}, so a filter of length 22 isn't a half-
> > band filter.
>
> > Dirk Bell
>
> Thanks Dirk. I checked the Halfband filters that I generated in
> MATLAB. It says it has Direct Form Polyphase Filter structure. The
> coefficients were generated by specifying 'halfband' option to
> fdesign.decimator function. The number of generated coefficients for
> this half-band decimator is actually 21 and I padded a zero to make it
> 22 for a 2-path polyphase implementation.
>
> I did a reading for Half-band filters and what you say about the
> HalfBand FIR filter coefficients is absolutely true: the number of
> coefficients should be 3+4n for a non-negative integer n. However I
> couldn't reconcile the MATLAB halfband filter generated above with
> this definition. Are halfband decimators different than the Nyquist
> half-band filters? Also the way you calculated the delay of half-band
> filters appears very much true. However assuming this is the polyphase
> implementation of half-band decimators, shouldn't delay be calculated
> like a standard polyphase filter delay formula: ((No_of_taps-1)/2)/
> Decimation_Factor?
>
> Thanks again for your insightful replies earlier. I could drastically
> reduce the no of taps in my current design while also bettering the
> response. Though the questions as above still linger in my mind.
>
> Regards,
>
> vizziee.

Vizziee,

Would you post the coefficients (or preferably all MATLAB inputs and
resulting output) from your halfband design?

BTW a halfband filter could have a length 21 if it is really a length
19 filter with a zero added to each end, but that would be a waste of
computation if you used the zero coefficients.
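
To see why a length-21 half-band design ends in zeros, here is a Python sketch that builds the ideal (unwindowed) half-band taps. This is an illustrative textbook construction, not the output of any MATLAB design function, and the function name is an assumption. A symmetric window would leave the zero positions unchanged.

```python
import math

def ideal_halfband_taps(half_length: int) -> list:
    """Ideal half-band lowpass taps for offsets -half_length..+half_length."""
    taps = []
    for n in range(-half_length, half_length + 1):
        if n == 0:
            taps.append(0.5)                       # center tap
        else:
            # sin(pi*n/2) vanishes at even n, so those taps are zero
            # (up to floating-point rounding).
            taps.append(math.sin(math.pi * n / 2) / (math.pi * n))
    return taps

taps21 = ideal_halfband_taps(10)    # 21 taps, end taps at offsets -10 and +10
```

The end taps sit at even offsets from the center, so they come out as zero and the filter is effectively a length-19 design padded with a zero at each end.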

Dirk Bell
DSP Consultant

Article: 141794
Subject: Generating a negated clock
From: Nemesis <nemesis@nowhere.invalid>
Date: 09 Jul 2009 20:48:34 GMT

Hi all,
I'm using a Virtex4.

My project uses several clocks generated by a DCM which receives the
external 100 MHz clock.

For each of these clocks I need the clock and its negated version. The DCM
gives a 180° version of almost every output, but not of the CLKDV output.
Is it possible to generate a negated clock in a simple way and have it
recognized by the synthesizer (ISE 8.2) as a real clock?

For example, is it possible to use just a NOT gate and feed its output
into a clock buffer?

Regards.
-- 
Accomplishing the impossible means only that the boss will add it to
your regular duties.
 _  _                  _
| \| |___ _ __  ___ __(_)___
| .` / -_) '  \/ -_|_-< (_-<
|_|\_\___|_|_|_\___/__/_/__/ http://xpn.altervista.org

Article: 141795
Subject: How to implement an FSM in block ram
From: fl <rxjwg98@gmail.com>
Date: Thu, 9 Jul 2009 14:14:31 -0700 (PDT)

Hi,
Block RAM in an FPGA can implement a complex FSM; see the quoted hint below.
My question is how to convert a VHDL FSM into the block RAM
contents. Is there a tool that maps the logic into the RAM
contents? Thanks.

Use the 4K bit RAM, properly initialized during configuration. So it
really is a ROM, since you never write into it.
Use it as 512 x 8 ROM. Feed 5 of the 8 outputs back to the input and
use the remaining 4 inputs as control.
You now have a 32-state FSM with 4 condition inputs, and 3 extra
arbitrarily decoded outputs, beyond the 5 encoded outputs. You can
define everything, like recovery from illegal states, etc. No holes.

In Virtex-II the ROM is bigger, 18K bits.
So you can have a 128-state FSM with 4 control inputs, with the ROM
configured 2K x 9.
Or 64 states with 5 control inputs.
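
One way to produce the ROM contents is to tabulate the transition function over every address. The Python sketch below does that for the 512 x 8 case. The bit packing (state in the upper address bits, control in the lower ones, extra outputs above the next-state field) and the example FSM are assumptions made for illustration. Loading the resulting words into the block RAM is a separate, tool-specific step.

```python
STATE_BITS, CONTROL_BITS, EXTRA_BITS = 5, 4, 3   # 9 address bits, 8 data bits

def build_fsm_rom(transition):
    """Tabulate transition(state, control) -> (next_state, extra) into 512 words."""
    rom = []
    for address in range(1 << (STATE_BITS + CONTROL_BITS)):
        state = address >> CONTROL_BITS                   # fed-back state bits
        control = address & ((1 << CONTROL_BITS) - 1)     # external inputs
        next_state, extra = transition(state, control)
        if not (0 <= next_state < (1 << STATE_BITS) and 0 <= extra < (1 << EXTRA_BITS)):
            raise ValueError("transition result does not fit in the ROM word")
        rom.append((extra << STATE_BITS) | next_state)
    return rom

def example_transition(state, control):
    """Hypothetical FSM: control bit 3 resets, control bit 0 advances the state."""
    if control & 0b1000:
        return 0, 0
    next_state = (state + 1) % 32 if control & 1 else state
    extra = 0b001 if next_state == 31 else 0      # flag the last state
    return next_state, extra

rom = build_fsm_rom(example_transition)
```

Because every address gets an entry, every state and input combination is defined, which is the "no holes" property mentioned in the quoted text.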

Article: 141796
Subject: Re: Generating a negated clock
From: "Andrew Holme" <ah@nospam.co.uk>
Date: Thu, 9 Jul 2009 22:43:45 +0100


"Nemesis" <nemesis@nowhere.invalid> wrote in message 
news:20090709204834.4086.18180.XPN@orion.invalid...
> Hi all,
> I'm using a Virtex4.
>
> My project uses several clocks generated by a DCM which receives the
> external 100 MHz clock.
>
> For each of these clocks I need the clock and its negated version. The DCM
> gives a 180° version of almost every output, but not of the CLKDV output.
> Is it possible to generate a negated clock in a simple way and have it
> recognized by the synthesizer (ISE 8.2) as a real clock?
>
> For example, is it possible to use just a NOT gate and feed its output
> into a clock buffer?

Slices can be configured to work on either edge.  You can use a mixture of:

always @ (posedge clk)
always @ (negedge clk)

and only one global clock resource is used.

Article: 141797
Subject: pullup
From: nobody <cydrollinger@gmail.com>
Date: Thu, 9 Jul 2009 16:58:30 -0700 (PDT)
Links: << >> << T >> << A >>

I have a normally open momentary switch connecting pin 71 (general I/O) of
a CoolRunner-II CPLD, XC2C64A VQ100, to ground. I have an LED with a
165 ohm resistor connecting 3.3 V to pin 8 (general I/O), so a low output
should light the LED (532 nm), and it does. Having enabled a pull-up
resistor on pin 71 in the UCF, I expected the LED to turn on when the
switch is pressed and off again when it is released. Instead, the LED
lights and stays on. I want the LED to follow the switch rather than act
as a one-shot. Any ideas would be appreciated.

Sincerely,
Cy Drollinger

Article: 141798
Subject: Re: Multipliers and CORDIC cores
From: Rai <raifasih@gmail.com>
Date: Thu, 9 Jul 2009 21:52:54 -0700 (PDT)

Hi Rick!

Thanks for your kind help and assistance. I got some of your points, but
will this work for signed numbers? I am using signed multiplication
from the IP core generator. I'll use 100 of these multipliers, and that
is a big problem on an FPGA: they can't be implemented on a single FPGA
given the area constraint. I also need my result in 1 clock cycle for
16x16 multiplication. I am considering the Booth recoding algorithm, and
I want to know whether there is any algorithm more efficient than
Booth's.

Regards
Rai

rickman wrote:
> On Jul 8, 6:20 am, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
> > Rai <raifa...@gmail.com> wrote:
> > > Q.1. I have few questions. I want to know which is the most efficient
> > > multiplier algorithm which can be implement on FPGA? I used core
> > > generator for multiplier but it utilizes more CLB as i needed. I want
> > > efficient and area constrained algorithm. Is there any algorithm
> > > considering the efficiency both in area and timing?
> >
> > If you aren't in a hurry you could do an iterative multiplier
> > which generates some number of product bits per clock cycle.
> >
> > You can multiply an M bit number by an N bit number in M clock
> > cycles with an N bit adder and 2N bit shift register, for example.
>
> That is exactly what I did for my current design.  A 16x16 multiply
> done using a <= b*c; in VHDL used some 300 LUTs and executes in 1
> clock cycle (combinatorial).  I coded it to use 1 clock per multiplier
> bit and it was reduced to about 62 LUTs.  I found it was more
> efficient to use an unsigned multiplier with detection of the sign of
> the multiplier and negation of both inputs when the multiplier is
> negative than to use Booth's algorithm.  The difference was not large
> however.  Since the multiplier is used one bit at a time, the negative
> of it can be calculated in a bit serial subtraction circuit using a
> couple of LUTs including the carry term.   In essence the sign of the
> multiplier and the current bit of the multiplier determines whether
> the multiplicand, the negative of the multiplicand or zero is added to
> the product.
>
> BTW, I don't count CLBs since they are most often not the limiting
> measurable unit in an FPGA.  CLBs are counted as used if one or all of
> the internal objects are used.  In a modern FPGA a CLB has 4 or 8
> LUTs, IIRC.  So I count the LUTs.
>
>
> > > Q.2. I need to know that is square root in fpga core uses CLB or it is
> > > hard IP?
> >
> > There are iterative square root algorithms, too.  Well, most are,
> > but there are convenient shift and subtract algorithms for
> > simple hardware implementation.
> >
> > -- glen
>
> I have never explored square root algorithms, but I once worked for a
> company that made array processors (mini computer sized DSP
> computers).  They used an iterative algorithm to do divide and square
> root.  I recall that they used seven iterations rather than to
> continue computing until the error term got to within a limit as is
> specified by most algorithms.  I guess a known time of computation was
> more important than getting a known accuracy.  I think they used
> Newton-Raphson, but again, I can't be certain.  I believe I have read
> that there are other, better algorithms, but they may be more
> complex.  Newton-Raphson is easy to implement in hardware.
>
> When you ask about "hard" IP, I don't think there are hard wired
> square root circuits in any FPGAs that I know of.
>
> Rick

Article: 141799
Subject: Re: Multipliers and CORDIC cores
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Fri, 10 Jul 2009 04:58:35 +0000 (UTC)

Rai <raifasih@gmail.com> wrote:

> Thanks for your kind help and assistance. I got some of your points, but
> will this work for signed numbers? I am using signed multiplication
> from the IP core generator. I'll use 100 of these multipliers, and that
> is a big problem on an FPGA: they can't be implemented on a single FPGA
> given the area constraint. I also need my result in 1 clock cycle for
> 16x16 multiplication. I am considering the Booth recoding algorithm, and
> I want to know whether there is any algorithm more efficient than
> Booth's.

You don't say how big your FPGA is, or how much else you need.
100 multiplies in one clock cycle sounds like a lot.

Can you use pipelined multipliers that give N products in N
clock cycles, where N might be the width of the multiplier?

-- glen
