Messages from 153225

Article: 153225
Subject: balancing IIR filter (after adding extra registers)
From: "zak" <kazimayob2@n_o_s_p_a_m.aol.com>
Date: Thu, 12 Jan 2012 06:49:42 -0600

I designed a low-pass IIR filter in a Stratix IV but I have a speed problem. I
need to run it at 245 MHz but can only achieve about 180 MHz. I was advised by
experts to insert extra registers; this improved the speed, but the filter
output went wrong.

I was advised to balance the filter since I had inserted extra registers. But
how?

I did some modeling and realized, to my surprise, that it does not seem
possible to balance any IIR filter (though it is possible with an FIR filter).

Does anybody have any idea about balancing IIR filters? The difficulty is in
the feedback terms.

The filter I am using is Y(n) = (1-alpha)*X(n) + alpha*Y(n-1)

Thanks in advance

	   
					
---------------------------------------		
Posted through http://www.FPGARelated.com

Article: 153226
Subject: Re: Can't get the Xilinx cable drivers installed on SL6.1 (RHEL
From: Jan Pech <invalid@void.domain>
Date: Thu, 12 Jan 2012 14:12:34 +0100

On Thu, 2012-01-12 at 06:45 -0600, ngill wrote:
> Jan,
>
> Thanks for the tutorial.  I'm attempting to do the same thing with
> SL6.1/CentOS6/RHEL6 and the Xilinx 10.1.03 tools with no success.  After I
> configure everything as you prescribe, when I run impact and try to read the
> jtag chain it bombs and says "Module windrvr6 is not loaded. Please
> reinstall the cable drivers".  Your solution should bypass the windrvr6
> module, correct?  Any idea why it would still be looking for it?
>
> Thanks,
> Nat


Xilinx ISE version 10 uses the windrvr6 module by default. You have to
set the XIL_IMPACT_USE_LIBUSB environment variable to force the tools to
use the libusb. See http://www.xilinx.com/support/answers/29310.htm for
details.

Jan

Article: 153227
Subject: Re: balancing IIR filter (after adding extra registers)
From: Tim Wescott <tim@seemywebsite.com>
Date: Thu, 12 Jan 2012 12:09:09 -0600

On Thu, 12 Jan 2012 06:49:42 -0600, zak wrote:

> I designed a low pass IIR filter in starix iv but I got speed problem. I
> need to run it on 245MHz but can only achieve about 180. I was advised
> by experts to insert extra registers and this improved speed but the
> output of filter went wrong.
> 
> I was advised to balance the filter since I inserted extra registers.
> But how ?
> 
> I did some modeling and realized with a surprise that it seems just not
> possible that I can balance any IIR filter(but can with FIR filter).
> 
> Has anybody any idea about balancing IIR filters. The difficulty is in
> the feedback terms.
> 
> The filter I am using is Yn = (1-alpha)*Xn + alpha*Yn-1
> 
> Thanks in advance

There have to be books on this...

What's holding up the train?  The addition?  The multiplication?  The 
logic in between?

I don't do FPGA design anywhere near full time -- does the Stratix IV 
have hardware multiply?  Hardware add?  Perhaps even hardware MAC?  If it 
has a hardware multiply-and-add, then you need to make sure you're using 
it efficiently.

If all else fails and you just have to put in delays, then all is not 
lost (presuming that you can stand some delay in the output).  You're 
designing a pretty elementary low-pass filter, so the first thing you can 
do is just see what happens when you stick some extra delay in there.

Let

y_n = a^2 * y_{n-2} + (1 - a^2) * x_{n-1}

This should be easier to realize than your difference equation.

Now perform a z-transform on this (see:
http://www.wescottdesign.com/articles/zTransform/z-transforms.html, and 
please forgive any broken links, &c):

Y(z) = z^-2 * a^2 * Y(z) + z^-1 * (1 - a^2) * X(z)

and solve for the transfer function:

       Y(z)   (1 - a^2) z    (1 - a^2) z
H(z) = ---- = ----------- = --------------
       X(z)    z^2 - a^2    (z - a)(z + a)

If you limit |a| < 1, then H(z) is stable, (and an unstable system is the 
first "wrong" that you might encounter) but while it has a generally low-
pass character up to Fs/4 (Fs = sampling frequency), the response rises 
after that back to unity -- and that's bad.
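
The shape of that response can be checked numerically. The following is a minimal Python sketch, not part of the original post: the function name h_delayed and the value a = 0.9 are illustrative assumptions. It evaluates H(z) on the unit circle at DC, Fs/4 and Fs/2.

```python
import cmath
import math

def h_delayed(a, f_norm):
    """Evaluate H(z) = (1 - a^2) z / (z^2 - a^2) at z = exp(j*2*pi*f_norm).

    f_norm is the frequency as a fraction of the sampling rate Fs (0.5 = Fs/2).
    """
    z = cmath.exp(2j * math.pi * f_norm)
    return (1 - a * a) * z / (z * z - a * a)

a = 0.9  # illustrative pole radius; |a| < 1 keeps the filter stable
for f in (0.0, 0.25, 0.5):
    print(f"f = {f:4.2f} Fs  |H| = {abs(h_delayed(a, f)):.4f}")
```

The magnitude is unity at DC, smallest at Fs/4, and unity again at Fs/2, which is the rise described above.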

If you doctor this up a bit with a felicitously placed zero, then you can 
get

       Y(z)   0.5 (1 - a^2) (z + 1)
H(z) = ---- = ---------------------
       X(z)      (z - a)(z + a)

There are a number of ways that you can achieve this, but your result is 
going to be a filter with unity gain at DC (good), the same general 
transfer function as your example difference equation (good), except at 
Fs/2 where the response will be zero (better than yours), and -- 
hopefully -- the extra delay in the difference equation will be enough to 
let you pipeline your math enough to realize this thing and get the speed 
you need.
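
One possible difference equation with that transfer function is y_n = a^2 * y_{n-2} + 0.5 * (1 - a^2) * (x_{n-1} + x_{n-2}). The Python sketch below simulates it as a sanity check; it is only one of the "number of ways" mentioned above, and the function name and a = 0.9 are assumptions made for illustration.

```python
def filter_with_zero(x, a):
    """y[n] = a^2*y[n-2] + 0.5*(1 - a^2)*(x[n-1] + x[n-2]), zero initial state."""
    g = 0.5 * (1 - a * a)
    y = []
    for n in range(len(x)):
        y2 = y[n - 2] if n >= 2 else 0.0
        x1 = x[n - 1] if n >= 1 else 0.0
        x2 = x[n - 2] if n >= 2 else 0.0
        y.append(a * a * y2 + g * (x1 + x2))
    return y

a = 0.9  # illustrative value
dc = filter_with_zero([1.0] * 400, a)                              # constant input
nyquist = filter_with_zero([(-1.0) ** n for n in range(400)], a)   # Fs/2 input
print(dc[-1], nyquist[-1])
```

After the start-up transient dies away, the constant input settles at 1 and the alternating (Fs/2) input settles at 0, matching the unity DC gain and the null at Fs/2.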

-- 
My liberal friends think I'm a conservative kook.
My conservative friends think I'm a liberal kook.
Why am I not happy that they have found common ground?

Tim Wescott, Communications, Control, Circuits & Software
http://www.wescottdesign.com

Article: 153228
Subject: Virtex 5 GC clock pin vs GC//CC clock pins
From: Michael <michael_laajanen@yahoo.com>
Date: Fri, 13 Jan 2012 00:31:40 +0100

Hi,

What is the difference between a GC clock pin and a GC/CC clock pin (I 
don't mean a CC pin, I mean a GC/CC pin)?

Such as below for a V5 xc5vlx50t, package ff665,

AB14|adc2_dco_p|IOB|IO_L7P_GC_VRN_4|INPUT|LVDS_25|4||||NONE||LOCATED|NO|DIFF_TERM|
AB15|adc1_dco_p|IOB|IO_L9P_CC_GC_4|INPUT|LVDS_25|4||||NONE||LOCATED|NO|DIFF_TERM|

Any disadvantage with the GC/CC compared to a pure GC pin?

/michael

Article: 153229
Subject: Re: balancing IIR filter (after adding extra registers)
From: "Morten Leikvoll" <mleikvol@yahoo.nospam>
Date: Fri, 13 Jan 2012 09:29:39 +0100

"zak" <kazimayob2@n_o_s_p_a_m.aol.com> wrote in message 
news:Ne-dndbs46R7S5PSnZ2dnUVZ_s2dnZ2d@giganews.com...
>I designed a low pass IIR filter in starix iv but I got speed problem. I
> need to run it on 245MHz but can only achieve about 180. I was advised by
> experts to insert extra registers and this improved speed but the output 
> of
> filter went wrong.
>
> I was advised to balance the filter since I inserted extra registers. But
> how ?
>
> I did some modeling and realized with a surprise that it seems just not
> possible that I can balance any IIR filter(but can with FIR filter).
>
> Has anybody any idea about balancing IIR filters. The difficulty is in the
> feedback terms.
>
> The filter I am using is Yn = (1-alpha)*Xn + alpha*Yn-1


My guess is you need to add some registers on your inputs and outputs (and 
add latency). I guess this implementation gets put into a DSP core within a 
tiny area and the I/Os have to run a long distance before getting there or 
getting out. The feedback has dedicated routing inside a DSP and should be 
very fast. Do you know whether this gets implemented in a DSP, or does the 
tool try to build it with gates?

To really be able to help I would like to see the source, the timing report 
and details and/or knowledge about input and outputs (like are they IO 
pins?) of this IIR.

Article: 153230
Subject: Re: balancing IIR filter (after adding extra registers)
From: nico@puntnl.niks (Nico Coesel)
Date: Fri, 13 Jan 2012 08:44:15 GMT

"zak" <kazimayob2@n_o_s_p_a_m.aol.com> wrote:

>I designed a low pass IIR filter in starix iv but I got speed problem. I
>need to run it on 245MHz but can only achieve about 180. I was advised by
>experts to insert extra registers and this improved speed but the output of
>filter went wrong. 
>
>I was advised to balance the filter since I inserted extra registers. But
>how ?
>
>I did some modeling and realized with a surprise that it seems just not
>possible that I can balance any IIR filter(but can with FIR filter).
>
>Has anybody any idea about balancing IIR filters. The difficulty is in the
>feedback terms.
>
>The filter I am using is Yn = (1-alpha)*Xn + alpha*Yn-1 

You can't use many registers, and just adding registers will make
routing worse, not better.

Your filter seems to consist of 2 multipliers and an adder. The first
optimisation you can do is using one's complement instead of two's
complement. When using one's complement you don't need to sign extend
the multiplicands. In Xilinx FPGAs the multipliers get faster when
you use fewer bits.

-- 
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
nico@nctdevpuntnl (punt=.)
--------------------------------------------------------------

Article: 153231
Subject: Re: Virtex 5 GC clock pin vs GC//CC clock pins
From: Gabor <gabor@szakacs.invalid>
Date: Fri, 13 Jan 2012 18:22:13 -0500

Michael wrote:
> Hi,
> 
> What is the difference between a GC clock pin and a GC/CC clock pin(I 
> dont mean a CC pin I mean a GC/CC pin)?
> 
> Such as below for a V5 xc5vlx50t, package ff665,
> 
> AB14|adc2_dco_p|IOB|IO_L7P_GC_VRN_4|INPUT|LVDS_25|4||||NONE||LOCATED|NO|DIFF_TERM| 
> 
> AB15|adc1_dco_p|IOB|IO_L9P_CC_GC_4|INPUT|LVDS_25|4||||NONE||LOCATED|NO|DIFF_TERM| 
> 
> 
> Any disadvantage with the GC/CC compared to a pure GC pin?
> 
> /michael

I think this was answered in the Xilinx forums, but basically the GC pin
has dedicated routing to global resources (BUFGMUX, DCM, PLL) and the CC
pin has dedicated routing to local resources (BUFR, BUFIO).  A GC_CC pin
would therefore have both capabilities.  In the older series where local
clocking resources first showed up (Spartan 3E, 3A?) there was no
advantage to the local clocking if you already had the global
connections.  In the newer parts, the local clocking can run faster
than the max toggle rate of the global routes, so there is some
advantage to adding the CC capability to a GC pin.

-- Gabor

Article: 153232
Subject: Re: balancing IIR filter (after adding extra registers)
From: "zak" <kazimayob2@n_o_s_p_a_m.n_o_s_p_a_m.aol.com>
Date: Sat, 14 Jan 2012 15:49:06 -0600

Thanks all for the replies.

My main concern was not the timing per se as I may eventually get over it.
But specifically "Can we balance a given IIR filter" if we have to add
extra registers??

In my simple filter design there is a chain of [a subtractor => a multiplier
=> a subtractor] without any register in between. Obviously this causes long
paths, which need to be broken by registers according to RTL methodology.

I understand Tim is suggesting redesigning the IIR with inherent registers in
it. It is an interesting idea and I managed to verify that the suggested final
filter is better than mine, but it will still have - I believe - some long
paths.

Regards

Zak	   
					

Article: 153233
Subject: Re: balancing IIR filter (after adding extra registers)
From: Tim Wescott <tim@seemywebsite.please>
Date: Sat, 14 Jan 2012 22:00:46 -0600

On Sat, 14 Jan 2012 15:49:06 -0600, zak wrote:

> Thanks all for the replies.
> 
> My main concern was not the timing per se as I may eventually get over
> it. But specifically "Can we balance a given IIR filter" if we have to
> add extra registers??
> 
> In my simple filter design there is chain of [a subtractor=> a
> multiplier=> a subtractor] without any register in between. Obviously
> this causes long paths and need be broken by registers according to RTL
> methodology.
> 
> I understand Tim is suggesting redesigning IIR with inherent registers
> in it. It is interesting idea and I managed to verify that the suggested
> final filter is better than mine but still it will have - I believe -
> some long paths.

Actually what I was suggesting was a difference equation that you might 
be able to realize with a structure that has more pipelining, not 
something that you would attempt to implement directly.

Pipelining is for you to do -- I'm just being the math egghead.

Whether just one clock worth of delay is going to be enough to do all the 
pipelining you need -- I dunno.

OTOH, the math itself imposes no limit to the amount of delay you can 
have in the filter -- you can have three, four, or 1000 clocks worth.  
But each delay you add puts a null in the response and increases the 
overall delay of the filter; at some point the null will encroach on your 
desired response and that would be a Bad Thing.

The difference equation is easy:

y_n = d^N * y_{n-N} + (1-d^N) * (1/N) * sum from {k=1} to {N} x_{n-k}

This gives you the transfer function

H(z) = ((1-d^N)/N)*(z^(N-1) + ... + z + 1) / (z^N - d^N),

the denominator is basically the same old difference equation, only with 
as much delay as you need for pipelining.  The numerator describes a CIC 
filter, which is the Easiest FIR of All.
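
To see where the nulls fall, the transfer function H(z) = ((1-d^N)/N)*(z^(N-1) + ... + z + 1) / (z^N - d^N) can be evaluated on the unit circle. This is a minimal Python sketch rather than anything from the original post; the name h_general and the values d = 0.95, N = 4 are assumptions chosen for illustration.

```python
import cmath
import math

def h_general(d, n_delay, f_norm):
    """H(z) = ((1 - d^N)/N) * (z^(N-1) + ... + z + 1) / (z^N - d^N).

    f_norm is the frequency as a fraction of the sampling rate Fs.
    """
    z = cmath.exp(2j * math.pi * f_norm)
    numerator = sum(z ** k for k in range(n_delay))   # the CIC (boxcar) part
    gain = (1 - d ** n_delay) / n_delay
    return gain * numerator / (z ** n_delay - d ** n_delay)

d, n_delay = 0.95, 4  # illustrative values
for k in range(n_delay):
    f = k / n_delay
    print(f"f = {f:.2f} Fs  |H| = {abs(h_general(d, n_delay, f)):.6f}")
```

The gain is unity at DC and (to rounding error) zero at the other multiples of Fs/N, so a larger N pushes the first null down toward the passband, as warned above.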

Presumably, in order to pipeline this effectively you'd have to add an 
additional N counts of delay -- at some point the filter output is going 
to be useless to you just because of delay, if nothing else.

-- 
Tim Wescott
Control system and signal processing consulting
www.wescottdesign.com

Article: 153234
Subject: Re: balancing IIR filter (after adding extra registers)
From: Tim Wescott <tim@seemywebsite.please>
Date: Sat, 14 Jan 2012 22:02:30 -0600

On Thu, 12 Jan 2012 06:49:42 -0600, zak wrote:

> I designed a low pass IIR filter in starix iv but I got speed problem. I
> need to run it on 245MHz but can only achieve about 180. I was advised
> by experts to insert extra registers and this improved speed but the
> output of filter went wrong.
> 
> I was advised to balance the filter since I inserted extra registers.
> But how ?
> 
> I did some modeling and realized with a surprise that it seems just not
> possible that I can balance any IIR filter(but can with FIR filter).
> 
> Has anybody any idea about balancing IIR filters. The difficulty is in
> the feedback terms.
> 
> The filter I am using is Yn = (1-alpha)*Xn + alpha*Yn-1

I got another thought.

What frequency are you filtering _to_?  Why are you using an IIR at all?  
If you are filtering heavily enough you should be able to prefilter with 
a CIC, decimate, and run your IIR filter (if you still need it) at a 
lower rate.  Would that meet your requirements?
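
As a rough behavioural model of that idea (a sketch only, with assumed names and an assumed decimation ratio of 8), a first-order CIC decimator can be modelled as a block average that keeps one output per block, followed by the one-pole filter running on the decimated stream:

```python
def boxcar_decimate(x, r):
    """Average each block of r samples and keep one output per block.

    This models a single-stage CIC decimator with its gain normalised to 1.
    """
    return [sum(x[i:i + r]) / r for i in range(0, len(x) - r + 1, r)]

def one_pole(x, alpha):
    """y[n] = (1 - alpha)*x[n] + alpha*y[n-1], zero initial state."""
    y, prev = [], 0.0
    for sample in x:
        prev = (1 - alpha) * sample + alpha * prev
        y.append(prev)
    return y

fast = [1.0] * 800                                    # input at the full rate
slow = one_pole(boxcar_decimate(fast, 8), alpha=0.9)  # IIR runs at 1/8 rate
print(len(fast), len(slow), slow[-1])
```

The feedback loop then has eight fast clocks per update instead of one. Note that alpha would have to be recomputed for the lower rate to keep the same cutoff frequency; the value here is arbitrary.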

-- 
Tim Wescott
Control system and signal processing consulting
www.wescottdesign.com

Article: 153235
Subject: Re: balancing IIR filter (after adding extra registers)
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Sun, 15 Jan 2012 11:10:22 +0000 (UTC)

Tim Wescott <tim@seemywebsite.please> wrote:

(snip)
> Actually what I was suggesting was a difference equation that you might 
> be able to realize with a structure that has more pipelining, not 
> something that you would attempt to implement directly.

> Pipelining is for you to do -- I'm just being the math egghead.

> Whether just one clock worth of delay is going to be enough to do all the 
> pipelining you need -- I dunno.

> OTOH, the math itself imposes no limit to the amount of delay you can 
> have in the filter -- you can have three, four, or 1000 clocks worth.  
> But each delay you add puts a null in the response and increases the 
> overall delay of the filter; at some point the null will encroach on your 
> desired response and that would be a Bad Thing.

The way I usually think about this, partly because of the way ones
I work on are used, is that with added pipelining you can run
interleaved data streams. Now popularly known as Simultaneous
Multithreading, instead of processing one data stream faster, process
many data streams at about the same speed. (Once in a while, remind
the marketing department of the difference. Too often they quote
the faster speed without qualification.)

-- glen

Article: 153236
Subject: Re: balancing IIR filter (after adding extra registers)
From: "zak" <kazimayob2@n_o_s_p_a_m.n_o_s_p_a_m.aol.com>
Date: Sun, 15 Jan 2012 07:40:54 -0600


Let me explain myself further.

A filter (FIR or IIR) obviously has its own terms (z terms of the transfer
function), which are implemented as registers, as you know (let us call them
term registers).

On the other hand, device speed may require its own registers (I call these
pipeline registers).

I am not worried about input-to-output delay (let it be tens of clock
periods), i.e. I can insert registers at the input and output freely.
But inside the filter stages I need to take care to keep the filter transfer
function accurate. For FIR, computations are forward and the rule I found is
that if I need to delay any FIR term I should delay all its other terms
equally. For an IIR filter, there are both forward and feedback computations.
I can delay forward terms equally and the result stays correct up to its end,
but I cannot do that for feedback terms.

example: suppose y(n) = a*x(n) + b*y(n-1)

Obviously meaning current output = a*current input + b*previous output.
Suppose I wanted to use a structure that ended up with no register between
the result of b*y(n-1) and the adder. So I decided to add a pipeline register.
This implies that I am adding b*y(n-2), which can be correct if I added it to
a*x(n-1). So I delayed the x input, and this makes the adder term a*x(n-1).
But this also means the feedback term now becomes b*y(n-3), naturally.

Am I missing the obvious?

Zak



  	   
					

Article: 153237
Subject: Re: balancing IIR filter (after adding extra registers)
From: Tim Wescott <tim@seemywebsite.please>
Date: Sun, 15 Jan 2012 13:37:04 -0600

On Sun, 15 Jan 2012 07:40:54 -0600, zak wrote:

> Let me explain myself further.
> 
> A filter (FIR or IIR) has obviously its own terms(z terms of transfer
> function) which are implemented as registers as you know(let us call
> them term registers).
> 
> On the other hand device speed may require its own registers(I call it
> pipeline registers).
> 
> I am not worried about input to output delay(let it be 10s of clock
> periods)
> i.e I can insert registers at input and output freely. But inside filter
> stages I need care to keep filter transfer function accurate. For FIR,
> computations are forward and the rule I found is that if I
> need to delay any FIR term I should delay all its other terms equally.
> For IIR filter, there are both forward and feedback computations. I can
> delay forward terms equally and the result stays correct upto to its end
> but I cannot do that for feedback terms.
> 
> example: suppose y(n) = a*x(n) + b*y(n-1)
> 
> Obviously meaning current output = a*current input + b*previous output.
> Suppose I wanted to use a structure that ended up with no register
> between result of b*y(n-1)and adder. So I decided to add a pipeline
> register. This
> 
> implies that I am adding b*y(n-2) which can be correct if I added it to
> a*x(n-1).
> so I delayed x input and this makes adder result as a*x(n-1). But this
> also
> means now feedback term becomes b*y(n-3) naturally.
> 
> Am I missing the obvious?

Let us say that you want to do an operation

a = b * c + d,

and that this operation can only be done with three stages of pipeline 
delay, such that a is good at the beginning of the 3rd clock after you 
start:

a_n = b * c_{n-3} + d

(Let's also say that you're doing this in a true pipeline, so that a_3, 
a_4, ... are all good assuming that c_0, c_1, ... are good)

Let c_{n-3} = a_{n-3}, which we can do by definition because a is good at 
the beginning of the 3rd clock.

Now we have

a_n = b * a_{n-3} + d

Do I need to continue, or is it all obvious?
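
For readers who would like it spelled out, here is a small Python model of that loop. It is a sketch under assumptions: the names, the seed values and the constants b = 0.5, d = 1.0 are all made up for illustration. The operation takes `depth` clocks and its result is fed straight back as c, so the values in flight form a delay line.

```python
from collections import deque

def pipelined_loop(b, d, depth, n_clocks, seeds):
    """Model a = b*c + d taking `depth` clocks, with the output fed back as c.

    seeds holds the first `depth` values of a (the initial pipeline contents).
    """
    pipe = deque(seeds, maxlen=depth)   # values of a still in flight
    out = list(seeds)
    for _ in range(n_clocks):
        a_new = b * pipe[0] + d         # pipe[0] is a[n - depth]
        pipe.append(a_new)              # the oldest entry drops out
        out.append(a_new)
    return out

a = pipelined_loop(b=0.5, d=1.0, depth=3, n_clocks=9, seeds=[0.0, 10.0, 20.0])
print(a[0::3])   # every third sample is its own first-order recursion
```

Every output obeys a_n = b * a_{n-3} + d, so the loop behaves as three independent first-order recursions interleaved in time.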

-- 
Tim Wescott
Control system and signal processing consulting
www.wescottdesign.com

Article: 153238
Subject: Effective square root algorithms implemented on FPGAs already
From: "dpetrov" <dimitar_petrov@n_o_s_p_a_m.abv.bg>
Date: Sun, 15 Jan 2012 20:53:35 -0600

Hello guys,

I'm trying to find a little more information on efficient square root
algorithms that are likely to be implemented on FPGAs. A lot of algorithms
have been found already, but which ones are, for example, from Intel or AMD?
By efficient I mean they are either really fast or they don't need much
memory. Could anyone mention some or point me to some resources where I can
get more information?

Thanks!

	   
					

Article: 153239
Subject: Re: Effective square root algorithms implemented on FPGAs already
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Mon, 16 Jan 2012 03:07:10 +0000 (UTC)

dpetrov <dimitar_petrov@n_o_s_p_a_m.abv.bg> wrote:

> I'm trying to find a little bit more information for efficient square root
> algorithms which are most likely implemented on FPGA. A lot of algorithms
> are found already but which one are for example from Intel or AMD? By
> efficient I 
> mean they are either really fast or they don't need much memory.
> Could anyone mention some or point me some resources where I can get more
> information?

The usual software implementation, especially in floating point,
is Newton-Raphson based. With a good starting value, and appropriate
exponent adjustment, it is about two cycles for single precision
and four for double, with a divide in each cycle.
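
A minimal Python sketch of the Newton-Raphson update for a square root, x <- (x + value/x) / 2, is shown here for illustration. The function name, starting guess and iteration count are assumptions and say nothing about how any particular library or processor does it.

```python
def newton_sqrt(value, guess, iterations):
    """Refine guess toward sqrt(value) using x <- (x + value/x) / 2."""
    x = guess
    for _ in range(iterations):
        x = 0.5 * (x + value / x)   # one divide per iteration
    return x

print(newton_sqrt(2.0, 1.5, 4))
```

Each pass roughly doubles the number of correct digits, which is why the quality of the starting value sets the number of passes needed.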

There is an algorithm that is mostly shift and add (or subtract),
slightly similar to shift and subtract divide algorithms, a binary
implementation of the pencil and paper square root algorithm that was
used before calculators became common. There is a related algorithm
used to do square root on an abacus.
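
The shift-and-subtract method can be sketched in Python as follows. This is one common binary digit-by-digit formulation, offered as an illustration rather than as the exact algorithm the post has in mind; the function name and the default width of 32 bits are assumptions.

```python
def isqrt_shift_subtract(n, bits=32):
    """Integer square root of 0 <= n < 2**bits (bits even).

    Produces one result bit per step using only shifts, adds,
    subtracts and a compare.
    """
    root = 0
    bit = 1 << (bits - 2)          # highest power of four within range
    while bit:
        trial = root + bit
        if n >= trial:             # this result bit is 1
            n -= trial
            root = (root >> 1) + bit
        else:                      # this result bit is 0
            root >>= 1
        bit >>= 2
    return root

print(isqrt_shift_subtract(25), isqrt_shift_subtract(2000000))
```

In hardware each loop step maps naturally onto one pipeline stage or one clock of an iterative unit, with no multiplier or divider required.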

-- glen

Article: 153240
Subject: Re: Effective square root algorithms implemented on FPGAs already
From: "dpetrov" <dimitar_petrov@n_o_s_p_a_m.n_o_s_p_a_m.abv.bg>
Date: Mon, 16 Jan 2012 02:43:48 -0600

I thought that the abacus finds just the integer part of the square root
and doesn't work on floating point numbers?

>dpetrov <dimitar_petrov@n_o_s_p_a_m.abv.bg> wrote:
>
>> I'm trying to find a little bit more information for efficient square root
>> algorithms which are most likely implemented on FPGA. A lot of algorithms
>> are found already but which one are for example from Intel or AMD? By
>> efficient I 
>> mean they are either really fast or they don't need much memory.
>> Could anyone mention some or point me some resources where I can get more
>> information?
>
>The usual software implementation, especially in floating point,
>is Newton-Raphson based. With a good starting value, and appopriate
>exponent adjustment, it is about two cycles for single precision
>and four for double, with a divide in each cycle.
>
>There is an algorithm that is mostly shift and add (or subtract),
>slightly similar to shift and subtract divide algorithms, a binary
>implementation of the pencil and paper square root algorithm that was
>used before calculators became common. There is a related algorithm
>used to do square root on an abacus.
>
>-- glen

Article: 153241
Subject: Re: Effective square root algorithms implemented on FPGAs already
From: Jon Beniston <jon@beniston.com>
Date: Mon, 16 Jan 2012 01:32:15 -0800 (PST)

Search for the paper:

Implementation of Single Precision Floating Point Square Root on FPGAs
- by Yamin Li and Wanming Chu.

Cheers,
Jon

Article: 153242
Subject: Re: Effective square root algorithms implemented on FPGAs already
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Mon, 16 Jan 2012 09:41:49 +0000 (UTC)

dpetrov <dimitar_petrov@n_o_s_p_a_m.n_o_s_p_a_m.abv.bg> wrote:

> I thought that the abacus finds just the integer part of the square root
> and it's not working on floating point numbers?

Fixed point, but you can move the decimal point anywhere you want to.
One position in the square root for each two positions in the argument.
That is, sqrt(2) = sqrt(2000000)/1000.  Just like the slide rule,
you have to remember the position of the decimal point.
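
The scaling trick can be shown in a few lines of Python. This is a sketch with an assumed function name; it uses the standard library's math.isqrt (Python 3.8 or later) as the integer root.

```python
import math

def fixed_point_sqrt(value, decimals):
    """Return (root, scale) with root / scale approximating sqrt(value).

    value must be a non-negative integer. Scaling the argument by
    10^(2*decimals) gives `decimals` extra digits in the integer root.
    """
    scaled = value * 10 ** (2 * decimals)   # two argument digits per root digit
    return math.isqrt(scaled), 10 ** decimals

root, scale = fixed_point_sqrt(2, 3)
print(root, "/", scale)   # 1414 / 1000
```

The caller keeps track of the scale factor, just as with the decimal point on a slide rule.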

I once tested the algorithm to calculate sqrt(2) to six places.
(The digits of the starting number decrease in pairs, as the digits
of the square root increase. I had to make up some tricks to get six.)

I bought a cheap abacus with a little booklet of algorithms in the 
Chinatown of some large city. There is also a cube root algorithm,
but you need two abaci for a reasonable number of digits.
(You keep the accumulating cube root and its square as you go.)

-- glen

Article: 153243
Subject: Re: Effective square root algorithms implemented on FPGAs already
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Mon, 16 Jan 2012 09:44:53 +0000 (UTC)

Jon Beniston <jon@beniston.com> wrote:
> Search for the paper:

> Implementation of Single Precision Floating Point Square Root on FPGAs
> - by Yamin Li and Wanming Chu.

Probably not so hard. The hard one to do on FPGAs is floating point
addition. (Pre and post normalization are much bigger than the add.)

You need to multiply by sqrt(2) if the incoming exponent is odd,
otherwise it is as easy as fixed point.
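
The exponent handling can be sketched in Python like this. It is an illustration only: the function name is an assumption, and math.sqrt stands in for whatever fixed-point root unit would process the mantissa in hardware.

```python
import math

def float_sqrt_via_exponent(x):
    """Square root of x > 0, handling mantissa and exponent separately."""
    mantissa, exponent = math.frexp(x)   # x = mantissa * 2**exponent, 0.5 <= mantissa < 1
    root = math.sqrt(mantissa)           # stand-in for a fixed-point root unit
    if exponent % 2:                     # odd exponent: fold in the leftover factor of 2
        root *= math.sqrt(2.0)
        exponent -= 1
    return math.ldexp(root, exponent // 2)   # halve the (now even) exponent

print(float_sqrt_via_exponent(8.0), float_sqrt_via_exponent(10.0))
```

Only the odd-exponent case needs the extra multiply by sqrt(2); with an even exponent the result is just the mantissa root with half the exponent.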

-- glen

Article: 153244
Subject: Re: Effective square root algorithms implemented on FPGAs already
From: "dpetrov" <dimitar_petrov@n_o_s_p_a_m.n_o_s_p_a_m.abv.bg>
Date: Mon, 16 Jan 2012 08:48:35 -0600

Thanks for the hints, guys.

@Glen: Could you please explain why it's hard to do it on FPGAs?

I'm a little bit confused right now ;) 
Most of the algorithms which are already implemented on FPGAs are floating
point, right?

Cheers,
Dimitar

>Jon Beniston <jon@beniston.com> wrote:
>> Search for the paper:
>
>> Implementation of Single Precision Floating Point Square Root on FPGAs
>> - by Yamin Li and Wanming Chu.
>
>Probably not so hard. The hard one to do on FPGAs is floating point
>addition. (Pre and post normalization are much bigger than the add.)
>
>You need to multiply by sqrt(2) if the incoming exponent is odd,
>otherwise it is as easy as fixed point.
>
>-- glen

Article: 153245
Subject: Re: Effective square root algorithms implemented on FPGAs already
From: Rob Gaddi <rgaddi@technologyhighland.invalid>
Date: Mon, 16 Jan 2012 09:39:44 -0800

On Mon, 16 Jan 2012 08:48:35 -0600
"dpetrov" <dimitar_petrov@n_o_s_p_a_m.n_o_s_p_a_m.abv.bg> wrote:

> Thanks for hints guys.
> 
> @Glen: Could you please explain why it's hard to it on FPGAs?
> 
> I got a little bit confused right now ;) 
> Most of the algorithms which are already implemented on FPGAs are floating
> point right?
> 
> Cheers,
> Dimitar

Not at all.  I churn out signal processing designs in FPGAs pretty constantly and have never once implemented a floating-point design.  They're very expensive in terms of hardware used, as compared to fixed-point.  I'm not saying that there aren't plenty of people out there, including on this group, doing floating.  But I think you'd find that floating point designs are in the minority, especially on lower-end (Spartan, Cyclone) FPGAs. 

-- 
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order.  See above to fix.

Article: 153246
Subject: Re: balancing IIR filter (after adding extra registers)
From: davew <david.wooff@gmail.com>
Date: Mon, 16 Jan 2012 10:12:32 -0800 (PST)

On Jan 12, 12:49 pm, "zak" <kazimayob2@n_o_s_p_a_m.aol.com> wrote:
> I designed a low pass IIR filter in starix iv but I got speed problem. I
> need to run it on 245MHz but can only achieve about 180. I was advised by
> experts to insert extra registers and this improved speed but the output of
> filter went wrong.
>
> I was advised to balance the filter since I inserted extra registers. But
> how ?
>
> I did some modeling and realized with a surprise that it seems just not
> possible that I can balance any IIR filter(but can with FIR filter).
>
> Has anybody any idea about balancing IIR filters. The difficulty is in the
> feedback terms.
>
> The filter I am using is Yn = (1-alpha)*Xn + alpha*Yn-1
>
> Thanks in advance

If you can design the module so it processes a specified (or
parameterised) number of channels it is fairly straightforward.  (You
can design it this way and then simply use only one channel.)  Say
your "processing path", i.e. the x.k + y.(1-k) calculation, has a
pipeline delay of 3 clock cycles overall; then with 3 channel slots you
create a pipeline delay from the output back to the input of 3 - 3 = 0
clock cycles (i.e. a direct connection).  However, if you have
number_of_channels set to, say, 16, then this pipeline delay would be
16 - 3 = 13 clock cycles long.

That way, the previous output for each channel lines up with the
current input for that same channel.  The number of channels has to be
at least this pipeline delay for it to work.

If you add a clock enable to the whole shebang, then you can enable
the logic for the specified number of channels at the start of the new
sample frame.  At the next sample frame, repeat and the pipelining
takes care of itself.  Everything inside the module must be controlled
by the clock enable though.
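
A behavioural Python model of this scheme may make the alignment easier to see. It is a sketch under assumptions: the names are invented, and because the processing delay plus the feedback delay always add up to the number of channels, the whole loop is modelled as a single delay line of that length.

```python
from collections import deque

def multichannel_iir(samples, alpha, pipe_delay):
    """Time-multiplexed y = (1-alpha)*x + alpha*y_prev over len(samples) channels.

    samples[c][k] is sample k of channel c. The arithmetic is taken to need
    pipe_delay clocks; the feedback path adds channels - pipe_delay more, so
    the total loop delay equals the number of channels.
    """
    channels = len(samples)
    if channels < pipe_delay:
        raise ValueError("need at least pipe_delay channels")
    loop = deque([0.0] * channels, maxlen=channels)   # processing + alignment stages
    outputs = [[] for _ in range(channels)]
    for k in range(len(samples[0])):
        for c in range(channels):                     # one channel slot per clock
            y_prev = loop[0]                          # same channel, previous frame
            y = (1 - alpha) * samples[c][k] + alpha * y_prev
            loop.append(y)
            outputs[c].append(y)
    return outputs

data = [[float(c + k) for k in range(5)] for c in range(16)]
result = multichannel_iir(data, alpha=0.5, pipe_delay=3)
print(result[0])
```

Each channel sees exactly the plain one-pole recursion, while the hardware loop is free to contain as many pipeline registers as there are channel slots.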

Article: 153247
Subject: Re: balancing IIR filter (after adding extra registers)
From: "zak" <kazimayob2@n_o_s_p_a_m.n_o_s_p_a_m.aol.com>
Date: Mon, 16 Jan 2012 15:19:06 -0600

>If you can design the module so it processes a specified (or
>parameterised) number of channels it is fairly straightforward.  (You
>can  design it this way and then simply use only one channel).  Say
>your "processing path" i.e. the x.k + y.(1-k) calculation has a
>pipeline delay of 3 clock cycles overall, then you create a pipeline
>delay from output back to the input of 3 - 3 = 0 clock cycles (i.e.
>direct connection).  However if you have number_of_channels set to say
>16, then this pipeline delay would be 16-3 = 13 clock cycles long.
>
>That way, the previous output for each channel lines up with the
>current input for that same channel.  The number of channels has to be
>a minimum of this pipeline delay for it to work.
>
>If you add a clock enable to the whole shebang, then you can enable
>the logic for the specified number of channels at the start of the new
>sample frame.  At the next sample frame, repeat and the pipelining
>takes care of itself.  Everything inside the module must be controlled
>by the clock enable though.
>
>

Thanks Dave, 

That is indeed a solution to inserting extra registers, of course provided
one can afford extra slots or just use clock enable and survive higher
clock rates (exploiting multicycle constraints).

In principle I believe you agree with me about the nature of the problem. 

Zak 

 
	   
					

Article: 153248
Subject: Re: Effective square root algorithms implemented on FPGAs already
From: "dpetrov" <dimitar_petrov@n_o_s_p_a_m.n_o_s_p_a_m.abv.bg>
Date: Mon, 16 Jan 2012 15:36:43 -0600

Rob, thanks a lot for shedding a little light. I really appreciate
that.  I'll take a look and try to clarify for myself the pros and cons of
both.

Cheers,
Dimitar

-

>On Mon, 16 Jan 2012 08:48:35 -0600
>"dpetrov" <dimitar_petrov@n_o_s_p_a_m.n_o_s_p_a_m.abv.bg> wrote:
>
>> Thanks for hints guys.
>> 
>> @Glen: Could you please explain why it's hard to it on FPGAs?
>> 
>> I got a little bit confused right now ;) 
>> Most of the algorithms which are already implemented on FPGAs are floating
>> point right?
>> 
>> Cheers,
>> Dimitar
>
>Not at all.  I churn out signal processing designs in FPGAs pretty
>constantly and have never once implemented a floating-point design.
>They're very expensive in terms of hardware used, as compared to
>fixed-point.  I'm not saying that there aren't plenty of people out there,
>including on this group, doing floating.  But I think you'd find that
>floating point designs are in the minority, especially on lower-end
>(Spartan, Cyclone) FPGAs.
>
>-- 
>Rob Gaddi, Highland Technology -- www.highlandtechnology.com
>Email address domain is currently out of order.  See above to fix.

Article: 153249
Subject: Re: balancing IIR filter (after adding extra registers)
From: davew <david.wooff@gmail.com>
Date: Mon, 16 Jan 2012 15:11:39 -0800 (PST)

On Jan 16, 9:19 pm, "zak" <kazimayob2@n_o_s_p_a_m.n_o_s_p_a_m.aol.com>
wrote:
> >If you can design the module so it processes a specified (or
> >parameterised) number of channels it is fairly straightforward.  (You
> >can  design it this way and then simply use only one channel).  Say
> >your "processing path" i.e. the x.k + y.(1-k) calculation has a
> >pipeline delay of 3 clock cycles overall, then you create a pipeline
> >delay from output back to the input of 3 - 3 = 0 clock cycles (i.e.
> >direct connection).  However if you have number_of_channels set to say
> >16, then this pipeline delay would be 16-3 = 13 clock cycles long.
>
> >That way, the previous output for each channel lines up with the
> >current input for that same channel.  The number of channels has to be
> >a minimum of this pipeline delay for it to work.
>
> >If you add a clock enable to the whole shebang, then you can enable
> >the logic for the specified number of channels at the start of the new
> >sample frame.  At the next sample frame, repeat and the pipelining
> >takes care of itself.  Everything inside the module must be controlled
> >by the clock enable though.
>
> Thanks Dave,
>
> That is indeed a solution to inserting extra registers, ofcourse provided
> one can afford extra slots or just use clock enable and survive higher
> clock rates (exploiting multicycle constraints).
>
> In principle I believe you agreed with me of the nature of the problem.
>
> Zak

Yes I agree.  Here's a potential solution for you using the principle
of superposition:

1.  Implement a number of the multi-channel filter module instances I
suggested in parallel (assuming you have enough logic resource).  e.g.
Let's say 4:
2.  Feed instance 0 with sample 0, 4, 8... and zero values for
1,2,3,5,6,7,9,10,11 etc
3.  Feed instance 1 with sample 1,5,9... and zero values for
0,2,3,4,6,7,8,10 etc
4.  Feed instance 2 with sample 2,6,10... and zero values for
0,1,3,4,5,7,8,9,11 etc
5.  Feed instance 3 with sample 3,7,11... and zero values for
0,1,2,4,5,6,8,9,10 etc

and so on - you would need a mux to select the real sample or zero
value for the input to each filter module.

Sum the results together (you might have to pipeline this some more to
achieve the required clock rate).

The result will be identical to a single instance processing all the
samples but with higher raw performance.
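
The superposition claim itself is easy to check in a short Python model. This is a sketch with assumed names and test data, and it only verifies the arithmetic (that the summed outputs equal the single-filter output); it does not model timing or show where the speed gain would come from.

```python
def one_pole(x, alpha):
    """y[n] = (1 - alpha)*x[n] + alpha*y[n-1], zero initial state."""
    y, prev = [], 0.0
    for sample in x:
        prev = (1 - alpha) * sample + alpha * prev
        y.append(prev)
    return y

def split_and_sum(x, alpha, instances):
    """Filter `instances` zero-stuffed copies of x separately and add the outputs."""
    total = [0.0] * len(x)
    for i in range(instances):
        # instance i sees samples i, i+instances, ... and zeros elsewhere
        sparse = [s if n % instances == i else 0.0 for n, s in enumerate(x)]
        for n, y in enumerate(one_pole(sparse, alpha)):
            total[n] += y
    return total

x = [float((7 * n) % 5) for n in range(40)]   # arbitrary test input
direct = one_pole(x, 0.9)
summed = split_and_sum(x, 0.9, 4)
print(max(abs(p - q) for p, q in zip(direct, summed)))
```

The difference between the two results is at the level of floating-point rounding, as expected for a linear filter.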
