Messages from 158325

Article: 158325
Subject: Re: Sum of 8 numbers in FPGA
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Thu, 22 Oct 2015 01:08:51 +0000 (UTC)

David Wade <dave.g4ugm@gmail.com> wrote:
> On 20/10/2015 22:52, jim.brakefield@ieee.org wrote:
>> On Tuesday, October 20, 2015 at 6:40:35 AM UTC-5, b2508 wrote:
>>> How do I most efficiently add 8 numbers in FPGA?
>>> What is the best way to save LUTs?
>>> How is data width affecting LUT consumption?

> Why not try it out? Run one of the tool chains and see what happens when 
> you build the adder in different ways, and then if it's not what you expect 
> come and ask on here. The tool chains will show you what the LUT usage 
> is. I was a tad surprised to find that when I coded:-

> byteout <= byte1+byte2+byte3+byte4+byte5+byte6+byte7+byte8 ;

> and compared it with

> temp1 <= byte1 + byte2 + byte3 + byte4 ;
> temp2 <= byte5 + byte6 + byte7 + byte8 ;
> byteout <= temp1 + temp2 ;

> I got the same number of LUTs and Slices used....

Yes, the optimizers can likely figure that one out.

Some years ago, I needed a 36 bit population count.
That is, how many '1' bits there are in a 36 bit word.

The usual way to make one is with carry save adders, so
I built one up, I think first 8 bits, and then combined those.

It was a little unusual, since I needed to know 0, 1, 2, 3, more than 3.

It wasn't hard to make, but it turns out that if you just say:

  p=x[0]+x[1]+x[2]+x[3]+ ... x[35];

it works just about as well.  It might be that I had to pipeline
it also, but it still would have been easier to write.
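
As a rough behavioral model of the one-line form above (not the original design), the following Python sketch sums the individual bits and saturates the count at 4, which matches the "0, 1, 2, 3, more than 3" requirement. The function name and the `width` and `limit` parameters are illustrative assumptions.

```python
def popcount_saturating(word: int, width: int = 36, limit: int = 4) -> int:
    """Count '1' bits in the low `width` bits of `word`, saturating at `limit`.

    A return value equal to `limit` means "more than limit - 1 bits set".
    """
    if word < 0:
        raise ValueError("word must be non-negative")
    count = 0
    for bit in range(width):
        # Same as p = x[0] + x[1] + ... + x[width-1] in the HDL one-liner.
        count += (word >> bit) & 1
    return min(count, limit)
```

Bits above `width` are ignored, so the model behaves like a fixed-width input port.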

(snip)

>> The latest parts from Xilinx and Altera will add three numbers 
>> at a time using a single carry chain.

> In this modern world of optimising tool chains why not just put them all 
> in one expression and let the tool chain work out what is best for the chip.

You mean ones with 6 input LUTs?  I haven't looked at those much yet.

(snip)

My favorite test of the optimizer is when I make a tiny mistake, which
turns out to cause some signal to never change, and the optimizer 
optimizes out all the logic!  Nothing at all left!

-- glen

Article: 158326
Subject: Re: Sum of 8 numbers in FPGA
From: jim.brakefield@ieee.org
Date: Thu, 22 Oct 2015 07:09:01 -0700 (PDT)

On Wednesday, October 21, 2015 at 8:08:59 PM UTC-5, glen herrmannsfeldt wrote:
> David Wade <dave...@gmail.com> wrote:
> > On 20/10/2015 22:52, jim...@ieee.org wrote:
> >> On Tuesday, October 20, 2015 at 6:40:35 AM UTC-5, b2508 wrote:
> >>> How do I most efficiently add 8 numbers in FPGA?
> >>> What is the best way to save LUTs?
> >>> How is data width affecting LUT consumption?
>  
> > Why not try it out? Run one of the tool chains and see what happens when 
> > you build the adder in different ways, and then if it's not what you expect 
> > come and ask on here. The tool chains will show you what the LUT usage 
> > is. I was a tad surprised to find that when I coded:-
>  
> > byteout <= byte1+byte2+byte3+byte4+byte5+byte6+byte7+byte8 ;
>  
> > and compared it with
>  
> > temp1 <= byte1 + byte2 + byte3 + byte4 ;
> > temp2 <= byte5 + byte6 + byte7 + byte8 ;
> > byteout <= temp1 + temp2 ;
>  
> > I got the same number of LUTs and Slices used....
> 
> Yes, the optimizers can likely figure that one out.
> 
> Some years ago, I needed a 36 bit population count.
> That is, how many '1' bits there are in a 36 bit word.
> 
> The usual way to make one is with carry save adders, so
> I built one up, I think first 8 bits, and then combined those.
> 
> It was a little unusual, since I needed to know 0, 1, 2, 3, more than 3.
> 
> It wasn't hard to make, but it turns out that if you just say:
> 
>   p=x[0]+x[1]+x[2]+x[3]+ ... x[35];
> 
> it works just about as well.  It might be that I had to pipeline
> it also, but it still would have been easier to write.
> 
> (snip)
> 
> >> The latest parts from Xilinx and Altera will add three numbers 
> >> at a time using a single carry chain.
>  
> > In this modern world of optimising tool chains why not just put them all 
> > in one expression and let the tool chain work out what is best for the chip.
> 
> You mean ones with 6 input LUTs?  I haven't looked at those much yet.
> 
> (snip)
> 
> My favorite test of the optimizer is when I make a tiny mistake, which
> turns out to cause some signal to never change, and the optimizer 
> optimizes out all the logic!  Nothing at all left!
> 
> -- glen


> You mean ones with 6 input LUTs?  I haven't looked at those much yet.
6LUTs are a favorite of mine:
One 4-to-1 mux or two 2-to-1 muxes
2-to-1 mux and an add/subtract

IMHO their reason for being is that they reduce the number of logic levels.  Routing delay is now larger than logic delay, so reducing logic levels is a big speed win, more so than the greater logic capability.

The ALUT/ALM is somewhat different and more complicated.  Not currently using it, but does appear to have overall characteristics similar to the 6LUT.

Jim

Article: 158327
Subject: error Xst:899
From: "AnamDar" <109354@FPGARelated>
Date: Thu, 22 Oct 2015 10:04:51 -0500

I am receiving this error in Xilinx:
ERROR:Xst:899 - "../../rtl/dff.v" line 7: The logic for <qout> does not
match a known FF or Latch template.

The code being synthesized is:

module dff(output reg qout, input clok, rst,d,enf);
always @(posedge clok or negedge rst)
begin
if(enf)
begin
if(rst)
    qout<=0;
else
    qout<=d;
end
end

endmodule


Can anyone guide me as to where I am going wrong?



---------------------------------------
Posted through http://www.FPGARelated.com

Article: 158328
Subject: Interfacing ADS7230 ADC to Altera FPGA
From: "AlexKrish" <109678@FPGARelated>
Date: Thu, 22 Oct 2015 10:04:55 -0500

Hi all,

I want to implement an ADC Interface for an ADC - ADS 7230 (TI) in VHDL. 
I am not very familiar with ADCs to implement it in VHDL. I already have 
an ADC Interface for a 10 bit ADC (MAX 1030) and a 12-bit ADC (LTC1407). 
Unfortunately these are in AHDL.

Is it possible to use any of the existing ADC interfaces and adapt it to 
suit ADS 7230 in AHDL itself? If yes, what are the necessary details I 
should look into from the data sheet to change the existing ADC 
interface available in AHDL?

Or do you have any other suggestions to implement an ADC interface in 
the quickest possible way?

Is there any link where I can get a reference of a 12-bit ADC in VHDL 
similar to ADS 7230?


Thank you!



---------------------------------------
Posted through http://www.FPGARelated.com

Article: 158329
Subject: Re: error Xst:899
From: GaborSzakacs <gabor@alacron.com>
Date: Thu, 22 Oct 2015 13:38:30 -0400

AnamDar wrote:
> I am receiving this error in xilinx
> ERROR:Xst:899 - "../../rtl/dff.v" line 7: The logic for <qout> does not
> match a known FF or Latch template.
> 
> The code being synthesizd is:
> 
> module dff(output reg qout, input clok, rst,d,enf);
> always @(posedge clok or negedge rst)
> begin
> if(enf)
> begin
> if(rst)
>     qout<=0;
> else
>     qout<=d;
> end
> end
> 
> endmodule
> 
> 
> can anyone guide me where i am going wrong?
> 
> 
> 
> ---------------------------------------
> Posted through http://www.FPGARelated.com

What are you trying to do with signal "enf"?  The way
you coded it, enf overrides the asynchronous reset as
well as the clock.  Also since you coded reset as a
negedge signal, it should be tested for being low rather
than high.  If "enf" is supposed to be a clock enable
it should be part of the else clause like:

always @ (posedge clok or negedge rst)
begin
   if (!rst) // active low async reset
     qout <= 0;
   else if (enf) // enf used as a clock enable
     qout <= d;
end

-- 
Gabor

Article: 158330
Subject: DC Blocker
From: "b2508" <108118@FPGARelated>
Date: Thu, 22 Oct 2015 13:18:14 -0500

Hi all,

I need to implement a DC blocker in an FPGA. Data samples arrive on every
clock cycle.

My original idea was to implement a high-pass filter as in the formula below:

y[n] = x[n] - x[n-1] + p*y[n-1]

However, it seems to me that I cannot achieve this at the given data
rate. I am unable to calculate the output by the time I need it in the
feedback loop for the next sample.

Is there some way to do this that I don't see?
If not, I was thinking of finding the mean value of the signal and subtracting it
from the signal in order to remove the DC.
However, I do not know how to determine the appropriate number of samples for
this. Do I do this by FIR filtering with all coefficients equal to
1/N?

Thank you in advance.

---------------------------------------
Posted through http://www.FPGARelated.com

Article: 158331
Subject: Re: DC Blocker
From: "kaz" <37480@FPGARelated>
Date: Thu, 22 Oct 2015 13:59:42 -0500

>Hi all,
>
>I need to implement DC blocker in FPGA. Data samples are coming at every
>clock cycle.
>
>My original idea was to implement high pass filter as in formula below:
>
>y[n] = x[n] - x[n-1] + p*y[n-1]
>
>However it seems to me that I cannot achieve this with the given data
>rate. I am unable to calculate output by the time when I need it in
>feedback loop for the next sample.
>

You can: just put one delay stage (register) on the input to get x[n-1] and one
on the output to get y[n-1], multiply by p, and the circuit will do the job at
the data rate. The mult output should not be registered, and this may be the speed
bottleneck. Moreover, the above subtraction/addition cannot be pipelined,
i.e. the result must arrive at the same clock edge. What is your data rate
(system clock) and device?
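
A minimal floating-point sketch of the structure just described, assuming one register for x[n-1] and one for y[n-1] with an unregistered multiply in between. The function name and the default p = 0.995 are illustrative assumptions, and the model says nothing about whether the combinational path meets timing.

```python
def dc_blocker(samples, p=0.995):
    """Model y[n] = x[n] - x[n-1] + p*y[n-1], one output per input sample."""
    x_prev = 0.0  # register on the input: holds x[n-1]
    y_prev = 0.0  # register on the output: holds y[n-1]
    out = []
    for x in samples:
        # Everything on this line must settle within one clock period.
        y = x - x_prev + p * y_prev
        out.append(y)
        x_prev, y_prev = x, y  # clock edge: both registers update together
    return out
```

For a constant input the output is the first sample followed by a decay of p per clock, so a p closer to 1 gives a narrower notch at DC but a longer settling time.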

>Is there some way to do this that I don't see?
>If not, I was thinking of finding mean value of signal and subtracting it
>from signal in order to clear DC.
>However, I do not know how to determine appropriate number of samples for
>this and do i do this by FIR filtering with all coefficients equal to
>1/N?
>
This is an alternative, but you may need long delay stages to filter off DC
only.
For n stages, design n stages of delay, subtract current input from last
stage and accumulate/scale.
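
One way to read the running-mean approach above, sketched in Python with integer samples. It assumes the window length is a power of two so that the 1/N scaling becomes a right shift, and it takes the sign convention "add the newest sample, subtract the one leaving the delay line"; both choices, and the names used, are assumptions rather than details from the post.

```python
from collections import deque

def remove_dc_moving_average(samples, log2_n=10):
    """Subtract an N-sample running mean (N = 2**log2_n) from integer samples."""
    n = 1 << log2_n
    delay = deque([0] * n, maxlen=n)  # N-stage delay line, initially cleared
    acc = 0                           # running sum of the last N samples
    out = []
    for x in samples:
        oldest = delay[0]             # sample about to leave the delay line
        delay.append(x)               # maxlen drops `oldest` automatically
        acc += x - oldest             # one add and one subtract per clock
        mean = acc >> log2_n          # divide by N with a shift
        out.append(x - mean)
    return out
```

The accumulator needs log2_n more bits than the input samples, and the output only becomes meaningful once the delay line has filled.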

Kaz
---------------------------------------
Posted through http://www.FPGARelated.com

Article: 158332
Subject: Re: Interfacing ADS7230 ADC to Altera FPGA
From: rickman <gnuarm@gmail.com>
Date: Thu, 22 Oct 2015 15:12:21 -0400

On 10/22/2015 11:04 AM, AlexKrish wrote:
> Hi all,
>
> I want to implement an ADC Interface for an ADC - ADS 7230 (TI) in VHDL.
> I am not very familiar with ADCs to implement it in VHDL. I already have
> an ADC Interface for a 10 bit ADC (MAX 1030) and a 12-bit ADC (LTC1407).
> Unfortunately these are in AHDL.
>
> Is it possible to use any of the existing ADC interfaces and adapt it to
> suit ADS 7230 in AHDL itself? If yes, what are the necessary details I
> should look into from the data sheet to change the existing ADC
> interface available in AHDL?
>
> Or do you have any other suggestions to implement an ADC interface in
> the quickest possible way?
>
> Is there any link where I can get a reference of a 12-bit ADC in VHDL
> similar to ADS 7230?

It looks like the ADS7230 uses an SPI interface for data and control. 
This is not a complex interface, but the ADS7230 looks like it is 
intended to be controlled by software in a processor.  To control this 
from an FPGA you will need to design a state machine to initialize the 
appropriate registers before reading data samples.

The LTC1407 is much simpler with no control registers, just read the 
data.  The MAX1030 has control registers, but they are not the same as 
the ADS7230, so I think you are going to need to design this interface 
yourself using available SPI code perhaps.  Just think of the SPI like a 
UART, a vehicle for getting the data in and out of the ADC.  You need to 
read about the registers and figure out how they need to be programmed 
for your application.

-- 

Rick

Article: 158333
Subject: Re: DC Blocker
From: "b2508" <108118@FPGARelated>
Date: Thu, 22 Oct 2015 14:13:35 -0500

Hm.. I thought that multiplication cannot be implemented without delay.

This could cause timing issues to my knowledge.

Moreover, full formula is 

y[n] = Q {x[n] - x[n-1] + p*y[n-1] - e[n-1]}
e[n] = x[n] - x[n-1] + p*y[n-1] - e[n-1] - y[n]

Error is the difference between the output before and after quantization.
I asked about the initial one because I don't know how to implement even
that.

If x1 appears at t1, the corresponding y1 is ready at the earliest at t2=t1+1. If I
register the subtraction as well, e1 is available at t3=t1+2.
However, x2 arrives at t2, and neither y1 (corresponding to y[n-1]) nor e1 is
ready at that time.



---------------------------------------
Posted through http://www.FPGARelated.com

Article: 158334
Subject: Re: DC Blocker
From: "kaz" <37480@FPGARelated>
Date: Thu, 22 Oct 2015 14:30:05 -0500

>Hm.. I tought that multiplication cannot be implemented without delay.
>
>This could cause timing issues to my knowledge.
>
>Moreover, full formula is 
>
>y[n] = Q {x[n] - x[n-1] + p*y[n-1] - e[n-1]}
>e[n] = x[n] - x[n-1] + p*y[n-1] - e[n-1] - y[n]
>
>Error is difference between output before and after quantization.
>I asked for the initial one because even that i don't know how to
>implement.
>
>if x1 appears at t1, corresponding y1 is ready at earlies at t2=t1+1. If I
>register subtracting operation as well, e1 is available at t3=t1+1.
>However x2 arrives at t2 and neither y1 (corresponding y[n-1]) or e1 are
>ready at that time.
>
>
>
>---------------------------------------
>Posted through http://www.FPGARelated.com

As such you get very long combinatorial paths running from the mult input
right through the adders/subtractors. Unless your clock speed is low enough, you
can't do that in practice.

The FIR subtraction is certainly doable, but you need a long delay line,
e.g. n = 1024 or more, depending on the signal.

Kaz
---------------------------------------
Posted through http://www.FPGARelated.com

Article: 158335
Subject: Re: DC Blocker
From: rickman <gnuarm@gmail.com>
Date: Thu, 22 Oct 2015 15:59:00 -0400

On 10/22/2015 3:13 PM, b2508 wrote:
> Hm.. I tought that multiplication cannot be implemented without delay.
>
> This could cause timing issues to my knowledge.
>
> Moreover, full formula is
>
> y[n] = Q {x[n] - x[n-1] + p*y[n-1] - e[n-1]}
> e[n] = x[n] - x[n-1] + p*y[n-1] - e[n-1] - y[n]
>
> Error is difference between output before and after quantization.
> I asked for the initial one because even that i don't know how to
> implement.
>
> if x1 appears at t1, corresponding y1 is ready at earlies at t2=t1+1. If I
> register subtracting operation as well, e1 is available at t3=t1+1.
> However x2 arrives at t2 and neither y1 (corresponding y[n-1]) or e1 are
> ready at that time.

What is Q?

Y1 is ready at t1+delta which is a logic delay, not a clock cycle.  So 
don't sweat that.  If you need to pipeline this to meet timing 
constraints, you are in trouble, lol.

What clock rate are you shooting for?

-- 

Rick

Article: 158336
Subject: Re: DC Blocker
From: gtwrek@sonic.net (Mark Curry)
Date: Thu, 22 Oct 2015 20:08:37 +0000 (UTC)

In article <2Z-dnUTCaKvCqLTLnZ2dnUU7-VOdnZ2d@giganews.com>,
b2508 <108118@FPGARelated> wrote:
>Hm.. I tought that multiplication cannot be implemented without delay.
>
>This could cause timing issues to my knowledge.
>
>Moreover, full formula is 
>
>y[n] = Q {x[n] - x[n-1] + p*y[n-1] - e[n-1]}
>e[n] = x[n] - x[n-1] + p*y[n-1] - e[n-1] - y[n]
>
>Error is difference between output before and after quantization.
>I asked for the initial one because even that i don't know how to
>implement.
>
>if x1 appears at t1, corresponding y1 is ready at earlies at t2=t1+1. If I
>register subtracting operation as well, e1 is available at t3=t1+1.
>However x2 arrives at t2 and neither y1 (corresponding y[n-1]) or e1 are
>ready at that time.

Without really looking at your required function in detail (just noting
that it has feedback terms) - I'll just note in general.

The statement "multiplication cannot be implemented without delay" is 
false, in many ways.  It all depends on your processing requirements. 
What is your sample rate?  What are your bit widths?

Your processing clock does NOT need to be the same as your sample clock.
If you wish them to be the same - it may be easier for new FPGA users to 
design - then you MAY be able to run the multiplier fully combinational - 
if your sample rate is low enough.

The alternative (at a high level) is to buffer an input and output, and 
process with a faster processing clock.  Modern FPGAs can run 
DSP functions at up to around 400-500 MHz.  This is likely much faster 
than your sample rate.

Regards,
Mark

Article: 158337
Subject: Re: DC Blocker
From: "b2508" <108118@FPGARelated>
Date: Thu, 22 Oct 2015 15:50:40 -0500

>In article <2Z-dnUTCaKvCqLTLnZ2dnUU7-VOdnZ2d@giganews.com>,
>b2508 <108118@FPGARelated> wrote:
>>Hm.. I tought that multiplication cannot be implemented without delay.
>>
>>This could cause timing issues to my knowledge.
>>
>>Moreover, full formula is 
>>
>>y[n] = Q {x[n] - x[n-1] + p*y[n-1] - e[n-1]}
>>e[n] = x[n] - x[n-1] + p*y[n-1] - e[n-1] - y[n]
>>
>>Error is difference between output before and after quantization.
>>I asked for the initial one because even that i don't know how to
>>implement.
>>
>>if x1 appears at t1, corresponding y1 is ready at earlies at t2=t1+1. If I
>>register subtracting operation as well, e1 is available at t3=t1+1.
>>However x2 arrives at t2 and neither y1 (corresponding y[n-1]) or e1 are
>>ready at that time.
>
>Without really looking at your required function in detail (just noting
>that it has feedback terms) - I'll just note in general.
>
>The statement "multiplication cannot be implemented without delay" is 
>false, in many ways.  It all depends on your processing requirements. 
>What is your sample rate?  What are your bit widths?
>
>You're processing clock does NOT need to be the same as your sample clock.
>If you wish them to be the same - it may be easier for new FPGA users to
>design - then you MAY be able to run the multiplier full combinational -
>If you're sample rate is low enough.
>
>The alternative (at a high level) is to buffer an input and output, and 
>process with a faster processing clock.  Modern FPGA's these days can run
>DSP functions upwards to around 400-500 MHz.  This is likely much faster
>than your sample rate.
>
>Regards,
>Mark

OK, I was taught that it is always safer to put registers wherever you
can. I have no choice in my project but to have same sampling and
processing rate.

My rate is 100 MHz.
Input data or x[n] has data format - unsigned, 16 bit, 1 bit for integer.

Also, I am not sure how to select data widths after each of these
operations.

If x[n] and x[n-1] are 16/1 and their subtraction is 17 bit unsigned with
2 bit integers, how do I proceed with data width selection? Feedback loop
part is unclear to me. 
Also, should I use DSP48 for the multiplication with P or should I make it
somehow power of two and do it by shifting?

Q is quantization, i.e. reducing the number of bits after all these
operations.


---------------------------------------
Posted through http://www.FPGARelated.com

Article: 158338
Subject: Re: recovery/removal timing
From: Jon Elson <jmelson@wustl.edu>
Date: Thu, 22 Oct 2015 16:10:47 -0500

zak wrote:

> True but the tool takes care of that and tells you if it passed
> recovery/removal at every register based on clock period check. In case it
> doesn't you can then assist by reducing reset fanout e.g. cascading
> stages.
If the reset signal is produced on-chip by a FF clocked off the same clock 
as the system that is getting the reset, then you are right, the tools have 
all the info they need to make sure this is right.  It would be insane to 
have an externally-generated reset without reclocking it in the FPGA off the 
same clock, but you certainly could do this by mistake.

I just found a condition where you could have an un-synchronized input on a 
product that has been in manufacture for a decade.  Normally, this signal IS 
synchronized on the FPGA, but there was a particular configuration involving 
multiple boards where it would be synched from the OTHER board, only.  OOPS!

Jon

Article: 158339
Subject: Re: DC Blocker
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Thu, 22 Oct 2015 21:13:21 +0000 (UTC)

Mark Curry <gtwrek@sonic.net> wrote:

(snip)

> Without really looking at your required function in detail (just noting
> that it has feedback terms) - I'll just note in general.

> The statement "multiplication cannot be implemented without delay" is 
> false, in many ways.  It all depends on your processing requirements. 
> What is your sample rate?  What are your bit widths?

I would say that it is right, but not very useful.

Addition can't be implemented without delay, and for that matter
no filter can be.  Even wires have delay.

If you are lucky, you can do all processing within one sample
period, so one sample delay.  You have to include any delay from
the previous register, so you have less than one sample period.

But more often, you can live with a few cycles delay, and pipeline
the whole system. 

> You're processing clock does NOT need to be the same as your sample clock.
> If you wish them to be the same - it may be easier for new FPGA users to 
> design - then you MAY be able to run the multiplier full combinational - 
> If you're sample rate is low enough.

> The alternative (at a high level) is to buffer an input and output, and 
> process with a faster processing clock.  Modern FPGA's these days can run 
> DSP functions upwards to around 400-500 MHz.  This is likely much faster 
> than your sample rate.

-- glen

Article: 158340
Subject: Re: DC Blocker
From: Tim Wescott <seemywebsite@myfooter.really>
Date: Thu, 22 Oct 2015 16:14:24 -0500

On Thu, 22 Oct 2015 13:18:14 -0500, b2508 wrote:

> Hi all,
> 
> I need to implement DC blocker in FPGA. Data samples are coming at every
> clock cycle.
> 
> My original idea was to implement high pass filter as in formula below:
> 
> y[n] = x[n] - x[n-1] + p*y[n-1]
> 
> However it seems to me that I cannot achieve this with the given data
> rate. I am unable to calculate output by the time when I need it in
> feedback loop for the next sample.
> 
> Is there some way to do this that I don't see?
> If not, I was thinking of finding mean value of signal and subtracting
> it from signal in order to clear DC.
> However, I do not know how to determine appropriate number of samples
> for this and do i do this by FIR filtering with all coefficients equal
> to 1/N?

There are other ways to implement high-pass filters.  I'm not much of an 
FPGA guy, but this one may help.  I'm going to rearrange your 
nomenclature:

u: input
y: output
x: state variable

y[n] = u[n] - x[n-1]
x[n] = d * y[n]

For d << 1 this should be pretty robust even if you have to toss in extra 
delays (i.e., x[n] = d * y[n - m], for some integer value of m).
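
A small Python sketch of a state-variable DC blocker along these lines. Note the assumption: read literally, x[n] = d * y[n] has no accumulation, so the sketch uses the accumulating update x[n] = x[n-1] + d * y[n], which is a reading of the post and not something it confirms. With that update the transfer function works out to (1 - z^-1) / (1 - (1 - d) z^-1), i.e. the same form as the original filter with p = 1 - d. The function name and default d are illustrative.

```python
def dc_blocker_state_form(samples, d=2.0 ** -8):
    """y[n] = u[n] - x[n-1], with the assumed update x[n] = x[n-1] + d*y[n]."""
    x_state = 0.0  # state variable: a slowly tracking estimate of the DC level
    out = []
    for u in samples:
        y = u - x_state       # subtract the current DC estimate
        x_state += d * y      # assumed accumulating update (see lead-in)
        out.append(y)
    return out
```

Choosing d as a power of two turns the multiply into a shift, and the state only moves by a small fraction of y per sample, which is why extra delay in the update path is tolerable when d << 1.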

-- 

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Article: 158341
Subject: Re: DC Blocker
From: rickman <gnuarm@gmail.com>
Date: Thu, 22 Oct 2015 17:15:35 -0400

On 10/22/2015 4:50 PM, b2508 wrote:
>> In article <2Z-dnUTCaKvCqLTLnZ2dnUU7-VOdnZ2d@giganews.com>,
>> b2508 <108118@FPGARelated> wrote:
>>> Hm.. I tought that multiplication cannot be implemented without delay.
>>>
>>> This could cause timing issues to my knowledge.
>>>
>>> Moreover, full formula is
>>>
>>> y[n] = Q {x[n] - x[n-1] + p*y[n-1] - e[n-1]}
>>> e[n] = x[n] - x[n-1] + p*y[n-1] - e[n-1] - y[n]
>>>
>>> Error is difference between output before and after quantization.
>>> I asked for the initial one because even that i don't know how to
>>> implement.
>>>
>>> if x1 appears at t1, corresponding y1 is ready at earlies at t2=t1+1. If I
>>> register subtracting operation as well, e1 is available at t3=t1+1.
>>> However x2 arrives at t2 and neither y1 (corresponding y[n-1]) or e1 are
>>> ready at that time.
>>
>> Without really looking at your required function in detail (just noting
>> that it has feedback terms) - I'll just note in general.
>>
>> The statement "multiplication cannot be implemented without delay" is
>> false, in many ways.  It all depends on your processing requirements.
>> What is your sample rate?  What are your bit widths?
>>
>> You're processing clock does NOT need to be the same as your sample clock.
>> If you wish them to be the same - it may be easier for new FPGA users to
>> design - then you MAY be able to run the multiplier full combinational -
>> If you're sample rate is low enough.
>>
>> The alternative (at a high level) is to buffer an input and output, and
>> process with a faster processing clock.  Modern FPGA's these days can run
>> DSP functions upwards to around 400-500 MHz.  This is likely much faster
>> than your sample rate.
>>
>> Regards,
>> Mark
>
> OK, I was taught that it is always safer to put registers wherever you
> can. I have no choice in my project but to have same sampling and
> processing rate.
>
> My rate is 100 MHz.
> Input data or x[n] has data format - unsigned, 16 bit, 1 bit for integer.
>
> Also, I am not sure how to select data widths after each of these
> operations.
>
> If x[n] and x[n-1] are 16/1 and their subtraction is 17 bit unsigned with
> 2 bit integers, how do I proceed with data width selection? Feedback loop
> part is unclear to me.
> Also, should I use DSP48 for the multiplication with P or should I make it
> somehow power of two and do it by shifting?
>
> Q is quantization, or reducing number of samples after all these
> operations.

Do you know the value of P?  Multiplies are done by shifting and adding. 
I don't know which chip you are planning to use, but all the 
multipliers I know of require pipelining; the only option is how many 
stages, 1, 2, etc...  Since P is a constant (it *is* a constant, right?) 
you only need to use adders for the 1s, or if there are long runs of 1s 
or 0s, you can subtract at the lsb of the run and add in at the bit just 
past the msb of the run.  The point is you may not need to use a built-in 
multiplier.
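
The run-of-ones trick can be sketched in Python as follows. The helper names are hypothetical, the threshold of three bits for recoding a run is an arbitrary choice, and a fractional P would first have to be scaled to an integer constant.

```python
def shift_add_terms(constant: int):
    """Return (sign, shift) pairs whose sum of sign * 2**shift equals `constant`.

    Runs of three or more 1 bits are replaced by "subtract at the lsb of the
    run, add at the bit just past its msb"; shorter runs use one add per bit.
    """
    if constant < 0:
        raise ValueError("constant must be non-negative")
    terms = []
    bit = 0
    while constant >> bit:
        if (constant >> bit) & 1:
            run = 0
            while (constant >> (bit + run)) & 1:
                run += 1                      # measure the run of 1s
            if run >= 3:
                terms.append((-1, bit))       # subtract at the lsb of the run
                terms.append((+1, bit + run)) # add just past the msb
            else:
                terms.extend((+1, bit + i) for i in range(run))
            bit += run
        else:
            bit += 1
    return terms


def multiply_by_constant(value: int, terms) -> int:
    """Multiply using only shifts, adds and subtracts."""
    return sum(sign * (value << shift) for sign, shift in terms)
```

For example, 123 (binary 1111011) has six 1 bits but recodes to four terms: +1, +2, -8, +128.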

Your filter seems very complex for a feedback filter.  Is there some 
special need driving this?  Can you use a simpler filter?

-- 

Rick

Article: 158342
Subject: Re: DC Blocker
From: gtwrek@sonic.net (Mark Curry)
Date: Thu, 22 Oct 2015 22:58:56 +0000 (UTC)

In article <nqOdnUF-N6G90bTLnZ2dnUU7-VmdnZ2d@giganews.com>,
b2508 <108118@FPGARelated> wrote:
>>In article <2Z-dnUTCaKvCqLTLnZ2dnUU7-VOdnZ2d@giganews.com>,
>>b2508 <108118@FPGARelated> wrote:
>>>Hm.. I tought that multiplication cannot be implemented without delay.
>>>
>>>This could cause timing issues to my knowledge.
>>>
>>>Moreover, full formula is 
>>>
>>>y[n] = Q {x[n] - x[n-1] + p*y[n-1] - e[n-1]}
>>>e[n] = x[n] - x[n-1] + p*y[n-1] - e[n-1] - y[n]
>>>
>
>OK, I was taught that it is always safer to put registers wherever you
>can. I have no choice in my project but to have same sampling and
>processing rate.

This is a complete non-sequitur.  Yes, register often in an FPGA. That's 
a good rule of thumb.  It has NOTHING to do with "having same sampling and processing
rate".

Think of it as an analogy - if you implemented this in software,
would you force (if you could) the processor to operate on the
sample clock?  Of course not.  You buffer a few input samples,
do your processing at the higher speed clock, then buffer
your output.  Your requirements are, the total processing
must complete in one-sample time.

When designing at the higher rate clock, each register is NOT necessarily
a Z-1 sample delay of your function.  The register is just a retiming
step (i.e. pipeline stage).

>My rate is 100 MHz.
>Input data or x[n] has data format - unsigned, 16 bit, 1 bit for integer.

A 100 MHz fully combinational multiply is doable in a modern DSP48.  Now,
whether the rest of the algorithm would fit, I dunno.  I'd not design
it this way.  I'd use the faster processing clock.

>Also, I am not sure how to select data widths after each of these
>operations.
>
>If x[n] and x[n-1] are 16/1 and their subtraction is 17 bit unsigned with
>2 bit integers, how do I proceed with data width selection?   

I'm confused by your notation - x[n] and x[n-1] should be the same format.  But
in any event, in cases like these you just need to make sure the scaling
of each variable is the same (i.e. align the "decimal" points), and appropriately
sign-extend each input.
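
A short Python sketch of that alignment step, assuming fixed-point values stored as integers with a known number of fractional bits. The helper names and the example formats are assumptions; Python integers are unbounded, so sign extension shows up only when a raw bit pattern is first converted to a signed value.

```python
def to_signed(raw: int, width: int) -> int:
    """Interpret the low `width` bits of `raw` as a two's-complement number."""
    raw &= (1 << width) - 1
    if raw & (1 << (width - 1)):   # sign bit set: value is negative
        return raw - (1 << width)
    return raw


def add_fixed(a: int, a_frac: int, b: int, b_frac: int):
    """Add two fixed-point integers after aligning their binary points.

    Returns (sum, fractional_bits) with the larger fractional width.
    """
    frac = max(a_frac, b_frac)
    a_aligned = a << (frac - a_frac)   # pad with zeros below the binary point
    b_aligned = b << (frac - b_frac)
    return a_aligned + b_aligned, frac
```

As a usage example, 1.5 with 15 fractional bits is 49152 and -0.25 held as the 8-bit pattern 0xFC with 4 fractional bits is -4; `add_fixed(49152, 15, to_signed(0xFC, 8), 4)` aligns both to 15 fractional bits before adding. In hardware the result also needs one extra integer bit for the carry.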

>part is unclear to me. 
>Also, should I use DSP48 for the multiplication with P or should I make it
>somehow power of two and do it by shifting?

This is an implementation trade-off you must decide. 

Regards,

Mark

Article: 158343
Subject: Re: DC Blocker
From: gtwrek@sonic.net (Mark Curry)
Date: Thu, 22 Oct 2015 23:10:50 +0000 (UTC)

In article <n0bjhg$fr2$1@speranza.aioe.org>,
glen herrmannsfeldt  <gah@ugcs.caltech.edu> wrote:
>Mark Curry <gtwrek@sonic.net> wrote:
>
>(snip)
>
>> Without really looking at your required function in detail (just noting
>> that it has feedback terms) - I'll just note in general.
> 
>> The statement "multiplication cannot be implemented without delay" is 
>> false, in many ways.  It all depends on your processing requirements. 
>> What is your sample rate?  What are your bit widths?
>
>I would say that it is right, but not very useful.
>
>Addition can't be implemented without delay, and for that matter
>no filter can be.  Even wires have delay.

Ok, picking nits - my terminology wasn't clear.  But multiplies and adds can 
be done on an FPGA in 0 clock cycles  i.e. pure combinational logic.
My notation (which I think is common), is counting pipeline cycles.
Both add, and multiply (and quite often both) can be done within 1 cycle,
such that you can register just the final output, and use that 
final output as a new input on the next iteration. 

>
>If you are lucky, you can do all processing within one sample
>period, so one sample delay.  You have to include any delay from
>the previous register, so you have less than one sample period.
>
>But more often, you can live with a few cycles delay, and pipeline
>the whole system. 

Which is what I think the OP is (correctly) worried about.  His feedback
term is needed for the next calculation, so he can't fully pipeline.
His requirements are "Processing must be complete in one sample time."
Whereas in general, the requirements for a fully pipelined design are just "Can
accept another input in one cycle time";  output may appear 
some (reasonable) number of clock cycles later.

Regards,

Mark

Article: 158344
Subject: Re: DC Blocker
From: Les Cargill <lcargill99@comcast.com>
Date: Thu, 22 Oct 2015 18:18:51 -0500

Tim Wescott wrote:
> On Thu, 22 Oct 2015 13:18:14 -0500, b2508 wrote:
>
>> Hi all,
>>
>> I need to implement DC blocker in FPGA. Data samples are coming at every
>> clock cycle.
>>
>> My original idea was to implement high pass filter as in formula below:
>>
>> y[n] = x[n] - x[n-1] + p*y[n-1]
>>
>> However it seems to me that I cannot achieve this with the given data
>> rate. I am unable to calculate output by the time when I need it in
>> feedback loop for the next sample.
>>
>> Is there some way to do this that I don't see?
>> If not, I was thinking of finding mean value of signal and subtracting
>> it from signal in order to clear DC.
>> However, I do not know how to determine appropriate number of samples
>> for this and do i do this by FIR filtering with all coefficients equal
>> to 1/N?
>
> There are other ways to implement high-pass filters.  I'm not much of an
> FPGA guy, but this one may help.  I'm going to rearrange your
> nomenclature:
>
> u: input
> y: output
> x: state variable
>
> y[n] = u[n] - x[n-1]
> x[n] = d * y[n]
>

x need not be a vector, does it? I think it works out to be a single
value. That may matter for an FPGA implementation.

> For d << 1 this should be pretty robust even if you have to toss in extra
> delays (i.e., x[n] = d * y[n - m], for some integer value of m).
>

-- 
Les Cargill

Article: 158345
Subject: Re: DC Blocker
From: "kaz" <37480@FPGARelated>
Date: Thu, 22 Oct 2015 18:41:00 -0500
Links: << >> << T >> << A >>

The OP appeared on dsprelated.com first where dsp guys know everything
about fpgas but then migrated here thankfully.

The guy has posted there a link to a doc written by same dsp guys who
control that forum. It is a leaky integrator based dc filter followed by a
modification where the quantisation error is added back to the loop.

The double equations given here are misleading.

My suggestion is to just implement filter 2 as drawn in the diagram and
forget about the equations. It is there ready for you, and if you use P
as a power of 2 it might be enough for your resolution, so no mult is
needed.

Your input is 16 bits unsigned?? That means a dc offset; I believe the
design is meant for signed data.

Regarding bit growth: 16 bits after addition/subtraction => 17 bits. For
feedback use 16 bits. Truncation error is meant to help that.
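
The 16 => 17 bit growth can be checked with a few lines of Python. This
is only a sketch assuming signed two's-complement 16-bit samples; the
helper name is made up for illustration.

```python
def signed_bits_needed(lo: int, hi: int) -> int:
    """Smallest two's-complement width that holds every value in [lo, hi]."""
    bits = 1
    while not (-(1 << (bits - 1)) <= lo and hi <= (1 << (bits - 1)) - 1):
        bits += 1
    return bits


if __name__ == "__main__":
    lo16, hi16 = -(1 << 15), (1 << 15) - 1  # signed 16-bit range
    # Extreme values of x[n] - x[n-1] when both operands are 16-bit signed.
    diff_lo, diff_hi = lo16 - hi16, hi16 - lo16
    print(signed_bits_needed(lo16, hi16))
    print(signed_bits_needed(diff_lo, diff_hi))
```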

If you get into fmax issues then I hope dsp guys will come to help!!

Kaz


---------------------------------------
Posted through http://www.FPGARelated.com

Article: 158346
Subject: Re: DC Blocker
From: Tim Wescott <seemywebsite@myfooter.really>
Date: Thu, 22 Oct 2015 18:43:31 -0500
Links: << >> << T >> << A >>

On Thu, 22 Oct 2015 18:18:51 -0500, Les Cargill wrote:

> Tim Wescott wrote:
>> On Thu, 22 Oct 2015 13:18:14 -0500, b2508 wrote:
>>
>>> Hi all,
>>>
>>> I need to implement DC blocker in FPGA. Data samples are coming at
>>> every clock cycle.
>>>
>>> My original idea was to implement high pass filter as in formula
>>> below:
>>>
>>> y[n] = x[n] - x[n-1] + p*y[n-1]
>>>
>>> However it seems to me that I cannot achieve this with the given data
>>> rate. I am unable to calculate output by the time when I need it in
>>> feedback loop for the next sample.
>>>
>>> Is there some way to do this that I don't see?
>>> If not, I was thinking of finding mean value of signal and subtracting
>>> it from signal in order to clear DC.
>>> However, I do not know how to determine appropriate number of samples
>>> for this and do i do this by FIR filtering with all coefficients equal
>>> to 1/N?
>>
>> There are other ways to implement high-pass filters.  I'm not much of
>> an FPGA guy, but this one may help.  I'm going to rearrange your
>> nomenclature:
>>
>> u: input
>> y: output
>> x: state variable
>>
>> y[n] = u[n] - x[n-1]
>> x[n] = d * y[n]
>>
>>
> x need not be a vector, does it? I think it works out to be a single
> value. That may matter for an FPGA implementation.
> 
>> For d << 1 this should be pretty robust even if you have to toss in
>> extra delays (i.e., x[n] = d * y[n - m], for some integer value of m).
>>

In this case the x[n] notation means "x at sample time n", where "n" 
means "today".

Basically the notation that the OP used.

-- 

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Article: 158347
Subject: Re: DC Blocker
From: "b2508" <108118@FPGARelated>
Date: Fri, 23 Oct 2015 02:44:14 -0500
Links: << >> << T >> << A >>

>In article <nqOdnUF-N6G90bTLnZ2dnUU7-VmdnZ2d@giganews.com>,
>b2508 <108118@FPGARelated> wrote:
>>>In article <2Z-dnUTCaKvCqLTLnZ2dnUU7-VOdnZ2d@giganews.com>,
>>>b2508 <108118@FPGARelated> wrote:

>>OK, I was taught that it is always safer to put registers wherever you
>>can. I have no choice in my project but to have same sampling and
>>processing rate.
>
>This is a complete non-sequitur.  Yes, register often in an FPGA.  That's
>a good rule of thumb.  NOTHING to do with "having same sampling and
>processing rate".
>
>Regards,
>
>Mark

Hey, these are just two sentences next to each other; I didn't mean that
I register because the sample and processing rates are the same :-)

Article: 158348
Subject: Re: DC Blocker
From: "b2508" <108118@FPGARelated>
Date: Fri, 23 Oct 2015 02:51:42 -0500
Links: << >> << T >> << A >>

>On 10/22/2015 4:50 PM, b2508 wrote:
>>> In article <2Z-dnUTCaKvCqLTLnZ2dnUU7-VOdnZ2d@giganews.com>,
>>> b2508 <108118@FPGARelated> wrote:
>>>> Hm.. I thought that multiplication cannot be implemented without
>>>> delay.
>>>>
>>>> This could cause timing issues to my knowledge.
>>>>
>>>> Moreover, full formula is
>>>>
>>>> y[n] = Q {x[n] - x[n-1] + p*y[n-1] - e[n-1]}
>>>> e[n] = x[n] - x[n-1] + p*y[n-1] - e[n-1] - y[n]
>>>>
>>>> Error is difference between output before and after quantization.
>>>> I asked for the initial one because even that i don't know how to
>>>> implement.
>>>>
>>>> If x1 appears at t1, corresponding y1 is ready at earliest at
>>>> t2=t1+1. If I register subtracting operation as well, e1 is
>>>> available at t3=t1+1. However x2 arrives at t2 and neither y1
>>>> (corresponding y[n-1]) nor e1 is ready at that time.
>>>
>>> Without really looking at your required function in detail (just
>>> noting that it has feedback terms) - I'll just note in general.
>>>
>>> The statement "multiplication cannot be implemented without delay" is
>>> false, in many ways.  It all depends on your processing requirements.
>>> What is your sample rate?  What are your bit widths?
>>>
>>> Your processing clock does NOT need to be the same as your sample
>>> clock. If you wish them to be the same - it may be easier for new
>>> FPGA users to design - then you MAY be able to run the multiplier
>>> full combinational - if your sample rate is low enough.
>>>
>>> The alternative (at a high level) is to buffer an input and output,
>>> and process with a faster processing clock.  Modern FPGAs these days
>>> can run DSP functions upwards to around 400-500 MHz.  This is likely
>>> much faster than your sample rate.
>>>
>>> Regards,
>>> Mark
>>
>> OK, I was taught that it is always safer to put registers wherever you
>> can. I have no choice in my project but to have same sampling and
>> processing rate.
>>
>> My rate is 100 MHz.
>> Input data or x[n] has data format - unsigned, 16 bit, 1 bit for
>> integer.
>>
>> Also, I am not sure how to select data widths after each of these
>> operations.
>>
>> If x[n] and x[n-1] are 16/1 and their subtraction is 17 bit unsigned
>> with 2 bit integers, how do I proceed with data width selection?
>> Feedback loop part is unclear to me.
>> Also, should I use DSP48 for the multiplication with P or should I
>> make it somehow power of two and do it by shifting?
>>
>> Q is quantization, or reducing number of samples after all these
>> operations.
>
>Do you know the value of P?  Multiplies are done by shifting and adding.
>I don't know which chip you are planning to use, but all the
>multipliers I know of require pipelining, the only option is how many
>stages, 1, 2, etc...  Since P is a constant (it *is* a constant, right?)
>you only need to use adders for the 1s, or if there are long runs of 1s
>or 0s, you can subtract at the lsb of the run and add in at the bit just
>past the msb of the run.  The point is you may not need to use a built
>in multiplier.
>
>Your filter seems very complex for a feedback filter.  Is there some 
>special need driving this?  Can you use a simpler filter?
>
>-- 
>
>Rick

I do not really know the value of P or how to determine it. I was thinking
of using 0.99 because I tried it out in software simulation and it seems to
do what I wanted it to do. The idea for this filter came from this article
(the second filter in Figure 2):

http://www.digitalsignallabs.com/dcblock.pdf
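
For reference, a software simulation of this kind can be sketched in a
few lines of Python. This is a floating-point model of the basic
recurrence y[n] = x[n] - x[n-1] + p*y[n-1] only (no quantization or
error feedback); the function name and the test input are assumptions
made for illustration.

```python
def dc_blocker(x, p=0.99):
    """y[n] = x[n] - x[n-1] + p*y[n-1], starting from zero state."""
    y = []
    x_prev = 0.0
    y_prev = 0.0
    for xn in x:
        yn = xn - x_prev + p * y_prev
        y.append(yn)
        x_prev, y_prev = xn, yn
    return y


if __name__ == "__main__":
    x = [0.5] * 2000  # pure DC input
    y = dc_blocker(x)
    # The first output equals the input step; later outputs decay as p**n.
    print(y[0], y[-1])
```

For a constant input the output decays as p^n, so with p = 0.99 the
offset dies away with a time constant of roughly 1/(1-p) = 100 samples.
A p closer to 1 gives a narrower notch at DC but a slower settling time.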

Someone said to forget the equations and do as drawn in the figure, but
these figures never account for the potential latency of the
add/subtract/multiply blocks, and if I do not add registers I may have
timing issues.

Anyway, I will try to do add and multiply in one clock cycle and see where
this gets me. 

Thank you all very much anyhow.



Article: 158349
Subject: Re: DC Blocker
From: rickman <gnuarm@gmail.com>
Date: Fri, 23 Oct 2015 03:59:02 -0400
Links: << >> << T >> << A >>

On 10/23/2015 3:51 AM, b2508 wrote:
>> On 10/22/2015 4:50 PM, b2508 wrote:
>>>> In article <2Z-dnUTCaKvCqLTLnZ2dnUU7-VOdnZ2d@giganews.com>,
>>>> b2508 <108118@FPGARelated> wrote:
>>>>> Hm.. I thought that multiplication cannot be implemented without
>>>>> delay.
>>>>>
>>>>> This could cause timing issues to my knowledge.
>>>>>
>>>>> Moreover, full formula is
>>>>>
>>>>> y[n] = Q {x[n] - x[n-1] + p*y[n-1] - e[n-1]}
>>>>> e[n] = x[n] - x[n-1] + p*y[n-1] - e[n-1] - y[n]
>>>>>
>>>>> Error is difference between output before and after quantization.
>>>>> I asked for the initial one because even that i don't know how to
>>>>> implement.
>>>>>
>>>>> If x1 appears at t1, corresponding y1 is ready at earliest at
>>>>> t2=t1+1. If I register subtracting operation as well, e1 is
>>>>> available at t3=t1+1. However x2 arrives at t2 and neither y1
>>>>> (corresponding y[n-1]) nor e1 is ready at that time.
>>>>
>>>> Without really looking at your required function in detail (just
>>>> noting that it has feedback terms) - I'll just note in general.
>>>>
>>>> The statement "multiplication cannot be implemented without delay" is
>>>> false, in many ways.  It all depends on your processing requirements.
>>>> What is your sample rate?  What are your bit widths?
>>>>
>>>> Your processing clock does NOT need to be the same as your sample
>>>> clock. If you wish them to be the same - it may be easier for new
>>>> FPGA users to design - then you MAY be able to run the multiplier
>>>> full combinational - if your sample rate is low enough.
>>>>
>>>> The alternative (at a high level) is to buffer an input and output,
>>>> and process with a faster processing clock.  Modern FPGAs these days
>>>> can run DSP functions upwards to around 400-500 MHz.  This is likely
>>>> much faster than your sample rate.
>>>>
>>>> Regards,
>>>> Mark
>>>
>>> OK, I was taught that it is always safer to put registers wherever you
>>> can. I have no choice in my project but to have same sampling and
>>> processing rate.
>>>
>>> My rate is 100 MHz.
>>> Input data or x[n] has data format - unsigned, 16 bit, 1 bit for
>>> integer.
>>>
>>> Also, I am not sure how to select data widths after each of these
>>> operations.
>>>
>>> If x[n] and x[n-1] are 16/1 and their subtraction is 17 bit unsigned
>>> with 2 bit integers, how do I proceed with data width selection?
>>> Feedback loop part is unclear to me.
>>> Also, should I use DSP48 for the multiplication with P or should I
>>> make it somehow power of two and do it by shifting?
>>>
>>> Q is quantization, or reducing number of samples after all these
>>> operations.
>>
>> Do you know the value of P?  Multiplies are done by shifting and adding.
>> I don't know which chip you are planning to use, but all the
>> multipliers I know of require pipelining, the only option is how many
>> stages, 1, 2, etc...  Since P is a constant (it *is* a constant, right?)
>> you only need to use adders for the 1s, or if there are long runs of 1s
>> or 0s, you can subtract at the lsb of the run and add in at the bit just
>> past the msb of the run.  The point is you may not need to use a built
>> in multiplier.
>>
>> Your filter seems very complex for a feedback filter.  Is there some
>> special need driving this?  Can you use a simpler filter?
>>
>> --
>>
>> Rick
>
> I do not really know the value of P or how to determine it. I was thinking
> to use 0.99 because I tried it out in software simulation and it seems to
> do what I wanted it to do. The idea for this filter came from this article
> / second filter on Figure 2.
>
> http://www.digitalsignallabs.com/dcblock.pdf
>
> Someone said to forget equations and do as it is drawn in figure, but
> these figures never account for potential latency of the
> add/subtract/multiply blocks or if I do not add registers, then I may have
> timing issues.
>
> Anyway, I will try to do add and multiply in one clock cycle and see where
> this gets me.
>
> Thank you all very much anyhow.

About the timing issues: try it without extra registers first.  Then if
you have problems you will need to find ways to address them.  Your
calculation cannot work if you add more register delays in the feedback
loop.
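
To illustrate why an extra register in the feedback path changes the
result, here is a hedged floating-point sketch in Python. It models the
basic recurrence with the feedback term taken from y[n-m]; m = 1 is the
original filter and m = 2 mimics one extra pipeline register. The
function name, p = 0.99 and the test input are assumptions chosen for
illustration.

```python
def dc_blocker_delayed(x, p=0.99, m=1):
    """y[n] = x[n] - x[n-1] + p*y[n-m]; m=1 is the original recurrence."""
    y = []
    x_prev = 0.0
    for n, xn in enumerate(x):
        # Feedback comes from m samples back; zero before the filter starts.
        y_fb = y[n - m] if n >= m else 0.0
        y.append(xn - x_prev + p * y_fb)
        x_prev = xn
    return y


if __name__ == "__main__":
    # Alternating +1/-1 input, i.e. a tone at half the sample rate.
    x = [1.0 if n % 2 == 0 else -1.0 for n in range(4000)]
    for m in (1, 2):
        y = dc_blocker_delayed(x, m=m)
        # Peak output level once the start-up transient has died away.
        print(m, max(abs(v) for v in y[-10:]))
```

With m = 1 the tone passes at a gain of about 2/(1+p), close to 1. With
m = 2 the same tone is amplified toward 2/(1-p), about 200 for p = 0.99,
so the delayed version is a different filter rather than a slower copy
of the same one.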

-- 

Rick
