On Tuesday, 29 September 2015 at 11:50:53 UTC+1, kaz wrote:
> >On Tuesday, 29 September 2015 at 07:49:09 UTC+1, glen
> >herrmannsfeldt wrote:
> >> wzab01@gmail.com wrote:
> >>
> >> > Last time I have spent a lot of time on development of quite
> >> > complex high speed data processing systems in FPGA.
> >> > They all had pipeline architecture, and data were processed in
> >> > parallel in multiple pipelines with different latencies.
> >>
> >> > The worst thing was that those latencies were changing
> >> > during development. For example some operations were
> >> > performed by blocks with tree structure, so the number of
> >> > levels depended on the number of inputs handled by each node.
> >> > The number of inputs in each node was varied to find the
> >> > acceptable balance between the number of levels and maximum
> >> > clock speed. I also had to add some pipeline registers to
> >> > improve timing.
> >>
> >> I have heard that some synthesis software now knows how to move
> >> around pipeline registers to optimize timing. I haven't tried
> >> using the feature yet, though.
> >>
> >> I think it can move registers, but maybe not add them. You might
> >> need enough registers in place for it to move them around.
> >>
> >> I used to work on systolic arrays, which are really just very long
> >> (hundreds or thousands of stages) pipelines. It is pretty hard to
> >> hand optimize them at that length.
> >>
> >
> >Yes, of course the pipeline registers may be moved (e.g. using the
> >"retiming" feature). I usually keep this option switched on for
> >implementation.
> >My method only ensures that the number of pipeline stages is the same
> >in all parallel paths. And keeping track of that was really a huge
> >problem in bigger designs.
> >--
> >Wojtek
>
> Not sure why you expect the tool to do what you should do, and to do
> so for the simulation tool. How can you simulate a design where
> synthesis will put in the registers for you?
>

The tool is supposed to ensure that the appropriate number of registers
is added.
In the case of a high-level parametrized description it is really
difficult to avoid mistakes. Therefore an automated tool is preferred.
The registers are put in not only for synthesis, but also for simulation.
I hope that my preprint explains both the motivation and the
implementation more clearly.

Regards,
Wojtek

Article: 158251
On Tue, 29 Sep 2015 06:49:02 +0000, glen herrmannsfeldt wrote:

> wzab01@gmail.com wrote:
>
>> Last time I have spent a lot of time on development of quite complex
>> high speed data processing systems in FPGA.
>> They all had pipeline architecture, and data were processed in parallel
>> in multiple pipelines with different latencies.
>
>> The worst thing was that those latencies were changing during
>> development. For example some operations were performed by blocks with
>> tree structure, so the number of levels depended on number of inputs
>> handled by each node.
>> The number of inputs in each node was varied to find the acceptable
>> balance between the number of levels and maximum clock speed. I also
>> had to add some pipeline registers to improve timing.
>
> I have heard that some synthesis software now knows how to move around
> pipeline registers to optimize timing. I haven't tried using the feature
> yet, though.

I knew about this sort of thing ten years ago, although I've never used
it (for FPGA I'm mostly an armchair coach).

At the time that my FPGA friends were rhapsodizing about it, the designer
still needed to specify the total delay, but the tools took the
responsibility for distributing it.

It makes sense to do it that way, because you're the one that has to
decide how much delay is right, and who has to make sure that the timing
for section A matches the timing for section B -- for the moment at least
that's really beyond the tool's ability to cope.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Article: 158252
On 9/29/2015 4:22 PM, Tim Wescott wrote:
> On Tue, 29 Sep 2015 06:49:02 +0000, glen herrmannsfeldt wrote:
>
>> wzab01@gmail.com wrote:
>>
>>> Last time I have spent a lot of time on development of quite complex
>>> high speed data processing systems in FPGA.
>>> They all had pipeline architecture, and data were processed in parallel
>>> in multiple pipelines with different latencies.
>>
>>> The worst thing was that those latencies were changing during
>>> development. For example some operations were performed by blocks with
>>> tree structure, so the number of levels depended on number of inputs
>>> handled by each node.
>>> The number of inputs in each node was varied to find the acceptable
>>> balance between the number of levels and maximum clock speed. I also
>>> had to add some pipeline registers to improve timing.
>>
>> I have heard that some synthesis software now knows how to move around
>> pipeline registers to optimize timing. I haven't tried using the feature
>> yet, though.
>
> I knew about this sort of thing ten years ago, although I've never used
> it (for FPGA I'm mostly an armchair coach).
>
> At the time that my FPGA friends were rhapsodizing about it, the designer
> still needed to specify the total delay, but the tools took the
> responsibility for distributing it.
>
> It makes sense to do it that way, because you're the one that has to
> decide how much delay is right, and who has to make sure that the timing
> for section A matches the timing for section B -- for the moment at least
> that's really beyond the tool's ability to cope.

I'm not picturing the model you are describing. If all sections have
the same clock, they all have the same timing constraint, no? As to the
tools distributing the delays, again, each stage has the same timing
constraint so unless there are complications such as inputs with
separately specified delays, the tool just has to move logic across
register boundaries to make each section meet the timing spec or better
to balance all the delays in case you wish to have the fastest possible
clock rate.

Maybe by timing you mean the clock cycles the OP is talking about?

--

Rick

Article: 158253
On Tue, 29 Sep 2015 16:41:08 -0400, rickman wrote:

> On 9/29/2015 4:22 PM, Tim Wescott wrote:
>> On Tue, 29 Sep 2015 06:49:02 +0000, glen herrmannsfeldt wrote:
>>
>>> wzab01@gmail.com wrote:
>>>
>>>> Last time I have spent a lot of time on development of quite complex
>>>> high speed data processing systems in FPGA.
>>>> They all had pipeline architecture, and data were processed in
>>>> parallel in multiple pipelines with different latencies.
>>>
>>>> The worst thing was that those latencies were changing during
>>>> development. For example some operations were performed by blocks
>>>> with tree structure, so the number of levels depended on number of
>>>> inputs handled by each node.
>>>> The number of inputs in each node was varied to find the acceptable
>>>> balance between the number of levels and maximum clock speed. I also
>>>> had to add some pipeline registers to improve timing.
>>>
>>> I have heard that some synthesis software now knows how to move around
>>> pipeline registers to optimize timing. I haven't tried using the
>>> feature yet, though.
>>
>> I knew about this sort of thing ten years ago, although I've never used
>> it (for FPGA I'm mostly an armchair coach).
>>
>> At the time that my FPGA friends were rhapsodizing about it, the
>> designer still needed to specify the total delay, but the tools took
>> the responsibility for distributing it.
>>
>> It makes sense to do it that way, because you're the one that has to
>> decide how much delay is right, and who has to make sure that the
>> timing for section A matches the timing for section B -- for the moment
>> at least that's really beyond the tool's ability to cope.
>
> I'm not picturing the model you are describing. If all sections have
> the same clock, they all have the same timing constraint, no? As to the
> tools distributing the delays, again, each stage has the same timing
> constraint so unless there are complications such as inputs with
> separately specified delays, the tool just has to move logic across
> register boundaries to make each section meet the timing spec or better
> to balance all the delays in case you wish to have the fastest possible
> clock rate.
>
> Maybe by timing you mean the clock cycles the OP is talking about?

The way I've seen it, rather than carefully hand-designing a pipeline,
you just design a system that's basically

             .---------------------.      .-------.
 data in --> | combinatorial logic |----> | delay |----> data out
             '---------------------'      '-------'

where the "delay" block just delays all the outputs from the
combinatorial block by some number of clocks.

Then you tell the tool "move delays as you see fit", and it magically
distributes the delay in a hopefully-optimal way within the combinatorial
logic, making it pipelined.

As I said, I've never done it -- I couldn't even tell you what search
terms to use to find out what the tool vendors call the process.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
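In VHDL, that "logic first, delay after" pattern might be written roughly
as below (a sketch only -- the entity, widths and names are made up for
illustration). With retiming enabled, the synthesis tool is then free to
pull the registers of the delay chain back into the adder tree:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity retime_example is
  port (
    clk        : in  std_logic;
    a, b, c, d : in  unsigned(15 downto 0);
    y          : out unsigned(17 downto 0)
  );
end entity;

architecture rtl of retime_example is
  -- all the combinatorial work in one lump...
  signal sum : unsigned(17 downto 0);
  -- ...followed by a plain three-deep delay chain that the tool
  -- may redistribute into the logic above
  type dly_t is array (0 to 2) of unsigned(17 downto 0);
  signal dly : dly_t;
begin
  sum <= resize(a, 18) + resize(b, 18) + resize(c, 18) + resize(d, 18);

  process (clk)
  begin
    if rising_edge(clk) then
      dly(0) <= sum;
      dly(1) <= dly(0);
      dly(2) <= dly(1);
    end if;
  end process;

  y <= dly(2);
end architecture;

As later posts in this thread note, the search terms are "register
retiming" (Altera, newer Xilinx tools) or "register balancing" (older
Xilinx tools).

Article: 158254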
Tim Wescott <seemywebsite@myfooter.really> wrote:

(snip, I wrote)
>> I have heard that some synthesis software now knows how to move around
>> pipeline registers to optimize timing. I haven't tried using the feature
>> yet, though.

> I knew about this sort of thing ten years ago, although I've never used
> it (for FPGA I'm mostly an armchair coach).

> At the time that my FPGA friends were rhapsodizing about it, the designer
> still needed to specify the total delay, but the tools took the
> responsibility for distributing it.

Some time ago, and before I knew about this, I was working on designs
for some very long pipelines, thousands of steps. Each step is fairly
simple, and all are alike (except for data values).

I figured that in an FPGA, the pipeline would go across the array, then
down and across backwards, until it got to the end. I then figured that
the delay at the end, where it turned around to go back, would be longer
than other delays, but didn't know how to modify my code.

As with many pipelines, I can add registers to all the signals without
affecting the results, though they will come out a little later. But
where to add the registers?

It turned out to be too expensive, so never got built, or even close.
Sometime later, I learned about this feature, but never went back to
try it.

> It makes sense to do it that way, because you're the one that has to
> decide how much delay is right, and who has to make sure that the timing
> for section A matches the timing for section B -- for the moment at least
> that's really beyond the tool's ability to cope.

One could put in sets of optional registers, such that either all or
none of a set get implemented. That might not be so hard, but you do
need a way to say it.

-- glen
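One way to "say it" in VHDL is a boolean generic plus a pair of
conditional generates, so that a whole set of registers is either
implemented or bypassed together. A minimal sketch, with made-up names:

library ieee;
use ieee.std_logic_1164.all;

entity opt_reg_stage is
  generic (
    WIDTH      : natural := 32;
    ENABLE_REG : boolean := true   -- all or nothing for this set
  );
  port (
    clk : in  std_logic;
    d   : in  std_logic_vector(WIDTH-1 downto 0);
    q   : out std_logic_vector(WIDTH-1 downto 0)
  );
end entity;

architecture rtl of opt_reg_stage is
begin
  g_reg : if ENABLE_REG generate
    process (clk)
    begin
      if rising_edge(clk) then
        q <= d;                    -- registered: adds one clock of latency
      end if;
    end process;
  end generate;

  g_wire : if not ENABLE_REG generate
    q <= d;                        -- bypassed: pure wire, zero latency
  end generate;
end architecture;

The catch remains the one discussed above: every parallel path has to be
told about the extra clock of latency such a stage adds, or the data end
up misaligned.

Article: 158255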
On Tuesday, 29 September 2015 at 21:41:26 UTC+1, rickman wrote:
> On 9/29/2015 4:22 PM, Tim Wescott wrote:
> > On Tue, 29 Sep 2015 06:49:02 +0000, glen herrmannsfeldt wrote:
> >
> >> wzab01@gmail.com wrote:
> >>
> >>> Last time I have spent a lot of time on development of quite complex
> >>> high speed data processing systems in FPGA.
> >>> They all had pipeline architecture, and data were processed in parallel
> >>> in multiple pipelines with different latencies.
> >>
> >>> The worst thing was that those latencies were changing during
> >>> development. For example some operations were performed by blocks with
> >>> tree structure, so the number of levels depended on number of inputs
> >>> handled by each node.
> >>> The number of inputs in each node was varied to find the acceptable
> >>> balance between the number of levels and maximum clock speed. I also
> >>> had to add some pipeline registers to improve timing.
> >>
> >> I have heard that some synthesis software now knows how to move around
> >> pipeline registers to optimize timing. I haven't tried using the feature
> >> yet, though.
> >
> > I knew about this sort of thing ten years ago, although I've never used
> > it (for FPGA I'm mostly an armchair coach).
> >
> > At the time that my FPGA friends were rhapsodizing about it, the designer
> > still needed to specify the total delay, but the tools took the
> > responsibility for distributing it.
> >
> > It makes sense to do it that way, because you're the one that has to
> > decide how much delay is right, and who has to make sure that the timing
> > for section A matches the timing for section B -- for the moment at least
> > that's really beyond the tool's ability to cope.
>
> I'm not picturing the model you are describing. If all sections have
> the same clock, they all have the same timing constraint, no? As to the
> tools distributing the delays, again, each stage has the same timing
> constraint so unless there are complications such as inputs with
> separately specified delays, the tool just has to move logic across
> register boundaries to make each section meet the timing spec or better
> to balance all the delays in case you wish to have the fastest possible
> clock rate.
>
> Maybe by timing you mean the clock cycles the OP is talking about?
>
> --
>
> Rick

The problem I'm dealing with is just about the number of clock cycles by
which data in each data path are delayed.

The equal distribution of delay between the stages of a pipeline is so
technology specific that it probably must be handled by the vendor
provided tools, and in fact it usually is. In old Xilinx tools it was
"register balancing"; in Altera tools and in new Xilinx tools it is
"register retiming".

So my problem is not so complex. And yes, it was solved in GUI based
tools many years ago. In the old Xilinx System Generator it was a special
"sync" block which was doing that. Just see Fig. 4 in my old paper from
2003 ( http://tesla.desy.de/new_pages/TESLA_Reports/2003/pdf_files/tesla2003-05.pdf ).

The importance of the problem is still emphasized by the vendors of
block-based tools (e.g.
http://www.mathworks.com/help/hdlcoder/examples/delay-balancing-and-validation-model-workflow-in-hdl-coder.html ).

However, I have never seen a tool like this available for designs written
in pure HDL, not composed from blocks in a GUI based tool...

I have found that for designs with pipelines whose lengths depend on
different parameters, and which are interconnected in a complex way,
there is really a need for a tool for automatic verification, or even
better for automatic adjustment, of those lengths.
Without that you can easily get an incorrect design which processes data
misaligned in time.

So that was the motivation. Sorry if my original post was somehow
misleading.

Regards,
Wojtek
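The building block such a tool inserts is trivial by itself -- a
parametrized delay line along these lines (a sketch; the names are made
up). What is hard, and what the tool automates, is computing a consistent
LENGTH for every path in the design:

library ieee;
use ieee.std_logic_1164.all;

entity delay_line is
  generic (
    WIDTH  : natural := 16;
    LENGTH : natural := 0        -- clock cycles of delay; 0 = plain wire
  );
  port (
    clk : in  std_logic;
    d   : in  std_logic_vector(WIDTH-1 downto 0);
    q   : out std_logic_vector(WIDTH-1 downto 0)
  );
end entity;

architecture rtl of delay_line is
  type pipe_t is array (0 to LENGTH) of std_logic_vector(WIDTH-1 downto 0);
  signal pipe : pipe_t;
begin
  pipe(0) <= d;

  g_regs : for i in 1 to LENGTH generate
    process (clk)
    begin
      if rising_edge(clk) then
        pipe(i) <= pipe(i-1);    -- one register per stage of padding
      end if;
    end process;
  end generate;

  q <= pipe(LENGTH);
end architecture;

If one branch of a design takes 5 clocks and a parallel branch takes 8,
the shorter branch gets a delay_line with LENGTH => 3 in front of the
point where they merge; the point of an automatic tool is to derive those
numbers from the components' parameters instead of tracking them by hand.

Article: 158256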
On 9/29/2015 5:01 PM, Tim Wescott wrote:
> On Tue, 29 Sep 2015 16:41:08 -0400, rickman wrote:
>
>> On 9/29/2015 4:22 PM, Tim Wescott wrote:
>>> On Tue, 29 Sep 2015 06:49:02 +0000, glen herrmannsfeldt wrote:
>>>
>>>> wzab01@gmail.com wrote:
>>>>
>>>>> Last time I have spent a lot of time on development of quite complex
>>>>> high speed data processing systems in FPGA.
>>>>> They all had pipeline architecture, and data were processed in
>>>>> parallel in multiple pipelines with different latencies.
>>>>
>>>>> The worst thing was that those latencies were changing during
>>>>> development. For example some operations were performed by blocks
>>>>> with tree structure, so the number of levels depended on number of
>>>>> inputs handled by each node.
>>>>> The number of inputs in each node was varied to find the acceptable
>>>>> balance between the number of levels and maximum clock speed. I
>>>>> also had to add some pipeline registers to improve timing.
>>>>
>>>> I have heard that some synthesis software now knows how to move
>>>> around pipeline registers to optimize timing. I haven't tried using
>>>> the feature yet, though.
>>>
>>> I knew about this sort of thing ten years ago, although I've never
>>> used it (for FPGA I'm mostly an armchair coach).
>>>
>>> At the time that my FPGA friends were rhapsodizing about it, the
>>> designer still needed to specify the total delay, but the tools took
>>> the responsibility for distributing it.
>>>
>>> It makes sense to do it that way, because you're the one that has to
>>> decide how much delay is right, and who has to make sure that the
>>> timing for section A matches the timing for section B -- for the
>>> moment at least that's really beyond the tool's ability to cope.
>>
>> I'm not picturing the model you are describing. If all sections have
>> the same clock, they all have the same timing constraint, no? As to
>> the tools distributing the delays, again, each stage has the same
>> timing constraint so unless there are complications such as inputs
>> with separately specified delays, the tool just has to move logic
>> across register boundaries to make each section meet the timing spec
>> or better to balance all the delays in case you wish to have the
>> fastest possible clock rate.
>>
>> Maybe by timing you mean the clock cycles the OP is talking about?
>
> The way I've seen it, rather than carefully hand-designing a pipeline,
> you just design a system that's basically
>
>              .---------------------.      .-------.
>  data in --> | combinatorial logic |----> | delay |----> data out
>              '---------------------'      '-------'
>
> where the "delay" block just delays all the outputs from the
> combinatorial block by some number of clocks.
>
> Then you tell the tool "move delays as you see fit", and it magically
> distributes the delay in a hopefully-optimal way within the
> combinatorial logic, making it pipelined.

Yes, but you talked about the tool not being able to "cope" with
matching the delays in section A and B. I'm not following that.

--

Rick

Article: 158257
On 9/29/2015 5:58 PM, wzab01@gmail.com wrote:
> On Tuesday, 29 September 2015 at 21:41:26 UTC+1, rickman wrote:
>> On 9/29/2015 4:22 PM, Tim Wescott wrote:
>>> On Tue, 29 Sep 2015 06:49:02 +0000, glen herrmannsfeldt wrote:
>>>
>>>> wzab01@gmail.com wrote:
>>>>
>>>>> Last time I have spent a lot of time on development of quite
>>>>> complex high speed data processing systems in FPGA. They all
>>>>> had pipeline architecture, and data were processed in
>>>>> parallel in multiple pipelines with different latencies.
>>>>
>>>>> The worst thing was that those latencies were changing
>>>>> during development. For example some operations were
>>>>> performed by blocks with tree structure, so the number of
>>>>> levels depended on number of inputs handled by each node. The
>>>>> number of inputs in each node was varied to find the
>>>>> acceptable balance between the number of levels and maximum
>>>>> clock speed. I also had to add some pipeline registers to
>>>>> improve timing.
>>>>
>>>> I have heard that some synthesis software now knows how to move
>>>> around pipeline registers to optimize timing. I haven't tried
>>>> using the feature yet, though.
>>>
>>> I knew about this sort of thing ten years ago, although I've
>>> never used it (for FPGA I'm mostly an armchair coach).
>>>
>>> At the time that my FPGA friends were rhapsodizing about it, the
>>> designer still needed to specify the total delay, but the tools
>>> took the responsibility for distributing it.
>>>
>>> It makes sense to do it that way, because you're the one that has
>>> to decide how much delay is right, and who has to make sure that
>>> the timing for section A matches the timing for section B -- for
>>> the moment at least that's really beyond the tool's ability to
>>> cope.
>>
>> I'm not picturing the model you are describing. If all sections
>> have the same clock, they all have the same timing constraint, no?
>> As to the tools distributing the delays, again, each stage has the
>> same timing constraint so unless there are complications such as
>> inputs with separately specified delays, the tool just has to move
>> logic across register boundaries to make each section meet the
>> timing spec or better to balance all the delays in case you wish to
>> have the fastest possible clock rate.
>>
>> Maybe by timing you mean the clock cycles the OP is talking about?
>>
>> --
>>
>> Rick
>
> The problem I'm dealing with is just about the number of clock
> cycles by which data in each data path are delayed.

Yes, I understand the problem you are addressing. I have never done a
design where this was much of a problem, but I'm sure some designs are
much larger and more complex than the ones I have done.

> The equal distribution of delay between the stages of a pipeline is
> so technology specific that it probably must be handled by the vendor
> provided tools, and in fact it usually is. In old Xilinx tools it was
> "register balancing"; in Altera tools and in new Xilinx tools it is
> "register retiming".
>
> So my problem is not so complex. And yes, it was solved in GUI based
> tools many years ago. In the old Xilinx System Generator it was a
> special "sync" block which was doing that. Just see Fig. 4 in my old
> paper from 2003 (
> http://tesla.desy.de/new_pages/TESLA_Reports/2003/pdf_files/tesla2003-05.pdf
> ).
>
> The importance of the problem is still emphasized by the vendors of
> block-based tools (e.g.
> http://www.mathworks.com/help/hdlcoder/examples/delay-balancing-and-validation-model-workflow-in-hdl-coder.html
> )

Yes, it is important to have a tool to do this when the design is large
or your timing margins are tight. It can save a lot of work.

> However, I have never seen a tool like this available for designs
> written in pure HDL, not composed from blocks in a GUI based tool...
>
> I have found that for designs with pipelines whose lengths depend on
> different parameters, and which are interconnected in a complex way,
> there is really a need for a tool for automatic verification, or even
> better for automatic adjustment, of those lengths. Without that you
> can easily get an incorrect design which processes data misaligned in
> time.
>
> So that was the motivation. Sorry if my original post was somehow
> misleading.

Not to me. :)

--

Rick

Article: 158258
On Tue, 29 Sep 2015 18:33:32 -0400, rickman wrote:

> On 9/29/2015 5:01 PM, Tim Wescott wrote:
>> On Tue, 29 Sep 2015 16:41:08 -0400, rickman wrote:
>>
>>> On 9/29/2015 4:22 PM, Tim Wescott wrote:
>>>> On Tue, 29 Sep 2015 06:49:02 +0000, glen herrmannsfeldt wrote:
>>>>
>>>>> wzab01@gmail.com wrote:
>>>>>
>>>>>> Last time I have spent a lot of time on development of quite complex
>>>>>> high speed data processing systems in FPGA.
>>>>>> They all had pipeline architecture, and data were processed in
>>>>>> parallel in multiple pipelines with different latencies.
>>>>>
>>>>>> The worst thing was that those latencies were changing during
>>>>>> development. For example some operations were performed by blocks
>>>>>> with tree structure, so the number of levels depended on number of
>>>>>> inputs handled by each node.
>>>>>> The number of inputs in each node was varied to find the acceptable
>>>>>> balance between the number of levels and maximum clock speed. I
>>>>>> also had to add some pipeline registers to improve timing.
>>>>>
>>>>> I have heard that some synthesis software now knows how to move
>>>>> around pipeline registers to optimize timing. I haven't tried using
>>>>> the feature yet, though.
>>>>
>>>> I knew about this sort of thing ten years ago, although I've never
>>>> used it (for FPGA I'm mostly an armchair coach).
>>>>
>>>> At the time that my FPGA friends were rhapsodizing about it, the
>>>> designer still needed to specify the total delay, but the tools took
>>>> the responsibility for distributing it.
>>>>
>>>> It makes sense to do it that way, because you're the one that has to
>>>> decide how much delay is right, and who has to make sure that the
>>>> timing for section A matches the timing for section B -- for the
>>>> moment at least that's really beyond the tool's ability to cope.
>>>
>>> I'm not picturing the model you are describing. If all sections have
>>> the same clock, they all have the same timing constraint, no? As to
>>> the tools distributing the delays, again, each stage has the same
>>> timing constraint so unless there are complications such as inputs
>>> with separately specified delays, the tool just has to move logic
>>> across register boundaries to make each section meet the timing spec
>>> or better to balance all the delays in case you wish to have the
>>> fastest possible clock rate.
>>>
>>> Maybe by timing you mean the clock cycles the OP is talking about?
>>
>> The way I've seen it, rather than carefully hand-designing a pipeline,
>> you just design a system that's basically
>>
>>              .---------------------.      .-------.
>>  data in --> | combinatorial logic |----> | delay |----> data out
>>              '---------------------'      '-------'
>>
>> where the "delay" block just delays all the outputs from the
>> combinatorial block by some number of clocks.
>>
>> Then you tell the tool "move delays as you see fit", and it magically
>> distributes the delay in a hopefully-optimal way within the
>> combinatorial logic, making it pipelined.
>
> Yes, but you talked about the tool not being able to "cope" with
> matching the delays in section A and B. I'm not following that.

Basically I meant that you need to be responsible for lining up the
delays in all the sections -- you can't make one section delay by five
more clocks without identifying all the other pertinent sections that
depend on that and making them delay by five more clocks, too.

If the tool could do everything we'd all be wiring houses for a living.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Article: 158259
On 9/29/2015 7:19 PM, Tim Wescott wrote:
> On Tue, 29 Sep 2015 18:33:32 -0400, rickman wrote:
>
>> On 9/29/2015 5:01 PM, Tim Wescott wrote:
>>> On Tue, 29 Sep 2015 16:41:08 -0400, rickman wrote:
>>>
>>>> On 9/29/2015 4:22 PM, Tim Wescott wrote:
>>>>> On Tue, 29 Sep 2015 06:49:02 +0000, glen herrmannsfeldt wrote:
>>>>>
>>>>>> wzab01@gmail.com wrote:
>>>>>>
>>>>>>> Last time I have spent a lot of time on development of quite
>>>>>>> complex high speed data processing systems in FPGA.
>>>>>>> They all had pipeline architecture, and data were processed in
>>>>>>> parallel in multiple pipelines with different latencies.
>>>>>>
>>>>>>> The worst thing was that those latencies were changing during
>>>>>>> development. For example some operations were performed by blocks
>>>>>>> with tree structure, so the number of levels depended on number of
>>>>>>> inputs handled by each node.
>>>>>>> The number of inputs in each node was varied to find the acceptable
>>>>>>> balance between the number of levels and maximum clock speed. I
>>>>>>> also had to add some pipeline registers to improve timing.
>>>>>>
>>>>>> I have heard that some synthesis software now knows how to move
>>>>>> around pipeline registers to optimize timing. I haven't tried using
>>>>>> the feature yet, though.
>>>>>
>>>>> I knew about this sort of thing ten years ago, although I've never
>>>>> used it (for FPGA I'm mostly an armchair coach).
>>>>>
>>>>> At the time that my FPGA friends were rhapsodizing about it, the
>>>>> designer still needed to specify the total delay, but the tools took
>>>>> the responsibility for distributing it.
>>>>>
>>>>> It makes sense to do it that way, because you're the one that has to
>>>>> decide how much delay is right, and who has to make sure that the
>>>>> timing for section A matches the timing for section B -- for the
>>>>> moment at least that's really beyond the tool's ability to cope.
>>>>
>>>> I'm not picturing the model you are describing. If all sections have
>>>> the same clock, they all have the same timing constraint, no? As to
>>>> the tools distributing the delays, again, each stage has the same
>>>> timing constraint so unless there are complications such as inputs
>>>> with separately specified delays, the tool just has to move logic
>>>> across register boundaries to make each section meet the timing spec
>>>> or better to balance all the delays in case you wish to have the
>>>> fastest possible clock rate.
>>>>
>>>> Maybe by timing you mean the clock cycles the OP is talking about?
>>>
>>> The way I've seen it, rather than carefully hand-designing a pipeline,
>>> you just design a system that's basically
>>>
>>>              .---------------------.      .-------.
>>>  data in --> | combinatorial logic |----> | delay |----> data out
>>>              '---------------------'      '-------'
>>>
>>> where the "delay" block just delays all the outputs from the
>>> combinatorial block by some number of clocks.
>>>
>>> Then you tell the tool "move delays as you see fit", and it magically
>>> distributes the delay in a hopefully-optimal way within the
>>> combinatorial logic, making it pipelined.
>>
>> Yes, but you talked about the tool not being able to "cope" with
>> matching the delays in section A and B. I'm not following that.
>
> Basically I meant that you need to be responsible for lining up the
> delays in all the sections -- you can't make one section delay by five
> more clocks without identifying all the other pertinent sections that
> depend on that and making them delay by five more clocks, too.
>
> If the tool could do everything we'd all be wiring houses for a living.

Ok, but that is not the tool CAD vendors provide. That is the tool the
OP is talking about.

--

Rick

Article: 158260
A VHDL compiler cannot be a useful compiler unless it respects the
user-entered registers, though it may fit an equivalent arrangement, as
in register retiming, for timing purposes.

Register delay stages are obviously what we are talking about, rather
than combinatorial/routing delays, which are a concern for each
register's timing and which the tool decides together with any
constraints from the user.

It is up to the user to decide the register delay stages. It cannot be
technology sensitive unless you are doing some high level coding that
does not specify registers. I don't know what this level is, though.

How can a user build a design without being correct about register
delays? How do you add streams, or multiply, or switch, etc., and ask
the tool to do the job?

Kaz
---------------------------------------
Posted through http://www.FPGARelated.com

Article: 158261
On 29/09/2015 22:01, Tim Wescott wrote:
> On Tue, 29 Sep 2015 16:41:08 -0400, rickman wrote:
..
>
> The way I've seen it, rather than carefully hand-designing a pipeline,
> you just design a system that's basically
>
>              .---------------------.      .-------.
>  data in --> | combinatorial logic |----> | delay |----> data out
>              '---------------------'      '-------'
>
> where the "delay" block just delays all the outputs from the
> combinatorial block by some number of clocks.
>
> Then you tell the tool "move delays as you see fit", and it magically
> distributes the delay in a hopefully-optimal way within the combinatorial
> logic, making it pipelined.
>
> As I said, I've never done it -- I couldn't even tell you what search
> terms to use to find out what the tool vendors call the process.
>

As mentioned before, just search for register retiming. It works exactly
as you described, although it is not perfect. It can move combinational
logic between register pairs to balance the slack. Register retiming is
a relatively old technology and has been available in most independent
tools (like Mentor's Precision and Synopsys's Synplify) and vendor
synthesis tools for many years. From what I understand, vendor tools can
only move logic in one direction due to a patent owned by Mentor
Graphics.

# Info: [7004]: Starting retiming program ...
# Info: [7012]: Phase 1
# Info: [7012]: Phase 2
# Info: [7012]: Phase 3
# Info: [7012]: Phase 4
# Info: [7012]: Total number of DSPs processed : 0
# Info: [7012]: Total number of registers added : 138
# Info: [7012]: Total number of registers removed : 66
# Info: [7012]: Total number of logic elements added : 0

Register retiming is something you want to enable by default, unless you
are planning to use an equivalence checker.

Hans
www.ht-lab.com

Article: 158262
>On 29/09/2015 22:01, Tim Wescott wrote:
>> On Tue, 29 Sep 2015 16:41:08 -0400, rickman wrote:
>..
>>
>> The way I've seen it, rather than carefully hand-designing a pipeline,
>> you just design a system that's basically
>>
>>              .---------------------.      .-------.
>>  data in --> | combinatorial logic |----> | delay |----> data out
>>              '---------------------'      '-------'
>>
>> where the "delay" block just delays all the outputs from the
>> combinatorial block by some number of clocks.
>>
>> Then you tell the tool "move delays as you see fit", and it magically
>> distributes the delay in a hopefully-optimal way within the combinatorial
>> logic, making it pipelined.
>>
>> As I said, I've never done it -- I couldn't even tell you what search
>> terms to use to find out what the tool vendors call the process.
>>
>As mentioned before, just search for register retiming. It works exactly
>as you described, although it is not perfect. It can move combinational
>logic between register pairs to balance the slack. Register retiming is
>a relatively old technology and has been available in most independent
>tools (like Mentor's Precision and Synopsys's Synplify) and vendor
>synthesis tools for many years. From what I understand, vendor tools can
>only move logic in one direction due to a patent owned by Mentor
>Graphics.
>
># Info: [7004]: Starting retiming program ...
># Info: [7012]: Phase 1
># Info: [7012]: Phase 2
># Info: [7012]: Phase 3
># Info: [7012]: Phase 4
># Info: [7012]: Total number of DSPs processed : 0
># Info: [7012]: Total number of registers added : 138
># Info: [7012]: Total number of registers removed : 66
># Info: [7012]: Total number of logic elements added : 0
>
>Register retiming is something you want to enable by default, unless you
>are planning to use an equivalence checker.
>
>Hans
>www.ht-lab.com

Register retiming is a technique to help the setup/hold timing of a
given path. It does not, and should not, change the latency of a path in
terms of clock periods. The OP is referring to the latency of a path in
terms of clock periods, rather than delay issues within a given path.

Kaz
---------------------------------------
Posted through http://www.FPGARelated.com

Article: 158263
Hi, I have a serious problem with the architecture of a correlator for a
planar antenna array (16 x 16).
Theoretically I can't implement the normal expression sum(X*X^H), because
I would obtain a covariance matrix of 256 x 256. Then I can think of
implementing the spatial smoothing technique, namely taking an average of
overlapped subarrays, with the advantage of a smaller covariance matrix.
This is right, but it is a slow technique! I need an efficient and fast
method to compute the covariance matrix on an FPGA, with as few
multipliers as possible. In fact, for a covariance matrix of 16 x 16 I
need about 6000 multipliers! So I have seen the correlators based on
hard-limiting (sign + xor + counter) at this link

https://www.google.it/url?sa=t&rct=j&q=&esrc=s&source=web&cd=3&ved=0CDYQFjACahUKEwjwsKjT257IAhVlgXIKHQKFCWw&url=http%3A%2F%2Fhandle.dtic.mil%2F100.2%2FADA337434&usg=AFQjCNG5QUylZORV9KFHYizyu1QJZSBM5A&bvm=bv.103627116,d.d2s&cad=rja

but I don't know if this technique is right; in Simulink it is very
different from the results of the normal correlator.
Can someone help me?
Thanks
---------------------------------------
Posted through http://www.FPGARelated.com
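For reference, one reading of the hard-limiting technique is that each
lane of the correlation matrix reduces to roughly the sketch below (names
and widths are made up): the sign bit replaces each sample, an XNOR
replaces the multiplier, and an up/down counter replaces the accumulator.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity hard_limit_corr is
  generic ( ACC_BITS : natural := 20 );
  port (
    clk, rst : in  std_logic;
    sign_x   : in  std_logic;   -- MSB (sign bit) of the channel-x sample
    sign_y   : in  std_logic;   -- MSB (sign bit) of the channel-y sample
    corr     : out signed(ACC_BITS-1 downto 0)
  );
end entity;

architecture rtl of hard_limit_corr is
  signal acc : signed(ACC_BITS-1 downto 0);
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        acc <= (others => '0');
      elsif sign_x = sign_y then   -- XNOR: (+1)*(+1) or (-1)*(-1) = +1
        acc <= acc + 1;
      else                          -- signs differ: product is -1
        acc <= acc - 1;
      end if;
    end if;
  end process;
  corr <= acc;
end architecture;

If that reading is right, it would also explain the Simulink mismatch: a
1-bit correlator estimates only the correlation of the signs. For
zero-mean Gaussian inputs the true correlation coefficient is recovered
with the Van Vleck (arcsine-law) correction, rho = sin((pi/2)*rho_1bit),
and there is still an SNR penalty relative to a full-precision
correlator.

Article: 158264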
On Wednesday, 30 September 2015 at 09:41:15 UTC+1, kaz wrote:
> A VHDL compiler cannot be a useful compiler unless it respects the
> user-entered registers, though it may fit an equivalent arrangement,
> as in register retiming, for timing purposes.
>
> Register delay stages are obviously what we are talking about, rather
> than combinatorial/routing delays, which are a concern for each
> register's timing and which the tool decides together with any
> constraints from the user.
>
> It is up to the user to decide the register delay stages. It cannot be
> technology sensitive unless you are doing some high level coding that
> does not specify registers. I don't know what this level is, though.
>
> How can a user build a design without being correct about register
> delays? How do you add streams, or multiply, or switch, etc., and ask
> the tool to do the job?

In the systems which I have to build there are some parametrized
components in which the latency depends on their parameters.
Unfortunately I cannot publish the original designs, but a simplified
version of one of those systems is provided as a demonstration of the
method on OpenCores.

For example, I have a block for finding the maximum value from a certain
number of inputs. It is a tree built from elementary comparators.
When looking for the optimal implementation (in terms of resource usage
and maximum clock frequency) I have to select the number of values
compared simultaneously in such a basic comparator. My implementation
automatically adjusts the number of stages to the number of inputs in an
elementary comparator and in the whole system. Of course the number of
stages affects the latency (the delay in number of clocks). There are
many such blocks which may be adjusted independently.

Trying to keep the design adjusted properly (in the sense that all
latencies in parallel pipelines are equal) is really difficult and
error-prone. So that's why I needed a tool which does it for me.
Of course I have to analyze the results, and sometimes introduce manual
corrections...

Does that answer the question above?

Regards,
Wojtek
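To make that dependence concrete: for an N-input maximum tree built from
FANIN-input elementary comparators, the latency in clocks equals the
number of tree levels, which a package function along these lines can
compute at elaboration time (a sketch; the names are made up):

package max_tree_pkg is
  -- latency of an N-input tree built from FANIN-input comparators,
  -- i.e. ceil(log(N)/log(FANIN)); FANIN must be at least 2
  function tree_latency (n, fanin : positive) return natural;
end package;

package body max_tree_pkg is
  function tree_latency (n, fanin : positive) return natural is
    variable remaining : positive := n;
    variable stages    : natural  := 0;
  begin
    assert fanin >= 2 report "fanin must be >= 2" severity failure;
    while remaining > 1 loop
      remaining := (remaining + fanin - 1) / fanin;  -- ceiling division
      stages    := stages + 1;
    end loop;
    return stages;
  end function;
end package body;

For example, tree_latency(256, 2) = 8 while tree_latency(256, 4) = 4, so
retuning the comparator fan-in from 2 to 4 silently removes four clocks
from that path -- and every parallel path then has to lose exactly four
clocks too, which is the bookkeeping the tool takes over.

Article: 158265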
>On Wednesday, 30 September 2015 at 09:41:15 UTC+1, kaz wrote:
>> A VHDL compiler cannot be a useful compiler unless it respects the
>> user-entered registers, though it may fit an equivalent arrangement,
>> as in register retiming, for timing purposes.
>>
>> Register delay stages are obviously what we are talking about, rather
>> than combinatorial/routing delays, which are a concern for each
>> register's timing and which the tool decides together with any
>> constraints from the user.
>>
>> It is up to the user to decide the register delay stages. It cannot be
>> technology sensitive unless you are doing some high level coding that
>> does not specify registers. I don't know what this level is, though.
>>
>> How can a user build a design without being correct about register
>> delays? How do you add streams, or multiply, or switch, etc., and ask
>> the tool to do the job?
>
>In the systems which I have to build there are some parametrized
>components in which the latency depends on their parameters.
>Unfortunately I cannot publish the original designs, but a simplified
>version of one of those systems is provided as a demonstration of the
>method on OpenCores.
>For example, I have a block for finding the maximum value from a certain
>number of inputs. It is a tree built from elementary comparators.
>When looking for the optimal implementation (in terms of resource usage
>and maximum clock frequency) I have to select the number of values
>compared simultaneously in such a basic comparator. My implementation
>automatically adjusts the number of stages to the number of inputs in an
>elementary comparator and in the whole system. Of course the number of
>stages affects the latency (the delay in number of clocks). There are
>many such blocks which may be adjusted independently.
>Trying to keep the design adjusted properly (in the sense that all
>latencies in parallel pipelines are equal) is really difficult and
>error-prone.
>So that's why I needed a tool which does it for me.
>Of course I have to analyze the results, and sometimes introduce manual
>corrections...
>Does that answer the question above?
>
>Regards,
>Wojtek

So, in short, you regenerate some components with a new latency,
different from the intended and tested one. I will just balance the
latency manually and run the test. I don't see much practical scope for
automating such a change.

Kaz
---------------------------------------
Posted through http://www.FPGARelated.com

Article: 158266
Hi all.

Is there, somewhere, an open-source Verilog (or VHDL, but Verilog is
preferred) module of a DDR1/2/3 SDRAM that can be used for simulating a
memory chip/module??

I want to build a memory controller, but want to do as much as possible
in the simulator and hopefully only verify the correctness of it in
silicon.

Article: 158267
On 9/30/2015 8:13 AM, ste3191 wrote:
> Hi, I have a serious problem with the architecture of a correlator for
> a planar antenna array (16 x 16).
> Theoretically I can't implement the normal expression sum(X*X^H),
> because I would obtain a covariance matrix of 256 x 256. Then I can
> think of implementing the spatial smoothing technique, namely taking an
> average of overlapped subarrays, with the advantage of a smaller
> covariance matrix. This is right, but it is a slow technique! I need an
> efficient and fast method to compute the covariance matrix on an FPGA,
> with as few multipliers as possible. In fact, for a covariance matrix
> of 16 x 16 I need about 6000 multipliers! So I have seen the
> correlators based on hard-limiting (sign + xor + counter) at this link
>
> https://www.google.it/url?sa=t&rct=j&q=&esrc=s&source=web&cd=3&ved=0CDYQFjACahUKEwjwsKjT257IAhVlgXIKHQKFCWw&url=http%3A%2F%2Fhandle.dtic.mil%2F100.2%2FADA337434&usg=AFQjCNG5QUylZORV9KFHYizyu1QJZSBM5A&bvm=bv.103627116,d.d2s&cad=rja
>
> but I don't know if this technique is right; in Simulink it is very
> different from the results of the normal correlator.
> Can someone help me?

Even though your solution will be implemented in an FPGA, I'm not sure
the FPGA group is the best place to ask this question since it is about
the algorithm more than the FPGA implementation. I am cross posting to
the DSP group to see if anyone there has experience with it.

That said, you don't say what your data rate and processing rates are.
How often do you need to run this calculation? If it is slow enough you
can use the same multipliers for many computations to produce one result.
Or will this be run on every data sample at a high rate?

--

Rick

Article: 158268
On 9/30/2015 5:38 PM, Aleksandar Kuktin wrote:
> Hi all.
>
> Is there, somewhere, an open-source Verilog (or VHDL, but Verilog is
> preferred) module of a DDR1/2/3 SDRAM that can be used for simulating a
> memory chip/module??
>
> I want to build a memory controller, but want to do as much as possible
> in the simulator and hopefully only verify the correctness of it in
> silicon.

You might check with the memory makers. I know they have IBIS models
now, but at one time you could get HDL simulation models I believe. If
you are using this as a reference to test your memory controller, it
will be useful to have a verified memory model rather than rolling your
own which will likely have similar conceptual mistakes as your memory
controller.

Which part are you looking at using?

--

Rick

Article: 158269
On Wed, 30 Sep 2015 17:57:30 -0400, rickman wrote:

> On 9/30/2015 5:38 PM, Aleksandar Kuktin wrote:
>> Hi all.
>>
>> Is there, somewhere, an open-source Verilog (or VHDL, but Verilog is
>> preferred) module of a DDR1/2/3 SDRAM that can be used for simulating
>> a memory chip/module??
>>
>> I want to build a memory controller, but want to do as much as
>> possible in the simulator and hopefully only verify the correctness of
>> it in silicon.
>
> You might check with the memory makers. I know they have IBIS models
> now, but at one time you could get HDL simulation models I believe. If
> you are using this as a reference to test your memory controller, it
> will be useful to have a verified memory model rather than rolling your
> own which will likely have similar conceptual mistakes as your memory
> controller.
>
> Which part are you looking at using?

No particular parts at this point. I'm sort-of aiming at random mass
produced DIMM modules, but honestly I still haven't settled on the
interface yet. I'd like to use DDR1 because the memory clients are not
expected to have a very high throughput so the newer interfaces are
basically overkill. However, I'm not sure if those old modules will be
available over the next decade (or two, or three).

BTW, I found something over at Micron. At first glance it seems usable,
but I'll have to read it to be sure.

Article: 158270
On Wed, 30 Sep 2015 23:26:52 +0000, Aleksandar Kuktin wrote:

> On Wed, 30 Sep 2015 17:57:30 -0400, rickman wrote:
>
>> On 9/30/2015 5:38 PM, Aleksandar Kuktin wrote:
>>> Hi all.
>>>
>>> Is there, somewhere, an open-source Verilog (or VHDL, but Verilog is
>>> preferred) module of a DDR1/2/3 SDRAM that can be used for simulating
>>> a memory chip/module??
>>>
>>> I want to build a memory controller, but want to do as much as
>>> possible in the simulator and hopefully only verify the correctness
>>> of it in silicon.
>>
>> You might check with the memory makers. I know they have IBIS models
>> now, but at one time you could get HDL simulation models I believe. If
>> you are using this as a reference to test your memory controller, it
>> will be useful to have a verified memory model rather than rolling
>> your own which will likely have similar conceptual mistakes as your
>> memory controller.
>>
>> Which part are you looking at using?
>
> No particular parts at this point. I'm sort-of aiming at random mass
> produced DIMM modules, but honestly I still haven't settled on the
> interface yet. I'd like to use DDR1 because the memory clients are not
> expected to have a very high throughput so the newer interfaces are
> basically overkill. However, I'm not sure if those old modules will be
> available over the next decade (or two, or three).
>
> BTW, I found something over at Micron. At first glance it seems usable,
> but I'll have to read it to be sure.

Trying to second-guess the PC market is a fool's game. You may want to
look around and see if anything is marketed toward embedded systems --
there are plenty of those that do use DRAM, and people doing the systems
generally like to see long life parts.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Article: 158271
Aleksandar Kuktin <akuktin@gmail.com> wrote:
> No particular parts at this point. I'm sort-of aiming at random mass
> produced DIMM modules, but honestly I still haven't settled on the
> interface yet. I'd like to use DDR1 because the memory clients are not
> expected to have a very high throughput so the newer interfaces are
> basically overkill. However, I'm not sure if those old modules will be
> available over the next decade (or two, or three).

There will probably be chip availability for a while, but prices for
modules are heading skywards. So depends how price sensitive you are,
and how much you need to push the envelope. 2GB DDR1 is expensive, 256MB
DDR1 isn't because there are end of line modules floating around.
However I imagine that's unlikely to be the case for the next decade.
I'd guess going with DDR3 would buy you a decade over DDR1 - but
obviously more complex.

If you aren't doing high throughput can you do SDRAM? That market seems
to be more stable, and the controllers are easier.

Theo

Article: 158272
On Wednesday, 30 September 2015 at 22:38:11 UTC+1, Aleksandar Kuktin wrote:
> Hi all.
>
> Is there, somewhere, an open-source Verilog (or VHDL, but Verilog is
> preferred) module of a DDR1/2/3 SDRAM that can be used for simulating a
> memory chip/module??
>
> I want to build a memory controller, but want to do as much as possible
> in the simulator and hopefully only verify the correctness of it in
> silicon.

Have you tried the FMF models: http://www.freemodelfoundry.com/fmf_VHDL_models.php
Namely: http://www.freemodelfoundry.com/ram.php
It seems that in http://www.freemodelfoundry.com/fmf_models/ram/all_ram_20140302.tar.gz
you can find a few DDR memories.

Regards,
Wojtek

Article: 158273
>On 9/30/2015 8:13 AM, ste3191 wrote:
>> Hi, I have a serious problem with the architecture of a correlator for
>> a planar antenna array (16 x 16).
>> Theoretically I can't implement the normal expression sum(X*X^H),
>> because I would obtain a covariance matrix of 256 x 256. Then I can
>> think of implementing the spatial smoothing technique, namely taking
>> an average of overlapped subarrays, with the advantage of a smaller
>> covariance matrix. This is right, but it is a slow technique! I need
>> an efficient and fast method to compute the covariance matrix on an
>> FPGA, with as few multipliers as possible. In fact, for a covariance
>> matrix of 16 x 16 I need about 6000 multipliers! So I have seen the
>> correlators based on hard-limiting (sign + xor + counter) at this link
>>
>> https://www.google.it/url?sa=t&rct=j&q=&esrc=s&source=web&cd=3&ved=0CDYQFjACahUKEwjwsKjT257IAhVlgXIKHQKFCWw&url=http%3A%2F%2Fhandle.dtic.mil%2F100.2%2FADA337434&usg=AFQjCNG5QUylZORV9KFHYizyu1QJZSBM5A&bvm=bv.103627116,d.d2s&cad=rja
>>
>> but I don't know if this technique is right; in Simulink it is very
>> different from the results of the normal correlator.
>> Can someone help me?
>
>Even though your solution will be implemented in an FPGA, I'm not sure
>the FPGA group is the best place to ask this question since it is about
>the algorithm more than the FPGA implementation. I am cross posting to
>the DSP group to see if anyone there has experience with it.
>
>That said, you don't say what your data rate and processing rates are.
>How often do you need to run this calculation? If it is slow enough you
>can use the same multipliers for many computations to produce one result.
>Or will this be run on every data sample at a high rate?
>
>--
>
>Rick

Yes, the sampling rate is higher than 80 MSPS and I can't share
resources. I posted it on the DSP forum, but nobody has answered yet.
---------------------------------------
Posted through http://www.FPGARelated.com

Article: 158274
On 10/1/2015 3:15 PM, ste3191 wrote:
>> On 9/30/2015 8:13 AM, ste3191 wrote:
>>> Hi, I have a serious problem with the architecture of a correlator
>>> for a planar antenna array (16 x 16).
>>> Theoretically I can't implement the normal expression sum(X*X^H),
>>> because I would obtain a covariance matrix of 256 x 256. Then I can
>>> think of implementing the spatial smoothing technique, namely taking
>>> an average of overlapped subarrays, with the advantage of a smaller
>>> covariance matrix. This is right, but it is a slow technique! I need
>>> an efficient and fast method to compute the covariance matrix on an
>>> FPGA, with as few multipliers as possible. In fact, for a covariance
>>> matrix of 16 x 16 I need about 6000 multipliers! So I have seen the
>>> correlators based on hard-limiting (sign + xor + counter) at this
>>> link
>>>
>>> https://www.google.it/url?sa=t&rct=j&q=&esrc=s&source=web&cd=3&ved=0CDYQFjACahUKEwjwsKjT257IAhVlgXIKHQKFCWw&url=http%3A%2F%2Fhandle.dtic.mil%2F100.2%2FADA337434&usg=AFQjCNG5QUylZORV9KFHYizyu1QJZSBM5A&bvm=bv.103627116,d.d2s&cad=rja
>>>
>>> but I don't know if this technique is right; in Simulink it is very
>>> different from the results of the normal correlator.
>>> Can someone help me?
>>
>> Even though your solution will be implemented in an FPGA, I'm not sure
>> the FPGA group is the best place to ask this question since it is
>> about the algorithm more than the FPGA implementation. I am cross
>> posting to the DSP group to see if anyone there has experience with
>> it.
>>
>> That said, you don't say what your data rate and processing rates are.
>> How often do you need to run this calculation? If it is slow enough
>> you can use the same multipliers for many computations to produce one
>> result. Or will this be run on every data sample at a high rate?
>
> Yes, the sampling rate is higher than 80 MSPS and I can't share
> resources. I posted it on the DSP forum, but nobody has answered yet.

Yes, I saw that. Looks like you beat me to it. lol I don't know where
else to seek advice. Maybe talk to the FPGA vendors? I know they have
various expertise in applications. Is this something you will end up
building? If so, and it uses a lot of resources, you should be able to
get some application support.

You know, 80 MHz is not so fast for multiplies or adds. The multiplier
block in most newer FPGAs will run at 100's of MHz. So you certainly
should be able to multiplex the multiplier unit by 4x or more. But that
really doesn't solve your problem if you want to do it on a single chip.
I haven't looked at the high end, but I'm pretty sure they don't put
1500 multipliers on a chip. But it may put you in the ballpark where
you can do this with a small handful of large FPGAs. Very pricey though.

--

Rick
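As an illustration of that 4x multiplexing, a single hardware multiplier
clocked at four times the 80 MHz sample rate can serve four correlation
products per sample. A sketch with made-up names, with the operand muxing
and the per-product accumulators left out:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity shared_mult is
  port (
    clk4x : in  std_logic;             -- e.g. 320 MHz, 4x the sample rate
    phase : in  unsigned(1 downto 0);  -- 0..3 within one 80 MHz sample
    a, b  : in  signed(15 downto 0);   -- operand pair selected for this phase
    p     : out signed(31 downto 0);   -- product, valid one fast clock later
    p_ph  : out unsigned(1 downto 0)   -- tells the consumer which pair it was
  );
end entity;

architecture rtl of shared_mult is
begin
  process (clk4x)
  begin
    if rising_edge(clk4x) then
      p    <= a * b;   -- one multiplier computes four products per sample
      p_ph <= phase;   -- phase tag travels with the product
    end if;
  end process;
end architecture;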