Messages from 108975

Article: 108975
Subject: Re: Buffering the critical path.
From: Ray Andraka <ray@andraka.com>
Date: Tue, 19 Sep 2006 15:19:44 -0400

Peter Alfke wrote:
> Vessumesh, if you refuse to answer specific helpful questions, then I
> suggest you figure this out yourself, and do not bother this newsgroup.
> Peter
> 
> vssumesh wrote:
> 
>>Peter Alfke wrote:
>>
>>>What kind of device are you using?
>>>20 ns for a 32-bit adder (using dedicated carry) would be ridiculously
>>>slow...
>>>Dedicated carry, available in all Xilinx FPGA devices, uses less than
>>>50 ps per bit (plus some basic delay).
>>>Peter Alfke
>>>
>>>===========
>>>vssumesh wrote:
>>>
>>>>Hello all,
>>>>   In my design i am using a 32 bit adder and some combinational logic
>>>>after that. The full path i want to constrain to double the clock
>>>>period (20ns) and it is not constraing. When analysed the critical path
>>>>observed that there is big carry chain for the adder and a big routing
>>>>delay between the combinational logic (which i never expected). Is the
>>>>big carry chain is causing the trouble in the router. I am thinking of
>>>>buffering the output of the adder with a -ve edge (constrain that path
>>>>to 5ns). And then constrain the other path that is after the buffer to
>>>>next stage FF to 16ns. Will this buffering ease the routing effort.
>>>>Please advice.
>>>>Thanks and regards
>>>>Sumesh V S
>>
>>No 20ns for the adder and the remaining combinational logic. The adder
>>delay is as you said is very much less.
> 
> 

Sounds like there are several layers of combinatorial logic.  Pipeline 
the design.  Also, I think he is using the term "buffer" to indicate 
adding a register stage.

The adder bits are, as Peter said, about 50 ps per bit, but the time to 
get on and off the carry chain adds more than 2 ns. That is still nowhere 
near the 20 ns.
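
As a rough check of these numbers, here is a minimal Python sketch. The 50 ps per bit and the 2 ns on/off overhead are the figures quoted in this thread; the function name and the assumption that the delay scales linearly with width are illustrative only, and real values depend on the device and speed grade.

```python
def carry_chain_delay_ns(bits, per_bit_ps=50.0, on_off_ns=2.0):
    """Rough adder delay: fixed cost to get on/off the chain plus per-bit ripple."""
    return on_off_ns + bits * per_bit_ps / 1000.0


budget_ns = 20.0  # the two-clock-period constraint from the original question
adder_ns = carry_chain_delay_ns(32)
print(f"32-bit adder: about {adder_ns:.1f} ns of a {budget_ns:.0f} ns budget")
```

Even with generous overhead, the adder accounts for only a small share of the budget, which is why the remaining logic levels and routing are the place to look.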

It isn't the carry chain causing the problem.  The problem comes about 
from using many levels of logic (i.e., the signal goes through lots of 
LUTs) between the flip-flops plus the propagation delay associated with 
the carry chain.  You need to look at the ratio of logic delay to 
routing delay.  If the routing delay on the critical path is more than 
the logic delay, you can likely fix the problem with some manual 
placement.  The placer does a very poor job placing the additional 
layers of LUTs in multi-layer combinatorial logic.  The LUT connected to 
the flip-flop places well, but the LUTs leading up to that one get 
scattered to the far reaches of the chip.  You could try a higher effort 
level on the placement, but that may not provide enough improvement. 
You'll get better results floorplanning the locations of the additional 
layers of LUTs to be laid out logically and close to the rest of the 
LUTs in the path.  Trouble is, the LUT names are subject to change on 
subsequent synthesizer runs, so you have to be really careful.  The best 
solution, if your design can support it, is to pipeline the logic deeper.

Article: 108976
Subject: Re: Spartan3: Multiplier Madness
From: Ray Andraka <ray@andraka.com>
Date: Tue, 19 Sep 2006 15:29:35 -0400

Nico Coesel wrote:

> Austin Lesea <austin@xilinx.com> wrote:
> 
> 
>>Nico,
>>
>>OK, here it is: (for S3, V2, and V2 Pro)
>>
>>"It is likely that the delay will be marginally smaller if you
>>tie the 2 LSB inputs and use the upper 16 inputs only.  However, the
>>software model is pretty simple and won't model that as far as I can
>>remember.  Also, since one of the inputs goes through the Booth encoder
>>it might not be as substantial of an improvement as it would be with an
>>"original recipe" multiplier."
> 
> 
> So what you are saying is that the multiplier is faster when the upper
> inputs are being used, but the place & route software assumes the
> upper bits are slower?
> 
> 

The software assumes the LSBs  of the inputs are toggling and affecting 
the MSBs of the outputs.  The timing model doesn't take into account the
fact that the MSB outputs have shorter propagation delays from the 
higher up input bits, so it doesn't reflect the advantage of using only 
the upper bits when you do the timing analysis.  Instead, I believe the 
model just assumes a certain delay from any of the inputs to a specific 
output.

Article: 108977
Subject: Re: ddr clock issues
From: David Ashley <dash@nowhere.net.dont.email.me>
Date: Tue, 19 Sep 2006 12:35:03 -0700

Nico Coesel wrote:
> All you need is a normal clock and a 90 degrees phase shifted clock.
> The whole clocking outside the fpga thing is unnecessary. If you place
> the output flipflops inside the IOBs and use an fddr in the IOB to
> replicate the internal clock, all signals connected to the DDR memory
> will have the same delay.

But the DDR spec says the DQS strobe for data written by the FPGA to
the DDR must be center-aligned. The DQS is in phase with the
DDR clock. That means the data must be put on the lines
a quarter of a clock cycle (1/2 of 1/2) early for proper alignment.

This requires a clock that is 270 degrees out of phase from the
DDR's clock. This is the clock used for the data lines going into
the DDR.
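
To make the phase arithmetic concrete, here is a small Python sketch. It assumes the 100 MHz DDR clock mentioned in this thread; the function name is hypothetical.

```python
def phase_to_delay_ns(phase_deg, clock_mhz):
    """Convert a phase offset in degrees to a delay in ns for a given clock."""
    period_ns = 1000.0 / clock_mhz
    return (phase_deg % 360) / 360.0 * period_ns


clock_mhz = 100.0
early_deg = -90                    # a quarter cycle early
equivalent_deg = early_deg % 360   # the same clock expressed as a lag
print(equivalent_deg, phase_to_delay_ns(equivalent_deg, clock_mhz))
```

A quarter cycle early and three quarters of a cycle late describe the same periodic clock, which is why the write-data clock ends up on the 270 degree DCM output.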

I don't understand the "clocking outside the FPGA" you mention.
The FPGA currently has one 50 MHz external clock source. I
run that through a DCM to make it 100 MHz. Then in order for
the DDR to work I need to use two more DCMs. One is used
to make the DDR clocks (positive and negative). The other is
used for everything else.

-Dave

-- 
David Ashley                http://www.xdr.com/dash
Embedded linux, device drivers, system architecture

Article: 108978
Subject: Re: Spartan3: Multiplier Madness
From: Austin Lesea <austin@xilinx.com>
Date: Tue, 19 Sep 2006 13:08:36 -0700

Ray,

Correct.

Austin

Ray Andraka wrote:
> Nico Coesel wrote:
> 
>> Austin Lesea <austin@xilinx.com> wrote:
>>
>>
>>> Nico,
>>>
>>> OK, here it is: (for S3, V2, and V2 Pro)
>>>
>>> "It is likely that the delay will be marginally smaller if you
>>> tie the 2 LSB inputs and use the upper 16 inputs only.  However, the
>>> software model is pretty simple and won't model that as far as I can
>>> remember.  Also, since one of the inputs goes through the Booth encoder
>>> it might not be as substantial of an improvement as it would be with an
>>> "original recipe" multiplier."
>>
>>
>> So what you are saying is that the multiplier is faster when the upper
>> inputs are being used, but the place & route software assumes the
>> upper bits are slower?
>>
>>
> 
> 
> The software assumes the LSBs  of the inputs are toggling and affecting
> the MSBs of the outputs.  The timing model doesn't take into account the
> fact that the MSB outputs have shorter propagation delays from the
> higher up input bits, so it doesn't reflect the advantage of using only
> the upper bits when you do the timing analysis.  Instead, I believe the
> model just assumes a certain delay from any of the inputs to a specific
> output.

Article: 108979
Subject: Re: ddr clock issues
From: "Gabor" <gabor@alacron.com>
Date: 19 Sep 2006 13:58:05 -0700

David Ashley wrote:
[snip]
> But the open cores DDR doesn't make use of the DQS strobe generated
> by the DDR device itself. I'm only trying to run at 100 mhz. In that
> case xilinx app notes say the timing is adequate so the DQS strobe isn't
> needed to capture data reliably. Maybe the timing would get easier if
> the logic made use of the DQS strobe from the DDR.
>

I'm doing pretty much the same thing with Virtex 2 (similar architecture
to Spartan 3) on a proprietary board.  This board has a 66.66 MHz
clock that is doubled to run the DDR at 133 MHz (266 DDR).  I
do not use the DQS inputs for sampling data.  I did need to tweak
the delay in my DCMs to get reliable sampling.  I did not use any
expensive test equipment for this; I just used the variable delay
mode of the DCM to run tests at various phases and centered
the final fixed value within the range that seemed to work.

At 100 MHz I would expect the timing margins to be quite good
even in the slowest speed grade parts.  I'm using Virtex 2 -5
speed grade in my 133 MHz design.

> I have a feeling adding some constraints would make the thing work
> with a single DCM. Unfortunately I have no clue what constraints to
> add, as I don't know what's going wrong (and don't know much about
> constraints writing anyway).
>

The problem with  a single DCM is that you need to make up
for phase differences in the board routing.  Signals to the DDR
memory arrive there some prop. delay after they leave the FPGA.
At the memory end they need to meet setup and hold time to
the clock as it arrives at the memory, usually at the same
board routing delay as the clock.  So if your clock and data/
address/control outputs use the same internal clock, you
would need to use board routing or some other delay element
external to the FPGA to ensure hold time is met at the memory.

Then the data returning from the memory shows up 2 board
prop. delays from the driven clock, plus the clock to output
timing specified in the memory datasheet.  So the sampling
point isn't exactly centered within the outgoing clock half-
period.  So your sampling clock may need to be off by some
phase other than 90 degrees from the clock driving your
outputs.  All of this is pretty hard to accomplish with one
DCM, IMHO.  And just adding timing constraints without the
mechanism to meet them makes life miserable on the tools,
which usually fail miserably in response (they have only
internal routing delays to make up your requested timing).
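
The read-side reasoning above can be sketched numerically. This is a minimal Python illustration, not a timing analysis: the board delay and memory clock-to-output values are hypothetical placeholders, and the model simply assumes the ideal sampling point sits a quarter period after the data arrives back at the FPGA.

```python
def read_sample_phase_deg(board_delay_ns, clk_to_out_ns, clock_mhz):
    """Phase of the ideal read-sampling point relative to the driven clock edge.

    Data returns after two board delays plus the memory's clock-to-output
    time, and is assumed to be best sampled a quarter period after that.
    """
    period_ns = 1000.0 / clock_mhz
    arrival_ns = 2.0 * board_delay_ns + clk_to_out_ns
    sample_ns = arrival_ns + period_ns / 4.0
    return (sample_ns / period_ns * 360.0) % 360.0


# Hypothetical numbers: 0.5 ns of trace delay each way, 0.7 ns clock-to-output.
print(read_sample_phase_deg(0.5, 0.7, 100.0))
```

With zero delays the result is the textbook 90 degrees; any real board delay pushes the sampling point later, which is the reason a fixed 90 degree clock may not be enough.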

Article: 108980
Subject: Re: ddr clock issues - success
From: David Ashley <dash@nowhere.net.dont.email.me>
Date: Tue, 19 Sep 2006 14:20:43 -0700

David Ashley wrote:
> I will certainly share whatever I learn.

I got my simple write/read-verify system to work. I was able
to get rid of one of the DCMs, so I only need 2.

DCM #1 takes the 50 MHz input and I use the 2X output to
drive a clock buffer. This is the tclock signal. Feedback
comes from the clock buffer.

DCM #2 takes tclock and produces the 4-phase output.
The 0 and 270 signals drive 2 clock buffers. The buffered
version of the 0 clock goes back into the feedback input
on the DCM. These signals are sys_clk and sys_clk270.

FDDRs are used to produce the DDR's clock. Their inputs
are hardwired to "01" for the true clock and "10" for
the negative clock.  Both FDDRs are clocked by
sys_clk and inverted sys_clk. The inverter is implicit
in the FDDR configuration, so there is no delay penalty.

Here's the trick: The original open cores DDR controller
source sampled the data from the DDR on sys_clk rising
and falling edge. I instead push out the sampling by
1/4 of a cycle:
rising_edge(sys_clk)  replaced by falling_edge(sys_clk270)
falling_edge(sys_clk) replaced by rising_edge(sys_clk270)
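
The quarter-cycle shift can be checked with a few lines of Python. This sketch only assumes that sys_clk270 lags sys_clk by 270 degrees, as described above; the helper name is hypothetical.

```python
def edge_phase_deg(clock_phase_deg, edge):
    """Phase (relative to the rising edge of sys_clk) at which an edge occurs."""
    offset = 0 if edge == "rising" else 180
    return (clock_phase_deg + offset) % 360


SYS_CLK = 0
SYS_CLK270 = 270

old_hi = edge_phase_deg(SYS_CLK, "rising")      # original HI-word sample point
new_hi = edge_phase_deg(SYS_CLK270, "falling")  # replacement sample point
old_lo = edge_phase_deg(SYS_CLK, "falling")     # original LO-word sample point
new_lo = edge_phase_deg(SYS_CLK270, "rising")   # replacement sample point
print(new_hi - old_hi, new_lo - old_lo)
```

Both sample points move later by the same 90 degrees, so the HI and LO words keep their half-cycle spacing.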

Then I made a slight tweak to get the sampled data back
into the sys_clk domain as required elsewhere. It works
fine. I had a feeling the problem was on the sampling side,
since no special machinery existed to sample in the middle
of the data-valid window. The setup time was not being met.

Here's a sample of the before code:
-- **** CODE BEFORE FIX
      process (sys_clk)
      begin
         if rising_edge(sys_clk) then

            -- sample HI-data word with rising edge
            data_hi_q <= data;

            -- store HI and LO data words in the 32-bit output register
            data_out_q <= data_hi_q & data_lo2_q;

         end if;
      end process;
-- ...
      process (sys_clk)
      begin
         if falling_edge(sys_clk) then

            -- sample LO- word with falling edge
            data_lo1_q <= data;

            -- 1 clock additional delay to store HI- and LO-word
            -- with the next rising edge as 32bit word
            data_lo2_q <= data_lo1_q;
         end if;
      end process;

-- ***** CODE AFTER FIX

      process (sys_clk270)
      begin
         if falling_edge(sys_clk270) then

            -- sample HI data word a quarter cycle after the rising edge
            -- of sys_clk (falling edge of sys_clk270)
            data_hi_q <= data;

         end if;
      end process;

      process (sys_clk) -- (DA) fix to get back into sys_clk domain
      begin
         if rising_edge(sys_clk) then
            -- store HI and LO data words in the 32-bit output register
            data_out_q <= data_hi_q & data_lo2_q;
         end if;
      end process;

-- ...
      process (sys_clk270)
      begin
         if rising_edge(sys_clk270) then

            -- sample LO data word a quarter cycle after the falling edge
            -- of sys_clk (rising edge of sys_clk270)
            data_lo1_q <= data;

            -- 1 clock additional delay to store HI and LO words
            -- with the next rising edge as a 32-bit word
            data_lo2_q <= data_lo1_q;
         end if;
      end process;


Hope this is of use to other people.
-Dave

-- 
David Ashley                http://www.xdr.com/dash
Embedded linux, device drivers, system architecture

Article: 108981
Subject: Re: ddr clock issues
From: David Ashley <dash@nowhere.net.dont.email.me>
Date: Tue, 19 Sep 2006 14:42:15 -0700

Gabor wrote:
> David Ashley wrote:
> [snip]
> I'm doing pretty much the same thing with Virtex 2 (similar
> architecture
> to Spartan 3) on a proprietary board.  This board has a 66.66 MHz
> clock that is doubled to run the DDR at 133 MHz (266 DDR).  I
> do not use the DQS inputs for sampling data.  I did need to tweak
> the delay in my DCM's to get reliable sampling.  I did not use any
> expensive test equipment for this, I just used the variable delay
> mode of the DCM to run tests at various phases and centered
> the final fixed value within the area that seemed to work.

See other email in this thread for details. I got it working
by sampling data from the DDR on the 90 degree phase
clock, now it works fine. No tweaking of the DCM necessary.
And I'm only using one DCM.

The DDR's DQS output transitions right when the data
becomes valid out of the DDR. But the DDR controller
has to transition the DQS right in the middle of the data
going to the DDR being valid. This is hardly fair. I wish
there wasn't even the DQS signal, it's just a PITA.

-Dave

-- 
David Ashley                http://www.xdr.com/dash
Embedded linux, device drivers, system architecture

Article: 108982
Subject: Re: Hilbert Transform in verilog or VHDL -- it has got to be out there somewhere
From: "Guenter" <GHEDWHCVEAIS@spammotel.com>
Date: 19 Sep 2006 14:50:08 -0700

Austin Lesea wrote:
> OK,
>
> I have looked through a lot of places, but it seems that opencores.org,
> etc. just do not have any Hilbert transform blocks.
>
> I would think that this is not exactly rocket science, as the common
> ways to do this are posted all over the place, and there are c programs
> for DSP also posted.  Even the Xilinx DSP libraries don't seem to have a
> free Hibert transformer (even one for $?).
>

This paper has something on page 11:

http://www.xilinx.com/ipcenter/catalog/logicore/docs/da_fir.pdf

Cheers,

Guenter

Article: 108983
Subject: Re: Hilbert Transform in verilog or VHDL -- it has got to be out
From: Austin Lesea <austin@xilinx.com>
Date: Tue, 19 Sep 2006 15:15:51 -0700

Guenter,

Boy, is that embarrassing: it is right where it is supposed to be, on
the free logic cores stuff.

But, in my defense, it was 'hidden' in with the FIR filter wizard, as
that is how they chose to implement it.

Now if only the search engine would have found it?

Maybe if I didn't look for "Hilbert", but instead looked for "FIR filters"?

Who would have guessed?

It is not only there where you pointed me, but also:
http://www.xilinx.com/bvdocs/ipcenter/data_sheet/fir_compiler_ds534.pdf

Many thanks, Guenter,

Austin

Guenter wrote:
> Austin Lesea wrote:
>> OK,
>>
>> I have looked through a lot of places, but it seems that opencores.org,
>> etc. just do not have any Hilbert transform blocks.
>>
>> I would think that this is not exactly rocket science, as the common
>> ways to do this are posted all over the place, and there are c programs
>> for DSP also posted.  Even the Xilinx DSP libraries don't seem to have a
>> free Hibert transformer (even one for $?).
>>
> 
> This paper has something on page 11:
> 
> http://www.xilinx.com/ipcenter/catalog/logicore/docs/da_fir.pdf
> 
> Cheers,
> 
> Guenter
>

Article: 108984
Subject: Re: ddr clock issues - success
From: David Ashley <dash@nowhere.net.dont.email.me>
Date: Tue, 19 Sep 2006 15:20:00 -0700

David Ashley wrote:
> Hope this is of use to other people.
> -Dave
> 

I've gotten email asking for the source, so I put it up. It can
be found here:

http://www.xdr.com/dash/fpga/

It's targeted to a Linux build environment. It needs unisim
to be in the right place in order to build as is, or you can tweak
the Makefile.

It's pretty much an identical copy of the open cores DDR
controller, except that I removed one DCM and wrapped
it all in a synthesizable tester targeted to the
Spartan-3E starter board. The test just fills up memory
with a non-repeating pattern, then reads it back out.
If the pattern matches, an LED stays lit. It keeps doing
this forever.

-Dave

-- 
David Ashley                http://www.xdr.com/dash
Embedded linux, device drivers, system architecture

Article: 108985
Subject: Metastability resolution
From: comp.arch.fpga.posting.account@googlemail.com
Date: 19 Sep 2006 15:22:18 -0700

I am designing a cross-domain synchroniser and wanted to check that I
understand the formula for the mean time between metastable failures
correctly. Sorry if the answer can be easily found on the web; I tried
to find it and failed.

The usual formula is MTBF = 1/(T0 f1 f2 e^{-t/tau}), where f1 is the
frequency of the flip-flop's clock, f2 is the edge frequency at
which its input transitions, T0 is the metastable window aperture size
(is it called that?), and tau is the metastability time constant. A
failure happens whenever the flip-flop becomes metastable and remains
so for at least time t.
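
For reference, the formula translates directly into a few lines of Python. All numeric values below are hypothetical placeholders chosen only to exercise the function; they are not measured Xilinx figures.

```python
import math


def mtbf_seconds(t0_s, f_clk_hz, f_data_hz, t_resolve_s, tau_s):
    """MTBF = 1 / (T0 * f1 * f2 * exp(-t / tau)), with all inputs in SI units."""
    return math.exp(t_resolve_s / tau_s) / (t0_s * f_clk_hz * f_data_hz)


# Hypothetical inputs: T0 = 0.2 ns, 100 MHz clock, 10 MHz data edge rate,
# 3 ns allowed for resolution, tau = 50 ps.
example = mtbf_seconds(0.2e-9, 100e6, 10e6, 3e-9, 50e-12)
print(f"MTBF is roughly {example:.3g} seconds for these made-up inputs")
```

Because t/tau sits in the exponent, each additional tau of resolution time multiplies the MTBF by e, whereas T0 only scales the result linearly.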

The value of T0 seems impossible to find for Xilinx FPGAs, presumably
because it varies exponentially with tau, and that is difficult enough
to measure accurately (?). I think that T0 can be at most t_{setup}
+ t_{hold}, so that might be one way of obtaining a value (?), though
Xilinx say that negative hold times are not guaranteed, so I should
probably stick with just t_{setup} whenever the hold time is negative.

The above formula only works when the two clocks are independent. If
they are not, then I think a good bound (an upper bound on the failure
rate) is MTBF >= 1/(min(f1, f2) e^{-t/tau}) (?) The rationale is that the
flip-flop cannot go metastable any more often than either f1 or f2
(remembering f2 is the edge frequency, though since the potentially
metastable flip-flop is fed from another flip-flop clocked with frequency
f2 that ends up being the same thing). It might be that even when the two
clocks are produced by the same DCM there will be sufficient jitter to
allow the upper bound to be improved considerably, but probably not if
the best known bound on T0 is of a similar magnitude to the jitter (?)

Could someone please confirm the above is correct? Many thanks in
advance!

Article: 108986
Subject: Re: ISE Simulator Error 222: SuSE 10.1 Linux
From: "Roger" <enquiries@rwconcepts.co.uk>
Date: Tue, 19 Sep 2006 23:24:42 +0100


"gauckler" <gauckler@fh-furtwangen.de> wrote in message 
news:1158562165.368801.101410@i3g2000cwc.googlegroups.com...
> Hi,
>
> i tried to simulate a small vhdl design with xilinx ISE (8.1 - 8.2
> spxx, Webpack or foundation) running SuSE 10.1 linux, unfortunately
> there is an error. Because the  VHDL code simulates  with SuSE 9.2  I
> assume the code is fine and there are no spaces in the file path.
>
>    Started : "Check Syntax".
>    Running vhpcomp
>    Compiling vhdl file "/home/PBuser2/parity/parity.vhd" in Library
> isim_temp.
>    Entity <parity> compiled.
>    Entity <parity> (Architecture <behavior>) compiled.
>    Compiling vhdl file "/home/PBuser2/parity/tb_parity.vhd" in Library
> isim_temp.
>    Entity <tb_parity_vhd> compiled.
>    Entity <tb_parity_vhd> (Architecture <behavior>) compiled.
>    Parsing "tb_parity_vhd_stx.prj": 0.03
>
>    Process "Check Syntax" completed successfully
>
>   Running Fuse ...
>   Parsing "tb_parity_vhd_beh.prj": 0.00
>   Building tb_parity_vhd_isim_beh.exe
>   ERROR:Simulator:222 - Generated C++ compilation was unsuccessful
>
> Has anybody simulated ISE isim under SuSE 10.1.  Any hint is
> appreciated.
>
> Andreas
>

Same thing with just ISE - no solution. Sorry. This really is something 
Xilinx should be sorting.

Rog.

Article: 108987
Subject: Re: Hilbert Transform in verilog or VHDL -- it has got to be out there somewhere
From: John Larkin <jjlarkin@highNOTlandTHIStechnologyPART.com>
Date: Tue, 19 Sep 2006 15:29:36 -0700

On Tue, 19 Sep 2006 10:21:17 -0700, Austin Lesea <austin@xilinx.com>
wrote:

>OK,
>
>I have looked through a lot of places, but it seems that opencores.org,
>etc. just do not have any Hilbert transform blocks.
>
>I would think that this is not exactly rocket science, as the common
>ways to do this are posted all over the place, and there are c programs
>for DSP also posted.  Even the Xilinx DSP libraries don't seem to have a
>free Hibert transformer (even one for $?).
>
>Yes, I know how to go about doing one, but, if its already done, why
>recode the wheel?  After all, there are probably at least three good
>ways to do it on an FPGA, and ten bad ones.
>
>Since I have "friends in low places" in ham radio, having a public
>domain Hilbert would be useful for SSB, FM, AMSAT and other SDR
>applications.
>
>Some FIR, IIR, FFT, mixers, accumulators, DDFS, and so forth that are
>pretty easily found plus a Digilent $99 S200 pcb could make a useful
>foundation for software defined radio experiments (that and a
>http://www.digilentinc.com/Products/Detail.cfm?Prod=AIO1&Nav1=Products&Nav2=Accessory
>analog accessory pcb, or make your own A/D, D/A pcb).
>
>If anyone can point me to some sources, it would be appreciated.
>
>By the way, the TAPR class went well on Sunday in Tuscon, and now there
>are 28 more crazy hams out there who are really dangerous...
>
>The talk and slides will be posted when they do their web page for the
>2006 25th anniversary meeting.
>
>http://www.tapr.org
>
>Austin


An opamp-based allpass 90 degree phase shifter is pretty simple; 8
opamp sections, 8 caps, and 24 resistors give nice quadrature signals
over the voice range. And simulating an R-C section in an FPGA is
trivial. So it seems to me that one could do a nice Hilbert with a
fairly small amount of FPGA resources by just mimicking the opamp
circuit in discrete time. That would be a lot smaller than a FIR
implementation.
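
As a minimal sketch of the idea, here is one first-order all-pass section in Python, the discrete-time counterpart of a single op-amp R-C all-pass stage. The coefficient value and function names are hypothetical; a usable 90 degree network would need two cascades of such sections with coefficients designed so that their phase difference stays close to 90 degrees over the band of interest, and that design step is not shown here.

```python
import cmath
import math


def allpass_section(samples, a):
    """First-order all-pass filter: y[n] = a*x[n] + x[n-1] - a*y[n-1]."""
    out = []
    x_prev = 0.0
    y_prev = 0.0
    for x in samples:
        y = a * x + x_prev - a * y_prev
        out.append(y)
        x_prev, y_prev = x, y
    return out


def allpass_response(a, omega):
    """Frequency response H = (a + z^-1) / (1 + a*z^-1) at z = exp(j*omega)."""
    z_inv = cmath.exp(-1j * omega)
    return (a + z_inv) / (1 + a * z_inv)


a = 0.5  # arbitrary illustrative coefficient, |a| < 1 for stability
for omega in (0.1, 1.0, 2.5):
    h = allpass_response(a, omega)
    print(f"omega={omega}: |H|={abs(h):.6f}, phase={math.degrees(cmath.phase(h)):.1f} deg")
```

Each section costs one multiplier (or a shift-and-add if the coefficient is chosen conveniently) and two registers, which is where the resource saving over a long FIR would come from.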

Anybody done it this way?

John

Article: 108988
Subject: Re: Metastability resolution
From: Austin Lesea <austin@xilinx.com>
Date: Tue, 19 Sep 2006 15:48:08 -0700

Have you read:

http://tinyurl.com/qugxf

http://www.xilinx.com/xlnx/xweb/xil_tx_display.jsp?iCountryID=1&iLanguageID=1&sTechX_ID=pa_metastability&BV_SessionID=@@@@1476187725.1158705950@@@@&BV_EngineID=cccfaddikmdkkhhcefeceihdffhdfjf.0

and

http://www.xilinx.com/bvdocs/appnotes/xapp094.pdf

?

Austin

Article: 108989
Subject: Re: Metastability resolution
From: comp.arch.fpga.posting.account@googlemail.com
Date: 19 Sep 2006 15:53:37 -0700

Sorry to follow myself up. I made a mistake in the post (used max when
intending to use min, though neither is incorrect), so I cancelled it
and posted the corrected version moments later. Google groups seems to
have honoured the cancel request, but another Usenet server I use has
not. Sorry if you see two copies.

Article: 108990
Subject: Re: A strange problem of Chipscope
From: "Weng Tianxiang" <wtxwtx@gmail.com>
Date: 19 Sep 2006 15:55:35 -0700

MM wrote:
> > Thanks. I tried but it doesn't work either.
>
> Post your MHS file...
>
> /Mikhail

Hi,
Here is a note I wrote for my own use after I had successfully
inserted debugging information into a *.cdc file and used ChipScope
without errors.

How to run the ChipScope procedure correctly:
1. Generate a project as usual, including all files containing the signals
   you want to debug;
2. Synthesize it without errors;
3. Start ChipScope Pro Core Insert;
4. Edit the clock channel and the trigger/data channels without errors.
   This means that the number of signals must match what you set on the
   previous page, so you may have to go back several pages to check:
   black font is OK, red font is an error. You cannot go forward until
   you correct the error.
5. Quit the "ChipScope Pro Core Insert" software, then click the 'Save'
   button to save the edited file as *.cdc.
   NEVER CLICK THE 'INSERT' BUTTON; otherwise it will cause a double
   insertion error later.
6. Insert the *.cdc file into the project by adding it as a source file;
7. Run synthesis only;
8. Double-click the *.cdc file in the project and check whether any signals
   need to be changed;
9. Steps 7-8 can be skipped if all debug signals are already included in
   the *.cdc file;
10. Compile to generate the bitstream file as usual.

Then a bitstream containing debugging information is generated.

ChipScope has some limits on how signals are accessed:
1. An input pin must be accessed through its registered value;
2. An output pin should be accessed through its internal drive signal;
3. All extra debugging signals, i.e., signals that are added only for
   debugging, must be linked to an extra output pin to prevent them from
   being optimized out.
   The extra output pin must be added in *.puf for debugging use only.

A lesson:
Never add more than 10 signals the first time; get ChipScope working
successfully first.
It is a very complex system and every step has a chance to trigger a
mine. After you clear the way the first time, you may add as
many signals as you want.

It is especially helpful if you have a helper with experience.

Its manual is too detailed for a newbie to start with.

Weng

Article: 108991
Subject: Re: Metastability resolution
From: comp.arch.fpga.posting.account@googlemail.com
Date: 19 Sep 2006 16:11:48 -0700

Austin Lesea wrote:

Many thanks for your reply.

> Have you read:
>
> http://tinyurl.com/qugxf

Of course. It seems to be about the only source of the value of tau for
Xilinx FPGAs I could find. Have I missed other TechXclusives that give
the same data for Virtex4/Spartan etc?

> http://www.xilinx.com/xlnx/xweb/xil_tx_display.jsp?iCountryID=1&iLanguageID=1&sTechX_ID=pa_metastability&BV_SessionID=@@@@1476187725.1158705950@@@@&BV_EngineID=cccfaddikmdkkhhcefeceihdffhdfjf.0

That is the same thing, right?

> and
>
> http://www.xilinx.com/bvdocs/appnotes/xapp094.pdf

And that is yet another substantially identical copy, except in PDF?

I am very sorry if I missed it, but the article you refer to does not
seem to give a value for T0 and does not address the case when the two
clocks are not independent. What am I missing?

Article: 108992
Subject: Re: Metastability resolution
From: "Peter Alfke" <peter@xilinx.com>
Date: 19 Sep 2006 16:21:00 -0700

I did not give a value for  T0 because it does not affect MTBF very
much.
The measurements were done with uncorrelated frequencies. If that is
not the case, all bets are off. Except that, of course, metastability
cannot occur more often than once per clock period or once per data
change, whichever is the lower frequency.
I would be interested in your asynchronous environment.
Peter Alfke, Xilinx
====================
comp.arch.fpga.posting.acco...@googlemail.com wrote:
> Austin Lesea wrote:
>
> Many thanks for your reply.
>
> > Have you read:
> >
> > http://tinyurl.com/qugxf
>
> Of course. It seems to be about the only source of the value of tau for
> Xilinx FPGAs I could find. Have I missed other TexhXclusives that give
> the same data for Virtex4/Spartan etc?
>
> > http://www.xilinx.com/xlnx/xweb/xil_tx_display.jsp?iCountryID=1&iLanguageID=1&sTechX_ID=pa_metastability&BV_SessionID=@@@@1476187725.1158705950@@@@&BV_EngineID=cccfaddikmdkkhhcefeceihdffhdfjf.0
>
> That is the same thing, right?
>
> > and
> >
> > http://www.xilinx.com/bvdocs/appnotes/xapp094.pdf
>
> And that is yet another substantially identical copy, except in PDF?
>
> I am very sorry if I missed it, but the article you refer to does not
> seem to give a value for T0 and does not address the case when the two
> clocks are not independent. What am I missing?

Article: 108993
Subject: Re: Metastability resolution
From: comp.arch.fpga.posting.account@googlemail.com
Date: 19 Sep 2006 16:43:34 -0700

Peter Alfke wrote:

Many thanks for your reply.

> I did not give a value for  T0 because it does not affect MTBF very
> much.

Certainly not as much as tau, though it can make the difference between
being comfortable using a synchroniser with just one flip-flop and
needing two. Even the "trivial" upper bound for V2Pro is something like
0.2ns, so with a 10ns clock it increases MTBF by a factor of 50. I
would hope that T0 is much smaller than 0.2ns but have no way of
knowing for certain. Would I be right in thinking that T0 cannot be
measured with any accuracy?
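
The factor of 50 can be reproduced with a short Python sketch, assuming the comparison is between the independent-clock formula 1/(T0 f1 f2 e^{-t/tau}) and the correlated-clock bound 1/(min(f1, f2) e^{-t/tau}) discussed earlier in this thread. The exponential term cancels in the ratio; the function name is hypothetical.

```python
def independence_gain(t0_s, f_clk_hz, f_data_hz):
    """Ratio of the independent-clock MTBF to the correlated-clock bound.

    Equals min(f1, f2) / (T0 * f1 * f2), i.e. 1 / (T0 * max(f1, f2)).
    """
    return min(f_clk_hz, f_data_hz) / (t0_s * f_clk_hz * f_data_hz)


# T0 bounded by 0.2 ns, both domains running from a 10 ns (100 MHz) clock.
print(independence_gain(0.2e-9, 100e6, 100e6))
```

In other words, the gain is just the clock period divided by T0, so a smaller true T0 would make the independent-clock estimate proportionally better.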

> The measurements were done with uncorrelated frequencies. If that is
> not the case, all bets are off. Except that, of course, metastability
> cannot ocur more often than once per clock period or once per data
> change, whichever is the lower frequency.

So are you saying that the upper bound I came up with is correct? I
would certainly be pleased if you were.

> I would be interested in your asynchronous environment.

I will not be the final user of the synchroniser for which I need to
know the MTBF. I need to allow for the possibility that the two clocks
will be produced by the same DCM and might therefore be synchronous.

Article: 108994
Subject: Re: Metastability resolution
From: "Peter Alfke" <peter@xilinx.com>
Date: 19 Sep 2006 17:13:15 -0700

comp.arch.fpga.posting.account@googlemail.com wrote:
> Peter Alfke wrote:
>
> Many thanks for your reply.
>
> > I did not give a value for  T0 because it does not affect MTBF very
> > much.
>
> Certainly not as much as tau, though it can make the difference between
> being comfortable using a synchroniser with just one flip-flop and
> needing two. Even the "trivial" upper bound for V2Pro is something like
> 0.2ns, so with a 10ns clock it increases MTBF by a factor of 50. I
> would hope that T0 is much smaller than 0.2ns but have no way of
> knowing for certain. Would I be right in thinking that T0 cannot be
> measured with any accuracy?
>
> > The measurements were done with uncorrelated frequencies. If that is
> > not the case, all bets are off. Except that, of course, metastability
> > cannot ocur more often than once per clock period or once per data
> > change, whichever is the lower frequency.
>
> So are you saying that the upper bound I came up with is correct? I
> would certainly be pleased if you were.
>
> > I would be interested in your asynchronous environment.
>
> I will not be the final user of the synchroniser for which I need to
> know the MTBF. I need to allow for the possibility that the two clocks
> will be produced by the same DCM and might therefore be synchronous.

Let's look at the basics:
A flip-flop has an undefined output delay when the D input changes
within a very tiny portion of the set-up time window, and the delay is
the longer the closer that change is to the center of the tiny window.
For a 3 ns extra delay I measured  (indirectly) this tiny window as a
small fraction of a femtosecond. Expressed this way, MTBF and data and
clock frequencies fall out of the equation, and the behavior looks as
if it were deterministic. So I consider this a basic figure of merit of
the flip-flop.

If your two frequencies are correlated, you may have a very hard time
calculating the probability that the two edges ever get that close.
They may always be very close, or they may never be close at all.
Especially if you are exposed to, or if you rely on jitter...
Peter Alfke
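
A rough sketch of the "tiny window" view above, using the common
exponential metastability model. The function names and every numeric
value are illustrative assumptions, not measurements from this thread;
only the symbols T0 and tau come from the discussion, and the model
only applies when clock and data are uncorrelated.

```python
import math

def capture_window_s(extra_delay_s, tau_s, t0_s):
    """Width of the input-timing window that causes at least extra_delay_s
    of added output delay, under the model w = T0 * exp(-t / tau)."""
    return t0_s * math.exp(-extra_delay_s / tau_s)

def mtbf_s(extra_delay_s, tau_s, t0_s, f_clk_hz, f_data_hz):
    """Mean time between failures for uncorrelated clock and data:
    MTBF = 1 / (w * f_clk * f_data)."""
    window = capture_window_s(extra_delay_s, tau_s, t0_s)
    return 1.0 / (window * f_clk_hz * f_data_hz)

# Placeholder numbers only (assumed, not taken from any data sheet).
TAU = 100e-12        # assumed resolution time constant
T0 = 0.2e-9          # the 0.2 ns upper bound mentioned in the quoted text
EXTRA_DELAY = 3e-9   # the 3 ns extra delay used as the example above

print(capture_window_s(EXTRA_DELAY, TAU, T0))
print(mtbf_s(EXTRA_DELAY, TAU, T0, f_clk_hz=100e6, f_data_hz=10e6))
```

Because T0 only scales the result linearly while the allowed delay sits
in the exponent, a 50x change in T0 is worth roughly four extra tau of
settling time in this model.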

Article: 108995
Subject: Re: Buffering the critical path.
From: "vssumesh" <vssumesh_asic@yahoo.com>
Date: 19 Sep 2006 17:47:38 -0700
Links: << >> << T >> << A >>

> Peter Alfke wrote:
> > Vessumesh, if you refuse to answer specific helpful questions, then I
> > suggest you figure this out yourself, and do not bother this newsgroup.
> > Peter
Sorry Peter, I did not mean that. Sorry for the confusion.
I am using a V4LX60 for my design. The requirement is to add two
37-bit numbers and do some combinational logic on the result. The
total time available is 20 ns. The adder takes very little time; the
full logic itself takes around 4 ns. The main problem is the routing
delay. I forgot to tell you that it is a block-RAM-based design. It
uses 128 BRAMs from the V4LX60 (implementing a 16-port RAM), and it
uses the block RAMs in a scattered manner. So I have now placed this
block in the central region, and the final routing to the block RAMs
is taking a lot of delay.
In the previous version there was no combinational logic after the
adder and I met timing. But not now.
What I was asking about is adding registers to latch the output of the
adder. I thought it would be easier for PAR to see two paths instead of
one path from a source FF to a destination FF. Also, Ray, there are
32*16 such signals. Is it possible to manually route all of them? I
think pipelining is not possible, since this is part of a pipeline
stage of a processor, which expects the result in the same cycle. So
pipelining is not an option.
> It isn't the carry chain causing the problem.  The problem comes about
> from using many levels of logic (ie the signal goes through lots of
> LUTs) between the flip-flops plus the propagation delay associated with
> the carry chain.
Ray, I was asking whether breaking the above long path into separate
parts, using the +ve and -ve edges of the clock, could help the tool
achieve a better PAR result.
Thanks and regards
Sumesh V S
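
For reference, the budget described above can be tallied in a few
lines. This is only a back-of-the-envelope sketch: the 50 ps per bit
and roughly 2 ns carry-chain entry/exit figures are the estimates
quoted earlier in this thread, the 20 ns budget and 4 ns logic delay
are from this post, and the function names and the 5 ns split point
are made up for illustration.

```python
def carry_adder_delay_ns(bits, per_bit_ns=0.05, on_off_ns=2.0):
    """Rough adder delay: fixed cost to get on and off the carry chain
    plus a per-bit carry cost (thread estimates, not data sheet values)."""
    return on_off_ns + bits * per_bit_ns

def routing_budget_ns(total_ns, logic_ns):
    """Time left for routing once the logic delay is subtracted."""
    return total_ns - logic_ns

def split_at_register(total_ns, first_segment_ns):
    """Budgets for the two segments if a register is inserted mid-path
    without changing the overall latency; they still sum to total_ns."""
    return first_segment_ns, total_ns - first_segment_ns

print(carry_adder_delay_ns(37))       # estimated 37-bit adder delay
print(routing_budget_ns(20.0, 4.0))   # what remains for routing
print(split_at_register(20.0, 5.0))   # hypothetical split point
```

Splitting the path with an opposite-edge register does not create any
extra time. It only fixes where the intermediate result must arrive, so
it can help only if the tools handle two shorter constrained paths
better than one long one.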

Article: 108996
Subject: Re: E1 to ethernet conversion
From: "jai.dhar@gmail.com" <jai.dhar@gmail.com>
Date: 19 Sep 2006 18:30:39 -0700
Links: << >> << T >> << A >>

TDMoE is a standard that converts TDM (T1/E1) to Ethernet; it is used
by Asterisk and the like.

Article: 108997
Subject: Old vs. New FPGAs
From: "rickman" <gnuarm@gmail.com>
Date: 19 Sep 2006 18:48:28 -0700
Links: << >> << T >> << A >>

I was updating a CPU design I did a few years ago and I was a bit
disappointed in the results I see.  The CPU was originally targeted to
an Altera ACEX part which is 5 volt compatible (to give you an idea of
its age).  I did my own CPU because Altera does not support their NIOS
for that family.  I spent a fair amount of time optimizing the
architecture to be easy to implement in 4 input LUTs and other basic
elements found in FPGAs.  I coded it up for the ACEX async memories and
got it running.  If memory serves me, it clocked in at 55 MHz max and I
used it at 40 MHz.

Currently I wanted to look at how fast it might run if I redid it for a
current FPGA architecture using synchronous memories.  I compiled it
for a Spartan 3 and got the speed up to 77 MHz using less than 10% of
an XC3S400 (315 slices).  I am not impressed with the speed.  I
expected a much larger increase and had hoped for operation at over 100
MHz.  I checked the timing analyzer output and the signal paths are
pretty much what I expected, no oddball logic generation and I got
carry chains where I wanted them.  The slow paths have a few long route
times, so although it may approach 100 MHz with careful floorplanning,
I don't think this is worth the effort compared to the >> 100 MHz CPU
cores you can get from the FPGA vendors.

I was wondering if this small speed up is typical of improvements from
one or two generations difference in FPGAs?  The ACEX parts are
designed for economy, not for speed, just like the Spartans.  When I
did the initial design 3 or 4 years ago, the ACEX parts were old news
then!  Given that there was nothing in the design that is tailored for
one FPGA family over another, I guess I expected more like a 2X speedup
in the current technology chip.  Isn't that reasonable given the vast
difference in the timing specs in the data sheets?
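
To put numbers on the question, here is a small sketch converting the
clock rates quoted above into periods and ratios. The frequencies are
the ones given in this post; the helper names are just for
illustration.

```python
def period_ns(f_mhz):
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / f_mhz

def speedup(old_mhz, new_mhz):
    """Ratio of new to old maximum clock rate."""
    return new_mhz / old_mhz

acex_mhz, spartan3_mhz, target_mhz = 55.0, 77.0, 100.0

print(speedup(acex_mhz, spartan3_mhz))                   # measured gain
print(period_ns(spartan3_mhz) - period_ns(target_mhz))   # ns to remove
```

So the measured gain is about 1.4x against the hoped-for 2x (which
would be 110 MHz), and reaching 100 MHz means removing about 3 ns from
a roughly 13 ns critical path.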

Article: 108998
Subject: Re: New Lattice 32-bit Embedded Microprocessor Available Through Unique Open Source License
From: "rickman" <gnuarm@gmail.com>
Date: 19 Sep 2006 19:16:36 -0700
Links: << >> << T >> << A >>

Antti wrote:
> Jim Granville schrieb:
>
> > betterone11@gmail.com wrote:
> > > fpgaman wrote:
> > >
> > >>"http://www.latticesemi.com/products/intellectualproperty/latticemico32"
> > >
> finally - a 100% Eclipse-+GNU based SoC system with open-source RTL
> that just works.

I was looking at the open source agreement and one paragraph strikes me
as a bit odd.

Appendix C
3. The Provider grants to You a personal, non-exclusive right to use
object code created from the Software or a Derivative Work to
physically implement the design in devices such as a programmable logic
devices or application specific integrated circuits. You may distribute
these devices without accompanying them with a copy of this license or
source code.

It looks like the only right granted for the object code is to use it
in an ASIC or FPGA.  Am I just missing the point or does this keep you
from using this for any other purpose?  Or would there be no point to
any other purpose?  I am not real clear on which software the license
is actually talking about.

Article: 108999
Subject: Re: Buffering the critical path.
From: "Peter Alfke" <alfke@sbcglobal.net>
Date: 19 Sep 2006 20:08:45 -0700
Links: << >> << T >> << A >>

Sumesh, I have a special place in the dungeon for people who ask
questions where they leave the most important details out, and tell us
afterwards. "O, by the way..."
You started mentioning adders and long carry chains, which -as we know
by now- are completely irrelevant to your problem.
You have a big routing mess, and you are not allowed to pipeline. Tough
luck!
I think Ray has the best possible advice, but I do not see an easy
solution. Look at how you arrange your Dual-Port RAMs, and how you can
exchange data between them. Are there any unexplored addressing tricks?
Have you looked at Virtex-5LX devices? They can perform not only
arithmetic, but also logic in the DSP slice (also called the
multiplier-accumulator). And they are available, as I posted yesterday
(funny, neither praise nor outrage in the ng. Everyone asleep?)
Good luck, you may need it!
Peter
======================
vssumesh wrote:
> > Peter Alfke wrote:
> > > Vessumesh, if you refuse to answer specific helpful questions, then I
> > > suggest you figure this out yourself, and do not bother this newsgroup.
> > > Peter
> Sorry Peter, I did not mean that. Sorry for the confusion.
> I am using a V4LX60 for my design. The requirement is to add two
> 37-bit numbers and do some combinational logic on the result. The
> total time available is 20 ns. The adder takes very little time; the
> full logic itself takes around 4 ns. The main problem is the routing
> delay. I forgot to tell you that it is a block-RAM-based design. It
> uses 128 BRAMs from the V4LX60 (implementing a 16-port RAM), and it
> uses the block RAMs in a scattered manner. So I have now placed this
> block in the central region, and the final routing to the block RAMs
> is taking a lot of delay.
> In the previous version there was no combinational logic after the
> adder and I met timing. But not now.
> What I was asking about is adding registers to latch the output of the
> adder. I thought it would be easier for PAR to see two paths instead of
> one path from a source FF to a destination FF. Also, Ray, there are
> 32*16 such signals. Is it possible to manually route all of them? I
> think pipelining is not possible, since this is part of a pipeline
> stage of a processor, which expects the result in the same cycle. So
> pipelining is not an option.
> > It isn't the carry chain causing the problem.  The problem comes about
> > from using many levels of logic (ie the signal goes through lots of
> > LUTs) between the flip-flops plus the propagation delay associated with
> > the carry chain.
> Ray, I was asking whether breaking the above long path into separate
> parts, using the +ve and -ve edges of the clock, could help the tool
> achieve a better PAR result.
> Thanks and regards
> Sumesh V S
