On Jul 22, 11:53 am, "sdaau" <sd@n_o_s_p_a_m.n_o_s_p_a_m.imi.aau.dk> wrote:
> I am trying to implement a custom counter (with clock and enable inputs);
> synthesis and behavioral & post-translate simulation pass just fine (using
> ISE WebPack 13.2). On post-map simulation, I get this:
>
> at 271179 ps(5), Instance /my_counter_test/UUT/c_0/ : Warning: /X_FF SETUP
> High VIOLATION ON CE WITH RESPECT TO CLK;
>   Expected := 0.428 ns; Observed := 0.144 ns; At : 271.179 ns
> <snip>
>
> Now, the most obvious thing would be to insert a delay of at least
> 0.428 - 0.144 = 0.284 ns between c_0.clk and c_0.ce (or between c_0.clk and
> wclk),

No, the most obvious thing would be to check your testbench and validate that your inputs meet the timing requirements, because that's where the problem likely lies.

> and I guess then the timing violation would be gone, is that
> correct?

No, it is not correct... unless you're only interested in covering up the problem and pushing it down the road to be fixed later.

> However, the problem is that I would not want to move the first clk after
> enable in the next period using the state machine - and I have no idea how
> to otherwise implement such a delay of ~0.3 ns.

In FPGAs, you can't implement controlled time delays. Delay lines are not a primitive element in the device.

> I was thinking that timing constraints in the .ucf file would help

Timing constraints should have already been specified, but if you haven't done so yet, then yes, you should specify them.

> So I was wondering - what would be the appropriate method to handle these
> timing violations? And have I understood the above situation correctly?

I'm guessing, based on what you described from the error message to signals in your design, that you may understand the failing path, but what you're not understanding is what really needs to be fixed. The problem could very likely be in your testbench rather than the design, but below I've listed the basic steps you need to follow:

1. Did you enter setup time constraints for all inputs? Did you set up clock-to-output delay time constraints for all outputs? (Note: for your particular problem, the cause is likely on the 'input' side.)

2. What is the basis for the time constraints in #1? The correct answer to this question is the datasheet(s) of any device(s) that are connected to the FPGA.

3. Are you sure you used the datasheet(s) timing constraints properly? The setup time (Tsu) available at the FPGA will be the clock period (T) less the clock-to-output time (Tco) of the external device and any clock skew (Tskew). In other words, the UCF file needs to specify a setup time constraint of Tsu = T - Tskew - Tco. Repeat for each input. Do a similar procedure for FPGA outputs.

4. Did the FPGA's timing report state that it meets all timing constraints? The correct answer here is 'yes'. If not, iterate #1-4 until you have the correct answers to each question.

On the assumption that you've properly made it through #1-4 (and assuming that there are no clock domain crossings), then your design is OK. Since the design is OK, this implies the cause of a timing failure must be the testbench. The basic triage here is:

1. Verify that the inputs to the FPGA meet the requirements listed in the FPGA's timing report output. As an example, if you have some input that is generated synchronously, like this...

  Some_Inp <= Blah_Blah_Blah when rising_edge(clock);

Then 'Some_Inp' will be transitioning 1 delta cycle (i.e. 0 ns) after the rising edge of 'clock'. That will never meet any non-zero hold time requirement that the FPGA timing report specified. Maybe the testbench delays the clock like this...

  Some_Inp <= Blah_Blah_Blah when rising_edge(clock);
  Clock_To_Fpga <= clock;

Now the FPGA will see 'Some_Inp' and 'clock' transition at the exact same time. Think that will meet either a setup or a hold time requirement?

2. Although not relevant to your current problem, one would also want to verify that you're sampling outputs at the appropriate time as well. Usually, though, this is not a problem... if you did have a mistake here, it would show up as a functional failure reported by the testbench, not a timing error reported by the post-route FPGA design.

Since you didn't mention anything about multiple clocks in your design, I've assumed that the design is a single-clock design. However, if there are multiple clocks, then the error you reported could be because the clock enable input is generated in one clock domain and used to enable your counter, which counts in another clock domain. If that's the case, then your design will fail; the solution is to resynchronize, with a single flip-flop, the output from the source domain into the counter's clock domain. That resynchronized signal will be used to enable the counter.

Kevin Jennings

Article: 152226
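The single-flip-flop resynchronization described in the last paragraph above might look something like the following minimal VHDL sketch; the entity name, port names, and widths are invented for illustration:

  library ieee;
  use ieee.std_logic_1164.all;
  use ieee.numeric_std.all;

  entity enable_sync_counter is
    port (
      cnt_clk    : in  std_logic;   -- the counter's clock domain
      enable_src : in  std_logic;   -- enable generated in another clock domain
      count      : out std_logic_vector(7 downto 0));
  end entity enable_sync_counter;

  architecture rtl of enable_sync_counter is
    signal enable_sync : std_logic := '0';
    signal cnt         : unsigned(7 downto 0) := (others => '0');
  begin
    process (cnt_clk)
    begin
      if rising_edge(cnt_clk) then
        enable_sync <= enable_src;  -- resynchronizing flip-flop in the destination domain
        if enable_sync = '1' then
          cnt <= cnt + 1;           -- the counter only ever sees the resynchronized enable
        end if;
      end if;
    end process;
    count <= std_logic_vector(cnt);
  end architecture rtl;

Note that the counter then reacts to the enable one clock after the source domain raised it, which is exactly the "next cycle" behaviour discussed elsewhere in this thread.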
I'm so confused about the setup time of BUFGMUX. In the Spartan-6 datasheet, this spec is defined relative to the rising edge. But if the structure of BUFGMUX is like the one described at

http://www.design-reuse.com/articles/5827/techniques-to-make-clock-switching-glitch-free.html

then the setup time should be defined relative to the falling edge, because the coupled register pair are both triggered on the clock's negative edge. Why? What is the actual structure of the BUFGMUX in Spartan-6? Thanks a lot.

Article: 152227
> Hi all,
>
> * wclk and wenbl are the 'master' signals, and they are synchronous (they
> both rise at exactly the same time)

Nothing in post-route rises at exactly the same time. Are these input signals driven from your testbench? If so, you need to spec a hold time from wclk -> wenable and change your testbench to add this. Clock enables are derived from the clock, so they will have a clk->Q delay that gives them hold time. The easiest way to model this is to resync the wenable to the falling edge of wclk.

The scary thing is that I think your simulation is catching the enable on the same wclk edge that creates the wenable. If that's so, then everything is happening one cycle before it should. In real life, if a clk creates an enable, then the enabled act occurs on the next clock.

John

---------------------------------------
Posted through http://www.FPGARelated.com

Article: 152228
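A minimal sketch of the falling-edge resync John suggests, assuming the 50 MHz wclk from the thread; 'stimulus_enable' and the timing values are invented for illustration:

  library ieee;
  use ieee.std_logic_1164.all;

  entity tb_enable_timing is
  end entity tb_enable_timing;

  architecture sim of tb_enable_timing is
    signal wclk            : std_logic := '0';
    signal wenbl           : std_logic := '0';
    signal stimulus_enable : std_logic := '0';
  begin
    wclk <= not wclk after 10 ns;   -- 50 MHz clock, 20 ns period

    -- raw stimulus: raise the enable some time into the run
    stimulus : process
    begin
      wait for 95 ns;
      stimulus_enable <= '1';
      wait for 200 ns;
      stimulus_enable <= '0';
      wait;
    end process stimulus;

    -- re-time the enable to the falling edge of wclk, so it reaches the
    -- DUT half a period away from the rising edge that samples it,
    -- giving it both setup and hold margin
    retime : process (wclk)
    begin
      if falling_edge(wclk) then
        wenbl <= stimulus_enable;
      end if;
    end process retime;

    -- instantiate the DUT here and drive it with wclk/wenbl
  end architecture sim;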
Brian Drummond <brian@shapes.demon.co.uk> writes:

> On Fri, 22 Jul 2011 01:22:44 -0500, aibk01 wrote:
>
>> The build is successful. I can download the bit file and run it. I can
>> see the data being sent via the FSL bus on the hyperterminal by printing
>> the sent values.
>>
>> Now after the values are sent there is no return of data. What should I
>> do now?
>
> Simulate.

That can be easier said than done when there's EDK involved - it usually means bringing up a simulation of the whole system, booting the simulated processor(s), running the test software and grovelling through an awful lot of waveforms you don't fully understand.

I've had better luck using ChipScope to debug these kinds of things on hardware (and this from a guy whose first answer is usually "Simulate" :)

Cheers,
Martin

-- 
martin.j.thompson@trw.com
TRW Conekt - Consultancy in Engineering, Knowledge and Technology
http://www.conekt.co.uk/capabilities/39-electronic-hardware

Article: 152229
On Mon, 25 Jul 2011 09:46:01 +0100, Martin Thompson wrote:

> Brian Drummond <brian@shapes.demon.co.uk> writes:
>
>> On Fri, 22 Jul 2011 01:22:44 -0500, aibk01 wrote:
>>
>>> The build is successful. I can download the bit file and run it. I can
>>> see the data being sent via the FSL bus on the hyperterminal by
>>> printing the sent values.
>>>
>>> Now after the values are sent there is no return of data. What should
>>> I do now?
>>
>> Simulate.
>
> That can be easier said than done when there's EDK involved - it usually
> means bringing up a simulation of the whole system, booting the
> simulated processor(s), running the test software and grovelling through
> an awful lot of waveforms you don't fully understand.
>
> I've had better luck using ChipScope to debug these kinds of things on
> hardware (and this from a guy whose first answer is usually "Simulate"
> :)

I have to agree that simulation with an EDK design can be a bit painful, and requires a full ISim rather than the Lite version (or ModelSim). But with a bit of creativity to generate the smallest test case, it can be useful, especially when chasing bugs in the EDK-generated code. It's worth having the tool in the arsenal, even if it's rarely used. (I tend to relegate ChipScope to that role, but agree you sometimes do really need it.)

- Brian

Article: 152230
Hello,

Do you have the XC3S200 or XC3S1000 on that board? If you got it from Digilent you would have had the option to get it with the 1000, and it might make a difference. I also have the Spartan 3 Starter Kit personally, with the 1000. I've lent it to somebody in the office, but I don't think anyone is using it now, so I might see if I can get OpenRISC to work on it.

As Julius said, there is the orpsoc project on OpenCores which has all of the Linux makefiles you need almost ready to go. If you don't need the external memory, and can run on block RAM, then all you need to do is update the makefiles and pin assignments, make a new set of design defines for your board and oscillator frequency, probably update the clock generator to get the right frequencies out of the PLL, and include the "ram_wb" core in the defines.

The external SRAM would not be too much harder to get included, but you would need a Wishbone controller written for it, which doesn't seem to exist in the IP. I asked somebody about exactly this on the OpenCores forums last week, and received some code very close to what I needed for another board; I just got it updated and am getting ready to test it out. I think it may eventually get contributed back for anyone who needs it. Otherwise I'm willing to share anyway, so just let me know.

Alternatively, Aeroflex/Gaisler has the LEON3 soft-core CPU (based on SPARC), which they offer for free if you don't need the fault-tolerant version. They have a board support package for the Spartan 3 Starter Board ready to go out of the box, they give pretty good instructions on how to get it running, and it can be debugged directly through the stock Xilinx parallel or USB cable if that's what you happen to have. Once again, if you have the 200 version of that board, you might be out of luck, as their BSP supports the 1000, and I don't know if it would fit in the 200 or not.

Article: 152231
Gabor,

Ok awesome, thanks for the clarity. I have never designed a system in this configuration, which is why I was asking :)

-D

Article: 152232
I would be hesitant to refer to a non-Xilinx diagram for the internal structure of the BUFGMUX block. Plus, the author is a technical staff member at Altera, so he's probably writing either:

a. generically, or
b. about an Altera FPGA.

I'd pay attention to whatever is in the Spartan-6 User Guide and datasheet, and forget about whatever you read in this article (pull some concepts from it, yes, but don't make it your new religion about the S6).

Article: 152233
Hi all,

I want to calculate a simple formula, including multiply and divide operations. I use the Verilog language to program the FPGA.

Can I use the multiplication (*) and division (/) operators? Or do I have to write the code for a multiplication algorithm like Booth's?

Regards

Article: 152234
On 7/25/2011 9:37 AM, ECS.MSc.SOC wrote:
> Hi all,
>
> I want to calculate a simple formula, including multiply and divide
> operations. I use the Verilog language to program the FPGA.
>
> Can I use the multiplication (*) and division (/) operators? Or do I
> have to write the code for a multiplication algorithm like Booth's?
>
> Regards

That depends on whether you expect the math to be done in the hardware or at compile time. If it's the latter, you can do whatever you'd like. If the former, then it'll depend. Most FPGA families have multipliers, and the tools are smart enough to use those multipliers to perform a multiply when you specify one. Divides (by other than a power of 2) are a pain, and always require serial algorithms to do them. You'd be well served trying to replace that divide with a reciprocal multiply if possible.

-- 
Rob Gaddi, Highland Technology
Email address is currently out of order

Article: 152235
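To make the reciprocal-multiply suggestion concrete, here is a rough worked example (the constant and bit widths are chosen arbitrarily for illustration). To divide by 10 with 16 fractional bits, precompute round(2^16 / 10) = 6554. Then:

  x / 10  ~=  (x * 6554) >> 16

For x = 1000 this gives 6554000 >> 16 = 100. Since 6554/65536 = 0.1000061..., the computed quotient creeps slightly high, and for this particular constant it stops matching exact integer division somewhere above x = 16000 or so, so the rounding and the valid input range need checking before relying on it. The payoff is that one multiplier and a shift replace an iterative divider.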
"ECS.MSc.SOC" <mahdiyar.sarayloo@gmail.com> wrote: >Hi all > >I want to calculate a simple formula, including multiply and division >operands. I use Verilog language to program FPGA. > >Can I use the sign of Multiplication (*) and Division (/)? Or I have >to write the code of a Multiplication algorithm like Booth? That depend on the synthesis tools. I guess with most modern tools you can use the * and / sign. How it gets mapped to the hardware depends on the target. You really should consult the manual of the synthesis tool on how this is handled. -- Failure does not prove something is impossible, failure simply indicates you are not using the right tools... nico@nctdevpuntnl (punt=.) --------------------------------------------------------------Article: 152236
On Mon, 25 Jul 2011 09:37:22 -0700, ECS.MSc.SOC wrote:

> Hi all,
>
> I want to calculate a simple formula, including multiply and divide
> operations. I use the Verilog language to program the FPGA.
>
> Can I use the multiplication (*) and division (/) operators? Or do I
> have to write the code for a multiplication algorithm like Booth's?

If you can stand the repetition: it depends on your tools, and what you're trying to do.

If you're multiplying integers then most tools will see a '*' and map it to a hardware multiplier (or synthesize one). I wouldn't trust a synthesizer to know how to do a fixed-point multiply that wasn't integer, although I would give it a whirl and see what happened.

Divide is so resource hungry that there are a tremendous number of system-level decisions to be made in implementing it: I would be astonished at a synthesizer that would see a '/' and automagically map it to some sort of a divide. You need to read up on the algorithms in question to see why divide is so different from multiply, and to get an idea of what you might have to do to make it work. (Although I expect that most FPGA manufacturers and/or toolchain vendors will have some sort of divide primitive wizard that you can at least use for the bit-slice portion, even if you have to wrap it with your own sequencing logic.)

-- 
www.wescottdesign.com

Article: 152237
Hi everyone,

I'm working on a conversion project where we needed to convert a PCI acquisition card to a PCI Express (x1) acquisition card. The project is essentially the same, except that the new acquisition card is a PCI Express endpoint instead of a standard PCI endpoint. The project is implemented on a Xilinx FPGA, but I don't think my issue is Xilinx-specific.

The conversion has worked fine on all levels except one: the read latency of PCI Express is about 4 times higher than standard PCI. For example, on the old product, it takes about 0.9 us to perform a 1-DWORD read. With the PCI Express product it takes about 3-4 us to perform a 1-DWORD read. I've seen this read latency both in real life (with a real board) and in VHDL simulation, so I don't think that this is a driver issue. Have any of you experienced similar performance issues?

Don't get me wrong, for me PCI Express is a major step ahead; the write-burst and read-burst performance is way better than standard PCI. Perhaps this is the reason: since most PCI Express cards are mostly used in burst transactions, the read latency does not really matter, so they sacrificed some read latency in order to obtain better performance.

Best regards

Article: 152238
On Mon, 25 Jul 2011 13:23:12 -0700 (PDT), Benjamin Couillard <benjamin.couillard@gmail.com> wrote:

> Hi everyone,
>
> I'm working on a conversion project where we needed to convert a PCI
> acquisition card to a PCI Express (x1) acquisition card. The project
> is essentially the same, except that the new acquisition card is a
> PCI Express endpoint instead of a standard PCI endpoint. The project
> is implemented on a Xilinx FPGA, but I don't think my issue is
> Xilinx-specific.
>
> The conversion has worked fine on all levels except one: the read
> latency of PCI Express is about 4 times higher than standard PCI. For
> example, on the old product, it takes about 0.9 us to perform a
> 1-DWORD read. With the PCI Express product it takes about 3-4 us to
> perform a 1-DWORD read. I've seen this read latency both in real life
> (with a real board) and in VHDL simulation, so I don't think that this
> is a driver issue. Have any of you experienced similar performance
> issues?

One-lane PCIe 1.x should be able to turn a word read around in about 250 ns, assuming not too much else is going on. Of course an excessive number of switches (or slow switches) or slow hardware on either end are obviously possible issues. But PCIe is certainly much faster than 3-4 us to read a word.

Article: 152239
On 25.07.11 13:11, GrizzlySteve wrote:
> and I don't know if it would fit in the 200 or not.

Definitely not. With limited peripherals (e.g. without an MMU) it is usable on a Spartan-3 1000. For a really small soft core (with gcc support), take a look at the ZPU.

Regards,
Bart

Article: 152240
Hi,

It's maybe not so commonly known that products using Actel secure FPGAs have been cloned already, many years ago (readback done by dark engineers at Actel). A few months ago a paper was published indicating that ProASIC3 (and other newer Actel FPGAs) have a master key that is known not only inside Actel but also to the dark side outside the company. There is at least one known successful cloning of an Actel ProASIC3-based product (readback assumed done at the Actel fab, not outside).

The following post has links to documents that show that Xilinx V2/V4/V5 are vulnerable as well:

http://it.slashdot.org/story/11/07/21/1753217/FPGA-Bitstream-Security-Broken

P.S. We do not have more info nor the master keys, please do not ask :)

Antti Lukats
http://trioflex.blogspot.com/

Article: 152241
On Jul 25, 9:23 pm, Benjamin Couillard <benjamin.couill...@gmail.com> wrote:
> The conversion has worked fine on all levels except one: the read
> latency of PCI Express is about 4 times higher than standard PCI. For
> example, on the old product, it takes about 0.9 us to perform a
> 1-DWORD read. With the PCI Express product it takes about 3-4 us to
> perform a 1-DWORD read. I've seen this read latency both in real life
> (with a real board) and in VHDL simulation, so I don't think that this
> is a driver issue. Have any of you experienced similar performance
> issues?

I have no actual experience of experimenting with this; however, I have been interested in a latency-sensitive device that may potentially use PCI-E, so I have been looking around for answers.

Have a look at this write-up of a comparison of HyperTransport and PCI-E. The authors claim around 250 nanoseconds (page 9) to read the first byte:

http://www.hypertransport.org/docs/wp/Low_Latency_Final.pdf

It would be interesting to hear what is causing you to see 3-4 us. That would kill off my potential project, so I am hoping to be able to match the results in the above paper. Could there be some inaccuracy in your measurements; how do you measure the latency?

Rupert

Article: 152242
When designing with PCI or PCIe you should really try to avoid reads as much as possible. What do you need them for anyway? In a multitasking operating system you are going to have microseconds of jitter on the software side in kernel mode, and tens of milliseconds in user mode, anyway. So I am wondering what the scenario is that benefits from sub-us latency for software reads?

Kolja
cronologic.de

Article: 152243
Generally speaking, PCI Express is much more prone to latency than conventional PCI, because packets have to be constructed, passed through a structure of nodes, and checked at most levels. Data checking isn't complete, and onward transmission can't start, until the last data arrives and the CRCs are checked. If you do a "read", this will have a packet outgoing and one coming back, so it is doubly worse. Better, if you can, is to do a DMA-like operation where data is sent from the data source, and your system is then interrupted to use the data in memory.

The latency will also vary from system to system because routing structures differ between motherboards. The amount of other things going on will also affect latency as different things contend for the data pipes.

Generally speaking, if you are trying to do anything real-time, it is something of a nightmare if you are planning on using the host motherboard processor for control functions. You can try to make the latency smaller by using smaller packet sizes, and this sometimes helps. Ultimately, if there is a real-time element to this, then putting the processing and/or control on your card is probably best for performance and accuracy.

John Adair
Home of Raggedstone2. The Spartan-6 PCIe Development Board.

On Jul 25, 9:23 pm, Benjamin Couillard <benjamin.couill...@gmail.com> wrote:
> Hi everyone,
>
> I'm working on a conversion project where we needed to convert a PCI
> acquisition card to a PCI Express (x1) acquisition card. The project
> is essentially the same, except that the new acquisition card is a
> PCI Express endpoint instead of a standard PCI endpoint. The project
> is implemented on a Xilinx FPGA, but I don't think my issue is
> Xilinx-specific.
>
> The conversion has worked fine on all levels except one: the read
> latency of PCI Express is about 4 times higher than standard PCI. For
> example, on the old product, it takes about 0.9 us to perform a
> 1-DWORD read. With the PCI Express product it takes about 3-4 us to
> perform a 1-DWORD read. I've seen this read latency both in real life
> (with a real board) and in VHDL simulation, so I don't think that this
> is a driver issue. Have any of you experienced similar performance
> issues?
>
> Don't get me wrong, for me PCI Express is a major step ahead; the
> write-burst and read-burst performance is way better than standard
> PCI. Perhaps this is the reason: since most PCI Express cards are
> mostly used in burst transactions, the read latency does not really
> matter, so they sacrificed some read latency in order to obtain
> better performance.
>
> Best regards

Article: 152244
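Some rough numbers behind this (back-of-the-envelope, assuming PCIe 1.x at 2.5 GT/s on a single lane): with 8b/10b encoding, each byte occupies 4 ns on the wire. A 1-DWORD memory read is a 3-DW request TLP plus a completion TLP; with framing, sequence number and LCRC, each packet is on the order of 20-24 bytes, so the round trip costs only about 200 ns of serialization. Read latencies of 3-4 us are therefore dominated by turnaround in the root complex, switches, and endpoint logic, not by the link itself.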
On Jul 26, 5:19 pm, John Adair <g...@enterpoint.co.uk> wrote:
> If you do a "read", this will have a packet outgoing and one coming
> back, so it is doubly worse. Better, if you can, is to do a DMA-like
> operation where data is sent from the data source, and your system is
> then interrupted to use the data in memory.

In the paper I posted a link to, I think the times are for an interrupt, or for DMA, not a software-initiated "read". Thanks for explaining the difference.

Rupert

Article: 152245
"Benjamin Couillard" <benjamin.couillard@gmail.com> wrote in message news:62427806-eeec-499b-a0f0-15ffafa0e3ab@w27g2000yqk.googlegroups.com... > Hi everyone, > > I'm working on a conversion project where we needed to convert a PCI > acquisition card to a PCI-express (x1) acquisition card. The project > is essentially the same except instead that the new acquisition card > is a PCI-express endpoint instead of being a standard-PCI endpoint. > The project is implemented on a Xilinx FPGA, but I don't think my > issue is Xilinx specific. > > The conversion has worked fine on all levels except one. The read > latency of PCI express is about 4 times higher than standard PCI. For > example, on the old product, it takes about 0.9 us to perform a 1- > DWORD read. With the PCI-express product it takes about 3-4 us to > perform a 1-DWORD read. I've seen this read latency both in real-life > (with a real board) and in VHDL Simulation so I don't think that this > is a driver issue. Do any of you have experienced similar performance > issues? Is it possible that time-stamping the data would disconnect you somewhat from the latency problem? Usually data can't be processed and presented real-time at those speeds anyway..Article: 152246
There is an utterly horrible VHDL howler on page 45 of the latest Xcell Journal. Two example codes for a register with reset are given:

  signal Q: std_logic := '1';
  ...
  async: process (CLK, RST)
  begin
    if (RST = '1') then
      Q <= '0';
    elsif (rising_edge CLK) then
      Q <= D;
    end if;
  end

This would be OK if the clock-edge function call had been 'rising_edge(CLK)' instead, and there was a semicolon after the last 'end'.

  signal Q: std_logic := '1';
  ...
  async: process (CLK)
  begin
    if (RST = '1') then
      Q <= '0';
    elsif (rising_edge CLK) then
      Q <= D;
    end if;
  end

This has the same errors as the first, but (despite the unchanged process name) is meant to infer a synchronously reset register. BUT ALAS AND ALACK! As written - at least in simulation - the reset will be applied on either edge of CLK. What XST would make of it one can only guess. It should be:

  signal Q: std_logic := '1';
  ...
  sync: process (CLK)
  begin
    if rising_edge(CLK) then
      if (RST = '1') then
        Q <= '0';
      else
        Q <= D;
      end if;
    end if;
  end process sync;

Such slip-shod work rather reduces one's confidence in the rest of the contents.

---------------------------------------
Posted through http://www.FPGARelated.com

Article: 152247
I should have added that this is also at:

http://forums.xilinx.com/t5/General-Technical-Discussion/VHDL-horror-in-Xcell-76/td-p/167622

---------------------------------------
Posted through http://www.FPGARelated.com

Article: 152248
On Jul 26, 2:04 am, Antti <antti.luk...@googlemail.com> wrote:
> Hi
>
> It's maybe not so commonly known that products using Actel secure
> FPGAs have been cloned already, many years ago (readback done by dark
> engineers at Actel). A few months ago a paper was published indicating
> that ProASIC3 (and other newer Actel FPGAs) have a master key that is
> known not only inside Actel but also to the dark side outside the
> company. There is at least one known successful cloning of an Actel
> ProASIC3-based product (readback assumed done at the Actel fab, not
> outside).
>
> The following post has links to documents that show that Xilinx
> V2/V4/V5 are vulnerable as well:
>
> http://it.slashdot.org/story/11/07/21/1753217/FPGA-Bitstream-Security...
>
> P.S. We do not have more info nor the master keys, please do not ask :)
>
> Antti Lukats
> http://trioflex.blogspot.com/

No one should ever assume the device security offered is 100% uncrackable. I used to know a guy who did legit "dark engineering" for government devices, and it was amazing to hear stories of drilling out holes in "secure devices" and extracting data using microscopic probes. Another engineer I knew had a collection of ICs embedded in epoxy - the company he worked for would shave them layer by layer to extract the design physically. (So no, going to an ASIC won't necessarily be 100% secure either.)

If man can make it, man can break it. The trick is to make it more expensive for the cloners to crack than it would be to just license, buy, or reverse-engineer another way. Besides, a lot of places still send bitstreams to China for programming during assembly, and at that point, adding bitstream security is a bit like setting the deadbolt on an already open, and empty, barn.

A better metric for FPGA bitstream security, or any security product, is the cost per breach and/or time per breach. Assume it can be breached, and pick a method where the [cost/time]/[breach] equation works out in your favor. BTW - this also means that devices with a master key are very bad, because the time per breach is only paid once, and you can rest assured someone besides the manufacturer has it already.

For an example of this done right, there is an IBM crypto chip that I believe is still unbroken - it has wires around the die that control power to the SRAM memory holding the crypto keys. If you drill into the package and cut one of the wires, the device loses its memory and becomes a dud. Obviously, you also have to do this work with the chip in-system, and running, for the same reason. This is the equivalent of the lock on an underground bank vault.

We will know FPGA vendors are equally serious when they offer a part with that level of security. Until then, it's pretty much the equivalent of the standard locks on our front doors: good enough to keep the riff-raff out, but not enough to keep the serious thieves away.

Article: 152249
Hi all,

First of all, thank you all for the very prompt responses, and sorry I couldn't respond earlier. I think the crux of the matter is summed up in @jt_eaton's comment:

> Nothing in post-route rises at exactly the same time.

... but I believe I should try to explain a bit what it is I'm looking after. A bit of a mammoth post follows - apologies in advance.

For one, I have only partial knowledge of HDL, but so far I manage somehow. My biggest problem is, basically, that when I start coding, I usually end up confused by the "things happening in the next clock cycle" thing. From my sequential programming background, when I see "a=2;" in C, I read that as "_after the program counter passes this statement, a holds value 2_". I try to relate that to HDL as in "_after the simulator passes this posedge, a holds value 2_" - so when I code stuff with this expectation, and I see 'action on next cycle' in the simulator, I get thoroughly confused. Then I do my best to defeat that in behavioral simulation - and usually I manage; then I come to post-map sim, and I realize most of that does NOT really work.

So, I decided to study this a bit on a simpler example; for instance, for a chip interface, I'll need a clocked counter with enable and reset. The concept would be simple: when enable is high, increase the count on the clock posedge; when reset is high, do not increase the count and set it to 0. For instance, that is exactly the kind of device which is given here:

http://www.asic-world.com/vhdl/first1.html#Counter_Design_Block

I modified that code a bit (counter_aw.vhd), and used my own testbench (test_twb.vhd), which I put here (along with some screenshots I'll refer to):

http://sdaaubckp.sourceforge.net/post/vhd_counter_aw/

The clock is 50 MHz (period 20 ns). The "Counter_Design_Block" is architecture 'behav' in the 'counter_aw.vhd' file (uncommented). This one works under behavioral simulation as I expect it to (aw_orig_beh_sim.png); that is, the reset of the counter to 0 and its increase happen at the posedges I expect. The results are the same for post-translate simulation (aw_orig_post-trans_sim.png) - however, the post-place-and-route sim (post-par_sim_delayed.png) is 'delayed' - e.g. from the posedge of enable to when cout becomes high is about 30 ns (~3 clock semiperiods); however, that is not the same delay throughout the sim run!

Since I encountered this before, I tried to code "my own" counter (architecture my_starting_point, commented), and I immediately made some mistakes - first, the final assignment to the output port was within an 'IF', so even behavioral simulation showed everything delayed to the next clock cycle (aw_startp_beh_sim_delayed.png); after fixing that, this counter behaves more or less the same as the previous example (aw_startp_beh_sim_ok.png) - but the problem with it is that it is not synthesizable (as far as I can see, the problem is using rising_edge twice on different signals in the same process).

So, after solving that, I basically ended up with the problem described in the original post - unfortunately, I cannot reconstruct the conditions with the X's (that appear approx. 4 ns after the rise of wclk) that I got in the original post (then again, that day my PC did crash a couple of times, so maybe that had something to do with problems with memory for ISE or ISim?).

Then I got to the inverter thing, removed some of the timing violations with it, and found that to avoid the final timing violations, 'reset' internally would have to be effectuated 'first', 'enable' second, and the 'clk' last - so I delayed the clk twice (four inverters) and enable once - and I got to architecture my_ending_point (commented).

With the my_ending_point code, the behavioral simulation (aw_endp_beh_sim_delayed_no-ucf.png) seems fine, except that the very first count after enable happens in the "next" clock cycle - however, the post-par sim (aw_endp_post-par_sim_delayed_ucf.png) shows that, in addition, there are glitches - and there is almost 10 ns of delay (the 'effectuation' of the count happens almost on the clk negedge)!! For the post-map sim (post-map_sim_delayed_ucf.png) this delay seems to be less (though still 5 < x < 10 ns), but the glitches are still there.

While I'm at the glitches, the "Xilinx Synthesis and Simulation Design Guide" notes:

> Glitches in Your Design
> When a glitch (small pulse) occurs ..., the glitch
> may be passed along ..., or it may
> be swallowed ... . ... To produce more
> accurate simulation of how signals are propagated
> within the silicon, Xilinx models this
> behavior in the timing simulation netlist.

When it says "Xilinx *models*", does it mean that the glitches will be present "by design" of the HDL code circuit - or is it something the simulator introduces? Meaning, should I try to eliminate them through design, or should I just be careful about whether they "propagate"?

Then again - I wasn't really aware of this until now - I was reading a bit more on this, and it turns out, from basics, that minimal configurations of combinatorial/unclocked circuits (Mealy/Moore?!) are *by default* glitchy, and one is advised to "buffer" the result with a (clocked) FF - which results in the actual 'effectuation' occurring on the next clock cycle; so maybe the glitches in the sim just try to illustrate this effect?

Anyways - I'm sure in my initial code I used to get somewhat less than 5 ns delay for post-map (which is why I'm slightly surprised at the above results), but I can't reconstruct that anymore. Which, of course, means I haven't done something right :)

I guess my question comes down to: what am I missing, so that I can get results somewhat like aw_orig_beh_sim.png in the post-par sim, but delayed by no more than a quarter period? That, for me, would be a confirmation that the engine should more or less work reliably on the chip as well - but is that a correct assumption? (If not, then I probably shouldn't bother getting such "ideal" post-map/par results, ideal as in "results almost like behavioral sim".)

I've tried putting in some timing constraints (aw_endp_counter.ucf), while trying to get rid of static timing and ISE warnings as well (the synthesizer doesn't like outputs of combinatorial logic [due to the use of inverters] being used as a clock) - but I'm not really sure what I'm doing, since, as far as I can remember, changing the constraint values didn't really result in much difference in the post-map/par simulation.

Well, I guess this is as detailed as I can formulate my problem for now...

> I presume you forced it to keep the inverters, otherwise they
> will usually optimize away. You might try with only one forced,
> in which case it will optimize the other by inverting the signal
> somewhere else. Or with a forced non-inverting gate.

Interesting trick about keeping only one forced - I just used "attribute KEEP" on all of the involved signals; that seems to have worked.

>> Now, the most obvious thing would be to insert a delay ...
>
> No, the most obvious thing would be to check your testbench and
> validate that your inputs meet the timing requirements, because that's
> where the problem likely lies.
> ...
>> I was thinking that timing constraints in the .ucf file would help
>
> Timing constraints should have already been specified, but if you
> haven't done so yet, then yes, you should specify them.

Got it - thanks to this comment, I started looking into timing constraints as ISE understands them (in the .ucf file), but I still cannot get a proper understanding of them.

> In FPGAs, you can't implement controlled time delays. Delay lines are
> not a primitive element in the device.

Got that too - but could one consider two inverters to behave as a somewhat controlled delay? As in, the actual delay obtained by them depends on how they end up being routed - but we can still know they'll insert, say, approx. 0.4 ns?

> I'm guessing, based on what you described from the error message to
> signals in your design, that you may understand the failing path, but
> what you're not understanding is what really needs to be fixed.

Exactly - this is 100% correct :)

> The problem could very likely be in your testbench rather than the
> design

That could indeed be the problem - @jt_eaton seems to agree...

> below I've listed the basic steps you need to follow:

Thanks for taking the time to write those up, @KJ, much appreciated!

> 1. Did you enter setup time constraints for all inputs? Did you
> set up clock-to-output delay time constraints for all outputs? (Note:
> for your particular problem, the cause is likely on the 'input' side.)

I didn't at first; then I tried, but as I said, I'm not sure I understand it. For instance, I have:

  OFFSET = IN 6 ns VALID 8 ns BEFORE "clk" RISING;

ISE draws a sort of a diagram, and the way I interpret the diagram, the above sentence should mean "do not allow a data signal synchronous with the rising edge of CLK to propagate outside of the 2 < x < 4 ns range"; which is likely not correct, since I couldn't perceive anything to that effect in the simulation results.

> 2. What is the basis for the time constraints in #1? The correct
> answer to this question is the datasheet(s) of any device(s) that are
> connected to the FPGA.

Well, I have the wrong answer, unfortunately :/ Essentially, I saw the above timing violations, and simply tried to 'translate' them into timing constraints (as I understood them above) - that probably was not the right way to do it. Other than that, I'm running the clock at 50 MHz, so I tried to make the testbench for that - and to make the timing constraints relate to a 100 MHz clock (as in "if it works at 100, it will work for 50 MHz too"); the device I'm intending to use this counter with, however, may require a much slower counter (kHz).

> 3. Are you sure you used the datasheet(s) timing constraints
> properly? The setup time (Tsu) available at the FPGA will be the
> clock period (T) less the clock-to-output time (Tco) of the external
> device and any clock skew (Tskew). In other words, the UCF file needs
> to specify a setup time constraint of Tsu = T - Tskew - Tco. Repeat
> for each input. Do a similar procedure for FPGA outputs.

Thanks for this - I'll need to chew on this a bit more; I wasn't aware of the "setup time constraint".

> 4. Did the FPGA's timing report state that it meets all timing
> constraints? The correct answer here is 'yes'. If not, iterate #1-4
> until you have the correct answers to each question.

Thanks for this too - I found Implement Design / Map / "Analyze Post-Map Static Timing"; at first it was complaining (showed red X's), then I got it to stop (but for the most part, I was just trying different numbers based on the messages I got, not sure what I actually did there :) ). Actually, now that I come back to it, I can see a fail:

> Timing constraint: TIMEGRP "couts" OFFSET = OUT 5 ns AFTER COMP "clk";
> ...
> Minimum allowable offset is 6.106ns.
> --------------------------------------------------------------------------------
> Paths for end point cout<11> (IOB.PAD), 1 path
> --------------------------------------------------------------------------------
> Slack (slowest paths): -1.106ns

I guess from this, if I put OFFSET = OUT 6.2 ns, it will pass? Or is there another way to force the synthesizer to conform to 5 ns?

> On the assumption that you've properly made it through #1-4 (and
> assuming that there are no clock domain crossings), then your design
> is OK.

Talking about clock domain crossings - would inverting a clock four times, and "declaring" that signal as a clock as well, constitute a clock domain crossing?

> Since the design is OK, this implies the cause of a timing
> failure must be the testbench. The basic triage here is:

Many thanks for writing this up as well :)

> 1. Verify that the inputs to the FPGA meet the requirements listed in
> the FPGA's timing report output. As an example, if you have some
> input that is generated synchronously, like this...
> Some_Inp <= Blah_Blah_Blah when rising_edge(clock);
> Then 'Some_Inp' will be transitioning 1 delta cycle (i.e. 0 ns) after
> the rising edge of 'clock'. That will *NEVER* meet any non-zero hold
> time requirement that the FPGA timing report specified.

Thanks for this (emphasis mine) - as can be seen in test_twb.vhd (from the link above), what I do is simply:

  ...
  wenbl <= '0';
  wreset <= '0';
  ...

... which, I guess, means "effectuate these signals in parallel / at the same time" - and thus the 0 ns transition you're speaking of?

> Maybe the testbench delays the clock like this...
> Some_Inp <= Blah_Blah_Blah when rising_edge(clock);
> Clock_To_Fpga <= clock;
> Now the FPGA will see 'Some_Inp' and 'clock' transition at the exact
> same time. Think that will meet either a setup or a hold time
> requirement?

I have not used the "when" syntax that much - but I'd answer (from my somewhat sequential programming perspective, and after the tips so far) like this:

* the Some_Inp part will "block" until the rising edge of clock; when the posedge of clock occurs, it will effectuate the next statement - however after a delta of 0 ns;
** that is, Clock_To_Fpga will be effectuated "now" / "in parallel" with the previous statement - that is, on the posedge of 'clk';
* the FPGA will see both Some_Inp and Clock_To_Fpga change at the "same time";
* since the FPGA expects that a setup time and hold time of minimum X ns has transpired from the moment Some_Inp changes to the moment 'Clock_To_Fpga' changes (and, I assume, activates sampling of Some_Inp)...

... hence, there will be a setup or hold timing violation - i.e. the time requirement will not be met. (?)

> 2. Although not relevant to your current problem, one would also want
> to verify that you're sampling outputs at the appropriate time as
> well. Usually, though, this is not a problem... if you did have a
> mistake here, it would show up as a functional failure reported
> by the testbench, not a timing error reported by the post-route FPGA
> design.

Would this be related to the glitches too? I.e. if glitches occur close to the posedge sampling clock transition, I may want to 'buffer' the output until the next negedge, for instance?

> Since you didn't mention anything about multiple clocks in your
> design, I've assumed that the design is a single-clock design.
> However, if there are multiple clocks, then the error you reported
> could be because the clock enable input is generated in one clock
> domain and used to enable your counter, which counts in another clock
> domain.

Could it be that the synthesizer recognizes the "twice inverted" clock signal as a clock from a second domain?

> If that's the case, then your design will fail; the solution
> is to resynchronize, with a single flip-flop, the output from the source
> domain into the counter's clock domain. That resynchronized signal
> will be used to enable the counter.

Would that resynchronization be like the 'buffering' for the minimal Moore/Mealy glitching mentioned above? If so, then it would 'delay' the 'effectuation' of values until the next clock cycle, right?

>> * wclk and wenbl are the 'master' signals, and they are synchronous (they
>> both rise at exactly the same time)
>
> Nothing in post-route rises at exactly the same time.

Thanks for that - I guess now I'm better aware of that; but when the thread started, I wasn't. Can this also be interpreted as "nothing in post-route *should* rise at exactly the same time" (as far as signals from the testbench are concerned)?

> Are these input
> signals driven from your testbench?

Yup.

> If so, you need to spec a hold time from
> wclk -> wenable and change your testbench to add this.

Many thanks for that - see, *that* I wasn't aware of... Will have to look that up.

> Clock enables are derived from the clock, so they will have a clk->Q delay
> that gives them hold time.

OK, that makes sense - much appreciated :)

> The easiest way to model this is to resync the
> wenable to the falling edge of wclk.

Makes a lot of sense now - will give it a shot. I know the answer is probably yes - but in that case, do I again have to worry about timing constraints?

> The scary thing is that I think your simulation is catching the enable on
> the same wclk edge that creates the wenable.

I think that is correct - actually, it seems it does perceive some delay between the wenable and the wclk, but (I guess) not enough.

> If that's so, then everything is
> happening one cycle before it should. In real life, if a clk creates an
> enable, then the enabled act occurs on the next clock.

Thanks for that - the occurring on the "next clock" was exactly what I wanted to avoid; and it seems, with all the "inverter delays" and such, what I managed to do is move everything to happen "one cycle before it should" :)

In any case, to sum up - while I'm starting to see why "update on next clock" is so important - is it still possible (or smart) to aim for updates occurring at least a semiperiod *before* the 'next' clock? (And this is simply for my own perceptual ease in reading simulation results: it would be easier for me to read if I get the value I expect in *this* cycle.)

Thanks again for the awesome guidance,
Cheers!

---------------------------------------
Posted through http://www.FPGARelated.com
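For reference, the fully synchronous pattern that the replies above converge on might look like the minimal sketch below (entity and signal names invented; the 12-bit width matches the cout<11> seen in the timing report). Everything is sampled on the rising edge, reset and enable are ordinary synchronous inputs, and the output comes straight from flip-flops, so any decode glitches have settled before the next edge - at the price of accepting the "value appears on the next cycle" behaviour rather than fighting it:

  library ieee;
  use ieee.std_logic_1164.all;
  use ieee.numeric_std.all;

  entity counter_en_rst is
    port (
      clk    : in  std_logic;
      reset  : in  std_logic;   -- synchronous reset, checked first
      enable : in  std_logic;
      cout   : out std_logic_vector(11 downto 0));
  end entity counter_en_rst;

  architecture rtl of counter_en_rst is
    signal cnt : unsigned(11 downto 0) := (others => '0');
  begin
    process (clk)
    begin
      if rising_edge(clk) then
        if reset = '1' then
          cnt <= (others => '0');
        elsif enable = '1' then
          cnt <= cnt + 1;
        end if;
      end if;
    end process;

    -- output driven directly from the count register: no combinatorial
    -- decode logic after the flip-flops, hence no glitches on cout
    cout <= std_logic_vector(cnt);
  end architecture rtl;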