On 7/29/2011 7:36 AM, Noob wrote:
> [ NB: I've added comp.compression to the mix ]
>
> Jason wrote:
>
>> Rob Gaddi wrote:
>>
>>> Just did, on an FPGA with an admittedly small fill ratio. The
>>> uncompressed bitstream for an XC3S200 is 1,047,616 bits. Using Xilinx
>>> bitstream compression gets it down to 814,336 bits, or about 100kB.
>>> 7-Zip knocked it down to 16kB.
>>>
>>> Another design uses a decent amount of an XC6S45. Native size is
>>> 11,875,104 bits (~1.5MB). Bitstream compression gives me 1.35MB. 7-Zip
>>> gives me 395kB.
>>>
>>> I've got a project coming up around an Altera Arria II, with 30Mb of
>>> configuration. If I could get a 3:1 or 4:1 compression ratio it would
>>> be a pretty radical savings on flash size.
>>
>> The algorithm that 7-Zip uses internally for .7z file compression is
>> LZMA:
>>
>> http://en.wikipedia.org/wiki/Lzma
>>
>> One characteristic of LZMA is that its decoder is much simpler than
>> the encoder, making it well-suited for your application. You would be
>> most interested in the open-source LZMA SDK:
>>
>> http://www.7-zip.org/sdk.html
>>
>> They provide an ANSI C implementation of a decoder that you can port
>> to your platform. Also, there is a reference application for an
>> encoder (I believe also written in C). I have used this in the past to
>> compress firmware for embedded systems with good success. I use the
>> encoder as a post-build step to compress the firmware image into an
>> LZMA stream (note that the compression is not done with 7-Zip, as the
>> .7z format is a full-blown archive; the reference encoder just gives
>> you a stream of compressed data, which is what you want). The
>> resulting file is then decompressed on the embedded target at firmware
>> update time. The decoder source code is most amenable to porting to
>> 32-bit architectures; I have implemented it on the LPC2138 ARM7 device
>> (with the same 32 KB of RAM as your part) as well as a few AVR32UC3
>> parts.
>>
>> A couple of other things: I originally did this ~4 years ago with a
>> much older version of the SDK; it's possible that things have changed
>> since then, but it should still be worth a look. LZMA provides good
>> compression ratios with a decoder that in my experience runs well on
>> embedded platforms. Secondly, you do have to be careful with the
>> parameters you use at the encoder if you want to bound the amount of
>> memory required at the decoder. More specifically, you need to be
>> careful which "dictionary size" you use for the encoder. As you might
>> expect, a larger dictionary gives you better compression ratios, but
>> the target running the decoder will require at least that much memory
>> (e.g. an 8 KB dictionary size will require at least 8 KB of memory for
>> the decoder).
>
> Lempel-Ziv-Oberhumer (LZO) might also be worth investigating.
>
> http://en.wikipedia.org/wiki/Lempel–Ziv–Oberhumer
>
> The LZO library implements a number of algorithms with the
> following characteristics:
> * Compression is comparable in speed to deflate compression.
> * Very fast decompression.
> * Requires an additional buffer during compression (of size 8 kB or 64 kB, depending on compression level).
> * Requires no additional memory for decompression other than the source and destination buffers.
> * Allows the user to adjust the balance between compression quality and compression speed, without affecting the speed of decompression.
>
> Regards.
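To make the decode side concrete, here is a minimal sketch of Jason's
suggestion using the LZMA SDK's one-call C decoder, LzmaDecode() from
LzmaDec.h. It assumes the image was stored the simple way -- the SDK's
5 properties bytes followed by the raw compressed stream -- and the
function/variable names other than the SDK's own are invented here;
check the exact prototypes against whatever SDK version you download.

/* Minimal decode-side sketch using the LZMA SDK's one-call API.
 * Assumes the stored image is: 5 properties bytes (lc/lp/pb plus the
 * dictionary size) followed by the raw compressed stream.  Some
 * encoder front-ends also write an 8-byte uncompressed-size field
 * after the properties -- skip that too if yours does.  Byte, SizeT,
 * SRes, ISzAlloc etc. come from the SDK headers. */
#include <stdlib.h>
#include "LzmaDec.h"        /* from the LZMA SDK */

static void *SzAlloc(void *p, size_t size) { (void)p; return malloc(size); }
static void  SzFree(void *p, void *addr)   { (void)p; free(addr); }
static ISzAlloc g_Alloc = { SzAlloc, SzFree };

/* Returns 0 on success; dst must hold the full uncompressed image. */
int decompress_image(const Byte *src, SizeT src_len, Byte *dst, SizeT dst_len)
{
    ELzmaStatus status;
    SizeT out_len = dst_len;
    SizeT in_len  = src_len - LZMA_PROPS_SIZE;  /* LZMA_PROPS_SIZE == 5 */

    SRes res = LzmaDecode(dst, &out_len,
                          src + LZMA_PROPS_SIZE, &in_len,
                          src, LZMA_PROPS_SIZE,
                          LZMA_FINISH_END, &status, &g_Alloc);
    return (res == SZ_OK) ? 0 : -1;
}

If I remember the SDK right, this one-call interface uses the
destination buffer itself as the dictionary, so the extra allocation is
mostly the probability model (roughly 16 kB with default lc/lp); on
targets where the whole output can't sit in RAM, the streaming
interface (LzmaDec_DecodeToBuf) allocates a dictionary of the size
chosen at the encoder, which is where Jason's dictionary-size advice
really bites.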
Got the following in an email from Magnus Eriksson, who pled lack of
Usenet access:
--------------
A bunch of years ago I read about something called "pucrunch", for
compressing programs for use on the C64, or in modern parlance perhaps
"a CPU and memory restricted target system" -- it has a 1 MHz CPU with
few registers, and 64K RAM in total.

Pucrunch uses LZ77 and run-length encoding, with some tweaks; that is
what the author decided was the right compromise between compression
ratio and memory usage. And it is a "cruncher", which means that the
end result is a self-decompressing binary, one that unpacks a program
to RAM, and the decompressing code has to stay out of its way as far as
possible -- I believe the generic 6502 decompression routine takes up
only _354 bytes_ of code, if I'm reading the code comments correctly.

The catch is that you'll have to write your own decompression routine
(but that should hardly come as a surprise). There is pseudocode in the
article, and 6502 and Z80 code linked. The compressor is just one file
of C, so it should be easy to test the compression ratio first, at
least. It will almost certainly be worse than 7-zip, but just how well
it does on a bitstream might be interesting to see.

You can find it here:

Article: An Optimizing Hybrid LZ77 RLE Data Compression Program, aka
Improving Compression Ratio for Low-Resource Decompression
http://www.cs.tut.fi/~albert/Dev/pucrunch/

some related things in the parent directory too:
http://www.cs.tut.fi/~albert/Dev/

Hope that might be of some help, or inspiration to further
experimentation.

Take care,
Magnus
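As a flavour of the RLE half of such an LZ77+RLE hybrid, a toy
run-length encoder fits in a dozen lines of C. To be clear, this is not
pucrunch's actual stream format (the article linked above describes the
real one); it just shows the kind of (count, value) coding that wins on
the long constant runs of a sparsely filled bitstream:

#include <stddef.h>

/* Toy run-length encoder: emits (count, value) byte pairs, runs capped
 * at 255.  Worst case is 2x expansion on runless data, which is why
 * real crunchers mix RLE with LZ77 and escape codes. */
size_t rle_encode(const unsigned char *in, size_t in_len, unsigned char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < in_len; ) {
        unsigned char v = in[i];
        size_t run = 1;
        while (i + run < in_len && in[i + run] == v && run < 255)
            run++;
        out[o++] = (unsigned char)run;
        out[o++] = v;
        i += run;
    }
    return o;
}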
Article: 152276
On 29/07/2011 13:25, maxascent wrote:
>> On 29/07/2011 09:20, maxascent wrote:
>>> I can't really see why you are trying to do this. If it's just for
>>> fun then great, but flash isn't that expensive and you are only
>>> talking about 30Mb, not 30Gb.
>>>
>>> Jon
>>>
>>> ---------------------------------------
>>> Posted through http://www.FPGARelated.com
>>
>> Many embedded processors have a limited amount of Flash memory, so it
>> would be an advantage to efficiently compress a bitstream to save a
>> component and IO.
>
> Well, you can get serial flash devices that use a small amount of IO
> and have large capacities. I don't think you would use the internal
> flash of a processor to store a bitstream unless it was very small. I
> just can't see the point of going to all the trouble to do this unless
> you could shrink the bitstream to almost nothing.

It depends; it so happens that for one product I use an LPC2378, which
has enough Flash storage for a Spartan XC3S400A bit file. It saves an
extra device, as well as allowing an easy upgrade path.

It's very much down to personal preference and the choice of reducing
costs.

--
Mike Perkins
Video Solutions Ltd
www.videosolutions.ltd.uk
Article: 152277
On Jul 29, 5:15 pm, Gabor <ga...@szakacs.invalid> wrote:
> Sharan wrote:
>> can someone tell me if there is any difference in the die for the
>> following 2 devices in virtex-7
>>
>> XC7VX415TFFG1158 and XC7VX415TFFG1927
>>
>> Both these devices are listed as having same logic resources.
>> XC7VX415TFFG1158 has 35 X 35 mm package
>> XC7VX415TFFG1927 has 45 X 45 mm package
>>
>> Can I assume that the only difference is the package and the

Also, a larger die will require more power and ground pins.
Article: 152278
On Jul 29, 6:59 am, Mawa_fugo <cco...@netscape.net> wrote:
> Let's say I have two DVI streams - generated by two encoders, which
> have different video contents but the same pixel clock.
>
> The two TMDS streams travel thru cables, then are decoded by two
> decoders, then fed into an FPGA.
>
> The question is what the two clocks at the output of the encoders look
> like - are they the same? Can we use only one clock for both channels
> to clock the data in the FPGA?
>
> ========
>
> My theory is that the original clock goes to two 10x then is divided
> back 1/10, so they are supposedly in the same phase... or what else???

Although you may make it work in the lab, I doubt that you can robustly
use only one clock. If the encoders are using different crystals, their
frequencies will be slightly off and will drift over time. This will
break things. Also, each of the encoders may change their phase
relationships as they warm up. If you're running at high clock rates,
you're headed for a lot of heartburn.

John P
Article: 152279
> A better metric for FPGA bitstream security, or any security product,
> is the cost per breach and/or time per breach. Assume it can be
> breached, and pick a method where the [cost/time]/[breach] equation
> works out in your favor.

The paper implies the cost is minimal, at least for the V2P parts. It
seems that the equipment required places the attack within the reach of
many universities and electronics companies.

http://eprint.iacr.org/2011/390.pdf
http://eprint.iacr.org/2011/391.pdf

"A full key recovery using 50000 measurements finishes in 8x39 minutes,
i.e., in 6 hours (Virtex 4), and a full recovery on Virtex 5 devices
using 90000 measurements finishes in 8x67 minutes, i.e., about 9
hours."

A semi-official Xilinx response is available on their forums:

http://forums.xilinx.com/t5/Virtex-Family-FPGAs/Successful-side-channel-attack-on-Virtex-4-and-5-bitstream/m-p/169062#M11290

In his post Austin Lesea says:

"...the attack is a sophisticated known attack method (Differential
Power Analysis) which all crypto chips and systems are subject to, and
there are no known and tested methods to avoid the attack (in theory,
all crypto chips are vulnerable -- although one company is selling
their patents, and is the primary driver behind getting this research
into the public eye).

In practice, the attacker requires access, so any means to prevent
access (anti-tamper) will prevent the attack, or make it more
difficult.

Encryption of the bitstream is one aspect of the solution: access
control, and anti-tamper may also be required. Xilinx continues to
research (and provide) solutions.

As with any solution in crypto, the attackers will figure it out, and
succeed again. It is a never-ending battle between attacker, and
defender."
Article: 152280
Listening for all synchronizes your RST with CLK. Right?
Article: 152281
Sir,
I want to know how we can calculate the time taken by a process, and
whether we can see it anywhere in Xilinx ISE, as we are using
concurrent programming and want to know the time taken by each process
in behavioural modelling. e.g.:

process(sensitivity list)
  variable declarations
begin
  programming codes
end process

Now we want the time taken by such a process.
Thanks
Varun Maheshwari

---------------------------------------
Posted through http://www.FPGARelated.com
Article: 152282
> Listening for all synchronizes your RST with CLK. Right?

"process (all) is" not, AFAIK, supported yet by XST. Anyway, if you
want to know how to synchronize resets, read this paper:

http://www.sunburst-design.com/papers/CummingsSNUG2003Boston_Resets.pdf

---------------------------------------
Posted through http://www.FPGARelated.com
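For reference, the asynchronous-assert/synchronous-deassert reset
bridge that paper describes comes out like this in VHDL -- a minimal
sketch, with entity and signal names of my own choosing, not the
paper's:

library ieee;
use ieee.std_logic_1164.all;

-- Reset bridge: asserts rst_sync asynchronously the moment arst_n goes
-- low, and releases it synchronously to clk, so every downstream flop
-- sees the deasserting edge with clean recovery/removal timing.
entity reset_sync is
  port (
    clk      : in  std_logic;
    arst_n   : in  std_logic;   -- asynchronous, active-low reset input
    rst_sync : out std_logic    -- synchronized, active-high reset out
  );
end entity;

architecture rtl of reset_sync is
  signal ff : std_logic_vector(1 downto 0);
begin
  process (clk, arst_n)
  begin
    if arst_n = '0' then
      ff <= (others => '1');    -- assert immediately
    elsif rising_edge(clk) then
      ff <= ff(0) & '0';        -- deassert after two clean clk edges
    end if;
  end process;
  rst_sync <= ff(1);
end architecture;

Assertion is immediate and glitch-safe; deassertion always happens on a
clk edge, which is what removes the recovery/removal race the paper
talks about.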
Article: 152283
Ok. You used a tricky way to say that Kolja actually meant _A_sync:
process(all)
"varun_agr" <VARUN_AGR@n_o_s_p_a_m.n_o_s_p_a_m.YAHOO.COM> writes: > Sir > I want to know that how we can calculate time taken by a process or in > Xilinx ISE anywhwhere we can see it as we are using concurrent programming > and want to know time taken by each process in behavioral modelling. > eg: > process(sensitivity list) > > variable declaratons > begin > programming codes > end process > Now we want time taken by such process. It's entirely dependent on what chip you implement your logic in. You'll have to synthesise and then run place and route. The timing analyser can then provide you with timing information. What is normally done is to provide a clock that goes to every process and wrap the logic in : if rising_edge(clk) then -- logic description in here end if; You then tell the tools how fast the clock is that you intend to feed to the device, and the timing tools will then tell you if the logic (and internal wiring) you've described is fast enough to have finished between two consecutive clock edges. This is very unlike software programming, where you write some code and then instrument it to get timing information (which will be more or less variable depending on data access patterns, caching and all the other non-determinism that comes with a modern microprocessor system). In hardware, the tools know everything they need to about the timing of the chips and can give you a "cast-iron" statement as to how fast things will be in the worst-case. Cheers, Martin -- martin.j.thompson@trw.com TRW Conekt - Consultancy in Engineering, Knowledge and Technology http://www.conekt.co.uk/capabilities/39-electronic-hardwareArticle: 152285
Article: 152285
Martin Thompson <martin.j.thompson@trw.com> wrote:
> "varun_agr" <VARUN_AGR@n_o_s_p_a_m.n_o_s_p_a_m.YAHOO.COM> writes:
>> I want to know how we can calculate the time taken by a process, and
>> whether we can see it anywhere in Xilinx ISE, as we are using
>> concurrent programming and want to know the time taken by each
>> process in behavioural modelling.

(snip)

> It's entirely dependent on what chip you implement your logic in.
> You'll have to synthesise and then run place and route. The timing
> analyser can then provide you with timing information.

(snip)

> This is very unlike software programming, where you write some code
> and then instrument it to get timing information (which will be more
> or less variable depending on data access patterns, caching and all
> the other non-determinism that comes with a modern microprocessor
> system).

It will be variable, but if one wanted one could find a worst-case time
for most processors. One normally wants closer to the average case.
Assume no cache hits, no instruction overlap: it could be done.

> In hardware, the tools know everything they need to about the timing
> of the chips and can give you a "cast-iron" statement as to how fast
> things will be in the worst-case.

They might do some rounding up in the timing, to be sure.

-- glen
Article: 152286
On Wed, 03 Aug 2011 01:42:24 -0500, varun_agr wrote:

> Sir,
> I want to know how we can calculate the time taken by a process, and
> whether we can see it anywhere in Xilinx ISE, as we are using
> concurrent programming and want to know the time taken by each process
> in behavioural modelling. e.g.:
>
> process(sensitivity list)
>   variable declarations
> begin
>   programming codes
> end process
>
> Now we want the time taken by such a process. Thanks
> Varun Maheshwari

If you mean how long it will take in the FPGA, see Martin's answer.

If you mean how long it will take in the tool, be explicit in your
question. There may be a way to get the computer time; look in your
documentation.

--
www.wescottdesign.com
Article: 152287
On Wed, 03 Aug 2011 13:42:21 +0100, Martin Thompson wrote:

> "varun_agr" <VARUN_AGR@n_o_s_p_a_m.n_o_s_p_a_m.YAHOO.COM> writes:
>
>> Sir,
>> I want to know how we can calculate the time taken by a process, and
>> whether we can see it anywhere in Xilinx ISE, as we are using
>> concurrent programming and want to know the time taken by each
>> process in behavioural modelling.
>
> It's entirely dependent on what chip you implement your logic in.
> You'll have to synthesise and then run place and route. The timing
> analyser can then provide you with timing information.
>
> What is normally done is to provide a clock that goes to every process
> and wrap the logic in:
>
> if rising_edge(clk) then
>   -- logic description in here
> end if;
>
> You then tell the tools how fast the clock is that you intend to feed
> to the device, and the timing tools will then tell you if the logic
> (and internal wiring) you've described is fast enough to have finished
> between two consecutive clock edges.
>
> This is very unlike software programming, where you write some code
> and then instrument it to get timing information (which will be more
> or less variable depending on data access patterns, caching and all
> the other non-determinism that comes with a modern microprocessor
> system).
>
> In hardware, the tools know everything they need to about the timing
> of the chips and can give you a "cast-iron" statement as to how fast
> things will be in the worst-case.

If that's so, then why have I seen so many first-cut FPGA designs fail
when run over the full military temperature range? And why have I seen
FPGA designs fail during temperature cycling after months in
production, after some unknown process change by the vendor?

Tools know _a lot_ of what they need, and can give you a _very good_
statement of how fast things need to be in the worst case. But if you
really want things to be cast-iron solid then you need to be
conservative in how you specify your margins, you need to design as if
timing matters, and you need to make absolutely sure that any wiring
that is external to the FPGA meets its timing, too.

--
www.wescottdesign.com
Article: 152288
On Wed, 03 Aug 2011 16:33:46 +0000, glen herrmannsfeldt wrote:

> Martin Thompson <martin.j.thompson@trw.com> wrote:
>> "varun_agr" <VARUN_AGR@n_o_s_p_a_m.n_o_s_p_a_m.YAHOO.COM> writes:
>>> I want to know how we can calculate the time taken by a process, and
>>> whether we can see it anywhere in Xilinx ISE, as we are using
>>> concurrent programming and want to know the time taken by each
>>> process in behavioural modelling.
> (snip)
>> It's entirely dependent on what chip you implement your logic in.
>> You'll have to synthesise and then run place and route. The timing
>> analyser can then provide you with timing information.
> (snip)
>> This is very unlike software programming, where you write some code
>> and then instrument it to get timing information (which will be more
>> or less variable depending on data access patterns, caching and all
>> the other non-determinism that comes with a modern microprocessor
>> system).
>
> It will be variable, but if one wanted one could find a worst-case
> time for most processors. One normally wants closer to average case.
> Assume no cache hits, no instruction overlap, it could be done.

If you're doing something that's really hard real time (i.e. fails once
and the product -- or the operator -- dies), then for the critical
cases you want to know absolute worst-case maximums.

Few things are really that hard real time, though.

--
www.wescottdesign.com
Article: 152289
glen herrmannsfeldt wrote:
> Martin Thompson <martin.j.thompson@trw.com> wrote:
>> "varun_agr" <VARUN_AGR@n_o_s_p_a_m.n_o_s_p_a_m.YAHOO.COM> writes:
>>> I want to know how we can calculate the time taken by a process, and
>>> whether we can see it anywhere in Xilinx ISE, as we are using
>>> concurrent programming and want to know the time taken by each
>>> process in behavioural modelling.
> (snip)
>> It's entirely dependent on what chip you implement your logic in.
>> You'll have to synthesise and then run place and route. The timing
>> analyser can then provide you with timing information.
> (snip)
>> This is very unlike software programming, where you write some code
>> and then instrument it to get timing information (which will be more
>> or less variable depending on data access patterns, caching and all
>> the other non-determinism that comes with a modern microprocessor
>> system).
>
> It will be variable, but if one wanted one could find a worst-case
> time for most processors. One normally wants closer to average case.

Depends... for hard-real-time systems, the worst-case execution time
(WCET) is also important, at least for the certification of a
safety-critical system.

> Assume no cache hits, no instruction overlap, it could be done.

For cached and pipelined processors such crude assumptions will usually
give a hugely overestimated WCET bound, which may not be useful. There
are tools that use advanced processor and program analysis to compute
much better WCET bounds. See
http://en.wikipedia.org/wiki/Worst-case_execution_time and the first
referenced article therein.

Most current high-performance processors are so greedy and
short-sighted in their internal scheduling that they have so-called
"timing anomalies", which means, for example, that a cache hit at a
particular point in the program may give a larger overall execution
time than a cache miss at that point. Finding the worst-case behaviour
by manual methods or intuitive reasoning is quite hard for such
processors.

--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi . @ .
Article: 152290
On Fri, 01 Jul 2011 11:34:28 +0100, Martin Thompson wrote:

> Brian Drummond <brian@shapes.demon.co.uk> writes:
>> I haven't got around to trying out my carefully cultivated nest of
>> vipers on ISE13 yet.
> Do let us know how you fare :)

(about a month later...)

By and large, now trouble free. A couple of workarounds for previous
bugs are now illegal (or always were, but are now detected), but that's
OK because the original bugs have been fixed. Type conversions are now
usable on output ports (but in the new parser, i.e. by default for
newer devices only!)

The only defect I have seen is that ISIM is dog slow and has a huge
memory footprint for post-route sims on new devices (Spartan-6) - about
an order of magnitude larger/slower than the same simulation targeting
Spartan-3.

A definite improvement.

- Brian
Article: 152291
On Jun 15, 11:45 am, valtih1978 <d...@not.email.me> wrote:
> In your explanation, only one thing is missing: DQS. Why do we need
> data if we still need to calibrate "memory to the clock"? One could
> calibrate DQ directly "to the clock inside FPGA".

On a Read, the DQS signals are generated by the memory chip. The
relationship between the DQ and DQS signals stays fairly constant (for
a given circuit board layout and operating frequency). However, the
relationship between DQS and the DDR2 controller clock is not
necessarily the same all the time.

Once the relationship between DQS and DQ has been determined, either
through data training (testing data samples and seeing what timing
works) or through a timing feedback pathway, that timing relationship
can be used reliably to perform read operations. In my experience, once
the timing parameters have been determined, they're usable on all
instantiations of the same circuit board using the same parts at the
same speed.

Jeff Walther
Article: 152292
glen herrmannsfeldt <gah@ugcs.caltech.edu> writes:

> Martin Thompson <martin.j.thompson@trw.com> wrote:
>> This is very unlike software programming, where you write some code
>> and then instrument it to get timing information (which will be more
>> or less variable depending on data access patterns, caching and all
>> the other non-determinism that comes with a modern microprocessor
>> system).
>
> It will be variable, but if one wanted one could find a worst-case
> time for most processors. One normally wants closer to average case.

Normally in a non-real-time system. I don't do many of those, so that
colours my comments ;)

> Assume no cache hits, no instruction overlap, it could be done.

It could, yes, but you'd have to do it yourself. That's part of the
point - in FPGA-land we have tools that do it for us.

Also, absolutely-worst-case for a cached "modern" processor would be
dreadfully pessimistic, to the point of being very little use IMHO. For
example, how would a simple sort come out if you assumed no cache hits
at all throughout execution? We know that's too extreme, but how do we
put bounds on what is "sensible"?

More usefully (again IMHO) there are statistical methods for measuring
execution time and its variability down various code paths, which can
be used to provide arbitrarily high levels of confidence as to the
likelihood of missing a real-time deadline, as well as showing what to
optimise to improve the worst-case - this sometimes means using what
might be regarded in mainstream compsci as "inefficient" algorithms, as
they have better bounds on worst-case performance, even at the expense
of average performance. I ought to stop now, I'm no doubt "rambling to
the choir" as well as getting off-topic :)

>> In hardware, the tools know everything they need to about the timing
>> of the chips and can give you a "cast-iron" statement as to how fast
>> things will be in the worst-case.
>
> They might do some rounding up in the timing, to be sure.

I'm not sure what you mean by that: The tools "know" worst-case timings
for the various silicon paths and add them all up. Rounding doesn't
come into it.

> -- glen

--
martin.j.thompson@trw.com
TRW Conekt - Consultancy in Engineering, Knowledge and Technology
http://www.conekt.co.uk/capabilities/39-electronic-hardware
Article: 152293
Tim Wescott <tim@seemywebsite.com> writes:

> On Wed, 03 Aug 2011 13:42:21 +0100, Martin Thompson wrote:
>
>> In hardware, the tools know everything they need to about the timing
>> of the chips and can give you a "cast-iron" statement as to how fast
>> things will be in the worst-case.
>
> If that's so, then why have I seen so many first-cut FPGA designs fail
> when run over the full military temperature range? And why have I seen
> FPGA designs fail during temperature cycling after months in
> production, after some unknown process change by the vendor?

I can think of lots of reasons that don't come down to the quality of
the timing models... Or did you track the problem down to an error in
the timing analysis tools and models?

> Tools know _a lot_ of what they need, and can give you a _very good_
> statement of how fast things need to be in worst case.

I put "cast-iron" in quotes because nothing in real life is ever 100%.
They are very good though (IME) when used correctly... but only with
the things that they know about - the innards of the silicon. In the
context of the original question "how long will my logic take to run?"
I think that's reasonable.

> But if you really want things to be cast-iron solid then you need to
> be conservative in how you specify your margins, you need to design as
> if timing matters, and you need to make absolutely sure that any
> wiring that is external to the FPGA meets its timing, too.

Of course for a complete system, the engineers responsible still need
to do all the "external engineering" to make sure that they have
(amongst other things):

* Specified the timing constraints correctly (and covered all the paths
  that matter!)
* Supplied "clean enough" power
* Supplied a "clean enough" clock
* Crossed clock domains properly
* Taken into account the timings of external parts (as you mentioned)
* Used production-grade speedfiles
* ... and yes, put a bit more margin on top (the amount of which
  depends on the criticality of the system)

Cheers,
Martin

--
martin.j.thompson@trw.com
TRW Conekt - Consultancy in Engineering, Knowledge and Technology
http://www.conekt.co.uk/capabilities/39-electronic-hardware
Article: 152294
I remember there was a picoBlaze-related program that could update BRAM
in a compiled bitstream with any user data.
Article: 152295
Martin Thompson <martin.j.thompson@trw.com> wrote:

(snip on timing of processors and FPGAs)

> More usefully (again IMHO) there are statistical methods for measuring
> execution time and its variability down various code paths, which can
> be used to provide arbitrarily high levels of confidence as to the
> likelihood of missing a real-time deadline, as well as showing what to
> optimise to improve the worst-case - this sometimes means using what
> might be regarded in mainstream compsci as "inefficient" algorithms,
> as they have better bounds on worst-case performance, even at the
> expense of average performance. I ought to stop now, I'm no doubt
> "rambling to the choir" as well as getting off-topic :)

Sounds right to me. Now, isn't that also true for FPGAs?

>>> In hardware, the tools know everything they need to about the timing
>>> of the chips and can give you a "cast-iron" statement as to how
>>> fast things will be in the worst-case.
>
>> They might do some rounding up in the timing, to be sure.
>
> I'm not sure what you mean by that: The tools "know" worst-case
> timings for the various silicon paths and add them all up.
> Rounding doesn't come into it.

Consider that some timings might be Gaussian distributed with a tail to
infinity. Also, that it is really difficult to get the exact timings
for all possible routing paths. As with the real-time processor, you
don't really want worst-case, but just sufficiently-unlikely-to-be-
exceeded times. Some of the distributions won't quite be Gaussian, as
some of the worst-case die are rejected.

Consider, though, that there are a finite number of electrons on a FET
gate, ever decreasing as they get smaller. Some of the favorite
problems in physics class include determining the probability of all
the air molecules going to one half of a room, or of a person
quantum-mechanically tunneling through a brick wall, running at a given
speed. Both have a very low, but non-zero, probability.

-- glen
Article: 152296
On Jul 29, 6:59 am, Mawa_fugo <cco...@netscape.net> wrote:
> Let's say I have two DVI streams - generated by two encoders, which
> have different video contents but the same pixel clock.
>
> The two TMDS streams travel thru cables, then are decoded by two
> decoders, then fed into an FPGA.
>
> The question is what the two clocks at the output of the encoders look
> like - are they the same? Can we use only one clock for both channels
> to clock the data in the FPGA?
>
> ========
>
> My theory is that the original clock goes to two 10x then is divided
> back 1/10, so they are supposedly in the same phase... or what else???

If the same source is used for the pixel clock of both encoders, and by
same I mean only one physical clock oscillator is used for both
encoders, then you can be sure that the bit rate for both encoders is
the same. However, there will be no guaranteed phase relationship
between the data output of the two encoders.

You could use the same original clock source, or one of the two clock
outputs from the encoders, and a dynamic phase aligner for the
receivers in the FPGA to cut down on the clocking resource requirement
in the FPGA. However, you may find it easier to use the clock/data from
each encoder to capture the data and then put it through a simple
shallow-depth synchronizing FIFO and use a single global clock for the
rest of your system.

Ed McGettigan
--
Xilinx Inc.
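Ed's second suggestion -- capture each stream with its own recovered
clock, then hand the pixels to a single system clock through a small
FIFO -- needs a dual-clock (clock-crossing) FIFO. Below is a generic
Gray-code-pointer sketch of one, written purely for illustration; in
practice a Xilinx FIFO primitive or CORE Generator FIFO would do the
same job, and the entity name, depth of 16 and 24-bit width here are
arbitrary choices. The caller must honour full/empty.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Minimal dual-clock FIFO, depth 16.  Write side runs on the DVI
-- decoder's recovered clock, read side on the single system clock.
-- Pointers cross domains as Gray codes through two-flop synchronizers.
entity cdc_fifo is
  port (
    wclk, rclk  : in  std_logic;
    we, re      : in  std_logic;
    wdata       : in  std_logic_vector(23 downto 0);  -- one RGB pixel
    rdata       : out std_logic_vector(23 downto 0);
    empty, full : out std_logic
  );
end entity;

architecture rtl of cdc_fifo is
  type mem_t is array (0 to 15) of std_logic_vector(23 downto 0);
  signal mem : mem_t;
  -- 5-bit pointers: 4 address bits plus one wrap bit for full/empty
  signal wptr, rptr           : unsigned(4 downto 0) := (others => '0');
  signal wptr_gray, rptr_gray : unsigned(4 downto 0) := (others => '0');
  -- two-flop synchronizers for the Gray-coded pointers
  signal rgray_w1, rgray_w2   : unsigned(4 downto 0) := (others => '0');
  signal wgray_r1, wgray_r2   : unsigned(4 downto 0) := (others => '0');

  function to_gray(b : unsigned) return unsigned is
  begin
    return b xor ('0' & b(b'high downto 1));
  end function;
begin
  -- Write domain: store a pixel, advance the pointer, and bring the
  -- read pointer across as a Gray code.
  process (wclk)
    variable wnext : unsigned(4 downto 0);
  begin
    if rising_edge(wclk) then
      wnext := wptr;
      if we = '1' then
        mem(to_integer(wptr(3 downto 0))) <= wdata;
        wnext := wptr + 1;
      end if;
      wptr      <= wnext;
      wptr_gray <= to_gray(wnext);
      rgray_w1  <= rptr_gray;
      rgray_w2  <= rgray_w1;
    end if;
  end process;

  -- Read domain: the mirror image of the write domain.
  process (rclk)
    variable rnext : unsigned(4 downto 0);
  begin
    if rising_edge(rclk) then
      rnext := rptr;
      if re = '1' then
        rnext := rptr + 1;
      end if;
      rptr      <= rnext;
      rptr_gray <= to_gray(rnext);
      wgray_r1  <= wptr_gray;
      wgray_r2  <= wgray_r1;
    end if;
  end process;

  rdata <= mem(to_integer(rptr(3 downto 0)));

  -- Gray-code tests: empty when pointers match, full when they match
  -- except for the top two bits (the classic Cummings formulation).
  empty <= '1' when rptr_gray = wgray_r2 else '0';
  full  <= '1' when wptr_gray =
           ((not rgray_w2(4 downto 3)) & rgray_w2(2 downto 0)) else '0';
end architecture;

The Gray coding guarantees only one pointer bit changes per clock, so a
synchronizer that samples mid-transition still lands on an adjacent
value; the flags can then only ever err pessimistically (full or empty
asserted a cycle early), never destructively.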
Article: 152297
On Thu, 04 Aug 2011 10:56:51 +0100, Martin Thompson wrote:

> Tim Wescott <tim@seemywebsite.com> writes:
>> On Wed, 03 Aug 2011 13:42:21 +0100, Martin Thompson wrote:
>>> In hardware, the tools know everything they need to about the timing
>>> of the chips and can give you a "cast-iron" statement as to how fast
>>> things will be in the worst-case.
>>
>> If that's so, then why have I seen so many first-cut FPGA designs
>> fail when run over the full military temperature range? And why have
>> I seen FPGA designs fail during temperature cycling after months in
>> production, after some unknown process change by the vendor?
>
> I can think of lots of reasons that don't come down to the quality of
> the timing models... Or did you track the problem down to an error in
> the timing analysis tools and models?

Most of my experience with this has been as an amused observer, rather
than an appalled participant. It stemmed from a corporate rule "thou
shalt make the synthesis tool happy about timing" and a rather
aggressive design group from out of state that was known to have
scripts that would do synthesis runs over and over until one happened
to meet timing, then stop and ship that file 'upstairs'.

The balance has stemmed from folks taking the Xilinx speed grades at
face value, and using what the synthesis tool says (using Xilinx's
defaults) at face value. Once the group learned that you have to force
a bit of margin into the process (and the group thinks that a design
that fails to synthesize once out of ten is a problem, as opposed to
thinking that a design that succeeds once out of twenty is 'shippable')
then those problems went away.

--
www.wescottdesign.com
Article: 152298
On Aug 1, 7:09 am, "sharanbr" <sharanb@n_o_s_p_a_m.hcl.com> wrote:
> >Sharan wrote:
> >> can someone tell me if there is any difference in the die for the
> >> following 2 devices in virtex-7
> >
> >> XC7VX415TFFG1158 and XC7VX415TFFG1927
> >
> >> Both these devices are listed as having same logic resources.
> >> XC7VX415TFFG1158 has 35 X 35 mm package
> >> XC7VX415TFFG1927 has 45 X 45 mm package
> >
> >> Can I assume that the only difference is the package and the
> >> underlying die is going to be the same?
> >
> >> TIA
> >
> >I'm not in the V7 camp yet, but to date that is the way
> >Xilinx handles chip products. For each family and gate
> >size there is one die, which has different bond-outs
> >depending on the package. If it's a small die in a
> >large package you end up with non-connected package pins.
> >A large part in a small package ends up with unbonded
> >IOB's. But open the parts in the FPGA editor and you
> >see the exact same diagram with just different labels
> >on the IOB's. In fact on the smaller packages you
> >can use the unbonded IOB's as extra resources if you
> >run out of fabric flops for example.
> >
> >-- Gabor
>
> Thanks, Gabor.
>
> Also, can you tell why certain devices are in a specific package while
> another device with more supported pins is in a smaller package.
>
> For example (example only, nothing specific to Virtex-7)
> v7vx1140t - min package is 45 x 45 mm package & supports 480 pins
> v7vx585T - min package is 35 x 35 mm package & supports 600 pins
>
> I am not sure why a 480 pin needs 45x45 mm package while 600 pins is
> put in a 35x35 mm package
>
> ---------------------------------------
> Posted through http://www.FPGARelated.com

The 7VX1140T will be available in two packages, the FLG1928 and the
FLG1930; both of these are 45x45mm 1.0mm pitch packages.
The 7VX1140T-FLG1928 supports 0 GTX lanes, 96 GTH lanes and 480 Select I/O.
The 7VX1140T-FLG1930 supports 0 GTX lanes, 24 GTH lanes and 1100 Select I/O.

The 7VX585T will be available in two packages, the FFG1157 (35x35mm)
and the FFG1761 (42.5x42.5mm); both of these are 1.0mm pitch packages.
The 7VX585T-FFG1157 supports 20 GTX lanes, 0 GTH lanes and 600 Select I/O.
The 7VX585T-FFG1761 supports 36 GTX lanes, 0 GTH lanes and 850 Select I/O.

Your original comparison was only for the Select I/O pins, which is why
it seemed odd to you. Each of the GTX/GTH (MGT) lanes takes 4 pins for
the TX/RX, and for each quad block of MGTs there are 2 reference clocks
(4 pins) plus additional power and ground pins.

Ed McGettigan
--
Xilinx Inc.
Article: 152299
> following post has link to documents that show that Xilinx V2/V4/V5
> are vulnerable as well.
>
> http://it.slashdot.org/story/11/07/21/1753217/FPGA-Bitstream-Security...

Thought I'd add a few links to the discussion.

A post from a Xilinx employee (Austin Lesea) from 2008, discussing the
lack of successful Differential Power Analysis (DPA) attacks on Xilinx
FPGAs:

http://groups.google.com/group/comp.arch.fpga/msg/12769d42109799c4

"All 7 challengers gave up. Their basic conclusion was all the things
they thought would work, differential power attack, spoofing by power
glitches, attack with freeze spray, etc. FAILED."

A recent post from the same Xilinx employee responding to the latest
announcement of successful DPA attacks on V2P, V4, and V5 FPGAs:

http://forums.xilinx.com/t5/Virtex-Family-FPGAs/Successful-side-channel-attack-on-Virtex-4-and-5-bitstream/m-p/169062/message-uid/169062/highlight/true#U169062

"Encryption of the bitstream is one aspect of the solution: access
control, and anti-tamper may also be required."

Original papers describing attacks:

http://eprint.iacr.org/2011/390.pdf
http://eprint.iacr.org/2011/391.pdf

Stephen