I have a design that I'm trying to implement in an Altera Flex10k100A part and it requires:

Type A: 3 blocks of 16 x 64 bits RAM = 3 x 1024 = 3072 bits
Type B: 16 blocks of 8 x 8 bits RAM = 16 x 64 = 1024 bits
Total RAM = 4096 bits

Now the 10k100A has 12 memory blocks (EABs), each 2048 bits, for a total of 24k bits. Each EAB can be configured in any of the following: 256x8, 512x4, 1024x2, or 2048x1.

When I place and route (P&R) my design with MaxPlusII, it fails to implement the design in a single 10K100, and if I allow it to fit to multiple devices, it ends up routing it in 5 devices. From the results, it looks like a single type A RAM consumes 8 EABs (out of a total of 12), since it needs to cascade that many to create a 'x64' row. This seems terribly wasteful and inefficient: this is only 4 kbits out of a total capacity of 24 kbits. As a possible solution, I've tried to 'clique' (group) each RAM block together, but that causes more severe problems (even more devices).

Does anyone know of a better way to implement these RAMs in this type of Altera part?

Thanks in advance,
Edwin Grigorian
JPL

Article: 12226
You didn't mention the speed at which you need to access the RAM. Is it low enough to read successive bytes out of an EAB and assemble them in your logic? If you need all 64 bits at a high enough rate that you can't afford multiple EAB accesses to get the data, then you are stuck using 8 EABs for each type A memory.

How about using a Xilinx part? The memory you describe would occupy 32 CLBs for each of the 3 "type A" blocks and 4 CLBs for each of the 16 "type B" blocks, for a total of 160 CLBs for the memory. An XCS40 (Spartan equivalent of an XC4020) is 28x28 = 784 CLBs, so the memory only occupies a tad over 20% of the device. XCS40s are quite a bit cheaper than a 10K100A too. This is yet another reason I prefer the Xilinx 4K architecture over Altera for DSP and data-flow applications. (Try doing a couple of delay queues in Altera... you'll either use up your EABs quickly or waste an awful lot of LEs to mimic memory.)

Edwin Grigorian wrote:
> I have a design that I'm trying to implement in an Altera Flex10k100A part
> and it requires:
>
> Type A: 3 blocks of 16 x 64 bits RAM = 3 x 1024 = 3072 bits
> Type B: 16 blocks of 8 x 8 bits RAM = 16 x 64 = 1024 bits
> Total RAM = 4096 bits
<snip rest of original post>

--
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email randraka@ids.net
http://users.ids.net/~randraka

Article: 12227
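[A quick sanity check of Ray's CLB arithmetic, as a sketch. It assumes the XC4000/Spartan convention that each CLB holds two 4-input LUTs, each usable as a 16x1 distributed RAM; the function name is made up for illustration.]

```python
import math

def clbs_for_ram(depth, width):
    """CLBs needed for a depth x width RAM built from 16x1 LUT RAMs
    (two LUTs, i.e. two bits of 16-deep RAM, per CLB)."""
    luts_deep = math.ceil(depth / 16)        # 16 words per LUT
    return math.ceil(luts_deep * width / 2)  # 2 LUTs per CLB

type_a = 3 * clbs_for_ram(16, 64)   # 3 x 32 = 96 CLBs
type_b = 16 * clbs_for_ram(8, 8)    # 16 x 4 = 64 CLBs
total = type_a + type_b             # 160 CLBs, a tad over 20% of a 784-CLB XCS40
```

The 8x8 blocks half-fill their 16-deep LUTs, which is where the "wasted" capacity goes on the Xilinx side; it is far less waste than a 2048-bit EAB holding 64 bits.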
Andreas Doering wrote:
> I plan to use an XV (propably XC40150XVBG600) for a research
> project. I want to plan the power supply.
> I only found the estimation of I/O-Power.

How accurate an estimate do you need?

Power supply current for an FPGA design is really hard to estimate. Power use will vary from one placement and route to the next, as different routing resources will be used for signals, each of which switches at a different rate with a different load. Unless you maintain control over the design at a very low level (perhaps reasonable with a very regular design with fixed placement and very repeatable routing), you just don't know enough about the circuit to hope to do an accurate estimate.

--
Phil Hays
"Irritatingly, science claims to set limits on what we can do, even in principle." Carl Sagan

Article: 12228
It would be useful to have some idea what is likely:

1. You clock every possible gate at the highest possible speed and the worst possible temperature for a millisecond or two: what is the current? (This worst case won't last long in real life, due to thermal effects.)
2. Package-thermal-limited continuous current use, at the temperature limits and at 25 C.

Simon
========================================================================
Phil Hays <spampostmaster@sprynet.com> wrote:
>Andreas Doering wrote:
>> I plan to use an XV (propably XC40150XVBG600) for a research
>> project. I want to plan the power supply.
>> I only found the estimation of I/O-Power.
<snip rest of quoted post>

Design Your Own MicroProcessor(tm)
http://www/tefbbs.com/spacetime/index.htm

Article: 12229
An EAB in a 10KA-series part has a maximum of 8 output bits, so:

3 blocks of 64 bits wide: 3 x 8 = 24 EABs
16 blocks of 8 bits wide: 16 x 1 = 16 EABs

Thus in total you need 40 EABs, and therefore at least four 10K100As. You are right that Altera is not so well suited for wide RAMs; they are better suited for deep RAMs (a high number of addresses). An alternative that I used was to implement some of the RAMs (dual-port RAMs) in FFs, but of course that could also consume a lot in your case.

Edwin Grigorian wrote:
> I have a design that I'm trying to implement in an Altera Flex10k100A part
> and it requires:
>
> Type A: 3 blocks of 16 x 64 bits RAM = 3 x 1024 = 3072 bits
> Type B: 16 blocks of 8 x 8 bits RAM = 16 x 64 = 1024 bits
> Total RAM = 4096 bits
<snip rest of original post>

--
Koenraad SCHELFHOUT
Alcatel Telecom, Switching Systems Division
Microelectronics Department - VA21
Phone: (32/3) 240 89 93  Fax: (32/3) 240 99 88
mailto:koenraad.schelfhout@alcatel.be
Francis Wellesplein, 1, B-2018 Antwerpen, Belgium
http://www.alcatel.com/

Article: 12230
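[The EAB count above comes straight from the 8-bit width limit; a minimal sketch, with a hypothetical helper name, makes the arithmetic explicit. It assumes each EAB serves exactly one RAM block and that these shallow RAMs never cascade on depth.]

```python
import math

def eabs_for_ram(depth, width, max_width=8, eab_bits=2048):
    """EABs needed for one depth x width block on a FLEX 10K:
    width alone drives the count, since each EAB is at most 8 bits wide."""
    assert depth <= eab_bits // max_width, "deeper RAMs would also cascade on depth"
    return math.ceil(width / max_width)

total_eabs = 3 * eabs_for_ram(16, 64) + 16 * eabs_for_ram(8, 8)  # 24 + 16 = 40
devices = math.ceil(total_eabs / 12)   # 12 EABs per 10K100A, so at least 4 parts
```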
On 05 Oct 1998 12:23:49 -0400, Scott Bilik <sbilik@nospam.tiac.net> wrote:

Thanks for pointing that out, Scott. We're a bit behind the times here with our Leonardo releases - can't have had one for at least a month ;-)

Dave
--
REPLACE "NOJUNK" in address with "david.storrar" to reply
Development Engineer | Marconi Electronic Systems | Tel: +44 (0)131 343 4484
RCS | Fax: +44 (0)131 343 4091

Article: 12231
On Wed, 30 Sep 1998 23:14:58 -0400, Ray Andraka <no_spam_randraka@ids.net> wrote:

>Huh??
>An FIR filter implemented in an FPGA outperforms one implemented in a DSP by a
>wide margin. The more taps, the higher the performance gain. The FPGA
>implementations perform the multiplications for all the taps in parallel, where
>a DSP microprocessor computes the taps sequentially.

The above statement bothers me a little. If you need to perform the multiplications for all taps in parallel, don't you have to have that many multipliers in hardware? I am just off of an SOC design where I designed many filters in hardware, and one thing stood out: high-precision signed multipliers are not cheap, in gate count and in levels of logic. I eventually implemented filters with one multiplier doing one tap per clock cycle, and that was no different from the sequential operation of the DSP.

I would go for the FPGA over the DSP when the DSP looks like overkill for an embedded application. Otherwise, I do not see the FPGA as very attractive, at least not yet, for I have seen people spending more time tackling the problems of the FPGA than the actual design itself.

Just my 2 cents :)

Kartheepan, M

Article: 12232
On 05 Oct 1998 12:23:49 -0400, Scott Bilik <sbilik@nospam.tiac.net> wrote:

>So TO SUMMARIZE:
>
>It does have great scripting capabilities. And its scripting language
>is not limited to merely setting variables (ala Synplify). No GUI
>ever need be invoked. So your worries (and flame) are for naught, at
>least this time. :)

Do you also get the scripting features on level 1? And any news on level 0?

evan

Article: 12233
David R Brooks wrote:
<snip count sequence for 6-bit twisted ring counter>
> This has of course, used 12 of the 64 states possible in 6 bits. All
> the other states are illegal. If inadvertently entered, the counter
> will usually continue cycling through illegal states. A clean reset
> will start it off right, but if your application must be sure to
> recover from faults, you'll need to trap those illegal states.

Exactly so. To do this you need to
a) determine all the possible cycles of illegal states
   (don't forget there may be multiple non-intersecting cycles)
b) invent some combinatorial gubbins that will identify
   at least one state in EACH of these cycles, and NO states
   in the wanted cycle
c) use that logic to force the counter into one of the legal
   states and/or raise a fault flag

The same argument applies to one-hot state machines and indeed any state machine with illegal states, i.e. any (sub)system with N flipflops but fewer than 2^N legal states.

This is a very interesting problem, which most textbooks and teachers miserably fail to address. The good reasons for using one-hot, well rehearsed here and elsewhere, are the small amounts of next-state logic and hence the fast inter-flipflop logic paths they yield. Conventional wisdom says that one-hot SMs are therefore likely to be smaller and faster than encoded SMs, at least in flipflop-rich FPGAs. Similarly, Johnson counters have the benefit of very high speed, tiny amounts of next-state logic, and (irrelevant to this post, but very useful) freedom from output decode spikes when used to implement one-of-N sequencers.

BUT, BUT, BUT: if you are trying to design a _robust_ system, in which illegal states are detected and dealt with as they should be, you will likely end up with unwieldy amounts of illegal-state detection logic, which will be just as expensive in speed and real estate as the next-state logic for a fully encoded system!

I am very distrustful of authors who assert that an effective start-up reset will solve this problem. As FPGA design rules fall to 0.35u and below, surely we will see occasional soft errors in FPGAs? In any case, in high-reliability applications you have to accept that metastability will come up from behind and get you one day. Sure, I have designed plenty of systems with a few uncaught illegal states - but I'm not proud of it.

Anyone else as worried about this as I am?

Jonathan Bromley
--

Article: 12234
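[Steps (a) through (c) above start with enumerating the illegal cycles, which is easy to do by brute force for a small counter. This sketch does that for the 6-bit twisted-ring counter under discussion; the shift direction chosen here is one convention (a mirrored implementation gives the same counts). Because the next-state map of a shift register with complement feedback is invertible, every illegal state sits on a purely illegal cycle and can never fall back into the legal sequence on its own.]

```python
N = 6

def step(s):
    """One clock of the twisted-ring counter: shift left, feed back ~MSB."""
    return ((s << 1) & ((1 << N) - 1)) | (1 ^ (s >> (N - 1)))

# The legal cycle: start from the reset state 0 and follow the counter.
legal = set()
s = 0
while s not in legal:
    legal.add(s)
    s = step(s)

# Every remaining state is illegal; group them into their closed cycles.
illegal = set(range(1 << N)) - legal
illegal_cycles = set()
for s in illegal:
    seen = []
    while s not in seen:
        seen.append(s)
        s = step(s)
    illegal_cycles.add(frozenset(seen[seen.index(s):]))
```

For N = 6 this finds the 12 legal states and 52 illegal ones, and the illegal states form five separate cycles (four of length 12 and one of length 4), which is exactly the multiple-non-intersecting-cycles situation step (a) warns about.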
Since I am a MINC employee I'll try to keep this as spam-free as possible, but... try MINC's PLSynthesizer. We were reviewed in the 9/12 issue of EDN along with FPGAXpress and Accolade, with good performance results; Synplicity refused to take part. Here in Boston, we have had some strong design wins against the Big 2 at a couple of "high profile" accounts. PLS *does* support scripting.

www.minc.com

Jay Doherty
MINC/Synario Design Automation
508-893-7944
jdoherty@synario.com

Article: 12235
M Kartheepan wrote:
> The above statement bothers me a little. If you need to perform the
> multiplications for all taps in parallel, dont you have to have that
> many multipliers in hardware ?
<snip rest of quoted post>

I agree with your analysis, but I disagree with your conclusions. I think a DSP is most useful any time you can fit the processing into the available DSP MIPS. If your embedded application is so light that you don't need a DSP, then I would use a standard embedded micro for the task. But I would only use FPGAs for DSP when I needed MORE MIPS than the DSP can provide.

The design time for a micro is the least, since they have good support and are easy to program in a high-level language. You might have to program a DSP function in assembler to meet your performance goals, however. A DSP is a little harder to develop on, since its architecture makes it somewhat harder to use advanced development tools. But there are a lot of DSP routines already written, which makes some of the work easy.

But when you need every ounce of performance, you often need to use unusual variations or even unusual algorithms which must be hand coded. Then the FPGA can give you the most performance, with the highest requirement for development effort. While processors can be debugged using many available tools, the FPGA must be debugged in a slow simulator or, with more difficulty, in a real system.

Of course all of this is a generalization, and any given problem may be an exception. But I think most projects will find this ranking to be true.

--
Rick Collins
redsp@XYusa.net
remove the XY to email me.

Article: 12236
Hello!

Firstly, my English is not very good. Sorry! I'm a Brazilian chemical engineer and I'm a beginner in on-line data acquisition and computer interfacing. I have a board in my PC which contains: an AD converter (12-bit input), a DA converter (8-bit input), a multiplexer, etc. I'm not sure, but I think that the board also has an 8253 programmable timer. I'm confused because I know that a PC has an 8253 timer. I don't know if the board has its own 8253 chip or if it is using the 8253 of the PC. I guess the first option, because in the board's manual (the manual is old, concise and badly written) it is written that counter 0 and counter 1 are linked together (cascaded, chained).

I'd like to measure signals from four pressure transducers. I'm using a software loop to measure time, but I think it is a poor method. I'm trying to use the 8253 timer. My questions:

1) does every ADC or DAC board have a timer chip?
2) why use an 8253 timer? Could I use a loop to simulate the time intervals?
3) how do I program an 8253 timer if I have:
base+12 = counter 0 port
base+13 = counter 1 port
base+14 = counter 2 port
base+15 = counter 3 port

Thanks in advance and, if possible, send me an e-mail too.

Arlan

Article: 12237
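[Regarding question 3 above, a sketch of the byte values one would compute before writing them to the 8253's ports. Assumptions are flagged in the comments: the standard PC input clock of 1.193182 MHz (the board's clock may differ), mode 2 (rate generator), and that base+15 is actually the control-word register (the 8253 has only counters 0-2 plus a control register, so "counter 3 port" in the manual is probably that). All function names here are made up for illustration; the actual port writes would be done with your language's out/outportb equivalent.]

```python
PIT_CLOCK_HZ = 1_193_182  # standard PC 8253 input clock (assumption)

def control_word(counter, mode, rw=0b11, bcd=0):
    """Build the 8253 control word: SC1 SC0 | RW1 RW0 | M2 M1 M0 | BCD.
    rw=0b11 means the count is written as low byte then high byte."""
    return (counter << 6) | (rw << 4) | (mode << 1) | bcd

def divisor(rate_hz, clock_hz=PIT_CLOCK_HZ):
    """16-bit reload count for a desired output rate (mode 2)."""
    n = round(clock_hz / rate_hz)
    if not 2 <= n <= 65536:
        raise ValueError("rate needs cascaded counters or is too fast")
    return n & 0xFFFF  # a full count of 65536 is written as 0

def program_counter0(rate_hz):
    """(port, byte) pairs to write, in order: control word, then low/high count."""
    n = divisor(rate_hz)
    cw = control_word(counter=0, mode=2)   # counter 0, lo/hi bytes, mode 2, binary
    return [("base+15", cw), ("base+12", n & 0xFF), ("base+12", (n >> 8) & 0xFF)]
```

For a 1 kHz sample tick this gives a control word of 0x34 and a count of 1193; rates slower than about 18.2 Hz need the cascaded counter 0 / counter 1 pair the manual mentions.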
There are tricks in hardware that can condense the design. One of the most useful, at least for filters, is distributed arithmetic, which provides a technique to hide the multiplications by rearranging the multiply-accumulates at the bit level. Distributed arithmetic is a bit awkward to describe in a few lines, but I'll do my best.

If we look at an FIR filter, the input is delayed in a tapped delay line. At each tap, a delayed version of the input is multiplied by a possibly unique coefficient. The products from all the taps are then summed to obtain the filter output. Now imagine for a moment that the input is only one bit wide. In that case, the multiply-accumulate can be represented by a look-up table with one input for each tap, and enough output bits to encompass any combination of the coefficients without overflow. This is illustrated by the table below, where the coefficients are labeled A, B, C & D:

inputs  output
0000    0
0001    A
0010    B
0011    A+B
0100    C
0101    A+C
...
1101    A+C+D
1110    B+C+D
1111    A+B+C+D

Note that the multiply-accumulate in this case is accomplished by a 4-input look-up, and is valid regardless of the width of the coefficients (the width of the table needs to be sufficient to hold the sum of any combination of coefficients). Now, in most cases we need inputs that are wider than one bit. In that case, we use an identical instance of the table for each bit of the input to generate a partial sum of products corresponding to each bit. Those partials are then combined in an adder tree, with the inputs to the adder tree shifted to match the bit weights. What we have done is apply the distributive property of addition and multiplication at the bit level to reduce the multiply-accumulate to a look-up table and adder tree.

If speed is not as much of an issue, the input can be presented one bit at a time to the same look-up table, to save logic resources. The table output is then shifted and accumulated until all the bits of the input are accounted for. Even with the input serialized, the result of the whole filter is produced in a number of clock cycles equal to the number of bits in the input, regardless of the number of taps (though the table size grows exponentially with the number of taps). This is a considerable improvement over having to multiply for each tap, as in the method described by M Kartheepan. The size of the table can also be contained by combining the results of smaller tables in an adder. For instance, an eight-input MAC would require a 256-entry table. Alternatively, it can be constructed from two identical 4-input tables (one table addressed by the first 4 inputs, the other by the remaining 4) if the table outputs are summed.

For more detailed info, you might take a look at the Xilinx application note "The Role of Distributed Arithmetic in FPGA-based Signal Processing", which can be found at http://www.xilinx.com/appnotes/theory1.pdf

As far as the time to develop an algorithm in an FPGA goes, a heavily data-path design can be developed reasonably quickly if you are familiar with the FPGA architecture, the tools, and hardware implementation of algorithms, and if your library is reasonably complete. It also helps to do the data-path design using schematics rather than synthesizing it, so that you have control over the design implementation and placement to obtain good performance. This said, I have done fairly well packed (75% and greater utilization) high-performance (40+ MHz) XC4028 data-path designs in under a week (33 FPGA designs completed in the past year).

Another, not so obvious advantage of FPGAs over DSPs for medical applications is that the Food and Drug Administration treats FPGA programs as hardware rather than software. That can lead to a shorter product-approval cycle compared to a similar product using a DSP microprocessor.

M Kartheepan wrote:
> The above statement bothers me a little. If you need to perform the
> multiplications for all taps in parallel, dont you have to have that
> many multipliers in hardware ?
<snip rest of quoted post>

--
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email randraka@ids.net
http://users.ids.net/~randraka

Article: 12238
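[The bit-serial form of the distributed-arithmetic MAC described above can be modeled in a few lines. The coefficient values and the unsigned 8-bit input width are made-up for the example; signed inputs need a subtraction on the sign-bit pass, which is omitted here for brevity.]

```python
B = 8                    # input sample width in bits (unsigned, for simplicity)
coeffs = [3, -1, 4, 2]   # the A, B, C, D of the table above (values made up)

# The 16-entry look-up table from the post: entry i holds the sum of the
# coefficients whose tap bit is set in address i.
lut = [sum(c for k, c in enumerate(coeffs) if (i >> k) & 1) for i in range(16)]

def fir_da(samples):
    """One 4-tap MAC output, computed bit-serially: B table look-ups,
    each shifted to its bit weight and accumulated. No multipliers."""
    acc = 0
    for b in range(B):
        addr = sum(((samples[k] >> b) & 1) << k for k in range(4))
        acc += lut[addr] << b
    return acc

def fir_direct(samples):
    """Reference multiply-accumulate for comparison."""
    return sum(c * x for c, x in zip(coeffs, samples))
```

Because addition and multiplication distribute over the binary weighting of the input, `fir_da` matches `fir_direct` exactly, and the cycle count (B look-ups) is independent of the number of taps, just as the post says.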
Hi everyone. I was wondering if anyone has tried Xilinx's FROM:TO user constraint. I have a VHDL design that I would like to run at 40 MHz (a period of 25 ns). I read on Xilinx's homepage (in one of their online slide presentations) that I can use the following user constraint to ensure that the design will run at 40 MHz from flip-flop to flip-flop:

TIMESPEC TS04 = FROM FFS TO FFS 25ns;

I have a few questions about this constraint.

1) From flip-flop to flip-flop assumes flip-flops connected via the same path?

      25ns        25ns
FF1-------->FF2---------->FF3

2) Does this mean that if the design meets this 25 ns constraint, then data from one flip-flop to another arranged as in 1) will only have 25 ns of delay between them? I'm assuming this is only true for the internal flip-flops, not those connected to IO pads (IPAD to FF, or FF to OPAD).

I tried compiling a design using this constraint, and I got some numbers that I was not sure about. The part I used was an XC4028EX-3 and I got the following results:

Pre place-and-route timing under this constraint: 181.3 MHz
Post place-and-route timing under this constraint: 41.3 MHz

I had also used a 25 ns PERIOD constraint on my main clock net.

Please post a reply to this newsgroup or email me if anyone knows more about this constraint. Many thanks in advance.

--
Nestor Caouras
nestor@ece.concordia.ca
http://www.ece.concordia.ca/~nestor/addr.html
Dept. of Electrical and Computer Eng., Concordia University
1455 de Maisonneuve Blvd (West), Montreal, Quebec, Canada H3G 1M8
Tel: (514) 848-8784  Fax: (514) 848-2802

Article: 12239
Welcome to the wonderful and wacky world of Xilinx constraints!

Nestor Caouras wrote:
>I was wondering if anyone has tried Xilinx's FROM:TO user constraint. I
>have a VHDL design that I would like to run at 40MHz (period of 25ns).
>
>TIMESPEC TS04 = FROM FFS to FFS 25ns;
>
>1) From flip flop to flip flip assumes flip flops connected via the same
>path?

By the same "path," do you mean that FFs 1, 2, and 3 share the same clock? Actually, what the constraint means is that ALL flipflops in your circuit use this constraint (unless you tell it otherwise with different constraints).

>2) Does this mean that if the design meets this 25ns constraint then
>that data from one flip to another arranged as in 1) will only have 25ns
>delay between them?

What it means is that you want to use a 40 MHz clock, and that the delay through the logic between the flops will be constrained to be 25 ns or less. If the place-and-route tools can't make that logic fast enough to meet the constraint, they'll flag it as a constraint not met.

>Pre place-and-route timing under this constraint : 181.3MHz
>Post place-and-route timing under this constraint : 41.3 MHz

The pre-PPR number is simply the clock speed you could run your chip at if there were zero routing delays. It's interesting but not necessarily useful. The post-PPR number is what the actual chip can do - it's the fastest clock you can actually use. You told the tools that you wanted to use a 40 MHz clock, and the tools chugged away and were able to not only meet your constraint but give you a little slack as well. So, if for some reason you wanted to use a 41.3 MHz clock, you could. (If you dig through the timing reports, you'll see that this number comes from the slowest path in your design.)

-andy
-------------------------
Andy Peters
Sr Electrical Engineer
National Optical Astronomy Observatories
950 N Cherry Ave
Tucson, AZ 85719
520-318-8191
apeters@noao.edu

Article: 12240
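[The relationship Andy describes between the reported 41.3 MHz and the 25 ns constraint is just reciprocal arithmetic; a tiny sketch of the slack calculation, with made-up variable names:]

```python
constraint_ns = 25.0                          # the FROM:TO / PERIOD constraint
reported_mhz = 41.3                           # post-route timing report

slowest_path_ns = 1000.0 / reported_mhz       # ~24.2 ns, the slowest FF-to-FF path
slack_ns = constraint_ns - slowest_path_ns    # ~0.8 ns of margin at 40 MHz
```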
Uh, I'm more worried. We've been looking at the output of synthesizers for various encoding schemes and have been finding different levels of robustness. Regardless of feature size, transients of various sorts can happen, say from ESD, and the critical parts of a state machine should continue to function and not lock up or cycle through illegal states. Somehow, I don't see the vendors talk about this when discussing "quality of results."

rk

Jonathan Bromley wrote:
> BUT, BUT, BUT: If you are trying to design a _robust_ system,
> in which illegal states are detected and dealt-with as they
> should be, you will likely end up with unwieldy amounts of
> illegal-state detection logic which will be just as expensive
> of speed and real-estate as the next-state logic for a fully
> encoded system!
<snip rest of quoted post>

Article: 12241
In comp.lang.vhdl Andy Peters <apeters@noao.edu.NOSPAM> wrote:
:>       25ns        25ns
:> FF1-------->FF2---------->FF3
: By same "path," do you mean that FFs 1, 2, and 3 share the same clock?
: Actually, what the constraint means is that ALL flipflops in your circuit
: use this constraint (unless you tell it otherwise with different
: constraints).

Thanks for replying, Andy. Yes, all flip-flops in my design use the same clock, and the data transfer from one flip-flop to another should not exceed 25 ns. My design is heavily pipelined because I wanted to avoid a slow design.

Nestor

Article: 12242
"Edwin Grigorian" <edwin.grigorian@jpl.nasa.gov> wrote:
>I have a design that I'm trying to implement in an Altera Flex10k100A part
>and it requires:
>
> Type A: 3 blocks of 16 x 64 bits RAM = 3 x 1024 = 3072 bits
> Type B: 16 blocks of 8 x 8 bits RAM = 16 x 64 = 1024 bits
> Total RAM = 4096 bits
>
>Now the 10k100A has 12 memory blocks (EABs), each 2048 bits for a total of
>24k bits. Each EAB can be configured in any of the following: 256x8, 512x4,
>1024x2, or 2048x1.
<snip rest of quoted post>

Counting the bits of RAM is meaningless unless you can time-multiplex the access. Each EAB can implement at most one RAM block, and then only 8 bits wide at that. Yes, you are leaving a lot of unused RAM in each EAB since your memories are so shallow. Thus your EAB usage is:

Type A: 3 blocks of 16 x 64 bits RAM = 3 x 8 EAB = 24 EAB (1/16 of each EAB used)
Type B: 16 blocks of 8 x 8 bits RAM = 16 x 1 EAB = 16 EAB (1/32 of each EAB used)
Total RAM = 40 EAB

Possible solutions:
1) Multiplex your memories to get more out of them, especially the A blocks.
2) Use the new 10KE family parts; I believe each EAB in these can do 256x16 of dual-ported memory. I don't have their release schedule at hand - they are very new.
3) Build some RAM out of LEs. These are probably slower and will take a lot of device resources, and you will need a bigger part.
4) Use some other part which is better at building small RAMs.

--
richard_damon@iname.com (Redirector to my current best Mailbox)
rdamon@beltronicsInspection.com (Work Address)
Richad_Damon@msn.com (Just for Fun)

Article: 12243
I work in the field of neural network hardware implementation. I have used the Lattice isp (in-system programmable) FPGAs with excellent results. I am now looking for an FPGA that supports in-system programmability and is supported by software tools that allow me to automate the design of the FPGA architecture with minimal lay (i.e., non-technical) user intervention. Perhaps the design software will allow complex scripting, and/or linking to a high-level language.

The goal is to design a neural hardware system that is general-purpose, multi-configurable and user-friendly. I believe that FPGAs will allow me to do this when linked with a special neural processor. Any ideas, pointers, or feedback will be appreciated.

Ali El-Mousa

Article: 12244
Jonathan Bromley wrote:
. . .
> BUT, BUT, BUT: If you are trying to design a _robust_ system,
> in which illegal states are detected and dealt with as they
> should be, you will likely end up with unwieldy amounts of
> illegal-state detection logic which will be just as expensive
> of speed and real-estate as the next-state logic for a fully
> encoded system!
. . .

I think you are right on the money. When you consider the whole problem, encoding states saves time when it fits and runs fast enough.

-Mike Treseler
Sr. Staff Engineer, Fluke Networks Division
tres@tc.fluke.com

Article: 12245
Jonathan Bromley wrote:
> > will start it off right, but if your application must be sure to
> > recover from faults, you'll need to trap those illegal states.
>
> Exactly so. To do this you need to
> a) determine all the possible cycles of illegal states
>    (don't forget there may be multiple non-intersecting cycles)
> b) invent some combinatorial gubbins that will identify
>    at least one state in EACH of these cycles, and NO states
>    in the wanted cycle
> c) use that logic to force the counter into one of the legal
>    states and/or raise a fault flag
>
> The same argument applies to one-hot state machines and
> indeed any state machine with any illegal states, i.e. any
> (sub)system with N flipflops but fewer than 2^N legal states.

I don't completely agree with your premise that illegal states MUST be trapped, nor with the method of handling them.

My question is: how useful is illegal-state detection when your machine is already screwed up if you are IN an illegal state? If soft errors are a concern, using a fully encoded machine does not solve the error problem. It only increases the chance of jumping to a legal, but incorrect, state. Either way, you have an error in the machine's operation.

If you really need soft-error resistance in your design, you must use redundant circuitry with error detection and correction. I have never explored this type of circuit, but you might be able to use one of the many ECC schemes with a fully encoded state.

            Next      Cur               Corrected
 Inputs     State     State    ECC      Cur       Output
            Logic     FFs      Logic    State     Logic
          +----+    +----+   +----+             +----+
 -------->|    |--->|D  Q|-->|    |---+-------->|    |---->
          |    |    |    |   |    |   |         |    |
     +--->|    |    |>   |   |    |   |         |    |
     |    +----+    +----+   +----+   |         +----+
     +--------------------------------+

The output of the Next State logic will have N bits of state plus M bits of ECC. If you have an error in the logic rather than just the FFs, then I don't know if the ECC circuit can correct for the error.

--
Rick Collins
redsp@XYusa.net
remove the XY to email me.

Article: 12246
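[Rick's ECC-protected state register above can be sketched in software. This is a toy behavioral model in Python, not synthesizable logic; the choice of a Hamming(7,4) code is my assumption, not something from the post. It encodes a 4-bit state with 3 parity bits, injects one flipped "flip-flop", and corrects it before the output logic would see the state.]

```python
# Toy model of an ECC-protected state register: state bits are
# Hamming(7,4)-encoded so any single flipped flip-flop is corrected
# before the output logic sees the current state.
def hamming74_encode(d):
    """Encode a 4-bit state value into a 7-bit codeword (positions 1..7,
    parity bits at positions 1, 2, 4; data at positions 3, 5, 6, 7)."""
    b = [(d >> i) & 1 for i in range(4)]
    p1 = b[0] ^ b[1] ^ b[3]          # covers positions 1,3,5,7
    p2 = b[0] ^ b[2] ^ b[3]          # covers positions 2,3,6,7
    p4 = b[1] ^ b[2] ^ b[3]          # covers positions 4,5,6,7
    return [p1, p2, b[0], p4, b[1], b[2], b[3]]

def hamming74_correct(c):
    """Correct at most one flipped bit in codeword c, return the state."""
    s = 0
    for pos in range(1, 8):
        if c[pos - 1]:
            s ^= pos                 # syndrome = XOR of set bit positions
    if s:                            # nonzero syndrome points at the bad bit
        c[s - 1] ^= 1
    return (c[2] << 0) | (c[4] << 1) | (c[5] << 2) | (c[6] << 3)

state = 0b1011
word = hamming74_encode(state)
word[4] ^= 1                         # single soft-error upset in one state FF
assert hamming74_correct(word) == state
```

As Rick notes, this only covers upsets in the flip-flops themselves; an error inside the next-state logic produces a wrong but valid codeword that no ECC stage can catch.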
elmousa@my-dejanews.com wrote:
>
> I work in the field of neural network hardware implementation. I used the
> Lattice isp (in-system programmable) FPGAs with excellent results.
>
> I am now looking for an FPGA that supports in-system programmability and is
> supported by software tools that can allow me to automate the design
> architecture of the FPGA with minimal lay (i.e., non-technical) user
> intervention. Perhaps the design software will allow complex scripting,
> and/or linking to a high-level language.
>
> The goal is to design a neural hardware system that is general-purpose,
> multiconfigurable and user-friendly. I believe that FPGAs will allow me to do
> this when linked with a special neural processor.
>
> Any ideas, pointers, feedback will be appreciated.

I think I understand what you are trying to do, and it may be possible. The most likely method would be to use your input program to analyze your user requirements and produce VHDL as output. The VHDL can then be compiled to a chip by means of any of the many VHDL compilers available.

The problem with this approach is that you have to work within the subset of VHDL supported for synthesis by your tool. But if you can produce C code from your inputs, you should be able to produce VHDL code. The trick will be producing GOOD VHDL code.

--
Rick Collins
redsp@XYusa.net
remove the XY to email me.

Article: 12247
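[One way to picture the "analyze the spec, emit VHDL" flow suggested above is a small text generator. This Python sketch is purely illustrative: the helper name, entity shape and weight format are invented here, not taken from any real tool.]

```python
# Hypothetical sketch of generating HDL from a user-level description:
# emit a VHDL constant table for a layer of fixed 8-bit signed weights.
def emit_weight_rom(name, weights):
    """Return VHDL text declaring a constant weight table for one layer."""
    lines = [
        f"entity {name} is end {name};",
        f"architecture rtl of {name} is",
        "  type weight_t is array (natural range <>) of integer range -128 to 127;",
        "  constant WEIGHTS : weight_t := (",
        "    " + ", ".join(str(w) for w in weights),
        "  );",
        "begin",
        "end rtl;",
    ]
    return "\n".join(lines)

print(emit_weight_rom("layer0", [12, -7, 33]))
```

A real flow would emit synthesizable arithmetic as well, and, as noted above, would have to stay inside the synthesis subset the downstream VHDL compiler accepts.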
hi rick,

we went through this a number of years ago, and again more recently. you do need to have transitions from all of the states ... they really must be handled. the state info + extra bit approach is not sufficient for all cases. for example, suppose that you have an "upset" on the clock line, giving a runt pulse, tossing you into an illegal state or sequence of states. the edac schemes typically work based on the hamming distance of the code implemented. other examples that can cause multiple upsets are dropouts on the power bus, esd, certain man-initiated events, etc., etc.

more recently i've been looking at what vhdl synthesizers do to state machines and am feeding in simple examples of sequencers. for one-hot state machines, one compiler makes a structure that can either lose its one-hot state or have two hot states chasing each other around. another one makes quite a robust machine which always rights itself; it is not truly one-hot but will right itself no matter what you do to it, at little overhead (and some extra delay for larger numbers of states). for an encoded machine, i have found one compiler eliminate 'unreachable' states by default, along with their transitions, even if you go to the trouble of coding them in. gotta figure out what each version of the synthesizers and optimizers does and how to disable certain parts.

anyways, interesting topic, we're looking into it deeper, open for ideas (email if you wish, remove no-spam).

rk
______________________________________________________

Rickman wrote:
> Jonathan Bromley wrote:
> > > will start it off right, but if your application must be sure to
> > > recover from faults, you'll need to trap those illegal states.
> >
> > Exactly so. To do this you need to
> > a) determine all the possible cycles of illegal states
> >    (don't forget there may be multiple non-intersecting cycles)
> > b) invent some combinatorial gubbins that will identify
> >    at least one state in EACH of these cycles, and NO states
> >    in the wanted cycle
> > c) use that logic to force the counter into one of the legal
> >    states and/or raise a fault flag
> >
> > The same argument applies to one-hot state machines and
> > indeed any state machine with any illegal states, i.e. any
> > (sub)system with N flipflops but fewer than 2^N legal states.
>
> I don't completely agree with your premise that illegal states MUST be
> trapped nor the method of handling them.
>
> My question is, how useful is illegal state detection when your machine
> is already screwed up if you are IN an illegal state? If soft errors are
> a concern, using a fully encoded machine does not solve the error
> problem. It only increases the chance of jumping to a legal, but
> incorrect state. Either way, you have an error in the machine operation.
>
> If you really need soft error resistance in your design, you must use
> redundant circuitry with error detection and correction. I have never
> explored this type of circuit, but you might be able to use one of the
> many ECC schemes with a fully encoded state.
>
>             Next      Cur               Corrected
>  Inputs     State     State    ECC      Cur       Output
>             Logic     FFs      Logic    State     Logic
>           +----+    +----+   +----+             +----+
>  -------->|    |--->|D  Q|-->|    |---+-------->|    |---->
>           |    |    |    |   |    |   |         |    |
>      +--->|    |    |>   |   |    |   |         |    |
>      |    +----+    +----+   +----+   |         +----+
>      +--------------------------------+
>
> The output of the Next State logic will have N bits of state plus M bits
> of ECC. If you have an error in the logic rather than just the FFs, then
> I don't know if the ECC circuit can correct for the error.
>
> --
> Rick Collins
> redsp@XYusa.net
> remove the XY to email me.

Article: 12248
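[The "rights itself" behavior rk describes for the more robust one-hot implementation can be modeled in a few lines. This is a Python behavioral sketch for illustration only, not what any synthesizer actually emits: whenever an upset leaves zero hot bits, or two hot states chasing each other, the register is forced back to a known legal state.]

```python
# Behavioral model of a self-righting one-hot ring: any cycle that finds
# an illegal state (no hot bit, or more than one) recovers to state 0.
def one_hot_next(state, n=4):
    """Advance an n-bit one-hot ring; force a legal state after an upset."""
    if bin(state).count("1") != 1:           # illegal: 0 or 2+ hot bits
        return 1                             # recover to the start state
    return ((state << 1) | (state >> (n - 1))) & ((1 << n) - 1)

s = 0b0010
s = one_hot_next(s)                          # normal advance
assert s == 0b0100
s ^= 0b0001                                  # soft error sets a second hot bit
assert one_hot_next(s) == 0b0001             # machine rights itself next cycle
```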
You can't go wrong with Model Technology's ModelSim (www.model.com) or Simucad Silos III (www.simucad.com). Model Tech will give you a free 30-day eval license, and Simucad has a free version good for 100-200 lines of code. Download them and try them out. Silos III is probably a little cheaper.

Martin Meserve <meserve@my-dejanews.com> wrote in article <3618CD06.41C67EA6@my-dejanews.com>...
> Rick Filipkiewicz wrote:
> >
> > I'm looking for a reasonably priced Verilog simulator to add to our
> > Xilinx Foundation+Express package.
> > So far I can see VeriWell, Chronologic, QuickTurn. Anybody have any
> > comments on these or others. We could go to $5000 which I assume
> > writes off Cadence.
> >
> > Also looking for a Verilog PCI testbench suite.
>
> If that's your budget, yes, you can write off Cadence's Verilog-XL.
> However, you should also write off Chronologic. It's a toss-up
> which one is better, Chronologic or Cadence. They both have their
> good points and their bad points. The last time we received a quote
> from them, Chronologic's was more expensive. By the way, Chronologic's
> VCS is now owned by Synopsys. They have had a tight hold on VHDL
> development tools and now they want to expand to Verilog.
>
> We have both VCS and Verilog-XL. We use VCS for our large ASIC
> development and Verilog-XL for board development, but they can
> be used for either.
>
> One thing that you didn't mention is a waveform viewing tool.
> I have, in the past, developed a medium-sized ASIC with only
> text output (reams and reams of zeros and ones), but it's
> not an easy task. We have a viewer from Summit Design that we use
> with VCS, and we use Cadence's built-in viewer with Verilog-XL.
> They both work well.
>
> If your budget is as tight as you say, take the suggestions from
> the other posters. Otherwise you are going to have to get
> someone to stretch the budget. And don't forget the 15% a year
> for support.
>
> Martin
>
> --
> Martin E. Meserve                                  martin.e.meserve
> Engineer, Program/Project Specialist                     AT
> Lockheed Martin Tactical Defense Systems - AZ          lmco.com

Article: 12249
Botond Kardos wrote:
> .....
> This paper claims that one needs to make an electron microscope shot
> to get every simple antifuse, and these photos are destructive and quite
> expensive, so breaking an Actel antifuse FPGA which contains about 50,000
> antifuses might cost $50 million.
> Is this true? Aren't there other ways of reprogramming or
> eliminating the read-out protection (it also may be a single or more
> antifuses), for example with an ion beam?

Here's my 2p worth: the Actel paper starts out as a reasonable, if superficial, introduction to design security. Then it gradually gets into the realms of the ridiculous as the marketing department takes over. The figure on Actel's slide 33 is actually $500M to crack an antifuse FPGA, based on $1000 per picture and 500,000 antifuses. I bet you anything you like that if you ask for 500,000 pictures the lab will give you a price break :-)

More seriously, I don't think anybody would ever try to take pictures of every antifuse to figure out the configuration of an FPGA. Using an ion-beam machine to get at the programming circuitry would be a good first step: once the programming circuitry is active, the antifuse is no safer than an SRAM.

Tom.
Tom Kean, Director, Algotronix Ltd.
tom@algotronix.com (www.algotronix.com)