Messages from 29925

Article: 29925
Subject: FFT in FPGAs
From: Rick Collins <spamgoeshere4@yahoo.com>
Date: Sun, 18 Mar 2001 04:41:40 -0500

I am looking at performing real data, fixed point FFTs in an FPGA and I
would like to get some info on the processing time and logic size
required. The input data is 14 bit, 2048 points. A standard optimization
for processing real data is to fold the data into the complex input
array, so that you only process a 1024 point FFT and then unfold the
real data in an extra step. We have a DSP available which can do the
final unfolding step. 
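
The fold/unfold trick described above can be sketched in software as follows. This is a minimal illustration, not the code that would run on the DSP: the function names `fft` and `real_fft_folded` are made up for the example, and a plain recursive radix-2 FFT stands in for whatever the FPGA core does. The even samples go into the real part and the odd samples into the imaginary part of a half-length complex array; the unfolding step then separates the two interleaved spectra and recombines them with one twiddle factor per bin.

```python
import cmath

def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def real_fft_folded(x):
    """Spectrum of a real sequence of length 2M using a single M-point complex FFT."""
    m = len(x) // 2
    # Fold: even samples become the real part, odd samples the imaginary part.
    z = [complex(x[2 * i], x[2 * i + 1]) for i in range(m)]
    zf = fft(z)
    out = []
    # Unfold: recover the spectra of the even and odd samples, then combine.
    for k in range(m + 1):
        zk = zf[k % m]
        zc = zf[(m - k) % m].conjugate()
        fe = (zk + zc) / 2      # spectrum of the even-indexed samples
        fo = (zk - zc) / 2j     # spectrum of the odd-indexed samples
        out.append(fe + cmath.exp(-1j * cmath.pi * k / m) * fo)
    return out  # bins 0..M; the remaining bins follow from conjugate symmetry
```

For the 2048-point case in this post, `m` would be 1024, so the FPGA only has to run the 1024-point complex transform and the loop over `k` is the part left to the DSP.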

I checked the Altera web site and found info on their megacore function.
For a 1K FFT, they use about 3000 LE's and 10 block rams (EABs). They
claim the max speed is 90 MHz for 57 uS per block. This is only 3x what
I can get from the DSP chip! 

Is the Altera megacore not highly optimized for speed? Are there other
cores available that can process the data at a higher clock rate? The
data is clocked in at a 100 MHz burst rate. If the core is fully pipelined
and can start another butterfly every 4 clock cycles, it should be able to
process the data in 20 uS. Perhaps that is too much to expect, since
there are log2(N)/2 passes. I would like to process the block in 20 uS.
At that point the processing time becomes insignificant in the overall
process. Is that too much to expect from a hardware solution without
using a thousand dollar chip? 
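
As a rough check on these numbers, the block time can be estimated from the pass count. The sketch below assumes a radix-4 engine (which is what log2(N)/2 passes implies), N/4 butterflies per pass, and a fixed number of clocks per butterfly; the function name and the `engines` parameter are hypothetical, and nothing here is taken from the Altera documentation.

```python
import math

def fft_block_time_us(n_points, clocks_per_butterfly, f_clk_mhz, radix=4, engines=1):
    """Estimated time for one FFT block, in microseconds."""
    passes = round(math.log(n_points, radix))        # log2(N)/2 passes for radix 4
    butterflies_per_pass = n_points // radix
    clocks = passes * butterflies_per_pass * clocks_per_butterfly / engines
    return clocks / f_clk_mhz                        # MHz = clocks per microsecond

# 1024 points, one butterfly started every 4 clocks, single engine
print(fft_block_time_us(1024, 4, 100))               # 51.2
print(round(fft_block_time_us(1024, 4, 90), 1))      # 56.9
print(round(fft_block_time_us(1024, 4, 100, engines=3), 1))
```

Under these assumptions a single butterfly unit needs 5 x 256 x 4 = 5120 clocks, which is 51.2 uS at 100 MHz and about 56.9 uS at 90 MHz, close to the 57 uS figure quoted above. Reaching 20 uS at 100 MHz would then take roughly three butterfly units working in parallel, or a pipeline with one unit per pass.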


-- 

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX

Article: 29926
Subject: Re: FFT in FPGAs
From: Peter Alfke <palfke@earthlink.net>
Date: Sun, 18 Mar 2001 16:10:00 GMT

I think the many fast combinatorial 18 x 18 multipliers in Virtex-II give
it a real advantage. I will try to post data tomorrow ( Monday).
Peter Alfke

Rick Collins wrote:

> I am looking at performing real data, fixed point FFTs in an FPGA and I
> would like to get some info on the processing time and logic size
> required. The input data is 14 bit, 2048 points.

Article: 29927
Subject: Re: FFT in FPGAs
From: Ray Andraka <ray@andraka.com>
Date: Sun, 18 Mar 2001 16:13:41 GMT

Rick,  

"You get what you pay for" applies here. The Xilinx Virtex core is faster, with
~95 MHz sample rates in the slowest speed grade (Virtex -4) parts, but still not
as fast as a truly optimized design (if you ever looked at the floorplan of the
Xilinx macro you'd see what I mean).   We offer a 16 point FFT kernel for Xilinx
Virtex and VirtexE families.  It occupies 20 x 25 CLBs and will run at > 240
MS/S in a VirtexE-8 device. The 16 point kernel plus a CORDIC rotator, block RAM
and some addressing logic will handle 256 and 4K point FFTs either as 2-3 passes
through  the same kernel or using 2-3 kernels at near full rate (the data rate
gets limited by the block RAM access for the larger FFTs).  Right now we don't
have the larger FFTs encapsulated as a core, but we have done the 4K FFTs for a
couple of customers.  Give me a call if you want more info.  VirtexII claims to
do a 1K FFT in 320ns, but I believe that design uses most of the largest
device.  I suspect I could beat that core with mine by putting several of mine
in parallel (both in terms of speed and area).

Rick Collins wrote:
> 
> I am looking at performing real data, fixed point FFTs in an FPGA and I
> would like to get some info on the processing time and logic size
> required. The input data is 14 bit, 2048 points. A standard optimization
> for processing real data is to fold the data into the complex input
> array, so that you only process a 1024 point FFT and then unfold the
> real data in an extra step. We have a DSP available which can do the
> final unfolding step.
> 
> I checked the Altera web site and found info on their megacore function.
> For a 1K FFT, they use about 3000 LE's and 10 block rams (EABs). They
> claim the max speed is 90 MHz for 57 uS per block. This is only 3x what
> I can get from the DSP chip!
> 
> Is the Altera megacore not highly optimized for speed? Are there other
> cores available that can process the data at a higher clock rate? The
> data is clocked in at 100 MHz burst rate, if it is fully pipelined and
> can start another butterfly each 4 clock cycles it should be able to
> process the data in 20 uS. Perhaps that is too much to expect since
> there are log2(N)/2 passes. I would like to process the block in 20 uS.
> At that point the processing time becomes insignificant in the overall
> process. Is that too much to expect from a hardware solution without
> using a thousand dollar chip?

-- 
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com  
http://www.andraka.com

Article: 29928
Subject: Re: Is there any Virtex-II Evaluation Board?
From: "Allan Cantle" <a.cantle@nallatech.com>
Date: Sun, 18 Mar 2001 09:34:42 -0800

Nallatech are offering a Virtex-II evaluation board called the Ballynuey-3, which is largely similar in functionality to the Ballynuey-2. 

See our website www.nallatech.com for up to date info.

Regards

Allan Cantle
Nallatech Ltd

Article: 29929
Subject: Re: FFT in FPGAs
From: Rick Collins <spamgoeshere4@yahoo.com>
Date: Sun, 18 Mar 2001 13:33:30 -0500

Peter Alfke wrote:
> 
> I think the many fast combinatorial 18 x 18 multipliers in Virtex-II give
> it a real advantage. I will try to post data tomorrow ( Monday).
> Peter Alfke
> 
> Rick Collins wrote:
> 
> > I am looking at performing real data, fixed point FFTs in an FPGA and I
> > would like to get some info on the processing time and logic size
> > required. The input data is 14 bit, 2048 points.

Thanks for the suggestion Peter, but I have to use parts that I can get.
I don't remember exactly what has been said about the XC2V introduction
schedule, but I don't see any sign that XC2V parts are remotely
available. I also seem to remember that there are no low cost members of
this family. The approach I want to take with this project is to design
a board that will use low cost parts for a "standard" version, or can be
built with larger, faster FPGAs for "special" needs such as this one.
The XC2V parts aren't pin compatible with XC2S parts, are they? 

I also can't use the XC2S parts or the XCV parts because of the startup
current issue. I only have 2 Amps of max current available and I don't
even know for sure that this can be supplied during the power up ramp.
The Altera parts are MUCH better in this regard. With a total of 5
Xilinx parts on the board, an industrial temperature version of the
board will require 10 AMPS if I use all Xilinx parts. The Altera version
will only use <1.2 AMPS. 

I can consider using a single XC2V part on an optional daughter board if
there is one I know I can get my hands on. Will I be able to get the
XC2V40 or XC2V80 in an FG256 package anytime in the next two months? I
see pricing on the web, but I see no sign of availability. In fact,
Avnet lists it as a special order and the Arrow web site seems to have
forgotten that they sell Xilinx at all. 

You guys may make great parts, but lately they just don't seem to fit on
my boards... 

-- 

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX

Article: 29930
Subject: Re: FFT in FPGAs
From: Rick Collins <spamgoeshere4@yahoo.com>
Date: Sun, 18 Mar 2001 13:42:39 -0500

Ray Andraka wrote:
> 
> Rick,
> 
> "You get what you pay for" applies here. The Xilinx Virtex core is faster, with
> ~95 MHz sample rates in the slowest speed grade (Virtex -4) parts, but still not
> as fast as a truly optimized design (if you ever looked at the floorplan of the
> Xilinx macro you'd see what I mean).   We offer a 16 point FFT kernel for Xilinx
> Virtex and VirtexE families.  It occupies 20 x 25 CLBs and will run at > 240
> MS/S in a VirtexE-8 device. The 16 point kernel plus a CORDIC rotator, block RAM
> and some addressing logic will handle 256 and 4K point FFTs either as 2-3 passes
> through  the same kernel or using 2-3 kernels at near full rate (the data rate
> gets limited by the block RAM access for the larger FFTs).  Right now we don't
> have the larger FFTs encapsulated as a core, but we have done the 4K FFTs for a
> couple of customers.  Give me a call if you want more info.  VirtexII claims to
> do a 1K FFT in 320ns, but I believe that design uses most of the largest
> device.  I suspect I could beat that core with mine by putting several of mine
> in parallel (both in terms of speed and area).
> 
> Rick Collins wrote:
> >
> > I am looking at performing real data, fixed point FFTs in an FPGA and I
> > would like to get some info on the processing time and logic size
> > required. The input data is 14 bit, 2048 points. A standard optimization
> > for processing real data is to fold the data into the complex input
> > array, so that you only process a 1024 point FFT and then unfold the
> > real data in an extra step. We have a DSP available which can do the
> > final unfolding step.
> >
> > I checked the Altera web site and found info on their megacore function.
> > For a 1K FFT, they use about 3000 LE's and 10 block rams (EABs). They
> > claim the max speed is 90 MHz for 57 uS per block. This is only 3x what
> > I can get from the DSP chip!
> >
> > Is the Altera megacore not highly optimized for speed? Are there other
> > cores available that can process the data at a higher clock rate? The
> > data is clocked in at 100 MHz burst rate, if it is fully pipelined and
> > can start another butterfly each 4 clock cycles it should be able to
> > process the data in 20 uS. Perhaps that is too much to expect since
> > there are log2(N)/2 passes. I would like to process the block in 20 uS.
> > At that point the processing time becomes insignificant in the overall
> > process. Is that too much to expect from a hardware solution without
> > using a thousand dollar chip?

Ray,

I may well be calling you in the next couple of days, but I just don't
think I can use a Xilinx part for this unless I find a "special" spot on
the board. I am in the process of designing a "standard" board product
and am trying to use it in a "custom" application. In the standard mode,
I want to use 5 FPGAs on the board since four of them are used as IO
controllers for field replaceable daughter boards. The board is
generating its own 3.3 and x.x volt power from a 5 volt input. So we
can't use the XC2S or XCV parts because of the startup current problem. 

I would consider the XC2V parts since they do seem like a significant
advance in capability. But the price is too high to use them in the
"standard" version of the board. So the only way I could use a Xilinx
part is to put it on a daughterboard as a "special" IO feature. I will
consider this, but I prefer to use the FPGAs I already have on the main
board, possibly bumping the size of the part. 

So have you done much with the Altera parts?


-- 

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX

Article: 29931
Subject: VHDL code required for a given decimator system
From: nospam@nospam.net (David Nyarko)
Date: Sun, 18 Mar 2001 20:51:58 GMT

Hi,
Any leads on VHDL code to implement the following system:

Inputs
Inputseq: A sequence of standard logic vector elements each of the
same width representing 2's complement inputs typically 16-bits wide. 

clock: sysclk

Each input sequence element is clocked in on the rising edge of
sysclk.

Outputs:
outputA: same size and type as input
outputB: same size and type as input
outclock: sysclk/2


Assuming the input sequence elements are represented as:

x0,x1,x2,x3,x4,x5,x6,x7...

The desired output elements should be:

outputA: x0,-x2,x4,-x6,x8,...
outputB: x1,-x3,x5,-x7,x9,...

The outputs are at half the input rate and should 
appear at the same time (i.e. x0 and x1, -x2 and -x3, etc.).
The output clock (outclock) will be used to clock
the next processing stage, whose inputs will
be outputA and outputB.
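
Before writing the VHDL, the required behavior can be pinned down with a small software reference model, which is also handy for generating test-bench vectors. The sketch below is only such a model: the names `wrap` and `decimate_pairs` are invented here, and it assumes that negation wraps in two's complement, so the most negative input maps to itself as it would in plain hardware without saturation.

```python
def wrap(value, width=16):
    """Wrap an integer to a signed two's-complement value of the given width."""
    mask = (1 << width) - 1
    value &= mask
    return value - (1 << width) if value >> (width - 1) else value

def decimate_pairs(samples, width=16):
    """Split the input into (outputA, outputB) pairs, negating every second pair."""
    out_a, out_b = [], []
    for pair_index in range(len(samples) // 2):
        sign = -1 if pair_index % 2 else 1          # +, -, +, - ... per output sample
        out_a.append(wrap(sign * samples[2 * pair_index], width))
        out_b.append(wrap(sign * samples[2 * pair_index + 1], width))
    return out_a, out_b

a, b = decimate_pairs(list(range(10)))
print(a)   # [0, -2, 4, -6, 8]
print(b)   # [1, -3, 5, -7, 9]
```

In hardware this corresponds to holding the even sample for one sysclk, toggling a sign flag once per output pair, and registering both outputs together at sysclk/2.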


David

Article: 29932
Subject: TBUFs in Virtex and later chips, going out of fashion, what instead
From: Neil Franklin <neil@franklin.ch.remove>
Date: 19 Mar 2001 00:02:36 +0100

In the earlier Xilinx chips (3000, 4000, 5200) there are always 2
TBUFs per CLB of 2 LUT+FF, and both TBUF lines can be read.

In Virtex and Spartan-II it is down to 2 TBUFs for the 4 LUT+FF of a
CLB, so only one slice can be routed to them, and there is only 1 line
for reading back from the TBUF lines.

In Virtex-II there are only 2 TBUFs for 8 LUT+FF per CLB, which
makes it impossible even to connect the output of the 4-wide 2 slices
of a single carry chain to a bus. The data sheet does not give the
number of readbacks.

From this I get the impression that Xilinx regards TBUF buses as going
out of fashion. After all, the TBUFs' cost in chip space is next to nothing
relative to the many PIPs (about 900 per CLB in Virtex).


In the Jan Gray RISC processors TBUFs are used to implement processor
internal data buses in no space. I have the same type of situation,
with many data producing elements to be selected from. TBUFs seem to
be ideal _horizontal_ wide AND-ORs (vertical is being used for the bits,
because of the carry chains).
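
To make the "wide AND-OR" view concrete, here is a tiny behavioral model of such a bus. It is only an illustration with a made-up function name: each source is gated by its own enable and everything is ORed onto the shared line, which with one-hot enables behaves like a multiplexer.

```python
def and_or_bus(sources, enables):
    """OR together every source whose enable is set (one-hot enables act as a mux)."""
    result = 0
    for value, enable in zip(sources, enables):
        if enable:
            result |= value
    return result

print(hex(and_or_bus([0x12, 0x34, 0x56], [0, 1, 0])))   # 0x34
```

A TBUF line gives this selection without using LUTs; building the same function out of CLB logic needs gating and OR stages for every bit of the bus.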

So I have a question: What is the Xilinx-suggested replacement for TBUFs?
Is one supposed to use MUXes implemented in the CLBs? Is there another
trick I have not yet stumbled over?


Note that I need to use Spartan-II, as Spartan is too small and JBits
only runs on Virtex and Spartan-II anyway.


--
Neil Franklin, neil@franklin.ch.remove http://neil.franklin.ch/
Hacker, Unix Guru, El Eng HTL/FH/BSc, Sysadmin, Roleplayer, LARPer

Article: 29933
Subject: Re: FFT in FPGAs
From: Peter Alfke <palfke@earthlink.net>
Date: Mon, 19 Mar 2001 00:30:54 GMT

Rick Collins wrote:

> Thanks for the suggestion Peter, but I have to use parts that I can get.
> I don't remember exactly what has been said about the XC2V introduction
> schedule, but I don't see any sign that XC2V parts are remotely
> available. I also seem to remember that there are no low cost members of
> this family.

As you know, I am neither in Marketing nor in Sales, but:
We are just finishing a project that puts hundreds of evaluation boards in the
hands of our FAEs, and each board has an XC2V40 on it (BG256). So these parts
are available, and I have heard prices of $40 going down to $10.  XC2V1000s
are also becoming available, and also come in the same package
(pin-out-compatible).
You also don't have to worry about start-up current in XC2V devices; that
problem has been thoroughly licked, and the start-up current is something
like 40 mA.
You should consider the XC2V1000 available, and you may want to play with the
XC2V40 to get a feel for the new features ( Clock management, large BlockRAMs
and multipliers, digitally-controlled output impedance = built-in series
termination ) Every feature of the larger parts, even the 16 global clocks
with glitch-free input muxes, is available in the tiny XC2V40.  Stop me, I
just came back from a seminar tour...

Peter Alfke, Xilinx Applications

Article: 29934
Subject: Re: Spartan II power
From: hmurray-nospam@megapathdsl.net (Hal Murray)
Date: Mon, 19 Mar 2001 06:57:03 -0000


>I would love to put all of these designs into a single part to save
>board space, chip cost, power and save on procurement and assembly cost.
>But to make my system work, I will need supported, partial
>reconfiguration. I would need to load a portion of the chip with a main
>(static) function, and four modules to match the IO connected to the
>board. 

Here is a probably crazy suggestion...

Do you have a CPU handy that can help with the initialization?
In particular, can it do serious computation to set up the right
bit pattern to feed to the FPGA?

How many different configurations do you have in your 4 IO
modules?

First suggestion is to stuff everything into one chip and
then set up some script to make the bit pattern for all the
interesting combinations of IO modules.  Then just load the
right one.

I'm assuming you can set up some scripts to make the required
configurations so you don't have to do it by hand.

You can probably save disk/ROM space (if that's interesting)
by diffing various configurations and reconstructing the ones
you need on the fly.


Here comes some serious handwaving...

Suppose a 1 in the configuration file means that a pass transistor
is turned on.  Then you can merge two designs by ORing the bits
together.

So you might be able to make a basic design and allocate space in the
big chip for the IO modules.  As long as each IO module didn't use
any resources outside the allocated space, it couldn't conflict with
other IO modules.  That may not work for long lines.

The idea is to make a basic module with the don't-discard-unused
parts option, save that.  Then make another module with each
IO module in each of the locations, diff against the basic module
and save the difference.  You probably have to inspect the result
by hand to verify that nothing is outside the space you allocated.

If that all works, you can make a custom module by just ORing
the appropriate IO module/slot combinations into the basic module.
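
The diff-and-OR bookkeeping can be sketched like this. It is a toy model under strong assumptions: the configurations are treated as equal-length raw byte arrays in which a 1 bit only ever adds a connection, and all function names are invented for the example. Real configuration files typically also carry headers and checksums, which this ignores.

```python
def diff_bits(base, variant):
    """Bits that are set in the variant but not in the basic module."""
    return bytes(v & ~b & 0xFF for b, v in zip(base, variant))

def removes_base_bits(base, variant):
    """True if the variant cleared any bit of the basic module (OR-merging would then be wrong)."""
    return any(b & ~v & 0xFF for b, v in zip(base, variant))

def overlaps(delta_a, delta_b):
    """True if two module deltas touch the same configuration bit."""
    return any(a & b for a, b in zip(delta_a, delta_b))

def merge(base, *deltas):
    """OR the chosen IO module/slot deltas into the basic module."""
    merged = bytearray(base)
    for delta in deltas:
        for i, byte in enumerate(delta):
            merged[i] |= byte
    return bytes(merged)

base = bytes([0b11000000, 0x00, 0x00])
with_a = bytes([0b11000011, 0x00, 0x00])    # basic module plus IO module A in slot 0
with_b = bytes([0b11000000, 0x0F, 0x00])    # basic module plus IO module B in slot 1
delta_a, delta_b = diff_bits(base, with_a), diff_bits(base, with_b)
assert not overlaps(delta_a, delta_b)
custom = merge(base, delta_a, delta_b)
```

The `removes_base_bits` and `overlaps` checks automate part of the inspection step mentioned above, but they cannot tell whether a delta strays outside its allocated area of the chip.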

-- 
These are my opinions, not necessarily my employer's.  I hate spam.

Article: 29935
Subject: Re: Parallel Port EPP
From: hmurray-nospam@megapathdsl.net (Hal Murray)
Date: Mon, 19 Mar 2001 07:04:06 -0000

>I'd like to use a CPLD/FPGA (Xilinx) to receive data from the parallel
>port (EPP-mode) of my PC.
>
>Is it good style to react directly to the edges of the port signals (e.
>g. address/data strobes) or would it be better to use a fast PLD clock to
>sample the port and then to evaluate the signals in clocked logic?

I think you didn't provide a critical chunk of information.  What
are your goals/priorities?

What is the relative importance of bandwidth, correctness, and design time?
...


If you run all the async signals through the standard pair of FFs
then you (probably) won't have any problems from metastability.

That will cost you 1.5 cycles (average) of round trip time which
turns into reduced bandwidth.

If your top goal is max throughput, then you are almost forced
to use some kludgy logic driven off the strobes.  Fortunately,
that is (probably) small enough that you can get it right.

-- 
These are my opinions, not necessarily my employer's.  I hate spam.

Article: 29936
Subject: Re: Parallel Port EPP
From: hmurray-nospam@megapathdsl.net (Hal Murray)
Date: Mon, 19 Mar 2001 07:10:50 -0000

>                        +------------|&|----------> CE
>                        |             |
>              |     |---+--->|     |  |
>Strobe ------>| FF1 |        | FF2 |  |
>              |     |        |     |O-+
>CLK -------------^--------------^
>
>I'll sample the strobe and data signals with the same clock. The strobe signal
>is shifted through two FF's in series.
>These two FF's generate the CE  (FF1 AND NOT FF2, rising edge) signal for the
>outgoing data.

That's the classic way to get metastability troubles.

It will work fine if your clock rate is slow enough.  But CE goes
to the whole data register so it is likely to have longer routing.
That would set off my alarm bells.
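
One common way to address this is to put a synchronizer stage in front of the edge detector, so that CE is built only from signals that have already been resampled. The cycle-level model below illustrates that structure; it is a sketch with an invented function name, it cannot model metastability itself, and it only shows the logic and the added latency. With `sync_stages=1` it reproduces the quoted circuit (FF1 AND NOT FF2).

```python
def ce_pulses(strobe_samples, sync_stages=2):
    """CE value after each clock edge, given the strobe level seen at that edge."""
    # sync_stages synchronizer flip-flops followed by one edge-detect flip-flop
    chain = [0] * (sync_stages + 1)
    ce = []
    for level in strobe_samples:
        # All flip-flops update together on the rising clock edge.
        chain = [level] + chain[:-1]
        # Rising edge: last synchronizer stage high, edge-detect stage still low.
        ce.append(chain[-2] & (1 - chain[-1]))
    return ce

print(ce_pulses([0, 0, 1, 1, 1, 0, 0]))                  # [0, 0, 0, 1, 0, 0, 0]
print(ce_pulses([0, 0, 1, 1, 1, 0, 0], sync_stages=1))   # [0, 0, 1, 0, 0, 0, 0]
```

The extra stage costs one more clock of latency before CE fires, so the data registers need to be sampled or delayed accordingly.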


-- 
These are my opinions, not necessarily my employer's.  I hate spam.

Article: 29937
Subject: Re: FFT in FPGAs
From: Rick Collins <spamgoeshere4@yahoo.com>
Date: Mon, 19 Mar 2001 02:37:43 -0500

Peter Alfke wrote:
> 
> Rick Collins wrote:
> 
> > Thanks for the suggestion Peter, but I have to use parts that I can get.
> > I don't remember exactly what has been said about the XC2V introduction
> > schedule, but I don't see any sign that XC2V parts are remotely
> > available. I also seem to remember that there are no low cost members of
> > this family.
> 
> As you know, I am neither in Marketing nor in Sales, but:
> We are just finishing a project that puts hundreds of evaluation boards in the
> hands of our FAEs, and each board has an XC2V40 on it (BG256). So these parts
> are available, and I have heard prices of $40 going down to $10.  XC2V1000s
> are also becoming available, and also come in the same package
> (pin-out-compatible).
> You also don't have to worry about start-up current in XC2V devices, that
> problem has been thoroughly licked, the start-up current is something like 40
> mA.
> You should consider the XC2V1000 available, and you may want to play with the
> XC2V40 to get a feel for the new features ( Clock management, large BlockRAMs
> and multipliers, digitally-controlled output impedance = built-in series
> termination ) Every feature of the larger parts, even the 16 global clocks
> with glitch-free input muxes, is available in the tiny XC2V40.  Stop me, I
> just came back from a seminar tour...
> 
> Peter Alfke, Xilinx Applications

Sometimes I get really bummed out when I just can't find a way to make
something work that would be so perfect.  The availability is likely not
a real problem except that I am a bit too cautious to commit to a part
and then not be able to get what I need for production.  In this case
production is at least 6 months away. So if the XC2V40 and XC2V1000 parts
were available now and I had some reason to believe that I could get them
at reasonable prices by the point of production (like a quote) I would
love to design them in.  But one thing I forgot was that I need to
interface to a 5 volt PC/104 bus.  The 5 volt IO would make the design
much more complex.  I would have to add another power rail for an XC2S
part or add many buffer parts. Neither one is very workable. 

I seem to remember that one of the V parts was 5 volt TTL compatible if
you added series resistors to limit the current. But that would mean
some 90+ extra resistors on the board! But it might work. Will the XC2V
work this way?

BTW, how can the part be PCI compliant without being 5 volt tolerant? Is
it only 3 volt PCI compliant? 

-- 

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX

Article: 29938
Subject: Re: RAM-based Shift Register
From: Heinrich Fonfara <heinrich.fonfara@ibmt.fhg.de>
Date: Mon, 19 Mar 2001 09:37:30 +0100



Hi,

Thanks to all for the helpful advice, which has made things clearer.

Heinrich

Article: 29939
Subject: Re: xilinx Webpack missing speed grade
From: Nicolas Matringe <nicolas.matringe@IPricot.com>
Date: Mon, 19 Mar 2001 09:41:43 +0100

Eric Smith wrote:

> WebPACK does NOT support the Virtex parts.

I was sure I had read that it supported the whole Virtex family... I'll
see if I can find it.

I think I'll go back to my old Foundation 2.1i

-- 
Nicolas MATRINGE           IPricot European Headquarters
Conception electronique    10-12 Avenue de Verdun
Tel +33 1 46 52 53 11      F-92250 LA GARENNE-COLOMBES - FRANCE
Fax +33 1 46 52 53 01      http://www.IPricot.com/

Article: 29940
Subject: video coding
From: Frode Vatvedt Fjeld <frodef@acm.org>
Date: 19 Mar 2001 10:06:53 +0100

Does anyone know of any texts concerning implementing digital video
coding (compression, DCT etc.) on FPGAs?

Thanks,
-- 
Frode Vatvedt Fjeld

Article: 29941
Subject: Re: xilinx Webpack missing speed grade
From: Nicolas Matringe <nicolas.matringe@IPricot.com>
Date: Mon, 19 Mar 2001 10:12:51 +0100

Eric Smith wrote:

> WebPACK does NOT support the Virtex parts.  The only FPGAs
> WebPACK supports are the Spartan II and a single Virtex-E part,
> the XCV300E.

I was only talking about the Floorplanner:

Floorplanner Guide

Chapter 1: Introduction 

Supported Architectures
The Floorplanner supports all Xilinx architectures in the Spartan/-II™,
Virtex/-E/-II™, and XC4000™ device families.

(quoted from the WebPACK help)

-- 
Nicolas MATRINGE           IPricot European Headquarters
Conception electronique    10-12 Avenue de Verdun
Tel +33 1 46 52 53 11      F-92250 LA GARENNE-COLOMBES - FRANCE
Fax +33 1 46 52 53 01      http://www.IPricot.com/

Article: 29942
Subject: Virtex gate count...?
From: Paweł J. Rajda <pjrajda@uci.agh.edu.pl>
Date: Mon, 19 Mar 2001 14:35:18 +0100

What is the difference between System Gates (Virtex Data Sheet) and
Logic Gate Equivalent (MAP report)? Does the LGE include BlockRAMs?
How is the LGE computed?

--
Regards,
Pawel J. Rajda

-----------------------------------------------------------------------------

Pawel J. Rajda, MSc. E.E.         mail: pjrajda@uci.agh.edu.pl
Dept. of Electronic Engineering    www:  http://galaxy.uci.agh.edu.pl/~pjrajda
AGH Technical University           tel: (+48-12) 617 3980
Al. Mickiewicza 30                 fax: (+48-12) 633 2398
30-059 Cracow, POLAND
-----------------------------------------------------------------------------

Article: 29943
Subject: Re: TBUFs in Virtex and later chips, going out of fashion, what instead
From: "Jan Gray" <jsgray@acm.org>
Date: Mon, 19 Mar 2001 06:28:08 -0800

Indeed, the halving of TBUFs/LUT in Virtex, and again in V-II, makes my
datapaths larger/less functional per LUT, compared with XC4000.  (Consider
what happens to the result mux in the xr16 CPU datapath schematic S3 on p. 9
of www.fpgacpu.org/papers/xsoc-series-drafts.pdf, for example.  Not to
mention the "zero cost" <<2, <<4, <<8, >>2, >>4, >>8 shifters and bus
byte/word/longword resizers you can build with spare columns of TBUFs.)

Reoptimizing for Virtex has been a chore.  (Another setback in Virtex vs.
4000 was the loss of independent clock inversion on LUT RAM WCLKs and LUT FF
CLKs, but that's another story.)

But for V-II there seems to be no practical alternative but to a) use
(waste) LUTs and their interconnect to build these horizontal muxes, and/or
b) recode your design to help your technology mappers merge some of the
muxes into other logic.

Regarding (b), using the Virtex-style carry logic (including MULT_AND), it
seems possible to build these "free mux" structures:
  1) o[i] = addsub ? (a[i] + b[i]) : (a[i] - b[i])
  2) o[i] = add ? (a[i] + b[i]) : c[i]
  3) o[i] = addb ? (a[i] + b[i]) : (a[i] + c[i])
  4) o[i] = addsub ? (addand ? a[i]+b[i] : a[i]-b[i])
                   : (addand ? a[i]&b[i] : a[i]^b[i])

See http://www.fpgacpu.org/log/nov00.html#001112 for details.

Synthesis tools get (1) (usually) but (as far as I know) miss the others.

Consider case (2). An add followed by a mux would seem to be a pretty common
circuit structure, and therefore important to optimize.  Surely using the
single-LUT-per-bit construction is a no-brainer, right?  Not so fast!  There
are some tools issues.

If you inefficiently implement this as two LUTs:
    t = a[i] + b[i];
    o[i] = add ? t : c[i]
then trce will "see" that the latency from c[i] to o[i] is Tilo.  Good.

But if you implement it in one LUT as
    o[i] = add ? (a[i] + b[i]) : c[i]
e.g.
    o[i] = add&(a[i]^b[i]) + ~add&c[i]
along with the appropriate configuration of MULT_AND, MUXCY, and XORCY, then
(if I recall correctly) trce will also find false ripple-carry paths from
c[i], e.g. from c[0] to o[31], which would therefore interfere with correct
static timing analysis and with timing driven placement and routing.  Oops!

Therefore, I would like to see two tools enhancements to enable correct
inference of add/mux in one LUT per bit:

a) Xilinx should enhance trce to do a more precise analysis around
ripple-carry structures, e.g. to rule out the false path from c[i] through
the carry chain to o[i+1]...o[n].  Here with 'add' feeding MULT_AND, there
is no carry-out if 'add' is false, and also, c[i] does not influence the
carry-out when 'add' is true, and thus the MUXCY carry-out does not depend
upon c[i].

b) Xilinx should lobby its synthesis partners to infer add/mux structures
like (2)-(4) when possible.  Or encourage a user-directive to force it.

If both (a) and (b) were done, then Xilinx customers (synthesis users and
RPM builders alike) would probably enjoy somewhat smaller and faster results
in the devices they're already using.

This add/mux inference digression aside, abundant TBUFs were useful and will
be missed.  But I suppose that any FPGA feature that HDL synthesis users and
tools do not take good advantage of, is not long for this world.

Jan Gray, Gray Research LLC

Article: 29944
Subject: about placement and routing
From: MANJUNATHAN <manjunathan_s1@yahoo.com>
Date: Mon, 19 Mar 2001 06:41:44 -0800

Hello everybody,

I need to know how to place and route the design given below.

The code shown below has to work at 100 MHz, but when I synthesized it with Xilinx Foundation Series 2.1 it reported 32 MHz. When I synthesized the same code again, it reported a working frequency of 40 MHz.
Why this difference? Is it possible to make it run at 100 MHz?

Is it possible to place and route the design in a Virtex device such that its working frequency is 100 MHz?

The code is:

LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;
USE IEEE.STD_LOGIC_UNSIGNED.ALL;

ENTITY ADDRESS_GENERATOR is
  port (
    sfp          : in  std_logic;
    clk          : in  std_logic;
    reset        : in  std_logic;
    READ_address : out std_logic_vector(10 downto 0)
  );
END ADDRESS_GENERATOR;

ARCHITECTURE BEHAV OF ADDRESS_GENERATOR IS

  signal read_address_s : std_logic_vector(10 downto 0);

begin

  process (clk, reset)
  begin
    if reset = '1' then
      -- asynchronous reset to address 0
      read_address_s <= (others => '0');
    elsif clk = '1' and clk'event then
      if sfp = '1' then
        -- preload address 24
        read_address_s <= "00000011000";
      elsif read_address_s = "10000110111" then
        -- wrap to 0 after address 1079
        read_address_s <= (others => '0');
      else
        read_address_s <= read_address_s + 1;
      end if;
    end if;
  end process;

  read_address <= read_address_s;

end behav;

configuration cfg_address_generator_behav of address_generator is
  for behav
  end for;
end cfg_address_generator_behav;

Is there any material, or are there web sites, on how to place and route the design into the Virtex device such that its working frequency is very high?

Thanks in advance

regards
Manjunathan
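
One restructuring that is often tried for counters like the one posted above is to take the 11-bit terminal-count compare out of the path that selects the next address, by registering the wrap decision one cycle early. The Python model below is only a sketch of that idea, not a diagnosis of the 32 MHz and 40 MHz figures reported by the tool. The names `step_original`, `step_registered_tc`, `tc` and `compare` are hypothetical and do not appear in the posted VHDL; the model just checks that both versions produce the same address sequence.

```python
import random

LOAD_VALUE = 0b00000011000   # decimal 24, loaded when sfp = '1'
TERMINAL   = 0b10000110111   # decimal 1079, last address before the wrap to 0


def step_original(cnt, sfp):
    """Next address exactly as written in the posted process."""
    if sfp:
        return LOAD_VALUE
    if cnt == TERMINAL:
        return 0
    return cnt + 1


def step_registered_tc(cnt, tc, sfp):
    """Variant where the wrap decision comes from a flip-flop 'tc'.

    tc is kept true exactly when cnt == TERMINAL, so the next-address
    mux is steered by a register output instead of an 11-bit compare.
    """
    if sfp:
        nxt = LOAD_VALUE
    elif tc:
        nxt = 0
    else:
        nxt = cnt + 1
    # Only the increment branch can reach TERMINAL, so look one count ahead.
    tc_next = (not sfp) and (not tc) and cnt == TERMINAL - 1
    return nxt, tc_next


def compare(cycles=20000, seed=1, p_sfp=0.001):
    """Run both versions from reset with the same sfp stimulus."""
    rng = random.Random(seed)
    a = 0                 # original counter after reset
    b, tc = 0, False      # variant after reset
    for _ in range(cycles):
        sfp = rng.random() < p_sfp
        a = step_original(a, sfp)
        b, tc = step_registered_tc(b, tc, sfp)
        if a != b:
            return False
    return True


if __name__ == "__main__":
    print(compare())            # random sfp pulses
    print(compare(p_sfp=0.0))   # free-running, exercises the wrap
```

Whether this helps in a given device depends on the synthesis and place-and-route results; with a period constraint applied, the original description may already meet timing.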

Article: 29945
Subject: Spartan-II VREF and VCCO
From: Kolja Sulimma <kolja@bnl.gov>
Date: Mon, 19 Mar 2001 15:52:33 +0100
Links: << >> << T >> << A >>

Hello all,

I have two questions regarding Xilinx Spartan-II I/O.

1. Abuse of VRef as differential input.

I need one high quality low jitter input clock. (<50ps RMS Jitter)
I found a couple of Clock Synthesizers with PECL outputs that have a RMS
jitter down to 2.6ps.
Now I am wondering how to interface PECL to a Spartan-II.
Of course I could buy a PECL to CMOS converter. I could also use
Virtex-II or Virtex-E but engineering
is the art of building what you need with what you have, therefore I
would like to know:

- Could I set the VRef to 2.8V and use one of the PECL signals as a single
ended clock input? (2.3V to 3.3V signal)
- Could I connect VRef of one bank to the inverted CLK signal and GCLK
to the positive CLK signal and get a
  differential input as a result? (I have a lot of unused I/O and can
spare a bank)

2. Unused VCCO

I am using a PQ208 package where the VCCO of all banks are internally
tied together.
However I am only using the outputs of two of the I/O banks. Is it
sufficient to externally connect VCCO of these two banks and leave the
unused banks externally unconnected to simplify the layout?

Thanks in advance,

Kolja

Article: 29946
Subject: Re: Virtex USB solution
From: Kolja Sulimma <kolja@bnl.gov>
Date: Mon, 19 Mar 2001 15:57:19 +0100
Links: << >> << T >> << A >>


Also, there are USB parts around that are not much more expensive than a
configuration PROM.
I think there are a lot of USB applications where an FPGA that receives its
bitstream from the USB driver
is a good way to go.

Kolja

> Believe me, I prefer to do FPGA designs.  But using this type of part
> makes a whole lot more sense than trying to implement USB in an FPGA.  I
> have the product concept nailed down, and I think the TUSB3200 is the
> way to go.
>
> -a

Article: 29947
Subject: Re: TBUFs in Virtex and later chips, going out of fashion, what instead
From: Ray Andraka <ray@andraka.com>
Date: Mon, 19 Mar 2001 15:08:37 GMT
Links: << >> << T >> << A >>



Jan Gray wrote:
> 
> Indeed, the halving of TBUFs/LUT in Virtex, and again in V-II, make my
> datapaths larger/less functional per LUT, compared with XC4000.  (Consider
> what happens to the result mux in the xr16 CPU datapath schematic S3 on p. 9
> of www.fpgacpu.org/papers/xsoc-series-drafts.pdf, for example.  Not to
> mention the "zero cost" <<2, <<4, <<8, >>2, >>4, >>8 shifters and bus
> byte/word/longword resizers you can build with spare columns of TBUFs.)
> 
> Reoptimizing for Virtex has been a chore.  (Another setback in Virtex vs.
> 4000 was the loss of independent clock inversion on LUT RAM WCLKs and LUT FF
> CLKs, but that's another story.)

Another drawback to the Virtex is that you no longer get the carry chain for
free for functions where you are only interested in the carry out.  As a result,
something like a saturating limiter that was able to be implemented in one
column of CLBs in 4K, now takes two columns of slices with the LUTs in the first
used as pass-throughs to the carry chain :-(

> 
> But for V-II there seems to be no practical alternative but to a) use
> (waste) LUTs and their interconnect to build these horizontal muxes, and/or
> b) recode your design to help your technology mappers merge some of the
> muxes into other logic.

I haven't looked at it closely, but it seems to me that you might be able to use
the horizontal OR chains for this.  Have you investigated it?

> 
> Regarding (b), using the Virtex-style carry logic (including MULT_AND), it
> seems possible to build these "free mux" structures:
>   1) o[i] = addsub ? (a[i] + b[i]) : (a[i] - b[i])
>   2) o[i] = add ? (a[i] + b[i]) : c[i]
>   3) o[i] = addb ? (a[i] + b[i]) : (a[i] + c[i])
>   4) o[i] = addsub ? (addand ? a[i]+b[i] : a[i]-b[i]) : (addand ? a[i]&b[i]
> : a[i]^b[i])
> 
> See http://www.fpgacpu.org/log/nov00.html#001112 for details.
> 
> Synthesis tools get (1) (usually) but (as far as I know) miss the others.
> 
> Consider case (2). An add followed by a mux would seem to be a pretty common
> circuit structure, and therefore important to optimize.  Surely using the
> single-LUT-per-bit construction is a no-brainer, right?  Not so fast!  There
> are some tools issues.

Jan, you are correct.  The tools do not properly infer this (as well as certain
adds/counters with resets if they are anything but a dirt simple
adder/increment).  This, and ability to direct placement are some reasons I
often use instantiated circuits within a generate instead of the more readable
inferred logic.

> 
> If you inefficiently implement this as two LUTs:
>     t = a[i] + b[i];
>     o[i] = add ? t : c[i]
> then trce will "see" that the latency from c[i] to o[i] is Tilo.  Good.
> 
> But if you implement it in one LUT as
>     o[i] = add ? (a[i] + b[i]) : c[i]
> e.g.
>     o[i] = add&(a[i]^b[i]) + ~add&c[i]
> along with the appropriate configuration of MULT_AND, MUXCY, and XORCY, then
> (if I recall correctly) trce will also find false ripple-carry paths from
> c[i], e.g. from c[0] to o[31], which would therefore interfere with correct
> static timing analysis and with timing driven placement and routing.  Oops!

Yep.  TRCE doesn't do anything in the way of analyzing the logic in the
circuit.  It just adds delays between FFs.  If you are careful with the
constraints, you can block the false path, but it usually doesn't warrant the
effort or the added potential for accidentally ignoring a valid path.
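
The false-path argument for case (2) can be checked with a small bit-level model. The sketch below assumes a simplified view of the Virtex carry logic: the LUT output drives the MUXCY select and one XORCY input, MULT_AND computes add & a[i], and the carry-in to bit 0 is tied low. The function names `addmux` and `check` are hypothetical. It models logic only, not timing, and confirms for a 4-bit width that the result equals add ? (a + b) : c and that no carry value ever depends on c.

```python
from itertools import product


def addmux(a, b, c, add, width):
    """One LUT per bit plus carry logic: o = add ? (a + b) : c."""
    carry = 0            # carry-in to bit 0 tied low (assumption)
    out = 0
    carries = []
    for i in range(width):
        ai, bi, ci = (a >> i) & 1, (b >> i) & 1, (c >> i) & 1
        lut = (ai ^ bi) if add else ci       # add&(a^b) | ~add&c
        mult_and = ai & add                  # MULT_AND of two LUT inputs
        out |= (lut ^ carry) << i            # XORCY: sum bit
        carry = carry if lut else mult_and   # MUXCY: LUT output is the select
        carries.append(carry)
    return out, carries


def check(width=4):
    """Exhaustively verify function and carry independence from c."""
    mask = (1 << width) - 1
    for a, b, add in product(range(mask + 1), range(mask + 1), (0, 1)):
        seen = set()
        for c in range(mask + 1):
            out, carries = addmux(a, b, c, add, width)
            expected = (a + b) & mask if add else c
            if out != expected:
                return False
            seen.add(tuple(carries))
        if len(seen) != 1:   # carries changed when only c changed
            return False
    return True


if __name__ == "__main__":
    print(check())
```

Structurally c[i] still reaches the MUXCY select through the LUT, which is why a purely topological analyzer reports the path even though the carry is functionally independent of c.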
> 
> Therefore, I would like to see two tools enhancements to enable correct
> inference of add/mux in one LUT per bit:
> 
> a) Xilinx should enhance trce to do a more precise analysis around
> ripple-carry structures, e.g. to rule out the false path from c[i] through
> the carry chain to o[i+1]...o[n].  Here with 'add' feeding MULT_AND, there
> is no carry-out if 'add' is false, and also, c[i] does not influence the
> carry-out when 'add' is true, and thus the MUXCY carry-out does not depend
> upon c[i].
> 
> b) Xilinx should lobby its synthesis partners to infer add/mux structures
> like (2)-(4) when possible.  Or encourage a user-directive to force it.
> 
> If both (a) and (b) were done, then Xilinx customers (synthesis users and
> RPM builders alike) would probably enjoy somewhat smaller and faster results
> in the devices they're already using.
> 
> This add/mux inference digression aside, abundant TBUFs were useful and will
> be missed.  But I suppose that any FPGA feature that HDL synthesis users and
> tools do not take good advantage of, is not long for this world.
> 
> Jan Gray, Gray Research LLC

-- 
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com  
http://www.andraka.com

Article: 29948
Subject: Re: TBUFs in Virtex and later chips, going out of fashion, what instead
From: "Austin Franklin" <austin@dark99room.com>
Date: Mon, 19 Mar 2001 10:37:37 -0500
Links: << >> << T >> << A >>

> In the earlier Xilinx chips (3000, 4000, 5200) there is always 2
> TBUFs per CLB of 2 LUT+FF. And both TBUF lines can be read.
>
> In Virtex and Spartan-II it is down to 2 TBUFS for the 4 LUT+FF of an
> CLB, so only one slice can be routed to them, and even only 1 line for
> reading back from TBUF lines.
>
> In Virtex-II there is even only 2 TBUFs for 8 LUT+FF per CLB. Which
> makes even connecting the output of the 4-wide 2 slices of an single
> carry chain to an bus impossible. The data sheet does not give the
> amount of readbacks.
>
> From this I get the impression, that Xilinx regards TBUF buses as going
> out of fashion. After all, the TBUFs cost in chip space is next to nothing
> relative to them many PIPs (about 900 per CLB in Virtex).

I believe a lot of this has to do with HDLs.  I know that most all the
people I know using HDLs for Xilinx design don't even know what a TBUF is,
or even how they would use it.  I also think the tools, tutorials, classes
etc. poorly support using them.
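
For reference, the TBUF counts given in the quoted text work out to a halving of TBUFs per LUT with each family step. The short calculation below uses only the per-CLB figures as stated in the quoted post; the list name `families` and the helper `tbufs_per_lut` are hypothetical.

```python
from fractions import Fraction

# (family, TBUFs per CLB, LUT+FF pairs per CLB), as stated in the quoted post
families = [
    ("XC3000/4000/5200", 2, 2),
    ("Virtex / Spartan-II", 2, 4),
    ("Virtex-II", 2, 8),
]


def tbufs_per_lut(tbufs, luts):
    """Exact TBUF-to-LUT ratio for one CLB."""
    return Fraction(tbufs, luts)


ratios = {name: tbufs_per_lut(t, l) for name, t, l in families}

if __name__ == "__main__":
    for name, r in ratios.items():
        print(f"{name:22s} {float(r):.2f} TBUF per LUT")
```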

Article: 29949
Subject: IRDY/TRDY (was Re: More detailed Spartan II CLB drawings?)
From: Kolja Sulimma <kolja@bnl.gov>
Date: Mon, 19 Mar 2001 16:37:41 +0100
Links: << >> << T >> << A >>

Chris Dunlap wrote:

> You can always look in FPGA editor.  Nothing can be left out there.  If its
> routed or routable, its there.

Sure it can be.
Or can you use the mysterious undocumented IRDY/TRDY pin special features of
Spartan-II in FPGA editor?

Using a dominance in the FPGA market to get an advantage in the PCI-core market
looks a lot like the
Microsoft Internet Explorer case to me.

CU,
        Kolja
