Messages from 16550

Article: 16550
Subject: FPGA express + VHDL: strange SR implementation?
From: micheal_thompson@my-deja.com
Date: Thu, 27 May 1999 17:58:31 GMT

Hi
I'm using a fairly textbook (I think) piece of code here to get an SR
latch:
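    -- Level-sensitive SR behaviour: set takes priority over clear, and with
    -- neither asserted tx_busy holds its value, so synthesis infers a latch.
    busy_sr_reg: process(set_tx_busy, clr_tx_busy)
    begin
        if (set_tx_busy = '1') then
            tx_busy <= '1';
        elsif (clr_tx_busy = '1') then
            tx_busy <= '0';
        end if;
    end process busy_sr_reg;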

My target FPGA is an ACTEL A3200DX. After FPGA Express optimised it and
pumped out an EDIF file ready for ACTEL, I went back and converted that
same EDIF into a schematic representation.
All the synchronous stuff (not listed here) looks fine, but the SR
implementation seems a tad dodgy. I had expected to see it implemented
using the preset and clear inputs of a DFF, but instead it uses a latch:
the 'D' pin is connected to 'set' and the 'G(ate)' (active low) is
connected to the 'NOR' of 'set' and 'clear'. Conceivably then, if the
NOR-gate path had a large delay, you could have a situation where the
falling edge of set inadvertently clears the output?
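
The race described above can be checked with a small behavioural model. What follows is a minimal Python sketch, not the Actel netlist: the function name `simulate_latch`, the integer time steps, and the `nor_delay` parameter are assumptions made purely for illustration. The latch is transparent while its gate is low, D is tied to set, and the gate is driven by NOR(set, clear) through a configurable delay.

```python
def simulate_latch(set_wave, clr_wave, nor_delay):
    """Model a gate-low transparent latch with D = set and G = NOR(set, clr).

    nor_delay is the (hypothetical) NOR-gate delay in time steps.
    Returns the latch output Q at every time step.
    """
    q = 0
    q_trace = []
    nor_history = []  # undelayed NOR output, one entry per time step
    for t, (s, c) in enumerate(zip(set_wave, clr_wave)):
        nor_history.append(0 if (s or c) else 1)
        # The gate pin sees the NOR output from nor_delay steps ago.
        # Before any history exists, assume the gate is high (latch closed).
        gate = nor_history[t - nor_delay] if t >= nor_delay else 1
        if gate == 0:      # active-low gate: latch is transparent
            q = s          # D pin is wired to set
        q_trace.append(q)
    return q_trace


# set pulses high for three steps; clear is never asserted
set_wave = [0, 1, 1, 1, 0, 0, 0, 0]
clr_wave = [0] * len(set_wave)

ideal = simulate_latch(set_wave, clr_wave, nor_delay=0)
slow = simulate_latch(set_wave, clr_wave, nor_delay=2)
print("zero-delay NOR:", ideal)
print("slow NOR      :", slow)
```

In this model the zero-delay case holds the '1' after set falls, while a nonzero NOR delay leaves the latch transparent after set has already dropped, so Q follows D back to '0' even though clear was never asserted. Whether the real cell behaves this way depends on its actual timing, which this sketch does not capture.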

So, I'm asking the following:

Is schematic viewing at this point unrepresentative of the actual
(fitted/laid out) circuit? In other words, should I wait until the
vendor tool (which I don't have yet) does its stuff, as maybe it will
put its own spin on the circuit?

regds
Mike


--== Sent via Deja.com http://www.deja.com/ ==--
---Share what you know. Learn what you don't.---

Article: 16551
Subject: Re: High Speed Reconfigurability
From: Ray Andraka <randraka@ids.net>
Date: Thu, 27 May 1999 15:48:53 -0400

brian_n_miller@yahoo.com wrote:

> Ray Andraka <randraka@ids.net> wrote:
> >
> > I've been using FPGAs for several years.
>
> But probably not ones which reconfigure during operation,
> which is what this discussion is about.
>

Nasty, Nasty...
Actually, I have been using reconfigurability.  True, there is not much
application for it in high data rate DSP systems unless there is a gap
in the data to provide time for reconfiguration.  But, I have worked
extensively with dynamic partial reconfiguration using Atmel and NSC
Clay parts, as well as on a chip at a time level with Xilinx 4K.
Several of the issues in partial reconfiguration of a system while the
clock is running were first encountered by me about 5 years ago.  You
can see some of the results of my work in the dynamic video processor
paper on my website.  In the higher data rate DSP, I've applied
reconfiguration for mode switching and system debug (see the radar
simulator paper from last year's MAPLD conference, also on my website).
Another application I designed, which I have presented in seminars, is a
universal smart card controller that started with a 'detect'
configuration which would figure out which of about 12 different flavors
of smart card was inserted.  Once detected, the FPGA (a 4013) initiated a
reconfiguration of itself with the specific smart card interface for the
detected card.  When the smart card was withdrawn, the FPGA would again
reconfigure with the detect circuit.

> > A rack of DSP processors is outperformed by a single board
> > with a couple FPGAs.  Some examples:
> > A doppler weather radar demodulator, ...a radar simulation
> > system, ...QAM demodulator.
>
> Those are freakish niche applications.  I thought we were
> talking about the applicability of reconfigurable hardware
> to mainstream software systems.
>

My point is that I think the real killer app is the ability to handle
these "freakish niche apps" in mainstream hardware.  That is a good deal
of the basis of mainstream reconfigurable computing thinking, isn't it?

> > I find the 'killer app' to be the scores of onesy-twosy DSP
> > applications that are pushing the envelope of DSP microprocessor
> > capability.  There's lots of those out there.
>
> But you're not arguing for on-the-fly reconfigurability, are
> you?
>

Where it applies, yes I am.  It just so happens that, until the
reconfiguration bottleneck is solved, the time to reconfigure
swamps the processing time by several orders of magnitude.  For
applications like video where you can use the retrace time to
reconfigure, dynamic reconfiguration makes sense if there is a savings
in the logic by using overlays (see my paper).  If the number of
overlays needed is small and the volume is large, then the end user is
probably going to elect to use an ASIC with circuits for all the
overlays built in.  The effort involved for successfully pulling off a
dynamically reconfigured system is considerably greater than the effort
required to make static designs, especially if part of the chip is left
operating while part is reconfigured.  Issues include making sure in-use
routing and logic is not disturbed during configuration.  That entails
making sure the in-use routing in the vicinity of the reconfigured logic
is left alone, as is the logic.  There is also the issue of making sure
that inputs from the reconfigured area are ignored during configuration
and outputs to the reconfigured area are buffered so that inadvertent
contention does not disturb operating parts using the same signal (this
can happen even if you are real careful: during configuration, the
circuit changes are not done instantaneously.  Depending on the previous
circuit and the order that pieces are changed, you can get momentary
contention on signals...this also depends on the part.  I've seen this
happen first hand).  Keeping sanity while reconfiguring with an active
clock requires extensive handcrafting of the design overlays as well as
the configuration process to avoid the many pitfalls.  Right now, there
are not really any tools to help in this regard.

What it comes down to is this:  The tools for on-the-fly reconfiguration
are essentially nonexistent, so the effort to successfully pull it off
(and the risk involved if you don't) makes partial reconfiguration less
attractive.  Additionally, the configuration takes too long to do
relative to the processing clock cycle to be useful in many cases.

I've been around the block a few times on configurable systems, and I
think I've found a number of the pitfalls the hard way.  Yes it can be
done if you are careful.  Is there a killer app around the corner?  I
haven't seen anything that hints of one, but it could be sneaking up
behind me.  Point is, an application with enough volume to be called a
killer, is probably going to use something that has less of an FPGA
flavor and more of a 'can of circuits optimized for the task' connected
by a programmable interconnect.  For comm, that might be a demodulator
with extra logic to make it configurable to many standards, plus some
error recovery logic (again with some limited programmability), etc.
Reconfigurable, yes.  Reconfigurable in the sense talked about here...I
don't think so.

--
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email randraka@ids.net
http://users.ids.net/~randraka

Article: 16552
Subject: Re: High Speed Reconfigurability
From: brian_n_miller@yahoo.com
Date: Thu, 27 May 1999 20:53:43 GMT

tt@cryogen.com wrote:
>
> Evolutionary computation springs to mind.

Total freak show.

> /* Parallelisable parts - like this, for example */
> Widget[] array = new Widget[MAX];
> for (int i = 0; i < MAX; i++) {
>   array[i] = new Widget(); }

How do you know that that is parallelisable?  Widget's
constructor may have side-effects.  If Widget gives
each new instance a serial number, and the program expects
the array to contain ordered instances of increasing
serial number, then the assignment of serial numbers
can't be parallelized, and therefore Widget instantiation
can't either.  You're not proposing an unsafe optimization,
are you?
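
The serial-number objection can be made concrete with a small sketch. This is an illustrative Python model rather than the Java under discussion; the `Widget` class, its shared counter, and the `build` helper are hypothetical names invented here. Constructing the slots in a different order stands in for one schedule a parallel implementation might happen to produce.

```python
import itertools


class Widget:
    """Hypothetical Widget whose constructor has a side effect:
    it draws the next value from a shared serial-number counter."""

    _next_serial = itertools.count(1)

    def __init__(self):
        self.serial = next(Widget._next_serial)


def build(order, size):
    """Fill a list of `size` slots, constructing Widgets in the given slot order."""
    Widget._next_serial = itertools.count(1)  # restart numbering for each run
    array = [None] * size
    for i in order:
        array[i] = Widget()
    return [w.serial for w in array]


MAX = 4
sequential = build(range(MAX), MAX)           # the order the original loop implies
reordered = build(reversed(range(MAX)), MAX)  # one order a parallel schedule might produce

print(sequential, sequential == sorted(sequential))
print(reordered, reordered == sorted(reordered))
```

With the sequential order the serial numbers increase along the array; with the reordered construction they do not, which is the property a program relying on ordered serial numbers would lose.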

Article: 16553
Subject: Free Hardware/Software Co-Verification Workshop - Raleigh, NC
From: mike@rtp-nc.mentorg.com (Mike Walsh)
Date: 27 May 1999 21:44:04 GMT

             Hardware/Software Co-Verification with Seamless-CVE

                                June 3, 1999
                        Marriott Research Triangle Park
                              4700 Guardian Dr.
                              Durham, N.C. 27703
                                (919) 484-2515

As software and hardware complexities continue to increase, traditional
verification methods that focus on independent verification of software and
hardware portions are proving inadequate.   During this free one-day
workshop, you'll acquire hands-on experience with Mentor Graphics' Seamless
Co-Verification Environment* which enables software testing on simulated
hardware before a hardware prototype is available.  Discover how to produce
a better quality system while reducing time to market using co-verification
techniques.

You'll learn how to:
*  Incorporate hardware/software co-verification techniques in your
   design environment
*  Use your hardware design as a virtual prototype
*  Run unmodified software on a virtual hardware prototype
*  Apply optimization techniques to verify large amounts of software on
   your system
*  Identify systems which will benefit most from co-verification
   methodologies

Who should attend:
*  Hardware, Software, Firmware and Diagnostic Engineers designing
   systems with embedded microprocessors
*  ASIC designers utilizing embedded microprocessor cores
*  Systems engineers
*  Engineering and CAD managers who implement co-verification processes

Particulars:
*  Workshop is one day long and runs from 9:00 AM to 4:30 PM.
*  Workshop will be held at the Marriott RTP, 4700 Guardian Dr, Durham
   N.C. 27703
*  Continental breakfast and lunch will be served.
*  Workshop includes notebook of presentation materials and labs.
*  Complete the on-line registration form or call 1-919-484-2500 to
   register.  Registration is limited to 14 attendees per session.

Attendee comments from previous workshops:
*  The hands-on approach is much more effective than a slide
   presentation show.
*  Instructor was informative emphasizing "real life" situations.
*  Lab intensive learning is the way to go!

Article: 16554
Subject: Pipeline/Delay Stages in a Feedback Loop
From: "Nestor C." <nestor@ece.concordia.ca>
Date: Thu, 27 May 1999 21:32:33 -0400

Hi everybody.

I was wondering if anyone had some information on speeding up a design
with a feedback loop.  I am building a Digital Phase-Locked Loop (DPLL)
in VHDL for synthesis, but creating it directly as the theory suggests
makes it very slow since the general theoretical model does not include
any pipelining.

I was looking for techniques to increase the operational speed but I am
having some problems in dealing with pipeline registers that I include
in the feedback path.  This is a problem because data that arrives at
the end of the feedback path arrives too late to update the data that
will just enter the loop. I have thought about adding a redundant
section for the updating that will work in parallel with the main design
and should be able to fix the problem, but this solution is not very
elegant and will cost me area in my FPGA.
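
The effect of extra registers in a feedback path can be seen in a toy model. The sketch below is a generic first-order tracking loop in Python, not a DPLL and not the poster's design; `simulate_loop`, the gain of 0.5, and the reset behaviour of the pipeline registers are assumptions chosen for illustration. It shows how the same loop gain behaves when the error term arrives late.

```python
def simulate_loop(gain, extra_delay, steps, target=1.0):
    """First-order tracking loop: state += gain * (target - delayed state).

    extra_delay is the number of pipeline registers added in the feedback
    path; the registers are assumed to reset so that no error is seen until
    the first real value emerges from the pipeline.
    """
    state = [0.0]
    for n in range(steps):
        k = n - extra_delay
        error = (target - state[k]) if k >= 0 else 0.0
        state.append(state[n] + gain * error)
    return state


unpipelined = simulate_loop(gain=0.5, extra_delay=0, steps=20)
pipelined = simulate_loop(gain=0.5, extra_delay=2, steps=20)

print("peak, no extra registers :", max(unpipelined))
print("peak, two extra registers:", max(pipelined))
```

Without extra registers the state approaches the target from below; with two added registers the stale error keeps pushing after the target has been reached, so the loop overshoots and rings. In this toy model the usual remedy is a lower loop gain, which trades tracking speed for clock rate.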

Has anyone optimized feedback circuits using similar or better
techniques? Any good reference on hardware design of DPLLs or other
feedback circuits would be greatly appreciated.

Thanks in advance.
Nestor Caouras
nestor@ece.concordia.ca

Article: 16555
Subject: Re: Xilinx M1.5 Crash
From: Zoltan Kocsi <root@127.0.0.1>
Date: 28 May 1999 12:34:54 +1000

"Daniel K. Elftmann" <elftmann@ix.netcom.com> writes:

> determining the native platform a tool was developed on.  Unfortunately, at
> least for the Actel P&R I know for fact (I get to visit with the developers
> quite regularly) that the tools are all developed on Solaris first then
> ported to the PC environment.  Sorry, but just couldn't let that go without
> a correction.

I stand corrected; in fact I have to apologise for the false statement. Their
P&R tool (designer)'s About box doesn't actually say that it's a Windows
version, only the burner software's (apsw) does.

However, you made *my* point: they develop using unix (that is, they spend 
extra $ for development), then, spending more $ they port it to Windows.
Then you can buy the unix version, or you can buy the Windows version for 
half that much or you can order your free Windows CD which includes:

- VeriBest HDL & schematic entry
- VeriBest simulator
- Synplicity synthesis
- Designer Lite
- Silicon Explorer

I guess Actel did not get the 3rd party tools for free either (although
I can't possibly be sure). You get them for free (for a year) if you
use Windows. Of course, if you use unix, you don't get them for free.
Actually, you may not get them at all. Which leads to the next
question: according to you, they develop all tools using Solaris. Then
I assume they develop Silicon Explorer using that, then they port it to
Windows so you can buy the Windows version. So why is it, then, that you
can't buy the Solaris version? Last time I tried, I was told that Explorer
is available *only* for Windows.

Anyway, from this thread it seems to me that I'm pretty much alone with 
my wish of an affordable yet not Windows based FPGA toolchain.
FPGA designers do indeed *want* to work with Windows, and Windows only. 
This very much justifies the vendors' focusing on Windows tools, even if 
at least some of them rather develop under unix (as you pointed out).

Zoltan

-- 
+------------------------------------------------------------------+
| ** To reach me write to zoltan in the domain of bendor com au ** |
+--------------------------------+---------------------------------+
| Zoltan Kocsi                   |   I don't believe in miracles   |  
| Bendor Research Pty. Ltd.      |   but I rely on them.           |
+--------------------------------+---------------------------------+

Article: 16556
Subject: Re: High Speed Reconfigurability
From: Steven Casselman <sc@vcc.com>
Date: Thu, 27 May 1999 19:39:13 -0700

brian_n_miller@yahoo.com wrote:

> rolandpj@bigfoot.com wrote:
> >
> > [JIT:] why not do the same thing, but right down to the hardware,
> > rather than down to machine code. What you need, however, is a
> > general compiler from a high-level language (Java bytecode?) to
> > fpga gates.
>
> Which function or aspect of JVM operation would most benefit
> from reconfigurable hardware?

First off you'd most likely have a Java processor
in the FPGA.
Today:
Next you might profile your program
code and find out where the hot spots are.
Then you design a custom unit to hook onto
the Java processor (like a coprocessor) and then
substitute that code with some little code that talks
to your Java processor that you built.

In the near future:
download the bytecode to your JVM that resides on your
reconfigurable computer. It sees some flags in the code
that say "if you can reconfigure your hardware then the
circuit is X". Your machine compiles your accelerator, dynamically
loads it next to the hardware JVM, and you've got
faster execution for your problem.

I've been in reconfigurable computing since the beginning
and I've noticed that the power of RC grows
by an order of magnitude every 4-5 years.
This beats Moore's Law's doubling every 18 months
(and getting longer) because RC gets all
the benefit of Moore's Law along with the ability to
optimize an algorithm down to the bit level.
Add to this that the FPGAs are getting much better
architectures and you'll see processors don't
stand a chance.
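
As a rough check on the comparison, the two figures quoted above (tenfold every 4-5 years versus doubling every 18 months) can be put on the same scale. This is only arithmetic on those two numbers; the helper name `growth_factor` is made up for the sketch.

```python
def growth_factor(years, doubling_months=18.0):
    """Growth multiple after `years` if capability doubles every `doubling_months`."""
    return 2.0 ** (years * 12.0 / doubling_months)


for years in (4.0, 4.5, 5.0):
    print(f"{years} years of 18-month doubling -> x{growth_factor(years):.2f}")
```

Doubling every 18 months compounds to roughly 6.3x over four years and roughly 10x over five, so a tenfold gain is clearly ahead at the four-year end of the range and about level at the five-year end; a longer doubling period widens the gap.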

You see, there is a wall called the Flynn Limit that
applies to processors, which is
"a processor is limited by the number of instructions
it can execute in one clock."

There is no such limit for reconfigurable computers.

For RC systems the limit is "an RC system is
limited by how much logic can be applied
to an algorithm at any one time."

--
Steve Casselman, President
Virtual Computer Corporation
http://www.vcc.com

Article: 16557
Subject: Re: C to EDIF translator??Anyone?
From: Tom Kean <tom@algotronix.com>
Date: Fri, 28 May 1999 03:42:18 +0100


You could have a look at the Handel-C work by
Ian Page at Oxford University (www.comlab.ox.ac.uk).

There is a company called Embedded Solutions that
markets the software (www.embeddedsol.com).

Tom.

prastogi@my-dejanews.com wrote:
> 
> Hi,
> 
> I am working on the translation of C to EDIF file
> format for synthesis to FPGAs/ASICS. In this
> respect I need to write a Translator from a subset
> of C (say only mathematical operations) to EDIF.
> 
> Please do let me know EVEN if you have a hint or
> know something vaguely. Any references will be of
> great use. Also, if there are any commercial tools
> in the industry/research community, I am
> interested. Price is not a constraint.
> 
> Thank you all,
> 
> Pranav R.

Article: 16558
Subject: RAM for external/internal use
From: Garrick Kremesec <gkremese@ews.uiuc.edu>
Date: Thu, 27 May 1999 22:12:08 -0500

Hello,

   I am very new to FPGA VHDL programming, but I have done a lot of work
with PLDs (yes, big difference in use, style, difficulty, etc.).  Anyway,
starting from scratch, I am using Max Plus 9.23 and working with an
EPF10K10 FPGA.

I have most of my microcontroller control lines, several address lines,
and my 16 bit data bus connected to the FPGA.  I would like a memory
block (64 words of data) that I can read or write to using the
microcontroller, but I can also work with internally.

For instance, I have an ADC connected to the FPGA which will be flipping
through a mux and reading values and I want them stored in these memory
locations.

Also, I have to generate 8 PWM signals.  I want the microcontroller to
be able to send configuration bytes with information on duty cycle.

I can't figure out how to do this.  The RAM/ROM packages I could find
all appear not to allow simultaneous internal access to this
memory.

I thought I would have to set up dual port memory, but the only dual
port RAM component Altera has places the input and output of BOTH ports
on separate lines!  That's 64 data lines for a 16 bit, dual port RAM.
Perhaps I'm not looking in the right places.  And perhaps there is a way
to combine the data lines into one I/O/tristate internally and
externally.  Any help will be greatly appreciated.

Garrick Kremesec
University of Illinois
gkremese@ews.uiuc.edu

Article: 16559
Subject: Re: C to VHDL translator?
From: Baris Aksoy <baris.aksoy@gte.net>
Date: Fri, 28 May 1999 05:50:16 GMT

Enrico Migliore wrote:

> hi all
> I'm looking for a program that translates C into VHDL.
> I know there is at least one around and its name is AIRT Builder,
> but can't find who makes it.
>
> thanks
> Enrico

Neither Frontierd nor Clevel can guarantee 100% conversion. The
efficiency of the conversion really depends on how your C code is
written. Neither of them is good at behavioral C conversion. There are
some free university-developed tools as well. I can't remember them right
now, but I'll post them tomorrow...

Baris

Article: 16560
Subject: Re: High Speed Reconfigurability
From: Roland Paterson-Jones <rpjones@hursley.ibm.com>
Date: Fri, 28 May 1999 11:26:59 +0100

brian_n_miller@yahoo.com wrote:

> Total freak show.

Yes, thanks, it's very entertaining.

> > /* Parallelisable parts - like this, for example */
> > Widget[] array = new Widget[MAX];
> > for (int i = 0; i < MAX; i++) {
> >   array[i] = new Widget(); }
>
> How do you know that that is parallelisable?

Because you inspect (and inline) the code. This is getting puerile.

Roland

Article: 16561
Subject: Re: High Speed Reconfigurability
From: Tim Tyler <tt@cryogen.com>
Date: Fri, 28 May 1999 12:31:31 GMT

In comp.arch.fpga brian_n_miller@yahoo.com wrote:
: tt@cryogen.com wrote:

[people buying FPGAs to run a particular type of s/w?]

:> Evolutionary computation springs to mind.

: Total freak show.

Great - my primary interest in high-speed reconfigurability dismissed
with a wave of Brian Miller's hand ;-)

Evolutionary computations parallelise /extremely/ well.  The field offers 
applications for which there's no practical limit on how much parallelism
is desirable - the more the merrier.  Further, typically it eats
processing power like there's no tomorrow.  There are few applications
*more* suitable for implementation on FPGAs than this one.

Was the other example application that I gave where people were prepared
to buy FPGA in order to run particular software - that of prototyping
electronic circuitry - also (in your own words) "Pure fantasy"?

:> /* Parallelisable parts - like this, for example */
:> Widget[] array = new Widget[MAX];
:> for (int i = 0; i < MAX; i++) {
:>   array[i] = new Widget(); }

: How do you know that that is parallelisable?  Widget's
: constructor may have side-effects. [...]

Obviously you look at Widget's constructor and determine whether a
parallel implementation is possible or not.

If you can see that it's parallelisable - then you can parallelise it.

If it's not - or the constructor is so hairy that the compiler can't
figure out whether it is or not - then you don't.
-- 
__________
 |im |yler  The Mandala Centre  http://www.mandala.co.uk/  tt@cryogen.com

With a mind like yours, who needs a body?

Article: 16562
Subject: Re: High speed with VHDL
From: "Jamie Sanderson" <jamie@nortelnetworks.com>
Date: Fri, 28 May 1999 09:15:13 -0400

This will depend on the device you're using and the VHDL synthesis tool.
These tools are moving more and more towards intelligent use of device
resources. Now, for example, counters, adders, etc. created in VHDL and
synthesized with Synplify will be just as fast as those instantiated from a
schematic. Furthermore, flip-flops that can be put into I/O blocks
(architecture dependent) are now put there. Of course, this depends on your
having intimate knowledge of the synthesis tool.

Unfortunately, there are still many functions that aren't recognized by many
synthesis tools, such as DSP primitives. For these, you would need to
instantiate a piece of pre-synthesized logic into your design in order to
get the same performance as a schematic design. In fact, this is really
equivalent to a schematic, except that you're using text instead of
pictures.

Finally, some improvements can be made through giving constraints to the
vendor's fitting tool. This is where you get the least chance to fix the
design, though. The fitter will never turn a sub-optimal multiplier into the
optimal one.

Cheers,
Jamie

Heinrich Fonfara wrote in message <374D3B70.6F9110D5@ibmt.fhg.de>...
>Hi,
>
>Because of the high speed requirements to the designs I have implemented
>in FPGAs I mostly used schematic entry for sequential parts.  I recognized
>the benefits of VHDL only for coders or state machines (maybe also some
>other parts) that had after compilation the same performance as schematic
>solutions.
>
>My question is: How can I constrain the compiler to make the results
>most similar to that I found with schematic ?
>
>
>Thanks
>
>Heinrich Fonfara
>
> fonfarah@ibmt.fhg.de
> http://www.ibmt.fhg.de/
>

Article: 16563
Subject: Re: High speed with VHDL
From: Ray Andraka <randraka@ids.net>
Date: Fri, 28 May 1999 09:16:27 -0400

Depends on the level of detail in the schematic.  In the extreme case, you
can instantiate the fpga primitives and put placement attributes (user
attributes) on them so that the edif netlist is essentially identical to the
edif netlist generated by the schematic.   The major exception you will see
is in the naming of the instances.

Heinrich Fonfara wrote:

> Hi,
>
> Because of the high speed requirements to the designs I have implemented
> in FPGAs I mostly used schematic entry for sequential parts.  I recognized
> the benefits of VHDL only for coders or state machines (maybe also some
> other parts) that had after compilation the same performance as schematic
> solutions.
>
> My question is: How can I constrain the compiler to make the results
> most similar to that I found with schematic ?
>
> Thanks
>
> Heinrich Fonfara
>
>  fonfarah@ibmt.fhg.de
>  http://www.ibmt.fhg.de/

--
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email randraka@ids.net
http://users.ids.net/~randraka

Article: 16564
Subject: Re: High Speed Reconfigurability
From: Rickman <spamgoeshere4@yahoo.com>
Date: Fri, 28 May 1999 09:59:00 -0400

Ray Andraka wrote:
...snip...
> Another application I designed, which I have presented in seminars, is a
> universal smart card controller that started with a 'detect'
> configuration which would figure out which of about 12 different flavors
> of smart card was inserted.  Once detected, the FPGA (a 4013) initiated a
> reconfiguration of itself with the specific smart card interface for the
> detected card.  When the smart card was withdrawn, the FPGA would again
> reconfigure with the detect circuit.

I am planning on doing something similar. My controller won't have to
deal with on-the-fly changes to the card, but it will identify the card
on power up and load the corresponding interface design. This is all
directed by the processor rather than by the FPGA design. I was thinking
of using a Dallas one-wire ID part for the identification. Very small,
and they only use one pin on my interface. How did you do the ID?

> > > A rack of DSP processors is outperformed by a single board
> > > with a couple FPGAs.  Some examples:
> > > A doppler weather radar demodulator, ...a radar simulation
> > > system, ...QAM demodulator.
> >
> > Those are freakish niche applications.  I thought we were
> > talking about the applicability of reconfigurable hardware
> > to mainstream software systems.
> >
> 
> My point is that I think the real killer app is the ability to handle
> these "freakish niche apps" in mainstream hardware.  That is a good deal
> of the basis of mainstream reconfigurable computing thinking, isn't it?

I don't know that a QAM demodulator is "freakish". Often your other
choice is to add a DSP with memory, PROM... Doing this in an FPGA can
really save a lot of space. QAM is extensively used. It may not be the
"killer app", but it is something that would be used. 

...more snipped...

-- 

Rick Collins

rick.collins@XYarius.com

remove the XY to email me.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design

Arius
4 King Ave
Frederick, MD 21701-3110
301-682-7772 Voice
301-682-7666 FAX

Internet URL http://www.arius.com

Article: 16565
Subject: Re: virtex vs apex20k family comparison for DSP ?
From: Ray Andraka <randraka@ids.net>
Date: Fri, 28 May 1999 10:10:18 -0400

Well, since nobody has taken this up, I'll do it.  I've looked relatively briefly at
the Apex architecture with regard to DSP applications.  Both are capable devices, and
will work well for most DSP applications.   I find the Virtex to be a better choice
for most DSP applications, but as with most things in life, YMMV:

First a quick overview of the pertinent 20K improvements:
1. EABs have an added capability for product terms, sort of a PAL like structure.
Each supports up to 32 product terms. These are useful for wide input functions, but
frankly, I don't see much application to DSP data-flow architectures.

2.  The EAB in the 20KE has a content addressable memory mode.  This is very useful
for sorting and searching.  If your DSP application would benefit from this
capability, this may tip the scales in favor of the 20KE.

3.  The largest devices have over 200 EABs. The large number of memories is intriguing
for applications that can benefit from enlarged LUTs.  However, I don't think there
are enough of these for the amount of LEs in the same device, considering that the
LEs can only store one bit per LE.

4.  The LE is significantly improved over the 10K.
    a). The clock enable has been separated from the LUT inputs so that you don't lose
a LUT input by using CE.
    b). Dedicated synchronous clear and synchronous load are added to the LE flip-flop,
making it possible to do an accumulator in a single level of logic (Xilinx has always
had this capability).  The 10K required 2 levels of logic because the arithmetic modes
reduce the LUT to a pair of 3-input LUTs, so there was no way to synchronously load or clear
a FF without going to two levels of logic.
    c). Direct connections have been added between adjacent LABs in a row, so there is
now the high speed local interconnect required to efficiently implement multilevel
arithmetic logic (which is necessary due to the LE architecture for adder/subtractors
and multiplier partial products).

The biggest shortcomings of the 20K that I can see without actually doing a design in
the part are:
1.  The LE's cannot become small RAM elements.  This is significant in signal
processing in a few areas.
    a) First, it means that delay queues (which are quite common in pipelined DSP
processors) have to be implemented either as individual LE flip-flops, which rapidly
chews up logic resource, or in the EABs.  Using the EABs for delay queues is
inefficient because the minimum depth is 256, which is much deeper than the typical
delay queue requirement (which is usually less than 16 clocks).  Each different length
delay queue requires a separate EAB.  In Xilinx FPGAs, the CLB RAMs can be used as 1
bit by 1 to 16 clock delay queues. Virtex includes a self contained shift register
capability in the CLB RAM so that an external counter is not even needed.  The logic
reduction for delay queues in Virtex is 17:1.
    b)  The lack of the CLB RAM makes implementation of an adaptive filter in Altera
very challenging compared with the implementation in any Xilinx FPGA.  The distributed
arithmetic filters can be made adaptive by rewriting the LUTs used for the partials.
In Xilinx this is easily done, since the LUTs can be implemented as CLB RAM instead of
CLB ROMs.  For slow adaptation, the RAM is directly rewritten, while for faster
adaptation, a second set of LUTs implemented in RAM is muxed in and out, so that one
can be rewritten while the other is used.  In Altera, a fixed distributed arithmetic
filter can also be constructed in the LEs.  However, if the filter coefficients need
to be able to be rewritten (without reconfiguring the device), the LEs cannot be
used.  The filter LUTs can be implemented using EABs to get 8-12 rewritable serial
taps per EAB depending on the width of the coefficients.  If the sample rate is higher
than can be supported by a serial input, then EABs need to be paralleled.  If the
coefficients are wider than the EAB, then additional EABs are used to extend the
width, and when more taps are required, additional EABs are used.  For high rate
and/or long filters, the EAB resource is quickly depleted.  In that respect, the
Virtex is capable of much bigger and faster adaptive filters than the 20K.
    c) Register files and reordering queues must be implemented in EAB resources or as
individual LE flip-flops and steering logic in the 20K, regardless of the size of the
file or queue.  Virtex allows the exact size required to be constructed in CLB RAM.
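
To illustrate the delay queue case in 1a, here is a minimal sketch of a 1 bit wide
delay queue written as a plain shift register.  The entity name, the generic and the
port names are made up for this example.  Whether a given synthesizer packs this into
the CLB RAM / shift register resource or builds it from individual flip-flops depends
on the tool and its settings, so treat the mapping as an assumption and check the
report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity: 1-bit delay queue, DEPTH clocks long.
-- The 2 to 16 range is an assumption chosen to match the "less than
-- 16 clocks" delay queues discussed above.
entity delay_queue is
    generic ( DEPTH : integer range 2 to 16 := 16 );
    port ( clk : in  std_logic;
           d   : in  std_logic;
           q   : out std_logic );
end entity delay_queue;

architecture rtl of delay_queue is
    signal taps : std_logic_vector(DEPTH-1 downto 0);
begin
    shift : process (clk)
    begin
        if rising_edge(clk) then
            -- No reset and no parallel access to the taps: a reset or a
            -- tap readout generally forces individual flip-flops.
            taps <= taps(DEPTH-2 downto 0) & d;
        end if;
    end process shift;

    -- The oldest sample leaves the queue DEPTH clocks after it entered.
    q <= taps(DEPTH-1);
end architecture rtl;
```

In the 20K the same description would have to land in individual LE flip-flops or in
an EAB, which is the cost described in 1a.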

2.  Xilinx CLBs allow 4 input arithmetic functions in a single layer of logic, and
include sufficient logic to perform a 2xN partial product in a single level of logic.
Altera's structure breaks the LUT into 3-input LUTs if the carry chain is used, and one of
the LUT inputs is for the carry, so only 2-input arithmetic functions can be performed
in a single level (with the exception that direct clears and loads are now
permitted).  This provides a speed and area advantage for Virtex in arithmetic
applications like multipliers, and CORDIC rotators.

3. The Virtex block RAM is considerably more powerful than the 20K EAB, although there
is a lot less of it.  Virtex block RAM is a true dual port, allowing simultaneous read
and write access from both ports to the memory (either port can read or write).  Each
port can work on an independent clock, and the memory organization can be set
differently for each port.  This lets you access the
memory as, say, a 1K x 4 on one side and as a 256 x 16 on the other side, giving a
bus-resizing capability for 'free'.  The Altera EAB is also dual ported and can change
the memory organization.  The memory organization, however, is common to both ports.
The dual port capability in the Altera EAB provides a read port and a write port: you
cannot write into the read port or read from the write port.  The ports can operate on
separate clocks.
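
As a rough sketch of the more restricted arrangement (one write-only port and one
read-only port, each on its own clock, same organization on both sides), something
like the following describes the behavior.  The entity name, port names and default
sizes are assumptions for illustration only, and whether a particular tool maps this
description onto an EAB or a block RAM is not guaranteed.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: one write port, one read port.
-- Default sizes are arbitrary and not taken from any data sheet.
entity simple_dp_ram is
    generic ( ADDR_BITS : positive := 8;
              DATA_BITS : positive := 8 );
    port ( wr_clk  : in  std_logic;
           wr_en   : in  std_logic;
           wr_addr : in  std_logic_vector(ADDR_BITS-1 downto 0);
           wr_data : in  std_logic_vector(DATA_BITS-1 downto 0);
           rd_clk  : in  std_logic;
           rd_addr : in  std_logic_vector(ADDR_BITS-1 downto 0);
           rd_data : out std_logic_vector(DATA_BITS-1 downto 0) );
end entity simple_dp_ram;

architecture rtl of simple_dp_ram is
    type ram_t is array (0 to 2**ADDR_BITS - 1)
        of std_logic_vector(DATA_BITS-1 downto 0);
    signal ram : ram_t;
begin
    -- Write side: can only write, on its own clock.
    write_port : process (wr_clk)
    begin
        if rising_edge(wr_clk) then
            if wr_en = '1' then
                ram(to_integer(unsigned(wr_addr))) <= wr_data;
            end if;
        end if;
    end process write_port;

    -- Read side: can only read, registered on a separate clock.
    read_port : process (rd_clk)
    begin
        if rising_edge(rd_clk) then
            rd_data <= ram(to_integer(unsigned(rd_addr)));
        end if;
    end process read_port;
end architecture rtl;
```

A true dual port in the Virtex sense would need a write path on the second port as
well, and different widths per port, neither of which this sketch attempts.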

In Altera's favor, the routing structure makes the route delays more uniform, so
placement is not as critical as it is for Xilinx.  That makes it easier for the
synthesizer to get good results.  In fairness, the Virtex is quite an improvement over
the 4K in this regard, as the routing is more plentiful and more predictable in terms
of delays.  Still, there is a strong correlation between distance and delay, so
floorplanning a Virtex design can still result in significant performance gains.  The
Altera tools are geared more toward the big green pushbutton, which is fine for the
casual user (even advantageous).  Unfortunately, the Altera tools don't provide the
hooks for the expert to get into the tool to drastically improve the performance (this
may have changed in the Quartus tools for Apex, which I have not yet seen).  Xilinx
tools have more ways to get in and control (or F*$%-up) the results, and the silicon
has more potential for the expert user to wring out extra performance.

So, in summary, it comes down to what is needed by your application.  For general DSP,
the lack of the CLB RAM capability and the multi-levels needed for arithmetic
functions are significant demerits in most of the cases I deal with.

muzo wrote:

> hi,
> has anyone studied virtex vs apex20k for dsp applications and have
> postable results ?
>
> thanks for any and all comments.
> muzo
>
> Verilog, ASIC/FPGA and NT Driver Development Consulting (remove nospam from email)

--
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email randraka@ids.net
http://users.ids.net/~randraka

Article: 16566
Subject: Re: Xilinx M1.5 Crash
From: Rickman <spamgoeshere4@yahoo.com>
Date: Fri, 28 May 1999 11:09:51 -0400
Links: << >> << T >> << A >>

Zoltan Kocsi wrote:
> However, you made *my* point: they develop using unix (that is, they spend
> extra $ for development), then, spending more $ they port it to Windows.
> Then you can buy the unix version, or you can buy the Windows version for
> half that much or you can order your free Windows CD which includes:
> 
> - VeriBest HDL & schematic entry
> - VeriBest simulator
> - Synplicity synthesis
> - Designer Lite
> - Silicon Explorer
> 
> I guess Actel did not get the 3rd party tools for free either (although
> I can't possibly be sure). You get them for free (for a year) if you
> use Windows. Of course, if you use unix, you dont get it for free.
> Actually, you may not get them at all. Which leads to the next
> question: according to you, they develop all tools using Solaris. Then
> I assume they develop Silicon Explorer using that, then they port it to
> Windows so you can buy the Windows version. So why is then that you can't
> buy the Solaris version ? Last time I tried, I've been told that Explorer
> is available *only* for Windows.

You are confusing the issues of product development with the decisions
of marketing. If they sell 10 copies of Windows software for each copy
of Unix software, it doesn't matter what platform they develop on. As to
the copies they give away, that is not an issue except that it likely
results in more sales. Remember, software is not like hardware. You
don't need to make money on the initial sale. You can make money on
updates and maintenance. 

> Anyway, from this thread it seems to me that I'm pretty much alone with
> my wish of an affordable yet not Windows based FPGA toolchain.
> FPGA designers do indeed *want* to work with Windows, and Windows only.
> This very much justifies the vendors' focusing on Windows tools, even if
> at least some of them rather develop under unix (as you pointed out).

You are also confusing an open discussion with a lack of support. If all
the things I have heard about Linux (or even half, like not crashing
twice a day) are true, and I could get free (or low cost), vendor
supported tools, then I would be happy to use it. I have thought about
getting it just so I could evaluate it. But time is a real issue for me
now. I have a friend who might do that for me if I buy it. I might just
take him up on it. 

-- 

Rick Collins

rick.collins@XYarius.com

remove the XY to email me.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design

Arius
4 King Ave
Frederick, MD 21701-3110
301-682-7772 Voice
301-682-7666 FAX

Internet URL http://www.arius.com

Article: 16567
Subject: Rom use, Renoir => leonardo => maxplus2
From: "ron van smoorenburg" <r.smoorenburg@pbc.bc.philips.com>
Date: Fri, 28 May 1999 17:20:48 +0200
Links: << >> << T >> << A >>

Hi,

I want to implement some ROM in a flex10k, so I have written VHDL
code for it. When I compile this code with maxplus2, it gives the
following message during the database build:
=>  Error : node
'|timing_control:I1|timing_cntl:I0|:10676.IN1' missing source
It repeats this 91 times. Can anyone tell me whether this is useful
information, and how I should read it?

I have tried to work around the problem by compiling the VHDL files with
leonardo spectrum, which produces an EDIF file that I then read into
maxplus2. I then get the next error during the compiler netlist extractor
phase:
=>Error :  Cann't find design file 'romd_dl_hex'
My first guess was that the file wasn't in the correct directory, so I changed it.
Then I noticed in the hierarchy display that maxplus2 was looking for a .gdf file,
so I renamed the file to romd_dl_hex.gdf. maxplus2 then saw the
file, but reported that it was of an incorrect type (Intel hex file).
So I suspect that leonardo did something wrong during the compilation,
and that maxplus2 no longer understands the result.
Does anyone know an answer to one of these questions?

Thanks anyway.
ron.

Article: 16571
Subject: Re: FPGA express + VHDL: strange SR implementation?
From: Rickman <spamgoeshere4@yahoo.com>
Date: Fri, 28 May 1999 11:30:14 -0400
Links: << >> << T >> << A >>

micheal_thompson@my-deja.com wrote:
> 
> Hi
> I'm using a fairly text-book ( I think) piece of code here to get an SR
> latch:
>     busy_sr_reg: process(set_tx_busy, clr_tx_busy)
>     begin
>                 if (set_tx_busy = '1') then
>                         tx_busy <= '1';
>                 elsif (clr_tx_busy = '1') then
>                         tx_busy <='0';
>                 end if;
>     end process busy_sr_reg;
> 
> My target FPGA is an ACTEL A3200DX. After FPGA express optimised it and
> pumped out an EDIF file ready for ACTEL I went back and converted that
> same EDIF into a schematic representation.
> All the synchronous stuff (not listed here)looks fine but the SR
> implementation seems a tad dodgy. I had expected to see it implemented
> using the set and pre-set inputs of a DFF but instead it uses a latch:
> The 'D' pin is connected to 'set' and the 'G(ate)'(active lo) is
> connected to the 'NOR' of 'set' and 'clear'. Conceivably then if the
> Nor-gate path had a large delay you could have a situation where the
> falling edge of set will inadvertently clear the output?
> 
> So, I'm asking the following:
> 
> Is schematic viewing at this point unrepresentative of the actual
> (fitted/ layed out) circuit? In other words should I wait until the
> vendor tool (which I don't have yet) does its stuff as maybe it will
> put its own spin on the circuit?

I can't answer your question directly since I don't have or know the
Actel tools. But I would expect that the schematic you are viewing is
representative of the gates listed in the EDIF file. 

I also agree that this is not a good implementation. But this is what
the code describes. To tell you the truth, I don't think what you have
defined can really be an SR latch. Your code does not specify what
happens when both S and R are asserted. An SR latch always has a second
internal node which is what makes it operate differently from a latch.
You have specified the behavior of just the one output. The tool tried
to give you what you asked for. 

Since an SR flip-flop is really just cross-coupled combinatorial logic and
not sequential (no clock), you might try defining it as such. I can't
remember if there is any prohibition on using a feedback loop in the
concurrent section of VHDL. 

tx_busy <= set_tx_busy or not not_tx_busy;
not_tx_busy <= clr_tx_busy or not tx_busy; 

I believe this code describes the cross coupled gates of an SR latch. I
haven't banged any VHDL code for a few months, so I am getting a little
rusty. Please check it and see if it produces better logic. 

You pointed out the race condition of the generated latch. A cross
coupled gate version also has a race condition when both inputs are
asserted and released at the same time. You can't tell what the final
state will be. 
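
For anyone who wants to try the cross coupled version, here is a minimal sketch of
it wrapped in an entity. The entity name and the internal signal names q and q_n
are made up for the example; the internal signals are there because an "out" port
cannot be read back inside the architecture. Some synthesis tools warn about or
break combinatorial loops like this one, so check what actually comes out.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity: SR latch built from two cross coupled
-- OR/NOT stages, with active high set and clear.
entity sr_latch is
    port ( set_tx_busy : in  std_logic;
           clr_tx_busy : in  std_logic;
           tx_busy     : out std_logic );
end entity sr_latch;

architecture gates of sr_latch is
    signal q   : std_logic;  -- latch output
    signal q_n : std_logic;  -- second internal node of the latch
begin
    -- set = '1' forces q high; clr = '1' forces q_n high, which in
    -- turn drives q low as long as set is not asserted.
    q   <= set_tx_busy or not q_n;
    q_n <= clr_tx_busy or not q;

    -- With both inputs at '1', q and q_n are both '1'. If both are
    -- then released together the final state is not predictable.
    tx_busy <= q;
end architecture gates;
```

In simulation the latch stays unknown until the first set or clear, since nothing
initializes q and q_n.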

-- 

Rick Collins

rick.collins@XYarius.com

remove the XY to email me.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design

Arius
4 King Ave
Frederick, MD 21701-3110
301-682-7772 Voice
301-682-7666 FAX

Internet URL http://www.arius.com

Article: 16572
Subject: Re: RAM for external/internal use
From: Rickman <spamgoeshere4@yahoo.com>
Date: Fri, 28 May 1999 11:39:49 -0400
Links: << >> << T >> << A >>

Garrick Kremesec wrote:
> 
> Hello,
> 
>    I am very new to FPGA vhdl programming, but I have done a lot of work
> with PLD (yes, big diff. in use, style, difficult, etc.).  Anyway,
> starting from scratch, I am using Max Plus 9.23 and working with an
> EPF10K10 FPGA.
> 
> I have most of my microcontroller control lines, several address lines,
> and my 16 bit data bus connected to the FPGA.  I would like a memory
> block (64 words of data) that I can read or write to using the
> microcontroller, but I can also work with internally.
> 
> For instance, I have an ADC connected to the FPGA which will be flipping
> through a mux and reading values and I want them stored in these memory
> locations.
> 
> Also, I have to generate 8 PWM signals.  I want the microcontroller to
> be able to send configuation bytes with information on duty cycle.
> 
> I can't figure out how to do this.  The ram/rom packages I could find
> all appear to not allow other internal access simulateneously of this
> memory:
> 
> I thought I would have to set up dual port memory, but the only dual
> port ram component Altera has places the input and output of BOTH ports
> on seperate lines!  Thats 64 data lines for a 16 bit, dual port ram.
> Perhaps I'm not looking in the right places.  And perhaps there is a way
> to combine the data lines into one I/O/tristate internally and
> externally.  Any help will be greatly appreciated.

I believe you are looking, not for a dual port ram, but a ram with
shared address and data buses. A dual port memory can be accessed at the
same time by both parties. If you need the simultaneous access, then
you have to have two sets of busses. In effect it is two memories that
always have the same contents. 

If you don't need to access the ram from two ports at the same time, you
just put a mux on the address bus and either mux the data in or go with
a tristate bus. The controls have to have an arbiter to control access.
This can be very simple and just allows one device at a time to select
whether it is a read or a write cycle. 

The tristate bus is simple. You hook the data inputs to the ram directly
to the tristate bus and hook the data outputs to the bus through
tristate buffers. Do the same on the two units accessing the ram.
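
A minimal sketch of the muxed (non-tristate) variant inside the FPGA might look
like the following. Everything here is an assumption for illustration: the entity
and port names, the fixed priority for the microcontroller side, and the 64 x 16
size taken from your description. It is not tied to any particular Altera RAM
component.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: one single port RAM shared by two users.
entity shared_ram is
    port ( clk     : in  std_logic;
           -- Side A: microcontroller interface (always wins)
           a_req   : in  std_logic;
           a_we    : in  std_logic;
           a_addr  : in  std_logic_vector(5 downto 0);
           a_wdata : in  std_logic_vector(15 downto 0);
           -- Side B: internal logic (ADC, PWM)
           b_req   : in  std_logic;
           b_we    : in  std_logic;
           b_addr  : in  std_logic_vector(5 downto 0);
           b_wdata : in  std_logic_vector(15 downto 0);
           b_grant : out std_logic;  -- '1' when side B owns this cycle
           -- Read data, valid one clock after the granted access
           rdata   : out std_logic_vector(15 downto 0) );
end entity shared_ram;

architecture rtl of shared_ram is
    type ram_t is array (0 to 63) of std_logic_vector(15 downto 0);
    signal ram   : ram_t;
    signal sel_a : std_logic;
    signal we    : std_logic;
    signal addr  : std_logic_vector(5 downto 0);
    signal wdata : std_logic_vector(15 downto 0);
begin
    -- Trivial arbiter: side A has fixed priority, side B gets the
    -- RAM only in cycles where side A is not requesting.
    sel_a   <= a_req;
    b_grant <= b_req and not a_req;

    -- Muxes on address, write data and write enable.
    addr  <= a_addr  when sel_a = '1' else b_addr;
    wdata <= a_wdata when sel_a = '1' else b_wdata;
    we    <= a_we    when sel_a = '1' else (b_we and b_req);

    ram_access : process (clk)
    begin
        if rising_edge(clk) then
            if we = '1' then
                ram(to_integer(unsigned(addr))) <= wdata;
            end if;
            rdata <= ram(to_integer(unsigned(addr)));
        end if;
    end process ram_access;
end architecture rtl;
```

Side B has to hold its request until b_grant goes high, so a busy microcontroller
can stall the internal logic. If that is not acceptable you are back to needing a
real dual port memory.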

Is that any more clear?

-- 

Rick Collins

rick.collins@XYarius.com

remove the XY to email me.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design

Arius
4 King Ave
Frederick, MD 21701-3110
301-682-7772 Voice
301-682-7666 FAX

Internet URL http://www.arius.com

Article: 16573
Subject: Re: FPGA express + VHDL: strange SR implementation?
From: "Stephen J. Byrne" <skbyrne@mindspring.com>
Date: Fri, 28 May 1999 12:20:34 -0400
Links: << >> << T >> << A >>

I have implemented many RS flip-flops in the Actel A32200DX family of parts
without problems.  However, they were synchronous flip-flops, not latches.
The description was much the same as yours with a synchronous clock and
asynchronous reset:

sr: process(clk, rst_n)
begin

    if (rst_n = '0') then
        tx_busy <= '0';
    elsif (clk'event and clk = '1') then
        if (set_tx_busy = '1') then
            tx_busy <= '1';
        elsif (clr_tx_busy = '1') then
            tx_busy <= '0';
        end if;
    end if;

end process;

It shouldn't surprise you that your code (below) would synthesize to a
latch as that's exactly what you specified.  You've written a combinational
process in which not all possible input combinations are specified; therefore, a
latch is inferred.  The synthesis tool won't make use of a DFF, which is a
synchronous device, for a non-synchronous process.

busy_sr_reg: process(set_tx_busy, clr_tx_busy)
    begin
                if (set_tx_busy = '1') then
                        tx_busy <= '1';
                elsif (clr_tx_busy = '1') then
                        tx_busy <='0';
                end if;
    end process busy_sr_reg;

To answer your final question, it is definitely worthwhile to view the
edif netlist in schematic form.  While layout tools might remove redundant
logic and optimize timing paths,  they cannot correct a misinterpretation
of your code as specified in the edif netlist.

Steve


Article: 16574
Subject: Re: RAM for external/internal use
From: Garrick Kremesec <gkremese@ews.uiuc.edu>
Date: Fri, 28 May 1999 11:25:45 -0500
Links: << >> << T >> << A >>

> I believe you are looking, not for a dual port ram, but a ram with
> shared address and data buses. A dual port memory can be accessed at the
> same time by both parties. If you need the simultaineous access, then
> you have to have two sets of busses. In effect it is two memories that
> always have the same contents.

Well, if I have separate processes working continuously (as quickly as
possible) generating pulses (which have to compare state against what is
requested by the user in ram) and reading many ADC channels (and writing
them to ram), I think I will be accessing memory simultaneously.  I
didn't think 64 words of dual port data was too much to ask out of an
EPF10K10, but maybe I'm quite wrong.
 
> If you don't need to access the ram from two ports at the same time, you
> just put a mux on the address bus and either mux the data in or go with
> a tristate bus. The controls have to have an arbitor to control access.
> This can be very simple and just allows one device at a time to select
> whether it is a read or a write cycle.
> 
> The tristate bus is simple. You hook the data inputs to the ram directly
> to the tristate bus and hook the data outputs to the bus through
> tristate buffers. Do the same on the two units accessing the ram.
>
> Is that any more clear?
> 
> --
> 
> Rick Collins
> 
> rick.collins@XYarius.com

Will I be able to redirect the output to the same bus used as input
(defined as inout) using a separate process on the FPGA, or do you mean
an external tristate buffer is required on separate output lines?  That
just seems excessively wasteful of the I/O pins.

I have used several dual port rams that have only two (not four) busses
that are I/O/tristate.

Here is the only Altera component for dual port ram:

component csdpram
   generic ( LPM_WIDTH: POSITIVE;
             LPM_WIDTHAD: POSITIVE;
             LPM_NUMWORDS: POSITIVE );
   port ( dataa: in STD_LOGIC_VECTOR(LPM_WIDTH-1 downto 0);
          datab: in STD_LOGIC_VECTOR(LPM_WIDTH-1 downto 0);
          addressa: in STD_LOGIC_VECTOR(LPM_WIDTHAD-1 downto 0);
          addressb: in STD_LOGIC_VECTOR(LPM_WIDTHAD-1 downto 0);
          clock: in STD_LOGIC;
          clockx2: in STD_LOGIC;
          wea: in STD_LOGIC;
          web: in STD_LOGIC;
          qa: out STD_LOGIC_VECTOR(LPM_WIDTH-1 downto 0);
          qb: out STD_LOGIC_VECTOR(LPM_WIDTH-1 downto 0);
          busy: out STD_LOGIC );
end component;

I see write selects, address, data in, data out, and busy, but where is
read? Is the read line essentially the clockx2 signal?  I think I need
a better set of reference material.

Thanks a lot for your help/time.

Garrick
