In article <384034D5.2138D838@yahoo.com>, Rickman <spamgoeshere4@yahoo.com> wrote: > > I guess this is the ultimate yawn in a newsgroup. The original topic of > discussion is so uninteresting that the topic changes and the thread > continues on without anyone even acknowledging the fact. > > I guess that is the answer to my question. Few engineers are even > interested enough in the Lucent Orca parts to even discuss why they > don't use them??!! > My fault, due to my lack of experience with deja.com. I thought that changing the subject when replying under "power post" would start a new thread. Instead, I inadvertently hijacked your thread. After that I started a new thread with my subject, but everyone seemed to stick with this thread. Sorry for the mixup. -- Greg Neff VP Engineering *Microsym* Computers Inc. greg@guesswhichwordgoeshere.com Sent via Deja.com http://www.deja.com/ Before you buy.Article: 19101
We are nearing the end of the hardware development for a project that uses the xcv-1000-bg560. The tools used were: VCS for verilog simulation; Synplify for verilog synthesis; Alliance M2.1i for place and route. VCS is a bit pricey but we were fortunate to have it under maintenance from a past ASIC project. All in all I was pleased with the EDA tools (a first). Everything went fairly smoothly with very few surprises, although every design brings with it a new problem set. arafeeq@my-deja.com wrote: > Hello all! > Has anyone put the xilinx's virtex (xcv-800-bga432) device into > Production. If yes, what was the EDA tools flow used. like.. > Verilog/vhdl/synplify/or fpga-express or fpga compiler II or leonardo > /alliance tools etc... > I appreciate the answers, > Best Regards, > Abdul Rafeeq. > Sent via Deja.com http://www.deja.com/ > Before you buy.Article: 19102
If the pentium outperformed an FPGA doing a 2D convolution, then whoever did the design for the FPGA wasn't taking advantage of parallelism. See my paper entitled "FPGA makes a radar signal processor on a Chip" for a discussion of how these types of things are done in FPGAs. The paper discusses, among other things, a complex 256 tap matched filter running at a 5 MHz sample rate. The design discussed is doing roughly 10 Billion with a 'B' multiplications per second. That's more than 2 orders of magnitude more performance than you'll get out of a pentium. George wrote: > Dear All, > > I am willing to do a performance analysis of FPGAs, DSPs and Pentium III MMX > microprocessors for highly parallel DSP applications such us Image > Processing. I am interested in particular in the use of MMX technology in > PENTIUM III general purpose microprocessors. With clock frequencies reching > 500 MHz, I may expect them to outperform both FPGAs and DSP in some > applications (e.g. 2D convolution). Has anybody done a similar case study? > Do you know any valuable references on this issue? > > Any comment will be highly appreciated. > > Thanks in advance. > > G. -- -Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email randraka@ids.net http://users.ids.net/~randrakaArticle: 19103
In article <383BDC4E.31D2A09A@fokus.gmd.de>, Guido Pohl <pohl@fokus.gmd.de> wrote: >I'am searching for the pin-out of a programming cable used for a MACH445, i.e. >to program it from a PC parallel port ... >I couldn't find any information to this topic - is it a secret 8-? No, it isn't. You can find a schematic in the ol' VANTIS MACH ISP Manual on page A-2 >I know that AMD's MACH is nowadays a M4-128/64 from Lattice. Lattice should have similar doku. -- Stefan Wimmer Cellware Broadband Email sw@cellware.de Justus-von-Liebig-Str. 7 WWW http://www.cellware.de/ 12489 Berlin, Germany Visit my private Homepage: Love, Electronics, Rockets, Fireworks! http://www.geocities.com/CapeCanaveral/6368/Article: 19104
Ray Andraka wrote: > If the pentium outperformed an FPGA doing a 2D convolution, then whomever did > the design for the FPGA wasn't taking advantage of parallelism. See my paper > entitled "FGPA makes a radar signal processor on a Chip" for a discussion of how > these types of things are done in FPGAs. The paper discusses, among other > things, a complex 256 tap matched filter running at a 5 MHz sample rate. The > design discussed is doing roughly 10 Billion with a 'B' multiplications per > second. Thats more 2 orders of magnitude more performance than you'll get out > of a pentium. Such speed-up might not always happen, especially when you consider problems with a high communication over computation ratio (3x3 2D convolution, for example). Then, the level of "usable" parallelism might be constrained by off-chip bandwidth, as execution time is in any case bounded by communication time (the time to transfer data on and off chip). Using a very, very rough approximation you can express an upper bound for the effective parallelism in the FPGA as Pmax=(Computation_Volume*Computation_Time)/(Communication_Volume*Communication_Time); If you try to accelerate your Photoshop by implementing your 3x3 convolution routines on a PCI-based FPGA board, for example, your maximum parallelism will be (assuming a Virtex design performing an 8-bit MAC at 100 MHz, on an HxW image, with PCI at full burst delivering 4x8 bits every 30ns) Pmax=((H*W*3*3)*10)/(2*H*W*30/4)=6 Which means 10/6=1.7ns for performing an 8-bit MAC operation. Using your PIII@600Mhz, as MMX can process 8 pixels per MMX instruction, your optimal peak performance will be 2/8=0.25 ns. This peak performance is unrealistic as we don't consider loading and unloading data from MMX registers; in practice it should be between two or three times more, which still matches (or even beats) FPGA performance... 
Note: I should probably also consider communication from main memory and cache misses for the MMX version, which might actually worsen execution time, but not by a strong factor I think. Hence it's not so much a matter of 'how you design' as a matter of picking the right application, especially one that provides a good computation over communication ratio. As an example, implementing a 9x9 2D convolution on the same FPGA would certainly provide a huge speed-up. StevenArticle: 19105
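Steven's bandwidth bound can be reproduced with a few lines of arithmetic. The numbers below are the assumptions from his post (an 8-bit MAC every 10 ns in the FPGA, full-burst PCI moving 4 bytes every 30 ns, each pixel crossing the bus twice), not measurements:

```python
# Back-of-the-envelope version of Steven's Pmax bound (assumptions from his post).
ops_per_pixel = 3 * 3              # 3x3 convolution: 9 MACs per pixel
mac_time_ns = 10.0                 # one 8-bit MAC at 100 MHz
bytes_per_pixel_moved = 2          # one byte in, one byte out over PCI
bus_ns_per_byte = 30.0 / 4         # PCI burst: 4 bytes every 30 ns

compute_ns_per_pixel = ops_per_pixel * mac_time_ns           # 90 ns of MAC work
comm_ns_per_pixel = bytes_per_pixel_moved * bus_ns_per_byte  # 15 ns on the bus

# Usable parallelism before the bus, not the multipliers, becomes the bottleneck:
p_max = compute_ns_per_pixel / comm_ns_per_pixel
print(p_max)                       # -> 6.0
print(mac_time_ns / p_max)         # effective ns per MAC at the bound, ~1.67
```

The bound says nothing about the FPGA's raw compute; it only caps how much of that compute the PCI bus lets you use for this kernel.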
Hello, I'm trying to pipeline a 4-bit CLA where I'm only interested in Cout4, and another complete 4-bit CLA. The problem is that I'm simply running out of DFFs, and I was hoping someone could shed some light on how I can construct the CLA with Cout4 with 3 stages of pipelining, and a complete 4-bit CLA with 3 stages of pipelining. I managed to pipeline a ripple carry adder. The issue which is putting me on the back foot is the presence of combinatorial logic between the full adder stages (an example would be a 4-bit CLA). So, can any intellect out there spare some time explaining how I might achieve pipelining a 4-bit carry lookahead adder? One last query: the carry propagate signal is sometimes defined as A+B and sometimes as A XOR B. Why? The generate is always defined as A.BArticle: 19106
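On the last question in the post: for the carry chain alone, OR and XOR propagate are interchangeable, because whenever Ai = Bi = 1 the generate term already forces the carry; XOR matters only if the propagate signal is reused to form the sum bit. A small bit-level model (written as the carry recurrence Ci+1 = Gi + Pi·Ci, which the flattened lookahead equations simply expand) can check this exhaustively:

```python
def cla4(a, b, cin, xor_propagate=True):
    """Bit-level model of a 4-bit carry-lookahead adder; returns (sum, cout4).
    Propagate Pi may be Ai XOR Bi or Ai OR Bi: the carries come out the same
    either way, since when Ai = Bi = 1 the generate Gi = Ai.Bi dominates."""
    abits = [(a >> i) & 1 for i in range(4)]
    bbits = [(b >> i) & 1 for i in range(4)]
    c = [cin]
    for i in range(4):
        g = abits[i] & bbits[i]                                   # Gi = Ai.Bi
        p = (abits[i] ^ bbits[i]) if xor_propagate else (abits[i] | bbits[i])
        c.append(g | (p & c[i]))                                  # Ci+1 = Gi + Pi.Ci
    # The sum bit must use XOR of the inputs (an OR-propagate cannot be reused here):
    s = sum(((abits[i] ^ bbits[i] ^ c[i]) << i) for i in range(4))
    return s, c[4]

# Both propagate definitions agree with plain addition on every input:
for a in range(16):
    for b in range(16):
        for cin in (0, 1):
            assert cla4(a, b, cin, True) == cla4(a, b, cin, False)
            assert cla4(a, b, cin) == ((a + b + cin) & 0xF, (a + b + cin) >> 4)
print("OK")
```

This is only a behavioral sketch, not a pipelined netlist; it shows why either propagate definition yields a correct Cout4.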
I use a mix of schematic and Verilog. I put state machines in Verilog and most other stuff in schematic. Verilog (or VHDL) cannot be beat for state machines. Easy to code, easy to diagnose, easy to change in seconds instead of hours to re-design the machine. The more complex the state machine, the greater the time savings. I plan to use Verilog for more in the future, but I don't have enough confidence yet that what I put in is putting out what I want. -- Keith F. Jasinski, Jr. kfjasins@execpc.com Greg Neff <gregneff@my-deja.com> wrote in message news:81cjav$opg$1@nnrp1.deja.com... > In article <38398D1C.A7B9E445@ids.net>, > Ray Andraka <randraka@ids.net> wrote: > > Don Husby wrote: > > > (snip) > > > > > > I agree that schematics are still the best way to enter a design. > I just > > > thought I would beat my head against the VHDL wall one more time > before > > > going back to schematics. > > > > I've been using, no beating my head against the wall, with VHDL > lately too. > > I'm doing it for two reasons: First I have some customers who bought > the VHDL > > thing hook, line, and sinker (try to convince them they're wrong!), > and for my > > own stuff because it allows me to parameterize functions pretty > easily. I'm > > beginning to wonder if I'll ever see the return on the design > investment for > > those parameterized thingies though. > > > (snip) > > I having been using schematic entry for FPGAs, probably because I have > been drawing schematics since before the days of PALs, let alone > FPGAs. I am now considering taking the leap to VHDL entry, but I am > not convinced that there will be a benefit in either time to design or > design quality. The above comments seem to be indicative of those like > myself, who are highly skilled at schematic entry for FPGAs. > > I'm not talking about a situation where a team of engineers is > designing a mega-gate FPGA. 
I am more interested in small to mid-range > (say, up to 100K gate) designs that are being entered and maintained by > one person. > > I would be interested to hear from those people that have gone through > the VHDL learning curve. > > Has the move to VHDL reduced design entry time? > Has the design quality improved (fewer problems)? > Is design debugging easier? > Is design maintenance easier? > Is design reuse easier? > Did you get to the point where VHDL is more efficient than schematics? > If so, how long did it take to get to this point? > Bottom line: Was it worth it? > > I would like to hear from Don and Ray, to see if they consider > themselves to be still on the learning curve, or if they truly think > that VHDL is not worth the hassle. > > -- > Greg Neff > VP Engineering > *Microsym* Computers Inc. > greg@guesswhichwordgoeshere.com > > > Sent via Deja.com http://www.deja.com/ > Before you buy.Article: 19107
Xemacs, get it at http://www.xemacs.org "Ahmad A." wrote: > > Hi.. > Can any one tell me where can I find Free, Student edition, or Shareware HDL > editor? > > Thank you in advanced. > Ahmad.Article: 19108
Hi Bruce, The Xilinx PCI64 and PCI32 cores do not require FIFOs that back up during a bus transfer. Depending on the type of transfer, we may require the user to back up the FIFO when the bus transaction terminates. The situation you describe is never an issue when the core is the target of a write or the initiator of a read. In cases where the core is the target of a read, or the initiator of a write, it can occur if the other bus agent inserts wait states in the middle of a burst. Our implementation contains the necessary "shadow" registers as a buffer for the cases where it can be an issue. In situations where the core needs to use this buffered data, it does so automatically and transparently to the user. In cases where it can be an issue, the "shadow" registers may still hold valid data at the end of a transfer, depending on how the transfer terminated. We ask the user to back up their FIFO at this time. There are two main reasons for this: 1. It forces a FIFO state consistent with what took place on the bus. 2. It allows the next transfer on the user side to be unrelated to the first. The largest issue is item two. Many designs have more than one target address space (multiple base address registers) and have more than one "channel" as an initiator. To assume otherwise would seriously limit the flexibility of our implementation. Hope that clarifies, Eric Crabill Bruce Nepple wrote: > One thing you might look at when you consider PCI cores is whether you need > a "backup fifo" or is it implemented in the core. When you get a late > TRDY-false does the core save the data in a hidden register or do you have > to backup your fifo pointer to send it (there is no way you will be able to > stop the fifo from advancing since the signal comes so late). My impression > is that Xilinx requires a backup fifo.Article: 19109
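The "back up the FIFO after a terminated burst" idea in Eric's post can be sketched abstractly as a read pointer with a snapshot taken at the start of each transfer. This is a hypothetical toy model for illustration, not the Xilinx core's actual interface:

```python
class BackupFifo:
    """Toy model of a FIFO whose read pointer can be rolled back to the
    point the current bus transfer started -- the user-side 'back up'
    described in the post. Purely illustrative."""
    def __init__(self, data):
        self.data = list(data)
        self.rd = 0        # read pointer advanced speculatively during a burst
        self.mark = 0      # snapshot taken when a transfer begins
    def start_transfer(self):
        self.mark = self.rd
    def pop(self):
        word = self.data[self.rd]
        self.rd += 1
        return word
    def commit(self, words_accepted):
        # Bus transaction terminated: keep only the words the other
        # agent actually accepted; the rest will be re-sent next time.
        self.rd = self.mark + words_accepted

fifo = BackupFifo([10, 20, 30, 40])
fifo.start_transfer()
burst = [fifo.pop(), fifo.pop(), fifo.pop()]  # stream 3 words speculatively
fifo.commit(2)                                # target disconnected after 2
print(fifo.pop())                             # -> 30, the un-accepted word again
```

In the real core the "shadow" registers hide most of this; the explicit rollback is only needed at the end of a transfer, for the reasons Eric lists.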
George, Be sure to include in your comparison figures for size and power as well. You will find that, in addition to a dedicated circuit in an FPGA having orders of magnitude better performance than one based on a general purpose processor or even a DSP, the FPGA implementation may consume an order of magnitude less power, and probably considerably less space. A Pentium or other processor contains a great deal of extra circuitry dedicated to making it a good performer at many different tasks, while an FPGA implementation can be a fantastic performer at a single task, as all resources are directed towards that one task. For example, speculative branch execution logic is of little or no use when performing well determined calculations as found in most signal processing tasks. Yet it is still there in a GPP. Most cache has no use in a well designed pipelined circuit, or may be built into the pipeline in the places where it does the most good in a dedicated FPGA circuit. In a GPP, the cache sits in one place, whether it is needed or not. So please, if you do such a comparison, consider not only raw performance, but other important metrics such as power/performance and the related space/performance. Ray's paper doesn't mention it, but I'll bet his radar processor consumes less than 1/3 of the power that a DSP or GPP implementation would use. - John In article <3842ABA7.A27C813@ids.net>, Ray Andraka <randraka@ids.net> wrote: > If the pentium outperformed an FPGA doing a 2D convolution, then whomever did > the design for the FPGA wasn't taking advantage of parallelism. See my paper > entitled "FGPA makes a radar signal processor on a Chip" for a discussion of how > these types of things are done in FPGAs. The paper discusses, among other > things, a complex 256 tap matched filter running at a 5 MHz sample rate. The > design discussed is doing roughly 10 Billion with a 'B' multiplications per > second. Thats more 2 orders of magnitude more performance than you'll get out > of a pentium. 
> > George wrote: > > Dear All, > > I am willing to do a performance analysis of FPGAs, DSPs and Pentium III MMX > > microprocessors for highly parallel DSP applications such us Image > > Processing. I am interested in particular in the use of MMX technology in > > PENTIUM III general purpose microprocessors. With clock frequencies reching > > 500 MHz, I may expect them to outperform both FPGAs and DSP in some > > applications (e.g. 2D convolution). Has anybody done a similar case study? > > Do you know any valuable references on this issue? > > > > Any comment will be highly appreciated. > > > > Thanks in advance. > > > > G. > > -- > -Ray Andraka, P.E. > President, the Andraka Consulting Group, Inc. > 401/884-7930 Fax 401/884-7950 > email randraka@ids.net > http://users.ids.net/~randraka > > Sent via Deja.com http://www.deja.com/ Before you buy.Article: 19110
Pat wrote: > Does anyone have any info' about the Clearlogic Vs. Altera > bust-up ? I was thinking of using ClearLogic because the cost saving is > quite dramatic but, if they're going to suddenly withdraw their service > 'cos Altera have successfully sued them I'll be in deep Do-Do's. > > Anyone know anything. > -- > Pat Pat, To start, I would like to refer you to the Clear Logic press release that responds to your question: http://www.clear-logic.com/pressrelease/11-18-99.htm However, let me add the following personal observations: From what I have heard of the complaint, it is a huge long-shot, a desperate move by Altera, with roughly zero probability of success. Still, it's not a bad strategy, since in the end, each side will have burned up N dollars on lawyers, which obviously stings Clear Logic more than Altera. But put us out of business? Not a chance. We easily have the financial strength to weather the costs of the legal defense. Look at it this way. If Altera could beat us on price, performance or quality, they would. Instead they have taken us to court. You are nobody in the Silicon Valley if you have not been sued. And every successful startup in the Silicon Valley is eventually sued by the competition. It's all just a part of growing up. Clear Logic is here to stay. DISCLAIMER: I am, of course, speaking for myself, and not my employer. They would not be so foolish as to even entertain the possibility of considering the option of allowing me to speak for them. Please refer to the Clear Logic website, www.clear-logic.com for official statements from management. --Scott Chase Senior Applications Engineer Clear Logic, Inc.Article: 19111
Isn't it interesting that Peter Alfke, who regularly gives advice and explains the reasoning behind certain Xilinx decisions, always goes very quiet when there is a discussion that highlights the shortcomings of the software Xilinx sells? In article <01bf395a$df27ca70$207079c0@drt1>, Austin Franklin <austin@darkroom098.com> writes >> I would guess by the tone of your message that you are pretty frustrated >> over this. > >Frustrated? HA! That's an understatement! > >And they ask me the most STUPID question they can possibly every ask "Why >do you want to use that tool anyway?...NO ONE uses it". DAMN does that >piss me off. I need to use THAT tool because with it, I can fill in the >blanks that THEIR documentation leaves out...like what is the best IOB to >use for the RESET input, where the hell IS the upper right and left corner >of the die, with relation to the package...and just WHAT did the tools do >to my logic? > >These, and many other questions can be answered with this tool...but their >attitude is, "hey, the OTHER tools work just fine, so you don't really need >that tool, Oh, and by the way, NO ONE uses it anyway"...GRRRRRRRR. > -- Steve Dewey remove 123 to email.Article: 19112
Because the JTAG pins are dedicated pins on 10K, they do not interfere with JTAG programming. What Martin is referring to is probably the case where a blank EPC2 and Flex 6K are in the same JTAG chain. Because nConfig on 6K is most likely tied to a pull-up, when the board powers up, nConfig goes high and the 6K enters the configuration mode. With the EPC2 unprogrammed, the 6K can't get configured (assuming that the 6K is designed to get its configuration bit stream from the EPC2) and remains in the configuration mode. Since the JTAG pins on 6K are dual-purpose pins, they are tri-stated while the 6K is in the configuration mode. With the JTAG pins on 6K tristated, the JTAG chain is effectively broken. Thus, to be able to ISP the blank EPC2 via JTAG, nConfig of 6K needs to be pulled low so that the 6K is not in the configuration mode. Alternatively, you can pre-program the EPC2 so that the 6K can exit the configuration mode, enter the user mode and allow the JTAG pins to operate. As for the "Unrecognized Device or Socket is empty" problem, well, I am not sure. It could be a sw version problem (get the latest ASAP2 from Altera and try again). You might also want to check the voltage level and look for any noise. Ying In article <383eaf14.9384960@news.freeserve.net>, <martin@the-thompsons.freeserve.co.uk> wrote: >Hi Volker. > >If the 10K10 is like the 6016 I used once, you need to pull down the >nCONFIG line (I think) the first time you program (and if you ever >corrupt the EPC2) otherwise it tristates the JTAG I/Os until its knows >what they are configured as! I put a jumper on my board for this >purpose, there may be one on your EV board. > >Altera have an App note on this somewhere, try a search on their >website (http://www.altera.com surprise surprise :). If you can't >find it, let me know and when I get into work I'll check what I did >last time. 
> >Cheers, > Martin > >On Wed, 24 Nov 1999 20:52:43 -0800, Volker Kalms ><ea0038@uni-wuppertal.de> wrote: > >>Hi all, >> >>Since a quarter of a year I discover the beatiful world of >>AHDL and VHDL. Until now everithing worked fine. But now I would >>be very grateful for a little help. >> >>Lately I got an FPGA evaluation board (DIGILAB 10K10, manufactured >>by Ing. Buero Lindmeier) in my hands. This evaluation board contains >>an ALTERA EPF 10K10LC84-4. To configure this FLEX device I use the >>ALTERA MAX+plusII (v 9.1) software......no problem to this point. >> >>Two weeks ago I purchased an configuration EPROM (EPC2LC20), which >>could optional plugged into my evaluation board.I set up the MAX plus >>JTAG chain due to the requirements (as far as I would say), performed >>an JTAG Chain Info in the Multi-Device JTAG Chain Setup and MAX plus >>detected the additional device in the JTAG Chain. >>But when I try to Program the .pof file to the EPROM I get the message: >>Unrecognized device or socked empty. >> >> >>What am I doing wrong????? From my point of view I changed nearly every >>parameter in the MAX plus setup. >> >>I hope there is somebody out there, who could give me a hint how to get >>this EPROM configured. >> >> >>MANY THANKS IN ADVANCE!!! >> >>Best regards, >> >>Volker > >Martin Thompson >martin@the-thompsons.freeserve.co.uk >http://www.the-thompsons.freeserve.co.uk/Article: 19113
Thanks Evan, There I was, thinking I could avoid this thread :-) Tom Hill at Exemplar used my original example (mine was rather verbose, but I thought it useful) and created the Exemplar app note now available on the web site. Something was rumored to be in XCell but I might have missed it. Now that the area location constraints work in M2.1 (and I can put them in already using Spectrum) I can begin work on the ASIC-like floor-planning and incremental compile methodology in Spectrum, which will make life a lot easier on the big structured design front.(did somebody say DSP?) I've put the note up on my personal webspace (oh, if only I had tome for a real website). Those interested can find it at: http:\\www.netcomuk.co.uk\~s_clubb\increment.zip Cheers Stuart On Sun, 28 Nov 1999 15:34:09 GMT, eml@riverside-machines.com.NOSPAM wrote: >I don't think this app note has got as far as Xilinx or Exemplar yet. >I've copied this to Stuart and hopefully he can provide details on >where to get it from. > >Evan > For Email remove "NOSPAM" from the addressArticle: 19114
In the case of a 3x3 convolution, the FPGA can still significantly outperform the Pentium, and it uses a lower clock frequency and much less power. The secret here is that the FPGA can perform all the multiplies in parallel, while the pentium does at best one multiply per clock. Use of on chip memory or a dedicated external memory buffer allows the FPGA to process each pixel as it arrives without having to repetitively fetch the surrounding pixels. In many cases, the pixel rate is even slow enough that the FPGA can process the data serially. A 640x480 image at 60 frames/sec has a pixel rate of only 18 MHz. If the pixels are 8 bits, then you can condense the hardware considerably by working on two bits at a time at a system clock of 73 MHz. Yes, I've done this, and a 3x3 takes up a very small area - less than 100 CLBs. Expanding it to a larger 2D convolution takes more area and more local line buffer memory (may have to be wider for the I/O bandwidth if it is off-chip), but has no real impact on the pixel rate...something a microprocessor implementation can't claim. The power savings are also considerable. I didn't put the power savings in my paper, as I did not have measurements or estimates to cite. Parallelism lets you use a lower clock frequency, and purpose built logic keeps the gate count small. It does use a different design flow, and requires a different set of skills than a microprocessor based design. Steven Derrien wrote: > Ray Andraka wrote: > > > If the pentium outperformed an FPGA doing a 2D convolution, then whomever did > > the design for the FPGA wasn't taking advantage of parallelism. See my paper > > entitled "FGPA makes a radar signal processor on a Chip" for a discussion of how > > these types of things are done in FPGAs. The paper discusses, among other > > things, a complex 256 tap matched filter running at a 5 MHz sample rate. The > > design discussed is doing roughly 10 Billion with a 'B' multiplications per > > second. 
Thats more 2 orders of magnitude more performance than you'll get out > > of a pentium. > > Such speed-up might not always happen especially when you consider problem with a > high communication over computation ratio (3x3 2D Convolution for example). Then, > the level of "usable" parallelism might be contrained by off-chip bandwidth as > anyhow execution time is bounded by communication time (time to transfer dat in and > off chip). > Using a very very rough approximation you can express an upper bound for the > effective parallelism in the FPGA using > > Pmax=(Computation_Volume*Computation_Time)/(Communication_Volume*Communication_Time); > > If you try to acccelerate your photoshop by implementing your 3x3 convolution > routines on a PCI base FPGA board for example, your maximum parallelism will be > (assuming a virtex design perfomring a 8bit MAC at 100Mhz, on a HxW image, with a > PCI at full burst delevering 4x8 bits every 30ns) > > Pmax=((H*W*3*3)*10)/(2*H*W*30/4)=6 > > Which means 10/6=1.3ns for performing a 8 bit MAC operation. Using your PIII@600Mhz, > as MMX can process 8 pixel per MMX intruction your optimal peak performance will be > 2/8=0.25 ns . This peak performance is unrealistic as we don't consider loading and > unloading data from MMX registers, in practice it shoul be be between two or three > time more, which still matches (or even beat) FPGA performances... > > Note : I should probably also consider communictaion from main memory and cache > misses for MMX version , which might actually worsen execution time, but not by a > strong factor I think. > > Hence it's not so much a matter of 'how you design' rather than a matter of picking > the right application , especially one that provide a good computation over > communicatio ratio. As an example implementing a 9x9 2d convolution on the same FPGA > wouldl certainly provide a huge speed up. > > Steven -- -Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 
401/884-7930 Fax 401/884-7950 email randraka@ids.net http://users.ids.net/~randrakaArticle: 19115
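The rates Ray quotes can be checked with simple arithmetic: the pixel rate of a 640x480 image at 60 frames/s, the parallel MAC throughput of nine multipliers at that rate, and the system clock for the 2-bits-at-a-time digit-serial variant he describes:

```python
# Checking the numbers in Ray's post.
pixel_rate = 640 * 480 * 60          # pixels per second for 640x480 @ 60 fps
print(pixel_rate / 1e6)              # -> ~18.4, the "only 18 MHz" figure

# Nine multipliers working in parallel on a 3x3 kernel at that pixel rate:
macs_per_second = pixel_rate * 9
print(macs_per_second / 1e6)         # -> ~166 million MACs/s, one pixel out per clock

# Digit-serial variant: 8-bit pixels handled 2 bits at a time need 4
# sub-cycles per pixel, hence the ~73 MHz system clock he mentions:
print(pixel_rate * 4 / 1e6)          # -> ~73.7
```

The point of the arithmetic is that none of these clock rates is aggressive for an FPGA; the parallelism, not the clock, supplies the throughput.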
Keith Jasinski, Jr. wrote: > I use a mix of schematic and Verilog. I put state machines in Verilog and > most other stuff in schematic. Verilog (or VHDL) cannot be beat for state > machines. Easy to code, easy to dianose, easy to change in seconds instead > of hours to re-design the machine. The more complex the state machine, the > greater the time savings. Perhaps not, but in many cases you can do as well with a structured schematic. I use wrappers around the basic components for 1-hot state machines so that in the schematic the state machine looks like a flowchart. It makes the SM easy to grok, and entry and modifications are easy to do. It also has the advantage of making it easy to see how many levels of combinatorial logic will be needed. For small encoded machines, you can use n:1 selectors with the select inputs driven by the state machine registers. The data inputs are tied to 0, 1, control or not control to direct the state machine to the next state. The tools will reduce the selector logic to a log2(n) or less input gate, and the function can be read off the mux inputs. It's a bit harder to follow, but still not too bad. -- -Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email randraka@ids.net http://users.ids.net/~randrakaArticle: 19116
Ray Andraka wrote: > In the case of a 3x3 convolution, the FPGA can still significantly outperform the > Pentium, and it uses a lower clock frequency and much less power. The secret here is > that the FPGA can perform all the multiplies in parallel, while the pentium does at best > one multiply per clock. MMX can perform 8 multiplications per clock cycle on 8-bit words (those used for image processing), and moreover recent PIII clock speeds are approx 4 times those of FPGAs. Anyhow, in my previous post I was considering multipliers working in parallel. > Use of on chip memory or a dedicated external memory buffer > allows the FPGA to process each pixel as it arrives without having to repetitively fetch > the surrounding pixels. Right; but total execution time will always be bounded by the time it takes to transfer the original image into the FPGA plus the time to transfer the resulting image out of the FPGA. What I wanted to say is that in many cases the maximum speed-up that you can expect from an FPGA solution is strongly limited by communication time. > In many cases, the pixel rate is even slow enough that the FPGA > can process the data serially. A 640x480 image at 60 frames/sec has a pixel rate of only > 18 MHz. If the pixels are 8 bits, then you can condense the hardware considerably by > working on two bits at a time at a system clock of 73 MHz. You could also do that using a 200 Mhz Pentium. The FPGA won't provide any speed-up here because again the limiting factor in execution time is bandwidth, not processing power! > Yes, I've done this, and a > 3x3 takes up a very small area - less than 100 CLBs. So here you definitely gain in terms of area/complexity over MMX, but I think the initial question was more about speed than "cost effectiveness". > memory (may have to be wider for the I/O bandwidth if it is off-chip), but has no real impact > on the pixel rate...something a microprocessor implementation can't claim. > The power savings are also considerable. 
I didn't put the power savings in my paper, as > I did not have measurements or estimates to cite. Parallelism lets you use a lower clock > frequency, and purpose built logic keeps the gate count small. It does use a different > design flow, and requires a different set of skills than a microprocessor based design. I agree ... > Steven Derrien wrote: > > > Ray Andraka wrote: > > > > > If the pentium outperformed an FPGA doing a 2D convolution, then whomever did > > > the design for the FPGA wasn't taking advantage of parallelism. See my paper > > > entitled "FGPA makes a radar signal processor on a Chip" for a discussion of how > > > these types of things are done in FPGAs. The paper discusses, among other > > > things, a complex 256 tap matched filter running at a 5 MHz sample rate. The > > > design discussed is doing roughly 10 Billion with a 'B' multiplications per > > > second. Thats more 2 orders of magnitude more performance than you'll get out > > > of a pentium. > > > > Such speed-up might not always happen especially when you consider problem with a > > high communication over computation ratio (3x3 2D Convolution for example). Then, > > the level of "usable" parallelism might be contrained by off-chip bandwidth as > > anyhow execution time is bounded by communication time (time to transfer dat in and > > off chip). > > Using a very very rough approximation you can express an upper bound for the > > effective parallelism in the FPGA using > > > > Pmax=(Computation_Volume*Computation_Time)/(Communication_Volume*Communication_Time); > > > > If you try to acccelerate your photoshop by implementing your 3x3 convolution > > routines on a PCI base FPGA board for example, your maximum parallelism will be > > (assuming a virtex design perfomring a 8bit MAC at 100Mhz, on a HxW image, with a > > PCI at full burst delevering 4x8 bits every 30ns) > > > > Pmax=((H*W*3*3)*10)/(2*H*W*30/4)=6 > > > > Which means 10/6=1.3ns for performing a 8 bit MAC operation. 
Using your PIII@600Mhz, > > as MMX can process 8 pixel per MMX intruction your optimal peak performance will be > > 2/8=0.25 ns . This peak performance is unrealistic as we don't consider loading and > > unloading data from MMX registers, in practice it shoul be be between two or three > > time more, which still matches (or even beat) FPGA performances... > > > > Note : I should probably also consider communictaion from main memory and cache > > misses for MMX version , which might actually worsen execution time, but not by a > > strong factor I think. > > > > Hence it's not so much a matter of 'how you design' rather than a matter of picking > > the right application , especially one that provide a good computation over > > communicatio ratio. As an example implementing a 9x9 2d convolution on the same FPGA > > wouldl certainly provide a huge speed up. > > > > Steven > > -- > -Ray Andraka, P.E. > President, the Andraka Consulting Group, Inc. > 401/884-7930 Fax 401/884-7950 > email randraka@ids.net > http://users.ids.net/~randraka StevenArticle: 19117
Hello, I was wondering if there exists an AGP-based FPGA board? Are there any commercial interface chips like the ones that exist for PCI (PLX, AMCC)? Steven
Article: 19118
On Mon, 29 Nov 1999 22:07:02 GMT, s_clubb@NOSPAMnetcomuk.co.uk (Stuart Clubb) wrote: >Thanks Evan, > >There I was, thinking I could avoid this thread :-) > >I've put the note up on my personal webspace (oh, if only I had time >for a real website). Those interested can find it at: > >http:\\www.netcomuk.co.uk\~s_clubb\increment.zip Or even http://www.netcomuk.co.uk/~s_clubb/increment.zip Thanks! - Brian
Article: 19119
I wasn't aware that the Pentium III has 8 multipliers. I agree that the performance is limited to the comm time. One of the advantages of using the FPGA, however, is that you can do more processing before returning the data, especially if you have a local dedicated memory to use too. For a simple example, in edge detection you will probably use both a horizontal and a vertical Sobel operator, each of which is a 3x3 (or larger) 2-D convolution. With the FPGA, you can do both at once and combine them before returning the result. As the algorithm becomes more complicated, the FPGA shows greater gains. In most cases, it can also work at a considerably lower clock rate, thereby reducing power. -- -Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email randraka@ids.net http://users.ids.net/~randraka
Article: 19120
Ray Andraka wrote: > I wasn't aware that the Pentium III has 8 multipliers. It is actually part of the MMX SIMD instruction set, which performs 8 8-bit multiplications per cycle. Its performance is however limited by the large number of cycles you need to load/unload the MMX registers with correctly formatted data. > I agree that the performance is limited to the comm time. One of the advantages of using the FPGA > however, is that you can do more processing before returning the data, especially if you have a > local dedicated memory to use too. This is the same for a CPU with on-chip L2 cache (like the PII): using good programming techniques you can get good reuse of cached data and limit off-chip communication to its minimum. > For a simple example, in edge detection you will probably use both a horizontal and > vertical Sobel operator, each of which is a 3x3 (or larger) 2 D convolution. With the FPGA, you > can do both at once and combine them before returning the result. By using a simple loop-merging technique, this will also work for a CPU with L2 cache memory. > As the algorithm becomes more complicated, the FPGA shows greater gains, Not necessarily: FPGAs show greater gains for algorithms exhibiting good regularity in their computations and a high computation-over-communication ratio. > In most cases, it can also work at a considerably lower clock rate, thereby reducing power. True in most cases. Steven > -- > -Ray Andraka, P.E. > President, the Andraka Consulting Group, Inc. > 401/884-7930 Fax 401/884-7950 > email randraka@ids.net > http://users.ids.net/~randraka
Article: 19121
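As an illustration of the Sobel example in the exchange above (a sketch, not anyone's actual implementation): both 3x3 kernels can be evaluated on the same pixel neighborhood and combined before the result is written back, which is the fetch-sharing an FPGA datapath (or a loop-merged CPU routine) exploits:

```python
import numpy as np

# Horizontal and vertical Sobel kernels; GY is just GX transposed.
GX = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]])
GY = GX.T

def sobel_magnitude(img):
    """Gradient magnitude: both convolutions share one neighborhood
    fetch per pixel and are combined before the result is stored."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for y in range(h - 2):
        for x in range(w - 2):
            window = img[y:y + 3, x:x + 3]   # fetched once...
            gx = np.sum(window * GX)         # ...used by both operators
            gy = np.sum(window * GY)
            out[y, x] = np.hypot(gx, gy)
    return out
```

On a flat image the output is all zeros (each kernel's weights sum to zero); a step edge produces a nonzero response.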
Please feel free to circulate the following position announcements: Position: President Location: San Francisco Bay Area Get in on the ground level: start-up in the area of intellectual property reuse (design reuse) for system-on-a-chip. The system-on-a- chip marketplace is expected to grow from $5.9 billion in 1999 to $15.7 billion in 2003. Seeking President to work with CEO. Responsibilities: - Drive/implement business strategy - Help raise initial and subsequent venture capital rounds - Identify and create industry alliances/partnerships - Help assemble core team Requirements: - Background in semiconductor industry; understanding of intellectual property reuse desired. - Ability to work with business and technical personnel Please email your resume to ipreuse@my-deja.com Position: Chief Operating Officer Location: San Francisco Bay Area Get in on the ground level: Ground level start-up in the area of intellectual property reuse (design reuse) for system-on-a-chip. The system-on-a-chip marketplace is expected to grow from $5.9 billion in 1999 to $15.7 billion in 2003. Responsibilities: - Implement business strategy - Run day-to-day operations of start-up company - Identify and create industry alliances/partnerships Requirements: - Background in semiconductor industry; understanding of intellectual property reuse desired. - Ability to work with business and technical personnel Please email your resume to ipreuse@my-deja.com Position: Consulting Design Engineers Location: San Francisco Bay Area Get in on the ground level: Ground level start-up in the area of intellectual property reuse (design reuse) for system-on-a-chip. The system-on-a-chip marketplace is expected to grow from $5.9 billion in 1999 to $15.7 billion in 2003. Responsibilities: - Provide technical and process consulting in the area of design reuse Requirements: - EE degree; Background in semiconductor industry; technical understanding of intellectual property reuse. 
Please email your resume to ipreuse@my-deja.com Sent via Deja.com http://www.deja.com/ Before you buy.
Article: 19122
Steven Derrien <sderrien@irisa.fr> writes: > Ray Andraka wrote: > > > In the case of a 3x3 convolution, the FPGA can still significantly outperform the > > Pentium, and it uses a lower clock frequency and much less power. The secret here is > > that the FPGA can perform all the multiplies in parallel, while the pentium does at best > > one multiply per clock. > > MMX can perform 8 multiplications per clock cycle on 8-bit words (those used for image > processing), and moreover recent PIII clock speeds are approx 4 times those of FPGAs. > > Anyhow, in my previous post I was considering multipliers working in parallel. > > > Use of on chip memory or a dedicated external memory buffer > > > allows the FPGA to process each pixel as it arrives without having to repetitively fetch > > the surrounding pixels. > > Right; but total execution time will always be bounded by the time it takes to transfer the > original image into the FPGA plus the time to transfer the resulting image out of the FPGA. > Not necessarily: the image stream can "flow through" the FPGA and be processed on the fly. Delay lines can be implemented inside the FPGA to build pixel neighborhoods. The delay between the output and the input images is given by the latency of the arithmetic operators (e.g. multipliers) and the need to fill the delay lines to build the neighborhoods. Some ideas relevant to this problem are presented in these papers:

@InProceedings{cvpr98,
  author    = {A. Benedetti and P. Perona},
  title     = "{Real-time 2-D Feature Detection on a Reconfigurable Computer}",
  booktitle = {Proceedings of the 1998 IEEE Conference on Computer Vision and Pattern Recognition (CVPR'98)},
  year      = {1998},
  month     = {Jun.},
  address   = {Santa Barbara (CA)},
  pages     = {586--593}
}

@InProceedings{iscas99,
  author    = {A. Benedetti and P. Perona},
  title     = "{A Novel System Architecture for Real-Time Low-Level Vision}",
  booktitle = {Proceedings of the 1999 IEEE International Symposium on Circuits and Systems (ISCAS'99)},
  year      = {1999},
  month     = {Jun.},
  address   = {Orlando (FL)}
}

Best, -Arrigo -- Dr. Arrigo Benedetti e-mail: arrigo@vision.caltech.edu Caltech, MS 136-93 phone: (626) 395-3695 Pasadena, CA 91125 fax: (626) 795-8649
Article: 19123
Costech has a free classifieds page on their site, no gimmicks, absolutely free. If there is anything you wish to sell, please feel free to post it there. You may include a picture linked to your URL. http://www.costech.com. Thanks
Article: 19124
Basically what I was trying to say. Given enough local memory and the bandwidth (pins) to access it, an FPGA can do the processing at the video frame rate, as demonstrated in numerous systems, including the one described in my 1996 paper "A Dynamic Hardware Video Processing Platform", which did image recognition processing at the frame rate using a tiled array of 4 chips similar to the Atmel AT6005s. Arrigo Benedetti wrote: > Steven Derrien <sderrien@irisa.fr> writes: > > > Not necessarily: the image stream can "flow through" the FPGA and be processed > on the fly. Delay lines can be implemented inside the FPGA to build pixel > neighborhoods. The delay between the output and the input images is given by > the latency of the arithmetic operators (e.g. multipliers) and the need to fill > the delay lines to build the neighborhoods. > > Some ideas relevant to this problem are presented in these papers: > -- -Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email randraka@ids.net http://users.ids.net/~randraka
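A software model of the delay-line scheme Arrigo and Ray describe (an illustrative sketch, not the RTL from either paper): two row-length line buffers delay the raster-scan stream, so each incoming pixel completes one column of a 3x3 neighborhood and a valid window emerges every cycle once the buffers are full:

```python
from collections import deque

def stream_3x3(pixels, width):
    """Yield ((row, col), window) for every valid 3x3 window in a
    raster-scan pixel stream, using two row-length delay lines the
    way a flow-through FPGA design would. A window is a tuple of
    three columns, each column = (row y-2, row y-1, row y) pixels."""
    buf1 = deque([0] * width)   # delay line holding row y-1
    buf2 = deque([0] * width)   # delay line holding row y-2
    cols = deque(maxlen=3)      # three most recent columns of the band
    for i, p in enumerate(pixels):
        y, x = divmod(i, width)
        above1 = buf1.popleft(); buf1.append(p)       # pixel at (y-1, x)
        above2 = buf2.popleft(); buf2.append(above1)  # pixel at (y-2, x)
        cols.append((above2, above1, p))
        if y >= 2 and x >= 2:   # the first two rows/columns only prime
            yield (y - 1, x - 1), tuple(cols)  # window centred one row,
                                               # one pixel behind the input
```

The output lags the input by roughly one image row plus one pixel, which matches the latency argument in the posts: the delay is the fill time of the line buffers, not a full frame transfer.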