Site Home Archive Home FAQ Home How to search the Archive How to Navigate the Archive
Compare FPGA features and resources
Threads starting:
Authors:A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
In article <H+lBzDApkc1zEwed@devantech.demon.co.uk>, Gerald Coe <devantech@devantech.demon.co.uk> wrote: >In article <5qqf36$r1r@thorgal.et.tudelft.nl>, "L. Kumpa" ><LKumpa@nonet.net> writes >>The "Try Actel Software / Free Software!" link is STALE... (19/7/97) >> >No it isn't. I downloaded the package on Sunday 20/7/97, no problems. >> I managed to download OK the other day at about 9pm local time. 9am the next morning I had the local Actel rep on the telephone asking me all sorts of questions about my current development system etc. Talk about efficient! Maybe it pays to accidentally get your telephone number wrong when filling out the mandatory 'guest book' entry. NicArticle: 7026
Peter Alfke <peter@xilinx.com> wrote in article <33D6A254.1C07@xilinx.com>... > There is something strange going on here. > I see only five postings in the whole week between July 9 and July 16. > Where is the killer ? I think it is called 'vacation' ;-) AustinArticle: 7027
Hi, I just read some performance claims of TI and AD about the speed for FFTs. TI's C60 gives these numbers: Complex Radix 4 FFT (1024 Points): 0.066ms Complex Radix 2 FFT (1024 Points): 0.104ms ADSP 21061: FFT 1K complex: 0.46ms I'm wondering how such a speed can be reached with a 50 MHz FPGA (which is a very hard design). I think most FPGAs are running at lower clock rates. Everybody says that FPGAs can outperform DSPs in these applications. Either the FPGA guys use some cool algorithms (which ones do they use at all?) or the trick must be somewhere else. Any hint is welcome. Robert M. Muench SCRAP EDV-Anlagen GmbH, Karlsruhe, Germany ==> Private mail : r.m.muench@ieee.org <== ==> ask for PGP public-key <==Article: 7028
I'm evaluating the Aptix MP3 for use on a rapid prototyping job and am looking for opinions on its usefulness from users of it. thanks Jeff W.Article: 7029
I am happy to report that this embarrassing problem has come to a happy ending. When you receive the M1 shipment (M1.3) "any day soon now", you can download from the net a small collection of patches, and one of them will give M1 the ability to create a concatenated bitstream for any mixture of older and newer parts. This problem was not really as complex as it was made out to be. I am glad it is solved. Peter Alfke, Xilinx ApplicationsArticle: 7030
Try a web site called Deja News. Not sure of the syntax, but it's close. Supposed to have old posts listed there. Not to be critical, but how come you use a date format that cannot be recognized intuitively? Isn't it easy to confuse it with the other non-intuitive method? You know, 3/1/97 is March first to some, and January third to others. But, the 3 Jan 1997 format can't be confused as being March 1 1997 or anything other than what it is, right? Maybe it's good for non-alpha applications, but it is often interpreted backwards (according to your frame of reference). Anyway, I hope I helped with finding old posts. I too am left wondering what the logic is behind many of the rules and programming practices used on the net! Regards, Tom On 23 Jul 1997 20:24:53 GMT, "Peter Welten" <welten@miles.nl> wrote: >Hello fellow newsgroup users, > >On 16-7-97 I posted a question in this newsgroup about selection criteria >for FPGA's and CPLD's. > >Some of you kindly responded on 16 and 17-7-97 (Philip Freidin, Brian >Dipert (twice), L. Kumpa). > >When I came to see if any answers were there (23/7/97, I'm a busy guy), all >these messages were apparently deleted from the server, that is, I could >still see what messages had been posted, but the messages themselves were >gone. > >Why? > >Peter WeltenArticle: 7031
"Robert M. Münch" <Robert.M.Muench@SCRAP.de> wrote: >I just read some performance claims of TI and AD about the speed for >FFTs. > >TI's C60 gives these numbers: >Complex Radix 4 FFT (1024 Points): 0.066ms >Complex Radix 2 FFT (1024 Points): 0.104ms > >ADSP 21061: >FFT 1K complex: 0.46ms > >I'm wondering how such a speed can be reached with a 50 MHz FPGA (which >is a very hard design). I think most FPGAs are running at lower clock >rates. > >Everybody says that FPGAs can outperform DSPs in these applications. >Either the FPGA guys use some cool algorithms (which ones do they use at >all?) or the trick must be somewhere else. > >Any hint is welcome. > >Robert M. Muench A couple of points... 1) FPGA's have the most advantage over DSP's for fixed point applications, because the FPGA can implement exactly the right word width while the DSP has to use 16 or 32 bits. 2) FPGA's gain over DSP's because they have no overhead for shift, mask and other bit twiddling operations. 3) FPGA's have much more fine grain parallelism, without the need for looping and branch instructions. So in summary, FPGA's exactly implement algorithms without any wasted hardware and very little overhead. Andre DeHon (http://www.ai.mit.edu/people/andre/phd.html) has a paper claiming FPGA's have a 10-30 to 1 advantage over general purpose processors and, interestingly, groups DSP's in with GP's. So, I don't know if an FPGA can beat a floating point DSP on FFT algorithms...probably not unless you include glue logic and I/O in the comparison. But compare FPGA's and DSP's on a fixed point image processing or other similar algorithm and the difference becomes profound. jeffArticle: 7032
Robert M. Münch wrote: > > Hi, > > I just read some performance claims of TI and AD about the speed for > FFTs. > > TI's C60 gives these numbers: > Complex Radix 4 FFT (1024 Points): 0.066ms > Complex Radix 2 FFT (1024 Points): 0.104ms > > ADSP 21061: > FFT 1K complex: 0.46ms > > I'm wondering how such a speed can be reached with a 50 MHz FPGA (which > is a very hard design). I think most FPGAs are running at lower clock > rates. Robert, The clock speed alone does not determine the overall performance, for two basic reasons: 1. An FPGA may have several multiplier/adder/subtractor units (used for FFT) that can run in parallel, albeit at a lower speed than a DSP. 2. FPGAs don't have to go through the fetch-decode-execute cycles of a DSP. Regards, Kayvon Irani Los AngelesArticle: 7033
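Kayvon's two points can be put into rough numbers. The sketch below is a back-of-envelope model only: the unit counts, clock rates, and cycles-per-butterfly figures are illustrative assumptions, not vendor data.

```python
import math

def fft_time_us(n_points, clock_hz, parallel_units=1, cycles_per_butterfly=1):
    """Idealized radix-2 FFT time in microseconds.

    Assumes (N/2)*log2(N) butterflies and ignores memory stalls --
    a deliberate simplification, for comparison only.
    """
    butterflies = (n_points // 2) * int(math.log2(n_points))
    cycles = butterflies * cycles_per_butterfly / parallel_units
    return cycles / clock_hz * 1e6

# Hypothetical 50 MHz FPGA with 8 butterfly units running in parallel:
print(fft_time_us(1024, 50e6, parallel_units=8))         # ~12.8 us

# Hypothetical single-unit 200 MHz processor, ~4 cycles per butterfly:
print(fft_time_us(1024, 200e6, cycles_per_butterfly=4))  # ~102 us
```

Under these assumptions the single-unit processor lands in the same ballpark as TI's quoted 0.104 ms radix-2 figure, while the parallel FPGA comes out ahead despite the 4x slower clock, which is the essence of point 1 above.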
I claim the following response is on topic for this news group! How often does this happen :-) If the algorithm depends on floating point and random addressing of memory (like in an FFT), then FPGAs may not have an advantage. For fixed point applications, the advantages come from several places, and together more than make up for a difference in clock speed. 1) No instruction fetching, decoding, or bus usage, (or silicon usage for on chip cache), address generators, branch and looping logic etc. 2) No pipeline stalls for branches, or looping, because the whole algorithm can be executed in parallel. 3) No thrashing in a register file saving and retrieving intermediate results, because the datapath pipeline can be laid out to match the algorithm. 4) Data path can be designed with just the hardware to match the algorithm, including width, bit ops, carry, saturation, clipping, store/fetch, constrained algorithm to match FPGA strengths. 5) Flow thru (data-flow really, without the tokens) data path may allow a new calculation to start before the current one has finished. Typically, I find that the place for optimization of DSP algorithms in FPGAs is in how the multiplies are done. Often (say in an FIR filter) the multiply is by a constant. So the first optimization is to throw away the slices of the multiply where the constant bit is a zero. What is left is the '1' bits. So an optimization may be to adjust the FIR design so that the constants have a minimal set of '1' bits. For example you could support a dynamic range of constants of 8 bits, giving 0 thru 255, but set a rule that no constant can have more than 3 bits set. This would allow constants like 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, ... up to 224. While all possible values are not available, you get a good range down near zero, and a reasonable selection all the way up to 224. Why bother? Well you can implement this constrained multiply with just one or two adders. 
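The constrained-constant trick above is easy to check in software. The sketch below is my own illustration, not Philip's code: it enumerates the 8-bit constants with at most 3 one-bits, and multiplies by shift-and-add, one adder per one-bit beyond the first, which is the structure the FPGA would actually implement.

```python
def allowed_constants(width=8, max_ones=3):
    """All constants whose binary form has at most max_ones set bits."""
    return [c for c in range(1 << width) if bin(c).count("1") <= max_ones]

def mul_by_constant(x, c):
    """Multiply x by constant c using only shifts and adds.

    Each set bit of c contributes one shifted copy of x, so a constant
    with at most 3 set bits costs at most two adders in hardware.
    """
    total = 0
    for bit in range(c.bit_length()):
        if (c >> bit) & 1:
            total += x << bit
    return total

consts = allowed_constants()
print(len(consts))                           # 93 of the 256 bit patterns qualify
print(mul_by_constant(57, 224) == 57 * 224)  # True
```

Note the enumeration skips 15 (binary 1111, four set bits) but keeps 16, matching the sequence "..., 13, 14, 16, 17, ..." in the post.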
Here is another example of a constraint: the constant can have at most 2 bits that are non-zero, and the non-zero bits can have either positive or negative weighting. So you can do things like 7 = 8-1, 9 = 8+1, 24 = 16+8 and 56 = 64-8, and lots more. This type of multiplier can be implemented in just one adder/subtractor! You never think of these types of 'optimizations' when working with a DSP, because you have a multiplier. But typically only 1. In an FPGA, rather than build 1 general purpose 8 x 8 multiplier, I can build 8 of my constrained-range constant multipliers, and they run 8 times faster than the general purpose multiplier. This gives a potential speed up of 64, and can more than make up for the difference in clock rates between a DSP and an FPGA. Here is a real example to think about. I have a video data stream delivering pixels to me in scan line order at 25 MHz, and I want to do a 7 by 7 filter on it. In DSP land, I will have to store the data in memory, and after the 7th line starts arriving I can start processing. I need 49 multiplies and 48 adds. If the data is 8 bits wide, and the constants are also 8 bits, my adds will result in a total that is 22 bits wide. To keep up with the data stream, I will need to do 49 fetches, and 49 MAC cycles (folding the adds into the multiplies, which most DSPs can do). I will need to do this all in 40nS, because that's when the next pixel arrives. So as a rough guess the required performance is 1200 MIPS (just for the MACs), and quite a serious problem with the data fetching. A BIG register file might help, but long scan lines will still cause problems. Some hardware assist such as 6 line buffers operated as a rolling buffer might just save the day on the data fetch side. Now all you need is 1200+ MIPS, and a 7 stage rolling pipeline inside the DSP (7 of them) (implies a register file of at least 49 registers), and the ability to load 7 new data values each 40nS (i.e. 175 MB/sec of data fetch bandwidth). 
(We'll assume that the external logic running the rolling buffers deals with the addressing, as the DSP certainly can't). In FPGA land, I've done this design for one of my clients, in a 4010E, with room left over, and it ran at 33MHz. It did require the 6 line buffers, and some playing around with the constants. I have over 100 other examples of DSP functions that I have done in FPGAs for my clients, and all of them had no chance in typical DSPs. Philip Freidin fliptron@netcom.comArticle: 7034
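The arithmetic in the 7 by 7 filter example above checks out; here it is as a back-of-envelope in code (a check of the numbers in the post, not synthesis code):

```python
taps = 7 * 7                     # 49 multiplies per pixel
pixel_rate_hz = 25e6             # one pixel every 40 ns
macs_per_second = taps * pixel_rate_hz
print(macs_per_second / 1e6)     # 1225 MMAC/s -- the "1200+ MIPS" in the post

# Bit growth: 8-bit unsigned data times 8-bit constants, 49 products summed.
max_sum = taps * 255 * 255
print(max_sum.bit_length())      # 22-bit accumulator, as stated
```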
Robert M. Münch wrote: > > Hi, > > I just read some performance claims of TI and AD about the speed for > FFTs. > > TI's C60 gives these numbers: > Complex Radix 4 FFT (1024 Points): 0.066ms > Complex Radix 2 FFT (1024 Points): 0.104ms > > ADSP 21061: > FFT 1K complex: 0.46ms > > I'm wondering how such a speed can be reached with a 50 MHz FPGA (which > is a very hard design). I think most FPGAs are running at lower clock > rates. > > Everybody says that FPGAs can outperform DSPs in these applications. > Either the FPGA guys use some cool algorithms (which ones do they use at > all?) or the trick must be somewhere else. > > Any hint is welcome. Hi Robert, 'Parallelism' is the magic word. I haven't done an FFT yet in an FPGA, but last year I designed a neural net, and its operations are similar to an FFT. A design on an FPGA can have multiple execution units for the different computing phases and can work in pipeline. So with a fast FPGA the narrowest bottleneck is the speed at which data is fetched from external memory. If it is for example 10 MWords/s the pipeline would do 10000 mul+add operations in a usecond. I always used Altera FPGAs, which aren't speed demons, but 10 MWords/s ain't no big deal. Cheers, Botond -- Kardos, Botond - at Innomed Medical Ltd. in Hungary eMail: kardos@mail.matav.hu phone/fax: (0036 1) 351-2934 fax: (0036 1) 321-1075Article: 7035
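Botond's 10000-ops-per-microsecond figure makes sense if each fetched word fans out to many parallel units, as in a neural-net layer where one input feeds every neuron. A quick sketch with illustrative numbers (the 1000-unit fanout is my assumption, not stated in the post):

```python
def pipeline_ops_per_us(fetch_rate_words_per_s, parallel_units):
    """Sustained mul+add rate when every fetched word feeds all units."""
    return fetch_rate_words_per_s * parallel_units / 1e6

# 10 MWords/s from external memory, fanned out to 1000 parallel
# mul+add units inside the FPGA:
print(pipeline_ops_per_us(10e6, 1000))  # 10000.0 mul+add per microsecond
```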
Reetinder P. S. Sidhu wrote: > > Hi > > Our research group is interested in > > 1. Designing using HDL/schematic entry tools, > 2. Simulating the placed and routed design on various vendors' FPGA > architectures. > > We understand that for step 1 architecture independent tools > from Synopsys/Viewlogic can be used. This is close, but not accurate. I can't speak for schematic entry, but I know that vi is good for HDL design. In your assumptions you skip some steps. HDL can be designed using a simple editor. It should then be simulated. You can get HDL simulators from Synopsys or Cadence, to name just two. Following successful simulation (that is, testing that the HDL functions as you designed it) the next step is to synthesize it for the target architecture. Synopsys can perform the synthesis, but you must have the libraries from the FPGA vendor to synthesize FROM HDL TO the target architecture. (Libraries usually include crude timing models so timing simulation can be conducted at this point). Out of the synthesis tool you have (essentially) a sea of gates which must be made to fit into the coarse grain architecture of an FPGA. The FPGA vendor-specific place and route tools are used to perform this. These tools have (more) accurate timing models, so you should generate a timing annotated HDL model of the final design to confirm the design still functions correctly within the timing limits of the device. Hope that answers the question you asked. Tim Warland ASIC Engineer -- You better be doing something so that in the future you can look back on "the good old days" My opinions != Nortel's opinion;Article: 7036
Virtual Computer Corp. announces European Training Schedule for H.O.T. Works Hardware/Software Co-Design Development System Take the Course and Walk Away with the Knowledge and H.O.T.Works A Two Day Course on Programming the New Xilinx XC6200 Series RPU and using the H.O.T. Works Hardware/Software Co-Design System _______________________________________________________________ Sept. 4-5 in London, UK at Imperial College, at the site of the 8th International Workshop Field Programmable Logic and Applications. Sept. 11-12 in Brest, FRANCE at the UBO Département Informatique. ________________________________________________________________ Seats are Limited -- Register Now!! Price: $2295.00 (Special University Price $1990) includes two-day course, lunch and the H.O.T. Works Development System For more info on H.O.T Works Training Class & Registration: http://www.vcc.com/vcct1.html For more info on H.O.T Works Development System: http://www.vcc.com/hotann.html ------------------------------------------------------------------------ John Schewel Virtual Computer Corp. 6925 Canby Ave Suite 103 Reseda CA 91335 USA tel: 818-342-8294 fax: 818-342-0240 email jas@vcc.com http://www.vcc.comArticle: 7037
Peter wrote: > > Hello, > > I have just read in the Xilinx newsletter that one of the problems > with *large* FPGAs is the dynamic dissipation. This is fairly obvious > I suppose. The move to 3.3V is offered as the cure, for the time > being, but this gives only some 50% drop. > > I also know, having done a few FPGA -> ASIC conversions where the > target had to be low-power, that the bulk of an ASIC's *dynamic* Icc > can be avoided by clock gating (i.e. local clocks, rather than a > global clock going all over the place, and using clock enables). I saw > drops of up to 5x by using heavy clock gating in the FPGA version, and > much more (10x-100x easily) in an ASIC. > > Should Xilinx therefore not provide some local clocking? > The "Global Early Buffers" on the 4000X series might be of some use, since each drives only one quadrant of the array. I doubt that they would help much with the real problem with clock gating and multiple clocks since there are no relative skew guarantees between different clocks as far as I know (except for a few special cases like the fast I/O clocks). What would make me really happy (for multirate DSP) is some way of synchronously distributing derived clocks, each synchronously gateable. For example, from a 1X 100MHz fast clock input, distributing 100, 50 and 25 Mhz clocks on chip, with all clocks having guaranteed relative skew so that data could be transferred between any combination of flip-flops with no timing hassles. Easy to say, but hard to do I'm sure. Right now, the provision of several flavors of corner buffers helps a bit, but dividing or gating clocks on-chip is still really yucky from a timing perspective if you need to transfer data between clock domains. regards, tom (tburgess@drao.nrc.ca)Article: 7038
Qualis Design Corporation has released the Fall schedule for our many hands-on, application-focused courses in VHDL- and Verilog-based design. Our courses are like no other -- just take a look at our lineup: VHDL System Design ------------------ Introductory: High Level Design Using VHDL (5 days) System Verification Using VHDL (5 days) VHDL for Board-Level Design (5 days) Elite: ASIC Synthesis and Verification Strategies Using VHDL (5 days) Advanced Techniques Using VHDL (3 days) VHDL Synthesis -------------- Introductory: VHDL for Synthesis: A Solid Foundation (5 days) Elite: ASIC Synthesis Strategies Using VHDL (3 days) Behavioral Synthesis Strategies Using VHDL (3 days) For more info on our suite of HDL classes, to review our Fall schedule, or if you're interested in an on-site class, check out our web site at http://www.qualis.com or call Michael Horne on our hotline at 888.644.9700. Qualis Design Corporation 8705 SW Nimbus Suite 118 Beaverton OR 97008 USA Ph: +1.503.644.9700 Fax: +1.503.643.1583 http://www.qualis.com Copyright (c) 1997 Qualis Design CorporationArticle: 7039
Qualis Design Corporation has released the Fall schedule for our many hands-on, application-focused courses in Verilog- and VHDL-based design. Our courses are like no other -- just take a look at our lineup: Verilog System Design --------------------- Introductory: High Level Design Using Verilog (5 days) System Verification Using Verilog (5 days) Verilog for Board-Level Design (5 days) Elite: ASIC Synthesis and Verification Strategies Using Verilog (5 days) Advanced Techniques Using Verilog (3 days) Verilog Synthesis ----------------- Introductory: Verilog for Synthesis: A Solid Foundation (5 days) Elite: ASIC Synthesis Strategies Using Verilog (3 days) Behavioral Synthesis Strategies Using Verilog (3 days) For more info on our suite of HDL classes, to review our Fall schedule, or if you're interested in an on-site class, check out our web site at http://www.qualis.com or call Michael Horne on our hotline at 888.644.9700. Qualis Design Corporation 8705 SW Nimbus Suite 118 Beaverton OR 97008 USA Ph: +1.503.644.9700 Fax: +1.503.643.1583 http://www.qualis.com Copyright (c) 1997 Qualis Design CorporationArticle: 7040
On Wed, 23 Jul 1997 18:12:08 GMT, z80@dserve.com (Peter) wrote: >Should Xilinx therefore not provide some local clocking? Probably. >Presumably this is because the speed improvement has improved the >clock-Q timing by a larger factor than it has improved the worst-case >interconnect delays. This would break any shift-reg sort of circuit. >So we are now right back to routing ACLK/GCLK to *every* D-type, and >using clock enables, together with what is sometimes silly extra logic >to generate the clock-enable at the right time. Lots of milliwatts! So try a 3.3V ORCA FPGA, and put clocks on globals, longs, or even half length lines. The I/O's can direct drive all the routing (xL, xH, x4, and x1), so clock pad to PFU (CLB) clock input can be fast too. StuartArticle: 7041
hi, first for an airborne system it is extremely doubtful that latchup would be a concern (not lockup). latchup can occur when a heavy ion like a cosmic ray hits the part and turns on a parasitic scr; this conducts currents ranging from say 10's of mA to *VERY* large if it goes into thermal runaway. also, it is possible for a proton to cause a latchup but this is very, very rare. now, i assume you mean latchup since they want you to use an 'epi' layer. next, an epi layer is no guarantee of avoiding latchup. there are plenty of examples of this. in the fpga world parts with epi-layers that latch are the A1020B and the A32200DX. then again, most fpga's *don't* have epi layers and latch much, much easier. to avoid latchup the epi layer must be the right thickness (say <= 10 um) and certain design rules should be followed (like guard rings). it's a combination of factors, including the size of the ic circuit structures that matter: like a1020 (2 um) and a1020a (1.2 um) don't latch; the a1020b (1.0 um) does - and all have the same epi layer. so, for *assurance* that a part doesn't latch, it's recommended that you perform radiation testing on it with a particle accelerator (not too expensive but not cheap either - say ~ $600/hour). you are also correct, most fpga's don't have epi layers. all actel devices, for example, do, and most don't have any latchup problem at all. another one that doesn't is national's clay-31, which is sram-based. then again, not having an epi layer is not a guarantee of disaster (although it frequently is). for example, the latchup threshold for the chip express qyh500 series (0.8 um) is very, very high when running at 5.0 volts - at 3.6 volts it hasn't been seen to latch at even higher energies. for pal's, a good solid choice is utmc's UT22VP10, which is an amorphous silicon antifuse based PAL; other pals can be used but most of them have flip-flops that are rather easily upset by radiation. 
another approach is to have a latchup detection/protection circuit. by appropriately sensing the increase of current for a period of time, one can *declare* latchup and remove power quickly, hopefully before anything in the device has busted or overstressed (say from electro-migration) which can shorten the device's life time. this approach is being implemented by space electronics, inc. for the gatefield fpgas - they naturally are relatively easy to latch. of course, you should make sure that when you remove and restore power to the part you don't have other problems like driving into unpowered inputs, etc., etc. don't know exactly how sei handles this (they wouldn't say) but they do package the gatefield chip in a package with their own custom chip for protection. anybody else know of other fpga's with epi-layers? a thin epi layer can help with single event upsets (SEUs) but not enough to get excited about - it turns out that internal circuit design is much more of an important factor here. also, for sram-based fpga's, the upset rate will be much higher than an equivalent antifuse based device (e.g., quicklogic, actel) since the number of configuration memory elements is much higher (~2 orders of magnitude or so) than the user memory. if they want to prevent lockup, then you need to have robust state machines with no lockup states or use things like triple modular redundancy, parity, pairs of flip-flops with compare logic, configuration validation logic, periodic resets, etc.; this will depend on what error rate is acceptable and how much protection your system needs (like can you break something). hope this helps, rk p.s. summary paper coming out on this in Dec. 1997 IEEE Transaction on Nuclear Science. ______________________________________________ Mike Kelly <cogent@cogcomp.com> wrote in article <33D60000.3F82@cogcomp.com>... > Hi, > > We are doing a project with NASA for an airborne measuring system. We > will be providing the system cpu board. 
The engineers at NASA would > like us to use programmable logic (I prefer EPLDs) that have an > epitaxial layer. They say this provides better resistance to radiation > induced lockup. What I want to know is what chips out there have this > epitaxial layer. I gather it is not common. Any help is most > appreciated. > > Please email me your ideas as well as posting them. I don't check this > group as often as I should. :) > > > Michael J. Kelly > tel: (508) 278-9400 > fax: (508) 278-9500 > web: http://www.cogcomp.com >Article: 7042
>So try a 3.3V ORCA FPGA, and put clocks on globals, longs, or even >half length lines. The I/O's can direct drive all the routing (xL, xH, >x4, and x1, so clock pad to PFU (CLB) clock input can be fast too. If this works in Orca devices, why not in Xilinx devices? Peter. Return address is invalid to help stop junk mail. E-mail replies to z80@digiserve.com.Article: 7043
Would anyone know where I could get the photo of an FPGA silicon chip? I need it for a presentation. Thanks in advance. Reetinder SidhuArticle: 7044
Wen-King Su (wen-king@myri.com) wrote: : What I would really like to see happen is a union of antifuse and SRAM : based FPGA technology. One of the things that will make the PCI equipped : ORCA expensive is the small size of its market. Instead, if we can add : superfast antifuse connected links to a conventional SRAM based FPGA, we : can implement the same speed sensitive logic core with antifuse, and leave : the rest of the chip reconfigurable. No need to fragment the market by : incorporating different custom ASIC cores into the same family of products. Are you sure you want to use "superfast" and "antifuse" in the same sentence? I have never found both in the same chip. John EatonArticle: 7045
Have a look at Lucent, Altera, Actel, QuickLogic, Xilinx, Atmel, ... web pages, and in particular, look at the product announcement/press release sections. Another possibility would be to call the companies, and ask for the last 2 or 3 quarterly reports. These often have pictures of smiling 'customers' holding up someone else's board design, and praising the company for making their life so wonderful. Sometimes (if you are lucky) you will also get a sultry picture (with carefully shaded background pastel colours) of the FPGA, without all its packaging on. Of course, another option would be to just call the companies and ask for someone in product marketing, and ask them for whatever you need. Philip Freidin. In article <5rclds$8a4$1@halcyon.usc.edu> sidhu@halcyon.usc.edu (Reetinder P. S. Sidhu) writes: >Would anyone know where I could get the photo of an FPGA silicon chip? >I need it for a presentation. Thanks in advance. > > Reetinder SidhuArticle: 7046
In a previous article johne@vcd.hp.com (John Eaton) writes: : Wen-King Su (wen-king@myri.com) wrote: : : What I would really like to see happen is a union of antifuse and SRAM : : based FPGA technology. One of the things that will make the PCI equipped : : ORCA expensive is the small size of its market. Instead, if we can add : : superfast antifuse connected links to a conventional SRAM based FPGA, we : : can implement the same speed sensitive logic core with antifuse, and leave : : the rest of the chip reconfigurable. No need to fragment the market by : : incorporating different custom ASIC cores into the same family of products. : Are you sure you want to use "superfast" and "antifuse" in the same sentence? : I have never found both in the same chip. The overall speed of a typical design may not be too much faster with antifuse, but that is because antifuse FPGAs have smaller logic cells and require more levels to implement a given logic function. Maximum routing delay is only about 1ns, and is faster than anything an SRAM-based FPGA can offer. The key to a fast antifuse FPGA design is in the fanout control.Article: 7047
> -----Original Message----- > From: Kayvon Irani [SMTP:kirani@cinenet.net] > Posted At: Friday, July 25, 1997 6:25 AM > Posted To: fpga > Conversation: How do FPGAs outperform DSP at FFT? > Subject: Re: How do FPGAs outperform DSP at FFT? > > > 1. An FPGA may have several multiplier/adder/subtractor > units (used for FFT) > that can run in parallel, albeit at a lower speed than a DSP. > 2. FPGAs don't have to go through the fetch-decode-execute cycles > of a DSP. [Robert M. Münch] Ok, that's clear to me. But I think the DSPs have a main advantage in that they have a very good cache system. This enables the DSPs to read the data, twiddles etc. very fast without the need to access an external memory. I don't think an FPGA can store all the needed data for a 1024-point FFT on chip. So the memory problem still exists: it doesn't help if I can do a lot of parallel operations if I can't get the data in and out fast enough. Robert M. Muench SCRAP EDV-Anlagen GmbH, Karlsruhe, Germany ==> Private mail : r.m.muench@ieee.org <== ==> ask for PGP public-key <==Article: 7048
> -----Original Message----- > From: Kardos, Botond [SMTP:kardos@mail.matav.hu] > Posted At: Friday, July 25, 1997 2:58 PM > Posted To: fpga > Conversation: How do FPGAs outperform DSP at FFT? > Subject: Re: How do FPGAs outperform DSP at FFT? > > 'Parallelism' is the magic word. [Robert M. Münch] Please see my other posting. > A design on an FPGA can have multiple execution units for the > different computing phases and can work in pipeline. So with a fast > FPGA > the narrowest bandwidth is the speed data is fetched from an external > memory at. If it is for example 10 MWords/s the pipeline would do > 10000 > mul+add operations in a usecond. [Robert M. Münch] Ok as long as you can calculate in a stream. But the FFT has to fetch some data on the way and exchange the results. You could build an add-tree with 1024 mul+add operations and make it ten levels deep. But I think this won't fit in any FPGA yet. Robert M. Muench SCRAP EDV-Anlagen GmbH, Karlsruhe, Germany ==> Private mail : r.m.muench@ieee.org <== ==> ask for PGP public-key <== >Article: 7049
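The 1024-wide add-tree Robert worries about is not the usual FPGA structure. A streaming radix-2 pipeline FFT needs only log2(N) butterfly stages and accepts one sample per clock, so both the resource count and the transform time scale much more gently. A rough model (my estimate, not a verified design; latency and buffer memory are ignored):

```python
import math

def pipeline_fft_estimate(n, clock_hz):
    """Stage count and steady-state transform time for a streaming
    radix-2 pipeline FFT: one butterfly (and one complex multiplier)
    per stage, one sample accepted per clock."""
    stages = int(math.log2(n))
    time_us = n / clock_hz * 1e6
    return stages, time_us

stages, t = pipeline_fft_estimate(1024, 50e6)
print(stages)  # 10 stages, so ~10 multipliers, not 1024
print(t)       # ~20.5 us per 1024-point transform at 50 MHz
```

Even at 50 MHz this lands well under the ADSP 21061's quoted 0.46 ms, though beating the C60's radix-4 figure would still take more parallelism per stage.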
John Eaton wrote: > > Wen-King Su (wen-king@myri.com) wrote: > > : What I would really like to see happen is a union of antifuse and SRAM > : based FPGA technology. One of the things that will make the PCI equipped > : ORCA expensive is the small size of its market. Instead, if we can add > : superfast antifuse connected links to a conventional SRAM based FPGA, we > : can implement the same speed sensitive logic core with antifuse, and leave > : the rest of the chip reconfigurable. No need to fragment the market by > : incorporating different custom ASIC cores into the same family of products. > > Are you sure you want to use "superfast" and "antifuse" in the same sentence? > I have never found both in the same chip. > > John Eaton A blown antifuse has a resistance in the tens of ohms. A transmission gate or pass transistor controlled by SRAM probably has a connection resistance of thousands of ohms. In my experience the fastest FPGAs are the antifuse based Quicklogic parts, and Actel's latest generation aren't too bad either. I haven't seen a RAM based FPGA with either the speed or the level of routing predictability and uniformity that I needed. Either way, all FPGAs suck tremendously in performance and per unit cost compared to gate array or standard cell ASIC implementation. I redid an 8000 gate FPGA built in a 0.6 um process with a 10,000 gate gate array (I added a few extra functions) in a nearly obsolete two metal 1.0 um process. The FPGA only ran at about 2/3 of the maximum target clock frequency, the gate array twice as fast as needed. The gate array was about 25% the cost (for a relatively low volume video capture and processing VME card) of the FPGA, and the gate array NRE plus my time was paid for after selling less than 100 units. (This didn't even include the cost of maintaining the FPGA fuse files and programming parts on the production floor). A good compromise is if you can design system ASIC(s) to be drop in compatible with an existing FPGA. 
The system can be brought up earlier with the FPGA, and lab testing and evaluation can provide valuable input to the final ASIC design (and even reduced capability early beta units to customers). Sometimes some FPGA samples might run fast enough under benign conditions to run a few systems at full speed prior to sampling the ASIC. -- Paul W. DeMone The 801 experiment SPARCed an ARMs race Kanata, Ontario to put more PRECISION and POWER into demone@mosaid.com architectures with MIPSed results but PaulDeMone@EasyInternet.net ALPHA's well that ends well.