In article <384034D5.2138D838@yahoo.com>, Rickman <spamgoeshere4@yahoo.com> wrote: > > I guess this is the ultimate yawn in a newsgroup. The original topic of > discussion is so uninteresting that the topic changes and the thread > continues on without anyone even acknowledging the fact. > > I guess that is the answer to my question. Few engineers are even > interested enough in the Lucent Orca parts to even discuss why they > don't use them??!! > My fault, due to my lack of experience with deja.com. I thought that changing the subject when replying under "power post" would start a new thread. Instead, I inadvertently hijacked your thread. After that I started a new thread with my subject, but everyone seemed to stick with this thread. Sorry for the mixup. -- Greg Neff VP Engineering *Microsym* Computers Inc. greg@guesswhichwordgoeshere.com Sent via Deja.com http://www.deja.com/ Before you buy.Article: 19101
We are nearing the end of the hardware development for a project that uses the xcv-1000-bg560. The tools used were: VCS for verilog simulation; Synplify for verilog synthesis; Alliance M2.1i for place and route. VCS is a bit pricey but we were fortunate to have it under maintenance from a past ASIC project. All in all I was pleased with the EDA tools (a first). Everything went fairly smoothly with very few surprises, although every design brings with it a new problem set. arafeeq@my-deja.com wrote: > Hello all! > Has anyone put the xilinx's virtex (xcv-800-bga432) device into > Production. If yes, what was the EDA tools flow used. like.. > Verilog/vhdl/synplify/or fpga-express or fpga compiler II or leonardo > /alliance tools etc... > I appreciate the answers, > Best Regards, > Abdul Rafeeq. > Sent via Deja.com http://www.deja.com/ > Before you buy.Article: 19102
If the pentium outperformed an FPGA doing a 2D convolution, then whoever did the design for the FPGA wasn't taking advantage of parallelism. See my paper entitled "FPGA makes a radar signal processor on a Chip" for a discussion of how these types of things are done in FPGAs. The paper discusses, among other things, a complex 256 tap matched filter running at a 5 MHz sample rate. The design discussed is doing roughly 10 Billion with a 'B' multiplications per second. That's more than 2 orders of magnitude more performance than you'll get out of a pentium. George wrote: > Dear All, > > I am willing to do a performance analysis of FPGAs, DSPs and Pentium III MMX > microprocessors for highly parallel DSP applications such us Image > Processing. I am interested in particular in the use of MMX technology in > PENTIUM III general purpose microprocessors. With clock frequencies reching > 500 MHz, I may expect them to outperform both FPGAs and DSP in some > applications (e.g. 2D convolution). Has anybody done a similar case study? > Do you know any valuable references on this issue? > > Any comment will be highly appreciated. > > Thanks in advance. > > G. -- -Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email randraka@ids.net http://users.ids.net/~randrakaArticle: 19103
In article <383BDC4E.31D2A09A@fokus.gmd.de>, Guido Pohl <pohl@fokus.gmd.de> wrote: >I'am searching for the pin-out of a programming cable used for a MACH445, i.e. >to program it from a PC parallel port ... >I couldn't find any information to this topic - is it a secret 8-? No, it isn't. You can find a schematic in the ol' VANTIS MACH ISP Manual on page A-2 >I know that AMD's MACH is nowadays a M4-128/64 from Lattice. Lattice should have similar doku. -- Stefan Wimmer Cellware Broadband Email sw@cellware.de Justus-von-Liebig-Str. 7 WWW http://www.cellware.de/ 12489 Berlin, Germany Visit my private Homepage: Love, Electronics, Rockets, Fireworks! http://www.geocities.com/CapeCanaveral/6368/Article: 19104
Ray Andraka wrote: > If the pentium outperformed an FPGA doing a 2D convolution, then whomever did > the design for the FPGA wasn't taking advantage of parallelism. See my paper > entitled "FGPA makes a radar signal processor on a Chip" for a discussion of how > these types of things are done in FPGAs. The paper discusses, among other > things, a complex 256 tap matched filter running at a 5 MHz sample rate. The > design discussed is doing roughly 10 Billion with a 'B' multiplications per > second. Thats more 2 orders of magnitude more performance than you'll get out > of a pentium. Such speed-up might not always happen, especially when you consider problems with a high communication over computation ratio (3x3 2D convolution, for example). Then, the level of "usable" parallelism might be constrained by off-chip bandwidth, as execution time is in any case bounded by communication time (the time to transfer data on and off chip). Using a very, very rough approximation you can express an upper bound for the effective parallelism in the FPGA as Pmax=(Computation_Volume*Computation_Time)/(Communication_Volume*Communication_Time); If you try to accelerate your Photoshop by implementing your 3x3 convolution routines on a PCI-based FPGA board, for example, your maximum parallelism will be (assuming a Virtex design performing an 8-bit MAC at 100 MHz, on an HxW image, with PCI at full burst delivering 4x8 bits every 30ns) Pmax=((H*W*3*3)*10)/(2*H*W*30/4)=6 Which means 10/6=1.7ns for performing an 8-bit MAC operation. Using your PIII@600Mhz, as MMX can process 8 pixels per MMX instruction, your optimal peak performance will be 2/8=0.25 ns. This peak performance is unrealistic as we don't consider loading and unloading data from MMX registers; in practice it should be between two or three times more, which still matches (or even beats) FPGA performance... 
Note: I should probably also consider communication from main memory and cache misses for the MMX version, which might actually worsen execution time, but not by a strong factor I think. Hence it's not so much a matter of 'how you design' as a matter of picking the right application, especially one that provides a good computation over communication ratio. As an example, implementing a 9x9 2D convolution on the same FPGA would certainly provide a huge speed-up. StevenArticle: 19105
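Steven's bandwidth bound can be reproduced with a few lines of arithmetic. The numbers below are the assumptions from his post (an 8-bit MAC every 10 ns in the FPGA, full-burst PCI moving 4 bytes every 30 ns, each pixel crossing the bus twice), not measurements:

```python
# Back-of-the-envelope version of Steven's Pmax bound (assumptions from his post).
ops_per_pixel = 3 * 3              # 3x3 convolution: 9 MACs per pixel
mac_time_ns = 10.0                 # one 8-bit MAC at 100 MHz
bytes_per_pixel_moved = 2          # one byte in, one byte out over PCI
bus_ns_per_byte = 30.0 / 4         # PCI burst: 4 bytes every 30 ns

compute_ns_per_pixel = ops_per_pixel * mac_time_ns           # 90 ns of MAC work
comm_ns_per_pixel = bytes_per_pixel_moved * bus_ns_per_byte  # 15 ns on the bus

# Usable parallelism before the bus, not the multipliers, becomes the bottleneck:
p_max = compute_ns_per_pixel / comm_ns_per_pixel
print(p_max)                       # -> 6.0
print(mac_time_ns / p_max)         # effective ns per MAC at the bound, ~1.67
```

The bound says nothing about the FPGA's raw compute; it only caps how much of that compute the PCI bus lets you use for this kernel.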
Hello, I'm trying to pipeline a 4-bit CLA where I'm only interested in Cout4, and another complete 4-bit CLA. The problem is that I'm simply running out of DFFs, and I was hoping someone could shed some light on how I can construct the CLA with Cout4 with 3 stages of pipelining, and a complete 4-bit CLA with 3 stages of pipelining. I managed to pipeline a ripple carry adder. The issue which is putting me on the back foot is the presence of combinatorial logic between the full adder stages (an example would be a 4-bit CLA). So, can any intellect out there spare some time explaining how I might achieve pipelining a 4-bit carry lookahead adder? One last query: the carry propagate signal is sometimes defined as A+B and sometimes as A XOR B. Why? The generate is always defined as A.BArticle: 19106
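On the last question in the post: for the carry chain alone, OR and XOR propagate are interchangeable, because whenever Ai = Bi = 1 the generate term already forces the carry; XOR matters only if the propagate signal is reused to form the sum bit. A small bit-level model (written as the carry recurrence Ci+1 = Gi + Pi·Ci, which the flattened lookahead equations simply expand) can check this exhaustively:

```python
def cla4(a, b, cin, xor_propagate=True):
    """Bit-level model of a 4-bit carry-lookahead adder; returns (sum, cout4).
    Propagate Pi may be Ai XOR Bi or Ai OR Bi: the carries come out the same
    either way, since when Ai = Bi = 1 the generate Gi = Ai.Bi dominates."""
    abits = [(a >> i) & 1 for i in range(4)]
    bbits = [(b >> i) & 1 for i in range(4)]
    c = [cin]
    for i in range(4):
        g = abits[i] & bbits[i]                                   # Gi = Ai.Bi
        p = (abits[i] ^ bbits[i]) if xor_propagate else (abits[i] | bbits[i])
        c.append(g | (p & c[i]))                                  # Ci+1 = Gi + Pi.Ci
    # The sum bit must use XOR of the inputs (an OR-propagate cannot be reused here):
    s = sum(((abits[i] ^ bbits[i] ^ c[i]) << i) for i in range(4))
    return s, c[4]

# Both propagate definitions agree with plain addition on every input:
for a in range(16):
    for b in range(16):
        for cin in (0, 1):
            assert cla4(a, b, cin, True) == cla4(a, b, cin, False)
            assert cla4(a, b, cin) == ((a + b + cin) & 0xF, (a + b + cin) >> 4)
print("OK")
```

This is only a behavioral sketch, not a pipelined netlist; it shows why either propagate definition yields a correct Cout4.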
I use a mix of schematic and Verilog. I put state machines in Verilog and most other stuff in schematic. Verilog (or VHDL) cannot be beat for state machines. Easy to code, easy to diagnose, easy to change in seconds instead of hours to re-design the machine. The more complex the state machine, the greater the time savings. I plan to use Verilog for more in the future, but I don't have enough confidence yet that what I put in is putting out what I want. -- Keith F. Jasinski, Jr. kfjasins@execpc.com Greg Neff <gregneff@my-deja.com> wrote in message news:81cjav$opg$1@nnrp1.deja.com... > In article <38398D1C.A7B9E445@ids.net>, > Ray Andraka <randraka@ids.net> wrote: > > Don Husby wrote: > > > (snip) > > > > > > I agree that schematics are still the best way to enter a design. > I just > > > thought I would beat my head against the VHDL wall one more time > before > > > going back to schematics. > > > > I've been using, no beating my head against the wall, with VHDL > lately too. > > I'm doing it for two reasons: First I have some customers who bought > the VHDL > > thing hook, line, and sinker (try to convince them they're wrong!), > and for my > > own stuff because it allows me to parameterize functions pretty > easily. I'm > > beginning to wonder if I'll ever see the return on the design > investment for > > those parameterized thingies though. > > > (snip) > > I having been using schematic entry for FPGAs, probably because I have > been drawing schematics since before the days of PALs, let alone > FPGAs. I am now considering taking the leap to VHDL entry, but I am > not convinced that there will be a benefit in either time to design or > design quality. The above comments seem to be indicative of those like > myself, who are highly skilled at schematic entry for FPGAs. > > I'm not talking about a situation where a team of engineers is > designing a mega-gate FPGA. 
I am more interested in small to mid-range > (say, up to 100K gate) designs that are being entered and maintained by > one person. > > I would be interested to hear from those people that have gone through > the VHDL learning curve. > > Has the move to VHDL reduced design entry time? > Has the design quality improved (fewer problems)? > Is design debugging easier? > Is design maintenance easier? > Is design reuse easier? > Did you get to the point where VHDL is more efficient than schematics? > If so, how long did it take to get to this point? > Bottom line: Was it worth it? > > I would like to hear from Don and Ray, to see if they consider > themselves to be still on the learning curve, or if they truly think > that VHDL is not worth the hassle. > > -- > Greg Neff > VP Engineering > *Microsym* Computers Inc. > greg@guesswhichwordgoeshere.com > > > Sent via Deja.com http://www.deja.com/ > Before you buy.Article: 19107
Xemacs, get it at http://www.xemacs.org "Ahmad A." wrote: > > Hi.. > Can any one tell me where can I find Free, Student edition, or Shareware HDL > editor? > > Thank you in advanced. > Ahmad.Article: 19108
Hi Bruce, The Xilinx PCI64 and PCI32 cores do not require FIFOs that back up during a bus transfer. Depending on the type of transfer, we may require the user to back up the FIFO when the bus transaction terminates. The situation you describe is never an issue when the core is the target of a write or the initiator of a read. In cases where the core is the target of a read, or the initiator of a write, it can occur if the other bus agent inserts wait states in the middle of a burst. Our implementation contains the necessary "shadow" registers as a buffer for the cases where it can be an issue. In situations where the core needs to use this buffered data, it does so automatically and transparently to the user. In cases where it can be an issue, the "shadow" registers may still hold valid data at the end of a transfer, depending on how the transfer terminated. We ask the user to back up their FIFO at this time. There are two main reasons for this: 1. It forces a FIFO state consistent with what took place on the bus. 2. It allows the next transfer on the user side to be unrelated to the first. The largest issue is item two. Many designs have more than one target address space (multiple base address registers) and have more than one "channel" as an initiator. To assume otherwise would seriously limit the flexibility of our implementation. Hope that clarifies, Eric Crabill Bruce Nepple wrote: > One thing you might look at when you consider PCI cores is whether you need > a "backup fifo" or is it implemented in the core. When you get a late > TRDY-false does the core save the data in a hidden register or do you have > to backup your fifo pointer to send it (there is no way you will be able to > stop the fifo from advancing since the signal comes so late). My impression > is that Xilinx requires a backup fifo.Article: 19109
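The "back up the FIFO after a terminated burst" idea in Eric's post can be sketched abstractly as a read pointer with a snapshot taken at the start of each transfer. This is a hypothetical toy model for illustration, not the Xilinx core's actual interface:

```python
class BackupFifo:
    """Toy model of a FIFO whose read pointer can be rolled back to the
    point the current bus transfer started -- the user-side 'back up'
    described in the post. Purely illustrative."""
    def __init__(self, data):
        self.data = list(data)
        self.rd = 0        # read pointer advanced speculatively during a burst
        self.mark = 0      # snapshot taken when a transfer begins
    def start_transfer(self):
        self.mark = self.rd
    def pop(self):
        word = self.data[self.rd]
        self.rd += 1
        return word
    def commit(self, words_accepted):
        # Bus transaction terminated: keep only the words the other
        # agent actually accepted; the rest will be re-sent next time.
        self.rd = self.mark + words_accepted

fifo = BackupFifo([10, 20, 30, 40])
fifo.start_transfer()
burst = [fifo.pop(), fifo.pop(), fifo.pop()]  # stream 3 words speculatively
fifo.commit(2)                                # target disconnected after 2
print(fifo.pop())                             # -> 30, the un-accepted word again
```

In the real core the "shadow" registers hide most of this; the explicit rollback is only needed at the end of a transfer, for the reasons Eric lists.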
George, Be sure to include in your comparison figures for size and power as well. You will find that, in addition to a dedicated circuit in an FPGA having orders of magnitude better performance than one based on a general purpose processor or even a DSP, the FPGA implementation may consume an order of magnitude less power, and probably considerably less space. A Pentium or other processor contains a great deal of extra circuitry dedicated to making it a good performer at many different tasks, while an FPGA implementation can be a fantastic performer at a single task, as all resources are directed towards that one task. For example, speculative branch execution logic is of little or no use when performing well determined calculations as found in most signal processing tasks. Yet it is still there in a GPP. Most cache has no use in a well designed pipelined circuit, or may be built into the pipeline in the places where it does the most good in a dedicated FPGA circuit. In a GPP, the cache sits in one place, whether it is needed or not. So please, if you do such a comparison, consider not only raw performance, but other important metrics such as power/performance and the related space/performance. Ray's paper doesn't mention it, but I'll bet his radar processor consumes less than 1/3 of the power that a DSP or GPP implementation would use. - John In article <3842ABA7.A27C813@ids.net>, Ray Andraka <randraka@ids.net> wrote: > If the pentium outperformed an FPGA doing a 2D convolution, then whomever did > the design for the FPGA wasn't taking advantage of parallelism. See my paper > entitled "FGPA makes a radar signal processor on a Chip" for a discussion of how > these types of things are done in FPGAs. The paper discusses, among other > things, a complex 256 tap matched filter running at a 5 MHz sample rate. The > design discussed is doing roughly 10 Billion with a 'B' multiplications per > second. Thats more 2 orders of magnitude more performance than you'll get out > of a pentium. 
> > George wrote: > > Dear All, > > I am willing to do a performance analysis of FPGAs, DSPs and Pentium III MMX > > microprocessors for highly parallel DSP applications such us Image > > Processing. I am interested in particular in the use of MMX technology in > > PENTIUM III general purpose microprocessors. With clock frequencies reching > > 500 MHz, I may expect them to outperform both FPGAs and DSP in some > > applications (e.g. 2D convolution). Has anybody done a similar case study? > > Do you know any valuable references on this issue? > > > > Any comment will be highly appreciated. > > > > Thanks in advance. > > > > G. > > -- > -Ray Andraka, P.E. > President, the Andraka Consulting Group, Inc. > 401/884-7930 Fax 401/884-7950 > email randraka@ids.net > http://users.ids.net/~randraka > > Sent via Deja.com http://www.deja.com/ Before you buy.Article: 19110
Pat wrote: > Does anyone have any info' about the Clearlogic Vs. Altera > bust-up ? I was thinking of using ClearLogic because the cost saving is > quite dramatic but, if they're going to suddenly withdraw their service > 'cos Altera have successfully sued them I'll be in deep Do-Do's. > > Anyone know anything. > -- > Pat Pat, To start, I would like to refer you to the Clear Logic press release that responds to your question: http://www.clear-logic.com/pressrelease/11-18-99.htm However, let me add the following personal observations: From what I have heard of the complaint, it is a huge long-shot, a desperate move by Altera, with roughly zero probability of success. Still, it's not a bad strategy, since in the end, each side will have burned up N dollars on lawyers, which obviously stings Clear Logic more than Altera. But put us out of business? Not a chance. We easily have the financial strength to weather the costs of the legal defense. Look at it this way. If Altera could beat us on price, performance or quality, they would. Instead they have taken us to court. You are nobody in the Silicon Valley if you have not been sued. And every successful startup in the Silicon Valley is eventually sued by the competition. It's all just a part of growing up. Clear Logic is here to stay. DISCLAIMER: I am, of course, speaking for myself, and not my employer. They would not be so foolish as to even entertain the possibility of considering the option of allowing me to speak for them. Please refer to the Clear Logic website, www.clear-logic.com for official statements from management. --Scott Chase Senior Applications Engineer Clear Logic, Inc.Article: 19111
Isn't it interesting that Peter Alfke, who regularly gives advice and explains the reasoning behind certain Xilinx decisions, always goes very quiet when there is a discussion that highlights the shortcomings of the software Xilinx sells? In article <01bf395a$df27ca70$207079c0@drt1>, Austin Franklin <austin@darkroom098.com> writes >> I would guess by the tone of your message that you are pretty frustrated >> over this. > >Frustrated? HA! That's an understatement! > >And they ask me the most STUPID question they can possibly every ask "Why >do you want to use that tool anyway?...NO ONE uses it". DAMN does that >piss me off. I need to use THAT tool because with it, I can fill in the >blanks that THEIR documentation leaves out...like what is the best IOB to >use for the RESET input, where the hell IS the upper right and left corner >of the die, with relation to the package...and just WHAT did the tools do >to my logic? > >These, and many other questions can be answered with this tool...but their >attitude is, "hey, the OTHER tools work just fine, so you don't really need >that tool, Oh, and by the way, NO ONE uses it anyway"...GRRRRRRRR. > -- Steve Dewey remove 123 to email.Article: 19112
Because the JTAG pins are dedicated pins on 10K, they do not interfere with JTAG programming. What Martin is referring to is probably the case where a blank EPC2 and Flex 6K are in the same JTAG chain. Because nConfig on 6K is most likely tied to a pull-up, when the board powers up, nConfig goes high and the 6K enters the configuration mode. With the EPC2 unprogrammed, the 6K can't get configured (assuming that the 6K is designed to get its configuration bit stream from the EPC2) and remains in the configuration mode. Since the JTAG pins on 6K are dual-purpose pins, they are tri-stated while the 6K is in the configuration mode. With the JTAG pins on 6K tristated, the JTAG chain is effectively broken. Thus, to be able to ISP the blank EPC2 via JTAG, nConfig of 6K needs to be pulled low so that the 6K is not in the configuration mode. Alternatively, you can pre-program the EPC2 so that the 6K can exit the configuration mode, enter the user mode and allow the JTAG pins to operate. As for the "Unrecognized Device or Socket is empty" problem, well, I am not sure. It could be a sw version problem (get the latest ASAP2 from Altera and try again). You might also want to check the voltage level and look for any noise. Ying In article <383eaf14.9384960@news.freeserve.net>, <martin@the-thompsons.freeserve.co.uk> wrote: >Hi Volker. > >If the 10K10 is like the 6016 I used once, you need to pull down the >nCONFIG line (I think) the first time you program (and if you ever >corrupt the EPC2) otherwise it tristates the JTAG I/Os until its knows >what they are configured as! I put a jumper on my board for this >purpose, there may be one on your EV board. > >Altera have an App note on this somewhere, try a search on their >website (http://www.altera.com surprise surprise :). If you can't >find it, let me know and when I get into work I'll check what I did >last time. 
> >Cheers, > Martin > >On Wed, 24 Nov 1999 20:52:43 -0800, Volker Kalms ><ea0038@uni-wuppertal.de> wrote: > >>Hi all, >> >>Since a quarter of a year I discover the beatiful world of >>AHDL and VHDL. Until now everithing worked fine. But now I would >>be very grateful for a little help. >> >>Lately I got an FPGA evaluation board (DIGILAB 10K10, manufactured >>by Ing. Buero Lindmeier) in my hands. This evaluation board contains >>an ALTERA EPF 10K10LC84-4. To configure this FLEX device I use the >>ALTERA MAX+plusII (v 9.1) software......no problem to this point. >> >>Two weeks ago I purchased an configuration EPROM (EPC2LC20), which >>could optional plugged into my evaluation board.I set up the MAX plus >>JTAG chain due to the requirements (as far as I would say), performed >>an JTAG Chain Info in the Multi-Device JTAG Chain Setup and MAX plus >>detected the additional device in the JTAG Chain. >>But when I try to Program the .pof file to the EPROM I get the message: >>Unrecognized device or socked empty. >> >> >>What am I doing wrong????? From my point of view I changed nearly every >>parameter in the MAX plus setup. >> >>I hope there is somebody out there, who could give me a hint how to get >>this EPROM configured. >> >> >>MANY THANKS IN ADVANCE!!! >> >>Best regards, >> >>Volker > >Martin Thompson >martin@the-thompsons.freeserve.co.uk >http://www.the-thompsons.freeserve.co.uk/Article: 19113
Thanks Evan, There I was, thinking I could avoid this thread :-) Tom Hill at Exemplar used my original example (mine was rather verbose, but I thought it useful) and created the Exemplar app note now available on the web site. Something was rumored to be in XCell but I might have missed it. Now that the area location constraints work in M2.1 (and I can put them in already using Spectrum) I can begin work on the ASIC-like floor-planning and incremental compile methodology in Spectrum, which will make life a lot easier on the big structured design front.(did somebody say DSP?) I've put the note up on my personal webspace (oh, if only I had tome for a real website). Those interested can find it at: http:\\www.netcomuk.co.uk\~s_clubb\increment.zip Cheers Stuart On Sun, 28 Nov 1999 15:34:09 GMT, eml@riverside-machines.com.NOSPAM wrote: >I don't think this app note has got as far as Xilinx or Exemplar yet. >I've copied this to Stuart and hopefully he can provide details on >where to get it from. > >Evan > For Email remove "NOSPAM" from the addressArticle: 19114
In the case of a 3x3 convolution, the FPGA can still significantly outperform the Pentium, and it uses a lower clock frequency and much less power. The secret here is that the FPGA can perform all the multiplies in parallel, while the pentium does at best one multiply per clock. Use of on chip memory or a dedicated external memory buffer allows the FPGA to process each pixel as it arrives without having to repetitively fetch the surrounding pixels. In many cases, the pixel rate is even slow enough that the FPGA can process the data serially. A 640x480 image at 60 frames/sec has a pixel rate of only 18 MHz. If the pixels are 8 bits, then you can condense the hardware considerably by working on two bits at a time at a system clock of 73 MHz. Yes, I've done this, and a 3x3 takes up a very small area - less than 100 CLBs. Expanding it to a larger 2D convolution takes more area and more local line buffer memory (may have to be wider for the I/O bandwidth if it is off-chip), but has no real impact on the pixel rate...something a microprocessor implementation can't claim. The power savings are also considerable. I didn't put the power savings in my paper, as I did not have measurements or estimates to cite. Parallelism lets you use a lower clock frequency, and purpose built logic keeps the gate count small. It does use a different design flow, and requires a different set of skills than a microprocessor based design. Steven Derrien wrote: > Ray Andraka wrote: > > > If the pentium outperformed an FPGA doing a 2D convolution, then whomever did > > the design for the FPGA wasn't taking advantage of parallelism. See my paper > > entitled "FGPA makes a radar signal processor on a Chip" for a discussion of how > > these types of things are done in FPGAs. The paper discusses, among other > > things, a complex 256 tap matched filter running at a 5 MHz sample rate. The > > design discussed is doing roughly 10 Billion with a 'B' multiplications per > > second. 
Thats more 2 orders of magnitude more performance than you'll get out > > of a pentium. > > Such speed-up might not always happen especially when you consider problem with a > high communication over computation ratio (3x3 2D Convolution for example). Then, > the level of "usable" parallelism might be contrained by off-chip bandwidth as > anyhow execution time is bounded by communication time (time to transfer dat in and > off chip). > Using a very very rough approximation you can express an upper bound for the > effective parallelism in the FPGA using > > Pmax=(Computation_Volume*Computation_Time)/(Communication_Volume*Communication_Time); > > If you try to acccelerate your photoshop by implementing your 3x3 convolution > routines on a PCI base FPGA board for example, your maximum parallelism will be > (assuming a virtex design perfomring a 8bit MAC at 100Mhz, on a HxW image, with a > PCI at full burst delevering 4x8 bits every 30ns) > > Pmax=((H*W*3*3)*10)/(2*H*W*30/4)=6 > > Which means 10/6=1.3ns for performing a 8 bit MAC operation. Using your PIII@600Mhz, > as MMX can process 8 pixel per MMX intruction your optimal peak performance will be > 2/8=0.25 ns . This peak performance is unrealistic as we don't consider loading and > unloading data from MMX registers, in practice it shoul be be between two or three > time more, which still matches (or even beat) FPGA performances... > > Note : I should probably also consider communictaion from main memory and cache > misses for MMX version , which might actually worsen execution time, but not by a > strong factor I think. > > Hence it's not so much a matter of 'how you design' rather than a matter of picking > the right application , especially one that provide a good computation over > communicatio ratio. As an example implementing a 9x9 2d convolution on the same FPGA > wouldl certainly provide a huge speed up. > > Steven -- -Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 
401/884-7930 Fax 401/884-7950 email randraka@ids.net http://users.ids.net/~randrakaArticle: 19115
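The rates Ray quotes can be checked with simple arithmetic: the pixel rate of a 640x480 image at 60 frames/s, the parallel MAC throughput of nine multipliers at that rate, and the system clock for the 2-bits-at-a-time digit-serial variant he describes:

```python
# Checking the numbers in Ray's post.
pixel_rate = 640 * 480 * 60          # pixels per second for 640x480 @ 60 fps
print(pixel_rate / 1e6)              # -> ~18.4, the "only 18 MHz" figure

# Nine multipliers working in parallel on a 3x3 kernel at that pixel rate:
macs_per_second = pixel_rate * 9
print(macs_per_second / 1e6)         # -> ~166 million MACs/s, one pixel out per clock

# Digit-serial variant: 8-bit pixels handled 2 bits at a time need 4
# sub-cycles per pixel, hence the ~73 MHz system clock he mentions:
print(pixel_rate * 4 / 1e6)          # -> ~73.7
```

The point of the arithmetic is that none of these clock rates is aggressive for an FPGA; the parallelism, not the clock, supplies the throughput.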
Keith Jasinski, Jr. wrote: > I use a mix of schematic and Verilog. I put state machines in Verilog and > most other stuff in schematic. Verilog (or VHDL) cannot be beat for state > machines. Easy to code, easy to dianose, easy to change in seconds instead > of hours to re-design the machine. The more complex the state machine, the > greater the time savings. Perhaps not, but in many cases you can do as well with a structured schematic. I use wrappers around the basic components for 1-hot state machines so that in the schematic the state machine looks like a flowchart. It makes the SM easy to grok, and entry and modifications are easy to do. It also has the advantage of making it easy to see how many levels of combinatorial logic will be needed. For small encoded machines, you can use n:1 selectors with the select inputs driven by the state machine registers. The data inputs are tied to 0, 1, control or not control to direct the state machine to the next state. The tools will reduce the selector logic to a log2(n) or less input gate, and the function can be read off the mux inputs. It's a bit harder to follow, but still not too bad. -- -Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email randraka@ids.net http://users.ids.net/~randrakaArticle: 19116
Ray Andraka wrote: > In the case of a 3x3 convolution, the FPGA can still significantly outperform the > Pentium, and it uses a lower clock frequency and much less power. The secret here is > that the FPGA can perform all the multiplies in parallel, while the pentium does at best > one multiply per clock. MMX can perform 8 multiplications per clock cycle on 8-bit words (those used for image processing), and moreover recent PIII clock speeds are approx 4 times those of FPGAs. Anyhow, in my previous post I was considering multipliers working in parallel. > Use of on chip memory or a dedicated external memory buffer > allows the FPGA to process each pixel as it arrives without having to repetitively fetch > the surrounding pixels. Right; but total execution time will always be bounded by the time it takes to transfer the original image into the FPGA plus the time to transfer the resulting image out of the FPGA. What I wanted to say is that in many cases the maximum speed-up that you can expect from an FPGA solution is strongly limited by communication time. > In many cases, the pixel rate is even slow enough that the FPGA > can process the data serially. A 640x480 image at 60 frames/sec has a pixel rate of only > 18 MHz. If the pixels are 8 bits, then you can condense the hardware considerably by > working on two bits at a time at a system clock of 73 MHz. You could also do that using a 200 Mhz Pentium. The FPGA won't provide any speed-up here because again the limiting factor in execution time is bandwidth, not processing power! > Yes, I've done this, and a > 3x3 takes up a very small area - less than 100 CLBs. So here you definitely gain in terms of area/complexity over MMX, but I think the initial question was more about speed than "cost effectiveness". > memory (may have to be wider for the I/O bandwidth if it is off-chip), but has no real impact > on the pixel rate...something a microprocessor implementation can't claim. > The power savings are also considerable. 
I didn't put the power savings in my paper, as > I did not have measurements or estimates to cite. Parallelism lets you use a lower clock > frequency, and purpose built logic keeps the gate count small. It does use a different > design flow, and requires a different set of skills than a microprocessor based design. I agree ... > Steven Derrien wrote: > > > Ray Andraka wrote: > > > > > If the pentium outperformed an FPGA doing a 2D convolution, then whomever did > > > the design for the FPGA wasn't taking advantage of parallelism. See my paper > > > entitled "FGPA makes a radar signal processor on a Chip" for a discussion of how > > > these types of things are done in FPGAs. The paper discusses, among other > > > things, a complex 256 tap matched filter running at a 5 MHz sample rate. The > > > design discussed is doing roughly 10 Billion with a 'B' multiplications per > > > second. Thats more 2 orders of magnitude more performance than you'll get out > > > of a pentium. > > > > Such speed-up might not always happen especially when you consider problem with a > > high communication over computation ratio (3x3 2D Convolution for example). Then, > > the level of "usable" parallelism might be contrained by off-chip bandwidth as > > anyhow execution time is bounded by communication time (time to transfer dat in and > > off chip). > > Using a very very rough approximation you can express an upper bound for the > > effective parallelism in the FPGA using > > > > Pmax=(Computation_Volume*Computation_Time)/(Communication_Volume*Communication_Time); > > > > If you try to acccelerate your photoshop by implementing your 3x3 convolution > > routines on a PCI base FPGA board for example, your maximum parallelism will be > > (assuming a virtex design perfomring a 8bit MAC at 100Mhz, on a HxW image, with a > > PCI at full burst delevering 4x8 bits every 30ns) > > > > Pmax=((H*W*3*3)*10)/(2*H*W*30/4)=6 > > > > Which means 10/6=1.3ns for performing a 8 bit MAC operation. 
Using your PIII@600Mhz, > > as MMX can process 8 pixel per MMX intruction your optimal peak performance will be > > 2/8=0.25 ns . This peak performance is unrealistic as we don't consider loading and > > unloading data from MMX registers, in practice it shoul be be between two or three > > time more, which still matches (or even beat) FPGA performances... > > > > Note : I should probably also consider communictaion from main memory and cache > > misses for MMX version , which might actually worsen execution time, but not by a > > strong factor I think. > > > > Hence it's not so much a matter of 'how you design' rather than a matter of picking > > the right application , especially one that provide a good computation over > > communicatio ratio. As an example implementing a 9x9 2d convolution on the same FPGA > > wouldl certainly provide a huge speed up. > > > > Steven > > -- > -Ray Andraka, P.E. > President, the Andraka Consulting Group, Inc. > 401/884-7930 Fax 401/884-7950 > email randraka@ids.net > http://users.ids.net/~randraka StevenArticle: 19117
Hello, I was wondering if there exists an AGP-based FPGA board? Are there any commercial interface chips like the ones that exist for PCI (PLX, AMCC)? Steven
Article: 19118
On Mon, 29 Nov 1999 22:07:02 GMT, s_clubb@NOSPAMnetcomuk.co.uk (Stuart Clubb) wrote: >Thanks Evan, > >There I was, thinking I could avoid this thread :-) > >I've put the note up on my personal webspace (oh, if only I had time >for a real website). Those interested can find it at: > >http:\\www.netcomuk.co.uk\~s_clubb\increment.zip Or even http://www.netcomuk.co.uk/~s_clubb/increment.zip Thanks! - Brian
Article: 19119
I wasn't aware that the Pentium III has 8 multipliers. I agree that the performance is limited to the comm time. One of the advantages of using the FPGA, however, is that you can do more processing before returning the data, especially if you have a local dedicated memory to use too. For a simple example, in edge detection you will probably use both a horizontal and a vertical Sobel operator, each of which is a 3x3 (or larger) 2-D convolution. With the FPGA, you can do both at once and combine them before returning the result. As the algorithm becomes more complicated, the FPGA shows greater gains. In most cases, it can also work at a considerably lower clock rate, thereby reducing power. -- -Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email randraka@ids.net http://users.ids.net/~randraka
Article: 19120
Ray Andraka wrote: > I wasn't aware that the Pentium III has 8 multipliers. It is actually part of the MMX SIMD instruction set, which performs 8 8-bit multiplications per cycle. Its performance is however limited by the large number of cycles you need to load/unload the MMX registers with correctly formatted data. > I agree that the performance is limited to the comm time. One of the advantages of using the FPGA > however, is that you can do more processing before returning the data, especially if you have a > local dedicated memory to use too. This is the same for a CPU with on-chip L2 cache (like the PII): using good programming techniques you can get good reuse of cached data and limit off-chip communication to its minimum. > For a simple example, in edge detection you will probably use both a horizontal and > vertical Sobel operator, each of which is a 3x3 (or larger) 2 D convolution. With the FPGA, you > can do both at once and combine them before returning the result. By using a simple loop-merging technique, this will also work for a CPU with L2 cache memory. > As the algorithm becomes more complicated, the FPGA shows greater gains, Not necessarily: FPGAs show greater gains for algorithms exhibiting good regularity in their computations and a high computation-over-communication ratio. > In most cases, it can also work at a considerably lower clock rate, thereby reducing power. True in most cases. Steven > -- > -Ray Andraka, P.E. > President, the Andraka Consulting Group, Inc. > 401/884-7930 Fax 401/884-7950 > email randraka@ids.net > http://users.ids.net/~randraka
Article: 19121
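As an illustration of the Sobel example in the exchange above (a sketch, not anyone's actual implementation): both 3x3 kernels can be evaluated on the same pixel neighborhood and combined before the result is written back, which is the fetch-sharing an FPGA datapath (or a loop-merged CPU routine) exploits:

```python
import numpy as np

# Horizontal and vertical Sobel kernels; GY is just GX transposed.
GX = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]])
GY = GX.T

def sobel_magnitude(img):
    """Gradient magnitude: both convolutions share one neighborhood
    fetch per pixel and are combined before the result is stored."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for y in range(h - 2):
        for x in range(w - 2):
            window = img[y:y + 3, x:x + 3]   # fetched once...
            gx = np.sum(window * GX)         # ...used by both operators
            gy = np.sum(window * GY)
            out[y, x] = np.hypot(gx, gy)
    return out
```

On a flat image the output is all zeros (each kernel's weights sum to zero); a step edge produces a nonzero response.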
Please feel free to circulate the following position announcements: Position: President Location: San Francisco Bay Area Get in on the ground level: start-up in the area of intellectual property reuse (design reuse) for system-on-a-chip. The system-on-a- chip marketplace is expected to grow from $5.9 billion in 1999 to $15.7 billion in 2003. Seeking President to work with CEO. Responsibilities: - Drive/implement business strategy - Help raise initial and subsequent venture capital rounds - Identify and create industry alliances/partnerships - Help assemble core team Requirements: - Background in semiconductor industry; understanding of intellectual property reuse desired. - Ability to work with business and technical personnel Please email your resume to ipreuse@my-deja.com Position: Chief Operating Officer Location: San Francisco Bay Area Get in on the ground level: Ground level start-up in the area of intellectual property reuse (design reuse) for system-on-a-chip. The system-on-a-chip marketplace is expected to grow from $5.9 billion in 1999 to $15.7 billion in 2003. Responsibilities: - Implement business strategy - Run day-to-day operations of start-up company - Identify and create industry alliances/partnerships Requirements: - Background in semiconductor industry; understanding of intellectual property reuse desired. - Ability to work with business and technical personnel Please email your resume to ipreuse@my-deja.com Position: Consulting Design Engineers Location: San Francisco Bay Area Get in on the ground level: Ground level start-up in the area of intellectual property reuse (design reuse) for system-on-a-chip. The system-on-a-chip marketplace is expected to grow from $5.9 billion in 1999 to $15.7 billion in 2003. Responsibilities: - Provide technical and process consulting in the area of design reuse Requirements: - EE degree; Background in semiconductor industry; technical understanding of intellectual property reuse. 
Please email your resume to ipreuse@my-deja.com Sent via Deja.com http://www.deja.com/ Before you buy.
Article: 19122
Steven Derrien <sderrien@irisa.fr> writes: > Ray Andraka wrote: > > > In the case of a 3x3 convolution, the FPGA can still significantly outperform the > > Pentium, and it uses a lower clock frequency and much less power. The secret here is > > that the FPGA can perform all the multiplies in parallel, while the pentium does at best > > one multiply per clock. > > MMX can perform 8 multiplications per clock cycle on 8-bit words (those used for image > processing), and moreover recent PIII clock speeds are approx 4 times those of FPGAs. > > Anyhow, in my previous post I was considering multipliers working in parallel. > > > Use of on chip memory or a dedicated external memory buffer > > > allows the FPGA to process each pixel as it arrives without having to repetitively fetch > > the surrounding pixels. > > Right; but total execution time will always be bounded by the time it takes to transfer the > original image into the FPGA plus the time to transfer the resulting image out of the FPGA. > Not necessarily: the image stream can "flow through" the FPGA and be processed on the fly. Delay lines can be implemented inside the FPGA to build pixel neighborhoods. The delay between the output and the input images is given by the latency of the arithmetic operators (e.g. multipliers) and the need to fill the delay lines to build the neighborhoods. Some ideas relevant to this problem are presented in these papers:

@InProceedings{cvpr98,
  author    = {A. Benedetti and P. Perona},
  title     = "{Real-time 2-D Feature Detection on a Reconfigurable Computer}",
  booktitle = {Proceedings of the 1998 IEEE Conference on Computer Vision and Pattern Recognition (CVPR'98)},
  year      = {1998},
  month     = {Jun.},
  address   = {Santa Barbara (CA)},
  pages     = {586--593}
}

@InProceedings{iscas99,
  author    = {A. Benedetti and P. Perona},
  title     = "{A Novel System Architecture for Real-Time Low-Level Vision}",
  booktitle = {Proceedings of the 1999 IEEE International Symposium on Circuits and Systems (ISCAS'99)},
  year      = {1999},
  month     = {Jun.},
  address   = {Orlando (FL)}
}

Best, -Arrigo -- Dr. Arrigo Benedetti e-mail: arrigo@vision.caltech.edu Caltech, MS 136-93 phone: (626) 395-3695 Pasadena, CA 91125 fax: (626) 795-8649
Article: 19123
Costech has a free classifieds page on their site, no gimmicks, absolutely free. If there is anything you wish to sell, please feel free to post it there. You may include a picture linked to your URL. http://www.costech.com. Thanks
Article: 19124
Basically what I was trying to say. Given enough local memory and the bandwidth (pins) to access it, an FPGA can do the processing at the video frame rate, as demonstrated in numerous systems, including the one described in my 1996 paper "A Dynamic Hardware Video Processing Platform", which did image recognition processing at the frame rate using a tiled array of 4 chips similar to the Atmel AT6005s. Arrigo Benedetti wrote: > Steven Derrien <sderrien@irisa.fr> writes: > > > Not necessarily: the image stream can "flow through" the FPGA and be processed > on the fly. Delay lines can be implemented inside the FPGA to build pixel > neighborhoods. The delay between the output and the input images is given by > the latency of the arithmetic operators (e.g. multipliers) and the need to fill > the delay lines to build the neighborhoods. > > Some ideas relevant to this problem are presented in these papers: > -- -Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email randraka@ids.net http://users.ids.net/~randraka
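A software model of the delay-line scheme Arrigo and Ray describe (an illustrative sketch, not the RTL from either paper): two row-length line buffers delay the raster-scan stream, so each incoming pixel completes one column of a 3x3 neighborhood and a valid window emerges every cycle once the buffers are full:

```python
from collections import deque

def stream_3x3(pixels, width):
    """Yield ((row, col), window) for every valid 3x3 window in a
    raster-scan pixel stream, using two row-length delay lines the
    way a flow-through FPGA design would. A window is a tuple of
    three columns, each column = (row y-2, row y-1, row y) pixels."""
    buf1 = deque([0] * width)   # delay line holding row y-1
    buf2 = deque([0] * width)   # delay line holding row y-2
    cols = deque(maxlen=3)      # three most recent columns of the band
    for i, p in enumerate(pixels):
        y, x = divmod(i, width)
        above1 = buf1.popleft(); buf1.append(p)       # pixel at (y-1, x)
        above2 = buf2.popleft(); buf2.append(above1)  # pixel at (y-2, x)
        cols.append((above2, above1, p))
        if y >= 2 and x >= 2:   # the first two rows/columns only prime
            yield (y - 1, x - 1), tuple(cols)  # window centred one row,
                                               # one pixel behind the input
```

The output lags the input by roughly one image row plus one pixel, which matches the latency argument in the posts: the delay is the fill time of the line buffers, not a full frame transfer.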