I am looking at performing real data, fixed point FFTs in an FPGA and I would like to get some info on the processing time and logic size required. The input data is 14 bit, 2048 points. A standard optimization for processing real data is to fold the data into the complex input array, so that you only process a 1024 point FFT and then unfold the real data in an extra step. We have a DSP available which can do the final unfolding step.

I checked the Altera web site and found info on their megacore function. For a 1K FFT, they use about 3000 LE's and 10 block rams (EABs). They claim the max speed is 90 MHz for 57 uS per block. This is only 3x what I can get from the DSP chip!

Is the Altera megacore not highly optimized for speed? Are there other cores available that can process the data at a higher clock rate? The data is clocked in at 100 MHz burst rate, if it is fully pipelined and can start another butterfly each 4 clock cycles it should be able to process the data in 20 uS. Perhaps that is too much to expect since there are log2(N)/2 passes. I would like to process the block in 20 uS. At that point the processing time becomes insignificant in the overall process. Is that too much to expect from a hardware solution without using a thousand dollar chip?

-- 
Rick "rickman" Collins
rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX
Article: 29926
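A back-of-envelope check of the timing question in Rick Collins's post above may help frame the replies. The sketch below counts butterflies for a radix-4 FFT (the "log2(N)/2 passes" he mentions) and converts them to time. The single butterfly engine, the absence of pipeline-fill or I/O overhead, and the function names are all assumptions made for illustration; none of this comes from a vendor data sheet.

```python
def fft_butterfly_cycles(n_points, radix, clocks_per_butterfly):
    """Clock cycles for one FFT block on a single butterfly engine.

    Assumptions (not stated in the thread): one engine, no pipeline fill
    or I/O overhead, and n_points is an exact power of radix.
    """
    log2_n = n_points.bit_length() - 1
    log2_r = radix.bit_length() - 1
    if radix < 2 or (1 << log2_r) != radix:
        raise ValueError("radix must be a power of two >= 2")
    if (1 << log2_n) != n_points or log2_n % log2_r:
        raise ValueError("n_points must be a power of radix")
    passes = log2_n // log2_r                # log2(N)/2 passes for radix 4
    butterflies_per_pass = n_points // radix
    return passes * butterflies_per_pass * clocks_per_butterfly


def block_time_us(cycles, clock_hz):
    """Convert a cycle count to microseconds at the given clock rate."""
    return cycles / clock_hz * 1e6


cycles = fft_butterfly_cycles(1024, radix=4, clocks_per_butterfly=4)
print(cycles, block_time_us(cycles, 100e6), block_time_us(cycles, 90e6))
```

Under these assumptions a 1K FFT takes 5 passes of 256 butterflies, or 5120 clocks at 4 clocks per butterfly: about 51 uS at 100 MHz and about 57 uS at 90 MHz, which is close to the figure quoted for the Altera core. Reaching 20 uS at 100 MHz would then need roughly one butterfly every 1.5 clocks, or several engines working in parallel.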
I think the many fast combinatorial 18 x 18 multipliers in Virtex-II give it a real advantage. I will try to post data tomorrow (Monday).

Peter Alfke

Rick Collins wrote:
> I am looking at performing real data, fixed point FFTs in an FPGA and I would like to get some info on the processing time and logic size required. The input data is 14 bit, 2048 points.
Article: 29927
Rick,

"You get what you pay for" applies here. The Xilinx Virtex core is faster, with ~95 MHz sample rates in the slowest speed grade (Virtex -4) parts, but still not as fast as a truly optimized design (if you ever looked at the floorplan of the Xilinx macro you'd see what I mean). We offer a 16 point FFT kernel for Xilinx Virtex and VirtexE families. It occupies 20 x 25 CLBs and will run at > 240 MS/S in a VirtexE-8 device. The 16 point kernel plus a CORDIC rotator, block RAM and some addressing logic will handle 256 and 4K point FFTs either as 2-3 passes through the same kernel or using 2-3 kernels at near full rate (the data rate gets limited by the block RAM access for the larger FFTs). Right now we don't have the larger FFTs encapsulated as a core, but we have done the 4K FFTs for a couple of customers. Give me a call if you want more info.

VirtexII claims to do a 1K FFT in 320ns, but I believe that design uses most of the largest device. I suspect I could beat that core with mine by putting several of mine in parallel (both in terms of speed and area).

Rick Collins wrote:
> I am looking at performing real data, fixed point FFTs in an FPGA and I would like to get some info on the processing time and logic size required. The input data is 14 bit, 2048 points. A standard optimization for processing real data is to fold the data into the complex input array, so that you only process a 1024 point FFT and then unfold the real data in an extra step. We have a DSP available which can do the final unfolding step.
>
> I checked the Altera web site and found info on their megacore function. For a 1K FFT, they use about 3000 LE's and 10 block rams (EABs). They claim the max speed is 90 MHz for 57 uS per block. This is only 3x what I can get from the DSP chip!
>
> Is the Altera megacore not highly optimized for speed? Are there other cores available that can process the data at a higher clock rate? The data is clocked in at 100 MHz burst rate, if it is fully pipelined and can start another butterfly each 4 clock cycles it should be able to process the data in 20 uS. Perhaps that is too much to expect since there are log2(N)/2 passes. I would like to process the block in 20 uS. At that point the processing time becomes insignificant in the overall process. Is that too much to expect from a hardware solution without using a thousand dollar chip?
>
> --
> Rick "rickman" Collins
> rick.collins@XYarius.com
> Ignore the reply address. To email me use the above address with the XY removed.
>
> Arius - A Signal Processing Solutions Company
> Specializing in DSP and FPGA design      URL http://www.arius.com
> 4 King Ave                               301-682-7772 Voice
> Frederick, MD 21701-3110                 301-682-7666 FAX

-- 
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com
Article: 29928
Nallatech are offering a Virtex-II evaluation board called the Ballynuey-3, which is largely similar in functionality to the Ballynuey-2. See our website www.nallatech.com for up to date info.

Regards
Allan Cantle
Nallatech Ltd
Article: 29929
Peter Alfke wrote:
> I think the many fast combinatorial 18 x 18 multipliers in Virtex-II give it a real advantage. I will try to post data tomorrow (Monday).
> Peter Alfke
>
> Rick Collins wrote:
> > I am looking at performing real data, fixed point FFTs in an FPGA and I would like to get some info on the processing time and logic size required. The input data is 14 bit, 2048 points.

Thanks for the suggestion Peter, but I have to use parts that I can get. I don't remember exactly what has been said about the XC2V introduction schedule, but I don't see any sign that XC2V parts are remotely available. I also seem to remember that there are no low cost members of this family.

The approach I want to take with this project is to design a board that will use low cost parts for a "standard" version, or can be built with larger, faster FPGAs for "special" needs such as this one. The XC2V parts aren't pin compatible with XC2S parts are they?

I also can't use the XC2S parts or the XCV parts because of the startup current issue. I only have 2 Amps of max current available and I don't even know for sure that this can be supplied during the power up ramp. The Altera parts are MUCH better in this regard. With a total of 5 Xilinx parts on the board, an industrial temperature version of the board will require 10 AMPS if I use all Xilinx parts. The Altera version will only use <1.2 AMPS.

I can consider using a single XC2V part on an optional daughter board if there is one I know I can get my hands on. Will I be able to get the XC2V40 or XC2V80 in an FG256 package anytime in the next two months? I see pricing on the web, but I see no sign of availability. In fact, Avnet lists it as a special order and the Arrow web site seems to have forgotten that they sell Xilinx at all.

You guys may make great parts, but lately they just don't seem to fit on my boards...

-- 
Rick "rickman" Collins
rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX
Article: 29930
Ray Andraka wrote:
> Rick,
>
> "You get what you pay for" applies here. The Xilinx Virtex core is faster, with ~95 MHz sample rates in the slowest speed grade (Virtex -4) parts, but still not as fast as a truly optimized design (if you ever looked at the floorplan of the Xilinx macro you'd see what I mean). We offer a 16 point FFT kernel for Xilinx Virtex and VirtexE families. It occupies 20 x 25 CLBs and will run at > 240 MS/S in a VirtexE-8 device. The 16 point kernel plus a CORDIC rotator, block RAM and some addressing logic will handle 256 and 4K point FFTs either as 2-3 passes through the same kernel or using 2-3 kernels at near full rate (the data rate gets limited by the block RAM access for the larger FFTs). Right now we don't have the larger FFTs encapsulated as a core, but we have done the 4K FFTs for a couple of customers. Give me a call if you want more info. VirtexII claims to do a 1K FFT in 320ns, but I believe that design uses most of the largest device. I suspect I could beat that core with mine by putting several of mine in parallel (both in terms of speed and area).
>
> Rick Collins wrote:
> > I am looking at performing real data, fixed point FFTs in an FPGA and I would like to get some info on the processing time and logic size required. The input data is 14 bit, 2048 points. A standard optimization for processing real data is to fold the data into the complex input array, so that you only process a 1024 point FFT and then unfold the real data in an extra step. We have a DSP available which can do the final unfolding step.
> >
> > I checked the Altera web site and found info on their megacore function. For a 1K FFT, they use about 3000 LE's and 10 block rams (EABs). They claim the max speed is 90 MHz for 57 uS per block. This is only 3x what I can get from the DSP chip!
> >
> > Is the Altera megacore not highly optimized for speed? Are there other cores available that can process the data at a higher clock rate? The data is clocked in at 100 MHz burst rate, if it is fully pipelined and can start another butterfly each 4 clock cycles it should be able to process the data in 20 uS. Perhaps that is too much to expect since there are log2(N)/2 passes. I would like to process the block in 20 uS. At that point the processing time becomes insignificant in the overall process. Is that too much to expect from a hardware solution without using a thousand dollar chip?

Ray,

I may well be calling you in the next couple of days, but I just don't think I can use a Xilinx part for this unless I find a "special" spot on the board. I am in the process of designing a "standard" board product and am trying to use it in a "custom" application. In the standard mode, I want to use 5 FPGAs on the board since four of them are used as IO controllers for field replaceable daughter boards. The board is generating its own 3.3 and x.x volt power from a 5 volt input. So we can't use the XC2S or XCV parts because of the startup current problem.

I would consider the XC2V parts since they do seem like a significant advance in capability. But the price is too high to use them in the "standard" version of the board. So the only way I could use a Xilinx part is to put it on a daughterboard as a "special" IO feature. I will consider this, but I prefer to use the FPGAs I already have on the main board, possibly bumping the size of the part.

So have you done much with the Altera parts?

-- 
Rick "rickman" Collins
rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX
Article: 29931
Hi,

Any leads on VHDL code to implement the following system:

Inputs:
  Inputseq: A sequence of standard logic vector elements, each of the same width, representing 2's complement inputs typically 16 bits wide.
  clock: sysclk
  Each input sequence element is clocked in on the rising edge of sysclk.

Outputs:
  outputA: same size and type as input
  outputB: same size and type as input
  outclock: sysclk/2

Assuming the input sequence elements are represented as:
  x0,x1,x2,x3,x4,x5,x6,x7...

The desired output elements should be:
  outputA: x0,-x2,x4,-x6,x8,...
  outputB: x1,-x3,x5,-x7,x9,...

The outputs are at half the input rate and should appear at the same time (i.e. x0 and x1, -x2 and -x3, etc.). The output clock (outclock) will be used to clock the next processing stage, whose inputs will be outputA and outputB.

David
Article: 29932
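For David's request above, a small behavioral reference model can pin down the expected output sequences before any VHDL is written, for example to generate test vectors. This is only a sketch in Python, not the requested VHDL. The function name is made up, and saturating the negation of the most negative value (which has no positive counterpart in 2's complement) is an assumption, since the post does not say how that case should be handled.

```python
def split_and_alternate(samples, width=16):
    """Behavioral model: de-interleave into two half-rate streams and
    negate every second output pair.

    outputA: x0, -x2, x4, -x6, ...
    outputB: x1, -x3, x5, -x7, ...

    Assumption: negating the most negative value saturates to the most
    positive value instead of wrapping.
    """
    hi = (1 << (width - 1)) - 1              # e.g. +32767 for 16 bits

    def negate(value):
        return min(hi, -value)               # saturate -(-2**(width-1))

    out_a, out_b = [], []
    for pair in range(len(samples) // 2):    # one output pair per 2 inputs
        a, b = samples[2 * pair], samples[2 * pair + 1]
        if pair % 2:                         # odd-numbered pairs are negated
            a, b = negate(a), negate(b)
        out_a.append(a)
        out_b.append(b)
    return out_a, out_b


out_a, out_b = split_and_alternate(list(range(8)))
print(out_a, out_b)
```

Each loop iteration corresponds to one outclock period (two sysclk periods), so both outputs change together as the post requires.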
In the earlier Xilinx chips (3000, 4000, 5200) there are always 2 TBUFs per CLB of 2 LUT+FF, and both TBUF lines can be read.

In Virtex and Spartan-II it is down to 2 TBUFs for the 4 LUT+FF of a CLB, so only one slice can be routed to them, and there is only 1 line for reading back from the TBUF lines.

In Virtex-II there are only 2 TBUFs for the 8 LUT+FF per CLB, which makes even connecting the 4-bit-wide output of the 2 slices of a single carry chain to a bus impossible. The data sheet does not give the number of readbacks.

From this I get the impression that Xilinx regards TBUF buses as going out of fashion. After all, the TBUFs' cost in chip space is next to nothing relative to the many PIPs (about 900 per CLB in Virtex).

In the Jan Gray RISC processors TBUFs are used to implement processor internal data buses in no space. I have the same type of situation, with many data producing elements to be selected from. TBUFs seem to be ideal _horizontal_ wide AND-ORs (vertical is being used for the bits, because of the carry chains).

So I have a question: What is the Xilinx-suggested replacement for TBUFs? Is one supposed to use MUXes implemented in the CLBs? Is there another trick I have not yet stumbled over?

Note that I need to use Spartan-II, as Spartan is too small and JBits only runs on Virtex and Spartan-II anyway.

-- 
Neil Franklin, neil@franklin.ch.remove http://neil.franklin.ch/
Hacker, Unix Guru, El Eng HTL/FH/BSc, Sysadmin, Roleplayer, LARPer
Article: 29933
Rick Collins wrote:
> Thanks for the suggestion Peter, but I have to use parts that I can get. I don't remember exactly what has been said about the XC2V introduction schedule, but I don't see any sign that XC2V parts are remotely available. I also seem to remember that there are no low cost members of this family.

As you know, I am neither in Marketing nor in Sales, but:

We are just finishing a project that puts hundreds of evaluation boards in the hands of our FAEs, and each board has an XC2V40 on it (BG256). So these parts are available, and I have heard prices of $40 going down to $10. XC2V1000s are also becoming available, and also come in the same package (pin-out-compatible).

You also don't have to worry about start-up current in XC2V devices; that problem has been thoroughly licked, and the start-up current is something like 40 mA.

You should consider the XC2V1000 available, and you may want to play with the XC2V40 to get a feel for the new features (clock management, large BlockRAMs and multipliers, digitally-controlled output impedance = built-in series termination). Every feature of the larger parts, even the 16 global clocks with glitch-free input muxes, is available in the tiny XC2V40.

Stop me, I just came back from a seminar tour...

Peter Alfke, Xilinx Applications
Article: 29934
>I would love to put all of these designs into a single part to save board space, chip cost, power and save on procurement and assembly cost. But to make my system work, I will need supported, partial reconfiguration. I would need to load a portion of the chip with a main (static) function, and four modules to match the IO connected to the board.

Here is a probably crazy suggestion...

Do you have a CPU handy that can help with the initialization? In particular, can it do serious computation to set up the right bit pattern to feed to the FPGA?

How many different configurations do you have in your 4 IO modules?

First suggestion is to stuff everything into one chip and then set up some script to make the bit pattern for all the interesting combinations of IO modules. Then just load the right one. I'm assuming you can set up some scripts to make the required configurations so you don't have to do it by hand.

You can probably save disk/ROM space (if that's interesting) by diffing various configurations and reconstructing the ones you need on the fly.

Here comes some serious handwaving...

Suppose a 1 in the configuration file means that a pass transistor is turned on. Then you can merge two designs by ORing the bits together. So you might be able to make a basic design and allocate space in the big chip for the IO modules. As long as each IO module didn't use any resources outside the allocated space, it couldn't conflict with other IO modules. That may not work for long lines.

The idea is to make a basic module with the don't-discard-unused-parts option, and save that. Then make another module with each IO module in each of the locations, diff against the basic module and save the difference. You probably have to inspect the result by hand to verify that nothing is outside the space you allocated.

If that all works, you can make a custom module by just ORing the appropriate IO module/slot combinations into the basic module.

-- 
These are my opinions, not necessarily my employer's. I hate spam.
Article: 29935
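The diff-and-OR idea in the post above can be sketched on plain byte strings. This is an illustration of the poster's self-described handwaving only: it assumes a 1 bit always means "resource enabled", treats the configuration as a flat bit array, and ignores the headers and checksums that real configuration files typically carry. The function names are hypothetical and no vendor tool or file format is modeled.

```python
def module_delta(base, with_module):
    """Bits that one IO module adds on top of the basic design.

    Assumption: a 1 bit means a resource is switched on, so a module may
    only add 1 bits. A base bit cleared by the module build breaks that
    assumption and is reported as an error.
    """
    if len(base) != len(with_module):
        raise ValueError("configurations differ in length")
    delta = bytearray(len(base))
    for i, (b, m) in enumerate(zip(base, with_module)):
        if b & ~m & 0xFF:
            raise ValueError(f"module build cleared base bits at byte {i}")
        delta[i] = m & ~b & 0xFF
    return bytes(delta)


def merge(base, deltas):
    """OR the selected module deltas into the basic design.

    Raises if two deltas claim the same bit, which would mean the modules
    were not confined to separate regions.
    """
    merged = bytearray(base)
    claimed = bytearray(len(base))
    for delta in deltas:
        for i, byte in enumerate(delta):
            if claimed[i] & byte:
                raise ValueError(f"modules overlap at byte {i}")
            claimed[i] |= byte
            merged[i] |= byte
    return bytes(merged)


base = bytes([0b0001, 0b0000])
delta_a = module_delta(base, bytes([0b0011, 0b0000]))
delta_b = module_delta(base, bytes([0b0001, 0b1000]))
print(merge(base, [delta_a, delta_b]).hex())
```

The overlap check automates part of the "inspect the result by hand" step, but it cannot tell whether a module's bits fall inside the region allocated to it; that would need knowledge of the device's configuration layout.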
>I'd like to use a CPLD/FPGA (Xilinx) to receive data from the parallel port (EPP mode) of my PC.
>
>Is it good style to react directly to the edges of the port signals (e.g. address/data strobes), or would it be better to use a fast PLD clock to sample the port and then to evaluate the signals in clocked logic?

I think you didn't provide a critical chunk of information. What are your goals/priorities? What is the relative importance of bandwidth, correctness, design time? ...

If you run all the async signals through the standard pair of FFs then you (probably) won't have any problems from metastability. That will cost you 1.5 cycles (average) of round trip time, which turns into reduced bandwidth.

If your top goal is max throughput, then you are almost forced to use some kludgy logic driven off the strobes. Fortunately, that is (probably) small enough that you can get it right.

-- 
These are my opinions, not necessarily my employer's. I hate spam.
Article: 29936
> [ASCII schematic, garbled in the archive: Strobe feeds FF1, FF1 feeds FF2, both clocked by CLK; CE = FF1 AND (NOT FF2)]
>
>I'll sample the strobe and data signals with the same clock. The strobe signal is shifted through two FF's in series.
>These two FF's generate the CE (FF1 AND NOT FF2, rising edge) signal for the outgoing data.

That's the classic way to get metastability troubles. It will work fine if your clock rate is slow enough. But CE goes to the whole data register so it is likely to have longer routing. That would set off my alarm bells.

-- 
These are my opinions, not necessarily my employer's. I hate spam.
Article: 29937
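One common remedy for the concern raised in the reply above is to add a third flip-flop, so that the edge detector only looks at signals that have already had a full clock period to settle. That remedy is an assumption brought in here, not something the poster spelled out. The cycle-level Python model below shows only the structure and the resulting latency; it cannot model metastability itself, and the class and signal names are made up.

```python
class StrobeEdgeDetector:
    """Cycle-level model of a 3-FF strobe synchronizer with edge detect.

    ff1 samples the asynchronous strobe and is the only stage that can go
    metastable. CE is derived from ff2 and ff3 only, so nothing downstream
    uses ff1 directly. (Assumed remedy; the quoted circuit used
    FF1 AND NOT FF2.)
    """

    def __init__(self):
        self.ff1 = self.ff2 = self.ff3 = 0

    def clock(self, strobe):
        """Apply one rising clock edge and return CE after that edge."""
        # All three flip-flops update together from their previous inputs.
        self.ff3, self.ff2, self.ff1 = self.ff2, self.ff1, strobe & 1
        return self.ff2 & (self.ff3 ^ 1)


detector = StrobeEdgeDetector()
ce = [detector.clock(s) for s in [0, 0, 1, 1, 1, 0, 0]]
print(ce)
```

CE pulses for exactly one clock, one cycle later than it would in the two-flip-flop version, so the sampled data has to be delayed or held by the same amount. That extra latency is the bandwidth cost mentioned earlier in the thread.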
Peter Alfke wrote:
> Rick Collins wrote:
> > Thanks for the suggestion Peter, but I have to use parts that I can get. I don't remember exactly what has been said about the XC2V introduction schedule, but I don't see any sign that XC2V parts are remotely available. I also seem to remember that there are no low cost members of this family.
>
> As you know, I am neither in Marketing nor in Sales, but:
> We are just finishing a project that puts hundreds of evaluation boards in the hands of our FAEs, and each board has an XC2V40 on it (BG256). So these parts are available, and I have heard prices of $40 going down to $10. XC2V1000s are also becoming available, and also comes in the same package (pin-out-compatible).
> You also don't have to worry about start-up current in XC2V devices, that problem has been thoroughly licked, the start-up current is something like 40 mA.
> You should consider the XC2V1000 available, and you may want to play with the XC2V40 to get a feel for the new features ( Clock management, large BlockRAMs and multipliers, digitally-controlled output impedance = built-in series termination ) Every feature of the larger parts, even the 16 global clocks with glitch-free input muxes, is available in the tiny XC2V40. Stop me, I just came back from a seminar tour...
>
> Peter Alfke, Xilinx Applications

Sometimes I get really bummed out when I just can't find a way to make something work that would be so perfect. The availability is likely not a real problem except that I am a bit too cautious to commit to a part and then not be able to get what I need for production. In this case production is at least 6 months away. So if the XC2V40 and XC2V1000 parts were available now and I had some reason to believe that I could get them at reasonable prices by the point of production (like a quote) I would love to design them in.

But one thing I forgot was that I need to interface to a 5 volt PC/104 bus. The 5 volt IO would make the design much more complex. I would have to add another power rail for an XC2S part or add many buffer parts. Neither one is very workable.

I seem to remember that one of the V parts was 5 volt TTL compatible if you added series resistors to limit the current. But that would mean some 90+ extra resistors on the board! But it might work. Will the XC2V work this way?

BTW, how can the part be PCI compliant without being 5 volt tolerant? Is it only 3 volt PCI compliant?

-- 
Rick "rickman" Collins
rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX
Article: 29938
Hi,

thanks to all for the helpful advice, which has made things clearer.

Heinrich
Article: 29939
Eric Smith wrote:
> WebPACK does NOT support the Virtex parts.

I was sure I had read that it supported the whole Virtex family... I'll see if I can find it. I think I'll go back to my old Foundation 2.1i

-- 
Nicolas MATRINGE                      IPricot European Headquarters
Conception electronique               10-12 Avenue de Verdun
Tel +33 1 46 52 53 11                 F-92250 LA GARENNE-COLOMBES - FRANCE
Fax +33 1 46 52 53 01                 http://www.IPricot.com/
Article: 29940
Does anyone know of any texts concerning implementing digital video coding (compression, DCT etc.) on FPGAs?

Thanks,
-- 
Frode Vatvedt Fjeld
Article: 29941
Eric Smith wrote:
> WebPACK does NOT support the Virtex parts. The only FPGAs WebPACK supports are the Spartan II and a single Virtex-E part, the XCV300E.

I was only talking about the Floorplanner:

  Floorplanner Guide
  Chapter 1: Introduction
  Supported Architectures
  The Floorplanner supports all Xilinx architectures in the Spartan/-II™, Virtex/-E/-II™, and XC4000™ device families.

(quoted from the WebPACK help)

-- 
Nicolas MATRINGE                      IPricot European Headquarters
Conception electronique               10-12 Avenue de Verdun
Tel +33 1 46 52 53 11                 F-92250 LA GARENNE-COLOMBES - FRANCE
Fax +33 1 46 52 53 01                 http://www.IPricot.com/
Article: 29942
What is the difference between System Gates (Virtex Data Sheet) and Logic Gate Equivalent (MAP report)? Does the LGE include BlockRAMs? How is LGE computed?

-- 
Regards,
Pawel J. Rajda

Pawel J. Rajda, MSc. E.E.          mail: pjrajda@uci.agh.edu.pl
Dept. of Electronic Engineering    www:  http://galaxy.uci.agh.edu.pl/~pjrajda
AGH Technical University           tel:  (+48-12) 617 3980
Al. Mickiewicza 30                 fax:  (+48-12) 633 2398
30-059 Cracow, POLAND
Article: 29943
Indeed, the halving of TBUFs/LUT in Virtex, and again in V-II, make my datapaths larger/less functional per LUT, compared with XC4000. (Consider what happens to the result mux in the xr16 CPU datapath schematic S3 on p. 9 of www.fpgacpu.org/papers/xsoc-series-drafts.pdf, for example. Not to mention the "zero cost" <<2, <<4, <<8, >>2, >>4, >>8 shifters and bus byte/word/longword resizers you can build with spare columns of TBUFs.)

Reoptimizing for Virtex has been a chore. (Another setback in Virtex vs. 4000 was the loss of independent clock inversion on LUT RAM WCLKs and LUT FF CLKs, but that's another story.)

But for V-II there seems to be no practical alternative but to a) use (waste) LUTs and their interconnect to build these horizontal muxes, and/or b) recode your design to help your technology mappers merge some of the muxes into other logic.

Regarding (b), using the Virtex-style carry logic (including MULT_AND), it seems possible to build these "free mux" structures:
  1) o[i] = addsub ? (a[i] + b[i]) : (a[i] - b[i])
  2) o[i] = add ? (a[i] + b[i]) : c[i]
  3) o[i] = addb ? (a[i] + b[i]) : (a[i] + c[i])
  4) o[i] = addsub ? (addand ? a[i]+b[i] : a[i]-b[i]) : (addand ? a[i]&b[i] : a[i]^b[i])

See http://www.fpgacpu.org/log/nov00.html#001112 for details.

Synthesis tools get (1) (usually) but (as far as I know) miss the others.

Consider case (2). An add followed by a mux would seem to be a pretty common circuit structure, and therefore important to optimize. Surely using the single-LUT-per-bit construction is a no-brainer, right? Not so fast! There are some tools issues.

If you inefficiently implement this as two LUTs:
  t = a[i] + b[i];
  o[i] = add ? t : c[i]
then trce will "see" that the latency from c[i] to o[i] is Tilo. Good.

But if you implement it in one LUT as
  o[i] = add ? (a[i] + b[i]) : c[i]
e.g.
  o[i] = add&(a[i]^b[i]) + ~add&c[i]
along with the appropriate configuration of MULT_AND, MUXCY, and XORCY, then (if I recall correctly) trce will also find false ripple-carry paths from c[i], e.g. from c[0] to o[31], which would therefore interfere with correct static timing analysis and with timing driven placement and routing. Oops!

Therefore, I would like to see two tools enhancements to enable correct inference of add/mux in one LUT per bit:

a) Xilinx should enhance trce to do a more precise analysis around ripple-carry structures, e.g. to rule out the false path from c[i] through the carry chain to o[i+1]...o[n]. Here with 'add' feeding MULT_AND, there is no carry-out if 'add' is false, and also, c[i] does not influence the carry-out when 'add' is true, and thus the MUXCY carry-out does not depend upon c[i].

b) Xilinx should lobby its synthesis partners to infer add/mux structures like (2)-(4) when possible. Or encourage a user-directive to force it.

If both (a) and (b) were done, then Xilinx customers (synthesis users and RPM builders alike) would probably enjoy somewhat smaller and faster results in the devices they're already using.

This add/mux inference digression aside, abundant TBUFs were useful and will be missed. But I suppose that any FPGA feature that HDL synthesis users and tools do not take good advantage of, is not long for this world.

Jan Gray, Gray Research LLC
Article: 29944
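Case (2) from Jan Gray's post above, the add/mux in one LUT per bit, can be checked with a bit-level model of the carry chain. The sketch below is one plausible reading of the "appropriate configuration": the LUT computes add&(a^b) | ~add&c, MULT_AND computes add AND a[i] and feeds the MUXCY data input, and XORCY forms the output bit. That MULT_AND wiring is an assumption, as the post does not spell it out, and the function name is made up.

```python
def add_mux_chain(add, a, b, c, width):
    """Bit-serial model of o = add ? (a + b) : c, one LUT per bit.

    Per bit (assumed configuration, not taken from a data sheet):
      LUT      = add ? (a[i] ^ b[i]) : c[i]    drives the MUXCY select
      MULT_AND = add & a[i]                    drives the MUXCY data input
      XORCY    = LUT ^ carry_in                gives the output bit o[i]
      MUXCY    = LUT ? carry_in : MULT_AND     gives carry_out
    """
    add &= 1
    carry = 0                                  # carry into bit 0
    result = 0
    for i in range(width):
        a_i, b_i, c_i = (a >> i) & 1, (b >> i) & 1, (c >> i) & 1
        lut = (a_i ^ b_i) if add else c_i
        mult_and = add & a_i
        result |= (lut ^ carry) << i           # XORCY
        carry = carry if lut else mult_and     # MUXCY
    return result


# Exhaustive check for a 4-bit datapath.
WIDTH = 4
for a in range(1 << WIDTH):
    for b in range(1 << WIDTH):
        for c in range(1 << WIDTH):
            assert add_mux_chain(1, a, b, c, WIDTH) == (a + b) % (1 << WIDTH)
            assert add_mux_chain(0, a, b, c, WIDTH) == c
print("add/mux chain matches for all 4-bit inputs")
```

When add is 0, MULT_AND is 0 and the carry into bit 0 is 0, so every carry stays 0 whatever c is. That matches the post's point that the structural path from c[i] through the MUXCY select and up the carry chain is a false path, even though a purely structural timing analysis would still report it.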
Hello everybody!

I need to know how to place and route the design given below. The code shown below has to work at 100 MHz, but when I synthesized it with Xilinx Foundation Series 2.1 it showed me 32 MHz. When I synthesized the same code again it showed me a 40 MHz working frequency. Why this difference? Is it possible to make it run at 100 MHz? Is it possible to place and route the design in a Virtex device such that its working frequency is 100 MHz?

The code is:

    LIBRARY IEEE;
    USE IEEE.STD_LOGIC_1164.ALL;
    USE IEEE.STD_LOGIC_UNSIGNED.ALL;

    ENTITY ADDRESS_GENERATOR is
      port (
        sfp          : in  std_logic;
        clk          : in  std_logic;
        reset        : in  std_logic;
        READ_address : out std_logic_vector(10 downto 0)
      );
    END ADDRESS_GENERATOR;

    ARCHITECTURE BEHAV OF ADDRESS_GENERATOR IS
      signal read_address_s : std_logic_vector(10 downto 0);
    begin
      process (clk, reset)
      begin
        if reset = '1' then
          -- asynchronous reset to address 0
          read_address_s <= (others => '0');
        elsif clk = '1' and clk'event then
          if sfp = '1' then
            -- sfp reloads the counter with address 24
            read_address_s <= "00000011000";
          elsif read_address_s = "10000110111" then
            -- wrap to 0 after address 1079
            read_address_s <= (others => '0');
          else
            read_address_s <= read_address_s + 1;
          end if;
        end if;
      end process;

      read_address <= read_address_s;
    end behav;

    configuration cfg_address_generator_behav of address_generator is
      for behav
      end for;
    end cfg_address_generator_behav;

Is there any material or web site on how to place and route a design into a Virtex device such that its working frequency is very high?

Thanks in advance.

Regards,
Manjunathan
Article: 29945
Hello all,

I have two questions regarding Xilinx Spartan-II I/O.

1. Abuse of VRef as differential input.

I need one high quality, low jitter input clock (<50ps RMS jitter). I found a couple of clock synthesizers with PECL outputs that have an RMS jitter down to 2.6ps. Now I am wondering how to interface PECL to a Spartan-II. Of course I could buy a PECL to CMOS converter. I could also use Virtex-II or Virtex-E, but engineering is the art of building what you need with what you have, therefore I would like to know:
- Could I set the VRef to 2.8V and use one of the PECL signals as a single ended clock input? (2.3V to 3.3V signal)
- Could I connect VRef of one bank to the inverted CLK signal and GCLK to the positive CLK signal and get a differential input as a result? (I have a lot of unused I/O and can spare a bank.)

2. Unused VCCO

I am using a PQ208 package where the VCCO of all banks are internally tied together. However, I am only using the outputs of two of the I/O banks. Is it sufficient to externally connect VCCO of these two banks and leave the unused banks externally unconnected to simplify the layout?

Thanks in advance,
Kolja
Article: 29946
Also, there are USB parts around that are not much more expensive than a configuration PROM. I think there are a lot of USB applications where an FPGA that receives its bitstream from the USB driver is a good way to go.

Kolja

> Believe me, I prefer to do FPGA designs. But using this type of part makes a whole lot more sense than trying to implement USB in an FPGA. I have the product concept nailed down, and I think the TUSB3200 is the way to go.
>
> -a
Article: 29947
Jan Gray wrote: > > Indeed, the halving of TBUFs/LUT in Virtex, and again in V-II, make my > datapaths larger/less functional per LUT, compared with XC4000. (Consider > what happens to the result mux in the xr16 CPU datapath schematic S3 on p. 9 > of www.fpgacpu.org/papers/xsoc-series-drafts.pdf, for example. Not to > mention the "zero cost" <<2, <<4, <<8, >>2, >>4, >>8 shifters and bus > byte/word/longword resizers you can build with spare columns of TBUFs.) > > Reoptimizing for Virtex has been a chore. (Another setback in Virtex vs. > 4000 was the loss of independent clock inversion on LUT RAM WCLKs and LUT FF > CLKs, but that's another story.) Another drawback to the VIrtex is that you no longer get the carry chain for free for functions where you are only interested in the carry out. As a result, something like a saturating limiter that was able to be implemented in one column of CLBs in 4K, now takes two columns of slices with the LUTs in the first used as pass-throughs to the carry chain :-( > > But for V-II there seems to be no practical alternative but to a) use > (waste) LUTs and their interconnect to build these horizontal muxes, and/or > b) recode your design to help your technology mappers merge some of the > muxes into other logic. I haven't looked at it closely, but it seems to me that you might be able to use the horizontal OR chains for this. Have you investigated it? > > Regarding (b), using the Virtex-style carry logic (including MULT_AND), it > seems possible to build these "free mux" structures: > 1) o[i] = addsub ? (a[i] + b[i]) : (a[i] - b[i]) > 2) o[i] = add ? (a[i] + b[i]) : c[i] > 3) o[i] = addb ? (a[i] + b[i]) : (a[i] + c[i]) > 4) o[i] = addsub ? (addand ? a[i]+b[i] : a[i]-b[i]) : (addand ? a[i]&b[i] > : a[i]^b[i]) > > See http://www.fpgacpu.org/log/nov00.html#001112 for details. > > Synthesis tools get (1) (usually) but (as far as I know) miss the others. > > Consider case (2). 
An add followed by a mux would seem to be a pretty common > circuit structure, and therefore important to optimize. Surely using the > single-LUT-per-bit construction is a no-brainer, right? Not so fast! There > are some tools issues. Jan, you are correct. The tools do not properly infer this (nor certain adds/counters with resets, if they are anything but a dirt simple adder/increment). This, and the ability to direct placement, are some of the reasons I often use instantiated circuits within a generate instead of the more readable inferred logic. > > If you inefficiently implement this as two LUTs: > t = a[i] + b[i]; > o[i] = add ? t : c[i] > then trce will "see" that the latency from c[i] to o[i] is Tilo. Good. > > But if you implement it in one LUT as > o[i] = add ? (a[i] + b[i]) : c[i] > e.g. > o[i] = add&(a[i]^b[i]) + ~add&c[i] > along with the appropriate configuration of MULT_AND, MUXCY, and XORCY, then > (if I recall correctly) trce will also find false ripple-carry paths from > c[i], e.g. from c[0] to o[31], which would therefore interfere with correct > static timing analysis and with timing driven placement and routing. Oops! Yep. TRCE doesn't do anything in the way of analyzing the logic in the circuit. It just adds delays between FFs. If you are careful with the constraints, you can block the false path, but it usually doesn't warrant the effort or the added potential for accidentally ignoring a valid path. > > Therefore, I would like to see two tools enhancements to enable correct > inference of add/mux in one LUT per bit: > > a) Xilinx should enhance trce to do a more precise analysis around > ripple-carry structures, e.g. to rule out the false path from c[i] through > the carry chain to o[i+1]...o[n]. Here with 'add' feeding MULT_AND, there > is no carry-out if 'add' is false, and also, c[i] does not influence the > carry-out when 'add' is true, and thus the MUXCY carry-out does not depend > upon c[i]. 
> > b) Xilinx should lobby its synthesis partners to infer add/mux structures > like (2)-(4) when possible. Or encourage a user-directive to force it. > > If both (a) and (b) were done, then Xilinx customers (synthesis users and > RPM builders alike) would probably enjoy somewhat smaller and faster results > in the devices they're already using. > > This add/mux inference digression aside, abundant TBUFs were useful and will > be missed. But I suppose that any FPGA feature that HDL synthesis users and > tools do not take good advantage of, is not long for this world. > > Jan Gray, Gray Research LLC -- -Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email ray@andraka.com http://www.andraka.comArticle: 29948
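Editor's sketch of case (2) from the post above, o[i] = add ? (a[i] + b[i]) : c[i], as a bit-level Python model of one LUT per bit plus the carry logic. This is a minimal illustration, not Jan's actual netlist: the function name `add_mux_one_lut`, the `width` parameter, and the choice of a[i] as the second MULT_AND input are assumptions made for the example, and the carry-in to bit 0 is assumed to be tied low.

```python
def add_mux_one_lut(a, b, c, add, width=16):
    """Model o = (a + b) mod 2**width if add else c, one LUT per bit.

    Hypothetical per-bit mapping (an assumption for illustration):
      LUT      : add ? (a[i] ^ b[i]) : c[i]
      MULT_AND : add & a[i]           (drives the MUXCY "0" data input)
      MUXCY    : carry_out = carry_in if LUT else MULT_AND
      XORCY    : o[i] = LUT ^ carry_in
    Returns the result word and the list of per-bit carry-outs.
    """
    carry = 0                      # carry-in to bit 0 assumed tied low
    out = 0
    carries = []
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        ci = (c >> i) & 1
        lut = (ai ^ bi) if add else ci   # single 4-input LUT
        di = add & ai                    # MULT_AND output
        out |= (lut ^ carry) << i        # XORCY
        carry = carry if lut else di     # MUXCY
        carries.append(carry)
    return out, carries


if __name__ == "__main__":
    # add = 1 selects the sum, add = 0 passes c through unchanged.
    print(add_mux_one_lut(5, 3, 9, 1, width=4))
    print(add_mux_one_lut(5, 3, 9, 0, width=4))
```

In this model the carry chain stays at zero whenever add is 0, and with add at 1 the carries do not depend on c at all. That is the functional reason the c[i]-to-o[n] ripple path reported by a purely structural timing tool is false, as argued in point (a).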
> In the earlier Xilinx chips (3000, 4000, 5200) there are always 2 > TBUFs per CLB of 2 LUT+FF. And both TBUF lines can be read. > > In Virtex and Spartan-II it is down to 2 TBUFs for the 4 LUT+FF of a > CLB, so only one slice can be routed to them, and even only 1 line for > reading back from TBUF lines. > > In Virtex-II there are only 2 TBUFs for 8 LUT+FF per CLB. Which > makes even connecting the output of the 4-wide 2 slices of a single > carry chain to a bus impossible. The data sheet does not give the > amount of readbacks. > > From this I get the impression that Xilinx regards TBUF buses as going > out of fashion. After all, the TBUFs' cost in chip space is next to nothing > relative to the many PIPs (about 900 per CLB in Virtex). I believe a lot of this has to do with HDLs. Almost all the people I know using HDLs for Xilinx design don't even know what a TBUF is, or how they would use it. I also think the tools, tutorials, classes etc. poorly support using them.Article: 29949
Chris Dunlap wrote: > You can always look in FPGA editor. Nothing can be left out there. If it's > routed or routable, it's there. Sure it can be. Or can you use the mysterious undocumented IRDY/TRDY pins special features of Spartan-II in FPGA editor? Using a dominance in the FPGA market to get an advantage in the PCI-core market looks a lot like the Microsoft Internet Explorer case to me. CU, Kolja