Just testing!
Article: 17951
I read that Xilinx and others recommend not using the global GSR in Virtex
devices (apparently because it is too slow, but who cares?) and instead
using normal routing resources.

However, from first impressions of the Virtex slice structure, it seems that
the slice SR pin that is used to asynchronously set/reset the flip-flops is
also used for the WE signal for RAMs.

So my understanding is that if you don't use the global GSR, you can't fit
a write-enabled RAM in the same slice as an async set/reset flip-flop.

Is this true?

--
Edward Moore
Article: 17952
SHORT COURSES DSP/MULTIMEDIA/COMMUNICATIONS.
http://www.cysip.com
Article: 17953
The carry architecture in Virtex is not as powerful as that in the 4K, but there
are some workarounds which really aren't rocket science. Loadable arithmetic
can be done in front of the carry logic if you force the whole carry chain to a
known value, so that you know what the carry input of the XORCY is.

For example, a loadable up-counter can be done in one slice per 2 bits if the
load signal is active low. The load signal is fed to each LUT to control the
mux in the LUT *and* to the carry-in of the first MUXCY and XORCY in the carry
chain. The 0 side of each MUXCY is tied to zero so that if LD is low, the carry
chain is forced low regardless of the output of the LUT. That way, the value
out of the LUT propagates through the XORCY unchanged. A similar trick can
be played if the 0 side of the MUXCY has to be a 1 for your logic (a down-counter,
for example), in which case you force the carry chain to all '1's and invert the
load data in the LUT. In cases where the MUXCY D0 is used, such as an adder or
a subtractor, you can use the MULT_AND to force the MUXCY D0 inputs to zero.

So yes, it can be done in one level most of the time; yes, the new carry chain
is not that great; and yes, the libraries seem to have been done by an amateur.

Edward Moore wrote:

> I have designs which use a lot of loadable arithmetic, of the type:
>
>     if load = '1' then
>         sum <= a;
>     else
>         sum <= a + b;
>     end if;
>
> I've just run one of these designs, which was implemented in a 4000-xla-09
> device (slowest available), through synthesis and P&R targeted to a
> Virtex -4 device (slowest available), and was surprised to find that it
> consumed 20 % more resources and only ran about 10 % faster.
>
> I've tracked most of this poor performance down to the loadable arithmetic,
> which in the XLA devices consumed roughly 1 LUT per
> bit, but in Virtex consumes 2. This is because the arithmetic is being split
> into a carry chain column and a separate column for the loadable registers.
>
> Looking at the structure of a Virtex slice (I'm very new to Virtex), I
> can't see any way to implement loadable arithmetic as efficiently as in
> the XLA devices. Can anyone offer any insights into this? I'm hoping that
> I've missed something, as the 20 % LUT overhead wipes out any cost benefits
> of moving to Virtex, and the slow speed puts a damper on any plans to run at
> the advertised clock rates.
>
> I can't find any examples of a better implementation in Coregen (which
> doesn't seem to support Virtex arithmetic). Should I wait for VirtexE?
>
> --
> Edward Moore

--
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email randraka@ids.net
http://users.ids.net/~randraka
Article: 17954
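For readers following Ray Andraka's carry-chain description in Article 17953, here is a minimal behavioural sketch of the loadable counter trick. It is a bit-level model for simulation, not the Xilinx primitives themselves: the entity name ld_counter, the generic DOWN, and the port names are made up for illustration, and feeding the inverted load signal into the chain for the down-counter case is an assumption about how "force the carry chain to all '1's" would be wired.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical model of the active-low-load counter: one LUT, one MUXCY
-- and one XORCY per bit. ld_n = '0' loads d, ld_n = '1' counts.
entity ld_counter is
  generic (
    N    : positive := 8;
    DOWN : boolean  := false   -- false: up-counter, true: down-counter
  );
  port (
    clk  : in  std_logic;
    ld_n : in  std_logic;
    d    : in  std_logic_vector(N-1 downto 0);
    q    : out std_logic_vector(N-1 downto 0)
  );
end entity ld_counter;

architecture model of ld_counter is
  signal q_i : std_logic_vector(N-1 downto 0) := (others => '0');
begin
  process (clk)
    variable c   : std_logic;  -- carry chain value entering bit i
    variable di  : std_logic;  -- constant on the "0 side" of every MUXCY
    variable lut : std_logic;  -- LUT output for bit i
    variable nxt : std_logic_vector(N-1 downto 0);
  begin
    if rising_edge(clk) then
      if DOWN then
        di := '1';
        c  := not ld_n;        -- assumed: chain forced to all '1's on load
      else
        di := '0';
        c  := ld_n;            -- chain forced to all '0's on load
      end if;
      for i in 0 to N-1 loop
        -- The LUT is a 2:1 mux: load data or current count bit.
        if ld_n = '0' then
          lut := d(i);
        else
          lut := q_i(i);
        end if;
        if DOWN then
          lut := not lut;      -- inverted in the LUT for the down-counter
        end if;
        nxt(i) := lut xor c;   -- XORCY
        if lut = '0' then      -- MUXCY: LUT output selects carry or constant
          c := di;
        end if;
      end loop;
      q_i <= nxt;
    end if;
  end process;
  q <= q_i;
end architecture model;
```

In load mode the chain value never changes from its forced constant, so the XORCY passes the LUT output (or its inverse, re-inverted by the all-ones chain) straight to the flip-flop; in count mode the same chain does the increment or decrement.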
Edward Moore wrote:

> I read that Xilinx and others recommend not using the Global GSR in Virtex
> devices (apparently because they are too slow - but who cares ?), but
> instead to using normal routing resources.

You should care deeply. If your reset distribution is slower than your clock,
then when you release reset, parts of the design will come out of reset a clock
cycle later than other parts. Not generally a good situation. It can be dealt
with, but you have to be careful there.

> However, from first impressions of the Virtex slice structure it seems that
> the slice SR pin that is used to asynchronously set/reset the flip-flops is
> also used for the WE signal for rams.

Yep, and the BY pin, which can be used for the REV, is used for the DI too.

> So, my understanding is that if you don't use the global GSR, you can't fit
> a write-enabled ram in the same slice as an async' set/reset flip-flop.
>
> Is this true ?.

I believe so. You also can't use the synchronous reset/set capability. You
should be favoring the synchronous S/R over the async one for most FPGA
designs. FPGAs really aren't well suited for asynchronous designs and, guess
what, the async sets and resets are... well, asynchronous (so are transparent
latches, so in most cases you shouldn't be using them either).

This shouldn't be a big deal in most designs. You really don't need to reset
every flip-flop in the design, just the ones that won't reach a known state
within some number of clock cycles of the reset of key flip-flops in the
design. Oh, yeah, and there are a few that might not need to be reset in the
hardware, but do need to be reset to make the simulation work. Those should be
reset too, to keep you sane and keep the design reviewers happy.

--
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email randraka@ids.net
http://users.ids.net/~randraka
Article: 17955
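As an illustration of the advice in Ray Andraka's reply (Article 17954) to prefer synchronous resets and to reset only the flip-flops that need it, the following is a small sketch of a two-stage pipeline. The entity pipe2, its port names, the 8-bit width and the active-high reset polarity are all assumptions chosen for the example. Only the valid flags are reset; the data registers flush to known values within two clocks of in_valid going high.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical two-stage pipeline: synchronous reset on control state only.
entity pipe2 is
  port (
    clk       : in  std_logic;
    srst      : in  std_logic;  -- synchronous reset, active high (assumed)
    in_valid  : in  std_logic;
    din       : in  std_logic_vector(7 downto 0);
    out_valid : out std_logic;
    dout      : out std_logic_vector(7 downto 0)
  );
end entity pipe2;

architecture rtl of pipe2 is
  signal v1, v2 : std_logic;
  signal d1, d2 : std_logic_vector(7 downto 0);
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Data registers are not reset: their contents are ignored
      -- until the matching valid flag is set.
      d1 <= din;
      d2 <= d1;
      -- Valid flags are the "key flip-flops" and get the reset.
      if srst = '1' then
        v1 <= '0';
        v2 <= '0';
      else
        v1 <= in_valid;
        v2 <= v1;
      end if;
    end if;
  end process;

  out_valid <= v2;
  dout      <= d2;
end architecture rtl;
```

Because srst is sampled on the clock edge like any other input, it is covered by the ordinary clock-period timing constraint, which is one way of avoiding the release-skew problem described in the reply. In simulation d1 and d2 start as 'U', but out_valid stays '0' until real data has reached the output.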
Thanks.

Generally I make sure the design doesn't care what state the flip-flops power
up in. I only use the asynchronous set/reset to get the simulator going.

BTW where does the name REV come from ?

--
Edward Moore

Ray Andraka <randraka@ids.net> wrote in message
news:37E46CF4.939F9C40@ids.net...
> yep, and the BY pin which can be used for the REV is used for the DI too.
Article: 17956
Well, I'm a bit disappointed with Virtex.

I had an inkling that the workaround is to use an active-low load. The
problem is that this requires a new library of instantiable arithmetic, as the
synthesis tools won't infer it. At least it will be possible to instantiate
the carry chains; something which hasn't been possible in the 4000 series with
Exemplar for a couple of years now. (I just tried the latest version of
Leonardo and it is *still* 'optimising' away examine-ci's.) Thankfully
Exemplar has finally learned how to build inferred arithmetic properly.

Anyway, the fact is that some designs, $ for $, will have poor performance
when targeted to Virtex.

--
Edward Moore

Ray Andraka <randraka@ids.net> wrote in message
news:37E469A1.2C61982C@ids.net...
> For example, a loadable up-counter can be done in one slice per 2 bits if the
> load signal is active low.
Article: 17957
Ray, you didn't draw this did you?

something like:

    clocked : process (rst, clk)
    begin
        if (rst = '0') then
            count <= (others => '0') ;
        elsif (clk'event and clk = '1') then
            if (ena = '1') then
                if (load = '0') then
                    count <= load_data ;
                else
                    count <= count + '1' ;
                end if ;
            end if ;
        end if;
    end process ;

    result <= count ;

gets exactly the implementation you describe.

For the loadable adder (probably a bit-slice add/pass multiplier?)

The following code seems OK:

    clocked : process (rst, clk)
    begin
        if (rst = '0') then
            add <= (others => '0') ;
        elsif (clk'event and clk = '1') then
            if (ena = '1') then
                if (load = '1') then
                    add <= a ;
                else
                    add <= a + b ;
                end if ;
            end if ;
        end if;
    end process ;

    result <= add ;

again, for 16 bit values I get 8 slices/16 FGs.

The only caveat is that you'll need a recent release of Leonardo
Spectrum to do this. 1999.1d or the latest release 'e' should do fine.
(I was using 'd' at home)

Registering the inputs a, b and the load, and not using I/O FF's gives
this an internal speed of 199 MHz post P&R in a -6 part for the 16-bit
busses. (M2.1i SP1) That should be fast enough, I think.

Cheers
Stuart

On Sun, 19 Sep 1999 00:42:09 -0400, Ray Andraka <randraka@ids.net>
wrote:

>For example, a loadable up-counter can be done in one slice per 2 bits if the
>load signal is active low.

For Email remove "NOSPAM" from the address
Article: 17958
In 1.5 (and maybe earlier versions) this was a problem. You can check the
Xilinx web page, and I believe you will find some information concerning the
9572 bsd files. Fortunately, when I ran into this, 1.5i was just coming out, so
I upgraded and the problem was corrected. I have the newer 2.xi but haven't
installed it yet; I would hope Xilinx didn't break it.

>Hello
> I have problem when I download to CPLD
>It's have error with JTAG program. "not found xc9572_ver2.bsd"
>I can't load to CPLD how can I solve this problem
>Regard
>Wannarat Suntiamorntut
>ksuwanna@kmitl.ac.th
Article: 17959
I don't think this is true. I believe the synthesizer must know about the
targeted technology.

>Additionally, for larger devices, you can compile, simulate
>your designs using
>Warp and target to a non Cypress part using a fitter from that vendor...
>True as well?
Article: 17960
Stuart Clubb wrote:

> Ray, you didn't draw this did you?

No, I just did it off the top of my head. I found out early on that schematics
didn't help me out much with Virtex because some of the primitives had no sim
models. I'm not on the machine with Synplicity on it right now, but last time I
tried, code similar to what you list below was generating two levels of logic
similar to the schematic library elements in the Xilinx library (the Xilinx
libraries are lousy). I have been doing a structural instantiation to work
around the synthesizer, so I haven't tried the higher level code on the 5.15.

> something like:
>
>     clocked : process (rst, clk)
>     begin
>         if (rst = '0') then
>             count <= (others => '0') ;
>         elsif (clk'event and clk = '1') then
>             if (ena = '1') then
>                 if (load = '0') then
>                     count <= load_data ;
>                 else
>                     count <= count + '1' ;
>                 end if ;
>             end if ;
>         end if;
>     end process ;
>
>     result <= count ;
>
> gets exactly the implementation you describe.
>
> For the loadable adder (probably a bit-slice add/pass multiplier?)
>
> The following code seems OK:
>
>     clocked : process (rst, clk)
>     begin
>         if (rst = '0') then
>             add <= (others => '0') ;
>         elsif (clk'event and clk = '1') then
>             if (ena = '1') then
>                 if (load = '1') then
>                     add <= a ;
>                 else
>                     add <= a + b ;
>                 end if ;
>             end if ;
>         end if;
>     end process ;
>
>     result <= add ;
>
> again, for 16 bit values I get 8 slices/16 FGs.
>
> The only caveat is that you'll need a recent release of Leonardo
> Spectrum to do this. 1999.1d or the latest release 'e' should do fine.
> (I was using 'd' at home)
>
> Registering the inputs a, b and the load, and not using I/O FF's gives
> this an internal speed of 199 MHz post P&R in a -6 part for the 16-bit
> busses. (M2.1i SP1) That should be fast enough, I think.
>
> Cheers
> Stuart

--
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email randraka@ids.net
http://users.ids.net/~randraka
Article: 17961
Yes . . . the make-or-buy decision is complicated by the way in which FPGA
prices are bandied about. If you check the website of a XILINX distributor,
e.g. <www.avnet.com>, you'll find XILINX prices. As you can see, the
low-quantity prices exceed the cost of products using these devices by as much
as a factor of 10. It's very difficult to understand.

However, many tasks can be accomplished by a very fast processor, e.g. the
SCENIX SX, which comes in speeds up to 100 MHz with 1 MIPS/MHz performance.
With a processor like this, you can get a task which would take weeks in an
FPGA running in a single day, and the parts are VERY inexpensive. They are, in
most cases, competitive with small CPLDs, i.e. under $10 U.S. in unit quantity.

There are things for which these aren't suitable, but it's not inconceivable
to use two or three, or even a dozen, since they're small parts. What's more,
given one of your boards, I won't be able to copy it right away.

Dick

On Sat, 21 Aug 1999 14:43:22 -0400, SIGTEK <rick@sigtek.com> wrote:

>Pieter,
>
>Check out our APS-X240 boards at http://www.associatedpro.com they have
>PC104 connectors 2 128K by SRAMs an osc socket, 6 50 pin connectors, and
>can be ordered with any xilinx 4000 or 5000 series (XLA/XL/E/EX) 240 pin
>QFP, The 50 pin connectors are .1 inch centered connectors with 25 signals
>and 25 grounds so you could connect an AtoD demo board from Analog devices
>for example right to the board. (We have done this before). Also the X240
>board can be used in a PC ISA slot (with an optional carrier board), in a
>PC104 stack, or stand alone. A 2.4 amp wall transformer is available for
>stand alone operation.
>
>Pieter Op de Beeck wrote:
>
>> Hi,
>>
>> Currently I find myself in a position where I have to decide whether I
>> should buy or make an fpga based board. To give more details : it
>> should contain a modest fpga (XC4025) and possibly an A/D converter.
>> That's it, not even external ram or anything.
>> So, what would you suggest?
>>
>> By the way, where can I find pricelists for the Xilinx devices?
>>
>> Kind regards,
>> Pieter
>>
>> Pieter Op de Beeck
>> K.U.Leuven - ESAT - PSI/ACCA
>> Kardinaal Mercierlaan 94
>> 3001 Heverlee
>> pieter.opdebeeck@esat.kuleuven.ac.be
Article: 17962
Don't forget that you can buy microcontrollers built into an FPGA.
<www.triscend.com>. If you build your own peripherals in the FPGA, these hold
a great deal of promise.

Dick

On Fri, 27 Aug 1999 15:40:07 -0400, Joshua Lamorie <jpl@xiphos.ca> wrote:

>Daniel Figuerola Estrada wrote:
>> Has anyone worked with those two technologies and could give his opinion
>> of them?
>
>Why look at them as mutually exclusive? We develop a controller board
>(very small, PCMCIA form-factor) that has both a microcontroller AND an
>FPGA. This allows for some amazing flexibility and utility. Especially
>because of all sorts of neat devices already on the die on the uC.
>
>We run the FPGA at twice the clock speed of the uC, and operate
>networking, flash/memory interface, ADC, and many more things in there.
>We run our Fuzzy and PID control in the uC.. it's perfect.
>
>If you look at straight logic, you can also get amazing performance for
>cost, weight, and power. Any comparisons are strictly dependent on a
>specific application, but in many of our trade-off studies a logic-only
>node can beat an embedded PowerPC!!
>
>In the end though, it's apples and oranges, so why not have fruit salad?
>=)
>
>Joshua Lamorie
>Systems Designer
>Xiphos Technologies Inc.
Article: 17963
I ran this in Synplicity 5.15 and it does indeed do the right thing (one level
of logic similar to what I described previously), so there has been an
improvement in the synthesis results. I think it was 5.08 that put the mux
after the carry logic, forcing it to two levels of logic.

BTW, if I were going to draw this, it would only take me 5-10 minutes: the way
I would do it would be to make a one bit slice containing the MUXCY, the XORCY,
a 2:1 mux and the flip-flop. Put the FMAP around the 2:1 mux and put
rloc=r0c0 on the FMAP, the carry components and the FF. Draw a quick symbol
for that circuit, then instantiate it as many times as needed for the number of
bits. An RLOC on each instantiation is all that is needed to complete it.
Voila, an arbitrary width counter schematic in a few minutes. Next time I need
it, I could reuse the symbol for the counter, changing the bus widths, and
reuse the 1 bit slices - no need to do the low level logic or placement again.

Stuart Clubb wrote:

> Ray, you didn't draw this did you?

--
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email randraka@ids.net
http://users.ids.net/~randraka
Article: 17964
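The bit-slice approach Ray Andraka outlines in Article 17963 for a schematic can be mirrored in VHDL with a for-generate loop. The sketch below is a behavioural stand-in only: cnt_slice and cnt_n are hypothetical names, the slice models the 2:1 mux, MUXCY, XORCY and flip-flop with plain logic rather than library primitives, and the RLOC/FMAP placement attributes are left out because their syntax depends on the tool.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical one-bit slice: 2:1 mux (LUT), MUXCY, XORCY and flip-flop.
entity cnt_slice is
  port (
    clk, ld_n, d, ci : in  std_logic;
    q, co            : out std_logic
  );
end entity cnt_slice;

architecture model of cnt_slice is
  signal q_i : std_logic := '0';
  signal lut : std_logic;
begin
  lut <= q_i when ld_n = '1' else d;  -- 2:1 mux: count bit or load data
  co  <= ci when lut = '1' else '0';  -- MUXCY with its "0" input tied low
  process (clk)
  begin
    if rising_edge(clk) then
      q_i <= lut xor ci;              -- XORCY feeding the flip-flop
    end if;
  end process;
  q <= q_i;
end architecture model;

library ieee;
use ieee.std_logic_1164.all;

-- Arbitrary-width loadable up-counter built from N identical slices.
entity cnt_n is
  generic (N : positive := 8);
  port (
    clk, ld_n : in  std_logic;
    d         : in  std_logic_vector(N-1 downto 0);
    q         : out std_logic_vector(N-1 downto 0)
  );
end entity cnt_n;

architecture struct of cnt_n is
  signal c : std_logic_vector(N downto 0);  -- carry chain between slices
begin
  c(0) <= ld_n;  -- active-low load doubles as the chain's carry-in
  g_bits : for i in 0 to N-1 generate
    u_slice : entity work.cnt_slice
      port map (clk => clk, ld_n => ld_n, d => d(i), ci => c(i),
                q => q(i), co => c(i+1));
  end generate g_bits;
end architecture struct;
```

Changing the generic N plays the role of "changing the bus widths" on the reused symbol; the slice itself is never edited again.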
Err, this is a bit embarrassing! Apologies to everyone for wasting their
time. It seems you can do loadable arithmetic in a slice per bit,
using an active low or active high load. The code segment I was using was
actually

    IF ld = '1' THEN
        sum <= a;
    ELSE
        sum <= a + b + carry;
    END IF;

The carry makes all the difference, and makes it use 2 slices per bit.

--
Edward Moore
Article: 17965
Even with the carry you can make it work in one level if you gate the carry in.

Edward Moore wrote:

> Err. this is a bit embarrasing!. Apologies to everyone for wasting their
> time. It seems you can do loadable artithmetic in slice per bit,
> using an active low or active high load. The code segment i was using was
> actually
>
>     IF ld = '1' THEN
>         sum <= a;
>     ELSE
>         sum <= a + b + carry;
>     END IF;
>
> The carry makes all the difference, and makes it use 2 slices per bit.
>
> --
> Edward Moore

--
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email randraka@ids.net
http://users.ids.net/~randraka
Article: 17966
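One possible reading of the "gate the carry in" suggestion from Article 17965 is sketched below as a bit-level model. The entity ld_add_cin and its port names are invented for the example, and the way the MUXCY "0 side" is forced low on load (standing in for the MULT_AND mentioned earlier in the thread) is an assumption about the mapping, not a statement of what any synthesis tool produces.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical model of "sum <= a when ld else a + b + carry" with the
-- carry-in gated by the load signal so one LUT per bit is enough.
entity ld_add_cin is
  generic (N : positive := 16);
  port (
    clk   : in  std_logic;
    ld    : in  std_logic;  -- '1' loads a, '0' adds
    carry : in  std_logic;
    a, b  : in  std_logic_vector(N-1 downto 0);
    sum   : out std_logic_vector(N-1 downto 0)
  );
end entity ld_add_cin;

architecture model of ld_add_cin is
begin
  process (clk)
    variable c   : std_logic;  -- carry chain value entering bit i
    variable lut : std_logic;  -- LUT output for bit i
    variable di  : std_logic;  -- MUXCY "0 side" input for bit i
  begin
    if rising_edge(clk) then
      c := carry and not ld;           -- gated carry-in: zero while loading
      for i in 0 to N-1 loop
        if ld = '1' then
          lut := a(i);                 -- pass a through
        else
          lut := a(i) xor b(i);        -- half-sum
        end if;
        di := a(i) and not ld;         -- forced low on load (assumed mapping)
        sum(i) <= lut xor c;           -- XORCY
        if lut = '0' then              -- MUXCY
          c := di;
        end if;
      end loop;
    end if;
  end process;
end architecture model;
```

With ld high the chain is zero at every bit, so the XORCY passes a unchanged; with ld low the same chain computes a + b + carry in the usual way.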
Hi All,

I want to place some VCC and GND signals on a 4000 FPGA. I thought about using
FMAPs with a zero truth table (e.g. a0 * !a0), but this is trimmed away by
Foundation. Disabling the trimming is not suitable for me since I need it in
other parts of the circuit.

Any other ideas?

Cheers.
Article: 17967
Vasant Ram wrote: > > Mark Summerfield <m.summerfield@ieee.org> wrote: > > by Wakerly now covers VHDL and includes the Xilinx student edition > > software. I haven't seen a copy of this book myself yet, but the > > previous edition is a very good book, so I'd expect that the new edition > > is worth checking out. > > Wow you like this book? Ugh, we used it for our digital logic class > (senior level) and I thought it was *horrible* A really good general > purpose digital logic book it put out by M. Morris Mano, forgot the exact > name but it's an *excellent* book. I have that book, too -- it was the set text when I was an undergraduate (as opposed to when I was *teaching* undergraduates ;-) You're right -- Mano is a great "general purpose digital logic book", a classic in fact. It's sitting right here on my bookshelf alongside Wakerly's and Skahill's books. It covers the fundamentals of digital logic analysis, design and synthesis. In an ideal world, every student would know it back to front before they were let loose on modern, real-world tools to do any medium-to-large scale digital design. But we do not live in an ideal world. In the real world, we have four years to teach students enough for them to start their professional careers as electrical or electronic engineers, which means that we have to make sure they know something about current-generation technologies and design methodologies. Where does Mano describe programmable logic? Where does he discuss the use of software tools to perform many of the synthesis functions his book describes? Where does he explain how modern tools can be used to design complex digital systems with (effectively) thousands of states (or more)? You can spend a lot of time teaching from Mano, offering students a theoretical grounding which is undoubtedly "good for them" in some wider, more idealistic sense. Some students will eventually gain real long-term benefits from this kind of education. 
Others will simply emerge from their four-year degree hopelessly under-prepared for the realities of their first job. We have quite a bit of experience with Wakerly's book. We have found that first-year students find it difficult, even though it purports to start from the very fundamentals. I suspect that it tries to cover too much too fast, and is thus confusing to beginning students. It is a little more successful as a second-year text, although student opinion appears divided at that level. However, by the end of the third-year digital systems course (which, when I was teaching it, went beyond the scope of the second edition), many students had come to see Wakerly as an invaluable reference covering a large part of their required knowledge in the field. I received comments on student feedback forms such as "Wakerly is a legend!" I haven't seen the third edition yet, with its coverage of VHDL and more emphasis on system-level design (so I hear), but I think that Wakerly is a good author and educator, and I have high expectations. However, one should always beware of publisher's hype which suggests that any text might be the first and last word on a topic, or the only book you will ever need! MarkArticle: 17968
Is it possible to constrain this path?

              _______
             |       |
data --------| myDFF |--------------
             |       |             |
         |---|>      |             |
         |   |_______|             |
         |                         |
      INTERNAL                     |
       CLOCK                       |
                                 |\|
                                 | \
internal_signal -----------------|  >------ ram_ctrl_signal
                                 | /
                                 |/

ram_ctrl_signal is the output pin of the chip.

INST "myDFF" TNM=ram_ctrl_driver ;
TIMEGRP "ram_ctrl_signal"=PADS ("ram_ctrl_signal") ;
TIMESPEC TS01=FROM:ram_ctrl_driver:TO:"ram_ctrl_signal":80 ns ;

The PAR program reports an empty Timing Specification for this constraint. The design targets an XC40150XV. On the other hand, we have implemented some asynchronous logic (AND, OR) in place of the tristate buffers in a Spartan, and the PAR program does NOT report an empty Timing Specification in that case.

1. Is it possible to constrain a path tracing through the tristate buffer?
2. If yes, then why is there a difference between XV and Spartan?

UtkuArticle: 17969
Thank you very much! AustinArticle: 17970
Having prototyped a few designs on Xilinx's 4kXL series, I have found that 80% of the dynamic current consumption came from OBUFs driving capacitive loads. As to clock gating, as indicated by Peter, it can reduce your dynamic consumption by as much as perhaps 75%. In a Xilinx FPGA (4k or newer) I consequently use synchronous design methods and the dedicated clock-enable inputs to all internal and I/O flip-flops. The power consumption of global buffers for clock distribution etc. is rated at <=3.5 mW per MegaTransition/sec (here both signal edges count). As to the lowest-power FPGA, I would look at the 2.5 V Virtex, or perhaps the CoolRunner CPLD from Philips with extremely low standby currents (if the device gate density is adequate). kostasArticle: 17971
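[Editor's sketch] The two contributions mentioned in the post above can be put into rough numbers. This is a minimal Python sketch, not a vendor power model: the 3.5 mW per MegaTransition/sec figure is the upper bound quoted in the post, the output term uses the textbook P = C * V^2 * f relation for a capacitive load, and the 10 MHz clock, 50 pF load and 3.3 V supply are illustrative assumptions only.

```python
# Rough dynamic-power arithmetic. The function names and all example
# values are hypothetical; only the 3.5 mW figure comes from the post.

def clock_buffer_power_mw(clock_hz, mw_per_mtransition=3.5):
    """Upper-bound global clock buffer power in mW.

    Both edges count, so a clock of f Hz makes 2*f transitions per second.
    """
    mega_transitions_per_s = 2 * clock_hz / 1e6
    return mw_per_mtransition * mega_transitions_per_s


def output_power_mw(load_farads, vcc_volts, toggle_hz):
    """Power spent driving a capacitive load, P = C * V^2 * f, in mW.

    toggle_hz is the number of full charge/discharge cycles per second.
    """
    return load_farads * vcc_volts ** 2 * toggle_hz * 1e3


if __name__ == "__main__":
    # Assumed example: 10 MHz clock, one output with 50 pF at 3.3 V.
    print(clock_buffer_power_mw(10e6))
    print(output_power_mw(50e-12, 3.3, 10e6))
```

The output term scales with the number of pins toggling at that rate, which is consistent with the observation that output buffers dominate the dynamic consumption.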
Hi, I'm trying to configure a Spartan chip (XCS20XL-PQ208) and it doesn't work. I'm using slave mode, generating 640ns wide CCLK pulses and about 16usec between them. I've checked DOUT, it is OK. The problem is, that INIT goes low after the first frame which means that frame error is detected. If I turn off the CRC checking when generating the bitstream, then the chip accepts the first frame and pulls INIT low after the second frame. I've tried to download the configuration via the xchecker cable using JTAG programmer of the Foundation 1.51i. It downloads the bitstream via the boundary scan interface and everything looks OK (DONE goes high for example) but the chip doesn't operate. When I try to verify, it says "to many mismatches". Does somebody have any idea what can be wrong? DanielArticle: 17972
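[Editor's sketch] For scale, the CCLK timing described in the post above implies a fairly slow serial download. The Python sketch below only does that arithmetic; it assumes the 16 usec is the gap between pulses (so one bit period is pulse plus gap), and the bitstream length used in the example is a placeholder, not the real XCS20XL figure, which should be taken from the data sheet.

```python
# Hypothetical helper: total slave-mode download time from the CCLK timing.

def config_time_s(bitstream_bits, pulse_s=640e-9, gap_s=16e-6):
    """Total time to clock in bitstream_bits, one bit per CCLK pulse.

    Assumes each bit period is one pulse width plus one inter-pulse gap.
    """
    period_s = pulse_s + gap_s
    return bitstream_bits * period_s


if __name__ == "__main__":
    # 180_000 bits is an assumed placeholder length for illustration only.
    print(config_time_s(180_000))
```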
In article <37D45F1F.FA0FBF59@fivedots.coe.psu.ac.th>, wannarat <wannarat@fivedots.coe.psu.ac.th> wrote: > Hello > I have problem when I download to CPLD > It's have error with JTAG program. "not found xc9572_ver2.bsd" > I can't load to CPLD how can I solve this problem I had a similar problem with a Spartan device - the .bsd file for the device was missing from the CD-ROM. I downloaded it from the Xilinx web site. Leon -- Leon Heller, G1HSM Tel (Mobile): 079 9098 1221 (Work): 01327 357824 Email: leon_heller@hotmail.com Web: http://www.geocities.com/SiliconValley/Code/1835 Article: 17973
Hi all, I have just started using the Lattice PLD development system and seem to be having problems downloading to an LSI device. The software (ISP Daisy Chain Download Version 7.0) is able to detect the PLD device ID, and can erase the device. However, when I try to download a JEDEC file it complains (after a few seconds) that security is enabled, so it cannot verify the download. As far as I know, all security protection has been disabled. Any ideas? Thanks, Derren CromeArticle: 17974
Do you have the right bitstream? Was it configured for the same part you are trying to program? Drotos Daniel wrote: > Hi, > > I'm trying to configure a Spartan chip (XCS20XL-PQ208) and it doesn't > work. I'm using slave mode, generating 640ns wide CCLK pulses and > about 16usec between them. I've checked DOUT, it is OK. > > The problem is, that INIT goes low after the first frame which means > that frame error is detected. If I turn off the CRC checking when > generating the bitstream, then the chip accepts the first frame and > pulls INIT low after the second frame. > > I've tried to download the configuration via the xchecker cable > using JTAG programmer of the Foundation 1.51i. It downloads the > bitstream via the boundary scan interface and everything looks > OK (DONE goes high for example) but the chip doesn't operate. > When I try to verify, it says "to many mismatches". > > Does somebody have any idea what can be wrong? > > Daniel -- -Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email randraka@ids.net http://users.ids.net/~randraka