John the best is to design to never reset !Article: 151951
>John >the best is to design to never reset ! > You can create a design that will work with no resets at all. The problem is that the verification suite will take a few eons to finish. John --------------------------------------- Posted through http://www.FPGARelated.comArticle: 151952
I am trying to map, place & route a large design on a Xilinx Virtex 6 FPGA.

Target Device  : xc6vlx550t
Target Package : ff1759
Target Speed   : -2

My mapping process fails with the following errors:

ERROR:Pack:2310 - Too many comps of type "DSP48E1" found to fit this device.
ERROR:Pack:2860 - The number of logical carry chain blocks exceeds the capacity for the target device. This design requires 100940 slices but only has 85920 slices available that allow carry chains.
ERROR:Map:237 - The design is too large to fit the device. Please check the Design Summary section to see which resource requirement for your design exceeds the resources available in the device. Note that the number of slices reported may not be reflected accurately as their packing might not have been completed.

When I inspect the Mapping report file, I see:

Interim Summary
---------------
Slice Logic Utilization:
  Number of Slice Registers:                  460,088 out of 687,360   66%
    Number used as Flip Flops:                399,848
    Number used as Latches:                         0
    Number used as Latch-thrus:                     0
    Number used as AND/OR logics:              60,240
  Number of Slice LUTs:                       388,284 out of 343,680  112% (OVERMAPPED)
    Number used as logic:                     384,856 out of 343,680  111% (OVERMAPPED)
      Number using O6 output only:            311,180
      Number using O5 output only:             10,716
      Number using O5 and O6:                  62,960
      Number used as ROM:                           0
    Number used as Memory:                        114 out of  99,200    1%
      Number used as Dual Port RAM:                 0
      Number used as Single Port RAM:               0
      Number used as Shift Register:              114
        Number using O6 output only:              114
        Number using O5 output only:                0
        Number using O5 and O6:                     0
    Number used exclusively as route-thrus:     3,314
      Number with same-slice register load:         0
      Number with same-slice carry load:        3,313
      Number with other load:                       1

Slice Logic Distribution:
  Number of LUT Flip Flop pairs used:         584,470
    Number with an unused Flip Flop:          125,987 out of 584,470   21%
    Number with an unused LUT:                196,186 out of 584,470   33%
    Number of fully used LUT-FF pairs:        262,297 out of 584,470   44%
  Number of unique control sets:                  233
  Number of slice register sites lost
    to control set restrictions:                  854 out of 687,360    1%

Also,
  Number of DSP48E1s:                           4,800 out of     864  555% (OVERMAPPED)

-----------------------------------------------------------------------

I did a quick calculation on design resource usage such as LUTs versus DSP48E1s from the Xilinx Coregen GUI:
1. Multiplier1 uses 86 LUTs vs 1 DSP48E1. The design uses Multiplier1 x96. So I am looking at either 96 DSP48E1s or 8256 LUTs.
2. Multiplier2 uses 142 LUTs vs 1 DSP48E1. The design uses Multiplier2 x4704. So I am looking at either 4704 DSP48E1s or 667968 LUTs.

I tried different options to synthesize my design using LUTs and using DSPs. Before I partition my design, I just wanted to check with everyone here, on how the multipliers can optimize the usage of DSP48Es vs LUTs. The current mapping report indicates all the multipliers were mapped using DSPs, hence 4800 DSPs.
1. How can the XST tool or the mapping partition the usage of the multipliers using both DSPs and slice logic? Is this possible with some constraint?
2. The multiplier cores are currently set for Area optimization vs Speed optimization and I have used "use Mults" option. If I set "use LUTs" option, will the XST and Mapping process partition the multiplier usage between LUTs and DSPs?

Thanks in advance !!!Article: 151953
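For the first question above, here is a minimal sketch of how the split can be steered per multiplier from the HDL using the XST USE_DSP48 synthesis attribute (check your XST version's constraints guide for the exact spelling and legal values; older families use MULT_STYLE instead, and for CoreGen multiplier cores the per-core "Use Mults"/"Use LUTs" construction option already mentioned plays the same role). The entity and names below are made up for illustration; only the instances carrying the attribute are forced into slice logic, so the rest remain free to use DSP48E1s.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mult_in_fabric is
  port (
    clk : in  std_logic;
    a   : in  unsigned(17 downto 0);
    b   : in  unsigned(17 downto 0);
    p   : out unsigned(35 downto 0));
end entity;

architecture rtl of mult_in_fabric is
  signal p_reg : unsigned(35 downto 0) := (others => '0');
  -- Ask XST to build this particular multiplier out of slice logic,
  -- leaving the DSP48E1s for the instances that really need them.
  attribute use_dsp48 : string;
  attribute use_dsp48 of p_reg : signal is "no";
begin
  process (clk)
  begin
    if rising_edge(clk) then
      p_reg <= a * b;   -- inferred multiplier
    end if;
  end process;
  p <= p_reg;
end architecture;

Applying the attribute (or the equivalent CoreGen option) to a selected subset of the multiplier instances is one way to spread the 4800 multipliers across both DSP48E1s and slice logic instead of having MAP pack them all into DSPs.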
John so include it in the design and go for eons !Article: 151954
jt_eaton <z3qmtr45@n_o_s_p_a_m.n_o_s_p_a_m.gmail.com> wrote: (snip) > You can create a design that will work with no resets at all. > The problem is that the verification suite will take a few > eons to finish. Most FPGA do an asynchronous reset on all FF at the end of configuration. I don't believe that is optional. -- glenArticle: 151955
On Jun 13, 8:21 pm, "jt_eaton" <z3qmtr45@n_o_s_p_a_m.n_o_s_p_a_m.gmail.com> wrote: > >On Jun 12, 7:22 pm, "jt_eaton" > ><z3qmtr45@n_o_s_p_a_m.n_o_s_p_a_m.gmail.com> wrote: > > >Thanks for that pointer. I have always been a believer in using the >async reset and now I see that this may not always be the best way to >reset a design. But the devil is in the details. I wonder if this >still applies to non-Xilinx designs? > > >Rick > > It applies to all designs. Designers who started their careers with > asynchronous logic carried it with them when Design for Synthesis and > synchronous design became a requirement but it has never been the best > choice. Many designers make the mistake of thinking that because they need > an asynchronous reset system that they must design it using asynchronous > logic. That is simply not true. We design synchronous systems that are > black box equivalent to asynchronous systems all the time. The main thing > that you need to realize about reset system design is that the purpose of > the reset system is not to reset the system when a trigger event occurs. > Its purpose is to NOT reset the system when a trigger event is NOT > occurring. > > The same is true for airbag controllers. The job of an airbag controller is > not to deploy the bag when the car is in an accident, its job is to not > deploy the bag when the car is not having an accident. Any system where the > expected number of uses is small and the effects of the usage is large will > follow this rule. > > Remember the 1st StarWars movie? They built DeathStar with an emergency > exhaust port that provided a direct path from the reactor core to the > surface. It was ray shielded but could not be particle shielded. Bad plan. > > An asynchronous reset has a direct path from a pad into every flip-flop in > the entire chip. It is analog shielded but not digitally shielded. Bad plan. > > Resets in a real product (not a simulation) are really rare events. If a > reset is delayed by 20 microseconds then nobody will notice. If a product > that you are using suddenly resets itself then you will likely notice. > Spend a few hundred cycles on a digital filter before you do something > drastic. > > John Eaton > > --------------------------------------- > Posted through http://www.FPGARelated.com Interesting philosophy. RickArticle: 151956
On Jun 14, 1:40 pm, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote: > jt_eaton <z3qmtr45@n_o_s_p_a_m.n_o_s_p_a_m.gmail.com> wrote: > > (snip) > > > You can create a design that will work with no resets at all. > > The problem is that the verification suite will take a few > > eons to finish. > > Most FPGA do an asynchronous reset on all FF at the end of > configuration. I don't believe that is optional. > > -- glen I believe that is optional for any given FF. The GSR has to be enabled on each FF and that is the point of the white paper. In Xilinx devices using the GSR uses one of the set/reset inputs on a FF as an async input, which also configures the other input as async IIRC. The tools are capable of using the Set and Reset inputs as synchronous inputs to reduce the LUT usage and improve the speed of a design... in some cases. As to the philosophical avoidance of async resets, I can't say I share that belief. As you point out, there is one async reset on the chip that you can't eliminate, the PROGRAM pin. Even if it doesn't reset the FFs, it will stop the design from working and reload all the LUTs and memory. It has been a long time since I used a Xilinx part, so I may not remember them 100% correctly. RickArticle: 151957
Lots of interesting advice here! In particular I read the Xilinx whitepaper with interest. Unfortunately, a lot of the advice seemed to be inapplicable to my problem. I can't look for the individual submodule that's taking up most of the area, because my application is a single long pipeline with a large number of very similar stages: the area isn't taken up by any one stage, but more by the number of stages. And because the design is a pipeline with general logic (mostly bitwise, plus a small bit of basic arithmetic) between registers, I don't really see any opportunities for special primitives like SRLs, DSPs, or the like that would reduce area. I can probably solve my problem by building a smaller pipeline and reusing it; I preferred not to do that as it will decrease system performance but it looks like I don't have much choice now. Thanks anyway! ChrisArticle: 151958
>The data arrives with some unknown phase shift relatively to system >(synchronized to SDRAM) clock. DQ can be captured more reliably if we >route the data clock, DQS, along the data. They suggest that it is easy >to transport the received data bursts into the system clock domain using >a FIFO afterwards. This is great. I just see a one small problem: > > How do you know that the read operation takes place so that > the captured data are valid for submission into FIFO? > > >A READ_EN signal must be delivered from the SDRAM write/command part >(CLK domain) into asynchronously running receiver in DQS domain (the >period is the same but phase is unknown) with one DQS clock precision. >Remember that we run away from strobing DQ by CLK phases because we do >not know the data arriving phase relatively to CLK. That is why we >introduced the DQS. But now, we still must figure out the phase shift. >It looks like our attempt to do without the phase difference has failed. > >Why people still use DQS for strobing data instead of some CLK-derived >phase? > Some DDR2 SDRAM controllers require a feedback clock input, being their output clock via a loop of track that goes the same distance as to the SDRAM and back. Others go through a training phase where they work out the "time-of-flight" from the controller to the SDRAM and back. Either works well enough. If your FPGA is from Xilinx, use their MIG tool to generate the controller. --------------------------------------- Posted through http://www.FPGARelated.comArticle: 151959
Something to remember about Xilinx FPGAs, at least when designing in VHDL and synthesizing with XST, is that you can specify the initial value of registered signals (when declaring the signal in the declarative part of the architecture). This is sometimes considered bad practice (bad coding style) in other contexts, and may not be supported by other tool flows. --------------------------------------- Posted through http://www.FPGARelated.comArticle: 151960
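A minimal VHDL sketch of that initial-value style (the power-up value is loaded from the bitstream via GSR at the end of configuration, so no reset input is needed; the entity and names are illustrative, and as noted above other tool flows may ignore the initializer):

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity init_example is
  port (
    clk : in  std_logic;
    q   : out unsigned(7 downto 0));
end entity;

architecture rtl of init_example is
  -- The := value becomes the power-up state of the register.
  signal count : unsigned(7 downto 0) := x"80";
begin
  process (clk)
  begin
    if rising_edge(clk) then
      count <= count + 1;
    end if;
  end process;
  q <= count;
end architecture;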
"RCIngham" <robert.ingham@n_o_s_p_a_m.n_o_s_p_a_m.gmail.com> wrote in message news:nYadnfl5gelSNWXQnZ2dnUVZ_tOdnZ2d@giganews.com... > >The data arrives with some unknown phase shift relatively to system >>(synchronized to SDRAM) clock. DQ can be captured more reliably if we >>route the data clock, DQS, along the data. They suggest that it is easy >>to transport the received data bursts into the system clock domain using >>a FIFO afterwards. This is great. I just see a one small problem: >> >> How do you know that the read operation takes place so that >> the captured data are valid for submission into FIFO? >> >> >>A READ_EN signal must be delivered from the SDRAM write/command part >>(CLK domain) into asynchronously running receiver in DQS domain (the >>period is the same but phase is unknown) with one DQS clock precision. >>Remember that we run away from strobing DQ by CLK phases because we do >>not know the data arriving phase relatively to CLK. That is why we >>introduced the DQS. But now, we still must figure out the phase shift. >>It looks like our attempt to do without the phase difference has failed. >> >>Why people still use DQS for strobing data instead of some CLK-derived >>phase? >> > > Some DDR2 SDRAM controllers require a feedback clok input, being their > output clock via a loop of track that goes the same distance as to the > SDRAM and back. Others go through a training phase where they work out the > "time-of-flight" from the controller to the SDRAM and back. Either works > well enough. If your FPGA is from Xilinx, use their MIG tool to generate > the controller. I think the point was:If you dont know the timing between outclk and inclk (or dqs) - It could be >1clk in theory - how do you know when data is valid on a read? I guess you can't trust DQS as it is floating when not active.. You just need to assume there is <1clk delay (and I think that is specified in the std). Imho, dq's should be single direction and separate for r/w.. Maybe they did that to later DDR standards.Article: 151961
On Jun 15, 1:35 am, Christopher Head <ch...@is.invalid> wrote: > Lots of interesting advice here! In particular I read the Xilinx > whitepaper with interest. Unfortunately, a lot of the advice seemed to > be inapplicable to my problem. I can't look for the individual > submodule that's taking up most of the area, because my application is > a single long pipeline with a large number of very similar stages: the > area isn't taken up by any one stage, but more by the number of stages. > And because the design is a pipeline with general logic (mostly > bitwise, plus a small bit of basic arithmetic) between registers, I > don't really see any opportunities for special primitives like SRLs, > DSPs, or the like that would reduce area. I can probably solve my > problem by building a smaller pipeline and reusing it; I preferred not > to do that as it will decrease system performance but it looks like I > don't have much choice now. > > Thanks anyway! > Chris I would start by saying that the biggest opportunities for savings are almost always by starting at the algorithm level. You'll only get so far by playing with implementation. One suggestion might be to look for places where you could do 'double clocking' - ie generate a 2x clock with the DCM and run a particular piece of logic twice per cycle, muxing the inputs and distributing the outputs. We have some designs that were multiplier limited, so we used this trick as our main pipeline was slow enough to use one multiplier to do double duty per pipeline stage. Some other tricks - use multipliers as shifters if you have them spare (sketched below). See if you can rejigger your pipeline stages. Some of the older parts (Virtex-2 or so) have dedicated BUFT primitives that you can use to reduce the number of logic elements in multiplexers. Look at and understand the logic usage reports from the synthesizer. If a module gets generated with more f/fs than you think it should, it's good to dig in and figure out what got generated. For XST there is a tool or option that will show a schematic of synthesized logic, this can be handy.Article: 151962
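A minimal sketch of the multiplier-as-shifter tip (assuming a DSP48 is spare; the entity and port names are made up): shifting left by N is just multiplying by the one-hot value 2**N, and the one-hot decode is much cheaper in LUTs than a full barrel shifter.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity dsp_shifter is
  port (
    clk   : in  std_logic;
    din   : in  unsigned(15 downto 0);
    shamt : in  unsigned(3 downto 0);    -- shift amount 0..15
    dout  : out unsigned(31 downto 0));
end entity;

architecture rtl of dsp_shifter is
  signal pow2 : unsigned(15 downto 0);
  signal prod : unsigned(31 downto 0) := (others => '0');
begin
  -- cheap one-hot decode of the shift amount in fabric
  pow2 <= shift_left(to_unsigned(1, pow2'length), to_integer(shamt));

  process (clk)
  begin
    if rising_edge(clk) then
      -- the actual shift happens in the multiplier, which can map to a DSP48
      prod <= din * pow2;
    end if;
  end process;

  dout <= prod;
end architecture;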
I have a design partitioned over 2 FPGAs. I am trying to determine the benefits of selecting GTX links vs. LVDS to transfer the data between FPGAs.

Target Device  : xc6vlx550t
Target Package : ff1759
Target Speed   : -2

Latency calculations:
1. GTX interface: The GTX transceiver is configured at 106.25 MHz with 20 bits input. This means the bits are transmitted at bit-rate = 20*106.25 MHz = 2.125 Gbps.
   # of bits to be transferred = 1728
   Latency of this interface = 1/(80% of bit-rate * (20/16) * (# of bits transferred/16)) = 1/(2.295e+11) = 4.35e-12 seconds

2. LVDS+Aurora: The Aurora interface is configured at 600 MHz (6 Gbps) with lane width as 2 bytes.
   Latency of this interface = 1/(80% of clock rate * (# of bits transferred/16)) = 1/(5.184e+10) = 19.29e-12 seconds

Is this calculation correct? My assumption for the LVDS calculation is that Aurora does not up-sample the clock frequency by 20 for transmitting data.

Thanks in advance for all the feedback.Article: 151963
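A quick units check on the two formulas above (my own back-of-the-envelope, not a property of either core, assuming 8b/10b coding and ignoring the transceiver and protocol pipeline): dividing 1 by (a rate times a bit count) does not give a time, which is why both results land in the picosecond range - 4.35e-12 s is less than a single bit time at 2.125 Gbps (about 470 ps). Treating the transfer time as payload bits divided by payload rate instead gives:

  GTX, 2.125 Gbps line rate:   payload rate ~ 2.125 Gb/s * 16/20 = 1.7 Gb/s
                               1728 bits / 1.7 Gb/s ~ 1.02 us
  Aurora, 6 Gbps line rate:    payload rate ~ 6 Gb/s * 16/20 = 4.8 Gb/s
                               1728 bits / 4.8 Gb/s ~ 0.36 us

Both figures are serialization time only; the cores' own latency (see the measured Aurora number later in the thread) comes on top.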
> Some DDR2 SDRAM controllers require a feedback clock input, being their > output clock via a loop of track that goes the same distance as to the > SDRAM and back. Others go through a training phase where they work out the > "time-of-flight" from the controller to the SDRAM and back. Either works > well enough. I do believe that this works very well. I just want to know one thing: how does all this stuff help to strobe nothing but valid data bits? > If your FPGA is from Xilinx, use their MIG tool to generate > the controller. My board is http://www.xilinx.com/univ/XUPV2P, routed for the Xilinx http://www.xilinx.com/support/documentation/ip_documentation/plb_ddr.pdf memory controller. It involves the on-board clock feedback trace, which matches the FPGA-to-SDRAM trace length. Can you explain the advantage of this design in the 7.05.2011 topic "Why feedback clock in SDRAM controllers?" There are two problems with using the EDK controller: 1. The CoreGen of ISE10.1 (latest for XCv2p) does not include the memory generator, and 2. plb_ddr.pdf says: "Due to the variation in board layout, the DDR clock and the DDR data relationship can vary. Therefore, the designer should analyze the time delays of the system and set all of the attributes of the phase shift controls of the DCM as needed to insure stable clocking of the DDR data." I just do not understand how to measure these timings and, in the first place, why do we need this DQS if the phase shift with respect to the system clock still must be adjusted manually? Why not strobe DQ with this manually adjusted system clock phase right away?Article: 151964
With DDR memory you would use some sort of calibration scheme so that the data coming from the memory was calibrated to the clock inside the FPGA. This usually consists of writing a 1010 pattern into the memory and then reading it back and using an IO delay inside the FPGA to alter the relationship between the data and internal clock. Jon --------------------------------------- Posted through http://www.FPGARelated.comArticle: 151965
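A behavioral sketch of that idea, just to make the flow concrete (the IODELAY primitive, the write/read of the training pattern, and the comparison logic are assumed to live elsewhere; the port and signal names are made up). The FSM sweeps every delay tap, records the window of taps where the pattern read back correctly, and then parks in the middle of that window:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity read_cal is
  port (
    clk        : in  std_logic;
    start      : in  std_logic;              -- kick off calibration
    check_done : in  std_logic;              -- checker finished one pass at this tap
    pattern_ok : in  std_logic;              -- '1' if the training read matched
    tap        : out unsigned(4 downto 0);   -- drives the IDELAY tap value
    cal_done   : out std_logic);
end entity;

architecture rtl of read_cal is
  type state_t is (idle, test_tap, advance, finish);
  signal state    : state_t := idle;
  signal tap_i    : unsigned(4 downto 0) := (others => '0');
  signal first_ok : unsigned(4 downto 0) := (others => '0');
  signal last_ok  : unsigned(4 downto 0) := (others => '0');
  signal seen_ok  : std_logic := '0';
  signal done_i   : std_logic := '0';
begin
  process (clk)
  begin
    if rising_edge(clk) then
      case state is
        when idle =>
          if start = '1' then
            tap_i   <= (others => '0');
            seen_ok <= '0';
            done_i  <= '0';
            state   <= test_tap;
          end if;
        when test_tap =>
          -- assumes the checker reruns the 1010 write/read and pulses
          -- check_done once for every new tap setting
          if check_done = '1' then
            if pattern_ok = '1' then
              if seen_ok = '0' then
                first_ok <= tap_i;
              end if;
              last_ok <= tap_i;
              seen_ok <= '1';
            end if;
            state <= advance;
          end if;
        when advance =>
          if tap_i = 31 then
            state <= finish;
          else
            tap_i <= tap_i + 1;
            state <= test_tap;
          end if;
        when finish =>
          -- park in the middle of the passing window
          -- (if no tap passed, this just parks at 0; real code would flag an error)
          tap_i  <= first_ok + shift_right(last_ok - first_ok, 1);
          done_i <= '1';
          state  <= idle;
      end case;
    end if;
  end process;

  tap      <= tap_i;
  cal_done <= done_i;
end architecture;

Per-bit deskew would repeat the sweep per DQ pin; MIG-generated controllers do something along these lines automatically during their start-up calibration.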
>I have a design partitioned over 2 FPGAs. I am trying to determine the benefits of selecting GTX links vs. LVDS to transfer the data between FPGAs. > >Target Device : xc6vlx550t >Target Package : ff1759 >Target Speed : -2 > >Latency calculations: >1. GTX interface: The GTX transceiver is configured at 106.25 MHz with 20 bits input. This means the bits are transmitted at bit-rate = 20*106.25 MHz = 2.125 Gbps. ># of bits to be transferred = 1728 >Latency of this interface = 1/(80% of bit-rate * (20/16)*(# of bits transferred/16)) = 1/(2.295e+11) = 4.35e-12 seconds > >2. LVDS+Aurora: The Aurora interface is configured at 600MHz (6 Gbps) with lane width as 2 bytes. > >Latency of this interface = 1/(80% of clock rate * (# of bits transferred/16)) = 1/(5.184e+10) = 19.29e-12 seconds > > >Is this calculation correct? My assumption for the LVDS calculation is that Aurora does not up-sample the clock frequency by 20 for transmitting data. > >Thanks in advance for all the feedback. > Generate both lots of IP. Write a testbench with both instantiated. Simulate. --------------------------------------- Posted through http://www.FPGARelated.comArticle: 151966
[snipped] > >My board is http://www.xilinx.com/univ/XUPV2P, routed for Xilinx >http://www.xilinx.com/support/documentation/ip_documentation/plb_ddr.pdf >memory controller >It involves the on-board clock feedback trace, which matches the >FPGA-to-SDRAM trace length. Can you explain the advantage of this design >in 7.05.2011 topic "Why feedback clock in SDRAM controllers?" > [snipped] Oh the old Virtex-2PRO stuff. Bad luck! It all works lovely on Virtex-4 and Virtex-5 with recent ISE and CoreGen. --------------------------------------- Posted through http://www.FPGARelated.comArticle: 151967
In your explanation, only one thing is missing: DQS. Why do we need data if we still need to calibrate "memory to the clock"? One could calibrate DQ directly "to the clock inside FPGA".Article: 151968
Why do we need _DQS_, I mean. Thank you for the appreciation.Article: 151969
On 15.06.2011 18:48, maxascent wrote: > With DDR memory you would use some sort of calibration scheme so that the > data coming from the memory was calibrated to the clock inside the FPGA. > This usually consists of writing a 1010 pattern into the memory and then > reading it back and using an IO delay inside the FPGA to alter the > relationship between the data and internal clock. BTW, why does the static installation of FPGA-SDRAM on a single board need dynamic calibration? 1010 is produced by DQS. Do you mean that duplication is needed because all DQ bits, in one DQS group, must be treated separately?Article: 151970
AMDyer@gmail.com <amdyer@gmail.com> wrote: (snip) > I would start by saying that the biggest opportunities for savings are > almost always by starting at the algorithm level. You'll only get so > far by playing with implementation. > one suggestion might be to look for places where you could do 'double > clocking' - ie generate a 2x clock with the DCM and run a particular > piece of logic twice per cycle, muxing the inputs and distributing the > outputs. We have some designs that were multiplier limited, so we > used this trick as our main pipeline was slow enough to use one > multiplier to do double duty per pipeline stage. For systolic arrays, which I will guess that the OP is working on, that often doesn't help. You could speed up the whole thing by a factor of two, though. -- glenArticle: 151971
>Something to remember about Xilinx FPGAs, at least when designing in VHDL >and synthesizing with XST, is that you can specify the initial value of >registered signals (when declaring the signal in the declarative part of >the architecture). This is sometimes considered bad practice (bad coding >style) in other contexts, and may not be supported by other tool flows. > > >--------------------------------------- >Posted through http://www.FPGARelated.com > I really like the fact that you can initialize rams as well. You no longer need to think in terms of rams or roms, you have a universal read/writable rom for everything. Need a screen buffer for your display? Create a startup screen image file and have that loaded as well. Need some boot/test code. Load it in at startup and then reuse that memory later. This stuff is great!! John --------------------------------------- Posted through http://www.FPGARelated.comArticle: 151972
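A minimal sketch of an initialized, still-writable inferred RAM in that style (VHDL, XST targets; the entity, contents, and names are illustrative, and the initial contents could just as easily come from a script-generated constant or function):

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity boot_ram is
  port (
    clk  : in  std_logic;
    we   : in  std_logic;
    addr : in  unsigned(7 downto 0);
    din  : in  std_logic_vector(7 downto 0);
    dout : out std_logic_vector(7 downto 0));
end entity;

architecture rtl of boot_ram is
  type ram_t is array (0 to 255) of std_logic_vector(7 downto 0);
  -- Build the power-up contents; these travel in the bitstream.
  function init_ram return ram_t is
    variable r : ram_t := (others => (others => '0'));
  begin
    r(0) := x"DE";  -- e.g. first bytes of boot/test code or a splash image
    r(1) := x"AD";
    return r;
  end function;
  signal ram : ram_t := init_ram;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(addr)) <= din;
      end if;
      dout <= ram(to_integer(addr));   -- registered read infers block RAM
    end if;
  end process;
end architecture;

The block behaves like a ROM at power-up but can be overwritten later, which is exactly the reusable boot-code / startup-screen use described above.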
On Jun 15, 2:35 am, Christopher Head <ch...@is.invalid> wrote: > Lots of interesting advice here! In particular I read the Xilinx > whitepaper with interest. Unfortunately, a lot of the advice seemed to > be inapplicable to my problem. I can't look for the individual > submodule that's taking up most of the area, because my application is > a single long pipeline with a large number of very similar stages: the > area isn't taken up by any one stage, but more by the number of stages. > And because the design is a pipeline with general logic (mostly > bitwise, plus a small bit of basic arithmetic) between registers, I > don't really see any opportunities for special primitives like SRLs, > DSPs, or the like that would reduce area. I can probably solve my > problem by building a smaller pipeline and reusing it; I preferred not > to do that as it will decrease system performance but it looks like I > don't have much choice now. > > Thanks anyway! > Chris "General" logic is always ripe for optimization, or maybe I should say, de-unoptimization. If I were you, I would code each stage as a separate module and measure the size to compare to what you think it should be. I have seen many times where the tools took what I thought was pretty straightforward code and blew it up to something ugly. Obviously it was doing what I told it to, but I would have been able to do better than the machine because I understood the logic better. So I had to change my code to indicate how it could be simplified. Don't worry about the special features of a chip. First figure out if the tools did an ok job... RickArticle: 151973
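One way to get those per-stage numbers without hand-building a separate project per module is to keep the hierarchy through synthesis so the reports can break utilization down by instance. A minimal sketch using the XST KEEP_HIERARCHY attribute follows; check your version's constraints guide for whether it attaches to the entity or the architecture and for the accepted values, and note that kept hierarchy can cost some cross-boundary optimization. The wrapper below is illustrative only.

library ieee;
use ieee.std_logic_1164.all;

entity stage_wrap is
  port (
    clk : in  std_logic;
    d   : in  std_logic;
    q   : out std_logic);
end entity;

architecture rtl of stage_wrap is
  -- Keep this level of hierarchy through synthesis so its LUT/FF usage
  -- shows up separately in the reports.
  attribute keep_hierarchy : string;
  attribute keep_hierarchy of rtl : architecture is "yes";
  signal q_i : std_logic := '0';
begin
  process (clk)
  begin
    if rising_edge(clk) then
      q_i <= d;
    end if;
  end process;
  q <= q_i;
end architecture;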
> >As to the philosophical avoidance of async resets, I can't say I share >that belief. As you point out, there is one async reset on the chip >that you can't eliminate, the PROGRAM pin. Even if it doesn't reset >the FFs, it will stop the design from working and reload all the LUTs >and memory. > >Rick > You can't avoid 100% of all async reset flops but you can easily do the 99.999% where sync will give you a smaller, faster design and your design is still a black box equivalent to using the async reset. With Xilinx parts every flop with an async reset wastes 1 LUT over a sync reset. In ASIC design every async reset flop doubles the number of endpoints needing timing closure from 1 to 2. If you do a really lousy job in designing your reset distribution then these async paths could become critical paths and start taking routing resources away from your other more important paths. Async resets on flops are nothing but trouble. John --------------------------------------- Posted through http://www.FPGARelated.comArticle: 151974
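A minimal sketch of the "digital filter before you do something drastic" idea from earlier in the thread: the raw reset request is synchronized and debounced, and only the filtered, fully synchronous version fans out to the rest of the design. The 256-cycle threshold and the names are arbitrary, and the power-up values lean on the configuration-time initialization discussed above.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity reset_filter is
  port (
    clk     : in  std_logic;
    ext_rst : in  std_logic;    -- raw reset request from a pad, active high
    rst_out : out std_logic);   -- clean, fully synchronous reset for the core
end entity;

architecture rtl of reset_filter is
  signal meta, sync : std_logic := '1';
  signal stable_cnt : unsigned(7 downto 0) := (others => '0');
  signal rst_i      : std_logic := '1';   -- power-up value from configuration
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- two-flop synchronizer on the request itself
      meta <= ext_rst;
      sync <= meta;
      -- digital filter: only change the internal reset after the request has
      -- been stable for 256 cycles, so a glitch on the pad never resets the chip
      if sync = rst_i then
        stable_cnt <= (others => '0');
      elsif stable_cnt = 255 then
        rst_i      <= sync;
        stable_cnt <= (others => '0');
      else
        stable_cnt <= stable_cnt + 1;
      end if;
    end if;
  end process;
  rst_out <= rst_i;
end architecture;

rst_out is then used as an ordinary synchronous reset term in the rest of the design, so only these few flops ever see the pad directly.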
Vivek, I've recently determined the latency of Aurora in my design by running simulation. It's V6, 250 MHz, 20 bit, no framing. I've got 340 ns. If there is clock compensation, it periodically inserts a symbol and adds an additional clock cycle. Thanks, Evgeni ======================== http://outputlogic.com