Peter Alfke wrote:
> Vessumesh, if you refuse to answer specific helpful questions, then I
> suggest you figure this out yourself, and do not bother this newsgroup.
> Peter
>
> vssumesh wrote:
>
>>Peter Alfke wrote:
>>
>>>What kind of device are you using?
>>>20 ns for a 32-bit adder (using dedicated carry) would be ridiculously
>>>slow...
>>>Dedicated carry, available in all Xilinx FPGA devices, uses less than
>>>50 ps per bit (plus some basic delay).
>>>Peter Alfke
>>>
>>>===========
>>>vssumesh wrote:
>>>
>>>>Hello all,
>>>> In my design i am using a 32 bit adder and some combinational logic
>>>>after that. The full path i want to constrain to double the clock
>>>>period (20ns) and it is not constraing. When analysed the critical path
>>>>observed that there is big carry chain for the adder and a big routing
>>>>delay between the combinational logic (which i never expected). Is the
>>>>big carry chain is causing the trouble in the router. I am thinking of
>>>>buffering the output of the adder with a -ve edge (constrain that path
>>>>to 5ns). And then constrain the other path that is after the buffer to
>>>>next stage FF to 16ns. Will this buffering ease the routing effort.
>>>>Please advice.
>>>>Thanks and regards
>>>>Sumesh V S
>>
>>No 20ns for the adder and the remaining combinational logic. The adder
>>delay is as you said is very much less.
>
>

Sounds like there are several layers of combinatorial logic. Pipeline the design. Also, I think he is using the term "buffer" to mean adding a register stage.

The adder bits are, as Peter said, about 50 ps per bit, but the time to get on and off the carry chain adds more than 2 ns. That is still nowhere near the 20 ns. It isn't the carry chain causing the problem. The problem comes from using many levels of logic (i.e., the signal goes through lots of LUTs) between the flip-flops, plus the propagation delay associated with the carry chain.

You need to look at the ratio of logic delay to routing delay. If the routing delay on the critical path is more than the logic delay, you can likely fix the problem with some manual placement. The placer does a very poor job placing the additional layers of LUTs in multi-layer combinatorial logic. The LUT connected to the flip-flop places well, but the LUTs leading up to that one get scattered to the far reaches of the chip. You could try a higher effort level on the placement, but that may not provide enough improvement. You'll get better results floorplanning the locations of the additional layers of LUTs so that they are laid out logically and close to the rest of the LUTs in the path. Trouble is, the LUT names are subject to change on subsequent synthesizer runs, so you have to be really careful.

The best solution, if your design can support it, is to pipeline the logic deeper.

Article: 108976
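The advice in the reply above (article 108975) to pipeline the adder path could be sketched roughly as follows. This is only an illustration, not the poster's design: the entity name, the port names, the `mask` side input, and the stand-in "post-adder logic" are all hypothetical. The point is that the carry chain ends at a register, so the tools see two shorter register-to-register paths instead of one long one.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: all names here are illustrative only.
entity add_then_logic is
  port (
    clk    : in  std_logic;
    a      : in  unsigned(31 downto 0);
    b      : in  unsigned(31 downto 0);
    mask   : in  unsigned(31 downto 0);  -- assumed side input to the later logic
    result : out unsigned(31 downto 0)
  );
end entity add_then_logic;

architecture pipelined of add_then_logic is
  signal sum_q  : unsigned(31 downto 0) := (others => '0');
  signal mask_q : unsigned(31 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Stage 1: the adder (carry chain) terminates in a register.
      sum_q  <= a + b;
      -- Delay the side input by one cycle so it stays aligned with sum_q.
      mask_q <= mask;
      -- Stage 2: stand-in for the combinational logic after the adder.
      result <= (sum_q and mask_q) xor shift_right(sum_q, 1);
    end if;
  end process;
end architecture pipelined;
```

The cost is one extra clock of latency (two cycles from inputs to `result`), which is why this only helps if the surrounding design can tolerate the added stage.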
Nico Coesel wrote: > Austin Lesea <austin@xilinx.com> wrote: > > >>Nico, >> >>OK, here it is: (for S3, V2, and V2 Pro) >> >>"It is likely that the delay will be marginally smaller if you >>tie the 2 LSB inputs and use the upper 16 inputs only. However, the >>software model is pretty simple and won't model that as far as I can >>remember. Also, since one of the inputs goes through the Booth encoder >>it might not be as substantial of an improvement as it would be with an >>"original recipe" multiplier." > > > So what you are saying is that the multiplier is faster when the upper > inputs are being used, but the place & route software assumes the > upper bits are slower? > > The software assumes the LSBs of the inputs are toggling and affecting the MSBs of the outputs. The timing model doesn't take into account the fact that the MSB outputs have shorter propagation delays from the higher up input bits, so it doesn't reflect the advantage of using only the upper bits when you do the timing analysis. Instead, I believe the model just assumes a certain delay from any of the inputs to a specific output.Article: 108977
Nico Coesel wrote:
> All you need is a normal clock and a 90 degrees phase shifted clock.
> The whole clocking outside the fpga thing is unnecessary. If you place
> the output flipflops inside the IOBs and use an fddr in the IOB to
> replicate the internal clock, all signals connected to the DDR memory
> will have the same delay.

But the DDR spec says the DQS strobe for data written by the FPGA must be center aligned. The DQS is in phase with the DDR clock. That means the data must be put on the lines 1/2 of 1/2 of a clock cycle (a quarter cycle) early for proper alignment. This requires a clock that is 270 degrees out of phase from the DDR's clock. This is the clock used for the data lines going into the DDR.

I don't understand the "clocking outside the fpga" you mention. The FPGA currently has one 50 MHz external clock source. I run that through a DCM to make it 100 MHz. Then, in order for the DDR to work, I need to use two more DCMs. One is used to make the DDR clocks (positive and negative). The other is used for everything else.

-Dave

--
David Ashley http://www.xdr.com/dash
Embedded linux, device drivers, system architecture

Article: 108978
Ray, Correct. Austin Ray Andraka wrote: > Nico Coesel wrote: > >> Austin Lesea <austin@xilinx.com> wrote: >> >> >>> Nico, >>> >>> OK, here it is: (for S3, V2, and V2 Pro) >>> >>> "It is likely that the delay will be marginally smaller if you >>> tie the 2 LSB inputs and use the upper 16 inputs only. However, the >>> software model is pretty simple and won't model that as far as I can >>> remember. Also, since one of the inputs goes through the Booth encoder >>> it might not be as substantial of an improvement as it would be with an >>> "original recipe" multiplier." >> >> >> So what you are saying is that the multiplier is faster when the upper >> inputs are being used, but the place & route software assumes the >> upper bits are slower? >> >> > > > The software assumes the LSBs of the inputs are toggling and affecting > the MSBs of the outputs. The timing model doesn't take into account the > fact that the MSB outputs have shorter propagation delays from the > higher up input bits, so it doesn't reflect the advantage of using only > the upper bits when you do the timing analysis. Instead, I believe the > model just assumes a certain delay from any of the inputs to a specific > output.Article: 108979
David Ashley wrote: [snip] > But the open cores DDR doesn't make use of the DQS strobe generated > by the DDR device itself. I'm only trying to run at 100 mhz. In that > case xilinx app notes say the timing is adequate so the DQS strobe isn't > needed to capture data reliably. Maybe the timing would get easier if > the logic made use of the DQS strobe from the DDR. > I'm doing pretty much the same thing with Virtex 2 (similar architecture to Spartan 3) on a proprietary board. This board has a 66.66 MHz clock that is doubled to run the DDR at 133 MHz (266 DDR). I do not use the DQS inputs for sampling data. I did need to tweak the delay in my DCM's to get reliable sampling. I did not use any expensive test equipment for this, I just used the variable delay mode of the DCM to run tests at various phases and centered the final fixed value within the area that seemed to work. At 100 MHz I would expect the timing margins to be quite good even in the slowest speed grade parts. I'm using Virtex 2 -5 speed grade in my 133 MHz design. > I have a feeling adding some constraints would make the thing work > with a single DCM. Unfortunately I have no clue what constraints to > add, as I don't know what's going wrong (and don't know much about > constraints writing anyway). > The problem with a single DCM is that you need to make up for phase differences in the board routing. Signals to the DDR memory arrive there some prop. delay after they leave the FPGA. At the memory end they need to meet setup and hold time to the clock as it arrives at the memory, usually at the same board routing delay as the clock. So if your clock and data/ address/control outputs use the same internal clock, you would need to use board routing or some other delay element external to the FPGA to ensure hold time is met at the memory. Then the data returning from the memory shows up 2 board prop. delays from the driven clock, plus the clock to output timing specified in the memory datasheet. 
So the sampling point isn't exactly centered within the outgoing clock half-period. So your sampling clock may need to be off by some phase other than 90 degrees from the clock driving your outputs. All of this is pretty hard to accomplish with one DCM, IMHO. And just adding timing constraints without the mechanism to meet them makes life miserable on the tools, which usually fail miserably in response (they have only internal routing delays to make up your requested timing).

Article: 108980
David Ashley wrote:
> I will certainly share whatever I learn.

I got my simple write/read-verify system to work. I was able to get rid of one of the DCMs, so I only need 2.

DCM #1 takes the 50 MHz input, and I use the 2X output to drive a clock buffer. This is the tclock signal. Feedback comes from the clock buffer.

DCM #2 takes tclock and produces 4-phase output. The 0 and 270 signals drive 2 clock buffers. The buffered 0 clock goes back into the feedback input on the DCM. These signals are sys_clk and sys_clk270.

FDDRs are used to produce the DDR's clock. Their inputs are hardwired to "01" for the true clock and "10" for the negative clock. Both FDDRs take their clocks from sys_clk and inverted sys_clk. The inverter is implicit in the FDDR configuration, so no delay penalty exists.

Here's the trick: the original open cores DDR controller source sampled the data from the DDR on the sys_clk rising and falling edges. I instead push out the sampling by 1/4 of a cycle:

    rising_edge(sys_clk)  replaced by falling_edge(sys_clk270)
    falling_edge(sys_clk) replaced by rising_edge(sys_clk270)

Then I made a slight tweak to get the sampled data back into the sys_clk domain as required elsewhere.

It works fine. I had a feeling the problem was on the sampling side, since no special machinery existed to sample the data in the middle of its valid window. The setup time was not being met.

Here's a sample of the before code:

    -- **** CODE BEFORE FIX
    process (sys_clk)
    begin
      if rising_edge(sys_clk) then
        -- sample HI-data word with rising edge
        data_hi_q <= data;
        -- store HI- and LO-data word in 32bit output register
        data_out_q <= data_hi_q & data_lo2_q;
      end if;
    end process;
    -- ...
    process (sys_clk)
    begin
      if falling_edge(sys_clk) then
        -- sample LO-word with falling edge
        data_lo1_q <= data;
        -- 1 clock additional delay to store HI- and LO-word
        -- with the next rising edge as 32bit word
        data_lo2_q <= data_lo1_q;
      end if;
    end process;

    -- ***** CODE AFTER FIX
    process (sys_clk270)
    begin
      if falling_edge(sys_clk270) then
        -- sample HI-data word 1/4 cycle after the sys_clk rising edge
        data_hi_q <= data;
      end if;
    end process;

    process (sys_clk) -- (DA) fix to get back into sys_clk domain
    begin
      if rising_edge(sys_clk) then
        -- store HI- and LO-data word in 32bit output register
        data_out_q <= data_hi_q & data_lo2_q;
      end if;
    end process;
    -- ...
    process (sys_clk270)
    begin
      if rising_edge(sys_clk270) then
        -- sample LO-word 1/4 cycle after the sys_clk falling edge
        data_lo1_q <= data;
        -- 1 clock additional delay to store HI- and LO-word
        -- with the next rising edge as 32bit word
        data_lo2_q <= data_lo1_q;
      end if;
    end process;

Hope this is of use to other people.

-Dave

--
David Ashley http://www.xdr.com/dash
Embedded linux, device drivers, system architecture

Article: 108981
Gabor wrote: > David Ashley wrote: > [snip] > I'm doing pretty much the same thing with Virtex 2 (similar > architecture > to Spartan 3) on a proprietary board. This board has a 66.66 MHz > clock that is doubled to run the DDR at 133 MHz (266 DDR). I > do not use the DQS inputs for sampling data. I did need to tweak > the delay in my DCM's to get reliable sampling. I did not use any > expensive test equipment for this, I just used the variable delay > mode of the DCM to run tests at various phases and centered > the final fixed value within the area that seemed to work. See other email in this thread for details. I got it working by sampling data from the DDR on the 90 degree phase clock, now it works fine. No tweaking of the DCM necessary. And I'm only using one DCM. The DDR's DQS output transitions right when the data becomes valid out of the DDR. But the DDR controller has to transition the DQS right in the middle of the data going to the DDR being valid. This is hardly fair. I wish there wasn't even the DQS signal, it's just a PITA. -Dave -- David Ashley http://www.xdr.com/dash Embedded linux, device drivers, system architectureArticle: 108982
Austin Lesea wrote: > OK, > > I have looked through a lot of places, but it seems that opencores.org, > etc. just do not have any Hilbert transform blocks. > > I would think that this is not exactly rocket science, as the common > ways to do this are posted all over the place, and there are c programs > for DSP also posted. Even the Xilinx DSP libraries don't seem to have a > free Hibert transformer (even one for $?). > This paper has something on page 11: http://www.xilinx.com/ipcenter/catalog/logicore/docs/da_fir.pdf Cheers, GuenterArticle: 108983
Guenter,

Boy, is that embarrassing: it is right where it is supposed to be, on the free logic cores stuff. But, in my defense, it was 'hidden' in with the FIR filter wizard, as that is how they chose to implement it.

Now if only the search engine had found it! Maybe if I hadn't looked for "Hilbert", but had looked for "FIR filters" instead? Who would have guessed?

It is not only where you pointed me, but also here:

http://www.xilinx.com/bvdocs/ipcenter/data_sheet/fir_compiler_ds534.pdf

Many thanks, Guenter,

Austin

Guenter wrote:
> Austin Lesea wrote:
>> OK,
>>
>> I have looked through a lot of places, but it seems that opencores.org,
>> etc. just do not have any Hilbert transform blocks.
>>
>> I would think that this is not exactly rocket science, as the common
>> ways to do this are posted all over the place, and there are c programs
>> for DSP also posted. Even the Xilinx DSP libraries don't seem to have a
>> free Hibert transformer (even one for $?).
>>
>
> This paper has something on page 11:
>
> http://www.xilinx.com/ipcenter/catalog/logicore/docs/da_fir.pdf
>
> Cheers,
>
> Guenter
>

Article: 108984
David Ashley wrote: > Hope this is of use to other people. > -Dave > I've gotten email asking for the source, so I put it up, it can be found here: http://www.xdr.com/dash/fpga/ It's targeted to a linux build environment. It needs unisim to be in the right place in order to build as is...or tweak the Makefile. It's a pretty much identical copy of the open cores ddr controller, except I removed one DCM, and I wrapped it all in a synthesizable tester targeted to the spartan-3e starter board. The test just fills up memory with a non-repeating pattern, then reads it back out. If the pattern matches an LED stays lit. It keeps doing this forever. -Dave -- David Ashley http://www.xdr.com/dash Embedded linux, device drivers, system architectureArticle: 108985
I am designing a cross-domain synchroniser and wanted to check that I understand the formula for the mean time between metastable failures correctly. Sorry if the answer can be easily found on the web; I tried to find it and failed.

The usual formula is

    MTBF = 1/(T0 f1 f2 e^{-t/tau}),

where f1 is the frequency of the flip-flop's clock, f2 is the edge frequency at which its input transitions, T0 is the metastable window aperture size (it *is* called that?), and tau is the metastability time constant. A failure happens whenever the flip-flop becomes metastable and remains so for at least time t.

The value of T0 seems impossible to find for Xilinx FPGAs, presumably because it varies exponentially with tau, and that is difficult enough to measure accurately (?). I think that T0 can be at most t_{setup} + t_{hold}, so that might be one way of obtaining a value (?) [though Xilinx say that negative hold times are not guaranteed, so I should probably stick with just t_{setup} whenever the hold time is negative].

The above formula only works when the two clocks are independent. If they are not, then I think a good upper bound on the failure rate gives

    MTBF >= 1/( min(f1, f2) e^{-t/tau} ) (?)

The rationale is that the flip-flop cannot go metastable any more often than either f1 or f2 (remembering that f2 is the edge frequency, though since the potentially metastable flip-flop is fed from another flip-flop clocked with frequency f2, that ends up being the same thing).

It might be that even when the two clocks are produced by the same DCM there will be sufficient jitter to allow the upper bound to be improved considerably, but probably not if the best known bound on T0 is of a similar magnitude to the jitter (?)

Could someone please confirm that the above is correct?

Many thanks in advance!

Article: 108986
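As a rough cross-check of the two expressions in the post above (article 108985), the ratio between them can be worked out symbolically. This is only a sketch: it takes the independent-clock formula and the proposed correlated-clock bound exactly as posted, and the numbers in the last line (T0 = 0.2 ns, a 100 MHz faster clock) are illustrative assumptions, not measured values.

```latex
\begin{align*}
\mathrm{MTBF}_{\mathrm{indep}} &= \frac{e^{t/\tau}}{T_0\, f_1 f_2},
\qquad
\mathrm{MTBF}_{\mathrm{corr}} \;\ge\; B = \frac{e^{t/\tau}}{\min(f_1, f_2)} \\[6pt]
\frac{\mathrm{MTBF}_{\mathrm{indep}}}{B}
  &= \frac{\min(f_1, f_2)}{T_0\, f_1 f_2}
   = \frac{1}{T_0 \,\max(f_1, f_2)}
   \qquad \text{since } f_1 f_2 = \min(f_1, f_2)\,\max(f_1, f_2) \\[6pt]
\text{e.g. } T_0 = 0.2\ \mathrm{ns},\ \max(f_1, f_2) = 100\ \mathrm{MHz}:
  \quad
  &\frac{1}{(0.2 \times 10^{-9}\,\mathrm{s})(1 \times 10^{8}\,\mathrm{Hz})}
   = \frac{1}{0.02} = 50
\end{align*}
```

Under these assumptions the exponential term cancels, so the gap between the two figures depends only on T0 and the faster of the two frequencies; a smaller T0 makes the independent-clock figure correspondingly larger than the bound.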
"gauckler" <gauckler@fh-furtwangen.de> wrote in message news:1158562165.368801.101410@i3g2000cwc.googlegroups.com... > Hi, > > i tried to simulate a small vhdl design with xilinx ISE (8.1 - 8.2 > spxx, Webpack or foundation) running SuSE 10.1 linux, unfortunately > there is an error. Because the VHDL code simulates with SuSE 9.2 I > assume the code is fine and there are no spaces in the file path. > > Started : "Check Syntax". > Running vhpcomp > Compiling vhdl file "/home/PBuser2/parity/parity.vhd" in Library > isim_temp. > Entity <parity> compiled. > Entity <parity> (Architecture <behavior>) compiled. > Compiling vhdl file "/home/PBuser2/parity/tb_parity.vhd" in Library > isim_temp. > Entity <tb_parity_vhd> compiled. > Entity <tb_parity_vhd> (Architecture <behavior>) compiled. > Parsing "tb_parity_vhd_stx.prj": 0.03 > > Process "Check Syntax" completed successfully > > Running Fuse ... > Parsing "tb_parity_vhd_beh.prj": 0.00 > Building tb_parity_vhd_isim_beh.exe > ERROR:Simulator:222 - Generated C++ compilation was unsuccessful > > Has anybody simulated ISE isim under SuSE 10.1. Any hint is > appreciated. > > Andreas > Same thing with just ISE - no solution. Sorry. This really is something Xilinx should be sorting. Rog.Article: 108987
On Tue, 19 Sep 2006 10:21:17 -0700, Austin Lesea <austin@xilinx.com> wrote:

>OK,
>
>I have looked through a lot of places, but it seems that opencores.org,
>etc. just do not have any Hilbert transform blocks.
>
>I would think that this is not exactly rocket science, as the common
>ways to do this are posted all over the place, and there are c programs
>for DSP also posted. Even the Xilinx DSP libraries don't seem to have a
>free Hibert transformer (even one for $?).
>
>Yes, I know how to go about doing one, but, if its already done, why
>recode the wheel? After all, there are probably at least three good
>ways to do it on an FPGA, and ten bad ones.
>
>Since I have "friends in low places" in ham radio, having a public
>domain Hilbert would be useful for SSB, FM, AMSAT and other SDR
>applications.
>
>Some FIR, IIR, FFT, mixers, accumulators, DDFS, and so forth that are
>pretty easily found plus a Digilent $99 S200 pcb could make a useful
>foundation for software defined radio experiments (that and a
>http://www.digilentinc.com/Products/Detail.cfm?Prod=AIO1&Nav1=Products&Nav2=Accessory
>analog accessory pcb, or make your own A/D, D/A pcb).
>
>If anyone can point me to some sources, it would be appreciated.
>
>By the way, the TAPR class went well on Sunday in Tuscon, and now there
>are 28 more crazy hams out there who are really dangerous...
>
>The talk and slides will be posted when they do their web page for the
>2006 25th anniversary meeting.
>
>http://www.tapr.org
>
>Austin

An opamp-based allpass 90 degree phase shifter is pretty simple; 8 opamp sections, 8 caps, 24 resistors gives nice quadrature signals over the voice range. And simulating an R-C section in an FPGA is trivial. So it seems to me that one could do a nice Hilbert with a fairly small amount of FPGA resources by just mimicking the opamp circuit in discrete time. That would be a lot smaller than a FIR implementation.

Anybody done it this way?

John

Article: 108988
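One way the "mimic the opamp circuit in discrete time" idea in the post above (article 108987) might be made concrete is sketched below. It rests on assumptions the post does not state: that each op-amp section is a first-order all-pass with the transfer function shown, and that the bilinear transform is used for the conversion (one common choice among several). The symbols f_s (sample rate), k, and c are introduced here for illustration.

```latex
\begin{align*}
H(s) &= \frac{1 - sRC}{1 + sRC},
\qquad
s \;\to\; 2 f_s\,\frac{1 - z^{-1}}{1 + z^{-1}},
\qquad
k = 2 f_s RC \\[6pt]
H(z) &= \frac{(1 - k) + (1 + k)\,z^{-1}}{(1 + k) + (1 - k)\,z^{-1}}
      = \frac{c + z^{-1}}{1 + c\,z^{-1}},
\qquad
c = \frac{1 - k}{1 + k} \\[6pt]
y[n] &= c\,\bigl(x[n] - y[n-1]\bigr) + x[n-1]
\end{align*}
```

Under these assumptions each section needs one multiply, two add/subtract operations, and two delay registers per sample, and |c| < 1 for any positive RC, so the section is stable. The bilinear transform warps the frequency axis, so the RC values of an analog design would likely need pre-warping before computing c, and coefficient quantisation in fixed point would have to be checked separately.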
Have you read: http://tinyurl.com/qugxf http://www.xilinx.com/xlnx/xweb/xil_tx_display.jsp?iCountryID=1&iLanguageID=1&sTechX_ID=pa_metastability&BV_SessionID=@@@@1476187725.1158705950@@@@&BV_EngineID=cccfaddikmdkkhhcefeceihdffhdfjf.0 and http://www.xilinx.com/bvdocs/appnotes/xapp094.pdf ? AustinArticle: 108989
Sorry to follow myself up. I made a mistake in the post (used max when intending to use min, though neither is incorrect), so I cancelled it and posted the corrected version moments later. Google groups seems to have honoured the cancel request, but another Usenet server I use has not. Sorry if you see two copies.Article: 108990
MM wrote:
> > Thanks. I tried but it doesn't work either.
>
> Post your MHS file...
>
> /Mikhail

Hi,

Here is a note I recorded for my own use after I had successfully inserted debugging information into a *.cdc file and used ChipScope correctly without error.

How to start the ChipScope procedure correctly:
1. Generate a project as usual, including all files containing the signals you want to debug;
2. Synthesize it without errors;
3. Start ChipScope Pro Core Inserter;
4. Edit the clock channel and the trigger/data channels without errors: the number of signals must be the same as you set on the previous pages, so you may have to go back several pages to check that they match. Black font is OK, red font is an error. You cannot go forward until you correct the error.
5. Quit the ChipScope Pro Core Inserter software, then click the 'Save' button to save the edited file as *.cdc. NEVER CLICK THE 'INSERT' BUTTON, otherwise it will cause a double-insertion error later.
6. Insert the *.cdc file into the project by adding it as a source file;
7. Run synthesis only;
8. Double click the *.cdc file in the project and check whether any signals need to be changed;
9. Steps 7-8 can be skipped if all debug signals are already included in the *.cdc file.
10. Compile to generate the bitstream file as usual. A bitstream containing the debugging information is then generated.

ChipScope has some limits on how signals are accessed:
1. An input pin must be accessed through its registered value;
2. An output pin should be accessed through its internal drive signal;
3. All extra debugging signals, i.e., signals that are added only for debugging, must be linked to an extra output pin to keep them from being optimized out. The extra output pin must be added in *.puf for debugging use only.

A lesson: never use more than 10 signals the first time, until you have got ChipScope working. It is a very complex system, and every step has a chance of setting off a mine.

After you have cleared the way the first time, you may add as many signals as you want. It is especially lucky, and better, if you have a helper who has experience. The manual is too detailed for a newbie to start from.

Weng

Article: 108991
Austin Lesea wrote:

Many thanks for your reply.

> Have you read:
>
> http://tinyurl.com/qugxf

Of course. It seems to be about the only source of the value of tau for Xilinx FPGAs I could find. Have I missed other TechXclusives that give the same data for Virtex4/Spartan etc?

> http://www.xilinx.com/xlnx/xweb/xil_tx_display.jsp?iCountryID=1&iLanguageID=1&sTechX_ID=pa_metastability&BV_SessionID=@@@@1476187725.1158705950@@@@&BV_EngineID=cccfaddikmdkkhhcefeceihdffhdfjf.0

That is the same thing, right?

> and
>
> http://www.xilinx.com/bvdocs/appnotes/xapp094.pdf

And that is yet another substantially identical copy, except in PDF?

I am very sorry if I missed it, but the article you refer to does not seem to give a value for T0 and does not address the case when the two clocks are not independent. What am I missing?

Article: 108992
I did not give a value for T0 because it does not affect MTBF very much.

The measurements were done with uncorrelated frequencies. If that is not the case, all bets are off. Except that, of course, metastability cannot occur more often than once per clock period or once per data change, whichever is the lower frequency.

I would be interested in your asynchronous environment.

Peter Alfke, Xilinx
====================
comp.arch.fpga.posting.acco...@googlemail.com wrote:
> Austin Lesea wrote:
>
> Many thanks for your reply.
>
> > Have you read:
> >
> > http://tinyurl.com/qugxf
>
> Of course. It seems to be about the only source of the value of tau for
> Xilinx FPGAs I could find. Have I missed other TexhXclusives that give
> the same data for Virtex4/Spartan etc?
>
> > http://www.xilinx.com/xlnx/xweb/xil_tx_display.jsp?iCountryID=1&iLanguageID=1&sTechX_ID=pa_metastability&BV_SessionID=@@@@1476187725.1158705950@@@@&BV_EngineID=cccfaddikmdkkhhcefeceihdffhdfjf.0
>
> That is the same thing, right?
>
> > and
> >
> > http://www.xilinx.com/bvdocs/appnotes/xapp094.pdf
>
> And that is yet another substantially identical copy, except in PDF?
>
> I am very sorry if I missed it, but the article you refer to does not
> seem to give a value for T0 and does not address the case when the two
> clocks are not independent. What am I missing?

Article: 108993
Peter Alfke wrote: Many thanks for your reply. > I did not give a value for T0 because it does not affect MTBF very > much. Certainly not as much as tau, though it can make the difference between being comfortable using a synchroniser with just one flip-flop and needing two. Even the "trivial" upper bound for V2Pro is something like 0.2ns, so with a 10ns clock it increases MTBF by a factor of 50. I would hope that T0 is much smaller than 0.2ns but have no way of knowing for certain. Would I be right in thinking that T0 cannot be measured with any accuracy? > The measurements were done with uncorrelated frequencies. If that is > not the case, all bets are off. Except that, of course, metastability > cannot ocur more often than once per clock period or once per data > change, whichever is the lower frequency. So are you saying that the upper bound I came up with is correct? I would certainly be pleased if you were. > I would be interested in your asynchronous environment. I will not be the final user of the synchroniser for which I need to know the MTBF. I need to allow for the possibility that the two clocks will be produced by the same DCM and might therefore be synchronous.Article: 108994
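The one-flip-flop versus two-flip-flop choice mentioned in the post above (article 108993) could be sketched as a parameterised register chain. This is a generic illustration, not the poster's synchroniser: the entity name, port names, and the `STAGES` generic are hypothetical, and it assumes a single-bit signal that is already registered in its source domain.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical sketch: names and the STAGES generic are illustrative only.
entity sync_bit is
  generic (
    STAGES : integer range 2 to 8 := 2  -- flip-flops in the synchroniser chain
  );
  port (
    clk_dst  : in  std_logic;  -- destination-domain clock
    async_in : in  std_logic;  -- single bit, registered in the source domain
    sync_out : out std_logic   -- same bit, resynchronised to clk_dst
  );
end entity sync_bit;

architecture rtl of sync_bit is
  signal chain : std_logic_vector(STAGES - 1 downto 0) := (others => '0');
begin
  process (clk_dst)
  begin
    if rising_edge(clk_dst) then
      -- chain(0) is the flip-flop that may go metastable; each later
      -- stage re-samples the previous one a full clock period later.
      chain <= chain(STAGES - 2 downto 0) & async_in;
    end if;
  end process;

  -- Only the last stage is visible to the rest of the design.
  sync_out <= chain(STAGES - 1);
end architecture rtl;
```

In terms of the MTBF formula, each additional stage adds roughly one destination clock period (less routing delay and setup time) to the resolution time t, at the cost of one more cycle of latency. How many stages are enough depends on tau and T0 for the actual device, which is exactly the data being asked about in the thread.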
comp.arch.fpga.posting.account@googlemail.com wrote:
> Peter Alfke wrote:
>
> Many thanks for your reply.
>
> > I did not give a value for T0 because it does not affect MTBF very
> > much.
>
> Certainly not as much as tau, though it can make the difference between
> being comfortable using a synchroniser with just one flip-flop and
> needing two. Even the "trivial" upper bound for V2Pro is something like
> 0.2ns, so with a 10ns clock it increases MTBF by a factor of 50. I
> would hope that T0 is much smaller than 0.2ns but have no way of
> knowing for certain. Would I be right in thinking that T0 cannot be
> measured with any accuracy?
>
> > The measurements were done with uncorrelated frequencies. If that is
> > not the case, all bets are off. Except that, of course, metastability
> > cannot ocur more often than once per clock period or once per data
> > change, whichever is the lower frequency.
>
> So are you saying that the upper bound I came up with is correct? I
> would certainly be pleased if you were.
>
> > I would be interested in your asynchronous environment.
>
> I will not be the final user of the synchroniser for which I need to
> know the MTBF. I need to allow for the possibility that the two clocks
> will be produced by the same DCM and might therefore be synchronous.

Let's look at the basics:

A flip-flop has an undefined output delay when the D input changes within a very tiny portion of the set-up time window, and the delay gets longer the closer that change is to the center of the tiny window. For a 3 ns extra delay I measured (indirectly) this tiny window as a small fraction of a femtosecond.

Expressed this way, MTBF and the data and clock frequencies fall out of the equation, and the behavior looks as if it were deterministic. So I consider this a basic figure of merit of the flip-flop.

If your two frequencies are correlated, you may have a very hard time calculating the probability that the two edges ever get that close. They may always be very close, or they may never be close at all. Especially if you are exposed to, or if you rely on, jitter...

Peter Alfke

Article: 108995
> Peter Alfke wrote:
> > Vessumesh, if you refuse to answer specific helpful questions, then I
> > suggest you figure this out yourself, and do not bother this newsgroup.
> > Peter

Sorry Peter, but I did not mean that. Sorry for the confusion.

I am using a V4LX60 for my design. The requirement is to add two 37-bit numbers and do some combinational logic based on the result. The total time allowed is 20 ns. The adder takes very little time; the full logic itself takes around 4 ns of delay. The main problem is the routing delay. I forgot to tell you that it is a block-RAM-based design. It uses 128 BRAMs from the V4LX60 (implementing a 16-port RAM), and it uses the block RAMs in a scattered manner. So now I have placed this block in the central region, and the final routing to the block RAMs is taking a lot of delay.

In the previous version there was no combinational logic after the adder and I met timing. But not now. What I was asking about is adding registers to latch the output of the adder. I thought it would be good for PAR to see two paths instead of one path from a source FF to a destination FF.

Also, Ray, there are 32*16 such signals. Is it possible to manually route all those signals? I think pipelining is not possible, since this is part of a pipeline stage of a processor, which expects the result in the same cycle. So pipelining is not an option.

> It isn't the carry chain causing the problem. The problem comes about
> from using many levels of logic (ie the signal goes through lots of
> LUTs) between the flip-flops plus the propagation delay associated with
> the carry chain.

Ray, I was asking whether breaking the above long path into separate parts, using the +ve and -ve edges of the clock, could help the tool achieve a better PAR result.

Thanks and regards
Sumesh V S

Article: 108996
TDMOE is a standard that converts TDM (T1/E1) to Ethernet.. used by Asterisk and the like.Article: 108997
I was updating a CPU design I did a few years ago and I was a bit disappointed in the results I see. The CPU was originally targeted to an Altera ACEX part which is 5 volt compatible (to give you an idea of its age). I did my own CPU because Altera does not support their NIOS for that family. I spent a fair amount of time optimizing the architecture to be easy to implement in 4 input LUTs and other basic elements found in FPGAs. I coded it up for the ACEX async memories and got it running. If memory serves me, it clocked in at 55 MHz max and I used it at 40 MHz. Currently I wanted to look at how fast it might run if I redid it for a current FPGA architecture using synchronous memories. I compiled it for a Spartan 3 and got the speed up to 77 MHz using less than 10% of an XC3S400 (315 slices). I am not impressed with the speed. I expected a much larger increase and had hoped for operation at over 100 MHz. I checked the timing analyzer output and the signal paths are pretty much what I expected, no oddball logic generation and I got carry chains where I wanted them. The slow paths have a few long route times, so although it may approach 100 MHz with careful floorplanning, I don't think this is worth the effort compared to the >> 100 MHz CPU cores you can get from the FPGA vendors. I was wondering if this small speed up is typical of improvements from one or two generations difference in FPGAs? The ACEX parts are designed for economy, not for speed, just like the Spartans. When I did the initial design 3 or 4 years ago, the ACEX parts were old news then! Given that there was nothing in the design that is tailored for one FPGA family over another, I guess I expected more like a 2X speedup in the current technology chip. Isn't that reasonable given the vast difference in the timing specs in the data sheets?Article: 108998
Antti wrote:
> Jim Granville wrote:
>
> > betterone11@gmail.com wrote:
> > > fpgaman wrote:
> > >
> > >>"http://www.latticesemi.com/products/intellectualproperty/latticemico32"
>
> finally - a 100% Eclipse-+GNU based SoC system with open-source RTL
> that just works.

I was looking at the open source agreement and one paragraph strikes me as a bit odd.

Appendix C

3. The Provider grants to You a personal, non-exclusive right to use object code created from the Software or a Derivative Work to physically implement the design in devices such as a programmable logic devices or application specific integrated circuits. You may distribute these devices without accompanying them with a copy of this license or source code.

It looks like the only right granted to the object code created is to use it in an ASIC or FPGA. Am I just missing the point or does this keep you from using this for any other purpose? Or would there be no point to any other purpose? I am not real clear on which software the license is actually talking about.
Article: 108999
Sumesh,
I have a special place in the dungeon for people who ask questions where they leave the most important details out, and tell us afterwards. "O, by the way..."

You started by mentioning adders and long carry chains, which -as we know by now- are completely irrelevant to your problem. You have a big routing mess, and you are not allowed to pipeline. Tough luck!

I think Ray has the best possible advice, but I do not see an easy solution. Look at how you arrange your Dual-Port RAMs, and how you can exchange data between them. Are there any unexplored addressing tricks?

Have you looked at Virtex-5LX devices? They can perform not only arithmetic, but also logic in the DSP slice (also called the multiplier-accumulator). And they are available, as I posted yesterday (funny, neither praise nor outrage in the ng. Everyone asleep?)

Good luck, you may need it!
Peter
======================
vssumesh wrote:
> > Peter Alfke wrote:
> > > Vessumesh, if you refuse to answer specific helpful questions, then I
> > > suggest you figure this out yourself, and do not bother this newsgroup.
> > > Peter
> Sorry Peter, but i did not mean that. Sorry for the confusion.
> I am using v4LX60 for my design. And there is a requirement of adding
> two 37 bit no and doing some combinational logic based on that. The
> total time is 20ns. The adder is taking very little time, the full
> logic itself is taking around 4ns delay. But the main problem is with
> routing delay. I forgot to tell you that it is a block RAM based
> design. And it uses 128 BRAM frm v4lx60(implemented a 16 port RAM).
> Also it uses the block RAMS in a scattered manner. So now i have placed
> this block in the central region. So the last routing to the block
> RAMis taking lot of delays.
> In the previous version there was no combinational logic after the
> adder and i got the timig correctly. But not now.
> What i was asking is to add registers to latch the output of adder.I
> thought like it would be good for the PAR to see two paths insted of 1
> path from a source FF to destination FF. Also Ray there is 32*16 such
> signals. Is it possible to manually route all those signals. I think
> the pipeling is not possible since this is part of a pipeline stage of
> a processor. Which expects the result in the same cycle. So pipelining
> is not an option.
> > It isn't the carry chain causing the problem. The problem comes about
> > from using many levels of logic (ie the signal goes through lots of
> > LUTs) between the flip-flops plus the propagation delay associated with
> > the carry chain.
> Ray i was asking that if we brake the above long line into separate
> parts using the +ve and -ve edge of the clocks is it possible to help
> tool for a better PAR.
> Thanks and regards
> Sumesh V S
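
For readers following the +ve/-ve edge question quoted above, here is a rough back-of-the-envelope sketch of how an intermediate register divides the 20 ns budget. It assumes a 10 ns clock with a 50% duty cycle (the thread only says that 20 ns is double the clock period) and a purely hypothetical 0.5 ns of combined setup and clock-to-out overhead for the added register. The function name and parameters are illustrative and do not come from any tool.

```python
def split_path_budget(clock_ns, cycles, split_at_ns, reg_overhead_ns=0.0):
    """Divide a multi-cycle path budget at an intermediate register.

    clock_ns        : clock period in ns (assumed 10 ns here)
    cycles          : clock periods allowed for the whole path
    split_at_ns     : time after launch at which the extra register captures
    reg_overhead_ns : hypothetical setup + clock-to-out cost of that register
    Returns (first_segment_ns, second_segment_ns, usable_total_ns).
    """
    total_ns = clock_ns * cycles
    if not 0 < split_at_ns < total_ns:
        raise ValueError("split point must lie inside the path budget")
    first_ns = split_at_ns              # source FF to the added register
    second_ns = total_ns - split_at_ns  # added register to destination FF
    # The split does not create time: the two segments still sum to the
    # original budget, and the new register consumes part of it.
    usable_ns = total_ns - reg_overhead_ns
    return first_ns, second_ns, usable_ns


# Falling-edge register 5 ns after the rising-edge launch, 20 ns total.
first, second, usable = split_path_budget(10.0, 2, 5.0, reg_overhead_ns=0.5)
print(first, second, usable)  # 5.0 15.0 19.5
```

Under these assumptions the split yields a 5 ns segment and a 15 ns segment, and the time left for logic plus routing drops slightly below 20 ns. Whether the tighter, shorter segments actually help PAR depends on placement, which this arithmetic does not model.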