On Jul 22, 11:53 am, "sdaau" <sd@n_o_s_p_a_m.n_o_s_p_a_m.imi.aau.dk> wrote:
> I am trying to implement a custom counter (with clock and enable inputs);
> synthesis and behavioral & post-translate simulation pass just fine (using
> ISE WebPack 13.2). On post-map simulation, I get this:
>
> at 271179 ps(5), Instance /my_counter_test/UUT/c_0/ : Warning: /X_FF SETUP
> High VIOLATION ON CE WITH RESPECT TO CLK;
>   Expected := 0.428 ns; Observed := 0.144 ns; At : 271.179 ns
> <snip>
>
> Now, the most obvious thing would be to insert a delay of at least
> 0.428 - 0.144 = 0.284 ns between c_0.clk and c_0.ce (or between c_0.clk and
> wclk),

No, the most obvious thing would be to check your testbench and validate that your inputs meet the timing requirements, because that's where the problem likely lies.

> and I guess then the timing violation would be gone, is that
> correct?

No, it is not correct... unless you're only interested in covering up the problem and pushing it down the road to be fixed later.

> However, the problem is that I would not want to move the first clk after
> enable in the next period using the state machine - and I have no idea how
> to otherwise implement such a delay of ~0.3 ns.

In FPGAs, you can't implement controlled time delays. Delay lines are not a primitive element in the device.

> I was thinking that timing constraints in the .ucf file would help

Timing constraints should have already been specified, but if you haven't done so yet, then yes, you should specify them.

> So I was wondering - what would be the appropriate method to handle these
> timing violations? And have I understood the above situation correctly?

I'm guessing, based on what you described from the error message to signals in your design, that you may understand the failing path, but what you're not understanding is what really needs to be fixed. The problem could very likely be in your testbench rather than the design, but below I've listed the basic steps you need to follow:

1. Did you enter setup time constraints for all inputs? Did you set up clock-to-output delay time constraints for all outputs? (Note: for your particular problem, the cause is likely on the 'input' side.)

2. What is the basis for the time constraints in #1? The correct answer to this question is the datasheet(s) of any device(s) that are connected to the FPGA.

3. Are you sure you used the datasheet(s) timing constraints properly? The setup time (Tsu) available at the FPGA will be the clock period (T) less the clock-to-output time (Tco) of the external device and any clock skew (Tskew). In other words, the UCF file needs to specify a setup time constraint of Tsu = T - Tskew - Tco. Repeat for each input. Do a similar procedure for FPGA outputs.

4. Did the FPGA's timing report state that it meets all timing constraints? The correct answer here is 'yes'. If not, iterate #1-4 until you have the correct answers to each question.

On the assumption that you've properly made it through #1-4 (and assuming that there are no clock domain crossings), then your design is OK. Since the design is OK, this implies the cause of a timing failure must be the testbench. The basic triage here is:

1. Verify that the inputs to the FPGA meet the requirements listed in the FPGA's timing report output. As an example, if you have some input that is generated synchronously, like this...

  Some_Inp <= Blah_Blah_Blah when rising_edge(clock);

Then 'Some_Inp' will be transitioning 1 delta cycle (i.e. 0 ns) after the rising edge of 'clock'. That will never meet any non-zero hold time requirement that the FPGA timing report specified. Maybe the testbench delays the clock like this...

  Some_Inp <= Blah_Blah_Blah when rising_edge(clock);
  Clock_To_Fpga <= clock;

Now the FPGA will see 'Some_Inp' and 'clock' transition at the exact same time. Think that will meet either a setup or a hold time requirement?

2. Although not relevant to your current problem, one would also want to verify that you're sampling outputs at the appropriate time as well. Usually, though, this is not a problem... if you did have a mistake here, it would show up as a functional failure reported by the testbench, not a timing error reported by the post-route FPGA design.

Since you didn't mention anything about multiple clocks in your design, I've assumed that the design is a single-clock design. However, if there are multiple clocks, then the error you reported could be because the clock enable input is generated in one clock domain and used to enable your counter, which counts in another clock domain. If that's the case, then your design will fail; the solution is to resynchronize, with a single flip-flop, the output from the source domain into the counter's clock domain. That resynchronized signal will be used to enable the counter.

Kevin Jennings

Article: 152226
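The single-flip-flop resynchronization described in the last paragraph above might look something like the following minimal VHDL sketch; the entity name, port names, and widths are invented for illustration:

  library ieee;
  use ieee.std_logic_1164.all;
  use ieee.numeric_std.all;

  entity enable_sync_counter is
    port (
      cnt_clk    : in  std_logic;   -- the counter's clock domain
      enable_src : in  std_logic;   -- enable generated in another clock domain
      count      : out std_logic_vector(7 downto 0));
  end entity enable_sync_counter;

  architecture rtl of enable_sync_counter is
    signal enable_sync : std_logic := '0';
    signal cnt         : unsigned(7 downto 0) := (others => '0');
  begin
    process (cnt_clk)
    begin
      if rising_edge(cnt_clk) then
        enable_sync <= enable_src;  -- resynchronizing flip-flop in the destination domain
        if enable_sync = '1' then
          cnt <= cnt + 1;           -- the counter only ever sees the resynchronized enable
        end if;
      end if;
    end process;
    count <= std_logic_vector(cnt);
  end architecture rtl;

Note that the counter then reacts to the enable one clock after the source domain raised it, which is exactly the "next cycle" behaviour discussed elsewhere in this thread.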
I'm so confused about the setup time of BUFGMUX. In the Spartan-6 datasheet, this spec is defined relative to the rising edge. But if the structure of BUFGMUX is like the one described at

http://www.design-reuse.com/articles/5827/techniques-to-make-clock-switching-glitch-free.html

then the setup time should be defined relative to the falling edge, because the coupled register pair are both triggered on the clock's negative edge. Why? What is the actual structure of the BUFGMUX in Spartan-6? Thanks a lot.

Article: 152227
> Hi all,
>
> * wclk and wenbl are the 'master' signals, and they are synchronous (they
> both rise at exactly the same time)

Nothing in post-route rises at exactly the same time. Are these input signals driven from your testbench? If so, you need to spec a hold time from wclk -> wenable and change your testbench to add this. Clock enables are derived from the clock, so they will have a clk->Q delay that gives them hold time. The easiest way to model this is to resync the wenable to the falling edge of wclk.

The scary thing is that I think your simulation is catching the enable on the same wclk edge that creates the wenable. If that's so, then everything is happening one cycle before it should. In real life, if a clk creates an enable, then the enabled act occurs on the next clock.

John

---------------------------------------
Posted through http://www.FPGARelated.com

Article: 152228
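A minimal sketch of the falling-edge resync John suggests, assuming the 50 MHz wclk from the thread; 'stimulus_enable' and the timing values are invented for illustration:

  library ieee;
  use ieee.std_logic_1164.all;

  entity tb_enable_timing is
  end entity tb_enable_timing;

  architecture sim of tb_enable_timing is
    signal wclk            : std_logic := '0';
    signal wenbl           : std_logic := '0';
    signal stimulus_enable : std_logic := '0';
  begin
    wclk <= not wclk after 10 ns;   -- 50 MHz clock, 20 ns period

    -- raw stimulus: raise the enable some time into the run
    stimulus : process
    begin
      wait for 95 ns;
      stimulus_enable <= '1';
      wait for 200 ns;
      stimulus_enable <= '0';
      wait;
    end process stimulus;

    -- re-time the enable to the falling edge of wclk, so it reaches the
    -- DUT half a period away from the rising edge that samples it,
    -- giving it both setup and hold margin
    retime : process (wclk)
    begin
      if falling_edge(wclk) then
        wenbl <= stimulus_enable;
      end if;
    end process retime;

    -- instantiate the DUT here and drive it with wclk/wenbl
  end architecture sim;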
Brian Drummond <brian@shapes.demon.co.uk> writes:

> On Fri, 22 Jul 2011 01:22:44 -0500, aibk01 wrote:
>
>> The build is successful. I can download the bit file and run it. I can
>> see the data being sent via the FSL bus on the hyperterminal by printing
>> the sent values.
>>
>> Now after the values are sent there is no return of data. What should I
>> do now?
>
> Simulate.

That can be easier said than done when there's EDK involved - it usually means bringing up a simulation of the whole system, booting the simulated processor(s), running the test software and grovelling through an awful lot of waveforms you don't fully understand.

I've had better luck using ChipScope to debug these kinds of things on hardware (and this from a guy whose first answer is usually "Simulate" :)

Cheers,
Martin

-- 
martin.j.thompson@trw.com
TRW Conekt - Consultancy in Engineering, Knowledge and Technology
http://www.conekt.co.uk/capabilities/39-electronic-hardware

Article: 152229
On Mon, 25 Jul 2011 09:46:01 +0100, Martin Thompson wrote:

> Brian Drummond <brian@shapes.demon.co.uk> writes:
>
>> On Fri, 22 Jul 2011 01:22:44 -0500, aibk01 wrote:
>>
>>> The build is successful. I can download the bit file and run it. I can
>>> see the data being sent via the FSL bus on the hyperterminal by
>>> printing the sent values.
>>>
>>> Now after the values are sent there is no return of data. What should
>>> I do now?
>>
>> Simulate.
>
> That can be easier said than done when there's EDK involved - it usually
> means bringing up a simulation of the whole system, booting the
> simulated processor(s), running the test software and grovelling through
> an awful lot of waveforms you don't fully understand.
>
> I've had better luck using ChipScope to debug these kinds of things on
> hardware (and this from a guy whose first answer is usually "Simulate"
> :)

I have to agree that simulation with an EDK design can be a bit painful, and requires a full ISim rather than the Lite version (or ModelSim). But with a bit of creativity to generate the smallest test case, it can be useful, especially when chasing bugs in the EDK-generated code. It's worth having the tool in the arsenal, even if it's rarely used. (I tend to relegate ChipScope to that role, but agree you sometimes do really need it.)

- Brian

Article: 152230
Hello,

Do you have the XC3S200 or XC3S1000 on that board? If you got it from Digilent you would have had the option to get it with the 1000, and it might make a difference. I also have the Spartan 3 Starter Kit personally, with the 1000. I've lent it to somebody in the office, but I don't think anyone is using it now, so I might see if I can get OpenRISC to work on it.

As Julius said, there is the orpsoc project on OpenCores which has all of the Linux makefiles you need almost ready to go. If you don't need the external memory, and can run on block RAM, then all you need to do is update the makefiles and pin assignments, make a new set of design defines for your board and oscillator frequency, probably update the clock generator to get the right frequencies out of the PLL, and include the "ram_wb" core in the defines.

The external SRAM would not be too much harder to get included, but you would need a Wishbone controller written for it, which doesn't seem to exist in the IP. I asked somebody about exactly this on the OpenCores forums last week, and received some code very close to what I needed for another board; I just got it updated and am getting ready to test it out. I think it may eventually get contributed back for anyone who needs it. Otherwise I'm willing to share anyway, so just let me know.

Alternatively, Aeroflex/Gaisler has the LEON3 soft-core CPU (based on SPARC), which they offer for free if you don't need the fault-tolerant version. They have a board support package for the Spartan 3 Starter Board ready to go out of the box, they give pretty good instructions on how to get it running, and it can be debugged directly through the stock Xilinx parallel or USB cable if that's what you happen to have. Once again, if you have the 200 version of that board, you might be out of luck, as their BSP supports the 1000, and I don't know if it would fit in the 200 or not.

Article: 152231
Gabor,

Ok awesome, thanks for the clarity. I have never designed a system in this configuration, which is why I was asking :)

-D

Article: 152232
I would be hesitant to refer to a non-Xilinx diagram for the internal structure of the BUFGMUX block. Plus, the author is a technical staff member at Altera, so he's probably writing either:

a. generically, or
b. about an Altera FPGA.

I'd pay attention to whatever is in the Spartan-6 User Guide and datasheet, and forget about whatever you read in this article (pull some concepts from it, yes, but don't make it your new religion about the S6).

Article: 152233
Hi all,

I want to calculate a simple formula, including multiply and divide operations. I use the Verilog language to program the FPGA.

Can I use the multiplication (*) and division (/) operators? Or do I have to write the code for a multiplication algorithm like Booth's?

Regards

Article: 152234
On 7/25/2011 9:37 AM, ECS.MSc.SOC wrote:
> Hi all,
>
> I want to calculate a simple formula, including multiply and divide
> operations. I use the Verilog language to program the FPGA.
>
> Can I use the multiplication (*) and division (/) operators? Or do I
> have to write the code for a multiplication algorithm like Booth's?
>
> Regards

That depends on whether you expect the math to be done in the hardware or at compile time. If it's the latter, you can do whatever you'd like. If the former, then it'll depend. Most FPGA families have multipliers, and the tools are smart enough to use those multipliers to perform a multiply when you specify one. Divides (by other than a power of 2) are a pain, and always require serial algorithms to do them. You'd be well served trying to replace that divide with a reciprocal multiply if possible.

-- 
Rob Gaddi, Highland Technology
Email address is currently out of order

Article: 152235
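To make the reciprocal-multiply suggestion concrete, here is a rough worked example (the constant and bit widths are chosen arbitrarily for illustration). To divide by 10 with 16 fractional bits, precompute round(2^16 / 10) = 6554. Then:

  x / 10  ~=  (x * 6554) >> 16

For x = 1000 this gives 6554000 >> 16 = 100. Since 6554/65536 = 0.1000061..., the computed quotient creeps slightly high, and for this particular constant it stops matching exact integer division somewhere above x = 16000 or so, so the rounding and the valid input range need checking before relying on it. The payoff is that one multiplier and a shift replace an iterative divider.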
"ECS.MSc.SOC" <mahdiyar.sarayloo@gmail.com> wrote: >Hi all > >I want to calculate a simple formula, including multiply and division >operands. I use Verilog language to program FPGA. > >Can I use the sign of Multiplication (*) and Division (/)? Or I have >to write the code of a Multiplication algorithm like Booth? That depend on the synthesis tools. I guess with most modern tools you can use the * and / sign. How it gets mapped to the hardware depends on the target. You really should consult the manual of the synthesis tool on how this is handled. -- Failure does not prove something is impossible, failure simply indicates you are not using the right tools... nico@nctdevpuntnl (punt=.) --------------------------------------------------------------Article: 152236
On Mon, 25 Jul 2011 09:37:22 -0700, ECS.MSc.SOC wrote:

> Hi all,
>
> I want to calculate a simple formula, including multiply and divide
> operations. I use the Verilog language to program the FPGA.
>
> Can I use the multiplication (*) and division (/) operators? Or do I
> have to write the code for a multiplication algorithm like Booth's?

If you can stand the repetition: it depends on your tools, and what you're trying to do.

If you're multiplying integers then most tools will see a '*' and map it to a hardware multiplier (or synthesize one). I wouldn't trust a synthesizer to know how to do a fixed-point multiply that wasn't integer, although I would give it a whirl and see what happened.

Divide is so resource hungry that there are a tremendous number of system-level decisions to be made in implementing it: I would be astonished at a synthesizer that would see a '/' and automagically map it to some sort of a divide. You need to read up on the algorithms in question to see why divide is so different from multiply, and to get an idea of what you might have to do to make it work. (Although I expect that most FPGA manufacturers and/or toolchain vendors will have some sort of divide primitive wizard that you can at least use for the bit-slice portion, even if you have to wrap it with your own sequencing logic.)

-- 
www.wescottdesign.com

Article: 152237
Hi everyone,

I'm working on a conversion project where we needed to convert a PCI acquisition card to a PCI Express (x1) acquisition card. The project is essentially the same, except that the new acquisition card is a PCI Express endpoint instead of a standard PCI endpoint. The project is implemented on a Xilinx FPGA, but I don't think my issue is Xilinx-specific.

The conversion has worked fine on all levels except one: the read latency of PCI Express is about 4 times higher than standard PCI. For example, on the old product, it takes about 0.9 us to perform a 1-DWORD read. With the PCI Express product it takes about 3-4 us to perform a 1-DWORD read. I've seen this read latency both in real life (with a real board) and in VHDL simulation, so I don't think that this is a driver issue. Have any of you experienced similar performance issues?

Don't get me wrong, for me PCI Express is a major step ahead; the write-burst and read-burst performance is way better than standard PCI. Perhaps this is the reason: since most PCI Express cards are mostly used in burst transactions, the read latency does not really matter, so they sacrificed some read latency in order to obtain better performance.

Best regards

Article: 152238
On Mon, 25 Jul 2011 13:23:12 -0700 (PDT), Benjamin Couillard <benjamin.couillard@gmail.com> wrote:

> Hi everyone,
>
> I'm working on a conversion project where we needed to convert a PCI
> acquisition card to a PCI Express (x1) acquisition card. The project
> is essentially the same, except that the new acquisition card is a
> PCI Express endpoint instead of a standard PCI endpoint. The project
> is implemented on a Xilinx FPGA, but I don't think my issue is
> Xilinx-specific.
>
> The conversion has worked fine on all levels except one: the read
> latency of PCI Express is about 4 times higher than standard PCI. For
> example, on the old product, it takes about 0.9 us to perform a
> 1-DWORD read. With the PCI Express product it takes about 3-4 us to
> perform a 1-DWORD read. I've seen this read latency both in real life
> (with a real board) and in VHDL simulation, so I don't think that this
> is a driver issue. Have any of you experienced similar performance
> issues?

One-lane PCIe 1.x should be able to turn a word read around in about 250 ns, assuming not too much else is going on. Of course an excessive number of switches (or slow switches) or slow hardware on either end are obviously possible issues. But PCIe is certainly much faster than 3-4 us to read a word.

Article: 152239
On 25.07.11 13:11, GrizzlySteve wrote:
> and I don't know if it would fit in the 200 or not.

Definitely not. With limited peripherals (e.g. without an MMU) it is usable on a Spartan-3 1000. For a really small soft core (with gcc support), take a look at the ZPU.

Regards,
Bart

Article: 152240
Hi,

It's maybe not so commonly known that products using Actel secure FPGAs have been cloned already, many years ago (readback done by dark engineers at Actel). A few months ago a paper was published indicating that ProASIC3 (and other newer Actel FPGAs) have a master key that is known not only inside Actel but also to the dark side outside the company. There is at least one known successful cloning of an Actel ProASIC3-based product (readback assumed done at the Actel fab, not outside).

The following post has links to documents that show that Xilinx V2/V4/V5 are vulnerable as well:

http://it.slashdot.org/story/11/07/21/1753217/FPGA-Bitstream-Security-Broken

P.S. We do not have more info nor the master keys, please do not ask :)

Antti Lukats
http://trioflex.blogspot.com/

Article: 152241
On Jul 25, 9:23 pm, Benjamin Couillard <benjamin.couill...@gmail.com> wrote:
> The conversion has worked fine on all levels except one: the read
> latency of PCI Express is about 4 times higher than standard PCI. For
> example, on the old product, it takes about 0.9 us to perform a
> 1-DWORD read. With the PCI Express product it takes about 3-4 us to
> perform a 1-DWORD read. I've seen this read latency both in real life
> (with a real board) and in VHDL simulation, so I don't think that this
> is a driver issue. Have any of you experienced similar performance
> issues?

I have no actual experience of experimenting with this; however, I have been interested in a latency-sensitive device that may potentially use PCI-E, so I have been looking around for answers.

Have a look at this write-up of a comparison of HyperTransport and PCI-E. The authors claim around 250 nanoseconds (page 9) to read the first byte:

http://www.hypertransport.org/docs/wp/Low_Latency_Final.pdf

It would be interesting to hear what is causing you to see 3-4 us. That would kill off my potential project, so I am hoping to be able to match the results in the above paper. Could there be some inaccuracy in your measurements; how do you measure the latency?

Rupert

Article: 152242
When designing with PCI or PCIe you should really try to avoid reads as much as possible. What do you need them for anyway? In a multitasking operating system you are going to have microseconds of jitter on the software side in kernel mode, and tens of milliseconds in user mode, anyway. So I am wondering what the scenario is that benefits from sub-us latency for software reads?

Kolja
cronologic.de

Article: 152243
Generally speaking, PCI Express is much more prone to latency than conventional PCI, because packets have to be constructed, passed through a structure of nodes, and checked at most levels. Data checking isn't complete, and onward transmission can't start, until the last data arrives and the CRCs are checked. If you do a "read", this will have a packet outgoing and one coming back, so it is doubly worse. Better, if you can, is to do a DMA-like operation where data is sent from the data source, and your system is then interrupted to use the data in memory.

The latency will also vary from system to system because routing structures differ between motherboards. The amount of other things going on will also affect latency as different things contend for the data pipes.

Generally speaking, if you are trying to do anything real-time, it is something of a nightmare if you are planning on using the host motherboard processor for control functions. You can try to make the latency smaller by using smaller packet sizes, and this sometimes helps. Ultimately, if there is a real-time element to this, then putting the processing and/or control on your card is probably best for performance and accuracy.

John Adair
Home of Raggedstone2. The Spartan-6 PCIe Development Board.

On Jul 25, 9:23 pm, Benjamin Couillard <benjamin.couill...@gmail.com> wrote:
> Hi everyone,
>
> I'm working on a conversion project where we needed to convert a PCI
> acquisition card to a PCI Express (x1) acquisition card. The project
> is essentially the same, except that the new acquisition card is a
> PCI Express endpoint instead of a standard PCI endpoint. The project
> is implemented on a Xilinx FPGA, but I don't think my issue is
> Xilinx-specific.
>
> The conversion has worked fine on all levels except one: the read
> latency of PCI Express is about 4 times higher than standard PCI. For
> example, on the old product, it takes about 0.9 us to perform a
> 1-DWORD read. With the PCI Express product it takes about 3-4 us to
> perform a 1-DWORD read. I've seen this read latency both in real life
> (with a real board) and in VHDL simulation, so I don't think that this
> is a driver issue. Have any of you experienced similar performance
> issues?
>
> Don't get me wrong, for me PCI Express is a major step ahead; the
> write-burst and read-burst performance is way better than standard
> PCI. Perhaps this is the reason: since most PCI Express cards are
> mostly used in burst transactions, the read latency does not really
> matter, so they sacrificed some read latency in order to obtain
> better performance.
>
> Best regards

Article: 152244
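Some rough numbers behind this (back-of-the-envelope, assuming PCIe 1.x at 2.5 GT/s on a single lane): with 8b/10b encoding, each byte occupies 4 ns on the wire. A 1-DWORD memory read is a 3-DW request TLP plus a completion TLP; with framing, sequence number and LCRC, each packet is on the order of 20-24 bytes, so the round trip costs only about 200 ns of serialization. Read latencies of 3-4 us are therefore dominated by turnaround in the root complex, switches, and endpoint logic, not by the link itself.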
On Jul 26, 5:19 pm, John Adair <g...@enterpoint.co.uk> wrote:
> If you do a "read", this will have a packet outgoing and one coming
> back, so it is doubly worse. Better, if you can, is to do a DMA-like
> operation where data is sent from the data source, and your system is
> then interrupted to use the data in memory.

In the paper I posted a link to, I think the times are for an interrupt, or for DMA, not a software-initiated "read". Thanks for explaining the difference.

Rupert

Article: 152245
"Benjamin Couillard" <benjamin.couillard@gmail.com> wrote in message news:62427806-eeec-499b-a0f0-15ffafa0e3ab@w27g2000yqk.googlegroups.com... > Hi everyone, > > I'm working on a conversion project where we needed to convert a PCI > acquisition card to a PCI-express (x1) acquisition card. The project > is essentially the same except instead that the new acquisition card > is a PCI-express endpoint instead of being a standard-PCI endpoint. > The project is implemented on a Xilinx FPGA, but I don't think my > issue is Xilinx specific. > > The conversion has worked fine on all levels except one. The read > latency of PCI express is about 4 times higher than standard PCI. For > example, on the old product, it takes about 0.9 us to perform a 1- > DWORD read. With the PCI-express product it takes about 3-4 us to > perform a 1-DWORD read. I've seen this read latency both in real-life > (with a real board) and in VHDL Simulation so I don't think that this > is a driver issue. Do any of you have experienced similar performance > issues? Is it possible that time-stamping the data would disconnect you somewhat from the latency problem? Usually data can't be processed and presented real-time at those speeds anyway..Article: 152246
There is an utterly horrible VHDL howler on page 45 of the latest Xcell Journal. Two example codes for a register with reset are given:

  signal Q: std_logic := '1';
  ...
  async: process (CLK, RST)
  begin
    if (RST = '1') then
      Q <= '0';
    elsif (rising_edge CLK) then
      Q <= D;
    end if;
  end

This would be OK if the clock-edge function call had been 'rising_edge(CLK)' instead, and there was a semicolon after the last 'end'.

  signal Q: std_logic := '1';
  ...
  async: process (CLK)
  begin
    if (RST = '1') then
      Q <= '0';
    elsif (rising_edge CLK) then
      Q <= D;
    end if;
  end

This has the same errors as the first, but (despite the unchanged process name) is meant to infer a synchronously reset register. BUT ALAS AND ALACK! As written - at least in simulation - the reset will be applied on either edge of CLK. What XST would make of it one can only guess. It should be:

  signal Q: std_logic := '1';
  ...
  sync: process (CLK)
  begin
    if rising_edge(CLK) then
      if (RST = '1') then
        Q <= '0';
      else
        Q <= D;
      end if;
    end if;
  end process sync;

Such slip-shod work rather reduces one's confidence in the rest of the contents.

---------------------------------------
Posted through http://www.FPGARelated.com

Article: 152247
I should have added that this is also at:

http://forums.xilinx.com/t5/General-Technical-Discussion/VHDL-horror-in-Xcell-76/td-p/167622

---------------------------------------
Posted through http://www.FPGARelated.com

Article: 152248
On Jul 26, 2:04 am, Antti <antti.luk...@googlemail.com> wrote:
> Hi
>
> It's maybe not so commonly known that products using Actel secure
> FPGAs have been cloned already, many years ago (readback done by dark
> engineers at Actel). A few months ago a paper was published indicating
> that ProASIC3 (and other newer Actel FPGAs) have a master key that is
> known not only inside Actel but also to the dark side outside the
> company. There is at least one known successful cloning of an Actel
> ProASIC3-based product (readback assumed done at the Actel fab, not
> outside).
>
> The following post has links to documents that show that Xilinx
> V2/V4/V5 are vulnerable as well:
>
> http://it.slashdot.org/story/11/07/21/1753217/FPGA-Bitstream-Security...
>
> P.S. We do not have more info nor the master keys, please do not ask :)
>
> Antti Lukats
> http://trioflex.blogspot.com/

No one should ever assume the device security offered is 100% uncrackable. I used to know a guy who did legit "dark engineering" for government devices, and it was amazing to hear stories of drilling out holes in "secure devices" and extracting data using microscopic probes. Another engineer I knew had a collection of ICs embedded in epoxy - the company he worked for would shave them layer by layer to extract the design physically. (So no, going to an ASIC won't necessarily be 100% secure either.)

If man can make it, man can break it. The trick is to make it more expensive for the cloners to crack than it would be to just license, buy, or reverse-engineer another way. Besides, a lot of places still send bitstreams to China for programming during assembly, and at that point, adding bitstream security is a bit like setting the deadbolt on an already open, and empty, barn.

A better metric for FPGA bitstream security, or any security product, is the cost per breach and/or time per breach. Assume it can be breached, and pick a method where the [cost/time]/[breach] equation works out in your favor. BTW - this also means that devices with a master key are very bad, because the time per breach is only paid once, and you can rest assured someone besides the manufacturer has it already.

For an example of this done right, there is an IBM crypto chip that I believe is still unbroken - it has wires around the die that control power to the SRAM memory holding the crypto keys. If you drill into the package and cut one of the wires, the device loses its memory and becomes a dud. Obviously, you also have to do this work with the chip in-system, and running, for the same reason. This is the equivalent of the lock on an underground bank vault.

We will know FPGA vendors are equally serious when they offer a part with that level of security. Until then, it's pretty much the equivalent of the standard locks on our front doors: good enough to keep the riff-raff out, but not enough to keep the serious thieves away.

Article: 152249
Hi all,

First of all, thank you all for the very prompt responses, and sorry I couldn't respond earlier. I think the crux of the matter is summed up in @jt_eaton's comment:

> Nothing in post-route rises at exactly the same time.

... but I believe I should try to explain a bit what it is I'm looking after. A bit of a mammoth post follows - apologies in advance.

For one, I have only partial knowledge of HDL, but so far I manage somehow. My biggest problem is, basically, that when I start coding, I usually end up confused by the "things happening in the next clock cycle" thing. From my sequential programming background, when I see "a=2;" in C, I read that as "_after the program counter passes this statement, a holds value 2_". I try to relate that to HDL as in "_after the simulator passes this posedge, a holds value 2_" - so when I code stuff with this expectation, and I see 'action on next cycle' in the simulator, I get thoroughly confused. Then I do my best to defeat that in behavioral simulation - and usually I manage; then I come to post-map sim, and I realize most of that does NOT really work.

So, I decided to study this a bit on a simpler example; for instance, for a chip interface, I'll need a clocked counter with enable and reset. The concept would be simple: when enable is high, increase the count on the clock posedge; when reset is high, do not increase the count and set it to 0. For instance, that is exactly the kind of device which is given here:

http://www.asic-world.com/vhdl/first1.html#Counter_Design_Block

I modified that code a bit (counter_aw.vhd), and used my own testbench (test_twb.vhd), which I put here (along with some screenshots I'll refer to):

http://sdaaubckp.sourceforge.net/post/vhd_counter_aw/

The clock is 50 MHz (period 20 ns). The "Counter_Design_Block" is architecture 'behav' in the 'counter_aw.vhd' file (uncommented). This one works under behavioral simulation as I expect it to (aw_orig_beh_sim.png); that is, the reset of the counter to 0 and its increase happen at the posedges I expect. The results are the same for post-translate simulation (aw_orig_post-trans_sim.png) - however, the post-place-and-route sim (post-par_sim_delayed.png) is 'delayed' - e.g. from the posedge of enable to when cout becomes high is about 30 ns (~3 clock semiperiods); however, that is not the same delay throughout the sim run!

Since I encountered this before, I tried to code "my own" counter (architecture my_starting_point, commented), and I immediately made some mistakes - first, the final assignment to the output port was within an 'IF', so even behavioral simulation showed everything delayed to the next clock cycle (aw_startp_beh_sim_delayed.png); after fixing that, this counter behaves more or less the same as the previous example (aw_startp_beh_sim_ok.png) - but the problem with it is that it is not synthesizable (as far as I can see, the problem is using rising_edge twice on different signals in the same process).

So, after solving that, I basically ended up with the problem described in the original post - unfortunately, I cannot reconstruct the conditions with the X's (that appear approx. 4 ns after the rise of wclk) that I got in the original post (then again, that day my PC did crash a couple of times, so maybe that had something to do with problems with memory for ISE or ISim?).

Then I got to the inverter thing, removed some of the timing violations with it, and found that to avoid the final timing violations, 'reset' internally would have to be effectuated 'first', 'enable' second, and the 'clk' last - so I delayed the clk twice (four inverters) and enable once - and I got to architecture my_ending_point (commented).

With the my_ending_point code, the behavioral simulation (aw_endp_beh_sim_delayed_no-ucf.png) seems fine, except that the very first count after enable happens in the "next" clock cycle - however, the post-par sim (aw_endp_post-par_sim_delayed_ucf.png) shows that, in addition, there are glitches - and there is almost 10 ns of delay (the 'effectuation' of the count happens almost on the clk negedge)!! For the post-map sim (post-map_sim_delayed_ucf.png) this delay seems to be less (though still 5 < x < 10 ns), but the glitches are still there.

While I'm at the glitches, the "Xilinx Synthesis and Simulation Design Guide" notes:

> Glitches in Your Design
> When a glitch (small pulse) occurs ..., the glitch
> may be passed along ..., or it may
> be swallowed ... . ... To produce more
> accurate simulation of how signals are propagated
> within the silicon, Xilinx models this
> behavior in the timing simulation netlist.

When it says "Xilinx *models*", does it mean that the glitches will be present "by design" of the HDL code circuit - or is it something the simulator introduces? Meaning, should I try to eliminate them through design, or should I just be careful about whether they "propagate"?

Then again - I wasn't really aware of this until now - I was reading a bit more on this, and it turns out, from basics, that minimal configurations of combinatorial/unclocked circuits (Mealy/Moore?!) are *by default* glitchy, and one is advised to "buffer" the result with a (clocked) FF - which results in the actual 'effectuation' occurring on the next clock cycle; so maybe the glitches in the sim just try to illustrate this effect?

Anyways - I'm sure in my initial code I used to get somewhat less than 5 ns delay for post-map (which is why I'm slightly surprised at the above results), but I can't reconstruct that anymore. Which, of course, means I haven't done something right :)

I guess my question comes down to: what am I missing, so that I can get results somewhat like aw_orig_beh_sim.png in the post-par sim, but delayed by no more than a quarter period? That, for me, would be a confirmation that the engine should more or less work reliably on the chip as well - but is that a correct assumption? (If not, then I probably shouldn't bother getting such "ideal" post-map/par results, ideal as in "results almost like behavioral sim".)

I've tried putting in some timing constraints (aw_endp_counter.ucf), while trying to get rid of static timing and ISE warnings as well (the synthesizer doesn't like outputs of combinatorial logic [due to the use of inverters] being used as a clock) - but I'm not really sure what I'm doing, since, as far as I can remember, changing the constraint values didn't really result in much difference in the post-map/par simulation.

Well, I guess this is as detailed as I can formulate my problem for now...

> I presume you forced it to keep the inverters, otherwise they
> will usually optimize away. You might try with only one forced,
> in which case it will optimize the other by inverting the signal
> somewhere else. Or with a forced non-inverting gate.

Interesting trick about keeping only one forced - I just used "attribute KEEP" on all of the involved signals; that seems to have worked.

>> Now, the most obvious thing would be to insert a delay ...
>
> No, the most obvious thing would be to check your testbench and
> validate that your inputs meet the timing requirements, because that's
> where the problem likely lies.
> ...
>> I was thinking that timing constraints in the .ucf file would help
>
> Timing constraints should have already been specified, but if you
> haven't done so yet, then yes, you should specify them.

Got it - thanks to this comment, I started looking into timing constraints as ISE understands them (in the .ucf file), but I still cannot get a proper understanding of them.

> In FPGAs, you can't implement controlled time delays. Delay lines are
> not a primitive element in the device.

Got that too - but could one consider two inverters to behave as a somewhat controlled delay? As in, the actual delay obtained by them depends on how they end up being routed - but we can still know they'll insert, say, approx. 0.4 ns?

> I'm guessing, based on what you described from the error message to
> signals in your design, that you may understand the failing path, but
> what you're not understanding is what really needs to be fixed.

Exactly - this is 100% correct :)

> The problem could very likely be in your testbench rather than the
> design

That could indeed be the problem - @jt_eaton seems to agree...

> below I've listed the basic steps you need to follow:

Thanks for taking the time to write those up, @KJ, much appreciated!

> 1. Did you enter setup time constraints for all inputs? Did you
> set up clock-to-output delay time constraints for all outputs? (Note:
> for your particular problem, the cause is likely on the 'input' side.)

I didn't at first; then I tried, but as I said, I'm not sure I understand it. For instance, I have:

  OFFSET = IN 6 ns VALID 8 ns BEFORE "clk" RISING;

ISE draws a sort of a diagram, and the way I interpret the diagram, the above sentence should mean "do not allow a data signal synchronous with the rising edge of CLK to propagate outside of the 2 < x < 4 ns range"; which is likely not correct, since I couldn't perceive anything to that effect in the simulation results.

> 2. What is the basis for the time constraints in #1? The correct
> answer to this question is the datasheet(s) of any device(s) that are
> connected to the FPGA.

Well, I have the wrong answer, unfortunately :/ Essentially, I saw the above timing violations, and simply tried to 'translate' them into timing constraints (as I understood them above) - that probably was not the right way to do it. Other than that, I'm running the clock at 50 MHz, so I tried to make the testbench for that - and to make the timing constraints relate to a 100 MHz clock (as in "if it works at 100, it will work for 50 MHz too"); the device I'm intending to use this counter with, however, may require a much slower counter (kHz).

> 3. Are you sure you used the datasheet(s) timing constraints
> properly? The setup time (Tsu) available at the FPGA will be the
> clock period (T) less the clock-to-output time (Tco) of the external
> device and any clock skew (Tskew). In other words, the UCF file needs
> to specify a setup time constraint of Tsu = T - Tskew - Tco. Repeat
> for each input. Do a similar procedure for FPGA outputs.

Thanks for this - I'll need to chew on this a bit more; I wasn't aware of the "setup time constraint".

> 4. Did the FPGA's timing report state that it meets all timing
> constraints? The correct answer here is 'yes'. If not, iterate #1-4
> until you have the correct answers to each question.

Thanks for this too - I found Implement Design / Map / "Analyze Post-Map Static Timing"; at first it was complaining (showed red X's), then I got it to stop (but for the most part, I was just trying different numbers based on the messages I got, not sure what I actually did there :) ). Actually, now that I come back to it, I can see a fail:

> Timing constraint: TIMEGRP "couts" OFFSET = OUT 5 ns AFTER COMP "clk";
> ...
> Minimum allowable offset is 6.106ns.
> --------------------------------------------------------------------------------
> Paths for end point cout<11> (IOB.PAD), 1 path
> --------------------------------------------------------------------------------
> Slack (slowest paths): -1.106ns

I guess from this, if I put OFFSET = OUT 6.2 ns, it will pass? Or is there another way to force the synthesizer to conform to 5 ns?

> On the assumption that you've properly made it through #1-4 (and
> assuming that there are no clock domain crossings), then your design
> is OK.

Talking about clock domain crossings - would inverting a clock four times, and "declaring" that signal as a clock as well, constitute a clock domain crossing?

> Since the design is OK, this implies the cause of a timing
> failure must be the testbench. The basic triage here is:

Many thanks for writing this up as well :)

> 1. Verify that the inputs to the FPGA meet the requirements listed in
> the FPGA's timing report output. As an example, if you have some
> input that is generated synchronously, like this...
> Some_Inp <= Blah_Blah_Blah when rising_edge(clock);
> Then 'Some_Inp' will be transitioning 1 delta cycle (i.e. 0 ns) after
> the rising edge of 'clock'. That will *NEVER* meet any non-zero hold
> time requirement that the FPGA timing report specified.

Thanks for this (emphasis mine) - as can be seen in test_twb.vhd (from the link above), what I do is simply:

  ...
  wenbl <= '0';
  wreset <= '0';
  ...

... which, I guess, means "effectuate these signals in parallel / at the same time" - and thus the 0 ns transition you're speaking of?

> Maybe the testbench delays the clock like this...
> Some_Inp <= Blah_Blah_Blah when rising_edge(clock);
> Clock_To_Fpga <= clock;
> Now the FPGA will see 'Some_Inp' and 'clock' transition at the exact
> same time. Think that will meet either a setup or a hold time
> requirement?

I have not used the "when" syntax that much - but I'd answer (from my somewhat sequential programming perspective, and after the tips so far) like this:

* the Some_Inp part will "block" until the rising edge of clock; when the posedge of clock occurs, it will effectuate the next statement - however after a delta of 0 ns;
** that is, Clock_To_Fpga will be effectuated "now" / "in parallel" with the previous statement - that is, on the posedge of 'clk';
* the FPGA will see both Some_Inp and Clock_To_Fpga change at the "same time";
* since the FPGA expects that a setup time and hold time of minimum X ns has transpired from the moment Some_Inp changes to the moment 'Clock_To_Fpga' changes (and, I assume, activates sampling of Some_Inp)...

... hence, there will be a setup or hold timing violation - i.e. the time requirement will not be met. (?)

> 2. Although not relevant to your current problem, one would also want
> to verify that you're sampling outputs at the appropriate time as
> well. Usually, though, this is not a problem... if you did have a
> mistake here, it would show up as a functional failure reported
> by the testbench, not a timing error reported by the post-route FPGA
> design.

Would this be related to the glitches too? I.e. if glitches occur close to the posedge sampling clock transition, I may want to 'buffer' the output until the next negedge, for instance?

> Since you didn't mention anything about multiple clocks in your
> design, I've assumed that the design is a single-clock design.
> However, if there are multiple clocks, then the error you reported
> could be because the clock enable input is generated in one clock
> domain and used to enable your counter, which counts in another clock
> domain.

Could it be that the synthesizer recognizes the "twice inverted" clock signal as a clock from a second domain?

> If that's the case, then your design will fail; the solution
> is to resynchronize, with a single flip-flop, the output from the source
> domain into the counter's clock domain. That resynchronized signal
> will be used to enable the counter.

Would that resynchronization be like the 'buffering' for the minimal Moore/Mealy glitching mentioned above? If so, then it would 'delay' the 'effectuation' of values until the next clock cycle, right?

>> * wclk and wenbl are the 'master' signals, and they are synchronous (they
>> both rise at exactly the same time)
>
> Nothing in post-route rises at exactly the same time.

Thanks for that - I guess now I'm better aware of that; but when the thread started, I wasn't. Can this also be interpreted as "nothing in post-route *should* rise at exactly the same time" (as far as signals from the testbench are concerned)?

> Are these input
> signals driven from your testbench?

Yup.

> If so, you need to spec a hold time from
> wclk -> wenable and change your testbench to add this.

Many thanks for that - see, *that* I wasn't aware of... Will have to look that up.

> Clock enables are derived from the clock, so they will have a clk->Q delay
> that gives them hold time.

OK, that makes sense - much appreciated :)

> The easiest way to model this is to resync the
> wenable to the falling edge of wclk.

Makes a lot of sense now - will give it a shot. I know the answer is probably yes - but in that case, do I again have to worry about timing constraints?

> The scary thing is that I think your simulation is catching the enable on
> the same wclk edge that creates the wenable.

I think that is correct - actually, it seems it does perceive some delay between the wenable and the wclk, but (I guess) not enough.

> If that's so, then everything is
> happening one cycle before it should. In real life, if a clk creates an
> enable, then the enabled act occurs on the next clock.

Thanks for that - the occurring on the "next clock" was exactly what I wanted to avoid; and it seems, with all the "inverter delays" and such, what I managed to do is move everything to happen "one cycle before it should" :)

In any case, to sum up - while I'm starting to see why "update on next clock" is so important - is it still possible (or smart) to aim for updates occurring at least a semiperiod *before* the 'next' clock? (And this is simply for my own perceptual ease in reading simulation results: it would be easier for me to read if I get the value I expect in *this* cycle.)

Thanks again for the awesome guidance,
Cheers!

---------------------------------------
Posted through http://www.FPGARelated.com
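For reference, the fully synchronous pattern that the replies above converge on might look like the minimal sketch below (entity and signal names invented; the 12-bit width matches the cout<11> seen in the timing report). Everything is sampled on the rising edge, reset and enable are ordinary synchronous inputs, and the output comes straight from flip-flops, so any decode glitches have settled before the next edge - at the price of accepting the "value appears on the next cycle" behaviour rather than fighting it:

  library ieee;
  use ieee.std_logic_1164.all;
  use ieee.numeric_std.all;

  entity counter_en_rst is
    port (
      clk    : in  std_logic;
      reset  : in  std_logic;   -- synchronous reset, checked first
      enable : in  std_logic;
      cout   : out std_logic_vector(11 downto 0));
  end entity counter_en_rst;

  architecture rtl of counter_en_rst is
    signal cnt : unsigned(11 downto 0) := (others => '0');
  begin
    process (clk)
    begin
      if rising_edge(clk) then
        if reset = '1' then
          cnt <= (others => '0');
        elsif enable = '1' then
          cnt <= cnt + 1;
        end if;
      end if;
    end process;

    -- output driven directly from the count register: no combinatorial
    -- decode logic after the flip-flops, hence no glitches on cout
    cout <= std_logic_vector(cnt);
  end architecture rtl;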