True, crying is easy. The design is running at 200 MHz DDR. As crappy as the DCMs are in the ES parts, I don't think I want to try running a faster clock in the part to sample with. And I am already using 96% of the XC2V3000 part, so I don't have much logic left to play tricks with. I am going to get down in the part and go for performance on this one. The hand-routing solution from Carl works, so I'm off to the races.

Bryan

"Falk Brunner" <Falk.Brunner@gmx.de> wrote in message news:9vtr7v$hs8un$3@ID-84877.news.dfncis.de...
> "Bryan" <bryan@srccomp.com> schrieb im Newsbeitrag
> news:3c20c0ef$0$25796$4c41069e@reader1.ash.ops.us.uu.net...
> > ncd to hand route my couple nets. What I am building into a macro is a 16
> > bit FIFO, I have 16 of these FIFOs in the design and each one contains IOB
>
> The question is, how fast are the clocks for these FIFOs? If it is not too
> fast, you can use some oversampling methods to run everything on ONE clock.
> Also, if the frequency of these FIFOs is considerably high, you should think
> about a very small, fast synchronizer circuit to synchronize the incoming
> data to one clock domain and, again, run everything on one clock.
>
> It's quite easy to run out of clock nets, but it's engineering to solve the
> problem without crying that much ;-)
>
> --
> MfG
> Falk

Article: 37851
"Ray Andraka" <ray@andraka.com> wrote in message news:3C22757C.8EA86695@andraka.com... > I think using both in the same design, at least for a consultant doing designs for > a client, is a sin among sins. It requires the client to obtain and maintain two > tool flows if he wishes to do anything with the design without having to come back > to you. Pick one and stick with it for the whole design. I entirely disagree. FPGAs have to go on a board. The board is drawn in schematics...I don't know of any HDL board tools...so typically, clients already have a schematic package. Technically, you are already mixing tools...you have the front end synthesis, and the back end Xilinx tools...and there are many of them. Being able to do mixed input designs can vastly increase productivity, as well as allow for the inclusion of external designs done in the "other" language. No one has ever had a complaint with my doing this. In fact, using Abel as a source for schematic modules was very popular some 8 or so years ago...before Verilog/VHDL synthesis tools became "usable" for FPGAs. Also, when synthesis tools are rev'd, there inevitably are changes in the way the tool creates its output...and as such, won't give you consistent results, so you technically should be archiving the tools with the design...BUT...the vendors (typically) won't give you a forever license for THAT revision of the synthesis tools so you can do that...as well as trying to get multiple revisions of tools running on the same computer...you're better off archiving the whole damn computer! To each his own, but for me, I found mixed designs work great, are easier to read/understand and give the client as well as the designers, more flexibility and a better end product. There is ONE tool that is missing from schematics, and that's something like "grep"...it would be great to be able to compare two schematics and find out the differences visually... I do like that about text files... AustinArticle: 37852
Bret,

I'm still waiting for an answer to this; would you (or anyone who knows the answer) please respond... This was what I asked:

Bret,

Where are you assigning these attributes? You said in the "front end tools", yet Synplicity has a "syn_useioff" attribute that doesn't appear to matter...you still need the "-pr b" in the mapper. According to the Synplicity docs, there is no "iob" attribute...

Are you talking about a constraint file? That's really got nothing to do with the synthesis front-end tools...

Austin

Article: 37853
Do you need your large count with a resolution of 1? Quite often a large count doesn't need to be adjustable to the nearest count, but instead say a multiple of 32. If so, you can use one counter as a prescaler, doing a /32, then the other to count the prescaler count.

Article: 37854
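For reference, Ian's prescaler suggestion above might look like the following minimal Verilog sketch. The module and signal names are made up for illustration; the split chosen here is a /32 prescaler followed by a /31250 counter, since 32 * 31250 = 1,000,000 exactly.

// Hypothetical ~1,000,000-clock delay: /32 prescaler, then /31250.
module prescaled_delay (
    input  wire clk,
    input  wire rst,
    output reg  done                        // one-cycle pulse every 1,000,000 clocks
);
    reg [4:0]  pre;                         // free-running /32 prescaler
    reg [14:0] cnt;                         // counts prescaler carries, 0..31249

    wire pre_tc = (pre == 5'd31);           // prescaler terminal count
    wire cnt_tc = pre_tc && (cnt == 15'd31249);

    always @(posedge clk) begin
        if (rst) begin
            pre  <= 5'd0;
            cnt  <= 15'd0;
            done <= 1'b0;
        end else begin
            pre <= pre + 5'd1;              // wraps naturally at 32
            if (pre_tc)
                cnt <= cnt_tc ? 15'd0 : cnt + 15'd1;
            done <= cnt_tc;                 // registered terminal-count pulse
        end
    end
endmodule

The attraction is not flip-flop count (it still uses 20 bits of state) but that the long counter only advances once every 32 clocks, so its carry path can be treated as a multicycle path if timing is tight.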
> Copy the component declaration wizard code into your architecture
> code, and instantiate it.
>
> Leonardo generates an .edf file with the component as a black box.
>
> Maxplus2 reads a wizard-generated vhdl file to fill in the
> black-box component. Make sure the wizard files are in the
> same directory as the .edf file.

That is a workable band-aid, but did Altera explain the original synthesis problem?

-- Mike Treseler

Article: 37855
I see your problem, but we will not make the read operation combinatorial. There are user advantages to being clocked, but there are also cases, like yours, where it is a drawback. But the circuit implications on our side are such that we will stick with clocked read for the foreseeable future.

There are "slightly dirty" tricks, like using the falling edge as the read clock. Most likely this involves some clock XOR-gating, which requires finesse...

Sorry, Santa cannot help you.

Peter Alfke

Rob Finch wrote:

> I wish block ram had an async read option like distributed ram. How hard
> would this be to do? The problem I have is using the same port to perform
> both read and write operations for a cpu. The cpu always generates an
> address that is a registered output, registered on the clock edge. So just
> after the clock edge, the address is available. This works great for writes
> because the next clock edge can be used to write the data to the block ram.
> However it doesn't work for reads, because we want the next clock edge to
> latch the data into a cpu register. Instead, the read data isn't available
> until after the next clock edge. So 1) a wait state could be inserted for
> read operations (cuts performance in half). 2) we can use the address from
> the cpu as it is just before it's registered and use a second port of the
> block ram - means we have two address busses and the block ram can't be
> shared with another device, or twice as many block rams are required.
>
> Rob

Article: 37856
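For illustration, the falling-edge trick Peter mentions can be sketched as a behavioral Verilog RAM with the write port on the rising edge and the read port on the falling edge; whether a given synthesis tool maps this onto block RAM, and whether the resulting half-cycle read path meets timing, would have to be checked for the real part. The names and sizes below are made up.

// Hypothetical sketch only: rising-edge write, falling-edge read, so read
// data is ready before the CPU's next rising edge.
module halfcycle_ram (
    input  wire        clk,
    input  wire        we,
    input  wire [9:0]  addr,     // registered address from the CPU
    input  wire [15:0] wdata,
    output reg  [15:0] rdata
);
    reg [15:0] mem [0:1023];

    always @(posedge clk)        // normal rising-edge write
        if (we)
            mem[addr] <= wdata;

    always @(negedge clk)        // still a clocked read, but half a cycle later
        rdata <= mem[addr];
endmodule

In a real Virtex block RAM this would presumably be done by feeding the read port an inverted or gated clock, which is where the XOR-gating Peter mentions comes in.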
I wish the same thing as you do, and for the same reason...

I think Altera has a solution for this, but the use of their block RAMs is not as clearly documented as Xilinx's.

"Rob Finch" <robfinch@sympatico.ca> schreef in bericht news:vpBU7.36868$x25.3709356@news20.bellglobal.com...
> I wish block ram had an async read option like distributed ram. How hard
> would this be to do? The problem I have is using the same port to perform
> both read and write operations for a cpu. The cpu always generates an
> address that is a registered output registered on the clock edge. So just
> after the clock edge, the address is available. This works great for writes
> because the next clock edge can be used to write the data to the block ram.
> However it doesn't work for reads, because we want the next clock edge to
> latch the data into a cpu register. Instead, the read data isn't available
> until after the next clock edge. So 1) a wait state could be inserted for
> read operations (cuts performance in half). 2) we can use the address from
> the cpu as it is just before it's registered and use a second port of the
> block ram - means we have two address busses and the block ram can't be
> shared with another device, or twice as many block rams are required.
>
> Rob

Article: 37857
Hi, Frank.

What you describe is the classical double synchronizer (sped up by using alternate clock edges). At any reasonable clock rate, this will stop the propagation of metastable signals. So, go ahead.

Frohes Fest!

Peter Alfke, Xilinx Applications
===============================

Frank Papenfuss wrote:

> Dear FPGA community,
>
> I have a design that must cope with asynchronous input
> signals. Basically I have a WE pulse that gates a data
> vector into the chip. The WE signal is sampled by two
> FFs to ensure proper pulse detection. One FF is clocked
> by the positive edge of the system clock and
> one by the negative edge (I do not want to go
> into too much detail about why I must do this). The FFs
> that sample the pulse connect to the CE (clock enable)
> of the following FF to prevent the metastable state from
> actually propagating into the design. Since I have only
> simulated this so far I cannot say if it will really work
> inside the chip (which will be a XILINX FPGA).
>
> My question is: Has anyone experience with using CE as
> a means to prevent a metastable state from propagating
> further?
>
> Tool Setup:
> -----------
> Simulation & Synthesis: SYNOPSIS Ver 1999.10
> Target Technology Mapping: XILINX Design Manager V3.3.08i
> Target Part: XILINX VirtexE XCV300E-8-PQ240
>
> I would also be grateful if you could point me to some
> electronically available article, technote or appnote
> about this topic, if available.
>
> Thanks in advance,
> FRANK

Article: 37858
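As a generic illustration of the double synchronizer being discussed (not Frank's exact circuit, which also uses the falling clock edge), a minimal Verilog sketch; all names below are invented:

// Two-stage synchronizer for an asynchronous WE pulse, with the settled
// result used as a clock enable for the data capture register.
module we_sync (
    input  wire       clk,
    input  wire       we_async,     // asynchronous write-enable pulse
    input  wire [7:0] d_async,      // data accompanying the pulse
    output reg  [7:0] d_captured
);
    reg we_meta, we_syn, we_syn_d;

    always @(posedge clk) begin
        we_meta  <= we_async;       // this stage may go metastable
        we_syn   <= we_meta;        // settled with very high probability
        we_syn_d <= we_syn;
    end

    wire capture = we_syn & ~we_syn_d;   // rising-edge detect on the clean signal

    always @(posedge clk)
        if (capture)                // the CE only ever sees a synchronous level
            d_captured <= d_async;  // the data itself must still be held long enough
endmodule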
Rick Filipkiewicz wrote:

> Frank Papenfuss wrote:
>
> > Dear FPGA community,
> >
> > I have a design that must cope with asynchronous input
> > signals. Basically I have a WE pulse that gates a data
> > vector into the chip. The WE signal is sampled by two
> > FFs to ensure proper pulse detection. One FF is clocked
> > by the positive edge of the system clock and
> > one by the negative edge (I do not want to go
> > into too much detail about why I must do this). The FFs
> > that sample the pulse connect to the CE (clock enable)
> > of the following FF to prevent the metastable state from
> > actually propagating into the design. Since I have only
> > simulated this so far I cannot say if it will really work
> > inside the chip (which will be a XILINX FPGA).
> >
> > My question is: Has anyone experience with using CE as
> > a means to prevent a metastable state from propagating
> > further?
>
> Frank,
>
> It is an unfortunate fact that if a signal from a source async to a
> clock is sampled on that clock, then there is always a chance that a
> metastable state could propagate arbitrarily far into your system.
>
> Metastability is a statistical thing and so all you can do is reduce the
> probability of its affecting your system to some very small number (or
> the MTBF >> time between you changing jobs).

The first flip-flop will undoubtedly go metastable occasionally. For the second flip-flop to go metastable, the first Q must transition just at the sensitive moment of the second flip-flop. That is very unlikely (but the probability is not zero). If the settling time margin from the Q of the first flip-flop to the D of the second flip-flop is reasonably long (5 ns or more), then the probability of the second Q behaving strangely will border on zero. If human life depends on the proper operation of this circuit, add another stage.

Peter Alfke

Article: 37860
Neat idea, but is it really worth it?

A synchronous binary counter with a capacity of counting to one million takes just 20 flip-flops, using the carry structure in modern FPGAs. That's 5 CLBs in Virtex or Spartan-II. And this design runs well above 100 MHz and requires zero creativity or even thinking.

Peter Alfke
===========================

Carl Brannen wrote:

> Re very long counter design...
>
> > In my design I need to make a synchronous counter that counts, let's
> > say, till 1000000. (Actual aim for counter is to build in a delay). I
> > do this by the use of integer type signals and with each clock'event I
> > add 1 till I reach the wanted 1000000. When I try to implement this
> > in an FPGA it consumes a very high amount of CLBs and it seems very
> > disastrous for the maximum reachable clock freq.
>
> Assuming that you don't care about the intervening counts, you can use SRL16s
> and SRL16Es to create relatively efficient large counters. And you don't have
> to deal with decoding LFSR values either.
>
> An SRL16 with its Q output brought back to its D input can be initialized (with
> an INIT attribute) to have only a single bit high. The others (of up to 17
> bits) are initialized to zero. As it clocks around, it produces a pulse
> every 17th clock.
>
> This puts a counter with a length of up to a little over 4 bits (i.e. log2(17))
> into a single LUT. That's 4x as efficient as regular counters, and you get a
> free registered "done" bit. You can gang these up, either by using the
> enables, or by ANDing the outputs of counters whose periods have no common
> divisor.
>
> Example with 5 SRL16/SRL16Es, gets within 5% of 10^6 clocks, uses only 7 LUTs:
>
> First SRL16 goes high every 17th clock. Its output connects to the enable
> input of an SRL16E that also is set for 17 clocks. The result: two bits that,
> when ANDed, produce a pulse every 17^2 = 289 clocks.
>
> Third SRL16 goes high every 15th clock. Its output connects to the enable
> input of an SRL16E that also is set for 15 clocks. The result: two bits that,
> when ANDed, produce a pulse every 15^2 = 225 clocks.
>
> Fifth SRL16 goes high every 16th clock.
>
> Since 17^2, 15^2, and 16 have no common divisors, the outputs of the five SRL16
> / SRL16Es can be ANDed together to produce a counter that pulses once every
> 17^2 * 15^2 * 16 = 1040400 clocks. This is in excess of the 1000000 (as was
> asked for), and it only took 7 LUTs (<2 CLBs). In addition, there are no lines
> that have a loading of more than 3. The 5-input AND can be implemented with
> a registered 4-input AND (of the first four SRL16s), and a registered 2-input
> AND. That means that there are no paths that go through unregistered logic,
> and the design will clock at a very high rate.
>
> One downside is that the SRLs require so much GND and VCC routing, but
> you can create all that yourself and prevent the placer from going hog wild
> with it.
>
> Another downside is what happens to the SRL16s if you have glitches on your
> clock. Unlike most counters, this circuit will not "fix" itself. But let's try
> not to think too much about that.
>
> You can also play sneaky games with the first-layer SRL16s. When that first
> registered 4-input AND gate goes high, all the SRL16s will have just been in
> their high state. That means that if you replace those two SRL16s with two
> SRL16Es, you can hook the registered AND gate output back up to the (inverted
> logic) enables of those first two SRL16Es. The effect of that modification
> will be to change that registered AND gate from counting to (16^2 * 17^2)
> to one that counts to (16^2*17^2 + 1). Since this is relatively prime
> to the previous 16^2*17^2, that means that you can build two such circuits
> and AND their outputs together to get a period of 73984 * 73985 with just
> 11 LUTs. This is getting a 32.35-bit binary count, with DONE pulse, and very
> high speed performance for only 11 LUTs, or 2.94 bits per LUT.
>
> I should mention that I've never implemented that last sneaky game, so if it
> doesn't work I'd not be completely surprised. Sure seems like it would,
> though, and my instincts for this sort of stuff are usually pretty good.
>
> Carl
>
> --
> Posted from firewall.terabeam.com [216.137.15.2]
> via Mailgate.ORG Server - http://www.Mailgate.ORG

Article: 37861
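To make Carl's basic building block concrete: one way to get the 17-clock period he describes is to put the slice flip-flop inside the feedback loop, so the ring is 16 SRL stages plus one register, and the register doubles as the free "done" bit. The sketch below assumes the Xilinx unisim SRL16 and FD primitives and shows only this single-stage divider, not the full 7-LUT chain.

// One stage of the SRL16 ring divider: 16 SRL stages plus the flip-flop.
module srl_ring_div17 (
    input  wire clk,
    output wire pulse                // high for one clock in every 17
);
    wire srl_q;

    // 16-deep shift-register LUT; INIT puts a single '1' in the ring.
    SRL16 #(.INIT(16'h0001)) u_srl (
        .Q   (srl_q),
        .A0  (1'b1), .A1 (1'b1), .A2 (1'b1), .A3 (1'b1),   // tap stage 16
        .CLK (clk),
        .D   (pulse)                 // feedback closes the ring
    );

    // The "free" registered done bit; it is also the 17th ring stage.
    FD u_fd (
        .Q (pulse),
        .C (clk),
        .D (srl_q)
    );
endmodule

The full divider would then gate further SRL16E stages off this pulse and AND the outputs together, as described in the quoted post.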
Ian Smith wrote:

> Do you need your large count with a resolution of 1? Quite often a large
> count doesn't need to be adjustable to the nearest count, but instead say a
> multiple of 32. If so, you can use one counter as a prescaler, doing a /32,
> then the other to count the prescaler count.

Yes, but what is the advantage? Prescalers and pulse-swallowers and also LFSR counters are great for speed, but they do nothing for area efficiency. To divide by a million, you need 20 flip-flops (yes, 16 might be hidden in an SRL16 look-up table).

Peter Alfke

Article: 37862
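For comparison, the plain counter Peter is arguing for, as a minimal Verilog sketch (names are illustrative); the synthesis tools map the 20-bit increment onto the dedicated carry chain.

// Straightforward divide-by-1,000,000: 20 flip-flops plus carry logic.
module div_1e6 (
    input  wire clk,
    input  wire rst,
    output reg  tick                 // one-cycle pulse every 1,000,000 clocks
);
    reg  [19:0] count;
    wire        tc = (count == 20'd999_999);   // terminal count

    always @(posedge clk) begin
        if (rst || tc)
            count <= 20'd0;
        else
            count <= count + 20'd1;
        tick <= tc && !rst;          // registered done pulse
    end
endmodule

This is the roughly-5-CLB Virtex/Spartan-II implementation quoted in Peter's earlier post.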
stefaan vanheesbeke wrote:

> I wish the same thing as you do, and for the same reason...
>
> I think Altera has a solution for this, but the use of their block RAMs is
> not as clearly documented as Xilinx's.

Don't trigger my allergic reaction by mentioning Altera's obfuscating documentation.

A data sheet should describe the functionality honestly. I do not want to have to be on constant guard against half-truths and three-quarter lies, as if I were visiting a used-car lot. When it's called "dual-port", it should be dual-port (or the limitation that one port is write-only and the other read-only should be stated). When "low-power" is claimed, it should not be the irrelevant power dissipated in the terminating resistor... That's marketing at its worst, and it does not belong in a data sheet or app note.

As long as I have been involved in Xilinx documentation (13 years), I have always tried to describe the features and limitations in a forthright way. A data sheet is first and foremost written for the design engineer. And most pages describe the device limitations (delays, set-up time requirements, max frequency, leakage currents, etc.). Marketing can create their own glossy brochures where everything is "great".

Peter Alfke

Article: 37863
Peter Alfke wrote:

> As long as I have been involved in Xilinx documentation (13 years), I have
> always tried to describe the features and limitations in a forthright way. A
> data sheet is first and foremost written for the design engineer. And most pages
> describe the device limitations (delays, set-up time requirements, max
> frequency, leakage currents, etc.).
> Marketing can create their own glossy brochures where everything is "great".
>
> Peter Alfke

There are really only 2 sorts of silicon vendors - those whose data sheets you can trust and whose manuals are written to be read by human beings, and the others. Xilinx is definitely category 1.

However, I've found that it's still usually advisable to read data sheets starting from the back to avoid the slight marketing leakage that sometimes contaminates the first page's bullet-point list :-)

Article: 37864
Rick Filipkiewicz wrote:

> There are really only 2 sorts of silicon vendors - those whose data sheets you can
> trust and whose manuals are written to be read by human beings, and the others.
> Xilinx is definitely category 1.

Thanks, nice to hear this. It requires constant vigilance.

> However, I've found that it's still usually advisable to read data sheets starting
> from the back to avoid the slight marketing leakage that sometimes contaminates the
> first page's bullet-point list :-)

Yes and no. The front page summarizes all the good things about the part, and that gives the reader (who is assumed to not know anything about this part) a feel for whether it is meaningful to bother reading the rest. By necessity, this front page is cryptic and upbeat. We think this is a neat part, otherwise we would not offer it for sale. But I am constantly toning down the fancy adjectives that marketing tries to sneak in...

The front page is the 1-minute first encounter: "Why would you like this?"

Peter Alfke

Article: 37865
Hi Stephen,

> Perhaps I'm missing something, but why can't you send the output of the
> F-LUT to output X, and the output of the F5-MUX to output F5?

As far as I can see, you can do that, and then use the same algorithm I gave in order to get the G-LUT output out of the slice with the F6-MUX. Unfortunately, my brain is just a few neurons short of quickly figuring out how efficient it would be. My guess is that it is an improvement...

Carl

--
Posted from firewall.terabeam.com [216.137.15.2]
via Mailgate.ORG Server - http://www.Mailgate.ORG

Article: 37867
The F5 output only goes to the F6 mux in the neighboring slice, nowhere else. At the F6 you run into a similar problem, because it has to get out somewhere, and its other input has to be sourced by the F5 in that slice, which in turn is sourced by the LUTs.

Stephen Melnikoff wrote:

> > The problem with doing it is that it's hard to get the output of the "F" LUT
> > out of the slice. But it can be done by bringing it out the CARRY-OUT.
>
> Perhaps I'm missing something, but why can't you send the output of the
> F-LUT to output X, and the output of the F5-MUX to output F5?
>
> Stephen Melnikoff.
>
> --
> Stephen Melnikoff - s.j.melnikoff@iee.org
> Electronic, Electrical and Computer Engineering
> University of Birmingham, Birmingham, UK

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759

Article: 37868
Kevin,

I was under the impression that Xilinx put the IRDY and TRDY hardware in there because without it they couldn't guarantee PCI compatibility.

> Regarding the "built-in PCI logic," I will assume what you mean
> is Xilinx's special IRDY and TRDY logic.
> Because the PCI IP core has to be portable across different platforms, I
> am not interested in using that special IRDY and TRDY logic, and I don't
> really know how it works.

I had a design once where the customer selected the pins himself, and I had to make them cut and jumper the prototypes in order to get the PCI IP installed right.

The Xilinx PCI logic takes an IRDY and a TRDY input, along with I1, I2, and I3, and produces an output called "PCI_CE". Its intended use is as a clock enable for when the Xilinx drives the CBE[3:0] and AD[31:0] outputs. That should give a clue about what the logic in it is. This should give another clue:

http://support.xilinx.com/xlnx/xil_ans_display.jsp?iLanguageID=1&iCountryID=1&getPagePath=10397

Since IRDY and TRDY are being brought in as inputs, I suppose this logic applies to the case when the Xilinx is a bus master, and it's used to extend cycles when the slave isn't ready. The idea would be to keep CBE constant (and AD too, if it was a master write cycle) if the slave responded with a not-ready response. But it's been a while since I looked at a PCI spec.

I'm pretty sure that if it were possible to make a Xilinx PCI IP core without the special logic, Xilinx would have done it. On the other hand, maybe their new parts are enough faster than before that the special logic isn't needed.

One thing I like about Xilinx is that their silicon has always been pretty much rock solid for me. I've never had a real complaint about their silicon, but I complain all the time about their software. If something acts silly it's always because I've got signal integrity issues (or whatever), but they're not Xilinx' fault.

Carl

I always try to register all my inputs and outputs in the IOBs because that makes it a lot easier to analyze timing for the system. It breaks the system timing calculations into two parts: (1) getting on and off of the Xilinx, but all those signals have pretty much the same timing, and (2) moving data around inside the Xilinx, but the tools handle that for me. I guess you can't simplify to that kind of system with a PCI interface.

--
Posted from firewall.terabeam.com [216.137.15.2]
via Mailgate.ORG Server - http://www.Mailgate.ORG

Article: 37869
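Purely as an illustration of the usage pattern Carl describes above (and not the actual Xilinx PCI core, whose internals are not spelled out here), the PCI_CE output would be used roughly like a clock enable on the AD/CBE output registers, so the driven values hold whenever a transfer cannot advance. All names, widths, and the active polarity below are invented.

// Illustrative only: PCI master output registers frozen by a clock enable.
module pci_ad_out (
    input  wire        pclk,
    input  wire        pci_ce,       // from the IRDY/TRDY logic, or equivalent
    input  wire [31:0] ad_next,      // next address/data value to drive
    input  wire [3:0]  cbe_next,     // next command/byte-enable value
    output reg  [31:0] ad_q,
    output reg  [3:0]  cbe_q
);
    always @(posedge pclk)
        if (pci_ce) begin            // when not enabled, the bus value is held
            ad_q  <= ad_next;
            cbe_q <= cbe_next;
        end
endmodule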
Bryan, if you ask engineers how to solve a problem and they can't help you, it seems like they will tell you that you're going about it the wrong way. (g)

SRC Computers sounds like a cool place to work. I love pushing silicon to its limits. That's not always what the customer wants, though.

Carl

--
Posted from firewall.terabeam.com [216.137.15.2]
via Mailgate.ORG Server - http://www.Mailgate.ORG

Article: 37870
Hi Andraka,

> In my experience, which numbers into the hundreds of FPGA designs, I haven't seen a
> single case where the design actually went to ASIC, although a decent percentage of
> the customers naively believe that theirs will and want the FPGA design done
> generically enough to go directly to ASIC. Of course they also want performance,
> or they probably wouldn't have called me in the first place.

I've only had one design go to ASIC, and it was in production with the XC2064 for years before they did it. It was coded straight from XACT and it was very tightly packed, which is probably why it took so long to put it into an ASIC. Volumes were huge, and by the time it was out of production, Xilinx was selling the XC2064s in huge volume at an amazingly low price. Xilinx must have some of the steepest price/volume curves in the industry. When you get to the high 10 thousands, those guys will cut you a deal.

The horrible thing is that it was only supposed to be a temporary remedy until we got an ASIC, so management got me to cut some very tight corners in the design. For years I was afraid that Xilinx was going to update their process and blow me away on my minimum path delays. They wanted it done too quick to go to an ASIC, and they didn't like the "risk" of ASICs. Then they kept putting off the ASIC conversion.

It ran off of a (max) 72 MHz clock by immediately dividing it to 36 MHz. Then the outputs were "DDRed" back to 72 MHz. In order to get that to work, I had to make the output bus "source synchronous", so I rebuilt a 72 MHz clock from something like an XOR with delay of the 36 MHz clock with itself. When I say the thing scared me, I'm not kidding.

For a while, parts were screened for compatibility by observing their behavior in a test circuit. The test circuit fed the chip a clock that was reduced in amplitude to 60% of normal, had slow rise and fall times, and varied its average voltage another 25% at audio frequencies. If the part could come up with a decent output clock under that kind of input clock, it got soldered into the spot. The 10% or so that failed would get soldered into another position. The time required to screen parts was around 10 seconds each. I was sure that the EE police were going to arrest me for the stunt, but I got away clean.

I suspect that if I hadn't built the screening device we'd have had problems in manufacturing, but that never happened. Eventually, Xilinx' process was sped up to the point where they always had 100% passing, and manufacturing quit screening for suitability. (Which just made me more scared.)

Carl

--
Posted from firewall.terabeam.com [216.137.15.2]
via Mailgate.ORG Server - http://www.Mailgate.ORG

Article: 37871
Mike Treseler wrote:

> > Copy the component declaration wizard code into your architecture
> > code, and instantiate it.
> >
> > Leonardo generates an .edf file with the component as a black box.
> >
> > Maxplus2 reads a wizard-generated vhdl file to fill in the
> > black-box component. Make sure the wizard files are in the
> > same directory as the .edf file.
>
> That is a workable band-aid, but did Altera explain
> the original synthesis problem?

Not yet. It seems that Leonardo just doesn't do dual-port RAMs.

I found that to avoid a maxplus2-vhdl-licence problem, once the vhdl wizard code is put into your own code, you can then regenerate the wizard files in AHDL. Using that method, I've been using lpm_ram_dp. However, I found that I can run it at 2.5 MHz, but not 5 MHz (the DAC output goes crappy, or other parts of unrelated code stop working). I've got flip-flop pipelining everywhere, and if I bypass the RAM, I can get good waveforms.

How fast can EAB dual-port RAMs be run in an ACEX 1K30 speed -3 device?

Article: 37872
I vote for the schematic camp. I can look at a schematic and get the big picture quickly. Easy to spot bad connections too.

I have waded through other people's code and still been completely unable to figure out how they intended it to work. Often I have tried sketching block diagrams of such code and found the code had no real coherent structure. Personally, I find trying to understand non-trivial VHDL hardware designs from the source code is as awful as figuring out a circuit design from the netlist. If just writing netlists was adequate, people would not have invented schematic/PCB CAD packages. People buy pictorial maps of cities, not lists of street names and their junctions.

Arguing for text-based design on the grounds that text editors are commonly available seems to me like arguing we should use fingernails because not everyone has a screwdriver.

I will be compromising, by using a package like Orcad to design the top-level view and generate a VHDL skeleton (by selecting VHDL as the netlist output format), then fleshing out the "soft components".

K

Article: 37873
In article <3BC471C6.D745D1CE@xilinx.com>, Austin Lesea <austin.lesea@xilinx.com> wrote:

> Getting the data in and out is the next problem after the internal
> processing, and the LVDS IO allows for data buses of ~16 bits running at
> ~700 Mb/s DDR rates, and the 840 Mb/s rates with some careful placement.

FWIW, my 8:64 demux operates perfectly fine at 950 Mb/s on the input bits (475 MHz clock) as a result of hand placing all the high-speed stuff. It's a -5 part if I recall, but I've been on vacation for 3 days so it's already hazy. ;-)

My only problem, previously posted, is that TRCE reports a max clock of 170 MHz or so because multiclock designs seem to befuddle the poor thing.

Article: 37874