Site Home Archive Home FAQ Home How to search the Archive How to Navigate the Archive
Compare FPGA features and resources
Threads starting:
Authors:A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Ray Andraka wrote:
>
> Jonathan Bromley wrote:
>
> > One potential problem with doing this in an FPGA is that the routing
> > delays from the taps on your delay line to the mux inputs, and the
> > delays through the mux itself, might be comparable with the delay line
> > inter-tap delays. This could result in a non-monotonic relationship
> > between tap number and delay, which would seriously screw up the servo.
> > The trick would be to make each stage have a delay at least as long as
> > the longest possible routing delays.
> >
> > Still, it's an interesting idea and well worth trying.
>
> This is what I was trying to say a few days ago when this first came up. I
> looked at doing this a while back using the carry chains for the delay
> elements. The problem I ran into was that it was extremely difficult to
> match the delays for each of the taps thru the feedback mux, so the delay
> selection was not monotonic. If you can accept a large granularity in the
> delay steps (so that the delay differences through the mux are swamped by
> the incremental delays in the delay chain), then you should be able to do
> it.

You can use the carry chain as a monotonic programmable delay element, but it is a small delta (0.2 ns/CLB) on top of several nanoseconds. I built it up as a servo system.

The basic idea is to configure the carry chain as an OR chain, then drive a signal into the carry chain from a selected tap (CLB). To avoid routing differentials, the signal is fed into a column of CLBs from a vertical longline. You select a tap into the carry chain by gating the signal with the LUT before it enters the carry block. The LUT and control register are placed in the same CLB as the carry block. The interconnect must be local direct connect, and the router won't get it right, so you need to hand-edit the routing.
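The monotonicity concern raised in the quoted posts can be checked with a toy model. The sketch below assumes a tapped delay line where tap k sees (k+1) delay steps plus its own tap-to-mux routing delay; the skew values are made up for illustration and are not measurements from any device.

```python
def tap_delays(step_ns, skews_ns):
    """Total delay when tap k is selected.

    Tap k passes through k+1 delay elements of step_ns each, then through
    its own tap-to-mux route, modelled here as skews_ns[k].
    """
    return [(k + 1) * step_ns + skew for k, skew in enumerate(skews_ns)]


def is_monotonic(delays):
    """True if every tap is strictly slower than the one before it."""
    return all(b > a for a, b in zip(delays, delays[1:]))


# Hypothetical per-tap routing skews in ns (illustrative, not measured).
skews = [1.9, 1.2, 2.4, 1.0, 2.1, 1.5]
spread = max(skews) - min(skews)  # worst-case mismatch between tap routes

fine = tap_delays(0.2, skews)    # step much smaller than the skew spread
coarse = tap_delays(2.0, skews)  # step larger than the skew spread

print("fine steps monotonic:  ", is_monotonic(fine))
print("coarse steps monotonic:", is_monotonic(coarse))
```

In this model a step larger than the skew spread is sufficient for monotonic behaviour, which is the "large granularity" condition from the quoted text. Feeding every tap from the same longline and gating inside the CLB, as described above, attacks the other term by shrinking the skew itself.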
As I remember, with an XL device, a signal entering the FPGA from a pin and routed out to a pin, passing through 1-24 carry blocks, had a delay variable between 5 and 10 ns. Since the basic design guarantees monotonicity (the carry goes through 1-24 delays), it will be legal in a servo system if the minimum delay can be guaranteed. Be aware, however, that the positive and negative edges travel at different rates, so the duty cycle of a clock will be distorted.

If you can tolerate a few ns of jitter, a perhaps more useful way to get a programmable PLL is to just use a programmable counter/timer along with a ring oscillator built from a LUT. As long as the divide ratio can be made large, the PLL can tolerate faster silicon. You will also need to build a phase detector to servo the counter limit. This is also guaranteed to be monotonic. I always thought this might be a useful block for multi-rate DSP.

- Brad

Article: 16076
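Brad's ring-oscillator suggestion in the post above can be sketched as a bang-bang loop. This is a minimal behavioural model under stated assumptions: the phase detector is reduced to a sign comparison of periods, the counter limit moves by one per reference cycle, and the 5 ns and 3 ns ring periods are hypothetical.

```python
def servo_divider(ring_period_ns, ref_period_ns, steps, n_start=1):
    """Nudge the counter limit n by one per reference cycle until n
    ring-oscillator periods span one reference period."""
    n = n_start
    history = []
    for _ in range(steps):
        out_period = n * ring_period_ns
        # Phase detector reduced to a sign: is the output early or late?
        if out_period < ref_period_ns:
            n += 1
        elif out_period > ref_period_ns and n > 1:
            n -= 1
        history.append(n)
    return history


slow = servo_divider(5.0, 1000.0, 400)  # hypothetical 5 ns ring period
fast = servo_divider(3.0, 1000.0, 400)  # faster silicon, same reference

print("settled counter limit, slow silicon:", slow[-1])
print("settled counter limit, fast silicon:", fast[-1])
```

The output period is always n ring periods, so it grows strictly with n, and faster silicon is absorbed simply by a larger n. When the reference is not an exact multiple of the ring period, the loop dithers between two adjacent limits, which corresponds to the jitter mentioned in the post.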
I wouldn't mind talking about dynamic reconfiguration (I'm working on it, actually).

Guido Meardi

Article: 16077
Hi Frank, Ray, Group,

I think if you need the utmost speed, with 32-bit address comparison, you still use the fast carry logic. Ray's suggestion led me to modify the comparison implementation via carry logic subtraction. Split it into two halves for each of the low vector and high vector address comparisons:

    [Low Vector 16-LSB compare]----\
    [Low Vector 16-MSB compare]---\ \
    [Hi Vector 16-LSB compare]---\ \ \
    [Hi Vector 16-MSB compare]--->[In range detect]-->Out

where the [In range detect] is a single 4-LUT.

Timing for this circuit, assuming the input address is set up to the comparators at time 0, and using databook numbers for xc4000e -3 (sorry, I don't have Spartan timing handy), and rule-of-thumb route delay = F/G to X/Y delay (all values in ns):

    Operand to Cout  ->  2.6
    Cin to Cout x7   ->  4.9
    Cin to X         ->  3.3
    Route delay      -> ~2
    F/G to X/Y       ->  2
    Total:              14.8

Now Ray's circuit:

    [MUX]--->[4-bit range detects]--->[range tree]--->Out

(I've included the MUX because this is supposed to be a programmable circuit, and you need 16 writes to fill in your range detect RAMs.) Anyway, timing here is:

        MUX     | Range det |      Range Tree       |
      CLB Route  CLB  route  CLB route  CLB route  CLB
       2  +  2  + 2  +  2  +  2  +  2  + 2  +  2  + 2  = 18 ns

Thus, in a classic engineering trade-off, you double the CLB usage to get only a 16% timing improvement. Of course, the numbers may be completely off because I don't have Spartan timings handy, and routing delays may vary significantly from the rule of thumb. Maybe you'd like to complete the exercise, Frank, and let us know which way turns out fastest?

For even larger vectors, Ray's method rapidly pulls ahead in speed, being O(ceil(lg N)), where the carry logic is O(N). But the carry logic is so much faster than the LUT (RAM) that it often pays to look for ways to use it when you need the speed.

- John

Ray Andraka wrote:
>
> Assuming you need to do all the compares on the same clock cycle, I'd opt
> for a partitioned look-up compare approach. For each group of 4 bits in
> the memory address, you generate a 16 x 2n look-up table (n is the number
> of compares) containing all of the compares for that 4-bit slice. In your
> case, it sounds like you are looking for an in-range/out-of-range
> indication for each pair of limits, so the upper and lower limit compares
> within each slice can also be combined so that the slice outputs are two
> of "in-range", "above range", and "below range". The outputs of each
> slice are then combined in separate compare trees for each limit pair.
> You are storing the compare results for each input address rather than the
> compare values, so you will need to do some pre-computation to get the LUT
> values. The reward for the extra effort comes in the form of a smaller
> and faster circuit, and fewer storage locations to deal with.
>
> Frank Papendorf wrote:
>
> > I have to store 16 32-bit vectors in a Xilinx XCS30XL. These vectors
> > must be accessible by an address from outside (like a RAM), and on the
> > other hand I must access them simultaneously to compare them with another
> > vector.
> >
> > vector 0 \
> >           compare if vector 0 <= vector x <= vector 1
> > vector 1 /
> >
> > vector 2 \
> >           compare if vector 2 <= vector x <= vector 3
> > vector 3 /
> >    :  :
> > vector15
> >
> > The vectors are limits of memory ranges. For instance, vector 0 is the
> > lower limit and vector 1 the upper limit of a range.
> > With the comparison I want to find out which range vector x is in.

Article: 16078
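The partitioned look-up compare Ray describes in the thread above can be modelled in software. The sketch below is an assumption-laden illustration, not Ray's implementation: it keeps one table per limit per 4-bit slice (rather than packing both limits into a 16 x 2n table), stores less/equal/greater as -1/0/+1, and resolves the slices MSB-first in a loop where hardware would use a combining tree. All function names are hypothetical.

```python
LT, EQ, GT = -1, 0, 1


def build_slice_luts(limit, width=32):
    """Precompute one 16-entry table per 4-bit slice of `limit` (MSB slice
    first). Entry x holds LT/EQ/GT for input nibble x versus the limit's
    nibble, so the tables store compare results, not the limit itself."""
    luts = []
    for shift in range(width - 4, -1, -4):
        nib = (limit >> shift) & 0xF
        luts.append([(x > nib) - (x < nib) for x in range(16)])
    return luts


def compare(x, luts, width=32):
    """Compare x with the limit using only table look-ups plus a combine
    step: the most significant slice that is not EQ decides the result."""
    for i, shift in enumerate(range(width - 4, -1, -4)):
        result = luts[i][(x >> shift) & 0xF]
        if result != EQ:
            return result
    return EQ


def in_range(x, lo_luts, hi_luts):
    """True if lower limit <= x <= upper limit."""
    return compare(x, lo_luts) != LT and compare(x, hi_luts) != GT


lo = build_slice_luts(0x10000000)
hi = build_slice_luts(0x1FFFFFFF)
for addr in (0x0FFFFFFF, 0x10000000, 0x12345678, 0x1FFFFFFF, 0x20000000):
    print(hex(addr), in_range(addr, lo, hi))
```

Changing a limit means rewriting all 16 entries of each affected slice table, which is the pre-computation cost mentioned in the quoted post.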
In article <3729AA29.AD7D0D3B@ids.net>, Ray Andraka <randraka@ids.net> wrote:
>You might also look at Xilinx Virtex, as
>they have fast dual port 4K blocks with async capability.
>

A minor correction here: the 4K Dual Read/Write Port Block SelectRAMs in Virtex do not have asynchronous read or write functionality. The Block SelectRAMs are synchronous, with all of the inputs having a setup time to the port clock and the data-out lines having a clock-to-out relative to the port clock. Each operation only needs a single clock edge. More details for the curious are in XAPP130:

http://www.xilinx.com/xapp/xapp130.pdf

Ed

Article: 16079
Try not to use one big mux, but many little muxes, each of which is guaranteed to have an incremental delay delta:

    |-----------------[Inverter]<--------------------------------|
    |                                                            |
    |     | Delay |     | Delay |     | Delay |     | Delay |    |
    |---->|  MUX  |---->|  MUX  |---->|  MUX  |---->|  MUX  |----|
          |_______|     |_______|     |_______|     |_______|
              ^             ^             ^             ^
         _____|_____________|_____________|_____________|____
    "0"->|           Bi-Directional Shift Register           |<--"1"
         |___________________________________________________|

This throws a lot more logic at the problem, but guarantees monotonic delay control. Granularity depends on the implementation of the Delay Mux. It might select between carry chain or normal route, or between a 1-LUT and a 2-LUT delay.

- John

Ray Andraka wrote:
>
> Jonathan Bromley wrote:
> (about non-monotonic delay controls)
>
>(more about non-monotonic delay controls)

Article: 16080
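John's chain of delay muxes in the post above can be modelled as a thermometer code. This is a sketch under assumptions: each stage picks a short or a long path according to one shift-register bit, a "1" shifts in from the right and a "0" from the left as in the diagram, and the 1.0 ns and 1.5 ns path delays are invented for illustration.

```python
def line_delay(select_bits, short_ns, long_ns):
    """Delay through a chain of 2:1 delay muxes; a 1 bit picks the long path."""
    return sum(long_ns if bit else short_ns for bit in select_bits)


def slower(bits):
    """Shift left, pulling a 1 in from the right: one more long stage."""
    return bits[1:] + [1]


def faster(bits):
    """Shift right, pulling a 0 in from the left: one fewer long stage."""
    return [0] + bits[:-1]


bits = [0, 0, 0, 0]
delays = [line_delay(bits, 1.0, 1.5)]
for _ in range(len(bits)):
    bits = slower(bits)
    delays.append(line_delay(bits, 1.0, 1.5))

print(delays)
```

Because the register only ever holds a run of zeros followed by a run of ones, each shift changes exactly one stage, so the total delay moves by one (long minus short) increment per step and cannot go backwards as long as every stage's long path is slower than its short path.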
mcgett@efc1.xsj.xilinx.com (Ed Mcgettigan) writes:

> In article <3729AA29.AD7D0D3B@ids.net>, Ray Andraka <randraka@ids.net> wrote:
> >You might also look at Xilinx Virtex, as
> >they have fast dual port 4K blocks with async capability.
> >
>
> A minor correction here, the 4K Dual Read/Write Port Block SelectRAMs
> in Virtex do not have asynchronous read or write functionality.

Not yet, anyway... ;-)

Homann
--
Magnus Homann  Email: d0asta@dtek.chalmers.se
URL: http://www.dtek.chalmers.se/DCIG/d0asta.html
The Climbing Archive!: http://www.dtek.chalmers.se/Climbing/index.html

Article: 16081
How about using an interleave-type approach, such as is commonly used in software buffers? One device can be running while the second device is being loaded with the new configuration file. At the desired moment, you tristate the original device and enable the second one. The now disabled device can be loaded with whatever you want next, while the second device is running...

This obviously requires more hardware, but could be scaled up to work with as many FPGA devices as you can fit on the board.

Just a thought...

--
Gary Cameron, P. Eng
DSP Software Developer
Nortel Wireless Networks
(613)-763-1817 (ESN 6+393-1817)
Northern Telecom  gcameron@nortelnetworks.com

Article: 16082
I guess I should have been clearer. Each port of the block RAMs can have its own clock, which can be asynchronous with respect to the other port's clock. The data I/O through the port still has to be synchronous with the port clock. In most cases, this results in a quite workable system, and provides a nifty way of buffering and synchronizing signals between clock domains.

Ed Mcgettigan wrote:

> In article <3729AA29.AD7D0D3B@ids.net>, Ray Andraka <randraka@ids.net> wrote:
> >You might also look at Xilinx Virtex, as
> >they have fast dual port 4K blocks with async capability.
> >
>
> A minor correction here, the 4K Dual Read/Write Port Block SelectRAMs
> in Virtex do not have asynchronous read or write functionality. The Block
> SelectRAMs are synchronous with all of the inputs having a setup to the
> port clock and the data out lines have a clock-to-out relative to
> the port clock. Each operation only needs a single clock edge. More
> details for the curious are in XAPP130 http://www.xilinx.com/xapp/xapp130.pdf
>
> Ed

--
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930  Fax 401/884-7950
email randraka@ids.net
http://users.ids.net/~randraka

Article: 16083
John, the original query was how to handle the large amount of parallel storage (there were 16 vectors that need to be accessed simultaneously to compare to a common input vector). One could use the LUTs as I described and then combine the results using an additional layer of LUTs and the carry chain for a faster result, especially if you are using pairs and just need an in-range/out-of-range indication. The concept is pretty much the same, but a lot simpler to explain as a tree of compares.

John L. Smith wrote:

> Hi Frank, Ray, Group,
>
> I think if you need the utmost speed, with 32-bit address
> comparison, you still use the fast carry logic.
> Ray's suggestion led me to modify the comparison
> implementation via carry logic subtraction.
> Split it into two halves for each of
> the low vector and high vector address comparisons:
>
> [Low Vector 16-LSB compare]----\
> [Low Vector 16-MSB compare]---\ \
> [Hi Vector 16-LSB compare]---\ \ \
> [Hi Vector 16-MSB compare]--->[In range detect]-->Out
>
> where the [In range detect] is a single 4-LUT.
>
> Timing for this circuit, assuming the input address is
> set up to the comparators at time 0, and using databook
> numbers for xc4000e -3 (sorry, I don't have spartan timing
> handy), and rule of thumb route delay = F/G to X/Y delay:
>
> Operand to Cout  ->  2.6
> Cin to Cout x7   ->  4.9
> Cin to X         ->  3.3
> Route delay      -> ~2
> f/g to x/y       ->  2
> Total:              14.8
>
> now Ray's circuit:
>
> [MUX]--->[4-bit range detects]--->[range tree]--->Out
>
> (I've included the MUX because this is supposed to be
> a programmable circuit, and you need 16 writes to fill
> in your range detect rams) Anyway, Timing here is
>
>     MUX     | Range det |      Range Tree       |
>   CLB Route  CLB  route  CLB route  CLB route  CLB
>    2  +  2  + 2  +  2  +  2  +  2  + 2  +  2  + 2  = 18nS
>
> Thus, in typical classic engineering trade-off,
> double the CLB usage to get only 16% timing improvement.
> Of course, the numbers may be completely off cuz
> I don't have Spartan timings handy, and routing delays
> may vary significantly from thumb rule. Maybe you'd
> like to complete the exercise, Frank,
> and let us know which way turns out fastest?
>
> For even larger vectors, Ray's method rapidly
> pulls ahead in speed, being of o(ceiling(lg(N))),
> where the carry logic is o(N). But the carry logic
> is so much faster than the LUT (RAM), it often pays
> to look for ways to use it when you need the speed.
>
> - John

--
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930  Fax 401/884-7950
email randraka@ids.net
http://users.ids.net/~randraka

Article: 16084
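The timing tally in John's post, quoted in the reply above, is plain addition and can be reproduced directly. The figures below are the ones given in the post (xc4000e -3 databook values and a 2 ns rule-of-thumb routing delay); nothing here is measured, and the variable names are only for illustration.

```python
# Carry-chain compare path, in ns, as itemised in the quoted post.
carry_path_ns = {
    "operand to Cout": 2.6,
    "Cin to Cout x7": 4.9,
    "Cin to X": 3.3,
    "route delay": 2.0,
    "F/G to X/Y": 2.0,
}

# LUT path: MUX, range detect, and range tree modelled as five CLB delays
# and four routes of 2 ns each, i.e. nine 2 ns terms.
lut_path_ns = [2.0] * 9

carry_total = sum(carry_path_ns.values())
lut_total = sum(lut_path_ns)
saving = (lut_total - carry_total) / lut_total

print(f"carry path: {carry_total:.1f} ns")
print(f"LUT path:   {lut_total:.1f} ns")
print(f"carry path is {saving:.1%} shorter")
```

By this arithmetic the carry path comes out a little under 18% shorter than the LUT path, in the same neighbourhood as the figure quoted in the post. Since both totals lean on the same 2 ns routing estimate, the comparison is only as good as that rule of thumb.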
The app I was looking at needed granularity of around 300 ps, which is why I was looking at the carry chain. Brad's technique looks like it might have done the trick, although I am not sure that the transport delay on a long line in Virtex is really equal (within the 300 ps) all the way down it.

John L. Smith wrote:

> Try not to use one big mux, but many little muxes, each of which
> is guaranteed to have an incremental delay delta:
>
> |-----------------[Inverter]<--------------------------------|
> |                                                            |
> |     | Delay |     | Delay |     | Delay |     | Delay |    |
> |---->|  MUX  |---->|  MUX  |---->|  MUX  |---->|  MUX  |----|
>       |_______|     |_______|     |_______|     |_______|
>           ^             ^             ^             ^
>      _____|_____________|_____________|_____________|____
> "0"->|           Bi-Directional Shift Register           |<--"1"
>      |___________________________________________________|
>
> This throws a lot more logic at the problem, but guarantees
> monotonic delay control. Granularity depends on implementation
> of the Delay Mux. It might select between carry chain or
> normal route, or between 1 LUT/2LUT delay.
>
> - John
>
> Ray Andraka wrote:
> >
> > Jonathan Bromley wrote:
> > > (about non-monotonic delay controls)
> >
> >(more about non-monotonic delay controls)

--
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930  Fax 401/884-7950
email randraka@ids.net
http://users.ids.net/~randraka

Article: 16085
Hi,

I am new to this group. I am going to write a thesis on an FPGA topic, so if you don't mind, please tell me about FPGAs.

1. What is an FPGA? Can an FPGA perform the same functions as a microcontroller?
2. I am confused: if an FPGA is programmed from the beginning (using VHDL or Verilog), how can we change the design after we have already made it?
3. If an FPGA can perform the same functions as a microcontroller (such as the Motorola 68HC11), is there any possibility that FPGAs will replace microcontrollers in the future?

Thanks for your attention and help

Article: 16086
Hi,

I was wondering how tough it would be, and how long it would take, to design an IrDA (SIR) controller at the VHDL or Verilog level.

Thanks for any help
Enrico

Article: 16087
You can find various free and low-cost packages for programmable logic devices on the Programmable Logic Jump Station at http://www.optimagic.com/lowcost.shtml.

-----------------------------------------------------------
Steven K. Knapp
OptiMagic, Inc. -- "Great Designs Happen 'OptiMagic'-ally"
E-mail: sknapp@optimagic.com
Web: http://www.optimagic.com
-----------------------------------------------------------

david@home.com wrote in message <372b2a5e.48586435@news.alt.net>...
>I was wondering if anyone was offering a
>schematic with post (back annotated) routed simulation for an
>incircuit prog. fpga for free or very low cost?
>david

Article: 16088
I was wondering if anyone is offering a schematic tool with post-route (back-annotated) simulation for an in-circuit programmable FPGA for free or at very low cost?

david

Article: 16089
Hi,

I am trying to implement a dual-port RAM block in a 10K200E. What I want is to model a real dual-port RAM where you have two of everything (two address buses, two data buses, two writes, etc.) and where you can read/write from/to different addresses at the same time, just like a CY7C132 or similar dual-port devices. I have looked at the 10KE data sheet and I don't see how I can do this. Is this possible at all? If not, what's dual-port about the embedded memory in the 10KE?

thanks
muzo

Verilog, ASIC/FPGA and NT Driver Development Consulting (remove nospam from email)

Article: 16090
Hi,

It depends strongly on which version of IrDA you intend to target!

The first version, 1.0, is very simple (asynchronous transfer; a 0 is coded as one IrDA pulse, a 1 is coded as no pulse).

The second version, 1.1, is more complex. It defines a true protocol with different layers, error detection, and so on...

So...

Karim.EMBAREK@wanadoo.fr

Enrico Migliore wrote:
>
> hi
> I was wondering how tough and how long it will take
> to design an IrDA (SIR) controller at vhdl or verilog
> level.
>
> thanks for any help
> Enrico

Article: 16091
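For the simple SIR case discussed in the reply above, the bit coding can be sketched in a few lines. In the IrDA SIR physical layer a 0 bit is sent as a short infrared pulse, nominally 3/16 of a bit time, and a 1 bit as no pulse, applied to an ordinary asynchronous frame. The sketch assumes a 16x bit clock and places the pulse roughly in the middle of the bit cell; the exact slot position and the function names are illustrative choices, not taken from the thread.

```python
def uart_frame(byte):
    """Asynchronous frame: start bit (0), eight data bits LSB first, stop bit (1)."""
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]


def sir_encode(bits, slots_per_bit=16, pulse_slots=3):
    """Expand each bit into 16 time slots of LED drive.

    A 0 bit becomes a pulse lasting 3 of the 16 slots; a 1 bit stays dark.
    """
    start = (slots_per_bit - pulse_slots) // 2  # assumed pulse position
    led = []
    for bit in bits:
        cell = [0] * slots_per_bit
        if bit == 0:
            cell[start:start + pulse_slots] = [1] * pulse_slots
        led.extend(cell)
    return led


frame = uart_frame(0x55)
led = sir_encode(frame)
print("frame bits:", frame)
print("LED on for", sum(led), "of", len(led), "slots")
```

A transmitter along these lines needs little more than a UART, a divide-by-16 counter, and a decoder for the pulse window, which is why the SIR case is usually considered a small design. The receiver does the reverse, stretching each detected pulse back into a full-width 0 bit.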
Yes. But Xilinx's app notes didn't talk much about how to write them in VHDL. Actually, you can write your code at a lower level. If your synthesis tool has intelligent options to generate faster counters, that eases your effort. However, I prefer to write at a lower level as my usual practice to get fast and critical performance.

Leslie / Louis Yip
leslie.yip@asmpt.com / louiyip@netvigator.com

David Miller wrote in message ...
>phil_jackson@my-dejanews.com writes:
>
>> I normally cascade large counters in the code so as to ensure an
>> efficient/fast synthesis. There may be another way, but this way I am
>> usually safe.
>What is the target?
>Xilinx produced a couple of application notes for designing fast counters
>by making best use of the carry logic. Other manufacturers probably
>produce similar app notes.
>--
>-------------------------------------------------------------------------------
>David Miller                      Tel: +44 (0)131 343 4963
>Development Engineer              Fax: +44 (0)131 343 4091
>Marconi Avionics, RCS Division    Email: david.miller@gecm.com
>Crewe Toll, Edinburgh
>EH5 2XS
>-------------------------------------------------------------------------------

Article: 16092
I haven't used Leonardo in a while, but it used to build Xilinx counters out of inc_dec primitives followed by registers rather than building them as integrated counters. Synplify always built them correctly. I used to build my counters by hand and instantiate them as black boxes (e.g., using LogiBlox for Xilinx), but have started to lean more on the synthesis tools lately as they tend to do a pretty good job.

It depends, though. For Orca (Lucent), Synplify doesn't do as good a job as Exemplar. How well a synth tool does with a given architecture often depends on which FPGA vendors the synth tool vendor is partnered up with. Xilinx is tight with Synplicity and Synopsys, while Lucent is tight with Exemplar. Not sure about Altera.

In general I'd say that coding style won't make a big difference. The synth tool will almost certainly recognize a counter when it sees it, but may not always map it to the target architecture in the most efficient way. In that case you're better off instantiating.

Bob Sefton

"C. Michele Rogers" wrote:
>
> Hi everyone,
>
> Can Synplicity and Leonardo automatically optimize large binary counters to be
> fast? Or do I have to recode them?
>
> Any help will be highly appreciated.
>
> Thanks
> Michele

Article: 16093
Hello!

I can tell you that implementing a DPR in the 10K200E is very possible, just as you want. I implemented this kind of design a few weeks ago and it is working. For your convenience, please use the wizard that Altera provides. If you still have difficulties, you can contact me by phone or email.

muzo wrote:

> hi,
> I am trying to implement a dual port ram block in 10K200E. What I want
> is to model a real dual port ram where you have two of everything (two
> address busses, two data busses, two writes etc) and where you can
> read/write from/to different addresses at the same time just like a
> CY7C132 or similar dual port devices. I have looked at the 10KE data
> sheet and I don't see how I can do this. Is this possible at all ? If
> not, what's dual port about the embedded memory in 10KE ?
>
> thanks
> muzo
>
> Verilog, ASIC/FPGA and NT Driver Development Consulting (remove nospam from email)

Article: 16094
Does anyone have material, or know where to find material, on advances in FPGA technology? Please email me at trini@wam.umd.edu

Article: 16095
Hi,

Does anyone have Virtex family symbols for OrCAD?

Tnx

==============================
Jacob Eluz
Intel NCGJ (Network Communication Group Jerusalem)
Add: Hamarpe 6 POB 45032 Jerusalem 91450 Israel
Tel: 972-2-5892642
Fax: 972-2-5892600
Email: jacob.eluz@intel.com
==============================

Article: 16096
Hello world,

I have the following problem with the netlister in the Foundation software from Xilinx: In my VHDL code I have defined an input as an 8-bit bus, but in the design I only use the 7 most significant bits (at least after the last modification). During syntax check and synthesis I get no warnings or errors, but during implementation the program terminates due to an unused input.

Of course I could change my entity and modify the design all the way through the design hierarchy, but I would prefer it if someone could point me to a more elegant solution. From a documentation point of view this would also be preferable, as the formulas I am trying to implement contain a divide by 2.

Thanks
Lars

Article: 16097
Hello Everybody! I saw somebody posted a message few days ago. It said that is a newsgroup service which has a lot of uncensored pictures, softwares and mpeg files. So, I joined that newsgroup. But, I discovered that there is also a referral program. If you joined under my userid, you and I have half month service free. So, please join it under my user name called SMART. The newsgroup service provider web site is as follows: http://www.cit-news.com Thank you very much

Article: 16098
In article <372BFEA8.45B9C78@dsi.co.il>, Eli Keren <elik@dsi.co.il> wrote:
>Hello !
>
>I can tell ,implementing DPR in 10K200E is very possible just like you
>want. I already implement this kind of design few weeks ago and it's
>working. For your convenience please use the wizard that altera gives.
>But if you still found a difficulties to do , you can contact with by
>phone or email.

Unless Altera has added a new function to their MegaWizard, they don't support what muzo was requesting: 2 ports, each with read/write capability. The current (9.1) MegaWizard lists a dual-port RAM, but it has one read port and one write port. Since this is the EAB structure in the 10K200E, it makes sense.

I can't think of an easy way to implement this in a 10K200E, since with two write ports you can't just add another EAB to get the increased functionality. You could do this if you only needed one write and 2 reads, however. There may be some unique way of implementing this using time slicing, but it would require extra logic and would surely cut your RAM performance by at least 50%.

On the competitive side (I do work for Xilinx), the Virtex family supports this function directly in the Block SelectRAMs. The CY7C132 is a 2Kx8 dual read/write RAM; this can be implemented with 4 blocks, each configured as 2Kx2, with no extra logic.

Ed

Article: 16099
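The block count Ed gives in the post above follows from the block size and its aspect ratios. The sketch below assumes each Virtex Block SelectRAM holds 4096 bits and can be configured 1, 2, 4, 8, or 16 bits wide (as I recall from XAPP130; treat the list as an assumption), and simply tiles blocks across width and depth. The function name is hypothetical.

```python
import math

BLOCK_BITS = 4096               # assumed bits per Block SelectRAM
PORT_WIDTHS = (1, 2, 4, 8, 16)  # assumed configurable data widths


def blocks_needed(depth, width):
    """Return (blocks, block_depth, block_width) for the cheapest tiling.

    Blocks placed side by side widen the word; blocks stacked in depth
    would also need address decoding and an output mux, which is why a
    configuration that covers the full depth on its own is attractive.
    """
    best = None
    for w in PORT_WIDTHS:
        d = BLOCK_BITS // w
        count = math.ceil(depth / d) * math.ceil(width / w)
        if best is None or count < best[0]:
            best = (count, d, w)
    return best


print(blocks_needed(2048, 8))
```

For a 2Kx8 memory, the first minimum this search finds is the 2Kx2 configuration, where each block already spans the whole address range, so the four blocks only differ in which two data bits they hold and no extra logic is needed.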
In article <7gad1t$266f$1@noao.edu>, "Andy Peters" <apeters@noao.edu.NOSPAM> wrote:
> Willy_Tsai wrote in message <7g6ce4$9e5@netnews.hinet.net>...
>
> "I am develope a project. It need Z80 and Z80-CTC and Z80-PIO."
>
> Uh, pick up a digikey, mouser, or JDR catalog and order the chips?

...or better, have a look at the TMPZ/Z84C015: smaller housing and all peripherals on chip.

--
Stefan Wimmer                   Cellware Broadband
Email sw@cellware.de            Rudower Chaussee 5
WWW http://www.cellware.de/     12489 Berlin, Germany
Visit my private Homepage: Love, Electronics, Rockets, Fireworks!
http://www.geocities.com/CapeCanaveral/6368/