A recent thread went over the maximum permitted power on a device. So why don't we have a sensing diode (well, a transistor with collector tied to base works better) on the die somewhere? It's fairly easy to do, according to my VLSI acquaintances, and with faster and faster IO [implying as it does faster and faster logic switching as well], it would make sense to see these on any large-scale (more than 100 pins perhaps) FPGA package. Comments? Cheers PeteSArticle: 107101
On 2006-08-24, me_2003@walla.co.il <me_2003@walla.co.il> wrote: > different bit field it should be also written to it (like a bit-wise OR > between current value and new value). You could have both sides write to their own BRAM and define the output of the whole thing as the OR of the outputs of the BRAMs. -- Ben Jackson AD7GD <ben@ben.com> http://www.ben.com/Article: 107102
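A rough VHDL sketch of that two-RAM OR arrangement (entity name, widths and depth are illustrative guesses, not from the post; 512 entries of 6 bits are used because those numbers come up later in the thread). Each writer owns its own inferred RAM and the reader sees the bitwise OR of the two read ports:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity or_of_two_rams is
  port (
    clk     : in  std_logic;
    -- writer A
    we_a    : in  std_logic;
    waddr_a : in  unsigned(8 downto 0);
    wdata_a : in  std_logic_vector(5 downto 0);
    -- writer B
    we_b    : in  std_logic;
    waddr_b : in  unsigned(8 downto 0);
    wdata_b : in  std_logic_vector(5 downto 0);
    -- common read side
    raddr   : in  unsigned(8 downto 0);
    rdata   : out std_logic_vector(5 downto 0)
  );
end or_of_two_rams;

architecture rtl of or_of_two_rams is
  type ram_t is array (0 to 511) of std_logic_vector(5 downto 0);
  signal ram_a, ram_b : ram_t := (others => (others => '0'));
  signal q_a, q_b     : std_logic_vector(5 downto 0);
begin
  process(clk)
  begin
    if rising_edge(clk) then
      if we_a = '1' then
        ram_a(to_integer(waddr_a)) <= wdata_a;   -- side A writes only its own RAM
      end if;
      if we_b = '1' then
        ram_b(to_integer(waddr_b)) <= wdata_b;   -- side B writes only its own RAM
      end if;
      q_a <= ram_a(to_integer(raddr));           -- registered read from both RAMs
      q_b <= ram_b(to_integer(raddr));
    end if;
  end process;

  rdata <= q_a or q_b;                           -- the reader sees the OR of both memories
end rtl;

No write arbitration is needed because each side owns one memory; the price is a second block RAM (or more, if each side only has a 1-bit write port per bit field, which seems to be the concern raised further down).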
"PeteS" <PeterSmith1954@googlemail.com> wrote in message > > > So why don't we have a sensing diode (well, a transistor with collector > tied to base works better) on the die somewhere? It is there, at least in Virtex 4. /MikhailArticle: 107103
MM wrote: > "PeteS" <PeterSmith1954@googlemail.com> wrote in message > > > > > So why don't we have a sensing diode (well, a transistor with collector > > tied to base works better) on the die somewhere? > > It is there, at least in Virtex 4. > > /Mikhail I'd like to see it in all the larger devices :) Cheers PeteSArticle: 107104
Very interesting coding style. I'm curious why there are separate clocked processes. You could just tack on the output code to the bottom of the state transition process, but that is only a nit. As long as I'm using registered outputs, I would personally prefer a combined process, but that's just how I approach the problem. I want to know everthing that happens in conjunction with a state by looking in one place, not by looking here to see where/when the next state goes, and then looking there to see what outputs are generated. To illustrate, by modifying the original example: my_state_proc: process(clk, reset_n) type my_state_type is (wait, act, test); variable my_state: my_state_type; begin if (reset_n = '0') then my_state := wait; my_output <= '0'; elsif (rising_edge(clk)) case my_state is when wait => if (some_input = some_value) then my_state := act; end if; ... ... when act => if some_input = some_other_val then my_output <= yet_another_value; else ... end if; ... when test => ... when others => my_state := wait; end case; end if; end process; The only time I would use separate logic code for outputs is if I wanted to have combinatorial outputs (from registered variables, not from inputs). Then I would put the output logic code after the clocked clause, inside the process. I try to avoid combinatorial input-to-output paths if at all possible. Then it would look like this: my_state_proc: process(clk, reset_n) type my_state_type is (wait, act, test); variable my_state: my_state_type; begin if (reset_n = '0') then my_state := wait; my_output <= '0'; elsif (rising_edge(clk)) case my_state is when wait => if (some_input = some_value) then my_state := act; end if; ... ... when act => if some_input = some_other_val then my_output <= yet_another_value; else ... end if; ... when test => ... when others => my_state := wait; end case; end if; if state = act then -- cannot use process inputs here my_output <= yet_another_value; -- or here end if; end process; Interestingly, the clock cycle behavior of the above is identical if I changed the end of the process to: ... end case; -- you CAN use process inputs here: if (state = act) then my_output <= yet_another_value; -- or here end if; end if; end process; Note that my_output is now a registered output from combinatorial inputs, whereas before it was a combinatorial output from registered values. Previously you could not use process inputs, now you can. Andy Eli Bendersky wrote: > backhus wrote: > > Hi Eli, > > discussion about styles is not really satisfying. You find it in this > > newsgroup again and again, but in the end most people stick to the style > > they know best. Style is a personal queastion than a technical one. > > > > Just to give you an example: > > The 2-process -FSM you gave as an example always creates the registered > > outputs one clock after the state changes. That would drive me crazy > > when checking the simulation. > > I guess this indeed is a matter of style. It doesn't drive me crazy > mostly because I'm used to it. Except in rare cases, this single clock > cycle doesn't change anything. However, the benefit IMHO is that the > separation is cleaner, especially when a lot of signals depend on the > state. > > > > > Why are you using if-(elsif?) in the second process? If you have an > > enumerated state type you could use a case there as well. Would look > > much nicer in the source, too. > > I prefer to use if..else if there is only one "if". When there are > "elsif"s, case is preferable. > > > > > Now... 
Will you change your style to overcome these "flaws" or are you > > still satisfied with it, becaused you are used to it? > > > > Both is OK. :-) > > > > Anyway, each style has it's pros and cons and it always depends on what > > you want to do. > > -- has the synthesis result to be very fast or very small? > > -- do you need to speed up your simulation > > -- do you want easy readable sourcecode (that also is very personal, > > what one considers "readable" may just look like greek to someone else) > > -- etc. etc. > > > > So, there will be no common consensus. > > > > In my original post I had no intention to reach a common consensus. I > wanted to see practical code examples which demonstrate the various > techniques and discuss their relative merits and disadvantages. > > Kind regards, > Eli > > > > > Eli Bendersky schrieb: > > > Hello all, > > > > > > In a recent thread (where the O.P. looked for a HDL "Code Complete" > > > substitute) an interesting discussion arised regarding the style of > > > coding state machines. Unfortunately, the discussion was mostly > > > academic without much real examples, so I think there's place to open > > > another discussion on this style, this time with real examples > > > displaying the various coding styles. I have also cross-posted this to > > > c.l.vhdl since my examples are in VHDL. > > > > > > I have written quite a lot of VHDL (both for synthesis and simulation > > > TBs) in the past few years, and have adopted a fairly consistent coding > > > style (so consistent, in fact, that I use Perl scripts to generate some > > > of my code :-). My own style for writing complex logic and state > > > machines in particular is in separate clocked processes, like the > > > following: > > > > > > > > > type my_state_type is > > > ( > > > wait, > > > act, > > > test > > > ); > > > > > > signal my_state: my_state_type; > > > signal my_output; > > > > > > ... > > > ... > > > > > > my_state_proc: process(clk, reset_n) > > > begin > > > if (reset_n = '0') then > > > my_state <= wait; > > > elsif (rising_edge(clk)) > > > case my_state is > > > when wait => > > > if (some_input = some_value) then > > > my_state <= act; > > > end if; > > > ... > > > ... > > > when act => > > > ... > > > when test => > > > ... > > > when others => > > > my_state <= wait; > > > end case; > > > end if; > > > end process; > > > > > > my_output_proc: process(clk, reset_n) > > > begin > > > if (reset_n = '0') then > > > my_output <= '0'; > > > elsif (rising_edge(clk)) > > > if (my_state = act and some_input = some_other_val) then > > > ... > > > else > > > ... > > > end if; > > > end if; > > > end process; > > > > > > > > > Now, people were referring mainly to two styles. One is variables used > > > in a single big process, with the help of procedures (the style Mike > > > Tressler always points to in c.l.vhdl), and another style - two > > > processes, with a combinatorial process. > > > > > > It would be nice if the proponents of the other styles presented their > > > ideas with regards to the state machine design and we can discuss the > > > merits of the approaches, based on real code and examples. > > > > > > Thanks > > > Eli > > >Article: 107105
Chuck Levin <clevin1234@comcast.net> wrote: > Hi, > I was thinking about using QuickLogic for a low power FPGA design. Does > anyone have any experiences they would like to share about their devices or > tools ? Be aware that you get neither reprogrammability nor in-circuit programmability at all. -- Uwe Bonnes bon@elektron.ikp.physik.tu-darmstadt.de Institut fuer Kernphysik Schlossgartenstrasse 9 64289 Darmstadt --------- Tel. 06151 162516 -------- Fax. 06151 164321 ----------Article: 107106
Hi Peter, Peter Alfke wrote: > I did not really understand the question, but: > You can configure the BRAM 1-bit wide, and thus address each bit > individually. > It seems to me that this solves all your problems. Yes, that is what I was going to do, but I thought that maybe someone would come up with a nicer idea, or that maybe there is a way to make Xilinx BRAMs work as I needed them to (data out in the same cycle). The clock rate is not the issue because I don't want to give two cycles for each write (not very elegant). > You can configure the two ports separately, e.g. one can be 1-bit wide, > the other 9 bits wide. That's an interesting idea. If I define portA to be 1 bit wide and portB 6 bits wide, when reading from portB (address 0) would I get the 6 values written to portA (addresses 0 to 5)? Thanks, Mordehay.Article: 107107
Ben Jackson wrote: > On 2006-08-24, me_2003@walla.co.il <me_2003@walla.co.il> wrote: > > different bit field it should be also written to it (like a bit-wise OR > > between current value and new value). > > You could have both sides write to their own BRAM and define the output > of the whole thing as the OR of the outputs of the BRAMs. > Yes, but I need a 6 bit wide vector - so that means that I would have to use 6 BRAMs, while utilizing a very small percentage of each (I need only 512 entries). Thanks, Mordehay.Article: 107108
Anyone have experience with directly driving a cable with RocketIO? I am interested in any information/experiences/advice regarding linking two FPGAs via RocketIO over a cable. I have seen some signal characterization information for high-speed links over copper, but usually at less than 800 MHz. I believe my implementation would use a cable of less than 1 meter, but I would like to know whether it works at 3, 5, 10... meters. Ideally I would like to run the link at 10 Gbit/s, but 6 Gbit/s could work. How feasible is this, or is it back to the drawing board? Thanks in advance! DennisArticle: 107109
David Ashley wrote: > Andreas Ehliar wrote: > >>No need to do that, the S3E starter kit is supported by Bryan's >>modified XC3Sprog, available at http://inisyn.org/src/xup/ . >> >>/Andreas > > > Andreas, > > Very good! It says USB1 is not supported, get a USB2 > card. I'm not sure what this means. I'm able to download > bit files to spartan-3e starter board with no problems under > windows -- but I don't think my usb controller is usb 2.0 > (as in high speed). Will the xup work on my machine? > > -Dave F***'n A! I just answered my own question. Followed the step by step instructions except I'm using gentoo so just did emerge sdcc Then picked up from the rest of the process (from the tar zxvf xup-0.0.2.tar.gz) and everything worked fine. The ./p outputs a lot of text and I wasn't sure if there was a problem, but after 5 minutes or so it finished reporting success. The step "./xc3prog /some/file.bit" needs to actually be "xc3sprog" ^ The 's' is missing. Worked fine. This fills in a big hole in my linux development approach. All I'm likely to be doing is fire-and-forget downloads anyway. The fpga isn't big enough for chipscope and I don't want to pay for chipscope anyway :^). I'll invest in a logic analyzer and just bring signals out if necessary. 8 seconds to download is the least of my worries -- making the bit file in the first place takes lots of minutes. Thanks!!! -Dave -- David Ashley http://www.xdr.com/dash Embedded linux, device drivers, system architectureArticle: 107110
Tommy Thorn wrote: > KJ wrote: > .... a (AFAICT) correct description of Avalon. > > Ah, we only differ in perspective. Yes, Avalon _allows_ you to write > slaves like that Umm, yeah it's defined up front in the spec and not off in some corner like Wishbone's tag method either. > and if your fabric consists only of such slaves, then > yes, they are the same. What is the same as what? Also, there is no restriction about having latency aware masters and slaves. > But variable latency does _not_ work like that, How do you think it works? I've been using the term 'variable latency' as it is used by Avalon which is that there can be an arbitrary delay between the end of the address phase (i.e. when waitrequest is not asserted to the master) and the end of the data phase (i.e. when readdatavalid, is asserted to the master). > thus you can't make such an assumption in general if you wish the fabric > to be able to accommodate arbitrary Avalon slaves. What assumption do you think I'm making? The Avalon fabric can connect any mix of Avalon slaves whether they are fixed latency, variable latency or no latency (i.e. controlled by waitrequest). Furthermore it can be connected to an Avalon master that is 'latency aware' (i.e. has a 'readdatavalid' input) or one that is not (i.e. does not have 'readdatavalid' as an input, so cycles are controlled only by 'waitrequest'). You get different performance based on which method is used but that is a design choice on the master and slave side design, not something that Avalon is doing anything to help or hinder. > > That was not my understanding. SimpCon allows Martin to get an "early > warning" that a transaction is about to complete. And what happens as a result of this 'early warning'? I *thought* it allowed the JOP Avalon master to start up another transaction of some sort. If so, then that can be accomplished with waitrequest and readdatavalid. But maybe it's something on the data path side that gets the jump that I'm just not getting just yet. > > > Anyway, hopefully that explains why it's not abusing Avalon in any way. > > My wording was poor. Another way to say it is "to use Avalon in a > constrained way". I'm not clear on what constraint you're seeing in the usage. > Used this way you cannot hook up slaves with variable > latency, so it's not really Avalon, it's a subset of Avalon. If anything, choosing to not use the readdatavalid signal in the master or slave design to allow for completion of the address phase prior to the data phase is the subset not the other way around. KJArticle: 107111
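To make the waitrequest/readdatavalid split concrete, here is a minimal sketch of a latency-aware Avalon-MM style read slave with a fixed two-cycle read latency. The entity, widths and internal pipeline are illustrative assumptions, not taken from the thread or from the Avalon specification:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity av_latent_slave is
  port (
    clk           : in  std_logic;
    reset_n       : in  std_logic;
    address       : in  std_logic_vector(8 downto 0);
    read          : in  std_logic;
    waitrequest   : out std_logic;
    readdata      : out std_logic_vector(31 downto 0);
    readdatavalid : out std_logic
  );
end av_latent_slave;

architecture rtl of av_latent_slave is
  type ram_t is array (0 to 511) of std_logic_vector(31 downto 0);
  signal ram     : ram_t := (others => (others => '0'));
  signal q       : std_logic_vector(31 downto 0);
  signal valid_p : std_logic;
begin
  -- The address phase always completes in one cycle, so the master may
  -- issue a new command every clock even though the data arrives later.
  waitrequest <= '0';

  process(clk, reset_n)
  begin
    if reset_n = '0' then
      valid_p       <= '0';
      readdatavalid <= '0';
    elsif rising_edge(clk) then
      q             <= ram(to_integer(unsigned(address)));  -- pipeline stage 1
      valid_p       <= read;
      readdata      <= q;                                   -- pipeline stage 2
      readdatavalid <= valid_p;                             -- asserted two clocks after the command
    end if;
  end process;
end rtl;

Since waitrequest is never asserted, a latency-aware master can issue a new address every clock and match the returned words to its commands by counting outstanding reads, which is the pipelining the posts above are arguing about.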
Eli Bendersky wrote: > Hello all, > > In a recent thread (where the O.P. looked for a HDL "Code Complete" > substitute) an interesting discussion arised regarding the style of > coding state machines. Unfortunately, the discussion was mostly > academic without much real examples, so I think there's place to open > another discussion on this style, this time with real examples > displaying the various coding styles. I have also cross-posted this to > c.l.vhdl since my examples are in VHDL. > > I have written quite a lot of VHDL (both for synthesis and simulation > TBs) in the past few years, and have adopted a fairly consistent coding > style (so consistent, in fact, that I use Perl scripts to generate some > of my code :-). My own style for writing complex logic and state > machines in particular is in separate clocked processes, like the > following: > > > type my_state_type is > ( > wait, > act, > test > ); > > signal my_state: my_state_type; > signal my_output; > > ... > ... > > my_state_proc: process(clk, reset_n) > begin > if (reset_n = '0') then > my_state <= wait; > elsif (rising_edge(clk)) > case my_state is > when wait => > if (some_input = some_value) then > my_state <= act; > end if; > ... > ... > when act => > ... > when test => > ... > when others => > my_state <= wait; > end case; > end if; > end process; > > my_output_proc: process(clk, reset_n) > begin > if (reset_n = '0') then > my_output <= '0'; > elsif (rising_edge(clk)) > if (my_state = act and some_input = some_other_val) then > ... > else > ... > end if; > end if; > end process; > > > Now, people were referring mainly to two styles. One is variables used > in a single big process, with the help of procedures (the style Mike > Tressler always points to in c.l.vhdl), and another style - two > processes, with a combinatorial process. > > It would be nice if the proponents of the other styles presented their > ideas with regards to the state machine design and we can discuss the > merits of the approaches, based on real code and examples. > > Thanks > Eli I usually separate the state register and combinational logic for the following reason. First, I think that the term "coding style" is very misleading. It is more like "design style". My approach for designing a system (not just FSM) is - Study the specification and think about the hardware architecture - Draw a sketch of top-level block diagram and determine the functionalities of the blocks. - Repeat this process recursively if a block is too complex - Derive HDL code according to the block diagram and perform synthesis. This approach is based on the observation that synthesis software is weak on architecture-level manipulation but good at gate-level logic minimization. It allows me to have full control of the system architecture (e.g., I can easily identify the key components, optimize critical path etc.). The basic block diagram of FSM (and most sequential circuits) consists of a register, next-state logic and output logic. Based on my design style, it is natural to describe each block in a process or a concurrent signal assignment. The number of segments (process and concurrent signal assignments etc.) is really not an issue. It is just a by-product of this design style. The advantage of this approach is that I have better control on final hardware implementation. Instead of blindly relying on synthesis software and testing code in a trial-and-error basis, I can consistently get what I want, regardless which synthesis software is used. 
On the downside, this approach requires more time in initial design phase and the code is less compact. The VHDL code itself sometimes can be cumbersome. But it is clear and easy to comprehend when presented with the block diagram. One interesting example in FSM design is the look-ahead output buffer discussed in section 10.7.2 of "RTL Hardware Design Using VHDL" (http://academic.csuohio.edu/chu_p/), the book mentioned in the previous thread. It is a clever scheme to obtain a buffered Moore output without the one-clock delay penalty. The code follows the block diagram and uses four processes, one for state register, one for output buffer, one for next-state logic and one for look-ahead output logic. Although it is somewhat lengthy, it is easy to understand. I believe the circuit can be described by using one clocked process with proper mix of signals and variables and reduce the code length by 3 quarters, but I feel it will be difficult to relate the code with the actual circuit diagram and vice versa. My 2 cents. Mike G.Article: 107112
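For readers without the book at hand, the idea can be compressed into a toy two-state sketch (this is not the book's four-process code): the Moore output is computed from the next state rather than the present state and then registered, so the buffered output changes in the same clock as the state it belongs to instead of one cycle later.

library ieee;
use ieee.std_logic_1164.all;

entity lookahead_fsm is
  port (
    clk, reset_n : in  std_logic;
    start, done  : in  std_logic;
    busy         : out std_logic   -- buffered Moore output
  );
end lookahead_fsm;

architecture rtl of lookahead_fsm is
  type state_t is (idle, run);
  signal state_reg, state_next : state_t;
begin
  -- state register plus look-ahead output buffer
  process(clk, reset_n)
  begin
    if reset_n = '0' then
      state_reg <= idle;
      busy      <= '0';
    elsif rising_edge(clk) then
      state_reg <= state_next;
      if state_next = run then     -- look-ahead: decode the *next* state
        busy <= '1';
      else
        busy <= '0';
      end if;
    end if;
  end process;

  -- next-state logic
  process(state_reg, start, done)
  begin
    state_next <= state_reg;
    case state_reg is
      when idle => if start = '1' then state_next <= run; end if;
      when run  => if done = '1' then state_next <= idle; end if;
    end case;
  end process;
end rtl;

The price is that the D input of the output register now depends combinationally on the FSM inputs (through state_next), which is exactly the trade-off discussed in the other posts in this thread.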
Brad Smallridge wrote: > > It seemed that what I was looking at was completely flattened, part of the > mapping process? The only thing that was grouped was the carry chains for > some of the counters in the design. Everything else was in top. Should > I be adding constraints to my vhdl code instead of using the floorplanner? > The design is hierarchical, i.e. different VHDL components for each submodule, yes? If so then the EDIF netlist should be hierarchical. Make sure the synthesizer you are using isn't set to flatten the design, and check in PAR to verify the properties aren't set to flatten either. >>>How do I add registers to allow a bus to transverse across the chip and >>>not have the synth tool pack the registers into an SRL16? >> >>It has to be done in your RTL of course. > > > I still don't know what RTL is. Register Transfer Level design. It is basically the device-independent HDL source code. My usage above is I guess improper. I should have said "It has to be done in your source of course". > > >>The easiest way to prevent SRL16 inference is to put a reset on the >>flip-flops. > > > That's clever. Whatever. It works.Article: 107113
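A small sketch of both points made above (entity and signal names are made up): pipeline registers for a bus crossing the chip, with an asynchronous reset on the flip-flops so the synthesizer will not pack the chain into an SRL16.

library ieee;
use ieee.std_logic_1164.all;

entity bus_pipe is
  port (
    clk, rst : in  std_logic;
    d        : in  std_logic_vector(31 downto 0);
    q        : out std_logic_vector(31 downto 0)
  );
end bus_pipe;

architecture rtl of bus_pipe is
  signal stage1 : std_logic_vector(31 downto 0);
begin
  process(clk, rst)
  begin
    if rst = '1' then
      -- the reset is what keeps these registers out of SRL16s
      stage1 <= (others => '0');
      q      <= (others => '0');
    elsif rising_edge(clk) then
      stage1 <= d;
      q      <= stage1;
    end if;
  end process;
end rtl;

Where the tool supports it, a synthesis attribute such as SHREG_EXTRACT = no does the same job without spending routing on the reset net.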
me_2003@walla.co.il wrote: >>You can configure the two ports separately, e.g. one can be 1-bit wide, >>the other 9 bits wide. > > > That's an interesting idea, If I define portA to be 1 bit wide and > portB 6 bits wide, when reading from portB (address 0) would I get the > 6 values written to portA (address 0 to 5) ? I don't think 6 bits wide is an option, I'd expect only powers of 2 are allowed. Could you make port B 8 bits and just throw away 2 bits? So port A bit # = addr*8 + bit [0-5] -Dave -- David Ashley http://www.xdr.com/dash Embedded linux, device drivers, system architectureArticle: 107114
mikegurche@yahoo.com wrote: > One interesting example in FSM design is the look-ahead output buffer > discussed in section 10.7.2 of "RTL Hardware Design Using VHDL" > (http://academic.csuohio.edu/chu_p/), the book mentioned in the > previous thread. It is a clever scheme to obtain a buffered Moore > output without the one-clock delay penalty. The code follows the block > diagram and uses four processes, one for state register, one for output > buffer, one for next-state logic and one for look-ahead output logic. > Although it is somewhat lengthy, it is easy to understand. I believe > the circuit can be described by using one clocked process with proper > mix of signals and variables and reduce the code length by 3 quarters, > but I feel it will be difficult to relate the code with the actual > circuit diagram and vice versa. Combining the input and registered state this way allows for a non registered path from input to output. Is this ok? Or is there an assumption that the device connected to the output is itself latching on the clock edge? -Dave -- David Ashley http://www.xdr.com/dash Embedded linux, device drivers, system architectureArticle: 107115
Haven't had any experience with this yet, however in anticipation of an upcoming project I have been looking into solutions. So far it seems like the highest electrical performace solution is to use a Zd type connector and associated cable. This is a relatively new connector series that has been incorporated in the PICMG telecom standards. These connectors and cables are made by a number of companies including: Tyco Amp: http://catalog.tycoelectronics.com/TE/bin/TE.Connect?C=22438&F=0&M=CINF&BML=10576,16358,17560,17759,17654&GIID=0&LG=1&I=13&RQS=C~22438^M~FEAT^BML~10576,16358,17560,17759,17654^G~G ERNI: http://www.erni.com/ermetzdfront.htd Gore: http://www.gore.com/en_xx/products/cables/copper/backplane/gore_eye-opener_airmaxvs_cable_assemblies.html Hope this helps Brendan "vt2001cpe" <vt2001cpe@gmail.com> wrote in message news:1156447377.879040.291470@b28g2000cwb.googlegroups.com... > Anyone have experience with directly driving a cable with RocketIO? I > am interested in any information/experiences/advice regarding linking > two FPGAs via RocketIO over a cable. I have seen some signal > characterization information for high-speed links over copper, but > usually less than 800Mhz. I believe my implementation would use a a > less than 1 meter, but would like to know it works at 3, 5, > 10...meters. Ideally I would like to run the link at 10gbits, but > 6gbits could work. How feasible is this, or is it back to the drawing > board? > > Thanks in advance! > Dennis >Article: 107116
tweed_deluxe wrote: > I'm wondering what intrinsic ecomomic, technical, or "other" barriers > have precluded FPGA device vendors from taking this step. In other > words, why are there no advertised, periodic refreshes of older > generation FPGA devices. Reasonable question > > In the microprocessor world, many vendors have established a long and > succesful history of developing a pin compatible product roadmap for > customers. For the most part, these steps have allowed customers to > reap periodic technology updates without incurring the need to perform > major re-work on their printed circuit card designs or underlying > software. > > On the Xilinx side of the fence there appears to be no such parallel. > Take for example, Virtex-II Pro. This has been a proven work-horse for > many of our designs. It takes quite a bit of time to truly > understand and harness all of the capabilities and features offered by > a platform device like this. After making the investment to > develop IP and hardware targeted at this technology, is it unreasonable > to expect a forward looking roadmap that incorporates modest updates to > the silicon ? A step that doesn't require a flow blown jump to a new > FPGA device family and subsequent re-work of the portfolio of hardware > and, very often, the related FPGA IP ? > > Sure, devices like Virtex-5 offer capabilities that will be true > enablers for many customers (and for us at times as well). But why > not apply a 90 or 65 nm process shrink to V2-Pro, provide modest speed > bumps to the MGT, along with minor refinements to the hardware > multipliers. Maybe toss in a PLL for those looking to recover clocks > embedded in the MGT data stream etc. And make the resulting devices > 100% pin and code compatible with prior generations. > > Perhaps I'm off in the weeds. But, in our case, the ability to count > on continued refinement and update of a pin-comaptible products like > V2-Pro would result in more orders of Xilinx silicon as opposed to > fewer. > > The absence of such refreshes in the FPGA world leads me to believe > that I must be naive. So I am trying to understand where the logic is > failing. Its just that there are times I wish the FPGA vendors could > more closely parallel what the folks in the DSP and micro-processor > world do ... The FPGA market is not growing all that quickly, so the funds are not available for this. You will also find that the design life of FPGA products is shorter than DSP/microprocessors, plus they cannibalize sales of their 'hot new' devices, as well as confuse the designers. Sometimes, there are physical barriers, like changes to flip chip, and whole-die bonding, that mandate BGA. There, backwards compatible has to go - and that's the key reason for doing this. All those factors, mean this is unlikely to happen. What they CAN do, is try and keep ball-out compatible, over a couple of generations, but I'm not sure even that relatively simple effort is pushed too hard ? -jgArticle: 107117
I wouldn't bother; the abstraction added is not sufficient above properly written HDL to warrant the extra step. To get a reasonable implementation you will end up writing "C" in such a limited form that it won't help you much. Sanka Piyaratna wrote: > Hi, > > What is your opinion on high level languages such as systems C, handel-C > etc. for FPGA development instead of VHDL/Verilog? > > SankaArticle: 107118
PeteS, There is a diode in every (Virtex) device. If you have an old data sheet, you might want to get a newer one. If you find a data sheet without a diode, let me know, and I will find which pins are the diode. It seems that there are those who do not understand how to cool our devices, as well. For help on heatsinks, etc. we have a lot of information that would allow up to (and perhaps more than) 25 watts of heat per device. That is basically 100% of everything switching, at or near the BUFG clock maximum frequency. http://www.xilinx.com/bvdocs/userguides/ug112.pdf The heatsinking has to be able to remove the heat such that the 85 C (commercial) or 100 C (industrial) junction temperature is not exceeded. Austin PeteS wrote: > A recent thread went over the maximum permitted power on a device. > > So why don't we have a sensing diode (well, a transistor with collector > tied to base works better) on the die somewhere? It's fairly easy to > do, according to my VLSI acquaintances, and with faster and faster IO > [implying as it does faster and faster logic switching as well], it > would make sense to see these on any largescale (more than 100 pins > perhaps) FPGA package. > > Comments? > > Cheers > > PeteS >Article: 107119
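As a rough worked example of that constraint (the 25 W figure is from the note above; the 50 C ambient is an assumed number), the combined junction-to-ambient thermal resistance of package plus heatsink has to satisfy

  theta_JA <= (Tj_max - Ta) / P = (85 C - 50 C) / 25 W = 1.4 C/W

which is far lower than a bare BGA package typically achieves in still air, so at that power level the heatsink and airflow guidance in the user guide above is not optional.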
They actually kinda do die shrinks, but they give the new die a new name and alter a few other things. Example, a Virtex 2 becomes a Spartan 3, you get the idea... For each step in the process technology they make the trade offs that make sense for those geometries (e.g bigger memory). They first release a high priced wiz bang part with one name, then they follow up with a lower price smaller die using the same process and give it a different name. tweed_deluxe wrote: > I'm wondering what intrinsic ecomomic, technical, or "other" barriers > have precluded FPGA device vendors from taking this step. In other > words, why are there no advertised, periodic refreshes of older > generation FPGA devices. > > In the microprocessor world, many vendors have established a long and > succesful history of developing a pin compatible product roadmap for > customers. For the most part, these steps have allowed customers to > reap periodic technology updates without incurring the need to perform > major re-work on their printed circuit card designs or underlying > software. > > On the Xilinx side of the fence there appears to be no such parallel. > Take for example, Virtex-II Pro. This has been a proven work-horse for > many of our designs. It takes quite a bit of time to truly > understand and harness all of the capabilities and features offered by > a platform device like this. After making the investment to > develop IP and hardware targeted at this technology, is it unreasonable > to expect a forward looking roadmap that incorporates modest updates to > the silicon ? A step that doesn't require a flow blown jump to a new > FPGA device family and subsequent re-work of the portfolio of hardware > and, very often, the related FPGA IP ? > > Sure, devices like Virtex-5 offer capabilities that will be true > enablers for many customers (and for us at times as well). But why > not apply a 90 or 65 nm process shrink to V2-Pro, provide modest speed > bumps to the MGT, along with minor refinements to the hardware > multipliers. Maybe toss in a PLL for those looking to recover clocks > embedded in the MGT data stream etc. And make the resulting devices > 100% pin and code compatible with prior generations. > > Perhaps I'm off in the weeds. But, in our case, the ability to count > on continued refinement and update of a pin-comaptible products like > V2-Pro would result in more orders of Xilinx silicon as opposed to > fewer. > > The absence of such refreshes in the FPGA world leads me to believe > that I must be naive. So I am trying to understand where the logic is > failing. Its just that there are times I wish the FPGA vendors could > more closely parallel what the folks in the DSP and micro-processor > world do ...Article: 107120
Brendan Illingworth wrote: > Haven't had any experience with this yet, however in anticipation of an > upcoming project I have been looking into solutions. So far it seems like > the highest electrical performace solution is to use a Zd type connector and > associated cable. This is a relatively new connector series that has been > incorporated in the PICMG telecom standards. These connectors and cables > are made by a number of companies including: > > Tyco Amp: > http://catalog.tycoelectronics.com/TE/bin/TE.Connect?C=22438&F=0&M=CINF&BML=10576,16358,17560,17759,17654&GIID=0&LG=1&I=13&RQS=C~22438^M~FEAT^BML~10576,16358,17560,17759,17654^G~G > > ERNI: > http://www.erni.com/ermetzdfront.htd > > Gore: > http://www.gore.com/en_xx/products/cables/copper/backplane/gore_eye-opener_airmaxvs_cable_assemblies.html > > Hope this helps > Brendan > Thanks for the reply! These connectors look useful. I have noticed that people seem to have had success with FR4 connections of 40 inches or less. In som some cases that includes transmission through a backplane connector. Hope that helps with your application! --DennisArticle: 107121
Alan Nishioka wrote: > When using Xilinx, the best way to see what hardware is actually there > is to use fpga_editor. You don't even need a design; just create a new > one, make up a name, and select the part you want to look at. Then you > can double-click on the slice and see what is inside of it. > > Alan Nishioka > Rebooted into windows and launched fpga_editor, and was able to see the detail, sure enough there's an inverter right there. Thanks for the tip. Tried launching fpga_editor from linux but it doesn't work. dave% /Xilinx/bin/lin/fpga_editor Cannot register service: RPC: Unable to receive; errno = Connection refused unable to register (registryProg, registryVers, tcp) -Dave -- David Ashley http://www.xdr.com/dash Embedded linux, device drivers, system architectureArticle: 107122
me_2003@walla.co.il wrote: > Ben Jackson wrote: > > On 2006-08-24, me_2003@walla.co.il <me_2003@walla.co.il> wrote: > > > different bit field it should be also written to it (like a bit-wise OR > > > between current value and new value). > > > > You could have both sides write to their own BRAM and define the output > > of the whole thing as the OR of the outputs of the BRAMs. > > > > > Yes but I need a 6 bit wide vector - so that means that I would have to > use 6 BRAMs. > while utilizing a very small precentage of each (I need only 512 > entries). > > Thanks, Mordehay. As others suggested, you can utilize 1 BRAM: the writing port is 4K x 1, the reading port is 512 x 8. No reason why it shouldn't work. Cheers,Article: 107123
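A behavioral sketch of that single-BRAM arrangement (entity and port names are made up, and whether a given synthesis tool will infer one asymmetric block RAM from this varies, so instantiating a primitive such as RAMB16_S1_S9 or using Core Generator may be needed in practice):

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity asym_ram is
  port (
    clk    : in  std_logic;
    -- 1-bit write port, 4096 deep
    we_a   : in  std_logic;
    addr_a : in  unsigned(11 downto 0);
    din_a  : in  std_logic;
    -- 8-bit read port, 512 deep (the application only uses 6 of the bits)
    addr_b : in  unsigned(8 downto 0);
    dout_b : out std_logic_vector(7 downto 0)
  );
end asym_ram;

architecture rtl of asym_ram is
  type ram_t is array (0 to 4095) of std_logic;
  signal ram : ram_t := (others => '0');
begin
  process(clk)
  begin
    if rising_edge(clk) then
      if we_a = '1' then
        ram(to_integer(addr_a)) <= din_a;              -- write a single bit
      end if;
      for i in 0 to 7 loop
        dout_b(i) <= ram(to_integer(addr_b) * 8 + i);  -- read 8 consecutive bits as one byte
      end loop;
    end if;
  end process;
end rtl;

With this mapping, bit i of the byte read at port B address n is the bit written at port A address n*8 + i, the same addr*8 + bit arithmetic mentioned earlier in the thread.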
mikegurche@yahoo.com wrote: > This approach is based on the observation that synthesis software is > weak on architecture-level manipulation but good at gate-level logic > minimization. I have observed that synthesis software does what it is told. If I describe two gates and a flop, that is what I get. If I describe a fifo or an array of counters, that is what I get. > The advantage of this approach is that I have better control on final > hardware implementation. Instead of blindly relying on synthesis > software and testing code in a trial-and-error basis, I can > consistently get what I want, regardless which synthesis software is > used. What I want is a netlist that sims the same as my code and makes reasonable use of the device resources. Synthesis does a good job of this with the right design rules. Trial and error would only come into play if I were to run synthesis without simulation. > On the downside, this approach requires more time in initial > design phase and the code is less compact. The VHDL code itself > sometimes can be cumbersome. But it is clear and easy to comprehend > when presented with the block diagram. I prefer clean, readable code, verified by simulation and static timing. I use the rtl viewer to convert my logical description to a structural one for review. -- Mike TreselerArticle: 107124
Hi Kevin, now I know more from your name than KJ ;-) >> My pipeline approach is just this little funny busy counter >> instead of a single ack and that a slave has to declare it's >> pipeline level (0 to 3). Level 1 is almost ever possible. >> It's more or less for free in a slave. Level 1 means that >> the master can issue the next read/write command in the same >> cycle when the data is available (rdy_cnt=0). Level 2 means >> issue the next command one cycle earlier (rdy_cnt=1). Still >> not a big issue for a slave (especially for a memory slave >> where you need a little state machine anyway). > > I'm assuming that the master side address and command signals enter the > 'Simpcon' bus and the 'Avalon' bus on the same clock cycle. Maybe this > assumption is where my hang up is and maybe JOP on Simpcon is getting a > 'head start' over JOP on Avalon. This assumption is true. Address and command (+write data) are issued in the same cycle - no magic there. In SimpCon this is a single cycle thing and there is no ack or busy signal involed in this first cycle. That means no combinatorial generation of ack or busy. And no combinatorial reaction of the master in the first cycle. What I loos with SimpCon is a single cycle latency access. However, I think this is not too much to give up for easier pipelining of the arbitration/data in MUX. > Given that assumption though, it's not clear to me why the address and > command could not be designed to also end up at the actual memory > device on the same clock cycle. Again, maybe this is where my hang up > is. The register that holds the address is probably a ALU result register (or in my case the top-of-stack). That one is usually buried deep in the design. Additional you have to generate your slave selection (chip select) from that address. This ends up with some logic and long routing pathes to the pins. In a practical example with the Cyclone 6-7 ns are not so uncommon. Almost one cycle at 100 MHz. Furthermore, this delay is not easy to control in your design - add another slave and the output delay changes. To avoid this unpredictability one will add a register at the IO pad for address and rd/wr/cs. If we agree on this additional register at the slave/memory interface we can drop the requirement on the master to hold the address and control longer than one cycle. Furthermore, as we have this minimum one cycle latency from master command till address/rd/wr/data on the pins we do not need an ack/busy indication during this command cycle. We just say to the master: in the cycle the follows your command you will get the information about ready or wait. > Given that address and command end up at the memory device on the same > clock cycle whether SimpCon or Avalon, the resulting read data would > then be valid and returned to the SimpCon/Avalon memory interface logic > on the same clock cycle. Pretty sure this is correct since this is > just saying that the external memory performance is the same which is > should be since it does not depend on SimpCon or Avalon. In SimpCon it will definitely arrive one cycle later. With Avalon (and the generated memory interface) I 'assume' that there is also one cycle latency - I read this from the tco values of the output pins in the Quartus timing analyzer report. For the SRAM interface I did in VHDL I explicitly added registers at the addredd/rd/wr/data output. I don't know if the switch fabric adds another cycle. Probably not, if you do not check the pipelined checkbox in the SOPC Builds. 
> Given all of that, it's not clear to me why the actual returned data > would show up on the SimpCon bus ahead of Avalon or how it would be any > slower getting back to the SimpCon or Avalon master. Again, this might > be where my hangup is but if my assumptions have been correct up to > this paragraph then I think the real issue is not here but in the next > paragraph. Completely agree. The read data should arrive in the same cycle from Avalon or SimpCon to the master. Now that's the point where this bsy_cnt comes into play. In my master (JOP) I can take advantage of the early knowledge when data will arrive. I can restart my waiting pipeline earlier with this information. This is probably the main performance difference. Going through my VHDL code for the Avalon interface I found on more issue with the JOP/Avalon interface: In JOP I issue read/write commands and continue to execute microcode if possible. Only when the result is needed the main pipeline waits for the slave result. However, the slave can deliver the result earlier than needed. In that case the slave has to hold the data for JOP. The Avalon specification guarantees the read data valid only for a single cycle. So I added a register to hold the data and got one cycle latency: * one register at the input pins for the read data * one register at the JOP/Avalon interface to hold the data longer than one cycle As I see it, this can be enhanced in the same way I did the little Avalon specification violation on the master side. Use a MUX to deliver the data from the input register in the first cycle and switch to the 'hold' register for the other cycles. Should change the interface for a fairer comparison. Thanks for pointing me to this :-) > If I got through this far then it comes down to....You say "Level 1 > means that the master can issue the next read/write command in the same > cycle when the data is available (rdy_cnt=0). Level 2 means issue the > next command one cycle earlier (rdy_cnt=1)." and presumably the > 'rdy_cnt=1' is the reason for the better SimpCon numbers. Where I'm > pretty sure I'm hung up then is why can't the Avalon slave drop the > wait request output on the clock cycle that corresponds to rdy_cnt=1 > (i.e. one before data is available at the master)? Because rdy_cnt has a different meaning than waitrequest. It is more like an early datavalid. Dropping waitrequest does not help with my pipeline restart thing. > rdy_cnt=1 sounds like it is allowing JOP on SimpCon to start up the > next transaction (read/write or twiddle thumbs) one clock cycle before > the read data is actually available. But how is that different than As above: the main thing is to get the master pipeline started early to use the read data. Perhaps this is a special design feature of JOP and not usable in a different master. I don't know. We would need to design a different CPU to evaluate if this feature is useful in the general case. > >> Enjoy this discussion :-) >> Martin > > Immensely. And I think I'll finally get the light bulb turned on in my > head after your reply. > BTW: As I'm also academic I should/have to publish papers. SimpCon is on my list for months to be published - and now it seems to be the right time. I will write a draft of the paper in the next few days. If you are interested I'll post a link to it in this thread and your comments are very welcome. Martin
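Since the exact SimpCon signal list is not spelled out in the thread, here is only a rough sketch of how a simple slave could drive the rdy_cnt countdown described above; the names and the fixed two-cycle read latency are assumptions:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity simpcon_like_slave is
  port (
    clk, reset : in  std_logic;
    -- command, valid for a single cycle only
    address    : in  std_logic_vector(8 downto 0);
    rd         : in  std_logic;
    -- response
    rd_data    : out std_logic_vector(31 downto 0);
    rdy_cnt    : out unsigned(1 downto 0)   -- cycles until the read data is valid
  );
end simpcon_like_slave;

architecture rtl of simpcon_like_slave is
  type ram_t is array (0 to 511) of std_logic_vector(31 downto 0);
  signal ram : ram_t := (others => (others => '0'));
  signal q   : std_logic_vector(31 downto 0);
  signal cnt : unsigned(1 downto 0);
begin
  rdy_cnt <= cnt;

  process(clk, reset)
  begin
    if reset = '1' then
      cnt <= "00";
    elsif rising_edge(clk) then
      if rd = '1' then
        cnt <= "01";                                  -- rdy_cnt reaches 0 in the cycle the data is ready
        q   <= ram(to_integer(unsigned(address)));    -- address is sampled in the command cycle only
      elsif cnt /= "00" then
        cnt <= cnt - 1;
      end if;
      if cnt = "01" then
        rd_data <= q;                                 -- data is then held until the next read completes
      end if;
    end if;
  end process;
end rtl;

The early-warning use in JOP is then simply that the master may restart its pipeline while rdy_cnt is still 1, so it can consume rd_data in the very next cycle instead of waiting for an after-the-fact data-valid strobe.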