seems that you posted in the wrong thread ;-)

> http://indi.joox.net now has the first compiled quartus files for the
> 16 bit indi core.
>
> basically an alu, control and registers, with fast interrupt switch.
>
> asynchronous busy of cpu, and synchronous reset.
>
> the bus interface is not complete yet, as i have to think about the
> expansion modules.

Now you have to decide about: Avalon, SimpCon, Wishbone,.... Perhaps you can independently compare Avalon and SimpCon with your CPU design :-) KJ can give you Avalon support, and I can give you SimpCon support. However, you will find a lot of information already in this thread.

Martin

Article: 107176
hi

fpga_toys@yahoo.com wrote:
> Peter Alfke wrote:
> > Higher performance requires radical innovation and real cleverness
> > these days.
> > Peter Alfke
> pin compatibility is just customer support,

how about a 1 pin high implies a self program from a small hardwired rom, which gets enough of the chip off the ground to work as a programmer for itself and others. some of that extra space :-) internally they don't have to be the same, just roughly the same, as i'm sure there will be extra logic area.

or how about a single sided io series, with 2 edges of io for corners, then a scale down is just more logic mapped to fewer pins, and extra die copies per cut chip. it just needs an interface mapping layer (ie new standard size pads to old shrunk size pads (hyper buffers? or capacitive resource)).

and could someone put some analog low power fast comparators on please??

cheers

jacko
http://indi.joox.net a 24 blue block CPU element (16 bit)

Article: 107177
zcsizmadia@gmail.com <zcsizmadia@gmail.com> wrote:
> I inject a dll into the impact.exe, and hook DeviceIoControl and some
> other kernel32 APIs. impact.exe calls DeviceIoControl to read/write LPT
> I/O port using windriver6 driver. Instead of calling original windrvr
> DeviceIoControl function I just forward the TMS/TDI/TDO/TCK bits to
> Digilent USB (or any other programmer cable).

Do you do this without the windriver header file?

> On linux the easiest could be to create a brand new windrvr emulator
> driver where we implement all the IOCTLs used by impact. BTW, I have no
> clue why they are using the windriver as a device driver, because all
> the features they do with the Jungo driver is really simple and
> generic (eg: user mode I/O access, USB access to device, etc.)

-- 
Uwe Bonnes bon@elektron.ikp.physik.tu-darmstadt.de
Institut fuer Kernphysik Schlossgartenstrasse 9 64289 Darmstadt
--------- Tel. 06151 162516 -------- Fax. 06151 164321 ----------

Article: 107178
Peter,

It's not enough with such a project to scan just the two leading manufacturers. Instead, you need to scan all of them, because the others (e.g. Lattice, Actel, Quicklogic) may have just the feature you need. I'll give an example: on the generic I/Os, both Altera and Xilinx can't get higher than 1.3 Gbps. Lattice's newest SC gets I/O speeds up to 2 Gbps. (I can't comment on Actel's speed as I have never used them.) At the logic side all three have about the same speed. As you will know, the highest system speed will depend on the design constraints (and also how good the tools are and how well you know the features). I wouldn't use a microprocessor (not even a soft core) as it is only additional load (and takes away your resources).

PS. why don't you mention the V5 - it should be intrinsically faster?

Regards,
Luc

On 23 Aug 2006 11:12:36 -0700, "Peter Alfke" <peter@xilinx.com> wrote:
>First, you have to decide how much logic you need, i.e. how much money
>you want to spend.
>Then you have to look at the two leading manufacturers, which are -in
>order of size and speed- Xilinx and Altera
>>From Xilinx, I would recommend the appropriate size Virtex-4 LX part,
>or -if you need lots of multipliers and/or accumulators- the
>appropriate size Virtex-4 SX part.
>
>If you are after max speed, you hardly need a microprocessor, but both
>companies offer a soft microprocessor, it's called MicroBlaze in
>Xilinx.
>
>Good luck, sounds like a fun project.
>Peter Alfke, Xilinx

Article: 107179
Hello,

I need a linear priority encoder that has N inputs and N outputs. Searching the group, I saw a thread where Peter Alfke stated:

--- cut ---

> Let me tell you what can be done in Virtex-4 (probably also in
> Spartan3):
> A priority "linear encoder" with 4 x N inputs and 4 x N outputs, each
> output corresponding to a prioritized input.
> Only one output is ever active, the one corresponding to the
> highest-priority active input.
> Total cost: 5N+1 (LUTs+flip-flops).
> Such a 32-input linear priority encoder uses 41 LUTs = 21 slices (<6
> CLBs), and runs at >250 MHz.
> The design is fully modular (per 4 bits).
> Peter Alfke

--- cut ---

But no details were given.

Can someone provide more details on how that is implemented in a slice?

Thanks,

Sylvain

Article: 107180
Sylvain Munaut <SomeOne@SomeDomain.com> schrieb:
> Hello,
>
> I need a linear priority encoder that has N input and N outputs.
> Searching the group, I saw a thread where Peter Alfke stated :
>
> --- cut ---
>
> > Let me tell you what can be done in Virtex-4 (probably also in
> > Spartan3):
> > A priority "linear encoder" with 4 x N inputs and 4 x N outputs, each
> > output corresponding to a prioritized input.
> > Only one output is ever active, the one corresponding to the
> > highest-priority active input.
> > Total cost: 5N+1 (LUTs+flip-flops).
> > Such a 32-input linear priority encoder uses 41 LUTs = 21 slices (<6
> > CLBs), and runs at >250 MHz.
> > The design is fully modular (per 4 bits).
> > Peter Alfke
>
> --- cut ---
>
> But no details where given.
>
> Can someone provide more details on how that implemented in a slice ?
>
> Thanks.
>
> Sylvain

it was possibly meant as a brain teaser!

Antti

Article: 107181
David Ashley <dash@nowhere.net.dont.email.me> writes:
> Actually the question I have is what kinds of programmer cables can
> I use with linux. Impact -- I'd just as soon not use it. I prefer open
> source command line utilities. In fact I want everything to be a
> command line utility -- get rid of the IDE's. I use my own editor
> and "make" and I'm happy.

I've done just this in Linux with a bit of makefile like this:

prog: work/$(TOP).bit
	@echo "setPreference -pref StartupCLock:AUTO_CORRECTION" > work/impact.cmd
	@echo "setMode -bs" >> work/impact.cmd
	@echo "setCable -port auto" >> work/impact.cmd
	@echo "Identify" >> work/impact.cmd
	@echo "setAttribute -position 1 -attr configFileName -value $(TOP).bit" >> work/impact.cmd
	@echo "Program -p 1 " >> work/impact.cmd
	@echo "quit" >> work/impact.cmd
	cd work && impact -batch impact.cmd

"make prog" then does the work... Of course, ideally, ditching Impact would be good ;-)

Cheers,
Martin

-- 
martin.j.thompson@trw.com
TRW Conekt - Consultancy in Engineering, Knowledge and Technology
http://www.conekt.net/electronics.html

Article: 107182
"Antti" <Antti.Lukats@xilant.com> writes:
> GaLaKtIkUs™ schrieb:
> > A small question: what does mean gosh?
> >
> > A+
>
> gosh no idea!
>
> maybe is another perfectly perfect word, like "spunk" invented by Pippi
>
> Antti

http://www.thefreedictionary.com/gosh

gosh (gŏsh)
interj. Used to express mild surprise or delight.
[Alteration of God.]

Cheers,
Martin

-- 
martin.j.thompson@trw.com
TRW Conekt - Consultancy in Engineering, Knowledge and Technology
http://www.conekt.net/electronics.html

Article: 107183
On 25 Aug 2006 01:26:55 -0700, "Sylvain Munaut <SomeOne@SomeDomain.com>" <246tnt@gmail.com> wrote:
>Hello,
>
>I need a linear priority encoder that has N input and N outputs.
>Searching the group, I saw a thread where Peter Alfke stated :
>
>--- cut ---
>
>> Let me tell you what can be done in Virtex-4 (probably also in
>> Spartan3):
>> A priority "linear encoder" with 4 x N inputs and 4 x N outputs, each
>> output corresponding to a prioritized input.
>> Only one output is ever active, the one corresponding to the
>> highest-priority active input.
>> Total cost: 5N+1 (LUTs+flip-flops).
>> Such a 32-input linear priority encoder uses 41 LUTs = 21 slices (<6
>> CLBs), and runs at >250 MHz.
>> The design is fully modular (per 4 bits).
>>Peter Alfke
>
>--- cut ---
>
>But no details where given.
>
>Can someone provide more details on how that implemented in a slice ?
>
>Thanks.
>
> Sylvain

Let me try to verbally sketch this out for you for 4 bits:

Imagine two columns of LUTs: one LUT on the left for the 4 inputs, and 4 on the right, one for each output. The left LUT is a 4-input OR, and all the LUTs on the right are AND gates, but with bubbles (inversions) on some inputs. The top right LUT is just an AND of the first input bit and the output of the left LUT. The second LUT on the right takes the first input inverted, the second bit, and the output of the left LUT. The third LUT takes the first two inputs inverted, the third input, and the output of the left LUT. The last LUT takes the first 3 inputs inverted and the output of the left LUT. This gives you a 4-input, 4-output priority encoder with 5 LUTs. Now you have to be able to cascade this, i.e. you have to disable the output of the left (OR) LUT by a higher-up left LUT. For this purpose you can use the carry chain.

HTH.

Article: 107184
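[Editor's note] The 4-bit block and its cascade described above can be checked with a small behavioral model. Here is a minimal Python sketch; the function names and the boolean `enable` standing in for the carry chain are illustrative assumptions, not how the slices are actually wired:

```python
def priority_block_4(bits, enable):
    """One 4-bit block: a left OR LUT plus four right AND LUTs.

    bits   -- four booleans, index 0 has the highest priority
    enable -- carry-in; False once a higher-up block has fired
    Returns (outputs, carry_out): carry_out stays True only while
    no active input has been seen, keeping lower blocks enabled.
    """
    any_active = any(bits)                # the left (OR) LUT
    outputs = [
        enable and any_active and bits[i] and not any(bits[:i])
        for i in range(4)                 # right LUTs: bubbles on higher bits
    ]
    return outputs, enable and not any_active

def linear_priority_encoder(bits):
    """Cascade 4-bit blocks; only the highest-priority active input wins."""
    outputs, enable = [], True
    for i in range(0, len(bits), 4):
        block_out, enable = priority_block_4(bits[i:i + 4], enable)
        outputs.extend(block_out)
    return outputs
```

For example, with 8 inputs and bits 2 and 5 active, only output 2 is asserted, since bit 2 has the higher priority and its block's carry-out disables the second block.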
Martin,

Thanks for the detailed response. OK, we're definitely in the home stretch on this one. To summarize...

>> I'm assuming that the master side address and command signals enter the
>> 'Simpcon' bus and the 'Avalon' bus on the same clock cycle.

> This assumption is true. Address and command (+write data) are
> issued in the same cycle - no magic there.

So Avalon and SimpCon are both leaving the starting blocks at the same time....no false starts from the starting gun.

>> Given that assumption though, it's not clear to me why the address and
>> command could not be designed to also end up at the actual memory
>> device on the same clock cycle.

I don't think your response here hit my point. I wasn't questioning on which cycle the address/command/write data actually got to the SRAM, just that I didn't see any reason why the Avalon or SimpCon version would arrive on different clock cycles. Given the later responses from you though, I think that this is true....we'll get to that.

>> Given that address and command end up at the memory device on the same
>> clock cycle whether SimpCon or Avalon, the resulting read data would
>> then be valid and returned to the SimpCon/Avalon memory interface logic
>> on the same clock cycle.

> In SimpCon it will definitely arrive one cycle later. With Avalon
> (and the generated memory interface) I 'assume' that there is also
> one cycle latency - I read this from the tco values of the output
> pins in the Quartus timing analyzer report. For the SRAM interface I
> did in VHDL I explicitly added registers at the address/rd/wr/data
> output. I don't know if the switch fabric adds another cycle.
> Probably not, if you do not check the pipelined checkbox in the SOPC
> Builder.

Again, when I was saying 'the same clock cycle' I'm referring to clock cycle differences between Avalon and SimpCon.
In other words, if the SimpCon/Avalon bus cycle started on clock cycle 0, then when we start talking about the data from the SRAM arriving back at the input to the FPGA, with both designs it happens on clock cycle 'N'. For the relative comparison between the two busses, I don't much care what 'N' is (although it appears to be either '1' or '2'), just that 'N' is the same for both designs. Again, I *think* you might be agreeing that this is true here, but coming up is a more definitive agreement.

By the way, no, Avalon does not add any clock cycle latency in the fabric. It is basically just a combinatorial logic router as it relates to moving data around.

>> Given all of that, it's not clear to me why the actual returned data
>> would show up on the SimpCon bus ahead of Avalon or how it would be any
>> slower getting back to the SimpCon or Avalon master. Again, this might
>> be where my hangup is but if my assumptions have been correct up to
>> this paragraph then I think the real issue is not here but in the next
>> paragraph.

> Completely agree. The read data should arrive in the same cycle from
> Avalon or SimpCon to the master.

And this is a key point. So regardless of the implementation (SimpCon or Avalon), the JOP master starts the command at the same time for both, and the actual data arrives back at the JOP master at the same time. So the race through the data path is identical....(whew!), now on to the differences.

> Now that's the point where this
> bsy_cnt comes into play. In my master (JOP) I can take advantage of
> the early knowledge when data will arrive. I can restart my waiting
> pipeline earlier with this information. This is probably the main
> performance difference.
To continue the race analogy...So in some sense, even though the race through the data path ends in a tie, the advantage you feel you have with SimpCon is that the JOP master is endowed with the knowledge of when that race is going to end, by virtue of this bsy_cnt signal, and with Avalon you think you don't have this a priori knowledge.

So to the specifics now...I'm (mis)interpreting this to mean that if 'somehow' Avalon could give JOP the knowledge of when 'readdatavalid' is going to be asserted one clock cycle before it actually is, then JOP on Avalon 'should' be able to match JOP on SimpCon in performance, is that correct? (Again, this is a key point; if this assumption is not correct, the following paragraphs will be irrelevant.)

So under the assumption that the key problem to solve is to somehow enable the Avalon JOP master with the knowledge of when 'readdatavalid' is going to be asserted, one clock cycle before it actually is, I put on my Avalon Mr. Wizard hat and say....well, gee, for an Avalon connection between a master and slave that are both latency aware (i.e. they implement 'readdatavalid'), the Avalon specification requires that the 'waitrequest' output be asserted at least one clock cycle prior to 'readdatavalid'. It can be more than one and it can vary (what Avalon calls 'variable latency'), but it does have to be at least one clock cycle. Since the Avalon slave design is under your design control, you could design it to act just this way, to assert 'readdatavalid' one clock cycle after dropping 'waitrequest'. So now, I have my 'early readdatavalid' signal.

Now inside the JOP master, currently you have some sort of signal that I'll call 'start_the_pipeline' which is currently based on this busy_cnt hitting a particular count. 'start_the_pipeline' happens to fire one clock cycle prior to the data from the SRAM actually arriving back at JOP (from the previously stated and possibly incorrect assumption).
My Avalon equivalent cheat to the SimpCon cheating (having a priori knowledge about when the race completes) is simply the following:

start_the_pipeline <= Jop_Master_Read and not(JOP_Master_Wait_Request);

To reiterate, this JOP master side equation is working under the assumption that the Avalon slave component that interfaces to the actual SRAM is designed to assert its readdatavalid output one clock cycle after dropping its waitrequest output. So in some sense now I've endowed the Avalon JOP with the same sort of a priori knowledge of when the data is available that the SimpCon implementation is getting.

And here is another point where I think we need to stop and flat out agree or not agree that:

- My stated assumption holds, that if Avalon were to 'somehow' provide JOP an 'early readdatavalid' signal that was one clock earlier than 'readdatavalid', then the two JOP implementations should have the same performance.

- The implementation of the Avalon slave component and the timing of waitrequest and readdatavalid is entirely doable.

- Given the a priori knowledge that the Avalon slave is now 'known' to have one clock cycle latency (the Avalon definition of latency measuring clock cycle delay from waitrequest to readdatavalid), the equation for start_the_pipeline is doable and correct and will allow JOP on Avalon to get started on whatever it needs to get started on, on the exact same clock cycle as JOP on SimpCon.

Assuming that we agree on those three points, then I think it is safe to say that I didn't really do any worse 'cheating' on Avalon than you did with SimpCon. I'm using standard Avalon signals to accomplish the interface to JOP; I've smartened up the master side to have it 'know' about the latency in just the same way that SimpCon knows about it.

I'm also guessing that there might not even need to be any sort of cheating on Avalon either. By that I mean, what if slower RAM were used and an extra clock cycle or two were needed?
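[Editor's note] The timing relationship being argued here can be sketched as a cycle-by-cycle model. This is a hedged Python illustration of the discussion above; the signal names, the `simulate_read` helper, and the fixed one-cycle gap between dropping waitrequest and asserting readdatavalid are assumptions taken from this thread, not from the Avalon specification itself:

```python
def simulate_read(wait_cycles):
    """Model one read where the slave holds waitrequest for
    `wait_cycles` cycles and asserts readdatavalid exactly one cycle
    after dropping it. Returns the cycle numbers at which the
    hypothetical start_the_pipeline signal and readdatavalid fire."""
    read = True                          # master asserts read at cycle 0
    start_cycle = valid_cycle = None
    for cycle in range(wait_cycles + 2):
        waitrequest = cycle < wait_cycles
        # master side: start_the_pipeline <= read and not waitrequest
        if read and not waitrequest and start_cycle is None:
            start_cycle = cycle
        # slave side: readdatavalid one cycle after waitrequest drops
        if cycle == wait_cycles + 1:
            valid_cycle = cycle
    return start_cycle, valid_cycle

# start_the_pipeline always leads readdatavalid by one cycle,
# which is the same "early knowledge" SimpCon's rdy_cnt provides.
for w in range(4):
    start, valid = simulate_read(w)
    assert valid - start == 1
```

Under these assumptions the master gets its one-cycle head start regardless of how many wait states the slave inserts, which is the crux of the "no worse cheating" claim.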
In the SimpCon design you would need to update the master side logic to tell it about the correct new latency. On the Avalon implementation the slave side 'could' now be asserting readdatavalid 2 or 3 clock cycles after dropping waitrequest (which is entirely permissible and doable), and the magic 'start_the_pipeline' signal could now give JOP an extra clock cycle or two head start on readdatavalid. Whether that works or breaks JOP is something only you would know, but it might be worth pondering a bit. In fact, if it does break JOP, is there some issue with the JOP design? What is magic about getting a one clock cycle head start versus two or three? I'm not really expecting an answer here; it might just be that's the way it is, but I was just playing devil's advocate and seeing if there is some inherent reason why JOP couldn't work with even earlier flags.

Anyway, if it doesn't break JOP, then really the Avalon master side doesn't need a priori knowledge of this *one* clock cycle delay at all; it could be anything. If however it would break JOP to have 'start_the_pipeline' come 2 or 3 clock cycles before readdatavalid, then this simply means that the new Avalon slave interface to the SRAM would have to be redesigned to maintain the one clock cycle Avalon latency between the waitrequest and readdatavalid outputs, but nothing on the master side would need changing. So with SimpCon, the design change required to accommodate the now slower SRAMs would be made in the SimpCon master via the busy_cnt signal; with the Avalon implementation the design change would be made in the slave interface to the SRAM. Either one should be about the same amount of work, I would think.

> As I see it, this can be enhanced in the same way I did the little
> Avalon specification violation on the master side. Use a MUX to
> deliver the data from the input register in the first cycle and
> switch to the 'hold' register for the other cycles.
> Should change
> the interface for a fairer comparison.

I agree, without the mux you've hindered the Avalon implementation. I guess also, to be fair, one would need to look at resource usage of the two designs as well and see how the two compare given supposedly 'equivalent' implementations. Maybe SimpCon has some advantage in that regard. Maybe the muxing costs logic, or maybe it all synthesizes to exactly the same thing. Avalon, being a mostly combinatorial fabric, reduces quite well, but to be honest, the way they keep track of pending reads to slaves that have latency is pretty poor. I don't think it will show up in this case because there just can't be a lot of reads pending, but I had a design where I provided a bridge to a 33 MHz PCI bus which went into another processor which then wrote or fetched data from its DRAM then provided it back to PCI, and there I needed to provide for a fairly hefty number of pending reads (~64 or so I believe) to keep things moving along. It's also where I learned that you really don't want to go overboard with the use of readdatavalid either. Use it where you need the highest performance only; otherwise use waitrequest only. Otherwise the code that SOPC Builder generates has terrible clock cycle performance for general busses with lots of slaves.

> Because rdy_cnt has a different meaning than waitrequest. It is more
> like an early datavalid. Dropping waitrequest does not help with my
> pipeline restart thing.

True, but I believe I've addressed the root of what those differences are.

>>> Enjoy this discussion :-)
>>> Martin
>>
>> Immensely. And I think I'll finally get the light bulb turned on in my
>> head after your reply.

> BTW: As I'm also academic I should/have to publish papers. SimpCon
> is on my list for months to be published - and now it seems to be
> the right time. I will write a draft of the paper in the next few
> days.
> If you are interested I'll post a link to it in this thread
> and your comments are very welcome.

OK.

KJ

Article: 107185
> > Let's say the address/command phase is per definition one cycle.
>
> That definition frees the master to do whatever it wants in the next
> cycle. For another request to the same slave it has to watch for the
> rdy_cnt in SimpCon. However, you can design a switch fabric with
> SimpCon where it is legal to issue a command to a different slave in
> the next cycle without attention to the first slave. You can just
> ignore the first slaves output until you want to use it.

In Avalon this would happen as well. By your definition, the Avalon slave (if it needed more than one clock cycle to totally complete the operation) would have to store away the address and command. It would not assert waitrequest on the first access. If a subsequent access to that slave occurred while the first was still going on, it would then assert waitrequest, but accesses to other slaves would not be hindered. The Avalon approach does not put this sort of stuff in the switch fabric but inside the slave design itself. In fact, the slave could queue up as many commands as needed (i.e. not just one), but I don't get the impression that SimpCon would allow this because there is one rdy_cnt per slave (I'm guessing).

>> The Avalon fabric 'almost' passes the waitrequest signal right back to
>> the master device, the only change being that the Avalon logic basically
>> gates the slave's waitrequest output with the slave's chipselect input
>> (which the Avalon fabric creates) to form the master's waitrequest input
>> (assuming a simple single master/slave connection for simplicity here).
>> Per Avalon,

> I'm repeating myself ;-) That's the point I don't like in Avalon,
> Wishbone, OPB,...: You have a combinatorial path from address
> register - decoding - slave decision - master decision (to hold
> address/command or not). With a few slaves this will not be an
> issue. With more slaves or a more complicated interconnect (multiple
> master) this can be your critical path.
You're right; in fact it most likely will be the critical path. Does SimpCon support different delays from different slaves? If not, and 'everyone' is required to have the same number of wait states, then I can see where SimpCon would have a performance advantage in terms of final clock speed on the FPGA, the tradeoff being that...everyone MUST have the same number of wait states. Whether that is a good or bad tradeoff is a design decision specific to a particular design, so in that regard it's good to have both SimpCon (as I limitedly understand it) and Avalon. If SimpCon does allow for different slaves to have different delays, then I don't see how SimpCon would be any better, since there would still need to be address decoding done to figure out what the rdy_cnt needs to count to and such. Whether that code lives in the master side logic or slave side logic is irrelevant to the synthesis engine.

>> how it appears to me, which is why I asked him to walk me through the

> As described in the other posting:

Yep, got that posting for the blow by blow description.

KJ

Article: 107186
backhus wrote:
> > In my original post I had no intention to reach a common consensus. I
> > wanted to see practical code examples which demonstrate the various
> > techniques and discuss their relative merits and disadvantages.
> >
> > Kind regards,
> > Eli
>
> Hi Eli,
> Ok, that's something different.
> Earns some contribution from my side :-)
>
> My example uses 3 processes.
> The first one is the simple state register.
> The second is the combinatorial branch selection.
> The third creates the registered outputs.
>
> Recognize that the third process uses NextState for the case selection.
> Advantage: Outputs change exactly at the same time as the states do.
> Disadvantage: The branch logic is connected to the output logic, causing
> longer delays.
> Workaround: If a one clock delay of the outputs doesn't matter,
> CurrentState can be used instead.
>
> The only critical part I see is the second process. Because it's
> combinatorial, some synthesis tools might generate latches here when
> the designer writes no proper code. But we all should know how to write
> latch free code, don't we? ;-)
>
> The structure is very regular, which makes it a useful template for
> autogenerated code.
> Have a nice synthesis
> Eilert
>
> ENTITY Example_Regout_FSM IS
>    PORT (Clock : IN  STD_LOGIC;
>          Reset : IN  STD_LOGIC;
>          A     : IN  STD_LOGIC;
>          B     : IN  STD_LOGIC;
>          Y     : OUT STD_LOGIC;
>          Z     : OUT STD_LOGIC);
> END Example_Regout_FSM;
>
> ARCHITECTURE RTL_3_Process_Model_undelayed OF Example_Regout_FSM IS
>    TYPE State_type IS (Start, Middle, Stop);
>    SIGNAL CurrentState : State_Type;
>    SIGNAL NextState    : State_Type;
> BEGIN
>
>    FSM_sync : PROCESS(Clock, Reset)
>    BEGIN  -- CurrentState register
>       IF Reset = '1' THEN
>          CurrentState <= Start;
>       ELSIF Clock'EVENT AND Clock = '1' THEN
>          CurrentState <= NextState;
>       END IF;
>    END PROCESS FSM_sync;
>
>    FSM_comb : PROCESS(A, B, CurrentState)
>    BEGIN  -- NextState logic
>       NextState <= CurrentState;  -- default: hold state (avoids latches)
>       CASE CurrentState IS
>          WHEN Start =>
>             IF (A NOR B) = '1' THEN
>                NextState <= Middle;
>             END IF;
>          WHEN Middle =>
>             IF (A AND B) = '1' THEN
>                NextState <= Stop;
>             END IF;
>          WHEN Stop =>
>             IF (A XOR B) = '1' THEN
>                NextState <= Start;
>             END IF;
>          WHEN OTHERS => NextState <= Start;
>       END CASE;
>    END PROCESS FSM_comb;
>
>    FSM_regout : PROCESS(Clock, Reset)
>    BEGIN  -- Output logic
>       IF Reset = '1' THEN
>          Y <= '0';
>          Z <= '0';
>       ELSIF Clock'EVENT AND Clock = '1' THEN
>          Y <= '0';  -- Default value assignments
>          Z <= '0';
>          CASE NextState IS
>             WHEN Start  => NULL;
>             WHEN Middle => Y <= '1';
>                            Z <= '1';
>             WHEN Stop   => Z <= '1';
>             WHEN OTHERS => NULL;
>          END CASE;
>       END IF;
>    END PROCESS FSM_regout;
> END RTL_3_Process_Model_undelayed;

Hi, Eilert,

I generally use this style but with a different output segment.
I have three output logic templates:

Template 1: vanilla, unbuffered output

-- FSM with unbuffered output
-- Can be used for Mealy/Moore output
-- (include input in sensitivity list for Mealy)
FSM_unbuf_out : PROCESS(CurrentState)
BEGIN
   Y <= '0';  -- Default value assignments
   Z <= '0';
   CASE CurrentState IS
      WHEN Start  => NULL;
      WHEN Middle => Y <= '1';
                     Z <= '1';
      WHEN Stop   => Z <= '1';
      WHEN OTHERS => NULL;
   END CASE;
END PROCESS FSM_unbuf_out;

Template 2: add buffer for output (there are 4 processes now ;-)

-- FSM with buffered output
-- there is a 1-clock delay
-- can be used for Mealy/Moore output
FSM_unbuf_out : PROCESS(CurrentState)
BEGIN
   Y_tmp <= '0';  -- Default value assignments
   Z_tmp <= '0';
   CASE CurrentState IS
      WHEN Start  => NULL;
      WHEN Middle => Y_tmp <= '1';
                     Z_tmp <= '1';
      WHEN Stop   => Z_tmp <= '1';
      WHEN OTHERS => NULL;
   END CASE;
END PROCESS FSM_unbuf_out;

-- buffer for output signal
FSM_out_buf : PROCESS(Clock, Reset)
BEGIN  -- Output logic
   IF Reset = '1' THEN
      Y <= '0';
      Z <= '0';
   ELSIF Clock'EVENT AND Clock = '1' THEN
      Y <= Y_tmp;
      Z <= Z_tmp;
   END IF;
END PROCESS FSM_out_buf;

Template 3: buffer with "look-ahead" output logic

-- FSM with look-ahead buffered output
-- no 1-clock delay
-- can be used for Moore output only
FSM_unbuf_out : PROCESS(NextState)
BEGIN
   Y_tmp <= '0';  -- Default value assignments
   Z_tmp <= '0';
   CASE NextState IS
      WHEN Start  => NULL;
      WHEN Middle => Y_tmp <= '1';
                     Z_tmp <= '1';
      WHEN Stop   => Z_tmp <= '1';
      WHEN OTHERS => NULL;
   END CASE;
END PROCESS FSM_unbuf_out;

-- buffer for output signal
-- same as template 2
FSM_out_buf : PROCESS(Clock, Reset)
.
.
.

The code is really lengthy. However, as you indicated earlier, its structure is regular, and can serve as a template or even be autogenerated. I developed the template based on "http://academic.csuohio.edu/chu_p/rtl/chu_rtL_book/rtl_chap10_fsm.pdf". It is a very good article on FSMs (or a very bad one, if this is not your coding style).
Mike G.

Article: 107187
Hello group,

I would appreciate help in capturing the DMA done interrupt. If anybody could point me to a working example or give any pointers on what I am doing wrong, I would be very grateful.

So, what am I doing? I am using the Xilinx EDK 8.1 tool and have generated a system with a custom peripheral. The custom peripheral was created using the "Create or Import Peripheral" wizard and includes DMA, FIFOs, and user logic interrupt support. However, the interrupt service routine is not called when a DMA transfer completes. Currently I need to poll the interrupt status register (ISR) to see if a DMA transfer has finished.

The user logic stub generated by the wizard generates an interrupt approx. every 10s @ 100 MHz, and when this interrupt occurs the interrupt service routine is called and I see a message on my UART. So I know the service routine is hooked up correctly, and that the interrupt enable register (IER) and the global interrupt enable register (GIER) are set correctly.

I made sure that INCLUDE_DEV_ISC is set to 1 in the VHDL code to make sure the device interrupts are included. And sure enough, when I try to transfer data from an empty FIFO, the service routine is also called because of a transaction error interrupt. So I know the interrupts from the device are enabled and working correctly.

After that I checked that the device interrupt enable register (DIER) is set to enable all interrupts from the device (not only the IPIR bit for the user logic interrupts and the TERR bit for the transaction error, but all of them). Furthermore I enabled the interrupts in the DMA0 and DMA1 interrupt enable registers (DMA0_IER and DMA1_IER). However, the only interrupts that cause a call to my service routine are the timer interrupt from the user logic and the transaction error. I don't know why the service routine is not called when a transfer completes.
I tried finding some working examples, but all the examples I could find so far use polling on the ISR register to wait for the transfer to complete. Any pointers to get me in the right direction would be appreciated.

Thanks,
Martijn

Article: 107188
> Let me try to verbally sketch this out for you for 4 bits:
> Imagine two colums of luts. There is one lut on the left for 4 inputs
> and 4 on the right one for each output. The left LUT is a 4 input OR
> and all the LUTs on the right are AND gates but with bubbles on some
> inputs. The top right LUT is just an AND of top first input bit and
> the output of left LUT. The second LUT on the right has first input
> inverted, second bit and output of left LUT. The third LUT has first
> two inputs inverted, third input, and output of left LUT. The last LUT
> has the first 3 inputs inverted and output of left LUT. This gives you
> 4 input, 4 output priority encoder with 5 LUTs. Now you have to be
> able to cascade this ie you have to disable the output of left (OR)
> LUT by a higher up left LUT. For this purpose you can use the carry
> chain.

Thanks! Exactly what I was looking for ;)

Sylvain

Article: 107189
Martin,

A bit of an amendment to my previous post starting...

KJ wrote:
> Martin,
>
> Thanks for the detailed response. OK, we're definitely in the home stretch
> on this one.

After pondering a bit more, I believe the Avalon slave component to the SRAM should NOT have a one clock cycle delay between waitrequest de-asserted and readdatavalid asserted, since that obviously would stall the Avalon master (JOP) needlessly. Instead the slave component should simply assert waitrequest when a request comes in while it is still busy processing an earlier one. Something along the lines of...

process(Clock)
begin
   if rising_edge(Clock) then
      if (Reset = '1') or (Count = MAX_COUNT) then
         Wait_Request <= '0';
      elsif (Chip_Select = '1') then
         Wait_Request <= '1';
      end if;
   end if;
end process;

where 'Count' and MAX_COUNT are used to count however many cycles it takes for the SRAM data to come back or be written. If the SRAM only needs one clock cycle then the term "(Count = MAX_COUNT)" could be replaced with simply "Wait_Request = '1'".

So now back on the Avalon master side, I can still count on the Avalon waitrequest to precede readdatavalid, but now I've removed the guarantee that the slave will make the delay between the two exactly one clock cycle. To compensate, I would still key off when the JOP Avalon master read signal is asserted and waitrequest is not asserted. In other words the basic logic of my 'start_the_pipeline' signal is OK, but depending on what the actual latency is for the design, maybe it needs to be delayed by a clock cycle or so. In any case, that signal will still provide an 'early' form of the Avalon readdatavalid signal, and I think all of my points in that previous post would still apply. Hopefully you've read this post before you got too far into typing a reply to that post.

After yet more pondering on whether this is 'cheating' on the Avalon side or not, I think perhaps it's not.
The 'questionable' logic is in the generation of the 'start_the_pipeline' signal that keys off of waitrequest and uses it to produce this 'early data valid' signal. But this logic is simply part of what I would consider to be a SimpCon to Avalon bridge. As such, that bridge is privy to whatever signals and a priori knowledge the SimpCon bus specification provides, as well as whatever signals and a priori knowledge the Avalon bus specification provides, and has the task of mating the two. If SimpCon needs an 'early data valid' signal as part of the interface, then it also needs to pony up whatever info the SimpCon master has in regards to knowing ahead of time when that data will be valid... in other words, it would need to know the same thing that you used to generate your rdy_cnt (or busy_cnt, or whatever it was called). So I've basically concluded that while it might appear on the surface to be a 'cheat' to use the Avalon signals as I have, you can only say that if you're looking strictly at Avalon alone. But since the function being implemented is a bridge between SimpCon and Avalon, use of SimpCon information to implement that function is fair game and not a 'cheat'. KJArticle: 107190
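KJ's early-data-valid idea can be sketched in a few lines of VHDL. This is a hypothetical illustration (entity and signal names are mine, not taken from JOP or any Avalon IP): on Avalon, a read asserted while waitrequest is low is known to have been accepted, so a bridge can treat that cycle as the start of a fixed-latency countdown to valid data.

```vhdl
-- Hypothetical sketch of the 'start_the_pipeline' idea discussed above.
-- All names are illustrative, not taken from the thread's actual code.
library ieee;
use ieee.std_logic_1164.all;

entity early_valid is
  port (
    av_read        : in  std_logic;  -- Avalon master 'read'
    av_waitrequest : in  std_logic;  -- from the Avalon slave
    start_pipeline : out std_logic   -- early form of readdatavalid
  );
end entity;

architecture rtl of early_valid is
begin
  -- A read cycle that is not stalled by waitrequest has been accepted;
  -- data will arrive a fixed (design-dependent) number of cycles later,
  -- so this strobe can seed a SimpCon-style rdy_cnt countdown.
  start_pipeline <= av_read and not av_waitrequest;
end architecture;
```

If the slave's latency is more than one cycle, this strobe would simply be delayed through a small shift register, as KJ notes.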
Hi, has anyone ever had something like this: "ERROR:Xst:800 - "C:/MyFolder/control_unit_top.vhd" line 317: Multi-source on Integers in Concurrent Assignment."? It refers to a process line: "control_make_reply: process(ctrl_top_clock)" I'll provide further details if needed, but for this first post I was just wondering if someone has ever seen this when working with ISE 7.1. Thanks, MarcoArticle: 107191
On a sunny day (24 Aug 2006 23:20:57 -0700) it happened fpga_toys@yahoo.com wrote in <1156486857.335345.201150@i3g2000cwc.googlegroups.com>: >There is a reason the software world doesn't allow >software engineers to write production programs in native machine >language ones and zeros, or even use high level assembly language in >most cases .... and increasingly not even low level C code. There are many, many cases where ASM on a microcontroller is to be preferred. Not only for code size, but also for speed, and _because of_ simplicity. For example PIC asm (Microchip) is so simple, and universal, and there is so much library stuff available, that it is, at least for me, the _only_ choice for simple embedded projects. No way 'C'. Yes, hardware engineers sometimes add great functionality with a few lines of ASM or maybe Verilog, cool! I guess it is a matter of learning; I started programming micros with switches and 0010 1000 etc., watching the clock cycles.. you know. Teaches you not to make any mistakes, as re-programming an EPROM took 15 to 20 minutes of erase time first... Being _on_ the hardware (registers) omits the question of 'what did that compiler do', and in many cases gives you more flexibility. C already puts up some barrier; special versions of C for each micro support the special functions in these micros.... But from a 'newcomer' POV perhaps coding part of a project in a higher level language... but hey... In spite of what I just wrote .. anyways, why does everybody all of a sudden want Linux in an FPGA? So they can then write in C? Have it run slower than on a cheap mobo? OK, OTOH I appreciate the efforts toward higher level programming, as long as the one who uses it also knows the lower level. That sort of defuses your argument that 'engineers need less training' or something like that; the thing will have to interface to the outside world too, C or not. 
>The time >for allowing engineers to design hardware at the ones and zeros binary >level is passing too as the tools like System C emerge and produce >reasonable synthesis with co-design. Much more important than _how_ it is coded is the design, the idea, behind it. Yes, one sort of paint may give better results than another when painting the house, but wrong colors will be wrong with all kinds of paint. If it becomes a fine line, like art, then it is the painter, not the paint.Article: 107192
zcsizmadia@gmail.com wrote: > I've created a patch for Impact so it supports Digilent USB. This patch > could be modified to become some kind of SDK for different programmer > cables which are not supported by Impact. > > So here is my survey. What kind of programmer cables would you like to > use with Impact? > > Regards, > > Zoltan > The support of the Amontec JTAGkey would be very nice (based on the FTDI FT2232 - FT2232L to be green). The Amontec JTAGkey is one of the only USB JTAG pods with a very large I/O voltage range (all I/Os can drive at 24 mA)! Go to http://www.amontec.com/jtagkey.shtml Note: We are working on a new version, cheaper but without ESD-EMI overvoltage ... protection, and for 5V to 2.8V only. Coming in the next two weeks. We are working on our own .dll for controlling the JTAGkey, integrating the JTAG layer. Let me know if you want to receive an Amontec JTAGkey as a sample (we will ship it to you without any charge), as this can be of help for your integration. If interested, send me an email to laurent DOT gauch @ amontec DOT com Regards, Laurent www.amontec.comArticle: 107193
Sylvain Munaut <SomeOne@SomeDomain.com> wrote: > > Let me try to verbally sketch this out for you for 4 bits: > > Imagine two columns of LUTs. There is one LUT on the left for the 4 inputs > > and 4 on the right, one for each output. The left LUT is a 4-input OR > > and all the LUTs on the right are AND gates, but with bubbles on some > > inputs. The top right LUT is just an AND of the first input bit and > > the output of the left LUT. The second LUT on the right has the first input > > inverted, the second bit, and the output of the left LUT. The third LUT has the first > > two inputs inverted, the third input, and the output of the left LUT. The last LUT > > has the first 3 inputs inverted and the output of the left LUT. This gives you > > a 4-input, 4-output priority encoder with 5 LUTs. Now you have to be > > able to cascade this, i.e. you have to disable the output of the left (OR) > > LUT by a higher-up left LUT. For this purpose you can use the carry > > chain. > > Thanks ! Exactly what I was looking for ;) > Well, actually I'm having trouble with the chaining ... how do you implement what you describe (masking the lower-priority OR LUTs) when a higher one is active? I have the slice schematic before me and I can't figure that out ... (damn, am I slow today ...) I see that : - I could chain two stages, but no more than two. - I could use a big OR of all the upper-priority groups with the carry chain and then use it to mask the left OR. But then for 3 groups, I would need : - Upper priority group : - Just 1 LUT4 for the OR + 4 LUT4 for encoding - Middle priority group : - 1 LUT4 for the OR + 1 LUT4 for the 'masking' + 4 LUT4 for encoding - Low priority group - 1 LUT4 for the OR + 2 LUT4 for the masking + 4 LUT4 for encoding So that would be 18 LUT4s. And 5*3 + 1 = 16, so there must be something better ;) SylvainArticle: 107194
Antti, The cableserver method sounds much better. In that case the full Digilent USB could be utilized. Next week I'll look into it. Uwe, Yes and no. I use windrvr.h to check which IOCTLs are important to overwrite. The actual LPT I/O communication is very simple. You can see the location of the port and data in the packets. BTW I use strace to log low-level API activity by impact. http://www.bindview.com/Services/RAZOR/Utilities/Windows/strace_readme.cfm ZoltanArticle: 107195
Hi Louis, Did you manage to fix your problem? If so I'd be interested to know the solution, as I'm struggling with the UC2 myself... Anyone else using the UltraController II? I'd be interested to talk with someone who managed to get a system with UC2 + SystemAce running. Patrick louis lin wrote: > Hi Patrick, > > Thank you for your help. > Actually, I patched Answer #23011 before going through the flow > in EDK 8.1i. > I'll try to use uc2.vhd instead of uc2.ngc. > > Regards, > louis > > > "Patrick Dubois" <prdubois@gmail.com> > :1155922197.062518.72270@75g2000cwc.googlegroups.com... > > Hi Louis, > > > > I'm also currently working with the UC2, but on a Virtex-II Pro. > > > > First of all, make sure to replace some files as indicated in Answer > > Record #23011: > > http://tinyurl.com/e7c2g > > > > I also just solved a bug with the UC2 design on the Virtex-II Pro (I > > spent a week on it). It turns out that the file uc2.ngc is buggy. There > > is no error given by the Xilinx tools however (which made it quite hard to > > debug), but the resulting bit file cannot be programmed into the FPGA > > (programming fails in Impact). The solution is to not use uc2.ngc at > > all and instead use uc2.vhd, also provided in the reference design. I > > don't know if the reference design for the V4 has the same issue > > though... > > > > Best of luck, > > > > Patrick Dubois > > > > > > > > louis lin wrote: > > > Has anyone tried the Ultracontroller PROM solution of V4 in EDK 8.1i? > > > > > > The reference example is built with EDK 7.1i. > > > I went through the flow again in ISE 8.1 SP3 + EDK 8.1i SP2. > > > However, the resultant MCS can't program the XC4VFX12 properly > > > (the DONE LED didn't go high). > > > > > > After I got the reference, I only added the -nostartfiles compiler > option to > > > fix > > > the multiple _start problem. > > > > > > Regards, > > > louis > >Article: 107196
Peter Alfke (alfke@sbcglobal.net) wrote: <snip> : Higher performance requires radical innovation and real cleverness : these days. : Peter Alfke Such as this? http://www.tip.csiro.au/ISEC2003/talks/OWe2.pdf JPL and Northrop Grumman built a 5k-gate 8-bit CPU running at 20 GHz by using superconducting logic on a chip. It needs helium-cycle cryogenics to reach 4.5 K, but on the other hand it doesn't generate much heat, being superconducting... I'd have thought gate arrays would make an excellent tool for investigating the technology... cdsArticle: 107197
Antti wrote: > GaLaKtIkUs schrieb: > > > In the Virtex-4 user guide (ug070.pdf p.365 table 8-4) it is clearly > > indicated that for INTERFACE_TYPE=NETWORKING and DATA_RATE=SDR the > > latency should be 2 CLKDIV clock periods. > > I instantiated an ISERDES of DATA_WIDTH=6 but I see that valid output > > appears on the next CLKDIV rising edge. > > Any explanations? > > > > Thanks in advance! > > advice: don't believe the simulator, it's not always correct. > place the iserdes and chipscope ILA into a dummy toplevel, load some FPGA > and look what happens in real silicon. > > Antti Unfortunately the tests on the board using Chipscope gave the same results as in simulation. I looked for information on this issue on Xilinx's site but I didn't find anything. So I assume the issue is that I didn't understand table 8-4 in the Virtex-4 User Guide. If you can help, you're welcome (I can send you the simulation/implementation files I used). I'm going to run the same simulations/on-board tests as described a few posts higher, but for word lengths > 6, i.e. where 2 ISERDES are needed. CheersArticle: 107198
jacko wrote: > pin compatibility is just customer support, how about a 1 pin high > implies a self program from a small hardwired rom, which gets enough of > the chip off the ground, to work as a programmer for itself and others. > > We have had that since the beginning, 20 years ago. It is called "Master Mode Configuration". Peter Alfke, XilinxArticle: 107199
Totally_Lost wrote: > Austin Lesea wrote: > >>There is no such thing as "over-clocking" a FPGA > > > Since Austin is technically and English-language challenged, here is an > aid to decrypting this bull shit claim that Austin proudly would object > that it's only ammonium nitrate .... hehehehe > > The Free On-line Dictionary of Computing (27 SEP 03) [foldoc] > > overclocking > > <hardware> Any adjustments made to computer hardware (or > software) to make its CPU run at a higher clock frequency > than intended by the original manufacturers. Typically this > involves replacing the crystal in the clock generation > circuitry with a higher frequency one or changing jumper > settings or software configuration. > > If the clock frequency is increased too far, eventually some > component in the system will not be able to cope and the > system will stop working. This failure may be continuous (the > system never works at the higher frequency) or intermittent > (it fails more often but works some of the time) or, in the > worst case, irreversible (a component is damaged by > overheating). Overclocking may necessitate improved cooling > to maintain the same level of reliability. > > (1999-09-12) > Mr Lost, >> So ... you are claiming any valid design will run in any Xilinx FPGA >> at max clock rate? I'm afraid you don't know what you are talking about as far as FPGAs go. The clock rate for an FPGA design depends heavily on the design. It is not like a microprocessor, where you have a fixed hardware design that has been characterized to guarantee running at a specific clock rate. Instead, it is up to the FPGA user to perform a timing analysis on his design to determine what the maximum clock rate for that design is. The max clock rate depends on the logic and routing delays for that design. As part of the due diligence for the design, the designer needs to perform a timing analysis, which in turn gives a minimum clock cycle time for which the design is guaranteed to work. 
Overclocking then only makes sense in the context of that design. If you clock it faster than the minimum cycle time found in the timing analysis, then you are overclocking the design. This is usually considered poor form for hardware design, but it can certainly be done if you are aware of the risks. That said, in laboratory conditions, FPGA designs can usually be overclocked by 10 or 15% beyond the max clock frequency for that design as found in the timing analysis. The maximum toggle rate in the data sheets only tells you what the flip-flops in the fabric are capable of doing reliably over the temperature range. That doesn't take into account the propagation delays for the routing or combinatorial logic surrounding those flip-flops.... those parameters, which to the user are far more important than the max toggle rate (that number is mainly for the benefit of export restrictions), weigh heavily on the specification for the user's design. Attempting to use the max toggle rate of the flip-flops to define overclocking would be like trying to define the overclocking of a CPU in terms of the switching time of a single transistor on the die rather than the aggregate that comprises the useful circuit. The difference that is probably confusing you is that the CPU is characterized as a completed circuit design, similar to doing the timing analysis on a placed-and-routed FPGA design. Overclocking, then, is clocking the design at a clock rate faster than the one the design was intended for. Overclocking does not make sense outside the context of a specific design.
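To make the distinction concrete: with ISE-era Xilinx tools, a designer states the intended clock rate as a PERIOD constraint in the UCF file, and static timing analysis then reports whether the placed-and-routed design meets it. A minimal sketch (net and timegroup names are illustrative):

```
NET "clk" TNM_NET = "clk_grp";
TIMESPEC "TS_clk" = PERIOD "clk_grp" 10 ns HIGH 50%;
```

If the timing report then shows a minimum achievable period of, say, 11 ns, running that design at 100 MHz is overclocking it, regardless of what the data sheet's flip-flop toggle rate says.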