Site Home Archive Home FAQ Home How to search the Archive How to Navigate the Archive
Compare FPGA features and resources
Threads starting:
Authors:A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Peter Alfke <peter@xilinx.com> writes:
> ...or you clock in the address on every FPGA clock, but use ALE being Low as a
> clock disable. That way you have the address already moved over to the FPGA
> clock domain.
> Clocking "new" data into a register that already contains the same data just
> means that nothing is going to happen in the register. :-)
>
> Peter Alfke, Xilinx Applications

Don't forget to synchronize ALE to your local clock first!

David

Article: 49151
On 1 Nov 2002 07:21:10 -0800, joefrese@hotmail.com (Joe Frese) wrote:

>> This methodology only works for small, simple designs.  The project
>> I'm working on at the moment has about 1/2 million lines of source.
>> Yes, we look over it carefully, but simulation is vitally important
>> for a successful end result.

I suppose that's true, especially for masked parts or in situations where
concurrent development doesn't allow a lot of test/integration time. It's
just that, most of the time, just localizing a problem makes the designer
look back at his design, slap himself upside the head, and see the bug and
fix it. If you aren't careful, reliance on simulation takes you down the
slippery slope of quick-and-sloppy design followed by N times as much
simulation and verification. For big ASICs, N is often in the region of 3
or so, but that makes sense when a mask set costs hundreds of kilobucks
and takes months to turn.

>The design in question consists of about 5k lines of code; there are
>no functional simulation errors, nor do any errors appear in the
>timing report.  The post-PAR simulation errors seem to be caused by
>setup and/or hold violations in logic that is synchronizing signals
>across two time domains, and as "nospam" pointed out, these are
>allowable and inherently unavoidable.
>
>The issue as I see it is the sheer volume of errors I get under
>simulation, simply because there are so many such points of
>synchronization--the FPGA must interface with three external time
>domains, and a handful of new internal domains were created for
>various reasons.  It makes it very difficult under simulation to
>distinguish these allowable/unavoidable timing errors from
>legitimately critical timing errors.  Obviously, in my next design,
>I'll minimize the number of time domains and the size of the
>interfaces between domains, in order to make this task more manageable
>. . . but is it worth ripping this finished design apart to get more
>acceptable simulation results?  Or given the size, is the methodology
>John suggested (careful code inspection) sufficient?  Thanks again for
>your input.
>
>Joe

Joe, if, as you say, the thing actually works, then you can reasonably
ignore the timing errors that are artifacts of crossing time domains, as
long as you are confident that your synchronization scheme is sound. Of
course, if you have a bazillion reported timing errors, it will be hard to
separate them... In a case like this, I would test a couple of units at
varying clock rates and blast them with heat guns and freeze spray to
verify timing margins. You ought to do this all the time, anyhow.

John

Article: 49152
On Fri, 01 Nov 2002 14:24:38 -0800, Peter Alfke <peter@xilinx.com> wrote:

>...or you clock in the address on every FPGA clock, but use ALE being Low as a
>clock disable. That way you have the address already moved over to the FPGA
>clock domain.
>Clocking "new" data into a register that already contains the same data just
>means that nothing is going to happen in the register. :-)
>
>Peter Alfke, Xilinx Applications
>===============

Why not use ALE to gate a transparent latch? That maximizes setup time for
whatever you plan to do with the address later.

John

Article: 49153
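As a sketch of the transparent-latch approach John suggests above (module
and signal names here are illustrative only, not from the original thread;
bus width is assumed to be the 8051's 8-bit multiplexed bus):

```verilog
// Transparent latch gated by ALE: while ALE is high the latch passes the
// bus through; the address is frozen on the falling edge of ALE, which
// maximizes setup time for whatever uses the address afterwards.
module ale_latch (
    input  wire       ale,     // address latch enable from the 8051
    input  wire [7:0] ad_bus,  // multiplexed address/data bus
    output reg  [7:0] addr     // captured address
);
    always @* begin
        if (ale)
            addr = ad_bus;     // transparent while ALE is high
        // no else branch: addr holds its value, so a latch is inferred
    end
endmodule
```

Note the latch output still needs care (as discussed elsewhere in this
thread) if it is then sampled in the FPGA's unrelated clock domain.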
I like the clock enable idea - that leaves virtually no requirement for
hold time.

The thing that hasn't been mentioned explicitly is how to know when the
data is valid. When the ALE goes low about the same time as the clocks,
Peter's approach results in valid data. But the way to know the data is
valid is to detect the falling edge of the ALE at a single register - a
lone decision point. In Verilog you can see the indecision in the code
below:

  reg rALE, ALEfell;
  always @(posedge clk) rALE    <= ALE;
  always @(posedge clk) ALEfell <= ~ALE & rALE;

If the ALE falls about the same time as the posedge clk, rALE may or may
not go low. ALEfell already knows rALE was high, so ALE may or may not be
seen as falling. If ALEfell decided that ALE did, indeed, fall but rALE
thought it hadn't, ALEfell ends up valid for two clock periods. Conversely,
if rALE caught the fall but ALEfell didn't at that hairy edge of the clock,
the edge detection would be missed altogether.

It's pretty "standard" to only use a registered version of ALE to detect
the falling edge, like in the following lines:

  reg rALE, rrALE, ALEfell;
  always @(posedge clk) rALE    <= ALE;
  always @(posedge clk) rrALE   <= rALE;
  always @(posedge clk) ALEfell <= ~rALE & rrALE;

There's no confusion, since rALE will always be stable for both rrALE and
ALEfell, though there's a delay of at least one clock - this could be
acceptable for your needs. What you do gain is complete coverage by the
timing analysis without hiccups.

A "quicker" way to get the same result is to know that there might be the
indecision I spoke of and feed the ALEfell status back into rALE:

  always @(posedge clk) rALE    <= ALE | rALE & ~ALEfell;
  always @(posedge clk) ALEfell <= ~ALE & rALE & ~ALEfell;

If the double-clock condition came up such that rALE was slow on the uptake
but ALEfell detected the fall, the second ALEfell is suppressed by the
first and rALE gets a valid value on that second posedge clk. If the
skip-condition showed up, rALE would want to fall on that first posedge clk
but waits another cycle for ALEfell to assert. In either case, when ALEfell
first asserts, a second ALEfell is suppressed and rALE is allowed to fall.

The "hiccup" I mentioned is that the ALEfell value will work fine for you
as long as you have margin in all the destinations fed by ALEfell. The
indecision on whether ALEfell is high or low lasts a shorter time these
days, as Peter Alfke will proudly point out. The "metastability time"
increases the clock-to-out time of the ALEfell result while the register
broods over its indecision but - thankfully - makes up its mind rather
quickly. If I interpret Peter's recent posts correctly (I have yet to read
his article on xilinx.com), the additional margin only needs to be 500 ps
or so in the latest generation part to guarantee nearly zero chance of the
metastability affecting your results.

Have fun with your design!

- John_H

"Peter Alfke" <peter@xilinx.com> wrote in message
news:3DC2FF26.253B47E4@xilinx.com...
> ...or you clock in the address on every FPGA clock, but use ALE being Low as a
> clock disable. That way you have the address already moved over to the FPGA
> clock domain.
> Clocking "new" data into a register that already contains the same data just
> means that nothing is going to happen in the register. :-)
>
> Peter Alfke, Xilinx Applications
> ===============
> Uwe Bonnes wrote:
>
> > Henri Faber <henri.faber@emdes.nl> wrote:
> > : I am working on an FPGA design which involves an interface to a 8051
> > : style uController. The clock of the uC is not available to me. The
> > : FPGA is running on an unrelated faster clock.
> > : I am trying to clock in address which is stable around the falling
> > : edge of ALE.
> > : The address setup time to ALE going low is long enough to guarantee
> > : that the
> > : address is clocked in correctly at least once before ALE goes low.
> > : What will
> > : happen when ALE falls at the same time that the flip-flops are being
> > : clocked?
> > : At the moment ALE falls the data on the D pin of the flip-flops is the
> > : same as
> > : the contents of the flip-flops.
> >
> > Why don't you clock the address latches with ALE going low? If you need the
> > address on the FPGA clock domain, take care for the clock domain
> > crossing. But mostly the address is needed to read out other registers or
> > write to them. Latch the registers to read out on negedge ALE and sample the
> > registers set with latches clocked from FPGA clock. I hope you get the
> > picture...
> >
> > Bye
> > --
> > Uwe Bonnes bon@elektron.ikp.physik.tu-darmstadt.de
> >
> > Institut fuer Kernphysik Schlossgartenstrasse 9 64289 Darmstadt
> > --------- Tel. 06151 162516 -------- Fax. 06151 164321 ----------
>

Article: 49154
I am looking for a small (and fast) CPU core optimized for FPGA (Actel or
Altera). I have found the xr16 and gr0040 at www.fpgacpu.org (but it is not
licensed for commercial use).

Know of any good ones? Especially with development tools (e.g. gcc/gdb,
etc).

Thanks,
Lennie Araki

Article: 49155
<http://www.vautomation.com/Turbo.htm>

On Sun, 03 Nov 2002 01:54:23 GMT, "lenman" <lenman@hotmail.com> wrote:

>I am looking for a small (and fast) CPU core optimized for FPGA (Actel or
>Altera). I have found the xr16 and gr0040 at www.fpgacpu.org (but it is not
>licensed for commercial use).
>
>Know of any good ones? Especially with development tools (e.g. gcc/gdb,
>etc).
>
>Thanks,
>Lennie Araki

Article: 49156
Guys, are we talking about the same thing here? Hardware design is a big
subject: ICs, ASICs, FPGAs, SoC/SoPC. The requirements and constraints for
the design inevitably change depending upon the target and the wishes and
needs of the customer.

We bang on about QoR, size, area, performance... why don't we use schematic
capture and layout, since that would give us serious efficiency? Probably
because multi-million gate designs, or even multi-hundred-thousand gate
designs, would take an eternity to complete and the product would be
defunct before it got anywhere near the market. The time-to-market,
productivity and cost issue drove the development of VHDL as a simulation
language... the level of design abstraction was raised. It wasn't designed
for implementation (only a subset is implementable) and that's why RTL was
then developed. HLLs are just another increase in the level of design
abstraction, and their C base makes them attractive for system or SoPC
design. Let's get talking the same language here.

It's not difficult to implement parallelism in a software language.
Handel-C and SpecC use the 'par' construct developed through CSP
(Communicating Sequential Processes), a notation for controlling levels of
abstraction. Check out CSP and Hoare on google for more information. (Hoare
developed CSP at Oxford University.) I don't claim to be an expert, but
this work and theory has stood the test of time. It has been discussed and
critiqued and has remained a proven theory, and is widely accepted.

Why use a C-based language? Taking a spec and re-writing it into HDL is
time-consuming and can be error prone. It's easier to migrate a spec to a
HLL, and it certainly helps if you have libraries of legacy C IP in your
company - encryption, compression, protocol stacks - things that can
benefit from being in hardware.

And the more you think about it: when we optimise our design for area,
speed, efficiency, if it is a part of a system, how do we know the
partition we are optimising is correct? How did we test and verify the
design partition? It's not good engineering practice to spend many months
of time, effort, dollars and expertise optimising something that from the
outset is not optimal. HLLs can help at the partitioning stage.

Fundamentally there is no one right or wrong way to design hardware. Look
at your design requirements, what your customer wants and what is important
to them, then choose your methodology, language and tool flow - be that
language an HDL, HLL, block-based design or schematic capture. Things are
changing; we either influence the direction of that change or stand by and
watch it happen.

Noel

Martin Thompson <martin.j.thompson@trw.com> wrote in message
news:<u3crmd5m0.fsf@trw.com>...
> Ray Andraka <ray@andraka.com> writes:
>
> > I agree that the C to hardware things have their place.  If nothing else, it
> > lowers the bar for entry into FPGAs.  What is missing is the necessary caveat
> > explaining that there is a lot of performance/density left on the table...and
> > that is much more than is commonly touted.  I think an order of magnitude
> > wouldn't be far wrong as an average.
> >
>
> I wouldn't argue there!
>
> Martin

Article: 49157
For Altera check out Altera's own Nios, which comes in 16- and 32-bit
versions.

http://www.altera.com/products/devices/nios/nio-index.html

Karl.

"lenman" <lenman@hotmail.com> wrote in message
news:jl%w9.69343$C53.3575733@news2.west.cox.net...
> I am looking for a small (and fast) CPU core optimized for FPGA (Actel or
> Altera). I have found the xr16 and gr0040 at www.fpgacpu.org (but it is
> not licensed for commercial use).
>
> Know of any good ones? Especially with development tools (e.g. gcc/gdb,
> etc).
>
> Thanks,
> Lennie Araki

Article: 49158
Hi Djohn,

DFT strategies are used to allow testing components after production.
Sometimes the process is somehow faulty, so the silicon vendor must be able
to discard faulty pieces and investigate the reason.

One of the most used DFT strategies for big ICs (1 Mgates) is the scan
chain. The design is synthesized using scannable flip-flops. They have one
more input, called Test Input (TI), and a selector called Test Enable (TE).
When TE is active (test mode), the FF uses TI as its input, else the
conventional input. TI is connected to the output of another FF and all the
TEs are connected together. Thus all the FFs are connected to make a long
chain (the scan chain), and by selecting TE you can scan in a special
pattern and check the result (pattern + result = test vectors).

For example: put the circuit in test mode (the scan chain is like a long
shift register); scan in a pattern; put it in normal mode and run 1 clock
cycle (the FFs will now load new values according to the logic between
them); put the circuit in test mode; scan out the result (the result will
depend on the design itself). You can simulate these test vectors and see
the behaviour. If the behaviour is OK you will release these test vectors
to the silicon vendor, who will run them on the chips soon after
production. If the outputs are as expected, the pieces will be delivered to
the customer, otherwise they will be trashed. The faulty results can be
analyzed to find where the circuit is faulty.

ATPG (Automatic Test Pattern Generation) is the tool which takes as input
the circuit netlist, containing the scan chain, and generates the input
patterns (and, I guess, the expected results). The vendor has a test
machine where the components are placed (it's a sort of pattern generator
and logic analyzer together). The input patterns are scanned into the chip
and the results are compared to the expected results.

BIST is another story. Built-In Self Test is a test usually designed for
memories. Memories are likely to be faulty, so they need special test
algorithms to be run. The BIST block implements these special algorithms
(March-...) and is usually provided together with the memory block. During
normal mode the BIST is quiet. Usually it is connected to the TAP. The TAP
should be designed so that you can issue an instruction to start the BIST
and check the result. So, if you have memories in your UART controller, you
can implement it; otherwise there's no point. But I suppose if you had had
a memory, you would have known about BIST.

That's all. I am a HW designer from Italy and work for Ericsson; I'm 32 and
my experience is not so big. What do you do? Student?

Well, I hope I clarified your doubts.

Regards
Alessandro

Article: 49159
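A minimal sketch of the scan-mux flip-flop Alessandro describes (module and
port names are invented for illustration; real scan cells are inserted by
the synthesis/DFT tools, not hand-written):

```verilog
// Scan-capable flip-flop: when test_en (TE) is high, the FF loads
// scan_in (TI) instead of its functional D input. Chaining the Q output
// of one FF to the scan_in of the next, with all TE pins tied together,
// turns every register in the design into one long shift register - the
// scan chain used to shift test patterns in and results out.
module scan_ff (
    input  wire clk,
    input  wire d,        // functional data input
    input  wire scan_in,  // TI: from the previous FF in the chain
    input  wire test_en,  // TE: common to all FFs in the chain
    output reg  q         // also feeds the next FF's scan_in
);
    always @(posedge clk)
        q <= test_en ? scan_in : d;
endmodule
```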
Thanks, but I looked at this one also and it is still quite large (1100 LEs
vs 200 for gr0040).

Lennie

"Karl de Boois" <karlIGNORETHISPART@chello.nl> wrote in message
news:aIax9.57$SU4.10794@amsnews03.chello.com...
> For Altera check out Altera's own Nios which comes in 16 & 32 bit.
>
> http://www.altera.com/products/devices/nios/nio-index.html
>
> Karl.
>
> "lenman" <lenman@hotmail.com> wrote in message
> news:jl%w9.69343$C53.3575733@news2.west.cox.net...
> > I am looking for a small (and fast) CPU core optimized for FPGA (Actel or
> > Altera). I have found the xr16 and gr0040 at www.fpgacpu.org (but it is
> > not licensed for commercial use).
> >
> > Know of any good ones? Especially with development tools (e.g. gcc/gdb,
> > etc).
> >
> > Thanks,
> > Lennie Araki

Article: 49160
David Rogoff wrote:
>
> Don't forget to synchronize ALE to your local clock first!

Not necessary, since the original posting stated that data is stable for
several FPGA clock ticks around the falling edge of ALE. At least, that's
the way I understood it.

Peter Alfke

Article: 49161
> Thanks but I looked at this one also but it is still quite large (1100 LEs
> vs 200 for gr0040).

To my knowledge, there are still no comparably small, commercially
deployable, C-programmable 16-bit RISC CPUs. Not even within a factor of
three! I don't understand it -- it's not like I'm keeping the secret sauce
a secret. For example, I wrote about an 80-LUT-datapath 16-bit RISC here in
1994 (http://fpgacpu.org/usenet/homebrew.html). You just have to learn and
apply The Art of High Performance FPGA Design
(http://www.fpgacpu.org/log/aug02.html#art) to the problem of building a
simple (http://www.fpgacpu.org/log/sep00.html#000919) processor that covers
the required functionality.

That said, 16-bit Nios 2.x is now down around 900 LUTs. 32-bit MicroBlaze
is in the same ballpark. Both have excellent dev tools support. It pains me
to say this, but unless you are building a multiprocessor, 900 LEs is not
so egregious a footprint -- thanks to Moore's Law. For example, if you
start designing now, 900 LEs is only ~1/3 of the smallest Cyclone part
(EP1C3). (Nice job, Altera marketing -- to determine approximately how many
LUTs are in your Cyclone parts, we simply multiply the part number suffix
by 1000.)

Please explain, if you don't mind, why one of these well supported 900-LUT
soft cores is untenable for your application. Thanks.

Jan Gray, Gray Research LLC

Article: 49163
Too hastily, I wrote:

> Not even within a factor of three!

Mike Butts' xr16vx (http://users.easystreet.com/mbutts/xr16vx_jhdl.html),
an independent JHDL implementation of the xr16 instruction set
architecture, for Spartan-II/Virtex, uses ~255 slices -- a factor of two.
It is licensed under GPL, but requires you use the lcc-xr16 toolset, which
is presently licensed only for non-commercial use.

If there *are* other 16-bit C-programmable soft processor cores under (say)
400 LUTs out there, please let us know.

Jan Gray, Gray Research LLC

Article: 49164
Article: 49165
How do you model an open-collector line in Verilog?

I can see you can do something like

  assign bus = output ? 1'bz : 0;

But how do you make the bus pulled up when it's not being driven?

Thanks
Ralph

Article: 49166
Anyone interested in a 16-bit data / 24-bit address CPU? I would think that
there are many applications where you don't care so much about speed, but
you do need more than 16 bits of linear address space. Anyway, if there is
much interest, I'll finish off a Verilog design I have for a tiny 6809-like
CPU, but with 16-bit accumulators and 24-bit index registers and addresses.

--
/* jhallen@world.std.com (192.74.137.5) */ /* Joseph H. Allen */
int a[1817];main(z,p,q,r){for(p=80;q+p-80;p-=2*a[p])for(z=9;z--;)q=3&(r=time(0)
+r*57)/7,q=q?q-1?q-2?1-p%79?-1:0:p%79-77?1:0:p<1659?79:0:p>158?-79:0,q?!a[p+q*2
]?a[p+=a[p+=q]=q]=q:0:0;for(;q++-1817;)printf(q%79?"%c":"%c\n"," #"[!a[q-1]]);}

Article: 49167
Hello,

I was wondering if anyone had any URLs, notes, or just general information
on using an FPGA with a CPU core, such as a RISC, then connecting it to a
PC (USB, LPT etc.) and having an application send the FPGA CPU commands
such as "add 3, 5", with the result (8) returned to the PC.

Thanks, newb.

Article: 49168
On Mon, 4 Nov 2002 16:14:22 +1300, "Ralph Mason"
<masonralph_at_yahoo_dot_com@thisisnotarealaddress.com> wrote:

>How do you model an open collector line in verilog.
>
>I can see you can do something like
>
>assign bus = output ? 1'bz : 0
>
>But how do you make the bus pulled up when it's not being driven?
>
>Thanks
>Ralph

This should work:

  pullup (weak1) (bus);

assign drives a strong 0 and wins over the pullup.

Muzaffer Kal
http://www.dspia.com
ASIC/FPGA design/verification consulting specializing in DSP algorithm
implementations

Article: 49169
Hi all,

I use an APEX20K400E, and I use the lpm_ram_dp in Quartus II to construct
my register file. All input signals are registered, so reads and writes are
both synchronous. If I read and write the same address of this memory at
the same clock edge, do I get the just-written value?

Thanx

Article: 49170
Ralph Mason wrote:
>
> How do you model an open collector line in verilog.
>
> I can see you can do something like
>
> assign bus = output ? 1'bz : 0
>
> But how do you make the bus pulled up when it's not being driven?
>
> Thanks
> Ralph

Just use the "pullup" predefined primitive of Verilog.

Utku

Article: 49171
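Putting the replies above together, a self-contained sketch of an
open-collector (open-drain) bus with the pullup primitive - module and
signal names are invented for illustration, and the default pull strength
is assumed (Muzaffer's `pullup (weak1)` form sets it explicitly):

```verilog
// Open-collector bus model: each driver can only pull the wire low or
// release it to high-Z; the pullup primitive supplies the idle-high level.
module oc_bus_demo;
    wire bus;
    reg  drive_low_a, drive_low_b;   // active-high "pull the bus low"

    // Each open-collector driver: strong 0 when driving, Z otherwise.
    assign bus = drive_low_a ? 1'b0 : 1'bz;
    assign bus = drive_low_b ? 1'b0 : 1'bz;

    // A strong 0 from any driver overrides the pull strength, so the
    // pullup wins only when every driver has released the bus.
    pullup (bus);

    initial begin
        drive_low_a = 0; drive_low_b = 0;
        #1 $display("idle bus   = %b", bus);   // 1: pulled up
        drive_low_a = 1;
        #1 $display("driven bus = %b", bus);   // 0: wired-AND behavior
    end
endmodule
```

Note that `output` in Ralph's original snippet is a Verilog reserved word,
so the driving signal needs another name, as here.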
Hello, everyone.

I'm working on an FPGA design project that includes many modules. It takes
a long time to run a full implementation each time I make even a little
change in one module, so I wonder how I can do an incremental
implementation. I use Synplify Pro and ISE 5.1i. I've read the Xilinx
document on how to do incremental design, but I don't really understand it;
it seems that I need to do it in Synplify Pro GUI mode. Is it impossible to
do it in ISE? And where can I find any example code?

Thanks in advance

Article: 49172
VIRTEX II -4 439.895MHz

Article: 49173
> If there *are* other 16-bit C-programmable soft processor cores under (say)
> 400 LUTs out there, please let us know.

I have one in development which is similar to the gr00xx series but
different as well. I'm not sure of the size, but it should be relatively
small, on the order of 400-500 LUTs (it has more functionality).

I think part of the problem is that both small size and speed have been
asked for, along with a sixteen-bit datapath. Quite often the things that
are done to increase performance (like using dual- or triple-ported
registers, a wide datapath, pipelining) also increase the size. One has to
be realistic about what is expected. It just plain takes the number of LUTs
that it takes to implement a certain level of functionality and
performance. Most cores with the same level of performance and
functionality will be roughly the same size; there is no magic. I think the
MicroBlaze and Nios cores set a good measuring stick.

If you're interested I can post some information on the core on the web,
but I don't like to post things I haven't finished.

Rob
rob@birdcomputer.ca
www.birdcomputer.ca

Article: 49174
Hello.

One of our XC95288XL parts exhibits excessive heating (untouchable!). It
contains a multiplexer to select an output clock from 3 different input
clocks. Two of the input clocks are attached to a GCK pin while one is
attached to an IO/GTS pin. When selecting the non-GCK clock source the part
starts to heat up dramatically. Frequencies are in the range between 60 and
100 MHz. The device contains 180 macrocells in high-performance mode. Only
about 20 macrocells are affected by the input clock selection; the others
are connected to a GCK clock.

Any hints?

Thanks, Andreas

P.S. One idea is that the IO/GTS input drives the GTS net with the clock
even if the GTS signal is not used ???