As I recall, you were getting speeds of ~135 MHz in V2. You should be able to get a fully pipelined processor using BRAMs up to 200 MHz or so without any big problems. In VirtexII, the carry chains will be the limiting factor, not the BRAM, if you do it right. In Virtex and VirtexE, you can double the width of the BRAM and then use registers to assemble consecutive accesses. It does get a bit messy in that case because it introduces pipeline misses.

Goran Bilski wrote:

> Hi,
>
> I agree that you can if you also double the clock frequency of the pipeline,
> creating parts of the normal clock.
> What I meant was the keeping the same clock and just adding more pipestages.
>
> I have finally got the idea of multithreading but it not as easy to implement
> since you need to find a good middle point in each
> pipestage that can divide the pipestage into equal parts.
>
> The control path is also needed to split into subparts and you also need to find
> good points to break it up.
> The processor also definitely needs a cache which you can run at the double
> speed or more ports to in order to get the data for each thread.
> I think that would be the largest obstacle for multithreading MicroBlaze, the
> number of ports to the BRAM is finite (2) and in my implementation BRAM is
> almost already in the critical path.
>
> Göran Bilski
>
> "Nicholas C. Weaver" wrote:
>
> > In article <3DADAAB7.E96D53F2@Xilinx.com>,
> > Goran Bilski <Goran.Bilski@Xilinx.com> wrote:
> >
> > >You can't just double the number of pipestage for a processor without
> > >major impacts. For streaming pipeline which hardware pipelines are I
> > >agree but for processor that can't be done.
> >
> > Uhh, yes it can.
> >
> > Double all the pipeline stages, double the register file, rebalance
> > the delays now that you have more pipelining, and out drops a 2-thread
> > multithreaded architecture. Each single thread now runs slower, but
> > aggregate throughput (sum of the two threads) is increased.
> >
> > It is so obvious yet unintuitive that nobody has actually DONE it
> > before. :)
> > --
> > Nicholas C. Weaver nweaver@cs.berkeley.edu

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759
Article: 48376
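A rough way to check the claim quoted in the post above (each thread slower, aggregate throughput higher) is to work the arithmetic. The sketch below assumes strict round-robin issue with no stalls, so each thread owns every second cycle. The 135 MHz and 200 MHz figures are the ones mentioned in the post; treating 200 MHz as the clock actually reached after doubling the pipeline stages is only an assumption for illustration, and `interleaved_throughput` is a made-up helper name.

```python
def interleaved_throughput(f_clk_mhz: float, n_threads: int) -> tuple:
    """Return (per-thread rate, aggregate rate) in millions of issue slots/s.

    Assumes strict round-robin issue: each thread owns every n-th cycle
    and nothing ever stalls (an idealisation).
    """
    per_thread = f_clk_mhz / n_threads
    aggregate = per_thread * n_threads
    return per_thread, aggregate


# Single-threaded pipeline at the clock quoted in the post.
baseline = interleaved_throughput(135.0, 1)
# Two-thread pipeline at an ASSUMED clock after re-pipelining.
doubled = interleaved_throughput(200.0, 2)

print("single thread  :", baseline)
print("two threads    :", doubled)
print("aggregate gain : %.2fx" % (doubled[1] / baseline[1]))
print("per-thread rate: %.2fx" % (doubled[0] / baseline[0]))
```

Under these assumptions the aggregate rate rises while each individual thread runs at a lower rate than the original single-threaded pipeline, which is the trade-off the thread is arguing about.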
The SRL16s make the permanent state really easy to store too.

Ken McElvain wrote:

> One big advantage of multi threading is that the pipeline
> interlocks can be eliminated if the number of threads is larger
> than the longest feedback path in the pipeline. For example,
> a branch instruction does not have to stall waiting for
> conditions from the preceding comparison. This yields
> some boost in the total performance.
>
> Permanent state such as conditions codes, register files have to be
> expanded into larger memories with part of the index being the current
> thread id, but other registers mostly do not have to be modified. Given
> the distributed ram capabilities in Xilinx parts, this is pretty
> cheap.
>
> The first place I saw this was the CDC 6600 IO processors, which
> I believe ran 16 threads.
>
> - Ken
>
> Goran Bilski wrote:
>
> > Hi,
> >
> > "Nicholas C. Weaver" wrote:
> >
> >
> >>In article <3DAD80F2.DC5AD4C4@Xilinx.com>,
> >>Goran Bilski <Goran.Bilski@Xilinx.com> wrote:
> >>
> >>>Hi,
> >>>
> >>>Sort of.
> >>>
> >>>The complete decoding and the ALU is around 10-13% of the design.
> >>>The actual instruction decoding is less than 5%.
> >>>
> >>>Make it multithreading as I understand is to have more than 1 instructions
> >>>streams in the pipeline.
> >>>What is the benefit unless you double the pipeline and have two data pipelines?
> >>>Almost nothing
> >>>
> >>Uhh, you don't double the pipelines, you take the single pipeline,
> >>double up the registers IN them, and then move the registers to
> >>rebalance all the pipeline stages, as now you have 2x the registers
> >>through any fedback loop, allowing you to up the clock frequency a lot.
> >>
> >>If you do this to every register in the core (and tweak the RF), a
> >>multithreaded design just sort of "drops out" automatically.
> >>
> >>You can even write a tool to do that automatically.
> >>
> >>What happens in the end is you take advantage of the two threads to
> >>up the clock substantially. Each individual thread is now a little
> >>slower, but the throughput for the 2 threads is now substantially
> >>higher. You use more pipelining and more power, and you may or may
> >>not end up thrashing the caches, but it does work.
> >>
> >>I can send you a paper submission and a thesis chapter draft on the
> >>subject if you want.
> >>
> >>
> >
> > Please do.
> >
> > If you double all the registers in the data pipeline, hasn't you doubled the
> > pipeline?
> > Or is all functionality between the pipestages shared?
> >
> >
> >>>So with two threads in MicroBlaze, to double the pipeline is to
> >>>double the size of MicroBlaze. You also have to double the
> >>>instruction fetching data throughput in order to get the two streams
> >>>busy. That would put a big burden on the bus infrastructure and
> >>>external memory interface which suddenly has to double its
> >>>performance. The doubling of the pipeline and added control handling
> >>>WILL also lower the maximum clock frequency of MicroBlaze.
> >>>
> >>You don't need to double the external memory interface if you share the
> >>cache, this is especially true on workloads where the threads are
> >>related. The external memory interface is now 2x the CLOCK, but you
> >>could slow it down from there and arbitrate between the two streams of
> >>execution.
> >>
> >
> >>You also probably want to make the feeding of interrupts a little
> >>different, so you can designate one thread as receiving the
> >>interrupts.
> >>
> >>
> >>>Say you suddenly would like to have 5 threads instead of 2. That is a major
> >>>change of the multithreading MicroBlaze and almost impossible to get the
> >>>instruction fetching to keep up. With multiprocessing, just add another 3
> >>>MicroBlazes and you're done.
> >>>
> >>What you do is you have a 1 thread and a 2 thread version (going
> >>beyond 2 threads seems to be less effective, maybe 3 depending on the
> >>architecture). From the exterior, however, they still look normal.
> >>You can still tile that like any other core to create a multiprocessor
> >>machine.
> >>
> >>
> >>>BUT there is always a catch and that is how you write programs for these
> >>>systems.
> >>>
> >>"one thread for I/O, one thread for processing" does come up in some
> >>cases.
> >>
> >>
> >>>Göran
> >>>
> >>>Hal Murray wrote:
> >>>
> >>>
> >>>>>Another approach is to add multi-threading capabilities but I think that
> >>>>>multi-processing is better for FPGA than multi-threading.
> >>>>>
> >>>>Why?
> >>>>
> >>>>If I understand what multi-threading means, the idea is to interleave
> >>>>alternate cycles of two execution streams in order to reduce the
> >>>>losses due to stalls.
> >>>>
> >>>>It looks like it "just" requires an extra address bit (odd/even cycle)
> >>>>to the register file and the same bit selects between pairs of special
> >>>>registers like the PC.
> >>>>
> >>>>Are you telling me that the ALU and instruction decoding is small enough
> >>>>so that I might just as well build two copies of the whole CPU?
> >>>>
> >>>>--
> >>>>The suespammers.org mail server is located in California. So are all my
> >>>>other mailboxes. Please do not send unsolicited bulk e-mail or unsolicited
> >>>>commercial e-mail to my suespammers.org address or any of my other addresses.
> >>>>These are my opinions, not necessarily my employer's. I hate spam.
> >>>>
> >>--
> >>Nicholas C. Weaver nweaver@cs.berkeley.edu
> >>
> >

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759
Article: 48377
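The scheme described in the thread above (thread ID as part of the register-file index, and the same bit selecting a per-thread PC) can be sketched as a small behavioural model. This is only an illustration under stated assumptions: the class and function names are invented here, the 32-register size is assumed, and the "program" each thread runs is a toy, not MicroBlaze code.

```python
class BarrelRegisterFile:
    """One physical memory holding the registers of several threads.

    The thread ID supplies the upper address bits, so a 2-thread machine
    needs just one extra address bit, as suggested in the thread.
    """

    def __init__(self, n_threads: int = 2, regs_per_thread: int = 32):
        self.n_threads = n_threads
        self.regs_per_thread = regs_per_thread      # assumed size
        self.mem = [0] * (n_threads * regs_per_thread)
        self.pc = [0] * n_threads                   # one PC per thread

    def _addr(self, thread_id: int, reg: int) -> int:
        # Thread ID forms the high part of the physical address.
        return thread_id * self.regs_per_thread + reg

    def read(self, thread_id: int, reg: int) -> int:
        return self.mem[self._addr(thread_id, reg)]

    def write(self, thread_id: int, reg: int, value: int) -> None:
        self.mem[self._addr(thread_id, reg)] = value


def thread_for_cycle(cycle: int, n_threads: int) -> int:
    """Strict interleave: even cycles go to thread 0, odd cycles to thread 1."""
    return cycle % n_threads


rf = BarrelRegisterFile()
for cycle in range(6):
    tid = thread_for_cycle(cycle, rf.n_threads)
    # Toy instruction: r1 = r1 + (tid + 1), then advance this thread's PC.
    rf.write(tid, 1, rf.read(tid, 1) + tid + 1)
    rf.pc[tid] += 4

print(rf.read(0, 1), rf.read(1, 1), rf.pc)
```

Each thread sees its own `r1` and its own PC even though both share one memory array, which is the property the posters rely on when they say the rest of the datapath mostly does not change.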
Good question. The road is littered with FPGA start-ups and even big companies that tried to get in on the action: Dynachip, Gatefield, Motorola, TI, AMD, ...

rickman wrote:

> Where is the money to start a new FPGA going to come from... ?
>
> Rick "rickman" Collins
>
> rick.collins@XYarius.com
> Ignore the reply address. To email me use the above address with the XY
> removed.
>
> Arius - A Signal Processing Solutions Company
> Specializing in DSP and FPGA design URL http://www.arius.com
> 4 King Ave 301-682-7772 Voice
> Frederick, MD 21701-3110 301-682-7666 FAX

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759
Article: 48378
Hal Murray wrote:

> > BUT there is always a catch and that is how you write programs
> > for these systems.
>
> Standard programming problem. People are getting pretty good
> at it. Yes, there are lots of applications where it doesn't work.
>
> If you can't take advantage of multi-threading then you wouldn't
> be able to use multi-processing either.

Is that true? Don't you need to actually have two threads in order to use multi-threading, whereas multiprocessor parallelism can be more fine-grained? For example, take code where the inner loop has a function call in which some operations take place. Is it easier to thread that function or just to place the function in another processor?

Isn't how data is moved between two processors/threads more crucial? How do you actually move data between two threads in the same processor?

Göran

> --
> The suespammers.org mail server is located in California. So are all my
> other mailboxes. Please do not send unsolicited bulk e-mail or unsolicited
> commercial e-mail to my suespammers.org address or any of my other addresses.
> These are my opinions, not necessarily my employer's. I hate spam.
Article: 48379
Hello all,

I've perused various Xilinx/Altera threads in this newsgroup with due interest, and would now like to invite comments and thoughts on the situation I find myself in:

I've done a modest amount of design recently with Xilinx and am overall comfortable with the design flow, assorted support, and achievable performance. Recently a new application has come up requiring more horsepower (a PCI accelerator card for some compute-intensive portions of an imaging application). To make this fly, I need three things to come together: (1) devices, (2) tools, (3) a development board. In a nutshell the Altera offering is tempting...

(1) Devices: The Stratix prices I'm being offered are aggressively low. I'm comparing slow speed grades of EP1S10/20 with 2V1000/2000 and I get the feeling that Stratix has the edge on raw speed and on DSP block capability. Things like distributed memory and SRL that are strengths of Virtex-2 don't seem too important for this application.

(2) Tools: I went to a Quartus seminar. Quartus seems learnable, no huge leaps for an ISE user. Overall the Altera tools cost considerably less too, when you look at the packages available and the implications of Xilinx' Time-Based License. I'm unsure about the level of support and bugginess of Quartus, but then reports of ISE 5.1i aren't exactly flattering.

(3) Development board: With some difficulty I have identified suitable development boards (PCI + enough off-chip memory) for both Stratix and Virtex-II; as it happens, none of them are available today. So that's a wash.

Anyway, I would be eager to hear from other folks re: wisdom of your experience, or re: pitfalls for the unwary, or if you've been looking at the same kind of decision...

Thanks,
-rajeev-
Article: 48380
Yes, but I also have an embedded multiplier which already uses the registered output. I can't add another pipestage in that path since there is nowhere to insert it. Then I have to add special arrangements in the control logic if the instruction uses the multiplier.

I have painfully discovered that minor tweaks in the control logic can easily make it the critical path. I think that it is possible to have two threads in MicroBlaze, but I am not convinced that it would give me more performance than two separate MicroBlazes. The overall area will be less than two MicroBlazes but not far from it.

MicroBlaze has 700 LUTs and 500 DFFs. Double the number of flip-flops and the slice count will go up. (There are also a lot of places for Virtex and VirtexE where I have used all the inputs/outputs of a slice, and even if there is a free DFF, it can't be reached.) It will not double the size, but a significant increase will occur.

The multiprocessor approach makes it much easier to add 10 extra processors (threads).

Göran

Ray Andraka wrote:

> As I recall, you were getting speeds of ~135 MHz in V2. You should be able to get a
> fully pipelined processor using BRAMs up to 200 MHz or so without any big problems.
> In VirtexII, the carry chains will be the limiting factor, not the BRAM if you do it
> right. In Virtex and VirtexE, you can double the width of the BRAM and then use
> registers to assemble consecutive accesses. It does get a bit messy in that case
> because it introduces pipeline misses.
>
> Goran Bilski wrote:
>
> > Hi,
> >
> > I agree that you can if you also double the clock frequency of the pipeline,
> > creating parts of the normal clock.
> > What I meant was the keeping the same clock and just adding more pipestages.
> >
> > I have finally got the idea of multithreading but it not as easy to implement
> > since you need to find a good middle point in each
> > pipestage that can divide the pipestage into equal parts.
> >
> > The control path is also needed to split into subparts and you also need to find
> > good points to break it up.
> > The processor also definitely needs a cache which you can run at the double
> > speed or more ports to in order to get the data for each thread.
> > I think that would be the largest obstacle for multithreading MicroBlaze, the
> > number of ports to the BRAM is finite (2) and in my implementation BRAM is
> > almost already in the critical path.
> >
> > Göran Bilski
> >
> > "Nicholas C. Weaver" wrote:
> >
> > > In article <3DADAAB7.E96D53F2@Xilinx.com>,
> > > Goran Bilski <Goran.Bilski@Xilinx.com> wrote:
> > >
> > > >You can't just double the number of pipestage for a processor without
> > > >major impacts. For streaming pipeline which hardware pipelines are I
> > > >agree but for processor that can't be done.
> > >
> > > Uhh, yes it can.
> > >
> > > Double all the pipeline stages, double the register file, rebalance
> > > the delays now that you have more pipelining, and out drops a 2-thread
> > > multithreaded architecture. Each single thread now runs slower, but
> > > aggregate throughput (sum of the two threads) is increased.
> > >
> > > It is so obvious yet unintuitive that nobody has actually DONE it
> > > before. :)
> > > --
> > > Nicholas C. Weaver nweaver@cs.berkeley.edu
>
> --
> --Ray Andraka, P.E.
> President, the Andraka Consulting Group, Inc.
> 401/884-7930 Fax 401/884-7950
> email ray@andraka.com
> http://www.andraka.com
>
> "They that give up essential liberty to obtain a little
> temporary safety deserve neither liberty nor safety."
> -Benjamin Franklin, 1759
Article: 48381
Easiest to do via memory or register file. Thread timing has to make sure the value is available before using it.

Goran Bilski wrote:

> Isn't it how data is move between two processor/threads that is more crucial?
>
> How does you actually move data between two threads in the same processor?
>
> Göran
>
> >
> > --
> > The suespammers.org mail server is located in California. So are all my
> > other mailboxes. Please do not send unsolicited bulk e-mail or unsolicited
> > commercial e-mail to my suespammers.org address or any of my other addresses.
> > These are my opinions, not necessarily my employer's. I hate spam.

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759
Article: 48382
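As a rough illustration of the reply above about passing data between threads through shared memory, here is a minimal behavioural sketch. It assumes two threads interleaved on alternate cycles and uses a `valid` flag in shared memory so the consumer only reads a value once it is available. The flag handshake, the function name and the data values are assumptions for the example; the post only says that thread timing must guarantee availability, which a fixed interleave could also provide without a flag.

```python
def run_mailbox_demo(n_cycles: int = 8) -> list:
    """Simulate a producer and a consumer thread sharing one memory.

    Even cycles belong to thread 0 (producer), odd cycles to thread 1
    (consumer). The 'valid' word is a hypothetical handshake flag.
    """
    mem = {"data": 0, "valid": 0}   # shared by both threads
    to_send = [11, 22, 33]          # arbitrary example payload
    received = []

    for cycle in range(n_cycles):
        if cycle % 2 == 0:
            # Producer: write only when the previous value was consumed.
            if to_send and not mem["valid"]:
                mem["data"] = to_send.pop(0)
                mem["valid"] = 1
        else:
            # Consumer: read only when a value is actually available.
            if mem["valid"]:
                received.append(mem["data"])
                mem["valid"] = 0
    return received


print(run_mailbox_demo())
```

Because the threads alternate strictly, a value written on one cycle is visible to the other thread on the next cycle, so in this idealised model the flag never causes a wait; it matters once either thread can stall.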
I instantiate an 8-bit counter, cc8ce, from the library and use only the MSB as an output. What is the best way to deal with the remaining 7 bits of the counter's output? Thanks.
Article: 48383
In article <3DAC6ED3.A8E721C4@Xilinx.com>,
Goran Bilski <Goran.Bilski@Xilinx.com> wrote:
>Another approach is to rely on advanced compiler techniques for
>handling all the pipeline hazards but it would make it almost
>impossible to program the processor in assembler since the user has
>to do the handling. I personally don't think that this approach
>would gain that much more performance than MicroBlaze and you have to
>spend a lot of resources on the compiler which could be used for
>other stuff.

MIPS: Machine without Interlocking Pipeline Stages.
--
Nicholas C. Weaver nweaver@cs.berkeley.edu
Article: 48384
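To make the "let the compiler handle the hazards" idea from the post above concrete, here is a minimal sketch of the simplest possible approach: padding with NOPs when an instruction would read a register before its producer's result is ready. The instruction tuples, register names and the `delay` parameter are illustrative assumptions, not MicroBlaze or MIPS encodings, and a real scheduler would try to fill the slots with useful independent instructions instead of NOPs.

```python
NOP = ("nop", None, ())


def insert_nops(program, delay=1):
    """Pad a program for a pipeline with no hardware interlocks.

    program: list of (opcode, dest_reg, source_regs) tuples.
    delay:   number of instruction slots that must separate a producer
             from any consumer of its result (assumed uniform).
    """
    out = []
    ready_at = {}  # register -> first index in `out` where it may be read
    for op, dest, srcs in program:
        # Earliest slot at which every source operand is available.
        need = max((ready_at.get(r, 0) for r in srcs), default=0)
        while len(out) < need:
            out.append(NOP)
        out.append((op, dest, srcs))
        if dest is not None:
            ready_at[dest] = len(out) + delay
    return out


program = [
    ("lw", "r1", ("r2",)),         # produces r1
    ("add", "r3", ("r1", "r4")),   # consumes r1 immediately
    ("sub", "r5", ("r6", "r7")),   # independent of both
]

for instr in insert_nops(program, delay=1):
    print(instr)
```

In this example the independent `sub` could have been moved into the delay slot instead of a NOP, which is exactly the kind of work the post says the compiler (or the assembly programmer) would have to take on.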
John_H <johnhandwork@mail.com> wrote in message news:<3DACCA15.BE61F97B@mail.com>...
> "Theron Hicks (Terry)" wrote:
> <snip>
>
> > I will keep the rework machine idea in mind if we ever need to go to the virtex2
> > or other BGA parts. How does one inspect the solder joints to determine whether the
> > joints all have flowed correctly? How steep is the learning curve to mount the chip
> > consistently?
>
> It looked like a straight-forward process. I didn't do any of the work myself, but one
> of the model shop guys showed me the workings of the thing. With the appropriate
> preheat section and a slider to the hot air reflow, it looked like a solid technique
> without too much leeway for problems. While a good visual inspection requires expensive
> inspection microscopes (extremely low profile view) for outside balls, the "right" way
> to fully inspect the balls is with X-ray inspection. While neither is appropriate for
> your needs, a cheap boundary scan might be the best way to go.

Good idea. Or, if you have tons of spare I/O, do a pin-out so that you can retransmit every received signal out to a test pad when the part is loaded with a special FPGA load.

> Talking to the vendors that want to sell you those stations, you can probably get a
> better understanding of the ins and outs.

The learning curve isn't horrible, but it takes some practice. That is, I'd guess you'll waste 5 to 15 devices as you try to get it accurate. The hope is that after you save off the profile, you have no more problems.

With business so slow, the vendors may be hungry enough that you could convince them to do the dial-in for you if you provided the packages (real or samples). Note that vendors often have X-ray, so if you have questions about your process, they can double-check a few boards for you (for no charge, at least in our experience).

Have fun,

Marc
Article: 48385
In article <6uelaqjgg4.fsf@chonsp.franklin.ch>,
Neil Franklin <neil@franklin.ch.remove> wrote:
>> What will you feed into the backend? Output from the X or A front end?
>
>At present just interest to feed in my own simple language. May add
>XDL if that is sufficiently interesting.

Start with post-placement XDL. Make a translator from your language to XDL, and you can verify that your language makes sense by feeding the results to Xilinx routing etc. You can then use your language (output as XDL) with your tool and with the Xilinx tools, use Xilinx toolflows to drive your backend routing/bitgen, and you also decouple your language from your tools.

>Lesser case: You do know, that nearly all the fundamental patents in
>FPGAs appeared around/pre 1985 (XC2000) and are now nearing their 17
>year, and so at end of life? Give a few years (needed for any hypothetical
>bit-compatible scenario that makes cloning interesting anyway), and
>quite a lot of them will be gone.

But nobody cares about those parts. The patents on XC4000 features will be in force longer, and those on the Virtex parts even longer, and Virtex is so far superior that nobody is going to want to clone older parts. You too can clone an 8088, but what's the real use apart from an interesting emulator?

>Don't forget that then the only patents remaining are detail patents,
>i.e. on the actual implementation. And that can be varied, without
>losing bit compatibility. The situation is getting simpler the longer
>time goes.

Not exactly. I'd bet (although I haven't searched) that there are patents on the BlockRAMs and patents on the hex lines, which would make bitfile clones of the Virtex really suffer in terms of performance if you used funky alternate structures.

But I will be glad when the LUT patent is dead and buried.

>Also you may want to take into account, that Altera managed to
>survive Xilinxes patents, despite starting when they Xilinx had
>maximal protection, and with Altera an latecomer. Any new competitor
>has an easier situation.

Or worse, as both brand A and brand X lawyers get together to put the Serious Hurtin' on a competitor.

Part of the reason Altera was able to get away with it was a large number of other patents, so although Xilinx was first, there was a major mess of overlapping patents. A new competitor won't have that advantage.

>And an further scenario: assume bit compatible becomes important.
>Either X or A is the winner in becoming the standard. How long do you
>think will the other of the 2 look at declining sales, until they
>clone? And we already know that a patent battle between them 2 ends in
>stalemate.

It never will be. If it does, I'll eat my hat.

>> Likewise the
>> innards of an FPGA are patented and otherwise protected IP. If you try
>> to make an FPGA that is bitstream compatible you will either violate
>> patents or end up with a very unworkable chip design or both.
>
>You can get around an patent. Altera survived Xilinxes ones. AMD has
>wrung patents off of Intel, by tripping them over other stuff. Via has
>stopped Intel attacks by tripping them up. Ask an good IP lawyer about
>all the possibilities. IP law is not the clear "you lose" that you
>believe it to be.

Your examples are all companies which entered the particular market around the same time, give or take. AMD also had serious cross-licensing with Intel, which many of the lawsuits were about.
--
Nicholas C. Weaver nweaver@cs.berkeley.edu
Article: 48386
It won't quite double performance, but it also should not be a significantly larger area, or you either aren't doing it right or it is already heavily pipelined. The gain is not raw performance; it is a gain in performance/area. Two separate instances will provide more MIPS than one dual-threaded machine, but at the cost of more area.

Normally, the pipeline stages should be inserted so that their input comes from the LUT in the same slice, so it is not an issue if you used up all the inputs. The only blocking input in that case is the SR input if you are using a CLB RAM or SRL16. The control logic can usually be pipelined similarly, but it may require a fresh start at the design rather than patching the existing one.

Goran Bilski wrote:

> Yes, But I also have a embedded multiplier which already is using the registrated
> output.
> I can't add another pipestage in that path since there is nowhere to insert it.
> Then I have to add special arrangement if the instructions is using the multiplier in
> the control logic.
>
> I have painfully detect that minor tweaks in the control logic can easily make it the
> critical path.
> I think that is possible to have two threads in MicroBlaze but I not convince that it
> would give me more performance than two separate MicroBlazes. The overall area will be
> less than two MicroBlazes but not far from it.
>
>
> MicroBlaze has 700 LUTS and 500 DFFs. Double the number of flipflops and the slices
> count will go up.
> (There is also a lot of places for Virtex and VirtexE where I have used all in/out for a
> slice and even if there is a free DFF, it can't be reached.)
> It will not double the size but a significant increase will occur.
>
> The multiprocessor approach makes it much easier to add 10 extra processors(threads).
>
> Göran
>
> Ray Andraka wrote:
>
> > As I recall, you were getting speeds of ~135 MHz in V2. You should be able to get a
> > fully pipelined processor using BRAMs up to 200 MHz or so without any big problems.
> > In VirtexII, the carry chains will be the limiting factor, not the BRAM if you do it
> > right. In Virtex and VirtexE, you can double the width of the BRAM and then use
> > registers to assemble consecutive accesses. It does get a bit messy in that case
> > because it introduces pipeline misses.
> >
> > Goran Bilski wrote:
> >
> > > Hi,
> > >
> > > I agree that you can if you also double the clock frequency of the pipeline,
> > > creating parts of the normal clock.
> > > What I meant was the keeping the same clock and just adding more pipestages.
> > >
> > > I have finally got the idea of multithreading but it not as easy to implement
> > > since you need to find a good middle point in each
> > > pipestage that can divide the pipestage into equal parts.
> > >
> > > The control path is also needed to split into subparts and you also need to find
> > > good points to break it up.
> > > The processor also definitely needs a cache which you can run at the double
> > > speed or more ports to in order to get the data for each thread.
> > > I think that would be the largest obstacle for multithreading MicroBlaze, the
> > > number of ports to the BRAM is finite (2) and in my implementation BRAM is
> > > almost already in the critical path.
> > >
> > > Göran Bilski
> > >
> > > "Nicholas C. Weaver" wrote:
> > >
> > > > In article <3DADAAB7.E96D53F2@Xilinx.com>,
> > > > Goran Bilski <Goran.Bilski@Xilinx.com> wrote:
> > > >
> > > > >You can't just double the number of pipestage for a processor without
> > > > >major impacts. For streaming pipeline which hardware pipelines are I
> > > > >agree but for processor that can't be done.
> > > >
> > > > Uhh, yes it can.
> > > >
> > > > Double all the pipeline stages, double the register file, rebalance
> > > > the delays now that you have more pipelining, and out drops a 2-thread
> > > > multithreaded architecture. Each single thread now runs slower, but
> > > > aggregate throughput (sum of the two threads) is increased.
> > > >
> > > > It is so obvious yet unintuitive that nobody has actually DONE it
> > > > before. :)
> > > > --
> > > > Nicholas C. Weaver nweaver@cs.berkeley.edu
> >
> > --
> > --Ray Andraka, P.E.
> > President, the Andraka Consulting Group, Inc.
> > 401/884-7930 Fax 401/884-7950
> > email ray@andraka.com
> > http://www.andraka.com
> >
> > "They that give up essential liberty to obtain a little
> > temporary safety deserve neither liberty nor safety."
> > -Benjamin Franklin, 1759

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759
Article: 48387
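The performance/area argument in the reply above can be put as a small normalised comparison. This is a sketch only: everything is expressed relative to one single-threaded core (throughput 1.0, area 1.0), and the `speedup` and `area_ratio` values passed in at the bottom are placeholders picked for the example, not measurements from the thread.

```python
def compare(speedup: float, area_ratio: float) -> dict:
    """Compare one dual-threaded core with two separate cores.

    speedup:    aggregate throughput of the dual-threaded core relative
                to one single-threaded core (expected to be below 2.0).
    area_ratio: area of the dual-threaded core relative to one
                single-threaded core (expected to be below 2.0).
    """
    return {
        "two_cores": {
            "throughput": 2.0,
            "area": 2.0,
            "perf_per_area": 1.0,
        },
        "dual_thread": {
            "throughput": speedup,
            "area": area_ratio,
            "perf_per_area": speedup / area_ratio,
        },
    }


# Placeholder inputs, NOT figures from the discussion.
result = compare(speedup=1.5, area_ratio=1.25)
for name, row in result.items():
    print(name, row)
```

With inputs like these, two cores win on raw throughput while the dual-threaded core wins on throughput per unit area, which is the distinction the post draws. Whether the real area ratio is small enough is exactly what the two posters disagree about.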
Marc (and all),

Peter wandered by a few moments ago and asked me why I was so against pq packages. I'm not, honestly! The Spartan family always has small and inexpensive packages for its parts.

But the issue of assembly and rework is a real one for the BGA packages. If you can get an 8-layer board built in three days through any number of on-line services, how do you assemble these prototypes?

Well, as it so happens, most places close to big technology centers have assembly services just for prototyping! And it isn't that expensive! In fact, it costs a lot less than having all of the stuff and people yourself, and not using it (which is what most prototyping consists of: 5% building, and 95% debugging what was built). They do rework too.

So ask around. Large assembly services sometimes have a small proto line just for this, and in areas where there is a lot more business, there are small specialty shops that just do proto runs. And I am talking here about five, two, and sometimes even one board for a reasonable price.

Of course, you will have to learn how to assemble the kit of parts and identify them, and have a good schematic, a good bill of materials, and a good assembly drawing. All of that you already have, right? Believe me, if you don't have the right documentation, you are just playing at this, and you must enjoy the subsequent pain.

The FPGA lab regularly builds, and gets assembled, PCBs that have packages up to the ff1517 size, and even though we do have the rework equipment (not that expensive if this is your business) to mount the parts ourselves, it is just too easy and too quick to get it done outside. The outside folks are so good at it that they seldom need to X-ray, but they can do that if there is any question.

Austin

Marc Randolph wrote:

> John_H <johnhandwork@mail.com> wrote in message news:<3DACCA15.BE61F97B@mail.com>...
> > "Theron Hicks (Terry)" wrote:
> > <snip>
> >
> > > I will keep the rework machine idea in mind if we ever need to go to the virtex2
> > > or other BGA parts. How does one inspect the solder joints to determine whether the
> > > joints all have flowed correctly? How steep is the learning curve to mount the chip
> > > consistently?
> >
> > It looked like a straight-forward process. I didn't do any of the work myself, but one
> > of the model shop guys showed me the workings of the thing. With the appropriate
> > preheat section and a slider to the hot air reflow, it looked like a solid technique
> > without too much leeway for problems. While a good visual inspection requires expensive
> > inspection microscopes (extremely low profile view) for outside balls, the "right" way
> > to fully inspect the balls is with X-ray inspection. While neither is appropriate for
> > your needs, a cheap boundary scan might be the best way to go.
>
> Good idea. Or if you have tons of spare I/O, do a pin-out so that you
> can retransmit every received signal out to a test pad if loaded with
> a special FPGA load.
>
> > Talking to the vendors that want to sell you those stations, you can probably get a
> > better understanding of the ins and outs.
>
> The learning curve isn't horrible, but it takes some practice. IE,
> I'd guess you'll waste 5 to 15 devices as you try to get it accurate.
> The hope is that after you save off the profile, you have no more
> problems.
>
> With business so slow, the vendors may be hungry enough that you could
> convince them to do the dial-in for you if you provided the packages
> (real or samples). Note that vendors often have X-ray, so if you have
> questions about your process, they can double check a few boards for
> you (for no charge, at least in our experience).
>
> Have fun,
>
> Marc
Article: 48388
Nicholas wrote:
> MIPS: Machine without Interlocking Pipeline Stages.

MIPS: "Microprocessor without Interlocked Pipeline Stages"

Even the R4000 (which had a 2-cycle load-to-use delay, IIRC) made sure to have a 0-cycle ALU-to-use delay, whereas a superpipelined 250 MHz V-II RISC would necessarily have a 1-cycle ALU-to-use delay. That is rather more challenging for the code scheduler to address.

My (unpublished) V-II architectural studies back up what Goran has been writing. Furthermore, to 2-thread any such machine would increase the area intolerably, because so much area is tied up in register files (which would have to double in size). Also, if you grow the area of the processor, it will slow down because of increased interconnect delays. A compact processor is a fast processor.

Goran wrote:
> MicroBlaze has 700 LUTS and 500 DFFs. Double the number of flipflops and the slices count will go up.

Is that an implementation improvement over the old 900+ LUTs figure, or did the old 900 LUTs figure include other non-core resources? Can you say what changed?

(SPRAM reg files? (The following is fast enough for ~150 MHz operation: register the result in a write-back register (in FFs) on CLK rising edge; present reg file write address and write-back data to reg file SPRAMs while CLK high; write results to SPRAMs on CLK falling edge; present reg file read address while CLK low; mux SPRAM outputs with immediate and/or forwarded result; and register in operand registers.))

Jan Gray, Gray Research LLC
Article: 48389
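The register-file timing described in the post above (write on the falling edge, read while CLK is low) can be sketched as a behavioural model of one clock cycle on a single-port RAM. This is an illustrative simplification with invented names: it models one read operand only, ignores the immediate/forwarding mux, and says nothing about the actual timing margins.

```python
class HalfCycleRegFile:
    """Single-port RAM doing one write and one read per clock cycle."""

    def __init__(self, n_regs: int = 32):
        self.ram = [0] * n_regs  # 32 entries is an assumed size

    def clock_cycle(self, wr_addr: int, wr_data: int, rd_addr: int,
                    wr_en: bool = True) -> int:
        # CLK high: write address and write-back data are presented.
        # Falling edge: the write is committed to the RAM.
        if wr_en:
            self.ram[wr_addr] = wr_data
        # CLK low: the read address is presented to the same port.
        # Next rising edge: the value is captured in an operand register.
        return self.ram[rd_addr]


rf = HalfCycleRegFile()
print(rf.clock_cycle(wr_addr=3, wr_data=42, rd_addr=3))
print(rf.clock_cycle(wr_addr=4, wr_data=7, rd_addr=3))
print(rf.clock_cycle(wr_addr=0, wr_data=0, rd_addr=4, wr_en=False))
```

One consequence visible in the model is that a read in the same cycle as a write to the same register already sees the new value, because the write lands in the first half of the cycle.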
Hi Jan,

Jan Gray wrote:
> Nicholas wrote:
> > MIPS: Machine without Interlocking Pipeline Stages.
>
> MIPS: "Microprocessor without Interlocked Pipeline Stages"
>
> Even the R4000 (which had a 2 cycle load-to-use delay, IIRC) made sure to have a 0 cycle ALU-to-use delay, whereas a superpipelined 250 MHz V-II RISC would necessarily have a 1 cycle ALU-to-use delay. That is rather more challenging for the code scheduler to address.
>
> My (unpublished) V-II architectural studies back up what Goran has been writing. Furthermore, to 2-thread any such machine would increase the area intolerably, because so much area is tied up in register files (which would have to double in size). Also, if you grow the area of the processor, it will slow down because of increased interconnect delays. A compact processor is a fast processor.
>
> Goran wrote:
> > MicroBlaze has 700 LUTS and 500 DFFs. Double the number of flipflops and the slices count will go up.
>
> Is that an implementation improvement over the old 900+ LUTs figure, or did the old 900 LUTs figure include other non-core resources? Can you say what changed?
>

Ooops, I did it again!! Error on my side: I looked at the report file for the core and took the number of IOs instead of LUTs.

>
> (SPRAM reg files? (The following is fast enough for ~150 MHz operation: Register the result in a write-back register (in FFs) on CLK rising edge; present reg file write address and write-back data to reg file SPRAMs while CLK high; write results to SPRAMs on CLK falling edge; present reg file read address while CLK low; mux SPRAM outputs with immediate and/or forwarded result; and register in operand registers.))
>

(Of 900 LUTs, 256 of them are the register file => around 30%) It's something that I have thought of, but it would not leave much room for any logic handling of the register addresses or operations on the register output.
Since I am using an SRL16 as the instruction prefetch buffer and the output delay of an SRL16 is around 2 ns, I can't add much logic to the register addresses before they have to go to the register file. But if I get some spare time (hahahaha), this is something that I would try in order to get the MicroBlaze size down.

Jan, you might be able to do a clean room implementation of a MicroBlaze where area is everything. If you have any spare time ;-)

>
> Jan Gray, Gray Research LLC

Article: 48390
>> >Another approach is to rely on advanced compiler techniques for handling all the pipeline hazards, but it would make it almost impossible to program the processor in assembler since the user has to do the handling.
>> >I personally don't think that this approach would gain that much more performance than MicroBlaze and you have to spend a lot of resources on the compiler which could be used for other stuff.
>>
>> This seems like an interesting opportunity for an open source project.
>
>Aren't there already CPUs in FPGA open source projects?
>
>http://www.fpgacpu.org/
>
>http://www.opencores.org/
>
>The list is getting pretty long.

I was thinking of the compiler rather than the hardware.

The idea is to use one thread rather than multiple, and make the compiler smart enough to understand the pipeline delays, and either automatically insert noops or slap your wrist if you do something bad.

Think of it as microcode rather than "normal" (whatever that means) RISC type code. You have to get your head around it, but once you get in the right mode it's not that hard. Maybe I was lucky to have a good mentor at the right time.

--
The suespammers.org mail server is located in California. So are all my other mailboxes. Please do not send unsolicited bulk e-mail or unsolicited commercial e-mail to my suespammers.org address or any of my other addresses. These are my opinions, not necessarily my employer's. I hate spam.

Article: 48391
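As a rough illustration of the compiler-side approach described in the post above (one thread, with the tool inserting noops around pipeline delays), here is a minimal Python sketch. The `Instr` tuple, the register names and the `result_delay` parameter are hypothetical and not taken from MicroBlaze or any real toolchain; the sketch only pads with NOPs and makes no attempt to reorder instructions.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str                  # mnemonic (illustrative only)
    dest: Optional[str]      # destination register, or None
    srcs: Tuple[str, ...]    # source registers


NOP = Instr("nop", None, ())


def insert_nops(program: List[Instr], result_delay: int = 1) -> List[Instr]:
    """Pad a straight-line program with NOPs.

    result_delay is the assumed number of extra issue slots that must
    separate a producer from the first consumer of its result
    (0 = result usable by the very next instruction).
    """
    out: List[Instr] = []
    ready_at = {}  # register -> earliest issue slot at which it may be read
    for ins in program:
        earliest = max((ready_at.get(r, 0) for r in ins.srcs), default=0)
        while len(out) < earliest:
            out.append(NOP)          # fill the hazard window
        if ins.dest is not None:
            # This instruction issues at slot len(out).
            ready_at[ins.dest] = len(out) + 1 + result_delay
        out.append(ins)
    return out


if __name__ == "__main__":
    prog = [
        Instr("add", "r1", ("r2", "r3")),
        Instr("sub", "r4", ("r1", "r5")),   # depends on the add
        Instr("or", "r6", ("r2", "r3")),
    ]
    for ins in insert_nops(prog, result_delay=1):
        print(ins.op, ins.dest, ins.srcs)
```

A tool that "slaps your wrist" instead would report the hazard at the point where this sketch appends a NOP.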
Various people wrote: > > > MIPS: Machine without Interlocking Pipeline Stages. > > > > MIPS: "Microprocessor without Interlocked Pipeline Stages" From long ago, when life was simpler. MIPS means 'MIPS', and has plenty of interlocking. Some implementations also have the HACF instruction, which we all challenge Goran to implement. (Halt and Catch Fire)Article: 48392
Falk Brunner wrote:
>
> So you need to upgrade your assembly technology.
> You can get BGAs assembled by professional companies, or do it yourself, using some advanced assembly tools.
> Even an amateur can do this. At least one in this world ;-))
>

The problem is that "so-called" professional companies also often mess up BGAs, without warning you :(

Here: http://edg.umd.edu/heater/bga/ is what happened to our BGAs.

--
Tullio Grassi
======================================
Univ. of Maryland - Dept. of Physics
College Park, MD 20742 - US
Tel +1 301 405 5970
Fax +1 301 699 9195
======================================

Article: 48393
I always try to get in an instruction that always produces the value 42, but for some reason I can never get it past management.

Göran

Tim wrote:
> Various people wrote:
> > > > MIPS: Machine without Interlocking Pipeline Stages.
> > >
> > > MIPS: "Microprocessor without Interlocked Pipeline Stages"
>
> From long ago, when life was simpler.
>
> MIPS means 'MIPS', and has plenty of interlocking. Some implementations also have the HACF instruction, which we all challenge Goran to implement. (Halt and Catch Fire)

Article: 48394
Hi,

I'm working on a board that requires a 120MHz clock, a 24MHz clock and a 40MHz clock. The board has a SpartanIIE device on it. I've never used a PLL (which the Spartan device has), so I'm unsure if I should use multiple crystals or if I can use the PLL to give the needed frequencies. One of the ICs requires 24 MHz, so would it be possible to use a 24MHz crystal for this IC and also feed the output to the FPGA to generate the 40 and 120 MHz clocks?

cheers,
Jamie Morken

Article: 48395
Nicholas C. Weaver wrote: > In article <3DADAAB7.E96D53F2@Xilinx.com>, > Goran Bilski <Goran.Bilski@Xilinx.com> wrote: > > >>You can't just double the number of pipestage for a processor without >>major impacts. For streaming pipeline which hardware pipelines are I >>agree but for processor that can't be done. >> > > Uhh, yes it can. > > Double all the pipeline stages, double the register file, rebalance > the delays now that you have more pipelining, and out drops a 2-thread > multithreaded architecture. Each single thread now runs slower, but > aggregate throughput (sum of the two threads) is increased. > > It is so obvious yet unintuitive that nobody has actually DONE it > before. :) > Sorry, it was done a long time ago. Try to find some info on the CDC 6600 IO processors. They ran 16 threads in a very deep pipeline.Article: 48396
In article <3DADEB1E.35A6770B@andraka.com>, Ray Andraka <ray@andraka.com> wrote:
>Normally, the pipeline stages should be inserted so that their input comes from the LUT in the same slice, so it is not an issue if you used up all the inputs. The only blocking input in that case is the SR input if you are using a CLB RAM or SRL16. The control logic can usually be pipelined similarly, but it may require a fresh start at the design rather than patching the existing one.

One observation: if you want to C-slow the clock enable, you want to loop it through LUT logic anyway; otherwise you get interference between the two threads. The same goes for the reset.

--
Nicholas C. Weaver nweaver@cs.berkeley.edu

Article: 48397
In article <3DAE1E83.60005@synplicity.com>, Ken McElvain <ken@synplicity.com> wrote:
>> Uhh, yes it can.
>>
>> Double all the pipeline stages, double the register file, rebalance the delays now that you have more pipelining, and out drops a 2-thread multithreaded architecture. Each single thread now runs slower, but aggregate throughput (sum of the two threads) is increased.
>>
>> It is so obvious yet unintuitive that nobody has actually DONE it before. :)
>>
>
>Sorry, it was done a long time ago. Try to find some info on the CDC 6600 IO processors. They ran 16 threads in a very deep pipeline.

Those machines, also Hep and Tera, didn't have any bypassing. The proposed multithreaded approach I'm talking about keeps the bypassing but doubles the pipelining.

The closest, "interleaved multithreading", kept some of the bypassing, but never took advantage of the fact that the bypass feedback loops now have more registers in them to raise the clock rate through finer pipelining.

--
Nicholas C. Weaver nweaver@cs.berkeley.edu

Article: 48398
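To make the "double the registers in the feedback loop" idea from the posts above concrete, here is a small behavioural Python sketch of a C-slowed accumulator. It is only an illustration under simplifying assumptions: the function name and the accumulator example are hypothetical, and it models the state behaviour only, not the retiming of the adder logic that would actually raise the clock rate.

```python
from typing import List


def c_slow_accumulate(samples: List[int], c: int = 2) -> List[int]:
    """Running-sum loop whose single feedback register is replaced by c registers.

    Each cycle the adder sees the value that entered the register chain
    c cycles ago, so c independent threads share one adder, interleaved
    cycle by cycle.
    """
    regs = [0] * c                 # c registers in series in the feedback path
    outputs = []
    for x in samples:
        new = regs[-1] + x         # oldest register feeds the adder
        regs = [new] + regs[:-1]   # shift the chain by one stage
        outputs.append(new)
    return outputs


if __name__ == "__main__":
    # Thread A supplies 1, 2, 3 on even cycles; thread B supplies 10, 20, 30 on odd cycles.
    out = c_slow_accumulate([1, 10, 2, 20, 3, 30], c=2)
    print(out[0::2])   # running sums seen by thread A
    print(out[1::2])   # running sums seen by thread B
```

Each thread sees its own running sum on every second cycle, which is the "each single thread runs slower, aggregate throughput goes up" trade-off once the extra registers are used to shorten the critical path.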
See also http://www.fpgacpu.org/log/nov01.html#011122: "One can also build a simple barrel processor (say 4 threads (slots) x 32 regs = 128 entries of 32-bits = 2 16-bit ports on a single 256x16 BRAM, tripled cycled, or two BRAMs double cycled) and switch threads on each cycle. Then you can have a 4-deep pipeline without need for any result forwarding muxes (by the time you read an operand on thread[i], you have already retired that threads' previous result to the register file). This seems to me to be a perfectly simple and practical basis to issue instructions faster than the ALU + result forwarding mux + operand register recurrence critical path. Unfortunately single-thread performance is not so hot but in workloads such as a "network processing", who cares? This idea was taken to sublime levels in the 20-stage pipelined 5-threaded 1 GHz MicroUnity MediaProcessor (which would have needed some result forwarding, but not 18 stages worth)." Jan Gray, Gray Research LLCArticle: 48399
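The register-file arithmetic in the quoted passage above can be checked with a few lines of Python. This is a sketch of one reading of it, assuming two operand reads and one result write per instruction and that every 32-bit access is split across 16-bit BRAM ports; the function and parameter names are hypothetical.

```python
import math
from typing import Tuple


def regfile_plan(threads: int = 4, regs_per_thread: int = 32, reg_bits: int = 32,
                 port_bits: int = 16, ports_per_bram: int = 2, brams: int = 1,
                 reads_per_instr: int = 2, writes_per_instr: int = 1) -> Tuple[int, int, int]:
    """Return (register entries, total bits, BRAM sub-cycles per instruction)."""
    entries = threads * regs_per_thread
    total_bits = entries * reg_bits
    # Each 32-bit access is assumed to need reg_bits / port_bits narrow port accesses.
    port_accesses = (reads_per_instr + writes_per_instr) * (reg_bits // port_bits)
    per_subcycle = ports_per_bram * brams
    subcycles = math.ceil(port_accesses / per_subcycle)
    return entries, total_bits, subcycles


if __name__ == "__main__":
    print(regfile_plan(brams=1))   # single BRAM
    print(regfile_plan(brams=2))   # two BRAMs
```

Under these assumptions one BRAM needs three sub-cycles per instruction and two BRAMs need two, which lines up with the "triple cycled" and "double cycled" wording in the quote.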
>I'm working on a board that requires a 120MHz clock, a 24Mhz clock and a >40MHz clock. Do the clocks have to be in sync? 24 is 120/5 and 40 is 120/3 so it's simple to generate the other clocks with a PLL or maybe just digital logic (PAL?) If you don't need them to be in sync (that is you have 3 separate chunks of logic and can put synchronizers between them) then compare the cost/risks of PLLs with three separate oscillator packages. Sometimes it's handy to be able to adjust (fudge?) a clock a bit (say 22 rather than 24) because a chunk of logic doesn't quite work at 24. -- The suespammers.org mail server is located in California. So are all my other mailboxes. Please do not send unsolicited bulk e-mail or unsolicited commercial e-mail to my suespammers.org address or any of my other addresses. These are my opinions, not necessarily my employer's. I hate spam.
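The ratios mentioned above (24 = 120/5, 40 = 120/3) can be worked out for any choice of reference clock with a short Python sketch. The helper name is hypothetical, and whether a particular DLL/PLL or divider can actually realise a given ratio is an assumption to check against the device data sheet.

```python
from fractions import Fraction
from typing import Dict, Iterable


def clock_ratios(ref_mhz: int, targets_mhz: Iterable[int]) -> Dict[int, Fraction]:
    """Map each target frequency to its exact multiply/divide ratio relative to ref_mhz."""
    return {t: Fraction(t, ref_mhz) for t in targets_mhz}


if __name__ == "__main__":
    # Starting from a 120 MHz source: plain integer dividers are enough.
    for f, r in clock_ratios(120, [24, 40]).items():
        print(f"{f} MHz = 120 MHz * {r.numerator}/{r.denominator}")

    # Starting from a 24 MHz crystal: 120 MHz needs x5, 40 MHz needs x5 then /3.
    for f, r in clock_ratios(24, [120, 40]).items():
        print(f"{f} MHz = 24 MHz * {r.numerator}/{r.denominator}")
```

Starting from 120 MHz only divide-by-5 and divide-by-3 are needed, which is why plain digital logic is an option there; starting from the 24 MHz crystal requires a multiply by 5 somewhere.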