On 04/04/2013 10:38 AM, Syd Rumpo wrote:
> On 29/03/2013 21:00, rickman wrote:
>> I have been working with stack based MISC designs in FPGAs for some
>> years. All along I have been comparing my work to the work of others.
>> These others were the conventional RISC type processors supplied by the
>> FPGA vendors as well as the many processor designs done by individuals
>> or groups as open source.
>
> <snip>
>
> Can you achieve as fast interrupt response times on a register-based
> machine as a stack machine? OK, shadow registers buy you one fast
> interrupt, but that's sort of a one-level 2D stack.
>
> Even the venerable RTX2000 had an impressive (IIRC) 200ns interrupt
> response time.

It depends on the implementation.

The easiest thing would be to not save anything at all before jumping
to the interrupt handler. This would make the interrupt response really
fast, but you'd have to save the registers manually before using them.
It would benefit systems that don't need many (or any) registers in the
interrupt handler. And even saving 4 registers at 100 MHz only takes an
additional 40 ns.

If you have parallel access to the stack/program memory, you could,
like the Cortex, save a few (e.g. 4) registers on the stack while you
fetch the interrupt vector and refill the execution pipeline at the
same time. This adds a considerable bit of complexity, though.

If you keep the register file in a large memory, like an internal block
RAM, you can easily implement multiple sets of shadow registers.

Of course, an FPGA comes with flexible hardware such as large FIFOs, so
you can generally avoid the need for super fast interrupt response. In
fact, you may not even need interrupts at all.
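
(A toy C model of that last idea -- the register file as a small memory
indexed by {bank, register}, so an interrupt switches banks instead of
saving anything. The bank and register counts are made up for
illustration; this is a sketch of the concept, not any particular core.)

    #include <stdint.h>

    #define NBANKS 4   /* shadow sets; assumed, not from the thread */
    #define NREGS  8

    static uint32_t regfile[NBANKS][NREGS];  /* lives in one block RAM */
    static unsigned bank = 0;                /* current shadow set */

    uint32_t reg_read(unsigned r)              { return regfile[bank][r]; }
    void     reg_write(unsigned r, uint32_t v) { regfile[bank][r] = v; }

    /* interrupt entry/exit: no register traffic at all, just a bank switch */
    void irq_enter(void) { bank = (bank + 1) % NBANKS; }
    void irq_exit(void)  { bank = (bank + NBANKS - 1) % NBANKS; }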
Article: 155051

In article <kjin8q$so5$1@speranza.aioe.org>,
glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:
>In comp.arch.fpga Rod Pemberton <do_not_have@notemailnotq.cpm> wrote:
>> "rickman" <gnuarm@gmail.com> wrote in message
>> news:kjf48e$5qu$1@dont-email.me...
>>> Weren't you the person who brought CISC into this discussion?
>
>> Yes.
>
>>> Why are you asking this question about CISC?
>
>> You mentioned code density. AISI, code density is purely a CISC
>> concept. They go together and are effectively inseparable.
>
>They do go together, but I am not so sure that they are inseparable.
>
>CISC began when much coding was done in pure assembler, and anything
>that made that easier was useful. (One should figure out the relative
>costs, but at least it was in the right direction.)

But, of course, this is a fallacy. The same goal is accomplished by
macros, and better. Code density is the only valid reason.

<SNIP>
>
>Decoding multiple different instruction formats tends to require
>complicated demultiplexers which are especially hard to do in
>an FPGA. Even so, one can make efficient use of the bits
>and still be MISC.
>
>-- glen

--
Albert van der Horst, UTRECHT, THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst
Article: 155052

On 04/04/2013 01:16 PM, Albert van der Horst wrote:
>>> You mentioned code density. AISI, code density is purely a CISC
>>> concept. They go together and are effectively inseparable.
>>
>> They do go together, but I am not so sure that they are inseparable.
>>
>> CISC began when much coding was done in pure assembler, and anything
>> that made that easier was useful. (One should figure out the relative
>> costs, but at least it was in the right direction.)
>
> But, of course, this is a fallacy. The same goal is accomplished by
> macros, and better. Code density is the only valid reason.

Speed is another valid reason.
Article: 155053

In comp.arch.fpga Arlet Ottens <usenet+5@c-scape.nl> wrote:
> On 04/04/2013 01:16 PM, Albert van der Horst wrote:
>>>> You mentioned code density. AISI, code density is purely a CISC
>>>> concept. They go together and are effectively inseparable.
>
>>> They do go together, but I am not so sure that they are inseparable.
>
>>> CISC began when much coding was done in pure assembler, and anything
>>> that made that easier was useful. (One should figure out the relative
>>> costs, but at least it was in the right direction.)
>
>> But, of course, this is a fallacy. The same goal is accomplished by
>> macros, and better. Code density is the only valid reason.
>
> Speed is another valid reason.

Presumably some combination of ease of coding, speed, and also Brooks'
"Second System Effect".

Paraphrasing from "The Mythical Man-Month" since I haven't read it
recently: the ideas that designers couldn't implement in the first
system they designed, for cost/efficiency/whatever reasons, come out
in the second system.

Brooks wrote that more for OS/360 (software) than for S/360 (hardware),
but it might still have some effect on the hardware, and maybe also
for VAX.

There are a number of VAX instructions that seem like a good idea, but
as I understand it ended up slower than if done without the special
instructions.

As examples, take both the VAX POLY and INDEX instructions. When VAX
was new, compiled languages (Fortran for example) pretty much never did
array bounds testing. It was just too slow. So VAX supplied INDEX,
which in one instruction did the multiply/add needed for a subscript
calculation (you do one INDEX for each subscript) and also checked that
the subscript was in range. Nice idea, but it seems that even with
INDEX it was still too slow.

Then POLY evaluates a whole polynomial, such as is used to approximate
many mathematical functions, but again, as I understand it, too slow.

Both the PDP-10 and S/360 have the option for an index register on
many instructions, where when register 0 is selected no indexing is
done. VAX instead has indexing as a separate address mode selected by
the address mode byte. Is that the most efficient use for those bits?

-- glen
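
(For readers who haven't met INDEX: roughly, each step range-checks one
subscript and folds it into a running index as
indexout = (indexin + subscript) * size, trapping if the subscript is
outside low..high. A C rendering from memory of the architecture
manual -- treat the exact operand semantics as approximate:)

    #include <stdio.h>
    #include <stdlib.h>

    /* one VAX-style INDEX step, per subscript of a multi-dimensional array */
    long vax_index(long subscript, long low, long high,
                   long size, long indexin)
    {
        if (subscript < low || subscript > high) {
            fprintf(stderr, "subscript range trap\n");
            exit(1);
        }
        return (indexin + subscript) * size;
    }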
Article: 155054

In comp.arch.fpga Syd Rumpo <usenet@nononono.co.uk> wrote:

(snip)
> Can you achieve as fast interrupt response times on a register-based
> machine as a stack machine? OK, shadow registers buy you one fast
> interrupt, but that's sort of a one-level 2D stack.

If you disable interrupts so that another one doesn't come along
before you can save enough state for the first one, yes.

S/360 does it with no stack. You have to have some place in the
low (first 4K) address range to save at least one register.
The hardware saves the old PSW at a fixed (for each type of
interrupt) address, which you also have to move somewhere else
before enabling more interrupts of the same type.

> Even the venerable RTX2000 had an impressive (IIRC) 200ns interrupt
> response time.

-- glen
Article: 155055

On 04/04/2013 02:49 PM, glen herrmannsfeldt wrote:
> In comp.arch.fpga Syd Rumpo <usenet@nononono.co.uk> wrote:
>
> (snip)
>> Can you achieve as fast interrupt response times on a register-based
>> machine as a stack machine? OK, shadow registers buy you one fast
>> interrupt, but that's sort of a one-level 2D stack.
>
> If you disable interrupts so that another one doesn't come along
> before you can save enough state for the first one, yes.
>
> S/360 does it with no stack. You have to have some place in the
> low (first 4K) address range to save at least one register.
> The hardware saves the old PSW at a fixed (for each type of
> interrupt) address, which you also have to move somewhere else
> before enabling more interrupts of the same type.

ARM7 is similar. PC and PSW are copied to registers, and further
interrupts are disabled. The hardware does not touch the stack. If you
want to allow nested interrupts, the programmer is responsible for
saving these registers.

ARM Cortex has changed that, and it saves registers on the stack. This
allows interrupt handlers to be written as regular high-level language
functions, and also allows easy nested interrupts. When dealing with
back-to-back interrupts, the Cortex takes a shortcut and does not
pop/push the registers, but just leaves them on the stack.
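
(Concretely: because a Cortex-M stacks R0-R3, R12, LR, PC and xPSR in
hardware, a handler can be an ordinary C function with no assembler
glue. A minimal sketch -- the SysTick vector name follows the usual
CMSIS convention; the rest is assumed:)

    volatile unsigned ticks;

    /* plain AAPCS C function, pointed to by the vector table entry */
    void SysTick_Handler(void)
    {
        ticks++;    /* no manual register save/restore anywhere */
    }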
Article: 155056

On 4/3/13 11:31 PM, Arlet Ottens wrote:
> On 04/04/2013 10:38 AM, Syd Rumpo wrote:
>> On 29/03/2013 21:00, rickman wrote:
>>> I have been working with stack based MISC designs in FPGAs for some
>>> years. All along I have been comparing my work to the work of others.
>>> These others were the conventional RISC type processors supplied by the
>>> FPGA vendors as well as the many processor designs done by individuals
>>> or groups as open source.
>>
>> <snip>
>>
>> Can you achieve as fast interrupt response times on a register-based
>> machine as a stack machine? OK, shadow registers buy you one fast
>> interrupt, but that's sort of a one-level 2D stack.
>>
>> Even the venerable RTX2000 had an impressive (IIRC) 200ns interrupt
>> response time.
>
> It depends on the implementation.
>
> The easiest thing would be to not save anything at all before jumping
> to the interrupt handler. This would make the interrupt response really
> fast, but you'd have to save the registers manually before using them.
> It would benefit systems that don't need many (or any) registers in the
> interrupt handler. And even saving 4 registers at 100 MHz only takes an
> additional 40 ns.

The best interrupt implementation just jumps to the handler code. The
implementation knows what registers it has to save and restore, which
may be only one or two. Saving and restoring large register files takes
cycles!

> If you have parallel access to the stack/program memory, you could,
> like the Cortex, save a few (e.g. 4) registers on the stack while you
> fetch the interrupt vector and refill the execution pipeline at the
> same time. This adds a considerable bit of complexity, though.
>
> If you keep the register file in a large memory, like an internal block
> RAM, you can easily implement multiple sets of shadow registers.
>
> Of course, an FPGA comes with flexible hardware such as large FIFOs, so
> you can generally avoid the need for super fast interrupt response. In
> fact, you may not even need interrupts at all.

Interrupts are good. I don't know why people worry about them so!

Cheers,
Elizabeth

--
==================================================
Elizabeth D. Rather (US & Canada)    800-55-FORTH
FORTH Inc.                          +1 310.999.6784
5959 West Century Blvd. Suite 700
Los Angeles, CA 90045
http://www.forth.com

"Forth-based products and Services for real-time
applications since 1973."
==================================================
Article: 155057

On Tuesday, April 2, 2013 8:27:07 AM UTC-7, David Brown wrote:
> I am working on a project that will involve a large HDMI switch - up to
> 16 inputs and 16 outputs. We haven't yet decided on the architecture,
> but one possibility is to use one or more FPGAs. The FPGAs won't be
> doing much other than the switch - there is no video processing going on.
>
> Each HDMI channel will be up to 3.4 Gbps (for HDMI 1.4), with 4 TMDS
> pairs (3 data and 1 clock). That means 64 pairs in, and 64 pairs out,
> all at 3.4 Gbps.
>
> Does anyone know of any FPGA families that might be suitable here?
>
> I've had a little look at Altera (since I've used Altera devices
> before), but their low-cost transceivers are at 3.125 Gbps - this means
> we'd have to use their mid or high cost devices, and they don't have
> nearly enough channels. I don't expect the card to be particularly
> cheap, but I'd like to avoid the cost of multiple top-range FPGA devices
> - then it would be much cheaper just to have a card with 80 4-to-1 HDMI
> mux chips.
>
> Thanks for any pointers,
>
> David

You cannot do what you desire in an FPGA, even if one existed with 64
high speed serdes at sufficient speed and cost. What you seek is a
serial crosspoint switch. Look at vendors like Mindspeed.
Article: 155058

On 4/4/2013 8:44 AM, glen herrmannsfeldt wrote:
> In comp.arch.fpga Arlet Ottens<usenet+5@c-scape.nl> wrote:
>> On 04/04/2013 01:16 PM, Albert van der Horst wrote:
>
>>>>> You mentioned code density. AISI, code density is purely a CISC
>>>>> concept. They go together and are effectively inseparable.
>
>>>> They do go together, but I am not so sure that they are inseparable.
>
>>>> CISC began when much coding was done in pure assembler, and anything
>>>> that made that easier was useful. (One should figure out the relative
>>>> costs, but at least it was in the right direction.)
>
>>> But, of course, this is a fallacy. The same goal is accomplished by
>>> macros, and better. Code density is the only valid reason.

Albert, do you have a reference about this?

>> Speed is another valid reason.
>
> Presumably some combination of ease of coding, speed, and also Brooks'
> "Second System Effect".
>
> Paraphrasing from "The Mythical Man-Month" since I haven't read it
> recently: the ideas that designers couldn't implement in the first
> system they designed, for cost/efficiency/whatever reasons, come out
> in the second system.
>
> Brooks wrote that more for OS/360 (software) than for S/360 (hardware),
> but it might still have some effect on the hardware, and maybe also
> for VAX.
>
> There are a number of VAX instructions that seem like a good idea, but
> as I understand it ended up slower than if done without the special
> instructions.
>
> As examples, take both the VAX POLY and INDEX instructions. When VAX
> was new, compiled languages (Fortran for example) pretty much never did
> array bounds testing. It was just too slow. So VAX supplied INDEX,
> which in one instruction did the multiply/add needed for a subscript
> calculation (you do one INDEX for each subscript) and also checked that
> the subscript was in range. Nice idea, but it seems that even with
> INDEX it was still too slow.
>
> Then POLY evaluates a whole polynomial, such as is used to approximate
> many mathematical functions, but again, as I understand it, too slow.
>
> Both the PDP-10 and S/360 have the option for an index register on
> many instructions, where when register 0 is selected no indexing is
> done. VAX instead has indexing as a separate address mode selected by
> the address mode byte. Is that the most efficient use for those bits?

I think you have just described the CISC instruction development
concept. Build a new machine, add some new instructions. No big
rationale, no "CISC" concept, just "let's make it better, why not add
some instructions?"

I believe if you check you will find the term CISC was not even coined
until after RISC was invented. So CISC really just means, "what we used
to do".

--

Rick
Article: 155059

On 04/04/13 21:08, Matt L wrote:
> On Tuesday, April 2, 2013 8:27:07 AM UTC-7, David Brown wrote:
>> I am working on a project that will involve a large HDMI switch - up to
>> 16 inputs and 16 outputs. We haven't yet decided on the architecture,
>> but one possibility is to use one or more FPGAs. The FPGAs won't be
>> doing much other than the switch - there is no video processing going on.
>>
>> Each HDMI channel will be up to 3.4 Gbps (for HDMI 1.4), with 4 TMDS
>> pairs (3 data and 1 clock). That means 64 pairs in, and 64 pairs out,
>> all at 3.4 Gbps.
>>
>> Does anyone know of any FPGA families that might be suitable here?
>>
>> I've had a little look at Altera (since I've used Altera devices
>> before), but their low-cost transceivers are at 3.125 Gbps - this means
>> we'd have to use their mid or high cost devices, and they don't have
>> nearly enough channels. I don't expect the card to be particularly
>> cheap, but I'd like to avoid the cost of multiple top-range FPGA devices
>> - then it would be much cheaper just to have a card with 80 4-to-1 HDMI
>> mux chips.
>>
>> Thanks for any pointers,
>>
>> David
>
> You cannot do what you desire in an FPGA, even if one existed with 64
> high speed serdes at sufficient speed and cost. What you seek is a
> serial crosspoint switch. Look at vendors like Mindspeed.

Thanks for that hint. I got another reply suggesting a crosspoint
switch - I will look at Mindspeed too now.

mvh.,

David
Article: 155060

In comp.arch.fpga rickman <gnuarm@gmail.com> wrote:

(snip, then I wrote)
>> Then POLY evaluates a whole polynomial, such as is used to approximate
>> many mathematical functions, but again, as I understand it, too slow.

>> Both the PDP-10 and S/360 have the option for an index register on
>> many instructions, where when register 0 is selected no indexing is
>> done. VAX instead has indexing as a separate address mode selected by
>> the address mode byte. Is that the most efficient use for those bits?

> I think you have just described the CISC instruction development
> concept. Build a new machine, add some new instructions. No big
> rationale, no "CISC" concept, just "let's make it better, why not add
> some instructions?"

Yes, but remember that there is competition and each has to have
some reason why someone should buy their product. Adding new
instructions was one way to do that.

> I believe if you check you will find the term CISC was not even coined
> until after RISC was invented. So CISC really just means, "what we used
> to do".

Well, yes, but why did "we used to do that"? For S/360, a lot of
software was still written in pure assembler, for one reason to make
it faster, and for another to make it smaller. And people were just
starting to learn that people (writing software) are more expensive
than machines (hardware). Well, that is about the point that it was
true. For earlier machines you were lucky to get one compiler and
enough system to run it.

And VAX was enough later and even more CISCy.

-- glen
Article: 155061

Hi Paul,

It's a plain XC3020. I installed a 2001 Student Edition on my Win7
machine, and it appears to run. All it needed was the license code that
came with the CD-ROM. I'm pretty sure XC3020 and XC3020A differ only in
timing and the same bitfile can work on both. Anyway I'll be finding
out sometime soon and I'll post how it goes.

Thanks all!

--Mike
Article: 155062

On Apr 4, 10:04 pm, rickman <gnu...@gmail.com> wrote:
> On 4/4/2013 8:44 AM, glen herrmannsfeldt wrote:
>
> > In comp.arch.fpga Arlet Ottens<usene...@c-scape.nl> wrote:
> >> On 04/04/2013 01:16 PM, Albert van der Horst wrote:
>
> >>>>> You mentioned code density. AISI, code density is purely a CISC
> >>>>> concept. They go together and are effectively inseparable.
>
> >>>> They do go together, but I am not so sure that they are inseparable.
>
> >>>> CISC began when much coding was done in pure assembler, and anything
> >>>> that made that easier was useful. (One should figure out the relative
> >>>> costs, but at least it was in the right direction.)
>
> >>> But, of course, this is a fallacy. The same goal is accomplished by
> >>> macros, and better. Code density is the only valid reason.
>
> Albert, do you have a reference about this?

Let's take two commonly used S/360 opcodes as an example of CISC; some
move operations: MVC (move 0 to 255 bytes) and MVCL (move 0 to 16M
bytes).

MVC does no padding or truncation. MVCL can pad and truncate, but
unlike MVC will do nothing and report overflow if the operands
overlap. MVC appears to other processors as a single indivisible
operation; every processor (including IO processors) sees storage as
either before the MVC or after it; it's not interruptible. MVCL is
interruptible, and partial results can be observed by other
processors. MVCL requires 4 registers and their contents are updated
after completion of the operation; MVC requires 1 for variable length
moves, 0 for fixed, and its contents are preserved. MVCL has a high
code setup cost; MVC has none.

Writing a macro to do multiple MVCs and mimic the behaviour of MVCL?
Why not? It's possible, if a little tricky. And by all accounts, MVC
in a loop is faster than MVCL too. IBM even provided a macro; $MVCL.

But then, when you look at MVCL usage closely, there are a few
defining characteristics that are very useful. It can zero memory, and
the millicode (IBM's word for microcode) recognizes 4K boundaries for
4K lengths and optimises it; it's faster than 16 MVCs. There's even a
MVPG instruction for moving 4K aligned pages! What are those crazy
instruction set designers thinking?

The answer's a bit more than just code density; it never really was
about that. In all the years I wrote IBM BAL, I never gave code
density a serious thought -- with one exception. That was the 4K base
address limit; a base register could only span 4K, so for code that
was bigger than that, you had to have either fancy register footwork
or waste registers for multiple bases.

It was more about giving assembler programmers choice and variety to
get the best out of the box before the advent of optimising compilers;
a way, if you like, of exposing the potential of the micro/millicode
through the instruction set. "Here I want you to zero memory" meant an
MVCL. "Here I am moving 8 bytes from A to B" meant using MVC. A
knowledgeable assembler programmer could out-perform a compiler.
(Nowadays quality compilers do a much better job of instruction
selection than humans, especially for pipelined processors that
stall.)

Hence CISC instruction sets (at least, IMHO and for IBM). They were
there for people and performance, not for code density.
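
(The $MVCL idea in miniature -- moving an arbitrary count in chunks of
at most 256 bytes, the MVC maximum. A C sketch of the concept only;
this is not IBM's actual macro expansion, and memcpy stands in for one
MVC:)

    #include <string.h>
    #include <stddef.h>

    void mvcl_by_mvc(unsigned char *dst, const unsigned char *src,
                     size_t len)
    {
        while (len > 0) {
            size_t chunk = len > 256 ? 256 : len;  /* one MVC: 1..256 bytes */
            memcpy(dst, src, chunk);               /* one MVC's worth */
            dst += chunk;
            src += chunk;
            len -= chunk;
        }
    }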
Article: 155063

On 4/4/2013 4:38 AM, Syd Rumpo wrote:
> On 29/03/2013 21:00, rickman wrote:
>> I have been working with stack based MISC designs in FPGAs for some
>> years. All along I have been comparing my work to the work of others.
>> These others were the conventional RISC type processors supplied by the
>> FPGA vendors as well as the many processor designs done by individuals
>> or groups as open source.
>
> <snip>
>
> Can you achieve as fast interrupt response times on a register-based
> machine as a stack machine? OK, shadow registers buy you one fast
> interrupt, but that's sort of a one-level 2D stack.
>
> Even the venerable RTX2000 had an impressive (IIRC) 200ns interrupt
> response time.

That's an interesting question. The short answer is yes, but it
requires that I provide circuitry to do two things. One is to save both
the Processor Status Word (PSW) and the return address to the stack in
one cycle. The stack computer has two stacks and I can save these two
items in one clock cycle. Currently my register machine uses a stack in
memory pointed to by a register, so it would require *two* cycles to
save two words. But the memory is dual ported and I can use a tiny bit
of extra logic to save both words at once and bump the pointer by two.

The other task is to save registers. The stack design doesn't really
need to do that; the stack is available for new work and the interrupt
routine just needs to finish with the stack in the same state as when
it started. I've been thinking about how to handle this in the register
machine. The registers are really two registers and one bank of
registers. R6 and R7 are "special" in that they have a separate
incrementer to support the addressing modes. They need a separate write
port so they can be updated in parallel with the other registers. I
have considered "saving" the registers by just bumping the start
address of the registers in the RAM, but that only saves R0-R5. I could
use LUT RAM for R6 and R7 as well. This would provide two sets of
registers for R0-R5 and up to 16 sets for R6 and R7. The imbalance
isn't very useful, but at least there would be a set for the main
program and a set for interrupts, with the caveat that nothing can be
retained between interrupts. This also means interrupts can't be
interrupted other than at specific points where the registers are not
used for storage.

I'm also going to look at using a block RAM for the registers. With
only two read and write ports this makes the multiply step cycle
longer, though.

Once that issue is resolved the interrupt response then becomes the
same as the stack machine - 1 clock cycle or 20 ns.

--

Rick
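
(A behavioral sketch of that one-cycle save: with the stack RAM dual
ported, the PSW and the return address can be written through the two
ports in the same clock while the pointer drops by two. This is my C
model of the idea, not rickman's HDL; the widths and the downward
growth direction are assumed:)

    #include <stdint.h>

    typedef struct {
        uint16_t mem[1024];
        unsigned sp;        /* stack grows downward */
    } stack_t;

    void irq_entry(stack_t *s, uint16_t psw, uint16_t ret_addr)
    {
        /* both writes land in the same cycle, one per RAM port */
        s->mem[s->sp - 1] = psw;
        s->mem[s->sp - 2] = ret_addr;
        s->sp -= 2;         /* single pointer update, bumped by two */
    }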
Article: 155064

On Apr 3, 5:34 pm, "Rod Pemberton" <do_not_h...@notemailnotq.cpm>
wrote:
> CISC was
> typically little-endian to reduce the space needed for integer
> encodings.

As long as you discount IBM mainframes. They are big endian. Or
Burroughs/Unisys; they were big endian too. Or the Motorola 68K; it
was big endian. For little-endian CISC, only the VAX and x86 come to
mind. Of those, only the x86 survives.
Article: 155065

In comp.arch.fpga Alex McDonald <blog@rivadpm.com> wrote:

(snip, someone wrote)
>> >>> But, of course, this is a fallacy. The same goal is accomplished by
>> >>> macros, and better. Code density is the only valid reason.

>> Albert, do you have a reference about this?

> Let's take two commonly used S/360 opcodes as an example of CISC; some
> move operations: MVC (move 0 to 255 bytes) and MVCL (move 0 to 16M
> bytes).

MVC moves 1 to 256 bytes, conveniently. (Unless you want 0.)

> MVC does no padding or truncation. MVCL can pad and truncate, but
> unlike MVC will do nothing and report overflow if the operands
> overlap. MVC appears to other processors as a single indivisible
> operation; every processor (including IO processors) sees storage as
> either before the MVC or after it;

I haven't looked recently, but I didn't think it locked out I/O.
Seems that one of the favorite tricks for S/360 was modifying
channel programs while they are running. (Not to mention self-modifying
channel programs.) Seems that MVC would be convenient for that.
It might be that MVC interlocks on CCW fetch such that only whole
CCWs are fetched, though.

> it's not interruptible. MVCL is
> interruptible, and partial results can be observed by other
> processors. MVCL requires 4 registers and their contents are updated
> after completion of the operation; MVC requires 1 for variable length
> moves, 0 for fixed, and its contents are preserved. MVCL has a high
> code setup cost; MVC has none.

> Writing a macro to do multiple MVCs and mimic the behaviour of MVCL?
> Why not? It's possible, if a little tricky. And by all accounts, MVC
> in a loop is faster than MVCL too. IBM even provided a macro; $MVCL.

> But then, when you look at MVCL usage closely, there are a few
> defining characteristics that are very useful. It can zero memory, and
> the millicode (IBM's word for microcode) recognizes 4K boundaries for
> 4K lengths and optimises it; it's faster than 16 MVCs.

As far as I understand, millicode isn't exactly like microcode,
but does allow for more complicated new instructions to be more
easily implemented.

> There's even a MVPG instruction for moving 4K aligned pages! What are
> those crazy instruction set designers thinking?

> The answer's a bit more than just code density; it never really was
> about that. In all the years I wrote IBM BAL, I never gave code
> density a serious thought -- with one exception. That was the 4K base
> address limit; a base register could only span 4K, so for code that
> was bigger than that, you had to have either fancy register footwork
> or waste registers for multiple bases.

Compared to VAX, S/360 is somewhat RISCy. Note only three different
instruction lengths and, for much of the instruction set, only two
address modes. If processors fast-path the more popular instructions,
like L and even MVC, it isn't so far from RISC.

> It was more about giving assembler programmers choice and variety to
> get the best out of the box before the advent of optimising compilers;

Though stories are that even the old Fortran H could come close to
good assembly programmers, and likely better than the average assembly
programmer.

> a way, if you like, of exposing the potential of the micro/millicode
> through the instruction set. "Here I want you to zero memory" meant an
> MVCL. "Here I am moving 8 bytes from A to B" meant using MVC. A
> knowledgeable assembler programmer could out-perform a compiler.
> (Nowadays quality compilers do a much better job of instruction
> selection than humans, especially for pipelined processors that
> stall.)

For many processors, MVC was much faster on appropriately aligned
data, such as the 8 bytes from A to B. Then again, some might
use LD and STD.

> Hence CISC instruction sets (at least, IMHO and for IBM). They were
> there for people and performance, not for code density.

I noticed some time ago that the hex opcodes for add instructions
end in A, and for divide in D. (That leaves B for subtract and
C for multiply, but not so hard to remember.)

If they really wanted to reduce code size, they should have added
a load indirect register instruction. (RR format.) A good
fraction of L (load) instructions have both base and offset
zero (or, equivalently, index and offset).

-- glen
Article: 155066

On 4/4/2013 5:34 PM, glen herrmannsfeldt wrote:
> In comp.arch.fpga rickman<gnuarm@gmail.com> wrote:
>
> (snip, then I wrote)
>
>>> Then POLY evaluates a whole polynomial, such as is used to approximate
>>> many mathematical functions, but again, as I understand it, too slow.
>
>>> Both the PDP-10 and S/360 have the option for an index register on
>>> many instructions, where when register 0 is selected no indexing is
>>> done. VAX instead has indexing as a separate address mode selected by
>>> the address mode byte. Is that the most efficient use for those bits?
>
>> I think you have just described the CISC instruction development
>> concept. Build a new machine, add some new instructions. No big
>> rationale, no "CISC" concept, just "let's make it better, why not add
>> some instructions?"
>
> Yes, but remember that there is competition and each has to have
> some reason why someone should buy their product. Adding new
> instructions was one way to do that.
>
>> I believe if you check you will find the term CISC was not even coined
>> until after RISC was invented. So CISC really just means, "what we used
>> to do".
>
> Well, yes, but why did "we used to do that"? For S/360, a lot of
> software was still written in pure assembler, for one reason to make
> it faster, and for another to make it smaller. And people were just
> starting to learn that people (writing software) are more expensive
> than machines (hardware). Well, that is about the point that it was
> true. For earlier machines you were lucky to get one compiler and
> enough system to run it.

Sure, none of this stuff was done without some purpose. My point is
that there was no *common* theme to the various CISC instruction sets.
Everybody was doing their own thing until RISC came along with a basic
philosophy. Someone felt the need to give a name to the previous way of
doing things and CISC seemed appropriate. No special meaning in the
name actually, just a contrast to the "Reduced" in RISC.

I don't think this is a very interesting topic really. It started in
response to a comment by Rod.

--

Rick
Article: 155067

On 4/4/2013 7:16 AM, Albert van der Horst wrote:
> In article<kjin8q$so5$1@speranza.aioe.org>,
> glen herrmannsfeldt<gah@ugcs.caltech.edu> wrote:
>> In comp.arch.fpga Rod Pemberton<do_not_have@notemailnotq.cpm> wrote:
>>> "rickman"<gnuarm@gmail.com> wrote in message
>>> news:kjf48e$5qu$1@dont-email.me...
>>>> Weren't you the person who brought CISC into this discussion?
>>
>>> Yes.
>>
>>>> Why are you asking this question about CISC?
>>
>>> You mentioned code density. AISI, code density is purely a CISC
>>> concept. They go together and are effectively inseparable.
>>
>> They do go together, but I am not so sure that they are inseparable.
>>
>> CISC began when much coding was done in pure assembler, and anything
>> that made that easier was useful. (One should figure out the relative
>> costs, but at least it was in the right direction.)
>
> But, of course, this is a fallacy. The same goal is accomplished by
> macros, and better. Code density is the only valid reason.

I'm pretty sure that conclusion is not correct. If you have an
instruction that does two or three memory accesses in one instruction
and you replace it with three instructions that do one memory access
each, you end up with two extra memory accesses. How is this faster?

That is one of the reasons why I want to increase code density; in my
machine it automatically improves execution time as well as reducing
the amount of storage needed.

--

Rick
Article: 155068

On Apr 4, 4:15 pm, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
> In comp.arch.fpga Alex McDonald <b...@rivadpm.com> wrote:
>
> (snip, someone wrote)
>
> >> >>> But, of course, this is a fallacy. The same goal is accomplished by
> >> >>> macros, and better. Code density is the only valid reason.
>
> >> Albert, do you have a reference about this?
>
> > Let's take two commonly used S/360 opcodes as an example of CISC; some
> > move operations: MVC (move 0 to 255 bytes) and MVCL (move 0 to 16M
> > bytes).
>
> MVC moves 1 to 256 bytes, conveniently. (Unless you want 0.)

My bad; the encoding is 0 to 255, but it's interpreted as the encoded
length plus 1.

> > MVC does no padding or truncation. MVCL can pad and truncate, but
> > unlike MVC will do nothing and report overflow if the operands
> > overlap. MVC appears to other processors as a single indivisible
> > operation; every processor (including IO processors) sees storage as
> > either before the MVC or after it;
>
> I haven't looked recently, but I didn't think it locked out I/O.
> Seems that one of the favorite tricks for S/360 was modifying
> channel programs while they are running. (Not to mention self-modifying
> channel programs.) Seems that MVC would be convenient for that.
> It might be that MVC interlocks on CCW fetch such that only whole
> CCWs are fetched, though.

Certainly on the S/360 whole CCWs, since it had a precise interrupt
model and MVC wasn't (and still isn't) interruptible. The S/370 allowed
interrupts on page faults for the target and source, but that is done
before the instruction is executed. IPL operates just like that; it
issues a fixed CCW that reads in data that's a PSW and some CCWs, and
away she goes...

> > it's not interruptible. MVCL is
> > interruptible, and partial results can be observed by other
> > processors. MVCL requires 4 registers and their contents are updated
> > after completion of the operation; MVC requires 1 for variable length
> > moves, 0 for fixed, and its contents are preserved. MVCL has a high
> > code setup cost; MVC has none.
> > Writing a macro to do multiple MVCs and mimic the behaviour of MVCL?
> > Why not? It's possible, if a little tricky. And by all accounts, MVC
> > in a loop is faster than MVCL too. IBM even provided a macro; $MVCL.
> > But then, when you look at MVCL usage closely, there are a few
> > defining characteristics that are very useful. It can zero memory, and
> > the millicode (IBM's word for microcode) recognizes 4K boundaries for
> > 4K lengths and optimises it; it's faster than 16 MVCs.
>
> As far as I understand, millicode isn't exactly like microcode,
> but does allow for more complicated new instructions to be more
> easily implemented.
>
> > There's even a MVPG instruction for moving 4K aligned pages! What are
> > those crazy instruction set designers thinking?
> > The answer's a bit more than just code density; it never really was
> > about that. In all the years I wrote IBM BAL, I never gave code
> > density a serious thought -- with one exception. That was the 4K base
> > address limit; a base register could only span 4K, so for code that
> > was bigger than that, you had to have either fancy register footwork
> > or waste registers for multiple bases.
>
> Compared to VAX, S/360 is somewhat RISCy. Note only three different
> instruction lengths and, for much of the instruction set, only two
> address modes. If processors fast-path the more popular instructions,
> like L and even MVC, it isn't so far from RISC.

A modern Z series has more instructions than the average Forth has
words; it's in the high hundreds.

> > It was more about giving assembler programmers choice and variety to
> > get the best out of the box before the advent of optimising compilers;
>
> Though stories are that even the old Fortran H could come close to
> good assembly programmers, and likely better than the average assembly
> programmer.

Fortran H was a good compiler. The early PL/I was horrible, and there
was a move to use it for systems programming work. I never did so due
to its incredibly bad performance.

> > a way, if you like, of exposing the potential of the micro/millicode
> > through the instruction set. "Here I want you to zero memory" meant an
> > MVCL. "Here I am moving 8 bytes from A to B" meant using MVC. A
> > knowledgeable assembler programmer could out-perform a compiler.
> > (Nowadays quality compilers do a much better job of instruction
> > selection than humans, especially for pipelined processors that
> > stall.)
>
> For many processors, MVC was much faster on appropriately aligned
> data, such as the 8 bytes from A to B. Then again, some might
> use LD and STD.

OK, 9 bytes. :-)

> > Hence CISC instruction sets (at least, IMHO and for IBM). They were
> > there for people and performance, not for code density.
>
> I noticed some time ago that the hex opcodes for add instructions
> end in A, and for divide in D. (That leaves B for subtract and
> C for multiply, but not so hard to remember.)
>
> If they really wanted to reduce code size, they should have added
> a load indirect register instruction. (RR format.) A good
> fraction of L (load) instructions have both base and offset
> zero (or, equivalently, index and offset).

Agreed. And had they wanted to, a single opcode for the standard
prolog & epilog; for example:

    STM   14,12,12(13)     save caller's registers in its save area
    LR    12,15            entry address (R15) becomes the base register
    LA    15,SAVE          address of our own save area
    ST    15,8(13)         forward-chain it from the caller's area
    ST    13,4(15)         back-chain the caller's area from ours
    LR    13,15            make our save area the current one

could have been the single op

    ENTRY SAVE

It was macros every time, which is in direct opposition to Albert's
assertion.

> -- glen
Article: 155069

In article <kjkpnp$qdp$1@dont-email.me>, rickman <gnuarm@gmail.com> wrote:
>On 4/4/2013 8:44 AM, glen herrmannsfeldt wrote:
>> In comp.arch.fpga Arlet Ottens<usenet+5@c-scape.nl> wrote:
>>> On 04/04/2013 01:16 PM, Albert van der Horst wrote:
>>
>>>>>> You mentioned code density. AISI, code density is purely a CISC
>>>>>> concept. They go together and are effectively inseparable.
>>
>>>>> They do go together, but I am not so sure that they are inseparable.
>>
>>>>> CISC began when much coding was done in pure assembler, and anything
>>>>> that made that easier was useful. (One should figure out the relative
>>>>> costs, but at least it was in the right direction.)
>>
>>>> But, of course, this is a fallacy. The same goal is accomplished by
>>>> macros, and better. Code density is the only valid reason.
>
>Albert, do you have a reference about this?

Not in a wikipedia sense, where you're not allowed to mention original
research and are only quoting what is in the books. It is more
experience than school knowledge.

If you want to see what can be done with a good macro processor like
m4, study the one source of the 16/32/64 bit ciforth for x86
linux/Windows/Apple. See my site below.

The existence of an XLAT instruction (to name an example) OTOH does
virtually nothing to make the life of an assembler programmer better.

Groetjes Albert
>
>--
>
>Rick

--
Albert van der Horst, UTRECHT, THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst
Article: 155070

In comp.arch.fpga Alex McDonald <blog@rivadpm.com> wrote:
> On Apr 3, 5:34 pm, "Rod Pemberton" <do_not_h...@notemailnotq.cpm>
> wrote:
>> CISC was
>> typically little-endian to reduce the space needed for integer
>> encodings.

> As long as you discount IBM mainframes. They are big endian. Or
> Burroughs/Unisys; they were big endian too. Or the Motorola 68K; it
> was big endian. For little-endian CISC, only the VAX and x86 come to
> mind. Of those, only the x86 survives.

To me, the only one where little endian seems reasonable is the 6502.
They did amazingly well with a small number of gates.

Note, for one, that on subroutine call the 6502 doesn't push the
address of the next instruction on the stack. That would have taken
too much logic. It pushes the address minus one, as, it seems, that is
what is in the register at the time. RTS adds one after the pop.

Two byte addition is slightly easier in little endian order, but only
slightly. It doesn't help at all for multiply and divide.

VAX was little endian because the PDP-11 was, though I am not sure
that there was a good reason for that.

-- glen
Article: 155071

On Apr 4, 9:10 pm, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
> VAX was little endian because the PDP-11 was, though I am not sure
> that there was a good reason for that.
>
> -- glen

John Savard's take on it:
http://www.plex86.org/Computer_Folklore/Little-Endian-1326.html
Article: 155072

On Apr 5, 12:30 am, rickman <gnu...@gmail.com> wrote:
> On 4/4/2013 5:34 PM, glen herrmannsfeldt wrote:
>
> > In comp.arch.fpga rickman<gnu...@gmail.com> wrote:
>
> > (snip, then I wrote)
>
> >>> Then POLY evaluates a whole polynomial, such as is used to approximate
> >>> many mathematical functions, but again, as I understand it, too slow.
>
> >>> Both the PDP-10 and S/360 have the option for an index register on
> >>> many instructions, where when register 0 is selected no indexing is
> >>> done. VAX instead has indexing as a separate address mode selected by
> >>> the address mode byte. Is that the most efficient use for those bits?
>
> >> I think you have just described the CISC instruction development
> >> concept. Build a new machine, add some new instructions. No big
> >> rationale, no "CISC" concept, just "let's make it better, why not add
> >> some instructions?"
>
> > Yes, but remember that there is competition and each has to have
> > some reason why someone should buy their product. Adding new
> > instructions was one way to do that.
>
> >> I believe if you check you will find the term CISC was not even coined
> >> until after RISC was invented. So CISC really just means, "what we used
> >> to do".
>
> > Well, yes, but why did "we used to do that"? For S/360, a lot of
> > software was still written in pure assembler, for one reason to make
> > it faster, and for another to make it smaller. And people were just
> > starting to learn that people (writing software) are more expensive
> > than machines (hardware). Well, that is about the point that it was
> > true. For earlier machines you were lucky to get one compiler and
> > enough system to run it.
>
> Sure, none of this stuff was done without some purpose. My point is
> that there was no *common* theme to the various CISC instruction sets.
> Everybody was doing their own thing until RISC came along with a basic
> philosophy. Someone felt the need to give a name to the previous way of
> doing things and CISC seemed appropriate. No special meaning in the
> name actually, just a contrast to the "Reduced" in RISC.
>
> I don't think this is a very interesting topic really. It started in
> response to a comment by Rod.
>
> --
>
> Rick

Exactly. The CISC moniker was simply applied to an entire *generation*
of processors by the RISC guys. The term RISC was coined by David
Patterson at the University of California between 1980 and 1984.

http://en.wikipedia.org/wiki/Berkeley_RISC

*Until* that time, there was no need for the term "CISC", because
there was no RISC concept that required a differentiation!

I always thought of the Z80 as a CISC 8-bitter; it has some useful
memory move instructions (LDIR etc).
Article: 155073

On Apr 5, 1:07 am, rickman <gnu...@gmail.com> wrote:
> On 4/4/2013 7:16 AM, Albert van der Horst wrote:
>
> > In article<kjin8q$so...@speranza.aioe.org>,
> > glen herrmannsfeldt<g...@ugcs.caltech.edu> wrote:
> >> In comp.arch.fpga Rod Pemberton<do_not_h...@notemailnotq.cpm> wrote:
> >>> "rickman"<gnu...@gmail.com> wrote in message
> >>> news:kjf48e$5qu$1@dont-email.me...
> >>>> Weren't you the person who brought CISC into this discussion?
>
> >>> Yes.
>
> >>>> Why are you asking this question about CISC?
>
> >>> You mentioned code density. AISI, code density is purely a CISC
> >>> concept. They go together and are effectively inseparable.
>
> >> They do go together, but I am not so sure that they are inseparable.
>
> >> CISC began when much coding was done in pure assembler, and anything
> >> that made that easier was useful. (One should figure out the relative
> >> costs, but at least it was in the right direction.)
>
> > But, of course, this is a fallacy. The same goal is accomplished by
> > macros, and better. Code density is the only valid reason.
>
> I'm pretty sure that conclusion is not correct. If you have an
> instruction that does two or three memory accesses in one instruction
> and you replace it with three instructions that do one memory access
> each, you end up with two extra memory accesses. How is this faster?
>
> That is one of the reasons why I want to increase code density; in my
> machine it automatically improves execution time as well as reducing
> the amount of storage needed.
>
> --
>
> Rick

I think you're on the right track. With FPGAs it's really quite simple
to execute all instructions in a single cycle. It's no big deal at all
- with MPY and DIV being exceptions. In the 'Forth CPU world' even
literals can be loaded in a single cycle. It then comes down to
careful selection of your instruction set. With a small enough
instruction set one can pack more than one instruction in a word - and
there's your code density. If you can pack more than one instruction
in a word, you can execute them in a single clock cycle. With added
complexity, you may even be able to execute them in parallel rather
than as a process.
Article: 155074

On 04/05/2013 09:51 AM, Mark Wills wrote:
>> I'm pretty sure that conclusion is not correct. If you have an
>> instruction that does two or three memory accesses in one instruction
>> and you replace it with three instructions that do one memory access
>> each, you end up with two extra memory accesses. How is this faster?
>>
>> That is one of the reasons why I want to increase code density; in my
>> machine it automatically improves execution time as well as reducing
>> the amount of storage needed.
>
> I think you're on the right track. With FPGAs it's really quite simple
> to execute all instructions in a single cycle. It's no big deal at all
> - with MPY and DIV being exceptions. In the 'Forth CPU world' even
> literals can be loaded in a single cycle. It then comes down to
> careful selection of your instruction set. With a small enough
> instruction set one can pack more than one instruction in a word - and
> there's your code density. If you can pack more than one instruction
> in a word, you can execute them in a single clock cycle. With added
> complexity, you may even be able to execute them in parallel rather
> than as a process.

Multiple instructions per word sounds like a bad idea. It requires
instructions that are so small that they can't do very much, so you
need more of them. And if you need 2 or more small instructions to do
whatever 1 big instruction does, it's better to use 1 big instruction,
since it makes instruction decoding more efficient and simpler.
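
(To make the packing under discussion concrete: a C sketch of decoding
a 16-bit word that holds three 5-bit MISC slots, filled from the top
bit downward. The encoding is invented for illustration and is not any
shipped Forth CPU:)

    #include <stdint.h>
    #include <stdio.h>

    enum { OP_NOP, OP_DUP, OP_ADD, OP_FETCH, OP_STORE, OP_EXIT };

    void run_word(uint16_t w)   /* bit 15 is spare in this made-up format */
    {
        for (int slot = 0; slot < 3; slot++) {
            unsigned op = (w >> (10 - 5 * slot)) & 0x1f;
            printf("slot %d: opcode %u\n", slot, op);
            /* a real core would dispatch here: one slot per clock,
               or, as Mark suggests, all slots in parallel */
        }
    }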