On 3/30/2013 3:20 AM, Arlet Ottens wrote:
> On 03/29/2013 10:00 PM, rickman wrote:
>> I have been working with stack based MISC designs in FPGAs for some
>> years. All along I have been comparing my work to the work of others.
>> These others were the conventional RISC type processors supplied by the
>> FPGA vendors as well as the many processor designs done by individuals
>> or groups as open source.
>>
>> So far my CPUs have always ranked reasonably well in terms of speed, but
>> more importantly to me, very well in terms of size and code density. My
>> efforts have shown it hard to improve on code density by a significant
>> degree while simultaneously minimizing the resources used by the design.
>> Careful selection of the instruction set can both improve code density
>> and minimize logic used if measured together, but there is always a
>> tradeoff. One can always be improved at the expense of the other.
>>
> I once made a CPU design for an FPGA that had multiple stacks. There was
> a general purpose stack "A", two index stacks "X" and "Y", and a return
> stack "R". ALU operations worked between A and any other stack, so they
> only required 2 bits in the opcode. There was also a move instruction
> that could move data from a source to a destination stack.

Let's see, this would be a hybrid really, between register based and
stack based, as it has multiple stacks which must be selected in the
instruction set like registers, just not many.

> Having access to multiple stacks means you spend less time shuffling
> data on the stack. There's no more need for swap, over, rot and similar
> stack manipulation instructions. The only primitive operations you need
> are push and pop.

There is the extra hardware required to implement multiple stacks. Why
have quite so many? I use the return stack for addresses. That works
pretty well. Maybe A, X and R?

> For instance, I had a load instruction that could load from memory using
> the address in the X stack, and push the result on the A stack. The cool
> part is that the X stack itself isn't changed by this operation, so the
> same address can be used multiple times. So, you could do a
>
>   LOAD (X)   ; load from (X) and push on A
>   1          ; push literal on A
>   ADD        ; add top two elements of A
>   STORE (X)  ; pop A, and store in (X)
>
> to increment a location in memory.

But you aren't quite done yet, at least if you care about stack
overflows. Chuck's designs don't care. If he has data on the bottom of
the stack he can just leave it. Otherwise you need to drop the address
on the X stack.

I looked at this when designing my MISC processor. I ended up with two
fetches and two stores. One just does the fetch or store and pops the
address. The other does a post increment and retains the address to be
used in a loop. This one would have to be dropped at the end.

> And if you wanted to increment X to access the next memory location,
> you'd do:
>
>   1          ; push literal on A
>   ADD X      ; pop X, pop A, add, and push result on A.
>   MOVE A, X  ; pop A, and push on X

Add an autoincrement to the index stacks and these three instructions
go away.

> It was an 8 bit architecture with 9 bit instructions (to match the FPGA
> block RAM + parity bit). Having 9 bit instructions allows an 8 bit
> literal push to be encoded in 1 instruction.

I was all about 9 bit instructions and 18 bit address/data bus sizes
until I started working with the iCE40 parts which only have 8 bit
memory. I think the parity bit is a comms industry thing. That one bit
makes a *big* difference in the instruction set capabilities.

> Feel free to e-mail if you want more details.

Thanks.

--
Rick

Article: 155026
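To make the autoincrement suggestion above concrete, here is roughly
what the memory-increment sequence collapses to once the index stack
has a post-increment option. The (X+) mnemonic is invented for
illustration; it is not from either poster's actual instruction set:

  LOAD (X)    ; load from (X) and push on A
  1           ; push literal on A
  ADD         ; add top two elements of A
  STORE (X+)  ; pop A, store in (X), then X <- X + 1

The separate 1 / ADD X / MOVE A, X sequence to bump the address
disappears into the store, at the cost of one extra store opcode and an
incrementer on the top of the X stack.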
On 03/30/2013 10:54 PM, Rod Pemberton wrote: >> >> I guess looking at other peoples designs (such as Chuck's) has >> changed my perspective over the years so that I am willing and >> able to do optimizations in ways I would not have wanted to do >> in the past. But I am a bit surprised that there has been so >> much emphasis on stack oriented MISC machines which it may well >> be that register based MISC designs are also very efficient, >> at least if you aren't building them to service a C compiler or >> trying to match some ideal RISC model. >> > > Are those your actual results or did you just reiterate what is on > Wikipedia? Yes, that's a serious question. Read the MISC page: > http://en.wikipedia.org/wiki/Minimal_instruction_set_computer > > See ... ?! It sounds to me rickman is questioning the (unsupported) claims on wikipedia that stack based machines have an advantage in size and/or simplicity, not reiterating them. > Code density is a CISC concept. I don't see how it applies to > your MISC project. Increasing code density for a MISC processor > means implementing more powerful instructions, i.e., those that do > more work, while minimizing bytes in the instruction opcode > encoding. Even if you implement CISC-like instructions, you can't > forgo the MISC instructions you already have in order to add the > CISC-like instructions. So, to do that, you'll need to increase > the size of the instruction set, as well as implement a more > complicated instruction decoder. I.e., that means the processor > will no longer be MISC, but MISC+minimal CISC hybrid, or pure > CISC... It is perfectly possible to trade one type of MISC processor for another one. The choice between stack and register based is an obvious one. If you switch from stack to register based, there's no need to keep stack manipulation instructions around. > Also, you cross-posted to comp.arch.fpga. While they'll likely be > familiar with FPGAs, most there are not going to be familiar with > the features of stack-based processors or Forth processors that > you discuss indirectly within your post. They might not be > familiar with ancient CISC concepts such as "code density" either, > or understand why it was important at one point in time. E.g., I > suspect this Forth related stuff from above won't be widely > understood on c.a.f. without clarification: The design of simple and compact processors is of great interest to many FPGA engineers. Plenty of FPGA designs need some sort of control processor, and for cost reduction it's important to use minimal resources. Like rickman said, this involves a careful balance between implementation complexity, speed, and code density, while also considering how much work it is to write/maintain the software that's running on the processor. Code density is still critically important. Fast memory is small, both on FPGA as well as general purpose processors.Article: 155027
On 03/31/2013 12:00 AM, rickman wrote:
>> I once made a CPU design for an FPGA that had multiple stacks. There was
>> a general purpose stack "A", two index stacks "X" and "Y", and a return
>> stack "R". ALU operations worked between A and any other stack, so they
>> only required 2 bits in the opcode. There was also a move instruction
>> that could move data from a source to a destination stack.
>
> Let's see, this would be a hybrid really, between register based and
> stack based as it has multiple stacks which must be selected in the
> instruction set like registers, just not many.

Exactly. And as a hybrid, it offers some advantages from both kinds of
designs.

>> Having access to multiple stacks means you spend less time shuffling
>> data on the stack. There's no more need for swap, over, rot and similar
>> stack manipulation instructions. The only primitive operations you need
>> are push and pop.
>
> There is the extra hardware required to implement multiple stacks. Why
> have quite so many? I use the return stack for addresses. That works
> pretty well. Maybe A, X and R?

I had all stacks implemented in the same block RAM, just using different
sections of it. But you are right, in my implementation I had reserved
room for the Y stack, but never really implemented it. Just using the X
was sufficient for the application I needed the CPU for.

Of course, for other applications, having an extra register/stack that
you can use as a memory pointer could be useful, so I left the Y
register in the instruction encoding. For 3 registers you need 2 bits
anyway, so it makes sense to allow for 4.

> But you aren't quite done yet, at least if you care about stack
> overflows. Chuck's designs don't care. If he has data on the bottom of
> the stack he can just leave it. Otherwise you need to drop the address
> on the X stack.

Correct, if you no longer need the address, you need to drop it from the
stack. On the other hand, if you let it drop automatically, and you need
it twice, you would have to dup it. Intuitively, I would say that in
inner loops it would be more common that you'd want to reuse an address
(possibly with offset or autoinc/dec).

> I was all about 9 bit instructions and 18 bit address/data bus sizes
> until I started working with the iCE40 parts which only have 8 bit
> memory. I think the parity bit is a comms industry thing. That one bit
> makes a *big* difference in the instruction set capabilities.

Agreed. As soon as you use the parity bits, you're tied to a certain
architecture. On the other hand, if you've already chosen a certain
architecture, and the parity bits are available for free, it can be very
advantageous to use them.

Article: 155028
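A minimal sketch of the shared-block-RAM arrangement described above,
assuming four stacks of 64 entries each packed into one 256 x 9 RAM.
The entity name, widths, and ports are invented for illustration, and a
real design would also keep the top of each stack in a register rather
than re-reading the RAM every cycle:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity quad_stack is
    port (
        clk     : in  std_logic;
        stk_sel : in  unsigned(1 downto 0);   -- 00=A, 01=X, 10=Y, 11=R
        push    : in  std_logic;
        pop     : in  std_logic;
        wr_data : in  std_logic_vector(8 downto 0);
        rd_data : out std_logic_vector(8 downto 0)
    );
end quad_stack;

architecture rtl of quad_stack is
    -- one RAM holds all four stacks; the two stack-select bits form the
    -- upper address bits, giving each stack its own 64-entry region
    type ram_t is array (0 to 255) of std_logic_vector(8 downto 0);
    signal ram : ram_t := (others => (others => '0'));

    -- one pointer per stack, each pointing at that stack's current top
    type ptr_array_t is array (0 to 3) of unsigned(5 downto 0);
    signal sp : ptr_array_t := (others => (others => '0'));
begin
    process(clk)
        variable s : integer range 0 to 3;
    begin
        if rising_edge(clk) then
            s := to_integer(stk_sel);
            if push = '1' then
                -- a push writes the next cell up and advances the pointer
                ram(to_integer(stk_sel & (sp(s) + 1))) <= wr_data;
                sp(s) <= sp(s) + 1;
            elsif pop = '1' then
                sp(s) <= sp(s) - 1;
            end if;
            -- registered read of the selected stack's top
            -- (no overflow/underflow checking in this sketch)
            rd_data <= ram(to_integer(stk_sel & sp(s)));
        end if;
    end process;
end rtl;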
In comp.arch.fpga rickman <gnuarm@gmail.com> wrote: (snip) >> Well, much of the idea of RISC is that code density isn't very >> important, and that many of the more complicated instructions made >> assembly language programming easier, but compilers didn't use them. > I am somewhat familiar with VLIW. I am also very familiar with > microcode which is the extreme VLIW. I have coded microcode for an I/O > processor on an attached array processor. That's like saying I coded a > DMA controller in a DSP chip, but before DSP chips were around. Well, yes, VLIW is just about like compiling from source directly to microcode. The currently in production VLIW processor is Itanium. I believe 128 bit wide instructions, which specify many different operations that can happen at the same time. 128 general and 128 floating point registers, 128 bit wide data bus, but it uses a lot of power. Something like 100W, with Icc of 100A and Vcc about 1V. I have an actual RX2600 dual Itanium box, but don't run it very often, mostly because of the power used. (snip) >> But every level of logic adds delay. Using a wide bus to fast memory >> is more efficient that a complicated decoder. But sometimes RISC went >> too far. In early RISC, there was the idea of one cycle per instruction. >> They couldn't do that for multiply, so they added multiply-step, an >> instruction that you execute many times for each multiply operation. >> (And maybe no divide at all.) > I"m not sure what your point is. What part of this is "too far"? This > is exactly the type of design I am doing, but to a greater extent. The early SPARC didn't have a multiply instruction. (Since it couldn't be done in one cycle.) Instead, they did multiply, at least the Sun machines did, through software emulation, with a software trap. Early when I started using a Sun4/110 I generated a whole set of fonts for TeX from Metafont source, which does a lot of multiply. Unix (SunOS) keeps track of user time (what your program does) and system time (what the OS does while executing your program). Multiply counted as system time, and could be a large fraction of the total time. >> For VLIW, a very wide instruction word allows for specifying many >> different operations at the same time. It relies on complicated >> compilers to optimally pack the multiple operations into the >> instruction stream. > Yes, in theory that is what VLIW is. This is just one step removed from > microcode where the only limitation to how parallel operations can be is > the data/address paths themselves. The primary application of VLIW I > have seen is in the TI 6000 series DSP chips. But in reality this is > not what I consider VLIW. This design uses eight CPUs which are mostly > similar, but not quite identical with two sets of four CPU units sharing > a register file, IIRC. In reality each of the eight CPUs gets its own > 32 bit instruction stream. They all operate in lock step, but you can't > do eight FIR filters. I think of the four in a set, two are set up to > do full math, etc and two are able to generate addresses. So this is > really two CPUs with dual MACs and two address generators as it ends up > being used most of the time. But then they make it run at clocks of > over 1 GHz so it is a damn fast DSP and handles most of the cell phone > calls as part of the base station. For Itanium, the different units do different things. There are instruction formats that divide up the bits in different ways to make optimal use of the bits. 
I used to have the manual nearby, but I don't see it right now.

(snip)

> That is a result of the heavy pipelining that is being done. I like to
> say my design is not pipelined, but someone here finally convinced me
> that my design *is* pipelined with the execution in parallel with the
> next instruction fetch, but there is never a need to stall or flush
> because there are no conflicts.

Yes, that counts, but it gets much more interesting with pipelines like
the Cray-1 that, after some cycles of latency, generate one result every
clock cycle.

> I want a design to be fast, but not at the expense of complexity. That
> is one way I think like Chuck Moore. Keep it simple and that gives you
> speed.

(snip)

> In Forth speak a POP would be a DROP. That is not often used in Forth
> really, or in my apps. I just wrote some code for my stack CPU and I
> think there were maybe two DROPs in just over 100 instructions. I am
> talking about the DUPs, SWAPs, OVERs and such. They end up being needed
> enough that it makes the register design look good... at least at first
> blush. I am looking at how to organize a register based instruction set
> without expanding the size of the instructions. I'm realizing that is
> one issue with registers, you have to specify them. But I don't need to
> make the machine totally general like the goal for RISC. That is so
> writing compilers is easier. I don't need to consider that.

For x87, they avoid the DUP, SWAP, and such by instructions being able
to specify any register in the stack. You can, for example, add any
stack register (there are eight) to the top of stack. I haven't thought
about it for a while, but I believe either pushing the result, or
replacing the previous top of the stack.

(snip)

>> Well, the stack design has the advantage that you can use instruction
>> bits either for a memory address or for the operation, allowing for much
>> smaller instructions. But that only works as long as everything is in
>> the right place on the stack.
>
> Yes, it is a tradeoff between instruction size and the number of ops
> needed to get a job done. I'm looking at trimming the instruction size
> down to give a workable subset for register operations.

Seems to me that one possibility is to have a really functional stack
operation instruction, such that, with the given number of bits, it
allows for the most operations. Some combination of DUP, SWAP, and POP
all at once. Though that isn't easy for the stack itself.

-- glen

Article: 155029
In the following forum thread, member Alex reports some odd behavior
when using ISIM to simulate some code that is part of an FPGA emulation
of an arcade game.

http://forum.gadgetfactory.net/index.php?/topic/1544-isim-is-driving-me-crazy/#entry10072

Since the code is likely a literal translation of the arcade game
schematic, some bits from a 5-bit slv counter are used as the async
reset and clock in a process. Don't be distracted by this
non-synchronous design practice, as this isn't the issue. The issue is
that the rising_edge() function wasn't working correctly.

After some investigation and simulation, it appears that in ISIM the
'last_value attribute isn't being properly evaluated *inside* the
function when the argument to rising_edge() is a bit from a
std_logic_vector. If the counter bit, counter(1), that is used as the
clock is assigned to the std_logic signal, counter_1, then the
'last_value attribute is evaluated properly in the rising_edge()
function call. To test this, a 2nd process is created using the
counter_1 signal as the clock. The rising_edge function source is
copied from the Xilinx library and modified to report the status of the
incoming signal.

Here is the test bench which illustrates the problem.

********** BEGIN CODE ***********

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;

entity test_tb is
end test_tb;

architecture behavior of test_tb is

    function rising_edge (signal s : std_ulogic) return boolean is
    begin
        report "s'event=" & boolean'image(s'event) &
               " s=" & std_ulogic'image(s) &
               " s'last_value=" & std_ulogic'image(s'last_value);
        return (s'event and (to_x01(s) = '1') and
                (to_x01(s'last_value) = '0'));
    end;

    signal output    : std_logic := '0';
    signal counter   : std_logic_vector(4 downto 0) := (others => '0');
    constant period  : time := 20 ns;
    signal counter_1 : std_logic;
    signal output_1  : std_logic;

begin

    process
        variable counter_i : std_logic_vector(4 downto 0) := (others => '0');
    begin
        wait for period;
        counter <= counter+1;
    end process;

    process(counter(1), counter(4))
    begin
        report "counter(1)'event=" & boolean'image(counter(1)'event) &
               " counter(1)=" & std_ulogic'image(counter(1)) &
               " counter(1)'last_value=" & std_ulogic'image(counter(1)'last_value);
        if (counter(4) = '0') then
            output <= '0';
        elsif rising_edge(counter(1)) then
            output <= (not counter(3)) and counter(2);
        end if;
    end process;

    counter_1 <= counter(1);

    process(counter_1, counter(4))
    begin
        report "counter_1'event=" & boolean'image(counter_1'event) &
               " counter_1=" & std_ulogic'image(counter_1) &
               " counter_1'last_value=" & std_ulogic'image(counter_1'last_value);
        if (counter(4) = '0') then
            output_1 <= '0';
        elsif rising_edge(counter_1) then
            output_1 <= (not counter(3)) and counter(2);
        end if;
    end process;

end;

********** END CODE ***********

Here are the reports from the ISIM log for the beginning of the time
period when counter(4) is '1' and the async reset in the process is not
asserted. As you can see, the 'last_value attribute value is incorrect
for all rising_edge(counter(1)) calls but correct for
rising_edge(counter_1). The code runs correctly in ModelSim 10.1,
producing identical results for both function calls.

********** BEGIN ISIM LOG ***********

at 320 ns(1): Note: counter(1)'event=true counter(1)='0' counter(1)'last_value='1' (/test_tb/).
at 320 ns(1): Note: s'event=true s='0' s'last_value='0' (/test_tb/).
at 320 ns(1): Note: counter_1'event=false counter_1='1' counter_1'last_value='0' (/test_tb/).
at 320 ns(1): Note: s'event=false s='1' s'last_value='0' (/test_tb/).
at 320 ns(2): Note: counter_1'event=true counter_1='0' counter_1'last_value='1' (/test_tb/).
at 320 ns(2): Note: s'event=true s='0' s'last_value='1' (/test_tb/).
at 360 ns(1): Note: counter(1)'event=true counter(1)='1' counter(1)'last_value='0' (/test_tb/).
at 360 ns(1): Note: s'event=true s='1' s'last_value='1' (/test_tb/).
at 360 ns(2): Note: counter_1'event=true counter_1='1' counter_1'last_value='0' (/test_tb/).
at 360 ns(2): Note: s'event=true s='1' s'last_value='0' (/test_tb/).
at 400 ns(1): Note: counter(1)'event=true counter(1)='0' counter(1)'last_value='1' (/test_tb/).
at 400 ns(1): Note: s'event=true s='0' s'last_value='1' (/test_tb/).
at 400 ns(2): Note: counter_1'event=true counter_1='0' counter_1'last_value='1' (/test_tb/).
at 400 ns(2): Note: s'event=true s='0' s'last_value='1' (/test_tb/).

********** END ISIM LOG ***********

Any ideas? Is this a legitimate bug or a hole in my understanding of
VHDL and simulation in general?

Urbite

Article: 155030
glen herrmannsfeldt wrote:
> Seems to me that much of the design of VAX was to improve code
> density when main memory was still a very significant part of the
> cost of a machine.

The original VAX series (780, then 730 and 750) and the uVAX I and
uVAX II had no cache, so reducing the rate of memory fetches was also
important. The first cached machine I used was the uVAX III (KA650) and
a good part of the performance increase was due to the cache.

Jon

Article: 155031
On 3/25/2013 7:04 PM, Mike Butts wrote:
> I've got a 20-year-old Xilinx XC3020 development board. I think it
> would be fun to fire it up and bring it to the 20th anniversary FCCM
> in Seattle next month. (http://fccm.org/2013/)
>
> I don't see XC3000-series supported on even the oldest archived ISE at
> xilinx.com. Anyone know where I can find some tools for this old chip?
> It has 64 CLBs and 256 flip-flops! Maybe one of you folks at Xilinx?
> Thanks!
>
> --Mike

Is this a 'plain' XC3020 or the XC3020A or 'L'? I have Foundation
Express 1.5 loaded up on a Win 98 VM running under VMware Workstation 8
on a Win 7 host. Just for 'fun' I coded up an 8-bit counter and was able
to push the design all the way through the flow and generate a bit file.
Unfortunately F1.5 doesn't support the original XC3000 family, only the
later 'A' and 'L' revisions.

That's too bad because I just found several XC3042's, 3020's, 2018's,
2064's, and some 95108's in my collection. What kind of retro project
could be built with that collection of logic?

I'm certain that I have earlier versions of the Xilinx tools somewhere
that will support that old stuff. All of the licenses back then were
locked to a hard drive serial number, which is easy to change when
moving to a new PC - real or virtual.

--Paul

Article: 155032
"Arlet Ottens" <usenet+5@c-scape.nl> wrote in message news:5157e1a1$0$6924$e4fe514c@news2.news.xs4all.nl... > On 03/30/2013 10:54 PM, Rod Pemberton wrote: > >> > >> I guess looking at other peoples designs (such as Chuck's) > >> has changed my perspective over the years so that I am > >> willing and able to do optimizations in ways I would not have > >> wanted to do in the past. But I am a bit surprised that there > >> has been so much emphasis on stack oriented MISC machines > >> which it may well be that register based MISC designs are > >> also very efficient, at least if you aren't building them to > >> service a C compiler or trying to match some ideal RISC > >> model. > >> > > > > Are those your actual results or did you just reiterate what > > is on Wikipedia? Yes, that's a serious question. Read the > > MISC page: [link] > > > > See ... ?! > > It sounds to me rickman is questioning the (unsupported) claims > on wikipedia that stack based machines have an advantage in size > and/or simplicity, not reiterating them. > Well, you snipped alot of context. I wouldn't have reformatted all of it either. Anyway, what I took from rickman's statements was that he had only managed to confirm what is already known about MISC according to Wikipedia. I.e., what was/is the point? > > Code density is a CISC concept. I don't see how it applies to > > your MISC project. Increasing code density for a MISC > > processor means implementing more powerful instructions, > > i.e., those that do more work, while minimizing bytes in the > > instruction opcode encoding. Even if you implement > > CISC-like instructions, you can't forgo the MISC instructions > > you already have in order to add the CISC-like instructions. > > So, to do that, you'll need to increase the size of the > > instruction set, as well as implement a more complicated > > instruction decoder. I.e., that means the processor will no > > longer be MISC, but MISC+minimal CISC hybrid, or pure > > CISC... > > It is perfectly possible to trade one type of MISC processor for > another one. The choice between stack and register based is an > obvious one. If you switch from stack to register based, there's > no need to keep stack manipulation instructions around. > Yes, true. But, what does eliminating MISC stack instructions and replacing them with MISC register instructions have to do with the CISC concept of code density or with CISC instructions? ... > > Also, you cross-posted to comp.arch.fpga. While they'll > > likely be familiar with FPGAs, most there are not going > > to be familiar with the features of stack-based processors > > or Forth processors that you discuss indirectly within your > > post. They might not be familiar with ancient CISC > > concepts such as "code density" either, or understand why > > it was important at one point in time. E.g., I suspect this > > Forth related stuff from above won't be widely > > understood on c.a.f. without clarification: > > The design of simple and compact processors is of great interest > to many FPGA engineers. Plenty of FPGA designs need some > sort of control processor, and for cost reduction it's important > to use minimal resources. Like rickman said, this involves a > careful balance between implementation complexity, speed, > and code density, while also considering how much work it > is to write/maintain the software that's running on the > processor. > > Code density is still critically important. Fast memory is > small, both on FPGA as well as general purpose processors. 
> So, shouldn't he dump the entire MISC instruction set he has, and implement a CISC instruction set instead? That's the only way he's going to get the "critically important" code density, which I'll take it you rank well above MISC as being important. Of course, a CISC instruction set requires a more complicated instruction decoder ... So, it seems that either way he proceeds, he is being contrained by the "minimal resources" of his FPGA. That was what he stated: "My efforts have shown it hard to improve on code density by a significant degree while simultaneously minimizing the resources used by the design." I.e., if the FPGA he is attempting to use is insufficient to do what he wants or needs, then it's insufficient. Or, he needs some new techniques. He didn't explicitly ask for any though ... Glen mentioned the numerous address modes of the VAX. The 6502 also had alot of address modes and had instructions which used zero-page as a set of fast registers. I would think that early, compact designs like the 6502 and Z80 could be useful to rickman. They had low transistor counts. Z80 8500 transistors 6502 9000 transistors 8088 29000 transistors (for comparison...) Rod PembertonArticle: 155033
In comp.arch.fpga Rod Pemberton <do_not_have@notemailnotq.cpm> wrote: (snip) > Well, you snipped alot of context. I wouldn't have reformatted > all of it either. > Anyway, what I took from rickman's statements was that he had only > managed to confirm what is already known about MISC according to > Wikipedia. I.e., what was/is the point? (snip, someone wrote) >> It is perfectly possible to trade one type of MISC processor for >> another one. The choice between stack and register based is an >> obvious one. If you switch from stack to register based, there's >> no need to keep stack manipulation instructions around. > Yes, true. But, what does eliminating MISC stack instructions and > replacing them with MISC register instructions have to do with the > CISC concept of code density or with CISC instructions? ... Seems to me that one could still Huffman code the opcode, even within the MISC concept. That is, use fewer bits for more common operations, or where it otherwise simplifies the result. As someone noted, you can have an N-1 bit load immediate instruction where N is the instruction size. In one of Knuth's TAOCP books he describes a two instruction computer. Seems like if one is interested in MISC, someone should build that one. Also, maybe a MIX and MMIX machine, maybe the decimal version. For those who don't read TAOCP, MIX is defined independent of the underlying base. Programs in MIXAL are supposed to assemble and run correctly on hosts that use any base within the range specified. Instruction bytes have, if I remember, between 64 and 100 possible values, such that six bits or two decimal digits are possible representations. I believe that allows for bases 2, 3, 4, 8, 9, 10, and 16. -- glenArticle: 155034
On 04/02/2013 02:10 AM, Rod Pemberton wrote: > Yes, true. But, what does eliminating MISC stack instructions and > replacing them with MISC register instructions have to do with the > CISC concept of code density or with CISC instructions? ... I don't think 'code density' is a CISC concept at all. Code density applies to any kind of instruction encoding. Any kind of MISC architecture will have a certain code density (for a given application), and require a certain amount of FPGA resources. And if you store the code in local FPGA memory, and you know your application, you can even convert it all to FPGA resources. Obviously, one MISC design will have better resource use than another, and one of the questions is whether we can make any kind of general statement about stack vs register implementation. > So, shouldn't he dump the entire MISC instruction set he has, and > implement a CISC instruction set instead? That's the only way > he's going to get the "critically important" code density, which > I'll take it you rank well above MISC as being important. Of > course, a CISC instruction set requires a more complicated > instruction decoder ... So, it seems that either way he proceeds, > he is being contrained by the "minimal resources" of his FPGA. Exactly. If you want to get FPGA resources low, a MISC design seems most appropriate. But within the MISC concept, there are still plenty of design decisions left. Really, the biggest problem with FPGA CPU design is that there are too many design decisions, and each decision influences everything else. > Glen mentioned the numerous address modes of the VAX. The 6502 > also had alot of address modes and had instructions which used > zero-page as a set of fast registers. I would think that early, > compact designs like the 6502 and Z80 could be useful to rickman. > They had low transistor counts. > > Z80 8500 transistors > 6502 9000 transistors > 8088 29000 transistors (for comparison...) Low transistor counts do not necessarily translate to low FPGA resources. Early CPUs used dynamic storage, dual clock latches, pass logic and tri-state buses to create really small designs that don't necessarily map well to FPGAs. On the other hand, FPGAs have specific features (depending on the brand) that can be exploited to create really tight designs. Also, these processors are slow, using multi cycle instructions, and 8 bit operations. That may not be acceptable. And even if low performance is acceptable, there are numerous other ways where you can trade speed for code density, so you'd have to consider these too. For instance, I can replace pipelining with multi-cycle execution, or use microcode.Article: 155035
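As an illustration of the last point above, a two-state fetch/execute
machine trades half the instruction rate for a much simpler datapath:
no pipeline registers, no hazard or flush logic. This is a generic
sketch, not any particular poster's design; the entity name, 9-bit
instruction word, and the single decoded opcode are all assumptions:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity multicycle_demo is
    port (clk : in std_logic);
end multicycle_demo;

architecture rtl of multicycle_demo is
    type state_t is (FETCH, EXECUTE);
    signal state : state_t := FETCH;
    type rom_t is array (0 to 255) of std_logic_vector(8 downto 0);
    signal program_mem : rom_t := (others => (others => '0'));
    signal pc    : unsigned(7 downto 0) := (others => '0');
    signal instr : std_logic_vector(8 downto 0) := (others => '0');
    signal acc   : unsigned(7 downto 0) := (others => '0');
begin
    process(clk)
    begin
        if rising_edge(clk) then
            case state is
                when FETCH =>
                    -- the fetch gets its own cycle, so the memory's
                    -- registered output needs no bypass or stall logic
                    instr <= program_mem(to_integer(pc));
                    pc    <= pc + 1;
                    state <= EXECUTE;
                when EXECUTE =>
                    -- decode and execute in the second cycle; only one
                    -- illustrative opcode (load literal) is shown
                    if instr(8) = '1' then
                        acc <= unsigned(instr(7 downto 0));
                    end if;
                    state <= FETCH;
            end case;
        end if;
    end process;
end rtl;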
On 3/31/2013 2:34 PM, glen herrmannsfeldt wrote: > In comp.arch.fpga rickman<gnuarm@gmail.com> wrote: (snip) > > I have an actual RX2600 dual Itanium box, but don't run it very > often, mostly because of the power used. A 100 watt lightbulb uses about $10 a month if left on 24/7. So I wouldn't worry too much about the cost of running your Itanium. If you turn it off when you aren't using it I can't imagine it would really cost anything noticeable to run. > For Itanium, the different units do different things. There are > instruction formats that divide up the bits in different ways to make > optimal use of the bits. I used to have the manual nearby, but I don't > see it right now. Yes, the array processor I worked on was coded from scratch, very laboriously. The Itanium is trying to run existing code as fast as possible. So they have a number of units to do similar things, but also different types, all working in parallel as much as possible. Also the parallelism in the array processor was all controlled by the programmer. In regular x86 processors the parallelism is controlled by the chip itself. I'm amazed sometimes at just how much they can get the chip to do, no wonder there are 100's of millions of transistors on the thing. I assume parallelism in the Itanium is back to the compiler smarts to control since it needs to be coded into the VLIW instructions. > For x87, they avoid the DUP, SWAP, and such by instructions being able > to specify any register in the stack. You can, for example, add any > stack register (there are eight) to the top of stack. I haven't thought > about it for a while, but I believe either pushing the result, or > replacing the previous top of the stack. That is something I have not yet looked at, a hybrid approach with a stack, but also with stack top relative addressing. It is important to keep the hardware simple, but that might not be too hard to implement. Arlet talked about his hybrid processor design with multiple stacks. I would need to give this a bit of thought as the question becomes, what possible advantage would a hybrid processor have over the other two? Actually, the image in my mind is a bit like the C language stack frame model. You can work with addressing relative to the top of stack and when a sub is called it layers its variables on top of the existing stack. That would require a bit of entry and exit code, so there will be a tradeoff between simple register addressing and simple subroutine entry. > Seems to me that one possibility is to have a really functional stack > operation instruction, such that, with the give number of bits, it > allows for the most opertions. Some combination of DUP, SWAP, and POP > all at once. Though that isn't easy for the stack itself. Like many tradeoffs, this can complicate the hardware. If you want to be able to combine an ADD with a SWAP or OVER, you need to be able to access more than two operands on the stack at once. In my designs that means pulling another register out of the proper stack. Rather than one register and a block of memory, each stack would need two registers and the block of memory along with the input multiplexers. This would need to be examined in context of common code to see how much it would help vs the expense of added hardware. I'm still not complete in my analysis of a register based MISC design, but at the moment I think the register approach gives better instruction size efficiency/faster execution as well as simpler hardware design. -- RickArticle: 155036
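The "two registers plus a block of memory" arrangement mentioned above
is worth sketching, because it is what lets an ALU operation be fused
with a stack shuffle. The top two cells live in registers T and N so the
ALU sees both at once; deeper cells sit in a small RAM. The opcode
encoding, widths, and entity name are assumptions for illustration only:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity two_reg_stack is
    port (
        clk : in std_logic;
        op  : in std_logic_vector(1 downto 0);  -- 00 push, 01 add, 10 over-add (assumed)
        din : in unsigned(8 downto 0)
    );
end two_reg_stack;

architecture rtl of two_reg_stack is
    -- top two stack cells in registers, the rest in a small RAM
    signal t, n   : unsigned(8 downto 0) := (others => '0');
    type ram_t is array (0 to 31) of unsigned(8 downto 0);
    signal deeper : ram_t := (others => (others => '0'));
    signal sp     : unsigned(4 downto 0) := (others => '0');
begin
    process(clk)
    begin
        if rising_edge(clk) then
            case op is
                when "00" =>                        -- push: N spills into RAM
                    deeper(to_integer(sp)) <= n;
                    sp <= sp + 1;
                    n  <= t;
                    t  <= din;
                when "01" =>                        -- plain ADD: depth shrinks by one
                    t  <= t + n;
                    n  <= deeper(to_integer(sp - 1));
                    sp <= sp - 1;
                when "10" =>                        -- fused OVER + ADD: keep the second
                    t  <= t + n;                    -- operand, depth unchanged, no RAM access
                when others =>
                    null;                           -- no over/underflow checks in this sketch
            end case;
        end if;
    end process;
end rtl;

With both operands already in flip-flops, the fused OVER-plus-ADD needs
no RAM traffic at all, while the plain ADD has to refill N from the RAM.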
I am working on a project that will involve a large HDMI switch - up to
16 inputs and 16 outputs. We haven't yet decided on the architecture,
but one possibility is to use one or more FPGAs. The FPGAs won't be
doing much other than the switch - there is no video processing going
on. Each HDMI channel will be up to 3.4 Gbps (for HDMI 1.4), with 4
TMDS pairs (3 data and 1 clock). That means 64 pairs in, and 64 pairs
out, all at 3.4 Gbps.

Does anyone know of any FPGA families that might be suitable here? I've
had a little look at Altera (since I've used Altera devices before),
but their low-cost transceivers are at 3.125 Gbps - this means we'd
have to use their mid or high cost devices, and they don't have nearly
enough channels. I don't expect the card to be particularly cheap, but
I'd like to avoid the cost of multiple top-range FPGA devices - then it
would be much cheaper just to have a card with 80 4-to-1 HDMI mux chips.

Thanks for any pointers,
David

Article: 155037
On 3/30/2013 5:54 PM, Rod Pemberton wrote: > "rickman"<gnuarm@gmail.com> wrote in message > news:kj4vae$msi$1@dont-email.me... > >> I have been working with stack based MISC designs in FPGAs for >> some years. All along I have been comparing my work to the work >> of others. These others were the conventional RISC type >> processors supplied by the FPGA vendors as well as the many >> processor designs done by individuals or groups as open source. >> >> So far my CPUs have always ranked reasonably well in terms of >> speed, but more importantly to me, very well in terms of size >> and code density. My efforts have shown it hard to improve on >> code density by a significant degree while simultaneously >> minimizing the resources used by the design. Careful selection >> of the instruction set can both improve code density and >> minimize logic used if measured together, but there is always a >> tradeoff. One can always be improved at the expense of the >> other. >> >> The last couple of days I was looking at some code I plan to use >> and realized that it could be a lot more efficient if I could >> find a way to use more parallelism inside the CPU and use fewer >> instructions. So I started looking at defining separate opcodes >> for the two primary function units in the design, the data stack >> and the return stack. Each has its own ALU. The data stack has >> a full complement of capabilities while the return stack can >> only add, subtract and compare. The return stack is actually >> intended to be an "address" processing unit. >> >> While trying to figure out how to maximize the parallel >> capabilities of these units, I realized that many operations >> were just stack manipulations. Then I read the thread about the >> relative "cost" of stack ops vs memory accesses and I realized >> these were what I needed to optimize. I needed to find a way to >> not use an instruction and a clock cycle for moving data around >> on the stack. >> >> In the thread on stack ops it was pointed out repeatedly that >> very often the stack operands would be optimized to register >> operands, meaning they wouldn't need to do the stack ops at all >> really. So I took a look at a register based MISC design. >> Guess what, I don't see the disadvantage! I have pushed this >> around for a couple of days and although I haven't done a >> detailed design, I think I have looked at it enough to realize >> that I can design a register oriented MISC CPU that will run as >> fast, if not faster than my stack based design and it will use >> fewer instructions. I still need to add some features like >> support for a stack in memory, in other words, >> pre-increment/post-decrement (or the other way around...), but I >> don't see where this is a bad design. It may end up using >> *less* logic as well. My stack design provides access to the >> stack pointers which require logic for both the pointers and >> muxing them into the data stack for reading. >> >> I guess looking at other peoples designs (such as Chuck's) has >> changed my perspective over the years so that I am willing and >> able to do optimizations in ways I would not have wanted to do >> in the past. But I am a bit surprised that there has been so >> much emphasis on stack oriented MISC machines which it may well >> be that register based MISC designs are also very efficient, >> at least if you aren't building them to service a C compiler or >> trying to match some ideal RISC model. 
>> > > Are those your actual results or did you just reiterate what is on > Wikipedia? Yes, that's a serious question. Read the MISC page: > http://en.wikipedia.org/wiki/Minimal_instruction_set_computer > > See ... ?! You clearly don't understand my post or the Wiki article or possibly both. I suggest you reread both and then ask questions if you still don't understand. > Code density is a CISC concept. I don't see how it applies to > your MISC project. Yes, I get that you don't understand. Do you have a specific question? > Increasing code density for a MISC processor > means implementing more powerful instructions, i.e., those that do > more work, while minimizing bytes in the instruction opcode > encoding. Yes, that part you seem to understand. > Even if you implement CISC-like instructions, you can't > forgo the MISC instructions you already have in order to add the > CISC-like instructions. Really? I can't drop the entire instruction set and start over? Who says so? Am I breaking a law? > So, to do that, you'll need to increase > the size of the instruction set, as well as implement a more > complicated instruction decoder. Define "increase the size of the instruction set". I am using a 9 bit opcode for my stack design and am using a similar 9 bit opcode for the register design. In what way is the register design using a larger instruction set? That was exactly the blinder I was wearing until now. I had read a lot about register CPU instruction sets where the intent was not in line with MISC goals. MicroBlaze is a good example. I think it uses well in excess of 1000 LUTs, maybe multiple 1000's. I need something that is much smaller and it hadn't occurred to me that perhaps the common goals of register designs (lots of registers, orthogonality, address mode flexibility, etc) could be limited or even tossed out the window. I don't need a machine that is easy for a C compiler to produce code for. My goals are for minimal hardware without losing any more performance than is essential. In particular I work in FPGAs, so the design needs to work well in that environment. > I.e., that means the processor > will no longer be MISC, but MISC+minimal CISC hybrid, or pure > CISC... Nonsense. "Minimal Instruction Set Computer (MISC) is a processor architecture with a very small number of basic operations and corresponding opcodes." from Wikipedia. BTW, I don't think the term MISC is widely used and is not well defined. This is the only web page I found that even tries to define it. Actually, if you consider only the opcodes and not the operand combinations, I think the register design may have fewer instructions than does the stack design. But the register design still is in work so I'm not done counting yet. There are some interesting corners to be explored. For example a MOV rx,rx is essentially a NOP. There are eight of these. So instead, why not make them useful by clearing the register? So MOV rx,rx is a clear to be given the name CLR rx... unless the rx is r7 in which case it is indeed a MOV r7,r7 which is now a NOP to be coded as such. The CLR r7 is not needed because LIT 0 already does the same job. Even better the opcode for a NOP is 0x1FF or octal 777. That's very easy to remember and recognize. It feels good to find convenient features like this and makes me like the register MISC design. > No offense, but you seem to be "reinventing the wheel" in terms of > microprocessor design. 
You're coming to the same conclusions that > were found in the 1980's, e.g., concluding a register based > machine can perform better than a stack based machine, except > you've applied it to MISC in an FPGA package... How is that a new > conclusion? I don't see how you can say that. I don't know of other MISC designs that are very good. I think the picoBlaze is one, but I don't think it was designed to be very MISC really. It was designed to be small, period. It has a lot of fixed features and so can't be altered without tossing old code. Some of the MicroChip devices might be MISC. I have not worked with them. I might want to take a look actually. There may be useful ideas. But "reinventing the wheel"? I don't think so. -- RickArticle: 155038
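A hedged sketch of the MOV rx,rx aliasing described above. The 3-bit
field layout, register width, and MOV opcode value are assumptions,
chosen only so that MOV r7,r7 comes out as octal 777; they are not the
actual encoding under discussion:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mov_decode_demo is
    port (
        clk   : in std_logic;
        instr : in std_logic_vector(8 downto 0)  -- 9-bit opcode, layout assumed
    );
end mov_decode_demo;

architecture rtl of mov_decode_demo is
    -- register width of 16 bits is assumed for illustration
    type regfile_t is array (0 to 7) of std_logic_vector(15 downto 0);
    signal regs : regfile_t := (others => (others => '0'));
    -- assumed layout: instr(8..6) = operation, instr(5..3) = dst, instr(2..0) = src
    constant OP_MOV : std_logic_vector(2 downto 0) := "111";
begin
    process(clk)
        variable dst, src : integer range 0 to 7;
    begin
        if rising_edge(clk) then
            dst := to_integer(unsigned(instr(5 downto 3)));
            src := to_integer(unsigned(instr(2 downto 0)));
            if instr(8 downto 6) = OP_MOV then
                if dst /= src then
                    regs(dst) <= regs(src);        -- ordinary MOV rx,ry
                elsif dst /= 7 then
                    regs(dst) <= (others => '0');  -- MOV rx,rx reused as CLR rx
                end if;                            -- MOV r7,r7 (octal 777) is the NOP
            end if;
        end if;
    end process;
end rtl;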
In comp.arch.fpga rickman <gnuarm@gmail.com> wrote: > On 3/31/2013 2:34 PM, glen herrmannsfeldt wrote: >> In comp.arch.fpga rickman<gnuarm@gmail.com> wrote: > (snip) >> I have an actual RX2600 dual Itanium box, but don't run it very >> often, mostly because of the power used. > A 100 watt lightbulb uses about $10 a month if left on 24/7. So I > wouldn't worry too much about the cost of running your Itanium. If you > turn it off when you aren't using it I can't imagine it would really > cost anything noticeable to run. It is a dual processor box, plus all the rest of the systems in the box. Yes, I was considering running it all the time, but it is too expensive for that. >> For Itanium, the different units do different things. There are >> instruction formats that divide up the bits in different ways to make >> optimal use of the bits. I used to have the manual nearby, but I don't >> see it right now. > Yes, the array processor I worked on was coded from scratch, very > laboriously. The Itanium is trying to run existing code as fast as > possible. So they have a number of units to do similar things, but also > different types, all working in parallel as much as possible. Also the > parallelism in the array processor was all controlled by the programmer. > In regular x86 processors the parallelism is controlled by the chip > itself. I'm amazed sometimes at just how much they can get the chip to > do, no wonder there are 100's of millions of transistors on the thing. > I assume parallelism in the Itanium is back to the compiler smarts to > control since it needs to be coded into the VLIW instructions. Seems to me that the big problem with the original Itanium was the need to also run x86 code. That delayed the release for some time, and in that time other processors had advanced. I believe that later versions run x86 code in software emulation, maybe with some hardware assist. -- glenArticle: 155039
On 4/1/2013 8:10 PM, Rod Pemberton wrote: > "Arlet Ottens"<usenet+5@c-scape.nl> wrote in message > news:5157e1a1$0$6924$e4fe514c@news2.news.xs4all.nl... >> On 03/30/2013 10:54 PM, Rod Pemberton wrote: > >>>> >>>> I guess looking at other peoples designs (such as Chuck's) >>>> has changed my perspective over the years so that I am >>>> willing and able to do optimizations in ways I would not have >>>> wanted to do in the past. But I am a bit surprised that there >>>> has been so much emphasis on stack oriented MISC machines >>>> which it may well be that register based MISC designs are >>>> also very efficient, at least if you aren't building them to >>>> service a C compiler or trying to match some ideal RISC >>>> model. >>>> >>> >>> Are those your actual results or did you just reiterate what >>> is on Wikipedia? Yes, that's a serious question. Read the >>> MISC page: [link] >>> >>> See ... ?! >> >> It sounds to me rickman is questioning the (unsupported) claims >> on wikipedia that stack based machines have an advantage in size >> and/or simplicity, not reiterating them. >> > > Well, you snipped alot of context. I wouldn't have reformatted > all of it either. > > Anyway, what I took from rickman's statements was that he had only > managed to confirm what is already known about MISC according to > Wikipedia. I.e., what was/is the point? You really need to reread the wiki description of MISC. There seems to be a disconnect. >>> Code density is a CISC concept. I don't see how it applies to >>> your MISC project. Increasing code density for a MISC >>> processor means implementing more powerful instructions, >>> i.e., those that do more work, while minimizing bytes in the >>> instruction opcode encoding. Even if you implement >>> CISC-like instructions, you can't forgo the MISC instructions >>> you already have in order to add the CISC-like instructions. >>> So, to do that, you'll need to increase the size of the >>> instruction set, as well as implement a more complicated >>> instruction decoder. I.e., that means the processor will no >>> longer be MISC, but MISC+minimal CISC hybrid, or pure >>> CISC... >> >> It is perfectly possible to trade one type of MISC processor for >> another one. The choice between stack and register based is an >> obvious one. If you switch from stack to register based, there's >> no need to keep stack manipulation instructions around. >> > > Yes, true. But, what does eliminating MISC stack instructions and > replacing them with MISC register instructions have to do with the > CISC concept of code density or with CISC instructions? ... Weren't you the person who brought CISC into this discussion? Why are you asking this question about CISC? >>> Also, you cross-posted to comp.arch.fpga. While they'll >>> likely be familiar with FPGAs, most there are not going >>> to be familiar with the features of stack-based processors >>> or Forth processors that you discuss indirectly within your >>> post. They might not be familiar with ancient CISC >>> concepts such as "code density" either, or understand why >>> it was important at one point in time. E.g., I suspect this >>> Forth related stuff from above won't be widely >>> understood on c.a.f. without clarification: >> >> The design of simple and compact processors is of great interest >> to many FPGA engineers. Plenty of FPGA designs need some >> sort of control processor, and for cost reduction it's important >> to use minimal resources. 
Like rickman said, this involves a >> careful balance between implementation complexity, speed, >> and code density, while also considering how much work it >> is to write/maintain the software that's running on the >> processor. >> >> Code density is still critically important. Fast memory is >> small, both on FPGA as well as general purpose processors. >> > > So, shouldn't he dump the entire MISC instruction set he has, and > implement a CISC instruction set instead? That's the only way > he's going to get the "critically important" code density, which > I'll take it you rank well above MISC as being important. Of > course, a CISC instruction set requires a more complicated > instruction decoder ... So, it seems that either way he proceeds, > he is being contrained by the "minimal resources" of his FPGA. > That was what he stated: > > "My efforts have shown it hard to improve on code density by a > significant degree while simultaneously minimizing the resources > used by the design." I have "dumped" the stack related portion of the instruction set and replaced it with register references. Why would I do otherwise? > I.e., if the FPGA he is attempting to use is insufficient to do > what he wants or needs, then it's insufficient. Or, he needs some > new techniques. He didn't explicitly ask for any though ... No one said anything about "insufficient". I am looking for an optimal design for a small CPU that can be used efficiently in an FPGA. > Glen mentioned the numerous address modes of the VAX. The 6502 > also had alot of address modes and had instructions which used > zero-page as a set of fast registers. I would think that early, > compact designs like the 6502 and Z80 could be useful to rickman. > They had low transistor counts. > > Z80 8500 transistors > 6502 9000 transistors > 8088 29000 transistors (for comparison...) I'm not counting transistors. That is one of the fallacies of comparing chip designs to FPGA designs. When you have the flexibility of using transistors you can do things in ways that are efficient that are not efficient in the LUTs of an FPGA. The two big constraints are resources used in the FPGA and opcode size. If an instruction would require addition of resources to the design, it needs to justify that addition. An instruction also needs to justify the portion of the opcode space it requires. Operations that specify more than one register become very expensive in terms of opcode bits. Operations that require additional data paths become expensive in terms of resources. Doing as much as possible with as little as possible is what I'm looking for. BTW, the ZPU is a pretty good example of just how small a design can be. But it is very slow. That's the other size of the equation. Improve speed as much as possible while not using more resources than possible. The optimal point depends on the coefficients used... in other words, my judgement. This is not entirely objective. -- RickArticle: 155040
On 4/1/2013 8:54 PM, glen herrmannsfeldt wrote: > > Seems to me that one could still Huffman code the opcode, even > within the MISC concept. That is, use fewer bits for more common > operations, or where it otherwise simplifies the result. > > As someone noted, you can have an N-1 bit load immediate instruction > where N is the instruction size. Yup, I am still using the high bit to indicate an immediate operand. This requires an implied location, so this literal is always in R7, the addressing register. In my stack design it is always the return stack, also the addressing register. Jumps and calls still contain a small literal to be used alone when the relative jump is within range or with the literal instruction when not. I think this is very similar to Huffman encoding. > In one of Knuth's TAOCP books he describes a two instruction computer. > > Seems like if one is interested in MISC, someone should build that one. You can design a one instruction computer, but there is a balance between resources used and the effectiveness of the resulting design. The effectiveness of this sort of design is too low. > Also, maybe a MIX and MMIX machine, maybe the decimal version. > > For those who don't read TAOCP, MIX is defined independent of the > underlying base. Programs in MIXAL are supposed to assemble and > run correctly on hosts that use any base within the range specified. > > Instruction bytes have, if I remember, between 64 and 100 possible > values, such that six bits or two decimal digits are possible > representations. > > I believe that allows for bases 2, 3, 4, 8, 9, 10, and 16. Doesn't sound like an especially practical computer. Has anyone ever built one? -- RickArticle: 155041
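A sketch of the "high bit means literal" decode described above. The
8-bit payload follows from the 9-bit opcode, but the 16-bit width of R7
and the shift-and-merge rule for building longer literals from
consecutive literal instructions are assumptions, not stated details of
the design:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity literal_decode_demo is
    port (
        clk   : in std_logic;
        instr : in std_logic_vector(8 downto 0)
    );
end literal_decode_demo;

architecture rtl of literal_decode_demo is
    signal r7 : unsigned(15 downto 0) := (others => '0');  -- width assumed
begin
    process(clk)
    begin
        if rising_edge(clk) then
            if instr(8) = '1' then
                -- literal instruction: shift the previous contents up and
                -- merge in 8 new bits, so back-to-back literals (or a jump's
                -- small literal plus one LIT) can build a wider value
                r7 <= r7(7 downto 0) & unsigned(instr(7 downto 0));
            end if;
            -- instr(8) = '0' selects the register/ALU opcodes (not shown)
        end if;
    end process;
end rtl;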
On 4/2/2013 2:25 AM, Arlet Ottens wrote: > On 04/02/2013 02:10 AM, Rod Pemberton wrote: >> >> Z80 8500 transistors >> 6502 9000 transistors >> 8088 29000 transistors (for comparison...) > > Low transistor counts do not necessarily translate to low FPGA > resources. Early CPUs used dynamic storage, dual clock latches, pass > logic and tri-state buses to create really small designs that don't > necessarily map well to FPGAs. On the other hand, FPGAs have specific > features (depending on the brand) that can be exploited to create really > tight designs. > > Also, these processors are slow, using multi cycle instructions, and 8 > bit operations. That may not be acceptable. And even if low performance > is acceptable, there are numerous other ways where you can trade speed > for code density, so you'd have to consider these too. For instance, I > can replace pipelining with multi-cycle execution, or use microcode. I think you have a very good handle on the nature of my goals. -- RickArticle: 155042
In comp.arch.fpga rickman <gnuarm@gmail.com> wrote: (snip, I wrote) >> Also, maybe a MIX and MMIX machine, maybe the decimal version. >> For those who don't read TAOCP, MIX is defined independent of the >> underlying base. Programs in MIXAL are supposed to assemble and >> run correctly on hosts that use any base within the range specified. >> Instruction bytes have, if I remember, between 64 and 100 possible >> values, such that six bits or two decimal digits are possible >> representations. >> I believe that allows for bases 2, 3, 4, 8, 9, 10, and 16. > Doesn't sound like an especially practical computer. > Has anyone ever built one? Well, it exists mostly to write the examples (and, I believe, homework problems) in the book. I believe that there have been software emulation, but maybe no hardware (FPGA) versions. MIXAL programs should be base independent, but actual implementations are likely one base. (The model number, 1009 or MIX in roman numerals, is supposed to be the average of the model numbers of some popular machines. There is also the DLX, a RISC machine used by Hennessy and Patterson in their book, which could also be in roman numerals. -- glenArticle: 155043
Article: 155043
On 02/04/13 19:20, glen herrmannsfeldt wrote: > In comp.arch.fpga rickman <gnuarm@gmail.com> wrote: >> On 3/31/2013 2:34 PM, glen herrmannsfeldt wrote: >>> For Itanium, the different units do different things. There are >>> instruction formats that divide up the bits in different ways to make >>> optimal use of the bits. I used to have the manual nearby, but I don't >>> see it right now. > >> Yes, the array processor I worked on was coded from scratch, very >> laboriously. The Itanium is trying to run existing code as fast as >> possible. So they have a number of units to do similar things, but also >> different types, all working in parallel as much as possible. Also the >> parallelism in the array processor was all controlled by the programmer. >> In regular x86 processors the parallelism is controlled by the chip >> itself. I'm amazed sometimes at just how much they can get the chip to >> do, no wonder there are 100's of millions of transistors on the thing. >> I assume parallelism in the Itanium is back to the compiler smarts to >> control since it needs to be coded into the VLIW instructions. > > Seems to me that the big problem with the original Itanium was the > need to also run x86 code. That delayed the release for some time, and > in that time other processors had advanced. I believe that later > versions run x86 code in software emulation, maybe with some hardware > assist. > x86 compatibility was not the "big" problem with the Itanium (though it didn't help). There were two far bigger problems. One was that the chip was targeted at maximising throughput with little regard for power efficiency, since it was for the server market - so all of the logic was running all of the time to avoid latencies, and it had massive caches that ran as fast as possible. The result was that the original devices had a power density exceeding that of the core of a nuclear reactor (it was probably someone from AMD who worked that out...). The big problem, however, is that the idea with VLIW is that the compiler does all the work scheduling instructions in a way that lets them run in parallel. This works in some specialised cases - some DSPs have this sort of architecture, and some types of mathematical algorithms suit it well. But when Intel started work on the Itanium, compilers were not up to the task - Intel simply assumed they would work well enough by the time the chips were ready. Unfortunately for Intel, compiler technology never got there - and in fact, it will never work particularly well for general code. There are too many unpredictable branches and conditionals to predict parallelism at compile time. So most real-world Itanium code uses only about a quarter or so of the processing units in the CPU at any one time (though some types of code can work far better). Thus Itanium chips run at half the real-world speed of "normal" processors, while burning through at least twice the power.
Article: 155044
"rickman" <gnuarm@gmail.com> wrote in message news:kjf48e$5qu$1@dont-email.me... > Weren't you the person who brought CISC into this discussion? Yes. > Why are you asking this question about CISC? You mentioned code density. AISI, code density is purely a CISC concept. They go together and are effectively inseparable. RISC was about effectively using all the processor clock cycles by using fast instructions. RISC wasn't concerned about the encoded size of instructions, how much memory a program consumed, the cost of memory, or how fast memory needed to be. CISC was about reducing memory consumed per instruction. CISC reduced the average size of encoded instructions while also increasing the amount of work each instruction performs. CISC was typically little-endian to reduce the space needed for integer encodings. However, increasing the amount of work per instruction produces highly specialized instructions that are the characteristic of CISC. You only need to look at the x86 instruction set to find some, e.g., STOS, LODS, XLAT, etc. They are also slow to decode and execute as compared to RISC. So, if memory is cheap and fast, there is no point in improving code density, i.e., use RISC. If memory is expensive or slow, use CISC. Arlet mentioned changes to a processor that appeared to me to have nothing to do with increasing or decreasing code density. AISI, the changes he mentioned would only affect what was in the current set of MISC instructions, i.e., either a set of register-based MISC instructions or a set of stack-based MISC instructions. This was stated previously. Rod Pemberton
Article: 155045
"rickman" <gnuarm@gmail.com> wrote in message news:kjeve8$tvm$1@dont-email.me... > On 3/30/2013 5:54 PM, Rod Pemberton wrote: ... > > Even if you implement CISC-like instructions, you can't > > forgo the MISC instructions you already have in order to > > add the CISC-like instructions. > > Really? I can't drop the entire instruction set and start over? > Who says so? Am I breaking a law? > It's just a logical outcome, AISI. The design criteria that you stated was that of producing MISC processors. MISC seems to be purely about minimizing the quantity of instructions. You've produced a MISC processor. So, if you now change your mind about MISC and add additional instructions to your processor's instruction set, especially non-MISC instructions, you're effectively going against your own stated design requirement: MISC, or reducing the quantity of instructions. So, it'd be you who says so, or not... It just seems contradictory for your current self to change course from your past self. > > So, to do that, you'll need to increase > > the size of the instruction set, as well as implement > > a more complicated instruction decoder. > > Define "increase the size of the instruction set". You'll have more instructions in your instruction set. > I am using a 9 bit opcode for my stack design and am > using a similar 9 bit opcode for the register design. In what > way is the register design using a larger instruction set? > You haven't added any additional CISC-like instructions, yet. You just exchanged stack operations for register operations. So, none for now. > > I.e., that means the processor will no longer be MISC, >> but MISC+minimal CISC hybrid, or pure > > CISC... > > Nonsense. "Minimal Instruction Set Computer (MISC) is a > processor architecture with a very small number of basic > operations and corresponding opcodes." from Wikipedia. > BTW, I don't think the term MISC is widely used and is not > well defined. This is the only web page I found that even > tries to define it. > > Actually, if you consider only the opcodes and not the operand > combinations, I think the register design may have fewer > instructions than does the stack design. But the register > design still is in work so I'm not done counting yet. > How exactly do fewer instructions contribute to increased code density for the remaining instructions? The eliminated instructions are no longer a measured component of code density. I.e., they no longer consume memory and therefore aren't measured. > There are some interesting corners to be explored. For example > a MOV rx,rx is essentially a NOP. There are eight of these. So > instead, why not make them useful by clearing the register? > So MOV rx,rx is a clear to be given the name CLR rx... unless > the rx is r7 in which case it is indeed a MOV r7,r7 which is now > a NOP to be coded as such. The CLR r7 is not needed because > LIT 0 already does the same job. Even better the opcode for a > NOP is 0x1FF or octal 777. That's very easy to remember > and recognize. It feels good to find convenient features like > this and makes me like the register MISC design. I can see that you're attempting to minimize the quantity of implemented instructions. Although similar in nature, that's not the same as improving the code density. Are you conflating two different concepts? One of them reduces the encoded size of an instruction, while the other eliminates instructions ... How are you going to attempt to increase the code density for your processor?

1) adding new, additional, more powerful instructions that you don't already have
2) merging existing instructions into fewer instructions
3) finding a more compact method of instruction encoding
4) using little-endian to reduce encoded sizes of integers
5) none or other

I'd think it should be #1 and #3 and #4, or #2 and #3 and #4, or "other" and #3 and #4 ...

Rod Pemberton
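[Editorial sketch, not from the posts above: one way to see the "work per instruction" point behind the x86 string instructions (STOS, LODS, XLAT) mentioned earlier is to spell out what a single one does. Roughly modelling LODSB in C, ignoring segmentation:]

#include <stdint.h>

/* Approximate semantics of the x86 LODSB instruction: load the byte at
 * [SI] into AL, then step SI up or down according to the direction
 * flag.  A load/store RISC would typically spend two instructions on
 * this (a load and an add), which is the code-density trade being
 * discussed in this thread.
 */
struct cpu {
    const uint8_t *si;   /* source index register */
    uint8_t al;          /* low byte of the accumulator */
    int df;              /* direction flag: 0 = increment, 1 = decrement */
};

static void lodsb(struct cpu *c)
{
    c->al = *c->si;              /* the load */
    c->si += c->df ? -1 : 1;     /* the implicit pointer update */
}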
Article: 155046
In comp.arch.fpga Rod Pemberton <do_not_have@notemailnotq.cpm> wrote: > "rickman" <gnuarm@gmail.com> wrote in message > news:kjf48e$5qu$1@dont-email.me... >> Weren't you the person who brought CISC into this discussion? > Yes. >> Why are you asking this question about CISC? > You mentioned code density. AISI, code density is purely a CISC > concept. They go together and are effectively inseparable. They do go together, but I am not so sure that they are inseparable. CISC began when much coding was done in pure assembler, and anything that made that easier was useful. (One should figure out the relative costs, but at least it was in the right direction.) That brought instructions like S/360 EDMK and VAX POLY. (Stories are that on most VAX models, POLY is slower than an explicit loop.) Now, there is no need to waste bits, and so instruction formats were defined to use the available bits. S/360 (and successors) only have three different instruction lengths, and even then sometimes waste bits. The VAX's huge number of different instruction lengths, and also IA32's, does seem to be for code size efficiency. VAX was also defined with a 512 byte page size, even after S/370 had 2K and 4K pages. Way too small, but maybe seemed right at the time. > RISC was about effectively using all the processor clock cycles by > using fast instructions. RISC wasn't concerned about the encoded > size of instructions, how much memory a program consumed, the > cost of memory, or how fast memory needed to be. Yes, but that doesn't mean that CISC is concerned with the size of instructions. > CISC was about reducing memory consumed per instruction. CISC > reduced the average size of encoded instructions while also > increasing the amount of work each instruction performs. Even if it is true (and it probably is) that CISC tends to make efficient use of the bits, that doesn't prove that is what CISC was about. As above, CISC was about making coding easier for programmers, specifically assembly programmers. Now, complex instructions take less space than a series of simpler instructions, but then again one could use a subroutine call. The PDP-10 has the indirect bit, allowing for nested indirection, which may or may not make efficient use of that bit. S/360 uses a sequence of L (load) instructions to do indirection. Instruction usage statistics noting how often L (load) was executed in S/360 code may have been the beginning of RISC. > CISC was typically little-endian to reduce the space needed > for integer encodings. This I don't understand at all. They take the same amount of space. Little endian does make it slightly easier to do a multiword (usually byte-at-a-time) add, and that may have helped for the 6502. It allows one to propagate the carry in the same order one reads bytes from memory. But once you add multiply and divide, the advantage is pretty small. > However, increasing the amount of work per instruction > produces highly specialized instructions that are the > characteristic of CISC. You only need to look at the x86 > instruction set to find some, e.g., STOS, LODS, XLAT, etc. They > are also slow to decode and execute as compared to RISC. Those are not very CISCy compared with some S/360 or VAX instructions. Now, compare to S/360 TR, which will translate by looking up bytes in a lookup table for strings 1 to 256 bytes long. (Unless I remember wrong, XLAT does one byte.) > So, if memory is cheap and fast, there is no point in improving > code density, i.e., use RISC. If memory is expensive or slow, use > CISC.
Well, RISC is more toward using simpler instructions that compilers actually generate and executing them fast. Having one instruction size helps things go fast, and tends to be less efficient with bits. Even so, I believe that you will find that RISC designers try to make efficient use of the bits available, within the single-size-instruction constraint. > Arlet mentioned changes to a processor that appeared to me to have > nothing to do with increasing or decreasing code density. AISI, > the changes he mentioned would only affect what was in the current set > of MISC instructions, i.e., either a set of register-based MISC > instructions or a set of stack-based MISC instructions. This was > stated previously. Decoding multiple different instruction formats tends to require complicated demultiplexers, which are especially hard to do in an FPGA. Even so, one can make efficient use of the bits and still be MISC. -- glen
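[Editorial sketch, not from the post above: the little-endian carry point, written out. On a byte-at-a-time machine such as the 6502, storing the least significant byte at the lowest address lets an add walk forward through memory and carry as it goes.]

#include <stdint.h>
#include <stddef.h>

/* Add two little-endian multi-byte integers of n bytes each.  Because
 * the low byte sits at the lowest address, the loop reads the operands
 * in address order and propagates the carry in that same order, which
 * is exactly how an 8-bit CPU fetches them.
 */
static void add_le(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    unsigned carry = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned s = a[i] + b[i] + carry;
        dst[i] = (uint8_t)s;
        carry = s >> 8;
    }
}

[With a big-endian layout the same loop would have to run from the high address downward, against the natural reading order, which is the small convenience the post refers to.]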
Article: 155047
On 4/3/2013 8:34 PM, Rod Pemberton wrote: > "rickman"<gnuarm@gmail.com> wrote in message > news:kjf48e$5qu$1@dont-email.me... > >> Weren't you the person who brought CISC into this discussion? > > Yes. > >> Why are you asking this question about CISC? > > You mentioned code density. AISI, code density is purely a CISC > concept. They go together and are effectively inseparable. Ok, so that's how you see it. > RISC was about effectively using all the processor clock cycles by > using fast instructions. RISC wasn't concerned about the encoded > size of instructions, how much memory a program consumed, the > cost of memory, or how fast memory needed to be. Don't know why you are even mentioning RISC. > CISC was about reducing memory consumed per instruction. CISC > reduced the average size of encoded instructions while also > increasing the amount of work each instruction performs. CISC was > typically little-endian to reduce the space needed for integer > encodings. However, increasing the amount of work per instruction > produces highly specialized instructions that are the > characteristic of CISC. You only need to look at the x86 > instruction set to find some, e.g., STOS, LODS, XLAT, etc. They > are also slow to decode and execute as compared to RISC. I think CISC was not solely about reducing memory used per instruction. CISC was never a deliberate area of work; the term was not even coined until long after many CISC machines were designed. Most CISC computers were designed with very different goals in mind. For example, the x86, a CISC processor, was initially designed to extend an existing instruction set, first to a 16 bit processor and then to a 32 bit processor. The goal was just to develop an instruction set that was backwards compatible with existing processors while adding capabilities that would make 32 bit processors marketable. > So, if memory is cheap and fast, there is no point in improving > code density, i.e., use RISC. If memory is expensive or slow, use > CISC. LOL, that is a pretty MINIMAL analysis of computers, so I guess it is MACA, Minimal Analysis of Computer Architectures. > Arlet mentioned changes to a processor that appeared to me to have > nothing to do with increasing or decreasing code density. AISI, > the changes he mentioned would only affect what was in the current set > of MISC instructions, i.e., either a set of register-based MISC > instructions or a set of stack-based MISC instructions. This was > stated previously. This is so far out of context I can't comment. -- Rick
Article: 155048
On 4/3/2013 8:35 PM, Rod Pemberton wrote: > "rickman"<gnuarm@gmail.com> wrote in message > news:kjeve8$tvm$1@dont-email.me... >> On 3/30/2013 5:54 PM, Rod Pemberton wrote: > ... > >>> Even if you implement CISC-like instructions, you can't >>> forgo the MISC instructions you already have in order to >>> add the CISC-like instructions. >> >> Really? I can't drop the entire instruction set and start over? >> Who says so? Am I breaking a law? >> > > It's just a logical outcome, AISI. The design criteria that you > stated was that of producing MISC processors. MISC seems to be > purely about the minimizing the quantity of instructions. You've > produced a MISC processor. So, if you now change your mind about > MISC and add additional instructions to your processor's > instruction set, especially non-MISC instructions, you're > effectively going against your own stated design requirement: MISC > or reducing the quantity of instructions. So, it'd be you who > says so, or not... It just seems contradictory with your past > self to change course now with your current self. Your logic seems to be flawed on so many levels. I don't think I stated that producing a MISC processor was a "design criteria". It doesn't even make sense to have that as a "design criteria". I never said I was "adding" instructions to some existing instruction set. In fact, I think I've said that the instruction set for the register based MISC processor is so far, *smaller* than the instruction set for the stack based MISC processor as long as you don't consider each combination of X and Y in MOV rx,ry to be a separate instruction. If you feel each combination is a separate instruction then they both have approximately the same number of instructions since they both have 9 bit instructions and so have 512 possible instructions. >>> So, to do that, you'll need to increase >>> the size of the instruction set, as well as implement >>> a more complicated instruction decoder. >> >> Define "increase the size of the instruction set". > > You'll have more instructions in your instruction set. Sorry, that isn't a good definition because you used part of the term you are defining in the definition. >> I am using a 9 bit opcode for my stack design and am >> using a similar 9 bit opcode for the register design. In what >> way is the register design using a larger instruction set? >> > > You haven't added any additional CISC-like instructions, yet. You > just exchanged stack operations for register operations. So, > none for now. Ok, now we are getting somewhere. In fact, if you read my other posts, you will find that I *won't* be adding any CISC instructions because one of my stated "design criteria" is that each instruction executes in one clock cycle. It's pretty hard to design a simple machine that can do "complex" instructions without executing them in multiple clock cycles. >>> I.e., that means the processor will no longer be MISC, >>> but MISC+minimal CISC hybrid, or pure >>> CISC... >> >> Nonsense. "Minimal Instruction Set Computer (MISC) is a >> processor architecture with a very small number of basic >> operations and corresponding opcodes." from Wikipedia. >> BTW, I don't think the term MISC is widely used and is not >> well defined. This is the only web page I found that even >> tries to define it. >> >> Actually, if you consider only the opcodes and not the operand >> combinations, I think the register design may have fewer >> instructions than does the stack design. 
But the register >> design still is in work so I'm not done counting yet. >> > > How exactly do fewer instructions contribute to increased code > density for the remaining instructions? The eliminated > instructions are no longer a measured component of code density. > I.e., they no longer consume memory and therefore aren't measured. Not sure what you mean here. Code density is how many instructions it takes to do a given amount of work. I measure this by writing code and counting the instructions it takes. Right now I have a section of code I am working on that performs the DDS calculations from a set of control inputs to the DDS. This is what I was working on when I realized that a register based design likely could do this without the stack ops, OVER mainly, but also nearly all the others that just work on the top two stack items. So far it appears the register based instructions are significantly more compact than the stack based instructions. Just as important, the implementation appears to be simpler for the register based design. But that is just *so far*. I am still working on this. The devil is in the details, and I may find some aspects of what I am doing that cause problems and can't be done in the instruction formats I am planning, or something that blows up the hardware to be much bigger than I am picturing at the moment. >> There are some interesting corners to be explored. For example >> a MOV rx,rx is essentially a NOP. There are eight of these. So >> instead, why not make them useful by clearing the register? >> So MOV rx,rx is a clear to be given the name CLR rx... unless >> the rx is r7 in which case it is indeed a MOV r7,r7 which is now >> a NOP to be coded as such. The CLR r7 is not needed because >> LIT 0 already does the same job. Even better the opcode for a >> NOP is 0x1FF or octal 777. That's very easy to remember >> and recognize. It feels good to find convenient features like >> this and makes me like the register MISC design. > I can see that you're attempting to minimize the quantity of > implemented instructions. Although similar in nature, that's not > the same as improving the code density. Are you conflating two > different concepts? One of them reduces the encoded size of an > instruction, while the other eliminates instructions ... You really don't seem to understand what I am doing. You continually misinterpret what I explain. > How are you going to attempt to increase the code density for your > processor? > > 1) adding new, additional, more powerful instructions that you > don't already have > 2) merging existing instructions into fewer instructions > 3) finding a more compact method of instruction encoding > 4) using little-endian to reduce encoded sizes of integers > 5) none or other > > I'd think it should be #1 and #3 and #4, or #2 and #3 and #4, or > "other" and #3 and #4 ... Uh, I am designing an instruction set that does as much as possible with as little hardware as possible. When you say "new, additional" instructions, compared to what? When you say "more compact", again, compared to what exactly? When you say "little-endian" to reduce encoded integer size, what exactly is that? Are you referring to specifying an integer in small chunks so that sign extension allows the specification to be limited in length? Yes, that is done on both the stack and register based designs. Koopman's paper lists literals and calls as some of the most frequently used instructions, so optimizing literals optimizes the most frequently used instructions.
-- Rick
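[Editorial sketch, not from the posts above: how the MOV rx,rx reuse might decode. The 9-bit field layout here, a 3-bit opcode of all ones for MOV followed by 3-bit destination and source fields, is my assumption; it is chosen only because it makes MOV r7,r7 come out as 0x1FF (octal 777), matching the NOP encoding mentioned in the post.]

#include <stdint.h>
#include <stdio.h>

/* Hypothetical decode of a 9-bit MOV instruction:
 *   bits 8..6 = opcode (assumed all ones for MOV)
 *   bits 5..3 = destination register
 *   bits 2..0 = source register
 * MOV rx,rx would otherwise be a no-op, so it is reused as CLR rx,
 * except MOV r7,r7 (0x1FF, octal 777), which stays a genuine NOP
 * because LIT 0 already covers loading a zero.
 */
enum action { ACT_MOV, ACT_CLR, ACT_NOP };

static enum action decode_mov(uint16_t insn, unsigned *dst, unsigned *src)
{
    *dst = (insn >> 3) & 7;
    *src = insn & 7;
    if (*dst != *src)
        return ACT_MOV;     /* ordinary register-to-register move */
    if (*dst == 7)
        return ACT_NOP;     /* MOV r7,r7 kept as the NOP encoding */
    return ACT_CLR;         /* MOV rx,rx reused as CLR rx */
}

int main(void)
{
    unsigned d, s;
    printf("%d\n", decode_mov(0x1FF, &d, &s));   /* 2 = ACT_NOP           */
    printf("%d\n", decode_mov(0x1C0, &d, &s));   /* 1 = ACT_CLR, r0       */
    printf("%d\n", decode_mov(0x1C8, &d, &s));   /* 0 = ACT_MOV, r1 <- r0 */
    return 0;
}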
Article: 155049
On 29/03/2013 21:00, rickman wrote: > I have been working with stack based MISC designs in FPGAs for some > years. All along I have been comparing my work to the work of others. > These others were the conventional RISC type processors supplied by the > FPGA vendors as well as the many processor designs done by individuals > or groups as open source. <snip> Can you achieve interrupt response times on a register-based machine as fast as on a stack machine? OK, shadow registers buy you one fast interrupt, but that's sort of a one-level 2D stack. Even the venerable RTX2000 had an impressive (IIRC) 200ns interrupt response time. Cheers -- Syd