On 3/30/2013 3:20 AM, Arlet Ottens wrote:
> On 03/29/2013 10:00 PM, rickman wrote:
>> I have been working with stack based MISC designs in FPGAs for some
>> years. All along I have been comparing my work to the work of others.
>> These others were the conventional RISC type processors supplied by the
>> FPGA vendors as well as the many processor designs done by individuals
>> or groups as open source.
>>
>> So far my CPUs have always ranked reasonably well in terms of speed, but
>> more importantly to me, very well in terms of size and code density. My
>> efforts have shown it hard to improve on code density by a significant
>> degree while simultaneously minimizing the resources used by the design.
>> Careful selection of the instruction set can both improve code density
>> and minimize logic used if measured together, but there is always a
>> tradeoff. One can always be improved at the expense of the other.
>>
> I once made a CPU design for an FPGA that had multiple stacks. There was
> a general purpose stack "A", two index stacks "X" and "Y", and a return
> stack "R". ALU operations worked between A and any other stack, so they
> only required 2 bits in the opcode. There was also a move instruction
> that could move data from a source to a destination stack.

Let's see, this would be a hybrid really, between register based and
stack based, as it has multiple stacks which must be selected in the
instruction set like registers, just not many.

> Having access to multiple stacks means you spend less time shuffling
> data on the stack. There's no more need for swap, over, rot and similar
> stack manipulation instructions. The only primitive operations you need
> are push and pop.

There is the extra hardware required to implement multiple stacks. Why
have quite so many? I use the return stack for addresses. That works
pretty well. Maybe A, X and R?

> For instance, I had a load instruction that could load from memory using
> the address in the X stack, and push the result on the A stack. The cool
> part is that the X stack itself isn't changed by this operation, so the
> same address can be used multiple times. So, you could do a
>
>   LOAD (X)   ; load from (X) and push on A
>   1          ; push literal on A
>   ADD        ; add top two elements of A
>   STORE (X)  ; pop A, and store in (X)
>
> to increment a location in memory.

But you aren't quite done yet, at least if you care about stack
overflows. Chuck's designs don't care. If he has data on the bottom of
the stack he can just leave it. Otherwise you need to drop the address
on the X stack.

I looked at this when designing my MISC processor. I ended up with two
fetches and two stores. One just does the fetch or store and pops the
address. The other does a post increment and retains the address to be
used in a loop. This one would have to be dropped at the end.

> And if you wanted to increment X to access the next memory location,
> you'd do:
>
>   1          ; push literal on A
>   ADD X      ; pop X, pop A, add, and push result on A.
>   MOVE A, X  ; pop A, and push on X

Add an autoincrement to the index stacks and these three instructions
go away.

> It was an 8 bit architecture with 9 bit instructions (to match the FPGA
> block RAM + parity bit). Having 9 bit instructions allows an 8 bit
> literal push to be encoded in 1 instruction.

I was all about 9 bit instructions and 18 bit address/data bus sizes
until I started working with the iCE40 parts which only have 8 bit
memory. I think the parity bit is a comms industry thing. That one bit
makes a *big* difference in the instruction set capabilities.

> Feel free to e-mail if you want more details.

Thanks.

--
Rick

Article: 155026
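To make the autoincrement suggestion above concrete, here is roughly
what the memory-increment sequence collapses to once the index stack
has a post-increment option. The (X+) mnemonic is invented for
illustration; it is not from either poster's actual instruction set:

  LOAD (X)    ; load from (X) and push on A
  1           ; push literal on A
  ADD         ; add top two elements of A
  STORE (X+)  ; pop A, store in (X), then X <- X + 1

The separate 1 / ADD X / MOVE A, X sequence to bump the address
disappears into the store, at the cost of one extra store opcode and an
incrementer on the top of the X stack.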
On 03/30/2013 10:54 PM, Rod Pemberton wrote: >> >> I guess looking at other peoples designs (such as Chuck's) has >> changed my perspective over the years so that I am willing and >> able to do optimizations in ways I would not have wanted to do >> in the past. But I am a bit surprised that there has been so >> much emphasis on stack oriented MISC machines which it may well >> be that register based MISC designs are also very efficient, >> at least if you aren't building them to service a C compiler or >> trying to match some ideal RISC model. >> > > Are those your actual results or did you just reiterate what is on > Wikipedia? Yes, that's a serious question. Read the MISC page: > http://en.wikipedia.org/wiki/Minimal_instruction_set_computer > > See ... ?! It sounds to me rickman is questioning the (unsupported) claims on wikipedia that stack based machines have an advantage in size and/or simplicity, not reiterating them. > Code density is a CISC concept. I don't see how it applies to > your MISC project. Increasing code density for a MISC processor > means implementing more powerful instructions, i.e., those that do > more work, while minimizing bytes in the instruction opcode > encoding. Even if you implement CISC-like instructions, you can't > forgo the MISC instructions you already have in order to add the > CISC-like instructions. So, to do that, you'll need to increase > the size of the instruction set, as well as implement a more > complicated instruction decoder. I.e., that means the processor > will no longer be MISC, but MISC+minimal CISC hybrid, or pure > CISC... It is perfectly possible to trade one type of MISC processor for another one. The choice between stack and register based is an obvious one. If you switch from stack to register based, there's no need to keep stack manipulation instructions around. > Also, you cross-posted to comp.arch.fpga. While they'll likely be > familiar with FPGAs, most there are not going to be familiar with > the features of stack-based processors or Forth processors that > you discuss indirectly within your post. They might not be > familiar with ancient CISC concepts such as "code density" either, > or understand why it was important at one point in time. E.g., I > suspect this Forth related stuff from above won't be widely > understood on c.a.f. without clarification: The design of simple and compact processors is of great interest to many FPGA engineers. Plenty of FPGA designs need some sort of control processor, and for cost reduction it's important to use minimal resources. Like rickman said, this involves a careful balance between implementation complexity, speed, and code density, while also considering how much work it is to write/maintain the software that's running on the processor. Code density is still critically important. Fast memory is small, both on FPGA as well as general purpose processors.Article: 155027
On 03/31/2013 12:00 AM, rickman wrote:
>> I once made a CPU design for an FPGA that had multiple stacks. There was
>> a general purpose stack "A", two index stacks "X" and "Y", and a return
>> stack "R". ALU operations worked between A and any other stack, so they
>> only required 2 bits in the opcode. There was also a move instruction
>> that could move data from a source to a destination stack.
>
> Let's see, this would be a hybrid really, between register based and
> stack based as it has multiple stacks which must be selected in the
> instruction set like registers, just not many.

Exactly. And as a hybrid, it offers some advantages from both kinds of
designs.

>> Having access to multiple stacks means you spend less time shuffling
>> data on the stack. There's no more need for swap, over, rot and similar
>> stack manipulation instructions. The only primitive operations you need
>> are push and pop.
>
> There is the extra hardware required to implement multiple stacks. Why
> have quite so many? I use the return stack for addresses. That works
> pretty well. Maybe A, X and R?

I had all stacks implemented in the same block RAM, just using different
sections of it. But you are right, in my implementation I had reserved
room for the Y stack, but never really implemented it. Just using the X
was sufficient for the application I needed the CPU for.

Of course, for other applications, having an extra register/stack that
you can use as a memory pointer could be useful, so I left the Y
register in the instruction encoding. For 3 registers you need 2 bits
anyway, so it makes sense to allow for 4.

> But you aren't quite done yet, at least if you care about stack
> overflows. Chuck's designs don't care. If he has data on the bottom of
> the stack he can just leave it. Otherwise you need to drop the address
> on the X stack.

Correct, if you no longer need the address, you need to drop it from the
stack. On the other hand, if you let it drop automatically, and you need
it twice, you would have to dup it. Intuitively, I would say that in
inner loops it would be more common that you'd want to reuse an address
(possibly with offset or autoinc/dec).

> I was all about 9 bit instructions and 18 bit address/data bus sizes
> until I started working with the iCE40 parts which only have 8 bit
> memory. I think the parity bit is a comms industry thing. That one bit
> makes a *big* difference in the instruction set capabilities.

Agreed. As soon as you use the parity bits, you're tied to a certain
architecture. On the other hand, if you've already chosen a certain
architecture, and the parity bits are available for free, it can be very
advantageous to use them.

Article: 155028
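A minimal sketch of the shared-block-RAM arrangement described above,
assuming four stacks of 64 entries each packed into one 256 x 9 RAM.
The entity name, widths, and ports are invented for illustration, and a
real design would also keep the top of each stack in a register rather
than re-reading the RAM every cycle:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity quad_stack is
    port (
        clk     : in  std_logic;
        stk_sel : in  unsigned(1 downto 0);   -- 00=A, 01=X, 10=Y, 11=R
        push    : in  std_logic;
        pop     : in  std_logic;
        wr_data : in  std_logic_vector(8 downto 0);
        rd_data : out std_logic_vector(8 downto 0)
    );
end quad_stack;

architecture rtl of quad_stack is
    -- one RAM holds all four stacks; the two stack-select bits form the
    -- upper address bits, giving each stack its own 64-entry region
    type ram_t is array (0 to 255) of std_logic_vector(8 downto 0);
    signal ram : ram_t := (others => (others => '0'));

    -- one pointer per stack, each pointing at that stack's current top
    type ptr_array_t is array (0 to 3) of unsigned(5 downto 0);
    signal sp : ptr_array_t := (others => (others => '0'));
begin
    process(clk)
        variable s : integer range 0 to 3;
    begin
        if rising_edge(clk) then
            s := to_integer(stk_sel);
            if push = '1' then
                -- a push writes the next cell up and advances the pointer
                ram(to_integer(stk_sel & (sp(s) + 1))) <= wr_data;
                sp(s) <= sp(s) + 1;
            elsif pop = '1' then
                sp(s) <= sp(s) - 1;
            end if;
            -- registered read of the selected stack's top
            -- (no overflow/underflow checking in this sketch)
            rd_data <= ram(to_integer(stk_sel & sp(s)));
        end if;
    end process;
end rtl;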
In comp.arch.fpga rickman <gnuarm@gmail.com> wrote: (snip) >> Well, much of the idea of RISC is that code density isn't very >> important, and that many of the more complicated instructions made >> assembly language programming easier, but compilers didn't use them. > I am somewhat familiar with VLIW. I am also very familiar with > microcode which is the extreme VLIW. I have coded microcode for an I/O > processor on an attached array processor. That's like saying I coded a > DMA controller in a DSP chip, but before DSP chips were around. Well, yes, VLIW is just about like compiling from source directly to microcode. The currently in production VLIW processor is Itanium. I believe 128 bit wide instructions, which specify many different operations that can happen at the same time. 128 general and 128 floating point registers, 128 bit wide data bus, but it uses a lot of power. Something like 100W, with Icc of 100A and Vcc about 1V. I have an actual RX2600 dual Itanium box, but don't run it very often, mostly because of the power used. (snip) >> But every level of logic adds delay. Using a wide bus to fast memory >> is more efficient that a complicated decoder. But sometimes RISC went >> too far. In early RISC, there was the idea of one cycle per instruction. >> They couldn't do that for multiply, so they added multiply-step, an >> instruction that you execute many times for each multiply operation. >> (And maybe no divide at all.) > I"m not sure what your point is. What part of this is "too far"? This > is exactly the type of design I am doing, but to a greater extent. The early SPARC didn't have a multiply instruction. (Since it couldn't be done in one cycle.) Instead, they did multiply, at least the Sun machines did, through software emulation, with a software trap. Early when I started using a Sun4/110 I generated a whole set of fonts for TeX from Metafont source, which does a lot of multiply. Unix (SunOS) keeps track of user time (what your program does) and system time (what the OS does while executing your program). Multiply counted as system time, and could be a large fraction of the total time. >> For VLIW, a very wide instruction word allows for specifying many >> different operations at the same time. It relies on complicated >> compilers to optimally pack the multiple operations into the >> instruction stream. > Yes, in theory that is what VLIW is. This is just one step removed from > microcode where the only limitation to how parallel operations can be is > the data/address paths themselves. The primary application of VLIW I > have seen is in the TI 6000 series DSP chips. But in reality this is > not what I consider VLIW. This design uses eight CPUs which are mostly > similar, but not quite identical with two sets of four CPU units sharing > a register file, IIRC. In reality each of the eight CPUs gets its own > 32 bit instruction stream. They all operate in lock step, but you can't > do eight FIR filters. I think of the four in a set, two are set up to > do full math, etc and two are able to generate addresses. So this is > really two CPUs with dual MACs and two address generators as it ends up > being used most of the time. But then they make it run at clocks of > over 1 GHz so it is a damn fast DSP and handles most of the cell phone > calls as part of the base station. For Itanium, the different units do different things. There are instruction formats that divide up the bits in different ways to make optimal use of the bits. 
I used to have the manual nearby, but I don't see it right now.

(snip)

> That is a result of the heavy pipelining that is being done. I like to
> say my design is not pipelined, but someone here finally convinced me
> that my design *is* pipelined with the execution in parallel with the
> next instruction fetch, but there is never a need to stall or flush
> because there are no conflicts.

Yes, that counts, but it gets much more interesting with pipelines like
the Cray-1 that, after some cycles of latency, generate one result every
clock cycle.

> I want a design to be fast, but not at the expense of complexity. That
> is one way I think like Chuck Moore. Keep it simple and that gives you
> speed.

(snip)

> In Forth speak a POP would be a DROP. That is not often used in Forth
> really, or in my apps. I just wrote some code for my stack CPU and I
> think there were maybe two DROPs in just over 100 instructions. I am
> talking about the DUPs, SWAPs, OVERs and such. They end up being needed
> enough that it makes the register design look good... at least at first
> blush. I am looking at how to organize a register based instruction set
> without expanding the size of the instructions. I'm realizing that is
> one issue with registers, you have to specify them. But I don't need to
> make the machine totally general like the goal for RISC. That is so
> writing compilers is easier. I don't need to consider that.

For x87, they avoid the DUP, SWAP, and such by instructions being able
to specify any register in the stack. You can, for example, add any
stack register (there are eight) to the top of stack. I haven't thought
about it for a while, but I believe either pushing the result, or
replacing the previous top of the stack.

(snip)

>> Well, the stack design has the advantage that you can use instruction
>> bits either for a memory address or for the operation, allowing for much
>> smaller instructions. But that only works as long as everything is in
>> the right place on the stack.
>
> Yes, it is a tradeoff between instruction size and the number of ops
> needed to get a job done. I'm looking at trimming the instruction size
> down to give a workable subset for register operations.

Seems to me that one possibility is to have a really functional stack
operation instruction, such that, with the given number of bits, it
allows for the most operations. Some combination of DUP, SWAP, and POP
all at once. Though that isn't easy for the stack itself.

-- glen

Article: 155029
In the following forum thread, member Alex reports some odd behavior
when using ISIM to simulate some code that is part of an FPGA emulation
of an arcade game.

http://forum.gadgetfactory.net/index.php?/topic/1544-isim-is-driving-me-crazy/#entry10072

Since the code is likely a literal translation of the arcade game
schematic, some bits from a 5-bit slv counter are used as the async
reset and clock in a process. Don't be distracted by this
non-synchronous design practice, as this isn't the issue. The issue is
that the rising_edge() function wasn't working correctly.

After some investigation and simulation, it appears that in ISIM the
'last_value attribute isn't being properly evaluated *inside* the
function when the argument to rising_edge() is a bit from a
std_logic_vector. If the counter bit, counter(1), that is used as the
clock is assigned to the std_logic signal, counter_1, then the
'last_value attribute is evaluated properly in the rising_edge()
function call. To test this, a 2nd process is created using the
counter_1 signal as the clock. The rising_edge function source is
copied from the Xilinx library and modified to report the status of the
incoming signal.

Here is the test bench which illustrates the problem.

********** BEGIN CODE ***********

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;

entity test_tb is
end test_tb;

architecture behavior of test_tb is

    function rising_edge (signal s : std_ulogic) return boolean is
    begin
        report "s'event=" & boolean'image(s'event) &
               " s=" & std_ulogic'image(s) &
               " s'last_value=" & std_ulogic'image(s'last_value);
        return (s'event and (to_x01(s) = '1') and
                (to_x01(s'last_value) = '0'));
    end;

    signal output    : std_logic := '0';
    signal counter   : std_logic_vector(4 downto 0) := (others => '0');
    constant period  : time := 20 ns;
    signal counter_1 : std_logic;
    signal output_1  : std_logic;

begin

    process
        variable counter_i : std_logic_vector(4 downto 0) := (others => '0');
    begin
        wait for period;
        counter <= counter+1;
    end process;

    process(counter(1), counter(4))
    begin
        report "counter(1)'event=" & boolean'image(counter(1)'event) &
               " counter(1)=" & std_ulogic'image(counter(1)) &
               " counter(1)'last_value=" & std_ulogic'image(counter(1)'last_value);
        if (counter(4) = '0') then
            output <= '0';
        elsif rising_edge(counter(1)) then
            output <= (not counter(3)) and counter(2);
        end if;
    end process;

    counter_1 <= counter(1);

    process(counter_1, counter(4))
    begin
        report "counter_1'event=" & boolean'image(counter_1'event) &
               " counter_1=" & std_ulogic'image(counter_1) &
               " counter_1'last_value=" & std_ulogic'image(counter_1'last_value);
        if (counter(4) = '0') then
            output_1 <= '0';
        elsif rising_edge(counter_1) then
            output_1 <= (not counter(3)) and counter(2);
        end if;
    end process;

end;

********** END CODE ***********

Here are the reports from the ISIM log for the beginning of the time
period when counter(4) is '1' and the async reset in the process is not
asserted. As you can see, the 'last_value attribute value is incorrect
for all rising_edge(counter(1)) calls but correct for
rising_edge(counter_1). The code runs correctly in ModelSim 10.1,
producing identical results for both function calls.

********** BEGIN ISIM LOG ***********

at 320 ns(1): Note: counter(1)'event=true counter(1)='0' counter(1)'last_value='1' (/test_tb/).
at 320 ns(1): Note: s'event=true s='0' s'last_value='0' (/test_tb/).
at 320 ns(1): Note: counter_1'event=false counter_1='1' counter_1'last_value='0' (/test_tb/).
at 320 ns(1): Note: s'event=false s='1' s'last_value='0' (/test_tb/).
at 320 ns(2): Note: counter_1'event=true counter_1='0' counter_1'last_value='1' (/test_tb/).
at 320 ns(2): Note: s'event=true s='0' s'last_value='1' (/test_tb/).
at 360 ns(1): Note: counter(1)'event=true counter(1)='1' counter(1)'last_value='0' (/test_tb/).
at 360 ns(1): Note: s'event=true s='1' s'last_value='1' (/test_tb/).
at 360 ns(2): Note: counter_1'event=true counter_1='1' counter_1'last_value='0' (/test_tb/).
at 360 ns(2): Note: s'event=true s='1' s'last_value='0' (/test_tb/).
at 400 ns(1): Note: counter(1)'event=true counter(1)='0' counter(1)'last_value='1' (/test_tb/).
at 400 ns(1): Note: s'event=true s='0' s'last_value='1' (/test_tb/).
at 400 ns(2): Note: counter_1'event=true counter_1='0' counter_1'last_value='1' (/test_tb/).
at 400 ns(2): Note: s'event=true s='0' s'last_value='1' (/test_tb/).

********** END ISIM LOG ***********

Any ideas? Is this a legitimate bug or a hole in my understanding of
VHDL and simulation in general?

Urbite

Article: 155030
glen herrmannsfeldt wrote:
> Seems to me that much of the design of VAX was to improve code
> density when main memory was still a very significant part of the
> cost of a machine.

The original VAX series (780, then 730 and 750) and the uVAX I and
uVAX II had no cache, so reducing the rate of memory fetches was also
important. The first cached machine I used was the uVAX III (KA650) and
a good part of the performance increase was due to the cache.

Jon

Article: 155031
On 3/25/2013 7:04 PM, Mike Butts wrote:
> I've got a 20-year-old Xilinx XC3020 development board. I think it
> would be fun to fire it up and bring it to the 20th anniversary FCCM
> in Seattle next month. (http://fccm.org/2013/)
>
> I don't see XC3000-series supported on even the oldest archived ISE at
> xilinx.com. Anyone know where I can find some tools for this old chip?
> It has 64 CLBs and 256 flip-flops! Maybe one of you folks at Xilinx?
> Thanks!
>
> --Mike

Is this a 'plain' XC3020 or the XC3020A or 'L'? I have Foundation
Express 1.5 loaded up on a Win 98 VM running under VMware Workstation 8
on a Win 7 host. Just for 'fun' I coded up an 8-bit counter and was able
to push the design all the way through the flow and generate a bit file.
Unfortunately F1.5 doesn't support the original XC3000 family, only the
later 'A' and 'L' revisions.

That's too bad because I just found several XC3042's, 3020's, 2018's,
2064's, and some 95108's in my collection. What kind of retro project
could be built with that collection of logic?

I'm certain that I have earlier versions of the Xilinx tools somewhere
that will support that old stuff. All of the licenses back then were
locked to a hard drive serial number, which is easy to change when
moving to a new PC - real or virtual.

--Paul

Article: 155032
"Arlet Ottens" <usenet+5@c-scape.nl> wrote in message news:5157e1a1$0$6924$e4fe514c@news2.news.xs4all.nl... > On 03/30/2013 10:54 PM, Rod Pemberton wrote: > >> > >> I guess looking at other peoples designs (such as Chuck's) > >> has changed my perspective over the years so that I am > >> willing and able to do optimizations in ways I would not have > >> wanted to do in the past. But I am a bit surprised that there > >> has been so much emphasis on stack oriented MISC machines > >> which it may well be that register based MISC designs are > >> also very efficient, at least if you aren't building them to > >> service a C compiler or trying to match some ideal RISC > >> model. > >> > > > > Are those your actual results or did you just reiterate what > > is on Wikipedia? Yes, that's a serious question. Read the > > MISC page: [link] > > > > See ... ?! > > It sounds to me rickman is questioning the (unsupported) claims > on wikipedia that stack based machines have an advantage in size > and/or simplicity, not reiterating them. > Well, you snipped alot of context. I wouldn't have reformatted all of it either. Anyway, what I took from rickman's statements was that he had only managed to confirm what is already known about MISC according to Wikipedia. I.e., what was/is the point? > > Code density is a CISC concept. I don't see how it applies to > > your MISC project. Increasing code density for a MISC > > processor means implementing more powerful instructions, > > i.e., those that do more work, while minimizing bytes in the > > instruction opcode encoding. Even if you implement > > CISC-like instructions, you can't forgo the MISC instructions > > you already have in order to add the CISC-like instructions. > > So, to do that, you'll need to increase the size of the > > instruction set, as well as implement a more complicated > > instruction decoder. I.e., that means the processor will no > > longer be MISC, but MISC+minimal CISC hybrid, or pure > > CISC... > > It is perfectly possible to trade one type of MISC processor for > another one. The choice between stack and register based is an > obvious one. If you switch from stack to register based, there's > no need to keep stack manipulation instructions around. > Yes, true. But, what does eliminating MISC stack instructions and replacing them with MISC register instructions have to do with the CISC concept of code density or with CISC instructions? ... > > Also, you cross-posted to comp.arch.fpga. While they'll > > likely be familiar with FPGAs, most there are not going > > to be familiar with the features of stack-based processors > > or Forth processors that you discuss indirectly within your > > post. They might not be familiar with ancient CISC > > concepts such as "code density" either, or understand why > > it was important at one point in time. E.g., I suspect this > > Forth related stuff from above won't be widely > > understood on c.a.f. without clarification: > > The design of simple and compact processors is of great interest > to many FPGA engineers. Plenty of FPGA designs need some > sort of control processor, and for cost reduction it's important > to use minimal resources. Like rickman said, this involves a > careful balance between implementation complexity, speed, > and code density, while also considering how much work it > is to write/maintain the software that's running on the > processor. > > Code density is still critically important. Fast memory is > small, both on FPGA as well as general purpose processors. 
> So, shouldn't he dump the entire MISC instruction set he has, and implement a CISC instruction set instead? That's the only way he's going to get the "critically important" code density, which I'll take it you rank well above MISC as being important. Of course, a CISC instruction set requires a more complicated instruction decoder ... So, it seems that either way he proceeds, he is being contrained by the "minimal resources" of his FPGA. That was what he stated: "My efforts have shown it hard to improve on code density by a significant degree while simultaneously minimizing the resources used by the design." I.e., if the FPGA he is attempting to use is insufficient to do what he wants or needs, then it's insufficient. Or, he needs some new techniques. He didn't explicitly ask for any though ... Glen mentioned the numerous address modes of the VAX. The 6502 also had alot of address modes and had instructions which used zero-page as a set of fast registers. I would think that early, compact designs like the 6502 and Z80 could be useful to rickman. They had low transistor counts. Z80 8500 transistors 6502 9000 transistors 8088 29000 transistors (for comparison...) Rod PembertonArticle: 155033
In comp.arch.fpga Rod Pemberton <do_not_have@notemailnotq.cpm> wrote: (snip) > Well, you snipped alot of context. I wouldn't have reformatted > all of it either. > Anyway, what I took from rickman's statements was that he had only > managed to confirm what is already known about MISC according to > Wikipedia. I.e., what was/is the point? (snip, someone wrote) >> It is perfectly possible to trade one type of MISC processor for >> another one. The choice between stack and register based is an >> obvious one. If you switch from stack to register based, there's >> no need to keep stack manipulation instructions around. > Yes, true. But, what does eliminating MISC stack instructions and > replacing them with MISC register instructions have to do with the > CISC concept of code density or with CISC instructions? ... Seems to me that one could still Huffman code the opcode, even within the MISC concept. That is, use fewer bits for more common operations, or where it otherwise simplifies the result. As someone noted, you can have an N-1 bit load immediate instruction where N is the instruction size. In one of Knuth's TAOCP books he describes a two instruction computer. Seems like if one is interested in MISC, someone should build that one. Also, maybe a MIX and MMIX machine, maybe the decimal version. For those who don't read TAOCP, MIX is defined independent of the underlying base. Programs in MIXAL are supposed to assemble and run correctly on hosts that use any base within the range specified. Instruction bytes have, if I remember, between 64 and 100 possible values, such that six bits or two decimal digits are possible representations. I believe that allows for bases 2, 3, 4, 8, 9, 10, and 16. -- glenArticle: 155034
On 04/02/2013 02:10 AM, Rod Pemberton wrote: > Yes, true. But, what does eliminating MISC stack instructions and > replacing them with MISC register instructions have to do with the > CISC concept of code density or with CISC instructions? ... I don't think 'code density' is a CISC concept at all. Code density applies to any kind of instruction encoding. Any kind of MISC architecture will have a certain code density (for a given application), and require a certain amount of FPGA resources. And if you store the code in local FPGA memory, and you know your application, you can even convert it all to FPGA resources. Obviously, one MISC design will have better resource use than another, and one of the questions is whether we can make any kind of general statement about stack vs register implementation. > So, shouldn't he dump the entire MISC instruction set he has, and > implement a CISC instruction set instead? That's the only way > he's going to get the "critically important" code density, which > I'll take it you rank well above MISC as being important. Of > course, a CISC instruction set requires a more complicated > instruction decoder ... So, it seems that either way he proceeds, > he is being contrained by the "minimal resources" of his FPGA. Exactly. If you want to get FPGA resources low, a MISC design seems most appropriate. But within the MISC concept, there are still plenty of design decisions left. Really, the biggest problem with FPGA CPU design is that there are too many design decisions, and each decision influences everything else. > Glen mentioned the numerous address modes of the VAX. The 6502 > also had alot of address modes and had instructions which used > zero-page as a set of fast registers. I would think that early, > compact designs like the 6502 and Z80 could be useful to rickman. > They had low transistor counts. > > Z80 8500 transistors > 6502 9000 transistors > 8088 29000 transistors (for comparison...) Low transistor counts do not necessarily translate to low FPGA resources. Early CPUs used dynamic storage, dual clock latches, pass logic and tri-state buses to create really small designs that don't necessarily map well to FPGAs. On the other hand, FPGAs have specific features (depending on the brand) that can be exploited to create really tight designs. Also, these processors are slow, using multi cycle instructions, and 8 bit operations. That may not be acceptable. And even if low performance is acceptable, there are numerous other ways where you can trade speed for code density, so you'd have to consider these too. For instance, I can replace pipelining with multi-cycle execution, or use microcode.Article: 155035
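As an illustration of the last point above, a two-state fetch/execute
machine trades half the instruction rate for a much simpler datapath:
no pipeline registers, no hazard or flush logic. This is a generic
sketch, not any particular poster's design; the entity name, 9-bit
instruction word, and the single decoded opcode are all assumptions:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity multicycle_demo is
    port (clk : in std_logic);
end multicycle_demo;

architecture rtl of multicycle_demo is
    type state_t is (FETCH, EXECUTE);
    signal state : state_t := FETCH;
    type rom_t is array (0 to 255) of std_logic_vector(8 downto 0);
    signal program_mem : rom_t := (others => (others => '0'));
    signal pc    : unsigned(7 downto 0) := (others => '0');
    signal instr : std_logic_vector(8 downto 0) := (others => '0');
    signal acc   : unsigned(7 downto 0) := (others => '0');
begin
    process(clk)
    begin
        if rising_edge(clk) then
            case state is
                when FETCH =>
                    -- the fetch gets its own cycle, so the memory's
                    -- registered output needs no bypass or stall logic
                    instr <= program_mem(to_integer(pc));
                    pc    <= pc + 1;
                    state <= EXECUTE;
                when EXECUTE =>
                    -- decode and execute in the second cycle; only one
                    -- illustrative opcode (load literal) is shown
                    if instr(8) = '1' then
                        acc <= unsigned(instr(7 downto 0));
                    end if;
                    state <= FETCH;
            end case;
        end if;
    end process;
end rtl;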
On 3/31/2013 2:34 PM, glen herrmannsfeldt wrote: > In comp.arch.fpga rickman<gnuarm@gmail.com> wrote: (snip) > > I have an actual RX2600 dual Itanium box, but don't run it very > often, mostly because of the power used. A 100 watt lightbulb uses about $10 a month if left on 24/7. So I wouldn't worry too much about the cost of running your Itanium. If you turn it off when you aren't using it I can't imagine it would really cost anything noticeable to run. > For Itanium, the different units do different things. There are > instruction formats that divide up the bits in different ways to make > optimal use of the bits. I used to have the manual nearby, but I don't > see it right now. Yes, the array processor I worked on was coded from scratch, very laboriously. The Itanium is trying to run existing code as fast as possible. So they have a number of units to do similar things, but also different types, all working in parallel as much as possible. Also the parallelism in the array processor was all controlled by the programmer. In regular x86 processors the parallelism is controlled by the chip itself. I'm amazed sometimes at just how much they can get the chip to do, no wonder there are 100's of millions of transistors on the thing. I assume parallelism in the Itanium is back to the compiler smarts to control since it needs to be coded into the VLIW instructions. > For x87, they avoid the DUP, SWAP, and such by instructions being able > to specify any register in the stack. You can, for example, add any > stack register (there are eight) to the top of stack. I haven't thought > about it for a while, but I believe either pushing the result, or > replacing the previous top of the stack. That is something I have not yet looked at, a hybrid approach with a stack, but also with stack top relative addressing. It is important to keep the hardware simple, but that might not be too hard to implement. Arlet talked about his hybrid processor design with multiple stacks. I would need to give this a bit of thought as the question becomes, what possible advantage would a hybrid processor have over the other two? Actually, the image in my mind is a bit like the C language stack frame model. You can work with addressing relative to the top of stack and when a sub is called it layers its variables on top of the existing stack. That would require a bit of entry and exit code, so there will be a tradeoff between simple register addressing and simple subroutine entry. > Seems to me that one possibility is to have a really functional stack > operation instruction, such that, with the give number of bits, it > allows for the most opertions. Some combination of DUP, SWAP, and POP > all at once. Though that isn't easy for the stack itself. Like many tradeoffs, this can complicate the hardware. If you want to be able to combine an ADD with a SWAP or OVER, you need to be able to access more than two operands on the stack at once. In my designs that means pulling another register out of the proper stack. Rather than one register and a block of memory, each stack would need two registers and the block of memory along with the input multiplexers. This would need to be examined in context of common code to see how much it would help vs the expense of added hardware. I'm still not complete in my analysis of a register based MISC design, but at the moment I think the register approach gives better instruction size efficiency/faster execution as well as simpler hardware design. -- RickArticle: 155036
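The "two registers plus a block of memory" arrangement mentioned above
is worth sketching, because it is what lets an ALU operation be fused
with a stack shuffle. The top two cells live in registers T and N so the
ALU sees both at once; deeper cells sit in a small RAM. The opcode
encoding, widths, and entity name are assumptions for illustration only:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity two_reg_stack is
    port (
        clk : in std_logic;
        op  : in std_logic_vector(1 downto 0);  -- 00 push, 01 add, 10 over-add (assumed)
        din : in unsigned(8 downto 0)
    );
end two_reg_stack;

architecture rtl of two_reg_stack is
    -- top two stack cells in registers, the rest in a small RAM
    signal t, n   : unsigned(8 downto 0) := (others => '0');
    type ram_t is array (0 to 31) of unsigned(8 downto 0);
    signal deeper : ram_t := (others => (others => '0'));
    signal sp     : unsigned(4 downto 0) := (others => '0');
begin
    process(clk)
    begin
        if rising_edge(clk) then
            case op is
                when "00" =>                        -- push: N spills into RAM
                    deeper(to_integer(sp)) <= n;
                    sp <= sp + 1;
                    n  <= t;
                    t  <= din;
                when "01" =>                        -- plain ADD: depth shrinks by one
                    t  <= t + n;
                    n  <= deeper(to_integer(sp - 1));
                    sp <= sp - 1;
                when "10" =>                        -- fused OVER + ADD: keep the second
                    t  <= t + n;                    -- operand, depth unchanged, no RAM access
                when others =>
                    null;                           -- no over/underflow checks in this sketch
            end case;
        end if;
    end process;
end rtl;

With both operands already in flip-flops, the fused OVER-plus-ADD needs
no RAM traffic at all, while the plain ADD has to refill N from the RAM.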
I am working on a project that will involve a large HDMI switch - up to
16 inputs and 16 outputs. We haven't yet decided on the architecture,
but one possibility is to use one or more FPGAs. The FPGAs won't be
doing much other than the switch - there is no video processing going
on. Each HDMI channel will be up to 3.4 Gbps (for HDMI 1.4), with 4
TMDS pairs (3 data and 1 clock). That means 64 pairs in, and 64 pairs
out, all at 3.4 Gbps.

Does anyone know of any FPGA families that might be suitable here? I've
had a little look at Altera (since I've used Altera devices before),
but their low-cost transceivers are at 3.125 Gbps - this means we'd
have to use their mid or high cost devices, and they don't have nearly
enough channels. I don't expect the card to be particularly cheap, but
I'd like to avoid the cost of multiple top-range FPGA devices - then it
would be much cheaper just to have a card with 80 4-to-1 HDMI mux chips.

Thanks for any pointers,
David

Article: 155037
On 3/30/2013 5:54 PM, Rod Pemberton wrote: > "rickman"<gnuarm@gmail.com> wrote in message > news:kj4vae$msi$1@dont-email.me... > >> I have been working with stack based MISC designs in FPGAs for >> some years. All along I have been comparing my work to the work >> of others. These others were the conventional RISC type >> processors supplied by the FPGA vendors as well as the many >> processor designs done by individuals or groups as open source. >> >> So far my CPUs have always ranked reasonably well in terms of >> speed, but more importantly to me, very well in terms of size >> and code density. My efforts have shown it hard to improve on >> code density by a significant degree while simultaneously >> minimizing the resources used by the design. Careful selection >> of the instruction set can both improve code density and >> minimize logic used if measured together, but there is always a >> tradeoff. One can always be improved at the expense of the >> other. >> >> The last couple of days I was looking at some code I plan to use >> and realized that it could be a lot more efficient if I could >> find a way to use more parallelism inside the CPU and use fewer >> instructions. So I started looking at defining separate opcodes >> for the two primary function units in the design, the data stack >> and the return stack. Each has its own ALU. The data stack has >> a full complement of capabilities while the return stack can >> only add, subtract and compare. The return stack is actually >> intended to be an "address" processing unit. >> >> While trying to figure out how to maximize the parallel >> capabilities of these units, I realized that many operations >> were just stack manipulations. Then I read the thread about the >> relative "cost" of stack ops vs memory accesses and I realized >> these were what I needed to optimize. I needed to find a way to >> not use an instruction and a clock cycle for moving data around >> on the stack. >> >> In the thread on stack ops it was pointed out repeatedly that >> very often the stack operands would be optimized to register >> operands, meaning they wouldn't need to do the stack ops at all >> really. So I took a look at a register based MISC design. >> Guess what, I don't see the disadvantage! I have pushed this >> around for a couple of days and although I haven't done a >> detailed design, I think I have looked at it enough to realize >> that I can design a register oriented MISC CPU that will run as >> fast, if not faster than my stack based design and it will use >> fewer instructions. I still need to add some features like >> support for a stack in memory, in other words, >> pre-increment/post-decrement (or the other way around...), but I >> don't see where this is a bad design. It may end up using >> *less* logic as well. My stack design provides access to the >> stack pointers which require logic for both the pointers and >> muxing them into the data stack for reading. >> >> I guess looking at other peoples designs (such as Chuck's) has >> changed my perspective over the years so that I am willing and >> able to do optimizations in ways I would not have wanted to do >> in the past. But I am a bit surprised that there has been so >> much emphasis on stack oriented MISC machines which it may well >> be that register based MISC designs are also very efficient, >> at least if you aren't building them to service a C compiler or >> trying to match some ideal RISC model. 
>> > > Are those your actual results or did you just reiterate what is on > Wikipedia? Yes, that's a serious question. Read the MISC page: > http://en.wikipedia.org/wiki/Minimal_instruction_set_computer > > See ... ?! You clearly don't understand my post or the Wiki article or possibly both. I suggest you reread both and then ask questions if you still don't understand. > Code density is a CISC concept. I don't see how it applies to > your MISC project. Yes, I get that you don't understand. Do you have a specific question? > Increasing code density for a MISC processor > means implementing more powerful instructions, i.e., those that do > more work, while minimizing bytes in the instruction opcode > encoding. Yes, that part you seem to understand. > Even if you implement CISC-like instructions, you can't > forgo the MISC instructions you already have in order to add the > CISC-like instructions. Really? I can't drop the entire instruction set and start over? Who says so? Am I breaking a law? > So, to do that, you'll need to increase > the size of the instruction set, as well as implement a more > complicated instruction decoder. Define "increase the size of the instruction set". I am using a 9 bit opcode for my stack design and am using a similar 9 bit opcode for the register design. In what way is the register design using a larger instruction set? That was exactly the blinder I was wearing until now. I had read a lot about register CPU instruction sets where the intent was not in line with MISC goals. MicroBlaze is a good example. I think it uses well in excess of 1000 LUTs, maybe multiple 1000's. I need something that is much smaller and it hadn't occurred to me that perhaps the common goals of register designs (lots of registers, orthogonality, address mode flexibility, etc) could be limited or even tossed out the window. I don't need a machine that is easy for a C compiler to produce code for. My goals are for minimal hardware without losing any more performance than is essential. In particular I work in FPGAs, so the design needs to work well in that environment. > I.e., that means the processor > will no longer be MISC, but MISC+minimal CISC hybrid, or pure > CISC... Nonsense. "Minimal Instruction Set Computer (MISC) is a processor architecture with a very small number of basic operations and corresponding opcodes." from Wikipedia. BTW, I don't think the term MISC is widely used and is not well defined. This is the only web page I found that even tries to define it. Actually, if you consider only the opcodes and not the operand combinations, I think the register design may have fewer instructions than does the stack design. But the register design still is in work so I'm not done counting yet. There are some interesting corners to be explored. For example a MOV rx,rx is essentially a NOP. There are eight of these. So instead, why not make them useful by clearing the register? So MOV rx,rx is a clear to be given the name CLR rx... unless the rx is r7 in which case it is indeed a MOV r7,r7 which is now a NOP to be coded as such. The CLR r7 is not needed because LIT 0 already does the same job. Even better the opcode for a NOP is 0x1FF or octal 777. That's very easy to remember and recognize. It feels good to find convenient features like this and makes me like the register MISC design. > No offense, but you seem to be "reinventing the wheel" in terms of > microprocessor design. 
You're coming to the same conclusions that > were found in the 1980's, e.g., concluding a register based > machine can perform better than a stack based machine, except > you've applied it to MISC in an FPGA package... How is that a new > conclusion? I don't see how you can say that. I don't know of other MISC designs that are very good. I think the picoBlaze is one, but I don't think it was designed to be very MISC really. It was designed to be small, period. It has a lot of fixed features and so can't be altered without tossing old code. Some of the MicroChip devices might be MISC. I have not worked with them. I might want to take a look actually. There may be useful ideas. But "reinventing the wheel"? I don't think so. -- RickArticle: 155038
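A hedged sketch of the MOV rx,rx aliasing described above. The 3-bit
field layout, register width, and MOV opcode value are assumptions,
chosen only so that MOV r7,r7 comes out as octal 777; they are not the
actual encoding under discussion:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mov_decode_demo is
    port (
        clk   : in std_logic;
        instr : in std_logic_vector(8 downto 0)  -- 9-bit opcode, layout assumed
    );
end mov_decode_demo;

architecture rtl of mov_decode_demo is
    -- register width of 16 bits is assumed for illustration
    type regfile_t is array (0 to 7) of std_logic_vector(15 downto 0);
    signal regs : regfile_t := (others => (others => '0'));
    -- assumed layout: instr(8..6) = operation, instr(5..3) = dst, instr(2..0) = src
    constant OP_MOV : std_logic_vector(2 downto 0) := "111";
begin
    process(clk)
        variable dst, src : integer range 0 to 7;
    begin
        if rising_edge(clk) then
            dst := to_integer(unsigned(instr(5 downto 3)));
            src := to_integer(unsigned(instr(2 downto 0)));
            if instr(8 downto 6) = OP_MOV then
                if dst /= src then
                    regs(dst) <= regs(src);        -- ordinary MOV rx,ry
                elsif dst /= 7 then
                    regs(dst) <= (others => '0');  -- MOV rx,rx reused as CLR rx
                end if;                            -- MOV r7,r7 (octal 777) is the NOP
            end if;
        end if;
    end process;
end rtl;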
In comp.arch.fpga rickman <gnuarm@gmail.com> wrote: > On 3/31/2013 2:34 PM, glen herrmannsfeldt wrote: >> In comp.arch.fpga rickman<gnuarm@gmail.com> wrote: > (snip) >> I have an actual RX2600 dual Itanium box, but don't run it very >> often, mostly because of the power used. > A 100 watt lightbulb uses about $10 a month if left on 24/7. So I > wouldn't worry too much about the cost of running your Itanium. If you > turn it off when you aren't using it I can't imagine it would really > cost anything noticeable to run. It is a dual processor box, plus all the rest of the systems in the box. Yes, I was considering running it all the time, but it is too expensive for that. >> For Itanium, the different units do different things. There are >> instruction formats that divide up the bits in different ways to make >> optimal use of the bits. I used to have the manual nearby, but I don't >> see it right now. > Yes, the array processor I worked on was coded from scratch, very > laboriously. The Itanium is trying to run existing code as fast as > possible. So they have a number of units to do similar things, but also > different types, all working in parallel as much as possible. Also the > parallelism in the array processor was all controlled by the programmer. > In regular x86 processors the parallelism is controlled by the chip > itself. I'm amazed sometimes at just how much they can get the chip to > do, no wonder there are 100's of millions of transistors on the thing. > I assume parallelism in the Itanium is back to the compiler smarts to > control since it needs to be coded into the VLIW instructions. Seems to me that the big problem with the original Itanium was the need to also run x86 code. That delayed the release for some time, and in that time other processors had advanced. I believe that later versions run x86 code in software emulation, maybe with some hardware assist. -- glenArticle: 155039
On 4/1/2013 8:10 PM, Rod Pemberton wrote: > "Arlet Ottens"<usenet+5@c-scape.nl> wrote in message > news:5157e1a1$0$6924$e4fe514c@news2.news.xs4all.nl... >> On 03/30/2013 10:54 PM, Rod Pemberton wrote: > >>>> >>>> I guess looking at other peoples designs (such as Chuck's) >>>> has changed my perspective over the years so that I am >>>> willing and able to do optimizations in ways I would not have >>>> wanted to do in the past. But I am a bit surprised that there >>>> has been so much emphasis on stack oriented MISC machines >>>> which it may well be that register based MISC designs are >>>> also very efficient, at least if you aren't building them to >>>> service a C compiler or trying to match some ideal RISC >>>> model. >>>> >>> >>> Are those your actual results or did you just reiterate what >>> is on Wikipedia? Yes, that's a serious question. Read the >>> MISC page: [link] >>> >>> See ... ?! >> >> It sounds to me rickman is questioning the (unsupported) claims >> on wikipedia that stack based machines have an advantage in size >> and/or simplicity, not reiterating them. >> > > Well, you snipped alot of context. I wouldn't have reformatted > all of it either. > > Anyway, what I took from rickman's statements was that he had only > managed to confirm what is already known about MISC according to > Wikipedia. I.e., what was/is the point? You really need to reread the wiki description of MISC. There seems to be a disconnect. >>> Code density is a CISC concept. I don't see how it applies to >>> your MISC project. Increasing code density for a MISC >>> processor means implementing more powerful instructions, >>> i.e., those that do more work, while minimizing bytes in the >>> instruction opcode encoding. Even if you implement >>> CISC-like instructions, you can't forgo the MISC instructions >>> you already have in order to add the CISC-like instructions. >>> So, to do that, you'll need to increase the size of the >>> instruction set, as well as implement a more complicated >>> instruction decoder. I.e., that means the processor will no >>> longer be MISC, but MISC+minimal CISC hybrid, or pure >>> CISC... >> >> It is perfectly possible to trade one type of MISC processor for >> another one. The choice between stack and register based is an >> obvious one. If you switch from stack to register based, there's >> no need to keep stack manipulation instructions around. >> > > Yes, true. But, what does eliminating MISC stack instructions and > replacing them with MISC register instructions have to do with the > CISC concept of code density or with CISC instructions? ... Weren't you the person who brought CISC into this discussion? Why are you asking this question about CISC? >>> Also, you cross-posted to comp.arch.fpga. While they'll >>> likely be familiar with FPGAs, most there are not going >>> to be familiar with the features of stack-based processors >>> or Forth processors that you discuss indirectly within your >>> post. They might not be familiar with ancient CISC >>> concepts such as "code density" either, or understand why >>> it was important at one point in time. E.g., I suspect this >>> Forth related stuff from above won't be widely >>> understood on c.a.f. without clarification: >> >> The design of simple and compact processors is of great interest >> to many FPGA engineers. Plenty of FPGA designs need some >> sort of control processor, and for cost reduction it's important >> to use minimal resources. 
Like rickman said, this involves a >> careful balance between implementation complexity, speed, >> and code density, while also considering how much work it >> is to write/maintain the software that's running on the >> processor. >> >> Code density is still critically important. Fast memory is >> small, both on FPGA as well as general purpose processors. >> > > So, shouldn't he dump the entire MISC instruction set he has, and > implement a CISC instruction set instead? That's the only way > he's going to get the "critically important" code density, which > I'll take it you rank well above MISC as being important. Of > course, a CISC instruction set requires a more complicated > instruction decoder ... So, it seems that either way he proceeds, > he is being contrained by the "minimal resources" of his FPGA. > That was what he stated: > > "My efforts have shown it hard to improve on code density by a > significant degree while simultaneously minimizing the resources > used by the design." I have "dumped" the stack related portion of the instruction set and replaced it with register references. Why would I do otherwise? > I.e., if the FPGA he is attempting to use is insufficient to do > what he wants or needs, then it's insufficient. Or, he needs some > new techniques. He didn't explicitly ask for any though ... No one said anything about "insufficient". I am looking for an optimal design for a small CPU that can be used efficiently in an FPGA. > Glen mentioned the numerous address modes of the VAX. The 6502 > also had alot of address modes and had instructions which used > zero-page as a set of fast registers. I would think that early, > compact designs like the 6502 and Z80 could be useful to rickman. > They had low transistor counts. > > Z80 8500 transistors > 6502 9000 transistors > 8088 29000 transistors (for comparison...) I'm not counting transistors. That is one of the fallacies of comparing chip designs to FPGA designs. When you have the flexibility of using transistors you can do things in ways that are efficient that are not efficient in the LUTs of an FPGA. The two big constraints are resources used in the FPGA and opcode size. If an instruction would require addition of resources to the design, it needs to justify that addition. An instruction also needs to justify the portion of the opcode space it requires. Operations that specify more than one register become very expensive in terms of opcode bits. Operations that require additional data paths become expensive in terms of resources. Doing as much as possible with as little as possible is what I'm looking for. BTW, the ZPU is a pretty good example of just how small a design can be. But it is very slow. That's the other size of the equation. Improve speed as much as possible while not using more resources than possible. The optimal point depends on the coefficients used... in other words, my judgement. This is not entirely objective. -- RickArticle: 155040
On 4/1/2013 8:54 PM, glen herrmannsfeldt wrote: > > Seems to me that one could still Huffman code the opcode, even > within the MISC concept. That is, use fewer bits for more common > operations, or where it otherwise simplifies the result. > > As someone noted, you can have an N-1 bit load immediate instruction > where N is the instruction size. Yup, I am still using the high bit to indicate an immediate operand. This requires an implied location, so this literal is always in R7, the addressing register. In my stack design it is always the return stack, also the addressing register. Jumps and calls still contain a small literal to be used alone when the relative jump is within range or with the literal instruction when not. I think this is very similar to Huffman encoding. > In one of Knuth's TAOCP books he describes a two instruction computer. > > Seems like if one is interested in MISC, someone should build that one. You can design a one instruction computer, but there is a balance between resources used and the effectiveness of the resulting design. The effectiveness of this sort of design is too low. > Also, maybe a MIX and MMIX machine, maybe the decimal version. > > For those who don't read TAOCP, MIX is defined independent of the > underlying base. Programs in MIXAL are supposed to assemble and > run correctly on hosts that use any base within the range specified. > > Instruction bytes have, if I remember, between 64 and 100 possible > values, such that six bits or two decimal digits are possible > representations. > > I believe that allows for bases 2, 3, 4, 8, 9, 10, and 16. Doesn't sound like an especially practical computer. Has anyone ever built one? -- RickArticle: 155041
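A sketch of the "high bit means literal" decode described above. The
8-bit payload follows from the 9-bit opcode, but the 16-bit width of R7
and the shift-and-merge rule for building longer literals from
consecutive literal instructions are assumptions, not stated details of
the design:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity literal_decode_demo is
    port (
        clk   : in std_logic;
        instr : in std_logic_vector(8 downto 0)
    );
end literal_decode_demo;

architecture rtl of literal_decode_demo is
    signal r7 : unsigned(15 downto 0) := (others => '0');  -- width assumed
begin
    process(clk)
    begin
        if rising_edge(clk) then
            if instr(8) = '1' then
                -- literal instruction: shift the previous contents up and
                -- merge in 8 new bits, so back-to-back literals (or a jump's
                -- small literal plus one LIT) can build a wider value
                r7 <= r7(7 downto 0) & unsigned(instr(7 downto 0));
            end if;
            -- instr(8) = '0' selects the register/ALU opcodes (not shown)
        end if;
    end process;
end rtl;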
On 4/2/2013 2:25 AM, Arlet Ottens wrote: > On 04/02/2013 02:10 AM, Rod Pemberton wrote: >> >> Z80 8500 transistors >> 6502 9000 transistors >> 8088 29000 transistors (for comparison...) > > Low transistor counts do not necessarily translate to low FPGA > resources. Early CPUs used dynamic storage, dual clock latches, pass > logic and tri-state buses to create really small designs that don't > necessarily map well to FPGAs. On the other hand, FPGAs have specific > features (depending on the brand) that can be exploited to create really > tight designs. > > Also, these processors are slow, using multi cycle instructions, and 8 > bit operations. That may not be acceptable. And even if low performance > is acceptable, there are numerous other ways where you can trade speed > for code density, so you'd have to consider these too. For instance, I > can replace pipelining with multi-cycle execution, or use microcode. I think you have a very good handle on the nature of my goals. -- RickArticle: 155042
In comp.arch.fpga rickman <gnuarm@gmail.com> wrote: (snip, I wrote) >> Also, maybe a MIX and MMIX machine, maybe the decimal version. >> For those who don't read TAOCP, MIX is defined independent of the >> underlying base. Programs in MIXAL are supposed to assemble and >> run correctly on hosts that use any base within the range specified. >> Instruction bytes have, if I remember, between 64 and 100 possible >> values, such that six bits or two decimal digits are possible >> representations. >> I believe that allows for bases 2, 3, 4, 8, 9, 10, and 16. > Doesn't sound like an especially practical computer. > Has anyone ever built one? Well, it exists mostly to write the examples (and, I believe, homework problems) in the book. I believe that there have been software emulation, but maybe no hardware (FPGA) versions. MIXAL programs should be base independent, but actual implementations are likely one base. (The model number, 1009 or MIX in roman numerals, is supposed to be the average of the model numbers of some popular machines. There is also the DLX, a RISC machine used by Hennessy and Patterson in their book, which could also be in roman numerals. -- glenArticle: 155043
Article: 155043
On 02/04/13 19:20, glen herrmannsfeldt wrote: > In comp.arch.fpga rickman <gnuarm@gmail.com> wrote: >> On 3/31/2013 2:34 PM, glen herrmannsfeldt wrote: >>> For Itanium, the different units do different things. There are >>> instruction formats that divide up the bits in different ways to make >>> optimal use of the bits. I used to have the manual nearby, but I don't >>> see it right now. > >> Yes, the array processor I worked on was coded from scratch, very >> laboriously. The Itanium is trying to run existing code as fast as >> possible. So they have a number of units to do similar things, but also >> different types, all working in parallel as much as possible. Also the >> parallelism in the array processor was all controlled by the programmer. >> In regular x86 processors the parallelism is controlled by the chip >> itself. I'm amazed sometimes at just how much they can get the chip to >> do, no wonder there are 100's of millions of transistors on the thing. >> I assume parallelism in the Itanium is back to the compiler smarts to >> control since it needs to be coded into the VLIW instructions. > > Seems to me that the big problem with the original Itanium was the > need to also run x86 code. That delayed the release for some time, and > in that time other processors had advanced. I believe that later > versions run x86 code in software emulation, maybe with some hardware > assist. > x86 compatibility was not the "big" problem with the Itanium (though it didn't help). There were two far bigger problems. One was that the chip was targeted at maximising throughput with little regard for power efficiency, since it was for the server market - so all of the logic was running all of the time to avoid latencies, and it had massive caches that ran as fast as possible. The result was that the original devices had a power density exceeding that of the core of a nuclear reactor (it was probably someone from AMD who worked that out...). The big problem, however, is that the idea with VLIW is that the compiler does all the work scheduling instructions in a way that lets them run in parallel. This works in some specialised cases - some DSPs have this sort of architecture, and some types of mathematical algorithms suit it well. But when Intel started work on the Itanium, compilers were not up to the task - Intel simply assumed they would work well enough by the time the chips were ready. Unfortunately for Intel, compiler technology never got there - and in fact, it will never work particularly well for general code. There are too many unpredictable branches and conditionals to predict parallelism at compile time. So most real-world Itanium code uses only about a quarter or so of the processing units in the CPU at any one time (though some types of code can work far better). Thus Itanium chips run at half the real-world speed of "normal" processors, while burning through at least twice the power.
Article: 155044
"rickman" <gnuarm@gmail.com> wrote in message news:kjf48e$5qu$1@dont-email.me... > Weren't you the person who brought CISC into this discussion? Yes. > Why are you asking this question about CISC? You mentioned code density. AISI, code density is purely a CISC concept. They go together and are effectively inseparable. RISC was about effectively using all the processor clock cycles by using fast instructions. RISC wasn't concerned about the encoded size of instructions, how much memory a program consumed, the cost of memory, or how fast memory needed to be. CISC was about reducing memory consumed per instruction. CISC reduced the average size of encoded instructions while also increasing the amount of work each instruction performs. CISC was typically little-endian to reduce the space needed for integer encodings. However, increasing the amount of work per instruction produces highly specialized instructions that are the characteristic of CISC. You only need to look at the x86 instruction set to find some, e.g., STOS, LODS, XLAT, etc. They are also slow to decode and execute as compared to RISC. So, if memory is cheap and fast, there is no point in improving code density, i.e., use RISC. If memory is expensive or slow, use CISC. Arlet mentioned changes to a processor that appeared to me to have nothing to do with increasing or decreasing code density. AISI, the changes he mentioned would only affect what was in the current set of MISC instructions, i.e., either a set of register-based MISC instructions or a set of stack-based MISC instructions. This was stated previously. Rod Pemberton
Article: 155045
"rickman" <gnuarm@gmail.com> wrote in message news:kjeve8$tvm$1@dont-email.me... > On 3/30/2013 5:54 PM, Rod Pemberton wrote: ... > > Even if you implement CISC-like instructions, you can't > > forgo the MISC instructions you already have in order to > > add the CISC-like instructions. > > Really? I can't drop the entire instruction set and start over? > Who says so? Am I breaking a law? > It's just a logical outcome, AISI. The design criteria that you stated was that of producing MISC processors. MISC seems to be purely about minimizing the quantity of instructions. You've produced a MISC processor. So, if you now change your mind about MISC and add additional instructions to your processor's instruction set, especially non-MISC instructions, you're effectively going against your own stated design requirement: MISC, or reducing the quantity of instructions. So, it'd be you who says so, or not... It just seems contradictory for your current self to change course from your past self. > > So, to do that, you'll need to increase > > the size of the instruction set, as well as implement > > a more complicated instruction decoder. > > Define "increase the size of the instruction set". You'll have more instructions in your instruction set. > I am using a 9 bit opcode for my stack design and am > using a similar 9 bit opcode for the register design. In what > way is the register design using a larger instruction set? > You haven't added any additional CISC-like instructions, yet. You just exchanged stack operations for register operations. So, none for now. > > I.e., that means the processor will no longer be MISC, >> but MISC+minimal CISC hybrid, or pure > > CISC... > > Nonsense. "Minimal Instruction Set Computer (MISC) is a > processor architecture with a very small number of basic > operations and corresponding opcodes." from Wikipedia. > BTW, I don't think the term MISC is widely used and is not > well defined. This is the only web page I found that even > tries to define it. > > Actually, if you consider only the opcodes and not the operand > combinations, I think the register design may have fewer > instructions than does the stack design. But the register > design still is in work so I'm not done counting yet. > How exactly do fewer instructions contribute to increased code density for the remaining instructions? The eliminated instructions are no longer a measured component of code density. I.e., they no longer consume memory and therefore aren't measured. > There are some interesting corners to be explored. For example > a MOV rx,rx is essentially a NOP. There are eight of these. So > instead, why not make them useful by clearing the register? > So MOV rx,rx is a clear to be given the name CLR rx... unless > the rx is r7 in which case it is indeed a MOV r7,r7 which is now > a NOP to be coded as such. The CLR r7 is not needed because > LIT 0 already does the same job. Even better the opcode for a > NOP is 0x1FF or octal 777. That's very easy to remember > and recognize. It feels good to find convenient features like > this and makes me like the register MISC design. I can see that you're attempting to minimize the quantity of implemented instructions. Although similar in nature, that's not the same as improving the code density. Are you conflating two different concepts? One of them reduces the encoded size of an instruction, while the other eliminates instructions ... How are you going to attempt to increase the code density for your processor?

1) adding new, additional, more powerful instructions that you don't already have
2) merging existing instructions into fewer instructions
3) finding a more compact method of instruction encoding
4) using little-endian to reduce encoded sizes of integers
5) none or other

I'd think it should be #1 and #3 and #4, or #2 and #3 and #4, or "other" and #3 and #4 ...

Rod Pemberton
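[Editorial sketch, not from the posts above: one way to see the "work per instruction" point behind the x86 string instructions (STOS, LODS, XLAT) mentioned earlier is to spell out what a single one does. Roughly modelling LODSB in C, ignoring segmentation:]

#include <stdint.h>

/* Approximate semantics of the x86 LODSB instruction: load the byte at
 * [SI] into AL, then step SI up or down according to the direction
 * flag.  A load/store RISC would typically spend two instructions on
 * this (a load and an add), which is the code-density trade being
 * discussed in this thread.
 */
struct cpu {
    const uint8_t *si;   /* source index register */
    uint8_t al;          /* low byte of the accumulator */
    int df;              /* direction flag: 0 = increment, 1 = decrement */
};

static void lodsb(struct cpu *c)
{
    c->al = *c->si;              /* the load */
    c->si += c->df ? -1 : 1;     /* the implicit pointer update */
}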
Article: 155046
In comp.arch.fpga Rod Pemberton <do_not_have@notemailnotq.cpm> wrote: > "rickman" <gnuarm@gmail.com> wrote in message > news:kjf48e$5qu$1@dont-email.me... >> Weren't you the person who brought CISC into this discussion? > Yes. >> Why are you asking this question about CISC? > You mentioned code density. AISI, code density is purely a CISC > concept. They go together and are effectively inseparable. They do go together, but I am not so sure that they are inseparable. CISC began when much coding was done in pure assembler, and anything that made that easier was useful. (One should figure out the relative costs, but at least it was in the right direction.) That brought instructions like S/360 EDMK and VAX POLY. (Stories are that on most VAX models, POLY is slower than an explicit loop.) Now, there is no need to waste bits, and so instruction formats were defined to use the available bits. S/360 (and successors) only have three different instruction lengths, and even then sometimes waste bits. The VAX's huge number of different instruction lengths, and also IA32's, does seem to be for code size efficiency. VAX was also defined with a 512 byte page size, even after S/370 had 2K and 4K pages. Way too small, but maybe seemed right at the time. > RISC was about effectively using all the processor clock cycles by > using fast instructions. RISC wasn't concerned about the encoded > size of instructions, how much memory a program consumed, the > cost of memory, or how fast memory needed to be. Yes, but that doesn't mean that CISC is concerned with the size of instructions. > CISC was about reducing memory consumed per instruction. CISC > reduced the average size of encoded instructions while also > increasing the amount of work each instruction performs. Even if it is true (and it probably is) that CISC tends to make efficient use of the bits, that doesn't prove that is what CISC was about. As above, CISC was about making coding easier for programmers, specifically assembly programmers. Now, complex instructions take less space than a series of simpler instructions, but then again one could use a subroutine call. The PDP-10 has the indirect bit, allowing for nested indirection, which may or may not make efficient use of that bit. S/360 uses a sequence of L (load) instructions to do indirection. Instruction usage statistics noting how often L (load) was executed in S/360 code may have been the beginning of RISC. > CISC was typically little-endian to reduce the space needed > for integer encodings. This I don't understand at all. They take the same amount of space. Little endian does make it slightly easier to do a multiword (usually byte-at-a-time) add, and that may have helped for the 6502. It allows one to propagate the carry in the same order one reads bytes from memory. But once you add multiply and divide, the advantage is pretty small. > However, increasing the amount of work per instruction > produces highly specialized instructions that are the > characteristic of CISC. You only need to look at the x86 > instruction set to find some, e.g., STOS, LODS, XLAT, etc. They > are also slow to decode and execute as compared to RISC. Those are not very CISCy compared with some S/360 or VAX instructions. Now, compare to S/360 TR, which will translate by looking up bytes in a lookup table for strings 1 to 256 bytes long. (Unless I remember wrong, XLAT does one byte.) > So, if memory is cheap and fast, there is no point in improving > code density, i.e., use RISC. If memory is expensive or slow, use > CISC.
Well, RISC is more toward using simpler instructions that compilers actually generate and executing them fast. Having one instruction size helps things go fast, and tends to be less efficient with bits. Even so, I believe that you will find that RISC designers try to make efficient use of the bits available, within the single-size-instruction constraint. > Arlet mentioned changes to a processor that appeared to me to have > nothing to do with increasing or decreasing code density. AISI, > the changes he mentioned would only affect what was in the current set > of MISC instructions, i.e., either a set of register-based MISC > instructions or a set of stack-based MISC instructions. This was > stated previously. Decoding multiple different instruction formats tends to require complicated demultiplexers, which are especially hard to do in an FPGA. Even so, one can make efficient use of the bits and still be MISC. -- glen
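[Editorial sketch, not from the post above: the little-endian carry point, written out. On a byte-at-a-time machine such as the 6502, storing the least significant byte at the lowest address lets an add walk forward through memory and carry as it goes.]

#include <stdint.h>
#include <stddef.h>

/* Add two little-endian multi-byte integers of n bytes each.  Because
 * the low byte sits at the lowest address, the loop reads the operands
 * in address order and propagates the carry in that same order, which
 * is exactly how an 8-bit CPU fetches them.
 */
static void add_le(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    unsigned carry = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned s = a[i] + b[i] + carry;
        dst[i] = (uint8_t)s;
        carry = s >> 8;
    }
}

[With a big-endian layout the same loop would have to run from the high address downward, against the natural reading order, which is the small convenience the post refers to.]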
Article: 155047
On 4/3/2013 8:34 PM, Rod Pemberton wrote: > "rickman"<gnuarm@gmail.com> wrote in message > news:kjf48e$5qu$1@dont-email.me... > >> Weren't you the person who brought CISC into this discussion? > > Yes. > >> Why are you asking this question about CISC? > > You mentioned code density. AISI, code density is purely a CISC > concept. They go together and are effectively inseparable. Ok, so that's how you see it. > RISC was about effectively using all the processor clock cycles by > using fast instructions. RISC wasn't concerned about the encoded > size of instructions, how much memory a program consumed, the > cost of memory, or how fast memory needed to be. Don't know why you are even mentioning RISC. > CISC was about reducing memory consumed per instruction. CISC > reduced the average size of encoded instructions while also > increasing the amount of work each instruction performs. CISC was > typically little-endian to reduce the space needed for integer > encodings. However, increasing the amount of work per instruction > produces highly specialized instructions that are the > characteristic of CISC. You only need to look at the x86 > instruction set to find some, e.g., STOS, LODS, XLAT, etc. They > are also slow to decode and execute as compared to RISC. I think CISC was not solely about reducing memory used per instruction. CISC was never a deliberate area of work; the term was not even coined until long after many CISC machines were designed. Most CISC computers were designed with very different goals in mind. For example, the x86, a CISC processor, was initially designed to extend an existing instruction set, first to a 16 bit processor and then to a 32 bit processor. The goal was just to develop an instruction set that was backwards compatible with existing processors while adding capabilities that would make 32 bit processors marketable. > So, if memory is cheap and fast, there is no point in improving > code density, i.e., use RISC. If memory is expensive or slow, use > CISC. LOL, that is a pretty MINIMAL analysis of computers, so I guess it is MACA, Minimal Analysis of Computer Architectures. > Arlet mentioned changes to a processor that appeared to me to have > nothing to do with increasing or decreasing code density. AISI, > the changes he mentioned would only affect what was in the current set > of MISC instructions, i.e., either a set of register-based MISC > instructions or a set of stack-based MISC instructions. This was > stated previously. This is so far out of context I can't comment. -- Rick
Article: 155048
On 4/3/2013 8:35 PM, Rod Pemberton wrote: > "rickman"<gnuarm@gmail.com> wrote in message > news:kjeve8$tvm$1@dont-email.me... >> On 3/30/2013 5:54 PM, Rod Pemberton wrote: > ... > >>> Even if you implement CISC-like instructions, you can't >>> forgo the MISC instructions you already have in order to >>> add the CISC-like instructions. >> >> Really? I can't drop the entire instruction set and start over? >> Who says so? Am I breaking a law? >> > > It's just a logical outcome, AISI. The design criteria that you > stated was that of producing MISC processors. MISC seems to be > purely about the minimizing the quantity of instructions. You've > produced a MISC processor. So, if you now change your mind about > MISC and add additional instructions to your processor's > instruction set, especially non-MISC instructions, you're > effectively going against your own stated design requirement: MISC > or reducing the quantity of instructions. So, it'd be you who > says so, or not... It just seems contradictory with your past > self to change course now with your current self. Your logic seems to be flawed on so many levels. I don't think I stated that producing a MISC processor was a "design criteria". It doesn't even make sense to have that as a "design criteria". I never said I was "adding" instructions to some existing instruction set. In fact, I think I've said that the instruction set for the register based MISC processor is so far, *smaller* than the instruction set for the stack based MISC processor as long as you don't consider each combination of X and Y in MOV rx,ry to be a separate instruction. If you feel each combination is a separate instruction then they both have approximately the same number of instructions since they both have 9 bit instructions and so have 512 possible instructions. >>> So, to do that, you'll need to increase >>> the size of the instruction set, as well as implement >>> a more complicated instruction decoder. >> >> Define "increase the size of the instruction set". > > You'll have more instructions in your instruction set. Sorry, that isn't a good definition because you used part of the term you are defining in the definition. >> I am using a 9 bit opcode for my stack design and am >> using a similar 9 bit opcode for the register design. In what >> way is the register design using a larger instruction set? >> > > You haven't added any additional CISC-like instructions, yet. You > just exchanged stack operations for register operations. So, > none for now. Ok, now we are getting somewhere. In fact, if you read my other posts, you will find that I *won't* be adding any CISC instructions because one of my stated "design criteria" is that each instruction executes in one clock cycle. It's pretty hard to design a simple machine that can do "complex" instructions without executing them in multiple clock cycles. >>> I.e., that means the processor will no longer be MISC, >>> but MISC+minimal CISC hybrid, or pure >>> CISC... >> >> Nonsense. "Minimal Instruction Set Computer (MISC) is a >> processor architecture with a very small number of basic >> operations and corresponding opcodes." from Wikipedia. >> BTW, I don't think the term MISC is widely used and is not >> well defined. This is the only web page I found that even >> tries to define it. >> >> Actually, if you consider only the opcodes and not the operand >> combinations, I think the register design may have fewer >> instructions than does the stack design. 
But the register >> design still is in work so I'm not done counting yet. >> > > How exactly do fewer instructions contribute to increased code > density for the remaining instructions? The eliminated > instructions are no longer a measured component of code density. > I.e., they no longer consume memory and therefore aren't measured. Not sure what you mean here. Code density is how many instructions it takes to do a given amount of work. I measure this by writing code and counting the instructions it takes. Right now I have a section of code I am working on that performs the DDS calculations from a set of control inputs to the DDS. This is what I was working on when I realized that a register based design likely could do this without the stack ops, OVER mainly, but also nearly all the others that just work on the top two stack items. So far it appears the register based instructions are significantly more compact than the stack based instructions. Just as important, the implementation appears to be simpler for the register based design. But that is just *so far*. I am still working on this. The devil is in the details, and I may find some aspects of what I am doing that cause problems and can't be done in the instruction formats I am planning, or something that blows up the hardware to be much bigger than I am picturing at the moment. >> There are some interesting corners to be explored. For example >> a MOV rx,rx is essentially a NOP. There are eight of these. So >> instead, why not make them useful by clearing the register? >> So MOV rx,rx is a clear to be given the name CLR rx... unless >> the rx is r7 in which case it is indeed a MOV r7,r7 which is now >> a NOP to be coded as such. The CLR r7 is not needed because >> LIT 0 already does the same job. Even better the opcode for a >> NOP is 0x1FF or octal 777. That's very easy to remember >> and recognize. It feels good to find convenient features like >> this and makes me like the register MISC design. > I can see that you're attempting to minimize the quantity of > implemented instructions. Although similar in nature, that's not > the same as improving the code density. Are you conflating two > different concepts? One of them reduces the encoded size of an > instruction, while the other eliminates instructions ... You really don't seem to understand what I am doing. You continually misinterpret what I explain. > How are you going to attempt to increase the code density for your > processor? > > 1) adding new, additional, more powerful instructions that you > don't already have > 2) merging existing instructions into fewer instructions > 3) finding a more compact method of instruction encoding > 4) using little-endian to reduce encoded sizes of integers > 5) none or other > > I'd think it should be #1 and #3 and #4, or #2 and #3 and #4, or > "other" and #3 and #4 ... Uh, I am designing an instruction set that does as much as possible with as little hardware as possible. When you say "new, additional" instructions, compared to what? When you say "more compact", again, compared to what exactly? When you say "little-endian" to reduce encoded integer size, what exactly is that? Are you referring to specifying an integer in small chunks so that sign extension allows the specification to be limited in length? Yes, that is done on both the stack and register based designs. Koopman's paper lists literals and calls as some of the most frequently used instructions, so optimizing literals optimizes the most frequently used instructions.
-- Rick
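[Editorial sketch, not from the posts above: how the MOV rx,rx reuse might decode. The 9-bit field layout here, a 3-bit opcode of all ones for MOV followed by 3-bit destination and source fields, is my assumption; it is chosen only because it makes MOV r7,r7 come out as 0x1FF (octal 777), matching the NOP encoding mentioned in the post.]

#include <stdint.h>
#include <stdio.h>

/* Hypothetical decode of a 9-bit MOV instruction:
 *   bits 8..6 = opcode (assumed all ones for MOV)
 *   bits 5..3 = destination register
 *   bits 2..0 = source register
 * MOV rx,rx would otherwise be a no-op, so it is reused as CLR rx,
 * except MOV r7,r7 (0x1FF, octal 777), which stays a genuine NOP
 * because LIT 0 already covers loading a zero.
 */
enum action { ACT_MOV, ACT_CLR, ACT_NOP };

static enum action decode_mov(uint16_t insn, unsigned *dst, unsigned *src)
{
    *dst = (insn >> 3) & 7;
    *src = insn & 7;
    if (*dst != *src)
        return ACT_MOV;     /* ordinary register-to-register move */
    if (*dst == 7)
        return ACT_NOP;     /* MOV r7,r7 kept as the NOP encoding */
    return ACT_CLR;         /* MOV rx,rx reused as CLR rx */
}

int main(void)
{
    unsigned d, s;
    printf("%d\n", decode_mov(0x1FF, &d, &s));   /* 2 = ACT_NOP           */
    printf("%d\n", decode_mov(0x1C0, &d, &s));   /* 1 = ACT_CLR, r0       */
    printf("%d\n", decode_mov(0x1C8, &d, &s));   /* 0 = ACT_MOV, r1 <- r0 */
    return 0;
}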
Article: 155049
On 29/03/2013 21:00, rickman wrote: > I have been working with stack based MISC designs in FPGAs for some > years. All along I have been comparing my work to the work of others. > These others were the conventional RISC type processors supplied by the > FPGA vendors as well as the many processor designs done by individuals > or groups as open source. <snip> Can you achieve interrupt response times on a register-based machine as fast as on a stack machine? OK, shadow registers buy you one fast interrupt, but that's sort of a one-level 2D stack. Even the venerable RTX2000 had an impressive (IIRC) 200ns interrupt response time. Cheers -- Syd