On 04/04/2013 10:38 AM, Syd Rumpo wrote:
> On 29/03/2013 21:00, rickman wrote:
>> I have been working with stack based MISC designs in FPGAs for some
>> years. All along I have been comparing my work to the work of others.
>> These others were the conventional RISC type processors supplied by the
>> FPGA vendors as well as the many processor designs done by individuals
>> or groups as open source.
>
> <snip>
>
> Can you achieve as fast interrupt response times on a register-based
> machine as a stack machine? OK, shadow registers buy you one fast
> interrupt, but that's sort of a one-level 2D stack.
>
> Even the venerable RTX2000 had an impressive (IIRC) 200ns interrupt
> response time.

It depends on the implementation.

The easiest thing would be to not save anything at all before jumping
to the interrupt handler. This would make the interrupt response really
fast, but you'd have to save the registers manually before using them.
It would benefit systems that don't need many (or any) registers in the
interrupt handler. And even saving 4 registers at 100 MHz only takes an
additional 40 ns.

If you have parallel access to the stack/program memory, you could,
like the Cortex, save a few (e.g. 4) registers on the stack while you
fetch the interrupt vector and refill the execution pipeline at the
same time. This adds a considerable bit of complexity, though.

If you keep the register file in a large memory, like an internal block
RAM, you can easily implement multiple sets of shadow registers.

Of course, an FPGA comes with flexible hardware such as large FIFOs, so
you can generally avoid the need for super fast interrupt response. In
fact, you may not even need interrupts at all.
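
(A toy C model of that last idea -- the register file as a small memory
indexed by {bank, register}, so an interrupt switches banks instead of
saving anything. The bank and register counts are made up for
illustration; this is a sketch of the concept, not any particular core.)

    #include <stdint.h>

    #define NBANKS 4   /* shadow sets; assumed, not from the thread */
    #define NREGS  8

    static uint32_t regfile[NBANKS][NREGS];  /* lives in one block RAM */
    static unsigned bank = 0;                /* current shadow set */

    uint32_t reg_read(unsigned r)              { return regfile[bank][r]; }
    void     reg_write(unsigned r, uint32_t v) { regfile[bank][r] = v; }

    /* interrupt entry/exit: no register traffic at all, just a bank switch */
    void irq_enter(void) { bank = (bank + 1) % NBANKS; }
    void irq_exit(void)  { bank = (bank + NBANKS - 1) % NBANKS; }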
Article: 155051

In article <kjin8q$so5$1@speranza.aioe.org>,
glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:
>In comp.arch.fpga Rod Pemberton <do_not_have@notemailnotq.cpm> wrote:
>> "rickman" <gnuarm@gmail.com> wrote in message
>> news:kjf48e$5qu$1@dont-email.me...
>>> Weren't you the person who brought CISC into this discussion?
>
>> Yes.
>
>>> Why are you asking this question about CISC?
>
>> You mentioned code density. AISI, code density is purely a CISC
>> concept. They go together and are effectively inseparable.
>
>They do go together, but I am not so sure that they are inseparable.
>
>CISC began when much coding was done in pure assembler, and anything
>that made that easier was useful. (One should figure out the relative
>costs, but at least it was in the right direction.)

But, of course, this is a fallacy. The same goal is accomplished by
macros, and better. Code density is the only valid reason.

<SNIP>
>
>Decoding multiple different instruction formats tends to require
>complicated demultiplexers which are especially hard to do in
>an FPGA. Even so, one can make efficient use of the bits
>and still be MISC.
>
>-- glen

--
Albert van der Horst, UTRECHT, THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst
Article: 155052

On 04/04/2013 01:16 PM, Albert van der Horst wrote:
>>> You mentioned code density. AISI, code density is purely a CISC
>>> concept. They go together and are effectively inseparable.
>>
>> They do go together, but I am not so sure that they are inseparable.
>>
>> CISC began when much coding was done in pure assembler, and anything
>> that made that easier was useful. (One should figure out the relative
>> costs, but at least it was in the right direction.)
>
> But, of course, this is a fallacy. The same goal is accomplished by
> macros, and better. Code density is the only valid reason.

Speed is another valid reason.
Article: 155053

In comp.arch.fpga Arlet Ottens <usenet+5@c-scape.nl> wrote:
> On 04/04/2013 01:16 PM, Albert van der Horst wrote:
>>>> You mentioned code density. AISI, code density is purely a CISC
>>>> concept. They go together and are effectively inseparable.
>
>>> They do go together, but I am not so sure that they are inseparable.
>
>>> CISC began when much coding was done in pure assembler, and anything
>>> that made that easier was useful. (One should figure out the relative
>>> costs, but at least it was in the right direction.)
>
>> But, of course, this is a fallacy. The same goal is accomplished by
>> macros, and better. Code density is the only valid reason.
>
> Speed is another valid reason.

Presumably some combination of ease of coding, speed, and also Brooks'
"Second System Effect".

Paraphrasing from "The Mythical Man-Month" since I haven't read it
recently: the ideas that designers couldn't implement in the first
system they designed, for cost/efficiency/whatever reasons, come out
in the second system.

Brooks wrote that more for OS/360 (software) than for S/360 (hardware),
but it might still have some effect on the hardware, and maybe also
for VAX.

There are a number of VAX instructions that seem like a good idea, but
as I understand it ended up slower than if done without the special
instructions.

As examples, take both the VAX POLY and INDEX instructions. When VAX
was new, compiled languages (Fortran for example) pretty much never did
array bounds testing. It was just too slow. So VAX supplied INDEX,
which in one instruction did the multiply/add needed for a subscript
calculation (you do one INDEX for each subscript) and also checked that
the subscript was in range. Nice idea, but it seems that even with
INDEX it was still too slow.

Then POLY evaluates a whole polynomial, such as is used to approximate
many mathematical functions, but again, as I understand it, too slow.

Both the PDP-10 and S/360 have the option for an index register on
many instructions, where when register 0 is selected no indexing is
done. VAX instead has indexing as a separate address mode selected by
the address mode byte. Is that the most efficient use for those bits?

-- glen
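
(For readers who haven't met INDEX: roughly, each step range-checks one
subscript and folds it into a running index as
indexout = (indexin + subscript) * size, trapping if the subscript is
outside low..high. A C rendering from memory of the architecture
manual -- treat the exact operand semantics as approximate:)

    #include <stdio.h>
    #include <stdlib.h>

    /* one VAX-style INDEX step, per subscript of a multi-dimensional array */
    long vax_index(long subscript, long low, long high,
                   long size, long indexin)
    {
        if (subscript < low || subscript > high) {
            fprintf(stderr, "subscript range trap\n");
            exit(1);
        }
        return (indexin + subscript) * size;
    }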
Article: 155054

In comp.arch.fpga Syd Rumpo <usenet@nononono.co.uk> wrote:

(snip)
> Can you achieve as fast interrupt response times on a register-based
> machine as a stack machine? OK, shadow registers buy you one fast
> interrupt, but that's sort of a one-level 2D stack.

If you disable interrupts so that another one doesn't come along
before you can save enough state for the first one, yes.

S/360 does it with no stack. You have to have some place in the
low (first 4K) address range to save at least one register.
The hardware saves the old PSW at a fixed (for each type of
interrupt) address, which you also have to move somewhere else
before enabling more interrupts of the same type.

> Even the venerable RTX2000 had an impressive (IIRC) 200ns interrupt
> response time.

-- glen
Article: 155055

On 04/04/2013 02:49 PM, glen herrmannsfeldt wrote:
> In comp.arch.fpga Syd Rumpo <usenet@nononono.co.uk> wrote:
>
> (snip)
>> Can you achieve as fast interrupt response times on a register-based
>> machine as a stack machine? OK, shadow registers buy you one fast
>> interrupt, but that's sort of a one-level 2D stack.
>
> If you disable interrupts so that another one doesn't come along
> before you can save enough state for the first one, yes.
>
> S/360 does it with no stack. You have to have some place in the
> low (first 4K) address range to save at least one register.
> The hardware saves the old PSW at a fixed (for each type of
> interrupt) address, which you also have to move somewhere else
> before enabling more interrupts of the same type.

ARM7 is similar. PC and PSW are copied to registers, and further
interrupts are disabled. The hardware does not touch the stack. If you
want to allow nested interrupts, the programmer is responsible for
saving these registers.

ARM Cortex has changed that, and it saves registers on the stack. This
allows interrupt handlers to be written as regular high-level language
functions, and also allows easy nested interrupts. When dealing with
back-to-back interrupts, the Cortex takes a shortcut and does not
pop/push the registers, but just leaves them on the stack.
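
(Concretely: because a Cortex-M stacks R0-R3, R12, LR, PC and xPSR in
hardware, a handler can be an ordinary C function with no assembler
glue. A minimal sketch -- the SysTick vector name follows the usual
CMSIS convention; the rest is assumed:)

    volatile unsigned ticks;

    /* plain AAPCS C function, pointed to by the vector table entry */
    void SysTick_Handler(void)
    {
        ticks++;    /* no manual register save/restore anywhere */
    }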
Article: 155056

On 4/3/13 11:31 PM, Arlet Ottens wrote:
> On 04/04/2013 10:38 AM, Syd Rumpo wrote:
>> On 29/03/2013 21:00, rickman wrote:
>>> I have been working with stack based MISC designs in FPGAs for some
>>> years. All along I have been comparing my work to the work of others.
>>> These others were the conventional RISC type processors supplied by the
>>> FPGA vendors as well as the many processor designs done by individuals
>>> or groups as open source.
>>
>> <snip>
>>
>> Can you achieve as fast interrupt response times on a register-based
>> machine as a stack machine? OK, shadow registers buy you one fast
>> interrupt, but that's sort of a one-level 2D stack.
>>
>> Even the venerable RTX2000 had an impressive (IIRC) 200ns interrupt
>> response time.
>
> It depends on the implementation.
>
> The easiest thing would be to not save anything at all before jumping
> to the interrupt handler. This would make the interrupt response really
> fast, but you'd have to save the registers manually before using them.
> It would benefit systems that don't need many (or any) registers in the
> interrupt handler. And even saving 4 registers at 100 MHz only takes an
> additional 40 ns.

The best interrupt implementation just jumps to the handler code. The
implementation knows what registers it has to save and restore, which
may be only one or two. Saving and restoring large register files takes
cycles!

> If you have parallel access to the stack/program memory, you could,
> like the Cortex, save a few (e.g. 4) registers on the stack while you
> fetch the interrupt vector and refill the execution pipeline at the
> same time. This adds a considerable bit of complexity, though.
>
> If you keep the register file in a large memory, like an internal block
> RAM, you can easily implement multiple sets of shadow registers.
>
> Of course, an FPGA comes with flexible hardware such as large FIFOs, so
> you can generally avoid the need for super fast interrupt response. In
> fact, you may not even need interrupts at all.

Interrupts are good. I don't know why people worry about them so!

Cheers,
Elizabeth

--
==================================================
Elizabeth D. Rather (US & Canada)    800-55-FORTH
FORTH Inc.                          +1 310.999.6784
5959 West Century Blvd. Suite 700
Los Angeles, CA 90045
http://www.forth.com

"Forth-based products and Services for real-time
applications since 1973."
==================================================
Article: 155057

On Tuesday, April 2, 2013 8:27:07 AM UTC-7, David Brown wrote:
> I am working on a project that will involve a large HDMI switch - up to
> 16 inputs and 16 outputs. We haven't yet decided on the architecture,
> but one possibility is to use one or more FPGAs. The FPGAs won't be
> doing much other than the switch - there is no video processing going on.
>
> Each HDMI channel will be up to 3.4 Gbps (for HDMI 1.4), with 4 TMDS
> pairs (3 data and 1 clock). That means 64 pairs in, and 64 pairs out,
> all at 3.4 Gbps.
>
> Does anyone know of any FPGA families that might be suitable here?
>
> I've had a little look at Altera (since I've used Altera devices
> before), but their low-cost transceivers are at 3.125 Gbps - this means
> we'd have to use their mid or high cost devices, and they don't have
> nearly enough channels. I don't expect the card to be particularly
> cheap, but I'd like to avoid the cost of multiple top-range FPGA devices
> - then it would be much cheaper just to have a card with 80 4-to-1 HDMI
> mux chips.
>
> Thanks for any pointers,
>
> David

You cannot do what you desire in an FPGA, even if one existed with 64
high speed serdes at sufficient speed and cost. What you seek is a
serial crosspoint switch. Look at vendors like Mindspeed.
Article: 155058

On 4/4/2013 8:44 AM, glen herrmannsfeldt wrote:
> In comp.arch.fpga Arlet Ottens<usenet+5@c-scape.nl> wrote:
>> On 04/04/2013 01:16 PM, Albert van der Horst wrote:
>
>>>>> You mentioned code density. AISI, code density is purely a CISC
>>>>> concept. They go together and are effectively inseparable.
>
>>>> They do go together, but I am not so sure that they are inseparable.
>
>>>> CISC began when much coding was done in pure assembler, and anything
>>>> that made that easier was useful. (One should figure out the relative
>>>> costs, but at least it was in the right direction.)
>
>>> But, of course, this is a fallacy. The same goal is accomplished by
>>> macros, and better. Code density is the only valid reason.

Albert, do you have a reference about this?

>> Speed is another valid reason.
>
> Presumably some combination of ease of coding, speed, and also Brooks'
> "Second System Effect".
>
> Paraphrasing from "The Mythical Man-Month" since I haven't read it
> recently: the ideas that designers couldn't implement in the first
> system they designed, for cost/efficiency/whatever reasons, come out
> in the second system.
>
> Brooks wrote that more for OS/360 (software) than for S/360 (hardware),
> but it might still have some effect on the hardware, and maybe also
> for VAX.
>
> There are a number of VAX instructions that seem like a good idea, but
> as I understand it ended up slower than if done without the special
> instructions.
>
> As examples, take both the VAX POLY and INDEX instructions. When VAX
> was new, compiled languages (Fortran for example) pretty much never did
> array bounds testing. It was just too slow. So VAX supplied INDEX,
> which in one instruction did the multiply/add needed for a subscript
> calculation (you do one INDEX for each subscript) and also checked that
> the subscript was in range. Nice idea, but it seems that even with
> INDEX it was still too slow.
>
> Then POLY evaluates a whole polynomial, such as is used to approximate
> many mathematical functions, but again, as I understand it, too slow.
>
> Both the PDP-10 and S/360 have the option for an index register on
> many instructions, where when register 0 is selected no indexing is
> done. VAX instead has indexing as a separate address mode selected by
> the address mode byte. Is that the most efficient use for those bits?

I think you have just described the CISC instruction development
concept. Build a new machine, add some new instructions. No big
rationale, no "CISC" concept, just "let's make it better, why not add
some instructions?"

I believe if you check you will find the term CISC was not even coined
until after RISC was invented. So CISC really just means, "what we used
to do".

--

Rick
Article: 155059

On 04/04/13 21:08, Matt L wrote:
> On Tuesday, April 2, 2013 8:27:07 AM UTC-7, David Brown wrote:
>> I am working on a project that will involve a large HDMI switch - up to
>> 16 inputs and 16 outputs. We haven't yet decided on the architecture,
>> but one possibility is to use one or more FPGAs. The FPGAs won't be
>> doing much other than the switch - there is no video processing going on.
>>
>> Each HDMI channel will be up to 3.4 Gbps (for HDMI 1.4), with 4 TMDS
>> pairs (3 data and 1 clock). That means 64 pairs in, and 64 pairs out,
>> all at 3.4 Gbps.
>>
>> Does anyone know of any FPGA families that might be suitable here?
>>
>> I've had a little look at Altera (since I've used Altera devices
>> before), but their low-cost transceivers are at 3.125 Gbps - this means
>> we'd have to use their mid or high cost devices, and they don't have
>> nearly enough channels. I don't expect the card to be particularly
>> cheap, but I'd like to avoid the cost of multiple top-range FPGA devices
>> - then it would be much cheaper just to have a card with 80 4-to-1 HDMI
>> mux chips.
>>
>> Thanks for any pointers,
>>
>> David
>
> You cannot do what you desire in an FPGA, even if one existed with 64
> high speed serdes at sufficient speed and cost. What you seek is a
> serial crosspoint switch. Look at vendors like Mindspeed.

Thanks for that hint. I got another reply suggesting a crosspoint
switch - I will look at Mindspeed too now.

mvh.,

David
Article: 155060

In comp.arch.fpga rickman <gnuarm@gmail.com> wrote:

(snip, then I wrote)
>> Then POLY evaluates a whole polynomial, such as is used to approximate
>> many mathematical functions, but again, as I understand it, too slow.

>> Both the PDP-10 and S/360 have the option for an index register on
>> many instructions, where when register 0 is selected no indexing is
>> done. VAX instead has indexing as a separate address mode selected by
>> the address mode byte. Is that the most efficient use for those bits?

> I think you have just described the CISC instruction development
> concept. Build a new machine, add some new instructions. No big
> rationale, no "CISC" concept, just "let's make it better, why not add
> some instructions?"

Yes, but remember that there is competition and each has to have
some reason why someone should buy their product. Adding new
instructions was one way to do that.

> I believe if you check you will find the term CISC was not even coined
> until after RISC was invented. So CISC really just means, "what we used
> to do".

Well, yes, but why did "we used to do that"? For S/360, a lot of
software was still written in pure assembler, for one reason to make
it faster, and for another to make it smaller. And people were just
starting to learn that people (writing software) are more expensive
than machines (hardware). Well, that is about the point that it was
true. For earlier machines you were lucky to get one compiler and
enough system to run it.

And VAX was enough later and even more CISCy.

-- glen
Article: 155061

Hi Paul,

It's a plain XC3020. I installed a 2001 Student Edition on my Win7
machine, and it appears to run. All it needed was the license code that
came with the CD-ROM. I'm pretty sure XC3020 and XC3020A differ only in
timing and the same bitfile can work on both. Anyway I'll be finding
out sometime soon and I'll post how it goes.

Thanks all!

--Mike
Article: 155062

On Apr 4, 10:04 pm, rickman <gnu...@gmail.com> wrote:
> On 4/4/2013 8:44 AM, glen herrmannsfeldt wrote:
>
> > In comp.arch.fpga Arlet Ottens<usene...@c-scape.nl> wrote:
> >> On 04/04/2013 01:16 PM, Albert van der Horst wrote:
>
> >>>>> You mentioned code density. AISI, code density is purely a CISC
> >>>>> concept. They go together and are effectively inseparable.
>
> >>>> They do go together, but I am not so sure that they are inseparable.
>
> >>>> CISC began when much coding was done in pure assembler, and anything
> >>>> that made that easier was useful. (One should figure out the relative
> >>>> costs, but at least it was in the right direction.)
>
> >>> But, of course, this is a fallacy. The same goal is accomplished by
> >>> macros, and better. Code density is the only valid reason.
>
> Albert, do you have a reference about this?

Let's take two commonly used S/360 opcodes as an example of CISC; some
move operations: MVC (move 0 to 255 bytes) and MVCL (move 0 to 16M
bytes).

MVC does no padding or truncation. MVCL can pad and truncate, but
unlike MVC will do nothing and report overflow if the operands
overlap. MVC appears to other processors as a single indivisible
operation; every processor (including IO processors) sees storage as
either before the MVC or after it; it's not interruptible. MVCL is
interruptible, and partial results can be observed by other
processors. MVCL requires 4 registers and their contents are updated
after completion of the operation; MVC requires 1 for variable length
moves, 0 for fixed, and its contents are preserved. MVCL has a high
code setup cost; MVC has none.

Writing a macro to do multiple MVCs and mimic the behaviour of MVCL?
Why not? It's possible, if a little tricky. And by all accounts, MVC
in a loop is faster than MVCL too. IBM even provided a macro; $MVCL.

But then, when you look at MVCL usage closely, there are a few
defining characteristics that are very useful. It can zero memory, and
the millicode (IBM's word for microcode) recognizes 4K boundaries for
4K lengths and optimises it; it's faster than 16 MVCs. There's even a
MVPG instruction for moving 4K aligned pages! What are those crazy
instruction set designers thinking?

The answer's a bit more than just code density; it never really was
about that. In all the years I wrote IBM BAL, I never gave code
density a serious thought -- with one exception. That was the 4K base
address limit; a base register could only span 4K, so for code that
was bigger than that, you had to have either fancy register footwork
or waste registers for multiple bases.

It was more about giving assembler programmers choice and variety to
get the best out of the box before the advent of optimising compilers;
a way, if you like, of exposing the potential of the micro/millicode
through the instruction set. "Here I want you to zero memory" meant an
MVCL. "Here I am moving 8 bytes from A to B" meant using MVC. A
knowledgeable assembler programmer could out-perform a compiler.
(Nowadays quality compilers do a much better job of instruction
selection than humans, especially for pipelined processors that
stall.)

Hence CISC instruction sets (at least, IMHO and for IBM). They were
there for people and performance, not for code density.
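
(The $MVCL idea in miniature -- moving an arbitrary count in chunks of
at most 256 bytes, the MVC maximum. A C sketch of the concept only;
this is not IBM's actual macro expansion, and memcpy stands in for one
MVC:)

    #include <string.h>
    #include <stddef.h>

    void mvcl_by_mvc(unsigned char *dst, const unsigned char *src,
                     size_t len)
    {
        while (len > 0) {
            size_t chunk = len > 256 ? 256 : len;  /* one MVC: 1..256 bytes */
            memcpy(dst, src, chunk);               /* one MVC's worth */
            dst += chunk;
            src += chunk;
            len -= chunk;
        }
    }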
Article: 155063

On 4/4/2013 4:38 AM, Syd Rumpo wrote:
> On 29/03/2013 21:00, rickman wrote:
>> I have been working with stack based MISC designs in FPGAs for some
>> years. All along I have been comparing my work to the work of others.
>> These others were the conventional RISC type processors supplied by the
>> FPGA vendors as well as the many processor designs done by individuals
>> or groups as open source.
>
> <snip>
>
> Can you achieve as fast interrupt response times on a register-based
> machine as a stack machine? OK, shadow registers buy you one fast
> interrupt, but that's sort of a one-level 2D stack.
>
> Even the venerable RTX2000 had an impressive (IIRC) 200ns interrupt
> response time.

That's an interesting question. The short answer is yes, but it
requires that I provide circuitry to do two things. One is to save both
the Processor Status Word (PSW) and the return address to the stack in
one cycle. The stack computer has two stacks and I can save these two
items in one clock cycle. Currently my register machine uses a stack in
memory pointed to by a register, so it would require *two* cycles to
save two words. But the memory is dual ported and I can use a tiny bit
of extra logic to save both words at once and bump the pointer by two.

The other task is to save registers. The stack design doesn't really
need to do that; the stack is available for new work and the interrupt
routine just needs to finish with the stack in the same state as when
it started. I've been thinking about how to handle this in the register
machine. The registers are really two registers and one bank of
registers. R6 and R7 are "special" in that they have a separate
incrementer to support the addressing modes. They need a separate write
port so they can be updated in parallel with the other registers. I
have considered "saving" the registers by just bumping the start
address of the registers in the RAM, but that only saves R0-R5. I could
use LUT RAM for R6 and R7 as well. This would provide two sets of
registers for R0-R5 and up to 16 sets for R6 and R7. The imbalance
isn't very useful, but at least there would be a set for the main
program and a set for interrupts, with the caveat that nothing can be
retained between interrupts. This also means interrupts can't be
interrupted other than at specific points where the registers are not
used for storage.

I'm also going to look at using a block RAM for the registers. With
only two read and write ports this makes the multiply step cycle
longer, though.

Once that issue is resolved the interrupt response then becomes the
same as the stack machine - 1 clock cycle or 20 ns.

--

Rick
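
(A behavioral sketch of that one-cycle save: with the stack RAM dual
ported, the PSW and the return address can be written through the two
ports in the same clock while the pointer drops by two. This is my C
model of the idea, not rickman's HDL; the widths and the downward
growth direction are assumed:)

    #include <stdint.h>

    typedef struct {
        uint16_t mem[1024];
        unsigned sp;        /* stack grows downward */
    } stack_t;

    void irq_entry(stack_t *s, uint16_t psw, uint16_t ret_addr)
    {
        /* both writes land in the same cycle, one per RAM port */
        s->mem[s->sp - 1] = psw;
        s->mem[s->sp - 2] = ret_addr;
        s->sp -= 2;         /* single pointer update, bumped by two */
    }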
Article: 155064

On Apr 3, 5:34 pm, "Rod Pemberton" <do_not_h...@notemailnotq.cpm>
wrote:
> CISC was
> typically little-endian to reduce the space needed for integer
> encodings.

As long as you discount IBM mainframes. They are big endian. Or
Burroughs/Unisys; they were big endian too. Or the Motorola 68K; it
was big endian. For little-endian CISC, only the VAX and x86 come to
mind. Of those, only the x86 survives.
Article: 155065

In comp.arch.fpga Alex McDonald <blog@rivadpm.com> wrote:

(snip, someone wrote)
>> >>> But, of course, this is a fallacy. The same goal is accomplished by
>> >>> macros, and better. Code density is the only valid reason.

>> Albert, do you have a reference about this?

> Let's take two commonly used S/360 opcodes as an example of CISC; some
> move operations: MVC (move 0 to 255 bytes) and MVCL (move 0 to 16M
> bytes).

MVC moves 1 to 256 bytes, conveniently. (Unless you want 0.)

> MVC does no padding or truncation. MVCL can pad and truncate, but
> unlike MVC will do nothing and report overflow if the operands
> overlap. MVC appears to other processors as a single indivisible
> operation; every processor (including IO processors) sees storage as
> either before the MVC or after it;

I haven't looked recently, but I didn't think it locked out I/O.
Seems that one of the favorite tricks for S/360 was modifying
channel programs while they are running. (Not to mention self-modifying
channel programs.) Seems that MVC would be convenient for that.
It might be that MVC interlocks on CCW fetch such that only whole
CCWs are fetched, though.

> it's not interruptible. MVCL is
> interruptible, and partial results can be observed by other
> processors. MVCL requires 4 registers and their contents are updated
> after completion of the operation; MVC requires 1 for variable length
> moves, 0 for fixed, and its contents are preserved. MVCL has a high
> code setup cost; MVC has none.

> Writing a macro to do multiple MVCs and mimic the behaviour of MVCL?
> Why not? It's possible, if a little tricky. And by all accounts, MVC
> in a loop is faster than MVCL too. IBM even provided a macro; $MVCL.

> But then, when you look at MVCL usage closely, there are a few
> defining characteristics that are very useful. It can zero memory, and
> the millicode (IBM's word for microcode) recognizes 4K boundaries for
> 4K lengths and optimises it; it's faster than 16 MVCs.

As far as I understand, millicode isn't exactly like microcode,
but does allow for more complicated new instructions to be more
easily implemented.

> There's even a MVPG instruction for moving 4K aligned pages! What are
> those crazy instruction set designers thinking?

> The answer's a bit more than just code density; it never really was
> about that. In all the years I wrote IBM BAL, I never gave code
> density a serious thought -- with one exception. That was the 4K base
> address limit; a base register could only span 4K, so for code that
> was bigger than that, you had to have either fancy register footwork
> or waste registers for multiple bases.

Compared to VAX, S/360 is somewhat RISCy. Note only three different
instruction lengths and, for much of the instruction set, only two
address modes. If processors fast-path the more popular instructions,
like L and even MVC, it isn't so far from RISC.

> It was more about giving assembler programmers choice and variety to
> get the best out of the box before the advent of optimising compilers;

Though stories are that even the old Fortran H could come close to
good assembly programmers, and likely better than the average assembly
programmer.

> a way, if you like, of exposing the potential of the micro/millicode
> through the instruction set. "Here I want you to zero memory" meant an
> MVCL. "Here I am moving 8 bytes from A to B" meant using MVC. A
> knowledgeable assembler programmer could out-perform a compiler.
> (Nowadays quality compilers do a much better job of instruction
> selection than humans, especially for pipelined processors that
> stall.)

For many processors, MVC was much faster on appropriately aligned
data, such as the 8 bytes from A to B. Then again, some might
use LD and STD.

> Hence CISC instruction sets (at least, IMHO and for IBM). They were
> there for people and performance, not for code density.

I noticed some time ago that the hex opcodes for add instructions
end in A, and for divide in D. (That leaves B for subtract and
C for multiply, but not so hard to remember.)

If they really wanted to reduce code size, they should have added
a load indirect register instruction. (RR format.) A good
fraction of L (load) instructions have both base and offset
zero (or, equivalently, index and offset).

-- glen
Article: 155066

On 4/4/2013 5:34 PM, glen herrmannsfeldt wrote:
> In comp.arch.fpga rickman<gnuarm@gmail.com> wrote:
>
> (snip, then I wrote)
>
>>> Then POLY evaluates a whole polynomial, such as is used to approximate
>>> many mathematical functions, but again, as I understand it, too slow.
>
>>> Both the PDP-10 and S/360 have the option for an index register on
>>> many instructions, where when register 0 is selected no indexing is
>>> done. VAX instead has indexing as a separate address mode selected by
>>> the address mode byte. Is that the most efficient use for those bits?
>
>> I think you have just described the CISC instruction development
>> concept. Build a new machine, add some new instructions. No big
>> rationale, no "CISC" concept, just "let's make it better, why not add
>> some instructions?"
>
> Yes, but remember that there is competition and each has to have
> some reason why someone should buy their product. Adding new
> instructions was one way to do that.
>
>> I believe if you check you will find the term CISC was not even coined
>> until after RISC was invented. So CISC really just means, "what we used
>> to do".
>
> Well, yes, but why did "we used to do that"? For S/360, a lot of
> software was still written in pure assembler, for one reason to make
> it faster, and for another to make it smaller. And people were just
> starting to learn that people (writing software) are more expensive
> than machines (hardware). Well, that is about the point that it was
> true. For earlier machines you were lucky to get one compiler and
> enough system to run it.

Sure, none of this stuff was done without some purpose. My point is
that there was no *common* theme to the various CISC instruction sets.
Everybody was doing their own thing until RISC came along with a basic
philosophy. Someone felt the need to give a name to the previous way of
doing things and CISC seemed appropriate. No special meaning in the
name actually, just a contrast to the "Reduced" in RISC.

I don't think this is a very interesting topic really. It started in
response to a comment by Rod.

--

Rick
Article: 155067

On 4/4/2013 7:16 AM, Albert van der Horst wrote:
> In article<kjin8q$so5$1@speranza.aioe.org>,
> glen herrmannsfeldt<gah@ugcs.caltech.edu> wrote:
>> In comp.arch.fpga Rod Pemberton<do_not_have@notemailnotq.cpm> wrote:
>>> "rickman"<gnuarm@gmail.com> wrote in message
>>> news:kjf48e$5qu$1@dont-email.me...
>>>> Weren't you the person who brought CISC into this discussion?
>>
>>> Yes.
>>
>>>> Why are you asking this question about CISC?
>>
>>> You mentioned code density. AISI, code density is purely a CISC
>>> concept. They go together and are effectively inseparable.
>>
>> They do go together, but I am not so sure that they are inseparable.
>>
>> CISC began when much coding was done in pure assembler, and anything
>> that made that easier was useful. (One should figure out the relative
>> costs, but at least it was in the right direction.)
>
> But, of course, this is a fallacy. The same goal is accomplished by
> macros, and better. Code density is the only valid reason.

I'm pretty sure that conclusion is not correct. If you have an
instruction that does two or three memory accesses in one instruction
and you replace it with three instructions that do one memory access
each, you end up with two extra memory accesses. How is this faster?

That is one of the reasons why I want to increase code density; in my
machine it automatically improves execution time as well as reducing
the amount of storage needed.

--

Rick
Article: 155068

On Apr 4, 4:15 pm, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
> In comp.arch.fpga Alex McDonald <b...@rivadpm.com> wrote:
>
> (snip, someone wrote)
>
> >> >>> But, of course, this is a fallacy. The same goal is accomplished by
> >> >>> macros, and better. Code density is the only valid reason.
>
> >> Albert, do you have a reference about this?
>
> > Let's take two commonly used S/360 opcodes as an example of CISC; some
> > move operations: MVC (move 0 to 255 bytes) and MVCL (move 0 to 16M
> > bytes).
>
> MVC moves 1 to 256 bytes, conveniently. (Unless you want 0.)

My bad; the encoding is 0 to 255, but it's interpreted as the encoded
length plus 1.

> > MVC does no padding or truncation. MVCL can pad and truncate, but
> > unlike MVC will do nothing and report overflow if the operands
> > overlap. MVC appears to other processors as a single indivisible
> > operation; every processor (including IO processors) sees storage as
> > either before the MVC or after it;
>
> I haven't looked recently, but I didn't think it locked out I/O.
> Seems that one of the favorite tricks for S/360 was modifying
> channel programs while they are running. (Not to mention self-modifying
> channel programs.) Seems that MVC would be convenient for that.
> It might be that MVC interlocks on CCW fetch such that only whole
> CCWs are fetched, though.

Certainly on the S/360 whole CCWs, since it had a precise interrupt
model and MVC wasn't (and still isn't) interruptible. The S/370 allowed
interrupts on page faults for the target and source, but that is done
before the instruction is executed. IPL operates just like that; it
issues a fixed CCW that reads in data that's a PSW and some CCWs, and
away she goes...

> > it's not interruptible. MVCL is
> > interruptible, and partial results can be observed by other
> > processors. MVCL requires 4 registers and their contents are updated
> > after completion of the operation; MVC requires 1 for variable length
> > moves, 0 for fixed, and its contents are preserved. MVCL has a high
> > code setup cost; MVC has none.
> > Writing a macro to do multiple MVCs and mimic the behaviour of MVCL?
> > Why not? It's possible, if a little tricky. And by all accounts, MVC
> > in a loop is faster than MVCL too. IBM even provided a macro; $MVCL.
> > But then, when you look at MVCL usage closely, there are a few
> > defining characteristics that are very useful. It can zero memory, and
> > the millicode (IBM's word for microcode) recognizes 4K boundaries for
> > 4K lengths and optimises it; it's faster than 16 MVCs.
>
> As far as I understand, millicode isn't exactly like microcode,
> but does allow for more complicated new instructions to be more
> easily implemented.
>
> > There's even a MVPG instruction for moving 4K aligned pages! What are
> > those crazy instruction set designers thinking?
> > The answer's a bit more than just code density; it never really was
> > about that. In all the years I wrote IBM BAL, I never gave code
> > density a serious thought -- with one exception. That was the 4K base
> > address limit; a base register could only span 4K, so for code that
> > was bigger than that, you had to have either fancy register footwork
> > or waste registers for multiple bases.
>
> Compared to VAX, S/360 is somewhat RISCy. Note only three different
> instruction lengths and, for much of the instruction set, only two
> address modes. If processors fast-path the more popular instructions,
> like L and even MVC, it isn't so far from RISC.

A modern Z series has more instructions than the average Forth has
words; it's in the high hundreds.

> > It was more about giving assembler programmers choice and variety to
> > get the best out of the box before the advent of optimising compilers;
>
> Though stories are that even the old Fortran H could come close to
> good assembly programmers, and likely better than the average assembly
> programmer.

Fortran H was a good compiler. The early PL/I was horrible, and there
was a move to use it for systems programming work. I never did so due
to its incredibly bad performance.

> > a way, if you like, of exposing the potential of the micro/millicode
> > through the instruction set. "Here I want you to zero memory" meant an
> > MVCL. "Here I am moving 8 bytes from A to B" meant using MVC. A
> > knowledgeable assembler programmer could out-perform a compiler.
> > (Nowadays quality compilers do a much better job of instruction
> > selection than humans, especially for pipelined processors that
> > stall.)
>
> For many processors, MVC was much faster on appropriately aligned
> data, such as the 8 bytes from A to B. Then again, some might
> use LD and STD.

OK, 9 bytes. :-)

> > Hence CISC instruction sets (at least, IMHO and for IBM). They were
> > there for people and performance, not for code density.
>
> I noticed some time ago that the hex opcodes for add instructions
> end in A, and for divide in D. (That leaves B for subtract and
> C for multiply, but not so hard to remember.)
>
> If they really wanted to reduce code size, they should have added
> a load indirect register instruction. (RR format.) A good
> fraction of L (load) instructions have both base and offset
> zero (or, equivalently, index and offset).

Agreed. And had they wanted to, a single opcode for the standard
prolog & epilog; for example:

    STM   14,12,12(13)     save caller's registers in its save area
    LR    12,15            entry address (R15) becomes the base register
    LA    15,SAVE          address of our own save area
    ST    15,8(13)         forward-chain it from the caller's area
    ST    13,4(15)         back-chain the caller's area from ours
    LR    13,15            make our save area the current one

could have been the single op

    ENTRY SAVE

It was macros every time, which is in direct opposition to Albert's
assertion.

> -- glen
Article: 155069

In article <kjkpnp$qdp$1@dont-email.me>, rickman <gnuarm@gmail.com> wrote:
>On 4/4/2013 8:44 AM, glen herrmannsfeldt wrote:
>> In comp.arch.fpga Arlet Ottens<usenet+5@c-scape.nl> wrote:
>>> On 04/04/2013 01:16 PM, Albert van der Horst wrote:
>>
>>>>>> You mentioned code density. AISI, code density is purely a CISC
>>>>>> concept. They go together and are effectively inseparable.
>>
>>>>> They do go together, but I am not so sure that they are inseparable.
>>
>>>>> CISC began when much coding was done in pure assembler, and anything
>>>>> that made that easier was useful. (One should figure out the relative
>>>>> costs, but at least it was in the right direction.)
>>
>>>> But, of course, this is a fallacy. The same goal is accomplished by
>>>> macros, and better. Code density is the only valid reason.
>
>Albert, do you have a reference about this?

Not in a wikipedia sense, where you're not allowed to mention original
research and are only quoting what is in the books. It is more
experience than school knowledge.

If you want to see what can be done with a good macro processor like
m4, study the one source of the 16/32/64 bit ciforth for x86
linux/Windows/Apple. See my site below.

The existence of an XLAT instruction (to name an example) OTOH does
virtually nothing to make the life of an assembler programmer better.

Groetjes Albert
>
>--
>
>Rick

--
Albert van der Horst, UTRECHT, THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst
Article: 155070

In comp.arch.fpga Alex McDonald <blog@rivadpm.com> wrote:
> On Apr 3, 5:34 pm, "Rod Pemberton" <do_not_h...@notemailnotq.cpm>
> wrote:
>> CISC was
>> typically little-endian to reduce the space needed for integer
>> encodings.

> As long as you discount IBM mainframes. They are big endian. Or
> Burroughs/Unisys; they were big endian too. Or the Motorola 68K; it
> was big endian. For little-endian CISC, only the VAX and x86 come to
> mind. Of those, only the x86 survives.

To me, the only one where little endian seems reasonable is the 6502.
They did amazingly well with a small number of gates.

Note, for one, that on subroutine call the 6502 doesn't push the
address of the next instruction on the stack. That would have taken
too much logic. It pushes the address minus one, as, it seems, that is
what is in the register at the time. RTS adds one after the pop.

Two byte addition is slightly easier in little endian order, but only
slightly. It doesn't help at all for multiply and divide.

VAX was little endian because the PDP-11 was, though I am not sure
that there was a good reason for that.

-- glen
Article: 155071

On Apr 4, 9:10 pm, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
> VAX was little endian because the PDP-11 was, though I am not sure
> that there was a good reason for that.
>
> -- glen

John Savard's take on it:
http://www.plex86.org/Computer_Folklore/Little-Endian-1326.html
Article: 155072

On Apr 5, 12:30 am, rickman <gnu...@gmail.com> wrote:
> On 4/4/2013 5:34 PM, glen herrmannsfeldt wrote:
>
> > In comp.arch.fpga rickman<gnu...@gmail.com> wrote:
>
> > (snip, then I wrote)
>
> >>> Then POLY evaluates a whole polynomial, such as is used to approximate
> >>> many mathematical functions, but again, as I understand it, too slow.
>
> >>> Both the PDP-10 and S/360 have the option for an index register on
> >>> many instructions, where when register 0 is selected no indexing is
> >>> done. VAX instead has indexing as a separate address mode selected by
> >>> the address mode byte. Is that the most efficient use for those bits?
>
> >> I think you have just described the CISC instruction development
> >> concept. Build a new machine, add some new instructions. No big
> >> rationale, no "CISC" concept, just "let's make it better, why not add
> >> some instructions?"
>
> > Yes, but remember that there is competition and each has to have
> > some reason why someone should buy their product. Adding new
> > instructions was one way to do that.
>
> >> I believe if you check you will find the term CISC was not even coined
> >> until after RISC was invented. So CISC really just means, "what we used
> >> to do".
>
> > Well, yes, but why did "we used to do that"? For S/360, a lot of
> > software was still written in pure assembler, for one reason to make
> > it faster, and for another to make it smaller. And people were just
> > starting to learn that people (writing software) are more expensive
> > than machines (hardware). Well, that is about the point that it was
> > true. For earlier machines you were lucky to get one compiler and
> > enough system to run it.
>
> Sure, none of this stuff was done without some purpose. My point is
> that there was no *common* theme to the various CISC instruction sets.
> Everybody was doing their own thing until RISC came along with a basic
> philosophy. Someone felt the need to give a name to the previous way of
> doing things and CISC seemed appropriate. No special meaning in the
> name actually, just a contrast to the "Reduced" in RISC.
>
> I don't think this is a very interesting topic really. It started in
> response to a comment by Rod.
>
> --
>
> Rick

Exactly. The CISC moniker was simply applied to an entire *generation*
of processors by the RISC guys. The term RISC was coined by David
Patterson at the University of California between 1980 and 1984.

http://en.wikipedia.org/wiki/Berkeley_RISC

*Until* that time, there was no need for the term "CISC", because
there was no RISC concept that required a differentiation!

I always thought of the Z80 as a CISC 8-bitter; it has some useful
memory move instructions (LDIR etc).
Article: 155073

On Apr 5, 1:07 am, rickman <gnu...@gmail.com> wrote:
> On 4/4/2013 7:16 AM, Albert van der Horst wrote:
>
> > In article<kjin8q$so...@speranza.aioe.org>,
> > glen herrmannsfeldt<g...@ugcs.caltech.edu> wrote:
> >> In comp.arch.fpga Rod Pemberton<do_not_h...@notemailnotq.cpm> wrote:
> >>> "rickman"<gnu...@gmail.com> wrote in message
> >>> news:kjf48e$5qu$1@dont-email.me...
> >>>> Weren't you the person who brought CISC into this discussion?
>
> >>> Yes.
>
> >>>> Why are you asking this question about CISC?
>
> >>> You mentioned code density. AISI, code density is purely a CISC
> >>> concept. They go together and are effectively inseparable.
>
> >> They do go together, but I am not so sure that they are inseparable.
>
> >> CISC began when much coding was done in pure assembler, and anything
> >> that made that easier was useful. (One should figure out the relative
> >> costs, but at least it was in the right direction.)
>
> > But, of course, this is a fallacy. The same goal is accomplished by
> > macros, and better. Code density is the only valid reason.
>
> I'm pretty sure that conclusion is not correct. If you have an
> instruction that does two or three memory accesses in one instruction
> and you replace it with three instructions that do one memory access
> each, you end up with two extra memory accesses. How is this faster?
>
> That is one of the reasons why I want to increase code density; in my
> machine it automatically improves execution time as well as reducing
> the amount of storage needed.
>
> --
>
> Rick

I think you're on the right track. With FPGAs it's really quite simple
to execute all instructions in a single cycle. It's no big deal at all
- with MPY and DIV being exceptions. In the 'Forth CPU world' even
literals can be loaded in a single cycle. It then comes down to
careful selection of your instruction set. With a small enough
instruction set one can pack more than one instruction in a word - and
there's your code density. If you can pack more than one instruction
in a word, you can execute them in a single clock cycle. With added
complexity, you may even be able to execute them in parallel rather
than as a process.
Article: 155074

On 04/05/2013 09:51 AM, Mark Wills wrote:
>> I'm pretty sure that conclusion is not correct. If you have an
>> instruction that does two or three memory accesses in one instruction
>> and you replace it with three instructions that do one memory access
>> each, you end up with two extra memory accesses. How is this faster?
>>
>> That is one of the reasons why I want to increase code density; in my
>> machine it automatically improves execution time as well as reducing
>> the amount of storage needed.
>
> I think you're on the right track. With FPGAs it's really quite simple
> to execute all instructions in a single cycle. It's no big deal at all
> - with MPY and DIV being exceptions. In the 'Forth CPU world' even
> literals can be loaded in a single cycle. It then comes down to
> careful selection of your instruction set. With a small enough
> instruction set one can pack more than one instruction in a word - and
> there's your code density. If you can pack more than one instruction
> in a word, you can execute them in a single clock cycle. With added
> complexity, you may even be able to execute them in parallel rather
> than as a process.

Multiple instructions per word sounds like a bad idea. It requires
instructions that are so small that they can't do very much, so you
need more of them. And if you need 2 or more small instructions to do
whatever 1 big instruction does, it's better to use 1 big instruction,
since it makes instruction decoding more efficient and simpler.
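
(To make the packing under discussion concrete: a C sketch of decoding
a 16-bit word that holds three 5-bit MISC slots, filled from the top
bit downward. The encoding is invented for illustration and is not any
shipped Forth CPU:)

    #include <stdint.h>
    #include <stdio.h>

    enum { OP_NOP, OP_DUP, OP_ADD, OP_FETCH, OP_STORE, OP_EXIT };

    void run_word(uint16_t w)   /* bit 15 is spare in this made-up format */
    {
        for (int slot = 0; slot < 3; slot++) {
            unsigned op = (w >> (10 - 5 * slot)) & 0x1f;
            printf("slot %d: opcode %u\n", slot, op);
            /* a real core would dispatch here: one slot per clock,
               or, as Mark suggests, all slots in parallel */
        }
    }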