Question to the programming types: Ever seen a signed logical or arithmetic shift distance before? Hive shift distances are signed, which works out quite nicely (the basic shift is shift left, with negative shift distances performing right shifts). This is something I haven't encountered in any opcode listings I've had the pleasure to peruse, so I'm wondering if it is kind of new-ish.Article: 155426
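For readers who want to see the idea concretely, here is a minimal C sketch of signed-shift-distance semantics as Eric describes them (positive distance shifts left, negative shifts right); the function names, the 32-bit width, and the out-of-range behavior are illustrative assumptions, not Hive's actual opcode definitions.

    #include <stdint.h>

    /* Logical shift with a signed distance: positive d shifts left,
       negative d shifts right (illustrative, not Hive's exact rules). */
    static uint32_t shl_signed(uint32_t x, int d)
    {
        if (d >= 32 || d <= -32)           /* assume out-of-range gives 0 */
            return 0;
        return (d >= 0) ? (x << d) : (x >> -d);
    }

    /* Arithmetic variant: right shifts replicate the sign bit.
       Assumes the compiler implements >> on signed ints as an
       arithmetic shift, as mainstream compilers do. */
    static int32_t sha_signed(int32_t x, int d)
    {
        if (d >= 32)  return 0;
        if (d <= -32) return (x < 0) ? -1 : 0;
        return (d >= 0) ? (int32_t)((uint32_t)x << d) : (x >> -d);
    }

A single opcode then covers both shift directions, which is part of the appeal described above.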
Eric Wallin <tammie.eric@gmail.com> wrote: > Question to the programming types: > Ever seen a signed logical or arithmetic shift distance before? > Hive shift distances are signed, which works out quite nicely > (the basic shift is shift left, with negative shift distances > performing right shifts). > This is something I haven't encountered in any opcode listings > I've had the pleasure to peruse, so I'm wondering if it is > kind of new-ish. PDP-10 has signed shifts. The manual is available on bitsavers, such as: AA-H391A-TK_DECsystem-10_DECSYSTEM-20_Processor_Reference_Jun1982.pdf Shifts use a signed 9 bit value from the computed effective address. -- glenArticle: 155427
On Wednesday, June 26, 2013 2:07:59 PM UTC-4, glen herrmannsfeldt wrote: > PDP-10 has signed shifts. Thanks for that Glen!Article: 155428
> Thomas, with your experience with the ERIC5 series, do you see anything obviously missing from the Hive instruction set? What do you think of the literal sizing? I just took a quick look at your document (time is limited...). What I like is the concept of "in-line" literals. A good extension would be to have the same concept also for calls and jumps (i.e. so you do not have to load the destination address into a register first) and maybe also other instructions that can work with literals. I also think that you leave some bits unused: e.g. the byt instruction does not use register B, so you would have 3 additional bits in the opcode to make it possible to have an 11b literal instead of an 8b literal (or you could use these 3 bits for other purposes, e.g. A = A + lit8). What others already mentioned is the restricted code-space, but without a C-compiler this will never become a real issue ;-) For your desired application, you could maybe think of options to reduce the resource usage. BTW: The bad habit of Quartus of replacing flip-flop chains with memories (you mentioned this somewhere in your document) can be disabled by turning off "auto replace shift registers" somewhere in the synthesis settings of Quartus. Regards, Thomas www.entner-electronics.comArticle: 155429
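To make the bit-budget point concrete, here is a small C sketch of pulling a literal out of a 16-bit opcode word; the field layout is invented purely for illustration (it is not Hive's real encoding), but it shows how 3 bits freed from an unused register-select field widen an 8-bit literal to 11 bits, as Thomas suggests.

    #include <stdint.h>

    /* Hypothetical 16-bit instruction layout, NOT Hive's actual one:
       [15:8] 8-bit literal | [7:5] B register select | [4:0] opcode.
       If an instruction never reads B, bits [7:5] can extend the
       literal field. */
    static int32_t lit8(uint16_t op)      /* sign-extended  8-bit literal */
    {
        return (int8_t)(op >> 8);
    }

    static int32_t lit11(uint16_t op)     /* sign-extended 11-bit literal */
    {
        int32_t v = (op >> 5) & 0x7FF;            /* bits [15:5]          */
        return (v & 0x400) ? (v - 0x800) : v;     /* sign-extend bit 10   */
    }

The payoff in range: an 8-bit literal covers -128..127, while the 11-bit version covers -1024..1023.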
>Rick, > >Guy does not get advertising $ when people use comp.arch.fpga. > >Andy > Quite so. I feel that 'FPGARelated.com' adds some value, so I don't begrudge Stephane his advertising revenue. --------------------------------------- Posted through http://www.FPGARelated.comArticle: 155430
On Wednesday, June 26, 2013 5:44:39 PM UTC-4, thomas....@gmail.com wrote: > I just took a quick look at your document (time is limited...). What I like is the concept of "in-line" literals. A good extension would be to have the same concept also for calls and jumps (i.e. so you do not have to load the destination address into a register first) and maybe also other instructions that can work with literals. I also think that you leave some bits unused: e.g. the byt instruction does not use register B, so you would have 3 additional bits in the opcode to make it possible to have an 11b literal instead of an 8b literal (or you could use these 3 bits for other purposes, e.g. A = A + lit8) Oooh, very nice idea, thanks so much! I gave this some thought and even found some space to shoehorn some opcodes in, but the lit has to come from the data memory port and go back into the control ring to offset / replace the PC, and this would require some combinatorial logic in front of the program memory address port which could slow the entire thing down. I'll definitely give it a try though. I'm kind of against invading the B stack index/pop for other things; having it always present allows for concurrent stack cleanup. > What others already mentioned is the restricted code-space, but without a C-compiler this will never become a real issue ;-) Hive could be easily edited to have 32 bit addresses, but the use of BRAM for small processor main memory is likely an even stronger restriction on code-space, which is why I don't feel the need for anything beyond 16 bits. > For your desired application, you could maybe think of options to reduce the resource usage. BTW: The bad habit of Quartus of replacing flip-flop chains with memories (you mentioned this somewhere in your document) can be disabled by turning off "auto replace shift registers" somewhere in the synthesis settings of Quartus. Using the "speed" optimization technique for analysis and synthesis avoids this as well.Article: 155431
In comp.arch.fpga Andrew Haley <andrew29@littlepinkcloud.invalid> wrote: > In comp.lang.forth Paul Rubin <no.email@nospam.invalid> wrote: > > More than 50% of SIM cards deployed in 2011 run Java Card > > OK, but that's hardly "most Java", unless you're just counting the > number of virtual machines that might run at some point. Java Card isn't the JVM - it's Java compiled down to whatever CPU is on the card. TheoArticle: 155432
Guy Eschemann <Guy.Eschemann@gmail.com> writes: > This is an honest attempt at creating a friendly, vendor-independent > discussion space where FPGA developers can share their knowledge. A > bit like comp.arch.fpga was 15 years ago. People are moving away from > newsgroups anyway, so I'd rather have them join FPGA Exchange than > some random LinkedIn group. Does your "modern" platform provide any control for the user to choose what content he wants or doesn't want to see? Sort of like what we've had on Usenet since the 1990s, sorting, threading, scoring... As far as LinkedIn goes I don't think it's going to be a discussion platform. In fact, I've been surprised at the lack of discussion on LinkedIn in the various FPGA-related groups. Other than "please do my homework" and "what book / what eval kit should I buy" from extreme beginners and "please read my blog" and some job ads, it's been pretty quiet. Although I have to admit I wouldn't have known about Arrow's cheap Cyclone V SoC trainings this summer if it weren't for LinkedIn.Article: 155433
<snip> >As far as LinkedIn goes I don't think it's going to be a discussion >platform. In fact, I've been surprised at the lack of discussion on >LinkedIn in the various FPGA-related groups. Other than "please do my >homework" and "what book / what eval kit should I buy" from extreme >beginners and "please read my blog" and some job ads, it's been pretty >quiet. Although I have to admit I wouldn't have known about Arrow's >cheap Cyclone V SoC trainings this summer if it weren't for LinkedIn. > I agree with you, the signal-to-noise ratio in the FPGA- and VHDL-related groups that I belong to is rather poor. I quit one group because it was all job-related and shameless self-promotion. I occasionally post to advise people against doing something obviously really wrong, but I don't expect to learn anything worthwhile in any of their groups. --------------------------------------- Posted through http://www.FPGARelated.comArticle: 155434
<snip> >Mind you, I'd *love* to see a radical overhaul of traditional >multicore processors so they took the form of > - a large number of processors > - each with completely independent memory > - connected by message passing fifos > >In the long term that'll be the only way we can continue >to scale individual machines: SMP scales for a while, but >then cache coherence requirements kill performance. > Transputer? http://en.wikipedia.org/wiki/Transputer --------------------------------------- Posted through http://www.FPGARelated.comArticle: 155435
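For what the "independent processors connected by message-passing FIFOs" style looks like in code, here is a hedged pthreads sketch in C; the two threads share one address space (unlike the independent-memory hardware being imagined), but the discipline of communicating only through a bounded FIFO is the point, and all names here are illustrative.

    #include <pthread.h>
    #include <stdio.h>

    /* Tiny single-producer/single-consumer FIFO used as the only
       communication channel between two otherwise independent workers. */
    #define FIFO_LEN 16
    typedef struct {
        int buf[FIFO_LEN];
        int head, tail;
        pthread_mutex_t lock;
        pthread_cond_t nonempty, nonfull;
    } fifo_t;

    static void fifo_init(fifo_t *f) {
        f->head = f->tail = 0;
        pthread_mutex_init(&f->lock, NULL);
        pthread_cond_init(&f->nonempty, NULL);
        pthread_cond_init(&f->nonfull, NULL);
    }

    static void fifo_put(fifo_t *f, int v) {
        pthread_mutex_lock(&f->lock);
        while ((f->head + 1) % FIFO_LEN == f->tail)   /* full: block */
            pthread_cond_wait(&f->nonfull, &f->lock);
        f->buf[f->head] = v;
        f->head = (f->head + 1) % FIFO_LEN;
        pthread_cond_signal(&f->nonempty);
        pthread_mutex_unlock(&f->lock);
    }

    static int fifo_get(fifo_t *f) {
        pthread_mutex_lock(&f->lock);
        while (f->head == f->tail)                    /* empty: block */
            pthread_cond_wait(&f->nonempty, &f->lock);
        int v = f->buf[f->tail];
        f->tail = (f->tail + 1) % FIFO_LEN;
        pthread_cond_signal(&f->nonfull);
        pthread_mutex_unlock(&f->lock);
        return v;
    }

    static fifo_t chan;

    static void *producer(void *arg) {                /* "processor" 1 */
        (void)arg;
        for (int i = 0; i < 10; i++)
            fifo_put(&chan, i * i);
        fifo_put(&chan, -1);                          /* end-of-stream */
        return NULL;
    }

    static void *consumer(void *arg) {                /* "processor" 2 */
        (void)arg;
        for (int v; (v = fifo_get(&chan)) != -1; )
            printf("got %d\n", v);
        return NULL;
    }

    int main(void) {
        pthread_t p, c;
        fifo_init(&chan);
        pthread_create(&p, NULL, producer, NULL);
        pthread_create(&c, NULL, consumer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }

Scaling the same pattern up is essentially the Transputer/Occam model being discussed: many such workers, each with private state, with channels as the only coupling.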
On 28/06/13 10:09, RCIngham wrote: > <snip> > >> Mind you, I'd *love* to see a radical overhaul of traditional >> multicore processors so they took the form of >> - a large number of processors >> - each with completely independent memory >> - connected by message passing fifos >> >> In the long term that'll be the only way we can continue >> to scale individual machines: SMP scales for a while, but >> then cache coherence requirements kill performance. >> > > Transputer? > http://en.wikipedia.org/wiki/Transputer It had a lot going for it, but was a too dogmatic about the development environment. At the time it was respectably fast, but that wasn't sufficient -- particularly since there was so much scope for increasing speed of uniprocessor machines. Given that uniprocessors have hit a wall, transputer *concepts* embodied in a completely different form might begin to be fashionable again. It would also help if people can decide that reliability is important, and that bucketfuls of salt should be on hand when listening to salesman's protestations that "the software/hardware framework takes care of all of that so you don't have to worry".Article: 155436
On Wednesday, June 26, 2013 5:44:39 PM UTC-4, thomas....@gmail.com wrote: > I just took a quick look at your document (time is limited...). What I like is the concept of "in-line" literals. A good extension would be to have the same concept also for calls and jumps (i.e. so you do not have to load the destination address into a register first) and maybe also other instructions that can work with literals. I also think that you leave some bits unused: e.g. the byt instruction does not use register B, so you would have 3 additional bits in the opcode to make it possible to have an 11b literal instead of an 8b literal (or you could use these 3 bits for other purposes, e.g. A = A + lit8) After looking into this yesterday I don't think I'll do it. The in-line value has to be retrieved before it can be used to offset or replace the PC, which is one clock too late for the way the pipeline is currently configured. Using it in other ways like adding wouldn't work unless I used a separate adder, as the ALU add/subtract happens fairly early in the pipe. But I really appreciate this excellent suggestion Thomas, and for the time you took to read my paper!Article: 155437
On 6/28/2013 3:55 AM, RCIngham wrote: > <snip> > >> As far as LinkedIn goes I don't think it's going to be a discussion >> platform. In fact, I've been surprised at the lack of discussion on >> LinkedIn in the various FPGA-related groups. Other than "please do my >> homework" and "what book / what eval kit should I buy" from extreme >> beginners and"please read my blog" and some job ads, it's been pretty >> quiet. Although I have to admit I wouldn't have known about Arrow's >> cheap Cyclone V SoC trainings this summer if it weren't for LinkedIn. >> > > I agree with you, the signal-to-noise ratio in the FPGA- and VHDL-related > groups that I belong to is rather poor. I quit one group because it was all > job-related and shameless self-promotion. I occasionally post to advise > people against doing something obviously really wrong, but I don't expect > to learn anything worthwhile in any of their groups. I am in a few groups at Linkedin and I find an interesting discussion now and again. There was a rather long one on MISC and Forth, but that discussion had a poor SNR (at least from my viewpoint). There is an interesting one in one of the FPGA groups where someone is writing training materials and seems to be doing something worthwhile. I haven't figured out how he is presenting the materials though. This group, comp.arch.fpga is not so bad, but a lot of usenet groups are pretty poor SNR too. Not that they don't have much content, but they can have *so much* noise. Actually it is more like SDR, signal to drama ratio. lol -- RickArticle: 155438
On 6/28/2013 5:33 AM, Tom Gardner wrote: > On 28/06/13 10:09, RCIngham wrote: >> <snip> >> >>> Mind you, I'd *love* to see a radical overhaul of traditional >>> multicore processors so they took the form of >>> - a large number of processors >>> - each with completely independent memory >>> - connected by message passing fifos >>> >>> In the long term that'll be the only way we can continue >>> to scale individual machines: SMP scales for a while, but >>> then cache coherence requirements kill performance. >>> >> >> Transputer? >> http://en.wikipedia.org/wiki/Transputer > > It had a lot going for it, but was a too dogmatic about > the development environment. You mean 'C'? I worked on a large transputer oriented project and they used ANSI 'C' rather than Occam. It got the job done... or should I say "jobs"? > At the time it was respectably > fast, but that wasn't sufficient -- particularly since there > was so much scope for increasing speed of uniprocessor > machines. > > Given that uniprocessors have hit a wall, transputer > *concepts* embodied in a completely different form > might begin to be fashionable again. You mean like 144 transputers on a single chip? I"m not sure where processing is headed. I actually just see confusion ahead as all of the existing methods seem to have come to a steep incline if not a brick wall. It may be time for something completely different. > It would also help if people can decide that reliability > is important, and that bucketfuls of salt should be > on hand when listening to salesman's protestations that > "the software/hardware framework takes care of all of > that so you don't have to worry". What? Since when did engineers listen to salesmen? -- RickArticle: 155439
On 28/06/13 15:52, rickman wrote: > On 6/28/2013 5:33 AM, Tom Gardner wrote: >> On 28/06/13 10:09, RCIngham wrote: >>> <snip> >>> >>>> Mind you, I'd *love* to see a radical overhaul of traditional >>>> multicore processors so they took the form of >>>> - a large number of processors >>>> - each with completely independent memory >>>> - connected by message passing fifos >>>>can >>>> In the long term that'll be the only way we can continue >>>> to scale individual machines: SMP scales for a while, but >>>> then cache coherence requirements kill performance. >>>> >>> >>> Transputer? >>> http://en.wikipedia.org/wiki/Transputer >> >> It had a lot going for it, but was a too dogmatic about >> the development environment. > > You mean 'C'? I worked on a large transputer oriented project and they used ANSI 'C' rather than Occam. It got the job done... or should I say "jobs"? I only looked at the Transputer when it was Occam only. I liked Occam as an academic language, but at that time it would have been a bit of a pain to do any serious engineering; ISTR anything other than primitive types weren't supported in the language. IIRC that was ameliorated later, but by then the opportunity for me (and Inmos) had passed. I don't know how C fitted onto the Transputer, but I'd only have been interested if "multithreaded" (to use the term loosely) code could have been expressed reasonably easily. Shame, I'd have loved to use it. >> At the time it was respectably >> fast, but that wasn't sufficient -- particularly since there >> was so much scope for increasing speed of uniprocessor >> machines. >> >> Given that uniprocessors have hit a wall, transputer >> *concepts* embodied in a completely different form >> might begin to be fashionable again. > > You mean like 144 transputers on a single chip? Or Intel's 80 cored chip :) > I"m not sure where processing is headed. Not that way! Memory bandwidth and latency are key issues - but you knew that! > I actually just see confusion ahead as all of the existing methods seem to have come to a steep incline if > not a brick wall. It may be time for something completely different. Precisely. My bet is that message passing between independent processor+memory systems has the biggest potential. It matches nicely onto many forms of event-driven industrial and financial applications and, I am told, onto significant parts of HPC. It is also relatively easy to comprehend and debug. The trick will be to get the sizes of the processor + memory + computation "just right". And desktop/GUI doesn't match that. >> It would also help if people can decide that reliability >> is important, and that bucketfuls of salt should be >> on hand when listening to salesman's protestations that >> "the software/hardware framework takes care of all of >> that so you don't have to worry". > > What? Since when did engineers listen to salesmen? Since their PHBs get taken out to the golf course to chat about sport by the salesmen :(Article: 155440
On 6/28/2013 12:23 PM, Tom Gardner wrote: > On 28/06/13 15:52, rickman wrote: >> On 6/28/2013 5:33 AM, Tom Gardner wrote: >>> On 28/06/13 10:09, RCIngham wrote: >>>> <snip> >>>> >>>>> Mind you, I'd *love* to see a radical overhaul of traditional >>>>> multicore processors so they took the form of >>>>> - a large number of processors >>>>> - each with completely independent memory >>>>> - connected by message passing fifos >>>>> can >>>>> In the long term that'll be the only way we can continue >>>>> to scale individual machines: SMP scales for a while, but >>>>> then cache coherence requirements kill performance. >>>>> >>>> >>>> Transputer? >>>> http://en.wikipedia.org/wiki/Transputer >>> >>> It had a lot going for it, but was a too dogmatic about >>> the development environment. >> >> You mean 'C'? I worked on a large transputer oriented project and they >> used ANSI 'C' rather than Occam. It got the job done... or should I >> say "jobs"? > > I only looked at the Transputer when it was Occam only. > I liked Occam as an academic language, but at that time > it would have been a bit of a pain to do any serious > engineering; ISTR anything other than primitive types > weren't supported in the language. IIRC that was > ameliorated later, but by then the opportunity for > me (and Inmos) had passed. > > I don't know how C fitted onto the Transputer, but > I'd only have been interested if "multithreaded" > (to use the term loosely) code could have been > expressed reasonably easily. > > Shame, I'd have loved to use it. > >>> At the time it was respectably >>> fast, but that wasn't sufficient -- particularly since there >>> was so much scope for increasing speed of uniprocessor >>> machines. >>> >>> Given that uniprocessors have hit a wall, transputer >>> *concepts* embodied in a completely different form >>> might begin to be fashionable again. >> >> You mean like 144 transputers on a single chip? > > Or Intel's 80 cored chip :) > >> I"m not sure where processing is headed. > > Not that way! Memory bandwidth and latency are > key issues - but you knew that! Yeah, but I think the current programming paradigm is the problem. I think something else needs to come along. The current methods are all based on one, massive von Neumann design and that is what has hit the wall... duh! Time to think in terms of much smaller entities not totally different from what is found in FPGAs, just processors rather than logic. An 80 core chip will just be a starting point, but the hard part will *be* getting started. >> I actually just see confusion ahead as all of the existing methods >> seem to have come to a steep incline if >> not a brick wall. It may be time for something completely different. > > Precisely. My bet is that message passing between > independent processor+memory systems has the > biggest potential. It matches nicely onto many > forms of event-driven industrial and financial > applications and, I am told, onto significant > parts of HPC. It is also relatively easy to > comprehend and debug. > > The trick will be to get the sizes of the > processor + memory + computation "just right". > And desktop/GUI doesn't match that. I think the trick will be in finding ways of dividing up the programs so they can meld to the hardware rather than trying to optimize everything. Consider a chip where you have literally a trillion operations per second available all the time. Do you really care if half go to waste? I don't! 
I design FPGAs and I have never felt obliged (not since the early days anyway) to optimize the utility of each LUT and FF. No, it turns out the precious resource in FPGAs is routing and you can't do much but let the tools manage that anyway. So a fine grained processor array could be very effective if the programming can be divided down to suit. Maybe it takes 10 of these cores to handle 100 Mbps Ethernet, so what? Something like a browser might need to harness a couple of dozen. If the load slacks off and they are idling, so what? >>> It would also help if people can decide that reliability >>> is important, and that bucketfuls of salt should be >>> on hand when listening to salesman's protestations that >>> "the software/hardware framework takes care of all of >>> that so you don't have to worry". >> >> What? Since when did engineers listen to salesmen? > > Since their PHBs get taken out to the golf course > to chat about sport by the salesmen :( It's a bit different with me. I am my own PHB and I kayak, not golf. I have one disti person who I really enjoy talking to. She tried to help me from time to time, but often she can't do a lot because I'm not buying 1000's of chips. But my quantities have gone up a bit lately, we'll see where it goes. -- RickArticle: 155441
On 6/28/13 2:33 AM, Tom Gardner wrote: > On 28/06/13 10:09, RCIngham wrote: >> <snip> >> >>> Mind you, I'd *love* to see a radical overhaul of traditional >>> multicore processors so they took the form of >>> - a large number of processors >>> - each with completely independent memory >>> - connected by message passing fifos >>> >>> In the long term that'll be the only way we can continue >>> to scale individual machines: SMP scales for a while, but >>> then cache coherence requirements kill performance. >>> >> >> Transputer? >> http://en.wikipedia.org/wiki/Transputer > > It had a lot going for it, but was a too dogmatic about > the development environment. At the time it was respectably > fast, but that wasn't sufficient -- particularly since there > was so much scope for increasing speed of uniprocessor > machines. Have you looked at Tilera's TILEpro64 or Adapteva's Epiphany 64 core processors? > Given that uniprocessors have hit a wall, transputer > *concepts* embodied in a completely different form > might begin to be fashionable again. Languages like Erlang and Go use similar concepts (as did Occam on the transputer). But I think the problem is that /in general/ we still don't know how to write parallel or distributed programs. Most of the concepts are from ~40 years back (CSP, guarded commands etc.). We still don't have decent tools. Turning serial programs into parallel versions is manual, laborious, error prone and not very successful.Article: 155442
On 28/06/13 20:55, Bakul Shah wrote: > On 6/28/13 2:33 AM, Tom Gardner wrote: >> On 28/06/13 10:09, RCIngham wrote: >>> <snip> >>> >>>> Mind you, I'd *love* to see a radical overhaul of traditional >>>> multicore processors so they took the form of >>>> - a large number of processors >>>> - each with completely independent memory >>>> - connected by message passing fifos >>>> >>>> In the long term that'll be the only way we can continue >>>> to scale individual machines: SMP scales for a while, but >>>> then cache coherence requirements kill performance. >>>> >>> >>> Transputer? >>> http://en.wikipedia.org/wiki/Transputer >> >> It had a lot going for it, but was a too dogmatic about >> the development environment. At the time it was respectably >> fast, but that wasn't sufficient -- particularly since there >> was so much scope for increasing speed of uniprocessor >> machines. > > Have you looked at Tilera's TILEpro64 or Adapteva's Epiphany > 64 core processors? No I haven't. I've been constrained by getting high-availability software to market quickly, on hardware that is demonstrably supported all over the world. > >> Given that uniprocessors have hit a wall, transputer >> *concepts* embodied in a completely different form >> might begin to be fashionable again. > > Languages like Erlang and Go use similar concepts (as > did Occam on the transputer). But I think the problem > is that /in general/ we still don't know how to write > parallel or distributed programs. Most of the concepts > are from ~40 years back (CSP, guarded commands etc.). > We still don't have decent tools. Turning serial programs > into parallel versions is manual, laborious, error prone > and not very successful. Erlang is certainly interesting from this point of view. I'm not interested in turning existing serial programs into parallel ones; that way lies madness and failure. What is more interestingly tractable are "embarrassingly parallel" problems (e.g. massive event processing systems), and completely new approaches (currently typified by big data and map-reduce, but that's just the beginning).Article: 155443
On 28/06/13 20:06, rickman wrote: > On 6/28/2013 12:23 PM, Tom Gardner wrote: >> On 28/06/13 15:52, rickman wrote: >>> On 6/28/2013 5:33 AM, Tom Gardner wrote: >>>> On 28/06/13 10:09, RCIngham wrote: >>>>> <snip> >>>>> >>>>>> Mind you, I'd *love* to see a radical overhaul of traditional >>>>>> multicore processors so they took the form of >>>>>> - a large number of processors >>>>>> - each with completely independent memory >>>>>> - connected by message passing fifos >>>>>> can >>>>>> In the long term that'll be the only way we can continue >>>>>> to scale individual machines: SMP scales for a while, but >>>>>> then cache coherence requirements kill performance. >>>>>> >>>>> >>>>> Transputer? >>>>> http://en.wikipedia.org/wiki/Transputer >>>> >>>> It had a lot going for it, but was a too dogmatic about >>>> the development environment. >>> >>> You mean 'C'? I worked on a large transputer oriented project and they >>> used ANSI 'C' rather than Occam. It got the job done... or should I >>> say "jobs"? >> >> I only looked at the Transputer when it was Occam only. >> I liked Occam as an academic language, but at that time >> it would have been a bit of a pain to do any serious >> engineering; ISTR anything other than primitive types >> weren't supported in the language. IIRC that was >> ameliorated later, but by then the opportunity for >> me (and Inmos) had passed. >> >> I don't know how C fitted onto the Transputer, but >> I'd only have been interested if "multithreaded" >> (to use the term loosely) code could have been >> expressed reasonably easily. >> >> Shame, I'd have loved to use it. >> >>>> At the time it was respectably >>>> fast, but that wasn't sufficient -- particularly since there >>>> was so much scope for increasing speed of uniprocessor >>>> machines. >>>> >>>> Given that uniprocessors have hit a wall, transputer >>>> *concepts* embodied in a completely different form >>>> might begin to be fashionable again. >>> >>> You mean like 144 transputers on a single chip? >> >> Or Intel's 80 cored chip :) >> >>> I"m not sure where processing is headed. >> >> Not that way! Memory bandwidth and latency are >> key issues - but you knew that! > > Yeah, but I think the current programming paradigm is the problem. I think something else needs to come along. The current methods are all based on one, massive von Neumann design and that is what > has hit the wall... duh! > > Time to think in terms of much smaller entities not totally different from what is found in FPGAs, just processors rather than logic. > > An 80 core chip will just be a starting point, but the hard part will *be* getting started. > > >>> I actually just see confusion ahead as all of the existing methods >>> seem to have come to a steep incline if >>> not a brick wall. It may be time for something completely different. >> >> Precisely. My bet is that message passing between >> independent processor+memory systems has the >> biggest potential. It matches nicely onto many >> forms of event-driven industrial and financial >> applications and, I am told, onto significant >> parts of HPC. It is also relatively easy to >> comprehend and debug. >> >> The trick will be to get the sizes of the >> processor + memory + computation "just right". >> And desktop/GUI doesn't match that. > > I think the trick will be in finding ways of dividing up the programs so they can meld to the hardware rather than trying to optimize everything. 
My suspicion is that, except for compute-bound problems that only require "local" data, that granularity will be too small. Examples where it will work, e.g. protein folding, will rapidly migrate to CUDA and graphics processors. > > Consider a chip where you have literally a trillion operations per second available all the time. Do you really care if half go to waste? I don't! I design FPGAs and I have never felt obliged (not > since the early days anyway) to optimize the utility of each LUT and FF. No, it turns out the precious resource in FPGAs is routing and you can't do much but let the tools manage that anyway. Those internal FPGA constraints also have analogues at a larger scale, e.g. ic pinout, backplanes, networks... > So a fine grained processor array could be very effective if the programming can be divided down to suit. Maybe it takes 10 of these cores to handle 100 Mbps Ethernet, so what? Something like a > browser might need to harness a couple of dozen. If the load slacks off and they are idling, so what? The fundamental problem is that in general as you make the granularity smaller, the communications requirements get larger. And vice versa :( >>>> It would also help if people can decide that reliability >>>> is important, and that bucketfuls of salt should be >>>> on hand when listening to salesman's protestations that >>>> "the software/hardware framework takes care of all of >>>> that so you don't have to worry". >>> >>> What? Since when did engineers listen to salesmen? >> >> Since their PHBs get taken out to the golf course >> to chat about sport by the salesmen :( > > It's a bit different with me. I am my own PHB and I kayak, not golf. I have one disti person who I really enjoy talking to. She tried to help me from time to time, but often she can't do a lot > because I'm not buying 1000's of chips. But my quantities have gone up a bit lately, we'll see where it goes. I'm sort-of retired (I got sick of corporate in-fighting, and I have my "drop dead money", so...) I regard golf as silly, despite having two courses in walking distance. My equivalent of kayaking is flying gliders.Article: 155444
On 6/28/13 2:04 PM, Tom Gardner wrote: > On 28/06/13 20:55, Bakul Shah wrote: >> >> Have you looked at Tilera's TILEpro64 or Adapteva's Epiphany >> 64 core processors? > > No I haven't. FYI the epiphany III processor is used in the $99 parallela "supercomputer". Should be available by August end according to http://www.kickstarter.com/projects/adapteva/parallella-a-supercomputer-for-everyone/posts > What is more interestingly tractable are "embarrassingly > parallel" problems (e.g. massive event processing systems), > and completely new approaches (currently typified by > big data and map-reduce, but that's just the beginning). And yet these run on traditional computers. Parallelism is at the node level.Article: 155445
On 28/06/13 22:22, Bakul Shah wrote: > On 6/28/13 2:04 PM, Tom Gardner wrote: >> What is more interestingly tractable are "embarrassingly >> parallel" problems (e.g. massive event processing systems), >> and completely new approaches (currently typified by >> big data and map-reduce, but that's just the beginning). > > And yet these run on traditional computers. Parallelism > is at the node level. Just so, but even such nodes can be the subject of innovation. A recent good example is Sun's Niagara/Rock T series sparcs, which forego OOO and caches in favour of a medium number of cores each operating at the speed of main memory.Article: 155446
On 6/28/2013 5:11 PM, Tom Gardner wrote: > On 28/06/13 20:06, rickman wrote: >> On 6/28/2013 12:23 PM, Tom Gardner wrote: >>> On 28/06/13 15:52, rickman wrote: >>>> On 6/28/2013 5:33 AM, Tom Gardner wrote: >>>>> On 28/06/13 10:09, RCIngham wrote: >>>>>> <snip> >>>>>> >>>>>>> Mind you, I'd *love* to see a radical overhaul of traditional >>>>>>> multicore processors so they took the form of >>>>>>> - a large number of processors >>>>>>> - each with completely independent memory >>>>>>> - connected by message passing fifos >>>>>>> can >>>>>>> In the long term that'll be the only way we can continue >>>>>>> to scale individual machines: SMP scales for a while, but >>>>>>> then cache coherence requirements kill performance. >>>>>>> >>>>>> >>>>>> Transputer? >>>>>> http://en.wikipedia.org/wiki/Transputer >>>>> >>>>> It had a lot going for it, but was a too dogmatic about >>>>> the development environment. >>>> >>>> You mean 'C'? I worked on a large transputer oriented project and they >>>> used ANSI 'C' rather than Occam. It got the job done... or should I >>>> say "jobs"? >>> >>> I only looked at the Transputer when it was Occam only. >>> I liked Occam as an academic language, but at that time >>> it would have been a bit of a pain to do any serious >>> engineering; ISTR anything other than primitive types >>> weren't supported in the language. IIRC that was >>> ameliorated later, but by then the opportunity for >>> me (and Inmos) had passed. >>> >>> I don't know how C fitted onto the Transputer, but >>> I'd only have been interested if "multithreaded" >>> (to use the term loosely) code could have been >>> expressed reasonably easily. >>> >>> Shame, I'd have loved to use it. >>> >>>>> At the time it was respectably >>>>> fast, but that wasn't sufficient -- particularly since there >>>>> was so much scope for increasing speed of uniprocessor >>>>> machines. >>>>> >>>>> Given that uniprocessors have hit a wall, transputer >>>>> *concepts* embodied in a completely different form >>>>> might begin to be fashionable again. >>>> >>>> You mean like 144 transputers on a single chip? >>> >>> Or Intel's 80 cored chip :) >>> >>>> I"m not sure where processing is headed. >>> >>> Not that way! Memory bandwidth and latency are >>> key issues - but you knew that! >> >> Yeah, but I think the current programming paradigm is the problem. I >> think something else needs to come along. The current methods are all >> based on one, massive von Neumann design and that is what >> has hit the wall... duh! >> >> Time to think in terms of much smaller entities not totally different >> from what is found in FPGAs, just processors rather than logic. >> >> An 80 core chip will just be a starting point, but the hard part will >> *be* getting started. >> >> >>>> I actually just see confusion ahead as all of the existing methods >>>> seem to have come to a steep incline if >>>> not a brick wall. It may be time for something completely different. >>> >>> Precisely. My bet is that message passing between >>> independent processor+memory systems has the >>> biggest potential. It matches nicely onto many >>> forms of event-driven industrial and financial >>> applications and, I am told, onto significant >>> parts of HPC. It is also relatively easy to >>> comprehend and debug. >>> >>> The trick will be to get the sizes of the >>> processor + memory + computation "just right". >>> And desktop/GUI doesn't match that. 
>> >> I think the trick will be in finding ways of dividing up the programs >> so they can meld to the hardware rather than trying to optimize >> everything. > > My suspicion is that, except for compute-bound > problems that only require "local" data, that > granularity will be too small. > > Examples where it will work, e.g. protein folding, > will rapidly migrate to CUDA and graphics processors. You are still thinking von Neumann. Any application can be broken down into small units and parceled out to small processors. But you have to think in those terms rather than just saying, "it doesn't fit". Of course it can fit! >> Consider a chip where you have literally a trillion operations per >> second available all the time. Do you really care if half go to waste? >> I don't! I design FPGAs and I have never felt obliged (not >> since the early days anyway) to optimize the utility of each LUT and >> FF. No, it turns out the precious resource in FPGAs is routing and you >> can't do much but let the tools manage that anyway. > > Those internal FPGA constraints also have analogues at > a larger scale, e.g. ic pinout, backplanes, networks... > > >> So a fine grained processor array could be very effective if the >> programming can be divided down to suit. Maybe it takes 10 of these >> cores to handle 100 Mbps Ethernet, so what? Something like a >> browser might need to harness a couple of dozen. If the load slacks >> off and they are idling, so what? > > The fundamental problem is that in general as you make the > granularity smaller, the communications requirements > get larger. And vice versa :( Actually not. The aggregate comms requirements may increase, but we aren't sharing an Ethernet bus. All of the local processors talk to each other and less often have to talk to non-local processors. I think the phone company knows something about that. If you apply your line of reasoning to FPGAs with the lowly 4 input LUT it would seem like they would be doomed to eternal comms congestion. Look at the routing in FPGAs and other PLDs sometime. They are hierarchical. Works pretty well, but the trade off is in worrying about providing enough comms to let all of the logic be used for every design or just not worrying about it and "making do". Works pretty well if the designers just chill about utilization. >>>>> It would also help if people can decide that reliability >>>>> is important, and that bucketfuls of salt should be >>>>> on hand when listening to salesman's protestations that >>>>> "the software/hardware framework takes care of all of >>>>> that so you don't have to worry". >>>> >>>> What? Since when did engineers listen to salesmen? >>> >>> Since their PHBs get taken out to the golf course >>> to chat about sport by the salesmen :( >> >> It's a bit different with me. I am my own PHB and I kayak, not golf. I >> have one disti person who I really enjoy talking to. She tried to help >> me from time to time, but often she can't do a lot >> because I'm not buying 1000's of chips. But my quantities have gone up >> a bit lately, we'll see where it goes. > > I'm sort-of retired (I got sick of corporate in-fighting, > and I have my "drop dead money", so...) That's me too, but I found some work that is paying off very well now. So I've got a foot in both camps, retired, not retired... both are fun in their own way. But dealing with international shipping is a PITA. > I regard golf as silly, despite having two courses in > walking distance. My equivalent of kayaking is flying > gliders. 
That has got to be fun! I've never worked up the whatever to learn to fly. It seems like a big investment and not so cheap overall. But there is clearly a great thrill there. -- RickArticle: 155447
On Friday, June 28, 2013 9:02:10 PM UTC-4, rickman wrote: > You are still thinking von Neumann. Any application can be broken down > into small units and parceled out to small processors. But you have to > think in those terms rather than just saying, "it doesn't fit". Of > course it can fit! Intra brain communications are hierarchical as well. I'm nobody, but one of the reasons for designing Hive was because I feel processors in general are much too complex, to the point where I'm repelled by them. I believe one of the drivers for this over-complexity is the fact that main memory is external. I've been assembling PCs since the 286 days, and I've never understood why main memory wasn't tightly integrated onto the uP die. Everyone pretty much gets the same ballpark memory size when putting a PC together, and I can remember only once or twice upgrading memory after the initial build (for someone else's Dell or similar where the initial build was anemically low-balled for "value" reasons). Here we are in 2013, the memory is several light cm away from the processor on the MB, talking in cache lines, and I still don't get why we have this gross inefficiency. My dual core multi-GHz PC with SSD often just sits there for many seconds after I click on something, and malware is now taking me sometimes days to fix. Windows 7 is a dog to install, with relentless updates that often completely hose it rather than improve it. The future isn't looking too bright for the desktop with the way we're going.Article: 155448
Eric Wallin wrote: > On Friday, June 28, 2013 9:02:10 PM UTC-4, rickman wrote: > >> You are still thinking von Neumann. Any application can be broken >> down into small units and parceled out to small processors. But >> you have to think in those terms rather than just saying, "it >> doesn't fit". Of course it can fit! > > Intra brain communications are hierarchical as well. > > I'm nobody, but one of the reasons for designing Hive was because I > feel processors in general are much too complex, to the point where > I'm repelled by them. I believe one of the drivers for this > over-complexity is the fact that main memory is external. I've been > assembling PCs since the 286 days, and I've never understood why main > memory wasn't tightly integrated onto the uP die. RAM was both large and expensive until recently. Different people made RAM than made processors and it would have been challenging to get the business arrangements such that they'd glue up. Plus, beginning not long ago, you're really dealing with cache directly, not RAM. Throw in that main memory is DRAM, and it gets a lot more complicated. Building a BSP for a new board from scratch with a DRAM controller is a lot of work. > Everyone pretty > much gets the same ballpark memory size when putting a PC together, > and I can remember only once or twice upgrading memory after the > initial build (for someone else's Dell or similar where the initial > build was anemically low-balled for "value" reasons). Here we are in > 2013, the memory is several light cm away from the processor on the > MB, talking in cache lines, and I still don't get why we have this > gross inefficiency. > That's not generally the bottleneck, though. > My dual core multi-GHz PC with SSD often just sits there for many > seconds after I click on something, and malware is now taking me > sometimes days to fix. Geez. Ever use virtual machines? If you break/infect one, just roll it back. > Windows 7 is a dog to install, with > relentless updates that often completely hose it rather than improve > it. The future isn't looking too bright for the desktop with the way > we're going. > -- Les CargillArticle: 155449
Bakul Shah wrote: > On 6/28/13 2:33 AM, Tom Gardner wrote: >> On 28/06/13 10:09, RCIngham wrote: >>> <snip> >>> >>>> Mind you, I'd *love* to see a radical overhaul of traditional >>>> multicore processors so they took the form of >>>> - a large number of processors >>>> - each with completely independent memory >>>> - connected by message passing fifos >>>> >>>> In the long term that'll be the only way we can continue >>>> to scale individual machines: SMP scales for a while, but >>>> then cache coherence requirements kill performance. >>>> >>> >>> Transputer? >>> http://en.wikipedia.org/wiki/Transputer >> >> It had a lot going for it, but was a too dogmatic about >> the development environment. At the time it was respectably >> fast, but that wasn't sufficient -- particularly since there >> was so much scope for increasing speed of uniprocessor >> machines. > > Have you looked at Tilera's TILEpro64 or Adapteva's Epiphany > 64 core processors? > >> Given that uniprocessors have hit a wall, transputer >> *concepts* embodied in a completely different form >> might begin to be fashionable again. > > Languages like Erlang and Go use similar concepts (as > did Occam on the transputer). But I think the problem > is that /in general/ we still don't know how to write > parallel or distributed programs. I do - I've been doing it for a long time, too. It's not all that hard if you have no libraries getting in the way. This, by the way, is absolutely nothing fancy. It's precisely the same concepts as when we linked stuff together with serial ports in the Stone Age. > Most of the concepts > are from ~40 years back (CSP, guarded commands etc.). Most *all* concepts in computers are from that long ago or longer. The "new stuff" is more about arbitraging market forces than getting real work done. > We still don't have decent tools. I respectfully disagree. But my standard for "decency" is probably different from your'n. My idea of an IDE is an editor and a shell prompt... > Turning serial programs > into parallel versions is manual, laborious, error prone > and not very successful. So don't do that. Write them to be parallel from the git-go. Write them to be event-driven. It's better in all dimensions. After all, we're all really clockmakers. Events regulate our "wheels" just like the escapement on a pendulum clock. When you get that happening, things get to be a lot more deterministic and that is what parallelism needs the most. -- Les Cargill
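As a sketch of the "event-driven from the start" style Les describes, here is a minimal poll()-based loop in C; the only event sources here are stdin and a timeout tick, chosen purely for illustration, and the structure rather than the specifics is the point.

    #include <poll.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Skeleton event loop: block in exactly one place, then dispatch on
       whichever event fired.  Real code would add sockets, timers, and
       per-event state machines in place of the printf() calls. */
    int main(void)
    {
        struct pollfd fds[1] = { { .fd = STDIN_FILENO, .events = POLLIN } };

        for (;;) {
            int rc = poll(fds, 1, 1000);          /* 1 s timeout           */
            if (rc < 0)
                break;                            /* poll error: shut down */
            if (rc == 0) {                        /* timeout = tick event  */
                printf("tick\n");
                continue;
            }
            if (fds[0].revents & (POLLIN | POLLHUP)) {
                char buf[256];
                ssize_t n = read(STDIN_FILENO, buf, sizeof buf);
                if (n <= 0)
                    break;                        /* EOF: shut down        */
                printf("event: %d bytes\n", (int)n);
            }
        }
        return 0;
    }

Because each worker reduces to "wait for an event, run a short handler, go back to waiting", the same structure replicates easily across many small processors and stays deterministic enough to reason about, which is much of what parallelism needs.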