Eric Wallin wrote:
> On Monday, June 24, 2013 3:24:44 AM UTC-4, Tom Gardner wrote:
>> Consider trying to pass a message consisting of one integer from one thread to another such that the receiving thread is guaranteed to be able to pick it up exactly once.
>
> Thread A works on the integer value and when it is done it writes it to location Z. It then reads a value at location X, increments it, and writes it back to location X.
>
> Thread B has been repeatedly reading location X and notices it has been incremented. It reads the integer value at Z, performs some function on it, and writes it back to location Z. It then reads a value at Y, increments it, and writes it back to location Y to let thread A know it took, worked on, and replaced the integer at Z.
>
> The above seems airtight to me if reads and writes to memory are not cached or otherwise delayed, and I don't see how interrupts are germane, but perhaps I haven't taken everything into account.

Have a look at http://pages.cs.wisc.edu/~remzi/Classes/537/Fall2011/Book/threads-intro.pdf section 25.3 et al for one exposition of the kinds of problem that arise. That exposition is in x86 terms, but it applies equally to all 11 other major processor families I've examined over the past 35 years. If there is a reason your processor cannot experience these issues, let us know.

Subsequent chapters on the solutions can be found at http://pages.cs.wisc.edu/~remzi/Classes/537/Fall2011/
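The failure mode that section 25.3 of that book illustrates is worth seeing concretely. Here is a minimal C sketch (not code from the thread; compile with -pthread) of the unsynchronized read-modify-write that Tom is warning about:

    /* Classic lost-update race (cf. OSTEP sec. 25.3). Both
       threads increment a shared counter, but "counter++" is
       really load/add/store, so increments are lost whenever
       the two threads interleave between load and store. */
    #include <pthread.h>
    #include <stdio.h>

    static volatile int counter = 0;

    static void *worker(void *arg)
    {
        for (int i = 0; i < 1000000; i++)
            counter++;          /* load, add 1, store: not atomic */
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, worker, NULL);
        pthread_create(&b, NULL, worker, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("counter = %d (expected 2000000)\n", counter);
        return 0;
    }

On most machines this prints well under 2000000. Note that Eric's X/Y/Z scheme above sidesteps this particular trap by giving each location exactly one writer.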
Article: 155401
On 6/25/13 10:14 AM, glen herrmannsfeldt wrote:
> Well, read the wikipedia article on spinlock and the linked-to article Peterson's_Algorithm.
>
> It is more efficient if you have an interlocked write, but can be done with spinlocks, if there is no reordering of writes to memory.
>
> As many processors now do reorder writes, there is need for special instructions.
>
> Otherwise, spinlocks might be good enough.

A spinlock is not good enough without special instructions. That is why Peterson's, Dekker's, and Szymanski's algorithms exist. Now most processors provide some h/w support for mutexes. Most papers on implementing a mutex with just shared memory are 25+ years old. Now this is just an interesting puzzle!
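For reference, Peterson's algorithm mentioned above is only a few lines. This C sketch assumes exactly two threads and, crucially, no reordering of loads and stores (the caveat glen raises; on modern hardware you would need memory barriers):

    /* Peterson's algorithm for threads 0 and 1: mutual
       exclusion from plain shared memory, with no interlocked
       instructions, valid only without memory reordering. */
    static volatile int flag[2] = {0, 0};
    static volatile int turn = 0;

    void lock(int self)
    {
        int other = 1 - self;
        flag[self] = 1;                      /* I want the lock  */
        turn = other;                        /* but you go first */
        while (flag[other] && turn == other)
            ;                                /* spin             */
    }

    void unlock(int self)
    {
        flag[self] = 0;
    }

The turn variable is what fixes the bare spinlock: if both threads raise their flags at once, the last one to write turn loses the tie.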
Article: 155402
On Wednesday, June 12, 2013 11:17:18 PM UTC+2, Eric Wallin wrote:
> I have a general purpose soft processor core that I developed in verilog. The processor is unusual in that it uses four indexed LIFO stacks with explicit stack pointer controls in the opcode. It is 32 bit, 2 operand, fully pipelined, 8 threads, and produces an aggregate 200 MIPs in bargain basement Altera Cyclone 3 and 4 speed grade 8 parts while consuming ~1800 LEs. The design is relatively simple (as these things go) yet powerful enough to do real work.

Hi Eric,

first of all: I like your name, I have designed a soft-core CPU called ERIC5 ;-)

I have read your paper quickly and would like to give you some feedback:

- What is the target application of your processor? Barrel processors can make sense for special (highly parallel) applications but will have the problem that most programmers prefer high single-thread performance, simply because it is much easier to program.

- If you target general purpose applications in FPGAs, your core will be compared with e.g. Nios II or MICO32 (open source). They are about the same size, are fully 32 bit, have high single-thread performance and a full design suite. What are the benefits of your core?

- If you want the core to be really used by others, a C compiler is a MUST. (I learned this with ERIC5 quickly.) This will most likely be much more effort than the core itself...

I know that designing a CPU is a lot of fun and I assume that this was the real motivation (which is perfectly valid, of course). Also it will give you experience in this field and maybe also reputation with future employers or others. However, if you want to make it a commercially successful product (or even more widely used than other CPUs on opencores), it will be a long hard way against Nios II, etc.

Regards,

Thomas
www.entner-electronics.com
Article: 155403
On Tuesday, June 25, 2013 4:26:14 PM UTC-4, Tom Gardner wrote:
> Have a look at http://pages.cs.wisc.edu/~remzi/Classes/537/Fall2011/Book/threads-intro.pdf section 25.3 et al for one exposition of the kinds of problem that arise.

It talks about separate threads writing to the same location, which I understand can be a problem with interrupts and without atomic read-modify-write. All I can do is repeat that if you don't program this way, it won't happen. A subroutine can be written so that threads can share a common instance of it, but without using a common memory location to store data associated with the execution of that subroutine (unless the location is memory mapped HW). In Hive, there is a register that when read returns the thread ID, which is unique for each thread. This could be used as an offset for subroutine data locations.
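Eric's thread-ID-as-offset scheme might look like this in C; read_thread_id() is a hypothetical stand-in for reading Hive's ID register (Hive is actually programmed in assembly, so this is purely illustrative):

    /* Per-thread data indexed by the thread-ID register, so a
       shared subroutine never stores to a location owned by
       another thread. */
    #define NTHREADS 8

    extern int read_thread_id(void);   /* hypothetical intrinsic */

    static int scratch[NTHREADS];      /* one private slot each  */

    int shared_subroutine(int x)
    {
        int tid = read_thread_id();    /* 0..7, unique per thread */
        scratch[tid] = x * x;          /* no slot is ever shared  */
        return scratch[tid];
    }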
Article: 155404
On 6/25/2013 4:23 PM, Bakul Shah wrote:
> On 6/25/13 11:18 AM, Eric Wallin wrote:
>> On Tuesday, June 25, 2013 11:14:52 AM UTC-4, Tom Gardner wrote:
>>>> I believe Eric's point is that the thing that prevents more than one processor from accessing the same memory location is the programmer. Is that not a good enough method?
>
> This is not good enough in general. I gave some examples where threads have to read/write the same memory location.

I didn't see any examples that were essential. You talked about two processes accessing the same data. Why do you need to do that? Just have one process send the data to the other process so only one updates the list.

> I agree with you that if threads communicate just through fifos and there is exactly one reader and one writer there is no problem. The reader updates the read ptr & watches but doesn't update the write ptr. The writer updates the write ptr & watches but doesn't update the read ptr. You can use fifos like these to implement a mutex but this is a very expensive way to implement mutexes and doesn't scale.

Doesn't scale? Can you explain?

>> The paper has a bulleted feature list at the very front and a downsides bulleted list at the very back. I tried to write it in an accessible manner for the widest audience. We all like to think aloud now and then, but I'd think a comprehensive design paper would sidestep all of this wild speculation and unnecessary third degree.
>
> I don't think it is a question of "third degree". You did invite feedback!
>
> Adding compare-and-swap or load-linked & store-conditional would make your processor more useful for parallel programming. I am not motivated enough to go through 4500+ lines of verilog to know how hard that is but you must already have some bus arbitration logic since all 8 threads can access memory.

You don't understand even the most basic concept of how the device works. There is no arbitration logic because there is only one processor that is time shared between 8 processes on a clock cycle basis to match the 8 deep pipeline. I'm not trying to be snarky, but there are a lot of people posting here who really don't get the idea behind this design.

>> http://opencores.org/usercontent,doc,1371986749
>
> I missed this link before. A nicely done document! A top level diagram would be helpful. 64K address space seems too small.

Too small for what? That is part of what people aren't getting. This is not intended to be even remotely comparable to an ARM or an x86 processor. This is intended to replace a MicroBlaze or a B16 type FPGA core.

--
Rick
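The single-reader/single-writer FIFO Bakul concedes is safe might be sketched like this (illustrative C, not from the Hive sources; it relies on each index having exactly one writer and on stores not being reordered):

    /* SPSC ring buffer: the writer owns wr, the reader owns rd,
       and neither ever updates the other's index. */
    #define FIFO_SIZE 16u               /* power of two */

    static volatile unsigned rd = 0, wr = 0;
    static int buf[FIFO_SIZE];

    int fifo_put(int v)                 /* writer thread only */
    {
        if (wr - rd == FIFO_SIZE)
            return 0;                   /* full */
        buf[wr % FIFO_SIZE] = v;        /* data first...      */
        wr++;                           /* ...then publish    */
        return 1;
    }

    int fifo_get(int *v)                /* reader thread only */
    {
        if (wr == rd)
            return 0;                   /* empty */
        *v = buf[rd % FIFO_SIZE];
        rd++;
        return 1;
    }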
Article: 155405
On Tuesday, June 25, 2013 5:56:51 PM UTC-4, thomas....@gmail.com wrote:
> - What is the target application of your processor? Barrel processors can make sense for special (highly parallel) applications but will have the problem that most programmers prefer high single-thread performance, simply because it is much easier to program.

The target application is for an FPGA logic designer who needs processor functionality but doesn't want or need anything too complex. There is no need for a toolchain for instance, and operation has been kept as simple as possible.

> - If you target general purpose applications in FPGAs, your core will be compared with e.g. Nios II or MICO32 (open source). They are about the same size, are fully 32 bit, have high single-thread performance and a full design suite. What are the benefits of your core?

The benefit is it really is free, so you aren't legally bound to vendor silicon (not that all are). And if you hate having yet another toolset between you and what is going on, you're probably SOL with most soft processors as they are quite complex (overly so for many low level applications, IMO). No one will be running Linux on Hive, for instance. But running Linux on any soft core seems kind of dumb to me; when you need that much processor you might as well buy an ASIC which is cheaper, faster, etc. and not a blob of logic.

> - If you want the core to be really used by others, a C compiler is a MUST. (I learned this with ERIC5 quickly.) This will most likely be much more effort than the core itself...

Nope, ain't gonna do it, and you can't make me! :-) A compiler for something this low level is overkill and kind of asking for it IMO.

> I know that designing a CPU is a lot of fun and I assume that this was the real motivation (which is perfectly valid, of course). Also it will give you experience in this field and maybe also reputation with future employers or others. However, if you want to make it a commercially successful product (or even more widely used than other CPUs on opencores), it will be a long hard way against Nios II, etc.

I have like zero interest in Nios et al. Hive is mainly for my real use for serializing low bandwidth FPGA applications that would otherwise underutilize the fast FPGA fabric. But after all the work that went into it I wanted to get it out there for others to use, or perhaps to employ one or more aspects of Hive in their own processor core.

I hope to use Hive in a digital Theremin I've been working on for about a year now. Too soon to really know, but one thread will probably handle the user interface (LCD, rotary encoder, LEDs, etc.), another will probably handle linearization and scaling of the pitch side, another the wavetable and filtering stuff, etc., so I believe I can keep the threads busy. My main fear at this point is that heat from the FPGA will disturb the exquisitely sensitive electronics (there's only about 1 pF difference over the entire playable pitch range). The open project is described in a forum thread over at http://www.thereminworld.com if anyone is interested (I'm "dewster").
Article: 155406
Eric Wallin wrote:
> On Tuesday, June 25, 2013 4:26:14 PM UTC-4, Tom Gardner wrote:
>> Have a look at http://pages.cs.wisc.edu/~remzi/Classes/537/Fall2011/Book/threads-intro.pdf section 25.3 et al for one exposition of the kinds of problem that arise.
>
> It talks about separate threads writing to the same location, which I understand can be a problem with interrupts and without atomic read-modify-write. All I can do is repeat that if you don't program this way, it won't happen.

If that is a constraint on the permissible programming style then it would be good to state that explicitly - to save other people's time, to save you questions, and to save everybody late unpleasant surprises. That is a very common programming paradigm that people will expect to employ to solve problems they expect to encounter. It would be beneficial for you to demonstrate the coding techniques that you expect to be used to solve their problems. Think of it as an application note :)

> A subroutine can be written so that threads can share a common instance of it,

I presume "it" = code.

> but without using a common memory location to store data associated with the execution of that subroutine (unless the location is memory mapped HW).

Sounds equivalent to keeping all the data on the thread's stack in most of the other processors I've used. Works for data that isn't shared between threads.

> In Hive, there is a register that when read returns the thread ID, which is unique for each thread. This could be used as an offset for subroutine data locations.

But what about data that has, of necessity, to be shared between threads? For example a flag indicating whether or not a non-sharable global resource (e.g. some i/o device, or some data structure) is in use or is free to be used.

None of these situations are unique to your processor. They first became a pain point in the 1960s and necessitated development of techniques to resolve the problem. If you've found a way to avoid such problems, write it up and become famous.
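The flag Tom describes is exactly where a naive approach breaks. A C sketch of the broken version (illustrative only, not from the thread):

    /* Claiming a non-sharable device with a plain flag. Two
       threads can both read in_use == 0 before either stores 1,
       and both then "own" the device: the test and the set must
       be a single atomic operation, or be guarded by something
       like Peterson's algorithm shown earlier. */
    static volatile int in_use = 0;

    int try_claim(void)
    {
        if (in_use == 0) {   /* another thread may test here...  */
            in_use = 1;      /* ...before this store completes   */
            return 1;        /* so both callers can reach this   */
        }
        return 0;
    }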
Article: 155407
On 6/25/2013 3:02 PM, glen herrmannsfeldt wrote:
> rickman<gnuarm@gmail.com> wrote:
>
> (snip, someone wrote)
>>>> Not sure what you mean by "machine cycle". As I said above, there are 8 clocks to the processor machine cycle, but they are all out of phase. So on any given clock cycle only one processor will be updating registers or memory.
>
> (then I wrote)
>>> If there are 8 processors that never communicate, it would be better to have 8 separate RAM units.
>
>> Why is that? What would be "better" about it?
>
> Well, if the RAM really is fast enough not to be in the critical path, then maybe not, but separate RAM means no access limitations.

I don't follow your logic, but I bet that is because your logic doesn't apply to this design. Do you understand that there is really only one processor? So what advantage could there be in having 8 RAMs?

>>>> I believe Eric's point is that the thing that prevents more than one processor from accessing the same memory location is the programmer. Is that not a good enough method?
>
>>> So no thread ever communicates with another one?
>
>>> Well, read the wikipedia article on spinlock and the linked-to article Peterson's_Algorithm.
>
>>> It is more efficient if you have an interlocked write, but can be done with spinlocks, if there is no reordering of writes to memory.
>
>>> As many processors now do reorder writes, there is need for special instructions.
>
>> Are we talking about the same thing here? We were talking about the Hive processor.
>
> I was mentioning it for context. For processors that do reorder writes, you can't use Peterson's algorithm.

Ok, so this does not apply to the processor at hand, right?

Your quotes are a bit hard to read. They turn the quoted blank lines into new unquoted lines. Are you using Google by any chance and ripping out all the double spacing or something?

>>> Otherwise, spinlocks might be good enough.
>
>> So your point is?
>
> Without write reordering, it is possible, though maybe not efficient, to communicate without interlocked writes.

Since this processor doesn't do write reordering, Bob's your uncle!

>> What would the critical section of code be doing that is critical? Simple interprocess communications is not necessarily "critical".
>
> "Critical" means that the messages won't get lost due to other threads writing at about the same time. Now, much of networking is based on unreliable "best effort" protocols, and that may also work for communications to threads. But that involves delays and retransmission after timers expire.

You are talking very generally here and I don't see how it applies to this discussion, which is specific to this processor.

--
Rick
Article: 155408
On Tuesday, June 25, 2013 7:07:59 PM UTC-4, Tom Gardner wrote:
> But what about data that has, of necessity, to be shared between threads? For example a flag indicating whether or not a non-sharable global resource (e.g. some i/o device, or some data structure) is in use or is free to be used.

I plan to have one and only one thread handling I/O and passing the data on as needed via memory space to one or more other threads. I promise to be careful and not blow up space-time when I write the code. ;-)
Article: 155409
On Tuesday, June 25, 2013 5:56:51 PM UTC-4, thomas....@gmail.com wrote:
> Barrel processors...

Hive is a barrel processor! Thanks for that term, Thomas! I knew the idea wasn't original with me, but I had no idea the concept was so old (the Cray-designed CDC 6000 series peripheral processors, 1964) and has been implemented many times since:

http://en.wikipedia.org/wiki/Barrel_processor
Article: 155410
On Wednesday, June 26, 2013 1:07:28 AM UTC+2, Eric Wallin wrote:
> I hope to use Hive in a digital Theremin I've been working on for about a year now. Too soon to really know, but one thread will probably handle the user interface (LCD, rotary encoder, LEDs, etc.), another will probably handle linearization and scaling of the pitch side, another the wavetable and filtering stuff, etc., so I believe I can keep the threads busy.

OK, I understand the idea behind your processor better now. But I think you are targeting applications that could also be realized with PicoBlaze / Mico8 / ERIC5, which are all MUCH smaller than your design. Of course your design has the benefit of 32b operations.

I guess it makes sense and will be fun for you to use it in your own projects. However, if other people compare Hive with e.g. Nios, most of them will choose Nios because (for them) it looks less painful (both processors are new for them anyway, one can be programmed in C, for the other they have to learn a new assembly language, one is supported by a large company and large community, the other not, etc.).

I just want to point out that there is a lot of competition out there...

Regards,

Thomas
www.entner-electronics.com
Article: 155411
Eric Wallin <tammie.eric@gmail.com> wrote:

(snip)
> It talks about separate threads writing to the same location, which I understand can be a problem with interrupts and without atomic read-modify-write. All I can do is repeat that if you don't program this way, it won't happen. A subroutine can be written so that threads can share a common instance of it, but without using a common memory location to store data associated with the execution of that subroutine (unless the location is memory mapped HW).

Sure, that is pretty common. It is usually related to being reentrant, but not always exactly the same.

> In Hive, there is a register that when read returns the thread ID, which is unique for each thread. This could be used as an offset for subroutine data locations.

Yes, but usually once in a while there needs to be communication between threads. If no other time, to get data through an I/O device, such as separate threads writing to the same user console. (Screen, terminal, serial port, etc.)

--
glen
Article: 155412
On Wednesday, June 26, 2013 1:30:48 AM UTC+2, Eric Wallin wrote:
> On Tuesday, June 25, 2013 5:56:51 PM UTC-4, thomas....@gmail.com wrote:
>> Barrel processors...
>
> Hive is a barrel processor! Thanks for that term, Thomas! I knew the idea wasn't original with me, but I had no idea the concept was so old (the Cray-designed CDC 6000 series peripheral processors, 1964) and has been implemented many times since:
>
> http://en.wikipedia.org/wiki/Barrel_processor

Yes, the old heroes of supercomputers and mainframes invented almost everything... E.g. I long assumed that Intel invented all this fancy out-of-order execution stuff, etc., just to learn recently that it was all there long ago, e.g.:

http://en.wikipedia.org/wiki/Tomasulo_algorithm

Regards,

Thomas
www.entner-electronics.com
Article: 155413
Eric Wallin <tammie.eric@gmail.com> wrote:

(snip)
> I plan to have one and only one thread handling I/O and passing the data on as needed via memory space to one or more other threads. I promise to be careful and not blow up space-time when I write the code. ;-)

OK, but you need a way to tell the other thread that its data is ready, and a way for that thread to tell the I/O thread that it got the data and is ready for more. And you want to do all that without too much overhead.

--
glen
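Eric's X/Y/Z scheme from earlier in the thread is essentially a one-slot mailbox, which answers glen's ready/acknowledge question for exactly one producer and one consumer. A C sketch (illustrative; assumes stores are not reordered, which the thread has established holds for Hive):

    /* One-slot mailbox: ready is written only by the producer,
       ack only by the consumer, so no location has two writers. */
    static volatile int mailbox;
    static volatile unsigned ready = 0;  /* producer-owned */
    static volatile unsigned ack   = 0;  /* consumer-owned */

    void send(int value)                 /* I/O thread */
    {
        while (ready != ack)
            ;                            /* wait for slot free   */
        mailbox = value;
        ready++;                         /* announce new data    */
    }

    int receive(void)                    /* worker thread */
    {
        while (ready == ack)
            ;                            /* wait for data        */
        int value = mailbox;
        ack++;                           /* announce consumption */
        return value;
    }

The overhead glen alludes to is the spinning: each side burns cycles polling until the other catches up.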
Article: 155414
On 6/25/2013 7:07 PM, Tom Gardner wrote:
> Eric Wallin wrote:
>> On Tuesday, June 25, 2013 4:26:14 PM UTC-4, Tom Gardner wrote:
>>> Have a look at http://pages.cs.wisc.edu/~remzi/Classes/537/Fall2011/Book/threads-intro.pdf section 25.3 et al for one exposition of the kinds of problem that arise.
>>
>> It talks about separate threads writing to the same location, which I understand can be a problem with interrupts and without atomic read-modify-write. All I can do is repeat that if you don't program this way, it won't happen.
>
> If that is a constraint on the permissible programming style then it would be good to state that explicitly - to save other people's time, to save you questions, and to save everybody late unpleasant surprises.

If you care to go back through the discussion, I believe he did exactly that, say that two threads should not write to the same address. And we have already discussed that this can be worked around.

> That is a very common programming paradigm that people will expect to employ to solve problems they expect to encounter. It would be beneficial for you to demonstrate the coding techniques that you expect to be used to solve their problems. Think of it as an application note :)
>
>> A subroutine can be written so that threads can share a common instance of it,
>
> I presume "it" = code.
>
>> but without using a common memory location to store data associated with the execution of that subroutine (unless the location is memory mapped HW).
>
> Sounds equivalent to keeping all the data on the thread's stack in most of the other processors I've used.

Or you just don't share data... One issue is the use of the word "thread". I never understood the difference between thread and process until I read the link you provided. We don't have to be talking about threads here. I expect the processors will be much more likely to be running separate processes using separate memory. Does that make you happier? Then we can just say they don't share memory other than for communications that are well defined and preclude the conditions that cause problems.

> Works for data that isn't shared between threads.

Yes, or more specifically, it works as long as two threads (or processes) don't write to the same locations.

>> In Hive, there is a register that when read returns the thread ID, which is unique for each thread. This could be used as an offset for subroutine data locations.
>
> But what about data that has, of necessity, to be shared between threads? For example a flag indicating whether or not a non-sharable global resource (e.g. some i/o device, or some data structure) is in use or is free to be used.

That's easy, don't have *global* I/O devices... let one processor control that I/O device and everyone else asks that processor for I/O support. In fact, that is one of the few ways to actually get benefit from this processor design. It is not all that much better than a single-threaded processor in an FPGA. The J1 runs at 100 MIPS and this runs at 200 MIPS. But no one processor does more than 25 MIPS. So how do you use that? You can assign tasks to processors and let them do separate jobs.

> None of these situations are unique to your processor. They first became a pain point in the 1960s and necessitated development of techniques to resolve the problem. If you've found a way to avoid such problems, write it up and become famous.

Yes, but none of these apply if you just read his paper...

--
Rick
Article: 155415
On 6/25/13 4:06 PM, rickman wrote:
> On 6/25/2013 4:23 PM, Bakul Shah wrote:
>> This is not good enough in general. I gave some examples where threads have to read/write the same memory location.
>
> I didn't see any examples that were essential. You talked about two processes accessing the same data. Why do you need to do that? Just have one process send the data to the other process so only one updates the list.

Thereby reducing things to single threading.

>> I agree with you that if threads communicate just through fifos and there is exactly one reader and one writer there is no problem. The reader updates the read ptr & watches but doesn't update the write ptr. The writer updates the write ptr & watches but doesn't update the read ptr. You can use fifos like these to implement a mutex but this is a very expensive way to implement mutexes and doesn't scale.
>
> Doesn't scale? Can you explain?

Scale to more than two threads. For that it may be better to use one of the other algorithms mentioned in my last article. Still pretty complicated and inefficient.

>> Adding compare-and-swap or load-linked & store-conditional would make your processor more useful for parallel programming. I am not motivated enough to go through 4500+ lines of verilog to know how hard that is but you must already have some bus arbitration logic since all 8 threads can access memory.
>
> You don't understand even the most basic concept of how the device works. There is no arbitration logic because there is only one processor that is time shared between 8 processes on a clock cycle basis to match the 8 deep pipeline.

In this case load-linked / store-conditional may be possible? Load-linked records the loaded address & thread id in a special register. If any other thread tries to *write* to the same address, a subsequent store-conditional fails & the next instruction can test that. You could simplify this further at some loss of efficiency: fail the store if there is *any* store by any other thread!
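For contrast with Bakul's load-linked/store-conditional sketch, this is how software typically uses compare-and-swap once the hardware provides it. Hive has no such instruction, so C11 atomics stand in for the primitive here (illustrative only):

    /* Lock-free increment built on compare-and-swap: retry
       until no other thread wrote between our load and swap. */
    #include <stdatomic.h>

    void atomic_increment(atomic_int *p)
    {
        int old = atomic_load(p);
        while (!atomic_compare_exchange_weak(p, &old, old + 1))
            ;   /* on failure, old is reloaded automatically */
    }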
Article: 155416
thomas.entner99@gmail.com wrote:

(snip)
> Yes, the old heroes of supercomputers and mainframes invented almost everything... E.g. I long assumed that Intel invented all this fancy out-of-order execution stuff, etc., just to learn recently that it was all there long ago, e.g.:
> http://en.wikipedia.org/wiki/Tomasulo_algorithm

The 360/91 is much more fun, though. Intel has out-of-order execution, but in-order retirement: the results of instructions are committed in order. That takes memory to keep things around for a while. The 360/91 does out-of-order retirement. It helps that S/360 (except for the 67) doesn't have virtual memory. When an interrupt comes through, the pipelines have to be flushed of all instructions, at least up to the last one retired. The result is imprecise interrupts, where the address reported isn't the instruction at fault. (It is where to resume execution after the interrupt, as usual.) Even more, there can be multiple imprecise interrupts, as more can occur before the pipeline is empty. Much of that went away when VS came in, so that page faults could be serviced appropriately.

The 360/91 was for many years, and maybe still is, a favorite example for books on pipelined architecture.

--
glen
Article: 155417
On 6/25/2013 7:55 PM, glen herrmannsfeldt wrote:
> Eric Wallin<tammie.eric@gmail.com> wrote:
>
> (snip)
>> It talks about separate threads writing to the same location, which I understand can be a problem with interrupts and without atomic read-modify-write. All I can do is repeat that if you don't program this way, it won't happen. A subroutine can be written so that threads can share a common instance of it, but without using a common memory location to store data associated with the execution of that subroutine (unless the location is memory mapped HW).
>
> Sure, that is pretty common. It is usually related to being reentrant, but not always exactly the same.
>
>> In Hive, there is a register that when read returns the thread ID, which is unique for each thread. This could be used as an offset for subroutine data locations.
>
> Yes, but usually once in a while there needs to be communication between threads. If no other time, to get data through an I/O device, such as separate threads writing to the same user console. (Screen, terminal, serial port, etc.)

Why would communications be a problem? Just let one processor control the I/O device and let the other processors talk to that one.

--
Rick
Article: 155418
On 6/25/2013 8:10 PM, Bakul Shah wrote:
> On 6/25/13 4:06 PM, rickman wrote:
>> On 6/25/2013 4:23 PM, Bakul Shah wrote:
>>> This is not good enough in general. I gave some examples where threads have to read/write the same memory location.
>>
>> I didn't see any examples that were essential. You talked about two processes accessing the same data. Why do you need to do that? Just have one process send the data to the other process so only one updates the list.
>
> Thereby reducing things to single threading.

Maybe you need to define what you mean by thread and process...

>>> I agree with you that if threads communicate just through fifos and there is exactly one reader and one writer there is no problem. The reader updates the read ptr & watches but doesn't update the write ptr. The writer updates the write ptr & watches but doesn't update the read ptr. You can use fifos like these to implement a mutex but this is a very expensive way to implement mutexes and doesn't scale.
>>
>> Doesn't scale? Can you explain?
>
> Scale to more than two threads. For that it may be better to use one of the other algorithms mentioned in my last article. Still pretty complicated and inefficient.

I didn't mean explain what "more" means, explain *why* it doesn't scale.

>>> Adding compare-and-swap or load-linked & store-conditional would make your processor more useful for parallel programming. I am not motivated enough to go through 4500+ lines of verilog to know how hard that is but you must already have some bus arbitration logic since all 8 threads can access memory.
>>
>> You don't understand even the most basic concept of how the device works. There is no arbitration logic because there is only one processor that is time shared between 8 processes on a clock cycle basis to match the 8 deep pipeline.
>
> In this case load-linked / store-conditional may be possible? Load-linked records the loaded address & thread id in a special register. If any other thread tries to *write* to the same address, a subsequent store-conditional fails & the next instruction can test that. You could simplify this further at some loss of efficiency: fail the store if there is *any* store by any other thread!

Do you understand the processor design?

--
Rick
Article: 155419
Andrew Haley <andrew29@littlepinkcloud.invalid> writes:
>> I think most Java runs on SIM cards.
> Err, what is this belief based on? I mean, you might be right, but I never heard that before.

http://www.oracle.com/us/technologies/java/embedded/card/overview/index.html :

Overview:

Currently shipping on more than 2 billion devices/year

Deployed on more than 9 billion devices around the world since 1998

More than 50% of SIM cards deployed in 2011 run Java Card

...

Included in billions of SIM cards, payment cards, ID cards, e-passports, and more
Article: 155420
In comp.lang.forth Paul Rubin <no.email@nospam.invalid> wrote:
> Andrew Haley <andrew29@littlepinkcloud.invalid> writes:
>>> I think most Java runs on SIM cards.
>> Err, what is this belief based on? I mean, you might be right, but I never heard that before.
>
> http://www.oracle.com/us/technologies/java/embedded/card/overview/index.html :
>
> Overview:
>
> Currently shipping on more than 2 billion devices/year
>
> Deployed on more than 9 billion devices around the world since 1998
>
> More than 50% of SIM cards deployed in 2011 run Java Card
>
> ...
>
> Included in billions of SIM cards, payment cards, ID cards, e-passports, and more

OK, but that's hardly "most Java", unless you're just counting the number of virtual machines that might run at some point.

Andrew.
Article: 155421
Andrew Haley <andrew29@littlepinkcloud.invalid> writes:
> OK, but that's hardly "most Java", unless you're just counting the number of virtual machines that might run at some point.

Well, there's all sorts of ways to calculate it. If you want total LOC, Android phones may be past servers by now.
Article: 155422
rickman wrote:
> On 6/25/2013 7:55 PM, glen herrmannsfeldt wrote:
>> Eric Wallin<tammie.eric@gmail.com> wrote:
>>
>> (snip)
>>> It talks about separate threads writing to the same location, which I understand can be a problem with interrupts and without atomic read-modify-write. All I can do is repeat that if you don't program this way, it won't happen. A subroutine can be written so that threads can share a common instance of it, but without using a common memory location to store data associated with the execution of that subroutine (unless the location is memory mapped HW).
>>
>> Sure, that is pretty common. It is usually related to being reentrant, but not always exactly the same.
>>
>>> In Hive, there is a register that when read returns the thread ID, which is unique for each thread. This could be used as an offset for subroutine data locations.
>>
>> Yes, but usually once in a while there needs to be communication between threads. If no other time, to get data through an I/O device, such as separate threads writing to the same user console. (Screen, terminal, serial port, etc.)
>
> Why would communications be a problem? Just let one processor control the I/O device and let the other processors talk to that one.

Oh dear. It looks like you have vanishingly little experience writing software. That is supported by your statement in another post.

On 26/06/13 01:06, rickman wrote:
> I never understood the difference between thread and process until I read the link you provided.
Article: 155423
Rick, Guy does not get advertising $ when people use comp.arch.fpga.

Andy
Article: 155424
On Tuesday, June 25, 2013 7:39:51 PM UTC-4, thomas....@gmail.com wrote:
> I just want to point out that there is a lot of competition out there...

I'm just putting it out there; people can use it if they want to, or not.

Thomas, with your experience with the ERIC5 series, do you see anything obviously missing from the Hive instruction set? What do you think of the literal sizing?