On 6/24/13 3:23 PM, Eric Wallin wrote:
> On Monday, June 24, 2013 6:03:38 PM UTC-4, Bakul Shah wrote:
>
>> Consider a case where *both* thread A and B want to increment
>> a counter at location X? A reads X and finds it contains 10. But
>> before it can write back 11, B reads X and finds 10 and it too
>> writes back 11. Now you've lost a count. Can this happen in your
>> design? If so you need some sort of atomic update instruction.
>
> It can happen if the programmer is crazy enough to do it, otherwise not.

Concurrent threads need to communicate with each other to cooperate on some common task. Consider two threads adding an item to a linked list or keeping statistics on some events or many such things. You are pretty much required to be "crazy enough"! Any support for mutex would simplify things quite a bit. Without atomic update you have to use some complicated, inefficient algorithm to implement mutexes.
Article: 155376
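Bakul's lost-update scenario can be reproduced deterministically with a small simulation; this is a Python sketch of the interleaving, not anyone's actual hardware or software:

```python
# Deterministic simulation of the lost-update race described above.
# Two "threads" A and B each perform a non-atomic increment of a shared
# counter X as two separate steps: read, then write back.

def run(schedule):
    """Run read/write steps in the interleaving given by `schedule`.
    Each thread is a generator that yields between its read and write."""
    mem = {"X": 10}
    local = {}

    def thread(name):
        local[name] = mem["X"]      # read X
        yield
        mem["X"] = local[name] + 1  # write back X+1

    threads = {"A": thread("A"), "B": thread("B")}
    for name in schedule:
        try:
            next(threads[name])
        except StopIteration:
            pass
    return mem["X"]

# Serial interleaving: A completes before B starts, so both counts are kept.
print(run(["A", "A", "B", "B"]))  # 12

# Racy interleaving: both read 10 before either writes, so one count is lost.
print(run(["A", "B", "A", "B"]))  # 11
```

The racy schedule is exactly the case Bakul describes: both threads read 10 and both write back 11.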
Bakul Shah wrote:
> On 6/24/13 3:23 PM, Eric Wallin wrote:
>> On Monday, June 24, 2013 6:03:38 PM UTC-4, Bakul Shah wrote:
>>
>>> Consider a case where *both* thread A and B want to increment
>>> a counter at location X? A reads X and finds it contains 10. But
>>> before it can write back 11, B reads X and finds 10 and it too
>>> writes back 11. Now you've lost a count. Can this happen in your
>>> design? If so you need some sort of atomic update instruction.
>>
>> It can happen if the programmer is crazy enough to do it, otherwise not.
>
> Concurrent threads need to communicate with each other to cooperate
> on some common task. Consider two threads adding an item to a linked
> list or keeping statistics on some events or many such things. You
> are pretty much required to be "crazy enough"! Any support for mutex
> would simplify things quite a bit. Without atomic update you have to
> use some complicated, inefficient algorithm to implement mutexes.

Just so.

A programmer that doesn't understand that is the equivalent of a hardware engineer that doesn't understand metastability. (When I started out most people denied the possibility of synchronisation failure due to metastability!)

Mind you, I'd *love* to see a radical overhaul of traditional multicore processors so they took the form of
- a large number of processors
- each with completely independent memory
- connected by message passing fifos

In the long term that'll be the only way we can continue to scale individual machines: SMP scales for a while, but then cache coherence requirements kill performance.
Article: 155377
On 25/06/13 01:17, Tom Gardner wrote: > Bakul Shah wrote: >> On 6/24/13 3:23 PM, Eric Wallin wrote: >>> On Monday, June 24, 2013 6:03:38 PM UTC-4, Bakul Shah wrote: >>> >>>> Consider a case where *both* thread A and B want to increment >>>> a counter at location X? A reads X and finds it contains 10. But >>>> before it can write back 11, B reads X and finds 10 and it too >>>> writes back 11. Now you've lost a count. Can this happen in your >>>> design? If so you need some sort of atomic update instruction. >>> >>> It can happen if the programmer is crazy enough to do it, otherwise not. >> >> Concurrent threads need to communicate with each other to cooperate >> on some common task. Consider two threads adding an item to a linked >> list or keeping statistics on some events or many such things. You >> are pretty much required to be "crazy enough"! Any support for mutex >> would simplify things quite a bit. Without atomic update you have to >> use some complicated, inefficient algorithm to implement mutexes. > > Just so. > > A programmer that doesn't understand that is the equivalent > of a hardware engineer that doesn't under stand metastability. > (When I started out most people denied the possibility of > synchronisation failure due to metastability!) > > Mind you, I'd *love* to see a radical overhaul of traditional > multicore processors so they took the form of > - a large number of processors > - each with completely independent memory > - connected by message passing fifos This sounds nice in theory, but in practice there can be problems. Scaling with number of processors can quickly become an issue here - lock-free algorithms and fifos work well between two processors, but scale badly with many processors. Independent memory for each processor sounds nice, and can work well for some purposes, but is a poor structure for general-purpose computing. If you want to scale well, you want hardware support for semaphores. 
And you don't want to divide things up by processor - you want to be able to divide them up by process or thread. Threads should have independent memory areas, which they can access safely and quickly regardless of which cpu they are running on. Otherwise you spend much of your bandwidth just moving data around between your cpu-dependent memory blocks (replacing the cache coherence problems with new memory movement bottlenecks), or your threads have to have very strong affinity to particular cpus and you lose your scaling. > > In the long term that'll be the only way we can continue > to scale individual machines: SMP scales for a while, but > then cache coherence requirements kill performance.Article: 155378
Hello RCIngham,

sorry to hear that. FPGA Exchange is built on a platform that makes heavy use of JavaScript, so the minimum browser requirements are quite high:

- Internet Explorer 10+
- Google Chrome 24+
- Firefox 14+
- Safari 5+

As the aim is to create a discussion forum for the next decade of programmable logic, we unfortunately cannot support older browsers.

Guy Eschemann
Ingenieurbüro ESCHEMANN
Am Sandfeld 17a
76149 Karlsruhe, Germany
Tel.: +49 (0) 721 170 293 89
Fax: +49 (0) 721 170 293 89 - 9
Guy.Eschemann@gmail.com
Follow me on Twitter: @geschema
http://noasic.com
NEW: http://fpga-exchange.com http://fpga-news.de
Article: 155379
On 6/24/2013 12:56 PM, Tom Gardner wrote:
> rickman wrote:
>> On 6/24/2013 11:57 AM, Eric Wallin wrote:
>>> On Monday, June 24, 2013 9:47:28 AM UTC-4, Tom Gardner wrote:
>>>
>>>> Please explain why your processor does not need test and set or
>>>> compare and exchange operations. What theoretical advance have you
>>>> made?
>>>
>>> I'm not exactly sure why we're having this generalized, theoretical
>>> discussion when simply reading the design document I've provided
>>> would probably answer your questions. If it doesn't then
>>> perhaps you could tell me what I left out, and I might include that
>>> info in the next rev. Not trying to be gruff or anything, I'd very
>>> much like the document (and processor) to be on as solid a
>>> footing as possible.
>>
>> Eric, I think you have explained properly how your design will deal
>> with synchronization. I'm not sure what Tom is going on about. Clearly
>> he doesn't understand your design.
>
> Correct.

I'm glad you understand that.

>> If it is of any help, Eric's design is more like 8 cores running in
>> parallel, time sharing memory and in fact, the same processor hardware
>> on a machine cycle basis (so no 8 ported memory required).
>
> Fair enough; sounds like it is in the same area as the Propeller chip.

No point in making such a comparison. If you want to understand Eric's chip, then learn about Eric's chip. I certainly don't know enough about the Propeller chip to compare in a meaningful manner. Just think of each processor executing one instruction every 8 clocks, but all processors are out of phase, so no one completes on the same clock.

> Is there anything to prevent multiple cores reading/writing the
> same memory location in the same machine cycle? What is the
> result when that happens?

Not sure what you mean by "machine cycle". As I said above, there are 8 clocks to the processor machine cycle, but they are all out of phase. So on any given clock cycle only one processor will be updating registers or memory.

I believe Eric's point is that the thing that prevents more than one processor from accessing the same memory location is the programmer. Is that not a good enough method?

>> If an interrupt occurs it doesn't cause one of the other 7 tasks to
>> run, they are already running, it simply invokes the interrupt
>> handler. I believe Eric is not envisioning multiple tasks on a
>> single processor.
>
> Such presumptions would be useful to have in the white paper.

Have you read the paper? How do you know it's not there?

>> As others have pointed out, test and set instructions are not required
>> to support concurrency and communications. They are certainly nice to
>> have, but are not essential.
>
> Agreed. I'm perfectly prepared to accept alternative techniques,
> e.g. disable interrupts.

Ok, so is this discussion over?

>> In your case they would be superfluous.
>
> Not proven to me.
>
> The trouble is I've seen too many hardware designs that
> leave the awkward problems to software - especially first
> efforts by small teams.
>
> And too often those problems can be very difficult to solve
> in software. Nowadays it is hard to find people that have
> sufficient experience across the whole hardware/firmware/system
> software spectrum to enable them to avoid such traps.
>
> I don't know whether Eric is such a person, but I'm afraid
> his answers have raised orange flags in my mind.
>
> As a point of reference, I had similar misgivings when I
> first heard about the Itanium's architecture in, IIRC,
> 1994. I suppressed them because the people involved were
> undoubtedly more skilled in the area than I, and had been
> working for 5 years. Much later I regrettably came to the
> conclusion the orange flags were too optimistic.

If you still have reservations, then learn about the design. If you don't want to invest the time to learn about the design, why are you bothering to object to it?

-- Rick
Article: 155380
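The phase-interleaving rickman describes (one instruction per thread every 8 clocks, all threads out of phase) can be modelled in a few lines. This is an illustrative sketch with a made-up helper name (`owner`), not Hive's actual pipeline:

```python
# Toy model of phase-interleaved ("barrel") scheduling: 8 threads share
# one processor and one memory port, but each owns a different clock
# phase, so no two threads ever touch registers/memory on the same clock.

N_THREADS = 8

def owner(clock):
    """Thread that owns the register/memory update slot on this clock."""
    return clock % N_THREADS

# Each thread completes one instruction every 8 clocks...
print([owner(c) for c in range(8)])  # [0, 1, 2, 3, 4, 5, 6, 7]

# ...and over any window of 8 consecutive clocks every thread appears
# exactly once, so accesses from different threads can never collide.
for start in range(16):
    window = {owner(c) for c in range(start, start + 8)}
    assert window == set(range(N_THREADS))
```

This is why the design needs no 8-ported memory: the single port is time-shared by construction.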
David Brown wrote: > On 25/06/13 01:17, Tom Gardner wrote: >> Bakul Shah wrote: >>> On 6/24/13 3:23 PM, Eric Wallin wrote: >>>> On Monday, June 24, 2013 6:03:38 PM UTC-4, Bakul Shah wrote: >>>> >>>>> Consider a case where *both* thread A and B want to increment >>>>> a counter at location X? A reads X and finds it contains 10. But >>>>> before it can write back 11, B reads X and finds 10 and it too >>>>> writes back 11. Now you've lost a count. Can this happen in your >>>>> design? If so you need some sort of atomic update instruction. >>>> >>>> It can happen if the programmer is crazy enough to do it, otherwise not. >>> >>> Concurrent threads need to communicate with each other to cooperate >>> on some common task. Consider two threads adding an item to a linked >>> list or keeping statistics on some events or many such things. You >>> are pretty much required to be "crazy enough"! Any support for mutex >>> would simplify things quite a bit. Without atomic update you have to >>> use some complicated, inefficient algorithm to implement mutexes. >> >> Just so. >> >> A programmer that doesn't understand that is the equivalent >> of a hardware engineer that doesn't under stand metastability. >> (When I started out most people denied the possibility of >> synchronisation failure due to metastability!) >> >> Mind you, I'd *love* to see a radical overhaul of traditional >> multicore processors so they took the form of >> - a large number of processors >> - each with completely independent memory >> - connected by message passing fifos > > This sounds nice in theory, but in practice there can be problems. > Scaling with number of processors can quickly become an issue here - > lock-free algorithms and fifos work well between two processors, but > scale badly with many processors. Independent memory for each processor > sounds nice, and can work well for some purposes, but is a poor > structure for general-purpose computing. I agree with all your points. 
Unfortunately they are equally applicable to the current batch of SMP/NUMA architectures :( A key point is the granularity of the computation and message passing, and that varies radically between applications. There are a large number of commercially important workloads that would work well on such a system, ranging from embarrassingly parallel problems such as soft real-time event processing, some HPC, big data (think map-reduce). But I agree it wouldn't be a significant benefit for bog-standard desktop processing - but current machines are more than sufficient for that anyway!

> If you want to scale well, you want hardware support for semaphores.
> And you don't want to divide things up by processor - you want to be
> able to divide them up by process or thread. Threads should have
> independent memory areas, which they can access safely and quickly
> regardless of which cpu they are running on. Otherwise you spend much
> of your bandwidth just moving data around between your cpu-dependent
> memory blocks (replacing the cache coherence problems with new memory
> movement bottlenecks), or your threads have to have very strong affinity
> to particular cpus and you lose your scaling.

I agree with all those points too.
Article: 155381
On 6/24/2013 3:25 PM, Eric Wallin wrote:
> On Monday, June 24, 2013 12:07:23 AM UTC-4, rickman wrote:
>
>> I'm glad you can take (hopefully) constructive criticism. I was
>> concerned when I wrote the above that it might be a bit too blunt.
>
> I apologize to everyone here, I kind of barged in and have behaved somewhat brashly.
>
>> ... part of the utility
>> of a design is the ease of programming efficiently. I haven't looked at
>> yours yet, but just picturing the four stacks makes it seem pretty
>> simple... so far. :^)
>
> Writing a conventional stack machine in an HDL isn't too daunting, but programming it afterward, for me anyway, was just too much.
>
>> I have to say I'm not crazy about the large instruction word. That is
>> one of the appealing things about MISC to me. I work in very small
>> FPGAs and 16 bit instructions are better avoided if possible, but that
>> may be a red herring. What matters is how many bytes a given program
>> uses, not how many bits are in an instruction.
>
> Yes. Opcode space obviously expands exponentially with bit count, so one can get a lot more with a small size increase. I think a 32 bit opcode is pushing it for a small FPGA implementation, but a 16 bit opcode gives one a couple of small operand indices, and some reasonably sized immediate instructions (data, conditional jumps, shifts, add) that I find I'm using quite a bit during the testing and verification phase. Data plus operation in a single opcode is hard to beat for efficiency but it has to earn its keep in the expanded opcode space. With the operand indices you get a free copy/move with most single operand operations which is another efficiency.
>
>> I am supposed to present to the SVFIG and I think your design would be a
>> very interesting part of the presentation unless you think you would
>> rather present yourself. I'm sure they would like to hear about it and
>> they likely would be interested in your opinions on MISC. I know I am.
> > I'm on the other coast so I most likely can't attend, but I would be most honored if you were to present it to SVFIG. I was going to talk about the CPU design I had been working on, but I think it is going to be more of a survey of CPU designs for FPGAs ending with my spin on how to optimize a design. Your implementation is very different from mine, but the hybrid register/stack approach is similar in intent and results from a similar line of thought. Turns out I am busier in July than expected, so I will not be able to present at the July meeting. I'll shoot for August. I've been looking at their stuff on the web and they do a pretty good job. I was thinking it was a local group and it would be a small audience, but I think it may be a lot bigger when the web is considered. -- RickArticle: 155382
On 6/24/2013 7:17 PM, Tom Gardner wrote: > Bakul Shah wrote: >> On 6/24/13 3:23 PM, Eric Wallin wrote: >>> On Monday, June 24, 2013 6:03:38 PM UTC-4, Bakul Shah wrote: >>> >>>> Consider a case where *both* thread A and B want to increment >>>> a counter at location X? A reads X and finds it contains 10. But >>>> before it can write back 11, B reads X and finds 10 and it too >>>> writes back 11. Now you've lost a count. Can this happen in your >>>> design? If so you need some sort of atomic update instruction. >>> >>> It can happen if the programmer is crazy enough to do it, otherwise not. >> >> Concurrent threads need to communicate with each other to cooperate >> on some common task. Consider two threads adding an item to a linked >> list or keeping statistics on some events or many such things. You >> are pretty much required to be "crazy enough"! Any support for mutex >> would simplify things quite a bit. Without atomic update you have to >> use some complicated, inefficient algorithm to implement mutexes. > > Just so. > > A programmer that doesn't understand that is the equivalent > of a hardware engineer that doesn't under stand metastability. > (When I started out most people denied the possibility of > synchronisation failure due to metastability!) > > Mind you, I'd *love* to see a radical overhaul of traditional > multicore processors so they took the form of > - a large number of processors > - each with completely independent memory > - connected by message passing fifos > > In the long term that'll be the only way we can continue > to scale individual machines: SMP scales for a while, but > then cache coherence requirements kill performance. The *only* way? lol You think like a programmer. The big assumption you are making that is no longer valid is that the processor itself is a precious resource that must be optimized. That is no longer valid. 
When x86 and ARM machines put four cores on a chip with one memory interface they are choking the CPU's airway. Those designs are no longer efficient and the processor is underused. So clearly it is not the precious resource anymore. Rather than trying to optimize the utilization of the CPU, design needs to proceed with the recognition of the limits of multiprocessors. Treat processors the same way you treat peripheral functions. Dedicate them to tasks. Let them have a job to do and not worry if they are idle part of the time. This results in totally different designs and can result in faster, lower cost and lower power systems. -- RickArticle: 155383
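The "dedicated processors connected by message-passing fifos" structure this thread keeps returning to can be sketched with threads and queues. This is a toy Python model (threads and `queue.Queue` standing in for cores and hardware fifos), not a hardware design:

```python
# Workers with completely private state, cooperating only through
# message-passing FIFOs: no shared counters, no locks.

import queue
import threading

def producer(out_q):
    """Dedicated 'core' that generates events."""
    for x in range(10):
        out_q.put(x)
    out_q.put(None)  # sentinel: end of stream

def accumulator(in_q, result_q):
    """Dedicated 'core' that owns the running total; nobody else sees it."""
    total = 0  # private memory for this worker only
    while True:
        x = in_q.get()
        if x is None:
            break
        total += x
    result_q.put(total)

fifo, result = queue.Queue(), queue.Queue()
threading.Thread(target=producer, args=(fifo,)).start()
threading.Thread(target=accumulator, args=(fifo, result)).start()
total_seen = result.get()
print(total_seen)  # 45
```

Because only the accumulator ever touches `total`, the lost-update race discussed earlier in the thread cannot occur; the trade-off, as David notes, is that all cooperation now costs queue traffic.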
On 6/24/2013 7:00 PM, Bakul Shah wrote: > On 6/24/13 3:23 PM, Eric Wallin wrote: >> On Monday, June 24, 2013 6:03:38 PM UTC-4, Bakul Shah wrote: >> >>> Consider a case where *both* thread A and B want to increment >>> a counter at location X? A reads X and finds it contains 10. But >>> before it can write back 11, B reads X and finds 10 and it too >>> writes back 11. Now you've lost a count. Can this happen in your >>> design? If so you need some sort of atomic update instruction. >> >> It can happen if the programmer is crazy enough to do it, otherwise not. > > Concurrent threads need to communicate with each other to cooperate > on some common task. Consider two threads adding an item to a linked > list or keeping statistics on some events or many such things. You > are pretty much required to be "crazy enough"! Any support for mutex > would simplify things quite a bit. Without atomic update you have to > use some complicated, inefficient algorithm to implement mutexes. What assumptions is this based on? Do you know? What are the alternatives to "mutexes"? How inefficient are they? When do you need to use a mutex? Have you looked at Eric's design in the least? Do you have any idea of the applications it is targeted to? -- RickArticle: 155384
rickman wrote:
> Not sure what you mean by "machine cycle".

I mean it in the same sense as it was used in the posting that I replied to.

> I believe Eric's point is that the thing that prevents more than one processor from accessing the same memory location is the programmer. Is that not a good enough method?

I'd prefer it if Eric gave the correct answer rather than someone else's possibly correct answer. It is a good enough method for some things, and not for others.

> If you still have reservations, then learn about the design. If you don't want to invest the time to learn about the design, why are you bothering to object to it?

There are *many* new designs which might be interesting. Nobody has time to look at them all, so they make fast decisions as to whether a design and its designer are credible. I'm not objecting to it, but I am giving the designer the opportunity to pass the "elevator pitch" test.
Article: 155385
On 6/24/2013 5:30 PM, Eric Wallin wrote: > Verilog code for my Hive processor is now up: > > http://opencores.org/project,hive > > (Took me most of the freaking day to figure out SVN.) You mean you actually figured it out? -- RickArticle: 155386
On 6/25/2013 6:00 AM, Guy Eschemann wrote: > Hello RCIngham, > > sorry to hear that. FPGA Exchange is built on a platform that makes heavy use of JavaScript, so the minimum browser requirements are quite high: > > - Internet Explorer 10+ > - Google Chrome 24+ > - Firefox 14+ > - Safari 5+ > > As the aim is to create a discussion forum for the next decade of programmable logic, we unfortunately cannot support older browsers. LOL, I didn't know anyone created programmable logic with a browser. You might want to rethink your approach. There are a lot of people who don't control the computers they work on. Do you really want to exclude a significant portion of your potential audience? BTW, I don't think you ever responded to the post that asked why you are announcing this here which would have the effect of splitting the community. This group is barely alive these days. Your site may kill it off. -- RickArticle: 155387
rickman wrote: > On 6/24/2013 7:17 PM, Tom Gardner wrote: >> Bakul Shah wrote: >>> On 6/24/13 3:23 PM, Eric Wallin wrote: >>>> On Monday, June 24, 2013 6:03:38 PM UTC-4, Bakul Shah wrote: >>>> >>>>> Consider a case where *both* thread A and B want to increment >>>>> a counter at location X? A reads X and finds it contains 10. But >>>>> before it can write back 11, B reads X and finds 10 and it too >>>>> writes back 11. Now you've lost a count. Can this happen in your >>>>> design? If so you need some sort of atomic update instruction. >>>> >>>> It can happen if the programmer is crazy enough to do it, otherwise not. >>> >>> Concurrent threads need to communicate with each other to cooperate >>> on some common task. Consider two threads adding an item to a linked >>> list or keeping statistics on some events or many such things. You >>> are pretty much required to be "crazy enough"! Any support for mutex >>> would simplify things quite a bit. Without atomic update you have to >>> use some complicated, inefficient algorithm to implement mutexes. >> >> Just so. >> >> A programmer that doesn't understand that is the equivalent >> of a hardware engineer that doesn't under stand metastability. >> (When I started out most people denied the possibility of >> synchronisation failure due to metastability!) >> >> Mind you, I'd *love* to see a radical overhaul of traditional >> multicore processors so they took the form of >> - a large number of processors >> - each with completely independent memory >> - connected by message passing fifos >> >> In the long term that'll be the only way we can continue >> to scale individual machines: SMP scales for a while, but >> then cache coherence requirements kill performance. > > The *only* way? lol You think like a programmer. The big assumption you are making that is no longer valid is that the processor itself is a precious resource that must be optimized. That is no > longer valid. 
When x86 and ARM machines put four cores on a chip with one memory interface they are choking the CPU's airway. Those designs are no longer efficient and the processor is underused. So > clearly it is not the precious resource anymore. I don't think that and your statements don't follow from my comments. > Rather than trying to optimize the utilization of the CPU, design needs to proceed with the recognition of the limits of multiprocessors. Treat processors the same way you treat peripheral > functions. Dedicate them to tasks. Let them have a job to do and not worry if they are idle part of the time. This results in totally different designs and can result in faster, lower cost and > lower power systems. That approach is valuable when and where it works, but can be impractical for many workloads.Article: 155388
On 6/23/2013 5:10 PM, Richard Damon wrote: > On 6/19/13 9:39 PM, rickman wrote: >> On 6/19/2013 11:40 AM, jonesandy@comcast.net wrote: >>> To borrow Gabor's card game analogy... >>> >>> You have two stacks, (highest and 2nd highest) >>> >>> If the drawn card is same or higher than the highest stack, then >>> >>> move the top card from the highest stack to the 2nd highest stack, >>> move the drawn card to the highest stack. >>> >>> else if the drawn card is same or higher than the 2nd highest stack, then >>> >>> move the drawn card to the 2nd highest stack. >>> >>> draw another card and repeat. >> >> They don't need to be stacks. You just need to have two holding spots >> (registers) and initialize them to something less than anything you will >> have on the input. Then on each draw of a card (or sample on the input) >> you compare to both spots, if the input is higher than the "highest" >> spot you save it there and put the old highest on the "second highest" >> spot. If not, but it is higher than the "second highest" you put it there. >> >> Gabor was using a stack because he thought it would get him both the >> highest and the second highest with one compare operation, but it didn't >> work. Two compares are needed for each input. >> >> In your approach your compare is "higher or same", why do you need to do >> anything if they are the same? Not that it is a big deal, but in some >> situations this could require extra work. >> > You actually only need to compare most of the entries to the second > highest register, if it isn't higher, than you can discard the item. > Only if it is higher than the second highest, do you need to compare it > to the highest to see if the new item goes into the highest or second > highest. > I.E. > > Compare drawn card to 2nd highest stack, if not higher, discard and repeat > if higher (same doesn't really matter), discard the 2nd highest stack > and compare the new card to the highest stack. 
> > if not higher, new card goes into 2nd highest stack, if higher, item in > highest goes to 2nd highest, and new goes to highest. This is being done in hardware not software. Your description is sequential while the hardware is concurrent. The control logic is simple if you just code it in a simple way. Do both compares and you get two bits as a result. Then load the max and second max registers based on those two compare results. The only fly in the ointment that I see is the initial condition. You can either initialize the two registers to values which you know will always be the min values possible, or you can have a flag for the first clock cycle that loads both registers with the first value read. I think the initial state flag would be the simplest. So the register control logic has a third input bit and of course an enable from the 10 counter. -- RickArticle: 155389
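The two-register scheme rickman describes (two concurrent compares per sample, plus an initial-state flag instead of seeding the registers with a known minimum) can be modelled in software as follows. This is a sketch of the algorithm, not the HDL:

```python
# Software model of the two-register max/second-max tracker: two
# compares per sample, with a "first sample" flag handling the
# initial state.

def top_two(samples):
    highest = second = None
    first = True
    for x in samples:
        if first:
            highest = second = x   # load both registers on the first sample
            first = False
        elif x > highest:
            second = highest       # old highest shifts down
            highest = x
        elif x > second:
            second = x
    return highest, second

print(top_two([3, 9, 1, 7, 2]))   # (9, 7)
print(top_two([5, 5, 4]))         # (5, 5), ties are kept
```

In hardware the two `>` tests run concurrently, and their two result bits plus the first-sample flag select what each register loads on the clock edge.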
Hello Rick,

I'm not happy about the fact that some people can't access the forum because their IT department doesn't allow modern browsers. But I guess I have to live with this limitation for now. With time, even conservative IT departments will have to upgrade, if only for security reasons.

This is an honest attempt at creating a friendly, vendor-independent discussion space where FPGA developers can share their knowledge. A bit like comp.arch.fpga was 15 years ago. People are moving away from newsgroups anyway, so I'd rather have them join FPGA Exchange than some random LinkedIn group.

Guy.

On Tuesday, June 25, 2013 5:19:30 PM UTC+2, rickman wrote:
> LOL, I didn't know anyone created programmable logic with a browser.
> You might want to rethink your approach. There are a lot of people who
> don't control the computers they work on. Do you really want to exclude
> a significant portion of your potential audience?
>
> BTW, I don't think you ever responded to the post that asked why you are
> announcing this here which would have the effect of splitting the
> community. This group is barely alive these days. Your site may kill
> it off.
>
> --
> Rick
Article: 155390
On Monday, June 24, 2013 3:03:15 AM UTC-7, peter dudley wrote:
> Hello All,
> I am wondering if it is possible to use a more conventional approach to building hardware and connecting it to the AXI bus of the ARM processor. I greatly prefer to directly instantiate components in my HDL code. I find straight HDL development easier to maintain in the long run and less sensitive to changes in FPGA compiler tools.
> Has anyone on this group succeeded in going around the PlanAhead/XPS graphical flow for building systems for the Zynq ARM?

It is definitely possible but not trivial. What you need is the NGC file and the instantiation model for the Zynq interface. These you can get from XPS: create an empty project with only a Zynq in it to get these two files. Then you can instantiate the model (which has no content) and make connections to the +3300 nets. During implementation point to the NGC file and you are done.
Article: 155391
On 25/06/2013 17:07, Guy Eschemann wrote: > Hello Rick, > > I'm not happy about the fact that some people can't access the forum because their IT department doesn't allow modern browsers. But I guess I have to live with this limitation for now. With time, even conservative IT departments will have to upgrade, if only for security reasons. > > This is a honest attempt at creating a friendly, vendor-independent discussion space where FPGA developers can share their knowledge. A bit like comp.arch.fpga was 15 years ago. People are moving away from newsgroups anyway, so I'd rather have them join FPGA Exchange than some random LinkedIn group. > > Guy. Hi Guy, I am not sure how long you have been using usenet but this forum has been a vendor/company-independent friendly (especially compared to some of the other forums I read) forum since I started to use it a few decades ago. I agree with Rick and Uwe that there is really no need for another FPGA forum especially one which is controlled by a single person (right?). I understand why Vendors are doing it as it increases traffic to their website and gives them a better marketing tool but FPGA Exchange seems to be somewhat decoupled from your noasic one, so I am not sure why you decided to spend the time and effort to set it up. Anyway, good luck with your FPGA consultancy firm, Regards, Hans www.ht-lab.com > > > > On Tuesday, June 25, 2013 5:19:30 PM UTC+2, rickman wrote: >> >> LOL, I didn't know anyone created programmable logic with a browser. >> >> You might want to rethink your approach. There are a lot of people who >> >> don't control the computers they work on. Do you really want to exclude >> >> a significant portion of your potential audience? >> >> >> >> BTW, I don't think you ever responded to the post that asked why you are >> >> announcing this here which would have the effect of splitting the >> >> community. This group is barely alive these days. Your site may kill >> >> it off. 
>> >> >> >> -- >> >> >> >> Rick >Article: 155392
rickman <gnuarm@gmail.com> wrote:
> On 6/24/2013 12:56 PM, Tom Gardner wrote:

(snip)

>> Is there anything to prevent multiple cores reading/writing the
>> same memory location in the same machine cycle? What is the
>> result when that happens?

> Not sure what you mean by "machine cycle". As I said above, there are 8
> clocks to the processor machine cycle, but they are all out of phase.
> So on any given clock cycle only one processor will be updating
> registers or memory.

If there are 8 processors that never communicate, it would be better to have 8 separate RAM units.

> I believe Eric's point is that the thing that prevents more than one
> processor from accessing the same memory location is the programmer. Is
> that not a good enough method?

So no thread ever communicates with another one?

Well, read the wikipedia article on spinlock and the linked-to article Peterson's_Algorithm.

It is more efficient if you have an interlocked write, but can be done with spinlocks, if there is no reordering of writes to memory.

As many processors now do reorder writes, there is need for special instructions. Otherwise, spinlocks might be good enough.

-- glen
Article: 155393
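Peterson's algorithm, which glen points to, builds mutual exclusion for two threads out of plain reads and writes. The Python sketch below works only because CPython's interpreter happens to give sequentially consistent behaviour; on real hardware that reorders writes you would need fences or interlocked operations, which is exactly glen's caveat:

```python
# Peterson's algorithm for two threads, using only ordinary loads and
# stores. The protected operation is a non-atomic counter increment,
# i.e. the very lost-update case from earlier in the thread.

import threading
import time

flag = [False, False]  # flag[i]: thread i wants to enter
turn = 0               # which thread yields when both want in
counter = 0
N = 5000

def worker(i):
    global turn, counter
    other = 1 - i
    for _ in range(N):
        # entry protocol
        flag[i] = True
        turn = other
        while flag[other] and turn == other:
            time.sleep(0)  # spin; yield so the other thread can progress
        # critical section: read-modify-write, now safe
        counter += 1
        # exit protocol
        flag[i] = False

threads = [threading.Thread(target=worker, args=(i,)) for i in (0, 1)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 10000
```

Without the entry/exit protocol, interleaved increments could lose counts; with it, the final total is always 2*N.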
On 6/25/2013 12:07 PM, Guy Eschemann wrote:
> Hello Rick,
>
> I'm not happy about the fact that some people can't access the forum because their IT department doesn't allow modern browsers. But I guess I have to live with this limitation for now. With time, even conservative IT departments will have to upgrade, if only for security reasons.

I take issue with your use of the term "modern". Chrome 24 was released only 8 months ago, Internet Explorer 7 months ago, Firefox 14 a year ago. But the site is yours to run as you see fit.

> This is an honest attempt at creating a friendly, vendor-independent discussion space where FPGA developers can share their knowledge. A bit like comp.arch.fpga was 15 years ago. People are moving away from newsgroups anyway, so I'd rather have them join FPGA Exchange than some random LinkedIn group.

Again, you have an interesting way of characterizing the other discussion forums. Not many here would agree with you, and it is a bit off-putting for you to imply that the other groups we like are somehow unfit. I'm sure you prefer that folks use your site. I would too if I had started a web site.

--
Rick

Article: 155394
On 6/25/2013 1:14 PM, glen herrmannsfeldt wrote:
> rickman <gnuarm@gmail.com> wrote:
>> On 6/24/2013 12:56 PM, Tom Gardner wrote:
>
> (snip)
>>> Is there anything to prevent multiple cores reading/writing the same memory location in the same machine cycle? What is the result when that happens?
>
>> Not sure what you mean by "machine cycle". As I said above, there are 8 clocks to the processor machine cycle, but they are all out of phase. So on any given clock cycle only one processor will be updating registers or memory.
>
> If there are 8 processors that never communicate, it would be better to have 8 separate RAM units.

Why is that? What would be "better" about it?

>> I believe Eric's point is that the thing that prevents more than one processor from accessing the same memory location is the programmer. Is that not a good enough method?
>
> So no thread ever communicates with another one?
>
> Well, read the wikipedia article on spinlock and the linked-to article Peterson's_Algorithm.
>
> It is more efficient if you have an interlocked write, but it can be done with spinlocks, if there is no reordering of writes to memory.
>
> As many processors now do reorder writes, there is a need for special instructions.

Are we talking about the same thing here? We were talking about the Hive processor.

> Otherwise, spinlocks might be good enough.

So your point is? What would the critical section of code be doing that is critical? Simple interprocess communication is not necessarily "critical".

--
Rick

Article: 155395
On Tuesday, June 25, 2013 1:14:57 PM UTC-4, glen herrmannsfeldt wrote:
> So no thread ever communicates with another one?

All threads share the same Von Neumann memory, so of course they can communicate with each other. If only there were a paper somewhere, written by the designer, freely available to anyone on the web...

Article: 155396
On Tuesday, June 25, 2013 11:14:52 AM UTC-4, Tom Gardner wrote:
>> I believe Eric's point is that the thing that prevents more than one processor from accessing the same memory location is the programmer. Is that not a good enough method?
>
> I'd prefer it if Eric gave the correct answer rather than someone else's possibly correct answer.

If Rick says anything wrong I'll correct him.

> I'm not objecting to it, but I am giving the designer the opportunity to pass the "elevator pitch" test.

The paper has a bulleted feature list at the very front and a bulleted downsides list at the very back. I tried to write it in an accessible manner for the widest audience. We all like to think aloud now and then, but I'd think a comprehensive design paper would sidestep all of this wild speculation and unnecessary third degree.

http://opencores.org/usercontent,doc,1371986749

Article: 155397
rickman <gnuarm@gmail.com> wrote:

(snip, someone wrote)
>>> Not sure what you mean by "machine cycle". As I said above, there are 8 clocks to the processor machine cycle, but they are all out of phase. So on any given clock cycle only one processor will be updating registers or memory.

(then I wrote)
>> If there are 8 processors that never communicate, it would be better to have 8 separate RAM units.

> Why is that? What would be "better" about it?

Well, if the RAM really is fast enough not to be in the critical path, then maybe not, but separate RAM means no access limitations.

>>> I believe Eric's point is that the thing that prevents more than one processor from accessing the same memory location is the programmer. Is that not a good enough method?
>> So no thread ever communicates with another one?
>> Well, read the wikipedia article on spinlock and the linked-to article Peterson's_Algorithm.
>> It is more efficient if you have an interlocked write, but it can be done with spinlocks, if there is no reordering of writes to memory.
>> As many processors now do reorder writes, there is a need for special instructions.

> Are we talking about the same thing here? We were talking about the Hive processor.

I was mentioning it for context. For processors that do reorder writes, you can't use Peterson's algorithm.

>> Otherwise, spinlocks might be good enough.

> So your point is?

Without write reordering, it is possible, though maybe not efficient, to communicate without interlocked writes.

> What would the critical section of code be doing that is critical? Simple interprocess communications is not necessarily "critical".

"Critical" means that the messages won't get lost due to other threads writing at about the same time. Now, much of networking is based on unreliable "best effort" protocols, and that may also work for communications between threads. But that involves delays and retransmission after timers expire.

--
glen

Article: 155398
In article <3a29b759-dd7a-4b12-9f7c-83608402c247@googlegroups.com>, peter dudley <padudle@gmail.com> wrote:
>Hello All,
>
>I have a Xilinx Zynq development board and I am starting to teach myself to build systems for Zynq. The recommended flow described in UG873 is a very long sequence of graphical menu clicks, pull-downs and forms.
>
>The tools then produce a great deal of machine generated code.
>
>I am wondering if it is possible to use a more conventional approach to building hardware and connecting it to the AXI bus of the ARM processor. I greatly prefer to directly instantiate components in my HDL code. I find straight HDL development easier to maintain in the long run and less sensitive to changes in FPGA compiler tools.
>
>Has anyone on this group succeeded in going around the PlanAhead/XPS graphical flow for building systems for the Zynq ARM?
>
>Any advice or opinions are appreciated.

A similar question was just posted on the Xilinx forums - I'll say here what I did there:

We've done this for all of our designs in Xilinx involving a processor (PPC405, PPC440, MicroBlaze) and expect to do the same in the future for ARM-based designs.

Use a bare-minimum XPS flow - often just using one of the Xilinx examples - something with just a small block RAM for boot, maybe a UART, and not much else. Generate the netlist, and then never look back at XPS - all ISE and makefile (or Vivado and TCL) from then on. The original netlist is used as a reference and modified as needed.

Xilinx strongly discourages this flow. But it's worked great for us for many years.

It's nice to hear that others in this thread have the same frustrations, and have done similar things to work around them. I've never been sure if we were alone in our unhappiness with EDK/XPS/whatever they're calling it now. It's basically a BAD schematic capture tool - if it were easier to use it'd be state of the art for the mid 80s... You're basically describing a netlist along with parameter (generic) settings. HDL is perfect for this, no need for MHS and other cruft. Add some assertions, and/or connectivity checks in ngdbuild or somewhere, and be done with it. No MHS, no PAO, no BMM, no XCO - stop inventing new / poorly defined languages / etc. - when existing standard solutions exist.

Ok - got a little <ranty> there... I'm done.

--Mark

Article: 155399
On 6/25/13 11:18 AM, Eric Wallin wrote:
> On Tuesday, June 25, 2013 11:14:52 AM UTC-4, Tom Gardner wrote:
>>> I believe Eric's point is that the thing that prevents more than one processor from accessing the same memory location is the programmer. Is that not a good enough method?

This is not good enough in general. I gave some examples where threads have to read/write the same memory location. I agree with you that if threads communicate just through fifos and there is exactly one reader and one writer, there is no problem. The reader updates the read ptr and watches, but doesn't update, the write ptr. The writer updates the write ptr and watches, but doesn't update, the read ptr. You can use fifos like these to implement a mutex, but this is a very expensive way to implement mutexes and it doesn't scale.

> The paper has a bulleted feature list at the very front and a bulleted downsides list at the very back. I tried to write it in an accessible manner for the widest audience. We all like to think aloud now and then, but I'd think a comprehensive design paper would sidestep all of this wild speculation and unnecessary third degree.

I don't think it is a question of "third degree". You did invite feedback! Adding compare-and-swap or load-linked & store-conditional would make your processor more useful for parallel programming. I am not motivated enough to go through 4500+ lines of Verilog to know how hard that is, but you must already have some bus arbitration logic since all 8 threads can access memory.

> http://opencores.org/usercontent,doc,1371986749

I missed this link before. A nicely done document! A top level diagram would be helpful. 64K address space seems too small.