On 6/24/13 3:23 PM, Eric Wallin wrote:
> On Monday, June 24, 2013 6:03:38 PM UTC-4, Bakul Shah wrote:
>
>> Consider a case where *both* thread A and B want to increment
>> a counter at location X? A reads X and finds it contains 10. But
>> before it can write back 11, B reads X and finds 10 and it too
>> writes back 11. Now you've lost a count. Can this happen in your
>> design? If so you need some sort of atomic update instruction.
>
> It can happen if the programmer is crazy enough to do it, otherwise not.

Concurrent threads need to communicate with each other to cooperate on some common task. Consider two threads adding an item to a linked list or keeping statistics on some events or many such things. You are pretty much required to be "crazy enough"! Any support for mutex would simplify things quite a bit. Without atomic update you have to use some complicated, inefficient algorithm to implement mutexes.
Article: 155376
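Bakul's lost-update scenario can be reproduced deterministically with a small simulation; this is a Python sketch of the interleaving, not anyone's actual hardware or software:

```python
# Deterministic simulation of the lost-update race described above.
# Two "threads" A and B each perform a non-atomic increment of a shared
# counter X as two separate steps: read, then write back.

def run(schedule):
    """Run read/write steps in the interleaving given by `schedule`.
    Each thread is a generator that yields between its read and write."""
    mem = {"X": 10}
    local = {}

    def thread(name):
        local[name] = mem["X"]      # read X
        yield
        mem["X"] = local[name] + 1  # write back X+1

    threads = {"A": thread("A"), "B": thread("B")}
    for name in schedule:
        try:
            next(threads[name])
        except StopIteration:
            pass
    return mem["X"]

# Serial interleaving: A completes before B starts, so both counts are kept.
print(run(["A", "A", "B", "B"]))  # 12

# Racy interleaving: both read 10 before either writes, so one count is lost.
print(run(["A", "B", "A", "B"]))  # 11
```

The racy schedule is exactly the case Bakul describes: both threads read 10 and both write back 11.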
Bakul Shah wrote:
> On 6/24/13 3:23 PM, Eric Wallin wrote:
>> On Monday, June 24, 2013 6:03:38 PM UTC-4, Bakul Shah wrote:
>>
>>> Consider a case where *both* thread A and B want to increment
>>> a counter at location X? A reads X and finds it contains 10. But
>>> before it can write back 11, B reads X and finds 10 and it too
>>> writes back 11. Now you've lost a count. Can this happen in your
>>> design? If so you need some sort of atomic update instruction.
>>
>> It can happen if the programmer is crazy enough to do it, otherwise not.
>
> Concurrent threads need to communicate with each other to cooperate
> on some common task. Consider two threads adding an item to a linked
> list or keeping statistics on some events or many such things. You
> are pretty much required to be "crazy enough"! Any support for mutex
> would simplify things quite a bit. Without atomic update you have to
> use some complicated, inefficient algorithm to implement mutexes.

Just so.

A programmer that doesn't understand that is the equivalent of a hardware engineer that doesn't understand metastability. (When I started out most people denied the possibility of synchronisation failure due to metastability!)

Mind you, I'd *love* to see a radical overhaul of traditional multicore processors so they took the form of
- a large number of processors
- each with completely independent memory
- connected by message passing fifos

In the long term that'll be the only way we can continue to scale individual machines: SMP scales for a while, but then cache coherence requirements kill performance.
Article: 155377
On 25/06/13 01:17, Tom Gardner wrote: > Bakul Shah wrote: >> On 6/24/13 3:23 PM, Eric Wallin wrote: >>> On Monday, June 24, 2013 6:03:38 PM UTC-4, Bakul Shah wrote: >>> >>>> Consider a case where *both* thread A and B want to increment >>>> a counter at location X? A reads X and finds it contains 10. But >>>> before it can write back 11, B reads X and finds 10 and it too >>>> writes back 11. Now you've lost a count. Can this happen in your >>>> design? If so you need some sort of atomic update instruction. >>> >>> It can happen if the programmer is crazy enough to do it, otherwise not. >> >> Concurrent threads need to communicate with each other to cooperate >> on some common task. Consider two threads adding an item to a linked >> list or keeping statistics on some events or many such things. You >> are pretty much required to be "crazy enough"! Any support for mutex >> would simplify things quite a bit. Without atomic update you have to >> use some complicated, inefficient algorithm to implement mutexes. > > Just so. > > A programmer that doesn't understand that is the equivalent > of a hardware engineer that doesn't under stand metastability. > (When I started out most people denied the possibility of > synchronisation failure due to metastability!) > > Mind you, I'd *love* to see a radical overhaul of traditional > multicore processors so they took the form of > - a large number of processors > - each with completely independent memory > - connected by message passing fifos This sounds nice in theory, but in practice there can be problems. Scaling with number of processors can quickly become an issue here - lock-free algorithms and fifos work well between two processors, but scale badly with many processors. Independent memory for each processor sounds nice, and can work well for some purposes, but is a poor structure for general-purpose computing. If you want to scale well, you want hardware support for semaphores. 
And you don't want to divide things up by processor - you want to be able to divide them up by process or thread. Threads should have independent memory areas, which they can access safely and quickly regardless of which cpu they are running on. Otherwise you spend much of your bandwidth just moving data around between your cpu-dependent memory blocks (replacing the cache coherence problems with new memory movement bottlenecks), or your threads have to have very strong affinity to particular cpus and you lose your scaling. > > In the long term that'll be the only way we can continue > to scale individual machines: SMP scales for a while, but > then cache coherence requirements kill performance.Article: 155378
Hello RCIngham,

sorry to hear that. FPGA Exchange is built on a platform that makes heavy use of JavaScript, so the minimum browser requirements are quite high:

- Internet Explorer 10+
- Google Chrome 24+
- Firefox 14+
- Safari 5+

As the aim is to create a discussion forum for the next decade of programmable logic, we unfortunately cannot support older browsers.

Guy Eschemann
Ingenieurbüro ESCHEMANN
Am Sandfeld 17a
76149 Karlsruhe, Germany
Tel.: +49 (0) 721 170 293 89
Fax: +49 (0) 721 170 293 89 - 9
Guy.Eschemann@gmail.com
Follow me on Twitter: @geschema
http://noasic.com
NEW: http://fpga-exchange.com http://fpga-news.de
Article: 155379
On 6/24/2013 12:56 PM, Tom Gardner wrote:
> rickman wrote:
>> On 6/24/2013 11:57 AM, Eric Wallin wrote:
>>> On Monday, June 24, 2013 9:47:28 AM UTC-4, Tom Gardner wrote:
>>>
>>>> Please explain why your processor does not need test and set or
>>>> compare and exchange operations. What theoretical advance have you
>>>> made?
>>>
>>> I'm not exactly sure why we're having this generalized, theoretical
>>> discussion when simply reading the design document I've provided
>>> would probably answer your questions. If it doesn't then
>>> perhaps you could tell me what I left out, and I might include that
>>> info in the next rev. Not trying to be gruff or anything, I'd very
>>> much like the document (and processor) to be on as solid a
>>> footing as possible.
>>
>> Eric, I think you have explained properly how your design will deal
>> with synchronization. I'm not sure what Tom is going on about. Clearly
>> he doesn't understand your design.
>
> Correct.

I'm glad you understand that.

>> If it is of any help, Eric's design is more like 8 cores running in
>> parallel, time sharing memory and in fact, the same processor hardware
>> on a machine cycle basis (so no 8 ported memory required).
>
> Fair enough; sounds like it is in the same area as the Propeller chip.

No point in making such a comparison. If you want to understand Eric's chip, then learn about Eric's chip. I certainly don't know enough about the Propeller chip to compare in a meaningful manner. Just think of each processor executing one instruction every 8 clocks, but all processors are out of phase, so no one completes on the same clock.

> Is there anything to prevent multiple cores reading/writing the
> same memory location in the same machine cycle? What is the
> result when that happens?

Not sure what you mean by "machine cycle". As I said above, there are 8 clocks to the processor machine cycle, but they are all out of phase. So on any given clock cycle only one processor will be updating registers or memory.

I believe Eric's point is that the thing that prevents more than one processor from accessing the same memory location is the programmer. Is that not a good enough method?

>> If an interrupt occurs it doesn't cause one of the other 7 tasks to
>> run, they are already running, it simply invokes the interrupt
>> handler. I believe Eric is not envisioning multiple tasks on a
>> single processor.
>
> Such presumptions would be useful to have in the white paper.

Have you read the paper? How do you know it's not there?

>> As others have pointed out, test and set instructions are not required
>> to support concurrency and communications. They are certainly nice to
>> have, but are not essential.
>
> Agreed. I'm perfectly prepared to accept alternative techniques,
> e.g. disable interrupts.

Ok, so is this discussion over?

>> In your case they would be superfluous.
>
> Not proven to me.
>
> The trouble is I've seen too many hardware designs that
> leave the awkward problems to software - especially first
> efforts by small teams.
>
> And too often those problems can be very difficult to solve
> in software. Nowadays it is hard to find people that have
> sufficient experience across the whole hardware/firmware/system
> software spectrum to enable them to avoid such traps.
>
> I don't know whether Eric is such a person, but I'm afraid
> his answers have raised orange flags in my mind.
>
> As a point of reference, I had similar misgivings when I
> first heard about the Itanium's architecture in, IIRC,
> 1994. I suppressed them because the people involved were
> undoubtedly more skilled in the area than I, and had been
> working for 5 years. Much later I regrettably came to the
> conclusion the orange flags were too optimistic.

If you still have reservations, then learn about the design. If you don't want to invest the time to learn about the design, why are you bothering to object to it?

-- Rick
Article: 155380
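The phase-interleaving rickman describes (one instruction per thread every 8 clocks, all threads out of phase) can be modelled in a few lines. This is an illustrative sketch with a made-up helper name (`owner`), not Hive's actual pipeline:

```python
# Toy model of phase-interleaved ("barrel") scheduling: 8 threads share
# one processor and one memory port, but each owns a different clock
# phase, so no two threads ever touch registers/memory on the same clock.

N_THREADS = 8

def owner(clock):
    """Thread that owns the register/memory update slot on this clock."""
    return clock % N_THREADS

# Each thread completes one instruction every 8 clocks...
print([owner(c) for c in range(8)])  # [0, 1, 2, 3, 4, 5, 6, 7]

# ...and over any window of 8 consecutive clocks every thread appears
# exactly once, so accesses from different threads can never collide.
for start in range(16):
    window = {owner(c) for c in range(start, start + 8)}
    assert window == set(range(N_THREADS))
```

This is why the design needs no 8-ported memory: the single port is time-shared by construction.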
David Brown wrote: > On 25/06/13 01:17, Tom Gardner wrote: >> Bakul Shah wrote: >>> On 6/24/13 3:23 PM, Eric Wallin wrote: >>>> On Monday, June 24, 2013 6:03:38 PM UTC-4, Bakul Shah wrote: >>>> >>>>> Consider a case where *both* thread A and B want to increment >>>>> a counter at location X? A reads X and finds it contains 10. But >>>>> before it can write back 11, B reads X and finds 10 and it too >>>>> writes back 11. Now you've lost a count. Can this happen in your >>>>> design? If so you need some sort of atomic update instruction. >>>> >>>> It can happen if the programmer is crazy enough to do it, otherwise not. >>> >>> Concurrent threads need to communicate with each other to cooperate >>> on some common task. Consider two threads adding an item to a linked >>> list or keeping statistics on some events or many such things. You >>> are pretty much required to be "crazy enough"! Any support for mutex >>> would simplify things quite a bit. Without atomic update you have to >>> use some complicated, inefficient algorithm to implement mutexes. >> >> Just so. >> >> A programmer that doesn't understand that is the equivalent >> of a hardware engineer that doesn't under stand metastability. >> (When I started out most people denied the possibility of >> synchronisation failure due to metastability!) >> >> Mind you, I'd *love* to see a radical overhaul of traditional >> multicore processors so they took the form of >> - a large number of processors >> - each with completely independent memory >> - connected by message passing fifos > > This sounds nice in theory, but in practice there can be problems. > Scaling with number of processors can quickly become an issue here - > lock-free algorithms and fifos work well between two processors, but > scale badly with many processors. Independent memory for each processor > sounds nice, and can work well for some purposes, but is a poor > structure for general-purpose computing. I agree with all your points. 
Unfortunately they are equally applicable to the current batch of SMP/NUMA architectures :( A key point is the granularity of the computation and message passing, and that varies radically between applications. There are a large number of commercially important workloads that would work well on such a system, ranging from embarrassingly parallel problems such as soft real-time event processing, some HPC, big data (think map-reduce). But I agree it wouldn't be a significant benefit for bog-standard desktop processing - but current machines are more than sufficient for that anyway!

> If you want to scale well, you want hardware support for semaphores.
> And you don't want to divide things up by processor - you want to be
> able to divide them up by process or thread. Threads should have
> independent memory areas, which they can access safely and quickly
> regardless of which cpu they are running on. Otherwise you spend much
> of your bandwidth just moving data around between your cpu-dependent
> memory blocks (replacing the cache coherence problems with new memory
> movement bottlenecks), or your threads have to have very strong affinity
> to particular cpus and you lose your scaling.

I agree with all those points too.
Article: 155381
On 6/24/2013 3:25 PM, Eric Wallin wrote:
> On Monday, June 24, 2013 12:07:23 AM UTC-4, rickman wrote:
>
>> I'm glad you can take (hopefully) constructive criticism. I was
>> concerned when I wrote the above that it might be a bit too blunt.
>
> I apologize to everyone here, I kind of barged in and have behaved somewhat brashly.
>
>> ... part of the utility
>> of a design is the ease of programming efficiently. I haven't looked at
>> yours yet, but just picturing the four stacks makes it seem pretty
>> simple... so far. :^)
>
> Writing a conventional stack machine in an HDL isn't too daunting, but programming it afterward, for me anyway, was just too much.
>
>> I have to say I'm not crazy about the large instruction word. That is
>> one of the appealing things about MISC to me. I work in very small
>> FPGAs and 16 bit instructions are better avoided if possible, but that
>> may be a red herring. What matters is how many bytes a given program
>> uses, not how many bits are in an instruction.
>
> Yes. Opcode space obviously expands exponentially with bit count, so one can get a lot more with a small size increase. I think a 32 bit opcode is pushing it for a small FPGA implementation, but a 16 bit opcode gives one a couple of small operand indices, and some reasonably sized immediate instructions (data, conditional jumps, shifts, add) that I find I'm using quite a bit during the testing and verification phase. Data plus operation in a single opcode is hard to beat for efficiency but it has to earn its keep in the expanded opcode space. With the operand indices you get a free copy/move with most single operand operations which is another efficiency.
>
>> I am supposed to present to the SVFIG and I think your design would be a
>> very interesting part of the presentation unless you think you would
>> rather present yourself. I'm sure they would like to hear about it and
>> they likely would be interested in your opinions on MISC. I know I am.
> > I'm on the other coast so I most likely can't attend, but I would be most honored if you were to present it to SVFIG. I was going to talk about the CPU design I had been working on, but I think it is going to be more of a survey of CPU designs for FPGAs ending with my spin on how to optimize a design. Your implementation is very different from mine, but the hybrid register/stack approach is similar in intent and results from a similar line of thought. Turns out I am busier in July than expected, so I will not be able to present at the July meeting. I'll shoot for August. I've been looking at their stuff on the web and they do a pretty good job. I was thinking it was a local group and it would be a small audience, but I think it may be a lot bigger when the web is considered. -- RickArticle: 155382
On 6/24/2013 7:17 PM, Tom Gardner wrote: > Bakul Shah wrote: >> On 6/24/13 3:23 PM, Eric Wallin wrote: >>> On Monday, June 24, 2013 6:03:38 PM UTC-4, Bakul Shah wrote: >>> >>>> Consider a case where *both* thread A and B want to increment >>>> a counter at location X? A reads X and finds it contains 10. But >>>> before it can write back 11, B reads X and finds 10 and it too >>>> writes back 11. Now you've lost a count. Can this happen in your >>>> design? If so you need some sort of atomic update instruction. >>> >>> It can happen if the programmer is crazy enough to do it, otherwise not. >> >> Concurrent threads need to communicate with each other to cooperate >> on some common task. Consider two threads adding an item to a linked >> list or keeping statistics on some events or many such things. You >> are pretty much required to be "crazy enough"! Any support for mutex >> would simplify things quite a bit. Without atomic update you have to >> use some complicated, inefficient algorithm to implement mutexes. > > Just so. > > A programmer that doesn't understand that is the equivalent > of a hardware engineer that doesn't under stand metastability. > (When I started out most people denied the possibility of > synchronisation failure due to metastability!) > > Mind you, I'd *love* to see a radical overhaul of traditional > multicore processors so they took the form of > - a large number of processors > - each with completely independent memory > - connected by message passing fifos > > In the long term that'll be the only way we can continue > to scale individual machines: SMP scales for a while, but > then cache coherence requirements kill performance. The *only* way? lol You think like a programmer. The big assumption you are making that is no longer valid is that the processor itself is a precious resource that must be optimized. That is no longer valid. 
When x86 and ARM machines put four cores on a chip with one memory interface they are choking the CPU's airway. Those designs are no longer efficient and the processor is underused. So clearly it is not the precious resource anymore. Rather than trying to optimize the utilization of the CPU, design needs to proceed with the recognition of the limits of multiprocessors. Treat processors the same way you treat peripheral functions. Dedicate them to tasks. Let them have a job to do and not worry if they are idle part of the time. This results in totally different designs and can result in faster, lower cost and lower power systems. -- RickArticle: 155383
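The "dedicated processors connected by message-passing fifos" structure this thread keeps returning to can be sketched with threads and queues. This is a toy Python model (threads and `queue.Queue` standing in for cores and hardware fifos), not a hardware design:

```python
# Workers with completely private state, cooperating only through
# message-passing FIFOs: no shared counters, no locks.

import queue
import threading

def producer(out_q):
    """Dedicated 'core' that generates events."""
    for x in range(10):
        out_q.put(x)
    out_q.put(None)  # sentinel: end of stream

def accumulator(in_q, result_q):
    """Dedicated 'core' that owns the running total; nobody else sees it."""
    total = 0  # private memory for this worker only
    while True:
        x = in_q.get()
        if x is None:
            break
        total += x
    result_q.put(total)

fifo, result = queue.Queue(), queue.Queue()
threading.Thread(target=producer, args=(fifo,)).start()
threading.Thread(target=accumulator, args=(fifo, result)).start()
total_seen = result.get()
print(total_seen)  # 45
```

Because only the accumulator ever touches `total`, the lost-update race discussed earlier in the thread cannot occur; the trade-off, as David notes, is that all cooperation now costs queue traffic.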
On 6/24/2013 7:00 PM, Bakul Shah wrote: > On 6/24/13 3:23 PM, Eric Wallin wrote: >> On Monday, June 24, 2013 6:03:38 PM UTC-4, Bakul Shah wrote: >> >>> Consider a case where *both* thread A and B want to increment >>> a counter at location X? A reads X and finds it contains 10. But >>> before it can write back 11, B reads X and finds 10 and it too >>> writes back 11. Now you've lost a count. Can this happen in your >>> design? If so you need some sort of atomic update instruction. >> >> It can happen if the programmer is crazy enough to do it, otherwise not. > > Concurrent threads need to communicate with each other to cooperate > on some common task. Consider two threads adding an item to a linked > list or keeping statistics on some events or many such things. You > are pretty much required to be "crazy enough"! Any support for mutex > would simplify things quite a bit. Without atomic update you have to > use some complicated, inefficient algorithm to implement mutexes. What assumptions is this based on? Do you know? What are the alternatives to "mutexes"? How inefficient are they? When do you need to use a mutex? Have you looked at Eric's design in the least? Do you have any idea of the applications it is targeted to? -- RickArticle: 155384
rickman wrote:
> Not sure what you mean by "machine cycle".

I mean it in the same sense as it was used in the posting that I replied to.

> I believe Eric's point is that the thing that prevents more than one processor from accessing the same memory location is the programmer. Is that not a good enough method?

I'd prefer it if Eric gave the correct answer rather than someone else's possibly correct answer. It is a good enough method for some things, and not for others.

> If you still have reservations, then learn about the design. If you don't want to invest the time to learn about the design, why are you bothering to object to it?

There are *many* new designs which might be interesting. Nobody has time to look at them all, so they make fast decisions as to whether a design and its designer are credible. I'm not objecting to it, but I am giving the designer the opportunity to pass the "elevator pitch" test.
Article: 155385
On 6/24/2013 5:30 PM, Eric Wallin wrote: > Verilog code for my Hive processor is now up: > > http://opencores.org/project,hive > > (Took me most of the freaking day to figure out SVN.) You mean you actually figured it out? -- RickArticle: 155386
On 6/25/2013 6:00 AM, Guy Eschemann wrote: > Hello RCIngham, > > sorry to hear that. FPGA Exchange is built on a platform that makes heavy use of JavaScript, so the minimum browser requirements are quite high: > > - Internet Explorer 10+ > - Google Chrome 24+ > - Firefox 14+ > - Safari 5+ > > As the aim is to create a discussion forum for the next decade of programmable logic, we unfortunately cannot support older browsers. LOL, I didn't know anyone created programmable logic with a browser. You might want to rethink your approach. There are a lot of people who don't control the computers they work on. Do you really want to exclude a significant portion of your potential audience? BTW, I don't think you ever responded to the post that asked why you are announcing this here which would have the effect of splitting the community. This group is barely alive these days. Your site may kill it off. -- RickArticle: 155387
rickman wrote: > On 6/24/2013 7:17 PM, Tom Gardner wrote: >> Bakul Shah wrote: >>> On 6/24/13 3:23 PM, Eric Wallin wrote: >>>> On Monday, June 24, 2013 6:03:38 PM UTC-4, Bakul Shah wrote: >>>> >>>>> Consider a case where *both* thread A and B want to increment >>>>> a counter at location X? A reads X and finds it contains 10. But >>>>> before it can write back 11, B reads X and finds 10 and it too >>>>> writes back 11. Now you've lost a count. Can this happen in your >>>>> design? If so you need some sort of atomic update instruction. >>>> >>>> It can happen if the programmer is crazy enough to do it, otherwise not. >>> >>> Concurrent threads need to communicate with each other to cooperate >>> on some common task. Consider two threads adding an item to a linked >>> list or keeping statistics on some events or many such things. You >>> are pretty much required to be "crazy enough"! Any support for mutex >>> would simplify things quite a bit. Without atomic update you have to >>> use some complicated, inefficient algorithm to implement mutexes. >> >> Just so. >> >> A programmer that doesn't understand that is the equivalent >> of a hardware engineer that doesn't under stand metastability. >> (When I started out most people denied the possibility of >> synchronisation failure due to metastability!) >> >> Mind you, I'd *love* to see a radical overhaul of traditional >> multicore processors so they took the form of >> - a large number of processors >> - each with completely independent memory >> - connected by message passing fifos >> >> In the long term that'll be the only way we can continue >> to scale individual machines: SMP scales for a while, but >> then cache coherence requirements kill performance. > > The *only* way? lol You think like a programmer. The big assumption you are making that is no longer valid is that the processor itself is a precious resource that must be optimized. That is no > longer valid. 
When x86 and ARM machines put four cores on a chip with one memory interface they are choking the CPU's airway. Those designs are no longer efficient and the processor is underused. So > clearly it is not the precious resource anymore. I don't think that and your statements don't follow from my comments. > Rather than trying to optimize the utilization of the CPU, design needs to proceed with the recognition of the limits of multiprocessors. Treat processors the same way you treat peripheral > functions. Dedicate them to tasks. Let them have a job to do and not worry if they are idle part of the time. This results in totally different designs and can result in faster, lower cost and > lower power systems. That approach is valuable when and where it works, but can be impractical for many workloads.Article: 155388
On 6/23/2013 5:10 PM, Richard Damon wrote: > On 6/19/13 9:39 PM, rickman wrote: >> On 6/19/2013 11:40 AM, jonesandy@comcast.net wrote: >>> To borrow Gabor's card game analogy... >>> >>> You have two stacks, (highest and 2nd highest) >>> >>> If the drawn card is same or higher than the highest stack, then >>> >>> move the top card from the highest stack to the 2nd highest stack, >>> move the drawn card to the highest stack. >>> >>> else if the drawn card is same or higher than the 2nd highest stack, then >>> >>> move the drawn card to the 2nd highest stack. >>> >>> draw another card and repeat. >> >> They don't need to be stacks. You just need to have two holding spots >> (registers) and initialize them to something less than anything you will >> have on the input. Then on each draw of a card (or sample on the input) >> you compare to both spots, if the input is higher than the "highest" >> spot you save it there and put the old highest on the "second highest" >> spot. If not, but it is higher than the "second highest" you put it there. >> >> Gabor was using a stack because he thought it would get him both the >> highest and the second highest with one compare operation, but it didn't >> work. Two compares are needed for each input. >> >> In your approach your compare is "higher or same", why do you need to do >> anything if they are the same? Not that it is a big deal, but in some >> situations this could require extra work. >> > You actually only need to compare most of the entries to the second > highest register, if it isn't higher, than you can discard the item. > Only if it is higher than the second highest, do you need to compare it > to the highest to see if the new item goes into the highest or second > highest. > I.E. > > Compare drawn card to 2nd highest stack, if not higher, discard and repeat > if higher (same doesn't really matter), discard the 2nd highest stack > and compare the new card to the highest stack. 
> > if not higher, new card goes into 2nd highest stack, if higher, item in > highest goes to 2nd highest, and new goes to highest. This is being done in hardware not software. Your description is sequential while the hardware is concurrent. The control logic is simple if you just code it in a simple way. Do both compares and you get two bits as a result. Then load the max and second max registers based on those two compare results. The only fly in the ointment that I see is the initial condition. You can either initialize the two registers to values which you know will always be the min values possible, or you can have a flag for the first clock cycle that loads both registers with the first value read. I think the initial state flag would be the simplest. So the register control logic has a third input bit and of course an enable from the 10 counter. -- RickArticle: 155389
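The two-register scheme rickman describes (two concurrent compares per sample, plus an initial-state flag instead of seeding the registers with a known minimum) can be modelled in software as follows. This is a sketch of the algorithm, not the HDL:

```python
# Software model of the two-register max/second-max tracker: two
# compares per sample, with a "first sample" flag handling the
# initial state.

def top_two(samples):
    highest = second = None
    first = True
    for x in samples:
        if first:
            highest = second = x   # load both registers on the first sample
            first = False
        elif x > highest:
            second = highest       # old highest shifts down
            highest = x
        elif x > second:
            second = x
    return highest, second

print(top_two([3, 9, 1, 7, 2]))   # (9, 7)
print(top_two([5, 5, 4]))         # (5, 5), ties are kept
```

In hardware the two `>` tests run concurrently, and their two result bits plus the first-sample flag select what each register loads on the clock edge.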
Hello Rick,

I'm not happy about the fact that some people can't access the forum because their IT department doesn't allow modern browsers. But I guess I have to live with this limitation for now. With time, even conservative IT departments will have to upgrade, if only for security reasons.

This is an honest attempt at creating a friendly, vendor-independent discussion space where FPGA developers can share their knowledge. A bit like comp.arch.fpga was 15 years ago. People are moving away from newsgroups anyway, so I'd rather have them join FPGA Exchange than some random LinkedIn group.

Guy.

On Tuesday, June 25, 2013 5:19:30 PM UTC+2, rickman wrote:
> LOL, I didn't know anyone created programmable logic with a browser.
> You might want to rethink your approach. There are a lot of people who
> don't control the computers they work on. Do you really want to exclude
> a significant portion of your potential audience?
>
> BTW, I don't think you ever responded to the post that asked why you are
> announcing this here which would have the effect of splitting the
> community. This group is barely alive these days. Your site may kill
> it off.
>
> --
> Rick
Article: 155390
On Monday, June 24, 2013 3:03:15 AM UTC-7, peter dudley wrote:
> Hello All,
> I am wondering if it is possible to use a more conventional approach to building hardware and connecting it to the AXI bus of the ARM processor. I greatly prefer to directly instantiate components in my HDL code. I find straight HDL development easier to maintain in the long run and less sensitive to changes in FPGA compiler tools.
> Has anyone on this group succeeded in going around the PlanAhead/XPS graphical flow for building systems for the Zynq ARM?

It is definitely possible but not trivial. What you need is the NGC file and the instantiation model for the Zynq interface. These you can get from XPS: create an empty project with only a Zynq in it to get these two files. Then you can instantiate the model (which has no content) and make connections to the +3300 nets. During implementation point to the NGC file and you are done.
Article: 155391
On 25/06/2013 17:07, Guy Eschemann wrote: > Hello Rick, > > I'm not happy about the fact that some people can't access the forum because their IT department doesn't allow modern browsers. But I guess I have to live with this limitation for now. With time, even conservative IT departments will have to upgrade, if only for security reasons. > > This is a honest attempt at creating a friendly, vendor-independent discussion space where FPGA developers can share their knowledge. A bit like comp.arch.fpga was 15 years ago. People are moving away from newsgroups anyway, so I'd rather have them join FPGA Exchange than some random LinkedIn group. > > Guy. Hi Guy, I am not sure how long you have been using usenet but this forum has been a vendor/company-independent friendly (especially compared to some of the other forums I read) forum since I started to use it a few decades ago. I agree with Rick and Uwe that there is really no need for another FPGA forum especially one which is controlled by a single person (right?). I understand why Vendors are doing it as it increases traffic to their website and gives them a better marketing tool but FPGA Exchange seems to be somewhat decoupled from your noasic one, so I am not sure why you decided to spend the time and effort to set it up. Anyway, good luck with your FPGA consultancy firm, Regards, Hans www.ht-lab.com > > > > On Tuesday, June 25, 2013 5:19:30 PM UTC+2, rickman wrote: >> >> LOL, I didn't know anyone created programmable logic with a browser. >> >> You might want to rethink your approach. There are a lot of people who >> >> don't control the computers they work on. Do you really want to exclude >> >> a significant portion of your potential audience? >> >> >> >> BTW, I don't think you ever responded to the post that asked why you are >> >> announcing this here which would have the effect of splitting the >> >> community. This group is barely alive these days. Your site may kill >> >> it off. 
>> >> >> >> -- >> >> >> >> Rick >Article: 155392
rickman <gnuarm@gmail.com> wrote:
> On 6/24/2013 12:56 PM, Tom Gardner wrote:

(snip)

>> Is there anything to prevent multiple cores reading/writing the
>> same memory location in the same machine cycle? What is the
>> result when that happens?

> Not sure what you mean by "machine cycle". As I said above, there are 8
> clocks to the processor machine cycle, but they are all out of phase.
> So on any given clock cycle only one processor will be updating
> registers or memory.

If there are 8 processors that never communicate, it would be better to have 8 separate RAM units.

> I believe Eric's point is that the thing that prevents more than one
> processor from accessing the same memory location is the programmer. Is
> that not a good enough method?

So no thread ever communicates with another one?

Well, read the wikipedia article on spinlock and the linked-to article Peterson's_Algorithm.

It is more efficient if you have an interlocked write, but can be done with spinlocks, if there is no reordering of writes to memory.

As many processors now do reorder writes, there is need for special instructions. Otherwise, spinlocks might be good enough.

-- glen
Article: 155393
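Peterson's algorithm, which glen points to, builds mutual exclusion for two threads out of plain reads and writes. The Python sketch below works only because CPython's interpreter happens to give sequentially consistent behaviour; on real hardware that reorders writes you would need fences or interlocked operations, which is exactly glen's caveat:

```python
# Peterson's algorithm for two threads, using only ordinary loads and
# stores. The protected operation is a non-atomic counter increment,
# i.e. the very lost-update case from earlier in the thread.

import threading
import time

flag = [False, False]  # flag[i]: thread i wants to enter
turn = 0               # which thread yields when both want in
counter = 0
N = 5000

def worker(i):
    global turn, counter
    other = 1 - i
    for _ in range(N):
        # entry protocol
        flag[i] = True
        turn = other
        while flag[other] and turn == other:
            time.sleep(0)  # spin; yield so the other thread can progress
        # critical section: read-modify-write, now safe
        counter += 1
        # exit protocol
        flag[i] = False

threads = [threading.Thread(target=worker, args=(i,)) for i in (0, 1)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 10000
```

Without the entry/exit protocol, interleaved increments could lose counts; with it, the final total is always 2*N.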
On 6/25/2013 12:07 PM, Guy Eschemann wrote:
> Hello Rick,
>
> I'm not happy about the fact that some people can't access the forum because their IT department doesn't allow modern browsers. But I guess I have to live with this limitation for now. With time, even conservative IT departments will have to upgrade, if only for security reasons.

I take issue with your use of the term "modern". Chrome 24 was released only 8 months ago, Internet Explorer 7 months ago, Firefox 14 a year ago. But the site is yours to run as you see fit.

> This is an honest attempt at creating a friendly, vendor-independent discussion space where FPGA developers can share their knowledge. A bit like comp.arch.fpga was 15 years ago. People are moving away from newsgroups anyway, so I'd rather have them join FPGA Exchange than some random LinkedIn group.

Again, you have an interesting way of characterizing the other discussion forums. Not many here would agree with you, and it is a bit off-putting for you to imply that the other groups we like are somehow unfit. I'm sure you prefer that folks use your site. I would too if I had started a web site.

--
Rick

Article: 155394
On 6/25/2013 1:14 PM, glen herrmannsfeldt wrote:
> rickman <gnuarm@gmail.com> wrote:
>> On 6/24/2013 12:56 PM, Tom Gardner wrote:
>
> (snip)
>>> Is there anything to prevent multiple cores reading/writing the same memory location in the same machine cycle? What is the result when that happens?
>
>> Not sure what you mean by "machine cycle". As I said above, there are 8 clocks to the processor machine cycle, but they are all out of phase. So on any given clock cycle only one processor will be updating registers or memory.
>
> If there are 8 processors that never communicate, it would be better to have 8 separate RAM units.

Why is that? What would be "better" about it?

>> I believe Eric's point is that the thing that prevents more than one processor from accessing the same memory location is the programmer. Is that not a good enough method?
>
> So no thread ever communicates with another one?
>
> Well, read the wikipedia article on spinlock and the linked-to article Peterson's_Algorithm.
>
> It is more efficient if you have an interlocked write, but it can be done with spinlocks, if there is no reordering of writes to memory.
>
> As many processors now do reorder writes, there is a need for special instructions.

Are we talking about the same thing here? We were talking about the Hive processor.

> Otherwise, spinlocks might be good enough.

So your point is? What would the critical section of code be doing that is critical? Simple interprocess communication is not necessarily "critical".

--
Rick

Article: 155395
On Tuesday, June 25, 2013 1:14:57 PM UTC-4, glen herrmannsfeldt wrote:
> So no thread ever communicates with another one?

All threads share the same Von Neumann memory, so of course they can communicate with each other. If only there were a paper somewhere, written by the designer, freely available to anyone on the web...

Article: 155396
On Tuesday, June 25, 2013 11:14:52 AM UTC-4, Tom Gardner wrote:
>> I believe Eric's point is that the thing that prevents more than one processor from accessing the same memory location is the programmer. Is that not a good enough method?
>
> I'd prefer it if Eric gave the correct answer rather than someone else's possibly correct answer.

If Rick says anything wrong I'll correct him.

> I'm not objecting to it, but I am giving the designer the opportunity to pass the "elevator pitch" test.

The paper has a bulleted feature list at the very front and a bulleted downsides list at the very back. I tried to write it in an accessible manner for the widest audience. We all like to think aloud now and then, but I'd think a comprehensive design paper would sidestep all of this wild speculation and unnecessary third degree.

http://opencores.org/usercontent,doc,1371986749

Article: 155397
rickman <gnuarm@gmail.com> wrote:

(snip, someone wrote)
>>> Not sure what you mean by "machine cycle". As I said above, there are 8 clocks to the processor machine cycle, but they are all out of phase. So on any given clock cycle only one processor will be updating registers or memory.

(then I wrote)
>> If there are 8 processors that never communicate, it would be better to have 8 separate RAM units.

> Why is that? What would be "better" about it?

Well, if the RAM really is fast enough not to be in the critical path, then maybe not, but separate RAM means no access limitations.

>>> I believe Eric's point is that the thing that prevents more than one processor from accessing the same memory location is the programmer. Is that not a good enough method?
>> So no thread ever communicates with another one?
>> Well, read the wikipedia article on spinlock and the linked-to article Peterson's_Algorithm.
>> It is more efficient if you have an interlocked write, but it can be done with spinlocks, if there is no reordering of writes to memory.
>> As many processors now do reorder writes, there is a need for special instructions.

> Are we talking about the same thing here? We were talking about the Hive processor.

I was mentioning it for context. For processors that do reorder writes, you can't use Peterson's algorithm.

>> Otherwise, spinlocks might be good enough.

> So your point is?

Without write reordering, it is possible, though maybe not efficient, to communicate without interlocked writes.

> What would the critical section of code be doing that is critical? Simple interprocess communications is not necessarily "critical".

"Critical" means that the messages won't get lost due to other threads writing at about the same time. Now, much of networking is based on unreliable "best effort" protocols, and that may also work for communications between threads. But that involves delays and retransmission after timers expire.

--
glen

Article: 155398
In article <3a29b759-dd7a-4b12-9f7c-83608402c247@googlegroups.com>, peter dudley <padudle@gmail.com> wrote:
>Hello All,
>
>I have a Xilinx Zynq development board and I am starting to teach myself to build systems for Zynq. The recommended flow described in UG873 is a very long sequence of graphical menu clicks, pull-downs and forms.
>
>The tools then produce a great deal of machine generated code.
>
>I am wondering if it is possible to use a more conventional approach to building hardware and connecting it to the AXI bus of the ARM processor. I greatly prefer to directly instantiate components in my HDL code. I find straight HDL development easier to maintain in the long run and less sensitive to changes in FPGA compiler tools.
>
>Has anyone on this group succeeded in going around the PlanAhead/XPS graphical flow for building systems for the Zynq ARM?
>
>Any advice or opinions are appreciated.

A similar question was just posted on the Xilinx forums - I'll say here what I did there:

We've done this for all of our designs in Xilinx involving a processor (PPC405, PPC440, MicroBlaze) and expect to do the same in the future for ARM-based designs.

Use a bare-minimum XPS flow - often just using one of the Xilinx examples - something with just a small block RAM for boot, maybe a UART, and not much else. Generate the netlist, and then never look back at XPS - all ISE and makefile (or Vivado and TCL) from then on. The original netlist is used as a reference and modified as needed.

Xilinx strongly discourages this flow. But it's worked great for us for many years.

It's nice to hear that others in this thread have the same frustrations, and have done similar things to work around them. I've never been sure if we were alone in our unhappiness with EDK/XPS/whatever they're calling it now. It's basically a BAD schematic capture tool - if it were easier to use it'd be state of the art for the mid 80s... You're basically describing a netlist along with parameter (generic) settings. HDL is perfect for this, no need for MHS and other cruft. Add some assertions, and/or connectivity checks in ngdbuild or somewhere, and be done with it. No MHS, no PAO, no BMM, no XCO - stop inventing new / poorly defined languages / etc. - when existing standard solutions exist.

Ok - got a little <ranty> there... I'm done.

--Mark

Article: 155399
On 6/25/13 11:18 AM, Eric Wallin wrote:
> On Tuesday, June 25, 2013 11:14:52 AM UTC-4, Tom Gardner wrote:
>>> I believe Eric's point is that the thing that prevents more than one processor from accessing the same memory location is the programmer. Is that not a good enough method?

This is not good enough in general. I gave some examples where threads have to read/write the same memory location. I agree with you that if threads communicate just through fifos and there is exactly one reader and one writer, there is no problem. The reader updates the read ptr and watches, but doesn't update, the write ptr. The writer updates the write ptr and watches, but doesn't update, the read ptr. You can use fifos like these to implement a mutex, but this is a very expensive way to implement mutexes and it doesn't scale.

> The paper has a bulleted feature list at the very front and a bulleted downsides list at the very back. I tried to write it in an accessible manner for the widest audience. We all like to think aloud now and then, but I'd think a comprehensive design paper would sidestep all of this wild speculation and unnecessary third degree.

I don't think it is a question of "third degree". You did invite feedback! Adding compare-and-swap or load-linked & store-conditional would make your processor more useful for parallel programming. I am not motivated enough to go through 4500+ lines of Verilog to know how hard that is, but you must already have some bus arbitration logic since all 8 threads can access memory.

> http://opencores.org/usercontent,doc,1371986749

I missed this link before. A nicely done document! A top level diagram would be helpful. 64K address space seems too small.