Question to the programming types: Ever seen a signed logical or arithmetic shift distance before? Hive shift distances are signed, which works out quite nicely (the basic shift is shift left, with negative shift distances performing right shifts). This is something I haven't encountered in any opcode listings I've had the pleasure to peruse, so I'm wondering if it is kind of new-ish.Article: 155426
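For readers who want to see the idea concretely, here is a minimal C sketch of signed-shift-distance semantics as Eric describes them (positive distance shifts left, negative shifts right); the function names, the 32-bit width, and the out-of-range behavior are illustrative assumptions, not Hive's actual opcode definitions.

    #include <stdint.h>

    /* Logical shift with a signed distance: positive d shifts left,
       negative d shifts right (illustrative, not Hive's exact rules). */
    static uint32_t shl_signed(uint32_t x, int d)
    {
        if (d >= 32 || d <= -32)           /* assume out-of-range gives 0 */
            return 0;
        return (d >= 0) ? (x << d) : (x >> -d);
    }

    /* Arithmetic variant: right shifts replicate the sign bit.
       Assumes the compiler implements >> on signed ints as an
       arithmetic shift, as mainstream compilers do. */
    static int32_t sha_signed(int32_t x, int d)
    {
        if (d >= 32)  return 0;
        if (d <= -32) return (x < 0) ? -1 : 0;
        return (d >= 0) ? (int32_t)((uint32_t)x << d) : (x >> -d);
    }

A single opcode then covers both shift directions, which is part of the appeal described above.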
Eric Wallin <tammie.eric@gmail.com> wrote: > Question to the programming types: > Ever seen a signed logical or arithmetic shift distance before? > Hive shift distances are signed, which works out quite nicely > (the basic shift is shift left, with negative shift distances > performing right shifts). > This is something I haven't encountered in any opcode listings > I've had the pleasure to peruse, so I'm wondering if it is > kind of new-ish. PDP-10 has signed shifts. The manual is available on bitsavers, such as: AA-H391A-TK_DECsystem-10_DECSYSTEM-20_Processor_Reference_Jun1982.pdf Shifts use a signed 9 bit value from the computed effective address. -- glenArticle: 155427
On Wednesday, June 26, 2013 2:07:59 PM UTC-4, glen herrmannsfeldt wrote: > PDP-10 has signed shifts. Thanks for that Glen!Article: 155428
> Thomas, with your experience with the ERIC5 series, do you see anything obviously missing from the Hive instruction set? What do you think of the literal sizing? I just took a quick look at your document (time is limited...). What I like is the concept of "in-line" literals. A good extension would be to have the same concept also for calls and jumps (i.e. so you do not have to load the destination address into a register first) and maybe also other instructions that can work with literals. I also think that you leave some bits unused: e.g. the byt instruction does not use register B, so you would have 3 additional bits in the opcode to make it possible to have an 11b literal instead of an 8b literal (or you could use these 3 bits for other purposes, e.g. A = A + lit8). What others already mentioned is the restricted code-space, but without a C-compiler this will never become a real issue ;-) For your desired application, you could maybe think of options to reduce the resource usage. BTW: The bad habit of Quartus of replacing flip-flop chains with memories (you mentioned this somewhere in your document) can be disabled by turning off "auto replace shift registers" somewhere in the synthesis settings of Quartus. Regards, Thomas www.entner-electronics.comArticle: 155429
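To make the bit-budget point concrete, here is a small C sketch of pulling a literal out of a 16-bit opcode word; the field layout is invented purely for illustration (it is not Hive's real encoding), but it shows how 3 bits freed from an unused register-select field widen an 8-bit literal to 11 bits, as Thomas suggests.

    #include <stdint.h>

    /* Hypothetical 16-bit instruction layout, NOT Hive's actual one:
       [15:8] 8-bit literal | [7:5] B register select | [4:0] opcode.
       If an instruction never reads B, bits [7:5] can extend the
       literal field. */
    static int32_t lit8(uint16_t op)      /* sign-extended  8-bit literal */
    {
        return (int8_t)(op >> 8);
    }

    static int32_t lit11(uint16_t op)     /* sign-extended 11-bit literal */
    {
        int32_t v = (op >> 5) & 0x7FF;            /* bits [15:5]          */
        return (v & 0x400) ? (v - 0x800) : v;     /* sign-extend bit 10   */
    }

The payoff in range: an 8-bit literal covers -128..127, while the 11-bit version covers -1024..1023.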
>Rick, > >Guy does not get advertising $ when people use comp.arch.fpga. > >Andy > Quite so. I feel that 'FPGARelated.com' adds some value, so I don't begrudge Stephane his advertising revenue. --------------------------------------- Posted through http://www.FPGARelated.comArticle: 155430
On Wednesday, June 26, 2013 5:44:39 PM UTC-4, thomas....@gmail.com wrote: > I just took a quick look at your document (time is limited...). What I like is the concept of "in-line" literals. A good extension would be to have the same concept also for calls and jumps (i.e. so you do not have to load the destination address into a register first) and maybe also other instructions that can work with literals. I also think that you leave some bits unused: e.g. the byt instruction does not use register B, so you would have 3 additional bits in the opcode to make it possible to have an 11b literal instead of an 8b literal (or you could use these 3 bits for other purposes, e.g. A = A + lit8) Oooh, very nice idea, thanks so much! I gave this some thought and even found some space to shoehorn some opcodes in, but the lit has to come from the data memory port and go back into the control ring to offset / replace the PC, and this would require some combinatorial logic in front of the program memory address port which could slow the entire thing down. I'll definitely give it a try though. I'm kind of against invading the B stack index/pop for other things; having it always present allows for concurrent stack cleanup. > What others already mentioned is the restricted code-space, but without a C-compiler this will never become a real issue ;-) Hive could be easily edited to have 32 bit addresses, but the use of BRAM for small processor main memory is likely an even stronger restriction on code-space, which is why I don't feel the need for anything beyond 16 bits. > For your desired application, you could maybe think of options to reduce the resource usage. BTW: The bad habit of Quartus of replacing flip-flop chains with memories (you mentioned this somewhere in your document) can be disabled by turning off "auto replace shift registers" somewhere in the synthesis settings of Quartus. Using the "speed" optimization technique for analysis and synthesis avoids this as well.Article: 155431
In comp.arch.fpga Andrew Haley <andrew29@littlepinkcloud.invalid> wrote: > In comp.lang.forth Paul Rubin <no.email@nospam.invalid> wrote: > > More than 50% of SIM cards deployed in 2011 run Java Card > > OK, but that's hardly "most Java", unless you're just counting the > number of virtual machines that might run at some point. Java Card isn't the JVM - it's Java compiled down to whatever CPU is on the card. TheoArticle: 155432
Guy Eschemann <Guy.Eschemann@gmail.com> writes: > This is an honest attempt at creating a friendly, vendor-independent > discussion space where FPGA developers can share their knowledge. A > bit like comp.arch.fpga was 15 years ago. People are moving away from > newsgroups anyway, so I'd rather have them join FPGA Exchange than > some random LinkedIn group. Does your "modern" platform provide any control for the user to choose what content he wants or doesn't want to see? Sort of like what we've had on Usenet since the 1990s, sorting, threading, scoring... As far as LinkedIn goes I don't think it's going to be a discussion platform. In fact, I've been surprised at the lack of discussion on LinkedIn in the various FPGA-related groups. Other than "please do my homework" and "what book / what eval kit should I buy" from extreme beginners and "please read my blog" and some job ads, it's been pretty quiet. Although I have to admit I wouldn't have known about Arrow's cheap Cyclone V SoC trainings this summer if it weren't for LinkedIn.Article: 155433
<snip> >As far as LinkedIn goes I don't think it's going to be a discussion >platform. In fact, I've been surprised at the lack of discussion on >LinkedIn in the various FPGA-related groups. Other than "please do my >homework" and "what book / what eval kit should I buy" from extreme >beginners and "please read my blog" and some job ads, it's been pretty >quiet. Although I have to admit I wouldn't have known about Arrow's >cheap Cyclone V SoC trainings this summer if it weren't for LinkedIn. > I agree with you, the signal-to-noise ratio in the FPGA- and VHDL-related groups that I belong to is rather poor. I quit one group because it was all job-related and shameless self-promotion. I occasionally post to advise people against doing something obviously really wrong, but I don't expect to learn anything worthwhile in any of their groups. --------------------------------------- Posted through http://www.FPGARelated.comArticle: 155434
<snip> >Mind you, I'd *love* to see a radical overhaul of traditional >multicore processors so they took the form of > - a large number of processors > - each with completely independent memory > - connected by message passing fifos > >In the long term that'll be the only way we can continue >to scale individual machines: SMP scales for a while, but >then cache coherence requirements kill performance. > Transputer? http://en.wikipedia.org/wiki/Transputer --------------------------------------- Posted through http://www.FPGARelated.comArticle: 155435
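For what the "independent processors connected by message-passing FIFOs" style looks like in code, here is a hedged pthreads sketch in C; the two threads share one address space (unlike the independent-memory hardware being imagined), but the discipline of communicating only through a bounded FIFO is the point, and all names here are illustrative.

    #include <pthread.h>
    #include <stdio.h>

    /* Tiny single-producer/single-consumer FIFO used as the only
       communication channel between two otherwise independent workers. */
    #define FIFO_LEN 16
    typedef struct {
        int buf[FIFO_LEN];
        int head, tail;
        pthread_mutex_t lock;
        pthread_cond_t nonempty, nonfull;
    } fifo_t;

    static void fifo_init(fifo_t *f) {
        f->head = f->tail = 0;
        pthread_mutex_init(&f->lock, NULL);
        pthread_cond_init(&f->nonempty, NULL);
        pthread_cond_init(&f->nonfull, NULL);
    }

    static void fifo_put(fifo_t *f, int v) {
        pthread_mutex_lock(&f->lock);
        while ((f->head + 1) % FIFO_LEN == f->tail)   /* full: block */
            pthread_cond_wait(&f->nonfull, &f->lock);
        f->buf[f->head] = v;
        f->head = (f->head + 1) % FIFO_LEN;
        pthread_cond_signal(&f->nonempty);
        pthread_mutex_unlock(&f->lock);
    }

    static int fifo_get(fifo_t *f) {
        pthread_mutex_lock(&f->lock);
        while (f->head == f->tail)                    /* empty: block */
            pthread_cond_wait(&f->nonempty, &f->lock);
        int v = f->buf[f->tail];
        f->tail = (f->tail + 1) % FIFO_LEN;
        pthread_cond_signal(&f->nonfull);
        pthread_mutex_unlock(&f->lock);
        return v;
    }

    static fifo_t chan;

    static void *producer(void *arg) {                /* "processor" 1 */
        (void)arg;
        for (int i = 0; i < 10; i++)
            fifo_put(&chan, i * i);
        fifo_put(&chan, -1);                          /* end-of-stream */
        return NULL;
    }

    static void *consumer(void *arg) {                /* "processor" 2 */
        (void)arg;
        for (int v; (v = fifo_get(&chan)) != -1; )
            printf("got %d\n", v);
        return NULL;
    }

    int main(void) {
        pthread_t p, c;
        fifo_init(&chan);
        pthread_create(&p, NULL, producer, NULL);
        pthread_create(&c, NULL, consumer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }

Scaling the same pattern up is essentially the Transputer/Occam model being discussed: many such workers, each with private state, with channels as the only coupling.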
On 28/06/13 10:09, RCIngham wrote: > <snip> > >> Mind you, I'd *love* to see a radical overhaul of traditional >> multicore processors so they took the form of >> - a large number of processors >> - each with completely independent memory >> - connected by message passing fifos >> >> In the long term that'll be the only way we can continue >> to scale individual machines: SMP scales for a while, but >> then cache coherence requirements kill performance. >> > > Transputer? > http://en.wikipedia.org/wiki/Transputer It had a lot going for it, but was a too dogmatic about the development environment. At the time it was respectably fast, but that wasn't sufficient -- particularly since there was so much scope for increasing speed of uniprocessor machines. Given that uniprocessors have hit a wall, transputer *concepts* embodied in a completely different form might begin to be fashionable again. It would also help if people can decide that reliability is important, and that bucketfuls of salt should be on hand when listening to salesman's protestations that "the software/hardware framework takes care of all of that so you don't have to worry".Article: 155436
On Wednesday, June 26, 2013 5:44:39 PM UTC-4, thomas....@gmail.com wrote: > I just took a quick look at your document (time is limited...). What I like is the concept of "in-line" literals. A good extension would be to have the same concept also for calls and jumps (i.e. so you do not have to load the destination address into a register first) and maybe also other instructions that can work with literals. I also think that you leave some bits unused: e.g. the byt instruction does not use register B, so you would have 3 additional bits in the opcode to make it possible to have an 11b literal instead of an 8b literal (or you could use these 3 bits for other purposes, e.g. A = A + lit8) After looking into this yesterday I don't think I'll do it. The in-line value has to be retrieved before it can be used to offset or replace the PC, which is one clock too late for the way the pipeline is currently configured. Using it in other ways like adding wouldn't work unless I used a separate adder, as the ALU add/subtract happens fairly early in the pipe. But I really appreciate this excellent suggestion Thomas, and for the time you took to read my paper!Article: 155437
On 6/28/2013 3:55 AM, RCIngham wrote: > <snip> > >> As far as LinkedIn goes I don't think it's going to be a discussion >> platform. In fact, I've been surprised at the lack of discussion on >> LinkedIn in the various FPGA-related groups. Other than "please do my >> homework" and "what book / what eval kit should I buy" from extreme >> beginners and"please read my blog" and some job ads, it's been pretty >> quiet. Although I have to admit I wouldn't have known about Arrow's >> cheap Cyclone V SoC trainings this summer if it weren't for LinkedIn. >> > > I agree with you, the signal-to-noise ratio in the FPGA- and VHDL-related > groups that I belong to is rather poor. I quit one group because it was all > job-related and shameless self-promotion. I occasionally post to advise > people against doing something obviously really wrong, but I don't expect > to learn anything worthwhile in any of their groups. I am in a few groups at Linkedin and I find an interesting discussion now and again. There was a rather long one on MISC and Forth, but that discussion had a poor SNR (at least from my viewpoint). There is an interesting one in one of the FPGA groups where someone is writing training materials and seems to be doing something worthwhile. I haven't figured out how he is presenting the materials though. This group, comp.arch.fpga is not so bad, but a lot of usenet groups are pretty poor SNR too. Not that they don't have much content, but they can have *so much* noise. Actually it is more like SDR, signal to drama ratio. lol -- RickArticle: 155438
On 6/28/2013 5:33 AM, Tom Gardner wrote: > On 28/06/13 10:09, RCIngham wrote: >> <snip> >> >>> Mind you, I'd *love* to see a radical overhaul of traditional >>> multicore processors so they took the form of >>> - a large number of processors >>> - each with completely independent memory >>> - connected by message passing fifos >>> >>> In the long term that'll be the only way we can continue >>> to scale individual machines: SMP scales for a while, but >>> then cache coherence requirements kill performance. >>> >> >> Transputer? >> http://en.wikipedia.org/wiki/Transputer > > It had a lot going for it, but was a too dogmatic about > the development environment. You mean 'C'? I worked on a large transputer oriented project and they used ANSI 'C' rather than Occam. It got the job done... or should I say "jobs"? > At the time it was respectably > fast, but that wasn't sufficient -- particularly since there > was so much scope for increasing speed of uniprocessor > machines. > > Given that uniprocessors have hit a wall, transputer > *concepts* embodied in a completely different form > might begin to be fashionable again. You mean like 144 transputers on a single chip? I"m not sure where processing is headed. I actually just see confusion ahead as all of the existing methods seem to have come to a steep incline if not a brick wall. It may be time for something completely different. > It would also help if people can decide that reliability > is important, and that bucketfuls of salt should be > on hand when listening to salesman's protestations that > "the software/hardware framework takes care of all of > that so you don't have to worry". What? Since when did engineers listen to salesmen? -- RickArticle: 155439
On 28/06/13 15:52, rickman wrote: > On 6/28/2013 5:33 AM, Tom Gardner wrote: >> On 28/06/13 10:09, RCIngham wrote: >>> <snip> >>> >>>> Mind you, I'd *love* to see a radical overhaul of traditional >>>> multicore processors so they took the form of >>>> - a large number of processors >>>> - each with completely independent memory >>>> - connected by message passing fifos >>>>can >>>> In the long term that'll be the only way we can continue >>>> to scale individual machines: SMP scales for a while, but >>>> then cache coherence requirements kill performance. >>>> >>> >>> Transputer? >>> http://en.wikipedia.org/wiki/Transputer >> >> It had a lot going for it, but was a too dogmatic about >> the development environment. > > You mean 'C'? I worked on a large transputer oriented project and they used ANSI 'C' rather than Occam. It got the job done... or should I say "jobs"? I only looked at the Transputer when it was Occam only. I liked Occam as an academic language, but at that time it would have been a bit of a pain to do any serious engineering; ISTR anything other than primitive types weren't supported in the language. IIRC that was ameliorated later, but by then the opportunity for me (and Inmos) had passed. I don't know how C fitted onto the Transputer, but I'd only have been interested if "multithreaded" (to use the term loosely) code could have been expressed reasonably easily. Shame, I'd have loved to use it. >> At the time it was respectably >> fast, but that wasn't sufficient -- particularly since there >> was so much scope for increasing speed of uniprocessor >> machines. >> >> Given that uniprocessors have hit a wall, transputer >> *concepts* embodied in a completely different form >> might begin to be fashionable again. > > You mean like 144 transputers on a single chip? Or Intel's 80 cored chip :) > I"m not sure where processing is headed. Not that way! Memory bandwidth and latency are key issues - but you knew that! > I actually just see confusion ahead as all of the existing methods seem to have come to a steep incline if > not a brick wall. It may be time for something completely different. Precisely. My bet is that message passing between independent processor+memory systems has the biggest potential. It matches nicely onto many forms of event-driven industrial and financial applications and, I am told, onto significant parts of HPC. It is also relatively easy to comprehend and debug. The trick will be to get the sizes of the processor + memory + computation "just right". And desktop/GUI doesn't match that. >> It would also help if people can decide that reliability >> is important, and that bucketfuls of salt should be >> on hand when listening to salesman's protestations that >> "the software/hardware framework takes care of all of >> that so you don't have to worry". > > What? Since when did engineers listen to salesmen? Since their PHBs get taken out to the golf course to chat about sport by the salesmen :(Article: 155440
On 6/28/2013 12:23 PM, Tom Gardner wrote: > On 28/06/13 15:52, rickman wrote: >> On 6/28/2013 5:33 AM, Tom Gardner wrote: >>> On 28/06/13 10:09, RCIngham wrote: >>>> <snip> >>>> >>>>> Mind you, I'd *love* to see a radical overhaul of traditional >>>>> multicore processors so they took the form of >>>>> - a large number of processors >>>>> - each with completely independent memory >>>>> - connected by message passing fifos >>>>> can >>>>> In the long term that'll be the only way we can continue >>>>> to scale individual machines: SMP scales for a while, but >>>>> then cache coherence requirements kill performance. >>>>> >>>> >>>> Transputer? >>>> http://en.wikipedia.org/wiki/Transputer >>> >>> It had a lot going for it, but was a too dogmatic about >>> the development environment. >> >> You mean 'C'? I worked on a large transputer oriented project and they >> used ANSI 'C' rather than Occam. It got the job done... or should I >> say "jobs"? > > I only looked at the Transputer when it was Occam only. > I liked Occam as an academic language, but at that time > it would have been a bit of a pain to do any serious > engineering; ISTR anything other than primitive types > weren't supported in the language. IIRC that was > ameliorated later, but by then the opportunity for > me (and Inmos) had passed. > > I don't know how C fitted onto the Transputer, but > I'd only have been interested if "multithreaded" > (to use the term loosely) code could have been > expressed reasonably easily. > > Shame, I'd have loved to use it. > >>> At the time it was respectably >>> fast, but that wasn't sufficient -- particularly since there >>> was so much scope for increasing speed of uniprocessor >>> machines. >>> >>> Given that uniprocessors have hit a wall, transputer >>> *concepts* embodied in a completely different form >>> might begin to be fashionable again. >> >> You mean like 144 transputers on a single chip? > > Or Intel's 80 cored chip :) > >> I"m not sure where processing is headed. > > Not that way! Memory bandwidth and latency are > key issues - but you knew that! Yeah, but I think the current programming paradigm is the problem. I think something else needs to come along. The current methods are all based on one, massive von Neumann design and that is what has hit the wall... duh! Time to think in terms of much smaller entities not totally different from what is found in FPGAs, just processors rather than logic. An 80 core chip will just be a starting point, but the hard part will *be* getting started. >> I actually just see confusion ahead as all of the existing methods >> seem to have come to a steep incline if >> not a brick wall. It may be time for something completely different. > > Precisely. My bet is that message passing between > independent processor+memory systems has the > biggest potential. It matches nicely onto many > forms of event-driven industrial and financial > applications and, I am told, onto significant > parts of HPC. It is also relatively easy to > comprehend and debug. > > The trick will be to get the sizes of the > processor + memory + computation "just right". > And desktop/GUI doesn't match that. I think the trick will be in finding ways of dividing up the programs so they can meld to the hardware rather than trying to optimize everything. Consider a chip where you have literally a trillion operations per second available all the time. Do you really care if half go to waste? I don't! 
I design FPGAs and I have never felt obliged (not since the early days anyway) to optimize the utility of each LUT and FF. No, it turns out the precious resource in FPGAs is routing and you can't do much but let the tools manage that anyway. So a fine grained processor array could be very effective if the programming can be divided down to suit. Maybe it takes 10 of these cores to handle 100 Mbps Ethernet, so what? Something like a browser might need to harness a couple of dozen. If the load slacks off and they are idling, so what? >>> It would also help if people can decide that reliability >>> is important, and that bucketfuls of salt should be >>> on hand when listening to salesman's protestations that >>> "the software/hardware framework takes care of all of >>> that so you don't have to worry". >> >> What? Since when did engineers listen to salesmen? > > Since their PHBs get taken out to the golf course > to chat about sport by the salesmen :( It's a bit different with me. I am my own PHB and I kayak, not golf. I have one disti person who I really enjoy talking to. She tried to help me from time to time, but often she can't do a lot because I'm not buying 1000's of chips. But my quantities have gone up a bit lately, we'll see where it goes. -- RickArticle: 155441
On 6/28/13 2:33 AM, Tom Gardner wrote: > On 28/06/13 10:09, RCIngham wrote: >> <snip> >> >>> Mind you, I'd *love* to see a radical overhaul of traditional >>> multicore processors so they took the form of >>> - a large number of processors >>> - each with completely independent memory >>> - connected by message passing fifos >>> >>> In the long term that'll be the only way we can continue >>> to scale individual machines: SMP scales for a while, but >>> then cache coherence requirements kill performance. >>> >> >> Transputer? >> http://en.wikipedia.org/wiki/Transputer > > It had a lot going for it, but was a too dogmatic about > the development environment. At the time it was respectably > fast, but that wasn't sufficient -- particularly since there > was so much scope for increasing speed of uniprocessor > machines. Have you looked at Tilera's TILEpro64 or Adapteva's Epiphany 64 core processors? > Given that uniprocessors have hit a wall, transputer > *concepts* embodied in a completely different form > might begin to be fashionable again. Languages like Erlang and Go use similar concepts (as did Occam on the transputer). But I think the problem is that /in general/ we still don't know how to write parallel or distributed programs. Most of the concepts are from ~40 years back (CSP, guarded commands etc.). We still don't have decent tools. Turning serial programs into parallel versions is manual, laborious, error prone and not very successful.Article: 155442
On 28/06/13 20:55, Bakul Shah wrote: > On 6/28/13 2:33 AM, Tom Gardner wrote: >> On 28/06/13 10:09, RCIngham wrote: >>> <snip> >>> >>>> Mind you, I'd *love* to see a radical overhaul of traditional >>>> multicore processors so they took the form of >>>> - a large number of processors >>>> - each with completely independent memory >>>> - connected by message passing fifos >>>> >>>> In the long term that'll be the only way we can continue >>>> to scale individual machines: SMP scales for a while, but >>>> then cache coherence requirements kill performance. >>>> >>> >>> Transputer? >>> http://en.wikipedia.org/wiki/Transputer >> >> It had a lot going for it, but was a too dogmatic about >> the development environment. At the time it was respectably >> fast, but that wasn't sufficient -- particularly since there >> was so much scope for increasing speed of uniprocessor >> machines. > > Have you looked at Tilera's TILEpro64 or Adapteva's Epiphany > 64 core processors? No I haven't. I've been constrained by getting high-availability software to market quickly, on hardware that is demonstrably supported all over the world. > >> Given that uniprocessors have hit a wall, transputer >> *concepts* embodied in a completely different form >> might begin to be fashionable again. > > Languages like Erlang and Go use similar concepts (as > did Occam on the transputer). But I think the problem > is that /in general/ we still don't know how to write > parallel or distributed programs. Most of the concepts > are from ~40 years back (CSP, guarded commands etc.). > We still don't have decent tools. Turning serial programs > into parallel versions is manual, laborious, error prone > and not very successful. Erlang is certainly interesting from this point of view. I'm not interested in turning existing serial programs into parallel ones; that way lies madness and failure. What is more interestingly tractable are "embarrassingly parallel" problems (e.g. massive event processing systems), and completely new approaches (currently typified by big data and map-reduce, but that's just the beginning).Article: 155443
On 28/06/13 20:06, rickman wrote: > On 6/28/2013 12:23 PM, Tom Gardner wrote: >> On 28/06/13 15:52, rickman wrote: >>> On 6/28/2013 5:33 AM, Tom Gardner wrote: >>>> On 28/06/13 10:09, RCIngham wrote: >>>>> <snip> >>>>> >>>>>> Mind you, I'd *love* to see a radical overhaul of traditional >>>>>> multicore processors so they took the form of >>>>>> - a large number of processors >>>>>> - each with completely independent memory >>>>>> - connected by message passing fifos >>>>>> can >>>>>> In the long term that'll be the only way we can continue >>>>>> to scale individual machines: SMP scales for a while, but >>>>>> then cache coherence requirements kill performance. >>>>>> >>>>> >>>>> Transputer? >>>>> http://en.wikipedia.org/wiki/Transputer >>>> >>>> It had a lot going for it, but was a too dogmatic about >>>> the development environment. >>> >>> You mean 'C'? I worked on a large transputer oriented project and they >>> used ANSI 'C' rather than Occam. It got the job done... or should I >>> say "jobs"? >> >> I only looked at the Transputer when it was Occam only. >> I liked Occam as an academic language, but at that time >> it would have been a bit of a pain to do any serious >> engineering; ISTR anything other than primitive types >> weren't supported in the language. IIRC that was >> ameliorated later, but by then the opportunity for >> me (and Inmos) had passed. >> >> I don't know how C fitted onto the Transputer, but >> I'd only have been interested if "multithreaded" >> (to use the term loosely) code could have been >> expressed reasonably easily. >> >> Shame, I'd have loved to use it. >> >>>> At the time it was respectably >>>> fast, but that wasn't sufficient -- particularly since there >>>> was so much scope for increasing speed of uniprocessor >>>> machines. >>>> >>>> Given that uniprocessors have hit a wall, transputer >>>> *concepts* embodied in a completely different form >>>> might begin to be fashionable again. >>> >>> You mean like 144 transputers on a single chip? >> >> Or Intel's 80 cored chip :) >> >>> I"m not sure where processing is headed. >> >> Not that way! Memory bandwidth and latency are >> key issues - but you knew that! > > Yeah, but I think the current programming paradigm is the problem. I think something else needs to come along. The current methods are all based on one, massive von Neumann design and that is what > has hit the wall... duh! > > Time to think in terms of much smaller entities not totally different from what is found in FPGAs, just processors rather than logic. > > An 80 core chip will just be a starting point, but the hard part will *be* getting started. > > >>> I actually just see confusion ahead as all of the existing methods >>> seem to have come to a steep incline if >>> not a brick wall. It may be time for something completely different. >> >> Precisely. My bet is that message passing between >> independent processor+memory systems has the >> biggest potential. It matches nicely onto many >> forms of event-driven industrial and financial >> applications and, I am told, onto significant >> parts of HPC. It is also relatively easy to >> comprehend and debug. >> >> The trick will be to get the sizes of the >> processor + memory + computation "just right". >> And desktop/GUI doesn't match that. > > I think the trick will be in finding ways of dividing up the programs so they can meld to the hardware rather than trying to optimize everything. 
My suspicion is that, except for compute-bound problems that only require "local" data, that granularity will be too small. Examples where it will work, e.g. protein folding, will rapidly migrate to CUDA and graphics processors. > > Consider a chip where you have literally a trillion operations per second available all the time. Do you really care if half go to waste? I don't! I design FPGAs and I have never felt obliged (not > since the early days anyway) to optimize the utility of each LUT and FF. No, it turns out the precious resource in FPGAs is routing and you can't do much but let the tools manage that anyway. Those internal FPGA constraints also have analogues at a larger scale, e.g. ic pinout, backplanes, networks... > So a fine grained processor array could be very effective if the programming can be divided down to suit. Maybe it takes 10 of these cores to handle 100 Mbps Ethernet, so what? Something like a > browser might need to harness a couple of dozen. If the load slacks off and they are idling, so what? The fundamental problem is that in general as you make the granularity smaller, the communications requirements get larger. And vice versa :( >>>> It would also help if people can decide that reliability >>>> is important, and that bucketfuls of salt should be >>>> on hand when listening to salesman's protestations that >>>> "the software/hardware framework takes care of all of >>>> that so you don't have to worry". >>> >>> What? Since when did engineers listen to salesmen? >> >> Since their PHBs get taken out to the golf course >> to chat about sport by the salesmen :( > > It's a bit different with me. I am my own PHB and I kayak, not golf. I have one disti person who I really enjoy talking to. She tried to help me from time to time, but often she can't do a lot > because I'm not buying 1000's of chips. But my quantities have gone up a bit lately, we'll see where it goes. I'm sort-of retired (I got sick of corporate in-fighting, and I have my "drop dead money", so...) I regard golf as silly, despite having two courses in walking distance. My equivalent of kayaking is flying gliders.Article: 155444
On 6/28/13 2:04 PM, Tom Gardner wrote: > On 28/06/13 20:55, Bakul Shah wrote: >> >> Have you looked at Tilera's TILEpro64 or Adapteva's Epiphany >> 64 core processors? > > No I haven't. FYI the epiphany III processor is used in the $99 parallela "supercomputer". Should be available by August end according to http://www.kickstarter.com/projects/adapteva/parallella-a-supercomputer-for-everyone/posts > What is more interestingly tractable are "embarrassingly > parallel" problems (e.g. massive event processing systems), > and completely new approaches (currently typified by > big data and map-reduce, but that's just the beginning). And yet these run on traditional computers. Parallelism is at the node level.Article: 155445
On 28/06/13 22:22, Bakul Shah wrote: > On 6/28/13 2:04 PM, Tom Gardner wrote: >> What is more interestingly tractable are "embarrassingly >> parallel" problems (e.g. massive event processing systems), >> and completely new approaches (currently typified by >> big data and map-reduce, but that's just the beginning). > > And yet these run on traditional computers. Parallelism > is at the node level. Just so, but even such nodes can be the subject of innovation. A recent good example is Sun's Niagara/Rock T series sparcs, which forego OOO and caches in favour of a medium number of cores each operating at the speed of main memory.Article: 155446
On 6/28/2013 5:11 PM, Tom Gardner wrote: > On 28/06/13 20:06, rickman wrote: >> On 6/28/2013 12:23 PM, Tom Gardner wrote: >>> On 28/06/13 15:52, rickman wrote: >>>> On 6/28/2013 5:33 AM, Tom Gardner wrote: >>>>> On 28/06/13 10:09, RCIngham wrote: >>>>>> <snip> >>>>>> >>>>>>> Mind you, I'd *love* to see a radical overhaul of traditional >>>>>>> multicore processors so they took the form of >>>>>>> - a large number of processors >>>>>>> - each with completely independent memory >>>>>>> - connected by message passing fifos >>>>>>> can >>>>>>> In the long term that'll be the only way we can continue >>>>>>> to scale individual machines: SMP scales for a while, but >>>>>>> then cache coherence requirements kill performance. >>>>>>> >>>>>> >>>>>> Transputer? >>>>>> http://en.wikipedia.org/wiki/Transputer >>>>> >>>>> It had a lot going for it, but was a too dogmatic about >>>>> the development environment. >>>> >>>> You mean 'C'? I worked on a large transputer oriented project and they >>>> used ANSI 'C' rather than Occam. It got the job done... or should I >>>> say "jobs"? >>> >>> I only looked at the Transputer when it was Occam only. >>> I liked Occam as an academic language, but at that time >>> it would have been a bit of a pain to do any serious >>> engineering; ISTR anything other than primitive types >>> weren't supported in the language. IIRC that was >>> ameliorated later, but by then the opportunity for >>> me (and Inmos) had passed. >>> >>> I don't know how C fitted onto the Transputer, but >>> I'd only have been interested if "multithreaded" >>> (to use the term loosely) code could have been >>> expressed reasonably easily. >>> >>> Shame, I'd have loved to use it. >>> >>>>> At the time it was respectably >>>>> fast, but that wasn't sufficient -- particularly since there >>>>> was so much scope for increasing speed of uniprocessor >>>>> machines. >>>>> >>>>> Given that uniprocessors have hit a wall, transputer >>>>> *concepts* embodied in a completely different form >>>>> might begin to be fashionable again. >>>> >>>> You mean like 144 transputers on a single chip? >>> >>> Or Intel's 80 cored chip :) >>> >>>> I"m not sure where processing is headed. >>> >>> Not that way! Memory bandwidth and latency are >>> key issues - but you knew that! >> >> Yeah, but I think the current programming paradigm is the problem. I >> think something else needs to come along. The current methods are all >> based on one, massive von Neumann design and that is what >> has hit the wall... duh! >> >> Time to think in terms of much smaller entities not totally different >> from what is found in FPGAs, just processors rather than logic. >> >> An 80 core chip will just be a starting point, but the hard part will >> *be* getting started. >> >> >>>> I actually just see confusion ahead as all of the existing methods >>>> seem to have come to a steep incline if >>>> not a brick wall. It may be time for something completely different. >>> >>> Precisely. My bet is that message passing between >>> independent processor+memory systems has the >>> biggest potential. It matches nicely onto many >>> forms of event-driven industrial and financial >>> applications and, I am told, onto significant >>> parts of HPC. It is also relatively easy to >>> comprehend and debug. >>> >>> The trick will be to get the sizes of the >>> processor + memory + computation "just right". >>> And desktop/GUI doesn't match that. 
>> >> I think the trick will be in finding ways of dividing up the programs >> so they can meld to the hardware rather than trying to optimize >> everything. > > My suspicion is that, except for compute-bound > problems that only require "local" data, that > granularity will be too small. > > Examples where it will work, e.g. protein folding, > will rapidly migrate to CUDA and graphics processors. You are still thinking von Neumann. Any application can be broken down into small units and parceled out to small processors. But you have to think in those terms rather than just saying, "it doesn't fit". Of course it can fit! >> Consider a chip where you have literally a trillion operations per >> second available all the time. Do you really care if half go to waste? >> I don't! I design FPGAs and I have never felt obliged (not >> since the early days anyway) to optimize the utility of each LUT and >> FF. No, it turns out the precious resource in FPGAs is routing and you >> can't do much but let the tools manage that anyway. > > Those internal FPGA constraints also have analogues at > a larger scale, e.g. ic pinout, backplanes, networks... > > >> So a fine grained processor array could be very effective if the >> programming can be divided down to suit. Maybe it takes 10 of these >> cores to handle 100 Mbps Ethernet, so what? Something like a >> browser might need to harness a couple of dozen. If the load slacks >> off and they are idling, so what? > > The fundamental problem is that in general as you make the > granularity smaller, the communications requirements > get larger. And vice versa :( Actually not. The aggregate comms requirements may increase, but we aren't sharing an Ethernet bus. All of the local processors talk to each other and less often have to talk to non-local processors. I think the phone company knows something about that. If you apply your line of reasoning to FPGAs with the lowly 4 input LUT it would seem like they would be doomed to eternal comms congestion. Look at the routing in FPGAs and other PLDs sometime. They are hierarchical. Works pretty well, but the trade off is in worrying about providing enough comms to let all of the logic be used for every design or just not worrying about it and "making do". Works pretty well if the designers just chill about utilization. >>>>> It would also help if people can decide that reliability >>>>> is important, and that bucketfuls of salt should be >>>>> on hand when listening to salesman's protestations that >>>>> "the software/hardware framework takes care of all of >>>>> that so you don't have to worry". >>>> >>>> What? Since when did engineers listen to salesmen? >>> >>> Since their PHBs get taken out to the golf course >>> to chat about sport by the salesmen :( >> >> It's a bit different with me. I am my own PHB and I kayak, not golf. I >> have one disti person who I really enjoy talking to. She tried to help >> me from time to time, but often she can't do a lot >> because I'm not buying 1000's of chips. But my quantities have gone up >> a bit lately, we'll see where it goes. > > I'm sort-of retired (I got sick of corporate in-fighting, > and I have my "drop dead money", so...) That's me too, but I found some work that is paying off very well now. So I've got a foot in both camps, retired, not retired... both are fun in their own way. But dealing with international shipping is a PITA. > I regard golf as silly, despite having two courses in > walking distance. My equivalent of kayaking is flying > gliders. 
That has got to be fun! I've never worked up the whatever to learn to fly. It seems like a big investment and not so cheap overall. But there is clearly a great thrill there. -- RickArticle: 155447
On Friday, June 28, 2013 9:02:10 PM UTC-4, rickman wrote: > You are still thinking von Neumann. Any application can be broken down > into small units and parceled out to small processors. But you have to > think in those terms rather than just saying, "it doesn't fit". Of > course it can fit! Intra brain communications are hierarchical as well. I'm nobody, but one of the reasons for designing Hive was because I feel processors in general are much too complex, to the point where I'm repelled by them. I believe one of the drivers for this over-complexity is the fact that main memory is external. I've been assembling PCs since the 286 days, and I've never understood why main memory wasn't tightly integrated onto the uP die. Everyone pretty much gets the same ballpark memory size when putting a PC together, and I can remember only once or twice upgrading memory after the initial build (for someone else's Dell or similar where the initial build was anemically low-balled for "value" reasons). Here we are in 2013, the memory is several light cm away from the processor on the MB, talking in cache lines, and I still don't get why we have this gross inefficiency. My dual core multi-GHz PC with SSD often just sits there for many seconds after I click on something, and malware is now taking me sometimes days to fix. Windows 7 is a dog to install, with relentless updates that often completely hose it rather than improve it. The future isn't looking too bright for the desktop with the way we're going.Article: 155448
Eric Wallin wrote: > On Friday, June 28, 2013 9:02:10 PM UTC-4, rickman wrote: > >> You are still thinking von Neumann. Any application can be broken >> down into small units and parceled out to small processors. But >> you have to think in those terms rather than just saying, "it >> doesn't fit". Of course it can fit! > > Intra brain communications are hierarchical as well. > > I'm nobody, but one of the reasons for designing Hive was because I > feel processors in general are much too complex, to the point where > I'm repelled by them. I believe one of the drivers for this > over-complexity is the fact that main memory is external. I've been > assembling PCs since the 286 days, and I've never understood why main > memory wasn't tightly integrated onto the uP die. RAM was both large and expensive until recently. Different people made RAM than made processors and it would have been challenging to get the business arrangements such that they'd glue up. Plus, beginning not long ago, you're really dealing with cache directly, not RAM. Throw in that main memory is DRAM, and it gets a lot more complicated. Building a BSP for a new board from scratch with a DRAM controller is a lot of work. > Everyone pretty > much gets the same ballpark memory size when putting a PC together, > and I can remember only once or twice upgrading memory after the > initial build (for someone else's Dell or similar where the initial > build was anemically low-balled for "value" reasons). Here we are in > 2013, the memory is several light cm away from the processor on the > MB, talking in cache lines, and I still don't get why we have this > gross inefficiency. > That's not generally the bottleneck, though. > My dual core multi-GHz PC with SSD often just sits there for many > seconds after I click on something, and malware is now taking me > sometimes days to fix. Geez. Ever use virtual machines? If you break/infect one, just roll it back. > Windows 7 is a dog to install, with > relentless updates that often completely hose it rather than improve > it. The future isn't looking too bright for the desktop with the way > we're going. > -- Les CargillArticle: 155449
Bakul Shah wrote: > On 6/28/13 2:33 AM, Tom Gardner wrote: >> On 28/06/13 10:09, RCIngham wrote: >>> <snip> >>> >>>> Mind you, I'd *love* to see a radical overhaul of traditional >>>> multicore processors so they took the form of >>>> - a large number of processors >>>> - each with completely independent memory >>>> - connected by message passing fifos >>>> >>>> In the long term that'll be the only way we can continue >>>> to scale individual machines: SMP scales for a while, but >>>> then cache coherence requirements kill performance. >>>> >>> >>> Transputer? >>> http://en.wikipedia.org/wiki/Transputer >> >> It had a lot going for it, but was a too dogmatic about >> the development environment. At the time it was respectably >> fast, but that wasn't sufficient -- particularly since there >> was so much scope for increasing speed of uniprocessor >> machines. > > Have you looked at Tilera's TILEpro64 or Adapteva's Epiphany > 64 core processors? > >> Given that uniprocessors have hit a wall, transputer >> *concepts* embodied in a completely different form >> might begin to be fashionable again. > > Languages like Erlang and Go use similar concepts (as > did Occam on the transputer). But I think the problem > is that /in general/ we still don't know how to write > parallel or distributed programs. I do - I've been doing it for a long time, too. It's not all that hard if you have no libraries getting in the way. This, by the way, is absolutely nothing fancy. It's precisely the same concepts as when we linked stuff together with serial ports in the Stone Age. > Most of the concepts > are from ~40 years back (CSP, guarded commands etc.). Most *all* concepts in computers are from that long ago or longer. The "new stuff" is more about arbitraging market forces than getting real work done. > We still don't have decent tools. I respectfully disagree. But my standard for "decency" is probably different from your'n. My idea of an IDE is an editor and a shell prompt... > Turning serial programs > into parallel versions is manual, laborious, error prone > and not very successful. So don't do that. Write them to be parallel from the git-go. Write them to be event-driven. It's better in all dimensions. After all, we're all really clockmakers. Events regulate our "wheels" just like the escapement on a pendulum clock. When you get that happening, things get to be a lot more deterministic and that is what parallelism needs the most. -- Les Cargill
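As a sketch of the "event-driven from the start" style Les describes, here is a minimal poll()-based loop in C; the only event sources here are stdin and a timeout tick, chosen purely for illustration, and the structure rather than the specifics is the point.

    #include <poll.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Skeleton event loop: block in exactly one place, then dispatch on
       whichever event fired.  Real code would add sockets, timers, and
       per-event state machines in place of the printf() calls. */
    int main(void)
    {
        struct pollfd fds[1] = { { .fd = STDIN_FILENO, .events = POLLIN } };

        for (;;) {
            int rc = poll(fds, 1, 1000);          /* 1 s timeout           */
            if (rc < 0)
                break;                            /* poll error: shut down */
            if (rc == 0) {                        /* timeout = tick event  */
                printf("tick\n");
                continue;
            }
            if (fds[0].revents & (POLLIN | POLLHUP)) {
                char buf[256];
                ssize_t n = read(STDIN_FILENO, buf, sizeof buf);
                if (n <= 0)
                    break;                        /* EOF: shut down        */
                printf("event: %d bytes\n", (int)n);
            }
        }
        return 0;
    }

Because each worker reduces to "wait for an event, run a short handler, go back to waiting", the same structure replicates easily across many small processors and stays deterministic enough to reason about, which is much of what parallelism needs.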