On Sat, 18 Dec 1999 12:50:33 -0500, Ray Andraka <randraka@ids.net> wrote: > > >Dann Corbit wrote: > >> "Ray Andraka" <randraka@ids.net> wrote in message >> news:385B1DEE.7517AAC7@ids.net... >> > The chess processor as you describe would be sensible in an FPGA. Current >> > offerings have extraordinary logic densities, and some of the newer FPGAs >> have >> > over 500K of on-chip RAM which can be arranged as a very wide memory. >> Some of >> > the newest parts have several million 'marketing' gates available too. >> FPGAs >> > have long been used as prototyping platforms for custom silicon. >> >> I am curious about the memory. Chess programs need to access at least tens >> of megabytes of memory. This is used for the hash tables, since the same >> areas are repeatedly searched. Without a hash table, the calculations must >> be performed over and over. Some programs can even access gigabytes of ram >> when implemented on a mainframe architecture. Is very fast external ram >> access possible from FPGA's? > >This is conventional CPU thinking. With the high degree of parallelism in the No, this is algorithmic speedup design. The branching factor (the time multiplier for looking another move ahead) improves by a large margin with it. So BF improves in the formula: # operations in FPGA = C * (BF^n), where n is a positive integer (the search depth). >FPGA and the large amount of resources in some of the more recent devices, it >may very well be that it is more advantageous to recompute the values rather >than fetching them. There may even be a better approach to the algorithm that >just isn't practical on a conventional CPU. Early computer chess did not use >the huge memories. I suspect the large memory is more used to speed up the >processing rather than a necessity to solving the problem.
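The node-count formula above can be sketched numerically. A minimal C illustration (the constant C, the branching factors, and the depth used with it are made-up numbers, chosen only to show how sensitive the node count is to BF):

```c
/* Node count for a fixed-depth game-tree search: nodes = C * BF^n,
 * where BF is the effective branching factor and n the depth.
 * Any numbers used with this are illustrative, not measured. */
static double nodes(double c, double bf, int n)
{
    double total = c;
    while (n-- > 0)
        total *= bf;        /* one factor of BF per extra move of depth */
    return total;
}
```

With C = 1 and depth 10, a brute-force BF of 6 costs about 60.5 million nodes, while an algorithmically improved BF of 3 costs about 59 thousand (a factor of 1024), which is why the branching factor matters more than raw node rate.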
Though the number of operations used by Deep Blue was incredible compared to any program of today, at the 1999 world championship many programs searched positionally deeper (Deep Blue 5 to 6 moves ahead, some programs looking 6-7 moves ahead there). This is all because of these algorithmic improvements. It's like comparing bubble sort against merge sort. You need more memory for merge sort, as it is not in situ, but it's O(n log n). Take into account that in computer games the option to use an in-situ algorithm is not available. >> > If I were doing such I design in an FPGA however, I would look deeper to >> see >> > what algorithmic changes could be done to take advantage of the >> parallelism >> > offered by the FPGA architecture. Usually that means moving away from a >> > traditional GP CPU architecture which is limited by the inherently serial >> > instruction stream. If you are trying to mimic the behavior of a CPU, you >> would >> > possibly do better with a fast CPU, as you will be able to run those >> at a >> > higher clock rate. The FPGA gains an advantage over CPUs when you can >> take >> > advantage of parallelism to get much more done in a clock cycle than you >> can >> > with a CPU. >> >> The ability to do many things at once may be a huge advantage. I don't >> really know anything about FPGA's, but I do know that in chess, there are a >> large number of similar calculations that take place at the same time. The >> more things that can be done in parallel, the better. > >Think of it as a medium for creating a custom logic circuit. A conventional CPU >is specific hardware optimized to perform a wide variety of tasks, none >especially well. Instead we can build a circuit that specifically addresses the >chess algorithms at hand. Now, I don't really know much about the algorithms >used for chess. I suspect one would look ahead at all the possibilities for at >least a few moves ahead and assign some metric to each to determine the one with >the best likely cost/benefit ratio.
The FPGA might be used to search all the >possible paths in parallel. My program allows parallelism. I need heavy locking for this, in order to balance the parallel paths. What are the possibilities in an FPGA to place several copies of the same program on one chip, so that inside the FPGA there is a sense of parallelism? How about making something that enables locking within the FPGA? My parallelism is not possible without locking; that's the same bubble-sort-versus-merge-sort story: with 4 processors my program gets a 4.0x speedup, but without the locking 4 processors would be a lot slower than a single sequential processor. >> > That said, I wouldn't recommend that someone without a sound footing in >> > synchronous digital logic design take on such a project. Ideally the >> designer >> > for something like this is very familiar with the FPGA architecture and >> tools >> > (knows what does and doesn't map efficiently in the FPGA architecture), >> and is >> > conversant in computer architecture and design and possibly has some >> pipelined >> > signal processing background (for exposure to hardware efficient >> algorithms, >> > which are usually different than ones optimized for software). >> I am just curious about feasibility, since someone raised the question. I >> would not try such a thing by myself. >> >> Supposing that someone decided to do the project (however) what would a >> rough ball-park guestimate be for design costs, the costs of creating the >> actual masks, and production be for a part like that? > >The nice thing about FPGAs is that there is essentially no NRE or fabrication >costs. The parts are pretty much commodity items, purchased as generic >components. The user develops a program consisting of a compiled digital logic >design, which is then used to field customize the part. Some FPGAs are >programmed once during product manufacture (one time programmables include >Actel and Quicklogic).
Others, including the Xilinx line, have thousands of >registers that are loaded up by a bitstream each time the device is powered up. >The bitstream is typically stored in an external EPROM memory, or in some cases >supplied by an attached CPU. Part costs range from under $5 for small arrays to >well over $1000 for the newest largest fastest parts. How about a program that has thousands of chess rules with an incredible number of loops within them, and a huge search, so that the engine & eval alone equal 1.5 MB of C source code? How expensive would that be? Am I understanding here that I need to spend another $1000 for every few rules? >The design effort for the logic circuit you are looking at is not trivial. For >the project you describe, the bottom end would probably be anywhere from 12 >weeks to well over a year of effort depending on the actual complexity of the >design, the experience of the designer with the algorithms, FPGA devices and >tools. I needed years to write it in C already... Vincent Diepeveen diep@xs4all.nl >> -- >> C-FAQ: http://www.eskimo.com/~scs/C-faq/top.html >> "The C-FAQ Book" ISBN 0-201-84519-9 >> C.A.P. Newsgroup http://www.dejanews.com/~c_a_p >> C.A.P. FAQ: ftp://38.168.214.175/pub/Chess%20Analysis%20Project%20FAQ.htm >-- >-Ray Andraka, P.E. >President, the Andraka Consulting Group, Inc. >401/884-7930 Fax 401/884-7950 >email randraka@ids.net >http://users.ids.net/~randrakaArticle: 19401
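Vincent's locking question maps, in software terms, to ordinary mutual exclusion around a shared transposition table. A minimal POSIX-threads sketch (the table layout, size, and function names here are invented for illustration; this is not quoting any real engine's code):

```c
#include <pthread.h>
#include <stdint.h>

/* A toy shared transposition table: each entry caches a score for a
 * position key.  The lock serializes probe/store so that concurrent
 * searcher threads never read a half-written entry. */
#define TABLE_SIZE 4096

struct tt_entry {
    uint64_t key;
    int      score;
    int      valid;
};

static struct tt_entry table[TABLE_SIZE];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Store a score under the given position key. */
void tt_store(uint64_t key, int score)
{
    pthread_mutex_lock(&table_lock);
    struct tt_entry *e = &table[key % TABLE_SIZE];
    e->key = key;
    e->score = score;
    e->valid = 1;
    pthread_mutex_unlock(&table_lock);
}

/* Probe the table; returns 1 and fills *score on a hit, 0 on a miss. */
int tt_probe(uint64_t key, int *score)
{
    int hit = 0;
    pthread_mutex_lock(&table_lock);
    struct tt_entry *e = &table[key % TABLE_SIZE];
    if (e->valid && e->key == key) {
        *score = e->score;
        hit = 1;
    }
    pthread_mutex_unlock(&table_lock);
    return hit;
}
```

In an FPGA the same effect can instead be built structurally, e.g. by giving each table bank a single port and one process per bank, so that accesses are serialized by construction and no explicit lock is needed.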
Hello Thomas If you haven't already seen it, Application Note AN088 on the Altera web site is helpful when programming FLEX10K with the Jam player. http://www.altera.com/document/an/an088.pdf It is important to get the DOS command line for jam right, in particular specifying the actions you require. See the -d command line options on Pages 10 and 11 of AN088 (especially the DO_CONFIGURE command). You may also like to see the Atlas Technical Solutions on the Altera Web Site, especially the following : http://www.altera.com/html/atlas/soln/rd09171998_2963.html "Problem : When I run the JamTM Player, why does it say programming or configuration was successful but nothing happens? " Regards, Michael Thomas Bornhaupt wrote: > Hi, > > i use MAX+plus II 9.3 7/23/1999 (1999.07). I can program the EPF10K10LC84-4 > direcly out of Max+plus. And all works correct! > > Then I try to program the EPF10K10LC84-4 with JAM.EXE (16Bit Dos) on the > same Hardware (Computer and Byteblaster). But it does not work! > > The following lines are from the dosbox > ----------- > Jam (Stapl) Player Version 2.12 > Copywrite (C) 1997-1999 Altera Corporation > Device #1 IDCODE is 010100DD > DONE > Exit code = 0 ... Success > ----------- > > If i take off the cable from the ByteBlaster then i get the following lines: > ----------- > Jam (Stapl) Player Version 2.12 > Copywrite (C) 1997-1999 Altera Corporation > Device #1 IDCODE is FFFFFFFF > DONE > Exit code = 0 ... Success > ---------- > > The ExitCode ist always 0. But the EPF10K10LC84-4 ist not working. > > This Error is so clear, that i think that must be an User Error. > > Which option is not correkt in MAX+plus or JAM.EXE?Article: 19402
Luigi Funes wrote: > Peter, > can you explain what the "dirty asynchronous tricks" to avoid are, please? > The manufacturers specify only the max. delays. Do I always have to assume > that the min. delay, theoretically, could be zero? > If several signals follow similar paths, like in a bus, what timing relationships > should I assume between these signals at the end of the path? Each > signal could have a different delay on the same device? > And generally, how realistic are the timing analysis and simulation? > I believe these are important misunderstood questions. Thank you. > > Luigi Sounds like you use sound design practices. The dirty tricks Peter refers to generally depend on a propagation delay being above some minimum amount for the circuit to work. I've seen too many of these to still believe that people wouldn't design that way. The asynchronous sets/resets on FPGAs have a way of letting some of these 'dirty tricks' sneak in on an otherwise conscientious designer. The timing analyzer in the Xilinx tools is pretty good. You do have to make sure you set up your constraints properly. -- -Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email randraka@ids.net http://users.ids.net/~randrakaArticle: 19403
Hi Michael, thank you for your tips, but it does not work. It seems to me that MAX+plus (9.3) generates wrong JAM or JBC files. I tested Jam.EXE 1.2 with -dDO_CONFIGURE, but the chip is not programmed. Inside the JAM file (Language 1.1) I found this line: BOOLEAN DO_CONFIGURE = 0; So I set it to: BOOLEAN DO_CONFIGURE = 1; Starting JAM.EXE I got a syntax error in line 440! I also tested JAM.EXE 2.2. Here you have the option -aCONFIGURE. This is the action out of the JAM file (STAPL format): ACTION CONFIGURE = PR_INIT_CONFIGURE, PR_EXECUTE; And now I got an exception: the DOS box went away immediately and a pure DOS machine hung up with an EMM386 error. regards Thomas Bornhaupt Michael Stanton <mikes@magtech.com.au> wrote in message 385D6A66.27FE91D4@magtech.com.au... > Hello Thomas > > If you haven't already seen it, Application Note AN088 on the Altera web site is > helpful when programming FLEX10K with the Jam player. > > http://www.altera.com/document/an/an088.pdf > > It is important to get the DOS command line for jam right, in particular > specifying the actions you require. See the -d command line options on Pages 10 > and 11 of AN088 (especially the DO_CONFIGURE command). > > You may also like to see the Atlas Technical Solutions on the Altera Web Site, > especially the following : > > http://www.altera.com/html/atlas/soln/rd09171998_2963.html > > "Problem : When I run the JamTM Player, why does it say programming or > configuration was successful but nothing happens? " > > Regards, Michael > > > Thomas Bornhaupt wrote: > > > Hi, > > > > i use MAX+plus II 9.3 7/23/1999 (1999.07). I can program the EPF10K10LC84-4 > > direcly out of Max+plus. And all works correct! > > > > Then I try to program the EPF10K10LC84-4 with JAM.EXE (16Bit Dos) on the > > same Hardware (Computer and Byteblaster). But it does not work!
> > > > The following lines are from the dosbox > > ----------- > > Jam (Stapl) Player Version 2.12 > > Copywrite (C) 1997-1999 Altera Corporation > > Device #1 IDCODE is 010100DD > > DONE > > Exit code = 0 ... Success > > ----------- > > > > If i take off the cable from the ByteBlaster then i get the following lines: > > ----------- > > Jam (Stapl) Player Version 2.12 > > Copywrite (C) 1997-1999 Altera Corporation > > Device #1 IDCODE is FFFFFFFF > > DONE > > Exit code = 0 ... Success > > ---------- > > > > The ExitCode ist always 0. But the EPF10K10LC84-4 ist not working. > > > > This Error is so clear, that i think that must be an User Error. > > > > Which option is not correkt in MAX+plus or JAM.EXE? > > > >Article: 19404
I pose this question on the assumption that an asynchronous clear signal (for example a power-on reset) might not reliably initialise all registers of an FSM if it were to release very close to a clock edge. Even if this is a correct assumption, maybe it's felt that a reset is such a rare event that the chances of this happening are 'slim and none' anyway? Any thoughts would be appreciated. regds Mike Sent via Deja.com http://www.deja.com/ Before you buy.Article: 19405
Søren Lambæk wrote in message ... >Hi > >I have designed a Xilinx SpartanXL project in OrCAD EXPRESS using the Xilinx >Alliance 2.1i fitter. >I have a microcontroller which will handle the loading of the FPGA. > >My question is: can someone point me in the right direction on how to include >the FPGA code in the C source code of >the microcontroller SW. > >Regards >Søren Lambæk >KK-Electronic a/s >Denmark >E-mail: sl@kk-electronic.dk > Hello, You can look up XAPP122 on the XILINX web page. You will have to download 2 files: PCONFIG and MAKESRC. To get MAKESRC to work you need to make a configuration file to instruct it to create an array, e.g. FPGA_DATA[], to copy from ROM to the FPGA. The only thing that you still have to do is to strip the first 7 lines from the .MCS file before running PCONFIG. We created a (simple and dirty) batch file to do all the work. If you are interested I can post or mail it. I hope this helps. Mark van de Belt ROAX BV (remove the NOSPAM from the E-mail address to mail me)Article: 19406
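Once the MAKESRC flow described above has produced a bitstream array in ROM, the microcontroller's job is essentially to clock those bytes into the FPGA's configuration port. A behavioral C sketch of the slave-serial bit ordering (the "pins" are stubbed out so the logic can be exercised on a PC; a real design must follow the Spartan slave-serial timing in the datasheet, and the names below are placeholders, not taken from XAPP122):

```c
#include <stdint.h>
#include <stddef.h>

/* Test stub standing in for the DIN/CCLK port pins: records what the
 * FPGA would sample on each CCLK rising edge.  On real hardware this
 * would be two port-register writes (drive DIN, then pulse CCLK). */
static uint8_t shifted_bits[64];
static size_t  bit_count;

static void set_din_and_clock(int level)
{
    shifted_bits[bit_count++] = (uint8_t)level;
}

/* Shift one configuration byte out MSB first, as slave-serial expects. */
static void shift_byte(uint8_t b)
{
    for (int i = 7; i >= 0; i--)
        set_din_and_clock((b >> i) & 1);
}

/* Stream an entire bitstream array, e.g. MAKESRC's FPGA_DATA[].
 * On hardware, PROG_B would be pulsed first and DONE checked after. */
static void fpga_configure(const uint8_t *data, size_t len)
{
    bit_count = 0;
    for (size_t i = 0; i < len; i++)
        shift_byte(data[i]);
}
```

The MSB-first ordering is the part worth testing on a PC before going near the board; getting it backwards produces a bitstream the FPGA silently rejects.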
>On Sat, 18 Dec 1999 01:24:39 -0700, "Simon Bacon" ... ... > >Why not ask around on some the chess news groups like: >rec.games.chess.computer >to see what part of computer programs use the most time, and then see >if the FPGA's power could help speed up one of those bottlenecks. For my program: evaluation : 90% system time search : 10% system time. That 10% of system time is basically: - locking (root, hashtable, searchblocks) - caches to prevent doing more evaluations and smart code to select moves, in order to look ahead as deep as possible with the minimal number of 'nodes', where a node can be seen as a position that's processed somehow. The number of evaluations per second in my program is 2000-4000 on a PII-450; 60% I get directly out of the 'evaluation cache' (so the 2000-4000 full evaluations are the other 40%). Then another couple of thousand per second are 'transpositions' (out of the huge transposition hash table, which stores whether a certain path was already seen previously). In total this leads to 12000 to 14000 nodes a second. 12000 to 14000 nodes a second is dead slow compared to many other programs that get way over 100000. The reason for this is obviously the slow evaluation function. I would be interested in a PCI card with hardware (an FPGA?) that does this evaluation: it would receive, say, 160 bytes of position information (which can be compressed into fewer bytes) and return to the software an integer which is the evaluation for that given position. To be interesting, this evaluation must be delivered to the software at least 100,000 times a second. The four big questions then are: a) what do I need to make this? b) the price of making prototypes? c) the price per piece when making a couple of hundred of those PCI cards (up to a couple of thousand)? d) the price of making a reprogrammable(?) prototype (many bugs in the eval get fixed weekly and the eval gets expanded a bit weekly)?
Thanks in advance, Vincent Diepeveen diep@xs4all.nl www.diepchess.com -- +----------------------------------------------------+ | Vincent Diepeveen email: vdiepeve@cs.ruu.nl | | http://www.students.cs.ruu.nl/~vdiepeve/ | +----------------------------------------------------+Article: 19407
Hi - On Mon, 20 Dec 1999 12:41:19 GMT, micheal_thompson@my-deja.com wrote: > > >I pose this question on the assumption that an asynchronous clear >signal (for example a power on reset) might not reliably initialise all >registers of an FSM if it were to release very close to a clock edge. >Even if this is a correct assumption, maybe its felt that a reset is >such a rare-event that the chances of this happening are 'slim and >none' anyway? >Any thoughts would be appreciated. > >regds >Mike My policy is to give every FSM an asynchronous reset and a synchronous reset. The asynchronous reset puts the FSM in the right state even in the absence of a clock, which is important if the FSM is controlling, say, internal or external TriStates that might otherwise contend. The synchronous reset works around the problem you mentioned (by the way, 'slim and none' is just another phrase for, 'sooner or later, for sure'). I do one-hot FSMs exclusively, and I apply the synchronous reset only to the initial state FF of the FSM; I use it to (a) hold that FF set and (b) gate off that FF's output to any other state FF. I create the synchronous reset with a pipeline of 3 or 4 FFs, all of which get a global reset. A HIGH is fed to the D of the first FF, and gets propagated to the end of the chain after reset is released. The output of the last FF is inverted to produce the active HIGH synchronous reset. For devices that support global sets, you can just set all the FFs, feed a LOW into the first FF, and dispense with the inverter at the end. It's important to clock this FF chain with the same clock used for the FSM, of course. There are other ways to work around this problem, such as adding extra do-nothing states after the initial states in a one-hot, or making sure that the FSM won't transition out of the initial state until a few cycles after the asynch reset has been released. These work, too. 
The method I've described is easy to do in either schematics or HDL and, if desired, allows you to easily synchronize the startup of multiple FSMs. Take care, Bob Perlman ----------------------------------------------------- Bob Perlman Cambrian Design Works Digital Design, Signal Integrity http://www.best.com/~bobperl/cdw.htm Send e-mail replies to best<dot>com, username bobperl -----------------------------------------------------Article: 19408
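Bob's reset-release pipeline is easy to model behaviorally to convince yourself of the timing. A C sketch (a software model only, not HDL; the three-stage depth follows his 3-or-4-FF description):

```c
#include <string.h>

/* Behavioral model of the reset-release synchronizer: a chain of
 * flip-flops that are all cleared by the asynchronous reset, with a
 * constant '1' fed into the first stage.  The inverted output of the
 * last stage is the active-high synchronous reset. */
#define STAGES 3

struct reset_sync {
    int ff[STAGES];
};

/* Asynchronous global reset: clears every stage immediately. */
void rs_async_reset(struct reset_sync *rs)
{
    memset(rs->ff, 0, sizeof rs->ff);
}

/* One rising clock edge after the async reset has been released.
 * Returns the synchronous reset seen by the FSM this cycle. */
int rs_clock(struct reset_sync *rs)
{
    for (int i = STAGES - 1; i > 0; i--)
        rs->ff[i] = rs->ff[i - 1];
    rs->ff[0] = 1;                 /* D of the first FF tied high */
    return !rs->ff[STAGES - 1];    /* inverted last stage */
}
```

The synchronous reset stays asserted for the first clock edges after the asynchronous reset is released and then deasserts on a clean clock edge, so the FSM's initial-state FF can be held set until the clock domain is known-good.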
Visit the industry's largest independent on-line information source for programmable logic, The Programmable Logic Jump Station. * FREE downloadable FPGA and CPLD design software * Information on devices, boards, books, consultants, etc. * FAQ plus tutorials on VHDL and Verilog http://www.optimagic.com/index.shtml Featuring: --------- --- FREE Development Software --- Free and Low-Cost Software - http://www.optimagic.com/lowcost.shtml Free, downloadable demos and evaluation versions from all the major suppliers. --- Frequently-Asked Questions (FAQ) --- Programmable Logic FAQ - http://www.optimagic.com/faq.html A great resource for designers new to programmable logic. --- FPGAs, CPLDs, FPICs, etc. --- Recent Developments - http://www.optimagic.com/index.shtml Find out the latest news about programmable logic. Device Vendors - http://www.optimagic.com/companies.html FPGA, CPLD, SPLD, and FPIC manufacturers. Device Summary - http://www.optimagic.com/summary.html Who makes what and where to find out more. Market Statistics - http://www.optimagic.com/market.html Total high-density programmable logic sales and market share. --- Development Software --- Design Software - http://www.optimagic.com/software.html Find the right tool for building your programmable logic design. Synthesis Tutorials - http://www.optimagic.com/tutorials.html How to use VHDL or Verilog. --- Related Topics --- FPGA Boards - http://www.optimagic.com/boards.html See the latest FPGA boards and reconfigurable computers. Design Consultants - http://www.optimagic.com/consultants.html Find a programmable logic expert in your area of the world. Research Groups - http://www.optimagic.com/research.html The latest developments from universities, industry, and government R&D facilities covering FPGA and CPLD devices, applications, and reconfigurable computing. News Groups - http://www.optimagic.com/newsgroups.html Information on useful newsgroups. 
Related Conferences - http://www.optimagic.com/conferences.html Conferences and seminars on programmable logic. Information Search - http://www.optimagic.com/search.html Pre-built queries for popular search engines plus other information resources. The Programmable Logic Bookstore - http://www.optimagic.com/books.html Books on programmable logic, VHDL, and Verilog. Most can be ordered on-line, in association with Amazon.com . . . and much, much more. Bookmark it today!Article: 19409
You can use the global async reset even with fast clocks as long as you include a mechanism to keep critical stuff from starting off until a clock or two after the async global reset is released. There is no issue if the D inputs of the asynchronously reset flip-flops are not at a '1' level when the reset is released. micheal_thompson@my-deja.com wrote: > I pose this question on the assumption that an asynchronous clear > signal (for example a power on reset) might not reliably initialise all > registers of an FSM if it were to release very close to a clock edge. > Even if this is a correct assumption, maybe its felt that a reset is > such a rare-event that the chances of this happening are 'slim and > none' anyway? > Any thoughts would be appreciated. > > regds > Mike > > Sent via Deja.com http://www.deja.com/ > Before you buy. -- -Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email randraka@ids.net http://users.ids.net/~randrakaArticle: 19410
Vincent Diepeveen wrote: > On Sat, 18 Dec 1999 12:50:33 -0500, Ray Andraka <randraka@ids.net> > wrote: > > > > > > >Dann Corbit wrote: > > > >> "Ray Andraka" <randraka@ids.net> wrote in message > >> news:385B1DEE.7517AAC7@ids.net... > >> > The chess processor as you describe would be sensible in an FPGA. Current > >> > offerings have extraordinary logic densities, and some of the newer FPGAs > >> have > >> > over 500K of on-chip RAM which can be arranged as a very wide memory. > >> Some of > >> > the newest parts have several million 'marketing' gates available too. > >> FPGAs > >> > have long been used as prototyping platforms for custom silicon. > >> > >> I am curious about the memory. Chess programs need to access at least tens > >> of megabytes of memory. This is used for the hash tables, since the same > >> areas are repeatedly searched. Without a hash table, the calculations must > >> be performed over and over. Some programs can even access gigabytes of ram > >> when implemented on a mainframe architecture. Is very fast external ram > >> access possible from FPGA's? > > > >This is conventional CPU thinking. With the high degree of parallelism in the > > No this is algorithmic speedup design. > What I meant by this is that just using the FPGA to accelerate the CPU algorithm isn't necessarily going to give you all the FPGA is capable of doing. You need to rethink some of the algorithm to optimize it to the resources you have available in the FPGA. The algorithm as it stands now is at least somewhat tailored to a CPU implementation. It appears your thinking is just using the FPGA to speed up the inner loop, where what I am proposing is to rearrange the algorithm so that the FPGA might for example look at the whole board state on the current and then the next move. In a CPU-based algorithm, the storage is cheap and the computation is expensive.
In an FPGA, you have an opportunity for very wide parallel processes (you can even send a lock signal laterally across process threads). Here the processing is generally cheaper than the storage of intermediate results. The limiting factor is often the I/O bandwidth, so you want to rearrange your algorithm to tailor it to the quite different limitations of the FPGA. > Branching factor (time multiplyer to see another move ahead) > gets better with it by a large margin. > > So BF in the next formula gets better > > # operations in FGPA = C * (BF^n) > where n is a positive integer. > > >FPGA and the large amount of resources in some of the more recent devices, it > >may very well be that it is more advantageous to recompute the values rather > >than fetching them. There may even be a better approach to the algorithm that > >just isn't practical on a conventional CPU. Early computer chess did not use > >the huge memories. I suspect the large memory is more used to speed up the > >processing rather than a necessity to solving the problem. > > Though #operations used by deep blue was incredible compared to > any program of today at world championship 1999 many programs searched > positionally deeper (deep blue 5 to 6 moves ahead some programs > looking there 6-7 moves ahead). > > This all because of these algorithmic improvements. > > It's like comparing bubblesort against merge sort. > You need more memory for merge sort as this is not in situ but > it's O (n log n). Take into account that in computergames the > option to use an in situ algorithm is not available. > > >> > If I were doing such I design in an FPGA however, I would look deeper to > >> see > >> > what algorithmic changes could be done to take advantage of the > >> parallelism > >> > offered by the FPGA architecture. Usually that means moving away from a > >> > traditional GP CPU architecture which is limited by the inherently serial > >> > instruction stream. 
If you are trying to mimic the behavior of a CPU, you > >> would > >> > possibly do better with a fast CPU, as you will get be able to run those > >> at a > >> > higher clock rate. The FPGA gains an advantage over CPUs when you can > >> take > >> > advantage of parallelism to get much more done in a clock cycle than you > >> can > >> > with a CPU. > >> > >> The ability to do many things at once may be a huge advantage. I don't > >> really know anything about FPGA's, but I do know that in chess, there are a > >> large number of similar calcutions that take place at the same time. The > >> more things that can be done in parallel, the better. > > > >Think of it as a medium for creating a custom logic circuit. A conventional CPU > >is specific hardware optimized to perform a wide variety of tasks, none > >especially well. Instead we can build a circuit the specifically addresses the > >chess algorithms at hand. Now, I don't really know much about the algorithms > >used for chess. I suspect one would look ahead at all the possibilities for at > >least a few moves ahead and assign some metric to each to determine the one with > >the best likely cost/benefit ratio. The FPGA might be used to search all the > >possible paths in parallel. > > My program allows parallellism. i need bigtime locking for this, in > order to balance the parallel paths. > > How are the possibilities in FPGA to press several of the same program > at one cpu, so that inside the FPGA there is a sense of parallellism? > > How about making something that enables to lock within the FPGA? > > It's not possible my parallellism without locking, as that's the same > bubblesort versus merge sort story, as 4 processors my program gets > 4.0 speedup, but without the locking 4 processors would be a > lot slower than a single sequential processor. > > >> > That said, I wouldn't recommend that someone without a sound footing in > >> > synchronous digital logic design take on such a project. 
Ideally the > >> designer > >> > for something like this is very familiar with the FPGA architecture and > >> tools > >> > (knows what does and doesn't map efficiently in the FPGA architecture), > >> and is > >> > conversant in computer architecture and design and possibly has some > >> pipelined > >> > signal processing background (for exposure to hardware efficient > >> algorithms, > >> > which are usually different than ones optimized for software). > >> I am just curious about feasibility, since someone raised the question. I > >> would not try such a thing by myself. > >> > >> Supposing that someone decided to do the project (however) what would a > >> rough ball-park guestimate be for design costs, the costs of creating the > >> actual masks, and production be for a part like that? > > > >The nice thing about FPGAs is that there is essentially no NRE or fabrication > >costs. The parts are pretty much commodity items, purchased as generic > >components. The user develops a program consisting of a compiled digital logic > >design, which is then used to field customize the part. Some FPGAs are > >programmed once during the product manufacturer (one time programmables include > >Actel and Quicklogic). Others, including the Xilinx line, have thousands of > >registers that are loaded up by a bitstream each time the device is powered up. > >The bitstream is typically stored in an external EPROM memory, or in some cases > >supplied by an attached CPU. Part costs range from under $5 for small arrays to > >well over $1000 for the newest largest fastest parts. > > How about a program that's having thousands of chessrules and > incredible amount of loops within them and a huge search, > > So the engine & eval only equalling 1.5mb of C source code. > > How expensive would that be, am i understaning here that > i need for every few rules to spent another $1000 ? It really depends on the implementation. 
The first step in finding a good FPGA implementation is repartitioning the algorithm. This groundwork is often the longest part of the FPGA design cycle, and it is a part that is not even really acknowledged in the literature or by the part vendors. Do the system work up front to optimize the architecture for the resources you have available, and in the end you will wind up with something much better, faster, and smaller than anything arrived at by simple translation. At one extreme, one could just use the FPGA to instantiate custom CPUs with a specialized instruction set for the chess program. That approach would likely net you less performance than an emulator for the custom CPU running on a modern machine. The reason for that is the modern CPUs are clocked at considerably higher clock rates than a typical FPGA design is capable of, so even if the emulation takes an average of 4 or 5 cycles for each custom instruction, it will still keep up with or outperform the FPGA. Where the FPGA gets its power is the ability to do lots of stuff at the same time. To take advantage of that, you usually need to get away from an instruction-based processor. > > > >The design effort for the logic circuit you are looking at is not trivial. For > >the project you describe, the bottom end would probably be anywhere from 12 > >weeks to well over a year of effort depending on the actual complexity of the > >design, the experience of the designer with the algorithms, FPGA devices and > >tools. > > I needed years to write it in C already... > > Vincent Diepeveen > diep@xs4all.nl > > >> -- > >> C-FAQ: http://www.eskimo.com/~scs/C-faq/top.html > >> "The C-FAQ Book" ISBN 0-201-84519-9 > >> C.A.P. Newsgroup http://www.dejanews.com/~c_a_p > >> C.A.P. FAQ: ftp://38.168.214.175/pub/Chess%20Analysis%20Project%20FAQ.htm > > >-- > >-Ray Andraka, P.E. > >President, the Andraka Consulting Group, Inc.
> >401/884-7930 Fax 401/884-7950 > >email randraka@ids.net > >http://users.ids.net/~randraka -- -Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email randraka@ids.net http://users.ids.net/~randraka
Article: 19411
In article <38542614.91EB57DA@auckland.ac.nz>, Grant Sargent <g.sargent@auckland.ac.nz> wrote: [snip] > I'd recommend it. I did see that a previous poster mentioned it was a > 6-month time-limited version, but the version I've got is unlimited > (well, I'm fairly sure it's unlimited... I didn't see anything that > mentioned a time-limit.) > > Cheers, > Grant > ... the license code does expire after 6 months, but you can then just re-register through the Altera website, and you get a license file good for another 6 months. It's sort of a marketing thing, I suppose; keeps eyes on their site. I have to cast my vote for Altera also - as a cold-start novice, I got up the curve pretty quickly based on their help files and examples. Good luck- Joe Curren Sent via Deja.com http://www.deja.com/ Before you buy.
Article: 19412
>can you explain what are the "dirty asynchronous tricks" to avoid, please? Yep, I also would like to know what you shouldn't do. I have been taught to keep everything synchronous. Some stuff I've learned is: - Never source an asynch reset or clock from comb. logic - No comb. feedback loops - Synch your asynch inputs carefully (if needed) - Use one global clock (Be aware of clock skew and short paths) - Use one global reset, but take careful care of your FSM reset - Keep everything synchronous! Will this avoid "dirty asynchronous tricks" or what are those tricks? Merry X-Mas from an inexperienced engineer!
Article: 19413
John L. Smith <jsmith@visicom.com> wrote... > Seems to me that if chess is implementable in FPGA, place & route > ought to be accelerable too. And that place/route should have a > higher priority. I'd rather play chess against people. > > When do we get FPGA-accelerated place/route, to reduce our P/R times > from hours to minutes? It's only a factor of 60 acceleration we're > looking for. > > Take the discussion below, and replace the words "moves" or "board > positions" with the word "placements" or "routings" where appropriate. Some guys at Xerox PARC were looking at that a few years ago. In principle something like Lee's algorithm is a natural. The problem is that users are rarely looking for a simple route solution; the 'rules' set for a modern design can be huge - things like layer assignment, crosstalk, trace widths, and on and on. Sorry, I no longer have any references to the Xerox work.
Article: 19414
<micheal_thompson@my-deja.com> wrote... > > > I pose this question on the assumption that an asynchronous clear > signal (for example a power-on reset) might not reliably initialise all > registers of an FSM if it were to release very close to a clock edge. > Even if this is a correct assumption, maybe it's felt that a reset is > such a rare event that the chances of this happening are 'slim and > none' anyway? > Any thoughts would be appreciated. The chance of it happening on your bench is 'slim to none'. The chance of it happening in the customer's equipment is 100%. I have an existence proof of this :)
Article: 19415
Seems to me that if chess is implementable in FPGA, place & route ought to be accelerable too. And that place/route should have a higher priority. I'd rather play chess against people. When do we get FPGA-accelerated place/route, to reduce our P/R times from hours to minutes? It's only a factor of 60 acceleration we're looking for. Take the discussion below, and replace the words "moves" or "board positions" with the word "placements" or "routings" where appropriate. Dave Decker wrote: > "Simon Bacon" > <simon@tile.demon.co.uk.notreally> wrote: > >Could you post a few examples of the sort of primitives you > >would like to see a Chess Machine execute. > The partition between the work done by the micro or DSP and the work > done by the FPGA is usually best made giving the micro the more > complex algorithmic jobs and giving the FPGA the compute-intensive, > but algorithmically simple, repetitive, flow-through tasks. > > Chess programs have to: > Generate a tree of all possible moves from the current position for a > depth of a few generations, the more the better. > > Prune that tree so that stupid moves are not investigated, giving time > for more interesting moves to be probed to more generations. > > As each new possible future board position is postulated it must be > evaluated. > > It seems that one first task the FPGA could do is to evaluate a board > position and return its merit. > > If that's not enough work, perhaps the FPGA could also generate a list > of every possible next half move and return that list. > > The micro would be used for the more complex task of pruning the tree > and sending the next position to be evaluated to the FPGA(s). The > micro would need access to big memory. The FPGA would just run a > subroutine, without need to reference the history or progress of the > overall algorithm.
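Dave Decker's partition above hands the FPGA the "evaluate a board position and return its merit" task. As a rough illustration of why that task suits hardware (the board encoding and the idea of a plain material count are invented here, not taken from the thread), consider a minimal evaluator: a CPU must walk the 64 squares serially, while FPGA logic could compute every square's contribution simultaneously and sum them in an adder tree in a clock cycle or two.

```c
/* Toy material evaluator, sketched as software. Each square holds a
 * signed centipawn value (+100 = white pawn, -900 = black queen,
 * 0 = empty) -- a hypothetical encoding for illustration only.
 * A CPU iterates over the squares one at a time; equivalent FPGA
 * logic would form all 64 contributions in parallel and combine
 * them in an adder tree. */
int evaluate_material(const int board[64])
{
    int score = 0;
    for (int sq = 0; sq < 64; sq++)  /* serial on a CPU, parallel in logic */
        score += board[sq];
    return score;                    /* positive favors White */
}
```

Real evaluators weigh mobility, king safety, pawn structure, and so on, but each of those is similarly a fixed, repetitive, flow-through computation over the board — exactly the profile Dave describes for the FPGA side of the partition.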
Article: 19416
John L. Smith <jsmith@visicom.com> wrote in message news:385EC613.57BAFBF0@visicom.com... > Seems to me that if chess is implementable in FPGA, place & route > ought to be accelerable too. It is. Some models of those big ASIC emulation boxes made by Ikos and QuickTurn use Xilinx FPGAs, and they're running the same old PAR that you and I are... just on racks of PCs simultaneously to speed up the process. I don't know just how many are run in parallel, however. > When do we get FPGA-accelerated place/route, to reduce our P/R times > from hours to minutes? It's only a factor of 60 acceleration we're > looking for. 64 PCs (which wouldn't be 60x, but...) is certainly under $64K... how much money do you have? :-)
Article: 19417
I agree with Mark here that you should go after MakeSrc. It's a simple, clean little program that'll generate C source for you. On our last project, we used it until we ran out of memory and had to stuff the bitstream into Flash. In any case, it's configurable enough so that you can have it add lines before and after the bulk of the FPGA bitstream -- this is useful for adding a line that defines a variable telling your programming subroutine the size of the bitstream. (I.e., you don't want to have a header file that says #define BitStreamLength=xxxx -- make the compiler do the work for you, something like const int SizeOfFPGABitStream=sizeof(TheBigHonkingFPGABitStreamArray)) Another thing to check when you do this -- make sure the definition of the datastream itself has a 'const' qualifier in front of it! (E.g., const unsigned myBitstream[]={ ...} ). If you don't do this, and if your linker thinks it's targeting a ROM, it'll happily reserve the same amount of space out of your RAM and the C environment startup code will copy it from 'ROM' to RAM. Uggh. > We created a (simple and dirty) batch file to do all the work. ...or coerce your makefile into doing this. ---Joel Kolstad
Article: 19418
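The const-qualified bitstream table described above can be sketched as follows. The array contents here are placeholders, not a real bitstream; the point is the pattern — `const` keeps the data in ROM/Flash, and `sizeof` does the length bookkeeping instead of a hand-maintained `#define`.

```c
/* Embedded-bitstream pattern: declare the configuration data const
 * so the linker can leave it in ROM/Flash rather than reserving RAM
 * and copying it at startup, and let the compiler compute its
 * length. The bytes below are dummies standing in for the
 * tool-generated array (e.g. MakeSrc output). */
static const unsigned char fpga_bitstream[] = {
    0xFF, 0xFF, 0xAA, 0x99, 0x55, 0x66,  /* ...real data would follow... */
};

/* Length derived from the array itself -- never edited by hand. */
const unsigned int fpga_bitstream_len = sizeof(fpga_bitstream);
```

A download routine then just shifts `fpga_bitstream_len` bytes out of `fpga_bitstream`; regenerating the array never requires touching a separate length constant.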
Hi Thomas We have never had to alter any lines inside the Jam source file and have always been able to use the .jam file produced by Max+Plus II. The following is the DOS command line we use to program a FLEX 10K30A as part of a three-device JTAG chain: jam -v -dDO_CONFIGURE=1 -p378 cpld_top.jam We are using Jam.exe ver 1.2 and Max+Plus II 9.3 and have a ByteBlasterMV connected to a standard PC printer port (LPT1 at 378h) via a 2m long D25M-D25F extension cable. There are two versions of jam.exe: 16-bit DOS and Win95/WinNT. Have you tried each version? Can't think of anything else to try - hope it works out for you! Regards, Michael Thomas Bornhaupt wrote: > Hi Michael, > > thank you for your tips. But it does not work. > > It seems to me that MAX+plus (9.3) generates wrong JAM or JBC files. > > I tested Jam.EXE 1.2 with the -dDO_CONFIGURE. But the chip is not > programmed. > > Inside the JAM file (Language 1.1) I found this line > > BOOLEAN DO_CONFIGURE = 0; > > So I set it to > > BOOLEAN DO_CONFIGURE = 1; > > Starting JAM.EXE I got a syntax error in line 440! > > I also tested JAM.EXE 2.2. Here you have the option -aCONFIGURE. This is the > action out of the JAM file (STAPL format): > > ACTION CONFIGURE = PR_INIT_CONFIGURE, PR_EXECUTE; > > And now I got an exception. The DOS box went directly away and a pure > DOS machine hung up with an EMM386 error. > > Regards > Thomas Bornhaupt
Article: 19419
That pretty much covers it. Add don't use comb. logic for intentional delays. Jonas Thor wrote: > >can you explain what are the "dirty asynchronous tricks" to avoid, please? > > Yep, I also would like to know what you shouldn't do. I have been > taught to keep everything synchronous. Some stuff I've learned is: > > - Never source an asynch reset or clock from comb. logic > - No comb. feedback loops > - Synch your asynch inputs carefully (if needed) > - Use one global clock (Be aware of clock skew and short paths) > - Use one global reset, but take careful care of your FSM reset > - Keep everything synchronous! > > Will this avoid "dirty asynchronous tricks" or what are those tricks? > > Merry X-Mas from an inexperienced engineer! -- -Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email randraka@ids.net http://users.ids.net/~randraka
Article: 19420
I have heard of a limited amount of work using FPGAs to speed up PAR for FPGAs. For some reason, there doesn't seem to be much mainstream interest. Perhaps it is because they (whoever that is) assume the average FPGA user who only does a design or so a year isn't going to spring several grand for an accelerator board to save what amounts to a very small amount of time. John L. Smith wrote: > Seems to me that if chess is implementable in FPGA, place & route > ought to be accelerable too. And that place/route should have a > higher priority. I'd rather play chess against people. > > When do we get FPGA-accelerated place/route, to reduce our P/R times > from hours to minutes? It's only a factor of 60 acceleration we're > looking for. > > Take the discussion below, and replace the words "moves" or "board > positions" with the word "placements" or "routings" where appropriate. > > Dave Decker wrote: > > "Simon Bacon" > <simon@tile.demon.co.uk.notreally> wrote: > > >Could you post a few examples of the sort of primitives you > > >would like to see a Chess Machine execute. > > The partition between the work done by the micro or DSP and the work > > done by the FPGA is usually best made giving the micro the more > > complex algorithmic jobs and giving the FPGA the compute-intensive, > > but algorithmically simple, repetitive, flow-through tasks. > > > > Chess programs have to: > > Generate a tree of all possible moves from the current position for a > > depth of a few generations, the more the better. > > > > Prune that tree so that stupid moves are not investigated, giving time > > for more interesting moves to be probed to more generations. > > > > As each new possible future board position is postulated it must be > > evaluated. > > > > It seems that one first task the FPGA could do is to evaluate a board > > position and return its merit.
> > > > If that's not enough work, perhaps the FPGA could also generate a list > > of every possible next half move and return that list. > > > > The micro would be used for the more complex task of pruning the tree > > and sending the next position to be evaluated to the FPGA(s). The > > micro would need access to big memory. The FPGA would just run a > > subroutine, without need to reference the history or progress of the > > overall algorithm. > > John L. Smith <jsmith@visicom.com> > Principal Engineer > Visicom Imaging Products > http://www.visicom.com/products/Vigra/index.html -- -Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email randraka@ids.net http://users.ids.net/~randraka
Article: 19421
perhaps another one ... no self-clearing structures (f-f output to its clear). find another way to make that pulse! i've seen this too many times! ---------------------------------------------------------------------- rk The world of space holds vast promise stellar engineering, ltd. for the service of man, and it is a stellare@erols.com.NOSPAM world we have only begun to explore. Hi-Rel Digital Systems Design -- James E. Webb, 1968 Ray Andraka wrote: > That pretty much covers it. Add don't use comb. logic for intentional delays. > > Jonas Thor wrote: > > > >can you explain what are the "dirty asynchronous tricks" to avoid, please? > > > > Yep, I also would like to know what you shouldn't do. I have been > > taught to keep everything synchronous. Some stuff I've learned is: > > > > - Never source an asynch reset or clock from comb. logic > > - No comb. feedback loops > > - Synch your asynch inputs carefully (if needed) > > - Use one global clock (Be aware of clock skew and short paths) > > - Use one global reset, but take careful care of your FSM reset > > - Keep everything synchronous! > > > > Will this avoid "dirty asynchronous tricks" or what are those tricks? > > > > Merry X-Mas from an inexperienced engineer! > > -- > -Ray Andraka, P.E. > President, the Andraka Consulting Group, Inc. > 401/884-7930 Fax 401/884-7950 > email randraka@ids.net > http://users.ids.net/~randraka
Article: 19422
Hi, Can anyone share useful links or know-how on how to build an automated testbench? Generally, I am interested in what issues are involved and how to get around them. I am also interested in how to find a good text-handling package to read the stimulus vectors from a file, cycle n number of cycles, and output the result to a file for verification, beyond the limited packages std.textio.all and ieee.std_logic_textio.all. Thanks yeo
Article: 19423
> no self-clearing structures (f-f output to its clear). find another way to make > that pulse! i've seen this too many times! I'd go much farther than that. Any pulse that isn't a multiple of the clock period (made by clocking a FF in a FSM) is asking for trouble. How about a list of reasonable ways to make a shorter pulse? (and/or things to keep in mind when you do) I think one of the Xilinx app notes mentions at least one. It may be a very old one. -- These are my opinions, not necessarily my employer's.
Article: 19424
Hal Murray wrote: > > no self-clearing structures (f-f output to its clear). find another way to make > > that pulse! i've seen this too many times! > > I'd go much farther than that. Any pulse that isn't a multiple > of the clock period (made by clocking a FF in a FSM) is asking > for trouble. one circuit i ran into that comes to mind, one of the most memorable ones, was when there were two flip-flops, their outputs were NANDed, and the output of the NAND was hooked up to the clears of both flip-flops and the NAND was an output of that sub-circuit. it's nice to always be able to make a pulse an integral number of clock ticks ... however, system requirements and limitations do not always make that practical. the nice thing about making everything go off one edge of the clock, a frequent recommendation, is that it makes the static timing analysis trivial. same with the other rules ... nice to follow but can't all the time. for example, sometimes i just run out of low-skew clocks ... you can design reliably with high-skew clocks, there are a number of ways to do it, but they aren't pretty and take some care. another example is designing for very low power. ---------------------------------------------------------------------- rk The world of space holds vast promise stellar engineering, ltd. for the service of man, and it is a stellare@erols.com.NOSPAM world we have only begun to explore. Hi-Rel Digital Systems Design -- James E. Webb, 1968
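The thread above asks for reasonable synchronous ways to make a pulse but never names one. One common technique — offered here as an assumption, not as anything the posters endorsed — is a registered edge detector: one flip-flop delays the input by a clock, and an AND gate fires for exactly one clock period on a rising edge, replacing the self-clearing flip-flop hack. A C register-transfer model of that circuit, for illustration only (the real thing is a single FF plus one AND gate):

```c
/* Register-transfer model of a synchronous rising-edge detector:
 * each clock, pulse = in AND (NOT prev), then prev <- in.
 * The pulse output is high for exactly one clock period after a
 * 0 -> 1 transition on the (already synchronized) input. */
typedef struct {
    int prev;   /* state of the delay flip-flop */
} edge_det;

int edge_det_clock(edge_det *d, int in)
{
    int pulse = in && !d->prev;  /* high only on the cycle in first goes high */
    d->prev = in;                /* flip-flop captures the input each clock */
    return pulse;
}
```

Because the pulse is an integral number of clock ticks, static timing analysis covers it completely — no race through a clear pin, no dependence on gate delays.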