Theo Markettos <theom+news@chiark.greenend.org.uk> wrote:

(previously snipped project suggestions)

> The trouble with all these projects is they're something a GPU could do with
> much less programming effort (at least to make it work non-optimally). So
> I'm not sure the advantage of using an FPGA. In an FPGA it's a lot harder
> to change the architecture if the problem changes (at least if you're
> writing in Verilog/VHDL it is).

If you can do them in fixed point, you can make really big arrays to
process really big data sets, though not so cheap. There are people who
need that, but not so many of them. Learning how to do it isn't bad, though.

(snip)

> One thing FPGAs are good at I/O. So a nice example is video processing -
> you take in video from a camera, do something clever to it, and output to a
> display. There's a lot of data so you have to process it fast, and it's a
> nice visual demo. It's also easy to debug - you can see what's going wrong
> on the screen.

Well, many filtering algorithms can be implemented as systolic arrays,
which allow for minimal I/O for the processing done. Implementing an FIR
filter in fixed point in an FPGA would be a reasonable sized project.
Again, learn about systolic arrays.

> Likewise other kinds of non-optical data (eg scan data from a 2D sensor of
> some kind). You can also use audio or other sensors, as long as you have a
> useful output.

--
glen

Article: 157126
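[Editorial note: the systolic-array FIR suggestion above can be prototyped in software before committing to HDL. Below is a minimal sketch, in Python rather than Verilog, of a transposed-form FIR modeled as a chain of multiply-accumulate cells; the function name and structure are illustrative only, not from the thread.]

```python
# Software model of a transposed (systolic-friendly) FIR filter.
# Each list element stands for one hardware cell's pipeline register;
# per "clock" every cell does one multiply-accumulate, so the critical
# path stays one MAC regardless of filter length.
def systolic_fir(samples, coeffs):
    regs = [0] * len(coeffs)
    out = []
    for x in samples:
        # Cell i: coeffs[i]*x plus the registered partial sum of cell i+1.
        regs = [coeffs[i] * x + (regs[i + 1] if i + 1 < len(regs) else 0)
                for i in range(len(regs))]
        out.append(regs[0])
    return out

# Impulse response recovers the coefficients, a handy sanity check
# before writing the HDL version:
# systolic_fir([1, 0, 0, 0], [1, 2, 3]) → [1, 2, 3, 0]
```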
On 10/14/2014 8:17 PM, Theo Markettos wrote:
> rickman <gnuarm@gmail.com> wrote:
>> Oceanic modeling is a huge area. You might want to narrow the focus on
>> that one a *lot* more before you try to narrow your list... or just
>> remove it.
>
> The trouble with all these projects is they're something a GPU could do with
> much less programming effort (at least to make it work non-optimally). So
> I'm not sure the advantage of using an FPGA. In an FPGA it's a lot harder
> to change the architecture if the problem changes (at least if you're
> writing in Verilog/VHDL it is).

Why is an HDL harder to change than any other code? I use the same editor
for both...

>> An area I find interesting is low power processing. You might consider
>> what it takes to do something with a minimum of power consumption using
>> off the shelf devices. There are a lot of potential applications there.
>
> One thing FPGAs are good at I/O. So a nice example is video processing -
> you take in video from a camera, do something clever to it, and output to a
> display. There's a lot of data so you have to process it fast, and it's a
> nice visual demo. It's also easy to debug - you can see what's going wrong
> on the screen.

That is very true.

> Likewise other kinds of non-optical data (eg scan data from a 2D sensor of
> some kind). You can also use audio or other sensors, as long as you have a
> useful output.

I/O is a big plus for an FPGA. But I think the OP wants something that
deals with some current major problem. I wonder what medical app might be
suitable for an FPGA. Something that uses an array of sensors to measure
body contour or pressure maybe, like a footstep?

--
Rick

Article: 157127
rickman <gnuarm@gmail.com> wrote:
> On 10/14/2014 8:17 PM, Theo Markettos wrote:
> > rickman <gnuarm@gmail.com> wrote:
> >> Oceanic modeling is a huge area. You might want to narrow the focus on
> >> that one a *lot* more before you try to narrow your list... or just
> >> remove it.
> >
> > The trouble with all these projects is they're something a GPU could do
> > with much less programming effort (at least to make it work
> > non-optimally). So I'm not sure the advantage of using an FPGA. In an
> > FPGA it's a lot harder to change the architecture if the problem changes
> > (at least if you're writing in Verilog/VHDL it is).
>
> Why is an HDL harder to change than any other code? I use the same
> editor for both...

Changing small-scale stuff is easy, in any language. Re-architecting the
problem is harder. In Verilog/VHDL it's hard because you have to rewrite
all the control logic as well as reorganise the datapath.

Let's say you built a single-issue CPU and you want to convert it to
superscalar. Not only do you need to rewrite the datapath (not trivial)
you have to manage all the enable signals on the pipeline stages and the
state machine about when each stage fires. If you get one of those
interlocks wrong you get subtle bugs. If you change something, you may get
a different set of subtle bugs. Rinse and repeat.

If you're writing code on a GPU you're writing in a much higher level
language: the API doesn't even know how many cores you have (it'll depend
what model of GPU your machine has) - you just give it the work to do and
it'll partition it up amongst the cores.

While there are many subtleties about writing efficient GPU code (you need
to know a lot about the underlying architecture to achieve good
performance), it's relatively simple to write bad GPU code that works, and
then you can refine it later. Bad HDL tends not to work. Not working means
staring at simulator traces, which is not a pleasant experience. Or it
works in the simulator but not on the board, because the language (I'm
looking at Verilog particularly) isn't sufficiently strict about what the
expected behaviour should be (and then you get to stare at
ChipScope/SignalTap traces, an even less pleasant experience).

> I/O is a big plus for an FPGA. But I think the OP wants something that
> deals with some current major problem.

The issue for compute problems is always going to be that the FPGA at, say,
200MHz and one stick of DDR3 RAM is up against the multi-GHz GPU with
thousands of threads, GDDR5 memory, and so on. There are applications that
don't suit GPUs, but unless you have a good architectural reason why it
won't work I'd say in most cases you're better off starting with a GPU.

However, this is putting the cart before the horse. If you stare at your
algorithm for long enough, with the FPGA or GPU architecture in mind, you
can probably make significant performance increase by refactoring the task
before writing a line of code.

I realise this is a student project so 'doing something with an FPGA' might
be more of a goal than 'making X go faster', but we tend to see a lot of
papers which go like this:

1. Built a Matlab/Java/Python simulator that ran at speed X
2. Built an FPGA system that runs at speed 100X
3. Profit!^H^H^H^H Publish!

When the intermediate steps might be

1b. Refactored algorithm (with caches, memory bandwidth, etc in mind)
1c. Built a multithreaded C/C++ simulator that runs at speed 70X on the
    same hardware as Matlab result
1d. Run that on a proper server, not their 5 year old laptop

at which point why bother with this FPGA stuff?

Theo

Article: 157128
Theo Markettos <theom+news@chiark.greenend.org.uk> wrote:
> rickman <gnuarm@gmail.com> wrote:
>> On 10/14/2014 8:17 PM, Theo Markettos wrote:

(snip)

>> > The trouble with all these projects is they're something a GPU could do
>> > with much less programming effort (at least to make it work
>> > non-optimally). So I'm not sure the advantage of using an FPGA. In an
>> > FPGA it's a lot harder to change the architecture if the problem changes
>> > (at least if you're writing in Verilog/VHDL it is).

For those problems, you should use a GPU. There are some problems where an
FPGA is a good solution, though. First, they pretty much have to be able
to be done in fixed point. Next, they have to be done on a really huge
scale. If all the arithmetic operations are fixed point add, subtract, and
compare, you can do a really huge number of them in an array of FPGAs.

>> Why is an HDL harder to change than any other code? I use the same
>> editor for both...

> Changing small-scale stuff is easy, in any language. Re-architecting the
> problem is harder. In Verilog/VHDL it's hard because you have to rewrite
> all the control logic as well as reorganise the datapath.

Linear systolic arrays are pretty easy to change. It is a linear array of
relatively simple (but in any case, a module in the appropriate HDL) cells.
You can put more or less in a single FPGA, and make a linear array of such
FPGAs when needed.

> Let's say you built a single-issue CPU and you want to convert it to
> superscalar. Not only do you need to rewrite the datapath (not
> trivial) you have to manage all the enable signals on the pipeline
> stages and the state machine about when each stage fires.
> If you get one of those interlocks wrong you get subtle bugs.
> If you change something, you may get a different set of subtle bugs.
> Rinse and repeat.

In that case, no, don't use an FPGA.

> If you're writing code on a GPU you're writing in a much higher level
> language: the API doesn't even know how many cores you have
> (it'll depend what model of GPU your machine has) - you just give
> it the work to do and it'll partition it up amongst the cores.

I am not so sure what is now being done with large arrays of FPGAs (not
clusters of PCs with a few GPUs in them). If it needs floating point, and
not all problems that are commonly done in floating point should be, then a
GPU might be a better choice.

(snip)

>> I/O is a big plus for an FPGA. But I think the OP wants something that
>> deals with some current major problem.

> The issue for compute problems is always going to be that the FPGA at, say,
> 200MHz and one stick of DDR3 RAM is up against the multi-GHz GPU with
> thousands of threads, GDDR5 memory, and so on. There are applications that
> don't suit GPUs, but unless you have a good architectural reason why it
> won't work I'd say in most cases you're better off starting with a GPU.

I have written verilog that could do 1e19 operations, which are 5 bit
add/subtract/compare, per day. There is an actual problem that can use
that much computation. How many GPUs does it take to do 1e19 arithmetic
operations per day?

> However, this is putting the cart before the horse. If you stare at your
> algorithm for long enough, with the FPGA or GPU architecture in mind, you
> can probably make significant performance increase by refactoring the task
> before writing a line of code.

> I realise this is a student project so 'doing something with an FPGA' might
> be more of a goal than 'making X go faster', but we tend to see a lot of
> papers which go like this:

> 1. Built a Matlab/Java/Python simulator that ran at speed X
> 2. Built an FPGA system that runs at speed 100X
> 3. Profit!^H^H^H^H Publish!

Well, at this point he only needs to show that it could be done. That is,
proof of concept. Only when someone puts up the money does he have to show
that it can scale.

> When the intermediate steps might be
> 1b. Refactored algorithm (with caches, memory bandwidth, etc in mind)
> 1c. Built a multithreaded C/C++ simulator that runs at speed 70X
>     on the same hardware as Matlab result
> 1d. Run that on a proper server, not their 5 year old laptop

> at which point why bother with this FPGA stuff?

For some actual examples of FPGA based hardware processors see:

http://www.timelogic.com/catalog/775

--
glen

Article: 157129
On Tuesday, 14 October 2014 05:09:49 UTC+13, awais...@namal.edu.pk wrote:
> I am student of Bachelors and going to start my FYP in some days. I am
> going into the field of high computation in verilog.
>
> Any other projects you might suggest that may be beneficial for me.
>
> Thanks!

It isn't so much computation, and I know nothing about the bioinformatics
field, but some sort of DNA pattern matching algo always struck me as being
an interesting area to explore. The data objects are small, the data set
sizes are large, and the parallel nature of FPGAs and internal memory
bandwidth can be exploited. A processor can compare a couple symbols per
cycle, a GPU might do a few 100 or a thousand symbols per cycle. A low end
FPGA could do a few thousand per cycle.

Is it best to have 'n' tiny little state machines, each detecting one of
'n' patterns, or do you timeslice 'x' state machines, each looking for x/n
patterns? Is it best to look at data in big gulps, or one symbol at a time?

How is the best way to look for patterns? A giant grep-like FSM, or
multiple smaller FSMs? Do you spread the FSMs into a pipeline (each stage
feeding onto the next) or do you use local feedback? Can FSMs be
partitioned to maximise efficiency? Can you leverage the underlying FPGA
architecture to your advantage (e.g. cascades in DSP blocks, coupling
between BRAM blocks)?

I like this idea because the FPGA side FSMs are relatively simple, and most
of the technology is in how you generate the tables that allow you to
search quickly and efficiently.

It could also easily implement pattern matches that are tricky to do in
S/W.

Gosh Darn - looks like somebody has been there before (not that I've
actually read the papers...)

http://www.ipcsit.com/vol2/38-A313.pdf
http://ieee-hpec.org/2012/index_htm_files/Fernandez.pdf

I also thought that some sort of particle simulation (e.g. Photon Mapping)
would be interesting to explore, but never had the time.

Mike

Article: 157130
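[Editorial note: as a software warm-up for the FSM questions in the post above, the classic bit-parallel Shift-And matcher packs one pattern's entire state into a single word, which is close to how one of Mike's "'n' tiny little state machines" would look as a row of FPGA registers. A Python sketch follows; the function name is illustrative and this is the textbook algorithm, not code from the papers linked above.]

```python
# Bit-parallel Shift-And pattern matcher over the DNA alphabet.
# Bit i of `state` is set when pattern[0..i] matches the text ending at
# the current symbol -- effectively a one-hot FSM state vector that an
# FPGA implementation would hold in registers, one shift-AND per clock.
def shift_and_search(text, pattern):
    m = len(pattern)
    mask = {c: 0 for c in "ACGT"}
    for i, c in enumerate(pattern):
        mask[c] |= 1 << i          # precomputed per-symbol bit masks
    state, hits = 0, []
    for pos, c in enumerate(text):
        state = ((state << 1) | 1) & mask[c]
        if state & (1 << (m - 1)):
            hits.append(pos - m + 1)   # start index of each match
    return hits

# shift_and_search("ACGTACGT", "ACG") → [0, 4]
```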
On 10/15/2014 7:57 PM, Theo Markettos wrote:
> rickman <gnuarm@gmail.com> wrote:
>> On 10/14/2014 8:17 PM, Theo Markettos wrote:
>>> rickman <gnuarm@gmail.com> wrote:
>>>> Oceanic modeling is a huge area. You might want to narrow the focus on
>>>> that one a *lot* more before you try to narrow your list... or just
>>>> remove it.
>>>
>>> The trouble with all these projects is they're something a GPU could do
>>> with much less programming effort (at least to make it work
>>> non-optimally). So I'm not sure the advantage of using an FPGA. In an
>>> FPGA it's a lot harder to change the architecture if the problem changes
>>> (at least if you're writing in Verilog/VHDL it is).
>>
>> Why is an HDL harder to change than any other code? I use the same
>> editor for both...
>
> Changing small-scale stuff is easy, in any language. Re-architecting the
> problem is harder. In Verilog/VHDL it's hard because you have to rewrite
> all the control logic as well as reorganise the datapath.
>
> Let's say you built a single-issue CPU and you want to convert it to
> superscalar. Not only do you need to rewrite the datapath (not trivial) you
> have to manage all the enable signals on the pipeline stages and the state
> machine about when each stage fires. If you get one of those interlocks
> wrong you get subtle bugs. If you change something, you may get a different
> set of subtle bugs. Rinse and repeat.
>
> If you're writing code on a GPU you're writing in a much higher level
> language: the API doesn't even know how many cores you have (it'll depend
> what model of GPU your machine has) - you just give it the work to do and
> it'll partition it up amongst the cores.

I don't follow. How is rearchitecting a CPU in an FPGA anything like
changing GPU code? Your explanation makes no sense. I think you are
working from a huge lack of knowledge of HDLs.

> While there are many subtleties about writing efficient GPU code (you need
> to know a lot about the underlying architecture to achieve good
> performance), it's relatively simple to write bad GPU code that works, and
> then you can refine it later. Bad HDL tends not to work. Not working means
> staring at simulator traces, which is not a pleasant experience. Or it
> works in the simulator but not on the board, because the language (I'm
> looking at Verilog particularly) isn't sufficiently strict about what the
> expected behaviour should be (and then you get to stare at
> ChipScope/SignalTap traces, an even less pleasant experience).
>
>> I/O is a big plus for an FPGA. But I think the OP wants something that
>> deals with some current major problem.
>
> The issue for compute problems is always going to be that the FPGA at, say,
> 200MHz and one stick of DDR3 RAM is up against the multi-GHz GPU with
> thousands of threads, GDDR5 memory, and so on. There are applications that
> don't suit GPUs, but unless you have a good architectural reason why it
> won't work I'd say in most cases you're better off starting with a GPU.

The CPU has only a handful of ALUs to perform useful calculations on; the
FPGA is limited only by its size. The clock speed is swamped by the sheer
number of computations that can happen in parallel.

The GPU has lots of ALUs, but is limited in how they are used. It is
*nothing* like having 1000 separate processors. So it can only be useful
on certain types of problems.

The FPGA gets around all of these issues and can be configured on the fly.

> However, this is putting the cart before the horse. If you stare at your
> algorithm for long enough, with the FPGA or GPU architecture in mind, you
> can probably make significant performance increase by refactoring the task
> before writing a line of code.
>
> I realise this is a student project so 'doing something with an FPGA' might
> be more of a goal than 'making X go faster', but we tend to see a lot of
> papers which go like this:
>
> 1. Built a Matlab/Java/Python simulator that ran at speed X
> 2. Built an FPGA system that runs at speed 100X
> 3. Profit!^H^H^H^H Publish!
>
> When the intermediate steps might be
> 1b. Refactored algorithm (with caches, memory bandwidth, etc in mind)
> 1c. Built a multithreaded C/C++ simulator that runs at speed 70X on the
>     same hardware as Matlab result
> 1d. Run that on a proper server, not their 5 year old laptop
>
> at which point why bother with this FPGA stuff?

I gave one reason which you didn't respond to.

--
Rick

Article: 157131
Mike Field <mikefield1969@gmail.com> wrote:

(snip)

> It isn't so much computation, and I know nothing about the
> bioinformatics field, but some sort of DNA pattern matching algo
> always struck me as being an interesting area to explore.
> The data objects are small, the data set sizes are large,
> and the parallel nature of FPGAs and internal memory bandwidth
> can be exploited. A processor can compare a couple symbols per
> cycle, a GPU might do a few 100 or a thousand symbols per cycle.
> A low end FPGA could do a few thousand per cycle.

I think that is about right. And run at about 200MHz, maybe 300MHz.
FPGAs have the registers built in, so you just have to be sure
to use enough of them.

> Is it best to have 'n' tiny little state machines, each detecting
> one of 'n' patterns, or do you timeslice 'x' state machines,
> each looking for x/n patterns? Is it best to look at data in
> big gulps, or one symbol at a time?

https://en.wikipedia.org/wiki/Dynamic_programming#Sequence_alignment

> How is the best way to look for patterns? A giant grep-like FSM,
> or multiple smaller FSMs? Do you spread the FSMs into a pipeline
> (each stage feeding onto the next) or do you use local feedback?

The latter is probably the right description.

The idea of dynamic programming is that if you make the optimal
decision at each point, you find the globally optimal solution.
Conveniently, systolic arrays are convenient for evaluating dynamic
programming algorithms, and also nice and efficient to implement
in FPGAs. (Or ASICs, sometimes.)

> Can FSMs be partitioned to maximise efficiency?
> Can you leverage the underlying FPGA architecture to your
> advantage (e.g. cascades in DSP blocks, coupling
> between BRAM blocks).

--
glen

Article: 157132
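[Editorial note: the dynamic-programming recurrence glen points to can be written down compactly. Below is a hedged Python sketch of Smith-Waterman local alignment with unit scores, computed one row at a time; a linear systolic array would evaluate the same cells along anti-diagonals, one reference base per hardware cell. Function name and default scores are illustrative.]

```python
# Smith-Waterman local alignment, unit scoring (+1 match, -1 otherwise).
# prev/cur hold two rows of the DP matrix; column j corresponds to the
# systolic cell holding reference base ref[j-1].
def smith_waterman(ref, read, match=1, mismatch=-1, gap=-1):
    prev = [0] * (len(ref) + 1)   # previous row of the DP matrix
    best = 0
    for r in read:
        cur = [0]
        for j, c in enumerate(ref, start=1):
            score = max(0,        # local alignment: never go negative
                        prev[j - 1] + (match if r == c else mismatch),
                        prev[j] + gap,       # deletion in the read
                        cur[j - 1] + gap)    # insertion in the read
            cur.append(score)
            best = max(best, score)
        prev = cur
    # Highest-scoring local match; for 125 bp reads with +1 per match
    # this can never exceed 125, hence glen's 7-bit cells.
    return best
```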
On 10/16/2014 7:55 AM, glen herrmannsfeldt wrote:
> Mike Field <mikefield1969@gmail.com> wrote:
>
> (snip)
>
>> It isn't so much computation, and I know nothing about the
>> bioinformatics field, but some sort of DNA pattern matching algo
>> always struck me as being an interesting area to explore.
>> The data objects are small, the data set sizes are large,
>> and the parallel nature of FPGAs and internal memory bandwidth
>> can be exploited. A processor can compare a couple symbols per
>> cycle, a GPU might do a few 100 or a thousand symbols per cycle.
>> A low end FPGA could do a few thousand per cycle.
>
> I think that is about right. And run at about 200MHz, maybe 300MHz.
> FPGAs have the registers built in, so you just have to be sure
> to use enough of them.

I think they have dealt with that one pretty well. After all, they have
sequenced the human genome.

>> Is it best to have 'n' tiny little state machines, each detecting
>> one of 'n' patterns, or do you timeslice 'x' state machines,
>> each looking for x/n patterns? Is it best to look at data in
>> big gulps, or one symbol at a time?
>
> https://en.wikipedia.org/wiki/Dynamic_programming#Sequence_alignment
>
>> How is the best way to look for patterns? A giant grep-like FSM,
>> or multiple smaller FSMs? Do you spread the FSMs into a pipeline
>> (each stage feeding onto the next) or do you use local feedback?
>
> The latter is probably the right description.
>
> The idea of dynamic programming is that if you make the optimal
> decision at each point, you find the globally optimal solution.

This is not a global truth. It assumes the path to the optimal solution
is monotonic, which it may not be.

--
Rick

Article: 157133
On Friday, 17 October 2014 05:17:04 UTC+13, rickman wrote:
>
> I think they have dealt with that one pretty well. After all, they have
> sequenced the human genome.
>

Sure, but in this age of "big data" how quickly can you query 1000s of
genomes, each approximately 4 billion symbols in size, looking for somewhat
fuzzy matches?

Oldish (2006) papers talk about a 2GHz Xeon processing 32M characters per
second, and a 16 CPU Power system processing 1.2G symbols per second. This
is obviously bound by CPU cycles and not memory or I/O bandwidth.

FPGA hardware has come a long way in that time, as have memory capacity,
memory bandwidths and I/O subsystems. Standard CPUs haven't progressed at
such a dramatic pace, just adding more cores. I am pretty sure that
revisiting it with an FPGA board that could hold the entire genome in
memory, at full memory bandwidth speed (a 16x 333 MHz DDR memory can
deliver 1.2G symbols per second), should be able to get close to this on a
tiny power budget.

If the process isn't limited by I/O bandwidth, then it isn't running fast
enough :-)

Maybe it could be implemented on one of those ARM/FPGA hybrid chips, with
the fabric having a larger private memory to hold the genome data, and the
ARM just performing command and control... it would avoid a lot of the
complexity of high speed I/O.

Article: 157134
rickman <gnuarm@gmail.com> wrote:

(snip)

>> I think that is about right. And run at about 200MHz, maybe 300MHz.
>> FPGAs have the registers built in, so you just have to be sure
>> to use enough of them.

> I think they have dealt with that one pretty well. After all, they have
> sequenced the human genome.

For some actual numbers of what you can do today:

http://res.illumina.com/documents/products/datasheets/datasheet_hiseq2500.pdf

this machine can generate 4 billion reads (sequences) of 125 base pairs,
for a total of 500 Gbp in six days. You then want to compare that to the
reference (human) genome (if it is human data) of 3Gbp.

The dynamic programming algorithm gives you the score for each 125bp
fragment against the reference, including appropriate penalty (usually 1
each) for insertions, deletions, or substitutions. (The algorithm is the
same one as, or similar to, the one used by diff. The original diff got
the algorithm from one that was used for protein sequences in the 1970s.)

Since the reads are up to 125bp long, if you score +1 for a match, the
score can't go over 127, and so 7 bits is enough. It takes five to seven
add/subtract/compare operations, 7 bit fixed point, to compare each new
base against each base of the reference.

So, 5e11*3e9/6 days or 2.5e20 dynamic programming cells per day. Times 5,
so 1.25e21 7 bit add/subtract/compare per day. How fast is your GPU?

(If you want to sequence a new genome, it is done at about 10x coverage.
You randomly select 30Gbp of 125bp fragments, and hope that they cover
most of the genome to a depth of at least two. So, the above machine can
sequence about 12 humans in 6 days.)

The sequencers have gotten somewhat faster since the last time I did these
calculations. Note that for many years now, it isn't the chemistry that
limits it, but the data processing.
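[Editorial note: glen's throughput arithmetic can be reproduced directly.]

```python
# Back-of-envelope check of the numbers in the post above.
total_read_bases = 4e9 * 125      # 4 billion reads of 125 bp = 5e11 bp
ref_len = 3e9                     # human reference genome, ~3 Gbp
days = 6.0                        # one sequencer run
cells_per_day = total_read_bases * ref_len / days   # DP matrix cells/day
ops_per_day = 5 * cells_per_day   # ~5 add/subtract/compare per cell
print(f"{cells_per_day:.2e} cells/day, {ops_per_day:.2e} ops/day")
# → 2.50e+20 cells/day, 1.25e+21 ops/day
```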
>>> Is it best to have 'n' tiny little state machines, each detecting
>>> one of 'n' patterns, or do you timeslice 'x' state machines,
>>> each looking for x/n patterns? Is it best to look at data in
>>> big gulps, or one symbol at a time?

>> https://en.wikipedia.org/wiki/Dynamic_programming#Sequence_alignment

(snip)

>> The idea of dynamic programming is that if you make the optimal
>> decision at each point, you find the globally optimal solution.

> This is not a global truth. It assumes the path to the optimal
> solution is monotonic which it may not be.

People only think up algorithms that satisfy the restrictions for dynamic
programming. The one commonly used does local alignment, so finds the
highest scoring match between each input sequence and the reference,
including all combinations of insertion, deletion, or substitution.
(Think about spell checkers finding the words close to your misspelled
word.) The five operation algorithm scores pretty much the way you would
for words.

With a little more work, you can do affine gap scoring, where the penalty
for a gap has an open penalty and an extend penalty, such that longer gaps
are not proportionally penalized. You can make even more complicated gap
penalty functions.

The times are already long enough. No point in going to one that is
exponential in the length of the fragments.

--
glen

Article: 157135
On 10/16/2014 4:57 PM, Mike Field wrote:
> On Friday, 17 October 2014 05:17:04 UTC+13, rickman wrote:
>>
>> I think they have dealt with that one pretty well. After all, they have
>> sequenced the human genome.
>>
> Sure, but in this age of "big data" how quickly can you query 1000s of
> genomes, each approximately 4 billion symbols in size, looking for
> somewhat fuzzy matches?
>
> Oldish (2006) papers talk about a 2GHz Xeon processing 32M characters per
> second, and a 16 CPU Power system processing 1.2G symbols per second.
> This is obviously bound by CPU cycles and not memory or I/O bandwidth.
>
> FPGA hardware has come a long way in that time, as have memory capacity,
> memory bandwidths and I/O subsystems. Standard CPUs haven't progressed at
> such a dramatic pace, just adding more cores. I am pretty sure that
> revisiting it with an FPGA board that could hold the entire genome in
> memory, at full memory bandwidth speed (a 16x 333 MHz DDR memory can
> deliver 1.2G symbols per second), should be able to get close to this on
> a tiny power budget.
>
> If the process isn't limited by I/O bandwidth, then it isn't running fast
> enough :-)
>
> Maybe it could be implemented on one of those ARM/FPGA hybrid chips, with
> the fabric having a larger private memory to hold the genome data, and
> the ARM just performing command and control... it would avoid a lot of
> the complexity of high speed I/O.

Not sure you can hold 3 Gsymbols on chip in an FPGA. They may have memory,
but not GBs. So the ARM doing control isn't really all that helpful. It
can't even be remotely in the data path, so it doesn't need to be on chip
at all. Why waste space that can be used for more memory and logic?

The real advantage of the FPGA approach is that it can connect to multiple
memory chips and run them at max throughput. Multiple FPGAs can be used on
one board, potentially outpacing the density of PC CPUs and almost
certainly reducing the power budget. What was ALU bound in a PC will be
memory bound in an FPGA, so more memory means more processing.

--
Rick

Article: 157136
On Friday, 17 October 2014 11:08:48 UTC+13, rickman wrote:
> On 10/16/2014 4:57 PM, Mike Field wrote:
>
> The real advantage of the FPGA approach is that it can connect to
> multiple memory chips and run them at max throughput. Multiple FPGAs
> can be used on one board potentially outpacing the density of PC CPUs
> and almost certainly reducing the power budget. What was ALU bound in a
> PC will be memory bound in an FPGA, so more memory means more processing.

Fully agree, and assuming his board is something like a Digilent Atlys it
may already have perhaps 128MB of DDR on it, allowing the design to be
tested with 512 million DNA (2-bit) symbols - enough to hold a worm's
genome.

However, I guess I've dragged this discussion far away from the original
poster's question of what to do for his final year project...

Mike

Article: 157137
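[Editorial note: Mike's 128MB figure works out as follows, assuming the 2-bit symbol packing he describes.]

```python
# Quick check of the capacity claim in the post above.
ddr_bytes = 128 * 2**20            # 128 MB of DDR, as on a Digilent Atlys
symbols = ddr_bytes * 4            # four 2-bit DNA symbols per byte
# 536,870,912 = 512 * 2**20, i.e. the "512 million symbols" in the post.
print(symbols)
# → 536870912
```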
rickman <gnuarm@gmail.com> wrote:
> On 10/15/2014 7:57 PM, Theo Markettos wrote:
>
> I don't follow. How is rearchitecting a CPU in an FPGA anything like
> changing GPU code? Your explanation makes no sense. I think you are
> working from a huge lack of knowledge of HDLs.

It isn't, that's the point. /If/ you can write in a high-ish level language
like CUDA or OpenCL and achieve your goal with some commodity hardware you
can buy in every city, why would you want to use an FPGA? Why would you
want to worry about synthesis and meeting timing and state machines and
debugging with a logic analyser? Maybe your problem doesn't fit the GPU
model and is better suited to an FPGA, but you really ought to stop and
think first.

It's not the HDL as a language as such (though Verilog's lax syntax makes
it easy to introduce bugs), it's that the abstraction is not sufficiently
high to make any useful progress. If somebody wants to experiment with
architecture they should be able to do that without having to manage all
the underlying complexity, and then be able to go back later and refine
the code for performance. But this is awkward in (eg) verilog unless you
have a very good test suite - it's too easy to introduce control flow bugs.

I've read the code of a web browser written in assembler. That's an
example where the abstraction was not sufficiently high - hand management
of registers and bit twiddling of memory made it simply impossible to keep
control of the complexity. Development eventually ground to a halt because
it was simply impossible to develop. Likewise verilog gives you ultimate
control, and that's not always what you want when you're just evaluating
ideas.

All I'm saying is that verilog/VHDL are insufficiently high levels of
abstraction for architectural exploration. I'm not saying all HDLs/HLS are
bad, just that you need to pick the right language.

> The CPU has only a handful of ALUs to perform useful calculations on,
> the FPGA is limited only by its size. The clock speed is swamped by the
> sheer number of computations that can happen in parallel.
>
> The GPU has lots of ALUs, but is limited in how they are used. It is
> *nothing* like having 1000 separate processors. So it can only be
> useful on certain types of problems.
>
> The FPGA gets around all of these issues and can be configured on the fly.

Agreed. Some problems are about relatively simple heavily-parallel
compute, and especially if they can be easily pipelined then they fit an
FPGA nicely. Likewise if they have Gbps of external I/O an FPGA will leave
a GPU standing (or if the I/O is not in a PC-friendly format).

However if they need heavy floating point, like a lot of scientific
compute, this starts eating up area rapidly. If they're memory-bound, then
you're up against the limits of DDR3, which is a lot less bandwidth than
GDDR5. Or if you want to do iterative development: many-hours FPGA
synthesis times are not conducive. Horses for courses and all that.

My point is that you should do the work on your algorithm to see how it
best fits the technologies available to you (CPU, GPU, FPGA), and then
refactor it to suit. You may get substantially more performance by
refactoring the algorithm for a given technology, rather than simply
jumping in and implementing a naive algorithm. Once you've done this, only
then implement it. But then be prepared to (repeatedly) refactor your
architecture again in the light of that experience.

> I gave one reason which you didn't respond to.

I'm not saying 'FPGA bad, GPU good', I'm saying implementing an FPGA
design for scientific compute is a lot of work. So you need to have a
clear reasoning why you're doing it. Just doing it 'to make my Matlab go
faster' is not a good enough reason, because there's a lot less painful
ways to achieve that.

Theo

Article: 157138
On Tuesday, October 14, 2014 9:05:07 AM UTC-4, Petter Gustad wrote:
> Is it possible to run a Vivado simulation in non-project mode?
>
> I can't seem to find any documentation on how to do it. ug835 describes
> which Tcl commands are used for simulation, but not which to use for
> non-project mode.
>
> //Petter
> --
> .sig removed by request.

Yes, it's fairly easy using xvhdl, xvlog, xelab and xsim as described in
UG900.  For an example, I have a SPI Master module I've set this up for.
I created a spi_master.prj file with the following contents:

vhdl work ../src/spi_master_ae.vhd
vhdl work ../tb/stdtb_pb.vhd
vhdl work ../tb/models/spi_bfm.vhd
vhdl work ../tb/spi_testbench_pb.vhd
vhdl work ../tests/spi_testcase_e.vhd
vhdl work ../tests/spi_tc_a.vhd
vhdl work ../tb/harnesses/spi_harness_pb.vhd
vhdl work ../tb/harnesses/spi_harness_tb_e.vhd
vhdl work ../tb/harnesses/spi_harness_tb_a.vhd
vhdl work ../tb/spi_tb.vhd

I then run xvhdl using the following to compile everything:

# xvhdl -prj spi_master.prj

Next, I elaborated the design using:

# xelab work.spi_tb -prj spi_master.prj -debug all

Finally, I kick off the simulation using the GUI with:

# xsim -g work.spi_tb

UG900 provides much greater detail on the other options for each of those
steps too.
Article: 157139
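Since the .prj file is just a list of `vhdl work <file>` lines, generating it can be scripted. A minimal sketch, using placeholder source paths (not real files); the commented lines at the end simply repeat the xvhdl/xelab/xsim steps from the post above and are not executed here:

```shell
# Build an xvhdl project file from a list of VHDL sources.
# The source paths below are hypothetical placeholders.
sources="../src/spi_master_ae.vhd ../tb/spi_tb.vhd"

: > spi_master.prj                      # truncate/create the .prj file
for f in $sources; do
    printf 'vhdl work %s\n' "$f" >> spi_master.prj
done
cat spi_master.prj

# Then, with the Vivado tools on PATH (not run here):
#   xvhdl -prj spi_master.prj
#   xelab work.spi_tb -prj spi_master.prj -debug all
#   xsim -g work.spi_tb
```

The same loop extends naturally to a `vlog`/`vhdl` mixed list if some sources are Verilog (compiled with xvlog instead).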
When I generate VHDL from Handel-C, I always end up with an empty VHDL
file.  Did anyone face this problem, and how can I solve it?
Thanks
Article: 157140
On 13/10/14 18:35, Jim Lewis wrote:
> Hi Alan,
>>> I used to, has been a fun ride. Too bad I never got the chance to follow
>>> a course of yours. It seems now they are more and more pushing for
>>> SystemVerilog, while I lately was trying to fight my way with OSVVM
>>> which I find more appropriate for that type of community.
>>
>> I'm afraid it's "the dead hand of the market"...
>
> Perhaps a shrinking market for Doulos, however, it has been a growing
> market for us (including in UK).
>
> What I do see in SystemVerilog's favor is the vendors are pushing the
> user community heavily to it since they can make more money with
> SystemVerilog licenses.
>
> OTOH, recently we have had people switching from SystemVerilog to
> VHDL/OSVVM because their projects could not afford the pricing of a
> SystemVerilog simulator when OSVVM can do the same thing.
>
> Cheers,
> Jim

I just want to clarify that I left Doulos a year ago, and I have no
knowledge of their current market.  Also I didn't say that the VHDL market
is shrinking.  What I was trying to say is that there's more demand for
SystemVerilog training than VHDL - but that doesn't mean that VHDL is
necessarily shrinking.

regards

Alan
--
Alan Fitch
Article: 157141
I'm wondering what the correct way to handle the following situation is.
Sorry this is a bit long winded.  BTW, it's not homework, all that was
40+ years ago.

I have two clocks, clk which is the FPGA clock rate, and sclk which I
create using a simple divide by n counter.  Typically, sclk is 1024 times
slower than clk.

An event occurs that sets a reg, DR, for one clk cycle.

There is a register, calcreg [7:0] which is to be incremented slowly, but
reset to zero on DR.

There are two sections, one triggered by clk and one by sclk, ie:

// Fast section
always @ (posedge clk)
begin
  ....
end

// Slow section
always @ (posedge sclk)
begin
  if (DR) calcreg <= 8'h0 ;      // Reset calcreg on DR
  else calcreg <= calcreg + 1 ;  // Else increment
end

The problem of course is that the on state of DR will almost always be
missed, it will only appear if it happens to coincide with a sclk edge
(1 / 1024).  So the above doesn't work.

So I tried modifying the clk section as follows:

always @ (posedge clk)
begin
  ....
  if (DR) calcreg <= 8'h0 ;
end

This threw up build errors,

Error (10028): Can't resolve multiple constant drivers for net
"FlashCtr[3]" at tick.v(43)

I think I see the reason, it's like trying to wire two gate outputs to the
same point, something that's obviously verboten with active drive hardware.

If someone could help with the following specific questions it would help
a lot.....

Is using the two clocks simply bad practice, ie. should everything be done
in a single always block at clk rate?

Is there a standard way to latch the DR signal when it occurs on the fast
clock, so that it will be there on the next transition of sclk, which must
then clear the DR latch?  I've tried this, and come up with the same build
error with the latch.
Article: 157142
On 17/10/2014 23:48, Ahmed Ablak wrote:
> When I generate VHDL from Handel-C. I always end up with an empty VHDL
> file, did any one face this problem?
> and how to solve it?
> Thanks

Hi Ahmed,

There is not much info to go on.  I assume you are using Handel-C because
of some existing (Celoxica) hardware?  If you are just after C synthesis
then I would suggest you look at Hercules, Xilinx HLS, Synflow etc, or
swap to a more traditional RTL language.

Do any of the demo files work?  There is a simple led toggle example in
the ./examples/pal/led directory.

Good luck,
Hans
www.ht-lab.com
Article: 157143
On 18/10/2014 00:37, Alan Fitch wrote:

Hi Alan,

..
> I just want to clarify that I left Doulos a year ago, and I have no
> knowledge of their current market. Also I didn't say that the VHDL
> market is shrinking.

I guess using "the dead hand of the market" is not the most appropriate
phrase for the leading FPGA design language.

> What I was trying to say is that there's more
> demand for SystemVerilog training than VHDL -

I am not sure that is correct; from what I understand VHDL is still the
most popular Doulos language course, and if you look at the current
schedule there are more VHDL than SystemVerilog courses.

> but that doesn't mean that
> VHDL is necessarily shrinking.

I agree with Jim that the EDA industry seems to be doing its best to make
this happen ;-)

Hope you are enjoying your new job and are allowed to use VHDL and SystemC,

Regards,
Hans.
www.ht-lab.com
Article: 157144
Hi,

I would put it all into a single clk block.  I.e.

reg [9:0] count = 0;
always @(posedge clk) begin
   count <= count + 1;
   // fast "clock" here
   if (count == 0) begin
   end
end

Depending on what you want to achieve, you could also re-synchronize the

---------------------------------------
Posted through http://www.FPGARelated.com
Article: 157145
.. you could also resynchronize the slow "clock" on the event.

Maybe the difficulty is to define how exactly the circuit is going to
behave, not so much coding it in RTL.

Multiple clocks might be used on an ASIC where it allows use of smaller
cells for the slow part.  For a simple FPGA project, my main goal would be
to keep the code as readable as possible.

---------------------------------------
Posted through http://www.FPGARelated.com
Article: 157146
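To make the single-clock suggestion concrete, here is a minimal sketch applying it to the original poster's counter. DR, calcreg and the 1024:1 ratio come from the question; the module wrapper and the `tick` name are assumptions of mine, one possible way to write it:

```verilog
// Single clock domain: the divided sclk is replaced by a 1-in-1024
// clock enable, so the one-clk-cycle DR event is never missed.
module slow_count (
    input  wire       clk,
    input  wire       DR,        // one-clk-cycle event
    output reg  [7:0] calcreg
);
    reg [9:0] count = 0;
    wire tick = (count == 10'h3FF);  // true for one clk in every 1024

    always @(posedge clk) begin
        count <= count + 1;
        if (DR)
            calcreg <= 8'h0;         // reset takes effect immediately
        else if (tick)
            calcreg <= calcreg + 1;  // slow increment
    end
endmodule
```

Because calcreg now lives in a single always block on one clock, the "multiple constant drivers" error cannot arise, and DR is sampled every clk cycle rather than every 1024th.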
On Monday, October 13, 2014 11:48:37 PM UTC+5, mnentwig wrote:
> Hi,
>
> >> verilog.
> >> These are some projects which I might be doing:
> >> spartan 6 xc6slx45 kit
> >> 1.the n-body gravitational problem
> >> 2.Oceanic modeling
> >> 3.Cancer biology modeling
>
> this is not to discourage you. But please be warned that heavy-duty FPGA
> implementations as youre planning are
>
> *much*
>
> and I mean "much" harder than it looks from all those shiny webpages that
> make it look like Lego bricks because they want to sell you stuff.
>
> Here's my proposal: Why don't you implement "Hello world!" in Morse code.
> Which is ".... . .-.. .-.. --- .-- --- .-. .-.. -.. "
> Just a blinking LED.
> Expect that i'll take between a day and two weeks. This includes things
> that "should" be easy but are not, such as installing ISE 14.7 when you've
> never done it before, making the JTAG interface work etc.
>
> In my personal opinion, the xc6slx45 is an excellent choice to get
> started.  Because
> a) it does not require a Xilinx license to program it
> b) I can get one cheaply if you ever need one, i.e. "Numato Saturn" or
> "Pipistrello" boards, for ~$130..160.
> c) If it breaks, it's no big deal, compared to a $3000+ board.
> To learn Verilog, the smallest and cheapest FPGA will do, if you decide to
> buy one for yourself. The typical feedback from the board is "this doesn't
> work - go simulate some more".
>
> Note, you said "Verilog", not using some intermediate wizardry that
> generates the code. For the latter, a sxl45 is probably too small
> (guessing, haven't done it myself).
>
> ---------------------------------------
> Posted through http://www.FPGARelated.com

I have Xilinx 14.5 fully licensed and a Spartan-6 xc6slx45 kit available in
my college lab.  And I also want a somewhat challenging project.
Article: 157147
On Monday, October 13, 2014 9:09:49 PM UTC+5, awais...@namal.edu.pk wrote:
> I am student of Bachelors and going to start my FYP in some days. I am
> going into the field of high computation in verilog. These are some
> projects which I might be doing:
>
> 1.the n-body gravitational problem
> 2.Oceanic modeling
> 3.Cancer biology modeling
>
> Any other projects you might suggest that may be beneficial for me.
> And also my main aim after Bachelors is to get admission in some US
> university.
> Thanks!

I also have this card available in my lab:
http://www.nallatech.com/PCI-Express-FPGA-Cards/pcie-385n-altera-stratix-v-fpga-computing-card.html
And I may also move to OpenCL.  Any good projects in OpenCL?
Article: 157148
On 10/18/2014 3:53 AM, "Bruce Varley" wrote:
> I'm wondering what the correct way to handle the following situation
> is. Sorry this is a bit long winded. BTW, it's not homework, all that
> was 40+ years ago.
>
> I have two clocks, clk which is the FPGA clock rate, and sclk which I
> create using a simple divide by n counter. Typically, sclk is 1024
> times slower than clk.
>
> An event occurs that sets a reg, DR, for one clk cycle.
>
> There is a register, calcreg [7:0] which is to be incremented slowly,
> but reset to zero on DR.
>
> There are two sections, one triggered by clk and one by sclk, ie:
>
> // Fast section
> always @ (posedge clk)
> begin
>   ....
> end
>
> // Slow section
> always @ (posedge sclk)
> begin
>   if (DR) calcreg <= 8'h0 ;      // Reset calcreg on DR
>   else calcreg <= calcreg + 1 ;  // Else increment
> end
>
> The problem of course is that the on state of DR will almost always be
> missed, it will only appear if it happens to coincide with a sclk edge
> (1 / 1024). So the above doesn't work.
>
> So I tried modifying the clk section as follows:
>
> always @ (posedge clk)
> begin
>   ....
>   if (DR) calcreg <= 8'h0 ;
> end
>
> This threw up build errors,
>
> Error (10028): Can't resolve multiple constant drivers for net
> "FlashCtr[3]" at tick.v(43)
>
> I think I see the reason, it's like trying to wire two gate outputs to
> the same point, something that's obviously verboten with active drive
> hardware.
>
> If someone could help with the following specific questions it would
> help a lot.....
>
> Is using the two clocks simply bad practice, ie. should everything be
> done in a single always block at clk rate?
>
> Is there a standard way to latch the DR signal when it occurs on the
> fast clock, so that it will be there on the next transition of sclk,
> which must then clear the DR latch? I've tried this, and come up with
> the same build error with the latch.

If you're really just dividing one clock to make another, then you're
probably better off using a single clock and generating a count enable
for your slow process.  On the other hand your problem of using a fast
signal to reset a slow process is also applicable to situations where
the two clocks are not related and are both necessary for the design.
In that case I would normally have an intermediate variable in the fast
clock domain that gets set by DR and cleared by a signal returned from
the slow process.  Something like:

reg DR_hold = 0;
reg DR_seen = 0;
always @ (posedge clk)
begin
  if (DR) DR_hold <= 1;
  else if (DR_resync) DR_hold <= 0;
end

always @ (posedge sclk)
begin
  DR_resync <= DR_hold;
end

Note that if you use DR_resync as the reset term, it will cause
additional latency from DR to the reset of the counter.  You could use
DR_hold instead, but then the problem is if the two clocks are really
unrelated you could miss a reset if DR_hold asserts very near the rising
edge of sclk and DR_resync catches the event but the counter (or some of
its bits) does not.

--
Gabor
Article: 157149
On 10/18/2014 10:53 AM, Gabor wrote:
> On 10/18/2014 3:53 AM, "Bruce Varley" wrote:
>> (snip question about resetting a slow-clock counter from a
>> fast-clock event)
>
> (snip count-enable suggestion and DR_hold / DR_resync example)
>
> Note that if you use DR_resync as the reset term, it will
> cause additional latency from DR to the reset of the counter.
> You could use DR_hold instead, but then the problem is if
> the two clocks are really unrelated you could miss a reset
> if DR_hold asserts very near the rising edge of sclk and
> DR_resync catches the event but the counter (or some of its
> bits) does not.

Oops, in the previous post I started with "DR_seen" but then went to
"DR_resync" for the same signal.  But you get the idea...

--
Gabor
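Putting Gabor's two posts together, a consolidated sketch of the hold-and-resynchronize scheme with the DR_seen/DR_resync naming slip fixed. Signal names are from the thread; the module wrapper is my own framing, and as noted above the reset arrives one sclk cycle after the event:

```verilog
// DR is held in the fast (clk) domain until the slow (sclk) domain
// has sampled it via DR_resync, which also serves as the acknowledge.
module dr_cross (
    input  wire       clk,   // fast clock
    input  wire       sclk,  // slow clock (e.g. clk / 1024)
    input  wire       DR,    // one-clk-cycle event
    output reg  [7:0] calcreg
);
    reg DR_hold   = 0;  // fast domain: remembers the event
    reg DR_resync = 0;  // slow domain: sampled copy, fed back as ack

    always @(posedge clk) begin
        if (DR)             DR_hold <= 1'b1;
        else if (DR_resync) DR_hold <= 1'b0;
    end

    always @(posedge sclk) begin
        DR_resync <= DR_hold;
        if (DR_resync) calcreg <= 8'h0;         // reset, one sclk late
        else           calcreg <= calcreg + 1;  // slow increment
    end
endmodule
```

Because calcreg is driven only from the sclk block and DR_hold only from the clk block, each register has a single driver, avoiding the multiple-driver build error from the original question.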