Site Home Archive Home FAQ Home How to search the Archive How to Navigate the Archive
Compare FPGA features and resources
Threads starting:
Authors:A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Hi,

I have a query on the RTL designing for addsub based implementations. I heard that addsubs are not preferred on FPGAs as they produce worse area and timing QoR. Is it true? Is resource sharing not preferred in general on FPGAs?

However, if I try a very simple design of addsub shown below, it shows me no difference. Maybe in the case of small examples, the difference in implementation might not be evident. That is why I wanted to ask a broader audience.

The reasoning & cases for both 'yes' and 'no' will help in understanding the cause.

Thanks
Vipin

module addsub(a, b, oper, res);
  input oper;
  input [7:0] a;
  input [7:0] b;
  output [7:0] res;
  reg [7:0] res;

  always @(a or b or oper)
  begin
    if (oper == 1'b0)
      res = a + b;
    else
      res = a - b;
  end
endmodule

Article: 156176
sh.vipin@gmail.com wrote:

> I have a query on the RTL designing for addsub based implementations.
> I heard that addsubs are not preferred on FPGAs as they produce
> worse area and timing QoR. Is it true ? Is resource sharing not
> preferred in general on FPGAs.
> However, if I try a very simple design of addsub shown below
> it shows me no difference. May be in case of small examples,
> the difference in implementation might not be evident.
> That is why I wanted to ask a broader audience.
> The reasoning & cases for both 'yes' and 'no' will help in
> understanding the cause ?

I first got interested in FPGA addition and subtraction in the XC4000 days. The XC4000 has a special carry logic that may or may not do this operation. The carry logic changed completely between the XC4000 series and later series, though.

In the pre-IC days, it was common to build logic, called an ALU, which can implement add, subtract, and some bitwise logic operations using an optimal number of transistors or gates. Similar logic went into TTL.

> module addsub(a, b, oper, res);
>   input oper;
>   input [7:0] a;
>   input [7:0] b;
>   output [7:0] res;
>   reg [7:0] res;
>   always @(a or b or oper)
>   begin
>     if (oper == 1'b0)
>       res = a + b;
>     else
>       res = a - b;
>   end
> endmodule

Well, one possible implementation is adder and subtractor, followed by a mux to select. But modern logic optimization tools should be able to do better. You could also write:

    res = a + (oper ? b : -b);

which may or may not fit the FPGA better. (Seems to me closer to the way that the carry logic works, though.)

If you want optimal LUT use, or minimal delay, then you need to look more carefully at what it is doing. Otherwise, the logic minimization will apply to the whole system, such that it may or may not matter.

-- glen

Article: 156177
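For readers who want to check the equivalence glen is pointing at: modulo the register width, the mux-before-adder form matches the original if/else form exactly. A quick Python model, a sketch only -- the 8-bit width is taken from the posted module, the function names are mine:

```python
MASK = 0xFF  # 8-bit result, matching the [7:0] ports in the original post

def addsub_if(a, b, oper):
    """The original if/else form: oper=0 adds, oper=1 subtracts."""
    return (a + b) & MASK if oper == 0 else (a - b) & MASK

def addsub_mux(a, b, oper):
    """glen's rewrite: mux the (possibly negated) b ahead of a single adder."""
    return (a + ((-b) & MASK if oper else b)) & MASK

# Exhaustive check over all 8-bit operand pairs and both operations
assert all(addsub_if(a, b, op) == addsub_mux(a, b, op)
           for a in range(256) for b in range(256) for op in (0, 1))
```

The exhaustive loop passing shows the two descriptions are the same function of the inputs; any difference in QoR comes purely from how the synthesis tool maps them.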
On Wednesday, January 8, 2014 4:33:39 PM UTC-6, sh.v...@gmail.com wrote:

> Hi, I have a query on the RTL designing for addsub based implementations. I
> heard that addsubs are not preferred on FPGAs as they produce worse area and
> timing QoR.

Such statements often heard about preferences in FPGAs are not always applicable to all manufacturers' FPGAs or even all of the same manufacturer's FPGA families. What might not have worked well at some time months or years ago may not be an issue today with another FPGA family. Your tests seem to show it works fine for your target FPGA and tools. Different synthesis tools (including different versions of the same tool) may also affect the results.

On a slightly different issue, IMHO, creating a design where an adder and/or subtractor is a separate module to be instantiated makes the larger project's code less readable and understandable, unless you are specifically trying to re-use a given adder or subtractor's implementation (not just the code) to save utilization on the project.

Don't borrow trouble unless you have to. Write the RTL so that you can understand the function it has to perform (not the way you'd design the hardware) first, then see if that meets your performance/utilization requirements (not your personal desire to make the "best" implementation). You'd be amazed what a good synthesis tool can do these days. The folks that have to maintain your design (which may be yourself in 6 weeks/months/years) will thank you for it.

Andy

Article: 156178
glen herrmannsfeldt wrote:

> sh.vipin@gmail.com wrote:
>
> > I have a query on the RTL designing for addsub based implementations.
> > I heard that addsubs are not preferred on FPGAs as they produce
> > worse area and timing QoR. Is it true ? Is resource sharing not
> > preferred in general on FPGAs.
> > However, if I try a very simple design of addsub shown below
> > it shows me no difference. May be in case of small examples,
> > the difference in implementation might not be evident.
> > That is why I wanted to ask a broader audience.
> > The reasoning & cases for both 'yes' and 'no' will help in
> > understanding the cause ?
>
> I first got interested in FPGA addition and subtraction
> in the XC4000 days. The XC4000 has a special carry logic
> that may or may not do this operation. The carry logic
> changed completely between the XC4000 series and later series,
> though.
>
> In the pre-IC days, it was common to build logic, called ALU,
> which can implement add, subtract, and some bitwise logic operations
> using an optimal number of transistors or gates. Similar logic
> went into TTL.
>
> > module addsub(a, b, oper, res);
> >   input oper;
> >   input [7:0] a;
> >   input [7:0] b;
> >   output [7:0] res;
> >   reg [7:0] res;
> >   always @(a or b or oper)
> >   begin
> >     if (oper == 1'b0)
> >       res = a + b;
> >     else
> >       res = a - b;
> >   end
> > endmodule
>
> Well, one possible implementation is adder and subtractor,
> followed by mux to select. But modern logic optimization tools
> should be able to do better. You could also write:
>
>     res = a + (oper ? b:-b);
>
> which may or may not fit the FPGA better. (Seems to me closer
> to the way that the carry logic works, though.)

The above has an ambiguous carry out depending on how the -b is implemented. If -b is implemented as ~b+1, then for subtract

    res = a + ~b + 1

which makes the carry out the result of the +1 increment and not the addition. A simple test case is when a and b are 0.
If the -b is a true -b then

    res = 0
    Carry = 0

If the -b is ~b+1 then

    res = 0
    Carry = 1

Might be better to restate the above as

    res = (oper ? b : -b) + a;

which doesn't have this ambiguity. I run into this a lot writing code generators for compilers.

w..

Article: 156179
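Walter's two cases can be reproduced numerically. A Python sketch follows; keeping a 9th bit so the carry out is visible is my illustration, while the 8-bit operand width is carried over from the posted module:

```python
# Two implementations of an 8-bit subtract a - b, widened to 9 bits so
# the carry out is observable as bit 8 of the result.
MASK8, MASK9 = 0xFF, 0x1FF

def sub_true_negate(a, b):
    # res = a + (-b): an exact negate, nothing extra on the carry chain
    return (a + -b) & MASK9

def sub_ones_complement(a, b):
    # res = a + ~b + 1: the classic shared adder/subtractor trick
    return (a + (~b & MASK8) + 1) & MASK9

# The low 8 bits always agree; the 9th (carry) bit differs at a = b = 0:
print(sub_true_negate(0, 0))       # 0   -> carry bit clear
print(sub_ones_complement(0, 0))   # 256 -> carry bit set
```

This matches Walter's point exactly: the difference is invisible in the truncated result and only shows up when the carry out is used.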
jonesandy@comcast.net wrote:

> On Wednesday, January 8, 2014 4:33:39 PM UTC-6, sh.v...@gmail.com wrote:
>
>> Hi, I have a query on the RTL designing for addsub based implementations. I
>> heard that addsubs are not preferred on FPGAs as they produce worse area and
>> timing QoR.
>
> Such statements often heard about preferences in FPGAs are not
> always applicable to all manufacturers' FPGAs or even all of the
> same manufacturer's FPGA families. What might not have worked well
> at some time months or years ago may not be an issue today with
> another FPGA family. Your tests seem to show it works fine for
> your target FPGA and tools. Different synthesis tools (including
> different versions of the same tool) may also affect the results.

Yes. As I noted, there was a big change after the XC4000.

> On a slightly different issue, IMHO, creating a design where
> an adder and/or subtractor is a separate module to be
> instantiated makes the larger project's code less readable
> and understandable, unless you are specifically trying to
> re-use a given adder or subtractor's implementation (not
> just the code) to save utilization on the project.

Hmm. Hard to say, but in the ones I work on, it is more readable as a separate module. But it might be that the OP was using this to show the question, and not actually code that way. As far as I know, the tools first flatten the netlist, so it doesn't change the result at all.

> Don't borrow trouble unless you have to. Write the RTL so
> that you can understand the function it has to perform
> (not the way you'd design the hardware) first, then see
> if that meets your performance/utilization requirements
> (not your personal desire to make the "best" implementation).

It has always seemed to me that people who knew how to design hardware, knew about gates and such, wrote better HDL. That is, not think of it as writing software (like C), but as wiring up gates. But yes, as with software, write for readability.
> You'd be amazed what a good synthesis tool can do these days.
> The folks that have to maintain your design (which may be
> yourself in 6 weeks/months/years) will thank you for it.

There are cases where the performance goal is "as fast as possible." In this case, compare the logic against the logic of a fixed adder. If it is the same speed, then use it. If it is a lot slower, then see why it is slow.

Another possibility is to pipeline the complement stage before an adder.

-- glen

Article: 156180
Hello,

I like to be able to instantiate FPGA primitives directly in my VHDL in order to get fine control of a design and to get full access to the hardware features of the chip.

Xilinx publishes libraries for each family of parts, for example "7 Series Libraries Guide for HDL Designs". These guides describe how to instantiate and parameterize every component available in the part. This is good for block RAMs, PLLs, I/O SerDes, etc.

Does something like this exist for Altera devices?

Sometimes I use Xilinx Logicore or Altera Megafunction generators, but often I need to get right to the elements of the chip.

Thanks for any advice.

Pete

Article: 156181
On Thursday, January 9, 2014 2:07:28 PM UTC-6, Walter Banks wrote:

> The above has an ambiguous carry out depending on how the -b is implemented.

Interesting, but since res, a and b are all the same size (in bits), in this Verilog statement, there is no observable carry out, so there is no ambiguity.

If res were bigger than a and b, then I'm not sure what it would do (but I'm sure it's defined somewhere). I use VHDL.

Andy

Article: 156182
On Thursday, January 9, 2014 4:38:07 PM UTC-6, glen herrmannsfeldt wrote:

> It has always seemed to me that people who knew how to design hardware, knew
> about gates and such, wrote better HDL. That is, not think of it as writing
> software (like C), but as wiring up gates.

I'm almost the opposite. I see RTL written by very experienced HW (not HDL) designers, and it often reads like a netlist. Might as well have coded it in EDIF and saved the cost of a synthesis license.

It's not their fault. We don't spend time teaching HDL designers how a synthesis tool analyzes their code, and why it infers a register, a latch(!), a RAM, or combinatorial gates. We teach all these cook-book approaches to designing FPGAs and ASICs using the same primitive functions they used with schematics.

We are sequential thinkers, not parallel thinkers. Therefore, it is best that we describe the desired behavior (on a clock cycle basis) in a sequential context (an always block or process), and let the synthesis tool infer parallelism where it is possible (they're excellent at that). Use functions and procedures to break out subsets of sequential behaviors. Instead of thinking in registers (circuit elements), think in clock cycles of delay (behavior). The registers are going to get shuffled around by retiming/pipelining optimizations anyway. The clock cycle delays will still be there. Just be careful around asynchronous inputs!

Of course, when the functionality is so complex that it cannot be easily expressed in a single sequential context, then it must be broken up into separately instantiated parallel contexts (entities or modules), each including their own detailed behavior in a sequential context.

My point is, we can understand (and therefore express and maintain) more complex behavior when it is conveyed in a sequential context. Imagine a casserole recipe written in concurrent statements.
> There are cases where the performance goal is "as fast as possible."

In my professional experience, such cases are pretty rare. But fun when they happen.

> Another possibility is to pipeline the complement stage before an adder.

Especially if oper and b are both available early!

andy

Article: 156183
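glen's pipelining suggestion can be modeled cycle by cycle. A Python sketch only -- the class and its two-register split are my illustration, assuming (as Andy notes) that oper and b are available a cycle early, with the 8-bit width carried over from the posted module:

```python
# Two-stage pipeline: stage 1 registers (oper ? -b : b), stage 2 adds a.
MASK = 0xFF

class PipelinedAddSub:
    def __init__(self):
        self.stage1 = 0    # registered complement-or-pass of b
        self.a_d = 0       # a, delayed one cycle to stay aligned

    def clock(self, a, b, oper):
        """Model one clock edge; returns the result for last cycle's inputs."""
        result = (self.a_d + self.stage1) & MASK
        self.stage1 = (-b if oper else b) & MASK
        self.a_d = a
        return result

p = PipelinedAddSub()
p.clock(10, 3, 0)          # cycle 1: inputs enter the pipeline
print(p.clock(0, 0, 0))    # cycle 2: prints 13, i.e. 10 + 3 one cycle later
```

The point of the structure is that the conditional negate no longer sits in series with the carry chain in a single cycle; it costs one cycle of latency instead.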
Hello!

I am looking for a simple shareware SPDIF to I2S audio converter IP. I saw one in OpenCores, but it was actually an AES/EBU->I2S IP overloaded with AES/EBU extraction options.

---------------------------------------
Posted through http://www.FPGARelated.com

Article: 156184
jonesandy@comcast.net wrote:

> On Thursday, January 9, 2014 2:07:28 PM UTC-6, Walter Banks wrote:
>> The above has an ambiguous carry out depending on how
>> the -b is implemented.

> Interesting, but since res, a and b are all the same size
> (in bits), in this Verilog statement, there is no observable
> carry out, so there is no ambiguity.

> If res were bigger than a and b, then I'm not sure what it
> would do (but I'm sure it's defined somewhere). I use VHDL.

I would have to look up the rule if I was actually doing it, but yes, verilog knows about carry if the register is wide enough, and it is supposed to ignore the carry if there aren't more bits. I have found some synthesis tools that complain about the loss of the carry.

Unlike most programming languages, verilog looks at the size of the destination (left side of assignment).

Well, I usually write continuous assignment, not behavioral assignment. I believe the rules are the same, but I am not sure about that.

Does VHDL have something like the verilog continuous assignment?

-- glen

Article: 156185
On Friday, January 10, 2014 4:15:01 PM UTC-6, glen herrmannsfeldt wrote:

> Does VHDL have something like the verilog continuous assignment?

Yes, VHDL has concurrent assignment statements in several forms: direct, conditional and selected (like a case statement on the RHS), as well as concurrent procedure calls.

It is difficult to describe an iterative behavior, such as priority encoding or "counting ones," with concurrent statements; these are much easier with sequential statements.

Andy

Article: 156186
jonesandy@comcast.net wrote:

> On Thursday, January 9, 2014 4:38:07 PM UTC-6, glen herrmannsfeldt wrote:
>> It has always seemed to me that people who knew how to design hardware, knew
>> about gates and such, wrote better HDL. That is, not think of it as writing
>> software (like C), but as wiring up gates.

> I'm almost the opposite. I see RTL written by very experienced
> HW (not HDL) designers, and it often reads like a netlist.
> Might as well have coded it in edif and saved the cost of
> a synthesis license.

> It's not their fault. We don't spend time teaching HDL
> designers how a synthesis tool analyzes their code,
> and why it infers a register, a latch(!), a RAM, or
> combinatorial gates. We teach all these cook-book approaches
> to designing FPGAs and ASICs using the same primitive
> functions they used with schematics.

> We are sequential thinkers, not parallel thinkers.

OK, but HDL is inherently parallel, and, more and more, so is software programming, as multicore systems get more and more popular.

> Therefore, it is best that we describe the desired behavior
> (on a clock cycle basis) in a sequential context (an
> always block or process), and let the synthesis tool infer
> parallelism where it is possible (they're excellent at that).

I believe that C programmers, and other high-level language programmers, who know how to write assembler code tend to write better HLL code. They don't have to think about the generated code for each statement, but still know which constructs generate better code.

> Use functions and procedures to break out subsets of
> sequential behaviors. Instead of thinking in registers
> (circuit elements), think in clock cycles of delay (behavior).
> The registers are going to get shuffled around by
> retiming/pipelining optimizations anyway. The clock
> cycle delays will still be there. Just be careful around
> asynchronous inputs!
Some time ago, I was designing systolic arrays with the goal of at most two levels of logic (two LUTs) between registers. But registers are what make systolic arrays work, so there really isn't any ignoring them.

> Of course, when the functionality is so complex that it
> cannot be easily expressed in a single sequential context,
> then it must be broken up into separately instantiated
> parallel contexts (entities or modules), each including
> their own detailed behavior in a sequential context.

A systolic array is a long array, hundreds to thousands of stages, of fairly simple unit cells. Mostly, I don't have anything against behavioral HDL, but am less sure about people who want to write HDL in C.

> My point is, we can understand (and therefore express and
> maintain) more complex behavior when it is conveyed in a
> sequential context. Imagine a casserole recipe written
> in concurrent statements.

If you are building a factory to produce thousands of them a day, then you probably have to consider it in parallel. For home cooking, though, serial usually works.

>> There are cases where the performance goal is "as fast
>> as possible."

> In my professional experience, such cases are pretty rare.
> But fun when they happen.

(snip)

-- glen

Article: 156187
jonesandy@comcast.net wrote:

(snip, I wrote)
>> Does VHDL have something like the verilog continuous assignment?

> Yes, VHDL has concurrent assignment statements in several
> forms: direct, conditional and selected (like a case statement
> on the RHS), as well as concurrent procedure calls.

Verilog has the conditional operator (?:) like C and Java.

> It is difficult to describe an iterative behavior, such
> as priority encoding or "counting ones," with concurrent
> statements; these are much easier with sequential
> statements.

Not so hard, as I think I have done both of them. The usual implementation of counting ones is a carry save adder tree. It isn't so hard to write, but, yes, the usual tools generate them pretty well.

Well, once I needed a ones counting that would generate zero, one, two, three, or more than three from a 40 bit input, and with one pipeline stage. I wrote the logic for an 8 bit version, used five of those, a register stage, and then enough logic to combine the results. Counting up to 8 bits is about as easy with and without a loop.

-- glen

Article: 156188
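The adder-tree structure glen describes can be sketched algorithmically. A Python model of a pairwise reduction tree follows -- the saturation at "more than three" mirrors his 40-bit anecdote; everything here is an illustration of the structure, not his actual logic:

```python
def popcount_tree(x, bits=40):
    """Count set bits by pairwise reduction, the way an FPGA adder tree
    would: at each level, adjacent partial sums are added in parallel."""
    sums = [(x >> i) & 1 for i in range(bits)]   # level 0: the bits themselves
    while len(sums) > 1:
        if len(sums) % 2:
            sums.append(0)                       # pad odd levels with a zero
        sums = [sums[i] + sums[i + 1] for i in range(0, len(sums), 2)]
    return sums[0]

def saturate_at_three(x):
    # glen's variant: report 0, 1, 2, 3, or "more than three" (coded as 4)
    return min(popcount_tree(x), 4)

print(saturate_at_three(0b1011))         # 3
print(saturate_at_three((1 << 40) - 1))  # 4 ("more than three")
```

Each `while` iteration corresponds to one level of the tree, which is why the hardware depth grows only as log2 of the input width.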
Hi,

I agree, there aren't many papers and links on this subject. I recently came across this:
http://www.clifford.at/yosys/files/yosys-austrochip2013.pdf

You can also search for relevant Xilinx/Altera/Tabula patents in Google Patents.

Thanks,
Evgeni

Article: 156189
Hi Peter,

As far as I know, Altera doesn't have a single document with all possible primitives.

Here are some low-level primitives:
http://www.altera.com/literature/ug/ug_low_level.pdf

Memories:
http://www.altera.com/literature/ug/ug_ram_rom.pdf

One option is to generate a primitive with MegaWizard, go to the generated code to find out the primitive's name, and then google for its documentation.

Thanks,
Evgeni

Article: 156190
Hi,

I have to perform punctured convolutional encoding at 3/4 rate in Verilog. Can anyone provide me the source code to perform puncturing? I am able to perform convolutional coding, but don't know how to implement puncturing on it. I have to maintain an output data rate of 100 MHz.

Article: 156191
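To the question above: puncturing is just dropping coded bits according to a fixed repeating pattern. The Python sketch below shows the mechanics; the specific pattern is the one I believe IEEE 802.11a uses for rate 3/4 (puncture matrix A: 1 1 0, B: 1 0 1), so check it against whatever standard the design targets before reusing it:

```python
# Rate-3/4 puncturing of a rate-1/2 convolutional code: every 3 data bits
# produce 6 coded bits; transmit only 4 of them. Pattern positions follow
# the interleaved output order A0 B0 A1 B1 A2 B2.
PATTERN = [1, 1, 1, 0, 0, 1]

def puncture(coded_bits):
    """Drop the coded bits whose pattern position is 0."""
    return [b for i, b in enumerate(coded_bits) if PATTERN[i % len(PATTERN)]]

def depuncture(punctured_bits, erasure=None):
    """Receiver side: reinsert erasures where bits were dropped."""
    out, it, i = [], iter(punctured_bits), 0
    while True:
        if PATTERN[i % len(PATTERN)]:
            try:
                out.append(next(it))
            except StopIteration:
                return out
        else:
            out.append(erasure)
        i += 1

coded = [1, 0, 1, 1, 0, 0]     # 6 coded bits from 3 data bits
print(puncture(coded))          # [1, 0, 1, 0] -- positions 0, 1, 2, 5 kept
```

In hardware this amounts to a small modulo-6 position counter gating a write-enable, which is trivially fast enough for a 100 MHz output rate; the real work is the rate-change buffering around it.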
Hi,

I'm getting some strange results from the simulator that I don't understand (tried both iverilog and ISim).

The design tries to make a combinational assignment inside a module:

module gate (input [1:0] mux, input [17:0] a, output [17:0] b);
  assign b = (mux == 1) ? a : 17'hx;
endmodule

What happens is that "a" never gets through to "b". A similar line outside a module works. With some time scale settings, it works also. The simulation is completely algorithmic, no delays, device models or the like.

Does anybody have an idea what is going on here? Is this a delta-cycle problem, am I missing something fundamental here? Apologies if the answer is obvious, but usually I stick to the safe path of fully synchronous logic in Moore machines...

---------------------------------------
Posted through http://www.FPGARelated.com

Article: 156192
On 1/14/2014 12:50 PM, mnentwig wrote:

> Hi,
>
> I'm getting some strange results from the simulator that I don't understand
> (tried both iverilog and ISim).
>
> The design tries to make a combinational assignment inside a module:
>
> module gate (input [1:0] mux, input [17:0] a, output [17:0] b);
>   assign b = (mux == 1) ? a : 17'hx;
> endmodule
>
> What happens is that "a" never gets through to "b".
> A similar line outside a module works.
> With some time scale settings, it works also. The simulation is completely
> algorithmic, no delays, device models or the like.
>
> Does anybody have an idea, what is going on here? Is this a delta-cycle
> problem, am I missing something fundamental here?

Maybe you have a problem with the code that instantiates this module? Or were you saying that this is all you tried to simulate? Can you post the test bench code that didn't work?

--
Gabor

Article: 156193
mnentwig <24789@embeddedrelated> wrote:

> I'm getting some strange results from the simulator that I don't
> understand (tried both iverilog and ISim).

> The design tries to make a combinational assignment inside a module:

> module gate (input [1:0] mux, input [17:0] a, output [17:0] b);
>   assign b = (mux == 1) ? a : 17'hx;
> endmodule

> What happens is that "a" never gets through to "b".
> A similar line outside a module works.
> With some time scale settings, it works also. The simulation
> is completely algorithmic, no delays, device models or the like.

Well, it shouldn't get through to "b" unless mux==2'b01. Not that it should matter, but why not 18'hx instead of 17'hx?

Try putting some other value, such as 18'h12345 instead, so you will know that some value is getting through. For synthesis, there is no use for the x state. I suppose there is some for simulation, but more often I put in an actual, but unexpected, value.

> Does anybody have an idea, what is going on here?
> Is this a delta-cycle problem, am I missing something
> fundamental here?

I don't see any delta cycle problem here, but it is possible in the instantiating module. If you get the widths wrong, though, it could confuse everything. Why is mux two bits?

-- glen

Article: 156194
Hi,

well, the example got a bit sloppy, after I edited it about 20 times. The "17" bit width should be "18", but this works anywhere else (usually I just write 'hx). The mux control is 2 bits wide because of four states. The purpose of the block is to arbitrate memory access between up to four parties. But the output remains undefined for any value of "mux". If I move the line out of the module, it works as it should.

I guess it's a simulator glitch. Weird though, the rest of the project works exactly as intended, but there all the modules are registered. Maybe I'll try the original code again in a week, with everything else cleaned up. It's only a fun project, no deadline.

---------------------------------------
Posted through http://www.FPGARelated.com

Article: 156195
On Sat, 11 Jan 2014 23:31:10 -0800, eyecatcherdear wrote:

> Hi I have to perform punctured convolution encoding at 3/4 rate on
> Verilog. Can anyone provide me the source code to perform puncturing? I
> am able to perform convolution coding. Don't know how to implement
> puncturing on it. I have to maintain an output data rate of 100 MHz.

OK. It's not even close to April, so this isn't an April fool's joke.

What part of removing one of every four bits are you having trouble with?

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Article: 156196
mnentwig wrote:

> Hi,
>
> well, the example got a bit sloppy, after I edited it about 20 times.
> The "17" bit width should be "18", but this works anywhere else (usually I
> just write 'hx).
> The mux control is 2 bit wide because of four states.
> The purpose of the block is to arbitrate memory access between up to four
> parties. But the output remains undefined for any value of "mux". If I move
> the line out of the module, it works as it should.
>
> I guess it's a simulator glitch. Weird though, the rest of the project
> works exactly as intended, but there all the modules are registered.
> Maybe I'll try the original code again in a week, with everything else
> cleaned up. It's only a fun project, no deadline.

It would be strange for two different simulators to have the same "glitch." I'm still guessing that the instantiating code has something odd about it. Can you post the simplest test case that gives this behavior so I could try it on ModelSim?

--
Gabor

Article: 156197
For those that haven't seen it:

http://eda-playground.readthedocs.org/en/latest/
http://www.edaplayground.com/s/example/546

What is scary is that it also works on my mobile, no more holidays for me....

Hans
www.ht-lab.com

Article: 156198
Hi everyone,

I'm trying to optimize the footprint of my firmware on the target device, and I realize there are a lot of parameters which might be stored in the embedded RAM instead of dedicated registers. Certainly the RAM access logic will 'eat some space', but lots of flops will be released.

Is there any recommendation on how to optimally use embedded resources? [1]

The main reason for this optimization is to free some space to include a function which has been added later in the design phase (ouch!).

Thanks a lot,

Al

[1] I know that put like this, this question is certainly open to a hot discussion! :-)

--
A: Because it fouls the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

Article: 156199
Hey y'all --

So this is one of those times that my lack of serious math chops is coming round to bite me, and none of my usual books is helping me out. I'm hoping someone has some thoughts.

I'm trying to approximate either exp(-1/(n+1)) or 4^(-1/(n+1)). I can convince myself I don't care which. n is an integer from 1-65535, and the result should be fixed-point fractional, probably U0.18. The function output is always between 0-1, and goes up like a rocket for small n before leveling off to a steady cruise of >0.9 for the rest of the function domain.

I'm working in an FPGA, so I've got adds, multiplies, and table lookups from tables of reasonable size (10s of kb) cheap, but other things (divides especially) are expensive. I can throw several clock cycles at the problem if need be. Taylor series attacks seem to fail horribly.

I feel like there may be some answer where n in [1,127] gets a direct table lookup, and n in [128,65535] gets some other algorithm, possibly with a table boost. Or somehow taking advantage of the fact that log(1-f(n)) is related to log(n)?

Anyone have any thoughts?

--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order.  See above to fix.
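One way to sanity-check the table-plus-something split suggested in the post: for large n, x = 1/(n+1) is tiny, so a short Taylor series in x converges very fast even though a series in n does not ("Taylor series attacks seem to fail horribly" likely refers to the latter). A Python sketch that just measures whether such a split can meet U0.18 accuracy -- the breakpoint of 256 is my assumption, and the floating-point reciprocal stands in for what would need a small reciprocal table or Newton step on the FPGA:

```python
import math

SCALE = 1 << 18   # U0.18 fixed point, from the post
BREAK = 256       # assumed split point: direct table below, series above

# Direct lookup for small n, where the function changes quickly
table = [round(math.exp(-1.0 / (n + 1)) * SCALE) for n in range(BREAK)]

def approx(n):
    if n < BREAK:
        return table[n]
    # For n >= BREAK, x = 1/(n+1) <= 1/257, so second order is ample:
    # exp(-x) = 1 - x + x^2/2 - ..., truncation error under x^3/6 ~ 1e-8
    x = 1.0 / (n + 1)
    return round((1.0 - x + 0.5 * x * x) * SCALE)

worst = max(abs(approx(n) - math.exp(-1.0 / (n + 1)) * SCALE)
            for n in range(1, 65536))
print("worst error in U0.18 LSBs:", worst)   # stays below one LSB
```

So the accuracy problem is easy once the split is made; the remaining hardware question is producing 1/(n+1) cheaply, which is itself a classic coarse-table-plus-refinement job.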