On Aug 6, 8:40 am, eromlignod <eromlig...@aol.com> wrote:
> Hi guys:
>
> I'm prototyping an application using a Xilinx Spartan-3 development
> board.  I'm using this particular development kit because it is suited
> to the large amount of I/O I need.
>
> I'm new to FPGA, so I have written the code in Verilog using almost
> exclusively a high-level, behavioural style.  The program works, but
> synthesizes using 99% of the available slices.  So if I try to change
> or improve the code, it often synthesizes to over 100% and kicks out
> an error.
>
> I need to condense what I've got to give me some space to work with.
>
> The application is basically a large number of high-speed pulse
> inputs.  I count them all independently and average several readings
> over time for each to produce a 21-bit number.  Each of these 21-bit
> vectors (there are almost 100) is sent to a central processing module
> that evaluates and compares them using simple arithmetic.  Based on
> these comparisons, another set of vectors is sent on to a couple of
> modules that arrange them into a special synchronous serial output.
> That's all it does.
>
> Are there any standard tips or general guidelines that you might offer
> to condense my synthesis?  I have found, for example, that making the
> vectors smaller doesn't really change the overall slice count, yet
> commenting out a single line of the processing code can change it
> drastically.
>
> Any ideas or comments would be greatly appreciated.
>
> Don

Since you state that you run out of slices, I know that your design is larger than the FPGA can hold, but I would still point out that slice utilization is a pessimistic view of how much of the FPGA you are using: by default the mapping stage spreads the logic out instead of packing it as tightly as possible. Register and LUT utilization is an optimistic measure of how much of the FPGA you have left. You need to watch all of them to get a good idea of how full your design really is.

You mention both a high-speed pulse counting section that counts and averages over time, and then a processing section that sounds like it is slower. How much slower is it? If you can share resources over time in this section, you could save resources.

You can look in the reports to see how many adders, etc. the tools inferred from your code. Your goal is to reduce that number to the minimum required to perform the comparisons. You have a range of options that depend on your constraints. At one end of the spectrum, just find any redundant calculations and rearrange your code to share those calculations. At the other end, you could use a soft processor such as a PicoBlaze to do the calculations in software.

Regards,

John McCaskill
www.FasterTechnology.com

Article: 134326
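A rough sketch of the "share resources over time" idea from John's reply, not anything taken from Don's actual design. It assumes the readings and their reference values sit in synchronous memories with a one-clock read latency, so a single comparator can visit one channel per clock. The module name, port names and parameters are all hypothetical.

```verilog
// Hypothetical sketch: one comparator visits N channels, one per clock.
// Assumes rd_data/ref_data come from synchronous memories (1-clock latency)
// that are addressed by rd_addr; those memories are not shown here.
module shared_compare #(
    parameter N  = 100,  // number of channels (assumed)
    parameter W  = 21,   // width of each averaged reading
    parameter AW = 7     // address width, 2**AW >= N
) (
    input  wire          clk,
    output reg  [AW-1:0] rd_addr,      // address presented to the memories
    input  wire [W-1:0]  rd_data,      // reading for the previous rd_addr
    input  wire [W-1:0]  ref_data,     // reference value for the same address
    output reg           result_valid, // high when result_idx/result_above are valid
    output reg  [AW-1:0] result_idx,   // channel the result belongs to
    output reg           result_above  // 1 if the reading exceeds its reference
);
    reg [AW-1:0] idx_d;  // address whose data is on rd_data this clock
    reg          run;    // goes high once the first read is in flight

    initial begin
        rd_addr      = 0;
        idx_d        = 0;
        run          = 1'b0;
        result_valid = 1'b0;
    end

    always @(posedge clk) begin
        // step through the channels round-robin
        rd_addr <= (rd_addr == N-1) ? 0 : rd_addr + 1'b1;
        idx_d   <= rd_addr;
        run     <= 1'b1;

        // the only comparator in the design
        result_above <= (rd_data > ref_data);
        result_idx   <= idx_d;
        result_valid <= run;
    end
endmodule
```

Each channel is revisited every N clocks, so this only helps if the processing section can tolerate that latency. If the readings stay in ordinary registers, the multiplexer needed to select one of them can cost as much as the comparators it replaces, which is why the sketch assumes a memory.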
On Aug 6, 8:56 am, Mike Treseler <mtrese...@gmail.com> wrote:
> eromlignod wrote:
> > The application is basically a large number of high-speed pulse
> > inputs.  I count them all independently and average several readings
> > over time for each to produce a 21-bit number.  Each of these 21-bit
> > vectors (there are almost 100) is sent to a central processing module
> > that evaluates and compares them using simple arithmetic.  Based on
> > these comparisons, another set of vectors is sent on to a couple of
> > modules that arrange them into a special synchronous serial output.
>
> Since the answer is shifted out in serial,
> maybe it could be constructed a bit at a time
> to save resources.
>
> > Are there any standard tips or general guidelines that you might offer
> > to condense my synthesis?
>
> A basic trade is time for gates.
> A serial crc is slower, but requires less resources
> than the parallel version, for example.
>
>     -- Mike Treseler

Mike:

I'm intrigued by your answer, but don't fully understand what you propose. You say that I should construct my serial signal a bit at a time, but how else can I? My last serial-generating module has a big 256-bit vector input that it translates to a serial output that repeats the 256 bits over and over. The code is basically something like this:

    input [255:0] invector;
    output reg serout;      // must be a reg to be assigned in an always block
    reg [7:0] x;            // bit index, wraps from 255 back to 0

    always @(negedge shiftclock) begin
        x = x + 1;
        serout = invector[x];
    end

I'll bet there's a better way.

Don

Article: 134327
On Aug 6, 9:21 am, John McCaskill <jhmccask...@gmail.com> wrote:
> You can look in the reports to see how many adders, etc the tools
> inferred from your code.  Your goal is to reduce that number to the
> minimum required to perform the comparisons.  You have a range of
> options that depend on your constraints.  At one end of the spectrum,
> just find any redundant calculations and rearrange your code to share
> those calculations. At the other end, you could use a soft processor
> such as a PicoBlaze to do the calculations in software.

What sorts of operations are the biggest gate-hogs?

I have a lot of comparison "if" operations, counters, and non-blocking assignments to convert lots of inputs into usable arrays. The averagers each divide by 32 and I have another single divider toward the end that divides by 256. Other than that, I'm not doing anything very fancy. I have no multipliers (though I might like to add one), no "for" loops, etc.

I do have a series of hard-coded standard values that I use for comparison. They are in the form of parameters that are fed to each of the input counter modules when they are instantiated in the top module. I suppose these could be EPROM memories, but I haven't figured out yet how to use the memory provided on the development board.

Don

Article: 134328
Hi everybody,

What is the difference between the processor clock and the bus clock in the EDK platform, and what is the relationship between these clocks and the OPB clock? Also, is the HWICAP clock the same as the OPB clock?

Thanks,
fatma

Article: 134329
On Aug 6, 9:58 am, Jon Beniston <j...@beniston.com> wrote:
> > > 5V tolerant I/O with a 3.3V supply. Hardly the first company to do
> > > that. Maybe the actual circuit to implement it was novel.
> > A novel circuit (if that's the case) even to implement something that
> > is not functionally new is patentable...you don't agree?
>
> No. Being novel is not the only requirement.
>

True, there are other requirements. Rather than saying 'is patentable' I should have said 'could be patentable'.

> > > > I have not reviewed any of these patents so I'm not sure what the actual
> > > > patent claims are.
>
> > Usually people don't boast about their lack of knowledge due to their
> > own lack of effort to review the publicly available material...in a
> > public forum no less.
>
> You might want to check who you have quoted, especially given your
> next comment.
>

Sorry about that, I apologize.

Kevin Jennings

Article: 134330
On Aug 6, 10:43 am, eromlignod <eromlig...@aol.com> wrote:
> What sorts of operations are the biggest gate-hogs?
>
> I have a lot of comparison "if" operations, counters, and non-blocking
> assignments to convert lots of inputs into usable arrays. The
> averagers each divide by 32 and I have another single divider toward
> the end that divides by 256. Other than that, I'm not doing anything
> very fancy. I have no multipliers (though I might like to add one),
> no "for" loops, etc.
>
> I do have a series of hard-coded standard values that I use for
> comparison. They are in the form of parameters that are fed to each
> of the input counter modules when they are instantiated in the top
> module. I suppose these could be EPROM memories, but I haven't
> figured out yet how to use the memory provided on the development
> board.
>
> Don

What tools are you using for synthesis? If ISE / XST (WebPACK or Foundation from Xilinx), which version?

Things like divide by a power of two should take no resources whatever (i.e. shift operators are basically wires). However, a synthesis tool may look at the division operator and think you need a divider, which will take a lot of logic.

Also, since you seem to be register-heavy, see where you can use serial shift registers or memory instead of loose flip-flops. In Spartan-3 you get 16 stages of serial shift register or 16 bits of distributed RAM from a single LUT site. Coding shift registers without a reset term allows the synthesizer to place them in these structures instead of flip-flops (which come one to a LUT site).

Did you look at your map report or "design summary"? In the latest version of ISE the design summary can show you where your largest resource allocations come from.

Regards,
Gabor

Article: 134331
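Two small sketches of the points in Gabor's reply, written as assumptions about how such code might look rather than as Don's actual modules. The first averages 32 samples by dropping the low five bits of a running sum instead of using the "/" operator. The second is a 16-stage shift register with no reset and only the last stage read, which is the coding style Gabor says lets the tools use a LUT-based shift register. Module and signal names are hypothetical.

```verilog
// Hypothetical averager: sum 32 samples, then divide by 32 by taking the
// upper bits of the sum. The "divide" is pure wiring, no divider is inferred.
module avg32 #(
    parameter W = 21
) (
    input  wire         clk,
    input  wire         sample_valid,
    input  wire [W-1:0] sample,
    output reg  [W-1:0] average,
    output reg          average_valid
);
    reg [W+4:0] sum   = 0;  // 5 extra bits hold the sum of 32 samples
    reg [4:0]   count = 0;  // counts 0..31

    wire [W+4:0] next_sum = sum + sample;

    always @(posedge clk) begin
        average_valid <= 1'b0;
        if (sample_valid) begin
            if (count == 5'd31) begin
                average       <= next_sum[W+4:5];  // same value as next_sum >> 5
                average_valid <= 1'b1;
                sum           <= 0;
            end else begin
                sum <= next_sum;
            end
            count <= count + 1'b1;  // wraps from 31 back to 0
        end
    end
endmodule

// Hypothetical 16-stage delay line. There is no reset and only the final
// stage is read, so the synthesizer is free to use a LUT shift register
// instead of 16 separate flip-flops.
module delay16 (
    input  wire clk,
    input  wire ce,
    input  wire d,
    output wire q
);
    reg [15:0] sr;

    always @(posedge clk)
        if (ce)
            sr <= {sr[14:0], d};

    assign q = sr[15];
endmodule
```

Whether the second module really ends up in a single LUT depends on the tool version and its shift-register extraction settings, so the map report is the place to confirm it.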
On Aug 6, 10:49 am, Gabor <ga...@alacron.com> wrote:
> Also since you seem to be register-heavy, see where you can
> use serial shift registers or memory instead of loose flip-flops.
> In Spartan 3 you get 16 stages of serial shift register or 16
> bits of distributed RAM from a single LUT site.  Coding shift
> registers without a reset term allows the synthesizer to place
> them in these structures instead of flip-flops (which come
> one to a LUT site).

Interesting.

Thanks Gabor! This may be very useful. I have a large number of 8-bit vectors in my design. I have about 220 of them passing from one module to another. They each begin as an "output reg [7:0]" in one module and are all assigned to an array in the other module like this:

    reg [7:0] array [219:0];
    ...
    y[0] <= array[0];
    y[1] <= array[1];
    y[2] <= array[3];
    ...etc.

Is this bad form?

Don

Article: 134332
On Aug 6, 11:24 am, eromlignod <eromlig...@aol.com> wrote:
> reg [7:0] array [219:0];
> ...
> y[0] <= array[0];
> y[1] <= array[1];
> y[2] <= array[3];
> ...etc.
>
> Is this bad form?
>
> Don

Oops. I meant for that code to be:

    input [7:0] y0;
    input [7:0] y1;
    ...
    reg [7:0] array [219:0];
    ...
    array[0] <= y0;
    array[1] <= y1;
    ...etc.

Don

Article: 134333
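On the "is this bad form?" question, one possible way to write the same thing more compactly is sketched here. It assumes the 220 separate 8-bit ports can be replaced by a single packed bus, since Verilog-2001 ports cannot be arrays. The names `y_flat`, `rd_idx` and `rd_value` are hypothetical, and the read port exists only so the array is not optimized away.

```verilog
// Hypothetical sketch: one packed bus instead of 220 separate 8-bit ports.
// Byte i of y_flat (bits [8*i+7 : 8*i]) plays the role of input y<i>.
module unpack_bytes #(
    parameter N = 220
) (
    input  wire           clk,
    input  wire [N*8-1:0] y_flat,    // y0 in bits [7:0], y1 in [15:8], ...
    input  wire [7:0]     rd_idx,    // which entry to look at (0..N-1)
    output wire [7:0]     rd_value
);
    reg [7:0] array [N-1:0];
    integer i;

    // Equivalent to writing array[0] <= y0; array[1] <= y1; ... by hand.
    always @(posedge clk)
        for (i = 0; i < N; i = i + 1)
            array[i] <= y_flat[i*8 +: 8];

    assign rd_value = array[rd_idx];
endmodule
```

This only shortens the source. Every entry is still written on every clock, so the array still needs 220 x 8 = 1760 flip-flops and cannot be placed in a RAM.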
eromlignod wrote:
> Mike:
>
> I'm intrigued by your answer, but don't fully understand what you
> propose.  You say that I should construct my serial signal a bit at a
> time, but how else can I?

I meant to suggest arranging some sort of pipeline to work on the math while you are shifting the answer out.

    -- Mike Treseler

Article: 134334
eromlignod wrote:
> Hi guys:
>
> I'm prototyping an application using a Xilinx Spartan-3 development
> board.  I'm using this particular development kit because it is suited
> to the large amount of I/O I need.
>
> I'm new to FPGA, so I have written the code in Verilog using almost
> exclusively a high-level, behavioural style.  The program works, but
> synthesizes using 99% of the available slices.  So if I try to change
> or improve the code, it often synthesizes to over 100% and kicks out
> an error.

Are you talking about synthesis or place and route? It's not an error (at least through 9.1i) to synthesize to more logic than is available in a part. My current design synthesizes to 105% of device resources. Mapping optimizes the design and gets rid of redundant or unused logic.

It's also not unusual to have 99% of the slices occupied. The tools prefer to spread the design over as many slices as possible. The map report will show how many resources are really used (LUTs, FF, BRAM, Clocks, MULT.....).

---
Joe Samson

Article: 134335
On Aug 6, 11:31=A0am, eromlignod <eromlig...@aol.com> wrote: > On Aug 6, 11:24=A0am, eromlignod <eromlig...@aol.com> wrote: > > > > > On Aug 6, 10:49=A0am, Gabor <ga...@alacron.com> wrote: > > > > On Aug 6, 10:43 am, eromlignod <eromlig...@aol.com> wrote: > > > > > On Aug 6, 9:21 am, John McCaskill <jhmccask...@gmail.com> wrote: > > > > > > On Aug 6, 8:40 am, eromlignod <eromlig...@aol.com> wrote: > > > > > > > Hi guys: > > > > > > > I'm prototyping an application using a Xilinx Spartan-3 develop= ment > > > > > > board. =A0I'm using this particular development kit because it = is suited > > > > > > to the large amount of I/O I need. > > > > > > > I'm new to FPGA, so I have written the code in Verilog using al= most > > > > > > exclusively a high-level, behavioural style. =A0The program wor= ks, but > > > > > > synthesizes using 99% of the available slices. =A0So if I try t= o change > > > > > > or improve the code, it often synthesizes to over 100% and kick= s out > > > > > > an error. > > > > > > > I need to condense what I've got to give me some space to work = with. > > > > > > > The application is basically a large number of high-speed pulse > > > > > > inputs. =A0I count them all independently and average several r= eadings > > > > > > over time for each to produce a 21-bit number. =A0Each of these= 21-bit > > > > > > vectors (there are almost 100) is sent to a central processing = module > > > > > > that evaluates and compares them using simple arithmetic. =A0Ba= sed on > > > > > > these comparisons, another set of vectors is sent on to a coupl= e of > > > > > > modules that arrange them into a special synchronous serial out= put. > > > > > > That's all it does. > > > > > > > Are there any standard tips or general guidelines that you migh= t offer > > > > > > to condense my synthesis? 
=A0I have found, for example, that ma= king the > > > > > > vectors smaller doesn't really change the overall slice count, = yet > > > > > > commenting out a single line of the processing code can change = it > > > > > > drastically. > > > > > > > Any ideas or comments would be greatly appreciated. > > > > > > > Don > > > > > > Since you state that you run out of slices, I know that your desi= gn is > > > > > larger than the FPGA can hold, but I would still point out that t= he > > > > > slice utilization is a pessimistic view of how much of the FPGA y= ou > > > > > are using, the mapping stage spreads the logic out by default ins= tead > > > > > of packing it as tightly as possible. =A0The Register and LUT > > > > > utilization is an optimistic measure of how much of the FPGA you = have > > > > > left. =A0You need to watch all of them to get a good idea of how = full > > > > > your design really is. > > > > > > You mention both a high speed pulse counting section that counts = and > > > > > averages over time, and then a processing section that sounds lik= e it > > > > > is slower. How much slower is it? =A0If you can share resources o= ver > > > > > time in this section you could save resources. > > > > > > You can look in the reports to see how many adders, etc the tools > > > > > inferred from your code. =A0Your goal is to reduce that number to= the > > > > > minimum required to perform the comparisons. =A0You have a range = of > > > > > options that depend on your constraints. =A0At one end of the spe= ctrum, > > > > > just find any redundant calculations and rearrange your code to s= hare > > > > > those calculations. At the other end, you could use a soft proces= sor > > > > > such as a PicoBlaze to do the calculations in software. > > > > > > Regards, > > > > > > John McCaskillwww.FasterTechnology.com-Hidequotedtext- > > > > > > - Show quoted text - > > > > > What sorts of operations are the biggest gate-hogs? 
> > > > > I have a lot of comparison "if" operations, counters, and non-block= ing > > > > assignments to convert lots of inputs into usable arrays. =A0The > > > > averagers each divide by 32 and I have another single divider towar= d > > > > the end that divides by 256. =A0Other than that, I'm not doing anyt= hing > > > > very fancy. =A0I have no multipliers (though I might like to add on= e), > > > > no "for" loops, etc. > > > > > I do have a series of hard-coded standard values that I use for > > > > comparison. =A0They are in the form of parameters that are fed to e= ach > > > > of the input counter modules when they are instantiated in the top > > > > module. =A0I suppose these could be EPROM memories, but I haven't > > > > figured out yet how to use the memory provided on the development > > > > board. > > > > > Don > > > > What tools are you using for synthesis? =A0If ISE / XST (webpack or > > > foundation from Xilinx) which version? > > > > Things like divide by power of two should take no resources whatever > > > (i.e. shift operators are basically wires). =A0However a synthesis to= ol > > > may look at the division operator and think you need a divider, which > > > will take a lot of logic. > > > > Also since you seem to be register-heavy, see where you can > > > use serial shift registers or memory instead of loose flip-flops. > > > In Spartan 3 you get 16 stages of serial shift =A0register or 16 > > > bits of distributed RAM from a single LUT site. =A0Coding shift > > > registers without a reset term allows the synthesizer to place > > > them in these structures instead of flip-flops (which come > > > one to a LUT site). > > > > Did you look at your map report or "design summary"? =A0In > > > the latest version of ISE the design summary can show you > > > where your largest resource allocations come from. > > > > Regards, > > > Gabor- Hide quoted text - > > > > - Show quoted text - > > > Interesting. > > > Thanks Gabor! =A0This may be very useful. 
> > I have a large number of 8-bit vectors in my design. I have about 220 of them passing from one module to another. They each begin as an "output reg [7:0]" in one module and are all assigned to an array in the other module like this.
> >
> > reg [7:0] array [219:0];
> > ...
> > y[0] <= array[0];
> > y[1] <= array[1];
> > y[2] <= array[3];
> > ...etc.
> >
> > Is this bad form?
> >
> > Don
>
> Oops. I meant for that code to be:
>
> input [7:0] y0;
> input [7:0] y1;
> ...
> reg [7:0] array [219:0];
> ...
> array[0] <= y0;
> array[1] <= y1;
> ...etc.
>
> Don

If you can map this onto a block ram, you will save quite a bit of registers. Whether or not you can do this depends on if you can write the vectors in one (or a few) at a time, and process them sequentially in the time you have available. How much time do you have to process the vectors? ns, us, ms?

Regards,

John McCaskill www.FasterTechnology.com

Article: 134336
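For readers following the block RAM suggestion in this thread, here is a minimal Verilog sketch of what "writing the vectors in one at a time" could look like. It is not taken from anyone's actual design: the module name `vector_store`, all port names, and the parameter defaults are hypothetical, and whether a given synthesis tool maps the array to block RAM depends on the tool and its settings.

```verilog
// Sketch only: hypothetical names, not the original poster's code.
// One write port and one synchronous read port replace 220 separate
// 8-bit registers with a single memory array.
module vector_store #(
    parameter WIDTH = 8,     // bits per vector (8 in the thread)
    parameter DEPTH = 220,   // number of vectors (about 220 in the thread)
    parameter AW    = 8      // address bits: 2**8 = 256 >= 220
) (
    input  wire             clk,
    input  wire             we,     // write strobe from the producing module
    input  wire [AW-1:0]    waddr,  // index of the vector being updated
    input  wire [WIDTH-1:0] wdata,  // new value for that vector
    input  wire [AW-1:0]    raddr,  // index requested by the processing logic
    output reg  [WIDTH-1:0] rdata   // valid one clock after raddr is presented
);
    // The array has no reset term; resetting every word would normally
    // force the tools back to individual flip-flops.
    reg [WIDTH-1:0] mem [0:DEPTH-1];

    always @(posedge clk) begin
        if (we)
            mem[waddr] <= wdata;
        // Registered (synchronous) read, which block RAM inference
        // generally requires. Addresses 220..255 are unused here.
        rdata <= mem[raddr];
    end
endmodule
```

The trade-off is the one John describes: the processing logic no longer sees all 220 values at once, so it has to step `raddr` through the entries and handle them sequentially.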
On Aug 6, 11:50 am, John McCaskill <jhmccask...@gmail.com> wrote:
> If you can map this onto a block ram, you will save quite a bit of registers. Whether or not you can do this depends on if you can write the vectors in one (or a few) at a time, and process them sequentially in the time you have available. How much time do you have to process the vectors? Ns, us, ms ?

Ah, I think I'm following along now. Are you talking about sending the numbers over a single 8-bit vector wire one-at-a-time? Hmmm.

The vectors are actually independent from each other and refresh at various random rates, so a few usec here or there shouldn't make a difference. I'll give it a try!

Don

Article: 134337
> what is the difference between the processor clk and the bus clock in
> edk platform,

It is up to you - they can be the same or different. The processor clock determines the speed instructions execute at, the bus clock determines the speed of bus transactions. The processor clock can often be faster than the bus clock (e.g. CPU at 300 MHz, bus at 100 MHz), particularly if you are using a PowerPC.

Jon

Article: 134338
On Aug 6, 11:43 am, Joseph Samson <u...@not.my.company> wrote:
> Are you talking about synthesis or place and route? It's not an error (at least through 9.1i) to synthesize to more logic than is available in a part. My current design synthesizes to 105% of device resources. Mapping optimizes the design and gets rid of redundant or unused logic.
>
> It's also not unusual to have 99% of the slices occupied. The tools prefer to spread the design over as many slices as possible.
>
> The map report will show how many resources are really used (LUTs, FF, BRAM, Clocks, MULT.....).

Right, I meant when I process all the way to place & route, not just synthesis. It is funny that it is at 99%. In fact, it shows that all but two of the slices are used (!!).

It does list usage as 76% related logic and 24% unrelated logic. I'm not sure how to remedy that.

Don

Article: 134339
On Aug 6, 10:36 am, eromlignod <eromlig...@aol.com> wrote:
<snip>
> Right, I meant when I process all the way to place & route, not just synthesis. It is funny that it is at 99%. In fact, it shows that all but two of the slices are used (!!).
>
> It does list usage as 76% related logic and 24% unrelated logic. I'm not sure how to remedy that.
>
> Don

The Xilinx mapper will spread logic out with one LUT per slice until it fills to nearly 100% of the slices, then it will backfill the 2nd LUT in each slice where conditions permit. There's usually a good stretch between 99% and 101%.

If you explained better what you're trying to do (signal quantity, what the counts represent, frequencies of the signals and the system clock) you might get better suggestions on how to code things. Right now most of us are taking stabs in the dark.

Article: 134340
> what's the point here?

The point is that the embedded programmer has to read the hardware manual first and/or talk to the hardware designer. Asking generic questions here won't help!

/Mikhail

Article: 134341
Jon Beniston wrote:
>>There is a fairly well argued rant against patents (or at least the USA's interpretation of patents) at http://www.embeddedtechjournal.com/articles_2008/20080729_patent.htm
>>
>>I'm not sure that the idea of patents is completely broken, but it certainly needs major reform so that innovation is reasonably protected and rewarded. A copiers' free-for-all would also be bad, IMHO.
>
> It seems far too many companies seem to think that anything new they do is patentable. In the UK there is a requirement that the invention must also not be obvious to anyone with experience in the field.

I thought that was fairly global?

Of course, patent attorneys have no motivation (or skill) to apply this test (nor indeed to research prior art any further than Google); they are far more interested in chargeable work, like researching other claims, and writing up the patent itself.

A patent is merely a license to litigate, and if it IS contested, guess who gets income?

> If the same problem that is solved by many inventions in recent patents were to be given to 10 other engineers, I'm fairly sure some of them would come up with similar or better solutions. If that is the case, then I don't think an invention should be patentable. If you solve something that others have struggled with, then maybe you have a case.

It is very rare to see a patent that does get above the 'obvious' and 'prior art' thresholds.

-jg

Article: 134342
eromlignod wrote:
> Hi guys:
>
> I'm prototyping an application using a Xilinx Spartan-3 development board. I'm using this particular development kit because it is suited to the large amount of I/O I need.
>
> I'm new to FPGA, so I have written the code in Verilog using almost exclusively a high-level, behavioural style. The program works, but synthesizes using 99% of the available slices. So if I try to change or improve the code, it often synthesizes to over 100% and kicks out an error.
>
> I need to condense what I've got to give me some space to work with.
>
> The application is basically a large number of high-speed pulse inputs.

Define 'high speed', and what timebase reading rate? What are the pulses coming from? To some, microseconds is high speed, to others, femtoseconds is high speed.....

> I count them all independently and average several readings over time for each to produce a 21-bit number.

If you mean several readings from the same channel, a longer count time will do that for free. Are you reading frequency? (fixed time readout of the counters)

> Each of these 21-bit vectors (there are almost 100) is sent to a central processing module that evaluates and compares them using simple arithmetic. Based on these comparisons, another set of vectors is sent on to a couple of modules that arrange them into a special synchronous serial output. That's all it does.

What sort of comparison, and what decision rates are you talking? Is that processing software, or hardware?

Do you need 21 bits of precision, or just 21 bits of dynamic range? A quasi-log counter bus would drop the fan-out. (So a 13-bit MSB and a 3-bit exponent would mux on 16-bit data paths instead of 21-bit ones: 16/21, or about 76% of the mux logic, right there.)

-jg

Article: 134343
"Matthias Alles" <REMOVEallesCAPITALS@NOeit.SPAMuni-kl.de> wrote in message news:g6rmt5$ef7$1@news.uni-kl.de...
> Thanks for the information. That's good to know! I didn't think about that at all..
>
> Are there any plans to include FPU support into xilkernel for a future release?

Yes we do plan to add the support. I don't have a hard date for it though :(

Article: 134344
On Aug 6, 12:31 pm, eromlignod <eromlig...@aol.com> wrote:
> On Aug 6, 11:50 am, John McCaskill <jhmccask...@gmail.com> wrote:
>
> > If you can map this onto a block ram, you will save quite a bit of registers. Whether or not you can do this depends on if you can write the vectors in one (or a few) at a time, and process them sequentially in the time you have available. How much time do you have to process the vectors? Ns, us, ms ?
>
> Ah, I think I'm following along now. Are you talking about sending the numbers over a single 8-bit vector wire one-at-a-time? Hmmm.
>
> The vectors are actually independent from each other and refresh at various random rates, so a few usec here or there shouldn't make a difference. I'll give it a try!
>
> Don

You are asking good questions, so there are multiple people here that will be happy to help you out. However, you are asking for some low level suggestions without giving enough high level detail. The best optimizations are the ones that you apply at the high level where you have the most leverage. If you can tell us more about what you are trying to do you will get better responses.

You said that you have almost 100 high speed channels. How many channels are there? How fast are the pulses arriving on average? Over what time is the average? What is the air speed of an unladen swallow? What is the minimum spacing between pulses? How fast does your central processing module need to compare the channels?

As Jim Granville pointed out, the various time bases of your problem have a major impact on the potential solutions.

Regards,

John McCaskill www.FasterTechnology.com

Article: 134345
Very cool looking tutorial! Thanks!
Will i need the files included or can i look at the images in the pdf and complete it?

thanks!!

On Aug 6, 4:25 am, Herbert Kleebauer <k...@unibwm.de> wrote:
> laserbeak43 wrote:
>
> > thanks for the offer, but the links seem to be dead, and the only version of ISE that truly works for me, is 7.1i (i haven't tried anything earlier than that, though) although, i'm sure i could easily change things around to get it to work in 7.
>
> Sorry:
>
> ftp://137.193.64.130/pub/mproz/mproz3_e.pdf
> ftp://137.193.64.130/pub/mproz/mproz3.zip
>
> Or here a http mirror:
>
> http://www.ikomi.de/pub/
>
> > On Aug 6, 4:07 am, Herbert Kleebauer <k...@unibwm.de> wrote:
> > > laserbeak43 wrote:
> > > > Hmm, so i've heard. Everyone says Xilinx stuff is bad for beginners and i must admit, I've been doing nothing but troubleshooting ever since i got this board, what a headache.
> > >
> > > Take a look at:
> > >
> > > ftp://137.193.164.130/pub/mproz/mproz3_e.pdf
> > > ftp://137.193.164.130/pub...
> > >
> > > It's a step by step introduction for a simple design (ISE 9.2, Spartan3E) using schematic entry.

Article: 134346
hi,
I keep getting this error in the implement and design phase with ISE:

'C:\MinGW\include\C__~1\342EBD~1.5\map' is not recognized as an internal or external command, operable program or batch file.

I have MinGW installed but that dir doesn't exist, although I do know what dir it's looking for. Is anyone familiar with this error? Can someone please help with this?

Thanks
Malik

Article: 134347
laserbeak43 wrote:
>
> Very cool looking tutorial! Thanks!
> Will i need the files included or can i look at the images in the pdf and complete it?

If you want to do it yourself, then don't even look at the schematics in the pdf but only at the specification of the processor architecture (page 1-3 of the pdf, not page 4 because this already shows the solution).

But I think you first want to get used to the Xilinx software (schematic entry, simulation and implementation). Therefore I would suggest to use the provided schematics and follow the step-by-step tutorial and simulate the provided simple assembly program. If all works, delete all logic (gates and flip-flops) from the lower level schematics (that's the starting point for our students) and redesign the CPU yourself.

But if you want to do all yourself from scratch, you will need at least the assembler from the "addon" directory. If you worry about the executable, just compile the provided C sources yourself:

gcc -O3 -o xdela xdela.c

Article: 134348
hi all;

I have a question about the system clk in an EDK project. My processor clk is 100 MHz and my bus clk is 25 MHz. I tried to increase the bus clk to 50 MHz. I did this by changing the DCM: I changed the CLKDV divisor to 2 instead of 4, which means, as I thought, (100/2) instead of (100/4). I cleaned the hardware and started to generate the netlist and the bitstream, but the UART is not working correctly, and of course I couldn't examine the rest of the system. I don't know if what I did is right, or there is something missed, or there is another way to change the frequency.

thanks
fatma

From: Johann Glaser <glaser@ict.tuwien.ac.at>
Newsgroups: comp.arch.fpga
Subject: RTL Schematic as EDIF
Date: Thu, 07 Aug 2008 11:29:43 +0200

Hi!

My PhD thesis deals with coarse-grained reconfigurable logic. Therefore the RTL schematic synthesis result is one major input for my work.

I tried Xilinx ISE 10.1 as well as Synplicity Synplify Pro 9.2. Both tools provide this RTL netlist (before implementing it to the technology netlist), but both in encrypted file formats.

Xilinx ISE 10.1 saves the file as NGR file. Unfortunately there is no ngr2edif tool provided (while an ngc2edif is available).

Synplicity Synplify Pro 9.2 saves a SRS file and provides an edf2srs tool, but no reverse.

Could you please point me to tools which can convert these file formats to open formats (especially EDIF) or to synthesis tools (not necessarily for FPGA, a tool from an ASIC flow is ok too), which save the RTL schematic as open file formats.

Thanks
Hansi

--
Johann Glaser <glaser@ict.tuwien.ac.at>
Institute of Computer Technology, E384
Vienna University of Technology, Gusshausstr. 27-29, A-1040 Wien
Phone: ++43/1/58801-38444 Fax: ++43/1/58801-38499

Article: 134349
> I'm intrigued by your answer, but don't fully understand what you propose.

You need to have a better understanding of what's generated by your code. Remember you're describing hardware.

> My last serial generating module has a big 256 vector input that it is translating to a serial output that repeats the 256 bits over and over. The code is basically something like this:
>
> input [255:0] invector;
> output serout;
> reg [7:0] x;
> always @(negedge shiftclock)
> begin
>   x = x + 1;
>   serout = invector[x];
> end

That (probably) creates a 256 bit vector and a massive mux to select one of the bits.

In VHDL the following generates a big shift register which the tools will find dead easy to place and route as each logical path is just from one register to the next.....

if(rising_edge(clk)) then
  invector(254 downto 0) <= invector(255 downto 1);
  serout <= invector(0);
end if;

This should be easily translated to Verilog.

Nial.
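
One possible Verilog rendering of the shift-register idea is sketched below. It is not Nial's code: the module name `serializer`, the internal register `shreg`, and the `load` input are assumptions added so that the input port is not assigned from inside the module and so that the 256-bit pattern repeats, which the original poster asked for. It uses the rising clock edge as in the VHDL fragment; the poster's own code used the falling edge.

```verilog
// Sketch only: hypothetical names, assuming a load strobe is available.
module serializer (
    input  wire         shiftclock,
    input  wire         load,      // assumed: pulse high to capture a new word
    input  wire [255:0] invector,  // parallel word to serialize
    output reg          serout     // serial output, bit 0 first
);
    reg [255:0] shreg;             // internal copy that actually shifts

    always @(posedge shiftclock) begin
        if (load)
            shreg <= invector;                  // parallel capture
        else
            shreg <= {shreg[0], shreg[255:1]};  // rotate right by one; bit 0 wraps to bit 255
        serout <= shreg[0];                     // registered output of the current bit 0
    end
endmodule
```

Each flip-flop here is fed by at most a 2:1 choice between its neighbour and the matching `invector` bit, rather than the output depending on a 256:1 selection. Because the register has a parallel load, it still occupies 256 flip-flops; the saving is in the mux logic and routing, not in register count.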