Hi everyone, I have several ports of my design that are not driving anything and left 'open' on purpose, using the 'open' keyword in my component instantiation in vhdl. Now I receive loads of 'Warning: CMP201...' from Designer because of this. Is there a way not to be annoyed by these warnings with the possibility to miss an important one? I did not post this thread to comp.lang.vhdl because I do believe this is not a vhdl issue but rather a tool issue. Thanks a lot, Al -- A: Because it fouls the order in which people normally read text. Q: Why is top-posting such a bad thing? A: Top-posting. Q: What is the most annoying thing on usenet and in e-mail?Article: 155751
alb <alessandro.basili@cern.ch> wrote: (snip) > I have several ports of my design that are not driving anything > and left 'open' on purpose, using the 'open' keyword in my component > instantiation in vhdl. I think last time I did this (in verilog) I wired them as outputs with the output enable tied low. Some time ago, the tools I used wired unused outputs low, and it turned out that they were connected to other signals on the board (that I didn't know about). -- glenArticle: 155752
chthon wrote:
> Dear all,
>
> I have the following ALU code as part of a data path:
>
> -- purpose: simple ALU
> -- type : combinational
> -- inputs : a, b, op_sel
> -- outputs: y
> alu : PROCESS (a, b, op_sel)
> BEGIN -- PROCESS alu
>   CASE op_sel IS
>     WHEN "0000" => -- increment
>       y <= a + 1;
>     WHEN "0001" => -- decrement
>       y <= a - 1;
>     WHEN "0010" => -- test for zero
>       y <= a;
>     WHEN "0111" => -- addition
>       y <= a + b;
>     WHEN "1000" => -- subtract, compare
>       y <= a - b;
>     WHEN "1010" => -- logical and
>       y <= a AND b;
>     WHEN "1011" => -- logical or
>       y <= a OR b;
>     WHEN "1100" => -- logical xor
>       y <= a XOR b;
>     WHEN "1101" => -- logical not
>       y <= NOT a;
>     WHEN "1110" => -- shift left logical
>       y <= a SLL 1;
>     WHEN "1111" => -- shift right logical
>       y <= a SRL 1;
>     WHEN OTHERS =>
>       y <= a;
>   END CASE;
> END PROCESS alu;
>
> Is it normal that this is synthesized as 14 separate functions, which are then multiplexed through a 14:1 multiplexer onto the y bus? I am just trying to find out if this is the fastest implementation that can be had, or if it is possible to get a faster implementation by mapping more functions into a single slice, so that the multiplexer becomes smaller.
>
> As an example of something similar, the multiplexers generated by default by ISE tend to cascade, while building them from LUT6 gives the possibility to build wide but not deep multiplexers (XAPP522).
>
> Regards,
>
> Jurgen

1 - I only see 12 items in the case, where did 14 come from?

2 - What synthesis tool are you using & what part are you targeting? A lot of optimization is architecture-specific. If your part is good at implementing wide muxes, maybe the solution your synthesis tool found is a good one.

If it were me using Xilinx tools, I'd look at the worst case delays after map (route delays not included) to see how many levels of logic they have and how fast they go. Note that some "logic levels" (e.g. LUT) cause more delay than others (e.g. carry chain).

-- Gabor
Article: 155753
On 8/25/2013 12:44 PM, already5chosen@yahoo.com wrote: > > I just measured Altera Nios2e on Stratix3 - 379 ALMs + 2 M9K blocks (out of 18K memory bits only 2K bits used). It's hard to translate exactly into old-fashioned LUTs, but I'd say - around 700. > Per clock Nios2e is pretty slow, but it clocks rather high and it is a 32-bit CPU - very easy to program in C. I can't say I fully understand the ALM, but I think it functions as a lot more than just a pair of 4 input LUTs. It will do that without any issue. But it will do a lot more and I expect this is used to great advantage in a CPU. I'd say the ALM is equivalent to between 3 and 4 LUT4s depending on the design. I guess it is hard to compare between different device types. > Reimplementing Nios2 in minimal number of LUTs, e.g. trading memory for fabric, could be an interesting exercise, well suitable for coding competition. But, probably, illegal :( Yes, there are always lots of tradeoffs to be considered. -- RickArticle: 155754
On 8/25/2013 9:35 PM, glen herrmannsfeldt wrote: > rickman<gnuarm@gmail.com> wrote: > > (snip) >> I used to rail against the FPGA vendor's decisions in packaging. I find >> it very inconvenient at a minimum. But these days I have learned to do >> the Zen thing and reabsorb my dissatisfaction so as to turn it to an >> advantage. I'm not sure what that means exactly, but I've given up >> trying to think of FPGAs as an MCU substitute. > >> I suppose the markets are different enough that FPGAs just can't be >> produced economically in as wide a range of packaging. I think that is >> what Austin Leesa used to say, that it just costs too much to provide >> the parts in a lot of packages. Their real market is at the high end. >> Like the big CPU makers, it is all about raising the ASP, Average >> Selling Price. Which means they don't spend a lot of time wooing the >> small part users like us. > > In many cases they would put one in a package with fewer pins > than pads, a waste of good I/O drivers, but maybe useful for > some people. They often put parts in packages with a lot fewer pins than pads. Check out nearly any data sheet and you'll see what I mean. The Cyclone V book shows parts with I/O pin counts varying between 240 and 480 depending on the package. > I don't know much much it costs just to support an additional > package, though. Austin Leesa from Xilinx claimed it was prohibitive, at least for the low end devices. > Also, a big problem with FPGAs, and ICs in general, is a low enough > lead inductance. Many packages that would otherwise be useful have > too much inductance. That depends entirely on your design. All the devices I have used have at least two or more levels of I/O pin drive current which will help prevent issues from lead inductance. Even so, there are plenty of small packages with no leads greatly reducing the inductance. -- RickArticle: 155755
On 8/26/2013 6:58 PM, jonesandy@comcast.net wrote:
> On Saturday, August 24, 2013 11:46:59 AM UTC-5, rickman wrote:
>> Two comments. He is describing two packages with the same body
>> size and so the caps would be the same distance from the chip.
>> But also, when you use power and ground planes with effective
>> coupling, the distance of the cap from the chip is nearly moot.
>> The power planes act as a transmission line providing the
>> current until the wave reaches the capacitor. Transmission
>> lines are your friend. -- Rick
>
> Two more comments...
>
> The problem with leaded packages, especially compared to flip-chip packages, is the electrical distance (and characteristic impedance) from the lead/board joint to the die pad, particularly for power/ground connections. The substrate for a CSP looks like a mini-circuit board with its own power/ground planes.
>
> Sure you can put the cap on the board close to the power/ground lead, but you cannot get it electrically as close to the die pad as you can with a flip-chip package.
>
> Transmission lines for power connections are not your friend, unless they are of very low characteristic impedance at the high frequencies of interest (e.g. transition times on your fast outputs, etc.) Until the wave traverses the length of the transmission line, you are effectively supplying current through a resistor with the same value as the transmission line impedance.
>
> What power planes do is provide "very low impedance transmission lines" for the power/ground connections, and the ability to connect an appropriately packaged capacitor to the end of that line with very low inductance.
>
> If your design is slow (output edge rates, not clock rates) or has few simultaneously switching outputs, it won't matter which package you use.

My experience has been that for 95% of the designs done in FPGAs, especially at the low end, this is just not an issue.
FPGAs typically have selectable drive strength which limits the issues of ground bounce in less than electrically desirable packages. If you have fast edges and lots of them, then use one of the more suitable packages. But if you don't, then use the package that otherwise suits the design. -- RickArticle: 155756
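The point made in the post above, that until the wave has traversed the line the supply behaves like a resistor equal to the line's characteristic impedance, can be put into rough numbers. The following is only a sketch: the function names, the 0.5 A current step, the 2 ohm impedance, the 30 mm distance and the relative permittivity of 4 are all illustrative assumptions, not values from the thread.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def initial_droop_v(current_step_a, z0_ohm):
    """Voltage drop seen at the load before any reflection returns.

    Treats the supply path as a resistor of value Z0, as described
    in the post above.
    """
    return current_step_a * z0_ohm


def one_way_delay_s(distance_m, rel_permittivity):
    """Time for the wave to travel from the load to the capacitor.

    Assumes a uniform dielectric; help from the capacitor cannot
    arrive back at the load sooner than twice this delay.
    """
    return distance_m * math.sqrt(rel_permittivity) / C0


if __name__ == "__main__":
    # All numbers below are hypothetical, chosen only for illustration.
    droop = initial_droop_v(current_step_a=0.5, z0_ohm=2.0)
    delay = one_way_delay_s(distance_m=0.03, rel_permittivity=4.0)
    print(f"initial droop: {droop:.2f} V")
    print(f"one-way delay: {delay * 1e12:.0f} ps")
```

With a plane pair the characteristic impedance is far lower than for a lead or trace, which is why both posters agree that planes help; the disagreement is only about whether the package's own leads spoil the picture.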
Hi Glen, On 27/08/2013 20:22, glen herrmannsfeldt wrote: >> I have several ports of my design that are not driving anything >> and left 'open' on purpose, using the 'open' keyword in my component >> instantiation in vhdl. > > I think last time I did this (in verilog) I wired them as outputs > with the output enable tied low. maybe I misstated my problem. The 'ports' I was referring to are ports of components (using vhdl terminology) and therefore are not connected to any physical port. They mostly refer to unused ports of vendor's IPs (like PLL, fifo, etc.). > Some time ago, the tools I used wired unused outputs low, and > it turned out that they were connected to other signals on the > board (that I didn't know about). That's an 'interesting' feature! Could you provide the name of the tool?Article: 155757
On 28/08/2013 11:06, alb wrote: .. > >> Some time ago, the tools I used wired unused outputs low, and >> it turned out that they were connected to other signals on the >> board (that I didn't know about). > > That's an 'interesting' feature! Could you provide the name of the tool? > Quartus used to connect unused pins to ground, not sure if this is still the case. After discovering this my scripts always started with: set_global_assignment -name RESERVE_ALL_UNUSED_PINS "AS INPUT TRI-STATED WITH WEAK PULL-UP" Hans www.ht-lab.comArticle: 155758
On Wednesday, August 28, 2013 11:51:36 AM UTC+3, rickman wrote: > On 8/25/2013 12:44 PM, already5chosen@yahoo.com wrote: > > > > > > I just measured Altera Nios2e on Stratix3 - 379 ALMs + 2 M9K blocks (out of 18K memory bits only 2K bits used). It's hard to translate exactly into old-fashioned LUTs, but I'd say - around 700. > > > Per clock Nios2e is pretty slow, but it clocks rather high and it is a 32-bit CPU - very easy to program in C. > > > > I can't say I fully understand the ALM, but I think it functions as a > lot more than just a pair of 4 input LUTs. It will do that without any > issue. But it will do a lot more and I expect this is used to great > advantage in a CPU. I'd say the ALM is equivalent to between 3 and 4 > LUT4s depending on the design. I guess it is hard to compare between > different device types. > No, ALM is close to two 4-input LUTs. May be, a bit more when implementing complex tightly-coupled logic with high internal complexity to fanout ratio. May be, a bit less, when implementing simple things with lots of registers and high fanout. For sake of the argument, I compiled Nios2e for Cyclone4, which has more old-fashioned architecture - 676 LCs + 2 M9Ks. I also dug out my real-world design from many years ago that embeds Nios2e into Cyclone2. It is even smaller at 565 LCs + 2 M4Ks. > > > > > > Reimplementing Nios2 in minimal number of LUTs, e.g. trading memory for fabric, could be an interesting exercise, well suitable for coding competition. But, probably, illegal :( > > Yes, there are always lots of tradeoffs to be considered. > My point is - if you don't need performance and can use embedded memories then you can design useful 32-bit RISC CPU which would be non-trivially smaller than 600 LCs. Nios2e core that I took as example is small, but hardly minimalistic. It implements full Nios2 architecture including several parts that you probably don't need. 
In particular:
- everything related to interrupts and exceptions
- support for big program address space
- ability to execute programs from any memory other than on-chip SRAM
Article: 155759
Is there a difference between Libero SoC and Actel designer? I use SmartDesign and right-click on the pins and set attribute to unused. -- SvennArticle: 155760
On 8/28/2013 3:06 AM, alb wrote: > Hi Glen, > > On 27/08/2013 20:22, glen herrmannsfeldt wrote: >>> I have several ports of my design that are not driving anything >>> and left 'open' on purpose, using the 'open' keyword in my component >>> instantiation in vhdl. >> >> I think last time I did this (in verilog) I wired them as outputs >> with the output enable tied low. > > maybe I misstated my problem. The 'ports' I was referring to are ports > of components (using vhdl terminology) and therefore are not connected > to any physical port. They mostly refer to unused ports of vendor's IPs > (like PLL, fifo, etc.). > >> Some time ago, the tools I used wired unused outputs low, and >> it turned out that they were connected to other signals on the >> board (that I didn't know about). > > That's an 'interesting' feature! Could you provide the name of the tool? The Xilinx CPLD tools used to do this for unused input pins. The following is from some really old code. I eventually created a dummy net to 'use' the unused inputs so they wouldn't do odd things. Quoting the code: -- A note about the 'xilinx_sucks' net: -- The fitter seems to believe that pins that are defined to be inputs -- which have no logic connected to them internally may be used by the -- CPLD for intermediate logic and as outputs!!!. Without this net, -- CPLD will output a clock signal on the A0 input pin!!! I think that later I found a switch that disabled this 'feature'. I always wondered if that was some kind of ground-bounce reduction feature. Rob.Article: 155761
On Tuesday, August 27, 2013 5:34:46 AM UTC-7, alb wrote:
> Hi everyone, I have several ports of my design that are not driving anything and left 'open' on purpose, using the 'open' keyword in my component instantiation in vhdl. Now I receive loads of 'Warning: CMP201...' from Designer because of this. Is there a way not to be annoyed by these warnings with the possibility to miss an important one? I did not post this thread to comp.lang.vhdl because I do believe this is not a vhdl issue but rather a tool issue. Thanks a lot, Al -- A: Because it fouls the order in which people normally read text. Q: Why is top-posting such a bad thing? A: Top-posting. Q: What is the most annoying thing on usenet and in e-mail?

Al,

Not sure what the issue is, this is just the tool warning you that you have unconnected output ports on instantiated components. If you don't care then ignore them. I find it hard to believe that this is the only warning you are getting, why are you concerned about it?

If you need more comfort this is from the Actel Knowledgebase for compiler warnings: (http://www.actel.com/kb/article.aspx?id=SL1055)

CODE: CMP201
Description: An output net is not driving any inputs and will be removed from the design.
Recommended Action: None required, unless the net should actually be driving logic. If so, correct and re-import the netlist.
Article: 155762
On Tuesday, August 27, 2013 5:34:46 AM UTC-7, alb wrote:
> Hi everyone, I have several ports of my design that are not driving anything and left 'open' on purpose, using the 'open' keyword in my component instantiation in vhdl. Now I receive loads of 'Warning: CMP201...' from Designer because of this. Is there a way not to be annoyed by these warnings with the possibility to miss an important one? I did not post this thread to comp.lang.vhdl because I do believe this is not a vhdl issue but rather a tool issue. Thanks a lot, Al -- A: Because it fouls the order in which people normally read text. Q: Why is top-posting such a bad thing? A: Top-posting. Q: What is the most annoying thing on usenet and in e-mail?

OK I just reread this and noticed you just want a way to silence these warnings if I am reading correctly... nope I do not know how to silence specific warnings with the Microsemi Designer. You probably have to pull the text into python and do it there if you really need to. I quickly looked through the literature I have on Actel and the only related thing I saw was to limit the total number of warnings shown. The default is 10,000 but you can change it with the tcl command "-pdc_eco_max_warnings value" where value is the max number of warnings. This is not going to fix your issue though.
Article: 155763
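The suggestion in the post above, to pull the log text into Python and filter it there, could look roughly like the sketch below. The log line layout is an assumption: the thread only shows that the messages begin with 'Warning: CMP201', so the regular expression, the sample lines and the second warning code (CMP999) are hypothetical and would need adjusting against a real Designer log.

```python
import re

# Assumed layout: a warning line contains "Warning: <CODE>", for
# example "Warning: CMP201: ...". Adjust if the real log differs.
WARNING_RE = re.compile(r"Warning:\s*([A-Z]+\d+)")


def split_warnings(lines, suppressed=("CMP201",)):
    """Return (kept, dropped).

    kept    - lines to review (everything not explicitly suppressed)
    dropped - warning lines whose code is in `suppressed`
    """
    kept, dropped = [], []
    for line in lines:
        match = WARNING_RE.search(line)
        if match and match.group(1) in suppressed:
            dropped.append(line)
        else:
            kept.append(line)
    return kept, dropped


if __name__ == "__main__":
    # Hypothetical sample lines, not copied from a real Designer log.
    sample = [
        "Warning: CMP201: Net pll_lock_unused is not driving any inputs",
        "Warning: CMP999: some other warning that should stay visible",
        "Info: compile finished",
    ]
    kept, dropped = split_warnings(sample)
    print(f"{len(dropped)} suppressed, {len(kept)} kept")
    for line in kept:
        print(line)
```

Keeping the dropped lines in a separate list, rather than discarding them, makes it possible to check that the number of suppressed CMP201 messages matches the number of ports deliberately left 'open', so a newly unconnected net does not slip through unnoticed.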
On 8/26/2013 7:43 PM, jg wrote: > On Saturday, August 24, 2013 1:06:37 PM UTC+12, rickman wrote: >> >> >> I suppose the markets are different enough that FPGAs just can't be >> produced economically in as wide a range of packaging. I think that is >> what Austin Leesa used to say, that it just costs too much to provide >> the parts in a lot of packages. Their real market is at the high end. >> Like the big CPU makers, it is all about raising the ASP, Average >> Selling Price. > > I was meaning more at the lower end, - eg where lattice can offer parts in QFN32, then take a large jump to Qfp100 0.5mm. > QFN32 have only 21 io, so you easily exceed that, but there is a very large gap between the QFN32 and TQFP100. > They claim to be chasing the lower cost markets with these parts, but seem rather blinkered in doing so. > Altera well priced parts in gull wing,(MAX V) but only in a 0.4mm pitch. I think you mean Lattice offers a part in the QFN32. I only found the XO2-256. A selection guide that isn't even a year old says they offer an iCE40 part in this package, but it doesn't show up in the data sheet from 2013. I guess the product line is still a little schizo. They haven't finished cleaning house and any part or package could be next. >> As long as we are wishing for stuff, I'd really love to see a smallish >> MCU mated to a smallish FPGA. > > If you push that 'smallish', the Cypress PSoC series have uC+logic. That's a pretty hard "push". I've looked at them but I don't get a warm fuzzy from a company that makes everything an uphill climb when they seem to think they are making it easy. I've been looking at the PSOC parts since they were new. At that time support was so crude, they had a weekly conference call and if you joined in you got a 1 on 1 training session. That progressed through a long development aimed at making their parts push button and I am pretty sure that won't even come close to working for my needs. 
I need a small FPGA, maybe 1000 LUTs to provide the high speed portion of the design. I don't even need a "real" processor, I bet I could live a rich full life (in this design anyway) with an 8051. In fact, that is an option, to add an MCU for most of the I/O and processing, then use something like the XO2-256 in a QFN32 to do the high speed stuff. I'm just not sure I can fit the design in 256 LUTs. Maybe the QFN32 is small enough I can use two? 5x5 mm! > The newest PSoc4 seems to have solved some of the sticker shock, but I think they crippled the Logic to achieve that. > Seems there is no free lunch. > > Cypress do however, grasp the package issue, and offer QFN40(0.5mm), as well as SSOP28(0.65mm) and TQFP44(0.8mm). Yeah, while the FPGA guys are rather phobic of issuing a lot of package combinations the MCU folks have tons of them. They have a much tougher problem with all the combos of RAM, Flash, I/O count, clock speed, ... I can see why the FPGA people haven't embraced the idea of combining MCU with FPGA, it just doesn't fit their culture. -- RickArticle: 155764
On 8/28/2013 2:58 PM, already5chosen@yahoo.com wrote: > On Wednesday, August 28, 2013 11:51:36 AM UTC+3, rickman wrote: >> On 8/25/2013 12:44 PM, already5chosen@yahoo.com wrote: >> >>> >> >>> I just measured Altera Nios2e on Stratix3 - 379 ALMs + 2 M9K blocks (out of 18K memory bits only 2K bits used). It's hard to translate exactly into old-fashioned LUTs, but I'd say - around 700. >> >>> Per clock Nios2e is pretty slow, but it clocks rather high and it is a 32-bit CPU - very easy to program in C. >> >> >> >> I can't say I fully understand the ALM, but I think it functions as a >> lot more than just a pair of 4 input LUTs. It will do that without any >> issue. But it will do a lot more and I expect this is used to great >> advantage in a CPU. I'd say the ALM is equivalent to between 3 and 4 >> LUT4s depending on the design. I guess it is hard to compare between >> different device types. >> > > No, ALM is close to two 4-input LUTs. May be, a bit more when implementing complex tightly-coupled logic with high internal complexity to fanout ratio. May be, a bit less, when implementing simple things with lots of registers and high fanout. > > For sake of the argument, I compiled Nios2e for Cyclone4, which has more old-fashioned architecture - 676 LCs + 2 M9Ks. > I also dug out my real-world design from many years ago that embeds Nios2e into Cyclone2. It is even smaller at 565 LCs + 2 M4Ks. > > >> >> >> >> >>> Reimplementing Nios2 in minimal number of LUTs, e.g. trading memory for fabric, could be an interesting exercise, well suitable for coding competition. But, probably, illegal :( >> >> Yes, there are always lots of tradeoffs to be considered. >> > > My point is - if you don't need performance and can use embedded memories then you can design useful 32-bit RISC CPU which would be non-trivially smaller than 600 LCs. > Nios2e core that I took as example is small, but hardly minimalistic. 
It implements full Nios2 architecture including several parts that you probably don't need. In particular: > - everything related to interrupts and exceptions > - support for big program address space > - ability to run execute programs from any memories others than on-chip SRAM If the size of the NIOS2 is as small as you say, then that only leaves two issues with using the NIOS2 in my FPGA designs. The first is that I don't need 32 bit data paths in addition to the large memory address bus. I assume this means the instructions are not so compact using more on chip memory than desired. But the really big issue with using the NIOS2 is not technical, Altera won't let you use it on anything that isn't an Altera part. So in reality this is a non-starter no matter how good the NIOS2 is technically. -- RickArticle: 155765
> I think you mean Lattice offers a part in the QFN32. I only found the
> XO2-256. A selection guide that isn't even a year old says they offer
> an iCE40 part in this package, but it doesn't show up in the data sheet
> from 2013. I guess the product line is still a little schizo. They
> haven't finished cleaning house and any part or package could be next.

The part code for this is ICE40LP384-SG32. It is showing on price lists, but still 0 in the stock column. Mouser says 100 due on 9/30/2013.

> In fact, that is an option, to add an MCU for
> most of the I/O and processing, then use something like the XO2-256 in a
> QFN32 to do the high speed stuff. I'm just not sure I can fit the
> design in 256 LUTs. Maybe the QFN32 is small enough I can use two? 5x5mm!

Try it and see. I found the XO2-256 seems to pack full quite well, and the tools are ok to use, so you can find out quite quickly. I did a series of capture counters in XO2-256, and once it worked, I increased the width to fill more of the part. IIRC it got into the 90%+ with no surprises. I've been meaning to compare the ICE40LP384 with the XO2-256; as the iCE40 cell is more primitive, it may not fit more.

-jg
Article: 155766
We're experimenting with heat sinking an Altera Cyclone 3 FPGA. To measure actual die temperature, we built a 19-stage ring oscillator, followed by a divide-by-16 ripple counter, and brought that out. The heat source is the FPGA itself: we just clocked every available flop on the chip at 250 MHz. We stuck a thinfilm thermocouple on the top of the BGA package, and here's what we got: https://dl.dropboxusercontent.com/u/53724080/Thermal/R2_Temp_Cal.jpg We can now use that curve (line, actually!) to evaluate various heat sinking options, for both this chip and the entire board. The equivalent prop delay per CLB seems to be about 350 ps. The prop delay slope is about 0.1% per degree C. -- John Larkin Highland Technology, Inc jlarkin at highlandtechnology dot com http://www.highlandtechnology.com Precision electronic instrumentation Picosecond-resolution Digital Delay and Pulse generators Custom laser drivers and controllers Photonics and fiberoptic TTL data links VME thermocouple, LVDT, synchro acquisition and simulationArticle: 155767
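The ring-oscillator thermometer described in the post above can be sketched numerically. This is a rough model under stated assumptions: the oscillation period is taken as twice the number of stages times the per-stage delay (the usual ring-oscillator relation), the quoted 350 ps is treated as the delay of one stage, and the 0.1% per degree C slope is applied as a linear change in delay around a calibration point. The function names and the calibration values in the usage lines are hypothetical, not taken from the posted graph.

```python
def ring_osc_output_hz(stages=19, stage_delay_s=350e-12, divider=16):
    """Expected frequency at the divider output.

    Assumes period = 2 * stages * stage_delay, i.e. one rising and one
    falling transition must propagate around the ring per cycle.
    """
    ring_hz = 1.0 / (2 * stages * stage_delay_s)
    return ring_hz / divider


def temperature_from_frequency(f_meas_hz, f_ref_hz, t_ref_c,
                               delay_slope_per_c=0.001):
    """Estimate die temperature from the measured output frequency.

    Linear model (an assumption):
        delay(T) = delay_ref * (1 + slope * (T - T_ref))
    Since frequency is inversely proportional to delay,
        delay(T) / delay_ref = f_ref / f_meas.
    """
    delay_ratio = f_ref_hz / f_meas_hz
    return t_ref_c + (delay_ratio - 1.0) / delay_slope_per_c


if __name__ == "__main__":
    print(f"nominal output: {ring_osc_output_hz() / 1e6:.3f} MHz")
    # Hypothetical calibration point: 4.70 MHz measured at 25 C.
    t = temperature_from_frequency(f_meas_hz=4.60e6, f_ref_hz=4.70e6,
                                   t_ref_c=25.0)
    print(f"estimated die temperature: {t:.1f} C")
```

In practice the calibration pair (f_ref_hz, t_ref_c) would come from a measured curve like the one linked in the post, since the absolute delay varies from chip to chip and with supply voltage.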
On Thursday, August 29, 2013 11:23:08 PM UTC+3, rickman wrote:
> On 8/28/2013 2:58 PM, already5chosen@yahoo.com wrote:
> > On Wednesday, August 28, 2013 11:51:36 AM UTC+3, rickman wrote:
> >> On 8/25/2013 12:44 PM, already5chosen@yahoo.com wrote:
> >>> I just measured Altera Nios2e on Stratix3 - 379 ALMs + 2 M9K blocks (out of 18K memory bits only 2K bits used). It's hard to translate exactly into old-fashioned LUTs, but I'd say - around 700.
> >>> Per clock Nios2e is pretty slow, but it clocks rather high and it is a 32-bit CPU - very easy to program in C.
> >>
> >> I can't say I fully understand the ALM, but I think it functions as a
> >> lot more than just a pair of 4 input LUTs. It will do that without any
> >> issue. But it will do a lot more and I expect this is used to great
> >> advantage in a CPU. I'd say the ALM is equivalent to between 3 and 4
> >> LUT4s depending on the design. I guess it is hard to compare between
> >> different device types.
> >
> > No, ALM is close to two 4-input LUTs. May be, a bit more when implementing complex tightly-coupled logic with high internal complexity to fanout ratio. May be, a bit less, when implementing simple things with lots of registers and high fanout.
> >
> > For sake of the argument, I compiled Nios2e for Cyclone4, which has more old-fashioned architecture - 676 LCs + 2 M9Ks.
> > I also dug out my real-world design from many years ago that embeds Nios2e into Cyclone2. It is even smaller at 565 LCs + 2 M4Ks.
> >
> >>> Reimplementing Nios2 in minimal number of LUTs, e.g. trading memory for fabric, could be an interesting exercise, well suitable for coding competition. But, probably, illegal :(
> >>
> >> Yes, there are always lots of tradeoffs to be considered.
> >
> > My point is - if you don't need performance and can use embedded memories then you can design useful 32-bit RISC CPU which would be non-trivially smaller than 600 LCs.
> > Nios2e core that I took as example is small, but hardly minimalistic. It implements full Nios2 architecture including several parts that you probably don't need. In particular:
> > - everything related to interrupts and exceptions
> > - support for big program address space
> > - ability to run execute programs from any memories others than on-chip SRAM
>
> If the size of the NIOS2 is as small as you say,

Nios2e is small. And slow. Nios2s and Nios2f aren't small.

> then that only leaves
> two issues with using the NIOS2 in my FPGA designs. The first is that I
> don't need 32 bit data paths in addition to the large memory address
> bus. I assume this means the instructions are not so compact using more
> on chip memory than desired.

Yes, Nios2 code density is poor. About the same as MIPS32, may be, just a little bit better. Similar to PPC. Measurably worse than "old" ARM. More than 1.5x worse than Thumb2.

> But the really big issue with using the NIOS2 is not technical, Altera
> won't let you use it on anything that isn't an Altera part. So in
> reality this is a non-starter no matter how good the NIOS2 is technically.

I don't understand why.
If you code in C then porting non-hardware-specific parts of your code from Nios2 to any other little-endian 32-bit processor with octet-addressable memory will take very little time. Much much less than porting hardware-specific parts of code from, say, one ARM-Cortex SoC or MCU to another ARM-Cortex SoC or MCU.
If you thought about it in advance, then even porting to big-endian 32-bitter is a non-issue,
After all, we are talking about few KLOCs, at worst, few tens KLOCs. Unless you code in asm, the CPU-related part of porting sounds as absolute non-issue. Esp. if you use gcc on both of your target.

Or, may be, you wanted to say that Nios2 is unsuitable if your original design not based on Altera FPGA? That's, of course, is true.
But, then again, why would you *want* to use Nios2 outside of Altera realm? Other vendors have their own 32-bit soft core solutions. I didn't try them, but would think that in most aspects their solutions are similar to Nios2. Or, as in case of Microsemi, they have licensing agreement with ARM which make Cortex-M1 affordable for low volume products.

In any case, unless the volumes are HUGE, "roll your own soft core" does not sound to me as a right use of developer's time. The only justification for it that I can see about is personal enjoyment.
Article: 155768
On 8/29/2013 5:44 PM, John Larkin wrote: > > > We're experimenting with heat sinking an Altera Cyclone 3 FPGA. To > measure actual die temperature, we built a 19-stage ring oscillator, > followed by a divide-by-16 ripple counter, and brought that out. > > The heat source is the FPGA itself: we just clocked every available > flop on the chip at 250 MHz. We stuck a thinfilm thermocouple on the > top of the BGA package, and here's what we got: > > https://dl.dropboxusercontent.com/u/53724080/Thermal/R2_Temp_Cal.jpg > > > We can now use that curve (line, actually!) to evaluate various heat > sinking options, for both this chip and the entire board. > > The equivalent prop delay per CLB seems to be about 350 ps. The prop > delay slope is about 0.1% per degree C. > > Cute graph. Cheers Phil Hobbs -- Dr Philip C D Hobbs Principal Consultant ElectroOptical Innovations LLC Optics, Electro-optics, Photonics, Analog Electronics 160 North State Road #203 Briarcliff Manor NY 10510 USA +1 845 480 2058 hobbs at electrooptical dot net http://electrooptical.netArticle: 155769
On 8/29/2013 6:27 PM, already5chosen@yahoo.com wrote: > On Thursday, August 29, 2013 11:23:08 PM UTC+3, rickman wrote: >> On 8/28/2013 2:58 PM, already5chosen@yahoo.com wrote: >> >>> On Wednesday, August 28, 2013 11:51:36 AM UTC+3, rickman wrote: >> >>>> On 8/25/2013 12:44 PM, already5chosen@yahoo.com wrote: >> >>>> >> >>>>> >> >>>> >> >>>>> I just measured Altera Nios2e on Stratix3 - 379 ALMs + 2 M9K blocks (out of 18K memory bits only 2K bits used). It's hard to translate exactly into old-fashioned LUTs, but I'd say - around 700. >> >>>> >> >>>>> Per clock Nios2e is pretty slow, but it clocks rather high and it is a 32-bit CPU - very easy to program in C. >> >>>> >> >>>> >> >>>> >> >>>> I can't say I fully understand the ALM, but I think it functions as a >> >>>> lot more than just a pair of 4 input LUTs. It will do that without any >> >>>> issue. But it will do a lot more and I expect this is used to great >> >>>> advantage in a CPU. I'd say the ALM is equivalent to between 3 and 4 >> >>>> LUT4s depending on the design. I guess it is hard to compare between >> >>>> different device types. >> >>>> >> >>> >> >>> No, ALM is close to two 4-input LUTs. May be, a bit more when implementing complex tightly-coupled logic with high internal complexity to fanout ratio. May be, a bit less, when implementing simple things with lots of registers and high fanout. >> >>> >> >>> For sake of the argument, I compiled Nios2e for Cyclone4, which has more old-fashioned architecture - 676 LCs + 2 M9Ks. >> >>> I also dug out my real-world design from many years ago that embeds Nios2e into Cyclone2. It is even smaller at 565 LCs + 2 M4Ks. >> >>> >> >>> >> >>>> >> >>>> >> >>>> >> >>>> >> >>>>> Reimplementing Nios2 in minimal number of LUTs, e.g. trading memory for fabric, could be an interesting exercise, well suitable for coding competition. But, probably, illegal :( >> >>>> >> >>>> Yes, there are always lots of tradeoffs to be considered. 
>> >>>> >> >>> >> >>> My point is - if you don't need performance and can use embedded memories then you can design useful 32-bit RISC CPU which would be non-trivially smaller than 600 LCs. >> >>> Nios2e core that I took as example is small, but hardly minimalistic. It implements full Nios2 architecture including several parts that you probably don't need. In particular: >> >>> - everything related to interrupts and exceptions >> >>> - support for big program address space >> >>> - ability to run execute programs from any memories others than on-chip SRAM >> >> >> >> If the size of the NIOS2 is as small as you say, > > Nios2e is small. And slow. Nios2s and Nios2f aren't small. Slow is a relative term. I expect NIOS is designed for the instruction set rather than for the implementation. From your description the s and f versions burn logic to get speed while the e version is the minimum hardware that can do the job. This is not my idea of how to make an embedded core. I would take the approach of designing a CPU which uses minimal resources as part of its architecture and uses an instruction set that is adequate and efficient rather than being optimized for a language. I am accustomed to writing assembly language code and even micro code for bit slice processors. >> then that only leaves >> two issues with using the NIOS2 in my FPGA designs. The first is that I >> don't need 32 bit data paths in addition to the large memory address >> bus. I assume this means the instructions are not so compact using more >> on chip memory than desired. > > Yes, Nios2 code density is poor. About the same as MIPS32, may be, just a little bit better. Similar to PPC. Measurably worse than "old" ARM. More than 1.5x worse than Thumb2. I can tell by the terms you use that you are thinking in terms of C programming and larger code bases than what I typically do. 
In particular the code for this job would be not far removed from the hardware and in fact would need to be written to work very efficiently with the hardware to meet the hard, real time constraints involved. This is not your typical C program. >> But the really big issue with using the NIOS2 is not technical, Altera >> won't let you use it on anything that isn't an Altera part. So in >> reality this is a non-starter no matter how good the NIOS2 is technically. >> > > I don't understand why. > If you code in C then porting non-hardware-specific parts of your code from Nios2 to any other little-endian 32-bit processor with octet-addressable memory will take very little time. Much much less than porting hardware-specific parts of code from, say, one ARM-Cortex SoC or MCU to another ARM-Cortex SoC or MCU. > If you thought about it in advance, then even porting to big-endian 32-bitter is a non-issue, Yes, you are thinking along very different lines than I am. The idea is not to port the code, but to port the processor. Then there is virtually no work involved other than recompiling the HDL. > After all, we are talking about few KLOCs, at worst, few tens KLOCs. Unless you code in asm, the CPU-related part of porting sounds as absolute non-issue. Esp. if you use gcc on both of your target. Probably not even a single KLOC, lol. All I am doing is replacing some hardware functions with software. Use the ALU and data paths of the CPU to replace the logic and data paths of dedicated hardware. Not tons of work but the timing is important. So once it is written and working and more importantly, verified, I want to never have to touch the code again, just as if it were hardware (well, gateware). So the processor would need to be ported to whatever device this is implemented in. > Or, may be, you wanted to say that Nios2 is unsuitable if your original design not based on Altera FPGA? That's, of course, is true. 
> But, then again, why would you *want* to use Nios2 outside the Altera realm? Other vendors have their own 32-bit soft core solutions. I didn't try them, but would think that in most aspects their solutions are similar to Nios2. Or, as in the case of Microsemi, they have a licensing agreement with ARM which makes Cortex-M1 affordable for low volume products. > > In any case, unless the volumes are HUGE, "roll your own soft core" does not sound to me like a good use of a developer's time. The only justification for it that I can see is personal enjoyment. A CPU design can be as hard or as easy as you want. If you must have C support there is the ZPU, which was designed explicitly for that, but I don't think it is a good match for deterministic real time apps. I have worked on a couple of versions of a stack based processor design which is reasonably efficient. I have some new ideas for something a bit more novel. We'll see what happens. This is all due to the EOL from Lattice. We have until November to get a last time buy in, and a new design won't be needed until those parts are used. So I've likely got a year or so. -- Rick
Article: 155770
On Thu, 29 Aug 2013 20:33:30 -0400, Phil Hobbs <pcdhSpamMeSenseless@electrooptical.net> wrote: >On 8/29/2013 5:44 PM, John Larkin wrote: >> >> >> We're experimenting with heat sinking an Altera Cyclone 3 FPGA. To >> measure actual die temperature, we built a 19-stage ring oscillator, >> followed by a divide-by-16 ripple counter, and brought that out. >> >> The heat source is the FPGA itself: we just clocked every available >> flop on the chip at 250 MHz. We stuck a thinfilm thermocouple on the >> top of the BGA package, and here's what we got: >> >> https://dl.dropboxusercontent.com/u/53724080/Thermal/R2_Temp_Cal.jpg >> >> >> We can now use that curve (line, actually!) to evaluate various heat >> sinking options, for both this chip and the entire board. >> >> The equivalent prop delay per CLB seems to be about 350 ps. The prop >> delay slope is about 0.1% per degree C. >> >> >Cute graph. > >Cheers > >Phil Hobbs I had a minion photograph my whiteboard data and type it into Excel. I don't do Excel. Being now calibrated, I stuck a short pin-fin heat sink on top of the FPGA with some grease, and the chip temp dropped 4C. A tall pin-fin dropped it 4C. A 0.7" square of 0.062 thick aluminum, greasy-stuck to the top, dropped the chip temp 4C. Neat! -- John Larkin Highland Technology, Inc jlarkin at highlandtechnology dot com http://www.highlandtechnology.com Precision electronic instrumentation Picosecond-resolution Digital Delay and Pulse generators Custom laser drivers and controllers Photonics and fiberoptic TTL data links VME thermocouple, LVDT, synchro acquisition and simulationArticle: 155771
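A rough way to turn the numbers John Larkin quotes (19 stages, divide-by-16, about 350 ps per stage, about 0.1% per degree C) into a frequency-to-temperature conversion is sketched below in Python. It assumes the usual ring-oscillator relation f = 1/(2*N*t_pd), a linear delay-versus-temperature model, and a reference temperature of 25 C for the 350 ps figure. None of these are stated in the post, and the function names are made up for illustration.

```python
# Sketch only: converts the divided ring-oscillator output to die temperature.
N_STAGES = 19            # from the post
DIVIDER = 16             # from the post (divide-by-16 ripple counter)
T_PD_REF_S = 350e-12     # from the post: ~350 ps equivalent delay per stage
TEMPCO_PER_C = 0.001     # from the post: ~0.1% delay change per degree C
T_REF_C = 25.0           # ASSUMPTION: temperature at which 350 ps applies


def output_freq_hz(temp_c):
    """Divided output frequency at a given die temperature (linear model)."""
    t_pd = T_PD_REF_S * (1.0 + TEMPCO_PER_C * (temp_c - T_REF_C))
    ring_freq = 1.0 / (2.0 * N_STAGES * t_pd)   # one period = 2 * N * t_pd
    return ring_freq / DIVIDER


def temp_from_freq_c(freq_hz):
    """Invert the model: die temperature from the measured output frequency."""
    t_pd = 1.0 / (2.0 * N_STAGES * DIVIDER * freq_hz)
    return T_REF_C + (t_pd / T_PD_REF_S - 1.0) / TEMPCO_PER_C


if __name__ == "__main__":
    f_ref = output_freq_hz(T_REF_C)
    print("output at reference temperature: %.4f MHz" % (f_ref / 1e6))
    print("temperature for a 0.4%% higher frequency: %.2f C"
          % temp_from_freq_c(f_ref * 1.004))
```

With these assumptions the divided output sits near 4.7 MHz, and a 4 C drop in die temperature raises it by roughly 0.4%. In practice the measured calibration line from the thermocouple would replace the assumed reference point.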
On 8/29/2013 9:14 PM, John Larkin wrote: > On Thu, 29 Aug 2013 20:33:30 -0400, Phil Hobbs > <pcdhSpamMeSenseless@electrooptical.net> wrote: > >> On 8/29/2013 5:44 PM, John Larkin wrote: >>> >>> >>> We're experimenting with heat sinking an Altera Cyclone 3 FPGA. To >>> measure actual die temperature, we built a 19-stage ring oscillator, >>> followed by a divide-by-16 ripple counter, and brought that out. >>> >>> The heat source is the FPGA itself: we just clocked every available >>> flop on the chip at 250 MHz. We stuck a thinfilm thermocouple on the >>> top of the BGA package, and here's what we got: >>> >>> https://dl.dropboxusercontent.com/u/53724080/Thermal/R2_Temp_Cal.jpg >>> >>> >>> We can now use that curve (line, actually!) to evaluate various heat >>> sinking options, for both this chip and the entire board. >>> >>> The equivalent prop delay per CLB seems to be about 350 ps. The prop >>> delay slope is about 0.1% per degree C. >>> >>> >> Cute graph. >> >> Cheers >> >> Phil Hobbs > > I had a minion photograph my whiteboard data and type it into Excel. I > don't do Excel. > > Being now calibrated, I stuck a short pin-fin heat sink on top of the > FPGA with some grease, and the chip temp dropped 4C. A tall pin-fin > dropped it 4C. A 0.7" square of 0.062 thick aluminum, greasy-stuck to > the top, dropped the chip temp 4C. > > Neat! > > Suggesting that the main mechanism is helping transport heat to the leads (or the more distant solder balls), rather than to the air. Pin fin heatsinks are a crock. You don't get any more surface area than a parallel-fin design, and all the discontinuities interfere with the airflow very badly. Cheers Phil Hobbs -- Dr Philip C D Hobbs Principal Consultant ElectroOptical Innovations LLC Optics, Electro-optics, Photonics, Analog Electronics 160 North State Road #203 Briarcliff Manor NY 10510 USA +1 845 480 2058 hobbs at electrooptical dot net http://electrooptical.netArticle: 155772
On Thu, 29 Aug 2013 21:39:28 -0400, Phil Hobbs <pcdhSpamMeSenseless@electrooptical.net> wrote: >On 8/29/2013 9:14 PM, John Larkin wrote: >> On Thu, 29 Aug 2013 20:33:30 -0400, Phil Hobbs >> <pcdhSpamMeSenseless@electrooptical.net> wrote: >> >>> On 8/29/2013 5:44 PM, John Larkin wrote: >>>> >>>> >>>> We're experimenting with heat sinking an Altera Cyclone 3 FPGA. To >>>> measure actual die temperature, we built a 19-stage ring oscillator, >>>> followed by a divide-by-16 ripple counter, and brought that out. >>>> >>>> The heat source is the FPGA itself: we just clocked every available >>>> flop on the chip at 250 MHz. We stuck a thinfilm thermocouple on the >>>> top of the BGA package, and here's what we got: >>>> >>>> https://dl.dropboxusercontent.com/u/53724080/Thermal/R2_Temp_Cal.jpg >>>> >>>> >>>> We can now use that curve (line, actually!) to evaluate various heat >>>> sinking options, for both this chip and the entire board. >>>> >>>> The equivalent prop delay per CLB seems to be about 350 ps. The prop >>>> delay slope is about 0.1% per degree C. >>>> >>>> >>> Cute graph. >>> >>> Cheers >>> >>> Phil Hobbs >> >> I had a minion photograph my whiteboard data and type it into Excel. I >> don't do Excel. >> >> Being now calibrated, I stuck a short pin-fin heat sink on top of the >> FPGA with some grease, and the chip temp dropped 4C. A tall pin-fin >> dropped it 4C. A 0.7" square of 0.062 thick aluminum, greasy-stuck to >> the top, dropped the chip temp 4C. >> >> Neat! >> >> > >Suggesting that the main mechanism is helping transport heat to the >leads (or the more distant solder balls), rather than to the air. Right. It spreads the heat laterally from the hot spot in the center of the package. So, why didn't Altera do that for me? > >Pin fin heatsinks are a crock. You don't get any more surface area than >a parallel-fin design, and all the discontinuities interfere with the >airflow very badly. 
But I can buy one with a thick flat base and peel-off acrylic sticky on the bottom. The pin-fins are just for show. -- John Larkin Highland Technology Inc www.highlandtechnology.com jlarkin at highlandtechnology dot com Precision electronic instrumentation Picosecond-resolution Digital Delay and Pulse generators Custom timing and laser controllers Photonics and fiberoptic TTL data links VME analog, thermocouple, LVDT, synchro, tachometer Multichannel arbitrary waveform generators
Article: 155773
On 8/29/2013 9:45 PM, John Larkin wrote: > On Thu, 29 Aug 2013 21:39:28 -0400, Phil Hobbs > <pcdhSpamMeSenseless@electrooptical.net> wrote: > >> On 8/29/2013 9:14 PM, John Larkin wrote: >>> On Thu, 29 Aug 2013 20:33:30 -0400, Phil Hobbs >>> <pcdhSpamMeSenseless@electrooptical.net> wrote: >>> >>>> On 8/29/2013 5:44 PM, John Larkin wrote: >>>>> >>>>> >>>>> We're experimenting with heat sinking an Altera Cyclone 3 FPGA. To >>>>> measure actual die temperature, we built a 19-stage ring oscillator, >>>>> followed by a divide-by-16 ripple counter, and brought that out. >>>>> >>>>> The heat source is the FPGA itself: we just clocked every available >>>>> flop on the chip at 250 MHz. We stuck a thinfilm thermocouple on the >>>>> top of the BGA package, and here's what we got: >>>>> >>>>> https://dl.dropboxusercontent.com/u/53724080/Thermal/R2_Temp_Cal.jpg >>>>> >>>>> >>>>> We can now use that curve (line, actually!) to evaluate various heat >>>>> sinking options, for both this chip and the entire board. >>>>> >>>>> The equivalent prop delay per CLB seems to be about 350 ps. The prop >>>>> delay slope is about 0.1% per degree C. >>>>> >>>>> >>>> Cute graph. >>>> >>>> Cheers >>>> >>>> Phil Hobbs >>> >>> I had a minion photograph my whiteboard data and type it into Excel. I >>> don't do Excel. >>> >>> Being now calibrated, I stuck a short pin-fin heat sink on top of the >>> FPGA with some grease, and the chip temp dropped 4C. A tall pin-fin >>> dropped it 4C. A 0.7" square of 0.062 thick aluminum, greasy-stuck to >>> the top, dropped the chip temp 4C. >>> >>> Neat! >>> >>> >> >> Suggesting that the main mechanism is helping transport heat to the >> leads (or the more distant solder balls), rather than to the air. > > Right. It spreads the heat laterally from the hot spot in the center of the > package. So, why didn't Altera do that for me? > > >> >> Pin fin heatsinks are a crock. 
You don't get any more surface area than >> a parallel-fin design, and all the discontinuities interfere with the >> airflow very badly. > > But I can buy one with a thick flat base and peel-off acrylic sticky on the > bottom. The pin-fins are just for show. > > It would be amusing to calculate how thick the aluminum has to be before the sticky stuff dominates the thermal conduction. My guess is about 10 mils. Cheers Phil Hobbs -- Dr Philip C D Hobbs Principal Consultant ElectroOptical Innovations LLC Optics, Electro-optics, Photonics, Analog Electronics 160 North State Road #203 Briarcliff Manor NY 10510 USA +1 845 480 2058 hobbs at electrooptical dot net http://electrooptical.net
Article: 155774
On Thursday, August 29, 2013 8:39:28 PM UTC-5, Phil Hobbs wrote: > > Pin fin heatsinks are a crock. You don't get any more surface area than > > a parallel-fin design, and all the discontinuities interfere with the > > airflow very badly. > In natural convection environments, the discontinuities break up the laminar air flow and improve heat transfer to the air, while having negligible impact on overall airflow. Different horses for different courses... Andy
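Phil Hobbs's question earlier in the thread (how thick the aluminum has to be before the adhesive dominates) can be roughed out with a fin-style spreading length, L = sqrt(k_plate * t_plate / h_adh), where h_adh = k_adh / t_adh is the conductance per unit area of the adhesive layer. The Python sketch below is only that: the model is one plausible simplification and not necessarily the calculation he had in mind, and the material values (about 200 W/m/K for aluminum, 0.2 W/m/K and 5 mils for an acrylic tape) are illustrative assumptions, not data from the thread. Only the 0.7 inch plate size comes from the posts.

```python
import math

MIL_M = 25.4e-6                   # one mil (0.001 inch) in metres
HALF_WIDTH_M = 0.35 * 25.4e-3     # half of the 0.7 inch square from the thread

# ASSUMED illustrative material values, not taken from the thread:
K_ALUMINUM = 200.0                # W/(m K), typical aluminum alloy
K_ADHESIVE = 0.2                  # W/(m K), generic acrylic adhesive
T_ADHESIVE_M = 5 * MIL_M          # 5 mil adhesive layer


def adhesive_conductance(k_adh=K_ADHESIVE, t_adh_m=T_ADHESIVE_M):
    """Through-thickness conductance per unit area, W/(m^2 K)."""
    return k_adh / t_adh_m


def spreading_length_m(t_plate_m, k_plate=K_ALUMINUM):
    """Distance heat travels sideways in the plate before leaking through
    the adhesive (same form as the fin parameter 1/m)."""
    return math.sqrt(k_plate * t_plate_m / adhesive_conductance())


def crossover_thickness_m(half_width_m=HALF_WIDTH_M, k_plate=K_ALUMINUM):
    """Plate thickness at which the spreading length equals the half-width."""
    return half_width_m ** 2 * adhesive_conductance() / k_plate


if __name__ == "__main__":
    t = crossover_thickness_m()
    print("crossover thickness: %.1f mils" % (t / MIL_M))
    print("spreading length at 62 mils: %.1f mm"
          % (spreading_length_m(62 * MIL_M) * 1e3))
```

When the spreading length exceeds the plate half-width, the plate is close to isothermal and extra aluminum thickness buys little, so the adhesive sets the limit. Below that thickness the plate's own lateral conduction is the bottleneck. The result scales directly with the assumed adhesive properties, so a measured tape datasheet value should be substituted before drawing conclusions.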