rickman <gnuarm@gmail.com> wrote:

(snip)
>> Input (bit 15 on left):
>> 1 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1
>> 20-bit Output List:
>> 1111 1110 0010 0001 0000

> I am old school so I have to picture these things....

> I see five priority encoders. The first priority encoder output is used
> to disable that input to the second priority encoder, etc.

> How did you code it? I can see how this would be a lot more than three
> levels of logic. The equations get quite long. When you talk about
> levels of logic I assume you mean layers of LUTs? I can see maybe each
> 16 input priority encoder being no more than 3 levels of LUTs, but all
> five layers with the inhibit logic... I don't think so. I expect the
> tools did a pretty good job of optimizing it and I don't easily see any
> way of using carry chains.

It is slightly easier than that. (Some time ago I did this with 36 and 3.)

Consider the case of only two bits high. Use one priority encoder the usual way, and one upside down. (That is, find the highest and lowest set bit.)

Now, as you said, subtract off the highest and lowest, use two more priority encoders on what is left, then subtract those off as well and encode the last one.

But first I would add the logic to determine that only (or at least) five were set.

I suspect that, as the OP noted, it is combinatorially hard. Consider the logic needed to compute each bit of the result separately. Even just the first one.

It does, however, pipeline very well. The one I was working on had a fairly fast clock (66 MHz), but I could stand some levels of latency. The OP claims the need for low latency.

-- glen
Article: 157901
On 5/11/2015 7:26 PM, Kevin Neilson wrote: > To be clearer, here's an example. > > Input (bit 15 on left): > > 1 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 > > 20-bit Output List: > 1111 1110 0010 0001 0000 > 64K x 20 ROM? How badly do you want one level of logic? Rob.Article: 157902
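Rob's single-ROM suggestion can be sanity-checked with a small behavioral model before committing to an HDL version. The sketch below is only an illustration in Python, not anyone's posted code: the name `encode_entry`, the zero padding when fewer than five bits are set, and the truncation when more than five are set are all assumptions, since the thread only gives the five-bit example.

```python
def encode_entry(word, slots=5):
    """Return the ROM word for one 16-bit address.

    The indices of the set bits are packed highest-first into 4-bit fields.
    Assumption (not from the thread): unused fields are zero and any set
    bits beyond the first `slots` are ignored.
    """
    indices = [i for i in range(15, -1, -1) if (word >> i) & 1][:slots]
    value = 0
    for i in indices:
        value = (value << 4) | i
    # Left-justify when fewer than `slots` bits are set.
    return value << (4 * (slots - len(indices)))


# The full 64K x 20 table, one entry per possible input vector.
rom = [encode_entry(address) for address in range(1 << 16)]

# Kevin's example: bits 15, 14, 2, 1 and 0 set.
example = rom[0b1100000000000111]
```

Here `example` is the 20-bit word whose five nibbles are 1111 1110 0010 0001 0000, matching the list in the quoted post. A table like this can also serve as a golden reference for any of the smaller implementations discussed later in the thread.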
Rob Doyle wrote:
> On 5/11/2015 7:26 PM, Kevin Neilson wrote:
>> To be clearer, here's an example.
>>
>> Input (bit 15 on left):
>>
>> 1 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1
>>
>> 20-bit Output List:
>> 1111 1110 0010 0001 0000
>>
>
> 64K x 20 ROM?
>
> How badly do you want one level of logic?
>
> Rob.

Two ROMs, 256 by 23 and 256 by 20. The first encodes the lower 8 bits into 0 to 5 4-bit numbers, right-justified, plus a 3-bit indication of how many bits were set. The second just encodes 0 to 5 4-bit numbers for the upper 8 bits, left-justified (you assume it has the remaining 1's not covered by the low 8 bits). Then one more level selects how many bits to take from each ROM based on the number of ones in the lower 8 bits (each output bit is only a 2:1 mux in this case; that's why the ROMs are justified right and left as noted).

-- Gabor
Article: 157903
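Gabor's split-table scheme can be modeled in a few lines to confirm that a per-field 2:1 selection really is enough. This is a hedged Python sketch, not his implementation: the names `LOW_ROM`, `HIGH_ROM`, `pack_fields` and `encode` are invented here, and the zero padding of unused fields is an assumption.

```python
def pack_fields(indices, justify, slots=5):
    """Pack up to `slots` 4-bit indices into a 20-bit word, padding with zeros."""
    pad = [0] * (slots - len(indices))
    ordered = pad + indices if justify == "right" else indices + pad
    value = 0
    for i in ordered:
        value = (value << 4) | i
    return value


def set_bits(byte, base):
    """Indices (offset by `base`) of the set bits of a byte, highest first, at most 5."""
    return [base + i for i in range(7, -1, -1) if (byte >> i) & 1][:5]


# 256 x 23: a 3-bit count plus a right-justified list for input bits 7..0.
LOW_ROM = [(len(set_bits(b, 0)), pack_fields(set_bits(b, 0), "right"))
           for b in range(256)]
# 256 x 20: a left-justified list for input bits 15..8.
HIGH_ROM = [pack_fields(set_bits(b, 8), "left") for b in range(256)]


def encode(word):
    """Merge the two ROM outputs; each field is a 2:1 choice steered by the count."""
    count, low = LOW_ROM[word & 0xFF]
    high = HIGH_ROM[(word >> 8) & 0xFF]
    low_mask = (1 << (4 * count)) - 1  # the bottom `count` fields come from LOW_ROM
    return (high & ~low_mask) | (low & low_mask)
```

For the example input, `LOW_ROM` supplies count 3 and the fields 2, 1, 0, while `HIGH_ROM` supplies 15 and 14 in the top two fields, so `encode(0b1100000000000111)` gives 0xFE210. Because one list is right-justified and the other left-justified, no shifting is needed at the merge, only the per-field select.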
GaborSzakacs <gabor@alacron.com> wrote: > Rob Doyle wrote: >> On 5/11/2015 7:26 PM, Kevin Neilson wrote: >>> To be clearer, here's an example. >>> Input (bit 15 on left): >>> 1 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 >>> 20-bit Output List: >>> 1111 1110 0010 0001 0000 >> 64K x 20 ROM? Hmm, that probably does work. Especially with the BRAM in many FPGAs. When I did it some years ago, it was three out of 36 bits, and ROMs weren't, and still aren't, that big. >> How badly do you want one level of logic? > Two ROMs, 256 by 23 and 256 by 20. The first encodes the > lower 8 bits into 0 to 5 4-bit numbers right-justified plus > a 3-bit indication of how many bits were set. The second > just encodes 0 to 5 4-bit numbers, left justified (you assume > it has the remaining 1's not covered by the low 8 bits. Then > one more level to select how many bits to take from each ROM > based on the number of ones in the lower 8 bits (each output > bit is only a 2:1 mux in this case - that's why the ROMs > are justified right and left as noted). If not in BRAM, that seems a good way. How many levels of logic is a 256 bit ROM in FPGA LUTs? -- glenArticle: 157904
On 5/12/2015 11:54 AM, glen herrmannsfeldt wrote: > GaborSzakacs <gabor@alacron.com> wrote: >> Rob Doyle wrote: >>> On 5/11/2015 7:26 PM, Kevin Neilson wrote: >>>> To be clearer, here's an example. > >>>> Input (bit 15 on left): > >>>> 1 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 > >>>> 20-bit Output List: >>>> 1111 1110 0010 0001 0000 > >>> 64K x 20 ROM? > > Hmm, that probably does work. Especially with the BRAM in many FPGAs. > > When I did it some years ago, it was three out of 36 bits, and ROMs > weren't, and still aren't, that big. > >>> How badly do you want one level of logic? > >> Two ROMs, 256 by 23 and 256 by 20. The first encodes the >> lower 8 bits into 0 to 5 4-bit numbers right-justified plus >> a 3-bit indication of how many bits were set. The second >> just encodes 0 to 5 4-bit numbers, left justified (you assume >> it has the remaining 1's not covered by the low 8 bits. Then >> one more level to select how many bits to take from each ROM >> based on the number of ones in the lower 8 bits (each output >> bit is only a 2:1 mux in this case - that's why the ROMs >> are justified right and left as noted). > > If not in BRAM, that seems a good way. > > How many levels of logic is a 256 bit ROM in FPGA LUTs? Like most things, "it depends". The older devices have 16 bits per LUT, if any, and many of the newer devices have 64 bits per LUT. Nearly all devices have BRAMs so it's a no brainer by the split table method. Only trouble is BRAMs use a clock cycle... I'm just sayin'... -- RickArticle: 157905
rickman <gnuarm@gmail.com> wrote: (snip, I wrote) >> Hmm, that probably does work. Especially with the BRAM in many FPGAs. >> When I did it some years ago, it was three out of 36 bits, and ROMs >> weren't, and still aren't, that big. (snip) >> How many levels of logic is a 256 bit ROM in FPGA LUTs? > Like most things, "it depends". The older devices have 16 bits per LUT, > if any, and many of the newer devices have 64 bits per LUT. > Nearly all devices have BRAMs so it's a no brainer by the split table > method. Only trouble is BRAMs use a clock cycle... I'm just sayin'... OK, going back the OP says that three levels of logic is fine. Doesn't say how many pipeline stages. I presume a clock is available for the BRAM, but I am not sure. -- glenArticle: 157906
glen herrmannsfeldt wrote:
> GaborSzakacs <gabor@alacron.com> wrote:
>> Rob Doyle wrote:
>>> On 5/11/2015 7:26 PM, Kevin Neilson wrote:
>>>> To be clearer, here's an example.
>
>>>> Input (bit 15 on left):
>
>>>> 1 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1
>
>>>> 20-bit Output List:
>>>> 1111 1110 0010 0001 0000
>
>>> 64K x 20 ROM?
>
> Hmm, that probably does work. Especially with the BRAM in many FPGAs.
>
> When I did it some years ago, it was three out of 36 bits, and ROMs
> weren't, and still aren't, that big.
>
>>> How badly do you want one level of logic?
>
>> Two ROMs, 256 by 23 and 256 by 20. The first encodes the
>> lower 8 bits into 0 to 5 4-bit numbers right-justified plus
>> a 3-bit indication of how many bits were set. The second
>> just encodes 0 to 5 4-bit numbers, left justified (you assume
>> it has the remaining 1's not covered by the low 8 bits. Then
>> one more level to select how many bits to take from each ROM
>> based on the number of ones in the lower 8 bits (each output
>> bit is only a 2:1 mux in this case - that's why the ROMs
>> are justified right and left as noted).
>
> If not in BRAM, that seems a good way.
>
> How many levels of logic is a 256 bit ROM in FPGA LUTs?
>
> -- glen
>

In Xilinx 7-series:

A 256-bit LUT fits in 1 SLICEL. It uses all 4 64-bit LUTs and three muxes (two F7 muxes, one for each pair of LUTs, and one F8 mux to combine the results), so it would show up as 3 levels of logic, but it all routes internally to the slice and the muxes are really fast.

Convincing the tools to use LUT memory is the fun part. Here's my test code:

module simple_lut
(
  input wire [7:0] addr,
  output wire [7:0] data
);

(* RAM_STYLE = "distributed" *) reg [7:0] lut_mem [0:255];

initial $readmemh ("../source/lutcontents.hex", lut_mem);

assign data = lut_mem[addr];

endmodule

-- Gabor
Article: 157907
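The contents of lutcontents.hex are not shown in Gabor's post. As a rough illustration of how such a file might be produced, the Python sketch below formats a list of words as plain hex text, one word per line, which is a layout `$readmemh` accepts. The helper name `to_readmemh` and the sample contents (the number of set bits in each address) are assumptions made only for this example.

```python
def to_readmemh(words, width_bits):
    """Format integer words as $readmemh text: one zero-padded hex word per line."""
    digits = (width_bits + 3) // 4
    limit = 1 << width_bits
    lines = []
    for address, word in enumerate(words):
        if not 0 <= word < limit:
            raise ValueError(f"word at address {address} does not fit in {width_bits} bits")
        lines.append(f"{word:0{digits}x}")
    return "\n".join(lines) + "\n"


# Hypothetical 256 x 8 contents: the count of set bits in each address.
hex_text = to_readmemh([bin(a).count("1") for a in range(256)], 8)

# Writing the file is left to the caller, for example:
#   with open("lutcontents.hex", "w") as f:
#       f.write(hex_text)
```

Generating the table from a script keeps the ROM contents and any software reference model in step, which matters more once the table encodes something less trivial than a bit count.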
On 5/12/2015 2:03 PM, glen herrmannsfeldt wrote: > rickman <gnuarm@gmail.com> wrote: > > (snip, I wrote) >>> Hmm, that probably does work. Especially with the BRAM in many FPGAs. > >>> When I did it some years ago, it was three out of 36 bits, and ROMs >>> weren't, and still aren't, that big. > > (snip) > >>> How many levels of logic is a 256 bit ROM in FPGA LUTs? > >> Like most things, "it depends". The older devices have 16 bits per LUT, >> if any, and many of the newer devices have 64 bits per LUT. > >> Nearly all devices have BRAMs so it's a no brainer by the split table >> method. Only trouble is BRAMs use a clock cycle... I'm just sayin'... > > OK, going back the OP says that three levels of logic is fine. > Doesn't say how many pipeline stages. > > I presume a clock is available for the BRAM, but I am not sure. The OP said, "minimal latency" and I think I over spec'd that to mean combinatorial. So one clock for the BRAM and one clock for the muxing should be ok. Better than the 5 clock cycles he mentions. -- RickArticle: 157908
On 5/12/2015 2:09 PM, GaborSzakacs wrote: > glen herrmannsfeldt wrote: >> GaborSzakacs <gabor@alacron.com> wrote: >>> Rob Doyle wrote: >>>> On 5/11/2015 7:26 PM, Kevin Neilson wrote: >>>>> To be clearer, here's an example. >> >>>>> Input (bit 15 on left): >> >>>>> 1 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 >> >>>>> 20-bit Output List: >>>>> 1111 1110 0010 0001 0000 >> >>>> 64K x 20 ROM? >> >> Hmm, that probably does work. Especially with the BRAM in many FPGAs. >> >> When I did it some years ago, it was three out of 36 bits, and ROMs >> weren't, and still aren't, that big. >> >>>> How badly do you want one level of logic? >> >>> Two ROMs, 256 by 23 and 256 by 20. The first encodes the >>> lower 8 bits into 0 to 5 4-bit numbers right-justified plus >>> a 3-bit indication of how many bits were set. The second >>> just encodes 0 to 5 4-bit numbers, left justified (you assume >>> it has the remaining 1's not covered by the low 8 bits. Then >>> one more level to select how many bits to take from each ROM >>> based on the number of ones in the lower 8 bits (each output >>> bit is only a 2:1 mux in this case - that's why the ROMs >>> are justified right and left as noted). >> >> If not in BRAM, that seems a good way. >> How many levels of logic is a 256 bit ROM in FPGA LUTs? >> >> -- glen >> > > In Xilinx 7-series: > > A 256-bit LUT fits in 1 SLICEL. It uses all 4 64-bit LUTS and three > muxes (two for each pair of LUTS and one to combine the result), so it > would show up as 3 levels of logic, but it all routes internally to the > slice and the muxes are really fast. > > Convincing the tools to use LUT memory is the fun part. Here's my test > code: > > module simple_lut > ( > input wire [7:0] addr, > output wire [7:0] data > ); > > (* RAM_STYLE = "distributed" *) reg [7:0] lut_mem [0:255]; > > initial $readmemh ("../source/lutcontents.hex", lut_mem); > > assign data = lut_mem[addr]; > > endmodule It would be 43 x 4 or 172 LUTs. A fair amount plus the muxes to combine the outputs. 
Still, it is mostly in parallel so it should be fairly fast. -- RickArticle: 157909
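Rick's 172-LUT figure follows from the ROM widths given earlier in the thread, and the arithmetic is easy to reproduce. The Python below is only a back-of-the-envelope sketch: the one-LUT-per-output-bit cost for the final 2:1 muxes is an assumption, so the total is an upper bound and not a synthesis result.

```python
LUT6_BITS = 64  # one 6-input LUT stores 64 bits


def luts_per_rom_bit(depth):
    """LUT6s needed for one output bit of a ROM with `depth` entries (ceiling division)."""
    return -(-depth // LUT6_BITS)


rom_output_bits = 23 + 20                              # the 256x23 and 256x20 ROMs
rom_luts = rom_output_bits * luts_per_rom_bit(256)     # 43 * 4 = 172

# Assumption: each of the 20 output bits costs at most one more LUT for its
# 2:1 mux, including the decode of the 3-bit count that steers it.
mux_luts_upper_bound = 20
total_upper_bound = rom_luts + mux_luts_upper_bound    # 192
```

Under these assumptions the whole structure lands between 172 and 192 LUTs.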
The idea of working from both sides is useful. Finding the 5th bit set is a lot harder than finding the 1st because you have to keep a running sum of the bits already set, so that eats up a few inputs of each LUT. If I search from both ends the running sum would be no bigger than 2 (finding, say, 3 starting from the top and 2 starting from the bottom). When I draw this out I still have at least 3 levels of logic plus a few levels of carry chain mux, which I'd probably still have to pipeline one stage, but that's acceptable. Vivado surely won't use the carry chain mux unless I instantiate it anyway, and then it would be 5-6 levels of logic, so I'd definitely need to pipeline that one stage.
Article: 157910
Vivado made it 16 levels of logic, and I can't tell exactly what it's doing, but this is how I would expect it would work: the first output is the easiest. You just find the leading 1 with a priority encoder and encode it. You can look at the first 5 bits with the first level, using the 6th LUT input for an input from the next level if none of those 5 bits are set, and so on. This requires 4 levels of LUTs. One could use the carry chain muxes to speed things up but you'd have to instantiate them because Vivado doesn't seem to know how to do that. So that first output requires 4 LUTs x 4 bits.

But the 5th encoded output is harder, because you have to keep a running 3-bit sum of the number of set bits already encountered, so 3 bits of each LUT after the first are needed for the running sum, and the sum itself requires 2 levels of logic. (I can't post pictures here, can I?) So now you end up with what I calculate should be 7 levels of logic, or 3 levels of LUT and 5 levels of carry chain mux. I could maybe do this if I pipeline it and I can get Vivado to synthesize it properly. But it just seems like there should be some easier way.
Article: 157911
Yes, that would work. I think it would be about 180 LUTs, which is quite a bit. It would probably work in one cycle: there is a LUT, an F7/F8 mux, and a second level of LUT for the mux, and only 2 levels of net routed on the fabric.
Article: 157912
I have plenty of BRAMs and don't mind using them, but they're a pain sometimes. I have to use the output registers so they have 2 cycles of latency, and often I have to add another cycle just to route data to or from the BRAM column. They come in handy, though. Lately I've been using them for a lot of Galois arithmetic, such as lookup tables for 1/x.
Article: 157913
On Tuesday, May 12, 2015 at 12:10:35 PM UTC-6, Gabor wrote:
> glen herrmannsfeldt wrote:
> > GaborSzakacs <gabor@alacron.com> wrote:
> >> Rob Doyle wrote:
> >>> On 5/11/2015 7:26 PM, Kevin Neilson wrote:
> >>>> To be clearer, here's an example.
> >
> >>>> Input (bit 15 on left):
> >
> >>>> 1 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1
> >
> >>>> 20-bit Output List:
> >>>> 1111 1110 0010 0001 0000
> >
> >>> 64K x 20 ROM?
> >
> > Hmm, that probably does work. Especially with the BRAM in many FPGAs.
> >
> > When I did it some years ago, it was three out of 36 bits, and ROMs
> > weren't, and still aren't, that big.
> >
> >>> How badly do you want one level of logic?
> >
> >> Two ROMs, 256 by 23 and 256 by 20. The first encodes the
> >> lower 8 bits into 0 to 5 4-bit numbers right-justified plus
> >> a 3-bit indication of how many bits were set. The second
> >> just encodes 0 to 5 4-bit numbers, left justified (you assume
> >> it has the remaining 1's not covered by the low 8 bits. Then
> >> one more level to select how many bits to take from each ROM
> >> based on the number of ones in the lower 8 bits (each output
> >> bit is only a 2:1 mux in this case - that's why the ROMs
> >> are justified right and left as noted).
> >
> > If not in BRAM, that seems a good way.
> >
> > How many levels of logic is a 256 bit ROM in FPGA LUTs?
> >
> > -- glen
> >
>
> In Xilinx 7-series:
>
> A 256-bit LUT fits in 1 SLICEL. It uses all 4 64-bit LUTS and three
> muxes (two for each pair of LUTS and one to combine the result), so it
> would show up as 3 levels of logic, but it all routes internally to the
> slice and the muxes are really fast.
>
> Convincing the tools to use LUT memory is the fun part. Here's my test
> code:
>
> module simple_lut
> (
>   input wire [7:0] addr,
>   output wire [7:0] data
> );
>
> (* RAM_STYLE = "distributed" *) reg [7:0] lut_mem [0:255];
>
> initial $readmemh ("../source/lutcontents.hex", lut_mem);
>
> assign data = lut_mem[addr];
>
> endmodule
>
> --
> Gabor

The ROM can be fast if you use the F7/F8 muxes built into the slice. I've found the key thing for V7 is to minimize the number of routes on the fabric. The F7/F8 muxes are slower than LUTs, I think, but since you don't have to route the connecting net onto the fabric you save a lot of time.
Article: 157914
On Monday, 11 May 2015 at 19:59:43 UTC+2, John Larkin wrote:
> Does anyone know if the ZYNQ chips have an internal high-temperature
> shutdown? They are behaving like they do.
>

Looks like you have to enable it (it may be default) and you have to load the PL.

30.3.6 Critical Over-temperature Alarm

Note: This feature sends an interrupt status to the PS and causes an automatic shutdown feature for the PL side of the Zynq-7000 device if enabled. The PL shutdown is enabled via the bitstream and the PL will only come out of power-down if the over-temperature alarm goes inactive or a reconfiguration occurs.

The on-chip temperature measurement is used for critical temperature warnings. The default over temperature threshold is 125°C. This threshold is used when the contents of the OT Upper Alarm register (listed in UG480) have not been configured. When the die temperature exceeds the threshold set in the XADC's Control register, the over-temperature alarm (OT) becomes active. The OT signal resets when the die temperature has fallen below set threshold.

The OT alarm can also be used to automatically power down the PL upon activation. The OT alarm can be disabled by writing a 1 to the OT bit in the XADC's Configuration register.

Note: these registers are in the XADC and are accessible using the DRP.

-Lasse
Article: 157915
Kevin Neilson wrote: > Yes, that would work. I think it would be about about 180 LUTs, which is quite a bit. It would probably work in one cycle: there is a LUT, F7/F8 mux, and a second level of LUT for the mux, and only 2 levels of net routed on the fabric. If you decide to pipeline it, a register placed after the 256-deep LUT will go into the same slice with the 4 LUTs, 2 F7 muxes and F8 mux. Then the final 2:1 would go after a standard fabric register, which has pretty small clock to Q (much better than BRAM without the output register). Even in a -2 Artix you can run above 500 MHz with this arrangement. -- GaborArticle: 157916
On 5/12/2015 3:11 PM, Kevin Neilson wrote: > I have plenty of BRAMs and don't mind using them, but they're a pain > sometimes. I have to use the output registers so they have 2 cycles > of latency, and often I have to add another cycle just to route data > to or from the BRAM column. They come in handy, though. Lately I've > been using them for a lot of Galois arithmetic, such as lookup tables > for 1/x. Why do you have to use the output registers? The clock to out time on a BRAM has always been very fast as is the setup time. The ones I've worked with were only slightly slower than a FF in the context of typical delays in logic and fabric. What is your clock speed? If you are working in a large part the LUTs are not an unreasonable way to implement this. Not sure how fast the resulting logic will be, but it should be in the same ballpark as the BRAM but purely combinatorial. Do you need to run faster than 100 MHz? -- RickArticle: 157917
On Tuesday, May 12, 2015 at 4:29:39 PM UTC-6, rickman wrote:
> On 5/12/2015 3:11 PM, Kevin Neilson wrote:
> > I have plenty of BRAMs and don't mind using them, but they're a pain
> > sometimes. I have to use the output registers so they have 2 cycles
> > of latency, and often I have to add another cycle just to route data
> > to or from the BRAM column. They come in handy, though. Lately I've
> > been using them for a lot of Galois arithmetic, such as lookup tables
> > for 1/x.
>
> Why do you have to use the output registers? The clock to out time on a
> BRAM has always been very fast as is the setup time. The ones I've
> worked with were only slightly slower than a FF in the context of
> typical delays in logic and fabric. What is your clock speed?
>
> If you are working in a large part the LUTs are not an unreasonable way
> to implement this. Not sure how fast the resulting logic will be, but
> it should be in the same ballpark as the BRAM but purely combinatorial.
> Do you need to run faster than 100 MHz?
>
> --
>
> Rick

I'm using 350 MHz, or a period of about 2.86 ns. The clk->out time for a V7 -1 BRAM (without output reg) is about 2.1 ns, so if I didn't use the BRAM output register, I'd barely have enough time to get the output across a net to a FF. And I know even that usually won't meet timing, because Vivado is fond of pulling the output registers out of my BRAMs and putting them into slices, I guess because it thinks it has extra slack and can give some of it to the next path. But then the net to the FF will be 600 ps and the path will fail. I have not figured out how to make Vivado stop doing this (except by instantiating BRAM primitives).
Article: 157917
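Kevin's timing argument comes down to simple subtraction, sketched here in Python using the figures from his post (350 MHz, 2.1 ns clock-to-out, 600 ps net). The function names are invented for this illustration, and the flip-flop setup time is left as a parameter defaulting to zero because the thread does not give one.

```python
def period_ns(f_mhz):
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / f_mhz


def slack_ns(f_mhz, clk_to_out_ns, net_ns=0.0, setup_ns=0.0):
    """Time left in one cycle after clock-to-out, routing and setup are subtracted."""
    return period_ns(f_mhz) - clk_to_out_ns - net_ns - setup_ns


# Unregistered BRAM output: what remains for routing plus setup at 350 MHz.
budget_unregistered = slack_ns(350, 2.1)            # about 0.757 ns
# The same path with the 600 ps net Kevin mentions, before any setup time.
budget_with_net = slack_ns(350, 2.1, net_ns=0.6)    # about 0.157 ns
```

With roughly 0.16 ns left before setup time and clock uncertainty are counted, it is plausible that the path fails once the register has been pulled out of the BRAM, which is consistent with his experience.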
On 5/12/2015 2:57 PM, Kevin Neilson wrote:
> Vivado made it 16 levels of logic, and I can't tell exactly what it's doing, but this is how I would expect it would work: the first output is the easiest. You just find the leading 1 with a priority encoder and encode it. You can look at the first 5 bits with the first level, using the 6th LUT input for an input from the next level if none of those 5 bits are set, and so on. This requires 4 levels of LUTs. One could use the carry chain muxes to speed things up but you'd have to instantiate them because Vivado doesn't seem to know how to do that. So that first output requires 4 LUTs x 4 bits.
>
> But the 5th encoded output is harder, because you have to keep a running 3-bit sum of the number of set bits already encountered, so 3 bits of each LUT after the first are needed for the running sum, and the sum itself requires 2 levels of logic. (I can't post pictures here, can I?) So now you end up with what I calculate should be 7 levels of logic, or 3 levels of LUT and 5 levels of carry chain mux. I could maybe do this if I pipeline it and I can get Vivado to synthesize it properly. But it just seems like there should be some easier way.

Sorry, I just can't picture what you are doing. What is the "running sum" for? I think I might understand. You look at the first 5 inputs and output codes for all five positions. I'm not sure why you can't look at the first 6 inputs though. This outputs a three bit code of the number of 1's found. The second block looks at the next five inputs and outputs five codes. The last five bits would be like the second group and have a mux with the second group, which in turn is what actually feeds the first mux. The first group would be one level of LUTs. The following two groups

Let me try to draw this...

        ,------, 3                         ,-----,
0-5     |      |--/------------------------|SEL  |
-->--   |      | 20                        |     | 20
        |      |--/------------------------| BUM*|--/-->--
        '------'                           |     |
                                       ,---|     |
        ,------, 3         ,-----,     |   '-----'
6-10    |      |--/--------|SEL  |     |
-->--   |      | 20        |     |     |
        |      |--/--------|     |     |
        '------'           |     | 20  |
                           | BUM*|--/--'
        ,------,           |     |
11-15   |      |           |     |
-->--   |      | 20        |     |
        |      |--/--------|     |
        '------'           '-----'

*Big, Ugly Mux

The mux might be hard to work out and will surely be more than 1 level of LUTs.... unless you can use the magic muxes in the slice to combine multiple LUTs into a 6 input mux. You don't need any adders for the counts since each 3 bit count controls a separate mux. This might just work in three levels of LUTs if you can use multiple LUTs to form a 6 input mux.

I just read your post where you said you were running at 350 MHz. I guess even this will have to be pipelined. But it should be less logic than the brute force distributed RAM approach. But who knows until the LUTs are counted? In essence this is the same thing I guess. It might work better with the larger front end blocks and just one mux.

I'm very surprised the clock to out time on the V7 BRAM is 2.1ns. I think that is about the same number as the Spartan 3s from long ago. Am I mistaken?

--

Rick
Article: 157919
On 5/12/2015 4:57 PM, lasselangwadtchristensen@gmail.com wrote:
> On Monday, 11 May 2015 at 19:59:43 UTC+2, John Larkin wrote:
>> Does anyone know if the ZYNQ chips have an internal high-temperature
>> shutdown? They are behaving like they do.
>>
>
> looks like you have to enable it (it may be default) and you have to load the PL
>
> 30.3.6 Critical Over-temperature Alarm
> Note: This feature sends an interrupt status to the PS and causes an automatic shutdown feature for
> the PL side of the Zynq-7000 device if enabled. The PL shutdown is enabled via the bitstream and the
> PL will only come out of power-down if the over-temperature alarm goes inactive or a
> reconfiguration occurs.
> The on-chip temperature measurement is used for critical temperature warnings. The default over
> temperature threshold is 125°C. This threshold is used when the contents of the OT Upper Alarm
> register (listed in UG480) have not been configured. When the die temperature exceeds the
> threshold set in the XADC's Control register, the over-temperature alarm (OT) becomes active. The OT
> signal resets when the die temperature has fallen below set threshold.
> The OT alarm can also be used to automatically power down the PL upon activation. The OT alarm can
> be disabled by writing a 1 to the OT bit in the XADC's Configuration register.
> Note: these registers are in the XADC and are accessible using the DRP.

Without me digging into the data sheet myself, can you tell me what the PL and PS are?

--

Rick
Article: 157920
On Tue, 12 May 2015 21:08:09 -0400, rickman wrote: > On 5/12/2015 4:57 PM, lasselangwadtchristensen@gmail.com wrote: >> Den mandag den 11. maj 2015 kl. 19.59.43 UTC+2 skrev John Larkin: >>> Does anyone know if the ZYNQ chips have an internal high-temperature >>> shutdown? They are behaving like they do. >>> >>> >> looks like you have to enable it (it may be default) and you have to >> load the PL >> > Without me digging into the data sheet myself, can you tell me what the > PL and PS are? Programmable Logic (FPGA side of things) and Processor System (hard ARM processor and some peripherals). -- Rob Gaddi, Highland Technology -- www.highlandtechnology.com Email address domain is currently out of order. See above to fix.Article: 157921
On Tue, 12 May 2015 13:57:47 -0700 (PDT), lasselangwadtchristensen@gmail.com wrote:

>On Monday, 11 May 2015 at 19:59:43 UTC+2, John Larkin wrote:
>> Does anyone know if the ZYNQ chips have an internal high-temperature
>> shutdown? They are behaving like they do.
>>
>
>looks like you have to enable it (it may be default) and you have to load the PL
>
>30.3.6 Critical Over-temperature Alarm
>Note: This feature sends an interrupt status to the PS and causes an automatic shutdown feature for
>the PL side of the Zynq-7000 device if enabled. The PL shutdown is enabled via the bitstream and the
>PL will only come out of power-down if the over-temperature alarm goes inactive or a
>reconfiguration occurs.
>The on-chip temperature measurement is used for critical temperature warnings. The default over
>temperature threshold is 125°C. This threshold is used when the contents of the OT Upper Alarm
>register (listed in UG480) have not been configured. When the die temperature exceeds the
>threshold set in the XADC's Control register, the over-temperature alarm (OT) becomes active. The OT
>signal resets when the die temperature has fallen below set threshold.
>The OT alarm can also be used to automatically power down the PL upon activation. The OT alarm can
>be disabled by writing a 1 to the OT bit in the XADC's Configuration register.
>Note: these registers are in the XADC and are accessible using the DRP.
>
>-Lasse

It's probably shutting down at 125C, without our specifically programming any temperature.

Extensive searching, by us and by Avnet, finds no fan that matches the hole spacing on the MicroZed board. So we'll fab a little aluminum adapter plate and use a standard fan. With a pin-fin heat sink glued to the 7020 FPGA, and the fan blowing down on that, we can run at 100C ambient.

--

John Larkin Highland Technology, Inc
picosecond timing laser drivers and controllers

jlarkin att highlandtechnology dott com
http://www.highlandtechnology.com
Article: 157922
On 5/12/2015 9:42 PM, John Larkin wrote: > > It's probably shutting down at 125C, without our specifically > programming any temperature. > > Extensive searching, by us and by Avnet, finds no fan that matches the > hole spacing on the MicroZed board. So we'll fab a little aluminum > adapter plate and use a standard fan. With a pin-fin heat sink glued > to the 7020 FPGA, and the fan blowing down on that, we can run at 100C > ambient. Reminds me of an array processor I worked on in the early 80's. It had ECL gate arrays in ceramic PGA packages with a heat sink on each chip and a specially designed plenum which slid over each one to direct air across the heat sink. This machine was as fast as a CRAY-1 and only a few years later. -- RickArticle: 157923
On Tuesday, May 12, 2015 at 5:37:27 PM UTC-6, rickman wrote: > On 5/12/2015 2:57 PM, Kevin Neilson wrote: > > Vivado made it 16 levels of logic, and I can't tell exactly what it's d= oing, but this is how I would expect it would work: the first output is th= e easiest. You just find the leading 1 with a priority encoder and encode = it. You can look at the first 5 bits with the first level, using the 6th L= UT input for an input from the next level if none of those 5 bits are set, = and so on. This requires 4 levels of LUTs. One could use the carry chain = muxes to speed things up but you'd have to instantiate them because Vivado = doesn't seem to know how to do that. So that first output requires 4 LUTs = x 4 bits. > > > > But the 5th encoded output is harder, because you have to keep a runnin= g 3-bit sum of the number of set bits already encountered, so 3 bits of eac= h LUT after the first are needed for the running sum, and the sum itself re= quires 2 levels of logic. (I can't post pictures here, can I?) So now you= end up with what I calculate should be 7 levels of logic, or 3 levels of L= UT and 5 levels of carry chain mux. I could maybe do this if I pipeline it= and I can get Vivado to synthesize it properly. But it just seems like th= ere should be some easier way. >=20 > Sorry, I just can't picture what you are doing. What is the "running=20 > sum" for? I think I might understand. You look at the first 5 inputs=20 > and output codes for all five positions. I'm not sure why you can't=20 > look at the first 6 inputs though. This outputs a three bit code of the= =20 > number of 1's found. The second block looks at the next five inputs=20 > and outputs five codes. The last five bits would be like the second=20 > group and have a mux with the second group when in turn is what actually= =20 > feeds the first mux. The first group would be one level of LUTs. The=20 > following two groups >=20 > Let me try to draw this... 
>=20 > ,------, 3 ,-----, > 0-5 | |--/------------------------|SEL | > -->--| | 20 | | 20 > | |--/------------------------| BUM*|--/-->-- > '------' | | > ,---| | > ,------, 3 ,-----, | '-----' > 6-10 | |--/--------|SEL | | > -->--| | 20 | | | > | |--/--------| | | > '------' | | 20 | > | BUM*|--/--' > ,------, | | > 11-15 | | | | > -->--| | 20 | | > | |--/--------| | > '------' '-----' >=20 > *Big, Ugly Mux >=20 > The mux might be hard to work out and will surely be more than 1 level=20 > of LUTs.... unless you can use the magic muxes in the slice to combine=20 > multiple LUTs into a 6 input mux. You don't need any adders for the=20 > counts since each 3 bit count controls a separate mux. This might just= =20 > work in three levels of LUTs if you can use multiple LUTs to form a 6=20 > input mux. >=20 > I just read your post where you said you were running at 350 MHz. I=20 > guess even this will have to be pipelined. But it should be less logic= =20 > than the brute force distributed RAM approach. But who knows until the= =20 > LUTs are counted? In essence this is the same thing I guess. It might= =20 > work better with the larger front end blocks and just one mux. >=20 > I'm very surprised the clock to out time on the V7 BRAM is 2.1ns. I=20 > think that is about the same number as the Spartan 3s from long ago. Am= =20 > I mistaken? >=20 > --=20 >=20 > Rick The BRAM output is 2.1 ns, but if you use the output register (which I have= to) it's 750 ps. Then the BRAM has 2 cycles of latency. Yes, something like you show would work. The design I'd written up had the= sums as inputs to the LUTs. So the top LUT could look at 6 bits (I said 5= originally because I was going to use the MUXCY but I abandoned that). Th= en the next LUT looks at 4 bits, and the other 2 inputs would be the 2-bit = sum of the first 5 bits. And the next LUT looks at 4 more bits and also as= a 2-bit sum of the first 10 bits. 
(This is for the 3rd encoded output, so we're looking for the 3rd bit set.) I end up with 4 of these LUTs, 2 levels of LUTs to do the sums, and an F7/F8 mux afterward to pick one of the 4 LUTs. So that's 3 levels of LUTs and an F7/F8, which would work in 1 cycle. The whole thing would be about 100 LUTs.

I couldn't get that to work, though, because I can't get Vivado to synthesize anything right, and I was going to have to instantiate a lot of primitives (including the F7/F8 muxes). I couldn't even get Vivado to do the sums correctly. You should be able to find the mod-2 sum of up to 18 bits with 8 LUTs in 2 levels, but Vivado does 3 levels. It's pitiful.

I ended up doing something else. I did a trailing-one detector like this:

wire [15:0] trailing_1 = ~(input_vec[15:0]-1) & input_vec[15:0];

This uses the carry chain. I think the idea is from Knuth. That gives you a 16-bit vector with just the trailing 1 set.

You encode that for the 1st output. You do the same thing with a mirrored version of input_vec to get a leading-one detector and encode that for the 2nd output. Then you XOR those two vectors with the original to get a vector with just the 3 middle bits still set. You run another leading/trailing-1 detector on that, encode those two, and then XOR them with the 3-bit vector, and you have a vector with 1 bit set, which you encode.

That's 200 LUTs in all, and I pipelined it for 3 cycles of latency. There's a lot of slack so I might be able to do it in 2, but I'm not sure if I want to risk it.
Article: 157924
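The trailing-one/leading-one peel-off scheme described in the post above can be sanity-checked with a small software model before committing to RTL. The following is only a sketch in Python, not the poster's actual design: the names trailing_one, mirror, leading_one, encode, and five_set_bits are made up for illustration, and returning the positions in descending order is an assumption chosen to match the example output list given earlier in the thread (1111 1110 0010 0001 0000).

```python
WIDTH = 16                    # assumed vector width, as in the thread
MASK = (1 << WIDTH) - 1


def trailing_one(vec):
    """One-hot vector holding only the lowest set bit of vec (0 if vec is 0).

    Same expression as the Verilog in the post: ~(vec - 1) & vec.
    """
    return ~(vec - 1) & vec & MASK


def mirror(vec):
    """Bit-reverse a WIDTH-bit vector (bit 0 swaps with bit WIDTH-1)."""
    out = 0
    for i in range(WIDTH):
        if (vec >> i) & 1:
            out |= 1 << (WIDTH - 1 - i)
    return out


def leading_one(vec):
    """One-hot vector holding only the highest set bit of vec."""
    return mirror(trailing_one(mirror(vec)))


def encode(one_hot):
    """Binary index of the single set bit in a one-hot vector."""
    return one_hot.bit_length() - 1


def five_set_bits(vec):
    """Return the positions of the five set bits, highest first."""
    if bin(vec & MASK).count("1") != 5:
        raise ValueError("model assumes exactly five bits set")
    lo1 = trailing_one(vec)       # lowest set bit
    hi1 = leading_one(vec)        # highest set bit
    mid3 = vec ^ lo1 ^ hi1        # three middle bits remain
    lo2 = trailing_one(mid3)
    hi2 = leading_one(mid3)
    mid1 = mid3 ^ lo2 ^ hi2       # one bit remains
    return [encode(hi1), encode(hi2), encode(mid1), encode(lo2), encode(lo1)]


# Example from the thread: input 1100 0000 0000 0111 (bit 15 on the left).
print(five_set_bits(0xC007))  # [15, 14, 2, 1, 0]
```

Note that the second XOR in this model is taken against the 3-bit intermediate vector, not the original input; XORing against the original would leave three bits set instead of one.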
On Tuesday, May 12, 2015 at 9:42:58 PM UTC-4, John Larkin wrote:
> On Tue, 12 May 2015 13:57:47 -0700 (PDT),
> lasselangwadtchristensen@gmail.com wrote:
>
> >On Monday, May 11, 2015 at 19:59:43 UTC+2, John Larkin wrote:
> >> Does anyone know if the ZYNQ chips have an internal high-temperature
> >> shutdown? They are behaving like they do.
> >>
> >
> >looks like you have to enable it (it may be default) and you have to load the PL
> >
> >30.3.6 Critical Over-temperature Alarm
> >Note: This feature sends an interrupt status to the PS and causes an automatic shutdown feature for
> >the PL side of the Zynq-7000 device if enabled. The PL shutdown is enabled via the bitstream and the
> >PL will only come out of power-down if the over-temperature alarm goes inactive or a
> >reconfiguration occurs.
> >The on-chip temperature measurement is used for critical temperature warnings. The default over
> >temperature threshold is 125°C. This threshold is used when the contents of the OT Upper Alarm
> >register (listed in UG480) have not been configured. When the die temperature exceeds the
> >threshold set in the XADC's Control register, the over-temperature alarm (OT) becomes active. The OT
> >signal resets when the die temperature has fallen below set threshold.
> >The OT alarm can also be used to automatically power down the PL upon activation. The OT alarm can
> >be disabled by writing a 1 to the OT bit in the XADC's Configuration register.
> >Note: these registers are in the XADC and are accessible using the DRP.
> >
> >-Lasse
>
> It's probably shutting down at 125C, without our specifically
> programming any temperature.
>
> Extensive searching, by us and by Avnet, finds no fan that matches the
> hole spacing on the MicroZed board. So we'll fab a little aluminum
> adapter plate and use a standard fan.
> With a pin-fin heat sink glued
> to the 7020 FPGA, and the fan blowing down on that, we can run at 100C
> ambient.
>
> --
>
> John Larkin Highland Technology, Inc
> picosecond timing laser drivers and controllers
>
> jlarkin att highlandtechnology dott com
> http://www.highlandtechnology.com

The MicroZed has a -I part on it, right? Those parts are spec'd at a max junction temp of 100 C. You need the Expanded temperature grade parts (Q) to get the 125 C junction temps.