Site Home Archive Home FAQ Home How to search the Archive How to Navigate the Archive
Compare FPGA features and resources
Threads starting:
Authors:A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
jmariano <jmariano65@gmail.com> wrote: > On Saturday, May 2, 2015 at 2:25:55 PM UTC+1, Brian Davis wrote: (snip) >> See this post/thread for more info and other debugging suggestions: >> https://groups.google.com/d/msg/comp.arch.fpga/l1zQYEyTmV8/qW-SIiYrFmMJ (snip) > BobH, the S3SKB has a on board 50 MHz clock connected to > the FPGA. You can replace it by other clock generator but > that's it. Of course you can use DCM inside the FPGA to > juggle with your clock, but not from the outside. You can also chain DCMs, though it is better not to do that. If you do, you should use the LOCKED signal of one to keep the next one from trying to lock. (I believe with an inverter in between.) With two DCMs you can get from 50MHz to 14.318181818...MHz. -- glen
Article: 157876
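The 50MHz to 14.318181818...MHz claim can be checked with exact fractions. What follows is only a sketch under stated assumptions: each DCM is modeled as a frequency synthesizer with f_out = f_in * M / D, the ranges M = 2..32 and D = 1..32 are assumed (check the data sheet for your part), the target is taken to be 315/22 MHz, and input/output frequency limits are not modeled at all. All names are made up for the illustration.

```python
from fractions import Fraction

F_IN = Fraction(50)           # MHz, the on-board oscillator mentioned above
F_TARGET = Fraction(315, 22)  # MHz, i.e. 14.318181818...

# Assumed synthesizer ranges (not taken from the post; verify for your device).
M_VALUES = range(2, 33)
D_VALUES = range(1, 33)


def stage_ratios():
    """Map every reachable single-stage ratio M/D to one (M, D) pair."""
    ratios = {}
    for m in M_VALUES:
        for d in D_VALUES:
            # Keep the first (smallest M) pair found for each distinct ratio.
            ratios.setdefault(Fraction(m, d), (m, d))
    return ratios


def two_stage_settings(f_in, f_target):
    """Return ((M1, D1), (M2, D2)) pairs whose product gives f_target / f_in."""
    need = f_target / f_in
    ratios = stage_ratios()
    return [(md1, ratios[need / r1])
            for r1, md1 in ratios.items()
            if need / r1 in ratios]


if __name__ == "__main__":
    print("one stage is enough:", F_TARGET / F_IN in stage_ratios())
    for (m1, d1), (m2, d2) in two_stage_settings(F_IN, F_TARGET):
        mid = F_IN * m1 / d1
        print(f"M1/D1={m1}/{d1} (mid {float(mid):.3f} MHz), M2/D2={m2}/{d2}")
```

The required ratio is 63/220, whose denominator is too large for a single stage under the assumed ranges, which is why two stages are needed. One factorization is 7/11 followed by 9/20.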
>Hi Gabor, thanks for the help. Digilent's documentation is spase but the >board revision is the same and the manual is the same for both FPGA, so >I assume ucf to be the same. That's surprising. They were founded by two profs from WSU Pullman and everything that I have seen from them looked like classroom materials. Everything was pulled together and placed in one document. John Eaton BTW: Never assume anything. --------------------------------------- Posted through http://www.FPGARelated.com
Article: 157877
Hi All, Thanks for helping! After a bit of experimentation I found the problem! In EDK 7.1 (I don't know about newer versions), when you change the device in Project Options, all the files are changed except platgen.opt. This one has to be changed by hand. When I did this, the project compiled OK. John Eaton - Sorry, but my English is not good enough to know if you're being sarcastic or not...
Article: 157878
On 5/6/2015 11:15 AM, jmariano wrote: > Hi All, > > Thanks for helping! > > After a bit of experimentation I found problem! > In EDK 7.1 (I don't know on newer versions) when you change the device in Project Options, all the files are changed except platgen.opt. This one was to be changed by hand. When I did this, the project compiled OK. > > John Eaton - Sorry, but my English is not enough to know if you're being sarcastic or not... > Don't feel bad, I am a native English speaker and I think that I used to work with John (Vancouver, WA) and I don't know if he was being sarcastic or not! BobHArticle: 157879
Hi, I downloaded sc2v and installed yacc and lex, but I don't know what to do next or how to run the conversion. Could anyone help me please??
Article: 157880
On Thu, 07 May 2015 00:15:47 -0500, princesse91 wrote: > Hi I downloas sc2v and i installed yacc and lex but i don't know what to > do and how to realize the conversion .. Could anyone help me please?? If it needs yacc and lex then it sounds like what you have is the source code for the application. You either need to build the application, find a binary that'll work, or run away screaming. If you just want it to work and if there's no binaries available, that's a sign to run away screaming. -- Tim Wescott Wescott Design Services http://www.wescottdesign.comArticle: 157881
On Thursday, May 7, 2015 at 07:15:51 UTC+2, princesse91 wrote: > Hi I downloas sc2v and i installed yacc and lex but i don't know what to do and how to realize the conversion .. Could anyone help me please?? On Ubuntu 15.04 I was able to compile sc2v after installing the packages flex and bison (command: sudo apt-get install flex bison). Then you have to compile the source, which creates three executables in the bin folder (sc2v_step1, sc2v_step2 and sc2v_step3). A conversion from SystemC to Verilog can then be done by calling sc2v.sh (e.g.: ./sc2v.sh ../examples/sc_ex1). Hope it helps, Jan
Article: 157882
On Fri, 08 May 2015 12:34:24 -0700, jancooo wrote: > On Thursday, May 7, 2015 at 07:15:51 UTC+2, princesse91 wrote: >> Hi I downloas sc2v and i installed yacc and lex but i don't know what >> to do and how to realize the conversion .. Could anyone help me >> please?? > > in Ubuntu 15.04 I was able to compile sc2v after installing packages > flex and bison (command: sudo apt-get install flex bison). Then you have > to compile the source, which creates three executables in bin folder > (sc2v_step1, sc2v_step2 ans sc2v_step3). A conversion from SystemC to > Verilog can then be done by calling the sc2.sh (e.g.: ./sc2v.sh > ../examples/sc_ex1). > > Hope it help Jan It occurs to me that if Princess is using Windows then things just won't work right. In which case he/she/it should probably build under Cygwin, unless he/she/it wishes to learn a lot more about making things work in a dual-build environment than may be sane. -- Tim Wescott Wescott Design Services http://www.wescottdesign.com
Article: 157883
On Monday, June 23, 2014 at 1:35:52 AM UTC-7, Russell Dill wrote:
> As part of my research, I needed a BCH encoder/decoder engine. Sadly, such a thing has no existed under a permissive license. Even more depressing is that many students seem to submit Verilog or VHDL engines as a project (or even research), but never release anything that is usable.
>
> Anyway, I'm releasing a BSD licensed Verilog BCH encoder/decoder. It offers:
>
> * Parallel input/output
> * Modular components that can be shared across multiple decoders
> * Automatic selection of BCH parameters based on data size and errors to be corrected
> * Specialized error locators for 1 error and 2 error codes
> * Parallel or serial error polynomial generator for codes with 2 or more errors
>
> https://github.com/russdill/bch_verilog
>
> I'm releasing this under BSD because I'd like to see the code used as widely as possible, but I'd still like to get feedback and hopefully improvements merged back in.
>
> As an example, a decoder for a 512 byte data block that corrects up to 12 errors with an 8 bit wide input and an 8 bit wide output currently occupies 1635 slices and operates at up to 191 MHz on a Virtex-6 LX240T-3. Such a decoder would take input for 532 clock cycles (512 data bytes, 20 ecc bytes), calculate for about 28 clock cycles, and then produce output for 512 clock cycles.
>
> The code currently compiles on Icarus Verilog (latest git) and Xilinx XST/Isim (tested with 14.5).

Awesome code, thanks for making it available !

Simulations run great on Icarus.

However, I'm having execution time trouble on XST.

On a Thinkpad T540P with 16 GB DDR3, a DATA_BITS=1024,T=8,BITS=8 is still synthesizing after 12 hours (note that 1 of 4 CPUs is fully utilized so it's not a case of continual memory swapping).

What compute capabilities did you use for DATA_BITS=4096,T=12,BITS=16 on XST ? How long did it take ?

I am using -loop_iteration_limit 2048 (for my case, 1024 barfs out) and -opt_level 2

Thanks !
Article: 157884
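The "512 data bytes, 20 ecc bytes" and "532 clock cycles" figures quoted above can be reproduced with a little arithmetic. This is only a rough sketch, not the parameter selection used in bch_verilog: it assumes a binary BCH code over GF(2^m) shortened to the data size, and it uses m*t as the parity size, which is an upper bound (the real generator polynomial can have lower degree). Function names are invented for the illustration.

```python
import math


def bch_field_and_parity(data_bits, t):
    """Smallest m with 2**m - 1 >= data_bits + m*t, and the m*t parity bound."""
    m = 2
    while (1 << m) - 1 < data_bits + m * t:
        m += 1
    return m, m * t


def stream_cycles(data_bits, t, width):
    """ECC bytes plus input/output clock counts for a width-bit-wide interface."""
    m, parity_bits = bch_field_and_parity(data_bits, t)
    ecc_bytes = math.ceil(parity_bits / 8)
    input_cycles = math.ceil((data_bits + parity_bits) / width)
    output_cycles = math.ceil(data_bits / width)
    return m, ecc_bytes, input_cycles, output_cycles


if __name__ == "__main__":
    # 512-byte block, 12 correctable errors, 8-bit interface (the quoted example).
    print(stream_cycles(512 * 8, 12, 8))
```

For the quoted configuration this gives a 13-bit field, 156 parity bits (20 bytes once rounded up), 532 input cycles and 512 output cycles, which agrees with the numbers in the announcement.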
On Sunday, May 10, 2015 at 9:33:19 AM UTC-7, pau...@gmail.com wrote:
> On Monday, June 23, 2014 at 1:35:52 AM UTC-7, Russell Dill wrote:
(snip)
> Awesome code, thanks for making it available !
>
> Simulations run great on Icarus.
>
> However, I'm having execution time trouble on XST.
>
> On a Thinkpad T540P with 16 GB DDR3, a DATA_BITS=1024,T=8,BITS=8 is still synthesizing after 12 hours (note that 1 of 4 CPUs is fully utilized so
> it's not a case of continual memory swapping).
>
> What compute capabilities did you use for DATA_BITS=4096,T=12,BITS=16 on XST ?
> How long did it take ?
>
> I am using -loop_iteration_limit 2048 (for my case, 1024 barfs out) and
> -opt_level 2

I've pushed some updates related to corner cases and syndrome computation, go ahead and pull and give it another try.

The main thing that will make XST run "forever" is swapping due to lack of RAM. For synthesizing single channel decoders, I'd recommend at least 16GB. For multi-channel, 32GB.
Article: 157885
On Sunday, May 10, 2015 at 2:42:36 PM UTC-7, Russell Dill wrote:
> On Sunday, May 10, 2015 at 9:33:19 AM UTC-7, pau...@gmail.com wrote:
(snip)
> I've pushed some updates related to corner cases and syndrome
> computation, go ahead and pull and give it another try.
>
> The main thing that will make XST run "forever" is swapping due to lack of RAM. For synthesizing single channel decoders, I'd recommend at least 16GB. For multi-channel, 32GB.

Thanks Russell.

I believe you've only changed the Makefile yes ?

What's your approximate synthesis time for sim.v with DATA_BITS=1024,T=8,BITS=8 ?

I'm not sure what you mean by single / multiple channels.

Cheers,
-Paul
Article: 157886
On Sunday, May 10, 2015 at 3:23:32 PM UTC-7, pau...@gmail.com wrote:
> On Sunday, May 10, 2015 at 2:42:36 PM UTC-7, Russell Dill wrote:
(snip)
> > I've pushed some updates related to corner cases and syndrome
> > computation, go ahead and pull and give it another try.
> >
> > The main thing that will make XST run "forever" is swapping due to lack of RAM. For synthesizing single channel decoders, I'd recommend at least 16GB. For multi-channel, 32GB.
>
> Thanks Russel.
>
> I believe you've only changed the Makefile yes ?

No, you can see the full list of changes here:

https://github.com/russdill/bch_verilog/compare/cfd444733f...cee257ae47

> What's your approximate synthesis time for sim.v with DATA_BITS=1024,T=8,BITS=8 ?

tb_sim.v and sim.v were not intended to be synthesizable.

> I'm not sure what you mean by single / multiple channels.

Running multiple decoders in parallel
Article: 157887
Does anyone know if the ZYNQ chips have an internal high-temperature shutdown? They are behaving like they do. -- John Larkin Highland Technology, Inc picosecond timing precision measurement jlarkin att highlandtechnology dott com http://www.highlandtechnology.comArticle: 157888
John Larkin <jlarkin@highlandtechnology.com> wrote: > Does anyone know if the ZYNQ chips have an internal high-temperature > shutdown? They are behaving like they do. Well, all chips have a high temperature shutdown, but you mean one that was designed in, right? -- glenArticle: 157889
On Mon, 11 May 2015 18:07:00 +0000 (UTC), glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote: >John Larkin <jlarkin@highlandtechnology.com> wrote: > >> Does anyone know if the ZYNQ chips have an internal high-temperature >> shutdown? They are behaving like they do. > >Well, all chips have a high temperature shutdown, but you mean >one that was designed in, right? > >-- glen Yeah, something it might recover from. As it seems to do. -- John Larkin Highland Technology, Inc picosecond timing precision measurement jlarkin att highlandtechnology dott com http://www.highlandtechnology.comArticle: 157890
On Mon, 11 May 2015 10:59:38 -0700, John Larkin <jlarkin@highlandtechnology.com> wrote: > >Does anyone know if the ZYNQ chips have an internal high-temperature >shutdown? They are behaving like they do. I did find this: ds190-Zynq-7000-Overview.pdf "A user-specified limit (for example, 100C) can be used to initiate an automatic power-down." I wonder what we specified! -- John Larkin Highland Technology, Inc picosecond timing precision measurement jlarkin att highlandtechnology dott com http://www.highlandtechnology.comArticle: 157891
On Mon, 11 May 2015 13:10:43 -0700, John Larkin wrote: > On Mon, 11 May 2015 10:59:38 -0700, John Larkin > <jlarkin@highlandtechnology.com> wrote: > > >>Does anyone know if the ZYNQ chips have an internal high-temperature >>shutdown? They are behaving like they do. > > I did find this: > > ds190-Zynq-7000-Overview.pdf > > "A user-specified limit (for example, 100°C) can be used to initiate an > automatic power-down." > > I wonder what we specified! There are references in ug585 (the Zynq TRM) to ug480 for the temperature sensor stuff, it looks to be common to all the 7 series.Article: 157892
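For anyone trying to work out what limit was actually programmed: the on-chip sensor reports a raw code rather than degrees, so the threshold has to be converted. The sketch below assumes the 12-bit transfer function as I recall it from ug480, T(°C) = code * 503.975 / 4096 - 273.15; treat the constants as an assumption and check them against the guide before relying on them. The function names are made up, and nothing here reads or writes real registers.

```python
# Assumed transfer-function constants (recalled from ug480; verify before use).
SPAN_KELVIN = 503.975
ZERO_CELSIUS_IN_KELVIN = 273.15
ADC_STEPS = 4096  # 12-bit result


def code_to_celsius(code):
    """Convert a 12-bit temperature code to degrees Celsius."""
    return code * SPAN_KELVIN / ADC_STEPS - ZERO_CELSIUS_IN_KELVIN


def celsius_to_code(temp_c):
    """Convert a temperature limit in degrees Celsius to the nearest 12-bit code."""
    code = round((temp_c + ZERO_CELSIUS_IN_KELVIN) * ADC_STEPS / SPAN_KELVIN)
    if not 0 <= code < ADC_STEPS:
        raise ValueError("temperature outside the representable range")
    return code


if __name__ == "__main__":
    limit = celsius_to_code(100.0)  # the 100 degree example from ds190
    print(limit, round(code_to_celsius(limit), 2))
```

Under these assumptions one code step is about 0.12 degrees, so a limit read back from the device should land within a fraction of a degree of the value that was intended.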
On Mon, 11 May 2015 20:53:11 +0000 (UTC), Robert Swindells <rjs@fdy2.co.uk> wrote: >On Mon, 11 May 2015 13:10:43 -0700, John Larkin wrote: > >> On Mon, 11 May 2015 10:59:38 -0700, John Larkin >> <jlarkin@highlandtechnology.com> wrote: >> >> >>>Does anyone know if the ZYNQ chips have an internal high-temperature >>>shutdown? They are behaving like they do. >> >> I did find this: >> >> ds190-Zynq-7000-Overview.pdf >> >> "A user-specified limit (for example, 100C) can be used to initiate an >> automatic power-down." >> >> I wonder what we specified! > >There are references in ug585 (the Zynq TRM) to ug480 for the temperature >sensor stuff, it looks to be common to all the 7 series. We epoxied a pin-fin heat sink to the top, and added a fan. That helps a lot. https://dl.dropboxusercontent.com/u/53724080/Thermal/FPGA_Fan.JPG -- John Larkin Highland Technology, Inc picosecond timing precision measurement jlarkin att highlandtechnology dott com http://www.highlandtechnology.comArticle: 157893
> > I believe you've only changed the Makefile yes ? > > No, you can see the full list of changes here: > > https://github.com/russdill/bch_verilog/compare/cfd444733f...cee257ae47 Sorry for the imprecise question, my first pull was a week ago so I meant since then, I believe the Makefile only has changed but I may be wrong again :) > > > What's your approximate synthesis time for sim.v with DATA_BITS=1024,T=8,BITS=8 ? > > > tb_sim.v and sim.v were not intended to be synthesizable. Sure, bad question again, I hacked your sim.v (and unfortunately kept the same name ...) to contain bch_syndrome, bch_errors_present, bch_sigma_bma_parallel and bch_error_tmec (+ hook-ups ... etc). If I try to synthesize the whole thing with T=12, DATA_BITS=4096 I hit a wall (on a 16GB machine, 1 BCH channel only). So I narrowed it down: bch_syndrome, bch_errors_present, bch_sigma_bma_parallel all synthesize individually in 10 minutes or fewer and use < 2GB DDR even if I use T=64, DATA_BITS=8192 (>5x T and 2x DATA_BITS of above) However, bch_error_tmec ALONE with T=12, DATA_BITS=4096 only, takes 1+1/2 hours and reaches 7 GB DDR utilization with a funny pattern of slow ramp-ups and sharp declines - it doesn't use up all the available DDR though, i.e., there are 5 or more GB available at all times. T=64, DATA_BITS=8192 barfs. I tried chien separately and got the same result as with bch_error_tmec. Do you expect chien to be so much harder to synthesize than all the rest ? From my past experience with Reed-Solomon I sort of expected Berlekamp-Massey and Chien to be of somewhat comparable complexity. Thanks ! -PaulArticle: 157894
I'm trying to design a circuit (Virtex-7) which you might call either a priority encoder or a sorter. This is what it should do:

<i>Given a 16-bit vector with 5 bits set, create a list of 5 4-bit encoded values of each bit set. These needn't be in order.</i>

This turns out to be a lot harder than I thought. Writing the behavioral RTL isn't hard, but Vivado synthesizes it to 16 levels of logic, and when I draw out an optimized version, I still get at least 5 levels (using 6-input LUTs). I'd like to do it with minimal latency, but I can only do about 3 levels of logic at my clock speed. I've tried thinking about how I can use the carry chain muxes but they don't seem to be helpful. I can do a leading-ones detector and encode the leading 1, but I'd probably have to pipeline each stage so that would take 5 cycles.
Article: 157895
On 5/11/2015 9:27 PM, Kevin Neilson wrote: > I'm trying to design a circuit (Virtex-7) which you might call either > a priority encoder or a sorter. This is what it should do: > > <i>Given a 16-bit vector with 5 bits set, create a list of 5 4-bit > encoded values of each bit set. These needn't be in order.</i> > > This turns out to be a lot harder than I thought. Writing the > behavioral RTL isn't hard, but Vivado synthesizes it to 16 levels of > logic, and when I draw out an optimized version, I still get at least > 5 levels (using 6-input LUTs). I'd like to do it with minimal > latency, but I can only do about 3 levels of logic at my clock speed. > I've tried thinking about how I can use the carry chain muxes but > they don't seem to be helpful. I can do a leading-ones detector and > encode the leading 1, but I'd probably have to pipeline each stage so > that would take 5 cycles. I'm not sure what you mean by "list". Logic circuits don't normally use lists. You want to know which five lines are set. It is easy to encode that into four bit values. But once you have the values what exactly do you want to do with them to provide them in a "list"? If you want them accessible on the same four data lines in sequence, what determines the timing of the sequence? What exactly is your interface? -- RickArticle: 157896
The list is a 20-bit vector comprising the 5 4-bit values. This is put into a 20-bit wide FIFO; each of the 5 values will be processed simultaneously when read from the FIFO. So another way you could describe this is a 16-bit/5-hot -> 20-bit encoder.Article: 157897
To be clearer, here's an example. Input (bit 15 on left): 1 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 20-bit Output List: 1111 1110 0010 0001 0000Article: 157898
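A small software reference model of the 16-bit/5-hot to 20-bit encoder can serve as a golden model in a testbench. This is just a sketch of the behavior described in the example above, not a hint at how to build it in LUTs. The function name is invented, and the highest-index-first ordering is only one valid choice, since the values needn't be in order.

```python
def encode_5hot(vec16):
    """Pack the indices of the 5 set bits of a 16-bit vector into 20 bits.

    The highest set bit ends up in the most significant nibble.
    """
    if not 0 <= vec16 < (1 << 16):
        raise ValueError("input must fit in 16 bits")
    indices = [i for i in range(15, -1, -1) if (vec16 >> i) & 1]
    if len(indices) != 5:
        raise ValueError("exactly 5 bits must be set")
    word = 0
    for idx in indices:
        word = (word << 4) | idx  # append one 4-bit value
    return word


if __name__ == "__main__":
    # The example from the post: bits 15, 14, 2, 1 and 0 are set.
    out = encode_5hot(0b1100_0000_0000_0111)
    bits = format(out, "020b")
    print(" ".join(bits[i:i + 4] for i in range(0, 20, 4)))
```

Run on the example input, this prints the same five nibbles as the list in the post.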
On Monday, May 11, 2015 at 6:13:58 PM UTC-7, pau...@gmail.com wrote:
> > > I believe you've only changed the Makefile yes ?
> >
> > No, you can see the full list of changes here:
> >
> > https://github.com/russdill/bch_verilog/compare/cfd444733f...cee257ae47
>
> Sorry for the imprecise question, my first pull was a week ago so I meant since then, I believe the Makefile only has changed but I may be wrong again :)

You are looking at the dates of the commits. The commits happened a while ago, but were only recently pushed.

> > > What's your approximate synthesis time for sim.v with DATA_BITS=1024,T=8,BITS=8 ?
> >
> > tb_sim.v and sim.v were not intended to be synthesizable.
>
> Sure, bad question again, I hacked your sim.v (and unfortunately kept the same name ...) to contain bch_syndrome, bch_errors_present, bch_sigma_bma_parallel and bch_error_tmec (+ hook-ups ... etc).
>
> If I try to synthesize the whole thing with T=12, DATA_BITS=4096 I hit a wall
> (on a 16GB machine, 1 BCH channel only).

Here's xilinx_error_tmec with PIPELINE_STAGES=2, DATA_BITS=4096, T=12, BITS=8, REG_RATIO=8

xst:
524.43user 4.14system 8:42.02elapsed 101%CPU (0avgtext+0avgdata 13612104maxresident)k
0inputs+26104outputs (0major+4229163minor)pagefaults 0swaps

map:
11.68user 0.12system 0:11.81elapsed 100%CPU (0avgtext+0avgdata 581040maxresident)k
0inputs+184outputs (0major+155622minor)pagefaults 0swaps

So XST is using nearly 14GB. Depending on your machine's configuration, 16GB of physical memory would not have been enough. Incidentally, the decoder uses 612 slices, and runs at 200MHz or more.

> So I narrowed it down:
>
> bch_syndrome, bch_errors_present, bch_sigma_bma_parallel all synthesize individually in 10 minutes or fewer and use < 2GB DDR even if I use
> T=64, DATA_BITS=8192 (>5x T and 2x DATA_BITS of above)
>
> However, bch_error_tmec ALONE with
> T=12, DATA_BITS=4096 only, takes 1+1/2 hours and reaches 7 GB DDR utilization with a funny pattern of slow ramp-ups and sharp declines - it doesn't use up all the available DDR though, i.e., there are 5 or more GB available at all times.
>
> T=64, DATA_BITS=8192 barfs.
>
> I tried chien separately and got the same result as with bch_error_tmec.
>
> Do you expect chien to be so much harder to synthesize than all the rest ?
> From my past experience with Reed-Solomon I sort of expected Berlekamp-Massey
> and Chien to be of somewhat comparable complexity.

The way I'm dynamically compiling the chien modules is giving XST a hard time. Although going bit-parallel, you do need a lot of parallel multipliers for T=64 (some 512 of them). I've created quite a few variants to get the most out of XST, each variant with its own strengths and weaknesses. If you change the multiplier in bch_chien_expand from a parallel_standard_multiplier_const1 to a parallel_standard_multiplier, you can save some memory, but not the orders of magnitude required.

Here's an example at: PIPELINE_STAGES=1,DATA_BITS=4096,T=32,BITS=8,REG_RATIO=8

xst:
1925.94user 9.58system 31:49.83elapsed 101%CPU (0avgtext+0avgdata 28960864maxresident)k
565608inputs+50552outputs (353major+8315051minor)pagefaults 0swaps

map:
13.37user 0.15system 0:13.60elapsed 99%CPU (0avgtext+0avgdata 613264maxresident)k
2104inputs+208outputs (3major+163868minor)pagefaults 0swaps

So you can see that in this case xst is using about 29GB of memory. I originally targeted the code for around T=3 to T=16. If you want to get up to T=64, you'll likely need to figure out why XST is taking so much memory synthesizing the multipliers, likely by paring this down bit by bit.

Additionally, the number of pipeline stages in error tmec is limited to 2 right now. You might need to go higher to get 64 13 bit terms summed together and compared with zero.

> Thanks !
>
> -Paul
Article: 157899
On 5/11/2015 10:26 PM, Kevin Neilson wrote: > To be clearer, here's an example. > > Input (bit 15 on left): > > 1 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 > > 20-bit Output List: > 1111 1110 0010 0001 0000 I am old school so I have to picture these things.... I see five priority encoders. The first priority encoder output is used to disable that input to the second priority encoder, etc. How did you code it? I can see how this would be a lot more than three levels of logic. The equations get quite long. When you talk about levels of logic I assume you mean layers of LUTs? I can see maybe each 16 input priority encoder being no more than 3 levels of LUTs, but all five layers with the inhibit logic... I don't think so. I expect the tools did a pretty good job of optimizing it and I don't easily see any way of using carry chains. -- Rick
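The chain of five priority encoders described above can be modeled in a few lines, which makes the serial dependency easy to see: each stage needs the previous stage's result before it can mask its input. This is a behavioral sketch only, with invented names, and it says nothing about how many LUT levels a synthesizer would actually produce.

```python
def priority_encode(vec):
    """Return the index of the highest set bit (vec must be non-zero)."""
    if vec <= 0:
        raise ValueError("no bit set")
    return vec.bit_length() - 1


def cascaded_encode(vec16, stages=5):
    """Model a chain of priority encoders.

    Each stage encodes the highest remaining bit, then disables that
    input for the next stage.
    """
    remaining = vec16 & 0xFFFF
    found = []
    for _ in range(stages):
        idx = priority_encode(remaining)
        found.append(idx)
        remaining &= ~(1 << idx)  # inhibit this input downstream
    return found


if __name__ == "__main__":
    # Same input as the example quoted above.
    print(cascaded_encode(0b1100_0000_0000_0111))
```

Stage k cannot start until stage k-1 has produced its index, which is the structural reason the straightforward version ends up with a long logic chain.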