Site Home Archive Home FAQ Home How to search the Archive How to Navigate the Archive
Compare FPGA features and resources
Threads starting:
Authors:A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
What is your synthesis result, and in which way do you want to improve it (e.g. speed vs. area)? What is your design goal? What are your constraints (time and area)?

This is not just a simple comparator, it's a magnitude comparator. There's also an adder included.

Is Verilog so type tolerant that the vector length is not important? You are comparing a 13-bit vector with a 10-bit vector. VHDL would complain loudly about it.

But if we just leave it like that, and assume the upper bits to be constantly 0, the resulting circuit will always be a wire to VCC, thus being optimal for time and area. :-)

It only changes when the addition of addr and length (A) generates a carry (C).

13'h400 = 0010000000000
          0010000000000
          00CAAAAAAAAAA

Only then can the resulting 11-bit vector be greater than the constant.

But how shall the synthesis tool decide whether you want to build an adder with carry or not? Or is there a default mechanism in the arithmetic synthesis of Verilog?

Have a nice synthesis
Eilert

water9580@yahoo.com wrote:
> wire [6:0] length;
> wire [11:0] addr;
>
> assign len_lte=((addr[11:2]+length)<=13'h400)?1'b1:1'b0;
>
> thx

Article: 131726
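Eilert's question about the carry in the post above can be checked by brute force. The following is only a sketch in Python, not synthesis output: it assumes the usual Verilog sizing rule that both sides of a relational operator are extended to the widest operand (13 bits here) before the addition is done, so the carry out of the 10-bit add is kept. The function name and the exhaustive loop are illustrative.

```python
def len_lte(addr_hi: int, length: int) -> int:
    """Model of ((addr[11:2] + length) <= 13'h400), evaluated at 13 bits.

    addr_hi stands for addr[11:2] (10 bits), length is 7 bits wide.
    Assumption: the sum is computed at 13 bits, so the carry is not lost.
    """
    total = ((addr_hi & 0x3FF) + (length & 0x7F)) & 0x1FFF
    return 1 if total <= 0x400 else 0


# Enumerate every input pair and collect the ones where the output is 0.
false_cases = [
    (a, l) for a in range(1024) for l in range(128) if len_lte(a, l) == 0
]

# Every failing case has the carry bit (bit 10 of the sum) set.
all_have_carry = all(((a + l) >> 10) & 1 for a, l in false_cases)

print(len(false_cases), all_have_carry)
```

Under that assumption the output is not a constant 1: it goes low only for sums above 0x400, and all of those have bit 10 of the sum set, which matches the "only when a carry is generated" remark.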
Hi all,

I'm glad to see that the XUPV2P board is now supported by EDK 10.1. However, getting the MPMC memory controller to work seems complex, and so far I have had no success (the 256 MB DDR DIMM memory module worked correctly before with the standard PLB and OPB controllers). I set up the parameters as for the previous controllers and did the clocking following the Xilinx guide, but I cannot make it work...

Has anyone had success with that? Thank you very much!

Bye and thanks!!!

Article: 131727
Hi,

Bitgen generates readback files (.rba & .rbb), but which one contains exactly the data that is configured in the FPGA? No one?

M.B

Article: 131728
On 28 Apr, 15:08, bommels <bart.homm...@gmail.com> wrote:
> On Apr 26, 10:33 pm, Alan Nishioka <a...@nishioka.com> wrote:
>
> > swissiyous...@gmail.com wrote:
> > > is someone has the code of CRC (cyclical redundancy check) that xilinx
> > > use in the bitstream ? is it a simple CRC ( the XOR of words ) ?
>
> > try
> > <http://www.xilinx.com/support/documentation/application_notes/xapp151...>
>
> > Alan Nishioka
>
> I found this one to be very useful: http://www.easics.be/webtools/crctool
>
> In case you're trying to construct the FCS of an ethernet packet:
> please note it is not equal to the CRC, you have to do some bit/byte
> juggling to convert CRC to FCS.
>
> Bart

Thank you for the useful information!

Article: 131729
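As a rough illustration of the CRC-versus-FCS remark in Bart's post, here is a minimal Python sketch. It assumes the standard IEEE 802.3 CRC-32 as computed by `zlib.crc32`, which already applies the initial value, the bit reflection and the final complement; what is left is the byte order in which the four bytes are appended. The frame contents are made up, and none of this says anything about the CRC Xilinx uses inside the bitstream.

```python
import struct
import zlib


def ethernet_fcs(frame: bytes) -> bytes:
    """Return the 4 FCS bytes in the order they are appended to the frame."""
    crc = zlib.crc32(frame) & 0xFFFFFFFF
    # The FCS goes on the wire least significant byte first.
    return struct.pack("<I", crc)


frame = bytes(range(60))  # hypothetical frame contents, not a real capture
fcs = ethernet_fcs(frame)

# Receiver-side check: running the same CRC over frame + FCS gives a
# fixed residue when the frame is intact.
residue = zlib.crc32(frame + fcs) & 0xFFFFFFFF
print(fcs.hex(), hex(residue))
```

A hardware CRC generated MSB-first from the raw polynomial would additionally need the per-byte bit reversal and the complement that `zlib.crc32` hides here.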
Hi Tommy,

It depends on how you want to benchmark. Only using features that your CPU has (i.e. lacking large local memory)?
The code footprint when using the optimized printf is around 50k with data.
Using a processor with an 8 kbyte icache and a 16 kbyte dcache on an application that is just twice that size doesn't seem valid. Cache efficiency is more likely to show when you have at least a 10-50x factor between cache size and code size.

Using a cache also brings the external memory type and the memory controller into the benchmark numbers. I guess those are not apples to apples between you and me either: using fast async SRAM as the external memory is not the same as using SDRAM.

Yes, my results were with float instead of double. I don't think you need to set the type to long, since the values seem to be well within a byte.

I used the board connected to my laptop, which is an ML505 (Virtex-5, slowest speed grade), and I didn't push the clock frequency.

Göran

"Tommy Thorn" <tommy.thorn@gmail.com> wrote in message news:0d6ce282-f79a-4dd2-b968-0af4ae735aba@1g2000prg.googlegroups.com...

Thanks Göran, that's very impressive.

You are right about the double precision, and output. With the below patch applied, I now clock in at 42.5 s. Could you try it again (I assume your numbers were with floats).

Using local memory however doesn't make for an apples to apples comparison, as this benchmark is memory heavy and local memory (as opposed to cache + slow memory) will give MB a large advantage.

Thanks
Tommy
PS: Which FPGA was this on?

On Apr 29, 5:31 am, "Göran Bilski" <goran.bil...@xilinx.com> wrote:
> Hi,
>
> Actually the use of floating-point at all seems unnecessary in the
> program.
> I think this is a legacy of PC programs, where the usage of double (or float)
> is not as performance critical as on a CPU without an FPU.
>
> I think it's safe to change double in the program to int without any
> changes in result.
> The program would not run faster on a MAC/PC with this change but it will
> have a drastic effect on your CPU.
>
> Göran
>
> "Göran Bilski" <goran.bil...@xilinx.com> wrote in message
>
> news:fv70te$7s01@cnn.xsj.xilinx.com...
>
> > Hi,
>
> > I did a quick test with MicroBlaze.
> > With 125 MHz and 64kbyte of local memory, it takes MicroBlaze 6.8s to
> > run the benchmark.
>
> > I added two defines in the program.
> > #define printf xil_printf
> > #define double float
> > The first define is to get a smaller code footprint since the default
> > printf is bloated and no floating-point is printed.
> > The second define will make the compiler use the MicroBlaze FPU
> > single-precision floating-point compare and conversion instructions.
> > Neither define will change the program result since there are no actual
> > floating-point calculations, just compares and conversions.
>
> > Actually the program prints out a relatively large number of characters
> > and if I remove the printf statement that is part of the loop, the program
> > executes in 6.1 s
> > The baudrate will have an effect on the execution speed if too many
> > prints exist in the timed section.
>
> > Göran
>
> > "Tommy Thorn" <tommy.th...@gmail.com> wrote in message
> >news:f005305a-30b9-4ca2-ae01-7fd3e2622853@l17g2000pri.googlegroups.com...
> >> I'm trying to get a feel for how the performance of my (so far
> >> unoptimized) soft-core stacks up against the established competition,
> >> so it would be a great help if people with convenient access to Nios
> >> II / MicroBlaze respectively would compile and time this little app:
> >> http://radagast.se/othello/endgame.c (It's an Othello endgame solver.
> >> I didn't write it) and tell me the configuration.
>
> >> In case anyone cares, mine finished this in 100 seconds in this
> >> configuration: 8 KiB I$, 16 KiB D$, 48 MHz clock frequency, async
> >> sram. (My Mac finished this in ~ 0.5 sec :-)
>
> >> Thanks
> >> Tommy

Article: 131730
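One way to put the run times quoted in this thread on a common footing is to convert them to clock cycles. This is only a back-of-the-envelope sketch in Python using the figures given in the posts (6.8 s at 125 MHz, 100 s and 42.5 s at 48 MHz). It deliberately ignores the local-memory versus cache-plus-SRAM difference that both posters point out, so it removes the clock frequency from the comparison and nothing else. The labels are illustrative.

```python
def cycles(run_time_ms: int, clock_mhz: int) -> int:
    """Clock cycles spent, from a run time in ms and a clock in MHz."""
    return run_time_ms * clock_mhz * 1000


runs = {
    "MicroBlaze, 125 MHz, local memory, float": cycles(6800, 125),
    "Tommy's core, 48 MHz, caches, original": cycles(100000, 48),
    "Tommy's core, 48 MHz, caches, patched": cycles(42500, 48),
}

for name, count in runs.items():
    print(f"{name}: {count / 1e6:.0f} M cycles")
```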
On 30 Apr, 09:38, bamboutcha9...@hotmail.com wrote:
> Hi ,
> Bitgen generates as readback files ( .rba & .rbb) , but which one
> contains exactly the data that is configured in the FPGA ? no one ?
>
> M.B

".rba : An ASCII file that contains readback commands rather than configuration commands, and expected readback data where the configuration data would normally be. This file is only produced for Virtex/-E/Spartan-II/E devices.

.rbb : (Produced when "-g Readback" is specified) The same as the ".rba" file, but it is a binary file.

.rbd : (Produced when "-g Readback" is specified) An ASCII file that contains only expected readback data, including pad words and frames. No commands are included."

Article: 131731
Hi everyone,

We've been shipping a Virtex4 FX20 based product for a few months now with relatively few problems. However, we're seeing a 60+% failure rate in our latest batch of boards, characterized by the V4 DCMs not locking or providing any output at all unless the chip is freezing cold. Neither the design, board house or CM shop has changed at all from our previous working batch, which generally runs pretty hot (~60C) with no known problems.

The details I have found are: when powering up from room temperature, other logic implemented in the chip seems to work, but the DCMs have no output and are not locked. If I give the FPGA a good shot of cold spray and power up the device, it becomes fully functional, until the temperature rises somewhat and the DCM ceases completely. Among the samples I've tried, the temperature at which it stops working ranges from really cold to slightly below room temperature.

The clock source is a 25MHz crystal oscillator -- I tried a function generator and nothing changes. I tried twiddling VCCAux as I've seen suggested and no difference there, either. Has anyone seen such a dramatic DCM failure before, or have any ideas what might be causing this?

Thanks,
Mike.

Article: 131732
msn444@gmail.com wrote:
> Hi everyone,
>
> We've been shipping a Virtex4 FX20 based product for a few months now
> with relatively few problems. However, we're seeing a 60+% failure
> rate in our latest batch of boards, characterized by the V4 DCMs not
> locking or providing any output at all unless the chip is freezing
> cold. Neither the design, board house or CM shop has changed at all
> from our previous working batch, which generally runs pretty hot
> (~60C) with no known problems.
>
> The details I have found are: when powering up from room temperature,
> other logic implemented in the chip seems to work, but the DCMs have
> no output and are not locked. If I give the FPGA a good shot of cold
> spray and power up the device, it becomes fully functional, until the
> temperature rises somewhat and the DCM ceases completely. Among the
> samples I've tried, the temperature at which it stops working ranges
> from really cold to slightly below room temperature.
>
> The clock source is a 25MHz crystal oscillator -- I tried a function
> generator and nothing changes. I tried twiddling VCCAux as I've seen
> suggested and no difference there, either. Has anyone seen such a
> dramatic DCM failure before, or have any ideas what might be causing
> this?
>
> Thanks,
> Mike.

Hi Mike,

It probably isn't NBTI.
http://www.xilinx.com/support/documentation/white_papers/wp224.pdf
Google :- nbti group:comp.arch.fpga

But a bake at 150C for 48 hours would prove that it isn't!

You also might like to read this thread from a few days back about reset.
Google :- "Virtex 4 DCM problem" group:comp.arch.fpga

Good luck, Syms.

Article: 131733
I have exactly the same problem... With the help of the Virtex-5 FPGA Configuration User Guide I can "decompile" the encrypted bitstream to something like this (NOPs aren't shown):

0x0000012f: P1 WRITE CBC [30016004](4) 50 66 d8 9e f6 17 2c f8 49 ee 41 6b 3a de 6d de
0x00000143: P1 WRITE FDRI [30004000](0)
0x00000147: P2 WRITE [5003fd28](261416)
0x000ff60f: P1 WRITE CRC [30000001](1) 83 96 e4 9f

I can see that the IV has the same value as the one in my .nky file (Key StartCBC 5066d89ef6172cf849ee416b3ade6dde;), and if I decrypt the data part (with AES-256 in CBC mode) from offset 0x0000014b, size 261416 * 4 bytes, with the key from the .nky file, I get something like 09 95 2D 5B D0 E3 D0 19 ..., which doesn't look like valid data. So I made another encrypted bitstream (with a different key and IV). The encrypted bitstreams differ only in date, IV, CRC and of course the encrypted data. But when I decrypt the data part of the second bitstream with its own IV and key, I get different output (A5 7E F8 8A AC 74 30 F6 ...). So I think that you don't use the standard NIST C AES 256 CBC; if you did, I should get the same output from the first and second bitstreams.

And another thing: how are you computing the CRC, is it standard CRC-32? After this command the CRC is cleared:

0x00000097: P1 WRITE CMD [30008001](1) 00 00 00 07 Reset crc

and which data from the following bitstream part are involved?

0x000000a7: P1 WRITE TIMER [30022001](1) 00 00 00 00
0x000000af: P1 WRITE ??? [30026001](1) 00 00 00 00
0x000000b7: P1 WRITE COR0 [30012001](1) 00 00 31 e5
0x000000bf: P1 WRITE COR1 [3001c001](1) 00 00 00 00
0x000000c7: P1 WRITE IDCODE [30018001](1) 02 86 e0 93
0x000000cf: P1 WRITE CMD [30008001](1) 00 00 00 09 Switch clk
0x000000db: P1 WRITE MASK [3000c001](1) 00 40 04 40
0x000000e3: P1 WRITE CTL0 [3000a001](1) 00 40 04 40
0x000000eb: P1 WRITE MASK [3000c001](1) 00 00 00 00
0x000000f3: P1 WRITE CTL1 [30030001](1) 00 00 00 00
0x0000011b: P1 WRITE FAR [30002001](1) 00 00 00 00
0x00000123: P1 WRITE CMD [30008001](1) 00 00 00 01 Write CFG
0x0000012f: P1 WRITE CBC [30016004](4) 50 66 d8 9e f6 17 2c f8 49 ee 41 6b 3a de 6d de
0x00000143: P1 WRITE FDRI [30004000](0)
0x00000147: P2 WRITE [5003fd28](261416)
0x000ff60f: P1 WRITE CRC [30000001](1) 83 96 e4 9f

Whole packets with their headers (30022001 00 00 00 00) or only the data parts (00 00 00 00)? And what about NOPs (20 00 00 00), are they involved? I tried computing the CRC from all data, all data without NOPs, and only the data parts, but with no luck...

Antonin Kriz

Article: 131734
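The header words in Antonin's listing can be decoded with a few shifts and masks. The sketch below is Python, not anything from the post. The field positions are an assumption based on the packet format as I understand it from the Virtex-5 Configuration User Guide: packet type in bits 31:29, opcode in 28:27, then register address in 26:13 and word count in 10:0 for type 1, or a 27-bit word count for type 2. The register names are only those that appear in the listing, with addresses derived from its header words.

```python
# Register names exactly as they appear in the listing; the addresses are
# derived from the header words shown there.
REGISTERS = {
    0x00: "CRC", 0x01: "FAR", 0x02: "FDRI", 0x04: "CMD", 0x05: "CTL0",
    0x06: "MASK", 0x09: "COR0", 0x0B: "CBC", 0x0C: "IDCODE", 0x0E: "COR1",
    0x11: "TIMER", 0x18: "CTL1",
}
OPCODES = {0: "NOP", 1: "READ", 2: "WRITE"}


def decode_header(word: int) -> str:
    """Render a 32-bit packet header in the style used in the listing."""
    packet_type = (word >> 29) & 0x7
    opcode = OPCODES.get((word >> 27) & 0x3, "RESERVED")
    if packet_type == 1:
        if opcode == "NOP":
            return "P1 NOP"
        register = (word >> 13) & 0x3FFF   # assumed type 1 address field
        count = word & 0x7FF               # assumed 11-bit word count
        name = REGISTERS.get(register, "???")
        return f"P1 {opcode} {name} [{word:08x}]({count})"
    if packet_type == 2:
        count = word & 0x7FFFFFF           # assumed 27-bit word count
        return f"P2 {opcode} [{word:08x}]({count})"
    return f"unknown [{word:08x}]"


for header in (0x30016004, 0x30004000, 0x5003FD28, 0x30000001, 0x20000000):
    print(decode_header(header))
```

With these assumed field positions the decoder reproduces the register names and word counts shown in the listing, including the unnamed register behind 30026001.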
Austin,

Thanks for your reply. Actually there is AR#20810, where the power-up sequence seems to be an important issue for making the PROM visible in the JTAG chain. But it is not clear whether it is related to the STMicro memory (in the AR), because the description says XCF32.

I did try to disconnect the XCF02's VCC pins and reconnect them after 2V5 and 1V2 are stable, but the memory is still not there.

There is a disruption of the data on TDO, since the tool can see the other devices but cannot get their IDs properly. So the memory is doing something. Its TDO pin shows activity during the chain initialization.

My current thoughts are about the TMS and TCK slew rate and a possible sensitivity of this PROM in this respect.

Best regards,
Augusto

Article: 131735
Hi all,

I wanted to verify an idea. Is it possible to use the Xilinx PCI Express core in an FPGA with SMA ports for the physical layer rather than the normal PCI Express slots? Probably some parameters need to be modified.

Thanks
Shakith

Article: 131736
On Apr 29, 8:19 pm, "Brad Smallridge" <bradsmallri...@dslextreme.com> wrote:
> When I started with ROMs circa ISE 6.2 the COE thing
> wasn't working past some small number of values.
> Someone promptly directed me to infer the ROM and not
> use the core generator at all.
>
> The VHDL looks something like this:
>
> type i2c_array_type is array(natural range <>) of natural;
> constant i2c_data_array : i2c_array_type :=(
> 0,0,0 -- put your constants here
> );
> signal i2c_bit_index0 : unsigned(2 downto 0);
> begin
> i2c_data1 <= i2c_data_array(i2c_dat_index0);
>
> You can use natural,integer,enumerated data,std_logic_vectors,
> or records to fit your design data type. All the simulation
> data will be visible and the ISE reports will tell you how
> many BRAMs were used. I don't think any deeper level of
> simulation will provide you more confidence.
>
> If you post your code, I'm sure we can help you debug some
> of the issues regarding conversions and the like.
>
> Brad Smallridge
> AiVision

Thanks for the replies. I don't need help with the code for the ROM. No problem with my writing the code and letting ISE infer the ROM, however what if I wanted to use block memory for a dual-port memory. This is a non-trivial coding effort. I am wondering what use are the memory cores if I am better off writing the code myself? I thought the purpose of the cores is not only to help the user by having the code pre-written by Xilinx's experts, but also to ensure that the design is optimal for fitting into the FPGA. I used a core adder/subtractor and comparator with no problems. The trick with the memory seems to be simulating the memory with the contents loaded, so maybe the dual-port memory would work OK.

Thanks,
Charles

Article: 131737
charles.elias@wpafb.af.mil wrote:
>
> Thanks for the replies. I don't need help with the code for the ROM.
> No problem with my writing the code and letting ISE infer the ROM,
> however what if I wanted to use block memory for a dual-port memory.
> This is a non-trivial coding effort. I am wondering what use are the
> memory cores if I am better off writing the code myself? I thought
> the purpose of the cores is not only to help the user by having the
> code pre-written by Xilinx's experts, but also to ensure that the
> design is optimal for fitting into the the FPGA. I used a core adder/
> subtractor and comparator with no problems. The trick with the memory
> seems to be simulating the memory with the contents loaded. so maybe
> the dual-port memory would work OK.

Inferring a dual port memory is just as easy as a single port memory:

    constant AWIDTH : integer := 8;
    type drngs_array is array(0 to 2**AWIDTH-1) of std_logic_vector(31 downto 0);
    signal drngs_ram : drngs_array;
    signal rd_addr_drngs : unsigned(AWIDTH-1 downto 0);

    drngs_wr_p: process(CLK)
    begin
      if rising_edge(CLK) then
        if DRNGS_WE = '1' then
          drngs_ram(to_integer(DRNGS_WADR)) <= std_logic_vector(DRNGSI);
        end if;
      end if;
    end process drngs_wr_p;

    drngs_rd_p: process(CLK)
    begin
      if rising_edge(CLK) then
        rd_addr_drngs <= DRNGS_RADR;
      end if;
    end process drngs_rd_p;

    DRNGSO <= drngs_ram(to_integer(rd_addr_drngs));

I really cannot see a reason to be using coregen for things like rams, adders, or comparators. Synthesis tools will do just as good as coregen. I can see using it for complex things like FFTs and such.

Article: 131738
Mike,

Have you double-checked the range settings of the DCM? That would be my first guess...

/Mikhail

Article: 131739
Augusto,

The very first silicon of the 32 part had that issue. Subsequent silicon did not have the issue.

Austin

AugustoEinsfeldt wrote:
> Austin,
> Thanks for your reply. Actually there is the AR#20810 where the power-
> up seems to be important issue to make the PROM visible in the JTAG
> chain. But it is not clear if it is related to the STMicro memory (in
> the AR) because in the description it says XCF32.
> I did try to disconnect the XCF02´s VCC pins, and reconnect after 2V5
> and 1V2 are stable but the memory still not there.
> There is a disruption on the data on TDO since the tool can see the
> other devices but cannot get their ID properly. So, the memory is
> doing something. It´s TDO pin shows activity during the chain
> initialization.
> My current thoughts are about TMS and TCK slew rate and a possible
> sensitivity of this PROM on this matter.
> Best regards,
> Augusto
>

Article: 131740
Eilert's comments are spot-on.

You also might try:

assign len_lt = addr[11:2] + length < 13'h401;

to get rid of the <= comparison, since < is just the sign bit of a subtraction. Depending on which synthesis tool you are using, it may have already applied this transformation.

You can also try to rearrange the inequality:

assign len_lt = addr[11:2] < 13'h401 - length;

or

assign len_lt = length < 13'h401 - addr[11:2];

If either length or addr is available a clock cycle earlier, then their combination with the constant can be performed a clock cycle early, and the final single comparison (subtraction) performed when you need it.

Andy

Article: 131741
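Andy's three rewrites can be checked against the original `<=` comparison by exhaustive enumeration. This Python sketch assumes all arithmetic is carried out at 13 bits, as the 13'h401 constant suggests, with addr[11:2] being 10 bits and length 7 bits wide. The function and variable names are illustrative.

```python
MASK13 = 0x1FFF  # all expressions are assumed to be evaluated at 13 bits


def forms(addr_hi: int, length: int) -> tuple:
    """Return the original comparison and the three rewritten forms."""
    total = (addr_hi + length) & MASK13
    original = total <= 0x400
    strict = total < 0x401
    rearranged_a = addr_hi < ((0x401 - length) & MASK13)
    rearranged_b = length < ((0x401 - addr_hi) & MASK13)
    return original, strict, rearranged_a, rearranged_b


# Collect any input pair for which the four forms disagree.
mismatches = [
    (a, l)
    for a in range(1024)      # addr[11:2]
    for l in range(128)       # length[6:0]
    if len(set(forms(a, l))) != 1
]
print(len(mismatches))
```

The rearranged forms are safe here because neither 0x401 - length nor 0x401 - addr[11:2] can go negative for inputs of these widths.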
All,

For those working to use your own AES256 decryption software for decrypting the bitstream, sorry I can't be of more help. I have been told what I have related here (generally how to do it). I haven't done it myself, but I certainly trust our test benches which show it is being done, and I also trust those who have independently verified it (No Such Agency, etc.) and are happy.

I will find out which CRC we are using, and post here later. I believe it is a 32 bit CRC, but there is more than one "standard" 32 bit CRC out there, so I will find which polynomial is used. I think it is also important to know what the CRC covers for the bitstream (config frames+?+overhead?+commands?).

It may be that you have to run bitgen on the command line and specify every possible option (yea or nay), so that the encrypted and unencrypted bitstreams differ only in the encryption. I am not sure whether, when this is done through the GUI, the options are identical with the exception of the encryption.

Austin

Article: 131742
Mike,

How is the "DCM Performance Mode" attribute set?

"MAX_SPEED" or "MAX_RANGE"?

"MAX_RANGE" should be used with an oscillator of 25 MHz. The process may have shifted to be slightly faster, and if the attribute was "MAX_SPEED" it could be the frequency input is too slow for that mode.

Austin

Article: 131743
http://www.xilinx.com/support/documentation/data_sheets/ds302.pdf

table 45.

Also shows that 25 MHz is too low (per the specification).

You may have just been lucky (process variations in last lot made it 'work').

The DCM delay line is temperature compensated, so it may be that when extremely cold, it is slowed down so much that it works with 25 MHz.

Austin

austin wrote:
> Mike,
>
> How is the "DCM Performance Mode" attribute set?
>
> "MAX_SPEED" or "MAX_RANGE"?
>
> "MAX_RANGE" should be used with an oscillator of 25 MHz. The process
> may have shifted to be slightly faster, and if the attribute was
> "MAX_SPEED" it could be the frequency input is too slow for that mode.
>
> Austin

Article: 131744
Hi,

Has anybody tried using co-sim for Handel-C with ModelSim?

I managed to set up the co-sim environment, and got the Handel-C code to work with my EDK-generated MicroBlaze environment (in VHDL). In short, I am using Handel-C to build a peripheral which I attached to MicroBlaze via the FSL bus.

The simulation works OK when I use Handel-C + VHDL with the cosim manager provided by PDK. However, when I use the Handel-C DK to generate the VHDL equivalent of the original Handel-C design and then re-simulate, the results are different.

The data results I get from the regenerated core in VHDL differ from the ones I get when I use PDK's co-sim manager to interface between MicroBlaze (in VHDL) and the peripheral (in Handel-C).

Question: who to trust? Should I trust co-sim or should I trust the VHDL simulator (ModelSim)?

Any takers on this? :)

Chris

Article: 131745
chrisdekoh@gmail.com wrote:

> Question:
> who to trust? should I trust co-sim or should I trust the VHDL
> simulator (modelsim)?

I would trust modelsim.
Send your code to Celoxica.

-- Mike Treseler

Article: 131746
The CRC32 used internally is not a standard.

Austin

Article: 131747
On 30 Apr., 16:45, Andy <jonesa...@comcast.net> wrote:

> assign len_lt = length < 13'h401 - addr[11:2];

'h401 - addr = ~addr + 1 + 'h401

If you can compare with 3FF instead of 401 you can save most of the adder.

assign len_lt = length < 13'h400 + ~addr[11:2];

In this formulation only the upper three bits are used in the addition.

Kolja Sulimma

Article: 131748
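Kolja's complement trick can be examined with a small brute-force model. This is a Python sketch under one assumption about Verilog semantics: addr[11:2] is zero-extended to 13 bits before the `~` is applied, so `~addr[11:2]` equals 13'h1FFF minus addr[11:2]. Under that reading, 13'h400 + ~addr[11:2] works out to 0x3FF - addr[11:2], i.e. the threshold moves from 0x401 to 0x3FF as the post says. The names are illustrative.

```python
MASK13 = 0x1FFF


def rhs(addr_hi: int) -> int:
    """13'h400 + ~addr[11:2], with the operand extended to 13 bits first."""
    inverted = ~addr_hi & MASK13          # assumed: extend, then invert
    return (0x400 + inverted) & MASK13


def len_lt(addr_hi: int, length: int) -> bool:
    return length < rhs(addr_hi)


# The rewritten form matches a comparison of the sum against 0x3FF.
mismatches = [
    (a, l)
    for a in range(1024)
    for l in range(128)
    if len_lt(a, l) != (a + l < 0x3FF)
]

# The low 10 bits of the right-hand side are just the inverted address,
# so only bits 12:10 need any adder logic.
low_bits_are_plain_inversion = all(
    rhs(a) & 0x3FF == (~a & 0x3FF) for a in range(1024)
)
print(len(mismatches), low_bits_are_plain_inversion)
```

Compared with Andy's `< 13'h401` form, this one gives a different answer when the sum is exactly 0x3FF or 0x400, so it is only a drop-in replacement if that change of threshold is acceptable.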
>Thanks for the replies. I don't need help with the code for the ROM.
>No problem with my writing the code and letting ISE infer the ROM,
>however what if I wanted to use block memory for a dual-port memory.

You mentioned ROM in your original post. That's what I answered.

>This is a non-trivial coding effort.

I haven't done any elaborate dual-port BRAMs because everything I have done fits into a single or maybe two BRAMs. Yeah, I suppose spreading init data among several BRAMs, and combining the addresses and outputs, is not trivial. So maybe you should spill your requirements and see if someone can help?

>I am wondering what use are the
>memory cores if I am better off writing the code myself? I thought
>the purpose of the cores is not only to help the user by having the
>code pre-written by Xilinx's experts, but also to ensure that the
>design is optimal for fitting into the the FPGA.

That may be true.

>I used a core adder/
>subtractor and comparator with no problems.

And did the core work better than an inferred adder/subtractor?

>The trick with the memory
>seems to be simulating the memory with the contents loaded. so maybe
>the dual-port memory would work OK.

Thanks, Charles

Article: 131749
On May 1, 2:08 am, Kolja Sulimma <ksuli...@googlemail.com> wrote:
> On 30 Apr., 16:45, Andy <jonesa...@comcast.net> wrote:
>
> > assign len_lt = length < 13'h401 - addr[11:2];
>
> 'h401 - addr = ~addr +1 + 401
>
> If you can compare with 3FF instead of 401 you can save most of the
> adder.
>
> assign len_lt = length < 13'h400 + ~addr[11:2];
>
> In this formulation only the upper three bits are used in the
> addition.
>
> Kolja Sulimma

Thanks. I mean speed. My solution:

assign addr_pg=addr[11:2]+length;
always @(*)
begin
  if(addr_pg[10]) //11'h400
  begin
    if(|addr_pg[9:0])
      tlp_length=start_dist4k;
    else
      tlp_length=tlp_length1;
  end
  else
  begin
    tlp_length=tlp_length1;
  end
end

tlp_length is another variable. This removes a big comparator. Is there any better coding style?
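The bit test in the solution above can be compared against a plain "sum greater than 0x400" check by enumeration. This Python sketch assumes addr_pg is declared 11 bits wide (the declaration is not shown in the post), with addr[11:2] being 10 bits and length 7 bits wide. The function name is illustrative.

```python
def selects_start_dist4k(addr_hi: int, length: int) -> bool:
    """Model of: addr_pg[10] && (|addr_pg[9:0]), with addr_pg assumed 11 bits."""
    addr_pg = (addr_hi + length) & 0x7FF
    bit10 = (addr_pg >> 10) & 1
    low_bits_nonzero = (addr_pg & 0x3FF) != 0
    return bool(bit10 and low_bits_nonzero)


# Compare against the magnitude comparison it is meant to replace.
mismatches = [
    (a, l)
    for a in range(1024)      # addr[11:2]
    for l in range(128)       # length[6:0]
    if selects_start_dist4k(a, l) != (a + l > 0x400)
]
print(len(mismatches))
```

The two agree because the largest possible sum (1023 + 127) still fits in 11 bits, so bit 10 set together with a nonzero lower part is exactly "greater than 0x400".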