Site Home Archive Home FAQ Home How to search the Archive How to Navigate the Archive
Compare FPGA features and resources
Threads starting:
Authors:A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Hi,
I would like to know whether there are any tools available from Xilinx or others to split a design across multiple FPGAs and synthesize it.

regards
subin

Article: 113951
Ralf Hildebrandt wrote:
> Ben Jackson wrote:
>
>>> reg num = 7;
>> That's almost certainly wrong.
>
> Initial values are ignored during synthesis. -> Create a reset for it!

Is that true for Verilog? At least XST honors initial values in VHDL. I know that some years ago they were ignored.

Bye
Tom

Article: 113952
Hi all,

These days I find that I read Verilog code much more slowly than C/C++ code. It takes me a lot more time to understand Verilog code than C/C++ code.

So I want to read through a small CPU core (preferably under 10k lines) to improve my Verilog reading and writing skills. Please recommend a small open CPU core, or anything else, with good documentation and coding style.

Any suggestions on improving my Verilog reading speed are welcome!

Best regards,
Shenli

Article: 113953
Hi people,

I am using a custom BRAM, but it is not getting synthesised; it keeps giving the following error:

ERROR:NgdBuild:604 - logical block 'my_transmitter_0/my_transmitter_0/dpram0' with type 'custom_bram_0' could not be resolved. A pin name misspelling can cause this, a missing edif or ngc file, or the misspelling of a type name. Symbol 'custom_bram_0' is not supported in target 'virtex2p'.

Custom BRAM generated using ----> Xilinx Core Generator
Xilinx ISE functional simulation -----> completed successfully
Xilinx EDK BFM simulation -----> completed successfully
Netlist ------> completed successfully
Bit Stream Generation --------> FAILURE

There has been traffic on this group regarding the problem that I am facing, but none of the solutions that have been proposed work in my case.

I have designed an OPB master/slave peripheral in which I use a dual-port RAM generated by Xilinx Core Generator. I named this core custom_bram. The following files are generated:

1) custom_bram.asy
2) custom_bram.edn
3) custom_bram.sym
4) custom_bram.v
5) custom_bram.veo
6) custom_bram.vhd
7) custom_bram.vho
8) custom_bram.xco
9) custom_bram_flist.txt

I copied custom_bram.vhd into my user directory and instantiated it as a component in the main module.

Things that I have tried:

1) custom_bram uses an entity called XilinxCoreLib.blkmemdp_v6_3, defined in the library XilinxCoreLib. I copied all the related files for blkmemdp_v6_3 into the pcores/hdl/vhdl directory.
2) I copied the generated custom_bram.edn file into a directory called /pcores/netlist and made an entry in the my_transmitter_0.mpd file specifying OPTIONS STYLE = MIX.
3) In the instantiation of custom_bram I have tried my_transmitter_v1_00_0.custom_bram (as suggested in one of the posts on this group).
4) I am using 4 block RAMs in this design ... In one of the documents that I read, it was stated that you cannot make multiple instantiations of the same module, so I made 4 copies of custom_bram, renamed them, and referred to each only once.

None of these have worked ....

Any ideas? :)

Thanks
Venu

Article: 113954
"subint" <subin.82@gmail.com> wrote in message news:1167456641.853139.193370@s34g2000cwa.googlegroups.com...
> Hi,
> i would like to know is there is any tools available from
> xilinx or other to split the design into multiple fpga and
> synthesize...

Have a look at Certify (http://www.synplicity.com/products/certify/index.html), BYO (http://www.byo-solutions.com/index.htm) and Auspy (http://www.auspy.com/).

Hans
www.ht-lab.com

> regards
> subin
>

Article: 113955
Shenli wrote:
> Hi all,
>
> These days, I found my Verilog code reading speed is not fast like my
> C/C++ reading speed. It take me a lot of time to understand Verilog
> code than C/C++ code.
>
> So, I want to read through a small CPU core (I prefer line <10k) to
> improve my Verilog coding reading/writing skills. Please recommend a
> small open CPU core or other things with good document and coding
> style.
>
> Any suggestions about improve Verilog code reading speed is welcome!

Have a look at the LatticeMico32:

http://www.latticesemi.com/products/intellectualproperty/ipcores/mico32/index.cfm

Cheers,
Jon

Article: 113956
"Thomas Reinemann" <tom.reinemann@gmx.net> wrote in message news:en54kt$s5u$1@news.boerde.de...
> Ralf Hildebrandt wrote:
>> Ben Jackson wrote:
>>
>>>> reg num = 7;
>>> That's almost certainly wrong.
>>
>> Initial values are ignored during synthesis. -> Create a reset for it!
> Is it true for Verilog? Because at least XST regards initial values in
> VHDL. I know some years ago, they had been ignored.

As a blanket statement, Ralf is incorrect in stating that "initial values are ignored during synthesis". First of all, it depends on the target device: does the target device have a defined state at power-up (CPLD) or after configuration (FPGA)? Many devices do have such a definition. The second consideration is the tool set used to synthesize the bitstream from the source code. Some tools might not support an initial value. It really does not depend on the language itself but on the synthesis tool. In any case, it's not hard to find a device and tool that will support initial values.

Ralf's advice to use a reset, though, is well founded. Having something that depends solely on the power-up reset state is 'usually' not sound design practice. Again, though, there are exceptions, the shift chain that one should use to generate a synchronous reset being a good example.

Kevin Jennings

Article: 113957
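A note on the reset shift chain mentioned at the end of Kevin Jennings' post: the idea can be modelled in a few lines of Python. This is only a behavioural sketch, not HDL. The chain length, the active-high reset polarity, and the assumption that every stage comes up as 0 after configuration are illustrative choices, and `reset_sequence` is a hypothetical name.

```python
def reset_sequence(chain_length, num_clocks):
    """Model a shift chain that generates a synchronous reset.

    Assumption: the device defines the flip-flop state after
    configuration, so every stage of the chain starts at 0.
    A constant 1 is shifted in on each clock; the reset stays
    asserted until that 1 reaches the last stage.
    """
    chain = [0] * chain_length          # assumed power-up state
    resets = []
    for _ in range(num_clocks):
        resets.append(chain[-1] == 0)   # reset is active while the last stage is 0
        chain = [1] + chain[:-1]        # shift a 1 in from the left
    return resets


if __name__ == "__main__":
    # With a 4-stage chain the reset is held for the first 4 clocks.
    print(reset_sequence(4, 6))
```

This is the one place where the design relies on the initial state on purpose: the chain itself cannot be reset by the signal it produces.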
I want to transmit a set of data R to an FPGA board, where |R| <= 1. Initially, I scale the data by multiplying by 2^7. If a value is negative, I add 2^16 to it. I did not think about this much at the beginning, and the result coming back from the FPGA board is correct.

Now I am thinking that if a value is negative, I should add 2^8 instead of 2^16. But then the result is wrong. I am really confused now. Could you tell me what's wrong with it?

Thank you.

Article: 113958
I would like to add some information here. I transmit these data by separating each value into 2 parts, like this:

    for i = 1:N^2
        if R1(i) < 0
            R1(i) = R1(i) + 2^8;
        end
        for j = 1:2
            if j == 1
                R3(k) = rem(R1(i), 256);
                k = k + 1;
            elseif j == 2
                R3(k) = floor(R1(i)/256);
                k = k + 1;
            end
        end
    end

Then I use a dual-port RAM in the FPGA to store these data. I use an 8-bit-wide data port to receive the data, and the data is read out from a port that is 16 bits wide, so the data is restored to the original values. My guess is that these data actually need 16 bits, so adding 2^8 gives the wrong result, but I cannot convince myself. Does anybody know something about it?

Thank you.

Article: 113959
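Regarding the 2^8 versus 2^16 question in the post above: the arithmetic can be checked with a small Python sketch. It assumes the FPGA side interprets the reassembled 16-bit word as a signed two's-complement number, which the post does not state explicitly; the helper names `to_bytes_16` and `from_bytes_16` are made up for illustration. The byte order (low byte first) follows the MATLAB loop.

```python
def to_bytes_16(value):
    """Split a signed integer into (low, high) bytes of its 16-bit two's complement."""
    if not -2**15 <= value < 2**15:
        raise ValueError("value does not fit in 16 bits")
    word = value + 2**16 if value < 0 else value   # wrap negatives modulo 2^16
    return word % 256, word // 256


def from_bytes_16(low, high):
    """Reassemble two bytes and interpret the 16-bit word as signed (assumed)."""
    word = high * 256 + low
    return word - 2**16 if word >= 2**15 else word


if __name__ == "__main__":
    scaled = round(-0.5 * 2**7)                  # R = -0.5 scaled by 2^7 gives -64
    low, high = to_bytes_16(scaled)
    print(low, high, from_bytes_16(low, high))   # adding 2^16 round-trips to -64

    # Adding 2^8 instead yields the word 192 with a zero high byte,
    # which a 16-bit reader sees as +192 rather than -64.
    wrong_word = scaled + 2**8
    print(from_bytes_16(wrong_word % 256, wrong_word // 256))
```

Under that assumption, the modulus has to match the width of the word that is read back: a 16-bit read port needs the negative values wrapped by 2^16, while 2^8 would only be right if each value were sent and read as a single byte.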
Hi,

I would like to connect many independent data sources/targets to a common data stream. There will be a 36-bit static RAM block of 2^20 words (9x IDT71V428-12) running as fast as possible, i.e. at ~83 MHz, which is supposed to be the main storage of the system, and a number of completely unsynchronized components trying to send/receive their data streams to/from the RAM block. The FPGA chip will be a Spartan 3 or 3E; I haven't chosen it yet. The FPGA will host, among other things, the following components:

a) a 2-way 18-bit SIMD fixed-point complex math processor running at 65 MHz. All its simple scalar instructions should complete in 1 cycle, which is doable, as there are hardware 18x18 multipliers. It will thus consume 292.5 MiB/s of the available bandwidth.

b) a high-speed USB 2.0 bidirectional 8-bit datalink running at 48 MHz, which gives 48 MiB/s.

c) an Ethernet 100 controller, full duplex mode => ~20 MiB/s.

d) an LCD display driver, about 2 MiB/s.

e) many slow links (SPI-like, AC-97 TDMA etc.), which won't consume much bandwidth.

The total bandwidth is 373 MiB/s, which easily covers the requirements. My idea is to implement a static DMA-like RAM transaction slot allocator, which will grant the bus to the CPU in 65 slots out of 83, in 11 to the USB link etc., but how do I implement a bunch of low-latency half-duplex bridges between the 83 MHz domain and the remaining ones? I don't want to waste my precious BRAMs for that purpose, so what should I do?

Best regards
Piotr Wyderski

Article: 113960
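The slot budget in Piotr Wyderski's post above can be reproduced with a short Python sketch. The figures come from the post; treating its "MiB/s" as 10^6 bytes per second is an assumption made here because that is what reproduces the quoted 373 and 292.5 totals. The 83-slot frame follows the "65 slots out of 83" wording, and `slots_needed` is a hypothetical helper.

```python
BUS_WIDTH_BITS = 36
BUS_CLOCK_HZ = 83_000_000
SLOTS_PER_FRAME = 83            # one frame = 83 bus cycles, so each slot recurs at 1 MHz

# Demands as quoted in the post, read as 10**6 bytes per second (assumption).
DEMAND_BYTES_PER_S = {
    "cpu": 292_500_000,
    "usb": 48_000_000,
    "ethernet": 20_000_000,
    "lcd": 2_000_000,
}


def slots_needed(bytes_per_s):
    """Smallest number of slots per frame that carries the given byte rate."""
    slot_bits_per_s = BUS_WIDTH_BITS * BUS_CLOCK_HZ // SLOTS_PER_FRAME
    demand_bits_per_s = bytes_per_s * 8
    return -(-demand_bits_per_s // slot_bits_per_s)    # ceiling division


allocation = {name: slots_needed(rate) for name, rate in DEMAND_BYTES_PER_S.items()}
spare_slots = SLOTS_PER_FRAME - sum(allocation.values())
bus_bytes_per_s = BUS_WIDTH_BITS * BUS_CLOCK_HZ // 8

if __name__ == "__main__":
    print(allocation, spare_slots, bus_bytes_per_s)
```

With these assumptions the CPU needs 65 slots and USB 11, as in the post, while Ethernet and the LCD take 5 and 1, which leaves a single slot per frame for the slow links. The budget works out, but with little headroom.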
Jon Beniston wrote:
> Shenli wrote:
> > Hi all,
> >
> > These days, I found my Verilog code reading speed is not fast like my
> > C/C++ reading speed. It take me a lot of time to understand Verilog
> > code than C/C++ code.
> >
> > So, I want to read through a small CPU core (I prefer line <10k) to
> > improve my Verilog coding reading/writing skills. Please recommend a
> > small open CPU core or other things with good document and coding
> > style.
> >
> > Any suggestions about improve Verilog code reading speed is welcome!
>
> Have a look at the LatticeMico32:
>
> http://www.latticesemi.com/products/intellectualproperty/ipcores/mico32/index.cfm
>
> Cheers,
> Jon

Hi Jon,

Thanks a lot for the information!

Is LatticeMico32 easy to understand? Does it come with good documentation describing the Verilog files? Does it have a good testbench?

Best regards,
Davy

Article: 113961
Can anyone tell me about DCMs and clock trees, and how to handle multiple clocks in my design? What is the BUFG concept for clocks in an FPGA? I only know about the dedicated FPGA pins for assigning the different clocks that are inputs to the FPGA.

Article: 113962
Hi Piotr,

Piotr Wyderski wrote:
> Hi,
>
> I would like to connect many independent data source/targets
> to a common data stream. There will be a 36-bit static RAM
> block of 2^20 words (9x IDT71V428-12) running as fast
> as possible, i.e. at ~83MHz, which is supposed to be the
> main storage of the system and a number of completely
> unsynchronized components, trying to send/receive their
> data streams to/from the RAM block. The FPGA chip will
> be a Spartan 3 or 3E, I haven't chosen it yet. The FPGA will host, among
> other things, the following components:
>
> a) a 2-way 18-bit SIMD fixed-point complex math processor
> running at 65 MHz. All its simple scalar instructions should
> complete in 1 cycle, which is doable, as there are hardware
> 18x18 multipliers. It will thus consume 292,5 MiB/s of the
> avaliable bandwidth.
>
> b) a high-speed USB2.0 bidirectional 8-bit datalink running
> at 48Mhz, which gives 48 MiB/s.
>
> c) an Ethernet 100 controller, full duplex mode => ~20 MiB/s.
>
> d) an LCD display driver, about 2 MiB/s.
>
> e) many slow links (SPI-like, AC-97 TDMA etc.), won't consume
> much bandwidth.
>
> The total bandwidth is 373 MiB/s, which easily covers the
> requirements. My idea is to implement a static DMA-like
> RAM transaction slot allocator, which will grant the bus for
> the CPU in 65 slots out of 83, in 11 for the USB link etc.,
> but how to implement a bunch of low-latency half-duplex
> bridges between the 83MHz domain and the remaining ones?
> I don't want to waste my precious BRAMs for that purpose,
> so what should I do?

IMHO you should use at least one BRAM to prepare the data going to/from the SRAM's bus. The bus side should work at 83 MHz, but the inner side should work faster in order to multiplex the data into adequate "slots". I don't know how you would like to match the address bus and the data bus; if I were you, I would use a second BRAM for matching addresses.

Best Regards,
Jerzy Gbur

Article: 113963
Jerzy Gbur wrote:

> IMHO you should use at least BRAM for preparing data to/from SRAM's BUS.

Yes, but this way the fast random access time will be lost and the whole system will behave like a DRAM-based system with a tiny cache. Another option is to clock the CPU at 83 MHz to match the bus speed and add the HLD signal, like in the good old DMA controllers. It simplifies a lot of things, but the initial question "how to connect many slower participants to the bus?" remains open. In this design some of them can be easily attached, as 83/2 = 41.5 and 83/4 = 20.75, so my USB and Ethernet links could work synchronously with the bus, but many other sources (AC-97 codecs, display) cannot be synchronized this way.

> BUS side should work at 83MHz, but inner side should work faster to
> accomplish multiplexing data in adequate "slots".

It depends on what you call the "inner side". The CPU is supposed to work at a 2--3 times higher frequency than I said, to hide its simple internal pipeline and appear to be a one-cycle design. But its memory interface is bounded by the available bandwidth. There is a large data source/target domain that _must_ be clocked at 65 MHz, but I can connect it via a BRAM to the CPU domain.

> I don't know how you like to match Address BUS and Data BUS

What do you mean by "bus matching"?

Best regards
Piotr Wyderski

Article: 113964
salu,

http://www.xilinx.com/cgi-bin/search/googleSearch?btnG=Google+Search&output=xml_no_dtd&sort=date%3AD%3AL%3Ad1&ie=iso-8859-1&client=xilinx&oe=iso-8859-1&proxystylesheet=xilinx&filter=0&requiredfields=&q=dcm+clock+tree&site=Documentation&submit2.x=0&submit2.y=0&submit2=Search

or

http://tinyurl.com/yjmq5r

Read all about DCMs and clock trees. Or refine your search for a specific product, and read less.

Austin

Article: 113965
Hello, there are some pins on the Xilinx XC9536 which are called global clock1/2/3, global reset, etc. Where are these explained?

Article: 113966
Hi,

I'm pretty new to Verilog and I am trying to write code to compute a mean and store it in RAM. I update the RAM each time a new sample comes in, and thus the RAM becomes my second addend. Here is a bit of the code. Why am I getting "multi-source in unit" on cal_ram_di?

    always @(posedge clk)
    begin
      if(~done & stepCnt == 0)
      begin
        if(cal_cnt == 0)
        begin
          addend1 <= {8'h00, samp12}; //pad with zeros to the left, new sample to add to running sum
          addend2 <= cal_ram_do[20:0]; //the current running sum for the channel
          gotoffset <= 0;
        end
        else if(cal_cnt == 1)
        begin
          if(sampCtr == 0)
          begin
            cal_ram_di[20:0] <= addend1; //this is the first sample
            cal_ram_we <= 1; //assert the write enable so we can latch the data in the RAM
          end
          else
          begin
            cal_ram_di[20:0] <= addend1 + addend2; //every other sample
            cal_ram_we <= 1;
          end
        end
        else if(cal_cnt == 2)
        begin
          if(sampCtr == 511)
            meanVal <= cal_ram_di[20:9]; //we have monitored for 512 samples so divide by 2^9 (512) --- leave this as di.
        end
        else if(cal_cnt == 3)
        begin
          if(sampCtr == 511) //we have computed sums for all 512 samples so compute the offset
          begin
            cal_ram_di[28:0] <= {(meanVal-12'h800), 16'h0000, 1'b0}; //put the offset in the right spot because if it is not there i can't do the offset subtraction the same the whole time
            cal_ram_we <= 1; //assert ram write enable and latch the data
            gotoffset <=1;
          end
        end
      end
    end

Article: 113967
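The arithmetic behind the code in the post above (accumulate 512 12-bit samples in a 21-bit sum, then take bits [20:9] as the mean) can be checked with a small Python model. This only illustrates the bit slicing; it does not address the multi-source error, and `mean_by_shift` is a hypothetical name. It assumes unsigned 12-bit samples.

```python
NUM_SAMPLES = 512        # 2**9 samples, so dividing by 512 is a 9-bit right shift
SUM_MASK = 2**21 - 1     # 21-bit accumulator, like cal_ram_di[20:0]


def mean_by_shift(samples):
    """Accumulate unsigned 12-bit samples and return bits [20:9] of the sum."""
    total = 0
    for sample in samples:
        if not 0 <= sample < 2**12:
            raise ValueError("sample does not fit in 12 bits")
        total = (total + sample) & SUM_MASK   # wrap like a 21-bit register
    return (total >> 9) & 0xFFF               # bits [20:9], i.e. total // 512


if __name__ == "__main__":
    samples = [(i * 37) % 4096 for i in range(NUM_SAMPLES)]
    print(mean_by_shift(samples), sum(samples) // NUM_SAMPLES)
```

The 21-bit width is sufficient because the largest possible sum, 512 * 4095 = 2,096,640, is below 2^21 = 2,097,152, so the accumulator never wraps for 512 samples.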
"Piotr Wyderski" <wyderski@mothers.against.spam-ii.uni.wroc.pl> wrote in message news:en6qbk$p2s$1@news.dialog.net.pl...
> Hi,
>
> I would like to connect many independent data source/targets
> to a common data stream. There will be a 36-bit static RAM
<snip>
> The total bandwidth is 373 MiB/s, which easily covers the
> requirements. My idea is to implement a static DMA-like
> RAM transaction slot allocator, which will grant the bus for
> the CPU in 65 slots out of 83, in 11 for the USB link etc.,
> but how to implement a bunch of low-latency half-duplex
> bridges between the 83MHz domain and the remaining ones?
> I don't want to waste my precious BRAMs for that purpose,
> so what should I do?
>

The function that you're describing is an arbitrator: you have multiple sources that need to share access to a shared resource (the SRAM), and the management of who gets control of that resource at any particular time is up to whatever arbitration function you choose to implement. If you view it in that context, your 'bunch of low-latency half-duplex bridges' will not present as much of a challenge as you may think.

The best way to go about this is to start with the entity definition for the SRAM arbitration function. Each potential master requires a private interface to the arbitrator; the arbitrator also has a master interface to the external SRAM itself. So if you have 10 potential sources to the SRAM, then the arbitrator will have 10 slave interfaces (one to each of those sources) plus an SRAM master interface.

Next consider the requirements of each of those sources. Do they have some sort of 'wait' signal that will cause the source to hold address and write data (during a write) and to hold address while waiting for a read to complete? What kind of read cycle time performance is required? It sounds like you have a handle on the bandwidth requirements, but are there any latency requirements (i.e. how long can something 'wait')?

If you go about the process by figuring out the requirements of the arbitration function and working through the requirements that each master and the target SRAM slave present, then it should start to fall into place.

Kevin Jennings

Article: 113968
On 31 Dec 2006 10:28:22 -0800, "idp2" <ian.peikon@gmail.com> wrote:

>Hi,
>
> I'm pretty new to verilog and I am trying to write code to compute
>a mean and store it in RAM. I update the ram each time a new sample
>comes in and thus the ram becomes my second addend. Here is a bit of
>the code. Why am I getting multi-source in unit on cal_ram_di?
>
>always @(posedge clk)
>begin
> if(~done & stepCnt == 0)
> begin
> if(cal_cnt == 0)

Are you sure this is all the code which assigns to cal_ram_di? What you show is a single always block, so it's difficult to get a multi-source out of it. Check where else you're using cal_ram_di to see whether you're declaring it as an input or whether you're assigning to it again.

Another comment is that you can change the "if (cal_cnt == 0)" chain to a case statement, which might give you better performance.

Article: 113969
That is only one of my always blocks that works with cal_ram_di. I have two others, but they are based on the conditions if(~done & stepCnt == 1) and if(~done & stepCnt == 2)... Is that what is causing the problem? If so, how do I fix that?

mk wrote:
> On 31 Dec 2006 10:28:22 -0800, "idp2" <ian.peikon@gmail.com> wrote:
>
> >Hi,
> >
> > I'm pretty new to verilog and I am trying to write code to compute
> >a mean and store it in RAM. I update the ram each time a new sample
> >comes in and thus the ram becomes my second addend. Here is a bit of
> >the code. Why am I getting multi-source in unit on cal_ram_di?
> >
> >always @(posedge clk)
> >begin
> > if(~done & stepCnt == 0)
> > begin
> > if(cal_cnt == 0)
>
> Are you sure this is all the code which assigns to cal_ram_di? What
> you show is a single always block so it's difficult to get a
> multi-source out of it. Check where else you're using cal_ram_di to
> see if you're declaring it as input or whether you're assigning to it
> again.
>
> Another comment is that you can change the "if (cal_cnt==0) to a case
> statement which might give you better performance.

Article: 113970
KJ wrote:

> Next consider the requirements of each of those sources. Do they have
> some sort of 'wait' signal that will cause it to hold address and write
> data (during a write) and cause it to hold address while waiting for a
> read to complete?

Yes, they do.

> What kind of read cycle time performance is required?

The CPU must run as fast as possible because of its computationally intensive tasks, but there is no access time restriction, i.e. it is not important whether a particular single load or store takes one or ten cycles to complete, as long as they statistically complete in 1.28 cycles (83/65) on average for a truly random access pattern. The USB and Ethernet links work similarly, as their master controllers are in the FPGA itself (i.e. no external component screams "feed me!"), so again, there are no real-time requirements. The only real-time components are the AC-97 codecs and the display (that is, its pixel bus), but they are slow.

> but are there any latency requirements (i.e. how long can something
> 'wait')?

Fortunately not, only the bandwidth matters. Well, several channels have a bounded maximal latency, but it is so long compared to the RAM bus cycle that it could easily be fulfilled by an appropriate arbitration function. A simple round-robin prioritizer will be perfectly enough.

> If you go about the process as figuring out the requirements of the
> arbitration function and work through the requirements that each master
> presents and the target SRAM slave then it should start to fall into
> place.

Well, think of many DMA channels connected to much slower clock domains; it's a good model. The problem is how to pass their data and configuration parameters between the main clock domain and their respective domains. Now I think that a separate RAM clock domain is too hard to implement reliably, so I can redesign the system in order to run the CPU at the same clock rate. It will allow me to implement the arbitrator in the old way, i.e. to add the HLD signal to the CPU and state that the DMA controller has higher priority, but it will require more (mostly unidirectional) synchronization bridges elsewhere. They must be made of CLBs, because I need the BRAMs for better purposes.

Best regards
Piotr Wyderski

Article: 113971
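The "simple round-robin prioritizer" mentioned in Piotr Wyderski's post above can be modelled behaviourally as follows. This is a sketch of the general round-robin technique in Python, not the poster's design and not synthesizable HDL; the class name `RoundRobinArbiter` and its interface are hypothetical.

```python
class RoundRobinArbiter:
    """Grant one requesting master per call, rotating priority after each grant."""

    def __init__(self, num_masters):
        self.num_masters = num_masters
        # Start as if the last master was just served, so master 0 goes first.
        self.last_grant = num_masters - 1

    def grant(self, requests):
        """Return the index of the granted master, or None if nobody requests.

        `requests` is a sequence of booleans, one per master.
        """
        for offset in range(1, self.num_masters + 1):
            candidate = (self.last_grant + offset) % self.num_masters
            if requests[candidate]:
                self.last_grant = candidate
                return candidate
        return None


if __name__ == "__main__":
    arbiter = RoundRobinArbiter(3)
    # Masters 0 and 2 request continuously; the grants alternate between them.
    print([arbiter.grant([True, False, True]) for _ in range(4)])
```

Because the search always starts just after the most recently served master, no requester can be starved, and the worst-case wait is bounded by the number of masters, which is what the bounded-latency channels in the post need.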
"KJ" <kkjennings@sbcglobal.net> wrote:

>
>"Piotr Wyderski" <wyderski@mothers.against.spam-ii.uni.wroc.pl> wrote in
>message news:en6qbk$p2s$1@news.dialog.net.pl...
>> Hi,
>>
>> I would like to connect many independent data source/targets
>> to a common data stream. There will be a 36-bit static RAM
><snip>
>> The total bandwidth is 373 MiB/s, which easily covers the
>> requirements. My idea is to implement a static DMA-like
>> RAM transaction slot allocator, which will grant the bus for
>> the CPU in 65 slots out of 83, in 11 for the USB link etc.,
>> but how to implement a bunch of low-latency half-duplex
>> bridges between the 83MHz domain and the remaining ones?
>> I don't want to waste my precious BRAMs for that purpose,
>> so what should I do?
>>
>If you view it in that context your 'bunch of low-latency half-duplex
>bridges' will [not] present as much of a challenge as you may think. The best way
>to go about this is to start with the entity definition for the SRAM
>arbitration function. Each potential master requires a private interface to
>the arbitrator, the arbitrator also has a master interface to the external
>SRAM itself. So if you have 10 potential sources to the SRAM then the
>arbitrator will have 10 slave interfaces (to each of those sources) plus an
>SRAM master interface.
>
>Next consider the requirements of each of those sources. Do they have some
>sort of 'wait' signal that will cause it to hold address and write data
>(during a write) and cause it to hold address while waiting for a read to
>complete? What kind of read cycle time performance is required? It sounds
>like you have a handle on the bandwidth requirements but are there any
>latency requirements (i.e. how long can something 'wait')?
>
>If you go about the process as figuring out the requirements of the
>arbitration function and work through the requirements that each master
>presents and the target SRAM slave then it should start to fall into place.

This is not so difficult to implement. Using a priority encoder and a state machine which performs a memory transaction, the entire arbiter is almost finished. The trick is to design the state machine in such a way that the maximum bandwidth can be used and the bandwidth is shared properly.

There is also a different approach which has been discussed in this group before. I believe it is called a ring bus. It seems pretty clever and I will consider using it the next time I have to share a memory between different devices.

Daniel Sauvageau wrote something about it before in a thread called 'ddr with multiple users':

Why use a ring bus?
- Nearly immune to wire delays since each node inserts bus pipelining FFs with distributed buffer control (big plus for ASICs)
- Low signal count (all things being relative) memory controller:
  - 36bits input (muxed command/address/data/etc.)
  - 36bits output (muxed command/address/data/etc.)
- Same interface regardless of how many memory clients are on the bus
- Can double as a general-purpose modular interconnect, this can be useful for node-to-node burst transfers like DMA
- Bandwidth and latency can be tailored by shuffling components, inserting extra memory controller taps or adding rings as necessary
- Basic arbitration is provided for free by node ordering

The only major down-side to ring buses is worst-case latency. Not much of an issue for me since my primary interest is video processing/streaming - I can simply preload one line ahead and pretty much forget about latency.

Flexibility, scalability and routability are what makes ring buses so popular in modern large-scale, high-bandwidth ASICs and systems. It is all a matter of trading some up-front complexity and latency for long-term gain.

--
Reply to nico@nctdevpuntnl (punt=.)
Companies and shops can be found at www.adresboekje.nl

Article: 113972
Shenli wrote:

>
> Jon Beniston wrote:
>> Shenli wrote:
>> > Hi all,
>> >
>> > These days, I found my Verilog code reading speed is not fast like my
>> > C/C++ reading speed. It take me a lot of time to understand Verilog
>> > code than C/C++ code.
>> >
>> > So, I want to read through a small CPU core (I prefer line <10k) to
>> > improve my Verilog coding reading/writing skills. Please recommend a
>> > small open CPU core or other things with good document and coding
>> > style.
>> >
>> > Any suggestions about improve Verilog code reading speed is welcome!
>>
>> Have a look at the LatticeMico32:
>>
>> http://www.latticesemi.com/products/intellectualproperty/ipcores/mico32/index.cfm
>>
>> Cheers,
>> Jon
>
> Hi Jon,
>
> Thanks a lot for the information!
>
> Is LatticeMico32 easy to understand? Or with good documents describe
> the Verilog file? Is it with good testbench?
>
> Best regards,
> Davy

You could also try opencores.org

--
JosephKK
Against stupidity the gods themselves contend in vain.
--Schiller

Article: 113973
Check your email.

--
Per ardua ad nauseam

Article: 113974
On 2006-12-31, <highZ> <> wrote:
> Hello, there are some pins on xilinx xc9536 which are called global
> clock1/2/3
> global reset, etc, where are these explained?

There's a document called something like "XC9500 device family datasheet". Those pins are (optionally) connected to special internal routing resources that make them suitable for use as input clocks and global set/reset. Isn't there also a global tristate?

--
Ben Jackson AD7GD
<ben@ben.com>
http://www.ben.com/