Site Home Archive Home FAQ Home How to search the Archive How to Navigate the Archive
Compare FPGA features and resources
Threads starting:
Authors:A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
methi wrote:

>>Is the shift register the only thing in the design?
>
>Nopes I have my design doing a whole lot of things...but its only when
>I changed the length of the shift register that I came across the
>Mapping error.
>
>>Are you using a reset for those shift elements?
>
>I am not using any reset...It takes in a clock...and shifts a bit for
>every rising edge of the clock...
>
>>Is it serial-in, serial-out?
>
> Yes its serial_in and serial_out
>
>>Are there frequency constraints?
>
>No.
>
>>Is the shift register fixed in length or variable?
>
>Its a variable shift register....the length is determined by an input
>variable called "right".
>
>The code is as follows:
>entity shifting_two is
>    Port ( shiftin  : in  std_logic;
>           clock_in : in  std_logic;
>           right    : in  integer;
>           shiftout : out std_logic);
>end shifting_two;
>
>architecture Behavioral of shifting_two is
>
>signal shift_register : std_logic_vector ( 3454 downto 0 ):= (others => '0');
>
>begin
>
>process(clock_in)
>begin
>if rising_edge(clock_in) then
>shift_register <= shift_register( 3453 downto 0 ) & shiftin;
>shiftout <= shift_register(right-1);
>end if;
>end process;
>
>end Behavioral;
>
>How can I use a BlockRam...?

For that, you want to use a block RAM, which will give you up to 18K bits length. There are two ways to do it. You can use two address counters, one for the read side of the memory and one for the write side, and offset the write counter so that the read count trails it by N, where N is the shift register length. Or you can use a single counter set up as a modulo-N count (use a loadable down-count for that, which keeps it to one level of logic). If you were using an older Xilinx FPGA, you'd need to delay the read by a clock relative to the write for this second scheme, because those parts didn't support read-before-write operation.

With the Spartan3, you can set the attribute on the BRAM for read first, which allows you to apply the same address to both the read and the write ports. It is easier if you instantiate the BRAM rather than trying to let the software figure it out.

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759

Article: 87276
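As an illustration of the single-counter, read-first scheme described above, here is a minimal behavioral sketch in Python (not HDL, and not code from the thread). The class name, the up-counting address and the default depth of 16384 are assumptions made for the example; the post itself suggests a loadable down-counter. It only shows that reading the old contents of a location before overwriting it, with an address that repeats every N clocks, returns the bit written N clocks earlier.

```python
class ReadFirstDelayLine:
    """Behavioral model of a one-bit-wide RAM in read-first mode used as a delay line.

    Hypothetical model for illustration: one address counter serves both the
    read and the write, and the address sequence repeats every n clocks.
    """

    def __init__(self, n, depth=16384):
        if not 1 <= n <= depth:
            raise ValueError("n must be between 1 and the RAM depth")
        self.n = n
        self.mem = [0] * depth
        self.addr = 0

    def clock(self, din):
        """Simulate one rising clock edge and return the registered RAM output."""
        dout = self.mem[self.addr]         # read-first: old contents are read out
        self.mem[self.addr] = din          # then the location is overwritten
        self.addr = (self.addr + 1) % self.n   # modulo-n address sequence
        return dout
```

In this model the value returned on clock k is the input presented on clock k - n. Whether that corresponds to an n-stage or an (n+1)-stage flip-flop chain depends on how the output register is counted, so the exact offset is worth confirming in simulation against the real primitive.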
Look at the map report (*.mrp). There is a section that shows the trimmed logic. The lines beginning in the leftmost columns are the root of everything under them that gets pulled out. Usually this is because of an output that is unconnected, or perhaps a clock enable that is always zero. The map report is the secret to figuring out what caused everything to get ripped out. -- --Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email ray@andraka.com http://www.andraka.com "They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, 1759Article: 87277
Vaughn Betz wrote:

>Tim,
>
>The only way I trust to compare the logic capacity of two different
>architectures is to benchmark them against each other. Modern FPGA
>architectures, like modern processor architectures, are too complex to say
>which is more area-efficient, and by how much, based on a hand analysis.
>Think of trying to guess if a P4, P3 or Athlon is faster based purely on the
>specifications of their pipelines, issue units and clock rates -- it is
>impossible, so you have to benchmark them. FPGAs have hit that level too.

Vaughn, even then it can be darned near impossible. Each of the FPGAs considered here has a unique set of extra features that can be exploited in a design, and if I design to those features it will nearly always make the design map worse to the other FPGA. These two devices are essentially equal in size, so the typical 20% or more area savings one can squeeze out of a design by designing to the architecture can easily tip the balance toward whichever device you want to "win".

This is a big problem I have with benchmarks. If you really want to compare the devices, you need to have several experts all design it independently for particular targets and then compare the optimized designs. Then, of course, that result is only valid for that set of specifications. Sure, it can be extrapolated to other designs, but the less the benchmarked design is like the user's design, the less useful that benchmark is.

I've always espoused looking at the FPGA like a box of Legos. You build what you can out of the pieces that come in the box. An average user is going to turn out some average looking stuff, and the results might even tend to look somewhat alike. There will be a few guys in the room who figure out how to use some of the pieces in their box to do some really neat things, so that they end up with something that is cooler than all the others.

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759

Article: 87278
Hi Ray,

Are you talking about a BRAM core available in Xilinx?

I've only come across the RAM-based shift register, which goes up to 1024 bits (Xilinx 6.3i).

Or should I be working on BRAM code in VHDL...

Thanks,
Methi

Article: 87279
Vladislav,

The warnings are related to the I/O's of my top level module.

Release 7.1.03i Map H.41
Xilinx Mapping Report File for Design 'sata_device_gasket'

Design Information
------------------
Command Line   : C:/Xilinx_71i/bin/nt/map.exe -ise e:\gasket_xilinx\gasket_xilinx.ise -intstyle ise -p xc4vlx40-ff668-11 -timing -ol high -t 1 -register_duplication -cm speed -detail -ignore_keep_hierarchy -pr b -u -k 4 -c 100 -o sata_device_gasket_map.ncd sata_device_gasket.ngd sata_device_gasket.pcf
Target Device  : xc4vlx40
Target Package : ff668
Target Speed   : -11
Mapper Version : virtex4 -- $Revision: 1.26.6.4 $
Mapped Date    : Wed Jul 20 10:42:22 2005

Design Summary
--------------
Number of errors:      0
Number of warnings:   87
Logic Utilization:
  Number of Slice Flip Flops:         573 out of  36,864    1%
  Number of 4 input LUTs:           1,296 out of  36,864    3%
Logic Distribution:
  Number of occupied Slices:          865 out of  18,432    4%
  Number of Slices containing only related logic:     865 out of     865  100%
  Number of Slices containing unrelated logic:          0 out of     865    0%
    *See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs:          1,304 out of  36,864    3%
  Number used as logic:             1,296
  Number used as a route-thru:          4
  Number used as Shift registers:       4

Total equivalent gate count for design:  13,582
Peak Memory Usage:  291 MB

Table of Contents
-----------------
Section 1 - Errors
Section 2 - Warnings
Section 3 - Informational
Section 4 - Removed Logic Summary
Section 5 - Removed Logic
Section 6 - IOB Properties
Section 7 - RPMs
Section 8 - Guide Report
Section 9 - Area Group Summary
Section 10 - Modular Design Summary
Section 11 - Timing Report
Section 12 - Configuration String Information
Section 13 - Additional Device Resource Counts

Section 1 - Errors
------------------

Section 2 - Warnings
--------------------
WARNING:Map:120 - The command line option -c can not be used when running in timing mode. The option will be ignored.
WARNING:LIT:243 - Logical network mrconStart has no load.
WARNING:LIT:243 - Logical network rst_spd_chng_en has no load.
WARNING:LIT:243 - Logical network genAlign_i<1> has no load.
etc.
WARNING:LIT:243 - Logical network spareIn<1> has no load.
WARNING:LIT:243 - Logical network spareIn<0> has no load.
WARNING:LIT:243 - Logical network spareOut<1> has no load.
WARNING:LIT:243 - Logical network spareOut<0> has no load.

Article: 87280
methi wrote:

>Hi Ray,
>
>Are you talking about a BRAM core available in Xilinx?
>
>Ive only come across the RAM based shift reg which goes upto 1024 bits
>(xilinx 6.3i)
>
>Or Should I be working on a BRAM code in vhdl...
>
>Thanks,
>
>Methi

I was talking about a VHDL module with an instantiated RAMB16 primitive in it. Something like this should do the trick for you:

-- Loadable down-counter: reload with modulo-2 once the count has gone
-- negative (extra top bit set), otherwise count down by one.
nxt_addr <= to_unsigned(modulo-2, abits+1) when addr(abits)='1' else addr-1;

process(clk)
begin
    if clk'event and clk='1' then
        if ce='1' then
            addr <= nxt_addr;
        end if;
    end if;
end process;

-- Drop the extra top bit and pad the address to the 14 bits of the RAM port.
a_addr <= std_logic_vector(resize(addr(abits-1 downto 0), 14));

U1: ramb16_s1
    --synthesis translate_off
    generic map(
        WRITE_MODE_A => "READ_FIRST",
        WRITE_MODE_B => "READ_FIRST")
    --synthesis translate_on
    port map(
        DIA   => b_in,
        ENA   => ce,
        WEA   => '1',
        SSRA  => '0',
        CLKA  => clk,
        ADDRA => a_addr,
        DOA   => b_out);

Since your shift register is only 3K+ long, you could even use the other RAMB16 port for something else by setting the upper address bits to opposite constant values on each port.

The loadable down counter reloads when it counts past 0 to -1. It is coded this way to give the synthesis enough of a hint not to stick the load function in as a multiplexer after the carry chain, which would force it to two levels of logic.

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759

Article: 87281
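To see what address sequence the reloading down-counter in the post above produces, here is a small Python model of just the counter (a sketch, not HDL). The function name and the assumed power-up value of 0 for addr are illustrative; modulo and abits mirror the names in the posted code. The model counts down in an (abits+1)-bit register, reloads modulo-2 whenever the top bit is set, and reports the low abits bits that would drive the RAM address.

```python
def down_counter_addresses(modulo, abits, steps):
    """Return the RAM addresses seen over `steps` clocks (ce assumed always '1')."""
    if not 2 <= modulo <= (1 << abits):
        raise ValueError("modulo must fit in abits address bits")
    width_mask = (1 << (abits + 1)) - 1   # counter has one extra top bit
    addr_mask = (1 << abits) - 1
    addr = 0                               # assumed power-up value
    seen = []
    for _ in range(steps):
        seen.append(addr & addr_mask)      # low bits drive the RAM address
        if (addr >> abits) & 1:            # top bit set: count went to -1
            addr = modulo - 2              # reload
        else:
            addr = (addr - 1) & width_mask # unsigned wrap, as with numeric_std
    return seen
```

For example, down_counter_addresses(5, 3, 10) gives [0, 7, 3, 2, 1, 0, 7, 3, 2, 1]: five distinct addresses repeating with period 5, where 7 is the all-ones address produced by the count of -1. That period is what sets the delay when the RAM is in read-first mode.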
Vladislav, I agree. And the nicest thing is that you can fold two BlockRAMs into one, by using the two ports independently. So one BlockRAM takes care of 24 inputs and generates two sets of 4 bits each. That means you need only 3 BlockRAMs for up to 72 inputs (plus a few CLBs to combine the outputs, unless you want to use two more BlockRAMs to do that). 5 BlockRAMs total gives a 2-clock latency. It all depends what you are after, speed or cost.

Peter Alfke

Article: 87282
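A rough Python sketch of the lookup-table idea above, assuming (as the Verilog snippet elsewhere in the thread suggests) that the goal is to count the ones among the inputs. Each BlockRAM port is modeled as a 4096-entry table addressed by 12 inputs and returning a 4-bit count; the names LUT_12 and popcount_via_luts are made up for the example, and the final addition stands in for the "few CLBs to combine the outputs".

```python
# One table entry per 12-bit address; the largest value is 12, which fits in 4 bits.
LUT_12 = [bin(i).count("1") for i in range(4096)]


def popcount_via_luts(value, width=72):
    """Count set bits in `value` using one 12-input table lookup per 12-bit slice.

    Two lookups correspond to the two ports of one BlockRAM, so width=72
    uses six lookups, i.e. three dual-port BlockRAMs in this model.
    """
    total = 0
    for shift in range(0, width, 12):
        total += LUT_12[(value >> shift) & 0xFFF]   # one RAM port per slice
    return total
```

For a 64-input problem the top eight bits of the 72 are simply left at zero.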
methi wrote: > I am just using the output of my shift register...which is the MSB > (this keeps changing depending on the value of the variable > "right")...as input to another component....its a pulse... If you are delaying a single pulse/edge for 3454 ticks, there are easier ways to do it than with a shift register. -- Mike TreselerArticle: 87283
Call me biased, but I was thoroughly bored by that presentation. It's the tired old "my 6-input LUT does more than your two 4-input LUTs". Ten years ago, that would have been exciting. "XC3000 LUTs are better than the simple old XC2064 LUTs". They were, really! Today's FPGA are not just LUTs, but LUT-RAMs, SRL16s, ISERDES and programmable IDELAYs, FIFO controllers, PPC, Ethernet controllers, Multi-Gigabit Transceivers, cascadable MACs, clock management, and much more. It's like a car salesman bragging: "My trunk is bigger than your trunk, if you measure it my way, with my set of boxes". But of course, I am biased. I felt pretty good after that hour. If that's all they can throw at us... Peter AlfkeArticle: 87284
Hi Peter, > Vladislav, I agreee. And the nicest thing is that you can fold two > BlockRAMs into one, by using the two ports independently. So one > BlockRAM takes care of 24 inputs and generates two sets of 4 bits each. > That means you need only 3 BlockRAMs for up to 72 inputs. (plus a few > CLBs to combine the outputs, unless you want to use two more BlockRAMs > to do that) 5 BlockRAMs total gives a 2-clock latency. > It all depends what you are after, speed or cost. I personally feel that using blockrams is a bit wasteful - I coded something up in VHDL that used 144LEs in an Altera Cyclone 1, slowest speed grade, running at 115MHz with two clocks of latency as well. No idea how big that would be in a Spartan - my guess is that it would be similar. Then again, if there's no LUTs left, and there's some leftover BRAMs, then sure this is a great solution. BTW: Peter, would you (plural) mind if I downloaded a WebPack so I can compare? Best regards, BenArticle: 87285
Methi, Take a BlockRAM, with both ports configured as 16K x 1. Make one port Write and the other one Read. Clock both ports with your data clock. Drive the Write address with a counter that you increment with the data clock. Drive the Read address from a subtractor circuit that subtracts the length N of your shift register from the Write address. Now you have a programmable-length shift register from the D input of the write port to the Q output of the Read port. And you get up to 16K bit length in a single BRAM plus four CLBs (14 bit counter plus 14-bit subtractor). Peter AlfkeArticle: 87286
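The dual-port arrangement described above can be sketched as a small Python behavioral model (an illustration only, not HDL). The class name and the choice to pass the length n on every clock are assumptions; the write address is a free-running 14-bit counter and the read address is the write address minus n, modulo 16384.

```python
class DualPortDelayLine:
    """Behavioral model: port A writes at a counting address, port B reads n behind."""

    DEPTH = 16384  # 16K x 1 configuration

    def __init__(self):
        self.mem = [0] * self.DEPTH
        self.waddr = 0

    def clock(self, din, n):
        """Simulate one clock edge on both ports; n is the programmable length."""
        if not 1 <= n < self.DEPTH:
            raise ValueError("n must be between 1 and DEPTH - 1")
        raddr = (self.waddr - n) % self.DEPTH   # 14-bit subtractor
        dout = self.mem[raddr]                   # read port
        self.mem[self.waddr] = din               # write port
        self.waddr = (self.waddr + 1) % self.DEPTH
        return dout
```

In this model the output on clock k is the input from clock k - n, and n can be changed while running because only the subtractor input changes. The exact cycle offset in hardware depends on the read port's output register, so it should be checked in simulation.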
Mike Treseler wrote: > methi wrote: > > > I am just using the output of my shift register...which is the MSB > > (this keeps changing depending on the value of the variable > > "right")...as input to another component....its a pulse... > > If you are delaying a single pulse/edge for 3454 ticks, > there are easier ways to do it than with a shift register. Exactly. He should use a counter that's initialized to required delay and enabled when he sees his input pulse. It counts down, and when it hits zero, an output pulse is generated and the counter is preset back to his initial value. There's nothing like getting set on one solution and stubbornly pursuing it to the point where you're blind to other, simpler, solutions. -aArticle: 87287
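The counter-based pulse delay suggested above can be sketched like this in Python (a behavioral illustration, not HDL; the class and signal names are invented). It assumes only one pulse is in flight at a time, so an input pulse that arrives while the counter is running is ignored.

```python
class PulseDelay:
    """Delay a single-cycle pulse by `delay` clocks using one down-counter."""

    def __init__(self):
        self.count = 0
        self.busy = False

    def clock(self, pulse_in, delay):
        """Simulate one clock edge; returns True on the cycle the delayed pulse fires."""
        if delay < 1:
            raise ValueError("delay must be at least 1")
        pulse_out = False
        if self.busy:
            self.count -= 1
            if self.count == 0:        # counter hit zero: emit the output pulse
                pulse_out = True
                self.busy = False      # ready for the next input pulse
        elif pulse_in:
            self.count = delay         # load the counter when a pulse is seen
            self.busy = True
        return pulse_out
```

With delay set to 5, a pulse on clock 2 produces an output on clock 7. If pulses can arrive closer together than the delay, this approach loses them and a memory-based delay line is needed instead.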
I do find it a little humorous that about 5-10 years ago someone published a paper at FPGA that claimed better speed and utilization using a 3-LUT rather than a 4-LUT, and that the presenter was someone fairly closely aligned with Altera. I think it might have come out of Jonathan Rose's students in Toronto. Anyway, the pendulum has obviously swung to "bigger LUTs are better" at A.

Again, for designs that use FPGA fabric correctly (i.e., not levels upon levels of combinatorial logic), the LUT size is not as big a deal. Like I mentioned earlier, you work with what you have in the box of Legos. That said, I generally AVOID using the F5 and F6 LUT expanders in Xilinx because they slow a design down a bit, and more importantly tend to be one of the things that, if placed, give the mapper fits. They also don't match the bit pitch of the arithmetic.

It is important that the underlying structure is done right, but let's not get wrapped around the axle here. Those bigger LUTs are going to tend to be underutilized. The bypass is of limited value too if you attempt to keep all your logic to one level of LUTs, as the LUTs will nearly always be associated with a flip-flop.

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759

Article: 87288
Ben Twijnstra wrote: Ben, you are correct, IF you need the block RAMs elsewhere in your design, or if they are not located conveniently with respect to the logic this is related to. Using LUTs, it can be done in 5 layers of logic, which even without pipelining but with floorplanning will run pretty quickly. If you pipeline it on every layer, it might even out-perform the BRAM , but only if you are very careful about the placement. -- --Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email ray@andraka.com http://www.andraka.com "They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, 1759Article: 87289
Peter Alfke wrote:

>Methi,
>Take a BlockRAM, with both ports configured as 16K x 1.
>Make one port Write and the other one Read. Clock both ports with your
>data clock.
>Drive the Write address with a counter that you increment with the data
>clock.
>Drive the Read address from a subtractor circuit that subtracts the
>length N of your shift register from the Write address.
>Now you have a programmable-length shift register from the D input of
>the write port to the Q output of the Read port.
>And you get up to 16K bit length in a single BRAM plus four CLBs (14
>bit counter plus 14-bit subtractor).
>Peter Alfke

Peter, with Spartan3, he can do it with one port of the BRAM if he uses a modulo-N count instead of a straight 14-bit binary count. I showed this in the code I posted earlier. The modulo-N count is easy if you do it as a loadable down-count that reloads itself when it goes negative.

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759

Article: 87290
Mike Treseler wrote: > methi wrote: > >> I am just using the output of my shift register...which is the MSB >> (this keeps changing depending on the value of the variable >> "right")...as input to another component....its a pulse... > > > If you are delaying a single pulse/edge for 3454 ticks, > there are easier ways to do it than with a shift register. > > -- Mike Treseler Mike, I assumed this was more like a line buffer where he needed to delay a sequence of bits by the 3454 clocks. If it is indeed just a delay from a single pulse and you can guarantee that another pulse does not occur until the first one has propagated out, then it can be done with just a counter. -- --Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email ray@andraka.com http://www.andraka.com "They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, 1759Article: 87291
I am trying to delay a pulse by N ticks where N takes a maximum value of 3454...N is a variable here....Article: 87292
Ray, pretty clever. But more difficult to understand, or to modify. And what do I do with the unused port? But nevertheless, hats off to a smart solution... PeterArticle: 87293
I wasn't suggesting you should switch to Verilog; the code I showed just happens to be Verilog, and the concept should translate directly to VHDL. Add 64 1-bit values in a single VHDL line. If the synthesizer doesn't do a good job, have eight lines of eight values each, then add those eight 4-bit results in one line to get your 7-bit result.

"Brad Smallridge" <bradsmallridge@dslextreme.com> wrote in message news:11dt99klhsiu2eb@corp.supernews.com...
> I would like to switch to Verilog, but not on this project.
>
> > If I were to do it in Verilog, I might use
> > always @(posedge Clk27M)
> > TotalOnes <= in[0]+in[1]+in[2]+in[3]+in[4]+in[5]+in[6]+... and continue
> > typing until I reach +in[63];

Article: 87294
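The grouping suggested above (eight sums of eight bits, then one sum of the eight partial results) can be written out as a short Python sketch to check the bit widths. The function name is invented for the example, and Python stands in for the HDL: each partial sum is at most 8, so it needs 4 bits, and the total is at most 64, so it needs 7 bits.

```python
def count_ones_grouped(bits):
    """Sum 64 one-bit values as eight groups of eight, then add the partial sums."""
    if len(bits) != 64:
        raise ValueError("expected exactly 64 one-bit values")
    # First level: eight partial sums, each in the range 0..8 (4 bits).
    partial = [sum(bits[i:i + 8]) for i in range(0, 64, 8)]
    # Second level: one addition of the eight partial sums, range 0..64 (7 bits).
    return sum(partial)
```

Whether the two-level form actually synthesizes better than the single 64-term expression depends on the tool, as the post notes.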
On Wed, 20 Jul 2005 16:14:58 -0400, Ray Andraka <ray@andraka.com> wrote: >Vaughn Betz wrote: > >> ... Stuff justifying their benchmarking ... > >Ray Wrote: >Vaughn even then it can be darned near impossible. Each of the >FPGAs considered here have a unique set of extra features that >can be exploited in a design, and if I design my design to those >features it will nearly always make it map worse to the other FPGA. >These two devices are essentially equal in size, so the typical >20% or more area savings one can squeeze out of a design by >designing to the architecture can easily tip the balance toward >whichever device you want to "win". > >This is a big problem I have with benchmarks. If you really want >to compare the devices, you need to have several experts all design >it independently for particular targets and then compare the >optimized designs. Then, of course that result is only valid for >that set of specifications. sure, it can be extrapolated for other >designs, but the less the benchmarked design is like the user's >design the less useful that benchmark is. Actually, what this does is benchmark your "experts". And the results at best are only valid if you then use that expert for your design :-) Each vendor should ship an expert to each customer with each SW release. and there should be service packs for these experts that ship at the same time as the service packs for the software. (what's the ASCII for a half-smiley?) PhilipArticle: 87295
Hello,

I am interfacing an XCV1000 to an old MIPS R3000. I am generating a 50 MHz clock using a DLL with an input clock of 25 MHz. The spec for the R3000 requires certain timing for 4 output clocks based on the 50 MHz clock. One clock is the reference 50 MHz clock. Two of the output clocks need to be delayed by 6 ns and the fourth clock by 12 ns. I can use a second DLL to generate 90 and 180 degree phase shifted clocks, but at 50 MHz (20 ns period) that gives me 5 ns and 10 ns, not 6 ns and 12 ns, respectively.

There are some other delayed signals that I need to generate as well. One signal has to lag the 50 MHz clock by 3 ns. Is there a way to easily generate specified signal delays? For registered values, the OFFSET keyword seems to be appropriate, but I am just using output buffers on the clock signals now.

This is for a research project using a custom PCB with 4 processor tiles. Each processor tile has 2 XCV1000s, 1 MIPS R3000 and R3010, and SRAMs for the cache. There are also 2 XC2V6000s to control processor communication and a second level of SRAMs.

The other approach is inserting buffers and forcing XST not to optimize them away. Are there easier approaches that have worked for others? I am using ISE 6.3i. I also have access to Modelsim and an oscilloscope.

Thanks for the suggestions,
John Davis

Article: 87296
I did this with 63 inputs, all 32 bits wide, in a plain Virtex 800 many years ago. If you are building a synchronizer for a 64-bit sync field and you can cut off 1 bit, either the first or the last, and use 63 bits, you can save the last row of adders. Since mine was 32 wide, it saved a lot more than 6 adders. The 1-bit loss probably wouldn't affect a synchronizer application. I wouldn't want to replicate 3 BRAMs 32 times though. What's the application?

Article: 87297
tim wrote:
> Interestingly, Altera will not disclose how they combine the 2 4-input
> LUTs to provide the flexibility.

Stratix II ALM: http://www.altera.com/literature/hb/stx2/stx2_sii51002.pdf Page 8
Virtex-4 slices: http://www.xilinx.com/bvdocs/userguides/ug070.pdf Pages 166 and 167

Article: 87298
>Still I am not sure if those 2 problems are really problems -: >Anyway when I run 'hyperterminal' (with correct setting - 9600baud, >8data bits, No Parity, 1 Stop bit and No flow control), nothing appears >in hyperterminal. So it should be something wrong. power cycle the board while trying to connect.Article: 87299
Ok, hi all, I'm new to FPGAs and am having some fun with an Altera UP3 kit. In the app I'm developing I have a component that I use 8 times in parallel, and the problem is that the logic of this component contains two divide-by-5 operations performed on integer variables. I can't use bit shifting, obviously, but is there a way to do a divide by 5 that is cheaper in terms of LEs than a full divider? This is killing me because the divides use something like 1500 LEs, which is almost 2X larger than the rest of the logic. If there's no other reasonable way, I think I can rearchitect it and make it a divide by 4, and thus make bit shifting available. Any help appreciated, thanks.

Here's a dump of all the crap Quartus seems to be adding for the divide:

Info: Found 1 design units, including 1 entities, in source file ../../../../../../../altera/quartus50sp1/libraries/megafunctions/lpm_divide.tdf
Info: Found entity 1: lpm_divide
Info: Found 1 design units, including 1 entities, in source file db/lpm_divide_smf.tdf
Info: Found entity 1: lpm_divide_smf
Info: Found 1 design units, including 1 entities, in source file db/sign_div_unsign_uig.tdf
Info: Found entity 1: sign_div_unsign_uig
Info: Found 1 design units, including 1 entities, in source file db/alt_u_div_1od.tdf
Info: Found entity 1: alt_u_div_1od
Info: Found 1 design units, including 1 entities, in source file db/add_sub_ke8.tdf
Info: Found entity 1: add_sub_ke8
Info: Found 1 design units, including 1 entities, in source file db/add_sub_le8.tdf
Info: Found entity 1: add_sub_le8
Info: Found 1 design units, including 1 entities, in source file db/add_sub_me8.tdf
Info: Found entity 1: add_sub_me8
Info: Found 1 design units, including 1 entities, in source file db/add_sub_ne8.tdf
Info: Found entity 1: add_sub_ne8
Info: Found 1 design units, including 1 entities, in source file db/add_sub_la8.tdf
Info: Found entity 1: add_sub_la8
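One commonly used alternative to a general divider, offered here as an editorial sketch and not taken from the thread, is to multiply by a fixed-point approximation of 1/5 and shift. The Python below assumes the operand is an unsigned 16-bit value, which the post does not state; the constant 0xCCCD and the shift of 18 are specific to that width, and the function name is invented. In hardware this becomes one constant multiply (a few shifted additions, or an embedded multiplier where the device has one) plus a fixed bit selection.

```python
def div5_u16(x):
    """Divide an unsigned 16-bit value by 5 using a constant multiply and a shift.

    0xCCCD / 2**18 is slightly larger than 1/5; the excess is small enough
    that the truncated result is exact over the whole 16-bit range.
    """
    if not 0 <= x < (1 << 16):
        raise ValueError("x must be an unsigned 16-bit value")
    return (x * 0xCCCD) >> 18
```

The constant only works for the stated input range. A wider operand needs a different constant and shift, and the result should be checked exhaustively or by proof before being relied on.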