Messages from 126700

Article: 126700
Subject: EDK 9.2 Woes
From: motty <mottoblatto@yahoo.com>
Date: Thu, 29 Nov 2007 12:53:06 -0800 (PST)

I updated to EDK 9.2 primarily b/c we are building a new project and I
wanted the latest MPMC3 core.  Here are some things that I have
noticed while using it:

1)  Instance name changes in the EDK GUI do not propagate to the MHS
file.  So if you change a name, and the GUI updates, the name changes
back to what it was when the core was pulled in.  You have to manually
change the MHS for a name change to stick.  No big deal.  There is an
easy workaround...but annoying.

2)  The MPMC3 configuration GUI displayed the correct address width
and bank address width for my DDR2 memory.  However, it did not
propagate those parameters to the MHS.  The core then uses the default
MPD values...which are incorrect.  I had to manually add the changed
parameters to the MHS.  What worries me is that there are other
parameters that were not propagated to the MHS.

3)  I am using one port of the MPMC3 as an NPI implementation.  My
custom IP is set up with the correct NPI information (Answer Record
24912).  In EDK 9.1 the MPMC2 NPI port connected directly to my custom
IP in the bus configuration tab area.  Now I cannot connect the MPMC3
to my peripheral.  There are 'No connections' available from the MPMC3
side of things.  My custom peripheral looks the same as in EDK 9.1.  I
had to resort to manually connecting the bus interfaces in the ports
tab.

4)  In simulation, the MPMC3 core goes into calibration mode and runs
infinitely.  I have dug into the HDL enough to know that it is stuck
in one state of the physical layer cal routine.  The DDR2 memory seems
to be behaving itself.

5)  I added an older OPB custom peripheral and all the bus IP and
bridging IP needed.  I assigned an address range for the custom IP.
However, libgen would not update the xparameters.h file with that
address info.  I added it manually, but the next time libgen was run,
that info was blown away.

All of the above used to work when I had an EDK 9.1 project and an
MPMC2 core.  I've opened WebCases for most of these items.

Anyone working with EDK 9.2 experiencing these or other issues?

Article: 126701
Subject: Re: CPU design uses too many slices
From: Jürgen Böhm <jboehm@gmx.net>
Date: Thu, 29 Nov 2007 22:33:56 +0100

rickman wrote:
> 
> If you are trying to fit a given device, then you need to use the full
> map and place portions of the tools as well.  Only then will you know
> for sure that your design won't fit.  But what part is on your board?
> You are using about 75% of available resources.  I can't say for sure
> about your design, but ALU logic can be very light if designed
> properly.  So the rest of your design may fit easily in the part.
> 

   I use a Spartan-3 starter kit with a XC3S200. The utilization figures
I gave above refer to this component. Map and Place I already did, too,
but it did not shrink the design significantly.

   Considering the ALU, it seems that it can become quite heavy. The
utilization figures above are for an ALU that lacks some operations
which I really would have liked to implement, especially an r/lshift(x,y)
operation which shifts the 32-bit word x by an amount of y[4:0]. As long
as I kept this in the ALU I had nearly 90% device utilization and, what
is even worse, a maximum CPU clock of only 46 MHz.

> I designed my own 16 bit CPU to have minimal size and it was about 500
> LUTs, IIRC.  Like you, most of the logic was from muxes, so I kept
> them as small as possible, even to the point of eliminating some
> instructions.  Having an extra, unused select line makes them twice as
> large.  BTW, any unused inputs will be optimized out by the tools.  So
> if you don't connect the select input or data inputs, that logic will
> not be generated.

   Here I would like to ask a question: if I write the following

wire[4:0] sel;

case (sel)
 0: case0;
..
 15: case15;
endcase

then obviously one specific select-line (sel[4]) won't be used and,
following your argumentation and common-sense intuition, the size of the
multiplexor should be halved. But will this be also the case with

wire[4:0] sel;

case (sel)
 3: case3;
 7: case7;
 8: case8;
..
 m: casem;
endcase

where 3,7,8,..,m form a more or less arbitrary 16-element set from the
range 0..31 ?

-- 
Jürgen Böhm                                            www.aviduratas.de
"At a time when so many scholars in the world are calculating, is it not
desirable that some, who can, dream ?"  R. Thom

Article: 126702
Subject: Re: FPGA not in boundary scan
From: John_H <newsgroup@johnhandwork.com>
Date: Thu, 29 Nov 2007 13:34:09 -0800 (PST)

On Nov 29, 12:10 pm, Mike <M...@yahoo.co.uk> wrote:
> > And yes, I have set up exactly these connections to the FPGA board which
> > should be fine but there is still just this one CPLD in my boundary scan!
>
> Alright, my mistake. Actually there is another JTAG interface hidden
> between the two boards. With this one it should work.
>
> Sorry for the confusion!

I'm glad you found the second JTAG chain.  If you have any more
troubles, we should be able to help out better this next round.  Happy
Hunting!

Article: 126703
Subject: Re: CPU design uses too many slices
From: Jürgen Böhm <jboehm@gmx.net>
Date: Thu, 29 Nov 2007 22:47:42 +0100

Eric Smith wrote:
> Jürgen Böhm wrote:
>>    Indeed I use RAMB16_S36 for microcode-storage, the final design will
>> probably need four of them, as the microcode is more than 36 bit wide.
> 
> You can use a single BRAM as a 72-bit wide single-ported RAM, if you only
> need half the "depth".  For instance, normally the maximum width of a
> Spartan 3 BRAM would be 512x36, but you can combine the two ports to get
> 256x72.
> 
> Obviously if you need greater depth or dual-port this won't help you.
> 
Right, I need the full depth, but there are two other points coming into
play here:

1. I noticed that using a dual port RAMB instead of a single port one
increases (slightly) the number of used slices, even if only one port of
the RAMB was used. I do not know the reason for this; maybe it is
because some external dual-port logic has to be generated and added.

2. More importantly, the access delay seems to be shorter for a single
port BRAM - I could lift my design from 46 MHz to above the 50 MHz barrier
only by replacing the dual port with a single port BRAM.

Jürgen

-- 
Jürgen Böhm                                            www.aviduratas.de
"At a time when so many scholars in the world are calculating, is it not
desirable that some, who can, dream ?"  R. Thom

Article: 126704
Subject: Re: CPU design uses too many slices
From: rickman <gnuarm@gmail.com>
Date: Thu, 29 Nov 2007 14:33:58 -0800 (PST)

On Nov 29, 4:33 pm, Jürgen Böhm <jbo...@gmx.net> wrote:
> rickman wrote:
>
> > If you are trying to fit a given device, then you need to use the full
> > map and place portions of the tools as well.  Only then will you know
> > for sure that your design won't fit.  But what part is on your board?
> > You are using about 75% of available resources.  I can't say for sure
> > about your design, but ALU logic can be very light if designed
> > properly.  So the rest of your design may fit easily in the part.
>
>    I use a Spartan-3 starter kit with a XC3S200. The utilization figures
> I gave above refer to this component. Map and Place I already did, too,
> but it did not shrink the design significantly.
>
>    Considering the ALU, it seems that it can become quite heavy. The
> utilization figures above are for an ALU that lacks some operations
> which I really would have liked to implement, especially an r/lshift(x,y)
> operation which shifts the 32-bit word x by an amount of y[4:0]. As long
> as I kept this in the ALU I had nearly 90% device utilization and, what
> is even worse, a maximum CPU clock of only 46 MHz.

Yes, an n stage barrel shifter is a very logic intensive function.  It
can easily be larger than all of the other ALU functions combined.  If
you consider what is required, you in essence need to build a mux with
an input for each possible shift on every bit.  If you are shifting in
zeros instead of rotating the other bits back on the other end, you
can cut your mux roughly in half.  But it is still huge.  If you want
to be able to shift both left and right it is doubled again and if you
want to shift right either arithmetic or logical it is larger yet and
if you want to rotate as well it is even larger.
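A common way to keep a 32-bit shifter well below "one 32-input mux per bit" is the logarithmic barrel shifter: five layers of 2:1 muxes, where layer k shifts by 2^k when bit k of the shift amount is set. The sketch below is a Python behavioral model of that structure, not synthesizable HDL and not the poster's ALU; the function name and the op mnemonics are hypothetical. It can serve as a reference model when checking an HDL shifter in simulation. In hardware this is 5 x 32 = 160 two-input muxes, and the shift types differ only in what is fed into the vacated bit positions.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1
STAGES = 5  # log2(WIDTH): one layer of 2:1 muxes per shift-amount bit


def barrel_shift(x: int, amount: int, op: str) -> int:
    """Behavioral model of a logarithmic barrel shifter.

    op is one of the hypothetical mnemonics 'sll', 'srl', 'sra', 'rol', 'ror'.
    Only amount[4:0] is used, like y[4:0] in the ALU discussed above.
    """
    x &= MASK
    amount &= WIDTH - 1
    for stage in range(STAGES):
        dist = 1 << stage
        if not (amount >> stage) & 1:
            continue  # this layer's muxes select the unshifted input
        if op == "sll":
            x = (x << dist) & MASK
        elif op == "srl":
            x >>= dist
        elif op == "sra":
            # Replicate the sign bit into the `dist` vacated upper bits.
            fill = ((MASK << (WIDTH - dist)) & MASK) if x >> (WIDTH - 1) else 0
            x = (x >> dist) | fill
        elif op == "rol":
            x = ((x << dist) | (x >> (WIDTH - dist))) & MASK
        elif op == "ror":
            x = ((x >> dist) | (x << (WIDTH - dist))) & MASK
        else:
            raise ValueError(f"unknown op {op!r}")
    return x


if __name__ == "__main__":
    print(hex(barrel_shift(0x80000000, 4, "sra")))  # 0xf8000000
```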

If you check the details of the slice logic, there should be some
additional gates to allow a pair of 4LUTs to be used to make a 4 input
mux.  I would expect the tools to use this automatically, but I never
trust the tools and I check.  If there is any logic driving the select
inputs rather than being connected to register outputs, that logic can
get mixed in with the mux and make quite an ugly picture.  I don't
know that it is any less efficient, but I can no longer verify how
good it is.  I like to verify the logic my HDL is generating.
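To make the LUT-pair idea concrete, here is a small Python model (an illustration, not vendor code). Each 4-input LUT is a 16-bit truth table; one LUT4 can hold a 2:1 mux (two data inputs plus a select, fourth input unused), and a dedicated slice mux, assumed here to be something like the Spartan-3 MUXF5 and modeled as a plain conditional, picks between two such LUTs to form a 4:1 mux. All helper names are hypothetical, and the LUT-count estimate assumes each 2:1 leaf costs one LUT while the dedicated muxes handle the upper levels.

```python
def make_truth_table(fn) -> int:
    """Pack a 4-input boolean function fn(a, b, c, d) into a 16-bit LUT init value."""
    return sum(fn(i & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1) << i for i in range(16))


def lut4(init: int, a: int, b: int, c: int, d: int) -> int:
    """Evaluate a 4-input LUT: the inputs form the index into the truth table."""
    return (init >> (a | (b << 1) | (c << 2) | (d << 3))) & 1


# 2:1 mux in one LUT: a = data0, b = data1, c = select, d unused.
MUX2_INIT = make_truth_table(lambda a, b, c, d: b if c else a)


def mux4(data, s0: int, s1: int) -> int:
    """4:1 mux from two LUT4s plus one dedicated 2:1 slice mux."""
    low = lut4(MUX2_INIT, data[0], data[1], s0, 0)
    high = lut4(MUX2_INIT, data[2], data[3], s0, 0)
    return high if s1 else low  # the dedicated mux (MUXF5-like), no LUT used


def luts_for_mux(n_inputs: int) -> int:
    """Rough LUT4 count for an n:1 mux when each 2:1 leaf takes one LUT."""
    return n_inputs // 2


if __name__ == "__main__":
    print(hex(MUX2_INIT))  # 0xcaca
    print(32 * luts_for_mux(32), "LUT4s for one 32:1 mux per bit of a 32-bit word")
```

The same estimate suggests why select-input logic matters: if extra decode gets merged into these LUTs, the clean "two data plus one select" packing no longer applies.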

> > I designed my own 16 bit CPU to have minimal size and it was about 500
> > LUTs, IIRC.  Like you, most of the logic was from muxes, so I kept
> > them as small as possible, even to the point of eliminating some
> > instructions.  Having an extra, unused select line makes them twice as
> > large.  BTW, any unused inputs will be optimized out by the tools.  So
> > if you don't connect the select input or data inputs, that logic will
> > not be generated.
>
>    Here I would like to ask a question: if I write the following
>
> wire[4:0] sel;
>
> case (sel)
>  0: case0;
> ..
>  15: case15;
> endcase
>
> then obviously one specific select-line (sel[4]) won't be used and,
> following your argumentation and common-sense intuition, the size of the
> multiplexor should be halved. But will this be also the case with

Unless you explicitly specify don't cares for cases 16 to 31, I don't
know what the tool assumes.  I expect it will add sel[4] as an
enable.  But my point is that it won't use the data inputs case16
through case31, which should cut the number of mux LUTs in half.

In fact (I am very rusty in Verilog, working mostly in VHDL), the
above logic may well generate a latch.  That is what happens with
incompletely specified functions, no?  So sel[4] may end up as an
enable to a latch at the output of the mux.  In VHDL you can't use a
case statement without specifying all possible cases or using a
"when others" choice.  If the others choice is spec'd to output a zero, then
sel[4] will be an enable.  To have sel[4] ignored you would have to
spec the cases from 16 to 31 to give the same outputs as 0 to 15
respectively.

My original statement about the logic being automatically optimized
away would only apply if you designed the mux to have 32 data inputs
and did not drive half of them.  Again, that likely is not legal, but
I don't recall what any particular compiler will do for Verilog.

> wire[4:0] sel;
>
> case (sel)
>  3: case3;
>  7: case7;
>  8: case8;
> ..
>  m: casem;
> endcase
>
> where 3,7,8,..,m form a more or less arbitrary 16-element set from the
> range 0..31 ?

This design will use all of the select inputs even if only half of the
data inputs are used.  So again the mux logic will be reduced, but
each data input will be enabled by a full decode of all the sel
inputs.  I don't think the logic will be halved in this case, however;
it depends on how it is implemented.  Again, without a fully spec'd
case, it may generate a latch on the output.
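One way to see which select bits a case statement really needs is to write out its 32-entry truth table and check whether flipping a single select bit can ever change the output. The Python sketch below does that, treating each data input as a distinct symbol. The odd-value set is a hypothetical stand-in for the elided "3, 7, 8, .., m" set. The model does not capture latch inference or explicit 'x' don't-cares, which could let synthesis do better than shown here.

```python
def depends_on_select_bit(table, bit: int) -> bool:
    """True if flipping only select bit `bit` can change the mux output."""
    return any(table[s] != table[s ^ (1 << bit)] for s in range(len(table)))


SEL_VALUES = range(32)  # every value of a 5-bit select

# Cases 0..15 given, 16..31 driven to zero: sel[4] behaves like an enable.
zero_default = [f"case{s}" if s < 16 else "0" for s in SEL_VALUES]

# Cases 16..31 repeat 0..15: sel[4] is a genuine don't care and can be dropped.
mirrored = [f"case{s & 15}" for s in SEL_VALUES]

# Hypothetical scattered 16-element set (odd values 1, 3, ..., 31), default zero.
scattered = [f"case{s}" if s % 2 else "0" for s in SEL_VALUES]

if __name__ == "__main__":
    for name, table in [("zero_default", zero_default),
                        ("mirrored", mirrored),
                        ("scattered", scattered)]:
        used = [b for b in range(5) if depends_on_select_bit(table, b)]
        print(name, "uses sel bits", used)
```

Only the mirrored table frees sel[4]; the scattered set still depends on all five select bits, which matches the point that each used input needs a full decode.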

Article: 126705
Subject: Re: CPU design uses too many slices
From: rickman <gnuarm@gmail.com>
Date: Thu, 29 Nov 2007 14:38:46 -0800 (PST)

On Nov 29, 4:47 pm, Jürgen Böhm <jbo...@gmx.net> wrote:
> Eric Smith wrote:
> > Jürgen Böhm wrote:
> >>    Indeed I use RAMB16_S36 for microcode-storage, the final design will
> >> probably need four of them, as the microcode is more than 36 bit wide.
>
> > You can use a single BRAM as a 72-bit wide single-ported RAM, if you only
> > need half the "depth".  For instance, normally the maximum width of a
> > Spartan 3 BRAM would be 512x36, but you can combine the two ports to get
> > 256x72.
>
> > Obviously if you need greater depth or dual-port this won't help you.
>
> Right, I need the full depth, but there are two other points coming into
> play here:
>
> 1. I noticed that using a dual port RAMB instead of a single port one
> increases (slightly) the number of used slices, even if only one port of
> the RAMB was used. I do not know the reason for this; maybe it is
> because some external dual-port logic has to be generated and added.
>
> 2. More importantly, the access delay seems to be shorter for a single
> port BRAM - I could lift my design from 46 MHz to above the 50 MHz barrier
> only by replacing the dual port with a single port BRAM.

This sounds odd to me, but obviously your dual port design is
different from the single port design in other ways than just the
ram.  You need two address busses and control signal sets, not to
mention the two data paths.  How did you connect the dual port ram
that was different from the single port ram?  I am pretty sure the
block ram itself fully implements the dual port memory and does not
require any slices to be used.

Article: 126706
Subject: Re: ISE WARNING Xst:647
From: Mark McDougall <markm@vl.com.au>
Date: Fri, 30 Nov 2007 10:45:42 +1100

Tricky wrote:
> Have you checked to see if ISE hasn't optimised the logic connected to
> those signals away (like you said, often caused by an unconnected
> clock)? Use a post-synthesis RTL and Technology view to have a look.
> Quartus has them; I'm sure ISE must have them too.

I'll check out the RTL viewer - thanks!

Regards,

-- 
Mark McDougall, Engineer
Virtual Logic Pty Ltd, <http://www.vl.com.au>
21-25 King St, Rockdale, 2216
Ph: +612-9599-3255 Fax: +612-9599-3266

Article: 126707
Subject: Re: EDK 9.2 Woes
From: John Williams <jwilliams@itee.uq.edu.au>
Date: Fri, 30 Nov 2007 09:47:16 +1000

motty wrote:
> I updated to EDK 9.2 primarily b/c we are building a new project and I
> wanted the latest MPMC3 core.  Here are some things that I have
> noticed while using it:
> 
> 1)  Instance name changes in the EDK GUI do not propagate to the MHS
> file.  So if you change a name, and the GUI updates, the name changes
> back to what it was when the core was pulled in.  You have to manually
> change the MHS for a name change to stick.  No big deal.  There is an
> easy workaround...but annoying.

I'm pretty sure I've renamed peripherals in the 9.2 GUI, and had them
stick.  If you also had the MHS file open, maybe you re-saved the MHS
file after modifying the name via the GUI - this might cause it to go
inconsistent.  If you want to tweak the system both by MHS hand edits
and via the GUI, you need to be careful; the tool can infer only so much
of your intention!

> 
> 5)  I added an older OPB custom peripheral and all the bus IP and
> bridging IP needed.  I assigned an address range for the custom IP.
> However, libgen would not update the xparameters.h file with that
> address info.  I added it manually, but the next time libgen was run,
> that info was blown away.

Did you add a driver entry for the bridge and peripheral in the MSS
file (or assign a driver in the software platform settings dialog)?

Regards,

John

Article: 126708
Subject: Re: CPU design uses too many slices
From: Jürgen Böhm <jboehm@gmx.net>
Date: Fri, 30 Nov 2007 01:30:13 +0100

rickman wrote:
> On Nov 29, 4:47 pm, Jürgen Böhm <jbo...@gmx.net> wrote:
>> Eric Smith wrote:
>>> Jürgen Böhm wrote:
>>
>> 1. I noticed that using a dual port RAMB instead of a single port one
>> increases (slightly) the number of used slices, even if only one port of
>> the RAMB was used. I do not know the reason for this; maybe it is
>> because some external dual-port logic has to be generated and added.
>>
>> 2. More importantly, the access delay seems to be shorter for a single
>> port BRAM - I could lift my design from 46 MHz to above the 50 MHz barrier
>> only by replacing the dual port with a single port BRAM.
> 
> This sounds odd to me, but obviously your dual port design is
> different from the single port design in other ways than just the
> ram.  You need two address busses and control signal sets, not to
> mention the two data paths.  How did you connect the dual port ram
> that was different from the single port ram?  I am pretty sure the
> block ram itself fully implements the dual port memory and does not
> require any slices to be used.
> 

Actually I just used "dummy signals" at the unused port B ADDR, DI,
DO, ... signals. That is, first I wrote (out of laziness, I just did a
copy & paste) something like

RAMB16_S36_S36  micro_store ( ... ,.DOB(dummydob), ... );

and used only port A (dummydob is a signal left undeclared).

Secondly I wrote explicitly

RAMB16_S36 micro_store (..)

and got results with faster timing and fewer slices used.


- Jürgen

-- 
Jürgen Böhm                                            www.aviduratas.de
"At a time when so many scholars in the world are calculating, is it not
desirable that some, who can, dream ?"  R. Thom

Article: 126709
Subject: Re: lossless compression in hardware: what to do in case of uncompressibility?
From: Jim Granville <no.spam@designtools.maps.co.nz>
Date: Fri, 30 Nov 2007 15:09:49 +1300

Denkedran Joe wrote:
> Hi all,
> 
> I'm working on a hardware implementation (FPGA) of a lossless compression 
> algorithm for a real-time application. The data will be fed in to the 
> system, will then be compressed on-the-fly and then transmitted further.
> 
> The average compression ratio is 3:1, so I'm gonna use some FIFOs of a 
> certain size and start reading data out of the FIFO after a fixed 
> startup-time. The readout rate will be 1/3 of the input data rate. The size
> of the FIFOs is determined by the experimental variance of the mean 
> compression ratio. Nonetheless there are possible circumstances in which no 
> compression can be achieved. Since the overall system does not support 
> variable bitrates a faster transmission is no solution here.
> 
> So my idea was to put the question to all of you what to do in case of 
> uncompressibility? Any ideas?

If you have cast this in concrete : "The readout rate will be 1/3 of the 
input data rate" and you hit any compression case above 33.33%, then 
you are dead in the water : something HAS to give - either discard data, 
or take longer.
You can tolerate 'errant peaks' in the data compression,
by using larger buffers, but the _average_ must remain under 33.33% over 
the buffer size.
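The constraint above can be checked with simple bookkeeping. The Python sketch below tracks FIFO occupancy block by block under hypothetical numbers: each input block is 300 words, the compressor writes some number of words per block, and the fixed-rate link removes 100 words per block time (one third of the input rate). It assumes the link sends filler when the FIFO runs dry. Any run of blocks whose compressed size averages above 100 words makes occupancy climb by the excess, so the buffer must cover the worst excess you intend to survive, and recovery needs better than 3:1 afterwards.

```python
def fifo_occupancy(compressed_words_per_block, drain_words_per_block):
    """Return the FIFO fill level after each block time.

    Each entry of compressed_words_per_block is what the compressor wrote
    during one block time; the link drains a fixed number of words per block.
    Occupancy is clamped at zero, assuming the link sends filler when empty.
    """
    occupancy = 0
    history = []
    for produced in compressed_words_per_block:
        occupancy = max(0, occupancy + produced - drain_words_per_block)
        history.append(occupancy)
    return history


if __name__ == "__main__":
    BLOCK_IN = 300            # hypothetical input words per block
    DRAIN = BLOCK_IN // 3     # readout at 1/3 of the input rate
    print(fifo_occupancy([100] * 4, DRAIN))          # exactly 3:1: [0, 0, 0, 0]
    print(fifo_occupancy([300, 50, 50, 50], DRAIN))  # one incompressible block: [200, 150, 100, 50]
    print(fifo_occupancy([300] * 3, DRAIN))          # sustained incompressible data: [200, 400, 600]
```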

-jg

Article: 126710
Subject: Re: CPU design uses too many slices
From: rickman <gnuarm@gmail.com>
Date: Thu, 29 Nov 2007 19:11:10 -0800 (PST)

On Nov 29, 7:30 pm, Jürgen Böhm <jbo...@gmx.net> wrote:
> rickman wrote:
> > On Nov 29, 4:47 pm, Jürgen Böhm <jbo...@gmx.net> wrote:
> >> Eric Smith wrote:
> >>> Jürgen Böhm wrote:
>
> >> 1. I noticed that using a dual port RAMB instead of a single port one
> >> increases (slightly) the number of used slices, even if only one port of
> >> the RAMB was used. I do not know the reason for this; maybe it is
> >> because some external dual-port logic has to be generated and added.
>
> >> 2. More importantly, the access delay seems to be shorter for a single
> >> port BRAM - I could lift my design from 46 MHz to above the 50 MHz barrier
> >> only by replacing the dual port with a single port BRAM.
>
> > This sounds odd to me, but obviously your dual port design is
> > different from the single port design in other ways than just the
> > ram.  You need two address busses and control signal sets, not to
> > mention the two data paths.  How did you connect the dual port ram
> > that was different from the single port ram?  I am pretty sure the
> > block ram itself fully implements the dual port memory and does not
> > require any slices to be used.
>
> Actually I just used "dummy signals" at the unused port B ADDR, DI,
> DO, ... signals. That is, first I wrote (out of laziness, I just did a
> copy & paste) something like
>
> RAMB16_S36_S36  micro_store ( ... ,.DOB(dummydob), ... );
>
> and used only port A (dummydob is a signal left undeclared).
>
> Secondly I wrote explicitly
>
> RAMB16_S36 micro_store (..)
>
> and got results with faster timing and fewer slices used.

I don't know the impact of dummydob.  I would expect it to use LUTs to
source fixed-value signals since you are instantiating a fixed ram
block, but I don't really know what the tools would do with that.  If
it doesn't provide signal drivers it would have to minimize the dual
port rams to single port rams since that is all that is being used.  I
don't think the ram blocks can ignore inputs, but again, I don't know
for sure.  One of the Xilinx guys could tell you for sure.

Article: 126711
Subject: Re: Gnd plane coupling with DDR routing from FPGA <-> DDR?
From: John Larkin <jjlarkin@highNOTlandTHIStechnologyPART.com>
Date: Thu, 29 Nov 2007 19:39:39 -0800

On Thu, 29 Nov 2007 08:57:26 -0000, "Nial Stewart"
<nial*REMOVE_THIS*@nialstewartdevelopments.co.uk> wrote:

>> I disagree, as does most of the research done into the subject. The use of buried capacitance, 
>> typically by having adjacent power and ground planes separated by as small a distance as possible 
>> (2 thou is normal), has been shown to be very favorable when compared to discrete decoupling caps 
>> because although the capacitance is much lower the inductance is very much lower so the overall 
>> impedance is significantly lower. There is an article about it here:
>> http://www.ddmconsulting.com/Design_Guides/bcguide.pdf
>
>David,
>
>I too am skeptical about the amount of decoupling that close GND/PWR plane
>coupling can provide.
>
>In that article above they quote 560pF/sq inch. Most BGAs are smaller
>than that.

It's not just the footprint under the bga that helps, it's the entire
plane.
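The 560 pF per square inch figure is in the range predicted by the parallel-plate formula C = ε0·εr·A/d, and the formula also shows why the whole plane counts: capacitance scales with total area, not with the BGA footprint. The quick Python check below assumes a 2 mil dielectric with εr of about 4.3 (a typical FR-4 value; the laminate in the article may differ, and εr near 5 would reproduce 560 pF) and a hypothetical 6 in x 4 in plane. Fringing fields are ignored.

```python
EPS0 = 8.854e-12       # permittivity of free space, F/m
M_PER_INCH = 0.0254


def plane_capacitance(area_sq_in: float, spacing_mil: float, eps_r: float) -> float:
    """Parallel-plate estimate (no fringing) of power/ground plane capacitance, in farads."""
    area_m2 = area_sq_in * M_PER_INCH ** 2
    spacing_m = spacing_mil * 1e-3 * M_PER_INCH
    return EPS0 * eps_r * area_m2 / spacing_m


if __name__ == "__main__":
    per_sq_in = plane_capacitance(1.0, 2.0, 4.3)
    print(f"{per_sq_in * 1e12:.0f} pF per square inch")          # about 484 pF
    whole_plane = plane_capacitance(6.0 * 4.0, 2.0, 4.3)
    print(f"{whole_plane * 1e9:.1f} nF for a 6 x 4 inch plane")   # about 11.6 nF
```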

I sometimes add a few SMA connector footprints to pc boards, so I can
TDR the power planes relative to the ground plane. It's amazing. A
typical power plane, on an unloaded board, looks like a perfect
capacitor to 20 GHz, with no evidence of reflections or edge effects.
Then if you start adding bypass caps *anywhere*, it just looks like a
bigger perfect capacitor.

I know one guy who doesn't use bypass caps at all, and his stuff works
too.

John

Article: 126712
Subject: Re: Hand solder that FPGA on your prototype
From: Chris Maryan <kmaryan@gmail.com>
Date: Thu, 29 Nov 2007 19:49:38 -0800 (PST)

On Nov 29, 9:43 am, "Tony Burch" <t...@burched.com.au> wrote:
> Hi all,
>
> I've just finished constructing a new site that has web videos of soldering
> techniques, including how to hand solder quad flat packs. Very handy for
> prototyping.
>
> You can go and get a free membership there, which lets you see 5 videos on
> how to hand solder quad flat packs. Watch me hand solder a Spartan 2E in a
> PQ208 package onto a board :) http://supersolderingsecrets.com/
>
> The upgraded membership covers lots of other soldering techniques such as
> "toaster-oven soldering", "frying pan solder pot soldering", etc., but if
> you just wanted to know how to hand solder that FPGA on your prototype, then
> the free membership reveals all:)
>
> Cheers,
>
> Tony Burch http://supersolderingsecrets.com/

<rain_on_parade>
www.sparkfun.com has lots of good soldering info, including surface
mount, toasters, hot plates, etc. And the best part is that you don't
have to sign up for anything there.
</rain_on_parade>

Article: 126713
Subject: Re: ISE WARNING Xst:647
From: Mark McDougall <markm@vl.com.au>
Date: Fri, 30 Nov 2007 15:29:29 +1100

Tricky wrote:

> Have you checked to see if ISE hasn't optimised the logic connected to
> those signals away (like you said, often caused by an unconnected
> clock)? Use a post-synthesis RTL and Technology view to have a look.
> Quartus has them; I'm sure ISE must have them too.

OK, now I am officially insane!

I have 2 projects with a lot of common modules which I have been porting
to Xilinx. The 1st has no video output, the 2nd works perfectly.

I was looking at the RTL viewer for the video controller (common to both)
for the 1st project with no video. It shows the X pixel output as being
tied to GND, and NO y pixel output at all! I can't explain why it has
decided to do this, but it would explain why there is no video.

So I go to the working project, and view the RTL for the same controller.
It TOO shows X pixel output tied to GND and NO y pixel output!!!! Let me
remind you, this project works perfectly!

So there you have it, I am certifiably insane! Either I have no clue what
I am looking at, or Xilinx RTL viewer is complete and utter garbage!?! I'm
willing to accept either hypothesis as being true at this point...

Regards,

-- 
Mark McDougall, Engineer
Virtual Logic Pty Ltd, <http://www.vl.com.au>
21-25 King St, Rockdale, 2216
Ph: +612-9599-3255 Fax: +612-9599-3266

Article: 126714
Subject: Re: EDK 9.2 Woes
From: motty <mottoblatto@yahoo.com>
Date: Thu, 29 Nov 2007 20:33:20 -0800 (PST)

Thanks John,

I definitely tried (and just re-verified) changing an instance name
with the MHS file closed and it still did not stick.  I am doing this
from the ports tab.  I may try from the bus tab.

The xparameters.h file was my fault.  I had a typo in the OPB
peripheral address that placed it out of the plb-to-opb bridge's
address range. : )

By the way, the NPI bus interface thing was solved by a Xilinx app
engineer.  There is an apparently undocumented change.  The NPI bus
naming convention in the custom peripheral's MPD has to change from
'NPI' to 'XIL_NPI'.  After making these changes, the NPI bus
connections work as they should.  I am hoping this may solve the
simulation issue as well and am about to try that.

I am VERY impressed with the response time from Xilinx regarding these
cases.  I try not to open them without trying a lot of things myself
or coming here and searching.  But with a new EDK release, I have
learned from past experience, it is sometimes easier and faster to
open a WebCase.  It would have taken me a long time (if ever) to find
the NPI to XIL_NPI thing!

Article: 126715
Subject: Pipelining of FPGA code
From: dash82 <dhavalrules@gmail.com>
Date: Thu, 29 Nov 2007 20:55:03 -0800 (PST)

Hi,

I am trying to understand what pipelined design/architecture for
FPGAs means.

I went through documents which list all the benefits of using
pipelining for FPGAs. But none of them explicitly explained how a
pipelined architecture is better (efficiency-wise) than a
non-pipelined architecture. I wouldn't generally ask this kind of
question in a forum. But going through books on Verilog (Samir
Palnitkar's) and searching in Google didn't help me.

It would help me if someone could point to some article / book /
example (preferably a Verilog based one) which explains pipelining in
depth.

I did post in the Verilog group, but from the response, I thought that
the problem is more FPGA focussed.

Thanks.

Shah.

Article: 126716
Subject: Re: Pipelining of FPGA code
From: "KJ" <kkjennings@sbcglobal.net>
Date: Fri, 30 Nov 2007 05:10:46 GMT


"dash82" <dhavalrules@gmail.com> wrote in message 
news:d85591fb-29ef-42c3-b8e1-7fc4d37a0f78@e67g2000hsc.googlegroups.com...
> Hi,
>
> I am trying to understand what pipelined design/architecture for
> FPGAs means.
>
> I went through documents which list all the benefits of using
> pipelining for FPGAs. But none of them explicitly explained how a
> pipelined architecture is better (efficiency-wise) than a
> non-pipelined architecture. I wouldn't generally ask this kind of
> question in a forum. But going through books on Verilog (Samir
> Palnitkar's) and searching in Google didn't help me.
>
> It would help me if someone could point to some article / book /
> example (preferably a Verilog based one) which explains pipelining in
> depth.

Pipelining means simply to take a clock cycle (or more) to produce a result. 
What this essentially does is to spread out a computation so that part of it 
gets done in one clock cycle, another part gets done in some other clock 
cycle.  The reason you would do such a seemingly counterproductive thing is 
because sometimes the time it takes to do the entire calculation would mean 
that the system clock would have to slow down.  By breaking the problem into 
smaller chunks, each chunk can be done faster.
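A clock-by-clock Python model may make this concrete. It is a sketch with hypothetical names and delays: the computation a*b + c is split into a multiply stage and an add stage, each followed by a register. On every simulated clock edge each register loads from the stage before it, so a new input can enter on every clock and each result is ready two register stages after its inputs. The second function shows the payoff KJ describes: the clock period only has to cover the slowest stage rather than the whole chain (register setup and clock-to-out overhead are ignored).

```python
def pipelined_mac(inputs):
    """Two-stage pipeline computing a*b + c, modeled one clock edge at a time.

    Returns the output register value after each edge; None means the
    register does not hold a valid result yet.
    """
    stage1 = None  # register after the multiplier: (a*b, c)
    stage2 = None  # register after the adder: a*b + c
    outputs = []
    for item in list(inputs) + [None]:  # one extra clock drains the last result
        # Update the later register first, from the *old* stage1 value,
        # mimicking registers that all load on the same clock edge.
        stage2 = None if stage1 is None else stage1[0] + stage1[1]
        stage1 = None if item is None else (item[0] * item[1], item[2])
        outputs.append(stage2)
    return outputs


def max_clock_mhz(stage_delays_ns):
    """Fastest clock the slowest stage allows (register overhead ignored)."""
    return 1000.0 / max(stage_delays_ns)


if __name__ == "__main__":
    print(pipelined_mac([(2, 3, 1), (4, 5, 0), (1, 1, 1)]))  # [None, 7, 20, 2]
    print(max_clock_mhz([12.0 + 8.0]))  # one combinational stage: 50.0
    print(max_clock_mhz([12.0, 8.0]))   # split into two stages: about 83.3
```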

>
> I did post in the Verilog group, but from the response, I thought that
> the problem is more FPGA focussed.
>
No, pipelining has nothing to do with VHDL, Verilog, FPGAs or ASICs....it is 
a basic design technique applicable to any digital design.

KJ

Article: 126717
Subject: Re: Cascaded DCMs with variable phase shift (Xilinx)
From: chesi <cesteban75@gmail.com>
Date: Thu, 29 Nov 2007 23:12:53 -0800 (PST)

On 29 nov, 16:46, austin <aus...@xilinx.com> wrote:
> chesi,
> I am looking into this now.
> My concern is that feeding a phase shifted clock out of one DCM into
> another that is set to multiply by ten (10) may not work (I never
> simulated that case).  Not sure what the second DCM will do while the
> input clock is changing phase (it might lose lock).
> I am presuming that your problem is not with the second DCM, but with
> the first one?  It does not perform the negative phase shifts between 20
> and 28 counts?
> At 27 MHz, that is a 37 ns period, so each count is 27,000/256 =~ 105
> ps.  Given that there is a finite number of delay taps, I believe the
> lower frequency limit is 19 MHz, there should be no reason you are
> running out of taps (and the status register would indicate overflow or
> underflow, and you would lose lock).
> I will discuss this, and post again.
> Austin

Hi,
I've discovered something about this. The problem appears only when
the top-left DCM has to carry out the variable phase shift. If I avoid
this situation using LOC constraints, everything seems to work OK. For
the moment I have just one board to try. As soon as I can try with
other ones, I'll try to find out whether this issue is repeatable.

Regards,
César

Article: 126718
Subject: Re: Asynchronous FIFO and almost empty - bug?
From: "heinerlitz@googlemail.com" <heinerlitz@googlemail.com>
Date: Fri, 30 Nov 2007 00:44:47 -0800 (PST)

Hi Peter,

we are using Virtex4 FX devices. The FIFO runs at 100 MHz and was
generated with coregen 3.5. We just found out that there have been
several modifications regarding the almost empty signal in coregen
4.2. We'll try that out first.

Point is we NEED an almost empty signal at threshold == 1 which we can
rely on 100%. If I understand you correctly this is not given with the
Xilinx sync FIFOs so we would have to build our own, right?

regards, Heiner

On Nov 29, 5:35 pm, Peter Alfke <al...@sbcglobal.net> wrote:
> On Nov 29, 3:46 am, "heinerl...@googlemail.com"
>
> <heinerl...@googlemail.com> wrote:
> > Hi,
>
> > we are using an asynchronous FIFO to bridge two clock domains. Both
> > domains have "the same" clock speed but different clock oscillators.
>
> > We shift data phits into the FIFO which always form a data packet. Within
> > a packet, data is shifted in continuously without a break.
> > Breaks (no shift in) are only allowed between packets.
> > On the output side of the FIFO we need a steady data stream during a
> > data packet. The packet may not be interrupted. As the input side may
> > be slower, we start shifting out data only once at least two data phits
> > are in the FIFO. As the 2 clocks have almost the same frequency this
> > guarantees that we never have a buffer underflow.
>
> > The problem we found is that the almost empty flag is only asserted if
> > the FIFO is being emptied and not if it is being filled. So if the
> > FIFO was empty and we get a shift-in, almost empty is not asserted
> > although we set the threshold to one. Is this a bug?
>
> Which FPGA family, which type of FIFO controller, and also what clock
> rate?
> Peter Alfke, Xilinx Applications
>
> > We tried to solve that problem by generating a delayed-empty signal at
> > the output, which guarantees that if the FIFO was empty and then
> > receives a shift-in, we still wait another cycle so we get another
> > shift-in to avoid underflow.
>
> > This solution, however, does not solve the problem if the FIFO had
> > exactly one entry when we start to shift out a packet. In this case
> > neither delayed-empty nor almost empty is asserted, hence we get an
> > underflow.
>
> > Why isn't the almost empty signal asserted every time there is a
> > single packet in the FIFO? Ideas?
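The behaviour being asked for, a flag that reflects the current fill level no matter whether the FIFO reached that level by filling or by draining, together with the "start a packet only with at least two phits buffered" rule, can be sketched as a single-clock behavioral model in Python. The class and function names are hypothetical, "almost empty" is taken here to mean count <= threshold (so an empty FIFO also counts), and the model deliberately leaves out the clock-domain-crossing latency a real two-clock FIFO adds:

```python
from collections import deque


class LevelFlagFifo:
    """Single-clock behavioral FIFO model; ignores synchronizer latency."""

    def __init__(self, depth: int, ae_threshold: int = 1):
        self.depth = depth
        self.ae_threshold = ae_threshold
        self.q = deque()

    def push(self, phit) -> None:
        if len(self.q) >= self.depth:
            raise OverflowError("FIFO full")
        self.q.append(phit)

    def pop(self):
        if not self.q:
            raise IndexError("FIFO underflow")
        return self.q.popleft()

    @property
    def count(self) -> int:
        return len(self.q)

    @property
    def almost_empty(self) -> bool:
        # Derived from the current level only, so it is asserted whether the
        # FIFO got here by being filled or by being emptied.
        return self.count <= self.ae_threshold


def may_start_packet(fifo: LevelFlagFifo, min_entries: int = 2) -> bool:
    """Begin shifting a packet out only once min_entries phits are buffered."""
    return fifo.count >= min_entries


if __name__ == "__main__":
    f = LevelFlagFifo(depth=16)
    f.push("p0")                                  # filling from empty
    print(f.almost_empty, may_start_packet(f))    # prints: True False
    f.push("p1")
    print(f.almost_empty, may_start_packet(f))    # prints: False True
```

In hardware the same idea means comparing an occupancy count against the threshold every cycle, rather than setting and clearing the flag on read or write events.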

Article: 126719
Subject: Re: Gnd plane coupling with DDR routing from FPGA <-> DDR?
From: "Symon" <symon_brewer@hotmail.com>
Date: Fri, 30 Nov 2007 08:47:40 -0000
Links: << >> << T >> << A >>

"John Larkin" <jjlarkin@highNOTlandTHIStechnologyPART.com> wrote in message 
news:061vk3hncpn2man9udarrgt80b2lasfl97@4ax.com...
>
> I sometimes add a few SMA connector footprints to pc boards, so I can
> TDR the power planes relative to the ground plane. It's amazing. A
> typical power plane, on an unloaded board, looks like a perfect
> capacitor to 20 GHz, with no evidence of reflections or edge effects.
> Then if you start adding bypass caps *anywhere*, it just looks like a
> bigger perfect capacitor.
>
Hi John,
As we've discussed before, I think that would be a really useful experiment 
if this thread were about microwave engineering (say). Sadly, we're talking 
about FPGA PDSs. FPGAs don't have SMA connections to hook up their power 
supplies to a board's planes, so, although interesting, I think the TDR 
experiment results aren't applicable in this case. It doesn't matter how 
amazing the capacitance quality is, you can't wire it to the silicon.
I still say, ditch the power planes, put the bypass caps (maybe X2Y types) 
on power puddles near the device, that'll work great. This saves planes, 
which you can use as ground planes. This topology allows the designer to 
filter the supplies near the FPGA to isolate it. If you believe that your 
high speed signals work better with a return path, and I know some folk on 
CAF apparently don't, then when these signals swap reference from one ground 
plane to another you can just use a ground via for the return path as 
opposed to the situation where a signal switches from ground referred to 
power plane referred, which costs a bypass cap and two vias and has much 
more inductance.
One thing from John's post I do find intriguing is his mate who doesn't use 
any bypass caps. I can quite believe that his stuff works just fine. I still 
think it's easier to get PDS right than wrong. Most designs will 'work'. 
That's why this subject is perfect for a usenet religious war!
HTH, Syms.

p.s. From experience, I know John likes to read links I post so he can offer 
his reasoned critique. I saw the comment about 'bypass caps *anywhere*' in 
his post, and so I eagerly await John's response to this.
http://www.x2y.com/bypass/method/does_position_matter.pdf
:-)

Article: 126720
Subject: Re: Gnd plane coupling with DDR routing from FPGA <-> DDR?
From: "Symon" <symon_brewer@hotmail.com>
Date: Fri, 30 Nov 2007 09:03:12 -0000
Links: << >> << T >> << A >>

"Nial Stewart" <nial*REMOVE_THIS*@nialstewartdevelopments.co.uk> wrote in 
message news:5r7druF135mm3U1@mid.individual.net...
>
> I think you've said before that you've got away with routing power in like
> this with no problems.
>
Hi Nial,
I just re-read this bit, and would just like to clarify that it wasn't so 
much 'got away with' as 'had success with'! :-) It works out cheaper and 
performs better in terms of EMI, routability, SI over the whole board.

One more point, in the six layer stackup I suggested, I would make the 
centre core (between layers 3 and 4) thick, so that the signal layers 
1,3,4,6 are close to their reference planes. There's some stuff in this link 
that explains why better than I can. (It's applicable for regular caps as 
well as X2Y ones!)
http://www.x2y.com/bypass/mount/get_the_most.pdf
Cheers, Syms.

Article: 126721
Subject: Re: Gnd plane coupling with DDR routing from FPGA <-> DDR?
From: "Nial Stewart" <nial*REMOVE_THIS*@nialstewartdevelopments.co.uk>
Date: Fri, 30 Nov 2007 09:50:09 -0000
Links: << >> << T >> << A >>

> I know one guy who doesn't use bypass caps at all, and his stuff works
> too.


For FPGA designs with multiple synchronous fast IOs?

Nial

Article: 126722
Subject: Re: ISE WARNING Xst:647
From: Tricky <Trickyhead@gmail.com>
Date: Fri, 30 Nov 2007 01:58:18 -0800 (PST)
Links: << >> << T >> << A >>

Have you assigned the outputs to actual output pins? Are the outputs at the
top level? If you haven't assigned pins, it should, by default, assign
the outputs to the next available pins. Did it run out of pins to
assign the signals to?

If it's not there in the RTL view, it sounds like there's been some serious
logic removal - go back through the synthesis logs; they should warn you
that logic has been removed. The removal of one register could have
caused all of this to happen.

Article: 126723
Subject: Re: Gnd plane coupling with DDR routing from FPGA <-> DDR?
From: "Nial Stewart" <nial*REMOVE_THIS*@nialstewartdevelopments.co.uk>
Date: Fri, 30 Nov 2007 09:58:37 -0000
Links: << >> << T >> << A >>

> I still say, ditch the power planes, put the bypass caps (maybe X2Y types) on power puddles near 
> the device, that'll work great. This saves planes, which you can use as ground planes. This 
> topology allows the designer to filter the supplies near the FPGA to isolate it. If you believe 
> that your high speed signals work better with a return path, and I know some folk on CAF 
> apparently don't, then when these signals swap reference from one ground plane to another you can 
> just use a ground via for the return path as opposed to the situation where a signal switches from 
> ground referred to power plane referred, which costs a bypass cap and two vias and has much more 
> inductance.

Symon,

You've posted useful snapshots of board designs before (i.e. to illustrate the
usefulness of micro-vias).

Any chance of a screenshot of a board you've done using this technique to
illustrate things better?

We have an interrupted power plane which is _really_ well decoupled at the
FPGA and DDR, so our current thinking is to use routing on layers 1, 3 and 6.


From another post...

> One more point, in the six layer stackup I suggested, I would make the centre core (between layers 
> 3 and 4) thick, so that the signal layers 1,3,4,6 are close to their reference planes. There's 
> some stuff in this link that explains why better than I can. (It's applicable for regular caps as 
> well as X2Y ones!)

Hmm, more expense specifying stack build up?

I think a 'normal' 6-layer stack uses relatively thick cores with thinner
prepreg, so layers 3 and 4 end up significantly nearer the planes at 2 and
5 than the top and bottom layers.

Bearing this in mind, if we're using a normal stack perhaps we should be
using the top layer and layers 3 and 4 to route out to the DDR.

Or again I'm worrying about things too much?

> Most designs will 'work'.

Hopefully.

> That's why this subject is perfect for a usenet religious war!

[panto] Oh no it's not [/panto].


Nial.

Article: 126724
Subject: Re: Hand solder that FPGA on your prototype
From: "Tony Burch" <tony@burched.com.au>
Date: Fri, 30 Nov 2007 22:24:58 +1100
Links: << >> << T >> << A >>

"Chris Maryan" <kmaryan@gmail.com> wrote in message 
news:c02e0d7f-8fbf-48bc-a187-a0d5f368a2ad@a35g2000prf.googlegroups.com...
> On Nov 29, 9:43 am, "Tony Burch" <t...@burched.com.au> wrote:
>> Hi all,
>>
>> I've just finished constructing a new site that has web videos of
>> soldering techniques, including how to hand solder quad flat packs.
>> Very handy for prototyping.
>>
>> You can go and get a free membership there, which lets you see 5 videos
>> on how to hand solder quad flat packs. Watch me hand solder a Spartan 2E
>> in a PQ208 package onto a board :) http://supersolderingsecrets.com/
>>
>> The upgraded membership covers lots of other soldering techniques such as
>> "toaster-oven soldering", "frying pan solder pot soldering", etc., but if
>> you just wanted to know how to hand solder that FPGA on your prototype,
>> then the free membership reveals all :)
>>
>> Cheers,
>>
>> Tony Burch http://supersolderingsecrets.com/
>
> <rain_on_parade>
> www.sparkfun.com has lots of good soldering info, including surface
> mount, toasters, hot plates, etc. And the best part is that you don't
> have to sign up for anything there.
> </rain_on_parade>

Hi Chris,

Don't worry about the rain, it's good for the garden:)

Yes, I agree, the Sparkfun stuff is excellent.

But I don't think anyone gets all of their info from just one place.

You should check out the http://SuperSolderingSecrets.com videos too. If you 
are interested in soldering, there is some real value there with a unique 
perspective & original material from my personal experiences.

I only ask for a first name & email address when joining, and then you can 
unsubscribe from the email list immediately if you want to - you get to keep 
the free lifetime membership so that you can watch the videos any time.

I fully respect members & email privacy.

Kind regards,

Tony Burch
http://SuperSolderingSecrets.com
