Messages from 138175

Article: 138175
Subject: Re: Experiencing problems when moving an FPGA-based implementation to an ASIC
From: rickman <gnuarm@gmail.com>
Date: Sun, 8 Feb 2009 15:22:16 -0800 (PST)

On Feb 6, 5:20 am, mjunaidel...@gmail.com wrote:
> I have got Verilog code for a design that has been implemented on
> Virtex-4. My target is to take this design to an ASIC implementation. The
> issues that I am having right now are related to porting an FPGA-based
> design to an ASIC. When I read this design into Cadence synthesis
> tools, I get a few unresolved references for IDELAYCTRL, ODDR, BUFG,
> BUFR, BUFIO, BUFGMUX_VIRTEX4. These modules are placed in the design
> as follows:
>
> IDELAYCTRL      - connected to DCM
> ODDR            - connected to DCM
> BUFG            - connected to DCM
> BUFR            - connected to SERDES
> BUFIO           - connected to SERDES
> BUFGMUX_VIRTEX4 - connected to DCM
>
> It seems that these modules are Xilinx-specific and are automatically
> instantiated in the Verilog code when the DCM and SERDES components were
> configured using Xilinx tools. Can you please guide me on what
> I should do about these components, as there is no code for these
> modules in my code directory? Has anyone else experienced this sort of
> issue before when moving an FPGA-based implementation to an ASIC?

I don't remember the marketing name they use for it, but Xilinx has a
line of ASICs that you can port your FPGA design to which will have
all the same functional units.  I expect this will not have quite as
low a unit price as a full ASIC, but if your design meets their
requirements, they certify that the ASIC will work to the test
vectors.  Certainly a good thing for someone without a lot of
experience in designing ASICs.

Rick

Article: 138176
Subject: Re: Is this phase accumulator trick well-known???
From: nico@puntnl.niks (Nico Coesel)
Date: Sun, 08 Feb 2009 23:23:00 GMT

Jonathan Bromley <jonathan.bromley@MYCOMPANY.com> wrote:

>hi comp.arch.fpga,
>(accidentally posted to comp.lang.vhdl 
>a few moments ago- sorry)
>
>The question - repeated after the explanation - 
>is: here's what I think is a nifty trick; has
>anyone seen it, or been aware of it, before?
>I can't believe it's really new.
>
>I have been messing around with baud rate generators
>and suchlike - creating a pulse that's active for
>one clock period at some required repetition rate -
>and wanted to try a phase accumulator technique 
>instead of a simple divider.  That makes it far 
>easier to specify the frequency - it's simply the
>phase-delta input value - and easily allows for
>non-integral divide ratios, at the cost of one
>master clock period of jitter.
>
>The phase-accumulator produces pulses with a
>repetition rate of 
>  Fc * M / N 
>where Fc is the master clock, M is the phase delta
>and N is the counter's modulus.  However, to get
>the huge convenience of specifying M as the required
>frequency, I must make N be equal to the frequency
>of Fc, and this is unlikely to be an exact power of 2.
>So the phase accumulator works like this:
>
>  on every clock pulse...
>    if (acc < 0) then
>      acc := acc + N;
>      output_pulse <= '1';
>    else
>      output_pulse <= '0';
>    end if;
>    acc := acc - M;  -- unconditionally

What if you simply add N-M to the accumulator?

  on every clock pulse...
    if (acc < 0) then
      acc := acc + (N -M);
      output_pulse <= '1';
    else
      output_pulse <= '0';
      acc := acc - M;
    end if;
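
As a quick sanity check, here is a small Python model that steps both
forms side by side.  It is only a sketch, not anyone's RTL: the function
names are made up for illustration, and it assumes 0 < M < N with the
accumulator starting at 0.

```python
def step_original(acc, n, m):
    """Quoted form: add N on a pulse, then subtract M unconditionally."""
    pulse = acc < 0
    if pulse:
        acc += n
    return acc - m, pulse


def step_merged(acc, n, m):
    """Merged form: add (N - M) on a pulse, otherwise subtract M."""
    if acc < 0:
        return acc + (n - m), True
    return acc - m, False


def count_pulses(step, n, m, clocks, acc=0):
    """Run `clocks` master-clock ticks and count the output pulses."""
    pulses = 0
    for _ in range(clocks):
        acc, pulse = step(acc, n, m)
        pulses += pulse
    return pulses
```

With N = 1000 and M = 96, both forms give 96 pulses in 1000 clocks and
walk through the same accumulator values, so merging the two additions
changes the structure but not the behaviour.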


-- 
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
                     "If it doesn't fit, use a bigger hammer!"
--------------------------------------------------------------

Article: 138177
Subject: Re: Is this phase accumulator trick well-known???
From: Glen Herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Sun, 08 Feb 2009 17:37:18 -0700

Jonathan Bromley wrote:
(snip)

> Yes, exactly.  But if "869" is a variable (from an
> input port) so that the rate is configurable, then
> you have

>   if (...overflow...) then
>     modulo_adder <= modulo_adder + N;
>   else
>     modulo_adder <= modulo_adder + N - 1600;

> The last line requires TWO adders, in addition to the
> multiplexer created by the IF.  This causes a significant
> performance hit.  That's what I was trying to fix. 

The adders will run in parallel, so there should be no
performance difference.

-- glen

Article: 138178
Subject: Re: Is this phase accumulator trick well-known???
From: Muzaffer Kal <kal@dspia.com>
Date: Sun, 08 Feb 2009 16:48:33 -0800

On Sun, 08 Feb 2009 17:37:18 -0700, Glen Herrmannsfeldt
<gah@ugcs.caltech.edu> wrote:

>Jonathan Bromley wrote:
>(snip)
>
>> Yes, exactly.  But if "869" is a variable (from an
>> input port) so that the rate is configurable, then
>> you have
>
>>   if (...overflow...) then
>>     modulo_adder <= modulo_adder + N;
>>   else
>>     modulo_adder <= modulo_adder + N - 1600;
>
>> The last line requires TWO adders, in addition to the
>> multiplexer created by the IF.  This causes a significant
>> performance hit.  That's what I was trying to fix. 
>
>The adders will run in parallel, so there should be no
>performance difference.

I don't think they can. The first line needs a 2 input adder and the
second line needs two 2 input adders in a tree or a 3 input adder. No
matter how you look at it, that path will be slower than a 2 input
adder; whether slow enough to matter is a different issue.
More precisely, there are two paths:
1) modulo_adder.Q to modulo_adder.D, and
2) N.Q to modulo_adder.D

The second path here will have some extra gates on it because before
N.Q can be presented to the input of the second (m_a+N) adder, it has
to go through one more adder to calculate N.Q -1600, which is the
extra delay.
-- Muzaffer Kal

DSPIA INC.
ASIC/FPGA Design Services
http://www.dspia.com

Article: 138179
Subject: Re: How to divide clock frequency......
From: GrIsH <grishkunwar@gmail.com>
Date: Sun, 8 Feb 2009 17:52:26 -0800 (PST)

On Feb 6, 9:36 pm, Gabor <ga...@alacron.com> wrote:
> On Feb 5, 1:53 pm, Lorenz Kolb <lorenz.k...@uni-ulm.de> wrote:
>
> > GrIsH wrote:
> > > I am using spartan-II board..and in my project, i need number of
> > > signals of different frequencies..So i want to divide 50Mz clock of
> > > Spartan-II board to get the clocks of required frequencies......
> > > So anyone suggest me on How to divide the clock frequency??.........
> > > any help will be greatly appreciated.............................
>
> > Either use DCMs or use counters generating enable-signals for the faster
> > clocks to act as slower clocks.
>
> > Avoid using counters to generate your own clocks.
>
> > Regards,
>
> > Lorenz
>
> Spartan II won't have DCM's, only DLL's.  The first question is
> what frequencies do you need and are they simple integer divisions
> from your 50 MHz?  If so, use simple counters with a single decoded
> state to provide a clock enable for the slower logic.  The flip-flops
> in the Spartan II all have clock enables, so there is no
> need to use multiple clocks (there are only 4 global clock nets
> in that part IIRC) if your frequencies are simple divisions from
> a single master clock.

Actually, I need a number of frequencies that may be integer as well
as fractional divisions of the 50 MHz clock. I am a beginner with FPGAs and
am using a Spartan-II board for the first time, so I don't know anything about
DLLs. Can I use DLLs for my purpose?
>
> Regards,
> Gabor

Article: 138180
Subject: Re: clk synchronization of reset signal
From: "KJ" <kkjennings@sbcglobal.net>
Date: Sun, 8 Feb 2009 20:55:40 -0500

"rickman" <gnuarm@gmail.com> wrote in message 
news:38e9b5a0-5fd0-4f24-9b49-e1838552bebb@x10g2000yqk.googlegroups.com...
On Feb 6, 9:01 pm, "KJ" <kkjenni...@sbcglobal.net> wrote:

> There can be an issue on the propagation delay for this reset after
> the double FF sync.  But if your design is not all that fast, then
> this is not so much an issue.  As the clock speed increases, the tools
> will have to work harder to distribute the sync'd reset across the
> chip.  Sometimes the tools will use a global clock line for this.  If
> you find problems having this signal meet timing, you may want to
> consider localized reset sync circuits rather than one global
> circuit.

No, you'll need to limit the fanout by telling the synthesis tool what max 
fanout to apply *if* timing is failing because of this signal (typically I 
don't find it to be any problem since I tend to only reset things that need 
it, not the multitude of data path flops that don't).

> This can work if each region controlled by a local reset
> does not need to be brought out of reset on the same clock cycle as
> the other regions.  This is common in designs rather than the entire
> device needing to be released from reset at once.

Delaying reset by a clock cycle doesn't do anything.  All of those lines of 
code that are equivalent to this...
local_reset <= reset when rising_edge(clock)
will be recognized as being the same thing so now you'll simply have large 
fanout on this single signal called 'local_reset'.  Just because you put 
this local_reset signal inside various entities doesn't accomplish much. 
Everything at the same hierarchy level will get the same 'local_reset'. 
Maybe that breaks up the net enough for it not to be a problem, but simply 
applying a max fanout will make it not be a problem in a more efficient way.

> There are often
> many FFs that do not need to be reset at all.  All of this depends on
> your design of course.

Agreed, most flops have no functional requirement to be reset.

> > The double buffer does not need to be 'reset', nor does any other double
> > buffering of any other async signal for that matter. Think about it for 
> > a
> > second

> I don't agree that the reset or other buffer FFs don't need to be
> reset by something other than the external reset signal.

Actually I said the double buffer flops do not need to be reset, so I don't 
get your disagreement.

> > If your device and synthesis tools support initializing flops to a known
> > state at power up (or configuration load) then simply supplying an 
> > initial
> > value to the double buffer flops is a way to force a reset at this time
> > independent of what the external reset is doing. Something like this...

> Unfortunately, that is a big if.  I think it is becoming more common,
> but it is not universal.

Initial values work with Synplify and Quartus, don't know about ISE.  That 
covers darn near all of FPGA land.  The tools that don't support it need 
trouble reports written against them.

>   The other alternative is not portable
> between chips, but most families allow a GSR type component to be
> instantiated which can be used to drive an async input to the FF.

Ummm....the GSR component is the not portable thing here.

> Then all FFs coded this way will have a known initial state regardless
> of the tool used.  The GSR instantiation will need to be changed when
> porting to different brands/families of FPGAs.

See what I mean by not portable?

I certainly accept that not all *parts* inherently allow for initial power 
up values to be loaded but for the ones that do it does work.  But this is 
way off on a tangent since initial values aren't needed for a synchronizer 
anyway.

> I typically use an external reset to async control a few "reset" FFs
> used to sync control the rest of the design.  These "reset" FFs can be
> duplicated around the design if it is large to prevent the sync reset
> delays from affecting the speed of the design.  Of course, each region
> controlled by separate "reset" FFs must be able to operate correctly
> if started a clock or two out of step with the rest of the design.

You can accomplish the same and not have to worry about manually 
partitioning the regional resets by simply giving the synthesis tool a max 
fanout limit on the reset signal though...tool specific yes, but much 
cleaner 'specially when you develop reusable code where the 'region' that a 
particular local reset signal is in can vary depending on which design you 
instantiate the reusable widget into.

Kevin

Article: 138181
Subject: ISIM and SDF Files
From: reganireland@gmail.com
Date: Sun, 8 Feb 2009 20:03:57 -0800 (PST)

Hey guys,

I'm trying to run a PPAR sim in ISIM, but it keeps giving me the
error that it cannot find a ".sdf" file.

The files are being created and stored in /netgen/par but the
simulator seems to be looking in the project root directory. The names
of the files are correct: toplevel_timesim.sdf etc. but the simulator
still seems to look for ".sdf" without even a file name.

ERROR:Simulator:35 - Sdf file ".sdf" specified does not exist or can not be read.

Any ideas how I can manually tell it where the files are?

Cheers,
Regan

Article: 138182
Subject: Re: How to divide clock frequency......
From: Peter Alfke <alfke@sbcglobal.net>
Date: Sun, 8 Feb 2009 20:06:18 -0800 (PST)

On Feb 8, 5:52 pm, GrIsH <grishkun...@gmail.com> wrote:
> On Feb 6, 9:36 pm, Gabor <ga...@alacron.com> wrote:
>
>
>
> > On Feb 5, 1:53 pm, Lorenz Kolb <lorenz.k...@uni-ulm.de> wrote:
>
> > > GrIsH wrote:
> > > > I am using spartan-II board..and in my project, i need number of
> > > > signals of different frequencies..So i want to divide 50Mz clock of
> > > > Spartan-II board to get the clocks of required frequencies......
> > > > So anyone suggest me on How to divide the clock frequency??.........
> > > > any help will be greatly appreciated.............................
>
> > > Either use DCMs or use counters generating enable-signals for the faster
> > > clocks to act as slower clocks.
>
> > > Avoid using counters to generate your own clocks.
>
> > > Regards,
>
> > > Lorenz
>
> > Spartan II won't have DCM's, only DLL's.  The first question is
> > what frequencies do you need and are they simple integer divisions
> > from your 50 MHz?  If so, use simple counters with a single decoded
> > state to provide a clock enable for the slower logic.  The flip-flops
> > in the Spartan II all have clock enables, so there is no
> > need to use multiple clocks (there are only 4 global clock nets
> > in that part IIRC) if your frequencies are simple divisions from
> > a single master clock.
>
> actually, i need numbers of frequencies that may be of integer as well
> as fraction division of 50 MHz clock..iam a biginer  one for FPGAs and
> using Spartan-II board first time so i don't know anything about
> DLL's......Can i use DLL's for my puspose???
>
>
>
> > Regards,
> > Gabor

Direct Digital Synthesis (DDS) is a popular method. Google it!
You need to build a reasonably long accumulator (10 to 20 bits long),
which really is a register that adds a binary value to its present
content on every clock tick of the 50 MHz clock.
And you just use the most significant bit as the output.
Explore the operation by assuming that you add just one bit, the least
significant one, every time.
That generates your lowest output frequency. Any other output
frequency will be a multiple of this frequency.
The accumulator length establishes your frequency resolution. A longer
accumulator gives you finer resolution.
The additional problem is that any output frequency that is not an
integer fraction of 50 MHz will have an unavoidable jitter component
that is max 20 ns, one period of your 50 MHz clock.
Reducing that jitter is notoriously difficult, but if you can live
with the frequency granularity and the potential jitter, the DDS
circuit is nice and simple, and very flexible.
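
To make the arithmetic concrete, here is a rough Python model of the
accumulator described above.  It is a sketch under stated assumptions,
not a reference design: the names tuning_word and msb_rising_edges are
made up for illustration, and the increment is assumed to be less than
half the accumulator range so that the MSB toggles cleanly.

```python
def tuning_word(f_out_hz, f_clk_hz, width):
    """Nearest increment for a requested output frequency."""
    return round(f_out_hz * (1 << width) / f_clk_hz)


def msb_rising_edges(increment, width, clocks):
    """Clock a `width`-bit accumulator and count 0->1 edges on its MSB."""
    mask = (1 << width) - 1
    acc = 0
    prev_msb = 0
    edges = 0
    for _ in range(clocks):
        acc = (acc + increment) & mask   # the register wraps at 2**width
        msb = acc >> (width - 1)         # the MSB is the output signal
        edges += (msb == 1 and prev_msb == 0)
        prev_msb = msb
    return edges
```

A 12-bit accumulator with an increment of 123 produces 123 output
cycles every 4096 clocks, i.e. f_out = f_clk * increment / 2**width.
With 20 bits and a 50 MHz clock the step size is about 47.7 Hz, so a
request for 9600 Hz rounds to an increment of 201.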
Peter Alfke, Xilinx, at home with a cold...

Article: 138183
Subject: Re: Is this phase accumulator trick well-known???
From: jhallen@TheWorld.com (Joseph H Allen)
Date: Mon, 9 Feb 2009 04:18:29 +0000 (UTC)

I remember implementing Bresenham's line drawing algorithm on 8-bit CPUs
like this:

	suba	#N	* Subtract N
	bcc	no_ov	* Branch if no borrow...
	adda	#M	* Add M
	inc	cur_y	* increment y axis or whatever...
no_ov

(bonus points if you can identify the CPU and the assembler...)

In Verilog this would be something like:

	reg [16:0] accu;

	always @(posedge clk) // clk is an assumed clock name
	  if (accu[16]) // Did previous (- N) overflow?
	    accu <= accu + M; // Yes, add M
	  else
	    accu <= accu - N;


-- 
/*  jhallen@world.std.com AB1GO */                        /* Joseph H. Allen */
int a[1817];main(z,p,q,r){for(p=80;q+p-80;p-=2*a[p])for(z=9;z--;)q=3&(r=time(0)
+r*57)/7,q=q?q-1?q-2?1-p%79?-1:0:p%79-77?1:0:p<1659?79:0:p>158?-79:0,q?!a[p+q*2
]?a[p+=a[p+=q]=q]=q:0:0;for(;q++-1817;)printf(q%79?"%c":"%c\n"," #"[!a[q-1]]);}

Article: 138184
Subject: Re: Is this phase accumulator trick well-known???
From: Glen Herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Sun, 08 Feb 2009 21:47:05 -0700

Muzaffer Kal wrote:

(after someone wrote)
>>>  if (...overflow...) then
>>>    modulo_adder <= modulo_adder + N;
>>>  else
>>>    modulo_adder <= modulo_adder + N - 1600;

>>>The last line requires TWO adders, in addition to the
>>>multiplexer created by the IF.  This causes a significant
>>>performance hit.  That's what I was trying to fix. 

(I wrote)
>>The adders will run in parallel, so there should be no
>>performance difference.

> I don't think they can. The first line needs a 2 input adder and the
> second line needs two 2 input adders in a tree or a 3 input adder. No
> matter how you look at it, that path will be slower than a 2 input
> adder; whether slow enough to matter is a different issue.

I thought N was more or less constant.  It should probably be

        modulo_adder <= modulo_adder + (N - 1600);

to make sure it comes out that way, though.

In any case, it should run at least at 100MHz even with two adders
on most current FPGAs.

(snip)

-- glen

Article: 138185
Subject: Re: Is this phase accumulator trick well-known???
From: Antti <Antti.Lukats@googlemail.com>
Date: Sun, 8 Feb 2009 21:29:40 -0800 (PST)

On Feb 9, 1:12 am, Mike Treseler <mtrese...@gmail.com> wrote:
> Jonathan Bromley wrote:
> > It simulates correctly too, although of course it needs to
> > be wrapped in something that provides a signal to connect
> > to its unconstrained input port.
>
> This may be the point that Antti missed,
> as he mentioned "unmodified" code.
> Without a wrapper or constraint, I wouldn't expect
> elaboration to succeed:
>
> # vsim -c rate_gen
> # Loading /flip/usr1/modeltech/linux/../std.standard
> # Loading /flip/usr1/modeltech/linux/../ieee.std_logic_1164(body)
> # Loading /flip/usr1/modeltech/linux/../ieee.numeric_std(body)
> # Loading work.rate_gen(rtl)#1
> # ** Fatal: (vsim-3347) Port 'rate' is not constrained.
>
> A minimal constraining modification like this:
>  rate  : in  unsigned(15 downto 0)
> proves that the code would sim ok, with a proper instance.
>
> # vsim -c rate_gen
> # Loading /flip/usr1/modeltech/linux/../std.standard
> # Loading /flip/usr1/modeltech/linux/../ieee.std_logic_1164(body)
> # Loading /flip/usr1/modeltech/linux/../ieee.numeric_std(body)
> # Loading work.rate_gen(rtl)#1
> VSIM 1> run
> VSIM 2>
>
>  -- Mike Treseler

Mike, I am not that dumb ;)

the original code does NOT pass synthesis without a wrapper.
after passing synthesis with a wrapper, it does not pass
compile for Xilinx ISIM.
the same code may pass __all__ other simulators;
i made no statement about any of them, only that
code making Xilinx XST happy does not pass ISIM,
so those tools' VHDL support levels are just different.

i constrained the signal connected to rate in the wrapper as:

signal rate : unsigned(15 downto 0) := X"2580";

this was the _needed_ trick in the wrapper.
i assumed Jonathan's code needs no modifications,
which he claims is the case if the tools are fully VHDL-93 compliant.

Antti

Article: 138186
Subject: Re: Is this phase accumulator trick well-known???
From: rickman <gnuarm@gmail.com>
Date: Sun, 8 Feb 2009 22:08:08 -0800 (PST)

On Feb 9, 12:29 am, Antti <Antti.Luk...@googlemail.com> wrote:
> On Feb 9, 1:12 am, Mike Treseler <mtrese...@gmail.com> wrote:
>
>
>
> > Jonathan Bromley wrote:
> > > It simulates correctly too, although of course it needs to
> > > be wrapped in something that provides a signal to connect
> > > to its unconstrained input port.
>
> > This may be the point that Antti missed,
> > as he mentioned "unmodified" code.
> > Without a wrapper or constraint, I wouldn't expect
> > elaboration to succeed:
>
> > # vsim -c rate_gen
> > # Loading /flip/usr1/modeltech/linux/../std.standard
> > # Loading /flip/usr1/modeltech/linux/../ieee.std_logic_1164(body)
> > # Loading /flip/usr1/modeltech/linux/../ieee.numeric_std(body)
> > # Loading work.rate_gen(rtl)#1
> > # ** Fatal: (vsim-3347) Port 'rate' is not constrained.
>
> > A minimal constraining modification like this:
> >  rate  : in  unsigned(15 downto 0)
> > proves that the code would sim ok, with a proper instance.
>
> > # vsim -c rate_gen
> > # Loading /flip/usr1/modeltech/linux/../std.standard
> > # Loading /flip/usr1/modeltech/linux/../ieee.std_logic_1164(body)
> > # Loading /flip/usr1/modeltech/linux/../ieee.numeric_std(body)
> > # Loading work.rate_gen(rtl)#1
> > VSIM 1> run
> > VSIM 2>
>
> >  -- Mike Treseler
>
> Mike I am not that dumb ;)
>
> the original code does NOT pass synthesis without wrapper
> after passing the synthesis with wrapper it does not pass
> compile for Xilinx ISIM
> the same code may pass __all__ others simulator,
> i made no statement about an of them, only that
> code making xilinx XST happy doest not pass ISIM
> so those tools VHDL support level is just different.
>
> i constrained the signal connected to rate in the wrapper as:
>
> signal rate : unsigned(15 downto 0) := X"2580";
>
> this was the _needed_ trick in the wrapper.
> i assumed Jonathans code does not modifications.
> what he claims the case if the tools are fully VHDL-93 compliant
>
> Antti

ISIM is known to be very buggy compared to more mature tools.  Xilinx
will even admit that, and they ask for any input about code that
makes it fail.

I would not put much importance on the fact that any code does not
work correctly in ISIM.

Rick

Article: 138187
Subject: Re: Is this phase accumulator trick well-known???
From: rickman <gnuarm@gmail.com>
Date: Sun, 8 Feb 2009 22:42:04 -0800 (PST)

On Feb 8, 4:32 pm, Jonathan Bromley <jonathan.brom...@MYCOMPANY.com>
wrote:
> On Sun, 8 Feb 2009 10:27:08 -0800 (PST), Antti wrote:
> >ok, now i KNOW too
>
> Aw, shucks, you found me out.  Sitting in my futuristic
> bunker in a hollowed-out volcano, stroking my fluffy
> white cat and letting out the occasional megalomaniac
> cackle, I was thinking: BWAH HAH HAH At last I have
> created something that could confuse even the great
> Antti.  But it was not to be so :-)
>
> >xilinx synthesis: 50mhz ref 9600 baud (fixed not variable), s3
> >FF:18
> >LUT 17
>
> Yup.  With a constant rate input, XST does The Right Thing and
> collapses all the constants.  With a variable input, you get
> approximately 2X LUT-to-FF ratio because you can't fit a
> 3-input MUX and an adder bit into a single LUT (at least,
> not in Spartan-3... maybe when we have LUT6??? .....)
>
> >used Jons unmodifed code :)
>
> It simulates correctly too, although of course it needs to
> be wrapped in something that provides a signal to connect
> to its unconstrained input port.  I haven't tried it with
> ISIM just yet, but I believe the code is strictly VHDL-93
> compliant and therefore it is NOT MY FAULT if some simulator
> cannot handle it.  By contrast, if you can find a synthesis
> tool that can't handle it then I'd be glad to know, because
> my intent was to create portable synthesisable code.

I think I understand everything that has been posted here.  I
understand that your approach may be an improvement.  But I am not
sure it should be a *LOT* faster than two adders.  Using 4 input LUTs,
an adder is no more or less logic than a mux other than the carry
chain.  Two cascaded adders will have the carry chains running in
parallel other than the extra LUT delay.  This LUT delay is no more
than what you would see adding an extra MUX.  In fact, I can picture a
hardware implementation using a single 2 input mux and two adders
where one of the adders is not in the timing loop.

The calculation is Acc <= Acc - (N muxed with N-M).  In the N-M
calculation both N and M are variable, but the variables are not
updated in the loop, so for the purposes of calculating clock timing,
this adder should not be part of the delay (unless you are
implementing chirp or spread spectrum).  Perhaps you need to specify
the path from the N and M registers through the N-M adder is not
important.  That should give you as good or better timing than the 3
input mux approach and does not limit the output frequency as much.
In the newer families with a 6 input LUT, the timing critical parts
will all fit in a single LUT.  I don't think that is true if you use a
3 input mux because both select signals have to update on a clock by
clock basis and none of it can be separated into another LUT without
impacting performance.
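
One possible reading of that structure, as a Python sketch rather than
RTL (the class and attribute names are invented for illustration, and
the sign convention follows the acc/N/M pseudocode earlier in the
thread): the N-M adder runs only when the rate inputs change, and the
per-clock loop is a 2-input mux feeding a single adder.

```python
class RateGen:
    """Phase accumulator with the N - M adder kept out of the clocked loop."""

    def __init__(self, n, m):
        self.acc = 0
        self.set_rate(n, m)

    def set_rate(self, n, m):
        # Slow path: evaluated only when N or M is rewritten,
        # so it should not limit the accumulator clock rate.
        self.inc_wrap = n - m
        self.inc_step = -m

    def clock(self):
        # Fast path: one 2-input mux selecting the addend, one adder.
        pulse = self.acc < 0
        self.acc += self.inc_wrap if pulse else self.inc_step
        return pulse
```

Clocking RateGen(1000, 96) for 1000 ticks gives 96 pulses, the same
rate as the two-adder form, which is what I would expect if the N-M
path really can be treated as static.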

This is an interesting problem, am I understanding it correctly?

Rick

Article: 138188
Subject: Re: Experiencing problems when moving an FPGA-based implementation to an ASIC
From: hal-usenet@ip-64-139-1-69.sjc.megapath.net (Hal Murray)
Date: Mon, 09 Feb 2009 00:55:53 -0600


>I don't remember the marketing name they use for it, but Xilinx has a
>line of ASICs that you can port your FPGA design to which will have
>all the same functional units.  I expect this will not have quite as
>low a unit price as a full ASIC, but if your design meets their
>requirements, they certify that the ASIC will work to the test
>vectors.  Certainly a good thing for someone without a lot of
>experience in designing ASICs.

Hardwire?

I thought Xilinx got out of the business several generations ago.

The current cost-reduction approach is to use normal silicon but
only test the parts that your design uses.

-- 
These are my opinions, not necessarily my employer's.  I hate spam.

Article: 138189
Subject: Re: clk synchronization of reset signal
From: rickman <gnuarm@gmail.com>
Date: Sun, 8 Feb 2009 23:17:43 -0800 (PST)

On Feb 8, 8:55 pm, "KJ" <kkjenni...@sbcglobal.net> wrote:
> "rickman" <gnu...@gmail.com> wrote in message
>
> news:38e9b5a0-5fd0-4f24-9b49-e1838552bebb@x10g2000yqk.googlegroups.com...
> On Feb 6, 9:01 pm, "KJ" <kkjenni...@sbcglobal.net> wrote:
>
> > There can be an issue on the propagation delay for this reset after
> > the double FF sync.  But if your design is not all that fast, then
> > this is not so much an issue.  As the clock speed increases, the tools
> > will have to work harder to distribute the sync'd reset across the
> > chip.  Sometimes the tools will use a global clock line for this.  If
> > you find problems having this signal meet timing, you may want to
> > consider localized reset sync circuits rather than one global
> > circuit.
>
> No, you'll need to limit the fanout by telling the synthesis tool what max
> fanout to apply *if* timing is failing because of this signal (typically I
> don't find it to be any problem since I tend to only reset things that need
> it, not the multitude of data path flops that don't).

I am not aware that the tools will combine signals like you describe,
but it makes some sense.  I have seen this with combinatorial logic,
so I suppose it will happen with sequential logic as well.  So your
suggestion makes sense.

> > This can work if each region controlled by a local reset
> > does not need to be brought out of reset on the same clock cycle as
> > the other regions.  This is common in designs rather than the entire
> > device needing to be released from reset at once.
>
> Delaying reset by a clock cycle doesn't do anything.  All of those lines of
> code that are equivalent to this...
> local_reset <= reset when rising_edge(clock)
> will be recognized as being the same thing so now you'll simply have large
> fanout on this single signal called 'local_reset'.  Just because you put
> this local_reset signal inside various entities doesn't accomplish much.
> Everything at the same hierarchy level will get the same 'local_reset'.
> Maybe that breaks up the net enough for it not to be a problem, but simply
> applying a max fanout will make it not be a problem in a more efficient way.

Delaying the reset is not what I am suggesting.  I am saying that if
the external async reset goes to multiple, parallel sync circuits,
they can get released on different clock cycles.  In many cases this
will not matter.  But you need to be sure that this applies to your
design.

> > > The double buffer does not need to be 'reset', nor does any other double
> > > buffering of any other async signal for that matter. Think about it for
> > > a
> > > second
> > I don't agree that the reset or other buffer FFs don't need to be
> > reset by something other than the external reset signal.
>
> Actually I said the double buffer flops do not need to be reset, so I don't
> get your disagreement.

I don't agree with that.  They need to be configured to the "reset"
state in the sense of asserting the reset output.  Otherwise there
will be at least one clock cycle after configuration where the reset
is not asserted.
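
To illustrate what I mean, here is a toy Python model of a two-flop
reset synchronizer.  It is a sketch with assumptions of my own: reset
is active high, the flops simply shift the external reset along, and
the function name is hypothetical.

```python
def reset_out_trace(init, ext_reset, clocks):
    """Return the synchronizer output before the first clock and after each edge.

    init      -- (ff1, ff2) values loaded at configuration
    ext_reset -- external reset level, held constant for this run
    """
    ff1, ff2 = init
    trace = [ff2]                     # output seen right after configuration
    for _ in range(clocks):
        ff1, ff2 = ext_reset, ff1     # both flops update on the same edge
        trace.append(ff2)
    return trace
```

With the flops configured to 0 and the external reset held asserted,
the output reads 0, 0, 1, 1: the design sees two cycles without reset.
With the flops configured to 1 the output is asserted from the start,
and it releases cleanly (1, 1, 0, 0) once the external reset is low.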

> > > If your device and synthesis tools support initializing flops to a known
> > > state at power up (or configuration load) then simply supplying an
> > > initial
> > > value to the double buffer flops is a way to force a reset at this time
> > > independent of what the external reset is doing. Something like this...
> > Unfortunately, that is a big if.  I think it is becoming more common,
> > but it is not universal.
>
> Initial values work with Synplify and Quartus, don't know about ISE.  That
> covers darn near all of FPGA land.  The tools that don't support it need
> trouble reports written against them.

You are making assumptions about the tools.  This may be true for
current versions, but there are times when older versions of the tools
must be used.  Also, what happens if you want to convert your design
to an ASIC?  Do you want to go in and try to find every initialization
that will have to be fixed and make sure it then works?  Maybe I am
just stuck in my ways, but I tend to code rather conservatively.

> >   The other alternative is not portable
> > between chips, but most families allow a GSR type component to be
> > instantiated which can be used to drive an async input to the FF.
>
> Ummm....the GSR component is the not portable thing here.

Yes, but it is very easy to spot and deal with.  Most FPGA families
have an equivalent function and in an ASIC you will need to recode
this small part of the design.

> > Then all FFs coded this way will have a known initial state regardless
> > of the tool used.  The GSR instantiation will need to be changed when
> > porting to different brands/families of FPGAs.
>
> See what I mean by not portable?

I don't strive for my code to be 100% portable, just easily portable.
Relying on specific features in tools throughout your code can present
a *huge* effort to deal with if you run up against the tool that does
not support it.

> I certainly accept that not all *parts* inherently allow for initial power
> up values to be loaded but for the ones that do it does work.  But this is
> way off on a tangent since initial values aren't needed for a synchronizer
> anyway.

Initial values are needed if the synchronizer is controlling the chip
reset.  I suppose you can use both a sync and async input to the
synchronizer FFs, but then that is what I am proposing in essence.

> > I typically use an external reset to async control a few "reset" FFs
> > used to sync control the rest of the design.  These "reset" FFs can be
> > duplicated around the design if it is large to prevent the sync reset
> > delays from affecting the speed of the design.  Of course, each region
> > controlled by separate "reset" FFs must be able to operate correctly
> > if started a clock or two out of step with the rest of the design.
>
> You can accomplish the same and not have to worry about manually
> partitioning the regional resets by simply giving the synthesis tool a max
> fanout limit on the reset signal though...tool specific yes, but much
> cleaner 'specially when you develop reusable code where the 'region' that a
> particular local reset signal is in can vary depending on which design you
> instantiate the reusable widget into.

I have never actually run into a design where the fanout was a
problem, so I have not had to try specing this.  What you say about
this spec makes sense.  The reset propagation delay is something that
will vary between brands, families and even individual FPGA sizes
within a family.  So it makes sense to control the fan out as an
external constraint.
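
A behavioural sketch of the reset scheme quoted above (an external reset
asynchronously forcing a few "reset" FFs, whose output then resets the
rest of the design synchronously) may make the timing easier to see. It
is a Python model, not HDL. The class name `ResetSynchronizer`, the
two-stage depth and the active-high polarity are assumptions made for
illustration and do not come from the thread.

```python
class ResetSynchronizer:
    """Model of async-assert, sync-release reset FFs (hypothetical names)."""

    def __init__(self, stages: int = 2):
        # Assumption: the model starts with reset asserted. In hardware this
        # state is forced by the asynchronous external reset, not by initial
        # values, which is the portability point made in the thread.
        self.ffs = [1] * stages

    def clock(self, ext_reset: int) -> int:
        """Advance one clock edge; return the internal (synchronous) reset."""
        if ext_reset:
            # Asynchronous path: every stage is forced to 'reset asserted'.
            self.ffs = [1] * len(self.ffs)
        else:
            # Synchronous release: a 0 shifts through the chain per clock.
            self.ffs = [0] + self.ffs[:-1]
        return self.ffs[-1]


if __name__ == "__main__":
    sync = ResetSynchronizer()
    external = [1, 1, 0, 0, 0, 0]
    print([sync.clock(r) for r in external])
```

In this model the internal reset stays asserted for one extra clock after
the external reset is released and then drops on a clock edge. Duplicated
"reset" FFs for different regions would each be a separate instance, which
is why regions may start a clock or two apart.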

Rick

Article: 138190
Subject: Re: Is this phase accumulator trick well-known???
From: Muzaffer Kal <kal@dspia.com>
Date: Sun, 08 Feb 2009 23:40:37 -0800
Links: << >> << T >> << A >>

On Sun, 08 Feb 2009 21:47:05 -0700, Glen Herrmannsfeldt
<gah@ugcs.caltech.edu> wrote:

>Muzaffer Kal wrote:
>
>(after someone wrote)
>>>>  if (...overflow...) then
>>>>    modulo_adder <= modulo_adder + N;
>>>>  else
>>>>    modulo_adder <= modulo_adder + N - 1600;
>
>>>>The last line requires TWO adders, in addition to the
>>>>multiplexer created by the IF.  This causes a significant
>>>>performance hit.  That's what I was trying to fix. 
>
>(I wrote)
>>>The adders will run in parallel, so there should be no
>>>performance difference.
>
>> I don't think they can. The first line needs a 2 input adder and the
>> second line needs two 2 input adders in a tree or a 3 input adder. No
>> matter how you look at it, that path will be slower than a 2 input
>> adder; whether slow enough to matter is a different issue.
>
>I thought N was more or less constant.  

I'm not sure what that means. N is either constant in which case
"N-1600" is a constant and there is no reason to hold this discussion
(and earlier in the thread it was mentioned that N was variable to
make the generator programmable) or it is variable in which case one
needs a real adder to deal with it. (or it may be a very slowly
changing number in which case one can use some multi-cycle constraints
but I don't think that's the topic now).

> It should probably be
>
>        modulo_adder <= modulo_adder + (N - 1600);
>
>to make sure it comes out that way, though.
>

You pay your dues either way. If N is variable you either do 
"(m_a +N) - 1600 " where you incur the cost of a 2 input adder with
variable inputs and another 2 input adder where one input is fixed
(constant) or you calculate:
"m_a + (N-1600)" where you add a constant to N and feed the result to
another 2 input adder where both inputs are variable.
In either case the path from N.Q to m_a.D is longer than it would be
without the third addend.
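
The two groupings can be checked for arithmetic equivalence with a small
Python sketch. It only shows that both orderings give the same next
value; it says nothing about timing, which is the real point above. The
wrap condition (`total >= 1600`) and the function names are assumptions,
since the quoted code elides the overflow test.

```python
MODULUS = 1600  # the constant from the quoted code


def step_sum_first(m_a: int, n: int) -> int:
    """(m_a + N) - 1600: variable+variable add, then a constant subtract."""
    total = m_a + n
    return total - MODULUS if total >= MODULUS else total


def step_const_first(m_a: int, n: int) -> int:
    """m_a + (N - 1600): constant subtract on N, then variable+variable add."""
    n_wrapped = n - MODULUS
    return m_a + n_wrapped if m_a + n >= MODULUS else m_a + n


def groupings_agree(n: int) -> bool:
    """True if both groupings match for every accumulator value."""
    return all(
        step_sum_first(m, n) == step_const_first(m, n) for m in range(MODULUS)
    )


if __name__ == "__main__":
    print(all(groupings_agree(n) for n in (1, 7, 799, 1599)))
```

Either way two additions sit between N and the accumulator register,
unless the (N - 1600) term is registered or N is treated as quasi-static.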

>In any case, it should run at least at 100MHz even with two adders
>on most current FPGAs.

Whether that's adequate or relevant really depends on what one is
doing.
-- Muzaffer Kal

DSPIA INC.
ASIC/FPGA Design Services
http://www.dspia.com

Article: 138191
Subject: Learning backend stuff
From: googler <pinaki_m77@yahoo.com>
Date: Mon, 9 Feb 2009 00:06:32 -0800 (PST)
Links: << >> << T >> << A >>

Hi all,

I am an ASIC designer who works mostly on the front end (RTL coding).
I want to teach myself on how things are done in the backend by
working on small (hobby) projects. My main requirement is not just to
learn the concepts, but to gain hands-on experience using some of the
industry standard tools (like Design Compiler, PrimeTime etc) and
formats (like SDC files).

(1) How can I do this on my own without spending a lot of money for
all the different tools? My first choice, of course, is the most
common tools like DC or PT, but if there are cheaper but similar tools
available, that may be fine too.
(2) I had earlier been thinking about using an FPGA board for this,
but will the same tools (as mentioned earlier) work for FPGA? How
different are the tools that come with the FPGA boards?

I got myself a copy of the book "Advanced ASIC Chip Synthesis using
Synopsys Design Compiler, Physical Compiler and PrimeTime". So I would
prefer something (that is, what tools or which platform) that will not
be too different from what the book talks about.

Thanks in advance.

Article: 138192
Subject: Re: Is this phase accumulator trick well-known???
From: Eric Smith <eric@brouhaha.com>
Date: Mon, 09 Feb 2009 01:16:00 -0800
Links: << >> << T >> << A >>

Joseph H Allen writes:
> I remember implementing Bresenham's line drawing algorithm on 8-bit CPUs
> like this:
>
> 	suba	#N	* Subtract N
> 	bcc	no_ov	* Branch if no borrow...
> 	adda	#M	* Add M
> 	inc	cur_y	* increment y axis or whatever...
> no_ov
>
> (bonus points if you can identify the CPU and the assembler...)

Looks like MC6800/6802/6808, MC6801, MC6809, or MC68HC11 families to me,
or the Hitachi HD6301 family.

Most of my experience with 8-bit parts from Motorola was with the MC68HC05,
MC6809, and MC68HC11 families, but IIRC the HC05 only has a single accumulator
so I don't recall using suba and adda on it.

Article: 138193
Subject: pulser problem
From: uraniumore238@gmail.com
Date: Mon, 9 Feb 2009 01:18:21 -0800 (PST)
Links: << >> << T >> << A >>

Does anyone know how I can generate a pulser with a maximum repetition
rate of 50 MHz in Verilog, for use in my existing design? I'd like to
simulate this signal before I connect the actual pulser to the board.

Article: 138194
Subject: Re: pulser problem
From: Muzaffer Kal <kal@dspia.com>
Date: Mon, 09 Feb 2009 01:36:31 -0800
Links: << >> << T >> << A >>

On Mon, 9 Feb 2009 01:18:21 -0800 (PST), uraniumore238@gmail.com
wrote:

>Does anyone know how I can generate a pulser with a maximum repetition
>rate of 50 MHz in Verilog, for use in my existing design? I'd like to
>simulate this signal before I connect the actual pulser to the board.

I think you want a clock source. Here is one way you can generate it:

`timescale 1ns/100fs
reg clk;
initial
begin
	clk = 0;                 // start from a known value; ~x would stay x
	forever clk = #10 ~clk;  // toggle every 10 ns: 20 ns period, 50 MHz
end

This clock toggles at 50 MHz as you want (ie 10ns high, 10 ns low) and
you can change the number 10 to your needs.
Please note that this is only for testbench usage and you can't
actually put this in your fpga. You have to use an oscillator on your
board to generate a similar clock to drive into your fpga.
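
For other frequencies, the delay value is simply half the clock period.
A small Python helper (the name `half_period_ns` is made up for this
sketch) shows the arithmetic, assuming the 1 ns time unit used in the
`timescale` line above:

```python
def half_period_ns(freq_hz: float) -> float:
    """Delay between clock toggles, in ns, for a clock of freq_hz."""
    # One full period is 1e9 / freq_hz ns; the clock toggles twice per period.
    return 1e9 / (2.0 * freq_hz)


if __name__ == "__main__":
    print(half_period_ns(50e6))   # 10.0, the #10 used above
    print(half_period_ns(100e6))  # 5.0
```

The result should be a multiple of the timescale precision (100 fs here),
otherwise the simulator will round the delay.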

Muzaffer Kal

DSPIA INC.
ASIC/FPGA Design Services
http://www.dspia.com

Article: 138195
Subject: Re: Is this phase accumulator trick well-known???
From: Jonathan Bromley <jonathan.bromley@MYCOMPANY.com>
Date: Mon, 09 Feb 2009 10:14:09 +0000
Links: << >> << T >> << A >>

On Sun, 8 Feb 2009 22:42:04 -0800 (PST), rickman wrote:

>This is an interesting problem, am I understanding it correctly?

Yes; more correctly than I did at first, I think.

Various people have correctly pointed out that the N-M 
calculation does not need to be on a timing arc, but it's
tough to convince the tools of that.

Other people have correctly pointed out that my trick
to convert 2 adders and a 2-in MUX into one adder and
a 3-in MUX does not save any area.  I did consistently
find, however, that it gave significantly better Fmax;
I'm not 100% sure I know why.  If we have 6-input LUTs
then my trick would be a very big win.

Finally, someone pointed out that the N-M calculation
could be pipelined.  In FPGAs, with one FF per LUT
whether you use it or not, that turns out better than
any other form I've tried.  Better still, if N and M 
are both constants then the tools correctly identify
that the (N-M) pipeline register is constant, and 
optimize it away.  So my original question, and my
original "trick", become irrelevant (except in 
Spartan-6, maybe???!!!) and my "best effort" is:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity rate_gen is
  generic ( ref_Hz: positive := 50_000_000 );
  port
    ( clock : in  std_logic
    ; reset : in  std_logic
    ; rate  : in  unsigned
    ; pulse : out std_logic
    );
end;

architecture RTL of rate_gen is

begin
  process (clock)
    variable count: integer range -2**rate'length to ref_Hz-1 := 0;
    variable wrap: natural range 0 to ref_Hz := ref_Hz;
  begin
    if rising_edge(clock) then
      pulse <= '0';
      if reset = '1' then
        count := 0;
      elsif count < 0 then
        pulse <= '1';
        count := count + wrap;
      else
        count := count - to_integer(rate);
      end if;
      wrap := ref_Hz - to_integer(rate);
    end if;
  end process;
end;

The synchronous reset adds a tiny amount of delay (routing???)
and is probably unnecessary.
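
As a cross-check, here is a cycle-by-cycle Python model of the process
above. It is a sketch, not a replacement for simulating the VHDL: the
function name is an assumption, reset is left out, and `rate` is held
constant for the whole run. `wrap` is updated at the end of each clock,
mimicking the pipeline register.

```python
def rate_gen_model(ref_hz: int, rate: int, n_clocks: int) -> list:
    """Return the value assigned to 'pulse' on each of n_clocks clock edges."""
    count = 0
    wrap = ref_hz  # initial value of the VHDL variable
    pulses = []
    for _ in range(n_clocks):
        pulse = 0
        if count < 0:
            pulse = 1
            count += wrap          # uses (ref_hz - rate) from the last clock
        else:
            count -= rate
        wrap = ref_hz - rate       # pipelined N-M calculation
        pulses.append(pulse)
    return pulses


if __name__ == "__main__":
    # Small numbers keep the run short; 10 clocks stands in for one second.
    print(sum(rate_gen_model(ref_hz=10, rate=3, n_clocks=10)))
```

With `rate` no greater than `ref_hz / 2`, the model produces exactly
`rate` pulses in every `ref_hz` clocks, which matches the intent of the
generator.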

But there's another idea coming...
-- 
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.bromley@MYCOMPANY.com
http://www.MYCOMPANY.com

The contents of this message may contain personal views which 
are not the views of Doulos Ltd., unless specifically stated.

Article: 138196
Subject: Re: REWARD $$$ Xilinx USB Platform Cable problems
From: Alan Fitch <alan.fitch@spamtrap.com>
Date: Mon, 09 Feb 2009 10:22:20 +0000
Links: << >> << T >> << A >>

luudee wrote:
> On Feb 4, 11:13 pm, Antti <Antti.Luk...@googlemail.com> wrote:
>> solution 1:
>>
>> get another cheap PC, install the required and working versions cable
>> drivers
>> there and connect over ethernet to your cable server PC
>>
>> fighting with impact is pointless to absurd. it sometimes works.
>> thats all that can be said.
>>
>> that sometimes usually isn't when you need it most.
>>
>> Antti
>> PS good luck getting your paid answer..:)
>> 11.1 is soon to be released, so you will again fight
>> the linux drivers
>>
>> a winxp PC cableserver box is cheaper solution
> 
> 
> Hi Antti,
> 
> yes, I have been thinking of what you have suggested as
> a "last resort". I figure give it a try and see if there
> is somebody out there who might know how to fix this really
> annoying problem ...
> 
> Regards,
> rudi

One thing to be aware of - the cableserver doesn't work at all from 
linux to windows on ISE 10.1. It's been partially fixed in 10.1 SP3, it 
does work from the command line - however the Impact gui running on 
Linux accessing a cable server on PC won't work until 11.1 according to 
this note:

http://www.xilinx.com/support/answers/31314.htm

So watch out, it's a jungle out there :-)

Alan

-- 
Alan Fitch
Doulos
http://www.doulos.com

Article: 138197
Subject: Re: Learning backend stuff
From: Muzaffer Kal <kal@dspia.com>
Date: Mon, 09 Feb 2009 02:44:46 -0800
Links: << >> << T >> << A >>

On Mon, 9 Feb 2009 00:06:32 -0800 (PST), googler
<pinaki_m77@yahoo.com> wrote:

>Hi all,
>
>I am an ASIC designer who works mostly on the front end (RTL coding).
>I want to teach myself on how things are done in the backend by
>working on small (hobby) projects. My main requirement is not just to
>learn the concepts, but to gain hands-on experience using some of the
>industry standard tools (like Design Compiler, PrimeTime etc) and
>formats (like SDC files).
>
This is somewhat of a pickle. These days most back-end issues are
related to CTS, low-power/leakage, cross-talk, signal-integrity, DFM,
DFT and similar issues and they require significant investment of time
(assuming money is no object) to get acquainted with. Also as an RTL
designer, you should be familiar with documenting your own constraints
in SDC anyway.

>(1) How can I do this on my own without spending a lot of money for
>all the different tools? My first choice, of course, is the most
>common tools like DC or PT, but if there are cheaper but similar tools
>available, that may be fine too.

Do you have access to DC, PC, PT and similar tools through your
current employer? That would be the only way to get at them. ASIC
tools are extremely expensive and would be quite difficult to acquire
any other way.

>(2) I had earlier been thinking about using an FPGA board for this,
>but will the same tools (as mentioned earlier) work for FPGA? How
>different are the tools that come with the FPGA boards?
>
Almost none of the significant ASIC back-end issues are of any
importance for FPGAs so other than doing synthesis with correct
constraints, FPGA experience doesn't buy you much.

>I got myself a copy of the book "Advanced ASIC Chip Synthesis using
>Synopsys Design Compiler, Physical Compiler and PrimeTime". So I would
>prefer something (that is, what tools or which platform) that will not
>be too different from what the book talks about.

If I were starting ASIC back-end work today, I'd download a serious
design (like LEON or OpenSPARC) and run a top-to-bottom flow on it,
i.e. synthesis, P&R, CTS, STA, DFT, LVS, DRC: the whole thing, if you
have access to the tools.

Muzaffer Kal

DSPIA INC.
ASIC/FPGA Design Services
http://www.dspia.com

Article: 138198
Subject: Re: Is this phase accumulator trick well-known???
From: Jonathan Bromley <jonathan.bromley@MYCOMPANY.com>
Date: Mon, 09 Feb 2009 10:57:02 +0000
Links: << >> << T >> << A >>

On Mon, 09 Feb 2009 10:14:09 +0000, Jonathan Bromley wrote:

>But there's another idea coming...

which is to time-division mux the two additions.
This degrades the jitter to 2 master clock periods,
but gives what I believe to be the most compact
and fastest possible implementation for a phase
accumulator whose modulus is not a power of 2.
I removed the reset because it's fairly useless.

As with the earlier implementation, this one
can only provide output rates up to Fc/2.

  library ieee;
  use ieee.std_logic_1164.all;
  use ieee.numeric_std.all;

  entity rate_gen is
    generic ( ref_Hz: positive := 50_000_000 );
    port
      ( clock : in  std_logic
      ; rate  : in  unsigned
      ; pulse : out std_logic
      );
  end;

  architecture RTL_2ph of rate_gen is
  begin
    process (clock)
      -- Halve the modulus to account for 2-phase operation
      constant modulus: integer := ref_Hz/2;
      -- This flag controls the adder multiplexing
      variable phase: boolean;
      variable count: integer range -2**rate'length to modulus-1 := 0;
    begin
      if rising_edge(clock) then
        pulse <= '0';
        if phase then
          count := count - to_integer(rate);
        elsif count < 0 then
          count := count + modulus;
          pulse <= '1';
        end if;
        phase := not phase;
      end if;
    end process;

  end;
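
The two-phase behaviour can be checked with a similar cycle-by-cycle
Python model. Again this is only a sketch: the function name is made up,
`rate` is held constant, and `phase` is assumed to start as false (the
default initial value of a VHDL boolean).

```python
def rate_gen_2ph_model(ref_hz: int, rate: int, n_clocks: int) -> list:
    """Return the value assigned to 'pulse' on each of n_clocks clock edges."""
    modulus = ref_hz // 2  # halved to account for 2-phase operation
    phase = False
    count = 0
    pulses = []
    for _ in range(n_clocks):
        pulse = 0
        if phase:
            count -= rate          # phase 1: subtract the rate
        elif count < 0:
            count += modulus       # phase 0: wrap and emit a pulse
            pulse = 1
        phase = not phase
        pulses.append(pulse)
    return pulses


if __name__ == "__main__":
    print(rate_gen_2ph_model(ref_hz=10, rate=2, n_clocks=10))
```

Pulses can only appear on alternate clocks, which is where the two-clock
jitter comes from. For `rate` below `ref_hz / 2` the model still gives
exactly `rate` pulses per `ref_hz` clocks.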

Thanks for all the comments.
-- 
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.bromley@MYCOMPANY.com
http://www.MYCOMPANY.com

The contents of this message may contain personal views which 
are not the views of Doulos Ltd., unless specifically stated.

Article: 138199
Subject: Re: MPEG-1 Layer 3 (Mp3) Encoder and Decoder
From: FredrikH <fredrik.holmsten@gmail.com>
Date: Mon, 9 Feb 2009 03:39:24 -0800 (PST)
Links: << >> << T >> << A >>

Thanks for a very detailed answer. Is this DSP processor of yours
something that you distribute in any way? Is there a license
associated with it? Or is it something for opencores.org!?

In any way I guess it requires good knowledge and a great deal of work
to implement it. It's not a push button flow and that is more or less
what I was looking for when I started searching for Mp3-cores. Maybe a
naive starting point, but this project I'm working on will probably
not give me a day for studying the Mp3 decoding and encoding in
detail.

I don't need to decode or encode more than six channels simultaneously
so even if the encoding requires more processing power/resources it
seems to be an easy match to fit everything in a 30 000 LUT FPGA. I
don't know yet what kind of FPGA I'll be using. Maybe a Spartan-6?

/Fredrik
