
Messages from 96750

Article: 96750
Subject: Re: Microblaze Virtual platform problem
From: "siva.velusamy@gmail.com" <siva.velusamy@gmail.com>
Date: 9 Feb 2006 17:09:38 -0800
Hi,

The error could be because your system includes a model that is not
supported by the virtual platform. The list of supported models should
be available in the documentation. Also, try using the latest version
of EDK.

-Siva
Xilinx, Inc.


Article: 96751
Subject: Re: Async Processors
From: fpga_toys@yahoo.com
Date: 9 Feb 2006 17:12:18 -0800

rickman wrote:
> You have ignored the real issue.  The issue is not whether the async
> design can run faster under typical conditions; we all know it can.
> The issue is how do you make use of that faster speed?  The system
> design has to work in the worst case conditions, so you can only use
> the available performance under worse case conditions.

Again, depends on the application. If it's a packet routing/switching
engine running below wire speed, then it means that the device will
route/switch more packets per second without overrunning when not
worst case.

> You can do the same thing with a clocked design.  Measure the
> temperature and run the clock faster when the temperature is cooler.
> It just is not worth the effort since you can't do anything useful with
> the extra instructions.

But that only allows derating for temp based on a worst-case assessment
of the process data. It doesn't allow for automatic adjustment for
process variation or other device-specific variances.

Sure, you can take every device and characterize it across all the
environmental factors which impact performance, and write a custom
clock table per device ... but get realistic ...

it's all about tradeoffs ... designs and the target implementation
hardware.

> The glitching happens in any design.  Inputs change and create changes
> on the gate outputs which feed other gates, etc until you reach the
> outputs.  But the different paths will have different delays and the
> outputs as well as the signals in the path can jump multiple times
> before they settle.  The micro-glitching you are talking about will
> likely cause little additional glitching relative to what already
> happens.  Of course, YMMV.

Not true, as there have always been ways to design without glitches,
using choices like Gray-coded counters for state machines, one-hot
state machines, and covering the logic states to be glitch-free by
design, which most good engineers will purposefully do when practical
and necessary, as should good tools. It's just a design decision to
ensure that every term is deterministic, without static or dynamic
hazards. Maybe they don't teach this in school any more now that
everyone does VHDL/Verilog.

    http://www.interfacebus.com/Design_Logic_Timing_Hazards.html
    http://findatlantis.com/absint_extended.pdf

In the best async designs, extremely fine grained dual rail, it
shouldn't happen at all with good gate design.
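[For readers unfamiliar with the Gray-code trick mentioned above: a
binary-reflected Gray code changes exactly one bit per count, so logic
decoding the counter state can never race through a transient
intermediate state. A small illustrative Python check, not from the
original post:]

```python
def gray(n):
    """Binary-reflected Gray code: adjacent values differ in exactly one bit."""
    return n ^ (n >> 1)

# A plain binary counter going 7 -> 8 flips four bits (0111 -> 1000), so a
# decoder can momentarily see spurious states while the bits race.  A
# Gray-coded counter never has that problem:
for i in range(255):
    changed_bits = bin(gray(i) ^ gray(i + 1)).count("1")
    assert changed_bits == 1
```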


Article: 96752
Subject: Re: Parallel NCO (DDS) in Spartan3 for clock synthesis - highest possible speed?
From: "PeterC" <peter@geckoaudio.com>
Date: 9 Feb 2006 17:13:33 -0800

John -

I will certainly look at pipelining the accumulators, and this should be
simple since they have simple ripple-carry chains; I'll try 8-bit, then
4-bit granularity if needed.

On your point (2), I'm not sure I understand completely - this would
require MUXing both inputs of a single adder - both the feedback and
the input tuning words, adding an additional MUX delay? Yes, my tuning
range spans about 10kHz around the 50 kHz point, and I would like to do
this with single Hz resolution. If you can send a quick sketch to peter
(at) geckoaudio (dot) com that would be great.

By "reducing the resolution of the non-accumulating adders" I take it
to mean that since N/4 etc will be a relatively small number, it
certainly would not need to sit in a 32-bit register?

The bit serial approach is interesting, but I think the internal fabric
clock limit is around 300 MHz anyway, and an 8-bit or 4-bit pipelined
adder would probably run at close to this anyway (I'm guessing here)?

On the topic of *fun* - how does knowing the ratio of the contents of
the accumulator to the tuning word (N) after it turns over help? Excuse
my ignorance, but I don't see how this is useful.

Cheers,
PeterC.


Article: 96753
Subject: Re: Async Processors
From: Jim Granville <no.spam@designtools.co.nz>
Date: Fri, 10 Feb 2006 14:16:37 +1300
rickman wrote:
>>>Yes, the async processor will run faster when conditions are good, but
>>>what can you do with those extra instruction cycles?
>>
>>Nothing, the point is you save energy, by finishing earlier.
> 
> 
> How did you save energy?  You are thinking of a clocked design where
> the energy is a function of time because the cycles are fixed in
> duration.  In the async design energy is not directly a function of
> time but rather a fuction of the processing.  In this case the
> processing takes the same amount of energy, it just gets done faster.
> Then you wait until the next external trigger that you need to
> synchronize to.  No processing or energy gain, just a longer wait time.

Well, it depends what you take from their published info.

To me, it makes sense, and I look forward to seeing real devices :)

The ultimate proof, is not what someone thinks may, or may not matter,
but how the actual chip performs.

-jg


Article: 96754
Subject: Re: Parallel NCO (DDS) in Spartan3 for clock synthesis - highest possible speed?
From: "PeterC" <peter@geckoaudio.com>
Date: 9 Feb 2006 17:18:39 -0800
Jim,

I can tolerate a 1 Hz step (I need real-time tuning with at least this
resolution, as well as a small number of "coarse" steps of about 5kHz).
Apologies for not posting this initially to eliminate this as a
candidate, I have thought about the simple integer division - but my
range and tuning require DDS. As much as I'd like to, I can't use a PLL
due to cost!

Cheers,
Peter.


Article: 96755
Subject: Re: Parallel NCO (DDS) in Spartan3 for clock synthesis - highest
From: Jim Granville <no.spam@designtools.co.nz>
Date: Fri, 10 Feb 2006 15:15:25 +1300
PeterC wrote:
> Jim,
> 
> I can tolerate a 1 Hz step (I need real-time tuning with at least this
> resolution, as well as a small number of "coarse" steps of about 5kHz).
> Apologies for not posting this initially to eliminate this as a
> candidate, I have thought about the simple integer division - but my
> range and tuning require DDS. As much as I'd like to, I can't use a PLL
> due to cost!

  The DPLL I meant was the Clock module inside the FPGA, not an 
external one.

  A simple divider from ~200 MHz gives better than 1 Hz dF below 14 kHz 
Fo.  Could that be good enough? [It will have very low jitter]

-jg


Article: 96756
Subject: Re: Parallel NCO (DDS) in Spartan3 for clock synthesis - highest possible speed?
From: "Peter Alfke" <alfke@sbcglobal.net>
Date: 9 Feb 2006 19:30:34 -0800
Peter, you first of all have to decide on frequency resolution and
acceptable jitter or phase noise.
Resolution is easy with DDS, just make the accumulator long enough.
Jitter is fundamentally determined by the clock frequency. I would try
200 MHz or (in Virtex-4) 400+ MHz. That gets you to a few ns. If you
need better, you can struggle with a factor of 2, but anything below 2
ns is tough, and below 1 ns is impossible, unless you use MGTs.
There you can get 300 ps granularity from the 3 Gbps outputs, which
sounds like 150 ps jitter. It takes some trickery and some duplication
of resources, so it's not all that cheap. And Spartan and its friends
do not have 3 Gbps transmitters...
Find out first what you really need. Jitter is your enemy. And fighting
it is never cheap.
Peter Alfke, Xilinx (from home)
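[To put numbers on Peter's point, here is a quick sketch with assumed
values (a 32-bit accumulator at 200 MHz; neither figure is from the
post): resolution comes almost for free from accumulator length, while
jitter stays pinned at one clock period.]

```python
f_clk = 200e6        # assumed 200 MHz DDS clock
acc_bits = 32        # accumulator length sets resolution, not jitter

resolution_hz = f_clk / 2**acc_bits   # ~0.047 Hz per tuning-word LSB
jitter_s = 1 / f_clk                  # output edges land on a 5 ns grid

# Tuning word for a 50 kHz output, and the frequency actually produced:
tuning_word = round(50e3 * 2**acc_bits / f_clk)
f_actual = tuning_word * f_clk / 2**acc_bits
```

Lengthening the accumulator improves resolution_hz essentially without
limit, but jitter_s only improves with a faster clock, which is exactly
why jitter is the hard (and expensive) part.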


Article: 96757
Subject: ModelSim # Error loading design
From: "mBird" <no@email.com>
Date: Thu, 9 Feb 2006 23:24:51 -0500
I just downloaded Xilinx ISE 8.1 and ModelSim XE III/Starter 6.0d
I make a simple project using schematic (one AND gate) and then make a test 
bench waveform. I then do Simulate Behavioral Model, but no matter what I do 
I always get "# Error loading design" with no other indication of errors. In 
the previous version of ISE and ModelSim it all worked, so I am not sure what 
the error is.
Any help greatly appreciated!

The results from ModelSim:
# Reading C:/Modeltech_xe_starter/tcl/vsim/pref.tcl
# do m.fdo
# ** Warning: (vlib-34) Library already exists at "work".
# Model Technology ModelSim XE III vlog 6.0d Compiler 2005.04 Apr 26 2005
# -- Compiling module FD_MXILINX_matt_sch
# -- Compiling module matt_sch
#
# Top level modules:
#  matt_sch
# Model Technology ModelSim XE III vcom 6.0d Compiler 2005.04 Apr 26 2005
# -- Loading package standard
# -- Loading package textio
# -- Loading package std_logic_1164
# -- Loading package std_logic_textio
# -- Loading package std_logic_arith
# -- Loading package std_logic_unsigned
# -- Compiling entity m
# -- Compiling architecture testbench_arch of m
# Model Technology ModelSim XE III vlog 6.0d Compiler 2005.04 Apr 26 2005
# -- Compiling module glbl
#
# Top level modules:
#  glbl
# vsim -L cpld_ver -L uni9000_ver -lib work -t 1ps m glbl
# Loading C:\Modeltech_xe_starter\win32xoem/../std.standard
# Loading C:\Modeltech_xe_starter\win32xoem/../std.textio(body)
# Loading C:\Modeltech_xe_starter\win32xoem/../ieee.std_logic_1164(body)
# Loading C:\Modeltech_xe_starter\win32xoem/../ieee.std_logic_textio(body)
# Loading C:\Modeltech_xe_starter\win32xoem/../ieee.std_logic_arith(body)
# Loading C:\Modeltech_xe_starter\win32xoem/../ieee.std_logic_unsigned(body)
# Loading work.m(testbench_arch)
# XE version supports only a single HDL
# Error loading design
# Error: Error loading design
#        Pausing macro execution
# MACRO ./m.fdo PAUSED at line 8





Article: 96758
Subject: Simulation of MicroBlaze embedded system
From: "motty" <mottoblatto@yahoo.com>
Date: 9 Feb 2006 20:29:26 -0800
Is there a guide to behaviorally simulating a MB embedded system?  I
have gotten as far as compiling the libraries, running simgen,
instantiating the system in a top_level, and writing a testfixture for
that top level.  It all loads correctly in NC-Sim, and I am able to
look at all signals within the system.  I am able to validate that the
dcm is working as it should.  It gets the input clock from the
testfixture and outputs the multiplied/divided frequency that it
should.

Our system has a Xilinx GPIO connected to a simple USB chip (FTDI).
When a signal from the USB chip goes low, the microblaze code starts
the process of reading the data from the chip.  I thought I had that
working even though I was blindly writing timing in the testfixture
without knowledge of where the MicroBlaze processor was or when things
should occur.  It really isn't working...in the sim at least.

Is there a way to know what code the MicroBlaze is running at a
particular time?  There are MANY signals available to look at in
regards to MB.  There doesn't seem to be any documentation explaining
how to do this.  It all kind of leaves off after getting everything
compiled.

When do I know that the MicroBlaze has finished all the initialization
code that is written?  Will I have to actually analyze the number of
cycles used and insert that timing in the testfixture?  It seems like
there should be an easier way to do this.

Eventually, I would like to get to the point of running timing sims to
see where our problems lie.  We get weird results after different
builds when we change a signal or two on a debug port we are pulling
out from our modules.  Sometimes things totally hose up!

Thanks.


Article: 96759
Subject: Re: Async Processors
From: "rickman" <spamgoeshere4@yahoo.com>
Date: 9 Feb 2006 20:43:41 -0800
fpga_t...@yahoo.com wrote:
> rickman wrote:
> > You have ignored the real issue.  The issue is not whether the async
> > design can run faster under typical conditions; we all know it can.
> > The issue is how do you make use of that faster speed?  The system
> > design has to work in the worst case conditions, so you can only use
> > the available performance under worse case conditions.
>
> Again, depends on the application. If it's a packet routing/switching
> engine
> running below wire speed, then it means that the device will
> route/switch
> more packets per second without overrunning when not worst case.

Do they design the equipment to drop packets when it gets hot or when
the PSU is near the low margin or when the chip was just not the
fastest of the lot?  That is my point.  You design equipment to a level
of acceptable performance.  The equipment spec does not typically
derate for uncontrolled variables such as temperature, voltage and
process.

If your VOIP started dropping packets so that your phone calls were
garbled and the provider said, "of course, we had a hot day, do you
expect to see the same speeds all the time?", would you find that
acceptable?


> > You can do the same thing with a clocked design.  Measure the
> > temperature and run the clock faster when the temperature is cooler.
> > It just is not worth the effort since you can't do anything useful with
> > the extra instructions.
>
> But that only allows derating for temp based on worst case assessment
> of the process data. It doesn't allow for automatic adjustments for
> process
> variation or other device specific variances.

Actually they do.  That is called speed grading and nearly all makers
of expensive CPU chips do it.  You could do that with any chip if you
wanted to.  But there is no point unless you want to run your Palm 10%
faster because it is a toy to you.


> Sure you can take every device and characterize it across all the
> environmental factors which impact performance, and write a custom
> clock table per device ... but get realistic ...
>
> it's all about tradeoff's ... designs and the target implementation
> hardware.

Yes, I agree, you have to be realistic.  There is no significant
advantage to having a processor run at different speeds based on
uncontrolled variables.


> > The glitching happens in any design.  Inputs change and create changes
> > on the gate outputs which feed other gates, etc until you reach the
> > outputs.  But the different paths will have different delays and the
> > outputs as well as the signals in the path can jump multiple times
> > before they settle.  The micro-glitching you are talking about will
> > likely cause little additional glitching relative to what already
> > happens.  Of course, YMMV.
>
> Not true, as there have always been ways to design without glitches
> using choices like gray coded counters for state machines, one hot
> state machines, and covering the logic states to be glitch free by
> design, which most good engineers will purposefully do when practical
> and necessary, as should good tools. It's just a design decision to
> ensure that every term is deterministic without static or dynamic
> hazards. Maybe they don't teach this in school any more now that
> everyone does vhdl/verilog.

You are talking about stuff that no one uses because there is very
little advantage and it does not outweigh the cost.  My point is that
none of this is justified at this time.

Besides, for the most part, the things you mention do not prevent the
glitches.  One hot state machines use random logic with many input
variables.  Only one FF is a 1 at a time, but that means two FFs change
state at each transition and potentially feed into the logic for many
state FFs.  This means each rank of logic can have transitions from the
two FFs that change and potentially significant power is used even in
the ranks that do not change their output.


>     http://www.interfacebus.com/Design_Logic_Timing_Hazards.html
>     http://findatlantis.com/absint_extended.pdf

I have never heard anyone suggest that you should design to avoid the
intermediate transients of logic.  Of course you can, but there are
very few designs indeed that need to be concerned about the last few %
of power consumption this would save.


> In the best async designs, extremely fine grained dual rail, it
> shouldn't happen at all with good gate design.

Great, you have identified an advantage of async designs.  They can be
done with extremely fine grained dual rail logic that can avoid
transients in intermediate logic.  But then you can do that in sync
designs if you decide you want to, right?


Article: 96760
Subject: Re: Software reset for the MicroBlaze
From: "Simon Peacock" <simon$actrix.co.nz>
Date: Fri, 10 Feb 2006 17:44:33 +1300
In windoze land .. it's called the power switch.. :-)


"John Williams" <jwilliams@itee.uq.edu.au> wrote in message
news:43EBC87B.5020301@itee.uq.edu.au...
> Simon Peacock wrote:
>
> > That would be a hardware reset .. not software :-).... but it depends on
> > what you call a hard reset
>
> OK, I'll give you that :) my reading of the question was "how do I
> initiate a reset from within software".
>
> In Linux land we call it "shutdown -r now"
>
> John



Article: 96761
Subject: Re: Async Processors
From: "Simon Peacock" <simon$actrix.co.nz>
Date: Fri, 10 Feb 2006 17:53:57 +1300

"Jim Granville" <no.spam@designtools.co.nz> wrote in message
news:43ebe95e$1@clear.net.nz...
> rickman wrote:
> >>>Yes, the async processor will run faster when conditions are good, but
> >>>what can you do with those extra instruction cycles?
> >>
> >>Nothing, the point is you save energy, by finishing earlier.
> >
> >
> > How did you save energy?  You are thinking of a clocked design where
> > the energy is a function of time because the cycles are fixed in
> > duration.  In the async design energy is not directly a function of
> > time but rather a fuction of the processing.  In this case the
> > processing takes the same amount of energy, it just gets done faster.
> > Then you wait until the next external trigger that you need to
> > synchronize to.  No processing or energy gain, just a longer wait time.
>
The gain is very simple.. every time a sync circuit clocks.. a zillion
transistors switch.. less if the circuit is partitioned correctly.  But by
definition, in an async circuit, only the one path actually does something.
The clock distribution in an advanced processor is a significant proportion
of the overall clock budget.  There is also the capacitive effect.. charge
and discharge.. if the current never falls to zero, then the minimum power
is slightly above zero also.  Static CMOS FETs, of course, draw no static
current.

Simon



Article: 96762
Subject: Spartan3 embedded synchronous multipliers
From: "Isaac Bosompem" <x86asm@gmail.com>
Date: 9 Feb 2006 21:16:59 -0800
Hi guys, I've been reading through the Spartan3 architecture embedded
multipliers app note and I can't seem to find out how long (in terms of
clock cycles) the sync multipliers in the Spartan3 will take. Can I
safely assume that after I have asserted the inputs to the module, I
will get the output back in the following clock cycle? 

Thanks,

Isaac


Article: 96763
Subject: Re: Parallel NCO (DDS) in Spartan3 for clock synthesis - highest possible speed?
From: "PeterC" <peter@geckoaudio.com>
Date: 9 Feb 2006 21:24:28 -0800
Jim - the numbers you have chosen are of course correct, but I'm
missing the point -

14,000.7000 Hz = 200 MHz / 14285. The next divisor is 14286, which gives
13999.7200 Hz, so yes, 0.7 Hz control is possible for a 14 kHz output
frequency. But sub-1 Hz adjustment is not possible at Fo = 15 kHz, for
example.

I do need the same degree of control around 50 kHz (ideally even better
than 1 Hz, down to as low as 0.1 Hz), so I don't think a simple integer
division is acceptable.
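[PeterC's arithmetic generalizes: the step between adjacent integer
divisors grows roughly as Fo^2/Fclk, so a plain divider that is
sub-hertz at 14 kHz falls well short at 50 kHz. A small Python check,
assuming the 200 MHz clock discussed in the thread:]

```python
f_clk = 200e6   # assumed 200 MHz master clock from the thread

def divider_step_hz(f_out):
    """Frequency step between adjacent integer divisors near f_out."""
    n = round(f_clk / f_out)
    return f_clk / n - f_clk / (n + 1)

# Near 14 kHz the step is just under 1 Hz, but near 50 kHz it is ~12.5 Hz
# (roughly f_out**2 / f_clk), so 1 Hz -- let alone 0.1 Hz -- control of a
# 50 kHz output really does need a DDS rather than a simple divider.
```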

As far as Peter's comments - I simply don't know exactly what the
jitter spec and freq resolution should be - it all depends on other
parts of the system which are being simultaneously designed. It comes
down to a certain amount of experimentation to see how the audio DAC
output spectrum will behave with jittery clocks.


Article: 96764
Subject: Re: Parallel NCO (DDS) in Spartan3 for clock synthesis - highest
From: John_H <johnhandwork@mail.com>
Date: Fri, 10 Feb 2006 06:11:53 GMT
PeterC wrote:

> John -
> 
> Pipelining the accumulators I will certainly look at and this should be
> simple, since they have simple ripple-carry carry chains, will try 8
> then 4-bit granularity if needed.
> 
> On your point (2), I'm not sure I understand completely - this would
> require MUXing both inputs of a single adder - both the feedback and
> the input tuning words, adding an additional MUX delay? Yes, my tuning
> range spans about 10kHz around the 50 kHz point, and I would like to do
> this with single Hz resolution. If you can send a quick sketch to peter
> (at) geckoaudio (dot) com that would be great.

The accumulator feeds four values but only one of those values (Acc+N) 
feeds back to the accumulator.  The other three values (Acc+N/4, 
Acc+N/2, Acc+3N/4) are simple adders where all you care about are the 
MSbits.  Are you a Verilog guy?  It would be much easier to send you a 
few lines of code rather than a sketch.

> By "reducing the resolution of the non-accumulating adders" I take it
> to mean that since N/4 etc will be a relatively small number, it
> certainly would not need to sit in a 32-bit register?

It's not about the size of the number; it's that you don't need 32 bits 
of precision to decide whether the edge should go at the quarter-period 
point or at the half-period.  If your jitter is already at about 1/4 
cycle, your adders don't have to be much more precise than that to give 
you 4 MSBits for 4 clock phases with no (noticeable) additive jitter.

> The bit serial approach is interesting, but I think the internal fabric
> clock limit is around 300 MHz anyway, and an 8-bit or 4-bit pipelined
> adder would probably run at close to this anyway (I'm guessing here)?
> 
> On the topic of *fun* - how does knowing the ratio of the contents of
> the accumulator to the tuning word (N) after it turns over? Excuse my
> ignorance, but I don't see how this is useful.
> 
> Cheers,
> PeterC.

If you're running a phase accumulator at a master clock rate, then at 
the update of that clock where the MSBit changes, the LSBits will be in 
the range 0<=Acc<N, where N is the phase value you're adding every 
cycle.  How long ago the "virtual" edge (zero crossing) occurred is the 
fraction Acc/N.  If the Acc is zero, the "virtual" edge is at the clock 
that generated that value.  If the Acc is nearly N, that edge was nearly 
a full master clock period ago.  If the Acc is N/2, the transition would 
ideally have happened in the middle of that clock period.  The trouble 
with a "simple" implementation is that division isn't so simple.
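[The Acc/N observation can be checked numerically. A behavioral sketch
with made-up values (16-bit accumulator, arbitrary tuning word; it
tracks full accumulator wraps, i.e. one edge per output period): every
wrap leaves a residue Acc in [0, N), and subtracting Acc/N from the
wrap cycle recovers perfectly even "virtual" edge spacing.]

```python
acc_bits = 16
N = 5000                     # arbitrary tuning word, for illustration only
mod = 1 << acc_bits

acc = 0
virtual_edges = []
for cycle in range(1, 500):
    acc = (acc + N) % mod
    if acc < N:              # accumulator wrapped: an output edge this cycle
        # acc/N is the fraction of a clock period since the ideal edge
        virtual_edges.append(cycle - acc / N)

# The wraps land on integer clock boundaries, yet the recovered virtual
# edges are exactly 2**acc_bits / N clocks apart:
spacings = [b - a for a, b in zip(virtual_edges, virtual_edges[1:])]
```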


It's my belief that the sub-50kHz signals won't feel the difference in 
10 ns of jitter but your experiments could surprise me.  As far as the 
12 MHz and 24 MHz signals go, look at what the DCM can offer you for 
fixed values.  I would hope these wouldn't need to be tuned like the low 
frequencies.

Please consider that adding dither to spread out your sidebands may have 
detrimental effects.  The dither (especially multi-bit dither) will add 
energy to the overall sideband spectrum.  While audio can deal with a 
higher noise floor, it can't accommodate audio spurs, so the 
investigations I've done on telecom jitter issues may not give me as 
much insight into your issues.

One more thing to consider if you like the idea of a sped-up output: 
Rather than an NCO where you'd have to have multiple adders to get 
multiple output phases, or a divider that doesn't give you the 
resolution you want, consider an N/N+1 divider where the choice of N or 
N+1 comes from a smaller NCO.

If you want to use a 24MHz clock for the divider and NCO (as just one 
extreme example) your divider would be 480 or more for 50kHz or less. 
For 49.999kHz, you'd want a divider of 480.009600.  Use a divider 
configured for divide by 480 or 481 and a 20-bit NCO (could be fewer, but 
this gives outrageous 1ppm accuracy) with the decision of which divide to 
use controlled by the rollover of the NCO; when the fraction is large 
(as in 480.96) the large phase value rolls over most cycles, but in the 
49.999kHz case rarely rolls over.  When the divider goes to zero, the 
little NCO increments the fractional value.  The fractional value will 
give you a binary fraction *of the clock period*.  If you use the 4 
MSBits of the NCO, you'll have a range of 0-15 to select the clock phase 
for the output clock to feed a 16x speed edge generator.  (I've ignored 
the fact that the divider approach needs a 2x output clock to provide 
both edges of the output clock but the approach still applies)
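[A behavioral Python model of this N/N+1 scheme, using the 24 MHz /
49.999 kHz numbers from the post (the 20-bit NCO width and the loop
length are illustrative choices): the rollover of the small NCO picks
divide-by-481 just often enough that the average output frequency
lands on target.]

```python
f_clk = 24e6
f_target = 49.999e3
nco_bits = 20

div_n = int(f_clk // f_target)          # integer part of the divider: 480
frac = f_clk / f_target - div_n         # fractional part: ~0.0096
inc = round(frac * 2**nco_bits)         # small-NCO increment encoding frac

nco = 0
total_clocks = 0
periods = 20000
for _ in range(periods):
    prev = nco
    nco = (nco + inc) % 2**nco_bits
    # Divide by 481 on the (rare) cycles where the small NCO rolls over,
    # by 480 otherwise; the rollover rate supplies the .0096 on average.
    total_clocks += div_n + (1 if nco < prev else 0)

f_avg = f_clk * periods / total_clocks  # long-term average output frequency
```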

Ain't frequency control fun?

Article: 96765
Subject: Re: Async Processors
From: "rickman" <spamgoeshere4@yahoo.com>
Date: 9 Feb 2006 22:30:22 -0800
Simon Peacock wrote:
> "Jim Granville" <no.spam@designtools.co.nz> wrote in message
> news:43ebe95e$1@clear.net.nz...
> > rickman wrote:
> > >>>Yes, the async processor will run faster when conditions are good, but
> > >>>what can you do with those extra instruction cycles?
> > >>
> > >>Nothing, the point is you save energy, by finishing earlier.
> > >
> > >
> > > How did you save energy?  You are thinking of a clocked design where
> > > the energy is a function of time because the cycles are fixed in
> > > duration.  In the async design energy is not directly a function of
> > > time but rather a fuction of the processing.  In this case the
> > > processing takes the same amount of energy, it just gets done faster.
> > > Then you wait until the next external trigger that you need to
> > > synchronize to.  No processing or energy gain, just a longer wait time.
> >
> The gain is very simple.. every time a sync circuit clocks.. a zillion
> transistors switch.. less if the circuit is partitioned correctly.  But by
> definition, in an async circuit, only the one path actually does something.
> The clock distribution in an advanced processor is a significant proportion
> of the overall clock budget.  There is also the capacitive effect.. charge
> and discharge.. if the current never falls to zero, then the minimum power
> is slightly above zero also.  Static cmos fets, of course, draw no current.

You are talking about a circuit with NO clock gating.  A sync clocked
circuit can have the clocks dynamically switched on and off to save
power.  I already posted an example of a high end CPU which is doing
that and getting a 3x power savings, just like the async chip compared
to the sync chip listed earlier in the thread.

Actually power dissipation is more complex than just saying "this path
is not used".  If the inputs to a block of logic change, it does not
matter if the outputs are used or not.  The inputs must be held
constant or the combinatorial logic switches drawing current.

Clock distribution is not a significant issue in the timing budget.
The delays in the clock tree are balanced.  If you need to minimize the
effective delay to match IO clocking the delay can be compensated by a
DLL.  The clock tree does use some power, but so does the handshaking
used in async logic.

Everyone seems to minimize their analysis of the situation rather than
to think it through.  This reminds me a bit of the way fuzzy logic was
claimed to be such an advance, but when you looked at it hard you would
find little or no real advantage.  Do you see many fuzzy logic projects
around anymore?


Article: 96766
Subject: Re: Spartan3 embedded synchronous multipliers
From: austin <austin@xilinx.com>
Date: Thu, 09 Feb 2006 22:52:35 -0800
Yes,

Austin

Isaac Bosompem wrote:

> Hi guys, I've been reading through the Spartan3 architecture embedded
> multipliers app note and I can't seem to find out how long (in terms of
> clock cycles) the sync multipliers in the Spartan3 will take. Can I
> safely assume that after I have asserted the inputs to the module, I
> will get the output back in the following clock cycle? 
> 
> Thanks,
> 
> Isaac
> 

Article: 96767
Subject: Re: Async Processors
From: fpga_toys@yahoo.com
Date: 9 Feb 2006 23:05:23 -0800

rickman wrote:
> > Again, depends on the application. If it's a packet routing/switching
> > engine
> > running below wire speed, then it means that the device will
> > route/switch
> > more packets per second without overrunning when not worst case.
>
> Do they design the equipment to drop packets when it gets hot or when
> the PSU is near the low margin or when the chip was just not the
> fastest of the lot?  That is my point.  You design equipment to a level
> of acceptable performance.  The equipment spec does not typically
> derate for uncontrolled variables such as temperature, voltage and
> process.

Yes, nearly every communications device with a fast pipe will discard
packets when overrun. Cisco routers of all sizes, DSL modems, wireless
radios ... it's just an everyday fact of life.

Faster CPUs cost more money ... if you don't want a Cisco router that
drops packets at higher loads, spend more money. The primary difference
between whole families is simply processor speed.

> If your VOIP started dropping packets so that your phone calls were
> garbled and the provider said, "of course, we had a hot day, do you
> expect to see the same speeds all the time?", would you find that
> acceptable?

IT HAPPENS!!!  Reality Check ... IT HAPPENS EVERY DAY.

>
> > > You can do the same thing with a clocked design.  Measure the
> > > temperature and run the clock faster when the temperature is cooler.
> > > It just is not worth the effort since you can't do anything useful with
> > > the extra instructions.
> >
> > But that only allows derating for temp based on worst case assessment
> > of the process data. It doesn't allow for automatic adjustments for
> > process
> > variation or other device specific variances.
>
> Actually they do.  That is called speed grading and nearly all makers
> of expensive CPU chips do it.  You could do that with any chip if you
> wanted to.  But there is no point unless you want to run your Palm 10%
> faster because it is a toy to you.

No, that is a completely different issue ... not a dynamic fit of
processing power to the current on-chip delays.

> Yes, I agree, you have to be realistic.  There is no significant
> advantage to having a processor run at different speeds based on
> uncontrolled variables.

No ... wrong.

Sorry ... that is true only in your mind for your designs. It does not
apply broadly to all designs for all real world problems. Real engineers
do this because it really does matter for THEIR designs.

So, I've already clearly stated one everyday application where the
customer benefits by having routers drop packets only when the hardware
isn't capable of going faster, rather than derating the whole design to
reduced worst-case performance levels.

> > Not true, as there have always been ways to design without glitches
> > using choices like gray coded counters for state machines, one hot
> > state machines, and covering the logic states to be glitch free by
> > design, which most good engineers will purposefully do when practical
> > and necessary, as should good tools. It's just a design decision to
> > ensure that every term is deterministic without static or dynamic
> > hazards. Maybe they don't teach this in school any more now that
> > everyone does vhdl/verilog.
>
> You are talking about stuff that no one uses because there is very
> little advantage and it does not outweigh the cost.  My point is that
> none of this is justified at this time.

"no one uses because" ... sorry, but clearly you haven't been keeping
up with your reading and professional skills training as you certainly
don't know everyone.

You really need to read comp.arch.fpga a lot more to get a better
grounding in what people actually do these days. For starters, read
from the end of page 3 about data path reordering and glitch power
reduction:

  http://www.sequencedesign.com/downloads/90300-368.pdf

Get the point ... people concerned about low power do actually design
to remove glitching ... by serious engineering design. Keep reading
about what "no one uses" to get a real understanding of this "nobody"
engineering for power, in section 5, Architecture Optimization:

      http://www.mrtc.mdh.se/publications/0914.pdf

Note the lead-in to the topic ... glitches can consume a large amount
of power. Now clearly some engineers have never had to worry about
battery life or waste heat from excess power. But for the real
power-sensitive engineers, the truth is that nobody can ignore these
factors.

The reality is that the faster the logic gets, the more you have to
worry about these timing mismatch effects. Three generations back, a
1ns glitch was absorbed into the switching times. At 90nm, glitches as
short as a few dozen ps will cause two unwanted transitions and power
loss. The whole problem with glitches is this extra double state flip,
when there should have been zero, that robs power ... and that is
amplified by all the logic behind the glitch also flipping once or
twice as well ... greatly amplifying the cost of the initial failure to
avoid glitches by design. At 90nm there are a whole lot more sources of
glitches that require attention to design details that didn't even
matter two or three generations back. So while you may think that no
one actually attempts glitch-free design practices, by using formal
logic tools to stop them dead, you clearly do not know everyone well
enough to make that statement so firmly.

If you still think that no one decides to design formally correct
glitch-free circuits, keep reading what leading engineers from Actel
and Xilinx say:


http://klabs.org/mapld04/tutorials/vhdl/presentations/low_power_techniques.ppt
    http://www.ece.queensu.ca/hpages/faculty/shang/papers/fpga02.pdf

Note the end of section 5.2, where it discusses how the power consumed
in several of the design's sections due to glitches was 9-18%. Note
also that aggressively pipelining with the additional "free" registers
in FPGAs is a clear win. Other ASIC studies by Shen on CMOS
combinatorial logic have stated that as much as 20-70% of a device's
power can be consumed by glitches, which is a strong reminder to use
the FPGA registers and pipeline wisely.
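To put rough numbers on why extra transitions matter: dynamic power in CMOS scales as P = alpha * C * V^2 * f, and glitches simply inflate the activity factor alpha. The following sketch is purely illustrative; the capacitance, voltage, frequency, and 50% glitch activity figures are invented for the example and are not taken from any of the cited papers.

```python
# Illustrative dynamic-power estimate: P = alpha * C * V^2 * f.
# All numeric values below are assumptions for illustration only.

def dynamic_power(alpha, c_farads, v_volts, f_hz):
    """Switching power of a CMOS node: activity * capacitance * V^2 * f."""
    return alpha * c_farads * v_volts ** 2 * f_hz

C = 100e-15      # 100 fF of switched capacitance (assumed)
V = 1.2          # core voltage in volts (assumed)
F = 100e6        # 100 MHz clock (assumed)

clean = dynamic_power(0.25, C, V, F)          # only the useful transitions
glitchy = dynamic_power(0.25 * 1.5, C, V, F)  # 50% extra transitions from glitches

overhead = (glitchy - clean) / clean
print(f"glitch overhead: {overhead:.0%}")     # → glitch overhead: 50%
```

The point of the arithmetic: power overhead tracks the extra transition count directly, so a design whose nodes glitch once per useful transition pays roughly double the dynamic power on those nodes.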

So, from my perspective, "no one" concerned about power can possibly be
doing their job if they are unaware of glitching power issues ... a
stark contrast to your enlightening broad statements to the contrary.

> >     http://www.interfacebus.com/Design_Logic_Timing_Hazards.html
> >     http://findatlantis.com/absint_extended.pdf
>
> I have never heard anyone suggest that you should design to avoid the
> intermediate transients of logic.  Of course you can, but there are
> very few designs indeed that need to be concerned about the last few %
> of power consumption this would save.

I think you have now, and it's a lot more than a few percent for some
designs.

> Great, you have identified an advantage of async designs.  They can be
> done with extremely fine grained dual rail logic that can avoid
> transients in intermediate logic.  But then you can do that in sync
> designs if you decide you want to, right?

Yep ... with worst-case-limited performance too.


Article: 96768
Subject: Re: Parallel NCO (DDS) in Spartan3 for clock synthesis - highest possible speed?
From: "Daniel Lang" <invalid@invalid.caltech.edu>
Date: Thu, 9 Feb 2006 23:09:04 -0800
Links: << >>  << T >>  << A >>
"PeterC" <peter@geckoaudio.com> wrote in message 
news:1139532804.420372.21440@f14g2000cwb.googlegroups.com...
>
> Yes, I need the MSB out of the FPGA, to drive an audio DAC. It's value
> Peter C.
>

Would it be possible to use the FPGA to interpolate the values going to the 
DAC to compensate for the clock jitter?  Use the previous and current 
uncorrected DAC values and the phase error for the current clock pulse to 
estimate the corrected DAC value.  This would ease your clock jitter 
requirements greatly.
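That interpolation idea can be sketched as a first-order linear correction. The function name and the convention that the phase error is expressed as a fraction of the nominal sample period are my own assumptions, not anything from PeterC's design.

```python
def corrected_sample(prev, cur, phase_error_frac):
    """First-order correction for a jittered DAC clock edge (illustrative).

    prev, cur:        previous and current uncorrected DAC values
    phase_error_frac: clock-edge error as a fraction of the nominal
                      sample period (positive = edge arrives late)

    Extrapolates along the slope between the two samples so the value
    presented at the jittered edge approximates the ideal waveform.
    """
    slope = cur - prev                 # signal change per nominal period
    return cur + slope * phase_error_frac

# Edge arrives 10% of a period late on a ramp rising 8 codes per sample:
print(f"{corrected_sample(100, 108, 0.10):.1f}")   # → 108.8
```

In hardware this would be one subtract, one multiply by the measured phase error, and one add per sample, which is cheap in an FPGA with hardware multipliers.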

Daniel Lang



Article: 96769
Subject: Re: Simulation of MicroBlaze embedded system
From: Paul Hartke <phartke@Stanford.EDU>
Date: Thu, 09 Feb 2006 23:25:00 -0800
Links: << >>  << T >>  << A >>
The XUPV2P Using Base System Builder Quickstart
(http://www.xilinx.com/univ/xupv2p.html) has an example modelsim do file
that displays the Microblaze program counter and MSR.  

Use mb-objdump to associate the program counter values with actual
assembly instructions from the source code.

Paul

motty wrote:
> 
> Is there a guide to behaviorally simulating a MB embedded system?  I
> have gotten as far as compiling the libraries, running simgen,
> instantiating the system in a top_level, and writing a testfixture for
> that top level.  It all loads correctly in NC-Sim, and I am able to
> look at all signals within the system.  I am able to validate that the
> dcm is working as it should.  It gets the input clock from the
> testfixture and outputs the multiplied/divided frequency that it
> should.
> 
> Our system has a Xilinx GPIO connected to a simple USB chip (FTDI).
> When a signal from the USB chip goes low, the microblaze code starts
> the process of reading the data from the chip.  I thought I had that
> working even though I was blindly writing timing in the testfixture
> without knowledge of where the MicroBlaze processor was or when things
> should occur.  It really isn't working...in the sim at least.
> 
> Is there a way to know what code the MicroBlaze is running at a
> particular time?  There are MANY signals available to look at in
> regards to MB.  There doesn't seem to be any documentation explaining
> how to do this.  It all kind of leaves off after getting everything
> compiled.
> 
> When do I know that the MicroBlaze has finished all the initialization
> code that is written?  Will I have to actually analyze the number of
> cycles used and insert that timing in the testfixture?  It seems like
> there should be an easier way to do this.
> 
> Eventually, I would like to get to the point of running timing sims to
> see where our problems lie.  We get weird results after different
> builds when we change a signal or two on a debug port we are pulling
> out from our modules.  Sometimes things totally hose up!
> 
> Thanks.

Article: 96770
Subject: Re: EDK - PLB/OPB Bus questions.
From: Zara <yozara@terra.es>
Date: Fri, 10 Feb 2006 08:47:18 +0100
Links: << >>  << T >>  << A >>
On 9 Feb 2006 04:51:05 -0800, me_2003@walla.co.il wrote:

>Hi all,
>I have some questions regarding the PLB and OPB busses used in EDK.
>
>1) Is it true that a bus arbiter is needed only if there are 2 masters
>and up..(is there another scenario for using an arbiter) ?

Yes. But normally there will be an arbiter, as there are two masters:
microblaze (instruction) and microblaze (data). I do remember that
some time ago there was an option to avoid either of the two OPB
connections, but at least in EDK 7.1 I cannot find it. (Will search a
little, BTW)

>
>2) Do I need to instantiate a bus arbiter manually or does the tool
>does it for me ?

The tool does it, in the latest versions.
>
>3) PLB is said to be able to perform read/write in the same cycle - how
>it is accomplished (I saw only one master address bus) ?

It is not exactly so. You may issue the read request and read address
in one cycle, then read the data while issuing the write request,
write address, and write data in the following one. It is simply that
you may start a write while the read is terminating.
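The overlap Zara describes can be pictured as separate address and data phases sharing the bus. The timeline below is a rough illustration of that idea only, not a cycle-accurate model of the PLB protocol:

```python
# Illustrative timeline of address/data phase overlap on a split-phase bus.
# Purely a sketch of the idea; not a cycle-accurate PLB model.
timeline = [
    # (cycle, address-phase activity,        data-phase activity)
    (1, "read request + read address",    "-"),
    (2, "write request + write address",  "read data returned"),
    (3, "-",                              "write data accepted"),
]
for cycle, addr, data in timeline:
    print(f"cycle {cycle}: addr phase: {addr:30s} data phase: {data}")
```

Cycle 2 is the key one: the write's address phase proceeds while the read's data phase completes, which is what "write while terminating the read" means.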
>
>4) What does a bus-split / decoupled address terms stand for ?

I don't know. Or at least, I don't recognize the concept under those
terms. But I am not a native English speaker; it may just be a limited
knowledge of the language.
>

Article: 96771
Subject: Re: ModelSim # Error loading design
From: "Hans" <hans64@ht-lab.com>
Date: Fri, 10 Feb 2006 08:25:46 GMT
Links: << >>  << T >>  << A >>
Looks like you are using both the vlog (Verilog) and vcom (VHDL)
compilers; check that you have a dual-language license.

Hans.
www.ht-lab.com

"mBird" <no@email.com> wrote in message 
news:11uo5ck9o574tfa@corp.supernews.com...
>I just downloaded Xilinx ISE 8.1 and ModelSim XE III/Starter 6.0d
> I make a simple project, using schematic (one AND gate) and then make a 
> test bench waveform. I then do Simulate Behavioral Model but no matter 
> what I do I always get # Error loading design with no other indication of 
> errors. In the previous version of ISE and ModelSim it all worked so I am 
> not sure what the error is.
> Any help greatly appreciated!
>
> The results of from ModelSim:
> # Reading C:/Modeltech_xe_starter/tcl/vsim/pref.tcl
> # do m.fdo
> # ** Warning: (vlib-34) Library already exists at "work".
> # Model Technology ModelSim XE III vlog 6.0d Compiler 2005.04 Apr 26 2005
> # -- Compiling module FD_MXILINX_matt_sch
> # -- Compiling module matt_sch
> #
> # Top level modules:
> #  matt_sch
> # Model Technology ModelSim XE III vcom 6.0d Compiler 2005.04 Apr 26 2005
> # -- Loading package standard
> # -- Loading package textio
> # -- Loading package std_logic_1164
> # -- Loading package std_logic_textio
> # -- Loading package std_logic_arith
> # -- Loading package std_logic_unsigned
> # -- Compiling entity m
> # -- Compiling architecture testbench_arch of m
> # Model Technology ModelSim XE III vlog 6.0d Compiler 2005.04 Apr 26 2005
> # -- Compiling module glbl
> #
> # Top level modules:
> #  glbl
> # vsim -L cpld_ver -L uni9000_ver -lib work -t 1ps m glbl
> # Loading C:\Modeltech_xe_starter\win32xoem/../std.standard
> # Loading C:\Modeltech_xe_starter\win32xoem/../std.textio(body)
> # Loading C:\Modeltech_xe_starter\win32xoem/../ieee.std_logic_1164(body)
> # Loading C:\Modeltech_xe_starter\win32xoem/../ieee.std_logic_textio(body)
> # Loading C:\Modeltech_xe_starter\win32xoem/../ieee.std_logic_arith(body)
> # Loading 
> C:\Modeltech_xe_starter\win32xoem/../ieee.std_logic_unsigned(body)
> # Loading work.m(testbench_arch)
> # XE version supports only a single HDL
> # Error loading design
> # Error: Error loading design
> #        Pausing macro execution
> # MACRO ./m.fdo PAUSED at line 8
>
>
>
> 



Article: 96772
Subject: Re: Async Processors
From: Paul Johnson <abuse@127.0.0.1>
Date: Fri, 10 Feb 2006 11:52:18 +0000
Links: << >>  << T >>  << A >>
On 9 Feb 2006 16:08:35 -0800, "rickman" <spamgoeshere4@yahoo.com>
wrote:

>Jim Granville wrote:
>> rickman wrote:
>> > BTW, how does the async processor stop to wait for IO?  The ARM
>> > processor doesn't have a "wait for IO" instruction.
>>
>> Yes, that has to be one of the keys.
>> Done properly, JNB  Flag,$ should spin only that opcode's logic, and
>> activate only the small cache doing it.
>
>No, it should not spin since that still requires clocking of the fetch,
>decode, execute logic.  You can do better by just stopping until you
>get an interrupt.
>
>> > So it has to set
>> > an interrupt on a IO pin change or register bit change and then stop
>> > the CPU, just like the clocked processor.  No free lunch here!
>>
>> That's the coarse-grain way, the implementation above can drop to
>> tiny power anywhere.
>
>I disagree.  Stopping the CPU can drop the power to static levels.  How
>can you get lower than that?

There's overhead and delays associated with starting the clock up
again; not significant for power, but may make it impractical for
high-rate I/O.

The other option is to memory-map the I/O and use a wait state. The
clocked processor will stall, and power consumption will drop, since
the outputs of the clocked elements aren't changing. However, the
clock nets are still charging and discharging.

For the async processor, though, you should be able to get down to
leakage currents.

Article: 96773
Subject: Re: Async Processors
From: Paul Johnson <abuse@127.0.0.1>
Date: Fri, 10 Feb 2006 12:11:56 +0000
Links: << >>  << T >>  << A >>
On 9 Feb 2006 15:58:10 -0800, "rickman" <spamgoeshere4@yahoo.com>
wrote:

>You have ignored the real issue.  The issue is not whether the async
>design can run faster under typical conditions; we all know it can.
>The issue is how do you make use of that faster speed?  The system
>design has to work in the worst case conditions, so you can only use
>the available performance under worse case conditions.

I think there's an issue here with the definition of "worst-case
conditions". It's not just process/voltage/temperature corners, and the
tool would have to build in a safety margin even if it were. But, when
you're designing a static timing analyser, you also have to take into
account random localised on-die variations, and you have to build in
more safety margin just in case. The end result is that when doing
synchronous design your tool gives you a conservative estimate, and
you're stuck with it. If you've got a bad-process async design and a
bad-process sync design sitting next to each other in a hot room with
low voltages, then the async design should presumably run faster.
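To make the margin argument concrete, here is a back-of-the-envelope comparison. All the numbers are invented for illustration: suppose STA signs off the slow corner at 100 MHz while typical silicon at room temperature could actually sustain 140 MHz.

```python
# Invented numbers: how much typical-condition speed a sync design leaves unused.
worst_case_fmax = 100e6   # Hz: what STA signs off at the slow corner (assumed)
typical_fmax = 140e6      # Hz: what typical silicon could do (assumed)

# A synchronous design must clock at the signed-off worst-case rate;
# an async design just runs as fast as the current silicon allows.
sync_clock = worst_case_fmax
unused = (typical_fmax - sync_clock) / typical_fmax
print(f"typical-condition headroom left on the table: {unused:.0%}")  # → 29%
```

The async part simply claims that ~29% back whenever conditions are better than the corner, which is most of the time.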

>You can do the same thing with a clocked design.  Measure the
>temperature and run the clock faster when the temperature is cooler.

You can't do that because, I think, you can't get the tools to give
you a graph of max frequency vs. temperature for worst-case process
and voltage. You just get the corner cases. With an async design it
doesn't matter - it just runs as fast as it can. Brings to mind the
gingerbread man.


Article: 96774
Subject: ANTTI*HAPPY: building MicroBlaze uClinux on WinXP full sucess !!
From: "Antti" <Antti.Lukats@xilant.com>
Date: 10 Feb 2006 04:25:43 -0800
Links: << >>  << T >>  << A >>
I am happy and smiling! I finally got a fully working MicroBlaze
uClinux image built entirely from GPL sources on WinXP, without the use
of any Linux machine or Linux emulation.

Here is a short intro on how to do this:

http://help.xilant.com/UClinux:MicroBlaze:Win32Build

I wish I had time to add more detailed documentation about the process,
but I need to prepare some demos for the Embedded show in Nurnberg,
which starts next Tuesday.

Antti



