Site Home Archive Home FAQ Home How to search the Archive How to Navigate the Archive
Compare FPGA features and resources
Threads starting:
Authors:A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Hi all, I have been struggling with a problem for nearly a month now. I have a small piece of code which gives hold time violations when it's synthesized in Quartus II. But if I synthesize and complete the fitting of the design, there are no hold time violations. Post-fitting simulations don't show any violations either. Has anyone seen such behavior in their designs? Is this a problem and if so, what needs to be done to fix it? Thanks, PrashantArticle: 50476
Heiko Kalte wrote: > Hi, > I am trying to estimate the partial bitstream size of a 201 slices > design part in a Virtex-II 500. > > The problem is that I did not found out the frames per CLB in a Virtex2, > there must be more than 48 because there are 4 slices in a CLB. > Additionally I nead the bit per frame for a Virtex-II 500. > Please Help me. Here are some basic numbers: The XC2V500 has 32 rows and 24 columns of CLBs, and needs a total of ~2.56 million configuration bits. Each column of CLBs is composed of 22 frames. Do the math: There are roughly 107 000 config bits per CLB column, with about 4800 bits in each of the 22 frames. Now come the constraints: You can only reconfigure integer frames, and you have to be really clever to reconfigure only portions of the 22 frames making up a column. So, most likely you will reconfigure whole columns of 32 CLBs = 128 slices = 256 LUTs. In your case, you will reconfigure two columns, and you have to floorplan your partial design such that it all fits into two vertical CLB columns. So, count on roughly 1/12 of the bitstream, or about 215,000 config bits. Peter Alfke, Xilinx ApplicationsArticle: 50477
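Peter's arithmetic is easy to check. A quick sketch (Python here purely as a calculator; the device figures -- 24 CLB columns, 22 frames per column, ~2.56 million total config bits -- are the ones from his post):

```python
# Rough partial-bitstream sizing for an XC2V500, using the numbers
# quoted in the post above.
TOTAL_CONFIG_BITS = 2_560_000   # ~2.56 million config bits for the device
CLB_COLUMNS = 24                # 24 columns of CLBs
FRAMES_PER_COLUMN = 22          # each CLB column is 22 frames

bits_per_column = TOTAL_CONFIG_BITS / CLB_COLUMNS      # roughly 107,000
bits_per_frame = bits_per_column / FRAMES_PER_COLUMN   # roughly 4,800
two_column_reconfig = 2 * bits_per_column              # roughly 215,000

# Two columns out of 24 is 1/12 of the full bitstream, as stated.
print(round(bits_per_column), round(bits_per_frame), round(two_column_reconfig))
```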
On 11 Dec 2002 09:17:18 -0800, prashantj@usa.net (Prashant) wrote: >Hi all, >I have been struggling with a problem for nearly a month now. I have a >small piece of code which gives hold time violations when its >synthesized in Quartus II. But if I synthesize and complete the >fitting of the design, there are no hold time violations. Post fitting >simulations dont show any violations either. Has anyone see such >behavior in their designs ? Is this a problem and if so, what needs to >be done to fix it ? > >Thanks, >Prashant Hold violations happen when clk->Q + logic delay (assuming zero skew) is smaller than the hold constraint of the accepting flop. If you have little or no logic and a fast flop, the synthesizer may underestimate the wire delay and generate a hold violation. When you do P&R, the actual wire delay gets added and that may resolve the hold violation. So if you're not seeing hold violations after P&R, you're OK. In the future if you see hold violations after P&R the solution might be to do an ECO and add a slow buffer between the two flops to fix it. Muzaffer Kal http://www.dspia.com ASIC/FPGA design/verification consulting specializing in DSP algorithm implementationsArticle: 50478
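Muzaffer's explanation reduces to a single inequality: capture fails when clk->Q plus the data-path delay is less than the destination flop's hold requirement. A minimal sketch of that check, with invented delay values purely for illustration:

```python
def hold_slack(clk_to_q, logic_delay, wire_delay, hold_time, skew=0.0):
    """Hold slack in ns; negative slack means a hold violation.
    Zero clock skew assumed by default, as in the post above."""
    return (clk_to_q + logic_delay + wire_delay) - (hold_time + skew)

# Pre-layout: the synthesizer underestimates wire delay (here, zero),
# so a fast flop with little logic reports an apparent violation.
assert hold_slack(0.2, 0.0, 0.0, 0.3) < 0

# Post-P&R: the actual routing delay gets added and the violation
# disappears, matching the behavior described in the question.
assert hold_slack(0.2, 0.0, 0.4, 0.3) > 0
```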
I'm designing a soft CPU and I have a question regarding power consumption. I know that in an ASIC, the power consumption is roughly proportional to the clock rate. I think driving the capacitance of wires is mainly what consumes power, so I'd like to keep them short, and try to keep long wires from changing state often. My question is, do registers draw much power if you clock them but their outputs don't change state?Article: 50479
"Jay" <kayrock66@yahoo.com> wrote in message news:d049f91b.0212101735.6d7ba6f6@posting.google.com... > I'm going to second Paul's suggestion about just computing the > correlation sequentially, it's just too easy not to do. And also, I > don't think a single correlation is going to give you your time delay, > I think you're going to have to slide the 2 data sets across each > other and keep computing that correlation until it peaks. > Do you mean convolution? I think correlation already does the sliding stuff.Article: 50480
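What Jay describes -- sliding the two data sets across each other and watching for the correlation peak -- is a cross-correlation search over candidate lags. A minimal pure-Python sketch (the function name and test signals are mine, not from the thread):

```python
def xcorr_lag(x, y, max_lag):
    """Return the lag (in samples) by which y is delayed relative to x,
    found as the shift maximizing sum(x[n] * y[n + lag])."""
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        val = sum(x[n] * y[n + lag]
                  for n in range(len(x))
                  if 0 <= n + lag < len(y))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag

x = [0, 0, 1, 2, 3, 2, 1, 0, 0, 0]   # a pulse...
y = [0, 0, 0, 0, 0, 1, 2, 3, 2, 1]   # ...the same pulse, delayed 3 samples
print(xcorr_lag(x, y, 5))            # -> 3
```

A single correlation value only tells you how alike the two sequences are at one alignment; the peak over all lags is what gives the time delay.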
Hello Peter, thanks for your help. Can you give me the amount of additional bits (besides the frame data) for initialisation and all that stuff? I heard about 384 bits + frame data + dummy frame for the Virtex and VirtexE series. Heiko PS: It's me, from cold and rainy Paderborn, NRW. I wish you a merry Christmas, in case we don't talk again. Peter Alfke wrote: > > Heiko Kalte wrote: > > > Hi, > > I am trying to estimate the partial bitstream size of a 201 slices > > design part in a Virtex-II 500. > > > > The problem is that I did not found out the frames per CLB in a Virtex2, > > there must be more than 48 because there are 4 slices in a CLB. > > Additionally I nead the bit per frame for a Virtex-II 500. > > Please Help me. > > Here are some basic numbers: > The XC2V500 has 32 rows and 24 columns of CLBs, and needs a total of ~2.56 > million configuration bits. Each column of CLBs is composed of 22 frames. > > Do the math: > There are roughly 107 000 config bits per CLB column, with about 4800 bits > in each of the 22 frames. > > Now come the constraints: > You can only reconfigure integer frames, and you have to be really clever > to reconfigure only portions of the 22 frames making up a column. > So, most likely you will reconfigure whole columns of 32 CLBs = 128 slices > = 256 LUTs. > In your case, you will reconfigure two columns, and you have to floorplan > your partial design such that it all fits into two vertical CLB columns. > > So, count on roughly 1/12 of the bitstream, or about 215,000 config bits. > > Peter Alfke, Xilinx Applications -- --------------------------------------------------------------- Dipl. Ing. H.
Kalte | HEINZ NIXDORF INSTITUTE | Office: F1.213 System and Circuit Technology | Fon: +49 (0)5251 60-6459 Fürstenallee 11 | Fax: +49 (0)5251 60-6351 33102 Paderborn, Germany | --------------------------------------------------------------- mailto:kalte@hni.uni-paderborn.de http://wwwhni.uni-paderborn.de/sct/ --------------------------------------------------------------- Home of the RAPTOR Rapid Prototyping Systems http://www.RAPTOR2000.de/ ---------------------------------------------------------------Article: 50481
Martin Schoeberl wrote: > <snip> > I was really disappointed when starting with my Java processor. I played > around a little bit with hand optimization of JVM code. In one example > program execution time dropped on JOP and in Suns JVM in interpreting mode > for about 25%. But the execution time > of the JIT compiled program was LONGER!. In JDK 1.1 about 10% and in JDK 1.3 > it was about 15 times slower! Wow - a real warning to those who would use Java for embedded/real time work. Just be real careful to never change versions. Was that 15x slower because the interpreting mode got faster? It's not a trivial amount of work to make something run 15x slower :) > > Here is the complete example: http://www.jopdesign.com/perf.html > > I think javac does no optimization to make it easier for JIT to compile JVM > stack code to a register machine. And the JIT assumes this simple minded > stack code for it's optimization. > > I was looking arund to find some byte code optimizer, but have only found > this 'obfuscators'. No optimizer for the stack code. > > > Further you might say that the stack access must be slower than a register > > file, since you require to access to the stack plus write-back. But in > > No. I assumed the same single cycle access time for 2 operands load and one > write on the stack as for two register load and one write. > > I think this thread gets a little bit OT, perhaps we should move to > c.l.forth or c.l.j.machine I find the HW/SW layer interaction interesting, and not OT. It is also good to hear real experiences in deploying different designs into FPGAs. Did you ever compare the .NET byte code, or consider a .NET FPGA engine? I noted this in an AMD press release: #AMD Athlon MP 2400+ dual point-to-point 266MHz processor buses, providing up to #2.1Gbytes/s bandwidth. 0.13-micron processing #.. has a pipelined superscalar floating point engine and a Level 2 #cache translation look-aside buffer.
#AMD also has added 51 new instructions to its 3DNow instruction set. #The processor has a list price of $228 in 1,000-unit quantities. The addition of 51 new opcodes seemed quite significant, and we should expect stack-machine opcodes to appear in future mainstream CPUs. -jgArticle: 50482
"Ralph Mason" wrote: > "William Tanksley Google" <wtanksley@bigfoot.com> wrote: > > Why would that be? The simulation I built as a college project was faster > > than any of my classmates', so I'm a bit sceptical. > A stack machine will require more cycles per instruction. So by the cycles > per interction metric they are slower. My machine was single-cycle, and I computed the clock rate as 90MHz, better than any of the other machines in that class for which people computed a rate. Chuck Moore's chips follow a similar design, and he fabs them; they're single cycle per instruction, limited by memory bandwidth (except for the ADD instruction, which will return bad results if you don't let the numbers sit on the stack long enough for the carry to propagate). They run at peak 500MIPS, so roughly 100MHz (they're a bit asynchronous, but only between word loads -- there are 4 instructions per instruction word load). Stack machines should require less time per instruction, not more. They have no need for a register access stage, for example; the ALU can be directly connected to the TOS and nOS, and compute all results directly. Memory accesses can also be prefetched (to read from memory, first load the A register, then do other things until you think it's fetched, then read from the D register). But anyhow, both of these are single-cycle machines. >But perhaps a stack machine is simple enough that it can have a faster cycle >time, I would guess so -- it eliminates the register selection phase, thus allowing ALU results to be computed in parallel with instruction decoding. > or perhaps the instructions give a higher computation yield than risc >instructions. I doubt it -- most of them I've seen are fairly RISC. Although for my final assignment in that design class I had to implement a sort -- I added a perfect shuffle circuit onto the stack, and added an instruction to operate it. 
Four instructions to sort 16 elements (plus overhead to load and unload the elements, of course). Then, to add insult to injury, one of my fellow students complained to me that the TA graded him down for excessive program length in that assignment, and asked me what my program length was. He walked away dejected and convinced that the TA was right after all (his program was 800 bytes, mine was 90). Heh. -BillyArticle: 50483
"Martin Schoeberl" <martin.schoeberl@chello.at> wrote in message news:<VtuJ9.115181$A9.1351706@news.chello.at>... > > > Thats perfectly rigth. Things are a lot easier with a stack architecture > > > (but also slower since you need more cycles for basic operations than on > a > > > RISC CPU). > > > > Why would that be? The simulation I built as a college project was faster > > than any of my classmates', so I'm a bit sceptical. > > It's slower in terms of cycles/instructions for the same statement: > > A simple example: a = b + c; > > On a RISC CPU the chance is high, that local variables are allocated to > registers. So this statement compiles to somthing like: > > add r0, r1, r3 > > On a stack machine (like the JVM) this will compile to: > > load b > load c > add > store a Hi You seem to have missed the point of a stack machine. One very rarely uses local variables. One maintains the stack in such a way as to avoid using locals. Most stack machine instructions would just be: add In some cases, the stack order might not be right or you might need to fetch something else. Things like: over add In this case, it looks like more cycles but often the over operation can be optimized into the instruction itself, with no cycle penalty. Your thinking in terms of locals is the result of using languages that depend on them. When one programs a stack machine, one thinks more about optimizing data flow. The 'just in time' concept applies. It is just another way of thinking. Dwight > > > The advantage of Forth is that it's well-suited as-is for running > > hardware, and once you have it running Java can be implemented on top. I > would > > rather use Forth as a machine language than Java bytecodes. > > Isn't Forth a 16 bit system? Building a 32 bit JVM on top of this would be > not very efficient. > > MartinArticle: 50484
hmurray@suespammers.org (Hal Murray) wrote: > > A stack machine will require more cycles per instruction. So by the cycles > > per interction metric they are slower. > That's not obvious to me. What's an instruction? Who cares? Why > not measure time to execute a line of code? The old MIPS problem (Meaningless Indication of Processor Speed). Yes, instruction time is hard to judge -- but on a theoretical basis we can talk about different processors with equivalent instruction sets, and the comparison becomes meaningful. > > That's a point I like about stack machines. See the stack as a better, more > > efficient cach. And it's better predictable when it comes to real time > > systems. > Huh? How are you measuring efficiency? More predictable than registers? Yes, more predictable than registers; but more importantly, more predictable than random access memory. The top of the stack contains the things which will be needed soon, so store them in fast memory or registers; the middle of the stack will be needed eventually, so store it in medium-speed storage like RAM; and the rest of the stack will take a while to be needed, so allow it to be paged out if needed. > > And my thinking was, perhaps a stack machine meets these requirements better > > than a risc type machine. > How are you measuring goodness? Meeting requirements. -BillyArticle: 50485
The output of any register has to drive the input of the next block in your design. If the output is low then there is a sink current drawn from the next input, and if it is high the register is sourcing current to the next input block. Now, when the register is switching at a high frequency there are additional currents for charging and discharging the transmission lines (connection wires). And so on..... "Brad Eckert" <brad@tinyboot.com> wrote in message news:4da09e32.0212111000.599de814@posting.google.com... > I'm designing a soft CPU and I have a question regarding power > consumption. I know that in an ASIC, the power consumption is roughly > proportional to the clock rate. > > I think driving the capacitance of wires is mainly what consumes > power, so I'd like to keep them short, and try to keep long wires from > changing state often. > > My question is, do registers draw much power if you clock them but > their outputs don't change state?Article: 50486
"Martin Schoeberl" <martin.schoeberl@chello.at> wrote in message news:<BnGJ9.123098$A9.1424459@news.chello.at>... > Sorry, but my first statement was to simple and a mix of theory and praxis. > I'll try (as I can) to explain it in more detail: > > > Acutally I'm currently looking at implemeting a stack based architecture > > along with a RISC architecture. I think you miss the point a bit with > > your example. You say that local varaiables are allocated to registers > > for the RISC example, but then force the stack based design to load to > > the stack. Giving the stack based architecture the same advantage, > > that both operands are available at the top of the stack surely the > > optimised code would be :- > > > > add > > As I know (perhaps I'm wrong) in theory every computing problem can be > solved with a stack architecture without local variables. But for procedural > languages you need two stacks: one for the operands and one for the return > addresses. When you mix them in one stack you have to load the function > parameters on the current operand stack. > > And how is following example solved? > > f(a) { > b = 1-a; > return b > } > > assume paramters and return values on stack: > > push 1 > swap -- one 'extra' stack manipulation is necessery > sub > return Why wouldn't you include an instruction that did swap/sub as a single step? This is what is typically done in many of the Forth processors I've seen. > > Now to the practical thing: > > The JVM is a little bit inconsistent on usage of the stack. For function > calls the parameters must be pushed on the stack. But in the called function > they are accessed as 'locals'. I think this point comes from (perhaps wrong) > anticipation of the language designers to use one stack for data > (parameters) and the return addresses.
> > So a simple function like: > > int f(int a, int b) { > return a+b; > } > > translates to: > > Method int f(int, int) > 0 iload_1 > 1 iload_2 > 2 iadd > 3 ireturn ---snip--- Again, your mindset is setting your expectations. You are forcing the machine to do what you believe is the best way to represent the problem. Maybe Java is not the best example of a stack language to look at. In the above example, you've assumed that the two values need to be loaded. In a typical stack implementation, they are already there and are just consumed by the add and replaced by the result. Forth typically has two stacks but both input and result data are kept on the same stack. In Forth, the two stacks are used to keep program flow separate from data flow. DwightArticle: 50487
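Dwight's point -- that a real stack machine consumes its operands in place instead of shuttling them through locals -- is easy to see in a toy interpreter. This is a sketch of my own (Python standing in for hardware), not anything from the thread; `swap`, `over`, `sub`, and `add` follow their usual Forth meanings, with `sub` computing NOS minus TOS as assumed in the f(a) = 1 - a example above:

```python
def run(program, stack):
    """Tiny stack-machine interpreter: ops pop operands off `stack`
    and push results; ('push', n) pushes a literal."""
    for op in program:
        if isinstance(op, tuple) and op[0] == "push":
            stack.append(op[1])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "sub":                    # NOS - TOS
            b, a = stack.pop(), stack.pop()
            stack.append(a - b)
        elif op == "swap":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == "over":                   # copy NOS onto the top
            stack.append(stack[-2])
        else:
            raise ValueError(f"unknown op {op!r}")
    return stack

# f(a, b) = a + b: both arguments already live on the stack,
# so the whole function body is one instruction, as Dwight says.
print(run(["add"], [2, 3]))                    # -> [5]

# f(a) = 1 - a: the push 1 / swap / sub sequence from the thread.
print(run([("push", 1), "swap", "sub"], [4]))  # -> [-3]
```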
Is there a place where I can get HDL for the Hough transform? I would like to implement the algorithm for both circles and ellipses. ThanksArticle: 50488
"Ralph Mason" wrote: > But in a stack machine, one would hope that the operands you were working on > would be right on the top of the stack, thus the instruction is only > add > All the registers are implicit, thereby giving a more compact instruction > size. True. > The slowness comes in because of the work the stack machine must do to > perform that add > Fetch op one from stack (dec sp ) > Fetch op two from stack (dec sp) > Add ops > Push result to stack ( inc sp) Nope -- the mistake here is in assuming that the stack processor is only pretending to be a stack processor. Nope, it's a real stack processor; the ALU is gated directly to the two top-of-stack elements. If the stack's entirely on-chip, there are no fetch operations; if only the top few elements are on-chip, the fetch operations can occur in parallel with the add. > By my count this is 4 cycles, although I suppose you could use a dual port > stack and pop the arguments in one cycle. Although this increases the area > of the design, which doesn't fit with what I want to acheieve. It doesn't > look like it lends itself well to any kind of > pipelining either. Does the design I explained make sense? > Basically I want to make something that is > 1. very small (in area) > 2. As useful as possible (eg can fit lots of code) > 3. Powerful enough (how's that for scientific) > 4.Is easy to make a good development tool See the MuP21 and later chips for concrete examples. > Ralph -BillyArticle: 50489
Heiko, Use the following option to get the difference between two bitstreams: bitgen {all options used for design1.bit} -g ActiveReconfig:Yes -r design1.bit design2.ncd This creates the difference bitfile from the two ncd files. In this way you can see exactly what a partial bitstream size is for reconfiguration. Austin Heiko Kalte wrote: > Hi, > I am trying to estimate the partial bitstream size of a 201 slices > design part in a Virtex-II 500. I did a rough estimation for a > Virtex600E. Therefor I divied the Slices by 2 to get the Number of CLBs. > Afterward I divied this by the number of CLBs in a Column for a > Virtex600E. This leads to at least 2 columns (or more depends on the > floorplan). Each CLB column consists of 48 frames and each frame of > 960bit. Adding some initialization leads to 93504 config bit. > > The problem is that I did not found out the frames per CLB in a Virtex2, > there must be more than 48 because there are 4 slices in a CLB. > Additionally I nead the bit per frame for a Virtex-II 500. > Please Help me. > Heiko > > By the way is this calculation correct? > > > -- > --------------------------------------------------------------- > Dipl. Ing. H. Kalte | > HEINZ NIXDORF INSTITUTE | Office: F1.213 > System and Circuit Technology | Fon: +49 (0)5251 60-6459 > Fürstenallee 11 | Fax: +49 (0)5251 60-6351 > 33102 Paderborn, Germany | > --------------------------------------------------------------- > mailto:kalte@hni.uni-paderborn.de > http://wwwhni.uni-paderborn.de/sct/ > --------------------------------------------------------------- > > Home of the RAPTOR Rapid Prototyping Systems > http://www.RAPTOR2000.de/ > > ---------------------------------------------------------------Article: 50490
If it is just 8 bit video, you could just use a single BRAM as an 8 in to 8 out LUT and be done with it, that is if you don't need the BRAM for your line buffers. "Normand Bélanger" wrote: > "Open mouth, insert foot" > > I went back to look at the previous messages on this thread and saw > that you are right. I was replying to Philip Freidin message in which he > talked about LNS and floating point so I assumed that FP like precision > was needed; I should have taken a look a the OP first. > > Sorry for the confusion, > > Normand > > "Ray Andraka" <ray@andraka.com> a écrit dans le message de news: > 3DF7699F.D0D3A7C8@andraka.com... > > When I read imaging application, I was assuming 8 or 10 bit video, in > which case > > a 4 or 5 bit lookup is plenty. If it is for medical or surveillance, it > might > > have more bits per pixel, in which case a higher precision log might be > > desired. Could use block ram as a LUT for 8 bit look-up if you have the > block > > ram to spare, or you could go to either a divider-like structure similar > to the > > one Isreal Koren presents in his book, or to a two tiered LUT approach. > > > > "Normand Bélanger" wrote: > > > > > Agreed. I was under the impression that precision was needed in this > > > case so I suggested this. > > > > > > Normand > > > > > > "Ray Andraka" <ray@andraka.com> a écrit dans le message de news: > > > 3DF6BC7D.B9D0BB5A@andraka.com... > > > > Depends on the accuracy you desire and the resources at hand. The > quick > > > and > > > > dirty log I mentioned before is both faster and smaller than the > restoring > > > > arithmetic method described by Israel Koren in his book. > > > > > > > > "Normand Bélanger" wrote: > > > > > > > > > I'm currently working on an "FPU" like this (i.e. LNS computations). > > > > > The best way I know of computing a LOG is described in Prof. Koren > > > > > Computer arithmetic book in chapter 9 (if I recall correctly). 
It is > > > also > > > > > fairly easy to implement if you don't mind a significant latency. > > > > > > > > > > Good luck, > > > > > > > > > > Normand > > > > > > > > -- > > > > --Ray Andraka, P.E. > > > > President, the Andraka Consulting Group, Inc. > > > > 401/884-7930 Fax 401/884-7950 > > > > email ray@andraka.com > > > > http://www.andraka.com > > > > > > > > "They that give up essential liberty to obtain a little > > > > temporary safety deserve neither liberty nor safety." > > > > -Benjamin Franklin, 1759 > > > > > > > > > > > > -- > > --Ray Andraka, P.E. > > President, the Andraka Consulting Group, Inc. > > 401/884-7930 Fax 401/884-7950 > > email ray@andraka.com > > http://www.andraka.com > > > > "They that give up essential liberty to obtain a little > > temporary safety deserve neither liberty nor safety." > > -Benjamin Franklin, 1759 > > > > -- --Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email ray@andraka.com http://www.andraka.com "They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, 1759Article: 50492
I don't believe so. It's been a while since I've cogitated at the transistor level, but my recollection is that power consumption arises as transistors move through their active region (from saturation to cutoff or from cutoff to saturation). If the bits of the register don't change then the transistor states don't change and you shouldn't have to pay the power piper. Sound right to the rest of you? Kip Ingram -- Get daily news and analysis from the FPGA market for pennies a day. Subscribe to The FPGA Roundup today: http://www.KipIngram.com/FPGARoundup.html -- "Brad Eckert" <brad@tinyboot.com> wrote in message news:4da09e32.0212111000.599de814@posting.google.com... > I'm designing a soft CPU and I have a question regarding power > consumption. I know that in an ASIC, the power consumption is roughly > proportional to the clock rate. > > I think driving the capacitance of wires is mainly what consumes > power, so I'd like to keep them short, and try to keep long wires from > changing state often. > > My question is, do registers draw much power if you clock them but > their outputs don't change state?Article: 50493
> As I know (perhaps I'm wrong) in theory every computing problem can be > solved with a stack architecture without local variables. But for procedural > languages you need two stacks: one for the operands and one for the return > addresses. When you mix them in one stack you have to load the function > parameters on the current operand stack. > > And how is following example solved? > > f(a) { > b = 1-a; > return b > } > > assume paramters and return values on stack: > > push 1 > swap -- one 'extra' stack manipulation is necessery > sub > return > There are some excellent points by others here as well, but I should also say that, with two stacks in your stack-based processor, the return instruction can actually be encoded into your sub, so that the actual instruction count is only: push 1 swap sub + return So your return is basically free, as it can be computed in parallel with the subtraction. -- Tim Simpson Ph.D Design Engineer (reply to address is not valid remove _xx_ to reply)Article: 50494
Hi Hal, > The PCI connector has power pins for 5V, and 3.3V, and also a > few more pins for IO power. No, there is no guarantee you will get 3.3V power, unless you are in a 3.3V slot! > They are either 3 or 5, depending > upon the signaling voltage, No, as well. The VIO pins are 3 or 5 depending on the signaling voltage, but the 5V pins are still 5V, and the 3.3V pins are only guaranteed to be 3.3V on a 3.3V slot. > the idea being that you can wire > them to the supply rail for your IO pads and make a board that > supports either 3V signaling or 5V, depending upon the power the > motherboard supplies on those pins. The power pins and VIO pins are separate. > The PCI connector has a plug that matches with a cutout on the > board. The plug goes in either of two positions (turn the connector > around), one for 5V signaling, the other for 3V. So in theory, > you can make three types of cards. The normal card in wide use > is 5V signaling, though they may only drive the outputs with a > 3V CMOS driver. You can also make 3V only card by putting the > cutout on the other end of the card. You can also make 3V/5V > cards by cutting out both slots and maybe wiring the IO pad > rail on your chip to the IO supply from the PCI connector. Yes, that is correct. > I've never seen any 3V or dual cards. I have done quite a few of them. > The main question I was trying to ask was if anybody had seen > any 3V or dual signaling level cards. If so, I might think more > about taking advantage of that. Since I didn't see many > encouraging responses I'll probably but this on the back burner. It depends on what your goal is. There is no need to do a 3.3V card for use in a standard PC, as they are all still 5V today. The only reason really for going to 3.3V is to go 66MHz. > Some early systems didn't actually supply any 3.3V power. You > can dance around that with an on-board regulator. I plan to > ignore that.
(But I'll check my systems first, just in case, > and listen for tales of troubles with not-so-early boards.) I would not ignore that. Most systems don't have the 3.3V power, not just "early" ones. > The 3V signaling rules overlap the 5V rules enough so that a > card that drives high to 3V will work in a 5V system. Well, not really. The important issue isn't voltage but the VI curve. > The Spartan-II is 5V tolerant but doesn't have DLLs. The -IIE > has DLLs, but doesn't tolerate 5V signaling. What do you want to use the DLL for? > Since 3V systems don't seem to be very popular, I probably won't > build a card expecting to find a 3V only slot. > > Several years ago, I put a scope on a system that had the connector > pegs set for 5V. I never saw anything go over 3V. Obviously that > depends upon what cards are plugged in. Somebody could add an > old/evil card that really does drive to 5V. > > For hack/research systems it might make sense to use a FPGA that > wasn't 5V tolerant on a card that could be plugged into a 5V system. > You would have to remember to get out the scope before adding a card > that hadn't been tested yet. I'm probably not desperate enough > to get the DLLs that I will do this. (But I'm still scheming.) Checking a voltage level has somewhat little to do with the actual signaling. It's far more than just the voltage level with PCI. I'm not quite sure what you're talking about above...but you really should read the signaling part of the PCI spec. > Thanks for the PLX suggestions. Their web site expects me to > register before they give me data sheets so I'll put that on the > back burner. Why is registering a problem? > Thanks for the heads-up about using DLLs on PCI clocks. Is > that a clear don't-do-that, or just another worm for the list? 
To be PCI spec compliant, don't do that...but I personally have NEVER seen a system that changes the PCI clock, except when switching from 33MHz to 66MHz (they all start out at 33MHz to check which cards are 66MHz compatible...or at least they're supposed to ;-). Sometimes they base the PCI clock on the FSB speed...100MHz FSB = 25MHz PCI clock...but it's static, and doesn't change once up and running. Regards, AustinArticle: 50495
I've been playing with the PCI development kit for a few months using Altera's Quartus II software. I keep running into problems trying to program the board using either JTAG or on-board flash with designs configured in Quartus with Altera's MT64 interface IP. The design simulates fine for target transactions (that's all I'm looking at right now). Altera's support people alternate between telling me to reinstall the drivers and rebooting my computer. If anyone is using the kit I would like to hear if it really works or not. Thanks Jason HannulaArticle: 50496
Stephen, > Normal consumer PC's most certainly have 3.3V power rails on their > PCI slots, You can't guarantee ALL PCI systems do that, consumer or not...and is something to be conscious of if you want to make a board that will not give you customer service issues. AustinArticle: 50497
"Martin Schoeberl" <martin.schoeberl@chello.at> wrote: >>> Thats perfectly rigth. Things are a lot easier with a stack architecture >>> (but also slower since you need more cycles for basic operations than on >>> a RISC CPU). >> Why would that be? The simulation I built as a college project was faster >> than any of my classmates', so I'm a bit sceptical. > It's slower in terms of cycles/instructions for the same statement: > A simple example: a = b + c; > On a RISC CPU the chance is high, that local variables are allocated to > registers. So this statement compiles to somthing like: > add r0, r1, r3 > On a stack machine (like the JVM) this will compile to: > load b > load c > add > store a That's not a stack machine -- that's a register machine with a temporary stack, probably the most wasteful decision possible. >>The advantage of Forth is that it's well-suited as-is for running >>hardware, and once you have it running Java can be implemented on top. I >>would rather use Forth as a machine language than Java bytecodes. > Isn't Forth a 16 bit system? Building a 32 bit JVM on top of this would be > not very efficient. No, it's whatever bittedness you want it to be. Most modern Forths match the processors they run on. There's a 20.5 bit processor whose instruction set is Forth (21 bit stack, 20 bit memory). > Martin -BillyArticle: 50498
Mathew Orman wrote: > The output of any register has to drive the input of the next block in your > design. If the output is low than the is a sink current drown from the next > input and if it is high > the register is sousering the current to the next input block. Not true. Nowadays we are all using CMOS technology, where the static current is essentially zero ( femtoamps inside the chip). The original question was whether a flip-flop draws dynamic current when it is clocked, but does not change state. And the answer is: a little power is spent by the clock line plus the clock input to the flip-flop wiggling, but obviously less than when Q also wiggles. BTW, Xilinx FPGAs automatically prune the permanently unused branches from the clock-distribution tree, in order to save power.. Peter Alfke, Xilinx ApplicationsArticle: 50499
Kip Ingram wrote: > > I don't believe so. It's been a while since I've cogitated at the > transistor level, but my recollection is that power consumption arises as > transistors move through their active region (from saturation to cutoff or > from cutoff to saturation). If the bits of the register don't change then > the transistor states don't change and you shouldn't have to pay the power > piper. > > Sound right to the rest of you? Yes, but you do have a clock budget to pay, to actually get the clock distributed to the (not changing) registers. So there are two elements in the power equation: clock routing, and output transitions. Note also with the latter, that glitches from combinatorial delay deltas will add power for no net logic - see some earlier posts about the % of power change from reducing glitches by better pipelining/floorplanning. The clock structure inside the FPGA should also be considered, and just how granular the routes are - typically there will be coarser steps, as a new row or column buffer gets enabled. -jg
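The two budget items Jim lists -- clock distribution and output toggling -- both follow the usual CMOS dynamic-power relation P = alpha * C * V^2 * f, where alpha is the activity (toggle) rate. A back-of-envelope sketch with invented numbers, showing how the clock-tree term persists even when register outputs stay static:

```python
def dynamic_power(cap_farads, vdd, freq_hz, activity):
    """Classic CMOS switching power: P = alpha * C * V^2 * f (watts)."""
    return activity * cap_farads * vdd**2 * freq_hz

VDD, FCLK = 1.5, 100e6            # invented example values, not device data

clock_tree = dynamic_power(200e-12, VDD, FCLK, 1.0)   # clock always toggles
outputs    = dynamic_power(500e-12, VDD, FCLK, 0.0)   # outputs held static

# Clocking idle registers still burns the clock-tree term; the data-output
# term only appears once the activity rate rises above zero.
print(f"{clock_tree*1e3:.1f} mW clock, {outputs*1e3:.1f} mW outputs")
```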