seems that you posted in the wrong thread ;-)

> http://indi.joox.net now has the first compiled quartus files for the
> 16 bit indi core.
>
> basically an alu, control and registers, with fast interrupt switch.
>
> asynchronous busy of cpu, and synchronous reset.
>
> the bus interface is not complete yet, as i have to think about the
> expansion modules.

Now you have to decide about: Avalon, SimpCon, Wishbone,.... Perhaps you can independently compare Avalon and SimpCon with your CPU design :-) KJ can give you Avalon support, and I can give you SimpCon support. However, you will find a lot of information already in this thread.

Martin

Article: 107176
hi

fpga_toys@yahoo.com wrote:
> Peter Alfke wrote:
> > Higher performance requires radical innovation and real cleverness
> > these days.
> > Peter Alfke
> pin compatibility is just customer support,

how about a 1 pin high implies a self program from a small hardwired rom, which gets enough of the chip off the ground to work as a programmer for itself and others. some of that extra space :-) internally they don't have to be the same, just roughly the same, as i'm sure there will be extra logic area.

or how about a single sided io series, with 2 edges of io for corners, then a scale down is just more logic mapped to fewer pins, and extra die copies per cut chip. it just needs an interface mapping layer (ie new standard size pads to old shrunk size pads (hyper buffers? or capacitive resource)).

and could someone put some analog low power fast comparators on please??

cheers

jacko
http://indi.joox.net a 24 blue block CPU element (16 bit)

Article: 107177
zcsizmadia@gmail.com <zcsizmadia@gmail.com> wrote:
> I inject a dll into the impact.exe, and hook DeviceIoControl and some
> other kernel32 APIs. impact.exe calls DeviceIoControl to read/write LPT
> I/O port using windriver6 driver. Instead of calling original windrvr
> DeviceIoControl function I just forward the TMS/TDI/TDO/TCK bits to
> Digilent USB (or any other programmer cable).

Do you do this without the windriver header file?

> On linux the easiest could be to create a brand new windrvr emulator
> driver where we implement all the IOCTLs used by impact. BTW, I have no
> clue why they are using the windriver as a device driver, because all
> the features they do with the Jungo driver is really simple and
> generic (eg: user mode I/O access, USB access to device, etc.)

-- 
Uwe Bonnes bon@elektron.ikp.physik.tu-darmstadt.de
Institut fuer Kernphysik Schlossgartenstrasse 9 64289 Darmstadt
--------- Tel. 06151 162516 -------- Fax. 06151 164321 ----------

Article: 107178
Peter,

It's not enough with such a project to scan just the two leading manufacturers. Instead, you need to scan all of them, because the others (e.g. Lattice, Actel, Quicklogic) may have just the feature you need. I'll give an example: on the generic I/Os, both Altera and Xilinx can't get higher than 1.3 Gbps. Lattice's newest SC gets I/O speeds up to 2 Gbps. (I can't comment on Actel's speed as I have never used them.) At the logic side all three have about the same speed. As you will know, the highest system speed will depend on the design constraints (and also how good the tools are and how well you know the features). I wouldn't use a microprocessor (not even a soft core) as it is only additional load (and takes away your resources).

PS. why don't you mention the V5 - it should be intrinsically faster?

Regards,
Luc

On 23 Aug 2006 11:12:36 -0700, "Peter Alfke" <peter@xilinx.com> wrote:
>First, you have to decide how much logic you need, i.e. how much money
>you want to spend.
>Then you have to look at the two leading manufacturers, which are -in
>order of size and speed- Xilinx and Altera
>>From Xilinx, I would recommend the appropriate size Virtex-4 LX part,
>or -if you need lots of multipliers and/or accumulators- the
>appropriate size Virtex-4 SX part.
>
>If you are after max speed, you hardly need a microprocessor, but both
>companies offer a soft microprocessor, it's called MicroBlaze in
>Xilinx.
>
>Good luck, sounds like a fun project.
>Peter Alfke, Xilinx

Article: 107179
Hello,

I need a linear priority encoder that has N inputs and N outputs. Searching the group, I saw a thread where Peter Alfke stated:

--- cut ---

> Let me tell you what can be done in Virtex-4 (probably also in
> Spartan3):
> A priority "linear encoder" with 4 x N inputs and 4 x N outputs, each
> output corresponding to a prioritized input.
> Only one output is ever active, the one corresponding to the
> highest-priority active input.
> Total cost: 5N+1 (LUTs+flip-flops).
> Such a 32-input linear priority encoder uses 41 LUTs = 21 slices (<6
> CLBs), and runs at >250 MHz.
> The design is fully modular (per 4 bits).
> Peter Alfke

--- cut ---

But no details were given.

Can someone provide more details on how that is implemented in a slice?

Thanks,

Sylvain

Article: 107180
Sylvain Munaut <SomeOne@SomeDomain.com> schrieb:
> Hello,
>
> I need a linear priority encoder that has N input and N outputs.
> Searching the group, I saw a thread where Peter Alfke stated :
>
> --- cut ---
>
> > Let me tell you what can be done in Virtex-4 (probably also in
> > Spartan3):
> > A priority "linear encoder" with 4 x N inputs and 4 x N outputs, each
> > output corresponding to a prioritized input.
> > Only one output is ever active, the one corresponding to the
> > highest-priority active input.
> > Total cost: 5N+1 (LUTs+flip-flops).
> > Such a 32-input linear priority encoder uses 41 LUTs = 21 slices (<6
> > CLBs), and runs at >250 MHz.
> > The design is fully modular (per 4 bits).
> > Peter Alfke
>
> --- cut ---
>
> But no details where given.
>
> Can someone provide more details on how that implemented in a slice ?
>
> Thanks.
>
> Sylvain

it was possibly meant as a brain teaser!

Antti

Article: 107181
David Ashley <dash@nowhere.net.dont.email.me> writes:
> Actually the question I have is what kinds of programmer cables can
> I use with linux. Impact -- I'd just as soon not use it. I prefer open
> source command line utilities. In fact I want everything to be a
> command line utility -- get rid of the IDE's. I use my own editor
> and "make" and I'm happy.

I've done just this in Linux with a bit of makefile like this:

prog: work/$(TOP).bit
	@echo "setPreference -pref StartupCLock:AUTO_CORRECTION" > work/impact.cmd
	@echo "setMode -bs" >> work/impact.cmd
	@echo "setCable -port auto" >> work/impact.cmd
	@echo "Identify" >> work/impact.cmd
	@echo "setAttribute -position 1 -attr configFileName -value $(TOP).bit" >> work/impact.cmd
	@echo "Program -p 1 " >> work/impact.cmd
	@echo "quit" >> work/impact.cmd
	cd work && impact -batch impact.cmd

"make prog" then does the work... Of course, ideally, ditching Impact would be good ;-)

Cheers,
Martin

-- 
martin.j.thompson@trw.com
TRW Conekt - Consultancy in Engineering, Knowledge and Technology
http://www.conekt.net/electronics.html

Article: 107182
"Antti" <Antti.Lukats@xilant.com> writes:
> GaLaKtIkUs™ schrieb:
> > A small question: what does mean gosh?
> >
> > A+
>
> gosh no idea!
>
> maybe is another perfectly perfect word, like "spunk" invented by Pippi
>
> Antti

http://www.thefreedictionary.com/gosh

gosh (gŏsh)
interj. Used to express mild surprise or delight.
[Alteration of God.]

Cheers,
Martin

-- 
martin.j.thompson@trw.com
TRW Conekt - Consultancy in Engineering, Knowledge and Technology
http://www.conekt.net/electronics.html

Article: 107183
On 25 Aug 2006 01:26:55 -0700, "Sylvain Munaut <SomeOne@SomeDomain.com>" <246tnt@gmail.com> wrote:
>Hello,
>
>I need a linear priority encoder that has N input and N outputs.
>Searching the group, I saw a thread where Peter Alfke stated :
>
>--- cut ---
>
>> Let me tell you what can be done in Virtex-4 (probably also in
>> Spartan3):
>> A priority "linear encoder" with 4 x N inputs and 4 x N outputs, each
>> output corresponding to a prioritized input.
>> Only one output is ever active, the one corresponding to the
>> highest-priority active input.
>> Total cost: 5N+1 (LUTs+flip-flops).
>> Such a 32-input linear priority encoder uses 41 LUTs = 21 slices (<6
>> CLBs), and runs at >250 MHz.
>> The design is fully modular (per 4 bits).
>>Peter Alfke
>
>--- cut ---
>
>But no details where given.
>
>Can someone provide more details on how that implemented in a slice ?
>
>Thanks.
>
> Sylvain

Let me try to verbally sketch this out for you for 4 bits:

Imagine two columns of LUTs: one LUT on the left for the 4 inputs, and 4 on the right, one for each output. The left LUT is a 4-input OR, and all the LUTs on the right are AND gates, but with bubbles (inversions) on some inputs. The top right LUT is just an AND of the first input bit and the output of the left LUT. The second LUT on the right takes the first input inverted, the second bit, and the output of the left LUT. The third LUT takes the first two inputs inverted, the third input, and the output of the left LUT. The last LUT takes the first 3 inputs inverted and the output of the left LUT. This gives you a 4-input, 4-output priority encoder with 5 LUTs. Now you have to be able to cascade this, i.e. you have to disable the output of the left (OR) LUT by a higher-up left LUT. For this purpose you can use the carry chain.

HTH.

Article: 107184
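[Editor's note] The 4-bit block and its cascade described above can be checked with a small behavioral model. Here is a minimal Python sketch; the function names and the boolean `enable` standing in for the carry chain are illustrative assumptions, not how the slices are actually wired:

```python
def priority_block_4(bits, enable):
    """One 4-bit block: a left OR LUT plus four right AND LUTs.

    bits   -- four booleans, index 0 has the highest priority
    enable -- carry-in; False once a higher-up block has fired
    Returns (outputs, carry_out): carry_out stays True only while
    no active input has been seen, keeping lower blocks enabled.
    """
    any_active = any(bits)                # the left (OR) LUT
    outputs = [
        enable and any_active and bits[i] and not any(bits[:i])
        for i in range(4)                 # right LUTs: bubbles on higher bits
    ]
    return outputs, enable and not any_active

def linear_priority_encoder(bits):
    """Cascade 4-bit blocks; only the highest-priority active input wins."""
    outputs, enable = [], True
    for i in range(0, len(bits), 4):
        block_out, enable = priority_block_4(bits[i:i + 4], enable)
        outputs.extend(block_out)
    return outputs
```

For example, with 8 inputs and bits 2 and 5 active, only output 2 is asserted, since bit 2 has the higher priority and its block's carry-out disables the second block.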
Martin,

Thanks for the detailed response. OK, we're definitely in the home stretch on this one. To summarize...

>> I'm assuming that the master side address and command signals enter the
>> 'Simpcon' bus and the 'Avalon' bus on the same clock cycle.

> This assumption is true. Address and command (+write data) are
> issued in the same cycle - no magic there.

So Avalon and SimpCon are both leaving the starting blocks at the same time....no false starts from the starting gun.

>> Given that assumption though, it's not clear to me why the address and
>> command could not be designed to also end up at the actual memory
>> device on the same clock cycle.

I don't think your response here hit my point. I wasn't questioning on which cycle the address/command/write data actually got to the SRAM, just that I didn't see any reason why the Avalon or SimpCon version would arrive on different clock cycles. Given the later responses from you though, I think that this is true....we'll get to that.

>> Given that address and command end up at the memory device on the same
>> clock cycle whether SimpCon or Avalon, the resulting read data would
>> then be valid and returned to the SimpCon/Avalon memory interface logic
>> on the same clock cycle.

> In SimpCon it will definitely arrive one cycle later. With Avalon
> (and the generated memory interface) I 'assume' that there is also
> one cycle latency - I read this from the tco values of the output
> pins in the Quartus timing analyzer report. For the SRAM interface I
> did in VHDL I explicitly added registers at the address/rd/wr/data
> output. I don't know if the switch fabric adds another cycle.
> Probably not, if you do not check the pipelined checkbox in the SOPC
> Builder.

Again, when I was saying 'the same clock cycle' I'm referring to clock cycle differences between Avalon and SimpCon.
In other words, if the SimpCon/Avalon bus cycle started on clock cycle 0, then when we start talking about the data from the SRAM arriving back at the input to the FPGA, with both designs it happens on clock cycle 'N'. For the relative comparison between the two busses, I don't much care what 'N' is (although it appears to be either '1' or '2'), just that 'N' is the same for both designs. Again, I *think* you might be agreeing that this is true here, but coming up is a more definitive agreement.

By the way, no, Avalon does not add any clock cycle latency in the fabric. It is basically just a combinatorial logic router as it relates to moving data around.

>> Given all of that, it's not clear to me why the actual returned data
>> would show up on the SimpCon bus ahead of Avalon or how it would be any
>> slower getting back to the SimpCon or Avalon master. Again, this might
>> be where my hangup is but if my assumptions have been correct up to
>> this paragraph then I think the real issue is not here but in the next
>> paragraph.

> Completely agree. The read data should arrive in the same cycle from
> Avalon or SimpCon to the master.

And this is a key point. So regardless of the implementation (SimpCon or Avalon), the JOP master starts the command at the same time for both, and the actual data arrives back at the JOP master at the same time. So the race through the data path is identical....(whew!), now on to the differences.

> Now that's the point where this
> bsy_cnt comes into play. In my master (JOP) I can take advantage of
> the early knowledge when data will arrive. I can restart my waiting
> pipeline earlier with this information. This is probably the main
> performance difference.
To continue the race analogy...So in some sense, even though the race through the data path ends in a tie, the advantage you feel you have with SimpCon is that the JOP master is endowed with the knowledge of when that race is going to end, by virtue of this bsy_cnt signal, and with Avalon you think you don't have this a priori knowledge.

So to the specifics now...I'm (mis)interpreting this to mean that if 'somehow' Avalon could give JOP the knowledge of when 'readdatavalid' is going to be asserted one clock cycle before it actually is, then JOP on Avalon 'should' be able to match JOP on SimpCon in performance, is that correct? (Again, this is a key point; if this assumption is not correct, the following paragraphs will be irrelevant.)

So under the assumption that the key problem to solve is to somehow enable the Avalon JOP master with the knowledge of when 'readdatavalid' is going to be asserted, one clock cycle before it actually is, I put on my Avalon Mr. Wizard hat and say....well, gee, for an Avalon connection between a master and slave that are both latency aware (i.e. they implement 'readdatavalid'), the Avalon specification requires that the 'waitrequest' output be asserted at least one clock cycle prior to 'readdatavalid'. It can be more than one and it can vary (what Avalon calls 'variable latency'), but it does have to be at least one clock cycle. Since the Avalon slave design is under your design control, you could design it to act just this way, to assert 'readdatavalid' one clock cycle after dropping 'waitrequest'. So now, I have my 'early readdatavalid' signal.

Now inside the JOP master, currently you have some sort of signal that I'll call 'start_the_pipeline' which is currently based on this busy_cnt hitting a particular count. 'start_the_pipeline' happens to fire one clock cycle prior to the data from the SRAM actually arriving back at JOP (from the previously stated and possibly incorrect assumption).
My Avalon equivalent cheat to the SimpCon cheating (having a priori knowledge about when the race completes) is simply the following:

start_the_pipeline <= Jop_Master_Read and not(JOP_Master_Wait_Request);

To reiterate, this JOP master side equation is working under the assumption that the Avalon slave component that interfaces to the actual SRAM is designed to assert its readdatavalid output one clock cycle after dropping its waitrequest output. So in some sense now I've endowed the Avalon JOP with the same sort of a priori knowledge of when the data is available that the SimpCon implementation is getting.

And here is another point where I think we need to stop and flat out agree or not agree that:

- My stated assumption holds, that if Avalon were to 'somehow' provide JOP an 'early readdatavalid' signal that was one clock earlier than 'readdatavalid', then the two JOP implementations should have the same performance.

- The implementation of the Avalon slave component and the timing of waitrequest and readdatavalid is entirely doable.

- Given the a priori knowledge that the Avalon slave is now 'known' to have one clock cycle latency (the Avalon definition of latency measuring clock cycle delay from waitrequest to readdatavalid), the equation for start_the_pipeline is doable and correct and will allow JOP on Avalon to get started on whatever it needs to get started on, on the exact same clock cycle as JOP on SimpCon.

Assuming that we agree on those three points, then I think it is safe to say that I didn't really do any worse 'cheating' on Avalon than you did with SimpCon. I'm using standard Avalon signals to accomplish the interface to JOP; I've smartened up the master side to have it 'know' about the latency in just the same way that SimpCon knows about it.

I'm also guessing that there might not even need to be any sort of cheating on Avalon either. By that I mean, what if slower RAM were used and an extra clock cycle or two were needed?
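[Editor's note] The timing relationship being argued here can be sketched as a cycle-by-cycle model. This is a hedged Python illustration of the discussion above; the signal names, the `simulate_read` helper, and the fixed one-cycle gap between dropping waitrequest and asserting readdatavalid are assumptions taken from this thread, not from the Avalon specification itself:

```python
def simulate_read(wait_cycles):
    """Model one read where the slave holds waitrequest for
    `wait_cycles` cycles and asserts readdatavalid exactly one cycle
    after dropping it. Returns the cycle numbers at which the
    hypothetical start_the_pipeline signal and readdatavalid fire."""
    read = True                          # master asserts read at cycle 0
    start_cycle = valid_cycle = None
    for cycle in range(wait_cycles + 2):
        waitrequest = cycle < wait_cycles
        # master side: start_the_pipeline <= read and not waitrequest
        if read and not waitrequest and start_cycle is None:
            start_cycle = cycle
        # slave side: readdatavalid one cycle after waitrequest drops
        if cycle == wait_cycles + 1:
            valid_cycle = cycle
    return start_cycle, valid_cycle

# start_the_pipeline always leads readdatavalid by one cycle,
# which is the same "early knowledge" SimpCon's rdy_cnt provides.
for w in range(4):
    start, valid = simulate_read(w)
    assert valid - start == 1
```

Under these assumptions the master gets its one-cycle head start regardless of how many wait states the slave inserts, which is the crux of the "no worse cheating" claim.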
In the SimpCon design you would need to update the master side logic to tell it about the correct new latency. On the Avalon implementation the slave side 'could' now be asserting readdatavalid 2 or 3 clock cycles after dropping waitrequest (which is entirely permissible and doable), and the magic 'start_the_pipeline' signal could now give JOP an extra clock cycle or two head start on readdatavalid. Whether that works or breaks JOP is something only you would know, but it might be worth pondering a bit. In fact, if it does break JOP, is there some issue with the JOP design? What is magic about getting a one clock cycle head start versus two or three? I'm not really expecting an answer here; it might just be that's the way it is, but I was just playing devil's advocate and seeing if there is some inherent reason why JOP couldn't work with even earlier flags.

Anyway, if it doesn't break JOP, then really the Avalon master side doesn't need a priori knowledge of this *one* clock cycle delay at all; it could be anything. If however it would break JOP to have 'start_the_pipeline' come 2 or 3 clock cycles before readdatavalid, then this simply means that the new Avalon slave interface to the SRAM would have to be redesigned to maintain the one clock cycle Avalon latency between the waitrequest and readdatavalid outputs, but nothing on the master side would need changing. So with SimpCon, the design change required to accommodate the now slower SRAMs would be made in the SimpCon master via the busy_cnt signal; with the Avalon implementation the design change would be made in the slave interface to the SRAM. Either one should be about the same amount of work, I would think.

> As I see it, this can be enhanced in the same way I did the little
> Avalon specification violation on the master side. Use a MUX to
> deliver the data from the input register in the first cycle and
> switch to the 'hold' register for the other cycles.
> Should change
> the interface for a fairer comparison.

I agree, without the mux you've hindered the Avalon implementation. I guess also, to be fair, one would need to look at resource usage of the two designs as well and see how the two compare given supposedly 'equivalent' implementations. Maybe SimpCon has some advantage in that regard. Maybe the muxing costs logic, or maybe it all synthesizes to exactly the same thing. Avalon, being a mostly combinatorial fabric, reduces quite well, but to be honest, the way they keep track of pending reads to slaves that have latency is pretty poor. I don't think it will show up in this case because there just can't be a lot of reads pending, but I had a design where I provided a bridge to a 33 MHz PCI bus which went into another processor which then wrote or fetched data from its DRAM then provided it back to PCI, and there I needed to provide for a fairly hefty number of pending reads (~64 or so I believe) to keep things moving along. It's also where I learned that you really don't want to go overboard with the use of readdatavalid either. Use it where you need the highest performance only; otherwise use waitrequest only. Otherwise the code that SOPC Builder generates has terrible clock cycle performance for general busses with lots of slaves.

> Because rdy_cnt has a different meaning than waitrequest. It is more
> like an early datavalid. Dropping waitrequest does not help with my
> pipeline restart thing.

True, but I believe I've addressed the root of what those differences are.

>>> Enjoy this discussion :-)
>>> Martin
>>
>> Immensely. And I think I'll finally get the light bulb turned on in my
>> head after your reply.

> BTW: As I'm also academic I should/have to publish papers. SimpCon
> is on my list for months to be published - and now it seems to be
> the right time. I will write a draft of the paper in the next few
> days.
> If you are interested I'll post a link to it in this thread
> and your comments are very welcome.

OK.

KJ

Article: 107185
> > Let's say the address/command phase is per definition one cycle.
>
> That definition frees the master to do whatever it wants in the next
> cycle. For another request to the same slave it has to watch for the
> rdy_cnt in SimpCon. However, you can design a switch fabric with
> SimpCon where it is legal to issue a command to a different slave in
> the next cycle without attention to the first slave. You can just
> ignore the first slaves output until you want to use it.

In Avalon this would happen as well. By your definition, the Avalon slave (if it needed more than one clock cycle to totally complete the operation) would have to store away the address and command. It would not assert waitrequest on the first access. If a subsequent access to that slave occurred while the first was still going on, it would then assert waitrequest, but accesses to other slaves would not be hindered. The Avalon approach does not put this sort of stuff in the switch fabric but inside the slave design itself. In fact, the slave could queue up as many commands as needed (i.e. not just one), but I don't get the impression that SimpCon would allow this because there is one rdy_cnt per slave (I'm guessing).

>> The Avalon fabric 'almost' passes the waitrequest signal right back to
>> the master device, the only change being that the Avalon logic basically
>> gates the slave's waitrequest output with the slave's chipselect input
>> (which the Avalon fabric creates) to form the master's waitrequest input
>> (assuming a simple single master/slave connection for simplicity here).
>> Per Avalon,

> I'm repeating myself ;-) That's the point I don't like in Avalon,
> Wishbone, OPB,...: You have a combinatorial path from address
> register - decoding - slave decision - master decision (to hold
> address/command or not). With a few slaves this will not be an
> issue. With more slaves or a more complicated interconnect (multiple
> master) this can be your critical path.
You're right; in fact it most likely will be the critical path. Does SimpCon support different delays from different slaves? If not, and 'everyone' is required to have the same number of wait states, then I can see where SimpCon would have a performance advantage in terms of final clock speed on the FPGA, the tradeoff being that...everyone MUST have the same number of wait states. Whether that is a good or bad tradeoff is a design decision specific to a particular design, so in that regard it's good to have both SimpCon (as I limitedly understand it) and Avalon. If SimpCon does allow for different slaves to have different delays, then I don't see how SimpCon would be any better, since there would still need to be address decoding done to figure out what the rdy_cnt needs to count to and such. Whether that code lives in the master side logic or slave side logic is irrelevant to the synthesis engine.

>> how it appears to me, which is why I asked him to walk me through the

> As described in the other posting:

Yep, got that posting for the blow by blow description.

KJ

Article: 107186
backhus wrote:
> > In my original post I had no intention to reach a common consensus. I
> > wanted to see practical code examples which demonstrate the various
> > techniques and discuss their relative merits and disadvantages.
> >
> > Kind regards,
> > Eli
>
> Hi Eli,
> Ok, that's something different.
> Earns some contribution from my side :-)
>
> My example uses 3 processes.
> The first one is the simple state register.
> The second is the combinatorial branch selection.
> The third creates the registered outputs.
>
> Recognize that the third process uses NextState for the case selection.
> Advantage: Outputs change exactly at the same time as the states do.
> Disadvantage: The branch logic is connected to the output logic, causing
> longer delays.
> Workaround: If a one clock delay of the outputs doesn't matter,
> CurrentState can be used instead.
>
> The only critical part I see is the second process. Because it's
> combinatorial, some synthesis tools might generate latches here when
> the designer writes no proper code. But we all should know how to write
> latch free code, don't we? ;-)
>
> The structure is very regular, which makes it a useful template for
> autogenerated code.
> Have a nice synthesis
> Eilert
>
> ENTITY Example_Regout_FSM IS
>    PORT (Clock : IN  STD_LOGIC;
>          Reset : IN  STD_LOGIC;
>          A     : IN  STD_LOGIC;
>          B     : IN  STD_LOGIC;
>          Y     : OUT STD_LOGIC;
>          Z     : OUT STD_LOGIC);
> END Example_Regout_FSM;
>
> ARCHITECTURE RTL_3_Process_Model_undelayed OF Example_Regout_FSM IS
>    TYPE State_type IS (Start, Middle, Stop);
>    SIGNAL CurrentState : State_Type;
>    SIGNAL NextState    : State_Type;
> BEGIN
>
>    FSM_sync : PROCESS(Clock, Reset)
>    BEGIN  -- CurrentState register
>       IF Reset = '1' THEN
>          CurrentState <= Start;
>       ELSIF Clock'EVENT AND Clock = '1' THEN
>          CurrentState <= NextState;
>       END IF;
>    END PROCESS FSM_sync;
>
>    FSM_comb : PROCESS(A, B, CurrentState)
>    BEGIN  -- NextState logic
>       NextState <= CurrentState;  -- default: hold state (avoids latches)
>       CASE CurrentState IS
>          WHEN Start =>
>             IF (A NOR B) = '1' THEN
>                NextState <= Middle;
>             END IF;
>          WHEN Middle =>
>             IF (A AND B) = '1' THEN
>                NextState <= Stop;
>             END IF;
>          WHEN Stop =>
>             IF (A XOR B) = '1' THEN
>                NextState <= Start;
>             END IF;
>          WHEN OTHERS => NextState <= Start;
>       END CASE;
>    END PROCESS FSM_comb;
>
>    FSM_regout : PROCESS(Clock, Reset)
>    BEGIN  -- Output logic
>       IF Reset = '1' THEN
>          Y <= '0';
>          Z <= '0';
>       ELSIF Clock'EVENT AND Clock = '1' THEN
>          Y <= '0';  -- Default value assignments
>          Z <= '0';
>          CASE NextState IS
>             WHEN Start  => NULL;
>             WHEN Middle => Y <= '1';
>                            Z <= '1';
>             WHEN Stop   => Z <= '1';
>             WHEN OTHERS => NULL;
>          END CASE;
>       END IF;
>    END PROCESS FSM_regout;
> END RTL_3_Process_Model_undelayed;

Hi, Eilert,

I generally use this style but with a different output segment.
I have three output logic templates:

Template 1: vanilla, unbuffered output

-- FSM with unbuffered output
-- Can be used for Mealy/Moore output
-- (include input in sensitivity list for Mealy)
FSM_unbuf_out : PROCESS(CurrentState)
BEGIN
   Y <= '0';  -- Default value assignments
   Z <= '0';
   CASE CurrentState IS
      WHEN Start  => NULL;
      WHEN Middle => Y <= '1';
                     Z <= '1';
      WHEN Stop   => Z <= '1';
      WHEN OTHERS => NULL;
   END CASE;
END PROCESS FSM_unbuf_out;

Template 2: add buffer for output (there are 4 processes now ;-)

-- FSM with buffered output
-- there is a 1-clock delay
-- can be used for Mealy/Moore output
FSM_unbuf_out : PROCESS(CurrentState)
BEGIN
   Y_tmp <= '0';  -- Default value assignments
   Z_tmp <= '0';
   CASE CurrentState IS
      WHEN Start  => NULL;
      WHEN Middle => Y_tmp <= '1';
                     Z_tmp <= '1';
      WHEN Stop   => Z_tmp <= '1';
      WHEN OTHERS => NULL;
   END CASE;
END PROCESS FSM_unbuf_out;

-- buffer for output signal
FSM_out_buf : PROCESS(Clock, Reset)
BEGIN  -- Output logic
   IF Reset = '1' THEN
      Y <= '0';
      Z <= '0';
   ELSIF Clock'EVENT AND Clock = '1' THEN
      Y <= Y_tmp;
      Z <= Z_tmp;
   END IF;
END PROCESS FSM_out_buf;

Template 3: buffer with "look-ahead" output logic

-- FSM with look-ahead buffered output
-- no 1-clock delay
-- can be used for Moore output only
FSM_unbuf_out : PROCESS(NextState)
BEGIN
   Y_tmp <= '0';  -- Default value assignments
   Z_tmp <= '0';
   CASE NextState IS
      WHEN Start  => NULL;
      WHEN Middle => Y_tmp <= '1';
                     Z_tmp <= '1';
      WHEN Stop   => Z_tmp <= '1';
      WHEN OTHERS => NULL;
   END CASE;
END PROCESS FSM_unbuf_out;

-- buffer for output signal
-- same as template 2
FSM_out_buf : PROCESS(Clock, Reset)
.
.
.

The code is really lengthy. However, as you indicated earlier, its structure is regular, and can serve as a template or even be autogenerated. I developed the template based on "http://academic.csuohio.edu/chu_p/rtl/chu_rtL_book/rtl_chap10_fsm.pdf". It is a very good article on FSMs (or a very bad one, if this is not your coding style).
Mike G.

Article: 107187
Hello group,

I would appreciate help in capturing the DMA done interrupt. If anybody could point me to a working example or give any pointers on what I am doing wrong, I would be very grateful.

So, what am I doing? I am using the Xilinx EDK 8.1 tool and have generated a system with a custom peripheral. The custom peripheral was created using the "Create or Import Peripheral" wizard and includes DMA, FIFOs, and user logic interrupt support. However, the interrupt service routine is not called when a DMA transfer completes. Currently I need to poll the interrupt status register (ISR) to see if a DMA transfer has finished.

The user logic stub generated by the wizard generates an interrupt approx. every 10s @ 100 MHz, and when this interrupt occurs the interrupt service routine is called and I see a message on my UART. So I know the service routine is hooked up correctly, and that the interrupt enable register (IER) and the global interrupt enable register (GIER) are set correctly.

I made sure that INCLUDE_DEV_ISC is set to 1 in the VHDL code to make sure the device interrupts are included. And sure enough, when I try to transfer data from an empty FIFO, the service routine is also called because of a transaction error interrupt. So I know the interrupts from the device are enabled and working correctly.

After that I checked that the device interrupt enable register (DIER) is set to enable all interrupts from the device (not only the IPIR bit for the user logic interrupts and the TERR bit for the transaction error, but all of them). Furthermore I enabled the interrupts in the DMA0 and DMA1 interrupt enable registers (DMA0_IER and DMA1_IER). However, the only interrupts that cause a call to my service routine are the timer interrupt from the user logic and the transaction error. I don't know why the service routine is not called when a transfer completes.
I tried finding some working examples, but all the examples I could find so far use polling on the ISR register to wait for the transfer to complete. Any pointers to get me in the right direction would be appreciated.

Thanks,
Martijn

Article: 107188
> Let me try to verbally sketch this out for you for 4 bits:
> Imagine two colums of luts. There is one lut on the left for 4 inputs
> and 4 on the right one for each output. The left LUT is a 4 input OR
> and all the LUTs on the right are AND gates but with bubbles on some
> inputs. The top right LUT is just an AND of top first input bit and
> the output of left LUT. The second LUT on the right has first input
> inverted, second bit and output of left LUT. The third LUT has first
> two inputs inverted, third input, and output of left LUT. The last LUT
> has the first 3 inputs inverted and output of left LUT. This gives you
> 4 input, 4 output priority encoder with 5 LUTs. Now you have to be
> able to cascade this ie you have to disable the output of left (OR)
> LUT by a higher up left LUT. For this purpose you can use the carry
> chain.

Thanks! Exactly what I was looking for ;)

Sylvain

Article: 107189
Martin,

A bit of an amendment to my previous post starting...

KJ wrote:
> Martin,
>
> Thanks for the detailed response. OK, we're definitely in the home stretch
> on this one.

After pondering a bit more, I believe the Avalon slave component to the SRAM should NOT have a one clock cycle delay between waitrequest de-asserted and readdatavalid asserted, since that obviously would stall the Avalon master (JOP) needlessly. Instead the slave component should simply assert waitrequest when a request comes in while it is still busy processing an earlier one. Something along the lines of...

process(Clock)
begin
   if rising_edge(Clock) then
      if (Reset = '1') or (Count = MAX_COUNT) then
         Wait_Request <= '0';
      elsif (Chip_Select = '1') then
         Wait_Request <= '1';
      end if;
   end if;
end process;

where 'Count' and MAX_COUNT are used to count however many cycles it takes for the SRAM data to come back or be written. If the SRAM only needs one clock cycle then the term "(Count = MAX_COUNT)" could be replaced with simply "Wait_Request = '1'".

So now back on the Avalon master side, I can still count on the Avalon waitrequest to precede readdatavalid, but now I've removed the guarantee that the slave will make the delay between the two exactly one clock cycle. To compensate, I would still key off when the JOP Avalon master read signal is asserted and waitrequest is not asserted. In other words the basic logic of my 'start_the_pipeline' signal is OK, but depending on what the actual latency is for the design, maybe it needs to be delayed by a clock cycle or so. In any case, that signal will still provide an 'early' form of the Avalon readdatavalid signal, and I think all of my points in that previous post would still apply. Hopefully you've read this post before you got too far into typing a reply to that post.

After yet more pondering on whether this is 'cheating' on the Avalon side or not, I think perhaps it's not.
The 'questionable' logic is in the generation of the 'start_the_pipeline' signal that keys off of waitrequest and uses it to produce this 'early data valid' signal. But this logic is simply part of what I would consider to be a SimpCon to Avalon bridge. As such, that bridge is privy to whatever signals and a priori knowledge the SimpCon bus specification provides, as well as whatever signals and a priori knowledge the Avalon bus specification provides, and has the task of mating the two. If SimpCon needs an 'early data valid' signal as part of the interface, then it also needs to pony up whatever info the SimpCon master has in regards to knowing ahead of time when that data will be valid... in other words, it would need to know the same thing that you used to generate your rdy_cnt (or busy_cnt, or whatever it was called). So I've basically concluded that while it might appear on the surface to be a 'cheat' to use the Avalon signals as I have, you can only say that if you're looking strictly at Avalon alone. But since the function being implemented is a bridge between SimpCon and Avalon, use of SimpCon information to implement that function is fair game and not a 'cheat'. KJArticle: 107190
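KJ's early-data-valid idea can be sketched in a few lines of VHDL. This is a hypothetical illustration (entity and signal names are mine, not taken from JOP or any Avalon IP): on Avalon, a read asserted while waitrequest is low is known to have been accepted, so a bridge can treat that cycle as the start of a fixed-latency countdown to valid data.

```vhdl
-- Hypothetical sketch of the 'start_the_pipeline' idea discussed above.
-- All names are illustrative, not taken from the thread's actual code.
library ieee;
use ieee.std_logic_1164.all;

entity early_valid is
  port (
    av_read        : in  std_logic;  -- Avalon master 'read'
    av_waitrequest : in  std_logic;  -- from the Avalon slave
    start_pipeline : out std_logic   -- early form of readdatavalid
  );
end entity;

architecture rtl of early_valid is
begin
  -- A read cycle that is not stalled by waitrequest has been accepted;
  -- data will arrive a fixed (design-dependent) number of cycles later,
  -- so this strobe can seed a SimpCon-style rdy_cnt countdown.
  start_pipeline <= av_read and not av_waitrequest;
end architecture;
```

If the slave's latency is more than one cycle, this strobe would simply be delayed through a small shift register, as KJ notes.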
Hi, has anyone ever had something like this: "ERROR:Xst:800 - "C:/MyFolder/control_unit_top.vhd" line 317: Multi-source on Integers in Concurrent Assignment."? It refers to a process line: "control_make_reply: process(ctrl_top_clock)" I'll provide further details if needed, but for this first post I was just wondering if someone has ever seen this when working with ISE 7.1. Thanks, MarcoArticle: 107191
On a sunny day (24 Aug 2006 23:20:57 -0700) it happened fpga_toys@yahoo.com wrote in <1156486857.335345.201150@i3g2000cwc.googlegroups.com>: >There is a reason the software world doesn't allow >software engineers to write production programs in native machine >language ones and zeros, or even use high level assembly language in >most cases .... and increasingly not even low level C code. There are many, many cases where ASM on a microcontroller is to be preferred. Not only for code size, but also for speed, and _because of_ simplicity. For example PIC asm (Microchip) is so simple, and universal, and there is so much library stuff available, that it is, at least for me, the _only_ choice for simple embedded projects. No way 'C'. Yes, hardware engineers sometimes add great functionality with a few lines of ASM or maybe Verilog, cool! I guess it is a matter of learning; I started programming micros with switches and 0010 1000 etc., watching the clock cycles.. you know. Teaches you not to make any mistakes, as re-programming an EPROM took 15 to 20 minutes of erase time first... Being _on_ the hardware (registers) omits the question of 'what did that compiler do', and in many cases gives you more flexibility. C already puts up some barrier; special versions of C for each micro support the special functions in these micros.... But from a 'newcomer' POV perhaps coding part of a project in a higher level language... but hey... In spite of what I just wrote .. anyways, why does everybody all of a sudden want Linux in an FPGA? So they can then write in C? Have it run slower than on a cheap mobo? OK, OTOH I appreciate the efforts toward higher level programming, as long as the one who uses it also knows the lower level. That sort of defuses your argument that 'engineers need less training' or something like that; the thing will have to interface to the outside world too, C or not. 
>The time >for allowing engineers to design hardware at the ones and zeros binary >level is passing too as the tools like System C emerge and produce >reasonable synthesis with co-design. Much more important than _how_ it is coded is the design, the idea, behind it. Yes, one sort of paint may give better results than another when painting the house, but wrong colors will be wrong with all kinds of paint. If it becomes a fine line, like art, then it is the painter, not the paint.Article: 107192
zcsizmadia@gmail.com wrote: > I've created a patch for Impact so it supports Digilent USB. This patch > could be modified to become some kind of SDK for different programmer > cables which are not supported by Impact. > > So here is my survey. What kind of programmer cables would you like to > use with Impact? > > Regards, > > Zoltan > The support of the Amontec JTAGkey would be very nice (based on the FTDI FT2232 - FT2232L to be green). The Amontec JTAGkey is one of the only USB JTAG pods with a very large I/O voltage range (all I/Os can drive at 24 mA)! Go to http://www.amontec.com/jtagkey.shtml Note: We are working on a new version, cheaper but without ESD-EMI overvoltage ... protection, and for 5V to 2.8V only. Coming in the next two weeks. We are working on our own .dll for controlling the JTAGkey, integrating the JTAG layer. Let me know if you want to receive an Amontec JTAGkey as a sample (we will ship it to you without any charge), as this can be of help for your integration. If interested, send me an email to laurent DOT gauch @ amontec DOT com Regards, Laurent www.amontec.comArticle: 107193
Sylvain Munaut <SomeOne@SomeDomain.com> wrote: > > Let me try to verbally sketch this out for you for 4 bits: > > Imagine two columns of LUTs. There is one LUT on the left for the 4 inputs > > and 4 on the right, one for each output. The left LUT is a 4-input OR > > and all the LUTs on the right are AND gates, but with bubbles on some > > inputs. The top right LUT is just an AND of the first input bit and > > the output of the left LUT. The second LUT on the right has the first input > > inverted, the second bit, and the output of the left LUT. The third LUT has the first > > two inputs inverted, the third input, and the output of the left LUT. The last LUT > > has the first 3 inputs inverted and the output of the left LUT. This gives you > > a 4-input, 4-output priority encoder with 5 LUTs. Now you have to be > > able to cascade this, i.e. you have to disable the output of the left (OR) > > LUT by a higher-up left LUT. For this purpose you can use the carry > > chain. > > Thanks ! Exactly what I was looking for ;) > Well, actually I'm having trouble with the chaining ... how do you implement what you describe (masking the lower-priority OR LUTs) when a higher one is active? I have the slice schematic before me and I can't figure that out ... (damn, am I slow today ...) I see that : - I could chain two stages, but no more than two. - I could use a big OR of all the upper-priority groups with the carry chain and then use it to mask the left OR. But then for 3 groups, I would need : - Upper priority group : - Just 1 LUT4 for the OR + 4 LUT4 for encoding - Middle priority group : - 1 LUT4 for the OR + 1 LUT4 for the 'masking' + 4 LUT4 for encoding - Low priority group - 1 LUT4 for the OR + 2 LUT4 for the masking + 4 LUT4 for encoding So that would be 18 LUT4s. And 5*3 + 1 = 16, so there must be something better ;) SylvainArticle: 107194
Antti, The cableserver method sounds much better. In that case the full Digilent USB could be utilized. Next week I'll look into it. Uwe, Yes and no. I use windrvr.h to check which IOCTLs are important to overwrite. The actual LPT I/O communication is very simple. You can see the location of the port and data in the packets. BTW I use strace to log low-level API activity by impact. http://www.bindview.com/Services/RAZOR/Utilities/Windows/strace_readme.cfm ZoltanArticle: 107195
Hi Louis, Did you manage to fix your problem? If so I'd be interested to know the solution, as I'm struggling with the UC2 myself... Anyone else using the UltraController II? I'd be interested to talk with someone who managed to get a system with UC2 + SystemAce running. Patrick louis lin wrote: > Hi Patrick, > > Thank you for your help. > Actually, I patched Answer #23011 before going through the flow > in EDK 8.1i. > I'll try to use uc2.vhd instead of uc2.ngc. > > Regards, > louis > > > "Patrick Dubois" <prdubois@gmail.com> > :1155922197.062518.72270@75g2000cwc.googlegroups.com... > > Hi Louis, > > > > I'm also currently working with the UC2, but on a Virtex-II Pro. > > > > First of all, make sure to replace some files as indicated in Answer > > Record #23011: > > http://tinyurl.com/e7c2g > > > > I also just solved a bug with the UC2 design on the Virtex-II Pro (I > > spent a week on it). It turns out that the file uc2.ngc is buggy. There > > is no error given by the Xilinx tools however (which made it quite hard to > > debug), but the resulting bit file cannot be programmed into the FPGA > > (programming fails in Impact). The solution is to not use uc2.ngc at > > all and instead use uc2.vhd, also provided in the reference design. I > > don't know if the reference design for the V4 has the same issue > > though... > > > > Best of luck, > > > > Patrick Dubois > > > > > > > > louis lin wrote: > > > Has anyone tried the Ultracontroller PROM solution of V4 in EDK 8.1i? > > > > > > The reference example is built with EDK 7.1i. > > > I went through the flow again in ISE 8.1 SP3 + EDK 8.1i SP2. > > > However, the resultant MCS can't program the XC4VFX12 properly > > > (the DONE LED didn't go high). > > > > > > After I got the reference, I only added the -nostartfiles compiler > option to > > > fix > > > the multiple _start problem. > > > > > > Regards, > > > louis > >Article: 107196
Peter Alfke (alfke@sbcglobal.net) wrote: <snip> : Higher performance requires radical innovation and real cleverness : these days. : Peter Alfke Such as this? http://www.tip.csiro.au/ISEC2003/talks/OWe2.pdf JPL and Northrop Grumman built a 5k-gate 8-bit CPU running at 20 GHz by using superconducting logic on a chip. It needs helium-cycle cryogenics to reach 4.5 K, but on the other hand it doesn't generate much heat, being superconducting... I'd have thought gate arrays would make an excellent tool for investigating the technology... cdsArticle: 107197
Antti wrote: > GaLaKtIkUs schrieb: > > > In the Virtex-4 user guide (ug070.pdf p.365 table 8-4) it is clearly > > indicated that for INTERFACE_TYPE=NETWORKING and DATA_RATE=SDR the > > latency should be 2 CLKDIV clock periods. > > I instantiated an ISERDES of DATA_WIDTH=6 but I see that valid output > > appears on the next CLKDIV rising edge. > > Any explanations? > > > > Thanks in advance! > > advice: don't believe the simulator, it's not always correct. > place the iserdes and chipscope ILA into a dummy toplevel, load some FPGA > and look what happens in real silicon. > > Antti Unfortunately the tests on the board using Chipscope gave the same results as in simulation. I looked for information on this issue on Xilinx's site but I didn't find anything. So I assume the issue is that I didn't understand table 8-4 in the Virtex-4 User Guide. If you can help, you're welcome (I can send you the simulation/implementation files I used). I'm going to run the same simulations/on-board tests as described a few posts higher, but for word lengths > 6, i.e. where 2 ISERDES are needed. CheersArticle: 107198
jacko wrote: > pin compatibility is just customer support, how about a 1 pin high > implies a self program from a small hardwired rom, which gets enough of > the chip off the ground, to work as a programmer for itself and others. > > We have had that since the beginning, 20 years ago. It is called "Master Mode Configuration". Peter Alfke, XilinxArticle: 107199
Totally_Lost wrote: > Austin Lesea wrote: > >>There is no such thing as "over-clocking" a FPGA > > > Since Austin is technically and English-language challenged, here is an > aid to decrypting this bull shit claim that Austin proudly would object > that it's only ammonium nitrate .... hehehehe > > The Free On-line Dictionary of Computing (27 SEP 03) [foldoc] > > overclocking > > <hardware> Any adjustments made to computer hardware (or > software) to make its CPU run at a higher clock frequency > than intended by the original manufacturers. Typically this > involves replacing the crystal in the clock generation > circuitry with a higher frequency one or changing jumper > settings or software configuration. > > If the clock frequency is increased too far, eventually some > component in the system will not be able to cope and the > system will stop working. This failure may be continuous (the > system never works at the higher frequency) or intermittent > (it fails more often but works some of the time) or, in the > worst case, irreversible (a component is damaged by > overheating). Overclocking may necessitate improved cooling > to maintain the same level of reliability. > > (1999-09-12) > Mr Lost, >> So ... you are claiming any valid design will run in any Xilinx FPGA >> at max clock rate? I'm afraid you don't know what you are talking about as far as FPGAs go. The clock rate for an FPGA design depends heavily on the design. It is not like a microprocessor, where you have a fixed hardware design that has been characterized to guarantee running at a specific clock rate. Instead, it is up to the FPGA user to perform a timing analysis on his design to determine what the maximum clock rate for that design is. The max clock rate depends on the logic and routing delays for that design. As part of the due diligence for the design, the designer needs to perform a timing analysis, which in turn gives a minimum clock cycle time for which the design is guaranteed to work. 
Overclocking then only makes sense in the context of that design. If you clock it faster than the minimum cycle time found in the timing analysis, then you are overclocking the design. This is usually considered poor form for hardware design, but it can certainly be done if you are aware of the risks. That said, in laboratory conditions, FPGA designs can usually be overclocked by 10 or 15% beyond the max clock frequency for that design as found in the timing analysis. The maximum toggle rate in the data sheets only tells you what the flip-flops in the fabric are capable of doing reliably over the temperature range. That doesn't take into account the propagation delays for the routing or combinatorial logic surrounding those flip-flops.... those parameters, which to the user are far more important than the max toggle rate (that number is mainly for the benefit of export restrictions), weigh heavily on the specification for the user's design. Attempting to use the max toggle rate of the flip-flops to define overclocking would be like trying to define the overclocking of a CPU in terms of the switching time of a single transistor on the die rather than the aggregate that comprises the useful circuit. The difference that is probably confusing you is that the CPU is characterized as a completed circuit design, similar to doing the timing analysis on a placed-and-routed FPGA design. Overclocking, then, is clocking the design at a clock rate faster than the one the design was intended for. Overclocking does not make sense outside the context of a specific design.
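To make the distinction concrete: with ISE-era Xilinx tools, a designer states the intended clock rate as a PERIOD constraint in the UCF file, and static timing analysis then reports whether the placed-and-routed design meets it. A minimal sketch (net and timegroup names are illustrative):

```
NET "clk" TNM_NET = "clk_grp";
TIMESPEC "TS_clk" = PERIOD "clk_grp" 10 ns HIGH 50%;
```

If the timing report then shows a minimum achievable period of, say, 11 ns, running that design at 100 MHz is overclocking it, regardless of what the data sheet's flip-flop toggle rate says.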