Messages from 109025

Article: 109025
Subject: Re: New Lattice 32-bit Embedded Microprocessor Available Through Unique Open Source License
From: "Jon Beniston" <jon@beniston.com>
Date: 20 Sep 2006 03:42:19 -0700


> I worked before with the NIOSII and a price tag was on the tool. As I
> can see it here there is not much difference with the Lattice approach.
> Even if someone finds later a 3rd party low cost board and debugger you
> still need to buy the ispLEVER software for $595.
> Or is there a complete free solution with the LatticeMico32?

Lattice give you the RTL for free and you are allowed to use it on any
device, including non-Lattice devices. i.e. you don't have to buy
ispLEVER or any Lattice FPGAs.

In contrast, the NIOS and MicroBlaze RTL costs tens of thousands of
dollars, and I believe you are only able to use it either in an ASIC or
on their devices.

Cheers,
Jon


From: "John Adair" <removethisthenleavejea@replacewithcompanyname.co.uk>
Newsgroups: comp.arch.fpga
Subject: Re: USB programming cables
Date: Wed, 20 Sep 2006 11:43:18 +0100
Message-Id: <1158749001.9836.0@proxy01.news.clara.net>
Xref: prodigy.net comp.arch.fpga:119927

They will use SPI prom for loading. The FPGA will be a Spartan-3E in a CP132 
package.

John Adair
Enterpoint Ltd. - Home of Broaddown2. The Ultimate Spartan3 Development 
Board.
http://www.enterpoint.co.uk

"radarman" <jshamlet@gmail.com> wrote in message 
news:1158670362.300984.116900@m73g2000cwd.googlegroups.com...
> John Adair wrote:
>> The Craignell family is a set of DIL-style, 5V-tolerant modules based on
>> Spartan-3E. Mainly aimed at obsolete component replacement, they can also
>> be used for hobby electronics due to their mechanical pitch. I should have
>> said there are actually 3 members of this family to release, supporting
>> DIL28, DIL32 and DIL40.
>>
>> John Adair
>> Enterpoint Ltd. - Home of Broaddown4. The Ultimate Virtex-4 Development
>> Board.
>> http://www.enterpoint.co.uk
>>
>> "Simon" <news@gornall.net> wrote in message
>> news:2006091710251416807-news@gornallnet...
>> > On 2006-09-17 01:03:05 -0700, "John Adair" <g1@enterpoint.co.uk> said:
>> >
>> >> Did you mean the Tarfessock1?
>> >>
>> >> We have slipped a little due to the amount of customer work that has
>> >> come in over the summer, but it will likely be available in
>> >> approximately 3-4 weeks. Our main limitation is the arrival of the
>> >> cardbus frames and covers, which we are waiting for.
>> >
>> > No, I meant the Darnaw1 (S3S1200E module, yes ?) - it's for a different
>> > project than the V4FX board :-)
>> >
>> >> Darnaw1 and Craignell1/2 are roughly on the same timescale.
>> >
>> > I'm assuming the casual mention of a completely unknown (at least to
>> > me, and I can't find it on your site :-) board is a marketing ploy
>> > designed to tease out the question: "What is Craignell1 and 2 ?". Ok
>> > then [grin], dish!
>> >
>> > Simon
>> >
>
> Any info on the Craignell boards would be nice. Will these have
> built-in platform flash?
>

Article: 109026
Subject: Re: DDR2 Memory Controller : IOSTANDARD
From: Aurelian Lazarut <aurash@xilinx.com>
Date: Wed, 20 Sep 2006 11:46:06 +0100

Zyan,
please make sure you don't have two assignments in the UCF for the same
pin; only the last one in the file will be used.
Aurash
zyan wrote:

>Thanks.
>
>I only have 2 types of IOSTANDARD in the same bank, SSTL18_II_DCI and DIFF_SSTL18_II_DCI. Both require 1.8V. However, I found that ISE had assigned LVCMOS25 to the pin that I had specified as DIFF_SSTL18_II_DCI. This ended up with both 2.5V and 1.8V in the same bank. How can I fix this problem?
>
>Thanks.
>  
>


-- 
 __
/ /\/\ Aurelian Lazarut
\ \  / System Verification Engineer
/ /  \ Xilinx Ireland
\_\/\/
 
phone:	353 01 4032639
fax:	353 01 4640324

Article: 109027
Subject: Re: Lattice .bit file format
From: Johannes Hausensteiner <johannes.hausensteiner@pcl.at>
Date: Wed, 20 Sep 2006 13:53:42 +0200

Works great! thanks!

Antti wrote:
> Johannes Hausensteiner schrieb:
> 
>> Hi,
>> Thank you for your hint. Unfortunately I am not able to find the
>> mentioned reference design. Can you provide a link?
>> Thanks again,
>>
>> Johannes
>>
>> Antti wrote:
>>> Johannes Hausensteiner schrieb:
>>>
>>>> Hi,
>>>>
>>>> Does anybody know the internal structure of a Lattice .bit file? I
>>>> do a CPU design including RAM and ROM for the CPU inside the FPGA.
>>>> I am currently working on the firmware and every time I make a new
>>>> version of the CPU program I have to integrate it into ispLEVER and
>>>> recompile the whole FPGA to generate the .bit file, which takes
>>>> about 8 minutes on my PC, which is a real pain.
>>>> Is there any way to insert the contents of the ROM (implemented as
>>>> EBR blocks) into the .bit file directly?
>>>> Or is there any other way to do this?
>>>> I am using a LFECP33E chip on an "HPEmini" development board.
>>>>
>>>> Thanks a lot,
>>>>
>>>> Johannes
>>> There is a network reference design on the Lattice website with a Z80
>>> CPU and networking stuff; I think it includes some script to update
>>> the .bit file as well.
>>>
>>> Antti
>>>
> http://www.latticesemi.com/products/intellectualproperty/ipcores/trispeedethernetmediaacce/trispeedethernetmacdemofo.cfm
> 
> Antti
>

Article: 109028
Subject: Re: Tools that support ECO
From: "Jon Beniston" <jon@beniston.com>
Date: 20 Sep 2006 04:54:44 -0700


gen_vlsi wrote:
> HI,
>
>       Can anyone tell me the tools that support ECO (Engineering Change
> Order).

Quite a few FPGA synthesis tools support incremental compilation of
some variety, if that's what you are getting at.

Cheers,
Jon

Article: 109029
Subject: Xilinx System Generator -> Block RAM
From: aphedorov@yandex.ru
Date: 20 Sep 2006 05:06:30 -0700

Could anybody explain the algorithm that Xilinx System Generator uses
to generate RAM (ROM) modules?

I need approximately 16000 12-bit words stored in a high-speed ROM. I
think the optimal solution is to use twelve 16k x 1-bit primitives
with full pipelining. But System Generator's ROM block does not offer
any optimization settings. It generates a strange architecture that
uses only 11 RAMB16s but is extremely slow. The generated ROM contains
eight non-pipelined 9-bit-wide RAMB16s with additional LUTs and
registers, plus three pipelined 16k x 1-bit RAMB16s.

Is there any workaround to generate the required ROM architecture (not
manually)?

Xilinx System Generator version is 8.1.

PS Another oddity is the latency of the generated ROM. When I set
Latency=5 I see the generic "latency => 4" in the generated ROM
instantiation (in VHDL). But when I look at the generated .edn I don't
see any additional registers. There are only non-pipelined RAMB16s + FDE
(latency is 2) or pipelined RAMB16s (latency is also 2).

PPS When I say "pipelined" I mean that the property DOA_REG is set to
(string "1") in the generated .edn core. And when I say "not pipelined"
I mean that the property DOA_REG is set to (string "0").

Article: 109030
Subject: Re: FPGA : Open core FFT
From: bijoy <pbijoy@rediffmail.com>
Date: Wed, 20 Sep 2006 05:19:05 -0700

Hello ZHAO Ming

Thank you for your mail.

I have taken the FFT core from the OpenCores website

 <http://www.opencores.org/cvsweb.shtml/cfft/>

Then I configured it for 1024 points and 16-bit width. I modified the test bench to send my input signal (a sine wave). What I observe is a magnitude peak at the correct bin, but there are significant peaks in nearby bins as well, and the phase response shows a 180-degree change for some bins compared to the MATLAB values.

Could you please send the exact files that you used?

rgds bijoy

Article: 109031
Subject: i2c,ahb,apb
From: "vits" <vittal.patil@gmail.com>
Date: 20 Sep 2006 05:24:00 -0700

Hi,
I came across these buses: I2C, AHB and APB. What is the difference
between them?
I don't know about AHB or APB; I have just started reading about I2C.
I have to test the I2C bus in Verilog. In the I2C bus DUT (design under
test) I saw something like a Wishbone interface or an AHB interface.
What does that mean? Please explain it to me in detail, or give some
links. I am exploring them too.
Thanks,
Vittal

Article: 109032
Subject: Re: Buffering the critical path.
From: "vssumesh" <vssumesh_asic@yahoo.com>
Date: 20 Sep 2006 05:24:33 -0700

Göran Bilski wrote:

> What is the logic you do after the addition?
>
> If the logic is at all related to the carry-out of the addition, I tend to
> extend the carry-chain with more logic instead of doing it after the
> addition. It's easy to do AND/OR boolean expressions using the carry-chain.
Hi Goran,
The combinational logic after the addition includes saturation logic
that checks the 37 bits and, based on that, outputs either constant
values or, if there is no saturation condition, the accumulated data. The
bit-checking operation is an AND and OR operation, e.g. AND bits 31 to 16,
and if the result is one then, based on some other parameters, output the
saturated value, etc. So it is an AND/OR array, then decision logic, and
after that a mux. After all these operations there is a shifting operation
that formats the data to write to RAM locations (there is a special format
for writing to the RAM). The saturation operation is based on the addition
result. Is it possible to optimize such an operation as you said? If yes,
please show me an example of how to implement that in HDL.
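
One possible shape for such a path, as a minimal sketch and assuming one extra clock of latency is acceptable: register the raw sum first, then saturate the registered value in the following cycle, so the adder and the saturation logic no longer sit in the same clock period. The entity name sat_acc, the 36-bit operands and the 32-bit saturated output are hypothetical; the real widths, saturation rules and RAM formatting are not known from the description above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: widths and names are assumptions, not the poster's design.
entity sat_acc is
    port (
        clk : in  std_logic;
        a   : in  signed(35 downto 0);
        b   : in  signed(35 downto 0);
        q   : out signed(31 downto 0)
    );
end entity sat_acc;

architecture rtl of sat_acc is
    -- Pipeline register between the adder and the saturation logic
    signal sum_r : signed(36 downto 0) := (others => '0');
begin
    process (clk)
        variable upper : signed(5 downto 0);
    begin
        if rising_edge(clk) then
            -- Stage 1: addition only (sign-extended so the sum cannot overflow)
            sum_r <= resize(a, 37) + resize(b, 37);

            -- Stage 2: saturate the sum registered on the previous clock.
            -- The value fits in 32 bits only if bits 36..31 are all equal.
            upper := sum_r(36 downto 31);
            if upper = 0 or upper = -1 then
                q <= sum_r(31 downto 0);
            elsif sum_r(36) = '0' then
                q <= (31 => '0', others => '1');  -- most positive 32-bit value
            else
                q <= (31 => '1', others => '0');  -- most negative 32-bit value
            end if;
        end if;
    end process;
end architecture rtl;
```

Whether the extra register is affordable depends on what else consumes the result; any downstream logic has to account for the added cycle.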


Peter Alfke wrote:
> Sumesh, I have a special place in the dungeon for people who ask
> questions where they leave the most important details out, and tell us
> afterwards. "O, by the way..."
Peter, I am not that type. I am coming to you people for advice, so I
won't behave like that. I am working for a company, so I won't be able
to reveal the full system, but I think I can explain it to the extent
that you can understand the problem. Also, my English is weak, so I may
not always find the correct sentence for what I am trying to express.
Please forgive me if that happened.
> You started mentioning address and long carry chains, which -as we know
> by now- are completely irrelevant to your problem
I know that the routing delay for a carry chain is very small, but I
thought that its many steps were confusing the tool.
That's why I mentioned it.

> I think Ray has the best possible advice, but I do not see an easy
> solution. Look at how you arrange your Dual-Port RAMs, and how you can
> exchange data between them.
Yes, I am trying that, modifying the area placements etc.
One question on that: what margin should I give the tool for the
timing constraints? E.g. if the working frequency is 80 MHz (12.5 ns), is
it enough to give a 12 ns constraint to the critical path, or 24 ns for
a double-cycle path?

> Have you looked at Virtex-5LX devices?
No, I have to complete the design in a V4LX60.

Also, please advise me on breaking the logic path. Will that be good
for timing?

Article: 109033
Subject: Re: synchronous clocks
From: Brian Drummond <brian_drummond@btconnect.com>
Date: Wed, 20 Sep 2006 13:39:57 +0100

On Tue, 19 Sep 2006 20:11:57 -0700, zohair <szohair@gmail.com> wrote:

>I have a single input clock in my design, and it is used to drive clock enables to the entire design. The clock enables are user-programmable as a divider function of the input clock frequency. A second clock is generated on-chip by a hand-coded divider circuit (not a DCM), and this clock needs to be declared as synchronous to the input clock. Additionally I want the tool to time paths between flops across these synchronous domains - it is not doing this by default.
>
>I understand it is not a recommended design practice to build your own divider, we're supposed to use a DCM, but we are prototyping an ASIC, and the divider is a part of that logic that needs to be prototyped.
>
>Does anyone know how I can tell the ISE tool that: (1) these two clocks are synchronous (2) the tool should minimize the skew between these clocks, i.e. balance the clock tree (3) the tool should time paths crossing the two synchronous clock domains

If the issue is that the ASIC tool can generate the two clocks
synchronously (by balancing the clock trees) but the FPGA can't, then
why not put a compensating delay in the HF clock? For example by using a
DCM in phase shift mode, to match the delay of the divider - and use the
delayed clock for the remainder of the logic.

Wrap the delay in a "clock buffer" block so that it conceptually matches
the reality of the ASIC.

The delay can either be fixed (in which case you'll need to know it in
advance, or iterate to find it) or adjustable (which would require
differing connections from the ASIC). Alternatively use a DCM to
multiply up the divided clock to regenerate the divider's input, but
synchronised to the divider output using clkfb.

- Brian

Article: 109034
Subject: Re: VHDL oddity
From: michaelrbodnar@gmail.com
Date: 20 Sep 2006 05:47:39 -0700

Alex,

If I'm reading this correctly, it seems that column_out has no
'initial' state in HW.  As far as I know, setting default values in
signal declarations (i.e. what you've done) works fine in a simulator,
but this information gets lost in synthesis, and thus HW (though there
are ways around it via UCFs, etc.).

You are therefore trying to read a signal that hasn't been set to any
state, and increment it by 1.  Think of it as:

 column_out <= "XXXX"+1;

"column_out" must be set to some initial value i.e. include a reset
signal in your entity/architecture.

Ex.

entity ...
port (
    ...
    reset : in std_logic;
    ...
);

architecture
...
begin

    process (clk, reset)
    begin
        if reset = '1' then
             column_out <= (others => '0');
        elsif rising_edge(clk) then
             ...

Hope this helps, and I hope I didn't miss anything in previous posts.
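
Filled in as a complete design, the reset pattern above could look like the following sketch. The entity name column_counter is made up for illustration, and the 10-bit width is taken from the column_out declaration in the original post; the rest of the original driver logic is left out.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical stand-alone counter showing an explicit reset value.
entity column_counter is
    port (
        clk        : in  std_logic;
        reset      : in  std_logic;
        column_out : out std_logic_vector(9 downto 0)
    );
end entity column_counter;

architecture rtl of column_counter is
    signal count : unsigned(9 downto 0);
begin
    process (clk, reset)
    begin
        if reset = '1' then
            -- Asynchronous reset gives the register a known starting state
            count <= (others => '0');
        elsif rising_edge(clk) then
            count <= count + 1;
        end if;
    end process;

    column_out <= std_logic_vector(count);
end architecture rtl;
```

In simulation, count stays at 'U' until reset is asserted, which makes a missing reset easy to spot.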

Article: 109035
Subject: Re: maximum life of FPGA based products ????
From: aholtzma@gmail.com
Date: 20 Sep 2006 06:18:24 -0700

Was this just a problem with the 2v6000? I'm sure there are millions of
V-II designs out there that expect their 2x DCM clocks to continue to
be phase aligned.

cheers,
aaron

John Adair wrote:
> This may be the result of a silicon mask change that came in. Contact
> your FAE and they should be able to advise in more detail. I believe
> that for customers that need this feature there may be a special order
> code to use.
>
> John Adair
> Enterpoint Ltd.
>
> mh wrote:
> > hi
> >
> > Manufacturers claim that SRAM based FPGAs can be used in state of the
> > art products and perform well over a long period of time, but my
> > experience negates that.
> >
> > Two years ago, we developed a "real time face recognition system" using
> > a Xilinx XC2V6000 on an AVNET development kit and delivered the final
> > working product to the end user, who recently complained that
> > the product is not functioning properly. Our technical staff identified
> > that the DCM output (2X clock) is no longer in sync with other clocks on
> > the board. The same old bitstream worked when we changed the board,
> > and we concluded that the board is faulty.
> >
> > The faulty board worked perfectly fine again when we used the
> > phase-shifted DCM output clock (CLK2X180_OUT).
> >
> >
> >
> > 1-- Is the maximum working life of FPGA based products dependent on the
> > device and oscillator characteristics?
> >
> > 2-- What if the same problem arises in a commercially manufactured product
> > (of which millions of pieces have been sold)?
> >
> >
> > Any comments from more experienced users?
> > 
> > 
> > regards
> > MH

Article: 109036
Subject: Xilinx PowerPC slower than FPGA Design?
From: "Peter Kampmann" <peter.kampmann@googlemail.com>
Date: 20 Sep 2006 06:22:47 -0700

Hi everyone,

I've built a custom peripheral and now want to insert new data into my
design via the PowerPC every clock cycle.

My problem is: how do I detect the next clock cycle from the PowerPC?
I've read the tutorial by Richard Griffin, who suggests adding an
additional register that counts down from a given number, to
synchronize with your peripheral.

As I want to use the clock cycle for the next input, I connected the clock
signal to a bit of my output registers (using the OPB bus with User Logic
S/W Register Support on a Virtex-II Pro). I then tried to detect the rising
and falling edges of the clock by reading from the register via the
<PERIPHERAL>_mReadReg(....) command.

But unfortunately, I never get the rising edge of the clock :(
The PowerPC, which runs the C++ code that reads the register, runs at
300 MHz. The synthesis of my custom peripheral reported a frequency of
about 150 MHz.
The custom peripheral is attached to sys_clk_s.

I assume that the PowerPC should run significantly faster than the
peripheral, but it seems that both run with the same clock.

Hope I have described my problem clearly.

Thanks for any help!

Regards,
Peter

Article: 109037
Subject: Re: Xilinx PowerPC slower than FPGA Design?
From: "Kyle H." <kyle.hable@gmail.com>
Date: 20 Sep 2006 06:31:51 -0700


Peter,

What's the clock speed of your OPB BUS?

I think I know the answer to this, I just don't know how to explain it.
 I'm a bit new :)

Regards,
Kyle

Article: 109038
Subject: Re: Buffering the critical path.
From: "KJ" <Kevin.Jennings@Unisys.com>
Date: 20 Sep 2006 06:40:11 -0700

vssumesh wrote:
<big snip>

> Also, please advise me on breaking the logic path. Will that be good
> for timing?

From perusing this thread, you seem to be both overanalyzing and
underanalyzing the problem.  By 'overanalyzing' I mean that you
originally homed in on the carry chain as being a possible contributor
to your problem and started wondering about ways to improve it; by
'underanalyzing' I mean that you didn't just look in the final fitted
report for the worst case timing paths to either confirm or deny this
to be the case.

The process for improving timing is relatively straightforward but
is still somewhat of an art.  The straightforward part of the process is
simply...

1. Identify all of the timing paths that are failing.  This should come
out of the timing report not out of any guessing.
2. Understand what the root cause of the timing problem is for each
failing path (if there aren't too many) or the top problems (if there
are a lot).
3. Decide what to do to fix the problem (more on that later).
4. Implement the fix, run through the tools and go back to step 1.
Repeat until there are no failing paths.

Step #3 "Decide what to do..." is where the art comes in and probably
you are the only one who knows what can be traded off in your
particular application.  It's also somewhat of an art in somehow
knowing ahead of time which tricks will be best in a particular
situation.  Another word for 'art' though would be simply 'experience'.
 Even without this you can still do it, it just may take you longer.

Here are some of the tricks of the trade that you'll need to decide for
yourself on how to apply.  Some may be applicable to your design, some
may not.  There are other tricks that I'm sure other people can suggest
as general improvements but without more detail on your code I don't
think anyone will necessarily be able to give specific guidance on your
design.

Like I said, the list below are just some samples of things, some or
none of which may apply to your design...but I'm guessing that some
will.

1. Better algorithms.  This tends to either be a big
improvement or not be applicable at all.  An example here could be
to use a linear feedback shift register instead of a counter.  The
improvement here is that there is no carry chain to worry about,
performance is basically independent of the size of the counter.  Also
logic resource scales less rapidly than with a binary counter.
Whatever your 'and/or' logic is that you have after your adder might be
worth inspecting to see if there is a better overall algorithm.

2. Increase latency.  Find some point in the logic where you can
register signals without 'too much' problem and that also breaks up the
failing timing path (as determined from #1 at the start of the post)
roughly into equal timing lengths.  Generally when you delay things by
a clock cycle there is something else that is affected by this so you
need to change whatever that is as well.  How many and how difficult
those 'other' things are is how you decide if it's 'too much of a
problem' to break the chain at a certain point.  In some cases, there
might be no choice, you have to break the chain at some point and there
are several things that get affected by this and you just have to code
to it.  This general technique is also called 'pipelining'.

3.  Relax clock cycle requirements.  Do things really need to be done
in one clock cycle or can they be done in two (or more)?  The default
static timing analysis works on the assumption that the combinatorial
logic between flops must be completed within one clock cycle.  If your
input isn't changing that fast and you can take more than one clock cycle
in your application then you simply need to specify to the synthesis
tool which paths those are and how many clock cycles the computation
can take.

4.  Transform operations that depend on two operands into one that
depends on a single operand and a constant.  This is somewhat of a more
specific example of a 'better algorithm'.  An example of this would be
let's say you have a counter that needs to count up to a programmable
number of counts and then reset.  One way to do this would be to
provide a writable register that contains the 'stop count'.  Reset the
counter to 0 and count up until you get to 'stop count'.  If you do
that you'll find the critical path to be the statement "if Count =
Stop_Count then....".  If 'Count' and 'Stop_Count' are 10 bit numbers
then the '=' function will take as input 20 signals.  A different
approach would be to preload the counter with 'stop count' and then
count down until you get to 0.  Now you will be comparing count to a
hardcoded number (i.e. 0) which means the '=' function will only take
as input 10 signals.  Most likely fewer logic resources and better
timing.

5.  See if operations can be moved from one clock cycle to another.
This one is a bit tricky to explain and falls under what synthesis
tools would call 'register retiming' (which may or may not be happening
already on your design) but the idea here is that let's say there is
already a three clock cycle pipeline where stuff on clock cycle 2 is
where the critical path is but the flops that are doing the clock cycle
1 stuff get done really quickly.  It might be possible to move some of
the computation from clock cycle 2 into clock cycle 1 slowing down the
timing paths in clock cycle 1 (but it was 'fast' to begin with) and
improving your critical timing path in clock cycle 2.  The 'simplest'
way to approach this would be to read up on 'register retiming' (if
your tool has this) and try it out and hope that it can spot ways to do
this movement without further involvement on your part.  If it can help
you, it is usually just a matter of adding some extra flops to your
source code and let the tool do the hard work.  If it doesn't then
you'll need to sit down and figure out for yourself how to move things
from one clock cycle to another to spread out the load a bit.
Unfortunately, I can't give you a good specific example here; it is very
application specific.
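
Trick 4 above (compare against a constant instead of against a second operand) might look like the following sketch. The entity name, the port names and the 10-bit width are hypothetical, chosen only to match the numbers used in the description; this is one way to write it, not a definitive implementation.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical programmable-period counter that counts DOWN to zero.
entity down_counter is
    port (
        clk        : in  std_logic;
        load       : in  std_logic;             -- synchronous (re)load
        stop_count : in  unsigned(9 downto 0);  -- programmable register value
        done       : out std_logic              -- one-clock pulse on each wrap
    );
end entity down_counter;

architecture rtl of down_counter is
    signal count : unsigned(9 downto 0) := (others => '0');
begin
    process (clk)
    begin
        if rising_edge(clk) then
            if load = '1' then
                count <= stop_count;
                done  <= '0';
            elsif count = 0 then
                -- Comparison against the constant 0: only the 10 bits of
                -- 'count' feed the "=" function, not 20 signals.
                count <= stop_count;
                done  <= '1';
            else
                count <= count - 1;
                done  <= '0';
            end if;
        end if;
    end process;
end architecture rtl;
```

Without load asserted, the counter wraps every stop_count + 1 clocks, so the value written to the register may need adjusting by one compared with an up-counter design.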

I've given the examples for each type of trick not because I have
any reason to think they will or won't apply to your design, but as
specific illustrations of the general thing that you're trying to do for
each point.  You need to analyze your failing paths and figure out
which basic idea might be applicable and then develop the code specific
to your design and go through the 4 step process.
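
As an illustration of the LFSR idea from trick 1, here is a minimal sketch of a 10-bit linear feedback shift register. The entity name, the seed value and the choice of feedback taps (stages 10 and 7, a commonly tabulated maximal-length pair for 10 bits) are assumptions made for the example. Note that the state sequence is pseudo-random rather than binary, so it only replaces a counter where the count order does not matter.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 10-bit Fibonacci LFSR; cycles through 1023 non-zero states.
entity lfsr10 is
    port (
        clk   : in  std_logic;
        reset : in  std_logic;
        q     : out std_logic_vector(9 downto 0)
    );
end entity lfsr10;

architecture rtl of lfsr10 is
    signal sr : std_logic_vector(9 downto 0);
begin
    process (clk, reset)
    begin
        if reset = '1' then
            -- Any non-zero seed works; the all-zeros state would lock up
            sr <= "0000000001";
        elsif rising_edge(clk) then
            -- Shift left by one; the new bit is stage 10 XOR stage 7
            -- (indices 9 and 6). No carry chain is involved.
            sr <= sr(8 downto 0) & (sr(9) xor sr(6));
        end if;
    end process;

    q <= sr;
end architecture rtl;
```

The logic per bit is a single flop plus one XOR for the feedback, which is why the speed is essentially independent of the register length.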

KJ

Article: 109039
Subject: Re: Xilinx PowerPC slower than FPGA Design?
From: "Peter Kampmann" <peter.kampmann@googlemail.com>
Date: 20 Sep 2006 06:45:43 -0700

The OPB bus clock is attached to sys_clk_s. In the Base System Wizard,
the bus clock frequency is set to 100 MHz and cannot be set higher.

So the problem is perhaps that the bus is clocked too slowly? It seems
that I cannot set the frequency higher, because when using the PowerPC
in 300 MHz mode, I cannot choose any other frequency in the Base System
Wizard.
When I try to set the clock to the same clock as the PowerPC (proc_clk_s)
I get an error that the RS232 won't work with that frequency.
Perhaps I have to use the PLB bus then? Because it's a faster
bus?

Regards,
Peter

Kyle H. schrieb:

> Peter,
>
> What's the clock speed of your OPB BUS?
>
> I think I know the answer to this, I just don't know how to explain it.
>  I'm a bit new :)
> 
> Regards,
> Kyle

Article: 109040
Subject: Re: Xilinx PowerPC slower than FPGA Design?
From: "Antti" <Antti.Lukats@xilant.com>
Date: 20 Sep 2006 06:49:25 -0700

Peter Kampmann schrieb:

> Hi everyone,
>
> I've built a custom peripheral and now want to insert new data into my
> design via the PowerPC every clock cycle.
>
This, if possible at all, is very tricky.
The _only_ part that is actually able to run at 300MHz is the PPC processor
hard macro itself; all the buses can be run at about 100MHz. So
whatever peripherals you have, they are on 100MHz buses.

There may be some tricks in V4, like using the FCB bus or having
special peripherals on the OCM bus, but that's all rather complicated
business.

Whatever is on PLB or OPB runs at about 100MHz max.

Antti

Article: 109041
Subject: Re: VHDL oddity
From: "Andy" <jonesandy@comcast.net>
Date: 20 Sep 2006 06:53:53 -0700

#1 is not an issue in a clocked process, only in a combinatorial
process, where you could end up with a latch.  In a clocked process,
the incompletely specified values would have clock enables on the
registers, that were disabled whenever they were not updated.

I write and use "is0()" and "is1()" functions (that return boolean) to
correctly interpret metavalues such as 'H' & 'L', and assert warnings
if other metavalues are encountered. They get optimized out in
synthesis. Or I use integer, bit, or boolean types that don't have
metavalues.

if is0(column_out(0)) then
...
else
...
end if;
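
A minimal sketch of what such helper functions might look like is shown below. This is not the poster's actual code; the package name metavalue_pkg and the choice to return false after a warning are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package; only the function names is0/is1 come from the post.
package metavalue_pkg is
    function is0 (s : std_logic) return boolean;
    function is1 (s : std_logic) return boolean;
end package metavalue_pkg;

package body metavalue_pkg is

    -- True for a strong or weak '1'; warns on 'U', 'X', 'Z', 'W', '-'
    function is1 (s : std_logic) return boolean is
    begin
        case s is
            when '1' | 'H' => return true;
            when '0' | 'L' => return false;
            when others =>
                assert false
                    report "is1: metavalue encountered" severity warning;
                return false;
        end case;
    end function is1;

    -- True for a strong or weak '0'; warns on 'U', 'X', 'Z', 'W', '-'
    function is0 (s : std_logic) return boolean is
    begin
        case s is
            when '0' | 'L' => return true;
            when '1' | 'H' => return false;
            when others =>
                assert false
                    report "is0: metavalue encountered" severity warning;
                return false;
        end case;
    end function is0;

end package body metavalue_pkg;
```

With the package made visible through a use clause, the test above reads exactly as written: if is0(column_out(0)) then ... end if;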

Andy


Rajeev wrote:
> Alex,
>
> Couple of things strike me as cause for concern. Being rusty on my VHDL,
> I don't recall the semantics of incomplete specifications, and would
> hesitate to say any of these comments will help, but still...
>
> 1. if mode='0' clause does not define signals bDATA_OUT and
> column_out[].
> 2. not sure why you have elsif mode='1' instead of just "else" --
> synthesis is one thing but simulation considers 'X' 'U' 'Z' etc
> (9-value logic) and it's prudent to have "else" specified.
> 3. synthesis and simulation may not implement logic identically.
>
> HTH,
>
> -rajeev-
>
> Alex wrote:
> > I would greatly appreciate if someone could explain the behavior I'm
> > seeing for me.
> >
> > In the innermost if-statement, where I write to bDATA_OUT ---- if I run
> > the program as written, it does nothing (my DATA_OUT lines remain in
> > the state they were previously).  If I remove the "else,   bDATA_OUT <=
> > "11000000"" segment, it properly outputs 00001010.  I don't understand
> > why it would work w/o the else, but not w/.
> >
> > This is a snippet of a larger VHDL, trimmed down for debugging.
> >
> > Thank you.
> >
> > Alex McHale
> >
> > entity driver is
> >     Port ( CLOCK : in  STD_LOGIC;
> >            ACTIVE : in STD_LOGIC;
> >            CLOCK_IN : in  STD_LOGIC;
> >            LATCH_IN : in  STD_LOGIC;
> >            DATA_IN : in  STD_LOGIC_VECTOR (7 downto 0);
> >            ADDRESS_IN : in  STD_LOGIC_VECTOR (4 downto 0);
> >            DATA_CLOCK_OUT : out STD_LOGIC;
> >            CLOCK_OUT : out  STD_LOGIC;
> >            LATCH_OUT : out  STD_LOGIC;
> >            DATA_OUT : out  STD_LOGIC_VECTOR (7 downto 0) );
> > end driver;
> >
> > architecture Behavioral of driver is
> >     signal mode : STD_LOGIC := '0';
> >     signal column_out : STD_LOGIC_VECTOR(9 downto 0) := "0000000000";
> >     signal bDATA_OUT : STD_LOGIC_VECTOR(7 downto 0);
> > begin
> >     process( CLOCK )
> >     begin
> >         if( rising_edge( CLOCK ) ) then
> >             if mode='0' then
> >                 CLOCK_OUT <= '0';
> >                 LATCH_OUT <= '0';
> >
> >                 mode <= '1';
> >             elsif mode='1' then -- DATA INCOMING
> >                 if column_out(0)='0' then
> >                     bDATA_OUT <= "00001010";
> >                 else
> >                     bDATA_OUT <= "11000000";
> >                 end if;
> >
> >                 CLOCK_OUT <= '1';
> >                 LATCH_OUT <= '1';
> >                 column_out <= column_out + 1;
> >
> >                 mode <= '0';
> >             end if;
> >         end if;
> >     end process;
> > 
> >     DATA_OUT <= bDATA_OUT;
> > end Behavioral;

Article: 109042
Subject: Re: Xilinx PowerPC slower than FPGA Design?
From: "Peter Kampmann" <peter.kampmann@googlemail.com>
Date: 20 Sep 2006 06:57:24 -0700

Arrgh,

so that means I am not able to detect the clock signal of a peripheral
that is faster than 100 MHz?
So I'll have to search for an option to slow down my
peripheral. I hope there is such a possibility.

Regards,
Peter

Antti schrieb:

> Peter Kampmann schrieb:
>
> > Hi everyone,
> >
> > I've built a custom peripheral and now want to insert new data into my
> > design via the PowerPC every clock cycle.
> >
> This, if possible at all, is very tricky.
> The _only_ part that is actually able to run at 300MHz is the PPC processor
> hard macro itself; all the buses can be run at about 100MHz. So
> whatever peripherals you have, they are on 100MHz buses.
>
> There may be some tricks in V4, like using the FCB bus or having
> special peripherals on the OCM bus, but that's all rather complicated
> business.
>
> Whatever is on PLB or OPB runs at about 100MHz max.
> 
> Antti

Article: 109043
Subject: APU disabled after context switch in Xilkernel
From: Andreas Hofmann <ahofmann@ti.cs.uni-frankfurt.de>
Date: Wed, 20 Sep 2006 16:10:33 +0200

Hello,

I'm trying to move data between the PPC405 core of a V4FX20 and a
MicroBlaze. The PPC and the MicroBlaze are connected via an FSL link:

PPC-APU <-> FCB <-> FCB2FSL <-> FSL <-> MB

To transfer data the putfsl()/getfsl() functions are used.

The data transfer works flawlessly if the standalone BSP is used on the
PPC. If Xilkernel is used on the PPC and a task switch occurs between
enabling the APU and calling putfsl()/getfsl(), an unknown-instruction
exception is generated. It looks like the MSR is not properly saved
and/or restored.
Is this the case, or do I have to do more than just set bit 6 in the
MSR?
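
For reference, PowerPC documentation numbers bits from the most significant end (bit 0 is the MSB), so "bit 6" of the 32-bit MSR is not 1 << 6. Below is a small Python sketch of the conversion. The helper name is made up for illustration, and whether setting that bit alone is enough under Xilkernel is exactly the open question here.

```python
def ibm_bit_mask(bit, width=32):
    """Mask for a bit given in IBM/PowerPC numbering (bit 0 = MSB)."""
    if not 0 <= bit < width:
        raise ValueError("bit out of range")
    return 1 << (width - 1 - bit)

# Mask corresponding to "bit 6" of a 32-bit register.
print(hex(ibm_bit_mask(6)))
```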


Regards,
Andreas

Article: 109044
Subject: Re: Xilinx PowerPC slower than FPGA Design?
From: "Kyle H." <kyle.hable@gmail.com>
Date: 20 Sep 2006 07:12:30 -0700

I've been dancing around this problem as well.  Haven't even come close
to finding a way around it, as I'm still learning system design, HDL
and the software (EDK).

Regards,
Kyle

Article: 109045
Subject: Re: Buffering the critical path.
From: Göran Bilski <goran.bilski@xilinx.com>
Date: Wed, 20 Sep 2006 16:14:41 +0200

vssumesh wrote:
> Göran Bilski wrote:
> 
> 
>>What is the logic you do after the addition?
>>
>>If the logic is in any way related to the carry-out of the addition, I tend
>>to extend the carry-chain with more logic instead of doing it after the
>>addition. It's easy to do AND/OR boolean expressions using the carry-chain.
> 
> Hi Goran,
> The combinational logic after the addition includes saturation logic
> which checks the 37 bit and, based on that, outputs constant values or,
> if it is not a saturation condition, outputs the accumulated data. The
> bit-checking operation is an AND and OR operation, e.g. AND bits 31 to
> 16 and, if the result is one, output the saturated value based on some
> other parameters, etc. So it is an AND/OR array, then decision logic,
> and after that a mux. After all these operations there is a shifting
> operation which formats the data to write to RAM locations (there is a
> special format for writing to the RAM). The saturation operation is
> based on the addition result. Is it possible to optimize such an
> operation as you said? If yes, please show me an example of how to
> implement that in HDL.
> 
> 
> Peter Alfke wrote:
> 
>>Sumesh, I have a special place in the dungeon for people who ask
>>questions where they leave the most important details out, and tell us
>>afterwards. "O, by the way..."
> 
> Peter, I am not that type. I am coming to you people for advice, so I
> won't behave like that. I am working for a company, though, so I won't
> be able to reveal the full system, but I think I can explain the problem
> to the extent that you people can understand it. Also, my English is
> weak, so I may not always find the right sentence for what I am trying
> to express. Please forgive me if that happened.
> 
>>You started mentioning address and long carry chains, which -as we know
>>by now- are completely irrelevant to your problem
> 
> I know that the routing delay for a carry chain is very small, but I
> thought that the large number of steps in it was confusing the tool.
> That's why I mentioned it.
> 
> 
>>I think Ray has the best possible advice, but I do not see an easy
>>solution. Look at how you arrange your Dual-Port RAMs, and how you can
>>exchange data between them.
> 
> Yes, I am trying that, modifying the area placements etc.
> One question on that: what margin should I give the tool for the
> timing constraints? E.g. if the working frequency is 80 MHz (12.5 ns),
> is it enough to give a 12 ns constraint to the critical path? Or 24 ns
> for a double-cycle path?
> 
> 
>>Have you looked at Virtex-5LX devices?
> 
> No, I have to complete the design in a V4LX60.
> 
> Also, please advise me on breaking up the logic path. Will that be good
> for timing?
> 

Hi,

The idea is to use MUXCY as a building block.
The drawback is that you need to express the adders using MUXCY as well,
since you need to extend that carry-chain with some more logic.

Example:

Say you do a comparison between two values and want to do some function
depending on the result of the comparison.
Since a comparison is basically a subtraction, the carry output from the
subtraction will tell you the result of the comparison.

Let's say you want to AND the carry-output signal with some other signal.
You can do this with a MUXCY like this.

     MUXCY_I : MUXCY_L
       port map (
         DI => '0',
         CI => Subtract_carry_out,
         S  => signal_to_and_with,
         LO => anding_result);

As you can see, the signal "anding_result" will only be high if
"subtract_carry_out" and "signal_to_and_with" are both high, i.e. an AND.

The beauty of this is that instead of routing the carry signal out to
normal routing and using a LUT for the ANDing, this uses hard-routed wires
and extremely fast logic.

The "normal" way can take around 1 ns including routing, depending on
the family, while using the carry logic takes around 0.01 ns.

An "OR" is done like this:

     MUXCY_I : MUXCY_L
       port map (
         DI => '1',
         CI => Subtract_carry_out,
         S  => signal_to_or_with_n,  -- inverted signal_to_or_with (name assumed)
         LO => oring_result);

Here the oring result is high if either the subtract_carry_out is high
or signal_to_or_with is high (i.e. the inverted signal on S is low).


You can now create a long chain of this AND/OR logic and have extremely
fast operations compared to the normal way.
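
A quick truth-table check of the two configurations above, written as a Python sketch rather than HDL. It assumes MUXCY behaves as a plain 2:1 mux (output = CI when S is 1, otherwise DI); the function names are made up for illustration.

```python
from itertools import product

def muxcy(di, ci, s):
    """Behavioural model of a carry mux: pass CI when S is 1, else DI."""
    return ci if s else di

def carry_and(carry, sig):
    # AND configuration: DI tied to 0, S driven by the signal itself.
    return muxcy(0, carry, sig)

def carry_or(carry, sig):
    # OR configuration: DI tied to 1, S driven by the inverted signal.
    return muxcy(1, carry, 1 - sig)

# Print carry, signal, AND result, OR result for all four input combinations.
for carry, sig in product((0, 1), repeat=2):
    print(carry, sig, carry_and(carry, sig), carry_or(carry, sig))
```

The OR case only works because S carries the inverted signal: when the signal is high, S is low and the mux selects the constant '1' on DI.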

MicroBlaze uses this a LOT in order to maximize the clock frequency.


Göran

Article: 109046
Subject: XUP Board User Expansion Ports
From: "S.T." <st@iss.tu-darmstadt.de>
Date: Wed, 20 Sep 2006 16:14:41 +0200

Hi

Learning from the thread "problems with IOSTANDARD", it appears that the
IO voltage must be changed in hardware and cannot be programmed via UCF
files.

We have problems with IO on the XUP Board (UG069) when trying to connect a
CMOS device which accepts 2 volts input. The tables for the Expansion
Headers and the Digilent Expansion Connectors state LVTTL (high between
2.4 and 3.6 V), which should work with the 2 V high input threshold of
our CMOS device. Measuring with a scope gives barely 2 V output on the
Digilent Expansion Port. So, to remedy this problem:

Are there any jumpers that allow clamping to a higher value on the XUP
board?

What is the typical pass voltage of the IDTQS32861? It seems to me we have a
type with a 1.3 V drop.

Thanks
ST

Article: 109047
Subject: MPMC2 and MontaVista Linux
From: eric <erixx@gmx.net>
Date: Wed, 20 Sep 2006 16:17:13 +0200

Hi,
how can I make an MPMC2 and MontaVista Linux work together on the
Xilinx ML403?

Is there a how-to guide for this, or does anybody have experience with
it?

Thanks a lot :-)

Eric

Article: 109048
Subject: Re: Xilinx PowerPC slower than FPGA Design?
From: "Antti" <Antti.Lukats@xilant.com>
Date: 20 Sep 2006 07:21:28 -0700

Kyle H. wrote:

> I've been dancing around this problem as well.  Haven't even come close
> to finding a way around it, as I'm still learning system design, HDL
> and the software (EDK).
>
> Regards,
> Kyle

The user IP cores may run at high clock frequencies; that is no problem
at all.
But they have to be connected to a bus clocked at 100MHz, so the
peripheral itself must convert between the clock domains.
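
To put rough numbers on the rate mismatch, here is a minimal Python sketch. It assumes, optimistically, that the bus hands over at most one word per bus clock; real bus transfers usually take several cycles. The function name is made up for illustration.

```python
from fractions import Fraction

def idle_fraction(bus_mhz, periph_mhz):
    """Fraction of peripheral cycles that see no new bus word, assuming
    at most one word per bus clock (an optimistic upper bound)."""
    if periph_mhz <= bus_mhz:
        return Fraction(0)
    return 1 - Fraction(bus_mhz, periph_mhz)

# A 120 MHz peripheral on a 100 MHz bus.
print(idle_fraction(100, 120))
```

So a peripheral clocked faster than the bus needs some form of data-valid or clock-enable handling (or a FIFO) at the domain boundary; it cannot expect a new word on every one of its own cycles.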

Antti

Article: 109049
Subject: Re: Xilinx PowerPC slower than FPGA Design?
From: "Peter Kampmann" <peter.kampmann@googlemail.com>
Date: 20 Sep 2006 07:38:01 -0700


Antti wrote:

> The user IP cores may run at high clock frequencies; that is no problem
> at all.
> But they have to be connected to a bus clocked at 100MHz, so the
> peripheral itself must convert between the clock domains.
>

Hmm, I don't understand how this can work. Can you explain it to me?
The peripheral runs at, say, 120 MHz; the bus delivers data at 100 MHz.

Now I need new data in the peripheral every clock cycle, but I only get
new data at 100 MHz and the peripheral wants it at 120 MHz. So that
means I have to slow down the clock internally in the FPGA design, as
described in the "Custom Peripheral Design Guide".

Is Bus2IP_Clk the clock of the On-Chip Peripheral Bus?
If this is the case, I have to halve this clock; then I have 50 MHz for
the peripheral, 100 MHz for the bus and 300 MHz for the PowerPC, and then
I should be able to detect clock cycles.

> Antti
