On Fri, 21 Mar 2008 10:04:23 -0400, Fei Liu wrote:

>My brain is still wired to think only in software sense.

A simulator is a piece of software. I was talking about the behaviour of simulation's software execution. Synthesis promises to build hardware that matches the code's simulation behaviour (within certain limits), or else give an error to say that it can't do that.

> Another thing that I don't understand is the disable command.
> In the following code example, the 'disable count;' statement
> does not break the loop (or maybe it's reentered right away?)
>
> Does always code_block causes code_block executed indefinitely even
> when code_block is disabled? always and disable seem to contradict
> with each other.
[...]
> always @(posedge clock)
> begin: count
>   counter = counter + 1;
>   if(counter == 1000) disable count;
> end

The effect of "disable code_block" is to abandon execution of code_block. This is equivalent to transferring control (jumping) to the "end" keyword that closes code_block.

"always", like EVERY control construct in Verilog, controls exactly one procedural statement; in your case, that one statement is a "begin...end" block, with an @(posedge clock) delay prefix. When the disable kills your begin...end block, the block terminates - the "always" then causes it to execute again, but as usual the @(posedge clock) prefix delays its execution until the next clock. The counter will then increment from 1000 to 1001, and the if() test will fail, so your if()...disable has in fact had no effect whatever.

If you want to disable a loop, you must disable a code block that encloses the entire loop. Here are the standard ways to implement C's "continue" and "break" in Verilog:

  initial begin : outer_block
    forever begin : inner_block
      ...
      ...
      if (...) disable inner_block; // continue
      ...
      if (...) disable outer_block; // break
      ...
    end // inner_block
  end // outer_block

Note that "forever" is simply "while (1)". This kind of code, using initial/forever/disable, is unlikely to be synthesisable.
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.bromley@MYCOMPANY.com
http://www.MYCOMPANY.com

The contents of this message may contain personal views which are not the views of Doulos Ltd., unless specifically stated.
Article: 130376
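For reference, if the intent was simply a counter that stops when it reaches 1000, the usual synthesisable idiom avoids disable altogether and just qualifies the increment. A minimal sketch (the width, reset style and hold-at-1000 behaviour are assumptions, not taken from the original post):

  reg [9:0] counter;
  always @(posedge clock or posedge reset)
    if (reset)
      counter <= 10'd0;              // asynchronous reset, assumed
    else if (counter != 10'd1000)
      counter <= counter + 10'd1;    // stop (hold) once 1000 is reached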
On 20 Mrz., 20:30, j...@capsec.org wrote:
> Hello,
>
> > That is easy to do for high level synthesis, next to impossible for
> > placement and very difficult for routing.
>
> I did not dig into it -- but I always felt its exact the opposite.
> Some time ago, i read most P&R ist based upon simulated annealing.
> Is that still true? While SA might not be the most parallizable
> algorithm on earth, it should give you some speedup; at least on SMP...
>
> Do you've got some read-worthy documents on that topic?

With SA you perform a small change on your design and evaluate the fitness of the new design. Based on that you decide how to continue. In cases where the fitness update is inexpensive, SA is inherently serial and hard to parallelize. Of course you could evaluate multiple changes before making the next decision, like genetic algorithms do. But anyway, I doubt that Xilinx still uses SA for their placer.

Please note that the algorithms running in the EDA world are extremely complicated compared to stuff like computer graphics or similar. Performance in many cases depends on a multitude of parameters that are hand tuned. These parameters can be affected by parallelization. You need to do all the parameter tuning again, or convergence might actually be worse when running two cores instead of one.

Anything that can be done on partitions of the design is easy. For example you can start synthesis in parallel on multiple cores for different source files. But later in the flow it gets really nasty. Another easy thing to do is to run the timing analyzer in parallel with bitgen. You do not need to wait for Xilinx to do that, just write your own makefile.

Before complaining about Xilinx you should be aware that ASIC designers often face runtimes of many hours for their tools. (Nightly builds....) Parallel EDA software therefore is an active research topic but with no major successes yet. Don't expect the FPGA vendors to support multiple cores before the big EDA companies like Synopsys, Magma DA or Cadence do.

Kolja Sulimma
Article: 130377
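Kolja's makefile suggestion might look something like the sketch below. The file names and tool options are illustrative assumptions (a routed top.ncd and matching top.pcf are presumed to already exist), not a tested flow, and recipe lines must be indented with tabs:

# Run static timing analysis and bitstream generation in parallel
# by invoking "make -j2". Names and options are placeholders only.
all: top.twr top.bit

top.twr: top.ncd top.pcf
	trce -v 10 top.ncd top.pcf -o top.twr    # static timing report

top.bit: top.ncd
	bitgen -w top.ncd top.bit                # bitstream generation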
On 20 Mrz., 19:29, Jonathan Bromley <jonathan.brom...@MYCOMPANY.com> wrote:
> On Thu, 20 Mar 2008 11:07:33 -0700 (PDT), fl <rxjw...@gmail.com> wrote:
>
> >I am designing a bunch (about 100) of short length tap (5 taps each)
> >FIR. The tap coefficients would be many 1...31. I want to use
> >multiplier adder graph method for the multiplication. That is,
> >multiplying 15 will be implemented as left shift 4 bits, then minus
> >the original. I would like VHDL can intelligently select one of 16
> >multiplication structure. Is that possible? Or, I have to write C code
> >to generate a VHDL doc? Are there other better methods? Thanks
>
> You *can* do it with VHDL's if...generate construct. But it tends
> to get pretty clunky, and you may find that a C-based code
> generator is less painful.

You can also use a case statement and hope that the compiler does not suck:

  case factor is
    when 0 => y <= 0;
    when 1 => y <= x;
    when 2 => y <= 2*x;
    when 3 => y <= 2*x + x;
    ...
  end case;

Also: for some toolchains

  y <= factor*x;

will provide you with the desired result if factor is a constant. This is called strength reduction. Every C compiler does it, some VHDL compilers too.

Kolja Sulimma
Article: 130378
hi well i think i did that. there is an option under debug in edk where you can insert a chipscope core. my actual problem is that i created an ip core that i use in edk. i created the ip with the peripheral wizard and i can look at the signals with chip scope that are in the top level entity of the ip core. but what i want to do is to also have a look for example at some counters inside that ip core which are not in the top level entity. my workaround right now is to bring all the signals that i want to have a look at into the top level entity of the ip. but i believe that's not the best way and there should be something else... thanks
Article: 130379
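One common alternative to dragging everything up to the top level (offered as a sketch, not the definitive EDK flow) is to stop synthesis from optimising the internal nets away and then pick them up with the ChipScope core inserter on the synthesized netlist. With XST a "keep"-style attribute usually does this; the exact attribute name and placement depend on the tool version and HDL, so check the synthesis guide. Assuming a Verilog core and an invented signal name:

  // Sketch only: keep an internal counter visible after synthesis so
  // the ChipScope inserter can attach to it. Name/width are made up.
  (* keep = "true" *) reg [15:0] debug_counter;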
Hello,

First, if you are a student, I suggest you ask your professor to enter a webcase on your behalf. The 25% issue sounds like it might be a bug, or a missing or wrong software switch, and is clearly beyond anything I can help you with here. If you are a customer, again, a webcase is in order.

For the question about why there is only an 8 mW difference (?), that might be correct. In an FPGA device, a great deal of "stuff" is there regardless of what the "software" is doing. Another way of saying this is that the MicroBlaze(tm) or 405PPC(tm) processors are really incredibly efficient. The 405PPC design in V2P is the lowest power per MHz of any 405PPC at 130nm. In V4 it is still "world class" in its very low power/MHz. So, the processor part of the design is almost nothing (both in power, and die area) compared to IO, clocking, BRAM, etc.

Check using the Excel spreadsheet power estimator for just the 405PPC block (as an example) and change the clock from 1 to 400 MHz, and see what the mW/MHz is. This might give you some idea of how efficient the 405PPC is. The MicroBlaze is not that good (after all, it is soft, not hard IP), but in terms of the total resources, it will provide you with a means of comparison. At the same time, you may place some clock trees in the Excel spreadsheet, and see how they vary. Same with 32 IOs, or 512 DFFs. A bit of experimentation with the spreadsheet will provide you insight into what is primarily responsible for the dynamic power.

Austin
Article: 130380
On Mar 21, 8:45 am, Mike Treseler <mike_trese...@comcast.net> wrote:
> Yes, having once tried quartus 1.0, I am still surprised that
> altera was able to catch up on language support and viewers.
> Synplicity was already focusing on asic prototyping as a result.
>
> -- Mike Treseler

Is QII-1.0 the last version you tried? I once was a Leonardo/Synopsys/Synplicity user. In the last few years both QII and XST have improved to the point where I see no point in maintaining the additional licenses. (Or the annoyance of a split tool chain.) The last design I tried on both (a V2-1K, 75% full, 50MHz): Synplify Pro was 2 CLB's smaller and 5 Hz faster than XST.

$.02,
G.
Article: 130381
ghelbig@lycos.com wrote:
> Is QII-1.0 the last version you tried?

Today I am using quartus 7.2. It works fine and has the best viewers.

> I once was a Leonardo/Synopsys/Synplicity user. In the last few years
> both QII and XST have improved to the point where I see no point in
> maintaining the additional licenses. (Or the annoyance of a split
> tool chain.)

We still have a leo license, but it is not used much. I agree that not having to deal directly with netlists and synthesis libraries is a plus for the designer. XST 9.1 has one vhdl synthesis bug that I have to work around, but it is usable for me.

> The last design I tried on both (A V2-1K, 75% full, 50MHz) Synplify
> Pro was 2 CLB's smaller and 5 Hz faster than XST.

Unfortunately for mentor, synplicity et al., parity has been reached by some fpga vendors and vendor-independent synthesis licenses are much harder to justify than they once were.

-- Mike Treseler
Article: 130382
<u_stadler@yahoo.de> wrote in message news:92773b89-b263-4105-a26e-9b5700738693@z38g2000hsc.googlegroups.com... > hi well i think i did that. Hi, Clearly you didn't. Read UG029 Chapter 3. HTH., Syms. p.s. Is the 'Shift' key broken on your keyboard? ;-) http://www.catb.org/~esr/faqs/smart-questions.html#writewellArticle: 130383
On Mar 21, 5:00 pm, Frank Buss <f...@frank-buss.de> wrote:
> referrin...@googlemail.com wrote:
> > Maybe this is of interest, it recently found a new home:
> > http://www.opencores.org/projects.cgi/web/mcpu/overview
>
> Nice small instruction set, but maybe too limited, because even
> implementing a shift operation (e.g. for implementing multiplication)
> would need multiple instructions. And 64 bytes of data/code memory
> doesn't look very useful.

Well, it obviously is not meant for real world applications.
Article: 130384
Hello all,

I'm helping a colleague debug a timing issue on his SX-A design. I don't have any experience with Actel devices or tools, only Xilinx and Altera, so setting timing constraints with these devices is new to me. I'm hoping someone might be able to help.

The design has two clock domains with a handful of control signals crossing the domains: 48 MHz and 24 MHz generated on the board by a 12MHz oscillator through a zero delay buffer. The FPGA is a bridge between two different bus types. The problem we're seeing is the device fails when cold, but not after it has warmed up.

Apparently for the SX-A family, the only timing constraint functions available are create_clock, set_min_delay and set_max_delay. Here's what he currently has:

  create_clock -name {clk_48M_p} -period 20.000 -waveform {0.000 10.000} {clk_48M_p}
  create_clock -name {clk_24M_p} -period 40.000 -waveform {0.000 20.000} {clk_24M_p}
  set_max_delay 20.000 -from [all_inputs] -to [get_clocks{ * }]
  set_max_delay 15.000 -from [get_clocks{ * }] -to [all_outputs]
  set_max_delay 20.000 -from [all_inputs] -to [all_outputs]

I'm looking at that and thinking we might have an issue since it's constraining the max delay from all inputs to both clocks as 20.000 ns. Shouldn't we constrain the inputs/outputs in each clock domain only to that clock? I'm thinking the tools are going to spend unnecessary effort to constrain the signals in the 24MHz domain when we could somehow partition the constraints between the two clock domains and allow the tools to put more effort on the 48MHz signals. Does that sound right?

Is there a good, clean way of partitioning the constraints for the two clock domains short of listing all of the different signals in each domain? Anybody have any other ideas?
Article: 130385
Hello, Just so there is no confusion, you cannot directly apply 5V to the I/O of any memeber of the Spartan-3 Generation. You should evaluate the input and output switching specifications for the devices you want to interface. In some cases, you may be able to do direct interfacing but for all other cases, I recommend you use level translators. I found http://www.ti.com/translation a useful reference. There are many vendors with similar products, but I found TI has done a nice job of discussing the topic and made it easy to find all of the technical collateral. Please note, this is a personal recommendation and not an endorsement from Xilinx. I see a lot of people using this series resistor "trick" to achive compatibility between 3.3V and 5V devices. You have to be very careful with this technique, it will only work with devices that have I/O clamp diodes and even then, the selection of the resistor value has to be computed based on parameters in the device datasheet and IBIS models. The resistor values will be different for Spartan-3 and Spartan-3E devices. Further, this technique is totally invalid with Spartan-3A, Spartan-3AN, and Spartan-3ADSP devices, since these devices use a floating well in the I/O design and do not have clamp diodes to VCCO (except for a programmable clamp in PCI modes; this clamp is normally OFF). Eric <sky465nm@trline4.org> wrote in message news:fs09tt$nmr$1@aioe.org... > Giuseppe Marullo <giuseppe.marullospamnot@iname.com> wrote: >>Hi all, >>just got Xylo-LM board (Spartan3E + FX2), I was searching for tips and > > This one?? http://www.knjn.com/docs/KNJN%20FX2%20ARM%20boards.pdf > >>tricks to avoid frying it. In particular, I was interested in interfacing >>with outside world. So far, I found that Spartan 3E is 5V tolerant if you >>put a 220 Ohm series resistor but this would work only in input. > >>What can I do to protect and make it fully 5V tolerant? > > One possibility is to add a 5V tolerant buffer chip that works with 3.3V > (LVTTL), which has the benefit of speed. Another one is the resistor one > which for most cases are likely to be the most overall effecient. > >>If this is not possible, is there any DIL chip I could use to protect it? >>I >>also have this problem with i2c bus, but this is a little more complicated >>due to the bidirectionality of the signal. > > You can buy a simple chip packaged array of resistors to make sure nothing > will overload the Spartan-3E. > > I suggest you have a look at the schematics of different developer boards. > They often show how things should be done.Article: 130386
kkoorndyk wrote: > Is there a good, clean way of partitioning the constraints for the two > clock domains short of listing all of the different signals in each > domain? Anybody have any other ideas? Consider running everything on 48 MHz and using the 24 MHz as an input. -- Mike TreselerArticle: 130387
ghelbig@lycos.com wrote:
>
> The last design I tried on both (A V2-1K, 75% full, 50MHz) Synplify
> Pro was 2 CLB's smaller and 5 Hz faster than XST.
>
> $.02,
> G.

G., I guess the reason the two results go at the same speed is that the P&R tools don't optimise any further than the constraints specify. The performance of the synthesiser is truly revealed when timing is hard to meet. Possibly there is a similar argument to the CLB usage. The P&R tools spread the LUTs out over all the slices available. What was the difference in 4-LUT count?

That's not to say that your implied point that XST is 'as good as' Synplify isn't true, it's just that the examples you provide to back up your position may not be appropriate.

Cheers, Syms.
Article: 130388
"Eric Crabill" <eric.crabill@xilinx.com> wrote in message news:fs0tl8$idi3@cnn.xsj.xilinx.com... > > I see a lot of people using this series resistor "trick" to achive > compatibility between 3.3V and 5V devices. You have to be very careful > with this technique, it will only work with devices that have I/O clamp > diodes and even then, the selection of the resistor value has to be > computed based on parameters in the device datasheet and IBIS models. The > resistor values will be different for Spartan-3 and Spartan-3E devices. > Further, this technique is totally invalid with Spartan-3A, Spartan-3AN, > and Spartan-3ADSP devices, since these devices use a floating well in the > I/O design and do not have clamp diodes to VCCO (except for a programmable > clamp in PCI modes; this clamp is normally OFF). > > Eric When using the series resistor technique there are also a couple of other considerations. Firstly, you need to make sure that the input FET gate can actually withstand 5V without breaking down - in other words, it is the maximum current and not maximum voltage that makes the pin non-5V tolerant to start with. Secondly, you need to make sure that the power supply powering the I/Os (or more particularly the clamp diodes) can sink current. If it can't you need to add a resistor from power to ground.Article: 130389
"kkoorndyk" <kris.koorndyk@gmail.com> wrote in message news:3d3b9bb5-ee3a-4084-8633-cb80a4131c72@s50g2000hsb.googlegroups.com... > I'm looking at that and thinking we might have an issue since it's > constraining the max delay from all inputs to both clocks as 20.000 > ns. Shouldn't we constrain the inputs/outputs in each clock domain > only to that clock? I'm thinking the tools are going to spend > unnecessary effort to constrain the signals in the 24MHz domain when > we could somehow partition the constraints between the two clock > domains and allow the tools to put more effort on the 48MHz signals. > Does that sound right? > > Is there a good, clean way of partitioning the constraints for the two > clock domains short of listing all of the different signals in each > domain? Anybody have any other ideas? > > If the place and route is achieving timing closure then it doesn't matter that it might be over-constrained, and this won't be causing the failure. It is more likely you have a design problem with the retiming of signals crossing between the clock domains. Although the two clocks come from a common source, it's not wise to assume they are synchronized. Presumably the PLL in the clock buffer is multiplying the 12MHz up to 48MHz and dividing it down to generate the 24MHz. I would be surprised if it guarantees phase alignment between the two when operating at different frequencies.Article: 130390
On Mar 21, 2:35 pm, "David Spencer" <davidmspen...@verizon.net> wrote:
> "kkoorndyk" <kris.koorn...@gmail.com> wrote in message
> news:3d3b9bb5-ee3a-4084-8633-cb80a4131c72@s50g2000hsb.googlegroups.com...
>
> If the place and route is achieving timing closure then it doesn't matter
> that it might be over-constrained, and this won't be causing the failure.
> It is more likely you have a design problem with the retiming of signals
> crossing between the clock domains. Although the two clocks come from a
> common source, it's not wise to assume they are synchronized. Presumably
> the PLL in the clock buffer is multiplying the 12MHz up to 48MHz and
> dividing it down to generate the 24MHz. I would be surprised if it
> guarantees phase alignment between the two when operating at different
> frequencies.

That's probably it. I just took a look at the code and noticed a couple of control signals crossing the clock domains without appropriate synchronization. Thanks for the input!
Article: 130391
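For anyone finding this thread later, the usual minimal fix for a single-bit, level-type control signal is a double-flop synchroniser in the receiving clock domain. A sketch only (the signal names are invented, not taken from the poster's design; multi-bit buses or single-cycle pulses need a handshake or FIFO instead):

  // Two-stage synchroniser: resample a control bit from the 24 MHz
  // domain into the 48 MHz domain. Names and widths are illustrative.
  reg ctrl_meta, ctrl_sync;
  always @(posedge clk_48M_p) begin
    ctrl_meta <= ctrl_from_24M;  // first flop may go metastable
    ctrl_sync <= ctrl_meta;      // second flop gives it time to settle
  end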
On 21 Mrz., 16:13, rickman <gnu...@gmail.com> wrote: > On Mar 20, 3:19 am, Antti <Antti.Luk...@googlemail.com> wrote: > > > > > On 20 Mrz., 06:55, Jim Granville <no.s...@designtools.maps.co.nz> > > wrote: > > > > Frank Buss wrote: > > > > rickman wrote: > > > > >>Maybe I am missing something, but I have seen CPUs in FPGAs as small > > > >>as 600 LUTs. I am pretty sure the picoBlaze is about that size. > > > > > I think it is smaller, about 200 LUTs: > > > > >http://www.embeddedrelated.com/groups/fpga-cpu/show/2028.php > > > > And also the similar Mico8 ~240-323 LUT http://www.latticesemi.com/products/intellectualproperty/referencedes... > > > > >>A bit serial CPU might be smaller than an 8 bit CPU, but what is the > > > >>driving need for something that small? 600 LUTs is not much in a 3000 > > > >>LUT FPGA! > > > > > Could be interesting to pack it in a Max II, where the smallest device has > > > > 240 LEs. Sometimes you need some high speed logic and some more complex > > > > tasks, but which can be low speed (keyboard sampling, output to LCD text > > > > display). If you can get an additional low speed CPU for free, you could > > > > save an external microcontroller. > > > > The serial code memory is part of the appeal. > > > FPGA cores are easy enough, but they are like stone soup, > > > and you need to add code execution storage, = many pins, and EMC and PCB > > > area issues. > > > Single chip uC are a tough nut to crack, as they have FLASH+Analog, > > > and higher volumes and growths than the FPGA sector. > > > > -jg > > > Hi all > > > I wasnt online yesterday (was in Tirol/Austri-not for fun) so I answer > > all in one > > > "serial implementations of the past" - have worked with COP800 and I > > had hard days optimizing DES for ST62T10 > > - none of those is suitable directly, maybe there is something to look > > and learn, but not much to direct use > > > small FPGA soft-cores (existing ones) > > - too large > > Until you tell us what "too large" means in numbers, we can't consider > this requirement. Small enough means "virtually free" in most small FPGAs, from any vendor. What may be small enough for Spartan3 may not be an option for Actel, as the fabric is too different and the "price" of each type of resource is different. > > > - not flexible in configuration options > > What flexibility do you require? > Either with registers in LUTRAM or in BRAM, or even in external serial RAM, RAM buffer or serial ROM. > > - no real C compilers exist > > That is not my understanding... What do you mean by "real"? > NOT play compilers. Try the C compiler for PicoBlaze; that is an example of what is not really useful. > > - can not address "flat word" 32bit addressing > > That is an interesting requirement. If you have a bit serial > processor that executes, at best, 2 MIPS, why do you "require" a 32 > bit flat addressing model? What is the application that needs 4 GB of > address space? > Yes, that is an interesting and valid argument. 
If you did read my specs, you maybe noticed that I mentioned the possibility to use an SD card as MAIN memory, that is, code execution directly while reading from the SD card in place. So if the SD card interface is the CODE/DATA memory interface, then it would be nice to simply MAP ALL of the card into the same flat space. Sure, the slow CPU would take ages executing a 4GB large program, but the ability to directly access the full SD card would still simplify things. Well, 32 bit is even too small, as the cards exceed 4GB (even microSD are available in 8GB). > > now some additional considerations: > > > 32 bit ALU for serial implementation cost the same as 1 bit ALU > > Not true. The overhead for control of a bit serial ALU is higher than > the data processing and more complex than a parallel data path. There > are happy mediums such as using an 8 bit core to perform 32 bit > computations. > Sure, it is not 1:1, but a wide datapath is overhead that sometimes isn't needed. > > 32 bit registers if implemented in BRAM or data buffer of Atmel > > I don't know what this means. > It means that WHEN on-chip RAM is at a premium, the SPU would use the DUAL RAM buffers in the SPI DATAFLASH ROM as main storage for registers and stack. This would make it EXTREMELY slow, but if the goal is not to use any BRAM then it may be the only option. > > Dataflash cost very little (0 FPGA fabric resource) > > Yes, anytime you go off chip, you are not using the FPGA fabric. > > > code data memory space cost very little, so opcode density is almost > > least priority > > Code memory may be inexpensive, but the time to fetch is not low. > This ends up being a limiter on the speed, but then you have already > given up any speed requirements in the initial set of constraints. > I'm not so sure of data memory. Are you talking ram or flash? > Flash is cheap; serial RAM in most cases doesn't make sense, so the processor would in most cases have limited RAM. > > number of cycles per instruction is also very low priority (at least > > for some optimization options) > > > lets look sone special targets > > > 1) device S3AN-50 > > ============= > > > if we use picoblaze, we use 30% of BRAM and some small % of FPGA, but > > we only get 1KW of code and 8 bit processor/ALU > > You can always extend the code size by extending the processor to > fetch from external dataflash. I don't think CPU cores are locked to > the original implementation. Like I said above, an 8 bit processor > can do 32 bit arithmetic very easily. > Sure, any processor could be made to fetch code from external serial memory, but only a processor that was designed for this operation can do it optimally. For NIOS this is already done (an Avalon slave that reads from serial flash); I would like that for MicroBlaze too, but haven't had enough reasons to do it, and it would still be a rather large implementation. > > a S3/Dataflash optimized SPU could use Dataflash dual ram buffers for > > registers,ram,stack those not use BRAM at all. Say it would run at 512 > > clock per instruction, so what? It would be 32 bit processor with 0.1 > > MIPS leaving ALL BRAMs to the user and almost all of the fabric as > > well. And it would not cost anything extra as it the Dataflash is > > already present in the S3AN > > > 2) Actel A3P060/IGLOO60+SD card > > ========================== > > SPU should be configured to use half-BRAM for registes, ram,stack and > > could be executing either from SD-card, again this would be 32 bit > > processor with c compiler. 
Actel has no LUT ram option and any known > > small 8 but softcore would already too be too large also. the code > > memory price on SD card is virtually 0 > > > 3) XXX + Winbond Quad SPI > > ==================== > > here we could also achieve some not so bad performance despite the > > serialized code memory > > > if all of the above special cases would be support by the same C > > compiler (with settings to adapt to config differences) ? > > > single chip MCU's are hard to crack, but that isnt the goal, in many > > cases there are "unused" resources present, so the SPU could really > > come virtually free, besides an extra IC + extra 0.80 USD is still > > extra cost for additional MCU in the system > > > AnttiArticle: 130392
On Mar 21, 12:16 pm, Jonathan Bromley <jonathan.brom...@MYCOMPANY.com> wrote: > On Fri, 21 Mar 2008 10:04:23 -0400, Fei Liu wrote: > >My brain is still wired to think only in software sense. > > A simulator is a piece of software. I was talking about > the behaviour of simulation's software execution. > > Synthesis promises to build hardware that matches the > code's simulation behaviour (within certain limits), > or else give an error to say that it can't do that. > > > > > Another thing that I don't > >understand is the disable command. In the following code example, the > >'disable count;' statement does not break the loop (or maybe it's > >reentered right away?) > > >Does always code_block causes code_block executed indefinitely even when > > code_block is disabled? always and disable seem to contradict with > >each other. > [...] > > always @(posedge clock) > > begin: count > > counter = counter + 1; > > if(counter == 1000) disable count; > > end > > The effect of "disable code_block" is to abandon execution > of code_block. This is equivalent to transferring control > (jumping) to the "end" keyword that closes code_block. > > "always", like EVERY control construct in Verilog, controls > exactly one procedural statement; in your case, that one > statement is a "begin...end" block, with an @(posedge clock) > delay prefix. When the disable kills your begin...end block, > the block terminates - the "always" then causes it to execute > again, but as usual the @(posedge clock) prefix delays its > execution until the next clock. The counter will then > increment from 1000 to 1001, and the if() test will fail, > so your if()...disable has in fact had no effect whatever. > > If you want to disable a loop, you must disable a code block > that encloses the entire loop. Here are the standard ways > to implement C's "continue" and "break" in Verilog: > > initial begin : outer_block > forever begin : inner_block > ... > ... > if (...) disable inner_block; // continue > ... > if (...) disable outer_block; // break > ... > end // inner_block > end // outer_block > > Note that "forever" is simply "while (1)". > > This kind of code, using initial/forever/disable, is > unlikely to be synthesisable. > -- > Jonathan Bromley, Consultant > > DOULOS - Developing Design Know-how > VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services > > Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK > jonathan.brom...@MYCOMPANY.comhttp://www.MYCOMPANY.com > > The contents of this message may contain personal views which > are not the views of Doulos Ltd., unless specifically stated. Thanks for the "continue" and "break" comments. I was looking for these features in Verilog (for simulation), but noticing them to be absent from the LRM, assumed there was no equivalent to the C constructs. I worked around this using the loop condition, but disable seems to be exacly what I was after. The only hitch appears to be that the block must be named. Regards, GaborArticle: 130393
climber.tim@gmail.com wrote: > Hi. > It's probably not very good place for asking such, but there're should > be at least those who knows starting points. > We need to design our own CPU which can be very slow. It can execute > each instruction, let's say, up to 50 cycles. We don't care about > speed, and we are also don't care about memory size for microcode, but > we're really care about CPU unit size. > Where to read about CPU designing techniques, which are about shifting > all possible to microcode from CPU unit? Extreme case will be probably > Turing machine, but it's not practical. CPU registers and instructions > in our case should be looks like ARM9 processor, maybe. This entire thread has been very interesting in how to "design and build" a CPU from an FPGA. Even the subject line states this. However, the OP of this thread was not looking for an education, he was looking for a complete project with little or no work on his part. This is obvious by his posting from a gmail account. Everyone who has added to this thread must have assumed that the OP was interested in the "design" of such a device. I for one thank all those who added to this interesting discussion, I may have learned something myself. donaldArticle: 130394
On Mar 21, 11:28 am, "David Spencer" <davidmspen...@verizon.net> wrote: > "Eric Crabill" <eric.crab...@xilinx.com> wrote in message > > news:fs0tl8$idi3@cnn.xsj.xilinx.com... > > > > > I see a lot of people using this series resistor "trick" to achieve > > compatibility between 3.3V and 5V devices. You have to be very careful > > with this technique, it will only work with devices that have I/O clamp > > diodes and even then, the selection of the resistor value has to be > > computed based on parameters in the device datasheet and IBIS models. The > > resistor values will be different for Spartan-3 and Spartan-3E devices. > > Further, this technique is totally invalid with Spartan-3A, Spartan-3AN, > > and Spartan-3ADSP devices, since these devices use a floating well in the > > I/O design and do not have clamp diodes to VCCO (except for a programmable > > clamp in PCI modes; this clamp is normally OFF). > > > Eric > > When using the series resistor technique there are also a couple of other > considerations. Firstly, you need to make sure that the input FET gate can > actually withstand 5V without breaking down - in other words, it is the > maximum current and not maximum voltage that makes the pin non-5V tolerant > to start with. Secondly, you need to make sure that the power supply > powering the I/Os (or more particularly the clamp diodes) can sink current. > If it can't you need to add a resistor from power to ground.

I was not too keen to jump onto this unpleasant subject, but I must correct David: The basic issue is to avoid excessive voltage on the input, and the external resistor, together with the internal clamp diode (along with a power supply that can absorb current), does just that beautifully! So the chip input never sees a voltage higher than Vcc + 0.7 V diode drop. You can easily measure this. The secondary issue is the amount of current, if the resistor value is too small, or the excessive delay if the resistor value is too high. And all of this only works if there is a clamp diode to Vcc (and the supply can absorb the current, especially from many pins). We just hope that 5-V soon becomes a relic of the past, same as 12 V became obsolete long ago...

Peter Alfke, Xilinx
Article: 130395
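To put rough numbers on Peter's point (purely illustrative; the real limit must come from the data-sheet clamp-current specification and IBIS model, as Eric noted, and the 10 mA figure below is an assumption, not a published value):

  Assumed worst case: 5.0 V driver, VCCO = 3.3 V, clamp drop about 0.7 V.
  Voltage left across the series resistor:  5.0 - (3.3 + 0.7) = 1.0 V
  With the often-quoted 220 ohm:            1.0 V / 220 ohm  ~= 4.5 mA per pin
  If the allowed clamp current were 10 mA:  any R >= 1.0 V / 10 mA = 100 ohm
  would keep the current below that limit.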
Giuseppe,

We use bus switches for this type of problem. Have a look at the schematics for our obsolete component replacement family Craignell http://www.enterpoint.co.uk/component_replacements/craignell.html where we have exactly this problem where we need to make a Spartan-3E to be 5V tolerant but also need to achieve 5V CMOS levels on outputs to the outside world.

John Adair
Enterpoint Ltd.

On 21 Mar, 07:34, "Giuseppe Marullo" <giuseppe.marullospam...@iname.com> wrote: > Hi all, > just got Xylo-LM board (Spartan3E + FX2), I was searching for tips and > tricks to avoid frying it. In particular, I was interested in interfacing > with outside world. So far, I found that Spartan 3E is 5V tolerant if you > put a 220 Ohm series resistor but this would work only in input. > > What can I do to protect and make it fully 5V tolerant? > > If this is not possible, is there any DIL chip I could use to protect it? I > also have this problem with i2c bus, but this is a little more complicated > due to the bidirectionality of the signal. > > TIA, > > Giuseppe Marullo
Article: 130396
Hi all,

> > That is easy to do for high level synthesis, next to impossible for
> > placement and very difficult for routing.

> Parallel EDA software therefore is an active research
> topic but with no major successes yet. Don't expect the FPGA vendors
> to support multiple cores before the big EDA companies like synopsys,
> Magma DA or Cadence do.

Parallel EDA is indeed difficult but we've had some success with it here at Altera - our first parallel algorithms were shipped in 2006 and our first parallel placer was shipped in 2007. The main placement algorithm is getting a speedup of 1.6x on two processors and 2.2x on four, works without partitioning, and always gives exactly the same answer as the serial version. We described the techniques we used at a recent conference - if you're lucky enough to have an ACM web account, you can read the full paper here: http://portal.acm.org/citation.cfm?id=1344671.1344676&coll=GUIDE&dl=ACM&type=series&idx=SERIES100&part=series&WantType=Proceedings&title=FPGA&CFID=60200343&CFTOKEN=18993681.

As of the latest release of Quartus II, you'll get an average speedup of about 15% on two processors and 20% on four. We're also actively improving our existing parallel algorithms and working on a bunch of new ones, so we expect these numbers to improve significantly in the future. You can check out http://www.altera.com/products/software/products/quartus2/multi-processor/qts-multi-processor.html for a few more details, or find out what the numbers are for the latest release.

Cheers,
Adrian Ludwin
Altera
Article: 130397
Antti wrote: > On 21 Mrz., 16:13, rickman <gnu...@gmail.com> wrote: >> On Mar 20, 3:19 am, Antti <Antti.Luk...@googlemail.com> wrote: >> [snip] >>> - can not address "flat word" 32bit addressing >> That is an interesting requirement. If you have a bit serial >> processor that executes, at best, 2 MIPS, why do you "require" a 32 >> bit flat addressing model? What is the application that needs 4 GB of >> address space? >> > > yes, that is interesting and valid argument. if you did read my specs > you > maybe noticed that I mentioned possibility to use SD card as MAIN > memory > that is code execution directly while reading from sd card in place, > so > if the SD card interface is the CODE/DATA memory interface then it > would be nice to simply MAP ALL the card into the same flat space > sure the slow cpu would take ages executing a 4GB large program > but the ability to directly access the full sd card would still > simplify things > > well the 32 bit is even too small as the card exeed 4GB (even microSD > are available in 8GB) > >>> now some additional considerations: >>> 32 bit ALU for serial implementation cost the same as 1 bit ALU >> Not true. The overhead for control of a bit serial ALU is higher than >> the data processing and more complex than a parallel data path. There >> are happy mediums such as using an 8 bit core to perform 32 bit >> computations. >> [snip] But if you have long (ie 32 bit) words, that is 32 clocks for every operation, just to optimise the (relatively few) accesses to the full space of the SD card. Would it not be better to have, say 16 bit words, and divide the SD into 64k pages. Provide two page registers, one for code execution (in a cheapskate design, hardwire this as zero), and one to define the page for general data access. Optimise the most frequent case, ie every operation, not the less frequent, ie SD access to full range.Article: 130398
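Not anyone's actual design, but a tiny sketch of the page-register idea for anyone wanting to picture it (all names and widths are invented; 16-bit words and 64K-word pages give the full 32-bit SD address):

  // Page registers concatenated with 16-bit in-page offsets to form
  // the full SD-card address. Purely illustrative.
  reg  [15:0] code_page;   // could be hardwired to zero in a minimal build
  reg  [15:0] data_page;   // set by software to select the data page
  wire [15:0] pc, data_offset;                          // in-page offsets
  wire [31:0] code_sd_addr = {code_page, pc};           // code fetch address
  wire [31:0] data_sd_addr = {data_page, data_offset};  // data access address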
First of all thank you all for your answers.

>This one?? http://www.knjn.com/docs/KNJN%20FX2%20ARM%20boards.pdf

Yes, maybe this pdf explains better the FPGA part (it has an ARM also): http://www.knjn.com/docs/KNJN%20FX2%20FPGA%20boards.pdf It says it is a Spartan XC3S500E (speed grade -4) in a pq208 package. Very nice board, I am already able to talk with the FPGA thru the USB2.0 with a Delphi program. This was my major concern, I wanted an easy way to exchange data with the PC, and so far I was right.

>We use bus switches for this type of problem. Have a look at the
>schematics for our obsolete component replacement family Craignell
>http://www.enterpoint.co.uk/component_replacements/craignell.html
>where we have exactly this problem where we need to make a Spartan-3E
>to be 5V tolerant but also need to achieve 5V CMOS levels on outputs
>to the outside world.
>
>John Adair
>Enterpoint Ltd.

Curious, I was thinking a good idea could be to use the PCI bus interface of the Raggedstone (QS386PA); I guess Craignell uses a similar one (QS32X361?). If I understand correctly, they should work bidirectionally, or does it seem too good to be true? Bidirectional, automatic, zero delay... There are just two drawbacks for me:
1) I need to find them at a reasonable price in Italy in low qty
2) I need to build an SMD PCB (no DIL package I suppose)
Other than this they should solve my problem, no messing with resistors and currents bla bla.

Giuseppe Marullo
Article: 130399
ISE 10.1 does not yet support multi-threading, but it does have a 2X Map&P&R runtime improvement. No SV yet. You will have to get that from our partners (I guess that would be Synopsys if you're looking for SV synthesis).

Steve

"ratztafaz" <heinerlitz@googlemail.com> wrote in message news:fe06f4f7-1074-43ce-8926-012e6c5d81c2@m34g2000hsc.googlegroups.com...
> Will the 10th edition of ISE, to be released next week, finally
> support multithreading/SMP machines to reduce synthesis + P&R time?
>
> Will we finally get support for synthesis of System Verilog
> constructs?
>
> What other major features do you still miss? - Discuss!