A recent thread went over the maximum permitted power on a device. So why don't we have a sensing diode (well, a transistor with collector tied to base works better) on the die somewhere? It's fairly easy to do, according to my VLSI acquaintances, and with faster and faster IO [implying as it does faster and faster logic switching as well], it would make sense to see these on any large-scale (more than 100 pins perhaps) FPGA package. Comments? Cheers PeteSArticle: 107101
On 2006-08-24, me_2003@walla.co.il <me_2003@walla.co.il> wrote: > different bit field it should be also written to it (like a bit-wise OR > between current value and new value). You could have both sides write to their own BRAM and define the output of the whole thing as the OR of the outputs of the BRAMs. -- Ben Jackson AD7GD <ben@ben.com> http://www.ben.com/Article: 107102
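A rough VHDL sketch of that two-RAM OR arrangement (entity name, widths and depth are illustrative guesses, not from the post; 512 entries of 6 bits are used because those numbers come up later in the thread). Each writer owns its own inferred RAM and the reader sees the bitwise OR of the two read ports:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity or_of_two_rams is
  port (
    clk     : in  std_logic;
    -- writer A
    we_a    : in  std_logic;
    waddr_a : in  unsigned(8 downto 0);
    wdata_a : in  std_logic_vector(5 downto 0);
    -- writer B
    we_b    : in  std_logic;
    waddr_b : in  unsigned(8 downto 0);
    wdata_b : in  std_logic_vector(5 downto 0);
    -- common read side
    raddr   : in  unsigned(8 downto 0);
    rdata   : out std_logic_vector(5 downto 0)
  );
end or_of_two_rams;

architecture rtl of or_of_two_rams is
  type ram_t is array (0 to 511) of std_logic_vector(5 downto 0);
  signal ram_a, ram_b : ram_t := (others => (others => '0'));
  signal q_a, q_b     : std_logic_vector(5 downto 0);
begin
  process(clk)
  begin
    if rising_edge(clk) then
      if we_a = '1' then
        ram_a(to_integer(waddr_a)) <= wdata_a;   -- side A writes only its own RAM
      end if;
      if we_b = '1' then
        ram_b(to_integer(waddr_b)) <= wdata_b;   -- side B writes only its own RAM
      end if;
      q_a <= ram_a(to_integer(raddr));           -- registered read from both RAMs
      q_b <= ram_b(to_integer(raddr));
    end if;
  end process;

  rdata <= q_a or q_b;                           -- the reader sees the OR of both memories
end rtl;

No write arbitration is needed because each side owns one memory; the price is a second block RAM (or more, if each side only has a 1-bit write port per bit field, which seems to be the concern raised further down).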
"PeteS" <PeterSmith1954@googlemail.com> wrote in message > > > So why don't we have a sensing diode (well, a transistor with collector > tied to base works better) on the die somewhere? It is there, at least in Virtex 4. /MikhailArticle: 107103
MM wrote: > "PeteS" <PeterSmith1954@googlemail.com> wrote in message > > > > > So why don't we have a sensing diode (well, a transistor with collector > > tied to base works better) on the die somewhere? > > It is there, at least in Virtex 4. > > /Mikhail I'd like to see it in all the larger devices :) Cheers PeteSArticle: 107104
Very interesting coding style. I'm curious why there are separate clocked processes. You could just tack on the output code to the bottom of the state transition process, but that is only a nit. As long as I'm using registered outputs, I would personally prefer a combined process, but that's just how I approach the problem. I want to know everthing that happens in conjunction with a state by looking in one place, not by looking here to see where/when the next state goes, and then looking there to see what outputs are generated. To illustrate, by modifying the original example: my_state_proc: process(clk, reset_n) type my_state_type is (wait, act, test); variable my_state: my_state_type; begin if (reset_n = '0') then my_state := wait; my_output <= '0'; elsif (rising_edge(clk)) case my_state is when wait => if (some_input = some_value) then my_state := act; end if; ... ... when act => if some_input = some_other_val then my_output <= yet_another_value; else ... end if; ... when test => ... when others => my_state := wait; end case; end if; end process; The only time I would use separate logic code for outputs is if I wanted to have combinatorial outputs (from registered variables, not from inputs). Then I would put the output logic code after the clocked clause, inside the process. I try to avoid combinatorial input-to-output paths if at all possible. Then it would look like this: my_state_proc: process(clk, reset_n) type my_state_type is (wait, act, test); variable my_state: my_state_type; begin if (reset_n = '0') then my_state := wait; my_output <= '0'; elsif (rising_edge(clk)) case my_state is when wait => if (some_input = some_value) then my_state := act; end if; ... ... when act => if some_input = some_other_val then my_output <= yet_another_value; else ... end if; ... when test => ... when others => my_state := wait; end case; end if; if state = act then -- cannot use process inputs here my_output <= yet_another_value; -- or here end if; end process; Interestingly, the clock cycle behavior of the above is identical if I changed the end of the process to: ... end case; -- you CAN use process inputs here: if (state = act) then my_output <= yet_another_value; -- or here end if; end if; end process; Note that my_output is now a registered output from combinatorial inputs, whereas before it was a combinatorial output from registered values. Previously you could not use process inputs, now you can. Andy Eli Bendersky wrote: > backhus wrote: > > Hi Eli, > > discussion about styles is not really satisfying. You find it in this > > newsgroup again and again, but in the end most people stick to the style > > they know best. Style is a personal queastion than a technical one. > > > > Just to give you an example: > > The 2-process -FSM you gave as an example always creates the registered > > outputs one clock after the state changes. That would drive me crazy > > when checking the simulation. > > I guess this indeed is a matter of style. It doesn't drive me crazy > mostly because I'm used to it. Except in rare cases, this single clock > cycle doesn't change anything. However, the benefit IMHO is that the > separation is cleaner, especially when a lot of signals depend on the > state. > > > > > Why are you using if-(elsif?) in the second process? If you have an > > enumerated state type you could use a case there as well. Would look > > much nicer in the source, too. > > I prefer to use if..else if there is only one "if". When there are > "elsif"s, case is preferable. > > > > > Now... 
Will you change your style to overcome these "flaws" or are you > > still satisfied with it, becaused you are used to it? > > > > Both is OK. :-) > > > > Anyway, each style has it's pros and cons and it always depends on what > > you want to do. > > -- has the synthesis result to be very fast or very small? > > -- do you need to speed up your simulation > > -- do you want easy readable sourcecode (that also is very personal, > > what one considers "readable" may just look like greek to someone else) > > -- etc. etc. > > > > So, there will be no common consensus. > > > > In my original post I had no intention to reach a common consensus. I > wanted to see practical code examples which demonstrate the various > techniques and discuss their relative merits and disadvantages. > > Kind regards, > Eli > > > > > Eli Bendersky schrieb: > > > Hello all, > > > > > > In a recent thread (where the O.P. looked for a HDL "Code Complete" > > > substitute) an interesting discussion arised regarding the style of > > > coding state machines. Unfortunately, the discussion was mostly > > > academic without much real examples, so I think there's place to open > > > another discussion on this style, this time with real examples > > > displaying the various coding styles. I have also cross-posted this to > > > c.l.vhdl since my examples are in VHDL. > > > > > > I have written quite a lot of VHDL (both for synthesis and simulation > > > TBs) in the past few years, and have adopted a fairly consistent coding > > > style (so consistent, in fact, that I use Perl scripts to generate some > > > of my code :-). My own style for writing complex logic and state > > > machines in particular is in separate clocked processes, like the > > > following: > > > > > > > > > type my_state_type is > > > ( > > > wait, > > > act, > > > test > > > ); > > > > > > signal my_state: my_state_type; > > > signal my_output; > > > > > > ... > > > ... > > > > > > my_state_proc: process(clk, reset_n) > > > begin > > > if (reset_n = '0') then > > > my_state <= wait; > > > elsif (rising_edge(clk)) > > > case my_state is > > > when wait => > > > if (some_input = some_value) then > > > my_state <= act; > > > end if; > > > ... > > > ... > > > when act => > > > ... > > > when test => > > > ... > > > when others => > > > my_state <= wait; > > > end case; > > > end if; > > > end process; > > > > > > my_output_proc: process(clk, reset_n) > > > begin > > > if (reset_n = '0') then > > > my_output <= '0'; > > > elsif (rising_edge(clk)) > > > if (my_state = act and some_input = some_other_val) then > > > ... > > > else > > > ... > > > end if; > > > end if; > > > end process; > > > > > > > > > Now, people were referring mainly to two styles. One is variables used > > > in a single big process, with the help of procedures (the style Mike > > > Tressler always points to in c.l.vhdl), and another style - two > > > processes, with a combinatorial process. > > > > > > It would be nice if the proponents of the other styles presented their > > > ideas with regards to the state machine design and we can discuss the > > > merits of the approaches, based on real code and examples. > > > > > > Thanks > > > Eli > > >Article: 107105
Chuck Levin <clevin1234@comcast.net> wrote: > Hi, > I was thinking about using QuickLogic for a low power FPGA design. Does > anyone have any experiences they would like to share about their devices or > tools ? Be aware that you get neither reprogrammability nor in-circuit programmability at all. -- Uwe Bonnes bon@elektron.ikp.physik.tu-darmstadt.de Institut fuer Kernphysik Schlossgartenstrasse 9 64289 Darmstadt --------- Tel. 06151 162516 -------- Fax. 06151 164321 ----------Article: 107106
Hi Peter, Peter Alfke wrote: > I did not really understand the question, but: > You can configure the BRAM 1-bit wide, and thus address each bit > individually. > It seems to me that this solves all your problems. Yes, that is what I was going to do, but I thought that maybe someone would come up with a nicer idea, or that maybe there is a way to make Xilinx BRAMs work as I needed them to (data out in the same cycle). The clock rate is not the issue because I don't want to give two cycles for each write (not very elegant). > You can configure the two ports separately, e.g. one can be 1-bit wide, > the other 9 bits wide. That's an interesting idea. If I define portA to be 1 bit wide and portB 6 bits wide, when reading from portB (address 0) would I get the 6 values written to portA (addresses 0 to 5)? Thanks, Mordehay.Article: 107107
Ben Jackson wrote: > On 2006-08-24, me_2003@walla.co.il <me_2003@walla.co.il> wrote: > > different bit field it should be also written to it (like a bit-wise OR > > between current value and new value). > > You could have both sides write to their own BRAM and define the output > of the whole thing as the OR of the outputs of the BRAMs. > Yes, but I need a 6 bit wide vector - so that means that I would have to use 6 BRAMs, while utilizing a very small percentage of each (I need only 512 entries). Thanks, Mordehay.Article: 107108
Anyone have experience with directly driving a cable with RocketIO? I am interested in any information/experiences/advice regarding linking two FPGAs via RocketIO over a cable. I have seen some signal characterization information for high-speed links over copper, but usually at less than 800 MHz. I believe my implementation would use a cable of less than 1 meter, but I would like to know whether it works at 3, 5, 10... meters. Ideally I would like to run the link at 10 Gbit/s, but 6 Gbit/s could work. How feasible is this, or is it back to the drawing board? Thanks in advance! DennisArticle: 107109
David Ashley wrote: > Andreas Ehliar wrote: > >>No need to do that, the S3E starter kit is supported by Bryan's >>modified XC3Sprog, available at http://inisyn.org/src/xup/ . >> >>/Andreas > > > Andreas, > > Very good! It says USB1 is not supported, get a USB2 > card. I'm not sure what this means. I'm able to download > bit files to spartan-3e starter board with no problems under > windows -- but I don't think my usb controller is usb 2.0 > (as in high speed). Will the xup work on my machine? > > -Dave F***'n A! I just answered my own question. Followed the step by step instructions except I'm using gentoo so just did emerge sdcc Then picked up from the rest of the process (from the tar zxvf xup-0.0.2.tar.gz) and everything worked fine. The ./p outputs a lot of text and I wasn't sure if there was a problem, but after 5 minutes or so it finished reporting success. The step "./xc3prog /some/file.bit" needs to actually be "xc3sprog" ^ The 's' is missing. Worked fine. This fills in a big hole in my linux development approach. All I'm likely to be doing is fire-and-forget downloads anyway. The fpga isn't big enough for chipscope and I don't want to pay for chipscope anyway :^). I'll invest in a logic analyzer and just bring signals out if necessary. 8 seconds to download is the least of my worries -- making the bit file in the first place takes lots of minutes. Thanks!!! -Dave -- David Ashley http://www.xdr.com/dash Embedded linux, device drivers, system architectureArticle: 107110
Tommy Thorn wrote: > KJ wrote: > .... a (AFAICT) correct description of Avalon. > > Ah, we only differ in perspective. Yes, Avalon _allows_ you to write > slaves like that Umm, yeah it's defined up front in the spec and not off in some corner like Wishbone's tag method either. > and if your fabric consists only of such slaves, then > yes, they are the same. What is the same as what? Also, there is no restriction about having latency aware masters and slaves. > But variable latency does _not_ work like that, How do you think it works? I've been using the term 'variable latency' as it is used by Avalon which is that there can be an arbitrary delay between the end of the address phase (i.e. when waitrequest is not asserted to the master) and the end of the data phase (i.e. when readdatavalid, is asserted to the master). > thus you can't make such an assumption in general if you wish the fabric > to be able to accommodate arbitrary Avalon slaves. What assumption do you think I'm making? The Avalon fabric can connect any mix of Avalon slaves whether they are fixed latency, variable latency or no latency (i.e. controlled by waitrequest). Furthermore it can be connected to an Avalon master that is 'latency aware' (i.e. has a 'readdatavalid' input) or one that is not (i.e. does not have 'readdatavalid' as an input, so cycles are controlled only by 'waitrequest'). You get different performance based on which method is used but that is a design choice on the master and slave side design, not something that Avalon is doing anything to help or hinder. > > That was not my understanding. SimpCon allows Martin to get an "early > warning" that a transaction is about to complete. And what happens as a result of this 'early warning'? I *thought* it allowed the JOP Avalon master to start up another transaction of some sort. If so, then that can be accomplished with waitrequest and readdatavalid. But maybe it's something on the data path side that gets the jump that I'm just not getting just yet. > > > Anyway, hopefully that explains why it's not abusing Avalon in any way. > > My wording was poor. Another way to say it is "to use Avalon in a > constrained way". I'm not clear on what constraint you're seeing in the usage. > Used this way you cannot hook up slaves with variable > latency, so it's not really Avalon, it's a subset of Avalon. If anything, choosing to not use the readdatavalid signal in the master or slave design to allow for completion of the address phase prior to the data phase is the subset not the other way around. KJArticle: 107111
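To make the waitrequest/readdatavalid split concrete, here is a minimal sketch of a latency-aware Avalon-MM style read slave with a fixed two-cycle read latency. The entity, widths and internal pipeline are illustrative assumptions, not taken from the thread or from the Avalon specification:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity av_latent_slave is
  port (
    clk           : in  std_logic;
    reset_n       : in  std_logic;
    address       : in  std_logic_vector(8 downto 0);
    read          : in  std_logic;
    waitrequest   : out std_logic;
    readdata      : out std_logic_vector(31 downto 0);
    readdatavalid : out std_logic
  );
end av_latent_slave;

architecture rtl of av_latent_slave is
  type ram_t is array (0 to 511) of std_logic_vector(31 downto 0);
  signal ram     : ram_t := (others => (others => '0'));
  signal q       : std_logic_vector(31 downto 0);
  signal valid_p : std_logic;
begin
  -- The address phase always completes in one cycle, so the master may
  -- issue a new command every clock even though the data arrives later.
  waitrequest <= '0';

  process(clk, reset_n)
  begin
    if reset_n = '0' then
      valid_p       <= '0';
      readdatavalid <= '0';
    elsif rising_edge(clk) then
      q             <= ram(to_integer(unsigned(address)));  -- pipeline stage 1
      valid_p       <= read;
      readdata      <= q;                                   -- pipeline stage 2
      readdatavalid <= valid_p;                             -- asserted two clocks after the command
    end if;
  end process;
end rtl;

Since waitrequest is never asserted, a latency-aware master can issue a new address every clock and match the returned words to its commands by counting outstanding reads, which is the pipelining the posts above are arguing about.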
Eli Bendersky wrote: > Hello all, > > In a recent thread (where the O.P. looked for a HDL "Code Complete" > substitute) an interesting discussion arised regarding the style of > coding state machines. Unfortunately, the discussion was mostly > academic without much real examples, so I think there's place to open > another discussion on this style, this time with real examples > displaying the various coding styles. I have also cross-posted this to > c.l.vhdl since my examples are in VHDL. > > I have written quite a lot of VHDL (both for synthesis and simulation > TBs) in the past few years, and have adopted a fairly consistent coding > style (so consistent, in fact, that I use Perl scripts to generate some > of my code :-). My own style for writing complex logic and state > machines in particular is in separate clocked processes, like the > following: > > > type my_state_type is > ( > wait, > act, > test > ); > > signal my_state: my_state_type; > signal my_output; > > ... > ... > > my_state_proc: process(clk, reset_n) > begin > if (reset_n = '0') then > my_state <= wait; > elsif (rising_edge(clk)) > case my_state is > when wait => > if (some_input = some_value) then > my_state <= act; > end if; > ... > ... > when act => > ... > when test => > ... > when others => > my_state <= wait; > end case; > end if; > end process; > > my_output_proc: process(clk, reset_n) > begin > if (reset_n = '0') then > my_output <= '0'; > elsif (rising_edge(clk)) > if (my_state = act and some_input = some_other_val) then > ... > else > ... > end if; > end if; > end process; > > > Now, people were referring mainly to two styles. One is variables used > in a single big process, with the help of procedures (the style Mike > Tressler always points to in c.l.vhdl), and another style - two > processes, with a combinatorial process. > > It would be nice if the proponents of the other styles presented their > ideas with regards to the state machine design and we can discuss the > merits of the approaches, based on real code and examples. > > Thanks > Eli I usually separate the state register and combinational logic for the following reason. First, I think that the term "coding style" is very misleading. It is more like "design style". My approach for designing a system (not just FSM) is - Study the specification and think about the hardware architecture - Draw a sketch of top-level block diagram and determine the functionalities of the blocks. - Repeat this process recursively if a block is too complex - Derive HDL code according to the block diagram and perform synthesis. This approach is based on the observation that synthesis software is weak on architecture-level manipulation but good at gate-level logic minimization. It allows me to have full control of the system architecture (e.g., I can easily identify the key components, optimize critical path etc.). The basic block diagram of FSM (and most sequential circuits) consists of a register, next-state logic and output logic. Based on my design style, it is natural to describe each block in a process or a concurrent signal assignment. The number of segments (process and concurrent signal assignments etc.) is really not an issue. It is just a by-product of this design style. The advantage of this approach is that I have better control on final hardware implementation. Instead of blindly relying on synthesis software and testing code in a trial-and-error basis, I can consistently get what I want, regardless which synthesis software is used. 
On the downside, this approach requires more time in initial design phase and the code is less compact. The VHDL code itself sometimes can be cumbersome. But it is clear and easy to comprehend when presented with the block diagram. One interesting example in FSM design is the look-ahead output buffer discussed in section 10.7.2 of "RTL Hardware Design Using VHDL" (http://academic.csuohio.edu/chu_p/), the book mentioned in the previous thread. It is a clever scheme to obtain a buffered Moore output without the one-clock delay penalty. The code follows the block diagram and uses four processes, one for state register, one for output buffer, one for next-state logic and one for look-ahead output logic. Although it is somewhat lengthy, it is easy to understand. I believe the circuit can be described by using one clocked process with proper mix of signals and variables and reduce the code length by 3 quarters, but I feel it will be difficult to relate the code with the actual circuit diagram and vice versa. My 2 cents. Mike G.Article: 107112
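For readers without the book at hand, the idea can be compressed into a toy two-state sketch (this is not the book's four-process code): the Moore output is computed from the next state rather than the present state and then registered, so the buffered output changes in the same clock as the state it belongs to instead of one cycle later.

library ieee;
use ieee.std_logic_1164.all;

entity lookahead_fsm is
  port (
    clk, reset_n : in  std_logic;
    start, done  : in  std_logic;
    busy         : out std_logic   -- buffered Moore output
  );
end lookahead_fsm;

architecture rtl of lookahead_fsm is
  type state_t is (idle, run);
  signal state_reg, state_next : state_t;
begin
  -- state register plus look-ahead output buffer
  process(clk, reset_n)
  begin
    if reset_n = '0' then
      state_reg <= idle;
      busy      <= '0';
    elsif rising_edge(clk) then
      state_reg <= state_next;
      if state_next = run then     -- look-ahead: decode the *next* state
        busy <= '1';
      else
        busy <= '0';
      end if;
    end if;
  end process;

  -- next-state logic
  process(state_reg, start, done)
  begin
    state_next <= state_reg;
    case state_reg is
      when idle => if start = '1' then state_next <= run; end if;
      when run  => if done = '1' then state_next <= idle; end if;
    end case;
  end process;
end rtl;

The price is that the D input of the output register now depends combinationally on the FSM inputs (through state_next), which is exactly the trade-off discussed in the other posts in this thread.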
Brad Smallridge wrote: > > It seemed that what I was looking at was completely flattened, part of the > mapping process? The only thing that was grouped was the carry chains for > some of the counters in the design. Everything else was in top. Should > I be adding constraints to my vhdl code instead of using the floorplanner? > The design is hierarchical, i.e. different VHDL components for each submodule, yes? If so then the EDIF netlist should be hierarchical. Make sure the synthesizer you are using isn't set to flatten the design, and check in PAR to verify the properties aren't set to flatten either. >>>How do I add registers to allow a bus to transverse across the chip and >>>not have the synth tool pack the registers into an SRL16? >> >>It has to be done in your RTL of course. > > > I still don't know what RTL is. Register Transfer Level design. It is basically the device-independent HDL source code. My usage above is I guess improper. I should have said "It has to be done in your source of course". > > >>The easiest way to prevent SRL16 inference is to put a reset on the >>flip-flops. > > > That's clever. Whatever. It works.Article: 107113
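A small sketch of both points made above (entity and signal names are made up): pipeline registers for a bus crossing the chip, with an asynchronous reset on the flip-flops so the synthesizer will not pack the chain into an SRL16.

library ieee;
use ieee.std_logic_1164.all;

entity bus_pipe is
  port (
    clk, rst : in  std_logic;
    d        : in  std_logic_vector(31 downto 0);
    q        : out std_logic_vector(31 downto 0)
  );
end bus_pipe;

architecture rtl of bus_pipe is
  signal stage1 : std_logic_vector(31 downto 0);
begin
  process(clk, rst)
  begin
    if rst = '1' then
      -- the reset is what keeps these registers out of SRL16s
      stage1 <= (others => '0');
      q      <= (others => '0');
    elsif rising_edge(clk) then
      stage1 <= d;
      q      <= stage1;
    end if;
  end process;
end rtl;

Where the tool supports it, a synthesis attribute such as SHREG_EXTRACT = no does the same job without spending routing on the reset net.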
me_2003@walla.co.il wrote: >>You can configure the two ports separately, e.g. one can be 1-bit wide, >>the other 9 bits wide. > > > That's an interesting idea, If I define portA to be 1 bit wide and > portB 6 bits wide, when reading from portB (address 0) would I get the > 6 values written to portA (address 0 to 5) ? I don't think 6 bits wide is an option, I'd expect only powers of 2 are allowed. Could you make port B 8 bits and just throw away 2 bits? So port A bit # = addr*8 + bit [0-5] -Dave -- David Ashley http://www.xdr.com/dash Embedded linux, device drivers, system architectureArticle: 107114
mikegurche@yahoo.com wrote: > One interesting example in FSM design is the look-ahead output buffer > discussed in section 10.7.2 of "RTL Hardware Design Using VHDL" > (http://academic.csuohio.edu/chu_p/), the book mentioned in the > previous thread. It is a clever scheme to obtain a buffered Moore > output without the one-clock delay penalty. The code follows the block > diagram and uses four processes, one for state register, one for output > buffer, one for next-state logic and one for look-ahead output logic. > Although it is somewhat lengthy, it is easy to understand. I believe > the circuit can be described by using one clocked process with proper > mix of signals and variables and reduce the code length by 3 quarters, > but I feel it will be difficult to relate the code with the actual > circuit diagram and vice versa. Combining the input and registered state this way allows for a non registered path from input to output. Is this ok? Or is there an assumption that the device connected to the output is itself latching on the clock edge? -Dave -- David Ashley http://www.xdr.com/dash Embedded linux, device drivers, system architectureArticle: 107115
Haven't had any experience with this yet, however in anticipation of an upcoming project I have been looking into solutions. So far it seems like the highest electrical performace solution is to use a Zd type connector and associated cable. This is a relatively new connector series that has been incorporated in the PICMG telecom standards. These connectors and cables are made by a number of companies including: Tyco Amp: http://catalog.tycoelectronics.com/TE/bin/TE.Connect?C=22438&F=0&M=CINF&BML=10576,16358,17560,17759,17654&GIID=0&LG=1&I=13&RQS=C~22438^M~FEAT^BML~10576,16358,17560,17759,17654^G~G ERNI: http://www.erni.com/ermetzdfront.htd Gore: http://www.gore.com/en_xx/products/cables/copper/backplane/gore_eye-opener_airmaxvs_cable_assemblies.html Hope this helps Brendan "vt2001cpe" <vt2001cpe@gmail.com> wrote in message news:1156447377.879040.291470@b28g2000cwb.googlegroups.com... > Anyone have experience with directly driving a cable with RocketIO? I > am interested in any information/experiences/advice regarding linking > two FPGAs via RocketIO over a cable. I have seen some signal > characterization information for high-speed links over copper, but > usually less than 800Mhz. I believe my implementation would use a a > less than 1 meter, but would like to know it works at 3, 5, > 10...meters. Ideally I would like to run the link at 10gbits, but > 6gbits could work. How feasible is this, or is it back to the drawing > board? > > Thanks in advance! > Dennis >Article: 107116
tweed_deluxe wrote: > I'm wondering what intrinsic ecomomic, technical, or "other" barriers > have precluded FPGA device vendors from taking this step. In other > words, why are there no advertised, periodic refreshes of older > generation FPGA devices. Reasonable question > > In the microprocessor world, many vendors have established a long and > succesful history of developing a pin compatible product roadmap for > customers. For the most part, these steps have allowed customers to > reap periodic technology updates without incurring the need to perform > major re-work on their printed circuit card designs or underlying > software. > > On the Xilinx side of the fence there appears to be no such parallel. > Take for example, Virtex-II Pro. This has been a proven work-horse for > many of our designs. It takes quite a bit of time to truly > understand and harness all of the capabilities and features offered by > a platform device like this. After making the investment to > develop IP and hardware targeted at this technology, is it unreasonable > to expect a forward looking roadmap that incorporates modest updates to > the silicon ? A step that doesn't require a flow blown jump to a new > FPGA device family and subsequent re-work of the portfolio of hardware > and, very often, the related FPGA IP ? > > Sure, devices like Virtex-5 offer capabilities that will be true > enablers for many customers (and for us at times as well). But why > not apply a 90 or 65 nm process shrink to V2-Pro, provide modest speed > bumps to the MGT, along with minor refinements to the hardware > multipliers. Maybe toss in a PLL for those looking to recover clocks > embedded in the MGT data stream etc. And make the resulting devices > 100% pin and code compatible with prior generations. > > Perhaps I'm off in the weeds. But, in our case, the ability to count > on continued refinement and update of a pin-comaptible products like > V2-Pro would result in more orders of Xilinx silicon as opposed to > fewer. > > The absence of such refreshes in the FPGA world leads me to believe > that I must be naive. So I am trying to understand where the logic is > failing. Its just that there are times I wish the FPGA vendors could > more closely parallel what the folks in the DSP and micro-processor > world do ... The FPGA market is not growing all that quickly, so the funds are not available for this. You will also find that the design life of FPGA products is shorter than DSP/microprocessors, plus they cannibalize sales of their 'hot new' devices, as well as confuse the designers. Sometimes, there are physical barriers, like changes to flip chip, and whole-die bonding, that mandate BGA. There, backwards compatible has to go - and that's the key reason for doing this. All those factors, mean this is unlikely to happen. What they CAN do, is try and keep ball-out compatible, over a couple of generations, but I'm not sure even that relatively simple effort is pushed too hard ? -jgArticle: 107117
I wouldn't bother; the abstraction added is not sufficient above properly written HDL to warrant the extra step. To get a reasonable implementation you will end up writing "C" in such a limited form that it won't help you much. Sanka Piyaratna wrote: > Hi, > > What is your opinion on high level languages such as systems C, handel-C > etc. for FPGA development instead of VHDL/Verilog? > > SankaArticle: 107118
PeteS, There is a diode in every (Virtex) device. If you have an old data sheet, you might want to get a newer one. If you find a data sheet without a diode, let me know, and I will find which pins are the diode. It seems that there are those who do not understand how to cool our devices, as well. For help on heatsinks, etc. we have a lot of information that would allow up to (and perhaps more than) 25 watts of heat per device. That is basically 100% of everything switching, at or near the BUFG clock maximum frequency. http://www.xilinx.com/bvdocs/userguides/ug112.pdf The heatsinking has to be able to remove the heat such that the 85 C (commercial) or 100 C (industrial) junction temperature is not exceeded. Austin PeteS wrote: > A recent thread went over the maximum permitted power on a device. > > So why don't we have a sensing diode (well, a transistor with collector > tied to base works better) on the die somewhere? It's fairly easy to > do, according to my VLSI acquaintances, and with faster and faster IO > [implying as it does faster and faster logic switching as well], it > would make sense to see these on any largescale (more than 100 pins > perhaps) FPGA package. > > Comments? > > Cheers > > PeteS >Article: 107119
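As a rough worked example of that constraint (the 25 W figure is from the note above; the 50 C ambient is an assumed number), the combined junction-to-ambient thermal resistance of package plus heatsink has to satisfy

  theta_JA <= (Tj_max - Ta) / P = (85 C - 50 C) / 25 W = 1.4 C/W

which is far lower than a bare BGA package typically achieves in still air, so at that power level the heatsink and airflow guidance in the user guide above is not optional.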
They actually kinda do die shrinks, but they give the new die a new name and alter a few other things. Example, a Virtex 2 becomes a Spartan 3, you get the idea... For each step in the process technology they make the trade offs that make sense for those geometries (e.g bigger memory). They first release a high priced wiz bang part with one name, then they follow up with a lower price smaller die using the same process and give it a different name. tweed_deluxe wrote: > I'm wondering what intrinsic ecomomic, technical, or "other" barriers > have precluded FPGA device vendors from taking this step. In other > words, why are there no advertised, periodic refreshes of older > generation FPGA devices. > > In the microprocessor world, many vendors have established a long and > succesful history of developing a pin compatible product roadmap for > customers. For the most part, these steps have allowed customers to > reap periodic technology updates without incurring the need to perform > major re-work on their printed circuit card designs or underlying > software. > > On the Xilinx side of the fence there appears to be no such parallel. > Take for example, Virtex-II Pro. This has been a proven work-horse for > many of our designs. It takes quite a bit of time to truly > understand and harness all of the capabilities and features offered by > a platform device like this. After making the investment to > develop IP and hardware targeted at this technology, is it unreasonable > to expect a forward looking roadmap that incorporates modest updates to > the silicon ? A step that doesn't require a flow blown jump to a new > FPGA device family and subsequent re-work of the portfolio of hardware > and, very often, the related FPGA IP ? > > Sure, devices like Virtex-5 offer capabilities that will be true > enablers for many customers (and for us at times as well). But why > not apply a 90 or 65 nm process shrink to V2-Pro, provide modest speed > bumps to the MGT, along with minor refinements to the hardware > multipliers. Maybe toss in a PLL for those looking to recover clocks > embedded in the MGT data stream etc. And make the resulting devices > 100% pin and code compatible with prior generations. > > Perhaps I'm off in the weeds. But, in our case, the ability to count > on continued refinement and update of a pin-comaptible products like > V2-Pro would result in more orders of Xilinx silicon as opposed to > fewer. > > The absence of such refreshes in the FPGA world leads me to believe > that I must be naive. So I am trying to understand where the logic is > failing. Its just that there are times I wish the FPGA vendors could > more closely parallel what the folks in the DSP and micro-processor > world do ...Article: 107120
Brendan Illingworth wrote: > Haven't had any experience with this yet, however in anticipation of an > upcoming project I have been looking into solutions. So far it seems like > the highest electrical performace solution is to use a Zd type connector and > associated cable. This is a relatively new connector series that has been > incorporated in the PICMG telecom standards. These connectors and cables > are made by a number of companies including: > > Tyco Amp: > http://catalog.tycoelectronics.com/TE/bin/TE.Connect?C=22438&F=0&M=CINF&BML=10576,16358,17560,17759,17654&GIID=0&LG=1&I=13&RQS=C~22438^M~FEAT^BML~10576,16358,17560,17759,17654^G~G > > ERNI: > http://www.erni.com/ermetzdfront.htd > > Gore: > http://www.gore.com/en_xx/products/cables/copper/backplane/gore_eye-opener_airmaxvs_cable_assemblies.html > > Hope this helps > Brendan > Thanks for the reply! These connectors look useful. I have noticed that people seem to have had success with FR4 connections of 40 inches or less. In som some cases that includes transmission through a backplane connector. Hope that helps with your application! --DennisArticle: 107121
Alan Nishioka wrote: > When using Xilinx, the best way to see what hardware is actually there > is to use fpga_editor. You don't even need a design; just create a new > one, make up a name, and select the part you want to look at. Then you > can double-click on the slice and see what is inside of it. > > Alan Nishioka > Rebooted into windows and launched fpga_editor, and was able to see the detail, sure enough there's an inverter right there. Thanks for the tip. Tried launching fpga_editor from linux but it doesn't work. dave% /Xilinx/bin/lin/fpga_editor Cannot register service: RPC: Unable to receive; errno = Connection refused unable to register (registryProg, registryVers, tcp) -Dave -- David Ashley http://www.xdr.com/dash Embedded linux, device drivers, system architectureArticle: 107122
me_2003@walla.co.il wrote: > Ben Jackson wrote: > > On 2006-08-24, me_2003@walla.co.il <me_2003@walla.co.il> wrote: > > > different bit field it should be also written to it (like a bit-wise OR > > > between current value and new value). > > > > You could have both sides write to their own BRAM and define the output > > of the whole thing as the OR of the outputs of the BRAMs. > > > > > Yes but I need a 6 bit wide vector - so that means that I would have to > use 6 BRAMs. > while utilizing a very small precentage of each (I need only 512 > entries). > > Thanks, Mordehay. As others suggested, you can utilize 1 BRAM: the writing port is 4K x 1, the reading port is 512 x 8. No reason why it shouldn't work. Cheers,Article: 107123
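A behavioral sketch of that single-BRAM arrangement (entity and port names are made up, and whether a given synthesis tool will infer one asymmetric block RAM from this varies, so instantiating a primitive such as RAMB16_S1_S9 or using Core Generator may be needed in practice):

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity asym_ram is
  port (
    clk    : in  std_logic;
    -- 1-bit write port, 4096 deep
    we_a   : in  std_logic;
    addr_a : in  unsigned(11 downto 0);
    din_a  : in  std_logic;
    -- 8-bit read port, 512 deep (the application only uses 6 of the bits)
    addr_b : in  unsigned(8 downto 0);
    dout_b : out std_logic_vector(7 downto 0)
  );
end asym_ram;

architecture rtl of asym_ram is
  type ram_t is array (0 to 4095) of std_logic;
  signal ram : ram_t := (others => '0');
begin
  process(clk)
  begin
    if rising_edge(clk) then
      if we_a = '1' then
        ram(to_integer(addr_a)) <= din_a;              -- write a single bit
      end if;
      for i in 0 to 7 loop
        dout_b(i) <= ram(to_integer(addr_b) * 8 + i);  -- read 8 consecutive bits as one byte
      end loop;
    end if;
  end process;
end rtl;

With this mapping, bit i of the byte read at port B address n is the bit written at port A address n*8 + i, the same addr*8 + bit arithmetic mentioned earlier in the thread.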
mikegurche@yahoo.com wrote: > This approach is based on the observation that synthesis software is > weak on architecture-level manipulation but good at gate-level logic > minimization. I have observed that synthesis software does what it is told. If I describe two gates and a flop, that is what I get. If I describe a fifo or an array of counters, that is what I get. > The advantage of this approach is that I have better control on final > hardware implementation. Instead of blindly relying on synthesis > software and testing code in a trial-and-error basis, I can > consistently get what I want, regardless which synthesis software is > used. What I want is a netlist that sims the same as my code and makes reasonable use of the device resources. Synthesis does a good job of this with the right design rules. Trial and error would only come into play if I were to run synthesis without simulation. > On the downside, this approach requires more time in initial > design phase and the code is less compact. The VHDL code itself > sometimes can be cumbersome. But it is clear and easy to comprehend > when presented with the block diagram. I prefer clean, readable code, verified by simulation and static timing. I use the rtl viewer to convert my logical description to a structural one for review. -- Mike TreselerArticle: 107124
Hi Kevin, now I know more from your name than KJ ;-) >> My pipeline approach is just this little funny busy counter >> instead of a single ack and that a slave has to declare it's >> pipeline level (0 to 3). Level 1 is almost ever possible. >> It's more or less for free in a slave. Level 1 means that >> the master can issue the next read/write command in the same >> cycle when the data is available (rdy_cnt=0). Level 2 means >> issue the next command one cycle earlier (rdy_cnt=1). Still >> not a big issue for a slave (especially for a memory slave >> where you need a little state machine anyway). > > I'm assuming that the master side address and command signals enter the > 'Simpcon' bus and the 'Avalon' bus on the same clock cycle. Maybe this > assumption is where my hang up is and maybe JOP on Simpcon is getting a > 'head start' over JOP on Avalon. This assumption is true. Address and command (+write data) are issued in the same cycle - no magic there. In SimpCon this is a single cycle thing and there is no ack or busy signal involed in this first cycle. That means no combinatorial generation of ack or busy. And no combinatorial reaction of the master in the first cycle. What I loos with SimpCon is a single cycle latency access. However, I think this is not too much to give up for easier pipelining of the arbitration/data in MUX. > Given that assumption though, it's not clear to me why the address and > command could not be designed to also end up at the actual memory > device on the same clock cycle. Again, maybe this is where my hang up > is. The register that holds the address is probably a ALU result register (or in my case the top-of-stack). That one is usually buried deep in the design. Additional you have to generate your slave selection (chip select) from that address. This ends up with some logic and long routing pathes to the pins. In a practical example with the Cyclone 6-7 ns are not so uncommon. Almost one cycle at 100 MHz. Furthermore, this delay is not easy to control in your design - add another slave and the output delay changes. To avoid this unpredictability one will add a register at the IO pad for address and rd/wr/cs. If we agree on this additional register at the slave/memory interface we can drop the requirement on the master to hold the address and control longer than one cycle. Furthermore, as we have this minimum one cycle latency from master command till address/rd/wr/data on the pins we do not need an ack/busy indication during this command cycle. We just say to the master: in the cycle the follows your command you will get the information about ready or wait. > Given that address and command end up at the memory device on the same > clock cycle whether SimpCon or Avalon, the resulting read data would > then be valid and returned to the SimpCon/Avalon memory interface logic > on the same clock cycle. Pretty sure this is correct since this is > just saying that the external memory performance is the same which is > should be since it does not depend on SimpCon or Avalon. In SimpCon it will definitely arrive one cycle later. With Avalon (and the generated memory interface) I 'assume' that there is also one cycle latency - I read this from the tco values of the output pins in the Quartus timing analyzer report. For the SRAM interface I did in VHDL I explicitly added registers at the addredd/rd/wr/data output. I don't know if the switch fabric adds another cycle. Probably not, if you do not check the pipelined checkbox in the SOPC Builds. 
> Given all of that, it's not clear to me why the actual returned data > would show up on the SimpCon bus ahead of Avalon or how it would be any > slower getting back to the SimpCon or Avalon master. Again, this might > be where my hangup is but if my assumptions have been correct up to > this paragraph then I think the real issue is not here but in the next > paragraph. Completely agree. The read data should arrive in the same cycle from Avalon or SimpCon to the master. Now that's the point where this bsy_cnt comes into play. In my master (JOP) I can take advantage of the early knowledge when data will arrive. I can restart my waiting pipeline earlier with this information. This is probably the main performance difference. Going through my VHDL code for the Avalon interface I found on more issue with the JOP/Avalon interface: In JOP I issue read/write commands and continue to execute microcode if possible. Only when the result is needed the main pipeline waits for the slave result. However, the slave can deliver the result earlier than needed. In that case the slave has to hold the data for JOP. The Avalon specification guarantees the read data valid only for a single cycle. So I added a register to hold the data and got one cycle latency: * one register at the input pins for the read data * one register at the JOP/Avalon interface to hold the data longer than one cycle As I see it, this can be enhanced in the same way I did the little Avalon specification violation on the master side. Use a MUX to deliver the data from the input register in the first cycle and switch to the 'hold' register for the other cycles. Should change the interface for a fairer comparison. Thanks for pointing me to this :-) > If I got through this far then it comes down to....You say "Level 1 > means that the master can issue the next read/write command in the same > cycle when the data is available (rdy_cnt=0). Level 2 means issue the > next command one cycle earlier (rdy_cnt=1)." and presumably the > 'rdy_cnt=1' is the reason for the better SimpCon numbers. Where I'm > pretty sure I'm hung up then is why can't the Avalon slave drop the > wait request output on the clock cycle that corresponds to rdy_cnt=1 > (i.e. one before data is available at the master)? Because rdy_cnt has a different meaning than waitrequest. It is more like an early datavalid. Dropping waitrequest does not help with my pipeline restart thing. > rdy_cnt=1 sounds like it is allowing JOP on SimpCon to start up the > next transaction (read/write or twiddle thumbs) one clock cycle before > the read data is actually available. But how is that different than As above: the main thing is to get the master pipeline started early to use the read data. Perhaps this is a special design feature of JOP and not usable in a different master. I don't know. We would need to design a different CPU to evaluate if this feature is useful in the general case. > >> Enjoy this discussion :-) >> Martin > > Immensely. And I think I'll finally get the light bulb turned on in my > head after your reply. > BTW: As I'm also academic I should/have to publish papers. SimpCon is on my list for months to be published - and now it seems to be the right time. I will write a draft of the paper in the next few days. If you are interested I'll post a link to it in this thread and your comments are very welcome. Martin
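Since the exact SimpCon signal list is not spelled out in the thread, here is only a rough sketch of how a simple slave could drive the rdy_cnt countdown described above; the names and the fixed two-cycle read latency are assumptions:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity simpcon_like_slave is
  port (
    clk, reset : in  std_logic;
    -- command, valid for a single cycle only
    address    : in  std_logic_vector(8 downto 0);
    rd         : in  std_logic;
    -- response
    rd_data    : out std_logic_vector(31 downto 0);
    rdy_cnt    : out unsigned(1 downto 0)   -- cycles until the read data is valid
  );
end simpcon_like_slave;

architecture rtl of simpcon_like_slave is
  type ram_t is array (0 to 511) of std_logic_vector(31 downto 0);
  signal ram : ram_t := (others => (others => '0'));
  signal q   : std_logic_vector(31 downto 0);
  signal cnt : unsigned(1 downto 0);
begin
  rdy_cnt <= cnt;

  process(clk, reset)
  begin
    if reset = '1' then
      cnt <= "00";
    elsif rising_edge(clk) then
      if rd = '1' then
        cnt <= "01";                                  -- rdy_cnt reaches 0 in the cycle the data is ready
        q   <= ram(to_integer(unsigned(address)));    -- address is sampled in the command cycle only
      elsif cnt /= "00" then
        cnt <= cnt - 1;
      end if;
      if cnt = "01" then
        rd_data <= q;                                 -- data is then held until the next read completes
      end if;
    end if;
  end process;
end rtl;

The early-warning use in JOP is then simply that the master may restart its pipeline while rdy_cnt is still 1, so it can consume rd_data in the very next cycle instead of waiting for an after-the-fact data-valid strobe.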