Site Home   Archive Home   FAQ Home   How to search the Archive   How to Navigate the Archive   
Compare FPGA features and resources   

Threads starting:
1994JulAugSepOctNovDec1994
1995JanFebMarAprMayJunJulAugSepOctNovDec1995
1996JanFebMarAprMayJunJulAugSepOctNovDec1996
1997JanFebMarAprMayJunJulAugSepOctNovDec1997
1998JanFebMarAprMayJunJulAugSepOctNovDec1998
1999JanFebMarAprMayJunJulAugSepOctNovDec1999
2000JanFebMarAprMayJunJulAugSepOctNovDec2000
2001JanFebMarAprMayJunJulAugSepOctNovDec2001
2002JanFebMarAprMayJunJulAugSepOctNovDec2002
2003JanFebMarAprMayJunJulAugSepOctNovDec2003
2004JanFebMarAprMayJunJulAugSepOctNovDec2004
2005JanFebMarAprMayJunJulAugSepOctNovDec2005
2006JanFebMarAprMayJunJulAugSepOctNovDec2006
2007JanFebMarAprMayJunJulAugSepOctNovDec2007
2008JanFebMarAprMayJunJulAugSepOctNovDec2008
2009JanFebMarAprMayJunJulAugSepOctNovDec2009
2010JanFebMarAprMayJunJulAugSepOctNovDec2010
2011JanFebMarAprMayJunJulAugSepOctNovDec2011
2012JanFebMarAprMayJunJulAugSepOctNovDec2012
2013JanFebMarAprMayJunJulAugSepOctNovDec2013
2014JanFebMarAprMayJunJulAugSepOctNovDec2014
2015JanFebMarAprMayJunJulAugSepOctNovDec2015
2016JanFebMarAprMayJunJulAugSepOctNovDec2016
2017JanFebMarAprMayJunJulAugSepOctNovDec2017
2018JanFebMarAprMayJunJulAugSepOctNovDec2018
2019JanFebMarAprMayJunJulAugSepOctNovDec2019
2020JanFebMarAprMay2020

Authors:A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

Custom Search

Messages from 90700

Article: 90700
Subject: Re: which is Low power FPGA?
From: "jerzy.gbur@gmail.com" <jerzy.gbur@gmail.com>
Date: 19 Oct 2005 03:19:34 -0700
I think that flash-based FPGAs have only advantages :)
But they can have smaller capacity than other chips.
First you need to look in the datasheets at the quiescent power for the chips
you're interested in.

Spartan3L could have 100-200mW.

regards

Jerzy Gbur


Article: 90701
Subject: Re: clock timing
From: "Symon" <symon_brewer@hotmail.com>
Date: Wed, 19 Oct 2005 03:31:13 -0700

"Benjamin Menküc" <benjamin@menkuec.de> wrote in message 
news:dj3v5s$3t1$01$1@news.t-online.com...
> Hi Symon,
>
> fig. 11 in XAPP622 is about direct LVDS output. I am doing the parallel to 
> lvds conversion with an external IC.
>
Hi Benjamin,
You've missed the point. It's the DDR bit of the diagram I wanted you to 
look at, not the output buffer. That FDDRRSE is in the IOB. After looking up 
FDDRCPE in the libraries guide, try something like this in your VHDL:-

  component FDDRCPE
   port (
     Q : out std_logic;
     C0 : in std_logic;
     C1 : in std_logic;
     CE : in std_logic;
     CLR : in std_logic;
     D0 : in std_logic;
     D1 : in std_logic;
     PRE : in std_logic
   );
  end component;

begin

  not_clock <= not clock;

  fddrcpe_ins : fddrcpe
    port map (
      q   => output,
      c0  => clock,
      c1  => not_clock,
      ce  => '1',
      clr => clr,
      d0  => '1',
      d1  => '0',
      pre => pre
    );

The output can be any standard; set it in your UCF.

HTH, Syms.
p.s. The gclk pins have special features for incoming clocks, not outgoing 
ones AFAIK.







Article: 90702
Subject: Re: which is Low power FPGA?
From: "Antti Lukats" <antti@openchip.org>
Date: Wed, 19 Oct 2005 12:56:08 +0200
<jerzy.gbur@gmail.com> wrote in message
news:1129717174.637512.326670@g44g2000cwa.googlegroups.com...
> I think that flash based FPGA has advantages only :)
> But they could have smaller capacity then other chips.
> First you need to look in datasheets on quiscient power for chips you
> interested in.
>
> Spartan3L could have 100-200mW.
>
> regards
>
> Jerzy Gbur
>

read the datasheet for 3L !

the 3L has NO POWER advantages over S-3 while operating!
NONE!
exactly the same power consumption as S3

3L only has additional hibernate mode, but that means the FPGA is not
configured during low power hibernation.

Antti



Article: 90703
Subject: Re: which is Low power FPGA?
From: "himassk" <himassk@gmail.com>
Date: 19 Oct 2005 04:17:04 -0700
I couldn't make out which is the best low-power FPGA among the Spartan-3,
Cyclone II and Lattice FPGAs.

Regards,
Himassk


Article: 90704
Subject: dagen.exe,where can i get it,thanks(for digital filter)
From: "cehon" <cehonzhang@sina.com.cn>
Date: Wed, 19 Oct 2005 20:01:47 +0800
I need it to help me design a digital filter. Where can I get it? Thank you
very much!
                    cehon



Article: 90705
Subject: Re: using i2c core
From: Bevan Weiss <kaizen__@NOSPAMhotmail.com>
Date: Thu, 20 Oct 2005 01:27:13 +1300
CMOS wrote:
> hi,
> the IOBUF im  using has 4 pins.
>  it is made of one TRISTATE output buffer and input buffer. the
> inputbuffer's input and tri_state output buffers output is connected
> together and function as the IO port for the entity. the input of the
> tristate output buffer is connected to the output from the core, which
> is always grounded. its enable/disable pin is also controlled by the
> core. the output of the input buffer is connected to an input of the
> core. 
> 
> CMOS

Does the output from the core not just have two pins for SCL and another 
two for SDA?

I'd have just thought that you'd have:
OE for SCL connected to SCL output from core
I for SCL connected to SCL input from core
OE for SDA connected to SDA output from core
I for SDA connected to SDA input from core

Which signals are available from this I2C core?

Article: 90706
Subject: Re: Rosetta Results
From: Martin Thompson <martin.j.thompson@trw.com>
Date: 19 Oct 2005 13:47:23 +0100
Hi Austin,

Austin Lesea <austin@xilinx.com> writes:

> All,
> 
> http://tinyurl.com/clzqh
> 

Can you post the big URL (as well) for posterity, in case tinyurl ever
goes away?

> Details the latest readouts for actual single event upsets for Virtex
> 4, and Spartan 3.
> 

Could you clarify something for me please? I have read the Rosetta
stuff that I could find, and I'm still not sure:  The experiment is
testing the configuration latches only, not the flipflops I use in my
designs - is that correct?  Can you point me at the document I need to
read more carefully :-)

Thanks!

Martin

-- 
martin.j.thompson@trw.com 
TRW Conekt - Consultancy in Engineering, Knowledge and Technology
http://www.trw.com/conekt  
   

Article: 90707
Subject: Re: using i2c core
From: Bevan Weiss <kaizen__@NOSPAMhotmail.com>
Date: Thu, 20 Oct 2005 01:52:20 +1300
Bevan Weiss wrote:
> CMOS wrote:
>> hi,
>> the IOBUF im  using has 4 pins.
>>  it is made of one TRISTATE output buffer and input buffer. the
>> inputbuffer's input and tri_state output buffers output is connected
>> together and function as the IO port for the entity. the input of the
>> tristate output buffer is connected to the output from the core, which
>> is always grounded. its enable/disable pin is also controlled by the
>> core. the output of the input buffer is connected to an input of the
>> core.
>> CMOS
> 
> Does the output from the core not just have two pins for SCL and another 
> two for SDA?
> 
> I'd have just thought that you'd have:
> OE for SCL connected to SCL output from core
> I for SCL connected to SCL input from core
> OE for SDA connected to SDA output from core
> I for SDA connected to SDA input from core
> 
> Which signals are available from this I2C core?

Never mind I've seen the spec myself now...
If you're using VHDL, then what's wrong with simply following the guide 
laid out in the specs doc?  Or are these errors found when doing that?

scl <= scl_pad_o when (scl_padoen_oe = '0') else 'Z';
sda <= sda_pad_o when (sda_padoen_oe = '0') else 'Z';
scl_pad_i <= scl;
sda_pad_i <= sda;

This code (as copied from specs doc for opencores I2C core) should infer 
a tri-state buffer into the mix.

Alternatively, you could create instances of an IOBUF, however this 
shouldn't be needed.

Article: 90708
Subject: Re: How to speed up the critical path (Xilinx)
From: "zqhpnp@gmail.com" <zqhpnp@gmail.com>
Date: 19 Oct 2005 06:03:00 -0700
One technique is pipelining, which inserts registers into the datapath; it
improves the working frequency but increases the data latency.
The other is parallelism, if the FPGA has enough resources.
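
A minimal VHDL sketch of the pipelining idea (the entity name, widths, and ports are illustrative assumptions, not from the original post): registering the product splits the combinational path at the multiplier, so the clock can run faster at the cost of one extra cycle of latency.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: an 18x18 multiply with a pipeline register on
-- the product.  The register breaks the combinational path after the
-- multiplier, but the result arrives one cycle later.
entity pipelined_mult is
  port (
    clk  : in  std_logic;
    a, b : in  signed(17 downto 0);
    p    : out signed(35 downto 0)
  );
end entity pipelined_mult;

architecture rtl of pipelined_mult is
  signal prod_reg : signed(35 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      prod_reg <= a * b;  -- pipeline register: +1 cycle latency
    end if;
  end process;
  p <= prod_reg;
end architecture rtl;
```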


Article: 90709
Subject: Re: Newbie question: XC3S400 Gate Count
From: Philip Freidin <philip@fliptronics.com>
Date: Wed, 19 Oct 2005 13:11:29 GMT
On Tue, 18 Oct 2005 14:53:37 -0700, Ed McGettigan <ed.mcgettigan@xilinx.com> wrote:
>We counted transistors for a while internally as we thought that it
>was an interesting statistic.  But, it's actually a very hard problem to
>handle as our devices are almost 100% full custom and sometimes the legs
>of a transistor may not be clearly defined or split between different
>submodules. The auto reporting functions from the CAD tools were also
>spitting out numbers that were way too high.  We stopped doing this in
>detail for Virtex-4 as there was no real benefit in knowing the exact
>number except for bragging rights.
>
>We had a press release for the Virtex-II Pro 2VP100 part that stated
>430 Million transistors back in 2003
>http://www.xilinx.com/prs_rls/xil_corp/03133taiwan.htm
>
>I think that we were estimating about 1 Billion transistors for the
>Virtex-4 LX200 parts that we are shipping now. I'm not as familiar with
>the Spartan-III line, but I think that you are looking at about 30-35 Million
>for the XC3S400 which has a configuration size of 1.7 Mbit.
>
>Ed

At the FPGA-FAQ web site, I have a page that displays a bunch of
statistics about various FPGAs:

   http://www.fpga-faq.org/compare/build_form.cgi

I just ran it with the above chips that Ed mentioned, and looked
at the transistor count estimates:

(Use monospaced font)

Part Number    Ed's TX count        My TX count
               ESTIMATE             ESTIMATE

XC2VP100         430,000,000       554,279,440
XC4VLX200      1,000,000,000       734,982,144
XC3S400           35,000,000        26,043,328


My estimates are based on a complex set of rules using only
publicly available information, which is why they are listed
as "Transistors (Wild estimate)" on my web site.
Ed is much closer to the source of the information.
Sometimes my estimates are higher, sometimes lower, than Ed's.
Your answer is probably bounded by these values.

I 110% agree with Ed that
     "there is no real benefit in knowing the exact number"

Cheers,
Philip



===================
Philip Freidin
philip.freidin@fpga-faq.org
Host for WWW.FPGA-FAQ.ORG

Article: 90710
Subject: Re: using i2c core
From: "CMOS" <manusha@millenniumit.com>
Date: 19 Oct 2005 06:19:12 -0700
This is exactly what I did initially. When I do that, I can't see any
ports corresponding to the outputs (i.e. scl and sda) when I try to
assign them to the ChipScope Pro logic analyzer core using the ChipScope Pro
core inserter. I'm not sure whether the mentioned error (ERROR:924) comes up
for that configuration too. When I changed the design to the one I
described, the core inserter shows those output ports so that I can assign
them to trigger ports, but the error comes in the translation stage.

CMOS


Article: 90711
Subject: Re: dagen.exe,where can i get it,thanks(for digital filter)
From: cecarrion1@gmail.com
Date: 19 Oct 2005 06:51:58 -0700

cehon wrote:
> I need it to help me to design digital filter,where can i got it ,thank you
> very much!
>                     cehon





Hello, I have some examples in sysgen that run on a Spartan-3, and I
have an example of a FIR filter, but it is in Spanish. Do you want
it?


Article: 90712
Subject: Re: using i2c core
From: John_H <johnhandwork@mail.com>
Date: Wed, 19 Oct 2005 13:59:02 GMT
"the inputbuffer's input and tri_state output buffers output is 
connected together"

and

"the input of the tristate output buffer is connected to the output from 
the core"

seems to suggest an extra connection.

sda_pad_o to IOBUF.I
sda_pad_oen to IOBUF.T
sda_pad_i to IOBUF.O
pad to IOBUF.IO

similar for clock.
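
In VHDL, the wiring above might look roughly like this (a sketch only; the wrapper entity and names are assumptions, with the core-side names following the opencores I2C core and IOBUF taken from the Xilinx unisim library):

```vhdl
library ieee;
use ieee.std_logic_1164.all;
library unisim;
use unisim.vcomponents.all;

-- Sketch: one IOBUF per open-drain I2C line.  T = '1' tri-states the
-- driver, matching the core's active-high "output enable not" signal.
entity i2c_pads is
  port (
    sda_pad_o   : in    std_logic;  -- from core: value to drive (normally '0')
    sda_pad_oen : in    std_logic;  -- from core: '1' = release the line
    sda_pad_i   : out   std_logic;  -- to core: value seen on the pin
    sda         : inout std_logic   -- the physical pad
  );
end entity i2c_pads;

architecture rtl of i2c_pads is
begin
  sda_iobuf : IOBUF
    port map (
      I  => sda_pad_o,    -- sda_pad_o   to IOBUF.I
      T  => sda_pad_oen,  -- sda_pad_oen to IOBUF.T
      O  => sda_pad_i,    -- sda_pad_i   to IOBUF.O
      IO => sda           -- pad         to IOBUF.IO
    );
  -- SCL is wired identically with its own IOBUF.
end architecture rtl;
```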




CMOS wrote:
> hi,
> the IOBUF im  using has 4 pins.
>  it is made of one TRISTATE output buffer and input buffer. the
> inputbuffer's input and tri_state output buffers output is connected
> together and function as the IO port for the entity. the input of the
> tristate output buffer is connected to the output from the core, which
> is always grounded. its enable/disable pin is also controlled by the
> core. the output of the input buffer is connected to an input of the
> core. 
> 
> CMOS

Article: 90713
Subject: Re: Carry Chain Design
From: "Brannon" <brannonking@yahoo.com>
Date: 19 Oct 2005 08:05:21 -0700
I guess what I was picturing was a four bit LUT (like it currently is
in Xilinx chips) with two outputs where one output was tied to an AND
gate on the output of the MUXCY. Or that extra bit could be tied to the
select bits on the MUXCY and the MUXCY could have three or four inputs,
two of which could be from CY0. Currently only one input on the MUXCY
is from CY0, and there is only one CY0 per MUXCY. That looks like a
cheap, small part to me. We should be able to easily add another one,
and possibly tie it to the upper bits going into the LUT.


Article: 90714
Subject: Re: Rosetta Results
From: Austin Lesea <austin@xilinx.com>
Date: Wed, 19 Oct 2005 08:11:36 -0700
Martin,

http://www.xilinx.com/xlnx/xweb/xil_tx_display.jsp?sSecondaryNavPick=&iLanguageID=1&multPartNum=1&category=&sTechX_ID=al_v4rse&sGlobalNavPick=&BV_SessionID=@@@@1217042584.1129733086@@@@&BV_EngineID=cccgaddfmiggfdlcefeceihdffhdfkf.0

is the long URL.

-snip-
> Could you clarify something for me please? I have read the Rosetta
> stuff that I could find, and I'm still not sure:  The experiment is
> testing the configuration latches only, not the flipflops I use in my
> designs - is that correct?  Can you point me at the document I need to
> read more carefully :-)

September Issue, IEEE Transactions on Device and Materials Reliability 
has the Rosetta story.

For those who have IEEE library usernames and passwords, or those who 
belong to this group, you may find the article online:

http://ieeexplore.ieee.org/Xplore/guesthome.jsp

Go to Journals & Magazines, and to this transactions, and "view 
forthcoming articles."

Since we wrote this, IEEE owns the copyrights, and we can no longer 
distribute the paper.

If you wish to have a presentation on Rosetta, our FAEs have powerpoint 
slide shows on the subject they can present (under NDA).

Given your affiliation (TRW), I imagine you know who is your 
Aerospace/Defense Xilinx FAE, and can contact him regarding this subject.

The Aerospace/Defense Group has further requirements beyond those of the 
commercial group (heavy ions, total dose, etc.).  These are addressed in 
other reports and publications.  Xilinx has a radiation effects industry 
consortium with more than a dozen members who are actively working on 
the use, use models, performance, and mitigation of radiation effects. 
Please ask about this consortium.

We also have agreements with groups in the EU on the same subject, 
directed from our research arm in Ireland (to facilitate better 
communications).

Rick Katz of NASA (who posts here often) has proceedings from 
conferences he can direct you to with even more information that we have 
presented.

An example here is the MAPLD conference, 2005:

http://www.klabs.org/mapld05/abstracts/index.html


To answer your specific question:  what about the D FF in the CLB?  What 
is its failure rate in time (FIT) compared to the configuration bits?

There are 200,000 DFFs in our largest part (XC4VLX200).  The failure rate 
of these is 0.2 FIT (i.e. 1 FIT/Mb).  That is 0.2 failures in 1 billion 
hours for the largest device (at sea level).  The DFF is a very large, 
well loaded structure, as it is programmable to be a latch or a D
flip-flop, with asynchronous reset, synchronous reset, and other options
as well for load, preset, and capture.

Compared to the 6.1 FIT/million bits of configuration memory (as of 
today's readout) for the mean time to a functional failure, times the 
number of config bits (in Mbit), the DFF upset rate is many times less.

We also are recording the upset rate in the BRAM.

In Virtex 4, we have FRAME_ECC for detecting and correcting 
configuration bit errors, and BRAM_ECC for detecting and correcting BRAM 
errors (hard IP built into every device).

Regardless, for the highest level of reliability, we suggest using our 
XTMR tool which automatically converts your design to a TMR version 
optimized for the best reliability using the FPGA resources (the XTMR 
tool understands how to TMR a design in our FPGA -- not something 
obvious how to do).  In addition to the XTMR, we also suggest use of the 
FRAME_ and BRAM_ ECC features so that you are actively "scrubbing" 
configuration bits so they get fixed if they flip, and the same for the 
BRAM.  The above basically describes how our FPGAs are being used now in 
aerospace applications.

Austin





Article: 90715
Subject: Re: How to speed up the critical path (Xilinx)
From: "Symon" <symon_brewer@hotmail.com>
Date: Wed, 19 Oct 2005 08:14:13 -0700
KB,
How about adding another multiplier so that you can eliminate the first mux. 
Have one multiplier for each source. Then make the second mux one port wider 
to accommodate the extra multiplier result. The Xilinx CLB muxes are 
expandable without adding too much extra delay.
HTH, Syms.

"starbugs" <starbugs@gmx.net> wrote in message 
news:1129711172.603645.233560@g14g2000cwa.googlegroups.com...
> Hi there,
>
> I would be happy about some suggestions on how I could start to make my
> design faster. My design is a processor (18 bit datapath)and the
> critical path looks like this:
>
> 1. Instruction register (containing number of register)
> 2. Register file (distributed RAM)
> 3. Mux (2-way, selects either register or RAM)
> 4. Mult18x18 within the ALU
> 5. Mux (ALU output selector)
> 6. Register file (distributed RAM)
>



Article: 90716
Subject: Re: Anyone remember the really early Xilinx FPGAs?
From: "Symon" <symon_brewer@hotmail.com>
Date: Wed, 19 Oct 2005 08:23:45 -0700
Hi Peter,
So, I missed your reply a couple of days back. My newsreader threw a wobbly. 
Anyway, I did specifically say "and keep any device specific stuff in 
separate files". Worry not, I'm using every feature I can of your chips! I 
just instantiate DCMs, MULTSs etc. in their own files, so that when I come 
to (lie to my FAE and say I'm going to) port it to a different device, I 
know where to focus my efforts. ;-)
Cheers, Syms.
"Peter Alfke" <alfke@sbcglobal.net> wrote in message 
news:1129502161.733791.62380@g14g2000cwa.googlegroups.com...
> Hi Symon.
> You know why you do it, but I suppose you are also aware of the
> performance and the money you leave on the table by "designing to the
> lowest common denominator".
> Cheers
> Peter Alfke
> 



Article: 90717
Subject: Re: Anyone remember the really early Xilinx FPGAs?
From: "Symon" <symon_brewer@hotmail.com>
Date: Wed, 19 Oct 2005 08:26:17 -0700
"GPE" <See_my_website_for_email@cox.net> wrote in message 
news:6wD4f.2209$%42.1138@okepread06...
>
>
> And how can anyboty forget the routed signal delays thru the old 3000 
> series of parts?!?!  I did a whole lot of hand routing back in those days. 
> Not much need for that anymore, thankfully.  I can't exactly imagine hand 
> routing a XCV3200E anyways...
>
Ed,
I remember back in the 80's closing my eyes and going to sleep after a hard 
day's 'XACT'ing and still seeing the image of those switchboxes burnt into 
my retina.
Cheers, Syms. 



Article: 90718
Subject: Spartan-3 configuration failure
From: nithin.pal@gmail.com
Date: 19 Oct 2005 09:13:40 -0700
Hello,

I am a newbie to the group and FPGAs in general. We have a board with
XC3S1500 FPGA on it. We use the master serial mode to configure the
FPGA using a platform flash (PROM) with the FPGA providing the CCLK
(confg rate -50). For debug reasons we also have the confg. signals
going to a POD header.

We see that sometimes when we probe the CCLK on the POD header the FPGA
fails to configure. On the scope , we could see the sequence of INIT
going high, CCLK going active (clocking), DONE being low and after some
time, the INIT goes high, the CCLK stops clocking but the DONE pin does
not go high. We have 330 ohm pull-up on DONE pin.

If i remove the probe from the CCLK pin on the POD header the board
starts-up with successful configuration.

I was wondering if anybody in the group has seen such behaviour before
or may be the experts in the group could guide me on the possible
issues involved.

Also, I have sometimes seen that as the INIT goes low, CCLK goes
high but does not begin to clock. In this case, a configuration
failure is obvious. So, is this a problem with the chip not being
consistent?


Thanks a lot in advance

Regards
Nithin


Article: 90719
Subject: MAC Architectures
From: Tim Wescott <tim@seemywebsite.com>
Date: Wed, 19 Oct 2005 09:25:12 -0700
Jeorg's question on sci.electronics.design for an under $2 DSP chip got 
me to thinking:

How are 1-cycle multipliers implemented in silicon?  My understanding is 
that when you go buy a DSP chip a good part of the real estate is taken 
up by the multiplier, and this is a good part of the reason that DSPs 
cost so much.  I can't see it being a big gawdaful batch of 
combinatorial logic that has the multiply rippling through 16 32-bit 
adders, so I assume there's a big table look up involved, but that's as 
far as my knowledge extends.

Yet the reason that you go shell out all the $$ for a DSP chip is to get 
a 1-cycle MAC that you have to bury in a few (or several) tens of cycles 
worth of housekeeping code to set up the pointers, counters, modes &c -- 
so you never get to multiply numbers in one cycle, really.

How much less silicon would you use if an n-bit multiplier were 
implemented as an n-stage pipelined device?  If I wanted to implement a 
128-tap FIR filter and could live with 160 ticks instead of 140 would 
the chip be much smaller?

Or is the space consumed by the separate data spaces and buses needed to 
move all the data to and from the MAC?  If you pipelined the multiplier 
_and_ made it a two- or three- cycle MAC (to allow time to shove data 
around) could you reduce the chip cost much?  Would the amount of area 
savings you get allow you to push the clock up enough to still do audio 
applications for less money?

Obviously any answers will be useless unless somebody wants to run out 
and start a chip company, but I'm still curious about it.

-- 

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com


Article: 90720
Subject: Re: MAC Architectures
From: "Pramod Subramanyan" <pramod.sub@gmail.com>
Date: 19 Oct 2005 09:48:52 -0700

Tim Wescott wrote:
> Jeorg's question on sci.electronics.design for an under $2 DSP chip got
> me to thinking:
>
> How are 1-cycle multipliers implemented in silicon?  My understanding is
> that when you go buy a DSP chip a good part of the real estate is taken
> up by the multiplier, and this is a good part of the reason that DSPs
> cost so much.  I can't see it being a big gawdaful batch of
> combinatorial logic that has the multiply rippling through 16 32-bit
> adders, so I assume there's a big table look up involved, but that's as
> far as my knowledge extends.
>

There's no lookup table. It's just a BIG cascade of ANDs. This might
help:

http://www2.ele.ufes.br/~ailson/digital2/cld/chapter5/chapter05.doc5.html
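
As a rough illustration of that idea, here is a small combinational shift-and-add multiplier in VHDL (4x4; the entity name and widths are illustrative): each partial product is the multiplicand gated by one multiplier bit, and an adder cascade sums them.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Illustrative 4x4 combinational multiplier.  Synthesis unrolls the
-- loop into the classic AND-gate / adder array; the ripple through the
-- adders is where the multiplier's delay comes from.
entity array_mult4 is
  port (
    a, b : in  unsigned(3 downto 0);
    p    : out unsigned(7 downto 0)
  );
end entity array_mult4;

architecture rtl of array_mult4 is
begin
  process (a, b)
    variable acc : unsigned(7 downto 0);
  begin
    acc := (others => '0');
    for i in 0 to 3 loop
      if b(i) = '1' then
        -- partial product: a ANDed with b(i), shifted left by i
        acc := acc + shift_left(resize(a, 8), i);
      end if;
    end loop;
    p <= acc;
  end process;
end architecture rtl;
```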

> Yet the reason that you go shell out all the $$ for a DSP chip is to get
> a 1-cycle MAC that you have to bury in a few (or several) tens of cycles
> worth of housekeeping code to set up the pointers, counters, modes &c --
> so you never get to multiply numbers in one cycle, really.
>
> How much less silicon would you use if an n-bit multiplier were
> implemented as an n-stage pipelined device?  If I wanted to implement a
> 128-tap FIR filter and could live with 160 ticks instead of 140 would
> the chip be much smaller?
>
I think this would lead to lousy performance on small loops - such as
those found in JPEG encoding.

> Or is the space consumed by the separate data spaces and buses needed to
> move all the data to and from the MAC?  If you pipelined the multiplier
> _and_ made it a two- or three- cycle MAC (to allow time to shove data
> around) could you reduce the chip cost much?  Would the amount of area
> savings you get allow you to push the clock up enough to still do audio
> applications for less money?

Quite a lot of the chip cost depends on the design complexity and the
amount of time and money spent in R&D, not to mention the quantity of
chips the company hopes to sell, so it's not a directly proportional
relation between the cost and the size of the chip. If you're trying to
save money, you could try using a fast general-purpose microcontroller
instead of a DSP.

>
> Obviously any answers will be useless unless somebody wants to run out
> and start a chip company, but I'm still curious about it.
>
> --
>
> Tim Wescott
> Wescott Design Services
> http://www.wescottdesign.com


Article: 90721
Subject: Re: MAC Architectures
From: Austin Lesea <austin@xilinx.com>
Date: Wed, 19 Oct 2005 09:52:08 -0700
Tim,

http://klabs.org/richcontent/MAPLDCon02/abstracts/ahlquist_a.pdf

is just one of thousands of ways to design a multiplier.  This one is 
interesting, as they do it in (our) FPGA.

Google:  designing multipliers

Depending on what you want (widths, latencies, etc.) you can go from a 
serial implementation (extremely cheap, but takes a huge number of 
cycles), to a massively parallel, with critical stage pipeline registers 
(very expensive, but also very fast, with a low latency).

And, basically, if you can think of it, it has probably been done, more 
than once, with at least four or five papers written on it (with 
Master's and PhD degrees trailing behind) in each technology generation.

Xilinx chose to implement a simple 18X18 multiplier starting in Virtex 
II to facilitate our customers' designs.  As we have gone on since then, 
we have graduated to a more useful MAC in Virtex 4, but still keeping 
its function generic so it is useful across the widest range of 
applications.

There are many on this group who are experts in this area (both building 
multipliers in ASIC form, and building them using LUTs in FPGAs).

I am sure they will offer up their comments.

Austin

Tim Wescott wrote:

> Jeorg's question on sci.electronics.design for an under $2 DSP chip got 
> me to thinking:
> 
> How are 1-cycle multipliers implemented in silicon?  My understanding is 
> that when you go buy a DSP chip a good part of the real estate is taken 
> up by the multiplier, and this is a good part of the reason that DSPs 
> cost so much.  I can't see it being a big gawdaful batch of 
> combinatorial logic that has the multiply rippling through 16 32-bit 
> adders, so I assume there's a big table look up involved, but that's as 
> far as my knowledge extends.
> 
> Yet the reason that you go shell out all the $$ for a DSP chip is to get 
> a 1-cycle MAC that you have to bury in a few (or several) tens of cycles 
> worth of housekeeping code to set up the pointers, counters, modes &c -- 
> so you never get to multiply numbers in one cycle, really.
> 
> How much less silicon would you use if an n-bit multiplier were 
> implemented as an n-stage pipelined device?  If I wanted to implement a 
> 128-tap FIR filter and could live with 160 ticks instead of 140 would 
> the chip be much smaller?
> 
> Or is the space consumed by the separate data spaces and buses needed to 
> move all the data to and from the MAC?  If you pipelined the multiplier 
> _and_ made it a two- or three- cycle MAC (to allow time to shove data 
> around) could you reduce the chip cost much?  Would the amount of area 
> savings you get allow you to push the clock up enough to still do audio 
> applications for less money?
> 
> Obviously any answers will be useless unless somebody wants to run out 
> and start a chip company, but I'm still curious about it.
> 

Article: 90722
Subject: Re: MAC Architectures
From: "Noway2" <no_spam_me2@hotmail.com>
Date: 19 Oct 2005 09:59:18 -0700
Your question got me thinking, trying to recall the discussions I had
in the microprocessor architecture classes.   So here is some food for
thought:

 I seem to recall (back then? - '99 -> '01) that multipliers were
assumed to take multiple cycles; I think for the class purposes we
usually assumed three or four cycles.   Sometimes the premise was that
there were multiple multipliers and other ALU units that could be used
simultaneously.  If an instruction was set to execute and there
weren't resources available, this resulted in a pipeline stall, but
otherwise the apparent output was single cycle.  I even believe we had
test problems dealing with determining how many multipliers a processor
required versus other resource items (each with a $ value attached),
given a certain mix of instructions, and having to determine the optimal
resource mix.

 In the latter portions of the class, we got away from the CPU
architecture and spent a lot of time dealing with the concept of
maintaining single-cycle execution through the use of compiler
scheduling.  A lot of emphasis was placed on scheduling algorithms that
scanned for data and resource dependencies and on how code gets
executed out of sequence to maximize resource utilization.

Another concept that was raised is the idea of sub-cycle (clocking) or
micro-operations, where in a single "instruction cycle" multiple
processor cycles would occur while still maintaining the apparent
single-cycle execution.

I would imagine that modern DSPs rely on techniques like these, or some
totally new ones, to maximize the throughput.


Article: 90723
Subject: Re: clock timing
From: =?ISO-8859-15?Q?Benjamin_Menk=FCc?= <benjamin@menkuec.de>
Date: Wed, 19 Oct 2005 19:30:50 +0200
Hi,

The whole thing works now. I am doing the LVDS with the IOB now.

The Problem was:

* Some hand-soldered fixes on the par->lvds line cause problems at high 
frequency

* I did not transfer all the code fixes (the most important one is latching 
the output data) to the direct lvds version

After taking the code improvements from the par->lvds version over to 
the direct lvds version, it now works very well. I am running the LVDS bus 
at 380 MHz, which is the highest a DCM can do at my speed grade on a 2x 
output. If I want a higher frequency, I would need a fast external 
clock or use the DDR functionality, I guess.

The hardware part of my work is finished with that for now. Maybe later 
I will order a new PCB for the par->lvds version with the bugfixes, so 
that I can use that too. On the current board, the clk to the 
par->lvds IC is on the wrong pin of the board connector, because the MEMEC 
handbook is wrong :( I have fixed that with a wire, but it seems like 60 
MHz is too much for that wiring, because the picture on the screen 
improves when I touch the wire with my hand :-o very strange...

Thanks to the group for the help so far :)

regards,
Benjamin

Article: 90724
Subject: Re: clock timing
From: =?ISO-8859-15?Q?Benjamin_Menk=FCc?= <benjamin@menkuec.de>
Date: Wed, 19 Oct 2005 19:33:42 +0200
Hi Symon,

yes I figured out that there are only IBUFG but no OBUFG :)

See my other post in this topic - it works now. If I later need a 
higher speed than the 2x output of my DCM can do (for me that's 380 MHz), 
then I will look at the DDR stuff.

The thing I was missing was really latching all the outgoing data. In 
later designs I will not use a falling-edge scheme again; it is really a 
pain.

regards,
Benjamin


