Site Home Archive Home FAQ Home How to search the Archive How to Navigate the Archive
Compare FPGA features and resources
Threads starting:
Authors:A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
On 10/5/2015 1:12 PM, Kevin Neilson wrote:
>> I won't say I understand it, but I have seen some things about this. But
>> "true random number generator"??? My understanding is this is virtually
>> impossible. I haven't read about this. Is it based on noise from a
>> diode or something? I recall a researcher trying that and it was good,
>> but he could never find the source of a long-term DC bias.
>
> I don't know how these guys do it, but you can make a decent true random
> number generator with ring oscillators. I read one paper that described
> using several of these along with non-linear feedback shift registers to
> get a good random number.

Again, I don't know for sure, but I think ring oscillators make very
*poor* random number generators because they are easily linked to noise
sources such as clocks on the chip.

--
Rick

Article: 158301
On 10/5/2015 1:37 PM, Meshenger wrote:
> In terms of Ethernet on the KickStart eval/development board, one can
> add an Arduino shield for that. There is Bluetooth on-board but not
> Ethernet.

The only Arduino shields I have seen are general-purpose interfaces which
duplicate the MAC internal to the SF2. This means they would not be code
compatible with the SF2 Ethernet, so there is not much value in using the
KickStart board for software development of an SF2 project. I think the
SF2 only needs a PHY and the transformer. I would consider designing an
add-on board to use the internal Ethernet MAC and make the I/O capability
compatible with the Microsemi eval board. I might add a few bells and
whistles too.

--
Rick

Article: 158302
On 07/10/15 05:02, rickman wrote:
> On 10/5/2015 1:29 PM, Kevin Neilson wrote:
>>> We are talking modulo-2 multiplies at every bit, otherwise known as
>>> AND gates with no carry? I'm a bit fuzzy on this.
>>>
>>> Now I'm confused by Kevin's description. If the vector is
>>> multiplied by a scalar, what parts are common and what parts are
>>> unique? What parts of this are fixed vs. variable? The only parts
>>> a tool can optimize are the fixed operands. Or I am totally
>>> missing the concept.
>>
>> Say you're multiplying a by a vector [b c d]. Let's say we're using
>> the field GF(8), so a is 3 bits. Now a can be thought of as
>> (a0*alpha^0 + a1*alpha^1 + a2*alpha^2), where a0 is bit 0 of a, and
>> alpha is the primitive element of the field. Then a*b or a*c or a*d
>> is just a sum of some combination of those 3 values in the
>> parentheses, depending upon the locations of the 1s in b, c, or d.
>> So you can premultiply the three values in the parentheses (the
>> common part) and then take sums of subsets of those three (the
>> individual parts). It's all a bunch of XORs at the end. This is
>> just a complicated way of saying that by writing the HDL at a more
>> explicit level, the synthesizer is better able to find common factors
>> and use a lot fewer gates.
>
> Ok, I'm not at all familiar with GFs. I see now a bit of what you are
> saying. But to be honest, I don't know that the tools would have any
> trouble with the example you give. The tools are pretty durn good at
> optimizing... *but*... there are two things to optimize for, size and
> performance. They are sometimes mutually exclusive, sometimes not. If
> you ask the tool to give you the optimum size, I don't think you will do
> better if you code it differently, while describing *exactly* the same
> behavior.
>
> If you ask the tool to optimize for speed, the tool will feel free to
> duplicate logic if it allows higher performance, for example, by
> combining terms in different ways.
> Or less logic may require a longer chain of LUTs which will be slower.
> LUT sizes in FPGAs don't always match the logical breakdown, so that
> speed or size can vary a lot depending on the partitioning.

GF(8) is a Galois field with 8 elements. That means that there are 8
different elements. These can be written in different ways. Sometimes
people like to be completely abstract and write 0, 1, a, b, c, d, e, f.
<http://www.wolframalpha.com/input/?i=GF%288%29>

Others will want to use the polynomial forms (which are used in the
definition of GF(8)) and write: 0, 1, x, x+1, x², x²+1, x²+x, x²+x+1.
<http://www.math.uic.edu/~leon/mcs425-s08/handouts/field.pdf>
That form is easily written in binary: 000, 001, 010, 011, 100, etc.

One key point about a GF (or any field) is that you can combine elements
through addition, subtraction, multiplication, and division (except by 0)
and get another element of the field. To make this work, however,
addition and multiplication (and therefore their inverses) are defined
very differently from how you normally define them.

A GF is always of size p^n - in this case, p is 2 and n is 3. Your
elements are a set of n elements from Z/p (the integers modulo p).
Addition is just pairwise addition of elements, modulo p. For the case
p = 2 (which is commonly used for practical applications, because it
suits computing so neatly), this means you can hold your elements as a
simple binary string of n digits, and addition is just xor.

Multiplication is a little more complex. Treat your elements as
polynomials (as shown earlier), and multiply them. Any coefficients of
x^i can be reduced modulo p. Then the whole thing is reduced modulo the
field's defining irreducible degree-n polynomial (in this case,
x³ + x + 1). This always gets you back to another element in the field.

The real magic here is that the multiplication table (excluding 0) forms
a Latin square - every element appears exactly once in each row and
column. This means that division "works".
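The addition-is-xor and multiply-then-reduce rules described above are easy to check in software. Here is a quick Python model (not HDL, just a sanity check of the arithmetic; elements use the binary form given above, with bit i holding the coefficient of x^i):

```python
# Software model of GF(8) arithmetic as described above.
# Elements are 3-bit integers; bit i is the coefficient of x^i.

IRRED = 0b1011  # x^3 + x + 1, the field's defining irreducible polynomial

def gf8_add(a, b):
    # Addition is coefficient-wise mod 2, i.e. plain XOR.
    return a ^ b

def gf8_mul(a, b):
    # Carry-less polynomial multiply: shift-and-xor instead of shift-and-add.
    p = 0
    for i in range(3):
        if (b >> i) & 1:
            p ^= a << i
    # Reduce the degree-4 and degree-3 terms modulo x^3 + x + 1.
    for i in (4, 3):
        if (p >> i) & 1:
            p ^= IRRED << (i - 3)
    return p

# The non-zero multiplication table is a Latin square, so division works:
# every row is a permutation of the non-zero elements.
for a in range(1, 8):
    row = [gf8_mul(a, b) for b in range(1, 8)]
    assert sorted(row) == list(range(1, 8))
```

Running it confirms, for example, that x * x = x² (2 * 2 = 4) and that the Latin-square property holds, so every non-zero element has a unique inverse.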
It's easy to get a finite field like this for size p, where p is prime -
it's just the integers modulo p, and you can use normal addition and
multiplication modulo p. But the beauty of GFs is that they give you
fields of other sizes - in particular, size 2^n.

Back to the task in hand - implementing GF(8) on an FPGA. Addition is
just xor - the tools should not have trouble optimising that!
Multiplication in GF is usually done with lookup tables. For a field as
small as GF(8), you would have a table with all the elements. This could
be handled in a 6x3 bit "ROM" (6 address bits in, 3 bits out), or the
tools could generate logic and then reduce it to simple gates (each bit
of the product reduces to an XOR of a few AND terms of the operand
bits). For larger fields, such as the very useful GF(2^8), you will
probably use lookup tables for the logs and antilogs, and use them for
multiplication and division.

A key area of practical application for GFs is in error detection and
correction. A good example is RAID6 (two redundant disks in a RAID
array) - there is an excellent paper on the subject by the guy who
implemented RAID6 in Linux, using GF(2^8). It gives a good introduction
to Galois fields.
<https://www.kernel.org/pub/linux/kernel/people/hpa/raid6.pdf>

Article: 158303
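The log/antilog scheme mentioned above for GF(2^8) can be sketched in a few lines of Python. (The polynomial 0x11D used here is the one in the linked RAID6 paper; other applications pick other irreducible polynomials, and x is assumed to be a primitive element, which holds for 0x11D.)

```python
# Build log/antilog tables for GF(2^8) and multiply/divide via lookups -
# the usual scheme when a full 256x256 product table would be too large.

POLY = 0x11D   # x^8 + x^4 + x^3 + x^2 + 1, as in the RAID6 paper

exp = [0] * 512          # antilog table, doubled so sums of logs need no mod
log = [0] * 256
v = 1
for i in range(255):
    exp[i] = v
    log[v] = i
    v <<= 1              # multiply by x
    if v & 0x100:
        v ^= POLY        # reduce modulo the field polynomial
for i in range(255, 512):
    exp[i] = exp[i - 255]

def gf256_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return exp[log[a] + log[b]]        # log(a*b) = log(a) + log(b) mod 255

def gf256_div(a, b):
    if b == 0:
        raise ZeroDivisionError
    if a == 0:
        return 0
    return exp[log[a] - log[b] + 255]  # offset keeps the index non-negative
```

The doubled antilog table is a common trick: it lets the multiply skip an explicit mod-255 on the summed logs.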
1) Have an array of logic cells or programmable logic evenly spaced out.
Space needs to be reserved for the routing channels.

2) But each logic cell needs some SRAM configuration bits from the
configuration register.

The question I have is: if I don't want configuration bits and wires
jamming up the routing channel for the logic array, can I have them on
different metal layers? I want to have the simplest arrangement: 2
layers of metal. Metal 1 for the logic arrays and routing channels, and
Metal 2 for the SRAM configuration bits and wiring.

---------------------------------------
Posted through http://www.FPGARelated.com

Article: 158304
On Tue, 06 Oct 2015 23:02:27 -0400, rickman wrote:

[snip]

> Ok, I'm not at all familiar with GFs. I see now a bit of what you are
> saying. But to be honest, I don't know that the tools would have any
> trouble with the example you give. The tools are pretty durn good at
> optimizing... *but*... there are two things to optimize for, size and
> performance. They are sometimes mutually exclusive, sometimes not. If
> you ask the tool to give you the optimum size, I don't think you will do
> better if you code it differently, while describing *exactly* the same
> behavior.

You seem to have misinterpreted my earlier post in this thread, in which
I had two implementations of *exactly* the same function that were
giving very different speed and area results after synthesis.

I think what you say (about the coding not mattering and the tools doing
a good job) is true for "small" functions. Once you get past a certain
level of complexity, the tools (at least the Xilinx ones) seem to do a
poor job if the function has been coded the canonical way. Recoding as a
tree of XORs can give better results. This is a fault in the synthesiser
and is independent of the speed vs area optimisation setting.

Regards,
Allan

Article: 158305
On 10/7/2015 7:18 AM, lilzz wrote:
> 1) Have an array of logic cells or programmable logic evenly spaced
> out. Need to have the space reserved for routing channel.
>
> 2) But each logic cell needs some SRAM configuration bits from the
> configuration register.
>
> The question I have is if I don't want configuration bits and wires
> jamming up the routing channel for the logic array, can I have them on
> different metal layers?

Are you asking about designing your own FPGA? If so, you can do anything
you want with that.

> I want to have the simplest: 2 layers of metal. Metal 1 for logic
> arrays and routing channels and Metal 2 for SRAM configuration bits
> and wiring.

I don't think you can divide up the signals in that way. Any set of
connections will require routing in both the X and the Y direction. It
is often best to use two layers, one for X and one for Y. Otherwise you
find a set of signals routing in one direction blocks routing in the
other direction. So if you want to partition your routing layers into
two classes, each class will need two layers, for a total of four layers
of metal.

--
Rick

Article: 158306
rickman <gnuarm@gmail.com> wrote:
> Again, I don't know for sure, but I think ring oscillators make very
> *poor* random number generators because they are easily linked to noise
> sources such as clocks on the chip.

Yes:
http://www.cl.cam.ac.uk/~atm26/papers/markettos-ches2009-inject-trng.pdf

That has been confirmed to me by various industrial security folks.

Theo

Article: 158307
rickman <gnuarm@gmail.com> wrote:

(snip on configuration registers in FPGAs)

> Are you asking about designing your own FPGA? If so, you can do
> anything you want with that.

>> I want to have the simplest: 2 layers of metal. Metal 1 for logic
>> arrays and routing channels and Metal 2 for SRAM configuration bits
>> and wiring.

> I don't think you can divide up the signals in that way. Any set of
> connections will require routing in both the X and the Y direction.
> It is often best to use two layers, one for X and one for Y.
> Otherwise you find a set of signals routing in one direction block
> routing in the other direction. So if you want to partition your
> routing layers into two classes, each class will need two layers for
> a total of four layers of metal.

I haven't thought about it this much since the XC4000 days, but as far
as I know there are configuration-bit shift registers along rows (or
columns) and a column (or row) on one side that steers bits as
appropriate.

Note also that configuration runs much slower than the actual logic, so
the transistors are different, and routing can be different. For logic,
one tries to keep signals in metal instead of silicon, as its lower
resistivity means it runs faster. That isn't so important for
configuration. It might be that you can share with wiring that has a
different use after configuration. If so, there is probably a patent,
though it might have expired.

Note also that the LUTs in some FPGAs can also be used as shift
registers, and so also need to be part of the configuration data. Maybe
they use the same shift logic in both cases.

I am sure by now there is much art on the optimal designs for FPGAs,
which should be well documented somewhere. Patents are a convenient
place to look.

-- glen

Article: 158308
Hi. I'm looking at universities that offer an online/distance (part- or
full-time) master's degree based on FPGAs, HDLs/HLS and/or digital
design. I wonder if someone knows of such a degree, or can point me in
the right direction.

Thanks

Article: 158309
Hi All,

In Altera devices (at least) it is recommended that reset be applied to
the async port of the flops. It is also recommended that such a reset
should be pre-synchronised before wiring it to these async ports. This
saves resources and helps recovery/removal timing.

What exactly is recovery/removal? I know it is defined in terms of reset
release, and that reset should not be de-asserted close to a clock edge.
Fair enough, but is this independent of the D input? I mean, if the D
input is stable (or passes setup/hold), does it still matter that reset
release near a clock edge will be a problem on its own? From TimeQuest
it certainly looks like it does matter, but why? How is reset actually
applied inside the flop?

Any help appreciated.

Zak

Article: 158310
In article <sdudnT94KJgxv4XLnZ2dnUU7-KOdnZ2d@giganews.com>,
zak <93737@FPGARelated> wrote:
> Hi All,
>
> In Altera devices (at least) it is recommended that reset be applied
> to the async port of flips. It is also recommended that such reset
> should be pre-synchronised before wiring it to these async ports.
> This saves resource and helps recovery/removal timing.
>
> What exactly is recovery/removal. I know it is defined in terms of
> reset release and that reset should not be de-asserted close to clock
> edge. Fair enough but is this independent of D input? I mean if D
> input is stable (or passes setup/hold) does it matter still that
> reset release near clock edge will be problem on its own. From
> timieQuest it looks certainly that it does matter but why? How is
> reset actually applied inside the flip?

Without any specific knowledge of the actual circuitry of a FF, ask
yourself this: at the inactive edge of reset, let's say the D pin is a
one, while the reset is causing a 0 at the FF output. If the clock
recovery timing is not met (basically a setup/hold check with respect to
the async reset pin), what value would you expect at Q? An unknown value
is the only reasonable assumption.

Now ask yourself: if D is a zero (matching the reset state), is it now
safe to assume that the recovery check doesn't matter? I wouldn't bet
on it. I can see hand-wavy arguments both ways.

Now someone with more intimate knowledge of the FF internals may argue
one way or another - but me, as a logic designer? No way I'm depending
on that.

Do as Altera advises and properly synchronize that inactive edge of
reset. Make sure your timing tools are checking that path.
Reset/initialization problems can be quite the devil to find and debug.

Regards,

Mark

Article: 158311
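For what it's worth, the usual circuit behind "synchronize that inactive edge of reset" is a two-flop "asynchronous assert, synchronous release" reset bridge. Here is a tiny behavioral model in Python, purely to illustrate the idea (in a real design this is two flops in HDL; the model only evaluates at clock edges, so it doesn't capture reset pulses between edges):

```python
# Behavioral sketch of an async-assert / sync-release reset bridge:
# two flip-flops whose asynchronous clear is driven by the raw reset
# and whose D chain feeds in a constant 1.  The output drops at once
# when reset asserts, but only rises on a clock edge, so downstream
# flops never see the deasserting edge asynchronously.

class ResetBridge:
    def __init__(self):
        self.ff1 = 0
        self.ff2 = 0   # synchronized, active-low reset output

    def tick(self, async_rst_n):
        if async_rst_n == 0:          # asynchronous assertion
            self.ff1 = self.ff2 = 0
        else:                         # synchronous release on the clock edge
            self.ff2 = self.ff1
            self.ff1 = 1
        return self.ff2

bridge = ResetBridge()
trace = [bridge.tick(rst_n) for rst_n in [0, 0, 1, 1, 1]]
# the output stays low for two clocks after release, then goes high
```

Downstream flops still use their async clear pins, but the deasserting edge they see is now aligned to the clock, which is exactly what the recovery/removal check wants.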
Thanks Mark. That is a good answer.

Zak

Article: 158312
> Hi. I'm looking at universities that offer online/distance (part or
> full time) master degree based in FPGAs, HDLs/HLS and/or digital
> design. I wonder if someone knows of such a degree, or can point in
> the right direction.
>
> Thanks

I don't think I've heard of any Masters program with a 100% focus on
HDLs/FPGAs. What you are likely to find, if FPGAs are of interest to
you, is a Master's in either EE with a DSP concentration or Computer
Science/Engineering. Studying either of these courses should give you
plenty of exposure to FPGAs. From my experience you sort of have to
build on your experience once you have been exposed to it. For clearer
information you can go talk to some technical institutions about this.
This is just my opinion. Hope this helps.

Article: 158313
Thanks. Do you have any program in particular that you would recommend?

Article: 158314
How do I most efficiently add 8 numbers in an FPGA?
What is the best way to save LUTs?
How does data width affect LUT consumption?
Thanks in advance.

Article: 158315
On Tuesday, October 20, 2015 at 7:40:35 AM UTC-4, b2508 wrote:
> How do I most efficiently add 8 numbers in FPGA?

With an adder. You haven't stated any requirements, so any answer here
would be OK. Consider:

- You didn't specify your latency or processing speed requirements
- You didn't specify your efficiency metric (i.e. power? LUTs?
  Something else?)

> What is the best way to save LUTs?

Using an accumulator and streaming the numbers in sequentially might use
fewer LUTs.

> How is data width affecting LUT consumption?

More LUTs will be used when you increase the data width.

Kevin

Article: 158316
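Kevin's accumulator suggestion, cycle by cycle, amounts to the following. This is a Python model of the hardware behavior, not HDL; the n+3 width is just the worst-case growth for eight unsigned n-bit operands:

```python
# Model of a streaming accumulator: one adder reused over 8 clock
# cycles instead of seven parallel adders.  In hardware this is a
# single registered adder of width n + 3 (three bits of growth
# because 8 * (2^n - 1) < 2^(n+3)).

def accumulate(samples, width):
    acc_width = width + 3                 # 8 n-bit values fit in n+3 bits
    mask = (1 << acc_width) - 1
    acc = 0
    for s in samples:                     # one operand per clock cycle
        acc = (acc + s) & mask            # registered add, fixed width
    return acc

values = [200, 250, 255, 100, 0, 37, 255, 191]   # eight 8-bit inputs
# the n+3 accumulator never overflows for 8 operands, so this matches sum()
```

The trade is seven extra clock cycles of latency for a single adder's worth of LUTs, which is exactly the serial-vs-parallel trade discussed in the rest of the thread.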
b2508 <108118@fpgarelated> wrote:
> How do I most efficiently add 8 numbers in FPGA?
> What is the best way to save LUTs?
> How is data width affecting LUT consumption?

The most efficient adder is the carry-save adder. But the actual
implementation depends on many other details, such as the timing of the
availability of the numbers, and also the bit width.

-- glen

Article: 158317
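A minimal software model of the structure glen is referring to: a 3:2 carry-save stage compresses three operands into two with no carry propagation, and only the final two-operand add needs a real carry chain.

```python
# A 3:2 carry-save stage: per bit it is one full adder, so there is
# no carry ripple across the word.  a + b + c == s + cy always holds.

def csa(a, b, c):
    s = a ^ b ^ c                             # bitwise sum, no carries
    cy = ((a & b) | (a & c) | (b & c)) << 1   # majority bits, shifted up
    return s, cy

# Reduce 8 operands with repeated 3:2 stages, then one final
# carry-propagate addition of the last two words.
def csa_sum(values):
    values = list(values)
    while len(values) > 2:
        s, cy = csa(values[0], values[1], values[2])
        values = [s, cy] + values[3:]
    return values[0] + values[1]              # the only "real" adder
```

On LUT fabric the win is less clear-cut than in an ASIC, since carry chains are cheap and a 6-input LUT can absorb much of the 3:2 logic, but this is the standard structure for fast multi-operand addition.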
On 10/20/2015 7:40 AM, b2508 wrote:
> How do I most efficiently add 8 numbers in FPGA?
> What is the best way to save LUTs?
> How is data width affecting LUT consumption?
> Thanks in advance.

This sounds like a homework problem. In an FPGA there aren't many ways
to save LUTs for adders. Unless you can process your data serially, the
only thing I can think of is to do the additions in a tree structure,
which saves you a very few LUTs from the bit growth of the result
compared to processing the additions serially; it's also faster.

(((a+b)+(c+d))+((e+f)+(g+h))) vs. ((((((a+b)+c)+d)+e)+f)+g)+h

--
Rick

Article: 158318
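A back-of-the-envelope way to see this point: approximate each adder's LUT cost by its result width (roughly one LUT per result bit on a carry-chain fabric) and count the adders on the longest path. The model below is only a rough sketch, not a synthesis result, since LUT packing and carry-chain details change the real numbers:

```python
from math import ceil, log2

def sum_width(n, k):
    # A sum of k unsigned n-bit values needs n + ceil(log2(k)) result bits.
    return n + ceil(log2(k))

def chain_cost(n):
    # ((((((a+b)+c)+d)+e)+f)+g)+h : seven adders in series, each sized
    # for its running partial sum of k operands; depth = 7 adders.
    widths = [sum_width(n, k) for k in range(2, 9)]
    return sum(widths), len(widths)

def tree_cost(n):
    # (((a+b)+(c+d))+((e+f)+(g+h))) : 4 + 2 + 1 adders; depth = 3 adders.
    widths = 4 * [sum_width(n, 2)] + 2 * [sum_width(n, 4)] + [sum_width(n, 8)]
    return sum(widths), 3

# For 8-bit inputs the chain costs about 73 result bits of adder versus
# about 67 for the tree, and the critical path is 3 adders instead of 7.
```

Consistent with the post above: the tree saves only a few LUTs, but it cuts the adder depth by more than half.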
> In article <sdudnT94KJgxv4XLnZ2dnUU7-KOdnZ2d@giganews.com>,
> zak <93737@FPGARelated> wrote:
>> Hi All,
>
> Now ask yourself if D is a zero (matching the reset state). Is it now
> safe to assume that the recovery check doesn't matter? I wouldn't bet
> on it. I can see hand wavy arguments both ways.
>
> Now someone with more intimate details of the FF internals may argue
> one way or another - but me as a logic designer? No way I'm depending
> on that.
>
> Do as Altera advises and properly synchronize that inactive edge of
> reset. Make sure your timing tools are checking that path.
> Reset/Initialization problems can be quite the devil to find and
> debug.
>
> Regards,
>
> Mark

You shouldn't have to hand-wave or guess; you should be able to look at
the timing requirements in the silicon vendor's data sheet, and it will
tell you what the reset-deassert to clock recovery time is. It should
also note whether that constraint is only valid when D=1 or not.

It's hard to imagine any flop going to 1 in that situation, but you can
get some really weird behavior from mux-based logic.

I like to do both an asynchronous and a synchronous reset on every flop
and connect the asynchronous reset directly to the input pad, so that
both edges are asynchronous. You delay the synchronous reset enough
clocks so that you never see a 1 on the D input until after the recovery
time has passed. This is great for large chips where the transport delay
across the die can be multiple clock cycles.

If you synchronize the asynchronous reset then you also create a mess
that the DFT engineer has to clean up.

You will also mess up the timing for any signal that passes between soft
reset domains.

John Eaton
z3qmtr45@gmail.com

Article: 158319
On Tuesday, October 20, 2015 at 6:40:35 AM UTC-5, b2508 wrote:
> How do I most efficiently add 8 numbers in FPGA?
> What is the best way to save LUTs?
> How is data width affecting LUT consumption?
> Thanks in advance.

At the risk of doing someone else's homework:

> How do I most efficiently add 8 numbers in FPGA?

The latest parts from Xilinx and Altera will add three numbers at a time
using a single carry chain.

> What is the best way to save LUTs?

A: Doing serial arithmetic using block RAM to hold inputs & outputs.
B: Using DSP adders in place of LUT carry chains.

Jim

Article: 158320
rickman <gnuarm@gmail.com> wrote:
> On 10/20/2015 7:40 AM, b2508 wrote:
>> How do I most efficiently add 8 numbers in FPGA?
>> What is the best way to save LUTs?
>> How is data width affecting LUT consumption?
>> Thanks in advance.
>
> This sounds like a homework problem.

Yes, but even so, leaving lots of unknowns.

> In an FPGA there aren't many ways to save LUTs for adders.

If you have 8 n-bit inputs and need the sum as fast as possible, there
aren't a huge number of choices. Though it does depend a little on n.

> Unless you can process your data serially, the

In this case, there are two choices. You can process the data bit
serial, or word serial. (Or, I suppose, somewhere in between.) Choosing
one of those would depend on how the data was supplied, and again, how
fast you need the result. In addition, only one set of eight, or many?

> only thing I can think of is to do the additions in a tree structure
> which saves you a very few LUTs from the bit growth of the result
> compared to processing the additions serially, it's also faster.
> (((a+b)+(c+d))+((e+f)+(g+h))) vs. ((((((a+b)+c)+d)+e)+f)+g)+h

If you just chain adders, the usual tools will optimize them. But you
might also want some registers in there, too.

Also, this could be a lab homework problem, where the student is
supposed to try things out and see what happens.

-- glen

Article: 158321
On 20/10/2015 22:52, jim.brakefield@ieee.org wrote:
> On Tuesday, October 20, 2015 at 6:40:35 AM UTC-5, b2508 wrote:
>> How do I most efficiently add 8 numbers in FPGA?
>> What is the best way to save LUTs?
>> How is data width affecting LUT consumption?

Why not try it out? Run one of the tool chains and see what happens when
you build the adder in different ways, and then if it's not what you
expect, come and ask on here. The tool chains will show you what the LUT
usage is.

I was a tad surprised to find that when I coded:

byteout <= byte1+byte2+byte3+byte4+byte5+byte6+byte7+byte8;

and compared it with

temp1 <= byte1 + byte2 + byte3 + byte4;
temp2 <= byte5 + byte6 + byte7 + byte8;
byteout <= temp1 + temp2;

I got the same number of LUTs and slices used....

> At the risk of doing someone else's homework:
>
>> How do I most efficiently add 8 numbers in FPGA?

Define efficiency. Almost all efficiency is a trade-off between space
and performance.

> The latest parts from Xilinx and Altera will add three numbers at a
> time using a single carry chain.

In this modern world of optimising tool chains, why not just put them
all in one expression and let the tool chain work out what is best for
the chip?

>> What is the best way to save LUTs?

> A: Doing serial arithmetic using block RAM to hold inputs & outputs.

A classical trade-off of speed, as it's now serial, for gates used. If
you do it serially then you may need to do 7 separate serial
additions... which will need more LUTs for the carry latches....

> B: Using DSP adders in place of LUT carry chains.

Assuming your chip has one?

Just my two cents/pence/yuan... And Jim, nothing personal; your comments
just seemed a suitable place to hang my hat....

Dave

Article: 158322
> You shouldn't have to hand wave or guess, you should be able to look
> at timing requirements from the silicon vendors data sheet and it
> will tell what the reset_deassert to clock recovery time is.

[snip]

> If you syncronize the asynchronous reset then you also create a mess
> that the dft engineer has to clean up.
>
> You will also mess up the timing for any signal that passes between
> soft reset domains.
>
> John Eaton

Apologies, but can I ask if this reply is from an ASIC mindset, or is it
spam? Almost every statement doesn't make any sense whatsoever for
FPGAs.

Zak

Article: 158323
jt_eaton wrote:
> You shouldn't have to hand wave or guess, you should be able to look
> at timing requirements from the silicon vendors data sheet and it
> will tell what the reset_deassert to clock recovery time is. It
> should also note if that constraint is only valid when D=1 or not.
>
> It's hard to imagine any flop going to 1 in that situation but you
> can get some really weird behavior from muxed based logic.

The real problem is if the reset is distributed on the general switch
fabric while the clock is on a low-skew clock net: some FFs could be
released from reset on one clock, others could be released on the next
clock, if the clock is fast. Think of some state machine register that
is supposed to start at zero and start going through states after reset
ends. It could end up in an odd state if some FFs are still being reset
while others have got out of reset.

Jon

Article: 158324
> The real problem is if the reset is distributed on general switch
> fabric while the clock is on a low-skew clock net, some FFs could be
> released from reset on one clock, others could be released on the
> next clock, if the clock is fast. Think of some state machine
> register that is supposed to start at zero, and start going through
> states after reset ends. It could end up in an odd state if some FFs
> are still being reset while others have got out of reset.
>
> Jon

True, but the tool takes care of that and tells you if it passed
recovery/removal at every register, based on a clock-period check. In
case it doesn't, you can then assist by reducing the reset fanout, e.g.
by cascading stages.

Zak