On 1/27/2017 10:12 AM, Benjamin Couillard wrote: > Le vendredi 27 janvier 2017 03:17:21 UTC-5, David Brown a écrit : >> On 27/01/17 05:39, rickman wrote: >>> On 1/26/2017 9:38 PM, Kevin Neilson wrote: >>>>>> >>>>>> I think you oversimplify FP. It works a lot better with >>>>>> dedicated hardware. >>>>> >>>>> Not sure what your point is. The principles are the same in >>>>> software or hardware. I was describing hardware I have worked on. >>>>> ST-100 from Star Technologies. I became very intimate with the >>>>> inner workings. >>>>> >>>>> The only complications are from the various error and special case >>>>> handling of the IEEE-754 format. I doubt the FPGA is implementing >>>>> that, but possibly. The basics are still the same. Adds use a >>>>> barrel shifter to denormalize the mantissa so the exponents are >>>>> equal, a integer adder and a normalization barrel shifter to >>>>> produce the result. Multiplies use a multiplier for the mantissas >>>>> and an adder for the exponents (with adjustment for exponent bias) >>>>> followed by a simple shifter to normalize the result. >>>>> >>>>> Both add and multiply are about the same level of complexity as a >>>>> barrel shifter is almost as much logic as the multiplier. >>>>> >>>>> Other than the special case handling of IEEE-754, what do you think >>>>> I am missing? >>>>> >>>>> -- >>>>> >>>>> Rick C >>>> >>>> It just all works better with dedicated hardware. Finding the >>>> leading one for normalization is somewhat slow in the FPGA and is >>>> something that benefits from dedicated hardware. Using a DSP48 (if >>>> we're talking about Xilinx) for a barrel shifter is fairly fast, but >>>> requires 3 cycles of latency, can only shift up to 18 bits, and is >>>> overkill for the task. You're using a full multiplier as a shifter; >>>> a dedicated shifter would be smaller and faster. All this stuff adds >>>> latency. When I pull up CoreGen and ask for the basic FP adder, I >>>> get something that uses only 2 DSP48s but has 12 cycles of latency. >>>> And there is a lot of fabric routing so timing is not very >>>> deterministic. >>> >>> I'm not sure how much you know about multipliers and shifters. >>> Multipliers are not magical. Multiplexers *are* big. A multiplier has >>> N stages with a one bit adder at every bit position. A barrel >>> multiplexer has nearly as many bit positions (you typically don't need >>> all the possible outputs), but uses a bit less logic at each position. >>> Each bit position still needs a full 4 input LUT. Not tons of >>> difference in complexity. >>> >> >> A 32-bit barrel shifter can be made with 5 steps, each step being a set >> of 32 two-input multiplexers. Dedicated hardware for that will be >> /much/ smaller and more efficient than using LUTs or a full multiplier. >> >> Normalisation of FP results also requires a "find first 1" operation. >> Again, dedicated hardware is going to be a lot smaller and more >> efficient than using LUT's. >> >> So a DSP block that has dedicated FP support is going to be smaller and >> faster than using integer DSP blocks with LUT's to do the same job. >> >>> The multipliers I've seen have selectable latency down to 1 clock. >>> Rolling a barrel shifter will generate many layers of logic that will >>> need to be pipelined as well to reach high speeds, likely many more >>> layers for the same speeds. >>> >>> What do you get if you design a floating point adder in the fabric? I >>> can only imagine it will be *much* larger and slower. 
>>> > > If I understand, you can do a barrel shifter with log2(n) complexity, hence your 5 steps, but you will have the combinational delays of 5 muxes; it could limit your maximum clock frequency. A brute force approach will use more resources but will probably allow a higher clock frequency.

Technically N log(N).

--

Rick C
Article: 159676
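For illustration of the add path described above — denormalize with a barrel shift, integer add, then renormalize — here is a small worked example; the values are invented for the example:

  A = 1.101 x 2^5 (= 52),  B = 1.111 x 2^2 (= 7.5)
  align:      exponent difference is 3, so B's mantissa is barrel-shifted right: 0.001111 x 2^5
  add:        1.101000 + 0.001111 = 1.110111, giving 1.110111 x 2^5 (= 59.5); no normalization needed here
  normalize:  if the mantissa sum overflows, e.g. 1.100 + 1.101 = 11.001, the result is shifted
              right by one and the exponent incremented: 11.001 x 2^5 -> 1.1001 x 2^6 (= 100)

When the operands have opposite signs the sum can instead lose leading bits, and the "find first 1" plus left-shift normalization discussed later in the thread is what brings it back to 1.x form.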
On 1/27/2017 11:33 AM, David Brown wrote: > On 27/01/17 16:12, Benjamin Couillard wrote: >> Le vendredi 27 janvier 2017 03:17:21 UTC-5, David Brown a écrit : >>> On 27/01/17 05:39, rickman wrote: >>>> On 1/26/2017 9:38 PM, Kevin Neilson wrote: >>>>>>> >>>>>>> I think you oversimplify FP. It works a lot better with >>>>>>> dedicated hardware. >>>>>> >>>>>> Not sure what your point is. The principles are the same in >>>>>> software or hardware. I was describing hardware I have >>>>>> worked on. ST-100 from Star Technologies. I became very >>>>>> intimate with the inner workings. >>>>>> >>>>>> The only complications are from the various error and special >>>>>> case handling of the IEEE-754 format. I doubt the FPGA is >>>>>> implementing that, but possibly. The basics are still the >>>>>> same. Adds use a barrel shifter to denormalize the mantissa >>>>>> so the exponents are equal, a integer adder and a >>>>>> normalization barrel shifter to produce the result. >>>>>> Multiplies use a multiplier for the mantissas and an adder >>>>>> for the exponents (with adjustment for exponent bias) >>>>>> followed by a simple shifter to normalize the result. >>>>>> >>>>>> Both add and multiply are about the same level of complexity >>>>>> as a barrel shifter is almost as much logic as the >>>>>> multiplier. >>>>>> >>>>>> Other than the special case handling of IEEE-754, what do you >>>>>> think I am missing? >>>>>> >>>>>> -- >>>>>> >>>>>> Rick C >>>>> >>>>> It just all works better with dedicated hardware. Finding the >>>>> leading one for normalization is somewhat slow in the FPGA and >>>>> is something that benefits from dedicated hardware. Using a >>>>> DSP48 (if we're talking about Xilinx) for a barrel shifter is >>>>> fairly fast, but requires 3 cycles of latency, can only shift >>>>> up to 18 bits, and is overkill for the task. You're using a >>>>> full multiplier as a shifter; a dedicated shifter would be >>>>> smaller and faster. All this stuff adds latency. When I pull >>>>> up CoreGen and ask for the basic FP adder, I get something that >>>>> uses only 2 DSP48s but has 12 cycles of latency. And there is a >>>>> lot of fabric routing so timing is not very deterministic. >>>> >>>> I'm not sure how much you know about multipliers and shifters. >>>> Multipliers are not magical. Multiplexers *are* big. A >>>> multiplier has N stages with a one bit adder at every bit >>>> position. A barrel multiplexer has nearly as many bit positions >>>> (you typically don't need all the possible outputs), but uses a >>>> bit less logic at each position. Each bit position still needs a >>>> full 4 input LUT. Not tons of difference in complexity. >>>> >>> >>> A 32-bit barrel shifter can be made with 5 steps, each step being a >>> set of 32 two-input multiplexers. Dedicated hardware for that will >>> be /much/ smaller and more efficient than using LUTs or a full >>> multiplier. >>> >>> Normalisation of FP results also requires a "find first 1" >>> operation. Again, dedicated hardware is going to be a lot smaller >>> and more efficient than using LUT's. >>> >>> So a DSP block that has dedicated FP support is going to be smaller >>> and faster than using integer DSP blocks with LUT's to do the same >>> job. >>> >>>> The multipliers I've seen have selectable latency down to 1 >>>> clock. Rolling a barrel shifter will generate many layers of >>>> logic that will need to be pipelined as well to reach high >>>> speeds, likely many more layers for the same speeds. 
>>>> >>>> What do you get if you design a floating point adder in the >>>> fabric? I can only imagine it will be *much* larger and slower. >>>> >> >> If I understand, you can do a barrel shifter with log2(n) complexity, >> hence your 5 steps but you will have the combinational delays of 5 >> muxes, it could limit your maximum clock frequency. A brute force >> approach will use more resources but will probably allow a higher >> clock frequency. >> > > The "brute force" method would be 1 layer of 32 32-input multiplexers. > And how do you implement a 32-input multiplexer in gates? You basically > have 5 layers of 2-input multiplexers. > > If the depth of the multiplexer is high enough, you might use tri-state > gates but I suspect that in this case you'd implement it with normal logic.

A barrel shifter is simpler than that. I believe that, in a manner somewhat parallel to computing an FFT, the terms in a barrel shifter can be shared to allow this. (pseudo vhdl)

  function barrel_shift_left(indata : unsigned(31 downto 0); sel : unsigned(4 downto 0)) return unsigned is
    variable a, b, c, d, e : unsigned(31 downto 0);
  begin
    -- each stage conditionally shifts the previous stage's result left by 1, 2, 4, 8, 16
    a := indata(30 downto 0) & '0'               when sel(0) = '1' else indata;
    b := a(29 downto 0) & "00"                   when sel(1) = '1' else a;
    c := b(27 downto 0) & "0000"                 when sel(2) = '1' else b;
    d := c(23 downto 0) & "00000000"             when sel(3) = '1' else c;
    e := d(15 downto 0) & "0000000000000000"     when sel(4) = '1' else d;
    return e;
  end;

--

Rick C
Article: 159677
On Friday, January 27, 2017 14:00:10 UTC-5, rickman wrote:
Snip
> Technically N log(N).
>
> --
>
> Rick C

Yep true, thanks for the clarification
Article: 159678
> On 01/26/2017 09:15 PM, rickman wrote: Snip > What does "customer visible" have to do with it? You seem to be talking > about rolling your own DAC using a bunch of I/O pins and resistors with > analog inversion circuitry after. I'm explaining how it can be done > more easily. If you don't need bipolar outputs, why do you need > negative binary values at all? > > I've said there are a few sign magnitude DAC parts out there, but not so > many. I can't recall ever using one. > "Not customer visible" means that the person that buys one of these parts to stick on a board is un-aware of the interface between the internal digital logic and internal analog circuitry. The people designing the analog guts of a mixed signal chip do "roll their own" DACs in several flavors. Unless you work in mixed signal IC design or test, you would never see these interfaces. For a DAC or ADC product, "customer visible" has a lot to do with it because people want 2's complement or biased binary at the system interface so that it plays nice with the arithmetic in the rest of the system or doesn't confuse the firmware people. For internals of a chip, as a digital design engineer or test engineer, you just deal with it. As for why it is done the way it is, I am not completely sure, it is just what analog designers do. Even fairly analog seeming components like the voltage reference that David mentioned generally have a digital section with flash or OTP memory that gets read at power up and sets DACs used to configure the reference output to the 1% or whatever the part spec is. Bandgaps have 3 or 4 analog parameters effecting output voltage and output flatness over temperature that need to be set. As part of final test on the silicon die, they measure the device performance and set the values in the memory. This allows the manufacturer to correct for process variations and uniformity issues across a wafer the size of a dinner plate. When you see "NC" pins on small analog devices, they are often used for the access to these memories. The actual signalling methods are pretty closely held and not something a customer is likely to stumble on. The "I/O pins" in use are interconnects between the digital block and the analog section on the die. The last chip that I worked on had 100+ digital signals crossing the analog/digital boundary. Some were control signals and some were trim signals. I think that there were 5 or 6 parallel interfaced DACs from 4 bits to 11 bits. Of those, 2 or 3 used sign magnitude format to pass the trim information across. BobHArticle: 159679
> I do almost all of my work in Verilog, but I do have comments > about editors in general. > > I actually tried Sigasi's editor briefly, but found it lacking some > of the features I was used to, and I wasn't so interested in the > project management portion since I generally work from the Xilinx > GUI.

The last time I checked, both Sigasi and V3S had quite limited Verilog support; I think at the moment they are really VHDL editors, with some basic Verilog support if you need to edit a Verilog file now and then... I did a Verilog project only once, about 2 years ago, and at that time VEditor worked the best for me by far, at least among the free options (it was also recommended to me by the respective customer, but I first tried some different approaches). It flagged a lot of issues in the source code that are not detected by a Verilog compiler and saved me a lot of time looking for stupid bugs. (In contrast, VHDL editors flag errors that would be detected by the compiler... But this is a different topic ;-)

Regards,

Thomas

www.entner-electronics.com - Home of EEBlaster and JPEG Codec
Article: 159680
On 26/01/2017 01:14, Tim Wescott wrote: > On Wed, 25 Jan 2017 02:59:46 -0800, cfbsoftware wrote: > >> On Wednesday, January 25, 2017 at 3:14:39 PM UTC+10:30, Tim Wescott >> wrote: >>> This is kind of a survey; I need some perspective (possibly historical) >>> >>> Are there any digital systems that you know of that use 1's compliment >>> or signed-magnitude number representation for technical reasons? >>> >>> Have you ever used it in the past? >>> >>> >> Quote: >> >> "Some designers chose 1’s complement, where −n was obtained from n by >> simply inverting all bits. Some chose 2’s complement, where −n is >> obtained by inverting all bits and then adding 1. The former has the >> drawback of featuring two forms for zero (0…0 and 1…1). This is nasty, >> particularly if available comparison instructions are inadequate. For >> example, the CDC 6000 computers had an instruction that tested for zero, >> recognizing both forms correctly, but also an instruction that tested >> the sign bit only, classifying 1…1 as a negative number, making >> comparisons unnecessarily complicated. This case of inadequate design >> reveals 1’s complement as a bad idea. Today, all computers use 2’s >> complement arithmetic." >> >> Ref: "Good Ideas, Through the Looking Glass" Niklaus Wirth, IEEE >> Computer. Issue No. 01 - January (2006 vol. 39). >> >> https://www.computer.org/csdl/mags/co/2006/01/r1028-abs.html > > I'm looking for current practice, not history. 1's complement is nearly the very definition of history as implied by "Have you ever used it in the past?" The only machine I am aware used 1's complement is a 1970's mainframe. I have not been aware of any since, its a daft idea. -- Mike Perkins Video Solutions Ltd www.videosolutions.ltd.ukArticle: 159681
On 27/01/17 19:59, rickman wrote: > On 1/27/2017 3:17 AM, David Brown wrote: >> On 27/01/17 05:39, rickman wrote: >>> On 1/26/2017 9:38 PM, Kevin Neilson wrote: >>>>>> >>>>>> I think you oversimplify FP. It works a lot better with >>>>>> dedicated hardware. >>>>> >>>>> Not sure what your point is. The principles are the same in >>>>> software or hardware. I was describing hardware I have worked on. >>>>> ST-100 from Star Technologies. I became very intimate with the >>>>> inner workings. >>>>> >>>>> The only complications are from the various error and special case >>>>> handling of the IEEE-754 format. I doubt the FPGA is implementing >>>>> that, but possibly. The basics are still the same. Adds use a >>>>> barrel shifter to denormalize the mantissa so the exponents are >>>>> equal, a integer adder and a normalization barrel shifter to >>>>> produce the result. Multiplies use a multiplier for the mantissas >>>>> and an adder for the exponents (with adjustment for exponent bias) >>>>> followed by a simple shifter to normalize the result. >>>>> >>>>> Both add and multiply are about the same level of complexity as a >>>>> barrel shifter is almost as much logic as the multiplier. >>>>> >>>>> Other than the special case handling of IEEE-754, what do you think >>>>> I am missing? >>>>> >>>>> -- >>>>> >>>>> Rick C >>>> >>>> It just all works better with dedicated hardware. Finding the >>>> leading one for normalization is somewhat slow in the FPGA and is >>>> something that benefits from dedicated hardware. Using a DSP48 (if >>>> we're talking about Xilinx) for a barrel shifter is fairly fast, but >>>> requires 3 cycles of latency, can only shift up to 18 bits, and is >>>> overkill for the task. You're using a full multiplier as a shifter; >>>> a dedicated shifter would be smaller and faster. All this stuff adds >>>> latency. When I pull up CoreGen and ask for the basic FP adder, I >>>> get something that uses only 2 DSP48s but has 12 cycles of latency. >>>> And there is a lot of fabric routing so timing is not very >>>> deterministic. >>> >>> I'm not sure how much you know about multipliers and shifters. >>> Multipliers are not magical. Multiplexers *are* big. A multiplier has >>> N stages with a one bit adder at every bit position. A barrel >>> multiplexer has nearly as many bit positions (you typically don't need >>> all the possible outputs), but uses a bit less logic at each position. >>> Each bit position still needs a full 4 input LUT. Not tons of >>> difference in complexity. >>> >> >> A 32-bit barrel shifter can be made with 5 steps, each step being a set >> of 32 two-input multiplexers. Dedicated hardware for that will be >> /much/ smaller and more efficient than using LUTs or a full multiplier. > > Yes, I stand corrected. Still, it is hardly a "waste" of multipliers to > use them for multiplexers. Well, if the multipliers are already there and you don't have alternative dedicated hardware, then I agree you are not wasting the multipliers in using them for a shifter. > > >> Normalisation of FP results also requires a "find first 1" operation. >> Again, dedicated hardware is going to be a lot smaller and more >> efficient than using LUT's. > > Find first 1 can be done using a carry chain which is quite fast. It is > the same function as used in Gray code operations. > It is not something I have looked into, but I'll happily take your word for it. However, like pretty much /any/ function, it will be smaller and faster in dedicated hardware than in logic blocks. 
> >> So a DSP block that has dedicated FP support is going to be smaller and >> faster than using integer DSP blocks with LUT's to do the same job. > > Who said it wouldn't be? I say exactly that below. My point was just > that floating point isn't too hard to wrap your head around and not so > horribly different from fixed point. You just need to stick a few > functions onto a fixed point multiplier/adder. Fair enough. > > I was responding to: > > "Is this really a thing, or are they wrapping some more familiar fixed- > point processing with IP to make it floating point?" > > The difference between fixed and floating point operations require a few > functions beyond the basic integer operations which we have been > discussing. Floating point is not magic or incredibly hard to do. It > has not been included on FPGAs up until now because the primary market > is integer based. Okay. > > Some 15 years ago I discussed the need for hard IP in FPGAs and was told > by certain Xilinx employees that it isn't practical to include hard IP > because of the proliferation of combinations and wasted resources that > result. The trouble is the ratio of silicon area required for hard IP > vs. FPGA fabric gets worse with each larger generation. So as we see > now FPGAs are including all manner of functio blocks.... like other > devices. > > What I don't get is why FPGAs are so special that they are the last hold > out of becoming system on chip devices. I think this has come up before in this newsgroup. But I can't remember if any conclusion was reached (probably not!). > > >>> The multipliers I've seen have selectable latency down to 1 clock. >>> Rolling a barrel shifter will generate many layers of logic that will >>> need to be pipelined as well to reach high speeds, likely many more >>> layers for the same speeds. >>> >>> What do you get if you design a floating point adder in the fabric? I >>> can only imagine it will be *much* larger and slower. >Article: 159682
> >> Normalisation of FP results also requires a "find first 1" operation. > >> Again, dedicated hardware is going to be a lot smaller and more > >> efficient than using LUT's. > > > > Find first 1 can be done using a carry chain which is quite fast. It is > > the same function as used in Gray code operations. > > > > It is not something I have looked into, but I'll happily take your word > for it. However, like pretty much /any/ function, it will be smaller > and faster in dedicated hardware than in logic blocks. >

I've done it in a Xilinx, and it's not fast. First you have to go across the routing fabric and go through a set of LUTs to get onto the carry chain. The carry chain is pretty fast; getting on and off the carry chain is slow. After you get off the carry chain, you have to go through the general routing fabric again. This is where most of your clock cycle gets eaten up. Remember, if you had dedicated hardware, this would be a dedicated route. Now you get into a second set of LUTs, where you have to AND the data from the carry chain with the original number in order to get a one-hot bus with only the leading 1 set. Now you have to encode that into a number which you can use for your shifter. You may be able to do this with the same set of LUTs; I can't remember.
Article: 159683
On Mon, 30 Jan 2017 10:40:39 -0800, Kevin Neilson wrote: >> >> Normalisation of FP results also requires a "find first 1" >> >> operation. >> >> Again, dedicated hardware is going to be a lot smaller and more >> >> efficient than using LUT's. >> > >> > Find first 1 can be done using a carry chain which is quite fast. It >> > is the same function as used in Gray code operations. >> > >> > >> It is not something I have looked into, but I'll happily take your word >> for it. However, like pretty much /any/ function, it will be smaller >> and faster in dedicated hardware than in logic blocks. >> >> > I've done it in a Xilinx, and it's not fast. First you have to go > across the routing fabric and go through a set of LUTs to get onto the > carry chain. The carry chain is pretty fast; getting on and off the > carry chain is slow. After you get off the carry chain, you have to go > through the general routing fabric again. This is where most of your > clock cycle gets eaten up. Remember, if you had dedicated hardware, > this would be a dedicated route. Now you get into a second set of LUTs, > where you have to AND the data from the carry chain with the original > number in order to get a one-hot bus with only the leading 1 set. Now > you have to encode that into a number which you can use for your > shifter. You may be able to do this with the same set of LUTs; I can't > remember. What Xilinx part? The Altera Stratus 10 (I think that's the one) uses paired DSP blocks that are designed with a bit of extra logic so that you can use the pair of them as a floating-point block, or each one as a fixed-point block. (I'm not using their terminology). Apparently there's enough stuff going on at the really high end that floating point is better. -- Tim Wescott Control systems, embedded software and circuit design I'm looking for work! See my website if you're interested http://www.wescottdesign.comArticle: 159684
On Friday, January 27, 2017 11:34:00 UTC-5, David Brown wrote:
Snip
> The "brute force" method would be 1 layer of 32 32-input multiplexers.
> And how do you implement a 32-input multiplexer in gates? You basically
> have 5 layers of 2-input multiplexers.
>
> If the depth of the multiplexer is high enough, you might use tri-state
> gates but I suspect that in this case you'd implement it with normal logic.

Yeah, you're right.
Article: 159685
On 1/30/2017 1:40 PM, Kevin Neilson wrote: >>>> Normalisation of FP results also requires a "find first 1" operation. >>>> Again, dedicated hardware is going to be a lot smaller and more >>>> efficient than using LUT's. >>> >>> Find first 1 can be done using a carry chain which is quite fast. It is >>> the same function as used in Gray code operations. >>> >> >> It is not something I have looked into, but I'll happily take your word >> for it. However, like pretty much /any/ function, it will be smaller >> and faster in dedicated hardware than in logic blocks. >> > > I've done it in a Xilinx, and it's not fast. First you have to go across the routing fabric and go through a set of LUTs to get onto the carry chain. The carry chain is pretty fast; getting on and off the carry chain is slow. After you get off the carry chain, you have to go through the general routing fabric again. This is where most of your clock cycle gets eaten up. Remember, if you had dedicated hardware, this would be a dedicated route. Now you get into a second set of LUTs, where you have to AND the data from the carry chain with the original number in order to get a one-hot bus with only the leading 1 set. Now you have to encode that into a number which you can use for your shifter. You may be able to do this with the same set of LUTs; I can't remember. The comparison is using a carry chain vs. not using a carry chain. First 1 in LUTs is either log2(N) in depth and linear in size or log2(N) in size and linear in depth (speed). Using general routing and LUTs this is very slow. Using a fast carry uses a LUT to enter the carry chain and a LUT to exit the carry chain. The carry chain is a fraction of a nanosecond per bit. -- Rick CArticle: 159686
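For reference, a minimal behavioral sketch of the log2(N)-deep "find first 1" discussed above, written against ieee.numeric_std for a fixed 32-bit input; the function name and coding style are my own choices, not something from the thread, and a synthesizer will map this onto LUTs rather than the carry-chain trick:

  -- Returns the bit index of the most significant '1' in x.
  -- x = 0 has no leading one and must be trapped separately by the caller.
  function leading_one_pos(x : unsigned(31 downto 0)) return unsigned is
    variable v   : unsigned(31 downto 0) := x;
    variable pos : unsigned(4 downto 0)  := (others => '0');
  begin
    -- Five binary-search steps: if the upper half is non-empty, note it and shift it down.
    if v(31 downto 16) /= 0 then pos(4) := '1'; v := shift_right(v, 16); end if;
    if v(15 downto 8)  /= 0 then pos(3) := '1'; v := shift_right(v, 8);  end if;
    if v(7 downto 4)   /= 0 then pos(2) := '1'; v := shift_right(v, 4);  end if;
    if v(3 downto 2)   /= 0 then pos(1) := '1'; v := shift_right(v, 2);  end if;
    if v(1) = '1'           then pos(0) := '1';                          end if;
    return pos;
  end function;

For FP normalization the mantissa would then be shifted left by (31 - pos), which is exactly the barrel-shift step discussed earlier in the thread.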
On 1/30/2017 3:54 PM, Tim Wescott wrote: > On Mon, 30 Jan 2017 10:40:39 -0800, Kevin Neilson wrote: > >>>>> Normalisation of FP results also requires a "find first 1" >>>>> operation. >>>>> Again, dedicated hardware is going to be a lot smaller and more >>>>> efficient than using LUT's. >>>> >>>> Find first 1 can be done using a carry chain which is quite fast. It >>>> is the same function as used in Gray code operations. >>>> >>>> >>> It is not something I have looked into, but I'll happily take your word >>> for it. However, like pretty much /any/ function, it will be smaller >>> and faster in dedicated hardware than in logic blocks. >>> >>> >> I've done it in a Xilinx, and it's not fast. First you have to go >> across the routing fabric and go through a set of LUTs to get onto the >> carry chain. The carry chain is pretty fast; getting on and off the >> carry chain is slow. After you get off the carry chain, you have to go >> through the general routing fabric again. This is where most of your >> clock cycle gets eaten up. Remember, if you had dedicated hardware, >> this would be a dedicated route. Now you get into a second set of LUTs, >> where you have to AND the data from the carry chain with the original >> number in order to get a one-hot bus with only the leading 1 set. Now >> you have to encode that into a number which you can use for your >> shifter. You may be able to do this with the same set of LUTs; I can't >> remember. > > What Xilinx part? > > The Altera Stratus 10 (I think that's the one) uses paired DSP blocks > that are designed with a bit of extra logic so that you can use the pair > of them as a floating-point block, or each one as a fixed-point block. > (I'm not using their terminology). > > Apparently there's enough stuff going on at the really high end that > floating point is better. I'm not sure what "high end" means. Floating point has some advantages and it has some disadvantages. Fixed point is the same. Neither is perfect for all uses or even *any* uses actually. You always need to analyze the problem you are solving and consider the sources of computational errors. They are different but always potentially present with either approach. -- Rick CArticle: 159687
On Mon, 30 Jan 2017 16:32:51 -0500, rickman wrote: > On 1/30/2017 3:54 PM, Tim Wescott wrote: >> On Mon, 30 Jan 2017 10:40:39 -0800, Kevin Neilson wrote: >> >>>>>> Normalisation of FP results also requires a "find first 1" >>>>>> operation. >>>>>> Again, dedicated hardware is going to be a lot smaller and more >>>>>> efficient than using LUT's. >>>>> >>>>> Find first 1 can be done using a carry chain which is quite fast. >>>>> It is the same function as used in Gray code operations. >>>>> >>>>> >>>> It is not something I have looked into, but I'll happily take your >>>> word for it. However, like pretty much /any/ function, it will be >>>> smaller and faster in dedicated hardware than in logic blocks. >>>> >>>> >>> I've done it in a Xilinx, and it's not fast. First you have to go >>> across the routing fabric and go through a set of LUTs to get onto the >>> carry chain. The carry chain is pretty fast; getting on and off the >>> carry chain is slow. After you get off the carry chain, you have to >>> go through the general routing fabric again. This is where most of >>> your clock cycle gets eaten up. Remember, if you had dedicated >>> hardware, this would be a dedicated route. Now you get into a second >>> set of LUTs, >>> where you have to AND the data from the carry chain with the original >>> number in order to get a one-hot bus with only the leading 1 set. Now >>> you have to encode that into a number which you can use for your >>> shifter. You may be able to do this with the same set of LUTs; I >>> can't remember. >> >> What Xilinx part? >> >> The Altera Stratus 10 (I think that's the one) uses paired DSP blocks >> that are designed with a bit of extra logic so that you can use the >> pair of them as a floating-point block, or each one as a fixed-point >> block. (I'm not using their terminology). >> >> Apparently there's enough stuff going on at the really high end that >> floating point is better. > > I'm not sure what "high end" means. Floating point has some advantages > and it has some disadvantages. Fixed point is the same. Neither is > perfect for all uses or even *any* uses actually. You always need to > analyze the problem you are solving and consider the sources of > computational errors. They are different but always potentially present > with either approach. Yes, you are correct. I tend to mostly work with stuff that comes out of an ADC, goes through some processing (usually for me it's a processor and not an FPGA, but it's still DSP), and then goes out a DAC. In that case, fixed-point processing for the signal itself is usually the way to go because the ADC and DAC between them pretty much set the ranges, which means that floating point is just a waste of silicon. HOWEVER: that's just what I mostly run into. I'm currently working on a project where, by its nature, the sensible numerical format is double- precision floating point (not FPGA -- it's _slow_ data reception on a PC- class processor, where double-precision floating point is almost as fast as integer math unless you use the DSP extensions). -- Tim Wescott Wescott Design Services http://www.wescottdesign.com I'm looking for work -- see my website!Article: 159688
I have installed V3S ... When I instantiate a component in the following manner: i_my: entity work.sub V3S complains that sub is unknown ... How can I suppress that behavior? (Ok, apart from using a component declaration...) NoroArticle: 159689
On Thursday, 2 February 2017 09:36:34 UTC+1, noreeli....@gmail.com wrote: > I have installed V3S ... > > When I instantiate a component in the following manner: > > i_my: entity work.sub > > V3S complains that sub is unknown ... > > How can I suppress that behavior? (Ok, apart from using a component declaration...) > > Noro

<sub> must be defined somewhere in the project (either another entity, or in a library/package) -> add that file to the project

Thomas
Article: 159690
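For anyone hitting the same message, here is a minimal pair of files that lets the direct instantiation i_my: entity work.sub resolve once both files are compiled into the same project/library; the entity and port names are invented for the example:

  -- sub.vhd : the entity being instantiated must be analyzed into library "work"
  library ieee;
  use ieee.std_logic_1164.all;

  entity sub is
    port (a : in  std_logic;
          y : out std_logic);
  end entity sub;

  architecture rtl of sub is
  begin
    y <= not a;
  end architecture rtl;

  -- top.vhd : direct entity instantiation, no component declaration needed
  library ieee;
  use ieee.std_logic_1164.all;

  entity top is
    port (a : in  std_logic;
          y : out std_logic);
  end entity top;

  architecture rtl of top is
  begin
    i_my : entity work.sub
      port map (a => a, y => y);
  end architecture rtl;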
It is too complicated for me to understand; I just want to divide a range of numbers from -32767 to +32767 and get corresponding data from -1 to +1. To get this I must divide by 32767. For this purpose I use the Xilinx divider generator. So I use the signed core, remainder type both fractional and remainder too. It is OK when the result must be 1 or more, but when it is less than 1 it is sad. When the result must be more than 1 the quotient shows the right results in two's complement digits (according to the datasheet). But!!! The fractional part is mad; I cannot use two's complement digit conversion. When I divide any digit by 32767 (0111111111111111) the fractional result is always the dividend. I tried a fractional part 32 bits wide but got no result. Why does it not work? Does anyone else meet this problem?
Article: 159691
On Fri, 03 Feb 2017 07:32:52 -0800, abirov wrote: > It is too complicated to understand to me,I just want to divide range of > numbers from -32767 to +32767 and get according data from -1 to +1. To > get this i must divide by 32767. > > For this purpose i use Xilinx divider generator.So i use signed core, > reminder type both fractional and reminder too. > > It is OK when results must be 1 or more,but when it less then 1 thats > sad. > When result must be more then 1 quotient show right results in two's > compliment digits (according to datasheet) . But !!! fractional part is > mad, cannot use two's compliment digit conversion. when i divide any > digit to 32767 (0111111111111111) and in fractional result is always > dividend. I tryed fractional part 32-bit width but no result/ > > Why it doesnot work ? Does anyone meet this problem ? Explain how it is "mad". And please, please, please, stop for a moment and think on how sensible it is to use up a whole bunch of resources to do a divide by 32767 when a divide by 32768 is just a matter of shifting down by 16 bits -- which, on an FPGA, is simply a matter of relabeling your wires. If you're absolutely bound and determined to divide by 32767, then use the following rule, which shouldn't take too much logic, because if you think about it you'll only be paying attention to the top two bits: * If the input number has an absolute value less than 0x4000, shift down by 16 * If the input number has an absolute value 0x4000 or greater, shift down by 16 and add (or subtract) 1 to (from) it, depending on whether it's positive or negative. * Unless, of course, the input is 32767, in which case you need to shift down by 16 and _don't_ add 1, because if you do the result will be -1, which is a lot different from 1 - 1/32768. -- Tim Wescott Wescott Design Services http://www.wescottdesign.com I'm looking for work -- see my website!Article: 159692
Shifting down by 15 means what? My english very poor sorry. You mean shifting down like following vhdl code : Ding(15) is sign bit. Dout <= std_logic_vector(unsigned(Din(14 downto 1)) sll 14); ?Article: 159693
On Friday, February 3, 2017 at 11:36:36 PM UTC+6, Tim Wescott wrote: > On Fri, 03 Feb 2017 07:32:52 -0800, abirov wrote: > > > It is too complicated to understand to me,I just want to divide range of > > numbers from -32767 to +32767 and get according data from -1 to +1. To > > get this i must divide by 32767. > > > > For this purpose i use Xilinx divider generator.So i use signed core, > > reminder type both fractional and reminder too. > > > > It is OK when results must be 1 or more,but when it less then 1 thats > > sad. > > When result must be more then 1 quotient show right results in two's > > compliment digits (according to datasheet) . But !!! fractional part is > > mad, cannot use two's compliment digit conversion. when i divide any > > digit to 32767 (0111111111111111) and in fractional result is always > > dividend. I tryed fractional part 32-bit width but no result/ > > > > Why it doesnot work ? Does anyone meet this problem ? > > Explain how it is "mad". > > And please, please, please, stop for a moment and think on how sensible > it is to use up a whole bunch of resources to do a divide by 32767 when a > divide by 32768 is just a matter of shifting down by 16 bits -- which, on > an FPGA, is simply a matter of relabeling your wires. > > If you're absolutely bound and determined to divide by 32767, then use > the following rule, which shouldn't take too much logic, because if you > think about it you'll only be paying attention to the top two bits: > > * If the input number has an absolute value less than 0x4000, shift down > by 16 > > * If the input number has an absolute value 0x4000 or greater, shift down > by 16 and add (or subtract) 1 to (from) it, depending on whether it's > positive or negative. > > * Unless, of course, the input is 32767, in which case you need to shift > down by 16 and _don't_ add 1, because if you do the result will be -1, > which is a lot different from 1 - 1/32768. > > -- > > Tim Wescott > Wescott Design Services > http://www.wescottdesign.com > > I'm looking for work -- see my website! I got it, and finish this problem, Very thanx everything is OK now.Article: 159694
On 2/6/2017 8:26 AM, abirov@gmail.com wrote: > Shifting down by 15 means what? My english very poor sorry. > You mean shifting down like following vhdl code : > > Ding(15) is sign bit. > Dout <= std_logic_vector(unsigned(Din(14 downto 1)) sll 14); ? Shifting down mean a right shift. "Down" because the value of the number is less. The assumption is that the input value is an integer. So logically to divide by 32768 (2^15) would be the same as a right shift by 15 bits. Your integer has no fractional part so this would require using a fixed point 31 bit number 16.15 which means 16 bits to the left of the binary point and 15 bits to the right. This will prevent loss of data when shifting. However... Shifting can also be done by moving the binary point while keeping the data in place. In other words, treat the data Ding(15 downto 0) as a 1.15 fixed point number rather than a 16 bit integer. I hope this is more clear. -- Rick CArticle: 159695
On Mon, 06 Feb 2017 14:24:59 -0500, rickman wrote: > On 2/6/2017 8:26 AM, abirov@gmail.com wrote: >> Shifting down by 15 means what? My english very poor sorry. You mean >> shifting down like following vhdl code : >> >> Ding(15) is sign bit. >> Dout <= std_logic_vector(unsigned(Din(14 downto 1)) sll 14); ? > > Shifting down mean a right shift. "Down" because the value of the > number is less. The assumption is that the input value is an integer. > So logically to divide by 32768 (2^15) would be the same as a right > shift by 15 bits. Your integer has no fractional part so this would > require using a fixed point 31 bit number 16.15 which means 16 bits to > the left of the binary point and 15 bits to the right. This will > prevent loss of data when shifting. However... > > Shifting can also be done by moving the binary point while keeping the > data in place. In other words, treat the data Ding(15 downto 0) as a > 1.15 fixed point number rather than a 16 bit integer. > > I hope this is more clear. That's what I said! Only Rick's version makes sense. Yes -- perform a right shift. Except, as Rick says, you're not really moving anything, you're just re-labeling the wires. Your 16-bit integer had a wire with weight 1, a wire with weight 2, etc., all the way up to a wire with weight 32768. You "shift" that by relabeling your wires as having weight 1/32768, 1/16384, ... 1/2, 1. Note that there is no physical operation whatsoever inside your chip to perform this shift -- you're just _thinking differently_ about the number for all operations except multiplications. -- Tim Wescott Wescott Design Services http://www.wescottdesign.com I'm looking for work -- see my website!Article: 159696
On 02/06/2017 12:23 PM, Tim Wescott wrote: > On Mon, 06 Feb 2017 14:24:59 -0500, rickman wrote: > >> On 2/6/2017 8:26 AM, abirov@gmail.com wrote: >>> Shifting down by 15 means what? My english very poor sorry. You mean >>> shifting down like following vhdl code : >>> >>> Ding(15) is sign bit. >>> Dout <= std_logic_vector(unsigned(Din(14 downto 1)) sll 14); ? >> >> Shifting down mean a right shift. "Down" because the value of the >> number is less. The assumption is that the input value is an integer. >> So logically to divide by 32768 (2^15) would be the same as a right >> shift by 15 bits. Your integer has no fractional part so this would >> require using a fixed point 31 bit number 16.15 which means 16 bits to >> the left of the binary point and 15 bits to the right. This will >> prevent loss of data when shifting. However... >> >> Shifting can also be done by moving the binary point while keeping the >> data in place. In other words, treat the data Ding(15 downto 0) as a >> 1.15 fixed point number rather than a 16 bit integer. >> >> I hope this is more clear. > > That's what I said! Only Rick's version makes sense. > > Yes -- perform a right shift. Except, as Rick says, you're not really > moving anything, you're just re-labeling the wires. Your 16-bit integer > had a wire with weight 1, a wire with weight 2, etc., all the way up to a > wire with weight 32768. You "shift" that by relabeling your wires as > having weight 1/32768, 1/16384, ... 1/2, 1. > > Note that there is no physical operation whatsoever inside your chip to > perform this shift -- you're just _thinking differently_ about the number > for all operations except multiplications. > The bundle of wires doesn't care whether you think it has a decimal point in it or not. -- Rob Gaddi, Highland Technology -- www.highlandtechnology.com Email address domain is currently out of order. See above to fix.Article: 159697
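To make the "relabeling the wires" point concrete, a minimal sketch (entity and signal names are mine): interpreting the same 16 bits as a Q1.15 fraction instead of an integer is the whole division by 32768 — no logic is generated.

  library ieee;
  use ieee.std_logic_1164.all;
  use ieee.numeric_std.all;

  entity q15_view is
    port (
      din  : in  std_logic_vector(15 downto 0);  -- integer sample, -32768 .. +32767
      dout : out std_logic_vector(15 downto 0)   -- same bits, now read as Q1.15: -1.0 .. +1.0 - 2**-15
    );
  end entity;

  architecture rtl of q15_view is
  begin
    -- The wire that carried weight 2**14 now carries weight 2**-1, and so on down the bus.
    dout <= din;
    -- In a testbench the fractional value can be recovered as
    --   real(to_integer(signed(din))) / 32768.0
  end architecture;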
On Wednesday, 25 January 2017 05:44:39 UTC+1, Tim Wescott wrote: > This is kind of a survey; I need some perspective (possibly historical) > > Are there any digital systems that you know of that use 1's compliment or > signed-magnitude number representation for technical reasons? > > Have you ever used it in the past? > > Is the world down to legacy applications and interfacing with legacy > sensors? > > TIA. > > -- > > Tim Wescott > Wescott Design Services > http://www.wescottdesign.com > > I'm looking for work -- see my website!

I've used sign-magnitude for some compression schemes, in which case the sign bit takes the least significant place.

    10 = -1
    11 = +1
   100 = -2
   101 = +2
   110 = -3
   111 = +3
  1000 = -4
  ...
Article: 159698
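That mapping (magnitude shifted up one place, sign in the LSB) can be written as a pair of plain integer functions; a small sketch with names of my own choosing — the post's table starts at +/-1, so sending 0 to code 0 is my assumption:

  function lsb_sign_encode(n : integer) return natural is
  begin
    if n > 0 then
      return 2 * n + 1;     -- positive: LSB = '1'
    else
      return 2 * (-n);      -- negative (or zero): LSB = '0'
    end if;
  end function;

  function lsb_sign_decode(code : natural) return integer is
  begin
    if (code mod 2) = 1 then
      return code / 2;      -- LSB '1' => positive
    else
      return -(code / 2);   -- LSB '0' => negative
    end if;
  end function;

  -- lsb_sign_encode(-1) = 2 ("10"), lsb_sign_encode(+1) = 3 ("11"),
  -- lsb_sign_encode(-2) = 4 ("100"), lsb_sign_encode(+2) = 5 ("101"), matching the table above.

Keeping the sign in the least significant place keeps small magnitudes in short codewords, which is what makes this ordering convenient for variable-length compression.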
So, there are algorithms out there to perform an FFT on real data, that save (I think) roughly 2x the calculations of FFTs for complex data. I did a quick search, but didn't find any that are made specifically for FPGAs. Was my search too quick, or are there no IP sources to do this? It would seem like a slam-dunk for Xilinx and Intel/Altera to include these algorithms in their FFT libraries. -- Tim Wescott Wescott Design Services http://www.wescottdesign.com I'm looking for work -- see my website!Article: 159699
On Sunday, February 12, 2017 at 12:05:25 PM UTC-6, Tim Wescott wrote: > So, there are algorithms out there to perform an FFT on real data, that > save (I think) roughly 2x the calculations of FFTs for complex data. > > I did a quick search, but didn't find any that are made specifically for > FPGAs. Was my search too quick, or are there no IP sources to do this? > > It would seem like a slam-dunk for Xilinx and Intel/Altera to include > these algorithms in their FFT libraries. > > -- > > Tim Wescott > Wescott Design Services > http://www.wescottdesign.com > > I'm looking for work -- see my website! It's been a long time, as I remember: The Hartley transform will work. Shuffling the data before and after a half size complex FFT will work. And you can use one of them to check the other.
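For reference, the "shuffling the data before and after a half-size complex FFT" mentioned above is the standard real-input packing trick; written out (my summary of the textbook identities, not something from the post), for N real samples x[n]:

  z[n]  = x[2n] + j*x[2n+1],  n = 0 .. N/2-1                 (pack into an N/2-point complex sequence)
  Z[k]  = N/2-point complex FFT of z
  E[k]  = ( Z[k] + conj(Z[(N/2 - k) mod N/2]) ) / 2          (spectrum of the even samples)
  O[k]  = -j * ( Z[k] - conj(Z[(N/2 - k) mod N/2]) ) / 2     (spectrum of the odd samples)
  X[k]  = E[k] + exp(-j*2*pi*k/N) * O[k],  k = 0 .. N/2-1
  X[N/2] = E[0] - O[0], and the remaining bins follow from X[N-k] = conj(X[k]).

So one N/2-point complex FFT plus an O(N) untangling pass gives the full real-input spectrum, which is where the roughly 2x saving comes from; the Hartley-transform route mentioned above reaches a similar saving by a different path.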